rothwell.im

by Jonathan Rothwell

Goto FAIL: the aftermath

The huge TLS/SSL bug in OS X has finally been patched in OS X 10.9.2. The fix was about four days too late (seriously, would fixing a single line of code and re-issuing it as a delta 10.9.1.1 update have been too hard?) but as always, it’s better late than never. (Naturally, if you’re a Mac user on Mavericks, or an iOS user on 7.0.5 or earlier, you should update your machine now.)

Many words have been said about the bug itself, unearthed in Apple’s TLS stack. Many have (unfairly) blamed the goto statement, which is a slightly suspect argument: one could easily have allowed a similar bug to creep in using function calls and return values. (I’m not sure I agree with the use of goto in this particular case, but it’s not the cause of the problem.) Others have suggested that, had the programmer been conditioned to hit ^I in Xcode (or ⌘⇧F, had they been using Eclipse or a similar IDE) to re-indent the code before pushing, they might’ve spotted the bug, or at least increased the chance of it being picked up during a code review.
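
To illustrate why goto is not really the culprit, here is a minimal sketch in C. It is not Apple’s code: update_hash, check_signature and both verify_* functions are hypothetical stand-ins I have made up to mimic the shape of the defect. The first function repeats the goto fail pattern; the second has the same duplicated-line defect using nothing but return.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real hashing and signature steps.
   Each returns 0 on success and nonzero on failure. */
static int update_hash(bool ok)     { return ok ? 0 : -1; }
static int check_signature(bool ok) { return ok ? 0 : -1; }

/* Shape of the original bug: the duplicated goto is not part of the if,
   so it always runs. The signature check is skipped while err is still 0,
   and 0 means success. */
int verify_with_goto(bool hash_ok, bool sig_ok)
{
    int err;

    if ((err = update_hash(hash_ok)) != 0)
        goto fail;
        goto fail;  /* duplicated line: unconditional jump */
    if ((err = check_signature(sig_ok)) != 0)
        goto fail;

fail:
    return err;
}

/* The same defect without goto: a duplicated return does the same damage. */
int verify_with_return(bool hash_ok, bool sig_ok)
{
    int err;

    if ((err = update_hash(hash_ok)) != 0)
        return err;
        return err;  /* duplicated line: unconditional early return */
    if ((err = check_signature(sig_ok)) != 0)
        return err;

    return 0;
}
```

Calling either function with a good hash and a bad signature, e.g. verify_with_goto(true, false), returns 0, so the bad signature is accepted. Re-indenting the file would pull each duplicated line back level with the if, which is exactly why the ^I habit might have made the mistake visible.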

Judging by the diff (mirrored on GitHub), it looks unlikely that the duplication of the goto fail happened as a result of a merge conflict.¹ It would also be astonishing if Apple didn’t have any unit test coverage on such an important part of the security stack, or at least a rigorous code review process. That said, code review doesn’t always work: sometimes the programmers reviewing the code are overworked, tired, distracted, drunk, etc., much like the programmers who wrote the code.

An interesting theory was posited on Daring Fireball: the bug first appeared in iOS 6, which was released a short time before Apple was ‘added’ to the NSA’s PRISM programme. I still think that it is unlikely Apple (or any large technology company) would agree to build in a backdoor at the behest of an espionage agency. The chain of people that would need to agree to such a thing (and also be sworn to silence), from managerial positions right down to the developer who commits the code, is just too long; furthermore, it assumes that all executives of all companies are easily bribable micromanagers with fine control over their codebase, with no moral compass whatsoever and in cahoots with the Pentagon/Fort Meade. One person is easier to bribe or plant than a whole chain.

So, the way I see it, there are three possibilities:

  1. The goto fail bug was introduced by accident and is a bona fide software defect. Apple has no automatic testing on this stack (or the test suite is incomplete, or it’s ignored when it fails). If so, Apple’s software development practices are shocking. The NSA has better auto-testing than Apple, picked up this defect and (probably) exploited it.
  2. The goto fail bug was introduced by accident and is a bona fide software defect. Apple did have automatic testing on this stack, but a programmer (probably a junior programmer, or an inexperienced QA engineer or test monkey) couldn’t work out why the test for SSLVerifySignedServerKeyExchange was failing, and so set it to be ignored or deleted it. This is extremely bad practice, but I can believe it happened. The NSA picked up this defect and (probably) exploited it.
  3. The goto fail bug was a backdoor deliberately inserted by someone (a planted Apple employee or a contractor) in the pay of a third party. This person also snaffled the automatic test coverage at the same time. No-one noticed, or bothered to question, why this code was removed.
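
For a sense of how small the missing (or ignored) test could have been, here is a rough sketch in C. Everything in it is an assumption made for illustration: verify_signed_params is a hypothetical, heavily simplified verifier that just compares bytes, not the real SSLVerifySignedServerKeyExchange, which hashes the handshake parameters and checks a public-key signature. What matters is the shape of the test, which feeds the verifier something that must be rejected and checks that it is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified verifier: accepts only if the supplied
   signature matches the expected one byte for byte. Returns 0 on success
   and nonzero on failure. */
int verify_signed_params(const unsigned char *sig, size_t sig_len,
                         const unsigned char *expected, size_t expected_len)
{
    if (sig == NULL || expected == NULL)
        return -1;
    if (sig_len != expected_len)
        return -1;
    if (memcmp(sig, expected, sig_len) != 0)
        return -1;
    return 0;
}

/* Positive test: a matching signature must be accepted. */
bool test_accepts_valid_signature(void)
{
    const unsigned char good[] = { 0x01, 0x02, 0x03, 0x04 };
    return verify_signed_params(good, sizeof good, good, sizeof good) == 0;
}

/* Negative test: a tampered signature must be rejected. A verifier that
   skips its final check would pass the positive test and fail this one. */
bool test_rejects_tampered_signature(void)
{
    const unsigned char good[] = { 0x01, 0x02, 0x03, 0x04 };
    unsigned char bad[sizeof good];

    memcpy(bad, good, sizeof good);
    bad[0] ^= 0xFF;  /* flip every bit in the first byte */
    return verify_signed_params(bad, sizeof bad, good, sizeof good) != 0;
}
```

Both test functions return true against this toy verifier. The negative test is the important one: a positive test alone would pass happily against a verifier that never reaches its signature check.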

Of course, we’ll likely never know the exact reason why this bug slipped through the net. But the whole sorry affair is a reminder of some old lessons anyway: patches for critical defects must be issued immediately, and auditing and automated testing are important!

  1. This assumes, of course, that there was only one programmer working on this file for iOS 6/OS X 10.9 and there were no internal (i.e. between Apple employees’ repos) merge conflicts in the interim.