Tuesday, September 30, 2014

Heartbleed, Shellshock and Raymond's Linus's Law

You have probably heard by now that bash, one of the basic tools in the Linux/GNU toolkit, has had a glaring vulnerability for the last, oh, twenty-plus years, now dubbed Shellshock.  You've probably also heard of the Heartbleed vulnerability in OpenSSL.  Apart from making international press and raising serious questions about computer security, these two bugs have a number of features in common:
  • They're implementation bugs.  Bash, as defined in its documentation, does not allow the sort of behavior that Shellshock allows, and likewise for SSL (the protocol) and OpenSSL (an implementation of SSL).  In both cases, the implementations were doing things they shouldn't have.
  • They're basic implementation bugs.  In Shellshock, text which should be ignored or discarded is instead interpreted as a command.  In Heartbleed, a reply message which is supposed to have a given length instead has another.
  • No one noticed them for a long time.  In the case of Shellshock, a very long time.  Or at least, no one seems to have visibly exploited them.
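To make the Shellshock item concrete: the widely circulated one-liner for checking whether a given bash is affected is a sketch of the bug itself.  A function definition is placed in an environment variable with extra command text after it; a vulnerable bash executes that trailing text when it starts up, while a patched bash treats the variable as inert data.

```shell
# Shellshock check (requires bash on the PATH).  The variable x holds a
# function definition "() { :;}" followed by trailing text that should
# never be executed.  A vulnerable bash prints "vulnerable" before the
# -c command's output; a patched bash prints only "this is a test".
env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'
```

On any bash updated since late 2014 the only output should be the `-c` command's own echo.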
It's that last item I want to focus on.  In his famous essay The Cathedral and the Bazaar, extolling the virtues of open source development, Eric Raymond claimed that "given enough eyeballs, all bugs are shallow," or in other words, if you had enough people looking at the source code to a system, any serious issues would be flushed out and fixed quickly.  He called this principle Linus's Law, in honor of Linux creator Linus Torvalds (Linus didn't come up with it.  Linus did put forth his own Linus's Law, but it doesn't seem to have garnered much attention).

In any case, despite bash and OpenSSL being two of the most widely used tools in the software world, these basic and serious bugs don't seem to have been flushed out quickly at all.  Now, it is possible that multiple people noticed the problems, shrugged and went on with their lives, or that some entity or another discovered the bugs and exploited them very quietly, but that's not how Raymond's Linus's Law is supposed to work.

I think there are two reasons for this.

First, as many have pointed out, there's no convincing evidence that more eyeballs really do mean more bugs found.  Rather, it seems that you quickly hit diminishing returns.  Four people may or may not find about twice as many bugs as two people, but forty people probably won't find twice as many bugs as twenty.  Forty people may not even find twice as many bugs as two.

Exactly why this might be is a good research topic, but I'd guess that a lot of it is because some bugs are easy to find, some aren't, and once you've found the easy bugs throwing more eyeballs at the problem (now there's an image) won't necessarily help find the hard bugs.

One of the sobering implications of Shellshock and Heartbleed is that even simple bugs can be hard to find, but that's not news to anyone who's done much coding.

I think there's a second reason, though, more subtle than the first but worth noting:  There probably aren't really that many eyeballs on the source code to begin with.

In theory, millions of people could have found either of these two bugs.  If you've installed Linux, you have the bash and OpenSSL source code, or if you didn't copy it, you can easily get at it.  Odds are you didn't, though, unless you were actively developing one of those packages.  Why would you?  I use Linux systems all the time.  I don't want to study the source code.  I just want it to work.  I have looked at various parts of the Linux/GNU source, but generally just to see how it worked, not with a particular eye toward finding bugs.  Maybe that makes me a bad net.citizen, but if so, I'm pretty sure I'm in good company.

OK, but there have still been hundreds of contributors to each of those projects.  Surely one of them would have seen the problem code and fixed it?  Not necessarily.  A tool like bash consists of a large number of modules (more or less), and the whole point of breaking things down into modules is that you can work on one without caring (much) about (many of) the others.  Someone who worked on job control in bash would not necessarily have even looked at the environment variable parsing, which is where the problem actually was.

In other words, there might only have been a handful of people who even had the opportunity to find Shellshock or Heartbleed in the source code, and they didn't happen to spot the problems, probably because they were trying to get something else done at the time.


There's another kind of eyeball, though: testers.  Even if only a few people were looking closely at the source, lots of people actually use bash, OpenSSL and other open-source tools.

Fair enough, but again, their attention is not necessarily focused where the bugs are.  Most people logging into a Linux box and using bash are not going to be defining functions in environment variables.  Most script writers aren't either (though git, headed by Linus himself, seems to like to).  It's a moderately tricky thing to do.  Likewise, almost no one using OpenSSL is even going to be in a position to look at heartbeat packets.  Most of us don't even know if we're using OpenSSL or not, though if you've visited an https:// URL, you probably have.
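For anyone who hasn't run into it, here is the legitimate feature that Shellshock lived next to: bash can pass a function to a child shell through the environment.  A minimal sketch (the function name `greet` is just an illustration):

```shell
# bash-specific: "export -f" serializes a shell function into the
# environment so that a child bash inherits it.  This is the legitimate
# mechanism whose parsing code contained the Shellshock bug.
greet() { echo "hello from the parent shell"; }
export -f greet
bash -c greet    # the child bash finds greet in its environment and runs it
```

As the text says, it's a moderately tricky thing to do, and most interactive users and script writers never touch it, which is exactly why so few eyeballs ever landed on that parsing code.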

In short, Raymond's implicit assumption that bug-finding is a matter of many independent trials, in the statistical sense, evenly distributed over the space of all possible bugs, looks to be wrong on both counts: "many" and "independent".

[The current Wikipedia article on Linus's law cites Robert Glass's Facts and Fallacies about Software Engineering, which made similar observations in 2003, over a decade before this was posted.  It also no longer seems to mention any version of Linus's law due to Linus himself.  That was removed in this edit  --D.H. Oct 2018]