Friday, October 14, 2011

Dennis Ritchie, 1941 - 2011

I have no intention of turning this blog into an obituaries column, and no desire to see "celebrity deaths come in threes" spill over into the tech world, but having noted the passing of Steve Jobs I feel obliged to note the passing of Dennis Ritchie as well.

You may or may not have heard of him before.  It took the major news outlets a while to pick up the story, and even then it wasn't front-page news.  For hours the main public source was his colleague Rob Pike's Google+ page.  That's not too surprising.  Being CEO of a major corporation and being an eminent computer scientist are two completely different gigs.  Nonetheless, Ritchie had as profound an effect on the Web As We Know It as anyone else, even though his groundbreaking work predates the web by a good measure.

It's fair to say that the web as we know it would not exist if not for Unix.  The first web server ran on NeXTSTEP, which traces its roots to Unix [and, in fact, NeXT was run by the late Steve Jobs -- tech is a small world at times -- D.H. Nov 2018].  A huge number of present-day web servers, large and small, run on Linux/GNU, which, even though the Linux kernel was developed from scratch and GNU stands for "GNU's Not Unix", provides an environment that's firmly in the Unix lineage.  The HTTP protocol the web runs on has its roots in the older internet protocols and belongs to a school of development in which Unix played a major role.

Ritchie was one of the original developers of Unix.

The Unix operating system, the Linux kernel, many of the GNU tools and countless other useful things (and at least one lame hack) are written in the C language, which is also one of the bases for C++, C#, Objective-C and Java, among others.  All in all, C and its descendants account for a large chunk of the software that makes the web run.  For years before the ANSI C standard, the de facto standard for the language was a book universally called "K&R" after its authors, Brian Kernighan and Dennis Ritchie.  That flavor of the language is still called "K&R C".

Ritchie continued to do significant work throughout his life and won various high honors, including the Association for Computing Machinery's top honor, the Turing Award, and the US National Medal of Technology.  He was head of the Lucent Technologies System Software Research Department when he retired in 2007.  He may not have been a cultural icon, but in the world of software geekery he cast a long shadow.


Thursday, October 6, 2011

So ... what version are we on?

Trying to do a bit of tidying up, I tagged a previously-untagged recent post "Web 2.0".  I did this because the post was a followup to an older post that was specifically about Web 2.0, but it felt funny.  Web 2.0 is starting to sound like "Information Superhighway" and "Cyberspace".  A quick check of the Google search timeline for the term suggests that usage peaked around 2007 and has been declining steadily since.  Always on the cutting edge, Field Notes uses the tag most heavily in 2008.

Google's timeline isn't foolproof.  Anything before the late 90s is probably an article that mentioned the date (and Web 2.0) with no stronger indication of when the page is from.  On the other hand, the more recent portion is probably more representative, since there's more metadata around these days.  Also, the numbers are larger, which is often good for washing out errors.

But anyway, are we still in Web 2.0?  Are we up to 3.0?  Does it really matter (spoiler: probably not)?

I've argued before that while Web 1.0 was a game-changing event, Web 2.0 is more a collection of incremental improvements.  Enough incremental improvements can produce significant changes as well, but not in such a way that you can draw a clear bright line between "then" and "now".  The Linux kernel famously spent about 15 years on version 2.x, only just recently moving up to 3.0, and Linus says very clearly that 3.0 is essentially just another release with a shiny new number.  From a technical standpoint I'd say we've been on Web 2.x for a while and will continue to be for a while, unless we decide to start calling it 3.x instead.

Because, of course, "Web 2.0" is not a technical term.  Never mind who uses it to what ends in what context.  The ".0" gives the game away to begin with.  A real version 2.0, if it ever exists, is very soon supplanted by 2.0.1, or 2.1, or 2.0b or whatever as the inevitable patches get pushed out, which is why I was careful to say "2.x" above.  "2.0" as popularly used doesn't designate a particular version.  It's supposed to indicate a dramatic change from crufty old 1.0 (or 1.x if you prefer).  In the real world of incremental changes, that trope will only get you so far.

Hmm ... in real life versioning usually goes more like
  • 0.1, 0.2 ... 0.13 ... 0.42 ... 0.613 as we sneak in "just one more" minor tweak before officially turning the thing loose
  • 1.0 First official release.  Everyone collapses in a heap.  The bug reports start coming in
  • 1.1 Yeah, that oughta fix it.
  • 1.1.1, 1.1.2 ... 1.1.73 ... the third number emphasizing these are just "small patches" to our mostly-perfect product -- bug fixes, cosmetic changes, behind-the-scenes total rewrites, major new features important customers were demanding, that sort of thing.
  • 2.0 OK, now we've got some snazzy new stuff.  Anything coming up for a while is just going to be a "minor update".  Everyone collapses in a heap.  Bug reports keep coming in.
  • 2.0.1, 2.0.2 ... yeah, we've seen this movie before
  • 5.0, because our latest version is so much better than anything you've ever seen, including our own previous versions (Actually, version 3.x ended in tears, 5.x is largely a rewrite by a different team and no one knows what happened to 4.x  -- maybe that's why one of the co-founders was sleeping under his desk and living on pizza for a couple of months?).
  • 5.0.1, 5.0.2 ... you know the drill
  • Artichoke.  Yep.  Artichoke.  Version numbers are so two-thousand-and-late.  We're going with vegetables now.  Already having long meetings on whether it's Brussels Sprout or Broccoli next.
  • Artichoke 1.1, Artichoke 1.2 ...

Wednesday, October 5, 2011

Steve Jobs, 1955-2011

Well, we all knew it was coming, but you could still feel the earth shift.  None of us in the tech business has remained untouched by Jobs' work, and by extension, Jobs himself.  There was never, nor will there ever be, anyone quite like him.


Crowdsourcing the sky

Astronomy has been likened to watching a baseball game through a soda straw.  For example, the Hubble Deep Field, assembled from 342 images taken over the course of ten days, covers about a 24-millionth of the sky, or about the size of a tennis ball seen a hundred yards away.  It's quite possible to survey large portions of the sky, but there are trade-offs involved since you can only collect so much light so fast.  To cover a large area and still pick up faint objects, you need some combination of a big telescope and a lot of time.  The bigger the telescope (technically, there's more to it than sheer size) the faster you can cover a given area down to a given magnitude (how astronomers measure faintness).

The Large Synoptic Survey Telescope (LSST) is designed to cover the entire sky visible from its location every three days, using a 3.2 gigapixel camera and three very large mirrors.  In doing this, it will produce stupefying amounts of data -- somewhere around 100 petabytes, or 100,000 terabytes, over the course of its survey.  So imagine 100,000 one-terabyte disk drives, or over 2 million dual-layer Blu-ray discs.  Mind, the thing hasn't been built yet, but two of its three mirrors have been cast, which is a reasonable indication people are serious.  Even if it's never finished, there are other sky surveys in progress, for example the Palomar Transient Factory.

Got a snazzy 100 gigabit ethernet connection?  Great!  You can transfer the whole dataset in a season -- start at the spring equinox and you'll be done by the summer solstice.  The rest of us would have to wait a little longer.  My not-particularly-impressive "broadband" connection gets more like 10 megabits, order-of-magnitude, so that'd be more like 2500 years, assuming I don't upgrade in the meantime and leaving aside the small question of where I'd put it all.

Nonetheless, the LSST's mammoth dataset is well within reach of crowdsourcing, even as we know it today:
  • Galaxy Zoo claims that 250,000 people have participated in the project.  Many of them are deadbeats like me who haven't logged in for ages, but suppose there are even 10,000 active participants.
  • The LSST is intended to produce its data over ten years, for an average of around 2-3Gbps.  Still fairly mind-bending -- about a thousand channels worth of HD video, but ...
  • Divide that by our hypothetical 10,000 crowdsourcers and you get 200-300Kbps, not too much at all these days.  Each crowdsourcer could download a 3GB chunk of data in under an hour in the middle of the night or spread it out through the day without noticeably hurting performance.
  • Assuming you kept all the data, at 2-3 GB a day you'd fill a new terabyte disk roughly once a year, so storage isn't prohibitive either.
  • The hard part is probably uploading a steady stream of 2-3Gbps (BitTorrent wouldn't help here, since each recipient gets a unique chunk of data).  As far as I can tell the bandwidth is there, but at that volume I'm guessing the cost would be significant.
  • In reality, there would probably be various reasons not to ship out all the raw data in real time, but instead send a selection or a condensed version.
Bottom line, it's at least technically possible with today's technology, to say nothing of that available when the LSST actually goes online, to distribute all the raw data to a waiting crowd of amateur astronomers.

Wikipedia references a 2007 press release saying Google has signed up to help.  As usual I don't know anything beyond that, but it does seem like a googley thing to do.