Friday, April 6, 2012

Old albums

A few weeks ago the Encyclopædia Britannica finally threw in the towel and, after over 200 years, stopped publishing the lovely multi-volume sets that have graced bookshelves the world over.  Naturally, there has been a run on the last edition (2010).  When I heard the story on the radio today there were supposed to have been no more than 800 copies left.  I'd be surprised if there were any left by now.

Clearly the whole point of this run is to get at the physical volumes.  The contents can be had digitally for much less.  The doorstop edition is valuable for the same reason any artifact is valuable apart from any utility it may have: rarity and emotional significance.

Imagine you are rummaging through an attic trying to decide what to keep and what to throw.  You run across a stash of vinyl LPs of popular hits from the 70s.  Odds are most if not all of the songs can be had digitally with better sound, but that's not the point.  As with the 2010 Britannica it's the physical artifact that matters.  Do you like that vintage artwork on the jacket with the circular imprint of the record worn into it?  Do you enjoy the tactile experience of dropping the needle on the platter, the crackle and pop of surface noise, the ritual of cleaning any wayward lint from the grooves?

Then you run across an album of photos, page after plain page of pictures tucked into little white corner-pockets, colors desaturated, edges curling.  Tucked into an envelope with them are the negatives.  Scan them and you probably have images of reasonable quality that you can attach to an email, share on your favorite social site and archive durably.  The physical artifact is less important here.  It's the actual images that matter, images you can't get anywhere else.  With the bits, you could create another album as good as the original one.

That's the common question that determines what's really of interest: what can't you get anywhere else?  It's not a matter of songs versus pictures or LPs versus photos.  If the vinyl in the Greatest Hits album is warped and cracked and the album art is nothing special, you may as well just buy the tunes online.  If the photo album is something your great-aunt put together, with cutouts and notes and decorations, you probably want the physical album as much as the images.

If the content is important, then you'll want to get it into the cloud, or at least into bits on some local disk.  If the artifact is important, then the web will play less of a role.

What's twice as big as the internet?

(Yikes ... I went 0 for March!)

I've mentioned before that telescopes can generate a lot of data.  IBM seems inclined to drive the point home by collaborating with ASTRON (the Netherlands Institute for Radio Astronomy) to put together "exascale" computing horsepower behind the world's largest radio telescope.

The telescope is actually (or rather, will be) an array of millions of antennas spread out over a square kilometer, hence the name SKA, for Square Kilometer Array.  This array is expected to produce on the order of an exabyte of data per day.  This is an absolutely ridiculous amount of data by today's standards.  Think one million terabyte disk drives, or twenty million feature films' worth of Blu-ray, or ... according to IBM, twice the daily volume currently carried on the internet.
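
For anyone who wants to check the comparisons, here is a rough back-of-the-envelope sketch.  It assumes decimal units (an exabyte as 10^18 bytes) and one film per 50 GB dual-layer Blu-ray disc; both are my assumptions, not figures from the press release.

```python
# Back-of-the-envelope check of the exabyte comparisons.
# Assumptions (not from the press release): decimal units, and one
# feature film filling a 50 GB dual-layer Blu-ray disc.
EXABYTE = 10**18            # bytes
TERABYTE = 10**12           # bytes
BLU_RAY_DISC = 50 * 10**9   # bytes

terabyte_drives = EXABYTE // TERABYTE
blu_ray_films = EXABYTE // BLU_RAY_DISC

print(f"{terabyte_drives:,} one-terabyte drives per day")
print(f"{blu_ray_films:,} Blu-ray films per day")
```

Under those assumptions the million-drive and twenty-million-film figures both come out as stated.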

I'm a little skeptical as to exactly how one measures that, but hey, you've got to trust a press release, right?

So where do you put an exabyte a day worth of data?  Well, you don't.  You're certainly not going to upload it to the web.  Particle physicists are faced with the same problem of having to figure out what portion of a huge data set to keep for later analysis, and a large part of running an experiment is setting up the "trigger" criteria by which the software collecting the data will decide what to keep and what to throw.  IBM and ASTRON's system will be dealing with the same problem, but on an even larger scale.
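
To make the "trigger" idea a bit more concrete, here is a toy sketch.  Everything in it is hypothetical: the names trigger and collect, the random numbers standing in for readings and the 0.999 threshold are mine, and real experiments use far more elaborate criteria.  The point is only that the decision to keep or throw is made as the data streams past, so that most of it is never stored at all.

```python
import random

def trigger(sample, threshold=0.999):
    """Hypothetical trigger: keep a reading only if it meets the threshold."""
    return sample >= threshold

def collect(stream, keep=trigger):
    """Read every sample once, storing only those the trigger accepts."""
    seen = 0
    kept = []
    for sample in stream:
        seen += 1
        if keep(sample):
            kept.append(sample)
    return seen, kept

# Made-up data: uniform random numbers standing in for detector readings.
rng = random.Random(42)
stream = (rng.random() for _ in range(100_000))

seen, kept = collect(stream)
print(f"saw {seen} samples, kept {len(kept)}")
```

Because the stream is a generator, the rejected samples are gone for good once they have been looked at, which is roughly the situation the experimenters are in.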

Or I suppose you could sign up two million people and somehow stream an equal share of the data to each at Blu-ray resolution all day every day, but somehow I doubt that kind of crowdsourcing will help much.
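
A quick sanity check on that last figure, again assuming a decimal exabyte (10^18 bytes) split evenly among the two million viewers:

```python
# Per-viewer share of one exabyte per day, split two million ways.
# Assumption: decimal units (1 exabyte = 10**18 bytes).
EXABYTE = 10**18                 # bytes
VIEWERS = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_viewer_per_day = EXABYTE // VIEWERS
megabits_per_second = bytes_per_viewer_per_day * 8 / SECONDS_PER_DAY / 10**6

print(f"{bytes_per_viewer_per_day / 10**9:.0f} GB per viewer per day")
print(f"{megabits_per_second:.1f} Mbit/s sustained")
```

That works out to about 500 GB per person per day, or a bit over 46 megabits per second around the clock, which is in the neighborhood of what a Blu-ray disc delivers at its peak.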