Wednesday, October 5, 2011

Crowdsourcing the sky

Astronomy has been likened to watching a baseball game through a soda straw.  For example, the Hubble Deep Field, assembled from 342 images taken over the course of ten days, covers about 1/500,000th of the sky, or about the size of a tennis ball seen a hundred yards away.  It's quite possible to survey large portions of the sky, but there are trade-offs involved since you can only collect so much light so fast.  To cover a large area and still pick up faint objects, you need some combination of a big telescope and a lot of time.  The bigger the telescope (technically, there's more to it than sheer size) the faster you can cover a given area down to a given magnitude (how astronomers measure faintness).

The Large Synoptic Survey Telescope (LSST) is designed to cover the entire sky visible from its location every three days, using a 3.2 gigapixel camera and three very large mirrors.  In doing this, it will produce stupefying amounts of data -- somewhere around 100 petabytes, or 100,000 terabytes, over the course of its survey.  So imagine 100,000 one-terabyte disk drives, or over 2 million two-sided Blu-ray disks.  Mind, the thing hasn't been built yet, but two of its three mirrors have been cast, which is a reasonable indication people are serious.  Even if it's never finished, there are other sky surveys in progress, for example the Palomar Transient Factory.

Got a snazzy 100 gigabit ethernet connection?  Great!  You can transfer the whole dataset in a season -- start at the spring equinox and you'll be done by the summer solstice.  The rest of us would have to wait a little longer.  My not-particularly-impressive "broadband" connection gets more like 10 megabits, order-of-magnitude, so that'd be more like 2500 years, assuming I don't upgrade in the meantime and leaving aside the small question of where I'd put it all.
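If you want to check my arithmetic, here's a quick sketch.  The dataset size and link speeds are the rough figures above, nothing more precise than that:

```python
# Back-of-envelope transfer times for the full ~100 PB dataset.
# All figures are the rough assumptions from the text above.
DATASET_BITS = 100e15 * 8  # 100 petabytes, in bits

def transfer_days(link_bps):
    """Days to move the whole dataset over a link of the given speed."""
    return DATASET_BITS / link_bps / 86_400  # 86,400 seconds per day

print(transfer_days(100e9))  # 100 Gbps: ~93 days, one season
print(transfer_days(10e6))   # 10 Mbps: ~926,000 days, ~2,500 years
```

Spring equinox to summer solstice is about 93 days, so the "one season" figure checks out almost exactly.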

Nonetheless, the LSST's mammoth dataset is well within reach of crowdsourcing, even as we know it today:
  • Galaxy Zoo claims that 250,000 people have participated in the project.  Many of them are deadbeats like me who haven't logged in for ages, but suppose there are even 10,000 active participants.
  • The LSST is intended to produce its data over ten years, for an average of around 2-3Gbps.  Still fairly mind-bending -- about a thousand channels worth of HD video, but ...
  • Divide that by our hypothetical 10,000 crowdsourcers and you get 200-300Kbps, not too much at all these days.  Each crowdsourcer could download a 3GB chunk of data in under an hour in the middle of the night or spread it out through the day without noticeably hurting performance.
  • Assuming you kept all the data, you'd need a new terabyte disk about once a year, so that's not prohibitive either.
  • The hard part is probably uploading a steady stream of 2-3Gbps (bittorrent wouldn't help here, since each recipient gets a unique chunk of data).  As far as I can tell the bandwidth is there, but at that volume I'm guessing the cost would be significant.
  • In reality, there would probably be various reasons not to ship out all the raw data in real time, but instead send a selection or a condensed version.
Bottom line, it's at least technically possible with today's technology, to say nothing of that available when the LSST actually goes online, to distribute all the raw data to a waiting crowd of amateur astronomers.
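The per-person numbers above work out like this.  The 10,000 volunteers and ten-year survey are the figures from the list; the 10 Mbps home connection is my own assumption from earlier:

```python
# Rough per-participant arithmetic for crowdsourcing ~100 PB.
# Assumptions: 10,000 active volunteers, a ten-year survey, and a
# 10 Mbps home broadband connection for the download-time estimate.
TOTAL_BYTES = 100e15
VOLUNTEERS = 10_000
SURVEY_DAYS = 10 * 365

daily_bytes = TOTAL_BYTES / SURVEY_DAYS / VOLUNTEERS  # ~2.7 GB/day each
stream_bps = daily_bytes * 8 / 86_400                 # ~254 Kbps sustained
download_minutes = daily_bytes * 8 / 10e6 / 60        # ~37 min at 10 Mbps
disks_per_year = daily_bytes * 365 / 1e12             # ~1 terabyte disk/year

print(daily_bytes / 1e9, stream_bps / 1e3, download_minutes, disks_per_year)
```

So each volunteer's daily chunk is a modest overnight download, and the sustained rate lands right in the 200-300 Kbps range claimed above.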

Wikipedia references a 2007 press release saying Google has signed up to help.  As usual I don't know anything beyond that, but it does seem like a googley thing to do.
