Tuesday, April 16, 2019

Distributed astronomy

Recently, news sources all over the place have been reporting on the imaging of a black hole, or  more precisely, the immediate vicinity of a black hole.  The black hole itself, more or less by definition, can't be imaged (as far as we know so far).  Confusing things a bit more, any image of a black hole will look like a black disc surrounded by a distorted image of what's actually in the vicinity, but because the black hole distorts space-time due to its gravitational field, not because you're looking at something black.  It's the most natural thing in the world to look at the image and think "Oh, that round black area in the middle is the black hole", but it's not.

Full disclosure: I don't completely understand what's going on here.  Katie Bouman has done a really good lecture on how the images were captured, and Matt Strassler has an also really good, though somewhat long overview of how to interpret all this.  I'm relying heavily on both.

Imaging a black hole in a nearby galaxy has been likened to "spotting a bagel on the moon".  A supermassive black hole at the middle of a galaxy is big, but even a "nearby" galaxy is far, far away.

To do such a thing you don't just need a telescope with a high degree of magnification.  The laws of optics place a limit on how detailed an image you can get from a telescope or similar instrument, regardless of the magnification.  The larger the telescope, the higher the resolution, that is, the sharper the image.  This applies equally well to ordinary optical telescopes, X-ray telescopes, radio telescopes and so forth.  For purposes of astronomy these are all considered "light", since they're all forms of electromagnetic radiation and so all follow the same laws.

Actual telescopes can only be built so big, so in order to get sharper images astronomers use interferometry to combine images from multiple telescopes.  If you have a telescope at the South Pole and one in the Atacama desert in Chile, you can combine their images to get the same resolution you would with a giant telescope that spanned from Atacama to the pole.  The drawback is that since you're only sampling a tiny fraction of the light falling on that area, you have to reconstruct the rest of the image using highly sophisticated image processing techniques.  It helps to have more than two telescopes.  The Event Horizon Telescope project that produced the image used eight, across six sites.

Even putting together images from several telescopes, you don't have enough information to precisely know what the full image really would be and you have to be really careful to make sure that the image you reconstruct shows things that are actually there and not artifacts of the processing itself (again, Bouman's lecture goes into detail).  In this case, four teams worked with the raw data independently for seven weeks, using two fundamentally different techniques, to produce the images that were combined into the image sent to the press.  In preparation for that, the image processing techniques themselves were thoroughly tested for their ability to recover images accurately from test data.  All in all, a whole lot of good, careful work by a large number of people went into that (deliberately) somewhat blurry picture.

All of this requires very precise synchronization among the individual telescopes, because interferometry only works for images taken at the same time, or at least to within very small tolerances.  The limiting factor is the frequency of the light used in the image, which for radio telescopes is on the order of gigahertz. This means that images from the telescopes have to be recorded on the order of a billion times a second.  The total image data ran into the petabytes (quadrillions of bytes), with the eight telescopes producing hundreds of terabytes (trillions of bytes) each.

That's a lot of data, which brings us back to the web (as in "Field notes on the ...").  I haven't dug up the exact numbers, but accounts in the popular press say that the telescopes used to produce the black hole images produced "as much data as the LHC produces in a year", which in approximate terms is a staggering amount of data.  A radio interferometer comprising multiple radio telescopes at distant points on the globe is essentially an extremely data-heavy distributed computing system.

Bear in mind that one of the telescopes in question is at the south pole.  Laying cable there isn't a practical option, nor is setting up and maintaining a set of radio relays.  Even satellite communication is spotty.  According to the Wikipedia article, the total bandwidth available is under 10MB/s (consisting mostly of a 50 megabit/second link), which is nowhere near enough for the telescope images, even if stretched out over days or weeks.  Instead, the data was recorded on physical media and flown back to the site where it was actually processed.

I'd initially thought that this only applied to the south pole station, but in fact all six sites flew their data back rather than try to send it over the internet (just to throw numbers out, receiving a petabyte of data over a 10GB/s link would take about a day).   The south pole data just took longer because they had to wait for the antarctic summer.

Not sure if any carrier pigeons were involved.

No comments: