Monday, January 6, 2025

The future still isn't what it used to be: Vannevar Bush

(According to Blogger, this is the 700th post on this blog, which seems like a completely arbitrary milestone to note, but I noticed it nonetheless, so now you get to. You're welcome.)

Vannevar Bush casts something of a long shadow. He held several high-level technology-related posts in the FDR and Truman administrations, had a long and distinguished academic career at MIT and elsewhere, and won several prestigious awards, including the National Medal of Science. His students included Claude Shannon, whose work in information theory is still directly relevant, and Frederick Terman, who was influential in the development of what we now call Silicon Valley (I used to work fairly near Terman Drive in Palo Alto).

Bush is also often credited with anticipating the World-Wide Web in his Atlantic Monthly article As We May Think. Since I've been comparing early visions of the Web with what actually happened, I thought I'd take a look. I've linked to the ACM version rather than the Atlantic's version, which may or may not even be online, since the ACM version highlights the relevant passages. Though there's a Wikipedia page on the piece, I've deliberately skipped it in favor of Bush's original text (with the ACM's highlights).

Two things jump out immediately, neither directly relevant to the web:

  • The language is relentlessly gendered. Men do science. Girls [sic] sit in front of keyboards typing in data for men of science to use in their work. A mathematician is a particular kind of man, technology has improved man's life, and so forth. Yes, this is 1945, and we expect a certain amount of this, but from what I can tell Bush's style stands out even for the time. I mention this mainly as a heads-up for anyone who wants to go back and read the original piece -- which I do nonetheless recommend.
  • There is an awful lot of technical detail about technologies that would be obsolete within a couple of decades, and in several cases nearly fossilized by the dawn of the Internet in the 1970s. Bush speculates in detail about microphotography, facsimile machines, punch cards, analog computers, vacuum tubes, photocells and on and on for pages. Yes, all of these still existed in the 1970s (I spent many an hour browsing old newspapers and magazines on microfilm as a kid), but digital technology would make most if not all of them irrelevant before much longer. As far as predicting the technology underpinning the web, Bush's record is nearly perfect: If he speculated about it, it almost certainly isn't relevant to today's web.

Two thoughts on this. First, it's almost impossible to speculate about the future without mentioning at least something that will be hopelessly out of date by the time that future arrives. At any given time, all we have are the tools and mental models of that time. I don't fault Bush for thinking about the future in terms of photographic storage, and I don't think this takes anything away from his thoughts on the "Memex", which is what people are referring to when they talk about Bush anticipating the web.

I just wish he hadn't done nearly so much of it. Alan Turing's Computing Machinery and Intelligence spends two sentences on the idea of using a teleprinter so that it's not obvious whether there's a human or machine on the other end of the conversation, and one of those sentences just says that this is only one possible approach. That seems about right for that paper. In Bush's case, I could see a few paragraphs about how to store large amounts of information (for those days, at least) on film or magnetic media, and so forth. The article would have been much shorter, but no less interesting.

Second, it's worth noting how many things were possible with mid-1900s technology. You could convert, both ways, between sound, image and video (in the sense of moving images) on the one hand and electrical signals on the other. You could store electrical signals magnetically. You could communicate them over a distance. You could store digital information in a variety of forms, including the famous punched cards, but also magnetically.

There were ways to produce synthesized speech and read printed text. Selecting machines could do boolean queries on data (Bush gives the example of "all employees who live in Trenton and know Spanish"). Telephone switching networks could connect any of millions of phones to any other in about the time it took to dial (and less time than it sometimes takes my phone to set up a call using my WiFi). Logic gates existed. For that matter, the first general-purpose digital computer, the ENIAC, existed in 1945 and Bush would certainly have known about its development.
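For a sense of how little is involved in the Trenton-and-Spanish selection mentioned above, here's a minimal Python sketch. The record fields and the sample employees are made up for illustration; a 1940s selecting machine would have done the equivalent by sensing holes in punched cards rather than comparing strings.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    city: str
    languages: set = field(default_factory=set)


def select(records, city, language):
    """Return the records satisfying both conditions (a boolean AND)."""
    return [r for r in records if r.city == city and language in r.languages]


# Hypothetical sample data, not from Bush's article.
EMPLOYEES = [
    Employee("Alice", "Trenton", {"English", "Spanish"}),
    Employee("Bob", "Trenton", {"English"}),
    Employee("Carla", "Newark", {"Spanish"}),
]

matches = select(EMPLOYEES, "Trenton", "Spanish")
print([e.name for e in matches])
```

The point is that the query only works on fields someone has already coded onto the record, which is the limitation the postscript comes back to.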

In other words, even in 1945, Bush isn't drawing on a blank canvas. He's trying to pull existing pieces of technology together in a new way in order to deal with what was, even at the time, an overwhelming surplus of information. The gist of the argument is "If we make these existing technologies smaller, faster and cheaper, and put them together in this particular way, we can make it easier to deal with all this information."


The particular problem Bush is really interested in isn't so much storing information as retrieving it ("selecting" as Bush says). This is totally understandable for a national science adviser who had until recently been working on one of the largest technological efforts to date (the Manhattan Project). Bush cites Gregor Mendel's work having been essentially unknown until decades after the fact as just one example of a significant advance nearly being lost because no one knew about it, even though it was there to be found. Bush's desire to prevent this sort of thing in the future is palpable.

Bush mentions traditional indexing systems that can find items by successively narrowing down the search space (everything starting with 'F', everything within that with second letter 'i' ... ah, here it is, Field Notes on the Web), but he's much more interested in following a trail of connections from one document to another. That is, he's envisioning a vast collection of documents traversable by following links between them. That's the world-wide web. Ok, we're done.


Except ...

Bush sees the Memex as literally a piece of furniture, looking pretty much like a desk but with a keyboard attached along with various projection screens and a few other attachments. Inside it is a store of microfilmed documents together with some writable film, which takes up a small portion of the space under the desk, and a whole bunch of machinery to be named later, taking up most of the space.

Associated with each document is a writable area containing some number of code spaces, each of which can hold the index code of a document. There's also a top-level code book to get you started, and when you add a new document, you add it to the code book. To be honest, this seems a bit tedious.

To link two documents together, you pull them both up, one on one projection screen and the other on the other, and press a button. This writes the index code for each document in the other's next open code space. The next time you pull up either of the documents, you can select a code space and pull up the document with that code.

Codes are meant to have two parts: a human-readable text code and a "positional" numeric code (probably binary or maybe decimal). Linking this post to Bush's article might add "Bush-as-we-may-think" to a code space for this post, along with (somewhere offscreen) the numeric index for Bush's article, and "Field-notes-future-ramblings-Bush" to a code space on Bush's article (along with the numeric code for this post). At that point you've got one link in a presumably much larger web. Actually, you have two links, or one bidirectional link if you prefer. Not quite Xanadu's transclusion, but arguably closer than what we actually have.
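The linking mechanism described above can be sketched as a small data structure. This is my reading of Bush's description, not anything he specifies: the class names, the limit of eight code spaces per document and the sequential numbering of index codes are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field

CODE_SPACES_PER_DOCUMENT = 8  # assumption: Bush gives no specific number


@dataclass
class Document:
    index: int   # the "positional" numeric code
    label: str   # the human-readable text code
    code_spaces: list = field(default_factory=list)  # (label, index) pairs


class Memex:
    def __init__(self):
        self.code_book = {}  # numeric index -> Document

    def add(self, label):
        """Add a new document and enter it in the code book."""
        doc = Document(index=len(self.code_book) + 1, label=label)
        self.code_book[doc.index] = doc
        return doc

    def link(self, a, b):
        """Write each document's code into the other's next open code space."""
        for doc in (a, b):
            if len(doc.code_spaces) >= CODE_SPACES_PER_DOCUMENT:
                raise ValueError(f"no open code space on {doc.label!r}")
        a.code_spaces.append((b.label, b.index))
        b.code_spaces.append((a.label, a.index))

    def follow(self, doc, slot):
        """Pull up the document whose code sits in the given code space."""
        _, index = doc.code_spaces[slot]
        return self.code_book[index]


memex = Memex()
post = memex.add("Field-notes-future-ramblings-Bush")
article = memex.add("Bush-as-we-may-think")
memex.link(post, article)
```

Note that `link` has to modify both documents. That detail is what separates this scheme from a web link, which only touches the document doing the linking.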

Pretty webby, except ... coupla things ...

For one thing, this is all happening on my Memex. My copy of this post is linked with my copy of Bush's article. Yours remains untouched. If there's a way of copying either content or links from one Memex to another, I didn't catch it. Bush's description of how document linking works is hand-wavy enough that it wouldn't be particularly more hand-wavy to talk about a syncing mechanism (and/or an update mechanism), but I doubt Bush was thinking in that direction.

Bush seems to be thinking more about a memory aid for an individual person (or possibly a household or small office/laboratory). Functionally, it's a personal library with much larger capacity and the ability to leave trails among documents. It's certainly an interesting idea, but it misses the "world-wide" part. When I link to the ACM's version of Bush's paper, the link is from my blog to the ACM's site. If you write something and link it to Bush's paper, we're pointing at the same thing, not separate copies of it, and we're pointing to a thing that might be stored anywhere in the world (and someplace else next time we access it).

In the same post I mentioned above, I talk about a couple of features that make the web the web, particularly that a link can be dangling -- pointing to nothing -- and it can become broken -- you pointed at a page, but that page is no longer there (early posts on this blog are full of these, though at the time it wasn't clear whether rotting links would be an issue as storage got cheaper; it is). There's also some ambiguity as to what exactly a link is pointing to. If I point to the front page of a news site, for example, the contents on the other end of that link will probably be different tomorrow. In other cases, it's worth going to some effort to ensure the contents don't change significantly.

These may seem like bugs at first glance, but for the most part, they're features, because the flexibility they provide allows the web to be decoupled. I can do what I like with my site without caring or even knowing what links to it. Since a Memex is a closed system, none of this really applies. On the one hand, it's not a problem, but on the other hand, it's not a problem because a Memex is not a distributed system, which the web as we know it very much is.
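To make the decoupling concrete, here's a toy model of one-way links in Python. It's a sketch under obvious simplifying assumptions: the "web" is just a dictionary, the URLs are invented, and nothing here resembles how real servers or crawlers work.

```python
# Toy "web": URL -> list of outgoing link URLs. All URLs are made up.
web = {
    "https://blog.example/post": [
        "https://archive.example/bush",
        "https://news.example/",  # never existed in this toy web: a dangling link
    ],
    "https://archive.example/bush": [],
}


def broken_links(web, url):
    """Return the outgoing links of `url` whose targets do not exist."""
    return [target for target in web[url] if target not in web]


def backlinks(web, url):
    """Find pages linking to `url`. The target holds no record of these,
    so the only option is to scan every page we know about."""
    return [source for source, targets in web.items() if url in targets]


POST = "https://blog.example/post"
BUSH = "https://archive.example/bush"

before = broken_links(web, POST)
linkers = backlinks(web, BUSH)

# The archive takes its page down without knowing or caring who links to it.
del web[BUSH]
after = broken_links(web, POST)
```

Deleting the target leaves the linking page untouched, so the link silently rots. In a Memex-style scheme, where both ends record the link, that can't happen, but only because both ends live in the same closed system.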

Finally, the mechanism of linking is noticeably different from what HTML does. You have a pair of links between documents (or maybe pages of documents?). An HTML link goes from a particular piece of the source document to, in general, a particular anchor in the destination document. To be fair, this doesn't seem like an essential difference. You could imagine a Memex with a linking mechanism that goes from a piece of one document to a piece of another, which would be much more like an HTML link (and, arguably, more like a Xanadu transclusion).


So did Vannevar Bush anticipate the web by nearly half a century?

I think the fair answer is "not really", because the distributed, dynamic nature of the web is critical.

Did he anticipate the idea of an interconnected web of documents? I think the fair answer is "sorta". Again, actual web links are one-directional and non-intrusive. You can link from document A to document B without doing anything at all to document B or its associated metadata. You don't need a backlink and you generally won't have one.

This one-way form of link was not a new idea. Documents have been referencing each other forever. Bush's notion of linking is different from an HTML link, and since an HTML link is structurally the same as a reference in a footnote in a book, it's different from that as well.

In other words, the original idea in Bush's work is more an evolutionary dead end than an innovation. A pretty interesting dead end, but a dead end just the same.


Postscript:

There's one more thing that I'd been meaning to mention but, embarrassingly enough, forgot to: search. Bush is quite right in saying that people access information by content, but in the Memex world everything eventually boils down to an index number. You access document 12345, not "any documents mentioning Memex" or whatever.

Search is probably the aspect of the web with the least precedent in mid-1900s technology. There were ways to attach index numbers to things, or even content tags, and retrieve them, with a minimum of human intervention. Bush goes into those at length. But if you wanted to get to something by what was in it, you needed a person for that, if only to add indexing information. Indeed, Memex is aimed directly at making it easier for a human to do that task, by making it easy to leave a trail of breadcrumbs a human could easily follow.
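The difference between access by index number and access by content is easy to show in a few lines of Python. This is a bare-bones inverted index, offered as an illustration of the general idea rather than of how any real search engine works; the document numbers and texts are invented.

```python
from collections import defaultdict

# Hypothetical documents keyed by Memex-style index numbers.
documents = {
    12345: "Bush describes the Memex as a desk with projection screens",
    12346: "Turing proposes the imitation game using a teleprinter",
    12347: "The Memex stores documents on microfilm",
}


def build_inverted_index(docs):
    """Map each word to the set of document numbers containing it."""
    index = defaultdict(set)
    for number, text in docs.items():
        for word in text.lower().split():
            index[word].add(number)
    return index


def search(index, *words):
    """Return the numbers of documents containing every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


index = build_inverted_index(documents)

by_number = documents[12345]                       # Memex-style: you must know the code
by_content = search(index, "Memex")                # web-style: ask for what's in it
narrower = search(index, "Memex", "microfilm")
```

Building the index takes no human judgment at all, just a machine able to read every word of every document, which is exactly the piece that was missing in 1945.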

It would be almost a half-century before documents could be easily accessed by way of what was in them.


Oh, and also ... in Bush's vision, linking documents together would be a frequent activity for anyone using a Memex. In today's web, not so much, except, I think, in the particular case of re-whatevering a piece of social media content. I think the reason for that is also search (see this early post for a take on that).
