Thursday, July 10, 2008

Again, just what is this "web" thing?

This is post number 200, so the Eubie Blake comment goes double. Before returning to my usual random potshots, I wanted to step back and take another run at the Central Question: What is the web?

In an early post, I stumbled on a working definition I still like: The web is all the resources accessible on the net, whatever resources are and whatever the net is. That's fine as a technical definition, but it needs sauce.

Here are two ways to look at the web: the human point of view and the computer point of view.

The human view has a human shape. I'm blogging on blogger.com. I can check my local weather on weather.com, or at one of my local TV stations, generally using their familiar call letters. Companies have their own chunks of the web, as do governments of all sizes, schools and so forth. It's not hard for an individual to have a web presence and many of us do.

In fact, let's expand that a bit. I was originally equating "chunk of the web" with "domain name", and to some extent that's true for organizations. But for people, it's not. I have a blog here, but I also have accounts all over the place, some off on their own and some connected to other people's accounts. None of this requires me to have a personal domain name. Instead, I get small pieces of other domains.

This is not news, of course. Social networking is all about reflecting human relationships on the web, and the notion of personal datastores is all about letting people manage how their presence diffuses into the web at large. The larger point is that the web, having grown organically through the contributions of millions of people, is structured according to the whims of, and on a good day for the convenience of, people.

From a computer's point of view, the web is a fairly strange place, compared to, say, a relational database. There is no single format for a web page, beyond broad statements like "It's often XHTML." Gleaning any more meaningful structure is a hit-or-miss affair. There are various efforts, like microformats, to make web pages more easily digestible, for example by providing ways of saying "this is a date" or "this is a physical location," but there is no requirement for anyone to use them.
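To make the "this is a date" idea concrete, here is a minimal sketch of what a program can do when a page happens to carry that kind of markup. It assumes the hCalendar-style convention of a `dtstart` class with the machine-readable date in a `title` attribute; the sample pages and the `find_dates` helper are made up for illustration.

```python
from html.parser import HTMLParser


class DateFinder(HTMLParser):
    """Collects machine-readable dates from elements marked class="dtstart"."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Assumed convention: the date itself lives in the title attribute.
        if "dtstart" in classes and attr.get("title"):
            self.dates.append(attr["title"])


def find_dates(html):
    """Hypothetical helper: return every marked-up date found in the page."""
    finder = DateFinder()
    finder.feed(html)
    return finder.dates


# Two made-up pages that look the same to a human reader.
MARKED_UP = '<p>Posted <abbr class="dtstart" title="2008-07-10">Thursday, July 10</abbr></p>'
PLAIN = "<p>Posted Thursday, July 10</p>"

print(find_dates(MARKED_UP))  # the date is recoverable
print(find_dates(PLAIN))      # nothing for the program to find
```

Both pages say the same thing to a person. Only the first says anything to the program, and only because its author chose to add the markup.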

There are links between resources, but there may or may not be a clear way to figure out what those links mean (is this a link to another post, or to the author's profile, or to something else entirely?). In many cases the cues are in the text on the page, or in the visual structure, both of which the wise application will generally not even try to understand.

If a page offers more than that, it's because the author of the page explicitly put it there in computer-digestible form (generally XML), or used tools that did, and because the application trying to make sense of the page has some knowledge of what the author or tool did. Either that, or someone painstakingly figured out what tags such and such a page happens to use and told an application how to "scrape" it -- until the webmaster at the other end decides to tweak the format in an unexpected way.
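The fragility of scraping is easy to show in miniature. The sketch below is hard-wired to one made-up page layout (a `span` with class `temp`); the tag names, the sample pages, and the `scrape_temperature` helper are all assumptions for illustration, not any real site's format.

```python
from html.parser import HTMLParser


class TempScraper(HTMLParser):
    """Grabs the text inside <span class="temp">, and nothing else."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.temperature = None

    def handle_starttag(self, tag, attrs):
        # The scraper only knows the exact tag and class someone told it about.
        if tag == "span" and dict(attrs).get("class") == "temp":
            self._inside = True

    def handle_data(self, data):
        if self._inside and self.temperature is None:
            self.temperature = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False


def scrape_temperature(html):
    """Hypothetical helper: return the temperature text, or None if not found."""
    scraper = TempScraper()
    scraper.feed(html)
    return scraper.temperature


# The same made-up weather page before and after a redesign.
OLD_PAGE = '<div id="now"><span class="temp">72F</span></div>'
NEW_PAGE = '<div id="now"><div class="temperature">72F</div></div>'

print(scrape_temperature(OLD_PAGE))  # finds the reading
print(scrape_temperature(NEW_PAGE))  # finds nothing after the tweak
```

A person glancing at either version of the page sees the same temperature. The scraper sees it only as long as the markup stays exactly as it was when someone worked out the tags.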

That's not to say that the web is completely opaque from the computer view. There has been a lot of work in this direction, under such headings as "Semantic Web" and "Web Services". As I understand it and in very broad terms, the Semantic Web is about making the web in general more accessible to computers, including (but not limited to) making human-visible structure more computer-visible. Web Services are more about creating a parallel universe of resources aimed specifically at computers, using the same formats and protocols as the human-visible web, but structuring things much more precisely so that an application accessing a resource knows exactly what to look for where.

Even in the most automated case, say when you want to use some sort of tool to book a flight, and that tool communicates directly with the various airlines and travel sites, speaking protocols that only computers were meant to understand, the human structure still wins. The connection between your tool and the travel sites, and the protocols they use to talk to each other, all reflect the fact that people want to fly and airlines want to sell them tickets.

Which brings me back to the question in the title. Another of the many possible answers to "What is the web?" is "a reflection of human society and its interconnections in electronic form."

In keeping with the "field notes" theme, here's a possible analog from biology. The phylum Nematoda is one of the most successful on earth. If you could remove all matter on earth except for the nematodes, you could still make out most of what went on at the surface -- the topography, the shapes of buildings and roads, the shapes of larger life forms like trees and people.

Just so, if you could somehow remove all information on earth except for the web, you could still make out much of what goes on in human life. Maybe not as great a proportion as with the nematode example, but quite a bit nonetheless.
