Wednesday, September 26, 2007

Names and addresses

Consider the humble HTTP URL (there are other kinds, of course, but http:// and https:// have by far the lion's share of the action). Compared to the email address which, after a couple of stages of evolution, became a nice abstract way of denoting a mailbox, the HTTP URL might seem to leave a bit to be desired.

For most purposes, the literal meaning of an HTTP URL is "make a TCP connection to the server in question and send an HTTP GET request to it." It says not only where to look for a resource, but what protocol you have to use to get it. By contrast, a mailto: URL says neither. Find any server you like. It might speak SMTP, but it doesn't have to. The receiver has equal latitude in retrieving the message you send.
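To make that contrast concrete, here is a minimal sketch of what each kind of URL literally tells you. The function name `literal_meaning` and the example URLs are made up for illustration, and the request text is a bare-bones approximation rather than what a real client would send.

```python
from urllib.parse import urlsplit


def literal_meaning(url):
    """Spell out what a URL says, for the two schemes discussed here.

    Hypothetical helper for illustration only; it opens no connections.
    """
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        # The URL names a host and (implicitly or explicitly) a port,
        # and the scheme fixes the protocol to speak once connected.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        request = f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"
        return {"connect_to": (parts.hostname, port), "send": request}
    if parts.scheme == "mailto":
        # Only a mailbox: no server to contact, no protocol to speak.
        return {"mailbox": parts.path}
    raise ValueError(f"scheme not covered in this sketch: {parts.scheme!r}")


web = literal_meaning("http://example.com/orders?id=7")
mail = literal_meaning("mailto:someone@example.com")
```

The HTTP result carries a host, a port and a request to send. The mailto: result carries nothing but the mailbox.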

To be fair, a URL is a Resource Locator, so what else would you expect? Except for a couple of things.

First, people don't just use URLs for locating resources. For example, XML namespaces are generally HTTP URLs. There is no requirement that they point to a schema or even that they point to anything at all (again, to be fair, URLs in general aren't required to point to anything at all, either).

If a namespace happens to dereference to a schema, there is no requirement that it be the right schema for a given document. It's certainly a good idea to bump the namespace when you change the schema and to keep schemas backward compatible, but it's not required. Regardless of this, the namespace URL serves as a unique name to disambiguate my PurchaseOrder element from yours. In other words, it's acting as a name, whether or not it's any good as an address.
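As a rough illustration of a namespace URL acting purely as a name, the sketch below parses a document with two PurchaseOrder elements and tells them apart by namespace alone. The two namespace URLs are invented for the example, and nothing here ever tries to fetch them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URLs; they need not point to anything at all.
MINE = "http://example.com/ns/mine"
YOURS = "http://example.org/ns/yours"

doc = f"""<orders xmlns:a="{MINE}" xmlns:b="{YOURS}">
  <a:PurchaseOrder>mine</a:PurchaseOrder>
  <b:PurchaseOrder>yours</b:PurchaseOrder>
</orders>"""

# Parsing makes no network request; the URLs are just strings that
# become part of each element's qualified name.
root = ET.fromstring(doc)
mine = root.findall(f"{{{MINE}}}PurchaseOrder")
yours = root.findall(f"{{{YOURS}}}PurchaseOrder")
```

Each lookup finds only the element from its own namespace, even though both elements share the local name PurchaseOrder.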

Second, there's already a facility, the URN (Uniform Resource Name), for naming things independent of their location. I'm not sure who uses it. There's also XDI, but again, I'm not sure what traction that's got. I'm not claiming that HTTP URLs are better or worse than anything else, but it's pretty clear that for most of us they're perfectly good identifiers and there's not a pressing need to use anything else.

So here's the thing. In the case of bang paths vs. modern email addresses, it was night-and-day clear that moving away from "where it is and how to get there" toward "unique identifier" was a Good Thing. But with URLs, the world seems perfectly happy and functional using a thing that says "where and how" both for its intended purpose and as a name. What gives?

For one thing, HTTP is a completely different beast from UUCP (the protocol behind bang paths). It has explicit support for redirects, proxies, caches and other things that decouple the URL from exactly how you get to the resource behind it. Essentially, the URL says "start here", namely at the server/port part (more properly the authority). What happens after that is fairly flexible.
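The "start here" idea can be sketched with a toy, in-memory stand-in for the web. Everything below is an assumption made for illustration: the `TOY_WEB` table, the `fetch` function and the URLs are invented, and no real HTTP is spoken. The point is only that the authority in the original URL is where the search begins, not necessarily where it ends.

```python
from urllib.parse import urljoin, urlsplit

# Toy stand-in for servers: URL -> (status, redirect target or body).
TOY_WEB = {
    "http://example.com/old": (301, "/new"),
    "http://example.com/new": (302, "https://cdn.example.net/doc"),
    "https://cdn.example.net/doc": (200, "the resource"),
}


def fetch(start_url, web=TOY_WEB, max_hops=5):
    """Follow redirects from start_url; return (final_url, body)."""
    url = start_url
    for _ in range(max_hops + 1):
        status, payload = web[url]
        if status in (301, 302):
            # A redirect target may be relative to the current URL.
            url = urljoin(url, payload)
            continue
        return url, payload
    raise RuntimeError("too many redirects")


start = "http://example.com/old"
final_url, body = fetch(start)
started_at = urlsplit(start).netloc      # authority we were told to start at
ended_at = urlsplit(final_url).netloc    # authority that actually answered
```

In this toy run the request starts at example.com and is answered by cdn.example.net, yet the caller only ever needed the original URL.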

For another thing, a URL is generally pretty opaque. If it doesn't work, I don't go and consult a map to figure out what alternative might work for some part in the middle. I either give up or go hunting for a whole new URL for the same resource.

Finally, URLs are better and better hidden these days. If I see a web address on a billboard, it's probably just the domain name. I type the domain name into my browser and it fills in the http:// for me. Most of the time I won't even do that. I'll just chase a link somewhere. At that point I really don't care what URL the link uses to find its referent.

Somewhere among or around those three things is an explanation for why HTTP URLs work as well as they do in practice, while in theory they shouldn't. Or rather, why the theory that identifiers shouldn't talk about wheres and hows doesn't seem to hold in all cases.
