Sunday, November 9, 2008

3xx REDIRECT

[If you got here by googling "3xx redirect" or similar, you may want to consult the HTTP 1.1 spec.]

I'd originally titled this post "In praise of 3xx REDIRECT" and led off with a couple of quotes, one from Bjarne Stroustrup (of C++ fame), about levels of indirection in computing science.

Then I tried to chase down the Stroustrup quote and discovered that he was probably quoting someone else when I heard it. Then I tried to chase down the other one, with even less luck.

Then I turned to investigating 3xx REDIRECT itself.

Now, while I'm not going to retract my claim that 3xx is praiseworthy, it turns out that there's a difference between the nice warm fuzzy abstract feeling that indirection is useful and the, um, interesting reality of what happens in the World's Most Famous Protocol as the web grows explosively, browser battles browser and search engines try to index everything in sight. "It turns out" is a math-major euphemism for "I should have realized".

OK, for those who don't spend their time poring through RFCs and other technical documents, what is this "3xx REDIRECT" thing? As I said, the idea is simple. It's a way for a server on the web to send back a message that says "What you're looking for, it's not here. It's actually over there." In other words, it's a forwarding facility, entirely analogous to mail forwarding or call forwarding or the sign on the front of a shop that says "We've moved across the street."

In web land, every HTTP request returns a three-digit status code, an idea stolen from FTP (File Transfer Protocol) or wherever FTP stole it from, because it's a fine idea well worth stealing. Codes in the 200s, like "200 OK" say "It worked". Codes in the 400s, like "404 Not Found" and the particularly harsh "406 Not Acceptable" say "It didn't work and it's your fault." Codes in the 500s, like "500 Internal Server Error", say "It didn't work and it's my fault." [*]

The 3xx codes say "It's not here, but here's where you can find it." There are several variants. The main division is between "301 Moved Permanently", which says you should forget all about the old address and use the new one, and everything else, which doesn't. Two of particular interest are "302 Found" and "307 Moved Temporarily".

Now, if 301 is "Moved Permanently", wouldn't you expect "Moved Tempoarily" to be right next to it at 302? Indeed it was, in HTTP 1.0. Unfortunately [**], not everyone treated 302 as it was specified and in HTTP 1.1 302 became "Found", meaning (sort of) "I found what you wanted, but not here." and 307 became the new 302 (the actual differences in what happens on the wire are a bit more subtle). Worse, at least some server setups will use 302 by default for any redirection unless you tell them otherwise.

As a result, 302 is now hopelessly overloaded. It might mean what it originally meant. It might mean what it's officially supposed to mean. It might even mean something else, like "moved permanently, forget you ever knew that old address" but the webmaster neglected to say so explicitly. And yet, the web goes on working its wonders.

Standards. You gotta love 'em. Any standard that sees real use is really three things:
  1. What the document says
  2. What the implementations do (based in part on what people think the document says)
  3. What everyone thinks the implementations do
And of course, (2) depends partly on (3), and the next version of (1) is generally influenced by both.

[*] The astute reader will point out that I omitted 1xx. The astute reader will be right, as usual.

[**] I'm by no means an expert on what web servers, browsers and crawlers actually get up to. I'm relying here on stuff I've heard, or gleaned from a bit of googling, and particularly on this lengthy writeup, or at least the part of it I actually read.

No comments: