Wednesday, November 26, 2008

Brachiating through the web

In a previous post, I needed to show two items of a list, intersperse some text and then resume the list with item 3. I knew there was an incantation for this, but I couldn't remember what it was. So I visited my old friend WebMonkey, whose HTML cheat sheet has remained unchanged for many years, but which still proves useful from time to time (WebMonkey also has more current material, but, leaving my webmastering to others wherever possible, I don't find myself referring to it).

Unfortunately, Ye Olde Cheate Sheete only documents HTML 2.0 or so. So I then fell back on my other standby, googling "HTML RFC". That brought up the RFC for ... HTML 2.0 (RFC 1866), dating to 1995. That's as far as the IETF goes. If you want more up to date than that, you have to go to the W3C. Sure enough, they have the HTML 4.01 spec, and that has the lowdown on lists [*], including the advice that I shouldn't be giving list items numbers anyway. I should be using stylesheets. Unless I should really be using XHTML. Oh well.

What caught my eye, though, was the definition given there of the Web:
The World Wide Web (Web) is a network of information resources.
It then goes on to mention the intertwined roles of URIs, HTTP and HTML. That seems impeccable, as far as it goes, and you can't question the source, but it tends to leave one wanting more. Which is why I don't feel too bad about having tried to go further, once or twice (or thrice).

[* What I really did was compose a new email with Thunderbird, use its GUI to set a list to start at item 3, save the result as a file and discover that the magic words are
<ol start="3"> ...</ol>
It took a couple of tries to get that to show up correctly, but that's a different story]

CD Player. Comes with music.

This is take two of the post I was trying to write when I ended up writing about BodyNet instead.

Technically, there's not a lot of difference between a cell phone and a streaming audio player. Throw in some flash memory and downloaded tunes are no problem either. Add a screen and you can say the same thing for video. But how do you get the content to the phone? Two models spring to mind:
  1. A big happy open web-driven marketplace. Surf wherever you want. Find something you like? Download it to your phone just like you'd download it to your PC. Pay whoever you need to when you download (or pay for a subscription). This is pretty similar to the CD/DVD market. Sounds nice, but as far as I know you can't do it. It's a lot easier to do DRM on a captive device like a cell phone, and cell phone makers are pretty aggressive about making sure you don't tamper with their devices.
  2. A collaboration between the content owners (i.e., studios and record labels, not to be confused with singers, songwriters, screenwriters, actors etc.) and the service providers. Subscribe to a service and you can also download or stream content from whatever content owners the provider has partnered with. This is pretty similar to the cable TV model. It ensures that everybody gets a cut (as always, we can argue over who gets what cut) and a number of partnerships have formed.
There's another model that doesn't come to mind because when you try to map it back to "old media" terms, it doesn't really fit. Yet there are at least two examples going, one of them recent:
  1. The cell phone makers sell the content. As the title suggests, this seems like selling a CD player and then selling the CDs to go with it. You see this in niches (e.g., Disney makes an MP3 player and sells plug-in cards with songs from their artists), and I wouldn't be surprised if some early phonograph maker tried it, but it doesn't seem like a great idea. Selling electronic widgets and selling bits are just two different things. Nonetheless, it certainly worked for Apple and the iPod/iPhone, and now Nokia is trying the same approach with Comes With Music (TM). It's not quite the same model as iPhone -- for a subscription fee, you can download all you want and keep it forever -- but it does share the feature of putting the phone maker in the content business.
So maybe they know something I don't. Wouldn't be the first time.

Tuesday, November 25, 2008

SearchWiki: addendum

It looks like the top hits for "SearchWiki" are heavy with "How do I turn this thing off?"

(Not) announcing SearchWiki

I'm not sure exactly when this happened. It seems recent, but maybe I'm just slow to notice. In another one of its quietly-snuck-in tweaks, Google has added a couple of widgets to its search results: "promote" and "remove".

The function is pretty clear: move this item up the list, or weed it out entirely. But over what scope? Ah, there's another clue: at the bottom of the page is something about "SearchWiki". And there's a "learn more" link.

Aha. Your choices and comments are kept with your account and re-used whenever you do overlapping searches (or as a special case, repeat the same search). You can also add links, make comments, and see what others have done (in the aggregate, I would expect).

Looks interesting, and harmless enough in its current form. Wonder if I'll end up using it.  [... and it's gone.  Not too long after it came along, if I remember right --D.H. May 2015]

Saturday, November 22, 2008

BodyNet fifteen years after

About fifteen years ago, Olin Shivers looked at a typical tech-savvy professional of the day carrying a pager (remember those?), a cell phone, a "digital diary", a keyless car remote, a notebook computer and a Walkman (remember those?) and concluded "That's one headset, two communications systems, four keyboards and five displays too many." You typically didn't have a good mobile web connection in those days, either. Shivers then went on to describe a more modular collection of pieces communicating through a short-range network he dubbed "BodyNet".

Fast-forward fifteen years. Are things any better? Well, yes. Did BodyNet happen? Well, depends on how you count.

These days you can get a portable thingie [see also "pocket-thing"] that will let you make phone calls, download and play music and videos, get and answer email, surf the web, keep your calendar and use GPS (or other means) to tell you where you are. You can also download other widgets/gadgets/apps/whatever-you-call-them that will let you do all manner of other things (or at least, play games).

There is even a short-range network standard, namely Bluetooth, that you can use to attach accessories to the thingie, including a headset that will let you talk hands-free, albeit at the cost of sometimes looking like you're having a conversation with the wall or your invisible friend Harvey the Rabbit. Except for its somewhat broader range, Bluetooth looks remarkably like the BodyTalk Shivers describes. I doubt that's a coincidence.

So: The hodge-podge of mobile devices one carries around has consolidated. There is now a short-range, personal, body-sized network. This network can connect to the Web. There is a market for devices to plug into that network. So BodyNet happened, right?

Not really. The original BodyNet was meant to be a mix-and-match affair in which you get the pieces you need a la carte and plug them together. Shivers specifically argues against a "monolithic" approach:
We do not believe in this [monolithic approach] over a broad class of users [...] Individual users will persist in remaining individual -- the system requirements of one will not satisfy the needs of another.
But the monolithic approach is just what we have today. "Phones" pack in more and more features. Worse (from the geekly perspective), they tend to do so in a very closed-system sort of way. The cell phone makers and service providers want very tight control over what you can and can't do with their product/service. True mix-and-match is restricted to accessories like headsets.

This seems to be a recurring blind spot for us geeks. We're taught "clean interfaces", "standard protocols" and "modularity, modularity, modularity." We forget that most people don't want to pick their favorite components and plug them together. The best selling stereos (do people still listen to stereos?) are all-one-piece. We have software "office suites" because no one wants to find out how well Word Processor A works with Spreadsheet B. Even on the wild and woolly web, people like portals and mashups that do the grunt work of pulling things together.

Phone makers sell all-singing, all-dancing phone/modem/GPS/email/web/music/video/... devices because people want them. They want the features, not the joy of putting them together.

Thursday, November 20, 2008

Is Vermont the new Delaware?

Huh?

It's all explained in the CFO.com article Vermont wants to be the "Delaware of the Net".

If you've dealt with the incorporation of a company (or you've already heard about this in the past few months), you probably know what this is about. Delaware has structured its laws in such a way as to be particularly friendly for companies seeking to incorporate. I've worked for at least one Delaware corporation (it was based in Silicon Valley). Sort of the Liberian ship registry of the US corporate world.

Vermont is trying to do the same for "virtual corporations" -- those without physical offices, paper filings and other physical artifacts one might expect of a company. In doing so it's trying to duplicate its success in captive insurance [*]. It's an interesting idea, but I'm curious just how much of a competitive advantage Vermont would really have. There are more factors to consider than whether papers have to be filed on paper. For example, credit card companies love Delaware for its lax usury laws. One could conceivably start a virtual credit card company, but would one want to charter it in Vermont or in Delaware?

On the other hand, it's hard to see what Vermont stands to lose. So why not give it a shot?

[*]I had to look that one up. Wikipedia informs us that a captive insurance company is a subsidiary that exists primarily to insure its parent company. Seems somewhat circular, but the captive always has the option of purchasing re-insurance from independent insurers.

Wednesday, November 19, 2008

How disruptive is online advertising?

A Forrester Research report quoted in a Wall Street Journal article, neither of which seems easy to access online, says that Kids These Days spend more time online now than they do watching TV (apparently a good chunk of that time is spent gaming). Widespread adoption of broadband connections (or at least, considerably-faster-than-dialup connections) has been a big driver for this.

This is causing both major advertisers and major online advertising players to re-think how best to reach consumers in the new higher-bandwidth net.world. In the case in point, Procter & Gamble and Google are going so far as to exchange employees, the better to understand each other's cultures and outlooks. An odder couple you couldn't ask for, and yet it appears to make business sense.

On the one hand, this is just another chapter of the "How do we make money off this 'web' thing?" saga that's been playing itself out slowly but surely for the last decade or so. But on the other hand, it gains a bit more urgency when rephrased as "We need to make money off this 'web' thing. The other stuff is drying up!"

My feeling continues to be that the web will have much more impact on how companies make money than on which companies are making it. That certainly seems to be the lesson from the dot.com boom and bust: Webvan folded but brick-and-mortar grocery stores still take orders online. EToys got bought out by KB Toys, and Walmart and Target went online. Not to say that new companies haven't sprung up -- Google, Amazon and eBay come to mind -- just that old companies adapting has been more the norm.

In fact, the how is not necessarily changing that much. P&G still makes money selling soap, grocery stores still sell groceries and toy stores still sell toys. The big difference is in how they reach their customers. And even then, an ad still looks pretty much like an ad and an online catalog still looks a lot like a catalog.

Tuesday, November 18, 2008

Another size for URLs

I previously claimed that URLs come in three sizes: small, large and monster, of which only the last is actually a full-fledged URL.

Some time before that, I did a post on tinyurl. So that would be another size of URL: tiny.

By my reckoning, though, tiny URLs are actually large -- they're definitely not "small" or "monster" -- with the one difference that, unlike the examples I gave, tinyurl URLs are real live URLs.

However you slice and dice the categories, tinyurls seem to have found a very natural habitat in twitter.

Old-school image processing, or Moon pictures remastered

Before anyone set foot on the moon, it was considered important to survey the place. To this end, NASA sent probes fitted with cameras and the ability to beam back pictures. There being no CCDs at the time, the probes actually used film, developed it and scanned the resulting prints for transmission. The transmissions were recorded on magnetic tape as a backup, thanks to the foresight and efforts of Charles J. Byrne; the preferred mode of storage was to print the pictures and store the prints. What most of us saw the first time around, including the famous Earthrise image, was actually third-generation material: reproductions of photos of those prints. [Or fourth-generation if you count the original film up in space.]

Twenty or so years later, Planetary Data System co-founder Nancy Evans, then at JPL, took the tapes into her care and started a project with Mark Nelson to find and refurbish tape drives that could read the old tapes. The project stalled for lack of funds. Evans retired from JPL to become a veterinarian and stored the tape drives in her garage.

Another twenty or so years later, Evans retired as a veterinarian and went looking for someone to take the drives off her hands and, hopefully, put them to their intended use. Dennis Wingo and Keith Cowing took on the job, moved the drives into a disused McDonald's in NASA Ames' research park and set to getting them working again. This involved a lot of cleaning, swapping of parts and working with circuits whose components were actually large enough to see and handle. It took them 99 days, but they got the thing working.

Even better, the results are now on the web, as is the more complete account I'm summarizing.

The web has acquired another significant chunk of history -- the digital images the probes would have sent back if they could have, and if there had been any place to put them.

Most definitely a neat hack.

A friend is a friend is a friend ...

... or at least from the point of view of LinkedIn, but I think they're typical.

It's not unusual to have dozens of links ("friends") on a social networking site, even if you're not trying that hard. If you are trying, you can easily get hundreds. Are these all close personal friends, people you'd walk through fire for if they so much as asked you to? Probably not. Some of them are going to be closer than others, but there doesn't seem to be any way to indicate that.

Should there be? On the one hand, it would be useful, when chasing through your connections, to have some idea of whether friend A's friend B is someone A knows really well, or just an old schoolmate who happened to extend an invitation and, well, you wouldn't want to just say no for no reason, would you? So why not let members assign a degree of "closeness" to any friend? The resulting graph would be richer and more informative, and not appreciably harder to handle from an algorithm-geek point of view.
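[To put the algorithm-geek point in concrete terms, here is a minimal Python sketch. Everything in it is my own assumption for illustration: the 1-to-5 scale, the names, and especially the rule that a chain of connections is only as strong as its weakest link. No site works this way as far as I know.]

```python
# Hypothetical closeness ratings: closeness[a][b] is how close a says b is,
# on an assumed scale from 1 (barely know them) to 5 (walk-through-fire close).
closeness = {
    "you": {"A": 5},
    "A": {"B": 1, "C": 4},
}


def path_strength(graph, path):
    """Rate a chain of connections by its weakest link (an assumed rule)."""
    return min(graph[a][b] for a, b in zip(path, path[1:]))


# A knows C well, but B is just the old schoolmate who sent an invitation.
print(path_strength(closeness, ["you", "A", "B"]))  # 1
print(path_strength(closeness, ["you", "A", "C"]))  # 4
```

The only change from the plain friend graph is that each edge carries a number, which is why this is not appreciably harder to handle.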

But would this really help? Everyone would have to make a snap judgment about "closeness" every time they added a link, and everyone will have their own slightly different idea of how "close" (say) a "5" is. Worse, ratings will almost certainly change over time, particularly on a purely social site like MySpace or FaceBook. Keeping the "who's in/who's out" numbers up to date could turn into a major timesink, not to mention an intricate political maze (but maybe that's what a lot of people are looking for on the sites in the first place?).

On the other hand, should you be looking at your connections' connections in the first place without talking to the person in the middle? Even if you decide to do that, there are still other cues to go by. On LinkedIn, for example, you can compare people's profiles and get some idea where they intersect, and you can look for recommendations. I would expect MySpace and FaceBook have more finely-developed mechanisms and conventions, but I don't know first hand (see previous comment on timesinks).

Monday, November 17, 2008

URLs, URLs and URLs

It occurs to me there are basically three sizes of URL:
  • Small, like foocorp.com. These aren't even really URLs, but your browser is smart enough to figure out you mean http://www.foocorp.com/index.html or whatever it really is.
  • Large, like www.foocorp.com/widgets. Still not really URLs, but the browser will kindly prepend the http://.
  • Monsters, like http://fieldnotesontheweb.blogspot.com/2008/07/again-just-what-is-this-web-thing.html or http://www.blogger.com/post-create.g?blogID=2129929182918599848. These are actual URLs as defined in the standard.
The small size is what everyone thinks of as a web address. No one wants to type more than that into the browser. Sometimes you can get away with a large URL, if both parts make sense and you're pretty sure people are really interested in what you're saying.

Real URLs are not fit for human consumption, except maybe for cutting and pasting. They might as well all say http://dontreadthisyadayadapeanutbutter. If you actually have to read a real URL, and you're not actively committing web-geekery at the time, something has gone wrong.
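[For the web-geeks, a rough Python sketch of the three sizes. The function names and the "small"/"large"/"monster" labels are mine, and the expansion rule (just prepend http://) is a simplifying assumption; real browsers do rather more guessing than this.]

```python
from urllib.parse import urlsplit


def url_size(address: str) -> str:
    """Classify a web address as 'small', 'large' or 'monster'."""
    if "://" in address:
        # An explicit scheme makes it a full-fledged URL.
        return "monster"
    # No scheme: parse it as if the browser had already prepended http://
    parts = urlsplit("http://" + address)
    has_more_than_a_host = parts.path not in ("", "/") or bool(parts.query)
    return "large" if has_more_than_a_host else "small"


def expand(address: str) -> str:
    """Roughly what the browser does for you: add http:// if it is missing."""
    return address if "://" in address else "http://" + address


print(url_size("foocorp.com"))              # small
print(url_size("www.foocorp.com/widgets"))  # large
print(expand("www.foocorp.com/widgets"))    # http://www.foocorp.com/widgets
```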

[Side note: Some, notably the standards authors, use "URL" as the plural of "URL", evidently on the grounds that URL may stand for "Uniform Resource Locators" just as well as "Uniform Resource Locator". This may be standard, but it's not the usual way acronyms and initialisms form their plurals. I shan't use it thus.]

What's the difference between the Web and HTTP?

The web, being whatever we want it to be, is already up to 2.0.

HTTP, being a tangible thing, is only on 1.1.

Sunday, November 9, 2008

3xx REDIRECT

[If you got here by googling "3xx redirect" or similar, you may want to consult the HTTP 1.1 spec.]

I'd originally titled this post "In praise of 3xx REDIRECT" and led off with a couple of quotes, one from Bjarne Stroustrup (of C++ fame), about levels of indirection in computing science.

Then I tried to chase down the Stroustrup quote and discovered that he was probably quoting someone else when I heard it. Then I tried to chase down the other one, with even less luck.

Then I turned to investigating 3xx REDIRECT itself.

Now, while I'm not going to retract my claim that 3xx is praiseworthy, it turns out that there's a difference between the nice warm fuzzy abstract feeling that indirection is useful and the, um, interesting reality of what happens in the World's Most Famous Protocol as the web grows explosively, browser battles browser and search engines try to index everything in sight. "It turns out" is a math-major euphemism for "I should have realized".

OK, for those who don't spend their time poring through RFCs and other technical documents, what is this "3xx REDIRECT" thing? As I said, the idea is simple. It's a way for a server on the web to send back a message that says "What you're looking for, it's not here. It's actually over there." In other words, it's a forwarding facility, entirely analogous to mail forwarding or call forwarding or the sign on the front of a shop that says "We've moved across the street."

In web land, every HTTP request returns a three-digit status code, an idea stolen from FTP (File Transfer Protocol) or wherever FTP stole it from, because it's a fine idea well worth stealing. Codes in the 200s, like "200 OK" say "It worked". Codes in the 400s, like "404 Not Found" and the particularly harsh "406 Not Acceptable" say "It didn't work and it's your fault." Codes in the 500s, like "500 Internal Server Error", say "It didn't work and it's my fault." [*]
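[A minimal Python sketch of that division, for those who like to see it spelled out. The function names are my own; the reason phrases come from the standard library's http.HTTPStatus, and the verdicts are just this post's paraphrases, not anything official.]

```python
from http import HTTPStatus

# The first digit of the status code says, broadly, whose problem it is.
VERDICTS = {
    2: "It worked",
    3: "It's not here, but here's where you can find it",
    4: "It didn't work and it's your fault",
    5: "It didn't work and it's my fault",
}


def verdict(code: int) -> str:
    """Translate a status code into this post's plain-English reading."""
    return VERDICTS.get(code // 100, "Something else (see the footnote)")


def label(code: int) -> str:
    """Render a code with its standard reason phrase, e.g. '404 Not Found'."""
    return f"{code} {HTTPStatus(code).phrase}"


for code in (200, 404, 406, 500):
    print(label(code), "->", verdict(code))
```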

The 3xx codes say "It's not here, but here's where you can find it." There are several variants. The main division is between "301 Moved Permanently", which says you should forget all about the old address and use the new one, and everything else, which doesn't. Two of particular interest are "302 Found" and "307 Temporary Redirect".

Now, if 301 is "Moved Permanently", wouldn't you expect "Moved Temporarily" to be right next to it at 302? Indeed it was, in HTTP 1.0. Unfortunately [**], not everyone treated 302 as it was specified, and in HTTP 1.1, 302 became "Found", meaning (sort of) "I found what you wanted, but not here." and 307 became the new 302 (the actual differences in what happens on the wire are a bit more subtle). Worse, at least some server setups will use 302 by default for any redirection unless you tell them otherwise.

As a result, 302 is now hopelessly overloaded. It might mean what it originally meant. It might mean what it's officially supposed to mean. It might even mean something else, like "moved permanently, forget you ever knew that old address" but the webmaster neglected to say so explicitly. And yet, the web goes on working its wonders.
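[Here is a sketch, in Python, of how a well-behaved client might act on that main division if it took the codes at their word. The function name and the wording of the advice are my own invention; the only real machinery is the standard library's http.HTTPStatus. Given the overloading just described, a real client can't actually trust a 302 to mean what this says it means.]

```python
from http import HTTPStatus


def describe_redirect(code: int, old: str, location: str) -> str:
    """Say what a by-the-book client would do with a 3xx answer."""
    status = HTTPStatus(code)
    if not 300 <= status < 400:
        raise ValueError(f"{code} is not a redirect")
    heading = f"{status.value} {status.phrase}"
    if status == HTTPStatus.MOVED_PERMANENTLY:
        # 301: forget the old address and use the new one from now on.
        return f"{heading}: forget {old}, use {location} from now on"
    # Everything else: follow it this time, but keep the old address.
    return f"{heading}: fetch {location} this time, keep asking at {old}"


old, new = "http://foocorp.com/old", "http://foocorp.com/new"
for code in (301, 302, 307):
    print(describe_redirect(code, old, new))
```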

Standards. You gotta love 'em. Any standard that sees real use is really three things:
  1. What the document says
  2. What the implementations do (based in part on what people think the document says)
  3. What everyone thinks the implementations do
And of course, (2) depends partly on (3), and the next version of (1) is generally influenced by both.

[*] The astute reader will point out that I omitted 1xx. The astute reader will be right, as usual.

[**] I'm by no means an expert on what web servers, browsers and crawlers actually get up to. I'm relying here on stuff I've heard, or gleaned from a bit of googling, and particularly on this lengthy writeup, or at least the part of it I actually read.