Wednesday, December 31, 2008

Happy 2009!

As you're waiting for the ball to drop tonight (or whatever other marker you like), remember that 2008 will be hanging around for just a bit longer than the last few years. A leap second has been introduced for the first time since 2005 and only the second time this century.

Leap seconds are interesting (and somewhat problematic) to anyone interested in keeping a computer's clock exactly in sync with the official timekeepers because, unlike leap years, they do not follow a predetermined rule. Rather, the International Earth Rotation and Reference Systems Service (IERS) announces them based on observations of the earth's rotation. Despite what the name might suggest, the IERS does not actually cause the earth to rotate.

The Wikipedia link above has a good rundown. The official version is on the IERS site, but be advised that it's a bit technical. Evidently "How much is the accuracy of precession/nutation improved by MHB2000 and corrections to the precession constant and obliquity rate over the current IAU models?" is a frequently asked question.
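For the clock-keepers in the audience, here's how tonight's extra second looks to one piece of software. A minimal Python sketch -- behavior varies by library and version, so take it as illustration, not gospel:

import time
from datetime import datetime

# time.strptime allows a 60th (even a 61st) second, so the official
# timestamp of tonight's leap second parses just fine:
leap = time.strptime("2008-12-31 23:59:60", "%Y-%m-%d %H:%M:%S")
print(leap.tm_sec)  # 60

# datetime, on the other hand, insists that seconds run 0..59:
try:
    datetime(2008, 12, 31, 23, 59, 60)
except ValueError:
    print("no leap seconds here")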

Business model blues

Item 1: A local newspaper reporter muses on a local radio show about the future of newspapers. Yes, the economy is bad, but the long-term problem is that classified ads have gone online and they're not coming back. One option is to charge for a paper what it actually costs to produce; this would more than double the price, cutting readership and so on and so on. The days of the newspaper as everyone's window to the world are fading fast. That, too, has gone online.

Item 2: Viacom is asking viewers to call their cable operators (Time Warner Cable in particular) and urge them not to drop Viacom's channels. At issue: Viacom wants to increase its charges to providers. Providers argue that ratings are decreasing and the shows are available online for free in any case (often via the exact same cable). Fundamentally, the shift is from TV -- or perhaps more precisely, dedicated TV channels -- to online, where bits is bits and video is just one more bit stream, albeit a considerably bigger one than pretty much everything else.

(Here's the top hit I got for viacom time warner, while putting together item 2. The particular article is from Dow Jones, now owned by NewsCorp, carried on CNN Money, owned by Time-Warner. Caveat lector.)

Monday, December 29, 2008

Email in the loop

Email seems to be a sort of universal escape hatch for otherwise automated schemes:
  • If you forget your password on many sites, or in some cases even if you forget your login entirely, you can have the site send a special email to an address you gave when you registered.
  • If you want to join a site, you'll often get an email containing a magic link to follow to complete the registration process.
  • The same scheme has been used for product registration.
  • If you want to join an automated mailing list, you'll typically get an email asking you to confirm that you wanted to join.
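All of these boil down to the same trick: tie an unguessable token to an email address and see whether it comes back. A minimal sketch of the magic-link variant, with a stand-in send_mail() and all the web plumbing left out:

import secrets

def send_mail(to, subject, body):
    # Stand-in for a real mailer (smtplib, say); it just prints here.
    print("To:", to, "--", subject, "--", body)

pending = {}  # token -> email address awaiting confirmation

def start_registration(email):
    token = secrets.token_urlsafe(16)  # unguessably random
    pending[token] = email
    send_mail(email, "Please confirm",
              "https://example.com/confirm?token=" + token)

def confirm(token):
    # Whoever follows the link demonstrates they can read that mailbox.
    return pending.pop(token, None)  # None: bad or already-used token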
Why does this work? It's painfully clear that email is completely insecure on the sending side. Anyone can spoof anything without a lot of effort. I've sometimes received email from myself for products I'm pretty sure I'm not trying to sell myself (my evil twin, on the other hand, may have other ideas ...). However, receiving email is somewhat more secure. Generally you at least have to give a password and you can use TLS to help prevent various attacks. This is nowhere near ironclad, but it does help:
  • A random person trying to recover your password will have to know what email address you registered with and be able to intercept your mail.
  • Someone trying to register hordes of people on a particular site will have to make up a bunch of email addresses and have a bot ready to answer the confirmation mails. Hmm ... that doesn't sound like a particularly high bar, so maybe I'm missing something.
  • When you legitimately register a product, the seller now has an email address that it knows someone has replied to at least once (and presumably a pirate will use other means to get the use of the product).
  • A spammer can't add your email address to someone else's mailing list without your getting an email asking if you really want mail from that list. This doesn't cure all ills, but it at least cures one of them.
If all this sounds like a sort of lukewarm endorsement, it should. The fact remains that email isn't really secure and doesn't seem to be getting any more secure very fast (I'm aware of PGP and its cousins and offspring -- good stuff, but not widely deployed). I think what bothers me here is that the extra email step might give an exaggerated air of security. About the best that can be said in most cases is that if you send an email to a legitimate address, at least the intended recipient is likely to see it, and schemes that rely on this seem to work well enough in practice that they continue to be used.

[Access to email (the receiving side, above) has gradually become more secure.  HTTPS is now pretty standard and two-factor authentication is available.  SPAM and phishing are still significant issues, though SPAM filters seem to have gotten better faster than spammers have gotten better at getting around them.  As to the main point of the article, email seems to be just as much in the loop as it was when I wrote this. --D.H. May 2015]

Are we still on the information superhighway?

Some words and phrases are just plain dated. They were the cat's pajamas back in the day, but now only a square would think they were hep. For all practical purposes they've dropped off the face of the earth into a silent abyss, only to be dragged out when we feel like watching 80s videos, pointing and laughing (*). Which brings me back to the title.

What was the "information superhighway"? As usual, Wikipedia has the relevant citation:
"One of the technologies Vice President Al Gore (**) is pushing is the information superhighway, which will link everyone at home or office to everything else—movies and television shows, shopping services, electronic mail and huge collections of data."
So that happened. Not only that, the analogy to the original superhighways (i.e., the U.S. Interstate system) seems apt. Both are "enabling technologies", valuable not so much for what they do, but for the other technologies and trends they make possible. So why does the term sound so completely dated?

I'd guess there are a couple of reasons. One is that "information superhighway" was never a real geek term. It was a way of explaining the geek stuff to congresspeople who largely didn't care about the technology itself. They cared about what supporting said technology could bring to their constituents. The interstate system was widely regarded as a Good Thing, so if you could convince people this "internet" stuff was the modern version of that Good Thing you were ahead of the game.

That was probably helpful in Congress, but it gave the impression that only suits and pointy-haired bosses talked about "superhighways". Real geeks talked about networks, client-server architectures, not-so-client-server architectures, RISC vs. CISC and whatever else happened to be floating around at the time. If you wanted to be cool in the dot-com days, you wanted to talk like a geek.

Another reason, perhaps less obvious, is that we don't call superhighways "superhighways" anymore. They're just highways. What's super about them, anyway?

These days, if you're parked on the 405 during rush hour -- that is, pretty much any time -- they don't look so super. But consider what came before them. For example, to get from LA to the San Joaquin valley, you had little choice but to take the "Ridge Route" over the Tejon summit. These days, you've got eight lanes of I-5, traversable at speeds (ahem) well in excess of whatever's posted. Before that, well here are some excerpts from a description on GBCnet.com:
Perhaps the most hazardous section [...] was the Grapevine Grade between Fort Tejon and the base of Grapevine Creek [...]. The original Ridge Route highway built in 1915 was an absolute deathtrap. This section had 119 sharp turns, two with a radius as small as 80' (a 10 MPH hairpin turn - the turning radius of an average car is 40') and when totaled caused the traveler to drive the equivalent of 12 full circles [My inner math major compels me to infer "six one way and six the other" -- DH]. As a testament to the hazards presented by this road, one turn was even labeled "Deadman's Curve." By 1934 the extension of the US 99 Ridge Route Alternate [...] offered a substantial improvement in safety and ease of driving. For one, the number of turns was reduced to 23 and the number of complete circles a driver would have to make was reduced to 1½. [...]. Perhaps the biggest improvement was the widening of the road from 20' to 30' with three 10' lanes. The addition of suicide lanes enabled motorists to pass slow trucks and allowed traffic to move more quickly as backups behind the slow vehicles became minimized.
Yep. "Suicide lanes" were an improvement. Granted, by the time I-5 came around in 1960, there had been further improvements, including an expressway that was partially incorporated into the interstate. Nonetheless, all of the above fun and excitement would have been well known to drivers at the time, and eight relatively straight and level lanes, none designated "suicide," would have warranted the "super" tag.

Similarly, if you compare grainy YouTube video, the occasional annoying wait and spam-a-go-go to dialup access to a BBS (***), the difference is night and day. But one soon forgets the night.

(*) As you might guess from my profile, I caught those 80s videos the first time around. Thought they were infinitely bitchin'.

(**) It's hard to pull out a quote like that without dragging up the "Al Gore claims he invented the internet" controversy. I'll defer to Wikipedia on that, too.

(***) Nothing against BBSs. Like many geeks of a certain age, I spent a fair bit of time on one (thanks Keith!). I hear now you can use the internet to type short text messages to people who can type back at you ... you can even download timesink games and ...

Tuesday, December 23, 2008

All of human knowledge

In the annual (?) appeal for funding for the Wikimedia Foundation, Jimmy Wales asks us to
Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.
This seems like perfectly fine wording for a fundraising appeal, a decent description of what Wikipedia is about, and a noble ideal to boot. So let's rain on the parade by picking it apart, shall we?

Is it possible, even in principle, to give even one person access to the sum of all human knowledge? Actually, what does "the sum of human knowledge" even mean? Some time ago, I was convinced it was "everything in the encyclopedia". Now I'm not so sure. Wikipedia itself specifically excludes knowledge that isn't "notable" (what did I have for breakfast yesterday?) and "original research" such as tends to creep in as people summarize pieces of articles and draw conclusions from them. It also goes to great lengths to exclude or at least neutralize opinion (POV in the jargon (*)).

In other words, it aims to gather information generally accepted as "known". This is the kind of philosophical quicksand that holds up just fine so long as all you do is walk blithely across it. So let's just walk ...

Assuming there's such a thing as the sum of human knowledge, for some value of "knowledge", could anyone access it? Well, you don't really want to access all of it. You couldn't anyway. You want to be able to access the bit you need at the moment, right then and there.

This runs directly into the limits of human bandwidth. Not only is there only so much raw information you can process at one time, there is only so much metadata -- information about what and where other information is -- that you can process at one time. Sure, the knowledge you're looking for is in there, and you have both the careful work of editors and categorizers and the raw horsepower of text search at your disposal. But can you find it? Empirically, the answer so far is "often". I doubt it will ever be "always".

Nonetheless, an unachievable goal is still worth aiming for so long as we produce useful results along the way.

(*) The Wikipedia article on POV contains a very relevant bit of wisdom:

In Thought du Jour, Harold Geneen stated:

The reliability of the person giving you the facts is as important as the facts themselves. Keep in mind that facts are seldom facts, but what people think are facts, heavily tinged with assumptions.

Sunday, December 14, 2008

Giving online email the bird

The other day I was trying to send an invitation using a popular web site. The result was unsatisfying, but it wasn't particularly the invitation site's fault.

It was my first time on that site, so it didn't know any email addresses, and of course that makes sending invitations a bit harder. So I went into T-bird and started writing a dummy email. T-bird dutifully filled in the addresses from the first few letters of the names. I then went to cut-n-paste those addresses into the site's text box. T-bird doesn't seem to want to select more than one address at once from a message being composed.

Bad T-bird (or dumb me for missing the obvious -- but when dealing with software with a seven-digit user base and an ostensibly infinite supply of eyeballs, I tend to be less forgiving).

Then I tried creating an ad-hoc mailing list in T-bird and copying that in. Turns out this was not T-bird's day (or mine, for that matter). So I finally did a silly-walk so unwholesome I hesitate to mention it: I saved the dummy email to a text file and cut-n-pasted the names in from that.

Well, clearly the problem here is that I persist in keeping my email address book in my own personal silo instead of on the web. The invitation site was certainly of that opinion. It offered me the opportunity to import my email addresses from any of a dozen or so widely-used sources.

That there are a dozen or so to choose from tells me that this "silo" problem is not quite licked yet, even on the wide wonderful web.

Besides the obvious concern about security -- which is maybe not such a big concern considering that everybody's bank accounts are online and your address book is more vulnerable to an email virus on your machine than on someone's secure server -- I think the problem here is granularity. I don't want everyone to see everything in my address book. I want different sites to see different portions.

I haven't done even the minimal research of finding out if this is possible online, and frankly, I hardly ever run into a situation like the above, so I have no idea whether this is a real issue or not. It sounds like the kind of problem personal datastores are aimed at, though.

In the meantime, I plan to continue hiding in my silo, though I'll try to look into how the online system at work works. It's one of the major providers, but I access it through T-bird, of course. Better the devil you know.

[This is one of the few posts that's struck me as seriously dated on re-reading, probably because it wasn't really that relevant in the first place.  It does hit on some interesting themes.  It just doesn't do much with them, and plenty of other posts have better takes on the same themes.  Now that you've read this far, feel free to skip it --D.H. May 2015]

The jury is out (but online)

I heard a news story on the radio the other day about a high-profile case that was being reviewed, among other things, because jurors were thought to have been accessing the internet when they were supposed to have been sequestered.

This is sort of a dog-that-didn't-bark situation. The net and web have been around for a long time now, with their potential for tainting jurors who are supposed to be isolated. You'd think there would have been more and bigger stories about it by now. Evidently, though, it's not a major problem. Take your pick of possible reasons:
  • Most trials are over quickly. Relatively few require actually sequestering jurors.
  • Web access is just the latest in a long line of potential leaks. Cell phones (not to mention ordinary phones) have been a problem for years now.
  • Jurors are generally good about following instructions.
  • Jurors not communicating with the outside world is fundamentally a human problem, not a technological one.
What I found really interesting, though, was how the potential taint came to light: via email. And of course, the lawyers involved preface any discussion of this with "If these emails are genuine ..."

There's your barking dog -- email problems, particularly the ease of spoofing email, are a much bigger deal than a web connection being available when it shouldn't be.

Tuesday, December 9, 2008

The OCR X-Prize

A while ago -- in fact, just about a year ago as it happens -- I remarked that the use of Captchas to try to keep bots away is effectively an X Prize for OCR hackers. If there's money to be made by using a bot, as there is with, say, online ticket sales for popular shows, then there ought to be reason for someone to write a better text recognizer to get past the Captcha. I was sort of right, kind of.

Sure enough, people have written better text recognizers. But from what I can make out, they've done it for fame and recognition (well, at least to publish and not perish) and not for money. There are several academic papers out there, but as far as I can tell no enterprising script kiddie has done the requisite research. There have been reports of real sites invaded by real Captcha-cracking bots, but most likely someone just cadged the work done in the research papers and put it to ill use.

So, Captchas have spurred research, because OCR has become a somewhat hot topic. They've also spurred actual scammers to crack Captchas, but not necessarily through OCR research. Actual scammers have little reason to invent new OCR algorithms, or even read the literature on the subject. That's not their strong suit. Their strong suit is social engineering. People are good at reading squiggly Captcha letters; spammers are good at getting people to do stuff; ergo, spammers get people to read the squiggly Captcha letters for them.

How do they do this? Put up a site featuring a thrilling pictorial presentation of, say, accounting standards through the ages (they actually used a slightly different subject matter). After each image is the promise of more ... if you can read the squiggly letters in the box. The letters, of course, are taken from a legitimate site that the scammer is trying to crack into at the moment, and the mark's response is fed directly back to that site. The awful beauty of this approach is that it will work for any "reverse Turing test" approach whatsoever.

If they're smart, they wait for a successful response back from the legitimate site before letting the mark proceed. Otherwise the mark could put in anything at all, for example "notarealticketbuyer", and by definition the scam site wouldn't know the difference.

Meanwhile, Captchas have become almost too hard for humans to read (I came up empty on one today, which is what spurred this post). In other words, they've almost reached the point at which they can no longer discriminate between humans and bots. Clearly rendered text can't discriminate, because both humans and bots can read it easily. Gibberish can't discriminate either, because no one can read it. There is less and less room left in the middle.

Wednesday, December 3, 2008

Happy birthday, Kindle

CNN points out it's already been a year since Amazon introduced the Kindle. It's currently sold out, and Oprah likes it, as do Toni Morrison and James Patterson (but not J.K. Rowling). It's accounting for 10% of Amazon's book sales, even though only 200,000 of Amazon's zillions of titles are available on it (evidently it's the right 200,000). So it's a game-changing hit, right?

Well, it's definitely not a flop, and the article claims that sales are "on a par with other high-profile mobile devices in their first year." On the other hand, in keeping with my not-so-disruptive technology theme, I'd have to side with Paul Reynolds of Consumer Reports:
I think it's certainly a ways away from hitting the mainstream ... because of the price and the experience a reader gets from long-form reading. Whether these ... are successful, stand-alone devices remains to be seen. From what I've seen and heard, I think the technology is here to stay.
So ... so far, so good, and it definitely bears watching, but more of a leadoff single than a grand slam home run.

Maybe all this googly stuff is worth something after all

Well, I never really doubted it, but it's good to have a working example from time to time.

A friend called, saying they were at a pizza shop on Smith Street (that's not exactly what happened, but let's say it is). The shop was next to a big red brick building. Could I come pick them up? Before they could relay any more detail, their cell phone went dead.

So I went to Google, searched for "pizza" "smith street" <my town>. Up came a little map with pizza shops marked. Two were even on Smith Street. I then clicked on the little push-pins for the two shops and checked the street view. Only one was next to a big red building. Voila!

I plugged the address into my GPS (I'm pretty sure in some setups that can happen automagically), set off, and sure enough, there was my friend waiting. Just like the breathless descriptions you'd see about how the web was going to Change Everything, with the added bonus that it actually happened.

That probably came off as overly cynical, so let me climb down a bit: A lot of the technology and trends that have been hyped over the years have actually happened. Phones, computers and TVs really are converging towards each other. You really can find all sorts of useful information and belong to far-flung virtual communities over the web. You can even shop and bank on the web.

I'm not down on the technology. I'm not even particularly down on the hype. Hype happens. If people didn't get excited about cool technology we wouldn't have any. Nonetheless, I can't help feeling that, with all these changes, my life is essentially the same -- even taking into account that it would have been a lot harder to track down my friend without the web.wizardry. Sure, technology can be disruptive, but most of it isn't, at least not as quickly or in the ways people often seem to imagine it to be.

Two Nice UI Amenities

Neither of these is particularly new to the web, much less new to the world, but they sure can make life easier:
  • The type-a-few-letters-and-it-narrows-the-list-down list. I remember seeing this one in one of the major PC apps in the 80s and going "oooohhh" kind of like the little green men in Toy Story. It hasn't lost any of its appeal. Wikipedia added it to its search field a while back, and it's a welcome addition.
  • The field-that-remembers-what-you've-put-in-it-previously. This is a standard browser feature (or plug-in), but I remember life without it and I like life with it better. Besides the obvious convenience of not having to type that 20-digit account number over and over again, it helps ensure consistency. The usual implementation is even a bit smarter than I imply: it can remember what you've put in similar fields, possibly in other pages entirely, so it can suggest your usual email address in a new "email" field it sees.
The two fit nicely together. If over the course of time you've given several different responses in a given field, it's very nice to be able to just type the first few letters and get the one you want. For example, blogger's tagging feature works exactly this way. Whenever I have occasion to refer to Alexander Stanhope St. George, he's only a few keystrokes away, and I don't have to remember whether it was, say, "St." or "Saint".
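If you mashed the two features together in code, the core of it wouldn't amount to much more than this sketch (the field names and remembered values are invented):

history = {"tags": ["St. George", "standards", "search", "security"]}

def suggest(field, typed):
    # Return remembered values for this field matching what's typed so far.
    prefix = typed.lower()
    return [v for v in history.get(field, []) if v.lower().startswith(prefix)]

print(suggest("tags", "st"))   # ['St. George', 'standards']
print(suggest("tags", "sea"))  # ['search']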

This is exactly the kind of small-scale memory that makes UIs more intelligent, and as I've argued before, "intelligence" here isn't just metaphor. The UI is intelligent in the same sense as an animal is. I'll hasten to add that smart fields like the ones I'm describing are of course not intelligent in the classical AI sense of being able to pass a Turing test. For me, that just says there are more kinds of useful intelligence than being able to converse like a human. But that's a whole other long discussion.

Note that both of these features can be handled either locally (as when the browser remembers addresses you've typed into "Address" fields) or remotely (as with Wikipedia's article titles) [More accurately, the knowledge can be stored locally, remotely or both, and the processing of that knowledge can also happen locally, remotely or both, mostly mix-n-match]. While the basic features don't inherently require AJAX, AJAX can certainly make them more widespread and useful.

Monday, December 1, 2008

LISP history from Olin Shivers

It may have seemed like I was bagging on BodyNet in a previous post. In fact, I was impressed by the paper when I first read it (a few years after it came out, I think), and I'm still favorably impressed. Getting the details of the future somewhat wrong fifteen years before the fact is still pretty good work.

Shivers has done quite a bit of other outstanding work, most of which will be incomprehensible unless you're a LISP-head or other such programming language geek. His History of T is probably in that category as well, and it has pretty much nothing to do with my theme of "figuring out the web as I go along", but so what? It's a fascinating account for a sometime LISP-head PL-geek and I reserve the right to drop stuff like this in from time to time.

I'm also hoping the piece is still interesting if you replace unintelligible phrases like "lexical scope" and "removing a cons cell from the free list" with "peanut butter". What's left is a bunch of now well-respected researchers in their early days, bouncing around the finest institutions in the US with stops in Silicon Valley and elsewhere, leaving significant discoveries in their wake.

And a lot of peanut butter.

Wednesday, November 26, 2008

Brachiating through the web

In a previous post, I needed to show two items of a list, intersperse some text and then resume the list with item 3. I knew there was an incantation for this, but I couldn't remember what it was. So I visited my old friend WebMonkey, whose HTML cheat sheet has remained unchanged for many years, but which still proves useful from time to time (WebMonkey also has more current material, but, leaving my webmastering to others wherever possible, I don't find myself referring to it).

Unfortunately, Ye Olde Cheate Sheete only documents HTML 2.0 or so. So I then fell back on my other standby, googling "HTML RFC". That brought up the RFC for ... HTML 2.0 (RFC 1866), dating to 1995. That's as far as the IETF goes. If you want more up to date than that, you have to go to the W3C. Sure enough, they have the HTML 4.01 spec, and that has the lowdown on lists [*], including the advice that I shouldn't be giving list items numbers anyway. I should be using stylesheets. Unless I should really be using XHTML. Oh well.

What caught my eye, though, was the definition given there of the Web:
The World Wide Web (Web) is a network of information resources.
It then goes on to mention the intertwined roles of URIs, HTTP and HTML. That seems impeccable, as far as it goes, and you can't question the source, but it tends to leave one wanting more. Which is why I don't feel too bad about having tried to go further, once or twice (or thrice).

[* What I really did was compose a new email with Thunderbird, use its GUI to set a list to start at item 3, save the result as a file and discover that the magic words are
<ol start="3"> ...</ol>
It took a couple of tries to get that to show up correctly, but that's a different story]

CD Player. Comes with music.

This is take two of the post I was trying to write when I ended up writing about BodyNet instead.

Technically, there's not a lot of difference between a cell phone and a streaming audio player. Throw in some flash memory and downloaded tunes are no problem either. Add a screen and you can say the same thing for video. But how do you get the content to the phone? Two models spring to mind:
  1. A big happy open web-driven marketplace. Surf wherever you want. Find something you like? Download it to your phone just like you'd download it to your PC. Pay whoever you need to when you download (or pay for a subscription). This is pretty similar to the CD/DVD market. Sounds nice, but as far as I know you can't do it. It's a lot easier to do DRM on a captive device like a cell phone, and cell phone makers are pretty aggressive about making sure you don't tamper with their devices.
  2. A collaboration between the content owners (i.e., studios and record labels, not to be confused with singers, songwriters, screenwriters, actors etc.) and the service providers. Subscribe to a service and you can also download or stream content from whatever content owners the provider has partnered with. This is pretty similar to the cable TV model. It ensures that everybody gets a cut (as always, we can argue over who gets what cut) and a number of partnerships have formed.
There's another model that doesn't come to mind because when you try to map it back to "old media" terms, it doesn't really fit. Yet there are at least two examples going, one of them recent:
  3. The cell phone makers sell the content. As the title suggests, this seems like selling a CD player and then selling the CDs to go with it. You see this in niches (e.g., Disney makes an MP3 player and sells plug-in cards with songs from their artists), and I wouldn't be surprised if some early phonograph maker tried it, but it doesn't seem like a great idea. Selling electronic widgets and selling bits are just two different things. Nonetheless, it certainly worked for Apple and the iPod/iPhone, and now Nokia is trying the same approach with Comes With Music (TM). It's not quite the same model as iPhone -- for a subscription fee, you can download all you want and keep it forever -- but it does share the feature of putting the phone maker in the content business.
So maybe they know something I don't. Wouldn't be the first time.

Tuesday, November 25, 2008

SearchWiki: addendum

It looks like the top hits for "SearchWiki" are heavy with "How do I turn this thing off?"

(Not) announcing SearchWiki

I'm not sure exactly when this happened. It seems recent, but maybe I'm just slow to notice. In another one of its quietly-snuck-in tweaks, Google has added a couple of widgets to its search results: "promote" and "remove".

The function is pretty clear: move this item up the list, or weed it out entirely. But over what scope? Ah, there's another clue: at the bottom of the page is something about "SearchWiki". And there's a "learn more" link.

Aha. Your choices and comments are kept with your account and re-used whenever you do overlapping searches (or as a special case, repeat the same search). You can also add links, make comments, and see what others have done (in the aggregate, I would expect).

Looks interesting, and harmless enough in its current form. Wonder if I'll end up using it.  [... and it's gone.  Not too long after it came along, if I remember right --D.H. May 2015]

Saturday, November 22, 2008

BodyNet fifteen years after

About fifteen years ago, Olin Shivers looked at a typical tech-savvy professional of the day carrying a pager (remember those?), a cell phone, a "digital diary", a keyless car remote, a notebook computer and a Walkman (remember those?) and concluded "That's one headset, two communications systems, four keyboards and five displays too many." You typically didn't have a good mobile web connection in those days, either. Shivers then went on to describe a more modular collection of pieces communicating through a short-range network he dubbed "BodyNet".

Fast-forward fifteen years. Are things any better? Well, yes. Did BodyNet happen? Well, depends on how you count.

These days you can get a portable thingie [see also "pocket-thing"] that will let you make phone calls, download and play music and videos, get and answer email, surf the web, keep your calendar and use GPS (or other means) to tell you where you are. You can also download other widgets/gadgets/apps/whatever-you-call-them that will let you do all manner of other things (or at least, play games).

There is even a short-range network standard, namely Bluetooth, that you can use to attach accessories to the thingie, including a headset that will let you talk hands-free, albeit at the cost of sometimes looking like you're having a conversation with the wall or your invisible friend Harvey the Rabbit. Except for its somewhat broader range, Bluetooth looks remarkably like the BodyTalk Shivers describes. I doubt that's a coincidence.

So: The hodge-podge of mobile devices one carries around has consolidated. There is now a short-range, personal, body-sized network. This network can connect to the Web. There is a market for devices to plug into that network. So BodyNet happened, right?

Not really. The original BodyNet was meant to be a mix-and-match affair in which you get the pieces you need a la carte and plug them together. Shivers specifically argues against a "monolithic" approach:
We do not believe in this [monolithic approach] over a broad class of users [...] Individual users will persist in remaining individual -- the system requirements of one will not satisfy the needs of another.
But the monolithic approach is just what we have today. "Phones" pack in more and more features. Worse (from the geekly perspective), they tend to do so in a very closed-system sort of way. The cell phone makers and service providers want very tight control over what you can and can't do with their product/service. True mix-and-match is restricted to accessories like headsets.

This seems to be a recurring blind spot for us geeks. We're taught "clean interfaces", "standard protocols" and "modularity, modularity, modularity." We forget that most people don't want to pick their favorite components and plug them together. The best selling stereos (do people still listen to stereos?) are all-one-piece. We have software "office suites" because no one wants to find out how well Word Processor A works with Spreadsheet B. Even on the wild and woolly web, people like portals and mashups that do the grunt work of pulling things together.

Phone makers sell all-singing, all-dancing phone/modem/GPS/email/web/music/video/... devices because people want them. They want the features, not the joy of putting them together.

Thursday, November 20, 2008

Is Vermont the new Delaware?

Huh?

It's all explained in the CFO.com article Vermont wants to be the "Delaware of the Net".

If you've dealt with the incorporation of a company (or you've already heard about this in the past few months), you probably know what this is about. Delaware has structured its laws in such a way as to be particularly friendly for companies seeking to incorporate. I've worked for at least one Delaware corporation (it was based in Silicon Valley). Sort of the Liberian ship registry of the US corporate world.

Vermont is trying to do the same for "virtual corporations" -- those without physical offices, paper filings and other physical artifacts one might expect of a company. In doing so it's trying to duplicate its success in captive insurance [*]. It's an interesting idea, but I'm curious just how much of a competitive advantage Vermont would really have. There are more factors to consider than whether papers have to be filed on paper. For example, credit card companies love Delaware for its lax usury laws. One could conceivably start a virtual credit card company, but would one want to charter it in Vermont or in Delaware?

On the other hand, it's hard to see what Vermont stands to lose. So why not give it a shot?

[*]I had to look that one up. Wikipedia informs us that a captive insurance company is a subsidiary that exists primarily to insure its parent company. Seems somewhat circular, but the captive always has the option of purchasing re-insurance from independent insurers.

Wednesday, November 19, 2008

How disruptive is online advertising?

A Forrester Research report quoted in a Wall Street Journal article, neither of which seems easy to access on line, says that Kids These Days spend more time online now than they do watching TV (apparently a good chunk of that time is spent gaming). Widespread adoption of broadband connections (or at least, considerably-faster-than-dialup connections) has been a big driver for this.

This is causing both major advertisers and major online advertising players to re-think how best to reach consumers in the new higher-bandwidth net.world. In the case in point, Procter & Gamble and Google are going so far as to exchange employees, the better to understand each other's cultures and outlooks. An odder couple you couldn't ask for, and yet it appears to make business sense.

On the one hand, this is just another chapter of the "How do we make money off this 'web' thing?" saga that's been playing itself out slowly but surely for the last decade or so. But on the other hand, it gains a bit more urgency when rephrased as "We need to make money off this 'web' thing. The other stuff is drying up!"

My feeling continues to be that the web will have much more impact on how companies make money than on which companies are making it. That certainly seems to be the lesson from the dot.com boom and bust: WebVan folded but brick-and-mortar grocery stores still take orders online. EToys got bought out by KB Toys and Walmart and Target went online. Not to say that new companies haven't sprung up -- Google, Amazon and EBay come to mind -- just that old companies adapting has been more the norm.

In fact, the how is not necessarily changing that much. P&G still makes money selling soap, grocery stores still sell groceries and toy stores still sell toys. The big difference is in how they reach their customers. And even then, an ad still looks pretty much like an ad and an online catalog still looks a lot like a catalog.

Tuesday, November 18, 2008

Another size for URLs

I previously claimed that URLs come in three sizes: small, large and monster, of which only the last is actually a full-fledged URL.

Some time before that, I did a post on tinyurl. So that would be another size of URL: tiny.

Except, by my reckoning, tiny URLs are actually large -- they're definitely not "small" or "monster" -- with the twist that, unlike the examples I gave, tinyurl URLs are real live URLs.

However you slice and dice the categories, tinyurls seem to have found a very natural habitat in twitter.

Old-school image processing, or Moon pictures remastered

Before anyone set foot on the moon, it was considered important to survey the place. To this end, NASA sent probes fitted with cameras and the ability to beam back pictures. There being no CCDs at the time, the probes actually used film, developed it and scanned the resulting prints for transmission. The transmissions were recorded on magnetic tape as a backup, thanks to the foresight and efforts of Charles J. Byrne; the preferred mode of storage was to print the pictures and store the prints. What most of us saw the first time around, including the famous Earthrise image, was actually third-generation material: reproductions of photos of those prints. [Or fourth-generation if you count the original film up in space.]

Twenty or so years later, Planetary Data System co-founder Nancy Evans, then at JPL, took the tapes into her care and started a project with Mark Nelson to find and refurbish tape drives that could read the old tapes. The project stalled for lack of funds. Evans retired from JPL to become a veterinarian and stored the tape drives in her garage.

Another twenty or so years later, Evans retired as a veterinarian and went looking for someone to take the drives off her hands and, hopefully, put them to their intended use. Dennis Wingo and Keith Cowing took on the job, moved the drives into a disused McDonald's in NASA Ames' research park and set about getting them working again. This involved a lot of cleaning, swapping of parts and working with circuits whose components were actually large enough to see and handle. It took them 99 days, but they got the thing working.

Even better, the results are now on the web, as is the more complete account I'm summarizing.

The web has acquired another significant chunk of history -- the digital images the probes would have sent back if they could have, and if there had been any place to put them.

Most definitely a neat hack.

A friend is a friend is a friend ...

... or at least from the point of view of LinkedIn, but I think they're typical.

It's not unusual to have dozens of links ("friends") on a social networking site, even if you're not trying that hard. If you are trying, you can easily get hundreds. Are these all close personal friends, people you'd walk through fire for if they so much as asked you to? Probably not. Some of them are going to be closer than others, but there doesn't seem to be any way to indicate that.

Should there be? On the one hand, it would be useful, when chasing through your connections, to have some idea of whether friend A's friend B is someone A knows really well, or just an old schoolmate who happened to extend an invitation and, well, you wouldn't want to just say no for no reason, would you? So why not let members assign a degree of "closeness" to any friend? The resulting graph would be richer and more informative, and not appreciably harder to handle from an algorithm-geek point of view.
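From that algorithm-geek point of view, the richer graph might look like this -- a minimal sketch with invented people and ratings:

graph = {  # person -> {friend: closeness on a 1-5 scale}
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 4},
    "C": {"A": 2},
    "D": {"B": 4},
}

def close_friends_of_friends(person, min_closeness=3):
    # Friends-of-friends, but only along links rated at least min_closeness.
    results = set()
    for friend, w1 in graph.get(person, {}).items():
        if w1 < min_closeness:
            continue
        for fof, w2 in graph[friend].items():
            if fof != person and w2 >= min_closeness:
                results.add(fof)
    return results

print(close_friends_of_friends("A"))  # {'D'} -- C's links are too weak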

But would this really help? Everyone would have to make a snap judgment about "closeness" every time they added a link, and everyone will have their own slightly different idea of how "close" (say) a "5" is. Worse, ratings will almost certainly change over time, particularly on a purely social site like MySpace or Facebook. Keeping the "who's in/who's out" numbers up to date could turn into a major timesink, not to mention an intricate political maze (but maybe that's what a lot of people are looking for on the sites in the first place?).

On the other hand, should you be looking at your connections' connections in the first place without talking to the person in the middle? Even if you decide to do that, there are still other cues to go by. On LinkedIn, for example, you can compare people's profiles and get some idea where they intersect, and you can look for recommendations. I would expect MySpace and Facebook have more finely-developed mechanisms and conventions, but I don't know first hand (see previous comment on timesinks).

[The friend model doesn't seem to have changed much, but language and custom have adapted -- a Facebook friend is probably not your BFF -- D.H. Dec 2018]

Monday, November 17, 2008

URLs, URLs and URLs

It occurs to me there are basically three sizes of URL:
  • Small, like foocorp.com. These aren't even really URLs, but your browser is smart enough to figure out you mean http://www.foocorp.com/index.html or whatever it really is.
  • Large, like www.foocorp.com/widgets. Still not really URLs, but the browser will kindly prepend the http://.
  • Monsters, like http://fieldnotesontheweb.blogspot.com/2008/07/again-just-what-is-this-web-thing.html or http://www.blogger.com/post-create.g?blogID=2129929182918599848. These are actual URLs as defined in the standard.
The small size is what everyone thinks of as a web address. No one wants to type more than that into the browser. Sometimes you can get away with a large URL, if both parts make sense and you're pretty sure people are really interested in what you're saying.

Real URLs are not fit for human consumption, except maybe for cutting and pasting. They might as well all say http://dontreadthisyadayadapeanutbutter. If you actually have to read a real URL, and you're not actively committing web-geekery at the time, something has gone wrong.
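To be fair, the machine's-eye view of a monster is perfectly tidy; it's only human eyes that glaze over. Python's standard library, for one, takes monsters apart without complaint:

from urllib.parse import urlparse

parts = urlparse("http://www.blogger.com/post-create.g?blogID=2129929182918599848")
print(parts.scheme)  # http
print(parts.netloc)  # www.blogger.com
print(parts.path)    # /post-create.g
print(parts.query)   # blogID=2129929182918599848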

[Side note: Some, notably the standards authors, use "URL" as the plural of "URL", evidently on the grounds that URL may stand for "Uniform Resource Locators" just as well as "Uniform Resource Locator". This may be standard, but it's not the usual way acronyms and initialisms form their plurals. I shan't use it thus.]

What's the difference between the Web and HTTP?

The web, being whatever we want it to be, is already up to 2.0.

HTTP, being a tangible thing, is only on 1.1.

Sunday, November 9, 2008

3xx REDIRECT

[If you got here by googling "3xx redirect" or similar, you may want to consult the HTTP 1.1 spec.]

I'd originally titled this post "In praise of 3xx REDIRECT" and led off with a couple of quotes, one from Bjarne Stroustrup (of C++ fame), about levels of indirection in computing science.

Then I tried to chase down the Stroustrup quote and discovered that he was probably quoting someone else when I heard it. Then I tried to chase down the other one, with even less luck.

Then I turned to investigating 3xx REDIRECT itself.

Now, while I'm not going to retract my claim that 3xx is praiseworthy, it turns out that there's a difference between the nice warm fuzzy abstract feeling that indirection is useful and the, um, interesting reality of what happens in the World's Most Famous Protocol as the web grows explosively, browser battles browser and search engines try to index everything in sight. "It turns out" is a math-major euphemism for "I should have realized".

OK, for those who don't spend their time poring through RFCs and other technical documents, what is this "3xx REDIRECT" thing? As I said, the idea is simple. It's a way for a server on the web to send back a message that says "What you're looking for, it's not here. It's actually over there." In other words, it's a forwarding facility, entirely analogous to mail forwarding or call forwarding or the sign on the front of a shop that says "We've moved across the street."
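In code, the shop sign takes only a few lines. A minimal sketch using Python's standard library (the addresses are made up):

from http.server import BaseHTTPRequestHandler, HTTPServer

class MovedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-shop":
            self.send_response(301)  # Moved Permanently
            self.send_header("Location", "http://example.com/new-shop")
        else:
            self.send_response(404)  # Not Found
        self.end_headers()

HTTPServer(("", 8000), MovedHandler).serve_forever()

A browser that gets the 301 follows the Location header without bothering the user, which is both the beauty of the thing and, as we'll see, part of the trouble.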

In web land, every HTTP request returns a three-digit status code, an idea stolen from FTP (File Transfer Protocol) or wherever FTP stole it from, because it's a fine idea well worth stealing. Codes in the 200s, like "200 OK" say "It worked". Codes in the 400s, like "404 Not Found" and the particularly harsh "406 Not Acceptable" say "It didn't work and it's your fault." Codes in the 500s, like "500 Internal Server Error", say "It didn't work and it's my fault." [*]

The 3xx codes say "It's not here, but here's where you can find it." There are several variants. The main division is between "301 Moved Permanently", which says you should forget all about the old address and use the new one, and everything else, which doesn't. Two of particular interest are "302 Found" and "307 Moved Temporarily".

Now, if 301 is "Moved Permanently", wouldn't you expect "Moved Temporarily" to be right next to it at 302? Indeed it was, in HTTP 1.0. Unfortunately [**], not everyone treated 302 as it was specified and in HTTP 1.1 302 became "Found", meaning (sort of) "I found what you wanted, but not here." and 307 became the new 302 (the subtlety on the wire: browsers had taken to quietly re-issuing a redirected POST as a GET, so HTTP 1.1 added 303 "See Other" to bless that behavior and 307 to forbid it). Worse, at least some server setups will use 302 by default for any redirection unless you tell them otherwise.

As a result, 302 is now hopelessly overloaded. It might mean what it originally meant. It might mean what it's officially supposed to mean. It might even mean something else, like "moved permanently, forget you ever knew that old address" but the webmaster neglected to say so explicitly. And yet, the web goes on working its wonders.

Standards. You gotta love 'em. Any standard that sees real use is really three things:
  1. What the document says
  2. What the implementations do (based in part on what people think the document says)
  3. What everyone thinks the implementations do
And of course, (2) depends partly on (3), and the next version of (1) is generally influenced by both.

[*] The astute reader will point out that I omitted 1xx. The astute reader will be right, as usual.

[**] I'm by no means an expert on what web servers, browsers and crawlers actually get up to. I'm relying here on stuff I've heard, or gleaned from a bit of googling, and particularly on this lengthy writeup, or at least the part of it I actually read.

Thursday, October 30, 2008

Musopen: Making money on free music

Also on the UCSD music blog I mentioned (which, on closer examination, turns out to be, ahem, the "music" category of the UCSD Arts Libraries blog) is a post on Musopen, a free classical music site that, among other things, will let you pledge toward getting someone to record the public domain work of your choice and place the recording in the public domain (They call it "bidding", but I'm not sure that's quite the right term to use -- generally a bid is for the whole price and the highest wins).

The performer sets the price (e.g., $60 for Für Elise or $3500 for Mozart's Requiem). You tell them how much you'd be willing to pay to see it recorded. When the total pledges match the asking price, the transaction goes through, the performer performs and everyone in the world can listen for free, whether they contributed or not. This is effectively one of the business models Stallman mentions in the GNU manifesto.
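The mechanics are almost embarrassingly simple. Here's the whole model in a few lines, using the numbers from the examples above:

def add_pledge(pledges, name, amount, asking_price):
    # Record a pledge; report whether the recording is now funded.
    pledges[name] = pledges.get(name, 0) + amount
    return sum(pledges.values()) >= asking_price

pledges = {}
print(add_pledge(pledges, "alice", 40, 60))  # False: $40 of $60 pledged
print(add_pledge(pledges, "bob", 25, 60))    # True: $65 meets the price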

The economics of this look interesting. I doubt anyone is going to get rich off of it, but there would appear to be some value to people in making a recording happen and free riders be damned.

[Musopen is still around, but without the business model.  The business model is still around, without Musopen, under the name of Kickstarter, of course --D.H. May 2015]

Wednesday, October 29, 2008

IMSLP and international copyrights

While digging through the links in a so-far fruitless search for evidence that the Worldscape Laptop Orchestra still exists, I ran across an interesting blog on music, hosted by the University of California at San Diego. Along with posts on local goings on and stuff like cool-looking random musical instruments, it contained this interesting/alarming post on IMSLP (the International Music Score Library Project).

This site was basically a carefully-moderated wiki to which people could upload music scores. The Canadian administrator of the Canadian-hosted site had been very careful only to put up scores in the Canadian public domain, and the Canadian Supreme Court had ruled that such sites are entitled to presume that they are being used in a lawful manner.

Austrian music publisher Universal Edition sued on the grounds that it had put up works (it didn't say which particular ones) still under European copyright. At stake, then, is the question of whose copyright law applies: the host or administrator's or (effectively) the longest copyright period on the books anywhere. The situation seems entirely analogous to that of international libel laws.

As a result of the suit, the site was taken down in October 2007. At this writing, the IMSLP is back up, having been put back up in June 2008. The re-opening letter accompanying the re-launch is well worth reading. Among other things, it discusses (and endorses) the idea of collaborating with commercial music publishers.

No, I won't help you crack Worldscape

We have a new record holder for most random Google search to bring up a Field Notes post. As of this writing, "how to guess someones password on worldscape" pulls up 80 hits, of which the second is the Field Notes archive for November 2007.

Though the preamble at the top of Google's cached version claims that someones and password "only appear in links pointing to this page" (one day I shall track down what they're driving at with that), both password and the uninflected someone appear in the page, along with how, to, guess, on and worldscape.

In other words, Google is perfectly correct in bringing up this page. Normally, it would have put millions of other articles ahead of such a hit in the list, but in this case there just don't seem to be that many candidates to choose from. Sometimes dumb can only be so smart. Fortunately for our searcher, some of the other hits have to do with the art of guessing passwords, a good thing to read up on next time you have to make one up.

For the curious and/or obsessive, here's the rundown:
  • guess shows up mostly in the articles on anonymity, with regard to guessing someone's identity, and otherwise where I say I'm guessing (four times in that apparently uncertain month)
  • password shows up in the post on trusted computing
  • someone shows up because it's a reasonably common word. It's probably a bit more common than chance in the anonymity articles
  • how and to are most likely dropped on the floor, but of course they appear in numerous places
  • And the punchline: worldscape refers to the Worldscape Laptop Orchestra, probably not what the searcher was trying to gain access to. But I do see a note there that I need to find a better link. Don't they have a home page yet?

Tuesday, October 28, 2008

What do I mean "over the net"?

Re-reading, I notice I said in the post on tinyurl that URLs are "specifically designed to be written down, printed, put on billboards, read over the phone and otherwise transmitted non-electronically."

That's not quite right. Reading something over the phone is transmitting it electronically. How about "non-digitally"? As far as I'm concerned, text is digital, whether you put it on the wire, store it on a disk or spell it out in print. OK, how about "not transmitted over the net"? Well, OK, but what about VOIP? A significant portion of phone traffic goes over the net.

How did the RFC I was referencing get around this? It says a URL "may be represented in a variety of ways: e.g., ink on paper, or a sequence of octets in a coded character set." In other words, it talks about representation and not transmission. That's a good call for the RFC, but it leaves me out in the cold.

OK, so what's different about sending a URL in an HTTP request and reading it over the phone to someone over what happens to be a VOIP connection? Clearly the VOIP-ness of the connection is accidental, not essential, whereas HTTP has to run over a TCP connection. It's accidental because the audio stream is opaque to the software. The VOIP infrastructure will carry conversations that contain URLs and ones that don't with equal ease. In the case of an HTTP request, the URL is very much visible and needed.

Is this all just quibbling over definitions? Sort of. But the same problem crops up in other, more practical situations. An unstructured block of text is semi-opaque to the web infrastructure. You can scrape it and parse it and extract nuggets of meaning, but that takes a lot of effort. About the only thing you can really do with it easily is index it using what are largely crude but effective methods. Images and audio are even more recalcitrant.
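Pulling URLs out of an otherwise opaque block of text is about the easiest scraping there is, and even that is a heuristic. A crude sketch:

import re

URL_RE = re.compile(r"https?://[^\s\"'>]+")

text = "The spec is at http://www.w3.org/TR/html401/ if you're curious."
print(URL_RE.findall(text))  # ['http://www.w3.org/TR/html401/']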

As I understand it, the semantic web and its cousins like microformats are attempts to dig out otherwise obscured information and present it in machine-friendly form. From that point of view, if I were able to, say, press a "mark URL" button on my phone and cause the URL to be extracted via speech recognition, the phone call would cease to be opaque and would become at least that bit more webby. If it were stored somewhere accessible by a URL along with any extracted information, it would be a full-fledged resource.

Not sure there's anything deep or notable in all that, but there it is. I should also note that I previously touched on the metadata-scraping theme from a slightly different angle in this post.

Cox and cell phones

Frankly, I'm not up to speed on this one, but I saw it float by and thought I'd mention it. Cable provider Cox Communications is looking to get into the cell phone business. Since like most cable companies they're already in the broadband internet business, and they also provide wireless internet, this seems like a bid to expand its presence into yet another means of delivering bits.

I have no idea whether this will work. If it does, you've got one-stop shopping for pretty much everything people do with bits these days and it's a brilliant coup. If it doesn't you've got a company with more fingers in pies than it has fingers and it's a colossal blunder. Judgment will be delivered in retrospect, as usual ...

[The Wikipedia article on Cox calls it the third largest cable provider and the seventh largest telecoms provider in the country.  Looks like it worked out OK.  Even if on a later sweep through Cox is defunct, I don't see how you could blame it on the decision to enter the cell phone business. --D.H. May 2015]

Christian Science Monitor to drop print edition

The Christian Science Monitor has just announced that, starting in 2009, it will no longer publish a print edition but will shift to "an online publication that is updated continuously each day." Since this would describe most papers' online editions (including the Monitor's, I would think), the upshot is that they're dispensing with the print edition. I heard about this on American Public Media's Marketplace, which pointed out that the Monitor is something of an outlier, since it's a non-profit publication funded by a church.

As such, the Monitor is more concerned with maximizing exposure while minimizing cost. That sounds an awful lot like the usual newspaper business model, except for the small detail of advertising revenue. In other words, it's nothing at all like the usual newspaper business model.

Except, maybe it's not so different. It's not the Daily Monitor, it's the Christian Science Monitor. The whole idea is to get the Christian Science "brand" out there. In that sense, the Monitor's model is also advertising based, albeit with a single, fairly deep-pocketed and indulgent advertiser. So ... maybe it's not such an outlier.

We shall see.

Sunday, October 26, 2008

Tinyurl

(I thought a short title was appropriate)

One of my favorite measures of how well something is doing on the web is whether I see it mentioned in the news (excluding, say, the San Jose Mercury News). Today I saw a couple of tinyurl links in the Sunday paper.

Tinyurl is one of those cool ideas that I personally don't find much occasion to use. In fact, I had forgotten they were around, but I'm glad they still are. The whole concept is just so, well, webby. You can explain in a short sentence what it does: Provides nice short URLs that work in place of the monsters you sometimes run across. You can also explain in a short sentence how it does it: Keep a database of short identifiers and long URLs and use HTTP's redirect facility to make one stand in for the other.
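Just to make those short sentences concrete, here's a rough sketch of the whole mechanism in Python. Everything here is invented for illustration -- the real service surely worries about persistence, collisions and abuse, among other things:

    # A toy URL shortener: a table mapping short codes to long URLs,
    # plus an HTTP redirect to make one stand in for the other.
    import string
    from http.server import BaseHTTPRequestHandler, HTTPServer

    ALPHABET = string.ascii_lowercase + string.digits
    table = {}  # short code -> long URL

    def shorten(long_url):
        # Hand out codes in sequence, base-36 encoded.
        n, code = len(table), ''
        while True:
            code = ALPHABET[n % 36] + code
            n //= 36
            if n == 0:
                break
        table[code] = long_url
        return code

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            long_url = table.get(self.path.lstrip('/'))
            if long_url:
                self.send_response(301)  # HTTP's redirect facility at work
                self.send_header('Location', long_url)
                self.end_headers()
            else:
                self.send_error(404)

    # shorten('http://example.com/some/monstrously/long/url?with=parameters')
    # HTTPServer(('', 8000), Redirector).serve_forever()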

It's so fundamentally cool that I'm surprised that I don't use it. Most likely it's because I send and receive most email in URL-friendly HTML, and of course Blogger is right at home with any URL-from-the-deep. But as the RFC makes clear, URLs aren't just intended for computers to see. They're specifically designed to be written down, printed, put on billboards, read over the phone and otherwise transmitted non-electronically. That's not something I seem to do much, but it's a good place for tinyurl. My newspaper seems to agree.

[Shortened links seem to have found a nice niche on Twitter -- D.H. Dec 2018]

Monday, October 20, 2008

Radio bookmarks

It's fund-raising season at NPR again, and some stations are giving out "radio bookmarks" as premiums. What's a radio bookmark? It's basically a USB device that you can push a button on while you're listening to the radio. It will record enough information for you to go to the station's website when you next go online, find out what was on just then, and even play it back.

In other words, it records the time.

Sort of disappointing when you put it that way, but on the other hand there's no rule that says an idea has to be complex for it to be at least moderately useful. Back in the dotcom days, the same idea was supposed to make Sony a lot of money. Dan Bricklin has a review of it (then called "emarker") from Comdex 2000. The original emarker could store (drumroll, please) up to 10 events, which seems ridiculously small even for the times, but what do I know?

Bricklin makes the point in real time that I was going to make eight years after the fact, that the setup "puts the appropriate intelligence at the right places". In other words, the device itself is dumb. The intelligence comes from the underlying database via the web. I agree with Bricklin that this is the right way to do it, if only because you can easily correlate with other databases as well (video springs to mind). In a way it's a good example of "dumb is smarter", though in this case it's a particular piece, not the system as a whole, that's deliberately dumb.
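To put a little flesh on "the intelligence comes from the underlying database", here's a rough sketch in Python. The playlist and times are made up for illustration; the real system presumably consults an actual broadcast log:

    # The dumb device contributes only a timestamp; the smart side looks
    # up what was on the air at that moment.
    from bisect import bisect_right
    from datetime import datetime

    playlist = [  # (start time, program), in broadcast order
        (datetime(2008, 10, 20, 9, 0), "Morning news"),
        (datetime(2008, 10, 20, 10, 0), "Interview hour"),
        (datetime(2008, 10, 20, 12, 0), "Music show"),
    ]
    starts = [start for start, _ in playlist]

    def what_was_on(stamp):
        i = bisect_right(starts, stamp) - 1
        return playlist[i][1] if i >= 0 else None

    print(what_was_on(datetime(2008, 10, 20, 10, 37)))  # Interview hour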

It's a separate (and interesting) question whether the emarker/radio bookmark "uses the Internet in ways that show the future". Eight years on, I'm thinking not so much, but then the question becomes "why not?". The general idea of connecting a fairly dumb device up to the web to get a smart system overall seems good. Maybe there are other examples hiding in plain sight?

Thursday, October 16, 2008

Intellectual Property -- the very idea

I just went through and tagged a bunch of posts intellectual property, including all the ones tagged copyrights or DRM, and several tagged copy protection (there is some overlap, naturally). We now have a new number one with a bullet on the list of tags.

I'm not crazy about the term "intellectual property". It hash-collides with "internet protocol" (both shorten to "IP"), its exact meaning is not particularly obvious, and the usual connotations of "intellectual" feel a bit out of place here. But it's certainly useful to have a general term for, well, what shall we say ... "data having economic value"? I'm not particular about which exact term we use, so long as we more or less agree on what it means and when to use it.

Actually ... one of the great things about the web is it's so easy to look things up. Dictionary.com gives the Random House Dictionary's definition, which includes the key phrase "property that results from original creative thought," and -- this was a pretty big surprise to me -- traces it back to the 1840s. So, while we might object to the term as "marketing speak" or to the whole concept on the basis that "information wants to be free", the term has a legitimate pedigree and is clearly here to stay.

A quick take on the IP czar

Amidst everything else that's been going on in Washington DC lately, congress recently passed the "PRO-IP" bill (here's the Senate version). One of the many things it does is establish a cabinet-level "Intellectual Property Enforcement Coordinator". Apparently the Justice Department was not best pleased with this, but (as always keeping in mind that I'm not a lawyer), it doesn't seem completely out of line that such a post would exist. Regulation of patents, at least, is on the short list of powers granted to congress in Article I, Section 8 of the constitution, so this is not like establishing a cabinet-level post for, say, design of postage stamps.

On the other hand, we seem to have done fine for a couple of centuries without such a post, and as always the devil is in the details. This being legislation, there are a lot of details. I'm still not a lawyer; legislation is written as deltas against the mammoth U.S. Code, making it essentially impossible to just read through a bill and know what it means; and as far as I can tell the emphasis of the thing is on piracy, not on rationalizing IP law in general. So I'm not prepared to say whether we might need this particular realization of the idea.

That said, it's at least worth noting that such a thing has happened. New cabinet-level posts don't get created every day.

Monday, October 13, 2008

Is the web saving trees?

The latest Economist has not one but two articles -- from its print edition, mind -- on the "Paperless Office". The first is more about the fact of the increasingly paperless office; the second delves into the implications.

Back in the 1980s, when PCs started to take off and people used terms like "Desktop Publishing", it was obvious that before long there would be no need for paper in an office. Why xerox a memo when you can send an email? Why keep a paper ledger when an Accounts Receivable application will do all that for you? Why work up a spreadsheet when you could use, well, a spreadsheet?

By the year 2001, which saw the publication of a book called The Myth of the Paperless Office, it was clear that the hype was not going to pan out. People were using more paper than ever. A lot of places wanted a physical, paper backup of important files. Some people -- myself included -- just wanted to see certain things on paper, things like email (for executives) or code (for geeks like myself). In my case, I had a hard time navigating a 1000-line code listing on a 24-line screen. More on that in a bit.

Except the year 2001 was also when paper consumption in the office peaked. It turns out that people really did start to use less paper. Email really has replaced memos, and so forth. When people do print things out, they seem more interested in nice, high-contrast color prints (e.g., of photographs or brochures) than in just printing draft text. So what happened? The prevailing theory seems to be that kids these days are just more comfortable with a paperless environment. I'm sure they are, but I don't think that explains it all.

Like I said, I used to have a hard time navigating code without printing it out and scribbling on it. What changed? For one thing, I got more used to editing on a screen. For another thing, screens got bigger and sharper. There is a world of difference between paging through 24 monochrome 80-character lines at a time and looking at more than twice as many nice wide lines. For another, interfaces got nicer. It's much more convenient to search back and forth, to navigate from one place to another, to flip back and forth between several different documents, and so forth, than it was in the 80s.

Navigability is a big deal. In coding, it means I can easily get, say, from where I use something to where it's defined -- one of the big reasons I'd want to print out a bunch of code and lay it out on the floor. For non-geeks, it's the ability to click on a link on a web page and Just Go There. You can't do that on paper. Until the web came along, you (generally) couldn't do that on your screen either. Now it's routine.

In other words, it's not just (or perhaps not even primarily) a societal change. The technology, and particularly the software, made significant advances between the 1980s and the turn of the century. One change -- the widespread adoption of hyperlinks -- was seismic, but many changes -- higher screen resolutions, nicer UI widgets, nicer text formatting -- were incremental.

From that point of view, I'd agree with the second article that technological change and external shocks can bring technologies back into favor, but I'm not completely sold on the societal angle. In particular, I disagree that "The paperless office shows how a sociological shift can make the difference: although the technology did not change very much, its users did." The technology did change quite a bit, just not all at once or all that visibly. The gradual improvements were too gradual to draw notice, the web too pervasive. Who notices the air?

[Nowadays I often go for days, weeks even, without printing anything out, and whole days from time to time without writing anything down on a physical medium.  Along with better displays and better tools, I've gotten more comfortable working without the tactile cue of, say, circling something and drawing an arrow from it to something else.  In other words, it's partly technology, and partly gradual shifts in behavior from using the technology --D.H. May 2015]

Sunday, October 5, 2008

Steven Pinker's notion of Web 2.0

In a commentary on recent happenings in the US economy, linguist/evolutionary psychologist Steven Pinker asserts that "The past decade has shown us that unplanned, bottom-up, productive activity can lead to huge advances in social well-being, such as Linux, Wikipedia, YouTube, and the rest of Web 2.0." I generally like Pinker's work and tend to agree with many of his points, but ...

Personally, I wouldn't call any of these "huge" advances in social well-being, though I'll certainly agree they've moved the ball forward in their own ways. I'm also not sure that "unplanned" is quite the right word and I suspect Linus might agree on that (at least regarding Linux). But that's a separate quibble.

What I find interesting here is that he appears to include Linux as part of Web 2.0 (the sentence is slightly ambiguous -- he might conceivably have meant "YouTube, and the rest of Web 2.0" as distinct from "Linux, Wikipedia", but that seems unlikely given the punctuation and the fact that Wikipedia is very much a web thing).

Leaving aside the finer points of parsing the sentences of linguists, what's interesting about including Linux, and not specifically mentioning "social networking" sites and not even coming close to mentioning technologies like AJAX, is the emphasis on "unplanned, bottom-up productive activity". So to Pinker, Web 2.0 appears to be more about a democratic "open source" ethos than about any particular product, and even less about any particular technology.

While it differs fairly sharply from the more familiar "social networking" or "AJAX" or "social networking plus AJAX" versions of Web 2.0 one runs across, the position is certainly worthy of consideration. I have a particular sympathy for it since I consider technology more a reflection of human nature than a shaper of it.

However, I don't think that "Linux, Wikipedia, YouTube, and the rest" is an appropriate definition for Web 2.0. First, I try to defer to current usage, and most people in the biz don't seem to use "Web 2.0" that way. Second, the very real and useful thing Pinker seems to be trying to capture here considerably predates "Web 2.0".

In fact, it predates Web 1.0. According to Wikipedia (see, I told you it was socially useful), the Web debuted in August 1991. It can, of course, trace its technical roots back earlier, but as a living, wide-scale collaboration it didn't really get going until 1992 or so. Meanwhile, the Linux kernel also debuted in August 1991. I cheerfully admit I had no idea that these two major developments were sprung (quietly) on the world in the exact same month.

If we're litigating here over "which came first, the kernel or the web?" we'd have to declare a virtual tie. But that's not my contention. My contention is that the notion of internet-driven "unplanned, bottom-up productive activity" was well-established long before the web was the web. The announcement of Linux just happens to provide a useful case in point. Consider:

  • Like everything else, Linux did not appear from nowhere. It grew out of the same UNIX tradition as BSD, an open-source UNIX variant with its origins in the late 1970s. One can quibble over whether BSD, with its institutional sponsorship, would qualify as "unplanned", but the 'B' does stand for "Berkeley".
  • Linux is also intimately related to, though decidedly not identical to, Richard Stallman's GNU, first announced in September 1983.
And here's the punchline: Linux, GNU and the Web were themselves first announced on Usenet. Usenet seems at least as good an example of what Pinker is driving at as any of the examples he does cite, including being "productive" in roughly the same sense that YouTube is "productive". In retrospect, Usenet could be considered Web 0.1. Usenet dates to 1979, roughly the same vintage as BSD.

Also of that vintage (depending on just how you count): The internet itself.

In short, Web 2.0 has been around for a few years. Linux is older than Web 2.0 and about as old as Web 1.0. The kind of bottom-up social collaboration Pinker seems to be referring to has occurred over the internet for roughly as long as there's been an internet. In keeping with my theme of human nature shaping technology, I'll leave it to the reader to ponder whether such activity might have been occurring off the internet for considerably longer.

Sunday, September 28, 2008

More on Chinese smart phones

I previously wondered whether the lack of a full-sized keyboard on smart phones would be less of a problem in China than in the western world. So I asked a friend who analyzes the cell phone market and has recently been to China on just such business. He told me that a smart phone with a touch screen is a pretty good match for Chinese. There are two input methods, one more based on picking from a menu and the other on actually drawing the characters. Both seem to work well.

Saturday, September 27, 2008

Virtualizing buzzword compliance

The words of the day are "virtualization" and "cloud computing". The idea behind them is that actual computing -- CPUs crunching through instructions, databases being updated on permanent storage, nuts and bolts computing -- can happen in data centers instead of on people's desktops.

It's "virtual" because the chunk of computing resources you're using is not necessarily tied to any particular physical machine, much less the particular machine on your desktop, on your lap or in your handset. It's "cloud computing" because you neither know care where the actual computing is taking place. It's happening somewhere out in "the cloud".

I first heard someone talk about "the cloud" many years ago, back when bang paths still walked the earth. At the time I was accessing the internet, sort of, by dialing into a friend's system at his lab. That box was connected to more-connected boxes at the university, which in turn were connected to similar boxes at other sites, and so forth. In other words, to connect box A to box B, you had to know at least something about how to get from one to another.

Or, you could get a newfangled connection into "the cloud". If box A and box B both had such a connection, you could go directly from one to the other. Any intermediate connections happened "in the cloud". In other words, it was exactly the kind of connection everyone has today.

I first heard about virtualization even longer ago, though not under that name, in a high school class on business computing. And by business computing I mean COBOL, RPG, radix-sorting of Hollerith cards and similar specimens from adjoining strata. At one point we had a real live computing professional come in and explain to us how a real business computer worked.

He explained that it could be a problem to have, say, everyone trying to print on the same printer at the same time (and by printer I mean a beer-fridge-sized box chewing through wide-format green-and-white-striped fanfold paper). But, he went on, with a little magic in the operating system you could say "disk drive, you're a printer". The print job would sit in a queue on the disk drive until a printer was free and then it would print. If you had five paper-spewing beer fridges, you could divvy the work up among them without caring which one was going to actually print any particular job. Print spooling, in other words.

That was an eye-opener. The task of printing was separated from exactly where or when the printing took place.
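Here's a toy version of the trick in Python -- a queue standing in for the disk drive, threads standing in for the beer fridges. Entirely illustrative; real spoolers are rather more industrial:

    # Print spooling in miniature: jobs pile up in a queue, and any free
    # "printer" takes the next one, no one caring which.
    import queue
    import threading
    import time

    spool = queue.Queue()  # the disk drive pretending to be a printer

    def printer(name):
        while True:
            job = spool.get()
            if job is None:  # shutdown signal
                break
            time.sleep(0.1)  # chewing through the fanfold paper
            print(name, "printed", job)
            spool.task_done()

    fridges = [threading.Thread(target=printer, args=("printer-%d" % i,))
               for i in range(5)]
    for t in fridges:
        t.start()
    for j in range(12):
        spool.put("job-%d" % j)
    spool.join()         # wait until every job has printed
    for _ in fridges:
        spool.put(None)  # tell the printers to go home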

At the same time, in a different class, I could dial into the "time sharing system" at the university and hook up to what looked like my own personal computer (and by hook up, I mean dial up at 110 baud by mashing a phone receiver into a foam receptacle). It ran whatever program I told it to, it remembered my files from last time, and so forth. Except it wasn't actually my own computer. I was really sharing one room-sized computer with everyone else and the computer's operating system was multitasking amongst us (here's a bit more from those days).

That seemed like a really neat trick, and it still does.

Now, you can read this sort of "nothing new under the sun" spiel one of two ways. One is "All this 'cloud computing' and 'virtualization' stuff was done years ago. There's nothing to it." The other is "The basic ideas behind 'cloud computing' and 'virtualization' are so old and useful we forget they're even there until someone drums up a marketing campaign to remind us."

I tilt toward the latter.

Thursday, September 25, 2008

Will web 2.0 "change the shape of scientific debate"?

The subtitle of this piece in The Economist set my spidey-sense tingling: "Web 2.0 tools are beginning to change the shape of scientific debate."

"Which web 2.0 tools are they talking about," I wondered, "and how are those tools changing scientific debate?" I also picked up notes of "it's all different now." (I'm a technologist by training and practice and a moderate technophile by inclination, but I'm also convinced that technology use reflects human nature at least as much as it changes it.)

Come to find out, they mean that people are -- in some cases -- bypassing journal publication as a means for publicizing results, and there is now a meta-blog, Research Blogging, aggregating research blogs. Cool stuff, and well worth a browse, but is it all that big a change?

As I said, I'm a technologist, not a scientist. The scientists I've known, I've known on a personal, non-professional basis. I won't claim to be an expert on the workings of the research community, but as far as I can tell there have always been several different levels of scientific information exchange, ranging from beer-soaked discussions at conferences through personal correspondence to publication in peer-reviewed journals.

My experience as a former member of the ACM is that there are also various flavors of publication: some are lightly reviewed and aimed at speedy publication of interesting ideas (e.g., SIGPLAN Notices) while some have heavyweight formal review processes and are aimed at publishing major developments in definitive form (e.g., Journal of the ACM). Now, one can argue whether computer science is a proper science, but as I understand it computer science research has much the same form as other areas of academic research.

From this point of view, a site like Research Blogging probably fits in between personal correspondence and publication in an unreviewed journal. More a filling in of a niche than a major shift in the landscape.

It also seems good to bear in mind two general trends: First, the overall volume and speed of communication has been steadily increasing for centuries, as has the number of people one can potentially communicate with easily. Second, the scientific community has been growing and becoming more specialized.

In Marin Mersenne's day, one could argue that the (Western) scientific community numbered in the dozens or hundreds, and Mersenne personally corresponded with a significant portion of them. Over time, communication has improved, the community has grown and the overall volume of scientific writing has increased dramatically. From this point of view, the adoption of email, newsgroups and now blogs is just part of a natural progression. For that matter, so is the development of the peer-reviewed journal.

The shape of scientific debate has more to do with the process of research and collaboration, I think, than with the particular means of communicating findings and theories. In that sense, I wouldn't expect the shape of debate to change much at all in the face of web 2.0 or web x.0 in general.

The article does point out an amusing example of the cobbler's children going barefoot, though:
[T]he internet was created for and by scientists, yet they have been slow to embrace its more useful features ... 35% of researchers surveyed say they use blogs. This figure may seem underwhelming, but it was almost nought just a few years ago.
(On the other hand, there were a lot fewer blogs of any kind a few years ago ... now how would one properly account for that statistically?)

[Not long after this was posted, ArXiv came along -- D.H. Dec 2018]

Monday, September 22, 2008

Another data point on immersion

A while ago I estimated the bandwidth required for 3-D Imax at 13GB/s, uncompressed. Today I got the latest newsletter from the California Academy of Sciences bragging about the new and improved Morrison Planetarium. They say this bad boy can blast 300 million pixels per second onto its hemispherical screen. At 32 bits per pixel, that's about 1.2GB/s, or about a tenth of my IMAX estimate (I think I used 32 bits per pixel to ensure a conservative estimate of the bandwidth required. 24 ought to be good enough).

Take out a factor of two since 3-D requires two images and assume that the frame rate is 24 frames/s in both cases. The remaining difference is down to spatial resolution. IMAX is about 10K by 7K, or 70 megapixels, so the Morrison is more like 14 megapixels. In the earlier article I guessed that the 13GB/s could probably be compressed down to 1GB/s, partly because the two 3-D images would be largely redundant. Planetarium-level 2-D video would also compress, but not quite as much. Bottom line, you're still looking at gigabits per second to be able to handle this stuff.
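For anyone who wants to check the back-of-the-envelope figures, here's the arithmetic in Python (all numbers are the estimates from the text, not measurements):

    pixels_per_second = 300e6   # the Academy's claimed pixel rate
    bytes_per_pixel = 4         # 32 bits per pixel
    fps = 24

    morrison_rate = pixels_per_second * bytes_per_pixel  # 1.2e9 bytes/s
    imax_rate = 13e9            # earlier 3-D IMAX estimate, bytes/s

    print(imax_rate / morrison_rate)                      # ~10.8, "about a tenth"
    print(imax_rate / (2 * fps * bytes_per_pixel) / 1e6)  # ~68 Mpx per eye
    print(morrison_rate / (fps * bytes_per_pixel) / 1e6)  # ~12.5 Mpx, give or take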

The Academy also claims that the planetarium would hold 2,402,494,700 M&Ms, but I'm skeptical about either the volume of an M&M or the planetarium, much less both, being known to a part in 10 million.

Signing webmail

While looking up something on S/MIME, I ran across one of those little "hmm ... interesting point" points. How do you digitally sign webmail?

The problem isn't that mail sent through a web interface is somehow harder to sign. The question is who do you trust to do the signing. The whole point of webmail (and one version of "the cloud") in general, is that you can access it from anywhere on any device. The whole point of digital signatures is convincing everyone that only you could have done the signing. Two great tastes that don't go so great together.

If I only send mail from a particular box, I can just leave my private keys in a file on that box (password protected, of course) and use a mail client running locally. I tell the mail client my password, it fetches the keys and signs my message. This approach is as secure as my box and the firewall it sits behind.
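A rough sketch of that local-signing setup in Python, using the pyca/cryptography package. The file name, passphrase and message are all placeholders, it assumes an RSA key, and a real client would wrap the result in proper S/MIME structure:

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    # The private key never leaves this box; the password just unlocks it.
    with open("my_key.pem", "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=b"hunter2")

    message = b"From: me\r\nTo: you\r\n\r\nHello."
    signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())

    # The message and signature travel together; anyone with my public key
    # (say, from my certificate) can check that only my key could have signed.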

If I'm in a cybercafe or using one of those airport internet kiosks that I've never, ever seen anyone use, I don't want to copy my secrets onto a local drive (assuming I even can) or otherwise hand them over to the browser directly. You don't know where that thing's been.

The next candidate would be the webmail server. Hand it my keys, once, very securely. Then I log in to my webmail and send a message from wherever. The server signs it for me without revealing my secrets to the browser.

There are a couple of problems with that. One is that I'll need to be careful logging into the server. If the browser is actually a password-stealing trojan horse, then whoever is running it can grab my login and forge whatever they like under my identity. Another is that if the webmail server is ever compromised, the attacker doesn't just get my key, but a whole treasure trove of keys to go cracking for weak passwords. That's a problem for the people running the server, but it's also a problem for me since the server is a nice, big, visible target.

There are ways to make logins trojan-horse-resistant, for example smart cards that generate a fresh hunk of random digits every so often, but these are not widely deployed. The bottom line is that any server with your keys on it is only as strong as its authentication system, and most of those aren't that strong.
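Those key fobs boil down to something like the time-based one-time password idea: both sides share a secret, and the displayed digits are a truncated HMAC of the current 30-second interval. A rough sketch in Python (the secret is a placeholder, and real deployments worry about clock skew, replay and the rest):

    import hashlib
    import hmac
    import struct
    import time

    def one_time_code(secret, interval=30, digits=6):
        # HMAC the shared secret with the current time step ...
        counter = struct.pack(">Q", int(time.time()) // interval)
        mac = hmac.new(secret, counter, hashlib.sha1).digest()
        # ... then truncate the result to a handful of decimal digits.
        offset = mac[-1] & 0x0F
        code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    print(one_time_code(b"shared-secret"))  # a fresh hunk of random-looking digits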

What would be better? The basic trust issues are clear enough, but the kind of mental ju-jitsu needed to think through all the various counter-measures and counter-counter-measures is hairy in the extreme. True black belts are relatively rare, and I'm not one of them. But from what I can tell an airtight solution would probably involve some kind of smart card that you could plug into the device you're actually using (the pocket-thing would be a candidate, but then, it could probably send signed mail directly). The server would then need a conduit through which it could send data up to the smart card and get signed data back.

And even then it would do well to check that what you signed was actually what it sent you.

Friday, September 19, 2008

Is it too early for a greatest hits collection?

Of course it is. But that never stopped anyone.

About a year ago I started tracking traffic to this site, such as it is. For the most part, it looks like a few people stop by on an average day (thanks!), more if I've been posting more lately, fewer if I haven't. Gizmodo I ain't.

Most pages get read when they're first posted, if then, but a few have fared somewhat better. Here are the top several:
  • Two things I didn't know about Blu-Ray got more hits than anything else on the site, all on one day when it briefly appeared in the "From the Blogs" section of the CNN piece it links to. Since then, nada.
  • On the other hand, Digital vs. interactive vs. internet vs. broadcast got some hits when it was first posted, I think because it showed up on Bloglines, but continues to show up in people's searches from time to time.
  • Information age: Not dead yet has been a slow but steady earner. It seems to pop up consistently in searches like "When did the information age start?", a question it asks but doesn't definitively answer (indeed, one of the main points is that you can't definitively answer that one).
  • Wikipedia's angle on anonymous IP addresses has consistently shown up in searches with "Wikipedia", "anonymous" and "IP" in them. Interestingly, the top search is anonymous IP, with no "Wikipedia" in it. Apparently the post managed to acquire a good page rank at some point, though I couldn't find it in the quick searches I just tried, and indeed the hits stopped a couple of months ago.
  • I'm very glad to see that people are still interested in Peter Deutsch's list of fallacies of distributed computing, and glad if my post on it helps people find and appreciate it. The post still shows up on the first page of hits for deutsch fallacies. While verifying that, I ran across this interesting piece on the history of the list. I had no clue as to the history of the list until just now. I just liked its contents.
  • The very first post here, E-Tickets and copy protection, still turns up in a motley assortment of searches, including copying a dongle, copies of e tickets and beowulf copy protected cannot copy. I get the feeling that at least some of these visitors are going to come away disappointed. Quel dommage.
  • Megacommunities is another one with stronger interest at first and blips afterward. I'm afraid it doesn't offer much more on the subject than a few rough numbers and some musings, but maybe it'll help get the ball rolling.
  • Hacking the ENIAC turns up in searches for the people involved, and I'm very glad if I can help direct a little more credit towards that team.
  • My fairly skeptical take on hyperreality, Is hyperreality the new reality?, has turned up in searches, though it seems to have dropped off of late. While checking that, though, I noticed that someone is actually using the title I originally had in mind for this blog ("morphisms"), though under a different spelling (and with somewhat different theme and content).