Wednesday, December 31, 2008

Happy 2009!

As you're waiting for the ball to drop tonight (or whatever other marker you like), remember that 2008 will be hanging around for just a bit longer than the last few years. A leap second has been introduced for the first time since 2005 and only the second time this century.

Leap seconds are interesting (and somewhat problematic) to anyone interested in keeping a computer's clock exactly in sync with the official timekeepers because, unlike leap years, they do not follow a predetermined rule. Rather, the International Earth Rotation and Reference Systems Service (IERS) announces them based on observations of the earth's rotation. Despite what the name might suggest, the IERS does not actually cause the earth to rotate.
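To make the contrast concrete, here is a minimal Python sketch. The leap-year test is the standard Gregorian rule, computable from the year alone. The leap-second side can only be a lookup table that somebody has to keep current. The table below is illustrative and holds just the two insertions mentioned in this post (end of 2005 and end of 2008); the names `is_leap_year`, `ANNOUNCED_LEAP_SECONDS`, `has_leap_second` and `seconds_in_day` are made up for the example.

```python
from datetime import date

def is_leap_year(year: int) -> bool:
    """Gregorian rule: fully predictable from the year alone."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Leap seconds cannot be computed ahead of time; they are announced.
# Illustrative table only: the two insertions mentioned in this post,
# each added at the end of the given UTC day. A real table is longer
# and has to be updated whenever a new leap second is announced.
ANNOUNCED_LEAP_SECONDS = {
    date(2005, 12, 31),
    date(2008, 12, 31),
}

def has_leap_second(day: date) -> bool:
    """True if the table lists a leap second at the end of this UTC day."""
    return day in ANNOUNCED_LEAP_SECONDS

def seconds_in_day(day: date) -> int:
    """Length of a UTC day in seconds, according to the table above."""
    return 86_400 + (1 if has_leap_second(day) else 0)
```

The asymmetry is the whole point: `is_leap_year` will still be right a century from now, while `seconds_in_day` is only as good as the last time someone updated the table.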

The Wikipedia link above has a good rundown. The official version is on the IERS site, but be advised that it's a bit technical. Evidently "How much is the accuracy of precession/nutation improved by MHB2000 and corrections to the precession constant and obliquity rate over the current IAU models?" is a frequently asked question.

Business model blues

Item 1: A local newspaper reporter muses on a local radio show about the future of newspapers. Yes, the economy is bad, but the long-term problem is that classified ads have gone online and they're not coming back. One option is to charge for a paper what it actually costs to produce; this would more than double the price, cutting readership and so on and so on. The days of the newspaper as everyone's window to the world are fading fast. That, too, has gone online.

Item 2: Viacom is urging viewers to call their cable operators (Time-Warner cable in particular) to urge them not to drop their channels. At issue: Viacom wants to increase its charges to providers. Providers argue that ratings are decreasing and the shows are available online for free in any case (often via the exact same cable). Fundamentally, the shift is from TV -- or perhaps more precisely, dedicated TV channels -- to online, where bits is bits and video is just one more bit stream, albeit a considerably bigger one than pretty much everything else.

(Here's the top hit I got for viacom time warner, while putting together item 2. The particular article is from Dow Jones, now owned by NewsCorp, carried on CNN Money, owned by Time-Warner. Caveat lector.)

Monday, December 29, 2008

Email in the loop

Email seems to be a sort of universal escape hatch for otherwise automated schemes:
  • If you forget your password on many sites, or in some cases even if you forget your login entirely, you can have the site send a special email to an address you gave when you registered.
  • If you want to join a site, you'll often get an email containing a magic link to follow to complete the registration process.
  • The same scheme has been used for product registration.
  • If you want to join an automated mailing list, you'll typically get an email asking you to confirm that you wanted to join.
Why does this work? It's painfully clear that email is completely insecure on the sending side. Anyone can spoof anything without a lot of effort. I've sometimes received email from myself for products I'm pretty sure I'm not trying to sell myself (my evil twin, on the other hand, may have other ideas ...). However, receiving email is somewhat more secure. Generally you at least have to give a password and you can use TLS to help prevent various attacks. This is nowhere near ironclad, but it does help:
  • A random person trying to recover your password will have to know what email address you registered with and be able to intercept your mail.
  • Someone trying to register hordes of people on a particular site will have to make up a bunch of email addresses and have a bot ready to answer the confirmation mails. Hmm ... that doesn't sound like a particularly high bar, so maybe I'm missing something.
  • When you legitimately register a product, the seller now has an email address that it knows someone has replied to at least once (and presumably a pirate will use other means to get the use of the product).
  • A spammer can't add your email address to someone else's mailing list without your getting an email asking if you really want mail from that list. This doesn't cure all ills, but it at least cures one of them.
If all this sounds like a sort of lukewarm endorsement, it should. The fact remains that email isn't really secure and doesn't seem to be getting any more secure very fast (I'm aware of PGP and its cousins and offspring -- good stuff, but not widely deployed). I think what bothers me here is that the extra email step might give an exaggerated air of security. About the best that can be said in most cases is that if you send an email to a legitimate address, at least the intended recipient is likely to see it, and schemes that rely on this seem to work well enough in practice that they continue to be used.
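For the curious, the "magic link" schemes above usually boil down to something like the following Python sketch. This is an assumption about how such a site might work, not a description of any particular one: the in-memory `_pending` store, the function names and the `example.com` URL are all hypothetical, and a real site would also expire tokens and keep them in a database.

```python
import hashlib
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store: hash of token -> address awaiting confirmation.
_pending: Dict[str, str] = {}

def _digest(token: str) -> str:
    """Hash the token so the store never holds the raw secret."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()

def start_confirmation(email: str) -> str:
    """Create a one-time token and return the link to mail to `email`."""
    token = secrets.token_urlsafe(32)  # unguessable random value
    _pending[_digest(token)] = email
    return f"https://example.com/confirm?token={token}"

def finish_confirmation(token: str) -> Optional[str]:
    """Return the confirmed address, or None if the token is unknown or already used."""
    return _pending.pop(_digest(token), None)
```

Note what this does and doesn't establish. A successful `finish_confirmation` shows that whoever followed the link could read mail sent to that address, which is exactly the "receiving side" property discussed above, and nothing more.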

[Access to email (the receiving side, above) has gradually become more secure.  HTTPS is now pretty standard and two-factor authentication is available.  SPAM and phishing are still significant issues, though SPAM filters seem to have gotten better faster than spammers have gotten better at getting around them.  As to the main point of the article, email seems to be just as much in the loop as it was when I wrote this. --D.H. May 2015]

Are we still on the information superhighway?

Some words and phrases are just plain dated. They were the cat's pajamas back in the day, but now only a square would think they were hep. For all practical purposes they've dropped off the face of the earth into a silent abyss, only to be dragged out when we feel like watching 80s videos, pointing and laughing (*). Which brings me back to the title.

What was the "information superhighway"? As usual, Wikipedia has the relevant citation:
"One of the technologies Vice President Al Gore (**) is pushing is the information superhighway, which will link everyone at home or office to everything else—movies and television shows, shopping services, electronic mail and huge collections of data."
So that happened. Not only that, the analogy to the original superhighways (i.e., the U.S. Interstate system) seems apt. Both are "enabling technologies", valuable not so much for what they do, but for the other technologies and trends they make possible. So why does the term sound so completely dated?

I'd guess there are a couple of reasons. One is that "information superhighway" was never a real geek term. It was a way of explaining the geek stuff to congresspeople who largely didn't care greatly about the technology. They cared about what supporting said technology could bring to their constituents. The interstate system was widely regarded as a Good Thing, so if you could convince people this "internet" stuff was the modern version of that Good Thing you were ahead of the game.

That was probably helpful in congress, but it gave off the impression that only suits and pointy-haired bosses talked about "superhighways". Real geeks talked about networks, client-server architectures, not-so-client-server architectures, RISC vs. CISC and whatever else happened to be floating around at the time. If you wanted to be cool in the dot-com days, you wanted to talk like a geek.

Another reason, perhaps less obvious, is that we don't call superhighways "superhighways" anymore. They're just highways. What's super about them, anyway?

These days, if you're parked on the 405 during rush hour -- that is, pretty much any time -- they don't look so super. But consider what came before them. For example, to get from LA to the San Joaquin valley, you had little choice but to take the "Ridge Route" over the Tejon summit. These days, you've got eight lanes of I-5, traversable at speeds (ahem) well in excess of whatever's posted. Before that, well, here are some excerpts from a description on GBCnet.com:
Perhaps the most hazardous section [...] was the Grapevine Grade between Fort Tejon and the base of Grapevine Creek [...]. The original Ridge Route highway built in 1915 was an absolute deathtrap. This section had 119 sharp turns, two with a radius as small as 80' (a 10 MPH hairpin turn - the turning radius of an average car is 40') and when totaled caused the traveler to drive the equivalent of 12 full circles [My inner math major compels me to infer "six one way and six the other" -- DH]. As a testament to the hazards presented by this road, one turn was even labeled "Deadman's Curve." By 1934 the extension of the US 99 Ridge Route Alternate [...] offered a substantial improvement in safety and ease of driving. For one, the number of turns was reduced to 23 and the number of complete circles a driver would have to make was reduced to 1½. [...]. Perhaps the biggest improvement was the widening of the road from 20' to 30' with three 10' lanes. The addition of suicide lanes enabled motorists to pass slow trucks and allowed traffic to move more quickly as backups behind the slow vehicles became minimized.
Yep. "Suicide lanes" were an improvement. Granted, by the time I-5 came around in 1960, there had been further improvements, including an expressway that was partially incorporated into the interstate. Nonetheless, all of the above fun and excitement would have been well known to drivers at the time, and eight relatively straight and level lanes, none designated "suicide," would have warranted the "super" tag.

Similarly, if you compare grainy YouTube video, the occasional annoying wait and spam-a-go-go to dialup access to a BBS (***), the difference is night and day. But one soon forgets the night.

(*) As you might guess from my profile, I caught those 80s videos the first time around. Thought they were infinitely bitchin'.

(**) It's hard to pull out a quote like that without dragging up the "Al Gore claims he invented the internet" controversy. I'll defer to Wikipedia on that, too.

(***) Nothing against BBSs. Like many geeks of a certain age, I spent a fair bit of time on one (thanks Keith!). I hear now you can use the internet to type short text messages to people who can type back at you ... you can even download timesink games and ...

Tuesday, December 23, 2008

All of human knowledge

In the annual (?) appeal for funding for the Wikimedia Foundation, Jimmy Wales asks us to
Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.
This seems like perfectly fine wording for a fundraising appeal, a decent description of what Wikipedia is about, and a noble ideal to boot. So let's rain on the parade by picking it apart, shall we?

Is it possible, even in principle, to give even one person access to the sum of all human knowledge? Actually, what does "the sum of human knowledge" even mean? Some time ago, I was convinced it was "everything in the encyclopedia". Now I'm not so sure. Wikipedia itself specifically excludes knowledge that isn't "notable" (what did I have for breakfast yesterday?) and "original research" such as tends to creep in as people summarize pieces of articles and draw conclusions from them. It also goes to great lengths to exclude or at least neutralize opinion (POV in the jargon (*)).

In other words, it aims to gather information generally accepted as "known". This is the kind of philosophical quicksand that holds up just fine so long as all you do is walk blithely across it. So let's just walk ...

Assuming there's such a thing as the sum of human knowledge, for some value of "knowledge", could anyone access it? Well, you don't really want to access all of it. You couldn't anyway. You want to be able to access the bit you need at the moment, right then and there.

This runs directly into the limits of human bandwidth. Not only is there only so much raw information you can process at one time, there is only so much metadata -- information about what and where other information is -- that you can process at one time. Sure, the knowledge you're looking for is in there, and you have both the careful work of editors and categorizers and the raw horsepower of text search at your disposal. But can you find it? Empirically, the answer so far is "often". I doubt it will ever be "always".

Nonetheless, an unachievable goal is still worth aiming for so long as we produce useful results along the way.

(*) The Wikipedia article on POV contains a very relevant bit of wisdom:

In Thought du Jour Harold Geneen has stated:[1]

The reliability of the person giving you the facts is as important as the facts themselves. Keep in mind that facts are seldom facts, but what people think are facts, heavily tinged with assumptions.

Sunday, December 14, 2008

Giving online email the bird

The other day I was trying to send an invitation using a popular web site. The result was unsatisfying, but it wasn't particularly the invitation site's fault.

It was my first time on that site, so it didn't know any email addresses, and of course that makes sending invitations a bit harder. So I went into T-bird and started writing a dummy email. T-bird dutifully filled in the addresses from the first few letters of the names. I then went to cut-n-paste those addresses into the site's text box. T-bird doesn't seem to want to select more than one address at once from a message being composed.

Bad T-bird (or dumb me for missing the obvious -- but when dealing with software with a seven-digit user base and an ostensibly infinite supply of eyeballs, I tend to be less forgiving).

Then I tried creating an ad-hoc mailing list in T-bird and copying that in. Turns out this was not T-bird's day (or mine, for that matter). So I finally did a silly-walk so unwholesome I hesitate to mention it: I saved the dummy email to a text file and cut-n-pasted the names in from that.

Well, clearly the problem here is that I persist in keeping my email address book in my own personal silo instead of on the web. The invitation site was certainly of that opinion. It offered me the opportunity to import my email addresses from any of a dozen or so widely-used sources.

That there are a dozen or so to choose from tells me that this "silo" problem is not quite licked yet, even on the wide wonderful web.

Besides the obvious concern about security -- which is maybe not such a big concern considering that everybody's bank accounts are online and your address book is more vulnerable to an email virus on your machine than on someone's secure server -- I think the problem here is granularity. I don't want everyone to see everything in my address book. I want different sites to see different portions.

I haven't done even the minimal research of finding out if this is possible online, and frankly, I hardly ever run into a situation like the above, so I have no idea whether this is a real issue or not. It sounds like the kind of problem personal datastores are aimed at, though.

In the meantime, I plan to continue hiding in my silo, though I'll try to look into how the online system at work works. It's one of the major providers, but I access it through T-bird, of course. Better the devil you know.

[This is one of the few posts that's struck me as seriously dated on re-reading, probably because it wasn't really that relevant in the first place.  It does hit on some interesting themes.  It just doesn't do much with them, and plenty of other posts have better takes on the same themes.  Now that you've read this far, feel free to skip it --D.H. May 2015]

The jury is out (but online)

I heard a news story on the radio the other day about a high-profile case that was being reviewed, among other things, because jurors were thought to have been accessing the internet when they were supposed to have been sequestered.

This is sort of a dog-that-didn't-bark situation. The net and web have been around for a long time now, with their potential for tainting jurors who are supposed to be isolated. You'd think there would have been more and bigger stories about it by now. Evidently, though, it's not a major problem. Take your pick of possible reasons:
  • Most trials are over quickly. Relatively few require actually sequestering jurors.
  • Web access is just the latest in a long line of potential leaks. Cell phones (not to mention ordinary phones) have been a problem for years now.
  • Jurors are generally good about following instructions.
  • Jurors not communicating with the outside world is fundamentally a human problem, not a technological one.
What I found really interesting, though, was how the potential taint came to light: via email. And of course, the lawyers involved preface any discussion of this with "If these emails are genuine ..."

There's your barking dog -- email problems, particularly the ease of spoofing email, are a much bigger deal than a web connection being available when it shouldn't.

Tuesday, December 9, 2008

The OCR X-Prize

A while ago -- in fact, just about a year ago as it happens -- I remarked that the use of Captchas to try to keep bots away is effectively an X Prize for OCR hackers. If there's money to be made by using a bot, as there is with, say, online ticket sales for popular shows, then there ought to be reason for someone to write a better text recognizer to get past the Captcha. I was sort of right, kind of.

Sure enough, people have written better text recognizers. But from what I can make out, they've done it for fame and recognition (well, at least to publish and not perish) and not for money. There are several academic papers out there, but as far as I can tell no enterprising script kiddie has done the requisite research. There have been reports of real sites invaded by real Captcha-cracking bots, but most likely someone just cadged the work done in the research papers and put it to ill use.

So, Captchas have spurred research, because OCR has become a somewhat hot topic. They've also spurred actual scammers to crack Captchas, but not necessarily through OCR research. Actual scammers have little reason to invent new OCR algorithms, or even read the literature on the subject. That's not their strong suit. Their strong suit is social engineering. People are good at reading squiggly Captcha letters; spammers are good at getting people to do stuff; ergo, spammers get people to read the squiggly Captcha letters for them.

How do they do this? Put up a site featuring a thrilling pictorial presentation of, say, accounting standards through the ages (they actually used a slightly different subject matter). After each image is the promise of more ... if you can read the squiggly letters in the box. The letters, of course, are taken from a legitimate site that the scammer is trying to crack into at the moment, and the mark's response is fed directly back to that site. The awful beauty of this approach is that it will work for any "reverse Turing test" approach whatsoever.

If they're smart, they wait for a successful response back from the legitimate site before letting the mark proceed. Otherwise the mark could put in anything at all, for example "notarealticketbuyer", and by definition the scam site wouldn't know the difference.

Meanwhile, Captchas have become almost too hard for humans to read (I came up empty on one today, which is what spurred this post). In other words, they've almost reached the point at which they can no longer discriminate between humans and bots. Clearly rendered text can't discriminate, because both humans and bots can read it easily. Gibberish can't discriminate either, because no one can read it. There is less and less room left in the middle.

Wednesday, December 3, 2008

Happy birthday, Kindle

CNN points out it's already been a year since Amazon introduced the Kindle. It's currently sold out, and Oprah likes it, as do Toni Morrison and James Patterson (but not J.K. Rowling). It's accounting for 10% of Amazon's book sales, even though only 200,000 of Amazon's zillions of titles are available on it (evidently it's the right 200,000). So it's a game-changing hit, right?

Well, it's definitely not a flop, and the article claims that sales are "on a par with other high-profile mobile devices in their first year." On the other hand, in keeping with my not-so-disruptive technology theme, I'd have to side with Paul Reynolds of Consumer Reports:
I think it's certainly a ways away from hitting the mainstream ... because of the price and the experience a reader gets from long-form reading. Whether these ... are successful, stand-alone devices remains to be seen. From what I've seen and heard, I think the technology is here to stay.
So ... so far, so good, and it definitely bears watching, but more of a leadoff single than a grand slam home run.

Maybe all this googly stuff is worth something after all

Well, I never really doubted it, but it's good to have a working example from time to time.

A friend called, saying they were at a pizza shop on Smith Street (that's not exactly what happened, but let's say it is). The shop was next to a big red brick building. Could I come pick them up? Before they could relay any more detail, their cell phone went dead.

So I went to Google, searched for "pizza" "smith street" <my town>. Up came a little map with pizza shops marked. Two were even on Smith Street. I then clicked on the little push-pins for the two shops and checked the street view. Only one was next to a big red building. Voila!

I plugged the address into my GPS (I'm pretty sure in some setups that can happen automagically), set off, and sure enough, there was my friend waiting. Just like the breathless descriptions you'd see about how the web was going to Change Everything, with the added bonus that it actually happened.

That probably came off as overly cynical, so let me climb down a bit: A lot of the technology and trends that have been hyped over the years have actually happened. Phones, computers and TVs really are converging towards each other. You really can find all sorts of useful information and belong to far-flung virtual communities over the web. You can even shop and bank on the web.

I'm not down on the technology. I'm not even particularly down on the hype. Hype happens. If people didn't get excited about cool technology we wouldn't have any. Nonetheless, I can't help feeling that, with all these changes, my life is essentially the same -- even taking into account that it would have been a lot harder to track down my friend without the web wizardry. Sure, technology can be disruptive, but most of it isn't, at least not as quickly or in the ways people often seem to imagine it to be.

Two Nice UI Amenities

Neither of these is particularly new to the web, much less new to the world, but they sure can make life easier:
  • The type-a-few-letters-and-it-narrows-the-list-down list. I remember seeing this one in one of the major PC apps in the 80s and going "oooohhh" kind of like the little green men in Toy Story. It hasn't lost any of its appeal. Wikipedia added it to its search field a while back, and it's a welcome addition.
  • The field-that-remembers-what-you've-put-in-it-previously. This is a standard browser feature (or plug-in), but I remember life without it and I like life with it better. Besides the obvious convenience of not having to type that 20-digit account number over and over again, it helps ensure consistency. The usual implementation is even a bit smarter than I imply: it can remember what you've put in similar fields, possibly in other pages entirely, so it can suggest your usual email address in a new "email" field it sees.
The two fit nicely together. If over the course of time you've given several different responses in a given field, it's very nice to be able to just type the first few letters and get the one you want. For example, Blogger's tagging feature works exactly this way. Whenever I have occasion to refer to Alexander Stanhope St. George, he's only a few keystrokes away, and I don't have to remember whether it was, say, "St." or "Saint".
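Put together, the two amenities amount to "remember what was typed, then filter by prefix". Here is a minimal Python sketch of that combination. The class and method names (`FieldMemory`, `remember`, `suggest`) are invented for illustration and don't describe how any particular browser or site does it.

```python
from typing import Dict, List

class FieldMemory:
    """Remembers values typed into named fields and suggests them by prefix."""

    def __init__(self) -> None:
        # field name -> values, in the order they were first entered
        self._values: Dict[str, List[str]] = {}

    def remember(self, field: str, value: str) -> None:
        """Record a value for a field, skipping exact duplicates."""
        seen = self._values.setdefault(field, [])
        if value not in seen:
            seen.append(value)

    def suggest(self, field: str, typed: str) -> List[str]:
        """Return remembered values for `field` that start with `typed`, ignoring case."""
        prefix = typed.casefold()
        return [v for v in self._values.get(field, [])
                if v.casefold().startswith(prefix)]

memory = FieldMemory()
memory.remember("tag", "Alexander Stanhope St. George")
memory.remember("tag", "AJAX")
memory.remember("tag", "Captchas")
print(memory.suggest("tag", "a"))   # ['Alexander Stanhope St. George', 'AJAX']
print(memory.suggest("tag", "al"))  # ['Alexander Stanhope St. George']
```

Each extra keystroke narrows the list, and because the suggestion is the stored string itself, the "St." versus "Saint" question never comes up.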

This is exactly the kind of small-scale memory that makes UIs more intelligent, and as I've argued before, "intelligence" here isn't just metaphor. The UI is intelligent in the same sense as an animal is. I'll hasten to add that smart fields like the ones I'm describing are of course not intelligent in the classical AI sense of being able to pass a Turing test. For me, that just says there are more kinds of useful intelligence than being able to converse like a human. But that's a whole other long discussion.

Note that both of these features can be handled either locally (as when the browser remembers addresses you've typed into "Address" fields) or remotely (as with Wikipedia's article titles) [More accurately, the knowledge can be stored locally, remotely or both, and the processing of that knowledge can also happen locally, remotely or both, mostly mix-n-match]. While the basic features don't inherently require AJAX, AJAX can certainly make them more widespread and useful.

Monday, December 1, 2008

LISP history from Olin Shivers

It may have seemed like I was bagging on BodyNet in a previous post. In fact, I was impressed by the paper when I first read it (a few years after it came out, I think), and I'm still favorably impressed. Getting the details of the future somewhat wrong fifteen years before the fact is still pretty good work.

Shivers has done quite a bit of other outstanding work, most of which will be incomprehensible unless you're a LISP-head or other such programming language geek. His History of T is probably in that category as well, and it has pretty much nothing to do with my theme of "figuring out the web as I go along", but so what? It's a fascinating account for a sometime LISP-head PL-geek and I reserve the right to drop stuff like this in from time to time.

I'm also hoping the piece is still interesting if you replace unintelligible phrases like "lexical scope" and "removing a cons cell from the free list" with "peanut butter". What's left is a bunch of now well-respected researchers in their early days, bouncing around the finest institutions in the US with stops in Silicon Valley and elsewhere, leaving significant discoveries in their wake.

And a lot of peanut butter.