Field notes on the Web: August 2008

Friday, August 29, 2008

More undead technology

For whatever reason, I found myself browsing through the upcoming generation of tall buildings the other day. The Burj Dubai [Now the Burj Khalifa --D.H. June 2015] still has pride of place, but the one that caught my eye was the Tokyo Sky Tree. With a shape suggesting a huge katana, the Sky Tree was considered necessary because the existing Tokyo Tower isn't tall enough to broadcast over Tokyo's latest skyscrapers.

So Japan, one of the more wired nations on earth, is spending hundreds of millions if not billions of dollars to build a new structure to broadcast radio and TV over the air. Another case where the surprise is that it's not surprising. Evidently the transition from broadcast to internet audio and video has a ways to go yet. Assuming it ever happens completely at all.

Wednesday, August 27, 2008

You saw the website. Now read the book!

Someone sent me a link the other day to a definition on Urban Dictionary. While enjoying that, I noticed a link for their latest book. It's not surprising that a major site should have an associated book or two. What's surprising is that it's not surprising.

I've argued before that text as a medium is not going to die out any time soon and that the web is a major factor in that. But it's a bit more puzzling why print doesn't die out. Books from web sites are a particular curiosity, and one for an online reference even more so. Think about it: The online version is searchable, hyperlinked and up to date. The print version is none of the three.

A dictionary is meant to be searched, is generally more fun and useful to browse by chasing cross-references, and had best be up to date (it'll often be new words you'll want to know the meaning of, at least when it comes to slang). Why bother with print? I can see why a publisher would bother: they know how to get paid for print. But why bother to buy a book?

Copy protection isn't an issue for a free online dictionary. It's got to be the form factor. It's still hard to take the online version with you wherever you go. Sure, you can carry a laptop with you, and there are hot spots and cell modems, but a book is generally smaller and more reliable. And it's not that hard to search, particularly if the entries are alphabetized.

Kindle was supposed to change all that, but as far as I can tell the infrastructure and selection aren't quite there yet to make Kindle take off.

Monday, August 25, 2008

Google Earth and alignment in cattle

Prof. Hynek Burda of the University of Duisburg-Essen spends much of his time studying the biology of mole rats. These fascinating little critters are nearly blind and otherwise adapted to life underground. The naked mole rat, for example, is practically cold-blooded, lacks pain sensation in the skin and has a bee-like social structure. Some naked mole rats, evidently, are adept at martial arts.

Ansell's mole rats, on the other hand, tend to build their nests towards the south. That seems a bit odd for blind, burrowing animals. They're probably not using the direction of the sun as a cue. Other animals are known to be sensitive to the earth's magnetic field, so maybe that's it. And in fact, Ansell's tend to do worse in classic rat-in-maze tests in the presence of strong magnetic fields. What, if anything, this southward nesting is good for is a separate question. It may well just be a side effect of the clearly useful magnetic sense of direction.

Those south-nesting mole rats got Prof. Burda and company wondering if any other mammals had the same kind of sense (which has primarily been documented in non-mammals, such as birds and bees). And this is where the web comes in. First, they tried using Google Earth to see whether tents at campsites tended to be oriented (somehow, I would assume, factoring out any of several other reasons one might pitch a tent square to the world). That didn't work so well, but as so often happens they noticed something else.

Cattle, it seems, tend to graze lined up more or less north-south. It doesn't seem to matter where the sun is or where the wind is coming from, as long as it's not too windy -- if it gets really windy, cattle will face into the wind. It does matter, however, where you are on the globe. If local magnetic north is off of true north, as it is in the northeast US for example, cattle will also line up off of true north. In other words, they tend to line up along a magnetic north-south axis.
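Magnetic north and true north differ by the local magnetic declination, conventionally counted positive when magnetic north lies east of true north and negative when it lies west. A bearing measured against magnetic north converts to a true bearing by adding the declination. The sketch below is my own illustration of that arithmetic, not anything from Burda's study, and the 10-degrees-west declination in the example is a made-up placeholder rather than a figure for any real location.

```python
def magnetic_to_true(magnetic_bearing_deg: float, declination_deg: float) -> float:
    """Convert a magnetic bearing (degrees) to a true bearing (degrees, 0-360).

    Assumed sign convention: declination_deg is positive when magnetic north
    lies east of true north, negative when it lies west.
    """
    return (magnetic_bearing_deg + declination_deg) % 360.0


def axis_from_true_north(magnetic_axis_deg: float, declination_deg: float) -> float:
    """Direction (0-180) of a magnetic axis measured from true north.

    An axis has no head or tail, so bearings b and b + 180 describe the same line.
    """
    return magnetic_to_true(magnetic_axis_deg, declination_deg) % 180.0


# Hypothetical example: cattle lined up exactly on the magnetic north-south axis
# in a place where magnetic north is 10 degrees west of true north.
print(axis_from_true_north(0.0, -10.0))  # 170.0, i.e. 10 degrees west of true north-south
```

In other words, a herd that is perfectly "magnetic" would look skewed by exactly the local declination to anyone checking against a true-north map grid such as Google Earth's.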

The next step, which no one has done yet, is to attach magnets to a few head of cattle and see if they line up differently then. It's not clear who's going to attach the magnets, but with school just starting, and with a little luck, an enterprising team might be able to convince some incoming freshmen it's a variant on cow-tipping. Stranger things have happened ...

[There was some controversy about this, with a second team unable to confirm Burda's results, but according to this article on phys.org, there's probably something to it after all. --D.H. June 2015]

Saturday, August 23, 2008

Happy birthday, Field Notes!

Well now.

A year ago I wrote the first post on this blog, about e-tickets and copy protection. The thesis, which I still buy, is that strong copy protection only exists in the physical world and that in most cases tying virtual content to the physical world is likely to fail. If technology is inherently limited but we still want people who create content to get paid, it'll be up to "a web (if you will) of legal and social constructs". The good news is that that's already how the world works.

It might seem a random place to start, but it does introduce one of the main themes here: the interaction between technology and society. I've since come to think that the web is one of the clearest and most pervasive examples of this interplay. One of the beauties of blogging is seeing such themes develop over time. You can only set out so much at the beginning. The rest you discover, thus the subtitle: "figuring out the web as I go along."

When I first started, I was imagining a wide-ranging discussion of high-level architectural concepts, spiced with real-world examples. But I kept the title deliberately vague (the original candidate, wisely discarded, was "morphisms") to allow some wiggle room. In the event, I think the focus has drifted toward a wide-ranging collection of real-world examples, spiced with the occasional comment on architecture.

I'm happy with that, and it seems more in keeping with the idea of "field notes". As I understand it, real field work in science consists mostly of meticulous observation, with theory providing some hints as to what to look for. I'm coming from the same angle here, minus the "meticulous" part. Sometimes I'll write up something I've been stewing over for a while, but if I see something random and interesting float by in the meantime I'll go ahead and write it up. Why not? It's fun.

I read somewhere that of the millions of blogs out there, most don't survive their first year, so I'm happy to have made it this far. Except for an initial burst of activity tapering off last September, I seem to be managing a dozen or so posts a month, though not at the steady rate of one every two or three days that that might suggest. This seems about right. There's always something to write about when your topic is "the web", but not always time to write.

If you've been reading along so far, thanks, and I hope you've enjoyed it. Thoughtful comments and questions are always welcome, but lurking silently is just fine, too.

Friday, August 22, 2008

What do I mean, "explicitly sign away"?

I previously said that one of the features of good online data storage is that anything you put there belongs to you unless you explicitly say otherwise (or perhaps better, storing something in the cloud doesn't alter the rights to it absent specific instructions otherwise).

But hang on. The usual yada yada that everyone just clicks on to get to the good stuff generally contains very specific statements about rights to data. And you specifically clicked on that, right?

Legally (and keeping in mind that I'm not a lawyer), yes, accepting the yada yada commits you to whatever it says. So this isn't a legal matter. It's a matter of customer service until some regulation on license agreements says otherwise.

The point is that the usual yada yada really consists of two things:

Stuff most people care about (or should), like what sorts of activity are allowed or prohibited, and who has what rights to what data.
Stuff most people don't care about, but lawyers do, like governing law (what state or nation's laws apply), severability (if one section turns out to be legally unsound, the rest still stands), whole agreement (these are all the rules of the game; in particular, older versions of the agreement don't count) and so forth.

People like to brush off the second category as "legalese", but every clause is there for a reason, generally because there was an ugly dispute over the issue somewhere in the past. There are very seldom any winners in an ugly legal dispute.

Ideally, provisions in the first category would be shown prominently, with concise headlines and clear explanations under the headlines, and there would also be a clearly labeled link for "other legal considerations" (or "our lawyers made us put this here" or whatever). When I said "explicitly sign", I was imagining a situation where the "important" clauses were presented prominently and not mixed in with the "other legal considerations", as they often are.

It's still your responsibility if you accept an agreement that's unfavorable to you. This is really just about clearer signposts. I should also say that, as far as I can tell, most vendors make an effort to make license agreements painless and many make an effort to point out clauses that might be surprising. I don't think license agreements are in desperate need of overhaul or that the vendors' lawyers are out to get us. But there's always room for tweaks and improvements.

The "bigger fish to fry" rule

A long while back I was talking to a developer about a messaging system that was meant to deliver messages reliably even if the network was misbehaving (in other words, it was like TCP, but on a different time scale). I asked the obvious question: "What happens if the network goes down?"

"The sender keeps track of what messages have been sent. It retries if it doesn't get an acknowledgment back that a message was received."

"But what if, say, the sending machine loses power?"

"There's an option to log to disk. It's slower, but the system will recover when the machine comes back up."

"But what if there's a disk crash when the power goes off?"

"You can configure sending processes on more than one machine. It'll cost you more speed, but if one sender fails, the others will take over."

"But what if they all go down?"

At this point the developer started to lose patience. There's really not much more you can do at that point, except keep more copies and reduce the chances of them all being destroyed simultaneously. At some point, it's just not worth it, if only because keeping everyone sufficiently in sync takes more and more effort, especially since you're assuming an unreliable network in the first place. There is no 100% reliable system -- in computing or anywhere else that I know of.

On the other hand, if you had, say, three senders, all logging to disk, keeping in sync over a fast, decently reliable local network (it's the outside network we're most concerned with here), and they all crash unrecoverably, what's going on? Most likely the building is on fire or something similarly bad is happening, and whatever is trying to produce the messages in the first place is probably not able to do its job either. You'd better have enough off-site backups to get things going again after the fire trucks leave.

In such a case, the messaging system is certainly going to fail. But its job is not to be perfect. Its job, and everyone else's, is to be good enough that if it fails there are bigger fish to fry.
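The developer's first answer, remember what's been sent and resend anything not acknowledged, is simple enough to sketch. This is a toy under stated assumptions, not the system from the story: the network is simulated by a hypothetical LossyLink class driven by a seeded random generator, delivery is a plain function call, and max_attempts stands in for the point at which the sender stops trying.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LossyLink:
    """Hypothetical simulated network: each message, and each acknowledgment,
    is independently lost with probability loss_rate."""
    loss_rate: float
    rng: random.Random
    delivered: dict = field(default_factory=dict)  # msg_id -> payload, as seen by the receiver

    def send(self, msg_id: int, payload: str) -> bool:
        """Try one delivery; return True only if the acknowledgment gets back."""
        if self.rng.random() < self.loss_rate:
            return False  # lost on the way out
        # The receiver ignores duplicates, since a retry may follow a lost ack.
        self.delivered.setdefault(msg_id, payload)
        return self.rng.random() >= self.loss_rate  # the ack itself may be lost


class ReliableSender:
    """Remembers unacknowledged messages and resends each up to max_attempts times."""

    def __init__(self, link: LossyLink, max_attempts: int = 10) -> None:
        self.link = link
        self.max_attempts = max_attempts
        self.pending: dict[int, str] = {}  # msg_id -> payload awaiting acknowledgment
        self.next_id = 0

    def submit(self, payload: str) -> int:
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = payload
        return msg_id

    def flush(self) -> list[int]:
        """Attempt delivery of everything pending; return ids that ran out of retries."""
        gave_up = []
        for msg_id, payload in list(self.pending.items()):
            for _ in range(self.max_attempts):
                if self.link.send(msg_id, payload):
                    del self.pending[msg_id]
                    break
            else:  # runs only if no attempt was acknowledged
                gave_up.append(msg_id)  # stays in pending for a later flush
        return gave_up


link = LossyLink(loss_rate=0.3, rng=random.Random(42))
sender = ReliableSender(link)
for text in ["alpha", "beta", "gamma"]:
    sender.submit(text)
print("given up on:", sender.flush())
print("receiver has:", link.delivered)
```

Logging pending to disk and running several senders, the next steps in the conversation, would change where pending lives, not the shape of the loop. And no choice of max_attempts removes the case where every attempt fails; it only decides when the sender admits it.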

Who owns the cloud?

Along the lines of "the usual yada yada," NPR recently ran a story on the downside of storing important personal data -- email, pictures, schedules, secret recipes, whatever -- "in the cloud", that is, online somewhere, you neither know nor care where, conveniently managed and backed up by someone else.

They mention three main problems:

Your provider could fold, taking your data with it.
Depending on the terms of its yada yada, your provider could shut down your account for any number of reasons beyond your control. For example, a random person could tell them, without proof, that they think you're engaged in committing a crime.
Again depending on the terms of the yada yada, the provider might share your data with anyone and everyone.

Now, without meaning to be harsh on anyone (when was the last time I scraped a copy of this blog?), these seem like problems one could anticipate, if only on the basis that in any sweet deal, there's got to be a catch someplace. But that doesn't stop them from being serious concerns.

The holy grail here is a service whereby your data is:

Safe: It won't go away, barring disasters in multiple, geographically separated sites (in which case there are probably bigger fish to fry). You may lose access to it, whether because you don't have connectivity, or because your provider folded and the data is temporarily in escrow, or because you really are accused of a crime, or whatever.
Secure: Only you can get at it. If your provider leaks your data, it's liable up to some fairly substantial point. If you lose your keys, you can have them replaced conveniently.
Yours: You have the rights to whatever you store (provided you created it or otherwise had the rights to it in the first place) unless and until you explicitly sign them away. As I understand it, this is one of the key tenets behind personal datastores.
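The "multiple, geographically separated sites" condition in the Safe item is doing a lot of work. A rough way to see why: if each of n copies is lost independently with probability p over some period, all of them are lost with probability p to the n. Separating sites is an attempt to keep that independence assumption honest, since a shared disaster that hits every copy at once adds a floor no number of local copies can get under. The rates below are invented for illustration, not drawn from any provider.

```python
def p_all_lost_independent(p: float, n: int) -> float:
    """Chance all n copies are lost if each is lost independently with probability p."""
    return p ** n


def p_all_lost_with_shared_disaster(p: float, n: int, q: float) -> float:
    """As above, plus a shared disaster (probability q) that destroys every copy at once.

    Assumes the shared disaster is independent of the individual failures.
    """
    return q + (1 - q) * p ** n


# Made-up rates: 1% chance of losing any one copy, 0.1% chance of a site-wide disaster.
for copies in (1, 2, 3):
    print(copies,
          p_all_lost_independent(0.01, copies),
          p_all_lost_with_shared_disaster(0.01, copies, 0.001))
```

With these invented numbers, going from one copy to two helps a great deal, but after that the shared-disaster term q dominates: extra copies in the same building buy almost nothing, while putting a copy where the same fire can't reach attacks q itself.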

In most cases, providers are implicitly suggesting this kind of service, and since no one reads the yada yada, everyone is expecting it. Providers also have a strong incentive to make this level of service a reality. If it's too far off, word will eventually get out and fewer people will want to buy in. In particular, the chances of one of the major players folding and taking your data down, or simply losing large portions of it, appear fairly small. Not zero by any means, but fairly small.

On the other hand, there is probably room for a few well-placed regulations to help things along here. In particular:

That data held by a provider that goes out of business should go into escrow and be made available to former customers for a reasonable period.
That data should remain private unless specifically made public.

Monday, August 18, 2008

The usual yada yada

Who reads the fine print in license agreements? I have to admit I don't do it as much as I used to (unless it's for some major component). Why would this be? Maybe:

There aren't as many as there used to be, because more software is open source.
There aren't as many as there used to be, because outside open source, there aren't as many companies producing software.
No one believes that a really heinous license agreement would stand up anyway, so you might as well just click "Accept" and behave reasonably after that.
When you click "Accept", you're really saying "I trust the people I'm getting this from" and most of the time it's a well-established name.
Everyone else does, so even if it's a mistake, you're in good company.

Whatever the reason, it seems like it's part of the culture now, so much so that Google made a point of saying "This isn't the usual yada yada" (or words to that effect) when it knew one of its agreements was out of the ordinary.

Sunday, August 17, 2008

One of life's little mysteries

At our local airport, near the baggage claims, is a little kiosk-thing offering email and/or internet service. I have never, ever seen anyone use it. I don't recall ever seeing anyone use any of its counterparts in other airports. And yet it's still there. In fact, it appears to have been upgraded recently.

What gives? Does the airport subsidize it? Is it required by law? Do hordes of people wait until I'm out of the airport to line up for it?

Just wondering ...

[Some time later, I actually saw not one but two of these in use in the wild ... and that was about it --D.H. May 2015]

Wednesday, August 13, 2008

I'm not sure I buy my own argument

I just argued that DNS and routing were vulnerable to their recent exploits due to their complexity. I'm not sure I agree with myself.

First, there's a good argument that the culprit is not complexity but an overly trusting nature. The people behind the routing exploit say as much, and a more paranoid version of DNS has been around, but not widely deployed, for some time now (see here for some of the background, for example). In other words, people have known about the basic problems for some time, arguing against the "no obvious deficiencies" position.

Second, as I said previously, the standards themselves are not horribly complex. The specifications are deliberately loose in places, but they have to be. An overly tight specification is unlikely to survive unanticipated situations, and there are always unanticipated situations. The complexity comes from actually deploying in the field.

But that's just life with computers. Another advocate of simplicity, Edsger Dijkstra, once provided code examples in a language without subroutines on the grounds that even a few dozen lines of straight code can be hard enough to analyze. [I'm working from memory here. It might have been Niklaus Wirth. Or possibly Colonel Mustard with a lead pipe.] If you don't buy that, have a look at the "rule 110 automaton".
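Rule 110 makes the point compactly. It is an elementary cellular automaton: a row of 0/1 cells in which each cell's next value depends only on itself and its two neighbors, and the entire rule is encoded in the binary digits of the number 110. Despite that, it has been shown to be Turing complete, so questions about its long-run behavior are as hard as questions about programs in general. The sketch below is a plain simulation, assuming cells beyond either end of the row are always 0 (wrapping around is an equally common choice).

```python
RULE = 110  # bit k of 110 gives the next state for the neighborhood whose value is k


def step(cells: list[int]) -> list[int]:
    """Apply one rule-110 update; cells beyond the ends are treated as 0."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        neighborhood = (left << 2) | (cells[i] << 1) | right  # a value from 0 to 7
        out.append((RULE >> neighborhood) & 1)
    return out


def run(width: int = 32, generations: int = 16) -> None:
    """Print the automaton growing leftward from a single live cell at the right edge."""
    row = [0] * (width - 1) + [1]
    for _ in range(generations):
        print("".join("#" if c else "." for c in row))
        row = step(row)


run()
```

Eleven lines of update logic, no subroutines to speak of, and the output is already irregular enough that predicting it far ahead is not a matter of reading the code.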

So what, if anything, can we conclude here? I like to come back to a few recurrent themes:

The algorithms are not as important as the monumental efforts of the security experts and administrators who track down what's actually going on and try to fix it.
What's deployed trumps what's implemented trumps what's specified.
The technical picture is not the whole picture. If your identity is stolen, it doesn't matter so much how it happened. There's a crime to be investigated and a reputation to be repaired in any case.
There are no 100% solutions. Even when the latest exploits are countered, there will still be vulnerabilities. The only secure computer is one that's turned off -- and you don't want to be too sure about that.

I still don't think I've quite said what I'd meant to say here, but I'm going to stop for a while now.

Tuesday, August 12, 2008

Now available in glorious PGP

OK, I'm really not sure what I might use this for right now, but I've exported a public key to hkp://pgp.mit.edu:11371. The key id is 4E0EED6D [Not any more, apparently, but no matter. The public key is right here. The problem is more, what did I do with the private key, and what password did I use? --D.H. June 2015][I don't think I ever used this key for anything, and I have only the foggiest idea where the private key might be. The server at pgp.mit.edu still doesn't turn up anything, nor is its FAQ particularly helpful ... --D.H. Dec 2018], and it goes a little something like this:

Tony Hoare's revenge

C.A.R. Hoare ("Tony" to his friends and to random upstart bloggers) has had many worthy things to say about computing. One that sticks in my mind most is this:

I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

(This is taken from Hoare's ACM Turing award lecture The Emperor's Old Clothes. The same sentiment can occasionally be seen floating around the net in somewhat simpler form; I'd like to think Hoare would appreciate the simpler formulations.)

Two of the more spectacular exploits at the latest Black Hat conference prove the point, in a way. The exploits involve DNS and routing. Now on the one hand, DNS and routing are not monsters. The same basic standards have been in use for decades now (with updates from time to time) and have scaled from the early internet to the sprawling virtual metropolis we know today. The concepts behind them are well-established. Implementations abound. All of these are a good sign that something has been done right.

So really, the designs are remarkably good, particularly considering the vast changes that have come to pass over the past two decades or so -- changes that those standards had a major role in effecting. The problem is that the systems that DNS and routing give rise to, what you get when you actually deploy them on thousands or millions of hosts and pump zillions of packets through them, under the administration of any number of entities, are beyond ferociously complex.

Which brings us back to Hoare's observation. The recent exploits are not ferociously complex. From what I understand, they are rather elegant. I would call them "neat hacks" but for the horrible confusion that comes from using "hack" in its earlier sense, particularly in this context. They do, however, take advantage of systems that have grown far beyond anyone's easy control. From this point of view, the concern is not so much the exploits themselves, but the difficulty of patching the live internet to counter them.

Thursday, August 7, 2008

Deja vu in the Galaxy Zoo

Hmm ... where have I seen this before?

Wednesday, August 6, 2008

This should come as no surprise

[Um, apparently I saved a draft of this on August 6, but neglected to, erm, actually publish it. I'd like to say I was reasonably up-to-date on this one, but obviously I snoozed and I lost. Here it is anyway.]
It's now official: There's a hole in DNS.

After giving the major players a month's head start, Dan Kaminsky is going public with a major DNS exploit, just in time for the Black Hat security conference. Apparently, it only took half of that month for people to reverse-engineer the exploit, and there have already been reports of people using DNS poisoning to snag clicks and claim ad revenue from them. Considering the possibilities, that's relatively benign, but there's no guarantee that worse hasn't been done. And those who would know have various reasons for not telling.

Paul Vixie, for one, has gone on record repeatedly about DNS's weaknesses (see here, for example), so it should come as no surprise that he's not surprised: "Quite frankly, all the pieces of this have been staring us in the face for decades, and none of us saw it until Dan put it all together." I'd take that as more "We knew this would happen, we just weren't sure how." as opposed to "How could we have missed this?"

So one of the major pieces of net.infrastructure is vulnerable, and complex enough in its fully-operational glory that you wouldn't want to bet that we've heard the last of this. It's also fundamental enough that many of the standard security measures, like SSL for example, depend on it.

What now? Most likely tactical patches in the short term, and, with luck, a serious re-think about how to get to "DNS 2.0". My understanding is that the pieces of that are also already reasonably familiar, but not well deployed. Along the way, expect to see strong security mechanisms like keys and certificates take root in more places, though not necessarily all that visibly.
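Some back-of-the-envelope arithmetic helps show why this was alarming and what the tactical patches buy. A classic cache-poisoning attempt has to get a forged answer accepted before the real one arrives, and the forgery has to match the query's 16-bit transaction ID. As I understand it, Kaminsky's twist was that an attacker could keep triggering fresh lookups for made-up names, so each lost race costs almost nothing, and the widely deployed short-term patch added randomness by also varying the query's UDP source port. The sketch below assumes uniformly random IDs and ignores timing; the 100 forged replies per race and the roughly 16 extra bits from the port are illustrative assumptions, not measurements.

```python
def p_race_won(forged_replies: int, id_space: int) -> float:
    """Chance that one of `forged_replies` distinct guesses hits the right value
    among `id_space` equally likely values, in a single race (timing ignored)."""
    return min(1.0, forged_replies / id_space)


def p_success_after(races: int, forged_replies: int, id_space: int) -> float:
    """Chance of winning at least one of `races` independent races."""
    p = p_race_won(forged_replies, id_space)
    return 1.0 - (1.0 - p) ** races


TXID_ONLY = 2 ** 16                # the 16-bit DNS query ID
TXID_AND_PORT = 2 ** 16 * 2 ** 16  # assumption: up to ~16 more bits from a random source port

for races in (1, 1_000, 10_000):
    print(f"{races:>6} races: "
          f"ID only {p_success_after(races, 100, TXID_ONLY):.4f}, "
          f"ID + port {p_success_after(races, 100, TXID_AND_PORT):.2e}")
```

The point is the ratio rather than the exact figures: multiplying the guessing space by about 65,000 turns a likely win into an unlikely one. That buys time, but it doesn't make forged answers impossible, which is why the longer-term interest is in signed DNS data.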

[DNS has had several security extensions added since this was written, and, empirically, DNS seems to work, but it's not clear to me exactly why. In particular, is DNS fundamentally more secure, has the web at large come to rely less on its security, or some of both? -- D.H. June 2015]

Tuesday, August 5, 2008

Throwing music at the web

A while ago, when I duly reported Radiohead's "No Really, it's up to you" pricing scheme, I had forgotten all about Kristin Hersh's throwingmusic site. For a decade now Hersh has been offering her work online in various forms, starting with the "Work in Progress" mp3 subscription service and including a fairly conventional online album release (the release, if not the album, being fairly conventional).

Then in 2005, Hersh released an EP by her band 50FootWave (or L'~ if you prefer) in a name-your-own-price scheme under a Creative Commons license. Over 2 million copies were downloaded. I don't know what she got for them, but the EP's title, "Free Music", may have biased the price a bit. In fact, it's still available under the free music section at throwingmusic.

All of this seems somewhat staid, however, next to the "10-4" project, wherein Hersh offered to burn personalized CDs containing any 10 songs from a menu of 200, along with a personal dedication to the "special boy or girl who ordered it." At $50 a pop, the price was deliberately high to limit interest. Within 20 minutes there were 100 orders.

If you think this is easy money, you might want to see what Hersh actually ended up doing to get the results to her and her fans' satisfaction.
