Field notes on the Web: 2009

Thursday, December 31, 2009

TWC and Fox in the new year

As of this writing it appears that the game of chicken that Time Warner Cable and News Corp have been playing is likely to end up with the two splatting into each other at high speed. News Corp wants more cash from TWC for carrying its popular TV programming, including US college football bowls and sitcoms such as It's Always Sunny in Philadelphia. Time Warner Cable (now independent of former parent Time Warner) claims it would have to pass along $1 per subscriber and is leery of setting a precedent that the other major TV networks could then use to their advantage.

So far there has been no agreement and there's every possibility that Fox channels such as FX, Speed, Fuel and Fox Soccer Channel -- but not Fox News or Fox Business, which have separate contracts -- could simply go dark to TWC subscribers at midnight tonight.

So what does this have to do with the web? For one thing, TWC is also a major ISP (full disclosure: they're mine, in particular). For another thing, Hulu (part-owned by Fox) will be happy to show you Sunny and other FX shows (not sure about the sports), whether over TWC's pipe or someone else's. That's particularly interesting since Hulu is supported by ads, and declining ad revenue is one of the reasons Fox wants to charge cable companies like TWC more. Another might possibly be that Fox wouldn't mind cable subscribers dumping cable in favor of an online service Fox has a piece of.

Nah.

Monday, December 28, 2009

How disrupted is technology?

The idea of disruptive technology is that changes in technology can bring about significant changes in society as a whole. But how much does technology itself change?

Let's start with what's on my screen right now:

A browser
An email client
A few explorer/navigators or whatever you call things that let you browse through a file system.
A couple of flavors of text editor
A command-line terminal (which I often don't have open) into which I mostly type commands I learned over twenty years ago.
If I'm working with code, I'll also have an IDE running

Everything on that list could have been there ten or fifteen years ago [The browser has since eaten the email client and a couple of flavors of text editor. In general the trend seems to have been spending more and more time in the browser, to the point that, at least on my laptop at home, I spend virtually all the time in the browser. I don't know what the split is between apps and browser pages on my phone, but certainly almost all screen time on a modern phone is web-driven one way or another --D.H. Dec 2015].

Now that first item, "browser", is a bit misleading because what you can access with the browser has changed significantly over the last decade or so, but even then the last major changes in browser technology, namely the key pieces of AJAX, happened over a decade ago. Again, I'm talking about disrupted technology here, not disruptive technology. Whatever changes the web and the computer desktop have wrought over the last decade, the underlying technology hasn't changed fundamentally.

What about the broadband connection behind the browser? There's broadband and then there's broadband, but if you mean "much faster than dial-up, fast enough to stream some sort of audio and video", that's been widely available for years as well. What about the server farms full of virtual machines at the other end of that connection? The whole point of such server farms is that they're using off-the-shelf parts, not the bleeding edge. Virtualization has become a buzzword lately, but the basic concept has been in practice for decades.

In short, I don't see any fundamental shifts in the underlying technology of the web. In fact, it seems just as likely that it's the stability of web technology that's enabled applications like e-commerce and social networking to build out over the last decade. Whether those are disruptive is a separate question, one which I've been chewing on for a while, mostly under the rubric of not-so-disruptive technology (in case you wonder where I stand on the matter).

Now, it's a legitimate question whether a decade or so is a long time or a short time. If you're a historian, it's a short time, but wasn't even one year supposed to be a long time in "internet time"?

Sunday, December 20, 2009

This one has a little bit of everything

For quite a while, the Did you feel it? link on the USGS web site has given the general public a chance to report earthquakes. This allows the seismologists to get a quick fix on the location and intensity of a quake before their instruments can produce more precise results -- seismic waves take time to travel through the earth.

This is a nice bit of crowdsourcing, somewhat akin to Galaxy Zoo, but it depends on people getting to the USGS site soon after they feel an earthquake. Some people are happy to do just that, but it's not necessarily everyone's top priority. So now the USGS has started searching through Twitter for keywords like "earthquake" or "shaking", and they're finding enough to be useful. The tweets range from a simple "Earthquake! OMG!" to something more like "The ceiling fan is swaying and my aunt's vase just fell off the top shelf," which gives some idea of magnitude.

As with Twitter in Iran, tweets are a great primary source of information, but you need to sift through them to get useful data. As with Google Flu, mining tweets doesn't require active cooperation from the people supplying the data. Rather, it mines data that people have already chosen to make public. In the case of Google Flu, Google is trying to use its awesome power for good by mining information that people give up in exchange for being able to use Google. (You have read Google's privacy policy, haven't you?) With Twitter, the picture is much simpler: The whole point is that you're broadcasting your message to the world.

It should come as no surprise that tweets about seismic activity are much more useful if you know where they came from (though even the raw timestamp should be of some use). Recently (November 2009), Twitter announced that its geotagging API had gone live. This allows twitterers to choose to supply location information with their tweets. The opt-in approach is definitely called for here, but even so there are serious questions of privacy. Martin Bryant has a good summary, to which I'll add that information about your location over time is a very good indicator of who you are.

Wednesday, December 16, 2009

And I would want to do this, why?

As I've mentioned, Google has decided that not only are blog readers a potential market for ads, so are the bloggers themselves. One ad, in particular, offers to print one's blog in book form. I can see the appeal of that in general, and I'm not alone, but the devil is in the details. The details in this case are not particularly attractive.

They're offering to print up your blog in softcover for $15 for the first 20 pages and $0.35 for each additional page. This is way, way more than Lulu charges to print on demand, so they're essentially charging you a hefty premium for scraping your blog and formatting it lightly before printing it.
The formatting options are extremely limited. You can show the entries in forward or reverse order, with or without comments. I didn't find out whether they carry along the style of the online version.
I didn't see any mention of indexing or table of contents.
As far as I can tell you can't even customize the title. You can, however, add an introduction/dedication of ... wait for it ... up to 350 characters!

I wouldn't call this a scam by any means, as they say up front what they do and what it costs, but it's definitely vanity publishing in the broader sense (see here for some background on what I mean by that). It's certainly not a commercially viable proposition [for the blogger, that is].

If I were to produce a print edition of Field Notes, I'd take a somewhat different approach:

I'd regroup the posts by theme so it becomes painfully obvious how often I've flogged each dead horse. The tags would be of some help here, but only some.
I'd provide a short lead-in for each section and a longer introduction for the book.
I'd probably do some light editing to improve the flow from one post to the next.
I'd give some indication of links between posts and probably selected external links. Sidenotes, maybe.
I'd clean up some of the formatting for consistency's sake, particularly the pseudo footnotes that appear here and there and maybe the editorial notes I sometimes add after the fact.
I'd take out any superfluous commas and parentheses I missed the first time around.
I'd provide a table of contents and index. Again the tags would be of some help, but only some, in constructing the index.
Along the way I'd probably end up doing some gardening in the blog itself, cleaning up tags and tweaking posts.
I'd title it Field Notes on the Web: Old-media Remix

All of this would entail quite a bit of hand-editing, some custom scripting/XML-bashing, considerable puzzling over what belongs in which section and, not least, re-reading the whole blog and the finished result from start to finish multiple times. To make this worth my while I'd need to see some indication that people would buy it, and I'm at least a couple of orders of magnitude away from that level of readership.

So if you're interested in my version, tell a hundred or so of your closest friends to stop by, and tell them to tell their friends, etc. Go ahead. I'll wait. In the meantime, if you really, really want to get your hands on a printed, bound copy of a bunch of Field Notes, feel free to track down the service yourself. As far as I can tell, they don't really care whose blog you print, so long as you print it. Myself, I don't see the point.

Tuesday, December 15, 2009

Surfing up and down the east coast

In a previous life, I made a couple of long-haul bus trips across the American southwest. It was an option worth considering for someone with more time than money -- an undergraduate, say -- provided the traveler was someone -- an undergraduate, say -- who didn't mind sitting and occasionally half-sleeping in cramped quarters in the company of all manner of interesting people. On the one hand it's slow, something over 40 hours of nearly continuous driving to get about halfway across the continent. On the other hand, you see all sorts of things you'll completely miss in flyover mode. That cuts both ways.

Here in the 21st century it's a whole different game, at least in the Boston-DC corridor. Two carriers, BoltBus and megabus.com, are offering service between major cities aimed not only at scruffy undergrads but also at business travelers. What makes it work?

Speed is not such a problem in the target market. Flying will still be considerably faster for the longer routes, even counting time to and from airports, but DC - New York is nowhere near as long a haul as LA - Albuquerque.
The bus is still way cheaper.
These busses are spiffy, new, less cramped than Ye Olde Greyhound and ... you knew it had to be in here somewhere ... webby.

Not only do you book online and bring an email confirmation as your ticket, you can surf while you're on the bus at no extra charge. Check your social networking site, fire off a few emails and tweets, who knows, maybe even get some work done while you're en route.

How does it work? I wasn't able to track down the exact mechanism, but I would have to guess a series of WiMax towers along the route feeding the on-board WiFi.

How well does it work? Well, BoltBus's FAQ cautions that "This technology is new, and there are spots on the trip where the service may be unavailable. We also do not advise downloading large files, as the speed will be relatively slow [...] Plug in and Wi-Fi disclaimer: BoltBus makes every effort to provide these services free of charge to every passenger. However, if, for whatever reason, the service is unavailable we are unable to supply a refund."

Well, whaddya want for a nickel?

I'm not sure which company's transparent attempt to sound like Bus 2.0 works better. You'd think that either CamelCaseNames or domainname.com would be about to fall out of fashion any day now, so I'll give the edge to BoltBus for not using mega.

[Looks like both of these outfits are still in business --D.H. Dec 2015]

Monday, December 14, 2009

Sunday, December 13, 2009

Additive change considered useful

This post is going to be a bit more hard-core geekly than most, but as with previous such posts, I'm hoping the main point will be clear even if you replace all the geek terms with "peanut butter" or similar.

Re-reading my post on Tog's predictions from 1994, I was struck by something that I'd originally glossed over. The prediction in question was:

The three major operating systems in use today, DOS/Windows, Macintosh, and Unix, were all launched in the seventies. They are old, tired, and creaking under the weight of today's tasks and opportunities. A new generation of object-oriented systems is waiting in the wings.

My specific response was that object-oriented programming has indeed become prominent, but that for the most part object-oriented applications run on top of the same three operating systems. I also speculated, generally, that such predictions tend to fail because they focus too strongly on present trends and assume that they will continue to the logical conclusion of sweeping everything else aside. But in fact, trends come and go.

Fair enough, and I still believe it goes a long way towards explaining how people can consistently misread the implications of trends, but why doesn't the new actually sweep aside the old, even in a field like software where everything is just bits in infinitely modifiable memory?

The particular case of object-oriented operating systems gives a good clue as to why, and the clue is in the phrase I originally glossed over: object-oriented operating systems. I instinctively referred to object-oriented programming instead, precisely because object-oriented operating systems didn't supplant the usual suspects, old and creaky though they might be.

The reason seems pretty simple to me: Sweeping aside the old is more trouble than it's worth.

The operating system is the lowest level of a software platform. It's responsible for making a collection of hard drives and such into a file system, for sending the right bits to the video card to put images on the screen, for telling the memory management unit what goes where, for dealing with processor interrupts and scheduling, and for other such finicky stuff. It embodies not just the data structures and algorithms taught in introductory OS classes, but, crucially, huge amounts of specific knowledge about CPUs, memory management units, I/O buses, hundreds of models of video cards, keyboards, mice, etc., etc., etc.

For example, a single person was able to put together the basic elements of the Linux kernel, but its modern incarnation runs to millions of lines and is maintained by a dedicated core team and who knows how many contributors in all. This for the kernel. The outer layers are even bigger.

It's all written in an unholy combination of assembler and C with a heavy dose of magical functions and macros you won't find anywhere else, and it takes experience and a particular kind of mind to hack in any significant way. I don't have the particulars on Mac OS and DOS/Windows, but the basics are the same: Huge amounts of specialized knowledge distributed through millions of lines of code.

So, while it might be nice to have that codebase be written in your favorite OO language, leaving aside that using an OO platform enables but certainly does not automatically bring about several improvements in code quality, why would anyone in their right mind want to rewrite millions of lines of tested, shipping code? As far as function is concerned, it ain't broke, and where it is broke, it can be fixed for much, much less than the cost of rewriting. Sure, the structure might not be what you'd want, and sure, that has incremental costs, but so what? The change just isn't worth it*.

So instead, we have a variety of ways to write desktop applications, some of them OO but all running on one of the old standbys.

Except ...

An application developer would rather not see an operating system. You don't want to know what exact system calls you need to open a TCP connection. You just want a TCP connection. To this end, the various OS vendors also supply standard APIs that handle the details for you. Naturally, each vendor's API is tuned toward the underlying OS, leading to all manner of differences, some essential, many not so essential. If only there were a single API to deal with no matter which platform you're on.
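As a rough illustration of "you just want a TCP connection", here is a minimal sketch in Python, chosen purely for illustration. The standard library's socket.create_connection handles name resolution, socket creation and the connect call behind one function. The names open_tcp and loopback_demo are made up for this sketch, and the demo only talks to itself over the loopback address so it doesn't depend on any outside server.

```python
import socket


def open_tcp(host, port, timeout=5.0):
    """Return a connected TCP socket.

    Name resolution, socket creation and connect() all happen inside the
    standard library; the caller never sees the individual system calls.
    """
    return socket.create_connection((host, port), timeout=timeout)


def loopback_demo(message=b"hello"):
    """Send `message` to ourselves over TCP and return what arrives."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]

        with open_tcp("127.0.0.1", port) as client:
            server_side, _ = listener.accept()
            with server_side:
                server_side.sendall(message)
            # The server side is closed now, so read until end-of-stream.
            chunks = []
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    finally:
        listener.close()
```

The same open_tcp call works unchanged on Linux, the Mac and Windows, which is the appeal of a common API layered over each vendor's own.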

There have been many attempts at such a lingua franca over the years. One of the more prominent ones is Java's JVM, of course. While it's not quite the "write once, run anywhere" magic bullet it's sometimes been hyped to be, it works pretty well in practice. And it's OO.

And it has been implemented on bare metal, notably on the ARM architecture. If you're running on that -- and if you're writing an app for a cell phone you may well be** -- you're effectively running on an OO operating system [Or nearly so. The JVM, which talks to the actual hardware, isn't written in Java, but everything above that is][Re-reading, I don't think I made it clear that in the usual Java setup, the JVM is relying on the underlying OS to talk to the hardware, making it a skin over a presumably non-OO OS. In the ARM case, you have an almost entirely OO platform from the bare metal up, the exception being the guts of the JVM].

Why did this work? Because ARM was a new architecture. There was no installed base of millions of users and there weren't hundreds of flavors of peripherals to deal with. Better yet, a cell phone is not an open device. You're not going to go out and buy some new video card for it. The first part gives room to take a new approach to the basic OS task of talking to the hardware. The second makes it much more tractable to do so.

What do the two cases, of Java sitting on existing operating systems in the desktop world but on the bare metal in the cell phone world, have in common? In both cases the change has been additive. The existing operating systems were not swept away because in the first case it would have been madness and in the second case there was nothing to sweep away.

* If you're curious, the Linux kernel developers give detailed reasons why they don't use C++ (Linus's take is particularly caustic). Whether or not we choose to count C++ as an OO language, the discussion of the costs and benefits is entirely valid. Interestingly, one of the points is that significant portions of the kernel are object-oriented, even though they're written in plain C.

** I wasn't able to run down clear numbers on how many cell phones in actual use run with this particular combination. I believe it's a lot.

Saturday, December 12, 2009

Usability, convention and standards

A while back now, I wrote about the importance of convention in life in general and by extension on the web in particular. Earl commented, wondering whether there was such a thing as an encyclopedia of conventions. A bit later, I asked a colleague who designs user interfaces about that. I wasn't expecting an actual encyclopedia, but perhaps a few widely-used reference works or such.

My colleague chuckled and pointed at Jakob's Law, named after Jakob Nielsen, which states that "Users spend most of their time on other sites." This ensures a certain homogeneity, since sites that don't conform to what everyone else is doing will tend to be hard to navigate and so lose traffic. As a corollary, conventions arise not by fiat from some authority, but de facto from what the most prominent sites do.

Fair enough, but that can't be the whole story. Some conventions are dictated by standards. Take URLs, for example. Granted, the casual web user can get by without seeing them much, but they can't be ignored completely, and every bit of a URL is dictated by standards. For example:

The http: or https: prefix is the standard designation of the Hypertext Transfer Protocol (RFC 2616). The URL syntax itself is specified in RFC 1738 and others.
HTTP URLs, and several other flavors, are hierarchical, meaning that they can be broken down into a list of sub-parts separated by slashes. Why slashes and not backslashes? That's what the standard calls for.
The authority part of an HTTP URL is the name of a host to which you should be able to make an HTTP connection*, typically something like www.example.com. The parts-separated-by-dots aspect is specified as part of DNS (RFC 1035).
Why do so many domain names end in .com? Thank the IANA.
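The pieces listed above can be pulled apart mechanically. The following is a minimal sketch using urlsplit from Python's standard urllib.parse module; the helper name describe_url and the sample URL are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the standard-dictated pieces discussed above."""
    parts = urlsplit(url)
    return {
        # The prefix before "://", e.g. "http" or "https"
        "scheme": parts.scheme,
        # The host name, split on the dots that DNS calls for
        "host_labels": parts.hostname.split(".") if parts.hostname else [],
        # The hierarchical path, split on forward slashes
        "path_segments": [seg for seg in parts.path.split("/") if seg],
    }


if __name__ == "__main__":
    print(describe_url("https://www.example.com/blog/2009/12"))
```

Nothing here is specific to any one site: the same split works on any HTTP URL precisely because the syntax is fixed by the standards rather than by each site's taste.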

There are also empirically-derived results that put limits on what will or will not work. Fitts's law, for example, states that how long it takes to point at something depends on how big it is and how close it is**. This has a strong effect on how well a user interface works. This in turn has at least some effect on how widespread a particular approach becomes.

But hang on. What did I just say, "has some effect"? Fitts's law has something of the character of a law when it comes to measuring how long it takes people to, say, find and select a menu option. There's laboratory evidence for it. It has less of the character of a law when it comes to determining what real interfaces look like. That's determined in large part by what the prominent vendors happen to put out and by similar factors having little to do with the merits of the interfaces themselves.
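For reference, the "how big and how close" relationship is commonly written in the so-called Shannon formulation as MT = a + b * log2(1 + D/W), where D is the distance to the target, W is its width, and a and b are constants fitted from experiments. The sketch below assumes that formulation; the function name and the default values of a and b are placeholders, not measured figures.

```python
import math


def fitts_movement_time(distance, width, a=0.0, b=0.1):
    """Predicted pointing time: MT = a + b * log2(1 + distance / width).

    `a` and `b` are empirically fitted constants; the defaults here are
    placeholders for illustration only.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(1 + distance / width)


if __name__ == "__main__":
    # A bigger target at the same distance is predicted to be faster to hit.
    print(fitts_movement_time(700, 100))
    print(fitts_movement_time(700, 200))
```

Doubling the target width or halving the distance shrinks the logarithm's argument, which is the formula's way of saying that big, nearby things are quicker to point at.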

And those standards? It's slashes and not backslashes because the first web servers used the UNIX convention of forward slashes and not (say) the DOS convention of backslashes. Moreover, we use URLs and HTTP at all, and not some other set of standards with similar intent, because they caught on.

Anyone can write a standard. Writing a widely-accepted standard is a different matter, and it helps greatly to start with something people are already using. Why standardize something people are already using? Because people want some assurance that when they say "Foo protocol", they mean the same thing as everyone else, and in particular, their Foo implementation will work with everyone else's. Typically it goes something like this:

Someone puts out a nifty application that, say, uses something new called "Foo protocol" to coordinate its actions.
The app is so nifty that other people want to write pieces that work with it. So they figure out how to make their stuff speak Foo, whether by reading the source (if it's open), or reverse engineering, or by whatever other means.
Unfortunately, everyone's implementation works just a bit differently. At first, it doesn't matter much because I'm using my Foo to do something I know about and you're using yours to do something you know about.
But sooner or later, people start using Foo to do something new. This happens with both the original application and with the third parties.
"Hey wait a minute! I was expecting your server to do X when I sent it a Blah message, but it did Y!" "Well, that always worked before, and it was consistent with the other stuff we were doing ..."

And thus is born the working group on Foo. With luck and a following wind, a written standard pops out some time later, and if adoption is good, the working group on Foo interoperability comes along to work out what the Foo standard really means.

But I digress. The main point, if there is one, is that conventions seem to arise organically, influenced by considerations such as existing standards and the technical details of the problems being addressed, but ultimately decided by accidents of history such as what happened to catch on first.

(*) Strictly speaking the part right after http:// doesn't have to be a real web server. But let's just pretend it does.

(**) Tog has an interesting quiz on the topic.

Wednesday, December 9, 2009

More bad news for Murdoch

Rupert Murdoch's approach to online content rests on a few basic tenets:

News aggregators* such as Google and Yahoo are stealing News Corp's content
People's online reading habits will reflect their newspaper buying habits
If the major outlets can all be persuaded to charge for content, people will have to pay

To which The Economist, citing a study by media consultancy Oliver & Ohlbaum, rebuts

People don't generally find their news through aggregators, so it's rather moot whether news aggregators are stealing or not
People's online reading habits have very little to do with what print publications they buy -- they'll read pretty much anything online
As more outlets decide to charge for content, people become less likely to pay

These findings fit against Murdoch's stated positions so tightly one would almost think they were aimed deliberately at rebutting them, and the last item is based on people's answers to a hypothetical question, but still ... the study does seem to put some meat on the bones of what a whole lot of web-savvy people have already been thinking and saying.

* Not to be confused with the search engines themselves, but Murdoch is also having a go at reining them in.

Tuesday, December 8, 2009

Real-time Google

[Not to be confused with Google Instant, which shows search results in real time.]

I tend to operate somewhat slower than real time myself, so I may not get around to investigating Google's latest magical trick, real-time search, right away, but for me what jumped out of CNET's article on it wasn't the inevitable Google-Twitter partnership, but that

Real-time search at Google involves more than just social-networking and microblogging services. While Google will get information pushed to it through deals with those companies, it also has improved its crawlers to index and display virtually any Web page as it is generated.

That's been coming along, by degrees, for a while, but it still seems kind of eerie.

[Sure enough, by the time I could decide that 'real-time search, right away, but for me what jumped out of' would be unique enough to find this post, and put it into Google, it was already in the index. Granted, Blogger is Google territory. Still pretty slick, though]

Monday, December 7, 2009

The future isn't what it used to be

Further into my digression into usability land -- a fine and useful place to digress, I might add -- I ran across the introduction to Bruce Tognazzini's Tog on Software Design, written in 1994, predicting the tumult of the next decade. Demonstrating that being a brilliant UI designer does not necessarily make one a brilliant futurologist, it nicely summarizes the "internet will change everything" vibe that was particularly strong then and still alive and kicking to this day. As such, it provides a fine chance to jump on my "not ... so ... fast" hobby horse and respond. Maybe even get it out of my system for a while.

Nah.

Following is a series of quotes, probably on the hairy edge of fair use. I had originally done the old point-by-point reply, but the result was tedious even for me to read, so instead let's pause to contemplate some of the more forceful statements in the area of technology ...

[W]ithin only a few more years, electronic readers thinner than this book, featuring high-definition, paper-white displays, will begin the slow death-knell for the tree mausoleums we call bookstores.
...
The three major operating systems in use today, DOS/Windows, Macintosh, and Unix, were all launched in the seventies. They are old, tired, and creaking under the weight of today's tasks and opportunities. A new generation of object-oriented systems is waiting in the wings.
...
[Cyberspace] will be an alternate universe that will be just as sensory, just as real, just as compelling as the physical universe to which we have until now been bound.

... economics ...

Every retail business from small stores to shopping centers to even the large discount superstores will feel an increasing pinch from mail-order, as people shop comfortably and safely in the privacy of their own homes from electronic, interactive catalogs.
...
a new electronic economy will likely soon rise, based on a system of barter and anonymous electronic currency that not even the finest nets of government intrusion will be able to sieve. [Bitcoin, anyone? --D.H. May 2015]

... society ...

Security is as much an illusion, as naïve, idealistic hackers automate their activities and release them, copyright-free, to an awaiting world of less talented thieves and charlatans. Orwell's prediction of intrusion is indeed coming true, but government is taking a back seat to the activities of both our largest corporations and our next-door neighbors. The trend will be reversed as the network is finally made safe, both for business and for individuals, but it will be accomplished by new technology, new social custom, and new approaches to law. The old will not work.
...
More and more corporations are embracing telecommuting, freeing their workers from the drudgery of the morning commute and society from the wear, tear, upkeep, and pollution of their physical vehicles. They will flit around Cyberspace instead, leaving in their wake only a trail of ones and zeros.
...
Dry-as-dust, committee-created and politically-safe textbooks will be swept away by the tide of rough, raw, real knowledge pouring forth from the Cyberspace spigot.

... and the creative world ...

As the revolution continues, our society will enjoy a blossoming of creative expression the likes of which the world has never seen.
...
[W]e are also seeing the emergence of a new and powerful form of expression, as works grow, change, and divide, with each new artist adding to these living collages of color, form, and action. If history repeats itself--and it will--we can expect a period of increasing repression as corporate intellectual property attorneys try desperately to hold onto the past.
...
Writers will no longer need to curry the favor of a publisher to be heard, and readers will be faced with a bewildering array of unrefereed, often inaccurate (to put it mildly), works.

Start with what more-or-less panned out: Object-oriented development has taken root. People do shop online. People do telecommute. Corporate intellectual property attorneys have indeed tried to put various genies back in their bottles. Blogs supply a bewildering array of unrefereed works (not to be confused with "rough, raw, real knowledge pouring forth"). Whether anyone reads them is a different matter.

Much more prominent here is what didn't happen, and there's a clear pattern: The new did not sweep aside the old. The most telling phrase along those lines is "mail-order". If you look at online shopping as a completely new way of doing business, then it's obvious that WebVan, eToys and Pets.com are going to slay the dinosaurs. But if you look at the web as the latest heir to the Sears Catalog, it's no surprise what actually happened. Far from feeling the pinch, the big box stores have simply added online shopping to their marketing arsenal.

And so on down the line: Object-oriented platforms are definitely here, but they generally run on DOS/Windows, Unix/Linux or the Mac. Various net-borne security threats have come along, but a scam is still a scam and a bank is still a bank. Some people telecommute now, but most can't and many would prefer not to. Wikipedia came along but textbooks are still here. Blogs and twitter came along, but major media outlets are still here. Record labels still produce music, studios still produce movies and publishers still publish. Often on paper, even.

I've left out a few more of the original predictions in the interest of brevity and because, though they would be interesting to discuss, they would take longer to go into in sufficient depth. I'm thinking particularly of the items about licensing fees and micropayments, and the have/have not divide. However I don't believe these omissions materially affect my main thesis that this piece, and many like it, are based mainly on taking what's hot at the moment and predicting that it will push everything else aside.

Why, then, the willingness to believe that today's particular preoccupations will devour the future? Paradoxically, I think it may come of an inability to see change. If, to take a contemporary example, Twitter and social networking are all that everyone's talking (or tweeting) about, then simple inertia can lead one to assume that they're all that everyone will be talking about tomorrow, or in a month, or in a decade.

This is evident in one of the more jarring ironies in the piece: Directly after declaring that "Saying Information Superhighway is no longer cool," Tog goes on to extol Cyberspace.

Remember Cyberspace?

[For a bit more on this thread, see this later post --D.H. Dec 2015]

Monday, November 30, 2009

Required reading I haven't read yet

I was going to do a followup on my earlier post on the importance of convention, and I may yet, but as I was researching that response a colleague pointed me at Jakob Nielsen's web site. Nielsen has for some time now been a major voice when it comes to usability on the web, sort of the web's answer to Bruce "Tog" Tognazzini, not that Tog is a stranger to the web, or that he and Nielsen are strangers to each other.

The two have a lot in common, not surprisingly. Notably, they share the strong conviction that if you want to figure out how people are going to respond to a system, you need to actually bring in some people, put them in front of the system and observe what happens.

They also have odd-looking sites. When I first saw Tog's, I thought I'd gotten the name wrong (even though it was the top google hit) and landed on one of those "sorry, the site you wanted is no longer there but here are some commercial links with the words you asked for in them" sites. I can't say why, but it's probably to do with the way the page is laid out, including plain-text ads for Tog's courses. Nielsen, for his part, is defiantly old school, almost all text and links, with only a yellow stripe and multicolored logotype at the top, oddly at odds with Nielsen's advice to avoid looking like an ad at all costs (It's item 7. Curiously there are no anchors for the items themselves).

Both seem to think sans-serif body fonts are fine.

No matter. Both sites are chock full of useful information, clearly and engagingly presented. You might not agree with everything they say, but that's not the point of required reading. Agree or disagree, one should at least know the major arguments. So I'll, um, be working on that ...

[Both Tog's and Nielsen's sites are still around. They're considerably spiffier than the description above, but still quite spare compared to many other sites. I would expect that this is more what they were originally going for, but that the tools at the time didn't give satisfactory results. --D.H. May 2015]

Sunday, November 29, 2009

Next blog, please

I'm not sure why this is happening just now, but a portion of the visitors of this blog are visiting a link called /?expref=next-blog. My guess is that, rather than searching for this particular next blog link, these folks got here by starting someplace else and clicking on next blog until they saw something they liked or got bored and went off to do something else. Blog surfing, basically. So I thought I'd do the same and see what else was out there. [The Next Blog button disappeared a while ago, at least from this blog's style sheets --D.H. Dec 2015]

I didn't keep a close count, but the breakdown was roughly:

A few family blogs, as in "here's what my family is up to", including one in Swedish. Sort of a year-round online version of the annual holiday letter to one's far-flung friends and relations.
A few photo blogs, one linked to flickr and offering to sell prints and send e-cards using the images.
A craft blog or two, one in Norwegian and English.
Several poetry blogs
Nothing technical, whether figuring out the web or anything else.

I have no idea if this is a representative sample, or if not, just how it is selected, or in any case, why the tilt towards Scandinavia. The lack of technical content has an easy and -- to my mind -- encouraging explanation: The web really is accessible to a broad range of people, only a few of whom are interested in its technical workings.

From a purely formal point of view, almost all the blogs hew pretty closely to the prototypical one or several contributors posting sporadically about whatever. That makes perfect sense given that the blog is a form, not a genre, but I was still a bit taken by just how much blogs look and smell like blogs.

One formal experiment that I ran across was Quoted Images, Imaged Quotes, in which a photographer and partner are collaborating to produce a captioned image every day for a year. Rather than writing a caption, the captioner chooses a quote to fit the image (or perhaps vice versa, or both). Even this experiment is not without precedent. Thing a week comes to mind.

Monday, November 23, 2009

Rupert and the interwebs, again

If I understand recent reports correctly, Rupert Murdoch's latest attempt to wring money out of online content rests on a simple, intriguing concept: if you can't control what people are reading, control whether they can find it. So News Corp will be partnering with Microsoft and against Google by blocking access from Google's web crawlers and charging Microsoft for the privilege of indexing News Corp content on Bing. With a 10% or so share of search volume (to Google's 60%), Microsoft is eager to give people a reason to switch. News Corp gets paid, so their end is pretty easy to understand.
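The reports don't say how the blocking would be done, but the conventional way for a site to ask a particular crawler to stay away is a robots.txt file. Here is a minimal sketch of what that might look like, checked with Python's standard-library parser. The robots.txt contents, the `allowed` helper and the example URL are all hypothetical, not anything News Corp has published.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn Google's crawler away, welcome everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://news.example.com/story"  # placeholder URL
print(allowed("Googlebot", url))  # False
print(allowed("bingbot", url))    # True
```

Note that robots.txt is a request, not a lock. It only works because well-behaved crawlers choose to honor it.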

On the face of it this seems like a pretty interesting test case of free vs paid content, and the search angle is clever, but just how is this going to work? First, there's the sheer business angle: If I want to read free content from the Wall Street Journal online, it's not exactly hard to find. If I just want to find out about XyzCorp, am I really going to notice that there's nothing there from the Journal? If I am the sort to notice, seems there's a pretty good chance I'm a Journal reader anyway.

For Microsoft, becoming known as the News Corp search engine could be a double-edged sword. It raises obvious issues of bias and could reignite the Microsoft-as-Evil-Empire fire which seemed to have died down of late (or maybe I'm just older now and have worked with enough perfectly reasonable Microsofties).

But more than that is the technical angle. I'm guessing it's going to take, oh, five minutes for some enterprising free content advocate to put up a site in some News Corp-unfriendly jurisdiction that will present essentially the same profile to search engines as the Journal or whatever, with or without actually violating copyright laws. At which point, without any involvement from Google, News Corp is back on Google.

I'd think it would be difficult for Google to stop this sort of thing even if it wanted to, and I'm not convinced they'd want to. They could explicitly blacklist sites, I suppose, in the usual endless cat-and-mouse, but as for automatically figuring out whether a site was just a front for someone else's? Technically enforcing "you can read it but you can't search it" smells like another example of modern-day perpetual motion. Enter the lawyers.

One way or another, there will be lawsuits. My guess -- and I suppose this would be a good time to dust off the old "I am not a lawyer" disclaimer -- is that News Corp will try to assert some right to control searchability, perhaps drawing on existing case law involving reference works and such, but I doubt it will get very far. If it did, it would be game-changing, and not necessarily in a good way.

Sunday, November 22, 2009

Today is yesterday's tomorrow (sort of)

The other night I was watching Ghostbusters II (oh, don't ask why) and right in the middle of it Harold Ramis' character uses The Computer to look up information on a historical figure. I'll use GBII for reference here since it's handy, but I could have picked any number of others.

The Computer has been a staple of science fiction for decades. It's interesting that its role in such movies is very often not to compute but to look something up, as was the case here. Our hero gives the computer the name, and back comes a neatly formatted 80-column by 24-row answer, with underlines and everything, saying who the person is.

Of all the technological devices in such movies, The Computer always seemed among the less plausible. I'm not counting the ghost-zapping equipment as technology; it's magic and falls firmly under suspension of disbelief. The Computer counts as technology because it's assumed just to be there. At some point in the future, super-powerful all-knowing computers will be generally available. How do we know? Just look at the movies ...

There were a couple of reasons The Computer always seemed particularly implausible. First, knowing a bit about real computers makes it harder for me to gloss over the technical hurdles. Force fields? Jet packs? Sure, why not? That's physics. Physics is what you major in if you're too smart for anything else. They'll figure it out. But a computer you can just type some vague query into and get a sensible answer? Come on. Like that'll happen.

Second, it always seemed like a computer smart enough to, essentially, act like the Encyclopedia Galactica would surely have all kinds of other powers that the careful scriptwriter would have to take into account. If The Computer can tell you who the bad guy in the painting is, why can't it tell you how to take him out?

You can probably tell where I'm going with this. Today, about fifteen years after GBII, you can sit down at your home computer, type in the name of a historical figure and very likely come up with a concise, well-formatted description of who the person was, thanks to the now ubiquitous browser/search-engine/Wikipedia setup.

As powerful as it is, though, the system is an idiot savant. It won't tell you how to neutralize a malevolent spirit (or rather, it won't tell you a single, clear way to do so) and it won't do a lot of other things. It just allows you to quickly locate useful information that's already been discovered and made publicly available. It's powerful, but not magic.

What particularly strikes me about the description above is the presence of Wikipedia. Large, fast networks of computers were already building out in 1994. Mosaic came out while GBII was in production. The missing piece, and one that I don't recall very many people predicting, was the massively-collaborative human-powered Wikipedia, not a technical advance in itself, but something very much enabled by several technical advances.

The Internet, HTTP, browsers, scripting languages, broadband, email, databases, server farms, cell phones, etc. -- these are all technologies. Wikipedia isn't, and yet it fits easily and comfortably into the list of advances from the last few decades. It fills a niche that's been anticipated for decades, but -- fascinatingly -- not by the anticipated means of using sheer computing power to somehow divine the history of the world.

Friday, November 20, 2009

How to report an error

Stereotypically, sports fans are unsophisticated, cave-dwelling couch potatoes. And yet, sports web sites tend to be informative -- if you're looking for sports information, at least -- easy to navigate, full of links to outside content, lively and fun. For example, any site can serve up an error code, but how many serve one up like this?

(If you don't see an error message, you'll most likely see a piece on the botched call on Thierry Henry's now-infamous handball against Ireland. I could say more but I won't.)

Wednesday, November 18, 2009

Murdoch vs. the interwebs

A while ago I mentioned that media übermogul Rupert Murdoch is trying to buck the trend toward free content by charging for his properties' online content, and I wondered how it might play out. Unsurprisingly, I wasn't the only one. In the November issue of Vanity Fair, for example, Murdoch biographer Michael Wolff lays out why the "new media" folks think that Murdoch Just Doesn't Get It. Actually, they're not quite that nice about it:

Almost all Internet professionals, on the other hand, think that charging for general-interest news online is fanciful—“Rubbish … bonkers … a crock … a form of madness,” in the description of Emily Bell, who has long run the Guardian newspaper’s Web site, one of the industry’s most successful—and, in fact, it has been tried before and failed. “It’s Groundhog Day,” adds Bell. The New York Times tried to levy a subscription charge for its columnists but reversed course and declared itself free again. Even Murdoch’s Wall Street Journal, the model of subscription content online, has made more and more of its site free.

(Elsewhere Wolff cites the Grauniad, as Private Eye likes to call it, as a prime example of a locally-known publication making itself into an international brand on the web)

There appears to be a wide consensus that in the newspaper and magazine business, charging for content just means driving away the vast bulk of your readers and angering your columnists by cutting off their exposure to lucrative outside gigs:

The position of Internet professionals is straightforward: while it’s possible to charge for certain kinds of specialized information—specifically, information that helps you make money (and that you can, as with an online Wall Street Journal subscription, buy on your company expense account)—there are no significant examples of anyone being able to charge for general-interest information. Sites where pay walls have been erected have suffered cuts in user traffic of, in many cases, as much as 95 percent as audiences merely move on to other, free options.

Evidently Murdoch's track record with online publishing is not that great. Wolff gives a litany of failures, including MySpace being "flattened" by Facebook, and suggests that the pay wall was one reason that Murdoch has already taken a $3 billion writedown on his purchase of the Journal.

Wolff's most interesting argument, though, gives me a chance to drag the not-so-disruptive technology tag out of mothballs:

Murdoch has a larger problem still. It is, after all, not the Internet that has made news free. News in penny-newspaper or broadcast (or bundled cable) form has always been either free or negligibly priced. In almost every commercial iteration, news has been supported by advertising. This is, more than the Internet, Murdoch’s (and every publisher’s) problem: the dramatic downturn in advertising.

[...]

It is hard to imagine that when advertising growth resumes there will not once again be a rush to encourage traffic growth, but right now, the news business, supported for a hundred years by advertising, whose core skill has been selling advertising, believes it must right away, this second, re-create itself with a new business model where advertising is just the cream on top and where it’s the consumer who pays the true cost of newsgathering.

Indeed.

Monday, November 16, 2009

Googling the flu

How can you track the progress of the flu season? If you want rigorous results, you can ask the CDC or its analog where you live for the results of their thorough surveys of public health data. Since that takes a while to compile and cross-check, the definitive answers are weeks out of date. If you're in a hurry, just track how many people are googling for "flu" and similar targets.

The nice folks at Google found that with a little tuning and tweaking, this approach gives results that match the official results remarkably closely. It's not just a neat hack, though it definitely is that. The CDC has taken it seriously enough to collaborate with the Google team on an article published in Nature.
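To give a flavor of what "matching closely" can mean, here is a minimal sketch that computes the Pearson correlation between a weekly search-volume series and an official case-rate series. The numbers are invented purely for illustration and the names are mine. This is not Google's actual model, which I haven't seen, just one simple way to measure how well two series track each other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly data, made up for this example only.
flu_searches = [120, 150, 210, 340, 520, 480, 300]   # "flu" queries per 100k searches
official_rate = [1.1, 1.4, 2.0, 3.1, 4.9, 4.6, 2.8]  # % of doctor visits for flu-like illness

r = pearson(flu_searches, official_rate)
print(f"correlation: {r:.3f}")
```

A value near 1 means the two series rise and fall together. The practical appeal is that the search series is available right away, while the official one arrives weeks later.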

Google is careful to point out that they are using anonymized data aggregated over millions of searches, more than enough for the "I'm Spartacus" effect to come into play. Individual results would be of little help anyway. Just because I google "flu" doesn't mean I'm ill. I might just be doing a little research for my blog.

Looks like Australia is having a fairly flu-free spring so far.

[Google Flu Trends and Google Dengue Trends are no longer active projects, but historical data is still available --D.H. Dec 2015]

Well, this is embarrassing ...

Dead links are part of the fabric of the web. Indeed, tolerance of them is one of the web's major architectural features. Every once in a while, though, one runs into a particularly amusing example. I was just checking my usage statistics and at the bottom was a link encouraging me to "Try Google's free webmaster tools." and "Improve your site's visibility in Google search results."

I'll let you fill in your own punchline.

Sunday, November 15, 2009

Fingerpickin', or Is YouTube the new Folkways?

For some reason the old Doc Watson song* Deep River Blues popped into my head. So I googled, and up came a clip of Doc pickin' it, every note in the right place as usual. Right there in the list of related videos next to it was Leo Kottke's take on the same song. Clearly the same song, but it's fascinating how Doc's sounds like no one but Doc and Kottke's sounds like no one but Kottke.

On the related list for that one in turn is impromptu-looking footage of Doc, Kottke and the late Chet Atkins for good measure picking I-forget-what tune along much the same lines as Deep River. Atkins is mostly comping here, but when he does take a lead, darned if it doesn't sound exactly like Chet Atkins. Note that Kottke, being the junior member, doesn't seem to merit a mention in the opening captions.

One could spend a fair bit of time (and this one did) chasing links, comparing and contrasting the styles and techniques of the greats, almost as though there were something to this whole web-as-online-educational-resource thing. If that sounds intriguing to you, I'll leave you to it. In which case if you run across bootleggy-looking versions** of Eight Miles High by Kottke, Michael Hedges and Roger McGuinn himself, say hi for me.

* He didn't write it, just made it his
** I'm going to assume here that YouTube would have taken down anything egregiously illicit, and/or that Kottke, McGuinn and the Hedges estate are not greatly offended. See the previous post on iBiblio for why McGuinn in particular might not mind.

Thursday, November 12, 2009

No monopoly on BitTorrent hype

Google tells me that a few people have turned up this blog by searching for "60 minutes bittorrent". I couldn't replicate that effect, perhaps because I didn't dig deeply enough into the hits, more likely because other hits have taken Field Notes' place, but I did turn up some other interesting tidbits:

The top hit was for a BitTorrent site offering downloads of 60 Minutes, including a couple of files dated 11/1, presumably the episode in question. I'm not going to include the link here, as I don't see any indication that the particular site has permission to distribute that content. However ...
A few pages further down, past several other unfriendly headlines, is a piece entitled Leslie Stahl Needs to Get A Clue About P2P. Hmm ... I wonder where they stand on the issue? Deftly dodging the various popups, I was reminded that BitTorrent (the company) partnered with several of the major studios back in late 2006. However ...
I'm hard-pressed to find any legal BitTorrent service for movies from the major studios -- something, for example, where you pay $X and then download the movie of your choice, duly uploading bits of your previously-purchased movies to other paying customers. Maybe it's there and I missed it, but neither BitTorrent's site, Fox Movies' site nor a general Google search turned up anything likely. By contrast, Roxio's Cinema Now delivers movies in a large variety of ways, but all of them involve a dedicated device.
Conversely the handful of BitTorrent sites I clicked through to explicitly don't charge anything (and handle an, ahem, wide variety of content).

So ... BitTorrent seems like a fine way to distribute bulky content -- whether legit or illicit -- for free, but not so good for paid content. No great surprise there. But, just as 60 Minutes deserves grief for putting forth the MPAA party line as an investigative report, and the MPAA for claiming loss of revenue it could never have realistically had in the first place, it comes off as disingenuous for someone defending BitTorrent to claim that the movie industry is actually benefitting from BitTorrent, based on a 2006 press release.

Wednesday, November 11, 2009

30 seconds amusingly spent

A colleague mailed this bit of fun out to everyone.

The shape of the backbone

I was all set to write a somewhat philosophical post comparing the internet backbone to the phone system that came before it, when I realized I know very little about the internet backbone. Since I claim not just to be writing about the web, but actually figuring it out (however slowly) as I go along, that seemed like a pretty embarrassing lapse. So I've started doing a little basic digging and, so as to have something to post while I'm doing it, pass along some of my findings.

There was a British general once who, whenever he was sent to a new theater of battle, would start off by making a crude, not-painstakingly-to-scale map of the major roads and rail lines in the territory, noting the distances between the most important cities, junctions and other points of interest. This exercise would pay dividends later, not just in the day-to-day running of the campaign, but in preventing blunders on the order of depending on troops to cover a week's worth of distance in two days. It would also give some idea of the political structure of the place -- who was likely to be trading with whom, who might not care too much about what was going on where, and so forth -- and help greatly in deciding which supply lines to guard where, which enemy supply lines to attack where, etc., etc.

Along those lines (minus the military perspective), it seems useful to start with a very large-scale look at the international internet backbone, with an eye toward who's connected to whom and with what capacity. Physical distance is not as important here, though latency can matter in some cases.

Here's just such a map, dating from 2005, prepared by TeleGeography research for the International Telecommunications Union. Please don't look at the part that says "Proprietary and Confidential". There are probably more recent maps available* but this should give a reasonable impression. A few things that jump out:

North America <-> Europe dominates, followed by North America <-> Asia with roughly 40% of that capacity, followed by everyone else, far behind. In particular, there is very little capacity, relatively speaking, directly linking Europe and Asia. If you're in Bangalore talking to Bucharest, you may well be going by way of San Francisco.
Capacity to and from Africa is almost negligible on this map, but see below.
Capacity on these scales is currently measured in Gbps (the document gives 5- and 6-figure numbers in Mbps, but most of that precision is probably spurious). That's pretty big compared to my home connection, but not really stunningly large considering how many people are involved. By comparison, the human visual system appears to have a capacity on the order of 1Gb/s, so the 500Gb/s pipe between North America and Europe is, in some notional sense, roughly equivalent to 500 pairs of eyes. Put more realistically, though, 500Gb/s is about 125,000 DVDs playing simultaneously.
What these numbers actually mean is a different question entirely. Suppose for example that everyone in the US wants to follow the European football leagues online -- not that that's going to happen anytime soon. Would those millions of viewers saturate the pipe? Hardly, since the main pipe only has to carry a few video feeds at any given time.
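The back-of-the-envelope comparisons above are easy to check. A minimal sketch, using the round numbers quoted in this post. The 4 Mb/s per-DVD bitrate is my assumption (it is the rate the 125,000 figure implies), and the names are made up for illustration.

```python
# Rough check of the capacity comparisons above.
# All figures are the round numbers quoted in the post, except
# DVD_STREAM_MBPS, which is an assumed average DVD bitrate.

ATLANTIC_GBPS = 500       # North America <-> Europe, per the 2005 map
EYE_PAIR_GBPS = 1         # order-of-magnitude figure for human vision
DVD_STREAM_MBPS = 4       # assumption: average bitrate of one DVD stream


def streams_supported(link_gbps: float, stream_mbps: float) -> int:
    """Number of simultaneous streams of a given bitrate a link can carry."""
    return int(link_gbps * 1000 / stream_mbps)


pairs_of_eyes = ATLANTIC_GBPS // EYE_PAIR_GBPS
dvds = streams_supported(ATLANTIC_GBPS, DVD_STREAM_MBPS)

print(pairs_of_eyes)  # 500
print(dvds)           # 125000
```

A higher assumed bitrate gives proportionally fewer DVDs, so the answer is only as good as that one assumption.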

As I said, the big missing piece in the 2005 picture is bandwidth between Africa and the rest of the world. As of this summer, that bandwidth has gone from practically nothing to about 1.3 Tb/s, or 1300Gb/s to put it in the same units used above. That's over twice the size of the Europe <-> North America pipe, a classic example of leapfrogging technology assuming we're comparing apples to apples. Even if we're comparing apples to pears it still looks like a pretty big pipe.

(*) Back in the days that bang paths roamed the earth, one could subscribe to a newsgroup which published frequently updated maps of the backbone as it then existed, meaning mainly a bunch of T1 and similar lines. A T1 line could carry about 1.5 Mb/s, or .0015 Gb/s, or about 0.0003% of the Europe <-> North America link.
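For what it's worth, the T1 percentage and the Africa comparison can be sanity-checked the same way. A minimal sketch, using only the round numbers quoted in this post (the variable names are mine):

```python
# Compare a T1 line and the new African capacity against the
# 500 Gb/s North America <-> Europe link quoted in the post.

ATLANTIC_GBPS = 500.0
T1_MBPS = 1.5             # approximate T1 capacity, as quoted
AFRICA_TBPS = 1.3         # Africa <-> rest of world, as of summer 2009

t1_gbps = T1_MBPS / 1000
t1_share_percent = t1_gbps / ATLANTIC_GBPS * 100
africa_ratio = AFRICA_TBPS * 1000 / ATLANTIC_GBPS

print(f"T1 = {t1_gbps:.4f} Gb/s")
print(f"T1 share of the Atlantic link = {t1_share_percent:.4f}%")
print(f"Africa / Atlantic = {africa_ratio:.1f}x")
```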

Friday, November 6, 2009

Moto perpetuo

It occurs to me that unbreakable copy protection is the perpetual motion of our day.

Back at the beginnings of the industrial revolution, when inventions like the steam engine and electrical generator were making new and mysterious things possible and were not widely understood, people were constantly coming up with perpetual motion schemes. And why not? If you can generate more power than hundreds of strong men and horses can produce just by burning coal, and transmit that power miles and miles away with simple metal wires, is it so implausible that some arrangement of magnets and overbalanced wheels could generate endless power from nothing?

Eventually, in the early 1800s, after commercial steam power had been around for about a century, the principle of conservation of energy came to be widely accepted and by the middle of the century the familiar laws of thermodynamics were established, including the crucial first two:

You can't win (conservation of energy).
You can't break even (entropy increases in a closed system).

These two principles explain why perpetual motion schemes don't work. That hasn't stopped people from coming up with them, but it has stopped knowledgeable engineers and scientists from wasting time on them. It hasn't completely stopped investors from investing in them, but the long and sorry track record of such schemes probably has been a deterrent.

Why do people still bother, then? Because if it were possible, large-scale perpetual motion would do away with energy shortages forever. It wouldn't necessarily make any money, infinite supply implying zero price, and an energy surplus would have drawbacks of its own, but we could probably deal with those problems when they came up. The point is that people try to prove that perpetual motion is possible because they really, really want it to be.

Anyone in the business of selling information would really, really like to be able to control the propagation of that information. You do the math.

I don't know of any specific principle of information theory that explains why this will never work, but there's a growing body of empirical evidence to that effect. Intuitively, copying bits costs much less than the price sellers would like to charge, so the protection has to come in the conversion of those bits into usable form. That runs you right into the analog reconversion problem, of which "camming" (sneaking cameras into movie theaters) is a crude but effective example.

Clearly none of this is currently stopping people from trying to come up with copy protection schemes, or people from paying for them. The track record probably isn't quite long or sorry enough yet. I suspect it eventually will be.

Fortunately, selling bits and making them impossible to copy are two different things.

Thursday, November 5, 2009

60 Minutes and the MPAA: Postscript

In the last few posts on this topic (and I hope this will be the last one for a while), I tried to keep the 60 Minutes bashing toned down to a dull roar and concentrate more on the technical and economic issues. And besides, as it turns out, others have done it better and funnier [Unfortunately this link appears dead, as in not-even-on-the-Wayback-Machine dead. I no longer remember what it said, but I'm fairly sure it was pretty funny. Sic transit ... --D.H. May 2015].

From the first link, I learned that 60 Minutes has gotten this one wrong once before, and from the second I learned how to fill up an entire web page in order to display a message of no more than 140 characters. "Twitterati", eh?

Wednesday, November 4, 2009

60 Minutes and the MPAA: Part VI - What now?

Along with passing along the $6 billion/year figure, Steven Soderbergh tells 60 Minutes that, thanks to piracy, movies that got made in the past could not be made today. He cites The Matrix as an example.

"The chances of a movie, for instance, like 'The Matrix' being made shrinks. Here's a guy, here's a movie, two guys, they've made a small independent film. Warner Brothers gives them $75 million to make this script that nobody can understand, right?" Soderbergh said. "Wouldn't happen today."

Now, I'm not going to claim I know anywhere near what Steven Soderbergh knows about getting movies made. I will go so far as to claim I know next to nothing. And yet, looking at the Yahoo! box office grosses, I can't say I see anything amiss.

Clearly all kinds of movies are still getting made, including some pretty expensive-looking ones. A lot of them stick pretty close to the usual formulas, but that's always been true. Movies by unknowns? I wouldn't know, but I'm quite sure that interesting movies by interesting people with interesting viewpoints are still getting made.

But what about the future? The movie studios are understandably worried, particularly in light of what their brethren in the music business have been going through. But every industry is unique. Movies are not the same as songs and albums. Nobody goes to a "music theater" to listen to pre-recorded music.

Making a movie, even a cheap movie, carries a lot more overhead than recording a song, and songs are easier to market. At least one songwriter has recorded a song a week for a year at a stretch, and at least one of them was pretty good. Plenty of small bands self-produce and self-distribute, and a lot of them are pretty good. By comparison very few feature films are made without the involvement of some flavor of existing studio. There's plenty of self-produced stuff on YouTube, but much less of it is pretty good and most of it goes unseen or nearly unseen anyway.

In short, the economics are different and hey, people still make money selling books, to my continual puzzlement. So my guess is that the movie industry is going to be just fine, particularly if it stops trying to boil the ocean and embraces online distribution.

I certainly hope so, anyway. As much as I've questioned the MPAA's rhetoric and logic, I wholeheartedly agree with them on some basics: Movies are cool, and people who make them should be able to get paid for their work.

P.S. While chasing down the link above on YouTube, I ran across two previous posts on paying for movies, which might be relevant.

60 Minutes and the MPAA: Part V - Relevance

OK, so the L.E.K. study says that people who buy pirated DVDs say they would have collectively spent billions of dollars on legitimate fare if the pirated DVDs hadn't been available. Let's assume these people are telling the truth and are accurately estimating how much they would have spent. So we're done, right? That's how much money the studios are losing.

Actually there's another level of estimation involved. The $6.1 billion quoted was the amount the studios were said to be losing and itself is a portion of the larger total that the motion picture industry as a whole was said to be losing. But let's take that, too, at face value. Now we're done, right?

Well ... remember when I was discussing BitTorrent in Part I and mentioned the importance of carefully considering what problem you're trying to solve? The principle is just as vital here.

The MPAA, like the music industry before it, and the software industry before it, seems to be trying to solve the problem of keeping people from copying bits. Being no more able to do this than their predecessors, and with Moore's law catching up with them just like it did with everyone else, they -- again like everyone before them -- claim damages by comparing the real world with what they might have had if they could stop people from copying bits.

Fair enough, but they can't. No one can, and a false antecedent implies anything you want it to. Rather than trying to keep people from copying bits, would it not be better to frame the problem as how to make money from making movies?

The traditional way of doing that, selling tickets at theaters, is still bringing in revenue, on what I'd call a slight upward trend and what the site I got the figures from says is "not any substantial increase". That's not great news for an industry constantly trying to grow, particularly once you adjust for inflation, but neither are they falling off a cliff. Evidently "Let me take you to the movies" can sometimes have more appeal than "Let's go back to my grungy apartment and watch a DVD."

Leaving aside the TV networks, cable movie channels and pay-per-view (which is not necessarily a valid approach) the complaint, and certainly the thrust of the L.E.K. study, is that DVD sales are not doing as well as expected. They were supposed to keep chugging along like video before them, but they appear to be levelling off or even falling. At the same time people are selling lots of pirated videos, so the difference is attributed to piracy. QED.

But correlation does not imply cause. Certainly one interpretation of those facts is that piracy is hurting DVD sales, but another one, which I personally find more plausible, is that since bits are getting easier to copy and watching digital movies no longer requires a DVD, DVDs just aren't that valuable any more and the original sales projections that they would be were just wrong. Another symptom of the same technological changes is that making shoddy DVDs is easy enough that you can still make money off of them by charging closer to what they're worth, but neither of these symptoms is the cause of the other.

I don't recall when I last bought a DVD and I can assure you it's not because of piracy. Why pay $20 or even $12 for a DVD to take home and unwrap when I can order up a movie on demand for $4 or (if it's not particularly popular) watch it over Netflix as part of my $10/month subscription? How many DVDs do I watch more than three or four times? The commentary tracks seemed cool at first, but I don't really have time for those, either. Manifestly, I'd rather blog about DVDs than buy them.

I would, however, gladly pay more than $10/month for some sort of "premium" Netflix subscription that would open up more selection. I suspect I'm not alone. There's certainly money to be made legitimately by selling high-quality video online that won't get you sued or arrested. It's not a foregone conclusion that there's enough money to be made to keep the movie studios going in the style to which they (and we) have become accustomed, but I'm quite sure that money is not to be found in trying to prevent bit-copying.

60 Minutes and the MPAA: Part IV - Error bars

In the 60 Minutes piece I've been referencing, A-list director Steven Soderbergh drops the oft-quoted figure of $6.1 billion per year in industry losses. This figure comes from a 2006 study by consulting firm L.E.K. It's easy to find a summary of this report. Just google "video piracy costs" and up it comes. Depending on your browser settings, you may not even see the rest of the hits, but most of the top ones are repeats or otherwise derived from the L.E.K. study. And you didn't need to see anything else anyway, did you?

So ... $6.1 billion. Let's assume for the moment that the figure is relevant -- more on that in the next post. How accurate is it?

One of the handful of concepts I retained from high school physics, beyond Newton's laws, was that of significant digits, or "sig digs" as the teacher liked to call them. By convention, if I say "6.1 billion", I mean that I'm confident that it's more than 6.05 billion and less than 6.15 billion. If I'm not sure, I could say 6 billion (meaning more than 5.5 billion and less than 6.5 billion).
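To make the convention concrete, here's a rough sketch in Python. The helper name implied_range is my own invention, not anything standard, and it only handles plain figures written as text like "6.1" or "6":

```python
from decimal import Decimal


def implied_range(figure: str) -> tuple[Decimal, Decimal]:
    """Return the (low, high) range implied by the digits actually written."""
    value = Decimal(figure)
    # Position of the last written digit: "6.1" -> -1, "6" -> 0.
    exponent = value.as_tuple().exponent
    # Half a unit in that last place is the implied uncertainty.
    half_step = Decimal(1).scaleb(exponent) / 2
    return value - half_step, value + half_step


print(implied_range("6.1"))  # (Decimal('6.05'), Decimal('6.15'))
print(implied_range("6"))    # (Decimal('5.5'), Decimal('6.5'))
```

Taking the figure as text rather than as a float is deliberate: the number of digits written is exactly the information the convention relies on, and a float throws that away.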

Significant digits are just a rough-and-ready convention. If you're serious about measurement you state the uncertainty explicitly, as "6.1 billion, +/- 300 million". My personal opinion is that even if you're not being that rigorous, it's a bad habit to claim more digits than you really know, and a good habit to question anything presented like it's known to an unlikely number of digits.

The point of all this is that precise results are rare in the real world. Much more often, the result is a range of values that we're more or less sure the real value lies in. For extra bonus points, you can say how sure, as "6.1 billion, plus or minus 300 million, with 95% confidence".

From what I can make out, L.E.K. is a reputable outfit and made a legitimate effort to produce meaningful results and explain them. In particular, they didn't just try to count up the number of illegal DVDs sold. If I buy an illegal DVD but go and see the movie anyway, or I never would have seen the movie at all if not for the DVD, it's hard to claim much harm. So L.E.K. tried to establish "how many of their pirated movies [viewers] would have purchased in stores or seen in theaters if they didn't have an unauthorized copy". They did this by surveying 17,000 consumers in 22 countries, doing focus groups and applying a regression model to estimate figures for countries they didn't survey. (This is from a Wall Street Journal article on L.E.K.'s web site and from the "methodology" section of the summary mentioned above).

On average, they surveyed about 800 people per country, presumably more in larger countries and fewer in smaller. That's enough to do decent polling, but even an ideal poll typically has a statistical error of a few percent. This theoretical limit is closely approached in political polls in countries with frequent elections, because it's done over and over and the pollsters have detailed knowledge of the demographics and how that might affect results. They apply this knowledge to weight the raw results of their polling in order to compensate for their sample not being completely representative (for example it's weighted towards people who will answer the phone when they call and are willing to answer intrusive questions).

For international market research in a little-covered subject, none of this is available. So even if you have a reasonably large sample, you still have to estimate how well that sample represents the public at large. There are known techniques for this sort of thing, so it's not a total shot in the dark, but I don't see any way you can assume anything near the familiar "+/- 3%" margin. At a wild guess, maybe more like 10-20%, by which I mean you're measuring how the population at large would answer the question, and not what they would actually do, with an error of -- who knows but let's say -- 10-20%. More than the error you'd assume by just running the sample size and the population size through the textbook formula, anyway.
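For reference, here's roughly what the textbook formula gives for the best case. This is a sketch under the usual idealized assumptions: a simple random sample, a yes/no question, 95% confidence (the 1.96 factor) and the worst-case 50/50 split. I've left out the finite population correction, since for a sample of hundreds drawn from a population of millions it makes essentially no difference. The function name is mine.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Textbook margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# 17,000 respondents spread over 22 countries, treated as an even split.
per_country = round(17_000 / 22)

print(per_country)                             # 773
print(f"{margin_of_error(per_country):.1%}")   # 3.5%
```

So even a perfect random sample of that size would land around three and a half points either way, and that's the floor, before any of the problems above come into play.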

All of this is assuming that people won't lie to surveyors about illicit activity, and that they are able to accurately report what they might have done in some hypothetical situation. Add to that uncertainties in the model for estimating countries not surveyed and the nice, authoritative statement that "Piracy costs the studios $6.1 billion a year" comes out as "Based on surveys and other estimates done in 2006, we think that people who bought illegal DVDs might have spent -- I'm totally making this up here -- somewhere between $4 billion and $8 billion on legitimate fare that year instead, but who really knows?"

Now $4 billion, or whatever it might really be, is still serious cash. The L.E.K. study at the least makes a good case that people are spending significant amounts on pirated goods they might otherwise have bought from studios. I'm not disputing that at the moment. Rather, I'm objecting to a spurious air of precision and authority where very little such exists. More than that, I'm objecting to an investigative news program taking any such key figure at face value without examining the assumptions behind it or noting, for that matter, that it was commissioned by the same association claiming harm.

And again, this is still leaving aside the crucial question of relevance.

Tuesday, November 3, 2009

60 Minutes and the MPAA: Part III - Interlude

While trying to track down just how video piracy actually works, I followed a Wikipedia external link to a fascinating and well-written article in Legal Affairs about a real live pirate who is also in the employ of a major media conglomerate. I offer it here on its own merit and for comparison purposes against the 60 Minutes piece. Judge for yourself which is a better piece of investigative reporting.

60 Minutes and the MPAA: Part II - Pirates and pirates

OK, I'll say it again: Copyright violation is illegal. Don't do it.

However, if you're doing an investigative piece on video piracy, it would seem useful to distinguish various kinds of piracy. Otherwise there's a risk of throwing out a figure like $6 billion, showing pictures of convicted gang members and later an animation sort of depicting BitTorrent, and having people think that online file sharing sends $6 billion a year into the pockets of gangsters. Not that anyone would ever want to suggest such a thing.

In fact, there are pirates, and then there are pirates.

Gangs make money by selling counterfeit DVDs of movies. The practice is particularly rife (and as I understand it, more in a legal gray area) in Asia.
People trade movies on the internet. No money changes hands.

Lumping all of this under the label "video piracy" captures some common features, particularly that both are illegal and there's a case to be made that both cost studios money, but it ignores the obvious difference in motivation. Busting people on the street is not going to stop file sharing, and somehow shutting down file sharing would not stop people from selling DVDs on the street.

Two different problems, two (largely) different sets of people, and most likely two different solutions.

Monday, November 2, 2009

60 Minutes and the MPAA: Part I - BitTorrent

Before I start: I'm in favor of copyrights, I believe movie makers, even big mainstream Hollywood studios, have a right to make a buck and I think that sneaking out a lo-fi recording of a movie and the guy in the next row snoring, besides being illegal, is just pretty lame. That said ...

CBS's long-running news show 60 Minutes just ran a piece on video piracy. I'm calling it a "piece" and not a news story because it's essentially the Motion Picture Association of America's position on the issue re-routed through a major news show. The MPAA certainly has a right to make its case, and the case is not without merits, but passing it off as news -- particularly old-school investigative reporting -- has considerably less merit.

In my first draft of this post I tried to hit all of my objections, but the result was unwieldy and it soon became clear I had some homework of my own to do. So before digging into the real meat of the issue, let's just talk about BitTorrent.

BitTorrent is widely used for exchanging large files. Said large files include new Linux distributions, legitimately produced free video and, of course, pirated videos. In other words, like pretty much any other information technology out there, it can be used for good or ill. 60 Minutes seems mildly shocked that such a thing could be "perfectly legal."

BitTorrent, as you may know, works by avoiding the bottleneck of a central server. Instead of thousands of home computers all bombarding the central site with requests and overloading even its capacity to fulfill them, BitTorrent uses a central computer to coordinate the home computers distributing the data to each other. The 60 Minutes piece gets this wrong by suggesting that the data itself is going both to and from a central server:

Tiny "bits" moving toward a blue column in the middle of Malcolm's screen are pieces of the movie we were getting from people all around the world. The bits moving away from the column are pieces we have and are sharing with someone else.

You can sort of see where the key concepts got lost in translation, but ... huh? Pieces I'm sending out are going away from the column? Am I the column then? Each piece goes both to and from the column? What's the point of that? If you want to see what's really going on, Wikipedia has a much less whizzy but more accurate picture.

I should pause to point out here that BitTorrent's architecture is a classic example of the value of carefully considering the problem you're trying to solve. Instead of solving "How can you transmit a file quickly from one computer to another?" which basically has only one answer (high bandwidth on both ends), BitTorrent solves "How can you distribute copies of a file among a large number of computers?" Once you look at it that way, the answer of letting a given host pay back its download over time and to lots of other hosts seems fairly natural, but looking at it that way in the first place is an engineering masterstroke.

To use BitTorrent, you have to let a central tracker know what file you're retrieving. You're also committing to upload data to other BitTorrent users in return for downloading. This architecture makes BitTorrent different from other well-known ways of moving big files around:

Unlike the central server model, BitTorrent is distributed. There is no single bottleneck for data. There is a single place for handling metadata, but metadata is much, much smaller.
BitTorrent is much more traceable than older peer-to-peer file sharing systems (but see below).
It's also faster, because you're effectively downloading your file from many other places instead of being constrained by one person's upload speed.
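Here's a toy model of that split between metadata and data. To be clear, this is not the real BitTorrent protocol, and the class and method names are made up for illustration. The only point is that the tracker hands out lists of peers while the pieces themselves move directly between peers:

```python
from collections import defaultdict


class Tracker:
    """Toy tracker: it only knows who is in each swarm, never the file data."""

    def __init__(self):
        self.swarms = defaultdict(set)  # file id -> peers that have announced

    def announce(self, file_id, peer):
        """Register a peer and return the peers that were already in the swarm."""
        already_here = set(self.swarms[file_id])
        self.swarms[file_id].add(peer)
        return already_here


class Peer:
    def __init__(self, name, pieces=None):
        self.name = name
        self.pieces = dict(pieces or {})  # piece index -> piece data

    def fetch_missing(self, others, total_pieces):
        """Copy each missing piece directly from any other peer that has it."""
        for index in range(total_pieces):
            if index in self.pieces:
                continue
            for other in others:
                if index in other.pieces:
                    self.pieces[index] = other.pieces[index]
                    break


tracker = Tracker()
movie = ["piece-0", "piece-1", "piece-2", "piece-3"]

# The seed starts out with every piece and tells the tracker it's there.
seed = Peer("seed", dict(enumerate(movie)))
tracker.announce("movie", seed)

# Each newcomer asks the tracker who else is around, then talks to them directly.
first = Peer("first")
first.fetch_missing(tracker.announce("movie", first), len(movie))

second = Peer("second")
second.fetch_missing(tracker.announce("movie", second), len(movie))

print(second.pieces == seed.pieces)  # True
```

Note that the Tracker class never sees a single piece. A real client also uploads while it downloads and picks pieces much more cleverly, but the division of labor is the same.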

In short, you trade off anonymity for speed. This is a perfectly good trade if you're conducting legitimate business, not so good if you're not. Even if you neglect to tell The Man you're setting up a BitTorrent tracker to share files, the pattern of lots of peer-to-peer traffic coupled with frequent low-volume traffic between everybody and a central node is pretty distinctive. Once the central server is located, watching it is much easier than watching everybody. It's also much easier to tell who's involved, since they're all talking to the tracker. All in all, this seems like much less of a headache for parties like the MPAA than, say, Napster and its descendants.

However ...

BitTorrent's speed is a definite headache. A typical cable-based home connection, at least in the States, has a download speed massively higher than its upload speed. Last time I measured mine, the ratio was 40:1. This makes it reasonably easy for me to download big files from a central, easily traceable server with huge bandwidth, but a real pain for me to send a copy of that file to my friend or whoever. That's fine, but it's much less of a pain for ten thousand people to distribute copies of a file amongst each other.
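A back-of-the-envelope sketch of why the asymmetry matters. Every number here is an assumption for illustration: a file of about a gigabyte (8,000 megabits), 20 Mbit/s down and 0.5 Mbit/s up (which gives the 40:1 ratio), and everyone on the same kind of connection. The swarm figure is the idealized lower bound from the standard networking-textbook analysis of peer-to-peer distribution, ignoring protocol overhead, so real swarms will be slower.

```python
def single_uploader_hours(file_mbit, upload_mbps, recipients):
    """Time for one home connection to send a full copy to each recipient in turn."""
    return file_mbit * recipients / upload_mbps / 3600


def swarm_lower_bound_hours(file_mbit, upload_mbps, download_mbps, peers):
    """Idealized lower bound when `peers` identical hosts share, one starting with the file."""
    seed_time = file_mbit / upload_mbps          # the seed must push out one full copy
    download_time = file_mbit / download_mbps    # nobody finishes faster than their downlink
    # Everyone but the seed needs a copy; all hosts contribute upload capacity.
    aggregate_time = file_mbit * (peers - 1) / (upload_mbps * peers)
    return max(seed_time, download_time, aggregate_time) / 3600


FILE_MBIT, UP, DOWN = 8000, 0.5, 20  # assumed figures, see above

print(f"one friend:        {single_uploader_hours(FILE_MBIT, UP, 1):.1f} hours")
print(f"10,000 one by one: {single_uploader_hours(FILE_MBIT, UP, 9_999):,.0f} hours")
print(f"10,000 as a swarm: {swarm_lower_bound_hours(FILE_MBIT, UP, DOWN, 10_000):.1f} hours")
```

Under these assumptions, sending one copy to one friend takes about four and a half hours, and in the ideal case so does getting a copy to everyone in a swarm of ten thousand, because every host's upload capacity is put to work.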

More worrisome to folks like the MPAA, though, is that it is possible to run the same distribution scheme without a central tracker. As I said, metadata is small, so distributing it to everyone takes relatively little time. A particular mechanism, distributed hash tables, has been around for a while now. As with file sharing itself, distributed hash tables aren't inherently evil. In fact, they have some very generally useful properties. But a highly efficient file distribution system without a visible center presents a real problem if you see your job as preventing people from distributing files.
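For the curious, here's a very small sketch of the idea behind a distributed hash table. It's a toy, not the scheme any real client uses on the wire: there's no network, no routing and no handling of nodes coming and going, and all the names are invented. It just shows how hashing lets every participant work out for itself which node should hold a given bit of metadata, with no central index. "Closest" here means smallest XOR distance between IDs, which is the notion of distance used by Kademlia-style tables.

```python
import hashlib


def hash_id(text: str) -> int:
    """Map any name onto the same large ID space using SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, node_names):
        # node id -> that node's own local storage
        self.nodes = {hash_id(name): {} for name in node_names}

    def responsible_node(self, key: str) -> int:
        """The node whose ID is closest to the key's hash (XOR distance)."""
        key_id = hash_id(key)
        return min(self.nodes, key=lambda node_id: node_id ^ key_id)

    def put(self, key, value):
        self.nodes[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.nodes[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("some-file", ["peer-1", "peer-2"])

print(dht.get("some-file"))  # ['peer-1', 'peer-2']
```

The entry lives on exactly one of the four nodes, and anyone who knows the key and the node IDs can compute which one, without asking a central server.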

In summary: Copyright violation is illegal, harmful and lame. BitTorrent can be used for copyright violation or for legitimate purposes, and it's a very neat hack* in any case. Preventing illegal file sharing by purely technical means looks like a tall order. Bashing BitTorrent or any other particular product is unlikely to help.

* I suppose in this context it's particularly important to point out I mean "hack" in the sense of "cool and unexpected thing done with technology," not in the more widespread sense of "act of computer vandalism".

Friday, October 30, 2009

A little word geekery

FAQ is originally an acronym* for Frequently Asked Question, but as often happens usage has drifted. In this particular case, marketing occasionally co-opts FAQ to mean, "A question contrived to produce the message we're really trying to get through as an answer." These range from fairly innocuous, like "Does your product support <some cool feature>?" or "How can I purchase <your wonderful product>?" to the more egregious, like "Is your product much more stable than <evil competitor's>?" ("Why yes, I'm glad you asked that question ...")

Either way it smells of Astroturf**. My personal reaction ranges from amusement, if the information is at least useful, to annoyance if it's not. But the question here today is, is there a word for it? My thought was "FAQE", but while a bit of googling indicates that other people have had the same idea, a bit more googling indicates that few, if any, people are actually using it that way.

And that's actually about as much time as I care to spend on the topic.

* If you pronounce it as a word, like NATO or WASP, it's an acronym, regardless of whether the word existed previously (WASP) or not (NATO). If you pronounce it as letters, like FBI or NBA, and you're feeling pedantic it's an initialism. If you're not, it's also an acronym.

** There should probably be a trademark in there someplace, even though we're not referring to the lovely green carpet used in sports stadiums. Please don't sue.

Monday, October 26, 2009

Well, I kinda had to post this one

It's Galaxy Zoo on the Astronomy Picture of the Day.

Sunday, October 25, 2009

On the importance of convention

What's going on in this picture? If you're from Todmorden, Lancashire, you'll probably know exactly what's up. If you're English or from some other place with a similar traffic system, you'll probably have a pretty good idea: Traffic keeps to the left, the triangles mean "yield", the arrows mean "go this way," of course, and the green circle is a mini-roundabout. If you're coming from the lower right, for example, and want to shift over to the inner roadway, you'll have to yield to anyone who might already be going round the green circle, then go round the green circle yourself to complete a right turn, yield to anyone on the inner roadway and turn left onto it.

All of this is encoded into the markings on the road, the little red triangular sign and, crucially, the minds of the road users who know how to interpret the markings. I should probably mention here that the road users in question are meant to be cyclists under the age of 12 and that grownup mini roundabouts are typically also clearly marked with arrows, not that everyone always takes notice.

To those of us on the other side of the pond, things might not be so obvious. The triangles are on the far side of the intersection if you're on the right, and they're pointing forwards, so maybe they mean "go on through"? The red triangle sign is maybe telling me the road goes around in a circle? The green circle on the road? No idea. Green means "go," maybe? That's consistent with the triangles. Better just blast through there as fast as I can. And keep right.

So what brought this on? I was looking at a revamped version of someone's web page. The page was mostly filled with a rectangular area containing text and figures. The space directly above that was divided into three rectangles, rounded on top, each containing a short phrase. One of the three was highlighted in a contrasting color. The other two highlighted (without unhighlighting the first) as the cursor went over them.

Nowadays most people will probably not have too much trouble figuring out what's going on. The rectangles are tabs, of course, the rollover highlighting reinforces "you can click on me" and clicking will change the contents of the large rectangle. All this is encoded in the shapes on the screen, the highlighting behavior and, crucially, the mind of the viewer who knows how to interpret these signs.

The tab convention, along with several other widely-used conventions, makes modern web pages considerably easier to use than older ones. From a coder's point of view these are not big technical innovations. They're considerably easier to implement now that browsers understand scripting languages, but they could also have been implemented with new HTML markup, and in any case scripting in browsers was envisioned (if not widely available) pretty early in web history.

New user interface metaphors are innovations in convention much more than technical innovations. One thing that comes across in looking at old web pages is that there wasn't as much shared understanding of what a web page looked like, even though there were fewer choices of how a web page could look.

In no way does this mean that the web will soon be completely hidebound by convention. Traffic planners haven't completely figured out what road signs should look like, and roads have been around for a while now.

[After posting this, I remembered a story I'd heard about a young linguist doing field work (yes, real field work, not haphazard musings about the web). The linguist had learned a few basic phrases and was trying to find out more, and so pointed at a house and asked "What's this?". The local answered (let's say) blah. The linguist dutifully recorded that the word for house was blah. Pointing at a tree, the linguist again asked "What's this?". Again the answer was blah. "Interesting," thought the linguist, "They use the same word for 'house' and 'tree'." Elaborate hypotheses regarding meaning and metaphor began to spin.

Soon the linguist had discovered that blah also meant "dog", "basket" and either "path" or "dirt". How could that be? After a bit of confusion and hilarity (on the local's part), it eventually became clear that blah meant "index finger" and that people there pointed at things by pursing their lips in the appropriate direction.]