Monday, November 30, 2009

Required reading I haven't read yet

I was going to do a followup on my earlier post on the importance of convention, and I may yet, but as I was researching that response a colleague pointed me at Jakob Nielsen's web site. Nielsen has for some time now been a major voice when it comes to usability on the web, sort of the web's answer to Bruce "Tog" Tognazzini, not that Tog is a stranger to the web, or that he and Nielsen are strangers to each other.

The two have a lot in common, not surprisingly. Notably, they share the strong conviction that if you want to figure out how people are going to respond to a system, you need to actually bring in some people, put them in front of the system and observe what happens.

They also have odd-looking sites. When I first saw Tog's, I thought I'd gotten the name wrong (even though it was the top Google hit) and landed on one of those "sorry, the site you wanted is no longer there but here are some commercial links with the words you asked for in them" sites. I can't say exactly why, but it probably has to do with the way the page is laid out, including plain-text ads for Tog's courses. Nielsen, for his part, is defiantly old school, almost all text and links, with only a yellow stripe and multicolored logotype at the top, oddly at odds with Nielsen's advice to avoid looking like an ad at all costs (it's item 7; curiously, there are no anchors for the items themselves).

Both seem to think sans-serif body fonts are fine.

No matter. Both sites are chock full of useful information, clearly and engagingly presented. You might not agree with everything they say, but that's not the point of required reading. Agree or disagree, one should at least know the major arguments. So I'll, um, be working on that ...

[Both Tog's and Nielsen's sites are still around.  They're considerably spiffier than the description above, but still quite spare compared to many other sites.  I would expect that this is more what they were originally going for, but that the tools at the time didn't give satisfactory results. --D.H. May 2015]

Sunday, November 29, 2009

Next blog, please

I'm not sure why this is happening just now, but a portion of this blog's visitors are arriving via a link called /?expref=next-blog. My guess is that, rather than searching for this particular next blog link, these folks got here by starting someplace else and clicking on next blog until they saw something they liked or got bored and went off to do something else. Blog surfing, basically. So I thought I'd do the same and see what else was out there. [The Next Blog button disappeared a while ago, at least from this blog's style sheets --D.H. Dec 2015]

I didn't keep a close count, but the breakdown was roughly:
  • A few family blogs, as in "here's what my family is up to", including one in Swedish. Sort of a year-round online version of the annual holiday letter to one's far-flung friends and relations.
  • A few photo blogs, one linked to flickr and offering to sell prints and send e-cards using the images.
  • A craft blog or two, one in Norwegian and English.
  • Several poetry blogs.
  • Nothing technical, whether figuring out the web or anything else.
I have no idea if this is a representative sample, or if not, just how it is selected, or in any case, why the tilt towards Scandinavia. The lack of technical content has an easy and -- to my mind -- encouraging explanation: The web really is accessible to a broad range of people, only a few of whom are interested in its technical workings.

From a purely formal point of view, almost all the blogs hew pretty closely to the prototypical one or several contributors posting sporadically about whatever. That makes perfect sense given that the blog is a form, not a genre, but I was still a bit taken aback by just how much blogs look and smell like blogs.

One formal experiment that I ran across was Quoted Images, Imaged Quotes, in which a photographer and partner are collaborating to produce a captioned image every day for a year. Rather than writing a caption, the captioner chooses a quote to fit the image (or perhaps vice versa, or both). Even this experiment is not without precedent. Thing a Week comes to mind.

Monday, November 23, 2009

Rupert and the interwebs, again

If I understand recent reports correctly, Rupert Murdoch's latest attempt to wring money out of online content rests on a simple, intriguing concept: if you can't control what people are reading, control whether they can find it. So News Corp will be partnering with Microsoft and against Google by blocking access from Google's web crawlers and charging Microsoft for the privilege of indexing News Corp content on Bing. With a 10% or so share of search volume (to Google's 60%), Microsoft is eager to give people a reason to switch. News Corp gets paid, so their end is pretty easy to understand.
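The blocking half of the plan, at least, is technically mundane: a site lists what crawlers may and may not index in its robots.txt file, and well-behaved crawlers, Google's included, honor it. Here's a minimal sketch using Python's standard robots.txt parser, with made-up rules standing in for whatever News Corp would actually deploy:

    # Hypothetical robots.txt rules: shut out Googlebot, let everyone else in.
    from urllib.robotparser import RobotFileParser

    rules = """
    User-agent: Googlebot
    Disallow: /

    User-agent: *
    Disallow:
    """.splitlines()

    parser = RobotFileParser()
    parser.parse(rules)

    for crawler in ("Googlebot", "bingbot"):
        print(crawler, "may fetch the story:",
              parser.can_fetch(crawler, "http://example.com/news/story.html"))

Of course, robots.txt is only a request, not a lock; the point is that Google abides by it, which is exactly what makes it usable as a business lever here.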

On the face of it this seems like a pretty interesting test case of free vs paid content, and the search angle is clever, but just how is this going to work? First, there's the sheer business angle: If I want to read free content from the Wall Street Journal online, it's not exactly hard to find. If I just want to find out about XyzCorp, am I really going to notice that there's nothing there from the Journal? If I am the sort to notice, seems there's a pretty good chance I'm a Journal reader anyway.

For Microsoft, becoming known as the News Corp search engine could be a double-edged sword. It raises obvious issues of bias and could reignite the Microsoft-as-Evil-Empire fire which seemed to have died down of late (or maybe I'm just older now and have worked with enough perfectly reasonable Microsofties).

But more than that is the technical angle. I'm guessing it's going to take, oh, five minutes for some enterprising free content advocate to put up a site in some News Corp-unfriendly jurisdiction that will present essentially the same profile to search engines as the Journal or whatever, with or without actually violating copyright laws. At which point, without any involvement from Google, News Corp is back on Google.

I'd think it would be difficult for Google to stop this sort of thing even if it wanted to, and I'm not convinced they'd want to. They could explicitly blacklist sites, I suppose, in the usual endless cat-and-mouse, but as for automatically figuring out whether a site was just a front for someone else's? Technically enforcing "you can read it but you can't search it" smells like another example of modern-day perpetual motion. Enter the lawyers.

One way or another, there will be lawsuits. My guess -- and I suppose this would be a good time to dust off the old "I am not a lawyer" disclaimer -- is that News Corp will try to assert some right to control searchability, perhaps drawing on existing case law involving reference works and such, but I doubt it will get very far. If it did, it would be game-changing, and not necessarily in a good way.

Sunday, November 22, 2009

Today is yesterday's tomorrow (sort of)

The other night I was watching Ghostbusters II (oh, don't ask why) and right in the middle of it Harold Ramis' character uses The Computer to look up information on a historical figure. I'll use GBII for reference here since it's handy, but I could have picked any number of others.

The Computer has been a staple of science fiction for decades. It's interesting that its role in such movies is very often not to compute but to look something up, as was the case here. Our hero gives the computer the name, and back comes a neatly formatted 80-column by 24-row answer, with underlines and everything, saying who the person is.

Of all the technological devices in such movies, The Computer always seemed among the less plausible. I'm not counting the ghost-zapping equipment as technology; it's magic and falls firmly under suspension of disbelief. The Computer counts as technology because it's assumed just to be there. At some point in the future, super-powerful all-knowing computers will be generally available. How do we know? Just look at the movies ...

There were a couple of reasons The Computer always seemed particularly implausible. First, knowing a bit about real computers makes it harder for me to gloss over the technical hurdles. Force fields? Jet packs? Sure, why not? That's physics. Physics is what you major in if you're too smart for anything else. They'll figure it out. But a computer you can just type some vague query into and get a sensible answer? Come on. Like that'll happen.

Second, it always seemed like a computer smart enough to, essentially, act like the Encyclopedia Galactica would surely have all kinds of other powers that the careful scriptwriter would have to take into account. If The Computer can tell you who the bad guy in the painting is, why can't it tell you how to take him out?

You can probably tell where I'm going with this. Today, about twenty years after GBII, you can sit down at your home computer, type in the name of a historical figure and very likely come up with a concise, well-formatted description of who the person was, thanks to the now ubiquitous browser/search-engine/Wikipedia setup.

As powerful as it is, though, the system is an idiot savant. It won't tell you how to neutralize a malevolent spirit (or rather, it won't tell you a single, clear way to do so) and it won't do a lot of other things. It just allows you to quickly locate useful information that's already been discovered and made publicly available. It's powerful, but not magic.

What particularly strikes me about the description above is the presence of Wikipedia. Large, fast networks of computers were already building out when GBII came out, and Mosaic followed just a few years later. The missing piece, and one that I don't recall very many people predicting, was the massively-collaborative human-powered Wikipedia, not a technical advance in itself, but something very much enabled by several technical advances.

The Internet, HTTP, browsers, scripting languages, broadband, email, databases, server farms, cell phones, etc. -- these are all technologies. Wikipedia isn't, and yet it fits easily and comfortably into the list of advances from the last few decades. It fills a niche that's been anticipated for decades, but -- fascinatingly -- not by the anticipated means of using sheer computing power to somehow divine the history of the world.

Friday, November 20, 2009

How to report an error

Stereotypically, sports fans are unsophisticated, cave-dwelling couch potatoes. And yet, sports web sites tend to be informative -- if you're looking for sports information, at least -- easy to navigate, full of links to outside content, lively and fun. For example, any site can serve up an error code, but how many serve one up like this?

(If you don't see an error message, you'll most likely see a piece on the botched call on Thierry Henry's now-infamous handball against Ireland. I could say more but I won't.)

Wednesday, November 18, 2009

Murdoch vs. the interwebs

A while ago I mentioned that media übermogul Rupert Murdoch is trying to buck the trend toward free content by charging for his properties' online content, and I wondered how it might play out. Unsurprisingly, I wasn't the only one. In the November issue of Vanity Fair, for example, Murdoch biographer Michael Wolff lays out why the "new media" folks think that Murdoch Just Doesn't Get It. Actually, they're not quite that nice about it:
Almost all Internet professionals, on the other hand, think that charging for general-interest news online is fanciful—“Rubbish … bonkers … a crock … a form of madness,” in the description of Emily Bell, who has long run the Guardian newspaper’s Web site, one of the industry’s most successful—and, in fact, it has been tried before and failed. “It’s Groundhog Day,” adds Bell. The New York Times tried to levy a subscription charge for its columnists but reversed course and declared itself free again. Even Murdoch’s Wall Street Journal, the model of subscription content online, has made more and more of its site free.
(Elsewhere Wolff cites the Grauniad, as Private Eye likes to call it, as a prime example of a locally-known publication making itself into an international brand on the web.)

There appears to be a wide consensus that in the newspaper and magazine business, charging for content just means driving away the vast bulk of your readers and angering your columnists by cutting off their exposure to lucrative outside gigs:
The position of Internet professionals is straightforward: while it’s possible to charge for certain kinds of specialized information—specifically, information that helps you make money (and that you can, as with an online Wall Street Journal subscription, buy on your company expense account)—there are no significant examples of anyone being able to charge for general-interest information. Sites where pay walls have been erected have suffered cuts in user traffic of, in many cases, as much as 95 percent as audiences merely move on to other, free options.
Evidently Murdoch's track record with online publishing is not that great. Wolff gives a litany of failures, including MySpace being "flattened" by Facebook, and suggests that the pay wall was one reason that Murdoch has already taken a $3 billion writedown on his purchase of the Journal.

Wolff's most interesting argument, though, gives me a chance to drag the not-so-disruptive technology tag out of mothballs:
Murdoch has a larger problem still. It is, after all, not the Internet that has made news free. News in penny-newspaper or broadcast (or bundled cable) form has always been either free or negligibly priced. In almost every commercial iteration, news has been supported by advertising. This is, more than the Internet, Murdoch’s (and every publisher’s) problem: the dramatic downturn in advertising.

[...]

It is hard to imagine that when advertising growth resumes there will not once again be a rush to encourage traffic growth, but right now, the news business, supported for a hundred years by advertising, whose core skill has been selling advertising, believes it must right away, this second, re-create itself with a new business model where advertising is just the cream on top and where it’s the consumer who pays the true cost of newsgathering.
Indeed.

Monday, November 16, 2009

Googling the flu

How can you track the progress of the flu season? If you want rigorous results, you can ask the CDC or its analog where you live for the results of their thorough surveys of public health data. Since that takes a while to compile and cross-check, the definitive answers are weeks out of date. If you're in a hurry, just track how many people are googling for "flu" and similar targets.

The nice folks at Google found that with a little tuning and tweaking, this approach gives results that match the official results remarkably closely. It's not just a neat hack, though it definitely is that. The CDC has taken it seriously enough to collaborate with the Google team on an article published in Nature.
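If you're curious what the "tuning and tweaking" amounts to in miniature, the core idea is a regression: fit the fast, noisy search-query counts to the slow, authoritative official numbers, then use this week's query count to estimate a flu level weeks before the official figure exists. Here's a toy sketch with made-up numbers; the real model uses many query terms and a far more careful fit:

    # Made-up weekly data: searches for "flu" and the official figure
    # (share of doctor visits for flu-like illness) for the same weeks.
    weekly_queries = [120, 150, 210, 340, 500, 610, 580, 430]
    official_level = [1.1, 1.4, 2.0, 3.1, 4.6, 5.7, 5.4, 4.0]

    n = len(weekly_queries)
    mean_q = sum(weekly_queries) / n
    mean_o = sum(official_level) / n

    # Ordinary least squares: official level ~ a * queries + b
    cov = sum((q - mean_q) * (o - mean_o)
              for q, o in zip(weekly_queries, official_level))
    var = sum((q - mean_q) ** 2 for q in weekly_queries)
    a = cov / var
    b = mean_o - a * mean_q

    # "Nowcast" this week's flu level from this week's query count,
    # weeks before the official number will be compiled.
    this_week = 700
    print("Estimated flu level this week: %.1f%%" % (a * this_week + b))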

Google is careful to point out that they are using anonymized data aggregated over millions of searches, more than enough for the "I'm Spartacus" effect to come into play. Individual results would be of little help anyway. Just because I google "flu" doesn't mean I'm ill. I might just be doing a little research for my blog.

Looks like Australia is having a fairly flu-free spring so far.

[Google Flu Trends and Google Dengue Trends are no longer active projects, but historical data is still available --D.H. Dec 2015]

Well, this is embarrassing ...

Dead links are part of the fabric of the web. Indeed, tolerance of them is one of the web's major architectural features. Every once in a while, though, one runs into a particularly amusing example. I was just checking my usage statistics and at the bottom was a link encouraging me to "Try Google's free webmaster tools." and "Improve your site's visibility in Google search results."

I'll let you fill in your own punchline.

Sunday, November 15, 2009

Fingerpickin', or Is YouTube the new Folkways?

For some reason the old Doc Watson song* Deep River Blues popped into my head. So I googled, and up came a clip of Doc pickin' it, every note in the right place as usual. Right there in the list of related videos next to it was Leo Kottke's take on the same song. Clearly the same song, but it's fascinating how Doc's sounds like no one but Doc and Kottke's sounds like no one but Kottke.

On the related list for that one in turn is impromptu-looking footage of Doc, Kottke and the late Chet Atkins for good measure picking I-forget-what tune along much the same lines as Deep River. Atkins is mostly comping here, but when he does take a lead, darned if it doesn't sound exactly like Chet Atkins. Note that Kottke, being the junior member, doesn't seem to merit a mention in the opening captions.

One could spend a fair bit of time (and this one did) chasing links, comparing and contrasting the styles and techniques of the greats, almost as though there were something to this whole web-as-online-educational-resource thing. If that sounds intriguing to you, I'll leave you to it. In which case if you run across bootleggy-looking versions** of Eight Miles High by Kottke, Michael Hedges and Roger McGuinn himself, say hi for me.

* He didn't write it, just made it his
** I'm going to assume here that YouTube would have taken down anything egregiously illicit, and/or that Kottke, McGuinn and the Hedges estate are not greatly offended. See the previous post on iBiblio for why McGuinn in particular might not mind.

Thursday, November 12, 2009

No monopoly on BitTorrent hype

Google tells me that a few people have turned up this blog by searching for "60 minutes bittorrent". I couldn't replicate that effect, perhaps because I didn't dig deeply enough into the hits, more likely because other hits have taken Field Notes' place, but I did turn up some other interesting tidbits:
  • The top hit was for a BitTorrent site offering downloads of 60 Minutes, including a couple of files dated 11/1, presumably the episode in question. I'm not going to include the link here, as I don't see any indication that the particular site has permission to distribute that content. However ...
  • A few pages further down, past several other unfriendly headlines, is a piece entitled Leslie Stahl Needs to Get A Clue About P2P. Hmm ... I wonder where they stand on the issue? Deftly dodging the various popups, I was reminded that BitTorrent (the company) partnered with several of the major studios back in late 2006. However ...
  • I'm hard-pressed to find any legal BitTorrent service for movies from the major studios -- something, for example, where you pay $X and then download the movie of your choice, duly uploading bits of your previously-purchased movies to other paying customers. Maybe it's there and I missed it, but neither BitTorrent's site, Fox Movies' site nor a general Google search turned up anything likely. By contrast, Roxio's Cinema Now delivers movies in a large variety of ways, but all of them involve a dedicated device.
  • Conversely the handful of BitTorrent sites I clicked through to explicitly don't charge anything (and handle an, ahem, wide variety of content).
So ... BitTorrent seems like a fine way to distribute bulky content -- whether legit or illicit -- for free, but not so good for paid content. No great surprise there. But, just as 60 Minutes deserves grief for putting forth the MPAA party line as an investigative report, and the MPAA for claiming loss of revenue it could never have realistically had in the first place, it comes off as disingenuous for someone defending BitTorrent to claim that the movie industry is actually benefitting from BitTorrent, based on a 2006 press release.

Wednesday, November 11, 2009

30 seconds amusingly spent

A colleague mailed this bit of fun out to everyone.

The shape of the backbone

I was all set to write a somewhat philosophical post comparing the internet backbone to the phone system that came before it, when I realized I know very little about the internet backbone. Since I claim not just to be writing about the web, but actually figuring it out (however slowly) as I go along, that seemed like a pretty embarrassing lapse. So I've started doing a little basic digging and, so as to have something to post while I'm doing it, pass along some of my findings.


There was a British general once who, whenever he was sent to a new theater of battle, would start off by making a crude, not-painstakingly-to-scale map of the major roads and rail lines in the territory, noting the distances between the most important cities, junctions and other points of interest. This exercise would pay dividends later, not just in the day-to-day running of the campaign, but in preventing blunders on the order of depending on troops to cover a week's worth of distance in two days. It would also give some idea of the political structure of the place -- who was likely to be trading with whom, who might not care too much about what was going on where, and so forth -- and help greatly in deciding which supply lines to guard where, which enemy supply lines to attack where, etc., etc.

Along those lines (minus the military perspective), it seems useful to start with a very large-scale look at the international internet backbone, with an eye toward who's connected to whom and with what capacity. Physical distance is not as important here, though latency can matter in some cases.

Here's just such a map, dating from 2005, prepared by TeleGeography Research for the International Telecommunication Union. Please don't look at the part that says "Proprietary and Confidential". There are probably more recent maps available* but this should give a reasonable impression. A few things that jump out:
  • North America <-> Europe dominates, followed by North America <-> Asia with roughly 40% of that capacity, followed by everyone else, far behind. In particular, there is very little capacity, relatively speaking, directly linking Europe and Asia. If you're in Bangalore talking to Bucharest, you may well be going by way of San Francisco.
  • Capacity to and from Africa is almost negligible on this map, but see below.
  • Capacity on these scales is currently measured in Gbps (the document gives 5- and 6-figure numbers in Mbps, but most of that precision is probably spurious). That's pretty big compared to my home connection, but not really stunningly large considering how many people are involved. By comparison, the human visual system appears to have a capacity on the order of 1Gb/s, so the 500Gb/s pipe between North America and Europe is, in some notional sense, roughly equivalent to 500 pairs of eyes. Put more realistically, though, 500Gb/s is about 125,000 DVDs playing simultaneously.
  • What these numbers actually mean is a different question entirely. Suppose for example that everyone in the US wants to follow the European football leagues online -- not that that's going to happen anytime soon. Would those millions of viewers saturate the pipe? Hardly, since the main pipe only has to carry a few video feeds at any given time.
As I said, the big missing piece in the 2005 picture is bandwidth between Africa and the rest of the world. As of this summer, that bandwidth has gone from practically nothing to about 1.3 Tb/s, or 1300 Gb/s to put it in the same units used above. That's over twice the size of the Europe <-> North America pipe, a classic example of leapfrogging technology, assuming we're comparing apples to apples. Even if we're comparing apples to pears it still looks like a pretty big pipe.
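For anyone who wants to check my arithmetic, here's the back-of-the-envelope version; the DVD bitrate is my own rough assumption:

    # Rough figures from above; the DVD bitrate is an assumption.
    transatlantic_gbps = 500     # Europe <-> North America, per the 2005 map
    africa_gbps = 1300           # roughly 1.3 Tb/s as of mid-2009
    dvd_mbps = 4                 # a typical DVD-quality video stream
    t1_mbps = 1.544              # the T1 line mentioned in the footnote below

    print("DVD streams across the Atlantic: %d"
          % (transatlantic_gbps * 1000 / dvd_mbps))            # ~125,000
    print("Africa link vs. transatlantic link: %.1fx"
          % (africa_gbps / transatlantic_gbps))                # ~2.6x
    print("T1 as a share of the transatlantic link: %.4f%%"
          % (t1_mbps / (transatlantic_gbps * 1000) * 100))     # ~0.0003%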

(*) Back in the days when bang paths roamed the earth, one could subscribe to a newsgroup which published frequently updated maps of the backbone as it then existed, meaning mainly a bunch of T1 and similar lines. A T1 line could carry about 1.5 Mb/s, or .0015 Gb/s, or about 0.0003% of the Europe <-> North America link.

Friday, November 6, 2009

Moto perpetuo

It occurs to me that unbreakable copy protection is the perpetual motion of our day.

Back at the beginnings of the industrial revolution, when inventions like the steam engine and electrical generator were making new and mysterious things possible and were not widely understood, people were constantly coming up with perpetual motion schemes. And why not? If you can generate more power than hundreds of strong men and horses can produce just by burning coal, and transmit that power miles and miles away with simple metal wires, is it so implausible that some arrangement of magnets and overbalanced wheels could generate endless power from nothing?

Eventually, in the early 1800s, after commercial steam power had been around for about a century, the principle of conservation of energy came to be widely accepted and by the middle of the century the familiar laws of thermodynamics were established, including the crucial first two:
  1. You can't win (conservation of energy).
  2. You can't break even (entropy increases in a closed system).
These two principles explain why perpetual motion schemes don't work. That hasn't stopped people from coming up with them, but it has stopped knowledgeable engineers and scientists from wasting time on them. It hasn't completely stopped investors from investing in them, but the long and sorry track record of such schemes probably has been a deterrent.

Why do people still bother, then? Because if it were possible, large-scale perpetual motion would do away with energy shortages forever. It wouldn't necessarily make any money, infinite supply implying zero price, and an energy surplus would have drawbacks of its own, but we could probably deal with those problems when they came up. The point is that people try to prove that perpetual motion is possible because they really, really want it to be.

Anyone in the business of selling information would really, really like to be able to control the propagation of that information. You do the math.

I don't know of any specific principle of information theory that explains why this will never work, but there's a growing body of empirical evidence to that effect. Intuitively, copying bits costs much less than the price sellers would like to charge, so the protection has to come in the conversion of those bits into usable form. That runs you right into the analog reconversion problem, of which "camming" (sneaking cameras into movie theaters) is a crude but effective example.

Clearly none of this is currently stopping people from trying to come up with copy protection schemes, or people from paying for them. The track record probably isn't quite long or sorry enough yet. I suspect it eventually will be.

Fortunately, selling bits and making them impossible to copy are two different things.

Thursday, November 5, 2009

60 Minutes and the MPAA: Postscript

In the last few posts on this topic (and I hope this will be the last one for a while), I tried to keep the 60 Minutes bashing toned down to a dull roar and concentrate more on the technical and economic issues. And besides, as it turns out, others have done it better and funnier [Unfortunately this link appears dead, as in not-even-on-the-Wayback-Machine dead.  I no longer remember what it said, but I'm fairly sure it was pretty funny.  Sic transit ... --D.H. May 2015].

From the first link, I learned that 60 Minutes has gotten this one wrong once before, and from the second I learned how to fill up an entire web page in order to display a message of no more than 140 characters. "Twitterati", eh?

Wednesday, November 4, 2009

60 Minutes and the MPAA: Part VI - What now?

Along with passing along the $6 billion/year figure, Steven Soderbergh tells 60 Minutes that, thanks to piracy, movies that got made in the past could not be made today. He cites The Matrix as an example.
"The chances of a movie, for instance, like 'The Matrix' being made shrinks. Here's a guy, here's a movie, two guys, they've made a small independent film. Warner Brothers gives them $75 million to make this script that nobody can understand, right?" Soderbergh said. "Wouldn't happen today."
Now, I'm not going to claim I know anywhere near what Steven Soderbergh knows about getting movies made. I will go so far as to claim I know next to nothing. And yet, looking at the Yahoo! box office grosses, I can't say I see anything amiss.

Clearly all kinds of movies are still getting made, including some pretty expensive-looking ones. A lot of them stick pretty close to the usual formulas, but that's always been true. Movies by unknowns? I wouldn't know, but I'm quite sure that interesting movies by interesting people with interesting viewpoints are still getting made.

But what about the future? The movie studios are understandably worried, particularly in light of what their brethren in the music business have been going through. But every industry is unique. Movies are not the same as songs and albums. Nobody goes to a "music theater" to listen to pre-recorded music.

Making a movie, even a cheap movie, carries a lot more overhead than recording a song, and songs are easier to market. At least one songwriter has recorded a song a week for a year at a stretch, and at least one of them was pretty good. Plenty of small bands self-produce and self-distribute, and a lot of them are pretty good. By comparison very few feature films are made without the involvement of some flavor of existing studio. There's plenty of self-produced stuff on YouTube, but much less of it is pretty good and most of it goes unseen or nearly unseen anyway.

In short, the economics are different and hey, people still make money selling books, to my continual puzzlement. So my guess is that the movie industry is going to be just fine, particularly if it stops trying to boil the ocean and embraces online distribution.

I certainly hope so, anyway. As much as I've questioned the MPAA's rhetoric and logic, I wholeheartedly agree with them on some basics: Movies are cool, and people who make them should be able to get paid for their work.

P.S. While chasing down the link above on YouTube, I ran across two previous posts on paying for movies, which might be relevant.

60 Minutes and the MPAA: Part V - Relevance

OK, so the L.E.K. study says that people who buy pirated DVDs say they would have collectively spent billions of dollars on legitimate fare if the pirated DVDs hadn't been available. Let's assume these people are telling the truth and are accurately estimating how much they would have spent. So we're done, right? That's how much money the studios are losing.

Actually there's another level of estimation involved. The $6.1 billion quoted was the amount the studios were said to be losing and itself is a portion of the larger total that the motion picture industry as a whole was said to be losing. But let's take that, too, at face value. Now we're done, right?

Well ... remember when I was discussing BitTorrent in Part I and mentioned the importance of carefully considering what problem you're trying to solve? The principle is just as vital here.

The MPAA, like the music industry before it, and the software industry before it, seems to be trying to solve the problem of keeping people from copying bits. Being no more able to do this than their predecessors, and with Moore's law catching up with them just like it did with everyone else, they -- again like everyone before them -- claim damages by comparing the real world with what they might have had if they could stop people from copying bits.

Fair enough, but they can't. No one can, and a false antecedent implies anything you want it to. Rather than trying to keep people from copying bits, would it not be better to frame the problem as how to make money from making movies?

The traditional way of doing that, selling tickets at theaters, is still bringing in revenue, on what I'd call a slight upward trend and what the site I got the figures from says is "not any substantial increase". That's not great news for an industry constantly trying to grow, particularly once you adjust for inflation, but neither are they falling off a cliff. Evidently "Let me take you to the movies" can sometimes have more appeal than "Let's go back to my grungy apartment and watch a DVD."

Leaving aside the TV networks, cable movie channels and pay-per-view (which is not necessarily a valid approach), the complaint, and certainly the thrust of the L.E.K. study, is that DVD sales are not doing as well as expected. They were supposed to keep chugging along like video before them, but they appear to be levelling off or even falling. At the same time people are selling lots of pirated videos, so the difference is attributed to piracy. QED.

But correlation does not imply causation. Certainly one interpretation of those facts is that piracy is hurting DVD sales, but another one, which I personally find more plausible, is that since bits are getting easier to copy and watching digital movies no longer requires a DVD, DVDs just aren't that valuable any more and the original sales projections that they would be were just wrong. Another symptom of the same technological changes is that making shoddy DVDs is easy enough that you can still make money off of them by charging closer to what they're worth, but neither of these symptoms is the cause of the other.

I don't recall when I last bought a DVD and I can assure you it's not because of piracy. Why pay $20 or even $12 for a DVD to take home and unwrap when I can order up a movie on demand for $4 or (if it's not particularly popular) watch it over Netflix as part of my $10/month subscription? How many DVDs do I watch more than three or four times? The commentary tracks seemed cool at first, but I don't really have time for those, either. Manifestly, I'd rather blog about DVDs than buy them.

I would, however, gladly pay more than $10/month for some sort of "premium" Netflix subscription that would open up more selection. I suspect I'm not alone. There's certainly money to be made legitimately by selling high-quality video online that won't get you sued or arrested. It's not a foregone conclusion that there's enough money to be made to keep the movie studios going in the style to which they (and we) have become accustomed, but I'm quite sure that money is not to be found in trying to prevent bit-copying.

60 Minutes and the MPAA: Part IV - Error bars

In the 60 Minutes piece I've been referencing, A-list director Steven Soderbergh drops the oft-quoted figure of $6.1 billion per year in industry losses. This figure comes from a 2006 study by consulting firm L.E.K. It's easy to find a summary of this report. Just google "video piracy costs" and up it comes. Depending on your browser settings, you may not even see the rest of the hits, but most of the top ones are repeats or otherwise derived from the L.E.K study. And you didn't need to see anything else anyway, did you?

So ... $6.1 billion. Let's assume for the moment that the figure is relevant -- more on that in the next post. How accurate is it?

One of the handful of concepts I retained from high school physics, beyond Newton's laws, was that of significant digits, or "sig digs" as the teacher liked to call them. By convention, if I say "6.1 billion", I mean that I'm confident that it's more than 6.05 billion and less than 6.15 billion. If I'm not sure, I could say 6 billion (meaning more than 5.5 billion and less than 6.5 billion).

Significant digits are just a rough-and-ready convention. If you're serious about measurement you state the uncertainty explicitly, as "6.1 billion, +/- 300 million". My personal opinion is that even if you're not being that rigorous, it's a bad habit to claim more digits than you really know, and a good habit to question anything presented like it's known to an unlikely number of digits.

The point of all this is that precise results are rare in the real world. Much more often, the result is a range of values that we're more or less sure the real value lies in. For extra bonus points, you can say how sure, as "6.1 billion, plus or minus 300 million, with 95% confidence".

From what I can make out, L.E.K. is a reputable outfit and made a legitimate effort to produce meaningful results and explain them. In particular, they didn't just try to count up the number of illegal DVDs sold. If I buy an illegal DVD but go and see the movie anyway, or I never would have seen the movie at all if not for the DVD, it's hard to claim much harm. So L.E.K. tried to establish "how many of their pirated movies [viewers] would have purchased in stores or seen in theaters if they didn't have an unauthorized copy". They did this by surveying 17,000 consumers in 22 countries, doing focus groups and applying a regression model to estimate figures for countries they didn't survey. (This is from a Wall Street Journal article on L.E.K.'s web site and from the "methodology" section of the summary mentioned above).

On average, they surveyed about 800 people per country, presumably more in larger countries and fewer in smaller. That's enough to do decent polling, but even an ideal poll typically has a statistical error of a few percent. This theoretical limit is closely approached in political polls in countries with frequent elections, because polling is done over and over and the pollsters have detailed knowledge of the demographics and how that might affect results. They apply this knowledge to weight the raw results of their polling in order to compensate for their sample not being completely representative (for example it's weighted towards people who will answer the phone when they call and are willing to answer intrusive questions).

For international market research in a little-covered subject, none of this is available. So even if you have a reasonably large sample, you still have to estimate how well that sample represents the public at large. There are known techniques for this sort of thing, so it's not a total shot in the dark, but I don't see any way you can assume anything near the familiar "+/- 3%" margin. At a wild guess, maybe more like 10-20%, by which I mean you're measuring how the population at large would answer the question, and not what they would actually do, with an error of -- who knows but let's say -- 10-20%. More than the error you'd assume by just running the sample size and the population size through the textbook formula, anyway.
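For reference, the textbook formula I'm alluding to gives the best-case sampling error for a simple random sample; everything above (weighting, honesty, hypotheticals, the regression to other countries) only adds to it:

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Half-width of the 95% confidence interval for a proportion,
        # in the idealized simple-random-sample case.
        return z * math.sqrt(p * (1 - p) / n)

    for n in (800, 17000):
        print("n = %5d  ->  +/- %.1f%%" % (n, 100 * margin_of_error(n)))
    # n =   800  ->  +/- 3.5%
    # n = 17000  ->  +/- 0.8%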

All of this is assuming that people won't lie to surveyors about illicit activity, and that they are able to accurately report what they might have done in some hypothetical situation. Add to that uncertainties in the model for estimating countries not surveyed and the nice, authoritative statement that "Piracy costs the studios $6.1 billion a year" comes out as "Based on surveys and other estimates done in 2006, we think that people who bought illegal DVDs might have spent -- I'm totally making this up here -- somewhere between $4 billion and $8 billion on legitimate fare that year instead, but who really knows?"

Now $4 billion, or whatever it might really be, is still serious cash. The L.E.K. study at the least makes a good case that people are spending significant amounts on pirated goods they might otherwise have bought from studios. I'm not disputing that at the moment. Rather, I'm objecting to a spurious air of precision and authority where very little such exists. More than that, I'm objecting to an investigative news program taking any such key figure at face value without examining the assumptions behind it or noting, for that matter, that it was commissioned by the same association claiming harm.

And again, this is still leaving aside the crucial question of relevance.

Tuesday, November 3, 2009

60 Minutes and the MPAA: Part III - Interlude

While trying to track down just how video piracy actually works, I followed a Wikipedia external link to a fascinating and well-written article in Legal Affairs about a real live pirate who is also in the employ of a major media conglomerate. I offer it here on its own merit and for comparison purposes against the 60 Minutes piece. Judge for yourself which is a better piece of investigative reporting.

60 Minutes and the MPAA: Part II - Pirates and pirates

OK, I'll say it again: Copyright violation is illegal. Don't do it.

However, if you're doing an investigative piece on video piracy, it would seem useful to distinguish various kinds of piracy. Otherwise there's a risk of throwing out a figure like $6 billion, showing pictures of convicted gang members and later an animation sort of depicting BitTorrent, and having people think that online file sharing sends $6 billion a year into the pockets of gangsters. Not that anyone would ever want to suggest such a thing.

In fact, there are pirates, and then there are pirates.
  • Gangs make money by selling counterfeit DVDs of movies. The practice is particularly rife (and as I understand it, more in a legal gray area) in Asia.
  • People trade movies on the internet. No money changes hands.
Lumping all of this under the label "video piracy" captures some common features, particularly that both are illegal and there's a case to be made that both cost studios money, but it ignores the obvious difference in motivation. Busting people on the street is not going to stop file sharing, and somehow shutting down file sharing would not stop people from selling DVDs on the street.

Two different problems, two (largely) different sets of people, and most likely two different solutions.

Monday, November 2, 2009

60 Minutes and the MPAA: Part I - BitTorrent

Before I start: I'm in favor of copyrights, I believe movie makers, even big mainstream Hollywood studios, have a right to make a buck and I think that sneaking out a lo-fi recording of a movie and the guy in the next row snoring, besides being illegal, is just pretty lame. That said ...

CBS's long-running news show 60 Minutes just ran a piece on video piracy. I'm calling it a "piece" and not a news story because it's essentially the Motion Picture Association of America's position on the issue re-routed through a major news show. The MPAA certainly has a right to make its case, and the case is not without merits, but passing it off as news -- particularly old-school investigative reporting -- has considerably less merit.

In my first draft of this post I tried to hit all of my objections, but the result was unwieldy and it soon became clear I had some homework of my own to do. So before digging into the real meat of the issue, let's just talk about BitTorrent.

BitTorrent is widely used for exchanging large files. Said large files include new Linux distributions, legitimately produced free video and, of course, pirated videos. In other words, like pretty much any other information technology out there, it can be used for good or ill. 60 Minutes seems mildly shocked that such a thing could be "perfectly legal."

BitTorrent, as you may know, works by avoiding the bottleneck of a central server. Instead of thousands of home computers all bombarding the central site with requests and overloading even its capacity to fulfill them, BitTorrent uses a central computer to coordinate the home computers distributing the data to each other. The 60 Minutes piece gets this wrong by suggesting that the data itself is going both to and from a central server:
Tiny "bits" moving toward a blue column in the middle of Malcolm's screen are pieces of the movie we were getting from people all around the world. The bits moving away from the column are pieces we have and are sharing with someone else.
You can sort of see where the key concepts got lost in translation, but ... huh? Pieces I'm sending out are going away from the column? Am I the column then? Each piece goes both to and from the column? What's the point of that? If you want to see what's really going on, Wikipedia has a much less whizzy but more accurate picture.

I should pause to point out here that BitTorrent's architecture is a classic example of the value of carefully considering the problem you're trying to solve. Instead of solving "How can you transmit a file quickly from one computer to another?" which basically has only one answer (high bandwidth on both ends), BitTorrent solves "How can you distribute copies of a file among a large number of computers?" Once you look at it that way, the answer of letting a given host pay back its download over time and to lots of other hosts seems fairly natural, but looking at it that way in the first place is an engineering masterstroke.
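To see why the re-framing matters, here's a toy round-by-round comparison -- emphatically not the real protocol, which trades individual pieces and overlaps uploading with downloading -- of how long it takes to get a copy to everyone when only the original server uploads versus when every host that has a copy pitches in:

    import math

    def rounds_single_server(n_peers, uploads_per_round=1):
        # Only the server ever uploads; peers never share.
        return math.ceil(n_peers / uploads_per_round)

    def rounds_peer_to_peer(n_peers):
        # Every host that already has a copy (seed included) uploads one
        # copy per round, so the number of copies roughly doubles each round.
        copies, rounds = 1, 0
        while copies - 1 < n_peers:
            copies *= 2
            rounds += 1
        return rounds

    for n in (10, 1000, 100000):
        print("%6d peers: %6d rounds server-only, %2d rounds peer-to-peer"
              % (n, rounds_single_server(n), rounds_peer_to_peer(n)))

The server-only numbers grow linearly with the size of the crowd; the peer-to-peer numbers grow roughly with its logarithm.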

To use BitTorrent, you have to let a central tracker know what file you're retrieving. You're also committing to upload data to other BitTorrent users in return for downloading. This architecture makes BitTorrent different from other well-known ways of moving big files around:
  • Unlike the central server model, BitTorrent is distributed. There is no single bottleneck for data. There is a single place for handling metadata, but metadata is much, much smaller.
  • BitTorrent is much more traceable than older peer-to-peer file sharing systems (but see below).
  • It's also faster, because you're effectively downloading your file from many other places instead of being constrained by one person's upload speed.
In short, you trade off anonymity for speed. This is a perfectly good trade if you're conducting legitimate business, not so good if you're not. Even if you neglect to tell The Man you're setting up a BitTorrent tracker to share files, the pattern of lots of peer-to-peer traffic coupled with frequent low-volume traffic between everybody and a central node is pretty distinctive. Once the central server is located, watching it is much easier than watching everybody. It's also much easier to tell who's involved, since they're all talking to the tracker. All in all, this seems like much less of a headache for parties like the MPAA than, say, Napster and its descendants.

However ...

BitTorrent's speed is a definite headache. A typical cable-based home connection, at least in the States, has a download speed massively higher than its upload speed. Last time I measured mine, the ratio was 40:1. This makes it reasonably easy for me to download big files from a central, easily traceable server with huge bandwidth, but a real pain for me to send a copy of that file to my friend or whoever. That's fine, but it's much less of a pain for ten thousand people to distribute copies of a file amongst each other.
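To put that asymmetry in concrete terms, with my own rough 2009-era cable numbers rather than anyone's measured connection:

    # Made-up but plausible numbers for a DVD-sized file on home cable.
    file_gb = 4.0        # size of the file
    down_mbps = 20.0     # download speed
    up_mbps = 0.5        # upload speed (the 40:1 ratio)

    def hours(size_gb, rate_mbps):
        return size_gb * 8 * 1000 / rate_mbps / 3600

    print("Download it from a big server: %.1f hours" % hours(file_gb, down_mbps))
    print("Push the whole thing to one friend: %.1f hours" % hours(file_gb, up_mbps))
    # A swarm pools its upload capacity: 10,000 peers at 0.5 Mb/s each can
    # collectively move bits as fast as a fairly serious server farm.
    print("Aggregate upload capacity of a 10,000-peer swarm: %.1f Gb/s"
          % (10000 * up_mbps / 1000))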

More worrisome to folks like the MPAA, though, is that it is possible to run the same distribution scheme without a central tracker. As I said, metadata is small, so distributing it to everyone takes relatively little time. A particular mechanism, distributed hash tables, has been around for a while now. As with file sharing itself, distributed hash tables aren't inherently evil. In fact, they have some very generally useful properties. But a highly efficient file distribution system without a visible center presents a real problem if you see your job as preventing people from distributing files.
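If you're wondering what a distributed hash table boils down to, the core trick is consistent hashing: hash node names and keys onto the same circle and assign each key to the next node around the ring, so any participant can compute who is responsible for what without asking a central party. A bare-bones sketch, with made-up peer and file names; real DHTs (the Kademlia variants that BitTorrent clients use, for instance) add routing tables, replication and churn handling on top:

    import hashlib
    from bisect import bisect_right

    def ring_position(name, ring_bits=32):
        # Hash a string onto a circle of 2**ring_bits positions.
        digest = hashlib.sha1(name.encode()).hexdigest()
        return int(digest, 16) % (2 ** ring_bits)

    class ToyDHT:
        def __init__(self, nodes):
            self.ring = sorted((ring_position(n), n) for n in nodes)

        def node_for(self, key):
            # A key lives on the first node at or after its position,
            # wrapping around the circle if necessary.
            positions = [p for p, _ in self.ring]
            i = bisect_right(positions, ring_position(key)) % len(self.ring)
            return self.ring[i][1]

    dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
    for name in ("ubuntu-9.10.iso", "home-movie.avi"):
        print(name, "-> tracked by", dht.node_for(name))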

In summary: Copyright violation is illegal, harmful and lame. BitTorrent can be used for copyright violation or for legitimate purposes, and it's a very neat hack* in any case. Preventing illegal file sharing by purely technical means looks like a tall order. Bashing BitTorrent or any other particular product is unlikely to help.

* I suppose in this context it's particularly important to point out I mean "hack" in the sense of "cool and unexpected thing done with technology," not in the more widespread sense of "act of computer vandalism".