Saturday, July 5, 2008

Google, Viacom and privacy

A certain amount of controversy over privacy comes with being Google, not only because Google is a large software company, but also because it aims to make as much of the world's information available to as many of the world's people as it can, subject, of course, to the admonition not to be evil. "Don't be evil" is three simple words, but how those three words apply when the bits hit the wire is the stuff of dissertations and lawsuits.

Google has been embroiled in at least two significant disputes lately: the ongoing rumbles over Street View, which seem to be getting worked out piece by piece as we go along, and a lawsuit by Viacom over YouTube, which, while probably not as bad as the flap over Italian tax privacy, does carry at least a couple of echoes of the AOL search data debacle.

The Viacom suit certainly looks bad at first blush. In his ruling, Judge Louis L. Stanton has granted Viacom access to "all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website." That data includes "for each instance a video is watched, the unique “login ID” of the user who watched it, the time when the user started to watch the video, the internet protocol address other devices connected to the internet use to identify the user’s computer (“IP address”), and the identifier for the video." [The link above is via Wired. If the original ruling is on the District Court's website yet, I can't find it. If anyone has a better link, please send it.]

So there's a bit of a privacy issue there.

Google had two objections to this. The first was that the database was big. Well, 12 terabytes is a lot of data, but as the judge points out, it's not too big for commodity disks these days. The more serious objection was that the database contains personally identifiable data and is more than Viacom needs to “recreate the number of views for any particular day of a video” and "compare the attractiveness of allegedly infringing videos with that of non-infringing videos."

The judge was not impressed, calling Google's concerns "speculative" and citing a (very reasonable) blog post by Google developer Alma Whitten arguing that an IP address by itself is generally not personally identifiable information. That seems a bit odd, since in this case the IP address is not by itself but is linked with just the kind of information that Whitten claims would make it personally identifiable.

However, the main thrust of the judge's argument seems to be that Viacom's use of the data is limited to a particular purpose in the discovery phase of a particular civil case. Presumably, if Viacom is later found to be making other use of the data, or if the data leaks out into the larger world, Google or someone else can come back after them. In the case at hand, Viacom would probably also run afoul of the Video Privacy Protection Act. Well, maybe so, but a genie out of a bottle is a genie out of a bottle ...

It's not clear to me why Google couldn't have just been compelled to disclose what Viacom said it was after: detailed logs of how many people watched what videos at what time, but not which particular people or from what IP address. In the cases Viacom is interested in, where large numbers of people watched copyrighted material, there should be more than enough individuals involved to provide anonymity.
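To make that concrete, here is a rough sketch of the kind of reduction I have in mind. Everything in it is an assumption for illustration: the field names (`login_id`, `ip`, `video_id`, `started_at`), the hourly buckets and the `min_count` cutoff are mine, not anything from the ruling or from YouTube's actual logging database. The idea is simply to count views per video per hour, throw away who watched and from where, and optionally drop buckets too small to hide an individual.

```python
from collections import Counter
from datetime import datetime


def aggregate_views(records, min_count=1):
    """Count views per (video_id, hour), ignoring login ID and IP address.

    `records` is an iterable of dicts with hypothetical keys
    'login_id', 'ip', 'video_id' and 'started_at' (ISO 8601 string).
    Buckets with fewer than `min_count` views are left out, on the
    assumption that very small counts could point to an individual.
    """
    counts = Counter()
    for rec in records:
        started = datetime.fromisoformat(rec["started_at"])
        # Round the start time down to the hour.
        hour = started.replace(minute=0, second=0, microsecond=0)
        # Only the video and the hour survive; the user fields are never read.
        counts[(rec["video_id"], hour.isoformat())] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up sample data, using documentation-only IP addresses.
sample_log = [
    {"login_id": "alice", "ip": "192.0.2.1", "video_id": "v1",
     "started_at": "2008-07-01T10:05:00"},
    {"login_id": "bob", "ip": "192.0.2.2", "video_id": "v1",
     "started_at": "2008-07-01T10:40:00"},
    {"login_id": "alice", "ip": "192.0.2.1", "video_id": "v2",
     "started_at": "2008-07-01T11:15:00"},
]

if __name__ == "__main__":
    for (video_id, hour), n in sorted(aggregate_views(sample_log).items()):
        print(video_id, hour, n)
```

Raising `min_count` is the code version of "more than enough individuals involved": a video watched by thousands keeps its counts, while a bucket with one lone viewer drops out of the result. Whether a simple cutoff like that is enough in a real legal setting is a separate question that this sketch doesn't answer.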

On the bright side, Google and Viacom are now trying to work out how best to implement the court order without giving away personally identifiable information. Google's interest in this is obvious. Viacom also has an interest, though. They would like to be able to say "we only looked at the information we asked for, and we can prove it." No one wants to be seen as the company that inadvertently gave away information on millions of users. AOL went through that. It hurt.

In the background to all this is the long-standing complaint from privacy advocates that Google should have anonymized the YouTube data to begin with, as it has with its web search data. You can't divulge what you don't know, and in a case like the present one Google could have convincingly argued that it could supply Viacom only with what it asked for and no more. That would clearly be easier and less error-prone than sorting things out after the fact, as the two companies are having to do now.

In practice, it's probably not that simple. It's easy to think of a company as a monolith, but only the smallest companies really are. When you get to be Google's size, and the entity in question is a newly-acquired subsidiary, it's not a great surprise that rules and practices would differ.
