Friday, June 13, 2008

The slowest fiber service in Europe

British phone/internet provider BT is rolling out fiber-optic service to homes in Ebbsfleet, Kent. BT says the speeds are "Higher in fact than anyone currently needs." Critics say it's "The slowest fiber service in Europe". Both statements are factual, but neither seems very helpful.

On the one hand, how much bandwidth does one need? The answer could range from zero, given that thousands of generations of humans survived before the internet, to "enough to saturate the senses of everyone in the house simultaneously" (not currently available anywhere I know of).

The BT offering is somewhere in between. Sustained bandwidth is 2.5Mb/s, shy of full DVD quality, but it can also handle "bursts" of up to 100Mb/s. That's a fairly broad range, and it's not clear how long a burst can be. If it's, say, 10 seconds, then in that time you could buffer up about four minutes of DVD video, and if you could do it again every few minutes you should be able to keep the buffers full. Of course, if you "need" HD, you'll need to buffer more at the outset. If you "need" to watch two or more different live HD offerings at once, you're probably out of luck.
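To put rough numbers on the burst idea, here's a back-of-the-envelope sketch. It assumes (my assumption, not anything BT has said) that DVD-quality video averages about 4Mb/s and that a burst runs at the full 100Mb/s for its whole length. The second function also counts the 2.5Mb/s sustained rate trickling in while the video plays, which stretches the buffer further.

```python
# Rough buffering arithmetic for a burst-then-sustained connection.
# ASSUMPTIONS (not from BT): DVD-quality video averages about 4 Mb/s,
# and a burst runs at the full 100 Mb/s for the whole burst window.

def seconds_buffered(burst_mbps: float, burst_seconds: float, video_mbps: float) -> float:
    """Seconds of video that one burst can download."""
    return burst_mbps * burst_seconds / video_mbps

def playback_before_empty(buffered_s: float, sustained_mbps: float, video_mbps: float) -> float:
    """Seconds of playback before the buffer runs dry, counting the
    sustained rate that keeps arriving while the video plays."""
    deficit = video_mbps - sustained_mbps  # Mb/s drained from the buffer
    if deficit <= 0:
        return float("inf")  # the sustained rate alone keeps up
    # The buffer holds buffered_s * video_mbps megabits of video.
    return buffered_s * video_mbps / deficit

buffered = seconds_buffered(100, 10, 4.0)  # 250 s of video
print(f"{buffered / 60:.1f} minutes buffered")
print(f"{playback_before_empty(buffered, 2.5, 4.0) / 60:.1f} minutes until empty")
```

Under those assumptions a 10-second burst buys a bit over four minutes of video, and with the sustained rate topping it up the buffer lasts roughly eleven minutes, so a burst every few minutes would indeed keep it full.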

On the other hand, if your bandwidth is adequate to your needs, what does it matter how big a pipe they have on the Continent, or in Asia or wherever? But maybe that's just sour grapes from a (relatively) bandwidth-constrained Yank.

Thursday, June 12, 2008

Virgin Media, copying and use

BBC Blogger Bill Thompson is concerned that his ISP, Virgin Media, appears to be monitoring its customers' activity for signs of illegal copying. Along with the obvious general question of who can monitor whose activity when and for what, he has a specific concern about copying via BitTorrent:

Like almost every technically-competent internet user of my acquaintance I've used BitTorrent to get my hands on a copy of a TV show that I missed, taking advantage of the kindness of strangers who bothered to record and upload the shows for fans because the companies that make and broadcast them choose not to.

However I also go out and buy the DVD box sets as soon as I can.

And I don't feel like a criminal, because I don't see why downloading a copy of a show that someone else has recorded should be seen as a breach of copyright while recording it myself onto a DVD is not.

It's certainly not wrong to make a copy of something you've already paid for, and by that argument it doesn't seem wrong to make a copy of something that you fully intend to pay for and do pay for reasonably soon. But the question here is not whether something is right or wrong, but whether it is legal. Ideally, these are closely related questions, but we're talking about digital media here.

A while ago I had a small epiphany from Linus's assertion that copyright is about distribution and not use. The copyright holder can't and shouldn't be able to control how you use something you've bought, but it does have some say over who can copy it and how.

Thompson's two cases look the same from the point of view of use. In either case, he's watching something and paying for it. They look completely different from the point of view of copying and distribution. In one case he's following the licensed method and in the other case he's not.

It's also pretty clear why the distributor would care. In the licensed case, it knows exactly who copied what and it knows it's getting paid. In the unlicensed case, it knows neither, particularly not the latter. Many people, like Thompson, are honest, but even an honest person might be tempted to think "Oh, I only watched it once" or "I'll pay for it next week" and next week never comes.

Of course, this being the law, things are not always so clear-cut. It's generally a copyright violation to charge people admission to watch a DVD you bought. That looks more like "use" than "distribution", unless you squint just right.

As always, remember I'm not a lawyer.

A cautionary tale from AOL

Anyone doing research on, say, the locations of people's cell phones would have to be aware of, and keen not to repeat, the great AOL search data debacle of 2006.

I have to admit I didn't follow this closely at the time. Seemed like the kind of thing that was bound to happen sooner or later, and might happen a little less often given that AOL, despite repeated and doubtless sincere apologies, lost business and was generally humiliated for its troubles. But as fate would have it, two stories I wanted to comment on intersected exactly there. One was the piece on cell phones, and the other I'll get to in a bit.

There is certainly value in gathering anonymized bulk data and studying overall patterns. Paul Boutin has an interesting informal analysis of the AOL data, for example. Unfortunately, there are limits to how anonymous that data can be.

Anonymity depends critically on everyone being able to plausibly say "How do you know it was me? It could have been any of these people." I call this the "I'm Spartacus" effect, and it in turn depends on not giving away specific, unique data.

It turns out that people's internet searches can be very specific indeed. Sure, lots of people search for popular products, or celebrities, or any of a number of other things, but we also search for friends or acquaintances, or local businesses, or organizations we belong to or what-have-you. In the case of the AOL data, the New York Times had no trouble tracking down a lady in Georgia, who was kind enough to be interviewed, and several other searchers have also been identified.
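The "I'm Spartacus" effect can be made a bit more concrete. Here's a toy sketch (the log is made up, not AOL data, and the function names are my own) that counts, for each query, how many distinct users issued it. Any query issued by fewer than k users leaves nobody to hide behind. The threshold is loosely the idea behind k-anonymity.

```python
from collections import defaultdict

# Toy, made-up search log: (anonymized user id, query). Not AOL data.
log = [
    (1, "weather"), (2, "weather"), (3, "weather"),
    (1, "pizza near me"), (3, "pizza near me"),
    (2, "hypothetical garden club newsletter"),
]

def anonymity_sets(entries):
    """Map each query to the set of distinct users who issued it."""
    users_by_query = defaultdict(set)
    for user, query in entries:
        users_by_query[query].add(user)
    return users_by_query

def identifying_queries(entries, k=2):
    """Queries shared by fewer than k users: the ones where nobody
    else can plausibly say 'I'm Spartacus'."""
    return {q: users for q, users in anonymity_sets(entries).items() if len(users) < k}

print(identifying_queries(log))
```

Note that "weather" protects everyone and "pizza near me" still leaves two candidates, but a single oddly specific search singles out user 2, even though user 2 is only a number.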

At least one searcher, User 927, became notorious even without being identified, owing to a particularly disturbing search history, and is now the inspiration for a play of the same name. This was the other news item that led me to revisit the AOL fiasco. I haven't seen the play and doubt I will, just as I doubt User 927 will be laying claim to any of the royalties.

Naturally, AOL tried to put the genie back in the bottle, and naturally it failed. The raw data is available on several sites -- you can search for them, of course -- and at least one site lets you search the searches online. I wonder if they log that.


[The domain name for the original link for the play seems to have turned over since this was written.  The link I gave now points at a banking site somewhere in Scandinavia.  I've updated to an Ars Technica article on the play -- D.H. Sep 2018]

Wednesday, June 11, 2008

More on cell phones as tracking devices

It was this BBC piece on a recent study at Northeastern University that set me musing about tracking via cell phone.

The article is sort of a roller coaster ride of "yikes!":

It would be wonderful if every [mobile] carrier could give universities access to their data because it's so rich

The researchers said they were 'not at liberty' to disclose where the information had been collected.

... giving way to "that's not so bad":

[S]teps had been taken to guarantee the participants' anonymity

[W]e only know the coordinates of the tower routing the communication, hence a user's location is not known within a tower's service area

... and the occasional "hmm ...":

Nokia have put forward an idea to attach sensors to phones that could report back on air quality. The project would allow a large location-specific database to be built very quickly.

Ofcom is also planning to use mobiles to collect data about the quality of wi-fi connections around the UK.

Evidently the business of attaching interesting sensors to cell phones is expected to boom in the next few years.

The real punchline, though, was the unsurprising conclusion that most people's daily activities are pretty boring: "The study concludes that humans are creatures of habit, mostly visiting the same few spots time and time again. Most people also move less than 10km on a regular basis[.]" Even those that travel further still tend to visit a small number of places repeatedly.
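For what it's worth, one standard way to quantify how far a person typically ranges is the radius of gyration: the root-mean-square distance of their visits from the centre of those visits. I don't know exactly which measures this study used, so take this as an illustration only, on made-up positions in kilometres, alongside an equally simple "creature of habit" measure: the share of visits that go to the few most-visited places.

```python
import math
from collections import Counter

# Hypothetical visits for one person, as (x, y) positions in km on a local grid.
visits = [(0, 0), (0, 0), (5, 1), (0, 0), (5, 1), (2, 3), (0, 0)]

def radius_of_gyration(points):
    """Root-mean-square distance of the visits from their centre of mass, in km."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def top_k_share(points, k=2):
    """Fraction of all visits that go to the k most-visited places."""
    counts = Counter(points)
    return sum(c for _, c in counts.most_common(k)) / len(points)

print(f"radius of gyration: {radius_of_gyration(visits):.1f} km")
print(f"share of visits to top 2 places: {top_k_share(visits):.2f}")
```

A person who shuttles between home and work has a small radius and a top-2 share close to 1, which is exactly the boring pattern the study describes.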

It's natural to be concerned about the ever-increasing speed of communication, and the prospect that at some point everyone might have access to everything known about everyone. But on the other hand it's comforting to know that one's own activities are probably too boring for most people to care about.

Monday, June 9, 2008

It's 2:00 am. Does your cell phone know where you are?

As I understand it, cell phones are called cell phones because the area of coverage is divided into (generally overlapping) "cells", each covered by a given tower. This means that if you're connected to the network, the provider will be able to tell, at a minimum, which cells you're in. By looking at signal strength from the towers involved, one can get a much more accurate estimate. And as if that's not enough -- and apparently it isn't -- GPS is becoming a standard feature.
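One simple way to turn signal strengths into a position estimate is a weighted centroid: average the known tower positions, weighting each by how strongly it's received. This is an illustration under my own assumptions (made-up tower positions in km and readings in dBm), not a description of what carriers actually do; real systems can also use timing information and, of course, GPS.

```python
# Hypothetical towers: position (x, y) in km and received signal strength in dBm.
towers = [((0.0, 0.0), -60.0), ((2.0, 0.0), -80.0), ((0.0, 2.0), -80.0)]

def dbm_to_mw(dbm):
    """Convert a dBm reading to linear power in milliwatts."""
    return 10 ** (dbm / 10)

def weighted_centroid(readings):
    """Estimate position as the tower positions averaged with weights
    proportional to received power (linear mW, not dBm)."""
    total = sum(dbm_to_mw(s) for _, s in readings)
    x = sum(p[0] * dbm_to_mw(s) for p, s in readings) / total
    y = sum(p[1] * dbm_to_mw(s) for p, s in readings) / total
    return x, y

# Lands very close to the strongest tower at (0, 0).
print(weighted_centroid(towers))
```

Weighting by linear power lets a strong nearby tower dominate; with equal readings the estimate falls halfway between towers, which is about as much as cell-level information alone can tell you.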

Having a precise, accurate location device on hand at all times can be handy and in some cases even life-saving. On the other hand, having an unobtrusive tracking device on one's person at all times raises some obvious privacy issues.

There are two contrasting extreme views on this sort of thing. The Utopian view plays up the "never lost" and "find a restaurant" features and goes on to argue that a world where everybody can locate everybody else is a fundamentally Good Thing.

The dystopian view plays up the privacy concerns and argues that The Man wants to know where you are and, further, wants to make it nearly impossible to live without your personal tracker.

Naturally, I don't subscribe to either extreme view. I'm not really excited by the idea of a service that alerts me if a friend happens to wander into my vicinity (or vice versa), but neither do I see the whole thing as a step down the slippery slope towards Big Brother. I am a bit concerned that it's easy to forget, or never really realize, how locatable you are when you carry a cell phone, but that problem has been around for a while now.

On balance I see it as technology taking yet another incremental step and life going on more or less as usual.

The web as distinct from its applications

In a fairly interesting article pondering what the next big platform, or platforms, might be, Josh Quittner parenthesizes:
(Yes, the Web is nothing more than a big layer of code; all those websites we visit are merely applications that sit atop it.)
Now, I think I get what angle this is coming from. I've argued myself that you don't interact with "the web" but with a web application. Even so, I don't think the picture above is quite right, or even quite consistent. The web applications are indeed a layer of code. But if they sit on top of the web itself, then what is the web itself?

Muddying the waters a bit is the recent swing towards fatter clients, represented by the AJAX head of the Web 2.0 hydra, but for my purposes here it doesn't much matter where the code is sitting, whether in the browser or at the other end of the connection. Wherever the code is running, there's you, there are "all those websites" and either
  • Nothing else, in which case the applications aren't a layer on top of the web, they are the web.
  • Something else, in which case what?
I'm reasonably comfortable with either view. Either one can be made to fit my earlier working definition of the web as "all resources accessible on the net".

That definition is deliberately vague on what a resource is. If by resource you mean "web application", then you have the "nothing else" view. This works as long as you include the application's data with the application. That's reasonable, and good if you have an "active data" point of view.

On the other hand, the original web was (largely but not completely) about hypertext documents linked to each other, and the modern web is still very much about documents (or other data sets) and links between them. Much of the modern machinery is about either finding documents more effectively or about presenting the findings in a more useful or interactive way.

Following this line of reasoning a bit, you can use different applications to get at the same resources, and the same application to get at different resources. If the resources are the web, then the applications stand in an M:N relation to them, certainly not 1:1, and are thus clearly a different thing.

That's not to say that a resource can't itself be an application, say an annoying popup-filled multimedia experience. Rather, a resource isn't always an application, an application that accesses the web isn't necessarily a web resource, and the web does not appear to be a big layer of code with websites sitting on top of it.

Tuesday, June 3, 2008

Filters through a personal datastore lens

Returning to the theme of personal datastores ...

Services like The Filter consist of several parts:
  • A social networking component. Who knows whom? Everybody and their dog has this now, which is unfortunate, since each then has its own slightly different copy.
  • A database of things you've done through the service. Music services, for example, have a record of what you've bought and perhaps of what you play. Your email service, whether it's web-based or not, has a record of whom you've emailed and who's in your address book. Unlike the case of social networks, different kinds of services have different kinds of associated data, but for a particular kind of data, each service still has its own slightly different copy.
  • An engine that pulls the rest of the data together and tries to give you something of value from it. This is the secret sauce. Different music services will have different ways of recommending music (and different music collections to draw from).
One of the core tenets of personal datastores, as I understand the concept, is that data about you should be centered around you and you should control it. In the current "data fiefdom" model, every service has its own record of whom you know and what you've done. That's of value to them, of course, so they treat it as their data, but this means that, leaving aside the obvious privacy concerns, each service has a different, incomplete view of your world. Such an incomplete view is less valuable to the service provider than a complete one would be.

On the other hand, it's about you, so why shouldn't you control it? If you control data about your history and connections, you benefit because you control access to it and because all your data is in one place. In such a world, if you subscribe to a service that wants to know about your listening habits, you give it permission to see your music choices -- and nothing else. Then when you listen to a song using whatever means you like, the recommendation service knows about it. You don't have to use their web site or do things their way. If you switch services, you don't have to somehow move your history from the old service to the new one.

Whether the service providers would go for such a scheme remains to be seen. On the one hand, it lets them give better service, because they have better information and can get to it easily and uniformly. On the other hand, it eliminates a form of "vendor lock-in". If you can easily switch from their service to someone else's, maybe you will. This is a problem for established players, but a good thing for upstarts.

If you buy the premise that data about connections and history should move from the services to the person using the service (or more likely a datastore provider acting on that person's behalf), then all that's left is the engine. How would this work?
  • I launch a new service promising to recommend, say, lawn care products based on the music you listen to. Hey, stranger things have happened.
  • If you want to subscribe, you give me access to your music database (or, if you don't want me to know about your secret obsession with accordion music, you give me access to a sanitized view of it). This access would typically include a feed of updates, so that if you bought a new song I would know about it. My engine crunches that data and gives you recommendations.
  • You might also choose to give me access to your "casual acquaintances" data. In that case, if an acquaintance also joins and also grants the necessary permissions, my engine will know about it and (perhaps) make better recommendations to both of you.
Technically, this seems pretty nice. Each party deals with what it's most equipped to deal with. You maintain your personal data, I run my engine. There's only one copy of the personal data. You can give out or withhold permission as you see fit.
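Here's a minimal sketch of that arrangement. All of the names are hypothetical: the datastore keeps the only copy of the data, a service sees one category through an optional filter (the "sanitized view"), and new records are pushed to it as a feed. It leaves out everything hard, such as authentication, storage and how the datastore provider itself gets trusted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Grant:
    category: str                    # e.g. "music"
    view: Callable[[dict], bool]     # filter: which records the service may see
    notify: Callable[[dict], None]   # the service's update-feed callback

@dataclass
class PersonalDatastore:
    records: Dict[str, List[dict]] = field(default_factory=dict)
    grants: Dict[str, Grant] = field(default_factory=dict)

    def grant(self, service, category, notify, view=lambda r: True):
        """Give a service access to one category, optionally filtered."""
        self.grants[service] = Grant(category, view, notify)

    def revoke(self, service):
        self.grants.pop(service, None)

    def read(self, service):
        """Return only the records this service is allowed to see."""
        g = self.grants.get(service)
        if g is None:
            raise PermissionError(f"{service} has no access")
        return [r for r in self.records.get(g.category, []) if g.view(r)]

    def add(self, category, record):
        """Store a record once and push it to every service entitled to it."""
        self.records.setdefault(category, []).append(record)
        for g in self.grants.values():
            if g.category == category and g.view(record):
                g.notify(record)

# Hypothetical lawn-care recommender that must not learn about the accordions.
store = PersonalDatastore()
seen = []
store.grant("lawn-recs", "music", seen.append, view=lambda r: r["genre"] != "accordion")
store.add("music", {"title": "Song A", "genre": "rock"})
store.add("music", {"title": "Polka B", "genre": "accordion"})
store.add("email", {"to": "friend"})
print(seen)  # only Song A reaches the service
```

Switching services then amounts to revoking one grant and issuing another; the history never moves because it never left.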

As I've said before, with more or less cynicism, it's not clear how or whether we get to such a world from here, but it does seem like a pretty sensible world.

Peter Gabriel adds a wrinkle or two (sorry, couldn't resist)

Actually, despite the somewhat dismissive tone of my last post, there are a couple of interesting wrinkles to The Filter.

First is that it does away with user ratings. It doesn't ask you to give a number of stars or whatever to a selection. Instead, it notes what you do with it. If you keep purchasing something by an artist, it concludes that you must like that artist.

In other words, actions speak louder than words.

The Filter also weights more recent actions more heavily than older ones, sort of a "what have you done for me lately?" approach. Your rave about your favorite boy band is not going to come back and haunt you in your thirties (unless you're still into them and buy the reunion DVD, of course).
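Both ideas, inferring preference from actions and favoring recent ones, fit in a few lines. This is my own sketch, not how The Filter actually works: the action weights and the 180-day half-life are made-up assumptions, and each action's contribution decays exponentially with its age, the same idea behind an exponential moving average.

```python
import math
from collections import defaultdict

# Hypothetical action weights: how strongly each kind of action suggests liking.
ACTION_WEIGHTS = {"purchase": 3.0, "play": 1.0, "skip": -0.5}
HALF_LIFE_DAYS = 180.0  # assumed: an action's weight halves every ~6 months

def artist_scores(actions, now_day):
    """actions: iterable of (day, artist, action). Returns artist -> score,
    with each action's weight decayed exponentially by its age in days."""
    decay = math.log(2) / HALF_LIFE_DAYS
    scores = defaultdict(float)
    for day, artist, action in actions:
        age = now_day - day
        scores[artist] += ACTION_WEIGHTS[action] * math.exp(-decay * age)
    return dict(scores)

history = [
    (0, "Boy Band", "purchase"), (10, "Boy Band", "play"),  # teenage years
    (5000, "Jazz Trio", "purchase"), (5100, "Jazz Trio", "play"),
]
print(artist_scores(history, now_day=5200))
```

Fourteen years on, the boy band's score has decayed to effectively nothing, while a reunion-DVD purchase today would put it right back near the top.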

Ratings have a few theoretical weaknesses. One is how to normalize the scores. I might tend to reserve a five-star rating for that rare near-perfect item, while you might tend to give it to everything you like. Another is that either of our ratings of an item might change after the initial enthusiasm or reluctance wears off, but most of us won't bother to change a rating except perhaps in extreme cases. A third is that many of us rarely bother to assign a rating in the first place.
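The normalization problem, at least, has a standard partial fix: judge each rating against the rater's own habits. The sketch below (made-up ratings, hypothetical names) converts each user's stars to z-scores, so a five from a stingy rater counts for more than a five from a generous one. It does nothing about ratings that go stale or never get entered.

```python
from statistics import mean, pstdev

# Made-up star ratings from two raters with very different habits.
ratings = {
    "stingy": {"A": 5, "B": 3, "C": 2, "D": 2},
    "generous": {"A": 5, "B": 5, "C": 5, "D": 4},
}

def normalize(user_ratings):
    """Convert one user's star ratings to z-scores: how far each rating sits
    above or below that user's own average, in units of their own spread."""
    values = list(user_ratings.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Everything rated the same: no information about relative preference.
        return {item: 0.0 for item in user_ratings}
    return {item: (r - mu) / sigma for item, r in user_ratings.items()}

for user, r in ratings.items():
    print(user, {item: round(z, 2) for item, z in normalize(r).items()})
```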

One of the tenets of the whole "wisdom of crowds" approach is that, given enough data points, such discrepancies will tend to even out. Fair enough, but that's true whether the raw data comes from people's recommendations or from their mouse clicks. If ratings are redundant or even inconsistent with more accurate indicators of what people are thinking, then they may well just be in the way.

The only way to know for sure is to try the experiment. Either way The Filter remains a crowd-based service. As far as I can tell you're not getting PG's picks, per se, or anybody's in particular.

I still don't think any of this is particularly novel. As I recall, the various music-playing apps track how much you play a song and can infer a rating from it, and weighting recent activity more heavily is an old idea (compare exponential moving averages in technical trading, for example). On the other hand, not everything has to be a breakthrough. Much -- some would argue all -- progress is made by incremental tinkering and trying existing ideas in slightly new combinations.

Monday, June 2, 2008

Another old rocker on the web

Neil Young isn't the only aging rocker trying to establish a presence on the web. Well, most of them probably are, really, but Peter Gabriel in particular seems to be at it. Well, he's been at it for a while, what with OD2 and Real World but ...

OK, where was the news item here? Apart from PG's servers getting nicked? Ah yes ...

PG is about to launch The Filter, a web-based recommendation service that seems quite a bit like other recommendation services, except, well, PG's behind it. The site does aim to be fairly broad in scope, including not just music but movies, web videos and such, and aiming eventually to include features like restaurant recommendations for tourists. Good stuff if you're a PG fan, but probably not so web-shaking in the larger scheme of things.

BTW, I actually like PG's stuff, by and large, and his efforts in distributing music are interesting ... just having second thoughts about the notability of this particular item. But the major news services had no such doubts, and (as someone else of that vintage said) who am I to disagree?

Netflix Roku box on order

I've gone ahead and ordered Netflix's set-top box, made by Roku. Due to high demand, it'll be a couple of weeks before it arrives, but I'll post a review when I've had a chance to play with it [as promised, the review is here].

Flashing your virtual headlights

It's not as common a practice as it once was, but if you drive in the US, you've probably heard of the custom of flashing your headlights to warn oncoming traffic of a speed trap you've just passed.

Now you can do the equivalent on the web, using your cell phone. Once you've set up Trapster, you can speed dial a phone number (without taking your eyes off the road, assuming you're good at speed-dialing) to report a speed trap you drive past. Other Trapster-aware drivers will then be alerted as they approach the trap. You can also go online (while parked, presumably) to see speed traps in your area.
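The alerting side presumably boils down to a proximity check. Here's a sketch under my own assumptions, since I don't know how Trapster actually does it: reported traps are stored as latitude/longitude pairs (the coordinates below are arbitrary), and a driver is alerted to any trap within a fixed radius, using the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def traps_to_announce(driver, reported_traps, radius_km=1.0):
    """Return the reported traps within radius_km of the driver's position."""
    lat, lon = driver
    return [t for t in reported_traps if haversine_km(lat, lon, *t) <= radius_km]

traps = [(37.7749, -122.4194), (37.8044, -122.2712)]  # arbitrary report positions
print(traps_to_announce((37.7760, -122.4170), traps))
```

A real system would also want to know which way the driver is heading, so as to warn only about traps ahead rather than ones already passed.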

Is it legal? It seems like it ought to be, particularly if it's a passenger and not the driver playing with the phone. It's not clear that The Man minds all that much either, the question being whether The Man is more interested in getting you to slow down or in handing out tickets.