Friday, May 23, 2008

Steampunk writ large

It is said that sometime century-before-last, one Alexander Stanhope St. George had the notion of boring a tunnel from London to New York, affixing rather large lenses to either end and, with the aid of mirrors, allowing passersby in the two cities to see each other as though they were face to face. It is also said that many of the workers who constructed the tunnel hailed from Liverpool.

And sure enough, if you go to the South Bank in London or the Fulton Ferry Landing in Brooklyn, there it is: an imposing Victorian-looking metal tube angling out of the ground, and sure enough, you can look into the lens and see across the Atlantic. The New York Times has the full details; Auntie has a briefer, more bemused take. There's even a web page for making appointments to meet via Telectroscope, as the device is known.

For my money, the brilliant part of the concept is that there's no sound. The creator, Paul St. George, explains that if there were sound, people would likely just line up and use it as a telephone, but without sound, they actually have to move around and interact visually (or at least scrawl messages on a whiteboard).

I'll leave it as an exercise for the reader to decide whether there is actually a purpose-built tunnel connecting the ends of the Telectroscope together.

Crowdsourcing crime statistics

A while ago I ran across a dispute concerning Sitefinder. Sitefinder provides access to a database of cell tower locations. Unfortunately, the database is incomplete, as not all providers have agreed to provide data for it. In the original post, I suggested this was a job for crowdsourcing, though I don't know whether anything of the sort actually happened.

However, someone has put into practice the general concept of crowdsourcing a parallel database when the official information is not readily available. Vasco Furtado's site, wikicrimes.org, uses pushpins on a Google map to chart crime in Brazil. Anyone can add a pushpin, or confirm (or disconfirm) a crime already reported.

Judging from the site, either they haven't hit critical mass yet or Brazil's crime rate is exceptionally low. Even if there were more data points, one would have to take it with a grain of salt, if only because some areas may happen to have more wikicrimes contributors and different areas have different rates of internet penetration. Of course, one should take any statistics, official or otherwise, with a grain of salt. At the very least, it's an interesting experiment.

Wednesday, May 21, 2008

Netflix, content and pipes

Previously, I pondered what cable companies would make of Netflix's new set-top box [reviewed here], which would seem to compete with existing on-demand video services, but also drive up demand for broadband internet. Hardly was the virtual ink dry before I ran across this item: Time Warner, under pressure from shareholders, is spinning off its cable business and is now, according to one analyst, "headed toward being a pure content company."

So, at a guess, and keeping in mind that besides not being a lawyer I'm also not a financial analyst, this might make everyone happy. Time Warner gets to distribute its content however it likes, including via Netflix if that makes sense. Netflix sells boxes and rents movies, and the cable arm should do OK as long as it can sneak any money it loses in on-demand rentals into higher prices for its broadband service -- perhaps by making sure that the "regular" broadband service doesn't carry Netflix's movies quite as well as the "premium" version.

The new black box in town

Netflix has finally announced its long-rumored set-top box [reviewed here]. The box, made by Roku, is $99 and the service is $9 a month (it's included with any Netflix subscription except the $5 entry-level special). There are a couple of catches:
  • Obviously, you'll need broadband. The service promises "near DVD-quality", that is, somewhere around a megabit. I don't know if they can send you HD if you've got the bandwidth for it, but if you live in the States, you probably don't, so the question is moot.
  • The selection is limited. In particular, the studios are still leery of putting out their latest releases on the internet, so you won't get those. In all, about 10,000 titles are available, compared to the 100,000 they offer by mail.
Hmm ... anyone remember when movie theaters had one (count it, one) screen and there were three TV networks plus the odd UHF station playing Our Gang, The Munsters and Laurel and Hardy re-runs? Now 10,000 titles is "limited". Progress, I suppose.

"Broadband" in the States won't support live HD, but if the box had enough storage on board you could get HD sort-of-on-demand. Ask for your selection when you leave for work and it'll be there when you get home, probably, assuming no one else wants to do much with the internet connection during the day. Faster than waiting for a DVD in the mail, but interestingly, not a lot faster.
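For the curious, here is the back-of-the-envelope arithmetic behind "when you get home, probably". This is only a sketch: the 8 GB file size and the 1.5 megabit sustained rate are my own illustrative guesses, not figures from Netflix or Roku, and the function name is made up for the occasion.

```python
def download_hours(size_gigabytes, megabits_per_second):
    """Hours needed to pull a file of the given size over a steady link."""
    bits = size_gigabytes * 8e9                    # decimal gigabytes -> bits
    seconds = bits / (megabits_per_second * 1e6)   # megabits/s -> bits/s
    return seconds / 3600


# Assumed numbers, for illustration only: an 8 GB HD movie at 1.5 Mbit/s.
hd_hours = download_hours(8, 1.5)
print(f"{hd_hours:.1f} hours")
```

With those guesses the answer comes out a little under twelve hours, which is a long workday plus a commute, and that's with the connection otherwise idle.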

I would be surprised, though, if the box had anywhere near enough storage on board for a full HD movie. My guess is it's enough to buffer a few minutes in case of net.hiccups, and Wikipedia's entry on Roku (the manufacturer) seems to support this.

Even at near-DVD resolution and a "limited" selection, I'm expecting the things to sell like hotcakes. For about the price of a premium movie channel package you get orders of magnitude more selection with comparable picture quality (if the quality isn't at least comparable, the whole thing will be dead on arrival). If they do sell, the projected video flood of the internet comes one sizable step closer.

In the case of households with cable, a chunk of video traffic that now goes over the provider's system and then down the cable to the TV will shift to going over the backbone to the cable provider and thence to the TV. In other words, the net shift is from the provider's system onto the backbone. The obvious solution is to put Netflix's servers at the upstream end of the cable (but see this post for a slightly different take).

Whether this happens depends on how the cable companies feel about their newfound "co-opetition". If you're in the bandwidth business, you love Netflix's box. If you're in the content business, maybe not so much.

Tuesday, May 20, 2008

An amusing broken link

How many recently-coined words have "google" in them? There's "googlewhack" for example, and one I just learned: "googlejack". To googlejack someone is to mimic their page when the visitor is a web crawler, but to redirect to your own page when the visitor appears to be an ordinary visitor. The result is that you get to use your victim's popularity (and thus their page rank) to promote your page. Very rude.

Now here's the fun part. The Wikipedia page I found this on mentions a site, googlejacking.org, which it says keeps track of instances of googlejacking with the idea of making Google aware of them.

The site appears to be parked at the moment, full of random ads ...

Note: For what I hope are obvious reasons, the link to Wikipedia above is a permalink to the revision current at this writing.

[Continuing the theme ... the original link for googlewhack.com now redirects to a Scandinavian SEO outfit, so I changed it to the Wikipedia link for the topic -- D.H. Dec 2018]

What a concept. Or rather, what's a concept?

One theme I've had kicking around in my head for a while, and may yet write up in earnest, is the concept of "dumb is smarter". The idea is that you can often do better by giving up on the idea of "understanding" the problem you're solving and using a blatant hack. For example, Google relies on page rank -- the way a page is connected to other pages -- rather than any abstract understanding of a document, to decide what hits are likely to be "relevant".

The technique of Latent Semantic Analysis is an interesting case. LSA attempts to solve some of the well-known problems with searching based on words alone, particularly synonymy and polysemy.

Synonymy -- different words meaning the same thing -- means you can ask for "house" and miss pages that only say "home". Worse, you don't know what you're missing since, well, you missed it.

Polysemy -- the same word meaning different things -- means you can ask for "house" and get pages on the U.S. House of Representatives when you wanted real estate. This is probably more of an annoyance, particularly since you probably want the more popular sense of a word and not the one that that sense is drowning out.

LSA tries to mitigate these problems by starting with information on what words appear in what documents, then applying a little linear algebra to reduce the number of dimensions involved.

This means that, for example, instead of keeping separate scores for "house", "home" and "senate", there might be one combined score for "house" and "home" and another one for "house" and "senate". A document that contains "house" and "home" but not "senate" would be rated differently from one that contains "house" and "senate" but not "home", which is just the kind of thing we're looking for.
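Here's a toy version of the idea, using NumPy's singular value decomposition as the "little linear algebra". Everything in it is made up for illustration: the four-document corpus, the choice of two dimensions, and the helper names. Note that the combined scores that fall out of the math need not line up neatly with "house plus home" and "house plus senate"; they're whatever directions best summarize the counts.

```python
import numpy as np

TERMS = ["house", "home", "senate"]

# Made-up corpus: one row per document, one column per term in TERMS.
DOCS = np.array([
    [1, 1, 0],  # mentions "house" and "home"
    [0, 1, 0],  # mentions "home" only
    [1, 0, 1],  # mentions "house" and "senate"
    [0, 0, 1],  # mentions "senate" only
], dtype=float)


def concept_axes(doc_term, k):
    """Top-k right singular vectors, one column per combined score."""
    _, _, vt = np.linalg.svd(doc_term, full_matrices=False)
    return vt[:k].T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


AXES = concept_axes(DOCS, k=2)  # three word scores squeezed into two


def to_concepts(term_counts):
    """Map raw word counts into the reduced space."""
    return np.asarray(term_counts, dtype=float) @ AXES


house_only, home_only = [1, 0, 0], [0, 1, 0]
print("word space:   ", cosine(house_only, home_only))
print("reduced space:", cosine(to_concepts(house_only), to_concepts(home_only)))
```

On raw word counts, a query for "house" and a page that only says "home" have nothing in common. In the reduced space they come out clearly similar, because "house" and "home" kept company elsewhere in the corpus. Likewise, the "home"-only page lands much closer to the "house" and "home" page than to the "house" and "senate" one.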

This combined system is called "concept space". Does it deserve the name?

On the one hand, yes, because intuitively it reflects the idea that "house" and "home" can represent the same, or at least related concepts, and because it seems to do fairly well empirically in mimicking how people actually rate documents as "similar" or "different".

On the other hand, clearly no, since all we're doing is counting words and doing a little math, and also because the "concept space" can include combinations that don't have much to do with each other, but happen to fall out of the particular texts used -- maybe "house" and "eggnog" happen to appear together for whatever reason.

The last would be a case of "correlation doesn't necessarily mean cause", and the interesting thing here is that LSA seems to do a decent job of emulating faulty human reasoning. People make that particular mistake all the time, too. As always, one must distinguish "human-like" from "intelligent".

Wednesday, May 14, 2008

I didn't say it and if I did I didn't mean it!

Here's an unsettling fact of life: According to an article in The Economist, there's been a rash of libel suits in the English courts. Why should those of us outside England care? Because you can be publishing in England, and subject to English libel laws, without even knowing it.

This isn't a strictly English thing. The article points out a case where a celebrity embarrassed by articles in one of the English tabloids (and what celebrity hasn't been?) sued in France in an unsuccessful bid to take advantage of its tighter privacy laws. However, England's libel standard, that it's up to the defendant to justify a "defamatory" statement, and the cost of defending a case there ($200,000 if you win, much more if you lose) have made it particularly attractive.

What do you have to do to be considered to have published in England? Not much, it seems. In one case, a book published in America happened to have sold a handful of copies in England. In another case, a foreign-language website hosted outside the UK* was sued by a Ukrainian citizen.

As far as I can make out, the criterion is that someone in England has read what you wrote, not that you did anything in particular to try to make that happen, or that said person was in any way offended by what you wrote or even believed it.

According to the usage statistics, this blog is (occasionally) read in England so first, thanks for that and second, if I somehow said something that defamed someone, anywhere, ever, I really didn't mean it and I'm very very sorry. Honest, guv!

* at least, that looks like Cyrillic to me and the domain is registered via a German company; you be the judge

Tuesday, May 13, 2008

More on Blu-Ray

Kris made a couple of good points in a comment on "Two things I didn't know about Blu-Ray".

First, if the content is songs recorded in the '60s, it doesn't matter how many bits your medium can hold. Those old analog tapes are still going to sound the same, and if you're a fan you probably already have them anyway.

Now in Neil's case, that's not what he's selling. He's selling new material (to us, not him) in a snazzy presentation that will allow you to do things like ponder lyrics and photographs while listening to the digitally scrubbed tape hiss on Mr. Soul or whatever. That's not enough for me personally to take the plunge, but for some folks it will be.

The second and more fundamental point is that, assuming the DRM stuff works, a Blu-Ray disc, even a "live" one, is a closed medium. You can only play it on a Blu-Ray player, not on your car stereo or your portable music player, or even on the upstairs TV unless you get another player. That's a feature to whoever's making the players, but a bug to the rest of us.

In the original post I was a bit too vague in calling that a "wrinkle". The other wrinkle, that modifications can be tied to a particular player, is also hard to see as a feature.

If you can't make your own copy, you're also stuck if you should lose or damage the original. The manufacturers know this and have developed a special scratch-resistant coating for the new discs, but however you coat it, having one copy is no match for being able to make backups. We've seen this movie a couple of times already, enough to have a pretty good idea how it ends.

Thursday, May 8, 2008

Connect the dots and the world will follow

Musing about connectivity and intelligence put me in mind of one of my favorite branches of mathematics (yep, I'm a geek -- that's "favorite branches of mathematics", plural): graph theory.

For those unfamiliar with the game, here's how it works. Take a bunch of dots and connect them with lines. The lines don't have to be straight, and they can cross each other. You can move the dots around however you like to make the picture clearer. All that matters is what's connected to what. If dot A is connected to dot B, it doesn't matter here how many lines you drew between the two, so let's just say there's never more than one. From this simple setup come many deep and interesting results (interesting to a math geek, at least).

Suppose you can connect each dot to at most one other dot (if you can't connect them at all, you've just got dots). In that case, you'll always end up with some number (maybe zero) of loose dots, and some number (maybe zero) of pairs of dots connected by a line.

Suppose you can connect each dot to at most two other dots. Then (after maybe re-arranging to get a clearer picture) you'll get three different things: loose dots, open chains of two or more dots (a connected pair being the shortest) and rings of three or more dots.
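If you'd rather let a machine do the re-arranging, here's a small sketch that sorts a two-lines-per-dot picture into its pieces. With at most two lines per dot, each connected piece is a loose dot, an open chain (a connected pair being the shortest chain) or a closed ring. The function name and the sample dots are made up for illustration, and it assumes no dot is connected to itself.

```python
from collections import defaultdict


def classify_components(dots, lines):
    """Label each connected piece as 'loose dot', 'chain' or 'ring'."""
    neighbors = defaultdict(set)
    for a, b in lines:
        neighbors[a].add(b)
        neighbors[b].add(a)
    if any(len(neighbors[d]) > 2 for d in dots):
        raise ValueError("some dot has more than two lines")

    seen, shapes = set(), []
    for start in dots:
        if start in seen:
            continue
        # Gather every dot reachable from this one.
        piece, stack = [], [start]
        seen.add(start)
        while stack:
            dot = stack.pop()
            piece.append(dot)
            for other in neighbors[dot]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        if len(piece) == 1:
            shapes.append("loose dot")
        elif all(len(neighbors[d]) == 2 for d in piece):
            shapes.append("ring")   # every dot has two lines: it closes up
        else:
            shapes.append("chain")  # two end dots have only one line each
    return shapes


dots = list("abcdefghi")
lines = [("b", "c"), ("d", "e"), ("e", "f"), ("g", "h"), ("h", "i"), ("i", "g")]
print(classify_components(dots, lines))
```

The sample picture has one loose dot (a), a pair (b-c), a three-dot chain (d-e-f) and a triangle (g-h-i), and that's all the variety this level of connectivity has to offer.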

Now suppose you can connect each dot to at most three other dots. There are now infinitely many different possibilities, almost all of which are completely unknown except in a broad statistical sense. Even telling if two arrangements are the same or different is (in general) an intractable problem. Or at least we're pretty sure it is.

What if you can connect each dot to up to four (or five, or a million) different dots? Have you gained much? Not really. You can take any dot with more than three lines out and replace it with a ring of dots, each with one line out to the rest of the world.
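The ring trick is easy enough to spell out in code. This is just a sketch with made-up names: lines are pairs of dots, and each new ring dot is labeled with the old dot's name plus a number.

```python
from collections import Counter


def degrees(lines):
    """Count how many lines touch each dot."""
    count = Counter()
    for a, b in lines:
        count[a] += 1
        count[b] += 1
    return count


def replace_with_ring(lines, dot):
    """Swap `dot` for a ring of new dots, one per line that touched it."""
    touching = [line for line in lines if dot in line]
    new_lines = [line for line in lines if dot not in line]
    ring = [(dot, i) for i in range(len(touching))]  # hypothetical labels
    # Each ring dot takes over exactly one of the old dot's lines...
    for ring_dot, (a, b) in zip(ring, touching):
        outside = b if a == dot else a
        new_lines.append((ring_dot, outside))
    # ...and is joined to its two neighbors around the ring.
    for i, ring_dot in enumerate(ring):
        new_lines.append((ring_dot, ring[(i + 1) % len(ring)]))
    return new_lines


spokes = [("hub", name) for name in "abcde"]  # one dot with five lines out
rewired = replace_with_ring(spokes, "hub")
print(max(degrees(rewired).values()))
```

Each ring dot ends up with exactly three lines (two around the ring, one out to the world), so a five-line hub becomes five three-line dots and nothing in the picture needs more than three.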

In short, there are effectively only four levels of connectivity: zero (boring), one (boring), two (pretty boring) and three or more (infinitely complex). How connected the world is really only matters at one critical point.

Wednesday, May 7, 2008

Two things I didn't know about Blu-Ray

Thing one is that Neil Young considers Blu-Ray™ to have provided the long-sought suitable medium for his archive project. Me and Neil, we go way back. Well, he wouldn't know me from Adam, but I spent many a college (and post-college) hour listening to his stuff. Even saw him live once. Good times. Good times.

Thing two, from the same article, and since I'm a stick-in-the-mud this is probably just news to me and not to the rest of the world, is the "live" aspect. In a pretty slick move, Sun managed to make Java support part of the standard for Blu-Ray players. The result is called BD-J. This is good for Sun, obviously, but also good for anyone who likes "interactive" content.

DVDs have a fairly crude mechanism for letting you navigate menus and play simple games (which basically boil down to navigating cleverly disguised menus). With Blu-Ray, there's a real programming environment under the hood. Expect this to go mostly unused, but also expect a few people to do seriously cool things with it. Neil and the gang certainly look intent on getting some good out of it.

Further, a "BD-live" player is required to include (at least) a gigabyte of local storage, which the BD-J platform is able to divvy up securely so that each disc has its own private piece of storage associated with it. This means that an appropriately authored disc can effectively be updated, even if the disc itself is read-only. A writable disc can be updated after the fact, which is the way to go if you want more than a parcel of the player's gigabyte to work with.

OK, so what's so special about a disc that can be updated? If I download something to my hard drive, then download an update, I've now got an updated version because, well, that's what hard drives do. There are a couple of interesting wrinkles in the Blu-Ray case:
  • You can either update the disc, in which case the update travels with the disc, or (to a lesser extent) update the player, in which case the update stays with the player.
  • Assuming the various DRM measures Blu-Ray uses are effective, the updated content can't be copied anywhere else. Or at least, it's no more copyable than the original content.
The second item leads to a weak form of Vixie's dystopia: It would appear quite possible for a BD-J application to let you add your own content to a disc, say notes on where you were when you first heard a particular song, or comments on the action in a movie. If all the machinery works as advertised, your only access to that content will be through a Blu-Ray player. It's your content, but you don't quite own it the way you normally would.

There ought to be ways around this, for example by making your own copy of any changes you're about to make before actually making them, but it's still an interesting point.

Tuesday, May 6, 2008

Now available in FeedBurner

That is all, but I'll let you know if anything interesting happens.