Tuesday, July 29, 2008

The good driving monitor

A while ago I reported the dilemma of a teenager whose stepfather had insisted on his having a GPS unit installed in his car -- and who stood to get out of a speeding ticket as a result. Now car insurance companies are offering the same dilemma to the market at large. Put a monitoring device in your car, and if it shows you drive carefully they'll reduce your rates. If it doesn't -- and keep in mind that everyone thinks they're an above-average driver -- your rates will go up, though not by as much. Here's the breakdown for one company, in relative terms:
  • If the monitor convinces them you drive the way they like, you pay $0.60
  • If you don't need no stinking monitor, you pay $1.00
  • If the monitor convinces them you drive badly, you pay $1.09
In other words, you're more or less guilty until proven innocent. Not that that's wrong. This is business, not a court of law.
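
To put rough numbers on that trade-off, here is a minimal sketch. It assumes the only thing that matters is your own estimate of the chance that the monitor will judge you a good driver. The function names are mine, not the insurer's; the three rates are the relative ones listed above.

```python
# Relative rates from the breakdown above.
RATE_GOOD = 0.60         # monitored, judged a careful driver
RATE_UNMONITORED = 1.00  # no monitor installed
RATE_BAD = 1.09          # monitored, judged a bad driver


def expected_monitored_rate(p_good: float) -> float:
    """Expected rate with a monitor, if you are judged 'good' with probability p_good."""
    return p_good * RATE_GOOD + (1.0 - p_good) * RATE_BAD


def break_even_probability() -> float:
    """Probability of a 'good' verdict at which monitoring costs the same as declining."""
    return (RATE_BAD - RATE_UNMONITORED) / (RATE_BAD - RATE_GOOD)


if __name__ == "__main__":
    print(f"break-even p = {break_even_probability():.3f}")
    for p in (0.1, 0.5, 0.9):
        print(f"p = {p:.1f}: expected rate {expected_monitored_rate(p):.3f}")
```

Under these assumptions the break-even point is 0.09 / 0.49, a little under one chance in five. Anyone who rates their odds of pleasing the monitor higher than that comes out ahead, on average, by volunteering.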

Privacy advocates, naturally, have a problem with this. The problem is not the monitoring per se, but that the company effectively owns the data. I don't see why it should have to be that way. Suppose you own your driving data. You can choose to sell access to it in return for cheaper insurance, or you can decline, in which case your insurer will presume you have a reason.

And that's the more subtle consequence: You could be a perfectly good driver, but not like the idea of turning that information over to The Man, and end up paying for the privilege. That's actually not the case at the moment. The non-monitored driver currently pays less than the monitored "bad" driver. That seems like an unstable situation, though. If such monitors become widespread, the presumptions change, and in any case the actuarial risk has to show up somewhere. Some possibilities:
  • Require that everyone pay the same rate, no matter what. There's no need to gather driving data, but dangerous drivers pay less at the expense of safer drivers.
  • Prohibit the use of monitor data in setting rates, but allow accidents, speeding tickets and such to count, as they do now (at least in the US).
  • Allow the use of monitor data, but prohibit companies from charging more than X to customers who decline to supply the data. If X is the lowest rate, then no one will volunteer and we're back to the first case. If X is the worst rate, then the non-volunteer rate will come up and/or the worst rate will come down to eliminate the $0.09 difference.
  • Do nothing and see what happens.
In other words (and I should probably throw in the "I'm not an economist" disclaimer here), if this sort of thing catches on it looks like it would be very hard to prevent insurers from -- rationally -- charging non-volunteers the same rate as demonstrably unsafe drivers.

However, that doesn't mean people can't have control over whether they volunteer the information or not. There are at least two ways to do this. One, which I mentioned in a follow-up, is for you to own your own monitor and decide whether to let it yield up its secrets. The other, which would have much the same effect, would be for your monitor to send its data to your personal datastore, whence you could share it out as you saw fit. In either case the monitor needs to be tamper-resistant, but that's a given.

In any case, add this to the list of "Who owns the data?" cases where the initial answer is "a particular private company" but the eventual answer ought to be "you".

Friday, July 25, 2008

Laptop orchestras and such on YouTube

A while ago I mentioned an article on laptop orchestras. I'd meant to check some of them out some time, and the other day I finally got around to browsing through YouTube to see what I could see. Here are a few items that turned up. I won't try to play music critic except to say that the style tends heavily toward "electronic" "modern" music -- more abstract soundscapes than beats and chord progressions.
Interestingly, I didn't find anything for the Worldscape Laptop Orchestra that inspired the original post. However, I did find some performances that, while not involving laptops, were clearly in the same vein of using computers and consumer electronics to make music:
  • Here is the Modified Toy Orchestra playing, well, modified toys. If you dig around, you can find how-tos and links to a whole subculture of "circuit bending".
  • You may have seen the famous iBand clip featuring two iPhones and what looks to me like a DS.
  • And finally, here's turntablist DJ Kentaro jamming with Kenoshita Shinichi on Tsugaru-shamisen.

Wednesday, July 23, 2008

OpenID picks up steam

How many user names and passwords does a person need, anyway?

The first answer that springs to mind is "one". I don't think this is necessarily the right answer, though. I probably don't want to have the same credentials attached to my bank account as to my more frivolous pursuits, just like I don't want to have my safe deposit key open my front door or unlock my bicycle.

On the other hand, one of the annoying facts of web.life is that seemingly every single thing you use wants a user name it can identify you by. And if you give a mouse a user name, he'll probably want a password to go with it ... This gets old quickly.  While "one" is probably not the right answer, "dozens and dozens ... I don't even know how many" probably isn't either.

In practice, of course, most people come up with a few user names and passwords and use them over and over again. That's the equivalent of having the same key fit lots and lots of little locks, and as long as there's not anything too valuable behind any of those locks, it's probably OK. On the other hand, if anyone steals that key, you end up having to change lots and lots of locks.

In the virtual world it's a bit worse, even. While you have the only copy of a physical key, every service you sign up for potentially has a copy of your user name and password. I say "potentially" because there's a standard technique (hashing) for not knowing a user's password, but what are the chances of every single service using it correctly, or at all? [Note that if a site just stores a hash of your password (as it should), it's still possible for an attacker to figure out your password, by guessing and seeing if any guess matches the hash.  If you have a good password this is much, much harder (assuming the site is using a "cryptographically strong" hash), but it's not impossible.]
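
Here is a minimal sketch of that idea using only Python's standard library. It is an illustration, not a vetted password-storage scheme: the function names are mine, and the iteration count is an arbitrary value picked to keep the example fast.

```python
import hashlib
import hmac
import os
from typing import Iterable, Optional, Tuple

ITERATIONS = 10_000  # illustrative work factor only, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so the site never has to store the password itself."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(password: str) -> Tuple[bytes, bytes]:
    """What the site keeps on file: a random salt and the resulting hash."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored)


def guess_password(salt: bytes, stored: bytes, candidates: Iterable[str]) -> Optional[str]:
    """What an attacker with a stolen record can do: try guesses until one matches."""
    for guess in candidates:
        if verify(guess, salt, stored):
            return guess
    return None


if __name__ == "__main__":
    salt, stored = register("python")
    print(verify("python", salt, stored))                         # the real password
    print(guess_password(salt, stored, ["camelot", "python"]))    # a short guess list
```

The attacker's loop is the same as the site's check, just repeated. A weak password falls to a short list of guesses; a good one forces a list too long to be worth trying.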

The correct answer to "how many user names and passwords" is probably "as many as I like, but no more than that," realizing that in some cases you'll need, or at least should need, a completely unique ID whether you like it or not. So how do we do that?

Your typical login goes like this:
  • Who are you?
  • I am Sir Galahad of Camelot
  • What is your quest?
  • I seek the holy grail
  • What is your password?
  • python
  • Hmm ... do we know a "Galahad"? Does the password match our records? Yes? You may pass ...
Now change this just a little:
  • Who are you?
  • I am Sir Galahad of Camelot
  • What is your quest?
  • I seek the holy grail
  • How do I know you're Galahad of Camelot?
  • Ask http://roundtable.ct/galahad
  • Hmm ... http://roundtable.ct/galahad, do you know this "Galahad"? (Galahad deftly turns around, whips out his web-enabled Sword of Righteousness, logs in to roundtable.ct and tells it to accept the bridgekeeper's request for authentication) Yes? Do I trust this roundtable.ct? I suppose so. You may pass ...
Dodgy analogies aside, this is the basic approach behind OpenID [Note: that's openid.net, not .com.] What's happened is that instead of keeping track of names and passwords directly, the bridgekeeper, being OpenID-aware, agrees to take an OpenID provider's word for it. Galahad, for his part, only has to be able to tell his OpenID provider (roundtable.ct in this case) to accept the bridgekeeper's request. The OpenID URL serves as his user name and whatever procedure he uses to log into his OpenID provider -- maybe a password, maybe a smart card, maybe a retina scan or whatever -- is good enough for the bridgekeeper.
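
For the code-minded, here is a toy model of who asks whom. It is emphatically not the OpenID protocol, which involves discovery, browser redirects and signed messages; the class and method names are invented for illustration, and everything happens in one process. It only shows the division of labor: the bridgekeeper never sees a password, just the provider's answer.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple
from urllib.parse import urlparse


@dataclass
class Provider:
    """Toy stand-in for an identity provider such as roundtable.ct (hypothetical API)."""
    host: str
    logged_in: Set[str] = field(default_factory=set)
    approved: Set[Tuple[str, str]] = field(default_factory=set)

    def log_in(self, identity: str) -> None:
        # Password, smart card, retina scan ... that is between user and provider.
        self.logged_in.add(identity)

    def approve(self, identity: str, relying_party: str) -> None:
        # The user tells the provider to accept one particular site's request.
        if identity not in self.logged_in:
            raise PermissionError("log in to the provider first")
        self.approved.add((identity, relying_party))

    def vouches_for(self, identity: str, relying_party: str) -> bool:
        return (identity, relying_party) in self.approved


@dataclass
class RelyingParty:
    """Toy stand-in for the bridgekeeper: a site that accepts identity URLs."""
    name: str
    trusted_hosts: Set[str]

    def admit(self, identity: str, providers: Dict[str, Provider]) -> bool:
        host = urlparse(identity).netloc
        if host not in self.trusted_hosts:  # "Do I trust this roundtable.ct?"
            return False
        provider = providers.get(host)
        return provider is not None and provider.vouches_for(identity, self.name)


if __name__ == "__main__":
    roundtable = Provider("roundtable.ct")
    providers = {roundtable.host: roundtable}
    bridge = RelyingParty("bridgekeeper", {"roundtable.ct"})
    galahad = "http://roundtable.ct/galahad"

    print(bridge.admit(galahad, providers))   # not yet approved
    roundtable.log_in(galahad)
    roundtable.approve(galahad, bridge.name)
    print(bridge.admit(galahad, providers))   # provider now vouches for him
```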

If you like all that, there's still the little question of getting people to use the scheme. This requires two things to happen. One is getting sites to provide OpenIDs. This isn't hard -- more or less anyone can do it. The other problem is to get sites to accept OpenIDs.

Some sites aren't very fussy. A lot of places are more concerned with having some sort of name to track than proving that that name belongs to anyone in particular. They'll let anyone make up a random name and password. The OpenID equivalent would be accepting any URL as an OpenID, so long as it follows the standard.

Other sites want to tie an identity to a given email address. You know the drill: You provide an email address when you register, that email address gets an email with a magic link in it, you chase that link and only then is the account activated. The OpenID equivalent would be to accept URLs only from sites you knew required that sort of validation. There are many such. In particular, blogger.com does. Any blog URL can serve as an OpenID, so I can provide http://fieldnotesontheweb.blogger.com as an OpenID.

As usual there is more to OpenID than the short summary here, but that's the gist as I understand it. OpenID aims to scratch an itch that clearly needs scratched, and it seems to be getting some traction. I've run across OpenID login options on several mass-market web sites, including CNN and, more geekily, on SourceForge.net. Since blogger.com also accepts OpenID, you'll also see it on blogs attached to major sites.

On the provider side, besides Blogger, AOL, Flickr, Orange Telecom, Yahoo! and several other blogging services, it seems MySpace has jumped on the bandwagon (and Facebook hasn't). There are also several sites that specialize in providing and managing OpenIDs, notably including one run by VeriSign. OpenID maintains a list, and there is also a commercial directory aimed at promoting providers and OpenID-enabled sites.

[OpenID is still a thing, but it clearly hasn't taken the world by storm.  Amusingly, the Wikipedia article's section on "Adoption" was last updated in 2009 -- D.H. June 2015]

Friday, July 18, 2008

Bringing Greene County on line

Here's a fairly heartening story, reported by NPR:

Five years ago, the US government stopped subsidizing tobacco farming. At the time, Greene County, North Carolina was more dependent on tobacco than all but one county in the US. Faced with disaster, the county government made what seemed like a most improbable choice for a place with little money and big problems coming down the road. Instead of trying to attract a factory to a work force with few skills outside tobacco farming, they would bring in broadband and increase internet coverage from practically nothing to the entire county. They would also supply laptops to all high school students and make other educational reforms not detailed in the story.

The results were dramatic: many more students applied to and attended college, teen pregnancies fell, business improved and instead of falling off the map, the county was able, in the words of county government worker Misty Chase, to "jump over the industrial age and move straight into the technology age." On a less tangible level, the kids had iTunes and folks were able to email their relatives outside the county. No one is claiming that the internet and laptops were the only factor in the turnaround, but it's clear it wouldn't have happened without them.

In a world of hype about the net and the web changing everything, it's easy to be skeptical of such claims and easy to forget that, just as with rural electrification and universal phone service, technology really can make a difference.

Thursday, July 17, 2008

Another classic example of "dumb is smarter"

When I originally tried to come up with a list of "dumb is smarter" examples, I knew I was missing at least one, and now I remember what it was.

The Iterated Prisoner's Dilemma is an abstract game, with many real examples or near-examples, in which
  • Two players repeatedly face a choice of acting generously, called "cooperating", or acting selfishly, called "defecting".
  • At any particular turn, they'll both do better if they both cooperate.
  • However, if only one cooperates and the other defects at that turn, the cooperator gets shafted.
  • Therefore, looking at a particular turn in isolation, the only rational choice is to defect, with the result that both players do less well than if they'd both cooperated.
  • However, since the choice is presented repeatedly, players have a chance to base their present actions on the results of previous turns.
It turns out that under these rules, rational players can agree to cooperate over the long term, even though it makes no sense in the short term ("it turns out that" is math-geek for "I don't want to go into the details"). It's an interesting result.

To probe this further, Robert Axelrod organized a tournament in which computer programs could compete with each other at the game. The tournament attracted considerable interest, and there were many competitors, some quite sophisticated in design.

And now the payoff: The winner, Anatol Rapoport's "tit-for-tat", consisted of four lines. Of BASIC. Its logic was:
  • Cooperate on the first turn.
  • After that, do whatever the other player did on the last turn.
This isn't an unbeatable strategy. It will lose, by just a bit, to "always defect", but the winner was determined by who had the best total result against all opponents. By always passing up the bigger gain of cooperating, "always defect" gets a low total score. Tit for tat does better than that when playing anyone who cooperates. It gets the best score possible when playing itself (or when playing "always cooperate" or anything else that always happens to cooperate when playing it).
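
Here is a small sketch of the game, in Python rather than BASIC. The payoff numbers (5, 3, 1, 0) and the 200-round match length are the conventional textbook choices and are assumptions on my part, as are the function names. This is not a reconstruction of the tournament itself, just head-to-head matches.

```python
from typing import Callable, Dict, List, Tuple

# (my move, their move) -> (my payoff, their payoff); "C" = cooperate, "D" = defect.
# Assumed conventional values: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy sees only the opponent's past moves and returns "C" or "D".
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_moves: List[str]) -> str:
    """Cooperate first, then copy whatever the other player did last turn."""
    return opponent_moves[-1] if opponent_moves else "C"


def always_defect(opponent_moves: List[str]) -> str:
    return "D"


def always_cooperate(opponent_moves: List[str]) -> str:
    return "C"


def play(a: Strategy, b: Strategy, rounds: int = 200) -> Tuple[int, int]:
    """Play one match and return the total scores (a, b)."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(moves_b), b(moves_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("tit-for-tat vs always-defect:  ", play(tit_for_tat, always_defect))
    print("tit-for-tat vs tit-for-tat:    ", play(tit_for_tat, tit_for_tat))
    print("always-defect vs always-defect:", play(always_defect, always_defect))
```

With these assumed numbers, tit-for-tat loses its match against always-defect 199 to 204, having been burned exactly once. Against itself it scores 600, while two always-defect players grind out 200 apiece. Who wins a whole tournament depends on who else shows up.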

In a later edition of the tournament, a team from Southampton University was able to beat tit-for-tat by having multiple copies of its program collude, but that's a different story (and even then who won depends on how you measure).

Wednesday, July 16, 2008

Privacy hot potato

It seems that Google and Viacom have reached an agreement on the YouTube usage data Google was ordered to turn over; Google will be allowed to anonymize the data by replacing user IDs and IP addresses with random tokens.
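
I have no idea how Google actually did the replacement, but the basic move is simple enough to sketch. The row layout and function names below are made up for illustration. Each distinct user ID and IP address gets its own random token, used consistently, so the view counts still add up without the real identifiers.

```python
import secrets
from typing import Dict, Iterable, List, Tuple

# Hypothetical log row: (login_id, ip_address, video_id, timestamp)
LogRow = Tuple[str, str, str, str]


def _token_for(mapping: Dict[str, str], key: str) -> str:
    """Return the random token already assigned to key, or assign a fresh one."""
    if key not in mapping:
        mapping[key] = secrets.token_hex(8)
    return mapping[key]


def anonymize(rows: Iterable[LogRow]) -> List[LogRow]:
    """Replace login IDs and IP addresses with random tokens, consistently per value."""
    user_tokens: Dict[str, str] = {}
    ip_tokens: Dict[str, str] = {}
    result: List[LogRow] = []
    for login_id, ip_address, video_id, timestamp in rows:
        result.append((
            _token_for(user_tokens, login_id),
            _token_for(ip_tokens, ip_address),
            video_id,
            timestamp,
        ))
    # The two mapping tables are deliberately dropped here: whoever keeps them
    # can undo the whole exercise.
    return result


if __name__ == "__main__":
    log = [
        ("galahad", "192.0.2.1", "vid1", "2008-07-05T09:00:00"),
        ("lancelot", "192.0.2.2", "vid1", "2008-07-05T21:30:00"),
        ("galahad", "192.0.2.1", "vid2", "2008-07-06T08:00:00"),
    ]
    for row in anonymize(log):
        print(row)
```

Consistent tokens are a compromise. They keep the data useful for counting, but a long enough trail of activity under one token can still point to a person, which is roughly what went wrong for AOL.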

This is good news (though not as good as, say "turns out the data was anonymous to begin with"), but not a surprise. Google and Viacom had both stated that they wanted to find a way to protect the anonymity of the data. Google's interest is obvious, but Viacom had an interest as well: If they get anonymized data, no one can accuse them of abusing personalized data or accidentally leaking it. AOL already saw what it's like to be the guy that leaks personalized data, even if only by accident. No one wants to be that guy.

Now, the whole reason this is a big deal is because personalized data is valuable, and that presents a temptation. But a rational player will realize that the high cost of getting caught, together with the difficulty of keeping a dozen terabytes of valuable data completely secret and the lack of anyone else but Google to blame a breach on, far outweighs any benefit there may be. Viacom is just being rational. If there's a breach now, the list of suspects is one, not two companies long.

Put another way, the personal content of the usage data has value in general, but it has less than no net value to Viacom. It's a hot potato they don't want to catch. Better to make sure it's not thrown in the first place.

[Re-reading my original post on this topic, I see I already made this point, but I still think it's a good point.]

Thursday, July 10, 2008

Again, just what is this "web" thing?

This is post number 200, so the Eubie Blake comment goes double. Before returning to my usual random potshots, I wanted to step back and take another run at the Central Question: What is the web?

In an early post, I stumbled on a working definition I still like: The web is all the resources accessible on the net, whatever resources are and whatever the net is. That's fine as a technical definition, but it needs sauce.

Here are two ways to look at the web: the human point of view and the computer point of view.

The human view has a human shape. I'm blogging on blogger.com. I can check my local weather on weather.com, or at one of my local TV stations, generally using their familiar call letters. Companies have their own chunks of the web, as do governments of all sizes, schools and so forth. It's not hard for an individual to have a web presence and many of us do.

In fact, let's expand that a bit. I was originally equating "chunk of the web" with "domain name", and to some extent that's true for organizations. But for people, it's not. I have a blog here, but I also have accounts all over the place, some off on their own and some connected to other people's accounts. None of this requires me to have a personal domain name. Instead, I get small pieces of other domains.

This is not news, of course. Social networking is all about reflecting human relationships on the web, and the notion of personal datastores is all about letting people manage how their presence diffuses into the web at large. The larger point is that the web, having grown organically through the contributions of millions of people, is structured according to the whims of, and on a good day for the convenience of, people.

From a computer's point of view, the web is a fairly strange place, compared to, say, a relational database. There is no single format for a web page, beyond broad statements like "It's often XHTML." Gleaning any more meaningful structure is a hit-or-miss affair. There are various efforts, like microformats, to make web pages more easily digestible, for example by providing ways of saying "this is a date" or "this is a physical location," but there is no requirement for anyone to use them.

There are links between resources, but there may or may not be a clear way to figure out what those links mean (is this a link to another post, or to the author's profile, or to something else entirely?). In many cases the cues are in the text on the page, or in the visual structure, both of which the wise application will generally not even try to understand.

If there's more, it's because the author of the page explicitly put it there in computer-digestible form (generally XML), or used tools that did, and because the application trying to make sense of the page has some knowledge of what the author or tool did. Either that, or someone painstakingly figured out what tags such and such a page happens to use and told an application how to "scrape" it -- until the webmaster at the other end decides to tweak the format in an unexpected way.
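
To make the scraping point concrete, here is a small sketch using Python's standard HTML parser. It looks for the hCalendar-style convention of marking a date with class "dtstart" and a machine-readable title attribute. The sample markup and the function name are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class DateScraper(HTMLParser):
    """Collect the title attribute of any element whose class list includes 'dtstart'."""

    def __init__(self) -> None:
        super().__init__()
        self.dates: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if "dtstart" in classes and title:
            self.dates.append(title)


def extract_dates(html: str) -> List[str]:
    """Return every machine-readable start date the page author chose to mark up."""
    scraper = DateScraper()
    scraper.feed(html)
    return scraper.dates


if __name__ == "__main__":
    page = '<p>Posted <abbr class="dtstart" title="2008-07-10">July 10</abbr></p>'
    print(extract_dates(page))
    # The same page after a redesign that renames the class: nothing is found.
    print(extract_dates(page.replace("dtstart", "when")))
```

The second call is the webmaster tweaking the format. The page looks identical to a person, and the scraper quietly comes back empty-handed.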

That's not to say that the web is completely opaque from the computer view. There has been a lot of work in this direction, under such headings as "Semantic Web" and "Web Services". As I understand it and in very broad terms, the Semantic Web is about making the web in general more accessible to computers, including (but not limited to) making human-visible structure more computer-visible. Web Services are more about creating a parallel universe of resources aimed specifically at computers, using the same formats and protocols as the human-visible web, but structuring things much more precisely so that an application accessing a resource knows exactly what to look for where.

Even in the most automated case, say when you want to use some sort of tool to book a flight, and that tool communicates directly with the various airlines and travel sites, speaking protocols that only computers were meant to understand, the human structure still wins. The connection between your tool and the travel sites, and the protocols they use to talk to each other, all reflect the fact that people want to fly and airlines want to sell them tickets.

Which brings me back to the question in the title. Another of the many possible answers to "What is the web?" is "a reflection of human society and its interconnections in electronic form."

In keeping with the "field notes" theme, here's a possible analog from biology. The phylum Nematoda is one of the most successful on earth. If you could remove all matter on earth except for the nematodes, you could still make out most of what went on on the surface -- the topography, the shapes of buildings and roads, the shapes of larger life forms like trees and people.

Just so, if you could somehow remove all information on earth except for the web, you could still make out much of what goes on in human life. Maybe not as great a proportion as with the nematode example, but quite a bit nonetheless.

Saturday, July 5, 2008

Google, Viacom and privacy

A certain amount of controversy over privacy is just part of being Google, not just because Google is a large software company, but also because it aims to make as much of the world's information available to as many of the world's people as it can, subject, of course, to the admonition not to be evil. "Don't be evil" is just three simple words, but just how those three words apply when the bits hit the wire is the stuff of dissertations and lawsuits.

Google is embroiled in at least two significant disputes lately: The ongoing rumbles over Street View, which seem to be getting worked out piece by piece as we go along, and a lawsuit by Viacom over YouTube which, while probably not as bad as the flap over Italian tax privacy, does involve at least a couple of echoes of the AOL search data debacle.

That one certainly looks bad at first blush. In his ruling, Judge Louis L. Stanton has granted Viacom access to "all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website." That data includes "for each instance a video is watched, the unique “login ID” of the user who watched it, the time when the user started to watch the video, the internet protocol address other devices connected to the internet use to identify the user’s computer (“IP address”), and the identifier for the video." [The link above is via Wired. If the original ruling is on the District Court's website yet, I can't find it. If anyone has a better link, please send it]

So there's a bit of a privacy issue there.

Google had two objections to this. The first was that the database was big. Well, 12 terabytes is a lot of data, but as the judge points out, it's not too big for commodity disks these days. The more serious argument is that the database contains personally identifiable data and is more than Viacom needs to “recreate the number of views for any particular day of a video” and "compare the attractiveness of allegedly infringing videos with that of non-infringing videos."

The judge was not impressed, calling Google's concerns "speculative" and citing a (very reasonable) blog post by Google developer Alma Whitten arguing that an IP address by itself is generally not personally identifiable information. That seems a bit odd since in this case the IP address is not by itself, but linked with just the kind of information that Whitten claims would make it personally identifiable.

However, the main thrust of the judge's argument seems to be that Viacom's use of the data is limited to a particular purpose in the discovery phase of a particular civil case. Presumably, if Viacom is later found to be making other use of the data, or if the data leaks out into the larger world, Google or someone else can come back after them. In the case at hand, Viacom would probably also run afoul of the Video Privacy Protection Act. Well, maybe so, but a genie out of a bottle is a genie out of a bottle ...

It's not clear to me why Google couldn't have just been compelled to disclose what Viacom said it was after: detailed logs of how many people watched what videos at what time, but not which particular people or from what IP address. In the cases Viacom is interested in, where large numbers of people watched copyrighted material, there should be more than enough individuals involved to provide anonymity.
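
As a sketch of what that narrower disclosure might look like, here is a few lines of Python that boil a viewing log down to views per video per day. The row layout is hypothetical, and I'm assuming ISO-style timestamps. The point is only that the user and address columns are never needed to produce the counts.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical log row: (login_id, ip_address, video_id, timestamp)
LogRow = Tuple[str, str, str, str]


def views_per_video_per_day(rows: Iterable[LogRow]) -> Counter:
    """Count views keyed by (video_id, YYYY-MM-DD), ignoring who watched and from where."""
    counts: Counter = Counter()
    for _login_id, _ip_address, video_id, timestamp in rows:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(video_id, day)] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("galahad", "192.0.2.1", "vid1", "2008-07-05T09:00:00"),
        ("lancelot", "192.0.2.2", "vid1", "2008-07-05T21:30:00"),
        ("galahad", "192.0.2.1", "vid2", "2008-07-06T08:00:00"),
    ]
    for (video_id, day), views in sorted(views_per_video_per_day(log).items()):
        print(video_id, day, views)
```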

On the bright side, Google and Viacom are now trying to work out how best to implement the court order without giving away personally identifiable information. Google's interest in this is obvious. Viacom also has an interest, though. They would like to be able to say "we only looked at the information we asked for, and we can prove it." No one wants to be seen as the company that inadvertently gave away information on millions of users. AOL went through that. It hurt.

In the background to all this is the long-standing complaint from privacy advocates that Google should have anonymized the YouTube data to begin with, as it has with its web search data. You can't divulge what you don't know, and in a case like the present one Google could have convincingly argued that it can only supply Viacom with what it asked for and no more. This is clearly easier and less error-prone than the present case.

In practice, it's probably not that simple. It's easy to think of a company as a monolith, but only the smallest companies really are. When you get to be Google's size, and the entity in question is a newly-acquired subsidiary, it's not a great surprise that rules and practices would differ.

Thursday, July 3, 2008

Is Netflix/Roku a web application?

Already sounds like one of those tail-chasing exercises where it all depends on your definitions, doesn't it? Or, put more kindly, a case where exploring the question is more valuable than any particular answer that might pop out. Let's try that angle [if you just want a review of the box, see here] ...

On the one hand, how could a set-top box be a web application? All you do is pick movies and watch them. There's no browser. You could say that picking a movie from the queue is like chasing a link but it seems more like a plain old menu. In particular, it feels a lot like picking a movie on demand with cable, except a little smoother and nicer. If the Netflix box is a web application you might as well say digital cable is, too.

On the other, the web is an essential part of the experience. You can't set up your queue without it. You can't even activate a box without going to the Netflix web site. The web interface isn't necessarily the most visible part of the picture, but it's definitely there.

In one of the earliest posts here, I tentatively defined the web as "all resources accessible on the net." I still like that definition -- it seems a little broad, but I'm not sure how to narrow it without cutting out too much -- and by that definition the box is definitely part of the web, and would (arguably) be even if the movies themselves didn't come in over a net connection.

If anything, the split between setting up the queue (webby) and watching (not so webby) bolsters the idea that the web is mostly about metadata -- relatively small bits of information about things, like in this case which movies are on your queue -- and not so much about large chunks of raw data like songs and movies.

Does this matter? From one point of view it's all pretty arbitrary, but questions like "is it the internet or not?" or "is it the web or not?" may matter quite a bit if you're down in the trenches fighting business and legal battles about who gets to charge whom how much for what. It will probably all shake out in the long term, but I can imagine it mattering at least for a while whether something is a "data service" or a "video service" or whatever.

I'm completely guessing here. I'm not a lawyer, and even less a businessperson.

Broadband: A somewhat broader view

CNN reports on a recent Pew study on broadband penetration. The thrust of the article, headlined "Broadband Internet? No thanks," is that surprisingly many dial-up users don't want broadband. The body of the article, however, suggests that it's not that people don't want broadband, but that it costs too much. Or, slicing it a bit finer, they don't see enough added value for the added price.

If you look at the numbers in the study, you get yet another somewhat different picture. The great majority of people who have had dial-up have now switched to broadband, and evidently a fair number have skipped dial-up entirely and gone straight to broadband. If broadband is stalling at all, it isn't stalling for lack of interest or even because of price, but because the low-hanging fruit has already been picked.

The CNN article is focusing on the minority who still have dial-up or who don't have any internet connectivity at all. The article makes sense in that context, but it's a pretty narrow context. The headline writer still loses either way: Of the reasons given for not switching, "nothing would convince me" ranks a distant second to "the price would have to fall."

The Pew study's headline reflects the luxury of not having to write for a mass market on a tight deadline: "Adoption Stalls For Low-Income Americans Even As Many Broadband Users Opt For Premium Services". Not quite as snappy as "Broadband Internet? No thanks," but it does seem more in line with Pew's numbers.

Right at the top of the Pew report is a graph showing
  • Broadband penetration rising from near-zero in June 2000 to 55% now.
  • Dialup penetration rising from 35% in June 2000 to 40% in April 2001, then falling to 10% now.
  • Total internet penetration rising from 35% in June 2000 to 65% now.
  • Broadband adoption rising faster from March 2007 to March 2008 than it did from March 2006 to March 2007.
On the last point, note that the time axis of the graph is not to scale, and that there seems to have been some leveling off since December 2007.

As the CNN article and the Pew headline suggest, in the cases where people don't have broadband, price is a factor when availability isn't. It would be interesting to know what portion of the people who can't afford an internet connection can't afford it because they can't afford a computer to hook up to it.

There's one more point I'd like to pull out here. While broadband still costs more, broadband has been getting cheaper over the past few years while dial-up has been getting more expensive. This doesn't seem surprising. Dial-up, which converts digital data to an analog signal for transmission over infrastructure that's now mainly digital, is inherently less efficient, and at this point the economies of scale are no longer on its side.

What do I mean, "dumb is smarter"?

I previously mentioned Google's page rank as an example of "dumb is smarter" -- doing better by not trying to understand anything deep or do anything clever. Some other examples:
  • On balance, chess programs that bash out large numbers of positions do better than ones that try to emulate human judgment.
  • SPAM filtering is an interesting arms-race-in-progress, but the most effective technique I've run across is a simple whitelist. Bayesian filtering worked well for a while and all it does is crunch numbers. Any filter based on rules is vulnerable to gaming once people figure out the rules.
  • One of the many long-raging unresolved debates in the financial world concerns whether you can do better by carefully picking which stocks you buy and sell and when, or whether you should just make regular purchases of an "index fund" that tracks, say, the Russell 5000 and forget about it. I'm not going to take sides on that one. Rather, the point is that it's a serious debate at all. (Ironically, one of the best-known proponents of the "dumb money is smarter" approach, Warren Buffett, is also considered one of the best stock pickers ever.)
  • Every so often, it seems, someone puts out a program that appears able to write free-form prose or even take part in a natural-language conversation with a person. Its understanding and grasp of humanity seem almost uncanny. Sooner or later, though, it comes out that the putative AI is really just doing some simple manipulation and the reader is assigning meaning to what closer inspection reveals to be gibberish. Everyone goes back to what they were doing, and real progress in natural language processing continues slowly and painstakingly. The classic examples are Eliza and Parry. Racter and Mark V. Shaney come to mind as well. These days, people write "chatterbots", some of which are one-trick ponies and some of which are considerably more sophisticated.
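
To show just how simple the "simple manipulation" in that last item can be, here is a word-level Markov chain babbler in the general spirit of Mark V. Shaney. It is my own minimal sketch, not a reconstruction of that program: it looks only one word back, and the function names are invented.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_chain(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that follow it somewhere in the text."""
    words = text.split()
    chain: Dict[str, List[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: Dict[str, List[str]], start: str, length: int, rng: random.Random) -> str:
    """Walk the chain from start, picking a random recorded follower at each step."""
    output = [start]
    while len(output) < length:
        followers = chain.get(output[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        output.append(rng.choice(followers))
    return " ".join(output)


if __name__ == "__main__":
    source = "the web is a reflection of human society and the net is a strange place"
    chain = build_chain(source)
    print(babble(chain, "the", 12, random.Random(2008)))
```

Every adjacent pair of words in the output occurred somewhere in the source text, which is all the "understanding" there is. Any meaning is supplied by the reader.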
I'm not particularly knowledgeable about AI research, but my understanding is that there is a basic division between "strong" and "weak" AI. Strong AI has the rather broad goal of being able to do anything a human mind can do, ideally better. The notion of strong AI can be traced back to the Turing test, though there is debate concerning to what extent the Turing test is a valid measure of intelligence.

Weak AI aims to solve a particular engineering problem, say, beating a human master at chess or finding relevant web pages containing a particular set of terms. A weak AI does not need to claim to be conscious, or to have any particular understanding of the problem at hand. It just has to produce useful results. At which point, we decide that it can't possibly have any real intelligence, since we understand what it's doing and how.

In terms of the challenges of the 20th century, "producing a strong AI" compares to "curing cancer", while producing a weak AI is more like sending a rocket to the moon.

When I say "dumb is smarter", I'm not saying that strong AI is a useless goal, only that it's a very difficult, long-range goal which to this day is not particularly well-defined (though it is better defined than it was forty years ago). As such, it's more likely that progress will come in small steps.

Like anything else, "dumb is smarter" can be taken too far. The best chess programs tend to incorporate the advice of human grandmasters. When Google answers "What is the time zone for Afghanistan?" with a separate line at the top with the time zone, clearly it's doing some sort of special-casing. The absolute dumbest approach is not always the absolute best, but it does seem that the best trade-off is often much closer to it than one might think, and as a corollary, the dumbest approach is often a good place to start.