Back during the Baker's Dozen series on search engines (a.k.a. "the topic that ate my blog"), I threw questions like "Who starred in 2001?" at various search engines. The idea was to see how well they would deal with questions beyond just matching up words statistically. Mind, I'm a fan of the statistical approach. It's easy to explain and, with a little googly special sauce, produces good results quickly.
I was particularly intrigued by True Knowledge (and by Wolfram Alpha). True Knowledge uses a fairly classic AI knowledge base approach to store facts in a structured way and draw inferences. For example, it might be able to glean from "starred in" that we're talking about a film or play, and it might know that there was a film called "2001". This sort of real-world, can't-be-derived-from-general-rules knowledge was one of the larger rocks against which the exuberant early predictions of AI -- I'm talking 1960s here -- were dashed. These days, with orders of magnitude more storage and processing power available, the parameters have changed and so the game has too.
At the time, True Knowledge was able to provide a good answer to "Who starred in 2001: A Space Odyssey?", but it couldn't quite connect the dots and realize that "Who starred in 2001?" was probably the same question. However, it did find a possible link, and offered
2001 can also be used as a way of referring to 2001: A Space Odyssey, the 1968 science fiction film directed by Stanley Kubrick, written by Kubrick and Arthur C. Clarke. If this is actually the recordable medium you are adding, please click the button below.
I did so, but the answer still came up the same. In the post I said:
Most likely the new facts are still rattling through the various caches, or perhaps someone's moderating the input. But if the search succeeds for you later, you'll know whom to thank.
Just now, I wondered whether the new knowledge had been assimilated into the database. And voila, True Knowledge can now answer the question. And the credit is mine! All mine! Bwahahahaha! (And, um, maybe a little bit to the nice folks at True Knowledge for putting the engine together in the first place, and all the people who contributed related facts to the database, and ...)
Flippant comments aside, this is actually pretty cool. Partly it's cool to see one's contribution, however minor, go into the Big Mix. But that's been a feature of web.life pretty much since the start. Mostly it's cool that True Knowledge was able to assimilate it the way it did.
In very broad and oversimplified strokes, the whole AI/robotics thing has gone through several phases:
(very early, but I suspect still very much present in the popular view) Hey, these computers can be programmed to do anything! They can solve equations in seconds that humans could never figure out. Simple stuff like walking and talking should only be a couple of years away.
Oh my. This walking and talking is much more complicated than it looks (again, this was a pretty early realization). You need some specialized knowledge.
A long period of building tools and solving specialized problems ensues. It becomes clear that you don't need "some" specialized knowledge. You need a whole lot. It also becomes clear that there are not just "some" specialized problems to solve, but lots and lots. To the outside world, nothing's happening. It's all dead, debunked (again, I suspect this is a fairly prevalent view in the world at large).
In reality, the research is paying benefits. It's just not producing I, Robot scenarios. This is the decades-long "If we know how to do it, it's not AI (any more)" phase on the computing side. Cognitive science (or "natural computation") is blossoming as a field and producing all kinds of interesting findings about how brains work.
And now, stuff is actually starting to appear. Computers are winning chess matches against top humans (albeit mostly through sheer computation). Demos like Big Dog are appearing. The computer end of human-computer interaction is getting smarter.
Personally, I still don't see I, Robot coming to life any time soon, but I do see things that got written off as impossible during the dead-and-debunked phase starting to stir to life again. I'm thinking, say, competent machine translation or robots that can pick things up and carry them around a house, happening gradually in the next decade or so.
In his post on Wolfram Alpha, Mark Johnson mentions True Knowledge as a point of comparison, so naturally that seemed like a good place to try next. TK is supposed to be able to reason inferentially. For example, if you ask "Who are the Queen's grandchildren?" it will be able to find H.M.'s kids, and from them their kids, and thence the answer.
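To make the grandchildren example concrete, here is a minimal Python sketch of that kind of two-step inference. It is not how True Knowledge actually works. The `PARENT_OF` fact set and both function names are assumptions made up for illustration, and a real knowledge base would store far richer facts than bare pairs.

```python
# Hypothetical toy fact store: (parent, child) pairs.
# The structure and names are assumptions, not True Knowledge's own format.
PARENT_OF = {
    ("Elizabeth II", "Charles"),
    ("Elizabeth II", "Anne"),
    ("Elizabeth II", "Andrew"),
    ("Elizabeth II", "Edward"),
    ("Charles", "William"),
    ("Charles", "Harry"),
    ("Anne", "Peter"),
    ("Anne", "Zara"),
    ("Andrew", "Beatrice"),
    ("Andrew", "Eugenie"),
    ("Edward", "Louise"),
    ("Edward", "James"),
}


def children_of(person, facts):
    """Look up stored facts directly: everyone recorded as a child of `person`."""
    return sorted(child for parent, child in facts if parent == person)


def grandchildren_of(person, facts):
    """Infer grandchildren by chaining two parent-child lookups.

    No "grandchild" fact is stored anywhere; the answer is derived.
    """
    result = []
    for child in children_of(person, facts):
        result.extend(children_of(child, facts))
    return sorted(result)
```

The point is that nobody had to type in "William is a grandchild of Elizabeth II". The engine only needs the parent facts plus one general rule, which is what separates this approach from matching words in documents.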
Game on.
TK wanted me to create a beta account, complete with re-CAPTCHA squiggly words and an email verification step, but it went ahead and logged me right in without verifying. A good thing, as I'd meant to give a different address.
How much energy does the US consume?
TK answers "Sorry, I don't understand that question." It then wonders if I might be interested in any of a number of recent, utterly unrelated queries, but it also offers a list of standard search engine hits. These don't appear to be the top few hits for the question itself, but rather (I'm guessing) the top hits for several similar questions. It certainly seemed heavier on "how much energy" than other lists I've seen. It's probably not googling for the question verbatim, quoted or not. Hmm ... maybe it's googling for the parts of the question it deems important, something like "how much" energy consume?
How many cell phones are there in Africa?
Again it's sorry, but the screen looks a little different. It tells me "It sounds like this is something True Knowledge doesn't know about yet. Most of what True Knowledge knows has been taught to it by people like you." and then goes on to paraphrase the question: "How many mobile phones (a telephone device) are geographically located (completely) within Africa (the continent)?" Interesting. Then follow the standard search engine results, probably based on the rephrased form.
But there's more. Right below the "Most of what True Knowledge has been taught ..." message is a button labeled "Teach True Knowledge about this". Sounds good, so I click the button and try to put in the answer from Wolfram Alpha. The tabs are intriguing, including a time period asking when the fact started being true and a couple that appear to provide a glimpse into the technical workings of the engine. Unfortunately, the "Add this fact" option appears to be grayed out, probably because I'm not a confirmed user.
When is the next Cal-Stanford game?
Overall TK seems a bit sluggish. This is the cost of actually thinking about what you're saying. After pondering a while, TK decides it doesn't understand. The answer is similar to the one for "How much energy ..."
When is the next Cal game?
Likewise.
Who starred in 2001?
Well, it gets partway. In particular, it is able to extract just the kind of information I had hoped it would. Here's what it said:
Sorry. I couldn't find a meaningful interpretation of that. The following objects matched your query, but none of them are recordable media (such as TV series, movies, or radio broadcasts)
the year 2001
the integer 2,001
the length of time 2,001 years
the age 2,001 years
You may be thinking of a particular recordable medium that isn't in the Knowledge Base yet, in which case you can help out by adding it
The "adding it" link was not marked in any way (say, by underlining it and putting it in blue, maybe?), but now that I see it pasted in here, I see there's a button for adding it. A couple of button-pressing guesses later, I get
2001 can also be used as a way of referring to 2001: A Space Odyssey, the 1968 science fiction film directed by Stanley Kubrick, written by Kubrick and Arthur C. Clarke. If this is actually the recordable medium you are adding, please click the button below.
Looking good ... and next I get
Here are the facts gathered from your information:
I click the "Add these facts" button. It thanks me.
I retry the question. Same result as before. Most likely the new facts are still rattling through the various caches, or perhaps someone's moderating the input. But if the search succeeds for you later, you'll know whom to thank.
OK, if it's still learning the 2001 -> 2001: a space odyssey link, it presumably knows about 2001 under its full name:
Who starred in 2001: a Space Odyssey?
And sure enough, there it is, with sources (Wikipedia) cited and even a chance to disagree with its findings.
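The behavior above (nothing that "2001" refers to is a recordable medium until someone teaches the alias) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `ENTITY_TYPES` table, the `ALIASES` map, and the `resolve` and `teach_alias` functions are hypothetical names of mine, not anything taken from True Knowledge itself.

```python
# Hypothetical toy knowledge base; names and structure are assumptions.
ENTITY_TYPES = {
    "the year 2001": "year",
    "the integer 2,001": "integer",
    "2001: A Space Odyssey": "film",
}

# Surface strings (lowercased) mapped to the entities they may refer to.
ALIASES = {
    "2001": ["the year 2001", "the integer 2,001"],
    "2001: a space odyssey": ["2001: A Space Odyssey"],
}

# "starred in" only makes sense when its object is a recordable medium.
RECORDABLE_MEDIA = {"film", "tv series", "radio broadcast"}


def resolve(text, wanted_types):
    """Return the entities `text` may name whose type is in `wanted_types`."""
    candidates = ALIASES.get(text.lower(), [])
    return [entity for entity in candidates if ENTITY_TYPES[entity] in wanted_types]


def teach_alias(text, entity):
    """Record that `text` can also refer to `entity` (the "Add these facts" step)."""
    ALIASES.setdefault(text.lower(), []).append(entity)
```

Calling `resolve("2001", RECORDABLE_MEDIA)` returns an empty list until `teach_alias("2001", "2001: A Space Odyssey")` has been run, after which the film shows up as a candidate. The full title resolves from the start, which matches what I saw.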
Who has covered "Ruby Tuesday"?
TK doesn't understand, but it does provide the Wikipedia entry in its list of regular search results. It also appears someone has asked it "How tall is Barack Obama in nautical miles?"
What kinds of trees give red fruit?
Likewise (but with a different selection of random questions from other users). As always, the regular search hits are there, so I could always mine that for answers.
Who invented the hammock?
This time I am asked to confirm its translation of my question. Toward the bottom of a long list of amusing attempts, including "Who is believed by a significant number of people to be the inventor of Hammock Music, based in Nashville, Tennessee, the label imprint under which Hammock released its initial two recordings, Kenotic and the Stranded Under Endless Sky EP?" I see the more relevant "Who is a key person or group involved in the invention of hammock, a fabric sling used for sleeping or resting?"
This sort of thing is the bane of natural language processing. The more you know about it, the more you appreciate the Google approach's* brilliance in deliberately sidestepping it.
Chasing the link, I find that TK doesn't know, but could I tell it? I'm not going to try to educate it on this one.
Who played with Miles Davis on Kind of Blue?
No comprende.
How far is it from Bangor to Leeds?
After it asks me which of a long list of Bangors I mean, and I tell it I mean "Bangor, the city in Caernarfonshire, Wales, the United Kingdom," it tells me 182 km. If I add "in miles" to the query it tells me the answer to twelve decimal places. Perhaps it's impatient with me for asking so many questions and knows that spurious precision is a pet peeve of mine.
How far is it from Bangor to New York?
This time, instead of giving me a list of Bangors to choose from, it gives a long list of eye-watering rephrases (for example: "How far is it from Bangor, the large town in County Down, Northern Ireland, with a population of 76,403 people in the 2001 Census, making it the most populous town in Northern Ireland and the third most populous settlement in Northern Ireland to The State of New York, the state in the Mid-Atlantic and Northeastern regions of the United States and is the nation's third most populous?"). Fortunately, the one I want is at the top: "How far is it from Bangor International Airport, the public airport located 3 miles (5 km) west in the city of Bangor, in Penobscot County, Maine, United States to the US state of New York?" The answer given is 547 km, or (by my rough-n-ready calculation) about 340 miles.
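For anyone who wants to check the rough-n-ready arithmetic, here is a small Python sketch of the conversion. The function name `km_to_miles` is my own. The one hard fact is that the international mile is defined as exactly 1.609344 km. Rounding to whole miles is a deliberate choice: the input was only given to the nearest kilometer, so twelve decimal places would be spurious.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def km_to_miles(km):
    """Convert kilometers to miles, rounded to the nearest whole mile.

    Whole-mile rounding avoids claiming more precision than the
    whole-kilometer input actually carries.
    """
    return round(km / KM_PER_MILE)
```

With this, `km_to_miles(547)` gives 340, in line with my estimate, and the 182 km from Bangor to Leeds comes out at 113 miles.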
How far is it from Paris to Dallas?
This time, fascinatingly, there is only one choice available: "How far is it from the French city of Paris to Dallas, the place in Dallas County, Texas, USA?" The answer given is 7928 km, consistent with everyone else's answer to that particular form. Even more fascinatingly, it knows about Paris, TX. Asking "How far is it from Paris, Texas to Dallas" gives the rephrase "How far is it from Paris, the place in Lamar County, Texas, USA to Dallas, the place in Dallas County, Texas, USA?" and the answer 152 km.
Wow. That was ... certainly interesting.
Clearly, it's a work in progress. Clearly, it's doing a lot of interesting stuff. Clearly, a lot of thought and effort has gone into it. I certainly commend the team for putting the thing together and letting the public have at it. Considered as a journey, it was easily the most engaging of the sites I've visited so far. Considered as a destination, not so much.
If nothing else, if you're wondering why it's so darned hard to make a computer answer questions "the right way", and why dumb can be smarter, a little browsing around True Knowledge might provide some insights.
* Again, I acknowledge that "Google approach" glosses over a lot of other early work. Google is just the most prominent current exponent of the pure text-based approach as opposed to "semantic" approaches.