Saturday, June 6, 2009

Baker's dozen: Good ol' Google

To evaluate search 2.0, first we need a baseline from search 1.0 (leaving aside that there were search engines before Google).

Google doesn't even pretend to understand what you're asking it. If you ask it "How much energy does the US consume" it says something like "I heard 'much', 'energy', 'US' and 'consume' ... hmm ... here are some documents with those words in them."

Coming from a human research assistant, this would be totally unacceptable. We expect better. But since it's Google, and we've come to know what to expect from Google, we accept it, and it turns out to be quite useful. In that context, an acceptable answer from Google would be the standard 10-item first page containing links to enough documents to easily answer the question.

And so, even with fairly straightforward, objectively answerable questions, we're already veering off into the subjective. What does it mean to "easily" answer a question? With luck, we'll know it when we see it.

On with the show. In the following, I've given Google the question verbatim, without quotes.
  • How much energy does the US consume?
Google fared pretty well, by finding sites that asked and answered similar questions. Top hits:
  1. Population and Energy Consumption. This link appears broken.
  2. General Energy FAQs - Energy Information Administration is a FAQ from the US Department of Energy. The second question is "Question: How much of the world’s energy does the United States use?" and the answer given is "[T]he United States primary energy consumption was 100.691 Quadrillion Btu, about 21.8% of the world total."
  3. WikiAnswers - How much energy does the United States use a year: "The United States is the largest energy consumer in terms of total use, using 100 quadrillion BTU (105 exajoules, or 29000 TWh) in 2005, equivalent to an (average) consumption rate of 3.3 TW." This matches the DOE figure, but that's probably because the author used the DOE as a source.
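As a quick sanity check on the WikiAnswers figures, here is a minimal Python sketch that converts 100 quadrillion BTU per year into the other units quoted. The conversion factors are standard values I am supplying, not something either site states, and the BTU has several slightly different definitions, so treat the last digits as approximate. The function name is just for illustration.

```python
# Conversion factors (assumed standard values, not taken from the quoted sites).
JOULES_PER_BTU = 1055.06             # approximate; BTU definitions vary slightly
JOULES_PER_TWH = 3.6e15              # 1 TWh = 1e12 W * 3600 s
SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years


def convert_annual_btu(btu_per_year):
    """Convert an annual energy total in BTU to EJ, TWh and average TW."""
    joules = btu_per_year * JOULES_PER_BTU
    return {
        "exajoules": joules / 1e18,
        "terawatt_hours": joules / JOULES_PER_TWH,
        "average_terawatts": joules / SECONDS_PER_YEAR / 1e12,
    }


# 100 quadrillion BTU, the round figure from the WikiAnswers quote.
result = convert_annual_btu(100e15)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the numbers come out at roughly 105 exajoules, a bit over 29,000 TWh and about 3.3 TW, so the quoted figures at least agree with each other.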
  • How many cell phones are there in Africa?
Google didn't appear to do quite so well on this one. Just from looking at the snippets of the articles found, it was hard to tell if any answered the question. However, skimming through the first hit, Cellphones give Africa's farmers a chance to set out their stall ..., I found "At the end of 2007 there were more than 280-million cellphone subscribers in Africa, representing a penetration rate of 30,4%."

The next hit references the African Mobile Factbook, well worth a browse and almost certainly the source of the 280 million figure.
  • When is the next Cal-Stanford game?
I wouldn't expect Google to do well on this one. It might find documents referencing the next game at the time the particular article was written, but how many will have mentioned the 2009 game together with the date? What we need is the Cal (or Stanford) football calendar, which this search is unlikely to turn up ... and sure enough, I see a couple of articles about The Play and about Big Games from several years, but nothing obviously pointing me at Saturday, November 21.

Which I found by googling Stanford Football Schedule 2009, of course.
  • When is the next Cal game?
The results here are even less helpful, as Google cleverly expands "Cal" to "California" and turns up several hits for "California Games," something else entirely. Again, you'd have to think to search for "Cal football schedule 2009" (or whatever sport you're actually interested in). Search 2.0 endeavors to do that for you.
  • Who starred in 2001?
This is not specific enough for Google to get its hooks in. It turns up hits for stars of movies made in 2001, but nothing about the Kubrick classic. Adding more words to a Google search rapidly hits diminishing returns, but this looks like a good place to try ...
  • Who starred in 2001: a Space Odyssey?
Ah, there we go. Didn't even need to quote "a Space Odyssey." The very first hit is 2001 A Space Odyssey starring Keir Dullea and Gary Lockwood ...
  • Who has covered "Ruby Tuesday"?
Naturally, quoting "Ruby Tuesday" turns up hits for the restaurant, but the very first hit is the Wikipedia article on the song, which contains a long list of covers.
  • What kinds of trees give red fruit?
I wasn't expecting much on this one. There was certainly nothing like an exhaustive list in plain sight. Drilling through, however, produced a few answers, such as Brazilian cherry, cocoa, curry leaf, miracle fruit, Malay apple, Kapoho solo, rambutan/lychee, Thai salak, Surinam cherry, Akee fruit, Shadblow serviceberry, Russian hawthorn, downy hawthorn, Toba, madrona and just plain cherry.

Mind, not all are considered good eating. More relevant to the point in question, I had to search through a number of different pages to come up with the colorful list above.
  • Who invented the hammock?
Again, Google and Wikipedia team up, this time for a thorough and nuanced answer, the gist of which is, we don't really know, but probably someone in the Amazon basin. I also turned up the bane of Google searches: sites asking, but not answering, the question you're interested in. Funny how these also tend to be chock full of garish ads.
  • Who played with Miles Davis on Kind of Blue?
Yet again, Wikipedia for the win. The relevant article appears as the first hit, and the personnel section gives the full (impressive) lineup.
  • How far is it from Bangor to Leeds?
Heh. Hit number two is some shady outfit called "Field Notes on the Web" asking the very same question. At the bottom, though, is a link to a UK distance calculator giving the distance as 174.06 miles. The figure is suspiciously precise, but plausible [but see below].
  • How far is it from Bangor to New York?
Hit one is WikiAnswers with the none-too-helpful answer "ma thi wo", but hit two is WikiAnswers to a slight rephrase of the question. The answer given is "From New York, New York to Bangor, Maine it is about 448 miles." Myself, I would have said "about 450 miles."
  • How far is it from Paris to Dallas?
Well, WikiAnswers has Paris, France to Dallas, TX as about 5000 miles, and they'd like to know how far it is from Paris, TX to Dallas, TX. Nothing else on the list looks particularly relevant.

But wait a second. For these last three there's clearly another option in the Google family: Google Maps. In each case I'll simply type in the city names and see what pops out, then refine if that doesn't work.
  • We could not calculate directions between Bangor and Leeds.
  • Bangor, Wales to Leeds UK ("UK" was autofilled -- I was going to type "Leeds, England") gives 142 miles.
  • Bangor to New York turns up two routes, of 447 and 485 miles.
  • We could not calculate directions between Paris and Dallas.
  • Paris, TX to Dallas, TX turns up two routes, of 105 and 110 miles.

So ... what have we learned?
  • Google and Wikipedia. Two great tastes that go great together. Wikipedia has done much of the heavy lifting of pulling together coherent results, and Google does a pretty good job of finding them. Three of the thirteen questions, and three of the ten non-mapping questions, went straight to Wikipedia.
  • It matters, at least to Google, how you ask. If you have a distance question, ask Google Maps, not Google search. Um, that doesn't seem like a big surprise. Be prepared to give country/state/province information in ambiguous cases. If you want to know when the next X happens, look for X schedule instead of asking directly.
  • Of the thirteen questions, Google gave a reasonable pointer to a good answer on eight of them on its first page of hits just by putting in the question and making no effort to be Google friendly. On two others (red fruit and Paris to Dallas) it gave links to at least some relevant information. On the remaining three (next Cal-Stanford game, next Cal game, who starred in 2001), you could find a good answer by recasting the question slightly.
In other words, an experienced Google user, which is to say a great many people by now, could have been expected to readily answer all thirteen questions. As far as I can tell, this leaves two main areas for improvement, at least as far as the narrow domain of answering research questions is concerned: finding the Google-friendly question from a more human-friendly one, and wrapping up the results neatly instead of requiring people to chase links.
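To make the first of those concrete, here is a toy Python sketch of the recasting I did by hand above: distance questions go to the mapping service, and "when is the next X" becomes "X schedule" plus the year. The function name, the two rules and the "maps"/"search" labels are all made up for illustration. This is not how Google or any of its would-be competitors actually works, and it doesn't try to guess the sport or disambiguate place names.

```python
import re


def recast(question, year=2009):
    """Hypothetical rules mirroring the manual rewrites in this post.

    Returns a (service, query) pair, where service is "maps" or "search".
    """
    q = question.strip().rstrip("?")

    # "How far is it from A to B" -> send "A to B" to the mapping service.
    m = re.match(r"how far is it from (.+) to (.+)$", q, re.IGNORECASE)
    if m:
        return ("maps", f"{m.group(1)} to {m.group(2)}")

    # "When is the next X [game]" -> search for "X schedule <year>".
    m = re.match(r"when is the next (.+?)(?: game)?$", q, re.IGNORECASE)
    if m:
        return ("search", f"{m.group(1)} schedule {year}")

    # Anything else goes to ordinary search unchanged.
    return ("search", q)


for question in [
    "How far is it from Bangor to Leeds?",
    "When is the next Cal game?",
    "Who starred in 2001?",
]:
    print(question, "->", recast(question))
```

Note that the third question falls straight through: no simple pattern will tell you that "2001" means the Kubrick film, which is presumably where the hard part lies.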

As I understand it, this is exactly what the current crop of prospective Google-killers is trying to do. Whether they can, and whether that's enough value added to make a difference to the general public, remains to be seen.

Up next:
