Showing posts with label Powerset. Show all posts

Thursday, June 11, 2009

Baker's dozen: More on Powerset, Bing and search 2.0 in general

At this rate, there'll be a baker's dozen baker's dozen posts.

While trying to figure out what to try next, I ran across a blog by Mark Johnson, evidently one of the forces behind Bing. Among other things, he makes the point that there's more to evaluating a search engine than just throwing a few queries at it.

Fair enough. Several of the points are good ones, in particular the advice to try a prospective new search engine for a week or so with everyday queries instead of throwing two or three (or thirteen) contrived queries at it. Some, I don't buy so much. Johnson argues that people often make the mistake of just looking at the top result of a query. For my money this is a mistake the same way saying "irregardless" is a mistake. OK, you can call it a mistake, but it's what people do and they're unlikely to change.

In any case, I'm not aiming at the same kind of evaluation here that Johnson probably has in mind. I'm not looking for a document-turner-upper with nicer amenities, even though I fully understand that a collection of smallish amenities can make a major difference over an extended period of time.

An extended trial is a sensible approach if you're looking for something that's basically Google but better. I'm looking for something that's not Google, something that takes a fundamentally different approach and provides a fundamentally different experience. So far, Alpha is the only such engine I've found.

Nonetheless, I thought it was at least worth double-checking that the Bing that Powerset linked to was substantially the same as Bing in its own right, so I ran the baker's dozen past Bing itself. The search results were substantially the same, but not exactly. I'm not sure if that's because Bing searches differently on its own and Powerset was directing me to a page of Powerset results presented Bingishly, or just because web.contents tend to shift in transit and things have changed in the last day or two.

Even from these brief encounters I can see that both Powerset and Bing have various UI amenities beyond the pretty formatting that might well be helpful for routine use. If you're looking for Google-but-better, you might give them a look and decide for yourself. I might do so myself, though the cynic in me wonders whether a one-week trial is meant to be long enough to establish sufficient inertia to keep one from bothering to switch back ...

Johnson also provides pointers to several other engines to try out. So on with the show ...

Monday, June 8, 2009

Baker's dozen: Powerset

[If you came here for a review of Powerset, you might also want to look into Wolfram Alpha]

Continuing the none-too-rigorous field test ...

When I first heard of Powerset, its big innovation seemed to be presenting not just raw results, but structured information in the form of "Factz". These were three-word sequences that were meant to summarize the information in an article. That was about a year ago. Since then, the Factz feature seems to have been toned down somewhat. The site itself looks slick, with various UI amenities and a custom style sheet for displaying articles. For whatever that's worth.

Powerset claims to answer questions posed in plain English, but it limits its scope to Wikipedia. As we've seen, this is not necessarily a great limitation, as a fair number of questions can be answered perfectly well by producing the relevant Wikipedia article. Powerset now also provides links to Bing. It's not often you see a search engine advertised on TV, but Microsoft is currently running a well-produced campaign for it.

Since the Powerset page links to Bing, I'll have a look there, too. Between the two, there should be equivalent coverage to Google or Ask. This should be the first real test in this series of a search 2.0 engine with questions that, as far as I can tell, ought to be right in its wheelhouse. So here goes:
  • How much energy does the US consume?
The fourth snippet on the Powerset page gives the same figure cited elsewhere "100 quadrillion BTUs (105 exajoules, or 29000 TWh) in 2005". Bing seems to give largely the same list.
  • How many cell phones are there in Africa?
I'm not finding anything here. There's a button you can click on that brings up a pretty widget containing the Wikipedia page in question with a "relevant passage" highlighted. There's a button on that widget for navigating to the next relevant passage, but it doesn't seem to do anything. In any case, I didn't see any figure for cell phones in Africa. Clicking through to Bing again produced what looked to be the identical list, but (again) re-labeled as "Bing reference".
  • When is the next Cal-Stanford game?
As with the other engines, some hits on particular Big Games and some other random stuff, but nothing telling me when the next one is. Given that Bing once again seems to be just the same list, I'm not going to mention it any more unless it does something notably different from Powerset.
  • When is the next Cal game?
The main difference here is that Cal Ripken appears at the top of the list.
  • Who starred in 2001?
At the top of the Powerset results, but not the Bing results, is a row of posters from "Freebase" (Really? You called it "Freebase"? Really?) labeled "2001: A Space Odyssey (film) Performances". Several of them have actors' names below them. Not bad, though not quite as unmistakable as, say, an IMDB entry. The actual articles are roughly the same as for Google/Ask: mostly stars of films made in 2001.
  • Who starred in 2001: a Space Odyssey?
This ought to produce at least as good a result, and it does. Powerset gives a somewhat more concise set of posters and (along with Bing) a list of articles mostly relevant to the film. The top one mentions the names of the stars, not in a list, but buried in the text.
  • Who has covered "Ruby Tuesday"?
If Powerset is an index to Wikipedia, it had better find the article for this one, and it does. The second highlighted passage mentions a particular cover version. Again I can't navigate to it in the widget, but the widget also shows the table of contents of the article in a smaller pane to the right, with the "Cover versions" section prominently visible. Click on that and Bob's your uncle.
  • What kinds of trees give red fruit?
Not much different from previous tries, though several entries mention the "IUCN Red List." I can, however, now add "red huckleberry" and "red pitaya" to the list of red fruit. Except that further reading and link-chasing reveals that huckleberries grow on bushes and pitayas are cactus fruit.
  • Who invented the hammock?
Along with the Wikipedia article everyone has found, Powerset brings up a "Factz" (missing in Bing, of course) stating that "Inhabitants Invented Hammock". OK, thankz.
  • Who played with Miles Davis on Kind of Blue?
As expected, the Wikipedia article on the album pops up. Neither Powerset nor Bing happens to make the personnel section easily visible, but once you get to the article it's, well, much like clicking on a link to Wikipedia. But at that point it's not hard to find the answer.
  • How far is it from Bangor to Leeds?
Stuff on Bangor, Leeds, Gaelic football and such, but no readily apparent answer to the question. At least it doesn't try to foist that Field Notes thingie onto the world.
  • How far is it from Bangor to New York?
Similarly, nothing helpful. But guess what? There's a Bangor, New York. Interesting that Google maps chose Bangor, Maine (which I expected) over Bangor, NY (which is closer, though not as much closer as one might think).
  • How far is it from Paris to Dallas?
I see: An article on the TV series Dallas, one on the film Paris, Texas, articles on the town of Paris, Arkansas, and on Texas State Highway 24, a list of technology centers ... isn't this exactly the kind of mindless hash that the new search engines are supposed to avoid?

All in all, less than impressive.

In one case (2001), Powerset delivers an answer for which Google and Ask require a more specific query. In one case (cell phones), it delivers nothing where the others delivered a clear link to the answer. In one case (red fruit) it is somewhat less useful than the others. On the distance questions, where plain text search gave at least some moderately helpful answers and Google maps did the serviceable job you'd expect, Powerset completely whiffed. Bing looks like slightly less of the same.

But the style sheets look nice.

Up next (after another brief interlude): Wolfram Alpha.

Thursday, June 26, 2008

Searching for a smarter search engine

One look at Google's quarterly reports should be enough to understand why people are still trying to build a better search engine. Google search does a great job. It will come as no shock that I've consulted it repeatedly in practically every post here. A friend once described it as adding (say) 25 points to his IQ, though not everyone agrees with that assessment.

I've cited Google as a classic case of "dumb is smarter". Google doesn't try to do anything one might consider "understanding" the material it's indexing. For the most part it just looks at words and the links between pages. There is some secret sauce involved, for example in handling inflections or making it harder to game the rankings. Mainly, though, Google wins because its PageRank algorithm turns out to do a good job of finding relevant pages and because it throws massive amounts of computing power at indexing everything in sight [There's a lot of secret sauce involved in getting that to work at the scale Google operates on].
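For the curious, here is a minimal sketch of the textbook form of PageRank, just to show how little "understanding" is involved: it only looks at which pages link to which. This is not Google's production system. The function name, the damping factor of 0.85, the iteration count and the four-page toy web are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key (assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

No words are parsed and no meaning is extracted. The page that the most rank flows into simply floats to the top.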

Google is the dominant search engine, but that doesn't mean there's no room for other engines, particularly engines that take a noticeably different approach or that try to solve a noticeably different problem. Powerset is one such engine. Rather than trying to index the entire web by keyword, Powerset answers English queries about material in Wikipedia. Without delving into a proper product review or comparison, which would have to include at least Google and, say, Ask (formerly Ask Jeeves), I'll just note a few impressions and head on to my real goal of blue-sky speculation [geek note: The "power set" of a set is the set of all that set's subsets; less formally, all the combinations of a given set of elements.].
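To make the geek note concrete, here is a minimal sketch of a power set in Python. The name power_set is my own, chosen for illustration, and none of this says anything about how Powerset the search engine works.

```python
from itertools import chain, combinations


def power_set(elements):
    """Return every subset of `elements`, from the empty set up to the full set."""
    items = list(elements)
    # combinations(items, r) yields all subsets of size r; chain them for r = 0..len(items).
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [set(subset) for subset in chain.from_iterable(all_sizes)]


subsets = power_set(["a", "b", "c"])
print(len(subsets))  # a set of n elements has 2**n subsets
for subset in subsets:
    print(subset)
```

A three-element set has 2 × 2 × 2 = 8 subsets, since each element is either in or out.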

Suppose you want to know when John von Neumann was born. You ask "When was John von Neumann born?" Hmm ... oddly enough, it didn't answer that one directly. It did give the Wikipedia page for von Neumann, which gives the answer (December 28, 1903). "When was Mel Brooks born?" works more as intended, with a nice big "1926" at the top of the results. It also shows a link to a page that says 1928, but seems to know better than to believe it.

Other examples
  • "Where is the world's tallest building?" turns up the list of tallest buildings.
  • "What is the time zone for Afghanistan?" turns up a list of pages, the first of which mentions the right answer.
  • "How much money has been spent on cancer research?" turns up a link giving a figure for the UK, but nothing suggesting an overall figure.
  • "Why is there air?" brings up the Bill Cosby album of the same name.
Beyond accepting questions posed in plain English, Powerset also aims to give you a richer view of the results it finds. This includes an outline of the page contents and a list of "Factz" gleaned from the text. These take the form of short subject-verb-object near-sentences like (in the case of the "tallest building" article) "dozens measure meter" and "television broadcasts towers". Click on one of these and it highlights a relevant passage in the text. Both of these examples lead to the same sentence: "In terms of absolute height, the tallest structures are currently the dozens of radio and television broadcasting towers which measure over 600 meters (about 2,000 feet) in height."

It's not immediately clear what this is supposed to give me. Powerset says "For most people, places and things, Powerset shows a summary of Factz from across Wikipedia," and to illustrate this, it shows a section of a table of Factz about Henry VIII -- whom he married (wife, Anne Boleyn ...), what he dissolved (monasteries, Abbey ...) and so forth. Evidently Henry provides a better example than tall buildings do.

The Factz summary appears to be the sort of thing that Powerset is really driving at. It's certainly the sort of thing that initially drew me to take a look. Rather than just index words, Powerset attempts to extract meaning from the text and present it in a structured way. In other words, it tries to be smart and, in some limited sense, understand the material it's indexing. For example, along with the listing of three-part Factz, it will also display "things" and "actions", with items it deems more significant shown larger.

If we view this smarter approach as an attempt at understanding, however limited, then I'm not sure that the Powerset engine understands all that much. It seems pretty good at distinguishing nouns from verbs, but beyond that, I'm not sure what "dozens measure meter" really signifies. Even in a seemingly simple factual statement like the one quoted, there is more going on than "dozens" "measuring".

It matters that it's dozens of towers, not dozens of meters (or dozens of eggs). It matters that the towers measure more than 600m tall and not less. It matters that the towers are being judged tallest in the limited context of "absolute height". It matters that this is "current", since the Burj Dubai, when completed, will be the tallest completed structure, period. This matters particularly because much of the article is spent wrangling over the meaning of "tallest", a debate which will soon be moot, at least for a while. The Factz approach appears to miss all this, none of which is particularly subtle from a human point of view.

Google, in the meantime, doesn't try to do any of this, but seems to do just fine on the queries above, given verbatim (not in googlese, and without quotes). For "What is the time zone for Afghanistan?" for instance, it said "Time Zone: (UTC+4:30) according to Wikipedia" right at the top. And, of course, Google indexes the entire web ("entire web" defined as "anything you can google", of course), in part because it doesn't spend a lot of time trying to extract meaning. As for the structured view, Wikipedia pages are already outlined, and I'm not sure what Factz give me that ordinary text search doesn't.

Ah well. Understanding natural language isn't just a hard problem, it's a collection of several hard problems, and not a particularly well-defined collection at that.

I don't want to leave the impression that Powerset is useless, and I particularly don't want to denigrate the effort behind it. In fact, I'd encourage people to at least try it. Tastes vary, and some may well find Powerset a nicer way to navigate Wikipedia. Nonetheless, Powerset only serves to confirm my impression that dumb is indeed smarter, and that Google's "we don't even pretend to understand what we're indexing" approach sets the bar remarkably high.