At this rate, there'll be a baker's dozen baker's dozen posts.
While trying to figure out what to try next, I ran across a blog by Mark Johnson, evidently one of the forces behind Bing. Among other things, he makes the point that there's more to evaluating a search engine than just throwing a few queries at it.
Fair enough. Several of the points are good ones, in particular the advice to try a prospective new search engine for a week or so with everyday queries instead of throwing two or three (or thirteen) contrived queries at it. Some, I don't buy so much. Johnson argues that people often make the mistake of just looking at the top result of a query. For my money this is a mistake the same way saying "irregardless" is a mistake. OK, you can call it a mistake, but it's what people do and they're unlikely to change.
In any case, I'm not aiming at the same kind of evaluation here that Johnson probably has in mind. I'm not looking for a document-turner-upper with nicer amenities, even though I fully understand that a collection of smallish amenities can make a major difference over an extended period of time.
An extended trial is a sensible approach if you're looking for something that's basically Google but better. I'm looking for something that's not Google, something that takes a fundamentally different approach and provides a fundamentally different experience. So far, Alpha is the only such engine I've found.
Nonetheless, I thought it was at least worth double-checking that the Bing that Powerset linked to was substantially the same as Bing in its own right, so I ran the baker's dozen past Bing itself. The search results were substantially the same, but not exactly. I'm not sure if that's because Bing searches differently on its own and Powerset was directing me to a page of Powerset results presented Bingishly, or just because web contents tend to shift in transit and things have changed in the last day or two.
Even from these brief encounters I can see that both Powerset and Bing have various UI amenities beyond the pretty formatting that might well be helpful for routine use. If you're looking for Google-but-better, you might give them a look and decide for yourself. I might do so myself, though the cynic in me wonders whether a one-week trial is meant to be long enough to establish sufficient inertia to keep one from bothering to switch back ...
Johnson also provides pointers to several other engines to try out. So on with the show ...
Thursday, June 11, 2009
Monday, June 8, 2009
Baker's dozen: Powerset
[If you came here for a review of Powerset, you might also want to look into Wolfram Alpha]
Continuing the none-too-rigorous field test ...
When I first heard of Powerset, its big innovation seemed to be presenting not just raw results, but structured information in the form of "Factz". These were three-word sequences that were meant to summarize the information in an article. That was about a year ago. Since then, the Factz feature seems to have been toned down somewhat. The site itself looks slick, with various UI amenities and a custom style sheet for displaying articles. For whatever that's worth.
Powerset claims to answer questions posed in plain English, but it limits its scope to Wikipedia. As we've seen, this is not necessarily a great limitation, as a fair number of questions can be answered perfectly well by producing the relevant Wikipedia article. Powerset now also provides links to Bing. It's not often you see a search engine advertised on TV, but Microsoft is currently running a well-produced campaign for Bing.
Since the Powerset page links to Bing, I'll have a look there, too. Between the two, there should be coverage equivalent to Google's or Ask's. This should be the first real test in this series of a search 2.0 engine with questions that, as far as I can tell, ought to be right in its wheelhouse. So here goes:
- How much energy does the US consume?
- How many cell phones are there in Africa?
- When is the next Cal-Stanford game?
- When is the next Cal game?
- Who starred in 2001?
- Who starred in 2001: a Space Odyssey?
- Who has covered "Ruby Tuesday"?
- What kinds of trees give red fruit?
- Who invented the hammock?
- Who played with Miles Davis on Kind of Blue?
- How far is it from Bangor to Leeds?
- How far is it from Bangor to New York?
- How far is it from Paris to Dallas?
All in all, less than impressive.
In one case (2001), Powerset delivers an answer for which Google and Ask require a more specific query. In one case (cell phones), it delivers nothing where the others delivered a clear link to the answer. In one case (red fruit) it is somewhat less useful than the others. On the distance questions, where plain text search gave at least some moderately helpful answers and Google Maps did the serviceable job you'd expect, Powerset completely whiffed. Bing looks like slightly less of the same.
But the style sheets look nice.
Up next (after another brief interlude): Wolfram Alpha.
Thursday, June 26, 2008
Searching for a smarter search engine
One look at Google's quarterly reports should be enough to understand why people are still trying to build a better search engine. Google search does a great job. It will come as no shock that I've consulted it repeatedly in practically every post here. A friend once described it as adding (say) 25 points to his IQ, though not everyone agrees with that assessment.
I've cited Google as a classic case of "dumb is smarter". Google doesn't try to do anything one might consider "understanding" the material it's indexing. For the most part it just looks at words and the links between pages. There is some secret sauce involved, for example in handling inflections or making it harder to game the rankings. Mainly, though, Google wins because its PageRank algorithm turns out to do a good job of finding relevant pages and because it throws massive amounts of computing power at indexing everything in sight [There's a lot of secret sauce involved in getting that to work at the scale Google operates on].
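For anyone who hasn't seen it, the core PageRank idea fits in a few lines. What follows is a minimal textbook-style sketch, not Google's implementation (whose details, secret sauce included, aren't public). It assumes a small in-memory link graph given as a dict, the commonly quoted damping factor of 0.85, and at least one page; the function name `pagerank` is just for illustration. A page ranks highly when highly ranked pages link to it, and the ranks are found by repeatedly passing each page's rank along its outgoing links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by power iteration over `links`, a dict of page -> pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with rank spread evenly
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c: a, b and d all link to it
```

Note that nothing here looks at what the pages say, only at who links to whom, which is the "dumb is smarter" point in miniature.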
Google is the dominant search engine, but that doesn't mean there's no room for other engines, particularly engines that take a noticeably different approach or that try to solve a noticeably different problem. Powerset is one such engine. Rather than trying to index the entire web by keyword, Powerset answers English queries about material in Wikipedia. Without delving into a proper product review or comparison, which would have to include at least Google and, say, Ask (formerly Ask Jeeves), I'll just note a few impressions and head on to my real goal of blue-sky speculation [geek note: The "power set" of a set is the set of all that set's subsets; less formally, all the combinations of a given set of elements.].
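To make the geek note concrete, here is a small sketch that enumerates a power set. The helper name `power_set` is hypothetical; it is built on the standard library's `itertools.combinations` and assumes the input has no duplicate elements. A set of n elements has 2^n subsets, counting the empty set and the set itself.

```python
from itertools import chain, combinations


def power_set(items):
    """Return every subset of items as a frozenset, from the empty set to the full set."""
    items = list(items)
    # combinations(items, r) yields all r-element subsets; r runs from 0 to len(items).
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


print(len(power_set("abc")))  # 8, i.e. 2**3
```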
Suppose you want to know when John von Neumann was born. You ask "When was John von Neumann born?" Hmm ... oddly enough, it didn't answer that one directly. It did give the Wikipedia page for von Neumann, which gives the answer (December 28, 1903). "When was Mel Brooks born?" works more as intended, with a nice big "1926" at the top of the results. It also shows a link to a page that says 1928, but seems to know better than to believe it.
Other examples
- "Where is the world's tallest building?" turns up the list of tallest buildings.
- "What is the time zone for Afghanistan?" turns up a list of pages, the first of which mentions the right answer.
- "How much money has been spent on cancer research?" turns up a link giving a figure for the UK, but nothing suggesting an overall figure.
- "Why is there air?" brings up the Bill Cosby album of the same name.
It's not immediately clear what this is supposed to give me. Powerset says "For most people, places and things, Powerset shows a summary of Factz from across Wikipedia," and to illustrate this, it shows a section of a table of Factz about Henry VIII -- whom he married (wife, Anne Boleyn ...), what he dissolved (monasteries, Abbey ...) and so forth. Evidently Henry provides a better example than tall buildings do.
The Factz summary appears to be the sort of thing that Powerset is really driving at. It's certainly the sort of thing that initially drew me to take a look. Rather than just index words, Powerset attempts to extract meaning from the text and present it in a structured way. In other words, it tries to be smart and, in some limited sense, understand the material it's indexing. For example, along with the listing of three-part Factz, it will also display "things" and "actions", with items it deems more significant shown larger.
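As a rough illustration of what a summary table of three-part Factz might look like as a data structure, here is a toy sketch. It assumes Factz are plain (subject, action, object) triples, which is a guess; Powerset's actual extraction and storage aren't described here. The `FACTZ` data and the `summarize` helper are hypothetical, with entries mirroring the Henry and "dozens measure meter" examples from this post.

```python
from collections import defaultdict

# Hypothetical (subject, action, object) triples; these mirror the examples in
# the post and are not real Powerset output.
FACTZ = [
    ("Henry VIII", "married", "wife"),
    ("Henry VIII", "married", "Anne Boleyn"),
    ("Henry VIII", "dissolved", "monasteries"),
    ("Henry VIII", "dissolved", "Abbey"),
    ("dozens", "measure", "meter"),
]


def summarize(triples, subject):
    """Group the objects of every triple about `subject` under their action."""
    table = defaultdict(list)
    for subj, action, obj in triples:
        if subj == subject:
            table[action].append(obj)
    return dict(table)


print(summarize(FACTZ, "Henry VIII"))
# {'married': ['wife', 'Anne Boleyn'], 'dissolved': ['monasteries', 'Abbey']}
print(summarize(FACTZ, "dozens"))
# {'measure': ['meter']}
```

The second call shows how little a bare triple keeps: whatever qualified the original sentence is simply not part of the record.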
If we view this smarter approach as an attempt at understanding, however limited, then I'm not sure that the Powerset engine understands all that much. It seems pretty good at distinguishing nouns from verbs, but beyond that, I'm not sure what "dozens measure meter" really signifies. Even in a seemingly simple factual statement like the one quoted, there is more going on than "dozens" "measuring".
It matters that it's dozens of towers, not dozens of meters (or dozens of eggs). It matters that the towers measure more than 600m tall and not less. It matters that the towers are being judged tallest in the limited context of "absolute height". It matters that this is "current", since the Burj Dubai, when completed, will be the tallest completed structure, period. This matters particularly because much of the article is spent wrangling over the meaning of "tallest", a debate which will soon be moot, at least for a while. The Factz approach appears to miss all this, none of which is particularly subtle from a human point of view.
Google, in the meantime, doesn't try to do any of this, but seems to do just fine on the queries above, given verbatim (not in googlese, and without quotes). For "What is the time zone for Afghanistan?" for instance, it said "Time Zone: (UTC+4:30) according to Wikipedia" right at the top. And, of course, Google indexes the entire web ("entire web" defined as "anything you can google"), in part because it doesn't spend a lot of time trying to extract meaning. As for the structured view, Wikipedia pages are already outlined, and I'm not sure what Factz give me that ordinary text search doesn't.
Ah well. Understanding natural language isn't just a hard problem, it's a collection of several hard problems, and not a particularly well-defined collection at that.
I don't want to leave the impression that Powerset is useless, and I particularly don't want to denigrate the effort behind it. In fact, I'd encourage people to at least try it. Tastes vary, and some may well find Powerset a nicer way to navigate Wikipedia. Nonetheless, Powerset only serves to confirm my impression that dumb is indeed smarter, and that Google's "we don't even pretend to understand what we're indexing" approach sets the bar remarkably high.
