Powerset, a natural language search engine, has just emerged from stealth mode. Why the stealth? Natural language search has been around for more than 10 years. As many of you know, I was a director of engineering at AltaVista almost 10 years ago. Over the years I have seen lots of "new" approaches to search: semantic search, Q&A search, contextual search, visual search, social search, iterative search, personal search, and lots of others. None of them has caught on.
Danny Sullivan of Search Engine Watch fame agrees:
"Wow, so natural language searching is going to be the killer knockout? The search space is littered with companies that have promised this was somehow going to be a great advance but never went anywhere.
The reason is simple. You don’t need to do a lot of conceptual analysis when the typical search query is two to three words long."
Web search vs. enterprise search
Most search startups focus on consumer web search because Google has made a fortune doing it. But these new approaches would probably be better received in enterprise search where the problems are different and more precision is valued.
Consumer web search still consists largely of two- to three-word queries, where the user wants speed and relevance applied to popular content. We tried "natural language search" at AltaVista with poor results. We thought that allowing users to enter queries in natural conversational language would yield much better results, or at least give us more words to work with. In fact, users still entered short queries, padded with "stop words" that added no value. See my prior post "The Long Tail of Words" to learn how search engines prioritize words in a query.
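To make the stop-word point concrete, here is a minimal sketch of what is left of a conversational query once stop words are dropped. The stop-word list and the function name `content_terms` are my own illustrative assumptions, not what AltaVista or any other engine actually used.

```python
import string

# Hypothetical stop-word list for illustration; real engines tune much larger lists.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "is", "are",
    "what", "how", "do", "i", "about", "where", "by",
}

def content_terms(query):
    """Return the query's words in order, with punctuation and stop words removed."""
    terms = []
    for raw in query.lower().split():
        word = raw.strip(string.punctuation)
        if word and word not in STOP_WORDS:
            terms.append(word)
    return terms

print(content_terms("What is the capital of France?"))
```

With this list, the six-word question collapses to two terms, `capital` and `france`, which is about what the user would have typed as a keyword query anyway.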
Enterprise search is different. The information is found in hundreds of different internal databases, and in many cases it is "unstructured" data. There are no links from "authoritative" sources so the "PageRank" system and similar ranking systems don't work. Enterprise search users want to find everything related to a term, sometimes referred to as semantic or contextual search. They are willing to wait more than one second for more complete results. They are willing to enter more words and better clues.
How does natural language search work? There is a lot of linguistic rocket science to this, but basically it breaks the search problem into two parts. The first is understanding the intent of the query by using natural language processing (NLP). The second is training the search index algorithm to parse the structure and context of individual web sites.
NLP infers the "intended meaning" of common words like by, in, about, where, and how, and applies this meaning to the rest of the words in the query. In the background, the NLP engine converts the "conversational English" query into a highly structured query that its search index can understand.
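As a rough illustration of that conversion, the toy sketch below treats a few function words as markers that assign a role to the words following them. The role names, the word lists, and the function `to_structured_query` are all assumptions made up for this example; a real NLP engine does far more than this, and this is not how Powerset or anyone else implements it.

```python
# Hypothetical mapping from function words to the role of the phrase that follows.
ROLE_BY_FUNCTION_WORD = {
    "of": "object",
    "about": "topic",
    "by": "author",
    "in": "place",
}
ARTICLES = {"a", "an", "the"}

def to_structured_query(query):
    """Split a conversational query into role -> phrase pairs using function words."""
    phrases = {"target": []}
    current_role = "target"  # words before any function word describe what is wanted
    for word in query.lower().split():
        if word in ARTICLES:
            continue
        if word in ROLE_BY_FUNCTION_WORD:
            current_role = ROLE_BY_FUNCTION_WORD[word]
            phrases.setdefault(current_role, [])
            continue
        phrases[current_role].append(word)
    # Drop roles that ended up with no words.
    return {role: " ".join(words) for role, words in phrases.items() if words}

print(to_structured_query("synopsis of books about the civil war"))
```

For that query the sketch produces a target of `synopsis`, an object of `books`, and a topic of `civil war`. The point is only that the small words carry the structure, which a plain keyword engine throws away.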
The search index "learns" the structure and context of individual sites. For example, if you study a news site like C/Net you will see that all stories follow a similar structural format: headline, reporter name, date, location, teaser sub headline, body text, references, links, etc. A sports-related site might have a slightly different structure and often includes information like teams, players, scores, statistics, locations, etc. Expert linguists spend lots of time analyzing sites to understand structure and context.
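One way to picture what the index gains from knowing a site's structure is to store each story as a record with named fields and match a query against specific fields instead of the whole page. The sketch below assumes a hypothetical `NewsStory` record and `matches` helper, and the sample story is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class NewsStory:
    """Fields taken from the news-site structure described above."""
    headline: str
    reporter: str
    date: str
    location: str
    body: str
    links: list = field(default_factory=list)

def matches(story, **criteria):
    """True if every named text field contains the requested text (case-insensitive).

    Only the string fields are supported here; `links` is not searchable in this sketch.
    """
    return all(
        wanted.lower() in getattr(story, name).lower()
        for name, wanted in criteria.items()
    )

# Invented sample record, for illustration only.
story = NewsStory(
    headline="Search startup leaves stealth mode",
    reporter="Jane Doe",
    date="2006-10-05",
    location="San Francisco",
    body="The company says its engine understands conversational queries.",
)

print(matches(story, reporter="jane", location="san francisco"))
print(matches(story, reporter="conversational"))
```

The first call is true and the second is false: "conversational" appears in the body but not in the reporter field, a distinction a flat keyword index cannot make.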
The problem with NLP search is that it doesn't work well for new web sites or dynamic content that it has not been trained to parse. Back in the early days they could only afford to train the NLP engine on a very specific set of sites. It worked amazingly well for those sites, but they were limited in number. I am sure the science has progressed in ten years to allow more rapid and precise "learning" of individual sites.
My advice is that Powerset and other "alternative" search engines focus on enterprise search or specialized vertical search markets. Their unique technologies will be more appreciated and valued by those customers. If they are really stuck on doing consumer search, then they should try to specialize in News Search, People Search, Medical Search, or some other vertical where their power can be an advantage.
UPDATE: Jim Kellerman sent me an email saying he is working on the Powerset team. Jim sat in the office next to me at AltaVista many years ago. I have a lot of respect for Jim and he says they have something good going on at Powerset.
UPDATE2: What could an NLP search engine like Powerset do to improve the results of the query "synopsis of books about the civil war"? Barney Pell, founder of Powerset, uses this query as an example in his blog.
Take a look at these results from Microsoft's Live Search. Better than Google in my opinion.
http://search.msn.com/results.aspx?q=synopsis+of+books+about+the+civil+war&FORM=MSNH
Today's search engines do a great job of finding "books with a synopsis of the civil war", but not a "synopsis of books...". Understanding the query was not the problem. Aggregating the results and presenting them as a synopsis is what is not possible today. I don't see how NLP search engines would improve the results. Am I missing something?
Don, I think your description of how the index side of an NLP search engine works is a little off the mark. You describe, in part, document analysis (understanding the logical structure of a page), which is an important part, but that is not the central NLP part. Document analysis has come a long way and there is certainly no need for expert linguists to be spending time encoding the structure of individual web sites. This is not a fundamental barrier to NLP-based search.
Posted by: Matthew Hurst | October 05, 2006 at 08:07 PM
+1
Posted by: arcade flash games | July 10, 2012 at 06:04 PM