Yesterday Microsoft announced the acquisition of Powerset, a natural language search engine. Techmeme has lots of stories about the acquisition. Today the stories are all about potential Microsoft/Yahoo deals. But let's take a step back and look at how Powerset works and why it is an important development in search.
There are two key things to consider:
- Powerset technology is more about indexing content and understanding its meaning than about the query itself. This has enormous implications.
- There are many lucrative markets for this technology, not just consumer web search.
How does natural language search work? There is a lot of linguistic rocket science to this, but basically it breaks the search problem into two parts. First, it uses natural language processing (NLP) to understand the intent of the user's query. Second, it trains the search indexing algorithm to parse the structure and context of individual web sites, and adds metadata about their pages to the index.
Powerset uses linguistics and NLP to better understand the meaning and context of search queries. But the real power of Powerset is applied to the search index, not the query. Billions of web pages are indexed in the traditional way. The big difference is in the post-processing of the index. Powerset analyzes the indexed pages for "semantics", context, meaning, similar words, and categories. It then adds all of this contextual metadata to the search index so that search queries can find better results.
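To make the post-processing idea concrete, here is a minimal sketch of a keyword index that gets a second pass adding concept metadata. Everything in it is an assumption for illustration: the tiny `CONCEPTS` table, the sample pages, and the function names are hypothetical, and Powerset's actual pipeline is proprietary and far more sophisticated.

```python
from collections import defaultdict

# Hypothetical toy ontology: surface word -> broader concepts.
# A real system would derive this from linguistic resources, not a hand-written dict.
CONCEPTS = {
    "soccer": {"sport", "ball_sport"},
    "baseball": {"sport", "ball_sport"},
    "basketball": {"sport", "ball_sport"},
    "chess": {"game"},
}

PUNCTUATION = ".,!?\"'"


def tokenize(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_index(pages):
    """Step 1: a traditional inverted index, word -> set of page ids."""
    keyword_index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            keyword_index[word].add(page_id)
    return keyword_index


def annotate_index(keyword_index):
    """Step 2: post-process the index, adding concept -> set of page ids."""
    concept_index = defaultdict(set)
    for word, page_ids in keyword_index.items():
        for concept in CONCEPTS.get(word, ()):
            concept_index[concept] |= page_ids
    return concept_index


pages = {
    "p1": "Pele was a great soccer player.",
    "p2": "Babe Ruth changed baseball.",
    "p3": "Kasparov dominated chess.",
}
keyword_index = build_index(pages)
concept_index = annotate_index(keyword_index)
```

In this sketch, neither p1 nor p2 contains the term "ball_sport", yet both become findable under that concept after the second pass. That is the sense in which the extra metadata lives in the index rather than in the query.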
Who is the best ballplayer of all time? Powerset breaks this query down very carefully using linguistic ontologies and all sorts of proprietary rules. For example, it knows that "ballplayer" can relate to sports, and that sports can be separated into categories that involve a ball: baseball, basketball, soccer, and football. Note that "soccer" does not include the word "ball", yet Powerset knows this is a sport that includes a ball.
Powerset knows that "ballplayer" can mean an individual player of a sport that includes a ball. It knows that "best of all time" refers to history, not time in the clock sense.
Powerset understands the intent of the query, but more importantly, it understands the meaning and context of all the relevant web pages. Rather than just match keywords from the query, Powerset looks for "semantic" matches in its index of billions of web pages.
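The ballplayer example can be sketched as concept matching instead of keyword matching. This is a toy illustration under stated assumptions: the `TERM_CONCEPTS` and `PHRASE_CONCEPTS` tables, the sample pages, and the scoring by concept overlap are all hypothetical stand-ins, not Powerset's actual rules.

```python
# Hypothetical mappings from words and phrases to concepts.
TERM_CONCEPTS = {
    "ballplayer": {"athlete", "ball_sport"},
    "soccer": {"ball_sport"},
    "baseball": {"ball_sport"},
    "striker": {"athlete"},
    "pitcher": {"athlete"},
    "best": {"superlative"},
    "greatest": {"superlative"},
    "history": {"history"},
    "clock": {"timekeeping"},
}
# Multi-word phrases are resolved first, so "time" in "of all time"
# is read as history and never as clock time.
PHRASE_CONCEPTS = {"of all time": {"history"}}


def concepts_for(text):
    """Map a query or a page to the set of concepts it mentions."""
    lowered = text.lower()
    found = set()
    for phrase, concepts in PHRASE_CONCEPTS.items():
        if phrase in lowered:
            found |= concepts
            lowered = lowered.replace(phrase, " ")
    for word in lowered.split():
        found |= TERM_CONCEPTS.get(word.strip(".,?!"), set())
    return found


def rank(query, pages):
    """Order pages by how many concepts they share with the query."""
    query_concepts = concepts_for(query)
    scored = [
        (len(query_concepts & concepts_for(text)), page_id)
        for page_id, text in pages.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for score, page_id in scored if score > 0]


pages = {
    "p1": "The greatest soccer striker in history.",
    "p2": "How to set the time on a wall clock.",
    "p3": "A baseball pitcher with a strong arm.",
}
results = rank("Who is the best ballplayer of all time?", pages)
```

Here the soccer page ranks first even though it never says "best", "ballplayer", or "of all time", while the wall clock page is dropped even though it contains the word "time".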
Why hasn't this been done before? Powerset uses all these rules and linguistic approaches to analyze billions of web pages, and adds "metadata" hooks to each word on each page. As you can imagine, this is a huge scaling problem that has been impossible to solve economically until now. With Moore's Law constantly driving down the cost of computing, storage, and bandwidth, it is now possible to solve this problem, although it is still very expensive.
Where else could Powerset technology be used? Consumer web search is one obvious market, but there are many more.
- Enterprise search. Companies searching their own internal documents and information would benefit from Powerset technology. Enterprise search is a multi-billion dollar market.
- Ad targeting. It is easy to target ads to search terms, but extremely difficult to target ads to general web content or user-generated content. Today's ad targeting technology does a poor job of understanding the context and meaning of a news article, a blog post, or a magazine article. Powerset technology could be used to better understand this content and target more relevant ads. This is a multi-billion dollar opportunity.
- Vertical search. News search, medical search, people search, resume search, and basic knowledge search could be dramatically improved with Powerset's "semantic" search indexing.
I am sure you can think of many other places Powerset's semantic technology could be used. The technology will improve and expand over time. There is enormous potential here, more than a small startup with limited funding could hope to address. This is why Powerset joining forces with Microsoft makes so much sense.