Powerset is slowly coming out of stealth mode. Dan Farber at ZDNet has an excellent in-depth story on Powerset and Powerlabs, the company's approach to involving the community in developing and QA'ing the Powerset search engine index. It is a novel idea: empowering the open source community to join the fight to beat Google. Open source contributors love to join a crusade, contribute their skills, and change the world.
Steve Newcomb, co-founder of Powerset, said in the ZDNet story: “We want as many people in Powerlabs (as possible) to help us build and test the product. Powerlabs tells us when we are ready to go. We could have 50,000 people QAing our product,” he added. So far Powerset has 10,000 Powerlabs users. “Imagine how many widgets that could sit inside of Facebook, MySpace and even Second Life. It gives us the ability to launch with an extremely passionate set of people.”
Powerset is using linguistics and Natural Language Processing (NLP) to better understand the meaning and context of search queries. But the real power of Powerset is applied to the search index, not the query. Millions of web pages are indexed in the traditional way, but Powerset also analyzes each page for "semantics": context, meaning, similar words, and categories. All of this contextual metadata is added to the search index so that search queries can find better results.
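To make the idea of index-time annotation concrete, here is a minimal sketch. It is not Powerset's implementation, which is proprietary. The `TOY_LEXICON` dictionary, the `tag:` prefix, and the `build_index` function are all made up for illustration. The sketch builds an ordinary keyword index and, alongside each keyword, stores hand-assigned semantic tags for the same page.

```python
from collections import defaultdict

# Hypothetical hand-made lexicon: surface word -> semantic tags.
# A real system would derive tags from linguistic analysis, not a dict.
TOY_LEXICON = {
    "soccer": {"sport", "ball_sport"},
    "baseball": {"sport", "ball_sport"},
    "striker": {"player"},
    "pitcher": {"player"},
}

def build_index(pages):
    """Map each word and each semantic tag to the page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(".,!?\"'")
            if not word:
                continue
            # Traditional keyword posting.
            index[word].add(page_id)
            # Extra semantic metadata stored next to the keyword postings.
            for tag in TOY_LEXICON.get(word, ()):
                index["tag:" + tag].add(page_id)
    return index

pages = {
    "p1": "The striker scored twice in the soccer final.",
    "p2": "Stock markets fell on Tuesday.",
}
index = build_index(pages)
```

With this toy index, a lookup of `index["tag:ball_sport"]` finds page `p1` even though the word "ball" never appears on that page. The work is done once at indexing time rather than on every query.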
Take the query "Who is the best ballplayer of all time?" Powerset breaks this query down very carefully using linguistic ontologies and all sorts of proprietary rules. For example, it knows that "ballplayer" relates to sports, and specifically to the category of sports that involve a ball: baseball, basketball, soccer, and football. Note that "soccer" does not contain the word "ball", yet Powerset knows it is a sport played with a ball.
Powerset also knows that "ballplayer" can mean an individual player of a sport that includes a ball, and that "best of all time" refers to history, not to time in the clock sense.
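The breakdown above can be sketched with a toy ontology lookup. Everything here is an assumption for illustration: the `BALL_SPORTS`, `TERM_CONCEPTS`, and `PHRASE_CONCEPTS` tables and the `interpret` function are hypothetical, and simple substring matching stands in for real linguistic parsing.

```python
# Hypothetical toy ontology; every entry is made up for illustration.
BALL_SPORTS = {"baseball", "basketball", "soccer", "football"}

# Single words mapped to the concepts they evoke.
TERM_CONCEPTS = {
    "ballplayer": {"role": "player", "sports": BALL_SPORTS},
}

# Multi-word phrases whose meaning differs from their individual words.
PHRASE_CONCEPTS = {
    "of all time": {"scope": "historical"},
    "best": {"rank": "superlative"},
}

def interpret(query):
    """Return a dict of the concepts recognized in the query."""
    text = query.lower().strip(" ?!.")
    meaning = {}
    # Phrase-level rules: "of all time" is about history, not clocks.
    for phrase, concept in PHRASE_CONCEPTS.items():
        if phrase in text:
            meaning.update(concept)
    # Word-level rules: expand "ballplayer" into a role and a sport category.
    for word in text.split():
        if word in TERM_CONCEPTS:
            meaning.update(TERM_CONCEPTS[word])
    return meaning

result = interpret("Who is the best ballplayer of all time?")
```

Here `result["sports"]` contains "soccer" even though the query never mentions it, and `result["scope"]` is "historical". A query such as "what time is it" matches none of the rules and yields an empty dict, which is the distinction between the two senses of "time" that the text describes.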
Why hasn't this been done before? Powerset uses all these rules and linguistic approaches to analyze millions, even billions, of web pages, and adds metadata hooks to each word on each page. As you can imagine, this is a huge scaling problem that has been impossible to solve economically. With Moore's Law steadily driving down the cost of computing, storage, and bandwidth, it is now possible to solve this problem, and within a few years it will be economically viable. Powerset may sustain some losses in the early years, but economics and scale are on its side.
Remember "Why 1% of search market share is worth over $1 Billion": this is a huge market with staggering growth. Powerset is banking on the falling cost curve and the rising revenue curve. Both curves are moving fast. Grabbing 1% of market share can pay for a lot of early costs.