Powerset is slowly coming out of stealth mode. Dan Farber at ZDNet has an excellent in-depth story on Powerset and PowerLabs, their approach to including the community in developing and QA'ing the Powerset search engine index. It is a novel idea, empowering of the open source community to join the fight to beat Google. Open source contributors love to join a crusade, contribute their skills, and change the world.
Steve Newcomb, co-founder of Powerset, said in the ZDNet story: “We want as many people in Powerlabs (as possible) to help us build and test the product. Powerlabs tells us when we are ready to go. We could have 50,000 people QAing our product,” he added. So far Powerset has 10,000 Powerlabs users. “Imagine how many widgets that could sit inside of Facebook, MySpace and even Second Life. It gives us the ability to launch with an extremely passionate set of people.”
Powerset is using linguistics and Natural Language Processing (NLP) to better understand the meaning and context of search queries. But the real power of Powerset is applied to the search index, not the query. Millions of web pages are indexed in the traditional way, but Powerset also analyzes each page for "semantics": context, meaning, similar words, and categories. All of this contextual metadata is added to the search index so that search queries can find better results.
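To make the index-side idea concrete, here is a minimal sketch of an inverted index that stores ordinary keyword postings alongside concept postings. It is only an illustration: the `CONCEPTS` table, the `concept:` prefix, and the `build_index` function are made up for this example and say nothing about how Powerset actually builds its index.

```python
from collections import defaultdict

# Hypothetical, hand-written annotations. A real system would derive these
# from linguistic analysis of each page, not from a tiny lookup table.
CONCEPTS = {
    "soccer": {"sport", "ball sport"},
    "baseball": {"sport", "ball sport"},
    "chess": {"game"},
}

def build_index(docs):
    """Map each word, and each concept attached to a word, to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            word = token.strip(".,!?\"'")
            if not word:
                continue
            # Traditional keyword posting.
            index[word].add(doc_id)
            # Extra "semantic" postings added at index time.
            for concept in CONCEPTS.get(word, ()):
                index["concept:" + concept].add(doc_id)
    return index

docs = {
    1: "Pele is a soccer legend.",
    2: "Babe Ruth changed baseball.",
    3: "Kasparov dominated chess.",
}
index = build_index(docs)
ball_sport_pages = sorted(index["concept:ball sport"])
```

With this toy data, a lookup on `concept:ball sport` returns pages 1 and 2 even though neither page contains the word "ball". The cost is that every page has to be analyzed and annotated before any query arrives.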
Who is the best ballplayer of all time? Powerset breaks this query down very carefully using linguistic ontologies and all sorts of proprietary rules. For example, they know that "ballplayer" relates to sports, and that sports can be separated into categories, including those that involve a ball: baseball, basketball, soccer, and football. Note that soccer does not include the word ball, yet Powerset knows this is a sport that includes a ball.
Powerset knows that "ballplayer" can mean an individual player of a sport that includes a ball. They know that "best of all time" means history, not time in the clock sense.
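The query breakdown described above can be sketched with a toy rule set. Everything here is an assumption made for illustration: the `BALL_SPORTS` set, the `interpret` function, and the three hard-coded rules are hypothetical, and Powerset's real ontologies and rules are proprietary.

```python
# Hypothetical mini-ontology: sports that are played with a ball.
# Note that "soccer" qualifies even though the word has no "ball" in it.
BALL_SPORTS = {"baseball", "basketball", "soccer", "football"}

def interpret(query):
    """Turn a natural-language query into a small structured reading."""
    q = query.lower().strip(" ?")
    words = q.split()
    reading = {}
    if "ballplayer" in words:
        # An individual player of some sport that involves a ball.
        reading["entity"] = "athlete"
        reading["sports"] = sorted(BALL_SPORTS)
    if "of all time" in q:
        # "Time" in the historical sense, not the clock sense.
        reading["time_scope"] = "history"
    if "best" in words:
        reading["ranking"] = "superlative"
    return reading

reading = interpret("Who is the best ballplayer of all time?")
```

A hand-coded table like this only covers the words someone thought to add, so the sketch shows the shape of the idea rather than a workable design.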
Why hasn't this been done before? Powerset uses all these rules and linguistic approaches to analyze millions, even billions, of web pages, and adds "metadata" hooks to each word on each page. As you can imagine, this is a huge scaling problem that has been impossible to solve economically. With Moore's Law steadily reducing the cost of computing, storage, and bandwidth, it is now possible to solve this problem, and within a few years it will be economically viable. Powerset may sustain some losses in the early years, but economics and scale are on their side.
Remember "Why 1% of search market share is worth over $1 Billion": this is a huge market with staggering growth. Powerset is banking on the falling cost curve and the rising revenue curve. Both curves are moving fast. Grabbing 1% of market share can pay for a lot of early costs.
Hi Don, you may want to check out this post (http://tinyurl.com/3y4r34) by Aydin Senkut (an investor in Powerset). As I commented there, **I think** the problem in this business is far from being computational. At hakia, we have an efficient way of indexing meaning data (we call it qdexing). You may want to see hakia Labs (http://labs.hakia.com/) for a demo. But the power of the masses is a good approach indeed; I always advocate supporting complex algorithms with human input.
Posted by: Emre Sokullu | June 29, 2007 at 04:51 PM
Powerset might beat Google. And you work for Microsoft. Hmmm ...
Are you admitting that Live Search doesn't matter, that it's all about Google?
Or, should we be expecting a Microsoft acquisition of Powerset?
Posted by: David Scott Lewis | June 30, 2007 at 05:00 AM
Mr. Lewis, You jump to all sorts of conclusions...all of them wrong.
Live Search definitely matters. It has 10% market share which is worth about $10 Billion in market cap...that matters.
Powerset, and every other search startup, has their eyes set on beating Google. Beating Google in this sense means taking a couple points of market share away from Google. It doesn't mean becoming the number 1 search engine and driving Google to zero.
Do you really think that I would be writing about Powerset on my blog if Microsoft was about to acquire the company?
You got one thing right, I do work for Microsoft, which I clearly highlight in the headline and my bio.
Posted by: Don Dodge | June 30, 2007 at 09:53 AM
Interesting post, both from a search perspective and from a business perspective. Going for a small market share that's worth a lot as a primary objective is something I'm not sold on. Think you have to aim for the moon to land amongst the stars.
The iPhone, I think, is a perfect example: although Jobs et al. seem to be setting expectations at a "measly" 10%, you know that the aim has to be to dominate the market.
Think Powerset needs to have a similar aim to really be successful. Aim for the best product, the best search and to dominate the market and forget about small percentages - no matter how much financial success comes with the small slice.
Posted by: Farhan Lalji | June 30, 2007 at 12:11 PM
I can't understand all this hype about this Powerset search engine that no one has ever used and of which nobody has seen anything other than 2 or 3 screenshots.
I am not saying the Powerset/Hakia approach is wrong, on the contrary, but this is too much hype for not much yet.
Especially, if PARC, who gave away their technology to Powerset, had the killer app, wouldn't they have done something with it beforehand?
http://www.powerset.com/press/parc
Anyway, great to read your analysis Don in general.
Posted by: Pascal | June 30, 2007 at 02:02 PM
Gee whiz, I thought you were going to use the "ballplayer" example to show how fundamentally difficult, if not impossible, the "understanding" problem is.
The fact is, in the US and many other cultures, to sports watchers -- that is, anyone who would ask that question -- "ballplayer" absolutely, unambiguously, unquestionably means "BASEBALL player", not basketball player, not football player, certainly not soccer player. Likewise, "let's play ball", "let's have a ballgame" and others also unambiguously mean baseball. But "let's toss a ball around" could mean a football or basketball, though there is probably a tilt toward a baseball. But actually, it also depends on the weather, the venue (indoors or outdoors, in a gym, on a court, on a grassy field), and probably other factors.
"Ballplayer" and the other terms may mean something else in other cultures, or nothing at all. So it's really an example of the ~limitations~ of semantic analysis of just the query, apart from the context. You need to code that specific word, AND you need to know the cultural background of the person who asked the question, and indeed, the actual, of-the-moment context.
Posted by: David Lewis | July 01, 2007 at 06:29 PM
Any time someone tells me about a search engine to beat Google I just say, What's the query? and type it into Google. I suggest you do that with "Who is the best ballplayer of all time?" Next question, please.
Posted by: Machefsky | July 02, 2007 at 04:50 PM
I don't really understand the open source angle to this article. "Empowering of the open source community"?
The reality is that Google and its employees are top contributors to open source, and in their own words Powerset's proposed solution is fundamentally proprietary. Although, like Google, there are specific projects that open source participants can get excited about, the actual product isn't one of them.
Posted by: Lloyd Budd | October 29, 2007 at 03:01 PM
This is a great read! I heard about Powerset, but knew nothing about it until now. It will be interesting to see where this goes.
Posted by: Alan Daniels | July 02, 2008 at 03:40 PM