
February 09, 2007


Listed below are links to weblogs that reference Powerset and Xerox PARC team up to beat Google:

» TechCrunch40 - The first 10 companies from Don Dodge on The Next Big Thing
The first 5 companies have presented. I will update this post with details on the next 5 companies after [Read More]



Excellent ideas! You're spot on with this. It's naive of Powerset to think they can beat Google with technology that Google will replicate sooner rather than later if it's that good. A very wide range of variables made Google what it is; simply having incrementally better tech won't get Powerset far. They should have stayed under the radar and worked on the kind of strategy you've suggested.


Excellent ideas! Competing is about problem solving, so they should aim their expertise at problems where their solution brings more value.

But I wonder why MS doesn't consider buying Powerset?

allan isfan


I've been a faithful reader of your blog since last fall, when an old colleague of yours, JG, suggested I check it out. That is a good indicator of the power of personal recommendation. Your insights are bang on, and it makes me wonder why I don't see the same in many MS products?

Your insights have helped me tremendously with some of the concepts of the start-up I'm pulling together and I want to thank you. Relevant ads is where it's at and you have specifically helped me with this key realization. Keep it up knowing that you are truly helping some new entrepreneurs.

Tim Estes


Great thoughts on this, with some clear wisdom they should listen to.

The problem of NLP is:

(1) Scale
(2) Noisy Data
(3) Learning Novel Ideas

Norvig and others on the statistical NLP side are skeptical because they have to deal with all of the above problems (as do those of us who work in the deep areas of Defense/Intel analysis) and have never seen anything like what Powerset is proposing work at scale, with noisy data, and with emerging concepts.

Xerox PARC, while a big name, has no magic bullet for these problems. Look into the NIMD program with the office of the Intel community formerly known as ARDA. The early PARC work in this area in the 90s (which went to Inxight) never worked well on noisy data and required a whole other company (ClearForest) to get the extraction technology to work on non-trivial categories. All of it is highly supervised work vs. unsupervised algorithms, which tends to bite you big time at web scale because of the massive false positives that follow from the classifier error rate (a 99.9% accurate classifier in extraction of natural language will still produce about 1,000 false positives to at most 10 true positives if 1,000,000 targets are considered and only 10 real events like that exist).
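The false-positive arithmetic in the paragraph above can be checked with a few lines of Python. This is only a rough sketch: the function name `expected_counts` is made up for illustration, and it assumes that "99.9% accurate" means the classifier is right 99.9% of the time on both real events and non-events, which the comment does not spell out.

```python
def expected_counts(n_targets, n_real, accuracy):
    """Expected true positives, false positives, and precision.

    Assumption (not stated in the comment): the classifier has the same
    accuracy on real events and on non-events.
    """
    true_pos = n_real * accuracy                          # real events correctly flagged
    false_pos = (n_targets - n_real) * (1.0 - accuracy)   # non-events wrongly flagged
    precision = true_pos / (true_pos + false_pos)         # share of flags that are real
    return true_pos, false_pos, precision


# Numbers from the comment: 1,000,000 targets, 10 real events, 99.9% accuracy.
tp, fp, precision = expected_counts(1_000_000, 10, 0.999)
print(f"true positives:  {tp:.2f}")
print(f"false positives: {fp:.2f}")
print(f"precision:       {precision:.4f}")
```

Under that assumption, roughly 99 out of every 100 flagged items are false alarms, which is the base-rate problem the comment describes.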

This can be compensated for, but generally only by layering additional classification tests as filters, which requires a manually built concept hierarchy. And that is where it all starts to break at web scale on noisy data: you extract a very limited class of good language matches that cover the semantic area, but only when they are fairly well formed and in the expected frame. Your recall drops to a small fraction (1/1000 to 1/10000) of what is really out on the web that is relevant. That makes it a lot worse than Google for those cases, which are the majority of search use cases right now. Of course, no one ever DEMOs recall - just the salient matches on a small test set. :-)

One of the other real problems of NLP for search is that it raises the bar of expectation for the end user, so that when it fails (which it will, and a lot), it destroys the user's confidence in the system as an automated abstraction layer. This is something we've seen in a big way when supporting Intelligence Analysts: you have to be consistently good to a level of expectation on EVERY CONCEPT or they stop trusting you altogether. Taken with the above paragraph, you can see the Achilles heel: I can only trust the machine to abstract for me if it really understood everything and replaced my need to read all of the relevant sources I could get to. Otherwise, it is just a filter on the data.


Google is McDonald's - consistently above average, sometimes surprisingly good, and, on occasion, laughably bad. Web-scale NLP engines (such as Powerset and Hakia) are like temperamental chefs - sometimes extraordinary, but often disappointing and almost never worth the recommendation to others.

Matthew Hurst

Dan - it's amazing to me that pundits have concentrated on the idea of parsing *queries* :

"Rather than focusing all this NLP power on understanding the typical 2 or 3 word search query, why not help advertisers better target their ads on unstructured content?"

Couldn't there be something else out there to parse like, say, the entire web? To think that NLP search is about parsing queries (alone) is to miss the point entirely.


Is this technology going to make end users a little paranoid? Some users were paranoid when Google started putting targeted ads in GMail by picking up key words. Powerset is talking about actually understanding what emails are about. This is huge.

Or will users get over it like they did for GMail?


It's worth noting that Google didn't invent the search ad auction -- that was Overture, which was bought by Yahoo. Overture sued Google over patent infringement, and Google settled for roughly $300 million in Google stock in 2004[1]. Google changed the auction mechanism (in a very important way), but they didn't invent the model.

[1] http://www.thestreet.com/tech/georgemannes/10177217.html

The comments to this entry are closed.
