Powerset, a natural language search engine startup, announced a deal with Xerox PARC to acquire its Natural Language Processing (NLP) technology. The New York Times has a lengthy article with particulars about the deal with Xerox, Powerset's VC investors, and quotes from Google.
VentureBeat has seen a demo and is very impressed. VentureBeat says "The technology, developed at Palo Alto Research Center (PARC) in Silicon Valley, seeks to understand the meanings between words, akin to the way humans understand language — and is thus called “natural language.” It has been thirty years in the works."
Danny Sullivan, a world-renowned search engine analyst, is skeptical and tired of the hype such startups receive from the press. Danny says, "Natural language search makes a compelling pitch for those who really don't know search or haven't heard the natural language mantra before. I've seen the pitch time and time again."
I was also skeptical the first time I heard about Powerset.
OK readers, so do you want some insight and opinion? I don't like to write short, link-filled ramblings with no insights. I like longer ones :-) This will take about 5 minutes. So sit back and take your finger off the delete button. :-)
How do you beat a giant like Google? The same way Google beat AltaVista, Yahoo, Excite, and AOL more than 6 years ago. Change the game.
Google was a clutter-free white page with just a search box, when everyone else was a portal with tons of content, news, sports, weather, maps, services, partnerships, flashing banner ads, and...oh yes, I almost forgot, a tiny little search box. Google focused on search while everyone else was trying to be a portal like AOL.
Google also changed the advertising game. Everyone else had random, non-targeted banner ads, flashing GIFs, pop-ups, and interstitials. It was disgusting. Google popularized small text ads, similar to newspaper classified ads, that were targeted to your search query, and placed them in the right sidebar, out of the way. The ads were actually useful and relevant...and people clicked on them.
Google changed the revenue model. All the other portals (with a search box) were selling sponsorships and display advertising deals. Companies like E*Trade would pay $10 million to sponsor the Finance section. Then a mortgage company would pay $40 million to be the exclusive sponsor. Hilton Hotels would pay millions to sponsor the travel section. You get the idea. Google popularized the ad auction model, where millions of small advertisers bid to win a particular keyword. It turns out that there is a "Long Tail" of advertisers too...who are willing to pay a lot of money to be found.
What does all this have to do with Powerset? It is a reminder that winning isn't all about the technology. In fact, Google didn't invent search, didn't invent classified advertising, and didn't invent the ad auction model. They changed the game with small innovations that made a big difference to users and advertisers.
What should Powerset do? There are no easy or obvious answers...they only appear easy 10 years later. Powerset's strength is in Natural Language Processing, or understanding the meaning and context of words. Lots of words, like those found in a magazine, textbook, or newspaper article. Rather than focusing all this NLP power on understanding the typical 2 or 3 word search query, why not help advertisers better target their ads on unstructured content?
What is Yahoo's problem? Untargeted traffic. Yahoo has tons of traffic to its home page and Yahoo Mail, but no way to effectively target ads. AOL, MSN, and every other portal have this same problem. So they all end up selling low cost, low margin CPM ads rather than high margin Pay Per Click (PPC) ads like Google. If Powerset technology could be used to "understand" the context and meaning in an email message, they could effectively target ads...and triple the portal's revenues. If Powerset could scan a portal's dynamic home page, or each user's personalized home page, they could better target ads.
Where else is ad targeting difficult? Social networking sites like MySpace, discussion boards like Yahoo Groups, news sites, sites with lots of consumer generated content, blogs, sites with lots of photos and videos, all have an ad targeting problem. Remember Google collected $10 Billion in revenues for targeting ads...not for providing a cool user interface and experience for users.
Powerset can change the game by focusing their power on the advertising problem. That is where the money is...and quite frankly, it is easier to convince thousands of advertisers that you have the next big thing than it is to convince hundreds of millions of consumers. Remember my favorite cliche: "In a fight between an alligator and a grizzly bear, the terrain determines the victor". If Powerset (alligator) takes the fight to Google (grizzly bear) on their turf...Powerset gets eaten alive. If Powerset stays in the water...they have a much better chance of winning.
Excellent ideas! You're spot on with this. It's naive of Powerset to think they can beat Google with technology that will be replicated by Google sooner rather than later if it's that good. It took a very wide range of variables that made Google what it is; simply having incrementally better tech won't get Powerset far. They should have stayed under the radar and worked on the kind of strategy you've suggested.
Posted by: moataz | February 09, 2007 at 06:47 PM
Excellent ideas! Competing is about problem solving, so they should aim their expertise at the problems where their solution brings the most value.
But I wonder why MS doesn't consider buying Powerset?
Posted by: TanNg | February 09, 2007 at 08:41 PM
Don,
I've been a faithful reader of your blog since last fall, when an old colleague of yours, JG, suggested I check it out. That is a good indicator of the power of personal recommendation. Your insights are bang on and it makes me wonder why I don't see the same in many MS products?
Your insights have helped me tremendously with some of the concepts of the start-up I'm pulling together and I want to thank you. Relevant ads is where it's at and you have specifically helped me with this key realization. Keep it up knowing that you are truly helping some new entrepreneurs.
Posted by: allan isfan | February 10, 2007 at 11:55 AM
Don,
Great thoughts on this, with some clear wisdom they should listen to.
The problem of NLP is:
(1) Scale
(2) Noisy Data
(3) Learning Novel Ideas
Norvig and others on the statistical NLP side are skeptical because they have to deal with all of the above problems (as do those of us who work in the deep areas of Defense/Intel analysis) and have never seen anything like what PowerSet is proposing work on scale, with noisy data, and with emerging concepts.
Xerox PARC, while a big name, has no magic bullet for these problems. Look into the NIMD program with the office of the Intel community formerly known as ARDA. The early PARC work in this area in the 90s (which went to Inxight) never worked well on noisy data and required a whole other company (ClearForest) to get the extraction technology to work on non-trivial categories. All of it is highly supervised work vs. unsupervised algorithms - which tends to bite you big time at web scale due to the problem of massive false positives based on the error rate of classifiers (a 99.9% accurate classifier in extraction of natural language will still produce 1000 false positives to 9 true positives if 1,000,000 targets are considered and only 10 real events like that exist).
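The base-rate arithmetic in that parenthetical can be checked with a short sketch. It assumes the 99.9% figure applies equally to real events and to non-events (that split is an assumption, the comment does not say), and the function name `expected_counts` is hypothetical, introduced only for illustration.

```python
def expected_counts(total_targets, real_events, accuracy):
    """Expected true/false positives for a classifier.

    Assumption: the same accuracy applies to real events (sensitivity)
    and to non-events (specificity).
    """
    non_events = total_targets - real_events
    true_positives = real_events * accuracy          # real events correctly flagged
    false_positives = non_events * (1.0 - accuracy)  # non-events wrongly flagged
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Figures from the comment: 1,000,000 targets, 10 real events, 99.9% accuracy.
tp, fp, precision = expected_counts(1_000_000, 10, 0.999)
print(f"expected true positives:  {tp:.2f}")
print(f"expected false positives: {fp:.2f}")
print(f"precision:                {precision:.4f}")
```

Under these assumptions the expected counts come out to roughly 1,000 false positives against about 10 true positives, so only about 1% of what the classifier flags is real, which is in line with the rough figures quoted above.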
This can be compensated for, but generally only by layering additional classification tests as filters - which requires a manually built concept hierarchy. And that is where it all starts to break at web scale on noisy data - you extract a very limited class of good language matches that cover the semantic area, but only when fairly well formed and in the expected frame. Your recall drops to a small fraction (1/1000 - 1/10000) of what is really out on the web that is relevant. That makes it a lot worse than Google for those cases - which is the majority of search use cases right now. Of course, no one ever DEMOs recall - just the salient matches on a small test set. :-)
One of the other real problems of NLP for search is that it raises the bar of expectation for the end user so that when it fails (which it will - and a lot), it destroys the user's confidence in the system to keep using it as an automated abstraction layer. This is something we've seen in a big way when supporting Intelligence Analysts - you have to be consistently good to a level of expectation on EVERY CONCEPT or they stop trusting you altogether. Taken with the above paragraph, you can see the Achilles Heel - I can only trust the machine to abstract for me if it really understood everything and replaced my need to read all of the relevant sources I could get to. Otherwise, it is just a filter on the data.
Bottom-line:
Google is McDonald's - consistently above average, sometimes surprisingly good, and, on occasion, laughably bad. NLP on web-scale (aka Powerset and Hakia) are like temperamental chefs - sometimes extraordinary, but often disappointing and almost never worth the recommendation to others.
Posted by: Tim Estes | February 11, 2007 at 12:30 PM
Don - it's amazing to me that pundits have concentrated on the idea of parsing *queries*:
"Rather than focusing all this NLP power on understanding the typical 2 or 3 word search query, why not help advertisers better target their ads on unstructured content?"
Couldn't there be something else out there to parse like, say, the entire web? To think that NLP search is about parsing queries (alone) is to miss the point entirely.
Posted by: Matthew Hurst | February 12, 2007 at 11:01 AM
Is this technology going to make end users a little paranoid? Some users were paranoid when Google started putting targeted ads in GMail by picking up key words. Powerset is talking about actually understanding what emails are about. This is huge.
Or will users get over it like they did for GMail?
Posted by: bored | February 12, 2007 at 11:34 AM
It's worth noting that Google didn't invent the search ad auction -- that was Overture, who were bought by Yahoo. Overture sued Google over patent infringement and Google settled for roughly $300 million in Google stock in 2004[1]. Google changed the auction mechanism (in a very important way), but they didn't invent the model.
[1] http://www.thestreet.com/tech/georgemannes/10177217.html
Posted by: K G | March 30, 2007 at 11:50 AM