Microsoft's annual TechFest is a three-day conference where Microsoft employees see demos of research projects and hear technical presentations from the researchers themselves. This year, for the first time, Microsoft invited press and bloggers to a special preview.
Microsoft employs over 750 PhD research scientists working on hundreds of projects. Web search is always a popular research topic, and this year was no exception. As John Markoff of The New York Times wrote:
During a morning session for more than 300 visitors at the Microsoft Conference Center, Lili Cheng, a user-interface designer for the Windows Vista operating system, showed off a new service called Mix that will allow Web surfers to organize search results and easily share them.
A second tool demonstrated, called Web Assistant, is intended to improve the relevance of search results and help resolve ambiguities in results that, for example, would give a user sites for both Reggie Bush and George Bush.
Susan Dumais, a veteran Microsoft search expert, has built a tool to help determine relevance called Personalized Search. It pulls together several hundred results and then compares them with the index that Windows users can build of the documents on their hard drives, a feature called Desktop Search.
Artificial intelligence applied to search: by building an index of documents, emails, and previous searches, it is possible to create a personal profile that helps filter and rank search results for better relevance. This is an artificial intelligence system that learns your interests and preferences and constantly updates its algorithm based on your choices. That way, users don't need to change their behavior or search style to get better results.
Here is a link to brief descriptions of other search-related projects at Microsoft's research labs.
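The profile-based re-ranking described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's Personalized Search: it assumes a bag-of-words profile built from local text, cosine similarity, and a fixed blend weight, and the helper names (`build_profile`, `record_click`, `rerank`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a real system would use a proper analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profile(local_texts):
    # Term counts over the user's documents, emails and past queries.
    profile = Counter()
    for text in local_texts:
        profile.update(tokenize(text))
    return profile


def record_click(profile, snippet):
    # Assumed "learning" step: results the user picks feed back into the profile.
    profile.update(tokenize(snippet))


def cosine(a, b):
    # Cosine similarity between two term-count vectors (missing terms count as 0).
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(results, profile, weight=0.5):
    # results: (snippet, engine_score) pairs with engine_score in [0, 1].
    # Final score blends the engine's relevance with similarity to the profile.
    scored = []
    for snippet, engine_score in results:
        personal = cosine(Counter(tokenize(snippet)), profile)
        scored.append(((1 - weight) * engine_score + weight * personal, snippet))
    scored.sort(reverse=True)
    return [snippet for _, snippet in scored]


profile = build_profile([
    "Notes on New Orleans Saints running backs",
    "Fantasy football draft: running back rankings",
])
results = [("George Bush presidential library", 0.9), ("Reggie Bush running back stats", 0.8)]
print(rerank(results, profile))
# ['Reggie Bush running back stats', 'George Bush presidential library']
```

Because the profile comes from data already on the user's machine, the searcher types nothing differently; the `weight` parameter controls how strongly personal context overrides the engine's own ranking.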
The Mix project mentioned in the NYT article above is about finding and sharing dynamic content from a variety of sources. Search, aggregators, and RSS let people draw on many dynamic streams of information from their desktop. People are getting used to reading dynamic content, but there are few tools today for authoring and sharing it. Mix enables people to build and share dynamic documents with rich structure and visualizations on top of first-class query objects that draw from desktop, intranet, and Web-based search. Mix also explores new user interfaces for privacy and security.
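To make the idea of "first-class query objects" concrete, here is a hypothetical sketch: a document stores queries rather than copied results, and re-runs them each time it is rendered. `Query`, `DynamicDocument` and the in-memory `desktop_search` source are invented for illustration and say nothing about how Mix is actually built.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

desktop_files = ["techfest notes.docx", "budget.xlsx"]


def desktop_search(text: str) -> List[str]:
    # Stand-in for a real source (desktop index, intranet, Web search).
    return [name for name in desktop_files if text.lower() in name.lower()]


@dataclass
class Query:
    # A saved query plus the source it runs against; results are not stored.
    text: str
    source: Callable[[str], List[str]]

    def run(self) -> List[str]:
        return self.source(self.text)


@dataclass
class DynamicDocument:
    title: str
    sections: List[Tuple[str, Query]] = field(default_factory=list)

    def add(self, heading: str, query: Query) -> None:
        self.sections.append((heading, query))

    def render(self) -> str:
        # Every render re-evaluates each query, so a shared copy stays current.
        lines = [self.title]
        for heading, query in self.sections:
            lines.append(heading)
            lines.extend("  - " + item for item in query.run())
        return "\n".join(lines)


doc = DynamicDocument("TechFest trip")
doc.add("My files", Query("techfest", desktop_search))
before = doc.render()
desktop_files.append("techfest photos.zip")
after = doc.render()
print(after)
```

The design choice being illustrated is that the document holds the query, not a snapshot, so new matching content appears without anyone editing the document.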
The Microsoft Research site has brief descriptions of most of the projects that were showcased at TechFest. Visit the TechFest Demo site for further details.
Hi Don,
Question for you: how often do these killer research demos actually make it to market? There seems to be a popular notion that much of the cool stuff at events like TechFest never sees the light of day, or is 'folded into shipping products'.
Would you be able to comment on that?
Posted by: John Milan | March 08, 2007 at 01:11 AM
What about Latent Semantic Analysis and indexing with words in context? You won't get precise search results unless the documents themselves are correctly indexed.
Posted by: Ian Parker | March 08, 2007 at 07:17 AM
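Latent Semantic Analysis compares documents and queries in a reduced "concept" space instead of by exact word overlap. The toy sketch below assumes NumPy and a made-up five-document corpus with k = 2: it builds a term-document count matrix, takes a truncated SVD, and ranks documents for the query "automobile". "car engine repair" should score highly even though it never contains that word. Real systems add term weighting (for example tf-idf) and keep far more dimensions.

```python
import numpy as np

docs = [
    "car engine repair",
    "automobile engine maintenance",
    "car automobile dealer",
    "election president vote",
    "president campaign vote",
]

# Term-document count matrix A: one row per term, one column per document.
vocab = sorted({word for d in docs for word in d.split()})
row = {word: i for i, word in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for word in d.split():
        A[row[word], j] += 1

# Truncated SVD: keep only the k strongest "concepts".
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk = U[:, :k]
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # row j equals Uk.T @ A[:, j]


def project(text):
    # Map a bag of words into the same k-dimensional concept space as the documents.
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in row:
            q[row[word]] += 1
    return Uk.T @ q


def search(text):
    # Rank documents by cosine similarity to the projected query.
    qk = project(text)
    hits = []
    for d, dv in zip(docs, doc_vecs):
        denom = np.linalg.norm(qk) * np.linalg.norm(dv)
        hits.append((float(qk @ dv) / denom if denom else 0.0, d))
    return sorted(hits, reverse=True)


for score, d in search("automobile"):
    print(f"{score:.2f}  {d}")
```

Queries and documents must be projected with the same `Uk`; otherwise the similarity scores are not comparable.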
Ian, you are correct that the index must be deeply analyzed and metadata added for search results to improve.
There are many Microsoft researchers working on Latent Semantic Indexing and search techniques. Susan Dumais and many others have published papers in scientific journals on this subject.
I spoke to the people at Powerset a few weeks ago, and that is exactly where most of their NLP technology is being applied. Parsing a two- or three-word query with NLP doesn't yield significantly better results. But if the index is parsed and lots of NLP metadata is added, the results are impressive.
I think it is safe to say that the brightest minds in the industry at all the big companies are working on these issues. They are conceptually easy to understand but very difficult to implement at web scale.
Posted by: Don Dodge | March 08, 2007 at 01:52 PM
I think it is safe to say that the brightest minds in the industry at all the big companies are working on these issues. They are conceptually easy to understand but very difficult to implement at web scale.
I can see that it is difficult at web scale. However, remember that the company's founder, Bill Gates, once said: don't worry about resources, Moore's law will take care of that! If you have 80-core processors (<100 W) with a combined power of a teraflop, they will do LSA at web scale for you.
Another approach is to use the idle time of computers, effectively turning every PC into a server. BOINC has been used for SETI and also for global warming research. Clearly, computing power must be pooled properly, by a proper operating system, without needing BOINC.
This would also lead to a far more robust Internet than we have today. There are great benefits in a grid quite apart from this problem. Yes, I know this is being worked on.
Posted by: Ian Parker | March 13, 2007 at 12:09 PM
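The idle-PC approach is usually organized by splitting a job into small work units, sending copies to volunteer machines, and accepting a result only when enough copies agree, since any single home PC may be unreliable; volunteer-computing platforms such as BOINC use this kind of redundancy. The following is a minimal sketch of the pattern under stated assumptions, not BOINC's API: `make_work_units`, `compute_on_host`, `run_job`, the sum-of-squares task and the quorum of 2 are all illustrative.

```python
import itertools
import random
from collections import Counter

_noise = itertools.count(1)


def make_work_units(data, size):
    # Split the job into independent chunks that any idle PC could process.
    return [data[i:i + size] for i in range(0, len(data), size)]


def compute_on_host(unit, faulty):
    # Hypothetical volunteer task: sum of squares of the chunk.
    # A faulty host returns a different wrong answer every time.
    result = sum(x * x for x in unit)
    return result + next(_noise) if faulty else result


def run_job(data, hosts, unit_size=4, quorum=2, seed=0):
    # hosts: list of booleans, True marking an unreliable machine.
    rng = random.Random(seed)
    total = 0
    for unit in make_work_units(data, unit_size):
        votes = Counter()
        # Keep sending copies to random hosts until `quorum` results agree.
        while True:
            faulty = rng.choice(hosts)
            votes[compute_on_host(unit, faulty)] += 1
            value, count = votes.most_common(1)[0]
            if count >= quorum:
                total += value
                break
    return total


hosts = [False, False, False, True]
print(run_job(list(range(10)), hosts))  # 285
```

Redundancy costs extra computation, but it lets the coordinator tolerate machines it does not control, which is the main difficulty in turning every PC into a server.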