Monday, December 07, 2009
I am very pleased to be able to announce that Microsoft Semantic Engine, a group that I founded and led up through the current PDC 2009, has finally been made visible to the general public. The blog linked above called it the "hidden gem" of PDC 2009. That was very kind of him. In the two years that we have been building this technology, our goals have remained the same: to make semantically-enhanced search, discovery, and organization autonomic, scale invariant, and immediately usable by anyone used to search engines.
The system is built from three basic ideas: streams, properties, and concepts. Concepts are the emergent categories and clusters that are discovered by the Semantic Engine and used for indexing. This is the core of the "semantic" in semantic search and discovery. It is also the basis for finding the hidden (i.e., latent) relationships in information automatically and making these relationships readily available to users (consumers or business workers).
Approachable, semantic, and self-organizing. No one wants to have to deal with the terabytes and petabytes of information that they are exposed to daily. People need the right kinds of filters to find and use the information that is most relevant to them. Learning what those filters should be is part of the Semantic Engine's role. This is where a background service integrated deeply with the power of SQL Server and cloud services like SQL Azure makes all of the difference.
It is definitely a tall order and one which Microsoft is investing in heavily. Meaning-driven indexing, automatic classification, and scalable storage for all shapes and sizes of information (documents, videos, audio, business forms, web pages, RSS/ATOM feeds, etc.) together make up the vision of "emergent semantics and recombinant information processing". They are also the basis for many value-added services: from predictive analytics and sentiment analysis to more personalized and tailored search and discovery services. Another great blog describing the PDC session is at Harnessing Unstructured Data.
Finally, the direct link to the PDC session video is at PDC 2009 Video. This video covers all of the main technical and business aspects of semantic engineering at large scale and the unique hybrid approach that Semantic Engine has taken to accomplish its goals. It is not a W3C SemanticWeb(tm) approach but one which melds the unique capabilities of unsupervised machine learning (hierarchical clustering), information retrieval models (higher-dimensional vector spaces), pluggable and trainable classifiers (SVMs, Naive Bayes, Maximum Entropy, Decision Tree, etc.), and personalized filtering and ranking. Your search, discovery, and organization finally become based on your preferences. This lets you share your enhanced and conceptually indexed information artifacts with social and business cliques, and give them access to those artifacts. The stuff you care about is always "handy and nearby".
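To make the clustering and vector-space ingredients a little more concrete, here is a toy Python sketch. It is not Semantic Engine code and says nothing about how the product is actually implemented; the function names (`vectorize`, `cosine`, `cluster`), the plain term-frequency weighting, the average-linkage rule, and the 0.3 similarity threshold are all assumptions chosen purely for illustration. It turns each document into a bag-of-words vector and then merges the most similar clusters bottom-up until nothing is similar enough to merge.

```python
import math
from collections import Counter


def vectorize(text):
    """Map a document to a bag-of-words term-frequency vector (illustrative only)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Average-linkage agglomerative clustering over document indices.

    Repeatedly merges the two most similar clusters and stops when the best
    remaining pair falls below `threshold` (a hypothetical cut-off).
    """
    vectors = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, x, y = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


docs = [
    "sql server index query performance",
    "query performance tuning sql server",
    "video audio stream encoding",
    "audio video encoding formats",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

The two groups that fall out (database documents and media documents) play the role of emergent "concepts" in this toy setting: nobody defined the categories up front. A real system would need better term weighting, far more scalable clustering, and the trainable classifiers mentioned above layered on top.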
It all starts with three building blocks (streams, extracted properties, and concepts) that allow for emergent semantics to become a reality. Emergent mechanisms are the way to go for ensuring scale invariance and adaptive systems that are robust to change and autonomic evolution over time. Biological systems are already that way, and information-processing systems need to get that way too. A new order of Deleuzian 'machinic phyla' that can handle the rhizomatic nature of ideas and things.
Meaning-driven computing has arrived. The next decade should be very interesting for developers, consumers, and business people. The Microsoft Semantic Engine is just one technology on the road to omnipresent and unified search, discovery and insight.