
I am very pleased to be able to announce that Microsoft Semantic Engine, a group that I founded and led up through the current PDC 2009, has finally been made visible to the general public. The blog linked above called it the "hidden gem" of PDC 2009. That was very kind of him. In the two years that we have been building this technology, our goals have remained the same: to make semantically enhanced search, discovery, and organization autonomic, scale invariant, and immediately usable by anyone used to search engines.
The system is built from three basic ideas: streams, properties, and concepts. The latter are the emergent categories and clusters that are discovered by the Semantic Engine and used for indexing. This is the core of the "semantic" in semantic search and discovery. It is also the basis for finding the hidden (i.e. latent) relationships in information automatically and making these relationships readily available to users (consumers or business workers).
Approachable, semantic, and self-organizing. No one wants to have to deal with the tera- and peta-bytes of information that they are exposed to daily. People need the right kinds of filters to find and use the information that is most relevant to them. Learning what those filters should be is part of the Semantic Engine's role. This is where a background service integrated deeply with the power of SQL Server and Cloud services like SQL Azure makes all of the difference.
It is definitely a tall order and one which Microsoft is investing in heavily. Meaning-driven indexing, automatic classification, and scalable storage for all shapes and sizes of information (documents, videos, audio, business forms, web pages, RSS/ATOM feeds, etc.) are the vision of "emergent semantics and recombinant information processing". They are also the basis for many value-added services: from predictive analytics and sentiment analysis to more personalized and tailored search and discovery services. Another great blog describing the PDC session is at Harnessing Unstructured Data.
Finally, the direct link to the PDC session video is at PDC 2009 Video. This video covers all of the main technical and business aspects of semantic engineering at large scale and the unique hybrid approach that Semantic Engine has taken to accomplish its goals. It is not a W3C SemanticWeb(tm) approach but one which melds the unique capabilities of unsupervised machine learning (hierarchical clustering), information retrieval models (higher-dimensional vector spaces), pluggable and trainable classifiers (SVMs, Naive Bayes, Maximum Entropy, Decision Tree, etc.), and personalized filtering and ranking. Your search, discovery, and organization finally become based on your preferences. This lets you share your enhanced and conceptually indexed information artifacts with social and business cliques and give them access to those artifacts. The stuff you care about is always "handy and nearby".
It all starts with three building blocks (streams, extracted properties, and concepts) that allow for emergent semantics to become a reality. Emergent mechanisms are the way to go for ensuring scale invariance and adaptive systems that are robust to change and autonomic evolution over time. Biological systems already are, and information-processing systems need to get that way too. A new order of Deleuzian 'machinic phyla' that can handle the rhizomatic nature of ideas and things.
Meaning-driven computing has arrived. The next decade should be very interesting for developers, consumers, and business people. The Microsoft Semantic Engine is just one technology on the road to omnipresent and unified search, discovery and insight.
4 comments:
The Semantic Engine looks great, when do we get a chance to work with it?
You can expect to see the Microsoft Semantic Engine in one of the upcoming SQL Server Betas. Stay tuned to the SQL CTP announcements for exact dates. There is a developer SDK and tons of examples already built for working with the Semantic Engine. You won't have to wait too long to try it out and the team is looking forward to feedback on the system.
http://docs.google.com/Doc?docid=0AQIg8QuzTONQZGZxenF2NnNfNzY4ZDRxcnJ0aHI&hl=en_GB
Why are white dwarfs "size ground"? Surely software of this type should tag words like "white dwarf" thereby producing the correct translation.
Most translation and overall natural language processing models need to take into account n-grams. The 'n' indicates the number of words in the 'phrase' that makes it a meaningful whole. This is different from the old school Fregean concept of sentence meaning (Sinn and Bedeutung) as a composite of simple word meanings. That is not sufficient to deal with the phrasal complexity of idiomatic expressions, irony, or the n-gram model of expression "chunks". Example: 'North Korea' is a 2-gram - it refers to (connotes) North Korea, a political entity. Without a sufficiently rich language model to handle n-grams, most translations and naive word-based searches go looking in the wrong area for results. Semantic Engine's language model is quite sophisticated. (alexsto@microsoft.com)
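To make the n-gram idea concrete, here is a minimal Python sketch. It is not Semantic Engine's language model or its SDK; the function names `ngrams` and `chunk` and the tiny `KNOWN_PHRASES` lexicon are hypothetical, invented purely for illustration. It shows how a phrase lexicon lets 'North Korea' survive as one unit instead of being split into two unrelated words.

```python
from typing import List, Set, Tuple

# Hypothetical phrase lexicon, for illustration only. A real language model
# would learn such phrases from data rather than hard-code them.
KNOWN_PHRASES: Set[Tuple[str, ...]] = {("north", "korea"), ("white", "dwarf")}


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chunk(tokens: List[str], phrases: Set[Tuple[str, ...]], max_n: int = 2) -> List[str]:
    """Greedily merge known phrases (longest first) into single units."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(max_n, 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if len(candidate) == n and candidate in phrases:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No known phrase starts here, so keep the single word.
            out.append(tokens[i])
            i += 1
    return out


tokens = "talks with North Korea resumed".split()
bigrams = ngrams(tokens, 2)
chunks = chunk(tokens, KNOWN_PHRASES)
```

A naive word-based index would store 'North' and 'Korea' separately; the chunked form keeps the 2-gram together, which is the behavior the 'white dwarf' translation above was missing.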