ISWC2005 Notes: Day 3
It's day three for me. The invited speaker this morning is Dr Alfred Spector, CTO of IBM Software. His presentation is entitled "Semantic Acceleration", or "The Practical Web".
Starts off by admitting that he's not an expert in the semantics domain, but he has a very strong point of view. Back when he was a systems person he always wished he was in AI. Got the opportunity to run IBM's research division, full of AI researchers, which was great. He focussed on the systems infrastructure for distributing information to those researchers.
"Innovation is the intersection of invention and insight, leading to the creation of social and economic value" - US National Innovation Initiative
Still sees incredible opportunities to optimise all of society's processes. Look at automobile technology and how it is better in every dimension compared with 20 years ago.
Information semantics will drive greatly increased value in virtually every domain. Showed a graph where value levels off without semantics but reaches a higher plateau once semantics are introduced. How do we get metadata on every business artifact? Very difficult to do by hand, so we need ways to generate it automatically.
Structured information: semantics captured in schema. Unstructured information: semantics inherent in usage and context.
Where will the semantics come from? Some will be created manually, and some web content will be generated from existing databases. However, most web and enterprise data (e.g. email) contains only latent structure. Manual markup is hard, perhaps impossible, to scale.
Approach via text analytics - adding structure to unstructured information. However, the analytics world is fragmented and surprisingly unstructured. The right analysis will likely be a best-of-breed combination of many techniques.
IBM are pursuing a combination hypothesis: various KM technologies, if intimately integrated, will provide higher-quality results. Combine data mining, machine learning, IR, string and graph algorithms, text analysis and NLP, UI/human factors, privacy and security. Need tens or hundreds of thousands of text analysers working together. The problem is how you get them to cooperate, and that is an architectural challenge.
UIMA (Unstructured Information Management Architecture), begun in 2001, is an implementation of this combination approach. It uses text, video and speech analysis techniques together with advanced concept/semantic search and knowledge representation and reasoning. The architecture was informed by TIPSTER, Catalyst, Atlas, GATE, TAF, Talent and WebFountain.
UIMA software has Java and C++ framework implementations, with support for both co-located and service-oriented deployments. It's intended as an open, plug-and-play integration framework for building an ecosystem of analysis and application developers. Will be open sourced soon. IBM is making innovation grants available to catalyse efforts in this area.
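To make the "how do you get analysers to cooperate" point concrete, here is a toy pipeline. It's a minimal Python sketch under my own assumptions, not UIMA's actual API (the real framework is Java/C++): each analyser is a plain function that adds standoff annotations (typed character spans) to one shared document object, loosely in the spirit of UIMA's Common Analysis Structure. `AnalysisDocument`, `Annotation`, `run_pipeline` and the three analysers are hypothetical names, and the regex heuristics are deliberately naive.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a typed span over the original text."""
    kind: str
    begin: int
    end: int


@dataclass
class AnalysisDocument:
    """Hypothetical shared structure that every analyser reads and writes."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def select(self, kind: str) -> List[Annotation]:
        return [a for a in self.annotations if a.kind == kind]


# Hypothetical plug-in contract: an analyser adds annotations to the shared document.
Analyser = Callable[[AnalysisDocument], None]


def email_address_analyser(doc: AnalysisDocument) -> None:
    # Naive pattern for illustration only, not a full RFC 5322 matcher.
    for m in re.finditer(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", doc.text):
        doc.annotations.append(Annotation("EmailAddress", m.start(), m.end()))


def capitalised_name_analyser(doc: AnalysisDocument) -> None:
    # Naive heuristic: two consecutive capitalised words form a candidate name.
    for m in re.finditer(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", doc.text):
        doc.annotations.append(Annotation("PersonCandidate", m.start(), m.end()))


def contact_analyser(doc: AnalysisDocument) -> None:
    # Builds on the earlier analysers' output: pair each name with the next email.
    emails = doc.select("EmailAddress")
    for name in doc.select("PersonCandidate"):
        following = [e for e in emails if e.begin >= name.end]
        if following:
            doc.annotations.append(Annotation("Contact", name.begin, following[0].end))


def run_pipeline(text: str, analysers: List[Analyser]) -> AnalysisDocument:
    doc = AnalysisDocument(text)
    for analyse in analysers:
        analyse(doc)  # each analyser sees everything its predecessors added
    return doc


if __name__ == "__main__":
    doc = run_pipeline(
        "Please contact Jane Smith at jane.smith@example.org about the grant.",
        [email_address_analyser, capitalised_name_analyser, contact_analyser],
    )
    for ann in doc.annotations:
        print(ann.kind, repr(doc.covered_text(ann)))
```

The design point is that a later analyser (here `contact_analyser`) consumes earlier analysers' annotations instead of re-parsing the raw text, which is what lets independently written components be swapped in and combined.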
IBM bought iPhrase, which was already using UIMA, which seems an... errr... interesting strategy for creating an ecosystem!