IBM releases unstructured data framework code as open-source
The company hopes to spur wider compatibility for complex text analytics apps
Computerworld - IBM today released the source code for its Unstructured Information Management Architecture (UIMA) to encourage independent software vendors (ISVs) to use the framework to create complex, enterprise-ready text analytics applications on a standards-based platform.
In an announcement today, IBM said the UIMA code is now available as an open-source development project on SourceForge.net.
While traditional content and knowledge management applications today allow users to search for terms, they don't allow searches for concepts or relationships between words in documents, Web sites or other text, said Marc Andrews, a spokesman for content discovery strategy and business development at IBM. Complex text analytics applications from various vendors do provide that kind of analysis, but plugging them into existing search applications can be difficult because of code compatibility issues, he said.
The idea behind UIMA is to give developers a standards-based platform for creating specialized text analysis applications, which users can then tie in to the search applications of their choice. UIMA defines a common, standard interface that enables text analytics components from multiple vendors to work together.
"Customers [have] had to do the integrations themselves because there are no interfaces" between proprietary text analysis applications and search products, Andrews said. "They've had to custom-tie them together," which is often difficult and costly. "UIMA enables them to tie these things together more easily, providing plug-and-play in a common language."
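The plug-and-play idea can be illustrated with a minimal sketch. It is not UIMA's actual API: the names `Annotation`, `Document`, `Annotator`, `KeywordAnnotator`, `run_pipeline` and `search_by_label` are hypothetical and chosen only for illustration. The sketch assumes that every analysis component exposes the same `process` method and records its findings as labeled text spans on a shared document, so a search layer can query by concept rather than by literal term.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    """A labeled span of text (hypothetical structure, not UIMA's)."""
    begin: int
    end: int
    label: str


@dataclass
class Document:
    """Text plus the annotations every component has added so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


class Annotator(Protocol):
    """The shared interface: any vendor's component only needs process()."""
    def process(self, doc: Document) -> None: ...


class KeywordAnnotator:
    """Toy component that labels case-insensitive keyword matches."""

    def __init__(self, label: str, keywords: List[str]) -> None:
        self.label = label
        self.keywords = [k.lower() for k in keywords]

    def process(self, doc: Document) -> None:
        # Assumes lowercasing does not change string length (true for ASCII).
        lowered = doc.text.lower()
        for kw in self.keywords:
            start = lowered.find(kw)
            while start != -1:
                doc.annotations.append(
                    Annotation(start, start + len(kw), self.label))
                start = lowered.find(kw, start + len(kw))


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    """Run each component in order over the same shared document."""
    for annotator in annotators:
        annotator.process(doc)
    return doc


def search_by_label(doc: Document, label: str) -> List[str]:
    """Concept search: return the text of every span with the given label."""
    return [doc.text[a.begin:a.end] for a in doc.annotations if a.label == label]
```

Because each component depends only on the shared document and the `process` method, components from different sources could be swapped or combined without custom integration code, which is the kind of interoperability Andrews describes.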
Last August, IBM announced that more than 15 ISVs, including SAS Institute Inc., Cognos Inc., ClearForest Corp. and Attensity Corp., had pledged to support UIMA in their text analytics and search products (see "IBM releases open analytics interface"). IBM also introduced its own offering, IBM WebSphere Information Integrator OmniFind Edition, which is based on UIMA.
Text analytics can comb through documents, comment and note fields, problem reports, e-mail, Web sites and other text-based information sources, according to IBM, which worked on the development of UIMA for more than four years.
Several medical institutions are using UIMA to help organize huge amounts of unstructured data that could be useful in medical research, according to IBM.
The Mayo Clinic is using it to help extract and collect data from some 20 million clinical notes in medical records; the data will be used for research and to improve patient treatments. The Memorial Sloan-Kettering Cancer Center is extracting treatment data from its records to search for new cancer treatments.
In addition, the International Federation of Pharmaceutical Manufacturers and Associations, a worldwide industry body that represents pharmaceutical companies, recently deployed a portal of clinical trial information that uses the UIMA framework with IBM's OmniFind application to identify medical terms and concepts. That allows doctors, pharmacists, researchers and others to search by disease area or medicine name. The tool even recognizes synonyms across multiple languages. The portal will be used to bring together content from a number of existing clinical trial registries and databases, allowing doctors and patients to review summarized results and find trials they can join, according to IBM.