Sidebar: Text Mining Glossary
Computerworld -
Text miners use a variety of approaches to extract and present relevant information. Below are definitions of common methods:
Categorization - Presenting search results in categories, rather than as an undifferentiated mass.
Clustering - Grouping similar documents based on their content.
Extraction - Pulling specific, relevant information out of a document; for example, identifying all the company names in a data set.
Keyword search - Searching documents for the occurrence of a particular word or set of words.
Natural-language processing - Determining the meaning of written words, taking into account their context, grammar, colloquialisms and so on.
Taxonomy - Categorization of data according to a predefined framework, either industry-standard or customized. Some tools can automatically generate a taxonomy based on analysis of the data store.
Visualization - Graphically presenting the mined data so relationships are easier to spot and understand.
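Three of the methods defined above (keyword search, extraction and clustering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents are invented, the function names are hypothetical, the company-name pattern is a naive regular expression, and the clustering step is a simple greedy grouping by word overlap (Jaccard similarity). Commercial text-mining tools use far more sophisticated techniques.

```python
import re

# Hypothetical sample documents, invented purely for illustration.
DOCS = {
    "doc1": "Acme Corp reported strong quarterly earnings and revenue growth.",
    "doc2": "Quarterly earnings at Globex Inc beat revenue forecasts.",
    "doc3": "The hiking trail winds through pine forest to a mountain lake.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(docs, keywords):
    """Keyword search: ids of documents that contain every keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted <= tokenize(text))


def extract_company_names(text):
    """Extraction: capitalized words followed by a corporate suffix.

    A deliberately naive pattern (an assumption of this sketch); it
    misses many real company names.
    """
    return re.findall(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\b", text)


def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.15):
    """Clustering: greedily group documents by word overlap.

    Each document joins the first cluster whose first member is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        for group in clusters:
            if jaccard(words, tokenize(docs[group[0]])) >= threshold:
                group.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


if __name__ == "__main__":
    print(keyword_search(DOCS, ["earnings", "revenue"]))
    print([extract_company_names(text) for text in DOCS.values()])
    print(cluster(DOCS))
```

With these sample documents, the keyword search matches the two financial stories, the extraction step pulls one company name from each of them, and the clustering step groups them together while leaving the unrelated hiking text in a cluster of its own. The 0.15 similarity threshold is an arbitrary choice for this tiny data set.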