Ads by TechWords

See your link here
Receive the latest technology news and information.
Application/Web Development
Computerworld Daily News (First Look and Wrap-Up)
Computerworld Blogs Newsletter
The Weekly Top 10
Cloud Computing
View all newsletters




Privacy Policy
 

QuickStudy: Optical Character Recognition

July 29, 2002 12:00 PM ET

Computerworld - Suppose you wanted to digitize the novel Moby Dick overnight. You could stay up all night typing and still not finish. Or you could use a high-end scanner and in minutes scan all of author Herman Melville's works into a computer using optical character recognition (OCR) technology.

This is the technology long used by libraries and government agencies to make lengthy documents quickly available electronically. Advances in OCR technology have spurred its increasing use by enterprises.

For many document-input tasks, OCR is the most cost-effective and speedy method available. And each year, the technology frees acres of storage space once given over to file cabinets and boxes full of paper documents.

Before OCR can be used, the source material must be scanned using an optical scanner (and sometimes a specialized circuit board in the PC) to read in the page as a bitmap (a pattern of dots). Software to recognize the images is also required.

The OCR software then processes these scans to differentiate between images and text and determine what letters are represented in the light and dark areas.

Older OCR systems match these images against stored bitmaps based on specific fonts. The hit-or-miss results of such pattern-recognition systems helped establish OCR's reputation for inaccuracy.

Today's OCR engines add the multiple algorithms of neural network technology to analyze the stroke edge, the line of discontinuity between the text characters, and the background. Allowing for irregularities of printed ink on paper, each algorithm averages the light and dark along the side of a stroke, matches it to known characters and makes a best guess as to which character it is. The OCR software then averages or polls the results from all the algorithms to obtain a single reading.



Jump to comments

24-bit color

Additional Resources

WHITE PAPER
Approximately 60 percent of data migration projects overrun time or budget, while some fail completely. Download this white paper, "Enhancing Your Chance for Successful Data Migration," to learn the critical steps you need to take to execute a data migration project with minimum cost and risk to your business.
WHITE PAPER
Read the Gartner research note to learn why the TCO of a server-based computing deployment used to deliver all applications to users is around 50% lower than that of an unmanaged desktop deployment.
WHITE PAPER
Economic downturns have a tendency to accelerate emerging technologies, boost the adoption of effective solutions, and punish solutions that are not cost competitive or that are out of synch with industry trends. This IDC White Paper presents the results of an IDC survey of 330 companies in Western Europe, Asia/Pacific and the Americas that measures the receptiveness to Linux and takes into consideration changing views driven by the disruptive economic environment that businesses face today.

What People Are Saying

White Papers & Webcasts

Network Operating System Evolution
Computerworld and Juniper invite you to download this white paper!  

Three IT Strategies to Cut Cost Intelligently
Register for this Webcast! Provided by BMC Software.

How Operating Systems Create Network Efficiency
Computerworld and Juniper invite you to download the full report.  

Key Strategies for Managing Data Growth
What are you storage challenges?

Forrester Consulting - Optimizing Users and Applications in a Mobile World
Learn how to successfully deploy a WAN optimization solution that is specifically tuned for a mobile environment!  

Advancing the Economics of Networking
For more information download it today!