
July 29, 2002 (Computerworld) -- Suppose you wanted to digitize the novel Moby Dick overnight. You could stay up all night typing and still not finish. Or you could use a high-end scanner and in minutes scan all of author Herman Melville's works into a computer using optical character recognition (OCR) technology.
This is the technology long used by libraries and government agencies to make lengthy documents quickly available electronically. Advances in OCR technology have spurred its increasing use by enterprises.
For many document-input tasks, OCR is the fastest and most cost-effective method available. And each year, the technology frees acres of storage space once given over to file cabinets and boxes full of paper documents.
Before OCR can be used, the source material must be scanned with an optical scanner (and sometimes a specialized circuit board in the PC), which reads in the page as a bitmap (a pattern of dots). Software to recognize the characters in that image is also required.
The OCR software then processes these scans to differentiate between images and text and determine what letters are represented in the light and dark areas.
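As a rough illustration of the first step, separating light from dark, the sketch below turns a grayscale scan into a bitmap of ink and paper pixels and then finds the rows that contain any ink. The function names, the 0-255 grayscale range and the fixed threshold of 128 are assumptions made for this example; real OCR packages use more sophisticated layout analysis.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale page (rows of 0-255 values) into a bitmap.

    Assumption for this sketch: values below the threshold are ink (1),
    everything else is paper (0).
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_rows(bitmap):
    """Return the indices of rows that contain at least one ink pixel."""
    return [i for i, row in enumerate(bitmap) if any(row)]


# A tiny 3x4 "scan": the top row is blank paper, the lower rows hold dark strokes.
scan = [
    [250, 245, 255, 240],
    [240,  30,  20, 235],
    [245,  25, 250, 230],
]
bitmap = binarize(scan)
rows_with_ink = ink_rows(bitmap)
```

Rows with no ink at all are candidates for the white space between lines of text, which is one simple cue for splitting a page into lines before any characters are identified.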
Older OCR systems match these images against stored bitmaps based on specific fonts. The hit-or-miss results of such pattern-recognition systems helped establish OCR's reputation for inaccuracy.
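The older, font-based approach can be pictured as a pixel-by-pixel comparison against stored templates. The following is a minimal sketch, assuming tiny 3x3 bitmaps and a made-up three-letter template set; the names TEMPLATES, match_score and recognize are hypothetical and do not come from any particular OCR product.

```python
# Hypothetical stored bitmaps for one "font": '1' is ink, '0' is paper.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Count the pixels on which the scanned glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose stored bitmap agrees with the glyph most."""
    return max(templates, key=lambda ch: match_score(glyph, templates[ch]))


# An "L" with one pixel lost to a scanning flaw still matches best against "L".
noisy_l = ["100", "100", "110"]
best = recognize(noisy_l)
```

The weakness is easy to see from the sketch: a glyph printed in a font that is not in the template set, or one shifted by a pixel, can agree more closely with the wrong character than with the right one.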
Today's OCR engines add the multiple algorithms of neural network technology to analyze the stroke edge, that is, the line of discontinuity between the text characters and the background. Allowing for irregularities of printed ink on paper, each algorithm averages the light and dark along the side of a stroke, matches it to known characters and makes a best guess as to which character it is. The OCR software then averages or polls the results from all the algorithms to obtain a single reading.
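The final polling step can be sketched as a simple majority vote over the guesses returned by each algorithm. This is only an illustration of the idea: the function name poll and the example guesses are invented here, and commercial engines may weight or average their recognizers differently.

```python
from collections import Counter


def poll(guesses):
    """Return the most frequently guessed character and its share of the votes.

    If two characters tie, the one guessed first wins (Counter keeps
    first-seen order among equal counts).
    """
    winner, votes = Counter(guesses).most_common(1)[0]
    return winner, votes / len(guesses)


# Three hypothetical recognizers look at the same smudged glyph.
character, agreement = poll(["e", "c", "e"])
```

A low agreement value is a natural signal that the character should be flagged for a human to check rather than accepted automatically.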