Subscribe to our e-mail newsletters
For more info on a specific newsletter, click the title. Details will be displayed in a new window.
Computerworld Daily News (First Look and Wrap-Up)
Computerworld Blogs Newsletter
The Weekly Top 10
More E-Mail Newsletters 
Computerworld 2007Subscribe to Computerworld
40 years of the most authoritative source of news and information for IT leaders.

QuickStudy: Optical Character Recognition

 

Sign up to receive Security Resource Alerts

July 29, 2002 (Computerworld) -- Suppose you wanted to digitize the novel Moby Dick overnight. You could stay up all night typing and still not finish. Or you could use a high-end scanner and in minutes scan all of author Herman Melville's works into a computer using optical character recognition (OCR) technology.

This is the technology long used by libraries and government agencies to make lengthy documents quickly available electronically. Advances in OCR technology have spurred its increasing use by enterprises.


More
Computerworld
QuickStudies


For many document-input tasks, OCR is the most cost-effective and speedy method available. And each year, the technology frees acres of storage space once given over to file cabinets and boxes full of paper documents.

Before OCR can be used, the source material must be scanned using an optical scanner (and sometimes a specialized circuit board in the PC) to read in the page as a bitmap (a pattern of dots). Software to recognize the images is also required.

The OCR software then processes these scans to differentiate between images and text and determine what letters are represented in the light and dark areas.

Older OCR systems match these images against stored bitmaps based on specific fonts. The hit-or-miss results of such pattern-recognition systems helped establish OCR's reputation for inaccuracy.

Today's OCR engines add the multiple algorithms of neural network technology to analyze the stroke edge, the line of discontinuity between the text characters, and the background. Allowing for irregularities of printed ink on paper, each algorithm averages the light and dark along the side of a stroke, matches it to known characters and makes a best guess as to which character it is. The OCR software then averages or polls the results from all the algorithms to obtain a single reading.

Continued...
1 | 2 | NEXT  



Print this Story Send Us Feedback E-mail this Story Digg! Digg this Story Slashdot this Story
Optical Character Recognition
After the Scan Is Over
"Apple fans have made much of the fact that the newest figures from Net Applications show that Apple's share of..." Read more...
"It's IT Blogwatch: in which Mozilla's Firefox Web browser continues to gain market share, smashing records as it does so...." Read more...
Read more Desktop Applications posts or See all Blogs
Microsoft promises four patches next week
Google gives away home-cooked Web application security scanner
Storm botnet stages Fourth of July attacks
More top stories...
Microsoft trumpets security additions in upcoming IE8
Apple cuts price of high-end SSD MacBook Air by $500
Ultrathin showdown: Apple MacBook Air vs. Lenovo ThinkPad X300 vs. Toshiba Portege R500
All it takes is a couple hours and about $125 to breathe new life into an old laptop. Here's how.
Is Microsoft's Golden Age over? What are Gates' most memorable quotes? Find out in Computerworld's complete coverage of the end of the Bill Gates era at Microsoft.
There are some things your CIO definitely doesn't want to hear. Also don't miss the flipside, Five things you should always tell your boss.
With its latest version, Mozilla's browser continues to raise the bar for what Web browsers should be.
Reviews, analyses, how-tos, visual tours, hot issues and predictions about Microsoft's new OS.
Four years from now, the IT field will be a vastly different place. Will you be ready?
All Zones
Application Performance Zone
Business Continuity Zone
Data Center Management Zone
Enterprise-Class Security Zone
The File Data Management Zone
Grid Computing on Windows Zone
Security Management Zone
ITIL Best Practices Zone
The SAS Zone
Storage Virtualization Zone
Business Intelligence and Analytics Zone

Ads by TechWords

See your link here
Computerworld Technology Briefing: An open-source path to optimal virtualization
Download this Technology Briefing now!
(Source: Novell/IBM/Intel) Virtualization is about a lot more than just lowering total cost of ownership. In fact users that have taken an open source path to virtualization have realized the additional, mission-critical benefit of markedly reduced IT complexity, as well as a more flexible infrastructure that is easier to change to meet shifting, often unpredictable business requirements.
Download this executive briefing download
Advance your BlackBerry(R) solution management know-how this July
Advance your BlackBerry(R) solution management know-how this July
BlackBerry Technical Seminar, register today!
Go to the webcast 
Adventist Health Improves Document Access with Single Supplier Solution
Download this white paper, free, compliments of Kodak!
(Source: Kodak) Until 2003, Adventist Health System- headquartered in Orlando, FL-relied on a paper-based filing system to manage medical records. The not-for-profit healthcare system, with over 45,000 employees, wanted to improve access to patient records at all of its 40 hospitals in 10 states. And when they transitioned to an electronic medical records system, the organization wanted to work with the best one-vendor solution for scanners.
Download this white paper go
White Papers
Read up on the latest ideas and technologies from companies that sell hardware, software and services.
Deploying Virtualized NetWare on Linux Whitepaper
Toward More Flexible, Next-Generation Collaboration Solutions
Driving Business Success Through Workgroup Choice and Flexibility
View more whitepapers