OCRchie Project Spring 1996

OCRchie: Modular Optical Character Recognition Software

This project was submitted as a senior undergraduate honors project by Kathey Marsden, a Spring 1996 Computer Science graduate from the University of California at Berkeley. This Optical Character Recognition software package began with work in CS169 (Software Engineering) in Fall 1995, under the direction of Professor Richard J. Fateman. The original OCR package could learn from a TIFF file and its ASCII translation, then recognize a document in the same font. This semester we added interactive learning, interactive segmentation of mathematics, page zoning (the ability to automatically or manually zone columns or regions of text), and interactive read-order specification.

The original team members were Archie Russell, James Hopkin, and Cynthia Tian, who contributed significantly to the original design.

Improved/Cut Down Version

(March 2001) Keith Davies (kjdavies@telus.net) has substantially revised this project and made it work for his application, which appears to require reading numeric digits. He also removed the dependence on Tcl/Tk, which had become a troublesome issue: Tcl/Tk has changed across versions to the extent that OCRchie needed changes to continue to run.

In any case, Keith was kind enough to send this material back to us, and we have posted it in this directory. If you make use of it, perhaps you should keep Keith informed as well. Start by reading Keith's email about the changes and re-organization. There is a compressed tarball in that directory that can be moved at your convenience.

-- Richard Fateman

Project Documents

OCR Reference Links

If the links below have moved, you will undoubtedly be able to find many links with a search engine. Here are a few current sites (last updated July 2000).

Various non-technical industry reports and press release reviews on OCR. At least some were written by non-experts, and judging from the errors, may have been scanned in! How else to explain misspelling ASCII as ASCH?
An older (1997) but more respectably academic collection of links on OCR

TIFF Reference Links

Some info on TIFF

Comments? Mail fateman@cs.berkeley.edu