May 26, 2017
Optical character recognition (OCR)
Do you ever struggle to read a friend's handwriting? Count yourself lucky, then, that you're not working for the US Postal Service, which has to decode and deliver something like 30 million handwritten envelopes every single day! With so much of our lives computerized, it's vitally important that machines and humans can understand one another and pass information back and forth. Mostly computers have things their way—we have to "talk" to them through relatively crude devices such as keyboards and mice so they can figure out what we want them to do. But when it comes to processing more human kinds of information, like an old-fashioned printed book or a letter scribbled with a fountain pen, computers have to work much harder. That's where optical character recognition (OCR) comes in. It's a type of software (program) that can automatically analyze printed text and turn it into a form that a computer can process more easily. OCR is at the heart of everything from handwriting analysis programs on cellphones to the gigantic mail-sorting machines that ensure all those millions of letters reach their destinations. How exactly does it work? Let's take a closer look!
What is OCR?
As you read these words on your computer screen, your eyes and brain are carrying out optical character recognition without you even noticing! Your eyes are recognizing the patterns of light and dark that make up the characters (letters, numbers, and things like punctuation marks) printed on the screen and your brain is using those to figure out what I'm trying to say (sometimes by reading individual characters but mostly by scanning entire words and whole groups of words at once).
Computers can do this too, but it's really hard work for them. The first problem is that a computer has no eyes, so if you want it to read something like the page of an old book, you have to present it with an image of that page generated with an optical scanner or a digital camera. The page you create this way is a graphic file (often in the form of a JPG) and, as far as a computer's concerned, there's no difference between it and a photograph of the Taj Mahal or any other graphic: it's a completely meaningless pattern of pixels (the colored dots or squares that make up any computer graphic image). In other words, the computer has a picture of the page rather than the text itself, so it can't simply read the words on the page like we can. OCR is the process of turning a picture of text into text itself: in other words, producing something like a TXT or DOC file from a scanned JPG of a printed or handwritten page.
What's the advantage of OCR?
Once a printed page is in this machine-readable text form, you can do all kinds of things you couldn't do before. You can search through it by keyword (handy if there's a huge amount of it), edit it with a word processor, incorporate it into a Web page, compress it into a ZIP file and store it in much less space, send it by email, and do all kinds of other neat things. Machine-readable text can also be decoded by screen readers, tools that use speech synthesizers (computerized voices, like the one Stephen Hawking uses) to read out the words on a screen so blind and visually impaired people can understand them. (Back in the 1970s, one of the first major uses of OCR was in a photocopier-like device called the Kurzweil Reading Machine, which could read printed books out loud to blind people.)
How does OCR work?
Let's suppose life was really simple and there was only one letter in the alphabet: A. Even then, you can probably see that OCR would be quite a tricky problem—because every single person writes the letter A in a slightly different way. Even with printed text, there's an issue, because books and other documents are printed in many different typefaces (fonts) and the letter A can be printed in many subtly different forms.
Pattern recognition
If everyone wrote the letter A exactly the same way, getting a computer to recognize it would be easy. You'd just compare your scanned image with a stored version of the letter A and, if the two matched, that would be that. Kind of like Cinderella: "If the slipper fits..."
So how do you get everyone to write the same way? Back in the 1960s, a special font called OCR-A was developed that could be used on things like bank checks and so on. Every letter was exactly the same width (so this was an example of what's called a monospace font) and the strokes were carefully designed so each letter could easily be distinguished from all the others. Check-printers were designed so they all used that font, and OCR equipment was designed to recognize it too.
By standardizing on one simple font, OCR became a relatively easy problem to solve. The only trouble is, most of what the world prints isn't set in OCR-A, and no one uses that font for their handwriting! So the next step was to teach OCR programs to recognize letters written in a number of very common fonts (ones like Times, Helvetica, Courier, and so on). That meant they could recognize quite a lot of printed text, but there was still no guarantee they could recognize any font you might send their way.
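To make the idea of comparing a scanned image with a stored letter concrete, here is a minimal Python sketch of pattern recognition. Everything in it is an assumption made for illustration: the tiny 5x5 bitmaps, the names TEMPLATES, match_score, and recognize, and the 90 percent threshold. Real OCR systems work on much larger images and are far more forgiving.

```python
# Hypothetical 5x5 stored patterns: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "A": ["..#..", ".#.#.", "#####", "#...#", "#...#"],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}

def match_score(image, template):
    """Return the fraction of pixels that agree between two same-sized bitmaps."""
    total = 0
    same = 0
    for image_row, template_row in zip(image, template):
        for a, b in zip(image_row, template_row):
            total += 1
            if a == b:
                same += 1
    return same / total

def recognize(image, threshold=0.9):
    """Return the best-matching letter, or None if nothing matches well enough."""
    scores = {letter: match_score(image, tpl) for letter, tpl in TEMPLATES.items()}
    best_letter = max(scores, key=scores.get)
    return best_letter if scores[best_letter] >= threshold else None

# A scan of the stored A with one speck of dirt in the top-left corner.
noisy_a = ["#.#..", ".#.#.", "#####", "#...#", "#...#"]

# An A from a different, rounder "font".
round_a = [".###.", "#...#", "#####", "#...#", "#...#"]

print(recognize(noisy_a))  # the stored font is still recognized
print(recognize(round_a))  # an unfamiliar font falls below the threshold
```

The second call shows the weakness described above: a letter drawn in a font the program has never stored no longer fits the slipper, so simple matching gives up.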
Feature detection

Instead of storing a picture of every possible A, you can describe the letter with a rule: two angled lines that meet at a point at the top, joined by a horizontal line. Apply that rule and you'll recognize most capital letter As, no matter what font they're written in. Instead of recognizing the complete pattern of an A, you're detecting the individual component features (angled lines, crossed lines, or whatever) from which the character is made. Most modern omnifont OCR programs (ones that can recognize printed text in any font) work by feature detection rather than pattern recognition. Some use neural networks (computer programs that automatically extract patterns in a brain-like way).
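Here is a rough Python sketch of feature detection for a capital A. It assumes an earlier stage (not shown) has already reduced the letter to straight strokes, each given as a pair of (x, y) endpoints with y increasing down the page. The rule it checks is two angled strokes that meet at the top plus one horizontal stroke. The function names and the tolerances are hypothetical, chosen only for illustration.

```python
import math

def classify(stroke, tol_deg=10):
    """Label a stroke as 'horizontal', 'vertical', or 'angled' from its slope."""
    (x1, y1), (x2, y2) = stroke
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180
    if angle <= tol_deg or angle >= 180 - tol_deg:
        return "horizontal"
    if abs(angle - 90) <= tol_deg:
        return "vertical"
    return "angled"

def top_point(stroke):
    """Return the endpoint nearest the top of the page (smallest y)."""
    return min(stroke, key=lambda point: point[1])

def looks_like_capital_a(strokes, join_tol=0.15):
    """Check the rule: two angled strokes meeting at the top, one horizontal bar."""
    angled = [s for s in strokes if classify(s) == "angled"]
    horizontal = [s for s in strokes if classify(s) == "horizontal"]
    if len(angled) != 2 or len(horizontal) != 1:
        return False
    (ax, ay), (bx, by) = top_point(angled[0]), top_point(angled[1])
    return math.hypot(ax - bx, ay - by) <= join_tol

# Two differently proportioned As and a letter H, drawn in a 1x1 box.
wide_a = [((0.5, 0.0), (0.0, 1.0)), ((0.5, 0.0), (1.0, 1.0)), ((0.25, 0.5), (0.75, 0.5))]
narrow_a = [((0.5, 0.0), (0.2, 1.0)), ((0.5, 0.0), (0.8, 1.0)), ((0.3, 0.6), (0.7, 0.6))]
letter_h = [((0.2, 0.0), (0.2, 1.0)), ((0.8, 0.0), (0.8, 1.0)), ((0.2, 0.5), (0.8, 0.5))]

print(looks_like_capital_a(wide_a))
print(looks_like_capital_a(narrow_a))
print(looks_like_capital_a(letter_h))
```

Because the rule looks at how the strokes relate to one another rather than at exact pixels, the wide and narrow As both pass while the H does not.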
How does handwriting recognition work?

Making it easy

Forms designed to be processed by OCR sometimes have separate boxes for people to write each letter in or faint guidelines known as comb fields, which encourage people to keep letters separate and write legibly. (Generally the comb fields are printed in a special color, such as pink, called a dropout color, which can be easily separated from the text people actually write, usually in black or blue ink.)
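One simple way to separate a dropout color from handwriting can be sketched in Python as follows. The sketch assumes the scanned form is a grid of (red, green, blue) pixels and that looking only at the red value is enough: pink guidelines are rich in red, like white paper, whereas black or blue ink is not. The sample colors, the threshold, and the name drop_out are made up for illustration and are not taken from any particular scanner.

```python
# Hypothetical sample colors as (red, green, blue) values from 0 to 255.
WHITE = (255, 255, 255)   # blank paper
PINK = (255, 170, 190)    # printed comb field (the dropout color)
BLACK = (20, 20, 20)      # black ink
BLUE = (30, 40, 160)      # blue ink

def drop_out(image, red_threshold=128):
    """Return a grid of 1s (ink) and 0s (paper or dropout color).

    A pixel counts as ink only if its red value is below the threshold,
    so anything bright in red, including pink, disappears.
    """
    return [[1 if red < red_threshold else 0 for (red, green, blue) in row]
            for row in image]

# One row of a scanned form: guideline, paper, two ink marks, guideline.
form_row = [[PINK, WHITE, BLACK, BLUE, PINK]]

print(drop_out(form_row))
```

After this step only the marks the person actually wrote remain, and those can be passed on to character recognition.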
Tablet computers and cellphones that have handwriting recognition often use feature extraction to recognize letters as you write them. If you're writing a letter A, for example, the touchscreen can sense you writing first one angled line, then the other, and then the horizontal line joining them. In other words, the computer gets a head start in recognizing the features because you're forming them separately, one after another, and that makes feature extraction much easier than having to pick out the features from handwriting scribbled on paper.
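The head start a touchscreen gets can be sketched in Python like this. The sketch assumes the screen reports each stroke separately, in the order it was drawn, as a list of (x, y) touch points with y increasing down the screen. The lookup table STROKE_ORDER, the function names, and the angle tolerance are all hypothetical, and real devices use far more robust methods.

```python
import math

# Hypothetical table: the sequence of stroke types someone might draw for each letter.
STROKE_ORDER = {
    ("angled", "angled", "horizontal"): "A",
    ("vertical", "vertical", "horizontal"): "H",
    ("horizontal", "vertical"): "T",
}

def direction(points, tol_deg=10):
    """Label one stroke from the line between its first and last touch points."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180
    if angle <= tol_deg or angle >= 180 - tol_deg:
        return "horizontal"
    if abs(angle - 90) <= tol_deg:
        return "vertical"
    return "angled"

def recognize_strokes(strokes):
    """Look up the ordered stroke types; return None for an unknown sequence."""
    key = tuple(direction(stroke) for stroke in strokes)
    return STROKE_ORDER.get(key)

# Strokes in the order they were drawn: one angled line, the other, then the bar.
drawn_a = [
    [(50, 10), (40, 50), (30, 90)],
    [(50, 10), (60, 50), (70, 90)],
    [(38, 55), (62, 55)],
]
drawn_t = [[(20, 10), (80, 10)], [(50, 10), (50, 90)]]

print(recognize_strokes(drawn_a))
print(recognize_strokes(drawn_t))
```

The strokes arrive already separated and in order, so there's no need to untangle them from a finished image, which is the hard part with handwriting on paper.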