The DEVONtechnologies Blog

Check the OCR Accuracy

July 4, 2019 — Eric Böhnisch-Volkmann

OCR (optical character recognition), available in DEVONthink Pro, is a technology that examines the pixels in an image or PDF and tries to determine which characters they represent. The recognized characters are then placed in a text layer underneath the original image, so the file looks unchanged but is now searchable. (Note: You can also have the OCR create plain text documents and more.)

However, sometimes a search still doesn't find documents that have had OCR performed on them. Perhaps you received an older document from someone who used a less accurate OCR application. Or maybe the quality of the original image was poor, which made it harder for the OCR to detect the characters.

You can estimate the accuracy of the OCR by selecting the PDF, opening the Inspectors page, and switching to the Concordance inspector. If the text layer is poor, you will see words strung together or even nonsense words.

For a more complete view, you can use Data > Convert > to Plain Text on the PDF and get a copy of the text layer to inspect.
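
If you export the text layer this way, a small script can give a rough count of the same symptoms: words strung together and nonsense words. The following Python sketch is only an illustration and is not part of DEVONthink. The function names `is_suspicious` and `suspicious_ratio`, the 20-character length limit, and the vowel check are all assumptions chosen for the example, and the heuristic will misjudge abbreviations and some languages.

```python
import re

# Assumption: a rough vowel set for English and German text.
VOWELS = set("aeiouyäöü")

# A digit sandwiched between letters, e.g. "c0mputer".
DIGIT_INSIDE_WORD = re.compile(r"[^\W\d_]\d+[^\W\d_]")


def is_suspicious(token: str, max_len: int = 20) -> bool:
    """Heuristically flag a token that looks like an OCR error."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return False  # pure numbers are not judged
    if len(token) > max_len:
        return True  # probably several words strung together
    if len(letters) > 3 and not any(c in VOWELS for c in letters):
        return True  # no vowels at all: likely a nonsense word
    if DIGIT_INSIDE_WORD.search(token):
        return True  # a digit misread in place of a letter
    return False


def suspicious_ratio(text: str) -> float:
    """Return the share of word tokens flagged as suspicious (0.0 to 1.0)."""
    tokens = re.findall(r"\w+", text)
    judged = [t for t in tokens if any(c.isalpha() for c in t)]
    if not judged:
        return 0.0
    return sum(is_suspicious(t) for t in judged) / len(judged)
```

To try it, read the exported plain text file into a string and pass it to `suspicious_ratio`. A value near zero suggests a usable text layer, while a high value suggests the document may be worth running through OCR again. Where to draw that line is a judgment call that depends on your documents.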