OCR was supposed to be an essential tool for turning scanned documents from images into searchable text content. Does it, or has it still failed to deliver?

The best method for teaching children to read is hugely controversial, but once we have mastered the skill, the fastest and best readers recognise not individual letters, or even words, but altogether larger chunks of text. Even when working through faded typescript, scrawly handwriting, or the most extreme of fonts, most of us read accurately and at speed. So how, after decades of research investment, can we still not match that in software? Is Optical Character Recognition (OCR) software consistently inferior in its reading accuracy, and if so, how?

OCR was touted as central to the strategy for a paperless office. Recognising that companies and organisations have voluminous legacies of paper records, the solution was a combination of expensive high-speed flatbed scanners and state-of-the-art OCR software that would automatically convert the contents of those thousands of pages into searchable text. In the late 1980s, vendors like Caere Corporation made handsome profits from OmniPage and its competitors before the market cooled.

Offices that had invested substantial sums in scanners and OCR products discovered the real meaning of 98% character accuracy: on an average page there would be ten or more errors, and when scan images or original source material were less than perfect, every other word could be garbled. Few had even considered the effort or cost involved in proof-reading and correcting OCR output, making content searches a gamble at best. When you are looking for the word ‘contract’, you will miss those instances in which it has been misrecognised as ‘contact’, a problem that fuzzy search features have tried to alleviate.

HP’s flagship OCR engine, Tesseract, had further development frozen in 1995, despite being one of the best three performers. It was released as open source ten years later.

But OCR is far from dead. The App Store offers over 50 products, including relatively expensive heavyweights like ABBYY’s FineReader OCR Pro and ExactScan Pro, each around £70-£90.
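
To put rough numbers on the 98% character accuracy figure, and on the ‘contract’ versus ‘contact’ search problem, here is a minimal Python sketch. It is an illustration only: the 2,000 characters per page, the assumption that errors strike characters independently, the function names, and the 0.8 similarity cutoff are all assumptions made for this example, and the fuzzy matching uses Python's standard `difflib` module, not the method of any particular OCR or search product.

```python
import difflib


def expected_errors(chars_per_page: int, char_accuracy: float) -> float:
    """Expected number of misrecognised characters on one page."""
    return chars_per_page * (1.0 - char_accuracy)


def word_intact_probability(word_length: int, char_accuracy: float) -> float:
    """Chance that every character in a word is recognised correctly.

    Assumes each character fails independently, which real OCR errors
    do not strictly obey (poor scans tend to cluster errors).
    """
    return char_accuracy ** word_length


def fuzzy_find(term: str, ocr_words: list[str], cutoff: float = 0.8) -> list[str]:
    """Return OCR output words whose similarity to the search term meets the cutoff."""
    # n must be at least 1, even when the word list is empty.
    limit = max(1, len(ocr_words))
    return difflib.get_close_matches(term, ocr_words, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    # Hypothetical page size: 2,000 characters is an assumed figure.
    errors = expected_errors(2000, 0.98)
    print(f"Expected errors per page: {errors:.0f}")

    # How often an eight-letter word such as 'contract' survives intact.
    intact = word_intact_probability(len("contract"), 0.98)
    print(f"Chance 'contract' is read correctly: {intact:.1%}")

    # An exact search for 'contract' misses 'contact'; a fuzzy search finds it.
    ocr_words = ["contact", "office", "scanner"]
    print("Exact match found:", "contract" in ocr_words)
    print("Fuzzy matches:", fuzzy_find("contract", ocr_words))
```

Under these assumptions, 98% accuracy leaves dozens of wrong characters on a dense page, which fits the ‘ten or more errors’ that offices found. The fuzzy search recovers ‘contact’ when asked for ‘contract’, but lowering the cutoff will also pull in words that are merely similar, so it trades missed hits for false ones.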