Enhancing the quality and accuracy of OCR results

Deloitte's Optical Character Recognition tool "DocuMend" helps with identifying and fixing OCR issues

The Deloitte tool uses a sophisticated ensemble approach to countering OCR problems, packaged into an intuitive, no-code graphical interface. DocuMend first identifies issues and then provides users a quick and easy way to correct them.

The Need

Text mining tools powered by natural language processing (NLP) have become indispensable for modern businesses. The quality of an NLP solution depends heavily on the accuracy of the optical character recognition (OCR) engine that supplies its text. OCR uses computer vision algorithms to recognize characters in printed books, handwritten papers or images across a wide range of fonts, sizes, and orientations. With this technology, companies can quickly transform document images into electronic, machine-readable text, the input for many downstream processes, including text mining solutions.

OCR accuracy has improved over the years, and many consider it a solved problem. Nevertheless, errors still occur in practice and, if left undetected, can skew results, especially where machine learning NLP algorithms are involved. Sources with tiny font sizes, blurred copies or colored paper can trip up OCR algorithms. Human readers usually spot the resulting text errors easily, because they can infer from context that something is wrong. NLP algorithms, however, may completely misinterpret the text, causing the text mining tool to fail and forcing someone to find and correct the failed conversion manually.

Our solution: DocuMend

Deloitte DocuMend identifies these issues by assessing the accuracy of OCR layers at both the document level and the individual word level. Users can choose among several OCR engines and set a confidence threshold for OCR quality. The accuracy assessment is overlaid onto the original text in the PDF as a kind of textual heat map that ranks the OCR-processed words from lowest to highest confidence in OCR quality.
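To make the idea of word-level and document-level assessment concrete, here is a minimal Python sketch. It is not DocuMend's implementation, which the text does not describe. The OcrWord type, the confidence scale of 0.0 to 1.0, the use of a mean as the document score, and the three-color heat scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class OcrWord:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]; real engines use their own scales


def document_confidence(words):
    """Document-level score, taken here as the mean word confidence (an assumption)."""
    return mean(w.confidence for w in words) if words else 0.0


def words_below_threshold(words, threshold):
    """Return the words under the user-set threshold, lowest confidence first."""
    flagged = [w for w in words if w.confidence < threshold]
    return sorted(flagged, key=lambda w: w.confidence)


def heat_bucket(confidence, threshold):
    """Map a confidence to a coarse heat-map color (hypothetical three-color scheme)."""
    if confidence < threshold / 2:
        return "red"
    if confidence < threshold:
        return "yellow"
    return "green"


# Made-up sample data for illustration only.
page = [OcrWord("Invoice", 0.98), OcrWord("t0tal", 0.41), OcrWord("amount", 0.77)]
flagged = words_below_threshold(page, threshold=0.8)
for word in flagged:
    print(word.text, word.confidence, heat_bucket(word.confidence, 0.8))
```

Raising the threshold in this sketch flags more words for review, while lowering it leaves only the least reliable ones.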

Until now, quality control of OCR required humans to proofread, finding errors through context or intuition. DocuMend uses multiple OCR engines to cross-validate one another, highlighting discrepancies between the engines as likely sources of error. To correct the identified errors, users navigate through the textual heat-map document or work directly from a list in which the identified errors are ranked from most to least certain, down to the user-specified threshold.
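Cross-validation between engines can be sketched as a simple vote. The following Python example is an illustration under strong assumptions, not a description of how DocuMend works internally: the engine names are placeholders, the function cross_validate is hypothetical, and the engine outputs are assumed to be already aligned word by word, which real OCR outputs generally are not without an extra alignment step.

```python
from collections import Counter


def cross_validate(engine_outputs):
    """Compare aligned word lists from several OCR engines.

    engine_outputs maps an engine name to its list of recognized words.
    All lists are assumed to have the same length and alignment.
    Returns (position, majority_word, agreement) for every position where
    the engines disagree, sorted so the least agreement comes first.
    """
    lengths = {len(words) for words in engine_outputs.values()}
    if len(lengths) != 1:
        raise ValueError("engine outputs must be aligned to the same length")

    discrepancies = []
    for position, candidates in enumerate(zip(*engine_outputs.values())):
        majority_word, votes = Counter(candidates).most_common(1)[0]
        agreement = votes / len(candidates)  # share of engines backing the majority
        if agreement < 1.0:
            discrepancies.append((position, majority_word, agreement))
    return sorted(discrepancies, key=lambda item: item[2])


# Made-up outputs from three placeholder engines.
outputs = {
    "engine_a": ["Total", "amount", "due", "1O0"],
    "engine_b": ["Total", "arnount", "due", "100"],
    "engine_c": ["Total", "amount", "due", "100"],
}
suspects = cross_validate(outputs)
for position, word, agreement in suspects:
    print(position, word, round(agreement, 2))
```

The majority word is only a suggestion in this sketch. The final correction still belongs to the user, as in the workflow described above.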

Advantages/Benefits

Automated assessment of OCR quality across multi-page documents
Visual feedback via word-level confidence heat-maps
Intuitive, no-code graphical interface allows business users to quickly navigate documents and gain confidence in the quality of their OCR layers
Direct user-interaction to correct errors on the spot

Example Use Cases

Improve quality of CV screeners for AI-supported HR recruiting processes
Improve reliability of fit-and-proper screening for banks with regulators
Increase confidence in digital document processing adoption within the business
Lay a strong foundation for countless AI applications making use of text-mining / NLP

Get in touch

David Thogmartin

Harshitha Rao Gandhe
