Saving Time for Deeper Analysis

Deloitte's form interpreter and extraction tool "Formalyzer" transforms semi-structured forms into tables

Deloitte's Formalyzer tool solves common OCR (Optical Character Recognition) issues by combining multiple Computer Vision and Natural Language Processing (NLP) methods to automate mundane form filling.

The Need

Sound analysis is based on data… often, the more, the better. Organizations have long relied on machines to analyze large volumes of data, and the hunger for data has only increased with the ubiquity of Artificial Intelligence (AI) use cases. For machines to process data, it must be structured as a table or database that can be programmatically queried. Yet data doesn't always come formatted perfectly. Many documents in digital form are "unstructured" narrative or "semi-structured" forms – available, just not easily used. Or the digital documents are mere images – scans without embedded OCR (optical character recognition) – from which content cannot be selected in the first place.

The ubiquitous Portable Document Format (PDF) guarantees formatting consistency in a compact file size. It is also notoriously unhelpful to those seeking to extract tabular data from its contents. The difficulty lies in the fundamental design of PDFs, which are meant to be easily read by humans, not machines. Unlike spreadsheets, PDFs have no notion of rows, columns, or cells: text is stored as characters placed at page coordinates, and table rules are merely drawn lines. Formatted templates, with fields distributed across the page and often mixed with images, are no better. Futile copy-paste attempts leave a chaotic mess of concatenated and often out-of-sequence numbers.
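
To illustrate why position matters, here is a minimal Python sketch of rebuilding table rows from positioned words. It is not part of Formalyzer; the function name `group_into_rows`, the `tolerance` parameter, and the sample coordinates are hypothetical, and the sketch assumes that y grows downward from the top of the page.

```python
from typing import List, Tuple

# (text, x, y): a word and its page coordinates (assumed: y grows downward).
PositionedWord = Tuple[str, float, float]


def group_into_rows(words: List[PositionedWord], tolerance: float = 3.0) -> List[List[str]]:
    """Group words into rows by vertical position, then order each row left to right."""
    rows: List[List[str]] = []
    current: List[Tuple[float, str]] = []
    row_y = None
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        # Start a new row once a word sits more than `tolerance` away from the row's first word.
        if row_y is not None and abs(y - row_y) > tolerance:
            rows.append([t for _, t in sorted(current, key=lambda item: item[0])])
            current = []
            row_y = None
        if row_y is None:
            row_y = y
        current.append((x, text))
    if current:
        rows.append([t for _, t in sorted(current, key=lambda item: item[0])])
    return rows


# Hypothetical words in the scrambled order a copy-paste might produce.
scrambled = [
    ("1,350", 300.0, 101.0),
    ("Costs", 50.0, 120.0),
    ("Revenue", 50.0, 100.0),
    ("750", 300.0, 119.5),
    ("1,200", 200.0, 100.5),
    ("800", 200.0, 120.0),
]
table = group_into_rows(scrambled)
```

A fixed tolerance like this only works for clean, unrotated pages; skewed scans and multi-line cells are where such simple rules break down.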

The result: analysts are too often left with no option other than to manually transfer data into editable formats such as spreadsheets. This labor-intensive and error-prone process ties up qualified staff with menial tasks. It is not only a costly productivity drain; it also invites fatigue-related errors and leaves analysts less time to do what they were hired to do: analyze.

Our solution: Formalyzer

Deloitte’s table extraction tool Formalyzer addresses this very issue, joining multiple Computer Vision and Natural Language Processing (NLP) methods to provide a simple solution to this all-too-common problem. Formalyzer uses a small sample of documents to learn the layout of a particular form. Users "train" the tool's neural networks to recognize where to extract text or numerical values from locations on the page. Just a handful of samples suffice to create the templates that Formalyzer follows. Users may even specify multiple templates – to handle multiple pages, or inconsistent formats, or imperfectly scanned forms – and Formalyzer will use the most successful ones to extract the contents.
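
The idea of applying several templates and keeping the most successful one can be sketched in a few lines of Python. This is an illustration only, not Formalyzer's implementation: the text above describes trained neural networks, whereas this sketch swaps in hand-written rectangular regions. The names `Word`, `Region`, `apply_template`, and `extract`, and the "most non-empty fields wins" score, are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal centre of the word on the page
    y: float  # vertical centre of the word on the page


@dataclass(frozen=True)
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def apply_template(template: Dict[str, Region], words: List[Word]) -> Dict[str, str]:
    """Collect, for every field, the words whose centre falls inside the field's region."""
    record = {}
    for field_name, region in template.items():
        inside = sorted((w for w in words if region.contains(w)), key=lambda w: (w.y, w.x))
        record[field_name] = " ".join(w.text for w in inside)
    return record


def extract(templates: List[Dict[str, Region]], words: List[Word]) -> Dict[str, str]:
    """Apply every template and keep the result with the most non-empty fields (assumed score)."""
    best: Dict[str, str] = {}
    best_score = -1
    for template in templates:
        record = apply_template(template, words)
        score = sum(1 for value in record.values() if value)
        if score > best_score:
            best, best_score = record, score
    return best


# Hypothetical page content and two candidate layouts for the same form.
words = [Word("ACME", 120, 50), Word("GmbH", 160, 50), Word("1,250.00", 400, 300)]
layout_a = {"company": Region(100, 40, 200, 60), "total": Region(380, 290, 450, 310)}
layout_b = {"company": Region(100, 140, 200, 160), "total": Region(380, 290, 450, 310)}
result = extract([layout_b, layout_a], words)
```

Here `layout_b` looks for the company name in the wrong place and fills only one field, so the record produced by `layout_a` is kept.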

The templates then equip Formalyzer to process thousands of similar forms, extracting the values distributed across the pages into the fields of a database. The input forms may be either PDFs with embedded text (OCR layer) or so-called “dirty scans” (only images). The intuitive graphical user interface guides the user through training on new templates, uploading documents for processing, viewing individual results on-screen, and exporting results for data processing in other applications.

Advantages/Benefits

Analysts can focus on analysis rather than data collection and aggregation
Fewer transcription errors
Automatically finds variables and their associated values
Stores these as a table in a CSV or other file type
Reads thousands of documents via batch processing
Flexible enough to take on new and multiple formats
Can be implemented anywhere: on a public or private cloud, or on local machines
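
As a rough picture of the CSV output mentioned above, the following Python sketch writes a batch of extracted records to one table using only the standard library. The function name `records_to_csv`, the field names, and the sample records are hypothetical and do not reflect Formalyzer's actual export format.

```python
import csv
import io
from typing import Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, str]], fieldnames: List[str]) -> str:
    """Write one row per processed document; fields a document lacks are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # value used when a record has no entry for a column
        extrasaction="ignore",  # silently drop keys that are not listed as columns
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical results from two processed forms; the second one has no total.
batch = [
    {"document": "a.pdf", "company": "ACME GmbH", "total": "1,250.00"},
    {"document": "b.pdf", "company": "Beta AG"},
]
csv_text = records_to_csv(batch, ["document", "company", "total"])
```

Values that contain a comma, such as the total in the first record, are quoted automatically by the `csv` module, so the columns stay aligned when the file is opened in a spreadsheet.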

Example Use Cases

Facilitating balance sheet analysis (e.g., for underwriting SMEs and corporates)
Reading tax forms and other non-tabular forms
Automating data entry (such as from Energy Performance Certificates)
Integrating into existing workflows: ingesting scans and sending results to the next process step

Here you can download the Formalyzer fact sheet

Get in touch

David Thogmartin

Harshitha Rao Gandhe
