Deloitte’s table extraction tool: TableMiner
Deloitte’s table extraction tool “TableMiner” reproduces tables from unstructured (PDF) documents as spreadsheets, taking an all-too-common chore out of the analyst’s daily work.
Sound analysis is based on data… generally, the more, the better. Modern organizations increasingly rely on machines to analyze large volumes of data. For that, the data must be structured, i.e. held in tables and databases that can be queried programmatically. The data age has ushered in widespread availability of structured data. Often, but not always. Some data remains “unstructured”: buried within the narrative of reports, or inserted as tables within published (digital) documents. The data may be available, yet it is not easily accessible for machine-enabled analysis.
The ubiquitous Portable Document Format (PDF) guarantees consistent formatting in a generally compact file size. It is also notoriously unhelpful to those seeking to extract tabular data from its contents. The difficulty lies in the fundamental design of PDFs: they are built to look the same everywhere and to be easy on the eyes, not to be read by machines. Unlike other formats (MS or other Office formats), which store tabular data explicitly as embedded tables, PDFs store tables and text as drawing instructions, essentially positioned characters and lines, much like vector graphics. This preserves the visual layout at the cost of removing context: rows, columns and cell boundaries are not recorded, so formatting and structure are lost when copying and pasting text out of a PDF document. This is already a problem with e-documents (Office documents) saved as PDFs; scans saved as PDFs without embedded OCR (optical character recognition) are even more unwieldy, because they contain only images of the text.
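Because a PDF records where text is drawn rather than which cell it belongs to, any extraction tool has to infer rows and columns from positions. The following is a minimal sketch of that idea, not a description of how TableMiner works: the fragment coordinates, the function name `rebuild_rows` and the `y_tolerance` value are all hypothetical, chosen only for illustration.

```python
# Hypothetical text fragments as a PDF might expose them: (x, y, text).
# Nothing here says which row or column a fragment belongs to, and the
# fragments are not guaranteed to arrive in reading order.
fragments = [
    (50, 700, "Region"), (150, 700, "Revenue"),
    (50, 680, "North"), (150, 680.4, "120"),
    (150, 660, "95"), (50, 660.3, "South"),
]


def rebuild_rows(fragments, y_tolerance=2.0):
    """Group fragments into rows by similar y, then order each row by x."""
    rows = []  # each entry: (reference y of the row, [(x, text), ...])
    # PDF y coordinates grow upward, so sort descending to read top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))   # same visual line as previous fragment
        else:
            rows.append((y, [(x, text)]))   # start a new row
    # Within each row, left-to-right order is given by x.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


table = rebuild_rows(fragments)
for row in table:
    print(row)
```

Even this toy case needs a tolerance to cope with slightly misaligned baselines. Real documents add merged cells, wrapped text and multi-line headers, which is why simple copy and paste so often scrambles a table.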
The result: analysts are left with few options other than to transfer data manually to editable formats (spreadsheets), a labor-intensive and error-prone process. This ties up qualified staff in menial tasks, draining productivity at considerable cost, inviting fatigue-related errors, and leaving less time for value-added analytical work.