10 techniques for advanced data culling and document review

How you can reduce data processing and review costs

The Issue

When faced with electronic discovery, companies often tackle massive volumes of data, but must do so in a fashion that is both defensible and cost effective. So, where to begin?

To help manage large data sets and overcome potential obstacles, Deloitte Discovery has established the following essential techniques, in conjunction with subject matter specialists in computer forensics, data analytics, and electronic discovery, to help reduce data processing and review costs.

Data Culling¹

File type analysis and culling

Identifying system files and unique file types allows you to remove files that are unlikely to be relevant. Applying this technique before processing is one of the simplest ways to reduce cost. Examples may include computer aided files, log files, personal multimedia files, database files and non-relevant industry-specific application files. Deloitte Discovery can supply you with a list of the file types in the collection and then discuss their nature with you to identify those that are likely to be irrelevant.
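As an illustration only, file type analysis can be sketched in a few lines of Python. The exclusion list and function names below are hypothetical; in practice the list would be agreed with the legal team after reviewing the file type report.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical exclusion list, for illustration only.
EXCLUDED_EXTENSIONS = {".log", ".mp3", ".mp4", ".dll", ".exe"}

def file_type_report(paths):
    """Count files per lower-cased extension so the list can be discussed."""
    return Counter(PurePath(p).suffix.lower() for p in paths)

def cull_by_file_type(paths, excluded=EXCLUDED_EXTENSIONS):
    """Split paths into (kept, culled) according to the exclusion list."""
    kept, culled = [], []
    for p in paths:
        if PurePath(p).suffix.lower() in excluded:
            culled.append(p)
        else:
            kept.append(p)
    return kept, culled
```

Returning the culled list alongside the kept list, rather than discarding it, keeps a record of what was removed and why.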

Date range analysis and culling

Identifying the date ranges of documents² is another way to remove irrelevant data before processing. Documents without dates can be examined and addressed as necessary.
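A minimal sketch of date range culling follows, assuming each document is represented as a dictionary with a "date" field that holds a datetime.date or None. That representation is an assumption made for the example. Undated documents are set aside for examination instead of being culled.

```python
from datetime import date

def cull_by_date_range(documents, start, end):
    """Split documents into in-range, out-of-range and undated groups."""
    in_range, out_of_range, undated = [], [], []
    for doc in documents:
        doc_date = doc.get("date")
        if doc_date is None:
            undated.append(doc)        # examine separately, do not cull blindly
        elif start <= doc_date <= end:
            in_range.append(doc)
        else:
            out_of_range.append(doc)
    return in_range, out_of_range, undated
```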

Global deduplication

Removing duplicate documents across all custodians within the matter can significantly reduce review volume and greatly enhance review efficiency, productivity and consistency. Information on every custodian who held a copy of a duplicate is captured, so that productions can still be made ‘by custodian’.
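One common way to identify exact duplicates is to compare a cryptographic hash of each document. The sketch below assumes documents are dictionaries with "custodian" and "content" (bytes) fields, which is an illustrative representation. It keeps one copy per hash while recording every custodian who held it.

```python
import hashlib

def global_deduplicate(documents):
    """Keep one copy per content hash and record all custodians who held it."""
    unique = {}
    for doc in documents:
        digest = hashlib.sha256(doc["content"]).hexdigest()
        entry = unique.setdefault(
            digest, {"content": doc["content"], "custodians": []}
        )
        if doc["custodian"] not in entry["custodians"]:
            entry["custodians"].append(doc["custodian"])
    return unique
```

Real platforms typically hash a normalised set of fields for email instead of the raw bytes; that refinement is outside this sketch.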

Email domain name analysis and culling

Identifying all email domains and the frequency with which they occur in the collected population can allow you to remove emails from non-relevant domains prior to processing. This process allows you to eliminate the “spam” that we all subscribe to, e.g. broadcasts from newsgroups.
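The domain analysis can be sketched as a frequency count followed by a filter. The "sender" field and the function names are assumptions made for illustration.

```python
from collections import Counter

def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()

def domain_frequencies(emails):
    """Count how often each sender domain occurs in the population."""
    return Counter(domain_of(e["sender"]) for e in emails)

def cull_by_domain(emails, excluded_domains):
    """Drop emails whose sender domain is on the agreed exclusion list."""
    excluded = {d.lower() for d in excluded_domains}
    return [e for e in emails if domain_of(e["sender"]) not in excluded]
```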

Search term list creation, validation and non-hit culling

Using Boolean, proximity and wildcard operators, together with keyword expansion techniques such as term stemming, fuzzy term logic and concept search, can help refine search term lists. Combining these operations can increase defensibility and efficiency. In practice, rather than relying on very simple keyword searches, think about how you might combine keywords with an operator (e.g. AND, OR, ‘within 5 words of’) or, better still, think of key concepts that help pinpoint relevant documents.
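To make the ‘within 5 words of’ idea concrete, here is a toy proximity operator. It uses prefix matching as a crude stand-in for a trailing wildcard; it is not how any particular review platform implements search, and the function names are hypothetical.

```python
import re

def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, term_a, term_b, distance=5):
    """True if a word starting with term_a occurs within `distance`
    words of a word starting with term_b (prefix match = trailing wildcard)."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w.startswith(term_a)]
    pos_b = [i for i, w in enumerate(words) if w.startswith(term_b)]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
```

For example, within(text, "contract", "terminat") matches "contract" near "terminated" or "termination", which is narrower than searching for either keyword alone.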

In addition, validating search term lists can help reduce the number of documents that go forward to document review. Validation refines the lists by reviewing a statistically valid random sample of both search-hit and non-hit documents. The lists should be assessed and fine-tuned so that they return a broad, but not overly inclusive, population of potentially relevant documents without over-culling the entire universe of documents. Documents that do not hit on the search terms should then be removed before document review begins.
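The sampling step can be sketched as follows. The sample size uses the standard formula for estimating a proportion, n = z²p(1 − p)/e², with a 95% confidence level and a 5% margin of error as illustrative defaults. The appropriate parameters for a given matter are a judgement for the legal team, not something this sketch decides.

```python
import math
import random

def sample_size(margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion (large-population formula)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

def draw_sample(doc_ids, size, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    ids = list(doc_ids)
    return rng.sample(ids, min(size, len(ids)))

def observed_rate(sample_labels):
    """Share of sampled documents that reviewers marked relevant."""
    return sum(sample_labels) / len(sample_labels)
```

Applied to the non-hit population, a high observed rate suggests the search terms are over-culling and need to be broadened before non-hits are removed.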

Advanced Document Review

File type analysis and review population

Similar to the file type analysis during data culling, identifying file types that can be set aside initially can increase review pace. Such file types include multimedia files, very large files, files unique to an industry, foreign language files and more. These files can be separated and addressed by a specialised team trained for the challenges inherent to them. Conversely, you may wish to concentrate your initial review on specific file types (e.g. emails, Word documents or spreadsheets) as the matter dictates.

Custodian communication-frequency analysis and review

Identifying custodians that frequently communicate with key parties can allow you to prioritise them for review. You can base their prioritisation on the likelihood of potentially relevant material within the custodian’s data population.
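A simple way to approximate this analysis is to count, for each person, the messages they exchanged with a key party. The message fields ("sender", "recipients") and the function name are assumptions made for the example.

```python
from collections import Counter

def rank_custodians(messages, key_parties):
    """Rank people by the number of messages they share with a key party."""
    key = {p.lower() for p in key_parties}
    counts = Counter()
    for msg in messages:
        participants = {msg["sender"].lower()}
        participants.update(r.lower() for r in msg["recipients"])
        if participants & key:                  # a key party is involved
            for person in participants - key:   # credit everyone else on it
                counts[person] += 1
    return counts.most_common()
```

The resulting order is only a starting point for prioritisation; volume of contact is a proxy for likely relevance, not a measure of it.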

Clustering and predictive coding

Using technologies to identify potentially non-relevant populations of documents can help with prioritising review, and with sampling and culling documents from review. Clustering organises documents into subsets of recurring themes. These subsets can be evaluated by theme and culled if irrelevant. Meanwhile, predictive coding ranks documents on the assumption that the lower the ranking, the more likely the document is non-relevant. Low-ranking documents can be culled from the review after a statistically valid sample has been reviewed to confirm non-relevance.

Review batch organisation

Working within a document review batch that contains highly similar documents can increase review speed, accuracy and consistency. A batch is a document set that is logically organised by concept, predictive coding ranking or near duplicates. One or more of these technologies can be used to create the review batch protocol.
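As a rough illustration of batching by near duplicates, the sketch below groups documents whose word sets overlap strongly, using Jaccard similarity and a greedy pass. The 0.6 threshold and the function names are arbitrary assumptions; commercial near-duplicate detection is considerably more sophisticated.

```python
import re

def word_set(text):
    """Set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def build_batches(documents, threshold=0.6):
    """Greedily place each document in the first batch whose
    representative (its first member) is similar enough."""
    batches = []  # list of (representative word set, [document ids])
    for doc_id, text in documents.items():
        words = word_set(text)
        for representative, members in batches:
            if jaccard(representative, words) >= threshold:
                members.append(doc_id)
                break
        else:
            batches.append((words, [doc_id]))
    return [members for _, members in batches]
```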

Prioritised relevance review

Utilising document text to identify potentially relevant documents and stage them first for review can help legal teams meet tight production deadlines. The remaining potentially non-relevant population can be sampled to confirm non-relevance and eliminated from the active population, or it can be staged for later review. This type of approach uses statistical modelling that incorporates a user’s (i.e. a lawyer’s) relevance assessment of a document population. By analysing the text and various other characteristics of documents marked as relevant or, conversely, not relevant, the technology sorts the remaining document population and pushes forward the documents it believes are likely to be more relevant.
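The idea can be illustrated with a deliberately simple model: word weights are learned from documents a lawyer has already coded, and the unreviewed documents are then sorted by their total weight. This smoothed log-odds score is a toy stand-in chosen for the example, not the statistical model used by any particular product.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled):
    """labelled: list of (text, is_relevant) pairs coded by a reviewer.
    Returns a smoothed log-odds weight for every word seen."""
    relevant, non_relevant = Counter(), Counter()
    for text, is_relevant in labelled:
        (relevant if is_relevant else non_relevant).update(tokens(text))
    vocab = set(relevant) | set(non_relevant)
    rel_total = sum(relevant.values()) + len(vocab)
    non_total = sum(non_relevant.values()) + len(vocab)
    return {
        w: math.log((relevant[w] + 1) / rel_total)
           - math.log((non_relevant[w] + 1) / non_total)
        for w in vocab
    }

def prioritise(weights, unreviewed):
    """Sort unreviewed texts so the likeliest relevant come first."""
    def score(text):
        return sum(weights.get(w, 0.0) for w in tokens(text))
    return sorted(unreviewed, key=score, reverse=True)
```

As reviewers code more documents, the model can be retrained and the remaining population re-sorted, which is what allows the likely relevant material to be staged first.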

Summary

As the idiom suggests, there are many ways to skin a cat. So too, there are many ways to help reduce your document set and increase your review performance both defensibly and cost effectively.


¹ The entire corpus can be analysed as a whole and also at custodian level to enable dynamic culling rules.
² ‘Documents’ refers collectively to emails and user files unless stated otherwise.

Author

Benny Lee, Director, Forensic
