eDiscovery Sampling - The What, Why, How, When and Who
Discovery Digest – Q3 2012
For eDiscovery practitioners who haven’t tried sampling, it may seem like a leap into the unknown.
- How do they know that they are sampling correctly?
- What if the other side challenges their methods?
- Will the judge find sampling defensible?
When done correctly, sampling can provide specific insights into data populations in ways that are mathematically tested. Before taking the leap, eDiscovery practitioners should understand the different types of sampling, the different ways to approach sampling and different reasons to sample.
For those who haven’t tried it yet, here’s a quick guide to the what, why, how, when and who of sampling.
What is sampling?
There are several different kinds of sampling. Statistical sampling and judgmental sampling are two of the most commonly used types in eDiscovery.
Statistical sampling is a method used to estimate a particular property of a large population by examining only a subset, or a small but representative sample, of the document population that has been chosen randomly. Statistical sampling is not just a guess about what a particular set of files may include. It is a reasonably precise mathematical measurement.
The property in question could be a number or proportion, such as the number or proportion of responsive or privileged documents in a document collection. Numbers and proportions are different ways of describing the same type of information. For example, in a jar with 200 jelly beans, you could say 30 are red or 15 percent are red. Both are accurate and reflect different ways of describing how many red jelly beans are in the jar.
Statistical sampling is random. There are several different ways to create a random sample, but selecting every 10th document from the population is not one of them.
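As a minimal sketch of one way to create a random sample, the example below draws a simple random sample without replacement from a list of document identifiers. The function name `draw_random_sample`, the `DOC-` identifier format, the collection size and the seed value are all illustrative assumptions, not features of any particular review platform.

```python
import random

def draw_random_sample(doc_ids, sample_size, seed=None):
    """Return a simple random sample of document IDs, drawn without replacement."""
    if sample_size > len(doc_ids):
        raise ValueError("sample size cannot exceed population size")
    # Recording the seed lets the same draw be reproduced later if challenged.
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)

# Hypothetical collection of 10,000 documents
doc_ids = [f"DOC-{i:05d}" for i in range(10_000)]
sample = draw_random_sample(doc_ids, 400, seed=2012)
print(len(sample), "documents selected for review")
```

In this sketch every document has the same chance of being selected, which is what allows the sample to stand in for the population.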
Judgmental sampling is a way of spot-checking a chosen subset of a document population. Unlike statistical sampling, it’s not random and your legal team can’t draw mathematical inferences from it.
While judgmental sampling is cheaper than statistical sampling, its uses are also much more restricted. The legal team can only apply conclusions to the specific subset of documents that have been reviewed. Results can’t be applied to a larger population.
Judgmental sampling makes sense when the legal team is familiar with the document set or is trying to identify potential custodians or when a subject matter expert who is familiar with the document set can conduct the sampling.
Why sample?
There are legal and practical reasons to sample. From a legal perspective, the courts are increasingly demanding that litigants use systematic quality control measures throughout eDiscovery, which often include sampling.
From a practical point of view, legal teams can use sampling to estimate how many documents overall in a collection may be responsive or privileged. With the sheer amount of electronically stored information that resides in today’s organizations, it costs too much money and takes too much time to count and review every single file.
Instead, legal teams can use statistical sampling to draw a random sample, then examine each document in the sample and determine how many of the documents are responsive or privileged. With this information, the legal team can statistically estimate how many documents in the total collection are responsive or privileged.
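The arithmetic behind that estimate can be sketched as follows. The function name `estimate_total` and all of the figures are hypothetical, chosen only to illustrate projecting a sample proportion onto a whole collection.

```python
def estimate_total(population_size, sample_size, hits_in_sample):
    """Project the proportion found in a random sample onto the whole collection."""
    proportion = hits_in_sample / sample_size
    return proportion, round(proportion * population_size)

# Hypothetical figures: 1,000,000 documents, 1,500 sampled, 225 found responsive
proportion, estimated = estimate_total(1_000_000, 1_500, 225)
print(f"{proportion:.1%} of the sample is responsive")
print(f"Estimated responsive documents in the collection: {estimated:,}")
```

The projection is only as good as the sample: it assumes the documents were drawn at random and reviewed accurately.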
There are several areas where legal teams can use statistical sampling to overcome eDiscovery challenges or provide a strategic advantage, including:
- Determining whether to restore backup tapes by providing insights into how many potentially responsive files reside on those tapes. This can help the team decide whether to restore the tapes and make a compelling argument to the judge in support of that conclusion.
- Helping to determine proportionality and whether the cost of the review is worthwhile in light of what is at stake in the matter
- Deciding if the team has reviewed enough documents and can stop searching or reviewing any others
- Estimating the effectiveness of specific search terms, methodologies or tools, including human review and new technologies. Through statistical sampling, the legal team can develop the so-called confusion matrix in order to identify how many documents have been correctly coded as responsive (true positives); how many nonresponsive documents have been incorrectly coded as responsive (false positives); how many nonresponsive documents have been correctly coded as nonresponsive (true negatives); and how many responsive documents have been incorrectly coded as nonresponsive (false negatives).
- Guiding and defending the eDiscovery strategy
- Identifying cost-effective and efficient approaches
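The confusion matrix described in the list above can be sketched in a few lines. This example assumes each sampled document carries two yes/no values: how it was coded by the method being tested, and how it was judged on a second, authoritative review. The function names and the counts are hypothetical. Precision and recall are not mentioned above; they are included here as two commonly used summary measures that follow directly from the four counts.

```python
def confusion_matrix(pairs):
    """Tally (coded_responsive, actually_responsive) pairs into the four counts."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for coded, actual in pairs:
        if coded and actual:
            counts["tp"] += 1      # responsive, coded responsive
        elif coded and not actual:
            counts["fp"] += 1      # nonresponsive, coded responsive
        elif not coded and not actual:
            counts["tn"] += 1      # nonresponsive, coded nonresponsive
        else:
            counts["fn"] += 1      # responsive, coded nonresponsive
    return counts

def precision_recall(counts):
    """Precision: share of coded-responsive documents that are truly responsive.
    Recall: share of truly responsive documents that were coded responsive."""
    coded_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / coded_pos if coded_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall

# Hypothetical sample of 200 documents
pairs = ([(True, True)] * 40 + [(True, False)] * 10 +
         [(False, False)] * 140 + [(False, True)] * 10)
counts = confusion_matrix(pairs)
precision, recall = precision_recall(counts)
print(counts, f"precision={precision:.0%}", f"recall={recall:.0%}")
```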
How to conduct sampling?
Once a random subset of documents has been selected, the team reviews the documents and counts how many are responsive or privileged. Then, the proportion of responsive documents is calculated by dividing the number of responsive documents by the total number of documents in the sample. The proportion of responsive documents in the entire population will likely be roughly equal to the proportion of responsive documents in the sample.
If the legal team repeats the process with a new random sample, the estimate of responsive documents will likely be slightly different each time, but each estimate should be close to the actual number.
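A small simulation can illustrate how estimates vary from one random sample to the next. This is a sketch under stated assumptions: the population is artificial, with exactly 15 percent of documents marked responsive, and the function name `simulate_estimates`, the sample size and the seed are hypothetical.

```python
import random

def simulate_estimates(population, sample_size, repeats, seed=None):
    """Draw several independent random samples and return each sample's
    proportion of responsive documents."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        # True counts as 1, so the sum is the number of responsive documents.
        estimates.append(sum(sample) / sample_size)
    return estimates

# Artificial population: 100,000 documents, 15 percent responsive (True)
population = [True] * 15_000 + [False] * 85_000
estimates = simulate_estimates(population, 1_500, repeats=5, seed=1)
for estimate in estimates:
    print(f"{estimate:.1%}")
```

Each printed value should differ a little from the others while staying near the true 15 percent.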
When to sample?
Sampling can be done throughout the lifecycle of eDiscovery. It can be performed early on to determine if a particular custodian or backup tape has potentially responsive information. It can be part of an early case assessment. This allows the legal team to enter into the FRCP Rule 26(f) Meet and Confer meeting with a clearer vision of the scope and depth of the potential discovery and turn that information into a strategic advantage.
Sampling can be done during the review, to see if search terms have been identified correctly and are turning up appropriate documents. It can also be done after the review, to check the accuracy of human reviewers.
It’s important to remember that sampling is not an end unto itself. Instead, it provides another source of information that the legal team can leverage in pursuing the best outcome for the client.
Who should assist with sampling?
When conducting statistical sampling, the legal team should consult statisticians and rely on tested technology. This is complex work, and few attorneys have the necessary background to ensure that their statistical sampling will yield the desired result, determine the appropriate confidence level and margin of error, calculate the proper sample size, draw a random sample of the appropriate size and compute the correct estimate.
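To give a sense of how confidence level, margin of error and sample size relate, the sketch below uses the common normal-approximation formula for estimating a proportion, with a finite population correction. This is one textbook approach and an assumption of this example, not a method prescribed by the article; a statistician may recommend a different one for a given matter. All function names and figures are hypothetical.

```python
import math
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

def sample_size(population_size, confidence_level=0.95,
                margin_of_error=0.05, expected_proportion=0.5):
    """Sample size for estimating a proportion (normal approximation).
    expected_proportion=0.5 is the most conservative assumption."""
    z = z_score(confidence_level)
    p = expected_proportion
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population_size)  # finite population correction
    return math.ceil(n)

def margin_of_error(sample_size, hits, confidence_level=0.95):
    """Approximate margin of error around an observed sample proportion."""
    p = hits / sample_size
    return z_score(confidence_level) * math.sqrt(p * (1 - p) / sample_size)

# Hypothetical collection of 1,000,000 documents
print(sample_size(1_000_000))                        # 95% confidence, +/-5%
print(sample_size(1_000_000, margin_of_error=0.02))  # tighter margin, larger sample
print(f"{margin_of_error(1_500, 225):.2%}")          # 225 responsive out of 1,500
```

Tightening the margin of error or raising the confidence level increases the number of documents that must be reviewed, which is the trade-off the legal team and its statisticians need to weigh.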
By consulting experts in the field, the legal team can understand acceptable practices and gauge their own comfort level for uncertainty or margin for error.
Some attorneys may be nervous about the idea of sampling, since it relies on technology rather than human review. However, it’s important to remember that human reviewers make mistakes, too.
When done correctly, statistical sampling can provide a defensible, cost-efficient tool for eDiscovery.
As used in this document, “Deloitte” means Deloitte Financial Services LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
While the information in this article may deal with legal issues, it does not constitute legal advice. If you have specific questions related to information discussed in this article, you are encouraged to consult an attorney who can investigate the particular circumstances of your situation.