5 Questions About What Government Agencies Should Consider When Collecting Data
An interview with Patrick McColloch, director, Discovery, Deloitte Transactions and Business Analytics (DTBA)
It can be a challenge for government agencies to collect electronic data in response to their litigation and investigation matters. The volume of electronic data, as well as the number of custodians and their far-reaching locations, can complicate the collection process. Here are five questions regarding the collection process that may help you be more efficient in your collection efforts.
Why is it important for to stay flexible during the identification and
The identification of custodians, potential data sources, and data stores is an iterative process. Based on the subject of the complaint or request, you are likely to immediately identify primary sources (e.g., contract cases would lead you to a contracting officer; employment cases would lead you to the human resources office). However, during the subsequent interviews with the initial custodians, you should always ask open-ended questions such as “Who else might have information related to this? What other systems or records groups should I search?” The answers to these questions should be fully documented in your interview notes, and you should follow up on each new potential lead. An iterative process combined with open-ended interview questions will lead to many more data sources and data stores. You should be flexible enough to head down every reasonable trail that presents itself.
This becomes particularly important when a matter covers a long date range. Cases tend to drag out over long periods of time and can also have a start date that goes back some time. Within the government, organizational changes occur over time, and the roles and responsibilities of departments change. Similarly, individuals change jobs through rotations and may even leave an organization through retirement. If you are dealing with a matter that dates back 15 years, you are likely going to have to work back through several different custodians and possibly even different organizational structures. You must be prepared to deal with organizational and employee changes over time.
One tool to assist with this challenge is a data map, or a spreadsheet, that documents the geographic and chronological scope of the matter, the types of data housed at each location, and the custodians in charge of that data over time.
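As a rough illustration of what such a data map could look like in code rather than in a spreadsheet, the Python sketch below records, for each data source, its location, the types of data it holds, and who was in charge of it over time. Every name in it (`Tenure`, `DataSource`, `custodians_for_matter`, the sample office and custodians) is hypothetical and chosen only for this example; it is one possible layout under those assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Tenure:
    """One custodian's period of responsibility for a data source."""
    custodian: str
    start: date
    end: Optional[date] = None  # None means the custodian is still in charge


@dataclass
class DataSource:
    """One row of the data map: where data lives and who held it over time."""
    name: str
    location: str
    data_types: List[str]
    tenures: List[Tenure]

    def custodians_between(self, start: date, end: date) -> List[str]:
        """Return custodians whose tenure overlaps the period [start, end]."""
        found = []
        for tenure in self.tenures:
            tenure_end = tenure.end if tenure.end is not None else date.max
            if tenure.start <= end and tenure_end >= start:
                found.append(tenure.custodian)
        return found


def custodians_for_matter(
    sources: List[DataSource], start: date, end: date
) -> Dict[str, List[str]]:
    """Map each data source to the custodians relevant to the matter's dates."""
    return {s.name: s.custodians_between(start, end) for s in sources}


# Hypothetical sample entry; a real data map would come from custodian interviews.
data_map = [
    DataSource(
        name="Contract files share",
        location="Regional office A",
        data_types=["email", "contracts"],
        tenures=[
            Tenure("Custodian 1", date(2005, 1, 1), date(2011, 6, 30)),
            Tenure("Custodian 2", date(2011, 7, 1)),
        ],
    ),
]

print(custodians_for_matter(data_map, date(2010, 1, 1), date(2012, 1, 1)))
```

Querying by date range makes the handoffs visible: a matter that spans the 2011 change of custodian returns both people, which is the kind of gap an interview with only the current custodian might miss.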
What are some of the strategies to analyze your data for problems and pitfalls once you have collected it?
After you have collected your initial wave of information, it is important to start analyzing the data to help identify collection gaps or additional sources that need to be considered. For example, after emails and other Electronically Stored Information (ESI) have been loaded to a review platform, you can utilize simple analytic features to:
I was expecting less than 5 GB of data, but it has exploded. Now
This challenge is clearly not unique to the government but can be exacerbated by some of the issues identified above.
First, consider refining the keywords used during the collection process. Additional culling can be implemented after your initial collection to reduce the volume. Run search term hit reports to understand which terms brought back
Second, consider using technology assisted review to expand the capacity and throughput of your human review team, so that it can review larger numbers of documents in less staff time and less calendar time. Technology assisted review techniques and tools take the decisions made by humans on relevance and privilege for a small sample, or seed set, of documents and apply those decisions to documents that the technology determines have the same characteristics as the documents in the seed set. This lets your review team look at only a much smaller set of documents and then rely on the technology to systematically apply those decisions to the rest of the collection.
Third, sometimes the volume of relevant data is simply much larger than anticipated before collection, and even if you employ some of the technical solutions discussed, it will still take you much longer than planned to complete your production. Negotiating a schedule of rolling productions with the receiving party is a good way to satisfy both sides. With a rolling production, you agree to produce data in smaller production sets on a set schedule instead of in one large production at the end of discovery. The receiving party gets the benefit of receiving data sooner, and the producing party is able to extend the final discovery deadline. It can be even more effective if the parties agree on the priority and order of the types of data produced, allowing the receiving party to ask for what they most need to work with and the producing party to hand over the “low-hanging fruit” first. This type of communication and cooperation can help avoid discovery disputes.
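To make the seed-set idea more concrete, here is a deliberately tiny Python sketch. It assumes a simple word-count (naive Bayes style) scorer, which is a stand-in chosen for illustration and not the method of any particular review platform; commercial tools generally use more sophisticated models and validation steps. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real tools do far more than this."""
    return text.lower().split()


def train(seed_set):
    """Build word counts from (text, is_relevant) pairs coded by human reviewers."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in seed_set:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def relevance_score(text, model):
    """Return log-odds of relevance; positive leans relevant, negative leans not."""
    counts, docs, vocab = model
    total_docs = docs[True] + docs[False]
    # Smoothed prior from how many seed documents fell in each category.
    score = math.log((docs[True] + 1) / (total_docs + 2))
    score -= math.log((docs[False] + 1) / (total_docs + 2))
    for label, sign in ((True, 1), (False, -1)):
        total = sum(counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in the seed set are ignored
                # Laplace smoothing avoids log(0) for words seen in one class only.
                score += sign * math.log((counts[label][tok] + 1) / (total + len(vocab)))
    return score


# Hypothetical seed set: a handful of documents already coded by reviewers.
seed_set = [
    ("contract award pricing dispute", True),
    ("vendor contract modification terms", True),
    ("office holiday party schedule", False),
    ("parking garage closure notice", False),
]
model = train(seed_set)

# Apply the human decisions to documents nobody has reviewed yet.
for doc in ["revised contract pricing terms", "holiday parking schedule"]:
    label = "likely relevant" if relevance_score(doc, model) > 0 else "likely not relevant"
    print(f"{doc!r}: {label}")
```

In practice the scores would be used to prioritize or sample documents for further human review rather than to make final relevance or privilege calls on their own.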
What is the difference between a forensic image and a forensic
At times, individuals may confuse making a “forensically sound collection” with the act of performing a full forensic image of media. In fact, there is a huge difference in terms of process and, more importantly, the resulting volume of information. Creating a full forensic image of a piece of media (such as a PC hard drive) is a technical approach that employs specialized hardware and software to make a “bit-by-bit” image of the original media. While this is often the simplest approach from a collection perspective, it is analogous to checking out the entire library when all you were interested in were the books dealing with contract law. The resulting volume of information is potentially far greater than what was required and will take expense and effort to cull later.
Conversely, a forensically sound collection methodology simply means that all files are properly collected and the
It should be noted, however, that there may be times when a full forensic image is the best approach. Examples include:
What is structured data and how do you identify it and deal
Each government agency has designed databases and systems that are unique to its mission. During the collection process, you will likely run into applications and databases used by custodians to perform their jobs. Although these systems may not contain the documents that we traditionally search for in response to discovery requests, they can, and likely do, contain information that is relevant and potentially responsive to a request. This type of information is classified as structured data. Structured data is information that is stored in discrete pieces, categorized by type, and organized into groups. A simple way to think of structured data is as information stored in columns (fields) and rows (records). Spreadsheets and databases are examples of structured data.
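The columns-and-rows description can be illustrated with a short Python sketch. The export below is entirely hypothetical (the field names, offices, and values are invented for this example) and stands in for whatever a real agency system would produce; the point is only that each row is a record and each column is a named field that can be filtered directly.

```python
import csv
import io

# Hypothetical export from an agency case-tracking system: one header row
# naming the fields (columns), followed by one line per record (row).
EXPORT = """case_id,office,opened,status
1001,Contracting,2009-03-14,closed
1002,Human Resources,2012-11-02,open
1003,Contracting,2015-07-21,open
"""


def load_records(text):
    """Parse the export into a list of records, each keyed by field name."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_records(records, field, value):
    """Return only the records whose given field exactly matches the value."""
    return [r for r in records if r[field] == value]


records = load_records(EXPORT)
contracting = filter_records(records, "office", "Contracting")

for record in contracting:
    print(record["case_id"], record["opened"], record["status"])
```

Because the data is already organized into fields, a request scoped to one office or one date range can often be answered by filtering records like this instead of by keyword-searching free text.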
Structured data systems are used to house information and facilitate many business functions and can be built in a wide
Once you have identified a structured data system that contains relevant information, a plan of action specific to that
Arguably, data collection is one of the most important phases of the discovery process. Becoming educated about different collection protocols and tools, and staying flexible during the collection phase, are therefore extremely beneficial. As with other phases of discovery, employ a team approach to share the tasking, document all key decisions and assumptions, and rely upon others, such as IT, records management, and subject matter experts, to point you in the right direction.
As used in this document, “Deloitte” means Deloitte Transactions and Business Analytics LLP, an affiliate of Deloitte Financial Advisory Services LLP. Deloitte Transactions and Business Analytics LLP is not a certified public accounting firm. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.