WARC This Way

eDiscovery and the ISO 28500 standard for web content collection and preservation


With the ever-growing prevalence of transactions, communications and operations conducted solely on the web, it is important for the eDiscovery community to examine standards for web content collection and preservation. This article discusses the ISO 28500 Web ARChive (WARC) standard and its evidentiary implications for electronic discovery.

Let’s begin by saying that reasonableness is the real standard in most things, including eDiscovery. Put another way, when a standard exists for a given task, the question is whether its risks, costs, timing and workflow are such that it should be followed. Courts applying the reasonableness standard will look at both the ultimate decision and the process by which a party reached it. Of course, each case, each court and each client is different, and adherence to eDiscovery standards and leading practices varies with each risk profile, but it is usually a good idea to adhere to them. The ISO 28500 WARC file format is the internationally recognized standard for website preservation and is therefore a good starting place for eDiscovery practitioners.

Web content is anything delivered over Hypertext Transfer Protocol (HTTP), including websites, social media sites, intranet sites, wikis, blogs and other similar sources. Web content preservation and collection is not new, but the inclusion of web-based content in litigation and regulatory investigations is becoming the norm rather than the exception. For example, in E.E.O.C. v. Original Honeybaked Ham Co. of Georgia, Inc., No. 11-cv-02560-MSK-MEH (D. Colo. Nov. 7, 2012), a case involving allegations of sexual harassment, a hostile environment and retaliation, the court granted, in part, the Defendant’s Motion to Compel and ordered discovery of the class members’ social media (Facebook), text messages and email. Prior to the ruling, the court indicated that class members had utilized “electronic media to communicate” about potentially relevant topics and described that content “as though each class member had a file folder entitled ‘Everything About Me,’ which they have voluntarily shared with others” and that if there was relevant information that could lead to the discovery of admissible evidence within this folder, “the presumption is that it should be produced.” The court further reasoned that the fact that the evidence resided on the web “[i]s a logistical and, perhaps, financial problem, but not a circumstance that removes the information from accessibility by a party opponent in litigation.”

If you happen to work in web archiving for the Library of Congress, or a national library of virtually any country in the world, you are likely aware that the ISO 28500 WARC format is the standard for web content collection and preservation.


As used in this document, “Deloitte” means Deloitte LLP and its subsidiaries. Please see for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
