Modernizing eDiscovery with Automated Processing and ElasticSearch

AEC

Opposing counsel had left a small intellectual property law firm in a tough spot: it faced a finite timeline and over 6 million pages of email to sort through to build its case. Reviewing the data file by file with the firm's team of clerks and analysts was going to take months, if not years.

What Henry Law Firm needed was the ability to quickly filter the mass of email text for very specific content exchanged between very specific custodians. Further complicating matters, the dataset was so large that traditional eDiscovery platforms were prohibitively expensive, with fees climbing into the mid five figures each month.

Knowing that ElasticSearch was the right tool to facet and search large quantities of text data, Lofty Labs built software specifically around the case and email dataset using ElasticSearch, Python, and Django. The solution started with a data cleansing process that codified hundreds of thousands of text files into consistent metadata and indexed that metadata into a managed ElasticSearch cluster. Iterating on designs with the client, Lofty Labs consultants built workflows and tooling that allowed the client’s team to apply their query and tagging process to the dataset. Once the data was cleansed and indexed, Lofty Labs consultants provided statistical and text analysis on the dataset, including de-duplication and relevance distribution.
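The cleansing step described above can be sketched roughly as follows. This is a minimal illustration, not the software Lofty Labs built: it assumes each text file holds one RFC 822 style email, and the index name, field names, and helper functions are all hypothetical.

```python
import hashlib
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

INDEX_NAME = "case-emails"  # hypothetical index name


def _addresses(header_values):
    """Return lowercased addresses from a list of raw header values."""
    return [addr.lower() for _name, addr in getaddresses(header_values) if addr]


def _plain_text(msg):
    """Join the text/plain parts of a message (transfer encodings are not decoded)."""
    if msg.is_multipart():
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        return "\n".join(parts).strip()
    return msg.get_payload().strip()


def parse_email(raw_text):
    """Turn one raw email into a consistent metadata record (assumed schema)."""
    msg = message_from_string(raw_text)
    try:
        sent_at = parsedate_to_datetime(msg["Date"]).isoformat()
    except (TypeError, ValueError):
        sent_at = None  # missing or malformed Date header
    senders = _addresses(msg.get_all("From", []))
    return {
        "from": senders[0] if senders else None,
        "to": _addresses(msg.get_all("To", []) + msg.get_all("Cc", [])),
        "sent_at": sent_at,
        "subject": (msg["Subject"] or "").strip(),
        "body": _plain_text(msg),
    }


def to_bulk_action(raw_text):
    """Wrap a record as an index action, keyed by a checksum of the raw file."""
    doc_id = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return {"_index": INDEX_NAME, "_id": doc_id, "_source": parse_email(raw_text)}
```

Dictionaries with `_index`, `_id`, and `_source` keys are the action shape accepted by the bulk helper in the official Elasticsearch Python client, so a list of these could be passed to it for indexing. Using the file checksum as the document ID is a design choice made here for illustration; it makes re-indexing the same file idempotent.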

Lofty Labs consultants were able to quickly dismiss over 50% of the dataset as duplicated content using checksum hashing and a “bag of words” analysis. The duplication analysis was complex and yielded faceted results: some content was unduplicated, while other emails were duplicated a dozen times or more. That breakdown allowed analysts to investigate in depth any intentional obfuscation of information.
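The two techniques named above can be sketched as follows. The source does not say how the bag-of-words comparison was scored, so the cosine similarity, the 0.75 threshold, and every function name here are assumptions made for illustration.

```python
import hashlib
import math
import re
from collections import Counter, defaultdict


def checksum(text):
    """SHA-256 of the text after collapsing whitespace, for exact duplicates."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def bag_of_words(text):
    """Count lowercase word tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def duplicate_groups(docs):
    """Group document IDs that share a checksum; docs maps ID -> text."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[checksum(text)].append(doc_id)
    return list(groups.values())


def copies_histogram(groups):
    """Facet: how many distinct contents appear once, twice, and so on."""
    return Counter(len(group) for group in groups)


def near_duplicates(docs, threshold=0.75):
    """Return ID pairs whose bags of words are at least `threshold` similar."""
    bags = {doc_id: bag_of_words(text) for doc_id, text in docs.items()}
    ids = sorted(bags)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if cosine_similarity(bags[left], bags[right]) >= threshold:
                pairs.append((left, right))
    return pairs
```

The checksum pass is cheap and catches byte-for-byte copies, while the bag-of-words pass catches replies and forwards that quote an earlier message. The pairwise loop is quadratic in the number of documents, so at the scale described here it would need to be restricted to candidate sets, for example one representative per checksum group.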

In just 90 days, a team of two analysts was able to narrow the items relevant to their case to just under 50,000 documents, a scant 8.3% of the original document set. Ultimately, only 80 documents settled the case. Lofty Labs’ fees to develop the solution were slightly less than one month of hosting fees for comparable eDiscovery tools hosting the same volume of data. This allowed Henry Law Firm to keep the data in a searchable format for many months during the case in the event that new documents were produced, resulting in a return on investment in excess of 600%. After such a large success, Henry Law Firm re-engaged Lofty Labs to build similar software for document production for additional cases.