Use It Or Lose It: How to Get the Most Out of Analytics

By Katharine Perekslis

While hosting analytics and technology-assisted review have been hot topics in the e-discovery world for several years, the technologies themselves matter less than how case teams use specific features to make reviews faster and more efficient.

Keyword Expansion – Through the use of algorithms, you can identify words in close proximity to your key term(s). For example, let’s say a case team is looking for the code words used by custodians in a money laundering case. Utilizing keyword expansion on hot terms will uncover additional words and phrases referencing criminal activity, allowing your analytics index to uncover relevant emails that may have otherwise been missed.
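
To make the proximity idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular analytics platform works: the function name expand_keyword, the window size, and the stopword list are all assumptions made for this example. It simply counts which words show up within a few tokens of a hot term.

```python
import re
from collections import Counter

def expand_keyword(docs, term, window=4, top_n=5, stopwords=frozenset()):
    """Return the words that most often appear within `window` tokens of `term`.

    Hypothetical helper for illustration; real analytics engines use
    richer statistical models than raw co-occurrence counts.
    """
    term = term.lower()
    counts = Counter()
    for text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            # Look at the tokens on either side of this occurrence of the term.
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbor = tokens[j]
                if j != i and neighbor != term and neighbor not in stopwords:
                    counts[neighbor] += 1
    return [word for word, _ in counts.most_common(top_n)]

emails = [
    "Send the laundry to the offshore account tonight",
    "The laundry goes through the offshore shell",
    "Lunch at noon",
]
common_words = frozenset({"the", "to", "through", "send", "goes"})
suggestions = expand_keyword(emails, "laundry", top_n=1, stopwords=common_words)
```

In this toy set, the word that most often travels with the suspected code word "laundry" becomes a candidate for the next round of searches.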

Concept Searching and “Find Similar” – Submitting a sentence or paragraph as a concept search can be more accurate than using keywords when the search term has multiple meanings, such as the word “strike.” Running a concept search on the hot issue in a case and batching out the results—prior to batching out the rest of the data set—will help reveal key documents faster. Furthermore, utilizing “Find Similar” on privileged search terms assists in identifying additional terms that should be run through the privilege filter. This technology also helps with accurately spot-checking reviewers’ work and locating any additional privileged documents missed on first-level review.

Clustering – Weed out non-relevant material in large data sets or identify the most interesting clusters to focus on first when building your case strategy. Clustering allows reviewers to focus on similar documents consecutively for a faster review rate. Locating a large group of emails that contains highly non-responsive material, such as flirtatious banter, fantasy football smack talk, or spam emails, enables the reviewer to quickly tag the group and move on.
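
As a rough illustration of how similar documents can be grouped, the sketch below uses a deliberately simple approach: each document joins the first cluster whose lead document shares enough vocabulary with it. This is a simplification chosen for readability, and the function names and the 0.3 threshold are assumptions for this example. Commercial clustering engines are likely to rely on more sophisticated concept models.

```python
import re

def tokens(text):
    """Lowercased set of words in a document."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Share of vocabulary two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def leader_cluster(docs, threshold=0.3):
    """Group documents by vocabulary overlap with each cluster's first document.

    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each entry: (leader token set, member indices)
    for i, text in enumerate(docs):
        toks = tokens(text)
        for leader, members in clusters:
            if jaccard(toks, leader) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append((toks, [i]))
    return [members for _, members in clusters]

emails = [
    "fantasy football draft picks this weekend",
    "quarterly revenue forecast for the board",
    "fantasy football draft trash talk this weekend",
    "revised quarterly revenue forecast for the board",
]
groups = leader_cluster(emails)
```

Here the fantasy football emails land in one group and the revenue forecasts in another, so a reviewer could tag the first group as non-responsive in a single pass.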

Email Threading and Near Duplicate Detection – Email threading allows reviewers to focus on inclusive emails only and quickly make decisions about other emails in the chain, reducing the overall review volume. Near duplicate detection addresses a common challenge: different email providers render the same email differently, which prevents the copies from being removed by traditional deduplication methods. The issue has become increasingly pressing for case teams as businesses have moved away from Microsoft Outlook to other mail platforms. This tool also helps locate different versions of a document and groups them together.
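
The short Python sketch below shows why hash-based deduplication can miss copies that differ only in rendering, and how a similarity score can still catch them. It is an assumption-laden illustration: the helper names, the three-word shingles, and the 0.8 threshold are chosen for this example and do not describe any specific product.

```python
import hashlib
import re

def exact_hash(text):
    """Fingerprint of the raw text, as used in traditional deduplication."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def shingles(text, k=3):
    """Set of overlapping k-word sequences after normalizing case and whitespace."""
    words = re.sub(r"\s+", " ", text.lower()).strip().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

def is_near_duplicate(a, b, threshold=0.8):
    """Hypothetical cutoff: treat documents at or above `threshold` as near duplicates."""
    return similarity(a, b) >= threshold

# The same message as rendered by two different mail systems.
copy_a = "Please review the attached contract before Friday.\r\nThanks, Sam"
copy_b = "Please review the attached contract before Friday.\n\nThanks,  Sam"
# A later version with one word changed.
revised = "Please review the attached contract before Monday. Thanks, Sam"

hashes_match = exact_hash(copy_a) == exact_hash(copy_b)
```

The two copies produce different hashes because of line breaks and spacing, yet score as identical once normalized. The revised version scores lower, which is how different versions of a document can be grouped together without being treated as the same record.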

Categorization – Categorization finds documents that match the concepts in user-provided examples, using algorithms to score each match and excluding anything that falls below a minimum threshold of similarity. In technology-assisted review, human-coded documents become the examples used to train the system, and the rest of the data set is categorized into responsive and non-responsive buckets. This technology recently helped one of our clients, a two-person defense team in a labor litigation, quickly whittle down the plaintiff’s production from 100,000 records to the 5,000 most relevant documents, saving a significant amount of time and money for their client.
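
The scoring-and-threshold idea can be sketched in a few lines of Python. This is a simplified stand-in, assuming plain word-count vectors and cosine similarity; the function names, sample documents, and 0.5 cutoff are hypothetical, and an actual categorization engine would likely use a richer concept model.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Word-count vector for a document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(doc, examples, threshold=0.5):
    """Return (label, score) for the best-matching example, or (None, score) below the cutoff.

    `examples` maps a category label to a list of human-coded example texts.
    """
    vec = vectorize(doc)
    best_label, best_score = None, 0.0
    for label, texts in examples.items():
        for text in texts:
            score = cosine(vec, vectorize(text))
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score

coded_examples = {
    "responsive": ["overtime wages were not paid to warehouse staff"],
    "non_responsive": ["office holiday party catering menu"],
}
label, score = categorize("warehouse staff overtime wages unpaid", coded_examples)
```

Documents that score below the cutoff against every example are left uncategorized rather than forced into a bucket, which mirrors the minimum-threshold behavior described above.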

As technology improves and data sizes grow, it is essential to take advantage of the benefits of analytics to keep your review organized, focused, and efficient. Initially, analytics may seem to be an expensive add-on to a hosted solution, but the value of reduced review time and the eventual long-term savings can be incalculable.