Legal Technology in Second Requests: Are You Optimizing Your Data-Driven Workflow?
At TransPerfect Legal’s third annual Antitrust Clearance and Merger Enforcement conference, a panel of esteemed antitrust attorneys and legal technologists examined how the latest developments in eDiscovery and legal technology can streamline the substantial compliance process for a Second Request.
This blog post summarizes the session’s highlights and unpacks the latest legal technology trends that can be a boon to complex and fast-moving eDiscovery matters.
TAR for Privilege
While TAR (Technology Assisted Review) is employed on most Second Requests to streamline the identification of responsive documents, this AI-powered tool is also increasingly being leveraged to optimize the privilege review process. To avoid producing privileged material, search terms often cast a very wide net, resulting in high document counts for privilege review. TAR 2.0, also called CAL (Continuous Active Learning), in combination with privilege search terms, is being used to prioritize the documents most likely to be privileged by pushing them to the top of the review queue.
The benefits of this workflow are twofold. First, it reduces clawbacks by identifying additional privileged documents that did not hit on privilege screen terms. Second, it reduces privilege downgrades by quickly identifying documents that are not privileged despite hitting on overbroad privilege search terms. Fewer clawbacks, of course, mean less friction with both clients and the regulatory agency. The panelists noted that a successful privilege model must be trained not only on documents that are privileged but also on documents that are not. There is also the potential to leverage previous privilege models if certain conditions are met.
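To make the panelists' point concrete, here is a minimal sketch of a privilege-prioritization model. It is an illustration, not any vendor's actual implementation: a toy naive Bayes classifier learns from both privileged and non-privileged examples (all documents and labels below are hypothetical), then ranks the unreviewed population so the likeliest-privileged documents rise to the top of the queue.

```python
# Illustrative sketch: a privilege model trained on BOTH privileged
# and non-privileged examples, used to rank the review queue.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train(docs, labels):
    """Per-class word counts for a naive Bayes privilege model."""
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    for doc, label in zip(docs, labels):
        toks = tokenize(doc)
        counts[label].update(toks)
        totals[label] += len(toks)
    vocab = set(counts[0]) | set(counts[1])
    return counts, totals, vocab

def privilege_score(doc, counts, totals, vocab):
    """Log-odds that a document is privileged, with add-one smoothing."""
    score = 0.0
    for tok in tokenize(doc):
        p_priv = (counts[1][tok] + 1) / (totals[1] + len(vocab))
        p_not = (counts[0][tok] + 1) / (totals[0] + len(vocab))
        score += math.log(p_priv / p_not)
    return score

# Hypothetical reviewed documents with privilege calls (1 = privileged).
reviewed = [
    ("legal advice from counsel on merger terms", 1),
    ("draft indemnification language from outside counsel", 1),
    ("quarterly sales figures for the northeast region", 0),
    ("team lunch scheduled for friday", 0),
]
counts, totals, vocab = train([d for d, _ in reviewed], [l for _, l in reviewed])

# Rank the unreviewed population: highest privilege score first.
unreviewed = [
    "counsel opinion on antitrust exposure",
    "updated shipping schedule for the quarter",
]
queue = sorted(unreviewed, key=lambda d: -privilege_score(d, counts, totals, vocab))
```

In a real matter the model and features would be far richer, but the workflow shape is the same: the scored queue surfaces likely-privileged documents first, and documents that score low despite hitting overbroad privilege terms become candidates for downgrade.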
TAR for Short Message/Chat Data
Short message data, such as Teams messages, text messages, Slack, Telegram, and many others, presents a challenge for review when large volumes are in scope. Previously, short message data was tediously reviewed line by line in an Excel spreadsheet. Today, far better workflows offer more flexibility, including the ability to group messages by day, week, thread, etc.
Short message data was often excluded from the TAR process because individual messages contain too little text. However, as the volume of short message data explodes, regulatory agencies are increasingly nudging parties to add this data to the TAR model. The important detail is that messages need to be grouped into 12- or 24-hour segments before being added to the model. This greatly reduces the review burden for these data types, which were previously subject to manual review. Some panelists cautioned that adding a very high count of message data could dilute an otherwise stable model, but the overall sentiment was that this is a welcome shift.
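The segmentation step above can be sketched in a few lines. The field names (`sent`, `text`) and sample messages are illustrative assumptions; the idea is simply that messages falling in the same fixed 12- or 24-hour window are concatenated into one TAR-ready unit so each unit carries enough text to classify.

```python
# Sketch of grouping short messages into fixed 12- or 24-hour windows
# before they are fed into a TAR model. Field names are illustrative.
from collections import defaultdict
from datetime import datetime, timedelta

def segment_messages(messages, hours=12):
    """Concatenate messages that fall in the same fixed window of
    `hours` into one text unit, returned in chronological order."""
    window = timedelta(hours=hours)
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for msg in messages:
        # Integer index of the window this timestamp falls into.
        idx = (msg["sent"] - epoch) // window
        buckets[idx].append(msg["text"])
    return ["\n".join(texts) for _, texts in sorted(buckets.items())]

messages = [
    {"sent": datetime(2024, 3, 1, 9, 15), "text": "Can you send the draft?"},
    {"sent": datetime(2024, 3, 1, 9, 17), "text": "Sure, one sec."},
    {"sent": datetime(2024, 3, 1, 22, 40), "text": "Got it, thanks."},
]
segments = segment_messages(messages, hours=12)
# The two morning messages share a window; the evening one is separate.
```

Production platforms also offer grouping by thread or conversation, as noted above; fixed time windows are just the variant the panel highlighted for TAR training.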
Consider “Modern Attachments”
The problem of so-called "modern attachments" (sometimes referred to as embedded links) is unlikely to disappear in the near future. Modern attachments are increasingly used in the normal course of business and can pose a significant challenge in a Second Request, or any litigation, where people frequently share documents via hyperlinks rather than attaching a file to an email or chat. The hyperlink usually points to a document in file storage within the company's environment, such as OneDrive or SharePoint. Sending a link instead of the document reduces the size of the email and is generally a more secure way to share documents.
As the prevalence of modern attachments increases, many stakeholders expect the linked documents to be produced with their parent email. However, this is often easier said than done. Microsoft's M365 environment can now pull linked documents within the environment, but the capability has drawbacks: it indiscriminately pulls anything that is linked, which can increase both data volumes and collection time. The panelists emphasized the importance of discussing and strategizing early on about how best to handle modern attachments, given the technological hurdles they present.
Automating Redactions
Redactions can be a pain point in Second Requests, especially when it comes to applying them consistently. Redaction propagation technologies have improved over the last couple of years and are more consistent than past versions. Technologies for redacting video and audio in native format have also improved. Increasingly, transcripts for MS Teams meetings and other audio are automatically generated. If a video contains privileged content, rather than redacting the privileged portion of the video, it may make more sense to redact the automated transcript and produce that instead. The panelists agreed that some level of automated redaction should be strongly considered in large matters like a Second Request.
Generative AI
Responsiveness Review: While experts on the panel agreed that it is too soon to fully leverage generative AI to identify the responsive document set in a Second Request, citing cost, accuracy, and security concerns, it is not inconceivable that it will become part of the workflow within the next couple of years.
One panelist shared the results of preliminary testing. In this test, contract attorney training instructions (the review protocol) were fed into an LLM (large language model), which then sorted the documents into Responsive and Not Responsive sets. The test was run on a dataset previously coded by contract reviewers so that a true comparison could be made. The model categorized its results into three buckets: one where it was highly confident in the coding, a second where its confidence was mediocre, and a third where its confidence was poor. For the documents coded with high confidence, the results were similar to those of a very high-performing TAR model.
A workflow could be envisioned in which the high-confidence set is assumed responsive and flows directly to privilege review, while TAR or search terms are used on the middle- and low-confidence buckets to identify responsive documents within them. Notably, the model was not successful at identifying key documents or privileged documents. While there remains room for improvement, the panelist noted that the model exceeded expectations in this initial test.
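The envisioned triage can be sketched as a simple routing step. The confidence thresholds, document IDs, and stubbed LLM output below are illustrative assumptions, not values from the panel's test; in practice the "mid/low" branch would feed the existing TAR or search-term workflow.

```python
# Sketch of the confidence-bucket triage described above. The cutoffs
# and stubbed model output are hypothetical, for illustration only.
HIGH_CONFIDENCE = 0.9  # assumed threshold for trusting the LLM's call

def route(responsive, confidence):
    """Decide the next workflow step for one LLM-coded document."""
    if confidence >= HIGH_CONFIDENCE:
        # High confidence: treat the coding call as final.
        return "privilege_review" if responsive else "not_responsive"
    # Mid/low confidence: fall back to TAR or search terms.
    return "tar_review"

# Stubbed LLM output: (doc_id, responsive call, model confidence).
llm_calls = [
    ("DOC-001", True, 0.97),
    ("DOC-002", False, 0.95),
    ("DOC-003", True, 0.62),
]
routes = {doc_id: route(resp, conf) for doc_id, resp, conf in llm_calls}
```

The design point is that the LLM never replaces downstream review; it only decides which documents can skip straight to privilege review and which still need a conventional responsiveness pass.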
Prompt Engineering: As GenAI is slowly adopted into responsiveness review workflows, there will likely be a greater focus on the prompts the LLM receives. In the above example, the prompt (the review protocol, in this case) needed to be refined to produce improved results. The panelists emphasized that prompt engineering will become increasingly important to ensure the model gets the best input for the intended results.
A Tempered Approach: Some panelists expressed the need for restraint given the lack of trust many have in LLMs. In addition to myriad blatant errors and hallucinations, LLMs often seem like a black box. The panelists granted that TAR was similarly viewed with skepticism at first, but over time practitioners grew increasingly comfortable; GenAI will likely face similar hurdles. There are also practical barriers, like cost and speed: currently, using GenAI to categorize documents is still prohibitively expensive (and slow) in most instances. However, panelists expect a shift as prices come down, accuracy and speed increase, and overall skepticism surrounding LLMs gradually declines.
As legal technology continues to advance, practitioners are encouraged to embrace these innovations thoughtfully, recognizing their potential benefits while remaining vigilant to new challenges and striking a judicious balance between efficiency and trust.