Vals AI, the LLM evaluator, Announces a Market-First Legal AI Benchmarking Study

Published on October 21, 2024

Vals AI, supported by Legaltech Hub, announced that it is collaborating with a number of top US law firms, including Reed Smith and Fisher Phillips; AI vendors, including Thomson Reuters, LexisNexis, Harvey, vLex, and Vecflow; and an alternative legal service provider (ALSP), Cognia, to produce a new legal AI benchmarking study. The study will evaluate the accuracy and efficacy of some of the legal industry’s most used generative AI platforms. This marks the first time that multiple law firms and vendors have come together to objectively assess legal AI platform performance on real-world examples of legal tasks.

A number of legal AI benchmarking reports have been published over the past year, with varied outcomes: they show AI performing both above and below human baselines, and they show that error rates remain high for certain legal tasks, particularly legal research. These reports raise questions about the utility of generative AI in professional services, leaving lawyers, law firms, and other legal service providers confused and unsure. As a result, multiple industry groups and legal practitioners globally have called for independent benchmarking of legal AI platforms. This study begins to answer those calls.

The new study will focus on specific legal tasks across transactional, disputes, and advisory disciplines, including document-related tasks, case analysis, and legal and market research. For each task, the participant law firms will provide a curated set of non-confidential or publicly sourced legal documents and corresponding questions with model answers. Responses to the questions will be submitted from each of the AI vendors’ platforms as well as a control group of independent lawyers.

Vals AI will use its proprietary auto-evaluation platform to produce a blind assessment of the submitted responses against the model answers. Because the AI platforms differ in their scope of capabilities, vendors may opt out of being assessed for certain task categories. In addition, due to the nature of the legal research tasks, all responses to the research questions will instead be assessed by a group of independent law librarians.

Legaltech Hub, an insights and analysis company that is an independent player in the legal tech market, helped to design the study and curate the list of law firms and vendors involved. It is also contributing to the development of a fair and objective methodology.

The assessment outcome will be based on a scoring rubric that considers the accuracy, usefulness, citations (where relevant), style, and format of the responses. The scores of AI-produced responses will be compared directly with those of responses produced by a control group of lawyers provided by Cognia. This establishes a lawyer performance baseline and makes the study more valuable for law firms and legal practitioners, who need a better understanding of the practical benefits of using AI in their work.

The results of the study will be published in a report later this year at https://vals.ai.

Rayan Krishnan, co-founder and CEO of Vals AI, commented:

“Vals AI, Legaltech Hub and the participating law firms and AI vendors share the common goals of increasing transparency within the legal industry and building trust in generative AI for legal practitioners. Following the release of our LegalBench report evaluating individual model performance on standard legal analyses, the natural next step was to apply our methodology and platform under more real-world conditions. We are thrilled to work with such a prestigious and forward-thinking intra-industry group. In undertaking this new study, we hope to push the market forward in its understanding and adoption of generative AI and offer a new standard for legal AI benchmarking.”

About Vals AI:

Vals AI is a new company spearheaded by a team of computer scientists, law students, consultants, and industry members primarily based at Stanford University. The effort focuses on addressing the critical need for robust benchmarks and recommendations for leveraging generative AI in various industry-specific tasks. By establishing live leaderboards and undertaking rigorous analyses of specific workflows, Vals AI aims to bridge the gap between theoretical advancements and real-world applications. The initiative emphasizes neutrality, ensuring unbiased evaluation and recommendations across models.

About Legaltech Hub:

Legaltech Hub (LTH) is the leading insights and analysis company on legal tech and innovation. Through its directory of legal tech solutions and its subscription, LTH Premium, LTH supports both buyers and vendors of legal tech solutions.

Media contact info:
If you have any questions about the study, please contact Rayan Krishnan ([email protected]) or Nikki Shaver ([email protected]).