AI BigLaw Bench subtask scores | AutoAdmit.com

The most prestigious law school admissions discussion board in the world.

Date: May 14th, 2025 12:34 AM
Author: tan beady-eyed office idiot

https://www.harvey.ai/blog/expanding-harveys-model-offerings

“In less than a year, seven models (including three non-OAI models) now outperform the originally benchmarked Harvey system on BigLaw Bench,” Harvey wrote in the blog post.
Harvey’s benchmark also showed that different foundation models are better at specific legal tasks than others. For instance, it says Google’s Gemini 2.5 Pro “excels” at legal drafting but “struggles” with pre-trial tasks like writing oral arguments because the model doesn’t fully understand “complex evidentiary rules like hearsay.”
OpenAI’s o3 does such pre-trial tasks well, according to Harvey’s testing, with Anthropic’s Claude 3.7 Sonnet following close behind.

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48928937)

Date: May 14th, 2025 10:53 AM
Author: tan beady-eyed office idiot

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929625)

Date: May 14th, 2025 11:09 AM
Author: massive heady multi-billionaire house

I am surprised Grok is close to 2.5 pro and o3. The steep improvement over year-old models is concerning.

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929687)

Date: May 14th, 2025 11:11 AM
Author: tan beady-eyed office idiot

What do you mean, concerning?

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929698)

Date: May 14th, 2025 11:33 AM
Author: supple honey-headed forum

Looking into this

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929786)

Date: May 14th, 2025 11:40 AM
Author: tan beady-eyed office idiot

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929806)

Date: May 14th, 2025 12:02 PM
Author: Drab aromatic principal's office

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929872)

Date: May 14th, 2025 11:31 AM
Author: Duck-like legal warrant

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929782)

Date: May 14th, 2025 12:02 PM
Author: Passionate Twisted Coldplay Fan Blood Rage

you have to use the reasoning version of grok

regular grok sucks ass

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929874)

Date: May 14th, 2025 4:45 PM
Author: tan beady-eyed office idiot

Is that what they used here?

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48930928)

Date: May 14th, 2025 12:32 PM
Author: Canary Diverse Karate Love Of Her Life

So is gemini 2.5 pro the best at all these law tasks?

BigLaw Bench Core is a set of core tasks for benchmarking baseline legal problem-solving. Core tasks are organized into two primary categories, each encompassing several specific sub-task types:

*Transactional Task Categories*

Corporate Strategy & Advising
Drafting
Legal Research
Due Diligence
Risk Assessment & Compliance
Negotiation Strategy
Deal Management
Transaction Structuring
Regulatory & Advising

*Litigation Task Categories*

Analysis of Litigation Filings
Case Management
Drafting
Case Law Research
Transcript Analysis
Document Review and Analysis
Trial Preparations & Oral Argument

https://github.com/harveyai/biglaw-bench

(http://www.autoadmit.com/thread.php?thread_id=5724720&forum_id=2#48929968)