What happens when law students go head-to-head with GenAI?


Source: University of Wollongong Australia

Can generative artificial intelligence (GenAI) outsmart 90 per cent of law students? It’s a claim OpenAI made in 2023 when it announced that GPT-4 had scored higher than 90 per cent of human test takers on a simulated version of the US bar exam.

The experiment
In the second semester of 2023, Dr Alimardani, a lecturer in Law and Emerging Technologies at the University of Wollongong School of Law, was the subject coordinator of Criminal Law. He prepared the end-of-semester exam question and generated five AI answers using different versions of ChatGPT with no special prompting. He generated another five AI answers using a variety of prompt engineering techniques to enhance the responses. After the Criminal Law exam was held at the end of the semester, Dr Alimardani mixed the AI-generated papers in with the real student papers and handed them to tutors for grading.
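The article does not publish the prompts Dr Alimardani actually used. As a minimal illustration of the two conditions, the sketch below contrasts a plain prompt (the exam question pasted in verbatim) with a prompt-engineered variant; the persona and the IRAC (Issue, Rule, Application, Conclusion) answer structure, along with the `build_prompt` helper, are hypothetical examples of common prompt engineering techniques, not his method.

```python
# Sketch of the two prompting conditions, assuming a chat-style model.
# Both templates are illustrative assumptions, not the study's prompts.

PLAIN_TEMPLATE = "{question}"  # exam question pasted in with no extra instructions

ENGINEERED_TEMPLATE = (
    "You are an expert Australian criminal lawyer sitting a law exam.\n"
    "Answer the following question using the IRAC structure "
    "(Issue, Rule, Application, Conclusion), stating the relevant legal "
    "principles and applying them to the facts in detail.\n\n"
    "Question: {question}"
)

def build_prompt(question: str, engineered: bool = False) -> str:
    """Return the prompt text for one of the two experimental conditions."""
    template = ENGINEERED_TEMPLATE if engineered else PLAIN_TEMPLATE
    return template.format(question=question)

# The prompt would then be sent to a model, e.g. via the OpenAI
# chat completions API (requires an API key and the `openai` package):
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user",
#                  "content": build_prompt(q, engineered=True)}],
#   )
```

The point of the contrast is that the model sees identical facts in both conditions; only the framing and requested answer structure differ.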

The results
Dr Alimardani said 225 students took the exam, which was marked out of 60, and that the average mark was about 40 (around 66 per cent).

“For the first lot of AI papers, which didn’t use any special prompt techniques, only two barely passed and the other three failed,” Dr Alimardani said.

“The best-performing paper was only better than 14.7 per cent of the students. So this small sample suggests that if the students simply copied the exam question into one of the OpenAI models, they would have a 50 per cent chance of passing.”

The other five papers, which used prompt engineering techniques, performed better.

What does this mean for students and educators?

Dr Alimardani said none of the tutors suspected any of the papers were AI-generated, and they were genuinely surprised when they found out.

“The AI-generated answers weren’t as comprehensive as we expected. It seemed to me that the models were fine-tuned to avoid hallucination by playing it safe and providing less detailed answers,” Dr Alimardani said.

“My research shows that people shouldn’t get too excited about the performance of GenAI models on benchmarks. The reliability of benchmarks may be questionable, and the way they evaluate models could differ significantly from how we evaluate students.”

Read the full article: https://www.uow.edu.au/media/2024/what-happens-when-law-students-go-head-to-head-with-genai.php
