NewsBite

AI not yet a match for humans in academic endeavour

Artificial intelligence is nowhere near a match for humans in true academic endeavour.

It has been eight months since ChatGPT was unleashed on an unsuspecting academe.

A week later, we used it to complete an assessment task. To practise research, writing and referencing, our first-year science students write a short, popular-level article on the scientific topic of their choosing. We asked the blinking ChatGPT cursor a few questions on the topic “How do viruses jump between species?”

Three minutes and 40 seconds later, a few copy-pastes completed the task. We sent the “article” to a fellow academic without divulging its origin. Although it was incomplete – no references – the quality of the writing was given a “seven or eight out of 10”.

A free online artificial intelligence tool can write, with minimal prompting, coherent paragraphs of text on almost any topic. What otherwise would take hours of research, drafting, fact-checking, writing and rewriting can be done in moments.

Students can complete essays without learning anything. Today, Facebook is trying to sell me a ChatGPT writing course from another university in Sydney.

Well, every educator has asked themselves, now what? Here’s a first-hand account of how this year has played out.

In the new year, a variety of all-staff seminars were convened. Particularly eye-opening was the representative from a certain software company that makes a word processing app who rather bluntly informed us that AI soon would be everywhere. All hope of ignoring our robot overlords is lost.

Plagiarism checkers were alert if not alarmed. As at most universities, we subscribe to a service that compares student assessments to online sources, searching for copy-pasting. It is remarkably effective: we receive a plagiarism percentage and a list of the original online sources. We don’t often make accusations of plagiarism, but when we do they’re bulletproof.

In April the plagiarism checkers announced an AI detector. Students’ essays now receive an AI score estimating how much of their writing was generated by a computer. The problem: we can’t prove it. There is no original. The claimed reliability rate is 98 per cent. But the method used to compute the AI score is a trade secret.

So, we decided to be somewhat scientific about it.

We set our students an extra task: having researched a scientific topic, they were to ask three questions of ChatGPT and copy-paste the answers into their assessment. Then they were to fact-check ChatGPT against their own critically compiled sources of information. We could check the AI checker.

The AI checker turned out to be remarkably good; not perfect but impressive. Upwards of 90 per cent sounds about right.

As we looked in more detail at the articles tagged with a high AI score, a trend emerged: they were terrible.

We’re training future scientists. Scientists reason from evidence: careful experiments and observations. This is often missing from mass-media science: the awful phrase “studies show” betrays a daredevil-worthy leap from nowhere to a conclusion, sailing over several careers’ worth of scientific work in a single bound.

We told our students to dig deeper: what did the scientists do and what did they see?

Could ChatGPT do it? ChatGPT is a language mimic, an exceptionally well-trained parrot. It predicts what word probably comes next, given how you’ve prompted it and what it’s written so far.
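The “parrot” analogy can be made concrete with a toy sketch. The snippet below is a deliberately crude bigram model — nothing like the vast transformer networks behind ChatGPT, which operate over subword tokens — but it illustrates the same core move: predict the next word purely from counts of what has followed it before. The example sentence is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training "corpus" (invented for illustration).
corpus = ("viruses jump between species when a virus mutates and "
          "a virus infects a new host and the new host spreads the virus").split()

# Count, for each word, which words have followed it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("a"))  # -> "virus" ("virus" followed "a" more often than "new")
```

A model like this can produce locally fluent continuations, but it has no notion of evidence or truth — only of what words tend to co-occur. Scaled up enormously, that is still the essential limitation the students ran into.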

Being trained on a mixture of conclusion-heavy news and technical scientific reports, ChatGPT couldn’t find the middle ground of explaining a new idea to a non-specialist. The articles were boring, padded out with the same technical idea restated slightly differently.

ChatGPT could have done better with more specific prompting. But that would have required research – precisely what the AI-bludging students were trying to avoid.

ChatGPT has some writing skills but it is trained on, and aimed at, average. It can rewrite a bad sentence into an OK sentence, and it can rewrite an OK sentence into another OK sentence.

Sometimes it just blundered; for example, by conflating the results of different experiments. On certain medical topics it would wisely refuse to answer the kind of question that should be put to a doctor. When asked “Can you find three scientific papers to learn about comets?”, ChatGPT provided fake references about as often as real ones. Remember: it’s a language mimic, not a research assistant.

What does this mean for the future? Educators have adapted to new technology before. The worry is that AI will evolve faster than our adaptations. As physicists, we take some solace in the fact free AI apps can’t interpret diagrams. But those days are numbered. Specialist knowledge is not beyond AI. And don’t tell our students about the website undetectable.ai.

In this era, generic take-home essays are about as useful as a take-home spelling test. They have become academic fast food.

The future is coming into focus. Assessment design has always been a subtle art, but perhaps the most obvious answer is the oldest: in-person exams.

Universities have been moving away from in-person exams for more than a decade: too rigid, too stressful, too gameable, too inauthentic. (Not always, of course: would you like your medical professional to be able to calculate concentrations before they put that needle of morphine in your arm?)

Ironically, the latest technology may send education back to pen and paper. But in a world of fake expertise, our students need to show they have the real thing.

Dr Luke Barnes is a lecturer in astronomy and cosmology at Western Sydney University. With a PhD from the University of Cambridge, he has published papers in the field of galaxy formation and on the fine-tuning of the universe for life. He is the author, with Professor Geraint Lewis, of “A Fortunate Universe: Life in a Finely Tuned Cosmos” and “The Cosmic Revolutionary’s Handbook: (Or: How to Beat the Big Bang)”, published by Cambridge University Press. Professor Miroslav Filipovic is a scientist at Western Sydney University. Astronomy, science, philosophy and computing are his profession, hobby, interest and passion. Research in astronomy has been a source of fascination for him since the early 1980s. His research interests centre on supernovae, high-energy astrophysics, planetary nebulae, Milky Way structure and mass extinctions, HII regions, X-ray binaries, active galactic nuclei, and the Magellanic Clouds.


Original URL: https://www.theaustralian.com.au/higher-education/ai-not-yet-a-match-for-humans-in-academic-endeavour/news-story/ec8a238d9410813af60fcb8f55d3a3cf