OpenAI, the company behind ChatGPT, released a new classifier tool yesterday to detect AI-generated text. Within a few hours, it proved to be imperfect at best. It turns out that when it comes to detecting generative AI, whether text or images, there may be no quick fix.
Sebastian Raschka, an artificial intelligence (AI) and machine learning (ML) researcher who serves as lead AI educator at Lightning AI, began testing the OpenAI Text Classifier with text snippets from a book he published in 2015. Three different passages received varied results: the tool reported that it was “unclear” whether the book’s preface was written by AI, but the foreword was “possibly AI” and a paragraph from the first chapter was “likely” AI.
Even more concerning was how the tool classified the first page of Shakespeare’s Macbeth:
“The classifier considers the text to be likely AI-generated.”
When asked if he was surprised by the results, Raschka said, “Yes and no — they are not sharing the paper so I can’t say 100% how it works, but based on the short description they have on the website, it sounds like they’re training a classifier to predict whether something is human generated or AI generated.” The problem, he explained, is that the tool will produce false negatives and false positives depending on the dataset it was trained on.
With Macbeth, for example, Raschka said he thinks the tool was not trained on Old English. “It’s not normal spoken English, it’s almost like a foreign language.”
OpenAI says tool can still be useful in tandem with other methods
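OpenAI describes the classifier as a GPT model fine-tuned via supervised learning to perform binary classification, using a training dataset of human-written and AI-written text passages. The company admits it is far from reliable: in its own evaluations, the tool correctly flagged only about 26% of AI-written text as “likely AI-written,” while mislabeling human-written text as AI-written about 9% of the time.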
However, it says the tool can still be useful in tandem with other methods. In an email, the company said, “The classifier aims to help mitigate false claims that AI-generated text was written by a human. However, it still has a number of limitations — so it should be used as a complement to other methods of determining the source of text instead of being the primary decision-making tool.”
The company said on its website that it is making the classifier publicly available “to get feedback on whether imperfect tools like this one are useful,” adding that it will continue working on detecting AI-generated text and hopes “to share improved methods in the future.”
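Because the tool is a binary classifier, it can fail in two different ways: it can miss AI-written text (a false negative) or flag human-written text (a false positive). A minimal sketch of tallying the two separately is shown here. The function name and the counts are hypothetical and are not OpenAI's evaluation data.

```python
def detector_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) for a binary detector.

    tp: AI-written passages correctly flagged as AI
    fn: AI-written passages the detector missed
    fp: human-written passages wrongly flagged as AI
    tn: human-written passages correctly left alone
    """
    true_positive_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return true_positive_rate, false_positive_rate


# Hypothetical counts, chosen only for illustration.
tpr, fpr = detector_rates(tp=30, fn=70, fp=10, tn=90)
print(f"caught {tpr:.0%} of AI-written text, flagged {fpr:.0%} of human-written text")
```

Reporting both rates matters in a setting like grading, where a false positive means a student is wrongly accused.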
Other generative AI detection tools face an uphill battle
OpenAI is far from alone in attempting to deal with the Wild West of generative AI detection. There is a surge of other tools taking a stab at the challenge.
GPTZero, for example, provides a score that then has to be interpreted by the user. In a blog post, Raschka explained: “GPTZero does not recommend whether the text was AI-generated or not. Instead, it only returns the perplexity score for a relative comparison between texts. This is nice because it forces users to compare similar texts critically instead of blindly trusting a predicted label.”
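Perplexity, in the standard definition, is the exponential of the average negative log-probability that a language model assigns to each token; lower values mean the model found the text more predictable. A minimal sketch of that calculation follows. It assumes the per-token probabilities have already been obtained from some language model, the numbers are invented, and it is not GPTZero's code.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Invented probabilities a model might assign to each token of two texts.
predictable_text = [0.9, 0.8, 0.85, 0.9]
surprising_text = [0.2, 0.05, 0.3, 0.1]

print(perplexity(predictable_text))  # lower: the model expected these tokens
print(perplexity(surprising_text))   # higher: the model was often surprised
```

As the quote notes, the number only supports a relative comparison between similar texts; there is no universal cutoff that separates human from machine writing.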
DetectGPT, Raschka explained, “perturbs” the text, making small changes and comparing how probable a language model finds each version. If the probability of the new text is noticeably lower than that of the original, the original is judged to be AI-generated; if it is approximately the same, it is judged to be human-generated. The problem, he added, is that the method involves using a specific LLM (large language model), which “may not be representative of the AI model to generate the text in question.”
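The perturb-and-compare idea can be sketched in a few lines. This is a toy illustration under loud assumptions, not DetectGPT itself: a made-up word-probability table stands in for the LLM, a random word swap stands in for the perturbation step, and every name below is hypothetical.

```python
import math
import random
from statistics import mean

# Stand-in for an LLM: invented per-word probabilities.
TOY_WORD_PROBS = {"the": 0.4, "cat": 0.3, "sat": 0.2, "zebra": 0.05, "pondered": 0.05}


def toy_log_prob(text):
    """Log-probability of a text under the toy model (unknown words get 0.01)."""
    return sum(math.log(TOY_WORD_PROBS.get(word, 0.01)) for word in text.split())


def toy_perturb(text, rng):
    """Replace one randomly chosen word with a random vocabulary word."""
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(list(TOY_WORD_PROBS))
    return " ".join(words)


def probability_gap(text, log_prob, perturb, n_perturbations=20, seed=0):
    """Original log-probability minus the mean log-probability of perturbed copies.

    A clearly positive gap means the perturbed versions look less probable to the
    model, which under this scheme points toward machine-generated text.
    """
    rng = random.Random(seed)
    perturbed_scores = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return log_prob(text) - mean(perturbed_scores)


model_favored = probability_gap("the the the", toy_log_prob, toy_perturb)
model_unlikely = probability_gap("zebra pondered zebra", toy_log_prob, toy_perturb)
print(model_favored, model_unlikely)
```

The sketch also makes Raschka's caveat concrete: the gap depends entirely on which model supplies `log_prob`, so a scoring model that differs from the one that wrote the text can give a misleading answer.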
Watermarking is another approach, he added. The idea is to lower the probabilities of certain words so that they are less likely to be used by the LLM, using an “avoid list.” However, Raschka explained, this requires an LLM that has been modified with this avoid list. If the avoid list is known, he said, one can modify the AI-generated text to evade detection.
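A rough sketch of the avoid-list idea, assuming a single fixed list and a simple score penalty applied before the model picks its next word. Published watermarking schemes are more elaborate, and the function names, scores and penalty value here are hypothetical.

```python
import math


def penalized_probs(scores, avoid_list, penalty=2.0):
    """Turn raw next-word scores into probabilities, pushing down avoided words."""
    adjusted = {
        word: (score - penalty if word in avoid_list else score)
        for word, score in scores.items()
    }
    top = max(adjusted.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - top) for word, score in adjusted.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def avoid_fraction(words, avoid_list):
    """Share of words that are on the avoid list; unusually low hints at a watermark."""
    if not words:
        raise ValueError("need at least one word")
    return sum(word in avoid_list for word in words) / len(words)


# Invented next-word scores and avoid list.
scores = {"quick": 2.0, "fast": 2.0, "rapid": 1.0}
avoid = {"fast"}

print(penalized_probs(scores, set()))   # unmodified model
print(penalized_probs(scores, avoid))   # "fast" becomes less likely
print(avoid_fraction(["quick", "rapid", "fast", "quick"], avoid))
```

A detector that knows the list checks whether avoided words appear less often than expected. The same knowledge lets someone reinsert those words into generated text, which is the weakness Raschka points out.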
What does this mean for generative AI detection?
Raschka said that it is unclear how this will all play out and whether generative AI detection tools will make any headway in overcoming the challenge of discerning between human-created content and AI-generated text. Will the internet itself become unusable, flooded with generated content that is impossible to trust?
“What it means to me, or how I think of the way forward, is that internet was the place where you searched for content and you mostly trusted what you found,” he said. In the future, it will be more about being selective and finding credible websites.
Whatever the future holds, Pandora’s Box is already open when it comes to generative AI, he emphasized — adding that he currently finds ChatGPT useful as a “fancy grammar checker” to make writing easier.
“I don’t think we can go backward,” he said. “Everyone is going to be using these systems and I think it’s fine if we use them responsibly — I don’t think there will be a way of avoiding the use of these models.”
For now, generative AI detection tools are “definitely not good enough” to use for important decisions, he said. That includes efforts to use them in grading student papers in response to fears about cheating and plagiarism.
“Models like this can cause real-world harm due to educators adopting this for grading,” Raschka tweeted yesterday. “So let’s add some transparency about False Positives and False Negatives.”