A.I. and the Future of Cheating
What happens when universities can’t tell whether an essay is written by a human or an algorithm?
No matter whether you were a straight-A student at university or more a student of beer pong, it’s extremely unlikely that your fondest memories of college took place in an examination hall. Beyond being generally miserable, exams exacerbate anxiety and other mental health issues, and do a poor job of assessing skills like critical thinking and creativity. Yet time-pressured tests serve as the key filter for entry to several prestigious professions and universities, and, some argue, for no good reason.
Given this sad state of affairs, it should be welcome news that supervised exams and tests are slowly falling out of vogue. Headmasters and professors have urged that more flexible, less time-pressured assessments like essays and written assignments replace exams. Singapore, a world leader in exam-based education, has abolished exam rankings (albeit only for primary grades). At the same time, online education has surged, with enrollment in online courses quadrupling over the last 15 years.
Unfortunately, this trend towards online and written assessments has an adversary: artificial intelligence. We’re not far from a future where students will have access to sophisticated A.I. tools with the ability to “write” high-quality essays or assignments on their behalf.
Earlier this year, OpenAI (an A.I. research company founded by Elon Musk and Sam Altman) warned about the dangers of its new text-generation algorithm, GPT-2. At the time, OpenAI was uncomfortable with just how good GPT-2 had become at generating sophisticated text in response to prompts. OpenAI trained GPT-2 on 8 million web pages that had been curated by users of Reddit (the poor A.I.).
If a university has no way of determining whether an assignment was written by a human or an algorithm, existing grading systems lose any semblance of meritocracy or fairness.
Of course, most students don’t want to cheat their way through university. But cheating is becoming more common. A study of top U.K. universities found 3,721 cases of academic misconduct in 2016–2017, up 40% from the previous study two years earlier. We’re looking at a possible future where lazy or dishonest students can access A.I. tools that allow them to outperform other students without effort and without evidence of actual learning. At the other end of the grade curve, top students might also be incentivised to cheat, eager to get an edge over other high achievers in competitive fields like medicine and law.
The power of neural networks right now
I fed GPT-2 the following sentence:
Nervous of how the A.I. might be misused by bad actors, OpenAI to date has only released earlier and weaker iterations of GPT-2.
Here is what it returned:
At the moment, GPT-2 uses a binary search algorithm, which means that its output can be considered a ‘true’ set of rules. If OpenAI is right, it could eventually generate a Turing complete program, a self-improving machine that can learn (and then improve) itself from the data it encounters. And that would make OpenAI a threat to IBM’s own goals of machine learning and AI, as it could essentially make better than even humans the best possible model that the future machines can use to improve their systems. However, there’s a catch: not just any new AI will do, but a specific type; one that uses deep learning to learn the rules, algorithms, and data necessary to run the machine to any given level of AI.
Believe it or not, I didn’t edit this response at all. You can try GPT-2 yourself at Talk to Transformer. Admittedly, the writing sample above is not amazing: it contains a few writing errors and sounds more coherent than it proves to be upon inspection. Irene Solaiman, senior researcher at OpenAI, confirmed to OneZero that existing text-generating A.I. models like GPT-2 were limited in capacity, particularly when asked to complete long passages of text. Solaiman doubted that GPT-2 could be used to produce passable academic essays without significant modification or improvement.
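GPT-2 is a large neural network, but the task it performs is conceptually simple: predict the next word from the words so far, then repeat. The sketch below illustrates that loop with a toy Markov-chain generator. To be clear, this is not GPT-2’s actual architecture, just the crudest possible version of next-word prediction; the function names and `order` parameter are my own.

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, seed, length=20):
    """Extend `seed` (a tuple of `order` words) one word at a time,
    sampling each next word from what followed that context in training."""
    out = list(seed)
    for _ in range(length):
        candidates = model.get(tuple(out[-len(seed):]))
        if not candidates:
            break  # context never seen in training; stop generating
        out.append(random.choice(candidates))
    return " ".join(out)
```

A model built this way can only parrot word transitions it has already seen, which is why its output degrades so quickly; GPT-2’s leap is generalising that same predict-and-continue loop far beyond verbatim memory.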
However, the OpenAI report on GPT-2 does quote research that demonstrates that humans can be fooled by A.I.-generated text. In one study, GPT-2 text samples were almost as convincing as actual New York Times articles (72% of respondents rated the GPT-2 samples as “credible” compared to 83% for the New York Times). It’s not hard to imagine even an imperfect A.I., perhaps a successor to GPT-2, able to outperform its fair share of beer pong aficionados in universities.
In addition, it’s unlikely that sophisticated text-generating A.I. will remain under lock and key with secure custodians like OpenAI. For one thing, some of their technology may have been recreated already by a couple of computer science graduates. For another, OpenAI expects to share the next most advanced version of GPT-2 with the world in just a few months. While OpenAI should be applauded for their slow, careful rollout so far, the absence of large-scale malicious use to date doesn’t preclude misuse in the future, as the GPT-2 report notes. Relatively small-scale instances of cheating using tools like GPT-2 may also escape notice in a way that large-scale misuse (like weaponized misinformation campaigns) would not.
Given the potential demand from students, it may only be a matter of time before entrepreneurial programmers develop dedicated text-generating A.I. aimed at them. The tools universities currently use to check the integrity of student work are laughably ill-equipped to cope with A.I.-written assignments. Academic integrity checkers like Turnitin scan solely for plagiarism, an approach that already fails to catch students who pay others to write original essays for them. A.I. could exploit this loophole at scale: although text-generating models draw on the material they were trained on, they do not plagiarise per se, and so would avoid detection by existing anti-cheating tools.
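To see why generated text slips through, it helps to know that plagiarism detection ultimately boils down to matching overlapping spans of text against a database of sources. The sketch below is a toy version of that matching (Jaccard similarity over word trigrams, not Turnitin’s actual algorithm, and the names are mine):

```python
def ngrams(text, n=3):
    """Set of overlapping word n-grams in lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(submission, source, n=3):
    """Jaccard similarity of n-gram sets: 1.0 = identical text, 0.0 = no shared spans."""
    a, b = ngrams(submission, n), ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A copied essay shares nearly all its trigrams with the source and scores high; a freshly generated essay on the same topic shares essentially none, so any threshold-based checker of this kind never flags it.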
Thankfully, for the time being, this remains a hypothetical issue. A.I. may be many years away from producing considered responses to complex questions in a way that would pass as believably human. In the meantime, researchers are developing tools to neutralize the threat to academia. A.I. experts at Harvard and MIT are working on a tool to identify predictable word patterns in order to detect text written by A.I., even if that text would fool a human reader into thinking it was genuine. Another tool in development attempts to detect A.I.-generated text by scanning for common machine errors, like incorrect expressions and untranslated phrases.
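The predictable-word-pattern approach rests on a simple observation: machine-generated text leans heavily on statistically “safe” word choices, while human writing is more erratic. The sketch below is a stdlib-only caricature of that idea, using word-frequency rank in a reference corpus as a crude stand-in for a language model’s probabilities; the function names and `top_k` threshold are illustrative assumptions, not how the real detectors are built.

```python
from collections import Counter

def frequency_ranks(corpus):
    """Rank each word by frequency in a reference corpus (1 = most common)."""
    counts = Counter(corpus.lower().split())
    return {w: r for r, (w, _) in enumerate(counts.most_common(), start=1)}

def predictability(text, ranks, top_k=10):
    """Fraction of words drawn from the corpus's top_k most common words.
    Uniformly 'safe' word choices push this score toward 1.0."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if ranks.get(w, float("inf")) <= top_k)
    return hits / len(words)
```

A real detector would score each word by an actual model’s predicted probability rather than raw frequency, but the flag is the same: text whose every word is a high-probability choice looks machine-made.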
Developing an effective solution at scale for universities is vital — without an adequate defense, the academic sector risks being undermined as text-generating A.I. becomes more advanced. If lazy students can simply pay for A.I. tools to write them brilliant essays, and universities cannot tell which responses are written by a human and which are not, any grading system that incorporates those assessments is compromised. One possible outcome is that universities will be forced to cut any assessment that allows for the possibility of A.I. assistance. That would mean an end to written assignments, essay responses, and absolutely anything assessed online.
However, a more positive long-term alternative was suggested to OneZero by Miles Brundage, a research scientist at OpenAI. Brundage speculates that A.I. tools could eventually be incorporated into the classroom in the same way that calculators have been.
“If it is not being used in an illicit fashion, A.I. can be a boon for generating larger, more ambitious, more creative text documents — similar to how calculators have been used to help develop the skills we want to be teaching,” says Brundage.
Calculators have spared maths and science students the laborious pain of long calculations and allowed for more advanced syllabuses, designed with the knowledge that students will have calculators to hand. Similarly, universities might evolve by planning courses that take advantage of students’ access to A.I.
Whether universities fight the technological advance of text-generating A.I. or embrace it, there is a lot at stake — with consequences much more serious than the infliction of more exams on the student body. The great promise of online courses to provide advanced education at scale in the developing world may never be realized if every assignment completed online is inherently suspect, haunted by the spectre of sophisticated A.I.
As ever, A.I. represents a Pandora’s box of challenges and opportunities. On the one hand, text-generating models like GPT-2 might assist writers and help maximize creativity. But with academic cheating, A.I. threatens the opposite — eliminating the need for human effort and compromising a grading system dependent on a level playing field. How the nascent field of text-generating A.I. develops (and how universities respond) will matter for straight-A students and beer pong experts alike.