
Scientists and publishing specialists are concerned that the increasing sophistication of chatbots could undermine research integrity and accuracy. Credit: Ted Hsu/Alamy
An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December (ref. 1). Researchers are divided over the implications for science.
“I am very worried,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. “If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics,” she adds.
The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text. Software company OpenAI, based in San Francisco, California, released the tool on 30 November, and it is free to use.
Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have published a preprint (ref. 2) and an editorial (ref. 3) written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them.
The researchers asked the chatbot to write 50 medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then compared these with the original abstracts by running them through a plagiarism detector and an AI-output detector, and they asked a group of medical researchers to spot the fabricated abstracts.
The ChatGPT-generated abstracts sailed through the plagiarism checker: the median originality score was 100%, which indicates that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn’t do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated.
“ChatGPT writes believable scientific abstracts,” say Gao and colleagues in the preprint. “The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.”
Wachter says that, if scientists can’t determine whether research is true, there could be “dire consequences”. Fabricated research is a problem for researchers, who could be pulled down flawed routes of investigation, and it also has “implications for society at large because scientific research plays such a huge role in our society”. For example, it could mean that research-informed policy decisions are incorrect, she adds.
But Arvind Narayanan, a computer scientist at Princeton University in New Jersey, says: “It is unlikely that any serious scientist will use ChatGPT to generate abstracts.” He adds that whether generated abstracts can be detected is “irrelevant”. “The question is whether the tool can generate an abstract that is accurate and compelling. It can’t, and so the upside of using ChatGPT is minuscule, and the downside is significant,” he says.
Irene Solaiman, who researches the social impact of AI at Hugging Face, an AI company with headquarters in New York and Paris, has fears about any reliance on large language models for scientific thinking. “These models are trained on past information, and social and scientific progress can often come from thinking, or being open to thinking, differently from the past,” she says.
The authors suggest that those evaluating scientific communications, such as research papers and conference proceedings, should put policies in place to stamp out the use of AI-generated texts. If institutions choose to allow use of the technology in certain cases, they should establish clear rules around disclosure. Earlier this month, the Fortieth International Conference on Machine Learning, a large AI conference that will be held in Honolulu, Hawaii, in July, announced that it has banned papers written by ChatGPT and other AI language tools.
Solaiman adds that in fields where fake information can endanger people’s safety, such as medicine, journals may have to take a more rigorous approach to verifying information as accurate.
Narayanan says that the solutions to these issues should not focus on the chatbot itself, “but rather the perverse incentives that lead to this behaviour, such as universities conducting hiring and promotion reviews by counting papers with no regard to their quality or impact”.
Nature 613, 423 (2023)
doi: https://doi.org/10.1038/d41586-023-00056-7
1. Gao, C. A. et al. Preprint at bioRxiv https://doi.org/10.1101/2022.12.23.521610 (2022).
2. Blanco-Gonzalez, A. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2212.08104 (2022).
3. O’Connor, S. & ChatGPT. Nurse Educ. Pract. 66, 103537 (2023).
© 2023 Springer Nature Limited