Delving into “delve”

Figure 1

It turns out that ChatGPT tends to overuse certain words and phrases, including “delve”. According to one post, “delve” is among the 10 most common words found in text returned by ChatGPT.

If scientific authors use ChatGPT to assist them in writing their papers, then it is likely that these common ChatGPT words will appear, especially in introductions, abstracts and possibly titles.

Indeed, Jeremy Nguyen (@JeremyNguyenPhD) finds that mentions of “delve” in papers on PubMed have greatly increased since 2023 (when ChatGPT became widely available). He has tweeted about this: https://twitter.com/JeremyNguyenPhD/status/1774021645709295840

This piqued my curiosity. However, raw counts need to be normalized by the total number of papers published each year to show how the proportion of papers using “delve” has grown. So, I took a look, using OpenAlex, an open source platform that indexes papers and other scholarly outputs.

Here is what I found. Although the use of “delve” was gradually increasing through to 2022, it jumped noticeably in 2023 (when ChatGPT became widely available) and has continued to increase in 2024 (data is for part year, latest point available at time of analysis). See Figure 1. This chart shows both the count and the normalized percentage of papers with “delve” in their title or abstract, by publication year. The source is OpenAlex, filtered by articles (which includes peer-reviewed journal papers and preprints).
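The normalization behind the percentage line can be sketched as follows. This is a minimal Python illustration, not the code used to produce Figure 1. The filter names in the URL (`type`, `from_publication_date`, `title_and_abstract.search`) and the shape of the `group_by` rows are assumptions about the OpenAlex API that should be checked against its documentation. The helper names are hypothetical, and the sample rows are placeholders rather than real counts.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"  # assumed endpoint


def grouped_count_url(search_term=None):
    """Build a URL requesting article counts grouped by publication year.

    With no search term the URL describes the denominator (all articles);
    with a term it describes the numerator (articles mentioning the term).
    """
    filters = ["type:article", "from_publication_date:1990-01-01"]
    if search_term:
        filters.append(f"title_and_abstract.search:{search_term}")
    params = {"filter": ",".join(filters), "group_by": "publication_year"}
    # Keep ':' and ',' readable instead of percent-encoding them.
    return OPENALEX_WORKS + "?" + urlencode(params, safe=":,")


def counts_by_year(group_by_rows):
    """Convert rows like {"key": "2023", "count": 40} into {2023: 40}."""
    return {int(row["key"]): row["count"] for row in group_by_rows}


def percent_by_year(term_counts, total_counts):
    """Percentage of each year's articles that mention the term."""
    return {
        year: 100.0 * term_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


# Placeholder rows (NOT real OpenAlex results), used only to show the flow.
sample_term_rows = [{"key": "2022", "count": 5}, {"key": "2023", "count": 40}]
sample_total_rows = [{"key": "2022", "count": 1000}, {"key": "2023", "count": 2000}]

shares = percent_by_year(
    counts_by_year(sample_term_rows), counts_by_year(sample_total_rows)
)
print(grouped_count_url("delve"))
print(shares)
```

Dividing by the yearly total is what separates a genuine rise in the use of “delve” from the general growth in the number of papers indexed each year.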

Some 30,276 (or 46%) of all the 66,158 papers that mention “delve” posted on OpenAlex between 1990 and March 31, 2024 came out in the 15-month period from January 2023 to the end of March 2024.
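As a quick check on that share, the arithmetic can be reproduced in a few lines of Python. The two counts come from the paragraph above; the comparison with the share of calendar months is an added illustration, and it is only a rough yardstick because the number of papers published per year also grows over time.

```python
# Counts quoted above (OpenAlex, articles mentioning "delve").
recent_mentions = 30_276   # January 2023 to end of March 2024
all_mentions = 66_158      # 1990 to March 31, 2024

share_pct = 100 * recent_mentions / all_mentions

# Rough yardstick: share of calendar months covered by the recent period.
total_months = (2024 - 1990) * 12 + 3   # Jan 1990 through Mar 2024
recent_months = 15
months_pct = 100 * recent_months / total_months

print(f"Share of 'delve' papers in the recent period: {share_pct:.1f}%")
print(f"Share of months in the recent period: {months_pct:.1f}%")
```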

The analysis in Figure 1 is for all topics, i.e. the universe of records of papers included in OpenAlex (168.3M articles, 1990-2024 to date). A technical point: there is duplication in this dataset. For example, some preprints may subsequently appear in journals. Duplication occurs in both the numerators and the denominators. Be aware of this, particularly when interpreting the counts.

Why does ChatGPT use “delve” when it responds? Some suggest it is in an effort to appear “authoritative”; see https://twitter.com/GaryMWatson/status/1774090530227446050

Is this an issue? Well, perhaps not. If researchers use ChatGPT to polish their language or assist in preparing an abstract, it could be helpful. Authors should indicate that they have used ChatGPT to assist in revising their text (and respect editorial guidelines for using AI in journal submissions).

The share of all papers where authors have used ChatGPT to assist their writing is likely to be rather higher than the percentage of papers using common ChatGPT words such as “delve”.

But if the rising mention of “delve” is a sign that authors are increasingly using ChatGPT not to polish but to actually write (or substantially write) their papers, this does raise concerns, especially if those papers do not explicitly acknowledge that ChatGPT or another generative AI model has been used.

However, be careful. Searching for mentions of “delve” at an aggregated level does indeed show a pattern that is consistent with ChatGPT use. Yet, for a specific individual paper, the use of “delve” in the title or abstract does not prove that those specific authors have used ChatGPT. They just might like the word!

There are also perfectly regular uses of “delve” in scientific works, for example the use of delve in reference to certain star clusters, dark-matter galaxy searching, a research archive name, software apps or debuggers, and area or person names.

So, what is of interest here is not the normal or “base” level of use of “delve”, which (from the OpenAlex analysis) grew slowly from 0.31 to 0.56 per 1,000 papers between 2018 and 2022 (the five-year period before the widespread availability of ChatGPT). Rather, it is the rapid increase in “delve” mentions from 2023 onwards, reaching an average of 7.9 mentions per 1,000 papers in the first quarter (Q1) of 2024. In some countries, the rate of “delve” mentions by Q1 2024 had risen even faster (see Figure 2, below), far above the “base” level through to 2022.
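To put the jump in perspective, the Q1 2024 rate can be compared with the base rates quoted above. This is a small Python sketch using only those three figures; the helper name is hypothetical.

```python
def fold_change(new_rate, base_rate):
    """How many times larger new_rate is than base_rate."""
    return new_rate / base_rate


# Rates per 1,000 papers, as quoted in the text above.
base_2018 = 0.31
base_2022 = 0.56
q1_2024 = 7.9

print(f"Q1 2024 vs 2022: {fold_change(q1_2024, base_2022):.1f}x")  # 14.1x
print(f"Q1 2024 vs 2018: {fold_change(q1_2024, base_2018):.1f}x")  # 25.5x
print(f"Q1 2024 as a percentage of papers: {q1_2024 / 10:.2f}%")   # 0.79%
```

In other words, the rate in early 2024 is roughly 14 times the 2022 level, whereas the whole 2018 to 2022 period saw less than a doubling.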

“Delve” mentions in scientific papers seem set to grow in the near term (if the early 2024 results continue to hold for the full year). There are other common ChatGPT words that likely signal whether ChatGPT was used to assist scientific writing. However, it is now easy to go online to find recommendations and tools to “weed out” common ChatGPT-generated terms and phrases. In prompting, ChatGPT can be fed lists of words and phrases not to use. In post-processing, text can also be fed into tools that claim to highlight or fix text where common ChatGPT words are present.

If there is a future downturn (or slowing in growth rate) in mentions of “delve” and other common ChatGPT words in scientific paper titles and abstracts, this may be due to greater awareness by ChatGPT users of these common words and of approaches to avoid or replace them. It may also reflect improvements in GPT’s own writing capabilities, or awareness that journals are now paying increasing attention to AI-generated content.

Notes

1/ Figure 1 was posted on X on March 31, 2024, at https://x.com/philipshapira/status/1774497589980742071?s=20

2/ An analysis by country of author affiliation for the top ten countries by count of papers mentioning “delve” in their title or abstract in 2023 shows that all these countries had a fairly low percentage of papers mentioning “delve” published in the two prior years of 2021 and 2022. See Figure 2. Country affiliations are those reported by OpenAlex.

In 2023 and 2024, paths diverged, with papers from Malaysia, Pakistan, and India more likely to mention “delve”. Mentions grew in all the other countries too, but not so rapidly. In 2024 (through to April 1), Germany shows the lowest rate of growth, while China is not too different from the UK and USA.

Figure 2