Flutterby™! : Increasing plagiarism


Increasing plagiarism

2025-05-14 04:55:38.624789+02 by Dan Lyke 0 comments

A friend today was showing me how he's getting audio processing code out of Google Gemini, and I had to wonder just how much of it was gonna lead to copyright issues. Anyway...

Colin Gordon @csgordon@discuss.systems

When you submit a paper to an ACM journal, it gets run through TurnItIn (yes, really) and the editors in chief have to look at the report and decide if there are plagiarism concerns. Most submissions have a small percentage (~5%) of verbatim-matching text, from a wide variety of sources. The matches are usually small turns of phrase, technical phrases, affiliations, or ACM copyright text 😛 The exceptions are generally extended versions of conference papers, where obviously large chunks of the extension match the original publication.

But recently I've noticed an up-tick, so far only in the wildly-out-of-scope papers that get desk rejected (mostly papers about using LLMs for NLP) of a high percentage of the paper's text (~30%) being flagged as matching, still from a wide variety of sources, but much larger chunks. A long phrase from here, most of a sentence from there, etc., from very scattered sources across different far-ranging fields. This seems unlikely to be from authors picking up phrases they like from papers they actually encountered. I can't help but think these papers have a high fraction of LLM-generated text, and that LLM-generated text on similar topics tends to output a lot of phrases and sentences repeatedly in aggregate, and these patterns are now getting picked up by traditional plagiarism checkers since there's so much LLM-generated text in the world now.
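To make the "percentage of verbatim-matching text" idea concrete, here is a minimal sketch of a word n-gram overlap check. It is an illustration only, not how TurnItIn actually works: the function names (`word_ngrams`, `match_report`), the default window size `n=5`, and the whitespace tokenization are all assumptions made for this example. It reports three things the observation above turns on: the fraction of the submission covered by matches, the longest matched run of words, and how many distinct sources contributed.

```python
from typing import Dict, List, Set, Tuple


def word_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Split text on whitespace (lowercased) and return its word n-grams."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def match_report(
    submission: str, sources: Dict[str, str], n: int = 5
) -> Tuple[float, int, Set[str]]:
    """Hypothetical overlap report; NOT any real checker's algorithm.

    Returns (fraction of submission words covered by a matching n-gram,
    longest run of consecutive covered words, names of sources that matched).
    """
    sub_words = submission.lower().split()
    covered = [False] * len(sub_words)
    sources_hit: Set[str] = set()

    # Pre-compute each source's n-grams as a set for fast membership tests.
    source_grams = {name: set(word_ngrams(text, n)) for name, text in sources.items()}

    for i in range(len(sub_words) - n + 1):
        gram = tuple(sub_words[i:i + n])
        for name, grams in source_grams.items():
            if gram in grams:
                # Mark every word inside the matching window as covered.
                for j in range(i, i + n):
                    covered[j] = True
                sources_hit.add(name)

    # Longest stretch of consecutive covered words ("chunk size").
    longest = run = 0
    for is_covered in covered:
        run = run + 1 if is_covered else 0
        longest = max(longest, run)

    fraction = sum(covered) / len(sub_words) if sub_words else 0.0
    return fraction, longest, sources_hit
```

Under these assumptions, a typical submission would show a low fraction with short runs, while the pattern described above would show a higher fraction, longer runs, and many unrelated sources in the matched set.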




Flutterby™ is a trademark claimed by Dan Lyke for the web publications at www.flutterby.com and www.flutterby.net.