Flutterby™! : Increasing plagiarism

2025-05-14 04:55:38.624789+02 by Dan Lyke 0 comments

A friend today was showing me how he's getting audio processing code out of Google Gemini, and I had to wonder just how much of it was gonna lead to copyright issues. Anyway...

Colin Gordon @csgordon@discuss.systems

When you submit a paper to an ACM journal, it gets run through TurnItIn (yes, really) and the editors in chief have to look at the report and decide if there are plagiarism concerns. Most submissions have a small percentage (~5%) of verbatim-matching text, from a wide variety of sources. The matches are usually small turns of phrase, technical phrases, affiliations, or ACM copyright text 😛 The exceptions are generally extended versions of conference papers, where obviously large chunks of the extension match the original publication.

But recently I've noticed an uptick, so far only in the wildly-out-of-scope papers that get desk rejected (mostly papers about using LLMs for NLP), of a high percentage of the paper's text (~30%) being flagged as matching, still from a wide variety of sources, but much larger chunks. A long phrase from here, most of a sentence from there, etc., from very scattered sources across different far-ranging fields. This seems unlikely to be from authors picking up phrases they like from papers they actually encountered. I can't help but think these papers have a high fraction of LLM-generated text, and that LLM-generated text on similar topics tends to output a lot of phrases and sentences repeatedly in aggregate, and these patterns are now getting picked up by traditional plagiarism checkers since there's so much LLM-generated text in the world now.
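To make the "percentage of verbatim-matching text" idea concrete, here is a toy sketch of one way such a number could be computed: mark every word of a submission that falls inside an n-word run appearing verbatim in some other document, then report the marked fraction. This is only an illustration under my own assumptions. I have no idea how TurnItIn actually works, and the function names, the 5-word window, and the tiny corpus are all made up for the example.

```python
def shingles(text, n=5):
    """Return every run of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def matched_fraction(submission, corpus, n=5):
    """Fraction of the submission's words that sit inside an n-word run
    found verbatim in at least one corpus document.

    corpus is a hypothetical dict mapping a source name to its text.
    """
    words = submission.lower().split()
    if len(words) < n:
        return 0.0

    # Pool the n-word runs from every source, ignoring which source they came from.
    known = set()
    for doc in corpus.values():
        known.update(shingles(doc, n))

    # Mark each word covered by at least one matching run.
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in known:
            for j in range(i, i + n):
                covered[j] = True

    return sum(covered) / len(words)


if __name__ == "__main__":
    submission = "the quick brown fox jumps over the lazy dog today"
    corpus = {"source_a": "the quick brown fox jumps"}
    print(matched_fraction(submission, corpus))
```

Because the runs from every source are pooled, a long phrase from here and most of a sentence from there add up in the same total, whether they were copied deliberately or are just stock phrasing that many texts share.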

Flutterby™ is a trademark claimed by Dan Lyke for the web publications at www.flutterby.com and www.flutterby.net.