AI summarization isn't, again, redux
2025-05-22 17:06 by Dan Lyke
PsyPost: AI chatbots often misrepresent scientific studies — and newer models may be worse
The researchers also found that prompting the models to be more accurate didn’t help—if anything, it made things worse. When models were instructed to “avoid inaccuracies,” they were nearly twice as likely to produce generalized statements compared to when they were simply asked to summarize the text. One explanation for this counterintuitive result may relate to how the models interpret prompts. Much like the human tendency to fixate on a thought when told not to think about it, the models may respond to reminders about accuracy by producing more authoritative-sounding—but misleading—summaries.
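For the curious, the comparison the researchers describe is easy to reproduce informally. Here's a minimal sketch of the two prompt conditions, a plain summarization request versus one with an "avoid inaccuracies" reminder. This assumes the OpenAI Python client; the model name and exact prompt wording are my illustrative guesses, not the paper's protocol:

```python
# Sketch of the two prompt conditions: plain summary vs. an added
# accuracy reminder. Client, model, and prompt text are assumptions,
# not the study's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ABSTRACT = "..."  # paste a study abstract here

PROMPTS = {
    "plain": f"Summarize the following study abstract:\n\n{ABSTRACT}",
    "accuracy-reminded": (
        "Summarize the following study abstract. "
        f"Avoid any inaccuracies:\n\n{ABSTRACT}"
    ),
}

for condition, prompt in PROMPTS.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; the paper tested several models
        messages=[{"role": "user", "content": prompt}],
    )
    summary = response.choices[0].message.content
    # Per the study's finding, look for overgeneralization: generic
    # present-tense claims where the abstract used hedged,
    # sample-specific wording.
    print(f"--- {condition} ---\n{summary}\n")
```

If the paper's result holds, the "accuracy-reminded" condition should, counterintuitively, produce the more sweeping summary.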
Royal Society Open Science: Generalization bias in large language model summarization of scientific research, Uwe Peters and Benjamin Chin-Yee, https://doi.org/10.1098/rsos.241776
Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings.
Via Calishat (@researchbuzz), who also observed:
The emperor is running around nude and the tech media keeps going "Oh what a lovely wardrobe"
And via.