AI news of the morning
2025-06-20 17:35:22.622696+02 by Dan Lyke 0 comments
Solar Company Sues Google for Giving Damaging Information in AI Overviews
"This lawsuit is not just about defending our company's reputation; it's about standing up for fairness, truth, and accountability in the age of artificial intelligence," Nicholas Kasprowicz, general counsel for the solar company, Wolf River Electric, said in a statement.
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from 13 open-weight LLMs. Through numerous experiments, we show that it's possible to extract substantial parts of at least some books from different LLMs. This is evidence that the LLMs have memorized the extracted text; this memorized content is copied inside the model parameters. But the results are complicated: the extent of memorization varies both by model and by book. With our specific experiments, we find that the largest LLMs don't memorize most books -- either in whole or in part. However, we also find that Llama 3.1 70B memorizes some books, like Harry Potter and 1984, almost entirely. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.
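The "probabilistic extraction" idea in that abstract boils down to something simple: a model has memorized a passage to the extent that greedy or sampled decoding is likely to reproduce it verbatim, and that likelihood is just the product of the per-token conditional probabilities. A minimal sketch of that computation, with a hypothetical stub standing in for a real LLM (a real measurement would query something like Llama 3.1 70B for per-token probabilities of the book text):

```python
# Sketch of sequence-level extraction probability, assuming an
# autoregressive LM. `toy_next_token_prob` is a hypothetical stand-in;
# a real experiment would read these probabilities off a model's logits.
def toy_next_token_prob(prefix, token):
    # Pretend the model assigns 0.9 to each "memorized" continuation token.
    return 0.9

def sequence_extraction_prob(prefix, continuation):
    """Probability that sampling from the model reproduces `continuation`
    verbatim after `prefix`: the product of per-token probabilities."""
    p = 1.0
    for i, tok in enumerate(continuation):
        p *= toy_next_token_prob(prefix + continuation[:i], tok)
    return p

tokens = ["Mr", "and", "Mrs", "Dursley", "of", "number", "four"]
p = sequence_extraction_prob(["Harry", "Potter"], tokens)
print(p)  # 0.9 ** 7
```

The takeaway the paper leans on: because this product decays geometrically in length, extracting "substantial parts" of a book requires per-token probabilities very close to 1, which is strong evidence the text is copied into the weights rather than reconstructed by chance.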
Baldur Bjarnason @baldur@toot.cafe
When people call LLMs “useless” they’re generally being kind, as it’d be more accurate to call them harmful, dangerous, toxic, or risky.
It’s like calling white asbestos “useless”. Technically true in most Western countries because you literally can’t use it, but it kind of elides the reason why.