Better results without LLMs than with
2025-06-15 19:42:59.884443+02 by Dan Lyke 0 comments
Just add humans: Oxford medical study underscores the missing link in chatbot testing
A paper by researchers at the University of Oxford found that while LLMs could correctly identify relevant conditions 94.9% of the time when directly presented with test scenarios, human participants using LLMs to diagnose the same scenarios identified the correct conditions less than 34.5% of the time.
Perhaps even more notably, patients using LLMs performed even worse than a control group that was merely instructed to diagnose themselves using “any methods they would typically employ at home.” The group left to their own devices was 76% more likely to identify the correct conditions than the group assisted by LLMs.