TheAgentCompany
2025-07-02 16:50:27.903117+02 by Dan Lyke
According to Gartner, many agents are fiction without the science. "Many vendors are contributing to the hype by engaging in 'agent washing' – the rebranding of existing products, such as AI assistants, robotic process automation (RPA) and chatbots, without substantial agentic capabilities," the firm says. "Gartner estimates only about 130 of the thousands of agentic AI vendors are real."
Which, of course, duh, but mostly this is about TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
We build a self-contained environment with internal web sites and data that mimics a small software company environment, and create a variety of tasks that may be performed by workers in such a company. We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that the most competitive agent can complete 30% of tasks autonomously. This paints a nuanced picture on task automation with LM agents--in a setting simulating a real workplace, a good portion of simpler tasks could be solved autonomously, but more difficult long-horizon tasks are still beyond the reach of current systems. We release code, data, environment, and experiments on this https URL.