Flutterby™! : Making up the rules...

Making up the rules...

2025-01-20 23:38:08.060464+01 by Dan Lyke 0 comments

OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions. OpenAI has been bragging that the o3 model achieved "...87.5%, beating the previous best AI score of 55.5%."

Tamay Besiroglu of Epoch AI, which built FrontierMath, says there is also a “hold-out” set of tests that OpenAI has no access to.

Imma go out on a limb here and ask if the hold-out tests are, perhaps, 12.5% of the corpus?

[ related topics: Interactive Drama Artificial Intelligence ]
