Flutterby™! : Making up the rules...


Making up the rules...

2025-01-20 23:38:08.060464+01 by Dan Lyke

OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions. OpenAI has been bragging that the o3 model achieved "...87.5%, beating the previous best AI score of 55.5%."

Besiroglu says there is also a “hold-out” set of tests that OpenAI has no access to.

Imma go out on a limb here and ask if the hold-out tests are, perhaps, 12.5% of the corpus?

[ related topics: Interactive Drama, Artificial Intelligence ]



Flutterby™ is a trademark claimed by Dan Lyke for the web publications at www.flutterby.com and www.flutterby.net.