“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as ‘trivial’, even when their validity was crucial.”

  • @BigMuffin69@awful.systems
    1 year ago

    I heard new Gemini got the first question, so that's SOTA now*

    *allegedly it came out the same day as the math olympiad, so it was fair, but who the fuck knows