The model is the first to score above 80 per cent on SWE-Bench Verified, a benchmark used to measure programming skills.
Andrew Ng says he uses multiple AI models and long conversations to brainstorm while driving.