The model is the first to score above 80 per cent on SWE-bench Verified, a benchmark that measures how well AI models resolve real-world software engineering tasks.