This repository trains LLMs with RL to perform multi-turn Tool-Integrated Reasoning (TIR), in which the model iteratively generates code, executes it, and reasons over the execution results. This capability ...
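The generate, execute, reason loop described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not this repository's code: `stub_model` is a hypothetical stand-in for an LLM, the `<code>...</code>` tags and the `[output]` marker are an invented transcript format, and `run_code` uses plain `exec` with no sandboxing. It covers only a single rollout, not the RL training.

```python
import contextlib
import io
import re

# Assumed convention: the model wraps code it wants executed in <code> tags.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(code: str) -> str:
    """Execute code and return its captured stdout (illustration only, unsandboxed)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception as exc:
        # Errors are fed back to the model as an observation instead of raising.
        return f"{type(exc).__name__}: {exc}"
    return buf.getvalue().strip()


def stub_model(transcript: str) -> str:
    """Hypothetical LLM stand-in: writes code first, then answers from the output."""
    if "[output]" not in transcript:
        return "Let me compute it.\n<code>print(sum(range(1, 11)))</code>"
    last_output = transcript.rsplit("[output]", 1)[1].strip()
    return f"Final answer: {last_output}"


def tir_rollout(question: str, model, max_turns: int = 4) -> str:
    """Alternate model turns and code execution until the model emits no code."""
    transcript = question
    for _ in range(max_turns):
        reply = model(transcript)
        transcript += "\n" + reply
        match = CODE_RE.search(reply)
        if match is None:
            # No code requested: treat this reply as the final answer.
            return transcript
        # Append the execution result so the next turn can reason over it.
        transcript += "\n[output] " + run_code(match.group(1))
    return transcript


result = tir_rollout("What is 1 + 2 + ... + 10?", stub_model)
print(result)
```

In an RL setup, a transcript like `result` would be the trajectory that gets scored; how the reward is computed and how executor output is handled during training are not stated in the text above, so they are left out here.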