Microsoft says 'rStar-Math' demonstrates how small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI's o1 by up to 4.5%

OpenAI logo. (Image credit: Getty Images | SOPA Images)

Microsoft has potentially made a breakthrough with small language models (SLMs) with the recent development of a new reasoning technique dubbed rStar-Math. The technique enhances the capabilities of SLMs, allowing them to match or even surpass the math reasoning capability of OpenAI's o1 reasoning model — without distillation from superior models.

According to the research paper published on arXiv.org:

"rStar-Math achieves this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model."

Through MCTS, rStar-Math can critically analyze complex tasks and queries step-by-step, making it easier for SLMs to solve math problems. Additionally, the researchers go a step beyond deep thinking by asking the model to showcase its chain of thought, including both natural language descriptions and Python code.
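To make the idea concrete, here is a minimal, self-contained sketch of MCTS-style step-by-step search guided by a reward model. Everything here is a toy stand-in: in rStar-Math the "policy" proposing steps and the process reward model scoring them are both SLMs, whereas this sketch uses a trivial arithmetic problem (reach a target by adding 1, 2, or 3) and a hand-written reward function, so only the search loop itself reflects the technique.

```python
import math
import random

random.seed(0)

TARGET = 10          # toy "math problem": reach 10 in at most MAX_DEPTH steps
MAX_DEPTH = 6
ACTIONS = (1, 2, 3)  # candidate "reasoning steps" the toy policy proposes

def process_reward(state):
    """Toy process reward model: scores a (partial) solution in (0, 1]."""
    return 1.0 / (1.0 + abs(TARGET - state))

class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # accumulated reward

    def ucb1(self, c=1.4):
        """Upper-confidence bound used to balance exploration/exploitation."""
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits +
                c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(state, depth):
    """Random simulation to a leaf, scored by the reward model."""
    while depth < MAX_DEPTH:
        state += random.choice(ACTIONS)
        depth += 1
    return process_reward(state)

def mcts(iterations=200):
    root = Node(0, 0)
    for _ in range(iterations):
        # 1. Selection: descend via UCB1 until an unexpanded node.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one child per candidate step.
        if node.depth < MAX_DEPTH:
            node.children = [Node(node.state + a, node.depth + 1, node)
                             for a in ACTIONS]
            node = random.choice(node.children)
        # 3. Simulation, scored by the process reward model.
        reward = rollout(node.state, node.depth)
        # 4. Backpropagation of the reward up the tree.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The best first "reasoning step" is the most-visited child of the root.
    return max(root.children, key=lambda n: n.visits).state

first_step = mcts()
```

The key point the sketch illustrates is the division of labor the paper describes: one model proposes candidate next steps, a separate reward model scores how promising each partial trajectory looks, and MCTS statistics decide which branch to commit to.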

The technique features three innovations designed to mitigate the issues riddling SLM training, including:

  • A novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM.
  • A novel process reward model training method that avoids naïve step-level score annotation, yielding a more effective process preference model (PPM).
  • A self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities.
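The first innovation above can be sketched in miniature. The assumption (not spelled out in this article) is that each reasoning step pairs a natural-language description with a Python snippet, and a trajectory is kept for training only if every step's code executes and its claimed intermediate values check out; the hard-coded trajectory below stands in for what an SLM would actually generate.

```python
def verify_step(code: str, expected: dict) -> bool:
    """Run a step's Python snippet and check its claimed intermediate values."""
    scope = {}
    try:
        exec(code, {}, scope)  # code that fails to run invalidates the step
    except Exception:
        return False
    return all(scope.get(k) == v for k, v in expected.items())

# A toy two-step trajectory for "compute 3 * (4 + 5)":
# (natural-language step, Python snippet, claimed intermediate values)
trajectory = [
    ("First add 4 and 5.",  "s = 4 + 5",   {"s": 9}),
    ("Then multiply by 3.", "ans = 3 * 9", {"ans": 27}),
]

verified = all(verify_step(code, expected) for _, code, expected in trajectory)
```

Executing the code attached to each step gives an objective filter: trajectories with arithmetic slips are discarded automatically, without the naïve step-level score annotation the second innovation avoids.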

The research paper further details four rounds of self-evolution "with millions of synthesized solutions for 747k math problems," through which rStar-Math lifts math reasoning to state-of-the-art levels. Per the benchmarks shared, the technique scales Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%. Interestingly, this allows the SLMs to surpass OpenAI's o1 reasoning model by +4.5% and +0.9%, respectively. Finally, the technique solved an average of 53.3% of problems on the American Invitational Mathematics Examination (AIME), placing among the top 20% of high school competitors.

Hugging Face highlighted the researchers' plan to release rStar-Math on GitHub. However, one of the paper's researchers, Li Lyna Zhang, indicated that the code is "still undergoing the review process for open-source release" (via VentureBeat). "The repository remains private for now. Please stay tuned!" the researcher added.

Last April, Microsoft unveiled Phi-3 Mini, a lightweight AI model that promises capabilities similar to GPT-3.5's despite its smaller size. It's trained on less data than GPT-4 or other large language models (LLMs), yet it can outperform larger models such as Llama 2.


Microsoft's technique suggests bigger isn't always better, promising strong performance at a fraction of the computational cost. That addresses some of the rising concerns about the vast computational resources required to keep next-gen AI models running.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.