Reasoning Models (o1, o3, R1)

AI that thinks before it answers — and scores 99th percentile on math competitions.


Definition

Reasoning models are a class of large language models trained to perform extended chain-of-thought reasoning before producing a final answer. OpenAI's o1 (September 2024) was the first widely deployed reasoning model: OpenAI reported that it solved 83% of problems on the AIME, a qualifying exam for the International Mathematical Olympiad, compared to 13% for GPT-4o. DeepSeek R1 (January 2025) matched o1-level performance on several benchmarks as an open-weight model, setting off a wave of reasoning model development across the industry.

How reasoning models are trained: reinforcement learning and GRPO

Standard LLMs are trained primarily to predict the next token. Reasoning models are further trained with reinforcement learning to maximize the correctness of final answers, so the model learns to use its context window as a scratchpad. OpenAI has not published the details of its training process; DeepSeek R1 uses Group Relative Policy Optimization (GRPO), which eliminates the need for a separate critic model by using the average reward within a group of responses sampled for the same prompt as the baseline.
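To make "maximize the correctness of final answers" concrete, here is a minimal sketch of a rule-based correctness reward. It assumes the model is asked to put its final answer inside \boxed{...}; that convention, the function name correctness_reward, and the exact-string comparison are illustrative assumptions, not the reward used by any particular lab.

```python
import re

# Hypothetical reward: 1.0 if the last \boxed{...} answer in the response
# matches the reference string exactly, otherwise 0.0.
# Assumption: answers contain no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def correctness_reward(response: str, reference: str) -> float:
    answers = BOXED.findall(response)
    if not answers:
        return 0.0  # no final answer found, so no reward
    # Only the final boxed answer counts; earlier ones are scratch work.
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0

reward = correctness_reward(r"Let me check... so the answer is \boxed{42}", "42")
```

Because the reward looks only at the final answer, the text before it is free-form: the model can use as many intermediate tokens as it needs, which is what lets the context window act as a scratchpad.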

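GRPO objective: the advantage A_i of each sampled response is computed relative to the group's average reward (and typically scaled by the group's standard deviation) rather than from a learned value function. This removes the critic network entirely. Because the critic in standard PPO is usually about as large as the policy itself, dropping it can cut training memory substantially, by roughly 50% on this estimate.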
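The group-relative advantage can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses the population standard deviation and a small eps for numerical stability, and the function name group_relative_advantages is made up for this example. Real implementations work on tensors and add clipping and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each response: (r_i - group mean) / (group std + eps)."""
    baseline = mean(rewards)   # the group average replaces the learned critic
    spread = pstdev(rewards)   # assumption: population standard deviation
    return [(r - baseline) / (spread + eps) for r in rewards]

# Four hypothetical responses to one prompt, scored 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct responses get a positive advantage and incorrect ones a negative advantage. If every response in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.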

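| Model | Creator | AIME 2024 | MATH-500 | SWE-Bench | Open? |
|---|---|---|---|---|---|
| o1 | OpenAI | 74.4% | 96.4% | 48.9% | No |
| o3-mini | OpenAI | 90.0% | 97.9% | 49.3% | No |
| DeepSeek R1 | DeepSeek | 79.8% | 97.3% | 49.2% | Yes |
| Claude 3.7 (thinking) | Anthropic | ~80% | ~97% | 70.3% | No |
| Gemini 2.5 Pro | Google | 92.0% | 97.9% | Unreported | No |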

When to use a reasoning model vs a standard model

  • Use reasoning models for: math problems, formal proofs, multi-step coding tasks, complex logic puzzles, scientific analysis
  • Use standard models for: writing, summarization, simple Q&A, translation, classification — tasks where extended thinking wastes time and money
  • Reasoning models are typically around 5–20x more expensive and 5–10x slower than comparable standard models, depending on the provider and on how long the model thinks
  • The 'thinking' tokens are often not shown to users but count toward your token bill
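The billing point above can be illustrated with a small cost estimate. This is a sketch under stated assumptions: hidden reasoning tokens are billed at the same rate as visible output tokens, and the prices and token counts below are made-up numbers, not any provider's actual rates.

```python
def request_cost_usd(input_tokens, output_tokens, reasoning_tokens,
                     usd_per_m_input, usd_per_m_output):
    """Estimate the cost of one request, counting hidden reasoning tokens as output."""
    billed_output = output_tokens + reasoning_tokens  # assumption: same rate for both
    total = input_tokens * usd_per_m_input + billed_output * usd_per_m_output
    return total / 1_000_000

# Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
with_thinking = request_cost_usd(1_000, 500, 4_000, 1.0, 4.0)
without_thinking = request_cost_usd(1_000, 500, 0, 1.0, 4.0)
```

With these illustrative numbers, 4,000 hidden thinking tokens raise the cost of the request from $0.003 to $0.019, even though the user sees the same 500-token answer.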

Practical rule

If a task could be solved by a smart person in 30 seconds, use a standard model. If it would take a PhD student 30 minutes of focused work, use a reasoning model.
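The guidance above can be turned into a simple router. This is only a sketch: the task labels, the function name pick_model_class, and the 30-minute cutoff for tasks outside both lists are assumptions chosen for illustration, and the rule of thumb itself says nothing about tasks that fall between 30 seconds and 30 minutes.

```python
# Hypothetical task labels derived from the two lists in this section.
REASONING_TASKS = {"math", "formal_proof", "multi_step_coding",
                   "logic_puzzle", "scientific_analysis"}
STANDARD_TASKS = {"writing", "summarization", "simple_qa",
                  "translation", "classification"}

def pick_model_class(task_type: str, est_expert_minutes: float = 0.0) -> str:
    """Return "reasoning" or "standard" for a task."""
    if task_type in REASONING_TASKS:
        return "reasoning"
    if task_type in STANDARD_TASKS:
        return "standard"
    # Unknown task type: fall back to the time heuristic (assumed cutoff).
    return "reasoning" if est_expert_minutes >= 30 else "standard"

choice = pick_model_class("translation")
```

Defaulting to the standard model for unknown tasks keeps cost and latency low; a caller can escalate to a reasoning model when the cheaper answer fails a check.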
