
Introducing our most intelligent model yet, with state-of-the-art reasoning to help you learn, build, and plan anything.

Models

From completing everyday tasks to solving complex problems, discover the right model for what you need.

Gemini brings reasoning and intelligence to your daily life.


Gemini 1 introduced native multimodality and long context to help AI understand the world. Gemini 2 added thinking, reasoning, and tool use to create a foundation for agents.

Now, Gemini 3 brings these capabilities together – so you can bring any idea to life.

Build with Google Antigravity – our AI-first developer experience



Performance

Gemini 3 is state-of-the-art across a wide range of benchmarks

Our most intelligent model yet sets a new bar for AI model performance

| Capability | Benchmark | Notes | Gemini 3 Flash (Thinking) | Gemini 3 Pro (Thinking) | Gemini 2.5 Flash (Thinking) | Gemini 2.5 Pro (Thinking) | Claude Sonnet 4.5 (Thinking) | GPT-5.2 (Extra high) | Grok 4.1 Fast (Reasoning) |
|---|---|---|---|---|---|---|---|---|---|
| Input price | $/1M tokens | | $0.50 | $2.00; $4.00 >200k tokens | $0.30 | $1.25; $2.50 >200k tokens | $3.00; $6.00 >200k tokens | $1.75 | $0.20 |
| Output price | $/1M tokens | | $3.00 | $12.00; $18.00 >200k tokens | $2.50 | $10.00; $15.00 >200k tokens | $15.00; $22.50 >200k tokens | $14.00 | $0.50 |
| Academic reasoning (full set, text + MM) | Humanity's Last Exam | No tools | 33.7% | 37.5% | 11.0% | 21.6% | 13.7% | 34.5% | 17.6% |
| | | With search and code execution | 43.5% | 45.8% | | | | 45.5% | |
| Visual reasoning puzzles | ARC-AGI-2 | ARC Prize Verified | 33.6% | 31.1% | 2.5% | 4.9% | 13.6% | 52.9% | |
| Scientific knowledge | GPQA Diamond | No tools | 90.4% | 91.9% | 82.8% | 86.4% | 83.4% | 92.4% | 84.3% |
| Mathematics | AIME 2025 | No tools | 95.2% | 95.0% | 72.0% | 88.0% | 87.0% | 100% | 91.9% |
| | | With code execution | 99.7% | 100% | 75.7% | 100% | | | |
| Multimodal understanding and reasoning | MMMU-Pro | | 81.2% | 81.0% | 66.7% | 68.0% | 68.0% | 79.5% | 63.0% |
| Screen understanding | ScreenSpot-Pro | No tools unless specified | 69.1% | 72.7% | 3.9% | 11.4% | 36.2% | 86.3% with python | |
| Information synthesis from complex charts | CharXiv Reasoning | No tools | 80.3% | 81.4% | 63.7% | 69.6% | 68.5% | 82.1% | |
| OCR | OmniDocBench 1.5 | Overall edit distance, lower is better | 0.121 | 0.115 | 0.154 | 0.145 | 0.145 | 0.143 | |
| Knowledge acquisition from videos | Video-MMMU | | 86.9% | 87.6% | 79.2% | 83.6% | 77.8% | 85.9% | |
| Competitive coding problems from Codeforces, ICPC, and IOI | LiveCodeBench Pro | Elo rating, higher is better | 2316 | 2439 | 1143 | 1775 | 1418 | 2393 | |
| Agentic terminal coding | Terminal-Bench 2.0 | Terminus-2 harness | 47.6% | 54.2% | 16.9% | 32.6% | 42.8% | | |
| Agentic coding | SWE-bench Verified | Single attempt | 78.0% | 76.2% | 60.4% | 59.6% | 77.2% | 80.0% | 50.6% |
| Agentic tool use | τ2-bench | | 90.2% | 90.7% | 79.5% | 77.8% | 87.2% | | |
| Long horizon real-world software tasks | Toolathlon | | 49.4% | 36.4% | 3.7% | 10.5% | 38.9% | 46.3% | |
| Multi-step workflows using MCP | MCP Atlas | | 57.4% | 54.1% | 3.4% | 8.8% | 43.8% | 60.6% | |
| Agentic long-term coherence | Vending-Bench 2 | Net worth (mean), higher is better | $3,635 | $5,478 | $549 | $574 | $3,839 | $3,952 | $1,107 |
| Factuality across grounding, parametric, search, and MM | FACTS Benchmark Suite | | 61.9% | 70.5% | 50.4% | 63.4% | 48.9% | 61.4% | 42.1% |
| Parametric knowledge | SimpleQA Verified | | 68.7% | 72.1% | 28.1% | 54.5% | 29.3% | 38.0% | 19.5% |
| Multilingual Q&A | MMMLU | | 91.8% | 91.8% | 86.6% | 89.5% | 89.1% | 89.6% | 86.8% |
| Commonsense reasoning across 100 languages and cultures | Global PIQA | | 92.8% | 93.4% | 90.2% | 91.5% | 90.1% | 91.2% | 85.6% |
| Long context performance | MRCR v2 (8-needle) | 128k (average) | 67.2% | 77.0% | 54.3% | 58.0% | 47.1% | 81.9% | 54.6% |
| | | 1M (pointwise) | 22.1% | 26.3% | 21.0% | 16.4% | not supported | not supported | 6.1% |
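The tiered prices in the table can be turned into a quick per-request cost estimate. Below is a minimal Python sketch for the Gemini 3 Pro rates; it assumes, as with Gemini 2.5 Pro's published tiered pricing, that the higher ">200k tokens" rate applies to the whole request once the prompt exceeds 200k tokens. The function name is illustrative.

```python
def gemini3_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one Gemini 3 Pro request from the listed rates.

    Assumption: requests whose prompt exceeds 200k tokens are billed
    entirely at the higher long-context rate.
    """
    long_context = input_tokens > 200_000
    input_rate = 4.00 if long_context else 2.00     # $ per 1M input tokens
    output_rate = 18.00 if long_context else 12.00  # $ per 1M output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 50k-token prompt with a 2k-token reply:
print(round(gemini3_pro_cost(50_000, 2_000), 4))  # 0.124
```

For example, a 50k-token prompt with a 2k-token reply costs roughly $0.10 for input plus $0.024 for output; crossing the 200k-token threshold doubles the input rate and raises the output rate by half.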

Gemini 3 Deep Think

Pushes the boundaries of intelligence, delivering a step-change in Gemini 3's reasoning and multimodal understanding capabilities to help you solve your most complex problems.

Gemini 3 Deep Think can better help tackle problems that require creativity, strategic planning, and step-by-step improvement. Available to Google AI Ultra subscribers.

Three bar charts comparing AI model performance. 1) Humanity's Last Exam (Reasoning & knowledge): Gemini 3 Deep Think scores highest at 41%, followed by Gemini 3 Pro (37.5%), GPT-5 Pro (30.7%), GPT-5.1 (26.5%), Gemini 2.5 Pro (21.6%), and Claude Sonnet 4.5 (13.7%). 2) GPQA Diamond (Scientific knowledge): Gemini 3 Deep Think leads at 93.8%, followed by Gemini 3 Pro (91.9%), GPT-5 Pro (88.4%), GPT-5.1 (88.1%), Gemini 2.5 Pro (86.4%), and Claude Sonnet 4.5 (83.4%). 3) ARC-AGI-2 (Visual reasoning): Gemini 3 Deep Think (using tools) leads at 45.1%, followed by Gemini 3 Pro (31.1%), GPT-5.1 (17.6%), GPT-5 Pro (15.8%), Claude Sonnet 4.5 (13.6%), and Gemini 2.5 Pro (4.9%).

Iterative development and design

We’ve seen impressive results on tasks that require building something by making small changes over time.

Aiding scientific and mathematical discovery

By reasoning through complex problems, Deep Think can act as a powerful tool for researchers.

Algorithmic development and code

Deep Think excels at tough coding problems where problem formulation and careful consideration of tradeoffs and time complexity are paramount.


Safety

Building with responsibility at the core

As we develop these new technologies, we recognize the responsibility they entail, and aim to prioritize safety and security in all our efforts.


For developers

Build with cutting-edge generative AI models and tools to make AI helpful for everyone.

Gemini's advanced thinking, native multimodality, and massive context window empower developers to build next-generation experiences.


Try Gemini