Understand the AI landscape to choose the best model and provider for your use case
Highlights
Intelligence
Artificial Analysis Intelligence Index; Higher is better
Gemini 2.5 Pro Preview (68), o3-mini (high) (66), DeepSeek R1 (60), DeepSeek V3 (Mar '25) (53), GPT-4.1 (53), Llama 4 Maverick (51), GPT-4o (March 2025) (50), Claude 3.7 Sonnet (48), Gemini 2.0 Flash (48), Llama 4 Scout (43), Llama 3.3 70B (41), Mistral Large 2 (Nov '24) (38), GPT-4o mini (36)
Speed
Output Tokens per Second; Higher is better
Gemini 2.0 Flash (242), Gemini 2.5 Pro Preview (204), o3-mini (high) (183), GPT-4o (March 2025) (183), Llama 4 Scout (125), Llama 4 Maverick (123), GPT-4.1 (115), Llama 3.3 70B (113), Claude 3.7 Sonnet (76), GPT-4o mini (73), Mistral Large 2 (Nov '24) (39), DeepSeek V3 (Mar' 25) (25), DeepSeek R1 (23)
Price
USD per 1M Tokens; Lower is better
Gemini 2.0 Flash ($0.20), GPT-4o mini ($0.30), Llama 4 Scout ($0.30), Llama 4 Maverick ($0.40), DeepSeek V3 (Mar '25) ($0.50), Llama 3.3 70B ($0.60), DeepSeek R1 ($1.00), o3-mini (high) ($1.90), Mistral Large 2 (Nov '24) ($3.00), Gemini 2.5 Pro Preview ($3.40), GPT-4.1 ($3.50), Claude 3.7 Sonnet ($6.00), GPT-4o (March 2025) ($7.50)
- How do DeepSeek models compare?
- Where can you get an API for DeepSeek R1? See DeepSeek R1 Providers.
- Which models perform best in different languages? See the Multilingual Comparison.
- Who has the best Video Generation model? See the Video Arena.
- Which model is fastest with 100k token prompts? See Long Context Latency.
Language Model Comparison Highlights
Artificial Analysis Intelligence Index
Intelligence Index incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, MATH-500
Gemini 2.5 Pro Preview (68), o3-mini (high) (66), DeepSeek R1 (60), Claude 3.7 Sonnet Thinking (57), DeepSeek V3 (Mar '25) (53), GPT-4.1 mini (53), GPT-4.1 (53), Grok 3 (51), Llama 4 Maverick (51), GPT-4o (March 2025) (50), Claude 3.7 Sonnet (48), Gemini 2.0 Flash (48), Llama 4 Scout (43), GPT-4.1 nano (41), Mistral Large 2 (Nov '24) (38), Gemma 3 27B (38), Nova Pro (37)
Artificial Analysis Intelligence Index: Combination metric covering multiple dimensions of intelligence - the simplest way to compare how smart models are. Version 2 was released in Feb '25 and includes: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, MATH-500. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
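The Intelligence Index methodology defines the exact weighting; as a rough illustration only, assuming a simple equal-weighted average over the seven evaluations (an assumption, not the confirmed methodology), the aggregation can be sketched as below. The example scores are hypothetical.

```python
# Illustrative sketch: assumes an equal-weighted average over the seven
# evaluations, which may differ from the official Intelligence Index weighting.
EVALS = ["MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam",
         "LiveCodeBench", "SciCode", "AIME", "MATH-500"]

def intelligence_index(scores: dict[str, float]) -> float:
    """Average the seven evaluation scores (each on a 0-100 scale)."""
    missing = [e for e in EVALS if e not in scores]
    if missing:
        raise ValueError(f"missing evaluations: {missing}")
    return sum(scores[e] for e in EVALS) / len(EVALS)

# Hypothetical example scores, not measured results:
example = {"MMLU-Pro": 80, "GPQA Diamond": 66, "Humanity's Last Exam": 5,
           "LiveCodeBench": 43, "SciCode": 37, "AIME": 33, "MATH-500": 89}
print(round(intelligence_index(example)))  # -> 50
```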
Artificial Analysis Intelligence Index by Model Type
[Same ranking as above, with each model classified as a Reasoning or Non-Reasoning model; the top four positions (Gemini 2.5 Pro Preview, o3-mini (high), DeepSeek R1 and Claude 3.7 Sonnet Thinking) are all held by reasoning models.]
Artificial Analysis Intelligence Index by Open Weights vs Proprietary
[Same ranking as above, with each model classified as Proprietary, Open Weights, or Open Weights (Commercial Use Restricted).]
Open Weights: Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).
Artificial Analysis Coding Index
Represents the average of coding benchmarks in the Artificial Analysis Intelligence Index (LiveCodeBench & SciCode)
o3-mini (high) (57), Gemini 2.5 Pro Preview (55), DeepSeek R1 (49), GPT-4.1 mini (44), Claude 3.7 Sonnet Thinking (44), GPT-4.1 (42), Grok 3 (40), GPT-4o (March 2025) (40), Claude 3.7 Sonnet (38), DeepSeek V3 (Mar '25) (38), Llama 4 Maverick (36), Gemini 2.0 Flash (32), GPT-4.1 nano (29), Mistral Large 2 (Nov '24) (29), Llama 4 Scout (23), Nova Pro (22), Gemma 3 27B (17)
Artificial Analysis Coding Index: Represents the average of coding evaluations in the Artificial Analysis Intelligence Index. Currently includes: LiveCodeBench, SciCode. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Artificial Analysis Math Index
Represents the average of math benchmarks in the Artificial Analysis Intelligence Index (AIME 2024 & MATH-500)
Gemini 2.5 Pro Preview (93), o3-mini (high) (92), DeepSeek R1 (82), DeepSeek V3 (Mar '25) (73), Claude 3.7 Sonnet Thinking (72), GPT-4.1 mini (68), GPT-4.1 (67), Llama 4 Maverick (64), Gemini 2.0 Flash (63), GPT-4o (March 2025) (61), Grok 3 (60), Gemma 3 27B (57), Llama 4 Scout (56), GPT-4.1 nano (54), Claude 3.7 Sonnet (54), Nova Pro (45), Mistral Large 2 (Nov '24) (42)
Artificial Analysis Math Index: Represents the average of math evaluations in the Artificial Analysis Intelligence Index. Currently includes: AIME, MATH-500. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by the AI lab (not yet independently verified)
MMLU-Pro (Reasoning & Knowledge)
Gemini 2.5 Pro Preview (86%), DeepSeek R1 (84%), Claude 3.7 Sonnet Thinking (84%), DeepSeek V3 (Mar '25) (82%), Llama 4 Maverick (81%), GPT-4.1 (81%), GPT-4o (March 2025) (80%), Claude 3.7 Sonnet (80%), o3-mini (high) (80%), Grok 3 (80%), GPT-4.1 mini (78%), Gemini 2.0 Flash (78%), Llama 4 Scout (75%), Mistral Large 2 (Nov '24) (70%), Nova Pro (69%), Gemma 3 27B (67%), GPT-4.1 nano (66%)
GPQA Diamond (Scientific Reasoning)
Gemini 2.5 Pro Preview (84%), o3-mini (high) (77%), Claude 3.7 Sonnet Thinking (77%), Llama 3.1 Nemotron Ultra 253B Reasoning (73%), DeepSeek R1 (71%), Grok 3 (69%), Llama 4 Maverick (67%), GPT-4.1 (67%), GPT-4.1 mini (66%), Claude 3.7 Sonnet (66%), GPT-4o (March 2025) (66%), DeepSeek V3 (Mar '25) (66%), Gemini 2.0 Flash (62%), Llama 4 Scout (59%), GPT-4.1 nano (51%), Nova Pro (50%), Mistral Large 2 (Nov '24) (49%), Gemma 3 27B (43%)
Humanity's Last Exam (Reasoning & Knowledge)
Gemini 2.5 Pro Preview (17.1%), o3-mini (high) (12.3%), Claude 3.7 Sonnet Thinking (10.3%), DeepSeek R1 (9.3%), Gemini 2.0 Flash (5.3%), DeepSeek V3 (Mar '25) (5.2%), Grok 3 (5.1%), GPT-4o (March 2025) (5.0%), Llama 4 Maverick (4.8%), Claude 3.7 Sonnet (4.8%), Gemma 3 27B (4.7%), GPT-4.1 mini (4.6%), GPT-4.1 (4.6%), Llama 4 Scout (4.3%), Mistral Large 2 (Nov '24) (4.0%), GPT-4.1 nano (3.9%), Nova Pro (3.4%)
LiveCodeBench (Coding)
o3-mini (high) (73%), Gemini 2.5 Pro Preview (70%), Llama 3.1 Nemotron Ultra 253B Reasoning (64%), DeepSeek R1 (62%), GPT-4.1 mini (48%), Claude 3.7 Sonnet Thinking (47%), GPT-4.1 (46%), GPT-4o (March 2025) (43%), Grok 3 (43%), DeepSeek V3 (Mar '25) (41%), Llama 4 Maverick (40%), Claude 3.7 Sonnet (39%), Gemini 2.0 Flash (33%), GPT-4.1 nano (33%), Llama 4 Scout (30%), Mistral Large 2 (Nov '24) (29%), Nova Pro (23%), Gemma 3 27B (14%)
SciCode (Coding)
GPT-4.1 mini (40%), Claude 3.7 Sonnet Thinking (40%), o3-mini (high) (40%), Gemini 2.5 Pro Preview (40%), GPT-4.1 (38%), Claude 3.7 Sonnet (38%), Grok 3 (37%), GPT-4o (March 2025) (37%), DeepSeek V3 (Mar '25) (36%), DeepSeek R1 (36%), Llama 4 Maverick (33%), Gemini 2.0 Flash (31%), Mistral Large 2 (Nov '24) (29%), GPT-4.1 nano (26%), Gemma 3 27B (21%), Nova Pro (21%), Llama 4 Scout (17%)
HumanEval (Coding)
Gemini 2.5 Pro Preview (99%), Claude 3.7 Sonnet Thinking (98%), DeepSeek R1 (98%), o3-mini (high) (98%), GPT-4o (March 2025) (96%), GPT-4.1 (96%), GPT-4.1 mini (95%), Claude 3.7 Sonnet (95%), DeepSeek V3 (Mar '25) (92%), Grok 3 (91%), Gemini 2.0 Flash (90%), Mistral Large 2 (Nov '24) (90%), Gemma 3 27B (89%), Llama 4 Maverick (88%), GPT-4.1 nano (88%), Nova Pro (83%), Llama 4 Scout (83%)
MATH-500 (Quantitative Reasoning)
o3-mini (high) (99%), Gemini 2.5 Pro Preview (98%), DeepSeek R1 (97%), Claude 3.7 Sonnet Thinking (95%), DeepSeek V3 (Mar '25) (94%), Gemini 2.0 Flash (93%), GPT-4.1 mini (93%), GPT-4.1 (91%), GPT-4o (March 2025) (89%), Llama 4 Maverick (89%), Gemma 3 27B (88%), Grok 3 (87%), Claude 3.7 Sonnet (85%), GPT-4.1 nano (85%), Llama 4 Scout (84%), Nova Pro (79%), Mistral Large 2 (Nov '24) (74%)
AIME 2024 (Competition Math)
Gemini 2.5 Pro Preview (87%), o3-mini (high) (86%), Llama 3.1 Nemotron Ultra 253B Reasoning (75%), DeepSeek R1 (68%), DeepSeek V3 (Mar '25) (52%), Claude 3.7 Sonnet Thinking (49%), GPT-4.1 (44%), GPT-4.1 mini (43%), Llama 4 Maverick (39%), Gemini 2.0 Flash (33%), Grok 3 (33%), GPT-4o (March 2025) (33%), Llama 4 Scout (28%), Gemma 3 27B (25%), GPT-4.1 nano (24%), Claude 3.7 Sonnet (22%), Mistral Large 2 (Nov '24) (11%), Nova Pro (11%)
Multilingual Index (Artificial Analysis)
Nova Pro (83%), Mistral Large 2 (Nov '24) (83%)
While model intelligence generally translates across use cases, specific evaluations may be more relevant to particular workloads.
Intelligence vs. Price
Artificial Analysis Intelligence Index (Version 2, released Feb '25); Price: USD per 1M Tokens
[Scatter plot: Artificial Analysis Intelligence Index (30–70) vs. blended price ($1.00–$8.00 per 1M tokens) for GPT-4o (March 2025), GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, o3-mini (high), Llama 4 Maverick, Llama 4 Scout, Gemini 2.0 Flash, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet, Claude 3.7 Sonnet Thinking, Mistral Large 2 (Nov '24), DeepSeek R1, DeepSeek V3 (Mar '25) and Nova Pro. The most attractive quadrant is high intelligence at low price.]
While higher intelligence models are typically more expensive, they do not all follow the same price-quality curve.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
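The 3:1 blend can be reproduced directly. A minimal sketch follows, using Claude 3.7 Sonnet's listed $3 input / $15 output prices as the worked example:

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Blend input/output USD-per-1M-token prices at a 3:1 input:output ratio."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# Claude 3.7 Sonnet: $3 per 1M input tokens, $15 per 1M output tokens
print(blended_price(3.0, 15.0))  # -> 6.0, matching the $6.00 in the Highlights chart
```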
Intelligence vs. Output Speed
Artificial Analysis Intelligence Index (Version 2, released Feb '25); Output Speed: Output Tokens per Second
[Scatter plot: Artificial Analysis Intelligence Index (30–70) vs. output speed (40–300 output tokens per second) for the same models (excluding Claude 3.7 Sonnet Thinking). The most attractive quadrant is high intelligence at high speed.]
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API for models which support streaming).
Output Speed
Output Tokens per Second; Higher is better
GPT-4.1 nano (267), Gemini 2.0 Flash (242), Gemini 2.5 Pro Preview (204), o3-mini (high) (183), GPT-4o (March 2025) (183), GPT-4.1 mini (182), Llama 4 Scout (125), Llama 4 Maverick (123), GPT-4.1 (115), Nova Pro (105), Claude 3.7 Sonnet (76), Mistral Large 2 (Nov '24) (39), DeepSeek V3 (Mar '25) (25), DeepSeek R1 (23)
Latency: Time To First Answer Token
Seconds to First Answer Token Received; Accounts for Reasoning Model 'Thinking' time
Latency comprises input processing plus, for reasoning models, 'thinking' time: Gemini 2.0 Flash (0.3s), Llama 4 Scout (0.4s), Nova Pro (0.4s), GPT-4o (March 2025) (0.4s), GPT-4.1 mini (0.4s), Llama 4 Maverick (0.4s), GPT-4.1 (0.4s), Mistral Large 2 (Nov '24) (0.5s), GPT-4.1 nano (0.8s), Claude 3.7 Sonnet (1.0s), DeepSeek V3 (Mar '25) (3.4s), Gemini 2.5 Pro Preview (28.6s), o3-mini (high) (44.5s), DeepSeek R1 (105.3s)
Time To First Answer Token: Time to first answer token received, in seconds, after API request sent. For reasoning models, this includes the 'thinking' time of the model before providing an answer. For models which do not support streaming, this represents time to receive the completion.
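A minimal sketch of how both metrics can be measured against an OpenAI-compatible streaming endpoint is shown below. The model name is a placeholder and this is not Artificial Analysis' actual benchmarking harness; streamed chunks are used as a rough proxy for tokens.

```python
import time
from openai import OpenAI  # works with any OpenAI-compatible endpoint

client = OpenAI()  # endpoint and API key are taken from environment variables

def measure(model: str, prompt: str) -> tuple[float, float]:
    """Return (seconds to first token, output tokens/sec after the first chunk).

    For reasoning models that do not stream reasoning tokens, the time to
    first token includes the model's 'thinking' time.
    """
    start = time.monotonic()
    first = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Most endpoints send roughly one token per streamed chunk.
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.monotonic()
            chunks += 1
    assert first is not None, "no content received"
    generating = time.monotonic() - first
    speed = (chunks - 1) / generating if generating > 0 else float("nan")
    return first - start, speed

# Placeholder model name; substitute any model served by your endpoint.
ttft, tps = measure("gpt-4.1-mini", "Write a haiku about benchmarks.")
print(f"TTFT: {ttft:.2f}s; output speed: {tps:.0f} tokens/s")
```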
Pricing: Input and Output Prices
Price: USD per 1M Tokens
[Bar chart of input and output prices per model. Input/output prices per 1M tokens range from $0.10/$0.40 for GPT-4.1 nano and Gemini 2.0 Flash, through $0.27/$1.10 for DeepSeek V3 (Mar '25), $0.55/$2.19 for DeepSeek R1 and $1.10/$4.40 for o3-mini (high), up to $3.00/$15.00 for Claude 3.7 Sonnet (standard and Thinking) and $5.00/$15.00 for GPT-4o (March 2025).]
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Time to First Token Variance
Seconds to First Token Received; Results by percentile; Lower is better
Markers show the median; other points represent the 5th, 25th, 75th and 95th percentiles respectively
[Boxplot of time-to-first-token percentiles per model. Median TTFT for most non-reasoning models clusters below one second, while reasoning models such as Gemini 2.5 Pro Preview, o3-mini (high) and DeepSeek R1 show much higher medians and far wider variance.]
Latency (Time to First Token): Time to first token received, in seconds, after API request sent. For reasoning models which share reasoning tokens, this will be the first reasoning token. For models which do not support streaming, this represents time to receive the completion.
Boxplot: Shows variance of measurements
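To reproduce this kind of percentile summary from raw latency samples, something like the following works; the sample data below is invented for illustration.

```python
import statistics

def percentile_summary(samples: list[float]) -> dict[str, float]:
    """5th/25th/50th/75th/95th percentiles of TTFT samples, in seconds."""
    # quantiles() with n=100 returns the 1st..99th percentile cut points.
    qs = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p5": qs[4], "p25": qs[24], "p50": qs[49],
            "p75": qs[74], "p95": qs[94]}

# Invented TTFT measurements (seconds) for illustration:
samples = [0.35, 0.38, 0.40, 0.42, 0.42, 0.45, 0.51, 0.60, 0.83, 1.20]
print(percentile_summary(samples))
```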

API Provider Highlights: Llama 3.3 Instruct 70B
Output Speed vs. Price: Llama 3.3 Instruct 70B
Output Speed: Output Tokens per Second, Price: USD per 1M Tokens; 1,000 Input Tokens
[Scatter plot: output speed (output tokens per second) vs. blended price (USD per 1M tokens) for Llama 3.3 Instruct 70B across providers: Lambda Labs (FP8), Cerebras, Hyperbolic, Amazon, Nebius Fast, Nebius Base, CentML, Azure, Fireworks, Deepinfra (Turbo, FP8), Deepinfra, FriendliAI, Novita, Groq, SambaNova, Together.ai Turbo and kluster.ai. The most attractive quadrant is high speed at low price; Cerebras stands out on output speed.]
Smaller, emerging providers are offering high output speeds at competitive prices.
Median: Figures represent median (P50) measurement over the past 72 hours to reflect sustained changes in performance.
Notes: Cerebras serves Llama 3.3 70B with a 33k context window; Novita serves it with a 128k context window.
Pricing (Input and Output Prices): Llama 3.3 Instruct 70B
Price: USD per 1M Tokens; Lower is better; 1,000 Input Tokens
Provider | Input Price | Output Price
Lambda Labs (FP8) | $0.12 | $0.30
Nebius Base | $0.13 | $0.40
Deepinfra (Turbo, FP8) | $0.13 | $0.40
Deepinfra | $0.23 | $0.40
Nebius Fast | $0.25 | $0.75
Novita | $0.39 | $0.39
Hyperbolic | $0.40 | $0.40
CentML | $0.50 | $0.50
Groq | $0.59 | $0.79
FriendliAI | $0.60 | $0.60
SambaNova | $0.60 | $1.20
kluster.ai | $0.70 | $0.70
Amazon | $0.71 | $0.71
Azure | $0.71 | $0.71
Cerebras | $0.85 | $1.20
Together.ai Turbo | $0.88 | $0.88
Fireworks | $0.90 | $0.90
The relative importance of input vs. output token prices varies by use case. E.g. generation tasks are typically more output-token weighted, while document-focused tasks (e.g. RAG) are more input-token weighted; a worked comparison follows below.
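To see how the input:output mix shifts which provider is cheapest, the sketch below prices two hypothetical workloads against two price points from the table above; the token counts are assumptions for illustration.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request, with prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Hypothetical workloads:
rag = (20_000, 500)        # RAG: large retrieved context, short answer
generation = (200, 2_000)  # generation: short prompt, long completion

for name, (inp, out) in {"RAG": rag, "generation": generation}.items():
    # Prices as listed above: SambaNova $0.60/$1.20, Together.ai Turbo $0.88/$0.88
    sambanova = request_cost(inp, out, 0.60, 1.20)
    together = request_cost(inp, out, 0.88, 0.88)
    print(f"{name}: SambaNova ${sambanova:.4f} vs Together.ai Turbo ${together:.4f}")
```

With these assumed workloads, SambaNova's lower input price wins on the input-heavy RAG request, while Together.ai Turbo's lower output price wins on the output-heavy generation request.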
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output Price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Output Speed: Llama 3.3 Instruct 70B
Output Speed: Output Tokens per Second; 1,000 Input Tokens
Cerebras (2,327), SambaNova (455), Groq (353), FriendliAI (184), Fireworks (176), CentML (143), Amazon (138), Nebius Fast (134), Together.ai Turbo (126), Novita (80), Hyperbolic (79), Azure (48), kluster.ai (44), Lambda Labs (FP8) (39), Nebius Base (36), Deepinfra (Turbo, FP8) (32), Deepinfra (28)
Output Speed, Over Time: Llama 3.3 Instruct 70B
Output Tokens per Second; Higher is better; 1,000 Input Tokens
[Line chart: daily median output speed for each of the providers listed above, Jan 19 through Apr 13; y-axis 0–3,000 output tokens per second.]
Smaller, emerging providers offer high output speed, though the exact speeds delivered vary day to day.
Over time measurement: Median measurement per day, based on 8 measurements taken each day at different times. Labels represent the start of each week's measurements.
See more information on any of our supported models
Model Name | Creator | License | Context Window | Further analysis
Key Links
- Compare Language Models
- Language Models Leaderboard
- Language Model API Leaderboard
- Image Arena
- Video Arena
- Speech Arena