Parametric Workbench v3.0

Cognitive Model Workspace

An interactive, high-fidelity workbench for analyzing, comparing, and cost-modeling frontier AI architectures.

Workspace Index 9 Models
Active catalog
Preference Leader GPT-5.5 Pro
1482 Elo
Efficiency Leader Llama 4
$0.15/1M
Rank Model Provider Chatbot Arena Elo MMLU GPQA HumanEval Input/Output per 1M
#1
GPT-5.5 Pro Proprietary
OpenAI
1482
98.1%
89.2%
98.9%
$30.00/$180.00
#2
Claude Opus 4.8 Proprietary
Anthropic
1475
97.8%
88.5%
98.4%
$6.00/$30.00
#3
GPT-5.5 Proprietary
OpenAI
1438
96.8%
84.1%
97.2%
$5.00/$30.00
#4
Claude Sonnet 4.6 Proprietary
Anthropic
1422
95.2%
81.5%
96.8%
$3.00/$15.00
#5
DeepSeek V4 Pro Proprietary
DeepSeek
1405
94.8%
80.2%
96%
$0.43/$0.87
#6
Grok 4.3 Proprietary
xAI
1395
94.5%
78.4%
94.8%
$1.25/$2.50
#7
Llama 4 Maverick Open weights
Meta
1368
92.5%
72.8%
93.5%
$0.15/$0.60
#8
GLM 5.1 Open weights
Zhipu
1352
91.2%
70.5%
93.2%
$0.98/$3.08
#9
Gemini 3.5 Flash Proprietary
Google
1335
90.5%
64.2%
88.5%
$1.50/$9.00
Baseline Model A
VS
Comparison Model B

Parametric Spec Comparison

Workload Parameters

Configure transaction volumes and average token sizes to evaluate monthly run-rates.

Load Presets:
1,000,000
2,000 tokens
800 tokens

Estimated Run-Rate

Model A $0.00
Model B $0.00

Monthly Volume Ledger

Request Volume 1,000,000 /mo
Prompt Token Mass 2,000 B tokens
Completion Token Mass 800 B tokens
Model A
Prompt$0.00
Compl.$0.00
Model B
Prompt$0.00
Compl.$0.00