Last updated: June 23, 2026
Fugu Ultra is Sakana AI’s quality-first version of Sakana Fugu, a multi-agent model orchestration system exposed through one OpenAI-compatible API. If standard Fugu is the lower-latency default, Fugu Ultra is the variant Sakana positions for harder multi-step tasks where answer depth matters more than response time.
Quick answer: Fugu Ultra is not just a bigger single chatbot. It is part of Sakana Fugu, a learned orchestration system that can coordinate a pool of expert AI agents. Sakana lists fugu-ultra-20260615 pricing at $5 per 1M input tokens, $30 per 1M output tokens, and $0.50 per 1M cached input tokens, with higher rates for context above 272K tokens. Sakana’s published benchmark table shows strong results, but those are vendor-published results and should be validated on your own workload.
For the broader launch and Claude Fable 5 comparison, read the main Sakana AI Fugu review. This page focuses only on the Fugu Ultra query cluster: model type, pricing, benchmarks, use cases, and evaluation checklist.
What is Fugu Ultra?
Fugu Ultra is the performance-oriented variant of Sakana Fugu. Sakana describes Fugu as a multi-agent system delivered as one model: you call one API, while Fugu handles model selection, routing, coordination, and synthesis behind the scenes.
The difference is operating mode:
| Model | Practical role | Tradeoff |
|---|---|---|
| Fugu | Everyday coding, code review, chatbots, and interactive work | Designed to balance latency and quality |
| Fugu Ultra | Hard research, long coding tasks, paper reproduction, patent/literature analysis, and complex reasoning | Prioritizes answer quality and deeper orchestration, with more latency and potentially more usage |
Sakana’s Fugu product page says Fugu and Fugu Ultra are both available through one OpenAI-compatible API. That matters for developers because you can test the two modes without rebuilding your whole integration.
Fugu Ultra vs Fugu
The simplest way to choose is to start with the task shape.
Use Fugu first when:
- the user is waiting interactively;
- latency matters;
- the task is routine coding, editing, or summarization;
- you need configurable agent opt-outs for data, privacy, or compliance;
- a cheaper or simpler model can solve the task reliably.
Evaluate Fugu Ultra when:
- the task is multi-step and failure is expensive;
- the work benefits from planning, verification, and synthesis;
- the prompt contains a paper, codebase, dataset, patent set, or long document set;
- answer depth matters more than immediate response time;
- you are comparing against premium models such as Claude Fable 5, Opus 4.8, GPT-5.5, or Gemini 3.1 Pro.
Sakana’s own examples for Fugu Ultra include Kaggle competitions, paper reproduction, cybersecurity analysis, and literature or patent investigations. Treat those as starting points for evaluation rather than fixed expectations.
Fugu Ultra pricing
Sakana lists fixed pay-as-you-go pricing for fugu-ultra-20260615:
| Token type | Price per 1M tokens | Price when context is over 272K |
|---|---|---|
| Input | $5 | $10 |
| Output | $30 | $45 |
| Cached input | $0.50 | $1.00 |
Sakana also lists monthly subscription tiers that include access to both Fugu and Fugu Ultra:
| Plan | Price | Positioning |
|---|---|---|
| Standard | $20/month | Lightweight daily usage |
| Pro | $100/month | Focused coding, review, research, and analysis sessions |
| Max | $200/month | Heavier long-running workloads |
The important evaluation metric is cost per completed task, not only price per token. Fugu Ultra may use hidden orchestration work behind the final answer. Sakana’s pricing page explains that orchestration tokens can count toward usage. For a fair test, log input, output, cached-input, orchestration usage, latency, retry rate, and human review time.
Fugu Ultra benchmark results
Sakana’s published Fugu table reports the following Fugu Ultra scores:
| Benchmark | Fugu Ultra score | Notes |
|---|---|---|
| SWE Bench Pro | 73.7 | Software engineering benchmark in Sakana’s table |
| TerminalBench 2.1 | 82.1 | Terminal/task execution benchmark |
| LiveCodeBench | 93.2 | Competitive/programming benchmark |
| LiveCodeBench Pro | 90.8 | Harder coding variant |
| Humanity’s Last Exam | 50.0 | Broad hard reasoning/knowledge benchmark |
| CharXiv Reasoning | 86.6 | Scientific/visual reasoning benchmark |
| GPQA-D | 95.5 | Graduate-level science QA benchmark |
| SciCode | 58.7 | Scientific coding benchmark |
| MRCRv2 | 93.6 | Long-context/retrieval-style benchmark |
These numbers are useful, but they should be read with the caveat Sakana itself gives: baseline scores are provider-reported, and the Fugu results are from Sakana’s June 2026 evaluation. Teams should not make production routing decisions from the table alone.
A better internal test is to choose 20-50 representative tasks and compare:
- success rate;
- answer quality after human review;
- cost per successful task;
- latency;
- retry count;
- policy or compliance failures;
- whether the final answer is easier to audit.
Fugu Ultra vs Claude Fable 5
Fugu Ultra and Claude Fable 5 are often compared because both target difficult long-horizon work. The comparison is useful, but the products are structurally different.
| Comparison | Fugu Ultra | Claude Fable 5 |
|---|---|---|
| Type | Multi-agent orchestrator exposed as one model | Single Anthropic Mythos-class model |
| Provider | Sakana AI | Anthropic |
| API style | OpenAI-compatible Sakana API | Claude API |
| Pricing headline | $5 input / $30 output per 1M tokens | $10 input / $50 output per 1M tokens when available |
| Transparency | Less visible worker routing | Clear model identity, with documented safeguards |
| Access context | Sakana says Fugu is unavailable in EU/EEA while compliance work continues | Anthropic’s Fable page listed access as unavailable in the June 23 source check |
The practical question is not “which one wins?” It is: which option fits your workload, compliance constraints, latency tolerance, and provider-risk model? If Fable access is unavailable or your team wants a provider-flexible orchestration layer, Fugu Ultra may be worth evaluating. If you need a single known model identity and Anthropic access is available, Fable 5 may be simpler to govern.
Is Fugu Ultra open source?
No — not in the sense of “the full hosted model orchestration service is open source.” Sakana has a public SakanaAI/fugu GitHub repository, and that repository includes setup docs, configuration materials, the README, assets, and the technical report. But the hosted Fugu and Fugu Ultra service is an API product.
A procurement-friendly distinction:
- public GitHub repo: yes;
- public technical report: yes;
- open-source hosted orchestration service: no;
- exact worker routing visible to users: no;
- API product with documented pricing: yes.
When should you test Fugu Ultra?
Test Fugu Ultra when the work is hard enough that orchestration may matter. Good candidates include:
- long codebase review;
- migration planning;
- paper reproduction;
- patent and literature analysis;
- Kaggle-style data science;
- research synthesis;
- scoped security review with evidence requirements;
- complex tasks where a single-model answer often needs multiple retries.
Do not start with Fugu Ultra for every task. For short writing, simple summarization, quick Q&A, or basic code snippets, a lower-latency and lower-cost model may be enough.
Sources checked
- Sakana Fugu product page
- Sakana Fugu launch announcement
- SakanaAI/fugu GitHub repository
- Claude Fable 5 page
- Anthropic statement on Fable 5 and Mythos 5 access
FAQ
What is Fugu Ultra?
Is Fugu Ultra different from Sakana Fugu?
How much does Fugu Ultra cost?
fugu-ultra-20260615 at $5 per 1M input tokens, $30 per 1M output tokens, and $0.50 per 1M cached input tokens. For context above 272K tokens, Sakana lists $10 input, $45 output, and $1.00 cached input per 1M tokens.