Google2026-05-26

Gemini 3.5 Flash: Google's Agent-Optimized Frontier Model, Unveiled at I/O 2026

On May 19, 2026, Google took the stage at I/O 2026 to announce Gemini 3.5 Flash.

The message was simple: "Frontier intelligence at Flash latency." Cheap, fast, yet frontier-class in capability. This isn't just another model—it's a new breed optimized specifically for agentic workloads.

What Is Gemini 3.5 Flash?

Gemini 3.5 Flash is a large language model built by Google DeepMind, purpose-designed for agent and coding tasks. It's the first model in the Gemini 3.5 family and went GA (generally available) on day one of I/O 2026.

The API model ID is gemini-3.5-flash (no preview suffix). Internal version: 3.5-flash-05-2026. Knowledge cutoff: January 2026.

Core Specs:

Spec	Value
Max Input	1,048,576 tokens (1M)
Max Output	65,536 tokens (64K)
Input Modalities	Text, image, audio, video
Output Modalities	Text only
Dynamic Thinking	Enabled by default
Speed	4× faster than comparable frontier models

Pricing: The Cheapest Frontier Model

Tier	Input	Output	Cached Input
Global	$1.50/1M	$9.00/1M	$0.15/1M
Non-Global	$1.65/1M	$9.90/1M	$0.165/1M

For context, here's how it stacks up against the competition:

Model	Input/1M	Output/1M
Gemini 3.5 Flash	$1.50	$9.00
Gemini 3.1 Pro	$2.50	$15.00
Claude Sonnet 4.6	$3.00	$15.00
GPT-5.2	$1.25	$10.00
Claude Opus 4.7	$5.00	$25.00

That's 40% cheaper on input and 40% cheaper on output versus Gemini 3.1 Pro. And the cached input price of $0.15/1M is just one-tenth of the regular rate. For agents that re-read the same context repeatedly, this creates an overwhelming cost advantage.

Google's pitch: *"A model that generates $1 of revenue per minute at a cost below $0.30."

Benchmarks: Outperforming Pro on Agents and Coding

Gemini 3.5 Flash's design philosophy is clear—optimize for real-world agent tasks, not academic reasoning.

Coding

Benchmark	3.5 Flash	3.1 Pro	Delta
Terminal-Bench 2.1	76.2%	70.3%	+5.9
SWE-Bench Pro (public)	55.1%	54.2%	+0.9

Terminal-Bench 2.1 evaluates terminal-based agentic tasks, and Flash pulled ahead by nearly 6 points.

Agent & Tool Use

Benchmark	3.5 Flash	3.1 Pro	Delta
MCP Atlas	83.6%	78.2%	+5.4
Toolathlon	56.5%	49.4%	+7.1
OSWorld-Verified	78.4%	76.2%	+2.2
Finance Agent v2	57.9%	43.0%	+14.9
GDPval-AA (ELO)	1656	1314	+342

The results are striking. A 14.9-point jump on Finance Agent v2 and a 342 ELO gain on GDPval-AA signal a dramatic leap in agent-scenario performance.

Multimodal & Long Context

Benchmark	3.5 Flash	3.1 Pro	Delta
CharXiv Reasoning	84.2%	83.3%	+0.9
MMMU-Pro	83.6%	80.5%	+3.1
Blueprint-Bench 2	33.6%	26.5%	+7.1
MRCR v2 · 128k	77.3%	84.9%	-7.6
MRCR v2 · 1M	26.6%	26.3%	+0.3

Reasoning (Where Pro Still Wins)

Benchmark	3.5 Flash	3.1 Pro	Delta
Humanity's Last Exam	40.2%	44.4%	-4.2
ARC-AGI-2	72.1%	77.1%	-5.0

On HLE (Humanity's Last Exam) and ARC-AGI-2, Pro came out ahead. For academic reasoning and abstract problem-solving, Pro retains the edge.

Head-to-Head With Other Frontier Models

Google DeepMind published comparison cards against Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5, though specific numbers haven't been released yet. What we do know: on Artificial Analysis's Intelligence Index, Flash lands in the upper-right quadrant—frontier intelligence at Flash latency.

Why "Agent-Optimized"?

The most interesting thing about Gemini 3.5 Flash's design is that it was built from the ground up for agentic workloads.

Traditional model design asks: "How do we score higher on benchmarks?" Gemini 3.5 Flash asks a different question: "How efficiently can this model handle the tool calls, code execution, and multi-step planning that agents actually need?"

The concrete differences:

1. Speed. 4× output speed versus comparable frontier models. When an agent runs dozens of steps, per-step latency becomes the bottleneck. Flash eliminates this bottleneck fundamentally.

2. Cached input cost. Agents re-read the same context over and over. At $0.15/1M for cached input, running costs drop by an order of magnitude.

3. Tool-calling accuracy. 83.6% on MCP Atlas, 56.5% on Toolathlon. High success rates across complex tool chains.

4. Sub-agent orchestration. A single API call can spin up a fully reasoning agent. Supports code execution in isolated Linux environments, file/state persistence, and environment continuity across calls.

The Ecosystem: Antigravity, Spark, and Managed Agents

Gemini 3.5 Flash doesn't stand alone. Google announced an entire ecosystem alongside it.

Antigravity 2.0

A desktop standalone app offering parallel sub-agent execution, scheduled background tasks, and deep integration with AI Studio, Android, and Firebase. Co-optimized with Gemini 3.5 Flash.

Gemini Spark

A 24/7 autonomous agent built on 3.5 Flash. It acts on behalf of the user—handling emails, executing online tasks, even making purchases. Rolled out to trusted testers on launch day, with a beta for U.S. AI Ultra subscribers the following week.

Managed Agents in the Gemini API

Create a fully reasoning agent with a single API call. Tool use and code execution run in isolated Linux environments. Persistent environments maintain files and state across invocations.

Real-World Partner Use Cases

Company	Use Case
Shopify	Growth forecasting via parallel sub-agents
Macquarie Bank	Reasoning over 100+ page financial documents
Salesforce Agentforce	Multi-sub-agent enterprise task automation
Ramp	Multimodal OCR + pattern reasoning for invoices
Xero	Autonomous multi-week workflows (1099 prep, etc.)
Databricks	Agent-driven monitoring and search across large datasets

Gemini 3.5 Pro Arrives Next Month

What launched at I/O 2026 was Flash, but Gemini 3.5 Pro is slated for release in June 2026.

Codename: "Cappuccino." According to leaks, the Flash model already matches 92% of GPT-5.5's performance on coding and reasoning at roughly 1/15th to 1/20th the cost. The Pro version is expected to push performance even further.

Internally, Google is already running on Pro and positions it as "frontier intelligence."

Competitive Landscape

Here's where frontier models stand as of May 2026:

Model	Input/1M	Output/1M	Context	Key Trait
Gemini 3.5 Flash	$1.50	$9.00	1M	Agent-optimized, fastest
GPT-5.2	$1.25	$10.00	256K	OpenAI workhorse
Claude Sonnet 4.6	$3.00	$15.00	1M	Strong coder
Claude Opus 4.7	$5.00	$25.00	200K	Top-tier performance
DeepSeek V4 Pro	$0.44	$0.87	1M	Budget king
MiniMax M2.5	$0.15	$1.15	200K	Coding specialist

Flash occupies a distinct niche: performance approaching Opus 4.7 and GPT-5.5, priced lower than Sonnet 4.6, at 4× the speed.

The real killer advantage emerges in agentic workloads. With a 90% discount on cached input, the cost of running agents that repeatedly read the same context could drop to one-tenth of what other models charge.

Limitations

Reduced reasoning performance. 40.2% on HLE (vs. Pro's 44.4%), 72.1% on ARC-AGI-2 (vs. 77.1%). For academic reasoning and abstract problem-solving, Flash trails Pro. This is a deliberate design trade-off—prioritizing speed and agent performance over deep reasoning.

Recall degradation at 128K context. 77.3% on MRCR v2 at 128K (vs. Pro's 84.9%). Tasks requiring precise recall over long documents still favor Pro.

Text-only output. While input supports text, image, audio, and video, output is limited to text. No image or video generation.

The Big Picture

Gemini 3.5 Flash represents a philosophical shift in how we design AI models.

Traditional models aimed to score highest on benchmarks. Flash asks a different question: "How fast, how cheaply, and how accurately can a model operate in the environments agents actually run in?"

This is a fundamentally different strategy from GPT-5.5 or Claude Opus 4.7. Google didn't build the fastest model—they built the model optimized for agents.

When Gemini 3.5 Pro arrives next month, we'll find out just how far this strategy can go.

Comments (0)

Share:X Hatena

Back to Blog