20 May 2026·7 min read·By Elena Vance

Gemini 3.5 Flash: What Buyers Need to Know

Google's Gemini 3.5 Flash delivers frontier AI at lower cost, plus a new Spark agent. Here is what developers and subscribers need to know.

Gemini 3.5 Flash is rolling out across Google products starting today, and if you build on AI or pay for tokens, this one matters. The model delivers nearly 300 tokens per second while matching benchmark scores of larger frontier models that crawl along at a quarter of that speed. Google says companies burning through the most AI tokens could save a billion dollars a year by switching. That is not hype. That is a pricing signal you should pay attention to.

The Speed-Cost Breakthrough

Here is the deal. Generative AI has been a money pit. Everyone knows it. The major players are scrambling to find efficiency, and the math gets even worse when you start running agentic workflows that spin for minutes or hours to finish complex tasks. Gemini 3.5 Flash is Google's answer to that specific problem.

Let me put it bluntly. This model is not trying to be the smartest. It is trying to be the smartest per dollar. And per second. And per watt. That combination rewrites what you can afford to automate.

"Certain things like UI control are expensive to do because the model has to search the page, it has to know where to click, it has to act through multiple steps. I think Flash is able to do that well because of that combination of quality and cost," said Tulsee Doshi, senior director of product management for Gemini.

The Numbers That Matter

$1.50 per million input tokens. $9 per million output tokens. That is the API pricing for Gemini 3.5 Flash. Compare that to Gemini 3.1 Pro, which starts at $2 and $12 respectively and climbs higher past 200k tokens. The gap is real, and at scale, it compounds fast.
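To see how that gap compounds, here is a rough back-of-the-envelope sketch in Python using the list prices quoted above. The monthly token volumes are invented for illustration, and the calculation ignores the higher Gemini 3.1 Pro rates past 200k tokens, as well as any caching or volume discounts. The model names are plain labels, not API identifiers.

```python
# Per-million-token list prices quoted in this article (USD).
PRICES = {
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
    "Gemini 3.1 Pro": {"input": 2.00, "output": 12.00},  # base rate only
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated bill in USD for the given token volumes."""
    price = PRICES[model]
    input_cost = (input_tokens / 1_000_000) * price["input"]
    output_cost = (output_tokens / 1_000_000) * price["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10 billion input and 2 billion output tokens a month.
    input_tokens = 10_000_000_000
    output_tokens = 2_000_000_000
    flash = monthly_cost("Gemini 3.5 Flash", input_tokens, output_tokens)
    pro = monthly_cost("Gemini 3.1 Pro", input_tokens, output_tokens)
    print(f"Flash: ${flash:,.2f}  Pro: ${pro:,.2f}  Saved: ${pro - flash:,.2f}")
```

With these assumed volumes the bill comes to $33,000 on Flash against $44,000 on Pro. Both quoted rates are 25 percent lower, so at base pricing the saving stays at 25 percent whatever the mix of input and output tokens.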

Nearly 300 tokens per second. That is the output speed. Most frontier models with comparable benchmark scores deliver a fraction of that throughput. If your application needs snappy responses or you are chaining dozens of calls together, speed and cost are not separate conversations. They are the same conversation.
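A rough sketch of what throughput means for chained calls, again in Python. It assumes the calls run strictly one after another and that generation time dominates, so network latency, prompt processing, and tool execution are ignored. The 75 tokens-per-second figure is simply a quarter of 300, standing in for the slower frontier models mentioned at the top of this article. The chain length and tokens per call are hypothetical.

```python
def chain_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Time spent generating output for a sequential chain of model calls."""
    return calls * tokens_per_call / tokens_per_second


if __name__ == "__main__":
    calls, tokens_per_call = 30, 800  # hypothetical agent workflow
    fast = chain_seconds(calls, tokens_per_call, 300)
    slow = chain_seconds(calls, tokens_per_call, 75)
    print(f"At 300 tok/s: {fast:.0f} s; at 75 tok/s: {slow:.0f} s")
```

Under these assumptions the same 30-step workflow spends 80 seconds generating at 300 tokens per second and 320 seconds at 75.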

Benchmark Reality Check

On Terminal Bench and SWE-Bench Pro, Gemini 3.5 Flash clobbers older Flash models and edges past Gemini 3.1 Pro. Its scores sit in the same neighborhood as OpenAI's GPT 5.5, a much larger and more expensive model. On OSWorld-Verified, which tests how models handle real computing environments, the story is the same. It ties GPT 5.5 and runs faster than 3.1 Pro.

But here is what the model card buried. Google ran internal metrics on how its own engineers code, testing against real Googler codebases. Doshi described the jump from 3.1 Pro to 3.5 Flash as "a massive, massive jump." Internal dogfooding at that scale tells you more than any public benchmark.

Gemini Spark: AI That Works While You Sleep

Companies are swapping "AI" for "agents" as their primary buzzword, and Google is no exception. Gemini Spark is the company's first dedicated agent. It runs 24/7 in Google's cloud, so it doesn't touch your computing resources and isn't tied to a specific device or browser tab. Instead, it spans your entire Google footprint.

"I think of agents as being able to take a model plus a framework such that the combination can actually take action on your behalf," Doshi said.

What Spark Actually Does

You give Spark instructions. It handles the task over time. It grabs context from your Drive files, Gmail, and more. It can watch for specific emails and fold them into daily digests. It can monitor meetings, generate summaries, and surface action items. It sends notifications. It asks follow-up questions. And Google stresses that it asks for your approval before taking high-stakes actions.

Doshi used Spark to pull together evaluations and stats on 3.5 Flash for a slide deck aimed at Google leadership. "It turned out beautifully," she said. "Probably better and in much less time than I would have been able to do." On the personal side, she built an agent that tracks developmental milestones for her new child. "I'm treating my child like an AI model," she joked. "I realize that, but it has been very helpful."

The price tag might make you choke. Google added a new Ultra tier at $100 per month, and Spark rolls out to AI Ultra subscribers next week. The $200 per month tier still exists, now $50 cheaper than before, for those who want higher token limits. Google says the plan is to bring Spark to all users eventually, including free-tier users.

Real talk. $100 per month is steep. Ars Technica's report calls it "an astronomical amount for AI tools." But if Spark actually saves you hours per week on real work, the calculus shifts. Many things people share with Google today would have been unthinkable a decade ago. Sensibilities adjust when the utility is undeniable.

Omni: The Model Google Is Not Sure About Yet

Google announced Gemini Omni Flash. It replaces Veo 3 in products like the Gemini app, YouTube, and Flow. Omni is designed to be truly multimodal. Any input. Any output. Images, text, video, audio. That is the vision.

But that framing misses something. Omni does not do most of that right now. It starts with video only. The rest is aspirational.

"The vision for Gemini has always been that it would be multimodal in, multimodal out. Omni is a step toward that vision," Doshi said.

Right now, Google routes your prompts to different models depending on what you want. Images go to Nano Banana. Music goes to Lyria. Developers plug into different APIs. Some models are not available in all tools. Omni could eventually unify all of that, but the Gemini team is not sure how it will develop. They plan to open Omni to more output types in the coming months and see how it performs. "We might find that there are certain use cases that really benefit from their own custom model and specific focus," Doshi said. "It's not fully proven out yet."

An Omni Pro model is planned. No timeline exists.

Where You Will Find It

Gemini 3.5 Flash is not staying locked behind an API endpoint. It lands in the Gemini app, AI Studio, Android Studio, and all of Google's enterprise products. The Antigravity IDE gets upgraded to version 2.0 with 3.5 Flash support, including multiple parallel workflows where sub-agents spawn from the main model. Google says this only works because the model spits out tokens so efficiently.
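The fan-out pattern described here can be sketched in a few lines of Python. This is not Antigravity's API, which the reporting does not describe. `run_subagent` and `main_agent` are hypothetical stand-ins, and each sub-agent simply sleeps for the time it would take to generate its output at a given throughput.

```python
import asyncio
import time


async def run_subagent(name: str, output_tokens: int, tokens_per_second: float) -> str:
    """Hypothetical sub-agent: simulate generation time, then return a result."""
    await asyncio.sleep(output_tokens / tokens_per_second)
    return f"{name}: done ({output_tokens} tokens)"


async def main_agent(tasks: list[str], tokens_per_second: float) -> list[str]:
    """Spawn one sub-agent per task and wait for all of them concurrently."""
    jobs = [run_subagent(task, 30, tokens_per_second) for task in tasks]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main_agent(["lint", "tests", "docs"], 300))
    print(results, f"{time.perf_counter() - start:.2f} s")
```

Running sub-agents concurrently hides latency but not cost. Three sub-agents finish in roughly the time of one, yet they still bill for three sets of tokens, which is why per-token price matters as much as speed for this pattern.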

The Pro variant of 3.5 is already in internal testing. It should ship next month. If the pattern holds, Pro will leap ahead on raw intelligence while Flash catches up in the next cycle.

The Verdict

Gemini 3.5 Flash does not need to be the smartest model on the planet. It needs to be smart enough at a price that makes agentic workflows viable. On that metric, Google delivered. The speed and cost numbers are real, the internal results are compelling, and the product surface is wide enough to matter immediately.

The open question is Spark. A $100 monthly agent that lives in your Google account sounds either indispensable or invasive depending on your tolerance. Google is betting your tolerance grows once the utility lands.

As reported by Ars Technica, this is the first time Google's Flash line has credibly challenged its own Pro models on quality while dramatically undercutting them on cost. If you pay for inference at scale, you should be running your own benchmarks this week. Not next month. This week.

Frequently Asked Questions

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is a lightweight, fast, and cost-effective AI model from Google, optimized for high-volume tasks.

How does Gemini 3.5 Flash differ from other Gemini models?

It prioritizes speed and efficiency over raw capability, making it ideal for real-time applications where latency matters.

What are the key benefits for buyers?

Buyers get lower costs per API call, faster response times, and reduced computational overhead for scalable deployments.

What use cases is Gemini 3.5 Flash best suited for?

It excels in chatbots, content summarization, and real-time data processing where quick, accurate outputs are needed.

Is Gemini 3.5 Flash available through Google Cloud?

Yes, it is accessible via Vertex AI and the Gemini API, with pay-as-you-go pricing for developers and enterprises.

Written by

Elena Vance

Artificial Intelligence Correspondent

Elena Vance reports on artificial intelligence, from frontier research labs to the products reshaping everyday work. She focuses on how machine learning is moving out of the lab and into the real world, and what that shift means for readers.
