Gemini Deep Research: what each AI research report actually costs
Google launched a developer API for Deep Research on April 21. The official estimate is $1-3 per task. That number is accurate but incomplete - it bakes in 50-70% cache hits and doesn't surface that Google Search queries alone add $1.12 per standard run. Here's where the money actually goes.

TL;DR
- Two agents launched April 21: `deep-research-preview-04-2026` (standard, ~$2/task) and `deep-research-max-preview-04-2026` (Max, ~$5/task).
- Runs on Gemini 3.1 Pro at standard rates ($2.00/1M input, $12.00/1M output). No markup on the agent layer.
- Google Search grounding is on by default: 80 queries per standard task, 160 per Max, at $14/1K. That adds $1.12-$2.24 per task in search costs alone.
- Implicit caching covers 50-70% of input tokens per task - the main reason inference stays affordable inside the agentic loop.
- Available via the Interactions API (not `generate_content`), async only, paid tiers only.
What Google actually shipped
Deep Research was already live in the Gemini app and NotebookLM before April 21. What changed is the developer API - the Interactions API in public preview. You can now call these agents directly from your code, not just from Google surfaces.
Both versions run on Gemini 3.1 Pro. The model scores 85.9% on BrowseComp - a benchmark measuring an agent's ability to find hard-to-locate information through persistent web browsing. That's 1.9 points above Claude Opus 4.6 (84.0%) and well above GPT-5.2 (65.8%). For a research agent where the entire value proposition is finding things on the web, that matters.
The two agents differ in how hard they push. Standard runs about 80 web searches, processes roughly 250K input tokens and 60K output tokens, and typically completes in 5-15 minutes. Max runs 160 searches, processes around 900K input tokens and 80K output tokens, and is designed for batch jobs where you want maximum source coverage.
Cost per task: the full breakdown
Google publishes official estimates on their pricing page. We've broken out the search cost separately because it's the component most likely to catch developers off guard.
| Component | Deep Research | Deep Research Max |
|---|---|---|
| Input tokens (cumulative) | ~250K | ~900K |
| Output tokens (cumulative) | ~60K | ~80K |
| Implicit cache hit rate | 50-70% | 50-70% |
| Google Search queries | ~80 | ~160 |
| Search cost ($14/1K queries) | ~$1.12 | ~$2.24 |
| Inference cost (w/ 60% cache) | ~$0.95 | ~$1.80 |
| Total per-task estimate | $1–3 | $3–7 |
Token estimates and cost ranges from Google AI Studio pricing docs (April 21, 2026). Inference calculated at Gemini 3.1 Pro standard rates with 60% implicit cache hit rate. All numbers are Google's official estimates - no independent developer measurements exist yet (the API launched three days ago).
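To make the table concrete, here's a minimal sketch of the arithmetic using Google's published rates and token estimates, assuming the 60% cache-hit midpoint and (for simplicity) standard ≤200K-tier rates throughout:

```python
# Per-task cost estimate from Google's published numbers (April 2026).
# Assumes a 60% implicit cache hit rate and <=200K tier rates throughout.
INPUT_RATE = 2.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 12.00 / 1_000_000  # $ per output token
CACHED_RATE = 0.20 / 1_000_000   # $ per cached input token
SEARCH_RATE = 14.00 / 1_000      # $ per Google Search query

def task_cost(input_tokens, output_tokens, queries, cache_hit=0.60):
    uncached = input_tokens * (1 - cache_hit)
    cached = input_tokens * cache_hit
    inference = (uncached * INPUT_RATE + cached * CACHED_RATE
                 + output_tokens * OUTPUT_RATE)
    search = queries * SEARCH_RATE
    return inference, search

for name, (inp, out, q) in {
    "Deep Research":     (250_000, 60_000, 80),
    "Deep Research Max": (900_000, 80_000, 160),
}.items():
    inference, search = task_cost(inp, out, q)
    print(f"{name}: inference ~${inference:.2f}, search ~${search:.2f}, "
          f"total ~${inference + search:.2f}")
```

Standard comes out at ~$0.95 inference plus $1.12 search; Max at ~$1.79 plus $2.24 - both inside Google's published ranges.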
Base model rates
The inference pricing is standard Gemini 3.1 Pro - no premium for using the Deep Research agent wrapper. The 200K threshold matters: individual calls in the agentic loop start small but can grow as the model accumulates context across search iterations.
| Tier | Input /1M | Output /1M | Cached input /1M |
|---|---|---|---|
| Prompts ≤200K tokens | $2.00 | $12.00 | $0.20 |
| Prompts >200K tokens | $4.00 | $18.00 | $0.40 |
| Batch / Flex (async) | $1.00 | $6.00 | $0.10 |
Source: Google AI Studio pricing. Storage for explicit caches: $4.50/1M tokens per hour. No free tier on Gemini 3.1 Pro.
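For readers budgeting individual calls inside the loop, the tier logic looks like this (an illustrative helper, not part of the SDK):

```python
# Sketch of the Gemini 3.1 Pro rate card in $/1M tokens.
def rates(prompt_tokens: int, batch: bool = False) -> tuple[float, float, float]:
    """Return (input, output, cached input) rates for a single call."""
    if batch:                    # Batch / Flex async tier
        return 1.00, 6.00, 0.10
    if prompt_tokens > 200_000:  # long-context tier
        return 4.00, 18.00, 0.40
    return 2.00, 12.00, 0.20

# Early loop iterations sit in the cheap tier...
print(rates(50_000))   # (2.0, 12.0, 0.2)
# ...but late iterations of a Max run can cross the 200K threshold.
print(rates(450_000))  # (4.0, 18.0, 0.4)
```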
Why search queries cost more than inference
With 60% caching, the inference cost for a standard Deep Research task works out to roughly $0.95: about 100K uncached input tokens plus 150K cached, plus 60K output. That's not a large number at Gemini 3.1 Pro prices.
The 80 Google Search queries cost $1.12 at $14 per thousand. That's more than the inference. And it's outside developer control - the agent decides how many searches to run based on task complexity, and the current API preview doesn't expose a per-task query cap.
There are two ways to eliminate the search cost entirely. First, disable Google Search grounding and point the agent at private data via MCP servers - useful if you're doing research on internal documents. Second, use Deep Research through the consumer Gemini subscription products, where search is bundled into the subscription price.
One thing to track: the monthly free Search quota is 5,000 queries shared across all Gemini 3 models. At 80 queries per Deep Research task, you burn through that in 62 tasks. After that it's $14/1K for every query.
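A quick way to budget that quota, assuming the standard agent's ~80 queries per task and nothing else drawing on the shared pool:

```python
# Monthly Google Search spend for n Deep Research tasks, given the
# 5,000-query free quota shared across all Gemini 3 models.
FREE_QUOTA = 5_000
QUERIES_PER_TASK = 80        # standard agent estimate
SEARCH_RATE = 14.00 / 1_000  # $ per query past the free quota

def monthly_search_cost(tasks: int) -> float:
    billable = max(0, tasks * QUERIES_PER_TASK - FREE_QUOTA)
    return billable * SEARCH_RATE

print(monthly_search_cost(62))   # $0.00  - 4,960 queries, still inside quota
print(monthly_search_cost(100))  # $42.00 - 3,000 billable queries
```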
vs. OpenAI Deep Research
We covered OpenAI Deep Research API pricing when it launched. The per-task comparison is less straightforward than model rates suggest.
| Agent | Input /1M | Output /1M | Search /task | Typical /task |
|---|---|---|---|---|
| Gemini Deep Research | $2.00 | $12.00 | ~$1.12 | ~$2 |
| Gemini Deep Research Max | $2.00 | $12.00 | ~$2.24 | ~$5 |
| OpenAI o3-deep-research | $10.00 | $40.00 | ~$0.20 | ~$1.45 |
| OpenAI o4-mini-deep-research | $2.00 | $8.00 | ~$0.20 | ~$0.41 |
Gemini: Google Search at $14/1K, ~80-160 queries/task. OpenAI: Web Search tool at $10/1K, ~10-30 queries/task. Typical task costs are estimates using official token consumption data.
The gap that doesn't show up in model rates: Gemini runs search far more aggressively. Where o4-mini-deep-research uses 15-20 queries per task at $10/1K, Gemini Deep Research uses 80 at $14/1K. The search line item alone is $1.12 versus roughly $0.15-0.20. That's why o4-mini-deep-research ends up cheaper end-to-end at around $0.41 per task despite matching Gemini on input token price.
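Here's that end-to-end arithmetic side by side. The Gemini inference figure is the ~$0.95 estimate above; o4-mini's ~$0.21 is backed out of its ~$0.41 typical total minus ~$0.20 search (the helper is a sketch for this post, not from either SDK):

```python
# Per-task total: inference estimate plus the search line item.
def end_to_end(inference_usd: float, queries: int, usd_per_1k: float) -> float:
    return inference_usd + queries * usd_per_1k / 1_000

print(f"Gemini Deep Research:  ~${end_to_end(0.95, 80, 14.00):.2f}")  # ~$2.07
print(f"o4-mini-deep-research: ~${end_to_end(0.21, 20, 10.00):.2f}")  # ~$0.41
```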
Coverage is the argument for Gemini. More searches typically means more sources, which matters for genuinely open-ended research tasks. Scoring 85.9% on BrowseComp is evidence the model uses those queries well. Whether that thoroughness justifies the search premium depends on what you're building.
Using the Interactions API
The Interactions API is separate from the standard generate_content endpoint. It's async-only - you submit a task, get an interaction ID, and poll until it completes. Most tasks finish in under 20 minutes. The API cap is 60 minutes.
```python
import time

from google import genai

client = genai.Client(api_key="YOUR_KEY")

# Submit the research task. background=True returns immediately with an
# interaction ID instead of blocking for the full research run.
interaction = client.interactions.create(
    input="What are the latest LLM API pricing changes in Q2 2026?",
    agent="deep-research-preview-04-2026",
    background=True,
)

# Poll until complete. Most tasks finish in under 20 minutes; the API caps
# runs at 60. Sleep between polls rather than spinning on the endpoint
# (production code should also bail out on failure states).
while interaction.state != "COMPLETED":
    time.sleep(30)
    interaction = client.interactions.get(interaction.id)

print(interaction.output.text)
```

Swap in `deep-research-max-preview-04-2026` for the Max version. MCP servers can be passed via the `tools` parameter to point the agent at private data sources, and omitting the Search tool entirely cuts the per-query costs to zero - a sketch of that configuration follows.
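The exact `tools` schema isn't spelled out in the preview docs yet, so treat the payload below as an assumption about the API's shape rather than confirmed surface:

```python
# Illustrative sketch: research over a private corpus via MCP, with no
# Google Search tool attached. The exact shape of the `tools` payload is
# an assumption - check the Interactions API docs for the current schema.
interaction = client.interactions.create(
    input="Summarize open risks across our internal incident postmortems.",
    agent="deep-research-preview-04-2026",
    background=True,
    # Hypothetical MCP server URL, for illustration only.
    tools=[{"mcp_server": {"url": "https://mcp.internal.example/postmortems"}}],
    # No Search tool listed: the $14/1K per-query line item drops to zero.
)
```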
When the search premium pays off
- Tasks where coverage justifies cost. Due diligence, competitive analysis, technical literature review. If a researcher would spend 3 hours on this manually, spending $2-5 on a 15-minute agent run is an easy trade.
- Private corpus research. Disable web search, connect your own documents via MCP. You keep the multi-step reasoning across sources - without the $1.12+ per task in search fees. This is probably the highest-value API use case right now.
- High volume runs favor o4-mini-deep-research. At roughly $0.41/task - one-fifth the cost of Gemini Deep Research standard - OpenAI's cheaper option fits workloads where you need acceptable research quality at scale rather than maximum web coverage.
- Cost predictability is still limited. Query volume isn't developer-controlled in the current preview. Complex tasks can trigger more searches than simple ones, and there's no per-task query cap exposed yet. Budget with some headroom.
The search overhead is the real number to watch
The underlying model costs are modest. Gemini 3.1 Pro at $2.00/1M input is not expensive, and implicit caching keeps inference under a dollar per standard task. Google's $1-3 estimate is accurate.
What the headline estimate doesn't surface is that search queries are the largest line item at current defaults. At 80 queries per task and $14/1K, search costs more than inference on most standard runs. If you're building something that runs hundreds of research tasks per month, that's the cost to optimize - either by switching to private data via MCP, or by comparing per-task total against o4-mini-deep-research before committing to Gemini.
Sources
- Gemini Deep Research API docs - Google AI for Developers
- Google AI Studio pricing - Gemini Deep Research Agent section
- Deep Research Max announcement - Google AI Blog (April 21, 2026)
- Gemini 3.1 Pro benchmark table - Google DeepMind
- BrowseComp benchmark paper - Jason Wei et al.
- Gemini context caching documentation
Ankit Aglawe
April 24, 2026 · 8 min read