Perplexity Search API: Better Snippets via Relevance + Size
Perplexity
Mar 11, 2026

To improve Perplexity Search API snippet quality, tune two dimensions: relevance (filters that control which sources you retrieve) and size (controls that determine how much content is extracted into the snippet fields). Use max_tokens_per_page to limit extraction per result and max_tokens to set an overall content budget across results.
Search results are only as useful as the snippets you can actually work with. If the snippet is too broad, you waste tokens and time. If it’s too short, you miss the evidence you need. And if it’s pulled from the wrong sources, your whole workflow starts on shaky ground.
Perplexity’s Search API gives you practical controls to improve snippet quality across two dimensions:
Relevance — are you extracting from the right sources for the user’s query?
Size — are you extracting the right amount of content for your use case?
This guide shows how to tune both.
Dimension 1: Relevance — get the right sources first
Before you optimise snippet size, make sure the search space is correct. Snippet quality is often a relevance problem in disguise.
Use regional and language targeting
If you’re serving users in specific markets, align retrieval to their context:
set regional constraints so results reflect the geography you care about
use language filtering when multilingual results dilute relevance
Control the source set with domain allow/deny lists
For enterprise workflows, you often want fewer sources, not more.
Examples where domain control helps:
customer support that should prefer official docs and help centres
regulated industries that should prioritise authoritative domains
internal tooling where certain publishers are not acceptable sources
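As a sketch, a request that combines these relevance controls might be assembled like this. The parameter names here (search_domain_filter, country, search_language) are illustrative assumptions based on common search-API conventions, not confirmed fields; check the current Perplexity API reference before relying on them.

```python
# Sketch of a Search API request payload with relevance controls.
# Parameter names are assumptions, not confirmed Perplexity fields.

def build_relevance_payload(query, allow_domains=None, deny_domains=None,
                            country=None, language=None):
    """Build a request payload that narrows the source set."""
    payload = {"query": query}
    domain_filter = []
    if allow_domains:
        domain_filter.extend(allow_domains)                   # allow-list entries
    if deny_domains:
        domain_filter.extend(f"-{d}" for d in deny_domains)   # deny with '-' prefix
    if domain_filter:
        payload["search_domain_filter"] = domain_filter
    if country:
        payload["country"] = country
    if language:
        payload["search_language"] = language
    return payload

# A support workflow that prefers official docs and excludes forums:
payload = build_relevance_payload(
    "reset two-factor authentication",
    allow_domains=["support.example.com", "docs.example.com"],
    deny_domains=["forum.example.com"],
    country="GB",
)
```

The deny-with-prefix convention is one common way APIs express a combined allow/deny list; verify the exact syntax in the official documentation.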
Match max_results to your workflow
More results can improve recall, but they can also increase noise.
UI snippets: keep results low so each snippet is purposeful
Research or RAG pipelines: use more results, then filter downstream
Dimension 2: Size — tune how much content becomes “snippet”
Perplexity’s Search API supports two main controls that directly affect the amount of extracted content.
Control extraction per result with max_tokens_per_page
max_tokens_per_page limits how much content Perplexity extracts from each webpage while processing results.
How to use it:
256–512 for quick previews and high-throughput workloads
2,048–4,096 for deeper analysis where you need more surrounding context
This is the fastest way to stop snippets becoming “mini articles”.
Control total snippet budget with max_tokens
max_tokens sets the maximum total tokens of webpage content returned across all results. In other words: it controls how much content appears in the snippet fields overall.
How to think about it:
max_tokens is your total content budget across the entire response
max_tokens_per_page is your cap per result
Used together, you can prevent one long page from consuming your entire output budget.
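The interaction between the two caps can be illustrated with a small simulation. This is not Perplexity's actual extraction algorithm, just a model of why both caps matter: the per-page cap is applied first, then the remaining overall budget.

```python
def simulate_extraction(page_token_counts, max_tokens_per_page, max_tokens):
    """Apply the per-page cap first, then the overall budget.

    Returns the tokens extracted from each page. Illustrative model only,
    not Perplexity's real extraction logic.
    """
    extracted = []
    budget = max_tokens
    for count in page_token_counts:
        take = min(count, max_tokens_per_page, budget)
        extracted.append(take)
        budget -= take
    return extracted

# One very long page (20,000 tokens) no longer swallows the budget:
print(simulate_extraction([20_000, 3_000, 3_000],
                          max_tokens_per_page=2_048,
                          max_tokens=5_000))
# -> [2048, 2048, 904]
```

Without the per-page cap, the first page would have consumed all 5,000 tokens and the other two results would return nothing.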
Recommended configuration patterns
Here are three starting points you can copy into your own API defaults.
Pattern A: Snippet-first UI (fast, concise)
Best for: search boxes, previews, internal portals.
max_results: 5–10
max_tokens_per_page: 256–512
max_tokens: 5,000–10,000
Pattern B: Evidence-rich (balanced)
Best for: product research, decision memos.
max_results: 10
max_tokens_per_page: 2,048–4,096
max_tokens: 20,000–50,000
Pattern C: Downstream RAG (higher recall)
Best for: pipelines that re-rank, chunk, and embed.
max_results: 15–20
max_tokens_per_page: 1,024–2,048
max_tokens: sized to your downstream constraints
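The three patterns can be kept as named presets in your codebase, so call sites pick a profile rather than raw numbers. The values below are one point from each range in this guide, and the parameter names remain assumptions to verify against the API reference.

```python
# Named presets matching Patterns A-C. Values are one point from each
# recommended range; parameter names are assumptions to verify against
# the Perplexity Search API reference.
SEARCH_PRESETS = {
    "snippet_ui":   {"max_results": 8,  "max_tokens_per_page": 512,
                     "max_tokens": 8_000},
    "evidence":     {"max_results": 10, "max_tokens_per_page": 4_096,
                     "max_tokens": 40_000},
    "rag_pipeline": {"max_results": 20, "max_tokens_per_page": 2_048,
                     "max_tokens": 40_000},  # size to downstream limits
}

def search_params(preset, query):
    """Merge a preset into a request payload."""
    return {"query": query, **SEARCH_PRESETS[preset]}
```

Centralising the presets also gives you one place to adjust defaults after A/B testing.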
Practical steps to improve snippet quality
Start with relevance: set region/language + domain controls.
Set a “safe default” for snippet size: cap per page and total budget.
Run A/B tests against real queries:
measure time-to-answer, user satisfaction, and downstream model accuracy
Create two modes in your product:
Preview mode (short snippets)
Research mode (longer snippets)
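The two-mode idea in step 4 can be wired as a thin wrapper around a single call site, so product code only ever chooses a mode. This is a sketch: run_search stands in for your actual API client, and the parameter names are assumptions.

```python
# Two product modes built on the same search call (sketch).
# `run_search` is a hypothetical placeholder for your API client.

MODES = {
    "preview":  {"max_results": 5,  "max_tokens_per_page": 256,
                 "max_tokens": 5_000},
    "research": {"max_results": 10, "max_tokens_per_page": 2_048,
                 "max_tokens": 30_000},
}

def search(query, mode="preview", run_search=None):
    """Run a search in the given mode, or return the payload on a dry run."""
    params = {"query": query, **MODES[mode]}
    if run_search is None:
        return params          # dry run: inspect the payload without calling out
    return run_search(**params)
```

Keeping the mode table separate from the client makes it easy to A/B-test different budgets per mode without touching call sites.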
Where Generation Digital can help
Snippet quality is rarely just an API setting — it’s a product decision.
Generation Digital can help you:
define search modes that match your user journeys
create safe defaults for extraction budgets
build evaluation harnesses to measure “quality” on your real queries
integrate search with your workflow stack
Summary
Perplexity’s Search API snippet quality improves when you tune the right levers: relevance (filters that shape your source set) and size (controls that shape extracted snippet content). Use max_tokens_per_page to limit extraction per result, and max_tokens to manage the overall snippet budget.
Next steps: If you want help designing search modes, evaluation benchmarks, or governance for production search workflows, speak with Generation Digital: https://www.gend.co/contact
FAQs
Q1: How does the Search API improve relevance?
By allowing you to shape retrieval using region and language targeting, as well as domain-based controls. Better relevance upstream leads to higher-quality snippets downstream.
Q2: What are the benefits of optimised snippet size?
Smaller snippets improve speed and readability; larger snippets provide more evidence and context. The right size depends on whether you’re building a UI experience, a research workflow, or a RAG pipeline.
Q3: Can I customise benchmarks for snippet quality?
You can’t “upload benchmarks” into the API, but you can build your own evaluation set and tune extraction and relevance settings (results count, filters, token budgets) to meet your specific requirements.
Q4: Which parameters most directly affect snippet length?
max_tokens controls the total amount of content returned across all snippets, and max_tokens_per_page controls how much is extracted per result.
Q5: What’s a sensible default to start with?
For most applications: keep results to 5–10, cap per-page extraction to 512–1,024, and set a total budget of 10,000 tokens—then adjust based on your user experience.
Generation
Digital

UK Office
Generation Digital Ltd
33 Queen St,
London
EC4R 1AP
United Kingdom
Canada Office
Generation Digital Americas Inc
181 Bay St., Suite 1800
Toronto, ON, M5J 2T9
Canada
USA Office
Generation Digital Americas Inc
77 Sands St,
Brooklyn, NY 11201,
United States
EU Office
Generation Digital Software
Elgee Building
Dundalk
A91 X2R3
Ireland
Middle East Office
6994 Alsharq 3890,
An Narjis,
Riyadh 13343,
Saudi Arabia
Company No: 256 9431 77 | Copyright 2026 | Terms and Conditions | Privacy Policy