Perplexity Search API: Better Snippets via Relevance + Size

Perplexity

11 March 2026

To improve Perplexity Search API snippet quality, tune two dimensions: relevance (filters that control which sources you retrieve) and size (controls that determine how much content is extracted into the snippet fields). Use `max_tokens_per_page` to limit extraction per result and `max_tokens` to set an overall content budget across results.

Search results are only as useful as the snippets you can actually work with. If the snippet is too broad, you waste tokens and time. If it’s too short, you miss the evidence you need. And if it’s pulled from the wrong sources, your whole workflow starts on shaky ground.

Perplexity’s Search API gives you practical controls to improve snippet quality across two dimensions:

- Relevance: are you extracting from the right sources for the user’s query?
- Size: are you extracting the right amount of content for your use case?

This guide shows how to tune both.

Dimension 1: Relevance — get the right sources first

Before you optimise snippet size, make sure the search space is correct. Snippet quality is often a relevance problem in disguise.

Use regional and language targeting

If you’re serving users in specific markets, align retrieval to their context:

- set regional constraints so results reflect the geography you care about
- use language filtering when multilingual results dilute relevance
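
As a minimal sketch of the two targeting options above, the helper below builds a request body with a region and a language filter. The field names `country` and `search_language_filter`, and the code formats they take, are assumptions about the request schema rather than something this guide confirms, so check the Search API reference before relying on them.

```python
from typing import Optional


def build_targeted_query(query: str,
                         country: Optional[str] = None,
                         languages: Optional[list[str]] = None) -> dict:
    """Build a search request body with optional region/language targeting.

    The field names "country" and "search_language_filter" are assumed,
    not confirmed by this guide.
    """
    body: dict = {"query": query}
    if country:
        # Assumption: region is a two-letter ISO 3166-1 country code.
        body["country"] = country.upper()
    if languages:
        # Assumption: languages are lowercase ISO 639-1 codes.
        body["search_language_filter"] = [code.lower() for code in languages]
    return body


payload = build_targeted_query("VAT rules for small businesses",
                               country="gb", languages=["EN"])
print(payload)
```

Leaving both arguments unset returns a body with only the query, which makes it easy to compare targeted and untargeted results for the same search.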

Control the source set with domain allow/deny lists

For enterprise workflows, you often want fewer sources, not more.

Examples where domain control helps:

- customer support that should prefer official docs and help centres
- regulated industries that should prioritise authoritative domains
- internal tooling where certain publishers are not acceptable sources
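
One way to keep allow and deny lists tidy is to normalise them in one place before they reach the request. The sketch below assumes a `search_domain_filter` field that takes bare domains, with a leading `-` marking a denied domain, and assumes that a request uses one mode at a time. All three points are assumptions to verify against the API reference, and the domains are placeholders.

```python
from typing import Optional
from urllib.parse import urlparse


def normalise_domain(value: str) -> str:
    """Reduce a URL or hostname to a bare lowercase domain."""
    value = value.strip().lower()
    # urlparse only fills netloc when a scheme or "//" prefix is present.
    host = urlparse(value if "://" in value else "//" + value).netloc
    return host[4:] if host.startswith("www.") else host


def build_domain_filter(allow: Optional[list[str]] = None,
                        deny: Optional[list[str]] = None) -> list[str]:
    """Return a de-duplicated filter list (assumed "-" prefix for denied domains)."""
    if allow and deny:
        # Assumption: allow and deny modes are not mixed in one request.
        raise ValueError("use either an allow list or a deny list, not both")
    if allow:
        return sorted({normalise_domain(d) for d in allow})
    return sorted({"-" + normalise_domain(d) for d in (deny or [])})


body = {
    "query": "reset two-factor authentication",
    "search_domain_filter": build_domain_filter(
        allow=["https://www.example.com/help", "docs.example.com"]),
}
print(body["search_domain_filter"])
```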

Match `max_results` to your workflow

More results can improve recall, but they can also increase noise.

- UI snippets: keep results low so each snippet is purposeful
- research or RAG pipelines: use more results, then filter downstream
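
For the "filter downstream" case, a simple post-filter might look like the sketch below. It assumes each result is a dict with `url` and `snippet` keys (an assumed response shape), and the thresholds are illustrative rather than recommended values.

```python
from collections import Counter
from urllib.parse import urlparse


def filter_results(results: list[dict],
                   per_domain: int = 2,
                   min_chars: int = 80) -> list[dict]:
    """Keep ranked order, drop thin snippets, and cap results per domain."""
    seen: Counter = Counter()
    kept = []
    for item in results:
        snippet = item.get("snippet", "")
        domain = urlparse(item.get("url", "")).netloc.lower()
        # Skip snippets too short to carry evidence, and avoid letting
        # a single site dominate the set passed downstream.
        if len(snippet) < min_chars or seen[domain] >= per_domain:
            continue
        seen[domain] += 1
        kept.append(item)
    return kept


results = [
    {"url": "https://docs.example.com/a", "snippet": "x" * 120},
    {"url": "https://docs.example.com/b", "snippet": "x" * 120},
    {"url": "https://docs.example.com/c", "snippet": "x" * 120},
    {"url": "https://blog.example.org/d", "snippet": "too short"},
    {"url": "https://news.example.net/e", "snippet": "x" * 200},
]
kept = filter_results(results)
print([item["url"] for item in kept])
```

Here the third `docs.example.com` page and the short blog snippet are dropped, so three of the five results move on to re-ranking or chunking.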

Dimension 2: Size — tune how much content becomes “snippet”

Perplexity’s Search API supports two main controls that directly affect the amount of extracted content.

Control extraction per result with `max_tokens_per_page`

`max_tokens_per_page` limits how much content Perplexity extracts from each webpage while processing results.

How to use it:

- 256–512 tokens for quick previews and high-throughput workloads
- 2,048–4,096 tokens for deeper analysis where you need more surrounding context

This is the fastest way to stop snippets becoming “mini articles”.

Control total snippet budget with `max_tokens`

`max_tokens` sets the maximum total tokens of webpage content returned across all results. In other words, it controls how much content appears in the snippet fields overall.

How to think about it:

- `max_tokens` is your total content budget across the entire response
- `max_tokens_per_page` is your cap per result

Used together, they prevent one long page from consuming your entire output budget.
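
A little arithmetic shows which cap binds for a given configuration. The sketch below uses only the semantics described above (a per-result cap and a total cap) and computes upper bounds. How the API distributes the total budget across results when it runs out is not covered here, and the helper names are illustrative.

```python
def worst_case_tokens(max_results: int,
                      max_tokens_per_page: int,
                      max_tokens: int) -> int:
    """Upper bound on returned content: whichever cap is hit first."""
    return min(max_tokens, max_results * max_tokens_per_page)


def pages_at_full_cap(max_tokens_per_page: int, max_tokens: int) -> int:
    """How many results could reach the per-page cap within the total budget."""
    return max_tokens // max_tokens_per_page


# 10 results x 1,024 tokens = 10,240, so a 10,000-token budget binds first.
print(worst_case_tokens(10, 1024, 10_000))   # 10000
print(pages_at_full_cap(1024, 10_000))       # 9

# 10 results x 512 tokens = 5,120, so the per-page cap binds first.
print(worst_case_tokens(10, 512, 10_000))    # 5120
```

If the total budget is much larger than `max_results × max_tokens_per_page`, it never takes effect. If it is much smaller, later results may be truncated or omitted.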

Recommended configuration patterns

Here are three starting points you can copy into your own API defaults.

Pattern A: Snippet-first UI (fast, concise)

Best for: search boxes, previews, internal portals.

- `max_results`: 5–10
- `max_tokens_per_page`: 256–512
- `max_tokens`: 5,000–10,000

Pattern B: Evidence-rich (balanced)

Best for: product research, decision memos.

- `max_results`: 10
- `max_tokens_per_page`: 2,048–4,096
- `max_tokens`: 20,000–50,000

Pattern C: Downstream RAG (higher recall)

Best for: pipelines that re-rank, chunk, and embed.

- `max_results`: 15–20
- `max_tokens_per_page`: 1,024–2,048
- `max_tokens`: sized to your downstream constraints
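
The three patterns can be captured as named presets so that product code selects a mode rather than individual numbers. In this sketch the specific values are single picks from within the ranges above, not prescribed settings, and the preset names and `request_body` helper are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchPreset:
    max_results: int
    max_tokens_per_page: int
    max_tokens: Optional[int]  # None: size it from downstream constraints


# Values chosen from within each pattern's range; tune for your workload.
PRESETS = {
    "ui": SearchPreset(max_results=5, max_tokens_per_page=512, max_tokens=5_000),
    "evidence": SearchPreset(max_results=10, max_tokens_per_page=2_048, max_tokens=20_000),
    "rag": SearchPreset(max_results=20, max_tokens_per_page=1_024, max_tokens=None),
}


def request_body(query: str, mode: str) -> dict:
    """Merge a preset into a request body, omitting unset fields."""
    body: dict = {"query": query}
    body.update({k: v for k, v in asdict(PRESETS[mode]).items() if v is not None})
    return body


print(request_body("compare CRM vendors", "evidence"))
print(request_body("compare CRM vendors", "rag"))
```

Keeping the RAG preset's total budget unset forces the caller to decide it from their own context window and chunking limits.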

Practical steps to improve snippet quality

1. Start with relevance: set region/language and domain controls.
2. Set a “safe default” for snippet size: cap per page and total budget.
3. Run A/B tests against real queries:
   - measure time-to-answer, user satisfaction, and downstream model accuracy
4. Create two modes in your product:
   - Preview mode (short snippets)
   - Research mode (longer snippets)
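
The A/B step can be sketched as a small harness that runs the same queries under two configurations and compares a quality score and latency. Everything below is illustrative: `fake_search` stands in for a real API call and counts words rather than tokens, the corpus is invented, and the scoring rule (does the expected phrase appear in any snippet?) is a placeholder for your own metric.

```python
import statistics
import time
from typing import Callable

# Toy corpus and expected evidence, standing in for real queries and labels.
CORPUS = {
    "refund window": "Orders can be returned. " * 3 + "The refund window is 30 days.",
    "support hours": "Support hours are 9 to 5 on weekdays.",
}
EXPECTED = {"refund window": "30 days", "support hours": "9 to 5"}


def fake_search(query: str, config: dict) -> list[dict]:
    """Stub search: truncates the page to the per-page cap (in words)."""
    words = CORPUS[query].split()
    return [{"snippet": " ".join(words[:config["max_tokens_per_page"]])}]


def contains_evidence(query: str, results: list[dict]) -> float:
    """Score 1.0 if the expected phrase appears in any snippet, else 0.0."""
    return 1.0 if any(EXPECTED[query] in r["snippet"] for r in results) else 0.0


def evaluate(config: dict, queries: list[str],
             search: Callable[[str, dict], list[dict]],
             score: Callable[[str, list[dict]], float]) -> dict:
    """Run every query under one config and summarise score and latency."""
    latencies, scores = [], []
    for query in queries:
        start = time.perf_counter()
        results = search(query, config)
        latencies.append(time.perf_counter() - start)
        scores.append(score(query, results))
    return {"mean_score": statistics.mean(scores),
            "median_latency_s": statistics.median(latencies)}


queries = list(CORPUS)
short = evaluate({"max_tokens_per_page": 8}, queries, fake_search, contains_evidence)
long_ = evaluate({"max_tokens_per_page": 32}, queries, fake_search, contains_evidence)
print(short["mean_score"], long_["mean_score"])
```

In this toy setup the short configuration cuts off the refund answer while the longer one keeps it, which is the kind of difference a real evaluation set should surface before you fix your defaults.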

Where Generation Digital can help

Snippet quality is rarely just an API setting — it’s a product decision.

Generation Digital can help you:

- define search modes that match your user journeys
- create safe defaults for extraction budgets
- build evaluation harnesses to measure “quality” on your real queries
- integrate search with your workflow stack

Summary

Perplexity’s Search API snippet quality improves when you tune the right levers: relevance (filters that shape your source set) and size (controls that shape extracted snippet content). Use `max_tokens_per_page` to limit extraction per result, and `max_tokens` to manage the overall snippet budget.

Next steps: If you want help designing search modes, evaluation benchmarks, or governance for production search workflows, speak with Generation Digital: https://www.gend.co/contact

FAQs

Q1: How does the Search API improve relevance?
It lets you shape retrieval using region and language targeting, as well as domain-based controls. Better relevance upstream leads to higher-quality snippets downstream.

Q2: What are the benefits of optimised snippet size?
Smaller snippets improve speed and readability; larger snippets provide more evidence and context. The right size depends on whether you’re building a UI experience, a research workflow, or a RAG pipeline.

Q3: Can I customise benchmarks for snippet quality?
You can’t “upload benchmarks” into the API, but you can build your own evaluation set and tune extraction and relevance settings (results count, filters, token budgets) to meet your specific requirements.

Q4: Which parameters most directly affect snippet length?
`max_tokens` controls the total amount of content returned across all snippets, and `max_tokens_per_page` controls how much is extracted per result.

Q5: What’s a sensible default to start with?
For most applications, keep results to 5–10, cap per-page extraction at 512–1,024 tokens, and set a total budget of 10,000 tokens, then adjust based on your user experience.
