Perplexity Search API: Better Snippets via Relevance + Size


Perplexity

11 March 2026


To improve Perplexity Search API snippet quality, tune two dimensions: relevance (filters that control which sources you retrieve) and size (controls that determine how much content is extracted into the snippet fields). Use max_tokens_per_page to limit extraction per result and max_tokens to set an overall content budget across results.

Search results are only as useful as the snippets you can actually work with. If the snippet is too broad, you waste tokens and time. If it’s too short, you miss the evidence you need. And if it’s pulled from the wrong sources, your whole workflow starts on shaky ground.

Perplexity’s Search API gives you practical controls to improve snippet quality across two dimensions:

  1. Relevance — are you extracting from the right sources for the user’s query?

  2. Size — are you extracting the right amount of content for your use case?

This guide shows how to tune both.

Dimension 1: Relevance — get the right sources first

Before you optimise snippet size, make sure the search space is correct. Snippet quality is often a relevance problem in disguise.

Use regional and language targeting

If you’re serving users in specific markets, align retrieval to their context:

  • set regional constraints so results reflect the geography you care about

  • use language filtering when multilingual results dilute relevance
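As a rough sketch, regional and language targeting can be expressed as optional fields on the request body. The field names `country` and `search_language_filter` are assumptions that this guide does not confirm, and `build_targeted_payload` is a hypothetical helper; check the current API reference before relying on either.

```python
from typing import Any, Optional


def build_targeted_payload(
    query: str,
    country: Optional[str] = None,
    languages: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a search request body with optional region and language targeting."""
    payload: dict[str, Any] = {"query": query}
    if country:
        # Assumed field: ISO 3166-1 alpha-2 country code, e.g. "GB".
        payload["country"] = country.upper()
    if languages:
        # Assumed field: list of ISO 639-1 language codes, e.g. ["en", "fr"].
        payload["search_language_filter"] = [code.lower() for code in languages]
    return payload


payload = build_targeted_payload(
    "VAT rules for digital services", country="gb", languages=["EN"]
)
print(payload)
```

Leaving both arguments unset produces a body with only the query, so the same helper works for untargeted searches.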

Control the source set with domain allow/deny lists

For enterprise workflows, you often want fewer sources, not more.

Examples where domain control helps:

  • customer support that should prefer official docs and help centres

  • regulated industries that should prioritise authoritative domains

  • internal tooling where certain publishers are not acceptable sources
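One way to keep domain control manageable is to normalise the list in a single place before it reaches the request. The sketch below assumes the API accepts one list of bare domains, that a leading "-" marks an exclusion, and that allow and deny entries should not be mixed in one request. All three are assumptions to verify against the API reference, and `build_domain_filter` is a hypothetical helper.

```python
from typing import Optional


def build_domain_filter(
    allow: Optional[list[str]] = None,
    deny: Optional[list[str]] = None,
) -> list[str]:
    """Return a domain filter list for either an allowlist or a denylist."""
    if allow and deny:
        # Assumption: the API expects one mode per request, not a mix.
        raise ValueError("use either allow or deny, not both")

    def normalise(domain: str) -> str:
        # Strip scheme, path and a leading "www." so entries are bare domains.
        domain = domain.strip().lower()
        domain = domain.split("://", 1)[-1].split("/", 1)[0]
        return domain[4:] if domain.startswith("www.") else domain

    if allow:
        return [normalise(d) for d in allow]
    # Assumption: a leading "-" marks a domain as excluded.
    return ["-" + normalise(d) for d in (deny or [])]


support_filter = build_domain_filter(
    allow=["https://docs.example.com/guide", "www.help.example.com"]
)
print(support_filter)
```

A customer-support deployment might pass the resulting list as the request's domain filter so that only official documentation is searched.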

Match max_results to your workflow

More results can improve recall, but they can also increase noise.

  • UI snippets: keep results low so each snippet is purposeful

  • research or RAG pipelines: use more results, then filter downstream
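"Filter downstream" can be as simple as a term-overlap pass before a proper re-ranker is in place. The sketch below assumes each result is a dictionary with `url` and `snippet` keys, which this guide does not confirm. `filter_results` is a hypothetical helper, and term overlap is only a placeholder for whatever relevance signal your pipeline uses.

```python
def filter_results(results: list[dict], query: str, keep: int = 5) -> list[dict]:
    """Return up to `keep` results, ranked by how many query terms their snippet contains."""
    terms = {t for t in query.lower().split() if len(t) > 2}

    def overlap(result: dict) -> int:
        words = set(result.get("snippet", "").lower().split())
        return len(terms & words)

    # sorted() is stable, so ties keep the original ranking order.
    ranked = sorted(results, key=overlap, reverse=True)
    # Drop results that share no terms with the query at all.
    return [r for r in ranked[:keep] if overlap(r) > 0]


results = [
    {"url": "https://a.example/1", "snippet": "Pricing for the basic plan"},
    {"url": "https://b.example/2", "snippet": "token budget limits explained for search"},
    {"url": "https://c.example/3", "snippet": "search token limits"},
]
kept = filter_results(results, "search token budget", keep=2)
print([r["url"] for r in kept])
```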

Dimension 2: Size — tune how much content becomes “snippet”

Perplexity’s Search API supports two main controls that directly affect the amount of extracted content.

Control extraction per result with max_tokens_per_page

max_tokens_per_page limits how much content Perplexity extracts from each webpage while processing results.

How to use it:

  • 256–512 for quick previews and high-throughput workloads

  • 2,048–4,096 for deeper analysis where you need more surrounding context

This is the fastest way to stop snippets becoming “mini articles”.
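For illustration, here is a minimal sketch of a request that sets the per-page cap, using only the Python standard library. The endpoint URL, the Bearer authorisation header, and the `query` field are assumptions rather than details taken from this guide, and `build_search_request` is a hypothetical helper. The request is prepared but not sent.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current API reference.
SEARCH_URL = "https://api.perplexity.ai/search"


def build_search_request(
    query: str, api_key: str, max_tokens_per_page: int = 512
) -> urllib.request.Request:
    """Prepare (but do not send) a search request with a per-page extraction cap."""
    body = {"query": query, "max_tokens_per_page": max_tokens_per_page}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(
    "what is retrieval-augmented generation",
    api_key="YOUR_API_KEY",
    max_tokens_per_page=256,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), then json.load() the response.
```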

Control total snippet budget with max_tokens

max_tokens sets the maximum total tokens of webpage content returned across all results. In other words: it controls how much content appears in the snippet fields overall.

How to think about it:

  • max_tokens is your total content budget across the entire response

  • max_tokens_per_page is your cap per result

Used together, the two settings prevent one long page from consuming your entire output budget.
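A quick way to sanity-check a configuration is to work out which cap is actually binding. The sketch below assumes the two limits compose as a simple minimum. This guide does not describe how the API allocates the total budget across pages, so treat the result as an upper bound rather than a prediction. Both function names are hypothetical.

```python
def worst_case_tokens(max_results: int, max_tokens_per_page: int, max_tokens: int) -> int:
    """Upper bound on returned content if the two caps compose as a simple minimum."""
    return min(max_tokens, max_results * max_tokens_per_page)


def budget_is_binding(max_results: int, max_tokens_per_page: int, max_tokens: int) -> bool:
    """True when the total budget, not the per-page cap, is the limiting factor."""
    return max_results * max_tokens_per_page > max_tokens


print(worst_case_tokens(10, 512, 10_000))    # 5120: per-page caps dominate
print(worst_case_tokens(10, 4_096, 20_000))  # 20000: total budget dominates
```

If the total budget is binding, later results may receive less content than the per-page cap suggests, which is worth knowing before comparing snippets across results.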

Recommended configuration patterns

Here are three starting points you can copy into your own API defaults.

Pattern A: Snippet-first UI (fast, concise)

Best for: search boxes, previews, internal portals.

  • max_results: 5–10

  • max_tokens_per_page: 256–512

  • max_tokens: 5,000–10,000

Pattern B: Evidence-rich (balanced)

Best for: product research, decision memos.

  • max_results: 10

  • max_tokens_per_page: 2,048–4,096

  • max_tokens: 20,000–50,000

Pattern C: Downstream RAG (higher recall)

Best for: pipelines that re-rank, chunk, and embed.

  • max_results: 15–20

  • max_tokens_per_page: 1,024–2,048

  • max_tokens: sized to your downstream constraints
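The three patterns can be kept as named presets so that callers choose a mode rather than individual numbers. The values below are illustrative picks from within each suggested range. The preset names and `payload_for` are hypothetical, and the `max_tokens` value for Pattern C is a placeholder because the pattern leaves it to your downstream constraints.

```python
from typing import Any

# Illustrative picks from within each pattern's suggested range.
PRESETS: dict[str, dict[str, int]] = {
    "snippet_ui": {"max_results": 5, "max_tokens_per_page": 256, "max_tokens": 5_000},
    "evidence_rich": {"max_results": 10, "max_tokens_per_page": 2_048, "max_tokens": 20_000},
    # Pattern C leaves max_tokens open; 25_000 is a placeholder, not a recommendation.
    "downstream_rag": {"max_results": 20, "max_tokens_per_page": 1_024, "max_tokens": 25_000},
}


def payload_for(query: str, preset: str, **overrides: Any) -> dict[str, Any]:
    """Merge a named preset with per-call overrides into a request body."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    return {"query": query, **PRESETS[preset], **overrides}


body = payload_for("quarterly churn benchmarks", "evidence_rich", max_results=8)
print(body)
```

Per-call overrides are applied last, so a single request can deviate from its preset without changing the defaults.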

Practical steps to improve snippet quality

  1. Start with relevance: set region/language + domain controls.

  2. Set a “safe default” for snippet size: cap per page and total budget.

  3. Run A/B tests against real queries:

    • measure time-to-answer, user satisfaction, and downstream model accuracy

  4. Create two modes in your product:

    • Preview mode (short snippets)

    • Research mode (longer snippets)
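Step 3 can start as a small offline harness that runs the same queries under two configurations and compares a score. In the sketch below, `compare_configs` is a hypothetical helper, and `fake_search` and `snippet_length` are stand-ins so the example runs without network access. A real evaluation would call the API and score against labelled answers or user feedback.

```python
from statistics import mean
from typing import Callable


def compare_configs(
    queries: list[str],
    search_fn: Callable[[str, dict], list[dict]],
    score_fn: Callable[[str, list[dict]], float],
    config_a: dict,
    config_b: dict,
) -> dict[str, float]:
    """Run every query under both configs and report the mean score of each."""
    scores_a = [score_fn(q, search_fn(q, config_a)) for q in queries]
    scores_b = [score_fn(q, search_fn(q, config_b)) for q in queries]
    return {"a": mean(scores_a), "b": mean(scores_b)}


# Stand-ins so the sketch runs offline; swap in a real API call and a real metric.
def fake_search(query: str, config: dict) -> list[dict]:
    # Crude simulation: treat words as tokens and truncate at the per-page cap.
    words = (query + " filler text " * 50).split()
    return [{"snippet": " ".join(words[: config["max_tokens_per_page"]])}]


def snippet_length(query: str, results: list[dict]) -> float:
    return float(sum(len(r["snippet"].split()) for r in results))


preview = {"max_tokens_per_page": 8}
research = {"max_tokens_per_page": 32}
report = compare_configs(
    ["token budgets", "domain filters"], fake_search, snippet_length, preview, research
)
print(report)
```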

Where Generation Digital can help

Snippet quality is rarely just an API setting — it’s a product decision.

Generation Digital can help you:

  • define search modes that match your user journeys

  • create safe defaults for extraction budgets

  • build evaluation harnesses to measure “quality” on your real queries

  • integrate search with your workflow stack

Summary

Perplexity’s Search API snippet quality improves when you tune the right levers: relevance (filters that shape your source set) and size (controls that shape extracted snippet content). Use max_tokens_per_page to limit extraction per result, and max_tokens to manage the overall snippet budget.

Next steps: If you want help designing search modes, evaluation benchmarks, or governance for production search workflows, speak with Generation Digital: https://www.gend.co/contact

FAQs

Q1: How does the Search API improve relevance?
By allowing you to shape retrieval using region and language targeting, as well as domain-based controls. Better relevance upstream leads to higher-quality snippets downstream.

Q2: What are the benefits of optimised snippet size?
Smaller snippets improve speed and readability; larger snippets provide more evidence and context. The right size depends on whether you’re building a UI experience, a research workflow, or a RAG pipeline.

Q3: Can I customise benchmarks for snippet quality?
You can’t “upload benchmarks” into the API, but you can build your own evaluation set and tune extraction and relevance settings (results count, filters, token budgets) to meet your specific requirements.

Q4: Which parameters most directly affect snippet length?
max_tokens controls the total amount of content returned across all snippets, and max_tokens_per_page controls how much is extracted per result.

Q5: What’s a sensible default to start with?
For most applications: keep results to 5–10, cap per-page extraction at 512–1,024 tokens, and set a total budget of 10,000 tokens, then adjust based on your user experience.



Generation Digital

UK Office

Generation Digital Ltd
33 Queen St,
London
EC4R 1AP
United Kingdom

Canada Office

Generation Digital Americas Inc
181 Bay St, Suite 1800
Toronto, ON, M5J 2T9
Canada

US Office

Generation Digital Americas Inc
77 Sands St,
Brooklyn, NY 11201,
United States

EU Office

Generation Digital Software
Elgee Building
Dundalk
A91 X2R3
Ireland

Middle East Office

6994 Alsharq 3890,
An Narjis,
Riyadh 13343,
Saudi Arabia


Company No: 256 9431 77 | Copyright 2026 | Terms and Conditions | Privacy Policy
