On‑device AI vs data centres and what leaders should do now

AI

Perplexity

9 January 2026

[Image: a modern data centre with rows of server racks and a glowing question mark, symbolising on-device AI versus data centres.]

On‑device AI could dent mega data centres—here’s how to plan

The AI boom triggered a global race to build vast, power‑hungry data centres. Perplexity’s CEO Aravind Srinivas has thrown a spanner in that narrative: if inference increasingly runs on device, the economics of centralised AI might soften over time. Whether or not you buy the strongest form of the claim, it’s a signal to diversify architecture bets now.

Why the argument is credible

  • Efficiency gains: Smaller, instruction‑tuned models keep getting better, unlocking useful tasks at lower compute budgets.

  • Silicon roadmap: NPUs in laptops and phones accelerate matrix ops locally, shrinking latency and cloud egress.

  • Privacy & sovereignty: Local processing reduces data movement, helping with GDPR and sectoral controls.

  • Cost exposure: Cloud AI spend is volatile; shifting a tranche of workloads to device/edge can stabilise unit economics.

Where on‑device fits (today)

  • Summaries and translations of local documents/email on laptops.

  • Contextual helpers in productivity apps with restricted data scopes.

  • Field work: offline drafting, policy look‑ups, and speech transcription on mobiles.

  • Sensitive notes: client or patient‑side triage where data must not transit external clouds.

Where cloud still wins (for now)

  • Large‑context reasoning over big corpora.

  • Heavy multimodal (high‑res video, complex tools) and agentic orchestration.

  • Team‑wide grounding (RAG) against enterprise knowledge with strong observability.

  • Burst capacity for spikes (earnings days, incidents).

Architecture options: hybrid, not binary

  1. Device‑first, cloud‑assist

    • Run a compact model on device; call a cloud model only for escalations.

    • Cache embeddings locally; sync encrypted summaries when online.

  2. Edge/VPC inference

    • Host models in your VPC or colocation for sensitive prompts; keep observability and policy control.

  3. Cloud with smart client

    • Stay cloud‑centric but offload pre/post‑processing and redaction to device NPUs to cut tokens and risk.
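Pattern 1 above (device-first, cloud-assist) hinges on one explicit decision: when to escalate. A minimal Python sketch, assuming hypothetical `run_local_model`/`run_cloud_model` wrappers and a made-up confidence threshold:

```python
CONFIDENCE_THRESHOLD = 0.7  # hypothetical cutoff for escalating to the cloud

def run_local_model(prompt: str) -> tuple[str, float]:
    """Stand-in for an on-device model call; returns (answer, confidence)."""
    # A real implementation would call a local NPU-backed runtime.
    if len(prompt) < 200:
        return f"[local] summary of: {prompt[:40]}", 0.9
    return "", 0.3  # long/complex prompts: low confidence, escalate

def run_cloud_model(prompt: str) -> str:
    """Stand-in for the cloud escalation path."""
    return f"[cloud] detailed answer for: {prompt[:40]}"

def answer(prompt: str) -> str:
    text, confidence = run_local_model(prompt)
    if confidence >= CONFIDENCE_THRESHOLD:
        return text  # stay on device: cheaper, private, low latency
    return run_cloud_model(prompt)  # escalate only when local confidence is low
```

In production the confidence signal might come from model log-probabilities or a lightweight classifier; the point is that escalation becomes an explicit, measurable decision rather than an accident of routing.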

Decision framework (CFO/CTO‑friendly)

Criterion     | Device-first                     | Edge/VPC           | Cloud-first
------------- | -------------------------------- | ------------------ | ------------------------
Latency       | Best (local)                     | Good (nearby)      | Variable
Unit cost     | Low per task; fixed device CAPEX | Medium             | Pay-as-you-go; can spike
Privacy       | Strong (local data)              | Strong (residency) | Manage via controls
Observability | Harder; client logging           | Strong             | Strong
Model size    | Small/medium                     | Medium             | Any
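One way to operationalise the framework above is a rough weighted score per workload. The 1-to-5 scores and example weights below are illustrative assumptions, not benchmarks:

```python
# Hypothetical 1-5 scores loosely derived from the decision framework;
# tune both scores and weights to your own workloads.
SCORES = {
    "device-first": {"latency": 5, "unit_cost": 4, "privacy": 5, "observability": 2, "model_size": 2},
    "edge_vpc":     {"latency": 4, "unit_cost": 3, "privacy": 5, "observability": 5, "model_size": 3},
    "cloud-first":  {"latency": 3, "unit_cost": 3, "privacy": 3, "observability": 5, "model_size": 5},
}

def rank(weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return architecture options sorted by weighted score, best first."""
    totals = {
        option: sum(weights[criterion] * score for criterion, score in criteria.items())
        for option, criteria in SCORES.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Example: a privacy-heavy, latency-sensitive workload.
weights = {"latency": 0.3, "unit_cost": 0.2, "privacy": 0.3, "observability": 0.1, "model_size": 0.1}
best, _ = rank(weights)[0]
```

A scoring sheet like this will not make the decision for you, but it forces the CFO/CTO conversation onto explicit weights instead of gut feel.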

Governance implications

  • DPIA/records of processing: document local vs remote paths; justify lawful basis.

  • Content controls: exclude customer data from model training; pin versions for audit.

  • Telemetry minimisation: collect just enough client logs for safety/QA; hash or aggregate sensitive fields.

  • Device posture: enforce OS version, disk encryption, secure enclaves, and remote wipe.
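On the telemetry-minimisation point, a salted hash keeps client logs joinable for QA without storing raw values. A sketch using Python's standard `hashlib`, with made-up field names:

```python
import hashlib

SENSITIVE_FIELDS = {"user_email", "client_id", "prompt_text"}  # hypothetical field names

def minimise(record: dict, salt: bytes = b"rotate-me") -> dict:
    """Hash sensitive fields so logs stay useful for QA without exposing raw data."""
    out = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(salt + str(value).encode()).hexdigest()
            out[key] = digest[:16]  # truncated hash: stable join key, not readable text
        else:
            out[key] = value  # non-sensitive metrics pass through unchanged
    return out
```

Rotate the salt on a schedule if you want hashes to be unlinkable across periods; keep it fixed if QA needs longitudinal joins.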

A 90‑day evaluation plan

Weeks 1–2 – Discovery

  • Inventory candidate workloads; tag by sensitivity, latency, context size.

  • Select 3 use cases (e.g., local doc summarisation; mobile transcription; offline policy Q&A).

Weeks 3–6 – Thin slices

  • Ship device‑first prototypes; integrate a cloud escalation path; measure latency, cost per task, override rate.

Weeks 7–12 – Compare & decide

  • A/B device vs cloud for the same task; model TCO over 12 months; set guardrails for productionisation.
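The 12-month TCO comparison in weeks 7–12 can start as simple arithmetic before any modelling tool gets involved. All numbers below are placeholders for illustration:

```python
def cloud_tco(tasks_per_month: int, cost_per_task: float, months: int = 12) -> float:
    """Pure pay-as-you-go: spend scales linearly with task volume."""
    return tasks_per_month * cost_per_task * months

def device_tco(device_capex: float, fleet_size: int,
               tasks_per_month: int, marginal_cost_per_task: float = 0.0,
               months: int = 12) -> float:
    """CAPEX-heavy: fixed fleet cost plus a (usually small) per-task cost."""
    return device_capex * fleet_size + tasks_per_month * marginal_cost_per_task * months

# Illustrative only: 200k tasks/month at $0.004/task vs a 500-device NPU fleet.
cloud = cloud_tco(200_000, 0.004)         # 9_600.0 over 12 months
device = device_tco(150.0, 500, 200_000)  # 75_000.0 (all CAPEX at zero marginal cost)

# Rough break-even horizon: fixed fleet cost / monthly cloud spend.
break_even_months = device / (cloud / 12)  # 93.75 at these made-up rates
```

At these deliberately cheap cloud rates device-first never pays back; with heavier per-task cloud pricing, or devices amortised across many workloads, the picture flips, which is exactly the question the A/B data should settle.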

Risks & realities (a balanced view)

  • Hype risk: Not all workloads fit device constraints; keep cloud capacity for heavy jobs.

  • Ops overhead: Fleet model distribution/updates and NPU fragmentation need tooling.

  • Security trade‑offs: Endpoints are attack surfaces; harden devices and sign model artefacts.

  • Vendor posture: Validate claims; prefer benchmarks, energy profiles, and roadmaps, not slogans.

Bottom line

On‑device AI is rising, and it will likely rebalance where inference happens. Don’t bet the farm on a single architecture: run hybrid, measure ruthlessly, and move workloads to the cheapest trustworthy path that meets governance needs.

Next Steps: Need help building a hybrid AI plan? Generation Digital runs architecture sprints, TCO models, and pilot builds for regulated sectors.

FAQ

Q1. Will data centres really become obsolete?
A. Unlikely in the near term. Expect rebalancing, with more inference on devices/edge and cloud for heavy or shared contexts.

Q2. What should we pilot first?
A. Low‑risk, high‑volume tasks: local doc/email summarisation, transcription, and offline Q&A with cloud escalation.

Q3. How do we keep auditors happy with on‑device AI?
A. Log prompts/results locally with periodic secure sync, pin model versions, and publish a data‑flow map.

Q4. What hardware matters?
A. NPUs, memory bandwidth, and secure enclaves; ensure managed distribution of models and signed updates.

Q5. How do we measure success?
A. Cost per task, latency, override rate, citation coverage (when using RAG), and user satisfaction.

