OpenAI–Cerebras: 750MW of Low-Latency AI Compute by 2028

OpenAI

AI

16 January 2026

OpenAI has struck a multi-year deal with Cerebras to deploy ~750MW of ultra low-latency AI compute through 2028, expanding OpenAI’s capacity for high-speed inference and improving platform scalability and resilience. Reports value the agreement at $10B+, with rollout in staged tranches.

What’s been announced

On 14 January 2026, OpenAI and Cerebras shared that they will add ~750 megawatts of low-latency AI compute to OpenAI’s platform under a multi-year agreement. Capacity comes online in phases through 2028. Multiple outlets report the deal is valued at over $10 billion.

Cerebras will supply wafer-scale systems designed for high-speed inference, complementing OpenAI’s broader, multi-vendor infrastructure strategy and reducing reliance on any single GPU supplier.

Why 750MW matters (without the hype)

“MW” measures power capacity available to run datacentre compute, not model performance directly—but it signals very large-scale infrastructure. Cerebras and press reports frame this as one of the largest low-latency AI inference deployments publicly announced, with an explicit focus on speed and throughput for serving models.
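
For a rough sense of scale, a back-of-envelope calculation is enough. The per-system power draw and overhead factor below are illustrative assumptions, not figures from the announcement; the point is only that a megawatt envelope bounds how much hardware can run, not how fast it serves tokens.

    # Illustrative arithmetic only: what a 750MW envelope could mean in hardware terms.
    # Both constants are assumptions for this example, not announced figures.
    TOTAL_POWER_MW = 750
    ASSUMED_KW_PER_SYSTEM = 25    # hypothetical draw of one rack-scale inference system
    ASSUMED_OVERHEAD = 1.3        # hypothetical allowance for cooling, networking, losses

    usable_mw = TOTAL_POWER_MW / ASSUMED_OVERHEAD
    systems = usable_mw * 1_000 / ASSUMED_KW_PER_SYSTEM
    print(f"~{usable_mw:.0f}MW usable for compute -> roughly {systems:,.0f} systems of this size")
    # Power caps the size of the fleet; tokens per second still depends entirely
    # on the specific hardware and models behind it.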

What users could notice

  • Lower latency, higher throughput: Wafer-scale systems integrate compute and memory to serve tokens faster than typical GPU stacks for certain workloads, which can translate to snappier responses and more concurrent users. (Early vendor claims suggest sizeable speedups for inference workloads; real-world results will vary by model and integration.)

  • Scalability during peaks: Phased capacity to 2028 should improve headroom for launches and peak demand, helping stabilise service quality.

  • Resilience & diversification: A broader compute portfolio reduces single-vendor risk and can improve supply flexibility.

How the tech fits

Cerebras’ Wafer-Scale Engine (WSE) is a single, very large chip that emphasises memory bandwidth and on-chip communication, advantageous for certain inference patterns. OpenAI expects to integrate this capacity into its stack in stages, aligning with model roadmaps and datacentre readiness.

Timelines and scope (at a glance)

  • Announcement: 14 Jan 2026.

  • Total capacity: ~750MW planned.

  • Rollout: Phased, running through 2028.

  • Deal value: widely reported $10B+.

  • Focus: High-speed inference for OpenAI customers.

Practical implications for enterprises

  • Capacity for bigger deployments: More headroom for enterprise roll-outs (e.g., large seat counts, heavy retrieval-augmented use).

  • Performance-sensitive apps: If your use case is latency-critical (assistants, agents, streaming outputs), the added capacity should help maintain responsiveness during demand spikes.

  • Portfolio thinking: Expect hybrid backends (GPUs + wafer-scale + other accelerators) tuned per workload. This is consistent with OpenAI’s diversify-to-scale approach.
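
As a concrete illustration of that portfolio thinking on the client side, the sketch below routes interactive traffic and background jobs to interchangeable backends behind one interface. The class names and routing rule are hypothetical, and current SDKs don't expose hardware choice directly; treat this purely as an architectural pattern.

    from dataclasses import dataclass
    from typing import Protocol

    class ChatBackend(Protocol):
        def complete(self, prompt: str) -> str: ...

    @dataclass
    class LowLatencyBackend:
        endpoint: str
        def complete(self, prompt: str) -> str:
            # Placeholder: call your latency-optimised deployment here.
            return f"[{self.endpoint}] fast reply to: {prompt[:30]}"

    @dataclass
    class BatchBackend:
        endpoint: str
        def complete(self, prompt: str) -> str:
            # Placeholder: call your throughput-optimised deployment here.
            return f"[{self.endpoint}] batched reply to: {prompt[:30]}"

    def pick_backend(interactive: bool) -> ChatBackend:
        # Route per workload: user-facing traffic to the low-latency pool,
        # offline jobs to the cheaper batch pool.
        return LowLatencyBackend("fast-pool") if interactive else BatchBackend("batch-pool")

    print(pick_backend(interactive=True).complete("Summarise this ticket"))

Keeping the routing decision in one place makes it cheap to re-point workloads as new capacity classes come online.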

Note on numbers: Vendor speed claims vary by model and setup. Treat early benchmarks as directional; judge value on your end-to-end latency, throughput, cost per token, and SLA in production.

What to do next

  1. Capacity-ready design: If you’re planning enterprise uptake, design for autoscaling, parallelism, and streaming to take advantage of improved throughput when available.

  2. Benchmark your own path: Measure with your prompts, context sizes and safety settings; track P95 latency, tokens/sec, and error rates over time (a minimal harness is sketched after this list).

  3. Keep options open: Architect clients to support multiple model backends to benefit from OpenAI’s evolving infrastructure mix.
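
For step 2, the minimal harness below shows one way to capture those measurements: per-request latency, P95 latency, and a rough tokens-per-second figure. The call_model function is a stand-in; swap in your real client call, prompts and context sizes before reading anything into the numbers.

    import statistics
    import time

    def call_model(prompt: str) -> str:
        # Placeholder standing in for your real API call.
        time.sleep(0.05)
        return "example response " * 20

    def benchmark(prompts):
        latencies, tokens = [], []
        for p in prompts:
            start = time.perf_counter()
            reply = call_model(p)
            latencies.append(time.perf_counter() - start)
            tokens.append(len(reply.split()))  # rough proxy; use a real tokenizer in practice

        p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th-percentile latency
        throughput = sum(tokens) / sum(latencies)
        print(f"requests={len(prompts)}  p95={p95 * 1000:.0f} ms  ~{throughput:.0f} tokens/sec")

    benchmark(["Summarise this contract clause."] * 50)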

FAQs

What does the OpenAI–Cerebras partnership entail?
A multi-year agreement to deploy ~750MW of Cerebras wafer-scale systems for high-speed inference, integrated into OpenAI’s platform in stages through 2028.

How will this benefit OpenAI users?
Expect faster responses and better scalability during peak demand as additional low-latency capacity comes online. Real-world gains depend on model, context size and workload.

What’s the significance of “750MW”?
It indicates a very large power envelope for datacentre compute—signalling scale—rather than a direct performance metric. It underpins one of the largest publicly announced inference deployments.
