Voxtral Transcribe 2: fast, accurate speech‑to‑text for 2026

Mistral

5 Feb 2026

Voxtral Transcribe 2 is Mistral's latest speech-to-text release, combining a batch model (Mini Transcribe V2) with a streaming model (Realtime). It adds streaming latency configurable to under 200 ms, support for 13 languages, diarisation, context biasing, and word-level timestamps, with pricing from $0.003 per minute and open weights for Realtime under Apache 2.0.

Why Voxtral matters now

Real-time voice is surging, and teams need transcription that is fast, multilingual and private by design. Voxtral Transcribe 2 offers streaming with under 200 ms of delay, competitive accuracy across 13 languages, and flexible deployment, including open weights for edge use.

What’s new in Voxtral Transcribe 2

  • Two models, one release: Mini Transcribe V2 (batch) + Realtime (streaming).

  • Latency: Realtime is configurable down to under 200 ms; at a delay of ~2.4 s its accuracy matches Mini V2, which suits subtitling.

  • Languages: 13 supported (EN, ZH, HI, ES, AR, FR, PT, RU, DE, JA, KO, IT, NL).

  • Open weights: Realtime under Apache 2.0 for edge/private deployments.

  • Price‑performance: Mini V2 at ~$0.003/min aims for the lowest WER at the lowest price point; Realtime at ~$0.006/min.
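The per-minute prices above translate into a monthly budget with simple arithmetic. The sketch below is illustrative only: the `PRICE_PER_MIN` table and the `estimate_cost` helper are names made up for this example, and the rates are the indicative figures quoted in this article, not a live price list.

```python
# Indicative per-minute prices quoted in this article (USD).
# Assumption: flat per-minute billing with no volume tiers; check Mistral for current rates.
PRICE_PER_MIN = {"mini_v2": 0.003, "realtime": 0.006}


def estimate_cost(model: str, audio_minutes: float) -> float:
    """Return the estimated USD cost of transcribing `audio_minutes` of audio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return round(PRICE_PER_MIN[model] * audio_minutes, 2)


if __name__ == "__main__":
    monthly_minutes = 1_000 * 60  # 1,000 hours of audio per month
    for model in PRICE_PER_MIN:
        print(f"{model}: ${estimate_cost(model, monthly_minutes):,.2f}")
```

At these rates, 1,000 hours of audio comes to roughly $180 with Mini V2 and $360 with Realtime.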

Key capabilities for enterprises

  • Speaker diarisation: Who said what and when, with labelled segments; handles most scenarios (note: overlapping speech is transcribed as a single speaker).

  • Context biasing: Up to 100 terms to nudge spellings for brands, jargon and names (optimised for English).

  • Word‑level timestamps: Accurate alignment for subtitles, audit trails and search.

  • Noise robustness & long files: Works in tough acoustics; supports recordings up to 3 hours.

  • Security & compliance: Supports GDPR/HIPAA‑compliant deployments; run on‑prem or private cloud.
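Word-level timestamps are what make subtitle alignment straightforward. The sketch below groups timed words into SRT cues. It assumes each word arrives as a dict with `word`, `start` and `end` keys (times in seconds); that shape is a guess for illustration, not the documented Voxtral response format, and `srt_time` and `words_to_srt` are hypothetical helpers.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7, max_duration: float = 3.0) -> str:
    """Group timed words into numbered SRT cues.

    A new cue starts when the current one already holds `max_words` words,
    or when adding the next word would stretch it past `max_duration` seconds.
    """
    cues: List[List[Dict]] = []
    current: List[Dict] = []
    for w in words:
        too_many = len(current) >= max_words
        too_long = bool(current) and (w["end"] - current[0]["start"] > max_duration)
        if current and (too_many or too_long):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w["word"] for w in cue)
        blocks.append(f"{i}\n{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
    ]
    print(words_to_srt(sample))
```

The cue limits (`max_words`, `max_duration`) are arbitrary defaults; subtitle style guides usually set their own line-length and reading-speed rules.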

How Voxtral compares

Mistral positions Mini V2 as best-in-class on price-performance with low word error rate (WER), and Realtime as delivering near-offline accuracy at live latencies. According to Mistral's own benchmarks, the models outperform GPT-4o mini Transcribe (OpenAI), Gemini 2.5 Flash (Google), Assembly Universal (AssemblyAI), and Deepgram Nova (Deepgram), and process audio ~3× faster than ElevenLabs Scribe v2 at roughly one-fifth the cost. These are vendor-reported figures, so always confirm with your own audio before switching.
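To check vendor claims against your own recordings, you need a reference transcript and a WER calculation. The sketch below computes WER as word-level edit distance (substitutions, insertions and deletions) divided by the number of reference words. It is a minimal version: the `wer` function only lowercases and splits on whitespace, whereas published benchmarks typically also normalise punctuation and numerals, so its numbers will not be directly comparable to Mistral's.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "the cat sat on the mat"
    output = "the cat sat on mat"
    print(f"WER: {wer(truth, output):.3f}")
```

Run the same reference set through each candidate service and compare the scores side by side, ideally on audio that reflects your real accents, noise and vocabulary.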

Practical uses

  • Meeting intelligence: Multilingual notes with diarisation for clean attribution.

  • Voice agents: Realtime STT (<200ms) for natural turn‑taking with your LLM + TTS pipeline.

  • Contact centres: Live guidance, CRM autofill, and sentiment while calls run.

  • Broadcast & media: Low‑latency live subtitles; resilient to names and jargon using context biasing.

  • Compliance: Timestamps and diarisation to support audits.
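For meeting notes and audit trails, diarised output usually needs to be folded into readable speaker turns. The sketch below merges consecutive segments from the same speaker and prints them with a start time. The segment shape (dicts with `speaker`, `start`, `end` and `text`) is an assumption made for illustration rather than the documented response format, and `merge_turns` and `format_notes` are hypothetical helpers.

```python
from typing import Dict, List


def merge_turns(segments: List[Dict]) -> List[Dict]:
    """Merge consecutive segments that share a speaker label into one turn."""
    turns: List[Dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append(dict(seg))  # copy so the input is not mutated
    return turns


def format_notes(turns: List[Dict]) -> str:
    """Render turns as '[MM:SS] speaker: text' lines."""
    lines = []
    for t in turns:
        minutes, seconds = divmod(int(t["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {t['speaker']}: {t['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    segments = [
        {"speaker": "speaker_1", "start": 0.0, "end": 2.1, "text": "Shall we start?"},
        {"speaker": "speaker_1", "start": 2.3, "end": 4.0, "text": "First item is budget."},
        {"speaker": "speaker_2", "start": 65.2, "end": 67.0, "text": "I have the figures."},
    ]
    print(format_notes(merge_turns(segments)))
```

Because overlapping speech is transcribed as a single speaker, turns produced this way may attribute cross-talk to whoever the model picked; keep the raw segments if attribution matters for compliance.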

Try it now

You can test Voxtral Transcribe 2 immediately in Mistral Studio's audio playground, where you can upload up to 10 files, toggle diarisation, set timestamp granularity, and add bias terms. You can also integrate via the API. Mini V2 is listed at $0.003/min and Realtime at $0.006/min; the Realtime weights are on Hugging Face under Apache 2.0.

Summary

If you need fast, accurate, and controllable STT with enterprise features — and want the option to run privately — Voxtral Transcribe 2 is compelling. Start in the playground, benchmark with your own audio, then choose Mini V2 for batches or Realtime for live use.

FAQ

Is Voxtral Realtime really sub‑200ms?
Yes — the streaming architecture transcribes as audio arrives, with delay configurable down to sub‑200ms.

Which languages does it support?
Thirteen: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, Dutch.

Does it do diarisation and timestamps?
Yes — diarisation with labels and start/end times, plus word‑level timestamps for alignment.

Can I deploy it on‑prem or edge?
Yes. Realtime ships with open weights (Apache 2.0), and both models can run on-prem or in a private cloud to support GDPR/HIPAA-compliant deployments.

What does it cost?
Indicative pricing: Mini V2 ~$0.003/min; Realtime ~$0.006/min (check Mistral for latest).

Generation Digital

UK office
33 Queen St,
London
EC4R 1AP
United Kingdom

Canada office
1 University Ave,
Toronto,
ON M5J 1T1,
Canada

NAMER office
77 Sands St,
Brooklyn,
NY 11201,
United States

EMEA office
Charlemont Street, Saint Kevin's, Dublin,
D02 VN88,
Ireland

Middle East office
6994 Alsharq 3890,
An Narjis,
Riyadh 13343,
Saudi Arabia

Company number: 256 9431 77 | Copyright 2026 | Terms and Conditions | Privacy Policy
