Gemini Model Extraction Attacks: How ‘AI Cloning’ Works
Gemini
16 February 2026


Model extraction (distillation) attacks try to “clone” an AI model by sending large volumes of carefully designed prompts and using the outputs to train a copycat system. Google says Gemini has faced 100,000+ prompt campaigns, including attempts to coerce reasoning traces. The biggest risk is IP theft for AI providers — mitigated through abuse detection and access controls. (eweek.com)
As generative AI becomes more powerful, the threat landscape is changing with it. We’re not just seeing prompt injection and misuse — we’re seeing attackers aim directly at the model itself.
Google says it has observed model extraction (distillation) attacks targeting Gemini at scale, with 100,000+ prompts used in some campaigns. According to eWeek’s reporting, Google attributes the activity to actors linked to countries including North Korea, Russia and China, and characterises it as intellectual property theft.
What is a model extraction (distillation) attack?
A model extraction attack is when an adversary uses legitimate access (for example, an API or a chatbot interface) to systematically query a model, collect outputs, and use them to train a cheaper “student” model that imitates the “teacher” model.
Google’s Threat Intelligence Group explains that this commonly uses knowledge distillation, a legitimate machine learning technique — but distilling from Gemini without permission violates Google’s terms of service and amounts to IP theft.
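For intuition, here is a minimal conceptual sketch in Python of the query-collect-train loop described above. The `query_teacher` function, dataset shape, and `fine_tune_student` step are placeholders for illustration, not any provider's real API or a working attack.

```python
# Conceptual sketch of sequence-level distillation: harvest (prompt, output)
# pairs from a "teacher" model, then fine-tune a smaller "student" on them.
# query_teacher() is a hypothetical placeholder, not a real endpoint.

def query_teacher(prompt: str) -> str:
    """Placeholder: send a prompt to the teacher model and return its text output."""
    raise NotImplementedError

def build_distillation_set(prompts: list[str]) -> list[dict]:
    # Step 1: systematically probe the teacher across many tasks and domains.
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

def fine_tune_student(dataset: list[dict]) -> None:
    # Step 2: supervised fine-tuning of a cheaper student model so that it
    # imitates the teacher's outputs on the harvested pairs (placeholder).
    for example in dataset:
        _ = (example["prompt"], example["completion"])  # student.update(...) would go here
```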

Why this matters (and who is actually at risk)
Google makes an important distinction: model extraction attacks don’t typically threaten the confidentiality, availability, or integrity of AI services for average users. Instead, the risk is concentrated on:
Model developers (loss of proprietary capability)
AI service providers (abuse that scales cost and erodes competitive advantage)
Organisations hosting custom models (competitors attempting to replicate domain-specific tuning)
In other words: it’s less like “someone hacked your inbox”, and more like “someone tried to copy your engine by test-driving it 100,000 times”.
What Google says attackers are trying to steal: reasoning traces
One of Google’s more specific examples is “reasoning trace coercion”.
GTIG says that while internal reasoning traces are normally summarised before being shown to users, attackers attempted to coerce Gemini into outputting fuller internal reasoning processes. In one campaign, GTIG reports:
Scale: 100,000+ prompts
Intent: replicate reasoning ability across a wide variety of tasks (including non‑English targets)
Outcome: Google says it detected the attack in real time and mitigated that campaign, protecting internal reasoning traces
This is a useful reminder for security leaders: “reasoning” isn’t just a UX feature — it can be a high-value signal for adversaries trying to distil model behaviour.
Mitigations that actually help
Because model extraction uses legitimate access patterns, classic perimeter controls don’t solve it on their own. The mitigations are operational and product-led:
1) Monitor access patterns for extraction behaviour
Google recommends that organisations offering AI models as a service monitor API access patterns that look like systematic probing or distillation attempts.
Practical signals include high-volume sampling across broad task coverage, unusual multilingual probing, repeated prompt templates, and automated query cadence.
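As an illustration only, here is a minimal Python sketch of turning those signals into a per-client flag. The field names and thresholds are assumptions made for the example, not figures from Google’s reporting.

```python
from dataclasses import dataclass

@dataclass
class ClientStats:
    """Aggregated per-client API usage over a monitoring window (assumed schema)."""
    request_count: int            # total queries in the window
    distinct_task_types: int      # breadth of task coverage observed
    distinct_languages: int       # multilingual probing signal
    template_reuse_ratio: float   # share of queries matching a repeated prompt template
    median_gap_seconds: float     # query cadence; very regular, short gaps suggest automation

def looks_like_extraction(s: ClientStats) -> bool:
    # Illustrative thresholds only; a real service would tune these per product.
    signals = [
        s.request_count > 50_000,        # high-volume sampling
        s.distinct_task_types > 20,      # unusually broad task coverage
        s.distinct_languages > 5,        # multilingual probing
        s.template_reuse_ratio > 0.8,    # repeated prompt templates
        s.median_gap_seconds < 1.0,      # machine-speed cadence
    ]
    return sum(signals) >= 3  # flag when several signals co-occur
```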
2) Rate limiting and friction where it matters
Extraction is an economics game: attackers need volume. Controls that make high-volume probing expensive or noisy can materially reduce the viability of attacks.
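A common way to add that friction is a per-client token bucket. The sketch below is illustrative; the capacity and refill rate are assumed numbers that a real service would tune.

```python
import time

class TokenBucket:
    """Simple per-client rate limiter: each request spends a token, tokens refill over time."""

    def __init__(self, capacity: float = 60, refill_per_second: float = 1.0):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_second = refill_per_second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would return HTTP 429 or add friction such as a CAPTCHA
```

Paired with per-account quotas and billing, controls like this make sustained high-volume probing both more expensive and easier to spot.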
3) Reduce learnable signal
If you expose richer internal traces, consistent formatting, or overly detailed explanations, you may increase the amount of “teacher signal” available to a distillation attacker. GTIG’s case study shows attackers explicitly target this.
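As a hypothetical example, a serving layer can return a condensed summary of any reasoning rather than the raw trace. The `summarise` helper and response fields below are assumptions for illustration, not a description of how Gemini is served.

```python
def summarise(reasoning_trace: str, max_chars: int = 280) -> str:
    """Placeholder summariser: in practice this would be a dedicated model or template."""
    return reasoning_trace[:max_chars]

def build_user_response(answer: str, reasoning_trace: str) -> dict:
    # Expose only a condensed summary of the reasoning, never the raw trace,
    # reducing the "teacher signal" available for unauthorised distillation.
    return {
        "answer": answer,
        "reasoning_summary": summarise(reasoning_trace),
    }
```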
4) Defensive responses to degrade student-model usefulness
Google notes it uses detection and mitigation techniques — including real-time proactive defences — to disrupt extraction attempts and reduce the value of the harvested outputs.
What this means for organisations using AI today
If you’re primarily an AI consumer (using Gemini, ChatGPT, Copilot, etc.), model extraction is not usually the most immediate risk. Your bigger day-to-day concerns are typically:
prompt injection / indirect injection via documents and web content
data governance and user permissions
connector/app access controls
If you publish or host models (including internal LLM apps), then extraction becomes a board-level topic: it is about protecting IP, controlling abuse costs, and ensuring your model can’t be cheaply replicated.
Summary
Google says Gemini has faced 100,000+ prompt campaigns aimed at model extraction/distillation, including attempts to coerce reasoning traces to accelerate “cloning”. The primary risk is to model developers and AI service providers, not typical end users — and the most effective mitigations are monitoring, rate controls, and product choices that limit learnable signal while disrupting high-volume abuse.
Next steps: Generation Digital can help you assess exposure across your AI stack — whether you’re consuming models (prompt injection and access controls) or operating them (abuse monitoring, rate limiting, and governance).
FAQs
Q1: What is a model extraction (distillation) attack?
A technique where an attacker uses legitimate access to systematically probe a model and use the outputs to train a cheaper “student” model that imitates the original. (cloud.google.com)
Q2: Did Google say Gemini was attacked at scale?
Yes. Google’s threat reporting and eWeek’s coverage describe campaigns involving 100,000+ prompts aimed at extracting Gemini capabilities. (eweek.com)
Q3: Are these attacks dangerous for normal Gemini users?
Google says model extraction attacks typically don’t threaten the confidentiality, availability, or integrity of AI services for average users; the risk is concentrated on model developers and service providers. (cloud.google.com)
Q4: What is “reasoning trace coercion”?
GTIG describes campaigns that attempt to force Gemini to output fuller internal reasoning traces, which could provide extra signal for distillation. Google says it detected and mitigated at least one such campaign in real time. (cloud.google.com)
