Google's latest DiffusionGemma open AI model comes with a 4x speed boost

June 10, 2026 • Technology

Summary

Google DeepMind released DiffusionGemma, a new AI model that generates text faster by creating many words at once instead of one by one. It runs well on local hardware such as gaming GPUs and can produce about four times as many tokens per second as similarly sized autoregressive Gemma models.
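To make the difference between one-at-a-time and parallel generation concrete, here is a toy sketch in Python. It is not DiffusionGemma's actual algorithm: the function names, the step count, and the use of the known answer as a stand-in for the model's predictions are all assumptions made for illustration. It only shows why refining a whole block in a few passes needs fewer model calls than emitting tokens one by one.

```python
import random


def autoregressive_decode(target):
    """Toy autoregressive loop: one (simulated) model call per token."""
    output, calls = [], 0
    for token in target:
        output.append(token)  # stand-in for predicting the next token
        calls += 1
    return output, calls


def parallel_refine_decode(target, steps=4, seed=0):
    """Toy diffusion-style loop: start from noise, refine the block in a few passes.

    Assumption: `target` plays the role of the model's predictions, so each
    pass simply corrects an even share of the still-noisy positions.
    """
    rng = random.Random(seed)
    vocabulary = sorted(set(target))
    # Start from a block of random ("noisy") tokens.
    block = [rng.choice(vocabulary) for _ in target]
    noisy_positions = list(range(len(target)))
    rng.shuffle(noisy_positions)
    calls = 0
    for step in range(steps):
        remaining_steps = steps - step
        # Ceiling division: spread the remaining positions over the remaining passes.
        share = -(-len(noisy_positions) // remaining_steps)
        for _ in range(share):
            position = noisy_positions.pop()
            block[position] = target[position]  # one simulated refinement
        calls += 1  # the whole block is updated in a single pass
    return block, calls


tokens = "the quick brown fox jumps over the lazy dog".split()
ar_output, ar_calls = autoregressive_decode(tokens)
par_output, par_calls = parallel_refine_decode(tokens, steps=4)
print(f"autoregressive passes: {ar_calls}, parallel refinement passes: {par_calls}")
```

In this toy, nine tokens take nine passes autoregressively but only four passes with block refinement. The real speedup depends on how expensive each pass is and how many refinement steps the model needs, neither of which this summary specifies.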

Key Facts

DiffusionGemma generates text in parallel, producing whole blocks of text at once.
It is part of Google’s Gemma 4 open AI model family but works differently from previous versions.
The model uses a “diffusion” process similar to image generation, starting with noisy tokens and refining them over several steps.
DiffusionGemma has 26 billion parameters but activates only 3.8 billion during inference, and it fits on high-end GPUs with 18GB of memory.
On an Nvidia RTX 5090 GPU, it produces about 700 tokens per second; on an Nvidia H100 AI accelerator, it can exceed 1,000 tokens per second.
This speed is about four times faster than similarly sized autoregressive (one-token-at-a-time) Gemma models.
The model is well suited to tasks in which tokens depend heavily on one another, such as solving Sudoku, because it refines large groups of tokens together over several steps.
DiffusionGemma is experimental and is available as open source under the Apache 2.0 license.
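The figures above can be turned into a quick back-of-the-envelope calculation. The Python sketch below uses only the numbers quoted in this summary. The helper names are hypothetical, and the implied baseline assumes the "four times" comparison applies to the RTX 5090 figure, which the summary does not state explicitly.

```python
def baseline_tokens_per_second(diffusion_tps, speedup):
    """Throughput of the slower model implied by a given speedup factor."""
    return diffusion_tps / speedup


def seconds_to_generate(num_tokens, tokens_per_second):
    """Time needed to produce num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


def active_fraction(active_params, total_params):
    """Share of the parameters that is active during inference."""
    return active_params / total_params


RTX_5090_TPS = 700      # tokens per second quoted for the RTX 5090
SPEEDUP = 4             # "about four times faster"
TOTAL_PARAMS = 26e9     # 26 billion parameters
ACTIVE_PARAMS = 3.8e9   # 3.8 billion active during inference

# Assumption: the 4x comparison is made on the same hardware.
baseline_tps = baseline_tokens_per_second(RTX_5090_TPS, SPEEDUP)
fast_seconds = seconds_to_generate(2000, RTX_5090_TPS)
slow_seconds = seconds_to_generate(2000, baseline_tps)
share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)

print(f"implied autoregressive baseline: {baseline_tps:.0f} tokens/s")
print(f"2,000 tokens: {fast_seconds:.1f} s vs {slow_seconds:.1f} s")
print(f"active parameters: {share:.1%} of the total")
```

Under these assumptions, a 2,000-token response would take roughly 3 seconds instead of roughly 11, and about 15 percent of the parameters are active during inference.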

This is a fact-based summary from The Actual News. Original source: Ars Technica