Introducing Gemma 4 12B

Today, we're introducing the Gemma 4 12B, our latest model designed to bring multimodal agentic intelligence directly to laptops. Bridging the gap between our edge-ready E4B and our more advanced 26B Mixture of Experts (MoE), the Gemma 4 12B packs powerful capabilities into a smaller memory footprint. It's also our first mid-size model to feature native audio input.
Thanks to the developer community, Gemma 4 models have now surpassed 150 million downloads. Developers have built everything from wearable robotic arms for physical assistance to enterprise-grade AI security. We're excited to see what you build with this latest addition.
Here's an overview of what makes the Gemma 4 12B unique:
- Novel integrated architecture: There are no separate multimodal encoders. Vision and audio inputs flow directly into the LLM backbone.
- Advanced reasoning: Benchmark performance close to that of our 26B model, enabling powerful multi-step reasoning and agentic workflows.
- Laptop-ready: It's small enough to run locally with 16GB of VRAM or unified memory.
- Open and accessible: Released under the Apache 2.0 license with support for the entire developer ecosystem.
- Drafter-ready: Gemma 4 12B comes equipped with Multi-Token Prediction (MTP) drafters to reduce latency.
Together, these features bring advanced multimodal capabilities to everyday hardware without sacrificing speed or reasoning quality. Now let's take a closer look at how the Gemma 4 12B achieves this.
Best-in-class agentic performance
The Gemma 4 12B delivers performance close to our flagship 26B MoE model on standard benchmarks, but with less than half the total memory footprint. Small enough to run locally on consumer laptops with 16GB of RAM, it brings powerful multimodal experiences and capabilities right to your device.
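
To make the memory comparison concrete, here is a rough back-of-the-envelope sketch rather than an official sizing guide. It assumes the nominal parameter counts implied by the model names (12 billion and 26 billion), counts weights only, and ignores the KV cache, activations, and runtime overhead. The precision options shown are illustrative; this post does not state which precision the 16GB figure assumes.

```python
# Weights-only memory estimate. All figures below are assumptions for
# illustration: nominal parameter counts taken from the model names, and
# no allowance for KV cache, activations, or framework overhead.

PARAMS_12B = 12_000_000_000  # assumed nominal size of the 12B model
PARAMS_26B = 26_000_000_000  # assumed nominal total size of the 26B MoE


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Return the weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Compare both models at a few common storage precisions.
    for bits in (16, 8, 4):
        small = weight_memory_gb(PARAMS_12B, bits)
        large = weight_memory_gb(PARAMS_26B, bits)
        print(f"{bits:>2}-bit weights: 12B ~ {small:.1f} GB, 26B ~ {large:.1f} GB")

    # At equal precision the ratio depends only on the parameter counts.
    ratio = PARAMS_12B / PARAMS_26B
    print(f"12B / 26B weight ratio: {ratio:.2f}")
```

Under these assumptions the 12B weights come to a bit under half the size of the 26B weights at any shared precision, which is consistent with the "less than half" claim. Actual requirements depend on the quantization, context length, and runtime used.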



