Google DeepMind Researchers Release Gemma Scope 2 as a Complete Interpretability Suite for the Gemma 3 Model Family

Google DeepMind researchers present Gemma Scope 2, an open suite of interpretability tools that reveal how Gemma 3 language models process and represent information at every layer, across sizes from 270M to 27B parameters.

Its main goal is simple: give AI safety and compliance teams a practical way to trace model behavior back to internal representations instead of relying solely on input-output analysis. When a Gemma 3 model is jailbroken, hallucinates, or exhibits sycophantic behavior, Gemma Scope 2 lets researchers examine which internal features fired and how those activations flow through the network.

What is Gemma Scope 2?

Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders and related tools trained on the internal activations of the Gemma 3 model family. Sparse autoencoders (SAEs) act as a microscope into the model: they decompose high-dimensional activations into a smaller set of human-interpretable features that correspond to concepts or behaviors.
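To make the "microscope" idea concrete, here is a minimal sketch of the SAE forward pass described above. The dimensions, weights, and function names are hypothetical stand-ins (a real Gemma Scope SAE operates on much wider activations with trained weights); it only illustrates the encode-to-sparse-features, decode-to-reconstruction shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a residual-stream activation of width 16,
# decomposed into an overcomplete dictionary of 64 candidate features.
d_model, d_sae = 16, 64

# Randomly initialized weights stand in for trained SAE weights.
W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU zeroes many features
    x_hat = f @ W_dec + b_dec               # linear decoder reconstructs x
    return f, x_hat

x = rng.normal(size=d_model)                # stand-in model activation
features, reconstruction = sae_forward(x)

# Each nonzero entry of `features` is one candidate interpretable feature
# that fired on this activation.
print(f"active features: {np.count_nonzero(features)} / {d_sae}")
print(f"reconstruction error: {np.linalg.norm(x - reconstruction):.3f}")
```

In a trained SAE a sparsity penalty pushes most feature activations to exactly zero, so an activation is explained by a handful of named features rather than thousands of opaque dimensions.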

Training Gemma Scope 2 required storing approximately 110 petabytes of activation data, and the interpretability models amount to roughly 1 trillion parameters in total.

The suite targets all Gemma 3 sizes (270M, 1B, 4B, 12B and 27B parameters) and covers the full depth of each network. This matters because many safety-relevant behaviors emerge only at larger scales.

What is new compared to the original Gemma Scope?

The first release of Gemma Scope focused on Gemma 2 and already enabled research on model hallucinations, on identifying secrets known to a model, and on training safer models.

Gemma Scope 2 extends that functionality in four main ways:

  1. The tools now cover the entire Gemma 3 family up to 27B parameters, which is needed to study emergent behavior seen only in large models, such as the behavior previously analyzed in the 27B-parameter C2S-Scale model for scientific discovery.
  2. Gemma Scope 2 includes SAEs and transcoders trained on all layers of Gemma 3. Skip transcoders and cross-layer transcoders enable tracing multi-step computations distributed across layers.
  3. The suite uses the Matryoshka training method for SAEs, which yields more useful and stable features and mitigates some of the failure modes reported for earlier Gemma Scope releases.
  4. There are dedicated tools for interpreting the instruction-tuned Gemma 3 chat models, which makes it possible to analyze safety-relevant behaviors such as jailbreaks, refusal mechanisms and chain-of-thought faithfulness.
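Point 2 above can also be sketched in code. A transcoder, unlike an SAE, is trained to imitate a model component (for example an MLP block): it reads the component's input and predicts the component's output through a sparse feature bottleneck, so each step of a computation becomes attributable to named features. Everything below is a hypothetical stand-in with random weights, not the released artifacts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: activation width 16, feature dictionary of 64.
d_model, d_feat = 16, 64

# Random stand-ins for trained transcoder weights. Skip and cross-layer
# variants follow the same shape but read at one layer and predict the
# output of a later layer, which is what lets multi-step computations be
# traced across the network.
W_enc = rng.normal(0.0, 0.1, (d_model, d_feat))
W_dec = rng.normal(0.0, 0.1, (d_feat, d_model))

def transcoder(x_in):
    """Map a component's input to a prediction of its output."""
    f = np.maximum(x_in @ W_enc, 0.0)  # sparse features active on this input
    y_hat = f @ W_dec                  # predicted component output
    return f, y_hat

x_in = rng.normal(size=d_model)        # stand-in for a layer input activation
feats, y_hat = transcoder(x_in)
active = np.flatnonzero(feats)         # the features that explain this step
print(f"{active.size} features active out of {d_feat}")
```

Chaining such predictions layer to layer is what allows a multi-step computation to be read off as a path through interpretable features rather than raw matrix multiplications.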

Key Takeaways

  1. Gemma Scope 2 is an open interpretability suite for all Gemma 3 models, from 270M to 27B parameters, with SAEs and transcoders on all layers of both the pre-trained and the instruction-tuned models.
  2. The suite uses sparse autoencoders as a microscope that decomposes internal activations into a smaller set of interpretable, feature-like concepts, and transcoders that track how these features propagate across layers.
  3. Gemma Scope 2 is explicitly aimed at AI safety work: studying jailbreaks, hallucinations, sycophancy, refusal mechanisms and the mismatch between internal state and verbalized reasoning in Gemma 3.

Check out the paper, technical details and model weights.


Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.
