
Sparse Autoencoder: From Superposition to Interpretable Features | by Shuhang Xiang | Feb, 2025

Disentangle features in complex neural networks with superposition

Towards Data Science

Complex neural networks, such as large language models (LLMs), often suffer from interpretability challenges. One of the most important reasons for this difficulty is superposition: the situation where the number of neurons in a layer is smaller than the number of features the network needs to represent. For example, a toy LLM with 2 neurons might have to represent 6 different language features. As a result, we often observe that a single neuron has to activate for multiple unrelated features. For a more detailed explanation of superposition, please refer to my previous post: "Superposition: What Makes it Difficult to Explain Neural Networks".

In this blog post, we take one step further: let us try to disentangle some superposed features. I will introduce a method called Sparse Autoencoder to decompose a complex neural network, especially an LLM, into interpretable features, using a toy example of language tokens.

What is a sparse autoencoder?

By definition, a sparse autoencoder is an autoencoder with sparsity introduced on purpose in the activations of its hidden layer. With a rather simple structure and a light training process, it aims to decompose a complex neural network and reveal its features in a more disentangled and interpretable way.

Let us imagine that you have a trained neural network. The autoencoder is not part of the training process of the model itself; it is instead a post-hoc analysis tool. The original model has its own activations, and these activations are collected afterwards and then used as the input data for the sparse autoencoder.

For example, suppose your original model is a neural network with one hidden layer of 5 neurons, and that you have a training dataset of 5000 samples. You have to collect all the 5-dimensional activation values of the hidden layer for all 5000 training samples; they now serve as the input data for your sparse autoencoder.
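As a rough illustration, here is a minimal sketch of how such activations could be collected in PyTorch with a forward hook. The model, its layer sizes, and the random samples below are placeholder assumptions rather than anything from this post; the only point is that the hidden-layer activations, not the raw inputs, become the autoencoder's dataset.

import torch
import torch.nn as nn

# Hypothetical original model: a hidden layer of 5 neurons (sizes are illustrative)
original_model = nn.Sequential(
    nn.Linear(10, 5),   # hidden layer with 5 neurons
    nn.ReLU(),
    nn.Linear(5, 2),
)

activations = []

def save_activation(module, inputs, output):
    # Keep a detached copy of the hidden-layer activations
    activations.append(output.detach())

# Hook the ReLU so we record the post-activation values of the hidden layer
hook = original_model[1].register_forward_hook(save_activation)

samples = torch.randn(5000, 10)   # stand-in for the 5000 real training samples
with torch.no_grad():
    original_model(samples)
hook.remove()

sae_inputs = torch.cat(activations)   # shape (5000, 5): the sparse autoencoder's input
print(sae_inputs.shape)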

Image by author: an autoencoder to analyze the LLM

The autoencoder then learns a new, sparse representation from these activations. The encoder maps the original MLP activations into a new vector space of higher dimension. Looking back at the previous 5-neuron example, we might map the activations into a vector space with 20 features. Hopefully, we obtain a sparse autoencoder that effectively decomposes the original MLP activations into a representation that is easier to interpret and analyze.

Sparsity is essential for the autoencoder to learn interpretable features. Without the sparsity constraint, the autoencoder would have too much freedom and might learn a dense, entangled representation that is no easier to interpret than the original activations.

Language model

Now let us build our toy model. I beg readers to note that this model is unrealistic and admittedly a bit silly, but it is enough to show how we build a sparse autoencoder and extract some features.

Suppose we have built a language model with one particular hidden layer whose activations are four-dimensional. Let us also suppose that the tokens "cat", "happy cat", "dog", "loyal dog", "not cat", "not dog", "robot", and "AI assistant" appear in the training data and have the following activation values.

import torch

data = torch.tensor([
    # Cat categories
    [0.8, 0.3, 0.1, 0.05],     # "cat"
    [0.82, 0.32, 0.12, 0.06],  # "happy cat" (similar to "cat")

    # Dog categories
    [0.7, 0.2, 0.05, 0.2],     # "dog"
    [0.75, 0.3, 0.1, 0.25],    # "loyal dog" (similar to "dog")

    # "Not animal" categories
    [0.05, 0.9, 0.4, 0.4],     # "not cat"
    [0.15, 0.85, 0.35, 0.5],   # "not dog"

    # Robot and AI assistant (more distinct in 4D space)
    [0.0, 0.7, 0.9, 0.8],      # "robot"
    [0.1, 0.6, 0.85, 0.75]     # "AI assistant"
], dtype=torch.float32)

Construction of the autoencoder

We now build an Autoencoder with the following code:

import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(SparseAutoencoder, self).__init__()
        # Encoder: one linear layer followed by ReLU, mapping the input
        # activations into the (larger) feature space
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU()
        )
        # Decoder: a single linear layer reconstructing the original input
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, input_dim)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

According to the code above, the encoder has one fully connected layer that maps the input to a hidden representation of size hidden_dim, followed by a ReLU activation. The decoder uses a single linear layer to reconstruct the input. Note that the absence of an activation function in the decoder is intentional for our case: the reconstruction has to match real-valued activations that can be negative, and a ReLU would force the outputs to stay non-negative, distorting the reconstruction.
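Before training, we instantiate the autoencoder and choose a few training settings. The snippet below is a minimal setup sketch: the hidden dimension of 10, the learning rate, the number of epochs, and the sparsity weight are illustrative assumptions rather than values prescribed in this post; the only real constraint is that hidden_dim should be larger than the 4-dimensional input so the features have room to disentangle.

# Illustrative hyperparameters (not prescribed by this post)
model = SparseAutoencoder(input_dim=4, hidden_dim=10)
criterion = nn.MSELoss()                                   # reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
num_epochs = 2000
sparsity_weight = 0.05                                     # weight of the L1 penalty

# Quick sanity check of the shapes before training
encoded, decoded = model(data)
print(encoded.shape)  # torch.Size([8, 10]) -- sparse feature representation
print(decoded.shape)  # torch.Size([8, 4])  -- reconstruction of the input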

We train the model using the code below. Here, the loss function has two parts: the reconstruction loss, which measures how accurately the autoencoder reconstructs the input data, and a sparsity loss (with its weight), which encourages sparsity in the encoder's output.

# Training loop
for epoch in range(num_epochs):
    optimizer.zero_grad()

    # Forward pass
    encoded, decoded = model(data)

    # Reconstruction loss
    reconstruction_loss = criterion(decoded, data)

    # Sparsity penalty (L1 regularization on the encoded features)
    sparsity_loss = torch.mean(torch.abs(encoded))

    # Total loss
    loss = reconstruction_loss + sparsity_weight * sparsity_loss

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

We can now have a look at the results. We plot the encoder's output values for each token's activation in the original model. Remember that the input tokens are "cat", "happy cat", "dog", "loyal dog", "not cat", "not dog", "robot", and "AI assistant".

Image by author: features learned by the encoder
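For readers who want to reproduce a plot of this kind, here is a minimal sketch using matplotlib; the heatmap layout is my own choice and need not match the figure above exactly.

import matplotlib.pyplot as plt

tokens = ["cat", "happy cat", "dog", "loyal dog",
          "not cat", "not dog", "robot", "AI assistant"]

with torch.no_grad():
    encoded, _ = model(data)   # encoder outputs after training

plt.figure(figsize=(8, 4))
plt.imshow(encoded.numpy(), cmap="viridis", aspect="auto")
plt.colorbar(label="activation")
plt.yticks(range(len(tokens)), tokens)
plt.xlabel("learned feature index")
plt.title("Encoder activations per token")
plt.tight_layout()
plt.show()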

Even though the original model was designed with the simplest possible architecture and without much consideration, the autoencoder still captures meaningful features of this trivial model. According to the plot above, we can identify at least four features that appear to be learned by the encoder.

Take Feature 1 for consideration. This feature has large activation values for the following tokens: "cat", "happy cat", "dog", and "loyal dog". The result suggests that it is something related to "animals" or "pets". Feature 2 is another interesting example, activated by the two tokens "robot" and "AI assistant". We can guess that this feature has something to do with "artificial agents and robots", showing the model's grasp of technological contexts. Feature 3 has activations on the following four tokens: "not cat", "not dog", "robot", and "AI assistant", and is probably a "not an animal" feature.
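To double-check such interpretations programmatically, one could list, for each learned feature, the tokens whose encoded activation exceeds a small threshold. Below is a minimal sketch reusing the tokens list and the trained model from above; the threshold of 0.1 is an arbitrary illustrative choice.

with torch.no_grad():
    encoded, _ = model(data)

for feature_idx in range(encoded.shape[1]):
    # Tokens whose activation on this feature exceeds the (arbitrary) threshold
    active_tokens = [tokens[i] for i in range(len(tokens))
                     if encoded[i, feature_idx] > 0.1]
    if active_tokens:
        print(f"Feature {feature_idx}: {active_tokens}")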

Admittedly, the original model is not a realistic model trained on real-world text; it was designed so that similar tokens get similar activations in the vector space. The result is nevertheless interesting: the sparse autoencoder has succeeded in surfacing some meaningful, human-interpretable concepts.

The simple takeaway of this blog post: a sparse autoencoder can effectively help extract high-level, interpretable features from complex neural networks such as LLMs.

For readers interested in a real-world application of sparse autoencoders, I recommend this article, in which an autoencoder was trained to interpret a real large language model with 512 neurons. That study provides a concrete application of sparse autoencoders in the context of LLM interpretability.

Finally, I share the Google Colab notebook for the implementation mentioned in this article.
