A Layer is a single processing block in a transformer model that updates the residual stream. It typically consists of an attention mechanism and an MLP (feedforward network).
Model parameters are structured in layers. As information flows from the input to the output, it passes through the model one layer at a time. For example, Llama-3.1-8B-Instruct has 32 layers.
Each layer processes the information it receives from the previous layer and passes the result to the next. Think of it like an assembly line: each layer performs a specific transformation on the data before handing it off to the next stage.
In transformer models like those supported by Wisent-Guard, each layer typically contains attention mechanisms (which help the model focus on relevant parts of the input) and feedforward networks (which process and transform the information).
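The attention-plus-feedforward structure can be sketched as a minimal single-head, pre-norm layer in NumPy. This is an illustrative toy, not Wisent-Guard's or Llama's actual implementation (real models use multi-head attention, gated MLPs, and learned normalization parameters); the key point it shows is that a layer *adds* its output to the residual stream rather than replacing it.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, Wq, Wk, Wv, Wo):
    # Single-head attention: every token attends to every token.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ Wo

def mlp(x, W1, W2):
    # Feedforward network: expand, apply a nonlinearity, project back.
    return np.maximum(x @ W1, 0) @ W2

def transformer_layer(x, params):
    # Pre-norm residual updates: the layer adds to the residual stream.
    x = x + self_attention(layer_norm(x), *params["attn"])
    x = x + mlp(layer_norm(x), *params["mlp"])
    return x
```

Because each sub-block only adds a delta to the residual stream, reading activations at any layer boundary gives a meaningful snapshot of the model's intermediate state, which is what Wisent-Guard monitors.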
Early layers: process basic linguistic features like syntax and word relationships.
Middle layers: develop semantic understanding and complex reasoning patterns. These are often best for representation engineering.
Late layers: prepare for output generation and final decision making.
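The early/middle/late split above can be expressed as a simple rule of thumb. The one-third boundaries below are an illustrative assumption, not something Wisent-Guard defines; the actual transition points vary by model and task.

```python
def layer_band(layer: int, num_layers: int) -> str:
    """Classify a layer index into a rough early/middle/late band.

    Assumption: bands are split at thirds of the layer stack.
    """
    if not 0 <= layer < num_layers:
        raise ValueError(f"layer must be in [0, {num_layers - 1}]")
    if layer < num_layers // 3:
        return "early"
    if layer < 2 * num_layers // 3:
        return "middle"
    return "late"

# Under this heuristic, for a 32-layer model such as Llama-3.1-8B-Instruct:
# layers 0-9 fall in the early band, 10-20 in the middle, 21-31 in the late band.
```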
Use a specific layer for monitoring:
python -m wisent_guard tasks mmlu --layer 15 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Monitor multiple specific layers:
python -m wisent_guard tasks hellaswag --layer 10,15,20 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Monitor a range of layers:
python -m wisent_guard tasks truthfulqa --layer 14-16 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Let Wisent-Guard find the optimal layer automatically:
python -m wisent_guard tasks mmlu --layer -1 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
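The four --layer forms above (a single index, a comma-separated list, a range, and -1 for automatic selection) could be parsed with a small helper like the one below. This is a hypothetical sketch of how such a flag might be interpreted, not Wisent-Guard's actual parsing code.

```python
def parse_layer_spec(spec: str):
    """Parse a --layer value into a list of layer indices, or "auto".

    Hypothetical sketch: "-1" means automatic selection, "15" a single
    layer, "10,15,20" a list, and "14-16" an inclusive range.
    """
    spec = spec.strip()
    if spec == "-1":
        return "auto"
    if "," in spec:
        return [int(part) for part in spec.split(",")]
    if "-" in spec:
        start, end = spec.split("-")
        return list(range(int(start), int(end) + 1))
    return [int(spec)]
```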
Different layers exhibit distinct activation patterns. Early layers focus on syntax, while deeper layers capture semantics and reasoning.
Middle layers typically contain the richest representations for most tasks, balancing low-level features and high-level abstractions.
Different model architectures and sizes may have optimal layers at different positions. Experimentation is key to finding the best layers.
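One simple way to run that experiment is to collect activations for two contrastive classes of prompts (for example, truthful vs. untruthful responses) at each layer, score how well each layer separates the classes, and pick the highest-scoring one. The nearest-class-mean score below is an assumed, illustrative criterion, not Wisent-Guard's exact selection logic.

```python
import numpy as np

def layer_separability(pos, neg):
    # Score: distance between class means relative to within-class spread.
    gap = np.linalg.norm(pos.mean(axis=0) - neg.mean(axis=0))
    spread = pos.std() + neg.std() + 1e-8
    return gap / spread

def best_layer(activations):
    # activations: {layer_index: (pos_acts, neg_acts)}, where each array
    # has shape (num_examples, hidden_dim), e.g. gathered via forward hooks.
    scores = {layer: layer_separability(p, n)
              for layer, (p, n) in activations.items()}
    return max(scores, key=scores.get), scores
```

A sweep like this makes the "experimentation is key" advice concrete: run it once per model and task, then pass the winning index to --layer.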