A Layer is a single processing block in a transformer model that updates the residual stream. It typically consists of an attention mechanism and an MLP (feedforward network).
Model parameters are structured in layers. As information flows from the input to the output, it passes through the model one layer at a time. For example, Llama-3.1-8B-Instruct has 32 layers.
Each layer processes the information it receives from the previous layer and passes the result to the next. Think of it like an assembly line: each layer performs a specific transformation on the data before handing it off to the next stage.
In transformer models like those supported by Wisent-Guard, each layer typically contains attention mechanisms (which help the model focus on relevant parts of the input) and feedforward networks (which process and transform the information).
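The attention-plus-feedforward structure can be sketched as a minimal single-head, pre-norm layer in NumPy. This is an illustrative toy, not Wisent-Guard's or Llama's actual implementation (real models use multi-head attention, gated MLPs, and learned normalization parameters); the key point it shows is that a layer *adds* its output to the residual stream rather than replacing it.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, Wq, Wk, Wv, Wo):
    # Single-head attention: every token attends to every token.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ Wo

def mlp(x, W1, W2):
    # Feedforward network: expand, apply a nonlinearity, project back.
    return np.maximum(x @ W1, 0) @ W2

def transformer_layer(x, params):
    # Pre-norm residual updates: the layer adds to the residual stream.
    x = x + self_attention(layer_norm(x), *params["attn"])
    x = x + mlp(layer_norm(x), *params["mlp"])
    return x
```

Because each sub-block only adds a delta to the residual stream, reading activations at any layer boundary gives a meaningful snapshot of the model's intermediate state, which is what Wisent-Guard monitors.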
Early layers: process basic linguistic features like syntax and word relationships.
Middle layers: develop semantic understanding and complex reasoning patterns. These are often best for representation engineering.
Late layers: prepare for output generation and final decision making.
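The early/middle/late split above can be expressed as a simple rule of thumb. The one-third boundaries below are an illustrative assumption, not something Wisent-Guard defines; the actual transition points vary by model and task.

```python
def layer_band(layer: int, num_layers: int) -> str:
    """Classify a layer index into a rough early/middle/late band.

    Assumption: bands are split at thirds of the layer stack.
    """
    if not 0 <= layer < num_layers:
        raise ValueError(f"layer must be in [0, {num_layers - 1}]")
    if layer < num_layers // 3:
        return "early"
    if layer < 2 * num_layers // 3:
        return "middle"
    return "late"

# Under this heuristic, for a 32-layer model such as Llama-3.1-8B-Instruct:
# layers 0-9 fall in the early band, 10-20 in the middle, 21-31 in the late band.
```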
Use a specific layer for monitoring:
python -m wisent_guard tasks mmlu --layer 15 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Monitor multiple specific layers:
python -m wisent_guard tasks hellaswag --layer 10,15,20 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Monitor a range of layers:
python -m wisent_guard tasks truthfulqa --layer 14-16 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
Let Wisent-Guard find the optimal layer automatically:
python -m wisent_guard tasks mmlu --layer -1 --model meta-llama/Llama-3.1-8B-Instruct --limit 10
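The four --layer forms above (a single index, a comma-separated list, a range, and -1 for automatic selection) could be parsed with a small helper like the one below. This is a hypothetical sketch of how such a flag might be interpreted, not Wisent-Guard's actual parsing code.

```python
def parse_layer_spec(spec: str):
    """Parse a --layer value into a list of layer indices, or "auto".

    Hypothetical sketch: "-1" means automatic selection, "15" a single
    layer, "10,15,20" a list, and "14-16" an inclusive range.
    """
    spec = spec.strip()
    if spec == "-1":
        return "auto"
    if "," in spec:
        return [int(part) for part in spec.split(",")]
    if "-" in spec:
        start, end = spec.split("-")
        return list(range(int(start), int(end) + 1))
    return [int(spec)]
```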
Different layers exhibit distinct activation patterns. Early layers focus on syntax, while deeper layers capture semantics and reasoning.
Middle layers typically contain the richest representations for most tasks, balancing low-level features and high-level abstractions.
Different model architectures and sizes may have optimal layers at different positions. Experimentation is key to finding the best layers.
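One simple way to run that experiment is to collect activations for two contrastive classes of prompts (for example, truthful vs. untruthful responses) at each layer, score how well each layer separates the classes, and pick the highest-scoring one. The nearest-class-mean score below is an assumed, illustrative criterion, not Wisent-Guard's exact selection logic.

```python
import numpy as np

def layer_separability(pos, neg):
    # Score: distance between class means relative to within-class spread.
    gap = np.linalg.norm(pos.mean(axis=0) - neg.mean(axis=0))
    spread = pos.std() + neg.std() + 1e-8
    return gap / spread

def best_layer(activations):
    # activations: {layer_index: (pos_acts, neg_acts)}, where each array
    # has shape (num_examples, hidden_dim), e.g. gathered via forward hooks.
    scores = {layer: layer_separability(p, n)
              for layer, (p, n) in activations.items()}
    return max(scores, key=scores.get), scores
```

A sweep like this makes the "experimentation is key" advice concrete: run it once per model and task, then pass the winning index to --layer.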