get-activations

Extract model activations from contrastive pairs. The command loads the pairs, runs each one through the model, and saves the per-layer activations together with the pairs so they can be used to generate steering vectors.

Basic Usage

python -m wisent get-activations PAIRS_FILE --output FILE [OPTIONS]

Examples

Basic Activation Extraction

python -m wisent get-activations ./pairs/truthfulqa.json \
  --output ./activations/truthfulqa_enriched.json \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layers 8,12,15

With Extraction Strategy

python -m wisent get-activations ./pairs/helpfulness.json \
  --output ./activations/helpfulness_enriched.json \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --extraction-strategy chat_last \
  --layers all

Raw Hidden States

python -m wisent get-activations ./pairs/bias.json \
  --output ./activations/bias_raw.json \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --raw \
  --verbose

Arguments

Input/Output

Argument	Default	Description
pairs_file	required	Path to JSON file containing contrastive pairs
--output	required	Output file path for pairs with activations (JSON)

Model Configuration

Argument	Default	Description
--model	Llama-3.2-1B-Instruct	Model identifier (HuggingFace model name or path)
--device	auto	Device to run on (cuda, cpu, mps)
--layers	all	Comma-separated layer indices or 'all'
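
The --layers value is either a comma-separated list of indices or the literal 'all'. The sketch below shows one way such a value could be interpreted. It is not wisent's own parser: the helper name parse_layers, the num_layers parameter, and the 0-based indexing are assumptions made for illustration, since this page does not state the indexing convention.

```python
from typing import List


def parse_layers(spec: str, num_layers: int) -> List[int]:
    """Turn a --layers style string into a list of layer indices.

    Assumption: indices are 0-based and must be smaller than num_layers.
    """
    spec = spec.strip()
    if spec.lower() == "all":
        return list(range(num_layers))

    layers: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "8,,12"
        index = int(part)  # raises ValueError on non-numeric input
        if not 0 <= index < num_layers:
            raise ValueError(f"layer {index} is outside 0..{num_layers - 1}")
        if index not in layers:
            layers.append(index)  # keep first occurrence, drop duplicates
    return layers
```

With this helper, parse_layers("8,12,15", 32) yields the three requested indices, and parse_layers("all", 32) yields every index from 0 to 31.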

Extraction Options

Argument	Default	Description
--extraction-strategy	chat_mean	How to extract activation vectors from tokens
--raw	false	Output raw hidden states with shape [seq_len, hidden_size]
--batch-size	1	Batch size for processing
--limit	all	Maximum number of pairs to process
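
When the command is run from a script, for example once per pairs file, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags documented on this page. The function name build_get_activations_cmd is hypothetical, and the sketch assumes that python resolves to the interpreter where wisent is installed.

```python
from typing import List, Optional


def build_get_activations_cmd(
    pairs_file: str,
    output: str,
    model: Optional[str] = None,
    layers: Optional[str] = None,
    extraction_strategy: Optional[str] = None,
    raw: bool = False,
    batch_size: Optional[int] = None,
    limit: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for `python -m wisent get-activations`.

    Options left as None are omitted so the command's own defaults apply.
    """
    cmd = ["python", "-m", "wisent", "get-activations", pairs_file, "--output", output]
    if model is not None:
        cmd += ["--model", model]
    if layers is not None:
        cmd += ["--layers", layers]
    if extraction_strategy is not None:
        cmd += ["--extraction-strategy", extraction_strategy]
    if raw:
        cmd.append("--raw")  # boolean flag, takes no value
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.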

Extraction Strategies

Chat models:

chat_mean - Average across all response tokens
chat_first - First response token
chat_last - Last response token
chat_max_norm - Token with maximum norm
chat_weighted - Weighted average by position
role_play - Role-playing extraction
mc_balanced - Balanced multiple choice

Base models:

completion_last - Last token of completion
completion_mean - Mean of completion tokens
mc_completion - Multiple choice completion
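
Several of the strategies above reduce a [seq_len, hidden_size] block of token states to a single vector. The sketch below illustrates the idea for the five chat pooling strategies in plain Python. It is an illustration rather than wisent's implementation: the function name pool_tokens is hypothetical, the input is assumed to contain only the response tokens, and the linearly increasing weights used for chat_weighted are an assumption because this page does not specify the weighting scheme.

```python
import math
from typing import List, Sequence


def pool_tokens(hidden: Sequence[Sequence[float]], strategy: str) -> List[float]:
    """Reduce per-token hidden states [seq_len][hidden_size] to one vector."""
    if not hidden:
        raise ValueError("expected at least one token")
    n = len(hidden)
    dim = len(hidden[0])

    if strategy == "chat_first":
        return list(hidden[0])
    if strategy == "chat_last":
        return list(hidden[-1])
    if strategy == "chat_max_norm":
        # Pick the token whose state has the largest Euclidean norm.
        norms = [math.sqrt(sum(x * x for x in row)) for row in hidden]
        return list(hidden[norms.index(max(norms))])

    if strategy == "chat_mean":
        weights = [1.0 / n] * n
    elif strategy == "chat_weighted":
        # Assumption: later tokens count more, with weights 1, 2, ..., n
        # normalised to sum to 1.
        total = n * (n + 1) / 2
        weights = [(i + 1) / total for i in range(n)]
    else:
        raise ValueError(f"unsupported strategy: {strategy}")

    # Weighted sum over tokens, computed separately for each hidden dimension.
    return [sum(w * row[j] for w, row in zip(weights, hidden)) for j in range(dim)]
```

In practice the same reductions would usually be written with tensor operations such as a mean or an indexed select over the sequence axis. The list-based version is used here only to keep the example dependency-free.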

Related Commands

generate-pairs-from-task - Generate pairs from benchmarks
create-steering-vector - Create vectors from enriched pairs
check-linearity - Check representation linearity
