train-unified-goodness

Train a single "goodness" steering vector from pooled multi-task benchmark data. Pooling contrastive pairs from several benchmarks yields one unified vector intended to encode behavior that is good across different tasks and domains, rather than behavior specific to a single task.

Basic Usage

python -m wisent train-unified-goodness --model MODEL [OPTIONS]

Examples

Basic Training

python -m wisent train-unified-goodness \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --output ./vectors/unified_goodness.pt

With Specific Benchmarks

python -m wisent train-unified-goodness \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --benchmarks truthfulqa_mc1 mmlu hellaswag \
  --samples-per-benchmark 200 \
  --output ./vectors/unified_goodness.pt

With Weighted Benchmarks

python -m wisent train-unified-goodness \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --benchmark-weights truthfulqa_mc1:2.0 mmlu:1.0 hellaswag:1.5 \
  --output ./vectors/unified_goodness.pt

Full Training with All Options

python -m wisent train-unified-goodness \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --samples-per-benchmark 500 \
  --steering-method CAA \
  --test-after-training \
  --test-prompts 10 \
  --output ./vectors/unified_goodness.pt \
  --verbose

Arguments

Model & Training

Argument	Default	Description
--model	required	Model name or path
--layer	15	Layer for activation extraction
--steering-method	CAA	Steering method to use
--device	auto	Device to run on

Benchmark Selection

Argument	Default	Description
--benchmarks	all available	Specific benchmarks to use
--samples-per-benchmark	100	Number of samples per benchmark
--benchmark-weights	equal	Custom weights for benchmarks (format: name:weight)
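
The name:weight format for --benchmark-weights can be illustrated with a small parser. This is only a sketch of how such values might be read; the function name parse_benchmark_weights is hypothetical and is not taken from the Wisent codebase, and the real CLI may validate its input differently.

```python
from typing import Dict, Iterable


def parse_benchmark_weights(specs: Iterable[str]) -> Dict[str, float]:
    """Turn ["mmlu:1.0", "hellaswag:1.5"] into {"mmlu": 1.0, "hellaswag": 1.5}.

    Hypothetical helper for illustration; not part of Wisent.
    """
    weights: Dict[str, float] = {}
    for spec in specs:
        # Split on the last colon so the weight is always the final field.
        name, sep, raw_weight = spec.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected name:weight, got {spec!r}")
        weights[name] = float(raw_weight)
    return weights


example = parse_benchmark_weights(
    ["truthfulqa_mc1:2.0", "mmlu:1.0", "hellaswag:1.5"]
)
```

Here example maps each benchmark name from the "With Weighted Benchmarks" command to its numeric weight.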

Output & Testing

Argument	Description
--output	Output path for the unified vector
--test-after-training	Run test prompts after training
--test-prompts	Number of test prompts to run
--verbose	Enable verbose output

How It Works

Collect Data - Gathers contrastive pairs from multiple benchmarks
Pool Data - Combines all pairs into a unified training set
Weight Samples - Optionally applies custom weights to different benchmarks
Extract Activations - Gets model activations for all pairs
Compute Direction - Calculates the unified steering direction
Test Vector - Optionally evaluates on test prompts
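
The pooling, weighting, and direction steps above can be sketched in a few lines. This assumes the direction is a weighted mean of the differences between positive and negative activations, which is the usual CAA-style construction; the source does not spell out Wisent's exact computation, so treat the function unified_direction and its data layout as hypothetical. Activations are plain lists of floats here to keep the example dependency-free.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One contrastive pair: (activation for the good answer, activation for the bad answer).
Pair = Tuple[Sequence[float], Sequence[float]]


def unified_direction(
    pairs_by_benchmark: Dict[str, List[Pair]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted mean of (positive - negative) activations over all benchmarks.

    Assumption: benchmarks without an explicit weight count as 1.0.
    """
    weights = weights or {}
    total: Optional[List[float]] = None
    weight_sum = 0.0
    for name, pairs in pairs_by_benchmark.items():
        w = weights.get(name, 1.0)
        for pos, neg in pairs:
            if total is None:
                total = [0.0] * len(pos)
            # Accumulate the weighted difference for this pair.
            for i, (p, n) in enumerate(zip(pos, neg)):
                total[i] += w * (p - n)
            weight_sum += w
    if total is None or weight_sum == 0.0:
        raise ValueError("no weighted pairs to pool")
    return [t / weight_sum for t in total]


# Toy 2-dimensional activations from two made-up benchmarks.
toy_pairs = {
    "bench_a": [([1.0, 0.0], [0.0, 0.0])],
    "bench_b": [([0.0, 1.0], [0.0, 0.0])],
}
equal = unified_direction(toy_pairs)
skewed = unified_direction(toy_pairs, {"bench_a": 3.0, "bench_b": 1.0})
```

With equal weights both benchmarks contribute the same amount to the direction; raising the weight of bench_a pulls the pooled direction toward that benchmark's difference vector.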

Use Cases

General Improvement - Create a single vector that improves model behavior across tasks
Multi-Domain Steering - Steer behavior consistently across different domains
Transfer Learning - Use the unified vector as a starting point for specialized steering

Related Commands

generate-vector - Generate single-task vectors
multi-steer - Combine multiple vectors
tasks - Run tasks with steering
