Train a single "high quality" steering vector from pooled benchmark data. Pooling contrastive pairs from multiple tasks and domains produces one unified vector that encodes good behavior common to all of them, rather than a separate vector per benchmark.
```bash
python -m wisent train-unified-goodness --model MODEL [OPTIONS]
```
Train with default benchmarks:

```bash
python -m wisent train-unified-goodness \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --layer 15 \
    --output ./vectors/unified_goodness.pt
```
Select specific benchmarks and the number of samples drawn from each:

```bash
python -m wisent train-unified-goodness \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --layer 15 \
    --benchmarks truthfulqa_mc1 mmlu hellaswag \
    --samples-per-benchmark 200 \
    --output ./vectors/unified_goodness.pt
```
Weight benchmarks unequally (format `name:weight`):

```bash
python -m wisent train-unified-goodness \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --layer 15 \
    --benchmark-weights truthfulqa_mc1:2.0 mmlu:1.0 hellaswag:1.5 \
    --output ./vectors/unified_goodness.pt
```
Full run with post-training testing and verbose output:

```bash
python -m wisent train-unified-goodness \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --layer 15 \
    --samples-per-benchmark 500 \
    --steering-method CAA \
    --test-after-training \
    --test-prompts 10 \
    --output ./vectors/unified_goodness.pt \
    --verbose
```
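The default steering method, CAA (Contrastive Activation Addition), builds its vector as the mean difference between layer activations on desirable and undesirable completions. The CLI's internals are not shown here; the sketch below is a generic illustration of that computation on toy pooled activations, not the tool's actual implementation.

```python
import torch

def caa_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """CAA: steering vector = mean activation on desirable completions
    minus mean activation on undesirable ones."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

# Toy pooled data: 6 contrastive pairs, hidden size 8 (real models use
# the residual-stream width of the chosen layer, e.g. 4096 for Llama-3.1-8B).
torch.manual_seed(0)
pos = torch.randn(6, 8) + 1.0   # activations on "good" completions
neg = torch.randn(6, 8)         # activations on "bad" completions
vec = caa_vector(pos, neg)
print(vec.shape)  # torch.Size([8])
```

Because the pairs come from several benchmarks at once, the resulting direction captures what the "good" answers share across tasks.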
Core arguments:

| Argument | Default | Description |
|---|---|---|
| --model | required | Model name or path |
| --layer | 15 | Layer for activation extraction |
| --steering-method | CAA | Steering method to use |
| --device | auto | Device to run on |
Data arguments:

| Argument | Default | Description |
|---|---|---|
| --benchmarks | all available | Specific benchmarks to use |
| --samples-per-benchmark | 100 | Number of samples per benchmark |
| --benchmark-weights | equal | Custom weights for benchmarks (format: name:weight) |
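One plausible reading of `--benchmark-weights` is that per-benchmark vectors (or their contrastive pairs) are combined as a weighted mean; this is an assumption about the tool's behavior, sketched below with hypothetical per-benchmark vectors.

```python
import torch

# Hypothetical per-benchmark steering vectors (toy values, hidden size 4).
vectors = {
    "truthfulqa_mc1": torch.ones(4) * 2.0,
    "mmlu": torch.ones(4) * 1.0,
    "hellaswag": torch.ones(4) * 3.0,
}
# Weights as passed via --benchmark-weights name:weight.
weights = {"truthfulqa_mc1": 2.0, "mmlu": 1.0, "hellaswag": 1.5}

# Weighted mean: benchmarks with higher weight pull the unified vector
# further toward their own direction.
total = sum(weights.values())
unified = sum(weights[k] * vectors[k] for k in vectors) / total
print(unified)
```

With equal weights (the default) this reduces to a plain average over benchmarks.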
Output and testing arguments:

| Argument | Description |
|---|---|
| --output | Output path for the unified vector |
| --test-after-training | Run test prompts after training |
| --test-prompts | Number of test prompts to run |
| --verbose | Enable verbose output |
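Once trained, the saved vector can be applied at inference by adding it to the chosen layer's output. A minimal sketch, assuming the `.pt` file holds a plain activation-space tensor and using a `Linear` layer as a stand-in for the model's transformer block:

```python
import torch

hidden_size = 8
vec = torch.randn(hidden_size)  # in real use: torch.load("./vectors/unified_goodness.pt")

layer = torch.nn.Linear(hidden_size, hidden_size)  # stand-in for block 15

def add_steering(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output;
    # the 0.5 scale is a hypothetical steering-strength knob.
    return output + 0.5 * vec

handle = layer.register_forward_hook(add_steering)
out = layer(torch.randn(2, hidden_size))  # steered forward pass
handle.remove()
print(out.shape)  # (batch, hidden_size)
```

The same hook pattern works on a Hugging Face model by registering on `model.model.layers[15]` so the shift lands at the layer the vector was extracted from.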