Verify that a steered model's activations align with the intended steering direction at inference time. The command compares activations from the base model and the steered model on a set of test prompts to confirm that steering is taking effect.

Usage:

python -m wisent verify-steering MODEL_PATH [OPTIONS]

Examples:

python -m wisent verify-steering ./steered_model/

python -m wisent verify-steering ./steered_model/ --prompts "Is the Earth flat?" "What is 2+2?" --verbose

python -m wisent verify-steering ./steered_model/ --prompts-file ./test_prompts.json --output ./verification_results.json

python -m wisent verify-steering ./steered_model/ --layers "10,15,20" --alignment-threshold 0.5 --verbose

python -m wisent verify-steering ./titan_steered_model/ --check-gate --check-intensity --verbose

Positional argument:

| Argument | Description |
|---|---|
| model_path | Path to the steered model (TITAN, PULSE, or CAA) |

Model options:

| Argument | Default | Description |
|---|---|---|
| --base-model | auto | Path or name of base model for comparison (auto-detected from config) |
| --device | auto | Device to use: auto, cuda, mps, cpu |

Prompt options:

| Argument | Description |
|---|---|
| --prompts | Test prompts to verify steering on (space-separated) |
| --prompts-file | JSON file containing test prompts |
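
The JSON layout expected by `--prompts-file` is not specified on this page. As a minimal sketch, assuming the file is a flat JSON array of prompt strings (an assumption; check the wisent source for the actual schema), a prompts file could be generated as follows. `write_prompts_file` is a hypothetical helper, not part of wisent.

```python
import json
from pathlib import Path


def write_prompts_file(prompts, path):
    """Write prompts to `path` as a flat JSON array of strings.

    ASSUMPTION: the flat-array layout is a guess at what --prompts-file
    accepts; it is not confirmed by the documentation.
    """
    path = Path(path)
    text = json.dumps(list(prompts), indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_prompts_file(["Is the Earth flat?", "What is 2+2?"], "test_prompts.json")` would produce a file that can then be passed via `--prompts-file ./test_prompts.json`, provided the assumed layout matches.
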

Verification options:

| Argument | Default | Description |
|---|---|---|
| --layers | all | Comma-separated layer indices to check |
| --alignment-threshold | 0.3 | Minimum alignment score to consider steering successful |
| --check-gate | True | Check gate network discrimination (TITAN/PULSE) |
| --check-intensity | True | Check intensity network predictions (TITAN) |
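
This page does not define how the alignment score behind `--alignment-threshold` is computed. One plausible reading, assumed here purely for illustration, is the cosine similarity between the activation shift (steered minus base) and the steering direction at a given layer. The sketch below uses plain Python lists; `alignment_score` and `layers_below_threshold` are hypothetical names, not wisent functions.

```python
import math


def alignment_score(base_act, steered_act, direction):
    """Cosine similarity between (steered - base) and the steering direction.

    ASSUMPTION: this metric is an illustrative stand-in; wisent's actual
    alignment score may be defined differently.
    """
    shift = [s - b for s, b in zip(steered_act, base_act)]
    dot = sum(x * d for x, d in zip(shift, direction))
    norm = math.sqrt(sum(x * x for x in shift)) * math.sqrt(
        sum(d * d for d in direction)
    )
    # A zero shift or zero direction carries no alignment information.
    return dot / norm if norm else 0.0


def layers_below_threshold(scores_by_layer, threshold=0.3):
    """Return the sorted layer indices whose score is below the threshold."""
    return sorted(
        layer for layer, score in scores_by_layer.items() if score < threshold
    )
```

Under this reading, a score near 1 means the steered activations moved along the steering direction, a score near 0 means the shift is unrelated to it, and a negative score means the shift points the opposite way. The default of 0.3 in `layers_below_threshold` mirrors the documented default for `--alignment-threshold`.
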

Output options:

| Argument | Description |
|---|---|
| --output | Output file for detailed results (JSON format) |
| --verbose | Print detailed per-layer diagnostics |
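
When verification runs from a script or CI job, it can help to assemble the command line programmatically. The sketch below builds an argument list using only the flags documented on this page; `build_verify_command` is a hypothetical helper, and the command's exit-code behaviour is not documented here, so the actual invocation is left as a comment.

```python
import sys


def build_verify_command(model_path, prompts_file=None, layers=None,
                         threshold=None, output=None, verbose=False):
    """Build an argv list for `python -m wisent verify-steering`."""
    cmd = [sys.executable, "-m", "wisent", "verify-steering", str(model_path)]
    if prompts_file:
        cmd += ["--prompts-file", str(prompts_file)]
    if layers:
        # --layers takes comma-separated layer indices, e.g. "10,15,20".
        cmd += ["--layers", ",".join(str(i) for i in layers)]
    if threshold is not None:
        cmd += ["--alignment-threshold", str(threshold)]
    if output:
        cmd += ["--output", str(output)]
    if verbose:
        cmd.append("--verbose")
    return cmd


# To run it (requires wisent to be installed):
#   import subprocess
#   subprocess.run(build_verify_command("./steered_model/", verbose=True))
```

Passing the arguments as a list avoids shell quoting problems with prompts or paths that contain spaces.
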
The steering type is automatically detected from config files in the model directory (titan_config.json, pulse_config.json, or caa_config.json).
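
The detection rule above can be sketched as a simple file-existence check. This is an illustration of the described behaviour, not wisent's actual code: `detect_steering_type` is a hypothetical name, and the precedence when more than one config file is present is an assumption.

```python
from pathlib import Path

# Config file names as listed in the documentation, mapped to steering types.
CONFIG_FILES = {
    "titan_config.json": "TITAN",
    "pulse_config.json": "PULSE",
    "caa_config.json": "CAA",
}


def detect_steering_type(model_dir):
    """Return the steering type implied by config files in `model_dir`.

    ASSUMPTION: if several config files exist, the first match in
    CONFIG_FILES order wins. Returns None when no config file is found.
    """
    model_dir = Path(model_dir)
    for filename, steering_type in CONFIG_FILES.items():
        if (model_dir / filename).is_file():
            return steering_type
    return None
```

A check like this can be used before running the command to confirm that a model directory contains one of the expected config files.
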