Permanently modify model weights using steering vectors. This command generates steering vectors and bakes them into the model weights using either directional projection or an additive method. The resulting model carries the steered behavior in its weights, so no runtime steering is required.
python -m wisent modify-weights --task TASK --model MODEL --output-dir DIR [OPTIONS]
python -m wisent modify-weights \
  --task arc_easy \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./modified_model/ \
  --method directional \
  --strength 1.0

python -m wisent modify-weights \
  --task refusal \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./uncensored_model/ \
  --method directional \
  --strength 1.5 \
  --num-pairs 200

python -m wisent modify-weights \
  --task personalization \
  --trait "responds with a British accent and uses British slang" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./british_model/ \
  --method titan \
  --titan-num-directions 8

python -m wisent modify-weights \
  --task arc_easy,gsm8k,hellaswag \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./smart_model/ \
  --cap-pairs-per-benchmark 50 \
  --use-kernel \
  --push-to-hub \
  --repo-id myuser/smart-llama

python -m wisent modify-weights \
  --task truthfulqa_mc1 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./truthful_model/ \
  --guided \
  --guided-mode surgical \
  --surgical-top-k 3 \
  --save-diagnostics ./diagnostics.json
Task and steering vector source

| Argument | Description |
|---|---|
| --task | Task to steer on: refusal, personalization, custom, a benchmark name, or a comma-separated list of benchmarks |
| --steering-vectors | Path to pre-computed steering vectors file (.json or .pt) |
Required arguments

| Argument | Description |
|---|---|
| --model | Model identifier (HuggingFace model ID or path) |
| --output-dir | Directory to save modified model |
Method options

| Argument | Default | Description |
|---|---|---|
| --method | auto | Modification method: auto, directional, additive, titan, pulse, prism |
| --steering-method | auto | Vector generation: auto, caa, hyperplane, prism, pulse, titan, mlp |
| --strength | 1.0 | Projection strength for directional method |
| --alpha | 1.0 | Steering strength for additive method |
| --components | method-specific | Components to modify (e.g., self_attn.o_proj mlp.down_proj) |
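
The tables here do not spell out the math behind the `directional` and `additive` methods. As a rough sketch, assume `directional` removes the component of a layer's output along a unit steering vector, scaled by `--strength`, and `additive` shifts a bias along that vector, scaled by `--alpha`. The helper names and both formulas below are illustrative assumptions, not Wisent's implementation.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def directional_update(W, v, strength=1.0):
    """Assumed form: W' = W - strength * v (v^T W), with v normalized.

    W is a list of rows (d_out x d_in); v has length d_out.
    """
    v = unit(v)
    rows, cols = len(W), len(W[0])
    # v^T W: how strongly each input column writes along v
    vTW = [sum(v[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - strength * v[i] * vTW[j] for j in range(cols)]
            for i in range(rows)]

def additive_update(bias, v, alpha=1.0):
    """Assumed form: b' = b + alpha * v."""
    return [b + alpha * x for b, x in zip(bias, v)]

W = [[1.0, 2.0], [3.0, 4.0]]
v = [0.0, 2.0]  # toy steering vector; normalized to [0.0, 1.0] internally

print(directional_update(W, v, strength=1.0))  # [[1.0, 2.0], [0.0, 0.0]]
print(additive_update([0.0, 1.0], [0.5, -0.5], alpha=2.0))  # [1.0, 0.0]
```

Under this assumed formula, a strength of 1.0 leaves the matrix writing nothing along `v`, while a value above 1.0 (such as the 1.5 in the refusal example) overshoots and flips the sign of that component.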
TITAN options

| Argument | Default | Description |
|---|---|---|
| --titan-mode | hybrid | Mode: static, dynamic, or hybrid |
| --titan-num-directions | 8 | Number of manifold directions |
Guided modification options

| Argument | Default | Description |
|---|---|---|
| --guided | false | Use linearity-guided weight modification |
| --guided-mode | adaptive | Mode: full, surgical, or adaptive |
| --surgical-top-k | 3 | Number of top layers for surgical mode |
| --min-linear-score | 0.5 | Minimum linear score to include a layer |
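
How `--guided-mode`, `--surgical-top-k`, and `--min-linear-score` interact is not described beyond the table. One plausible reading, sketched below with made-up per-layer scores, is that layers under the minimum score are dropped and `surgical` then keeps only the top-k of the rest. The function `select_layers` and the score values are hypothetical, and `adaptive` mode is left out because the table gives no detail about it.

```python
def select_layers(linear_scores, mode="surgical", top_k=3, min_linear_score=0.5):
    """Pick layers to modify from a {layer_index: linear_score} mapping.

    Assumed semantics (not confirmed by the source):
      - layers scoring below min_linear_score are always excluded
      - "full" keeps every remaining layer
      - "surgical" keeps only the top_k highest-scoring remaining layers
    """
    eligible = {layer: s for layer, s in linear_scores.items()
                if s >= min_linear_score}
    if mode == "full":
        return sorted(eligible)
    if mode == "surgical":
        ranked = sorted(eligible, key=lambda layer: eligible[layer], reverse=True)
        return sorted(ranked[:top_k])
    raise ValueError(f"mode not covered by this sketch: {mode}")

# Invented scores for illustration only
scores = {10: 0.42, 12: 0.71, 14: 0.88, 16: 0.93, 18: 0.64, 20: 0.30}

print(select_layers(scores, mode="surgical", top_k=3))  # [12, 14, 16]
print(select_layers(scores, mode="full"))               # [12, 14, 16, 18]
```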
Kernel weighting options

| Argument | Default | Description |
|---|---|---|
| --use-kernel | false | Use Gaussian-like kernel for smooth layer weighting |
| --max-weight | 1.5 | Peak weight at center layer |
| --min-weight | 0.3 | Minimum weight at edges |
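
The table gives the peak and edge weights but not the kernel's shape. A minimal sketch, assuming a Gaussian bump centered on the middle layer and rescaled so that the center gets `--max-weight` and the outermost layers get `--min-weight`. The function name and the `sigma_frac` width parameter are hypothetical, not documented Wisent options.

```python
import math

def kernel_layer_weights(num_layers, max_weight=1.5, min_weight=0.3, sigma_frac=0.25):
    """Per-layer weights from a Gaussian bump, rescaled to [min_weight, max_weight].

    sigma_frac sets the kernel width as a fraction of model depth
    (an assumption made for this sketch).
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    center = (num_layers - 1) / 2
    sigma = sigma_frac * num_layers
    raw = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
           for i in range(num_layers)]
    edge, peak = raw[0], max(raw)
    if peak == edge:
        # One or two layers: nothing to interpolate, use the peak weight
        return [max_weight] * num_layers
    scale = max_weight - min_weight
    return [min_weight + (r - edge) / (peak - edge) * scale for r in raw]

for layer, weight in enumerate(kernel_layer_weights(5)):
    print(layer, round(weight, 3))
```

With these defaults the middle layers receive the strongest modification and the effect tapers smoothly toward the first and last layers.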
HuggingFace Hub upload options

| Argument | Description |
|---|---|
| --push-to-hub | Upload modified model to HuggingFace Hub |
| --repo-id | HuggingFace repository ID (required if --push-to-hub) |
| --private | Make Hub repository private |
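
The `--repo-id` row implies a cross-argument check. A minimal sketch of such a check with `argparse`, using a stand-in parser that covers only these three flags; it is not Wisent's actual parser, and the error message is made up.

```python
import argparse

def build_parser():
    # Hypothetical stand-in parser, limited to the three Hub flags
    parser = argparse.ArgumentParser(prog="modify-weights-sketch")
    parser.add_argument("--push-to-hub", action="store_true")
    parser.add_argument("--repo-id")
    parser.add_argument("--private", action="store_true")
    return parser

def parse_hub_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the dependency stated in the table
    if args.push_to_hub and not args.repo_id:
        parser.error("--repo-id is required when --push-to-hub is set")
    return args

args = parse_hub_args(["--push-to-hub", "--repo-id", "myuser/smart-llama"])
print(args.push_to_hub, args.repo_id, args.private)
```

Calling `parse_hub_args(["--push-to-hub"])` exits with a usage error, since `parser.error` prints the message and terminates with status 2.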