modify-weights

Permanently modify model weights using steering vectors. This command generates steering vectors (or loads pre-computed ones) and bakes them into the model weights, using directional projection, an additive update, or one of the other supported methods. The modified model carries the change in its weights, so no runtime steering is needed at inference.

Basic Usage
python -m wisent modify-weights --task TASK --model MODEL --output-dir DIR [OPTIONS]
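
When the command is launched from a script, it can help to assemble the argument list programmatically. The sketch below is illustrative only: `build_modify_weights_cmd` is a hypothetical helper, not part of wisent, and it only builds the argument list from flags documented on this page without running anything.

```python
import shlex
import sys


def build_modify_weights_cmd(task, model, output_dir, **options):
    """Build the argv list for `python -m wisent modify-weights`.

    Hypothetical helper, not part of wisent. Keyword options are mapped
    to flags by replacing underscores with hyphens; True becomes a bare
    switch, while False and None are omitted.
    """
    cmd = [sys.executable, "-m", "wisent", "modify-weights",
           "--task", task, "--model", model, "--output-dir", output_dir]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)
        elif value is None or value is False:
            continue
        else:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_modify_weights_cmd(
    "arc_easy",
    "meta-llama/Llama-3.1-8B-Instruct",
    "./modified_model/",
    method="directional",
    strength=1.0,
)
# Show the command without the interpreter path.
print(shlex.join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the command.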

Examples

Benchmark-Based Modification
python -m wisent modify-weights \
  --task arc_easy \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./modified_model/ \
  --method directional \
  --strength 1.0

Refusal Suppression
python -m wisent modify-weights \
  --task refusal \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./uncensored_model/ \
  --method directional \
  --strength 1.5 \
  --num-pairs 200

Personalization Modification
python -m wisent modify-weights \
  --task personalization \
  --trait "responds with a British accent and uses British slang" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./british_model/ \
  --method titan \
  --titan-num-directions 8

Multi-Benchmark Modification
python -m wisent modify-weights \
  --task arc_easy,gsm8k,hellaswag \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./smart_model/ \
  --cap-pairs-per-benchmark 50 \
  --use-kernel \
  --push-to-hub \
  --repo-id myuser/smart-llama

Guided Modification
python -m wisent modify-weights \
  --task truthfulqa_mc1 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./truthful_model/ \
  --guided \
  --guided-mode surgical \
  --surgical-top-k 3 \
  --save-diagnostics ./diagnostics.json

Arguments

Input Source (mutually exclusive)

Argument            Description
--task              Task to modify: refusal, personalization, custom, benchmark name, or comma-separated benchmarks
--steering-vectors  Path to pre-computed steering vectors file (.json or .pt)
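
To illustrate what "mutually exclusive" means for these two inputs, here is a minimal sketch using Python's standard `argparse`. It is not wisent's actual parser: the `make_parser` and `split_tasks` names are hypothetical, and treating one of the two inputs as required is an assumption.

```python
import argparse


def make_parser():
    # Illustrative parser only; wisent's real parser may be structured differently.
    parser = argparse.ArgumentParser(prog="modify-weights")
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--task")
    source.add_argument("--steering-vectors")
    return parser


def split_tasks(task):
    """Split a comma-separated --task value into individual names."""
    return [name.strip() for name in task.split(",") if name.strip()]


args = make_parser().parse_args(["--task", "arc_easy,gsm8k,hellaswag"])
print(split_tasks(args.task))
```

With this setup, passing both `--task` and `--steering-vectors` makes `argparse` exit with a usage error.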

Required Arguments

Argument      Description
--model       Model identifier (HuggingFace model ID or path)
--output-dir  Directory to save modified model

Weight Modification Method

Argument           Default          Description
--method           auto             Modification method: auto, directional, additive, titan, pulse, prism
--steering-method  auto             Vector generation: auto, caa, hyperplane, prism, pulse, titan, mlp
--strength         1.0              Projection strength for directional method
--alpha            1.0              Steering strength for additive method
--components       method-specific  Components to modify (e.g., self_attn.o_proj mlp.down_proj)
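
The difference between the directional and additive methods can be sketched on a toy weight matrix. This is one common formulation and an assumption about the general idea, not wisent's actual implementation: the directional variant removes the component of the layer's output along a unit direction, scaled by `strength`, and the additive variant shifts a bias along the direction, scaled by `alpha`. All function names are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def directional_projection(W, v, strength=1.0):
    """Return W - strength * u (u^T W), with u = v / ||v||.

    W is a d_out x d_in matrix (list of rows) and v has length d_out.
    With strength=1.0 the layer's output has no component along u.
    """
    u = normalize(v)
    d_out, d_in = len(W), len(W[0])
    # proj[j] = sum_i u[i] * W[i][j], i.e. the row vector u^T W
    proj = [sum(u[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - strength * u[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]


def additive_shift(bias, v, alpha=1.0):
    """Return bias + alpha * v, shifting every output along v."""
    return [b + alpha * x for b, x in zip(bias, v)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


W = [[1.0, 2.0], [3.0, 4.0]]
v = [1.0, 0.0]   # toy steering direction in output space
x = [0.5, -1.0]  # toy input activation

W_dir = directional_projection(W, v, strength=1.0)
print(matvec(W_dir, x))  # the coordinate along v is removed
print(additive_shift([0.0, 0.0], v, alpha=1.0))
```

In this toy picture, a strength below 1.0 removes only part of the component along the direction, and a larger `alpha` shifts the output further along it.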

TITAN Parameters

Argument                Default  Description
--titan-mode            hybrid   Mode: static, dynamic, or hybrid
--titan-num-directions  8        Number of manifold directions

Guided Modification

Argument            Default   Description
--guided            false     Use linearity-guided weight modification
--guided-mode       adaptive  Mode: full, surgical, or adaptive
--surgical-top-k    3         Number of top layers for surgical mode
--min-linear-score  0.5       Minimum linear score to include a layer
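
As a rough illustration of how `--surgical-top-k` and `--min-linear-score` might interact, the sketch below keeps only layers at or above the minimum score and then picks the highest-scoring ones. This is an assumption about the selection logic, not wisent's implementation; the function name and the score dictionary format are hypothetical, and the `full` and `adaptive` modes are not covered.

```python
def select_layers_surgical(linear_scores, top_k=3, min_linear_score=0.5):
    """Pick up to top_k layers with the highest linear score.

    linear_scores maps layer index -> score (hypothetical input format).
    Layers scoring below min_linear_score are dropped first.
    """
    eligible = {layer: score for layer, score in linear_scores.items()
                if score >= min_linear_score}
    ranked = sorted(eligible, key=lambda layer: eligible[layer], reverse=True)
    # Return the selected layers in ascending layer order.
    return sorted(ranked[:top_k])


# Made-up scores for illustration only.
scores = {10: 0.42, 12: 0.81, 14: 0.93, 16: 0.77, 18: 0.55, 20: 0.31}
print(select_layers_surgical(scores))
```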

Kernel-Based Layer Weighting

Argument      Default  Description
--use-kernel  false    Use Gaussian-like kernel for smooth layer weighting
--max-weight  1.5      Peak weight at center layer
--min-weight  0.3      Minimum weight at edges
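
The kernel weighting can be pictured as a bell curve over layer indices that peaks at the center layer and falls to the minimum at the first and last layers. The sketch below is one way to produce such weights. The exact kernel wisent uses is not specified here, so the Gaussian form, the `sigma` width, and the function name are all assumptions.

```python
import math


def kernel_layer_weights(num_layers, max_weight=1.5, min_weight=0.3, sigma=None):
    """Gaussian-shaped weights: max_weight at the center, min_weight at the edges."""
    if num_layers < 2:
        return [max_weight] * num_layers
    center = (num_layers - 1) / 2
    sigma = sigma if sigma is not None else num_layers / 4  # assumed width
    # Raw kernel value at the first and last layer.
    edge = math.exp(-(center ** 2) / (2 * sigma ** 2))
    weights = []
    for layer in range(num_layers):
        raw = math.exp(-((layer - center) ** 2) / (2 * sigma ** 2))
        # Rescale so the raw range [edge, 1] maps onto [min_weight, max_weight].
        scaled = (raw - edge) / (1 - edge)
        weights.append(min_weight + (max_weight - min_weight) * scaled)
    return weights


weights = kernel_layer_weights(9)
print([round(w, 2) for w in weights])
```

Each layer's modification would then be scaled by its weight, so middle layers change the most and the outermost layers the least.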

Export Options

Argument       Description
--push-to-hub  Upload modified model to HuggingFace Hub
--repo-id      HuggingFace repository ID (required if --push-to-hub)
--private      Make Hub repository private
