modify-weights

Permanently modify model weights using steering vectors. This command generates steering vectors (or loads pre-computed ones via --steering-vectors) and bakes them into the model weights using the selected --method, such as directional projection or an additive update. The resulting model exhibits the steered behavior without any runtime steering.

Basic Usage

python -m wisent modify-weights --task TASK --model MODEL --output-dir DIR [OPTIONS]

Examples

Benchmark-Based Modification

python -m wisent modify-weights \
  --task arc_easy \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./modified_model/ \
  --method directional \
  --strength 1.0

Refusal Suppression

python -m wisent modify-weights \
  --task refusal \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./uncensored_model/ \
  --method directional \
  --strength 1.5 \
  --num-pairs 200

Personalization Modification

python -m wisent modify-weights \
  --task personalization \
  --trait "responds with a British accent and uses British slang" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./british_model/ \
  --method titan \
  --titan-num-directions 8

Multi-Benchmark Modification

python -m wisent modify-weights \
  --task arc_easy,gsm8k,hellaswag \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./smart_model/ \
  --cap-pairs-per-benchmark 50 \
  --use-kernel \
  --push-to-hub \
  --repo-id myuser/smart-llama

Guided Modification

python -m wisent modify-weights \
  --task truthfulqa_mc1 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output-dir ./truthful_model/ \
  --guided \
  --guided-mode surgical \
  --surgical-top-k 3 \
  --save-diagnostics ./diagnostics.json

Arguments

Input Source (mutually exclusive)

Argument	Description
--task	Task to modify the model for: refusal, personalization, custom, a benchmark name, or a comma-separated list of benchmarks
--steering-vectors	Path to pre-computed steering vectors file (.json or .pt)

Required Arguments

Argument	Description
--model	Model identifier (HuggingFace model ID or path)
--output-dir	Directory to save modified model

Weight Modification Method

Argument	Default	Description
--method	auto	Modification method: auto, directional, additive, titan, pulse, prism
--steering-method	auto	Steering vector generation method: auto, caa, hyperplane, prism, pulse, titan, mlp
--strength	1.0	Projection strength for directional method
--alpha	1.0	Steering strength for additive method
--components	method-specific	Components to modify (e.g., self_attn.o_proj mlp.down_proj)
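
To make the difference between --strength and --alpha concrete, here is a minimal sketch of the two ideas in plain Python. It is an illustration under stated assumptions, not Wisent's implementation: the function names are hypothetical, the directional variant is assumed to remove the steering direction from a component's output (W' = W - strength * u u^T W, with u the normalized vector), and the additive variant is assumed to shift a bias term by alpha times the vector.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_out(weight, direction, strength=1.0):
    """Directional sketch (assumed form): W' = W - strength * u u^T W.

    `weight` is a list of rows with shape (d_out, d_in); `direction` has
    length d_out. With strength=1.0 the output of W' has no component
    along the direction.
    """
    u = unit(direction)
    d_out, d_in = len(weight), len(weight[0])
    # u^T W: how strongly each input column writes along the direction.
    along = [sum(u[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - strength * u[i] * along[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def add_to_bias(bias, direction, alpha=1.0):
    """Additive sketch (assumed form): b' = b + alpha * v."""
    return [b + alpha * x for b, x in zip(bias, direction)]
```

In this sketch, strength scales how much of the direction is removed from the weights, while alpha scales a constant offset; the actual mapping of these flags to weight updates may differ in Wisent.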

TITAN Parameters

Argument	Default	Description
--titan-mode	hybrid	Mode: static, dynamic, or hybrid
--titan-num-directions	8	Number of manifold directions

Guided Modification

Argument	Default	Description
--guided	false	Use linearity-guided weight modification
--guided-mode	adaptive	Mode: full, surgical, or adaptive
--surgical-top-k	3	Number of top layers for surgical mode
--min-linear-score	0.5	Minimum linear score to include a layer
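
The interaction between --surgical-top-k and --min-linear-score can be pictured with a short sketch. Everything here is an assumption for illustration: the function name is hypothetical, the per-layer scores are made up, and whether Wisent applies the minimum-score filter before taking the top layers in surgical mode is not stated in this page.

```python
def select_surgical_layers(linear_scores, top_k=3, min_linear_score=0.5):
    """Pick up to `top_k` layers with the highest linear score.

    `linear_scores` maps layer index -> score. Layers scoring below
    `min_linear_score` are dropped first (assumed ordering of the two rules).
    """
    eligible = {
        layer: score
        for layer, score in linear_scores.items()
        if score >= min_linear_score
    }
    ranked = sorted(eligible, key=lambda layer: eligible[layer], reverse=True)
    # Return the chosen layers in ascending layer order for readability.
    return sorted(ranked[:top_k])


# Hypothetical scores for five layers.
scores = {10: 0.9, 11: 0.4, 12: 0.7, 13: 0.8, 14: 0.6}
chosen = select_surgical_layers(scores, top_k=3, min_linear_score=0.5)
```

With these made-up scores, layer 11 is excluded by the threshold and the three highest-scoring remaining layers are kept.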

Kernel-Based Layer Weighting

Argument	Default	Description
--use-kernel	false	Use Gaussian-like kernel for smooth layer weighting
--max-weight	1.5	Peak weight at center layer
--min-weight	0.3	Minimum weight at edges
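
One way to read the --max-weight and --min-weight defaults is as the peak and floor of a bell-shaped curve over layer indices. The sketch below is an assumption about the shape, not Wisent's actual kernel: the function name, the choice of the middle layer as the center, and the sigma default are all hypothetical.

```python
import math


def kernel_layer_weights(num_layers, max_weight=1.5, min_weight=0.3, sigma=None):
    """Per-layer weights from a Gaussian bump rescaled to [min_weight, max_weight].

    The bump is centered on the middle layer and rescaled so that the first
    and last layers receive exactly `min_weight`.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [max_weight]
    center = (num_layers - 1) / 2
    # Hypothetical default width: a quarter of the layer count.
    sigma = sigma if sigma is not None else num_layers / 4

    def bump(layer):
        return math.exp(-0.5 * ((layer - center) / sigma) ** 2)

    edge = bump(0)
    span = max_weight - min_weight
    return [
        min_weight + span * (bump(layer) - edge) / (1 - edge)
        for layer in range(num_layers)
    ]


weights = kernel_layer_weights(5)
```

For an even number of layers the center falls between two layers, so in this sketch no layer reaches max_weight exactly.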

Export Options

Argument	Description
--push-to-hub	Upload modified model to HuggingFace Hub
--repo-id	HuggingFace repository ID (required when --push-to-hub is set)
--private	Make Hub repository private
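
Two constraints from the tables above (the input sources are mutually exclusive, and --repo-id is required with --push-to-hub) can be expressed with the standard library's argparse. This is only a sketch of how such rules are commonly enforced; it is not Wisent's parser, it covers a subset of the flags, and treating one input source as mandatory is an assumption.

```python
import argparse


def build_parser():
    """Build a small stand-in parser; not the real modify-weights parser."""
    parser = argparse.ArgumentParser(prog="modify-weights-sketch")
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--task")
    source.add_argument("--steering-vectors")
    parser.add_argument("--model", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--push-to-hub", action="store_true")
    parser.add_argument("--repo-id")
    parser.add_argument("--private", action="store_true")
    return parser


def parse(argv):
    """Parse argv and enforce the --push-to-hub / --repo-id dependency."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.push_to_hub and not args.repo_id:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--repo-id is required with --push-to-hub")
    return args
```

Calling parse with both --task and --steering-vectors, or with --push-to-hub but no --repo-id, exits with a usage error in this sketch.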

Related Commands

optimize-weights - Optimize modification parameters
create-steering-vector - Create steering vectors only
verify-steering - Verify steering effectiveness
