create-steering-vector

Create full steering objects from enriched contrastive pairs with activations. Unlike simpler vector generation, this command preserves method-specific components like gates, intensity networks, and multi-directional steering.

Basic Usage
python -m wisent create-steering-vector ENRICHED_PAIRS_FILE --output FILE [OPTIONS]

Examples

Basic CAA Steering Object
python -m wisent create-steering-vector \
  ./enriched_pairs/helpfulness.json \
  --method caa \
  --output ./steering/helpfulness_caa.pt

TITAN Steering Object
python -m wisent create-steering-vector \
  ./enriched_pairs/harmfulness.json \
  --method titan \
  --titan-num-directions 5 \
  --titan-max-alpha 3.0 \
  --output ./steering/harmfulness_titan.pt

PULSE Steering Object
python -m wisent create-steering-vector \
  ./enriched_pairs/honesty.json \
  --method pulse \
  --pulse-gate-temperature 0.1 \
  --output ./steering/honesty_pulse.pt

PRISM Multi-Directional
python -m wisent create-steering-vector \
  ./enriched_pairs/bias.json \
  --method prism \
  --prism-num-directions 3 \
  --prism-optimization-steps 100 \
  --output ./steering/bias_prism.pt
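
When the command is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown in the examples above; the helper name `build_command` and its parameters are hypothetical and not part of wisent.

```python
import subprocess
import sys


def build_command(pairs_file, output, method="caa", extra_args=None):
    """Assemble the argv list for `python -m wisent create-steering-vector`.

    `build_command` is a hypothetical helper for illustration only.
    """
    cmd = [
        sys.executable, "-m", "wisent", "create-steering-vector",
        pairs_file,
        "--method", method,
        "--output", output,
    ]
    # Method-specific flags are passed through unchanged,
    # e.g. ["--titan-num-directions", "5"].
    cmd.extend(extra_args or [])
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "./enriched_pairs/harmfulness.json",
        "./steering/harmfulness_titan.pt",
        method="titan",
        extra_args=["--titan-num-directions", "5", "--titan-max-alpha", "3.0"],
    )
    print(" ".join(cmd))
    # Uncomment to run the command (requires wisent to be installed):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.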

Arguments

Required Arguments

Argument             Description
enriched_pairs_file  Path to JSON file containing contrastive pairs with activations
--output             Output file path for steering object (.pt or .json)

Steering Method

Argument        Default  Description
--method        caa      Steering method: caa, hyperplane, mlp, prism, pulse, titan
--normalize     True     L2-normalize steering vectors
--no-normalize  -        Disable L2-normalization
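
For readers unfamiliar with the term, L2-normalization rescales a vector to unit Euclidean length while keeping its direction. The sketch below illustrates the operation in plain Python; the function name `l2_normalize` and the zero-vector handling are assumptions for illustration, not wisent's internal code.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale `vector` to unit Euclidean (L2) length.

    Hypothetical helper: a near-zero vector is returned unchanged
    to avoid dividing by zero.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm < eps:
        return list(vector)
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

After normalization the vector's magnitude no longer carries information, so steering strength has to be set separately when the vector is applied.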

MLP Parameters

Argument             Default  Description
--mlp-hidden-dim     256      Hidden dimension for MLP classifier
--mlp-num-layers     2        Number of hidden layers
--mlp-dropout        0.1      Dropout rate
--mlp-epochs         100      Training epochs
--mlp-learning-rate  0.001    Learning rate

Hyperplane Parameters

Argument               Default  Description
--hyperplane-max-iter  1000     Max iterations for logistic regression
--hyperplane-C         1.0      Regularization strength
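
To make the roles of the two parameters concrete, here is a rough sketch of fitting a separating hyperplane with logistic regression, where the weight vector serves as the steering direction. It is a plain gradient-descent illustration, not wisent's implementation. The function names and the learning rate are hypothetical, and treating `C` as an inverse regularization strength (the scikit-learn convention) is an assumption about this CLI.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_hyperplane(positive, negative, max_iter=1000, C=1.0, lr=0.1):
    """Fit weights w and bias b on L2-regularized logistic loss.

    ASSUMPTION: C acts as an inverse regularization strength, so a
    smaller C means a stronger penalty on w. `lr` is a hypothetical
    step size, not a documented option.
    """
    data = [(x, 1.0) for x in positive] + [(x, 0.0) for x in negative]
    n, dim = len(data), len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(max_iter):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Mean loss gradient plus the L2 penalty gradient w / (C * n).
        w = [wi - lr * (g / n + wi / (C * n)) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy activations: positives lie to the right, negatives to the left.
positive = [[2.0, 0.0], [3.0, 0.5]]
negative = [[-2.0, 0.0], [-3.0, -0.5]]
w, b = fit_hyperplane(positive, negative, max_iter=200)
print(w, b)
```

The vector `w` is normal to the decision boundary and points from the negative examples toward the positive ones.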

PRISM Parameters

Argument                    Default  Description
--prism-num-directions      3        Number of directions to discover per layer
--prism-optimization-steps  100      Optimization steps
--prism-learning-rate       0.01     Learning rate

PULSE Parameters

Argument                     Default  Description
--pulse-sensor-layer         auto     Sensor layer index for gating
--pulse-condition-threshold  0.5      Condition threshold for gating
--pulse-gate-temperature     0.1      Gate temperature
--pulse-learn-threshold      True     Learn optimal threshold
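
One common way a threshold and a temperature interact in a soft gate is a temperature-scaled sigmoid. The sketch below shows that general pattern. Whether PULSE computes its gate exactly this way is an assumption, and `soft_gate` is a hypothetical name.

```python
import math


def soft_gate(score, threshold=0.5, temperature=0.1):
    """Return a gate value in (0, 1) for a sensor score.

    ASSUMPTION: the gate is sigmoid((score - threshold) / temperature).
    This illustrates the general pattern only and is not confirmed to
    be PULSE's formula.
    """
    z = (score - threshold) / temperature
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for s in (0.1, 0.5, 0.9):
    print(s, round(soft_gate(s), 3))
```

Under this formulation a lower temperature makes the gate closer to a hard on/off switch around the threshold, while a higher temperature gives a smoother transition.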

TITAN Parameters

Argument                      Default  Description
--titan-num-directions        5        Number of directions per layer
--titan-sensor-layer          auto     Sensor layer for gating
--titan-gate-hidden-dim       auto     Gate network hidden dimension
--titan-intensity-hidden-dim  auto     Intensity network hidden dimension
--titan-max-alpha             3.0      Maximum steering intensity
--titan-gate-temperature      0.5      Gate temperature
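
As a rough mental model of how several directions, a gate, and a capped intensity could combine, consider the sketch below. It is purely illustrative: the function `apply_steering`, its arguments, and the combination rule are assumptions, not TITAN's actual computation. Only the idea of an upper bound on intensity comes from the `--titan-max-alpha` description.

```python
def apply_steering(hidden, directions, weights, gate, intensity, max_alpha=3.0):
    """Add a gated, intensity-capped mix of directions to a hidden state.

    ASSUMPTION: steered = hidden + gate * alpha * sum_i(weights[i] * directions[i]),
    with alpha clamped to [0, max_alpha]. All names are hypothetical.
    """
    alpha = max(0.0, min(intensity, max_alpha))
    combined = [
        sum(w * d[i] for w, d in zip(weights, directions))
        for i in range(len(hidden))
    ]
    return [h + gate * alpha * c for h, c in zip(hidden, combined)]


steered = apply_steering(
    hidden=[0.0, 0.0],
    directions=[[1.0, 0.0], [0.0, 1.0]],
    weights=[1.0, 0.5],
    gate=1.0,
    intensity=10.0,  # exceeds max_alpha, so it is clamped to 3.0
)
print(steered)  # [3.0, 1.5]
```

With `gate=0.0` the hidden state is returned unchanged, which is the behavior a conditional gate is meant to provide on inputs that should not be steered.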

Display Options

Argument   Description
--verbose  Enable verbose output
--timing   Show timing information

Supported Methods

  • CAA - Contrastive Activation Addition (simple mean difference)
  • Hyperplane - Logistic regression-based separating hyperplane
  • MLP - Multi-layer perceptron classifier
  • PRISM - Multi-directional steering with multiple directions per layer
  • PULSE - Conditional gating with sensor layer detection
  • TITAN - Full adaptive steering with learned gate and intensity networks
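
The "simple mean difference" behind CAA can be sketched in a few lines: average the activations of the positive examples, average those of the negative examples, and subtract. The function names below are hypothetical and the toy activations are made up for illustration. The real command reads activations from the enriched pairs file.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def caa_direction(positive_acts, negative_acts):
    """Mean-difference direction: mean(positive) - mean(negative).

    Hypothetical helper illustrating the idea, not wisent's code.
    """
    pos_mean = mean_vector(positive_acts)
    neg_mean = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


# Toy activations for two contrastive pairs at a single layer.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [2.0, 1.0]]
direction = caa_direction(positive, negative)
print(direction)  # [1.0, 2.0]
```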
