DAC

DAC (Dynamic Activation Composition) is a steering method that adjusts steering strength based on model uncertainty and supports steering in multiple directions at once.

How DAC Works

DAC uses the same training process as CAA: it computes the difference between the average activations of the positive and negative examples to create a steering vector. It can steer in multiple directions at once, towards several properties at the same time, and it optimises the scale of steering.
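The difference-of-means step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Wisent-Guard implementation: the helper names `mean_vector` and `steering_vector` are hypothetical, and activations are assumed to be given as lists of equal-length float vectors, one per example.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / n for i in range(dim)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Mean positive activation minus mean negative activation (hypothetical helper)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


# Toy activations: two examples per class, two hidden dimensions.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [2.0, 1.0]]
vec = steering_vector(positive, negative)
```

In practice the activations would be read from the chosen layer (for example the one passed via --layer) rather than typed in by hand.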

During inference, however, DAC adds dynamic control: it adjusts the steering strength based on the model's uncertainty about what to generate next. When token probabilities are available, it uses the entropy of the probability distribution to modulate steering strength (high entropy means high uncertainty, so steering is reduced). When probabilities aren't available, it falls back to the variance of the current activations as a proxy for uncertainty.

The dynamic strength is computed as base_strength × (1 - normalized_uncertainty), so the model is steered more strongly when it is confident and less strongly when it is uncertain. Like other methods, DAC targets the second-to-last token position and can save and load the steering vector for reuse.
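The strength formula can be illustrated with a short sketch. The normalisation choices here are assumptions, since the text does not specify them: entropy is divided by log(n) so that it lies in [0, 1], and the variance fallback is squashed with v / (1 + v). All function names are hypothetical, and the sketch does not model how --dac-sensitivity or --dac-threshold enter the calculation.

```python
import math
from typing import Optional, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(n)


def variance_uncertainty(activations: Sequence[float]) -> float:
    """Fallback proxy (assumed form): population variance squashed into [0, 1)."""
    mean = sum(activations) / len(activations)
    var = sum((a - mean) ** 2 for a in activations) / len(activations)
    return var / (1.0 + var)


def dynamic_strength(
    base_strength: float,
    probs: Optional[Sequence[float]] = None,
    activations: Optional[Sequence[float]] = None,
) -> float:
    """base_strength * (1 - normalized_uncertainty), preferring token probabilities."""
    if probs is not None:
        uncertainty = normalized_entropy(probs)
    elif activations is not None:
        uncertainty = variance_uncertainty(activations)
    else:
        uncertainty = 0.0  # nothing to measure: apply the full base strength
    uncertainty = min(max(uncertainty, 0.0), 1.0)
    return base_strength * (1.0 - uncertainty)


confident = dynamic_strength(1.2, probs=[1.0, 0.0, 0.0, 0.0])
uncertain = dynamic_strength(1.2, probs=[0.25, 0.25, 0.25, 0.25])
```

With a one-hot distribution the full base strength is applied, while a uniform distribution drives the strength to approximately zero.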

CLI Examples

# Basic DAC training

python -m wisent_guard.cli tasks caution_pairs.json --from-json --steering-mode --steering-method DAC --layer 15 --save-steering-vector caution_dac.pt

# DAC with dynamic control enabled

python -m wisent_guard.cli tasks empathy_pairs.json --from-json --steering-mode --steering-method DAC --layer 14 --dac-dynamic-control --save-steering-vector empathy_dac.pt

# DAC inference with custom sensitivity

python -m wisent_guard.cli tasks advice_questions.json --from-json --steering-mode --steering-method DAC --layer 15 --load-steering-vector caution_dac.pt --dac-sensitivity 0.3 --steering-strength 1.2

# DAC with linear growth token strategy

python -m wisent_guard.cli tasks reasoning_pairs.json --from-json --steering-mode --steering-method DAC --layer 16 --enable-token-steering --token-steering-strategy linear_growth --save-steering-vector reasoning_dac.pt

# DAC training with threshold control

python -m wisent_guard.cli tasks nuance_pairs.json --from-json --steering-mode --steering-method DAC --layer 13 --dac-threshold 0.6 --allow-small-dataset --save-steering-vector nuance_dac.pt

Parameters

DAC Specific Parameters

--dac-dynamic-control: Enable uncertainty-based steering adjustment
--dac-sensitivity: How sensitive steering is to uncertainty (0.0-1.0, default 0.2)
--dac-threshold: Uncertainty threshold for steering (0.0-1.0, default 0.5)

Token Steering Parameters

--enable-token-steering: Enable position-based steering
--token-steering-strategy: last_only, second_to_last, first_only, all_equal, exponential_decay, exponential_growth, linear_decay, linear_growth
--token-decay-rate: Decay rate for exponential strategies (default 0.5)
--token-min-strength: Minimum strength for decay strategies (default 0.1)
--token-max-strength: Maximum strength for growth strategies (default 1.0)
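One plausible reading of the token steering strategies is a per-position weight applied to the steering vector. The sketch below is an assumption about their semantics, not the Wisent-Guard implementation: the function `token_weights` is hypothetical, decay strategies are assumed to start at the maximum strength on the first token and fall towards the minimum, and growth strategies are assumed to be the mirror image.

```python
from typing import List


def token_weights(
    n: int,
    strategy: str,
    decay_rate: float = 0.5,
    min_strength: float = 0.1,
    max_strength: float = 1.0,
) -> List[float]:
    """Per-position steering weights for a sequence of n tokens (assumed semantics)."""
    if n <= 0:
        return []
    if strategy == "last_only":
        return [0.0] * (n - 1) + [1.0]
    if strategy == "second_to_last":
        target = max(n - 2, 0)  # fall back to the only token when n == 1
        return [1.0 if i == target else 0.0 for i in range(n)]
    if strategy == "first_only":
        return [1.0] + [0.0] * (n - 1)
    if strategy == "all_equal":
        return [1.0] * n
    if strategy in ("linear_growth", "linear_decay"):
        if n == 1:
            ramp = [max_strength]
        else:
            step = (max_strength - min_strength) / (n - 1)
            ramp = [min_strength + step * i for i in range(n)]
        return ramp if strategy == "linear_growth" else ramp[::-1]
    if strategy in ("exponential_decay", "exponential_growth"):
        # Geometric fall-off from max_strength, floored at min_strength.
        decay = [max(max_strength * decay_rate ** i, min_strength) for i in range(n)]
        return decay if strategy == "exponential_decay" else decay[::-1]
    raise ValueError(f"unknown strategy: {strategy}")


growth = token_weights(3, "linear_growth")
decay = token_weights(5, "exponential_decay")
```

Under these assumptions, linear_growth over three tokens ramps from the minimum to the maximum strength, and exponential_decay halves the weight at each position until it reaches the floor set by the minimum strength.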

Implementation Details

For the complete implementation of the DAC steering method in Wisent-Guard, see:

dac.py
Original Paper
Original Implementation
