Token Steered - Position-based steering control that applies different intervention strengths to specific token positions during generation.
Token steering addresses a fundamental challenge in activation steering: as models generate longer sequences, the cumulative effects of steering interventions can compound and distort the intended behavior. Standard steering applies the same intervention strength at every token position, which can over-steer longer responses.
Token steering solves this by giving fine-grained control over when, and how strongly, steering interventions are applied. You can target specific token positions (for example, only the first token or only the second-to-last token), apply different strengths to different positions, or use decay and growth patterns that adjust intervention strength automatically as the sequence grows.
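One way to picture position-based control is as a list holding one strength multiplier per generated token. The sketch below is a minimal illustration that assumes simple formulas for each strategy named in this section. The function name `position_strengths` and every formula in it are hypothetical and may differ from what token_steered.py actually computes. The defaults mirror the documented option defaults (decay rate 0.5, minimum strength 0.1, maximum strength 1.0).

```python
def position_strengths(strategy, seq_len, decay_rate=0.5,
                       min_strength=0.1, max_strength=1.0):
    """Return one steering multiplier per token position 0..seq_len-1.

    Hypothetical sketch: strategy names match the --token-steering-strategy
    values, but the formulas are assumptions, not taken from token_steered.py.
    """
    n = seq_len
    if n <= 0:
        return []
    if strategy == "all_equal":
        return [1.0] * n
    if strategy == "first_only":
        return [1.0 if i == 0 else 0.0 for i in range(n)]
    if strategy == "last_only":
        return [1.0 if i == n - 1 else 0.0 for i in range(n)]
    if strategy == "second_to_last":
        # A one-token sequence has no second-to-last position, so all zeros.
        return [1.0 if i == n - 2 else 0.0 for i in range(n)]
    if strategy == "exponential_decay":
        # Assumed: decay_rate ** i, never dropping below min_strength.
        return [max(min_strength, decay_rate ** i) for i in range(n)]
    if strategy == "linear_decay":
        # Assumed: straight line from 1.0 at the first token to min_strength at the last.
        if n == 1:
            return [1.0]
        return [(1 - i / (n - 1)) * 1.0 + (i / (n - 1)) * min_strength for i in range(n)]
    if strategy == "exponential_growth":
        # Assumed: mirror image of exponential decay, reaching max_strength at the last token.
        return [max_strength * decay_rate ** (n - 1 - i) for i in range(n)]
    if strategy == "linear_growth":
        # Assumed: rises in equal steps, reaching max_strength at the last token.
        return [max_strength * (i + 1) / n for i in range(n)]
    raise ValueError(f"unknown strategy: {strategy!r}")


print(position_strengths("exponential_decay", 5))  # [1.0, 0.5, 0.25, 0.125, 0.1]
```

Summing one of these lists gives a rough sense of the total steering applied across a response: under `all_equal` it grows in step with the response length, while the decay schedules concentrate most of it on the earliest tokens.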
This approach enables more precise control over model behavior while preventing the accumulation effects that can make longer generations unstable or overly influenced by the steering vector. Token steering works with any underlying steering method (CAA, BiPO, DAC, HPR, K-Steering) as an additional layer of control.
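To illustrate how position weighting can sit on top of an existing method, here is a minimal sketch. It assumes the underlying method yields a single steering vector that is added to one layer's hidden states, as in CAA-style additive steering; other methods may apply their vectors differently. Plain Python lists stand in for tensors, and `apply_token_steering` is a hypothetical name, not part of wisent_guard.

```python
def apply_token_steering(hidden_states, steering_vector, weights):
    """Add steering_vector to each position, scaled by that position's weight.

    hidden_states: list of seq_len rows, each a list of dim floats.
    weights: one multiplier per position (e.g. from a strategy schedule).
    Hypothetical sketch of additive steering; not the wisent_guard API.
    """
    if len(weights) != len(hidden_states):
        raise ValueError("need exactly one weight per token position")
    steered = []
    for row, w in zip(hidden_states, weights):
        if len(row) != len(steering_vector):
            raise ValueError("hidden size and steering vector size differ")
        # A weight of 0.0 leaves the position untouched; 1.0 applies the full vector.
        steered.append([h + w * v for h, v in zip(row, steering_vector)])
    return steered


hidden = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
# Weights [0.0, 1.0, 0.0] steer only the second-to-last of three positions.
print(apply_token_steering(hidden, [1.0, -1.0], [0.0, 1.0, 0.0]))
# [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]]
```

Keeping the weights separate from the vector is what lets the same position schedule be reused with any method that produces a vector.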
python -m wisent_guard.cli tasks questions.json --from-json --steering-mode --steering-method CAA --layer 15 --load-steering-vector honesty.pt --enable-token-steering --token-steering-strategy first_only
python -m wisent_guard.cli tasks responses.json --from-json --steering-mode --steering-method BiPO --layer 14 --load-steering-vector style.pt --enable-token-steering --token-steering-strategy second_to_last
python -m wisent_guard.cli tasks dialogue.json --from-json --steering-mode --steering-method DAC --layer 16 --load-steering-vector empathy.pt --enable-token-steering --token-steering-strategy all_equal
python -m wisent_guard.cli tasks stories.json --from-json --steering-mode --steering-method HPR --layer 15 --load-steering-vector creativity.pt --enable-token-steering --token-steering-strategy exponential_decay --token-decay-rate 0.8
python -m wisent_guard.cli tasks analysis.json --from-json --steering-mode --steering-method K-Steering --layer 17 --load-steering-vector logic.pt --enable-token-steering --token-steering-strategy linear_growth --token-max-strength 2.0
python -m wisent_guard.cli tasks conversations.json --from-json --steering-mode --steering-method CAA --layer 13 --load-steering-vector politeness.pt --enable-token-steering --token-steering-strategy exponential_decay --token-decay-rate 0.6 --token-min-strength 0.2
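When comparing several strategies on the same task, it can help to generate these commands rather than edit them by hand. The sketch below only assembles argument lists from the flags shown in the examples above. `build_token_steering_cmd` is a hypothetical helper: it does not run the CLI, and it does not check values against what wisent_guard actually accepts.

```python
import shlex


def build_token_steering_cmd(task_file, method, layer, vector_path, strategy,
                             decay_rate=None, min_strength=None, max_strength=None):
    """Hypothetical helper: build an argv list using the flags from the examples."""
    cmd = [
        "python", "-m", "wisent_guard.cli", "tasks", task_file, "--from-json",
        "--steering-mode", "--steering-method", method, "--layer", str(layer),
        "--load-steering-vector", vector_path,
        "--enable-token-steering", "--token-steering-strategy", strategy,
    ]
    # Optional tuning flags are appended only when a value is given.
    if decay_rate is not None:
        cmd += ["--token-decay-rate", str(decay_rate)]
    if min_strength is not None:
        cmd += ["--token-min-strength", str(min_strength)]
    if max_strength is not None:
        cmd += ["--token-max-strength", str(max_strength)]
    return cmd


# Print one command per strategy for the same task and steering vector.
for strategy in ["first_only", "second_to_last", "exponential_decay"]:
    print(shlex.join(build_token_steering_cmd(
        "questions.json", "CAA", 15, "honesty.pt", strategy)))
```

The lists can be passed to `subprocess.run` directly, which avoids shell quoting issues.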
Available strategies:
first_only: Apply steering only to the first generated token
last_only: Apply steering only to the last token position
second_to_last: Target the second-to-last token (most common)
all_equal: Apply equal strength to all positions
exponential_decay: Strong initial steering, exponentially weakening
linear_decay: Linear reduction in steering strength
exponential_growth: Weak initial steering, exponentially strengthening
linear_growth: Linear increase in steering strength
Command-line options:
--enable-token-steering: Enable position-based steering control
--token-steering-strategy: Strategy for applying steering (first_only, last_only, second_to_last, all_equal, exponential_decay, linear_decay, exponential_growth, linear_growth)
--token-decay-rate: Rate of decay for exponential strategies (0.0-1.0, default 0.5)
--token-min-strength: Minimum strength threshold for decay strategies (default 0.1)
--token-max-strength: Maximum strength ceiling for growth strategies (default 1.0)
For the complete implementation of the Token Steered method, explore the source code:
View token_steered.py on GitHub
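If you drive token steering from your own scripts, it can help to collect the options above in one place and reject obviously invalid values early. The sketch below is a hypothetical container: the field defaults follow the documented option defaults, but the class name and the checks (decay rate within 0.0-1.0, minimum not above maximum) are assumptions about sensible inputs, not rules confirmed to be enforced by wisent_guard.

```python
from dataclasses import dataclass

# Strategy names as listed for --token-steering-strategy.
STRATEGIES = {
    "first_only", "last_only", "second_to_last", "all_equal",
    "exponential_decay", "linear_decay", "exponential_growth", "linear_growth",
}


@dataclass
class TokenSteeringConfig:
    """Hypothetical mirror of the token-steering options; not a wisent_guard class."""
    strategy: str
    decay_rate: float = 0.5    # --token-decay-rate default
    min_strength: float = 0.1  # --token-min-strength default
    max_strength: float = 1.0  # --token-max-strength default

    def __post_init__(self):
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        if not 0.0 <= self.decay_rate <= 1.0:
            raise ValueError("decay_rate must be between 0.0 and 1.0")
        # Assumed consistency check between the two strength bounds.
        if self.min_strength > self.max_strength:
            raise ValueError("min_strength must not exceed max_strength")


# Mirrors the K-Steering example: linear growth up to a strength of 2.0.
cfg = TokenSteeringConfig("linear_growth", max_strength=2.0)
```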