Token Steered

Steering influences activation levels during inference by modulating them according to specific methods which culminate in a control vector that is then applied to a certain layer. Different methods result in varied values for this vector depending on what token is generated. Consequently, when response lengths increase, there is distortion because accumulated effects from modifying activations affect successive tokens. Therefore, we are able to vary interventions so they occur only on selected tokens rather than all; this happens by default. But you can also choose to target just the initial token or selectively decrease intervention strength as generation proceeds through additional tokens.

Wisent comes with several built-in approaches for adjusting the steering token right from the start.

Token targeting strategies showing last_token, all_tokens, exp_decay, and suffix_only options

To fully implement Token Steering method, review the source code.

Stay in the loop. Never miss out.

Subscribe to our newsletter and unlock Wisent insights.