Control Vector

A control vector is a vector added to the activations of a particular layer in order to influence those activations and, in turn, the words being generated.

Control vectors are sets of values added to the model's activations at inference time. They act as a bias that steers the model in a particular direction. Wisent-Guard uses an activation aggregation method to create a control vector. Training a control vector is usually compute-intensive and takes a while, especially with a large number of samples. Control vectors are specific to the contrastive pair set, the model, and the layer they were trained on.
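As a rough illustration of the activation-aggregation idea, the sketch below builds a control vector as the mean difference between hidden states of the positive and negative sides of each contrastive pair at a single layer. It uses plain transformers/PyTorch rather than Wisent-Guard's internal API; the model name, layer index, and example pairs are placeholders.

```python
# Minimal sketch of activation aggregation (illustrative only, not
# Wisent-Guard's actual implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
LAYER = 15                                        # placeholder layer index

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def last_token_activation(text: str) -> torch.Tensor:
    """Return the hidden state of the final token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]

# A contrastive pair set: (positive, negative) examples of the target behaviour.
pairs = [
    ("I love helping people with their questions.", "I refuse to help anyone."),
    ("That sounds like a wonderful plan!", "That sounds like a terrible plan."),
]

# Aggregate the per-pair activation differences into a single control vector.
diffs = [last_token_activation(pos) - last_token_activation(neg) for pos, neg in pairs]
control_vector = torch.stack(diffs).mean(dim=0)
```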

By default, with steering enabled, Wisent-Guard performs an end-to-end process of training and steering using an 80/20 split of your contrastive pair set. Wisent-Guard supports both training a control vector and saving it for reuse, and using a pre-trained control vector for inference.
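For intuition, an 80/20 split of a contrastive pair set might look like the snippet below. The pair list and split logic are illustrative and are not Wisent-Guard's own pipeline.

```python
# Illustrative 80/20 split of a contrastive pair set: the first portion would
# train the control vector, the held-out portion would check the steering.
import random

pairs = [
    ("helpful answer", "unhelpful answer"),
    ("polite reply", "rude reply"),
    ("calm response", "angry response"),
    ("honest statement", "deceptive statement"),
    ("safe suggestion", "unsafe suggestion"),
]

random.seed(0)
random.shuffle(pairs)
split = int(0.8 * len(pairs))
train_pairs, eval_pairs = pairs[:split], pairs[split:]
print(f"{len(train_pairs)} training pairs, {len(eval_pairs)} evaluation pairs")
```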

While it is possible to apply steering conditionally or dynamically, Wisent-Guard currently supports only one method of steering: a constant, additive vector applied to all tokens. This will change in the future to allow a more granular level of control over your steering.
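To make "a constant, additive vector on all tokens" concrete, here is a hedged sketch using a PyTorch forward hook. It continues from the earlier sketch (reusing model, tokenizer, control_vector, and LAYER), the steering strength is a placeholder, and this is not how Wisent-Guard is wired internally.

```python
# Sketch of constant, additive steering applied to every token at one layer
# (illustrative; reuses model, tokenizer, control_vector and LAYER from above).
STRENGTH = 4.0  # placeholder steering strength

def steer(module, inputs, output):
    # Decoder layers usually return a tuple whose first element is the hidden
    # states of shape (batch, seq_len, hidden_dim); add the vector everywhere.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + STRENGTH * control_vector.to(hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

# The attribute path to the decoder layers varies by architecture; this one
# matches Llama-style models.
handle = model.model.layers[LAYER].register_forward_hook(steer)
try:
    prompt = tokenizer("Tell me about your week.", return_tensors="pt")
    steered = model.generate(**prompt, max_new_tokens=40)
    print(tokenizer.decode(steered[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later generations are unsteered
```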

Implementation Details

For a complete understanding of how control vectors work in Wisent-Guard, including the full implementation of vector creation, application, and optimization techniques, explore the source code linked at the end of this section.

Control vectors can be saved and loaded so that computation is not necessary every time you want to use them. Wisent-Guard supports saving and loading trained control vectors through the model_persistence.py module, which handles saving and loading of trained models and control vectors. This allows you to compute a control vector once and reuse it multiple times, significantly reducing computational overhead and setup time for repeated applications.
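As a sketch of what reuse can look like, a computed vector can be serialized with plain torch and reloaded later; Wisent-Guard's own persistence logic lives in model_persistence.py, so the file name and dictionary layout below are purely illustrative.

```python
# Illustrative persistence with plain torch serialization (Wisent-Guard's own
# save/load is implemented in model_persistence.py).
import torch

# `control_vector` and `LAYER` as computed in the earlier sketch.
torch.save({"layer": LAYER, "vector": control_vector}, "control_vector.pt")

loaded = torch.load("control_vector.pt")
control_vector, layer = loaded["vector"], loaded["layer"]
```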

View control_vector.py on GitHub