Common Issues

Using an unoptimised layer

From our internal testing, for both Llama 3.1 8B and Mistral-7B, the best-performing layer is layer 15 out of 32. However, if you reuse layer 15 in other models, your classifiers and steering will likely not work correctly: each model is different and has a different number of layers. Make sure you are extracting activations from the best-performing layer we calibrated for your model.
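To make the indexing concrete, here is a minimal sketch of capturing a single layer's activations, assuming a Hugging Face transformers causal LM. The model name and prompt are illustrative; only the layer-indexing pattern is the point.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # illustrative; swap in your model
    LAYER = 15  # calibrated for Llama 3.1 8B / Mistral-7B; re-calibrate per model

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer("The weather today is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # out.hidden_states is a tuple of num_layers + 1 tensors: index 0 is the
    # embedding output, index i is the residual stream after block i.
    activation = out.hidden_states[LAYER][0, -1]  # last-token activation at layer 15
    print(activation.shape)  # (hidden_size,)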

Using a steering strength that is too large

While it might be tempting to steer the model strongly in the direction of better performance, a steering strength that is too large often ends up lobotomising the model. With an outsized strength (try a large coefficient on your favourite control vector and see for yourself), the probability of nonsensical and repetitive tokens increases. Be careful with too much steering in those cases!
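One practical guard is to sweep the strength and inspect the generations as they degrade. Below is a minimal sketch, assuming the model, tokenizer, and LAYER from the sketch above, plus a control_vector (one way to build it is sketched after the checklist below). The hook placement, the alpha values, and the module path model.model.layers (Llama/Mistral naming) are illustrative assumptions.

    import torch

    def make_steering_hook(vector, alpha):
        def hook(module, inputs, output):
            # Decoder layers usually return a tuple; hidden states come first.
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden + alpha * vector.to(hidden.device, hidden.dtype)
            return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
        return hook

    def distinct_ratio(ids):
        # Share of unique tokens in a generation; very low values suggest loops.
        return len(set(ids)) / max(len(ids), 1)

    prompt = tokenizer("I am feeling", return_tensors="pt")
    # layers[i] emits hidden_states[i + 1], hence the off-by-one below.
    layer_module = model.model.layers[LAYER - 1]

    for alpha in (0.5, 1.0, 2.0, 4.0, 8.0):
        handle = layer_module.register_forward_hook(make_steering_hook(control_vector, alpha))
        try:
            # The hook fires on every forward pass, so every generated token is steered.
            out_ids = model.generate(**prompt, max_new_tokens=40, do_sample=False)
        finally:
            handle.remove()
        new_ids = out_ids[0][prompt["input_ids"].shape[1]:].tolist()
        text = tokenizer.decode(new_ids, skip_special_tokens=True)
        print(f"alpha={alpha} distinct={distinct_ratio(new_ids):.2f}: {text!r}")

As alpha grows past the useful range, the distinct-token ratio typically collapses and the text degrades into repetition; back off before that point.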

Representation engineering is difficult to perform properly, and sometimes the results simply do not work for your specific use case. If you are running into issues, here is a checklist to work through before deploying your tool.

Troubleshooting Checklist

  1. Have I chosen an open source model?
  2. Do I know which layer(s) of the model I should be using for my task?
  3. Am I capturing the activations from a specific contrastive pair set? (See the first sketch after this list.)
  4. What specific prompt are the activations collected from?
  5. What token are the activations captured from?
  6. What kind of classifier am I training on those activations?
  7. How effective is the classifier I have trained?
  8. How am I evaluating the ground truth of the classifier?
  9. How am I using the activations to create a control vector? (See the second sketch after this list.)
  10. Which steering method am I using? Which tokens are steered?
  11. How effective is the steering? Are my responses getting lobotomised?
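For items 3-8, here is a minimal sketch of capturing last-token activations from a contrastive pair set and probing them with a linear classifier. It assumes the model, tokenizer, and LAYER from the sketches above; the prompt template, the last-token choice, and the logistic regression probe are illustrative assumptions, and a real run needs far more pairs than this.

    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Item 3: a contrastive pair set -- identical suffixes, opposite persona.
    suffixes = ("I got the job.", "It is raining.", "The meeting ran long.", "Dinner is ready.")
    positive = [f"Act extremely happy. {s}" for s in suffixes]
    negative = [f"Act extremely sad. {s}" for s in suffixes]

    def last_token_activation(text):
        # Items 4-5: activations come from this exact prompt, at the final token.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        return out.hidden_states[LAYER][0, -1].float().numpy()

    X = np.stack([last_token_activation(t) for t in positive + negative])
    y = np.array([1] * len(positive) + [0] * len(negative))

    # Items 6-8: train a linear probe and score it on held-out pairs; the
    # persona label baked into each prompt serves as the ground truth.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))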
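And for item 9, one common recipe (not the only one) is the difference of mean activations between the two sides of the contrastive set, reusing X and y from the sketch above; PCA over per-pair differences is a frequent alternative.

    import torch

    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    control_vector = torch.from_numpy(mu_pos - mu_neg)
    # Normalise so the steering strength alpha carries all the scale.
    control_vector = control_vector / control_vector.norm()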