Common Issues

Representation engineering is hard to get right, and results often do not match a specific need perfectly. If you run into difficulties, here are some tips, along with a checklist of points to consider before deploying your tool.

Using an unoptimised layer

In our internal testing on both Llama 3.1 8B and Mistral 7B, layer 15 of 32 performed best. The best layer differs between models, however, so reusing layer 15 on another model may cause the classifier and the steering to stop working properly. Check your activation extraction carefully to confirm that it uses the layer calibrated for that specific model.
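One way to guard against silently reusing the wrong layer is to keep the calibrated layer next to the model it was calibrated for, and to refuse to run on anything else. The sketch below is illustrative only: `CALIBRATED_LAYERS`, the model keys, and `resolve_layer` are hypothetical names, not part of any Wisent API. The only values taken from the text above are layer 15 of 32 for Llama 3.1 8B and Mistral 7B.

```python
# Hypothetical lookup table; the keys and structure are assumptions for illustration.
# The layer values come from the internal testing described above.
CALIBRATED_LAYERS = {
    "llama-3.1-8b": {"layer": 15, "num_layers": 32},
    "mistral-7b": {"layer": 15, "num_layers": 32},
}


def resolve_layer(model_key: str, num_layers: int) -> int:
    """Return the calibrated layer for a model, or fail loudly if none exists."""
    entry = CALIBRATED_LAYERS.get(model_key)
    if entry is None:
        # No calibration recorded: do not fall back to layer 15 by default.
        raise KeyError(f"No calibrated layer recorded for {model_key!r}")
    if entry["num_layers"] != num_layers:
        # The loaded model has a different depth than the one that was calibrated.
        raise ValueError(
            f"{model_key!r} was calibrated with {entry['num_layers']} layers, "
            f"but the loaded model has {num_layers}"
        )
    return entry["layer"]
```

Calling `resolve_layer("llama-3.1-8b", 32)` returns the calibrated layer, while an unlisted model raises an error instead of quietly inheriting a layer that was tuned for a different architecture.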

Using a steering strength that is too large

It can be tempting to steer the model as hard as possible toward better performance, but a high steering strength frequently strips the model of its character, or 'lobotomises' it. The symptoms are easy to spot: when you apply a control vector with a large strength, the output becomes more likely to contain nonsense tokens and repetition. Be especially careful as you increase the steering strength.
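To make the effect of steering strength concrete, here is a minimal sketch under two assumptions that the text above does not confirm: that steering is additive (the control vector is scaled by the strength and added to the hidden state), and that repetition can be approximated by the fraction of n-grams that occur more than once. Both `steer` and `repeated_ngram_fraction` are hypothetical helper names, and plain Python lists stand in for real activation tensors.

```python
from collections import Counter


def steer(hidden, control_vector, strength):
    """Additive steering (an assumed formulation): hidden + strength * control_vector."""
    if len(hidden) != len(control_vector):
        raise ValueError("hidden state and control vector must have the same size")
    return [h + strength * v for h, v in zip(hidden, control_vector)]


def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams in the output that occur more than once.

    A rough, assumed proxy for the repetition that over-steering tends to produce.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)
```

A possible workflow is to generate the same prompts at several strengths and watch `repeated_ngram_fraction` on the decoded tokens: a sharp rise as the strength grows suggests the strength has passed the point where the model stays coherent.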

Troubleshooting Checklist

1. Have I chosen an open-source model?
2. Am I aware of which layer(s) of the model I should be using for my task?
3. Am I capturing the activations from a specific contrastive pair set?
4. What is the specific prompt the activations are collected from?
5. What token are the activations captured from?
6. What kind of classifier am I training on those activations?
7. How effective is the classifier I have trained?
8. How am I evaluating the ground truth of the classifier?
9. How am I using the activations to create a control vector?
10. What steering method am I using? Which tokens are steered?
11. How effective is the steering? Are my responses being lobotomised?
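Several of the checklist questions (contrastive pairs, classifier, control vector) can be made concrete with a small sketch. It assumes one common recipe, the difference of mean activations between the positive and negative sides of the contrastive pairs, and a simple threshold on the projection onto that direction as the classifier. This is not necessarily the method used here; all function names are hypothetical, and plain lists stand in for activations captured at a chosen layer and token.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def control_vector(positive, negative):
    """Difference of means: mean(positive activations) - mean(negative activations)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def projection(activation, direction):
    """Dot product of an activation with the control direction."""
    return sum(a * d for a, d in zip(activation, direction))


def accuracy(positive, negative, direction, threshold=0.0):
    """Share of examples placed on the correct side of the threshold."""
    correct = sum(projection(a, direction) > threshold for a in positive)
    correct += sum(projection(a, direction) <= threshold for a in negative)
    return correct / (len(positive) + len(negative))
```

When using something like this, measure `accuracy` on contrastive pairs that were held out from `control_vector`, since a score computed on the same pairs used to build the direction says little about how the classifier behaves on new prompts.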
