Contrastive Pair Set

A Contrastive Pair Set is a set of contrastive pairs used to extract a representation.

Contrastive pair sets are collections of multiple contrastive pairs. These sets are used to extract activations, and contrasting a large number of these activations approximates what a particular representation looks like. Wisent-Guard supports using existing benchmarks as contrastive pair sets through the tasks argument in the CLI.
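To make the idea of "contrasting activations" concrete, here is a minimal sketch of one common representation-engineering technique, the difference of means: average the per-pair difference between the activations of the two sides of each pair and normalize the result. This is an illustration under stated assumptions, not necessarily the exact method Wisent-Guard uses; the function name mean_difference_direction and the toy activation values are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

Vector = List[float]


def mean_difference_direction(pos_acts: Sequence[Sequence[float]],
                              neg_acts: Sequence[Sequence[float]]) -> Vector:
    """Estimate a representation direction from paired activations (hypothetical helper).

    pos_acts[i] and neg_acts[i] are the activation vectors for the two
    contrasting sides of pair i.
    """
    if len(pos_acts) != len(neg_acts) or not pos_acts:
        raise ValueError("need the same, non-zero number of positive and negative activations")
    dim = len(pos_acts[0])
    total = [0.0] * dim
    for pos, neg in zip(pos_acts, neg_acts):
        for j in range(dim):
            # Per-pair difference: what changes when only the contrasted property changes.
            total[j] += pos[j] - neg[j]
    # Averaging over many pairs lets variation unrelated to the contrast cancel out.
    mean = [t / len(pos_acts) for t in total]
    norm = sqrt(sum(x * x for x in mean))
    # Return a unit-length direction so it can be scaled or compared consistently.
    return [x / norm for x in mean] if norm > 0 else mean


# Toy activations (made up for illustration): the pairs differ mainly along the first axis.
pos = [[1.2, 0.1, 0.0], [0.9, -0.1, 0.2], [1.1, 0.0, -0.1]]
neg = [[0.1, 0.1, 0.0], [0.0, -0.1, 0.1], [0.2, 0.0, 0.0]]
direction = mean_difference_direction(pos, neg)
print([round(x, 3) for x in direction])  # roughly [1.0, 0.0, 0.0]
```

With only a handful of pairs the estimate is noisy, which is why a large set of contrasting activations gives a better approximation of the representation.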

By default, 80% of a contrastive pair set is used to create a representation, train a classifier, or construct a steering vector. The remaining 20% is held out to evaluate the classifier or the steering. As a result, all Wisent-Guard results are reported on samples that were not used for training.
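The default 80/20 split can be sketched as follows. The function split_pairs, its train_fraction and seed parameters, and the choice to shuffle with a fixed seed before cutting are assumptions made for illustration; they are not Wisent-Guard's API, and the library may order or split pairs differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_pairs(pairs: Sequence[T], train_fraction: float = 0.8,
                seed: int = 42) -> Tuple[List[T], List[T]]:
    """Split contrastive pairs into a training part and a held-out test part (hypothetical helper)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)
    # Shuffle with a fixed seed so the same split is reproduced across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    # Training pairs build the representation; test pairs are only used for evaluation.
    return shuffled[:cut], shuffled[cut:]


pairs = [(f"positive {i}", f"negative {i}") for i in range(10)]
train, test = split_pairs(pairs)
print(len(train), len(test))  # 8 2
```

Because the two parts never overlap, any score computed on the test part reflects pairs that played no role in building the classifier or steering vector.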

You can change this split for a particular contrastive pair set to decide how many pairs are used for training and how many for testing. The contrastive pair set primitive also lets you add your own data for representation engineering; your data only needs to conform to the contrastive pair set standard.
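As a rough picture of what "conforming to the contrastive pair set standard" might involve, the sketch below models a pair as a prompt with a positive and a negative response and rejects pairs that cannot express a contrast. The class names Pair and PairSet, their fields, and the validation rules are all hypothetical; the actual schema and checks are defined in Wisent-Guard's contrastive_pair_set.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Pair:
    """One contrastive pair: a prompt answered in two contrasting ways (hypothetical schema)."""
    prompt: str
    positive: str
    negative: str


@dataclass
class PairSet:
    """A named collection of contrastive pairs (hypothetical, not the Wisent-Guard class)."""
    name: str
    pairs: List[Pair] = field(default_factory=list)

    def add(self, prompt: str, positive: str, negative: str) -> None:
        # Every field must carry content for the pair to be usable.
        if not (prompt.strip() and positive.strip() and negative.strip()):
            raise ValueError("prompt, positive and negative must be non-empty")
        # Identical responses carry no contrast, so they cannot define a direction.
        if positive.strip() == negative.strip():
            raise ValueError("positive and negative responses must differ")
        self.pairs.append(Pair(prompt, positive, negative))

    def __len__(self) -> int:
        return len(self.pairs)


pair_set = PairSet("truthfulness")
pair_set.add("Is the Earth flat?",
             "No, the Earth is roughly spherical.",
             "Yes, the Earth is flat.")
print(len(pair_set))  # 1
```

Validating pairs as they are added keeps malformed entries out of both the training and the held-out parts of the set.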

Implementation Details

For a complete understanding of how contrastive pair sets work in Wisent-Guard, including the full implementation of pair collection, validation, and processing logic, explore the source code:

View contrastive_pair_set.py on GitHub