synthetic

The synthetic command runs an end-to-end pipeline: it generates contrastive synthetic pairs from a natural-language trait description, extracts model activations for each pair, and trains a steering vector from them. It is a quick way to create a steering vector tailored to a specific behavior.

Basic Usage
python -m wisent synthetic --trait DESCRIPTION [OPTIONS]

Examples

Create Helpfulness Vector
python -m wisent synthetic \
  --trait "responds more helpfully with detailed explanations" \
  --num-pairs 30 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --save-pairs ./pairs/helpfulness.json \
  --output ./vectors/helpfulness.pt

Create Personality Vector
python -m wisent synthetic \
  --trait "speaks like a wise and calm philosopher" \
  --num-pairs 25 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --steering-strength 1.5

Load Existing Pairs
# Use previously generated pairs
python -m wisent synthetic \
  --pairs-file ./pairs/existing.json \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15

Arguments

Pair Source (mutually exclusive)

Argument       Description
--trait        Natural language description of the trait (generates new pairs)
--pairs-file   Path to existing JSON file with contrastive pairs
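
To illustrate what "mutually exclusive" means for these two flags, here is a minimal argparse sketch. It is not taken from the wisent source: the parser name `synthetic-sketch` and the `build_parser` helper are hypothetical, and making one of the two flags required is an assumption.

```python
import argparse


def build_parser():
    """Build a toy parser that accepts --trait or --pairs-file, never both."""
    # "synthetic-sketch" is a hypothetical program name, not the real CLI.
    parser = argparse.ArgumentParser(prog="synthetic-sketch")
    # Assumption: exactly one pair source must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--trait", help="Trait description (generates new pairs)")
    source.add_argument("--pairs-file", help="Existing JSON file with pairs")
    return parser


parser = build_parser()
args = parser.parse_args(["--trait", "speaks like a wise and calm philosopher"])
print(args.trait)       # the trait string
print(args.pairs_file)  # None, because --pairs-file was not given
```

With a parser built this way, passing both flags at once makes argparse print a usage error and exit instead of running.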

Generation Options

Argument       Default   Description
--num-pairs    30        Number of pairs to generate (only with --trait)
--save-pairs   None      Save generated pairs to file

Model & Training

Argument              Default                 Description
--model               Llama-3.1-8B-Instruct   Model name or path
--layer               15                      Layer for activation extraction
--device              auto                    Device (cuda, cpu, mps)
--steering-method     CAA                     Steering method to use
--steering-strength   1.0                     Steering strength for testing
--test-questions      5                       Number of test questions for evaluation
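
As a rough intuition for --steering-strength, activation-addition methods such as CAA typically add the steering vector, scaled by a strength factor, to the hidden state at the chosen layer. The sketch below assumes that formulation. The `apply_steering` helper and the toy numbers are hypothetical, and whether wisent applies the strength in exactly this way is an assumption.

```python
def apply_steering(hidden_state, steering_vector, strength=1.0):
    """Return hidden_state + strength * steering_vector, element-wise."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * v for h, v in zip(hidden_state, steering_vector)]


# Toy 3-dimensional hidden state and steering vector (made-up numbers).
hidden = [0.5, -1.0, 2.0]
vector = [1.0, 0.0, -2.0]

steered = apply_steering(hidden, vector, strength=1.5)
print(steered)
```

Under this formulation a strength of 0 leaves the hidden state unchanged, and larger values push it further along the steering direction.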

Nonsense Detection

Argument                      Default      Description
--enable-nonsense-detection   false        Enable nonsense detection
--nonsense-action             regenerate   Action on nonsense (regenerate, stop, flag)

Pipeline Steps

  1. Generate Pairs - Creates positive/negative response pairs from the trait description
  2. Extract Activations - Gets model activations for each pair at the specified layer
  3. Compute Direction - Calculates the steering direction (mean of the positive activations minus mean of the negative activations)
  4. Test Vector - Evaluates the steering vector on test questions
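
The direction computation in step 3 can be sketched in plain Python as a difference of means. This is an illustration under stated assumptions, not wisent's implementation: the helper names `mean_vector` and `steering_direction` are hypothetical, the activations are made-up toy values, and a real run would use the model's hidden states at the layer given by --layer.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def steering_direction(positive, negative):
    """Difference of means: mean(positive) - mean(negative)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


# Toy 3-dimensional activations for two contrastive pairs (made-up numbers).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
negative_acts = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]

direction = steering_direction(positive_acts, negative_acts)
print(direction)  # [1.0, 1.0, 1.0]
```

Averaging over many pairs is what lets prompt-specific variation cancel out, which is one reason more pairs (--num-pairs) can give a cleaner direction.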
