synthetic

The synthetic command runs an end-to-end pipeline: it generates contrastive synthetic pairs from a natural-language trait description, extracts activations for those pairs, and trains a steering vector from them. This makes it quick to create a steering vector tailored to a desired behavior.

Basic Usage

python -m wisent synthetic --trait DESCRIPTION [OPTIONS]

Examples

Create Helpfulness Vector

python -m wisent synthetic \
  --trait "responds more helpfully with detailed explanations" \
  --num-pairs 30 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --save-pairs ./pairs/helpfulness.json \
  --output ./vectors/helpfulness.pt

Create Personality Vector

python -m wisent synthetic \
  --trait "speaks like a wise and calm philosopher" \
  --num-pairs 25 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15 \
  --steering-strength 1.5

Load Existing Pairs

# Use previously generated pairs
python -m wisent synthetic \
  --pairs-file ./pairs/existing.json \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layer 15

Arguments

Pair Source (mutually exclusive)

Argument	Description
--trait	Natural language description of the trait (generates new pairs)
--pairs-file	Path to existing JSON file with contrastive pairs
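
The mutual exclusion between the two pair sources can be illustrated with Python's standard argparse module. This is a minimal sketch, not Wisent's actual parser: the program name `synthetic-sketch` and the `build_parser` helper are hypothetical, and treating one of the two sources as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the documented flags; not Wisent's source.
    parser = argparse.ArgumentParser(prog="synthetic-sketch")

    # Exactly one pair source may be given (required=True is an assumption).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--trait", help="trait description; generates new pairs")
    source.add_argument("--pairs-file", help="path to an existing JSON pairs file")

    # Documented default of 30; only meaningful together with --trait.
    parser.add_argument("--num-pairs", type=int, default=30)
    return parser


parser = build_parser()
args = parser.parse_args(["--trait", "speaks like a calm philosopher"])
print(args.trait, args.pairs_file, args.num_pairs)
```

With this setup, passing both --trait and --pairs-file makes argparse print a usage error and exit instead of silently picking one source.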

Generation Options

Argument	Default	Description
--num-pairs	30	Number of pairs to generate (only with --trait)
--save-pairs	None	Save generated pairs to file

Model & Training

Argument	Default	Description
--model	Llama-3.1-8B-Instruct	Model name or path
--layer	15	Layer for activation extraction
--device	auto	Device (cuda, cpu, mps)
--steering-method	CAA	Steering method to use
--steering-strength	1.0	Steering strength for testing
--test-questions	5	Number of test questions for evaluation
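
To give an intuition for --steering-strength, the sketch below assumes the common additive formulation, in which the steering vector is scaled by the strength and added to a hidden state. The function `apply_steering` and the toy numbers are hypothetical; how Wisent applies the vector internally is not stated here.

```python
from typing import List, Sequence


def apply_steering(
    hidden: Sequence[float],
    vector: Sequence[float],
    strength: float = 1.0,
) -> List[float]:
    """Return hidden + strength * vector, element by element (assumed formulation)."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional values (made up for illustration).
hidden_state = [0.5, -1.0, 2.0]
steering_vector = [1.0, 0.0, -1.0]
steered = apply_steering(hidden_state, steering_vector, strength=1.5)
print(steered)  # [2.0, -1.0, 0.5]
```

Under this assumption, a strength of 0 leaves the hidden state unchanged, and larger values push it further along the steering direction.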

Nonsense Detection

Argument	Default	Description
--enable-nonsense-detection	false	Enable nonsense detection
--nonsense-action	regenerate	Action on nonsense (regenerate, stop, flag)

Pipeline Steps

1. Generate Pairs - Creates positive/negative response pairs from the trait description
2. Extract Activations - Gets model activations for each pair at the specified layer
3. Compute Direction - Calculates the steering direction as the mean positive activation minus the mean negative activation
4. Test Vector - Evaluates the steering vector on test questions
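
The Compute Direction step can be sketched in a few lines of plain Python, assuming the difference-of-means formulation described above. The function `mean_difference_direction` and the toy activations are hypothetical and stand in for real layer activations; this is an illustration, not Wisent's implementation.

```python
from typing import List, Sequence


def mean_difference_direction(
    positive: Sequence[Sequence[float]],
    negative: Sequence[Sequence[float]],
) -> List[float]:
    """Return mean(positive) - mean(negative), computed per dimension."""
    if not positive or not negative:
        raise ValueError("both activation sets must be non-empty")
    dim = len(positive[0])
    if any(len(v) != dim for v in list(positive) + list(negative)):
        raise ValueError("all activation vectors must share one dimension")

    def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
        # Average each coordinate across all vectors.
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos_mean = mean(positive)
    neg_mean = mean(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


# Toy 3-dimensional "activations" for two pairs (made-up numbers).
positive_acts = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
negative_acts = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
direction = mean_difference_direction(positive_acts, negative_acts)
print(direction)  # [1.0, 1.0, 1.0]
```

In practice the activations would be tensors taken from the layer selected with --layer, but the arithmetic is the same.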

Related Commands

generate-pairs - Generate pairs only (without training)
create-steering-vector - Create vector from existing pairs
verify-steering - Verify vector effectiveness
