Synthetic Pair Generation

Synthetic generators produce high-quality contrastive pairs: diverse prompts, each paired with a positive and a negative response, so that the contrast for a specific behavioral trait is clearly delineated. Generation is driven by a trait descriptor, and the emphasis is on pair quality and prompt diversity. The generator also recasts sentences so that the output is harder for automated AI detection systems to flag.

How Synthetic Generation Works

Given a trait description such as "honest" or "helpful", the contrastive pair generator uses a large language model (LLM) to produce varied prompts. For each prompt it generates, side by side, a positive response that exhibits the trait and a negative response that exhibits its opposite or absence.

Quality control is applied at several stages. After the contrastive pairs are generated automatically, the outputs are post-processed to remove artifacts. Diversity checks and similarity screening then drop near-duplicate samples to reduce redundancy. The resulting pairs are formatted so that vector extraction can be run on them directly. Obfuscation techniques are also applied to make the generated content harder for automated systems to detect.
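
The similarity screening step can be illustrated with a minimal sketch. It assumes pairs are dicts with a "question" key and uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure; the function name filter_similar is hypothetical, and the metric Wisent actually uses is not stated here.

```python
from difflib import SequenceMatcher


def filter_similar(pairs, max_similarity=0.9):
    """Greedily keep pairs whose question is not too close to any kept question.

    Assumption: similarity is a character-level ratio in [0, 1]; the real
    generator may use a different measure (for example, embeddings).
    """
    kept = []
    for pair in pairs:
        is_duplicate = any(
            SequenceMatcher(None, pair["question"], other["question"]).ratio()
            > max_similarity
            for other in kept
        )
        if not is_duplicate:
            kept.append(pair)
    return kept


pairs = [
    {"question": "How should I respond to criticism?", "positive": "...", "negative": "..."},
    {"question": "How should I respond to criticism??", "positive": "...", "negative": "..."},
    {"question": "What is the capital of France?", "positive": "...", "negative": "..."},
]
unique = filter_similar(pairs, max_similarity=0.9)
print(len(unique))
```

The greedy pass compares each candidate against everything already kept, which is quadratic in the number of pairs; that is usually acceptable for a few hundred pairs.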

The generator supports several input modes: a trait description, a template, or seed examples. A trait description names the behavior to target; a template is an editable prompt pattern; seed examples provide starting material that the generator expands. Choose whichever mode fits your use case.

Pair Format

Generated pairs follow the standard contrastive pair format:

{
  "question": "How should I respond to criticism?",
  "positive": "Listen carefully and consider if the feedback is valid...",
  "negative": "Dismiss the criticism and attack the person giving it..."
}

Positive responses show the target trait, while negative responses show the opposite behavior.
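
Before using a generated file, it can help to check that every pair has the three fields shown above. The following is a minimal sketch, assuming the output is a JSON list of such objects (the file layout is an assumption, and validate_pair is a hypothetical helper, not part of the Wisent API).

```python
import json

REQUIRED_KEYS = ("question", "positive", "negative")


def validate_pair(pair):
    """Return a list of problems found in one pair (empty list means valid)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = pair.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    # A pair with identical responses carries no contrast.
    if not problems and pair["positive"].strip() == pair["negative"].strip():
        problems.append("positive and negative responses are identical")
    return problems


# Inline JSON stands in for the contents of a generated pairs file.
raw = """
[
  {"question": "How should I respond to criticism?",
   "positive": "Listen carefully and consider if the feedback is valid...",
   "negative": "Dismiss the criticism and attack the person giving it..."},
  {"question": "Is it okay to guess?", "positive": "", "negative": "Sure."}
]
"""
pairs = json.loads(raw)
for index, pair in enumerate(pairs):
    print(index, validate_pair(pair))
```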

CLI Examples

Generate pairs from trait description

python -m wisent.cli synthetic "honest and truthful" --num-pairs 50 --output honesty_pairs.json

Generate with opposite trait specified

python -m wisent.cli synthetic "helpful and supportive" --opposite-trait "dismissive and unhelpful" --num-pairs 100 --output helpful_pairs.json

Generate with domain focus

python -m wisent.cli synthetic "provides accurate medical information" --domain medical --num-pairs 75 --output medical_accuracy_pairs.json

Generate with quality filters

python -m wisent.cli synthetic "respectful communication" --num-pairs 200 --min-length 50 --max-similarity 0.8 --output respectful_pairs.json

Generate from seed examples

python -m wisent.cli synthetic --from-examples seed_pairs.json --expand-factor 5 --output expanded_pairs.json

Python API

Basic synthetic generation

from wisent.core.synthetic import SyntheticContrastivePairsGenerator

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",  # or local model
    trait="honest and truthful"
)

pairs = generator.generate(num_pairs=50)

# Save to file
generator.save(pairs, "honesty_pairs.json")

Advanced generation with options

from wisent.core.synthetic import SyntheticContrastivePairsGenerator

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",
    trait="helpful and informative",
    opposite_trait="dismissive and vague",  # Optional: specify opposite
    domain="customer service",  # Optional: focus domain
    temperature=0.8,  # Diversity control
    max_workers=4  # Parallel generation
)

pairs = generator.generate(
    num_pairs=100,
    min_length=30,
    max_similarity=0.85,  # Filter similar pairs
    validate=True  # Run quality checks
)

print(f"Generated {len(pairs)} pairs")
for p in pairs[:3]:
    print(f"Prompt: {p['question'][:50]}...")
    print(f"Positive: {p['positive'][:50]}...")
    print(f"Negative: {p['negative'][:50]}...")
    print()

Expanding existing pairs

from wisent.core.synthetic import SyntheticContrastivePairsGenerator
import json

# Load seed pairs
with open("seed_pairs.json") as f:
    seeds = json.load(f)

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",
    seed_examples=seeds
)

# Generate variations
expanded = generator.expand(
    expand_factor=5,  # 5x the seed count
    preserve_style=True
)

print(f"Expanded {len(seeds)} seeds to {len(expanded)} pairs")

Parameters

Generation Parameters

--num-pairs

Number of pairs to generate (default: 50)

--opposite-trait

Explicit opposite trait (auto-generated if not specified)

--domain

Focus domain for prompts (e.g., medical, legal, technical)

--temperature

Generation temperature for diversity (default: 0.7)

Quality Filters

--min-length

Minimum response length in characters (default: 20)

--max-length

Maximum response length in characters (default: 500)

--max-similarity

Maximum similarity between pairs (default: 0.9)

--validate

Run quality validation on generated pairs
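
As a rough illustration of the length filters, the sketch below applies the documented defaults (20 and 500 characters) to both responses of each pair. Whether the real filter checks both responses, and whether the bounds are inclusive, are assumptions; within_length is a hypothetical helper.

```python
def within_length(pair, min_length=20, max_length=500):
    """True when both responses fall inside the character bounds (inclusive)."""
    return all(
        min_length <= len(pair[key]) <= max_length
        for key in ("positive", "negative")
    )


pairs = [
    # Dropped: the positive response is only 4 characters long.
    {"question": "Q1", "positive": "Yes.", "negative": "No, and I will not explain why."},
    # Kept: both responses are between 20 and 500 characters.
    {"question": "Q2",
     "positive": "Listen carefully and consider if the feedback is valid.",
     "negative": "Dismiss the criticism and attack the person giving it."},
]
kept = [p for p in pairs if within_length(p)]
print([p["question"] for p in kept])
```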

Expansion Parameters

--from-examples

Path to seed examples JSON file

--expand-factor

Multiplier for expansion (default: 3)

--preserve-style

Maintain style of seed examples (default: true)

Performance

--max-workers

Parallel workers for generation (default: 1)

--batch-size

Batch size for generation (default: 10)

--output

Output file path (default: pairs.json)
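
To show how --batch-size and --max-workers might interact, here is a minimal sketch that splits a request into batches and runs them on a thread pool. The function generate_batch is a hypothetical stand-in for one LLM call, and the batching scheme itself is an assumption rather than Wisent's confirmed behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(num_pairs, batch_size):
    """Return batch sizes that sum to num_pairs, e.g. 25 and 10 -> [10, 10, 5]."""
    full, remainder = divmod(num_pairs, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


def generate_batch(count):
    """Hypothetical stand-in for one LLM call that returns `count` pairs."""
    return [{"question": "...", "positive": "...", "negative": "..."} for _ in range(count)]


def generate_parallel(num_pairs, batch_size=10, max_workers=1):
    """Generate num_pairs pairs by running batches on a thread pool."""
    batches = split_into_batches(num_pairs, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the submitted batches.
        results = list(pool.map(generate_batch, batches))
    return [pair for batch in results for pair in batch]


pairs = generate_parallel(num_pairs=25, batch_size=10, max_workers=4)
print(len(pairs))
```

Threads suit this workload because each batch spends most of its time waiting on a model response rather than computing.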

For the complete implementation of synthetic pair generation in Wisent, see:

View pairs_generator.py on GitHub
