Synthetic Pair Generation

Synthetic generators produce high-quality contrastive pairs from a trait description. They create diverse prompts, each paired with a positive and a negative response, so that the contrast for a specific behavioral trait is clearly delineated.

How Synthetic Generation Works

Given a trait description such as "honest" or "helpful", the synthetic contrastive pair generator uses a large language model (LLM) to produce varied prompts. For each prompt it generates two responses side by side: a positive example that exhibits the trait and a negative example that exhibits its opposite.
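The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `build_generation_prompt`, `generate_pair`, and the stand-in `fake_llm` are hypothetical names, the instruction wording is invented, and none of it is taken from the Wisent implementation.

```python
import json
from typing import Callable, Dict


def build_generation_prompt(trait: str, opposite_trait: str, topic: str) -> str:
    """Assemble an instruction asking an LLM for one contrastive pair (hypothetical wording)."""
    return (
        f"Write a user question about {topic}. Reply with a JSON object that has "
        f'the keys "question", "positive" and "negative". The positive answer '
        f"should be {trait}; the negative answer should be {opposite_trait}."
    )


def generate_pair(
    llm: Callable[[str], str], trait: str, opposite_trait: str, topic: str
) -> Dict[str, str]:
    """Ask the LLM for one pair and parse its JSON reply."""
    raw_reply = llm(build_generation_prompt(trait, opposite_trait, topic))
    return json.loads(raw_reply)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return json.dumps(
        {
            "question": "How should I respond to criticism?",
            "positive": "Listen carefully and consider if the feedback is valid.",
            "negative": "Dismiss the criticism and attack the person giving it.",
        }
    )


pair = generate_pair(fake_llm, "honest and truthful", "evasive", "workplace feedback")
print(pair["question"])
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client.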

Quality control is applied at several stages of generation: contrastive responses are generated automatically, generation artifacts are stripped from the outputs, and diversity checks with similarity filtering remove near-duplicate samples. The resulting pairs are formatted so that they can be used directly for vector extraction.
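As a rough illustration of the similarity screening step, the sketch below drops any pair whose question is too close to one already kept. It assumes a character-level ratio from Python's standard `difflib`; the metric Wisent actually uses is not stated here, and `filter_similar` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_similar(pairs: List[Dict[str, str]], max_similarity: float = 0.9) -> List[Dict[str, str]]:
    """Keep a pair only if its question is not too similar to any kept question."""
    kept: List[Dict[str, str]] = []
    for pair in pairs:
        if all(similarity(pair["question"], k["question"]) <= max_similarity for k in kept):
            kept.append(pair)
    return kept


candidates = [
    {"question": "How should I respond to criticism?", "positive": "a", "negative": "b"},
    {"question": "How should I respond to criticism?", "positive": "c", "negative": "d"},
    {"question": "What is the capital of France?", "positive": "e", "negative": "f"},
]
unique = filter_similar(candidates, max_similarity=0.9)
print(len(unique))
```

The pairwise comparison is quadratic in the number of pairs, which is acceptable for a few hundred pairs but would need an embedding index or hashing scheme at larger scale.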

The generator supports several input modes: a trait description, templates, and seed examples. A trait description states the behavior to target; templates are prompts that can be customized; seed examples provide starting material for the generator to expand on. Choose whichever mode fits your use case.

Pair Format

Generated pairs follow the standard contrastive pair format:

{
  "question": "How should I respond to criticism?",
  "positive": "Listen carefully and consider if the feedback is valid...",
  "negative": "Dismiss the criticism and attack the person giving it..."
}

Positive responses show the target trait, while negative responses show the opposite behavior.
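A quick structural check on pairs can catch malformed entries before they are used. The sketch below assumes a list of objects shaped like the example above; the exact layout of the output file is an assumption, and `validate_pairs` is a hypothetical helper, not part of Wisent.

```python
from typing import Any, List

REQUIRED_KEYS = ("question", "positive", "negative")


def validate_pairs(pairs: List[Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means every pair is well-formed."""
    errors: List[str] = []
    for index, pair in enumerate(pairs):
        if not isinstance(pair, dict):
            errors.append(f"pair {index}: expected an object, got {type(pair).__name__}")
            continue
        for key in REQUIRED_KEYS:
            value = pair.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"pair {index}: missing or empty '{key}'")
    return errors


sample = [
    {
        "question": "How should I respond to criticism?",
        "positive": "Listen carefully and consider if the feedback is valid...",
        "negative": "Dismiss the criticism and attack the person giving it...",
    },
    {"question": "Is this pair complete?", "positive": "Yes."},
]
problems = validate_pairs(sample)
print(problems)
```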

CLI Examples

Generate pairs from trait description
python -m wisent.cli synthetic "honest and truthful" --num-pairs 50 --output honesty_pairs.json

Generate with opposite trait specified
python -m wisent.cli synthetic "helpful and supportive" --opposite-trait "dismissive and unhelpful" --num-pairs 100 --output helpful_pairs.json

Generate with domain focus
python -m wisent.cli synthetic "provides accurate medical information" --domain medical --num-pairs 75 --output medical_accuracy_pairs.json

Generate with quality filters
python -m wisent.cli synthetic "respectful communication" --num-pairs 200 --min-length 50 --max-similarity 0.8 --output respectful_pairs.json

Generate from seed examples
python -m wisent.cli synthetic --from-examples seed_pairs.json --expand-factor 5 --output expanded_pairs.json

Python API

Basic synthetic generation
from wisent.core.synthetic import SyntheticContrastivePairsGenerator

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",  # or local model
    trait="honest and truthful"
)

pairs = generator.generate(num_pairs=50)

# Save to file
generator.save(pairs, "honesty_pairs.json")

Advanced generation with options
from wisent.core.synthetic import SyntheticContrastivePairsGenerator

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",
    trait="helpful and informative",
    opposite_trait="dismissive and vague",  # Optional: specify opposite
    domain="customer service",  # Optional: focus domain
    temperature=0.8,  # Diversity control
    max_workers=4  # Parallel generation
)

pairs = generator.generate(
    num_pairs=100,
    min_length=30,
    max_similarity=0.85,  # Filter similar pairs
    validate=True  # Run quality checks
)

print(f"Generated {len(pairs)} pairs")
for p in pairs[:3]:
    print(f"Question: {p['question'][:50]}...")  # key name follows the pair format shown above
    print(f"Positive: {p['positive'][:50]}...")
    print(f"Negative: {p['negative'][:50]}...")
    print()

Expanding existing pairs
from wisent.core.synthetic import SyntheticContrastivePairsGenerator
import json

# Load seed pairs
with open("seed_pairs.json") as f:
    seeds = json.load(f)

generator = SyntheticContrastivePairsGenerator(
    model_name="gpt-4",
    seed_examples=seeds
)

# Generate variations
expanded = generator.expand(
    expand_factor=5,  # 5x the seed count
    preserve_style=True
)

print(f"Expanded {len(seeds)} seeds to {len(expanded)} pairs")

Parameters

Generation Parameters

--num-pairs
Number of pairs to generate (default: 50)
--opposite-trait
Explicit opposite trait (auto-generated if not specified)
--domain
Focus domain for prompts (e.g., medical, legal, technical)
--temperature
Generation temperature for diversity (default: 0.7)

Quality Filters

--min-length
Minimum response length in characters (default: 20)
--max-length
Maximum response length in characters (default: 500)
--max-similarity
Maximum similarity between pairs (default: 0.9)
--validate
Run quality validation on generated pairs
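To make the length filters concrete, here is a minimal sketch of how `--min-length` and `--max-length` could be applied. It assumes the limits are checked against both the positive and the negative response, which the parameter descriptions do not specify; `within_length` is a hypothetical helper.

```python
from typing import Dict


def within_length(pair: Dict[str, str], min_length: int = 20, max_length: int = 500) -> bool:
    """True if both responses fall inside the character-length bounds (inclusive)."""
    return all(
        min_length <= len(pair[key]) <= max_length
        for key in ("positive", "negative")
    )


good = {
    "question": "How should I respond to criticism?",
    "positive": "Listen carefully and consider if the feedback is valid.",
    "negative": "Dismiss the criticism and attack the person giving it.",
}
too_short = {**good, "negative": "No."}

print(within_length(good), within_length(too_short))
```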

Expansion Parameters

--from-examples
Path to seed examples JSON file
--expand-factor
Multiplier for expansion (default: 3)
--preserve-style
Maintain style of seed examples (default: true)

Performance

--max-workers
Parallel workers for generation (default: 1)
--batch-size
Batch size for generation (default: 10)
--output
Output file path (default: pairs.json)
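One plausible way `--batch-size` and `--max-workers` interact is sketched below: the requested number of pairs is split into batches, and the batches are dispatched to a thread pool. This is an assumption about the general pattern, not a description of Wisent's internals; `batch_sizes`, `generate_in_parallel`, and `stub_generate_batch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def batch_sizes(num_pairs: int, batch_size: int) -> List[int]:
    """Split num_pairs into full batches plus one smaller final batch if needed."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(num_pairs, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def generate_in_parallel(
    generate_batch: Callable[[int], List[Dict[str, str]]],
    num_pairs: int,
    batch_size: int = 10,
    max_workers: int = 1,
) -> List[Dict[str, str]]:
    """Run generate_batch once per batch on a thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(generate_batch, batch_sizes(num_pairs, batch_size))
        return [pair for batch in results for pair in batch]


def stub_generate_batch(count: int) -> List[Dict[str, str]]:
    """Stand-in for a model call that returns `count` pairs."""
    return [{"question": "q", "positive": "p", "negative": "n"} for _ in range(count)]


all_pairs = generate_in_parallel(stub_generate_batch, num_pairs=25, batch_size=10, max_workers=4)
print(len(all_pairs))
```

Threads suit this workload when each batch spends most of its time waiting on a remote model API; a local model bound by a single GPU would gain little from extra workers.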

For the complete implementation of synthetic pair generation in Wisent, see:
