The Wisent Python API provides a high-level interface for steering AI models using representation engineering. It supports multiple modalities including text, audio, video, and robotics.
pip install wisent
The basic workflow has four steps: create a Wisent instance, add contrastive pairs, train steering vectors, and generate output with steering applied.
from wisent import Wisent

# Create a Wisent instance for text/LLM steering
wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")

# Add contrastive pairs for a trait
wisent.add_pair(
    positive="I'd be happy to help you with that.",
    negative="I refuse to help with that.",
    trait="helpfulness"
)

# Train steering vectors
wisent.train()

# Generate with steering applied
response = wisent.generate(
    "How do I cook pasta?",
    steer={"helpfulness": 1.5}
)

Wisent provides factory methods for different modalities:
| Method | Use Case |
|---|---|
| `Wisent.for_text(model_name)` | LLMs and text generation models |
| `Wisent.for_audio(model_name)` | Audio/speech models (e.g., Whisper) |
| `Wisent.for_video(model_name)` | Video understanding models |
| `Wisent.for_robotics(model)` | Robotics policy networks |
| `Wisent.for_multimodal(model_name)` | Vision-language models (VLMs) |
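
When the modality is only known at runtime, the table above can be treated as a lookup. The following is a minimal sketch, not part of the Wisent API: the `FACTORY_METHODS` dictionary and the `factory_method_name` helper are hypothetical names introduced here, and only the five method names are taken from the table.

```python
# Hypothetical lookup: modality keyword -> factory method name from the table above.
FACTORY_METHODS = {
    "text": "for_text",
    "audio": "for_audio",
    "video": "for_video",
    "robotics": "for_robotics",
    "multimodal": "for_multimodal",
}


def factory_method_name(modality: str) -> str:
    """Return the factory method name for a modality, or raise ValueError."""
    key = modality.strip().lower()
    if key not in FACTORY_METHODS:
        supported = ", ".join(sorted(FACTORY_METHODS))
        raise ValueError(f"Unknown modality {modality!r}; expected one of: {supported}")
    return FACTORY_METHODS[key]
```

With the package installed, the returned name could then be resolved on the class, for example `getattr(Wisent, factory_method_name("text"))("meta-llama/Llama-3-8B-Instruct")`. Note that, according to the table, `for_robotics` takes a model object rather than a model name.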

Each Wisent instance exposes the following core methods:

| Method | Description |
|---|---|
| `add_pair(positive, negative, trait)` | Add a contrastive pair for a trait |
| `add_pairs(pairs, trait)` | Add multiple contrastive pairs at once |
| `train(traits, layers, aggregation)` | Train steering vectors from stored pairs |
| `generate(content, steer)` | Generate output with optional steering |
| `save_vectors(path)` | Save trained steering vectors to file |
| `load_vectors(path)` | Load trained steering vectors from file |
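
This page does not describe how `train` derives vectors or how `generate` applies the `steer` dictionary. As a conceptual sketch only, one common approach in representation engineering is to subtract the mean activation of the negative examples from the mean activation of the positive examples, then add a scaled sum of the resulting vectors to a hidden state. The function names and toy activation values below are assumptions for illustration, not Wisent's implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    if any(len(row) != dim for row in rows):
        raise ValueError("all activation vectors must have the same length")
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def difference_of_means(
    positive: Sequence[Sequence[float]],
    negative: Sequence[Sequence[float]],
) -> Vector:
    """Steering direction: mean(positive activations) - mean(negative activations)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    if len(pos_mean) != len(neg_mean):
        raise ValueError("positive and negative activations differ in size")
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(
    hidden: Sequence[float],
    vectors: Dict[str, Vector],
    steer: Dict[str, float],
) -> Vector:
    """Add each requested trait vector to the hidden state, scaled by its strength."""
    steered = list(hidden)
    for trait, scale in steer.items():
        vector = vectors[trait]  # KeyError if the trait has no vector
        if len(vector) != len(steered):
            raise ValueError(f"vector for {trait!r} does not match hidden size")
        for i, value in enumerate(vector):
            steered[i] += scale * value
    return steered


# Toy two-dimensional "activations" standing in for real model hidden states.
vectors = {
    "confidence": difference_of_means(
        positive=[[1.0, 0.0], [3.0, 0.0]],
        negative=[[0.0, 0.0], [0.0, 0.0]],
    ),
    "friendliness": difference_of_means(
        positive=[[0.0, 2.0]],
        negative=[[0.0, 1.0]],
    ),
}
steered = apply_steering(
    [0.0, 0.0], vectors, {"confidence": 1.5, "friendliness": 0.8}
)
```

In this toy setup each trait moves the hidden state along its own axis, which makes the effect of the two strengths easy to inspect. Real activations are high-dimensional and trait directions generally overlap, so strengths usually need tuning by hand.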
You can define and combine multiple steering traits:
from wisent import Wisent

wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")

# Add pairs for different traits
wisent.add_pair(
    positive="Let me explain this clearly...",
    negative="I guess maybe...",
    trait="confidence"
)
wisent.add_pair(
    positive="That's a great question!",
    negative="Ugh, another question...",
    trait="friendliness"
)

# Train all traits
wisent.train()

# Apply multiple traits with different strengths
response = wisent.generate(
    "Explain quantum computing",
    steer={
        "confidence": 1.5,
        "friendliness": 0.8
    }
)

Save trained steering vectors to reuse them later:
# Save trained vectors
wisent.save_vectors("my_steering_vectors.pt")

# Later, load them back
wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")
wisent.load_vectors("my_steering_vectors.pt")

# Use immediately without retraining
response = wisent.generate(
    "Your prompt here",
    steer={"helpfulness": 1.0}
)

The TraitConfig dataclass stores configuration for each steering trait:
from wisent import TraitConfig

# TraitConfig attributes:
# - name: str              # Unique identifier for the trait
# - description: str       # Human-readable description
# - steering_vectors       # Per-layer steering vectors (set after training)
# - default_scale: float   # Default steering strength (default: 1.0)
# - layers: List[str]      # Which layers to apply to

# Access trait info
trait_info = wisent.get_trait_info("helpfulness")
print(f"Trait: {trait_info.name}")
print(f"Default scale: {trait_info.default_scale}")

Inspect the Wisent instance state:
# Check defined traits
print(wisent.traits)  # ['helpfulness', 'confidence', ...]

# Check if trained
print(wisent.is_trained)  # True/False

# Get available intervention points (layers)
print(wisent.get_intervention_points())

# Get recommended layers for steering
print(wisent.get_recommended_layers())
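
These inspection attributes can also be used to sanity-check a `steer` dictionary before calling `generate`. The sketch below uses a hypothetical helper, `check_steer_request`, that relies only on the `traits` and `is_trained` attributes shown above. It assumes `traits` is a list of trait names and `is_trained` is a boolean, as the comments suggest. A `SimpleNamespace` stands in for a real Wisent instance so the example runs without loading a model.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def check_steer_request(wisent: Any, steer: Dict[str, float]) -> List[str]:
    """Return a list of potential problems with a steered generate() call."""
    problems: List[str] = []
    if not wisent.is_trained:
        problems.append("steering vectors have not been trained or loaded")
    known = set(wisent.traits)
    for trait in steer:
        if trait not in known:
            problems.append(f"unknown trait: {trait}")
    return problems


# Stand-in object exposing the two attributes used above (not a real Wisent instance).
fake_wisent = SimpleNamespace(traits=["helpfulness", "confidence"], is_trained=True)
problems = check_steer_request(fake_wisent, {"helpfulness": 1.0, "humor": 0.5})
```

An empty list means every requested trait is defined and the instance reports itself as trained. How Wisent itself reacts to an unknown trait is not stated here, so a check like this is a precaution rather than a description of library behavior.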