Python API

The Wisent Python API provides a high-level interface for steering AI models with representation engineering. It supports several modalities: text, audio, video, robotics, and vision-language (multimodal) models.

Installation

Install Wisent
pip install wisent

Basic Usage

The typical workflow has four steps: create a Wisent instance, add contrastive pairs, train steering vectors, and generate output with steering applied.

Basic Text Steering Example
from wisent import Wisent

# Create a Wisent instance for text/LLM steering
wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")

# Add contrastive pairs for a trait
wisent.add_pair(
    positive="I'd be happy to help you with that.",
    negative="I refuse to help with that.",
    trait="helpfulness"
)

# Train steering vectors
wisent.train()

# Generate with steering applied
response = wisent.generate(
    "How do I cook pasta?",
    steer={"helpfulness": 1.5}
)
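
To give some intuition for the training step, here is a minimal sketch of one common way to derive a steering vector from contrastive pairs: subtract the mean activation of the negative examples from the mean activation of the positive ones. This is an assumption about the general technique, not a description of what `wisent.train()` does internally. The helper names and the toy 3-dimensional activations are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def difference_of_means(
    positive: Sequence[Sequence[float]],
    negative: Sequence[Sequence[float]],
) -> Vector:
    """Steering direction: mean(positive) minus mean(negative)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Sequence[float], vector: Sequence[float], scale: float) -> Vector:
    """Shift a hidden state along the steering direction by `scale`."""
    return [h + scale * v for h, v in zip(hidden, vector)]


# Toy activations standing in for real hidden states (hypothetical values).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
negative_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

helpfulness = difference_of_means(positive_acts, negative_acts)
steered = apply_steering([0.0, 0.0, 0.0], helpfulness, scale=1.5)
```

In this picture, the number passed in `steer={"helpfulness": 1.5}` plays the role of `scale`: larger values push the hidden state further along the trait direction.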

Factory Methods

Wisent provides factory methods for different modalities:

Method                            | Use Case
Wisent.for_text(model_name)       | LLMs and text generation models
Wisent.for_audio(model_name)      | Audio/speech models (e.g., Whisper)
Wisent.for_video(model_name)      | Video understanding models
Wisent.for_robotics(model)        | Robotics policy networks
Wisent.for_multimodal(model_name) | Vision-language models (VLMs)

Core Methods

Method                              | Description
add_pair(positive, negative, trait) | Add a contrastive pair for a trait
add_pairs(pairs, trait)             | Add multiple contrastive pairs at once
train(traits, layers, aggregation)  | Train steering vectors from stored pairs
generate(content, steer)            | Generate output with optional steering
save_vectors(path)                  | Save trained steering vectors to file
load_vectors(path)                  | Load trained steering vectors from file
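
When adding many pairs at once, it can help to validate them first. The sketch below builds a list of `(positive, negative)` tuples from two parallel lists. The tuple format is an assumption, since this page does not specify what shape `add_pairs` expects for `pairs`. The `build_pairs` helper and the second example pair are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_pairs(positives: Sequence[str], negatives: Sequence[str]) -> List[Tuple[str, str]]:
    """Zip parallel example lists into validated (positive, negative) tuples."""
    if len(positives) != len(negatives):
        raise ValueError("positives and negatives must have the same length")
    pairs = []
    for pos, neg in zip(positives, negatives):
        pos, neg = pos.strip(), neg.strip()
        if not pos or not neg:
            raise ValueError("contrastive examples must be non-empty")
        if pos == neg:
            raise ValueError("a pair needs two different examples")
        pairs.append((pos, neg))
    return pairs


pairs = build_pairs(
    ["I'd be happy to help you with that.", "Sure, here is how."],
    ["I refuse to help with that.", "Figure it out yourself."],
)

# With a Wisent instance available, and assuming add_pairs accepts tuples:
# wisent.add_pairs(pairs, trait="helpfulness")
```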

Multiple Traits

You can define and combine multiple steering traits:

Multiple Traits Example
from wisent import Wisent

wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")

# Add pairs for different traits
wisent.add_pair(
    positive="Let me explain this clearly...",
    negative="I guess maybe...",
    trait="confidence"
)

wisent.add_pair(
    positive="That's a great question!",
    negative="Ugh, another question...",
    trait="friendliness"
)

# Train all traits
wisent.train()

# Apply multiple traits with different strengths
response = wisent.generate(
    "Explain quantum computing",
    steer={
        "confidence": 1.5,
        "friendliness": 0.8
    }
)
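
One way to picture how several traits with different strengths could act together is as a weighted sum of per-trait vectors. The sketch below assumes that linear combination. This page does not state how Wisent actually combines traits, so treat it as an illustration only. The `combine_steering` helper and the 2-dimensional toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def combine_steering(
    vectors: Dict[str, Sequence[float]],
    steer: Dict[str, float],
) -> List[float]:
    """Weighted sum of trait vectors, using the strengths given in `steer`."""
    unknown = sorted(set(steer) - set(vectors))
    if unknown:
        raise KeyError(f"no trained vector for: {unknown}")
    combined: List[float] = []
    for trait, scale in steer.items():
        vector = vectors[trait]
        if not combined:
            combined = [0.0] * len(vector)
        for i, component in enumerate(vector):
            combined[i] += scale * component
    return combined


# Toy per-trait vectors (hypothetical values).
trait_vectors = {
    "confidence": [2.0, 0.0],
    "friendliness": [0.0, 5.0],
}

combined = combine_steering(
    trait_vectors,
    {"confidence": 1.5, "friendliness": 0.8},
)
```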

Saving & Loading Vectors

Save trained steering vectors to reuse them later:

Persistence Example
# Save trained vectors
wisent.save_vectors("my_steering_vectors.pt")

# Later, load them back
wisent = Wisent.for_text("meta-llama/Llama-3-8B-Instruct")
wisent.load_vectors("my_steering_vectors.pt")

# Use immediately without retraining
response = wisent.generate(
    "Your prompt here",
    steer={"helpfulness": 1.0}
)
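
A common pattern built on these two methods is to train only when no saved file exists. The sketch below uses only `train()`, `save_vectors(path)`, and `load_vectors(path)` from the method table. The `load_or_train` helper is hypothetical, and `RecordingSteerer` is a stand-in class that lets the example run without the library installed. With a real instance, the contrastive pairs must already be added before `train()` is reached.

```python
import os
import tempfile


def load_or_train(steerer, path: str) -> str:
    """Load saved vectors if the file exists; otherwise train and save them."""
    if os.path.exists(path):
        steerer.load_vectors(path)
        return "loaded"
    steerer.train()
    steerer.save_vectors(path)
    return "trained"


class RecordingSteerer:
    """Stand-in for a Wisent instance; records which methods were called."""

    def __init__(self):
        self.calls = []

    def train(self):
        self.calls.append("train")

    def save_vectors(self, path):
        self.calls.append("save_vectors")
        with open(path, "wb") as handle:
            handle.write(b"placeholder")  # not a real vector file

    def load_vectors(self, path):
        self.calls.append("load_vectors")


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "my_steering_vectors.pt")

    first = RecordingSteerer()
    first_result = load_or_train(first, cache_path)    # no file yet

    second = RecordingSteerer()
    second_result = load_or_train(second, cache_path)  # file now exists
```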

TraitConfig

The TraitConfig dataclass stores configuration for each steering trait:

TraitConfig Structure
from wisent import TraitConfig

# TraitConfig attributes:
# - name: str              # Unique identifier for the trait
# - description: str       # Human-readable description
# - steering_vectors       # Per-layer steering vectors (set after training)
# - default_scale: float   # Default steering strength (default: 1.0)
# - layers: List[str]      # Which layers to apply to

# Access trait info
trait_info = wisent.get_trait_info("helpfulness")
print(f"Trait: {trait_info.name}")
print(f"Default scale: {trait_info.default_scale}")
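
For readers who prefer to see the shape as code, the attribute list above could be written as a dataclass roughly like the one below. This is a reconstruction from the comments, not the library's actual definition. The class name `TraitConfigSketch`, the type of `steering_vectors`, the empty defaults, the `is_trained` property, and the layer name `"layer_12"` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TraitConfigSketch:
    """Illustrative mirror of the TraitConfig attributes listed above."""

    name: str                                   # Unique identifier for the trait
    description: str = ""                       # Human-readable description
    # Assumed shape: layer name -> vector; None until training has run.
    steering_vectors: Optional[Dict[str, List[float]]] = None
    default_scale: float = 1.0                  # Default steering strength
    layers: List[str] = field(default_factory=list)  # Layers to apply to

    @property
    def is_trained(self) -> bool:
        """True once steering vectors have been set."""
        return self.steering_vectors is not None


sketch = TraitConfigSketch(
    name="helpfulness",
    description="Willingness to assist",
    layers=["layer_12"],  # hypothetical layer name
)
```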

Introspection

Inspect the Wisent instance state:

Introspection Methods
# Check defined traits
print(wisent.traits)  # ['helpfulness', 'confidence', ...]

# Check if trained
print(wisent.is_trained)  # True/False

# Get available intervention points (layers)
print(wisent.get_intervention_points())

# Get recommended layers for steering
print(wisent.get_recommended_layers())
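
These properties can be used to sanity-check a `steer` dictionary before calling `generate`. The sketch below takes the values of `wisent.traits` and `wisent.is_trained` as plain arguments so that it runs without the library. The `steering_problems` helper is hypothetical, and this page does not say how Wisent itself handles an unknown trait name.

```python
from typing import Dict, Iterable, List


def steering_problems(
    traits: Iterable[str],
    is_trained: bool,
    steer: Dict[str, float],
) -> List[str]:
    """Return human-readable problems with a steer request (empty if none)."""
    problems = []
    if not is_trained:
        problems.append("vectors are not trained or loaded yet")
    known = set(traits)
    for name in sorted(steer):
        if name not in known:
            problems.append(f"unknown trait: {name}")
    return problems


# With a Wisent instance this would be:
# steering_problems(wisent.traits, wisent.is_trained, steer)
ok = steering_problems(["helpfulness", "confidence"], True, {"helpfulness": 1.0})
bad = steering_problems(["helpfulness"], False, {"friendliness": 0.8})
```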
