Train classifiers on model activations to detect hallucinations and untruthful responses in real time.
Use LiveCodeBench to compute steering vectors and improve model code generation quality.
Learn the mathematical foundations of activations, representations, and how to work with model internals.
Permanently modify model weights to reduce unnecessary refusals using norm-preserving abliteration.
Optimize steering parameters for multiple personality traits and combine vectors for multi-trait steering.
Train, save, and use classifiers on benchmarks.
Create steering vectors from tasks, activations, or synthetic pairs.
Combine multiple steering vectors with different parameters.
Generate contrastive pairs from tasks or synthetically.
Generate responses with steering or classifier-based control.
Use agentic mode with quality control and steering.
Evaluate generated responses and personalization.
Detect and handle nonsensical model outputs.
Apply abliteration and other permanent weight modifications.
Extract and work with model activations.
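
Several of the entries above build on the same idea: derive a steering vector from contrastive pairs, then combine and apply vectors at generation time. The sketch below illustrates that idea with one common approach, the difference of mean activations between positive and negative examples. It is not Wisent's API. The function names (`mean_vector`, `steering_vector`, `combine`, `apply_steering`) and the toy two-dimensional activations are hypothetical and exist only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    if any(len(r) != dim for r in rows):
        raise ValueError("all activation vectors must have the same length")
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def steering_vector(positive: Sequence[Sequence[float]],
                    negative: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: mean(positive) - mean(negative).

    Assumption: each row is the activation recorded for one side of a
    contrastive pair at a single, fixed layer.
    """
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    if len(pos_mean) != len(neg_mean):
        raise ValueError("positive and negative activations differ in size")
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def combine(vectors: Sequence[Sequence[float]],
            weights: Sequence[float]) -> Vector:
    """Weighted sum of several steering vectors (one weight per vector)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per steering vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all steering vectors must have the same length")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def apply_steering(activation: Sequence[float],
                   vector: Sequence[float],
                   strength: float = 1.0) -> Vector:
    """Shift an activation along the steering direction."""
    if len(activation) != len(vector):
        raise ValueError("activation and steering vector differ in size")
    return [a + strength * v for a, v in zip(activation, vector)]


# Toy data standing in for activations from real contrastive pairs.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [2.0, 1.0]]

trait_a = steering_vector(positive, negative)
trait_b = [0.0, -1.0]  # a second, made-up steering vector
combined = combine([trait_a, trait_b], [0.5, 2.0])
steered = apply_steering([0.0, 0.0], trait_a, strength=2.0)
```

In a real model the activations would come from a chosen layer, and both the layer and the strength would need tuning per trait. The weighted sum in `combine` is the simplest way to mix traits and makes no claim about how the library itself combines vectors.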