
Examples

Detect Hallucinations

Detect when a model is hallucinating facts and prevent it from generating false information.

View example
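
The full example covers the end-to-end pipeline; the snippet below is only a minimal sketch of the underlying idea, written against a plain Hugging Face model rather than this project's own API. It builds a few truthful/hallucinated contrastive pairs, reads hidden-state activations at one layer, and fits a logistic-regression probe as the hallucination classifier. The model name, layer index, and pairs are all illustrative assumptions.

```python
# Minimal sketch (not this project's API): read activations from a small
# Hugging Face model and fit a linear probe as a hallucination classifier.
# The model, layer index, and contrastive pairs below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # which hidden layer to read; a tunable choice in practice

# Contrastive pairs: (truthful statement, hallucinated counterpart).
pairs = [
    ("The Eiffel Tower is in Paris.",
     "The Eiffel Tower is in Berlin."),
    ("Water boils at 100 degrees Celsius at sea level.",
     "Water boils at 250 degrees Celsius at sea level."),
]

def last_token_activation(text):
    """Hidden state of the final token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]

X, y = [], []
for truthful, hallucinated in pairs:
    X.append(last_token_activation(truthful).numpy())
    y.append(0)  # label 0 = truthful
    X.append(last_token_activation(hallucinated).numpy())
    y.append(1)  # label 1 = hallucinated

# The classifier is just a linear probe over activations.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```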

Detect Harmful Content

Block generation of harmful or dangerous content with activation monitoring.

View example
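
A minimal sketch of the blocking idea, again outside this project's API: generate one token at a time, score each step's activation with a classifier, and stop as soon as the score crosses a threshold. It reuses `model`, `tokenizer`, `LAYER`, and `last_token_activation` from the hallucination sketch above, and assumes `harm_clf`, a classifier trained the same way on harmful/harmless contrastive pairs; the threshold is illustrative.

```python
# Minimal sketch (not this project's API): greedy generation that halts when
# an activation classifier flags the current hidden state. Assumes `model`,
# `tokenizer`, and LAYER from the hallucination sketch, plus `harm_clf`, a
# classifier trained the same way on harmful/harmless contrastive pairs.
import torch

THRESHOLD = 0.8  # illustrative probability cutoff

def guarded_generate(prompt, max_new_tokens=50):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = model(input_ids, output_hidden_states=True)
        # Score the newest position's activation before emitting another token.
        activation = out.hidden_states[LAYER][0, -1, :].numpy().reshape(1, -1)
        if harm_clf.predict_proba(activation)[0, 1] > THRESHOLD:
            return "[generation blocked: harmful-content detector fired]"
        next_id = out.logits[0, -1].argmax().reshape(1, 1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(guarded_generate("Here is how to"))
```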

Detect Bad Code

Prevent the generation of insecure, inefficient, or buggy code snippets.

View example
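
For code, one option is to score whole generated snippets rather than individual tokens and resample when one is flagged. The sketch below assumes `model`, `tokenizer`, and `last_token_activation` from the hallucination sketch, plus `code_clf`, a classifier trained on insecure/secure contrastive code pairs; the retry count and threshold are illustrative, and this is not the project's documented handling.

```python
# Minimal sketch (not this project's API): score whole generated snippets
# instead of individual tokens and resample when one is flagged. Assumes
# `model`, `tokenizer`, and `last_token_activation` from the hallucination
# sketch, plus `code_clf`, a classifier trained on insecure/secure code pairs.
def generate_checked_code(prompt, max_retries=3, threshold=0.5):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_retries):
        out_ids = model.generate(
            ids,
            max_new_tokens=60,
            do_sample=True,          # sample so retries can differ
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
        text = tokenizer.decode(out_ids[0], skip_special_tokens=True)
        score = code_clf.predict_proba(
            last_token_activation(text).numpy().reshape(1, -1))[0, 1]
        if score < threshold:        # not flagged as insecure/buggy
            return text
    return "[no acceptable snippet produced after retries]"

print(generate_checked_code("# Python: read a filename from the user and open it\n"))
```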

Detect Bias

Identify and prevent biased outputs across different demographic groups.

View example
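
Detection covers the "identify" half; for the "prevent" half, a control vector can steer generation away from the biased direction. The sketch below is a CAA-style approximation rather than this project's steering API: it averages the activation difference between unbiased and biased continuations and adds the scaled vector back through a forward hook. It reuses `model`, `tokenizer`, `LAYER`, and `last_token_activation` from the hallucination sketch; the pair and steering strength are illustrative.

```python
# Minimal sketch (not this project's API): a CAA-style control vector computed
# as the mean activation difference between unbiased and biased continuations,
# then added to the residual stream via a forward hook during generation.
import torch

ALPHA = 4.0  # steering strength (illustrative)

bias_pairs = [
    ("The nurse said that",
     " they would review the chart.",                      # unbiased
     " she would review the chart because nurses are women."),  # biased
]

diffs = [last_token_activation(p + unbiased) - last_token_activation(p + biased)
         for p, unbiased, biased in bias_pairs]
control_vector = torch.stack(diffs).mean(dim=0)

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    if isinstance(output, tuple):
        return (output[0] + ALPHA * control_vector,) + output[1:]
    return output + ALPHA * control_vector

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tokenizer("The nurse said that", return_tensors="pt").input_ids
    steered = model.generate(ids, max_new_tokens=20,
                             pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(steered[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach so later calls run unsteered
```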

Detect Gender Bias

Specifically target and prevent gender stereotypes and biases in model outputs.

View example
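
The ingredient specific to this case is the contrastive pair set. Below is a small, self-contained sketch of how gender-swapped pairs might be templated; the roles, pronoun map, and field names are illustrative assumptions, and the resulting pairs feed the same activation-collection, classifier, and control-vector sketches above.

```python
# Self-contained sketch: templating a small contrastive pair set for gender
# bias. Roles, pronoun choices, and field names are illustrative; the pairs
# plug into the activation-collection sketches above.
STEREOTYPE_PRONOUN = {"engineer": "he", "nurse": "she", "CEO": "he", "teacher": "she"}

def gender_bias_pairs(roles):
    pairs = []
    for role, pronoun in roles.items():
        prompt = f"The {role} walked in and"
        pairs.append({
            "prompt": prompt,
            "negative": f"{prompt} {pronoun} took charge, like every {role} does.",
            "positive": f"{prompt} they took charge of the situation.",
        })
    return pairs

pair_set = gender_bias_pairs(STEREOTYPE_PRONOUN)
print(pair_set[1])  # the stereotyped 'nurse' pair
```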

Detect Personal Information

Monitor and block leakage of personally identifiable information (PII).

View example
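
Training a PII detector follows the same recipe as the hallucination sketch; before trusting it to block output, it is worth checking it against ground-truth labels. The sketch below assumes such a classifier (`pii_clf`) plus the `last_token_activation` helper from above; the test sentences and labels are illustrative placeholders.

```python
# Minimal sketch (not this project's API): sanity-check a PII-leakage detector
# against ground-truth labels. Assumes `pii_clf`, a classifier trained on
# PII / no-PII contrastive pairs like the hallucination sketch, and the
# `last_token_activation` helper from above; the test sentences are illustrative.
import numpy as np
from sklearn.metrics import classification_report

test_set = [
    ("You can reach me at jane.doe@example.com.", 1),      # leaks PII
    ("My card number is 4111 1111 1111 1111.", 1),          # leaks PII
    ("The meeting was moved to next Tuesday.", 0),           # no PII
    ("Please review the attached quarterly report.", 0),     # no PII
]

X_test = np.stack([last_token_activation(text).numpy() for text, _ in test_set])
y_true = [label for _, label in test_set]
y_pred = pii_clf.predict(X_test)

# Recall matters most here: a missed leak is worse than a false alarm, so a
# deployment would likely also lower the decision threshold.
print(classification_report(y_true, y_pred, target_names=["no PII", "PII"]))
```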