Wisent-Guard

A Python package for latent space monitoring and guardrails. Delivered to you by the Wisent team led by Lukasz Bartoszcze.

Overview

Wisent-Guard uses representation engineering to make your AI safer and more trustworthy. Unlock the true potential of your LLM with layer-level control. With our tools, you can cut hallucinations by 43% and harmful responses by 88%, all through the power of controlling intermediate representations: thoughts hidden deep inside the AI's brain.
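
For intuition, here is a minimal sketch of what reading an intermediate representation looks like with plain PyTorch and Hugging Face transformers. This illustrates the underlying idea only, not Wisent-Guard's API; the model name and layer index are simply the ones used in the Quick Start below:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any open-source decoder-only model exposes its residual stream this way.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("The capital of Japan is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch, sequence_length, hidden_size).
layer_15 = outputs.hidden_states[15]   # residual stream after layer 15
last_token = layer_15[0, -1]           # representation of the final token
print(last_token.shape)                # torch.Size([4096]) for Llama-3.1-8B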

  • Read our research on this here
  • Explore our vision for a world where we control the AI here
  • Happy to demo all the tech we have in store here

Installation

pip install wisent-guard

Quick Start

Run the MMLU benchmark with classification:

python -m wisent_guard.cli tasks mmlu --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 10 --classifier-type logistic --verbose

Run the HellaSwag benchmark with steering:

python -m wisent_guard.cli tasks hellaswag --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 5 --steering-mode --steering-strength 1.0 --verbose

Features

  • State-of-the-art LLM Safeguards: A flexible latent space monitoring tool for blocking unwelcome responses. Reduces hallucinations by 43%, up to 7x better than the next best alternative. Useful for security, hallucination detection, quality assurance, and more.
  • Generalized Representation Engineering Framework: A modular architecture enables flexible modifications, creating a performant and usable framework for identifying and manipulating representations for safeguarding, performance improvement, or evaluation.
  • Model-Agnostic: Works with all open-source models.
  • CUDA/MPS/CPU Support: Built-in support for CUDA, MPS (Metal Performance Shaders), or CPU, so the guard works across a variety of hardware architectures.
  • Support for over 6,000 tasks: Use built-in benchmarks via our lm-harness integration to create representations from common benchmarks like MMLU, TruthfulQA, HellaSwag, and others. Or bring your own dataset of contrastive pairs (a positive example that exhibits the trait, e.g. the hallucination "The capital of Japan is Paris", paired with its truthful counterpart "The capital of Japan is Tokyo") to create the perfect representation for your use case; see the steering-vector sketch after this list.
  • Full optimisation: Use our internal logic to optimise common representation engineering hyperparameters, such as layer choice, classifier choice, and steering strength, to maximise performance on your benchmarks; a minimal layer-sweep sketch follows this list.
  • Oversight: save your results and understand the internal logic with built-in tools for logging and error handling
  • Performance speedup: Reduce runtime overhead with optimised inference logic running in the background.
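
To make the contrastive-pair bullet concrete, here is a hedged sketch of one standard way to turn a pair into a steering direction (a difference of layer activations). It illustrates the general technique, not Wisent-Guard's internal implementation; the model, layer, and the Japan example from above are placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

def last_token_state(text: str, layer: int) -> torch.Tensor:
    """Residual-stream vector of the final token at the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]

# One contrastive pair: the positive string exhibits the trait (a hallucination),
# the negative string is the same claim without it.
positive = "The capital of Japan is Paris."
negative = "The capital of Japan is Tokyo."

layer = 15  # a hyperparameter; see the optimisation sketch below
direction = last_token_state(positive, layer) - last_token_state(negative, layer)
direction = direction / direction.norm()  # unit "hallucination" direction

With many pairs you would average the positive and negative activations before subtracting; the resulting direction can then feed a classifier (detection) or be added back into the residual stream with some strength (steering).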
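
And a similarly hedged sketch of the kind of sweep the optimisation bullet describes, assuming per-layer activations have already been collected as above. Here scikit-learn's LogisticRegression stands in for the classifier, and the helper name is hypothetical:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def best_layer(activations_by_layer: dict[int, np.ndarray], y: np.ndarray) -> int:
    """Pick the layer whose linear probe best separates the labelled examples.

    activations_by_layer[layer] has shape (n_examples, hidden_size); y holds
    the 0/1 labels marking whether the trait is present in each example.
    """
    scores = {
        layer: cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
        for layer, X in activations_by_layer.items()
    }
    return max(scores, key=scores.get)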