Model

A Model is a set of weights used to generate responses. At the moment, Wisent only works with open source large language models. Each model uses special tokens to mark the beginning of the user query and of the model response.

The model parameters are structured into Layers. Each model has a fixed number of layers.
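
For example, the layer count can be read from a model's HuggingFace config without downloading any weights. A minimal sketch (the model ID is just an example):

from transformers import AutoConfig

# Fetch only the configuration file; no weights are downloaded
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
print(f"Number of layers: {config.num_hidden_layers}")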

Parameters

Parameters are the weights and biases a model acquires during training. They determine how the model processes inputs and produces outputs.

Qwen2.5-7B has ~7 billion parameters, while Llama-3.1-70B has ~70 billion parameters.
More parameters generally mean better capabilities but require more computational resources.
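
To make "weights and biases" concrete, here is a toy PyTorch sketch counting the parameters of a single linear layer sized like a typical transformer projection:

import torch.nn as nn

# One linear layer: a 4096x4096 weight matrix plus a 4096-element bias vector
layer = nn.Linear(4096, 4096)
n_params = sum(p.numel() for p in layer.parameters())
print(f"{n_params:,}")  # 4096*4096 + 4096 = 16,781,312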

Special Tokens

Special tokens let models recognize how a conversation is structured and who is speaking. Each model family uses its own tokens to mark roles:

Llama: <|user|> and <|assistant|>
Qwen: <|im_start|>user and <|im_start|>assistant
Mistral: [INST] and [/INST]
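
You can inspect the exact markers a given model uses by rendering its chat template as plain text. A minimal sketch (the model ID is an example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
messages = [{"role": "user", "content": "Hello!"}]

# Render the conversation without tokenizing to see the special tokens
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)  # contains <|im_start|>user ... <|im_start|>assistant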

Open Source

Models whose weights are openly available for download, inspection, and modification. They stand in contrast to closed proprietary models such as GPT-4 or Claude, whose internal workings are not accessible.

Open Source: Llama 3.1, Qwen2.5, Mistral 7B, Gemma 2
Closed Source: GPT-4, Claude 3.5, Gemini Pro

For Wisent, direct access to internal activations is only possible with open source models.
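
As a sketch of what that access looks like, HuggingFace models can return the hidden states of every layer in a single forward pass (the model ID below is an example):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer, plus one for the embedding layer
print(len(outputs.hidden_states))       # num_hidden_layers + 1
print(outputs.hidden_states[-1].shape)  # (batch, seq_len, hidden_size)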

Supported Models

Qwen
DeepSeek
LLaMA
Mistral
Gemma

or any HuggingFace-compatible transformer model

Model Loading Example

Wisent is optimized for models hosted on HuggingFace. However, you can also load an internal model, or a model in any other format, by adapting the model.py file to feed your model into the existing Wisent pipeline.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a model and tokenizer
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Model characteristics
print(f"Model parameters: {model.num_parameters():,}")
print(f"Special tokens: {tokenizer.special_tokens_map}")
print(f"Vocabulary size: {tokenizer.vocab_size:,}")

User Tags Configuration

User tags are special tokens that mark the beginning of user input in conversations. Different models use different tag formats, and specifying the correct tags is crucial for proper activation extraction.

Supported by Default

LLaMA models: <|user|>
Qwen models: <|im_start|>user
Mistral models: [INST]

Not Supported by Default

Custom chat templates
Non-standard user markers
Proprietary tag formats

Unsupported tags should be configured manually.
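
Wisent's own configuration mechanism lives in wisent_model.py (linked below). As a purely illustrative sketch, locating a custom user marker inside a tokenized prompt, so activations can be extracted from the right positions, might look like this (the marker and prompt strings are hypothetical examples):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Hypothetical custom user marker and prompt
user_tag = "<|im_start|>user"
prompt = "<|im_start|>user\nWhat is a bison?<|im_end|>\n<|im_start|>assistant\n"

tag_ids = tokenizer(user_tag, add_special_tokens=False)["input_ids"]
prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]

# Scan for the marker's token IDs inside the tokenized prompt
starts = [i for i in range(len(prompt_ids) - len(tag_ids) + 1)
          if prompt_ids[i:i + len(tag_ids)] == tag_ids]
print(starts)  # positions where user input begins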

Implementation Reference

For details on how to implement and configure models, see the core model file:

View wisent_model.py on GitHub


Role in Representation Engineering

Foundation Layer

The model serves as the foundation for all representation engineering techniques. Its internal activations contain the representations we aim to detect and manipulate.

Activation Source

Every layer in the model produces activations that can be monitored, analyzed, and potentially modified to achieve desired behaviors.
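
A minimal way to monitor one layer's activations is a PyTorch forward hook. This sketch assumes a Llama/Qwen-style architecture where the decoder layers sit at model.model.layers (the model ID and layer index are examples):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

captured = {}

def capture_hook(module, args, output):
    # Decoder layers usually return a tuple; hidden states come first
    hidden = output[0] if isinstance(output, tuple) else output
    captured["acts"] = hidden.detach()

handle = model.model.layers[15].register_forward_hook(capture_hook)
inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**inputs)
handle.remove()

print(captured["acts"].shape)  # (batch, seq_len, hidden_size)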

Intervention Target

The model's processing can be modified with methods such as control vectors and steering in order to shape how outputs are generated, making models more customizable.
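
As an illustrative sketch of such an intervention (not Wisent's actual implementation), a forward hook can shift a layer's hidden states along a control direction. The random vector below stands in for a learned control vector, and model.model.layers assumes a Llama/Qwen-style architecture:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Placeholder direction; a real control vector would be learned from contrastive activations
control_vector = torch.randn(model.config.hidden_size)

def steering_hook(module, args, output):
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + 4.0 * control_vector.to(hidden.device, hidden.dtype)
    if isinstance(output, tuple):
        return (steered,) + output[1:]
    return steered

handle = model.model.layers[15].register_forward_hook(steering_hook)
inputs = tokenizer("Tell me about bison.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
handle.remove()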
