A Model is a set of weights used to generate responses. At the moment, Wisent only works with open-source large language models. Each model has special tokens that mark the beginning of the user query and of the model's response.
The model's parameters are organized into layers, and each model has a fixed number of them.
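As a quick illustration, the layer count can be read from a model's configuration without downloading its weights. This is a minimal sketch; the model name is an illustrative example:

from transformers import AutoConfig

# Read structural details from the config alone; no weights are downloaded.
# "Qwen/Qwen2.5-7B-Instruct" is an illustrative example, not a requirement.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
print(f"Transformer layers: {config.num_hidden_layers}")
print(f"Hidden size: {config.hidden_size}")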
Parameters are the numerical values (weights and biases) that the model learned during training. They determine how the model processes information and generates responses.
Special tokens are specific text markers that models use to understand conversation structure and roles. Each model family uses different tokens to identify who is speaking, for example:
- <|user|> and <|assistant|>
- <|im_start|>user and <|im_start|>assistant
- [INST] and [/INST]
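In practice you rarely insert these markers by hand: HuggingFace tokenizers ship a chat template that adds the right tokens for the model family. A minimal sketch (the model name is an illustrative example):

from transformers import AutoTokenizer

# The chat template inserts the model family's special tokens automatically.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
messages = [{"role": "user", "content": "What is representation engineering?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # shows the <|im_start|>user ... <|im_start|>assistant markers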
Open-source models are models with publicly available weights that can be downloaded, inspected, and modified. Unlike proprietary models (GPT-4, Claude), they give you full access to the model's internals.
Wisent-Guard is optimized for models hosted on HuggingFace. However, you can also adapt the existing code to load an internal model, or a model in any other format, by modifying the model.py file to load it into the existing Wisent-Guard pipeline (a sketch follows the example below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a model and tokenizer
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place weights across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Model characteristics
print(f"Model parameters: {model.num_parameters():,}")
print(f"Special tokens: {tokenizer.special_tokens_map}")
print(f"Vocabulary size: {tokenizer.vocab_size:,}")
User tags are special tokens that mark the beginning of user input in conversations. Different models use different tag formats, and specifying the correct tags is crucial for proper activation extraction.
- <|user|> - LLaMA models
- <|im_start|>user - Qwen models
- [INST] - Mistral models

Note: You'll need to configure these manually.
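A minimal sketch of what manual configuration might look like (the mapping and helper below are illustrative, not part of the Wisent-Guard API):

# Illustrative mapping of model families to their conversation tags.
USER_TAGS = {
    "llama": {"user": "<|user|>", "assistant": "<|assistant|>"},
    "qwen": {"user": "<|im_start|>user", "assistant": "<|im_start|>assistant"},
    "mistral": {"user": "[INST]", "assistant": "[/INST]"},
}

def format_prompt(family: str, question: str) -> str:
    # Wrap the question in the family's tags so activations can be extracted
    # at the correct token positions.
    tags = USER_TAGS[family]
    return f"{tags['user']}\n{question}\n{tags['assistant']}\n"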
For detailed implementation and configuration options, check the model core file: wisent_guard/core/model.py

The model serves as the foundation for all representation engineering techniques. Its internal activations contain the representations we aim to detect and manipulate.
Every layer in the model produces activations that can be monitored, analyzed, and potentially modified to achieve desired behaviors.
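A minimal sketch of monitoring those activations with HuggingFace's output_hidden_states flag (the model name and layer index are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # illustrative example
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch, sequence_length, hidden_size).
print(len(outputs.hidden_states))
print(outputs.hidden_states[15].shape)  # activations after layer 15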
Models can be modified through techniques like control vectors and steering to influence their output generation process.
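Continuing from the previous sketch, the steering idea can be illustrated with a forward hook that adds a control vector to one layer's output during generation. The random vector stands in for a learned direction, and the layer index and scale are illustrative choices, not Wisent-Guard's defaults:

layer_idx, scale = 15, 4.0
control_vector = torch.randn(model.config.hidden_size, dtype=model.dtype, device=model.device)

def steering_hook(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states;
    # shift them along the control direction before they reach the next layer.
    if isinstance(output, tuple):
        return (output[0] + scale * control_vector,) + output[1:]
    return output + scale * control_vector

# The attribute path model.model.layers is typical of LLaMA/Qwen-style
# architectures; other model families may name their layer stack differently.
handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)
inputs = tokenizer("Tell me about safety.", return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
handle.remove()  # restore the unmodified model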