```
python -m wisent_guard tasks <task_name> [OPTIONS]
```
The CLI follows a simple pattern: specify the task(s) to run, followed by configuration options.
```
python -m wisent_guard tasks hellaswag --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 5 --steering-mode --steering-strength 1.0 --verbose
python -m wisent_guard tasks mmlu --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 10 --classifier-type logistic --verbose
```
Argument | Type | Default | Description |
---|---|---|---|
--model | str | meta-llama/Llama-3.1-8B-Instruct | Model name or path |
--layer | str | 15 | Layer(s) to extract activations from |
--limit | int | None | Limit number of documents per task |
--verbose | flag | False | Enable verbose logging |
Classification mode trains classifiers to detect harmful/incorrect content in model activations.
Argument | Type | Default | Description |
---|---|---|---|
--classifier-type | str | logistic | Type of classifier (logistic, mlp) |
--detection-threshold | float | 0.6 | Classification threshold (higher = stricter) |
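To make the threshold's role concrete, here is a minimal sketch of how a detection threshold is typically applied to a classifier's output probability. The function name and signature below are illustrative assumptions, not wisent-guard's actual API.

```python
# Hypothetical sketch: the classifier emits a probability that an activation
# vector reflects harmful/incorrect content; the threshold decides whether
# to flag it. A higher threshold demands more classifier confidence.

def is_flagged(probability: float, detection_threshold: float = 0.6) -> bool:
    """Flag content when the classifier's probability meets the threshold."""
    return probability >= detection_threshold

print(is_flagged(0.55))        # below the default 0.6 threshold: not flagged
print(is_flagged(0.75))        # above it: flagged
print(is_flagged(0.55, 0.5))   # a looser threshold flags the same score
```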
Steering mode uses Contrastive Activation Addition (CAA) to influence model behavior during generation.
Argument | Type | Default | Description |
---|---|---|---|
--steering-mode | flag | False | Enable steering mode |
--steering-strength | float | 1.0 | Steering vector strength multiplier |
Strength | Effect | Recommendation |
---|---|---|
0.5-1.0 | Subtle behavioral changes | Recommended for production |
1.0-3.0 | Noticeable but coherent changes | Good for experimentation |
5.0+ | Risk of incoherent outputs | Not recommended |
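The core idea behind CAA and the strength multiplier can be sketched in a few lines: build a steering vector as the mean activation difference between contrastive example pairs, then add the scaled vector to a layer's hidden state during generation. This is an illustrative NumPy sketch under assumed shapes and names, not wisent-guard's implementation.

```python
import numpy as np

def build_steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Steering vector = mean activation difference between contrastive pairs."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden_state: np.ndarray, vector: np.ndarray,
                   strength: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to a layer's hidden state."""
    return hidden_state + strength * vector

# Toy activations: 8 contrastive pairs, 16-dimensional hidden states (assumed).
rng = np.random.default_rng(42)
pos = rng.normal(1.0, 0.1, size=(8, 16))   # activations for desirable behavior
neg = rng.normal(-1.0, 0.1, size=(8, 16))  # activations for undesirable behavior

v = build_steering_vector(pos, neg)
steered = apply_steering(np.zeros(16), v, strength=2.0)
```

The `--steering-strength` value corresponds to the `strength` multiplier here, which is why large values (5.0+) can push hidden states far off-distribution and produce incoherent outputs.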
```
# Classification mode
python -m wisent_guard tasks mmlu --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 10 --classifier-type logistic

# Steering mode
python -m wisent_guard tasks hellaswag --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --steering-mode --steering-strength 2.0 --verbose

# Multiple tasks
python -m wisent_guard tasks hellaswag,mmlu,truthfulqa --layer 15 --limit 20 --model meta-llama/Llama-3.1-8B
```
Argument | Description | Example |
---|---|---|
command | Command to run (always `tasks`) | tasks |
task_names | Task name(s) or file path | `hellaswag`, `truthfulqa,mmlu`, `data.csv`
Argument | Type | Default | Description |
---|---|---|---|
--model | str | meta-llama/Llama-3.1-8B-Instruct | Model name or path |
--layer | str | 15 | Layer(s) to extract activations from |
--shots | int | 0 | Number of few-shot examples |
--limit | int | None | Limit number of documents per task |
--seed | int | 42 | Random seed for reproducibility |
--device | str | None | Device to run on (auto-detected if None) |
--verbose | flag | False | Enable verbose logging |
```
--model meta-llama/Llama-3.1-8B-Instruct   # HuggingFace model
--model /path/to/local/model               # Local model path
```
The `--layer` argument supports multiple formats:
Format | Description | Example |
---|---|---|
Single layer | Extract from one layer | --layer 15 |
Range | Extract from layer range | --layer 14-16 |
List | Extract from specific layers | --layer 14,15,16 |
Auto-optimize | Find optimal layer | --layer -1 |
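The four formats above can be captured by a small parser. This is a hypothetical sketch for illustration; the CLI's actual parsing logic may differ, and `parse_layer_spec` is an assumed name.

```python
def parse_layer_spec(spec: str) -> list[int]:
    """Turn a --layer value into a list of layer indices."""
    if spec == "-1":
        return [-1]  # sentinel meaning "auto-optimize: find the optimal layer"
    if "-" in spec:
        start, end = spec.split("-")
        return list(range(int(start), int(end) + 1))  # inclusive range, e.g. 14-16
    return [int(part) for part in spec.split(",")]  # single layer or comma list

print(parse_layer_spec("15"))        # [15]
print(parse_layer_spec("14-16"))     # [14, 15, 16]
print(parse_layer_spec("14,15,16"))  # [14, 15, 16]
```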
Argument | Type | Default | Description |
---|---|---|---|
--max-new-tokens | int | 300 | Maximum new tokens for generation |
--split-ratio | float | 0.8 | Train/test split ratio |
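To show how `--split-ratio` and `--seed` interact, here is a minimal sketch of a seeded 0.8 train/test split. It is illustrative only, not wisent-guard's internal code.

```python
import random

def train_test_split(docs: list, split_ratio: float = 0.8, seed: int = 42):
    """Shuffle with a fixed seed, then cut at split_ratio for reproducibility."""
    rng = random.Random(seed)  # same seed -> same shuffle -> same split
    shuffled = docs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))  # 8 train docs, 2 test docs
```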