```
python -m wisent_guard tasks <task_name> [OPTIONS]
```
The CLI follows a simple pattern: specify the task(s) to run, followed by configuration options.
```
python -m wisent_guard tasks hellaswag --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 5 --steering-mode --steering-strength 1.0 --verbose
python -m wisent_guard tasks mmlu --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 10 --classifier-type logistic --verbose
```
Argument | Type | Default | Description |
---|---|---|---|
--model | str | meta-llama/Llama-3.1-8B-Instruct | Model name or path |
--layer | str | 15 | Layer(s) to extract activations from |
--limit | int | None | Limit number of documents per task |
--verbose | flag | False | Enable verbose logging |
Classification mode trains classifiers to detect harmful/incorrect content in model activations.
Argument | Type | Default | Description |
---|---|---|---|
--classifier-type | str | logistic | Type of classifier (logistic, mlp) |
--detection-threshold | float | 0.6 | Classification threshold (higher = stricter) |
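To make the threshold's role concrete, here is a minimal sketch of how a detection threshold is typically applied to a classifier's output probability. The function name and signature below are illustrative assumptions, not wisent-guard's actual API.

```python
# Hypothetical sketch: the classifier emits a probability that an activation
# vector reflects harmful/incorrect content; the threshold decides whether
# to flag it. A higher threshold demands more classifier confidence.

def is_flagged(probability: float, detection_threshold: float = 0.6) -> bool:
    """Flag content when the classifier's probability meets the threshold."""
    return probability >= detection_threshold

print(is_flagged(0.55))        # below the default 0.6 threshold: not flagged
print(is_flagged(0.75))        # above it: flagged
print(is_flagged(0.55, 0.5))   # a looser threshold flags the same score
```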
Steering mode uses Contrastive Activation Addition (CAA) to influence model behavior during generation.
Argument | Type | Default | Description |
---|---|---|---|
--steering-mode | flag | False | Enable steering mode |
--steering-strength | float | 1.0 | Steering vector strength multiplier |
Strength | Effect | Recommendation |
---|---|---|
0.5-1.0 | Subtle behavioral changes | Recommended for production |
1.0-3.0 | Noticeable but coherent changes | Good for experimentation |
5.0+ | Risk of incoherent outputs | Not recommended |
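The core idea behind CAA and the strength multiplier can be sketched in a few lines: build a steering vector as the mean activation difference between contrastive example pairs, then add the scaled vector to a layer's hidden state during generation. This is an illustrative NumPy sketch under assumed shapes and names, not wisent-guard's implementation.

```python
import numpy as np

def build_steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Steering vector = mean activation difference between contrastive pairs."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden_state: np.ndarray, vector: np.ndarray,
                   strength: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to a layer's hidden state."""
    return hidden_state + strength * vector

# Toy activations: 8 contrastive pairs, 16-dimensional hidden states (assumed).
rng = np.random.default_rng(42)
pos = rng.normal(1.0, 0.1, size=(8, 16))   # activations for desirable behavior
neg = rng.normal(-1.0, 0.1, size=(8, 16))  # activations for undesirable behavior

v = build_steering_vector(pos, neg)
steered = apply_steering(np.zeros(16), v, strength=2.0)
```

The `--steering-strength` value corresponds to the `strength` multiplier here, which is why large values (5.0+) can push hidden states far off-distribution and produce incoherent outputs.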
```
# Classification mode
python -m wisent_guard tasks mmlu --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --limit 10 --classifier-type logistic

# Steering mode
python -m wisent_guard tasks hellaswag --model meta-llama/Llama-3.1-8B-Instruct --layer 15 --steering-mode --steering-strength 2.0 --verbose

# Multiple tasks
python -m wisent_guard tasks hellaswag,mmlu,truthfulqa --layer 15 --limit 20 --model meta-llama/Llama-3.1-8B
```
Argument | Description | Example |
---|---|---|
command | Command to run (always `tasks`) | tasks |
task_names | Task name(s) or file path | `hellaswag`, `truthfulqa,mmlu`, `data.csv`
Argument | Type | Default | Description |
---|---|---|---|
--model | str | meta-llama/Llama-3.1-8B-Instruct | Model name or path |
--layer | str | 15 | Layer(s) to extract activations from |
--shots | int | 0 | Number of few-shot examples |
--limit | int | None | Limit number of documents per task |
--seed | int | 42 | Random seed for reproducibility |
--device | str | None | Device to run on (auto-detected if None) |
--verbose | flag | False | Enable verbose logging |
```
--model meta-llama/Llama-3.1-8B-Instruct   # HuggingFace model
--model /path/to/local/model               # Local model path
```
The `--layer` argument supports multiple formats:
Format | Description | Example |
---|---|---|
Single layer | Extract from one layer | --layer 15 |
Range | Extract from layer range | --layer 14-16 |
List | Extract from specific layers | --layer 14,15,16 |
Auto-optimize | Find optimal layer | --layer -1 |
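The four formats above can be captured by a small parser. This is a hypothetical sketch for illustration; the CLI's actual parsing logic may differ, and `parse_layer_spec` is an assumed name.

```python
def parse_layer_spec(spec: str) -> list[int]:
    """Turn a --layer value into a list of layer indices."""
    if spec == "-1":
        return [-1]  # sentinel meaning "auto-optimize: find the optimal layer"
    if "-" in spec:
        start, end = spec.split("-")
        return list(range(int(start), int(end) + 1))  # inclusive range, e.g. 14-16
    return [int(part) for part in spec.split(",")]  # single layer or comma list

print(parse_layer_spec("15"))        # [15]
print(parse_layer_spec("14-16"))     # [14, 15, 16]
print(parse_layer_spec("14,15,16"))  # [14, 15, 16]
```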
Argument | Type | Default | Description |
---|---|---|---|
--max-new-tokens | int | 300 | Maximum new tokens for generation |
--split-ratio | float | 0.8 | Train/test split ratio |
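To show how `--split-ratio` and `--seed` interact, here is a minimal sketch of a seeded 0.8 train/test split. It is illustrative only, not wisent-guard's internal code.

```python
import random

def train_test_split(docs: list, split_ratio: float = 0.8, seed: int = 42):
    """Shuffle with a fixed seed, then cut at split_ratio for reproducibility."""
    rng = random.Random(seed)  # same seed -> same shuffle -> same split
    shuffled = docs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))  # 8 train docs, 2 test docs
```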