evaluate

Evaluate a single prompt with a steering vector applied and report quality scores for the response. Useful for checking, in real time and one input at a time, how effective the steering is.

Basic Usage
python -m wisent evaluate --vector FILE --prompt TEXT --model MODEL --trait NAME [OPTIONS]

Examples

Basic Evaluation
python -m wisent evaluate \
  --vector ./vectors/helpfulness.pt \
  --prompt "What is the best way to learn programming?" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --trait helpfulness

With Custom Strength
python -m wisent evaluate \
  --vector ./vectors/cynical.pt \
  --prompt "What do you think about the future of AI?" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --trait cynical \
  --trait-description "responds with cynical worldview" \
  --steering-strength 2.0

With Thresholds
python -m wisent evaluate \
  --vector ./vectors/honest.pt \
  --prompt "Tell me about your capabilities" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --trait honest \
  --trait-threshold 0.5 \
  --answer-threshold 0.7

JSON Output
python -m wisent evaluate \
  --vector ./vectors/creative.pt \
  --prompt "Write a short story opening" \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --trait creative \
  --json
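
Scripted Invocation (sketch)
When the same vector needs to be checked against several prompts, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of wisent itself: the helper name build_evaluate_command and the prompt list are hypothetical, and it uses only flags documented on this page. It prints the commands rather than running them.

```python
import shlex


def build_evaluate_command(vector, prompt, model, trait, *,
                           steering_strength=None,
                           trait_threshold=None,
                           answer_threshold=None,
                           json_output=False):
    """Build the argv list for `python -m wisent evaluate`.

    Hypothetical helper: only flags documented on this page are used.
    """
    argv = [
        "python", "-m", "wisent", "evaluate",
        "--vector", vector,
        "--prompt", prompt,
        "--model", model,
        "--trait", trait,
    ]
    # Optional flags are added only when a value was supplied.
    if steering_strength is not None:
        argv += ["--steering-strength", str(steering_strength)]
    if trait_threshold is not None:
        argv += ["--trait-threshold", str(trait_threshold)]
    if answer_threshold is not None:
        argv += ["--answer-threshold", str(answer_threshold)]
    if json_output:
        argv.append("--json")
    return argv


# Illustrative prompts taken from the examples on this page.
prompts = [
    "What is the best way to learn programming?",
    "Tell me about your capabilities",
]

for prompt in prompts:
    argv = build_evaluate_command(
        "./vectors/helpfulness.pt",
        prompt,
        "meta-llama/Llama-3.1-8B-Instruct",
        "helpfulness",
        json_output=True,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) would execute the command without going through a shell, which avoids quoting problems in prompts that contain quotes or spaces.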

Arguments

Required

Argument    Description
--vector    Path to steering vector file (.pt)
--prompt    Prompt to evaluate
--model     Model name or path
--trait     Trait name (e.g., 'catholic', 'cynical')

Optional Configuration

Argument              Default      Description
--device              auto         Device to run on
--steering-strength   2.0          Steering strength to apply
--max-new-tokens      100          Maximum new tokens to generate
--trait-description   trait name   Optional description of the trait

Threshold Parameters

Argument             Description
--trait-threshold    Minimum trait quality threshold (-1 to 1 scale)
--answer-threshold   Minimum answer quality threshold (0 to 1 scale)

Output Options

Argument    Description
--verbose   Enable verbose output
--json      Output results as JSON

Output Scores

  • Trait Score - How well the response exhibits the target trait (-1 to 1)
  • Answer Quality - Overall quality of the answer (0 to 1)
  • Generated Response - The steered model output
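
Interpreting Scores Against Thresholds (sketch)
The score ranges above and the two threshold flags can be related with a small check. This is a sketch under stated assumptions, not wisent's implementation: the class EvaluationScores and the function meets_thresholds are hypothetical, the score values are made up, and it assumes that "minimum threshold" means a response passes when each score is greater than or equal to its threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationScores:
    """Hypothetical container for the two numeric scores listed above."""
    trait_score: float     # documented range: -1 to 1
    answer_quality: float  # documented range: 0 to 1

    def __post_init__(self):
        # Reject values outside the ranges documented on this page.
        if not -1.0 <= self.trait_score <= 1.0:
            raise ValueError("trait_score must be between -1 and 1")
        if not 0.0 <= self.answer_quality <= 1.0:
            raise ValueError("answer_quality must be between 0 and 1")


def meets_thresholds(scores, trait_threshold, answer_threshold):
    """Assumption: pass when each score is at least its minimum threshold."""
    return (scores.trait_score >= trait_threshold
            and scores.answer_quality >= answer_threshold)


# Made-up scores, checked against the thresholds from the
# "With Thresholds" example (0.5 and 0.7).
scores = EvaluationScores(trait_score=0.62, answer_quality=0.81)
print(meets_thresholds(scores, trait_threshold=0.5, answer_threshold=0.7))
```

Because the trait scale runs from -1 to 1, a trait threshold of 0.5 sits three quarters of the way up the scale, while an answer threshold of 0.7 on the 0 to 1 scale is a 70% cutoff.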
