Generate model responses for benchmark tasks, optionally with steering applied. This command is useful for creating response datasets that can be evaluated later.
python -m wisent generate-responses MODEL --task TASK --output FILE [OPTIONS]
python -m wisent generate-responses meta-llama/Llama-3.1-8B-Instruct \
  --task truthfulqa_mc1 \
  --num-questions 50 \
  --output ./responses/truthfulqa_baseline.json
python -m wisent generate-responses meta-llama/Llama-3.1-8B-Instruct \
  --task arc_easy \
  --num-questions 100 \
  --use-steering \
  --steering-object ./vectors/accuracy.pt \
  --steering-strength 1.5 \
  --output ./responses/arc_steered.json
python -m wisent generate-responses meta-llama/Llama-3.1-8B-Instruct \
  --task gsm8k \
  --num-questions 20 \
  --max-new-tokens 256 \
  --temperature 0.3 \
  --top-p 0.9 \
  --verbose \
  --output ./responses/gsm8k.json
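To produce baseline response files for several benchmarks in one pass, the single-task invocations above can be wrapped in a small loop. The following is a minimal sketch and not part of Wisent: the `run` and `generate_baseline` helpers, the `DRY_RUN` variable, and the `<task>_baseline.json` naming are illustrative assumptions. It uses only flags that appear in the examples above, and by default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: generate baseline responses for several tasks.
# DRY_RUN, run, and generate_baseline are hypothetical helpers, not Wisent features.

MODEL="meta-llama/Llama-3.1-8B-Instruct"
OUT_DIR="./responses"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = task name, $2 = number of questions (falls back to 50).
generate_baseline() {
  run python -m wisent generate-responses "$MODEL" \
    --task "$1" \
    --num-questions "${2:-50}" \
    --output "$OUT_DIR/${1}_baseline.json"
}

# Whether Wisent creates a missing output directory is not stated here,
# so create it up front when the commands are actually executed.
[ "${DRY_RUN:-1}" = "1" ] || mkdir -p "$OUT_DIR"

for task in truthfulqa_mc1 arc_easy gsm8k; do
  generate_baseline "$task"
done
```

Run the script once as-is to review the printed commands, then rerun it with `DRY_RUN=0` to execute them.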
| Argument | Description |
|---|---|
| model | Model name or path (positional argument) |
| --task | Task name (e.g., arc_easy, truthfulqa_mc1) |
| --output | Output file path for results |
| Argument | Default | Description |
|---|---|---|
| --num-questions | 10 | Number of questions to generate responses for |
| --max-new-tokens | 128 | Maximum number of new tokens to generate |
| --temperature | 0.7 | Sampling temperature for generation |
| --top-p | 0.95 | Top-p threshold for nucleus sampling |
| --device | auto | Device to use (cpu, cuda, mps) |
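Because omitted options fall back to the defaults in the table above, one way to keep a run reproducible is to pass every generation option explicitly. The sketch below assumes the defaults exactly as listed. The `generation_flags` helper, the environment variable names (`NUM_QUESTIONS`, `MAX_NEW_TOKENS`, `TEMPERATURE`, `TOP_P`, `DEVICE`), and the output path are hypothetical. The table lists `auto` as the device default but only `cpu`, `cuda`, and `mps` as values, so the sketch leaves `--device` out unless `DEVICE` is set.

```shell
#!/bin/sh
# Sketch: spell out the documented defaults as explicit flags.
# Each value can be overridden through a (hypothetical) environment variable.

generation_flags() {
  flags="--num-questions ${NUM_QUESTIONS:-10} --max-new-tokens ${MAX_NEW_TOKENS:-128}"
  flags="$flags --temperature ${TEMPERATURE:-0.7} --top-p ${TOP_P:-0.95}"
  # Only pass --device when one was requested explicitly.
  if [ -n "${DEVICE:-}" ]; then
    flags="$flags --device $DEVICE"
  fi
  echo "$flags"
}

# Print the full command for review; nothing is executed here.
echo "python -m wisent generate-responses meta-llama/Llama-3.1-8B-Instruct" \
     "--task arc_easy $(generation_flags) --output ./responses/arc_easy.json"
```

Printing the command rather than running it makes the resolved settings easy to record alongside the output file.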
| Argument | Default | Description |
|---|---|---|
| --use-steering | false | Use steering during generation |
| --steering-object | - | Path to steering object file (.pt) |
| --steering-strength | 1.0 | Steering strength multiplier |
| --disable-thinking | false | Disable thinking mode (Qwen models) |
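A natural use of the steering options is to compare several strengths against the same steering object and task. The following is a minimal sketch under stated assumptions: the strength values, the `arc_steered_<strength>.json` naming, and the `DRY_RUN`, `run`, and `generate_steered` helpers are illustrative and not part of Wisent. The flags are taken from the table above. By default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: sweep --steering-strength for one task and one steering object.
# DRY_RUN, run, and generate_steered are hypothetical helpers.

MODEL="meta-llama/Llama-3.1-8B-Instruct"
STEERING_OBJECT="./vectors/accuracy.pt"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = steering strength; each strength gets its own output file.
generate_steered() {
  run python -m wisent generate-responses "$MODEL" \
    --task arc_easy \
    --num-questions 100 \
    --use-steering \
    --steering-object "$STEERING_OBJECT" \
    --steering-strength "$1" \
    --output "./responses/arc_steered_${1}.json"
}

for strength in 0.5 1.0 1.5; do
  generate_steered "$strength"
done
```

Keeping the task, question count, and steering object fixed means the resulting files differ only in steering strength, which makes later comparison simpler.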