cluster-benchmarks

Clusters benchmarks by activation similarity to identify groups of tasks with similar internal representations. The resulting groups indicate which benchmarks may be able to share steering vectors, and they help in designing efficient multi-task optimization strategies.
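To make the idea concrete, here is a minimal sketch of grouping benchmarks by similarity. It assumes each benchmark has already been reduced to a single direction vector (for example, a mean activation difference over its contrastive pairs). The vectors, the 0.8 threshold, the cosine metric, and the single-linkage grouping rule are all illustrative assumptions, not a description of what the command does internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_by_similarity(vectors, threshold=0.8):
    """Single-linkage grouping: two benchmarks land in the same cluster
    when their cosine similarity is at least `threshold` (assumed value)."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(group) for group in clusters.values())

# Toy per-benchmark vectors, made up for illustration only.
toy_vectors = {
    "bench_a": [1.0, 0.1, 0.0],
    "bench_b": [0.9, 0.2, 0.0],
    "bench_c": [0.0, 0.1, 1.0],
}
clusters = cluster_by_similarity(toy_vectors)
print(clusters)
```

In this toy data, bench_a and bench_b point in nearly the same direction and end up together, while bench_c stays in its own cluster.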

Basic Usage

python -m wisent cluster-benchmarks --model MODEL [OPTIONS]

Examples

Basic Clustering

python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output ./cluster_results/

With More Pairs

python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --pairs-per-benchmark 100 \
  --output ./detailed_clusters/

GPU Acceleration

python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --device cuda \
  --pairs-per-benchmark 50 \
  --output ./gpu_clusters/
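
The invocations above can also be assembled from a script, which is convenient when sweeping over models or pair counts. The sketch below only builds the argument list from the flags documented on this page. The helper name build_command is hypothetical and not part of wisent.

```python
import shlex

def build_command(model, output=None, pairs_per_benchmark=None, device=None):
    """Build the argv list for `python -m wisent cluster-benchmarks`.
    Only flags listed on this page are emitted; omitted ones use CLI defaults."""
    cmd = ["python", "-m", "wisent", "cluster-benchmarks", "--model", model]
    if output is not None:
        cmd += ["--output", output]
    if pairs_per_benchmark is not None:
        cmd += ["--pairs-per-benchmark", str(pairs_per_benchmark)]
    if device is not None:
        cmd += ["--device", device]
    return cmd

cmd = build_command(
    "meta-llama/Llama-3.1-8B-Instruct",
    output="./gpu_clusters/",
    pairs_per_benchmark=50,
    device="cuda",
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues with model names or output paths.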

Arguments

Argument	Default	Description
--model	required	Model name or path (e.g., meta-llama/Llama-3.2-1B-Instruct)
--output	./cluster_output	Output directory for results
--pairs-per-benchmark	50	Number of contrastive pairs per benchmark
--device	auto	Device to use (cuda/mps/cpu). Auto-detected if not specified.
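
As an illustration of what device auto-detection typically looks like, the sketch below picks the first available backend using PyTorch's availability checks. The function name resolve_device and the preference order (cuda, then mps, then cpu) are assumptions. The page does not describe how the command itself resolves the device.

```python
def resolve_device(requested=None):
    """Return `requested` if given; otherwise guess a device.
    Assumed preference order: cuda, then mps, then cpu."""
    if requested:
        return requested
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(resolve_device())
```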

Output

The command generates:

Cluster assignments for each benchmark
Similarity matrices showing relationships between benchmarks
Visualizations of the clustering results
Recommendations for which benchmarks to combine for multi-task steering

Related Commands

geometry-search - Search for optimal steering geometry
check-linearity - Check representation linearity
modify-weights - Multi-benchmark weight modification
