cluster-benchmarks

Clusters benchmarks by the similarity of their activations to identify groups of tasks that share similar internal representations. The resulting groups indicate which benchmarks might benefit from a shared steering vector and can guide the design of efficient multi-task optimization strategies.
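To make the idea concrete, here is a minimal sketch of similarity-based grouping. It is not Wisent's implementation: it assumes each benchmark has already been summarized as a single activation vector, uses cosine similarity, and merges any two benchmarks whose similarity reaches a threshold (single-linkage grouping). The benchmark names, vectors, threshold, and function names are all hypothetical toy values chosen for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_benchmarks(vectors, threshold=0.8):
    """Group benchmarks whose pairwise cosine similarity is >= threshold.

    Assumption: `vectors` maps a benchmark name to one summary activation
    vector. Groups are formed by single linkage using union-find.
    """
    parent = {name: name for name in vectors}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy vectors, not real model activations.
toy_vectors = {
    "bench_a": [1.0, 0.1, 0.0],
    "bench_b": [0.9, 0.2, 0.0],
    "bench_c": [0.0, 0.1, 1.0],
    "bench_d": [0.1, 0.0, 0.9],
}

clusters = cluster_benchmarks(toy_vectors)
print(clusters)
```

With these toy vectors, bench_a and bench_b point in nearly the same direction, as do bench_c and bench_d, so the sketch returns two groups. A real run would work with much higher-dimensional activations and may use a different similarity measure or clustering method.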

Basic Usage
python -m wisent cluster-benchmarks --model MODEL [OPTIONS]

Examples

Basic Clustering
python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --output ./cluster_results/

With More Pairs
python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --pairs-per-benchmark 100 \
  --output ./detailed_clusters/

GPU Acceleration
python -m wisent cluster-benchmarks \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --device cuda \
  --pairs-per-benchmark 50 \
  --output ./gpu_clusters/

Arguments

Argument                 Default            Description
--model                  required           Model name or path (e.g., meta-llama/Llama-3.2-1B-Instruct)
--output                 ./cluster_output   Output directory for results
--pairs-per-benchmark    50                 Number of contrastive pairs per benchmark
--device                 auto               Device to use (cuda/mps/cpu). Auto-detected if not specified.
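If you want to launch the command from a script, for example to compare several values of --pairs-per-benchmark, a small wrapper can assemble the argument list. The sketch below uses only the flags and defaults listed above. The helper names build_command and run_clustering are hypothetical and not part of Wisent, and the sketch assumes that wisent is installed in the same Python environment as the script.

```python
import subprocess
import sys


def build_command(model, output="./cluster_output",
                  pairs_per_benchmark=50, device=None):
    """Build the argument list for `python -m wisent cluster-benchmarks`.

    Hypothetical helper: it only uses the flags documented above.
    When `device` is None, --device is omitted so the CLI auto-detects it.
    """
    cmd = [
        sys.executable, "-m", "wisent", "cluster-benchmarks",
        "--model", model,
        "--output", output,
        "--pairs-per-benchmark", str(pairs_per_benchmark),
    ]
    if device is not None:
        cmd += ["--device", device]
    return cmd


def run_clustering(**kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)


# Print the command that would be executed, without running it.
example = build_command(
    "meta-llama/Llama-3.1-8B-Instruct",
    output="./gpu_clusters/",
    device="cuda",
)
print(" ".join(example))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths and model names. Call run_clustering(...) with the same keyword arguments to actually start the job.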

Output

The command generates:

  • Cluster assignments for each benchmark
  • Similarity matrices showing relationships between benchmarks
  • Visualizations of the clustering results
  • Recommendations for which benchmarks to combine for multi-task steering

Related Commands
