HPR

HPR (Householder Pseudo-Rotation) is a steering method that transforms activations with a Householder matrix instead of adding a vector to them, with the aim of more stable steering.

How HPR Works

HPR starts exactly like CAA: it computes the difference between the mean activations of the positive examples and the mean activations of the negative examples. It then normalizes this difference vector and builds a Householder matrix from it using the formula H = I - 2vv^T, where v is the normalized steering vector and I is the identity matrix. Strictly speaking, H is a reflection across the hyperplane orthogonal to v rather than a true rotation; like a rotation, it is an orthogonal transformation.
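The construction above can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the code in hpr.py: the helper names (mean_vector, householder_matrix) and the activation values are hypothetical, and a real implementation would use tensors rather than lists.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def householder_matrix(v):
    """Return H = I - 2 v v^T for a unit vector v, as a list of rows."""
    d = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] for j in range(d)]
            for i in range(d)]

# Toy activations (hypothetical values): two positive and two negative
# examples with hidden size 3.
positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
negative = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

# Step 1 (same as CAA): difference between the two mean activations.
diff = [p - n for p, n in zip(mean_vector(positive), mean_vector(negative))]

# Step 2: normalize the difference to unit length.
norm = math.sqrt(sum(x * x for x in diff))
v = [x / norm for x in diff]

# Step 3: build the Householder matrix from the unit vector.
H = householder_matrix(v)

for row in H:
    print(row)
```

In this toy case v points along the first axis, so H flips the sign of the first coordinate and leaves the others unchanged. H is symmetric and is its own inverse (H multiplied by H gives the identity).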

During inference, instead of adding the vector to the activations as CAA does, HPR multiplies the activations by this Householder matrix, moving the model's internal representations toward the desired behavior. A beta parameter controls the strength of the transformation. The matrix multiplication is performed on flattened activations, which are then reshaped back to their original shape, and it targets the second-to-last token position.
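A rough sketch of the inference step follows. How beta enters the transform is not spelled out here, so the code assumes one plausible form, (I - 2*beta*vv^T), in which beta = 1.0 gives the full Householder matrix and beta = 0.0 leaves activations unchanged; the actual hpr.py may scale the transform differently. The function names are hypothetical, and activations are represented as a plain list of per-token vectors instead of a flattened tensor.

```python
def hpr_transform(x, v, beta=1.0):
    """Apply (I - 2*beta*v v^T) to x without building the matrix.

    Assumption: beta scales the reflection term. v must be a unit vector.
    """
    proj = sum(a * b for a, b in zip(v, x))  # v^T x
    return [a - 2.0 * beta * proj * b for a, b in zip(x, v)]

def steer_second_to_last(activations, v, beta=1.0):
    """Transform only the second-to-last token; return a new list.

    activations: list of per-token hidden vectors, shape [seq_len][hidden].
    """
    out = [list(row) for row in activations]
    if len(out) >= 2:
        out[-2] = hpr_transform(out[-2], v, beta)
    return out

v = [1.0, 0.0, 0.0]  # unit steering direction (toy value)
acts = [[1.0, 1.0, 0.0], [2.0, 3.0, 0.0], [4.0, 5.0, 0.0]]

full = steer_second_to_last(acts, v, beta=1.0)
half = steer_second_to_last(acts, v, beta=0.5)

print(full[-2])  # component along v is flipped
print(half[-2])  # component along v is removed under this assumed form
```

Computing v^T x first and subtracting a multiple of v avoids materializing the full hidden-by-hidden matrix, which matters when the hidden size is in the thousands.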

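Because a Householder matrix is orthogonal, applying it changes the direction of an activation without changing its length, whereas vector addition changes both. This preserves the geometric structure of the activation space better than simple addition and makes HPR more stable at larger steering strengths.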
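The difference in behavior can be checked numerically. The toy comparison below, with hypothetical helper names and values, contrasts additive steering with a full Householder reflection. It covers only the full matrix H = I - 2vv^T; how intermediate beta values affect the norm depends on how the implementation scales the transform.

```python
import math

def l2_norm(x):
    return math.sqrt(sum(a * a for a in x))

def add_steering(x, v, strength):
    """CAA-style steering: add a scaled vector to the activation."""
    return [a + strength * b for a, b in zip(x, v)]

def reflect(x, v):
    """Apply H = I - 2 v v^T to x, for a unit vector v."""
    proj = sum(a * b for a, b in zip(v, x))
    return [a - 2.0 * proj * b for a, b in zip(x, v)]

x = [3.0, 4.0]   # toy activation with norm 5
v = [1.0, 0.0]   # unit steering direction

# Additive steering: the norm grows with the steering strength.
for strength in (1.0, 5.0, 20.0):
    print(strength, l2_norm(add_steering(x, v, strength)))

# Householder reflection: the norm stays at 5 regardless.
print(l2_norm(reflect(x, v)))
```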

CLI Examples

# Basic HPR training

python -m wisent_guard.cli tasks confidence_pairs.json --from-json --steering-mode --steering-method HPR --layer 16 --save-steering-vector confidence_hpr.pt

# HPR with custom beta parameter (rotation strength)

python -m wisent_guard.cli tasks accuracy_pairs.json --from-json --steering-mode --steering-method HPR --layer 14 --hpr-beta 0.8 --save-steering-vector accuracy_hpr.pt

# HPR inference with high steering strength

python -m wisent_guard.cli tasks test_scenarios.json --from-json --steering-mode --steering-method HPR --layer 16 --load-steering-vector confidence_hpr.pt --steering-strength 2.0

# HPR with exponential token decay strategy

python -m wisent_guard.cli tasks creativity_pairs.json --from-json --steering-mode --steering-method HPR --layer 13 --enable-token-steering --token-steering-strategy exponential_decay --token-decay-rate 0.7

# HPR training on GPU with large dataset

python -m wisent_guard.cli tasks ethics_pairs.json --from-json --steering-mode --steering-method HPR --layer 17 --device cuda --max-new-tokens 100 --save-steering-vector ethics_hpr.pt

Parameters

HPR Specific Parameters

--hpr-beta: Rotation strength parameter (0.0-1.0, default 1.0)

Token Steering Parameters

--enable-token-steering: Enable position-based steering
--token-steering-strategy: last_only, second_to_last, first_only, all_equal, exponential_decay, exponential_growth, linear_decay, linear_growth
--token-decay-rate: Decay rate for exponential strategies (default 0.5)
--token-min-strength: Minimum strength for decay strategies (default 0.1)
--token-max-strength: Maximum strength for growth strategies (default 1.0)
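To make the token steering options more concrete, the sketch below shows one way per-position weights could be derived for a few of the strategies. It is an illustration under stated assumptions, not the Wisent-Guard implementation: the function token_weights is hypothetical, and the exponential_decay schedule assumes weights shrink geometrically from the first position onward and are floored at the minimum strength. The real schedule may differ in direction or formula.

```python
def token_weights(seq_len, strategy, decay_rate=0.5, min_strength=0.1):
    """Return one steering weight per token position (hypothetical helper)."""
    if seq_len <= 0:
        return []
    weights = [0.0] * seq_len
    if strategy == "last_only":
        weights[-1] = 1.0
    elif strategy == "second_to_last":
        # Fall back to the only token when the sequence has length 1.
        weights[-2 if seq_len >= 2 else -1] = 1.0
    elif strategy == "first_only":
        weights[0] = 1.0
    elif strategy == "all_equal":
        weights = [1.0] * seq_len
    elif strategy == "exponential_decay":
        # Assumed schedule: decay_rate**i, never below min_strength.
        weights = [max(min_strength, decay_rate ** i) for i in range(seq_len)]
    else:
        raise ValueError(f"strategy not covered by this sketch: {strategy}")
    return weights

print(token_weights(5, "second_to_last"))
print(token_weights(5, "exponential_decay", decay_rate=0.5, min_strength=0.1))
```

Each weight would then multiply the steering effect at its position, so a weight of 0.0 leaves that token's activation untouched.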

Implementation Details

For the complete implementation of the HPR steering method in Wisent-Guard, see:

hpr.py
Original Paper
Original Implementation
