HYPERPLANE classifies activations with a decision boundary learned by logistic regression. Unlike CAA (Contrastive Activation Addition), which averages activation differences, it trains a classifier to distinguish between the two activation classes and derives the steering direction directly from the learned classifier weights.
The method fits a logistic regression classifier to separate the contrasting activations; optimization yields a decision boundary (a hyperplane), and the normal vector of that hyperplane is taken as the steering vector.
The strategy pays off most when the contrast pairs are strongly orthogonal rather than collinear: each pair then carries distinct directional information, so CAA's simple average tends toward zero. Because HYPERPLANE optimizes classification performance instead of computing that average, it can still recover a direction that separates the two classes.
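As a toy illustration of the cancellation effect (not Wisent code): when per-pair difference vectors point in nearly orthogonal directions, as random high-dimensional directions do, their mean shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 200, 512

# Each row stands in for one pair's difference vector (a+ - a-), normalized
# to unit length. Random high-dimensional directions are nearly orthogonal.
diffs = rng.standard_normal((n_pairs, dim))
diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)

# CAA-style averaging: the near-orthogonal directions largely cancel.
caa_vector = diffs.mean(axis=0)
print(np.linalg.norm(caa_vector))  # ~1/sqrt(n_pairs) ≈ 0.07, vs. 1.0 per pair
```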
The implementation uses scikit-learn's logistic regression with the L-BFGS solver. By default, the learned weights are L2-normalized so the steering direction keeps a consistent magnitude across training runs.
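A minimal sketch of the weight-extraction step, assuming `pos_acts` and `neg_acts` hold the a⁺ and a⁻ activations collected at the chosen layer (the array names and synthetic data are illustrative, not Wisent's API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 512
# Stand-ins for activations; in practice these would come from the model's
# hidden states for the contrast pairs at the chosen layer.
pos_acts = rng.standard_normal((200, dim)) + 0.5  # a+ examples
neg_acts = rng.standard_normal((200, dim)) - 0.5  # a- examples

X = np.vstack([pos_acts, neg_acts])
y = np.array([1] * len(pos_acts) + [0] * len(neg_acts))

# lbfgs is scikit-learn's default solver; C and max_iter correspond to the
# knobs exposed by the --hyperplane-C / --hyperplane-max-iter CLI options.
clf = LogisticRegression(C=1.0, max_iter=1000, solver="lbfgs").fit(X, y)

# The weight vector is the normal of the decision hyperplane; L2-normalize
# it to get a unit-length steering direction.
steering_vector = clf.coef_[0]
steering_vector = steering_vector / np.linalg.norm(steering_vector)
```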

Works: Logistic regression finds an optimal separating hyperplane between a⁺ and a⁻ clusters.

Fails: XOR-like patterns, where classes alternate across quadrants, cannot be separated by any single hyperplane.
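A quick way to see this failure mode (a toy check, not part of Wisent):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR layout: the label alternates across quadrants, so no single
# hyperplane separates the classes and accuracy stays at chance.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # ~0.5: chance-level, the fit is degenerate
```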
Example CLI usage:

# Train a HYPERPLANE steering vector from contrast pairs and save it
python -m wisent.cli tasks honesty_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 15 --save-steering-vector honesty_hyperplane.pt

# Tune the classifier: stronger regularization (lower C) and more L-BFGS iterations
python -m wisent.cli tasks safety_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 12 --hyperplane-C 0.1 --hyperplane-max-iter 2000 --save-steering-vector safety_hyperplane.pt

# Apply a saved steering vector at inference with strength 1.2
python -m wisent.cli tasks test_questions.json --from-json --steering-mode --steering-method HYPERPLANE --layer 15 --load-steering-vector honesty_hyperplane.pt --steering-strength 1.2

# Skip the default L2 normalization of the learned weights
python -m wisent.cli tasks refusal_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 18 --no-normalize --save-steering-vector refusal_hyperplane.pt
Explore the full implementation of HYPERPLANE in Wisent.