HYPERPLANE classifies activations with a decision boundary learned by logistic regression. Unlike CAA (Contrastive Activation Addition), which averages activation differences, it trains a classifier to distinguish between the two activation classes and derives the steering direction directly from the learned classifier weights.
The method fits a logistic regression classifier to separate the contrasting activations; optimization yields a decision boundary (a hyperplane), and the normal vector of that hyperplane is taken as the steering vector.
The strategy pays off most when the contrast pairs are strongly orthogonal rather than collinear: each pair then carries distinct directional information, so CAA's simple average tends toward zero. Because HYPERPLANE optimizes classification performance instead of computing that average, it can still recover a direction that separates the two classes.
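As a toy illustration of the cancellation effect (not Wisent code): when per-pair difference vectors point in nearly orthogonal directions, as random high-dimensional directions do, their mean shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 200, 512

# Each row stands in for one pair's difference vector (a+ - a-), normalized
# to unit length. Random high-dimensional directions are nearly orthogonal.
diffs = rng.standard_normal((n_pairs, dim))
diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)

# CAA-style averaging: the near-orthogonal directions largely cancel.
caa_vector = diffs.mean(axis=0)
print(np.linalg.norm(caa_vector))  # ~1/sqrt(n_pairs) ≈ 0.07, vs. 1.0 per pair
```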
The implementation uses scikit-learn's logistic regression with the L-BFGS solver. By default, the learned weights are L2-normalized so the steering direction keeps a consistent magnitude across training runs.
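A minimal sketch of the weight-extraction step, assuming `pos_acts` and `neg_acts` hold the a⁺ and a⁻ activations collected at the chosen layer (the array names and synthetic data are illustrative, not Wisent's API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 512
# Stand-ins for activations; in practice these would come from the model's
# hidden states for the contrast pairs at the chosen layer.
pos_acts = rng.standard_normal((200, dim)) + 0.5  # a+ examples
neg_acts = rng.standard_normal((200, dim)) - 0.5  # a- examples

X = np.vstack([pos_acts, neg_acts])
y = np.array([1] * len(pos_acts) + [0] * len(neg_acts))

# lbfgs is scikit-learn's default solver; C and max_iter correspond to the
# knobs exposed by the --hyperplane-C / --hyperplane-max-iter CLI options.
clf = LogisticRegression(C=1.0, max_iter=1000, solver="lbfgs").fit(X, y)

# The weight vector is the normal of the decision hyperplane; L2-normalize
# it to get a unit-length steering direction.
steering_vector = clf.coef_[0]
steering_vector = steering_vector / np.linalg.norm(steering_vector)
```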

Works: Logistic regression finds an optimal separating hyperplane between a⁺ and a⁻ clusters.

Fails: XOR-like patterns, where classes alternate across quadrants, cannot be separated by any single hyperplane.
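A quick way to see this failure mode (a toy check, not part of Wisent):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR layout: the label alternates across quadrants, so no single
# hyperplane separates the classes and accuracy stays at chance.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # ~0.5: chance-level, the fit is degenerate
```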
Example CLI usage:

# Train a HYPERPLANE steering vector from contrast pairs and save it
python -m wisent.cli tasks honesty_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 15 --save-steering-vector honesty_hyperplane.pt

# Tune the classifier: stronger regularization (lower C) and more L-BFGS iterations
python -m wisent.cli tasks safety_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 12 --hyperplane-C 0.1 --hyperplane-max-iter 2000 --save-steering-vector safety_hyperplane.pt

# Apply a saved steering vector at inference with strength 1.2
python -m wisent.cli tasks test_questions.json --from-json --steering-mode --steering-method HYPERPLANE --layer 15 --load-steering-vector honesty_hyperplane.pt --steering-strength 1.2

# Skip the default L2 normalization of the learned weights
python -m wisent.cli tasks refusal_pairs.json --from-json --steering-mode --steering-method HYPERPLANE --layer 18 --no-normalize --save-steering-vector refusal_hyperplane.pt
Explore the full implementation of HYPERPLANE in Wisent.