DSO: Direct Steering Optimization for Bias Mitigation

Generative models are increasingly used to make decisions on behalf of users, for example vision-language models (VLMs) that identify which person in a room is the doctor in order to assist visually impaired users. However, VLM decisions can be influenced by demographic cues in the input, leading to biased outcomes such as failing to identify women as doctors. Moreover, because bias mitigation can degrade task performance, different users may weigh fairness against overall model capability differently, which motivates methods for controllable bias mitigation at decision time. Activation steering is a popular inference-time control technique that has shown promise in inducing safe behavior in large language models (LLMs). However, we find that current steering approaches are ill-suited to bias mitigation, which requires equalizing outcomes across demographic groups. To address this, we propose Direct Steering Optimization (DSO), which uses reinforcement learning to directly optimize steering interventions on a model's activations, training them to mitigate bias while retaining control over model performance. We show that DSO achieves a state-of-the-art fairness-performance trade-off on both VLMs and LLMs, while giving practitioners control over the trade-off at inference time. Overall, our work highlights the benefit of designing steering strategies directly optimized for controlling model behavior, which yield more effective bias interventions than approaches that rely on predefined steering heuristics.
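As background, activation steering modifies a model's hidden activations at inference time, typically by adding a steering vector at a chosen layer; a scalar coefficient then lets the user trade intervention strength against task performance. A minimal NumPy sketch of this mechanism (the function, variable names, and toy dimensions are illustrative, not the paper's implementation):

```python
import numpy as np

def steer_activations(hidden, steering_vector, alpha):
    """Add a scaled steering vector to every token's hidden state.

    hidden: (seq_len, d_model) activations at one layer
    steering_vector: (d_model,) direction that shifts model behavior
    alpha: scalar strength; 0.0 disables the intervention
    """
    return hidden + alpha * steering_vector

# Toy example: a 3-token sequence with 4-dimensional activations.
hidden = np.zeros((3, 4))
v = np.array([1.0, 0.0, -1.0, 0.0])

steered = steer_activations(hidden, v, alpha=0.5)
print(steered[0])  # → [ 0.5  0.  -0.5  0. ]
```

In this framing, a heuristic approach fixes `steering_vector` in advance (e.g. from activation differences between groups), whereas DSO instead optimizes the intervention itself with reinforcement learning toward a bias-reduction objective.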
- † Carnegie Mellon University
- ‡ Equal contribution
- ** Work done while at Apple



