Controlling Language and Diffusion Models by Transporting Activations

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviors in the generated output. In this paper we introduce Activation Transport (AcT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. AcT is modality-agnostic and provides fine-grained control over model behavior with negligible computational overhead, while minimally impacting model abilities. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. For T2Is, we show how AcT enables fine-grained style control and concept negation.
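As a rough illustration of the underlying idea (a minimal sketch, not the paper's actual implementation), the snippet below fits a per-unit affine map between activations collected under a source behavior and a target behavior, using the closed-form one-dimensional optimal transport map between Gaussians, a' = (sigma_t / sigma_s)(a - mu_s) + mu_t. The function names, the Gaussian assumption, and the `strength` interpolation parameter are illustrative assumptions, not the paper's API.

```python
import numpy as np

def fit_linear_transport(src_acts, tgt_acts, eps=1e-8):
    """Fit a per-unit affine map a' = w * a + b that transports
    activations sampled under a source behavior onto the target
    activation distribution (closed-form 1D OT map between Gaussians).

    src_acts, tgt_acts: arrays of shape (num_samples, num_units).
    Returns per-unit slope w and intercept b, each of shape (num_units,).
    """
    mu_s, mu_t = src_acts.mean(axis=0), tgt_acts.mean(axis=0)
    sd_s, sd_t = src_acts.std(axis=0) + eps, tgt_acts.std(axis=0)
    w = sd_t / sd_s
    b = mu_t - w * mu_s
    return w, b

def apply_transport(acts, w, b, strength=1.0):
    """Interpolate between the original activation and its fully
    transported version; strength in [0, 1] gives graded control."""
    return acts + strength * ((w * acts + b) - acts)

# Toy usage: shift activations drawn from one Gaussian toward another.
rng = np.random.default_rng(0)
src = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))   # "source" behavior
tgt = rng.normal(loc=2.0, scale=0.5, size=(1000, 8))   # "target" behavior
w, b = fit_linear_transport(src, tgt)
steered = apply_transport(src, w, b, strength=0.5)      # partial steering
```

In a real model, such a map would be estimated per layer from paired forward passes and applied to the hidden activations at inference time; the interpolation parameter is what allows the fine-grained, graded control described in the abstract.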