Local Mechanisms of Compositional Generalization in Conditional Diffusion

Conditional diffusion models appear to be capable of compositional generalization, i.e., of generating convincing samples for combinations of conditions that fall outside the training distribution, but the mechanisms underlying this ability remain unclear. To make this concrete, we study length generalization: the ability to generate images containing more objects than were seen during training. In a controlled CLEVR setting (Johnson et al., 2017), we find that length generalization is achievable in some cases but not others, suggesting that models sometimes learn the underlying compositional structure. We then investigate locality as a mechanism for compositional generalization. Prior work proposed locality as a generalization mechanism in unconditional diffusion models (Kamb & Ganguli, 2024; Niedoba et al., 2024), but did not address conditioning or compositional generalization. In this paper, we prove an exact equivalence between a specific compositional structure (projective composition; Bradley et al., 2025) and scores with sparse dependence on both pixels and the conditioning (local conditional scores). The theory also extends to concept composition (e.g., style + content) in feature space. We confirm the theory empirically: CLEVR models that succeed at length generalization exhibit local conditional scores, while those that fail do not. Furthermore, we show that a causal intervention enforcing local conditional scores enables length generalization in a model that previously failed. Finally, we investigate Stable Diffusion XL (SDXL) and find that, in pixel space, locality holds but conditional locality generally does not; however, we find evidence of local conditional scores in the network's feature space.
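As an informal sketch of the notion of a local conditional score (the notation here, including the neighborhood maps $N(i)$ and $M(i)$ and the per-coordinate functions $f_i$, is illustrative and not taken from the paper): for a conditional score network $s_\theta(x, c, t)$, locality in both pixels and conditioning would mean that each output coordinate depends only on a small spatial neighborhood of the input and on the conditioning components relevant to that neighborhood,
$$
s_\theta(x, c, t)_i \;=\; f_i\big(x_{N(i)},\; c_{M(i)},\; t\big),
$$
where $N(i)$ is a spatial neighborhood of pixel $i$ and $M(i)$ is the subset of conditioning variables that influence it. Under this reading, sparse dependence on $x$ alone recovers the unconditional locality studied in prior work, while the additional sparsity in $c$ is what the abstract refers to as conditional locality.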



