
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over entire sequences. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding remain under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding, we systematically investigate their denoising processes and reinforcement learning (RL) methods. We train a 7B dLLM, DiffuCoder, on 130B tokens of code. Using this model as a testbed, we analyze its decoding behavior and show how it differs from that of AR models: (1) dLLMs can decide how causal their generation should be without relying on semi-AR decoding, and (2) increasing the sampling temperature diversifies not only token choices but also their generation order. This diversity creates a rich search space for RL rollouts. For RL training, to reduce the variance of token log-likelihood estimates while maintaining training efficiency, we propose coupled-GRPO, a novel sampling scheme that constructs complementary mask noise over the completions used in training. In our experiments, coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4% on EvalPlus) and reduces reliance on causal AR decoding. Our work provides deeper insight into dLLM generation mechanisms and offers an effective, diffusion-native RL training framework.
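The "complementary mask noise" idea can be sketched simply: sample one random mask over the completion and pair it with its complement, so every token position is masked in exactly one of the two noisy copies. Each token's log-likelihood then gets estimated exactly once per pair, which is the intuition behind the variance reduction. The sketch below is an illustrative assumption about the mechanism, not the paper's actual implementation; the function names (`coupled_masks`, `apply_mask`) and the `<MASK>` placeholder are hypothetical.

```python
import random

def coupled_masks(seq_len, mask_ratio=0.5, seed=None):
    """Sample a random mask and its complement so that every token
    position is masked in exactly one of the two noisy copies.
    Illustrative sketch of the coupled-mask idea, not the paper's code."""
    rng = random.Random(seed)
    positions = list(range(seq_len))
    rng.shuffle(positions)
    k = int(seq_len * mask_ratio)
    mask_a = set(positions[:k])
    mask_b = set(positions[k:])  # exact complement of mask_a
    return mask_a, mask_b

def apply_mask(tokens, mask, mask_token="<MASK>"):
    """Replace the tokens at masked positions with a mask token."""
    return [mask_token if i in mask else t for i, t in enumerate(tokens)]

tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
mask_a, mask_b = coupled_masks(len(tokens), seed=0)

# Together the two masks cover every position exactly once,
# so no token is left unmasked (high-variance) or double-counted.
assert mask_a | mask_b == set(range(len(tokens)))
assert mask_a & mask_b == set()
```

Contrast this with independently sampling two masks: some positions would then be masked twice or not at all, so per-token log-likelihood estimates would be noisier for the same number of forward passes.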
