Embedded Long-Term Movement Learning for Active Kinematics Generation

Understanding and predicting motion is a core aspect of visual intelligence. Although modern video models capture scene dynamics well, exploring possible futures through full video synthesis remains inefficient. We model scene dynamics orders of magnitude more efficiently by operating directly on long-term motion embeddings learned from large-scale point trajectories extracted by tracking models. This enables the efficient generation of long, realistic motions that achieve goals specified through text prompts or positional pokes. To achieve this, we first learn a highly compressed motion embedding with a temporal compression factor of 64×. In this latent space, we train a conditional flow matching model to generate latent motion trajectories conditioned on the task description. The resulting motion generator outperforms both state-of-the-art video models and task-specific methods.
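The conditional flow matching objective mentioned in the abstract has a standard form: interpolate between a noise sample and a data sample, and regress a velocity field toward the direction of the interpolation path. The sketch below illustrates that loss on generic latent vectors; the function names, shapes, and conditioning variable are illustrative assumptions, not the authors' code.

```python
import numpy as np

def cfm_loss(velocity_field, x1, cond, rng):
    """Conditional flow-matching loss on a batch of latent motion
    vectors x1 (shape: batch x dim), conditioned on `cond`
    (e.g. an embedding of a text prompt or a positional poke).

    velocity_field(xt, t, cond) -> predicted velocity, same shape as xt.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # one random time per example
    xt = (1.0 - t) * x0 + t * x1             # linear interpolation at time t
    v_target = x1 - x0                       # constant velocity along the path
    v_pred = velocity_field(xt, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))

# Illustrative usage with an untrained (zero) velocity field:
rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 16))            # stand-in for compressed motion latents
loss = cfm_loss(lambda xt, t, c: np.zeros_like(xt), x1, None, rng)
```

At sampling time, the trained velocity field would be integrated from noise to a motion latent (e.g. with an Euler ODE step loop), which is then decoded back into long-term trajectories.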
