
Dense Policy: Bidirectional Autoregressive Learning of Actions

1Shanghai Jiao Tong University, 2Xidian University, 3Shanghai Innovation Institute
* Equal contribution    † Corresponding author

Bidirectional Learning, Coarse-to-fine Inference

DSP Diagram

Dense Policy provides new insights into policy learning. From a sequence learning perspective, we posit a novel paradigm: bidirectional prediction offers advantages over unidirectional prediction for sequence modeling. Regarding action generation, we explore a novel approach, demonstrating that expanding actions from sparse keyframes to complete, dense frames via inference is more effective than modeling the joint distribution directly.
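To make the coarse-to-fine idea concrete, below is a minimal sketch (not the released code) of how an action sequence could be unfolded from a single keyframe by repeated doubling, so the number of expansion levels grows only logarithmically with the horizon. Here `refine` is a hypothetical stand-in for the learned per-level predictor and simply interpolates.

```python
import numpy as np


def refine(coarse: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned per-level predictor.

    A real model would infill and refine conditioned on the observation `obs`;
    here we just linearly upsample the coarse actions to twice the length.
    """
    t, d = coarse.shape
    new_idx = np.linspace(0.0, t - 1.0, 2 * t)
    return np.stack(
        [np.interp(new_idx, np.arange(t), coarse[:, j]) for j in range(d)], axis=1
    )


def dense_inference(obs: np.ndarray, horizon: int, action_dim: int) -> np.ndarray:
    actions = np.zeros((1, action_dim))      # level 0: a single sparse keyframe
    while actions.shape[0] < horizon:        # ~log2(horizon) expansion levels
        actions = refine(actions, obs)       # infill to the next, denser level
    return actions[:horizon]


actions = dense_inference(obs=np.zeros(8), horizon=16, action_dim=7)
print(actions.shape)  # (16, 7) after log2(16) = 4 levels
```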

Abstract

DSP Diagram
Mainstream visuomotor policies predominantly rely on generative models for holistic action prediction, while current autoregressive policies, predicting the next token or chunk, have shown suboptimal results. This motivates a search for more effective learning methods to unleash the potential of autoregressive policies for robotic manipulation. This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction. It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner with logarithmic-time inference. Extensive experiments validate that our dense policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies. Our policy, example data, and training code will be publicly available upon publication.

Dense Policy Overview

DSP Diagram

Dense Policy accepts visual inputs in different modalities and optional robot proprioception. It employs a unified encoder to perform cross-attention between hierarchical action representations and observation features, driving a bidirectionally expanding dense process. At each level of this process, the actions, initially represented as sparse keyframes, are progressively infilled and refined into the complete predicted sequence, yielding a coarse-to-fine generation procedure.
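As an illustration of this description, one dense-process level could look roughly like the following PyTorch module; this is an assumption-laden sketch rather than the official implementation, and all module choices and sizes are illustrative. Action tokens at the current resolution cross-attend to observation features, are refined by a shared encoder layer, and are then upsampled to the next, denser level.

```python
import torch
import torch.nn as nn


class DenseLevel(nn.Module):
    """One level of the bidirectionally expanding dense process (illustrative)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.upsample = nn.Upsample(scale_factor=2, mode="linear", align_corners=True)

    def forward(self, action_tokens: torch.Tensor, obs_feats: torch.Tensor) -> torch.Tensor:
        # Condition the sparse action tokens on the observation features.
        attended, _ = self.cross_attn(action_tokens, obs_feats, obs_feats)
        refined = self.encoder(action_tokens + attended)
        # Infill: double the temporal resolution for the next, denser level.
        return self.upsample(refined.transpose(1, 2)).transpose(1, 2)


obs_feats = torch.randn(1, 64, 256)   # e.g. point-cloud or image tokens
tokens = torch.randn(1, 1, 256)       # level 0: a single keyframe token
level = DenseLevel()                  # one unified encoder reused across levels
for _ in range(4):                    # 1 -> 2 -> 4 -> 8 -> 16 action tokens
    tokens = level(tokens, obs_feats)
print(tokens.shape)                   # torch.Size([1, 16, 256])
```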

Presentation

DSP Diagram Another Diagram

Compared to other downstream policies, Dense Policy achieves a superior balance between lightweight parameterization and efficient inference speed. Furthermore, it exhibits enhanced learning efficiency, achieving superior performance within the same number of training iterations. We demonstrate its efficacy across four distinct manipulation tasks.

Open Drawer at 4x Speed

Dense Policy vs. Baseline

Put Bread into Pot at 4x Speed

Dense Policy vs. Baseline

Pour Balls at 4x Speed

Dense Policy vs. Baseline

Flower Arrangement at 4x Speed

Dense Policy vs. Baseline

Dealing with Diverse Objects and Tasks

DSP sim
DSP real

Dense Policy exhibits superior performance across a diverse range of manipulation targets, including rigid bodies, deformable objects, and articulated structures, as well as tasks characterized by high degrees of freedom, long horizons, and multi-object interactions. This is primarily attributed to its bidirectional sequence modeling, which produces smoother, more adaptive action trajectories, and its coarse-to-fine hierarchical inference, which enables high-precision actions suitable for manipulation tasks with low error tolerance.

Limitations

We evaluate the zero-shot generalization capability of Dense Policy in the ball-pouring task by elevating the cup, and then both the cup and the bowl. This scenario, absent from the expert training demonstrations, presents an out-of-distribution challenge: the policy has likely not encountered grasping and pouring actions from such elevated heights during training. Our findings reveal that the policy achieves complete ball transfer when only the cup is elevated. However, performance degrades significantly when both the cup and the bowl are raised simultaneously. This observation indicates that while the policy exhibits a certain degree of generalization ability, it is not yet sufficiently robust to handle such combined perturbations.

All Cases at 4x Speed

(a) Elevate cup
(b) Elevate cup and bowl
(c) The Flower Dilemma

We also conducted an intriguing experiment on flower arrangement, a task that stringently tests a model's spatial reasoning capabilities, as mentioned in our paper. While the order in which flowers are picked is often inconsequential to task success, certain extreme cases necessitate a specific sequence. In the case presented, only by first inserting the flower with the blue base into the cup can all three flowers be successfully arranged. Otherwise, inserting the other flowers first makes grasping the blue-base flower significantly more challenging due to spatial constraints and the risk of collision with already-inserted flowers, potentially leading to failure. Furthermore, the dense point cloud in this crowded scene constricts the action space, inherently increasing the task's difficulty. Dense Policy, after successfully inserting the red-base flower, encountered difficulty picking the blue-base flower and subsequently stalled, highlighting a limitation in handling such constrained scenarios.

Citation

@article{su2025dense,
  title={Dense Policy: Bidirectional Autoregressive Learning of Actions},
  author={Su, Yue and Zhan, Xinyu and Fang, Hongjie and Xue, Han and Fang, Hao-Shu and Li, Yong-Lu and Lu, Cewu and Yang, Lixin},
  journal={arXiv preprint arXiv:2503.13217},
  year={2025}
}