Hi there
Welcome to my Homepage!
I am an undergraduate (2022-2026) at Xidian University, focusing on Computer Vision and Robot Learning.
I work at MMLab@HKU with Prof. Xihui Liu. Previously I worked at MVIG@SJTU with Prof. Lixin Yang and Prof. Cewu Lu.
Currently, I conduct VLA research at ByteDance Seed.
News
- DSPv2 and MBA were accepted to ICRA 2026 🔥
- Dense Policy was accepted to ICCV 2025 🔥
- MBA was accepted to IEEE RA-L 2025 🔥
- Our work AdvDisplay was accepted to AAAI 2025 🔥
- In charge of Microsoft Club. Feel free to reach out if you’d like to join.
Experience




Publications

World Guidance: World Modeling in Condition Space for Action Generation
Yue Su, Sijin Chen, Haixin Shi, Mingyu Liu, Zhengshen Zhang, Ningyuan Huang, Weiheng Zhong, Zhengbang Zhu, Yuxiao Liu†, Xihui Liu†
We propose WoG (World Guidance), a world modeling paradigm in condition space for action generation: less is more.
ArXiv Preprint [arXiv] [code] [website]
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos
Chubin Zhang*, Jianan Wang*, Zifeng Gao, Yue Su, Tiranru Dai, Cai Zhou, Jiwen Lu, Yansong Tang†
Learning Vision-Language-Action Models from Human Videos.
ArXiv Preprint [arXiv] [code] [website]
DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation
Yue Su, Chubin Zhang, Sijin Chen, Liufan Tan, Yansong Tang, Jianan Wang, Xihui Liu†
Improved Dense Policy for Whole-body Mobile Manipulation, with effective perception, generalizable manipulation and coherent actions.
ICRA 2026 [arXiv] [code] [website]
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su*, Xinyu Zhan*, Hongjie Fang, Han Xue, Haoshu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose Dense Policy, a bidirectional autoregressive robot policy that infers trajectories by gradually expanding actions from sparse keyframes and is shown to outperform diffusion policies.
ICCV 2025 [paper] [arXiv] [website] [3D-code] [2D-code]

Motion Before Action: Diffusing Object Motion as Manipulation Condition
Yue Su*, Xinyu Zhan*, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose MBA, a novel plug-and-play module that uses cascaded diffusion processes to generate actions guided by object motion and integrates seamlessly with existing manipulation policies.
RA-L 2025, ICRA 2026 [paper] [arxiv] [website] [code]

Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification
Yue Su, Hao Li†, Maoguo Gong†
A generative physical adversarial attack on VI-ReID models that perturbs modality-invariant features.
ArXiv Preprint [arxiv]

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors
Hao Li†, Fanggao Wan, Yue Su, Yue Wu, Mingyang Zhang, Maoguo Gong†
Historically, infrared adversarial attacks were single-use and tough to deploy. Using TEC, we implemented efficient attacks adaptable to hardware scenarios.
AAAI 2025 [paper]
Projects
ManiUniCon: A Unified Control Interface for Robotic Manipulation
ManiUniCon is a comprehensive, multi-process robotics control framework designed for robotic manipulation tasks. It provides a unified interface for controlling various robot arms, integrating sensors, and executing policies in real time.
[code]

MetaPalace: Let you in a meta world of The Palace Museum
We've done what the Palace Museum's official website couldn't: offering 3D artifact views with single-view reconstruction and an interactive LLM-powered tour guide using RAG technology.
[website] [front-end code] [back-end code]
Awards
- Xiaomi Outstanding Scholarship 2025
- National Scholarship 2025
- Outstanding Student, Xidian University, 2025
Talks
- [2025/12] Invited talk at the NICE seminar on imitation learning
- [2025/12] Invited talk at RL China on DSPv2
- [2025/10] Invited talk at 3D视觉工坊 on DSP and DSPv2

