Hi there
Welcome to my Homepage!
I am an undergraduate student (2022-2026) at Xidian University and an incoming PhD student at MMLab@HKU with Prof. Xihui Liu. Previously, I was an intern at ByteDance Seed. I was also a research assistant (RA) at MVIG@SJTU with Prof. Lixin Yang and Prof. Cewu Lu.
News
- 2026/05 Finished my internship at ByteDance Seed.
Experience
Hubei Wuchang Experimental High School
Sep 2019 - June 2022
It was a happy time, with a few small regrets.
Publications

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
Kuan Zhang*, Dongchen Liu*, Qiyue Zhao*, Tianyu Xin*, *, Haisheng Wang, Han Yin, Hongbo Ma, Peize Li, ..., Yiming Li†
A survey of foundation models as generalist game players across datasets, models, harnesses, and benchmarks.
ArXiv Preprint [arXiv] [code]
World Guidance: World Modeling in Condition Space for Action Generation
, Sijin Chen, Haixin Shi, Mingyu Liu, Zhengshen Zhang, Ningyuan Huang, Weiheng Zhong, Zhengbang Zhu, Yuxiao Liu†, Xihui Liu†
We propose WoG (World Guidance), a world modeling paradigm in condition space for action generation: less is more.
ICML 2026 [arXiv] [code] [website]
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos
Chubin Zhang*, Jianan Wang*, Zifeng Gao, , Tiranru Dai, Cai Zhou, Jiwen Lu, Yansong Tang†
Learning Vision-Language-Action Models from Human Videos.
ArXiv Preprint [机器之心] [arXiv] [code] [website]
DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation
, Chubin Zhang, Sijin Chen, Liufan Tan, Yansong Tang, Jianan Wang, Xihui Liu†
An improved Dense Policy for whole-body mobile manipulation, with effective perception, generalizable manipulation, and coherent actions.
ICRA 2026 [arXiv] [code] [website]
Dense Policy: Bidirectional Autoregressive Learning of Actions
*, Xinyu Zhan*, Hongjie Fang, Han Xue, Haoshu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose Dense Policy, a bidirectional autoregressive robot policy that infers trajectories by gradually expanding actions from sparse keyframes, and show that it outperforms diffusion policies.
ICCV 2025 [paper] [arXiv] [website] [3D-code] [2D-code]

Motion Before Action: Diffusing Object Motion as Manipulation Condition
*, Xinyu Zhan*, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose MBA, a novel plug-and-play module that uses cascaded diffusion processes to generate actions guided by object motion, enabling seamless integration with manipulation policies.
RA-L 2025, ICRA 2026 [paper] [arXiv] [website] [code]

Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification
, Hao Li†, Maoguo Gong†
A generative physical adversarial attack on VI-ReID models that perturbs modality-invariant features.
ArXiv Preprint [arXiv]

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors
Hao Li†, Fanggao Wan, , Yue Wu, Mingyang Zhang, Maoguo Gong†
Prior infrared adversarial attacks were single-use and hard to deploy. Using a thermoelectric cooler (TEC), we implement efficient attacks that adapt to different hardware scenarios.
AAAI 2025 [paper]
Projects
ManiUniCon: A Unified Control Interface for Robotic Manipulation
ManiUniCon is a comprehensive, multi-process robotics control framework designed for robotic manipulation tasks. It provides a unified interface for controlling various robot arms, integrating sensors, and executing policies in real time.
Universal-Control Team [code]

MetaPalace: Step into a Meta World of The Palace Museum
We built what The Palace Museum's official website lacks: 3D artifact views via single-view reconstruction and an interactive LLM-powered tour guide using retrieval-augmented generation (RAG).
[website] [front-end code] [back-end code]
Awards
- 2025 Xiaomi Outstanding Scholarship
- 2025 National Scholarship
- 2025 Outstanding Student, Xidian University
Talks
- 2026/03 Invited talk at RoboTion on WoG.
- 2025/12 Invited talk at the NICE seminar on imitation learning.
- 2025/12 Invited talk at RL China on DSPv2.
- 2025/10 Invited talk at 3D视觉工坊 on DSP and DSPv2.
