Hi there
Welcome to my Homepage!
I am an undergraduate (2022-2026) at Xidian University, focusing on Vision-Language-Action. I work at MVIG@SJTU with Prof. Lixin Yang and Prof. Cewu Lu.
News
- Unleash the potential of autoregressive policies: Dense Policy is accepted to ICCV 2025 🔥
- MBA is accepted to IEEE RA-L 2025 🔥
- Our work AdvDisplay was accepted at AAAI 2025 🔥
- In charge of Microsoft Club. Feel free to reach out if you’d like to join.
Experience


Publications

Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su*, Xinyu Zhan*, Hongjie Fang, Han Xue,
Haoshu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose Dense Policy, a bidirectional autoregressive robot policy that infers trajectories by gradually expanding actions from sparse keyframes and is shown to outperform diffusion policies.
ICCV 2025 [arXiv] [website] [3D-code] [2D-code]

Motion Before Action: Diffusing Object Motion as Manipulation Condition
Yue Su*, Xinyu Zhan*, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose MBA, a novel plug-and-play module that uses cascaded diffusion processes to generate actions guided by object motion and integrates seamlessly with existing manipulation policies.
RA-L 2025, ICRA 2026 [paper] [arxiv] [website] [code]

Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification
Yue Su, Hao Li†, Maoguo Gong†
A generative physical adversarial attack on VI-ReID models that perturbs modality-invariant features.
ArXiv Preprint [arxiv]

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors
Hao Li†, Fanggao Wan, Yue Su, Yue Wu, Mingyang Zhang, Maoguo Gong†
Historically, infrared adversarial attacks were single-use and hard to deploy. Using thermoelectric coolers (TEC), we implemented efficient attacks adaptable to hardware scenarios.
AAAI 2025 [paper]
Projects

MetaPalace: Let you in a meta world of The Palace Museum
We've done what the Palace Museum's official website couldn't: offering 3D artifact views via single-view reconstruction and an interactive, LLM-powered tour guide built on RAG.
[website] [front-end code] [back-end code]

U-pre: U-Net is an excellent learner for time series forecasting
U-Net's architecture suits time series forecasting because of its consistent input-output distributions and strong mathematical alignment. Combining U-Net with a BERT encoder improved performance by incorporating both local and global attention.
[code] [report-cn]

M-pre: Mamba for time series forecasting
We tried Mamba for time series forecasting based on feature-conditioned tokens, which outperformed the transformer-based U-pre.
[code] [report-cn]

UniGen: Unified understanding and generation based on the Flickr 8k dataset
A lightweight model for joint learning of language and images, trained on a tiny captioned-image dataset. UniGen combines image generation and language description in one model.
[code]


OpenDoBot: Generalizable visual-motor policy on Dobot Robot
We developed an imitation learning policy and a multi-stage detection-based policy on the DoBot robot, both of which have been shown to perform well on combinatorial problems.
[DoBot Robot] [IL policy code] [multi stage code]

FGSM3D: Is the point cloud gradient perturbation attack feasible?
We tried to extend FGSM to the 3D domain and achieved significant success within a certain gradient range, but the way 3D models are sampled suggests that things are not that simple...
[code] [report-cn]