Hi there
Welcome to my Homepage!
I am an undergraduate (2022-2026) at Xidian University, focusing on Vision-Language-Action (VLA) models. I work at MVIG@SJTU with Prof. Lixin Yang and Prof. Cewu Lu.
News
- MBA has been accepted to IEEE RA-L 2025 🔥
- Unleash the potential of autoregressive models in imitation learning: Dense Policy is now available as a preprint!
- Our work AdvDisplay was accepted at AAAI 2025 🔥
- I am in charge of the Microsoft Club. Feel free to reach out if you’d like to join.
Research Experience


Publications

Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su*, Xinyu Zhan*, Hongjie Fang, Han Xue,
Haoshu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose Dense Policy, a bidirectional autoregressive robot policy that infers trajectories by gradually expanding actions from sparse keyframes and is shown to outperform diffusion policies.
[arXiv] [website] [3D-code] [2D-code]

Motion Before Action: Diffusing Object Motion as Manipulation Condition
Yue Su*, Xinyu Zhan*, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang†
We propose MBA, a novel plug-and-play module that uses cascaded diffusion processes to generate actions guided by object motion and integrates seamlessly with existing manipulation policies.
IEEE RA-L 2025
[arXiv] [website] [code]

Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification
Yue Su, Hao Li†, Maoguo Gong†
A generative physical adversarial attack on visible-infrared person re-identification (VI-ReID) models that perturbs modality-invariant features.
[arXiv]

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors
Hao Li†, Fanggao Wan, Yue Su, Yue Wu, Mingyang Zhang, Maoguo Gong†
Earlier infrared adversarial attacks were single-use and hard to deploy. Using thermoelectric coolers (TECs), we implement efficient attacks that adapt to hardware scenarios.
AAAI 2025
[paper]
Projects

MetaPalace: Step into a meta world of the Palace Museum
We did what the official Palace Museum website couldn't: offer 3D artifact views via single-view reconstruction, plus an interactive LLM-powered tour guide built on retrieval-augmented generation (RAG).
[website] [front-end code] [back-end code]

U-pre: U-Net is an excellent learner for time series forecasting
Time series forecasting is well suited to the U-Net architecture because its inputs and outputs follow consistent distributions and the task aligns well with the architecture mathematically. Combining U-Net with a BERT encoder improved performance by incorporating both local and global attention.
[code] [report-cn]

M-pre: Mamba for time series forecasting
We applied Mamba to time series forecasting using feature-conditioned tokens, and it outperformed the transformer-based U-pre.
[code] [report-cn]

UniGen: Unified understanding and generation based on the Flickr8k dataset
A lightweight model for joint learning of language and images, trained on a tiny captioned-image dataset. UniGen combines image generation and language description in a single model.
[code]


FGSM3D: Is the point cloud gradient perturbation attack feasible?
We extended FGSM to 3D point clouds and achieved significant success within a certain gradient range, but the way 3D models are sampled suggests that things are not that simple...
[code] [report-cn]

AcoFlow: Heuristic Search for Maximum Flow Problem
The difficulty in finding a maximum flow lies in designing better heuristic information for locating augmenting paths. We take on this problem with the ant colony algorithm.
[code] [report-cn]