I am currently a first-year Ph.D. student at the AI3 Institute of Fudan University, supervised by Prof. Siyu Zhu. Before that, I received my bachelor's degree from the School of Software Engineering, Sun Yat-sen University. My research focuses on visual generative models, world models, and vision-language-action (VLA) models.

🔥 News

2026.02: 🎉🎉 WAM-Flow is accepted by CVPR 2026.
2025.08: 🎉🎉 Hallo4 is accepted by SIGGRAPH Asia 2025.
2025.02: 🎉🎉 Hallo3 is accepted by CVPR 2025.
2025.01: 🎉🎉 Hallo2 is accepted by ICLR 2025.

📝 Publications

arXiv 2026

WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

Mingwang Xu^*, Jiahao Cui^*, Feipeng Cai^*, Hanlin Shang^*, Zhihao Zhu, Shan Luan, Yifang Xu, Neng Zhang, Yaoyi Li, Jia Cai, Siyu Zhu

paper code

WAM-Diff is a masked-diffusion VLA framework for autonomous driving, achieving 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2.

CVPR 2026

WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

Yifang Xu^*, Jiahao Cui^*, Zhihao Zhu^*, Hanlin Shang, Shan Luan, Mingwang Xu, Feipeng Cai, Neng Zhang, Yaoyi Li, Jia Cai, Siyu Zhu

paper code

WAM-Flow is a VLA planner that uses discrete flow matching for parallel coarse-to-fine trajectory generation, achieving 90.3 PDMS on NAVSIM-v1.

SIGGRAPH Asia 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization

Jiahao Cui^*, Yan Chen^*, Mingwang Xu^*, Hanlin Shang, Yuxuan Chen, Yun Zhan, Zilong Dong, Yao Yao, Jingdong Wang, Siyu Zhu

project paper code

Using Direct Preference Optimization (DPO), Hallo4 generates lifelike audio-driven avatar videos with rich emotional expression and highly accurate lip synchronization.

CVPR 2025

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu

project paper code

Hallo3 generates highly realistic talking-head videos with dynamic backgrounds, expressive foregrounds, and varied head orientations, which together make the animated portraits remarkably vivid and lifelike.

ICLR 2025

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Jiahao Cui^*, Hui Li^*, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, Jingdong Wang

project paper code

Hallo2 is an audio-driven avatar video generation model capable of producing 4K-resolution videos up to one hour long.

🎖 Honors and Awards

2025.12: National Scholarship for Ph.D. Students
2023.12: National Scholarship
2023.11: Sun Yat-sen University Outstanding Student Scholarship, First Class
2022.12: National Scholarship
2022.12: Sun Yat-sen University Outstanding Student Scholarship, First Class

📖 Education

2025.09 - present: Ph.D. student, AI3 Institute, Fudan University.
2021.09 - 2025.06: Undergraduate student, School of Software Engineering, Sun Yat-sen University.

💻 Internships

2025.09 - present: 2030 Lab, YINWANG, China.
2025.06 - 2025.09: PCG ARCLab, Tencent, China.

💬 Services

Reviewer: SIGGRAPH Asia 2025, CVPR 2026, ECCV 2026.