I am currently a first-year Ph.D. student at the AI3 Institute of Fudan University, supervised by Prof. Siyu Zhu. Prior to this, I obtained my bachelor’s degree from the School of Software Engineering, Sun Yat-sen University. My research focuses on Vision Generative Models, World Models, and Vision-Language-Action Models.

🔥 News

  • 2026.02: 🎉🎉 WAM-Flow is accepted by CVPR 2026.
  • 2025.08: 🎉🎉 Hallo4 is accepted by SIGGRAPH Asia 2025.
  • 2025.02: 🎉🎉 Hallo3 is accepted by CVPR 2025.
  • 2025.01: 🎉🎉 Hallo2 is accepted by ICLR 2025.

📝 Publications

arXiv 2026

WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

Mingwang Xu*, Jiahao Cui*, Feipeng Cai*, Hanlin Shang*, Zhihao Zhu, Shan Luan, Yifang Xu, Neng Zhang, Yaoyi Li, Jia Cai, Siyu Zhu

  • WAM-Diff is a masked-diffusion VLA framework for autonomous driving, achieving 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2.
CVPR 2026

WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

Yifang Xu*, Jiahao Cui*, Zhihao Zhu*, Hanlin Shang, Shan Luan, Mingwang Xu, Feipeng Cai, Neng Zhang, Yaoyi Li, Jia Cai, Siyu Zhu

  • WAM-Flow is a VLA planner that uses discrete flow matching for parallel coarse-to-fine trajectory generation, achieving 90.3 PDMS on NAVSIM-v1.
SIGGRAPH Asia 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization

Jiahao Cui*, Yan Chen*, Mingwang Xu*, Hanlin Shang, Yuxuan Chen, Yun Zhan, Zilong Dong, Yao Yao, Jingdong Wang, Siyu Zhu

  • Powered by DPO, Hallo4 generates lifelike audio-driven avatar videos with rich emotional expression and highly accurate lip synchronization.
CVPR 2025

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu

  • Hallo3 generates highly realistic avatars with dynamic backgrounds, expressive foregrounds, and varied head orientations, which together produce vivid, lifelike talking heads.
ICLR 2025

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Jiahao Cui*, Hui Li*, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, Jingdong Wang

  • Hallo2 is an avatar video generation model capable of producing 4K-resolution videos up to 1 hour long.

🎖 Honors and Awards

  • 2025.12: National Scholarship for Ph.D. Students
  • 2023.12: National Scholarship
  • 2023.11: Sun Yat-sen University Outstanding Student Scholarship, First Class
  • 2022.12: National Scholarship
  • 2022.12: Sun Yat-sen University Outstanding Student Scholarship, First Class

📖 Education

  • 2025.09 - present: Ph.D. student, AI3 Institute, Fudan University.
  • 2021.09 - 2025.06: Undergraduate student, School of Software Engineering, Sun Yat-sen University.

💻 Internships

  • 2025.09 - present: 2030 Lab, YINWANG, China.
  • 2025.06 - 2025.09: PCG ARCLab, Tencent, China.

💬 Services

  • Reviewer: SIGGRAPH Asia 2025, CVPR 2026, ECCV 2026.