I'm a Computer Vision Research Scientist at Woven by Toyota and a Cooperative Research Fellow at the Y. Sato Lab, the University of Tokyo. My research focuses on computer vision and human activity understanding, in particular vision-language multimodal reasoning, video and multi-view understanding, and human body perception.

News
May 2026 Two papers accepted to CVPR 2026: CaST-Bench and Multi-speaker Attention Alignment.
Mar 2026 Joined Woven by Toyota as Research Scientist in the Vision AI Platform team.
Mar 2026 Started as Cooperative Research Fellow at Y. Sato Lab, the University of Tokyo.
Mar 2026 Received Ph.D. in Information Science from the University of Tokyo.
Sep 2025 Paper accepted to ICCV 2025: Egocentric Action-aware Inertial Localization.
Research Experience
Research Scientist at Woven by Toyota Vision AI Platform
2026 –
Intern at Woven by Toyota Vision AI Platform
2025
Intern at CyberAgent AI Lab Activity Understanding Team
2024
Intern at Shanghai AI Laboratory OpenGVLab
2023
Intern at Microsoft Research Asia Media Computing Group
2022
Education
Ph.D. in Information Science The University of Tokyo
2026.3
M.Sc. in Information Science The University of Tokyo
2023.3
B.Sc. in Computer Science Nanjing University
2020.7
Top-Tier Publications
CVPR 2026

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering

Mingfang Zhang, Jingjing Pan, Ashutosh Kumar, Rajat Saini, Mustafa Erdogan, Hsuan-Kung Yang, Caixin Kang, Yifei Huang, Yoichi Sato, Quan Kong

CVPR 2026

Multi-speaker Attention Alignment for Multimodal Social Interaction

Liangyang Ouyang, Yifei Huang, Mingfang Zhang, Caixin Kang, Ryosuke Furuta, Yoichi Sato

IMWUT 2025

Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices

Yifei Huang, Jilan Xu, Baoqi Pei, Lijin Yang, Mingfang Zhang, ..., Limin Wang

ICCV 2025

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance

Mingfang Zhang, Ryo Yonetani, Yifei Huang, Liangyang Ouyang, Ruicong Liu, Yoichi Sato

ICLR 2025

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

Nie Lin, Takehiko Ohkawa, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, Yoichi Sato

ECCV 2024

Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition

Mingfang Zhang, Yifei Huang, Ruicong Liu, Yoichi Sato

CVPR 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang*, Guo Chen*, Jilan Xu*, Mingfang Zhang*, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao (* co-first author)

CVPR 2024

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato

CVPR 2023

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu

CVPR 2022

GazeOnce: Real-Time Multi-Person Gaze Estimation

Mingfang Zhang, Yunfei Liu, Feng Lu

TPAMI 2021

Optical Flow in the Dark

Mingfang Zhang, Yinqiang Zheng, Feng Lu

CVPR 2020

Optical Flow in the Dark

Yinqiang Zheng*, Mingfang Zhang*, Feng Lu (* co-first author)
