Mingfang Zhang (张铭方)

I'm a Computer Vision Research Scientist at Woven by Toyota and a Cooperative Research Fellow at Y. Sato Lab, the University of Tokyo. My research focuses on computer vision and human activity understanding, in particular vision-language multimodal reasoning, video and multi-view understanding, and human modeling.

News

May 2026 Three papers accepted to CVPR 2026: CaST-Bench, Multi-speaker Attention Alignment, and InstAP.

Apr 2026 Joined Woven by Toyota as Research Scientist in the Vision AI Platform team.

Apr 2026 Started as Cooperative Research Fellow at Y. Sato Lab, the University of Tokyo.

Mar 2026 Received Ph.D. in Information Science from the University of Tokyo.

Sep 2025 Paper accepted to ICCV 2025: Egocentric Action-aware Inertial Localization.

Research Experience

Research Scientist at Woven by Toyota, Vision AI Platform

2026 –

Intern at Woven by Toyota, Vision AI Platform

2025

Intern at CyberAgent AI Lab, Activity Understanding Team

2024

Intern at Shanghai AI Laboratory, OpenGVLab

2023

Intern at Microsoft Research Asia, Media Computing Group

2022

Education

Ph.D. in Computer Vision at Y. Sato Lab, The University of Tokyo

2026.3

M.Sc. in Computer Vision at Y. Sato Lab, The University of Tokyo

2023.3

B.Sc. in Computer Science, Nanjing University

2020.7

Top-Tier Publications

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering

Mingfang Zhang, Jingjing Pan, Ashutosh Kumar, Rajat Saini, Mustafa Erdogan, Hsuan-Kung Yang, Caixin Kang, Yifei Huang, Yoichi Sato, Quan Kong

Project Page Paper

CVPR 2026

Multi-speaker Attention Alignment for Multimodal Social Interaction

Liangyang Ouyang, Yifei Huang, Mingfang Zhang, Caixin Kang, Ryosuke Furuta, Yoichi Sato

Project Page Paper

CVPR 2026

InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding

Ashutosh Kumar, Rajat Saini, Jingjing Pan, Mustafa Erdogan, Mingfang Zhang, Betty Le Dem, Norimasa Kobori, Quan Kong

Project Page Paper

CVPR 2026

Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices

Yifei Huang, Jilan Xu, Baoqi Pei, Lijin Yang, Mingfang Zhang, ..., Limin Wang

Project Page Paper

IMWUT 2025

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance

Mingfang Zhang, Ryo Yonetani, Yifei Huang, Liangyang Ouyang, Ruicong Liu, Yoichi Sato

Project Page Paper

ICCV 2025

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

Nie Lin, Takehiko Ohkawa, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, Yoichi Sato

Project Page Paper

ICLR 2025

Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition

Mingfang Zhang, Yifei Huang, Ruicong Liu, Yoichi Sato

Project Page Paper

ECCV 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang*, Guo Chen*, Jilan Xu*, Mingfang Zhang*, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao (* co-first author)

Project Page Paper

CVPR 2024

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato

Project Page Paper

CVPR 2024

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu

Project Page Paper

CVPR 2023

GazeOnce: Real-Time Multi-Person Gaze Estimation

Mingfang Zhang, Yunfei Liu, Feng Lu

Project Page Paper

CVPR 2022

Optical Flow in the Dark

Mingfang Zhang, Yinqiang Zheng, Feng Lu

Project Page Paper

TPAMI 2021

Optical Flow in the Dark

Yinqiang Zheng*, Mingfang Zhang*, Feng Lu (* co-first author)

Project Page Paper

CVPR 2020