
Mingfang Zhang
Computer Vision Researcher
- Tokyo, Japan
- Woven by Toyota
- The University of Tokyo
- Google Scholar
- GitHub
Publications
InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Multi-speaker Attention Alignment for Multimodal Social Interaction
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
Published in International Conference on Computer Vision (ICCV), 2025
Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices
Published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2025
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Published in European Conference on Computer Vision (ECCV), 2024
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Optical Flow in the Dark
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Optical Flow in the Dark
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Published in International Conference on Learning Representations (ICLR), 2025
GazeOnce: Real-Time Multi-Person Gaze Estimation
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022