I'm a Computer Vision Research Scientist at Woven by Toyota and a Cooperative Research Fellow at Y. Sato Lab, the University of Tokyo. My research focuses on computer vision and human activity understanding, in particular vision-language multimodal reasoning, video and multi-view understanding, and human body perception.
News
May 2026 Two papers accepted to CVPR 2026: CaST-Bench and Multi-speaker Attention Alignment.
Mar 2026 Joined Woven by Toyota as Research Scientist in the Vision AI Platform team.
Mar 2026 Started as Cooperative Research Fellow at Y. Sato Lab, the University of Tokyo.
Mar 2026 Received Ph.D. in Information Science from the University of Tokyo.
Sep 2025 Paper accepted to ICCV 2025: Egocentric Action-aware Inertial Localization.
Research Experience

Research Scientist at Woven by Toyota, Vision AI Platform

Intern at Woven by Toyota, Vision AI Platform

Intern at CyberAgent AI Lab, Activity Understanding Team

Intern at Shanghai AI Laboratory, OpenGVLab

Intern at Microsoft Research Asia, Media Computing Group
Education

Ph.D. in Information Science, The University of Tokyo

M.Sc. in Information Science, The University of Tokyo

B.Sc. in Computer Science, Nanjing University
Top-Tier Publications

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
CVPR 2026

Multi-speaker Attention Alignment for Multimodal Social Interaction
CVPR 2026

Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices
IMWUT 2025

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
ICCV 2025

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
ICLR 2025

Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
ECCV 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
CVPR 2024

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
CVPR 2023

GazeOnce: Real-Time Multi-Person Gaze Estimation
CVPR 2022

Optical Flow in the Dark
TPAMI 2021

Optical Flow in the Dark
CVPR 2020
