Zihui (Sherry) Xue


Hi, I am Zihui Xue (薛子慧), and I usually go by Sherry. I am a 4th-year Ph.D. candidate at UT Austin, advised by Prof. Kristen Grauman. My research focuses on developing methods to better understand and structure the content of instructional videos.

Email  |  Scholar  |  GitHub

News
  • [May 2025] Excited to spend the summer in London 🇬🇧 working at Google DeepMind as a Student Researcher.
  • [Feb. 2025] Two accepted papers at CVPR'25: ProgressCaptioner and Viewpoint Rosetta Stone (oral).
  • [Sep. 2024] HOI-Swap is accepted by NeurIPS'24. See you in Vancouver 🎿.
  • [Jul. 2024] Two accepted papers at ECCV'24: Action2Sound (oral) and Exo2Ego.
  • [Feb. 2024] Three papers (one first-author) got accepted by CVPR'24. See you in Seattle ☕️.
  • [Sep. 2023] AE2 got accepted by NeurIPS'23. See you in New Orleans 🦪.
  • [Feb. 2023] EgoT2 got accepted by CVPR'23 as Highlight. See you in Vancouver 🏔️.
  • [Aug. 2022] Spent a wonderful summer interning at FAIR, Meta AI, working with Lorenzo Torresani 😊
Recent Projects
Progress-Aware Video Frame Captioning

Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman
CVPR, 2025 [paper] [webpage]
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman
CVPR, 2025 (Oral) [paper] [webpage]
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman
NeurIPS, 2024 [paper] [webpage]
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Changan Chen*, Puyuan Peng*, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
ECCV, 2024 (Oral) [paper] [webpage]
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman
ECCV, 2024 [paper]
Learning Object State Changes in Videos: An Open-World Perspective

Zihui Xue, Kumar Ashutosh, Kristen Grauman
CVPR, 2024 [paper] [webpage]
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, ..., Zihui Xue, et al.
CVPR, 2024 (Oral) [paper] [webpage] [blog]
Detours for Navigating Instructional Videos

Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman
CVPR, 2024 (Highlight) [paper]
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

Zihui Xue, Kristen Grauman
NeurIPS, 2023 [paper] [webpage]
Fine-grained ego-exo view-invariant features -> temporally align two videos from diverse viewpoints
Egocentric Video Task Translation

Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani
CVPR, 2023 (Highlight) [paper] [webpage]
Holistic egocentric perception for a set of diverse video tasks