Zihui (Sherry) Xue

Hi, I am Zihui Xue (薛子慧), a Ph.D. student at UT Austin, advised by Prof. Kristen Grauman. I am also a visiting researcher at Facebook AI Research. Previously, I was fortunate to work with Prof. Radu Marculescu on efficient deep learning and Prof. Hang Zhao on multimodal learning. I obtained my bachelor's degree from Fudan University in 2020, where I worked with Prof. Yuedong Xu.

I'm broadly interested in multimodal learning (images, audio, video, language, etc.). My recent research lies in egocentric video learning.

Email  |  CV  |  Google Scholar  |  GitHub

profile photo
Photo taken by [Zhengqi Gao]
  • [Aug. 2022] Spent a wonderful summer interning at Facebook AI Research (FAIR) 😊
  • [Jan. 2022] Co-advised paper [Cross Inductive Bias Distillation] got accepted by CVPR'22 🎉
  • [Jan. 2022] Check out our SUGAR paper on efficient GNN training 🙇
  • [Sep. 2021] One paper got accepted by NeurIPS'21 🎉
  • [Sep. 2021] One paper got accepted by CoRL'21 🎉
  • [Jul. 2021] Two papers got accepted by ICCV'21 (one first author) 🎉
  • [Aug. 2020] Started working with Prof. Hang Zhao at Shanghai Qi Zhi Institute, Tsinghua University on multimodal learning 😊
(a) Multimodal Learning and Self-supervised Learning
The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation

Zihui Xue*, Zhengqi Gao*, Sucheng Ren*, Hang Zhao
arXiv preprint, 2022

When is multimodal knowledge distillation helpful?

Dynamic Multimodal Fusion

Zihui Xue, Radu Marculescu
arXiv preprint, 2022

Adaptively fuse multimodal data and generate data-dependent forward paths during inference time.

What Makes Multi-Modal Learning Better than Single (Provably)

Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang
Conference on Neural Information Processing Systems (NeurIPS), 2021

Can multimodal learning provably perform better than unimodal?

Multimodal Knowledge Expansion

Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao
International Conference on Computer Vision (ICCV), 2021
[paper] [website]

A knowledge distillation-based framework to effectively utilize multimodal data without requiring labels.

On Feature Decorrelation in Self-Supervised Learning

Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao
International Conference on Computer Vision (ICCV), 2021
(Oral, Acceptance Rate 3.0%)

[paper] [website]

Reveal the connection between model collapse and feature correlations!

(b) Efficient Deep Learning
SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Zihui Xue, Yuedong Yang, Mengtian Yang, Radu Marculescu
arXiv preprint, 2022

An efficient GNN training framework that accounts for resource constraints.

Anytime Depth Estimation with Limited Sensing and Computation Capabilities on Mobile Devices

Yuedong Yang, Zihui Xue, Radu Marculescu
Conference on Robot Learning (CoRL), 2021

Anytime Depth Estimation with energy-saving 2D LiDARs and monocular cameras.

(c) Network Science
Sampling Graphlets of Multiplex Networks: A Restricted Random Walk Approach

Simiao Jiao, Zihui Xue, Xiaowei Chen, Yuedong Xu
ACM Transactions on the Web (TWEB), 2021


A random walk approach to estimate the graphlet concentration in multiplex networks.