
Email:

hhua2 [AT] cs.rochester.edu

<aside> **Google Scholar**

</aside>

<aside> GitHub

</aside>

<aside> LinkedIn

</aside>

👋 Hi!

I am Hang Hua, a final-year PhD student in Computer Science at the University of Rochester, advised by Prof. Jiebo Luo. Prior to UR, I obtained my master's degree from Peking University and my bachelor's degree from South China University of Technology.

🌋 Research Interests

My research focuses on GenAI, with a particular emphasis on Multimodal LLMs (MLLMs). I investigate core limitations of MLLMs, such as compositionality, fine-grained visual perception, and reasoning, that cannot be overcome by scaling alone. To address these challenges, I develop diagnostic benchmarks that assess MLLMs' capabilities and design new MLLMs with stronger capabilities in these areas.

<aside> 🌟 What’s NEW

☑️ Dec. 09, 2024 Two papers (**V2Xum-LLM, AVicuna**) accepted to AAAI 2025. See you in Philadelphia!

☑️ Nov. 21, 2024 We propose FineCaption, a novel vision-language model with improved capabilities in attribute-aware regional captioning, regional dense captioning, and comprehensive global image captioning.

☑️ Oct. 02, 2024 We release MMComposition, a new benchmark for evaluating the compositionality of MLLMs.

☑️ Jul. 01, 2024 FineMatch was accepted to ECCV 2024!

</aside>

📜 Selected Publications

Please see my Google Scholar profile for the full list.

(*: equal contribution, 🔥: highlight)


🔥PROMPTCAP: Prompt-Guided Image Captioning for VQA with GPT-3

Yushi Hu*, Hang Hua*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

ICCV 2023. [paper][code]

🔥MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo

ArXiv 2024. [paper][code]

🔥FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo

ArXiv 2024. [paper][code]

🔥V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

AAAI 2025. [paper][code]

🔥FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

ECCV 2024. [paper][code]

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu

ArXiv 2024. [paper][code]

PromptFix: You Prompt and We Fix the Photo

Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

NeurIPS 2024. [paper][code]

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

Shuhang Lin, Wenyue Hua, Lingyao Li, Jianchao Ji, Lizhou Fan, Hang Hua, Jiebo Luo, Yongfeng Zhang

EMNLP 2024 Demo Track. [paper][code]

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

3D Vision 2025. [paper]

Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo

TNNLS 2023 (IF: 10.4). [paper]

VideoXum: Cross-modal Visual and Textural Summarization of Videos

Hang Hua*, Jingyang Lin*, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo

TMM 2023 (IF: 8.4). [paper][code]

Noise Stability Regularization for Improving BERT Fine-tuning

Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo

NAACL-HLT 2021. [paper]

Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation

Ke Wang, Hang Hua, Xiaojun Wan

NeurIPS 2019. [paper][code]


📚 Professional Service


🏆 Awards

Jefferies Data Science Fellowship, 2020


🔬 Research Experience

University of Rochester, 08/2020–Present

Ph.D. student, advised by Jiebo Luo

Peking University, 09/2017–05/2020

Research Assistant, supervised by Xiaojun Wan

Adobe Research, 03/2024–11/2024

Research Intern, supervised by Zhe Lin

Adobe Research, 05/2023–11/2023

Research Intern, supervised by Scott Cohen