Email:
hhua2 [AT] cs.rochester.edu
<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-802a-a0fa-007a2920d110" alt="Google Scholar icon" width="40px" /> **Google Scholar**
</aside>
<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/16b80f2e-3c9f-4c04-963e-223f53bce4d5/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" alt="GitHub icon" width="40px" /> GitHub
</aside>
<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-80e1-adbc-007af1442f89" alt="LinkedIn icon" width="40px" /> LinkedIn
</aside>
I am Hang Hua, a final-year PhD student in Computer Science at the University of Rochester, advised by Prof. Jiebo Luo. Before joining UR, I obtained my master's degree from Peking University and my bachelor's degree from South China University of Technology.
My research focuses on GenAI, with a particular emphasis on Multimodal LLMs (MLLMs). I investigate core limitations of MLLMs, such as compositionality, fine-grained visual perception, and reasoning, that cannot be overcome by scaling alone. To address these challenges, I develop diagnostic benchmarks that assess MLLMs' capabilities and design new MLLMs with enhanced competencies.
<aside> 🌟 What’s NEW
☑️ Dec. 09, 2024 Two papers (**V2Xum-LLM, AVicuna**) accepted to AAAI 2025. See you in Philadelphia!
☑️ Nov. 21, 2024 We propose FineCaption, a novel vision-language model with improved capabilities in attribute-aware regional captioning, regional dense captioning, and comprehensive global image captioning.
☑️ Oct. 2, 2024 We release MMComposition, a new benchmark for evaluating the compositionality of MLLMs.
☑️ Jul. 1, 2024 FineMatch has been accepted to ECCV 2024!
</aside>
Please see my Google Scholar profile for the full list.
(*: equal contribution, 🔥: highlight)
🔥PROMPTCAP: Prompt-Guided Image Captioning for VQA with GPT-3
Yushi Hu*, Hang Hua*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
🔥MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo
🔥FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo
🔥V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo
🔥FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu
PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis
Shuhang Lin, Wenyue Hua, Lingyao Li, Jianchao Ji, Lizhou Fan, Hang Hua, Jiebo Luo, Yongfeng Zhang
EMNLP 2024 Demo Track. [paper][code]
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering
Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
3D Vision 2025. [paper]
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo
TNNLS 2023 (IF: 10.4). [paper]
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Hang Hua*, Jingyang Lin*, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
TMM 2023 (IF: 8.4). [paper][code]
Noise Stability Regularization for Improving BERT Fine-tuning
Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo
NAACL-HLT 2021. [paper]
Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation
Ke Wang, Hang Hua, Xiaojun Wan
Jefferies Data Science Fellowship, 2020
University of Rochester, 08/2020–Present
Ph.D. student, advised by Jiebo Luo
Peking University, 09/2017–05/2020
Research Assistant, supervised by Xiaojun Wan
Adobe Research, 03/2024–11/2024
Research Intern, supervised by Zhe Lin
Adobe Research, 05/2023–11/2023
Research Intern, supervised by Scott Cohen