Email:
hhua2 [AT] cs.rochester.edu
<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-802a-a0fa-007a2920d110" alt="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-802a-a0fa-007a2920d110" width="40px" /> **Google Scholar**
</aside>
<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/16b80f2e-3c9f-4c04-963e-223f53bce4d5/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/16b80f2e-3c9f-4c04-963e-223f53bce4d5/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" width="40px" /> GitHub
</aside>
<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-80e1-adbc-007af1442f89" alt="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-80e1-adbc-007af1442f89" width="40px" /> LinkedIn
</aside>
I am Hang Hua, a final-year PhD student in Computer Science at the University of Rochester, advised by Prof. Jiebo Luo (Fellow of ACM/AAAI/IEEE/NAI/AIMBE/IAPR/SPIE). Prior to UR, I obtained my master's degree from Peking University and my bachelor's degree from South China University of Technology.
My research focuses on GenAI, with a particular emphasis on Multimodal LLMs (MLLMs) and Pre-trained Language Models (PLMs). I investigate core limitations of MLLMs and PLMs, such as compositionality, fine-grained visual perception, robustness, and reasoning, that cannot be overcome by scaling alone. To address these challenges, I develop diagnostic benchmarks that assess MLLMs' capabilities and design new MLLMs with stronger capabilities in these areas. More specifically,
<aside> 🌟 What’s NEW
☑️ Feb. 26, 2025 📣 🚀🚀 Two papers (FineCaption, VidComposition) accepted by CVPR 2025. See you in Nashville!
☑️ Jan. 02, 2025 📣 Excited to introduce our new survey paper: Generative AI for Cel-Animation: A Survey.
☑️ Dec. 09, 2024 🚀🚀 Two papers (**V2Xum-LLM, AVicuna**) accepted by AAAI 2025. See you in Philadelphia!
☑️ Nov. 21, 2024 📣 We propose FineCaption, a novel vision-language model with improved capabilities in attribute-aware regional captioning, regional dense captioning, and comprehensive global image captioning.
☑️ Oct. 02, 2024 📣 We release MMComposition, a new benchmark for evaluating the compositionality of MLLMs.
☑️ Jul. 01, 2024 🚀🚀 FineMatch is accepted by ECCV 2024!
</aside>
Please see my Google Scholar profile for the full list of publications.
(*: equal contribution, 🔥: highlight)
🔥PROMPTCAP: Prompt-Guided Image Captioning for VQA with GPT-3
Hang Hua*, Yushi Hu*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
🔥MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo
🔥FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo
🔥V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo
🔥FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
🔥Generative AI for Cel-Animation: A Survey
Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Chenliang Xu
🔥VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang*, Junjia Guo*, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu
PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis
Shuhang Lin, Wenyue Hua, Lingyao Li, Jianchao Ji, Lizhou Fan, Hang Hua, Jiebo Luo, Yongfeng Zhang
EMNLP 2024 Demo Track. [paper][code]
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering
Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
3D Vision 2025. [paper]
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo
TNNLS 2023 (IF: 10.4). [paper]
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Hang Hua*, Jingyang Lin*, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
TMM 2023 (IF: 8.4). [paper][code]
Noise Stability Regularization for Improving BERT Fine-tuning
Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo
NAACL-HLT 2021. [paper]
Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation
Ke Wang, Hang Hua, Xiaojun Wan
Jefferies Data Science Fellowship, 2020-2023
University of Rochester, 08/2020–Present
Peking University, 09/2017–05/2020
Adobe Research, 03/2024–11/2024
Adobe Research, 05/2023–11/2023