
Email:

hhua2 [AT] cs.rochester.edu

<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-802a-a0fa-007a2920d110" alt="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-802a-a0fa-007a2920d110" width="40px" /> **Google Scholar**

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/16b80f2e-3c9f-4c04-963e-223f53bce4d5/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/16b80f2e-3c9f-4c04-963e-223f53bce4d5/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" width="40px" /> GitHub

</aside>

<aside> <img src="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-80e1-adbc-007af1442f89" alt="notion://custom_emoji/16b80f2e-3c9f-4c04-963e-223f53bce4d5/155c5b68-f629-80e1-adbc-007af1442f89" width="40px" /> LinkedIn

</aside>

👋 Hi!

I am Hang Hua, a final-year PhD student in Computer Science at the University of Rochester, advised by Prof. Jiebo Luo (Fellow of ACM/AAAI/IEEE/NAI/AIMBE/IAPR/SPIE). Prior to UR, I obtained my master's degree from Peking University and my bachelor's degree from South China University of Technology.

🌋 Research Interests

My research focuses on GenAI, with a particular emphasis on Multimodal LLMs (MLLMs) and Pre-trained Language Models (PLMs). I investigate core limitations of MLLMs and PLMs, such as compositionality, fine-grained visual perception, robustness, and reasoning, that cannot be overcome by scaling alone. To address these challenges, I develop diagnostic benchmarks to assess MLLMs' capabilities and design new MLLMs with enhanced competencies.

<aside> 🌟 What’s NEW

☑️ Feb. 26, 2025 📣 🚀🚀 Two papers (FineCaption, VidComposition) accepted by CVPR 2025. See you in Nashville!

☑️ Jan. 02, 2025 📣 Excited to introduce our new survey paper, Generative AI for Cel-Animation: A Survey.

☑️ Dec. 09, 2024 🚀🚀 Two papers (**V2Xum-LLM, AVicuna**) accepted by AAAI 2025. See you in Philadelphia!

☑️ Nov. 21, 2024 📣 We propose FineCaption, a novel vision-language model with improved capabilities in attribute-aware regional captioning, regional dense captioning, and comprehensive global image captioning.

☑️ Oct. 02, 2024 📣 We release MMComposition, a new benchmark for evaluating the compositionality of MLLMs.

☑️ Jul. 01, 2024 🚀🚀 FineMatch is accepted by ECCV 2024!

</aside>

📜 Selected Publications

Please see my Google Scholar profile for the full list.

(*: equal contribution, 🔥: highlight)


🔥PROMPTCAP: Prompt-Guided Image Captioning for VQA with GPT-3

Hang Hua*, Yushi Hu*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

ICCV 2023. [paper][code]

🔥MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo

ArXiv 2024. [paper][code]

🔥FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo

CVPR 2025. [paper][code]

🔥V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

AAAI 2025. [paper][code]

🔥FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

ECCV 2024. [paper][code]

🔥Generative AI for Cel-Animation: A Survey

Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Chenliang Xu

ArXiv 2025. [paper][code]

🔥VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang*, Junjia Guo*, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu

CVPR 2025. [paper][code]

PromptFix: You Prompt and We Fix the Photo

Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

NeurIPS 2024. [paper][code]

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

Shuhang Lin, Wenyue Hua, Lingyao Li, Jianchao Ji, Lizhou Fan, Hang Hua, Jiebo Luo, Yongfeng Zhang

EMNLP 2024 Demo Track. [paper][code]

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

3D Vision 2025. [paper]

Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo

TNNLS 2023 (IF: 10.4). [paper]

VideoXum: Cross-modal Visual and Textural Summarization of Videos

Hang Hua*, Jingyang Lin*, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo

TMM 2023 (IF: 8.4). [paper][code]

Noise Stability Regularization for Improving BERT Fine-tuning

Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo

NAACL-HLT 2021. [paper]

Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation

Ke Wang, Hang Hua, Xiaojun Wan

NeurIPS 2019. [paper][code]


📚Professional Service:


🏆Awards:

Jefferies Data Science Fellowship, 2020-2023


🔬 Research Experience

University of Rochester, 08/2020–Present

Peking University, 09/2017–05/2020

Adobe Research, 03/2024–11/2024

Adobe Research, 05/2023–11/2023