Currently, I am a Postdoctoral Researcher at Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), advised by Prof. S. Kevin Zhou (IEEE Fellow) and Prof. Houqiang Li (IEEE Fellow). Before that, I obtained my Ph.D. degree from the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC) in 2024, advised by Prof. Yongdong Zhang (IEEE Fellow) and Prof. Zhendong Mao. From 2018 to 2020, I studied in the Department of Automation at USTC, advised by Prof. Shuang Cong. I obtained my B.Eng. degree from the School of Internet of Things Engineering, Jiangnan University, in 2018.
My research interests broadly lie in Multimodal Artificial Intelligence and Deep Learning (e.g., vision-language alignment, cross-modal retrieval, report generation, retrieval-augmented generation, and hallucination evaluation). Recently, I have been particularly interested in multimodal large language models for medical scenarios.
We are currently completing a review, "Composed Multi-modal Retrieval: A Survey of Approaches and Applications". For more details, please see this page.
This repo records and tracks recent Composed Multi-modal Retrieval (CMR) works, including Composed Image Retrieval (CIR), Composed Video Retrieval (CVR), and Composed Person Retrieval (CPR).
The survey can be found here.