I am currently a Postdoctoral Researcher at the Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), advised by Prof. S. Kevin Zhou (IEEE Fellow) and Prof. Houqiang Li (IEEE Fellow). Before that, I obtained my Ph.D. from the Department of Electronic Engineering and Information Science at USTC in 2024, advised by Prof. Yongdong Zhang (IEEE Fellow) and Prof. Zhendong Mao. From 2018 to 2020, I studied in the Department of Automation at USTC, advised by Prof. Shuang Cong. I obtained my B.Eng. degree from the School of Internet of Things Engineering at Jiangnan University in 2018.
My research interests lie broadly in Multimodal Artificial Intelligence and Deep Learning, including vision-language alignment, cross-modal retrieval, report generation, retrieval-augmented generation, and hallucination evaluation. Recently, I have become interested in multimodal large language models for medical scenarios, including explainable disease diagnosis, LLM-based clinical decision making, medical text processing, pathology, and MRI image processing.
We are currently completing a review, "Composed Multi-modal Retrieval: A Survey of Approaches and Applications". For more details, please see the accompanying page.
This repository records and tracks recent work on Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR), Composed Video Retrieval (CVR), and Composed Person Retrieval (CPR).
The survey can be found here.
Explainable artificial intelligence is an important research direction, and the Concept Bottleneck Model (CBM) is currently one of its promising paradigms.
CBMs typically place a layer before the final fully connected classifier in which each neuron corresponds to a human-interpretable concept. CBMs can also improve accuracy when humans intervene on the predicted concepts at test time.
We maintain a GitHub repository, the CBM page, to keep pace with this rapidly evolving field.
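To make the concept bottleneck idea concrete, here is a minimal sketch in plain Python. It is not taken from any particular CBM paper or from the repository; the concept names, weights, and function names (`predict_concepts`, `classify`, `cbm_predict`) are hypothetical and hand-picked for illustration. It shows only the two properties described above: the class is predicted from interpretable concept activations alone, and a human can overwrite those activations at test time.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x, concept_weights):
    # Bottleneck layer: one sigmoid unit per human-readable concept.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)))
            for row in concept_weights]


def classify(concepts, class_weights):
    # Final linear classifier that sees only the concept activations.
    scores = [sum(w * c for w, c in zip(row, concepts))
              for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def cbm_predict(x, concept_weights, class_weights, interventions=None):
    concepts = predict_concepts(x, concept_weights)
    # Test-time intervention: replace selected concept predictions
    # with values supplied by a human expert.
    for idx, value in (interventions or {}).items():
        concepts[idx] = value
    return classify(concepts, class_weights), concepts


# Toy setup (assumed, hand-picked values; nothing here is trained).
CONCEPT_NAMES = ["has_wings", "has_fur"]
concept_weights = [[2.0, -1.0], [-1.0, 2.0]]
class_weights = [[1.0, -1.0],   # class 0: "bird"
                 [-1.0, 1.0]]   # class 1: "cat"

x = [0.2, 0.5]
label, concepts = cbm_predict(x, concept_weights, class_weights)

# An expert asserts that the input has wings and no fur.
fixed_label, _ = cbm_predict(
    x, concept_weights, class_weights, interventions={0: 1.0, 1: 0.0}
)
```

In this toy case the model alone picks class 1, while the intervened concepts lead to class 0, which illustrates how correcting concepts can change the final prediction without retraining.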