I am currently a Postdoctoral Researcher at the Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), advised by Prof. S. Kevin Zhou (IEEE Fellow) and Prof. Houqiang Li (IEEE Fellow). Before that, I obtained my Ph.D. degree from the Department of Electronic Engineering and Information Science at USTC in 2024, advised by Prof. Yongdong Zhang (IEEE Fellow) and Prof. Zhendong Mao. From 2018 to 2020, I studied in the Department of Automation at USTC, advised by Prof. Shuang Cong. I obtained my B.Eng. degree from the School of Internet of Things Engineering at Jiangnan University in 2018.
My research interests lie broadly in Multimodal Artificial Intelligence and Deep Learning (e.g., vision-language alignment, cross-modal retrieval, report generation, retrieval-augmented generation, and hallucination evaluation). Recently, I have become interested in multimodal large language models for medical scenarios, including explainable disease diagnosis, LLM-based clinical decision making, medical text processing, pathology, and MRI image processing.
(1) My research in vision-language alignment and retrieval focuses on learning fine-grained, reliable, and interpretable cross-modal correspondences between visual content and natural language. The studied scenarios cover both natural images and medical images, ranging from general image-text matching, cross-modal retrieval, and semantic alignment to medical report generation, pathology image analysis, and retrieval-augmented medical decision support.
Recently, we have been completing a review: Composed Multi-modal Retrieval: A Survey of Approaches and Applications. For more details, please see the CMR page. This repo records and tracks recent Composed Multi-modal Retrieval (CMR) works, including Composed Image Retrieval (CIR), Composed Video Retrieval (CVR), and Composed Person Retrieval (CPR). The survey can be found here.
(2) Explainable artificial intelligence is an important research direction, and the Concept Bottleneck Model (CBM) is currently a promising research paradigm.
CBMs typically involve a layer preceding the final fully connected classifier, where each neuron corresponds to a concept that can be interpreted by humans. CBMs also show advantages in improving accuracy through human intervention during testing.
We maintain a GitHub repository, the CBM page, to keep pace with this rapidly evolving field. The accompanying survey is Concept Bottleneck Models for Explainable Decision Making: A Survey of Progress, Taxonomy and Future Directions.
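To make the CBM idea described above concrete, here is a minimal sketch in plain Python of a concept bottleneck forward pass with test-time intervention. It is only an illustration under stated assumptions: the function names (`predict_concepts`, `predict_label`, `cbm_forward`), the toy weights, and the two-concept, two-class setup are all hypothetical and are not taken from any specific CBM implementation or from the survey.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x, W_c, b_c):
    """Bottleneck layer: one sigmoid unit per human-interpretable concept."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_c, b_c)
    ]


def predict_label(concepts, W_y, b_y):
    """Final linear classifier that sees only the concept activations."""
    scores = [
        sum(w * c for w, c in zip(row, concepts)) + b
        for row, b in zip(W_y, b_y)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def cbm_forward(x, W_c, b_c, W_y, b_y, interventions=None):
    """Run the model; optionally overwrite selected concepts with human-provided values."""
    concepts = predict_concepts(x, W_c, b_c)
    for idx, value in (interventions or {}).items():
        concepts[idx] = value  # test-time human intervention on concept idx
    return concepts, predict_label(concepts, W_y, b_y)


# Toy, hand-picked parameters (assumptions for illustration only).
W_c = [[2.0, 0.0], [0.0, 2.0]]    # input features -> 2 concepts
b_c = [0.0, 0.0]
W_y = [[1.0, -1.0], [-1.0, 1.0]]  # 2 concepts -> 2 classes
b_y = [0.0, 0.0]
x = [1.0, -1.0]

concepts, label = cbm_forward(x, W_c, b_c, W_y, b_y)
fixed_concepts, fixed_label = cbm_forward(x, W_c, b_c, W_y, b_y, interventions={1: 1.0})
print(label, fixed_label)
```

In this toy setup, an expert who sets concept 1 to "present" changes the predicted class without retraining, which is the mechanism behind the test-time accuracy gains mentioned above.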
(3) I am also actively exploring LLM- and agent-based medical intelligence, with a particular focus on adapting large language models to downstream clinical applications. Representative directions include clinical decision support, auxiliary diagnosis, medical report generation, ICD coding, medical text processing, hallucination evaluation, and retrieval-augmented clinical reasoning. More recently, I have been interested in advanced agentic systems that integrate domain-specific skills, self-learned knowledge, tool use, retrieval modules, and multi-agent collaboration, aiming to build more reliable, interpretable, and clinically useful AI systems for real-world medical scenarios.