Kun Zhang (张坤)

About Me

I am currently a Postdoctoral Researcher at the Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), advised by Prof. S. Kevin Zhou (IEEE Fellow) and Prof. Houqiang Li (IEEE Fellow). Before that, I obtained my Ph.D. degree from the Department of Electronic Engineering and Information Science at USTC in 2024, advised by Prof. Yongdong Zhang (IEEE Fellow) and Prof. Zhendong Mao. From 2018 to 2020, I studied in the Department of Automation at USTC, advised by Prof. Shuang Cong. I obtained my B.Eng. degree from the School of Internet of Things Engineering at Jiangnan University in 2018.

My research interests lie broadly in Multimodal Artificial Intelligence and Deep Learning (e.g., vision-language alignment, cross-modal retrieval, report generation, retrieval-augmented generation, and hallucination evaluation). Recently, I have been interested in multimodal large language models for medical scenarios, including explainable disease diagnosis, LLM-based clinical decision making, medical text processing, pathology, and MRI image processing.

(1) My research in vision-language alignment and retrieval focuses on learning fine-grained, reliable, and interpretable cross-modal correspondences between visual content and natural language. The studied scenarios cover both natural images and medical images, ranging from general image-text matching, cross-modal retrieval, and semantic alignment to medical report generation, pathology image analysis, and retrieval-augmented medical decision support. Recently, we have been completing a survey: Composed Multi-modal Retrieval: A Survey of Approaches and Applications. For more details, please see the CMR page. This repository records and tracks recent Composed Multi-modal Retrieval (CMR) works, including Composed Image Retrieval (CIR), Composed Video Retrieval (CVR), Composed Person Retrieval (CPR), etc. The survey can be found here.

(2) Explainable artificial intelligence is an important research direction, and the Concept Bottleneck Model (CBM) is a promising current paradigm within it. CBMs typically include a layer preceding the final fully connected classifier, in which each neuron corresponds to a human-interpretable concept. CBMs can also improve accuracy through human intervention at test time. We maintain a GitHub repository, the CBM page, to keep pace with this rapidly evolving field. The survey is: Concept Bottleneck Models for Explainable Decision Making: A Survey of Progress, Taxonomy and Future Directions.
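
The CBM structure described above can be illustrated with a minimal Python sketch. It is a toy example rather than the method of any listed paper: the weights, the concept names, and the helper functions (predict_concepts, classify, intervene) are all hypothetical and chosen only to show a concept layer feeding a linear classifier, plus a test-time correction of one concept.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(features, concept_weights):
    """Bottleneck layer: one sigmoid unit per human-readable concept."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)))
        for row in concept_weights
    ]


def classify(concepts, class_weights):
    """Final linear classifier that sees only the concept activations."""
    scores = [sum(w * c for w, c in zip(row, concepts)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def intervene(concepts, corrections):
    """Test-time intervention: overwrite selected concepts with expert values."""
    fixed = list(concepts)
    for index, value in corrections.items():
        fixed[index] = value
    return fixed


# Hypothetical toy parameters: 3 input features, 2 concepts, 2 classes.
CONCEPT_NAMES = ["concept_a", "concept_b"]
CONCEPT_WEIGHTS = [[2.0, -1.0, 0.0], [-3.0, 0.5, 0.0]]
CLASS_WEIGHTS = [[1.0, -2.0], [0.0, 1.0]]

features = [1.0, 0.0, 0.0]
concepts = predict_concepts(features, CONCEPT_WEIGHTS)
before = classify(concepts, CLASS_WEIGHTS)

# An expert judges that concept_b is actually present and sets it to 1.0.
after = classify(intervene(concepts, {1: 1.0}), CLASS_WEIGHTS)
print(dict(zip(CONCEPT_NAMES, concepts)), before, after)
```

Because the classifier reads only the concept activations, correcting a single mispredicted concept is enough to change the final decision in this toy setup.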

(3) I am also actively exploring LLM- and Agent-based medical intelligence, with a particular focus on adapting large language models to downstream clinical applications. Representative directions include clinical decision support, auxiliary diagnosis, medical report generation, ICD coding, medical text processing, hallucination evaluation, and retrieval-augmented clinical reasoning. More recently, I have been interested in advanced agentic systems that integrate domain-specific skills, self-learned knowledge, tool use, retrieval modules, and multi-agent collaboration, aiming to build more reliable, interpretable, and clinically useful AI systems for real-world medical scenarios.

Selected Publications [ Full List ]

2026

DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation
Yuhe Tian, Kun Zhang^#, Haoran Ma, Rui Yan, Yingtai Li, Rongsheng Wang, Shaohua Kevin Zhou^#.
arXiv preprint arXiv:2603.17718, 2026
[ PDF ]

From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding
Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Kun Zhang^#, S Kevin Zhou^#.
arXiv preprint arXiv:2603.15270, 2026
[ PDF ]

Concept Bottleneck Models for Explainable Decision Making: A Survey of Progress, Taxonomy, and Future Directions
Chunjiang Wang, Fan Li, Wenbo Hu, Rui Yan, Kun Zhang^#, Shaohua Kevin Zhou^#.
International Joint Conference on Artificial Intelligence (IJCAI), 2026
[ PDF ]

WSISum: WSI Summarization via Dual-level Semantic Reconstruction
Baizhi Wang^#, Kun Zhang^#, Yuhao Wang, Yunjie Gu, Haijing Luan, Ying Zhou, Taiyuan Hu, Rundong Wang, Zhidong Yang, Zihang Jiang, Rui Yan, S Kevin Zhou.
Medical Image Analysis, 2026

Improving Image-Text Alignment with an Optimal Feature Sub-space-aware Similarity Learning Framework
Kun Zhang^#, Jingyu Li^#, Zhe Li, Huatian Zhang, Lei Zhang, Zhendong Mao, Yongdong Zhang.
SCIENCE CHINA Information Sciences, 69(5): 152104, 2026

2025

DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
Kun Zhang, Jingyu Li, Zhe Li, S Kevin Zhou.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[ PDF ]

Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang^#, Jingyu Li^#, Zhe Li^#, Jingjing Zhang^#, Fan Li, Yandong Liu, Rui Yan, Zihang Jiang, Nan Chen, Lei Zhang, Yongdong Zhang, Zhendong Mao, S Kevin Zhou.
Preprint, 2025
[ PDF ] [ Code ]

MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
Chunjiang Wang, Kun Zhang^#, Yandong Liu, Zhiyang He, Xiaodong Tao, S Kevin Zhou^#.
International Joint Conference on Artificial Intelligence (IJCAI), 2025
[ PDF ]

MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM
Wenliang Li, Rui Yan, Xu Zhang, Li Chen, Hongji Zhu, Jing Zhao, Junjun Li, Mengru Li, Wei Cao, Zihang Jiang, Wei Wei^#, Kun Zhang^#, S Kevin Zhou^#.
Preprint, 2025
[ PDF ] [ Blog ]

A General Knowledge Injection Framework for ICD Coding
Xu Zhang, Kun Zhang^#, Wenxin Ma, Rongsheng Wang, Chenxu Wu, Yingtai Li, S Kevin Zhou^#.
Findings of the Association for Computational Linguistics (ACL Findings), 2025
[ PDF ]

KANTrust: A Multi-Omics Framework for Uncertainty-Aware Disease Subtyping
Chunjiang Wang, Rui Yan^#, Kun Zhang^#, Zihang Jiang, Zhiyang He, Xiaodong Tao, S Kevin Zhou^#.
IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM), 2025

Rethinking Pseudo Word Learning in Zero-Shot Composed Image Retrieval: From an Object-Aware Perspective
Zhe Li, Lei Zhang, Kun Zhang, Weidong Chen, Yongdong Zhang, Zhendong Mao.
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
[ PDF ]

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval
Zhe Li, Lei Zhang, Zheren Fu, Kun Zhang, Zhendong Mao.
International Conference on Computer Vision (ICCV), 2025
[ PDF ]

FundusAdapter: few-shot adaptation of fundus image foundation model for fundus image diagnosis
Yifan Chang, Zihang Jiang, Kun Zhang^#, S Kevin Zhou^#.
Medical Imaging with Deep Learning (MIDL Short), 2025
[ PDF ]

DiffRGenNet: Difference-aware Medical Report Generation
Minghao Bian, Kun Zhang, Dexin Zhao, S Kevin Zhou.
Medical Imaging with Deep Learning (MIDL), 2025
[ PDF ]

SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction
Dai Sun, Huhao Guan, Kun Zhang, Xike Xie, S Kevin Zhou.
Preprint (arXiv), 2025
[ PDF ]

2024

Enhanced Semantic Similarity Learning Framework for Image-Text Matching
Kun Zhang, Bo Hu, Huatian Zhang, Zhe Li, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ] [ Code ]

Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching
Huatian Zhang, Lei Zhang, Kun Zhang, Zhendong Mao.
AAAI Conference on Artificial Intelligence (AAAI), 2024
[ PDF ]

Cascade Semantic Prompt Alignment Network for Image Captioning
Jingyu Li, Lei Zhang, Kun Zhang, Bo Hu, Hongtao Xie, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ] [ Code ]

Improving Image-Text Matching with Bidirectional Consistency of Cross-Modal Alignment
Zhe Li, Lei Zhang, Kun Zhang, Yongdong Zhang, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ]

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval
Zhe Li, Lei Zhang, Kun Zhang, Yongdong Zhang, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ]

Visual-Linguistic Dependency Encoding for Image-Text Retrieval
Wenxin Guo, Lei Zhang, Kun Zhang, Yi Liu, Zhendong Mao.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024
[ PDF ]

2023

Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching
Kun Zhang, Lei Zhang, Bo Hu, Mengxiao Zhu, Zhendong Mao.
ACM International Conference on Multimedia (ACM MM), 2023
[ PDF ] [ Code ] [ Blog ]

Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
Kun Zhang, Zhendong Mao, Anan Liu, Yongdong Zhang.
IEEE Transactions on Multimedia (IEEE-TMM), 2023 (ESI Highly Cited Paper)
[ PDF ] [ Blog ] [ Code ]

2022

Negative-Aware Attention Framework for Image-Text Matching
Kun Zhang, Zhendong Mao, Quan Wang, Yongdong Zhang.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[ PDF ] [ Code ] [ Blog ]

Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching
Huatian Zhang, Zhendong Mao, Kun Zhang, Yongdong Zhang.
AAAI Conference on Artificial Intelligence (AAAI), 2022
[ PDF ] [ Blog ] [ Code ]

Before 2022

An Efficient Online Estimation Algorithm with Measurement Noise for Time-varying Quantum States
Kun Zhang, Shuang Cong, Kezhi Li.
Signal Processing (SIGPRO), 2021
[ PDF ]

An Online Optimization Algorithm for Real-time Quantum State Tomography
Kun Zhang, Shuang Cong, Kezhi Li, Tao Wang.
Quantum Information Processing (QIP), 2020
[ PDF ]

An Efficient Online Estimation Algorithm for Evolving Quantum States
Kun Zhang, Shuang Cong, Yaru Tang, Nikolaos M. Freris.
IEEE European Signal Processing Conference (EUSIPCO), 2020
[ PDF ]

Efficient and fast optimization algorithms for quantum state filtering and estimation
Kun Zhang, Shuang Cong, Jiao Ding, Jiaojiao Zhang, Kezhi Li.
International Conference on Intelligent Control and Information Processing (IEEE ICICIP), 2019
[ PDF ]

Awards

Jiangsu Funding Program for Excellent Postdoctoral Talent (A) (江苏省卓越博士后A类), 2025
President Award of the Chinese Academy of Sciences (中国科学院院长奖), 2024
National Scholarship for Doctoral Students (博士生国家奖学金), 2023
USTC-SZSE Doctoral Scholarship (中国科大-深交所博士奖学金), 2022
National Scholarship for Undergraduate Students (本科生国家奖学金), 2015
First-class Academic Scholarship of USTC, 2018/19/21/23
1st Place of Wuxi Internet of Things Maker Competition, 2018
3rd Place of China Artificial Intelligence Society Bo Er Cup Competition, 2018

Projects

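Young Scientists Fund of the National Natural Science Foundation of China, 2026-2028 (Principal Investigator) (National Natural Science Foundation of China), 2025
Jiangsu Provincial Youth Science Fund Project, 2026-2028 (Principal Investigator) (Jiangsu Provincial Department of Science and Technology), 2025
China Postdoctoral Science Foundation General Program, 2024-2026 (Principal Investigator) (China Postdoctoral Science Foundation), 2024
National Key R&D Program of China: Development of a Large-Model-Based Assisted Diagnosis Robot Software System for Primary Care, 2025-2028 (Participant) (Ministry of Science and Technology of China), 2025
National Key R&D Program of China: Theoretical Knowledge System and Computational Models of Mainstream Values, 2021-2024 (Participant) (Ministry of Science and Technology of China), 2021
National Key R&D Program of China: Cross-Media Content Parsing and Dynamic Compositional Production for All-Member Media, 2020-2023 (Participant) (Ministry of Science and Technology of China), 2020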

Academic Activities

Reviewer

Served as a reviewer for conferences and journals including CVPR, ICCV, ECCV, ACL, NeurIPS, AAAI, ACM MM, IJCAI, IEEE-TMM, IEEE-TCSVT, KBS, and Pattern Recognition.

Snapping life's beautiful moments!