Kun Zhang (张坤)

About Me

I am currently a Postdoctoral Researcher at the Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), advised by Prof. S. Kevin Zhou (IEEE Fellow) and Prof. Houqiang Li (IEEE Fellow). I obtained my Ph.D. degree from the Department of Electronic Engineering and Information Science at USTC in 2024, advised by Prof. Yongdong Zhang (IEEE Fellow) and Prof. Zhendong Mao. From 2018 to 2020, I studied in the Department of Automation at USTC, advised by Prof. Shuang Cong. I obtained my B.Eng. degree from the School of Internet of Things Engineering at Jiangnan University in 2018.

My research interests lie broadly in Multimodal Artificial Intelligence and Deep Learning (e.g., vision-language alignment, cross-modal retrieval, report generation, retrieval-augmented generation, and hallucination evaluation). Recently, I have been interested in multimodal large language models for medical scenarios, including explainable disease diagnosis, LLM-based clinical decision making, medical text processing, pathology, and MRI image processing.

We are currently completing a survey: Composed Multi-modal Retrieval: A Survey of Approaches and Applications. For more details, please see the project page. The repository records and tracks recent Composed Multi-modal Retrieval (CMR) works, including Composed Image Retrieval (CIR), Composed Video Retrieval (CVR), and Composed Person Retrieval (CPR). The survey can be found here.

Explainable artificial intelligence is an important research direction, and the Concept Bottleneck Model (CBM) is a promising current paradigm within it. CBMs typically place a layer before the final fully connected classifier in which each neuron corresponds to a human-interpretable concept. CBMs can also improve accuracy at test time, because a human can intervene and correct the predicted concepts. We maintain a GitHub repository (CBM page) that aims to keep pace with this rapidly evolving field.
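The CBM idea described above can be illustrated with a minimal sketch. It is a toy written for illustration only, not the method of any paper listed on this page: the concept names, weights, and input features are hand-picked, hypothetical values, and the helper functions are assumptions of this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(features, concept_weights):
    """Bottleneck layer: one sigmoid unit per human-interpretable concept."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)))
        for row in concept_weights
    ]


def classify(concepts, class_weights):
    """Final linear classifier that sees only the concept activations."""
    scores = [sum(w * c for w, c in zip(row, concepts)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def intervene(concepts, corrections):
    """Overwrite selected concept activations with human-supplied values."""
    return [corrections.get(i, c) for i, c in enumerate(concepts)]


# Hypothetical toy setup: 3 input features, 2 concepts, 2 classes.
CONCEPT_NAMES = ["has_wings", "has_fur"]
concept_weights = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0]]
class_weights = [[1.0, -1.0], [-1.0, 1.0]]  # class 0 = "bird", class 1 = "cat"

features = [0.2, 0.9, 0.5]
concepts = predict_concepts(features, concept_weights)
label_before = classify(concepts, class_weights)

# A human inspects the concepts and corrects them: wings present, no fur.
fixed = intervene(concepts, {0: 1.0, 1: 0.0})
label_after = classify(fixed, class_weights)
```

Because the classifier reads only the concept activations, correcting them is enough to change the final prediction. This is the test-time intervention the paragraph refers to.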

Selected Publications [ Full List ]

2025

DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
Kun Zhang, Jingyu Li, Zhe Li, S. Kevin Zhou.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[ PDF ]

MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
Chunjiang Wang, Kun Zhang^#, Yandong Liu, Zhiyang He, Xiaodong Tao, S. Kevin Zhou^#.
International Joint Conference on Artificial Intelligence (IJCAI), 2025
[ PDF ]

A General Knowledge Injection Framework for ICD Coding
Xu Zhang, Kun Zhang^#, Wenxin Ma, Rongsheng Wang, Chenxu Wu, Yingtai Li, S. Kevin Zhou^#.
Findings of the Association for Computational Linguistics (ACL Findings), 2025
[ PDF ]

Rethinking Pseudo Word Learning in Zero-Shot Composed Image Retrieval: From an Object-Aware Perspective
Zhe Li, Lei Zhang, Kun Zhang, Weidong Chen, Yongdong Zhang, Zhendong Mao.
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval
Zhe Li, Lei Zhang, Zheren Fu, Kun Zhang, Zhendong Mao.
International Conference on Computer Vision (ICCV), 2025

FundusAdapter: few-shot adaptation of fundus image foundation model for fundus image diagnosis
Yifan Chang, Zihang Jiang, Kun Zhang^#, S. Kevin Zhou^#.
Medical Imaging with Deep Learning (MIDL Short), 2025
[ PDF ]

DiffRGenNet: Difference-aware Medical Report Generation
Minghao Bian, Kun Zhang, Dexin Zhao, S. Kevin Zhou.
Medical Imaging with Deep Learning (MIDL), 2025
[ PDF ]

SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction
Dai Sun, Huhao Guan, Kun Zhang, Xike Xie, S. Kevin Zhou.
Preprint (arXiv), 2025
[ PDF ]

2024

Enhanced Semantic Similarity Learning Framework for Image-Text Matching
Kun Zhang, Bo Hu, Huatian Zhang, Zhe Li, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ] [ Code ]

Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching
Huatian Zhang, Lei Zhang, Kun Zhang, Zhendong Mao.
AAAI Conference on Artificial Intelligence (AAAI), 2024
[ PDF ]

Cascade Semantic Prompt Alignment Network for Image Captioning
Jingyu Li, Lei Zhang, Kun Zhang, Bo Hu, Hongtao Xie, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ] [ Code ]

Improving Image-Text Matching with Bidirectional Consistency of Cross-Modal Alignment
Zhe Li, Lei Zhang, Kun Zhang, Yongdong Zhang, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ]

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval
Zhe Li, Lei Zhang, Kun Zhang, Yongdong Zhang, Zhendong Mao.
IEEE Transactions on Circuits and Systems for Video Technology (IEEE-TCSVT), 2024
[ PDF ]

Visual-Linguistic Dependency Encoding for Image-Text Retrieval
Wenxin Guo, Lei Zhang, Kun Zhang, Yi Liu, Zhendong Mao.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024
[ PDF ]

2023

Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching
Kun Zhang, Lei Zhang, Bo Hu, Mengxiao Zhu, Zhendong Mao.
ACM International Conference on Multimedia (ACM MM), 2023
[ PDF ] [ Code ] [ Blog ]

Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
Kun Zhang, Zhendong Mao, Anan Liu, Yongdong Zhang.
IEEE Transactions on Multimedia (IEEE-TMM), 2023
(ESI Highly Cited Paper) [ PDF ] [ Blog ] [ Code ]

2022

Negative-Aware Attention Framework for Image-Text Matching
Kun Zhang, Zhendong Mao, Quan Wang, Yongdong Zhang.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[ PDF ] [ Code ] [ Blog ]

Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching
Huatian Zhang, Zhendong Mao, Kun Zhang, Yongdong Zhang.
AAAI Conference on Artificial Intelligence (AAAI), 2022
[ PDF ] [ Blog ] [ Code ]

Before 2022

An Efficient Online Estimation Algorithm with Measurement Noise for Time-varying Quantum States
Kun Zhang, Shuang Cong, Kezhi Li.
Signal Processing (SIGPRO), 2021
[ PDF ]

An Online Optimization Algorithm for Real-time Quantum State Tomography
Kun Zhang, Shuang Cong, Kezhi Li, Tao Wang.
Quantum Information Processing (QIP), 2020
[ PDF ]

An Efficient Online Estimation Algorithm for Evolving Quantum States
Kun Zhang, Shuang Cong, Yaru Tang, Nikolaos M. Freris.
IEEE European Signal Processing Conference (EUSIPCO), 2020
[ PDF ]

Efficient and fast optimization algorithms for quantum state filtering and estimation
Kun Zhang, Shuang Cong, Jiao Ding, Jiaojiao Zhang, Kezhi Li.
International Conference on Intelligent Control and Information Processing (IEEE ICICIP), 2019
[ PDF ]

Awards

President Award of the Chinese Academy of Sciences (中国科学院院长奖), 2024
National Scholarship for Doctoral Students (博士生国家奖学金), 2023
USTC-SZSE Doctoral Scholarship (中国科大-深交所博士奖学金), 2022
National Scholarship for Undergraduate Students (本科生国家奖学金), 2015
First-class Academic Scholarship of USTC, 2018/19/21/23
1st Place of Wuxi Internet of Things Maker Competition, 2018
3rd Place of China Artificial Intelligence Society Bo Er Cup Competition, 2018

Academic Activities

Reviewer

Served as a reviewer for conferences and journals including CVPR, ICCV, ECCV, ACL, NeurIPS, AAAI, ACM MM, IJCAI, IEEE-TMM, IEEE-TCSVT, KBS, and Pattern Recognition.
