Learning Conceptual Text Prompts from Visual Regions of Interest for Medical Image Segmentation
Zhu He , Haoran Zhang , Wentao Zhang , Shen Zhao , Qiqi Liu , Xiaohu Wu , Qicheng Lao
Engineering: 202604006
Vision–language segmentation models (VLSMs) are effective in medical image segmentation tasks. However, a major limitation of these models is their dependence on manually crafted textual inputs. Previous studies have used visual question answering to generate textual information semiautomatically, but these methods suffer from challenges such as error accumulation. Herein, we propose a method that learns conceptual text prompts directly from visual regions of interest (ROIs) to facilitate medical image segmentation. We extracted textual conceptual attributes from ROIs using a large multimodal model to derive coarse real-text prompts. A text latent space transformation module took the ROI images as input and generated fine-grained pseudo-text prompts to compensate for the limited perception of image detail in the real-text prompts. These prompts were encoded into a unified text embedding. Thereafter, we applied a self-adding noise knowledge distillation method to transfer knowledge from the text embedding to the class token of the image encoder, enabling direct text-guided inference during testing while reducing error accumulation. Our approach minimized the need for manual prompt design by leveraging explicit discrete and implicit continuous text prompts to effectively guide visual segmentation. Extensive evaluation across 13 medical image segmentation datasets demonstrated that our model outperformed state-of-the-art VLSMs and vision-based segmentation models in segmentation accuracy.
Conceptual text / Prompt learning / Knowledge distillation / Medical image segmentation