Developing a Predictive Platform for Salmonella Antimicrobial Resistance Based on a Large Language Model and Quantum Computing

Yujie You; Kan Tan; Zekun Jiang; Le Zhang

doi:10.1016/j.eng.2025.01.013


Engineering, 2025, Vol. 48, Issue 5: 174-184. DOI: 10.1016/j.eng.2025.01.013

Article



Abstract

As a common foodborne pathogen, Salmonella poses risks to public health safety, especially given the emergence of antimicrobial-resistant strains. However, there is currently no systematic platform based on large language models (LLMs) for Salmonella resistance prediction, data presentation, and data sharing. To address this gap, we first propose a two-step feature-selection process based on the chi-square test and conditional mutual information maximization to identify the key Salmonella resistance genes in a pan-genomics analysis, and we develop an LLM-based Salmonella antimicrobial-resistance predictive (SARPLLM) algorithm, built on the Qwen2 LLM and low-rank adaptation, to achieve accurate antimicrobial-resistance prediction. Second, we construct a quantum data augmentation algorithm, denoted QSMOTEN, that reduces the time complexity of computing sample distances from linear to logarithmic. Third, we build a user-friendly online Salmonella antimicrobial-resistance prediction platform based on knowledge graphs, which not only supports online resistance prediction but also visualizes the pan-genomics analysis results of the Salmonella datasets.
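The abstract names the two-step feature selection but does not spell out its mechanics. The sketch below is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: genes are assumed to be binary presence/absence columns of a pan-genome matrix, labels are assumed binary (resistant = 1, susceptible = 0), the chi-square filter uses a 2x2 gene-by-label table with 1 degree of freedom, and the second step is a greedy CMIM-style ranking in which a candidate's score is the smallest of its mutual information with the label and its conditional mutual information with the label given each already-selected gene. The function names, `alpha`, `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(*columns: Sequence[int]) -> float:
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_info(y: Sequence[int], f: Sequence[int]) -> float:
    """I(Y; F) = H(Y) + H(F) - H(Y, F)."""
    return entropy(y) + entropy(f) - entropy(y, f)


def cond_mutual_info(y: Sequence[int], f: Sequence[int], s: Sequence[int]) -> float:
    """I(Y; F | S) = H(Y, S) + H(F, S) - H(Y, F, S) - H(S)."""
    return entropy(y, s) + entropy(f, s) - entropy(y, f, s) - entropy(s)


def chi2_pvalue(f: Sequence[int], y: Sequence[int]) -> float:
    """Pearson chi-square test of independence on a 2x2 gene-by-label table."""
    n = len(y)
    table = Counter(zip(f, y))
    stat = 0.0
    for g in (0, 1):
        for lab in (0, 1):
            row = table[(g, 0)] + table[(g, 1)]
            col = table[(0, lab)] + table[(1, lab)]
            expected = row * col / n
            if expected > 0:
                stat += (table[(g, lab)] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def two_step_select(X: List[List[int]], y: List[int], alpha: float = 0.05, k: int = 2) -> List[int]:
    """Step 1: chi-square filter. Step 2: greedy CMIM-style ranking of the survivors."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    candidates = [j for j, col in enumerate(columns) if chi2_pvalue(col, y) < alpha]
    # score[j] = min(I(Y; F_j), min over selected s of I(Y; F_j | F_s))
    score = {j: mutual_info(y, columns[j]) for j in candidates}
    selected: List[int] = []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda j: score[j])  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
        for j in candidates:
            score[j] = min(score[j], cond_mutual_info(y, columns[j], columns[best]))
    return selected


# Hypothetical toy data: 16 isolates, resistant iff gene A or gene B is present.
# Columns: [gene A, exact copy of A, gene B, noise gene independent of the label].
pairs = [(a, b) for a in (0, 1) for b in (0, 1)] * 4
X = [[a, a, b, int(i // 4 < 2)] for i, (a, b) in enumerate(pairs)]
y = [a | b for a, b in pairs]

print(two_step_select(X, y))  # → [0, 2]
```

On the toy data, the noise gene fails the chi-square filter (p = 1), and the greedy step picks gene A and then the complementary gene B while skipping the exact copy of A, whose conditional mutual information with the label given A is zero. This redundancy penalty is what a conditional criterion adds on top of a univariate test.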

Keywords

Salmonella resistance prediction / Pan-genomics / Large language model / Quantum computing / Bioinformatics

Cite this article


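Yujie You, Kan Tan, Zekun Jiang, Le Zhang. Developing a Predictive Platform for Salmonella Antimicrobial Resistance Based on a Large Language Model and Quantum Computing. Engineering. 2025, 48(5): 174-184. https://doi.org/10.1016/j.eng.2025.01.013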
