1. Introduction

Numerous microbes, including eukaryotes, archaea, bacteria, and viruses, inhabit our gastrointestinal tract [1]. Next-generation sequencing technology has greatly improved our ability to read out the taxonomic composition of these microbes. Further functional and association studies have uncovered the crucial roles played by the microbiota (i.e., the collection of organisms) and microbiome (i.e., their genes) in human health and disease [2,3]. However, most microbial species harbor vast amounts of strain-level genetic variation between hosts and even within a host over time [4]. In general, single-nucleotide variants (SNVs) and insertions/deletions (indels) are the most common types of mutations in gut microbes. The ratio of non-synonymous to synonymous substitution rates (dN/dS) has often been used to explain evolutionary trends at the protein level, which include purifying selection (dN/dS < 1), neutral evolution (dN/dS = 1), and positive selection (i.e., adaptive evolution, dN/dS > 1) [4]. Adaptive evolution is a process that enables a population to better survive in its environment. It is notable that neutral evolution and purifying selection are the dominant evolutionary forces in the human microbiome [5,6]; purifying selection, indicated by a dN/dS of less than 1, impacts the majority of microbes in the human microbiome [7,8]. In contrast, structural variants (SVs) are infrequent [9]. However, our current understanding of genetic mutations in the context of microbiomes is limited.
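As an illustration of the dN/dS thresholds described above, the following minimal Python sketch computes an uncorrected ratio from substitution and site counts and maps it onto the three regimes. The function names, the absence of any multiple-hit correction, and the `tolerance` band around 1 are assumptions made for illustration; they are not taken from the cited studies.

```python
import math


def dn_ds_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Uncorrected dN/dS from counts of substitutions and of available sites.

    Assumption: no correction for multiple substitutions at one site.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites  # non-synonymous substitutions per site
    ds = syn_subs / syn_sites        # synonymous substitutions per site
    if ds == 0:
        # The ratio is undefined without synonymous changes.
        return math.inf if dn > 0 else math.nan
    return dn / ds


def classify_selection(ratio, tolerance=0.05):
    """Map a dN/dS value onto the regimes named in the text.

    `tolerance` is a hypothetical band around 1 treated as neutral.
    """
    if math.isnan(ratio):
        return "undetermined"
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "neutral evolution"
```

For example, `classify_selection(dn_ds_ratio(10, 1000, 20, 500))` evaluates a gene with dN = 0.01 and dS = 0.04 and reports purifying selection. In practice, a statistical test rather than a fixed tolerance would be needed to decide whether a ratio differs from 1.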

This review focuses on microbial coevolution within the context of complex microbiomes, which shapes strain-level microbial diversity within and between host species. Of particular relevance to human health, an emerging body of literature suggests that specific genomic mutations within the microbiome are associated with common metabolic diseases. We also discuss the literature concerning the adaptive evolution of both beneficial and harmful microbes during the processes by which they invade, colonize, and persist in a host. Finally, we summarize recent advances in algorithms and techniques for data analysis that are relevant to these problems, and indicate directions where future development would be valuable.

2. Within-host adaptive evolution of the microbiota

In the traditional view, natural selection led to the adaptive evolution of bacteria in the natural environment, and the microbes in the intestinal tract coevolved with humans over tens of thousands of years, with the implication that the host intestinal environment was just another driving factor for the evolution of the intestinal microbiome. However, recent research suggests a very different evolutionary and coevolutionary timescale. Recent analyses of metagenomic data have demonstrated that microbial populations can evolve on short timescales in the human intestinal tract [5,10] due to competition among intestinal microbes and the invasion of new strains. Overall, adaptive evolution may be caused by individual-specific and exposure-specific factors (e.g., diet, geographic region, and antibiotic use). In particular, region and diet appear to be major reasons for the polymorphism or evolution of strains in the gut [11]. Shotgun metagenomic sequencing, which allows functional and taxonomic data to be obtained, has revealed that antibiotics can force rapid genetic shifts in the genome of an individual species without significant changes occurring in that species’ relative abundance [12]. Host age has also been found to be an important driving factor; furthermore, the composition and genomes of intestinal microorganisms will change dynamically over time, resulting in adaptive evolution and strain replacement [13,14]. Chen et al. [15] followed the gut microbiomes of 338 individuals, characterized for 51 human phenotypes, for four years and described the relationship between the stability and variation of microbes and host physiology. They found that the evolution of microorganisms varied greatly between hosts but was temporally stable and significantly altered over the long term within a given host. Some strains showed remarkable genetic polymorphism, such as Bacteroides fragilis (B. fragilis), Faecalibacterium prausnitzii (F. prausnitzii), Eubacterium rectale (E. rectale), and Prevotella copri (P. copri). Understanding the factors underlying microbiota stability or instability in a given host and how such stability/instability links to disease will propel new insights into the mechanisms of evolution and the establishment of particular disease models.

B. fragilis is a generalist symbiotic bacterium in the gut that possesses genetic plasticity, partly due to inversion, replication, and horizontal gene transfer mediated by mobile genetic elements [16–18]. These characteristics have facilitated its adaptation to different ecological environments and enhanced its resistance to antibiotics [19,20]. Zhao et al. [7] carried out metagenomic sequencing of B. fragilis isolates and explored their adaptive evolution in healthy humans. Parallel evolution of 16 genes of B. fragilis was found in the fecal samples of 12 healthy hosts, many of which were related to cell-envelope biosynthesis and polysaccharide utilization; moreover, the mutation was retained in the continuous adaptive evolution within the same host. Furthermore, the addition of public metagenomic data revealed that a common adaptive mutation of B. fragilis occurs frequently in the gut microbiome of Western—but not Chinese—individuals, suggesting that regional or dietary factors play a role in driving the evolution.
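The notion of parallel evolution used above, in which the same gene acquires mutations independently in several hosts, can be sketched as a simple tally. The Python example below is a hedged illustration only: the function name, the `(host, gene)` input format, and the `min_hosts` cutoff are hypothetical, and a real analysis would also compare the observed counts against a null expectation based on gene length and mutation rate.

```python
from collections import defaultdict


def parallel_genes(mutations, min_hosts=2):
    """Return genes mutated in at least `min_hosts` distinct hosts.

    `mutations` is an iterable of (host_id, gene_id) pairs, one per
    within-host mutation; repeated hits in one host count only once.
    """
    hosts_by_gene = defaultdict(set)
    for host, gene in mutations:
        hosts_by_gene[gene].add(host)
    return {
        gene: len(hosts)
        for gene, hosts in hosts_by_gene.items()
        if len(hosts) >= min_hosts
    }


# Toy input with made-up identifiers, for illustration only.
toy_mutations = [
    ("host1", "geneA"), ("host2", "geneA"), ("host1", "geneA"),
    ("host1", "geneB"),
    ("host1", "geneC"), ("host2", "geneC"), ("host3", "geneC"),
]
candidates = parallel_genes(toy_mutations)
```

Here `geneB` is excluded because it is mutated in a single host, whereas `geneA` and `geneC` are retained as candidates for parallel evolution.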

F. prausnitzii is ubiquitous in the intestines of healthy human adults; it shows tremendous genetic diversity, and the prevalence of this diversity varies with age, geographical location, lifestyle, and disease [2,21]. A recent study reconstructed 3000 assembled genomes from 7907 human and 203 animal intestinal metagenomes worldwide and classified them into 12 Faecalibacterium-like species-level genome bins (SGBs) [22]. The 12 SGBs were found to be distributed in human intestines all over the world, showing regional diversity. Increased diversity and relative abundance of Faecalibacterium were correlated with an increased metabolic potential for complex polysaccharides, which may be promoted by fiber-rich diets. A higher percentage of genes related to starch degradation was found in Faecalibacterium SGBs enriched in Chinese subjects in comparison with Western populations. In contrast, lactose and protein metabolism-related genes were depleted, mainly due to a diet richer in rice and a lower intake of milk and protein in Asians [23,24]. In addition, compared with subjects in other Western countries, enrichment of the genes related to antibiotic resistance was observed in European and Chinese subjects [22]. It was suggested that the consumption of antibiotics plays a role in driving the adaptive evolution of strains. A recent study uncovered universal adaptive mutations of the F. prausnitzii genome due to probiotic intervention [25], suggesting that F. prausnitzii is constantly adapting to the selection pressure of probiotics. The results showed diverse evolutionary trends (i.e., adaptive evolution and purifying selection) among different functional genes of a gut microbial strain. Intriguingly, sensor histidine kinase KdpD was found to be under purifying selection, which indicates that the expression of kdpFABC may not be activated.

As a common human intestinal microorganism, P. copri is controversial due to conflicting reports that it has both positive and negative correlations with host health [26,27]. In transcontinental research spanning more than 6500 metagenomic samples, P. copri was divided into four distinct evolutionary clades. Different clades coexisted in populations with non-Westernized diets, and diversity was higher overall in these populations than in those with a Western lifestyle, with significant functional diversity, especially in carbohydrate metabolism [28]. Another study found that fiber-rich diets were associated with P. copri types that could enhance carbohydrate catabolism. P. copri associated with an omnivorous diet had a high prevalence of the leuB gene, which is involved in the biosynthesis of branched-chain amino acids, a key factor in glucose intolerance and type 2 diabetes (T2D) [29]. All these lines of evidence suggest that diet drives the evolution of P. copri in humans.

The microbial genomic changes caused by evolution and strain replacement, such as bacterial single-nucleotide polymorphisms (SNPs), SVs including the gain or loss of genomic regions, and copy number variations (CNVs), are increasingly found to be important in human health [9]; the first two types of variation have already been connected to the development of human disease [30,31]. CNVs may lead to important phenotypic differences in bacteria, even when other types of genetic variation are minimal [32,33]. Other types of microbial genomic SVs are also important, ubiquitous, and associated with risk factors for host disease that can be replicated in independent cohorts. For example, the gene function of a region in Anaerostipes hadrus clustered in the same SV encodes the composite inositol catabolism–butyrate biosynthesis pathway, which is associated with a lower risk of host metabolic disease [9]. Metagenomic studies have begun to focus on genomic variations of the gut microbiome in different disease states (Table 1), such as colorectal cancer (CRC), T2D, and Graves’ disease (GD). Understanding the genomic variations of gut microbiota connected to human disease states may guide microbiome-targeted therapeutic strategies.

《Table 1》

Table 1 The SNP sites related to disease.

Previous studies have linked T2D to an increased abundance of butyrate-producing bacteria in the intestinal microbiota. Chen et al. [34] found that the SNP distribution of Bacteroides coprocola (B. coprocola) was significantly different in T2D patients than in healthy populations, although there was no difference in the relative abundance of B. coprocola between these subject groups. The 65 genes containing SNPs associated with T2D status were functionally diverse. Among them, two mutant genes encode glycosyl hydrolases. Interestingly, an essential drug target of T2D, alpha-glucosidase, is a glycosyl hydrolase located in the intestinal tract. Different strains of B. coprocola may have other effects in human intestines that are related to T2D disease processes.

Genetic mutations in intestinal microbes can be disease specific and can even predict medical conditions at early stages. A CRC prediction model based on SNVs at the strain level showed high accuracy in both the training (area under curve (AUC) = 75.35%) and the verification cohort (AUC = 73.08%–88.02%). Among the studied SNVs, two SNVs in E. rectale were implicated in fusaric acid resistance, and the other two SNVs in F. prausnitzii were located in genes encoding methyltransferase and ZF-HC2 domain-containing protein [30]. Another combined model predicted GD using microbial species, metagenome-assembled genomes (MAGs), genes, and SNPs, and showed high accuracy (AUC = 98.08%) and specificity in a global cross-disease multi-cohort analysis. A total of 275 SNPs belonging to B. vulgatus, F. prausnitzii, and E. rectale differed significantly between the healthy and GD cohort and were mainly located in genes encoding xylanase activity, mannonate dehydratase activity, β-lactamase activity, and β-galactosidase activity [31]. This study demonstrated that fecal-based noninvasive diagnosis will potentially be useful for these diseases.
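The AUC values reported above summarize how well a model ranks diseased samples above healthy ones. As a minimal sketch of that quantity (not the pipeline used in the cited studies), the Python function below computes the AUC directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    `labels` holds 1 for cases and 0 for controls; `scores` holds the
    model output for each sample (higher means more case-like).
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties contribute half a win
    return wins / (len(cases) * len(controls))


# Toy example with made-up scores: three of four case/control pairs
# are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based implementation would be preferable for large cohorts.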

3. Pathogen adaptive evolution is closely related to virulence

The virulence and pathogenicity of bacteria depend on the bacteria’s specific functions. The accumulated mutations of pathogens may enhance their virulence or transmission. These mutations include gene rearrangement, optimization of gene expression, loss of unnecessary genes, and horizontal gene transfer (HGT) with other bacteria. For example, most Escherichia coli (E. coli) strains exist harmlessly in the human intestinal tract, but a few strains can cause serious disease. A previous study examined the selective effects of HGT for an E. coli strain and confirmed that phage-driven HGT evolution confers a metabolic growth advantage [35]. Similarly, Lescat et al. [36] completed phenotypic assays of an ancestral strain and a mutant strain of E. coli and confirmed that the mutated strains of E. coli grew faster than wild-type strains in minimal medium containing D-galactonate. However, the researchers also showed that three galactonate operons are under strong purifying selection based on a genome database of 110 E. coli strains. Therefore, exploring the evolution of pathogens and commensals in the host is useful for devising strategies to combat pathogens and prevent their further evolution.

Interaction between species can shape genetic diversity and the exchange of mobile gene elements, leading to functional diversity. Evolution within individual microbial species can shape their functional diversity. Staphylococcus epidermidis (S. epidermidis) is a crucial skin microorganism and opportunistic pathogen. Metagenomic analysis of 1482 strains of S. epidermidis from five healthy people revealed that the S. epidermidis isolates from skin belonged to multiple founders rather than to a single colonizer, with clear individual and body site specificity [37]. The wide range of individual variations of S. epidermidis at the population level can shape its strain and functional diversity under purifying selection and lead to the mixing of populations on skin sites and the spread of antibiotic-resistance genes within individuals, which improves the virulence of S. epidermidis. The article also suggested that rapid and strong enough purifying selection favors the growth of a particular genetic configuration and drives a distinct subpopulation to form.

Carbapenem-resistant Klebsiella pneumoniae (K. pneumoniae) is an urgent threat associated with high mortality [38,39]. The virulence and pathogenicity of K. pneumoniae have been enhanced through two opposite types of mutations in the capsule biosynthesis gene wzc. One type, a gain-of-function mutation, led to hypercapsule-producing mutants, and the other, a loss-of-function mutation, led to capsule-deficient mutants. Transmission and mortality were enhanced in the hypercapsule mutants, which is relevant to bloodstream infections. In contrast, epithelial cell invasion and persistence of urinary tract infection increased in capsule-deficient mutants. The evolution of persistence and virulence may be a pervasive feature of K. pneumoniae infection [40].

Pathogen lineages can also evolve to colonize specific tissues within the host. Mycobacterium tuberculosis (M. tuberculosis) is a leading infectious cause of death globally, especially in acquired immune deficiency syndrome (AIDS) patients [41]. Genome analysis of 2693 samples from 44 subjects collected from postmortem lung and extrapulmonary organs showed that M. tuberculosis diversified within individuals and formed sub-lineages that coexisted for multiple years. There was strong evidence that purifying selection occurred within individual patients, without the need for patient-to-patient transmission. These different strains were distributed differently within the lung, but many new mutations were shared in various sites within the lung. Furthermore, this distribution was neither expected nor long term [42].

4. Probiotic adaptive evolution in vivo to improve fitness and colonization

Probiotics are live microorganisms that, when administered in adequate amounts, confer a health benefit on the host [43]. Many studies have explored the ecological impact on the gut microbiota due to the ecological and evolutionary forces of consumed probiotics, including reshaping the indigenous microbial communities [44,45] and improving gut or immune health [46]. However, the genomes and functional traits of the consumed probiotics can vary during administration to facilitate colonization due to the intestinal selective pressure caused by gut microbiota [47,48]. These adaptive mutations in the genome of probiotic strains may confer fitness advantages such as improvements in carbohydrate utilization and colonization [47,49,50]. However, they may lead to potential safety issues, such as the transfer of antibiotic-resistance genes or virulence factors [51]. Accordingly, exploring the adaptive evolution of probiotics in the human gut is an exciting frontier for both the microbiome and population genetics fields.

Although the prospect of genetically engineered probiotics is promising, their intended therapeutic efficacy and safety are affected by natural selection within the intestinal environment. Furthermore, the evolution of genetically engineered probiotics in vivo under diverse gut microbiomes and host diets needs to be explored. To address this aim, Crook et al. [47] exposed the candidate probiotic E. coli Nissle (EcN) to the digestive tract of mice for several weeks to investigate how EcN responds to a variety of diets and to background microbiota with varying degrees of complexity (Fig. 1(a)). They reported that EcN accumulated genetic mutations that regulate carbohydrate utilization to gain competitive fitness, but that a history of antibiotic exposure also led to antibiotic resistance in EcN. Next, the researchers used genetically engineered probiotic EcN-expressing phenylalanine ammonia lyase 2 (PAL2) to treat phenylketonuria mouse models and found that the EcN gene remained stable over one week. This study demonstrated the utility of EcN as a chassis for probiotic engineering, at least in a preclinical model. Overall, this study provides us with an opportunity to better understand the safety and engineering potential of probiotics.

Diet is another crucial evolutionary force that shapes host-microbe symbioses. Martino et al. [52] confirmed that host diet was a driving force for the evolution of Lactiplantibacillus plantarum (L. plantarum) in the host-microbe symbiosis (Fig. 1(b)). They identified novel variants derived from the Drosophila diet in the ackA gene of L. plantarum that improved its animal growth-promoting potential, and additional mutations of L. plantarum seemed to enhance the symbiotic benefits further. This is an excellent confirmation that bacterial adaptation to the host diet may be the first step in animal–microbe symbiosis. Consideration of host origin may have a significant impact on the selection of probiotic strains. Consequently, understanding microbe–host coevolution will require careful consideration of multiple host models and probiotic strains [53].

《Fig. 1》

Fig. 1. Probiotic adaptive evolution in diverse hosts. Under the selective pressure of diet, antibiotics, and indigenous intestinal microbes, probiotics (a) E. coli Nissle, (b) L. plantarum HNU082, (c) L. plantarum NIZO2877, and (d) L. rhamnosus GG (LGG) undergo adaptive mutations, which are mainly manifested in carbohydrate utilization, antibiotic resistance, and acid tolerance. One blue square represents three SNPs; Fig. 1(a) uses an ellipsis to refer to a number of blue squares that is too large to conveniently display. ICU: intensive care unit.

To fully understand the evolutionary strategy of L. plantarum in multiple hosts, in our previous work [54], we introduced the probiotic L. plantarum HNU082 (Lp082) to the gut microbes of healthy humans, mice, and zebrafish to explore the in vivo gut-adaptive strategies of L. plantarum (Fig. 1(c)). Across all hosts, highly consistent SNVs were acquired when Lp082 became established in and adapted to the gut, which improved carbohydrate utilization and acid tolerance performance and significantly promoted in vivo internal competitive adaptability. Furthermore, resident gut microbial strains that compete with Lp082 (e.g., Bacteroides spp. and Bifidobacterium spp.) accumulated 10–70-fold more evolutionary changes than usual to counter the invasion of Lp082. The intestinal microbiota ecological and genetic stability in humans was found to be higher than that in mice. In summary, Lp082 demonstrated a highly convergent adaptation strategy in diverse host environments and animal models—a finding that lays a foundation for engineering probiotics for better engraftment in humans.

Although the intestinal environment can help to promote the adaptive evolution of probiotics and increase their colonization ability, the resulting safety problems cannot be ignored. Some studies have suggested that there is a significant risk associated with the use of probiotics in particular cases. For example, the probiotic Lactobacillus rhamnosus (L. rhamnosus) strain GG (LGG) has been linked to bacteremia [55] (Fig. 1(d)). Blood isolates contained new mutations, including a non-synonymous SNV, conferring antibiotic resistance in one patient. These findings support the idea that probiotic strains can directly cause bacteremia and adaptively evolve within intensive care unit (ICU) patients. Moreover, within-host evolution can enhance the survival of probiotic strains but is also accompanied by evolved specific antibiotic resistance [47]. Conversely, the probiotic L. plantarum P-8 reliably lost one to three plasmids, suggesting that probiotics might have a tendency toward reduced genomes in the host. However, the benefit of genetic deletions for probiotics is debatable [56].

Understanding universal adaptive mutations of the indigenous gut microbiota due to probiotic consumption is essential. To this end, a global, cross-cohort metagenomic meta-analysis investigated the coevolution of indigenous gut microbes due to the consumption of probiotics [25]. The results suggested that the consumption of diverse probiotics can induce widespread SNVs in the resident gut microbes of both mice and humans. Interestingly, far more SNVs were identified in the resident microbes when the same probiotic strains were introduced into the mouse gut than into the human gut. Furthermore, the SNV pattern induced by probiotics was highly probiotic strain specific. Collectively, the study substantially extended our understanding of the coevolution of the consumed probiotics and the indigenous gut microbiota, highlighting the importance of critical assessment of probiotic efficacy and safety in an integrated manner. Hence, the in vivo evolution of probiotics could become a new benchmark for the evaluation of probiotics.

5. Advances in the gut microbial genomic mutation analysis pipeline 

The main processes affecting the fate of new mutations include drift, selection, migration (or transfer), and recombination [4]. Strikingly, it has been estimated that in an individual adult human microbiome, billions of bacterial mutations are generated every day, and some of these differences may be clinically relevant [7]. Of note, even if a slight adaptive mutation is detected in an individual microbiome, it may be critical for the bacteria’s long-term presence in the human body [57,58]. Consequently, it is crucial to select a method for revealing accurate and reliable mutations.

The sequencing of individual isolate genomes is the gold standard and identifies mismatches in whole-genome alignments [59]. However, it is time consuming and expensive to isolate many cells in the microbial population for phenotypic analysis and genome sequencing, especially when bias must be avoided. The original method for mutation detection—that is, genome assembly followed by whole-genome alignment—performs well across multiple sequencing platforms [60]. However, it is challenging to apply to low-abundance species and uncultivated taxa. Unfortunately, many bacteria and archaea remain uncultivable [61], and one in five common strains ultimately failed to grow successfully in 19 media [62]. In addition, whole-genome alignment relies on highly accurate and continuous strain assemblies, and hundreds of isolates may be needed for a single strain, which can be expensive.
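The step of identifying mismatches in a whole-genome alignment can be illustrated with a small Python sketch that scans two already-aligned sequences column by column and reports SNVs and indels in reference coordinates. The function name, the use of `-` as the gap character, and the toy sequences are assumptions for illustration; producing the alignment itself is left to a dedicated aligner.

```python
def alignment_variants(ref_aligned, query_aligned, gap="-"):
    """List SNVs and indels between two aligned, equal-length sequences.

    Positions are 1-based on the ungapped reference. An insertion is
    reported at the last reference base that precedes it.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    snvs, indels = [], []
    ref_pos = 0
    for ref_base, query_base in zip(ref_aligned.upper(), query_aligned.upper()):
        if ref_base != gap:
            ref_pos += 1  # advance only on real reference bases
        if ref_base == gap and query_base == gap:
            continue  # column carries no information
        if ref_base == gap:
            indels.append((ref_pos, "insertion", query_base))
        elif query_base == gap:
            indels.append((ref_pos, "deletion", ref_base))
        elif ref_base != query_base:
            snvs.append((ref_pos, ref_base, query_base))
    return snvs, indels


# Toy alignment (made-up sequences): one SNV, one insertion, one deletion.
toy_snvs, toy_indels = alignment_variants("ACGT-ACGT", "ACCTTAC-T")
```

In this toy case the function returns the SNV G to C at reference position 3, an inserted T after position 4, and a deleted G at position 7. Adjacent gap columns are reported one base at a time rather than merged into a single multi-base indel.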

Culture-independent metagenomic analysis can reveal evolutionary mechanisms and metabolic variations at high throughput and low cost. Recently, several approaches have been developed for identifying SNPs in the microbiome by aligning short reads from the shotgun metagenome to reference genomes, including ConStrains [63], MIDAS2 [64], metaSNV [65], DESMAN [66], and inStrain [67]. In one possible workflow, StrainPhlAn [68] is first used for SNV-based metagenomic strain identification. Next, the genome of a specific representative strain is chosen, and inStrain is used to complete SNV calling. InStrain exhibits higher accuracy and sensitivity in calculating nucleotide diversity and linkage disequilibrium, identifying SNVs, and calculating accurate coverage depth and breadth. However, if the reference strain’s genome is missing, this database alignment approach may miss novel genes and species. Similarly, MIDAS aligns short reads to a database of more than 30 000 reference genomes to identify genetic variants in each strain of every sample. As yet, the optimal selection of reference genomes for certain bacterial species that have high strain-level diversity (e.g., F. prausnitzii, P. copri, and E. coli) is unknown.
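To make the read-alignment approach more concrete, the following Python sketch takes the base counts observed at one reference position after read mapping and derives coverage depth, a per-site nucleotide diversity, and a simple polymorphic-site call. It is an illustrative simplification and not the algorithm of any tool named above: the function name, the `min_depth` and `min_freq` thresholds, and the use of one minus the sum of squared allele frequencies as the diversity measure are all assumptions.

```python
def site_stats(base_counts, min_depth=5, min_freq=0.05):
    """Summarize one aligned position from its per-base read counts.

    `base_counts` maps a base (e.g. "A") to the number of reads
    supporting it. Returns None when coverage is below `min_depth`.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads to call anything reliably
    freqs = {b: c / depth for b, c in base_counts.items() if c > 0}
    # Probability that two reads drawn with replacement differ.
    diversity = 1.0 - sum(f * f for f in freqs.values())
    # Alleles rare enough to be plausible sequencing errors are ignored.
    alleles = sorted(b for b, f in freqs.items() if f >= min_freq)
    return {
        "depth": depth,
        "diversity": diversity,
        "alleles": alleles,
        "polymorphic": len(alleles) >= 2,
    }


# Toy pileup column with made-up counts: 8 reads support A, 2 support G.
toy_site = site_stats({"A": 8, "G": 2})
```

With these toy counts the site has a depth of 10, a diversity of approximately 0.32, and is called polymorphic with alleles A and G. Fixed thresholds are a deliberate simplification; published tools typically model sequencing error explicitly when deciding whether a minor allele is real.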

New approaches based on the single-cell technique have been developed for this purpose, including single-cell genome sequencing (SiC-seq) conducted through droplet microfluidics [69], Raman-activated gravity-driven single-cell encapsulation and sequencing (RAGE-Seq) [70], and Raman-activated cell sorter and sequencing (RACS-Seq) [71]. Single-cell sequencing combined with deep sequencing and specialized bioinformatics methods can identify genetic variants and mobile genes [72], which may quickly produce unbiased results compared with isolated cultures. Overall, quantitative genomic variation within the human microbiome will provide a novel and precise view of the application of microbiome genomics.

6. Future perspectives

Additional studies of the human microbiome, particularly those that target microbial genetic and genomic variation, will be launched in the next decade. Variation within microbial genomes is already being used as a biomarker for a range of clinical conditions. However, these studies should be extended to construct predictive models for metabolic disease, with the ultimate aim of understanding the causal relationship between the microbiome and metabolic disease. More in-depth studies of the purifying selection of common and pathogenic bacteria and of the adaptive evolution of probiotics in vivo are also required in order to understand the mechanisms of both harmful and beneficial microbes and their interactions with the host. Finally, advances in single-cell-based sequencing technology and progress toward a more comprehensive and intelligent bioinformatics pipeline for microbial genomic mutation analysis will greatly improve all studies aimed at understanding microbial evolution in the context of complex host-associated communities.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (31701577).

Compliance with ethics guidelines

Jiachao Zhang and Rob Knight declare that they have no conflict of interest or financial conflicts to disclose.