1. Introduction

Method development has been a core driving force of microbiome science. From the discovery of 16S rRNA genes as markers for taxonomy, to the introduction of the metagenomics approach, and then to the revelation of “microbial dark matter” via single-cell sequencing, the emergence of each new core technology has had wide-ranging and profound impacts on the many microbiome-related fields and industries. Accordingly, novel technologies for microbiome analysis have been the top priority for past national and international microbiome projects [1–3]. However, the methods and tools for analyzing microbial consortia, such as functional imaging, omics-based interrogation, and cell culture, remain quite primitive compared to those for pure cultures. As a result, real-time monitoring of microbiota function in situ is still difficult, for example, and the rational design and assembly of robust microbiotas to operate in natural environments remain challenging. Microbiome data mining across distinct disciplines or application areas has also faced enormous challenges, because technological platforms for the new frontier of microbiome data science have only just started to emerge. Therefore, challenges and opportunities abound in the field of microbiome analysis.

We believe that microbiome analysis has been undergoing a revolution, driven by three key shifts in ways of thinking and technological platforms (Fig. 1). The first shift is from dissecting microbiota structure by sequencing to tracking microbiota state, function, and intercellular interaction via imaging. Improved temporal resolution is crucial for tracking the dynamics of the state or function of the microbiota (e.g., at the transcriptional or metabolic level) before any change in the structure of the microbiota takes place. The second shift is from interrogating a consortium or population of cells to probing individual cells. This enhancement in spatial resolution is important for a mechanistic understanding of the functional heterogeneity of the microbiota and of its implications for phenomena such as antimicrobial resistance. The third shift is from microbiome data analysis to microbiome data science, in which a local perspective, such as a comparison of microbiomes within a single project, is complemented by a global or bird’s-eye perspective in which any new data are compared with all microbiomes known to date (or relevant subsets of these microbiomes). This allows a real-time and up-to-date appreciation of the nature and degree of the new data’s contribution to the currently characterized structural and functional space of microbiotas on Earth.

Fig. 1. Microbiome analysis methodology is undergoing three key shifts in ways of thinking and technological platforms.

2. From dissecting microbiota structure by sequencing to tracking microbiota state, function, and intercellular interaction via imaging: The emergence of label-free, function-based imaging technologies for the microbiota

In many chronic diseases, a change in disease status is associated with a series of microbiota structure changes. However, each of these events is a consequence of, and underpinned by, many changes in the microbiota state. Being able to detect and describe such state changes is fundamental to understanding the mode of action and microevolution of the microbiota. Yet the metagenomic approach can typically only profile the phylogenetic structure (via phylogenetic markers such as 16S/18S rRNA genes) or functional gene structure (via shotgun sequencing). Methods that interrogate the metatranscriptome, metaproteome, or meta-metabolome, on the other hand, face tremendous challenges in real-time monitoring of microbiota state because of their destructive nature, tedious operation, and significant cost. Furthermore, the sheer organismal and functional complexity and general lack of functional biomarkers in the microbiome have presented major technical hurdles in imaging and tracing microbiota function. For example, fluorescence-based microspectroscopy, despite wide applications in cell biology, has seen fewer successes in characterizing microbiota function, because most microbes cannot be labeled with probes that target specific functions unless specific functional biomarkers are known a priori. As a result, label-free, rapid, function-based microbiome imaging tools that are applicable to most or all types of cells in a microbiota are urgently needed.

One of the most defining features of a microbiome is the presence of a network of intricate yet profound inter-species interactions. These networks are the foundation of the function and evolution of the microbiota. However, the most commonly used microbiome analysis tools today, such as metagenomics, metatranscriptomics, meta-metabolomics, DNA/RNA-based stable isotope probing (SIP), and culturomics approaches, cannot directly reveal the metabolic interactions among the members of a microbiome.

A new class of label-free, single-cell-level functional imaging tools called “ramanome” was recently proposed for the “instant photography” of a microbial community [4]. The ramanome is a collection of single-cell Raman spectra (SCRS) acquired from individual cells within an isogenic population (ramanome) or consortium (meta-ramanome) [4]. Each SCRS consists of over 1500 Raman bands, which individually or collectively correspond to the resonance frequency of chemical bonds from the metabolites in a given cell. These bands can be used to model the profile and relative abundance of metabolites in the cell. Because the metabolite profile is sensitive to the physiological state, environmental changes, and genetic background of a cell, each SCRS is conceptually equivalent to a digital portrait photo of ~1500 pixels, from which phenotypic features of an individual human face can be recognized. Because it is an imaging approach, obtaining the ramanome can be non-destructive to the cell and does not require external labeling or preexisting biomarkers; in addition, it usually takes only seconds to image each cell. Thus, a ramanome can be considered as a single-cell-resolution metabolome that can be measured and monitored with high throughput and low cost. Such “group photos” of individual cells in a population or consortium can directly, and in a “landscape-like” manner, reveal or model the state and function of the community at single-cell resolution. For example, our recent work suggested that the ramanome can quantitatively distinguish bacterial species [5], measure the general metabolic activity of cells or probe the catabolic activity targeting a specific substrate [6], model the intracellular levels of triacylglycerols (TAG) [7] and starch [8], and distinguish cellular drug responses based on the mechanism of cytotoxicity [4]. In a recent elaboration, reverse-labeling Raman-imaging technology was introduced, which demonstrated that the ramanome can trace metabolic interactions such as cross feeding within a bacterial consortium [9]. As each individual Raman peak or combination of peaks in an SCRS can potentially describe a phenotype, the number of states or functions that can be described by a ramanome is very large; moreover, just as a 1500-pixel digital portrait photo provides a combination of many features, these cellular functions can potentially be unveiled simultaneously. Thus, the ramanome/meta-ramanome can define, measure, and monitor the functional profile and phenotypic heterogeneity of a microbial community, and can serve as a generally applicable new type of phenome data that is complementary to existing omics tools such as metagenomics, metatranscriptomics, and meta-metabolomics.
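
To make the idea of a ramanome as a data object more concrete, the following minimal Python sketch treats it as a list of per-cell spectra of about 1500 bands each and computes one simple measure of phenotypic heterogeneity. Everything in it is an illustrative assumption rather than the published method: the spectra are synthetic, and `normalize`, `heterogeneity`, and `synthetic_cell` are hypothetical helper names.

```python
import math
import random

N_BANDS = 1500  # approximate number of Raman bands per SCRS, as stated in the text


def normalize(spectrum):
    """Scale a spectrum to unit total intensity so that cells are comparable."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def mean_spectrum(ramanome):
    """Band-wise average over all cells in the ramanome."""
    n_cells = len(ramanome)
    return [sum(cell[i] for cell in ramanome) / n_cells
            for i in range(len(ramanome[0]))]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def heterogeneity(ramanome):
    """Mean distance of each cell from the population mean spectrum.

    This metric is a hypothetical stand-in chosen for simplicity.
    """
    centre = mean_spectrum(ramanome)
    return sum(euclidean(cell, centre) for cell in ramanome) / len(ramanome)


def synthetic_cell(rng, peak_band, peak_height):
    """Noisy flat baseline plus one peak; stands in for a measured SCRS."""
    spectrum = [1.0 + 0.01 * rng.random() for _ in range(N_BANDS)]
    spectrum[peak_band] += peak_height
    return normalize(spectrum)


rng = random.Random(0)
# One phenotype only: every cell carries the same peak.
uniform_pop = [synthetic_cell(rng, 700, 50.0) for _ in range(20)]
# Two phenotypes: half of the cells carry a peak at a different band.
mixed_pop = ([synthetic_cell(rng, 700, 50.0) for _ in range(10)]
             + [synthetic_cell(rng, 1200, 50.0) for _ in range(10)])

print(heterogeneity(uniform_pop) < heterogeneity(mixed_pop))
```

In this toy setting the mixed population scores higher than the uniform one, which is the kind of population-level readout that a collection of single-cell spectra makes possible.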

3. From interrogating a consortium or population of cells to probing individual cells: Microbiota analysis at the deepest level

By sequencing the collective genetic materials of a microbial consortium, metagenomic approaches can provide a comprehensive view of the organismal structure and functional potential of a microbiota [10,11]. Metagenomic datasets are usually characterized by enormous volume, high genetic heterogeneity, and extreme bias in relative organismal abundance. Therefore, questions such as how to optimize and standardize ecosystem-specific sample pretreatment methods, how to take advantage of new sequencing techniques, how to improve sequencing and assembly strategies, and how to mine microbiome big data are of high priority. Notable progress in tackling these challenges has been made in several areas by microbiome tool developers based in China. This progress includes the classification and analysis of organismal profiles based on short sequences [12], a method for sequencing 16S rRNA gene flanking region sequences (RiboFR-Seq) [13], a gene-reconstruction algorithm based on machine learning and path topology (inGAP-CDG) [14], and the metaSort method [15] for experimentally and computationally reducing the organismal complexity of a complex microbiome. In addition, a series of seminal algorithms and pipelines were developed for metagenomic sequence quality control (QC-Chain) [16], high-throughput pairwise comparison of microbiomes (Meta-Storm algorithm) [17,18], a strategy for analyzing large-scale microbiome datasets (the MDV model for data analysis) [19], and software for data visualization (MetaSee) [20].

Although metagenomic approaches are powerful, single-cell analysis, including functional sorting, sequencing, and cultivation at the single-cell level, can potentially solve one of their core limitations: the inability to discriminate and validate the function of individuals within the community. At present, most function-based cell-sorting approaches are based on fluorescence-activated cell sorting (FACS) [21]. However, FACS typically requires labeling the cells with fluorescent probes that target proteins, metabolites, or nucleic acids, and thus also requires a priori knowledge about the biomarkers of the targeted function. For most functional analyses of the microbiota, both of these requirements are difficult to fulfill. To address these limitations, a series of core technologies and devices for Raman-activated cell sorting (RACS) were developed by this team, including Raman-activated cell ejection (RACE) [5] and Raman-activated microfluidic sorting (RAMS) [22]. Furthermore, a prototype for a Raman-activated cell sorter called RACS-1 has been demonstrated, which sorts and isolates microbial cells based on the aforementioned SCRS [5,22]. Because SCRS can model a theoretically unlimited number of cellular phenotypes in a label-free, landscape-like, and rapid manner [23], RACS can serve as a general-purpose instrument for sorting and isolating cells of specific functions from a microbiota.
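
As a rough illustration of how a sorting decision could be derived from an SCRS, the sketch below compares the integrated intensity in a target band window with that in a reference window and keeps the cell if the ratio exceeds a threshold. This is only a simplified assumption about the decision logic, not a description of how RACE, RAMS, or RACS-1 work; the window positions, the threshold, and the function names `band_ratio` and `sort_decision` are hypothetical placeholders.

```python
def band_ratio(spectrum, shifts, window, reference):
    """Ratio of summed intensity inside `window` to that inside `reference`.

    `shifts` holds the Raman shift (cm^-1) of each point in `spectrum`;
    `window` and `reference` are (low, high) tuples, both inclusive.
    """
    def area(low, high):
        return sum(i for s, i in zip(shifts, spectrum) if low <= s <= high)

    ref_area = area(*reference)
    return area(*window) / ref_area if ref_area > 0 else 0.0


def sort_decision(spectrum, shifts, window, reference, threshold):
    """Return True if the cell should be sorted into the target fraction."""
    return band_ratio(spectrum, shifts, window, reference) >= threshold


# Hypothetical acquisition grid: 1500 points from 400 to 3398 cm^-1.
shifts = list(range(400, 3400, 2))
TARGET = (1000, 1100)      # placeholder window for the phenotype of interest
REFERENCE = (2800, 3000)   # placeholder window used for normalization
THRESHOLD = 1.0            # placeholder cut-off

# Synthetic spectra: a flat baseline, with or without a signal in TARGET.
baseline_cell = [1.0 for _ in shifts]
active_cell = [6.0 if TARGET[0] <= s <= TARGET[1] else 1.0 for s in shifts]

print(sort_decision(active_cell, shifts, TARGET, REFERENCE, THRESHOLD))
print(sort_decision(baseline_cell, shifts, TARGET, REFERENCE, THRESHOLD))
```

Because the decision uses only the spectrum itself, no fluorescent label or known biomarker is needed in this sketch; in practice, the choice of windows and threshold would have to be calibrated for each phenotype.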

Single-cell sequencing directly coupled to FACS or RACS can interpret the microbiota mode of action at single-cell resolution, thus addressing the core challenges encountered by metagenomic sequencing. Methods for amplifying genomic DNA from single cells, such as multiple displacement amplification (MDA), have revealed novel Bacteria and Archaea from several extreme environments [24], discovered oil-degrading pathways from deep-sea hot springs [25], distinguished pathogens at the resolution of individual strains [26], and unveiled the interaction and co-evolution between ① marine planktonic microalgae and their symbiotic viruses [27], ② human bacterial symbionts and parasitic viruses [28], and ③ nitrogen-fixing cyanobacteria and photosynthetic microalgae [29]. The more recently developed MALBAC (short for multiple annealing and looping-based amplification cycles) method exhibited a lower amplification bias than MDA for mammalian single cells [30,31]; however, performance in microbial single cells did not seem to improve, and contaminating reads derived from the amplification of environmental DNA were significant [32]. Therefore, methods that have low amplification bias and robust protection against contamination, but that are also easily and reliably coupled to the upstream steps of microbial cell isolation or sorting, are urgently needed. Recently, we developed a facile device called “FOCOT” (short for Facile One-Cell-One-Tube), which couples microbial single-cell isolation to genome sequencing based on integrated dynamic microdroplet arrays that feature microdroplet generation with an automatic energy supply, precise manipulation and merging of microdroplets, and recovery of intra-droplet contents [33]. Its ability to efficiently couple microbial single-cell isolation to sequencing reactions, with low probability of contamination and in the absence of expensive and bulky equipment, can potentially enable portable, onsite, and real-time single-cell analysis under conditions with limited resources or in extreme environments. On the other hand, a technique that does not require microfluidic chips for microdroplet preparation, called cross-interface emulsification (XiE), was introduced as a way to generate nanoliter microdroplets for single-cell genomic DNA amplification reactions coupled to upstream FACS [34]. These new techniques are expected to facilitate the development of mobile instruments or even hand-held devices for reliable and higher-throughput amplification and sequencing of marker genes or genomic DNA from individual cells that are functionally sorted from the microbiota.

Cultivating microbes, whether before or after cell sorting, has always been an important strategy for detection and functional validation, and is also essential for the isolation and mass production of functional elements from a microbiota. Recent studies have shown that many soil or gut microbes that have been considered to be unculturable are now culturable in the laboratory, via multi-round optimization of growth conditions and devices that allow in situ or ex vivo culture [35,36]. However, the large-scale cultivation of members of microbiotas, the so-called culturomics approach, has mainly depended on plating the microbial consortium all together on a liquid or solid medium. This approach can be problematic, because slow-growing or rare species are outgrown by other members of the microbiota. Several solutions have been proposed, such as taking advantage of microdroplet-based cultivation at the single-cell level [37,38]. For example, an easy-to-use method for the microfluidics-based cultivation of microbes, called the microfluidic streak plate (MSP), achieves a culturable diversity higher than that from traditional plating. It does so because parallel cultures of individual microbial cells in microdroplets made of lipid medium or agar can minimize competition for nutrients while allowing cross feeding [39]. Droplet-based cultivation can also serve the purpose of rapid diagnosis. For example, data from our laboratories showed that the turn-around time of live bacterial cell counting can be reduced from 1–2 days to a few hours by digital spreading plate counting (dSPC), which generates single-cell-containing microdroplets, followed by cultivation. Moreover, with the introduction of nanometer-size particle-mediated aggregation, single-cell-harboring microdroplets can be directly distinguished from empty ones, so that the time needed for counting bacterial cells can be further reduced to just a few minutes. These methods can also take advantage of cellphone-based microscopic imaging, so as to enable portable and real-time detection and counting of all or selected members of the microbiota.
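
Counting cells by scoring droplets as occupied or empty is commonly analyzed with Poisson statistics in digital assays. The sketch below shows that standard calculation under the assumption that cells are distributed randomly among equal-volume droplets; whether dSPC uses exactly this correction is not stated here, and the function name `cells_per_ml` and the example numbers are hypothetical.

```python
import math


def cells_per_ml(n_positive, n_total, droplet_volume_nl):
    """Estimate cell concentration from the fraction of occupied droplets.

    Assumes Poisson loading: P(empty droplet) = exp(-lam), where lam is the
    mean number of cells per droplet, so lam = -ln(1 - fraction_positive).
    """
    if n_total <= 0 or not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    fraction_positive = n_positive / n_total
    mean_cells_per_droplet = -math.log(1.0 - fraction_positive)
    droplets_per_ml = 1e6 / droplet_volume_nl  # 1 mL = 1e6 nL
    return mean_cells_per_droplet * droplets_per_ml


# Hypothetical run: 10 000 droplets of 1 nL each, 6321 of them occupied.
estimate = cells_per_ml(6321, 10000, 1.0)
print(f"{estimate:.3g} cells/mL")
```

The logarithm corrects for droplets that hold more than one cell; simply counting occupied droplets would underestimate the concentration as loading density rises.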

4. From microbiome data analysis to microbiome data science: “Look for the old to learn the new”

Big data is one of the most critical bottlenecks for microbiome science at present. Microbiome big data holds the key to unleashing the power of microbiomes to overcome critical challenges in precision medicine, environmental remediation, and clean energy production. For example, microbiome data science will allow us to define the nature of similarities among microbiomes in order to understand the global features of microbiomes on Earth; machine learning can be exploited to unveil associations between structural and functional similarity in microbiomes, so as to dissect and predict microbiome evolution; and artificial intelligence can be used to establish diagnostic and early-alarm models for human diseases and environmental disasters. However, in metagenomic datasets, for example, features such as the enormous data volume and the heterogeneity in data origin and sequence type have hindered integration, searches, and pairwise comparison. There are three main problems that need to be overcome. First, integration and indexing of metagenomic datasets are challenging. MG-RAST [40] and CAMERA [41] are among the most prominent metagenomic databases; however, their datasets can differ in metadata, project design and sample preparation, and sequencing methodology. Hence, integrated analysis and global comparison of all these datasets have been difficult. Second, high-speed comparison and searching of metagenome datasets have been hindered by the lack of appropriate methods. Published methods, including microbiome structure-analysis tools such as PHYLOSHOP [42] and MEGAN [43] and structure-comparison tools such as mothur [44], UniFrac [45,46], and QIIME [47], are primarily based on the premise of “within-project metagenome analysis,” and are not optimized to support comparisons and searches across the much larger scope of all known metagenomes. Third, the explosive expansion of data sources and data volume has resulted in unprecedented demands on the functionality, throughput, and cost of big data systems, and thus on their design and sustainability. At the same time, the integration of multi-omics data, including metatranscriptomes, meta-metabolomes, and single-cell genomes as well as new phenome data types such as the aforementioned ramanome and meta-ramanome, has been a major challenge.

In order to tackle these technological bottlenecks, the Microbiome Search Engine (MSE)† was developed by this team to enable microbiota structure- or function-based searching and data mining, with one metagenome as the basic search unit. Based on a series of computational-method developments, including a novel indexing method [17], an algorithm for the pairwise comparison of 16S amplicon libraries [48–50], a statistical framework for evaluating similarities among metagenomes [17], general-purpose graphic processing unit (GPGPU)-based acceleration software [18], and so forth, MSE allows a “BLAST-like” search of microbiomes, in which the subject microbiomes in the known microbiome space that are most similar to the query microbiome, in either organismal structure or functional structure, are rapidly identified and returned. Moreover, via machine-learning approaches, MSE automatically builds computational models for key metadata such as the type or stage of a polymicrobial disease or ecological disaster, and applies them to calculate a series of microbiome-based indices for diagnosis or risk assessment, such as the Microbial Index of Gingivitis [51], Microbial Indicators of Caries [52], and Microbial Index of Gout [53]. The reference microbiome database covers a wide range of high-quality, well-annotated metagenomic datasets that were analyzed using a consistent bioinformatics pipeline that accounts for heterogeneity in the type and strategy of sequence data acquisition, quality control, comparison, and visualization. As a result, MSE supports both local and global interrogation of the known microbiome space, and may even enable prospecting into the yet-unexplored areas of microbiome structure and function (including not just bacteria but fungi and viruses; e.g., Ref. [54]).
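
The notion of a “BLAST-like” search with one microbiome as the query can be sketched as ranking reference profiles by their similarity to the query profile. The example below is a deliberately simplified stand-in, not the MSE implementation: it uses plain Bray-Curtis similarity on relative abundances instead of the indexing and scoring methods cited above, and the sample names, taxon abundances, and function names are all hypothetical.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two taxon -> abundance profiles.

    Returns 1.0 for identical profiles and 0.0 when no taxon is shared.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total > 0 else 0.0


def search(query, database, top_n=3):
    """Return the `top_n` reference samples most similar to `query`."""
    scored = [(bray_curtis_similarity(query, profile), name)
              for name, profile in database.items()]
    # Highest similarity first; ties are broken by sample name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_n]]


# Hypothetical reference database of relative-abundance profiles.
database = {
    "oral_A": {"Streptococcus": 0.6, "Prevotella": 0.2, "Fusobacterium": 0.2},
    "gut_B": {"Bacteroides": 0.7, "Prevotella": 0.3},
    "soil_C": {"Bradyrhizobium": 1.0},
}
query = {"Streptococcus": 0.5, "Prevotella": 0.3, "Fusobacterium": 0.2}

hits = search(query, database)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

A linear scan like this is only workable for a small database; searching the whole known microbiome space is what motivates the indexing and GPGPU acceleration described above.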

In summary, similar to the impact of the metagenomics approach on microbiome science over a decade ago, the three shifts in microbiome analysis methodology described here have the potential to fundamentally change the ways of thinking and the tools that are widely accessible to microbiome scientists in the next decade. As advocated in the Confucian Analects over 2000 years ago, “To do a good job, one must first sharpen one’s tools”: major breakthroughs in harnessing the power of the microbiome to meet the challenges facing our generation will not be possible without innovation in methods, software, and instruments. In addition, novel funding and management mechanisms will be required to encourage seamless collaboration among tool developers and tool users. By fostering both domestic and international collaborations, microbiome tool developers based in China, who contributed many of the new methods and tools introduced above, have the opportunity to deliver high-quality “Made-in-China” tools to the international microbiome research community, thus building a competitive and contributive China Microbiome Initiative.

Acknowledgements

We are grateful for the support from the National Natural Science Foundation of China (NSFC) (31425002, 91231205, 81430011, 61303161, 31470220, and 31327001), and the Frontier Science Research Program, the Soil-Microbe System Function and Regulation Program, and the Science and Technology Service Network Initiative (STS) from the Chinese Academy of Sciences (CAS).

Compliance with ethics guidelines

Jian Xu, Bo Ma, Xiaoquan Su, Shi Huang, Xin Xu, Xuedong Zhou, Wei Huang, and Rob Knight declare that they have no conflict of interest or financial conflicts to disclose.