1. Introduction
Glycosylation, a ubiquitous and crucial post-translational modification of proteins, plays a pivotal role in numerous biological functions, including cell recognition, metabolism, signaling, and immune responses [1]. In-depth analysis of glycosylation features is essential for the discovery of new biomarkers and the development of novel therapeutic strategies [2], [3]. This is particularly true in the rapidly growing fields of glycosylation-related omics, specifically glycoproteomics and glycomics [4], [5].
Mass spectrometry (MS) is widely valued for its precision in detecting and characterizing glycosylated proteins [6], [7]. Despite advances in preprocessing, instrumentation, and bioinformatics, challenges persist in the analysis of protein glycosylation in clinical settings. A primary challenge in analyzing glycopeptides or glycans is that co-eluting peptides or other impurities can obscure the target signals in MS. Glycopeptides and glycans therefore need to be enriched before analysis, which requires sophisticated sample-preprocessing methods. Similarly, in glycomics, the complex and branched structures of glycans require not only sophisticated analytical techniques but also effective enrichment strategies for accurate analysis.
In multi-glycosylation-omics research and its potential clinical applications, the diversity and complexity of samples are rapidly increasing, creating a growing demand for high-throughput sample-preparation technologies with broader applicability. Although several high-throughput enrichment techniques for glycoproteomics and glycomics have proven feasible, their application has been limited primarily to simple samples, such as individual proteins including immunoglobulin G (IgG), alpha-1 acid glycoprotein (AGP), and haptoglobin [8], [9], [10]. Their efficacy for complex biological samples remains limited. Furthermore, methods capable of integrating analyses of various glycosylation-related biomolecules are becoming increasingly important, especially in high-dimensional multi-omics research. Existing sample-preprocessing methods are usually limited to analyzing either glycopeptides or glycans separately and therefore fall short of integrating diverse glycosylation-related biomolecules into a unified multi-glycosylation-omics study. Consequently, there is an urgent need for a sample-preprocessing method that integrates the analysis of various glycosylation-related biomolecules and unifies the different branches of multi-glycosylation-omics.
In this study, we address the challenges of sample preparation in multi-glycosylation-omics analysis by integrating our previously validated cotton-based N-glycome sample-preparation method with a high-throughput 96-well-plate platform, resulting in GlycoPro, a novel solution for glycoproteomics and glycomics analysis. The cotton-based method, noted for its excellent hydrophilicity, large surface area, and high adsorption capacity, has been shown to be effective for N-glycan enrichment [11]. We further optimized and fine-tuned this procedure to accommodate various analytes, including N/O-glycopeptides and N/O-glycans. Another notable feature of GlycoPro is its high throughput, which reduces the enrichment or purification time to just 0.25-0.50 min per sample and enables simultaneous handling of up to 384 samples. This efficiency applies to both glycomics and glycoproteomics analyses, ensuring uniform high-throughput capability for diverse glycan and glycoprotein samples. In terms of analytical depth, GlycoPro consistently identified over 3300 N-glycopeptides and 3500 O-glycopeptides from 2 µL of serum, with correlation coefficients exceeding 0.98 across technical replicates. Similarly, for glycomics, we identified 193 N-glycans and 71 O-glycans from only 2 μL of serum. Moreover, the stability of GlycoPro was demonstrated by consistent glycan detection over several consecutive days.
Finally, we successfully applied the GlycoPro method to process serum samples from breast cancer patients and identified a robust N-glycan biomarker panel, which demonstrated a sensitivity of 88.24% and a specificity of 78.95% in distinguishing between malignant and non-malignant states.
2. Materials and methods
2.1. Chemicals and reagents
2,5-Dihydroxybenzoic acid (DHB), bovine serum albumin (BSA), mucin from porcine stomach (PSM), NH4HCO3, dithiothreitol (DTT), iodoacetamide (IAA), trifluoroacetic acid (TFA), formic acid (FA), N-methylmorpholine (4-NMM), and methylamine hydrochloride were purchased from Sigma (USA). (7-Azabenzotriazol-1-yloxy)tripyrrolidinophosphonium hexafluorophosphate (PyAOP) was purchased from Merck (Germany). Trypsin was obtained from Beijing Shengxia Proteins Scientific Co., Ltd. (China). Peptide N-glycosidase F (PNGase F; 500 units per microlitre (U·μL−1)) was obtained from New England Biolabs (USA). Acetonitrile (ACN; high-performance liquid chromatography (HPLC) grade) and dimethyl sulfoxide (DMSO) were purchased from Sinopharm Chemical Reagent Co., Ltd. (China). Defatted cotton balls were purchased from Shanghai Honglong Medical Equipment Co., Ltd. (China). All water used in the experiments was prepared using a Milli-Q system (Millipore, USA).
2.2. Clinical sample collection
All blood samples were stored in plain tubes at −80 °C until analysis. The following samples were obtained: 45 from healthy controls (HCs), 43 from patients with benign fibroids, 80 from breast cancer patients with infiltrating carcinoma (IC) and lymph node metastasis (LNM), and 8 from breast cancer patients with IC. The use of human serum samples was approved by the Research Ethics Committee of Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences) (ethics No. KY-2020-042-02).
2.3. Sample preparation
DHB was dissolved in 50% ACN (containing 0.1% TFA) at a concentration of 10 mg·mL−1. PSM and BSA were dissolved in distilled water at a concentration of 0.1 or 1.0 mg·mL−1 for further use. For the conventional enzymatic release of N-glycans, 2 μL of human serum was diluted in 38 μL of 25 mmol·L−1 (mM) ammonium bicarbonate (ABC) buffer (pH 7.8) to a protein concentration of about 1 mg·mL−1. The 40 μL serum protein solution was denatured by heating at 100 °C for 5 min, and 0.5 μL of PNGase F solution was added after the solution had cooled to room temperature. The mixture was incubated at 37 °C overnight.
2.4. Preparation of the GlycoPro platform
Cotton wool (3.75-4.25 mg) was packed into each 10 μL pipette tip, and 192 such tips were prepared. A quantitative polymerase chain reaction (qPCR) plate with holes pre-punched at the bottom was then prepared, and the packed pipette tips were placed securely over this modified qPCR plate to complete the assembly of the GlycoPro platform, as depicted in Fig. S1 in Appendix A.
2.5. Release of N-glycans on the GlycoPro platform
To initiate N-glycan release, a 96-well plate was prepared, and 38 µL of 25 mM ABC buffer was dispensed into each well using an eight-channel Eppendorf pipettor. The samples were thoroughly mixed and centrifuged, and 2 µL of each clear supernatant was pipetted into the corresponding well. The plate was sealed with adhesive film and immersed in a 100 °C water bath for 5 min to denature the proteins. After cooling to room temperature, the plate was centrifuged to collect the contents at the bottom of the wells, and 0.5 µL of PNGase F was added to each well. The plate was then incubated overnight on an orbital shaker at 37 °C to achieve complete enzymatic release of the N-glycans.
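Because buffer and enzyme are dispensed column by column with an eight-channel pipettor, it helps to fix a sample-to-well mapping before loading. The sketch below is a hypothetical helper, not part of the published GlycoPro workflow. It assigns sample indices to wells A1-H12 in column-major order and rolls over to additional 96-well plates, so 384 samples occupy four plates. The column-major order is an assumption chosen to match eight-channel loading.

```python
ROWS = "ABCDEFGH"  # eight rows, one per pipettor channel
COLS_PER_PLATE = 12
WELLS_PER_PLATE = len(ROWS) * COLS_PER_PLATE  # 96


def well_for_index(i: int) -> tuple[int, str]:
    """Map a 0-based sample index to (plate number, well ID), filling column by column.

    Hypothetical helper: column-major order lets one eight-channel dispense fill one column.
    """
    if i < 0:
        raise ValueError("sample index must be non-negative")
    plate, pos = divmod(i, WELLS_PER_PLATE)
    col, row = divmod(pos, len(ROWS))
    return plate + 1, f"{ROWS[row]}{col + 1}"


if __name__ == "__main__":
    layout = [well_for_index(i) for i in range(384)]
    print(layout[0], layout[8], layout[-1])  # (1, 'A1') (1, 'A2') (4, 'H12')
```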
2.6. Methylamidation of sialylated N-glycans
Methylamidation of the sialylated N-glycans was performed as previously described [38], with slight modifications. First, the N-glycans isolated with GlycoPro were lyophilized. To each well of a 96-deep-well plate, 10 µL of 5 mol·L−1 (M) methylamine hydrochloride (in DMSO) and 10 µL of 1 M PyAOP (in DMSO:4-NMM, 70:30, v/v) were added. The plate was incubated at ambient temperature for 1 h, and the reaction was then quenched by adding 180 µL of 80% ACN containing 0.1% TFA to each well. The resulting solution was subjected to GlycoPro enrichment to remove excess reagents.
2.7. Enrichment of N-glycans using the GlycoPro platform
Before sample loading, each tip of the GlycoPro platform was preconditioned with two washes of 80 µL of 0.1% TFA and then equilibrated with 80 µL of 80% ACN containing 0.1% TFA. A 1 min soak was included between each solution-addition and centrifugation step to ensure thorough wetting of the cotton wool in each tip. All samples were adjusted to 80% ACN, applied to the cotton tips, and briefly centrifuged to allow adsorption. The cotton tips were then washed six times with 80 µL of 80% ACN containing 0.1% TFA. For elution, 50 µL of water was dispensed onto each cotton tip; this was repeated three times, yielding 150 µL of eluate in total. All operations were performed with an eight-channel Eppendorf pipettor to improve efficiency and consistency. The eluates were then lyophilized using a 96-well-plate-compatible integrated concentrator centrifuge (SCIENTZ-10LS) and lyophilizer (SCIENTZ-10A) from SCIENTZ.
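Adjusting each sample to 80% ACN before loading is a simple mixing calculation. Assuming that volumes are additive (ACN-water mixtures actually contract slightly) and that neat ACN is added, the volume to add is V·(f − s)/(1 − f), where V is the sample volume, s its current ACN fraction, and f the target fraction. For a 40 µL aqueous digest, this gives about 160 µL of ACN. The helper name below is hypothetical.

```python
def acn_to_add(sample_ul: float, target: float = 0.80, current: float = 0.0) -> float:
    """Volume (µL) of neat ACN to add so the mixture reaches `target` ACN fraction (v/v).

    Assumes additive volumes; solves (current*V + x) / (V + x) = target for x.
    """
    if not 0.0 <= current <= target < 1.0:
        raise ValueError("require 0 <= current <= target < 1")
    return sample_ul * (target - current) / (1.0 - target)


if __name__ == "__main__":
    # 40 µL aqueous digest (2 µL serum + 38 µL ABC buffer; enzyme volume neglected)
    print(f"{acn_to_add(40.0):.1f} µL ACN")  # 160.0 µL ACN
```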
2.8. Release of O-glycans from PSM
The oxidative release of O-glycans was performed according to a previously established protocol [12]. Briefly, the pH of 5 mL of 15% sodium hypochlorite (NaOCl) solution was adjusted to 6.8 by titration with approximately 3 mL of 1 M hydrochloric acid. Next, 100 μL of this buffered NaOCl solution was added to 200 μL of an aqueous PSM solution (1 mg·mL−1), and the mixture was kept on ice to prevent degradation. The reaction was quenched by adding 3 μL of 10% (v/v) FA. After quenching, the mixture was lyophilized, redissolved in 25 μL of water, and desalted and enriched using the GlycoPro platform.
2.9. Enrichment of O-glycans using the GlycoPro platform
The O-glycans were desalted and enriched utilizing the GlycoPro platform in a manner analogous to the procedure for N-glycan enrichment, with modifications to the buffer system. In brief, the samples were redissolved in a solution of 70% ACN and 15% methanol (MeOH) containing 1% TFA. This mixture was also employed for washing to remove non-glycosylated peptides and excess salts.
2.10. Enrichment of intact N/O-glycopeptides using the GlycoPro platform
Serum samples (5 µL each) were diluted 20-fold with 25 mM ABC in a 96-well plate, as described above. Proteins were reduced with 10 mM DTT at 37 °C for 30 min and alkylated with 20 mM IAA at 25 °C for 45 min. Trypsin digestion was performed at an enzyme-to-protein ratio of 1:50 (w/w) at 37 °C for 16 h. The resulting tryptic peptides were then desalted and simultaneously enriched using the GlycoPro platform. To minimize sample-processing bias, glycopeptide enrichment and MS analysis of the samples within each batch were performed in randomized order. For intact N-glycopeptide enrichment, lyophilized tryptic peptides were resuspended in 80% ACN containing 1% TFA and washed with the same solution to remove non-glycosylated peptides and residual salts.
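One way to randomize the processing order within each batch is a seeded shuffle, which keeps the order reproducible for the laboratory record. The sketch below illustrates this under that assumption. The batch and sample names are hypothetical, and the text does not state how the randomization was actually generated.

```python
import random


def randomize_within_batches(batches: dict[str, list[str]], seed: int) -> dict[str, list[str]]:
    """Return a shuffled copy of each batch's sample list; the input is not modified."""
    rng = random.Random(seed)  # seeded generator -> reproducible run order
    order = {}
    for batch_id in sorted(batches):  # fixed iteration order keeps results seed-stable
        samples = list(batches[batch_id])
        rng.shuffle(samples)
        order[batch_id] = samples
    return order


if __name__ == "__main__":
    batches = {  # hypothetical sample IDs
        "batch1": [f"HC_{i:02d}" for i in range(1, 6)] + [f"BC_{i:02d}" for i in range(1, 6)],
        "batch2": [f"HC_{i:02d}" for i in range(6, 11)] + [f"BC_{i:02d}" for i in range(6, 11)],
    }
    for batch_id, samples in randomize_within_batches(batches, seed=2024).items():
        print(batch_id, samples)
```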
2.11. MALDI-TOF MS analysis
Sample aliquots were spotted onto a matrix-assisted laser desorption/ionization (MALDI) target plate and left to dry at room temperature. Subsequently, 1 µL of the DHB matrix solution (5 mg·mL−1 in 50% ACN with 0.1% TFA) was applied to the dried sample spots and allowed to crystallize before analysis. Co-crystallization of the sample with the matrix promotes efficient ionization during MALDI-MS analysis.
MALDI-MS spectra were recorded in the positive ion reflector mode, scanning a mass range from m/z 1000 to 4000, utilizing a rapifleX MALDI-time-of-flight (TOF) mass spectrometer (Bruker). The instrument settings and calibration were optimized for the highest accuracy and resolution of glycan molecular ions.
2.12. LC-MS/MS analysis
The N/O-glycopeptides from the serum samples were resuspended in Solvent A (water with 0.1% FA), separated by nano-liquid chromatography (LC), and analyzed by online electrospray tandem MS (MS/MS). The experiments were performed on an EASY-nLC 1200 system (Thermo Fisher Scientific, USA) connected to a Thermo Scientific Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Scientific) equipped with an online nanoelectrospray ion source with custom spray potential. Glycopeptides were loaded onto the analytical column (Acclaim PepMap C18, 75 μm × 25 cm) and separated with a gradient in which Solvent A was 0.1% FA in water and Solvent B was 80% ACN with 0.1% FA. The gradient for the glycopeptides from the IgG samples was 60 min in total: 2%-8% B from 0 to 1 min, 8%-18% B from 1 to 31 min, 18%-30% B from 31 to 35 min, 30%-55% B from 35 to 55 min, and 95% B for the last 5 min. The column flow rate was maintained at 300 nL·min−1.
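The gradient program can be written as a table of piecewise-linear segments. This makes it easy to check that the segments are contiguous and span 60 min, and to look up %B at any time point. The sketch below encodes the gradient described above. Treating the change from 55% to 95% B at 55 min as an instantaneous step is an assumption, because the ramp time is not stated.

```python
# (start_min, end_min, start_%B, end_%B); Solvent B = 80% ACN with 0.1% FA
GRADIENT = [
    (0.0, 1.0, 2.0, 8.0),
    (1.0, 31.0, 8.0, 18.0),
    (31.0, 35.0, 18.0, 30.0),
    (35.0, 55.0, 30.0, 55.0),
    (55.0, 60.0, 95.0, 95.0),  # hold; step from 55% to 95% B assumed instantaneous
]

# Sanity checks: segments must be contiguous and span 0-60 min.
for (_, end, _, _), (start, _, _, _) in zip(GRADIENT, GRADIENT[1:]):
    assert end == start, "gap or overlap between gradient segments"
assert GRADIENT[0][0] == 0.0 and GRADIENT[-1][1] == 60.0


def percent_b(t_min: float) -> float:
    """Return %B at time t_min by linear interpolation within the first matching segment."""
    for start, end, b0, b1 in GRADIENT:
        if start <= t_min <= end:
            return b0 + (b1 - b0) * (t_min - start) / (end - start)
    raise ValueError(f"time {t_min} min is outside the 0-60 min program")


if __name__ == "__main__":
    print(percent_b(16.0))  # 13.0
```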
The parameters for glycopeptide analysis in data-dependent acquisition (DDA) mode were as follows. For MS1, the scan range was m/z 350-2000, with a resolution of 60 000, a normalized automatic gain control (AGC) target of 100%, a maximum injection time of 50 ms, and included charge states of 2-6. Each selected precursor was subjected to one higher-energy collisional dissociation (HCD) MS/MS scan, with an isolation window of 0.7 m/z, a collision energy of 37%, and Orbitrap detection. The HCD-MS/MS resolution was 30 000, with a normalized AGC target of 500% and a maximum injection time of 200 ms. The DDA data were searched using Byonic software (see Section 2.14).
2.13. Data processing of N-glycans
Database searches were conducted using our in-house software, glyhunter, which is available at https://github.com/FudanLuLab/glyhunter. The database used is shown in Table S1 in Appendix A. The search parameters were as follows: calibration tolerance, 50 parts per million (ppm); search mass tolerance, 20 ppm; and, for the summary, "turn_on: area" and "turn_on: intensity", with a signal-to-noise threshold of 3.
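To make the 20 ppm search tolerance and the signal-to-noise threshold of 3 concrete, the sketch below shows generic ppm-based matching of MALDI peaks against a composition database. A peak is annotated when its S/N is at least 3 and its mass error, (m_obs − m_theo)/m_theo × 10⁶, is within ±20 ppm. This illustrates the principle only and is not glyhunter’s code. The glycan names and masses in the usage example are placeholders, not entries from Table S1.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float
    sn: float  # signal-to-noise ratio


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(peaks, database: dict[str, float], tol_ppm: float = 20.0, min_sn: float = 3.0):
    """Annotate peaks with every database entry within tol_ppm; skip peaks below min_sn."""
    hits = []
    for peak in peaks:
        if peak.sn < min_sn:
            continue
        for name, theo_mz in database.items():
            err = ppm_error(peak.mz, theo_mz)
            if abs(err) <= tol_ppm:
                hits.append((name, peak.mz, peak.intensity, round(err, 2)))
    return hits


if __name__ == "__main__":
    db = {"glycan_A": 1500.0, "glycan_B": 2000.0}  # placeholder masses
    peaks = [Peak(1500.015, 1.0e4, 12.0),  # +10 ppm, S/N 12 -> matched
             Peak(2000.2, 5.0e3, 8.0),     # +100 ppm -> outside tolerance
             Peak(1500.01, 1.0e2, 2.0)]    # S/N 2 -> below threshold
    print(match_peaks(peaks, db))
```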
2.14. Data processing of glycopeptides
For N-glycopeptide identification, raw data were searched with Byonic software (Protein Metrics, USA) against the human UniProt database (20 428 protein entries released in October 2023). The spectra were searched using precursor and fragment ion tolerances of 10 and 20 ppm, respectively. The search was restricted to tryptic peptides, allowing up to three missed cleavages. Cysteine carbamidomethylation (C; +57.022 Da) was specified as a fixed modification. Methionine oxidation (M; +15.995 Da) was set as a variable modification. The false discovery rate (FDR) of the glycopeptide spectrum match (GPSM) was restricted to less than 1%.
For MS1-based quantification, MZmine reported the relative abundance of each glycopeptide according to the precursor peak area derived from the MS1 elution profile of the DDA raw files.
2.15. Data analysis
MALDI mass spectra were extracted using Bruker Daltonics flexAnalysis and visualized with Origin 2019. Three-dimensional scatter plots for glycopeptide retention time (RT) analysis, as well as boxplots, Venn diagrams, scatterplots, pie charts, and bar charts, were generated using the ggplot2 package in R. In the boxplots, the centerlines and squares represent the median and mean values, respectively, and the upper and lower limits of each box represent the 75th and 25th percentiles, respectively. A heatmap and a partial least squares discriminant analysis (PLS-DA) plot illustrating sample-to-sample correlations were created using the online platform https://www.metaboanalyst.ca/. A Mantel test correlating the targeted N-glycopeptides with clinical data was performed using the linkET package (v. 0.0.7.4) in R.
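For readers unfamiliar with the Mantel test, the sketch below shows the generic permutation version. The Pearson correlation between the upper triangles of two sample-by-sample distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. It is a minimal illustration using Euclidean distances on synthetic data, not linkET’s implementation. The distance metrics and permutation count used in the study are not stated in the text.

```python
import numpy as np


def euclidean_distance_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows (samples) of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mantel(d1: np.ndarray, d2: np.ndarray, permutations: int = 999, seed: int = 0):
    """One-sided permutation Mantel test; returns (r, p)."""
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)  # each sample pair once, diagonal excluded
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        # Permute samples (rows and columns together) to break the pairing.
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        if r_perm >= r_obs:
            n_extreme += 1
    p = (n_extreme + 1) / (permutations + 1)
    return r_obs, p


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    glycan_profiles = rng.normal(size=(20, 5))  # synthetic: 20 samples x 5 glycans
    clinical = glycan_profiles[:, :2] + rng.normal(scale=0.3, size=(20, 2))  # related by construction
    r, p = mantel(euclidean_distance_matrix(glycan_profiles), euclidean_distance_matrix(clinical))
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```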
2.16. Differential expression analysis and machine learning model
N-glycans with more than 50% missing values were excluded from further statistical analysis, and the remaining missing values were handled by zero imputation. After data preprocessing, 79 serum N-glycans were retained for subsequent analysis. HC/benign patients and breast cancer patients were compared using a normality test followed by a t-test for normally distributed data. These analyses were performed with the stats package (v. 4.2.3) in R. To account for multiple comparisons and control the FDR, p values were adjusted using the Benjamini-Hochberg procedure; all reported p values are FDR-adjusted unless otherwise specified. Differentially expressed glycopeptides were identified using an adjusted p-value threshold of 0.05.
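The filtering, imputation, and multiple-testing steps can be sketched with NumPy as below. Features are filtered on their original missingness before zeros are imputed, so imputed zeros are not counted as observed values. The Benjamini-Hochberg function is a standard step-up implementation written out by hand (the study used R). The function names are hypothetical, and the per-feature p values would come from the normality-test/t-test step described above.

```python
import numpy as np


def filter_and_impute(x, max_missing: float = 0.5):
    """Drop features (columns) with more than max_missing NaN, then replace remaining NaN with 0.

    Returns the processed matrix and a boolean mask of retained features.
    """
    x = np.asarray(x, dtype=float)
    missing_frac = np.isnan(x).mean(axis=0)
    keep = missing_frac <= max_missing  # "over 50% missing" -> excluded
    return np.nan_to_num(x[:, keep], nan=0.0), keep


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p values (step-up, monotone, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Cumulative minimum from the largest p value down enforces monotonicity.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


if __name__ == "__main__":
    nan = np.nan
    intensities = [[1.0, nan, nan],
                   [2.0, 3.0, nan],
                   [nan, 4.0, nan],
                   [3.0, 5.0, 1.0]]  # synthetic: 4 samples x 3 glycans
    processed, kept = filter_and_impute(intensities)
    print(kept)  # [ True  True False]
    print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```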
The machine learning analysis was implemented in Python (v. 3.10.11), mainly using the scikit-learn library (v. 1.2.2). The dataset was divided into a training set and a test set by stratified partitioning. The key hyperparameter, the penalty coefficient of the final logistic regression model, was optimized by five-fold cross-validation within the training set. The model was then retrained on the entire training set and evaluated on the held-out test set to obtain an unbiased assessment of its performance.
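The tuning and evaluation scheme described above can be sketched with scikit-learn as follows. Synthetic data from make_classification stand in for the N-glycan intensity matrix. The 70:30 split, the StandardScaler step, the C grid, ROC AUC as the tuning metric, and the random seeds are all assumptions not stated in the text. GridSearchCV with its default refit=True retrains the best model on the whole training set, which corresponds to the retraining step described.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a samples x five-N-glycan matrix (NOT study data).
X, y = make_classification(n_samples=176, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Stratified split keeps the class ratio equal in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# C is scikit-learn's inverse regularization strength (the penalty coefficient).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc")
search.fit(X_train, y_train)  # refit=True: best model retrained on all training data

# Evaluate once on the held-out test set.
y_pred = search.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"best C = {search.best_params_['logisticregression__C']}")
print(f"sensitivity = {sensitivity:.2%}, specificity = {specificity:.2%}")
```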
In our analysis, a comprehensive selection strategy was used to identify five N-glycans for further investigation. The selection was based on several key factors. First, the extent of significant differential expression of each N-glycan between the two disease states was evaluated. Second, the correlation of each N-glycan with clinical information was examined to gauge its disease relevance. Third, the relative abundance of each N-glycan in both disease states was assessed. Finally, the contribution of each N-glycan to discriminating between healthy/benign controls and breast cancer patients was analyzed. Together, these criteria were used to ensure that the selected N-glycans were both disease-relevant and promising for biomarker identification.
3. Results
3.1. Workflow of the GlycoPro platform
In our previous research, we developed a cotton-based method for the enrichment and desalting of N-glycans. Cotton’s stability under acidic conditions and in high-concentration ACN makes it particularly effective for glycan enrichment from complex samples. Using this approach, we achieved N-glycome profiling of multiple samples within 2.5 h, with both high selectivity and high sensitivity [11]. However, this method was tailored primarily to single-omics studies and, like comparable methods in the literature, had low throughput. We subsequently found that cotton could also be used to enrich N-glycopeptides and O-glycopeptides/O-glycans. We therefore further optimized the technique for preprocessing various types of multi-glycosylation-omics samples and integrated these methods into a single GlycoPro platform. GlycoPro consists of three integral modules:
(1) Module 1: A 96-well plate, which serves as the base for eluents and washes.
(2) Module 2: A holder designed to accommodate interchangeable enrichment tips.
(3) Module 3: A set of modular tips tailored for the preparation of specific sample types, as illustrated in Fig. S1.
The operation of our system is straightforward and efficient. For instance, to prepare samples for glycomics research, we start by adding 2 µL of serum to each well of a 96-well plate for enzymatic reactions (the digestion vessel), prefilled with 38 µL of enzymatic digestion buffer, thus achieving a 20-fold dilution. The digestion vessel is then sealed with a film and subjected to a 10 min denaturation of the serum proteins in a 100 °C water bath. After cooling to room temperature, 0.5 µL of PNGase F enzyme is added to each well, followed by a rapid enzymatic digestion at 50 °C for 1 h or prolonged digestion at 37 °C for 12 h to release N-glycans. Following digestion, the N-glycans and deglycosylated proteins in the digestion vessel are transferred to the cotton tips located in Module 3 of the GlycoPro device for initial N-glycan enrichment. This procedure involves using the cotton tips in Module 3 for enrichment and purification, and a 96 deep-well plate in Module 1 for collecting eluates. Both Modules 1 and 3 can be easily replaced to accommodate different sample types, while Module 2 of GlycoPro is designed for repeated use without the need for replacement, accommodating a variety of sample types across multiple applications. This design enhances the versatility and sustainability of the platform, making it suitable for diverse glycomic and glycoproteomic analyses.
Furthermore, in studies requiring glycan derivatization, the post-digestion enriched glycans are eluted directly from Module 3 into the 96 deep-well plate of Module 1. The samples in Module 1 are lyophilized and then derivatized at their sialic acid (S) residues using methylammonium chloride. After derivatization, the glycans are purified again with GlycoPro and collected into another Module 1 before MALDI-MS analysis, following the same protocol as in the enrichment stage (Fig. 1). Thus, the GlycoPro system provides a seamless transition from the enzymatic release of glycans to the enrichment of derivatized glycans. We optimized the process in the 96-well plates and tailored the enrichment step to each type of analyte.
3.2. Comprehensive profiling of N-glycopeptides in human serum using the GlycoPro platform
First, we assessed the efficiency of the GlycoPro platform by enriching glycopeptides from a simple protein mixture. Because glycopeptides typically make up only a small fraction of total serum peptide mixtures (2%-5%) [13], we spiked BSA with IgG at a ratio of 50:1 (w:w) to mimic the approximately 2% glycopeptide content commonly reported in serum. MALDI-MS analysis after enrichment showed a marked improvement in the signal-to-noise ratio for the IgG N-glycopeptides (m/z = 2602.052, 2635.029, 2765.082, 2796.093, 2927.162, and 2958.150; Fig. S2(a) in Appendix A). We optimized the amount of cotton packed into each tip and found that 1 mg of cotton was sufficient for maximal enrichment of IgG N-glycopeptides from a total protein amount of 100 μg. This optimization was confirmed by comparing the peak areas or intensities of a representative IgG N-glycopeptide (EEQYNSTYR-H4N4F1, m/z = 2796.092) and a standard non-glycopeptide from BSA (HPYFYAPELLYYANK, m/z = 1888.927; Fig. S2(b) in Appendix A). We then fine-tuned the binding/washing buffer system and found that 80% ACN with 1% TFA provided the most effective glycopeptide enrichment, in line with previously reported findings (Fig. S2(c) in Appendix A). Using the GlycoPro platform, we then enriched N-glycopeptides from serum samples, identifying 3313 N-glycopeptides (Table S4 as a separate file in Appendix A) and quantifying 1967 glycopeptides (Fig. 2(a)) across technical triplicates from 2 µL of serum (Fig. S4 in Appendix A). The peak intensities of these quantified N-glycopeptides spanned a dynamic range of 10⁴ (Fig. 2(b)). Moreover, the RTs of the identified N-glycopeptides across technical replicates showed correlation coefficients exceeding 0.98, attesting to the reproducibility of the GlycoPro platform (Fig. S6(a) in Appendix A).
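The 50:1 (w:w) spiking ratio described above can be checked directly. Treating the IgG mass fraction of the mixture as the proxy for glycoprotein content that the text intends, one part IgG in 51 parts of total protein gives:

```latex
w_{\mathrm{IgG}} = \frac{m_{\mathrm{IgG}}}{m_{\mathrm{BSA}} + m_{\mathrm{IgG}}} = \frac{1}{50 + 1} \approx 0.0196 \approx 2\%
```

This value sits at the low end of the 2%-5% range cited for serum.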
Further examination of the motif sequences of the enriched N-glycopeptides confirmed their agreement with established conserved motifs (Fig. 2(c)). Evaluation of the serine (S) or threonine (T) residue at the third position of the canonical N-glycosylation motif (Fig. 2(d)) provided insight into the conserved nature of the glycosylation sites. In-depth profiling of the glycans on these glycopeptides indicated a predominance of complex-type glycans. Notably, sialylated glycans constituted over one-third of these structures, and the number of N-glycans bearing both S and fucose (F) was comparable with that of high-mannose N-glycans. In contrast, N-glycans containing only F were the least frequent (Fig. 2(e)). Our analysis also highlighted the variability of N-glycans at specific glycosylation sites and showed the number of glycosylation sites per glycoprotein (Fig. 2(f)). Collectively, these findings confirm the efficiency of the GlycoPro platform in N-glycopeptide enrichment and its suitability for comprehensive glycoproteomic analysis.
3.3. Comprehensive profiling of O-glycopeptides in human serum using the GlycoPro platform
The enrichment of O-glycopeptides is more challenging than that of N-glycopeptides for several reasons: ① O-glycopeptides exhibit greater heterogeneity. O-glycosylation occurs on S or T residues without a consensus sequence, which complicates the identification of specific glycosylation sites, in contrast to the more conserved N-glycosylation at asparagine within N-X-S/T sequons (X ≠ proline) [14]. ② Current enrichment strategies are primarily tailored to N-glycopeptides, which have more consistent glycosylation patterns, whereas the diversity and instability of O-glycopeptides hinder the development of a universal enrichment method. ③ The MS sensitivity for O-glycopeptides is often reduced because of their characteristically shorter and more complex glycan chains, which can adversely affect ionization efficiency and the resulting detection signals [15].
To enable concurrent analysis of both N- and O-glycosylation, various strategies based on hydrophilic interaction liquid chromatography (HILIC) have been developed [16]. These methods often require pretreatment with glycosidases to remove N-glycans before the analysis of O-glycopeptides, which adds preprocessing steps and can increase sample loss. Recent studies have proposed global strategies for the simultaneous enrichment and characterization of N- and O-glycopeptides without enzymatic removal of N-glycans [17]. However, the materials required by these methods can be complex to synthesize, limiting their practicality for clinical applications. We therefore explored the potential of GlycoPro for the simultaneous enrichment of N- and O-glycopeptides. The GlycoPro platform effectively enriched both types of glycopeptides from serum samples (Fig. S3 in Appendix A). We identified 3561 O-glycopeptides in serum samples (Fig. 3(a); Table S5 as a separate file in Appendix A), corresponding to 603 O-glycosylation sites on 130 distinct O-glycoproteins (Fig. S5 in Appendix A). Of these O-glycopeptides, 1994 were quantified, and their peak intensities spanned a dynamic range of 10⁴ (Fig. 3(b)), reflecting the sensitivity of the method. Furthermore, the RTs of the identified O-glycopeptides were highly correlated across technical replicates (correlation coefficient > 0.98; Fig. S6(b) in Appendix A), underscoring the precision of the GlycoPro platform.
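The RT reproducibility check reported above can be sketched as pairwise correlations of retention times between replicate injections, with glycopeptides as rows and replicates as columns. The text does not specify the correlation type, so Pearson correlation is assumed here, and the RT values in the usage example are synthetic.

```python
import numpy as np


def pairwise_rt_correlations(rt: np.ndarray) -> dict[tuple[int, int], float]:
    """rt: shape (n_glycopeptides, n_replicates) with no missing values.

    Returns the Pearson r for every pair of replicate columns.
    """
    corr = np.corrcoef(rt, rowvar=False)  # replicate-by-replicate matrix
    n = corr.shape[0]
    return {(i, j): float(corr[i, j]) for i in range(n) for j in range(i + 1, n)}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_rt = rng.uniform(5, 55, size=200)  # synthetic RTs (min) within the gradient window
    rt = true_rt[:, None] + rng.normal(scale=0.2, size=(200, 3))  # three replicates
    for pair, r in pairwise_rt_correlations(rt).items():
        print(pair, round(r, 4))
```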
Motif sequence analysis of the technical triplicates showed that the sequences of the enriched O-glycopeptides aligned with established conserved motifs (Fig. 3(c)). The S and T residues at the glycosylation sites were also quantified (Fig. 3(d)). Further analysis of the glycan compositions indicated that approximately 600 O-glycopeptides were modified exclusively with F, predominantly with a single fucose. Similarly, about 800 O-glycopeptides were modified exclusively with S, predominantly with a single sialic acid. Notably, the largest category comprised O-glycopeptides modified with both S and F, accounting for about 1100 O-glycopeptides (Fig. 3(e)). Analysis of the glycan distribution at individual glycosylation sites and of the number of glycosylation sites per glycoprotein showed that most sites carried five or more glycan types (Fig. 3(f)), highlighting the substantial heterogeneity of glycan structures in the serum O-glycoproteome.
Additionally, we conducted a preliminary quantitative analysis of 2735 N-glycopeptides and 1760 O-glycopeptides across 20 human serum samples (10 from healthy individuals and 10 from patients with IC with LNM). Volcano plots (Figs. S7 and S8 in Appendix A) and clustered heatmaps (Figs. S9 and S10 in Appendix A) of the top ten differentially expressed N- and O-glycopeptides provide an initial visualization of the differences between the groups. Representative LC-MS spectra of significantly different N- and O-glycopeptides are also presented (Figs. S11 and S12 in Appendix A), further illustrating the distinct glycopeptide profiles associated with each condition. Collectively, these results demonstrate the capability of the GlycoPro platform for efficient O-glycopeptide enrichment and its potential value for comprehensive glycoproteomic research.
3.4. Comprehensive profiling of N/O-glycans using the GlycoPro platform
We next evaluated the performance of GlycoPro for glycome studies, including the N-glycome and the O-glycome. For N-glycome enrichment, we have previously demonstrated the distinct advantages of cotton in sample preparation [11]. However, pipette-tip-based enrichment makes it difficult to control residual volumes during elution, especially when large numbers of samples are processed, which can compromise the reproducibility of enrichment. By moving from tip-based processing to a 96-well-plate format, GlycoPro uses centrifugation to control the residual solution uniformly during enrichment, which improves the throughput, specificity, and reproducibility of N-glycan enrichment. We further evaluated the intra-day and inter-day stability and reproducibility of the method, as well as its detection limit.
To validate the platform’s capacity for large-scale sample processing (up to 384 samples per day), we evaluated it over three consecutive days. Six technical replicates were processed each day to assess intra-day reliability, and the results across the three days were compared to assess inter-day consistency.
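Intra- and inter-day reproducibility of this kind is commonly summarized as coefficients of variation (CV). The sketch below assumes one common convention: the intra-day CV is computed across the six replicates within each day, and the inter-day CV across the three daily means. The text does not state which definition was used, and the intensities in the usage example are synthetic.

```python
import numpy as np


def intra_inter_cv(intensities):
    """intensities: shape (n_days, n_replicates) for one glycan.

    Returns (intra-day CV % for each day, inter-day CV % of the daily means).
    Uses the sample standard deviation (ddof=1).
    """
    x = np.asarray(intensities, dtype=float)
    intra = x.std(axis=1, ddof=1) / x.mean(axis=1) * 100.0
    day_means = x.mean(axis=1)
    inter = day_means.std(ddof=1) / day_means.mean() * 100.0
    return intra, inter


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # synthetic: 3 days x 6 technical replicates for one N-glycan
    x = rng.normal(loc=1.0e4, scale=5.0e2, size=(3, 6))
    intra, inter = intra_inter_cv(x)
    print("intra-day CV (%):", np.round(intra, 1))
    print(f"inter-day CV (%): {inter:.1f}")
```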
Fig. 4(a) demonstrates the platform’s high reproducibility in N-glycan enrichment: 72 of 105 N-glycans were consistently detected across the three-day period, with an overlap rate exceeding 70.8% (Fig. S13 in Appendix A). This level of reproducibility is particularly important in clinical contexts, where consistent glycan profiling is essential for reliable biomarker discovery and monitoring. The GlycoPro platform also exhibited remarkable sensitivity, enriching six N-glycans from diluted serum equivalent to as little as 62.5 nL (Fig. S14 in Appendix A).
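For orientation, the 62.5 nL figure is exactly 1/32 of the 2 μL serum input used in the clinical workflow. The dilution scheme is not described in this section; the arithmetic below assumes, purely for illustration, a twofold serial dilution starting from 2 μL, in which case 62.5 nL corresponds to the fifth dilution step.

```python
def twofold_series(start_nl, steps):
    """Serum-equivalent volume (nL) at each step of a twofold serial dilution."""
    return [start_nl / 2 ** k for k in range(steps + 1)]


series = twofold_series(2000.0, 5)  # 2 μL = 2000 nL starting input (assumed)
print(series)
# prints [2000.0, 1000.0, 500.0, 250.0, 125.0, 62.5]
print(2000.0 / 62.5)  # overall dilution factor relative to 2 μL
# prints 32.0
```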
For a detailed assessment of the enriched N-glycans, Fig. 4(b) displays a radial line chart of the relative intensities of the identified glycans. Each radial axis corresponds to an individual glycan, while the concentric circles denote intensity levels from technical replicates. The consistent detection across replicates underscores the platform’s potential for quantitative glycomic studies. The MALDI-MS spectra in Fig. 4(c) further illustrate the diversity of the profiled N-glycans.
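The section does not state how the relative intensities in Fig. 4(b) were normalized. A common convention in MALDI-MS glycomics is total-area normalization, in which each glycan's peak area is expressed as a percentage of the summed area of all glycans in that spectrum; the sketch below assumes that convention and uses hypothetical glycan labels and peak areas.

```python
def relative_intensities(peak_areas):
    """Express each glycan's peak area as a percentage of the spectrum total."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in peak_areas.items()}


spectrum = {"H5N4S2": 300.0, "H5N4F1S2": 60.0, "H5N2": 40.0}
print(relative_intensities(spectrum))
# prints {'H5N4S2': 75.0, 'H5N4F1S2': 15.0, 'H5N2': 10.0}
```

Normalizing within each spectrum removes run-to-run differences in overall signal, which is what makes replicate traces comparable on a shared radial scale.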
To adapt GlycoPro for the analysis of O-glycans, we extended cotton-based enrichment to O-glycans released by oxidative release. Whereas existing studies predominantly use porous graphitic carbon (PGC) to enrich O-glycans released with NaOCl, we optimized the binding buffer composition (70% ACN, 15% MeOH, and 1% TFA) to make the enrichment compatible with the GlycoPro system.
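As a practical aid, the component volumes of the binding buffer can be computed directly from its composition. The sketch assumes that the percentages are v/v and that water makes up the remaining 14%; neither point is stated in this section, so both are assumptions.

```python
def binding_buffer_volumes(total_ml, acn=70.0, meoh=15.0, tfa=1.0):
    """Component volumes (mL), assuming v/v percentages with water as balance."""
    water = 100.0 - (acn + meoh + tfa)  # assumed: remainder is water
    if water < 0:
        raise ValueError("component percentages exceed 100%")
    parts = {"ACN": acn, "MeOH": meoh, "TFA": tfa, "water": water}
    return {name: total_ml * pct / 100.0 for name, pct in parts.items()}


for name, ml in binding_buffer_volumes(10.0).items():
    print(f"{name}: {ml:.2f} mL")
# prints ACN: 7.00 mL, MeOH: 1.50 mL, TFA: 0.10 mL, water: 1.40 mL (one per line)
```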
The consistent peak intensities of the identified O-glycans confirm the robustness of the optimized method. Fig. 5(a) presents box plots of the log-transformed intensities of 76 O-glycans in PSM that were consistently detected over three days, with four technical replicates per day. Such consistent detection reinforces the method’s reliability for O-glycan profiling, and the narrow intensity range across replicates reflects its precision and suitability for quantitative assessment. The Venn diagram in Fig. 5(b) shows substantial overlap in O-glycan identifications across the technical replicates, underscoring the reproducibility of the enrichment process, which is essential for glycomic analysis. In addition, the MALDI-MS spectra in Fig. 5(c) offer a comprehensive view of the O-glycans enriched by this method, and MS/MS further validated the structures of seven O-glycans with higher peak intensities (Figs. S15-S21 in Appendix A).
Beyond the 76 O-glycans identified in PSM, we quantified 71 O-glycans across 20 human serum samples from 10 healthy individuals and 10 IC with LNM patients. Notably, the top eight differentially expressed O-glycans (Fig. S22 in Appendix A) separated the two groups in a clustering heatmap (Fig. S23 in Appendix A). Representative MALDI-MS spectra of significantly different O-glycans across the groups further illustrate the distinctive glycan profiles associated with each condition (Fig. S24 in Appendix A).
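The criterion behind the "top eight differentially expressed" O-glycans is not given here. As one illustrative possibility, features can be ranked by the absolute log2 fold change of group means; the function name, data layout, feature labels, and values below are hypothetical.

```python
import math
from statistics import mean


def top_k_by_log2_fold_change(control, case, k):
    """Rank features by |log2(mean_case / mean_control)| and return the top k.

    control and case map a feature name to positive per-sample intensities
    (hypothetical layout); returns (feature, log2 fold change) pairs.
    """
    scores = {f: math.log2(mean(case[f]) / mean(control[f])) for f in control}
    ranked = sorted(scores, key=lambda f: abs(scores[f]), reverse=True)
    return [(f, scores[f]) for f in ranked[:k]]


healthy = {"O1": [1.0, 1.0], "O2": [4.0, 4.0], "O3": [2.0, 2.0]}
patients = {"O1": [8.0, 8.0], "O2": [1.0, 1.0], "O3": [2.0, 2.0]}
print(top_k_by_log2_fold_change(healthy, patients, 2))
# prints [('O1', 3.0), ('O2', -2.0)]
```

Ranking on the absolute value keeps both up- and down-regulated glycans; in practice such a ranking is usually combined with a significance filter.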
These results confirm that GlycoPro is a robust and reliable system for the enrichment of both N- and O-glycans. Its suitability for high-throughput applications and its potential utility in biomarker discovery make it a valuable tool for advancing GlycoMarker research.
3.5. High-throughput serum N-glycome profiling for breast cancer biomarker discovery
Finally, we applied the platform to clinical research. The workflow of our N-glycome biomarker discovery study, from sample preparation to MS analysis, is depicted in Fig. 6. The cancer group comprised 88 breast cancer patients with IC or IC with LNM; together with 88 HC, the clinical cohort included a total of 176 individuals. Detailed clinical information for the 88 cancer cases is presented in Table S2 in Appendix A, and information for all 176 clinical samples is provided in Table S3 in Appendix A. We divided the 176 samples into a training set (140 samples) and a test set (36 samples). N-glycans were prepared from 2 μL of serum per sample using GlycoPro and analyzed by MALDI-MS. Differential analysis and machine learning algorithms were then applied to the N-glycan abundances. Representative MALDI-MS spectra of significantly different N-glycans across disease groups highlight the distinctive glycan profiles of each condition (Fig. S25 in Appendix A).
Before analyzing the clinical samples, we evaluated quality control (QC) samples prepared by pooling all specimens to ensure consistency. The boxplot shows consistent N-glycan peak areas across 16 QC samples (Fig. S26(a) in Appendix A), and the radial line chart displays the reproducibility of each N-glycan across the entire sample cohort (Fig. S26(b) in Appendix A). With this QC baseline established, we examined the platform’s performance with clinical samples, which showed high precision and minimal variation between batches, underscoring the platform’s suitability for high-throughput applications (Fig. S27 in Appendix A).
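The QC consistency in Fig. S26 is presented graphically; a common numerical summary of the same idea is the per-glycan coefficient of variation (CV) of peak areas across QC injections. The sketch assumes that data layout, and the 20% acceptance threshold is a hypothetical choice, not a value from this study.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def qc_cv_table(qc_peak_areas, max_cv=20.0):
    """Per-glycan CV across QC injections and whether it passes max_cv.

    qc_peak_areas maps a glycan label to its peak areas across QC runs;
    the 20% default cut-off is hypothetical.
    """
    table = {}
    for glycan, areas in qc_peak_areas.items():
        cv = percent_cv(areas)
        table[glycan] = (cv, cv <= max_cv)
    return table


qc = {"H5N4S2": [90.0, 100.0, 110.0], "H9N2": [50.0, 80.0, 110.0]}
for glycan, (cv, ok) in qc_cv_table(qc).items():
    print(f"{glycan}: CV = {cv:.1f}%, pass = {ok}")
```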
In total, we identified 193 serum N-glycans from just 2 μL of serum. After data preprocessing, including missing-value filtering, we retained 79 serum N-glycans for subsequent differential analysis (a detailed scheme can be found in Section 2). Preliminary observations indicated that these 79 serum N-glycans could effectively distinguish between healthy/benign control and breast cancer samples (Fig. 7(a)). These distinct patterns were also evident in the PLS-DA results (Fig. 7(b)), underlining the diagnostic potential of these glycan structures. Using a t-test with a stringent significance threshold (adjusted p < 0.05), we identified 62 N-glycans with significantly different abundance between the control group and the IC and IC with LNM groups (Fig. 7(c)). We classified these N-glycans into five structural categories: those containing F, those containing S, those containing both F and S (FS), high-mannose structures, and other glycan structures. The relative expression levels of these subtypes in healthy/benign and cancer samples are quantified in Fig. S28 in Appendix A, providing a glycomic landscape for further analysis.
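The preprocessing and testing steps above can be sketched as follows. The exact missing-value rule and p-value adjustment method are described in Section 2 rather than here, so this sketch assumes (i) a detection-rate filter with a hypothetical 80% threshold and (ii) Benjamini-Hochberg false discovery rate adjustment, a common default for "p-adjust". The per-glycan t-test itself would typically come from a statistics library such as SciPy (`scipy.stats.ttest_ind`); only the filtering and adjustment are shown.

```python
def filter_by_detection_rate(matrix, min_fraction=0.8):
    """Keep features detected in at least `min_fraction` of samples.

    matrix maps a feature name to its per-sample values, with None marking
    a missing value. The 0.8 default is hypothetical, not the study's setting.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(v is not None for v in values) / len(values) >= min_fraction
    }


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(significant)
# prints [0]
```

The example shows why adjustment matters: two raw p-values below 0.05 (0.04 and 0.03) no longer pass after correction for four tests.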
Quantitative analysis of the N-glycan structural categories indicated an increased representation of high-mannose, F, S, and FS glycans in cancer samples. This upregulation is consistent with patterns documented in the glycomic literature [18]. In contrast, N-glycans lacking these modifications showed a relative decrease in abundance. A detailed subgroup analysis revealed a reduction in mono-sialylated N-glycans accompanied by a rise in their poly-sialylated counterparts, indicating a trend toward increased sialylation in malignancy (Fig. 7(d)) [19]. The trend also included a slight elevation in mannose content within the high-mannose N-glycans. Moreover, a significant increase in F content was observed, which, to our knowledge, has not been previously reported. S metabolism has been identified as a crucial factor in breast cancer progression and metastasis: nutrient-deprived cancer cells preferentially use S to maintain cell-surface glycosylation, which contributes to the pathogenicity of breast cancer cells [20]. Furthermore, protein sialylation has been found to regulate a gene expression signature that promotes breast cancer cell pathogenicity, emphasizing the significance of S metabolism in breast cancer. Consequently, there is growing interest in leveraging S metabolism for cancer diagnosis and treatment [21].
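The five structural categories and the mono- versus poly-sialylated split can be assigned from monosaccharide compositions. The rules below are our assumption of a typical composition-based scheme (for example, high mannose taken as two HexNAc and at least five Hex, without fucose or sialic acid), not the paper's exact definitions; compositions are written as hypothetical count dictionaries, and category totals are sums of relative abundances within each class.

```python
from collections import defaultdict


def classify(comp):
    """Assign a composition to 'FS', 'F', 'S', 'high mannose', or 'other'.

    comp maps monosaccharide names to counts, e.g.
    {"Hex": 5, "HexNAc": 4, "Fuc": 1, "NeuAc": 2} (hypothetical layout).
    """
    fuc = comp.get("Fuc", 0)
    sia = comp.get("NeuAc", 0)
    if fuc and sia:
        return "FS"
    if fuc:
        return "F"
    if sia:
        return "S"
    # Assumed rule: exactly two HexNAc and at least five Hex (Man5 and larger).
    if comp.get("HexNAc", 0) == 2 and comp.get("Hex", 0) >= 5:
        return "high mannose"
    return "other"


def sialylation_degree(comp):
    """'none', 'mono', or 'poly' according to the number of sialic acids."""
    sia = comp.get("NeuAc", 0)
    return "none" if sia == 0 else "mono" if sia == 1 else "poly"


def category_totals(abundances, compositions):
    """Sum relative abundances of glycans within each structural category."""
    totals = defaultdict(float)
    for glycan, value in abundances.items():
        totals[classify(compositions[glycan])] += value
    return dict(totals)


compositions = {
    "H5N2": {"Hex": 5, "HexNAc": 2},
    "H5N4S1": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    "H5N4S2": {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    "H5N4F1": {"Hex": 5, "HexNAc": 4, "Fuc": 1},
}
abundances = {"H5N2": 10.0, "H5N4S1": 2.5, "H5N4S2": 5.0, "H5N4F1": 1.5}
print(category_totals(abundances, compositions))
```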
For further analysis, we grouped the glycans into two broad categories: one comprising F, FS, high-mannose, and other N-glycans, and the other comprising S structures. A comparative analysis demonstrated an inverse relationship between the abundance of sialylated N-glycans and that of the combined group of the other four categories across the two disease states (Fig. 7(e)). This finding corroborates observations from existing glycomic research and suggests that these differential glycan profiles could serve as viable biomarkers for the diagnosis and progression of breast cancer [22].
3.6. Integrative analysis of N-glycan biomarkers for breast cancer diagnosis
We used the Mantel test for multivariate correlation analysis to assess the relationship between clinical parameters (age, TNM classification, and breast cancer subtype) and the abundance of specific N-glycans. In the Mantel test diagram, line width denotes the strength of association, pinpointing the N-glycans significantly correlated with clinical information (Fig. 8(a)). We then visualized the relative abundance of the 62 differentially expressed N-glycans in breast cancer in a rose diagram (Fig. 8(b)). Although all N-glycans were included in the diagram for a comprehensive view, our subsequent analyses focused on those in the top 50% of relative abundance. We posit that these more abundant N-glycans are prime candidates for biomarker development because they are more likely to be detected consistently across clinical serum samples.
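A Mantel test compares two distance matrices (for example, distances between patients computed from clinical parameters and from N-glycan abundances) and assesses their correlation by permutation. The minimal sketch below uses Pearson correlation on the upper triangles and a one-sided permutation p-value; the distance metrics and any partial-Mantel variant used in the study are not specified here, and the toy matrices are hypothetical.

```python
import math
import random


def _upper_triangle(d, order=None):
    """Flatten the upper triangle of a square matrix, optionally reindexed."""
    n = len(d)
    idx = order if order is not None else list(range(n))
    return [d[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): Pearson r between matrices and a permutation p-value.

    Rows and columns of d2 are permuted jointly, which preserves its
    internal structure while breaking any link to d1.
    """
    x = _upper_triangle(d1)
    r_obs = _pearson(x, _upper_triangle(d2))
    rng = random.Random(seed)
    order = list(range(len(d1)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if _pearson(x, _upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


xs = [0, 1, 3, 7, 15, 31]
d_clinical = [[abs(a - b) for b in xs] for a in xs]
d_glycan = [[2.0 * v for v in row] for row in d_clinical]
r, p = mantel(d_clinical, d_glycan, permutations=999, seed=1)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns, rather than individual entries, is what distinguishes the Mantel test from a naive correlation of matrix entries, whose pairwise values are not independent.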
To further refine the biomarker discovery process, we used logistic regression, a machine learning model valued for its interpretability, simplicity, and well-established track record in clinical research. Logistic regression makes the relationship between the selected features and the predicted outcome transparent, which is particularly important in medical diagnostics, where the influence of each variable must be understood. It also performs robustly on limited datasets without requiring a large number of features. After a series of screening steps (detailed in Section 2), we selected five N-glycans as our biomarker panel. Box plots illustrate the differential expression of these glycans between healthy/benign control and breast cancer samples, and inset SHapley Additive exPlanations (SHAP) value plots show the contribution of each glycan to the model output, highlighting their potential as biomarkers for breast cancer diagnosis (Fig. 8(c)). We then trained a logistic regression model to evaluate the predictive power of this N-glycan panel. The confusion matrix shows a sensitivity of 88.24% (correct identification of true positives) and a specificity of 78.95% (correct identification of true negatives), supporting the model’s classification performance (Fig. 8(d)). We further assessed diagnostic performance with a receiver operating characteristic (ROC) curve, which yielded an area under the curve (AUC) of 0.89, indicating good discrimination between healthy/benign control and breast cancer samples (Fig. 8(e)).
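The reported metrics follow directly from a confusion matrix and the model scores. Assuming, as the text does not state explicitly, that they were computed on the 36-sample test set, the values are consistent with 15 of 17 cancer samples and 15 of 19 control samples classified correctly; these counts are our inference. The AUC function uses the Mann-Whitney formulation, which equals the area under the empirical ROC curve; the score lists in the test are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample scores higher, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Counts inferred from the reported percentages (17 cancer, 19 control).
sens, spec = sensitivity_specificity(tp=15, fn=2, tn=15, fp=4)
print(f"sensitivity {sens:.2%}, specificity {spec:.2%}")
# prints sensitivity 88.24%, specificity 78.95%
```

In practice, `sklearn.metrics.roc_auc_score` computes the same quantity from labels and predicted probabilities.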
4. Discussion
Incorporating multi-omics approaches, especially glycomics and glycoproteomics, into cancer diagnostics holds promise for transforming the early detection and treatment of cancer. Cotton-based enrichment technologies, such as the methods proposed by Selman et al. [23] and Xin et al. [24] for the enrichment of glycans and glycopeptides, have laid a solid foundation for this field, and Bladergroen et al. [25] developed an automated MALDI-MS-based sample-processing workflow that provides an important reference for high-throughput analysis platforms. Limitations nevertheless remain: these works focused primarily on N-glycan enrichment and analysis, involved time-consuming processes, and offered limited integration of O-glycans and glycopeptides [26].
Our study establishes GlycoPro as a platform tailored for the high-throughput analysis of both glycoproteins and glycans. By introducing a 96-well-plate design and optimizing the sample workflow, we improved sample-processing efficiency and broadened applicability. The platform achieves simultaneous enrichment and analysis of N/O-glycans and N/O-glycopeptides, a key issue not yet solved by existing high-throughput workflows. We shortened the digestion time to 1 h at 50 °C, which enables complete processing (including digestion or dissociation, desalting, and lyophilization) of 384 samples within 3.5 h (for N/O-glycans) or 4.5 h (for N/O-glycopeptides). We also reduced the sample requirement to only 2 μL of serum per sample, which eases sample acquisition in clinical research. Moreover, our work addresses the challenges highlighted by Wheeler et al. [27] regarding the effective processing of released N-glycans on the MALDI-MS platform and the stabilization and derivatization of S. In GlycoPro, the derivatization step is integrated into the existing workflow, adding only 2.5 h (1 h for derivatization and 1.5 h for desalting), so that all analytical steps for N-glycans are completed within 6 h (3.5 h + 2.5 h). This integrated design allows the platform to simultaneously analyze N-glycans, O-glycans, and glycopeptides, providing greater flexibility for glycomics research.
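The timing figures above translate into per-sample throughput as follows. The per-sample values are simple averages of batch wall-clock time over 384 samples processed in parallel (equivalent to four 96-well plates) and are our own arithmetic, not numbers reported in the study.

```python
SAMPLES_PER_BATCH = 384  # equivalent to four 96-well plates


def minutes_per_sample(batch_hours, samples=SAMPLES_PER_BATCH):
    """Average wall-clock minutes per sample when a batch runs in parallel."""
    return batch_hours * 60.0 / samples


glycan_prep_h = 3.5             # full N/O-glycan workflow per batch (from the text)
glycopeptide_prep_h = 4.5       # full N/O-glycopeptide workflow per batch (from the text)
derivatization_h = 1.0 + 1.5    # derivatization + desalting (from the text)
n_glycan_total_h = glycan_prep_h + derivatization_h  # 6.0 h

for label, hours in [("N/O-glycans", glycan_prep_h),
                     ("N/O-glycopeptides", glycopeptide_prep_h),
                     ("derivatized N-glycans", n_glycan_total_h)]:
    print(f"{label}: {hours} h per batch, "
          f"{minutes_per_sample(hours) * 60:.1f} s per sample on average")
```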
GlycoPro achieved considerable depth in N/O-glycopeptide identification, with more than 3000 intact N/O-glycopeptides identified from individual serum samples, and showed high reproducibility, with correlations above 0.99 among technical triplicates. The consistent identification of serum N-glycans in technical replicates across multiple days demonstrates the platform’s high reproducibility, affirming its reliability for clinical research applications. Performance was maintained even with minute sample amounts (62.5 nL), indicating a low detection limit and suggesting potential for early-stage cancer detection through the monitoring of glycan alterations. The platform’s adaptability to complex samples underscores its potential for accelerating biomarker discovery and the comprehensive analysis of glycosylation patterns in various diseases, as demonstrated by the N-glycomic analysis of breast cancer in this study.
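One common way to obtain a replicate correlation such as the "above 0.99" figure is a pairwise Pearson coefficient over the features shared by the runs. Whether the study used raw or log-transformed intensities is not stated; the sketch assumes log2-transformed intensities of the same features in the same order, and the run names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlations(replicates, log2=True):
    """Pairwise Pearson r between replicate runs.

    replicates maps a run name to positive intensities of the same
    features in the same order (hypothetical layout).
    """
    data = {name: [math.log2(v) for v in vals] if log2 else list(vals)
            for name, vals in replicates.items()}
    return {(a, b): pearson(data[a], data[b])
            for a, b in combinations(sorted(data), 2)}


runs = {
    "rep1": [1200.0, 5400.0, 310.0, 88000.0],
    "rep2": [1150.0, 5600.0, 290.0, 91000.0],
    "rep3": [1260.0, 5200.0, 330.0, 86000.0],
}
for pair, r in replicate_correlations(runs).items():
    print(pair, round(r, 4))
```

The log transform keeps a few very intense features from dominating the coefficient, which would otherwise inflate agreement between runs.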
Subsequently, we used GlycoPro to process serum samples from breast cancer patients and identified a panel of five N-glycan biomarkers with high sensitivity, specificity, and significant clinical relevance. Changes in glycosylation patterns, such as increased fucosylation and sialylation, are well known to promote cancer progression by modulating cell-cell communication, immune evasion, and metastasis. For instance, aberrant sialylation has been linked to the survival of breast cancer stem cells (BCSCs) and the occurrence of epithelial-mesenchymal transition (EMT) in triple-negative breast cancer (TNBC) [28], [29]. Additionally, fucosylated glycan structures are associated with enhanced tumor invasiveness and metastatic potential in breast cancer [30], [31], [32]. The high-mannose structures observed in our analysis may reflect metabolic alterations in cancer cells that support rapid growth, as previously documented in Ref. [33]. These insights into glycan alterations in breast cancer provide a mechanistic basis for the clinical relevance of the identified N-glycan biomarkers and underscore their potential utility as diagnostic and therapeutic targets [32], [34], [35], [36]. Moreover, the comprehensive analysis of N-glycan biomarkers in this study, which combines bioinformatics, clinical metrics, and machine learning, illustrates the potential of interdisciplinary approaches in precision oncology. The logistic regression model demonstrates the effectiveness of machine learning in interpreting complex biological data, and its AUC of 0.89 supports the feasibility of using readily identifiable N-glycan spectra as reliable biomarkers for breast cancer diagnosis.
In summary, the GlycoPro platform’s capacity to handle a large volume of clinical samples creates opportunities to extend its application beyond breast cancer to other glycosylation-related diseases. Specifically, diagnosing breast cancer by detecting serum N-glycans offers a non-invasive and more objective approach than traditional methods [37], [39]. However, our study has limitations that need to be addressed in future research. While the platform has proven effective for N-glycan enrichment and the identification of potential breast cancer biomarkers, applying these findings to clinical diagnostics requires external validation in larger, independent cohorts drawn from diverse populations to confirm their clinical applicability. Additionally, an in-depth exploration of the glycoproteome in large clinical sample cohorts using GlycoPro remains to be carried out. Future research will focus on expanding the platform’s capabilities, refining the biomarker panel, and validating these tools in clinical settings, which are critical steps toward fully leveraging the potential of glycomics in cancer diagnostics and personalized medicine [40], [41].
CRediT authorship contribution statement
Xuejiao Liu: Writing - original draft, Visualization, Methodology, Investigation, Formal analysis, Conceptualization. Yue Meng: Validation, Supervision, Resources, Investigation. Bin Fu: Writing - review & editing, Visualization, Software, Investigation, Formal analysis. Haoru Song: Writing - review & editing, Writing - original draft, Validation, Supervision, Investigation. Bing Gu: Supervision, Resources, Project administration, Funding acquisition. Ying Zhang: Writing - review & editing, Writing - original draft, Supervision, Project administration, Methodology, Funding acquisition, Data curation, Conceptualization. Haojie Lu: Supervision, Project administration, Funding acquisition, Data curation, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Key Research and Development Program of China (2024YFA1306301), the National Natural Science Foundation of China (NSFC; 22174021 and 22434001), Shanghai Municipal Science and Technology Major Project (2023SHZDZX02), and the Greater Bay Area Institute of Precision Medicine (Guangzhou; IPM2021C005). We acknowledge the Core Facility of Shanghai Medical College, Fudan University.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2025.01.011.