1. Introduction
Approximately 60% of the proteins reported by Protein Atlas as being secreted to the blood [1], [2], [3] are described as potentially glycosylated in the UniProtKB database [4], [5]. Growing evidence of the contribution of glycosylation to precision medicine is enabling the development of advanced glycan detection tools [6]. Furthermore, emerging technologies are evolving toward the generation of non-invasive early diagnostic platforms, which benefit from the elucidation of site-specific N-glycan features from the low-abundant glycoproteome. However, advanced workflows are first required to extend site-specific N-glycan identification to the lower abundance ranges. A number of clinical studies have revealed alterations in the N-glycosylation of human blood plasma (HBP) proteins due to physiological and pathophysiological changes [7], [8], [9], [10], [11]. Glycoproteomic analysis via liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the core technology for detecting such alterations of N-glycosylation in blood plasma proteins in a site-specific manner [12]. In recent years, explorative large-scale glycopeptide-centered N-glycoproteomic studies on human plasma proteins (e.g., immunoglobulin G (IgG)) have enabled the identification of glycoforms that could serve as potential biomarkers or therapeutic targets [7], [13], [14], [15], [16]. Promising blood plasma biomarker candidates include tissue-leakage proteins and signaling molecules, due to their organ specificity [13]. Nevertheless, limitations related to the analysis of blood plasma and to mass spectrometry (MS)-based N-glycopeptide identification itself still impede the full diagnostic potential of N-glycoproteomic analyses. Firstly, the average concentration of tissue-leakage proteins in blood plasma is 1 × 10^5 to 1 × 10^7 times lower than the concentration of albumin (35-50 mg·mL−1), the major blood plasma protein [17]. Due to this challenging concentration range, established workflows for blood plasma (glyco)proteomics frequently include both a depletion step for high-abundant blood plasma proteins (HAPs) and a fractionation step on either the peptide or protein level [18], [19]. A second constraint is that current MS-based workflows for intact N-glycopeptide identification provide only a limited description of N-glycan structural information and an incomplete interpretation of the input spectra. Glucuronidated N-glycans, which were detected in blood plasma through glycomic experiments but have not yet been reported through N-glycoproteomic analysis, are an example of the effect of these limitations [20], [21]. This results in differences between the glycoforms identified by glycomics and those identified by glycoproteomics and illustrates the necessity to improve MS-based workflows for accurately describing the N-glycoproteome [22].
To conduct an optimal N-glycoproteomic search on a given LC-MS/MS dataset, the input search parameters (N-glycan list, protein list, modifications, etc.) must describe the characteristics of the different N-glycopeptides present in the sample. Furthermore, search settings that exclude rare N-glycans (e.g., those with terminal glucuronic acid) or ignore post-glycosylational N-glycan modifications such as sulfation, phosphorylation, and acetylation can lead to an inaccurate and inadequate description of the N-glycoproteome. To date, N-glycoproteomic data analyses tend to yield variable N-glycan and glycoprotein identifications when different N-glycopeptide search strategies and bioinformatics tools are applied and compared [23], [24]. A study by Kawahara et al. [23] proposed that the main bottlenecks for obtaining precise results are the scoring algorithms and validation filters for glycopeptide-spectrum matches (gPSMs). Even though advanced false discovery rate (FDR) validation methods might improve the quality of gPSMs, only a few software programs have demonstrated more efficient FDR validation methods after a controlled search [25], [26]. Thus, during an in-depth N-glycoproteomic data analysis, manual validation is still necessary for finding and reprocessing missed matches, as Lee et al. [27] also concluded.
Defining the N-glycan structure is another desirable but underdeveloped aspect of N-glycoproteomic analysis. Applying higher-energy collisional dissociation (HCD)-based methods favors the fragmentation of N-glycopeptides, while electron-transfer/higher-energy collision dissociation (EThcD)-based methods are more useful for O-glycopeptides [28]. In addition, diverse combinations of collisional energies applied during fragmentation spectra acquisitions have specific advantages for the description of glycopeptides [29], [30]. Acquiring HCD spectra with a set of increasing normalized collisional energies (NCEs), such as 20, 35, and 50, is a prominent setup [31]. This HCD fragmentation with stepped energy (HCD.step) is more efficient, since it simultaneously generates information corresponding to both the peptide and the N-glycan composition [31], [32]. In comparison, HCD spectra acquired at low NCE (HCD.low; e.g., fixed NCE 20) are rich in N-glycan structural information [31], [33]. The presence of structural B and Y ions supports the differentiation of isomeric N-glycan branching structures with high confidence, as recently demonstrated by Hoffmann et al. [31] and Shen et al. [34]. Therefore, by analyzing both spectra, the differentiation of isomeric N-glycan structures containing antenna or core fucosylation, bisecting HexNAc, and diLacNAc becomes possible [31]. Furthermore, it is possible to collect more evidence regarding phosphorylation and sulfation modifications, which is critical for understanding glycoprotein-receptor interactions [35], [36], [37], [38], [39], [40], [41]. Nonetheless, the detection of new N-glycan compositions and structures is an explorative task that can only be accomplished by the manual annotation of N-glycan fragment ions, especially oxonium ion signals corresponding to rare or unknown N-glycan building blocks (e.g., glucuronic acid).
A preceding study from our group demonstrated the effectiveness of in-depth glycoproteomic analyses in maximizing the site-specific structural elucidation of N-glycans [31]. Furthermore, our previously established glycomic methods have been proven to be reliable in the identification and characterization of rare N-glycans [42]. For example, the identification of a sulfated N-glycan [42], further described using a highly selective sulfatase [43], prompts a search for rare N-glycans in the field of N-glycoproteomics.
Thus, building on our previous work, we integrate a sample preparation workflow in this work and further develop a data analysis workflow that allows the in-depth N-glycoproteomic analysis of HBP. The preparative workflow not only enables the detection of glycoproteins within a concentration range of ten orders of magnitude but also better characterizes the micro-heterogeneity of their N-glycans. The intact N-glycopeptide LC-MS/MS-based analysis applied here includes the acquisition of spectra using HCD.step and HCD.low fragmentation energies, which supports the differentiation of certain isomeric N-glycan branching structures. To conduct an in-depth exploration, manual validation was performed, based on a newly developed decision tree. After the validation, we achieved the detection of 1929 N-glycopeptides, some of which bore N-glycan modifications such as sulfation and phosphorylation. Correction of the invalid or partially valid N-glycopeptide identifications by means of de novo N-glycan sequencing disclosed rare N-glycan building blocks (e.g., glucuronic acid). The insights and improvements gained herein will support further glycoproteomic software development and are crucial for future N-glycoproteomic studies aiming not only for the exploration of biomarker candidates in HBP but also for the evaluation of biotherapeutic proteins or the exploration of biological models.
2. Materials and methods
All chemicals used were LC-MS grade or of the highest purity available. Milli-Q water was used to prepare all aqueous solutions (18 MΩ × cm, < 5 parts per billion (ppb), Millipore Milli-Q Reference A+ system, Merck, Germany). LC-MS-grade acetonitrile (ACN; #A955-212) was purchased from Fisher Scientific (Germany), while 2,2,2-trifluoroethanol (TFE; #808259), potassium chloride (KCl; #104935), potassium phosphate dibasic (K2HPO4; #104873), and sodium phosphate dibasic (Na2HPO4; #106585) were purchased from Merck. Sodium chloride (NaCl; #P029.3) was purchased from Carl Roth (Germany) and trifluoroacetic acid (TFA; #28904) was purchased from Fisher Scientific. Ammonium bicarbonate (ABC; #09830) was purchased from Merck. Formic acid (FA; #56302), DL-dithiothreitol (DTT; #D5545), iodoacetamide (IAA; #I1149), calcium chloride (CaCl2; #A4689), and sodium dodecyl sulfate (SDS; #75746) were purchased from Merck. Sequencing-grade modified trypsin was purchased from Promega (#V5111, USA).
2.1. Sample preparation
The HBP sample was acquired from Affinity Biologicals (VisuCon-F Frozen normal control plasma, FRNCP0105, Canada). The steps applied in the sample preparation workflow are depicted in Fig. 1. The top 14 HAPs were depleted using single-use High Select Top 14 Abundant Protein Depletion Midi Spin Columns (#A36371, Thermo Fisher Scientific, USA). These HAPs included albumin, IgG, immunoglobulin A (IgA), immunoglobulin M (IgM), immunoglobulin D (IgD), immunoglobulin E (IgE), kappa and lambda light chains, α-1-antitrypsin, α-1-acid glycoprotein, α-2-macroglobulin, apolipoprotein A1, fibrinogen, haptoglobin, and transferrin. Four immunoaffinity columns were equilibrated at room temperature by end-over-end agitation at 13 revolutions per minute (rpm) (MultiBio RS-24, BIOSAN, Latvia) 1 h before sample application. Each column was loaded with 70 μL of untreated blood plasma. The columns were incubated for 15 min with end-over-end agitation at 13 rpm. The bottom of each column was opened to collect the flow-through by centrifugation at 1000 × g for 1 min (Heraeus Multifuge X 1R centrifuge, Thermo Scientific, Germany). This instrument was used for all centrifugation steps. The slurry was resuspended by adding 1 mL of phosphate-buffered saline (PBS; 8 g·L−1 NaCl(aq), 0.2 g·L−1 KCl(aq), 0.2 g·L−1 K2HPO4(aq), 1.15 g·L−1 Na2HPO4(aq), with the pH adjusted to 7.4). The flow-through was recovered in the same tube by centrifuging at 1000 × g for 1 min. The flow-through from the four columns was then pooled and is referred to herein as the “top 14-HAP depleted sample.” To avoid protein precipitation during the subsequent desalting process, SDS detergent was added to the top 14-HAP depleted sample to a final concentration of 0.01% (v/v). This protein sample was concentrated tenfold using a 3-kDa Amicon Ultra 15-mL device (#UFC900396, Merck) at 3000 × g for 80 min. The initial concentration of NaCl in PBS (130 mmol·L−1 (mM)) was reduced to below 0.7 mM by two centrifugation steps at 3000 × g for 60 min, each with the addition of 10 mL of Milli-Q water, and a final centrifugation step at 3000 × g for 30 min with the addition of 3 mL of Milli-Q water. The sample volume was reduced to approximately 400 μL using a rotational vacuum concentrator (RVC; RVC 2-33 CO plus, Germany) at 1 °C and 0.1 mbar (1 bar = 10^5 Pa) for approximately 2 h. This instrument was used for all drying steps. The sample was quantified using a Pierce BCA Protein Assay Kit (#23225, Thermo Fisher Scientific) and stored at −80 °C.
Sample fractionation by protein size was conducted through an eight-channel GELFrEE 8100 Fractionation System (8% Cartridge Kit, #42103, Abcam, UK). The cartridge was prepared according to the manufacturer’s instructions. For sample preparation, we used 800 μg of the desalted and concentrated sample, adjusting the volume to 448 μL with Milli-Q water and adding 120 μL of GELFrEE acetate sample buffer and 32 μL of 1 mol·L−1 (M) DTT(aq). The sample mixture was then heated at 50 °C for 10 min. A protein sample of 200 µg was loaded in each cartridge channel. The electrophoresis method was programmed to generate 12 fractions. The electrophoresis loading-step started at 50 V and continued for 16 min. Then, 12 fractions were collected in the following 152.5 min. The amount of protein per fraction was determined by a Pierce BCA Protein Assay (#23225, Thermo Fisher Scientific).
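The loading mixture described above can be checked with a few lines of arithmetic. The sketch below tallies the volumes and the final DTT concentration from the stated values; the number of channels used and the volume per channel are not given in the text and are inferred here from 800 μg total and 200 μg per channel, so they should be read as assumptions.

```python
# Consistency check for the GELFrEE loading mixture (input values from the text).
sample_ug = 800.0           # desalted and concentrated protein
sample_water_ul = 448.0     # sample volume after adjusting with Milli-Q water
buffer_ul = 120.0           # GELFrEE acetate sample buffer
dtt_ul = 32.0               # volume of 1 M DTT stock
dtt_stock_mM = 1000.0       # 1 M expressed in mM
load_per_channel_ug = 200.0

total_ul = sample_water_ul + buffer_ul + dtt_ul
dtt_final_mM = dtt_stock_mM * dtt_ul / total_ul

# Assumption: the whole mixture is split evenly across the loaded channels.
channels_used = sample_ug / load_per_channel_ug
volume_per_channel_ul = total_ul / channels_used

print(f"total volume: {total_ul:.0f} uL")
print(f"final DTT: {dtt_final_mM:.1f} mM")
print(f"channels used: {channels_used:.0f}, {volume_per_channel_ul:.0f} uL each")
```

Under these assumptions, the mixture would occupy four of the eight cartridge channels.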
Tryptic digestion was performed using the filter-aided sample preparation (FASP) method [44], as described by Hoffmann et al. [31]. This digestion protocol was also applied to untreated blood plasma and the top 14-HAP depleted sample using 60 μg of protein. For all filtration steps, 10-kDa Nanosep Omega filters (OD010C35, Pall, USA) were used. The reagents, 0.4 M DTT and 0.55 M IAA, were first dissolved in 50 mM ABC(aq) buffer at pH 7.8 (ABC buffer) immediately before use and then diluted tenfold in 8 M urea(aq) and 100 mM Tris-HCl(aq) at pH 8.5 (urea buffer). After each washing step, the filters were shaken and then centrifuged at 14 000 × g for 10 min. From each fraction derived from the GELFrEE system, 60 μg of protein was loaded onto the filter and washed twice with 200 μL of urea buffer. Then, 100 μL of 40 mM DTT(aq) was added and incubated for 20 min at 56 °C and 300 rpm on a ThermoMixer C (Eppendorf, Germany). After centrifugation at 14 000 × g for 10 min, 100 μL of 55 mM IAA(aq) was added. The sample was incubated for 20 min in the dark at room temperature and 300 rpm. Once reduced and carbamidomethylated, the protein sample was washed three times with 100 μL of urea buffer and three times with 100 μL of ABC buffer. Trypsin was added at a ratio of 1:60 (mg enzyme:mg protein sample) in a final volume of 100 μL of 50 mM ABC(aq) + 5% (v/v) ACN + 1 mM CaCl2(aq). The sample was incubated overnight at 37 °C and 300 rpm (incubator Titramax 1000, Heidolph, Germany). The digest was then recovered and collected by centrifugation. The membrane was washed once with 50 μL of 50 mM ABC(aq) + 5% (v/v) ACN and once with 50 μL of water. The flow-through was kept and combined with the digest. The digest was then divided into aliquots with 20 μg of peptides. The digests were dried using an RVC (1 °C, 0.1 mbar, ∼2 h).
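The reagent quantities in the FASP protocol follow from simple dilution and ratio arithmetic. The following minimal sketch reproduces the working concentrations and the trypsin amount from the stated values; the number of aliquots assumes complete peptide recovery from the filter, which the text does not state.

```python
def dilute(stock_mM: float, fold: float) -> float:
    """Concentration after diluting a stock by the given fold."""
    return stock_mM / fold

# Stocks are 0.4 M DTT and 0.55 M IAA, each diluted tenfold in urea buffer.
dtt_working_mM = dilute(400.0, 10.0)
iaa_working_mM = dilute(550.0, 10.0)

protein_ug = 60.0
trypsin_ug = protein_ug / 60.0        # 1:60 enzyme-to-protein ratio (w/w)

# Assumption: all 60 ug are recovered as peptides after digestion.
aliquots = int(protein_ug // 20.0)    # aliquots of 20 ug each

print(dtt_working_mM, iaa_working_mM, trypsin_ug, aliquots)
```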
Glycopeptide enrichment via cotton hydrophilic interaction liquid chromatography-solid phase extraction (Cotton-HILIC-SPE) was based on a protocol from Selman et al. [45]. Cotton tips were produced in-house by introducing a mercerized-cotton thread with a length of 3 mm into 250 μL Rainin tips (Pipette Tips RT LTS 250 μL SX 768A/8, Mettler Toledo, Switzerland). The washing solution (85% (v/v) ACN(aq) and 0.1% (v/v) TFA(aq)) and first-elution solution (78% (v/v) ACN(aq) and 0.1% (v/v) TFA(aq)) were prepared immediately before use. For each GELFrEE fraction, 20 μg of lyophilized tryptic digest was dissolved in 50 μL of 85% (v/v) ACN(aq). The thread-filled tips were cleaned three times by pipetting and then disposing of 100 μL of water. The tips were equilibrated by pipetting 100 μL of washing solution five times. The peptide sample was loaded into the cotton tips by slowly pipetting the liquid up and down 20 times. The liquid from the loading step was discharged into a clean tube and kept (referred to as the “loading fraction”). The tips were washed three times by pipetting 100 μL of washing solution into a clean tube (“wash fraction”). The elution 1 fraction was recovered by pipetting 100 μL of first-elution solution into a clean tube three times. The elution 2 fraction was recovered by pipetting 100 μL of H2O into a clean tube six times. The four fractions were dried under vacuum and stored at −20 °C. This Cotton-HILIC-SPE protocol was also applied to 20 μg of tryptic peptides from the untreated blood plasma and the top 14-HAP depleted sample. The hydrophilic interaction liquid chromatography (HILIC) fractions were solubilized in 20 μL of 2% (v/v) ACN(aq) + 0.1% (v/v) TFA(aq) on the day of injection. For each LC-MS/MS run, 4 μL of the sample was injected.
2.2. NanoRP-LC-ESI-OT-OT-MS/MS data-dependent acquisition
Nano-reversed-phase liquid-chromatography electrospray-ionization orbitrap MS/MS (nanoRP-LC-ESI-OT-OT-MS/MS) was performed using a Dionex UltiMate 3000 RSLCnano system (UHPLC, Thermo Scientific) coupled online to an Orbitrap Elite Hybrid Ion Trap-Orbitrap Mass Spectrometer. The UHPLC system was equipped with a C18 trap column (length 2 cm, particle size 5 µm, pore size 100 Å, inner diameter 100 µm; Acclaim PepMap 100, #164199, Thermo Scientific) and a C18 separation column (length 25 cm, pore size 100 Å, particle size 2 μm, inner diameter 75 μm; Acclaim PepMap RSLC nanoViper, #164941, Thermo Scientific). For the nano-flow separation gradient, the nano pump mobile phase A (2% (v/v) ACN(aq), 0.1% (v/v) FA(aq)) and mobile phase B (80% (v/v) ACN(aq), 10% (v/v) TFE(aq), 0.1% (v/v) FA(aq)) were controlled at a flow rate of 300 nL·min−1 at 40 °C. Then, 5 min after the sample injection, the port valve of the loading pump mobile phase A (2% (v/v) ACN(aq), 0.05% (v/v) TFA(aq), flow rate 7 µL·min−1) was switched, connecting the trap with the separation column and the nano-flow separation gradient was started. The nano-flow separation gradient was set as follows: 4% B for 4 min, linear increase to 35% B in 62 min, further to 90% B in 2 min, kept at 90% B for 6 min, decreased again to 4% B in 2 min, and finally kept at 4% B for 24 min. Both precursor and fragment ion scans were acquired using an orbitrap mass analyzer for orbitrap MS/MS (OT-OT-MS/MS). Each sample was measured twice to acquire data on the glycopeptides fragmented at two HCD regimes: HCD.low, with a fixed NCE of 20, and HCD.step, with a stepped NCE of 35 (width 15%, two steps). Both data-dependent scan methods were acquired in positive mode, selecting the top five peaks with a 10 s dynamic exclusion, a scan range of 350-2000 mass-to-charge ratio (m/z), and an isolation width of 4 m/z. 
The precursor ion scans (MS1) and fragment ion scans (MS2) were acquired at resolutions of 30 000 and 15 000, respectively.
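The nano-flow separation gradient described above can be written as a table of linear segments, which makes the total program length and the composition at any time point easy to check. This is a minimal sketch: times are counted from the start of the gradient (that is, after the valve switch), and the function name `percent_b` is illustrative.

```python
# (duration_min, start_%B, end_%B) for each gradient segment, as listed in the text.
GRADIENT = [
    (4, 4, 4),     # hold at 4% B
    (62, 4, 35),   # linear increase to 35% B
    (2, 35, 90),   # ramp to 90% B
    (6, 90, 90),   # hold at 90% B
    (2, 90, 4),    # return to 4% B
    (24, 4, 4),    # re-equilibration at 4% B
]

def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at t_min minutes after the gradient start."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")

total_min = sum(duration for duration, _, _ in GRADIENT)
print(total_min, percent_b(35), percent_b(70))
```

The segments sum to a 100 min gradient program, of which the last 24 min serve to re-equilibrate the column.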
2.3. Data analysis
The HCD.step measurements of the HILIC fractions (loading, wash, elution 1, and elution 2) derived from the untreated blood plasma, the top 14-HAP depleted sample, and the top 14-HAP depleted and fractionated sample were searched for peptides using the human UniProtKB/SwissProt database (v2021-11-30, 20 306 canonical sequences) via Proteome Discoverer (version 2.5.0.400, Thermo Fisher Scientific). The parameters applied to the search engines Sequest HT and Mascot were set as follows: full specific tryptic digestion, two missed cleavages allowed, and precursor and fragment mass tolerances of 10 parts per million (ppm) and 0.02 Da, respectively. Deamidation (N, Q), oxidation (M), and acetyl/+42.011 Da (protein N-terminus) were established as dynamic modifications, and carbamidomethyl/+57.021 Da (C) as a static modification. Percolator was applied for peptide validation, and a 1% FDR on peptide level was applied.
The workflow of the glycoproteomic data analysis is depicted in Fig. 2. The search engine Byonic (v4.2.10, Protein Metrics, USA) was used to search for N-glycopeptides in the HCD.step and HCD.low measurements. HCD.step fragmentation generates both abundant peptide fragment ions (a, b, y) and some fragment ions from the peptide-linked glycan moiety (Y ions). This fragmentation regime generates the following four characteristic N-glycopeptide Y ions: ① [Mpeptide+H-NH3]+, ② [Mpeptide+H]+, ③ [Mpeptide+H+0,2X HexNAc]+, and ④ [Mpeptide+H+HexNAc]+ (where the mass of NH3 = 17.0265 Da, that of the 0,2X HexNAc cross-ring fragment = 83.0371 Da, and that of HexNAc = 203.0794 Da). The HCD.low MS2 spectra are rich in glycan fragment ions and in Y ions with a longer portion of the N-glycan attached, which allows for interpretation of the N-glycan structure. The following parameters were applied in Byonic: specific tryptic digestion, maximum two missed cleavages, cysteine carbamidomethylation as a fixed peptide modification, methionine oxidation as a common-1 modification, and asparagine deamidation and pyro Glu/Gln as rare-1 dynamic modifications. The precursor and fragment mass tolerances were set to 10 and 20 ppm, respectively. The recalibration lock mass was 445.1201 m/z. The fragmentation type was set to QTOF/HCD. The human canonical proteome UniProtKB/Swiss-Prot database (20 396 reviewed sequences downloaded in May 2021) was applied for protein identification. For N-glycan identification, a customized N-glycan database of 288 compositions was applied. During the Byonic search, no decoy search was performed, and there were no cuts on the protein FDR. The N-glycoproteomic analysis of the untreated blood plasma and top 14-HAP depleted sample (HCD.step acquisitions) was conducted using the same parameters.
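The four characteristic Y ions can be computed directly from a peptide mass and the three mass offsets given above. The sketch below is illustrative only: the proton mass constant is a standard value that is not stated in the text, the function name and dictionary keys are hypothetical, and the example peptide mass of 1000.0 Da is an arbitrary placeholder.

```python
NH3 = 17.0265          # Da, from the text
X02_HEXNAC = 83.0371   # Da, HexNAc cross-ring fragment, from the text
HEXNAC = 203.0794      # Da, from the text
PROTON = 1.00728       # Da, assumption: standard proton mass, not given in the text

def characteristic_y_ions(peptide_mass: float) -> dict:
    """Singly charged m/z values of the four characteristic Y ions."""
    mh = peptide_mass + PROTON
    return {
        "peptide+H-NH3": mh - NH3,
        "peptide+H": mh,
        "peptide+H+X_HexNAc": mh + X02_HEXNAC,
        "peptide+H+HexNAc": mh + HEXNAC,
    }

# Hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da.
for name, mz in characteristic_y_ions(1000.0).items():
    print(f"{name}: {mz:.4f}")
```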
Combining the 182-entry Byonic N-glycan database with the reported compositions, including those with multiple fucoses and the compositions of phosphorylated or sulfated N-glycans deduced through a diagnostic search, resulted in a database with 288 N-glycan compositions. A diagnostic search was performed on each HCD.step file, applying two layout filters created in Thermo Xcalibur Qual Browser (version 2.2, Thermo Scientific) software (Figs. S1 and S2 in Appendix A). This made it possible to filter MS2 spectra containing oxonium marker ions for the sulfated N-glycans, HexNAc1Sulfo1 [M+H]+ or HexNAc1Hex1Sulfo1 [M+H]+, and the phosphorylated N-glycans, Hex1Phospho1 [M+H]+ or Hex2Phospho1 [M+H]+. In the filtered MS2 spectra, the putative peptide mass was determined by detecting the four characteristic Y ions (①-④). Then, the corresponding N-glycan mass was calculated by subtracting the peptide mass from the precursor ion mass. The sulfated and phosphorylated N-glycan compositions were deduced using the GlycoMod tool from expasy.org [46]. The compositions of the sulfated and phosphorylated N-glycans were appended to the N-glycan database.
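The mass subtraction used in the diagnostic search can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as implemented by the authors: the proton mass is a standard constant not given in the text, the function names are hypothetical, and the numbers in the usage lines are made-up placeholder values.

```python
PROTON = 1.00728  # Da, assumption: standard proton mass, not given in the text

def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass from an m/z value and a positive charge state."""
    return (mz - PROTON) * charge

def glycan_mass(precursor_mz: float, charge: int, peptide_mh_mz: float) -> float:
    """Mass attributed to the N-glycan moiety.

    peptide_mh_mz is the m/z of the singly charged [Mpeptide+H]+ Y ion.
    """
    glycopeptide = neutral_mass(precursor_mz, charge)
    peptide = neutral_mass(peptide_mh_mz, 1)
    return glycopeptide - peptide

# Hypothetical triply charged precursor and peptide Y ion.
delta = glycan_mass(precursor_mz=1069.264747, charge=3, peptide_mh_mz=1001.00728)
print(f"glycan moiety mass: {delta:.4f} Da")
```

The resulting mass difference is the quantity that would then be submitted to a composition tool such as GlycoMod.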
The search results derived from the HCD.low and HCD.step acquisitions were imported into Byologic (v4.4-74-g75311a1df5 x64, Protein Metrics) in two sets. The searches corresponding to the HCD.step acquisitions generated one Byologic file (HCD.step-HBP), and those corresponding to the HCD.low acquisitions generated a second Byologic file (HCD.low-HBP). Manual glycopeptide validation was performed only on the HCD.step-HBP list using Byologic. The validated N-glycopeptide identifications in the HCD.step file were transferred to the HCD.low-HBP Byologic file in order to substitute the corresponding correct precursor ion identifications in the HCD.low-HBP list. This substitution was performed using the Peptide Manager Byonic function. The validated N-glycopeptide identifications in the HCD.step-HBP file were exported using the “Export in silico to CSV” option. Then, the CSV file was imported into the HCD.low-HBP Byologic file through the Peptide Manager “Intersect CSV” function. The parameters set for intersecting the corresponding N-glycopeptides were the precursor mass error (10 ppm) and the retention time error (5 min). Once transferred to the HCD.low-HBP Byologic file, the N-glycopeptide identifications were manually revised, and additional structural information on the corresponding N-glycans was annotated.
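The intersection criteria (precursor mass error of 10 ppm and retention time error of 5 min) can be illustrated with a short matching routine. This is not the Byologic implementation, which is not described in the text; the `Precursor` class, the `intersect` function, and all example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Precursor:
    label: str
    mass: float     # neutral monoisotopic mass in Da
    rt_min: float   # retention time in minutes

def ppm_error(observed: float, reference: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - reference) / reference * 1e6

def intersect(validated, candidates, ppm_tol=10.0, rt_tol_min=5.0):
    """Return (validated, candidate) pairs that agree within both tolerances."""
    pairs = []
    for v in validated:
        for c in candidates:
            mass_ok = abs(ppm_error(c.mass, v.mass)) <= ppm_tol
            rt_ok = abs(c.rt_min - v.rt_min) <= rt_tol_min
            if mass_ok and rt_ok:
                pairs.append((v, c))
    return pairs

# Hypothetical entries: one validated HCD.step identification, three HCD.low precursors.
validated = [Precursor("step-A", 3204.7724, 30.0)]
candidates = [
    Precursor("low-x", 3204.7900, 31.2),   # about 5.5 ppm, 1.2 min apart
    Precursor("low-y", 3204.7724, 40.0),   # same mass, 10 min apart
    Precursor("low-z", 3205.0000, 30.0),   # about 71 ppm apart
]
matches = intersect(validated, candidates)
print([c.label for _, c in matches])
```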
Rare N-glycan compositions were identified during the N-glycopeptide validation. These related to three N-glycan building-block (supposedly monosaccharide) masses that were not included in the first N-glycan database. In order to find the gPSMs for these identifications, the Byonic wildcard search function was applied. This function makes it possible to add a delta mass within a user-specified range. The mass ranges searched here were as follows: 176, 245, and 259 (±1 Da). These ranges were narrowed to the delta masses deduced during the validation (176.0314, 245.0524, and 259.0672 Da). One specific wildcard search for each delta mass was applied to all the HCD.step files. To reduce the search space, the MS/MS filtering function was applied, allowing only MS2 spectra containing at least two of the masses HexNAc1Hex1 [M+H]+/366.1395, HexNAc1 [M+H]+/204.0867, HexNAc1 [M-H2O+H]+/186.0761, and NeuAc1 [M-H2O+H]+/274.0921, with a mass tolerance of 0.02 Da. In the wildcard search, the delta mass is assigned to the glycan modification and not to the peptide sequence by setting the parameter restriction on residues to “g” (where “g” means “glycan”). For all wildcard searches, the same parameters were applied as in the first Byonic search: specific tryptic digestion, maximum two missed cleavages, cysteine carbamidomethylation as a fixed peptide modification, methionine oxidation as a common-1 modification, and asparagine deamidation and pyro Glu/Gln as rare-1 dynamic modifications. The precursor and fragment mass tolerances were 10 and 20 ppm, respectively, and the recalibration lock mass was 445.1201 m/z. The fragmentation type was set to QTOF/HCD. To speed up the search, a shorter protein sequence list and an N-glycan composition list including only the elements present in the “True” validated N-glycopeptide results were applied.
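The MS/MS pre-filter described above (at least two of four oxonium ion masses within 0.02 Da) amounts to a simple peak-matching rule. The sketch below shows that rule in isolation; it is not Byonic's internal filter, the function name is hypothetical, and intensity thresholds, which a real filter would likely apply, are ignored.

```python
# Oxonium marker ions and m/z values as listed in the text.
OXONIUM_MZ = {
    "HexNAc1Hex1 [M+H]+": 366.1395,
    "HexNAc1 [M+H]+": 204.0867,
    "HexNAc1 [M-H2O+H]+": 186.0761,
    "NeuAc1 [M-H2O+H]+": 274.0921,
}

def passes_filter(peak_mzs, tol_da=0.02, min_hits=2):
    """True if at least min_hits marker ions are matched by some peak within tol_da."""
    hits = sum(
        any(abs(mz - target) <= tol_da for mz in peak_mzs)
        for target in OXONIUM_MZ.values()
    )
    return hits >= min_hits

# Hypothetical peak lists (m/z values only).
print(passes_filter([204.0870, 366.1400, 500.0]))  # two markers matched
print(passes_filter([204.0870, 300.0]))            # only one marker matched
```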
3. Results
During the in-depth N-glycoproteomic analysis of HBP proteins, two main causes of complexity are encountered. First, a few HAPs suppress the signal of hundreds of proteins present at lower abundance. Second, the heterogeneous and unpredictable nature of protein N-glycosylation requires sufficient evidence for describing both the structure of an N-glycan and its position in a particular protein. In this study, we established a sample preparation workflow and developed a glycoproteomic data analysis workflow that together make it possible to reach the very low-abundant glycoproteome. Furthermore, they allow the detection and site-specific description of rare N-glycan compositions and the identification of structural features linked to blood plasma glycoproteins with high confidence, as will be substantiated in the following sections.
3.1. Exploring blood plasma (glyco)proteins at low concentration range and N-glycan micro-heterogeneity
To evaluate the performance of our preparative workflow for identifying middle- and low-abundant HBP proteins, a proteomic analysis was conducted. For each step of the sample preparation workflow, the number of proteins identified was compared (acceptance criterion: ≥ two unique peptides per protein). The protein identifications were linked to the blood plasma concentrations reported in the Plasma Proteome Database (PPD) using the visProteomics R package [47], [48]. As presented in Fig. 3(a), the concentration of most proteins observed by analyzing the untreated blood plasma ranged from 1 × 10^9 down to 1 × 10^6 pg·mL−1. After depleting the top 14-HAP, the concentration range of the proteins detected was extended, now ranging from 1 × 10^9 down to 3 × 10^5 pg·mL−1. Next, by integrating both the top 14-HAP depletion and protein size fractionation, the concentration range of the proteins identified could be further extended and the detection limit further lowered, now reaching from 1 × 10^9 down to 6 × 10^3 pg·mL−1. Moreover, in a few cases, glycoproteins at lower abundances could be detected, such as the cysteine-rich secretory protein 3 (6.31 pg·mL−1) [47]. A list of the proteins identified after each fractionation step is provided in Tables S1-S3 in Appendix A.
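The gain in dynamic range at each step can be expressed in orders of magnitude by taking the base-10 logarithm of the ratio between the upper and lower concentration limits quoted above. This is a small illustrative calculation; the dictionary labels are shorthand introduced here.

```python
import math

def orders_of_magnitude(high: float, low: float) -> float:
    """Number of decades spanned between two concentrations (same units)."""
    return math.log10(high / low)

# Concentration limits in pg/mL, as quoted in the text.
ranges = {
    "untreated plasma": (1e9, 1e6),
    "top 14-HAP depleted": (1e9, 3e5),
    "depleted and fractionated": (1e9, 6e3),
}
spans = {name: orders_of_magnitude(hi, lo) for name, (hi, lo) in ranges.items()}
for name, span in spans.items():
    print(f"{name}: {span:.1f} orders of magnitude")
```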
The sample preparation workflow established here also results in a significant improvement regarding the detection of more glycoproteins and glycopeptides and the measurement of the micro-heterogeneity of glycosylation, as displayed in Fig. S3 in Appendix A. As an example, in Fig. 3(b), the N-glycan compositions detected on each specific site of the zinc-α-2-glycoprotein are compared at each step of the workflow. For site N128, for instance, ten times more N-glycan compositions were detected after the top 14-HAP depletion, compared with the direct analysis of blood plasma. Moreover, 20 times more N-glycan compositions were detected when combining the top 14-HAP depletion and protein size fractionation (Table S4 in Appendix A). Thus, the preparative workflow effectively boosts the detection of a large variety of N-glycans at each site. A detailed comparison of the N-glycopeptides identified after each step of the workflow is shown in Tables S5-S7 in Appendix A. An overview of the N-glycopeptide identification, after the in-depth N-glycoproteomic analysis, is displayed in Fig. S4 in Appendix A. The figure shows that the most common N-glycan attached to the blood plasma glycoproteins is a diantennary complex-type N-glycan that is disialylated and non-fucosylated.
Overall, by comparing the (glyco)proteomic results after each step of the preparative workflow, we demonstrate that the applied preparative workflow expands the range for the identification of blood plasma (glyco)proteins from the middle range to the very low-concentration range and enables a deeper description of the micro-heterogeneity of N-glycosylation.
3.2. Development of a data analysis workflow for the identification of intact N-glycopeptides
A dedicated data analysis workflow was developed and applied to validate the data generated by the in-depth N-glycoproteomic LC-MS/MS analysis (Fig. 2). The workflow relies on the revision of two N-glycopeptide lists grouped by the fragmentation energy applied: HCD.step and HCD.low. The first part of the workflow uses HCD.step spectra to confirm the correctness of the peptide sequence and the N-glycan composition within the gPSM suggested by Byonic. In the second part, HCD.low fragment ion spectra are used to confirm the proposed N-glycan composition. Information about the N-glycan structure is added manually, for example to clarify whether a fucose is linked to the core or to the antenna of an N-glycan. After acknowledging errors, such as missed N-glycan compositions, the third part of the workflow focuses on correcting gPSMs that were classified as uncertain identifications, either by N-glycan de novo sequencing or by performing additional glycoproteomic searches.
The first part of the data analysis workflow is conducted according to the proposed decision tree shown in Fig. 4(a). With this decision strategy, a total of 7867 gPSMs were classified in three main categories: “True,” “Uncertain,” and “False.” A “True” gPSM has evidence regarding three aspects: ① It is an N-glycopeptide, ② the mass of the peptide moiety is consistent with the Y ion [Mpeptide+H]+, and ③ the oxonium ions observed are coherent with the suggested N-glycan composition. An “Uncertain” gPSM meets the first requirement but fails in one or both of the other two. A “False” gPSM does not fulfill the first requirement, since it corresponds to a non-glycosylated peptide or an O-glycopeptide. The latter is recognized by the ratio between the intensity of the HexNAc1 [M-H2O+H]+ and HexNAc1 [M+H]+ oxonium ions [31]. In an N-glycopeptide-derived MS2 spectrum (HCD.step), the intensity of HexNAc1 [M+H]+/204.0867 will be 3-10 times the intensity of HexNAc1 [M-H2O+H]+/186.0761 [31]. As described in Fig. 4(a), each main category acquires a more specific classification according to the evidence observed. Each of the subcategories is exemplified and described in Figs. S5-S15 in Appendix A. For example, the MS2 spectra from a gPSM classified as “True-Evidence” contain at least three characteristic Y ions and oxonium ions consistent with the N-glycan composition presented in the gPSM, yet the MS2 spectra might lack peptide b and y ions. The category “Uncertain-change N-glycan and peptide” is also an N-glycopeptide identification, but the mass of the characteristic Y ions disagrees with that expected from the gPSM suggested by the software.
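The three main categories can be summarized as a small decision function. This is a heavily simplified sketch of the top level only: the decision tree in Fig. 4(a) has further subcategories and was applied manually. Treating every spectrum whose 204/186 intensity ratio falls outside the 3-10 window as “False” is an assumption made here for illustration; the text states only that N-glycopeptide spectra fall inside that window. The function and argument names are hypothetical.

```python
import math

def classify_gpsm(i204: float, i186: float,
                  peptide_mass_ok: bool, oxonium_ok: bool) -> str:
    """Assign a gPSM to "True", "Uncertain", or "False" (top level only).

    i204, i186: intensities of HexNAc1 [M+H]+ and HexNAc1 [M-H2O+H]+.
    peptide_mass_ok: peptide mass agrees with the [Mpeptide+H]+ Y ion.
    oxonium_ok: oxonium ions agree with the suggested N-glycan composition.
    """
    if i204 <= 0:
        return "False"  # no HexNAc oxonium ion: treated as non-glycosylated
    ratio = i204 / i186 if i186 > 0 else math.inf
    # Assumption: a ratio outside 3-10 is taken as "not an N-glycopeptide".
    if not (3.0 <= ratio <= 10.0):
        return "False"
    if peptide_mass_ok and oxonium_ok:
        return "True"
    return "Uncertain"

print(classify_gpsm(1000.0, 200.0, True, True))
print(classify_gpsm(1000.0, 200.0, True, False))
print(classify_gpsm(1000.0, 800.0, True, True))
```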
3.3. Validated N-glycopeptide identifications in HBP
Fig. 4(b) shows the results of the manual validation of the HCD.step-HBP list, where 88.7% of the gPSMs are N-glycopeptide identifications. The “False” cases (i.e., non-N-glycopeptides) represent only 11.3% of the dataset. The results from the validation of HCD.step-HBP are listed in Table S7. From the total of 7867 gPSMs, 2263 were correct concerning the peptide and N-glycan composition (27.9% of the total gPSMs). The “True” gPSMs are spread across four subcategories: 909 “True matches with outstanding evidence” (T-EO), 773 “True matches with evidence” (T-E), 500 “True matches with evidence and alternative identifications” (T-EA), and 81 “True matches with evidence valid related to other identification of the same peptide” (T-EVR). One third of the total identifications correspond to “Uncertain-no-evidence” identifications, in which gPSMs feature MS2 spectra with few fragment ions. Another 17% of all gPSMs are incorrect matches (category “Uncertain-change N-glycan and peptide”), whose MS2 spectra might require different glycoproteomic searches. Some “Uncertain-change N-glycan and peptide” gPSMs were corrected by comparing their MS2 spectra with those from “True” gPSMs with a similar precursor ion mass or retention time. Thus, 656 manually corrected gPSMs comprised a “Quasi-true corrected gPSM” subcategory. Other “Quasi-true” categories, “Change glycan” and “Double site,” represent only 0.6% of the total gPSMs and are cases in which the peptide moiety is correct, but the N-glycan cannot be confirmed. As displayed in Fig. 4(c), a total of 942 different N-glycosites belonging to 805 human glycoproteins were found. Focusing on the different glycoforms per glycosylation site, the total number of gPSMs was condensed to 1929, disregarding peptide modifications or missed cleavages.
3.4. Annotating structural features of bisecting, fucosylated, and LacNAc extended N-glycans using HCD.low spectra
Only the 2263 “True” gPSMs were passed to the second part of the workflow for the annotation of
N-glycan structural information in the HCD.low-HBP list (Table S8 in Appendix A). The
N-glycan structural evidence was screened in the HCD.low-spectra corresponding to the validated gPSMs, using the
N-glycan marker ions listed in Tables
1 and
2 [
31], [
33], [
49], [
50], [
51].
In this table, Y ions are considered in different charge states (
z = 1-3
+). This is the second part of the data analysis workflow, which enables differentiation between antenna and core fucose, multi-antennary
N-glycans and repeated LacNAc units, or antenna HexNAc and bisecting HexNAc [
31]. Detection of the Y ion peptide+HexNAc₃Hex₁, for example, suggests a bisecting N-glycan. In total, 16 bisected N-glycopeptides were identified in 11 glycoproteins. Most of these bisected N-glycans are linked to the proteins IgA2 (heavy chain), IgG1 (heavy chain), IgG2 (heavy chain), and synaptojanin-1. In the case of a LacNAc repeat unit, the oxonium ion HexNAc₂Hex₂NeuAc₁ (B ion [M+H]⁺, m/z 1022.3671) was detected and annotated in the HCD.low-HBP list. It has been reported that the HexNAc₂Hex₂NeuAc₁ oxonium ion is not present in MS² spectra derived from complex-type N-glycans fragmented at a low collision energy, which allows high specificity in identifying diLacNAc structures [33]. Notably, the oxonium ion HexNAc₂Hex₂ (B ion [M+H]⁺, m/z 731.2717) was not applied, because this fragment ion is also generated by complex-type and bisected
N-glycans. Therefore, the identification of non-sialylated LacNAc repeat units was not included. It was found that apolipoprotein D harbors three different
N-glycans with a sialylated LacNAc repeat unit at glycosylation site N
65, where apolipoprotein D is the protein with the highest frequency of this type of
N-glycan. Three additional
N-glycopeptides, belonging to kininogen-1, β-2-glycoprotein 1, and kallistatin, were also found to have LacNAc repeat units.
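The m/z values of the B-type marker ions quoted above follow directly from the monosaccharide composition. The following Python sketch shows the arithmetic; the residue masses are standard monoisotopic values rounded to four decimals, and the function name `oxonium_mz` is a hypothetical helper, not part of any software used in the study.

```python
# Monoisotopic residue masses in Da (monosaccharide minus water), rounded.
RESIDUE_MASS = {
    "HexNAc": 203.0794,
    "Hex": 162.0528,
    "NeuAc": 291.0954,
    "Fuc": 146.0579,
}
PROTON = 1.00728  # mass of a proton in Da


def oxonium_mz(composition: dict) -> float:
    """m/z of a singly protonated B-type oxonium ion for a residue composition."""
    residues = sum(RESIDUE_MASS[name] * count for name, count in composition.items())
    return residues + PROTON


# Sialylated diLacNAc marker ion used in the text (m/z 1022.3671)
sialylated_dilacnac = oxonium_mz({"HexNAc": 2, "Hex": 2, "NeuAc": 1})

# Non-sialylated ion that was not applied because it is not specific (m/z 731.2717)
dilacnac = oxonium_mz({"HexNAc": 2, "Hex": 2})
```

Both values agree with the m/z values given in the text to within rounding of the input masses.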
In our HCD.step-HBP list, 679 unique fucosylated
N-glycopeptides were further reviewed to deduce the position of fucose in the respective
N-glycans (Table S7). The presence of the following marker ions was registered in the HCD.step- and HCD.low-HBP lists: for core fucose, peptide+HexNAc₁Fuc₁ [M+H]⁺ (Y ion); and, for antenna fucose, HexNAc₁Hex₁Fuc₁ [M+H]⁺ (oxonium ion, B ion). In total, we found 350 unique fucosylated
N-glycopeptides that had MS
2 spectra with either a core or an antenna fucose marker ion (Table S9 in Appendix A). Ambiguous cases occurred when only one fucose was suggested in the
N-glycopeptide identification but both classes of marker ions (i.e., antenna and core fucose) were observed. Since fucose transfer (fucose rearrangement) in the gas phase can occur from the core to the antenna, generating “False” antenna fucose marker ions [
52], the
N-glycopeptides that included one fucose but presented both antenna and core fucose ions were set as “core or antenna fucose.” In the case of
N-glycan compositions with more than one fucose presenting both antenna and core fucose ions, both core and antenna fucosylation were assigned. For
N-glycopeptides where core fucose ions were not observed but antenna fucose ions were detected, only antenna fucosylation was accepted. As a result, we found 86 ambiguous core- or antenna-monofucosylated
N-glycopeptides, 156 core-monofucosylated
N-glycopeptides, 47 core- and antenna-fucosylated
N-glycopeptides, and 61
N-glycopeptides presenting only one or more antenna fucose. The proteins with the highest frequency of antenna fucosylation are zinc-α-2-glycoprotein, α-2-HS-glycoprotein, β-2-glycoprotein 1, clusterin, and ceruloplasmin. In
Fig. 5, all gPSMs including fucose(s) in the
N-glycan composition are plotted, in order to analyze the frequency of fucosylated
N-glycopeptides with and without fucose marker ion evidence. It can be seen that a large number of difucosylated gPSMs, which lacked any kind of fucose ion, presented complex-type diantennary monosialylated or tri-antennary disialylated
N-glycans.
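The assignment rules for the fucose position can be written compactly. The sketch below is a hedged Python restatement of the rules given in the text; the function name and the returned labels are illustrative choices, and the case in which no marker ion is observed is labeled here as unassigned by assumption.

```python
def assign_fucose_position(n_fucose: int, core_ion: bool, antenna_ion: bool) -> str:
    """Assign the fucose position from the marker ions observed in the MS2 spectra.

    core_ion:    peptide+HexNAc1Fuc1 [M+H]+ (Y ion) observed
    antenna_ion: HexNAc1Hex1Fuc1 [M+H]+ (B ion) observed
    """
    if n_fucose < 1:
        raise ValueError("composition contains no fucose")
    if core_ion and antenna_ion:
        # One fucose cannot occupy both positions; gas-phase rearrangement
        # may have produced the antenna ion, so the case stays ambiguous.
        return "core or antenna fucose" if n_fucose == 1 else "core and antenna fucose"
    if antenna_ion:
        return "antenna fucose"
    if core_ion:
        return "core fucose"
    return "no fucose marker ion"  # assumed label for spectra without evidence
```

For example, `assign_fucose_position(1, True, True)` returns the ambiguous label, whereas the same ions with two fucoses in the composition yield a combined core and antenna assignment.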
A prominent example of a difucosylated gPSM is shown in Fig. 6(a), where no antenna fucose oxonium ions are observed in the software annotation. The isotopic pattern of the triply charged precursor of this example (Fig. 6(b)) contains an earlier isotopic peak, 0.3331 m/z units below the monoisotopic peak assigned by the software. This suggests that the precursor mass deduced by the software is 1 Da larger than the real mass of the precursor ion (Δm/z = 0.3331, z = 3, Δm = 0.9993 Da). The mass difference between assigning two fucoses instead of one NeuAc is likewise about 1 Da (2× Fuc = 292.1158 Da, 1× NeuAc = 291.0954 Da, Δ = +1.0204 Da, where Δ refers to mass difference). Hence, we hypothesized that the software misinterpreted the MS¹ isotopic pattern, which would explain the mass difference of 1 Da and the resulting incorrect assignment of two fucoses instead of one neuraminic acid. To test this hypothesis, Fig. 6(c) shows the manual de novo sequencing of the N-glycan composition from the HCD.low spectrum of this precursor ion. As fucose ion evidence was not observed, we searched for neutral losses corresponding to fucose. The result shows only a delta mass corresponding to a doubly charged NeuAc and a neutral loss from a second NeuAc at the end of the m/z range. This result demonstrates that this N-glycan contains two NeuAc instead of one NeuAc and two fucoses. Interestingly, when looking at the MS¹ isotopic distribution of this precursor ion in the raw file (Fig. 6(d)), the correct monoisotopic peak would be picked. Other examples in which the potential real monoisotopic peak of the precursor ion is ignored in gPSMs with multiple fucoses are shown in Fig. S16 in Appendix A.
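The arithmetic behind this example can be reproduced in a few lines. The Python sketch below only restates the numbers given in the text; the variable and function names are illustrative.

```python
FUC = 146.0579    # monoisotopic residue mass of fucose (Da)
NEUAC = 291.0954  # monoisotopic residue mass of N-acetylneuraminic acid (Da)


def neutral_mass_shift(delta_mz: float, charge: int) -> float:
    """Convert an m/z spacing between isotopic peaks into a neutral mass shift."""
    return delta_mz * charge


# Spacing between the assigned and the real monoisotopic peak (z = 3)
precursor_shift = neutral_mass_shift(0.3331, 3)  # about 0.9993 Da

# Mass gained when two fucoses are assigned instead of one NeuAc
composition_gap = 2 * FUC - NEUAC                # about 1.0204 Da
```

Because both quantities are close to 1 Da, a monoisotopic peak that is picked one isotope too high can be absorbed by swapping one NeuAc for two fucoses in the proposed composition.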
3.5. Detection of sulfated and phosphorylated N-glycopeptides in common HBP glycoproteins
N-glycopeptides with sulfated and phosphorylated
N-glycans are difficult to detect and locate—not only due to their low abundance but also due to their unstable behavior during MS analysis [
20], [
53]. Still, by integrating a dedicated diagnostic search using Thermo Xcalibur Qual Browser, we were able to detect
N-glycopeptides with phosphorylated and sulfated
N-glycan compositions in the HBP (Figs. S1 and S2). A collection of all possible phosphorylated and sulfated
N-glycan compositions was added to the Byonic
N-glycan composition database. This database was then used for all
N-glycopeptide searches. Additionally, during manual validation, both the HCD.step- and HCD.low-HBP lists were revised with regard to marker ions related to sulfation or phosphorylation. The example in
Fig. 7 shows that the HCD.low MS² spectrum of a sulfated N-glycopeptide identification contains the HexNAc₁Hex₁Sulfo₁ [M+H]⁺ oxonium marker ion, while the HCD.step MS² spectrum of the same N-glycopeptide displays the HexNAc₁Sulfo₁ [M+H]⁺ oxonium marker ion (see software annotation in Fig. S17 in Appendix A). Interestingly, the Hex₁Sulfo₁ [M+H]⁺ oxonium ion was not detected in any of the acquired MS² spectra. With regard to phosphorylated N-glycopeptides (Fig. 8), the HCD.low MS² spectrum mainly shows the Hex₁Phospho₁ [M+H]⁺ and Hex₂Phospho₁ [M+H]⁺ oxonium marker ions, while the HCD.step MS² spectra predominantly show the Hex₁Phospho₁ [M+H]⁺ oxonium marker ion (see software annotation in Fig. S18 in Appendix A). Among the
N-glycopeptide identifications from the HCD.step-HBP list, 65 contained sulfated
N-glycans, but only 10 of these contained sulfation marker ions (Table S10 in Appendix A). In the HCD.step-HBP list, 15
N-glycopeptides presented phosphorylated
N-glycans, but—in contrast to the sulfated
N-glycopeptides—all the MS
2 spectra contained marker ion(s) for phosphorylated hexose (Table S11 in Appendix A).
As the examples in Figs. 7 and 8 show, through de novo sequencing of the HCD.low MS² spectra, sulfation was detected on the antenna HexNAc, and phosphorylation was detected after the fourth mannose of the hybrid-type
N-glycan. In the lists of validated gPSMs containing sulfation or phosphorylation (Tables S10 and S11), it was observed that the sulfated
N-glycan compositions are related to complex-type
N-glycans, while the phosphorylated
N-glycans are related to hybrid-type or oligomannose-type
N-glycans. The
N-glycosylation micro-heterogeneity of some proteins, such as ceruloplasmin, hemopexin, heparin cofactor 2, and zinc-α-2-glycoprotein, indicated that several
N-glycosylation sites harbored a sulfated
N-glycan (Table S10). The proteins ceruloplasmin, hemopexin, and zinc-α-2-glycoprotein also contained a higher frequency of phosphorylated
N-glycans (Table S11).
Glycoproteomic analysis of the untreated blood plasma did not achieve the detection of phosphorylated or sulfated
N-glycopeptides at all (Fig. S19 in Appendix A). After the analysis of the top 14-HAP depleted sample, we detected one sulfated
N-glycan attached to N
762 from ceruloplasmin and one phosphorylated hybrid
N-glycan linked to N
187 of hemopexin. In contrast, by combining the immunoaffinity depletion of the top 14-HAP plus the fractionation by protein size, our sample preparation workflow achieved the detection of phosphorylated
N-glycans in 11 glycoproteins and sulfated
N-glycans in 54 glycoproteins. Moreover, it allowed the identification of a phosphorylated
N-glycan in cysteine-rich secretory protein LCCL domain-containing 1, a very low-abundant protein, whose concentration in blood plasma, as reported by the Protein Atlas Database and Protein Abundance Database (PaxDB), is 8.2 pg·mL
−1 and 0.033 ppm, respectively [
54], [
55], [
56]. Even though sulfated
N-glycopeptides tend to lose the sulfated marker ions due to their lability, we confirmed ten glycoproteins holding sulfated-HexNAc
N-glycans. These results demonstrate the power of the preparative workflow for the site-specific identification of modified
N-glycans.
3.6. Rare N-glycans in HBP proteins
As displayed in
Fig. 9, three rare
N-glycan building blocks with masses of 176.0314, 245.0524, and 259.0672 Da were found during the de novo sequencing of HCD.low MS² spectra of previously incorrect gPSMs. As described in the following paragraphs, we hypothesize that the first mass corresponds to glucuronic acid, while the last two possibly correspond to other types of sialic acid. All the identifications with these masses show complex diantennary monosialylated
N-glycans. While one antenna is capped by NeuAc, the second could be capped by any of the rare
N-glycan building blocks (
Fig. 9). This hypothesis is supported by the presence of the oxonium ions generated in each case: Hex₁HexNAc₁+176.0314 [M+H]⁺ (m/z 542.1694), Hex₁HexNAc₁+245.0524 [M+H]⁺ (m/z 611.1899), and Hex₁HexNAc₁+259.0672 [M+H]⁺ (m/z 625.2049). Furthermore, site-specific identification of these
N-glycans revealed their occupancy on N
121 from prothrombin (Fig. S20 in Appendix A).
The composition of the oxonium ion Hex₁HexNAc₁GlcA₁ [M+H]⁺ (m/z 542.1716, where GlcA refers to glucuronic acid) matches the observed m/z 542.1694 (mass error: −4.06 ppm). The ion Hex₁HexNAc₁GlcA₁ [M+H]⁺ has been reported as a fragment derived from 3-sulfonated glucuronidated N-glycans (HNK-1), which are common in the human brain [49]. In our case, the existence of the HNK-1 glycoepitope was rejected, since no fragment ion or neutral loss representing GlcA sulfation was detected. Moreover, the mass of the elemental composition of GlcA (C₆H₈O₆ after water loss) is 176.0321 Da, and the observed neutral mass is 176.0314 Da. This results in an error of −3.98 ppm between both masses, which favors the hypothesis that the first N-glycan building block identified is glucuronic acid.
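The mass errors quoted in this paragraph are relative deviations in parts per million. A minimal Python sketch of the calculation, using only the values given in the text (the function name `ppm_error` is an illustrative choice):

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million (negative if observed is lower)."""
    return (observed - theoretical) / theoretical * 1e6


# Hex1HexNAc1GlcA1 [M+H]+ oxonium ion
oxonium_error = ppm_error(observed=542.1694, theoretical=542.1716)

# Neutral building block compared with dehydrated GlcA (C6H8O6)
glca_error = ppm_error(observed=176.0314, theoretical=176.0321)

print(f"{oxonium_error:.2f} ppm, {glca_error:.2f} ppm")
```

Both errors are of similar size and sign, which is what one would expect if the same building block gives rise to the oxonium ion and to the observed mass difference.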
In order to find other proteins harboring glucuronidated N-glycans, a new Byonic wildcard search was performed (detailed in Section 2). As a result, ten glucuronidated N-glycopeptides derived from β-2-glycoprotein 1, α-2-HS-glycoprotein, kallikrein, histidine-rich glycoprotein, prothrombin, hemopexin, and complement factor H were identified (Table S12 in Appendix A). No glucuronidated N-glycopeptides were detected by the analysis of the untreated HBP (Fig. S19 in Appendix A). After the analysis of the top 14 HAP-depleted sample, only the N-glycosylation site N453 from hemopexin was found to harbor a glucuronidated complex N-glycan. These comparisons demonstrate that the sample preparation workflow established here supports the identification of considerably rarer N-glycopeptides.
To our knowledge, the terminal
N-glycan building blocks 245.0524 and 259.0672 Da have not been described to date. Assuming that these molecules might be a variant of sialic acid, the molecules were searched in PubChem, proposing their hydrated masses (263.20 and 277.23 Da, respectively). The candidates with the greatest similarity to Neu5Ac (C
11H
19NO
9, 309.27 Da [
57]) were selected and submitted to CFM-ID 4.0, an online tool to predict fragment ion spectra [
58]. The software predicted the fragment ion spectrum using NCE 10 and 20 V (NCE 20 V was applied here for HCD.low measurements). After comparing the predicted and average observed fragment ions, the two candidate molecules shown in Fig. S21 in Appendix A were selected. The first candidate (C
9H
13NO
8, 263.0641 Da [
59]) theoretically generates the B ions 246.0608 [M+H]⁺ and 228.0503 [M+H−H₂O]⁺, which are also observed in the MS² spectra that contain the building block “245 Da” (oxonium ions 246.0596 and 228.0490 [M+H]⁺). This results in errors of −4.99 and −5.53 ppm, respectively. The second candidate (C₁₀H₁₅NO₈, 277.0798 Da [60]) produces the oxonium ions 260.0765 [M+H]⁺ and 242.0659 [M+H−H₂O]⁺. These B ions are observed in the MS² spectra where the mass “259 Da” is present (oxonium ions 260.0755 and 242.0643 [M+H]⁺). For the second candidate, the errors between the theoretical and observed fragment ion m/z values are −3.77 and −6.65 ppm, respectively. The low errors support the hypothesis that the
N-glycan building blocks found might be two molecules similar to NeuAc, such as those proposed in Fig. S21 (molecules drawn in PubChem Sketcher [
61]). Both
N-glycan building blocks were only found on N
121 of prothrombin (Fig. S20 and Table S13) and have not been reported in this or other proteins before. Ongoing follow-up analyses are being conducted on this protein, which has a significant therapeutic role, to clarify the identity of such glycan building blocks.
During the manual annotation of the
N-glycan structural features in the HCD.low-HBP list, we found disialo-antennary
N-glycan structures, as shown in
Fig. 10. The surprising presence of one antenna holding two sialic acids was evidenced by the B ion HexNAc₁Hex₁NeuAc₂ [M+H]⁺ (m/z 948.3303). Typically, the first sialic acid is linked to galactose, while the second sialic acid might be linked to this first sialic acid or to the antenna HexNAc. In order to describe this disialo-antennary structure, the MS² HCD.low spectra from two
N-glycopeptides, corresponding to N
762 ceruloplasmin and N
65 apolipoprotein D, were
de novo sequenced. In both MS
2 spectra, the linkage of sialic acid to HexNAc was demonstrated by the presence of peptide-HexNAc
3Hex
3NeuAc
1 or peptide-HexNAc
4Hex
4NeuAc
2 fragment ions ([M+H]
2+ or [M+H]
3+). In addition, the fragment ion spectra from apolipoprotein D showed the oxonium marker ion HexNAc
1NeuAc
1 [M+H]
+, supporting the existence of the unexpected HexNAc-NeuAc linkage. Overall, ten proteins presented a disialo-antennary
N-glycan structure, including ceruloplasmin, α-1-antichymotrypsin, apolipoprotein D, and β-2-glycoprotein 1. All the
N-glycopeptides presenting disialo-antennary
N-glycans had one of the following compositions: HexNAc(5)Hex(6)NeuAc(3) or HexNAc(5)Hex(6)Fuc(1)NeuAc(3). Interestingly,
Fig. 10 also describes the
N-glycan structure of a disialo-antennary
N-glycan attached to β-2-glycoprotein 1. This
N-glycopeptide shows not only one disialo-antenna but also a LacNAc repeat unit on the other antenna. The aforementioned oxonium marker ions are absent in the HCD.step fragment ion spectra annotated by the software and displayed in Fig. S22 in Appendix A. This finding demonstrates that it is only by including HCD.low fragmentation that more
N-glycan features can be collected to detect relevant glycoepitopes.
4. Discussion
In this work, we established and applied both a sample preparation workflow and a data analysis workflow for an in-depth analysis of intact
N-glycopeptides in blood plasma. The internal steps in the workflows are connected as follows: ① conducting blood plasma top 14-HAP depletion, followed by a protein size fractionation, tryptic digestion, and intact glycopeptide enrichment; ② measuring glycopeptide-enriched fractions twice by means of nanoRP-LC-ESI-OT-OT-MS/MS HCD.low and HCD.step fragmentation methods; ③ searching for
N-glycopeptides using software; ④ manually validating
N-glycopeptides using the newly developed decision tree; ⑤ manually annotating structural
N-glycan features using HCD.low fragment ion spectra; and ⑥ conducting particular glycoproteomic searches to incorporate missed rare
N-glycan compositions. The main achievement of this in-depth analysis is the capacity to delve even deeper into the
N-glycan micro-heterogeneity, along with a significantly extended protein concentration range (10
9-10
3 pg·mL
−1). We were even able to achieve the detection of low-abundant glycans such as phosphorylated
N-glycans in very low-abundant proteins such as cysteine-rich secretory protein LCCL domain-containing 1 (8.2 pg·mL
−1) [
54], [
55]. Another example is the reliable identification of sulfated
N-glycans in middle-abundant proteins (e.g., extracellular matrix protein 1, 0.78 μg·mL
−1 [
47]) through the detection of sulfated fragment ions, which are typically difficult to acquire [
53].
In addition, the manual validation of all the unique
N-glycopeptide identifications computed by Byonic led to the detection of glucuronidated and other rare
N-glycans attached to blood plasma proteins expressed by the liver. Finally, the manual annotation of
N-glycan structural features revealed the advantages of using HCD.low spectra for recognizing relevant structures or glycoepitopes. After searching through the GlyCosmos, UniCarb, and GlyConnect databases [
62], [
63], [
64], we conclude that there is not yet a site-specific description of the glycoproteins harboring some of the sulfated, phosphorylated, glucuronidated, and disialylated-antenna
N-glycan structures identified in our study (listed in Tables S14 and S15 in Appendix A) [
65], [
66], [
67], [
68], [
69].
Next, by performing a proteomic analysis on all fractions after
N-glycopeptide enrichment (FDR < 1%), we demonstrated an expanded detection limit of blood plasma proteins within a concentration range from 1 × 10⁹ down to 1 × 10³ pg·mL⁻¹. This evaluation not only demonstrates that our workflow is comparable to other in-depth proteomic workflows [
25], [
70] but also shows that it has significant advantages for glycoproteomics in blood plasma. In contrast, a study by Wessels et al. [
71] presented a strategy for the diagnostics of congenital disorders of glycosylation (CDG) via the direct glycoproteomic analysis of untreated blood plasma. Their method resulted in the site-specific profile of 34 proteins, which satisfied the evaluation of the selected CDG cases. Our workflow offers an alternative for evaluating the site-specific profiles of low-abundant proteins and other
N-glycan types, such as rare
N-glycan structures that might be involved in different CDGs. Thus far, structural
N-glycan details require glycomics or exoglycosidases and lectins. However, in this work, we demonstrate a new avenue for moving toward structural glycoproteomics.
In terms of exploring other application possibilities, our preparative workflow integrates a well-established fractionation platform, which has demonstrated its applicability to other biological samples such as cell cultures and biotherapeutics [
72], [
73], [
74]. Therefore, in combination with the glycoproteomic preparative methods [
31], [
45] and the newly developed data analysis workflow, the presented methodology might be useful for expanding glycoproteomic research not only for HBP but also in fields such as biopharmaceuticals or basic research.
In recent years, many studies have revealed substantial information about the blood plasma
N-glycoproteome based on the analysis of intact
N-glycopeptides [
25], [
70], [
75]. These analyses have applied not only multi-step sample preparation workflows but also powerful bioinformatics tools. The biggest study was performed by Shu et al. [
70], who achieved the identification of 1036
N-glycosites containing 738
N-glycans, resulting in 22 677 unique
N-glycopeptides derived from 526 glycoproteins using pMatchGlyco software (FDR = 1%). In our study, we obtained 7867
N-glycopeptide identifications (no decoys, no cuts on FDR) from which 1929 were manually validated as “True” identifications, revealing 942 different
N-glycosites and 805 human glycoproteins. Shu et al.’s [
70] greater number of
N-glycopeptide identifications might result from the library of de-
N-glycosylated peptide identifications that they created by cumulative searches, including semi-tryptic digestion and 16 variable peptide modifications. In addition, they explored atypical
N-glycosylation consensus sequences (N-X-S/T/V/C, X ≠ P) and a bigger
N-glycan database (739
N-glycans). In contrast, our study resulted from one standard search applying full specific tryptic digestion, four variable peptide modifications, a common
N-glycosylation consensus sequence (N-X-S/T, X ≠ P), and 288
N-glycan compositions. Regarding sample material, Shu et al. [
70] used a pooled serum sample from 50 healthy individuals, which might increase the opportunity to accumulate the proteins secreted in blood plasma in different abundances. Additionally, the researchers included the HAP depleted fraction in their study. In comparison, our analysis resulted from a blood plasma sample (pooled from a smaller number of healthy donors) from which the top 14 HAP were depleted and later discarded using a single-use immunoaffinity column. An analysis of the HAP fraction was not within the scope of our study, since the
N-glycosylation of these proteins is well studied. Another important limitation of our study was the relatively long measurement time of the instrument used for acquiring data (Orbitrap Elite-Velos, scan speed 4 Hz), compared with the Orbitrap Q Exactive mass spectrometer used by Shu et al. (with a scan speed three times faster, at 12 Hz) [
70]. The significant difference between the performance of both instruments was demonstrated by Sun et al. [
76].
Our study aimed to address the limitations and pitfalls caused by the blind spots that exist during an
N-glycoproteomic analysis. By acknowledging these aspects, glycoproteomic bioinformatics tools can be refined to obtain more accurate and comprehensive results. Thus, the software and parameters for the search are another factor that can lead to different results regarding the interpretation of
N-glycopeptides. This was shown in a study by Kawahara et al. [
23], in which two glycoproteomic spectra files derived from human serum were provided to 22 expert groups in glycoproteomics to evaluate the impact of applying different bioinformatics strategies on the identification of intact
O- and
N-glycopeptides. The results showed an enormous variability in glycopeptide identifications, glycoproteins, and glycan compositions. The study mainly attributed this inconsistency to the filters applied after the search, such as the score threshold or FDR cut. In our study, we excluded post-search filter interference by disallowing decoys and FDR cuts. Instead, we conducted a manual validation after the
N-glycopeptide search. We are aware that this approach could result in incorrect identifications. Nevertheless, our aim was to identify pitfalls and opportunities during an intact
N-glycopeptide analysis, especially for incorrect
N-glycopeptide identifications with a high quality score and ambiguous gPSM. All the identifications were manually scrutinized, evidence supporting the trueness of the identification was annotated, and incorrect matches were revised. Thus, the subcategory “True evidence with alternatives” was a validation subcategory related to the variability observed by Kawahara et al. [
23]. This subcategory lists
N-glycopeptides with alternative peptide matches named in the column “comments” of the table HCD.step-HBP list in Table S7. In our dataset, 35% of the “True”
N-glycopeptides are classified within this subcategory, and two common features were observed: ① a low number of b and y ions in the MS
2 spectra and ② a low abundance of these observed proteins in blood plasma. The first issue relates to low MS
2 spectra quality, which is associated with several factors such as precursor ions with poor ionization efficiency, fragment ion losses, and low-abundant precursor ions. The second issue depends on the protein concentration distribution of the sample, where low-abundant proteins are reflected as low-abundant precursor ions in the final spectra. Regarding the second issue, Kreimer et al. [
77] proposed the implementation of algorithms designed for the smart selection and acquisition of the typically suppressed precursor ions in order to reduce the proportion of spectra with poor quality. These algorithms would then reduce the proportion of spectra leading to variability in glycopeptide identifications by improving the spectra quality during the MS/MS measurement.
Another source of variability in glycoproteomic analyses is fucosylation. After manual validation, we observed a high frequency of identifications presenting multiple-fucose moieties without any fucose-related fragment ion. This occurrence was higher in di-, tri-, or tetra-antennary
N-glycopeptides with an incomplete number of capping sialic acids in the antennae. Kawahara et al. [
23] reported a similar observation, where the high frequency of
N-glycopeptide identifications with multiple-fucose moieties, reported by many participants, did not correlate with the results obtained from a typical
N-glycomic analysis of HBP. We observed that the assignment of multiple-fucose moieties instead of one sialic acid is caused by an incorrect detection of the isotopic pattern for the corresponding precursor ion. Lee et al. [
27] also observed this problem and reported that it might be influenced by the precursor-picking default settings. Hence, special attention must be given to this phenomenon, since it could also induce the incorrect assignment of other non-fucose-related quasi-isobaric
N-glycan masses, such as HexNAc(5)Hex(6)Fuc(1)NeuAc(3)/3007.0580 Da and HexNAc(7)Hex(6)NeuGc(2)/3008.0532 Da.
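To illustrate why such a pair is at risk, the sketch below sums standard monoisotopic residue masses for both compositions and compares their difference with the spacing between neighboring isotopic peaks. It is a hedged Python example: the residue masses are rounded standard values, the ¹³C−¹²C spacing of 1.00335 Da is a standard constant, and the 10 ppm tolerance is an assumed, typical precursor tolerance that is not taken from this section.

```python
# Monoisotopic residue masses in Da (rounded standard values).
RESIDUE_MASS = {
    "HexNAc": 203.0794,
    "Hex": 162.0528,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
    "NeuGc": 307.0903,
}
ISOTOPE_SPACING = 1.00335  # 13C - 12C mass difference in Da


def glycan_mass(composition: dict) -> float:
    """Sum of residue masses for an N-glycan composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


mass_a = glycan_mass({"HexNAc": 5, "Hex": 6, "Fuc": 1, "NeuAc": 3})
mass_b = glycan_mass({"HexNAc": 7, "Hex": 6, "NeuGc": 2})
delta = mass_b - mass_a

# Residual mismatch if the monoisotopic peak is picked one isotope too high,
# expressed relative to the glycan mass alone (the peptide mass is ignored here).
residual_ppm = abs(delta - ISOTOPE_SPACING) / mass_a * 1e6
confusable = residual_ppm < 10.0  # assumed 10 ppm precursor tolerance
```

Under these assumptions the residual is only a few ppm, so a one-isotope shift of the monoisotopic peak would be enough to exchange one composition for the other.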
Determining the fucose position on the
N-glycan is an additional difficulty when deciphering the structure of a fucosylated
N-glycan. Hexose rearrangement is a reaction often found in MS in which an internal hexose migrates to a different position within the
N-glycan, producing “False”
N-glycan structures [
78]. Fucose rearrangement occurs in the gas phase, resulting in a fucose transfer between two antennae or between the core and—most likely—the α6-Man-linked antenna (due to its flexibility) [
52]. As a result, this reaction generates misleading fragment ions such as HexNAc
1Hex
1Fuc
1 in the MS
2 HCD.step spectra of only core-fucosylated
N-glycopeptides. Acs et al. [
79] showed that the core fucose linkage is robust at high NCEs. However, Wuhrer et al. [
52] and Acs et al. [
79] also acknowledge that a transfer of fucose from the antenna to the core can be induced, since they observed the Y ion (peptide-HexNAc
2Hex
3Fuc
1 [M+H]
+) when conducting MS/MS analyses with collision-induced dissociation (CID) at different energies. Nevertheless, in our study, that Y ion was not used for confirming core fucosylation, which was primarily done using the peptide-HexNAc
1Fuc
1 [M+H]
+ Y ion. Diverse studies have reported that core and antenna fucose-linkage stability improves when using NCE 20 instead of higher collisional energies, even though fragments from fucose rearrangement might still be produced in lower abundance [
31], [
79]. On the one hand, in agreement with these studies, we observed the ambiguous generation of fucose fragment ions, such as the detection of both peptide+HexNAc
1Fuc
1 (Y ion) and HexNAc
1Hex
1Fuc
1 [M+H]
+ in the MS
2 spectra of monofucosylated
N-glycopeptides. It is assumed that core fucosylation would most likely lead to the generation of an antenna marker ion (instead of the opposite rearrangement). On the other hand, while the adequate chromatographic separation of antenna-fucosylated and core-fucosylated labeled
N-glycans on a C18 column has been demonstrated [
80], it might not be achievable for all the glycopeptides in a complex sample [
3], [
81], such as blood plasma, allowing the overlap of structural isomers to occur. Therefore, coelution of isomers (i.e., antenna- and core-fucosylated peptides) cannot be ruled out as a possible situation explaining the presence of both ions. Thus, monofucosylated gPSMs showcasing both ions were categorized as “core or antenna fucosylation.” To describe multiple-fucosylated
N-glycan structures with high confidence, it is necessary to check more details, such as the presence of Y ions, neutral losses, and the MS
1 isotopic pattern distribution using spectra acquired at low collisional energies; however, for a large number of
N-glycopeptide identifications, manual annotation is not feasible.
In recent years, special emphasis has been put on describing not only
N-glycan structural information but also the localization of
N-glycans within the protein [
22]. Even though glycomic analysis has successfully added to the description of structural groups in
N-glycans (e.g., the diLacNAc unit, sialyl Lewis X, phosphorylation, and so forth), only MS can precisely determine the original position of such
N-glycans. With the aim of integrating
N-glycan structural information using MS, Shen et al. [
34] conducted an
N-glycoproteomic analysis of mouse brain tissue using the new software StrucGP. By acquiring and combining two complementary fragmentation energies (HCD.low for glycan and HCD.step for peptide moiety), this software was able to interpret
N-glycan structural groups on intact
N-glycopeptides. Similarly, we coupled information from both the HCD.step and the HCD.low MS/MS analysis and manually annotated the oxonium marker ions that supported the identification of special
N-glycan structural features and modifications, as previously described by our group [
31]. Based on the observation of fragment ions,
N-glycan isoforms such as bisecting
N-glycans and a repeated LacNAc structure were identified. These identifications were searched in GlyConnect to find the corresponding reported information [
65], [
63]. The bisecting
N-glycans observed in immunoglobulin proteins and plasma protease C1 inhibitor align with the information reported in GlyConnect [
65], [
63]. However, for the rest of the identified
N-glycopeptides with bisecting GlcNAc, no reports were found (Table S15). Even though
N-glycosylation sites harboring a LacNAc structure have been reported in terms of
N-glycan composition, evidence of a sialylated diLacNAc structure was not previously described for the
N-glycosylation sites identified in this work. An interesting disialo-antennary
N-glycan was observed in site N
253 of β-2-glycoprotein 1. A similar
N-glycan has been reported for site N
162 of this protein [
65]. The disialylated-antenna structure found in our study might resemble the epitope disialyl Lewis C NeuAc α2-3Galβ 1-3(NeuAc α2-6)GlcNAc reported in α-2-HS-glycoprotein from
Bos taurus [
82]. More experiments are necessary to achieve sialic acid linkage elucidation. Based on previous reports, sialic acid linkage (α2,3 and α2,6) determination on intact glycopeptides can be achieved not only through ion mobility [
83], [
84] but also by triggering MS
3 fragmentation of the HexNAc
1Hex
1NeuAc
1 oxonium ion [
85], [
86]. Hence, integrating MS
3 fragmentation of the HexNAc
1Hex
1NeuAc
1 and HexNAc
1Hex
1NeuAc
2 oxonium ions, as part of the LC-MS/MS method, can support the structural description of this rare
N-glycan to a certain extent.
Disialic acid in one antenna was also found by Saraswat et al. [75] in ten blood plasma proteins with a linkage between two sialic acids (except for clusterin protein, where the linkage is also to a HexNAc). However, our study had no N-glycosylation sites in common with the work of Saraswat et al. Only the protein α-1-antichymotrypsin was identified in both studies with a disialylated antennary group in different N-glycosites [75]. Sturiale et al. [51] compared the N-glycomes from a patient and two controls (parents) and identified the NeuAcα2-3Galβ1-3(NeuAcα2-6)GlcNAc epitope in isolated proteins (transferrin, α-1-antitrypsin, IgG, and α-1-acid glycoprotein) and serum N-glycome. The production of the GlcNAc-NeuAc linkage in human N-glycans has been studied previously [87], [88]. Our study provides evidence of the existence of this glycoepitope in the low-abundant blood plasma N-glycoproteome.
Sulfation and phosphorylation are two post-glycosylational modifications that add a negative charge to an N-glycan. The N-glycopeptides with these types of glycan modifications (as well as sialylated N-glycans) are better detected by methods that enrich anionic molecules [89], [90]. Nevertheless, our study showed that, while none of these modified N-glycopeptides were detected through the analysis of the untreated HBP, their detection was possible after applying the multi-step sample-preparation workflow. N-glycopeptides featuring sulfated and phosphorylated N-glycans were detected for proteins with reported concentrations ranging from 4.17 × 10⁸ down to 1.55 × 10⁵ pg·mL⁻¹ (PPD) [47]. Our results show that sulfated N-glycopeptides were more frequently present than phosphorylated N-glycopeptides. However, identifications of N-glycopeptides holding phosphorylated N-glycans always showed corresponding oxonium marker ions, which was not the case for sulfated N-glycopeptides. It seems that phosphate-containing fragment ions are more stable than sulfate-containing fragment ions. This observation is in agreement with Zhang et al. [91], who found that the abundance of Hex1Phospho1 [M-H2O+H]+ oxonium ions was higher than that of Hex1Sulfo1 [M+H]+ when using low fragmentation energy. Of the observed sulfated N-glycopeptide identifications, only 17% showed the oxonium ion HexNAc1Sulfo1 [M+H]+ within their MS2 spectra, probably due to the short lifetime of sulfated fragment ions in MS analysis [53]. Recent glycomic work in our group confirmed the presence of HexNAc sulfation in N-glycans released from human serum IgA [42], [43].
Cajic et al. [42] established a versatile glycomic workflow that enables the isolation of any N-glycan of interest via HILIC-high-performance liquid chromatography (HPLC; by means of a removable fluorescent dye), followed by multiple analyses (i.e., matrix-assisted laser desorption/ionization time-of-flight MS (MALDI-TOF-MS) and multiplexed capillary gel electrophoresis with laser-induced fluorescence detection (xCGE-LIF)). Employing this glycomic workflow, Cajic et al. [42] identified a sulfated N-glycan released from human serum IgA, for which Chuzel et al. [43] further determined the presence of GlcNAc-6-SO4 using a highly specific sulfatase found via functional metagenomics.
Spurious sulfated N-glycopeptide identifications might be caused by the assignment of sulfation plus two HexNAc instead of three hexoses (Hex3 = 486.1585 Da and HexNAc2Sulfo1 = 486.1156 Da). Shu et al. [70] detected phosphorylated and sulfated N-glycans in many blood plasma proteins, including some proteins also found in our work. In contrast to our study, all the sulfated N-glycans identified in their study showed hexose sulfation instead of HexNAc sulfation. A previous study showed that N-glycans holding a sulfated galactose (e.g., Gal-3-sulfate) do not generate the ion Hex1Sulfo1 [M+H]+; however, these sulfated N-glycans generate HexNAc1Hex1Sulfo1 [M+H]+ (scarcely) and a sulfate neutral loss between the Y ion including galactose and the consecutive sulfated-Gal Y ion [35]. This suggests that some of our sulfated N-glycopeptide identifications lacking sulfated fragment ions might correspond to galactose-sulfated N-glycopeptides; however, it is difficult to collect this evidence for all identifications, since it is necessary to manually annotate the Y ions observed in each HCD.low MS2 spectrum. During the manual validation of our data in the HCD.step MS2 spectra, no HexNAc1Hex1Sulfo1 [M+H]+ oxonium ions were detected; only HexNAc1Sulfo1 [M+H]+ was found. After de novo sequencing in the HCD.low MS2 spectra, HexNAc sulfation was confirmed in the antennae, most likely corresponding to sulfated-6-GlcNAc (Fig. 7).
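The near-isobaric pair mentioned at the start of this paragraph (Hex3 versus HexNAc2Sulfo1) can be checked with simple arithmetic. The following minimal sketch assumes standard monoisotopic residue masses for Hex, HexNAc, and SO3; these constants, the helper name, and the 3000 Da example precursor mass are illustrative assumptions and are not taken from the article's data.

```python
# Monoisotopic residue masses in Da (standard values, assumed for this sketch).
HEX = 162.05282      # hexose residue, C6H10O5
HEXNAC = 203.07937   # N-acetylhexosamine residue, C8H13NO5
SULFO = 79.95682     # sulfo group, SO3


def ppm_difference(mass_a, mass_b, reference_mass):
    """Absolute mass difference expressed in ppm of a reference mass."""
    return abs(mass_a - mass_b) / reference_mass * 1e6


hex3 = 3 * HEX                          # three hexoses
hexnac2_sulfo1 = 2 * HEXNAC + SULFO     # two HexNAc plus one sulfate
delta = hex3 - hexnac2_sulfo1           # mass gap between the two assignments

# Hypothetical intact glycopeptide mass, used only to put the gap in perspective.
example_precursor_mass = 3000.0
ppm = ppm_difference(hex3, hexnac2_sulfo1, example_precursor_mass)

print(f"Hex3           = {hex3:.4f} Da")
print(f"HexNAc2Sulfo1  = {hexnac2_sulfo1:.4f} Da")
print(f"difference     = {delta:.4f} Da ({ppm:.1f} ppm at {example_precursor_mass:.0f} Da)")
```

Under these assumptions the two compositions differ by about 0.043 Da, which is roughly 14 ppm for a 3000 Da precursor. Whether a search can tell them apart on precursor mass alone therefore depends on the mass tolerance used, which is why fragment-ion evidence matters here.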
The majority of the sulfated N-glycopeptides found in our study feature diantennary, mono- or di-sialylated, and sometimes core-fucosylated N-glycans with LacNAc extensions, resembling the LacNAc sulfated glycoepitope: 6-sulfo sialyl Lewis X in its defucosylated form. The N-glycopeptide identifications containing confirmed sulfated N-glycans derive from glycoproteins involved in blood coagulation, fibrinolysis, hemostasis, inflammatory response, mineral balance, osteogenesis, complement pathway, apoptosis, and innate immunity [92].
LacNAc sulfated N-glycans have various implications in immunology. On the one hand, 6-sulfo sialyl Lewis X (with GlcNAc-6-SO4) is a ligand for L-selectin, a cell adhesion molecule for the tethering and trafficking of lymphocytes through the peripheral nodes [38]. On the other hand, the structural isomer, 6′-sulfo sialyl Lewis X (with Gal-6-SO4), is a primary ligand of Siglec-8. The cross linkage between 6′-sulfo sialyl Lewis X and Siglec-8 induces histamine and prostaglandin D2 in mast cells, whereas it induces apoptosis in eosinophils [36], [37]. A glycomic analysis performed by Yamada et al. [20] on the serum of patients with pancreatic cancer showed that the abundance of sulfated N-glycans was increased compared with that of the healthy controls. In contrast, the study showed that the amount of phosphorylated N-glycans remained stable, emphasizing the importance of reliably discriminating between both N-glycan modifications in pathophysiology.
Sleat et al. [39] were the first to detect Man-6-P in high-abundant plasma glycoproteins. They calculated the relative fraction of proteins with phosphorylated N-glycans in blood plasma and compared it with the fraction of proteins with phosphorylated N-glycans from lysosomes, concluding that phosphorylated N-glycans in blood plasma proteins exist only in trace amounts. It is known that Man-6-P is recognized by transporters (Man-6-P receptors) that carry the Man-6-P-modified glycoprotein to the lysosomes [39]. These receptors, which are expressed in all human cells and tissues, are present not only intracellularly in the Golgi apparatus and endosomes but also extracellularly, in cell membranes [40]. In the extracellular context, Man-6-P receptors such as CD222 (a receptor from the P-lectin family) are involved in protein internalization, trafficking, lysosomal biogenesis, apoptosis, cell migration, and the regulation of cell growth [40]. Overall, this suggests that interactions between mannose-phosphorylated N-glycans and binding molecules such as CD222 are essential in physiology [40], [41].
Glucuronidation is another N-glycan modification not regularly explored in the clinical context. N-glycopeptides containing glucuronic acid were observed after the manual curation of incorrectly assigned N-glycopeptide identifications. The incorrect matches might have resulted from the assignment of two NeuAc instead of two HexNAc plus one GlcA residue, which have the same atomic composition in total (2NeuAc = 2[C11H17O8N] = C22H34O16N2 and 2HexNAc+1GlcA = 2[C8H13O5N]+1[C6H8O6] = C22H34O16N2 = 582.1908 Da). Huffman et al. [21] observed glucuronidated N-glycans in the blood plasma N-glycome of different European populations. Yamada et al. [20] reported a reduced relative abundance of glucuronidated N-glycans from the serum of patients with pancreatic cancer, compared with healthy controls. Sulfated glucuronic acid is more common in brain N-glycans as a key component of the glycoepitope HNK-1 (SO4-3GlcAβ1-3Galβ1-4GlcNAc). HNK-1 influences neuronal functions such as adhesion, cell recognition, migration, preferential motor-re-innervation, synaptic plasticity, and post-trauma regeneration in the peripheral and central nervous systems [93], [94]. Laminins and cadherin-2 are HNK-1 binding proteins [94]. Nevertheless, it is unclear whether this is relevant to the biological role of non-sulfated HNK-1.
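The isobaric ambiguity between two NeuAc residues and two HexNAc plus one GlcA, described earlier in this paragraph, can be verified by summing elemental compositions. The sketch below is illustrative only: the residue formulas are those quoted in the text, while the atomic masses are standard monoisotopic values and the helper names are assumptions made for the example.

```python
from collections import Counter

# Residue elemental compositions as quoted in the text.
NEUAC = Counter({"C": 11, "H": 17, "O": 8, "N": 1})
HEXNAC = Counter({"C": 8, "H": 13, "O": 5, "N": 1})
GLCA = Counter({"C": 6, "H": 8, "O": 6})

# Monoisotopic atomic masses in Da (standard values, assumed for this sketch).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "N": 14.00307401}


def combine(*parts):
    """Sum elemental compositions; each part is a (residue, count) pair."""
    total = Counter()
    for residue, count in parts:
        for element, n in residue.items():
            total[element] += n * count
    return total


def monoisotopic_mass(composition):
    """Monoisotopic mass of an elemental composition in Da."""
    return sum(ATOM_MASS[element] * n for element, n in composition.items())


two_neuac = combine((NEUAC, 2))
two_hexnac_glca = combine((HEXNAC, 2), (GLCA, 1))

same_formula = two_neuac == two_hexnac_glca
mass = monoisotopic_mass(two_neuac)
print(dict(two_neuac), same_formula, f"{mass:.4f} Da")
```

Because the two assignments share one elemental formula, no mass accuracy setting can separate them at the precursor level; only fragment ions can distinguish the glucuronidated composition from the disialylated one.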
Two new N-glycan building blocks with masses of 245.0524 and 259.0672 Da were identified, attached to complex diantennary monosialylated N-glycans. These masses are unrelated to ketodeoxynononic acid, another atypical residue, identified by Wang et al. [95] as a capping sugar in N-glycans from human prostate-specific antigen purified from seminal fluid. During our literature search, we could not find any reported residues that resemble such N-glycan building blocks in humans. A limitation of this observation is the lack of an orthogonal method to describe the molecular structure. Interestingly, both rare N-glycan building blocks were found in prothrombin (glycosylation site N121). This N-glycosylation site corresponds to prothrombin fragment region 1 (also called kringle-1), which is important for calcium-mediated membrane-surface binding [96].
Bioinformatics tools able to provide a site-specific N-glycan description from the analysis of intact glycopeptides emerged 12 years ago [97]. Since then, software design has moved toward improving gPSM quality (pGlyco 2.0), visualization of results (pGlyco 2.0, glyXtoolMS, GPSeeker), and annotation of N-glycopeptide structural features (GPSeeker, glyXtoolMS, StrucGP) [26], [34], [98], [99]. Nonetheless, challenges that still prevent the accurate analysis of an N-glycoproteome include the lack of comprehensiveness, ambiguity in peptide composition or glycan structure, missing structural information, and false positives [22]. These issues hamper the detection of N-glycopeptides that are potentially useful as biomarkers or for therapy. Although manual validation can alternatively be used to detect pitfalls after a glycoproteomic search, its disadvantages include high effort and time requirements, as well as the bias caused by each data reviewer. For future developments, it would be appropriate for N-glycoproteomic software to include an N-glycan diagnostic step plus the integration of N-glycan structural features during gPSM. Missing N-glycan compositions, lack of information, or incorrect parameters are barriers hindering the full explanation of N-glycoproteomic spectra input. Our blood plasma N-glycoproteomic analysis was intended to find N-glycopeptides hidden both in the obscure niche of the low-abundant N-glycoproteome and in MS2 spectra that are neglected due to insufficient search parameters. While a fraction of “Uncertain” gPSMs was corrected (8.3% of the total) through manual validation and data reprocessing, we acknowledge through the categories “Uncertain-change N-glycan and peptide” and “Uncertain glycopeptide” that 51% of the total gPSMs are still unexplained, largely due to the low quality of the fragment ion spectra. Nonetheless, we were able to identify several previously unknown and rare N-glycans, as well as various atypical N-glycan structures that are potentially useful for the design of biotherapeutics, clinical diagnostics (i.e., biomarker discovery), or the exploration of N-glycan-protein interactions and functions. Hence, future studies could benefit glycomedicine by including the identification of rare N-glycans across a broad range of HBP samples, using the approaches showcased here. We did not conduct a quantitative analysis, as it was outside the scope of this study, and fractionation might limit the accuracy of the analyses. Therefore, we hope that new studies aiming to expand diagnostic opportunities using our results can conduct quantitative analyses that target the features of the low-abundant HBP glycoproteome reported here.
5. Conclusions
Glycoproteomic sample preparation, LC-MS measurement, and data analysis software have evolved in recent years. Nonetheless, two major limitations still need to be addressed: structural glycan elucidation and the precision of gPSMs. Blood plasma analysis is a common and challenging task for exploring the potential of new N-glycoproteomic search strategies. In this work, we developed and applied a sample preparation workflow comprising the fractionation of HAP-depleted blood plasma and glycopeptide enrichment, followed by two LC-MS/MS measurements using the fragmentation energies HCD.step and HCD.low. This workflow enables the detection of glycoproteins in a concentration range from 10⁹ down to 10³ pg·mL⁻¹, thus expanding the detection range by five orders of magnitude compared with the direct analysis of blood plasma. Validation and curation of the N-glycoproteomic search is based on a novel gPSM decision tree to critically assess mass spectral evidence on the peptide and N-glycan level. This approach makes it possible to reliably elucidate structural glycan features and to identify rare N-glycan compositions, including, for example, the presence of glucuronic acid and two rare N-glycan building blocks (245.0524 and 259.0672 Da). Furthermore, an atypical disialylated-antenna N-glycan structure containing a HexNAc-NeuAc linkage that resembles the Lewis C epitope NeuAcα2-3Galβ1-3(NeuAcα2-6)GlcNAc was identified mainly on the basis of the oxonium ion HexNAc1NeuAc1 [M+H]+. Interestingly, without HCD.low spectra annotation, NeuAc moieties in those N-glycopeptides were erroneously assumed to be spread on different antennae. Other structural groups that are difficult to discriminate, such as antenna versus core fucose, diLacNAc versus multi-antennae, bisecting GlcNAc, and phosphorylation versus sulfation, were also reliably confirmed.
In contrast to previous reports, no hexose sulfation but only HexNAc sulfation was detected with high certainty through oxonium marker ions. By making use of manual validation, it was possible to uncover pitfalls that would otherwise have obscured valuable data.
We propose that automated validation tools could be fine-tuned using the validation strategies applied here. We consider that transferring the key features of our data analysis workflow to bioinformatics tools is the most efficient way to obtain an accurate picture of the N-glycoproteome of any sample. In the future, the uncovered glycoproteomics results could be applied in the fields of biomarker discovery, biotherapeutic products, and biochemistry.
CRediT authorship contribution statement
Frania J. Zuniga-Banuelos: Writing - original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Marcus Hoffmann: Writing - review & editing, Validation, Supervision, Methodology, Conceptualization. Udo Reichl: Writing - review & editing, Supervision, Project administration, Funding acquisition. Erdmann Rapp: Writing - review & editing, Validation, Supervision, Project administration, Methodology, Funding acquisition, Conceptualization.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Erdmann Rapp is the founder and CEO of glyXera GmbH. Frania J. Zuniga-Banuelos is an employee of glyXera GmbH and the Max Planck Institute. glyXera provides high-performance glycoanalytical products and services, and holds several patents on xCGE-LIF-based glycoanalysis. Udo Reichl is a shareholder of glyXera GmbH. Marcus Hoffmann declares no conflict of interest.
Acknowledgments
The authors gratefully thank Iva Budimir for creating the visProteomics R package. Moreover, we thank Barbara Koehler and Lisa Fichtmueller for their technical support. This work was supported by the European Commission (EC) Horizon 2020 research and innovation program for Frania J. Zuniga-Banuelos and Erdmann Rapp under the project “IMforFUTURE” (H2020-MSCA-ITN/721815), and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) for Marcus Hoffmann and Erdmann Rapp under the project “The concert of dolichol-based glycosylation: from molecules to disease models” (FOR2509).
Data availability statement
All the raw data and search results produced in this study were deposited to the ProteomeXChange Consortium, dataset PXD042039 (https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD042039), through MassIVE MSV000091870 (ftp://massive.ucsd.edu/MSV000091870/).
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2024.11.039.