![]() |
OpenMS
|
Detects features in MS1 data based on peptide identifications.
pot. predecessor tools | → FeatureFinderIdentification → | pot. successor tools |
---|---|---|
PeakPickerHiRes (optional) | ProteinQuantifier | |
IDFilter |
Reference:
Weisser & Choudhary: Targeted Feature Detection for Data-Dependent Shotgun Proteomics (J. Proteome Res., 2017, PMID: 28673088).
This tool detects quantitative features in MS1 data based on information from peptide identifications (derived from MS2 spectra). It uses algorithms for targeted data analysis from the OpenSWATH pipeline.
The aim is to detect features that enable the quantification of (ideally) all peptides in the identification input. This is based on the following principle: When a high-confidence identification (ID) of a peptide was made based on an MS2 spectrum from a certain (precursor) position in the LC-MS map, this indicates that the particular peptide is present at that position, so a feature for it should be detectable there.
Targeted data analysis on the MS1 level uses OpenSWATH algorithms and follows roughly the steps outlined below.
Use of inferred ("external") IDs
The situation becomes more complicated when several LC-MS/MS runs from related samples of a label-free experiment are considered. In order to quantify a larger fraction of the peptides/proteins in the samples, it is desirable to infer peptide identifications across runs. Ideally, all peptides identified in any of the runs should be quantified in each and every run. However, for feature detection of inferred ("external") IDs, the following problems arise: First, retention times may be shifted between the run being quantified and the run that gave rise to the ID. Such shifts can be corrected (see MapAlignerIdentification), but only to an extent. Thus, the RT location of the inferred ID may not necessarily lie within the RT range of the correct feature. Second, since the peptide in question was not directly identified in the run being quantified, it may not actually be present in detectable amounts in that sample, e.g. due to differential regulation of the corresponding protein. There is thus a risk of introducing false-positive features.
FeatureFinderIdentification deals with these challenges by explicitly distinguishing between internal IDs (derived from the LC-MS/MS run being quantified) and external IDs (inferred from related runs). Features derived from internal IDs give rise to a training dataset for an SVM classifier. The SVM is then used to predict which feature candidates derived from external IDs are most likely to be correct. See steps 4 and 5 below for more details.
1. Assay generation
Feature detection is based on assays for identified peptides, each of which incorporates the retention time (RT), mass-to-charge ratio (m/z), and isotopic distribution (derived from the sequence) of a peptide. Peptides with different modifications are considered different peptides. One assay will be generated for every combination of (modified) peptide sequence, charge state, and RT region that has been identified. The RT regions arise by pooling all identifications of the same peptide, considering a window of size extract:rt_window
around every RT location that gave rise to an ID, and then merging overlapping windows.
2. Ion chromatogram extraction
Ion chromatograms (XICs) are extracted from the LC-MS data (parameter in
). One XIC per isotope in an assay is generated, with the corresponding m/z value and RT range (variable, depending on the RT region of the assay).
3. Feature detection
Next feature candidates - typically several per assay - are detected in the XICs and scored. A variety of scores for different quality aspects are calculated by OpenSWATH.
4. Feature classification
Feature candidates derived from assays with "internal" IDs are classed as "negative" (candidates without matching internal IDs), "positive" (the single best candidate per assay with matching internal IDs), and "ambiguous" (other candidates with matching internal IDs). If "external" IDs were given as input, features based on them are initially classed as "unknown". Also in this case, a support vector machine (SVM) is trained on the "positive" and "negative" candidates, to distinguish between the two classes based on the different OpenSWATH quality scores (plus an RT deviation score). After parameter optimization by cross-validation, the resulting SVM is used to predict the probability of "unknown" feature candidates being positives.
5. Feature filtering
Feature candidates are filtered so that at most one feature per peptide and charge state remains. For assays with internal IDs, only candidates previously classed as "positive" are kept. For assays based solely on external IDs, the feature candidate with the highest SVM probability is selected and kept (possibly subject to the svm:min_prob
threshold).
6. Elution model fitting
Elution models can be fitted to the features to improve the quantification. For robustness, one model is fitted to all isotopic mass traces of a feature in parallel. A symmetric (Gaussian) and an asymmetric (exponential-Gaussian hybrid) model type are available. The fitted models are checked for plausibility before they are accepted.
Finally the results (feature maps, parameter out
) are returned.
The command line parameters of this tool are:
INI file documentation of this tool: