OpenMS
Computes confidence scores for OpenSwath results.
| potential predecessor tools | → OpenSwathConfidenceScoring → | potential successor tools |
|---|---|---|
| OpenSwathAnalyzer | | OpenSwathFeatureXMLToTSV |
This is an implementation of the SRM scoring algorithm described in:
Malmstroem, L.; Malmstroem, J.; Selevsek, N.; Rosenberger, G. & Aebersold, R.: Automated workflow for large-scale selected reaction monitoring experiments. J. Proteome Res., 2012, 11, 1644-1653.
It has been adapted for the scoring of OpenSwath results.
The algorithm compares SRM/MRM features (peak groups) to assays and computes scores for the agreements. Every feature is compared not only to the "true" assay that was used to acquire the corresponding ion chromatograms, but also to a number (parameter `decoys`) of unrelated, but real, assays selected at random from the assay library (parameter `lib`). This serves to establish a background distribution of scores, against which the significance of the "true" score can be evaluated. The final confidence value of a feature is the local false discovery rate (FDR), calculated as the fraction of decoy assays that score higher than the "true" assay against the feature. In the output feature map, every feature is annotated with its local FDR in the meta value "local_FDR" (a "userParam" element in the featureXML), and its overall quality is set to "1 - local_FDR".
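To make the local FDR concrete, here is a minimal Python sketch, assuming the scores of one feature against its "true" assay and against each decoy assay are already available as plain floats. The function names are hypothetical and not part of OpenMS. Decoys that exactly tie the "true" score are not counted as higher; that tie rule is an assumption, since the text above does not specify it.

```python
from typing import Dict, Sequence


def local_fdr(true_score: float, decoy_scores: Sequence[float]) -> float:
    """Fraction of decoy assays scoring strictly higher than the "true" assay.

    Assumption: ties with the true score are NOT counted as higher.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    n_higher = sum(1 for s in decoy_scores if s > true_score)
    return n_higher / len(decoy_scores)


def annotate(true_score: float, decoy_scores: Sequence[float]) -> Dict[str, float]:
    """Values attached to a feature: its local FDR and overall quality."""
    fdr = local_fdr(true_score, decoy_scores)
    return {"local_FDR": fdr, "quality": 1.0 - fdr}


if __name__ == "__main__":
    # Made-up scores for one feature: one of four decoys beats the true assay.
    print(annotate(0.8, [0.2, 0.9, 0.5, 0.1]))
```

In this example one of the four decoys outscores the "true" assay, so the feature receives a local FDR of 0.25 and a quality of 0.75.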
The agreement of a feature and an assay is assessed based on the difference in retention time (RT) and on the deviation of relative transition intensities. The score S is computed using a binomial generalized linear model (GLM) of the form:
\[ S = \frac{1}{1 + \exp(-(a + b \cdot \Delta_{RT}^2 + c \cdot d_{int}))} \]
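The score formula can be evaluated directly. The following sketch (the function name is hypothetical, not the OpenMS API) computes \( S \) for given \( \Delta_{RT} \), \( d_{int} \), and coefficients. It evaluates the logistic function in a numerically stable form, so that a large negative linear predictor does not overflow `math.exp`. The coefficient values in the example are made up, with negative \( b \) and \( c \) so that larger deviations lower the score; they are not the tool's defaults.

```python
import math


def glm_score(delta_rt: float, d_int: float, a: float, b: float, c: float) -> float:
    """Logistic GLM score S = 1 / (1 + exp(-(a + b * delta_rt**2 + c * d_int)))."""
    eta = a + b * delta_rt ** 2 + c * d_int  # linear predictor; delta_rt enters squared
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    # Equivalent form for negative eta that avoids overflow in exp(-eta).
    e = math.exp(eta)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Made-up coefficients for illustration only (NOT the GLM:* defaults).
    a, b, c = 3.0, -0.03, -8.0
    print(glm_score(delta_rt=2.0, d_int=0.1, a=a, b=b, c=c))   # close match
    print(glm_score(delta_rt=20.0, d_int=0.8, a=a, b=b, c=c))  # poor match
```

Both branches compute the same sigmoid; choosing the branch by the sign of the linear predictor only keeps the exponent non-positive.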
The meanings of the model terms are as follows:
- \( \Delta_{RT} \): Observed retention times are first mapped to the scale of the assays (parameter `trafo`), then all RTs are scaled to the range 0 to 100 (based on the lowest and highest RT in the assay library). \( \Delta_{RT} \) is the absolute difference of the scaled RTs; note that it is squared in the scoring model.
- \( d_{int} \): To compute the intensity distance, the n (advanced parameter `transitions`) most intense transitions of the feature are selected. For comparison against the "true" assay, the same transitions are taken from the assay; for a decoy assay, the same number of its most intense transitions is used instead. Transition intensities are scaled to a total of 1 per feature/assay and ordered by product (Q3) m/z. The Manhattan distance between the two intensity vectors is then calculated (Malmstroem et al. used the RMSD instead, which was replaced here so that the distance is independent of the number of transitions).
- \( a, b, c \): Model coefficients, stored in the advanced parameters `GLM:intercept`, `GLM:delta_rt`, and `GLM:dist_int`. The default values were estimated from the training dataset used in the Malmstroem et al. study, reprocessed with the OpenSwath pipeline.
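The two model inputs can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the OpenMS implementation: observed RTs are assumed to be already mapped to the assay scale (the `trafo` step is not shown), transitions are a hypothetical dictionary of `(product m/z, intensity)` tuples keyed by transition ID, a decoy assay's transitions are paired with the feature's by position after sorting each side by m/z, and a decoy assay is assumed to have at least as many transitions as were selected from the feature.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: transition ID -> (product m/z, intensity).
Transitions = Dict[str, Tuple[float, float]]


def scale_rt(rt: float, lib_rt_min: float, lib_rt_max: float) -> float:
    """Map an RT (already on the assay scale) to 0-100 using the library RT range."""
    return 100.0 * (rt - lib_rt_min) / (lib_rt_max - lib_rt_min)


def delta_rt(feature_rt: float, assay_rt: float,
             lib_rt_min: float, lib_rt_max: float) -> float:
    """Absolute difference of the scaled RTs (squared later in the GLM)."""
    return abs(scale_rt(feature_rt, lib_rt_min, lib_rt_max)
               - scale_rt(assay_rt, lib_rt_min, lib_rt_max))


def most_intense(transitions: Transitions, n: int) -> List[str]:
    """IDs of the n most intense transitions."""
    return sorted(transitions, key=lambda t: transitions[t][1], reverse=True)[:n]


def normalized_by_mz(selected: List[Tuple[float, float]]) -> List[float]:
    """Scale intensities to a total of 1 and order them by product m/z."""
    total = sum(intensity for _, intensity in selected)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [intensity / total for _, intensity in sorted(selected)]


def intensity_distance(feature: Transitions, assay: Transitions,
                       n: int, true_assay: bool) -> float:
    """Manhattan distance between normalized, m/z-ordered intensity vectors."""
    feature_ids = most_intense(feature, n)
    if true_assay:
        assay_ids = feature_ids  # same transitions as selected from the feature
    else:
        if len(assay) < len(feature_ids):
            raise ValueError("decoy assay has too few transitions")
        assay_ids = most_intense(assay, len(feature_ids))
    f = normalized_by_mz([feature[t] for t in feature_ids])
    a = normalized_by_mz([assay[t] for t in assay_ids])
    return sum(abs(x - y) for x, y in zip(f, a))


if __name__ == "__main__":
    # Made-up example data for one feature and its "true" assay.
    feature = {"t1": (500.0, 60.0), "t2": (600.0, 30.0), "t3": (700.0, 10.0)}
    assay = {"t1": (500.0, 50.0), "t2": (600.0, 40.0), "t3": (700.0, 10.0)}
    print(delta_rt(40.0, 60.0, lib_rt_min=10.0, lib_rt_max=110.0))
    print(intensity_distance(feature, assay, n=3, true_assay=True))
```

For the example data, \( \Delta_{RT} \) is 20 and the intensity distance against the "true" assay is about 0.2. Because both vectors sum to 1, the Manhattan distance always lies between 0 and 2, whatever the number of transitions.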
In addition to the local FDRs, the score of each feature against its "true" assay is recorded in the output, in the meta value "GLM_score" of the respective feature.
The command line parameters of this tool are:
INI file documentation of this tool: