OpenMS
RTModel

Used to train a model for peptide retention time prediction or peptide separation prediction.

For retention time prediction, a support vector machine (SVM) is trained on peptide sequences and their measured retention times. For peptide separation prediction, two files must be given: one contains the positive examples (the peptides that are collected) and the other contains the negative examples (the flow-through peptides).
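A minimal sketch of how the training examples differ between the two modes, assuming peptides are plain sequence strings. The helper names and the +1/-1 class labels are hypothetical illustration choices, not RTModel's input format (RTModel reads identification files such as idXML); the retention times are made-up toy values.

```python
from typing import List, Tuple


def rt_training_set(peptides: List[str], rts: List[float]) -> List[Tuple[str, float]]:
    """Regression set for RT prediction: each peptide paired with its measured retention time."""
    if len(peptides) != len(rts):
        raise ValueError("every peptide needs exactly one retention time")
    return list(zip(peptides, rts))


def separation_training_set(positives: List[str], negatives: List[str]) -> List[Tuple[str, int]]:
    """Classification set for separation prediction: collected peptides +1, flow-through peptides -1."""
    return [(p, 1) for p in positives] + [(n, -1) for n in negatives]


# Toy data for illustration only.
rt_set = rt_training_set(["PEPTIDEK", "LLSAMR"], [25.3, 31.8])
sep_set = separation_training_set(["PEPTIDEK"], ["GGK"])
```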

The methods and applications of this model are described in the following publications:

Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.

Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.

Several SVM parameters can be changed (in the INI file or on the command line):

  • svm_type: the type of the SVM (NU_SVR or EPSILON_SVR for RT prediction; C_SVC for separation prediction)
  • kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend SVMWrapper::OLIGO for our paired oligo-border kernel (POBK)
  • border_length: border length for the POBK
  • k_mer_length: length of the signals considered in the POBK
  • sigma: the amount of positional smoothing for the POBK
  • degree: the degree parameter for the polynomial kernel
  • c: the penalty parameter of the SVM
  • nu: the nu parameter for nu-SVR
  • p: the epsilon parameter for epsilon-SVR
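The POBK compares short signals (k-mers of length k_mer_length) near the peptide termini and smooths over position differences with sigma. The following is a simplified sketch in that spirit, assuming that only k-mers inside a border of border_length residues at each end are counted and that C-terminal positions are measured from the end of the peptide. It is not the exact kernel from the publications or the OpenMS SVMWrapper implementation, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def border_kmers(seq: str, k: int, border: int) -> Dict[Tuple[str, str], List[int]]:
    """Map (side, k-mer) to the positions of that k-mer inside a border region.

    N-terminal positions are counted from the start of the sequence and
    C-terminal positions from its end, so the termini of peptides of
    different lengths are aligned with each other.
    """
    occ: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    n = len(seq)
    for i in range(n - k + 1):
        kmer = seq[i:i + k]
        if i < border:                      # k-mer starts in the N-terminal border
            occ[("N", kmer)].append(i)
        dist_to_end = n - (i + k)
        if dist_to_end < border:            # k-mer ends in the C-terminal border
            occ[("C", kmer)].append(dist_to_end)
    return occ


def border_kernel(s: str, t: str, k: int = 1, border: int = 3, sigma: float = 1.0) -> float:
    """Sum exp(-(p - q)^2 / (4 sigma^2)) over identical k-mers on the same terminal side."""
    occ_s = border_kmers(s, k, border)
    occ_t = border_kmers(t, k, border)
    total = 0.0
    for key, positions_s in occ_s.items():
        for p in positions_s:
            for q in occ_t.get(key, []):    # only k-mers present in both peptides contribute
                total += math.exp(-((p - q) ** 2) / (4 * sigma ** 2))
    return total
```

A larger sigma lets matching k-mers at shifted positions contribute more to the similarity; with a very small sigma, essentially only k-mers at identical positions count.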


The last five parameters (sigma, degree, c, nu and p) can be optimized by a cross-validation (CV) on the training set. For each parameter, you specify a start value, a step size by which the value is increased, and a final value; the tested value never exceeds the final value. For example, to perform a CV for the parameter c, set skip_cv to false in the INI file (this enables CV for all five parameters) and give the start value, step size and final value for c. This is easily done with the INIFileEditor.
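The start/step/final rule can be sketched as follows, assuming the step size is added to the current value (the text only says the value is increased by the step size; the INI parameter descriptions give the exact behavior for each parameter). The function name is hypothetical.

```python
from typing import List


def cv_grid(start: float, step: float, stop: float) -> List[float]:
    """Return start, start + step, start + 2*step, ... up to the final value `stop`."""
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    i = 0
    while True:
        value = start + i * step    # multiply instead of accumulating to limit float drift
        if value > stop + 1e-9:     # tiny tolerance so `stop` itself survives rounding error
            break
        values.append(value)
        i += 1
    return values
```

For example, `cv_grid(1, 2, 6)` tests 1, 3 and 5; the next value, 7, would exceed the final value and is skipped.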

You can also set the number of CV partitions with number_of_partitions and the number of CV runs with number_of_runs in the INI file.
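A sketch of how number_of_partitions and number_of_runs could interact, assuming that each run reshuffles the training examples into new random partitions and that each partition serves once as the test set while the remaining partitions are used for training. The function is hypothetical and only returns index partitions; it does not train a model.

```python
import random
from typing import List


def cv_partitions(n_items: int, number_of_partitions: int, number_of_runs: int,
                  seed: int = 0) -> List[List[List[int]]]:
    """For each run, shuffle the example indices and deal them into partitions."""
    rng = random.Random(seed)
    runs = []
    for _ in range(number_of_runs):
        idx = list(range(n_items))
        rng.shuffle(idx)
        # Round-robin dealing keeps partition sizes within one element of each other.
        runs.append([idx[p::number_of_partitions] for p in range(number_of_partitions)])
    return runs


runs = cv_partitions(10, 3, 2)
```

With 10 examples and 3 partitions, every run splits the data into partitions of sizes 4, 3 and 3, so each parameter combination is evaluated on number_of_runs × number_of_partitions train/test splits.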


Consequently, there are two ways to use this application:

  1. Set the SVM parameters directly: RTModel trains the SVM on the training data and stores the SVM model.
  2. Give a range for the parameters to be optimized: RTModel performs a CV to find the best parameter combination in the given ranges, then trains the SVM with these parameters on the whole training data and stores the model.
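The second option amounts to a grid search over all parameter combinations. A minimal sketch, assuming a hypothetical cv_score callback that returns the mean CV performance of one parameter combination (higher is better); the scoring metric RTModel actually uses is not described here, and the toy score below is made up.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


def grid_search(grids: Dict[str, List[float]],
                cv_score: Callable[[Params], float]) -> Tuple[Optional[Params], float]:
    """Return the parameter combination with the highest CV score, and that score."""
    names = sorted(grids)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for values in itertools.product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)  # e.g. mean performance over all runs and partitions
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Params) -> float:
    """Made-up score that peaks at c = 10, nu = 0.5 (illustration only)."""
    return -abs(params["c"] - 10) - abs(params["nu"] - 0.5)


best, score = grid_search({"c": [1, 10, 100], "nu": [0.3, 0.5, 0.7]}, toy_score)
```

After the search, the final model would be trained once on the whole training data with `best`, matching the last step of the second option.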


The model can then be used with RTPredict to predict peptide retention times or peptide separation, depending on how the model was trained.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

INI file documentation of this tool: