OpenMS
PTModel

Used to train a model for the prediction of proteotypic peptides.

The input consists of two files: One file contains the positive examples (the peptides which are proteotypic) and the other contains the negative examples (the nonproteotypic peptides).

Parts of this model have been described in the following publication:

Ole Schulz-Trieglaff, Nico Pfeifer, Clemens Gröpl, Oliver Kohlbacher and Knut Reinert. LC-MSsim - a simulation software for Liquid Chromatography Mass Spectrometry data. BMC Bioinformatics 2008, 9:423.

Several parameters of the SVM can be changed (they are specified in the ini file):

  • kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend SVMWrapper::OLIGO for our paired oligo-border kernel (POBK)
  • border_length: border length for the POBK
  • k_mer_length: length of the signals considered in the POBK
  • sigma: the amount of positional smoothing for the POBK
  • degree: the degree parameter for the polynomial kernel
  • c: the penalty parameter of the SVM
  • nu: the nu parameter for nu-SVC

The parameters sigma, degree, c, nu and p can be optimized in a cross validation (CV) to find the best values for the training set. For each parameter to be tested, you have to specify a start value, the step size by which the parameter is increased, and a final value; the tested value is never bigger than the given final value. For example, to perform a cross validation for the parameter c, you have to specify c_start, c_step_size and c_stop in the ini file. Let's say you want to perform a CV for c from 0.1 to 2 with step size 0.1: open your ini file with INIFileEditor and set c_start to 0.1, c_step_size to 0.1 and c_stop to 2.
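
The range rule described above can be illustrated with a short Python sketch. It is not part of PTModel: the function name cv_values is hypothetical, and the sketch assumes that the step size is simply added to the previous value until the final value would be exceeded.

```python
from decimal import Decimal


def cv_values(start, step_size, stop):
    """Return start, start + step_size, ... without exceeding stop.

    Hypothetical helper for illustration only; Decimal is used so that
    repeated additions of values such as 0.1 do not drift.
    """
    start, step_size, stop = (Decimal(str(v)) for v in (start, step_size, stop))
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    values = []
    current = start
    while current <= stop:  # the tested value is never bigger than stop
        values.append(float(current))
        current += step_size
    return values


# Values tested for c with c_start=0.1, c_step_size=0.1, c_stop=2
c_values = cv_values(0.1, 0.1, 2)
```

With these settings the sketch yields 20 candidate values for c, the first being 0.1 and the last being 2.0.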

If the CV should test additional parameters over a range, specify their start, step size and stop values in the same way as in the example above. Furthermore, you can specify the number of partitions for the CV with number_of_partitions and the number of runs with number_of_runs in the ini file.
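
To get a feeling for how quickly the search grows when several parameters are tested, the following Python sketch enumerates the parameter combinations. It rests on assumptions that the text above does not confirm: that every combination of the tested values is evaluated, and that each run trains one SVM per partition for each combination. The helper parameter_grid and all values shown are illustrative.

```python
from itertools import product


def parameter_grid(ranges):
    """Build every combination from a dict of name -> candidate values."""
    names = sorted(ranges)
    combos = product(*(ranges[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Illustrative candidate values, e.g. derived from *_start, *_step_size, *_stop
ranges = {
    "c": [0.1, 0.2, 0.3],
    "sigma": [1.0, 2.0],
}
grid = parameter_grid(ranges)

number_of_partitions = 10  # illustrative value
number_of_runs = 2         # illustrative value

# Assumed cost model: one training per combination, partition and run
trainings = len(grid) * number_of_partitions * number_of_runs
```

Here 3 values for c and 2 values for sigma give 6 combinations, so under the stated assumptions the CV would train 120 models. Coarse step sizes keep this number manageable.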


Consequently, there are two ways to use this application:

  1. Set the parameters of the SVM: The PTModel application will train the SVM with the training data and store the SVM model.
  2. Give a range of parameters for which a CV should be performed: The PTModel application will perform a CV to find the best parameter combination in the given range and afterwards train the SVM with the best parameters on the whole training data. Then the model is stored.
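
The two modes can be sketched in Python as follows. This is a generic illustration of selecting parameters by cross validation and then retraining on all data, not the PTModel implementation: the functions fit, cv_score and split_into_partitions are hypothetical, and a toy threshold classifier stands in for the SVM.

```python
import random


def split_into_partitions(examples, number_of_partitions, seed=0):
    """Shuffle the examples and deal them into equally sized partitions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::number_of_partitions] for i in range(number_of_partitions)]


def cv_score(examples, params, train, score, number_of_partitions):
    """Average score over all partitions, each held out once."""
    partitions = split_into_partitions(examples, number_of_partitions)
    fold_scores = []
    for i, held_out in enumerate(partitions):
        training = [e for j, part in enumerate(partitions) if j != i for e in part]
        fold_scores.append(score(train(training, params), held_out))
    return sum(fold_scores) / len(fold_scores)


def fit(examples, grid, train, score, number_of_partitions=5):
    """Mode 1: a single parameter set is used directly.
    Mode 2: the best set is chosen by CV. Both train on the whole data."""
    best = grid[0]
    if len(grid) > 1:
        best = max(grid, key=lambda p: cv_score(examples, p, train, score,
                                                number_of_partitions))
    return best, train(examples, best)


# Toy stand-in for the SVM: the "model" is just a threshold on x.
examples = [(x, x >= 5) for x in range(10)]


def train(training, params):
    return params["threshold"]


def score(model, held_out):
    return sum((x >= model) == label for x, label in held_out) / len(held_out)


grid = [{"threshold": t} for t in (2, 5, 8)]
best_params, model = fit(examples, grid, train, score)
```

In the toy data only the threshold 5 classifies every held-out example correctly, so it is selected and used for the final training on all examples.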


The stored model can then be used in PTPredict to predict the likelihood that peptides are proteotypic.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

INI file documentation of this tool: