OpenMS
FileFilter

Extracts portions of the data from an mzML, featureXML or consensusXML file.

pot. predecessor tools → FileFilter → pot. successor tools
  • predecessor: any tool yielding output in mzML, featureXML or consensusXML format
  • successor: any tool that profits from reduced input

With this tool it is possible to extract m/z, retention time and intensity ranges from an input file and to write all data that lies within the given ranges to an output file.
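The range extraction described above can be pictured with a minimal Python sketch. It is not the tool's implementation: the `Peak` type, the `filter_peaks` helper and the choice of inclusive bounds are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# (lower, upper); treating both bounds as inclusive is an assumption of this sketch
Range = Tuple[float, float]


@dataclass(frozen=True)
class Peak:
    """Hypothetical stand-in for one data point of the input file."""
    rt: float         # retention time
    mz: float         # mass-to-charge ratio
    intensity: float  # signal intensity


def in_range(value: float, bounds: Range) -> bool:
    lower, upper = bounds
    return lower <= value <= upper


def filter_peaks(peaks: Iterable[Peak], rt: Range, mz: Range, intensity: Range) -> List[Peak]:
    """Keep only the peaks that lie inside all three ranges."""
    return [
        p for p in peaks
        if in_range(p.rt, rt) and in_range(p.mz, mz) and in_range(p.intensity, intensity)
    ]


peaks = [
    Peak(rt=100.0, mz=450.2, intensity=1200.0),  # inside all ranges
    Peak(rt=250.0, mz=450.2, intensity=1200.0),  # outside the RT range
    Peak(rt=120.0, mz=900.0, intensity=1200.0),  # outside the m/z range
    Peak(rt=130.0, mz=500.0, intensity=10.0),    # below the intensity range
]
kept = filter_peaks(
    peaks,
    rt=(0.0, 200.0),
    mz=(400.0, 800.0),
    intensity=(100.0, float("inf")),
)
print(kept)
```

Only data that satisfies every range at once survives, which matches the "lies within the given ranges" wording above.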

Depending on the input file type, additional specific operations are possible:

  • mzML
    • extract spectra of a certain MS level
    • filter by signal-to-noise estimation
    • filter by scan mode of the spectra
    • filter by scan polarity of the spectra
    • remove MS2 scans whose precursor matches identifications (from an idXML file in 'id:blacklist')
  • featureXML
    • filter by feature charge
    • filter by feature size (number of subordinate features)
    • filter by overall feature quality
  • consensusXML
    • filter by size (number of elements in consensus features)
    • filter by consensus feature charge
    • filter by map (extracts specified maps and re-evaluates consensus centroid)
      e.g. FileFilter -map 2 3 5 -in file1.consensusXML -out file2.consensusXML
      If a single map is specified, the feature itself can be extracted.
      e.g. FileFilter -map 5 -in file1.consensusXML -out file2.featureXML
  • featureXML / consensusXML:
    • remove items with a certain meta value annotation, using >, < and = comparisons. List types are compared by length, not content. Integer, Double and String values are compared using their built-in operators.
    • filter sequences, e.g. "LYSNLVER" or the modification "(Phospho)"
      e.g. FileFilter -id:sequences_whitelist Phospho -in file1.consensusXML -out file2.consensusXML
    • filter accessions, e.g. "sp|P02662|CASA1_BOVIN"
    • remove features with annotations
    • remove features without annotations
    • remove unassigned peptide identifications
    • keep only the best-scoring identification for features with multiple peptide identifications
      e.g. FileFilter -id:remove_unannotated_features -id:remove_unassigned_ids -id:keep_best_score_id -in file1.featureXML -out file2.featureXML
    • remove features with id clashes (different sequences mapped to one feature)
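The meta value comparison rule from the list above (lists compared by length, other types by their built-in operators) can be sketched in Python as follows. The `compare_meta` helper is hypothetical and only mirrors the rule as stated. How the tool treats mixed types or items without the annotation is not covered by the text above, so the sketch does not model it.

```python
import operator

# The three comparisons named in the documentation
_OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def compare_meta(value, op, reference):
    """Compare a meta value against a reference value.

    Two lists are compared by their length, not by their content.
    Integers, floats and strings use Python's built-in operators,
    standing in for the tool's own Integer, Double and String types.
    """
    if op not in _OPERATORS:
        raise ValueError(f"unsupported comparison: {op!r}")
    if isinstance(value, list) and isinstance(reference, list):
        value, reference = len(value), len(reference)
    return _OPERATORS[op](value, reference)


print(compare_meta([1, 2, 3], "=", [7, 8, 9]))  # same length, different content
print(compare_meta(["a"], ">", ["b", "c"]))     # length 1 versus length 2
print(compare_meta(5, ">", 3))                  # integer comparison
print(compare_meta("abc", "<", "abd"))          # lexicographic string comparison
```

The first call shows the practical consequence of the length rule: two lists with entirely different entries still count as equal when they have the same number of elements.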

The priority of the id-flags is (decreasing order): remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist
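As a rough illustration of this ordering, the Python sketch below sorts a set of requested id flags into the sequence in which they would take effect. The `application_order` helper is hypothetical. The relative order of two flags within the same stage is not stated above, so the sketch simply keeps the order in which they are written. Flags that the priority list does not mention (for example remove_unassigned_ids) are left out for the same reason.

```python
from typing import Iterable, List

# Stages in decreasing priority, as listed in the documentation
ID_FLAG_PRIORITY = [
    ("remove_annotated_features", "remove_unannotated_features"),
    ("remove_clashes",),
    ("keep_best_score_id",),
    ("sequences_whitelist", "accessions_whitelist"),
]


def application_order(flags: Iterable[str]) -> List[str]:
    """Return the requested flags sorted by documented priority.

    Flags that do not appear in ID_FLAG_PRIORITY are skipped.
    """
    requested = set(flags)
    return [
        flag
        for stage in ID_FLAG_PRIORITY
        for flag in stage
        if flag in requested
    ]


order = application_order(
    ["sequences_whitelist", "keep_best_score_id", "remove_unannotated_features"]
)
print(" -> ".join(order))
```

One consequence of the ordering is that a whitelist only sees the identifications that survived the earlier stages, so combining it with keep_best_score_id matches against the best-scoring identification only.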

MS2 and higher-level spectra can be filtered by precursor m/z (see 'peak_options:pc_mz_range'). This option can be combined with the 'rt' range to filter precursors by both RT and m/z. To extract an MS1 region while keeping untouched MS2 spectra, split the dataset by MS level, then apply the 'mz' option to the MS1 data and 'peak_options:pc_mz_range' to the MS2 data. Afterwards, merge the two files again. RT can be filtered at any step.
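The split, filter and merge workflow could be scripted roughly as in the Python sketch below, which only assembles and prints the command lines without running them. Several details are assumptions and should be checked against the parameter documentation: the 'peak_options:level' option name for selecting an MS level, the "min:max" syntax for range values, the use of FileMerger for the final merge step, and all intermediate file names.

```python
import shlex
from typing import List


def split_filter_merge_commands(
    in_file: str, out_file: str, mz_range: str, pc_mz_range: str
) -> List[List[str]]:
    """Build the commands for the split / filter / merge workflow.

    Range strings such as "400:800" assume a "min:max" syntax.
    """
    # Hypothetical intermediate file names
    ms1, ms2 = "ms1.mzML", "ms2.mzML"
    ms1_cut, ms2_cut = "ms1_filtered.mzML", "ms2_filtered.mzML"
    return [
        # 1. Split the dataset by MS level (option name assumed)
        ["FileFilter", "-in", in_file, "-out", ms1, "-peak_options:level", "1"],
        ["FileFilter", "-in", in_file, "-out", ms2, "-peak_options:level", "2"],
        # 2. Restrict MS1 peaks by m/z
        ["FileFilter", "-in", ms1, "-out", ms1_cut, "-mz", mz_range],
        # 3. Select MS2 spectra by precursor m/z; their peaks stay untouched
        ["FileFilter", "-in", ms2, "-out", ms2_cut,
         "-peak_options:pc_mz_range", pc_mz_range],
        # 4. Merge the two files again (merging tool assumed)
        ["FileMerger", "-in", ms1_cut, ms2_cut, "-out", out_file],
    ]


commands = split_filter_merge_commands("run.mzML", "region.mzML", "400:800", "400:800")
for command in commands:
    print(shlex.join(command))
```

The MS2 step deliberately omits the 'mz' option: applying it to MS2 data would cut fragment peaks out of the spectra, which is what the split is meant to avoid.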

Note
For filtering peptide/protein identification data, see the IDFilter tool.
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

INI file documentation of this tool:

For the parameters of the S/N algorithm section see the class documentation there:
peak_options:sn