OpenMS  2.4.0

EPIFANY (Efficient protein inference for any peptide-protein network) is a Bayesian protein inference engine. It uses PSM (posterior) probabilities from Percolator, OpenMS' IDPosteriorErrorProbability or similar tools to calculate posterior probabilities for proteins and protein groups.

Experimental classes:
This tool is a work in progress; its usage and input requirements might change.
potential predecessor tools $ \longrightarrow $ Epifany $ \longrightarrow $ potential successor tools
PercolatorAdapter $ \longrightarrow $ Epifany $ \longrightarrow $ IDFilter

Epifany is a protein inference engine based on a Bayesian network. It currently uses the same model as Fido, with the main parameters alpha (pep_emission), beta (pep_spurious_emission) and gamma (prot_prior). If these parameters are not specified, they are trained by a grid search: the tool runs with several candidate combinations and evaluates each one by its classification performance and calibration. Grid search is slower but recommended unless you see very extreme output probabilities (e.g. many close to 1.0) or you already know good parameters (e.g. from an earlier run).

When given more than one idXML file, the tool merges them (union of proteins and concatenation of PSMs). It assumes one search engine run per input file but might work with more. Proteins need to be indexed by OpenMS's PeptideIndexer, but this is usually done before Percolator/IDPEP anyway, since target/decoy associations are already needed there. Make sure that the input PSM probabilities are not too extreme already (garbage in, garbage out).

After merging, the input probabilities are preprocessed with a low posterior probability cutoff to discard very unreliable matches. The probabilities are then aggregated by taking the maximum per peptide, and the graph is built and split into connected components.

When compiled with the OpenMP flag (enabled by default in the release binaries), the tool is multithreaded; multithreading is activated at runtime with the threads parameter. Note that peak memory requirements may rise significantly when multiple components of the graph are processed at the same time.
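The preprocessing steps described above (probability cutoff, per-peptide maximum, split into connected components) can be illustrated with a minimal Python sketch. This is not Epifany's implementation: the tuple layout of a PSM, the function names `preprocess` and `connected_components`, and the union-find approach are assumptions made for illustration. Only the cutoff semantics (PSMs at or below the cutoff are removed, default 0.001) and the maximum aggregation are taken from this page.

```python
from collections import defaultdict


def preprocess(psms, cutoff=0.001):
    """Filter PSMs by probability and keep the best probability per peptide.

    psms: iterable of (peptide, probability, parent_proteins) tuples
          (hypothetical layout, not an OpenMS data structure).
    """
    best = {}
    parents = defaultdict(set)
    for peptide, probability, proteins in psms:
        if probability <= cutoff:  # drop very unreliable matches
            continue
        best[peptide] = max(probability, best.get(peptide, 0.0))
        parents[peptide].update(proteins)
    return best, dict(parents)


def connected_components(parents):
    """Split the bipartite peptide-protein graph into connected components."""
    root = {}

    def find(node):
        root.setdefault(node, node)
        while root[node] != node:
            root[node] = root[root[node]]  # path halving
            node = root[node]
        return node

    # Join every peptide node with each of its parent protein nodes.
    for peptide, proteins in parents.items():
        for protein in proteins:
            a, b = find(("peptide", peptide)), find(("protein", protein))
            if a != b:
                root[a] = b

    components = defaultdict(set)
    for node in list(root):
        components[find(node)].add(node)
    return list(components.values())


# Hypothetical toy input: PEPD is removed because it sits exactly at the cutoff.
example_psms = [
    ("PEPA", 0.9, ["P1"]),
    ("PEPA", 0.5, ["P1"]),
    ("PEPB", 0.8, ["P1", "P2"]),
    ("PEPC", 0.7, ["P3"]),
    ("PEPD", 0.001, ["P4"]),
]
best, parents = preprocess(example_psms)
components = connected_components(parents)
```

Each component can then be processed independently, which is also why running several components in parallel threads can increase peak memory use.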

The command line parameters of this tool are:

Epifany -- Runs a Bayesian protein inference.
Version: 2.4.0-HEAD-2019-03-12 Mar 12 2019, 13:43:07, Revision: a8ff34a
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

  Epifany <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*                        Input: identification results (valid formats: 'idXML', 'consensusXML')
  -exp_design <file>                 (Currently unused) Input: experimental design (valid formats: 'tsv')
  -out <file>*                       Output: identification results with scored/grouped proteins (valid
                                     formats: 'idXML', 'consensusXML')
  -protein_fdr <option>              Additionally calculate the target-decoy FDR on protein-level based on 
                                     the posteriors (default: 'false' valid: 'true', 'false')
  -greedy_group_resolution <option>  Post-process inference output with greedy resolution of shared peptides 
                                     based on the parent protein probabilities. Also adds the resolved
                                     ambiguity groups to output. (default: 'none' valid: 'none',
                                     'remove_associations_only', 'remove_proteins_wo_evidence')
Common UTIL options:
  -ini <file>                        Use the given TOPP INI file
  -threads <n>                       Sets the number of threads allowed to be used by the TOPP tool (default:
  -write_ini <file>                  Writes the default configuration file
  --help                             Shows options
  --helphelp                         Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Parameters for the Algorithm section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
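
As a hedged illustration of how the options listed above combine, the following Python sketch assembles an Epifany command line. The helper name `build_epifany_command` and the file names are hypothetical, and passing several input files after a single -in is an assumption based on the list-valued `in` parameter. The sketch only builds and prints the command; it does not run the tool.

```python
import shlex

# Values listed as valid for -greedy_group_resolution in the help text above.
VALID_RESOLUTIONS = (
    "none",
    "remove_associations_only",
    "remove_proteins_wo_evidence",
)


def build_epifany_command(inputs, output, protein_fdr=False,
                          greedy_group_resolution="none", threads=1):
    """Return the argument list for an Epifany call (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required (-in is mandatory)")
    if greedy_group_resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"invalid greedy_group_resolution: {greedy_group_resolution!r}")
    return [
        "Epifany",
        "-in", *inputs,
        "-out", output,
        "-protein_fdr", "true" if protein_fdr else "false",
        "-greedy_group_resolution", greedy_group_resolution,
        "-threads", str(threads),
    ]


# Hypothetical file names, for illustration only.
command = build_epifany_command(
    ["run1.idXML", "run2.idXML"], "inferred.idXML", protein_fdr=True, threads=4
)
print(shlex.join(command))
```

The returned list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues with file names that contain spaces.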

INI file documentation of this tool:

+Epifany  Runs a Bayesian protein inference.
  version                       Version of the tool that generated this parameters file. (default: '2.4.0-HEAD-2019-03-12')
++1  Instance '1' section for 'Epifany'
  in                            Input: identification results (default: '[]', input file, valid formats: '*.idXML', '*.consensusXML')
  exp_design                    (Currently unused) Input: experimental design (default: '', input file, valid formats: '*.tsv')
  out                           Output: identification results with scored/grouped proteins (default: '', output file, valid formats: '*.idXML', '*.consensusXML')
  protein_fdr                   Additionally calculate the target-decoy FDR on protein-level based on the posteriors (default: 'false', valid: 'true', 'false')
  conservative_fdr              Use (D+1)/(T) instead of (D+1)/(T+D) for reporting protein FDRs. (default: 'true', valid: 'true', 'false')
  greedy_group_resolution       Post-process inference output with greedy resolution of shared peptides based on the parent protein probabilities. Also adds the resolved ambiguity groups to output. (default: 'none', valid: 'none', 'remove_associations_only', 'remove_proteins_wo_evidence')
  psm_probability_cutoff        Remove PSMs with probabilities less than or equal to this cutoff (default: '0.001')
  min_psms_extreme_probability  Set PSMs with probability lower than this to this minimum probability. (default: '0')
  max_psms_extreme_probability  Set PSMs with probability higher than this to this maximum probability. (default: '1')
  log                           Name of log file (created only when specified) (default: '')
  debug                         Sets the debug level (default: '0')
  threads                       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  no_progress                   Disables progress logging to command line (default: 'false', valid: 'true', 'false')
  force                         Overwrite tool specific checks. (default: 'false', valid: 'true', 'false')
  test                          Enables the test mode (needed for internal use only) (default: 'false', valid: 'true', 'false')
+++algorithm  Parameters for the Algorithm section
  top_PSMs                      Consider only top X PSMs per spectrum. 0 considers all. (default: '1', range: 0:∞)
  update_PSM_probabilities      (Experimental:) Update PSM probabilities with their posteriors under consideration of the protein probabilities. (default: 'true', valid: 'true', 'false')
  user_defined_priors           (Experimental:) Uses the current protein scores as user-defined priors. (default: 'false', valid: 'true', 'false')
  annotate_group_probabilities  Annotates group probabilities for indistinguishable protein groups (indistinguishable by experimentally observed PSMs). (default: 'true', valid: 'true', 'false')
++++model_parameters  Model parameters for the Bayesian network
  prot_prior                    Protein prior probability ('gamma' parameter). Negative values enable grid search for this param. (default: '-1', range: -1:1)
  pep_emission                  Peptide emission probability ('alpha' parameter). Negative values enable grid search for this param. (default: '-1', range: -1:1)
  pep_spurious_emission         Spurious peptide identification probability ('beta' parameter). Usually much smaller than emission from proteins. Negative values enable grid search for this param. (default: '-1', range: -1:1)
  pep_prior                     Peptide prior probability (experimental, should be covered by combinations of the other params). (default: '0.1', range: 0:1)
  regularize                    Regularize the number of proteins that produce a peptide together (experimental, should be activated when using higher p-norms). (default: 'false', valid: 'true', 'false')
++++loopy_belief_propagation  Settings for the loopy belief propagation algorithm.
  scheduling_type               (Not used yet) How to pick the next message: priority = based on difference to last message (higher = more important). fifo = first in first out. random_spanning_tree = message passing follows a random spanning tree in each iteration. (default: 'priority', valid: 'priority', 'fifo', 'random_spanning_tree')
  convergence_threshold         Initial threshold on the MSE difference below which a message is considered to be converged. (default: '1e-05', range: 1e-07:1)
  dampening_lambda              Initial value for how strongly messages should be updated in each step. 0 = new message overwrites old completely (no dampening), 1 = old message stays (no convergence, don't do that). In between, it is a convex combination of both. Prevents oscillations but hinders convergence. (default: '0.001', range: 1e-07:0.5)
  max_nr_iterations             (Unused, autodetermined) If not all messages converge, how many iterations should be done at max? (default: '0')
  p_norm_inference              P-norm used for marginalization of multidimensional factors. 1 == sum-product inference (all configurations vote equally) (default), <= 0 == infinity = max-product inference (only best configurations propagate). The higher the value, the more important high-probability configurations get. (default: '1')
++++param_optimize  Settings for the parameter optimization.
  aucweight                     How important is AUC vs calibration of the posteriors? 0 = maximize calibration only, 1 = maximize AUC only, between = convex combination. (default: '0.2', range: 0:1)
  conservative_fdr              Use (D+1)/(T) instead of (D+1)/(T+D) for parameter estimation. (default: 'true', valid: 'true', 'false')
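
Several parameters above are described by small formulas, which the following Python sketch spells out. It is a reading of the parameter descriptions, not code taken from Epifany: all function names are hypothetical, and how the tool actually measures AUC or calibration is not specified here. The last function shows the noisy-OR peptide emission probability as published for Fido; this page states that the same model is used, but that Epifany matches it term by term is an assumption, and pep_prior and regularize are ignored.

```python
def protein_fdr(decoys, targets, conservative=True):
    """conservative_fdr: (D+1)/T if conservative, otherwise (D+1)/(T+D)."""
    denominator = targets if conservative else targets + decoys
    if denominator <= 0:
        raise ValueError("FDR is undefined without any counted proteins")
    return (decoys + 1) / denominator


def dampen(old_message, new_message, dampening_lambda=0.001):
    """dampening_lambda: convex combination of old and new message.

    0 = new message overwrites the old one, 1 = old message stays.
    """
    return dampening_lambda * old_message + (1.0 - dampening_lambda) * new_message


def optimization_score(auc, calibration, aucweight=0.2):
    """aucweight: 0 = calibration only, 1 = AUC only, convex combination between."""
    return aucweight * auc + (1.0 - aucweight) * calibration


def peptide_presence_probability(present_parents, alpha, beta):
    """Noisy-OR as published for Fido (assumed here, see lead-in).

    alpha = pep_emission, beta = pep_spurious_emission,
    present_parents = number of parent proteins assumed present.
    """
    return 1.0 - (1.0 - beta) * (1.0 - alpha) ** present_parents
```

With no parent protein present, the last function reduces to beta, which matches the description of beta as the spurious identification probability.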