OpenMS  2.6.0

Identifies peptides in MS/MS spectra via Mascot.

• Potential predecessor tools: any signal-/preprocessing tool (in mzML format)
• Potential successor tools: IDFilter or any protein/peptide processing tool

This wrapper application retrieves peptide identifications for MS/MS spectra. It uses a local installation of the Mascot server to generate the identifications. A second wrapper, MascotAdapterOnline, performs identifications by communicating with a Mascot server over the network, so it does not have to run on the same machine as Mascot.

The minimum Mascot version supported by this wrapper is 2.1.

This wrapper can be executed in three different modes:

1. The whole ProteinIdentification process via Mascot is executed. The input file is an mzData file containing the MS/MS spectra to be identified. The results are written to an idXML output file. This mode is selected by default.

2. Only the first part of the ProteinIdentification process is performed: the MS/MS data is converted into Mascot Generic Format (mgf), which can be used directly with Mascot. From the cgi directory of the Mascot installation, a Mascot call should then look like the following:

./nph-mascot.exe 1 -commandline -f outputfilename < inputfilename

Consult your Mascot reference manual for further details.

This mode is selected by the -mascot_in option on the command line.

3. Only the second part of the ProteinIdentification process is performed: the output file of the Mascot server is translated into idXML.

This mode is selected by the -mascot_out option on the command line.

If your Mascot server is installed on the same computer as the TOPP applications, MascotAdapter can be executed in mode 1. Otherwise, the Mascot engine has to be executed manually, assisted by modes 2 and 3. The ProteinIdentification steps then look like this:

• execute MascotAdapter in mode 2
./MascotAdapter -in mzDataFile -out mascotGenericFormatFile -mascot_in
• copy mascotGenericFormatFile to your Mascot server
• call your Mascot server process:
./nph-mascot.exe 1 -commandline -f mascotOutFile < mascotGenericFormatFile
• call the script to export your output file as Mascot XML:
./export_dat.pl do_export=1 export_format=XML file=mascotOutFile _sigthreshold=0 \
_showsubset=1 show_same_sets=1 show_unassigned=0 prot_score=0 pep_exp_z=0 pep_score=0 \
pep_homol=0 pep_ident=0 pep_seq=1 show_header=1 show_queries=1 pep_rank=0 > mascotXMLFile
• copy mascotXMLFile to the server on which the TOPP applications are installed
• execute MascotAdapter in mode 3
./MascotAdapter -in mascotXMLFile -out IdXMLFile -mascot_out

For mode 1, you have to specify the directory in which the Mascot server is installed by setting the option mascot_directory in the ini file. You also have to specify a folder in which the user has write permissions by setting the option temp_data_directory in the ini file. Two temporary files are created in this directory during execution and deleted when execution ends.
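
A mode 1 call might be assembled as in the following sketch. It assumes the two settings are passed on the command line as -mascot_directory and -temp_data_directory (the names listed in the parameter overview) instead of through an ini file. The function name mode1_command and all paths are hypothetical. The function only prints the command, so it can be inspected before it is run.

```shell
#!/bin/sh
# Sketch: print a mode 1 MascotAdapter command line (nothing is executed).
# mode1_command and the example paths are illustrative, not part of OpenMS.
mode1_command() {
    in_file="$1"      # mzData input
    out_file="$2"     # idXML output
    mascot_dir="$3"   # directory of the local Mascot installation
    temp_dir="$4"     # directory for the two temporary files

    # The temporary directory must exist and be writable for the current user.
    if [ ! -d "$temp_dir" ] || [ ! -w "$temp_dir" ]; then
        echo "temp_data_directory is not a writable directory: $temp_dir" >&2
        return 1
    fi

    printf 'MascotAdapter -in %s -out %s -mascot_directory %s -temp_data_directory %s\n' \
        "$in_file" "$out_file" "$mascot_dir" "$temp_dir"
}
```

For example, mode1_command spectra.mzData ids.idXML /opt/mascot /tmp prints the full call, provided /tmp is writable. The paths here are placeholders, and paths containing spaces would need additional quoting.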

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
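
Such a conversion could look like the sketch below. It assumes that IDFileConverter accepts the usual TOPP -in and -out options and infers the formats from the file extensions. The function name convert_id_command and the file names are hypothetical, and the command is only printed, not run.

```shell
#!/bin/sh
# Sketch: print an IDFileConverter call for an mzid <-> idXML conversion.
# convert_id_command is an illustrative helper, not part of OpenMS.
convert_id_command() {
    in_file="$1"
    out_file="$2"

    # Accept only the two directions mentioned in the note above.
    case "${in_file##*.}:${out_file##*.}" in
        mzid:idXML|idXML:mzid) ;;
        *) echo "expected an mzid/idXML pair: $in_file $out_file" >&2
           return 1 ;;
    esac

    printf 'IDFileConverter -in %s -out %s\n' "$in_file" "$out_file"
}
```

Calling convert_id_command search.idXML search.mzid prints the command for exporting MascotAdapter results to mzIdentML.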

The command line parameters of this tool are:

```
MascotAdapter -- Annotates MS/MS spectra using Mascot.
Version: 2.6.0 Sep 30 2020, 12:54:34, Revision: c26f752
To cite OpenMS:
Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:

Options (mandatory options marked with '*'):
-in <file>*                      Input file in mzData format.
                                 Note: In mode 'mascot_out' a Mascot results file (.mascotXML) is read
                                 (valid formats: 'mzData', 'mascotXML')
-out <file>*                     Output file in idXML format.
                                 Note: In mode 'mascot_in' Mascot generic format is written.
                                 (valid formats: 'idXML', 'mgf')
-out_type <type>                 Output file type (for TOPPAS) (valid: 'idXML', 'mgf')
-instrument <i>                  The instrument that was used to measure the spectra (default: 'Default')
-precursor_mass_tolerance <tol>  The precursor mass tolerance (default: '2.0')
-peak_mass_tolerance <tol>       The peak mass tolerance (default: '1.0')
-taxonomy <tax>                  The taxonomy (default: 'All entries' valid: 'All entries',
                                 '. . Archaea (Archaeobacteria)', '. . Eukaryota (eucaryotes)',
                                 '. . . . Alveolata (alveolates)',
                                 '. . . . . . Plasmodium falciparum (malaria parasite)',
                                 '. . . . . . Other Alveolata', '. . . . Metazoa (Animals)',
                                 '. . . . . . Caenorhabditis elegans', '. . . . . . Drosophila (fruit flies)',
                                 '. . . . . . Chordata (vertebrates and relatives)',
                                 '. . . . . . . . bony vertebrates',
                                 '. . . . . . . . . . lobe-finned fish and tetrapod clade',
                                 '. . . . . . . . . . . . Mammalia (mammals)',
                                 '. . . . . . . . . . . . . . Primates', '. .
...
ilable')
-modifications <mods>            The modifications i.e. Carboxymethyl (C)
-variable_modifications <mods>   The variable modifications i.e. Carboxymethyl (C)
-charges [1+ 2+ ...]             The different charge states (default: '[1+ 2+ 3+]')
-db <name>                       The database to search in (default: 'MSDB')
-hits <num>                      The number of hits to report (default: 'AUTO')
-cleavage <enz>                  The enzyme descriptor to the enzyme used for digestion. (Trypsin is default,
                                 None would be best for peptide input or unspecific digestion, for more
                                 please refer to your mascot server). (default: 'Trypsin' valid: 'Trypsin',
                                 'Arg-C', 'Asp-N', 'Asp-N_ambic', 'Chymotrypsin', 'CNBr', 'CNBr+Trypsin',
                                 'Formic_acid', 'Lys-C', 'Lys-C/P', 'PepsinA', 'Tryp-CNBr', 'TrypChymo',
                                 'Trypsin/P', 'V8-DE', 'V8-E', 'semiTrypsin', 'LysC+AspN', 'None')
-missed_cleavages <num>          Number of allowed missed cleavages (default: '0' min: '0')
-sig_threshold <num>             Significance threshold (default: '0.05')
-pep_homol <num>                 Peptide homology threshold (default: '1.0')
-pep_ident <num>                 Peptide ident threshold (default: '1.0')
-pep_rank <num>                  Peptide rank (default: '1')
-prot_score <num>                Protein score (default: '1.0')
-pep_score <num>                 Peptide score (default: '1.0')
-pep_exp_z <num>                 Peptide expected charge (default: '1')
-show_unassigned <num>           Show_unassigned (default: '1')
-first_dim_rt <num>              Additional information which is added to every peptide identification as
                                 metavalue if set > 0 (default: '0.0')
-boundary <string>               MIME boundary for mascot output format
-mass_type <type>                Mass type (default: 'Monoisotopic' valid: 'Monoisotopic', 'Average')
-mascot_directory <dir>          The directory in which mascot is located
-temp_data_directory <dir>       A directory in which some temporary files can be stored

Common TOPP options:
-ini <file>                      Use the given TOPP INI file
-threads <n>                     Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file>                Writes the default configuration file
--help                           Shows options
--helphelp                       Shows all options (including advanced)

```

INI file documentation of this tool:

Legend: * marks a required parameter.

MascotAdapter: Annotates MS/MS spectra using Mascot.
version (default: 2.6.0): Version of the tool that generated this parameters file.
Section 1: Instance '1' section for 'MascotAdapter'

• in*: input file in mzData format. Note: In mode 'mascot_out' a Mascot results file (.mascotXML) is read. Type: input file. Valid formats: *.mzData, *.mascotXML
• out*: output file in idXML format. Note: In mode 'mascot_in' Mascot generic format is written. Type: output file. Valid formats: *.idXML, *.mgf
• out_type: output file type (for TOPPAS). Valid values: idXML, mgf
• instrument (default: Default): the instrument that was used to measure the spectra
• precursor_mass_tolerance (default: 2.0): the precursor mass tolerance
• peak_mass_tolerance (default: 1.0): the peak mass tolerance
• taxonomy (default: All entries): the taxonomy. Valid values, one per line:
All entries
. . Archaea (Archaeobacteria)
. . Eukaryota (eucaryotes)
. . . . Alveolata (alveolates)
. . . . . . Plasmodium falciparum (malaria parasite)
. . . . . . Other Alveolata
. . . . Metazoa (Animals)
. . . . . . Caenorhabditis elegans
. . . . . . Drosophila (fruit flies)
. . . . . . Chordata (vertebrates and relatives)
. . . . . . . . bony vertebrates
. . . . . . . . . . lobe-finned fish and tetrapod clade
. . . . . . . . . . . . Mammalia (mammals)
. . . . . . . . . . . . . . Primates
. . . . . . . . . . . . . . . . Homo sapiens (human)
. . . . . . . . . . . . . . . . Other primates
. . . . . . . . . . . . . . Rodentia (Rodents)
. . . . . . . . . . . . . . . . Mus.
. . . . . . . . . . . . . . . . . . Mus musculus (house mouse)
. . . . . . . . . . . . . . . . Rattus
. . . . . . . . . . . . . . . . Other rodentia
. . . . . . . . . . . . . . Other mammalia
. . . . . . . . . . . . Xenopus laevis (African clawed frog)
. . . . . . . . . . . . Other lobe-finned fish and tetrapod clade
. . . . . . . . . . Actinopterygii (ray-finned fishes)
. . . . . . . . . . . . Takifugu rubripes (Japanese Pufferfish)
. . . . . . . . . . . . Danio rerio (zebra fish)
. . . . . . . . . . . . Other Actinopterygii
. . . . . . . . Other Chordata
. . . . . . Other Metazoa
. . . . Dictyostelium discoideum
. . . . Fungi
. . . . . . Saccharomyces Cerevisiae (baker's yeast)
. . . . . . Schizosaccharomyces pombe (fission yeast)
. . . . . . Pneumocystis carinii
. . . . . . Other Fungi
. . . . Viridiplantae (Green Plants)
. . . . . . Arabidopsis thaliana (thale cress)
. . . . . . Oryza sativa (rice)
. . . . . . Other green plants
. . . . Other Eukaryota
. . Bacteria (Eubacteria)
. . . . Actinobacteria (class)
. . . . . . Mycobacterium tuberculosis complex
. . . . . . Other Actinobacteria (class)
. . . . Firmicutes (gram-positive bacteria)
. . . . . . Bacillus subtilis
. . . . . . Mycoplasma
. . . . . . Streptococcus Pneumoniae
. . . . . . Streptomyces coelicolor
. . . . . . Other Firmicutes
. . . . Proteobacteria (purple bacteria)
. . . . . . Agrobacterium tumefaciens
. . . . . . Campylobacter jejuni
. . . . . . Escherichia coli
. . . . . . Neisseria meningitidis
. . . . . . Salmonella
. . . . . . Other Proteobacteria
. . . . Other Bacteria
. . Viruses
. . . . Hepatitis C virus
. . . . Other viruses
. . Other (includes plasmids and artificial sequences)
. . unclassified
. . Species information unavailable
• modifications (default: []): the modifications i.e. Carboxymethyl (C)
• variable_modifications (default: []): the variable modifications i.e. Carboxymethyl (C)
• charges (default: [1+, 2+, 3+]): the different charge states
• db (default: MSDB): the database to search in
• hits (default: AUTO): the number of hits to report
• cleavage (default: Trypsin): The enzyme descriptor to the enzyme used for digestion. (Trypsin is default, None would be best for peptide input or unspecific digestion, for more please refer to your mascot server). Valid values: Trypsin, Arg-C, Asp-N, Asp-N_ambic, Chymotrypsin, CNBr, CNBr+Trypsin, Formic_acid, Lys-C, Lys-C/P, PepsinA, Tryp-CNBr, TrypChymo, Trypsin/P, V8-DE, V8-E, semiTrypsin, LysC+AspN, None
• missed_cleavages (default: 0): number of allowed missed cleavages. Range: 0:∞
• sig_threshold (default: 0.05): significance threshold
• pep_homol (default: 1.0): peptide homology threshold
• pep_ident (default: 1.0): peptide ident threshold
• pep_rank (default: 1): peptide rank
• prot_score (default: 1.0): protein score
• pep_score (default: 1.0): peptide score
• pep_exp_z (default: 1): peptide expected charge
• show_unassigned (default: 1): show_unassigned
• first_dim_rt (default: 0.0): additional information which is added to every peptide identification as metavalue if set > 0
• boundary: MIME boundary for mascot output format
• mass_type (default: Monoisotopic): mass type. Valid values: Monoisotopic, Average
• mascot_directory: the directory in which mascot is located
• temp_data_directory: a directory in which some temporary files can be stored
• log: Name of log file (created only when specified)
• debug (default: 0): Sets the debug level
• threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool
• no_progress (default: false): Disables progress logging to command line. Valid values: true, false
• force (default: false): Overrides tool-specific checks. Valid values: true, false
• test (default: false): Enables the test mode (needed for internal use only). Valid values: true, false

You can specify the following Mascot parameters via the ini file: precursor_mass_tolerance (the peptide mass tolerance), peak_mass_tolerance (the MS/MS tolerance), taxonomy (restriction to a certain subset of the database), modifications, variable_modifications, charges (the possible charge variants), db (the database to search in), hits (number of hits), cleavage (the cleavage enzyme), missed_cleavages (number of missed cleavages) and mass_type (Monoisotopic or Average).
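
One way to work with the ini file is sketched below, using the -write_ini and -ini options from the common TOPP options. It assumes that -in and -out can still be given on the command line next to -ini. The function name ini_workflow and the file names are hypothetical, and the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch: print the two steps of an ini-based run (nothing is executed).
# ini_workflow and the file names passed to it are illustrative.
ini_workflow() {
    ini_file="$1"   # TOPP INI file to create and later edit
    in_file="$2"    # mzData input
    out_file="$3"   # idXML output

    # Step 1: write the default configuration, then edit values such as
    # precursor_mass_tolerance, db or cleavage in that file.
    printf 'MascotAdapter -write_ini %s\n' "$ini_file"

    # Step 2: run the search with the edited configuration.
    printf 'MascotAdapter -ini %s -in %s -out %s\n' \
        "$ini_file" "$in_file" "$out_file"
}
```

The same parameters are also listed as command line options (for example -precursor_mass_tolerance or -missed_cleavages), so short experiments may not need an ini file at all.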

Known problems with Mascot server execution:

• Error message: "FATAL_ERROR: M00327 The ms-monitor daemon/service is not running, please start it."

• Possible explanations:
• Your ms-monitor really is not running. Consult your Mascot reference manual for details about starting the Mascot server.
• The file mascot/data/mascot.control (assuming Mascot is installed in the directory mascot) is not writable for the current user. In that case you cannot use the Mascot server via the shell and receive the above error message. To fix this, change the permissions of mascot/data/mascot.control so that the current user can write to it.
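
The second explanation can be checked from the shell before starting a search. The following sketch assumes the layout described above (data/mascot.control below the Mascot directory). The function name check_mascot_control is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether mascot.control is writable for the current user.
# check_mascot_control is an illustrative helper, not part of Mascot or OpenMS.
check_mascot_control() {
    control_file="$1/data/mascot.control"   # $1 = Mascot installation directory

    if [ ! -e "$control_file" ]; then
        echo "missing: $control_file"
        return 2
    elif [ ! -w "$control_file" ]; then
        echo "not writable: $control_file"
        return 1
    fi
    echo "ok: $control_file"
}
```

If the check reports "not writable", the permissions of that file need to be changed (for example by an administrator using chmod or chown) before MascotAdapter can call Mascot from the shell.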
Todo:
This adapter uses antiquated internal methods and needs to be updated, e.g. use MascotGenericFile.h instead of MascotInfile.h.