Sequence database setup: Spectral library

Mascot Server uses NIST MSPepSearch for searching spectral libraries. Library files in MSP format can be downloaded or created using Database Manager. Several libraries from NIST and PRIDE / EBI are predefined in Database Manager. These can be enabled on your Mascot Server in a few steps:

  1. From the Library menu in Database Manager, choose Enable predefined definition
  2. Choose Enable for the library of interest
  3. Unless you wish to change the location for the local files, choose Next
  4. A default reference database will be suggested. In most cases, just choose Create
  5. The MSP file will be downloaded and converted to NIST binary format. Once it shows as In use in Database Status, the library is available for searching.

Libraries are also available from PeptideAtlas / ISB. The ISB libraries are in SpectraST *.sptxt format, which is very similar to MSP and may be supported in a future release.
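MSP is a plain-text format: each entry starts with a "Name:" line, followed by "Key: value" header lines, then a "Num peaks:" line and that many peak lines. The sketch below reads entries in that layout. It is illustrative only, not the parser used by Mascot or MSPepSearch. It assumes one peak per line, with m/z and intensity as the first two whitespace-separated values (any annotation is ignored); real libraries can vary, and the sample entry is made up.

```python
"""Minimal reader for MSP-style spectral library entries (illustrative sketch)."""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MspEntry:
    name: str  # in NIST peptide libraries typically "SEQUENCE/charge"
    fields: Dict[str, str] = field(default_factory=dict)
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def parse_msp(text: str) -> List[MspEntry]:
    entries: List[MspEntry] = []
    current = None
    remaining = 0  # peak lines still expected for the current entry
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if remaining > 0 and current is not None:
            # Assumption: one peak per line; annotations after m/z and intensity are ignored.
            parts = line.split()
            current.peaks.append((float(parts[0]), float(parts[1])))
            remaining -= 1
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.lower() == "name":
            current = MspEntry(name=value)  # a Name line starts a new entry
            entries.append(current)
        elif current is not None:
            current.fields[key] = value
            if key.lower() == "num peaks":
                remaining = int(value)
    return entries


# Hypothetical sample entry (values are invented for illustration).
SAMPLE = """Name: PEPTIDEK/2
Comment: hypothetical example
Num peaks: 2
147.11 1000 "y1/0.00"
244.17 850 "y2/0.00"
"""

if __name__ == "__main__":
    for entry in parse_msp(SAMPLE):
        print(entry.name, entry.fields, entry.peaks)
```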

If you wish to configure library files that are not predefined:

  1. From the Library menu in Database Manager, choose Create new
  2. Choose a suitable name (click on the question mark for advice on naming)
  3. Select one of the following options
    • Copy Of: Copy an existing database. You will be required to enter a new name and given the choice of copying the existing database files.
    • Use predefined definition template: Start from a predefined definition. The differences between this and enabling a predefined definition are (i) you can make changes to the configuration, (ii) the definition will not be kept up-to-date automatically.
    • New custom definition: A new custom database definition from scratch.
    • Create from search results: Create the library by importing Mascot search results, described below
  4. The next page gives an opportunity to change the location for the local files. For Copy Of, you also have the choice of copying the existing database files.
  5. If required, the next page of the wizard allows you to specify where the *.MSP file can be found. This can be a download URL or a file path, or you can copy and rename the file yourself.
  6. Once the MSP file has been copied or downloaded, you can review and modify the configuration. See the notes below for information about these settings.
  7. The MSP file will be converted to NIST binary format. Once it shows as In use in Database Status, the library is available for searching.

Configuration notes

Parse rules

For a custom definition, if the library entries include protein accessions, you must choose suitable parse rules to extract an accession and description. These accessions are not critical: most entries will get a more useful set of accessions from the reference database. In most cases, there will be an accession but no description, so you can choose \(.*\) for both. The MSP accession will only be used if the peptide fails to map to any entry in the reference database.
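The sketch below illustrates what the rule \(.*\) captures and the fallback described above. In a Mascot parse rule the captured text is delimited by \( and \); Python's re module writes the same group as ( ), so the equivalent pattern is (.*), which captures the whole string. The function names are hypothetical, and the sketch assumes the raw accession text has already been taken from the MSP entry; how Mascot locates that text within an entry is not shown here.

```python
import re
from typing import List, Optional

# Python equivalent of the Mascot parse rule \(.*\): capture the whole string.
CAPTURE_ALL = r"(.*)"


def parse_accession(raw: str, pattern: str = CAPTURE_ALL) -> Optional[str]:
    """Apply a Python-syntax parse rule; return the captured text, or None if empty."""
    match = re.search(pattern, raw)
    if match and match.group(1):
        return match.group(1)
    return None


def accessions_for_entry(msp_accession: str, reference_hits: List[str]) -> List[str]:
    """Hypothetical helper: prefer reference-database accessions, else use the MSP one."""
    if reference_hits:
        # The peptide mapped to the reference database, so its accessions are used.
        return list(reference_hits)
    fallback = parse_accession(msp_accession)
    return [fallback] if fallback else []


if __name__ == "__main__":
    print(accessions_for_entry("P12345", []))           # falls back to the MSP accession
    print(accessions_for_entry("P12345", ["Q00001"]))   # reference database wins
```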

MS/MS tolerances

The default library tolerance is quite wide, 0.6 Da / 500 ppm, because the entries may come from any type of instrument, and having a tolerance that is too wide is much better than one that is too narrow. If you are creating a library from data acquired on a specific instrument, capable of high MS/MS accuracy, you may be able to use a much tighter fragment tolerance.
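A ppm tolerance scales with m/z, while a Da tolerance is fixed, so the two defaults cover different widths across the spectrum: 500 ppm is 0.1 Da at m/z 200 but 0.6 Da at m/z 1200. The sketch below shows that arithmetic and a simple tolerance check. It assumes the ppm tolerance is computed relative to the theoretical m/z, and the tight 0.02 Da value is a hypothetical example, not a recommendation.

```python
def ppm_to_da(ppm: float, mz: float) -> float:
    """Absolute width (Da) of a relative tolerance (ppm) at a given m/z."""
    return mz * ppm * 1e-6


def within_tolerance(observed: float, theoretical: float, tol: float, unit: str) -> bool:
    """True if the observed fragment m/z lies within tol of the theoretical value."""
    if unit == "Da":
        limit = tol
    elif unit == "ppm":
        limit = ppm_to_da(tol, theoretical)  # assumption: relative to theoretical m/z
    else:
        raise ValueError(f"unknown tolerance unit: {unit}")
    return abs(observed - theoretical) <= limit


if __name__ == "__main__":
    for mz in (200.0, 1000.0, 1200.0):
        print(mz, ppm_to_da(500, mz))  # roughly 0.1, 0.5 and 0.6 Da
    print(within_tolerance(500.30, 500.00, 0.6, "Da"))   # True with the wide default
    print(within_tolerance(500.30, 500.00, 0.02, "Da"))  # False with a tight tolerance
```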

The reference database

Protein inference for library matches is accomplished by assigning a reference Fasta database to each library as part of the library configuration. A detailed description can be found on the relevant help page. The default reference database for predefined libraries is SwissProt, usually with an appropriate taxonomy filter. If SwissProt is not available on your Mascot Server, you will need to choose an alternative protein Fasta file (a nucleic acid database cannot be used). You can select any locally available protein Fasta file, but we advise against choosing a very large database, such as NCBIprot, even with a restricted taxonomy. The huge number of proteins mapping to each peptide sequence makes compression, searching, and reporting much slower than with a less redundant file, such as SwissProt or a UniProt complete proteome.
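The cost of a redundant reference database can be seen with a simple exact-substring mapping: each library peptide maps to every protein that contains it, so duplicated or near-identical entries multiply the number of mapped proteins. This is a toy illustration with hypothetical accessions and sequences, not Mascot's actual protein inference procedure.

```python
from typing import Dict, List


def proteins_containing(peptide: str, database: Dict[str, str]) -> List[str]:
    """Accessions of every protein whose sequence contains the peptide (exact substring)."""
    return sorted(acc for acc, seq in database.items() if peptide in seq)


# Hypothetical toy databases: the redundant one repeats essentially the same
# protein under several accessions, as very large comprehensive databases often do.
COMPACT = {"PROT_A": "MKPEPTIDEKRLLA", "PROT_B": "MSSGAVK"}
REDUNDANT = {
    "PROT_A": "MKPEPTIDEKRLLA",
    "PROT_A_v2": "MKPEPTIDEKRLLA",
    "PROT_A_frag": "PEPTIDEKRL",
    "PROT_A_iso": "MAKPEPTIDEKRLLA",
    "PROT_B": "MSSGAVK",
}

if __name__ == "__main__":
    print(proteins_containing("PEPTIDEK", COMPACT))    # one protein
    print(proteins_containing("PEPTIDEK", REDUNDANT))  # four proteins for the same peptide
```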

Taxonomy

In most cases, library files are compiled for an individual organism, so there is no requirement to identify the taxonomy of individual entries. Even if this were not the case, the entries in a library are peptides, not proteins, so taxonomy assignment would be tricky. Hence, Mascot allows taxonomy to be specified in the filter used to construct a library, but not as a filter when searching a library.
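When a library is built, a taxonomy filter passes every match whose taxonomy is at or below the specified node. One way to express that test is to walk up a child-to-parent map until the node is found or the root is reached. The map below is a simplified, partial NCBI-style lineage with intermediate ranks omitted, and the function name is hypothetical; Mascot's own taxonomy handling is not shown.

```python
from typing import Dict

# Simplified child -> parent map; intermediate ranks are omitted for brevity.
PARENT: Dict[int, int] = {
    9606: 40674,   # Homo sapiens -> Mammalia (simplified)
    10090: 40674,  # Mus musculus -> Mammalia (simplified)
    40674: 1,      # Mammalia -> root (simplified)
    1: 1,          # the NCBI root node is its own parent
}


def is_at_or_below(taxid: int, node: int, parent: Dict[int, int]) -> bool:
    """True if taxid equals node or node is one of its ancestors."""
    seen = set()
    current = taxid
    while current not in seen:  # stop at the self-parented root or any cycle
        if current == node:
            return True
        seen.add(current)
        if current not in parent:
            return False
        current = parent[current]
    return False


if __name__ == "__main__":
    print(is_at_or_below(9606, 40674, PARENT))  # human is below Mammalia
    print(is_at_or_below(9606, 10090, PARENT))  # human is not below mouse
```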

Create a library from Mascot search results

From the Library menu in Database Manager, choose Create new, Create from search results. You specify a location for the new files, MS/MS tolerances, and the reference database.

You then define filters that control which peptide matches will be added to the library. More information about spectral library filters can be found on the Spectral library search help page.

You must specify at least one filter, which must be a score or expect value threshold, typically expect < 0.01, because you only want high-confidence matches in a library. The final step is to import search results filtered by date range and an optional filepath wildcard. Later on, you can schedule a recurring update task to import matches from new Fasta search results.
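The selection described above has two parts: choosing which search results to import (date range plus optional path wildcard) and keeping only high-confidence matches (expect < 0.01). The sketch below models both with hypothetical data classes and file paths. It assumes shell-style wildcard matching via `fnmatchcase`; the exact wildcard syntax Database Manager accepts may differ.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass
class SearchResult:
    path: str          # hypothetical result file path
    searched_on: date


@dataclass
class PeptideMatch:
    sequence: str
    expect: float


def select_results(results: Iterable[SearchResult], start: date, end: date,
                   wildcard: Optional[str] = None) -> List[SearchResult]:
    """Results searched within [start, end] whose path matches the optional wildcard."""
    return [r for r in results
            if start <= r.searched_on <= end
            and (wildcard is None or fnmatchcase(r.path, wildcard))]


def high_confidence(matches: Iterable[PeptideMatch],
                    max_expect: float = 0.01) -> List[PeptideMatch]:
    """Keep only matches whose expect value is below the threshold."""
    return [m for m in matches if m.expect < max_expect]


RESULTS = [
    SearchResult("data/2024/F000101.dat", date(2024, 3, 1)),
    SearchResult("data/2024/F000102.dat", date(2024, 6, 1)),  # outside the date range
    SearchResult("archive/F000050.dat", date(2024, 3, 2)),    # fails the wildcard
]

if __name__ == "__main__":
    picked = select_results(RESULTS, date(2024, 1, 1), date(2024, 4, 30), "data/*")
    print([r.path for r in picked])
    print(high_confidence([PeptideMatch("PEPTIDEK", 0.001), PeptideMatch("SAMPLER", 0.05)]))
```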

It can be useful to apply quite narrow restrictions for an individual library. For example, you might want one library for human SILAC data, another for human phosphopeptides, and a third for human MHC peptides. The same peptide match may appear in several libraries and, if you change your mind about the criteria, you can easily modify the filters and create a new library.

Within a library, a peptide sequence with a particular charge and set of modifications appears only once and is represented by the match with the highest score. That is, libraries built by Database Manager do not contain consensus spectra. Only matches from uninterpreted MS/MS data are considered; PMF and sequence query results are ignored, as are matches from the second pass of an error tolerant search. Modification and taxonomy filters apply to the match, not to the search parameters: if the filter includes Phospho (ST), only matches containing Phospho (ST) pass, not all matches in a search where this was a variable mod. All taxonomies at or below the specified node will pass.
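The rules above amount to keying matches on (sequence, charge, modifications) and keeping the highest-scoring match per key, after discarding non-MS/MS and second-pass error tolerant matches. The sketch below shows that logic. The `Match` class, its `query_type` labels, and the representation of modifications as sorted (position, name) pairs are assumptions for illustration, not Database Manager's internal data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Sorted (position, modification name) pairs, e.g. ((4, "Phospho (ST)"),)
Mods = Tuple[Tuple[int, str], ...]
Key = Tuple[str, int, Mods]


@dataclass(frozen=True)
class Match:
    sequence: str
    charge: int
    mods: Mods
    score: float
    query_type: str = "msms"            # hypothetical labels: "msms", "pmf", "seq"
    error_tolerant_pass2: bool = False  # True for second-pass error tolerant matches


def build_library(matches: Iterable[Match],
                  required_mod: Optional[str] = None) -> Dict[Key, Match]:
    """Keep one entry per (sequence, charge, modifications): the highest-scoring match."""
    best: Dict[Key, Match] = {}
    for m in matches:
        # Only uninterpreted MS/MS matches outside an error tolerant second pass count.
        if m.query_type != "msms" or m.error_tolerant_pass2:
            continue
        # A modification filter tests the match itself, not the search's variable mods.
        if required_mod is not None and all(name != required_mod for _, name in m.mods):
            continue
        key = (m.sequence, m.charge, m.mods)
        if key not in best or m.score > best[key].score:
            best[key] = m
    return best


PHOS: Mods = ((4, "Phospho (ST)"),)
MATCHES = [
    Match("PEPSIDEK", 2, PHOS, 41.0),
    Match("PEPSIDEK", 2, PHOS, 57.5),                    # same key, higher score wins
    Match("PEPSIDEK", 3, PHOS, 38.0),                    # different charge, separate entry
    Match("PEPSIDEK", 2, (), 62.0),                      # unmodified, fails Phospho filter
    Match("PEPSIDEK", 2, PHOS, 80.0, query_type="pmf"),  # PMF results are ignored
]

if __name__ == "__main__":
    for key, match in build_library(MATCHES, required_mod="Phospho (ST)").items():
        print(key, match.score)
```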