Result report overview
At the completion of a search, a summary report is displayed that provides an overview of the results. There is often a choice of report formats, and each report contains links to more detailed views of the experimental and calculated data.
Types of Summary Report
The default summary report for peptide mass fingerprint results is the Concise Protein Summary. Proteins that match the same set or a sub-set of mass values are grouped into a single hit. The intention is to provide a one-page summary of the search results. You can use the format controls to switch to the original Protein Summary, where each protein hit is listed separately, together with details of individual mass value matches.
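The "same set or sub-set" rule can be illustrated with a short Python sketch. It assumes each protein is represented only by the set of experimental mass values it matched, and the accessions are hypothetical. It is a simplified illustration of the grouping idea, not Mascot's own implementation.

```python
from __future__ import annotations


def group_hits(matches: dict[str, set[float]]) -> list[tuple[str, list[str]]]:
    """Group proteins whose matched masses are the same set as, or a sub-set of, a leader's."""
    # Consider proteins with the most matched masses first, so that a protein
    # matching a superset of masses becomes the leader of its group.
    order = sorted(matches, key=lambda acc: (-len(matches[acc]), acc))
    groups: list[tuple[str, list[str]]] = []
    for acc in order:
        for leader, members in groups:
            # Same set or sub-set of the leader's masses: fold into that hit.
            if matches[acc] <= matches[leader]:
                members.append(acc)
                break
        else:
            # Not contained in any existing hit: start a new hit.
            groups.append((acc, []))
    return groups


# Hypothetical proteins and matched experimental masses.
hits = {
    "PROT_A": {1001.5, 1234.6, 1500.7},
    "PROT_B": {1001.5, 1234.6},          # sub-set of PROT_A
    "PROT_C": {1001.5, 1234.6, 1500.7},  # same set as PROT_A
    "PROT_D": {987.4, 1500.7},           # not contained in PROT_A
}
print(group_hits(hits))  # [('PROT_A', ['PROT_C', 'PROT_B']), ('PROT_D', [])]
```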
For MS/MS searches of fewer than 300 spectra, the default summary report is the Peptide Summary. This provides a clear picture of the peptide matches, grouped into protein hits using a simple parsimony algorithm. If there are 300 or more spectra, the default summary report is the Protein Family Summary. This groups the proteins into families based on a novel hierarchical clustering algorithm and presents these results one page at a time, initially with 20 families per page. This report is ideally suited to very large and complex MS/MS searches, where it is not practical to display all the results on a single HTML page.
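The choice of default report can be summarised as a small decision function. This is a hypothetical sketch: the function name and arguments are illustrative, the 300-spectrum threshold is taken from the ProteinFamilySwitch default, and special cases such as an MS/MS search with no peptide sequence matches are ignored.

```python
def default_summary_report(is_msms: bool, num_spectra: int,
                           family_switch: int = 300) -> str:
    """Return the name of the default summary report (illustrative sketch)."""
    if not is_msms:
        # Peptide mass fingerprint results
        return "Concise Protein Summary"
    if num_spectra < family_switch:
        return "Peptide Summary"
    return "Protein Family Summary"


print(default_summary_report(True, 299))  # Peptide Summary
print(default_summary_report(True, 300))  # Protein Family Summary
```

Passing `family_switch=0` makes the first comparison always fail, which mirrors setting `_proteinfamilyswitch` to 0 to force a Protein Family Summary.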
For MS/MS results, you can use the format controls to switch to a Select Summary, which is similar to a Peptide Summary, but provides a more compact view of the results. The Select Summary splits the peptide matches assigned to protein hits into a separate report from the unassigned peptide matches. For searches of fewer than 1000 MS/MS spectra, you can also choose a Protein Summary, but this is not recommended unless you are viewing the results of a combination search. If the sample is a mixture, using one of the Protein Summary reports to view MS/MS search results can give a very misleading picture.
If you are submitting MS/MS searches to an in-house Mascot server, you also have the option to create an Archive Report. This is simply an edited version of the Peptide Summary report that includes only the protein hits you have selected. If an MS/MS search produces no peptide sequence matches at all, only molecular weight matches, a Protein Summary report is displayed instead. This indicates that the search has failed: possibly the spectra are nothing but noise, or possibly the search parameters are incorrect in some way.
In summary reports for MS/MS results, if the database was nucleic acid and one or more UniGene indexes have been configured for the database being searched, there will be the option to generate a report in which the protein matches are clustered into UniGene families.
The final choice on the list of report formats is always Export Search Results. This enables the results to be exported in a number of "machine readable" formats, including mzIdentML, the standard interchange format for search results.
Protein View
The protein view of an entry on the hit list can be displayed by clicking on an accession number in a summary report.
Information about the protein, the enzyme (if any), and any modifications are printed at the top of the page. This is followed by the formatted sequence of the protein in 1-letter code with matched peptides highlighted in bold, red type.
If the sequence database was nucleic acid, and the matches all came from a single frame, the report will be very similar to that for a protein database entry. If the matches come from multiple frames, because of a frame shift or splice, then only one frame at a time is displayed. A drop-down list can be used to switch between frames.
The sequence block is followed by a detailed table of the peptide matches. For an enzyme digest, you can also choose to display all the calculated peptides, whether matched or not, including all partials up to the limit specified by the Missed Cleavages parameter. The matched peptides are shown in bold, red type, together with a link to the corresponding peptide view. If no enzyme or a semi-specific enzyme was used, this option is not available, and the table contains only the matched peptides.
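To make "all partials up to the limit specified by the Missed Cleavages parameter" concrete, here is a minimal in silico digest. It assumes the classic trypsin rule (cleave after K or R, but not before P) and ignores any mass range filtering; the function name is hypothetical.

```python
from __future__ import annotations

import re


def tryptic_peptides(sequence: str, missed_cleavages: int) -> list[tuple[int, int, str]]:
    """Return (start, end, peptide) with 1-based inclusive residue numbers."""
    # Cleavage sites: just after K or R, unless the next residue is P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # The protein termini are boundaries too; drop a site at the very end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` adjacent fragments into partials.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            start, end = bounds[i], bounds[j]
            peptides.append((start + 1, end, sequence[start:end]))
    return peptides


for start, end, pep in tryptic_peptides("AKPLRGGKEE", missed_cleavages=1):
    print(start, end, pep)
```

For this toy sequence the output is AKPLR (1-5), AKPLRGGK (1-8), GGK (6-8), GGKEE (6-10) and EE (9-10). The K at position 2 is not a cleavage site because it is followed by P. With 0 missed cleavages, only AKPLR, GGK and EE remain.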
If the enzyme was a mixture of independent enzymes, and you choose to display calculated peptides, these are shown for one enzyme component at a time. A drop-down list can be used to switch between enzymes. The formatted protein sequence shows highlights for all matches at all times.
The default sort order is start residue order. Controls are provided to re-display the table sorted by increasing or decreasing peptide molecular weight.
A graph displays the mass differences between the calculated and experimental mass values for the protein match, in the same units as were used to specify the peptide mass error tolerance. The RMS error of the set of matched mass values is also given, in ppm.
If available, the full text of the sequence annotations is reproduced at the bottom of the page.
Genomic sequences
If the match is to a very long nucleic acid sequence (greater than 30,000 bases by default), the conventional Protein View is impractical. In this case, Mascot automatically generates a DDBJ/EMBL/GenBank format feature table. For example:
    BLASTCDS        422..469
                    /label=Q103
                    /colour=2
                    /note="Mascot match, … sequence=GLGTDEDTLIEILASR"
                    /blastp_file="../data/20001016/FTGrCfc.dat"
                    /mass=1701.88
                    /score=82
                    /rank=1
                    /translation="GLGTDEDTLIEILASR"
    BLASTCDS        603..650
                    /label=Q105
                    /colour=2
                    /note="Mascot match, … sequence=SEDFGVNEDLGDSDAR"
                    /blastp_file="../data/20001016/FTGrCfc.dat"
                    /mass=1738.73
                    /score=82
                    /rank=2
                    /translation="SEDFGVNEDLGDSDAR"
By default, only matches with significant scores (p < 0.05) are output. A different score threshold can be specified by appending &_featuretableminscore=X to the protein view URL, where X is the score threshold.
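Because the feature table is plain text, it is easy to post-process. The following sketch splits BLASTCDS features into dictionaries of qualifiers and then applies a minimum score, loosely mirroring `_featuretableminscore`. The parsing rules (one feature key at the start of a line, qualifiers of the form `/name=value` or `/name="value"`) and the inclusive comparison are assumptions, and the sample is abbreviated from the example above.

```python
import re

SAMPLE = """\
BLASTCDS        422..469
                /label=Q103
                /score=82
                /translation="GLGTDEDTLIEILASR"
BLASTCDS        603..650
                /label=Q105
                /score=82
                /translation="SEDFGVNEDLGDSDAR"
"""


def parse_feature_table(text):
    """Return one dict per BLASTCDS feature: its location plus qualifiers."""
    features = []
    # Split just before every line that starts with the feature key.
    for block in re.split(r"(?=^BLASTCDS\s)", text, flags=re.MULTILINE):
        head = re.match(r"BLASTCDS\s+(\S+)", block)
        if head is None:
            continue  # text before the first feature, if any
        feature = {"location": head.group(1)}
        # Qualifiers: /name="quoted value" or /name=bare_value
        for name, quoted, bare in re.findall(r'/(\w+)=(?:"([^"]*)"|(\S+))', block):
            feature[name] = quoted or bare
        features.append(feature)
    return features


def filter_by_score(features, min_score):
    """Keep features whose /score is at least min_score (inclusive, assumed)."""
    return [f for f in features if float(f.get("score", "0")) >= min_score]


features = parse_feature_table(SAMPLE)
print([f["label"] for f in filter_by_score(features, 50)])  # ['Q103', 'Q105']
```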
The feature table can be saved to a text file and read into a genome browser such as Artemis from the Sanger Centre. This provides a very flexible and powerful way to view Mascot peptide matches in genomic sequence data.
Peptide View
The Peptide View of a matched peptide can be loaded by clicking on a query number hyperlink in a summary report or an ions score hyperlink in Protein View.
The name of the protein and the 1-letter sequence of the peptide are printed at the top of the page, followed by the query title, if any. Below this is a mass spectrum labelled with fragment ions, e.g. b(6). Note that a small interval around the peptide molecular ion (±2 Da by default) is omitted from the spectrum, reflecting the suppression of these data points in the Mascot search.
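The omitted interval can be expressed in a few lines of code. This sketch assumes the ±2 Da window is centred on the precursor m/z at the precursor's own charge state, and that a peak lying exactly on the window boundary counts as inside it and is removed. The function names and peak list are hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da


def precursor_mz(neutral_mass, charge):
    """m/z of the molecular ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def drop_precursor_window(peaks, neutral_mass, charge, half_width=2.0):
    """Remove (mz, intensity) peaks within ±half_width of the precursor m/z."""
    centre = precursor_mz(neutral_mass, charge)
    return [(mz, inten) for mz, inten in peaks if abs(mz - centre) > half_width]


peaks = [(300.0, 10), (500.0, 50), (502.5, 20), (503.5, 5)]
print(drop_precursor_window(peaks, 1000.0, 2))  # [(300.0, 10), (503.5, 5)]
```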
In most modern web browsers, the spectrum display is an interactive SVG graphic that supports zoom and pan. Hover the mouse over a cleavage point in the peptide sequence to highlight the corresponding peaks in the spectrum, and vice versa. Drag between two peaks to display mass differences. Controls have tooltip help. (The Xi Spectrum Viewer was developed by the Rappsilber Laboratory at the University of Edinburgh and released under the Artistic License 2.0. Some associated icons are taken from Farm-fresh web icons, released under the Creative Commons Attribution 3.0 License.)
The colours used for different components of the SVG graphic can be specified in mascot.dat. For details, search the latest Mascot Parser help for getSpectrumViewerColourSchemeString.
If the browser does not support the SVG graphic, a simple bitmap is displayed. Click the mouse within the spectrum to zoom in by a factor of 2, so as to show greater detail in crowded regions. Alternatively, controls above the spectrum can be used to specify the plotted mass range directly or reset the mass scale.
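A factor-of-2 zoom amounts to halving the displayed mass range. In this sketch, centring the new window on the clicked mass and shifting it back inside the full range near the edges are assumptions made for illustration. They are not documented behaviour of the bitmap viewer, and the function name is hypothetical.

```python
def zoom_in(lo, hi, click, full_lo, full_hi, factor=2.0):
    """Return a new (lo, hi) mass range 1/factor as wide as the current one."""
    width = (hi - lo) / factor
    # Centre the new window on the clicked mass (assumption) ...
    new_lo = click - width / 2
    # ... then shift it, without resizing, so it stays within the full range.
    new_lo = max(full_lo, min(new_lo, full_hi - width))
    return new_lo, new_lo + width


print(zoom_in(0.0, 2000.0, 1500.0, 0.0, 2000.0))  # (1000.0, 2000.0)
print(zoom_in(0.0, 2000.0, 100.0, 0.0, 2000.0))   # (0.0, 1000.0)
```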
In the spectrum and the table that follows, you can choose whether to label all possible matches or just the matches used for scoring.
Mascot begins by selecting a small number of experimental peaks on the basis of normalised intensity. It calculates a probability-based score according to the number of matches. It then increases the number of selected peaks, re-calculates the score, and continues to iterate until it is clear that the score can only get worse. It then reports the best score it found, which should correspond to an optimum selection, taking mostly real peaks and leaving behind mostly noise.
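The iterative selection can be sketched as follows. Mascot's actual ions score is not reproduced here. As a stand-in, the sketch scores each selection as -10 log10(P), where P is the binomial probability of getting at least the observed number of matches by chance, given a hypothetical per-peak random match probability `p_random`. "Until it is clear that the score can only get worse" is approximated by stopping after a fixed number of non-improving steps (`patience`).

```python
import math


def binomial_tail(n, k, p):
    """Probability of k or more chance matches among n peaks."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_peak_selection(peaks, fragments, tol, p_random,
                        start=4, step=4, patience=2):
    """Return (best_score, n_peaks) from an iterative top-N peak selection.

    peaks: (mz, intensity) pairs; fragments: calculated fragment m/z values.
    The score is a binomial stand-in, NOT Mascot's own ions score.
    """
    ranked = sorted(peaks, key=lambda pk: pk[1], reverse=True)  # most intense first
    best_score, best_n, worse_steps = float("-inf"), 0, 0
    for n in range(min(start, len(ranked)), len(ranked) + 1, step):
        selected = ranked[:n]
        matched = sum(any(abs(mz - f) <= tol for f in fragments) for mz, _ in selected)
        score = -10 * math.log10(binomial_tail(n, matched, p_random))
        if score > best_score:
            best_score, best_n, worse_steps = score, n, 0
        else:
            worse_steps += 1
            if worse_steps >= patience:  # stand-in for "can only get worse"
                break
    return best_score, best_n


fragments = [100.0, 200.0, 300.0, 400.0]
peaks = [(100.0, 90), (200.0, 80), (300.0, 70), (400.0, 60),   # real peaks
         (150.0, 10), (250.0, 9), (350.0, 8), (450.0, 7)]      # low-intensity noise
score, n = best_peak_selection(peaks, fragments, tol=0.5, p_random=0.1, start=2, step=2)
print(n, round(score, 1))  # 4 40.0
```

Adding the noise peaks beyond the fourth selection adds no matches, so the score falls and the best selection keeps only the four real peaks.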
If you choose to label all possible matches, remember that many spectra have "peak at every mass" noise, and can match any ion series from any sequence if there is no intensity discrimination.
The matched fragment ions are shown in tabular format below the spectrum. The ion series are those specified by the INSTRUMENT search parameter. If you choose to label the matches used for scoring, bold italic red means the series contributed to the score. Bold red means that the number of matches in the ion series is greater than would be expected by chance, indicating that the ion series is present. Non-bold red means that the number of matches in the ion series is no greater than would be expected by chance, so that the matches themselves may be by chance.
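For reference, the values in the b and y columns of such a table can be calculated from monoisotopic residue masses. This sketch covers only singly charged b and y ions of an unmodified peptide. Other ion series enabled by the INSTRUMENT parameter, higher charge states and modifications are left out, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_y_ions(peptide):
    """Singly charged b(1..n-1) and y(1..n-1) m/z values."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


b, y = b_y_ions("GLGTDEDTLIEILASR")
print(round(b[1], 4), round(y[0], 4))  # b(2) and y(1)
```

For GLGTDEDTLIEILASR, y(1) comes out at about 175.119, and the residue masses plus one water sum to about 1701.88, the `/mass` value shown for this peptide in the feature table example.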
A graph displays the mass differences between the calculated and experimental fragment ion mass values in the units used to specify the error tolerance. A second graph shows the same points but with an axis in ppm. The root mean square (RMS) error of the set of matched mass values is given in ppm.
If any residues in the matched peptide have modifications with multiple neutral losses, the table shows the values corresponding to the dominant neutral loss(es). The text immediately above the table gives details. The labels in the spectrum are for all peaks that were selected and matched to obtain the best score, and any neutral losses form part of the label. So, for example, the spectrum might contain peaks labelled y(9) and also y(9)-98. The table will list just one of these values in the y column.
A link is provided to perform a BLAST search of the matched peptide sequence at NCBI. If NCBI is busy, then copy the sequence to the clipboard and follow the final link to a list of alternative BLAST engines.
Finally, the alternative matches to the same MS/MS spectrum are tabulated, allowing you to load Peptide View reports for other matches. If the top rank match is significant and contains one or more variable modifications for which alternative arrangements are possible, site analysis information is displayed.
UniGene
NOTE: UniGene was retired by NCBI in July 2019, although the final UniGene builds are still available as static content from the NCBI FTP site.
One of the drawbacks of searching an EST database is that there are very few long sequences, so extended groupings of peptide matches into protein matches are rare. This can be rectified with UniGene, an index created by automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster is a list of the GenBank sequences, including ESTs, that represent a unique gene. It is not an attempt to produce a consensus sequence.
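The effect of a UniGene index can be pictured as a re-keying step: matches recorded against individual EST entries are pooled under the cluster each entry belongs to. In this sketch the accessions, cluster IDs and peptides are all hypothetical, and the cluster mapping is assumed to be available as a simple dictionary.

```python
from collections import defaultdict


def group_by_cluster(peptide_hits, cluster_of):
    """Pool (accession, peptide) matches under cluster IDs.

    Entries missing from cluster_of stay under their own accession.
    """
    clusters = defaultdict(set)
    for accession, peptide in peptide_hits:
        clusters[cluster_of.get(accession, accession)].add(peptide)
    return dict(clusters)


# Hypothetical EST accessions, cluster IDs and peptides.
hits = [("EST_1", "PEPTIDEK"), ("EST_2", "SAMPLER"), ("EST_3", "LLSAGK")]
cluster_of = {"EST_1": "CLUSTER_1", "EST_2": "CLUSTER_1"}
print(group_by_cluster(hits, cluster_of))
```

Two short EST matches that would each appear as a weak, separate hit are combined into a single gene-level entry with two peptides.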
If one or more UniGene indexes have been configured for the database being searched, there will be a format control to generate a species-based UniGene report.
Following a Protein View link from a UniGene report displays a list of UniGene family members in place of the standard Protein View.
URL Switches
There are a number of switches to modify the format of the result reports. Many of these have a global default, set by a parameter in the Options section of mascot.dat. These defaults can be changed in an individual report using the format controls, or by appending the relevant switch to the report URL. Switches take the form label=value and the delimiter between switches is an ampersand (&). For example, if the report URL was:
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat
The type of report could be changed to a Full Protein Summary by appending "&REPTYPE=protein":
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat&REPTYPE=protein
Labels and values are not case sensitive. Note that many labels begin with an underscore character. Values that are not literal strings are shown in italics.
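Appending switches programmatically is a matter of joining label=value pairs with "&". This sketch uses Python's `urllib.parse.urlencode` so that any reserved characters in a value are escaped; the helper name `add_switches` is hypothetical.

```python
from urllib.parse import urlencode


def add_switches(report_url, switches):
    """Append label=value switches to a report URL, joined by '&'."""
    separator = "&" if "?" in report_url else "?"
    # urlencode escapes reserved characters (e.g. '&' or spaces) in values.
    return report_url + separator + urlencode(switches)


url = "http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat"
print(add_switches(url, {"REPTYPE": "protein"}))
```

The same helper could add, for example, `{"_featuretableminscore": "40"}` to a Protein View URL.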
URL arguments relating to quantitation are described separately.
master_results.pl and master_results_2.pl
| URL label | mascot.dat parameter | Value | Description |
|---|---|---|---|
| reptype | | peptide | Peptide Summary |
| | | archive | Archive Report |
| | | concise | Concise Protein Summary |
| | | protein | Full Protein Summary |
| | | select | Select Summary (hits) |
| | | unassigned | Select Summary (unassigned) |
| report | | auto | Report all significant hits |
| | | *N* | Report *N* hits |
| _showsubsets | ShowSubSets | 1 | For a Peptide Summary, set the value to 1 to report all hits that match a sub-set of peptides. Default is 0, for no sub-set hits. Intermediate values set a threshold on the difference in protein score between the primary hit and the sub-set hit, expressed as a fraction. |
| _requireboldred | RequireBoldRed | 1 | Set value to 1 to report Peptide Summary hits only if they contain at least one "bold red" peptide, (default 0). |
| _showallfromerrortolerant | ShowAllFromErrorTolerant | 1 | Set value to 1 to report all matches from an error tolerant search, including the garbage, (default 0). |
| _onlyerrortolerant | | 1 | Set value to 1 to report only error tolerant matches from an automatic error tolerant search, (default 0). |
| _noerrortolerant | | 1 | Set value to 1 to suppress error tolerant matches from an automatic error tolerant search, (default 0). |
| _show_decoy_report | | 1 | Set value to 1 to display the report for an automatic decoy database search, (default 0). |
| _sigthreshold | SigThreshold | *N* | Probability to use for the significance threshold. Range is 0.99 to 1E-18, (default 0.05). |
| _sortunassigned | SortUnassigned | scoredown | Sort unassigned matches by descending score, (default). |
| | | queryup | Sort unassigned matches by ascending query number. |
| | | intdown | Sort unassigned matches by descending intensity. |
| _ignoreionsscorebelow | IgnoreIonsScoreBelow | *N* | Values greater than 0 and less than 1 act as an expect value threshold: the scores of any peptide matches with higher expect values are set to 0, so that they disappear from the report. Values of 1 or more act as a score threshold, and any peptide matches with lower scores are suppressed. A value of -1 sets the threshold to the value of _sigthreshold. Floating point number, (default 0.0). |
| _showpopups | | true | Show the top 10 peptide matches for each query in a JavaScript pop-up, (default). |
| | | false | Suppress JavaScript pop-ups. |
| _alwaysgettitle | | 1 | Set to 1 to force reports to fetch FASTA titles from the database when they are not included in the result file, (default 0 in master_results.pl, 1 in master_results_2.pl). |
| _server_mudpit_switch | MudpitSwitch | *N* | Protein score calculation switches to large search mode when the ratio between the number of queries and the number of database entries (after any taxonomy filter) exceeds this value, (default 0.001). |
| percolate | Percolator | 1 | Set value to 1 to re-rank results using Percolator, (default 0). |
| percolate_rt | PercolatorUseRT | 1 | Set value to 1 to include the retention time feature when using Percolator, (default 0). |
| _proteinfamilyswitch | ProteinFamilySwitch | 0 | The number of MS/MS spectra required for displaying the Protein Family Summary report. Set to 0 to force results always to be displayed as a Protein Family Summary, (default 300). |
| _prefertaxonomy | | *N* | 1-based integer index into the list of taxonomies in the Mascot taxonomy file. 0 means no preference. |
| group_family | | 0 | Set to 0 to disable family grouping. |
| _minpeplen | MinPepLenInPepSummary | *N* | Peptides shorter than this are ignored for protein inference purposes. Positive, non-zero integer. |
| min_num_sig_unique_seqs | | *N* | Proteins are reported only if they contain significant matches to at least this number of distinct peptide sequences. Positive, non-zero integer. |
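The rules for `_ignoreionsscorebelow` in the table above can be read as a small decision function. This is a hypothetical sketch. It assumes that a value of 0 (or any negative value other than -1) disables the filter, that -1 is replaced by `_sigthreshold` and then treated as an expect value threshold, and that matches exactly at a threshold are kept.

```python
def passes_ions_score_filter(score, expect, ignore_below, sig_threshold=0.05):
    """Return True if a peptide match survives _ignoreionsscorebelow (sketch)."""
    if ignore_below == -1:
        ignore_below = sig_threshold  # -1 means "use _sigthreshold"
    if 0 < ignore_below < 1:
        return expect <= ignore_below  # acts as an expect value threshold
    if ignore_below >= 1:
        return score >= ignore_below  # acts as a score threshold
    return True  # 0, the default, applies no filter


print(passes_ions_score_filter(score=30, expect=0.2, ignore_below=0.05))  # False
print(passes_ions_score_filter(score=30, expect=0.2, ignore_below=20))    # True
```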
protein_view.pl
| URL label | mascot.dat parameter | Value | Description |
|---|---|---|---|
| sort | | startup | Sort table of peptides by ascending start residue number, (default). |
| | | massup | Sort table of peptides by ascending mass. |
| | | massdown | Sort table of peptides by descending mass. |
| showall | | true | Show all calculated peptides, not just matched peptides. |
| | | false | Show just matched peptides, (default). |
| _showallfromerrortolerant | ShowAllFromErrorTolerant | 1 | Set value to 1 to report all matches from an error tolerant search, including the garbage, (default 0). |
| _onlyerrortolerant | | 1 | Set value to 1 to report only error tolerant matches from an automatic error tolerant search, (default 0). |
| _noerrortolerant | | 1 | Set value to 1 to suppress error tolerant matches from an automatic error tolerant search, (default 0). |
| _show_decoy_report | | 1 | Set value to 1 to display the report for an automatic decoy database search, (default 0). |
| _sigthreshold | SigThreshold | *N* | Probability to use for the significance threshold. Range is 0.99 to 1E-18, (default 0.05). |
| _ignoreionsscorebelow | IgnoreIonsScoreBelow | *N* | Values greater than 0 and less than 1 act as an expect value threshold: the scores of any peptide matches with higher expect values are set to 0, so that they disappear from the report. Values of 1 or more act as a score threshold, and any peptide matches with lower scores are suppressed. Floating point number, (default 0.0). |
| _server_mudpit_switch | MudpitSwitch | *N* | Protein score calculation switches to large search mode when the ratio between the number of queries and the number of database entries (after any taxonomy filter) exceeds this value, (default 0.001). |
| _featuretablelength | FeatureTableLength | *N* | Length of database entry, in bases, at which Protein View switches to GenBank feature table output, (default 30000). |
| _featuretableminscore | FeatureTableMinScore | *N* | Score threshold for inclusion in the GenBank feature table. If undefined, the report includes matches that exceed the lower of the homology or identity threshold. |
| indyenzyme | | *N* | If the enzyme was independent, display cleavage products for this specificity index. |
| frame | | *N* | For a nucleic acid database, display matches in this frame number. |
| percolate | Percolator | 1 | Set value to 1 to re-rank results using Percolator, (default 0). |
| percolate_rt | PercolatorUseRT | 1 | Set value to 1 to include the retention time feature when using Percolator, (default 0). |
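The three `sort` values in the table above can be mapped onto sort keys as below. The peptide rows are hypothetical (start residue, mass, sequence) tuples, and lower-casing the value reflects the statement that labels and values are not case sensitive.

```python
# Hypothetical mapping from the sort switch value to (key function, descending?).
SORT_KEYS = {
    "startup": (lambda row: row[0], False),   # ascending start residue
    "massup": (lambda row: row[1], False),    # ascending mass
    "massdown": (lambda row: row[1], True),   # descending mass
}


def sort_peptides(rows, sort="startup"):
    """Sort (start, mass, sequence) rows according to a sort switch value."""
    key, descending = SORT_KEYS[sort.lower()]
    return sorted(rows, key=key, reverse=descending)


rows = [(10, 1200.5, "AAA"), (1, 1800.2, "BBB"), (25, 900.4, "CCC")]
print(sort_peptides(rows, "MASSDOWN"))
```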