Summary reports for PMF

At the completion of a search, a summary report is displayed that provides an overview of the results. There is a choice of report formats and both reports contain links to more detailed views of the experimental and calculated data.

The default summary report for peptide mass fingerprint results is the Concise Protein Summary. Proteins that match the same set or a sub-set of mass values are grouped into a single hit. The intention is to provide a one page summary of the search results. You can use the format controls to switch to the original Protein Summary, where each protein hit is listed separately, together with details of individual mass value matches.

Concise Protein Summary

Sections of the report are described in the order in which they appear. Use this link to open an example report in a new browser window or tab.

Header

At the top of the report are a few lines to identify the search uniquely: search title, date, user name, etc. The database version is identified with either a release number or an ISO datestamp. The score, accession and description for the top scoring protein hit are listed.

If the search included the auto-decoy option, information about the highest scoring match in the decoy database is displayed at this location.

Score Distribution

Following the header, a histogram illustrates the protein score distribution. The 50 best matching proteins are divided into 16 bins according to their score, and the heights of the bars show the number of matches in each bin.
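The binning can be pictured with a short sketch. It assumes equal-width bins spanning the observed score range, which is an illustrative choice and not necessarily how the report draws its bin boundaries. The function name and parameters are hypothetical.

```python
def score_histogram(scores, n_bins=16, max_proteins=50):
    """Count protein scores per bin for the best-scoring proteins.

    Assumption: equal-width bins between the lowest and highest of the
    retained scores. The real report may choose its boundaries differently.
    """
    top = sorted(scores, reverse=True)[:max_proteins]
    if not top:
        return [0] * n_bins
    lo, hi = min(top), max(top)
    # Fall back to a width of 1.0 when all scores are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for s in top:
        # The highest score would index one past the end, so clamp it.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts
```

Each entry in the returned list corresponds to the height of one bar in the histogram.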

The protein score is a measure of the statistical significance of a PMF match. The region in which random matches may be expected is shaded green. This region extends up to the significance threshold, which has a default setting of 5%. If a score falls in the green shaded area, there is greater than a 5% probability that the match was a random event, of no significance. Conversely, a match in the unshaded part of the histogram has less than a 5% probability of being a random event. It is quite common to see several proteins getting the same high score. Even if the protein sequences in the database are non-identical, the same group of matched mass values may occur in multiple proteins.

Format Controls

These controls enable the report format to be modified. After making changes, press the "Format As" button to reload the report using the new settings.

For a peptide mass fingerprint search, there are just three controls:

  • Report format: Choose from the list of available formats.
  • Significance threshold: The default significance threshold is p < 0.05. You can change this to any value in the range 0.99 to 1E-18.
  • Maximum number of hits: This value was initially chosen when the search was submitted. Enter a positive integer if you wish to re-specify the number of protein hits to report. Of course, the total number of hits actually found by the search may be less. The maximum number of hits saved to the result file is 50. Entering the word AUTO or a value of 0 will display all of the hits that have a protein score exceeding the significance threshold, plus one extra hit.
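The behaviour of the maximum number of hits control, including the AUTO setting, can be sketched as follows. This is an illustration only: the function name is hypothetical, and threshold_score stands for the protein score that corresponds to the chosen significance threshold, which is assumed to be known.

```python
def hits_to_display(scores, max_hits, threshold_score, saved_limit=50):
    """Return the protein scores that would be listed in the report.

    max_hits is a non-negative integer or the word "AUTO" (hypothetical
    interface). At most saved_limit hits are available, because that is
    the number saved to the result file.
    """
    ranked = sorted(scores, reverse=True)[:saved_limit]
    if isinstance(max_hits, str):
        if max_hits.strip().upper() != "AUTO":
            raise ValueError("max_hits must be an integer or AUTO")
        auto = True
    else:
        if max_hits < 0:
            raise ValueError("max_hits must not be negative")
        auto = max_hits == 0
    if auto:
        # All hits above the threshold, plus one extra hit.
        n_significant = sum(1 for s in ranked if s > threshold_score)
        return ranked[: n_significant + 1]
    # The search may have found fewer hits than requested.
    return ranked[:max_hits]
```

For example, with scores of 120, 90, 60, 40 and 30 and a threshold score of 67, AUTO lists the two significant hits plus the hit scoring 60.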

Repeating a search

A search can easily be repeated, so as to investigate the effect of changes in search parameters. Choose Re-Search All to repeat the search with all mass values or Search Unmatched to repeat the search with only the mass values that did not get a match in the top hit. This could be a way of investigating whether the sample was a protein mixture, although Mascot has a built-in PMF mixture mode. If there is statistically significant evidence for a second or even a third protein, this will appear in the result report.
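The difference between the two repeat options comes down to which mass values are resubmitted. A minimal sketch, using hypothetical names and treating mass values as exact labels:

```python
def masses_for_repeat_search(experimental, matched_in_top_hit, unmatched_only):
    """Select the mass values to resubmit.

    unmatched_only=False mimics Re-Search All; unmatched_only=True mimics
    Search Unmatched, keeping only values with no match in the top hit.
    """
    if not unmatched_only:
        return list(experimental)
    matched = set(matched_in_top_hit)
    return [m for m in experimental if m not in matched]
```

The original order of the mass values is preserved in both cases.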

Protein Hit List

The body of the report contains a tabular summary of the best matching proteins. The number of proteins shown is specified in the search form, up to a maximum of 50. Proteins that match the same set of mass values, or a sub-set, are grouped into a single hit.
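The grouping idea can be sketched as below. The rule used here is an assumption made for illustration: proteins are taken in descending score order, and each one joins the first existing hit whose matched mass values contain its own. The report's actual grouping procedure may differ in detail, and all names are hypothetical.

```python
def group_hits(proteins):
    """Group proteins that match the same set, or a sub-set, of masses.

    proteins: list of (accession, score, matched_masses) tuples.
    Returns a list of groups, each a dict with the lead accession, its
    matched masses, and the accessions of all members.
    """
    ranked = sorted(proteins, key=lambda p: p[1], reverse=True)
    groups = []
    for accession, _score, masses in ranked:
        masses = frozenset(masses)
        for group in groups:
            # Same set or sub-set of the lead protein's matched masses.
            if masses <= group["masses"]:
                group["members"].append(accession)
                break
        else:
            groups.append(
                {"lead": accession, "masses": masses, "members": [accession]}
            )
    return groups
```

A protein that matches even one mass value not matched by any higher-scoring hit starts a new hit of its own.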

For each protein, the first line contains the accession string (linked to the corresponding Protein View), the protein molecular mass, and the protein score. Expect is the number of times we would expect to obtain an equal or higher score, purely by chance. The lower this expectation value, the more significant the result. The number of mass values matched to the protein completes the first line. The second line is the protein description taken from the FASTA entry.
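The relationship between score and expectation value can be illustrated with a short sketch. It assumes the conventional Mascot scoring scale, in which a score is -10*log10(P), so that a score exactly at the significance threshold corresponds to an expectation value equal to the threshold probability. Treat the formula as an assumption rather than a statement of how the report computes the value. The function name and the example threshold score of 67 are hypothetical.

```python
def expect_value(score, threshold_score, p_threshold=0.05):
    """Estimate the expectation value for a protein score.

    Assumption: scores are on a -10*log10(P) scale, so every 10 score
    units above the threshold reduce the expectation value tenfold.
    """
    return p_threshold * 10 ** ((threshold_score - score) / 10)


# Hypothetical example: threshold score of 67 at p < 0.05.
at_threshold = expect_value(67, 67)      # equal to the threshold probability
ten_above = expect_value(77, 67)         # ten times smaller
```

Under this assumption, a hit scoring well above the threshold has an expectation value far below 0.05, while a hit in the shaded region of the histogram has an expectation value above it.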

The Concise Protein Summary is intended to be brief, including only the most important information. If you want to see details of individual mass matches for all proteins, use the format controls to switch to the Protein Summary. Or, for a selected protein, click on the accession string link to load a Protein View.

Search Parameters

At the foot of the report, the search parameters are summarised. Descriptions of individual search parameters can be found here.

Protein Summary

Sections of the report are described in the order in which they appear. Use this link to open an example report in a new browser window or tab.

Header

At the top of the report are a few lines to identify the search uniquely: search title, date, user name, etc. The database version is identified with either a release number or an ISO datestamp. The score, accession and description for the top scoring protein hit are listed.

If the search included the auto-decoy option, false discovery rate information is displayed at this location.

Score Distribution

Following the header, a histogram illustrates the protein score distribution. The 50 best matching proteins are divided into 16 bins according to their score, and the heights of the bars show the number of matches in each bin.

The protein score is a measure of the statistical significance of a PMF match. The region in which random matches may be expected is shaded green. This region extends up to the significance threshold, which has a default setting of 5%. If a score falls in the green shaded area, there is greater than a 5% probability that the match was a random event, of no significance. Conversely, a match in the unshaded part of the histogram has less than a 5% probability of being a random event. It is quite common to see several proteins getting the same high score. Even if the protein sequences in the database are non-identical, the same group of matched mass values may occur in multiple proteins.

Format Controls

These controls enable the report format to be modified. After making changes, press the "Format As" button to reload the report using the new settings.

For a peptide mass fingerprint search, there are just three controls:

  • Report format: Choose from the list of available formats.
  • Significance threshold: The default significance threshold is p < 0.05. You can change this to any value in the range 0.99 to 1E-18.
  • Maximum number of hits: This value was initially chosen when the search was submitted. Enter a positive integer if you wish to re-specify the number of protein hits to report. Of course, the total number of hits actually found by the search may be less. The maximum number of hits saved to the result file is 50. Entering the word AUTO or a value of 0 will display all of the hits that have a protein score exceeding the significance threshold, plus one extra hit.

(You may see an Overview Table at this position)

Repeating a search

A search can easily be repeated, so as to investigate the effect of changes in search parameters. Choose Re-Search All to repeat the search with all mass values or Search Unmatched to repeat the search with only the mass values that did not get a match in the top hit. This could be a way of investigating whether the sample was a protein mixture, although Mascot has a built-in PMF mixture mode. If there is statistically significant evidence for a second or even a third protein, this will appear in the result report.

Index

Each accession string is a hyperlink that jumps down to the corresponding protein hit in the body of the report.

Protein Hit List

The body of the report contains a tabular summary of the best matching proteins. The number of proteins shown is specified in the search form, up to a maximum of 50.

For each protein, the first line contains the accession string (linked to the corresponding Protein View), the protein molecular mass, and the protein score. Expect is the number of times we would expect to obtain an equal or higher score, purely by chance. The lower this expectation value, the more significant the result. The number of mass values matched to the protein completes the first line. The second line is the protein description taken from the FASTA entry. This is followed by a table summarising the matched peptide masses. The table columns contain:

  1. Experimental m/z value
  2. Experimental m/z transformed to a relative molecular mass
  3. Relative molecular mass calculated from the matched peptide sequence
  4. Difference (error) between the experimental and calculated masses
  5. Inclusive numbering of the residues, starting with 1 for the N-terminal residue of the intact protein
  6. Number of missed cleavage sites
  7. Sequence of the peptide in 1-letter code. The residues that bracket the peptide sequence in the protein are also shown, delimited by periods. If the peptide forms the protein terminus, then a dash is shown instead.

If any variable modifications were used to get the mass match, these are listed after the sequence string. Note that you should not take this as evidence for the presence of any post-translational modification. Individual mass matches in a PMF can be chance events.
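Some of the table columns can be reproduced with a few lines of code. The sketch below assumes that each experimental m/z value is a protonated ion of known charge, and that the error is reported as experimental minus calculated mass. Both are assumptions for illustration, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_to_mr(mz, charge=1):
    """Column 2: convert an experimental m/z to a relative molecular mass.

    Assumes the ion carries `charge` protons.
    """
    return (mz - PROTON_MASS) * charge


def mass_error(mr_experimental, mr_calculated):
    """Column 4: difference between experimental and calculated masses.

    Assumed sign convention: experimental minus calculated.
    """
    return mr_experimental - mr_calculated


def flanked_sequence(protein, start, end):
    """Column 7: peptide sequence with its bracketing residues.

    start and end are inclusive residue numbers, counting from 1 at the
    N-terminus of the intact protein (column 5). A dash replaces the
    bracketing residue when the peptide lies at a protein terminus.
    """
    before = protein[start - 2] if start > 1 else "-"
    after = protein[end] if end < len(protein) else "-"
    return f"{before}.{protein[start - 1:end]}.{after}"
```

For a made-up protein sequence MAKPEPTIDERGK, residues 4 to 11 are formatted as K.PEPTIDER.G, and residues 1 to 3 as -.MAK.P.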

Underneath the table, any unmatched mass values are listed as a comma separated string.

Unless you particularly want to see details of the individual mass matches for every protein in the hit list, the default Concise Protein Summary may be a better choice.

Search Parameters

At the foot of the report, the search parameters are summarised. Descriptions of individual search parameters can be found here.

Overview Table

The (optional) overview table provides an animated summary of the results. This feature is deprecated and cannot be selected in the search form. You are unlikely to see it unless using older client software that requests this feature.

Each row of the overview table represents a peptide, while each column represents a protein. Where a protein contains a mass match, the table cell contains an LED style indicator. This indicator will light up when it is under the mouse cursor, along with all the other indicators in the row that correspond to the same peptide. Even when the sequence database is non-identical, there may still be extensive homology between entries, and the overview table indicators provide a rapid means of identifying which peptides are common to which proteins.

In addition to lighting up the indicators, moving the mouse cursor over a cell displays the query title (if any), the protein accession number, and the peptide sequence in the three text fields above the table. Clicking on one of the indicators will load a Protein View for the selected protein. Clicking on a column header cell will jump down the page to the corresponding protein hit in the body of the report.

The cells in the first column of the overview table identify each query by the experimental m/z value of the peptide. When the mouse cursor is moved over these cells, the query title (if any) is displayed in one of the text fields above the table. Each cell also contains a check box, which can be used to select a sub-set of the mass values for a repeat search. The repeat search buttons are modified accordingly, offering the choices Select All, Select None, and Search Selected.

PMF Mixture Mode

Although it is essential to use MS/MS when dealing with a complex mixture or looking for a minor component, it is sometimes possible to detect simple mixtures using PMF. Mascot PMF searches automatically test for the possibility that the sample is a mixture of proteins, and any statistically significant protein mixture will be reported.

Mascot scores the match for the complete set of experimental mass values to the in silico digest products of the putative protein mixture. It isn’t a subtractive approach, where the strongest match to a single protein is found, the matched values are removed, and the remainder used to search for the next protein. It would be very difficult to provide a true probability-based score for the subtractive approach. Also, the subtractive approach is less sensitive because, in large data sets, there are likely to be several shared mass values that match to more than one of the proteins in the mixture. In theory, the algorithm can detect a six-component mixture, but we have never seen a real-life example. You are very unlikely to see more than 3 components in real data, even with excellent signal to noise, coverage, and mass accuracy.
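The effect of shared mass values on a subtractive approach can be seen in a toy example. This sketch only counts matches and uses exact labels in place of mass values with tolerances. It is not the Mascot scoring algorithm, and all names are hypothetical.

```python
def subtractive_counts(experimental, digests):
    """Match proteins one at a time, removing matched values each round.

    digests maps a protein name to the set of its digest masses.
    """
    remaining = set(experimental)
    todo = dict(digests)
    counts = {}
    while todo:
        # Pick the protein with the most matches among what is left.
        best = max(todo, key=lambda name: len(remaining & todo[name]))
        matched = remaining & todo.pop(best)
        counts[best] = len(matched)
        remaining -= matched  # shared values are lost to later proteins
    return counts


def joint_counts(experimental, digests):
    """Match every protein against the complete set of values."""
    observed = set(experimental)
    return {name: len(observed & masses) for name, masses in digests.items()}


experimental = {1, 2, 3, 4, 5, 6, 7}
digests = {"A": {1, 2, 3, 4, 7}, "B": {3, 4, 5, 6}}
subtractive = subtractive_counts(experimental, digests)
joint = joint_counts(experimental, digests)
```

Here values 3 and 4 are shared. Once they have been assigned to protein A and removed, the subtractive approach finds only two matches for protein B, whereas matching against the complete set finds four.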

Use this link to open an example report in a new browser window or tab.

Switch to the Protein Summary report to see which masses match to which protein. Searching for mixtures is disabled if an intact protein mass is specified, because this can create artefacts.

Combination Searches

Combination searches are where the data include both MS/MS spectra and molecular mass values. If the results from such a search are viewed using a Protein Summary report, the protein scores will contain contributions from both the matching of MS/MS spectra to peptide sequences and the matching of peptide molecular masses to proteins.

Typically, a peptide mass fingerprint has a similar information content to a single MS/MS spectrum. If you have good coverage for a particular protein, chances are you will also have several good MS/MS spectra from the protein, so the score contribution from the PMF matching is not critical. On the other hand, if coverage for a particular protein is low, the peptide mass fingerprint score will also be poor, so it is of little use.

One situation where a combined search can be useful is when you have high coverage PMF data, plus very limited amounts of low quality MS/MS data. Then, the PMF score contribution may equal or exceed that of the ions score. However, including poor MS/MS data in a PMF search can work against you. Imagine that an MS/MS spectrum has a precursor mass match to a protein, but the MS/MS spectrum is nothing but noise, and gets random matches to peptides from other proteins. We must then say that this mass does not ‘belong’ to the protein, so there should be no contribution from the peptide mass to the PMF score. In other words, bad MS/MS data can degrade a PMF match. It is usually safer to discard the poor quality MS/MS data, and do a conventional PMF search.

Difficulty also arises when the sample contains more than one or two proteins. A Protein Summary is limited to 50 proteins because Mascot only saves PMF scores for the top 50. If the major components in the mixture are well represented in the database, the whole hit list could be occupied by variants of these proteins, excluding all the minor components. So, even if you have a good MS/MS match to a peptide from a minor protein, it may not appear in the report.

Combination searches are useful when you are trying to do something unusual, like locate exon-intron boundaries or splice variants. In such cases, you aren’t interested in the scores, just in whether a particular mass match distinguishes between two possibilities.

Note: By default, the information required to create a Protein Summary report is only saved to the result file for searches of 1000 queries or fewer. This is more than adequate for a PMF. It may not be sufficient for certain combination searches. If you need to increase this limit, search for SplitNumberOfQueries in the Setup & Installation manual. Increasing this limit will cause searches to use more memory, and may restrict the size of standard MS/MS searches or the number of simultaneous searches that can be run on your server.