
Mascot search overview

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.

While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These search methods can be categorised as follows:

Peptide Mass Fingerprint, in which the only experimental data are peptide mass values.
Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information. This is a superset of a sequence tag query.
MS/MS Ion Search, which uses uninterpreted MS/MS data from one or more peptides.

MS/MS data can be searched against both Fasta files and spectral libraries.

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.
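The digestion step can also be simulated on a known sequence. The following is a minimal sketch, assuming the commonly used trypsin rule (cleave C-terminal to K or R, but not when the next residue is P). The function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative and are not part of Mascot.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides produced by cleaving after K or R, but not before P.

    Assumed rule for illustration; real enzyme definitions may differ.
    """
    if not sequence:
        return []
    # Collect cleavage positions: the index just after each qualifying K/R.
    sites = [0]
    for i, residue in enumerate(sequence):
        has_next = i + 1 < len(sequence)
        if residue in "KR" and has_next and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    # Join up to `missed_cleavages` adjacent fragments into longer peptides.
    peptides = []
    for start in range(len(sites) - 1):
        last = min(start + 2 + missed_cleavages, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides
```

For example, `tryptic_digest("AAKPGGRCCKDD")` leaves the K before P uncleaved and yields three peptides. Allowing missed cleavages adds the longer, partially digested peptides that a real digest often contains.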

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
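To illustrate the kind of structural information a fragment ion spectrum carries, the sketch below calculates singly charged b and y fragment ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It is an illustration only: which ion series are actually observed depends on the instrument, and the names `MONO` and `fragment_ions` are assumptions made for this example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b ions contain the N-terminal residues; y ions contain the
    C-terminal residues plus one water. No modifications are considered.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

Because I and L have the same residue mass, they cannot be distinguished by these values alone.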

The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
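The comparison step can be sketched in a few lines. This example simply counts how many experimental masses fall within a tolerance of the calculated masses for each database entry and ranks the entries by that count. It is not Mascot's scoring algorithm; the function names, the `tolerance_da` parameter, and the shape of the `database` dictionary are assumptions made for illustration.

```python
def count_matches(observed, calculated, tolerance_da=0.5):
    """Count observed masses within tolerance of any calculated mass."""
    return sum(
        any(abs(obs - calc) <= tolerance_da for calc in calculated)
        for obs in observed
    )


def rank_entries(observed, database, tolerance_da=0.5):
    """Rank database entries by number of matched masses, best first.

    `database` maps an entry name to its list of calculated masses.
    A plain match count stands in for a real scoring algorithm.
    """
    scores = {
        name: count_matches(observed, masses, tolerance_da)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A plain count favours large proteins, which generate more calculated masses and therefore more chance matches. This is one reason an appropriate scoring algorithm is needed in practice.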

The sequence databases that can be searched on the Matrix Science free, public Mascot server are:

Fasta sequence databases:

SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical. SwissProt is ideal for peptide mass fingerprint searches and MS/MS searches of well characterised organisms, where it isn’t essential to match every single spectrum.

EMBL EST divisions contain "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. There are 10 divisions:

Environmental_EST
Fungi_EST
Human_EST
Invertebrates_EST
Mammals_EST
Mus_EST
Plants_EST
Prokaryotes_EST
Rodents_EST
Vertebrates_EST
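
The six-frame translation applied to these nucleic acid databases can be sketched as follows, assuming the standard genetic code. Marking stop codons with `*` and unrecognised codons with `X` is a convention chosen for this example; how Mascot itself handles them is not described here. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of `dna`; 'X' if unknown."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return translations of the 3 forward and 3 reverse reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement strand
    return [
        translate(strand[offset:])
        for strand in (dna, reverse)
        for offset in range(3)
    ]
```

Calling `six_frame_translate("ATGGCCAAATGA")` returns six protein strings, the first of which is `MAK*`.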

contaminants is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.

cRAP is a database of common contaminants compiled by the Global Proteome Machine Organization.

Selected UniProt proteomes

Database name	Organism	Taxonomy ID	UniProt ID	Coverage
UP6548_A_thaliana	Arabidopsis thaliana (Mouse-ear cress) (Strain: cv. Columbia)	3702	UP000006548	99.6%
UP9136_B_taurus	Bos taurus (Bovine) (Strain: Hereford)	9913	UP000009136	98.0%
UP1940_C_elegans	Caenorhabditis elegans (Strain: Bristol N2)	6239	UP000001940	99.7%
UP6906_C_reinhardtii	Chlamydomonas reinhardtii (Chlamydomonas smithii) (Strain: CC-503)	3055	UP000006906	96.0%
UP437_D_rerio	Danio rerio (Zebrafish) (Brachydanio rerio) (Strain: Tuebingen)	7955	UP000000437	96.9%
UP2195_D_discoideum	Dictyostelium discoideum (Slime mold) (Strain: AX4)	44689	UP000002195	96.0%
UP803_D_melanogaster	Drosophila melanogaster (Fruit fly) (Strain: Berkeley)	7227	UP000000803	99.3%
UP625_E_coli_K12	Escherichia coli (strain K12) (Strain: K12 / MG1655 / ATCC 47076)	83333	UP000000625	100.0%
UP219602_F_oxysporum	Fusarium oxysporum f. sp. radicis-cucumerinum (Strain: Forc016)	327505	UP000219602	98.5%
UP5640_H_sapiens	Homo sapiens (Human)	9606	UP000005640	99.5%
UP589_M_musculus	Mus musculus (Mouse) (Strain: C57BL/6J)	10090	UP000000589	99.7%
UP808_M_pneumoniae	Mycoplasma pneumoniae (strain ATCC 29342 / M129)	272634	UP000000808	75.9%
UP59680_O_sativa	Oryza sativa subsp. japonica (Rice) (Strain: cv. Nipponbare)	39947	UP000059680	87.0%
UP8311_R_communis	Ricinus communis (Castor bean)	3988	UP000008311	90.5%
UP2494_R_norvegicus	Rattus norvegicus (Rat) (Strain: Brown Norway)	10116	UP000002494	97.8%
UP2311_S_cerevisiae	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker’s yeast)	559292	UP000002311	98.9%
UP2485_S_pombe	Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)	284812	UP000002485	97.8%
UP8227_S_scrofa	Sus scrofa (Pig) (Strain: Duroc)	9823	UP000008227	96.2%
UP241690_T_harzianum	Trichoderma harzianum CBS 226.95	983964	UP000241690	98.6%
UP5226_T_rubripes	Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)	31033	UP000005226	95.2%
UP279841_T_thermophilus	Thermus thermophilus	274	UP000279841	85.5%
UP186698_X_laevis	Xenopus laevis (African clawed frog) (Strain: J)	8355	UP000186698	95.6%
UP7305_Z_mays	Zea mays (Maize) (Strain: cv. B73)	4577	UP000007305	96.4%

Spectral library databases:

NIST_Mouse_IonTrap

NIST_S.cerevesiae_IonTrap

PRIDE_Contaminants

PRIDE_Human