This help document answers the following questions about the batch query tool:
Use the MGI Batch Query to retrieve data about many MGI genes simultaneously. Currently, the tool retrieves:
Future versions may include additional options.
Given a set of gene symbols or input identifiers (e.g. MGI accession, RefSNP, VEGA IDs) from a spreadsheet, you can:
You can also use the batch query to find duplications, ambiguities, and variations in data. You could check, for example, that an input list of MGI identifiers returns the same number of Ensembl, Entrez Gene, and VEGA IDs on different days or that switching the input and output IDs returns identical data. See Are there any examples? for additional ways you might use this query.
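The checks described above can be sketched in a few lines of Python, assuming you have already extracted the IDs of interest from two sets of results into plain lists. The function names `find_duplicates` and `compare_runs` are hypothetical helpers for illustration and are not part of the Batch Query.

```python
from collections import Counter


def find_duplicates(ids):
    """Return IDs that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [identifier for identifier, n in counts.items() if n > 1]


def compare_runs(first_run, second_run):
    """Report IDs that appear in only one of two result lists."""
    first, second = set(first_run), set(second_run)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }
```

If both lists in the `compare_runs` result are empty, the two runs returned the same set of IDs.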
Note: Be sure to remove any quotation marks or other non-alphanumeric characters from any list you enter or upload. The only valid delimiters are tab, comma/space, space, and new line.
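If you prepare your list with a script, a minimal Python sketch of this clean-up might look like the following. It removes straight and curly quotation marks and splits on the delimiters listed above. It is an assumption of this sketch that punctuation inside identifiers (the colon in MGI:96677, the hyphen in Pax-6) should be kept, and that single quotes should be removed along with double quotes. `clean_id_list` is a hypothetical helper name.

```python
import re

# Delimiters the note lists as valid: tab, comma, space, and new line.
DELIMITERS = re.compile(r"[\t, \r\n]+")
# Straight and curly quotation marks to strip (assumption: single quotes too).
QUOTES = str.maketrans("", "", "\"'\u201c\u201d\u2018\u2019")


def clean_id_list(raw_text):
    """Split pasted text into IDs, dropping quotation marks and empty items."""
    tokens = DELIMITERS.split(raw_text.translate(QUOTES))
    return [token for token in tokens if token]


# Example: a quoted, comma-separated line copied from a spreadsheet export.
pasted = '"MGI:96677", "Trp53",\t"Pax-6"\n16590'
cleaned = clean_id_list(pasted)
```

Joining `cleaned` with new lines gives a list that can be pasted into the input box.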
When typing or pasting IDs into the input box (you can copy and paste a column from an Excel spreadsheet):
For a given query, your list can be either mixed or all of the same type. You can also enter (mouse) gene symbols, synonyms, or orthologs.
MGI:96677 Trp53 Pax-6 P53 16590 ENSMUSG00000005672 OTTMUSG00000015949 247073 MI0000248 10379153
| Input identifier type | Representative example |
| --- | --- |
| MGI Gene/Marker ID | MGI:96677 |
| Current Symbols Only | Trp53, Pax6, D11Mit10 |
| All Symbols/Synonyms/Orthologs (includes both current and old symbols) | Pax6, Pax-6, Trp53, P53 |
| Entrez Gene ID | 16590 |
| Ensembl ID | ENSMUSG00000028530 |
| VEGA ID | OTTMUSG00000015949 |
| UniGene ID | 247073 |
| MiRBase ID | MI0000248 |
| GenBank/RefSeq ID* | NM_001122899, AK033644, NP_666257 |
| UniProt ID | P48356, A2AKJ2 |
| GO (Gene Ontology) | GO:0019221 |
| RefSNP ID | rs3021544 |
| Affy Probeset ID | 10379153 |

* GenBank IDs are for nucleotide sequences only. RefSeq IDs are for either nucleotide or protein sequences.

The Batch Query also recognizes the following additional ID types, which are not listed on the pulldown menu:

| Input identifier type | Representative example |
| --- | --- |
| EC (Enzyme Commission) | 2.7.10.1 |
| Homologene | 20151 |
| PDB (Protein Data Bank) | 1HU8 |
| Consensus CDS | CCDS16941.1 |
| Protein Ontology | PR:000004803 |
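To sort a mixed list before submitting it, you could guess the type of each identifier from its format. The Python sketch below is illustrative only: the patterns are inferred from the representative examples listed above and are an assumption, not the Batch Query's actual matching rules. `guess_id_type` is a hypothetical helper name.

```python
import re

# Patterns inferred only from the representative examples in this document;
# they are assumptions, not the Batch Query's actual matching rules.
ID_PATTERNS = [
    ("MGI Gene/Marker ID", re.compile(r"MGI:\d+")),
    ("Ensembl ID", re.compile(r"ENSMUSG\d+")),
    ("VEGA ID", re.compile(r"OTTMUSG\d+")),
    ("MiRBase ID", re.compile(r"MI\d+")),
    ("GO (Gene Ontology)", re.compile(r"GO:\d+")),
    ("RefSNP ID", re.compile(r"rs\d+")),
    ("Consensus CDS", re.compile(r"CCDS\d+(\.\d+)?")),
    ("Protein Ontology", re.compile(r"PR:\d+")),
    ("EC (Enzyme Commission)", re.compile(r"\d+\.\d+\.\d+\.\d+")),
]


def guess_id_type(identifier):
    """Return a likely input type, or None when the format is ambiguous."""
    for type_name, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return type_name
    # Bare numbers (Entrez Gene, UniGene, Homologene, Affy Probeset) and
    # gene symbols cannot be told apart by format alone.
    return None
```

Identifiers for which the sketch returns `None`, such as 16590 or 247073, are the ones whose type cannot be read from the format, so a single input type may be worth selecting for them.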
Yes.
Since the Batch Query's default option is to search all input types, you do not have to identify what you enter or upload into the Batch Query. The tool determines whether your entries are of one type or a combination of IDs and/or symbols. You may, however, select from the pulldown menu if you wish to constrain your query to a single type. See also When would I constrain my search to a single input type? below.
You may wish to select a single input type from the Type list when you want the Batch Query to return:
There is no limit to the number of identifiers that you can enter at once, but browsers differ in how much they can display, and very large files are subject to a time constraint.
Yes. You can customize your results in three areas: Gene Attributes, Additional Information, and Format. Click to make your selections.
| When you select... | the resulting columns contain... |
| --- | --- |
| Nomenclature | symbol, name, marker type |
| Genome Location | chromosome, start coordinate, end coordinate, strand, genome build |
| Ensembl ID | Ensembl ID |
| Entrez Gene ID | Entrez Gene ID |
| VEGA ID | VEGA ID |
| When you select... | the resulting columns contain... |
| --- | --- |
| Gene Ontology (GO) | ID, term, code. NOT annotations do not appear in the results. |
| Mammalian Phenotype Ontology (MP) | ID, term. Normal annotations do not appear in the results. See How do I interpret Mammalian Phenotype (MP) results? |
| Alleles | MGI allele identifier and allele symbols. |
| Gene Expression | Anatomical structure (Theiler stage and tissue name), assay results (total number of times the structure was examined, number of times expression was detected in the structure, number of times expression was found to be absent in the structure). These numbers summarize results obtained from wild type and mutant specimens. |
| RefSNP ID | RefSNP identifiers. Results include RefSNPs within 2 KB of the gene/marker. |
| GenBank/RefSeq ID | GenBank (nucleotide) or RefSeq (nucleotide or protein) sequence identifiers. |
| UniProt ID | UniProt sequence identifiers. |
| Human Disease (DO) | ID, term. See How do I interpret Human Disease (DO) results? |
Only one Additional Information option can be selected at a time, which keeps results to a reasonable size. The amount of data returned increases considerably when you select an Additional Information category. For example, if you enter symbols for 9 paired box genes (your input list is Pax1, Pax2, Pax3, ... Pax9), and you select:
| Nomenclature and … | the MGI Batch Query returns... |
| --- | --- |
| Genome Location | 9 rows, one for each gene, Pax1 - Pax9 |
| UniProt ID | 30+ matching rows |
| Gene Ontology (GO) | 200+ matching rows |
| Mammalian Phenotype (MP) | 650+ matching rows |
| Gene Expression | 1400+ matching rows |
| RefSNP ID | ~2900 matching rows |
Yes, you can.
See Are there examples? for a sample of a query modification.
MGI Batch Query results
All Batch Query results appear in the form of a table in either web (HTML) format, or in tab-delimited text depending on your Output selection.
- A summary of your query parameters appears at the top of the page under You searched for…. It lists:
- the total number of IDs/symbols you entered (or uploaded)
- the input identifier type of those IDs (e.g., Search all input types, MGI Gene/Marker ID, Current Symbols Only, and so on)
- your Output options (e.g., Nomenclature, Genome Location, UniProt ID, and so on)
- For each ID entered, at least one row is returned. Each row contains the input ID, its corresponding MGI gene/marker ID, and columns for whichever attributes or additional information you selected (see Are there choices for how to view query results?).
- If a gene has more than one associated identifier, a row is returned for each association (for example, there may be more than one Ensembl, Entrez Gene, or VEGA ID; many GO or MP terms; or many RefSNPs).
- Background row shading alternates by marker symbol. For example, if you were to enter Pax6 and Kit, all Pax6 associations would appear in one shading (e.g., light) and all Kit associations would appear in a contrast shading (e.g., dark).
- Entries in the Feature Type column (beneath Nomenclature) identify the category and/or subcategory of the marker (e.g., snoRNA Gene, QTL, Pseudogene, Complex/Cluster/Region, and so on).
- When you select Gene Ontology (GO), three columns are returned: ID (with an entry such as GO:0005525), Term (with an entry such as GTP binding), and Code (with an entry such as IEA). See Guide to GO Evidence Codes at the Gene Ontology website for the definitions.
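Because one input can produce many rows, it can help to regroup tab-delimited results by input when processing them outside the browser. The Python sketch below assumes that the first line of the text holds column names and that the first column holds the input ID; both are assumptions about the file layout, and the column names and values in the sample are placeholders, not real MGI data.

```python
import csv
import io
from collections import defaultdict


def rows_per_input(tsv_text):
    """Group result rows by the first column (assumed to be the input ID)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumes the first line holds column names
    grouped = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            grouped[row[0]].append(dict(zip(header, row)))
    return grouped


# Placeholder sample in the assumed layout; names and values are made up.
sample = (
    "Input\tMGI ID\tTerm\n"
    "GeneA\tMGI:PLACEHOLDER1\tterm one\n"
    "GeneA\tMGI:PLACEHOLDER1\tterm two\n"
    "GeneB\tMGI:PLACEHOLDER2\tterm one\n"
)
grouped = rows_per_input(sample)
```

Here `grouped["GeneA"]` holds two rows and `grouped["GeneB"]` holds one, mirroring the way the results table repeats a gene once per association.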
Mammalian Phenotype (MP) results
The resulting list of Mammalian Phenotype Ontology terms associated with a gene is a combination of all terms associated with all mutant alleles of that gene.
- Mammalian Phenotype (MP) terms appear by gene.
- Each term describes a mouse phenotype with some mutation in that gene.
- The term does not necessarily imply that mutations in that gene contribute to or cause the phenotype.
- Analyzed mice may have causative mutations in other genes.
- Wide phenotypic variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also MGI Batch Query results for information on the other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and Mammalian Phenotype terms associated with specific genotypes and strains.
Human Disease (DO) results
Human Disease terms appear by gene, followed by an ID and the Disease Ontology vocabulary term entry.
- Each term listed indicates that a mutant allele of this gene is involved in a mouse genotype used as a disease model.
- The term does not necessarily imply that mutations in that gene contribute to or cause the disease.
- Analyzed mice may have causative mutations in other genes.
- Wide variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also MGI Batch Query results for information on other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and view Human Disease terms as they are associated with specific allelic mutations and strains.
Gene Expression results
If there is expression data from a curated reference in the Gene Expression Database, the anatomical structure examined (listed by Theiler stage and structure name) appears, followed by columns indicating:
- the total number of assay results for this gene/tissue
- the number of positive assay results (+)
- the number of negative assay results (−).
- The detected counts include specimens recorded as ambiguous or not specified, as well as those recorded as present.
See also MGI Batch Query results for information on the other fields.
Use the Genes and Markers Query Form to find a gene of interest and view additional expression results (e.g., Literature Summary, Data Summary, Theiler Stages, Assay Types, cDNA source data, External Resources).
See Guide to GO Evidence Codes at the Gene Ontology website.
Note: If you get an error message before your query completes, try using a smaller list of IDs or selecting fewer output categories.
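One way to work with a smaller list is to split a long list into batches and submit each batch separately. The Python sketch below shows the idea; `chunked` is a hypothetical helper name, and the batch size is yours to choose, since this document does not state a specific limit.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Example: five IDs split into batches of at most two.
batches = list(chunked(["Pax1", "Pax2", "Pax3", "Pax4", "Pax5"], 2))
```

Each item in `batches` can then be pasted or uploaded as its own query.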
MGI curators add new annotations from the literature every day. Sequence data are downloaded from those databases weekly and undergo MGI curation. The MGI web site is updated with these and other new data once a week.
Note: The steps below may not work with some versions of the Firefox browser. Check their website for a workaround or use a different browser for saving MGI Batch Query results in Excel. The Genes and Markers Query Form also provides the option to forward your results directly to the Batch Query.
Note: If your initial data are not in tab-delimited or comma-separated format, copy and paste the file into a spreadsheet, save it in one of those formats, and then use the MGI Batch Query to upload the desired column (be sure to identify the proper File Type and column).
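As an alternative to a spreadsheet, a short script can rewrite a delimited text file as tab-delimited text. The Python sketch below assumes a semicolon-delimited source file. The function name `to_tab_delimited`, the file paths, and the default delimiter are illustrative assumptions, so adjust them to match your data.

```python
import csv


def to_tab_delimited(source_path, target_path, source_delimiter=";"):
    """Rewrite a delimited text file as tab-delimited text."""
    with open(source_path, newline="", encoding="utf-8") as source, \
            open(target_path, "w", newline="", encoding="utf-8") as target:
        reader = csv.reader(source, delimiter=source_delimiter)
        writer = csv.writer(target, delimiter="\t", lineterminator="\n")
        # Copy every row unchanged; only the delimiter differs in the output.
        writer.writerows(reader)
```

After converting, upload the new file and identify the proper File Type and column as described above.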