Human–Mouse: Disease Connection
Relating human diseases and mouse models

A brief tour of the Human-Mouse: Disease Connection

This document is intended as an overview. For more detailed explanations of query logic, see the Human–Mouse: Disease Connection user help documentation.

Table of contents

  1. The Human–Mouse: Disease Connection (HMDC)
  2. What can you do with the HMDC?
  3. Beginning your search
  4. Search results
    1. Gene homology x Phenotypes/Diseases tab
    2. Genes tab
    3. Diseases tab
  5. Uploading Variant Call Format (vcf) files


The Human–Mouse: Disease Connection

The goal of the Human–Mouse: Disease Connection (HMDC) is to provide seamless human-to-mouse data traversal. It enables clinical and translational researchers to take advantage of the wealth of data and annotations from mouse models, and allows mouse researchers to connect their findings directly to genetic associations reported in human disease.

The mouse is genetically and physiologically similar to humans, is tractable as a laboratory animal, has a fully sequenced and well-annotated genome, and a readily available set of powerful molecular technologies for manipulating its genome in very precise ways. Now, clinical researchers whose primary focus is on human genetic disease, variants, and natural mutations have a highly-accessible way to explore experimentally characterized mouse mutants for a spectrum of associated phenotypes, as well as known disease models developed by the greater research community. Investigators can begin with genes (symbols, names or IDs) or gene lists, genome positions (as coordinates from human or mouse, or as .vcf files), OMIM diseases or phenotypes, and retrieve a list of genes, annotated mammalian phenotypes, associated human diseases and/or available mouse models for further research, along with comprehensive supporting references. If you are new to working with mouse models, or encounter terms which are unfamiliar to you, we also encourage you to browse the Introduction to Mouse Genetics in order to aid with the interpretation of your results.

What can you do with the HMDC?

This tool can be used to:

  • prioritize candidate variants from whole genome sequencing based on previously reported disease associations and observed phenotypes
  • prioritize candidates within a genomic region (QTLs, GWAS, association or family based studies)
  • identify mouse genes that exhibit a phenotype of interest
  • identify human and mouse genetic associations to disease
  • find an experimental mouse model to use in the study of a specific human disease

Beginning your search

Entry into the HMDC is designed to be straightforward. On the homepage, three boxes appear where investigators may choose to enter (1) genes, either individually or as a list, (2) a genome location or set of genomic regions, and/or (3) disease name(s) or mammalian phenotype(s).

  1. Searching genes:
    Gene-based queries will match official symbols, full gene names and synonyms in both human and mouse. An asterisk (*) can be used as a wildcard at the start or end of a term. Separate multiple entries using commas, spaces or new lines.
  2. Searching genome locations:
    To search positions, please ensure that you are using the current genome assembly (build) and have selected the appropriate species using the radio buttons above the search window. If you need to convert your data, a simple online remapping tool has been provided by NCBI. Currently only base pair (bp) positions are supported, not megabase (Mb) or linkage positions (e.g. centimorgans or band position). Separate multiple entries using spaces, commas or new lines; please do not use commas within coordinate positions, as these will be misinterpreted as item separators for independent genome locations.
  3. Searching disease and phenotype terms:
    Disease and phenotype terms may be searched using text-matching by simply typing in the box, or you can take advantage of structured vocabulary by selecting from the autocomplete list that appears. Multiple entries are supported. Results must match at least one term, but are not required to match all terms. Use the filters once the grid has been generated to restrict your list. The shaded grey text to the right of autocomplete terms indicates the vocabulary that an annotation applies to:
    • Mammalian Phenotype: will match all genes where a mouse model has been reported to exhibit this phenotype. These are hierarchical, so selecting a broad general term will also bring back genes annotated to more specific child terms. See the Mammalian Phenotype (MP) browser.
    • OMIM: from the Online Mendelian Inheritance in Man database. Contains human diseases with associated human genes and is cross-referenced externally and within MGI for mouse genetic models of these human diseases.
      • Some terms are very specific and may reference only a subtype of the disease (ex. "Alzheimer disease, familial, 5"). In these cases, it may be preferable to use the text-based matching without selecting an autocomplete value.
  4. Upload a vcf:
    Variant call format (.vcf) file upload is also supported for phenotype and disease annotation. This tool is not equipped to do functional analysis or filtering of variants, so we recommend uploading a trimmed candidate list. See Exomiser for a filtering tool designed to process human data. The default is to discard all variants with a known dbSNP identifier (rs#). If you want to keep these variants in your results, clear the contents of the third column (ID), but do not delete the column, so as to preserve the standard column structure. See the section on vcf files for more information.
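
The input rules above for gene and location queries can be sketched in a few lines of Python. This is only an illustration of the separators and wildcard behaviour described in this section, not the HMDC's own code. The function names and the coordinate layout shown ("Chr12:3,000,000-10,000,000") are assumptions made for the example.

```python
import re


def split_entries(text):
    """Split a query on commas, spaces or new lines, dropping empty items."""
    return [item for item in re.split(r"[,\s]+", text) if item]


def wildcard_match(pattern, symbol):
    """Case-insensitive match; '*' may appear at the start or end of the pattern."""
    p, s = pattern.lower(), symbol.lower()
    leading, trailing = p.startswith("*"), p.endswith("*")
    core = p.strip("*")
    if leading and trailing:
        return core in s
    if leading:
        return s.endswith(core)
    if trailing:
        return s.startswith(core)
    return s == core


def strip_thousands_separators(coordinate):
    """Remove commas inside numbers so they are not read as item separators.

    Assumes a single coordinate string in a hypothetical layout such as
    'Chr12:3,000,000-10,000,000'.
    """
    return re.sub(r"(?<=\d),(?=\d{3}\b)", "", coordinate)
```

For example, split_entries("trp53 proc, apc") yields the three separate gene entries, and strip_thousands_separators should be applied to each coordinate before several coordinates are joined into one query.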

For many experimental questions, a single search box will be sufficient, but if two boxes have search terms entered, the results will reflect cases matching on both categories (boolean AND). While this can be an efficient shortcut, if a very precise phenotype or disease annotation is used, some potentially relevant results may be omitted.
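
As a rough illustration of this logic, the sketch below treats each search box as a set of matching genes. Terms within the disease/phenotype box are combined with OR, and boxes that were filled in are combined with AND. The function names and the example gene sets are invented for the illustration and do not reflect real annotations or the HMDC's implementation.

```python
def match_any_term(hits_by_term):
    """Union of gene sets: a gene needs to match at least one term (OR)."""
    return set().union(*hits_by_term.values())


def combine_boxes(*box_results):
    """Intersection of the boxes that were used (AND); None means an empty box."""
    used = [set(result) for result in box_results if result is not None]
    if not used:
        return set()
    return set.intersection(*used)


# Made-up example data, for illustration only.
gene_box = {"Trp53", "Apc", "Cdkn2a"}
term_box = match_any_term({"term A": {"Apc", "Kras"}, "term B": {"Trp53"}})
combined = combine_boxes(gene_box, None, term_box)  # location box left empty
```

Here Cdkn2a drops out of the combined result because it matched the gene box only, which is how a very precise term in a second box can hide potentially relevant genes.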

Whichever search box is used, the same set of results tables will appear. To follow along with this example, enter “trp53 proc apc cdkn2a” into the Genes search box (not case sensitive).


Search Results

Along the top, an orange banner remembers your original query and allows you to quickly modify your results by adding, removing or replacing search terms. A second banner below serves as a header for your results, and indicates which type of matching was run. The results themselves are arranged into three tabbed tables for: Gene homology x Phenotypes/Diseases (shown), Genes, and Diseases.



1. Gene homology x Phenotypes/Diseases tab
    On this tab, a list of gene homologs (column 1: human, column 2: mouse) and gene-associated phenotype (left side) or disease (right side) terms which matched your search are returned as an interactive grid. Gene rows are only returned where at least one phenotype or disease term has been annotated, and columns are only displayed if at least one gene association has been reported. Matching transgenes (denoted by "Tg(promoter-gene)lab_code"), where an artificial construct has been introduced and expressed in a live mouse will also appear, but only in the mouse column. This table compresses multiple alleles of a gene (if they exist) into a single row.
  • Filtering: Results can be filtered by clicking the boxes that appear adjacent to each row and above each column. This will filter all tabs.
    • Click to place a checkmark in the rows and/or columns you would like to keep and click on the "Apply Filters" button or filter icon in the top left of the grid. If a physiological system, disease or gene symbol match is not relevant to your experimental question, leave them unselected to hide.
    • Filters can be removed by clicking on the "Remove filters" note which will appear above all tabs, just below the orange Results header bar.
  • Left side of grid: If a mutant allele of a gene has been reported to affect a particular anatomical/physiological system, a systems level phenotype column will appear with a blue filled cell at the intersection of the gene and phenotype. These columns use a hierarchical structured vocabulary; navigate the Mammalian Phenotype browser to find specific terms, or click on the filled box to see more details.
    • Darker shades of blue indicate more annotations to aberrant phenotypes within this anatomical/physiological system
    • Clear cells indicate no data.
    • If an aspect of a system has been specifically examined in the context of mouse gene mutants and found to be normal, an N will appear (with blue background fill).
    • If researchers determined that the mouse carrying a particular mutation appeared overall ‘normal’, a normal phenotype column will be displayed on the far right of the Mammalian Phenotypes section
  • Filled boxes on Mammalian Phenotypes side: Click individual boxes to generate a pop-up window with genetic and phenotypic details. The specific allele pairs and exact phenotypes will be displayed. Clicking on an allele symbol or row will generate a new window with the complete list of all phenotype annotations in all systems with supporting references (J:#s). For help interpreting Allele Detail pages see here.
  • Right side of grid: If mutations in this gene have been associated with a human disease or reported as a mouse model of a human disease, columns will appear to the right with colored fill indicating the species.
    • Orange filled cells are used if the Gene x Disease association is supported by human data. Human data annotations come from OMIM, NCBI curation, Gene Reviews, or Gene Tests.
    • If mutant mice have been reported as genetic models for this disease, a blue fill will be used. MGI curators annotate this data based on author statements in peer reviewed publications.
    • In cases where both mouse genetic models and human clinical cases support involvement of orthologous (same gene in different species) genes, a two-toned fill will appear.
    • Clear indicates that the intersection of a given gene and human disease has not been reported.
  • Filled boxes on Human Disease associations side: Click individual boxes to generate a pop-up with genetic and disease details. The specific allele pairs used to model the disease in mice will be displayed. Clicking on an allele symbol or row will generate a new tab with the complete list of all phenotype annotations in all systems within this disease model, along with supporting references (J:#s). Find disease-specific references by clicking on the J:# in the disease box, as well as links to a Human Disease and Mouse Model Detail page by clicking on the disease name, or the OMIM entry for that disease by clicking on the OMIM ID.
  • Unexpected extra diseases may appear in the grid if a phenotype or disease term was used.
    For example, a Disease search for "Alzheimer" will also return "Breast Cancer" and "Schizophrenia" on the grid. These are returned because the specific allele pair(s) which match models for "Alzheimer Disease" have also been reported to exhibit the characteristic phenotypes of these other diseases: Cav1<tm1Mls>/Cav1<tm1Mls> for both Alzheimer Disease and Breast Cancer, and Plcb1<tm1Hssh>/Plcb1<tm1Hssh> for Alzheimer Disease and Schizophrenia. Use the filters to hide these if they are not of interest, or go to the Diseases tab to see only those diseases which matched the original search terms with their gene annotations.
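
The fill rules for the right side of the grid, described above, can be summarized as a small decision function. This is a reading aid only. The function name and return labels are invented here and are not part of the HMDC.

```python
def disease_cell_fill(human_support, mouse_support):
    """Return the fill used for a Gene x Disease cell on the right side of the grid.

    human_support: the association is supported by human data
    mouse_support: mutant mice have been reported as a model of the disease
    """
    if human_support and mouse_support:
        return "two-toned"
    if human_support:
        return "orange"
    if mouse_support:
        return "blue"
    return "clear"
```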


2. Genes tab
    On the Genes tab, the complete list of genes will be returned, with human and mouse homologs listed on separate rows (see column 1: Organism). You may also note that human gene standard nomenclature is in uppercase (ex. AMER1) while mouse genes are written in sentence case (ex. Amer1). Genes matching your query but with no reported phenotypes or diseases will be included on this tab only, so you may see genes here that do not appear on the grid view.
  • To filter this list, apply filters on the "Gene Homology x Phenotypes/Diseases" tab.
  • This table may be sorted by using the arrows in certain column headers, and data on this table (filtered or unfiltered) can be downloaded using the button indicated. Columns in the download are tab delimited and multiple "Associated Human Diseases" or "Abnormal Mouse Phenotypes Reported in these Systems" terms are pipe separated (|).
  • Click on the Gene Symbol in the second column to go to a mouse Gene Detail Page on MGI in rows where "Organism: mouse", or to a Vertebrate Homology Page in rows where "Organism: human". Example: mouse Apc and human APC.
  • The Associated Human Diseases column in a row that corresponds to Organism: mouse will list diseases where mutant mice have been reported to display phenotypes and symptoms matching the human disease. This corresponds to blue filled cells in the right half of the "Gene homology x Phenotypes/Diseases" grid on the previous tab.
  • The Associated Human Diseases column in a row that corresponds to Organism: human indicates that mutations have been reported in human clinical cases of this disease, or that variant associations have been made in human populations. This corresponds to orange filled cells in the right half of the grid.
  • The References in MGI column provides a list of all MGI-curated references for a gene, as well as a sub-list of Disease Relevant publications, where a mouse was specifically reported by the authors as a model for one of the diseases represented in the Associated Human Diseases column. Reference pages contain the full citation, abstract, curated data, and direct links to the paper itself. These references will focus on mouse genetic models as MGI does not curate human-only data.
  • The Find Mice (IMSR) column on the Genes tab contains links to the International Mouse Strain Resource (IMSR) which is a database indexing the major public and commercial mouse repositories. The hyperlinked number indicates how many mouse strains carrying mutant alleles of a gene are available for purchase. This may link to multiple unique alleles and allele types. Use the Alleles column to see which allele is present before placing an order to ensure that the expected phenotypes will be found. Order forms for each strain are linked in the Repository column in the IMSR (see arrow). If the "Find Mice (IMSR)" column on the Genes tab is blank, or the repositories listed with the IMSR do not distribute your preferred model, it may be possible to obtain mice by directly contacting the corresponding author of a publication which developed the line. See the Original Reference at the bottom of MGI's Allele Detail pages.
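
The download from the Genes tab, described above as tab delimited with pipe-separated (|) multi-value cells, can be read with Python's standard csv module. The sketch below assumes that the first line of the file is a header row whose names match the column titles shown on the page. That assumption has not been checked against an actual download, so adjust the names if your file differs.

```python
import csv

# Column titles taken from this page; assumed to match the download header.
MULTI_VALUE_COLUMNS = (
    "Associated Human Diseases",
    "Abnormal Mouse Phenotypes Reported in these Systems",
)


def read_genes_download(handle):
    """Read the tab-delimited download and split pipe-separated cells into lists."""
    rows = []
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for column in MULTI_VALUE_COLUMNS:
            if column in row:
                cell = row[column] or ""
                row[column] = [v.strip() for v in cell.split("|") if v.strip()]
        rows.append(row)
    return rows
```

A typical call would be read_genes_download(open("genes_download.txt", newline="")), where the file name is a placeholder for wherever you saved the download.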

3. Diseases tab
    Similar to the Genes tab, the Diseases tab returns a complete list of gene-associated human disease terms, compiled from both human data and mouse models. If a gene or transgene is listed in the Associated Mouse Markers column, this indicates a mouse disease model has been reported for that Gene x Disease pair. If a gene is listed in the Associated Human Markers column, a variant of this gene has been implicated in human association studies or clinical cases. If an OMIM autocomplete term was used as the starting search term, the list will be restricted to the disease(s) specified, even if no gene associations have been reported. If a phenotype term, gene symbol or position was used to run the search, all diseases associated with the matched genes will be displayed.
  • The data displayed on this table can be downloaded using the button indicated.
  • The references column contains a list of publications where authors have specifically reported that mutant or transgenic mice have been used to model the characteristic phenotypes of the disease.
  • Click on the hyperlinked disease term to go to a Human Disease and Mouse Model Detail page where gene links and genotype or allele specific information can be viewed by clicking in the Mouse Models column.
    • The Human Disease and Mouse Model Detail page on MGI displays genes associated with human data or mouse models as before, this time delineated into up to three classes for "Associated in both", "Associated in mouse models", and "Associated in human" indicated by the human and/or mouse graphics.
    • The OMIM entry for this disease can be reached by clicking on the OMIM ID number just beneath the header.
    • Clicking on the links in the Mouse Models column will reveal details of the mouse genotype used as model, at least one supporting reference and a link to a view of all phenotypes that have been observed in mice with this combination of mutant alleles and transgenes (where applicable).

Uploading a vcf File

The variant call format (vcf) is a file format used to store sequence variants by position. Typically, these files are the result of whole genome or exome sequencing. The file must include:

  • Column 1: CHROM - chromosome number
  • Column 2: POS - variant position in basepairs. PLEASE ENSURE THAT THESE POSITIONS ARE BASED ON GRCh37 (HUMAN) OR GRCm38 (MOUSE). [Convert]
  • Column 3: ID - if the variant is known, a refSNP or other reference identifier will appear. If unknown, this column may be blank or contain "."
  • Column 4: REF - the reference allele based on alignment to a reference genome.
  • Column 5: ALT - the alternative allele that was detected in the sample
  • Column 6: QUAL - a quality score for sequence reads and base calling
  • Column 7: FILTER - indicates whether the call is of sufficient confidence for the filters and thresholds applied during analysis. This column may contain "PASS" or some other quality value, or it may be blank or contain "."
  • Column 8: INFO - includes descriptors of the variant
  • Column 9: FORMAT - includes descriptors of the genotyping depth and quality
  • Column 10... Sample1 - sample data. If more than one individual was sequenced, each sample is arranged in a new column.
If you edit your file, please clear column data rather than removing columns.
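
As a minimal sketch of clearing a column without removing it, the Python generator below replaces the contents of column 3 (ID) with ".", the value this page lists for an unknown ID. Header lines that start with "#" are passed through unchanged. The function name is invented for this example, and the snippet assumes tab-separated data lines.

```python
def clear_id_column(lines):
    """Yield VCF lines with column 3 (ID) set to '.', keeping every column in place."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and header lines are left untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            fields[2] = "."
        yield "\t".join(fields) + "\n"
```

To rewrite a file, open the input and an output file and pass the result to writelines, for example out.writelines(clear_id_column(src)).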

See sample files for: Human and Mouse.

In this beta release...

The HMDC does not offer filtering or variant-specific analyses in this initial release. Instead, it applies phenotype and disease terms to the genes and homologs represented in the file. Note that this will return phenotypes associated with a gene even if the variant itself is benign. For this reason, the default is to filter out all variants that have a known rs# or other identifier in column 3 (ID), and to reject all variants that are not tagged as "PASS" in column 7 (FILTER). With these filters, phenotypic annotation is applied to de novo, unreported (rare) or private gene variants. We also strongly encourage users to pre-filter by presumed functional impact or on the basis of linkage (where possible) using a tool such as Exomiser. If your experimental design requires that known variants be included, you can bypass the ID filter by clearing column 3 in your uploaded file.
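
To preview which records are likely to survive the default filters before uploading, something like the following Python sketch can be used. It mirrors the two rules as described above (no identifier in column 3, "PASS" in column 7). It is not the HMDC's own code, and how the server treats edge cases such as short or malformed lines is not documented here, so this sketch simply drops them.

```python
def passes_default_filters(line):
    """True if a VCF data line has no known ID and is tagged PASS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        return False  # assumption: lines without a FILTER column are dropped
    variant_id, filter_status = fields[2], fields[6]
    return variant_id in ("", ".") and filter_status == "PASS"


def preview_kept_records(lines):
    """Return the data lines expected to be annotated under the default filters."""
    return [
        line
        for line in lines
        if not line.startswith("#") and passes_default_filters(line)
    ]
```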

Maximum upload file size is 25MB. Only the first 100,000 lines will be processed.
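
A quick local check against these limits might look like the Python sketch below. Two details are assumptions: 25MB is taken as 25,000,000 bytes (the more conservative reading), and header lines are assumed to count toward the 100,000-line limit. The function name is invented for this example.

```python
import os

MAX_BYTES = 25 * 1000 * 1000  # assumption: 25MB read as 25,000,000 bytes
MAX_LINES = 100_000  # only the first 100,000 lines are processed


def check_upload_limits(path):
    """Report whether a file fits the size limit and how many lines would be read."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    return {
        "size_ok": size <= MAX_BYTES,
        "lines_processed": min(line_count, MAX_LINES),
        "lines_ignored": max(0, line_count - MAX_LINES),
    }
```

If lines_ignored is greater than zero, trimming the candidate list before upload avoids silently losing the tail of the file.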

This form has only been tested with VCF v4.0 and higher standard formats.

Please contact MGI User Support to suggest important features that you would like to see implemented in future releases.
