Mouse genetics has experienced many milestones over the last few decades, from the generation of inbred lines which consistently exhibited a coat color trait or susceptibility to tumors in the early 1900s, up to routine gene manipulation, conditional targeting and whole genome sequencing or transcript profiling of individuals. As an experimental system, the mouse has allowed incredible insight into gene function, dissection of gene-disease relationships and a wealth of other data.
The study of genetics in its most basic form is the study of genes: what they are, where they are, how they work, how they differ, and how this forms the basis of inherited traits. Inherited traits cover the gamut of metabolic activity, disease susceptibility, body patterning, hair color, blood type and more. The code for these traits resides in an organism's DNA. When a gene is activated, the DNA is transcribed into messenger RNA (mRNA) which is then translated into protein. Gene activation is regulated by transcription factors, which can bind to non-coding promoter or enhancer elements. These recruit RNA polymerases and other factors to the transcriptional start site, where the polymerase copies the sequence into a new RNA molecule. The regions of the gene that will eventually code for protein are called exons and are spliced together with introns removed as part of mRNA processing. The canonical splice site consensus sequence in pre-mRNA uses "MAG | GTRAGT" for the 5' splice site (donor) and "CAG | G" for the 3' splice site (acceptor), where M is A or C, R is A or G, and "|" marks the exon-intron boundary [1]. Additional post-transcriptional regulation may also occur by non-coding microRNAs (miRNA), small interfering RNAs (siRNA), small nuclear or nucleolar RNAs (snRNA and snoRNA), or other processes. mRNA is capped, polyadenylated and exported from the nucleus to the ribosomes, where it is translated into protein.
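The donor consensus above can be read as a small pattern-matching rule. The following is a minimal sketch, assuming a perfect match to the consensus and a DNA alphabet (T rather than U); the function names and the toy sequence are hypothetical, and real splice sites frequently deviate from the consensus, so this is an illustration rather than a splice-site predictor.

```python
import re

# IUPAC codes needed for the consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "R": "[AG]"}

# 5' donor consensus MAG|GTRAGT: the exon ends with MAG, the intron starts with GTRAGT.
DONOR_CONSENSUS = "MAGGTRAGT"
EXON_BASES = 3  # consensus bases that lie upstream of the exon/intron boundary


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus)


def find_donor_sites(dna: str) -> list[int]:
    """Return the 0-based index of the first intron base for each perfect match."""
    # The lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(f"(?=({consensus_to_regex(DONOR_CONSENSUS)}))")
    return [m.start() + EXON_BASES for m in pattern.finditer(dna.upper())]


# Toy sequence (hypothetical) containing two perfect donor matches.
toy = "ttgCAGGTAAGTccatAAGGTGAGTgg"
print(find_donor_sites(toy))  # [6, 19]
```

Each reported index is the position of the G in the intronic GT that immediately follows the exon boundary.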
Within a species, all members carry the same set of genes, or functional units. The mouse reference genome contains approximately 24,500 protein-coding genes and was fully sequenced in 2002 [2]. Structurally, these are distributed across 19 pairs of autosomes, plus X and Y sex chromosomes. For the sake of comparison, humans have 23 pairs of chromosomes (22 autosomes plus XX or XY) which contain an estimated 20,687 protein-coding genes [3]. Over 90% of the mouse and human genomes can be aligned into regions of shared synteny, which is to say, blocks where homologous genes are conserved in the same relative order, indicative of shared common ancestry. At the gene level, a mouse homolog has been identified and classified for 17,096 human genes. Variability between individuals within a species is due to variant gene forms called alleles. Some changes alter the expression or function of genes, resulting in observable phenotypic differences such as coat color, disease resistance/susceptibility, or metabolism, but many other alleles result in little to no detectable variation beyond the sequence level. Variant alleles may include single nucleotide polymorphisms (SNPs), insertions and deletions (indels), or copy number variants (CNVs). In the case of laboratory model organisms, targeted alleles also exist where genetic engineering techniques have been used to specifically alter, delete or introduce DNA sequences. A single gene may have multiple alleles with different biological consequences, and every allele in MGI is given a unique identifier, which is appended to the gene symbol as a superscript. See the section below on Nomenclature for more. Barring the unusual case of chimeras, every nucleated cell in a mouse’s body carries the same DNA code. However, different gene expression patterns allow these cells to develop into a variety of tissues and organs, as well as respond to stimuli.
The Gene Expression Database at MGI allows you to search for genes which are expressed in different mouse anatomical structures, with a particular emphasis on embryonic development. The end result, whether a single measurement or the composite of all of an individual's observable or measurable traits, is called the phenotype. In MGI, phenotypes are annotated to the complete genotype, which is composed of both the allele(s) of special interest and the background. Just as different specific alleles may be expected to have different characteristics and biological impacts (consider missense coding variants versus knockout alleles versus conditional or reporter targeted alleles), other alleles in the strain background, whether known or unknown, can have modifier effects on a trait or set of traits. For example, the Lep<sup>ob</sup> allele shows background sensitivity: homozygous Lep<sup>ob</sup>/Lep<sup>ob</sup> mice on a C57BL/6J background exhibit severe obesity with a pre-diabetes-like syndrome, while Lep<sup>ob</sup>/Lep<sup>ob</sup> mice on a C57BLKS background become severely diabetic and infertile in addition to obese, with a significantly shortened life expectancy.
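The idea that a phenotype belongs to the complete genotype (allele pair plus background) rather than to the allele alone can be sketched as a small data structure. This is only an illustration under assumed names: the `Genotype` class, its fields and the `phenotypes_for` helper are hypothetical and do not reflect MGI's actual schema; the phenotype strings paraphrase the Lep ob example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    """An allele pair at one gene plus the strain background it is carried on."""
    gene: str
    allele_1: str
    allele_2: str
    background: str


# Phenotype summaries keyed by complete genotype (hypothetical structure).
ANNOTATIONS = {
    Genotype("Lep", "ob", "ob", "C57BL/6J"): [
        "severe obesity",
        "pre-diabetes-like syndrome",
    ],
    Genotype("Lep", "ob", "ob", "C57BLKS"): [
        "severe obesity",
        "severe diabetes",
        "infertility",
        "shortened life expectancy",
    ],
}


def phenotypes_for(genotype: Genotype) -> list[str]:
    """Look up phenotypes for an exact genotype; unknown genotypes return []."""
    return ANNOTATIONS.get(genotype, [])


b6 = Genotype("Lep", "ob", "ob", "C57BL/6J")
bks = Genotype("Lep", "ob", "ob", "C57BLKS")
print(phenotypes_for(b6))
print(phenotypes_for(bks))
```

Because the background is part of the lookup key, the same allele pair can carry different annotations on different strains, which is the point of the example.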
For a full glossary, see: MGI Glossary
Term | Definition |
---|---|
Inbred strain | A strain that is essentially homozygous at all loci. In mice, this requires brother-sister matings for at least 20 sequential generations. Within an inbred strain, individuals are genetically identical to one another, though different inbred strains each carry a unique set of alleles. |
Allele | One of the variant forms of a gene, differing from other forms in its nucleotide sequence. Diploid organisms carry two alleles at each locus which may be the same (homozygous) or different (heterozygous). Hemizygous is used to describe alleles where you would not expect two alleles for an organism in the "normal" state (i.e. Y-chromosomes, X-linked genes in males, or insertion of foreign DNA). Includes natural variation and targeted variants. |
Phenotypic Allele | An allelic variant which produces an observable change. May be spontaneous mutations or targeted. |
Targeted Allele | An allele produced by using laboratory techniques to specifically disrupt, alter or introduce DNA sequences such that the “targeted” allele will be heritable in the germ line of offspring. Includes knockouts, knock-ins, reporters, conditional alleles, and more. |
Transgene | Any DNA sequence or combination of sequences that has been introduced via a construct into the germ line of the animal by random integration. |
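Genotype | The complete genetic makeup under study: the allele(s) of special interest together with the strain background. |
Phenotype | A single measurement, or the composite of all of an individual's observable or measurable traits. |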
Locus/Loci (pl.) | Literally "place". Refers to chromosomal position or coordinates, or a gene/cluster which can be mapped there. |
Endogenous | That which originates from within the organism, e.g., gene sequences which have not been targeted. |
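Homolog | A gene related to another gene by shared common ancestry, such as a mouse gene and its human counterpart. |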
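The zygosity terms in the glossary can be expressed as a simple rule. The sketch below is a hedged illustration: the function name is hypothetical, `None` is an assumed convention for "no second allele expected" (for example an X-linked gene in a male), and "+" is used for the wild-type allele.

```python
from typing import Optional


def zygosity(allele_1: str, allele_2: Optional[str]) -> str:
    """Classify a locus from its allele pair.

    allele_2 is None when a second allele is not expected at the locus
    (an assumption of this sketch, covering the hemizygous case).
    """
    if allele_2 is None:
        return "hemizygous"
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


print(zygosity("ob", "ob"))   # homozygous
print(zygosity("ob", "+"))    # heterozygous
print(zygosity("tg", None))   # hemizygous
```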
To understand mouse model genetics, it is important to begin at the roots: inbred lines. An inbred strain is one in which mice have been carefully bred either brother-sister or parent-progeny over a minimum of 20 sequential generations to produce a line which is genetically and phenotypically consistent. Most laboratory strains are now several hundred generations inbred. Within an inbred mouse strain, all individuals have the same genotype and are homozygous (carry a single allelic variant) at essentially all loci. Members of an inbred strain are essentially genetic clones of one another, which allows good experimental reproducibility for genetically influenced traits and provides a consistent and uniform animal model for study, even across multiple generations or different laboratories.
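Why 20 generations of brother-sister mating is treated as "essentially homozygous" can be illustrated with the classical recurrence for the inbreeding coefficient under full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2}) / 4. This recurrence is a standard population-genetics result and is not taken from the text above; the sketch also assumes a non-inbred, unrelated founder pair and no selection or new mutation.

```python
def inbreeding_coefficient(generations: int) -> float:
    """Expected inbreeding coefficient after n generations of full-sib mating.

    Generation 1 is the offspring of the first brother-sister mating.
    Assumes the first sib pair and their parents are not inbred (F = 0).
    """
    f_two_back, f_one_back = 0.0, 0.0
    for _ in range(generations):
        f_now = (1 + 2 * f_one_back + f_two_back) / 4
        f_two_back, f_one_back = f_one_back, f_now
    return f_one_back


for n in (1, 5, 10, 20, 40):
    print(f"generation {n:>2}: F = {inbreeding_coefficient(n):.4f}")
```

Under these assumptions the coefficient starts at 0.25 after the first sib mating and climbs to a little under 0.99 by generation 20, which is consistent with describing such strains as essentially, rather than absolutely, homozygous.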
While all mice are genetically homogeneous within an inbred line, different inbred lines will carry a different set of fixed alleles and exhibit different phenotypic characteristics as a result. Choosing the right inbred strain for a particular experiment or control is an important consideration when working with laboratory mice.
Having a well-characterized and consistent background is also a valuable tool when it comes to examining the effects of a targeted allele or manipulation. By comparing the phenotype of the targeted line to its parental strain, it is possible to determine which biological systems have been affected and draw inferences about the function of the single, modified gene.
Inbred strains are given names to identify the specific genetic background. These names are composed of a combination of capital letters and/or numbers and may also have substrain designations (letters or numbers following a forward slash) and/or lab codes to indicate where the strain is being maintained. For details, browse the Full Guidelines for Nomenclature of Mouse and Rat Strains.
The most commonly used inbred strain is C57BL/6J which was also the strain sequenced to generate the "reference" mouse genome. For more information on specific inbred strains, see the strain detail pages on the Mouse Phenome Database[4].
As mentioned above, one of the advantages of inbred strains in genetic research is that they provide a stable framework to study the role of a single gene by using mutants or knockout alleles. Unlike population-based studies in humans, where many genetic factors are segregating, a new, known mutation on a predictable background allows researchers to attribute new phenotypes to that specific gene and draw inferences about its normal function. Because a stable, predictable background is required, mutant phenotypes in MGI are attributed to the complete genotype of a mouse (specific allele on a defined inbred background). Many mutant alleles exhibit the same phenotype regardless of background, but others have altered penetrance or display a range of phenotypes in a background-dependent fashion as a result of other pathway modifiers.
One of the most useful aspects of the mouse as an experimental genetic model is the ability to easily generate targeted alleles. The canonical targeted allele is a gene knockout, where all or part of the coding region of a targeted gene is replaced by a selection cassette, eliminating function. This is achieved by introducing a vector engineered to contain homologous sequences on either side of the targeting cassette into embryonic stem (ES) cells. The cellular machinery pairs the vector’s homologous arms with the endogenous mouse sequence, and if recombination occurs on both sides, the cassette replaces the endogenous sequence in the ES cell genome. Correctly targeted ES cells can be selected and injected into host embryos, which are transferred into a pseudopregnant female mouse where they grow and develop; if the ES cells contribute to the germ line of the resulting chimeras, the targeted allele can be passed on to offspring. The illustration in Figure 5 (C) shows a heterozygous targeted allele.
This same strategy can also be used to introduce specific new variation into a mouse. This may be a single, specific engineered substitution designed to duplicate a mutation found in human clinical cases (e.g., M390R in Bbs1<sup>tm1Vcs</sup>), a recombinase recognition sequence to allow in vivo genome manipulation (LoxP, FRT), or the introduction of a complete cDNA from another species, which is known as a transgene.
Allele type | Generation | Description | Most common uses (others may apply) |
---|---|---|---|
Knockout | Targeted, homologous recombination | Replaces the coding region of a known gene (complete gene or specific exons) with a targeting cassette. Complete loss-of-function. | Used to study function of the targeted gene by comparison to non-targeted controls. |
Knock-in (nucleotide substitution) | Targeted, homologous recombination | Replaces the coding region of a known gene with an alternative sequence of the same gene containing a specific mutation | Used to study the phenotypic impact of this variation, often pulled from human clinical observations. Functional impact is allele specific. |
Knock-in (insertion) | Targeted, homologous recombination | A targeting construct is used to introduce a new variant into a specific locus. May be an alternative variant of the targeted gene or a complete new cDNA | Used to study the role of the variant and/or express the desired variant or new gene product, ex. humanized mice or a recombinase under the control of an endogenous promoter. |
Floxed | Targeted, homologous recombination | LoxP recombinase recognition sequences are introduced flanking a targeted gene, exon or portion of a construct | Co-expression of cre recombinase in the same cell will remodel the flanked region according to the directionality of the LoxP sites. In the absence of cre the gene is typically expected to function as wild type. See section on Conditional Alleles and cre-Lox. Permits tissue specific excision or expression of a gene. Allows study of genes required for developmental viability that would be embryonic lethal in a knock-out. |
FRT | Targeted, homologous recombination | FRT recombinase recognition sequences are introduced flanking a targeted gene, exon or portion of a construct | Similar to floxed, but sensitive to co-expression with Flp recombinase (flippase). |
Targeted Reporter | Targeted, homologous recombination | In-frame insertion of a reporter gene (GFP, LacZ, luciferase, etc.) into a known locus following some regulatory sequence. May disrupt an endogenous gene, though this is not required. | Used to monitor transcriptional activity of an endogenous promoter in vivo. If used as a tag on a functional protein, may be used to study protein localization or cell trafficking of the gene product. |
Gene trap | Random insertion | A known cassette with a selectable marker is randomly incorporated into the germ line. See the International Gene Trap Consortium | High throughput gene disruption. Often cassettes have reporter function allowing researchers to determine transcriptional activation of the endogenous promoter once the site of insertion has been determined |
Transgenic (cre/Flp) | typically random insertion, may be targeted | Introduction of cre or Flp recombinase genes under the control of a promoter | Used to specify which tissues will express a site-specific recombinase enzyme, and therefore when and/or where within an organism a floxed/FRT gene will be remodeled. See section on Conditional Alleles and cre-Lox |
Transgenic (expressed) | typically random insertion, may be targeted | Introduction of a foreign-origin gene for expression in the mouse | Expression of a non-mouse protein of interest. |
Transgenic (reporter, cre) | typically random insertion, may be targeted | Introduction of a reporter gene with a loxP-flanked STOP cassette before the coding sequence | Allows researchers to assess the activity of a cre recombinase transgenic line. Co-expression with cre removes the STOP cassette and activates the reporter, marking tissues where cre activity is present. |
For many genes, a complete null allele can have very serious consequences for embryonic development or viability. As well, investigators may often wish to examine the role of a gene exclusively in the context of a particular organ system or cell type without causing abnormalities elsewhere. Site-specific recombinase technology can be used to efficiently cause deletions, translocations and inversions in genomic DNA with high fidelity when a genome-remodeling recombinase enzyme is co-expressed. By restricting expression of the recombinase to specific tissues or cell lineages - or by placing it under the control of a drug-inducible promoter - the genome remodeling can also be restricted to specific tissues or time points.
The most commonly used system in laboratory mice is called "cre-lox", which involves the cre recombinase enzyme, originally cloned from the P1 bacteriophage, and recombinase recognition sites referred to as "loxP sites" (locus of X-over P1). LoxP sites are 34 base pair consensus sequences made up of an asymmetric 8-base core flanked by two 13-base inverted repeats.
ATAACTTCGTATAGCATACATTATACGAAGTTAT►
The asymmetric core gives the loxP site its directionality. If the two flanking loxP sites (upstream and downstream of the target) are oriented in the same direction, the floxed segment will be excised along with one of the loxP sites (as shown in Figure 5). If the loxP sites are in opposite orientations (facing each other), the segment will be inverted. If the loxP sites are located on different chromosomes (a trans arrangement), the recombinase can mediate chromosomal translocations.
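The structure of the loxP sequence shown above, and the orientation rules just described, can be checked with a short sketch. The 13/8/13 split is read directly off the 34-base sequence; the function names and the `(chromosome, orientation)` tuple representation are hypothetical conveniences, and the outcome rule is a simplification that ignores recombination efficiency.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_loxp(site: str) -> tuple[str, str, str]:
    """Split a 34-base loxP site into left repeat, core and right repeat."""
    return site[:13], site[13:21], site[21:]


def recombination_outcome(site_a: tuple[str, str], site_b: tuple[str, str]) -> str:
    """Predict the outcome for two loxP sites given as (chromosome, orientation).

    Orientation is '+' or '-' (an assumed encoding for this sketch).
    """
    (chrom_a, orient_a), (chrom_b, orient_b) = site_a, site_b
    if chrom_a != chrom_b:
        return "translocation"
    return "excision" if orient_a == orient_b else "inversion"


left, core, right = split_loxp(LOXP)
print(len(LOXP), left, core, right)
# The flanking repeats are inverted copies of each other...
print(reverse_complement(left) == right)   # True
# ...while the core is not its own reverse complement, so the site has a direction.
print(reverse_complement(core) == core)    # False

print(recombination_outcome(("chr4", "+"), ("chr4", "+")))  # excision
print(recombination_outcome(("chr4", "+"), ("chr4", "-")))  # inversion
print(recombination_outcome(("chr4", "+"), ("chr7", "+")))  # translocation
```

The chromosome labels in the usage lines are arbitrary placeholders.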
Typically, cre and loxP strains are developed separately and crossed together to produce a cre-lox strain.
The offspring of this cross will carry both the floxed targeted allele for the gene of interest as well as the recombinase. In tissues where the recombinase is expressed, the region flanked by loxP sites will be excised (or otherwise recombined) generating a tissue-restricted knockout. In tissues where the cre promoter is inactive, the recombinase will not be expressed and the floxed gene will remain functionally intact.
Recombinase activity of a given cre transgene allele can be determined by crossing the cre mouse to a reporter strain which carries a loxP-flanked STOP cassette in front of a reporter gene (lacZ or GFP, for example). If cre recombinase is active in a tissue, the STOP will be excised and the reporter expressed.
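The tissue-restricted logic of a cre-lox cross and of a cre reporter cross can be summarized in a few lines. This is a simplified sketch: the function name and tissue names are hypothetical, and it assumes recombination is complete wherever cre is expressed, which is not guaranteed in practice.

```python
def predict_tissue_states(tissues: list[str], cre_active: set[str]) -> dict[str, dict[str, str]]:
    """For each tissue, predict the state of a floxed gene and of a STOP-floxed reporter.

    cre_active is the set of tissues in which the promoter driving cre is active.
    """
    states = {}
    for tissue in tissues:
        has_cre = tissue in cre_active
        states[tissue] = {
            # Floxed gene: excised where cre is present, otherwise functionally intact.
            "floxed_gene": "excised" if has_cre else "intact",
            # Reporter: expressed only once cre has removed the STOP cassette.
            "reporter": "on" if has_cre else "off",
        }
    return states


result = predict_tissue_states(["liver", "brain", "heart"], cre_active={"liver"})
for tissue, state in result.items():
    print(tissue, state)
```

Here a hypothetical liver-restricted cre line yields a liver-specific knockout and a liver-specific reporter signal, with both loci unchanged in the other tissues.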
To locate a cre recombinase strain that expresses in a particular tissue type, or is under the control of a specific known promoter, use MGI's Recombinase (cre) Portal, or search using the Phenotypes, Alleles and Disease Models query form, specifying "transgenic (Cre/Flp)" in the Categories section.
Structured, controlled vocabularies in biology allow data to be organized and accessed using standardized, hierarchical terms. They also allow databases such as MGI to annotate similar findings under similar headings, so that researchers do not need to browse multiple keywords at different levels of precision.
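The benefit of a hierarchical vocabulary is that a query at a general term also retrieves annotations made to its more specific descendants. The sketch below illustrates this with a made-up mini-vocabulary; the term names, genotype labels and function names are all hypothetical and are not actual MGI vocabulary terms or identifiers.

```python
# Hypothetical mini-vocabulary: each term maps to its parent terms
# (a list, because real vocabularies may give a term several parents).
PARENTS = {
    "abnormal body weight": [],
    "increased body weight": ["abnormal body weight"],
    "obese": ["increased body weight"],
    "decreased body weight": ["abnormal body weight"],
}

# Hypothetical annotations: genotype label -> terms annotated to it.
ANNOTATIONS = {
    "genotype A": {"obese"},
    "genotype B": {"decreased body weight"},
    "genotype C": {"increased body weight"},
}


def ancestors(term: str) -> set[str]:
    """Collect every term reachable by following parent links upward."""
    seen: set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def annotated_to(query: str) -> list[str]:
    """Return genotypes annotated to the query term or to any of its descendants."""
    return sorted(
        genotype
        for genotype, terms in ANNOTATIONS.items()
        if any(term == query or query in ancestors(term) for term in terms)
    )


print(annotated_to("increased body weight"))  # ['genotype A', 'genotype C']
print(annotated_to("abnormal body weight"))   # ['genotype A', 'genotype B', 'genotype C']
```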
MGI annotates data using three major vocabularies: