The Global Biodata Coalition (GBC) designated GXD a Global Core Biodata Resource.

Global Core Biodata Resources are biodata resources of fundamental importance to the wider biological and life sciences community and to the long-term preservation of biological data. They:

  • provide free and open access to their data,
  • are used extensively both in terms of the number and distribution of their users,
  • are mature and comprehensive,
  • are considered authoritative in their field,
  • are of high scientific quality, and
  • provide a professional standard of service delivery.

Their operation is based on well-established life-cycle management processes and well-understood dependencies with related data resources. GCBRs have either terms of use or specific licenses that conform to the Open Definition, to enable the reuse of data.

GXD introduces an Expression Profile Search

GXD has developed the Expression Profile Search, a tool to search for genes by their expression profile. It allows users to define the expression profile of interest by specifying up to 10 anatomical structures and whether expression is present or absent in these structures. It can be found as a tab on the Gene Expression Data Query.
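
The matching logic behind a profile search of this kind can be sketched as follows. This is a minimal illustration, not GXD's implementation: the function name, the data layout, and the toy annotations are all hypothetical, and a structure with no recorded result is treated as "no data" rather than as absent expression.

```python
from typing import Dict, List

MAX_STRUCTURES = 10  # limit stated in the text above

# Hypothetical toy annotations (not real GXD data):
# gene -> {structure: True if expression detected, False if not detected}
annotations: Dict[str, Dict[str, bool]] = {
    "Pax6": {"eye": True, "heart": False, "liver": False},
    "Nkx2-5": {"eye": False, "heart": True},
    "Actb": {"eye": True, "heart": True, "liver": True},
    "Sox2": {"eye": True},  # no heart result recorded
}


def search_by_profile(
    annotations: Dict[str, Dict[str, bool]], profile: Dict[str, bool]
) -> List[str]:
    """Return genes whose calls match every (structure, present) pair in profile."""
    if not profile:
        raise ValueError("profile must name at least one structure")
    if len(profile) > MAX_STRUCTURES:
        raise ValueError(f"profile may name at most {MAX_STRUCTURES} structures")
    hits = []
    for gene, calls in annotations.items():
        # A missing structure yields None, which matches neither True nor False.
        if all(calls.get(s) == present for s, present in profile.items()):
            hits.append(gene)
    return sorted(hits)


# Genes present in eye and absent in heart.
matches = search_by_profile(annotations, {"eye": True, "heart": False})
```

In this toy example only Pax6 matches; Sox2 is excluded because it has no heart result at all, which is one possible way to handle missing data.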

GXD now includes RNA-seq data

GXD has been expanded to include RNA-seq data. In keeping with GXD's scope, these data are from experiments that examine endogenous gene expression in wild-type and mutant mice during the embryonic stages and/or postnatal life. The data sets have been imported from EMBL-EBI's Expression Atlas. We have integrated these data with the other types of expression data in GXD and with the genetic, functional, phenotypic and disease-related information in MGI, making them accessible through many new search capabilities.

We chose the Expression Atlas as our data source because its team selects high-quality data sets from the public repositories (ArrayExpress and NCBI's GEO) and then uses a standardized pipeline to re-analyze the data, generating consistently processed TPM values. To effectively integrate these data into GXD, we processed these files further to compute averaged, quantile-normalized TPM values per gene per biological replicate set. Using the Expression Atlas thresholds as a guide, the TPM values are assigned to expression bins of high, medium, low, and below cutoff.
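
As a rough illustration of what "averaged, quantile-normalized TPM values per gene per biological replicate set" can mean, the sketch below quantile-normalizes a few samples and then averages them within a replicate set. The text does not spell out the exact procedure, so the order of operations, the simple tie handling (tied values are ranked in sort order rather than averaged), the function names, and the toy numbers are all assumptions.

```python
from statistics import mean
from typing import Dict, List

Sample = Dict[str, float]  # gene -> TPM


def quantile_normalize(samples: Dict[str, Sample]) -> Dict[str, Sample]:
    """Give every sample the same value distribution (mean of values at each rank)."""
    genes = sorted(next(iter(samples.values())))
    sorted_cols = [sorted(vals[g] for g in genes) for vals in samples.values()]
    # Mean across samples of the i-th smallest value.
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(len(genes))]
    normalized = {}
    for sample_id, vals in samples.items():
        order = sorted(genes, key=lambda g: vals[g])  # lowest to highest TPM
        normalized[sample_id] = {g: rank_means[rank] for rank, g in enumerate(order)}
    return normalized


def average_replicates(
    normalized: Dict[str, Sample], replicate_sets: Dict[str, List[str]]
) -> Dict[str, Sample]:
    """Average normalized TPM per gene over the samples in each replicate set."""
    return {
        name: {
            g: mean(normalized[s][g] for s in members)
            for g in normalized[members[0]]
        }
        for name, members in replicate_sets.items()
    }


# Hypothetical toy input: two replicate samples, three genes.
samples = {
    "s1": {"GeneA": 5.0, "GeneB": 2.0, "GeneC": 3.0},
    "s2": {"GeneA": 4.0, "GeneB": 1.0, "GeneC": 6.0},
}
normalized = quantile_normalize(samples)
averaged = average_replicates(normalized, {"wild_type_liver": ["s1", "s2"]})
```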

This binning of TPM values allowed us to assign a detected/not detected value to these data, as is done for all the other expression data in GXD. We also annotated metadata for the RNA-seq samples using the same controlled vocabularies and ontologies we use for all other expression data in GXD. These two steps enable the full integration of these data into GXD/MGI and make them accessible via existing search tools.
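
The step from a TPM value to a bin and then to a detected/not detected call might look like the sketch below. The numeric cutoffs are placeholders modeled loosely on the ranges Expression Atlas uses for display; the text only says those thresholds served "as a guide", so both the cutoffs and the rule that "below cutoff" maps to "not detected" are assumptions.

```python
# Assumed cutoffs in TPM, for illustration only.
BELOW_CUTOFF_MAX = 0.5   # values under this are "below cutoff"
LOW_MAX = 10.0           # upper bound of the "low" bin
MEDIUM_MAX = 1000.0      # upper bound of the "medium" bin


def tpm_bin(tpm: float) -> str:
    """Assign an averaged TPM value to one of the four expression bins."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    if tpm < BELOW_CUTOFF_MAX:
        return "below cutoff"
    if tpm <= LOW_MAX:
        return "low"
    if tpm <= MEDIUM_MAX:
        return "medium"
    return "high"


def is_detected(tpm: float) -> bool:
    """Assumed rule: anything at or above the cutoff counts as detected."""
    return tpm_bin(tpm) != "below cutoff"


calls = {tpm: (tpm_bin(tpm), is_detected(tpm)) for tpm in (0.2, 4.5, 250.0, 1500.0)}
```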

New data filters on GXD's search summaries

GXD has developed new filters that take advantage of the genetic, functional, phenotypic and disease-related information in MGI. These filters have been added to the gene expression data search summaries. They enable users to filter expression assay results by gene function, phenotype and disease ontology annotations, as well as by marker type. Filters for individual RNA-seq data sets and TPM expression bins have also been developed. Combined with the pre-existing filters for anatomical system, developmental stage, assay type, detected/not detected, and wild-type and mutant specimens, these filters give users powerful tools to quickly and efficiently extract the expression data of interest to them.
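
Conceptually, stacking filters on a search summary amounts to keeping only the rows that satisfy every selected condition. The sketch below shows that idea on a handful of made-up result rows; the `Result` fields and the `apply_filters` helper are hypothetical and do not reflect GXD's actual data model or interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    gene: str
    assay_type: str
    structure: str
    detected: bool
    tpm_bin: str = ""  # left empty for non-RNA-seq assays


Filter = Callable[[Result], bool]


def apply_filters(results: List[Result], *filters: Filter) -> List[Result]:
    """Keep the rows that pass every filter; with no filters, keep everything."""
    return [r for r in results if all(f(r) for f in filters)]


# Hypothetical toy rows, not real GXD results.
results = [
    Result("GeneA", "RNA-seq", "liver", True, "high"),
    Result("GeneB", "RNA-seq", "liver", False, "below cutoff"),
    Result("GeneA", "RNA in situ", "heart", True),
]

# Combine an assay-type filter with a detected filter.
hits = apply_filters(
    results,
    lambda r: r.assay_type == "RNA-seq",
    lambda r: r.detected,
)
```

Writing each filter as an independent predicate means that new ones, such as a TPM bin or marker type filter, can be added without changing the existing ones.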

Direct access to Morpheus heat map visualization and analysis tools

GXD users can use our search tools and filters to create RNA-seq data sets containing the expression data of interest to them. Then, with a single click of a button on the gene expression data search summary, these data, including the curated sample metadata, are rendered as an expression heat map in Morpheus, a heat map visualization and analysis tool created at the Broad Institute. Morpheus offers many tools for further display and analysis, including sorting, filtering, hierarchical clustering, nearest neighbors analysis, and visual enrichment.