The SIB Resource Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question “where is a gene expressed?” and supports research in cancer and agriculture as well as evolutionary biology. On the occasion of its latest release, Marc Robinson-Rechavi and Frédéric Bastian, co-leading the Bgee team, tell us how the database has evolved over the years, and shed light on some of its newest features – and future developments.
What is Bgee?
Frédéric Bastian: Bgee is an integrated curated expression atlas allowing us to retrieve gene expression patterns in multiple animal species, and to perform comparative transcriptomics. It provides an intuitive answer to the question “where is a gene expressed?”. An important feature of Bgee is that it is exclusively based on curated healthy wild-type expression data (e.g. no data from gene knock-outs, treatments or diseases) to provide a comparable reference base of normal gene expression.
How can Bgee be used in research?
Frédéric Bastian: Its applications range from retrieving information on a single gene to functional genomics studies looking at normal gene functions or gene expression evolution. It is also used in cancer research to characterize healthy gene expression, and in the agricultural domain, for instance to study gene expression variations between different breeds of farm animals.
Can you cite a particularly exciting example of a study using Bgee?
Marc Robinson-Rechavi: A paper published in NAR in 2020 used Bgee to study how gene expression was putatively controlled in the ancestor of all vertebrate species. Some of the conclusions, such as the importance of conserved regulatory elements involved in the development of the nervous system, were drawn thanks to unique tools provided by Bgee. In particular, the gene expression comparison tool makes it possible to study expression patterns between species, and TopAnat performs enrichment analyses similar to gene ontology enrichment tests, but using anatomical terms mapped to genes through their expression patterns. The approach used in this study could, for example, help to prioritize sequence variants in whole genome sequences of patients affected by genetic diseases.
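To make the idea of an anatomical enrichment test more concrete, here is a minimal sketch in Python. It uses a plain hypergeometric test on toy data; it is not presented as the algorithm TopAnat actually runs. The function names (`hypergeom_upper_tail`, `anatomical_enrichment`) and the gene and organ labels are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def anatomical_enrichment(foreground, annotations):
    """Return {anatomical term: p-value} for a foreground gene list.

    `annotations` maps each background gene to the set of anatomical
    terms in which it is called expressed (toy data, an assumption).
    """
    background = set(annotations)
    selected = set(foreground) & background  # ignore genes outside the background
    terms = set().union(*annotations.values())
    p_values = {}
    for term in terms:
        annotated = {gene for gene, organs in annotations.items() if term in organs}
        k = len(annotated & selected)
        p_values[term] = hypergeom_upper_tail(
            k, len(background), len(annotated), len(selected)
        )
    return p_values


# Toy example: 10 background genes, 3 of them expressed in "brain".
toy_annotations = {f"g{i}": {"brain"} for i in range(3)}
toy_annotations.update({f"g{i}": {"liver"} for i in range(3, 10)})

for term, p in sorted(anatomical_enrichment(["g0", "g1", "g2"], toy_annotations).items()):
    print(term, p)
```

A small p-value for a term suggests that the gene list is expressed in that anatomical structure more often than expected by chance. A real analysis would also need to correct for multiple testing and for the hierarchy of anatomical terms, which this sketch ignores.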
From v1 to v14: what are the key changes made to Bgee since its inception?
Marc Robinson-Rechavi: Data integration is the key word here. In its initial release, Bgee included EST (Expressed Sequence Tags) data of only four species. Since then, it has grown to include RNA-Seq, Affymetrix and in situ hybridization data in 29 species. This has been possible thanks to Bgee’s unique approach of integrating and harmonizing datasets, making them comparable between experiments and species.
Frédéric Bastian: Bgee was originally accessible only through its website, and we wanted to make it possible to embed it into downstream analysis pipelines. We thus developed multiple Bioconductor R packages and web-based tools that perform gene expression enrichment analyses, retrieve expression data annotations, and allow users to detect genes actively expressed in their own RNA-Seq or scRNA-Seq datasets. We believe these developments make Bgee a truly versatile tool, which can be used to answer novel research questions using gene expression analyses in a wide range of animal species.
What differentiates Bgee from other, similar, resources?
Marc Robinson-Rechavi: Bgee differentiates itself from other resources by completely integrating data across multiple datasets and multiple technologies, using both qualitative methods (calls of presence/absence of expression) and quantitative methods (non-parametric statistics producing expression “scores”). Together, these provide a single answer to the question “where is this gene expressed?”.
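As an illustration of how a qualitative call and a rank-based score can sit side by side, here is a toy Python sketch. It is an assumption-laden simplification, not Bgee's actual pipeline: the detection threshold, the 0 to 100 scale, and the function names `presence_calls` and `expression_scores` are all hypothetical.

```python
def presence_calls(values, threshold):
    """Qualitative call: True if the gene's value reaches the threshold.

    The threshold is a hypothetical detection cutoff for this sketch.
    """
    return {gene: value >= threshold for gene, value in values.items()}


def expression_scores(values):
    """Non-parametric score: rank genes by value and rescale to (0, 100].

    The most highly expressed gene gets 100. Ties are broken by input
    order, which a real method would handle more carefully.
    """
    ordered = sorted(values, key=values.get, reverse=True)
    n = len(ordered)
    return {gene: 100.0 * (n - rank) / n for rank, gene in enumerate(ordered)}


# Toy expression values for four genes in one sample (made-up numbers).
sample = {"geneA": 50.0, "geneB": 5.0, "geneC": 0.5, "geneD": 20.0}

print(presence_calls(sample, threshold=1.0))
print(expression_scores(sample))
```

Because the score depends only on ranks, it is insensitive to the measurement scale of a given technology, which is one reason rank-based statistics are convenient when combining heterogeneous data types.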
In addition, relations of anatomical homology between species have been curated to allow gene expression comparisons between different species: information on gene expression in human lung is now comparable to that on the swim bladder in zebrafish. This is essential not only to study gene evolution, but also in other fields such as biomedical applications.
Which feature in Bgee is currently most exciting?
Frédéric Bastian: It is definitely the gene expression comparison tool! Using curated anatomical homology in animals, this feature allows the automatic comparison of gene expression within and between species. A user can enter a gene list, and Bgee will identify the conditions in which the expression of a gene is the most conserved. For instance, when entering the list of orthologs of the brain gene SRRM4, Bgee correctly identifies specific nervous system structures as the organs with the most conserved expression in vertebrates.
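The principle of comparing expression through anatomical homology can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation behind the Bgee tool: the homology table, the expression calls, and the function name `conserved_structures` are hypothetical, with the lung and swim bladder pairing taken from the example given earlier in the interview.

```python
from collections import defaultdict

# Hypothetical homology table: (species, organ) -> shared homology group.
HOMOLOGY = {
    ("human", "lung"): "lung / swim bladder",
    ("zebrafish", "swim bladder"): "lung / swim bladder",
    ("human", "brain"): "brain",
    ("zebrafish", "brain"): "brain",
}


def conserved_structures(calls, homology):
    """Rank homology groups by the fraction of species with expression.

    `calls` maps each species to the set of organs in which its
    ortholog is called expressed (toy data in this sketch).
    """
    species_per_group = defaultdict(set)
    for species, organs in calls.items():
        for organ in organs:
            group = homology.get((species, organ))
            if group is not None:  # skip organs without a curated homology
                species_per_group[group].add(species)
    n_species = len(calls)
    ranked = [(group, len(found) / n_species) for group, found in species_per_group.items()]
    # Most conserved first; alphabetical order breaks ties.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Made-up calls for one pair of orthologs.
toy_calls = {"human": {"brain", "lung"}, "zebrafish": {"brain"}}
print(conserved_structures(toy_calls, HOMOLOGY))
```

Mapping species-specific organs to a shared homology group before counting is what makes the comparison meaningful: without it, "lung" and "swim bladder" would be treated as unrelated structures.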
Thinking about the future: how do you expect to see Bgee continue to evolve?
Frédéric Bastian: The release of version 15 of Bgee, planned for April 2021, will integrate single-cell RNA-Seq data (scRNA-Seq), as well as RNA-Seq from 60 more species. This is a major step that will allow an unprecedented level of detail in the description of gene expression patterns. Research data needs to be interoperable to advance the life sciences. Bgee facilitates this process since both researchers and published datasets benefit from its tools and integration functions: data remains seamlessly available to researchers, either for analyses within a single species, or for comparative transcriptomics over multiple species.
What does it mean to be an SIB Resource?
Marc Robinson-Rechavi: By being an SIB Resource, Bgee benefits from the network of best-in-class resources identified and supported by SIB. This allows data and knowledge exchange with major resources, such as UniProtKB/SwissProt, STRING, and SwissOrthology. It also gives Bgee access to the range of competences of SIB experts, notably in biocuration. Finally, Bgee benefits from SIB’s support in improving user experience, disseminating information, and applying for grants from major funding agencies.
Frédéric Bastian: More generally, by promoting its culture of excellence in data science, SIB motivates the Bgee team to pursue its aim of producing results of the highest standard and quality. This is reflected very pragmatically in our annotation standards, coding practices, and data quality assurance.
What striking feature would you next like to integrate into Bgee?
Frédéric Bastian: A tool to perform differential expression analyses over the integration of all data in Bgee, allowing comparison of any condition (e.g. data type, species or tissue). With such a feature, it would for instance be possible to retrieve the most important genes in an organ as compared to all other organs, or the genes harbouring most of the variations between different strains of the same species, or study expression level changes during gene evolution between species. We actually have something in the pipeline for that!
Bastian et al., The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals, Nucleic Acids Research, 2020.