Epigenomics, the study of the complete set of epigenetic modifications on the genetic material of a cell, has undergone a profound big data revolution in recent years. To enable research and downstream discoveries by making the most of the data generated, the Epigenomics focus group aims to propose recommendations on how to meet the goals of the FAIR principles, from data visualization to metadata annotation and benchmarking of analysis tools. It brings together SIB Members from a range of disciplines such as epigenetics, transcriptomics, core facilities and benchmarking.

Fostering the use and analysis of freely available data through FAIR principles

Starting about 15 years ago, novel high-throughput sequencing technologies revolutionized the field of epigenomics. Assays such as ChIP-seq, ATAC-seq, MNase-seq or CAGE-seq produce genome-wide maps of transcription factor binding sites, histone post-translational modifications, open chromatin regions, nucleosomes and transcription start sites at base pair or near base pair resolution. As a result, previously hidden gene regulatory events that take place along chromosomes have suddenly become visible. An incredible wealth of public data has already been generated and continues to grow exponentially. Despite unrestricted access, these data are still heavily under-used and under-analyzed. The focus of this group is thus on issues related to data usability, interoperability, visualization and reproducibility: in short, on the FAIR principles. The three areas outlined below will be given priority.

About the SIB focus groups

The Focus Groups aim to foster knowledge exchanges and collaborations in the community of 900 SIB Members, around specific scientific topics, from single-cell omics to epigenomics.

Data visualization, an SIB epigenomics track hub initiative

Bench biologists studying gene regulation are often interested only in very narrow genomic regions, within which they need to access and explore diverse data types from many different laboratories, ideally with just a few mouse clicks. Unfortunately, public epigenomics data are organized in a way that makes this difficult: they come as huge files that each contain data for the entire genome but from only a single experiment. To address this bottleneck, the group advocates making epigenomics data viewable via UCSC track hubs, in parallel to deposition of the raw data in a public repository. Track hubs, in conjunction with indexed Big Data formats, allow for easy and rapid on-the-fly integration of data from all over the world in a single browser window. The proposed initiative aims to encourage Swiss epigenomics researchers to make their own data available as track hubs, by providing advice and technical support via training and person-to-person know-how transfer. As bioinformatics core facilities could play a pivotal role in this endeavor, the group is looking to establish collaborations with such entities.
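As a rough illustration of what publishing data as a track hub involves, the sketch below writes the three plain-text files that a minimal UCSC track hub is built from (hub.txt, genomes.txt and a per-genome trackDb.txt). The function name `write_track_hub`, the hub name, the e-mail address and the example.org URL are all hypothetical placeholders, not part of any SIB tooling. The signal file is assumed to exist already as a bigWig file served over HTTP(S).

```python
from pathlib import Path


def write_track_hub(root, hub_name, genome, tracks, email):
    """Write hub.txt, genomes.txt and <genome>/trackDb.txt for a minimal hub.

    `tracks` is a list of dicts with the (hypothetical) keys
    name, url, label, description and type.
    """
    root = Path(root)
    (root / genome).mkdir(parents=True, exist_ok=True)

    # hub.txt: top-level description of the hub
    hub_txt = (
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} epigenomics tracks\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    (root / "hub.txt").write_text(hub_txt)

    # genomes.txt: which assembly the hub covers and where its trackDb lives
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    (root / "genomes.txt").write_text(genomes_txt)

    # trackDb.txt: one stanza per track, stanzas separated by a blank line
    stanzas = [
        f"track {t['name']}\n"
        f"bigDataUrl {t['url']}\n"
        f"shortLabel {t['label']}\n"
        f"longLabel {t['description']}\n"
        f"type {t['type']}\n"
        for t in tracks
    ]
    (root / genome / "trackDb.txt").write_text("\n".join(stanzas))
    return root / "hub.txt"


if __name__ == "__main__":
    # Placeholder values for illustration only
    write_track_hub(
        "my_hub",
        "demoHub",
        "hg38",
        [{
            "name": "h3k4me3",
            "url": "https://example.org/h3k4me3.bw",
            "label": "H3K4me3",
            "description": "H3K4me3 ChIP-seq signal",
            "type": "bigWig",
        }],
        "curator@example.org",
    )
```

Once the resulting directory is placed on a public web server, the URL of hub.txt is what a user would paste into the genome browser. Because the browser fetches only the indexed portion of each file that covers the displayed region, narrow regions from many such hubs can be viewed side by side.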

Fighting the metadata crisis with recommendations for sample annotation

Epigenomics data are readily accessible, the formats used are generally standard, and a panoply of powerful methods and software resources already exists for analyzing the data. However, there are major shortcomings and disparities regarding the quality and completeness of the metadata. Without knowing with confidence what the data in a given file represent, no biological insights can be gained, even with the most sophisticated algorithms. “The metadata crisis is due, on one hand, to a lack of incentives for data producers to properly annotate their data, on the other hand to insufficient quality checks and data curation efforts along the data dissemination channels, for instance on the part of data repository staff or journal editors,” explains Philipp Bucher, Chair of the Focus Group. The situation is further aggravated by the sparsity or absence of community-accepted metadata representation standards (ontologies) for regulatory genomic regions and, to a lesser extent, for cell types and physiological conditions. The Epigenomics Focus Group aims to become a forum where experts from diverse fields can discuss and address the bottlenecks in this area. One specific objective is to come up with recommendations for epigenomics dataset annotation that are broadly supported by SIB Groups and beyond. As many aspects of the metadata crisis in epigenomics extend to other omics fields, in particular transcriptomics, the group welcomes participation from, and interaction with, data producers, data users and biocurators from neighboring fields.

Facilitating data analysis through benchmarking, protocol sharing, and tools

Researchers interested in analyzing their own data or public data face the paradox of choice. Plenty of public programs and web resources are already available, and new ones are being released all the time. How does one choose the best bioinformatics tools for a particular task? Testing a new method is often a time-consuming exercise that ends in disappointment. The group’s focus in this area is on benchmarking and exchanging first-hand experiences among users of bioinformatics tools, both at the level of individual processing steps (e.g. peak finding) and comprehensive analysis pipelines (e.g. from sequence reads to gene regulatory network). Issues related to computational reproducibility and proper deployment of workflows are also discussed. The goal is to create a community of people interested in sharing their experience and know-how, and to build up an infrastructure for this purpose. This may include the creation of reference data sets and the organization of benchmarking events, in addition to using standard communication channels such as teleconferences or mailing lists. Participation from, and interaction with, experts from other fields such as benchmarking and computational reproducibility are encouraged here.
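To make the idea of benchmarking a single processing step more concrete, here is a minimal sketch of how the output of a peak finder might be scored against a reference data set. It is an illustration under simple assumptions, not a method adopted by the group: peaks are (chromosome, start, end) tuples with half-open coordinates, and a called peak counts as correct if it overlaps any reference peak by at least one base. The function names and the example coordinates are made up.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def score_peaks(called, reference):
    """Return (precision, recall) of called peaks against reference peaks.

    precision: fraction of called peaks that overlap some reference peak
    recall:    fraction of reference peaks overlapped by some called peak
    """
    hits = sum(any(overlaps(c, r) for r in reference) for c in called)
    found = sum(any(overlaps(r, c) for c in called) for r in reference)
    precision = hits / len(called) if called else 0.0
    recall = found / len(reference) if reference else 0.0
    return precision, recall


# Made-up coordinates for illustration only
reference = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
called = [("chr1", 150, 250), ("chr1", 700, 800),
          ("chr2", 100, 120), ("chr2", 140, 160)]

precision, recall = score_peaks(called, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The all-pairs comparison is quadratic and only suitable for small examples. A real benchmark would more likely use sorted intervals or an interval tree, and might require a minimum overlap fraction rather than a single shared base.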

Focus Group coordinating members:

Are you an SIB Member and interested in joining? Contact Philipp Bucher.