Answering biological questions with federated queries across databases

“If you want to answer complex biological questions, you often need to combine data that are scattered on the web.” In this in silico talk, SIB’s Tarcisio Mendes de Farias at the University of Lausanne presents an approach, and a tool, to face this challenge. Called BioQuery, the interface he developed with his colleagues enables biologists to quickly run predefined queries in natural language across multiple data sources, from a single entry point. Right now, the tool (described in a paper published in the journal Database) relies on data integration across several leading databases, including SIB Resources, and already provides users with new biological insights.

About the in silico talks series – The latest in bioinformatics by SIB Scientists

The in silico talks online series aims to inform bioinformaticians, life scientists and clinicians about the latest advances led by SIB Scientists on a wide range of topics in bioinformatics methods, research and resources. Stay abreast of the latest developments, get exclusive insights into recent papers, and discover how these advances might help you in your work or research, by subscribing to the in silico talks mailing list.

Speaker

Name: Tarcisio Mendes de Farias

Institute: SIB Swiss Institute of Bioinformatics

Group: Vital-IT

Tarcisio Mendes de Farias is a staff Research Scientist in the Bgee team at the SIB Swiss Institute of Bioinformatics. He earned a PhD in Computer Science through an industrial program involving the University of Burgundy and the company ACTIVe3D - Sopra Steria in France. He also worked for Dassault Systèmes as an R&D Product Manager. In 2019, he completed a 2.5-year postdoc in the labs of C. Dessimoz and M. Robinson-Rechavi in Switzerland. He holds an M.Sc. degree in Information and Communication Technologies from the University of Technology of Compiègne in France, and a computer engineering degree from the University of Pernambuco in Brazil. His current research interests are data integration and interoperability in the life sciences, natural language processing, and ontologies for describing biological and biomedical knowledge.

Video

Duration: 11 minutes 46 seconds

License: This video is available under the Creative Commons license CC-BY-4.0

Complex bioinformatics databases hold enormous amounts of knowledge, but retrieving it usually requires in-depth technical know-how. The recent study presented here gives easy access to the wealth of complementary information contained in different resources, through editable template queries in natural language.

An example? In his talk, Tarcisio takes a typical research question that a molecular biologist studying a certain type of brain cancer might have: “What are the human genes associated with the disease, for which orthologs exist in the rat and which are expressed in its brain?”

Answering this question would allow her to: 1) identify all the genes involved in the disease in humans (information available in UniProtKB), 2) find the “corresponding” (orthologous) genes in a model species such as the rat (information available in OMA), and from these, 3) identify those that are specifically expressed in its brain (information available in Bgee).
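The three steps can be read as a chain of lookups and filters. The sketch below illustrates only that logic: it replaces the federated queries across UniProtKB, OMA and Bgee with small in-memory dictionaries, and every identifier in it (gene names, disease label, function name) is a hypothetical placeholder rather than a real database record or part of BioQuery.

```python
# Toy stand-ins for the three resources. All identifiers are hypothetical
# placeholders, not real database records.

# Step 1 (UniProtKB-like): disease -> human genes associated with it
DISEASE_GENES = {
    "example brain cancer": {"HUMAN_GENE_A", "HUMAN_GENE_B", "HUMAN_GENE_C"},
}

# Step 2 (OMA-like): human gene -> orthologous rat gene, when one exists
ORTHOLOGS = {
    "HUMAN_GENE_A": "RAT_GENE_A",
    "HUMAN_GENE_B": "RAT_GENE_B",
}

# Step 3 (Bgee-like): rat gene -> anatomical entities where it is expressed
EXPRESSION = {
    "RAT_GENE_A": {"brain", "liver"},
    "RAT_GENE_B": {"kidney"},
}


def genes_for_question(disease, organ):
    """Return (human gene, rat ortholog) pairs that pass all three steps."""
    results = []
    # Sort so the output order is deterministic.
    for human_gene in sorted(DISEASE_GENES.get(disease, set())):
        rat_gene = ORTHOLOGS.get(human_gene)
        if rat_gene is None:
            continue  # no rat ortholog known for this human gene
        if organ in EXPRESSION.get(rat_gene, set()):
            results.append((human_gene, rat_gene))
    return results


print(genes_for_question("example brain cancer", "brain"))
# prints [('HUMAN_GENE_A', 'RAT_GENE_A')]
```

In this toy data, HUMAN_GENE_C drops out at step 2 because it has no rat ortholog, and HUMAN_GENE_B drops out at step 3 because its ortholog is not expressed in the brain. A real federated query would push each of these steps to the corresponding database instead of holding the data locally.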

Listen to Tarcisio as he presents the approach he and his colleagues took to integrate the data from these different resources, and see the tool they developed in action, allowing researchers to ask, without in-depth technical know-how, this very same question and retrieve the answer in a short time.

The study was powered by the BioSODA project, supported by the National Research Programme “Big Data” (NRP75).

Reference(s)

Sima A C, Mendes de Farias T et al. Enabling semantic queries across federated bioinformatics databases. Database (2019).

DOI:

10.1093

Mendes de Farias T et al. VoIDext: Vocabulary and Patterns for Enhancing Interoperable Datasets with Virtual Links. In: On the Move to Meaningful Internet Systems: OTM 2019 Conferences. OTM 2019. Lecture Notes in Computer Science, vol 11877. Springer, Cham.

DOI:

10.1007