What do we do?
The BioMeXT group specializes in Information Extraction (IE) from the biomedical literature and other textual sources. Information extraction is the task of automatically extracting structured information from textual documents, and it is an important component of Text Mining systems. We focus on the extraction of domain-specific entities (such as genes, proteins, drugs, and diseases) and their semantic relations (e.g. protein-protein interactions, gene-disease associations). Our tools are often evaluated through participation in community-run evaluation challenges (e.g. BioCreAtIvE: Critical Assessment of Information Extraction systems in Biology). Additionally, we provide an environment for Assisted Curation (ODIN), which is currently used in the curation pipeline of the RegulonDB database in a project funded by the US-NIH.
The SNF-funded MelanoBase project began in March 2016. The goal of the project is the large-scale extraction of information from the biomedical literature in order to build a disease-centric knowledgebase of information relevant for biological and clinical purposes. Later in the year, a collaboration was established with the FBK research institute in Trento (Italy) to pursue advanced natural language processing techniques relevant to the MelanoBase project. In December 2016, the collaborative project "Automated detection of adverse drug events from older inpatients' medical records using structured data mining and natural language processing", submitted within the "Smarter Health Care" National Research Programme (NRP74), was approved. We will participate in this project, contributing natural language processing technologies for the automated analysis of medical records.
Main publications 2016
- Qinghua Wang, Shabbir S. Abdul, Lara Almeida, Sophia Ananiadou, Yalbi I. Balderas-Martínez, Riza Batista-Navarro, David Campos, Lucy Chilton, Hui-Jou Chou, Gabriela Contreras, Laurel Cooper, Hong-Jie Dai, Barbra Ferrell, Juliane Fluck, Socorro Gama-Castro, Nancy George, Georgios Gkoutos, Afroza K. Irin, Lars J. Jensen, Silvia Jimenez, Toni R. Jue, Ingrid Keseler, Sumit Madan, Sérgio Matos, Peter McQuilton, Marija Milacic, Matthew Mort, Jeyakumar Natarajan, Evangelos Pafilis, Emiliano Pereira, Shruti Rao, Fabio Rinaldi, Karen Rothfels, David Salgado, Raquel M. Silva, Onkar Singh, Raymund Stefancsik, Chu-Hsien Su, Suresh Subramani, Hamsa D. Tadepally, Loukia Tsaprouni, Nicole Vasilevsky, Xiaodong Wang, Andrew Chatr-Aryamontri, Stanley J. F. Laulederkind, Sherri Matis-Mitchell, Johanna McEntyre, Sandra Orchard, Sangya Pundir, Raul Rodriguez-Esteban, Kimberly Van Auken, Zhiyong Lu, Mary Schaeffer, Cathy H. Wu, Lynette Hirschman, and Cecilia N. Arighi. Overview of the interactive task in BioCreative V. Database: The Journal of Biological Databases and Curation, 2016, baw119. doi:10.1093/database/baw119
- Mauro Mazzocut, Ivana Truccolo, Marialuisa Antonini, Fabio Rinaldi, Paolo Omero, Emanuela Ferrarin, Paolo De Paoli, and Carlo Tasso. Insight of Italian Web conversations in Complementary and Alternative Medicines and Cancer. Journal of Medical Internet Research, 2016, 18(6):e120. doi:10.2196/jmir.5521
- Fabio Rinaldi, Tilia Renate Ellendorff, Sumit Madan, Simon Clematide, Adrian van der Lek, Theo Mevissen, and Juliane Fluck. BioCreative V Track 4: A Shared Task for the Extraction of Causal Network Information in Biological Expression Language. Database: The Journal of Biological Databases and Curation, 2016, baw067. doi:10.1093/database/baw067