Peer Bork, then Interim Director General of the European Molecular Biology Laboratory (EMBL), gave a keynote talk at the 2025 edition of SIB’s flagship biennial event, the [BC]2 Basel Computational Biology Conference.
Peer passed away in January 2026. We are honoured to have spoken with the globally renowned bioinformatician late last year about the role of data in life science research and innovation. This interview is published as a tribute to his vision.
SIB: Over the past 30 years, the life sciences have become one of the largest producers of big data. Does this mean biologists must now also be data scientists?
Peer Bork: Modern biology is a data-driven science, so most biologists need to be able to understand, analyse and work with data.
Most work with large datasets and specialist tools, so bioinformatics and data science have become central to the field. Of course, this also means that biodata resources and support from data specialists are essential for biology research and discovery. Without these, biologists would not be able to share, access or analyse large volumes of data to gain new insights and develop solutions for global challenges.
Finally, AI – which will revolutionize biology like other scientific disciplines – relies on data, so it is important for researchers to design experiments in a way that creates ‘AI-ready’ data.
SIB: Is the essential role of data resources for research and innovation well understood and supported by governments and science funding bodies?
P.B.: The hard truth is that without data resources, the life sciences would grind to a halt. These resources are as important as labs, instruments and even electricity, yet they rarely receive the same recognition, and are often taken for granted.
As the volumes of data generated and subsequent demands on data infrastructure continue to rise, the challenge – and the opportunity – is to move beyond short-term project support and establish sustainable funding mechanisms and appropriate recognition, nationally and beyond.
SIB and EMBL: driving innovation, economic strength and well-being
Like SIB, EMBL enables life-science research and its translation to medicine, agriculture, industry and society by providing openly available biological data, tools and knowledge. SIB partners with EMBL on globally important data resources – including UniProt, the world’s leading knowledgebase for protein sequence and function, and STRING, a knowledgebase of protein-protein interactions, which are both part of the SIB Resource portfolio.
SIB: Biodata resources have gone from being shared among individual scientists using email and floppy disks, to being openly available online via research infrastructure like EMBL-EBI, SIB and ELIXIR. What do you see as the most important aspect of their next big evolution?
P.B.: The biological data landscape is fragmented: there are a multitude of data types, producers, and formats that don’t always ‘talk’ to each other. It takes a lot of effort to make data FAIR – which stands for Findable, Accessible, Interoperable and Reusable. In a nutshell, FAIR means data are produced once and then reused over and over again, by scientists worldwide, to gain new insights. FAIR, machine-readable data are critical to leverage powerful AI technologies.
But no single organization or country can do this alone. FAIR data is a team game. We need both centralized and federated resources that can talk to each other. Even centralized resources, such as those provided by SIB or EMBL (see box), among others, can’t capture the massive amounts of data being generated, which need to be quality controlled and curated. So there is also a need for specialist domain knowledge kept by expert communities around the world. It’s a community- and infrastructure-building exercise.
SIB: Given the huge number of data resources available – and that new databases and software tools continue to be created – how can institutions, funding bodies and governments be sure the right resources are being maintained and further developed?
P.B.: To leverage the power of open data resources, we need coordinated efforts from funders, governments and scientific institutes. Together, we must first recognize that biodata infrastructure is as important to science as roads or electricity are to society.
We also need to develop long-term, stable funding models, shared international responsibility, and more incentive mechanisms for scientists who share their research data in a FAIR way.
Organizations like the European life sciences infrastructure ELIXIR and the Global Biodata Coalition are already doing valuable work in bringing communities together, identifying critical data resources, and working with funders to secure their future. But we still have a long way to go to secure these critical resources and leverage them in a way that delivers real benefits for science, healthcare and everyday life.
SIB: Can you give an example of how interoperable, federated resources enable next-generation life-science initiatives?
P.B.: A recent example from EMBL and partners is TREC, which stands for TRaversing Ecosystems. This large-scale study of ecosystems and their response to the environment, from molecules to communities, has completed the sampling part of its first expedition along European coastlines. Together with our partners, we developed novel sampling standards and gathered biological samples and environmental data from 115 locations. The vast volumes of data collected, which will be made publicly available, can be used to understand and develop solutions for major challenges, such as environmental pollution, biodiversity loss, global warming and ocean acidification.
The raw data of various types are stored in public databases, and derived data are integrated and hosted by a data hub and dedicated portal – which is an example of the interplay between federated and centralized resources. To integrate the data, various tools are used, including some co-developed and supported by SIB such as STRING and mOTUs.
The first TREC expedition was a collaboration with The Tara Ocean Foundation, the European Marine Biological Resource Centre, and many institutes and marine stations across Europe. It’s an ambitious initiative to bring molecular biology closer to environmental sciences. Going forward, we will build on this initiative by applying the same principles of standardized data collection, storage and analysis to, for example, freshwater ecosystems.
Molecular biology touches all living things, and has enormous potential to contribute to other areas of the life sciences, from human health to agriculture and food security, environmental sciences and beyond. The possibilities are truly endless.
Peer Bork speaking at the 2025 [BC]2 Basel Computational Biology Conference. Credit: SIB