Thousands of genetic markers have already been robustly associated with complex human traits, such as Alzheimer’s disease, cancer, obesity, or height. To discover these associations, researchers need to compare the genomes of many individuals at millions of genetic locations, or markers, and therefore require cost-effective genotyping technologies. A new statistical method, developed by Olivier Delaneau’s group at the SIB Swiss Institute of Bioinformatics and the University of Lausanne (UNIL), offers game-changing possibilities. For less than $1 in computational cost, GLIMPSE is able to statistically infer a complete human genome from a very small amount of sequencing data. The method offers the first realistic alternative to current approaches, which rely on a predefined set of genetic markers, and so allows a wider inclusion of underrepresented populations. The study, which suggests a paradigm shift for data generation in biomedical research, is published in Nature Genetics.
Taking advantage of genomes already sequenced
“Our original thinking was: can we make use of the wealth of sequenced genomes to improve those that are newly sequenced? In other words, more for less: this is exactly what GLIMPSE does,” explains Diogo Ribeiro, Postdoctoral Researcher in Olivier Delaneau’s Group and co-author of the paper. How does it work? By building on the idea that we all share relatively recent common ancestors, from whom we each inherit small portions of our DNA. Briefly, GLIMPSE mines large collections of human genomes that have been sequenced very accurately (high-coverage whole-genome sequencing, or WGS) to identify portions of DNA that they share with genomes newly sequenced at low coverage. In this way, GLIMPSE can reliably fill in the gaps in the low-coverage data.
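To make the gap-filling idea concrete, here is a deliberately simplified toy sketch in Python. It is not the GLIMPSE method, which is a far more sophisticated statistical model. It only illustrates the principle of borrowing shared DNA segments from a reference collection. The function name `impute_from_panel`, the tiny `reference_panel` and the `low_coverage` sample are all hypothetical and invented for this illustration; alleles are coded as 0 or 1, and `None` marks a position the low-coverage data did not capture.

```python
from typing import List, Optional, Sequence


def impute_from_panel(
    panel: Sequence[Sequence[int]],
    observed: Sequence[Optional[int]],
) -> List[int]:
    """Fill the gaps in `observed` by copying alleles from the reference
    haplotype that agrees best with the positions that were observed.

    Toy nearest-match rule, assumed here purely for illustration.
    """

    def mismatches(ref: Sequence[int]) -> int:
        # Count disagreements only at positions the sample actually covers.
        return sum(
            1 for r, o in zip(ref, observed) if o is not None and r != o
        )

    # Pick the reference haplotype with the fewest disagreements.
    best = min(panel, key=mismatches)

    # Keep observed alleles; borrow the rest from the best-matching reference.
    return [o if o is not None else r for r, o in zip(best, observed)]


# Hypothetical high-coverage reference collection (three haplotypes, eight sites).
reference_panel = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1, 1, 0],
]

# Hypothetical low-coverage sample: only three of eight sites were read.
low_coverage = [None, 0, None, 1, None, None, 1, None]

imputed = impute_from_panel(reference_panel, low_coverage)
print(imputed)
```

In this toy case only the third reference haplotype agrees with all three observed sites, so the missing positions are copied from it. A real method has to handle sequencing errors, two chromosome copies per person, and segments inherited from many different reference genomes along the same chromosome, which is why a proper statistical model is needed.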
A new paradigm for future genomic studies with far-ranging applications
Made available as part of an open-source suite of tools, GLIMPSE paves the way for wide adoption of low-coverage WGS, promoting a paradigm shift in data generation for future genomic studies. Since the method was first released as a preprint in April 2020, researchers have already started to use the tool, for instance to reconstruct the genomes of people who lived thousands of years ago from ancient DNA, or those of COVID-19 patients from SARS-CoV-2 nasopharyngeal swabs as part of a genome-wide association study (GWAS).