Manual expert curation of the leading protein information resource is coping well with the ever-growing biomedical literature, a recent study shows.
Distilled knowledge, extracted from the literature by expert curators, makes up essential life science resources, or knowledgebases. While the importance of these resources is increasingly recognized, the question of their sustainability is frequently raised. How can a limited number of curators keep up with the rapid growth of the biomedical literature, which currently stands at over 1 million papers published a year?
A recent study led by Sylvain Poux (Head of Biocuration) and colleagues from the SIB Swiss-Prot Group reveals that, despite appearances, expert biocuration is indeed sustainable. To reach this conclusion, the authors used UniProtKB/Swiss-Prot (the expertly curated section of the UniProt knowledgebase, to which the SIB Swiss-Prot Group provides most of the content) as an example. Over a 6-month period they tracked the literature triage process and studied the proportion of relevant versus non-relevant papers, where relevant means curatable. The team shows that the number of papers curated (8,000 to 10,000 per year) is actually very small compared with the total number of papers evaluated by curators (50,000 to 70,000 per year). The reason for this difference is that only up to 2–3% of all publications indexed in PubMed each year appear to be relevant for UniProt curation. It therefore seems that the sheer number of biomedical papers published each year is a poor measure of the biocuration effort required.
These results demonstrate, for one, that expert curation in UniProt can keep up with the increasing number of publications. They also underline that, more than ever, curators play a crucial role in shedding light on knowledge from a growing body of publications, ensuring for example that only the most appropriate and informative evidence is reported.
Reference: Poux S et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics, July 2017, https://doi.org/10.1093/bioinformatics/btx439