Protein sequence databases and sequence annotation

03 March 2020
Cancellation deadline:
29 February 2020
Marie-Claude Blatter, Elisabeth Gasteiger, Ivo Pedruzzi, Anne Morgat
0.25 credits

No future instance of this course is planned yet


Using high-throughput technologies, you can identify long lists of candidate genes that differ between two experimental conditions. In order to interpret these gene lists and to discover fundamental properties like gene function and disease relevance, you need to use the annotation linked to a given gene or protein sequence.

This course and its practical exercises aim to provide basic theoretical and practical knowledge of protein sequence databases, with a focus on UniProtKB, Gene Ontology, the different manual and automated annotation pipelines (such as HAMAP) and, in particular, the optimal use of UniProt. During the theory and practical sessions, we will discuss questions such as:

  • Where do the protein sequences come from?
  • What are the differences between the major protein sequence databases?
  • What are the manual and automated gene / protein annotation pipelines?
  • What are the Gene Ontology (GO) annotation pipelines?
  • How to assess protein sequence accuracy and annotation quality?
  • How to extract biological knowledge from a BLAST result or gene list?
  • How to mine enzyme data in UniProtKB using chemical structure data and chemical classifications from the Rhea resource of biochemical reactions?


This course targets biologists and bioinformaticians who seek to analyse protein data. It will also be useful for people who programmatically access protein sequence databases and need to understand the data.
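For readers who access UniProtKB programmatically, the sketch below illustrates one of the questions above: telling manually reviewed entries from automatically annotated ones. It assumes the standard UniProtKB FASTA header layout (`>db|Accession|EntryName Description OS=... OX=... [GN=...] PE=n SV=n`), where `sp` marks a reviewed UniProtKB/Swiss-Prot entry and `tr` an unreviewed UniProtKB/TrEMBL entry. The function name `parse_uniprot_header` and the example header line are illustrative only and are not part of the course material.

```python
import re

# Assumed UniProtKB FASTA header layout:
# >db|Accession|EntryName Description OS=Organism OX=TaxonId [GN=Gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<description>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<taxon_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?"
    r"\s+PE=(?P<pe>[1-5])\s+SV=(?P<sv>\d+)\s*$"
)

# Protein existence (PE) levels as defined by UniProtKB.
PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}


def parse_uniprot_header(line):
    """Parse one UniProtKB FASTA header line into a dict (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header: %r" % line)
    record = match.groupdict()
    # 'sp' = reviewed (Swiss-Prot), 'tr' = unreviewed (TrEMBL).
    record["reviewed"] = record["db"] == "sp"
    record["pe"] = int(record["pe"])
    record["pe_label"] = PE_LEVELS[record["pe"]]
    return record


if __name__ == "__main__":
    # Illustrative header; check real entries against the current release.
    example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
               "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
    info = parse_uniprot_header(example)
    print(info["accession"], info["reviewed"], info["pe_label"])
```

The `reviewed` flag and the PE level give only a first, coarse indication of annotation quality; the evidence attached to individual annotations inside the entry is more informative.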

Min 12 participants.

Learning objectives

At the end of the course, the participants are expected to:

  • know the differences between the major protein sequence databases
  • understand the major sequence annotation pipelines and the GO annotation pipelines
  • assess the protein sequence accuracy and the annotation quality


Knowledge / competencies



You are required to bring your own laptop with an Internet connection.


09h00 Welcome

09h15 Protein sequence databases

10h30 Pause

11h00 Practicals

12h30 Pause

13h30 Automated annotation pipeline: theory and practicals

15h00 Pause

15h30 How to mine enzyme data: theory and practicals

17h00 End


The registration fee is 60 CHF for academics and 300 CHF for participants from industry. This includes course material and coffee breaks.

The deadline for registration and free-of-charge cancellation is set to 26 February 2020. Cancellation after this date will not be reimbursed. Please note that participation in SIB courses is subject to our general conditions.

You will be informed by email of your registration confirmation.


University of Lausanne (Metro M1 line, Sorge station). More details will be sent to registered participants.

Additional information

Coordination: Grégoire Rossier

We will recommend 0.25 ECTS credits for this course (subject to passing an exam at the end of the course).

You are welcome to subscribe to the SIB courses mailing list, using the form here, to be informed of all future courses and workshops, as well as all important deadlines.

For more information, please contact