Protein sequence databases and sequence annotation

Overview

Using high-throughput technologies, you can identify long lists of candidate genes that differ between two experimental conditions. In order to interpret these gene lists and to discover fundamental properties like gene function and disease relevance, you need to use the annotation linked to a given gene or protein sequence.

This course and the practical exercises that follow aim to provide basic theoretical and practical knowledge of protein sequence databases (with a focus on UniProtKB), Gene Ontology, the different manual and automated annotation pipelines (such as HAMAP) and, in particular, the optimal use of UniProt. During the theory and practical sessions, we will discuss questions such as:

Where do the protein sequences come from?
What are the differences between the major protein sequence databases?
What are the manual and automated gene / protein annotation pipelines?
What are the Gene Ontology (GO) annotation pipelines?
How to assess protein sequence accuracy and annotation quality?
How to extract biological knowledge from a BLAST result or gene list?
How to mine enzyme data in UniProtKB using chemical structure data and chemical classifications from the Rhea resource of biochemical reactions?

Audience

This course targets biologists and bioinformaticians who seek to analyse protein data. It will also be useful for people who programmatically access protein sequence databases and need to understand the data.
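
For readers who access these databases programmatically, here is a minimal Python sketch of how a UniProtKB FASTA header line can be split into its fields. It assumes the standard UniProtKB header layout (db|accession|entry name, followed by the protein name and the OS, OX, optional GN, PE and SV fields). The function name parse_uniprot_header and the example header are illustrative only and are not part of the course material.

```python
import re

# Assumed UniProtKB FASTA header layout:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxonId [GN=Gene] PE=Level SV=Version
# db is "sp" for UniProtKB/Swiss-Prot (reviewed) or "tr" for UniProtKB/TrEMBL (unreviewed).
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)\s+OX=(?P<taxon_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>\d)\s+SV=(?P<sv>\d+)\s*$"
)

# Meaning of the protein existence (PE) levels used by UniProtKB.
PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}


def parse_uniprot_header(line):
    """Hypothetical helper: split one UniProtKB FASTA header into a dict."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header: %r" % line)
    fields = match.groupdict()
    # Convert numeric fields and add two convenience values.
    fields["taxon_id"] = int(fields["taxon_id"])
    fields["pe"] = int(fields["pe"])
    fields["sv"] = int(fields["sv"])
    fields["reviewed"] = fields["db"] == "sp"
    fields["pe_label"] = PE_LEVELS.get(fields["pe"], "Unknown")
    return fields


if __name__ == "__main__":
    # Illustrative header, written in the layout described above.
    example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
               "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
    record = parse_uniprot_header(example)
    print(record["accession"], record["reviewed"], record["pe_label"])
```

The db prefix and the PE level are two quick indicators of annotation status: "sp" marks a reviewed UniProtKB/Swiss-Prot entry, and the PE level states what kind of evidence supports the existence of the protein.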

Minimum 12 participants.

Learning objectives

At the end of the course, the participants are expected to:

know the differences between the major protein sequence databases
understand the major sequence annotation pipelines and the GO annotation pipelines
assess protein sequence accuracy and annotation quality

Prerequisites

Knowledge / competencies

None

Technical

You are required to bring your own laptop with an Internet connection.

Schedule

09h00 Welcome

09h15 Protein sequence databases

10h30 Break

11h00 Practicals

12h30 Break

13h30 Automated annotation pipeline: theory and practicals

15h00 Break

15h30 How to mine enzyme data: theory and practicals

17h00 End

Application

The registration fee is 60 CHF for academics and 300 CHF for participants from industry. This includes course content material and coffee breaks.

The deadline for registration and free-of-charge cancellation is set to 26 February 2020. Cancellations after this date will not be reimbursed. Please note that participation in SIB courses is subject to our general conditions.

You will receive confirmation of your registration by email.

Venue

University of Lausanne (Metro M1 line, Sorge station). More details will be sent to registered participants.

Additional information

Coordination: Grégoire Rossier

We will recommend 0.25 ECTS credits for this course (provided you pass the exam at the end of the course).

You are welcome to subscribe to the SIB courses mailing list, using the form here, to be informed of all future courses and workshops, as well as all important deadlines.

For more information, please contact training@sib.swiss.