GSMS Course Registration

Course information

Course: Gene Expression Data for Beginners

The course will probably be online.

Summary: Online genome databases are rapidly expanding and are already central to biological and medical research. Working with such a massive amount of information requires a basic understanding of how to find the data of interest and what to do with them. In this course we will focus mostly on gene expression data, since they are the most commonly used in contemporary research and scientific publications. Students will practice data retrieval, mostly from GEO. We will work with three types of data: expression microarrays, bulk RNA-seq, and single-cell RNA-seq. Students will learn the most typical protocols for data processing and statistical analysis in all three cases. Attention will be given to data normalization, filtering, annotation, and visualization. Optionally, elements of ChIP-seq, as well as genomic (promoter) motifs and gene ontology tools, will be covered. Students will be introduced to basic programming in Python, R, and a few other script types. Students must be interested in learning R and Python programming at the beginner's level. Expect to spend at least as much time on homework between sessions as in the sessions themselves. The course will include exercises in data retrieval, processing, and basic elements of analysis. Students are welcome to use their own data. Because of the coronavirus pandemic, the course will likely be held online.

Learning outcomes: 

                   ·  Understanding the sources of gene expression data and approaches for data analysis

                   ·  Understanding the principles of using R and Python packages for genome data analysis. Basic knowledge of using and writing scripts in R and Python.

                   ·  Understanding the basics of experimental design and the statistical robustness of data sets.

        Assumed prior knowledge: A good understanding of molecular biology and genetics. Programming experience is helpful but not required. A desire to learn programming is required.

        Equipment: Personal laptop with 8 GB RAM or more, running Mac, Linux, or Windows 10. R (with RStudio) and Python 3 (installed standalone or via Anaconda, optionally with PyCharm) can be preinstalled, or will be installed during the course.

        Compulsory literature: None

        Recommended literature: Articles and online resources will be suggested during the course.

In principle, 100% participation is required. One missed seminar is accepted given good reasons and good performance, but the homework is still required. The final assignment is based on successfully completed homework.

Course schedule:

7 October  09:00-12:00

14 October  09:00-12:00

21 October  09:00-12:00

28 October  09:00-12:00

4 November  09:00-12:00

11 November  09:00-12:00

18 November  09:00-12:00


October 7–November 18, 2020



EC (without exam)


Location address

to be announced

Course coordinator

  • Leonid Bystrykh, PhD





Deadline for registration

September 21, 2020

