GSMS Course Registration

All GSMS courses

Only GSMS PhD students (including BCN PhD students) can participate in these courses. Use your P-number to register (ask your Personnel Office).

For registration please go to Upcoming courses.

Course: Beyond Multiple Linear Regressions with R

This course is organized by the Research Master CPE. PhD students are allowed to participate.

General information

Introductory courses in statistics explore models that fit your data under a set of strict assumptions. For example, observations must not include missing data, the response variable must be continuous, observations must be independent of one another, and residuals must be normally distributed. But real-world data are messy and often violate these assumptions. This course is about tackling the challenges of real-world data. We will introduce generalized linear models (GLMs), a modelling framework that encompasses a whole range of strategies for modelling response variables that are not continuous and whose residuals are not normally distributed. We will model response variables that are binary (e.g., positive/negative diagnosis) or counts (e.g., number of tumours). We will also explore how to handle observations that are interdependent, such as clustered data and longitudinal data consisting of repeated measures over time. Completing the course successfully will enable you to conduct your own research with messy, real-life data, to estimate the parameters of the mechanisms that generate your data, and to estimate the uncertainty around those parameters.
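To give a rough feel for what a generalized linear model for count data involves, the sketch below fits a Poisson regression with a log link, log(E[y]) = b0 + b1 * x, by Newton-Raphson on the log-likelihood. It is an illustration only: the course itself works in R, the function name fit_poisson and the toy data are hypothetical, and the code uses plain Python with no statistics library.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        # Fitted means under the current coefficients (inverse of the log link).
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information: a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: x is an exposure level, y is an observed count.
x = [0, 1, 2, 3, 4, 5]
y = [1, 1, 3, 4, 8, 12]
b0, b1 = fit_poisson(x, y)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
# exp(b1) is the multiplicative change in the expected count per unit of x.
print("rate ratio per unit of x:", round(math.exp(b1), 3))
```

The response here is a count, so the model is fitted on the log scale and the slope is read as a rate ratio instead of an additive change, which is the kind of interpretation the course practises with real data.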

Learning outcomes
After completing this course successfully, you are able to:
- Apply generalized linear models (GLMs) as an overall framework that brings a variety of models under one umbrella (e.g., Poisson, logistic, and negative binomial regression)
- Apply data exploration strategies to identify which type of generalized linear mixed effects model is suited to your data
- Analyse real-world data by fitting and interpreting generalized linear mixed effects models
- Build, describe, interpret, expand and compare generalized linear mixed effects models
- Use Quarto to write reproducible reports and GitHub for version control and collaboration
- Effectively communicate results from statistical analyses to a general audience

Textbooks
All books are freely available online. Print copies are also available for purchase.
- Roback, Legler (2020) Beyond Multiple Linear Regression, CRC Press, 1st edition (main course book, freely available online)
- Wickham, Grolemund (2016) R for Data Science, O’Reilly (freely available online)

Prerequisites
In addition to an introductory course in statistics, you will need to feel comfortable using R, which has become the lingua franca of data science in many academic circles. To prepare, please follow the first four modules of the course R Programming Fundamentals:
- Module 1: Introduction, where you will be introduced to the R programming language, its history and evolution
- Module 2: Getting Started, where you learn to set up R on your device and familiarize yourself with the basic features of RStudio
- Module 3: Data Structures, where you learn about the basic data structures used in R: vectors, matrices, lists and data frames
- Module 4: Data Input, where you become acquainted with best practices for importing data from, and exporting data to, external sources and programs such as Excel

Alternatively, if you would like to brush up on your R programming skills, there are a variety of free resources you can use, such as the SICSS bootcamp, Harvard’s famous R Basics course, or Stanford’s R Programming fundamentals.

Approach
The course will extend over a five-week period, in which the first four weeks will be dedicated to the learning and practicing of the material in the textbook, and the fifth week will be dedicated to your final project.

The activities and assessments in this course are designed to help you achieve the course learning objectives. Each activity and assessment belongs to one of three stages in the prepare, practice, perform cycle that is repeated for each topic:

- Prepare: Includes reading assignments and videos to introduce new concepts and ensure a basic comprehension of the material.
- Practice: Includes in-class activities and application exercises to explore new topics in more depth. These activities will be completed during the lectures. As they are intended for practice, they will not be graded.
- Perform: Includes homework, quizzes, and the projects. These assignments are an opportunity for you to demonstrate your understanding of the course material and how it is applied to the analysis of real-world data.

Topics
The course will cover the following topics:
- Review: multiple linear regression
- Robust regression methods
- Likelihoods and distribution theory
- Poisson regression: Goodness of fit & overdispersion
- Binomial and Ordinal logistic regression
- Correlated data
- Multilevel mixed effects models: inference, estimation and interpretation
- Mixed effects models for longitudinal data
- Covariance structure of observations
- Multilevel Generalized Linear Regression

Evaluation
The final grade for this course is a weighted average of all the assignments. All assignments are assessed individually by two senior staff members. Assessment is done according to course-specific rubrics, which specify the expected learning outcomes for each assignment.

Readings and recorded lectures
Reading and lecture video assignments accompany each topic, primarily from the course textbook Beyond Multiple Linear Regression, but they may periodically include articles and other resources. Recordings of short lectures will be available with application exercises. The activities and application exercises will give you an opportunity to explore concepts in more depth and get practice applying them to real-world data.

Homework assignments
There will be four homework assignments during the course. In these assignments, you will apply what you’ve learned as you answer conceptual questions and complete guided and unguided analyses. You may discuss homework assignments with other students; however, homework should be completed and submitted individually.

Readiness assurance quiz
Every session will start with a short quiz. These quizzes will cover the readings, lecture notes and activities, and any assignments since the previous quiz. More details about the format and content of each quiz will be made available when it is assigned.

Bonus assignments
These will be more challenging, optional assignments for those who want to deepen their knowledge and practice more advanced methods.

Location
This course is hybrid. Participants are expected to bring their laptops with them to the lectures and to have a RUG account.

EC (with exam): 3

Course coordinator: Ofer Engel

Language: English
