
Introduction to Causal Inference


Divisions of Biostatistics & Epidemiology

UC Berkeley


This course presents a general framework for causal inference. Directed acyclic graphs and non-parametric structural equation models (NPSEM) are used to define the causal model. Target causal parameters are defined using counterfactuals and marginal structural models. G-computation estimators, inverse probability weighted estimators, and targeted maximum likelihood estimators are introduced. Non-parametric and semi-parametric approaches to nuisance parameter estimation, with an emphasis on Super Learning, are presented. Students gain practical experience implementing these estimators and interpreting results through discussion assignments, R labs, and R assignments. 

Course Learning Objectives

By the end of this course, students should be able to:

  1)  Translate a scientific question and background knowledge into a causal model and target causal parameter using the Structural Causal Model (SCM)/counterfactual frameworks.

  2)  Assess identifiability of the target causal parameter and express it as a parameter of the observed data distribution.

  3)  Understand the challenge posed by the curse of dimensionality; be familiar with and able to apply data-adaptive (machine learning) approaches.

  4)  Understand the properties of and be able to apply three classes of causal effect estimators.

  5)  Begin to develop familiarity with the uses of a formal causal framework for investigating a wide range of questions about how the world works.

Part I: From causal questions to the statistical estimation problem

Lecture 1: A General Roadmap for Tackling Causal Questions

Lecture 2: Pearl’s Structural Causal Model (SCM)

Lecture 3: Defining Target Causal Quantities: Link between SCM and Counterfactuals

Lecture 4: Defining the Observed Data and its link to the SCM

Lecture 5: Identifying Causal Effects 


Part II: Statistical estimation and interpretation

Lecture 6: Introduction to Estimation

Lecture 7: Introduction to Data-Adaptive Estimation and Super Learning

Lecture 8: Estimation of Causal Effects with Data-Adaptive Methods

Lecture 9: The Propensity Score and Inverse Probability of Treatment Weighting (IPTW)

Lecture 10: IPTW for Marginal Structural Model (MSM) parameters

Lecture 11: Introduction to Targeted Maximum Likelihood Estimation (TMLE)

Lecture 12: Interpretation, Wrap up and Where Next?


Discussion Assignments:

Assignment 1: For two redacted real studies, apply the first steps of the roadmap to (i) specify the scientific question, (ii) represent knowledge with an SCM, and (iii) specify the target causal parameter.

Assignment 2: For the same studies, specify the observed data, assess identifiability, specify the statistical estimand, and discuss the needed positivity assumption.


R Labs & Corresponding Homework:

Lab & Hw 1: Defining the causal parameter and introduction to simulations in R

Lab & Hw 2: Identifiability, linking the observed data to the causal model, and implementation of the simple substitution estimator based on the G-computation formula

Lab & Hw 3: Cross-validation and data-adaptive methods for prediction

Lab & Hw 4: Inverse probability of treatment weighting (IPTW) estimators and the impact of positivity violations

Lab 5: Targeted maximum likelihood estimation (TMLE)

Lab 6: Inference with the non-parametric bootstrap and with influence curves for TMLE


Final Project: 

Fully apply each step of the causal roadmap to a real-world problem.

Suggested background readings for each topic/section of the course are provided. Helpful references are also provided at appropriate points in the lecture slides. Please note that the listed references are NOT intended as a complete bibliography, but only as helpful entry points to the material. 
