Introduction to Causal Inference

Material by Maya Petersen, M.D., Ph.D. & Laura B. Balzer, Ph.D.

Image credit: Keegan Houser

Summary

With the ongoing “data explosion”, methods to delineate causation from correlation are perhaps more pressing now than ever. This course will introduce the Causal Roadmap, which is a general framework for Causal Inference: (1) clear statement of the research question, (2) definition of the causal model and effect of interest, (3) specification of the observed data, (4) assessment of identifiability - that is, linking the causal effect to a parameter estimable from the observed data distribution, (5) specification of the statistical estimation problem, (6) choice and implementation of estimators, including state-of-the-art methods, and (7) appropriate interpretation of findings (Petersen & van der Laan, Epi, 2014; Figure). The statistical methods include G-computation, inverse probability weighting (IPW), and targeted minimum loss-based estimation (TMLE) with Super Learner, an ensemble machine learning method. The emphasis will be on practical implementation and real-world challenges and solutions. You will gain experience working through the Roadmap with case studies, R labs, R assignments, and a final project using real data. By the end of the course, you will have the practical tools to assess cause-and-effect in your applied work.

Spring 2024 Syllabus

Course Learning Objectives

By the end of this course, you should be able to:

1. Translate your research question and knowledge into a causal model (directed acyclic graphs and non-parametric structural equation models).

2. Define the target causal parameter with counterfactuals.

3. Assess identifiability of the target causal parameter and express it as a parameter of the observed data distribution.

4. Explain the challenges posed by parametric estimation approaches and apply machine learning methods.

5. Identify the properties of and apply three estimators: G-computation, inverse probability weighting (IPW), and targeted minimum loss-based estimation (TMLE) with Super Learner.

6. Explain how to appropriately address missing outcomes, which may be differentially measured.

7. Apply course concepts to address cause-and-effect in a real-data application in your final project.

8. Explore more advanced settings for Causal Inference, such as time-dependent exposures, clustered data, and continuous exposures.

Roadmap Overview & Roadmap Step 0 - Research Question

Learning objectives:

Identify the distinction between causal and statistical inference
Describe the Causal Roadmap, which serves as the framework for the course

Corresponding Materials:

Roadmap Step 1 - Causal Model

Learning objectives:

Explain how causal models encode our knowledge about the system that we are studying – including the roles of exclusion restrictions and independence assumptions
In R, simulate data from a specific data generating process, reflected in the causal model
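As a rough illustration of the second objective, here is a minimal sketch in Python (the course labs themselves use R). The structural equations, the probabilities in them, and the names `simulate`, `u_w`, `u_a`, and `u_y` are all hypothetical choices made for this example, not taken from the course materials. Each endogenous variable is a deterministic function of its parents and an exogenous error, and the errors are drawn independently, which encodes the independence assumptions.

```python
import random

def simulate(n, seed=1):
    """Draw n i.i.d. observations (W, A, Y) from a hypothetical structural causal model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Exogenous errors: drawn independently (an independence assumption).
        u_w, u_a, u_y = rng.random(), rng.random(), rng.random()
        # Structural equations: each variable is a function of its parents and its error.
        w = int(u_w < 0.5)                      # W = f_W(U_W)
        a = int(u_a < 0.2 + 0.5 * w)            # A = f_A(W, U_A)
        y = int(u_y < 0.1 + 0.2 * a + 0.4 * w)  # Y = f_Y(W, A, U_Y)
        data.append((w, a, y))
    return data

obs = simulate(10_000)
n = len(obs)
print("P(W=1) approx.", sum(w for w, _, _ in obs) / n)
print("P(A=1) approx.", sum(a for _, a, _ in obs) / n)
print("P(Y=1) approx.", sum(y for _, _, y in obs) / n)
```

In this sketch no exclusion restriction is imposed: Y depends on both W and A. Dropping `w` from the equation for `y` would be one way to encode one.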

Corresponding Materials:

Roadmap Step 2 - Counterfactuals & Causal Effects

Learning objectives:

Explain how to formally specify our research question in terms of causal effects, which are summaries of counterfactual outcomes
In R, generate counterfactuals and evaluate the target causal effect with simulations
Apply Steps 0-2 of the Causal Roadmap to a case study in Journal Club 1
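To make the idea of generating counterfactuals concrete, the following Python sketch (the course labs use R) intervenes on the exposure in a hypothetical structural causal model. The structural equation `f_Y`, its coefficients, and the function name `true_ate` are assumptions made for this example. The counterfactual outcomes Y_1 and Y_0 are generated by setting A to 1 and to 0 while holding each unit's background factors fixed, and the average treatment effect E[Y_1 - Y_0] is approximated by averaging over a large simulated population.

```python
import random

def f_Y(w, a, u_y):
    """Hypothetical structural equation for the outcome."""
    return int(u_y < 0.1 + 0.2 * a + 0.4 * w)

def true_ate(n, seed=1):
    """Approximate E[Y_1 - Y_0] by simulating counterfactuals for n units."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        u_w, u_y = rng.random(), rng.random()
        w = int(u_w < 0.5)
        # Intervene on A: same unit (same W and U_Y), exposure set to 1 and to 0.
        y1 = f_Y(w, 1, u_y)
        y0 = f_Y(w, 0, u_y)
        total += y1 - y0
    return total / n

psi_f = true_ate(100_000)
print("Simulated E[Y_1 - Y_0]:", psi_f)
```

By construction, the exposure raises the outcome probability by 0.2 in every stratum of W, so the simulated value should be close to 0.2.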

Corresponding Materials:

Roadmap Step 3 - Observed Data

Learning Objectives:

Explain how the observed data and statistical model are related to the causal model
Describe the distinction between parametric, semi-parametric, and non-parametric statistical models.

Corresponding Materials:

Roadmap Step 4 - Identifiability & Step 5 - Estimation Problem

Learning Objectives:

Explain the assumptions needed to express our causal parameter as a function of the observed data distribution (i.e., a statistical parameter)
Define the statistical estimation problem
Apply Steps 3-5 of the Causal Roadmap to a case study in Journal Club 2
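As a worked illustration of the first objective, the derivation below sketches the standard identifiability result for a point-treatment problem with observed data O = (W, A, Y) and a binary exposure. The notation (Psi^F for the causal parameter, Psi(P_0) for the statistical parameter) is an assumption made here and may differ from the lecture slides.

```latex
% Target causal parameter: the average treatment effect
\Psi^{F} = E[Y_1 - Y_0]

% Assumptions
%   Randomization (no unmeasured confounding):  Y_a \perp A \mid W
%   Positivity:  P_0(A = a \mid W = w) > 0 for a \in \{0, 1\} and all w with P_0(W = w) > 0

\begin{align*}
E[Y_a] &= E_W\big[ E(Y_a \mid W) \big]          && \text{(iterated expectations)} \\
       &= E_W\big[ E(Y_a \mid A = a, W) \big]   && \text{(randomization; well defined by positivity)} \\
       &= E_W\big[ E_0(Y \mid A = a, W) \big]   && \text{(}Y = Y_a \text{ when } A = a\text{)}
\end{align*}

% Statistical estimand (the G-computation formula)
\Psi(P_0) = E_W\big[ E_0(Y \mid A = 1, W) - E_0(Y \mid A = 0, W) \big]
```

The last line depends only on the observed data distribution, so it defines a statistical estimation problem.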

Corresponding Materials:

Roadmap Step 6A - Estimation with Gcomp

Learning Objectives:

Explain how to implement a simple substitution estimator based on the G-computation formula
Describe the limitations of using parametric regressions
Apply simulations to evaluate estimator performance
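The simple substitution estimator can be sketched as follows in Python (the course labs use R). The data-generating process in `simulate` is hypothetical and chosen so that the true average treatment effect is 0.2. Because W is binary here, the outcome regression E[Y | A, W] is estimated with stratum-specific means (a saturated model) rather than a parametric regression. With higher-dimensional covariates that is not feasible, which is where the limitations of parametric regressions arise.

```python
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Hypothetical data-generating process; the true ATE is 0.2 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * w)            # W confounds the A-Y relationship
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def gcomp_ate(data):
    """Simple substitution estimator based on the G-computation formula."""
    # Step 1: estimate E[Y | A=a, W=w] with stratum-specific means.
    sums, counts = defaultdict(float), defaultdict(int)
    for w, a, y in data:
        sums[(a, w)] += y
        counts[(a, w)] += 1
    qbar = {k: sums[k] / counts[k] for k in counts}
    # Steps 2-3: predict each unit's outcome under A=1 and A=0, then average
    # the difference over the empirical distribution of W.
    return sum(qbar[(1, w)] - qbar[(0, w)] for w, _, _ in data) / len(data)

def unadjusted_difference(data):
    """Naive contrast E[Y | A=1] - E[Y | A=0], which ignores confounding by W."""
    treated = [y for _, a, y in data if a == 1]
    control = [y for _, a, y in data if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

data = simulate(20_000)
gcomp_est = gcomp_ate(data)
naive_est = unadjusted_difference(data)
print("G-computation estimate:", gcomp_est)
print("Unadjusted difference: ", naive_est)
```

In this simulation the unadjusted difference should land well above 0.2, while the G-computation estimate should be close to it.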

Corresponding Materials:

Roadmap Step 6B - Estimation with IPW

Learning Objectives:

Understand & implement the inverse probability weighted (IPW) estimator
Explore the impact of positivity violations on estimator performance
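A minimal Python sketch of the IPW estimator follows (the course labs use R). The data-generating process is hypothetical, with a true average treatment effect of 0.2, and the propensity score P(A=1 | W) is estimated by stratum proportions because W is binary in this example. The largest weight is reported as a rough diagnostic: when the estimated probability of the observed exposure is close to zero in some stratum, the weights become very large and the estimator becomes unstable.

```python
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Hypothetical data-generating process; the true ATE is 0.2 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * w)
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def ipw_ate(data):
    """Inverse probability weighted estimate of the ATE, plus the largest weight used."""
    # Step 1: estimate the propensity score g(1 | w) = P(A=1 | W=w).
    a_sum, w_cnt = defaultdict(float), defaultdict(int)
    for w, a, _ in data:
        a_sum[w] += a
        w_cnt[w] += 1
    g1 = {w: a_sum[w] / w_cnt[w] for w in w_cnt}
    # Step 2: weight each unit by the inverse probability of its observed exposure.
    total, max_weight = 0.0, 0.0
    for w, a, y in data:
        weight = 1 / g1[w] if a == 1 else 1 / (1 - g1[w])
        max_weight = max(max_weight, weight)
        total += weight * y if a == 1 else -weight * y
    # Step 3: average the weighted outcomes over all n units.
    return total / len(data), max_weight

data = simulate(20_000)
ipw_est, max_weight = ipw_ate(data)
print("IPW estimate:", ipw_est)
print("Largest weight:", max_weight)
```

Changing `0.2 + 0.5 * w` to something like `0.01 + 0.5 * w` in `simulate` is one way to explore a near-violation of positivity in this sketch.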

Corresponding Materials:

Roadmap Step 6C - Super Learner

Learning Objectives:

Explain the dangers of using parametric regressions and ad hoc approaches to statistical estimation and inference
Understand and implement Super Learner, an ensemble machine learning method
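The core idea behind Super Learner, using cross-validation to compare candidate algorithms, can be sketched in Python as below (the course labs use R). This sketch implements only the discrete Super Learner, which selects the single candidate with the lowest cross-validated risk. The full Super Learner instead builds a weighted combination of the candidates. The simulated data, the three candidate learners, and all function names are hypothetical choices for this example.

```python
import math
import random

def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Simple least-squares line through (xs, ys).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def fit_binned(xs, ys, bins=10):
    # Piecewise-constant fit on [0, 1): mean outcome within equal-width bins.
    overall = sum(ys) / len(ys)
    sums, counts = [0.0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        k = min(int(x * bins), bins - 1)
        sums[k] += y
        counts[k] += 1
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * bins), bins - 1)]

def cv_risk(fit, xs, ys, folds=5):
    """V-fold cross-validated mean squared error of one candidate learner."""
    n, sq_err = len(xs), 0.0
    for v in range(folds):
        train = [i for i in range(n) if i % folds != v]
        valid = [i for i in range(n) if i % folds == v]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return sq_err / n

rng = random.Random(1)
xs = [rng.random() for _ in range(2000)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]

library = {"mean": fit_mean, "linear": fit_linear, "binned": fit_binned}
risks = {name: cv_risk(fit, xs, ys) for name, fit in library.items()}
best = min(risks, key=risks.get)
print(risks)
print("Selected learner:", best)
```

Here the true regression is a sine curve, so the misspecified linear fit should have a clearly higher cross-validated risk than the more flexible binned fit.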

Corresponding Materials:

Overview Video
Lecture 10
R Lab 4 | Data | Super Learner Wrapper Functions | Answers
R HW 4 | Data | Super Learner Wrapper Functions

Roadmap Step 6D - TMLE

Learning Objectives:

Understand why machine learning is not enough for causal inference
Describe how to implement TMLE and describe its properties
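One way to see the steps of TMLE for the average treatment effect is the Python sketch below (the course labs use R, and in practice an established package would be used). Everything here is an illustrative assumption: the data-generating process (true effect 0.2), the deliberately misspecified initial outcome regression that ignores W, the propensity score estimated by stratum proportions, and the one-dimensional Newton-Raphson fit of the fluctuation parameter `eps`.

```python
import math
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Hypothetical data-generating process; the true ATE is 0.2 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * w)
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

data = simulate(20_000)
n = len(data)

# Step 1: initial estimate of E[Y | A, W]; deliberately misspecified (ignores W).
y_sum, a_cnt = defaultdict(float), defaultdict(int)
for w, a, y in data:
    y_sum[a] += y
    a_cnt[a] += 1
q_init = {a: y_sum[a] / a_cnt[a] for a in (0, 1)}

# Step 2: estimate the propensity score g(1 | w) = P(A=1 | W=w).
a_sum, w_cnt = defaultdict(float), defaultdict(int)
for w, a, _ in data:
    a_sum[w] += a
    w_cnt[w] += 1
g1 = {w: a_sum[w] / w_cnt[w] for w in (0, 1)}

def clever(a, w):
    """Clever covariate H(A, W) = A / g(1|W) - (1 - A) / g(0|W)."""
    return a / g1[w] - (1 - a) / (1 - g1[w])

# Step 3: targeting. Fit eps in logit Q*(A,W) = logit Q(A,W) + eps * H(A,W)
# by maximum likelihood, using Newton-Raphson on the one-dimensional score.
eps = 0.0
for _ in range(50):
    score = info = 0.0
    for w, a, y in data:
        h = clever(a, w)
        p = expit(logit(q_init[a]) + eps * h)
        score += h * (y - p)
        info += h * h * p * (1 - p)
    step = score / info
    eps += step
    if abs(step) < 1e-10:
        break

def q_star(a, w):
    """Updated (targeted) outcome regression."""
    return expit(logit(q_init[a]) + eps * clever(a, w))

# Step 4: substitution estimator based on the targeted predictions.
initial_ate = q_init[1] - q_init[0]
tmle_ate = sum(q_star(1, w) - q_star(0, w) for w, _, _ in data) / n
print("Initial (misspecified) estimate:", initial_ate)
print("TMLE estimate:", tmle_ate)
```

In this simulation the initial estimate is confounded by W, while the targeted estimate should be close to 0.2 because the propensity score is estimated consistently. This illustrates the double robustness property.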

Corresponding Materials:

Roadmap Step 6E - Inference & Step 7 - Interpretation

Learning Objectives:

Implement the non-parametric bootstrap for variance estimation & confidence interval construction.
Appropriately interpret the results of our study
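The non-parametric bootstrap in the first objective can be sketched in Python as follows (the course labs use R). The data-generating process, the choice of a stratified G-computation estimator, and the number of bootstrap samples `B = 200` are all assumptions made for this example. The whole estimation procedure is repeated on each resampled data set, and the spread of the resulting estimates is used for the standard error and confidence intervals.

```python
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Hypothetical data-generating process; the true ATE is 0.2 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * w)
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def gcomp_ate(data):
    """Simple substitution estimator using stratum-specific outcome means."""
    sums, counts = defaultdict(float), defaultdict(int)
    for w, a, y in data:
        sums[(a, w)] += y
        counts[(a, w)] += 1
    qbar = {k: sums[k] / counts[k] for k in counts}
    return sum(qbar[(1, w)] - qbar[(0, w)] for w, _, _ in data) / len(data)

data = simulate(2_000)
estimate = gcomp_ate(data)

# Resample n units with replacement and re-run the full estimator each time.
rng = random.Random(2)
B = 200
boots = sorted(gcomp_ate(rng.choices(data, k=len(data))) for _ in range(B))

# Bootstrap standard error: standard deviation of the bootstrap estimates.
mean_boot = sum(boots) / B
se = (sum((b - mean_boot) ** 2 for b in boots) / (B - 1)) ** 0.5

# Two common 95% intervals: normal approximation and (approximate) percentiles.
ci_normal = (estimate - 1.96 * se, estimate + 1.96 * se)
ci_percentile = (boots[int(0.025 * B)], boots[int(0.975 * B)])

print("Estimate:", estimate)
print("Bootstrap SE:", se)
print("Normal 95% CI:", ci_normal)
print("Percentile 95% CI:", ci_percentile)
```

A larger `B` gives more stable percentile endpoints. The value 200 is used here only to keep the example quick to run.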

Corresponding Materials:

Applying the Roadmap for Missing Data

Learning Objectives:

Explain how the Causal Roadmap can be applied for missing data
Describe the real-data application in the SEARCH study

Corresponding Materials:

Next Directions & New Frontiers

Learning Objectives:

Describe how the Causal Roadmap can be applied in more advanced settings

Corresponding Lectures:

Final Project

Final Project Guidelines
The final project provides an exciting opportunity to apply what you have learned to a real-data problem. Select a point-treatment problem (a single intervention node) with a binary exposure in an observational setting, as that has been the focus of this course. The project asks you to explicitly and thoughtfully apply each step of the Causal Roadmap to this problem. This does not mean you must have a perfect data analysis and presentation at the end of the semester. High-quality analysis of real data takes time, and many of you will encounter challenges that we have not yet learned how to address. A good project, rather, is one in which you have thought hard about each step in the Roadmap, have done your best given your training, and have clearly identified limitations and next steps.

Readings

Suggested background readings for each topic/section of the course are provided. Helpful references are also provided at appropriate points in the lecture slides. Please note that the listed references are NOT intended as a complete bibliography, but only as helpful entry points to the material.

1. M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, Berlin Heidelberg New York, 2011.

○ Available for UC Berkeley students here

2. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000; 2nd ed., 2009.

○ Available for UC Berkeley students here

This course was awarded the 2014 Causality in Statistics Education Award by the American Statistical Association.

We would like to thank Mark van der Laan for his contributions to the development of this course. We would also like to thank the former Graduate Student Instructors (GSIs) and our students for their valuable feedback to the course content and organization.

Suggested citation for the course:

M. Petersen and L. Balzer. Introduction to Causal Inference. UC Berkeley, August 2014. Available at https://ctml.berkeley.edu/introduction-causal-inference

Introduction to Causal Inference by Maya Petersen & Laura Balzer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This work by CTML Faculty and/or Staff is licensed under CC BY 4.0
