Introduction to Causal Inference

Instructors: Nerissa Nance & Laura B. Balzer. Material developed with Maya Petersen.

Image credit:
Keegan Houser

Summary

With the ongoing “data explosion”, methods to delineate causation from correlation are perhaps more pressing now than ever. This course will introduce the Causal Roadmap, which is a general framework for Causal Inference: (1) clear statement of the research question, (2) definition of the causal model and effect of interest, (3) specification of the observed data, (4) assessment of identifiability - that is, linking the causal effect to a parameter estimable from the observed data distribution, (5) specification of the statistical estimation problem, (6) choice and implementation of estimators, including state-of-the-art methods, and (7) appropriate interpretation of findings (Petersen & van der Laan, Epi, 2014; Figure). The statistical methods include G-computation, inverse probability weighting (IPW), and targeted minimum loss-based estimation (TMLE) with Super Learner, an ensemble machine learning method. The emphasis will be on practical implementation and real-world challenges and solutions. You will gain experience working through the Roadmap with case studies, R labs, R assignments, and a final project using real data. By the end of the course, you will have the practical tools to assess cause-and-effect in your applied work.
Figure: The Causal Roadmap

Course Learning Objectives

By the end of this course, you should be able to:

1. Translate your research question and knowledge into a causal model (directed acyclic graphs and non-parametric structural equation models).


2. Define the target causal parameter with counterfactuals.


3. Assess identifiability of the target causal parameter and express it as a parameter of the observed data distribution.


4. Explain the challenges posed by parametric estimation approaches and apply machine learning methods.


5. Identify the properties of and apply three estimators: G-computation, inverse probability weighting (IPW), and targeted minimum loss-based estimation (TMLE) with Super Learner.


6. Explain how to appropriately address missing outcomes, which may be differentially measured.


7. Apply course concepts to address cause-and-effect in a real-data application in your final projects.


8. Explore more advanced settings for Causal Inference, such as time-dependent exposures, clustered data, and continuous exposures.
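
As a brief illustration of objectives 2 and 3, the block below sketches one common target parameter and its identification result. The notation is an assumption made for this example and does not come from the course materials: W denotes baseline covariates, A a binary exposure, Y the outcome, and Y_a the counterfactual outcome under exposure level a. The result holds only under the stated randomization and positivity assumptions.

```latex
% Target causal parameter: the average treatment effect (assumed notation)
\Psi^{F} = E[Y_{1}] - E[Y_{0}]

% Assumptions: Y_a \perp A \mid W (randomization) and
% P(A = a \mid W = w) > 0 for all w with P(W = w) > 0 (positivity).

% Under these assumptions, for a \in \{0, 1\}:
E[Y_{a}]
  = E_{W}\big[\, E(Y \mid A = a, W) \,\big]            % G-computation formula
  = E\!\left[ \frac{\mathbb{I}(A = a)\, Y}{P(A = a \mid W)} \right]  % IPW form
```

The first equality motivates the simple substitution estimator, and the second motivates inverse probability weighting.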

Overview of Lectures

Part I: From causal questions to the statistical estimation problem

Lecture 1 Why Bother with Causal Inference?

Lecture 2 Intro to Structural Causal Models (SCMs)

Lecture 3 Defining Causal Effects with Counterfactuals

Lecture 4 Stats Review of Discrete Random Variables 

Lecture 5 Specify the observed data & their link to the causal model

Lecture 6 Overview & Intuition for Identifiability

Part II: Statistical estimation and interpretation

Lecture 8 Overview of Estimation

Lecture 9 Inverse probability weighted (IPW) estimator (with R lab)

Lecture 10 Dangers of not respecting the non-parametric statistical model during estimation

Lecture 11 Spring Break!

Lecture 12 Why do we need alternative estimators?

Lecture 13 Okay! We have a point estimate; what about a variance estimate?

Lecture 14 Applying the Causal Roadmap for Missing Data

Lecture 15 New directions & New frontiers 

Overview of Labs & Assignments

Discussion Assignments 

Assignment 1: For two redacted real studies, apply the first steps of the roadmap to (i) specify the scientific question, (ii) represent knowledge with a SCM, and (iii) specify the target causal parameter.

Assignment 2: For the same studies, specify the observed data, assess identifiability, specify the statistical estimand, and discuss the needed positivity assumption.

R Labs & Corresponding Homework 

Lab & Homework 1: Defining the causal parameter and introduction to simulations in R

Lab & Homework 2: Identifiability, linking the observed data to the causal model, and implementation of the simple substitution estimator based on the G-computation formula

Lab & Homework 3: Cross-validation and data-adaptive methods for prediction

Lab & Homework 4: Inverse probability of treatment weighting (IPTW) estimators and the impact of positivity violations

Lab 5: Targeted maximum likelihood estimation (TMLE)

Lab 6: Inference with the non-parametric bootstrap and with influence curves for TMLE
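
To give a flavor of the estimators named in Labs 2 and 4, here is a minimal sketch of the simple substitution (G-computation) estimator and the IPTW estimator on simulated data. The course labs use R; this sketch is written in Python and is not taken from the lab materials. The data-generating process, function names, and the single binary confounder W are all assumptions chosen for illustration. The true average treatment effect in this simulation is 0.3 by construction.

```python
import random
from collections import defaultdict


def simulate(n, seed=1):
    """Simulate (W, A, Y): binary confounder W, exposure A, outcome Y.

    Hypothetical data-generating process: W raises both the chance of
    exposure and the chance of the outcome; the true effect of A is 0.3.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < 0.2 + 0.3 * a + 0.3 * w)
        rows.append((w, a, y))
    return rows


def stratum_estimates(rows):
    """Return empirical E[Y | A=a, W=w] and P(A=1 | W=w) from stratum counts."""
    n_aw = defaultdict(int)
    y_aw = defaultdict(int)
    n_w = defaultdict(int)
    for w, a, y in rows:
        n_aw[(a, w)] += 1
        y_aw[(a, w)] += y
        n_w[w] += 1
    q = {key: y_aw[key] / n_aw[key] for key in n_aw}   # outcome regression
    g = {w: n_aw[(1, w)] / n_w[w] for w in n_w}        # propensity score
    return q, g


def gcomp(rows, q):
    """Simple substitution estimator: average predicted difference over W."""
    return sum(q[(1, w)] - q[(0, w)] for w, _, _ in rows) / len(rows)


def iptw(rows, g):
    """IPTW estimator: weight each outcome by the inverse exposure probability."""
    total = sum((a / g[w] - (1 - a) / (1 - g[w])) * y for w, a, y in rows)
    return total / len(rows)


def unadjusted(rows):
    """Naive difference in mean outcomes between exposed and unexposed."""
    y1 = [y for _, a, y in rows if a == 1]
    y0 = [y for _, a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


rows = simulate(20000)
q, g = stratum_estimates(rows)
ate_gcomp = gcomp(rows, q)
ate_iptw = iptw(rows, g)
ate_unadj = unadjusted(rows)
print(f"G-computation: {ate_gcomp:.3f}")
print(f"IPTW:          {ate_iptw:.3f}")
print(f"Unadjusted:    {ate_unadj:.3f}")
```

Because W is binary and both the outcome regression and the propensity score are estimated with saturated (fully stratified) models, the two adjusted estimators coincide here up to floating-point error. With richer covariates and parametric or data-adaptive fits they generally differ. The unadjusted contrast is biased upward in this simulation because W confounds the exposure-outcome relationship.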

Final Project

Final Project Guidelines: Fully apply each step of the Causal Roadmap to a real-world problem.

Readings

Suggested background readings for each topic/section of the course are provided. Helpful references are also provided at appropriate points in the lecture slides. Please note that the listed references are NOT intended as a complete bibliography, but only as helpful entry points to the material. 


1. M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, Berlin Heidelberg New York, 2011.

○ Available for UC Berkeley students here

2. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000; 2nd ed., 2009.

○ Available for UC Berkeley students here

Thank You!

We would like to thank Mark van der Laan for his contributions to the development of this course. We would also like to thank the former Graduate Student Instructors (GSIs) and our students for their valuable feedback on the course content and organization.

Suggested citation for the course: 

M. Petersen and L. Balzer. Introduction to Causal Inference. UC Berkeley, August 2014. <www.ucbbiostat.com>


Introduction to Causal Inference by Maya Petersen & Laura Balzer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 

This work by CTML Faculty and/or Staff is licensed under CC BY 4.0