Emily Riederer wrote an article titled “Causal design patterns for data analysts.”
« Causal inference is complex and doing it well requires both statistical and domain expertise. »
« To illustrate potential applications, this post will provide a brief overview of Stratification, Propensity Score Weighting, Regression Discontinuity, and Difference in Differences with motivating examples from consumer retail. Each of these methods deals with situations where different groups receive different treatments but the assignment of groups was not completely random. »
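To make the non-random assignment problem concrete, here is a minimal Python sketch of stratification and propensity score weighting. It is not from the article. The retail scenario is a hypothetical assumption: loyal customers spend more at baseline and were also more likely to receive a coupon, so a naive treated-versus-control comparison mixes the coupon effect with the segment difference. The data are noiseless, segment is the only confounder, and all names and numbers (`make_customers`, the +10 coupon effect) are illustrative.

```python
from collections import defaultdict
from statistics import fmean


def make_customers():
    """Hypothetical, noiseless toy data: (segment, got_coupon, spend).

    Loyal customers spend more at baseline AND were more likely to get the
    coupon, so coupon receipt is confounded with segment. The true coupon
    effect is built in as +10 for everyone.
    """
    rows = []
    rows += [("loyal", True, 110.0)] * 80   # baseline 100, treated
    rows += [("loyal", False, 100.0)] * 20
    rows += [("new", True, 30.0)] * 20      # baseline 20, treated
    rows += [("new", False, 20.0)] * 80
    return rows


def naive_difference(rows):
    """Treated mean minus control mean, ignoring how treatment was assigned."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return fmean(treated) - fmean(control)


def stratified_ate(rows):
    """Compare treated vs control within each segment, then average the
    per-segment effects weighted by segment size. Every segment must contain
    both treated and control rows (the overlap requirement)."""
    strata = defaultdict(lambda: {True: [], False: []})
    for segment, t, y in rows:
        strata[segment][t].append(y)
    total = 0.0
    for groups in strata.values():
        size = len(groups[True]) + len(groups[False])
        effect = fmean(groups[True]) - fmean(groups[False])
        total += effect * size / len(rows)
    return total


def ipw_ate(rows):
    """Inverse propensity weighting, with the propensity score estimated as
    the share of each segment that was treated."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_treated, n_total]
    for segment, t, _ in rows:
        counts[segment][0] += t
        counts[segment][1] += 1
    treated_sum = control_sum = 0.0
    for segment, t, y in rows:
        e = counts[segment][0] / counts[segment][1]
        if t:
            treated_sum += y / e
        else:
            control_sum += y / (1 - e)
    return (treated_sum - control_sum) / len(rows)


rows = make_customers()
print(naive_difference(rows))      # 58.0: badly biased by the segment mix
print(stratified_ate(rows))        # 10.0
print(round(ipw_ate(rows), 6))     # 10.0
```

Because the propensity model here is saturated (one treatment rate per segment), weighting reproduces the stratified estimate up to floating-point error. The two diverge once propensities come from a model over many or continuous traits, which is typically where weighting becomes more practical than explicit strata.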
« The types of partial randomization found in your historical data and the types of biases you are concerned about dictate which methods are applicable. In short:
- If you have significant overlap between “similar” treated and untreated individuals but the treatment was not randomly assigned, stratification or propensity score weighting can help you rebalance your data so that your treated and untreated groups have a more similar distribution of traits and their average outcomes are more comparable.
- If you have disjoint treated and untreated groups partitioned by a sharp cut-off, regression discontinuity allows you to measure the local treatment effect at the juncture between groups.
- If treatments are assigned to different populations, difference-in-differences and event study methods help to compare different groups across multiple time periods. »
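The last two designs in the list can each be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's code. The sharp regression discontinuity estimate fits a separate straight line on each side of the cutoff, within a hypothetical bandwidth, and reports the gap between the two lines at the cutoff. The difference-in-differences estimate is the basic two-group, two-period version; it is only causal under the parallel-trends assumption (the control group's change stands in for what the treated group would have done without treatment). The loyalty-points cutoff, store sales, and all numbers are invented; event-study extensions are not shown.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rdd_jump(running, outcome, cutoff, bandwidth):
    """Sharp RDD sketch: fit one line on each side of the cutoff, using only
    points within `bandwidth`, with the running variable centred at the
    cutoff. The difference in intercepts is the estimated local jump."""
    left = [(x - cutoff, y) for x, y in zip(running, outcome)
            if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


def diff_in_diff(rows):
    """2x2 DiD on (group, period, outcome) rows, where group is "treated" or
    "control" and period is "pre" or "post"."""
    def cell(group, period):
        return fmean(y for g, p, y in rows if g == group and p == period)

    treated_change = cell("treated", "post") - cell("treated", "pre")
    control_change = cell("control", "post") - cell("control", "pre")
    return treated_change - control_change


# Hypothetical RDD data: customers with >= 100 loyalty points get a perk,
# and spend jumps by 15 at the cutoff (noiseless for clarity).
points = list(range(201))
spend = [50 + 0.2 * (x - 100) + (15 if x >= 100 else 0) for x in points]
print(round(rdd_jump(points, spend, cutoff=100, bandwidth=20), 6))  # 15.0

# Hypothetical DiD data: weekly sales for stores that did / did not run a
# promotion, before and after it launched.
sales = [
    ("treated", "pre", 98), ("treated", "pre", 102),
    ("treated", "post", 128), ("treated", "post", 132),
    ("control", "pre", 79), ("control", "pre", 81),
    ("control", "post", 94), ("control", "post", 96),
]
print(diff_in_diff(sales))  # 15.0
```

The bandwidth of 20 is an arbitrary choice here; with real data the RDD estimate should be checked for sensitivity to it, and it only describes the effect near the cutoff.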
« Causal inference requires investment in data management, domain knowledge, and probabilistic reasoning. »
« Domain knowledge is essential to validating the assumptions. Unlike other forms of inference (e.g. a basic linear regression), many of the assumptions we discussed for the methods above cannot be computationally or visually assessed (e.g. not like the quintessential residual or QQ plots). Instead, assumptions rely largely on careful attention to detail combined with intuition and background knowledge from one’s domain. This means causal inference should necessarily be a human-in-the-loop activity. »
Abbreviations in the article and graphics: CI = causal inference; C = control group; T = treatment/test group.
See also Resource Round-Up: Causal Inference.