Experiments are expensive and take a long time. Wouldn’t it be nice if we could just predict how they would turn out? This very simple motivating example for Kallus’ talk convinced me that I should start paying more attention to the causal inference literature.

Not that the concept is new to me. In my adtech days we did this fairly often: predicting long-term user behavior from short-term metrics (paper). One aspect I didn't appreciate at the time is that this measurement is biased. If your short-term estimates are noisy, you're effectively adding spherical noise to your points (and it gets worse if the noise in your short-term and long-term measurements is correlated), which biases your estimates downwards. But that's just simple measurement error.
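
To make the attenuation concrete, here's a minimal simulation (my own illustration, not from the talk or the paper): regress a long-term outcome on a noisily measured short-term metric and the estimated slope shrinks toward zero by the classic reliability ratio.

```python
# Attenuation bias from measurement error (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_slope = 2.0

s_true = rng.normal(size=n)                    # true short-term signal
y = true_slope * s_true + rng.normal(size=n)   # long-term outcome
s_noisy = s_true + rng.normal(size=n)          # what we actually measure

# OLS slope of y on the noisy regressor is shrunk by
# var(s_true) / (var(s_true) + var(noise)) = 0.5 here.
slope_hat = np.cov(s_noisy, y)[0, 1] / np.var(s_noisy)
print(slope_hat)  # ~1.0 instead of the true 2.0
```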

Kallus points to the standard correction for this: jackknife instrumental variables. Each individual A/B test is a randomly assigned exogenous shock to the system, and that exogenous variation is exactly what you need to correct the bias (with a linear model). He then spent the remaining part of the talk motivating the non-parametric version.
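
Here's a toy version of the idea (a plain IV sketch of my own, not the jackknife estimator from the talk): the randomized assignments serve as an instrument for the noisy short-term metric, recovering the slope that plain OLS attenuates.

```python
# Toy IV correction using randomized A/B assignment as the instrument
# (my simplification, not Kallus's jackknife IV construction).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_slope = 2.0

z = rng.integers(0, 2, size=n).astype(float)   # randomized treatment assignment
s_true = 0.5 * z + rng.normal(size=n)          # true short-term response
y = true_slope * s_true + rng.normal(size=n)   # long-term outcome
s_noisy = s_true + rng.normal(size=n)          # measured short-term metric

# Simple IV (Wald) estimator: cov(z, y) / cov(z, s_noisy).
# The measurement noise is independent of z, so it drops out.
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, s_noisy)[0, 1]
ols_slope = np.cov(s_noisy, y)[0, 1] / np.var(s_noisy)
print(ols_slope, iv_slope)  # ~1.0 (attenuated) vs ~2.0 (corrected)
```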

This is quite a bit trickier. In the linear case there is always a best linear approximation, but with a fully non-parametric function you end up with an ill-posed problem: finding a function that minimizes your prediction error doesn't imply that you're anywhere close to the true function. Through some math that I still need to learn and understand better, Kallus shows that if you restrict yourself to the thing you really need, the point estimate rather than the function itself, you can still learn it.
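
My rough sketch of the setup, in standard non-parametric IV notation (my notation and framing, not necessarily the talk's): the unknown function is only identified through a conditional moment restriction, which is the ill-posed part, while the quantity you actually report is a single scalar functional of it.

```latex
% My notation: S = short-term surrogate, Y = long-term outcome,
% Z = randomized assignment, g = unknown structural function.
% g is pinned down only through a conditional moment restriction,
% an ill-posed inverse problem:
\[
  \mathbb{E}\left[\, Y - g(S) \mid Z \,\right] = 0 .
\]
% The reported quantity is a scalar functional of g, e.g. the average
% contrast of g at the surrogate outcomes under treatment vs. control,
% which can remain estimable even when g itself is poorly recovered:
\[
  \theta = \mathbb{E}\left[\, g\!\left(S^{(1)}\right) - g\!\left(S^{(0)}\right) \right].
\]
```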

Quite a fascinating talk with practical applications. I've always been slightly skeptical of causal inference. However, it's clear I need to read Kallus's book: Applied Causal Inference Powered by ML and AI.

Upon downloading the book I quickly realized I had attempted it in the past. The first chapter had scared me away: it develops an example relating prices and sales ranks for toy cars on Amazon, and works through various estimates of the causal effect, each time arriving at a new 95% confidence interval that only barely overlaps the previous one. I can't help but suspect we're not at the end of the line, and that yet another method will inevitably produce yet another almost completely different estimate. This lack of convergence bothers me, but I'll keep reading and hopefully they'll calm my nerves.