
I’ve recently been introduced to the idea of a tier list for causal analysis. As the Baader-Meinhof phenomenon would predict, the perfect example promptly pops up in Cal Newport’s newsletter and podcast.

Newport is responding to an op-ed by Stephen Kurczy in the Washington Post that discusses a unique situation: in the town of Green Bank, West Virginia, there’s an observatory and a school. For many years the school was not allowed wifi because it would interfere with the observatory. So the question becomes: did the inability to use wifi have a negative impact on the students?

Starting at F tier is simple correlation: it turns out that Green Bank scores the lowest on standardized tests in its county. So a difference in test scores exists, wifi access is another difference between the schools, thus they’re probably related? But there’s a reason the statistician’s mantra is “correlation is not causation”: there are a lot of other things that could also explain that gap. In particular there seems to be a pretty easy explanation: Newport reports that the other school in the county houses the Gifted and Talented program. With that context it’s not at all surprising that Green Bank would score lower regardless of the wifi.

Newport then starts edging up into D tier territory by pulling out time series data for a before-and-after analysis. The idea is that until the last couple of years there wasn’t really an effect of not having wifi: no school used that much wireless technology in the classroom. So we’ll look at the gaps: due to the G&T program and other potential differences we’ll see a gap between Green Bank and the other schools in the county. Crucially, if there is an effect due to wifi we should see that gap widen when the other schools start using it (and Green Bank does not). And indeed that’s what Newport finds: while all the schools see drops in performance from 2017 to 2022 [Newport’s identification of the start of wifi impact], Green Bank sees a smaller drop. The gap actually got smaller, suggesting a negative impact from wifi. Unfortunately this is not much better than the simple correlation: a lot of other things happened during that time that could also explain the change in the gap. Newport suggests that deteriorating economic conditions likely led to the drop in performance in all schools. Green Bank, being situated next to the federal observatory, could have been more robust to those effects, and that instead could explain our observation.
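
To make the gap comparison concrete, here’s a minimal sketch of the D tier calculation in Python. All of the numbers are made up for illustration; the real figures are in Newport’s post.

```python
# Sketch of the before/after gap comparison (illustrative numbers only).
import pandas as pd

scores = pd.DataFrame(
    {
        "school": ["Green Bank", "Green Bank", "Other county schools", "Other county schools"],
        "year": [2017, 2022, 2017, 2022],
        "score": [38.0, 35.0, 50.0, 42.0],  # hypothetical percent proficient
    }
)

wide = scores.pivot(index="school", columns="year", values="score")
gap_2017 = wide.loc["Other county schools", 2017] - wide.loc["Green Bank", 2017]
gap_2022 = wide.loc["Other county schools", 2022] - wide.loc["Green Bank", 2022]

print(f"Gap in 2017: {gap_2017:.1f} points; gap in 2022: {gap_2022:.1f} points")
# A shrinking gap after the other schools adopt wifi (while Green Bank does not)
# is the pattern Newport reads as a negative effect of wifi.
```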

So what would a C/B/A tier analysis look like?

We need to start controlling for all of these potential confounders. As a start we should control for economic factors and standard educational variables such as classroom size, G&T status, etc. We’d then create a synthetic control by matching other schools against Green Bank. However, it’s quite unlikely we’d be able to observe and collect all of these variables, so this is probably closer to C tier. And we inevitably run into the problem that our treatment group is just one school: there’s so much that could be idiosyncratic that it’s not clear that, even with omniscience, we’d be able to match perfectly.
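
Here’s a rough sketch of what that matching step might look like: pick non-negative weights over the other schools (the donor pool) that sum to one, so the weighted combination tracks Green Bank’s pre-treatment covariates. The covariates and values below are invented placeholders.

```python
# Sketch of a synthetic-control-style match (all covariates and values invented).
import numpy as np
from scipy.optimize import minimize

# Rows: candidate donor schools; columns: standardized pre-treatment covariates
# (e.g. an economic index, class size, G&T share).
donors = np.array([
    [0.2, 0.8, 0.10],
    [0.5, 0.4, 0.00],
    [0.9, 0.3, 0.20],
    [0.4, 0.6, 0.10],
])
green_bank = np.array([0.3, 0.7, 0.05])

def loss(w):
    # Squared distance between Green Bank and the weighted donor combination.
    return np.sum((green_bank - donors.T @ w) ** 2)

n = donors.shape[0]
fit = minimize(
    loss,
    x0=np.full(n, 1 / n),
    bounds=[(0, 1)] * n,                                       # non-negative weights
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # weights sum to 1
)
weights = fit.x
print(weights.round(3))
# The synthetic Green Bank's test scores are the same weighted average of the
# donors' scores; the effect estimate is the gap between it and the real
# Green Bank once wifi arrives everywhere else.
```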

A better design that edges into B tier would be something like a decently controlled difference-in-differences. Instead of comparing Green Bank against other schools, we can compare students with themselves. Say the general practice is that fourth graders don’t use wifi in most schools, but starting in fifth grade they do. Our effect1 is then the difference between schools of the differences between the students’ fifth-grade test scores and the same students’ fourth-grade scores.
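
In code the estimate is just a double subtraction; the cohort averages below are made up purely to show the arithmetic.

```python
# Sketch of the difference-in-differences above (illustrative numbers only).
# "Pre" is fourth grade (no wifi anywhere); "post" is fifth grade
# (wifi everywhere except Green Bank).
pre_wifi_schools, post_wifi_schools = 52.0, 49.0   # same students, two years
pre_green_bank, post_green_bank = 40.0, 39.0

change_wifi_schools = post_wifi_schools - pre_wifi_schools   # -3.0
change_green_bank = post_green_bank - pre_green_bank         # -1.0

# How much more the wifi schools moved than Green Bank did, net of anything
# (a new test, a recession) that hit both groups of students equally.
did_estimate = change_wifi_schools - change_green_bank       # -2.0
print(did_estimate)
```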

Of course A tier is a proper randomized controlled experiment. By randomizing which schools are and are not allowed wifi you get, in expectation, balance on all confounders, both observed and unobserved. You can also bump up the sample size to get a better estimate of the effect. This is, of course, not perfect. In particular I would worry about the treatment being stable: I could imagine parents moving their kids from no-wifi schools to wifi schools (or possibly the reverse); those parents are likely highly involved in their students’ academics, which probably leads to higher test scores and thus is a new source of confounding2. To get to S tier you could double-blind it, but that seems impossible with this sort of treatment.
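
For completeness, a simulated version of that A tier design: randomly assign schools to wifi or no wifi, then compare means. Everything here (the number of schools, the baked-in 2 point effect) is an assumption for illustration.

```python
# Simulated randomized experiment: assign schools to wifi at random, compare means.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_schools = 60
wifi = rng.permutation(np.repeat([True, False], n_schools // 2))  # random assignment

# Simulated scores: each school's baseline plus a hypothetical -2 point wifi effect.
baseline = rng.normal(50, 8, n_schools)
scores = baseline + np.where(wifi, -2.0, 0.0) + rng.normal(0, 3, n_schools)

effect = scores[wifi].mean() - scores[~wifi].mean()
t_stat, p_value = ttest_ind(scores[wifi], scores[~wifi])
print(f"Estimated effect of wifi: {effect:.2f} points (p = {p_value:.3f})")
# Randomization balances confounders in expectation, so the simple difference in
# means is an unbiased estimate; this assumes no one switches schools mid-study.
```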

So what should we conclude from this? Is there a detriment from lacking wifi, or is it maybe even an advantage? Does the D tier analysis win out over the F tier one? Or is it too fatally flawed itself? This is where the art comes in, and more importantly common sense.

We’ll likely never have a large volume of A/S tier randomized experiments, so we don’t want to be too cautious and reject anything less pristine3. But neither do we want to put confidence in conclusions that can flip-flop every time we control for another factor4. In this particular case, though, I don’t think we have enough evidence for anything more than suspicions. My general rule of thumb: if I can tell a reasonable story that provides an alternative explanation, your causal analysis doesn’t work.


  1. Technically this is just the marginal treatment effect of 5th grade with wifi; if the real effects only show up in 7th grade we’d miss it.

  2. Note that this is already implicitly happening in the observational data above and represents an unobserved confounder.

  3. Fisher, for instance, was a notorious apologist for the tobacco industry, denigrating all observational data.

  4. I thought it was satire when the first chapter of the Causal ML Book went through half a dozen somewhat contradictory estimates of the same causal effect and kept on going instead of despairing.