Suppose you want to know what proportion of X posts are shilling for crypto. Nowadays you’d concoct an LLM prompt, ask it to classify tweets appropriately, and then report the proportion your model classifies as crypto shills.

You’d, however, have a nagging suspicion: what if the LLM is wrong? So you go back and check some examples: it’s generally correct, but it messes up a noticeable fraction of the time. Sadly, this suggests that your resulting estimate suffers from garbage-in, garbage-out.

However there is something still appealing about your estimate. Remember: “all models are wrong, but some are useful”. Perhaps with some cleverness you can recover a reasonable estimate of prevalence. Indeed, I’ll show that you can, and in a relatively unclever way (though the path there is clever).

Prediction Powered Inference

When thinking about imperfect classifiers, the latest game in town is Prediction Powered Inference (PPI). This is some interesting statistical machinery for debiasing estimators that use black-box algorithms as plug-in estimates for their targets.

The general idea is to learn a rectifier which adjusts for the bias of your estimator. The authors extend this to a wide class of estimators that can be framed as the solution of a convex optimization problem, but we can get the general gist through the example of estimating a mean. Conveniently, for our binary outcome the mean is the same as the prevalence.

We’ll introduce the following notation:

  • \(\hat{f}(x_{i})\) is our classifier’s binary output on example \(i\). This is our LLM classifier in our crypto-tweet framing.
  • \(f(x_{i})\) is the true label.
  • \(S_{N}\) is a large dataset where we only observe \(\hat{f}\). This is our full dataset of tweets.
  • \(S_{n}\) is a small dataset where we observe both \(\hat{f}\) and \(f\). Notably this is iid with \(S_{N}\). This is a simple random sample¹ of tweets where you have to sit down and do old-fashioned human labeling.
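To make this concrete, here’s a tiny simulation of the setup. All the numbers (prevalence, sample sizes, the classifier’s error rates) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n = 100_000, 500   # sizes of S_N and S_n
prevalence = 0.10     # true fraction of crypto-shill tweets (made up)

def simulate(size):
    """Draw true labels f and noisy LLM labels f_hat, iid across both samples."""
    f = rng.binomial(1, prevalence, size)
    # Hypothetical error profile: 80% sensitivity, 95% specificity.
    f_hat = rng.binomial(1, np.where(f == 1, 0.80, 0.05))
    return f, f_hat

f_N, f_hat_N = simulate(N)   # in reality we would only observe f_hat here
f_n, f_hat_n = simulate(n)   # the hand-labeled simple random sample

print(f_hat_N.mean())  # naive estimate: biased, roughly 0.125 rather than 0.10
```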

The original estimate we were using was

\[ \frac{1}{N} \sum_{x_{i} \in S_{N}} \hat{f}(x_{i}) \]

and we can consider the bias of this estimator as

\[ E[\sum_{x_{i} \in S_{N}} \frac{\hat{f}(x_{i})}{N} - \sum_{x_{i} \in S_{N}} \frac{f(x_{i})}{N}] \]

But we note that since \(S_{N}\) and \(S_{n}\) are iid, this term is equivalent to

\[ E[\sum_{x_{i} \in S_{n}} \frac{\hat{f}(x_{i})}{n} - \sum_{x_{i} \in S_{n}} \frac{f(x_{i})}{n}] \]

and since we have access to both \(f\) and \(\hat{f}\) in \(S_{n}\) we can empirically estimate this as

\[ \delta = \sum_{x_{i} \in S_{n}} \frac{\hat{f}(x_{i}) - f(x_{i})}{n} \]

We call this term \(\delta\) the rectifier; thus our final estimate is

\[ \frac{1}{N} \sum_{x_{i} \in S_{N}} \hat{f}(x_{i}) - \sum_{x_{i} \in S_{n}} \frac{\hat{f}(x_{i}) - f(x_{i})}{n} \]
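In code the whole rectified estimator is a couple of lines. A minimal sketch (`ppi_mean` is my own name, not from any PPI library):

```python
import numpy as np

def ppi_mean(f_hat_N, f_hat_n, f_n):
    """Naive mean over S_N, minus the rectifier delta estimated on S_n."""
    delta = np.mean(f_hat_n - f_n)   # empirical bias of the classifier
    return np.mean(f_hat_N) - delta

# With the toy data from the earlier sketch this lands near the true 0.10:
# ppi_mean(f_hat_N, f_hat_n, f_n)
```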

As an exercise you should work out the variance of this estimator: you’ll see that the better the model gets and the fewer mistakes it makes, the lower your estimated variance will be. Crucially, you need your classifier to be relatively decent (but not too decent, otherwise you might as well use \(\hat{f}\) itself).
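To spoil a little of that exercise: assuming \(S_{N}\) and \(S_{n}\) are independent samples (my assumption here, to keep the algebra clean), the two sums are uncorrelated and

\[ \mathrm{Var}\left[ \frac{1}{N} \sum_{x_{i} \in S_{N}} \hat{f}(x_{i}) - \delta \right] = \frac{\mathrm{Var}[\hat{f}]}{N} + \frac{\mathrm{Var}[\hat{f} - f]}{n} \]

The second term dominates when \(n \ll N\), and it shrinks as the mistakes \(\hat{f}(x_{i}) - f(x_{i})\) become rarer.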

This is very cool and already quite useful. Then there’s a follow-up paper, PPI++, which introduces a new estimator

\[ \lambda \bar{\hat{f}}_{N} + ( \bar{f}_{n} - \lambda \bar{\hat{f}}_{n} )\]

where the bars denote sample means over the indicated dataset, and \(\lambda\) is a tuning knob which controls how much PPI you do: at \(\lambda = 0\) it’s the classical estimate from the labeled sample alone, and at \(\lambda = 1\) it’s the PPI estimate. And they give some formulas on how to find the optimal \(\lambda\).
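To sketch where the optimal \(\lambda\) comes from (writing \(\hat{\theta}_{\lambda}\) for the estimator above, and again assuming independent samples): the variance is

\[ \mathrm{Var}[\hat{\theta}_{\lambda}] = \lambda^{2} \frac{\mathrm{Var}[\hat{f}]}{N} + \frac{\mathrm{Var}[f - \lambda \hat{f}]}{n} \]

which is minimized at

\[ \lambda^{*} = \frac{\mathrm{Cov}[f, \hat{f}]}{\mathrm{Var}[\hat{f}] \, (1 + n/N)} \approx \frac{\mathrm{Cov}[f, \hat{f}]}{\mathrm{Var}[\hat{f}]} \quad \text{when } n \ll N \]

That last ratio is exactly the OLS slope of \(f\) on \(\hat{f}\).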

This is the part that gets quite interesting: with this particular choice of \(\lambda\), the estimator ends up being the solution to a linear regression problem: \(f \sim \beta_{1} \hat{f} + \beta_{0}\).

Indeed, the earlier PPI estimate was also the solution to a linear regression problem, \(f \sim \hat{f} + \beta_{0}\), just with the coefficient fixed at \(\beta_{1} = 1\). And our original naive estimate is also sort of a regression estimate, \(f \sim \hat{f}\), which learns no parameters but fixes \(\beta_{1} = 1\) and \(\beta_{0} = 0\).
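You can check the equivalence numerically: fitting the regression on \(S_{n}\) and averaging its predictions over \(S_{N}\) reproduces the \(\lambda\)-estimator exactly, with \(\lambda\) equal to the fitted slope. A minimal sketch in plain numpy (the function names are mine):

```python
import numpy as np

def regression_estimate(f_hat_N, f_hat_n, f_n):
    """Fit f ~ beta1 * f_hat + beta0 on S_n, then average predictions over S_N."""
    beta1 = np.cov(f_n, f_hat_n)[0, 1] / np.var(f_hat_n, ddof=1)
    beta0 = f_n.mean() - beta1 * f_hat_n.mean()
    return beta0 + beta1 * f_hat_N.mean()

def lambda_estimate(f_hat_N, f_hat_n, f_n, lam):
    """The PPI++ form: lam * mean_N(f_hat) + (mean_n(f) - lam * mean_n(f_hat))."""
    return lam * f_hat_N.mean() + (f_n.mean() - lam * f_hat_n.mean())

# Plugging the fitted slope in as lam gives regression_estimate exactly;
# lam = 1 recovers the PPI estimate, lam = 0 the labeled-sample mean.
```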

As another practical matter, this interpretation helps us understand what to do when we introduce another classifier. Within the original PPI framework it’s hard to say: you could, I suppose, get a PPI estimate for each classifier and then maybe average them? But with the linear regression interpretation it’s clear you can just add the new classifier as a new covariate! So you can try multiple prompts or different base models for your classifier and throw them all in together!
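Here’s a sketch of that “throw them all in” version, again in plain numpy (the function name and array shapes are my own convention):

```python
import numpy as np

def multi_classifier_estimate(F_hat_N, F_hat_n, f_n):
    """Regress f on k classifier outputs (plus an intercept) using S_n, then
    average the fitted predictions over the big unlabeled sample S_N.

    F_hat_N: (N, k) classifier outputs on S_N
    F_hat_n: (n, k) the same classifiers' outputs on S_n
    f_n:     (n,)   human labels on S_n
    """
    X_n = np.column_stack([np.ones(len(F_hat_n)), F_hat_n])
    beta, *_ = np.linalg.lstsq(X_n, f_n, rcond=None)
    X_N = np.column_stack([np.ones(len(F_hat_N)), F_hat_N])
    return (X_N @ beta).mean()
```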

Why does this make sense?

It’s admittedly a bit convoluted how we got here, but the final solution is fairly elegant. The best way to think about it is as a form of post-stratification: you use \(S_{n}\) to estimate the proportion of positives within each label bucket, and then use \(S_{N}\) to precisely estimate the size of each bucket in the population. Indeed, this is what you might have tried at first if you hadn’t heard about PPI.
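Spelled out for our single binary classifier, the buckets are just \(\hat{f} = 0\) and \(\hat{f} = 1\). A sketch (since OLS on a binary covariate just fits the two bucket means, this reproduces the regression estimate above; it assumes both buckets appear in \(S_{n}\)):

```python
import numpy as np

def post_stratified_estimate(f_hat_N, f_hat_n, f_n):
    """Positive rate per bucket from S_n, bucket sizes from S_N."""
    estimate = 0.0
    for b in (0, 1):                                # buckets: f_hat = 0 and f_hat = 1
        rate_in_bucket = f_n[f_hat_n == b].mean()   # P(f = 1 | f_hat = b), from S_n
        bucket_share = (f_hat_N == b).mean()        # P(f_hat = b), from S_N
        estimate += bucket_share * rate_in_bucket
    return estimate
```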

Or you can interpret this as a form of calibration: suppose the additional classifiers you add are simply versions of your original classifier at different thresholds. You’d essentially be estimating the calibration curve (albeit without forcing it to be monotonic).

Finally, like many things in statistics, this was worked out decades ago by survey statisticians. This particular approach is called the generalized regression estimator (GREG).


  1. You can be more clever in your sampling just so long as you can reweight back to an SRS ↩︎