Paper

The methodology reminds me of some of the work I’ve been doing with PPI using this emerging source of data: ad hoc classifiers enabled by LLMs to process unstructured text and multimedia data sets. In the ChatGPT paper they used four different classifiers: work/not work, topic, asking/doing, and work activities. Crucially, they never looked at the users’ conversations themselves, only at the output of the classifiers, which addresses some of the serious privacy concerns one would otherwise have in this sort of situation. But it also leaves them unable to do something like PPI with a random sample annotated by humans. The next best thing, which they did, is to use open datasets of prompts to validate that the classifiers are roughly doing the right thing. This is rather fragile, since public chats may differ substantially from real chats (especially as time passes), but it’s certainly the best one can do while respecting privacy. It’ll be interesting if we end up back at survey sampling just to obtain consent for the data needed to run these classifier-correction techniques.

The empirical data were interesting: folks use ChatGPT for a lot less writing than I expected. But the real gem of the talk was when Hitzig introduced an economic model of performing tasks with access to LLMs. She models three ways to perform a task: do it yourself, delegate it to the AI, or collaborate with the AI. Performance is modeled as trying to hit a context-dependent target \(a(\omega)\), with \(\omega\) drawn from a prior distribution \(F\), under noisy implementation and quadratic loss.

  • DIY has cost \(c\), but approaches the right contextual target \(a(\omega)\) with variance \(\sigma_{DIY}^2\).
  • AI is costless, but it’s biased toward the mean of the broader prior distribution over contexts, \(\bar{a} = \arg\min_{a} E_{\omega}[L(a, a(\omega))]\), which under quadratic loss is just \(E_{\omega}[a(\omega)]\). And of course it has its own variance \(\sigma_{AI}^2\).
  • The assistant option incurs the cost \(c\) plus an additional collaboration cost \(\eta x\), where the intensity \(x\) controls how strongly the collaboration reduces the combined variance.

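The three options can be turned into a tiny numerical sketch. Since the paper isn’t published, the functional forms below (how the AI’s bias term averages out, how the intensity \(x\) shrinks the assistant’s error) are my own assumptions, not the talk’s exact specification:

```python
# Assumed quadratic-loss accounting for the three options, with the task
# cost folded into each objective. All functional forms are guesses.
var_targets = 4.0   # Var_omega[a(omega)]: how much the right answer varies by context
c = 0.5             # fixed cost of doing the task yourself
var_diy = 1.0       # your execution noise
var_ai = 0.3        # the AI's execution noise
eta, x = 0.2, 1.0   # collaboration cost rate and chosen intensity

loss_diy = c + var_diy
# The AI aims at the prior mean a_bar, so its squared bias averages to Var[a(omega)]:
loss_ai = var_targets + var_ai
# Assumed: collaboration shrinks the AI's bias at rate 1/(1 + x) and keeps
# the better of the two execution variances, at extra cost eta * x:
loss_assist = c + eta * x + var_targets / (1 + x) + min(var_diy, var_ai)

best = min([("DIY", loss_diy), ("AI", loss_ai), ("assistant", loss_assist)],
           key=lambda t: t[1])
print(best)  # here DIY wins: the task is too context-heavy to delegate
```

With these (made-up) numbers the context variance dominates, so DIY wins; shrinking `var_targets` flips the decision toward delegation.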
Unfortunately, I can’t find this work published anywhere, but the key finding was that you should essentially stop doing tasks yourself once the model passes a certain performance threshold (which actually happens before the AI is better than you, since you get some benefit from not having to pay \(c\)). Then the choice between delegating to the AI entirely and using it as an assistant becomes a function of both how good the AI is and how far the average answer \(\bar{a}\) is from the right contextual answer \(a(\omega)\). The more you need context, the more often you’ll use the AI as an assistant rather than delegating completely.
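That threshold claim is easy to check numerically, assuming expected losses of \(c + \sigma_{DIY}^2\) for DIY and \(\mathrm{Var}[a(\omega)] + \sigma_{AI}^2\) for full delegation (my reading of the model, with hypothetical numbers):

```python
# Hypothetical numbers: the AI is strictly noisier than you at execution,
# yet delegating still wins because you save the fixed cost c.
c = 0.5
var_diy = 1.0
var_ai = 1.3        # worse than your var_diy = 1.0
var_targets = 0.1   # tasks need little context, so the AI's bias barely hurts

loss_diy = c + var_diy           # 1.5
loss_ai = var_targets + var_ai   # 1.4: delegate even though the AI is noisier
print(loss_ai < loss_diy)        # True
```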

This matches my intuition on tasks, although I suspect we’re still well within the realm where DIY makes sense. Though I could see a time when it would be negligent not to at least have the AI assist you. One element missing from the model is the impact of learning: one could add a term whereby repeated exposure to the task decreases your variance. It then becomes a tradeoff between higher performance now (using the AI) and higher performance later (given your improved skill). And of course, it depends on whether the task requires enough contextual knowledge that your improved skill actually becomes useful.
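That learning tradeoff could be sketched by letting your variance decay with repetitions. The exponential skill curve and all numbers below are my own assumptions, not part of the original model:

```python
import math

# Assumed extension: your execution variance decays toward a floor as you
# repeat the task, while the AI's expected loss stays flat. Compare the
# first task against the cumulative total over T repetitions.
T = 20
c = 0.5
var_ai, var_targets = 0.3, 2.0           # AI's flat loss: var_targets + var_ai
var_floor, var_start, rate = 0.2, 2.0, 0.3

def diy_loss(k):
    # Expected loss on the k-th repetition, skill improving with practice.
    return c + var_floor + (var_start - var_floor) * math.exp(-rate * k)

ai_loss = var_targets + var_ai

# On the first task the AI wins, but practice flips the cumulative total:
print(diy_loss(0) > ai_loss)                             # True: AI better now
print(sum(diy_loss(k) for k in range(T)) < T * ai_loss)  # True: DIY better over time
```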