
Box argues that statistical theory advances largely in response to applications. From Pascal and Fermat on the “problem of points” (https://en.wikipedia.org/wiki/Problem_of_points), to Fisher at Rothamsted, to the many problems posed by WWII, practice has led theory. These practical problems force changes to the assumptions of existing methodologies and provoke new methodological development. As he writes:

Frequently it is the establishment of a new frame of reference for a problem. This may involve extension, modification or even abandonment of a previous formulation. It has to be understood that statistical problems are frequently not like, for example, chess problems which may require “White to mate in three moves”, given a particular configuration of the pieces. Here a solution based on the pretence that a knight can move like a queen would be unacceptable. Yet the changes in the rules that have sometimes been adopted in reformulation of statistical problems must, at the time of their introduction, have been thought of as little short of cheating – George Box

You can see this more clearly by expanding on his later observation that “for example the existence of fast computers is encouraging the development of new statistical methods which would have been quite impossible without them, and which presage further theoretical development”. With the benefit of hindsight, this effectively becomes the story laid out in Efron and Hastie’s Computer Age Statistical Inference. Bayesian statistics flourished with the MCMC revolution, and deep learning has become a household term (at least in Bay Area households).

The fields of statistics and machine learning have diverged somewhat, and the connection between statistical theory and machine learning practice has become unclear. Many successful models operate in regimes where classical statistical theory suggests they should not work, as phenomena like double descent show. If Box is right, then this represents an opportunity for the field!