Advent of Code is one of my favorite times of year. It’s a daily programming challenge with a delightful frame story and excellent puzzle design. You typically get something relatively straightforward as the first part of the day’s problem: parse some file, do a little computation, and pretty soon you’re done. But then there’s the twist: for the second part of the problem, only revealed after you have done the first, it turns out things are more complicated. Suddenly that small file turns out to be cleverly compressed and now your inputs are 100x larger and your puny O(n^2) algorithm just isn’t going to cut it anymore. And the fun begins…
Of course there are different sorts of fun. The leaderboard tracks how fast people can solve the problems: in grad school I loved competing[^1] on speed (and on the ability to stay up past midnight). Lots of people use AoC as an excuse to try out new languages: there are plenty of “25 days of AoC in 25 languages” post series out there.
There’s been a lot of controversy regarding the use of AI, because it turns out AI is surprisingly good at Advent of Code problems. My own shock came from watching my cousin-in-law-in-law use Cursor to (mostly) one-shot the last couple days of problems while admitting to (mostly) not knowing what was going on. The one saving grace was a fundamental asymmetry: when the problem was easy he was done quite quickly, but the harder problems took him hours, if he finished them at all. So I could carry on with my hand-coding ways.
Indeed, the site’s stance is that using AI to solve the puzzles misses the point:
> **Should I use AI to solve Advent of Code puzzles?** No. If you send a friend to the gym on your behalf, would you expect to get stronger? Advent of Code puzzles are designed to be interesting for humans to solve - no consideration is made for whether AI can or cannot solve a puzzle. If you want practice prompting an AI, there are almost certainly better exercises elsewhere designed with that in mind.[^2]
The year since has been an interesting ride. This is confounded with my move up to SF from Sunnyvale, but it seems like everywhere I go I’m bombarded with advertisements for AI IDEs or “Agentic $Anything”. At work there are AI “adoption” initiatives and perpetual doomsaying about the full-scale replacement of knowledge workers.
I’ve admittedly been a bit of a skeptic. It’s not that I don’t use these tools: they make a nice replacement for Stack Overflow. But I am quite cognizant of Gell-Mann amnesia: when I ask the AI to do work within my area of expertise I can easily spot an alarming number of deficiencies (and correct them), so for things outside my expertise I try to maintain the same skepticism even as my ability to do the fact-checking dwindles. As such I haven’t found many professional use cases beyond moderately speeding up the writing of straightforward simulations [which even then were wrong and required significant intervention on my end].
But blindly carrying on with this skepticism is a recipe for becoming irrelevant. Given the pace of progress in AI, a dismissal formed early on can quickly go stale. So every now and again it behooves one to dip a toe back in, if only to confirm that the turmoil is simply froth and not a turning of the tides.
To that end I’ll be using this year’s AoC to check out Gemini CLI. Using an AI IDE is a bridge too far at this point: I like that Gemini CLI runs on the command line and, most importantly, that I can keep my beloved emacs. This also mirrors the potential application area: solving these sorts of small file-manipulation / mini-algorithmic problems would complement my existing work without encroaching on replacement[^3].
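For anyone following along, the setup is about as minimal as it gets. A sketch, assuming the npm-based install from the project’s README (check there for current instructions and authentication details; the directory layout here is just my own):

```sh
# Install Gemini CLI globally (requires a recent Node.js).
npm install -g @google/gemini-cli

# Launch the interactive session from a day's working directory,
# where it can read the puzzle input and scaffold a solution.
cd aoc/day01
gemini
```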
I’m not quite imagining this as a John Henry-esque battle between man and machine: I’m just not that heroic (and have no intention of working myself to death just to prove a point). Instead it’s an experiment in how well AI works on these problems, how to work with AI tools, and, most importantly, whether it’s still any fun and whether I learn anything. Generally speaking I’m aiming to solve each problem on my own and then either try to one-shot a solution with the AI or co-develop one.
Of particular interest will be my confidence in the results. My usual day job puts me in places without the usual guardrails of testing: a unit test isn’t going to tell me that a SQL query is missing a filter or that the specified model doesn’t make sense. Maintaining the ability to exert expert judgment while coping with the volume of AI assistance will be key[^4]. Advent of Code is perfect for this: it’s not so straightforward that I can simply spot an error, but also not so complicated that I can’t verify the correct solution and engineer test cases to poke at presumed solutions.
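To make “poking at presumed solutions” concrete, the workhorse is a differential test: pit the candidate against a slow-but-obviously-correct reference on piles of small random inputs. A minimal sketch, where `fast_solve` and `brute_force` are hypothetical stand-ins for an AI-produced solution and my own trusted reference:

```python
import random
from collections import Counter


def brute_force(xs: list[int]) -> int:
    """O(n^2) reference: count index pairs i < j with xs[i] + xs[j] == 0."""
    return sum(
        1
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[i] + xs[j] == 0
    )


def fast_solve(xs: list[int]) -> int:
    """O(n) candidate: imagine this arrived from the AI."""
    counts = Counter(xs)
    pairs = sum(counts[x] * counts[-x] for x in counts if x > 0)
    return pairs + counts[0] * (counts[0] - 1) // 2


if __name__ == "__main__":
    for _ in range(1000):
        xs = [random.randint(-5, 5) for _ in range(random.randint(0, 20))]
        assert fast_solve(xs) == brute_force(xs), xs
    print("1000 random cases, no disagreements")
```

Keeping the inputs tiny makes the brute force cheap and any disagreement easy to minimize and inspect by hand.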
[^1]: 23rd on the global leaderboard in 2016!

[^2]: One might ask why I’m using AoC for this exercise: the simple answer is lack of time. I’m going to be doing AoC regardless, so I might as well bundle this in. I have thought that Project Euler might be a great test bed for figuring out how well AI works on math, but I certainly don’t have time to relapse on PE.

[^3]: “Making the easy things easier and the hard things harder” seems great from my perspective: more time for me to spend on the interesting hard things.

[^4]: Lest I find myself as an accountability sink.