We’ve definitely come a long way. I was somewhat surprised to see that I had actually participated in the very first Advent of Code in 2015! And I missed 2018 somehow, so that makes it a literal decade of Decembers coding away late into the night.

This short year of problems did not disappoint. We saw classics like the graph algorithms from Day 11, range manipulation in Day 5, and a ton of dynamic programming. We also saw new things, like the trick of Day 12 and the Disjoint Set Unions of Day 8. Solving the puzzles is as fun as always, and it’s awesome to be sitting on a decade of experience: so many of these come out more or less effortlessly, while there are still a couple that teach me something new.
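
For anyone who hasn’t run into it, here’s a minimal sketch of the Disjoint Set Union (union-find) structure of the kind Day 8 calls for. This is just the generic data structure, not my actual Day 8 solution, and the names are only for illustration.

```python
# Minimal Disjoint Set Union (union-find) with path compression and union by size.
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path halving: point nodes along the way closer to the root.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same component
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


# Example: three unions over six elements leave three components.
dsu = DSU(6)
for a, b in [(0, 1), (1, 2), (4, 5)]:
    dsu.union(a, b)
print(len({dsu.find(i) for i in range(6)}))  # 3
```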

This year was different in that I explored GenAI tools to solve the problems. It’s hard to justify now, but I went in thinking I would be vindicated in my skepticism. That was dumb. The coding agents more or less one-shotted every single problem. It was far from pretty and there were some very dumb mistakes, but it definitely worked.

Parts of the GenAI coding experience did take the joy out of it: its solutions were soulless and textbooky. But with the right interface it actually got better, and the results changed my prejudices about vibe coding. It’s still ugly and dangerous, but it’s much better operationally than I would have expected, and in the hands of someone who knows what they’re doing and can shape the solution appropriately, I can totally see how productivity could be enhanced.

So in that sense this experiment did what it set out to do: I do feel I have to pay more attention to the froth around AI and coding, and I’m no longer comfortable sitting back and holding my nose.

I have some long-term concerns about the cognitive effects. Right now I have the expertise to shape it towards a good solution, but after reading so much ugly code, is it possible my own standards will start to degrade? Would I even know most of this stuff if I hadn’t put in 200+ nights of AoC solving? I loved their FAQ: “Should I use AI to solve Advent of Code puzzles? No. If you send a friend to the gym on your behalf, would you expect to get stronger?” So for my own edification I don’t think I’ll use AI again for Advent of Code. But for other projects that are languishing not so much from their innate difficulty as from my lack of time: let’s see how it does!

Notes on GenAI Code

  • It is pretty ugly: take a look through the previous days and you’ll see the difference between artisanal and mass-produced code. It rarely uses more complicated language features. That said, neither the compiler nor my employer really cares about pretty code; it’s mostly about results. Pretty code does have some ancillary benefits, though: it’s much easier to read and reason about. And with the GenAI solutions putting me outside of the driver’s seat, reviewability becomes even more important!
  • It really doesn’t like to use libraries. Everything is a reimplementation. And it’s not that it can’t use libraries; when prompted it works pretty well. But I suspect a lot of the RL in training implicitly discourages using libraries, since it’s probably solving a bunch of leetcode-style problems. That said, reimplementing everything is a poor match for solving these sorts of problems in a practical context.
  • It does really dumb stuff. My main take before this experiment was that you’re highly likely to have bugs in GenAI code for anything complicated. After 23 solutions I’d like to revise that to: you’re somewhat likely to have bugs. But that rate is still high; you definitely need to be there, carefully reviewing and guiding the process. I am still befuddled by folks who claim to “fully vibe code” and never touch the code at all. Doing this in any sort of professional context seems incredibly risky unless bugs just don’t matter. But this may be context-specific: maybe building web apps is OK to just yolo vibe code?
  • The model quality definitely matters: the Gemini 3 Pro code (Day 9 onwards) was much nicer than the Gemini 2.5 Pro code. Certainly the code commenting got a ton better, with far fewer redundant comments telling me what the code already tells me. Speed was quite different too, though I suspect that’s confounded with more TPUs being devoted to Gemini 3.
  • One benefit is that the AI doesn’t get lazy, whereas I do. So if I want to investigate some other way of implementing a solution, it’s more of a free action. Or if I know I should add certain test cases, using the AI to start the process lowers my activation energy.
  • A corollary of this is that you can also task it with deep research on a whim. It’s nice to run your problem through it on the off chance there’s something smarter you can do. Usually a quick skim will reveal it’s not that helpful, but I did this for Day 9 Part 1 and it came back with a solution that only looks at points on the convex hull (see the sketch after this list). So easy win!
  • The UI makes a big difference: Gemini CLI is a terrible experience for this workflow; Antigravity is amazing. The differentiating factor is keeping you in context: while the agent is chugging away at something, I can be going through the code, editing and commenting on things. Having it all in one place and letting prompts pile up solves the human-idling problem of waiting for the agent to finish.
  • Somewhat in tension with the last point is the idea of multitasking. I actually solved Day 10 and Day 11 in parallel since I was delayed by switching to Antigravity. That was nice! I popped in and out of the two processes, orchestrating like a champ. Could I have done this without having already solved the problems? Certainly not! But in terms of just imposing my vision upon the code it was glorious. I can certainly see how folks get addicted to this when they have to churn through a bunch of more-or-less easy tasks in a codebase. How many of those tasks do I have? Not many, but I’d definitely reach for this when that applies.
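
On the convex hull trick from Day 9 Part 1: here’s a minimal sketch of the hull computation itself (Andrew’s monotone chain). It isn’t the agent’s actual solution, just the generic step of reducing the input to the points on the hull before scoring candidates, and the function names are mine for illustration.

```python
# Andrew's monotone chain: convex hull of a set of 2D points in O(n log n).
def cross(o, a, b):
    # Positive if the turn o -> a -> b is counter-clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop points that would make a clockwise (or collinear) turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    # Drop the last point of each half; it repeats the start of the other half.
    return lower[:-1] + upper[:-1]


# Example: only the four corners of the square survive; the interior point is pruned.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```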