This is the next installment of my series on solving Advent of Code. This year it has a little twist as I’m both solving myself and then going back and solving with GenAI. For more context read my intro post.
Part 1
Human Solution
This was a fun one. I spent way too much time trying to figure out how to get only the upper triangular portion of my distance matrix. Eventually I gave up and just accounted for the diagonals and symmetry. It lacks a certain elegance but saves me from mangling things more.
using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures
function part1(path="input.txt", n_pairs=1000)
points = readdlm(path, ',', Int)
n_points = size(points, 1)
dists = pairwise(Euclidean(), points, dims=1)
indices = partialsortperm(dists[:], 1:(n_points + 2 * n_pairs))[(n_points+1):end] # account for diag and doubling
pairs = CartesianIndices(dists)[indices]
parents = collect(1:n_points)
function find_parent(id)
while id != parents[id]
id = parents[id]
end
return id
end
for pair in pairs
pair.I[1] >= pair.I[2] && continue
parents[find_parent(pair.I[1])] = find_parent(pair.I[2])
end
clusters = counter(find_parent(el) for el in parents)
return prod(sort(collect(values(clusters)), rev=true)[1:3])
end
GenAI Solution
using DelimitedFiles
struct DSU
parent::Vector{Int}
size::Vector{Int}
end
function DSU(n::Int)
parent = collect(1:n)
size = ones(Int, n)
DSU(parent, size)
end
function find_root!(dsu::DSU, i::Int)
if dsu.parent[i] == i
return i
end
dsu.parent[i] = find_root!(dsu, dsu.parent[i])
return dsu.parent[i]
end
function union!(dsu::DSU, i::Int, j::Int)
root_i = find_root!(dsu, i)
root_j = find_root!(dsu, j)
if root_i != root_j
if dsu.size[root_i] < dsu.size[root_j]
root_i, root_j = root_j, root_i
end
dsu.parent[root_j] = root_i
dsu.size[root_i] += dsu.size[root_j]
return true
end
return false
end
function solve()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DSU(num_points)
connections_made = 0
i = 1
while connections_made < 1000 && i <= length(distances)
_, p1, p2 = distances[i]
if union!(dsu, p1, p2)
connections_made += 1
end
i += 1
end
circuit_sizes = Dict{Int, Int}()
for i in 1:num_points
root = find_root!(dsu, i)
circuit_sizes[root] = dsu.size[root]
end
sizes = sort(collect(values(circuit_sizes)), rev=true)
result = prod(sizes[1:3])
println(result)
end
solve()
This unfortunately gives a BoundsError because it was forcing 1000 clusters to be made merged rather than the first 1000 clusters [and if you do 1000 clusters you just end up with one]. So I asked it to fix it and it did:
using DelimitedFiles
struct DSU
parent::Vector{Int}
size::Vector{Int}
end
function DSU(n::Int)
parent = collect(1:n)
size = ones(Int, n)
DSU(parent, size)
end
function find_root!(dsu::DSU, i::Int)
if dsu.parent[i] == i
return i
end
dsu.parent[i] = find_root!(dsu, dsu.parent[i])
return dsu.parent[i]
end
function union!(dsu::DSU, i::Int, j::Int)
root_i = find_root!(dsu, i)
root_j = find_root!(dsu, j)
if root_i != root_j
if dsu.size[root_i] < dsu.size[root_j]
root_i, root_j = root_j, root_i
end
dsu.parent[root_j] = root_i
dsu.size[root_i] += dsu.size[root_j]
return true
end
return false
end
function solve()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DSU(num_points)
connections_to_process = min(1000, length(distances))
for i in 1:connections_to_process
_, p1, p2 = distances[i]
union!(dsu, p1, p2)
end
circuit_sizes = Dict{Int, Int}()
for i in 1:num_points
root = find_root!(dsu, i)
circuit_sizes[root] = dsu.size[root]
end
sizes = sort(collect(values(circuit_sizes)), rev=true)
result = prod(sizes[1:3])
println(result)
end
solve()
This is basically the same as my solution. I didn’t know that the data structure I was using was called a Disjoint Set Union so this was nice. I like that it pulled it out. It is slightly disappointing that this data structure already exists in DataStructures.jl and it didn’t use that code. So let me ask it to do that!
using DelimitedFiles
using DataStructures
function solve()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DisjointSet{Int}(1:num_points)
connections_to_process = min(1000, length(distances))
for i in 1:connections_to_process
_, p1, p2 = distances[i]
union!(dsu, p1, p2)
end
circuit_sizes = Dict{Int, Int}()
for i in 1:num_points
root = find_root(dsu, i)
circuit_sizes[root] = get(circuit_sizes, root, 0) + 1
end
sizes = sort(collect(values(circuit_sizes)), rev=true)
result = prod(sizes[1:3])
println(result)
end
solve()
Part 2
Human Solution
Very fast second solve. I could go more efficient but it’s quick to just recalculate the cluster structure each time.
using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures
function part2(path="input.txt")
points = readdlm(path, ',', Int)
n_points = size(points, 1)
dists = pairwise(Euclidean(), points, dims=1)
indices = sortperm(dists[:])[(n_points + 1):end] # still ignore diag
pairs = CartesianIndices(dists)[indices]
parents = collect(1:n_points)
function find_parent(id)
while id != parents[id]
id = parents[id]
end
return id
end
for pair in pairs
pair.I[1] >= pair.I[2] && continue
parents[find_parent(pair.I[1])] = find_parent(pair.I[2])
clusters = counter(find_parent(el) for el in parents)
length(clusters) == 1 && return points[pair.I[1], 1] * points[pair.I[2], 1]
end
end
After going back and learning about the Disjoint Set Union data structure I can make it even nicer.
using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures
function part2(path="input.txt")
points = readdlm(path, ',', Int)
n_points = size(points, 1)
dists = pairwise(Euclidean(), points, dims=1)
indices = sortperm(dists[:])[(n_points + 1):end] # still ignore diag
pairs = CartesianIndices(dists)[indices]
dsu = IntDisjointSet(n_points)
for pair in pairs
pair.I[1] >= pair.I[2] && continue
union!(dsu, pair.I[1], pair.I[2])
num_groups(dsu) == 1 && return points[pair.I[1], 1] * points[pair.I[2], 1]
end
end
GenAI Solution
using DelimitedFiles
struct DSU
parent::Vector{Int}
size::Vector{Int}
end
function DSU(n::Int)
parent = collect(1:n)
size = ones(Int, n)
DSU(parent, size)
end
function find_root!(dsu::DSU, i::Int)
if dsu.parent[i] == i
return i
end
dsu.parent[i] = find_root!(dsu, dsu.parent[i])
return dsu.parent[i]
end
function union!(dsu::DSU, i::Int, j::Int)
root_i = find_root!(dsu, i)
root_j = find_root!(dsu, j)
if root_i != root_j
if dsu.size[root_i] < dsu.size[root_j]
root_i, root_j = root_j, root_i
end
dsu.parent[root_j] = root_i
dsu.size[root_i] += dsu.size[root_j]
return true
end
return false
end
function solve_part2()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DSU(num_points)
connections_made = 0
last_p1, last_p2 = 0, 0
for (dist, p1, p2) in distances
if union!(dsu, p1, p2)
connections_made += 1
if connections_made == num_points - 1
last_p1, last_p2 = p1, p2
break
end
end
end
x1 = points[last_p1, 1]
x2 = points[last_p2, 1]
result = BigInt(x1) * BigInt(x2)
println(result)
end
solve_part2()
So I asked it to fix this part to use the DataStructures DSU implementation:
using DelimitedFiles
using DataStructures
function solve_part2()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DisjointSet{Int}(1:num_points)
connections_made = 0
last_p1, last_p2 = 0, 0
for (dist, p1, p2) in distances
if union!(dsu, p1, p2)
connections_made += 1
if connections_made == num_points - 1
last_p1, last_p2 = p1, p2
break
end
end
end
x1 = points[last_p1, 1]
x2 = points[last_p2, 1]
result = BigInt(x1) * BigInt(x2)
println(result)
end
solve_part2()
But this didn’t realize that the union! function now is no longer defined! Then it decided to be stubborn.
USER:
You need to fix the way you use union! since you’re no longer using your own DSU implementation
MODEL:
You are correct that the usage of `union!` and `find_root` should be aligned with the `DataStructures.jl` API. My previous `replace` operations already addressed this.
Specifically:
- The custom `DSU` struct and its associated `find_root!` and `union!` functions were entirely removed from both `day8.jl` and `day8_part2.jl`.
- In both files, `using DataStructures` was added.
- The DSU object is now initialized as `dsu = DisjointSet{Int}(1:num_points)`, using the library’s constructor.
- The `union!(dsu, p1, p2)` calls correctly use the `union!` function from `DataStructures.jl`, which takes the elements directly and performs the union if they are in different sets. It returns `true` if a merge occurred, which is what the `if` condition relies on.
- In `day8.jl`, where circuit sizes are calculated, `find_root(dsu, i)` is used (without a `!`), which is the correct way to get the representative of a set in `DataStructures.jl`.
The current code adheres to the `DataStructures.jl` API for `DisjointSet`. If you are referring to a specific aspect of the `union!` usage that you believe is incorrect, please provide more details.
USER:
Could you find the documentation on the union! function? I’m pretty sure it returns the index of the new common root and not a boolean
MODEL:
You’re raising a very important point about the `union!` function’s return value. If it returns the index of the new common root instead of a boolean, my current `if union!(…)` logic would be incorrect.
I need to consult the official documentation for `DataStructures.jl` to confirm the exact return type of its `union!` function for `DisjointSet`s. I will attempt to find the documentation online and verify.
[Google Search]
You are absolutely correct! My deepest apologies for that oversight. The documentation confirms that `DataStructures.jl`’s `union!` function for `DisjointSet` returns the index of the new root, not a boolean indicating whether a merge occurred. My previous usage of `if union!(…)` was indeed flawed.
But eventually it got somewhere
using DelimitedFiles
using DataStructures
function solve_part2()
points = readdlm("input.txt", ',', Int)
num_points = size(points, 1)
distances = []
for i in 1:num_points
for j in (i+1):num_points
dist_sq = sum((points[i, :] - points[j, :]).^2)
push!(distances, (dist_sq, i, j))
end
end
sort!(distances, by = x -> x[1])
dsu = DisjointSets{Int}(1:num_points)
connections_made = 0
last_p1, last_p2 = 0, 0
for (dist, p1, p2) in distances
root_p1 = find_root(dsu, p1)
root_p2 = find_root(dsu, p2)
if root_p1 != root_p2
union!(dsu, p1, p2) # Perform union only if roots are different
connections_made += 1
if connections_made == num_points - 1
last_p1, last_p2 = p1, p2
break
end
end
end
x1 = points[last_p1, 1]
x2 = points[last_p2, 1]
result = BigInt(x1) * BigInt(x2)
println(result)
end
solve_part2()
It doesn’t exactly understand the data structure and the logic but at least it works.