This is the next installment of my series on solving Advent of Code. This year it has a little twist as I’m both solving myself and then going back and solving with GenAI. For more context read my intro post.

Part 1

Human Solution

This was a fun one. I spent way too much time trying to figure out how to get only the upper triangular portion of my distance matrix. Eventually I gave up and just accounted for the diagonals and symmetry. It lacks a certain elegance but saves me from mangling things more.

using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures

function part1(path="input.txt", n_pairs=1000)
    points = readdlm(path, ',', Int)

    n_points = size(points, 1)

    dists = pairwise(Euclidean(), points, dims=1)
    indices = partialsortperm(dists[:], 1:(n_points + 2 * n_pairs))[(n_points+1):end] # account for diag and doubling
    pairs = CartesianIndices(dists)[indices]

    parents = collect(1:n_points)
    function find_parent(id)
        while id != parents[id]
            id = parents[id]
        end
        return id
    end

    for pair in pairs
        pair.I[1] >= pair.I[2] && continue
        parents[find_parent(pair.I[1])] = find_parent(pair.I[2])
    end

    clusters = counter(find_parent(el) for el in parents)
    return prod(sort(collect(values(clusters)), rev=true)[1:3])
end

GenAI Solution

using DelimitedFiles

struct DSU
    parent::Vector{Int}
    size::Vector{Int}
end

function DSU(n::Int)
    parent = collect(1:n)
    size = ones(Int, n)
    DSU(parent, size)
end

function find_root!(dsu::DSU, i::Int)
    if dsu.parent[i] == i
        return i
    end
    dsu.parent[i] = find_root!(dsu, dsu.parent[i])
    return dsu.parent[i]
end

function union!(dsu::DSU, i::Int, j::Int)
    root_i = find_root!(dsu, i)
    root_j = find_root!(dsu, j)
    if root_i != root_j
        if dsu.size[root_i] < dsu.size[root_j]
            root_i, root_j = root_j, root_i
        end
        dsu.parent[root_j] = root_i
        dsu.size[root_i] += dsu.size[root_j]
        return true
    end
    return false
end

function solve()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DSU(num_points)
    connections_made = 0
    i = 1
    while connections_made < 1000 && i <= length(distances)
        _, p1, p2 = distances[i]
        if union!(dsu, p1, p2)
            connections_made += 1
        end
        i += 1
    end

    circuit_sizes = Dict{Int, Int}()
    for i in 1:num_points
        root = find_root!(dsu, i)
        circuit_sizes[root] = dsu.size[root]
    end

    sizes = sort(collect(values(circuit_sizes)), rev=true)

    result = prod(sizes[1:3])
    println(result)
end

solve()

This unfortunately gives a BoundsError because it was forcing 1000 clusters to be made merged rather than the first 1000 clusters [and if you do 1000 clusters you just end up with one]. So I asked it to fix it and it did:

using DelimitedFiles

struct DSU
    parent::Vector{Int}
    size::Vector{Int}
end

function DSU(n::Int)
    parent = collect(1:n)
    size = ones(Int, n)
    DSU(parent, size)
end

function find_root!(dsu::DSU, i::Int)
    if dsu.parent[i] == i
        return i
    end
    dsu.parent[i] = find_root!(dsu, dsu.parent[i])
    return dsu.parent[i]
end

function union!(dsu::DSU, i::Int, j::Int)
    root_i = find_root!(dsu, i)
    root_j = find_root!(dsu, j)
    if root_i != root_j
        if dsu.size[root_i] < dsu.size[root_j]
            root_i, root_j = root_j, root_i
        end
        dsu.parent[root_j] = root_i
        dsu.size[root_i] += dsu.size[root_j]
        return true
    end
    return false
end

function solve()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DSU(num_points)
    connections_to_process = min(1000, length(distances))
    for i in 1:connections_to_process
        _, p1, p2 = distances[i]
        union!(dsu, p1, p2)
    end

    circuit_sizes = Dict{Int, Int}()
    for i in 1:num_points
        root = find_root!(dsu, i)
        circuit_sizes[root] = dsu.size[root]
    end

    sizes = sort(collect(values(circuit_sizes)), rev=true)

    result = prod(sizes[1:3])
    println(result)
end

solve()

This is basically the same as my solution. I didn’t know that the data structure I was using was called a Disjoint Set Union so this was nice. I like that it pulled it out. It is slightly disappointing that this data structure already exists in DataStructures.jl and it didn’t use that code. So let me ask it to do that!

using DelimitedFiles
using DataStructures

function solve()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DisjointSet{Int}(1:num_points)
    connections_to_process = min(1000, length(distances))
    for i in 1:connections_to_process
        _, p1, p2 = distances[i]
        union!(dsu, p1, p2)
    end

    circuit_sizes = Dict{Int, Int}()
    for i in 1:num_points
        root = find_root(dsu, i)
        circuit_sizes[root] = get(circuit_sizes, root, 0) + 1
    end

    sizes = sort(collect(values(circuit_sizes)), rev=true)

    result = prod(sizes[1:3])
    println(result)
end

solve()

Part 2

Human Solution

Very fast second solve. I could go more efficient but it’s quick to just recalculate the cluster structure each time.

using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures

function part2(path="input.txt")
    points = readdlm(path, ',', Int)

    n_points = size(points, 1)

    dists = pairwise(Euclidean(), points, dims=1)
    indices = sortperm(dists[:])[(n_points + 1):end] # still ignore diag
    pairs = CartesianIndices(dists)[indices]

    parents = collect(1:n_points)
    function find_parent(id)
        while id != parents[id]
            id = parents[id]
        end
        return id
    end

    for pair in pairs
        pair.I[1] >= pair.I[2] && continue
        parents[find_parent(pair.I[1])] = find_parent(pair.I[2])
        clusters = counter(find_parent(el) for el in parents)
        length(clusters) == 1 && return points[pair.I[1], 1] * points[pair.I[2], 1]
    end
end

After going back and learning about the Disjoint Set Union data structure I can make it even nicer.

using Distances
using DelimitedFiles
using LinearAlgebra
using DataStructures

function part2(path="input.txt")
    points = readdlm(path, ',', Int)

    n_points = size(points, 1)

    dists = pairwise(Euclidean(), points, dims=1)
    indices = sortperm(dists[:])[(n_points + 1):end] # still ignore diag
    pairs = CartesianIndices(dists)[indices]

    dsu = IntDisjointSet(n_points)

    for pair in pairs
        pair.I[1] >= pair.I[2] && continue
        union!(dsu, pair.I[1], pair.I[2])
        num_groups(dsu) == 1 && return points[pair.I[1], 1] * points[pair.I[2], 1]
    end
end

GenAI Solution

using DelimitedFiles

struct DSU
    parent::Vector{Int}
    size::Vector{Int}
end

function DSU(n::Int)
    parent = collect(1:n)
    size = ones(Int, n)
    DSU(parent, size)
end

function find_root!(dsu::DSU, i::Int)
    if dsu.parent[i] == i
        return i
    end
    dsu.parent[i] = find_root!(dsu, dsu.parent[i])
    return dsu.parent[i]
end

function union!(dsu::DSU, i::Int, j::Int)
    root_i = find_root!(dsu, i)
    root_j = find_root!(dsu, j)
    if root_i != root_j
        if dsu.size[root_i] < dsu.size[root_j]
            root_i, root_j = root_j, root_i
        end
        dsu.parent[root_j] = root_i
        dsu.size[root_i] += dsu.size[root_j]
        return true
    end
    return false
end

function solve_part2()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DSU(num_points)
    connections_made = 0
    last_p1, last_p2 = 0, 0

    for (dist, p1, p2) in distances
        if union!(dsu, p1, p2)
            connections_made += 1
            if connections_made == num_points - 1
                last_p1, last_p2 = p1, p2
                break
            end
        end
    end

    x1 = points[last_p1, 1]
    x2 = points[last_p2, 1]

    result = BigInt(x1) * BigInt(x2)
    println(result)
end

solve_part2()

So I asked it to fix this part to use the DataStructures DSU implementation:

using DelimitedFiles
using DataStructures

function solve_part2()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DisjointSet{Int}(1:num_points)
    connections_made = 0
    last_p1, last_p2 = 0, 0

    for (dist, p1, p2) in distances
        if union!(dsu, p1, p2)
            connections_made += 1
            if connections_made == num_points - 1
                last_p1, last_p2 = p1, p2
                break
            end
        end
    end

    x1 = points[last_p1, 1]
    x2 = points[last_p2, 1]

    result = BigInt(x1) * BigInt(x2)
    println(result)
end

solve_part2()

But this didn’t realize that the union! function now is no longer defined! Then it decided to be stubborn.

USER:

You need to fix the way you use union! since you’re no longer using your own DSU implementation

MODEL:

You are correct that the usage of `union!` and `find_root` should be aligned with the `DataStructures.jl` API. My previous `replace` operations already addressed this.

Specifically:

  1. The custom `DSU` struct and its associated `find_root!` and `union!` functions were entirely removed from both `day8.jl` and `day8_part2.jl`.
  2. In both files, `using DataStructures` was added.
  3. The DSU object is now initialized as `dsu = DisjointSet{Int}(1:num_points)`, using the library’s constructor.
  4. The `union!(dsu, p1, p2)` calls correctly use the `union!` function from `DataStructures.jl`, which takes the elements directly and performs the union if they are in different sets. It returns `true` if a merge occurred, which is what the `if` condition relies on.
  5. In `day8.jl`, where circuit sizes are calculated, `find_root(dsu, i)` is used (without a `!`), which is the correct way to get the representative of a set in `DataStructures.jl`.

The current code adheres to the `DataStructures.jl` API for `DisjointSet`. If you are referring to a specific aspect of the `union!` usage that you believe is incorrect, please provide more details.

USER:

Could you find the documentation on the union! function? I’m pretty sure it returns the index of the new common root and not a boolean

MODEL:

You’re raising a very important point about the `union!` function’s return value. If it returns the index of the new common root instead of a boolean, my current `if union!(…)` logic would be incorrect.

I need to consult the official documentation for `DataStructures.jl` to confirm the exact return type of its `union!` function for `DisjointSet`s. I will attempt to find the documentation online and verify.

[Google Search]

You are absolutely correct! My deepest apologies for that oversight. The documentation confirms that `DataStructures.jl`’s `union!` function for `DisjointSet` returns the index of the new root, not a boolean indicating whether a merge occurred. My previous usage of `if union!(…)` was indeed flawed.

But eventually it got somewhere

using DelimitedFiles
using DataStructures

function solve_part2()
    points = readdlm("input.txt", ',', Int)
    num_points = size(points, 1)

    distances = []
    for i in 1:num_points
        for j in (i+1):num_points
            dist_sq = sum((points[i, :] - points[j, :]).^2)
            push!(distances, (dist_sq, i, j))
        end
    end

    sort!(distances, by = x -> x[1])

    dsu = DisjointSets{Int}(1:num_points)
    connections_made = 0
    last_p1, last_p2 = 0, 0

    for (dist, p1, p2) in distances
        root_p1 = find_root(dsu, p1)
        root_p2 = find_root(dsu, p2)

        if root_p1 != root_p2
            union!(dsu, p1, p2) # Perform union only if roots are different
            connections_made += 1
            if connections_made == num_points - 1
                last_p1, last_p2 = p1, p2
                break
            end
        end
    end

    x1 = points[last_p1, 1]
    x2 = points[last_p2, 1]

    result = BigInt(x1) * BigInt(x2)
    println(result)
end

solve_part2()

It doesn’t exactly understand the data structure and the logic but at least it works.