Intro

For reasons that are only clear to me, I needed to construct continuous loops through a grid, like this:

+--+  +--+--+--+  +--+  +--+--+  +--+--+--+
|  |  |        |  |  |  |     |  |        |     
+  +  +  +--+  +  +  +  +  +  +  +--+  +--+
|  |  |  |  |  |  |  |  |     |     |  |        
+  +--+  +  +--+  +  +  +  +--+  +--+  +--+
|        |        |  |  |  |     |        |     
+--+  +  +--+  +  +  +--+  +--+--+  +  +--+
   |        |     |                    |        
+--+  +--+  +--+  +--+  +--+--+--+--+  +  +
|     |  |     |     |  |           |  |        
+--+--+  +--+  +--+  +  +  +  +--+  +  +--+
            |     |  |  |     |  |  |     |     
+--+--+  +  +--+  +--+  +  +--+  +--+  +--+
|     |        |        |  |           |        
+  +  +--+--+  +  +  +--+  +  +--+  +--+  +
|           |  |     |     |  |  |  |           
+--+  +--+  +--+  +--+  +--+  +  +  +  +--+
   |  |  |        |     |     |  |  |  |  |     
+--+  +  +--+--+  +--+  +  +--+  +  +  +  +
|     |        |     |  |  |     |  |  |  |     
+  +  +--+--+  +  +--+  +--+  +--+  +--+  +
|           |  |  |           |           |     
+--+  +--+  +  +  +  +--+  +  +--+--+--+  +
   |  |  |  |  |  |  |  |              |  |     
+--+  +  +  +  +--+  +  +--+  +--+--+--+  +
|     |  |  |        |     |  |           |     
+--+  +  +  +--+--+--+  +--+  +  +--+--+  +
   |  |  |              |     |  |     |  |     
+  +--+  +--+--+--+--+--+  +  +--+  +  +--+

I wanted the entire graph to be one continuous loop. I wanted the graph to be random, and to pass through most (but not all) nodes.

This is called a Hamiltonian path through the activated nodes. In general, finding such paths is an NP-Complete problem. By taking advantage of the grid shape and my ability to throw away edge cases, I was able to generate these very quickly. This page discusses how I achieved runtime efficiency.

Depth-first search

At first I tried a simple depth-first search (DFS). This moved through the board, turning randomly at each step. Eventually this stopped if the starting node was reached and some minimum number of nodes had been visited on the way.

def find_next_node(node: Node, depth: int = 0) -> bool:
    nonlocal node_status
    nonlocal starting_node
    nonlocal resulting_path

    node_status[node] = Status.VISITED
    for next_node in get_adjacent_shuffled(node, rows, columns):
        if next_node == starting_node and depth >= min_nodes:
            resulting_path.append(node)
            return True
        if node_status[next_node] == Status.NOT_VISITED:
            if find_next_node(next_node, depth + 1):
                resulting_path.append(node)
                return True
    else:
        node_status[node] = Status.NOT_VISITED
        return False

node_status[starting_node] = Status.STARTING_NODE
find_next_node(starting_node)

resulting_path = [starting_node] + resulting_path

# Translate to edge IDs
edges = []
for st, en in zip(resulting_path[:-1], resulting_path[1:]):
    edges.append(get_edge_id(st, en))

This algorithm worked, but it grew exponentially. Above around 8x8 boards, the runtime was unacceptably slow.

Mixed integer programming

I realized that I could get almost a continuous path if I simply require every node to have either 2 edges or 0. This ensures that there is no forking or dead-ends. The problem is that this may make multiple continuous loops, which I'll deal with in the next section.

It would be nice to provide the constraint that every node has EITHER 0 or 2 edges, but this is a quadratic constraint. E.g. with edges a, b, c, d (all binary), you'd have the constraint (a+b+c+d)*(a+b+c+d-2)=0. And to my knowledge, there's no good way to solve a quadratically constrained program. Instead, I decided randomly that some nodes would be 0 and some would be 2. This makes all of our constraints linear.

With all linear constraints, this becomes a mixed integer program; i.e. a linear program with binary variables. Though these algorithms are technically exponential in the worst-case scenario, I found the program to run much more quickly than the DFS algorithm I had been running. I was able to generate 16x16 boards in around the same time it used to take to generate 8x8s.

One problem is that with these contraints a solution is not always feasible. One simple example is if you have a corner with 2 edges and an adjacent node with 0 edges. In practice, it's hard to tell when a solution is feasible. To resolve this, I just threw such boards away. I usually saw ~20 boards needed to be generated before I found a feasible board.

Finally I threw some random edges in the objective function to ensure that two boards with the same nodes would not necessarily be the same.

I used Google OR-Tools to implement:

model.Minimize(sum(
    edge_by_name[name]
    for name in random.sample(
        list(all_edges(board)), int(board.rows * board.columns / 2)
    )
))
for nodule in all_nodules(board):
    if random.random() < 0.1:
        model.Add(
            sum(
                edge_by_name[name]
                for name in logic.get_edges_adjacent_to_nodule(nodule, board)
            )
            == 0
        )
    else:
        model.Add(
            sum(
                edge_by_name[name]
                for name in logic.get_edges_adjacent_to_nodule(nodule, board)
            )
            == 2
        )
solver = cp_model.CpSolver()
status = solver.Solve(model)
found_solution = status in (cp_model.OPTIMAL, cp_model.FEASIBLE)
if not found_solution:
    continue

Multiple continuous loops

Because I most nodes were visited by some loop, I found that I could usually combine distinct loops by simply looking for adjacent parallel edges

+  +
|  |
+  +

and surgering them so that they were connected.

+--+

+--+

There are interesting edge cases where this doesn't work:

+--+  +
|  |
+--+  +

+  +--+
   |  |
+  +--+

I again just threw these away and started over.

To aid this I created a "Quad" class which is all of

a--b  a--c
|  |  |  |
c--d  b--d

So the basic algorithm is: 1. Loop through all Quads. 2. If ab and cd are edges from different loops, then delete ab/cd and add ac/bd.

Perhaps the hardest part is determining if the edges are from different loops. To handle this, I used a DisjointSet. This is a data class that contains subsets; the subsets can be efficiently merged by selecting any two elements from the sets. This way I only needed to merge nodes whenever an edge joined them, and the DisjointSet would have each loop as a different subset.

Altogether this was my code:

found_solution = False
while not found_solution:
    ds = DisjointSet()
    used_nodules = set()
    board.edges = []
    model = cp_model.CpModel()
    model.verbose = 0
    edge_by_name = {name: model.new_bool_var(name) for name in all_edges(board)}
    model.Minimize(sum(
        edge_by_name[name]
        for name in random.sample(
            list(all_edges(board)), int(board.rows * board.columns / 2)
        )
    ))
    for nodule in all_nodules(board):
        if random.random() < 0.1:
            model.Add(
                sum(
                    edge_by_name[name]
                    for name in logic.get_edges_adjacent_to_nodule(nodule, board)
                )
                == 0
            )
        else:
            model.Add(
                sum(
                    edge_by_name[name]
                    for name in logic.get_edges_adjacent_to_nodule(nodule, board)
                )
                == 2
            )
            ds.add(nodule)
            used_nodules.add(nodule)
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    found_solution = status in (cp_model.OPTIMAL, cp_model.FEASIBLE)
    if not found_solution:
        continue

    for edge in all_edges(board):
        if solver.BooleanValue(edge_by_name[edge]):
            board.edges.append(edge)
            ds.merge(*logic.terminals_from_edge(edge))

    if len(ds.subsets()) > 1:
        aq = list(all_quads(board))
        random.shuffle(aq)
        for q in aq:
            if (
                q.a not in used_nodules
                or q.b not in used_nodules
                or q.c not in used_nodules
                or q.d not in used_nodules
            ):
                # No paths to surger
                continue
            if (
                ds.connected(q.a, q.b)
                and logic.edge_from_terminals(q.a, q.b) in board.edges
                and ds.connected(q.c, q.d)
                and logic.edge_from_terminals(q.c, q.d) in board.edges
                and not ds.connected(q.a, q.c)
            ):
                board.edges.remove(logic.edge_from_terminals(q.a, q.b))
                board.edges.remove(logic.edge_from_terminals(q.c, q.d))
                board.edges.append(logic.edge_from_terminals(q.a, q.c))
                board.edges.append(logic.edge_from_terminals(q.b, q.d))
                ds.merge(q.a, q.c)

    if len(ds.subsets()) > 1:
        # Corner case
        found_solution = False
        continue

    found_solution = True

Conclusion

In the end, I was able to generate valid 15x15 grids in seconds. This satisfied my mysterious requirements.