Hehe, that was a fun episode!
I know python isn't the right choice for super fast code, of course. For the original problem of 6x6, that wasn't even relevant. Now that i'm running larger grids, i've become painfully aware of the need for more optimization. I have improved the speed of my code by a few orders of magnitude so far, and i think i can do another 2-3 before i'd have to switch out the fundamental design or even the language.
I can describe the evolution of my code so far, maybe that's fun. (Sorry btw for not posting it yet, but i have more improvements i want to make first).
At first, i created a model that can represent these things. I wrote a few classes:
- Vector, a class that has x and y values as well as a few methods that enable some arithmetic and such.
- Square, represents one square of the grid. They have a position (Vector), an incoming and an outgoing direction, as well as several convenience methods to traverse the grid and measure distances.
- Glyph, has Squares and a bunch of methods for connecting them, rendering as ASCII, SVG, getting rotated or mirrored versions, etc.
I'm not gonna go into my original number-based approach here. I intend to go back to it as soon as i run out of immediate ideas for improving the brute force, and i'll make a post about it if/when i get it to work correctly. I'll only describe my brute force journey today.
The brute force per se is very straight forward. It is a recursive function, calling itself to a depth of x*y (space size), each time appending one step that can be left, right, or forward.
In a 6x6 grid, the upper bound is 3^36 or 150094635296999121 different sequences of moves. Let's call them leaves of the tree. If i can check a million per second of them for validity (and i can't with this design), that's about 4759 years. Luckily, we don't have to generate or try them all. We can skip branches when we discover that they can't lead to any valid leaves.
My first naive approach was to detect that condition only when it's too late. That is, when i try to connect to a square that's outside the grid, or to a square that's already connected. This way, there was still dozens to hundreds of million sequences to try (but, about 10 orders of magnitude less than the worst case). It took a few hours iirc.
I then added another detection of dead branches: In the field right before the corners, it has to connect to the corner. Going any other direction can never produce a valid glyph. Then in the corner, it has to always turn right. I think that was all i did to get 6x6 down to 15 minutes.
Next thing i did was to detect when i'm stepping on an edge that i have to keep free. For example, when i'm finding a path from the top left to the top right corner, i can never go all the way to the left, right, or bottom edge, or i won't be able to get back to the top left corner later. This was the one that brought 7x6 from over 5h down to about 1h.
At that point, it still got stuck for minutes to hours sometimes, and i didn't immediately realize why. So i added a command to see what it's currently trying. I found that (of course) it sometimes encloses empty squares that are then unreachable.
So next thing i did, was to check for that whenever it returns back to the one legal edge, or touches another portion of the path from a different edge. Also aborting a branch when it leaves just one space to the "home" edge. This brought 7x6 down to 30 minutes.
Next thing i will do is the same detection, but when it touches the same portion of itself. It's more complicated because depending on how many turns it made, i have to check for empty squares on the left or the right side of all the previous squares of that portion. I found the correct way to know which one when the contact is frontal (7x6 now 15min), but haven't had the time yet to extend it for tangential contact. I hope to do that tomorrow. I'll also have to check for single square distances in other places than the home edge.
Running 8x8, after finding about 175k glyphs, i ran into memory problems. This thing was using gigabytes! Up until that point, i kept all the Glyph objects in lists, along with all the Square objects they contain, etc. I realised i can keep only their Lfr string representation and save about 90% of that memory.
During that change, i saw that i can improve the Lfr format by representing the corners by a '.' character rather than an 'r' character. That way, it's immediately obvious where they are, and this change allowed me to rewrite the rotation and mirroring functions to be a few string operations rather than many object manipulations. 7x6 runs in 11 minutes now.
So yeah i'm not done with optimising my inherently non-optimal design yet. Of course some sort of bitmap would be quicker - probably even within python, and certainly in a language closer to the hardware such as C. I don't really feel like rewriting it in such a way though, at this time. I wanna keep improving the python code until i run out of ideas, then implement the necessary bits in javascript to generate and script SVGs.
I've completed all the spaces up to 8x7, 9x6 and 10x5. I'm not gonna upload individual SVG files for them to my server because it's hundreds of thousands of files - i'll change that page so it loads entire spaces per file, then generates the SVGs inside the browser using javasript. This will take a little time though. I hope i can be back at the original number based approach for that, it would make the file sizes much smaller. But if i can't make that work right, the Lfr format will do - it's 20MB for 8x7, less if only including the archetypes and omitting the information about self symmetries. It can be reduced further by gzipping or similar.
Currently i'm running 8x8 (at 570k glyphs so far) and 10x6 (450k glyphs so far). I won't run spaces bigger than that unless i can speed things up by at least another 2 orders of magnitude.