While trying to come up with more optimizations for SZLib’s drawing system I came across something called “scanline coherence shape algebra”, which is an algorithm described in a book called Graphics Gems. It attempts to provide an efficient algorithm for computing “the visible regions of overlapping windows”. Aha, exactly what I’m looking for. I had to try it out.
In brief, the algorithm stores the shape as a list of “spans”. Each span consists of a y co-ordinate and a list of “segments”, which are just x co-ordinates. Spans are sorted by their y value and segments are sorted by their x value. Put together, they describe an arbitrary shape. The algorithm supports union, intersection and subtraction operations.
For the first pass at an implementation I followed the algorithm described in the book (as far as I could; the pseudocode for the special cases is too vague to implement in some places). I used linked lists to represent the spans and segments, and of course that required an overdose of
malloc() to create each node in the list. For the second pass I switched to using arrays to store the data and kept a pool of pre-allocated arrays to limit
malloc() as much as possible. That was a little faster, but still not fast enough. For the last pass I switched back to linked lists but built a version that uses a combination of a pair of pre-allocated arrays and couple of stacks to store the linked lists and eliminate all calls to
So, how does the new shape algebra code compare with the existing code? It’s really, really, amazingly bad. Really bad. On my development machine, adding rects to the damaged rect manager in my test application using the existing code averages about 5ms of CPU time per 1s of wallclock time. The new code averages about 100ms of CPU time per 1s of wallclock time. The existing code is 20x faster than the new algorithm.
At that point I gave up on it. It’s likely that rendering damaged areas would be faster using data produced by the new code as it automatically consolidates adjacent rectangles, but given that this single operation uses 20% more CPU time than the entirety of the rest of the test application it’s completely impractical.
It did give me a few ideas for other optimizations for identifying intersection failures, though:
- Sorting the rectangles within the quad tree first by their y and then their x axis;
- Maintaining a rectangle that represents the bounds of the damaged area within each quad tree node.