A Failed Experiment

While trying to come up with more optimizations for SZLib’s drawing system I came across something called “scanline coherence shape algebra”, which is an algorithm described in a book called Graphics Gems. It attempts to provide an efficient algorithm for computing “the visible regions of overlapping windows”. Aha, exactly what I’m looking for. I had to try it out.

In brief, the algorithm stores the shape as a list of “spans”. Each span consists of a y co-ordinate and a list of “segments”, which are just x co-ordinates. Spans are sorted by their y value and segments are sorted by their x value. Put together, they describe an arbitrary shape. The algorithm supports union, intersection and subtraction operations.

For the first pass at an implementation I followed the algorithm described in the book (as far as I could; the pseudocode for the special cases is too vague to implement in some places). I used linked lists to represent the spans and segments, and of course that required an overdose of malloc() to create each node in the list. For the second pass I switched to using arrays to store the data and kept a pool of pre-allocated arrays to limit malloc() as much as possible. That was a little faster, but still not fast enough. For the last pass I switched back to linked lists but built a version that uses a combination of a pair of pre-allocated arrays and couple of stacks to store the linked lists and eliminate all calls to malloc().

So, how does the new shape algebra code compare with the existing code? It’s really, really, amazingly bad. Really bad. On my development machine, adding rects to the damaged rect manager in my test application using the existing code averages about 5ms of CPU time per 1s of wallclock time. The new code averages about 100ms of CPU time per 1s of wallclock time. The existing code is 20x faster than the new algorithm.

At that point I gave up on it. It’s likely that rendering damaged areas would be faster using data produced by the new code as it automatically consolidates adjacent rectangles, but given that this single operation uses 20% more CPU time than the entirety of the rest of the test application it’s completely impractical.

It did give me a few ideas for other optimizations for identifying intersection failures, though:

  • Sorting the rectangles within the quad tree first by their y and then their x axis;
  • Maintaining a rectangle that represents the bounds of the damaged area within each quad tree node.


Redrawing with Damaged Rects

The Problem: Bad APIs and Suboptimal Redrawing

The existing gadget redrawing system in Woopsi sucks. It’s inefficient and makes the API ugly.

For example, here is a completely bare-bones startup method that will get Woopsi into a usable (but empty) state:

void SomeApp::startup() {

The two function calls in the startup() method ensure that the Woopsi instance is visible. When an instance of the Woopsi class is created, its redraw system is disabled. This allows the initial set of gadgets to be added without them appearing on-screen one at a time. Once the initial state has been set up, drawing can be enabled, and the initial state can be drawn.

You might notice that these two lines of code aren’t required in any other UI framework you’ve used. Neither winforms, AWT, Swing, GTK, KDE or anything else requires that you manually enable redrawing and redraw the display once you’ve finished adding UI components to your application.

The reason for this is the way that Woopsi handles redrawing. In Woopsi, each gadget is responsible for redrawing itself. All gadgets know which portions of themselves are visible, and they also know when their appearance has been changed. When a gadget is created it knows that it must redraw itself, and does so. Likewise, when a gadget moves, or is resized, or is clicked, it will redraw itself. So, when we add gadgets to Woopsi, they will draw themselves to the screen one at a time instead of the entire screen being drawn in a single operation. This piecemeal redraw looks odd and wastes CPU time, so drawing is initially disabled. Users must re-enable drawing themselves when the UI is ready.

The problem of wasted CPU cycles is more onerous in other situations. Consider the TextBox gadget. This redraws itself every time its text is changed. If we run the following code, the textbox will redraw four times:


The textbox redraws every time a character is added to its text. If we were to add one hundred characters, the textbox would redraw one hundred times. Now suppose that the textbox fills the entire bottom DS screen. Changing the text in the textbox changes roughly 8x8, or 64 pixels, per update. That means we’ve changed 6,400 pixels. However, we’ve redrawn every pixel on the screen one hundred times. We’ve redrawn 256x192, or 49,152 pixels, per update. We’ve redrawn 4,195,200 pixels when we only needed to redraw at most 6,400. 99.87% of the redrawing was just wasted CPU time.

Woopsi offers a workaround in the form of the enableDrawing() and disableDrawing() methods of the Gadget class:


However, this is a nasty hack. It requires that the developer knows enough about Woopsi’s internals to be able to predict when it will redraw and when it is safe to prevent the automated redraws from running. In the above code, it isn’t particularly obvious why drawing is re-enabled before the last character is added (so that the final append triggers a redraw) and it isn’t something that developers should need to care about.

The Competition: What Everyone Else Does

In the majority of other UI kits, redrawing is handled by tracking damaged rectangles. These are areas of the screen that have been modified and therefore need redrawing. In these systems, instead of our textbox redrawing itself, it would notify an object responsible for tracking these damaged rectangles that its visible areas had changed. The tracking object would store any of those rectangles that it hadn’t been previously notified of in a list. At frequent intervals (every VBL, say) the tracker would order the gadget hierarchy to redraw all of the damaged rectangles in its list. Using this system, the textbox would only be drawn once no matter how many times it was changed.

Justifying Woopsi’s Behaviour: DMA and Bad Code

Woopsi doesn’t use a damaged rectangle system for two reasons. Firstly, it makes extensive use of the DS’ DMA hardware to copy areas of the framebuffer around. It is especially important to the scrolling system that everything on-screen is up-to-date. If redrawing was postponed until a later time it is likely that the scrolling system would be working with out-of-date data. The second reason is the complexity of the rectangle splitting algorithm discussed here and here. Trying to get the code that splits up rectangles to do anything slightly different would be horribly complicated because, well, the code itself is horrible.

Potential Benefits of Changing Woopsi

But what if Woopsi could use the damaged rectangle system for redrawing? How would that affect the API and the framework? It would instantly obsolete about a dozen of the most complicated methods, and it would make a bare-bones Woopsi setup function look like this:

void SomeApp::startup() { }

Adding one hundred characters to a textbox would require a single redraw without any extra developer effort. Even actions such as moving gadgets would be more efficient - they are currently entirely erased from one location and redrawn at another, even if they move by just one pixel. A damaged rectangle system would reduce the number of rendered pixels in such a situation by almost 50% because there wouldn’t be an erasing step.

Fixing the Rectangle Splitter

The existing code used to split up rectangles into intersecting and non-intersecting sub-rectangles is 207 lines long. At times, it has nested if statements and for loops nearly a dozen deep.

(I’d originally intended to add the code in here, except my current WordPress theme breaks the “more” tag, making the post far too long for the front page of the blog. Additionally, trying to force the code into a smaller div by adding a “style” attribute breaks the syntax highlighter and causes angle brackets to be interpreted as HTML tags. I really need to get a new theme that’s less cluttered and easier to read, but nothing I’ve found so far fits.)

Anything nested that deeply is in dire need of refactoring. I looked at it the other day and realised that the entire algorithm was vastly overengineered. A far more efficient approach presented itself to me and I set about rewriting it. The replacement method is 74 lines long and contains just 4 if statements with no nesting at all.

The new method is a massive improvement over the original code. It is considerably smaller and easier to understand, it is faster and best of all, it produces the optimal result each time.

In all situations where there is an intersection, the original code produced (2n-1) rectangles, where n is the optimum number of rectangles. The larger the number of rectangles produced by this method, the more clipping the renderer will have to do later, so having a function that always produces the optimal result is a very good thing.

Switching to Damaged Rectangles

With the rectangle splitting routine rewritten and moved into the Rect class, where it should have been from the start, switching to a damaged rectangle-based rendering system starts to become feasible. The last problem to solve is the issue of scrolling. The scroll routines must be working with the latest bitmap data, which means rendering cannot be deferred.

I don’t know how much thought I’ve put into into solving this during the course of Woopsi’s development, but I’d never managed to hit on a solution until a couple of days ago. The answer is painfully obvious once you know it: make any functions that use the scrolling abilities of Woopsi flush any pending damaged rectangle redraws before they start. That way, the screen will be up-to-date before any scrolling is done.

Simple, obvious.

The upshot of all this is that Woopsi’s rendering engine is now based on damaged rectangles. Instead of the TextBox calling redraw() when its text changes, it now calls markRectsDirty(). It sends its list of visible rectangles to a new DisplayManager class (I might change the name of that class at some point), which works out which parts of the damaged rectangles it knows about (and discards them) and which are new (and remembers them). Once every VBL the DisplayManager redraws the display.

Redrawing is achieved by splitting the damaged rectangles into intersections with the gadgets. The DisplayManager performs a depth-first walk of the gadget tree in order to redraw in front-to-back order. Any rectangles that are redrawn are removed from the damaged rects list. Once that list is empty, the redraw method ends.

What has changed?

So, what has changed? There’s no need to call enableDrawing() or redraw() in the startup() method, for a start. In fact, both enableDrawing() and disableDrawing() no longer exist. The redraw() method has gone too, as well as the following:

  • Gadget::erase()
  • Gadget::eraseGadget()
  • Gadget::redrawDirty()
  • Gadget::drawChildren()
  • Gadget::redrawDirtyChildren()
  • Woopsi::eraseRect()

As previously mentioned, these were some of the cludgiest, most unpleasant functions in the system. I’m glad to have chucked them out.

There’s always a Catch

Unfortunately, the hack that the DimmedScreen class exploited no longer works. That class has had to be deleted.