2007-09-22

Rectangles Redux Redux and a Calculator

Although there isn’t a great deal of noticable difference between this latest demo of Woopsi and the last version, quite a lot of under-the-hood changes have happened.

Starting at the start (good a place as any), I’ve added a new “Textbox” gadget. This gadget just displays a single line of text. It can align the text in one of three ways in both the vertical and horizontal planes (left-aligned, right-aligned, top-aligned, bottom-aligned, and centred in both dimensions). Handy feature to have, so I’ve stripped down the “Button” gadget and made it inherit from the Textbox. Instant text alignment all round.

The Textbox can have its text altered or added to once it has been instantiated. Implementing this was more tricky than I’d expected. If we need to add text to the textbox, we need to concatenate the strings. As I’m using char pointers this means that we need to:

  • Work out how much memory we need to allocate for the concatenated string
  • Allocate the memory
  • Copy the current memory into the new memory
  • Concatenate the new string with the new memory
  • Delete the original memory
  • Update the pointer to address the new memory

Sometimes I wish I was coding in C#, where all of that is handled by a simple “+=”.

The problem I had was that I was passing a string literal into the constructor, and updating the internal pointer to point to it. You obviously can’t delete that, so I had to change the constructor to go through the business above instead (the above checks for a NULL pointer before deleting).

The second change reflects the fact that I’ve been considering how Woopsi will be used by other developers. To this end, I’ve renamed all of the internal drawing functions. They shouldn’t be used by anyone other than gadget developers, and renaming them makes way for the forthcoming drawing functions that will be useful to programmers just using the system as a pre-built library. As part of this change I went through the code and changed the access levels of anything that was marked as “public” that shouldn’t have been.

Event handling has been tricky. I initially went for a MenuDS approach, in which events trigger pointers to functions. It’s really easy to build, but I discovered that it falls apart if you’re dealing with objects. If I want a button to trigger a member function in an object, I can’t - pointers to member functions are different from standard pointers to functions. More importantly, they’re useless for this system - you have to specify the class type pointed to when declaring a pointer to a member function, and as the classes will be written by other developers, this is no use.

I had a look about on t’internet for inspiration, and eventually decided on a class-based event system. Each gadget has a pointer to an instance of an “EventHandler” class, which has three methods - “handleClick()”, “handleRelease()”, and “handleDrag()”. When a gadget is clicked/released/dragged, it triggers the relevant function in the EventHandler. The clever bit is that the EventHandler class is an abstract base class. To create an event handler object, you’d create a class that inherits from the class, implements the pure virtual event handler methods, create an instance of the class and pass it into the gadget. When the event occurs, the gadget fires the method and passes a pointer to itself into the function. This ultimately means that we know which gadget fired the method, and we can get the events to trigger any function we want.

It may be necessary to add more event types as the system gets more complicated. In fact, it might even be handy to have an event data struct that contains the gadget pointer, x and y co-ordinates of the event, etc. It’d let me send more data around and it would be extensible, too. Hmm, sounds like the .NET “EventArg” class. In fact, it’s the same thing…

Just after I posted the last entry I changed the comment system to allow comments from non-registered users. The mysterious Jeff had some initial difficulty with the system, but persevered and I’m glad he did - he posted a tour de force of bit mangling to optimise the filled rectangle routine. I’m particularly impressed with his bit flipping logical AND that produces an even number from an odd. The new version of the code looks like this:

void drawFilledRect(s16 x, s16 y, u16 width, u16 height, u8 colour) {

    // Draw top line
    u16 lastX = x + width;
    for (u16 i = x; i < lastX; i++) {
        PA_Put8bitPixel(0, i, y, colour);
    }

    // Calculate DMA copy values - we need to deal with even values only
    u16 dmaX = (x + 1) & ~1;
    s32 dmaWidth = width;
    bool drawRight = false;

    // Did we convert to an even x value?
    if (dmaX != x) {
        // Reduce the width to compensate
        dmaWidth -= 1;

        // Remember that we need to draw an extra pixel
        drawRight = true;
    }

    // Pre-emptive bitshift to save calculations
    dmaX = dmaX >> 1;
    dmaWidth = (dmaWidth & 1 ? dmaWidth - 1 : dmaWidth) >> 1;

    // Calculate last line to draw
    u16 lastY = y + height;
    lastX = x + width - 1;

    // Precalculate line values for loop
    u16* line0 = PA_DrawBg[0] + dmaX + (y << 7);
    u16* linei = line0 + (1 << 7);

    // Loop through all lines
    for (u16 i = y + 1; i < lastY; i++) {

        // Perform DMA copy
        if (dmaWidth > 0) {
            DMA_Copy(line0, linei, dmaWidth, DMA_16NOW);
        }

        // Fill left gap
        if (x & 1) {
            PA_Put8bitPixel(0, x, i, colour);
        }

        // Fill right gap
        if ((width & 1) || (drawRight)) {
            PA_Put8bitPixel(0, lastX, i, colour);
        }

        // Move to next line
        linei += (1 << 7);
    }
}

Next up is a change to the TextViewer gadget that handles vertically-scrolling text. For some completely inexplicable reason, Woopsi started crashing in No$GBA and the real DS hardware. It worked fine in DeSMuME. After a lot of digging, I realised that the problem was the TextViewer - it would crash if I put a string with more than 63 characters into its constructor. However, it wouldn’t crash if I moved the constructor above the constructor for the second screen.

Sounds like a memory leak. Had an exhaustive dig through the code, and there isn’t any memory leaking in anything called by the constructor. Even stranger, disabling all of the code within the constructor still caused the crash. Strangest of all, DeSMuME still didn’t have any problems with it.

I eventually gave up, replaced the strings with char pointers, and it all works fine. I get the added benefit of internal consistency (since every other gadget uses chars, not strings) and undoubtedly a minor speed boost.

The last new development isn’t a change to Woopsi; it’s a test application built with Woopsi. As I’ve now got screens, windows, buttons, textboxes and event handling, I decided I was at the point at which I should be able to build something useful. The second screen now hosts a very simple, integer-only, 5-digit calculator application. It’s implemented a single class that creates a window, adds gadgets, and wires up the various events.

Some vaguely accurate stats for the calculator:

  • 30 lines of event handling code
  • 30 lines of GUI setup code
  • 25 lines of GUI managing code
  • 170 lines of actual calculator code

Approximately 250 lines of code needed for a simple GUI-based calculator application, almost all of which is “real” code rather than GUI manipulation. I’m sure that most of the real code could be streamlined, too - it’s a bit of a rush job.

The latest demo is here:

Woopsi Demo V0.06

Comments

Jeff on 2007-09-22 at 11:43 said:

You piqued my interest in things optimal so I decided to look under the covers of the compiler and there are some interesting things to learn. Your original was, I recall, something like

int x;
x = (x&amp;1) ? x+1 : x;

The compiler turns that into:

    ldr     r3, [fp, #-16]      ; load value from frame
    and     r3, r3, #1          ; check bottom bit
    and     r3, r3, #255        ; (wasted instruction)
    cmp     r3, #0              ; was the result zero?
    beq     .L8                 ; yes, branch
    ldr     r3, [fp, #-16]      ; no, load again
    add     r3, r3, #1          ; add one
    str     r3, [fp, #-24]      ; and store it in a temp
    b       .L10

.L8: ; come here on zero ldr r3, [fp, #-16] ; store x in temp str r3, [fp, #-24] ; .L10: ldr r3, [fp, #-24] ; copy temp back to x str r3, [fp, #-16]

Ghastly. My alternative was:

int x;
x=(x+1)&amp;~1;

The compiler turns that into:

    ldr     r3, [fp, #-16]      ; load value from frame
    add     r3, r3, #1          ; add 1
    bic     r3, r3, #1          ; clear bottom bit
    str     r3, [fp, #-16]      ; store value back

Thats a win for sure. However, when I used the -O switch on the compiler, the game seemed to change. Its a lot harder to test because gcc can optimise away completely null operations, so my test became:

int x;
x = (x&amp;1) ? x+1 : x;
dummy(x);

and the compiler turned that into: ; value is already in r0 tst r0, #1 ; test bottom bit addne r0, r0, #1 ; add 1 if bottom bit was non-zero bl dummy ; call dummy(r0)

while my code was essentially the same.

                                ; value is already in r0
    add     r0, r0, #1          ; add 1
    bic     r0, r0, #1          ; then clear bottom bit
    bl      dummy

The lesson to be learned, I guess, is that there are some things for which the ?: operator is actually an good fit, provided that you actually let the optimiser have a go at it (with -O)

Similiarly, your code has if (x&1) inside the loop to decide whether to draw the left edge, and (width&1)||drawRight to draw the right.

( Since width is invariant, you could do that comparison outside the loop … drawRight |= (width&1); … then just test drawRight inside the loop)

if (x&amp;1) {
    dummy(...)
}

generates:

    ldr     r3, [fp, #-16]      ; load x
    and     r3, r3, #1          ; mask
    and     r3, r3, #255        ; (wasted)
    cmp     r3, #0              ; compare
    beq     .L20                ; and branch on zero
    mov     r0, #5
    bl      dummy

.L20:

If you compile with -O it becomes: ; value is already in r0 tst r0, #1 ; test bottom bit movne r0, #5 blne dummy ; call if the bit was nonzero .L13:

if ((x&amp;1)||y) {
    dummy(...);
}

becomes: ldr r3, [fp, #-16] ; load x and r3, r3, #1 ; mask and r3, r3, #255 ; (wasted) cmp r3, #0 ; compare bne .L17 ; branch to the call if set ldr r3, [fp, #-20] ; load y cmp r3, #0 ; compare with zero beq .L21 ; branch past the call if zero .L17: mov r0, #5 ; do the call bl dummy .L21:

or with -O: ; values is already in r0/r1 and r0, r0, #1 ; mask out bottom bit orrs r0, r0, r1 ; or in y movne r0, #5 ; if result non-zero, do the call blne dummy .L13:

Moral: make sure you are passing -O to gcc.

ant on 2007-09-23 at 00:23 said:

Wow, thanks for the exhaustive analysis! Looks like there’s still a couple of optimisations I can make to the code; I might have a look at that tomorrow if I get time.

ant.simianzombie.com » Blog Archive » Woopsi Progress! on 2007-11-30 at 15:01 said:

[…] realised this a while ago (here, but it looks like I didn’t explain the reasons), which is why I decided to rename all of the […]

ant.simianzombie.com » Woopsi Progress! on 2009-10-07 at 11:47 said:

[...] realised this a while ago (here, but it looks like I didn’t explain the reasons), which is why I decided to rename all of the [...]