2018-06-06

Chuckie Egg GBA

Here’s a video of something old-but-new I’ve been working on:

It’s Chuckie Egg, but for the GBA.

How did I manage to squeeze the bitmap-mode UI library built for a dual-CPU 66/16MHz DS into a single-CPU 16MHz GBA? With full-screen scrolling?

I didn’t even try.

Extracting A Library

I mentioned before that the games I’ve been writing with SZLib adhere reasonably strictly to the MVC architecture. The game logic forms the “model” portion of that architecture. If done correctly, the game logic could live in an entirely separate library from the code that handles graphics, sound and user input. That’s what I’ve done to get this working: I moved the model to a new “SZChuckie” library and created a new front-end project that uses the GBA’s tile-and-sprite graphics modes.

Splitting out the game library was mostly trivial, but somehow took far longer than I expected. References to bitmaps, sounds and user input handling had all found their way into the game logic. For example, each of the sprite-esque things in the game (eggs, chicken feed, hens, ducks, Harry) stored an array of bitmaps that represented their animations. The view controller could ask for the current bitmap in order to display the correct image in a view. I had to replace the bitmap reference with a more abstract “animation frame” object, which ditched the bitmap data and stored just a size and a frame number. The view controller now stores the bitmaps and uses SZLib’s type introspection to figure out which bitmap to display for any incoming didChangeFrame events emitted by the game object. The class that represents a level had a function for generating a bitmap from its level data which the view controller used as the game’s background; I moved that into a separate file.

Pulling the sounds into a higher layer meant adding more callbacks to the game object’s delegate. The delegate - in this case, the view controller - can respond appropriately and play the correct sounds.

Input was similarly straightforward. I added a new d-pad/button class specific to Chuckie Egg (so there’s a “jump” button instead of “A” and “B” buttons) and pass its state to the game whenever I call its update() function. As a side-effect, this fixed a few places where the game wasn’t responding correctly to some button presses or the analog stick.

The most long-winded changes were purely administrative: renaming all of the files/classes/etc; creating makefiles and build scripts; and testing that nothing was broken.

With all that done I could move onto the GBA port, which of course started with more administrative work. I needed to get three libraries from SZLib building for the GBA: the core library, which includes reference-counted objects, boxed primitives and collections; the geometry library, which includes points, sizes and rectangles; and the hardware abstraction library, which includes abstractions for initializing the hardware, sound, vwaits and hardware division. Those all needed makefiles. I used the makefile from libgba as a template. With the makefiles in place I could add GBA support to the hardware abstraction library, which involved digging through libgba’s headers to find the appropriate function calls (where available) or disabling the functionality (where not). The core and geometry libraries just worked.

That all put the project in a very weird place. Typically when I start a game I get enough of the logic working to start pushing pixels around on the screen, and from there the logic and graphics code evolve in parallel. Somewhere near the end of the project I’ll add sound. However, in this project I just had to link the game library and hook up a few callbacks to get a fully-working and audible, but invisible, version of Chuckie Egg.

The GBA

Programming the GBA isn’t particularly difficult. Well, it shouldn’t be difficult because it’s a very simple platform. It’s a little harder than it needs to be because libgba has no documentation and the canonical tonc guide references constants that don’t match libgba, so I find myself trying to interpret memory addresses rather than looking for the names of those addresses. The command line graphics conversion utility that ships with devKitARM (grit) has a tendency to either get things wrong (it refuses to generate a correct palette for me) or be unhelpfully helpful (some combination of settings causes it to insert a blank tile at the start of a tile map, causing me to spend far too long trying to figure out why I couldn’t fill the screen with the tile I wanted).

Once I really get the hang of the GBA I’ll hide all of its features behind an abstraction library that I can make cross-platform, which will let me debug my code on my development machine. With neither a debugger nor the ability to printf() on the GBA I’m wasting far too much time running C in my head.

The Port

The GBA’s screen is 240x160 pixels, which is about 20% smaller than the Dragon 32’s resolution of 256x192 (this is a remake of the Dragon 32 version). The DS port was very fortunate in that the DS’ resolution is also 256x192, so there was no need to scroll or scale the game to get it to fit. For the GBA port I decided to opt for scrolling as it’s so simple to achieve (just change a couple of registers). It refused to work for a while until I dug up a comment in one of the examples that noted that the registers are write-only. Duh.

I think that the scrolling works really well. The levels aren’t much larger than the GBA’s screen so it’s impossible to get disoriented or lost. It is possible to get into a “leap of faith” situation where a player might jump from the top of the level hoping that there’s something at the bottom to catch them, but it doesn’t make the game less enjoyable (at least, not for me, but then I know these levels backwards so I’m not a great test subject).

The unmodifiable parts of the levels - the floors and ladders - are implemented in a tiled background. Everything else - the hens, duck, eggs, chicken feed and Harry himself - are all sprites. That gives me around 30 sprites per level, which the GBA throws around without breaking a sweat. I’m considering changing the eggs and feed to become tiles within the background if performance becomes a problem.

Sound is powered by MaxMod. The API is slightly different to the DS’ version, but they’re similar enough that making sound work was simple.

I wasn’t particularly worried about performance. From the considerable amount of perf work I’ve done on the DS version I know that the overwhelming amount of work the game does is in the UI libraries that I’m not using.

Disaster Strikes

Everything was progressing well until I hit a massive problem. The game worked perfectly in mGBA and VBA-M, but it refused to do anything but display a green screen on a real GBA. Every time I’ve encountered something like this on the DS it was caused by a memory problem. The DS emulators I use don’t accurately model the behavior of memory on a real device, so problems like allocating the wrong amount of memory, forgetting to initialize variables, and using memory after it’s freed (etc) can crash real hardware but cause no problems under emulation. I set about investigating.

First step: Code inspection. Was I allocating the wrong amount of memory for my objects? That’s a mistake I’ve made repeatedly in SZLib: allocating enough memory for a pointer to a struct rather than the struct itself. Nope. Forgetting to initialize variables? Nope. Leaking memory? Nope.

Next: Run the SDL version through a static analyzer. No problems.

Next: Run with the address sanitizer enabled. No problems.

Maybe the problem is in the GBA front-end code. I #ifdef-ed out enough of the GBA-specific code to get it compiling on my dev machine and ran that through all of the available code/memory checking tools. No problems.

Let’s dig back through the Git history to find a version that did work on the GBA. Rolling back a few days gave me a version that worked. It seemed that I’d subsequently made a change where I stopped writing to an intermediate buffer representing the OAM (object attribute memory; for sprites) and wrote directly to it. Undoing that change didn’t fix the bug.

I started to remove arbitrary pieces of code that caused memory allocation to occur. Sometimes a change would allow the game to run; sometimes a different bug would manifest. It made no sense.

I enabled libgba’s console output and wrapped malloc() so that it would print the address of the allocated memory. No clues there: all of the addresses were safely contained within the correct region of RAM. Then I started printing out the addresses of stack variables. It seemed that I was getting close to the end of the stack, which perhaps made sense. Stack variables are stored in a 32K block of memory (IWRAM) which is immediately adjacent to a 1K block of memory that controls I/O. Overflowing the stack would cause weird behavior. Immediately following the 1K I/O block is a 1K palette block. One of the changes I made to the game caused the palette to start cycling. Aha, that seems likely. But how am I overflowing the stack? And why doesn’t that bug show up in the emulators?

I started up mGBA and took a look at its memory viewer. According to that the game uses a tiny fraction of the 256K and 32K RAM blocks. There’s no way it should be overflowing the stack or encountering problems related to memory shortages.

Perhaps there’s something wrong with the GBA itself. I recently got hold of another backlit GBA which is the device I’ve been testing on. I haven’t used it for playing games; maybe it’s defective? Nope, same problem on another device.

At this point I’d wasted three evenings on this ridiculous problem and, although I’d learned a significant amount about how the GBA’s memory is laid out, I’d made no progress in figuring out the cause of the issue.

I put everything away and thought about the problem some more.

There’s nothing wrong with the majority of the code. It’s running on 4 other platforms without issue. I’ve tested it with every analysis tool at my disposal and can’t find any problems. The code that’s specific to the GBA amounts to less than 1,000 simple lines, and I analyzed most of that too. The fact that the game runs perfectly under emulation suggests it isn’t a library or compiler problem. It isn’t a problem with the GBA hardware because it exhibits the same behavior on two different devices. It isn’t a memory exhaustion problem. So what’s the difference between running the game in an emulator and running it on my GBA?

The flash cart.

Could it be the flash cart?

I’ve got an EZ-Flash IV (one of the older ones that uses mini-SD cards). It runs its own code and patches ROM files in order for them to work correctly. Could the flash cart be breaking the ROM? Fortunately I’ve got a couple of older flash carts lying around. Copy the ROM to my Supercard SD and…

It works. Perfectly.

The EZ-Flash is doing something bad to the ROM, causing it to write to bad memory addresses, causing bad things to happen. There’s nothing wrong with my code. And there was no way I was ever going to debug that problem.

To Do

There’s plenty of work left to do before the game is finished:

  • Add a “Simian Zombie” screen.
  • Add a title/high score screen.
  • Add level/life/score/time/bonus indicators.
  • Show the “Game over” screen when the player dies.
  • Show the score increasing when the player completes a level.
  • Show the “Get ready” screen before each level.

I also want to add an attract mode showing little demos of gameplay that appear after a few seconds idling on the title screen. I’m thinking of using that system to show how to perform various techniques in the game (like grabbing ladders easily, bouncing between ladders, etc). Oh, and then there’s the level editor that I didn’t finish, too, but that makes less sense on the GBA as it doesn’t have a writable file system.

2016-10-05

Hanky Alien Refactored

I’ve been toying with the idea of writing another homebrew DS game using the libraries I developed for Hanky Alien. Unfortunately the existing code has one considerable flaw: the game logic is intimately entwined with the rendering system, to the point that writing a new game unavoidably involves rewriting a bunch of graphics code that really should have been separate.

To fix this I’ve pulled apart Hanky Alien and reassembled it into three distinct pieces:

  • A layering library that handles all of the rect stuff I insist on using instead of just writing a sprite simulator for SDL like any sensible person;
  • An MVC GUI framework with the components necessary to get the game running (view hierarchy, controllers, timers, label views, bitmap views, event-based stylus and button handling, transitions, etc);
  • The game itself.

The game logic is currently included in the same build target as its presentation layer, but those can be trivially separated once I get around to it.

The new structure has a multitude of benefits. In order to place text on the screen, the code no longer needs to figure out where the text will go, figure out if it needs to erase what was previously at those co-ordinates, and then render the text; it just updates the string in a label view and lets the underlying libraries do all of the hard work. There’s a complete separation between the game logic and all I/O. Timer events piped down from the controller kick the game’s update logic, and game events (such as movement, or the addition or removal of objects within the game) are sent up to the controller via delegate callbacks. The controller is responsible for creating, moving, updating and removing views. I can change the game logic without having to think about any rendering. Each scene in the game - the “Simian Zombie” scene, the title scene, and the game scene itself - are all separate controllers. The GUI framework handles animating the transition from one controller to the next, which means a bunch of complex custom state machine logic became redundant. On top of that, the whole shebang is written in pure C.

All of this structure came with a horrible cost, however. Performance was nowhere near acceptable.

One huge benefit of manually managing the layout and redraw of the screen means each situation can be as efficient as possible. A generic approach to updating the framebuffer means there’s no opportunity for taking shortcuts that make sense in one place but would break other code. This forced me to dig through the entire codebase and try to optimize everything.

First step: profiling with Instruments. Hanky Alien builds happily using SDL2, so I can profile it on my Mac. It doesn’t necessarily give me accurate information, obviously; the Mac and DS have wildly different architectures and capabilities. It does give me a starting point, though.

One mistake that quickly became apparent was the amount of boxing and unboxing of rect structs that the layering library was performing. Subtracting one rectangle from another produces an array of leftover rectangles, and those were returned in an array. The array provided by the core library would only store objects, so I’d opted for the naive approach of boxing the leftover rect structs into value objects and sticking those in an array. Unfortunately each boxing operation required a malloc(), and any attempt at using the data in the objects required them to be unboxed back into structs first. The fix for this was easy: create a new array class specifically for storing collections of rect structs. As long as a resize isn’t required - which inevitably demands a realloc() call - appending rects involves nothing more than changing an int and copying a struct.

Creating those arrays was also slow, particularly when multiple rectangles are divided in a tight loop. The code used to look something like this:

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectArrayRef a = SZRectCreateArrayBySubtractingRect(rect, r);

    // Do stuff with a

    SZRectArrayRelease(a);
}

It now looks like this:

static SZRectArrayRef a = NULL;

if (!a) {
    a = SZRectArrayCreate();
}

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The “remove all” function just sets the array’s size property to 0. We create a single static array and re-use it in each call of the function and each iteration of the loop. We can get away with this because all of the code is single-threaded.

I’ve been unrolling loops. The code above does a bunch of work in the loop that we could unroll. We can move the variable declarations out of the loop and stop querying the array length on each iteration:

int rectCount = SZRectArrayLength(rectsToDivide);
SZRect r;

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

We’re passing structs around by value; we should probably stop doing that too:

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(&rect, &r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The game logic itself had some inefficiencies. Something I’d annotated with a “todo” in the original game was the alien collision code. In order to check if the player’s bullets were colliding with an alien, the code looped over all aliens in the game. The new version treats each column of aliens as a unit and checks collisions with those before checking against the aliens themselves.

The code uses a huge number of rect calculations, so those needed to be as fast as possible. All of the functions had code to cater for rects with a negative width or height. That would never be possible with the data coming out of the various libraries, so I added a bunch of “fast” versions that didn’t do any sanity checking.

The rendering code now uses Cearn’s ARM assembly copy functions for blitting stuff to the framebuffer.

The code is much, much faster than it used to be, but it’s not enough to throw around as many bullets as the older version.

Some of the exciting failures were:

  • Flattening the view hierarchy into a single view and manually handling a bunch of the redrawing logic. Initially it seemed like this would lead to a massive speed boost, but as the redrawing code got more and more complex the game got slower and slower.
  • Replacing bitmaps with solid blocks of color to see if the DMA would help.
  • Using quadtrees to help partition the space and reduce the amount of collision detection being done.
  • Creating my own profiler to record the execution time of various functions.

My final idea was to ditch the DS completely and switch platforms to something with more CPU power. Requirements for the new platform:

  • Must be a console or handheld (rules out anything with a keyboard or without a joypad).
  • An easily-installed homebrew dev kit with great C support (rules out pretty much everything prior to 32 bit).
  • A faster CPU than the DS’ 66MHz ARM9 (no GBA).
  • No fighting firmware upgrades (so long, Vita/3DS/Wii U).
  • Well-designed input devices (no GPH devices).
  • Must be easy to test on the real device.
  • Must have a reasonably capable OSX emulator.

By my estimation, that leaves three potential devices: the PSP, the Wii, and the Dreamcast. I’ve had a look at the dev kits for the PSP and Wii and neither one really seemed to fit with the archaic software rendering that I’m trying to do. The Dreamcast, though, appears to have a wonderfully simplistic framebuffer graphics mode and even supports DS-style RGB555-encoded pixel data. It could use some more homebrew.

One trip to Amazon later and a new (old) NTSC Dreamcast is winging its way to me! After a few evenings of tinkering I managed to get this running in lxdream:

HankyAlien-Dreamcast

Unfortunately, despite the Dreamcast being fantastically more capable on paper (200MHz CPU, 16MB RAM, 8MB VRAM vs the DS’ 66MHz CPU, 4MB RAM and 656KB VRAM), it turns out that in terms of raw CPU performance there’s not a whole lot between them. On a more positive note, programming the Dreamcast looks like it’ll be fun, so once this refactoring project is complete I think I’ll try writing something exclusively for it.