Performance and Quad Trees

I spent a couple of weeks working on improving the performance of SZLib. Specifically, I wanted to eliminate two of the major bottlenecks in the system: recording areas to be redrawn; and determining which layers within the hierarchy will draw those areas.

The algorithm for recording damaged areas looks like this:

  • Add the rect to the damaged rect manager:
    • For each existing damaged rect:
      • Remove the intersection from the incoming rect.
    • Add the remaining areas of the damaged rect to the damaged rect array.

The algorithm for determining the layer responsible for redrawing a rect looks like this:

  • For each rect in the damaged rect manager:
    • For each layer in the layer hierarchy:
      • Add the layer/rect intersection tuple to an array.
      • Remove the intersection from the damaged rect manager.

There are obvious issues in both of these algorithms. In the first we have to compare every new damaged rect with every existing damaged rect. In the second we have to compare every damaged rect with every layer, which gives us a worst-case complexity of O(n^2).

My solution to both problems was to create a simple quad tree. I’d tried a complex quad tree before but it failed miserably. This time I:

  • Stored only rectangles, not rectangles and the layers that they represented;
  • Created the entire fixed-depth quad tree immediately, rather than trying dynamically to split and join nodes;
  • Reset and re-used quad tree instances instead of creating new trees each time I needed one;
  • Treated the quad tree as an immutable structure once it had been built by eliminating methods that would remove rectangles.

The time complexity of the two algorithms above is now the worst-case rather than every case. This change, coupled with some other minor optimizations, produced some great results.

I put together a test app that has 80 overlapping Hanky Alien bitmaps moving around the screen with a few nested text boxes. Numbers from my Mac suggest that the test app is around 40% faster now.

Some more numbers:

  • Ignoring time spent in SDL and during vertical blanks, the test takes up less than half a second of CPU time for every 10 seconds of runtime on my Mac.
  • Rendering time for every 10 seconds of runtime is about 30ms on my Mac.
  • The PSP can handle moving 68 aliens at a solid 60fps.
  • The DS can handle moving 13 aliens at a solid 60fps.


Amy Zing Update

A while back I released a very simple game called AmyZing and the Amazing Mazes. Somehow this ultra-minimalist game attracted the attention of a DS homebrewer called PypeBros, who offered to put together a sprite of Amy herself. You can download an updated version with his sprite here:

I dug out a tracker for the first time in over a decade to add a little chiptune to the game. Only DS and PSP users get music as I still haven’t found good audio libraries for the Dreamcast or 3DS.


Drop Attack

You might remember that I pulled my version of Super Foul Egg from the App Store a long time ago (almost three and a half years ago now) to make way for an official version. Unfortunately that official version never actually got released, so there’s been no way to play SFE on your iPad for far too long.

However! Here’s a version that got released in the last couple of weeks that I’ve been enjoying:

It takes its inspiration from SFE but adds new game modes with a variety of new ideas and local multiplayer (which I didn’t get around to finishing in my version). It’s only 99c so it’s worth a try.


Puzzle Game Randomization

It seems that shapes in the official versions of the puzzle game are never truly random: the game tries to avoid delivering the same shape three times in a row. The GB game used a moderately broken routine that incorrectly produced some shapes more frequently than others. Most modern games adopt an approach seemingly inspired by Scrabble, in which all 7 shapes are added to a bag and pulled out at random until the bag is empty. This is a wonderfully simple algorithm, but it sounds like it would lead to horribly predictable games. It also makes it impossible to receive three of a given shape in a row rather than merely very unlikely.

My version of the game started out with a real randomizer (well, as random as you can get with rand()), but it sometimes made the game unenjoyable - lengthy repetitions of the same unhelpful block, long gaps between shapes I needed, etc - so I tried out the bag idea. Thinking it would be more enjoyable if it were possible to receive the same shape three times in a row I opted to make my bag hold two of each shape. It seems to work pretty well and eliminates the unenjoyable shape sequences, but it does introduce an element of predictability. There’s now a maximum number of shapes that can occur between two appearances of a given shape, which I can use to plan how I structure the playfield.

Worried that this predictability made the game too easy, I sat down for a lengthy play through. I reached level 19 and completed 194 lines. I can’t remember the last time I managed that on the Game Boy. Nuts. I switched to the Game Boy version to see how I did at that. Despite that version now seeming ugly and slow in comparison to my own, I managed to reach level 20 with 201 lines. It’s not easier after all.


Isometric Demo

I’ve been working on a new little project. It’s a game I’ve been thinking about writing for years that’ll hopefully going to turn out like a cross between the puzzle solving of Dizzy and the exploration of Metroid. Part of the delay in getting started was indecision about the best viewpoint for the game, which I’ve been debating for a long time.

If I opt for a Pokémon-style overhead viewpoint I’ll probably have an easier time with the artwork and the game will be much more sedate, which is something I’m aiming for. On the other hand, it prevents me from including any kind of gravity/jump mechanics, which are an important part of the exploration aspect of the game. If I opt for a Metroid-style sideways viewpoint it’ll allow for jumping but it will also likely introduce annoying Metroid-style platform negotiation that I want to avoid.

Recently I’ve been considering an isometric viewpoint like Head Over Heels or the ZX Spectrum version of Batman. It gives me the advantages of both the Pokémon and Metroid viewpoints, but doesn’t preventing jumping and seriously discourages any finicky platforming action. I’ve decided to give it a try.

I started out with this randomly-generated room:


Rooms are represented as an array of tile structs, which are arranged such that the first element in the array represents the tile at (0,0,0) (the tile at the bottom of the stack at the leftmost side of the image):


The first row of the x axis is the first set of data in the array, followed by the next row, working backwards in the y axis. Once we hit the end of the y axis we move up a level in the z axis and repeat.

The graphics consist of hexagonal bitmaps each representing a single tile in the map. In order to render the map we start with the tile at (0,0,max) and render all tiles to (0,0,0). We then move to (1,0,max) and render to (1,0,0). Once we’ve rendered the entire bottom level of the map we move to (0,1,max) and repeat the sequence again. Simply put, we render from back to front, left to right, bottom to top, using the painter’s algorithm. The map consists of tiles that don’t move and aren’t animated, so we can render it to a buffer and copy chunks of it to the framebuffer when we need to erase things.

Rendering moving objects turned out to be far more involved. The problem is occlusion: sometimes moving objects are in front of background elements and sometimes they’re behind them depending on where the objects are positioned in the 3D space described by the map. Sometimes objects are partially occluded.

The easiest way to handle occlusion is to re-render the entire scene, including moving objects, from back to front each time an object moves. This is inefficient and gets slower in proportion to the complexity of the scene.

A more efficient idea I had was to figure out which tiles were in front of moving objects, render those to separate buffers, and add them to the scene as foreground views. However, this turned out to be trickier to implement than I expected.

Yet another approach I came up with was to use raycasting to render the scene. For each pixel of the screen project a ray into the map and see where the ray intersects a tile. Find that pixel within the tile’s bitmap and draw it to the screen. This approach is promising for two reasons: each pixel is drawn precisely once; and it makes possible interesting enhancements to the game engine, like texture mapping instead of precanned hexagonal bitmaps, and the ability to rotate the map by an arbitrary amount. It’s a much simpler problem than a Wolfenstein-like first-person raycaster because the isometric perspective eliminates all of the complex 3D transforms and trigonometry.

After deciding that a raycaster was overkill I came up with an algorithm that uses a z-buffer instead. It’s split into two parts. First we generate the background image buffer and z-buffer in one step:

  • Create an array of ints with the same dimensions as the rendered bitmap (this is the z-buffer);
  • Fill the z-buffer with -1;
  • For each tile in the map in the order described above:
    • Draw the tile to the buffer;
    • For each pixel drawn to the buffer:
      • Write the depth of the tile (ie the tile count minus the current iteration count of the loop) to the z-buffer at the coordinates of the pixel.

That gives us the background image buffer we need plus a z-buffer in which each value describes the depth of a pixel in the image buffer.

Next we can draw any other objects:

  • Create another array to store the moveable-object z-buffer;
  • For each moveable object:
    • For each point in the left, right and top faces of the object within the 3D space of the map:
      • Calculate the 2D location of the point within the object’s bitmap and grab the pixel color;
      • Calculate the depth of the point;
      • Calculate the 2D location on screen where the pixel will be drawn;
      • Get the depth of the pixel in the background image buffer by looking up the appropriate index in the background z-buffer;
      • Get the depth of the pixel from the moveable object z-buffer;
      • Compare the depths and draw the pixel to the framebuffer if the depth from the moveable object is less than either depth from the z-buffers.

Erasing and redrawing is handled by marking the region that bounds the moveable object bitmap as dirty. When we redraw we blit the relevant background rect into the framebuffer then redraw the moveable objects on top.

Here’s what it looks like so far:

Unfortunately it’s too slow on the DS right now.


A Puzzle Game

If you’re stuck on an aeroplane with nothing to do, what better way to pass the time than to write a new game? I’d written versions of this particular game in Flash and SQL before so creating a new version in C would be more of an exercise in dredging up old memories rather than creating something new (especially as the basics are so trivial to implement), making it an ideal project for a plane flight. I got the game logic working in the air, the rest of the game written over the next few days, and I’ve been mucking around with presentation and fluff for the last few weeks.

Here’s a slightly cropped video of the DS version:

Most of the game works the same way as the monochrome Game Boy version. Personally I think that everything added to the game since the Game Boy version - with the exception of color - has made the game worse, so this version intentionally sticks fairly closely to its minimalistic inspiration. Some of the graphics I redrew from scratch, using the Game Boy Color version as a reference, whilst some started out as screenshots of the GB version. The music came from The Mod Archive; after having no success in using large WAV files of the original audio I was happy to find some great versions of the original music on that site in XM format.

The only aspect of the game I intentionally changed was the responsiveness of the controls. Neither of the Game Boy versions of the game managed to get the controls right. The controls in the GB version of the game were sluggish and unresponsive. The controls in the GBC version grossly overcorrected the problem and made everything move far too quickly. This version finds a happy medium between the two.

I dug out the source code for the Flash version as a reference to see how I’d implemented that, opening it with a hex editor as I haven’t owned a copy of Flash XM or a computer capable of running it for years, but surprisingly there was barely any code in it. Thinking back, the two trickiest parts of the Flash remake were coming up with a function to rotate multidimensional arrays by 90 degrees (in order to rotate the shapes) and trying to increase the automatic drop speed by a constant amount in each level. The Flash version tried to solve the drop speed by using an ugly system in which, instead of regularly dropping the shape every n frames, drops were triggered by a repeating pattern of frames - eg a drop after one frame, then after two frames, then after one frame, etc - in order to simulate fractional frames. Rotating the shapes was tricky because each shape was contained within an array just large enough to accommodate the shape (the box was stored as a 2x2 array while the line was a 1x4 array, for example), so the rotation algorithm had to work with multidimensional arrays whose dimensions changed size as they rotated. Worse, each shape has a rotation point that can fall outside the bounds of its array making it difficult to reposition correctly in the grid once rotated.

For the DS version I avoided the rotation problem entirely by storing all possible rotations of the shapes as pre-canned data in the game, and solved the repositioning problem by representing all shapes using 4x4 arrays. There’s no need to reposition the shapes following a rotation because the arrays are large enough to contain the correct position of each rotation. I solved the drop speed problem by using the frame counts from the Game Boy version as listed over at HardDrop.

Of course, creating a new game led to a number of improvements and enhancements to the underlying libraries, which is the fun but time-consuming part. I wanted to support graphics that were 50% larger on the PSP and 3DS, which meant changing my PNG-to-C tool so that it could convert two PNG files into one C file with #ifdefs to switch automatically between small or large graphics depending on the target platform. I was already embedding two copies of the bitmap data - one for the DS and other platforms and one for the Dreamcast - so I first had to implement real fixes for the hacky system that handled the color space conversions for that platform and get rid of the unnecessary bitmap data.

Once I had support for multiple bitmap sizes I realized I’d also need to be able to support multiple font sizes. I finished off my objc tool for creating fonts from PNGs and added large font support to it.

Supporting XM mod files meant adding mod support to the sound system. That was straight-forward enough: SDL’s mixer library uses libmodplug for mod playback; the DS uses MaxMod; and I’m using MikMod for the PSP. I haven’t figured out mod support for the Dreamcast or any kind of sound for the 3DS yet. Hanky Alien, Chuckie Egg and this game all had their own sound implementations, so I divided the class into two pieces (a data provider and a sound server) and moved the server out into the shared hardware abstraction library. Chuckie Egg and this game now use the server code; Hanky Alien will use it eventually.

Finally, I’ve been adding macros for reducing the amount of boilerplate necessary when creating classes. I’ve also created a tool that will generate new source and header files for classes.

There’s still a bunch of tedious presentation work to do to finish this off. I wrote a hierarchical menu system for Amy Zing that I’d hoped would be the only menu system I’d ever need to write, but naturally the menu system I need here uses a different structure. It’s more like a graph of screens than a hierarchy, in which the decision about which node to visit next depends on logic in whatever object owns the menu instance rather than the option that the user just selected. I’m hoping that the high score system I wrote for Chuckie Egg will just drop in to this game, though I’ll need to write a new view to present it. I need a title screen, a “game over” screen, a “game complete” screen (for the B-type game), some alternative background graphics, launcher assets for the PSP, a two-player mode (on the same device) and music/sound on platforms that currently don’t have it.


Amy Zing and the Amazing Mazes

This is a very little game created solely for one very little toddler who kept harrassing me to draw mazes for him to solve. The effort involved in creating the mazes vastly outweighed the effort it took him to solve them, so I figured I needed to automate it. The game is loosely based on the snail maze built into some revisions of the Sega Master System BIOS but it isn’t a remake. Like the snail maze, the objective is to guide the red square through the maze from the starting point to the finishing point.

Here’s a screenshot:


I used the Kruskal algorithm to generate the mazes. On the linked page the author complains that the algorithm “tends to create a lot of short dead-ends”, but that property makes it perfect for toddlers who don’t have the patience for extensive backtracking when they take the wrong path.

The game itself took almost no time at all to write; excluding libraries and presentation fluff the game weighs in at around 600 lines of C. The most complex tasks were fixing a weird rendering bug (that turned out to be a missing conversion between co-ordinate systems in the layer library) and writing a bunch of transitions between scenes. The first, and most troublesome, transition looks just like this one I made 7 years ago in JavaScript (hit refresh if the image doesn’t show up; the script doesn’t preload the image), but with the addition of fade out as the image disappears. Unfortunately that proved too complex for both the DS and the 3DS to render at 60fps so I replaced it with a simpler cross-fade transition.

There’s no title screen bitmap and the player’s character is just a red box. I’d intended to come up with a pixel art title screen showing Amy Zing herself, and an Amy Zing sprite that would wander around the mazes, but that’s beyond the limit of my drawing ability (if anyone is interested in contributing a title screen and a sprite in 3 sizes, let me know).

Tinkering with menu systems, transitions, difficulty levels, presentation and general polish took long enough that the toddler in question has long since lost interest in solving mazes.

Download it here:


SZLib Super

Something missing from SZLib is a way for subclasses to call the implementation of a method they’ve overridden. For example, in Objective-C:

@interface ClassA : NSObject

- (void)printMe;


@interface ClassB : ClassA

@interface ClassC : ClassB

@implementation ClassA

- (void)printMe {


@implementation ClassB

@implementation ClassC

- (void)printMe {
    [super printMe];


int main(int argc, const char *argv[]) {
    @autoreleasepool {
        ClassC *c = [[ClassC alloc] init];
        [c printMe];

    return 0;

The output of this program is ClassAClassC. The call to super in ClassC is smart enough to realize that ClassB doesn’t implement the method and will fall back to the implementation in ClassA. super can be used recursively to move up through the class hierarchy despite the type of the object being acted upon not changing.

I’ve been trying to come up with a way of supporting something like this in SZLib, but it turns out that it’s tricky to implement tidily without compiler support.

First, I moved the callbacks struct out of the objects that I create and into their metaclasses. Here’s the callback struct from SZObject:

typedef struct {
    int (*equals)(SZTypeRef, SZTypeRef);
    int (*compare)(SZTypeRef, SZTypeRef);
    unsigned long (*hash)(SZTypeRef);
    SZTypeRef (*copy)(SZTypeRef);
    void (*dealloc)(SZTypeRef);
    char *(*description)(SZTypeRef);
} SZObjectCallbacks;

Previously I’d pass this into the object’s initializer:

SZTypeRef SZObjectInit(SZTypeRef ref, const SZObjectCallbacks *callbacks);

Aside from being awkward, this meant that each object could only reference the single callback struct passed to it in its initializer. There was no way to work back through the class hierarchy and examine the callbacks of any superclasses. The workaround I used was for classes to expose their callback functions and allow subclasses to call them (so SZSomeObject had a function __SZSomeObjectEquals() exposed in its header, which subclasses of SZSomeObject could call in their own equals implementation). This too was awkward, as it exposed the internals of the superclasses and required that subclasses knew more than they really needed to about the implementation of their superclass.

Callbacks are now passed to the allocator, not the initializer, as part of the metaclass:

typedef struct __SZMetaClass {
    const struct __SZMetaClass *parentClass;
    const char *className;
    const SZObjectCallbacks *callbacks;
} SZMetaClass;

SZTypeRef SZObjectAllocate(size_t size, const SZMetaClass *metaclass);

Each metaclass has a pointer to its superclass, instantly giving us a hierarchy we can walk.

Now that we have a hierarchy we still need to be able to:

  • Call the superclass’ implementation of a given function from an overriding function in a subclass;
  • Walk back up the hierarchy if we find that a superclass doesn’t implement a given function until we find a class that does;
  • Allow recursive calls of the function so that each subclass in the hierachy can call its superclass’ implementation.

This is last point in particular is harder than it sounds. The obvious approach fails miserably:

void SZSomeObjectEquals(SZTypeRef ref1, SZTypeRef ref2) {
    SZObjectRef obj1 = (SZObjectRef)ref1;
    SZObjectRef obj2 = (SZObjectRef)ref2;

    // Attempt to call "super" implementation of "equals"
    return obj1->metaclass->parentClass->equals(obj1, obj2);

Accessing the parentClass in this way gives us the same metaclass each time, so if the superclass’ implementation of equals calls its own superclass it’ll get stuck in a loop. We’d need to pass the metaclass as an argument to the function so that we can update it on each successive call:

void SZSomeObjectEquals(SZTypeRef ref1, SZTypeRef ref2) {
    SZObjectRef obj1 = (SZObjectRef)ref1;
    SZObjectRef obj2 = (SZObjectRef)ref2;

    // Call the immediate superclass' implementation of "equals"
    return obj1->metaclass->parentClass->equals(obj1, obj2, obj->metaclass->parentClass);

That implies a public API that doesn’t include a metaclass parameter vs an internal API that expects a metaclass parameter.

We also need to deal with the possibility that some classes have opted not to override the function, so we end up with something like this in SZObject:

int SZObjectEqualsUsingMetaClass(SZTypeRef ref1, SZTypeRef ref2, const SZMetaClass *metaclass) {
    const SZObjectRef obj1 = (SZObjectRef)ref1;
    const SZObjectRef obj2 = (SZObjectRef)ref2;

    // Hunt for the next valid implementation of "equals" in the superclass hierarchy
    while (metaclass && metaclass->parentClass && !metaclass->callbacks->equals) {
        metaclass = metaclass->parentClass;

    // Call the implementation of "equals" that we found
    return metaclass->callbacks->equals(obj1, obj2, metaclass);

We’d use that like this:

void SZSomeObjectEquals(SZSomeObject obj1, SZSomeObject obj2) {

    // Call the "super" implementation of "equals"
    return SZObjectEqualsUsingMetaClass(obj1, obj2, obj.object->metaclass->parentClass);

Our callback struct has to accept the metaclass as a parameter in each callback function:

typedef struct {
    int (*equals)(SZTypeRef, SZTypeRef, const SZMetaClass *);
    int (*compare)(SZTypeRef, SZTypeRef, const SZMetaClass *);
    unsigned long (*hash)(SZTypeRef, const SZMetaClass *);
    SZTypeRef (*copy)(SZTypeRef, const SZMetaClass *);
    void (*dealloc)(SZTypeRef, const SZMetaClass *);
    char *(*description)(SZTypeRef, const SZMetaClass *);
} SZObjectCallbacks;

This is still a little awkward:

  • I’d rather hide the metaclass stuff somehow rather than force each overridden function to deal with it;
  • There needs to be a function to lookup the correct callback for each callback;
  • Callbacks need to be looked up each time the function is called.

The overhead of the lookup isn’t significant so I’m not worried about that, at least with the shallow class hierarchies I’m building. The proliferation of additional functions is unfortunate but I don’t see a way around it in C (I came up with a macro that would generate lookup functions automatically, but it transpires that the C preprocessor doesn’t allow the token pasting operator to inject struct fields (obj->metaclass->##name## is forbidden) which killed that idea).

In spite of the limitations this system works well enough, and achieves the initial goals:

  • Superclasses can stop exposing their implementations of the various callbacks;
  • Subclasses can call the standard …UsingMetaClass() functions instead of trying to figure out what their superclasses are doing;
  • Subclasses can specify NULL as their callback function and the system will gracefully fall back to the superclass’ implementation.


Replacing CrashPlan

It’s old news now, but CrashPlan are shutting down their consumer backup business. This is unfortunate because, despite its reliance on Java, CrashPlan is my backup system of choice for a number of reasons:

  • Backup to any computer running CrashPlan in addition to CrashPlan’s servers;
  • Multiple computers on a single plan;
  • Unlimited storage;
  • User-defined encryption key;
  • Backup external disks;
  • iOS app;
  • Reasonably priced;
  • Simple restore process.

I’ve been looking around for a replacement system. CrashPlan themselves recommend two options: their “CrashPlan for small business” offering, and former competitors Carbonite. Neither one matches my requirements. The CrashPlan alternative has two huge drawbacks:

  • Instead of being ~$14/month to backup all of my computers, it’d cost $10/month per computer (which works out as $40/month for me).
  • It can’t backup to other computers running CrashPlan, so I’d need a second system for local backups.

If the feature set was the same I’d consider keeping CrashPlan, but they’re charging more for less.

Carbonite has its own set of limitations:

  • The $5/month service doesn’t backup external drives, so I’d need the $8/month service (per computer);
  • The $8 service can only backup one external hard drive;
  • Files over 4GB aren’t backed up automatically;
  • It can only backup to Carbonite’s servers;
  • It only backs up files in your user directory.

It’s cheaper than CrashPlan’s own option, but a backup system that requires a greater-than-zero amount of effort to maintain after the initial setup is a bad idea.

I’m currently evaluating BackBlaze as a possible replacement. It retains many of CrashPlan’s essential capabilities:

  • Unlimited storage;
  • User-defined encryption key;
  • Backs up external disks;
  • iOS app;
  • Simple restore process (much improved since the last time I evaluated it).

It’s priced at $5/month per computer, which makes it more expensive than the old CrashPlan service but cheaper than the other options, and as a bonus it’s a native app.

That takes care of online backup, but what about a local backup? Local backups were unique to the consumer version of CrashPlan, so regardless of what I switch to I’m going to have to run two systems. I have a MacMini that acts as a backup server, so I need something that can backup to that. Time Machine would be ideal, except its GUI melts my GPU and it won’t reliably backup for me across a network (which is why I switched away from it and started using CrashPlan for local backups in the first place).

Arq looks like it might work out. It seems to do everything TimeMachine does, except the UI is a standard Mac app and it backs up via SFTP.



Just in time for it to be killed off by Nintendo in favour of the Switch (if you could ever find them in the shops; grumble grumble), Hanky Alien and Chuckie Egg now run on the 3DS. Getting the new platform up and running was pretty easy. Input was straightforward: the d-pad required switching to a couple of new functions for reading the button state but otherwise the constants are the same in libctru as those in libnds. Touch somehow just worked, which was surprising.

The hardest part was getting the graphics up and running. Not only does the 3DS introduce yet another set of RGB encodings, but the output of the LCDs is rotated 90 degrees from the VRAM. What’s that? You have a bunch of 2D graphics functions that are all optimized around VRAM being laid out in rows, from left to right? Bwahaha! Now VRAM is organized into columns, from bottom to top.

Though graphics were the trickiest part, the most laborious exercise was checking that the projects continued to build correctly for all 5 of the currently supported platforms (DS, 3DS, PSP, Dreamcast and SDL). I really need to come up with a better build system for the PSP and Dreamcast, and it’d be neat if devKitARM supported makefile names other than “makefile”.

There are a few things still to do:

  • libctru introduces yet another sound API, so there’s no sound yet;
  • SZLib doesn’t cater for the possibility that a device has multiple screens with different dimensions, so the bottom screen is off-center.

Anyhow, here are early 3DS builds, which I’ve tested out on my old 3DS XL and Citra (which is shaping up to be a great homebrew tool):