2018-03-31

# A Failed Experiment

While trying to come up with more optimizations for SZLib’s drawing system I came across something called “scanline coherence shape algebra”, which is an algorithm described in a book called Graphics Gems. It attempts to provide an efficient algorithm for computing “the visible regions of overlapping windows”. Aha, exactly what I’m looking for. I had to try it out.

In brief, the algorithm stores the shape as a list of “spans”. Each span consists of a y co-ordinate and a list of “segments”, which are just x co-ordinates. Spans are sorted by their y value and segments are sorted by their x value. Put together, they describe an arbitrary shape. The algorithm supports union, intersection and subtraction operations.

For the first pass at an implementation I followed the algorithm described in the book (as far as I could; the pseudocode for the special cases is too vague to implement in some places). I used linked lists to represent the spans and segments, and of course that required an overdose of `malloc()` to create each node in the list. For the second pass I switched to using arrays to store the data and kept a pool of pre-allocated arrays to limit `malloc()` as much as possible. That was a little faster, but still not fast enough. For the last pass I switched back to linked lists but built a version that uses a combination of a pair of pre-allocated arrays and couple of stacks to store the linked lists and eliminate all calls to `malloc()`.

So, how does the new shape algebra code compare with the existing code? It’s really, really, amazingly bad. Really bad. On my development machine, adding rects to the damaged rect manager in my test application using the existing code averages about 5ms of CPU time per 1s of wallclock time. The new code averages about 100ms of CPU time per 1s of wallclock time. The existing code is 20x faster than the new algorithm.

At that point I gave up on it. It’s likely that rendering damaged areas would be faster using data produced by the new code as it automatically consolidates adjacent rectangles, but given that this single operation uses 20% more CPU time than the entirety of the rest of the test application it’s completely impractical.

It did give me a few ideas for other optimizations for identifying intersection failures, though:

• Sorting the rectangles within the quad tree first by their y and then their x axis;
• Maintaining a rectangle that represents the bounds of the damaged area within each quad tree node.

2018-03-19

I spent a couple of weeks working on improving the performance of SZLib. Specifically, I wanted to eliminate two of the major bottlenecks in the system: recording areas to be redrawn; and determining which layers within the hierarchy will draw those areas.

The algorithm for recording damaged areas looks like this:

• Add the rect to the damaged rect manager:
• For each existing damaged rect:
• Remove the intersection from the incoming rect.
• Add the remaining areas of the damaged rect to the damaged rect array.

The algorithm for determining the layer responsible for redrawing a rect looks like this:

• For each rect in the damaged rect manager:
• For each layer in the layer hierarchy:
• Add the layer/rect intersection tuple to an array.
• Remove the intersection from the damaged rect manager.

There are obvious issues in both of these algorithms. In the first we have to compare every new damaged rect with every existing damaged rect. In the second we have to compare every damaged rect with every layer, which gives us a worst-case complexity of O(n^2).

My solution to both problems was to create a simple quad tree. I’d tried a complex quad tree before but it failed miserably. This time I:

• Stored only rectangles, not rectangles and the layers that they represented;
• Created the entire fixed-depth quad tree immediately, rather than trying dynamically to split and join nodes;
• Reset and re-used quad tree instances instead of creating new trees each time I needed one;
• Treated the quad tree as an immutable structure once it had been built by eliminating methods that would remove rectangles.

The time complexity of the two algorithms above is now the worst-case rather than every case. This change, coupled with some other minor optimizations, produced some great results.

I put together a test app that has 80 overlapping Hanky Alien bitmaps moving around the screen with a few nested text boxes. Numbers from my Mac suggest that the test app is around 40% faster now.

Some more numbers:

• Ignoring time spent in SDL and during vertical blanks, the test takes up less than half a second of CPU time for every 10 seconds of runtime on my Mac.
• Rendering time for every 10 seconds of runtime is about 30ms on my Mac.
• The PSP can handle moving 68 aliens at a solid 60fps.
• The DS can handle moving 13 aliens at a solid 60fps.

2018-01-29

# Isometric Demo

I’ve been working on a new little project. It’s a game I’ve been thinking about writing for years that’ll hopefully going to turn out like a cross between the puzzle solving of Dizzy and the exploration of Metroid. Part of the delay in getting started was indecision about the best viewpoint for the game, which I’ve been debating for a long time.

If I opt for a Pokémon-style overhead viewpoint I’ll probably have an easier time with the artwork and the game will be much more sedate, which is something I’m aiming for. On the other hand, it prevents me from including any kind of gravity/jump mechanics, which are an important part of the exploration aspect of the game. If I opt for a Metroid-style sideways viewpoint it’ll allow for jumping but it will also likely introduce annoying Metroid-style platform negotiation that I want to avoid.

Recently I’ve been considering an isometric viewpoint like Head Over Heels or the ZX Spectrum version of Batman. It gives me the advantages of both the Pokémon and Metroid viewpoints, but doesn’t preventing jumping and seriously discourages any finicky platforming action. I’ve decided to give it a try.

I started out with this randomly-generated room:

Rooms are represented as an array of tile structs, which are arranged such that the first element in the array represents the tile at (0,0,0) (the tile at the bottom of the stack at the leftmost side of the image):

The first row of the x axis is the first set of data in the array, followed by the next row, working backwards in the y axis. Once we hit the end of the y axis we move up a level in the z axis and repeat.

The graphics consist of hexagonal bitmaps each representing a single tile in the map. In order to render the map we start with the tile at (0,0,max) and render all tiles to (0,0,0). We then move to (1,0,max) and render to (1,0,0). Once we’ve rendered the entire bottom level of the map we move to (0,1,max) and repeat the sequence again. Simply put, we render from back to front, left to right, bottom to top, using the painter’s algorithm. The map consists of tiles that don’t move and aren’t animated, so we can render it to a buffer and copy chunks of it to the framebuffer when we need to erase things.

Rendering moving objects turned out to be far more involved. The problem is occlusion: sometimes moving objects are in front of background elements and sometimes they’re behind them depending on where the objects are positioned in the 3D space described by the map. Sometimes objects are partially occluded.

The easiest way to handle occlusion is to re-render the entire scene, including moving objects, from back to front each time an object moves. This is inefficient and gets slower in proportion to the complexity of the scene.

A more efficient idea I had was to figure out which tiles were in front of moving objects, render those to separate buffers, and add them to the scene as foreground views. However, this turned out to be trickier to implement than I expected.

Yet another approach I came up with was to use raycasting to render the scene. For each pixel of the screen project a ray into the map and see where the ray intersects a tile. Find that pixel within the tile’s bitmap and draw it to the screen. This approach is promising for two reasons: each pixel is drawn precisely once; and it makes possible interesting enhancements to the game engine, like texture mapping instead of precanned hexagonal bitmaps, and the ability to rotate the map by an arbitrary amount. It’s a much simpler problem than a Wolfenstein-like first-person raycaster because the isometric perspective eliminates all of the complex 3D transforms and trigonometry.

After deciding that a raycaster was overkill I came up with an algorithm that uses a z-buffer instead. It’s split into two parts. First we generate the background image buffer and z-buffer in one step:

• Create an array of ints with the same dimensions as the rendered bitmap (this is the z-buffer);
• Fill the z-buffer with -1;
• For each tile in the map in the order described above:
• Draw the tile to the buffer;
• For each pixel drawn to the buffer:
• Write the depth of the tile (ie the tile count minus the current iteration count of the loop) to the z-buffer at the coordinates of the pixel.

That gives us the background image buffer we need plus a z-buffer in which each value describes the depth of a pixel in the image buffer.

Next we can draw any other objects:

• Create another array to store the moveable-object z-buffer;
• For each moveable object:
• For each point in the left, right and top faces of the object within the 3D space of the map:
• Calculate the 2D location of the point within the object’s bitmap and grab the pixel color;
• Calculate the depth of the point;
• Calculate the 2D location on screen where the pixel will be drawn;
• Get the depth of the pixel in the background image buffer by looking up the appropriate index in the background z-buffer;
• Get the depth of the pixel from the moveable object z-buffer;
• Compare the depths and draw the pixel to the framebuffer if the depth from the moveable object is less than either depth from the z-buffers.

Erasing and redrawing is handled by marking the region that bounds the moveable object bitmap as dirty. When we redraw we blit the relevant background rect into the framebuffer then redraw the moveable objects on top.

Here’s what it looks like so far:

Unfortunately it’s too slow on the DS right now.

2018-01-27

# A Puzzle Game

If you’re stuck on an aeroplane with nothing to do, what better way to pass the time than to write a new game? I’d written versions of this particular game in Flash and SQL before so creating a new version in C would be more of an exercise in dredging up old memories rather than creating something new (especially as the basics are so trivial to implement), making it an ideal project for a plane flight. I got the game logic working in the air, the rest of the game written over the next few days, and I’ve been mucking around with presentation and fluff for the last few weeks.

Here’s a slightly cropped video of the DS version:

Most of the game works the same way as the monochrome Game Boy version. Personally I think that everything added to the game since the Game Boy version - with the exception of color - has made the game worse, so this version intentionally sticks fairly closely to its minimalistic inspiration. Some of the graphics I redrew from scratch, using the Game Boy Color version as a reference, whilst some started out as screenshots of the GB version. The music came from The Mod Archive; after having no success in using large WAV files of the original audio I was happy to find some great versions of the original music on that site in XM format.

The only aspect of the game I intentionally changed was the responsiveness of the controls. Neither of the Game Boy versions of the game managed to get the controls right. The controls in the GB version of the game were sluggish and unresponsive. The controls in the GBC version grossly overcorrected the problem and made everything move far too quickly. This version finds a happy medium between the two.

I dug out the source code for the Flash version as a reference to see how I’d implemented that, opening it with a hex editor as I haven’t owned a copy of Flash XM or a computer capable of running it for years, but surprisingly there was barely any code in it. Thinking back, the two trickiest parts of the Flash remake were coming up with a function to rotate multidimensional arrays by 90 degrees (in order to rotate the shapes) and trying to increase the automatic drop speed by a constant amount in each level. The Flash version tried to solve the drop speed by using an ugly system in which, instead of regularly dropping the shape every n frames, drops were triggered by a repeating pattern of frames - eg a drop after one frame, then after two frames, then after one frame, etc - in order to simulate fractional frames. Rotating the shapes was tricky because each shape was contained within an array just large enough to accommodate the shape (the box was stored as a 2x2 array while the line was a 1x4 array, for example), so the rotation algorithm had to work with multidimensional arrays whose dimensions changed size as they rotated. Worse, each shape has a rotation point that can fall outside the bounds of its array making it difficult to reposition correctly in the grid once rotated.

For the DS version I avoided the rotation problem entirely by storing all possible rotations of the shapes as pre-canned data in the game, and solved the repositioning problem by representing all shapes using 4x4 arrays. There’s no need to reposition the shapes following a rotation because the arrays are large enough to contain the correct position of each rotation. I solved the drop speed problem by using the frame counts from the Game Boy version as listed over at HardDrop.

Of course, creating a new game led to a number of improvements and enhancements to the underlying libraries, which is the fun but time-consuming part. I wanted to support graphics that were 50% larger on the PSP and 3DS, which meant changing my PNG-to-C tool so that it could convert two PNG files into one C file with `#ifdef`s to switch automatically between small or large graphics depending on the target platform. I was already embedding two copies of the bitmap data - one for the DS and other platforms and one for the Dreamcast - so I first had to implement real fixes for the hacky system that handled the color space conversions for that platform and get rid of the unnecessary bitmap data.

Once I had support for multiple bitmap sizes I realized I’d also need to be able to support multiple font sizes. I finished off my objc tool for creating fonts from PNGs and added large font support to it.

Supporting XM mod files meant adding mod support to the sound system. That was straight-forward enough: SDL’s mixer library uses libmodplug for mod playback; the DS uses MaxMod; and I’m using MikMod for the PSP. I haven’t figured out mod support for the Dreamcast or any kind of sound for the 3DS yet. Hanky Alien, Chuckie Egg and this game all had their own sound implementations, so I divided the class into two pieces (a data provider and a sound server) and moved the server out into the shared hardware abstraction library. Chuckie Egg and this game now use the server code; Hanky Alien will use it eventually.

Finally, I’ve been adding macros for reducing the amount of boilerplate necessary when creating classes. I’ve also created a tool that will generate new source and header files for classes.

There’s still a bunch of tedious presentation work to do to finish this off. I wrote a hierarchical menu system for Amy Zing that I’d hoped would be the only menu system I’d ever need to write, but naturally the menu system I need here uses a different structure. It’s more like a graph of screens than a hierarchy, in which the decision about which node to visit next depends on logic in whatever object owns the menu instance rather than the option that the user just selected. I’m hoping that the high score system I wrote for Chuckie Egg will just drop in to this game, though I’ll need to write a new view to present it. I need a title screen, a “game over” screen, a “game complete” screen (for the B-type game), some alternative background graphics, launcher assets for the PSP, a two-player mode (on the same device) and music/sound on platforms that currently don’t have it.

2017-11-28

# Amy Zing and the Amazing Mazes

This is a very little game created solely for one very little toddler who kept harrassing me to draw mazes for him to solve. The effort involved in creating the mazes vastly outweighed the effort it took him to solve them, so I figured I needed to automate it. The game is loosely based on the snail maze built into some revisions of the Sega Master System BIOS but it isn’t a remake. Like the snail maze, the objective is to guide the red square through the maze from the starting point to the finishing point.

Here’s a screenshot:

I used the Kruskal algorithm to generate the mazes. On the linked page the author complains that the algorithm “tends to create a lot of short dead-ends”, but that property makes it perfect for toddlers who don’t have the patience for extensive backtracking when they take the wrong path.

The game itself took almost no time at all to write; excluding libraries and presentation fluff the game weighs in at around 600 lines of C. The most complex tasks were fixing a weird rendering bug (that turned out to be a missing conversion between co-ordinate systems in the layer library) and writing a bunch of transitions between scenes. The first, and most troublesome, transition looks just like this one I made 7 years ago in JavaScript (hit refresh if the image doesn’t show up; the script doesn’t preload the image), but with the addition of fade out as the image disappears. Unfortunately that proved too complex for both the DS and the 3DS to render at 60fps so I replaced it with a simpler cross-fade transition.

There’s no title screen bitmap and the player’s character is just a red box. I’d intended to come up with a pixel art title screen showing Amy Zing herself, and an Amy Zing sprite that would wander around the mazes, but that’s beyond the limit of my drawing ability (if anyone is interested in contributing a title screen and a sprite in 3 sizes, let me know).

Tinkering with menu systems, transitions, difficulty levels, presentation and general polish took long enough that the toddler in question has long since lost interest in solving mazes.

2017-11-02

# SZLib Super

Something missing from SZLib is a way for subclasses to call the implementation of a method they’ve overridden. For example, in Objective-C:

``````@interface ClassA : NSObject

- (void)printMe;

@end

@interface ClassB : ClassA
@end

@interface ClassC : ClassB
@end

@implementation ClassA

- (void)printMe {
printf("ClassA");
}

@end

@implementation ClassB
@end

@implementation ClassC

- (void)printMe {
[super printMe];
printf("ClassC");
}

@end

int main(int argc, const char *argv[]) {
@autoreleasepool {
ClassC *c = [[ClassC alloc] init];
[c printMe];
}

return 0;
}
``````

The output of this program is `ClassAClassC`. The call to `super` in ClassC is smart enough to realize that ClassB doesn’t implement the method and will fall back to the implementation in ClassA. `super` can be used recursively to move up through the class hierarchy despite the type of the object being acted upon not changing.

I’ve been trying to come up with a way of supporting something like this in SZLib, but it turns out that it’s tricky to implement tidily without compiler support.

First, I moved the callbacks struct out of the objects that I create and into their metaclasses. Here’s the callback struct from SZObject:

``````typedef struct {
int (*equals)(SZTypeRef, SZTypeRef);
int (*compare)(SZTypeRef, SZTypeRef);
unsigned long (*hash)(SZTypeRef);
SZTypeRef (*copy)(SZTypeRef);
void (*dealloc)(SZTypeRef);
char *(*description)(SZTypeRef);
} SZObjectCallbacks;
``````

Previously I’d pass this into the object’s initializer:

``````SZTypeRef SZObjectInit(SZTypeRef ref, const SZObjectCallbacks *callbacks);
``````

Aside from being awkward, this meant that each object could only reference the single callback struct passed to it in its initializer. There was no way to work back through the class hierarchy and examine the callbacks of any superclasses. The workaround I used was for classes to expose their callback functions and allow subclasses to call them (so SZSomeObject had a function __SZSomeObjectEquals() exposed in its header, which subclasses of SZSomeObject could call in their own `equals` implementation). This too was awkward, as it exposed the internals of the superclasses and required that subclasses knew more than they really needed to about the implementation of their superclass.

Callbacks are now passed to the allocator, not the initializer, as part of the metaclass:

``````typedef struct __SZMetaClass {
const struct __SZMetaClass *parentClass;
const char *className;
const SZObjectCallbacks *callbacks;
} SZMetaClass;

SZTypeRef SZObjectAllocate(size_t size, const SZMetaClass *metaclass);
``````

Each metaclass has a pointer to its superclass, instantly giving us a hierarchy we can walk.

Now that we have a hierarchy we still need to be able to:

• Call the superclass’ implementation of a given function from an overriding function in a subclass;
• Walk back up the hierarchy if we find that a superclass doesn’t implement a given function until we find a class that does;
• Allow recursive calls of the function so that each subclass in the hierachy can call its superclass’ implementation.

This is last point in particular is harder than it sounds. The obvious approach fails miserably:

``````void SZSomeObjectEquals(SZTypeRef ref1, SZTypeRef ref2) {
SZObjectRef obj1 = (SZObjectRef)ref1;
SZObjectRef obj2 = (SZObjectRef)ref2;

// Attempt to call "super" implementation of "equals"
return obj1->metaclass->parentClass->equals(obj1, obj2);
}
``````

Accessing the `parentClass` in this way gives us the same metaclass each time, so if the superclass’ implementation of `equals` calls its own superclass it’ll get stuck in a loop. We’d need to pass the metaclass as an argument to the function so that we can update it on each successive call:

``````void SZSomeObjectEquals(SZTypeRef ref1, SZTypeRef ref2) {
SZObjectRef obj1 = (SZObjectRef)ref1;
SZObjectRef obj2 = (SZObjectRef)ref2;

// Call the immediate superclass' implementation of "equals"
return obj1->metaclass->parentClass->equals(obj1, obj2, obj->metaclass->parentClass);
}
``````

That implies a public API that doesn’t include a metaclass parameter vs an internal API that expects a metaclass parameter.

We also need to deal with the possibility that some classes have opted not to override the function, so we end up with something like this in SZObject:

``````int SZObjectEqualsUsingMetaClass(SZTypeRef ref1, SZTypeRef ref2, const SZMetaClass *metaclass) {
const SZObjectRef obj1 = (SZObjectRef)ref1;
const SZObjectRef obj2 = (SZObjectRef)ref2;

// Hunt for the next valid implementation of "equals" in the superclass hierarchy
while (metaclass && metaclass->parentClass && !metaclass->callbacks->equals) {
metaclass = metaclass->parentClass;
}

// Call the implementation of "equals" that we found
return metaclass->callbacks->equals(obj1, obj2, metaclass);
}
``````

We’d use that like this:

``````void SZSomeObjectEquals(SZSomeObject obj1, SZSomeObject obj2) {

// Call the "super" implementation of "equals"
return SZObjectEqualsUsingMetaClass(obj1, obj2, obj.object->metaclass->parentClass);
}
``````

Our callback struct has to accept the metaclass as a parameter in each callback function:

``````typedef struct {
int (*equals)(SZTypeRef, SZTypeRef, const SZMetaClass *);
int (*compare)(SZTypeRef, SZTypeRef, const SZMetaClass *);
unsigned long (*hash)(SZTypeRef, const SZMetaClass *);
SZTypeRef (*copy)(SZTypeRef, const SZMetaClass *);
void (*dealloc)(SZTypeRef, const SZMetaClass *);
char *(*description)(SZTypeRef, const SZMetaClass *);
} SZObjectCallbacks;
``````

This is still a little awkward:

• I’d rather hide the metaclass stuff somehow rather than force each overridden function to deal with it;
• There needs to be a function to lookup the correct callback for each callback;
• Callbacks need to be looked up each time the function is called.

The overhead of the lookup isn’t significant so I’m not worried about that, at least with the shallow class hierarchies I’m building. The proliferation of additional functions is unfortunate but I don’t see a way around it in C (I came up with a macro that would generate lookup functions automatically, but it transpires that the C preprocessor doesn’t allow the token pasting operator to inject struct fields (`obj->metaclass->##name##` is forbidden) which killed that idea).

In spite of the limitations this system works well enough, and achieves the initial goals:

• Superclasses can stop exposing their implementations of the various callbacks;
• Subclasses can call the standard …UsingMetaClass() functions instead of trying to figure out what their superclasses are doing;
• Subclasses can specify `NULL` as their callback function and the system will gracefully fall back to the superclass’ implementation.

2017-10-15

# SZLib 3DS

Just in time for it to be killed off by Nintendo in favour of the Switch (if you could ever find them in the shops; grumble grumble), Hanky Alien and Chuckie Egg now run on the 3DS. Getting the new platform up and running was pretty easy. Input was straightforward: the d-pad required switching to a couple of new functions for reading the button state but otherwise the constants are the same in libctru as those in libnds. Touch somehow just worked, which was surprising.

The hardest part was getting the graphics up and running. Not only does the 3DS introduce yet another set of RGB encodings, but the output of the LCDs is rotated 90 degrees from the VRAM. What’s that? You have a bunch of 2D graphics functions that are all optimized around VRAM being laid out in rows, from left to right? Bwahaha! Now VRAM is organized into columns, from bottom to top.

Though graphics were the trickiest part, the most laborious exercise was checking that the projects continued to build correctly for all 5 of the currently supported platforms (DS, 3DS, PSP, Dreamcast and SDL). I really need to come up with a better build system for the PSP and Dreamcast, and it’d be neat if devKitARM supported makefile names other than “makefile”.

There are a few things still to do:

• libctru introduces yet another sound API, so there’s no sound yet;
• SZLib doesn’t cater for the possibility that a device has multiple screens with different dimensions, so the bottom screen is off-center.

Anyhow, here are early 3DS builds, which I’ve tested out on my old 3DS XL and Citra (which is shaping up to be a great homebrew tool):

2016-12-29

# Transparency in SZLib

One of the neat features in early Woopsi was screen dimming. The class responsible read each overlapped pixel from the framebuffer, adjusted its brightness, and wrote it back. I eventually had to kill the feature when I changed how the rendering system worked and broke any support for transparency.

Woopsi’s drawing algorithm (essentially a reversed painter’s algorithm) looks like this:

• Pass in a rect that describes the area to be redrawn;
• For each layer in the layer hierarchy, from the topmost down:
• Draw the parts of the layer that intersect with the rect;
• Subtract the drawn parts from the rect.

Each layer is expected to own all pixels within its rectangular region. To support transparency, there would need to be a way for layers to opt out of subtracting themselves from the redraw rect so that lower layers could draw into the same area. Even if there were a way to achieve that, we’re traversing the hierarchy the wrong way. Transparent views need to render over the top of existing framebuffer data. We need to draw from the bottom up.

The CanvasLayers JavaScript library has a workaround but it’s really inefficient. When setting the library up, it includes the option to support transparency. With that enabled, this is the algorithhm:

• Pass in a rect that describes the area to be redrawn;
• For each layer in the layer hierarchy, from the bottommost up:
• Draw the parts of the layer that intersect the rect.

It’s just the painter’s algorithm. Its awfulness is slightly mitigated by all of the dirty rectangle work that the library does, but we’re still pointlessly drawing a bunch of stuff below opaque layers. We’re also forced to maintain two algorithms to do the same work.

SZLib needed to support transparency in order for overlapping objects in Hanky Alien and Chuckie Egg to work, so I put some thought into the problem and arrived at the obvious (with hindsight) solution: a two-pass algorithm. In pass 1, we iterate over the layers from top-down to figure out which layer will be responsible for drawing what. In pass 2, we iterate over pass 1’s results from bottom-up and perform the drawing.

Pass 1 looks like this:

• Pass in a rect that describes the area to be redrawn;
• Create an array of layer/rect tuples;
• For each layer in the layer hierarchy, from the topmost down:
• Find the intersection of the layer with the rect;
• Create a tuple containing the intersection and the layer;
• If the layer is opaque, remove the intersection from the rect.

Note that we don’t consider an intersection to be “used” if the layer is transparent, which allows lower layers to create their own tuple in subsequent iterations.

In pass 2, we iterate backwards over the tuple array and redraw each tuple. Pass 1 gave us an array ordered by z-index from high to low; by iterating backwards we therefore redraw from the bottom up. Lower opaque layers are drawn before higher transparent layers.

This new algorithm supports transparency with the minimum amount of redundant drawing.

2015-09-07

# HankyAlien - A New DS Game

Here’s a new homebrew game for the Nintendo DS. You should be able to tell exactly what the game is from the screenshots. Download it here:

# Diary of a Game

It occurred to me that I’d never written a large project exclusively in C. I’ve written little utilities and libraries in C, and used it occasionally in C++ and Objective-C programs, but a whole project that’s just C? Nope. I’d promised myself I was done with remakes and DS programming, so why not try writing a new remake for the DS in C?

The first thing I needed to do was write my own core libraries from scratch, because that’s always the interesting part. Abstract away some of the C nastiness. First of all, I wanted to use reference counting everywhere. The model used in Woopsi, in which the framework tries to own everything, is efficient for the CPU but eventually results in some really nasty problems for the programmer. I still have no idea who owns a Woopsi font object.

# SZLib

Reference counting is very easy to implement (it is the way I implemented it, anyway). Start off with a basic object that looks something like this:

``````typedef struct __SZObject {
unsigned int refCount;
} SZObject;
``````

Next, add a few functions to create/retain/release them:

``````SZObject *SZObjectCreate() {
SZObject *ref = malloc(sizeof(SZObject));
ref->refCount = 1;

return ref;
}

void SZObjectRetain(void *ref) {
SZObject *object = (SZObject *)ref;
++object->refCount;
}

void SZObjectRelease(void *ref) {
SZObject *object = (SZObject *)ref;
assert(object->refCount > 0);

--object->refCount;

if (object->refCount == 0) {
free(object);
}
}
``````

It’s easy to embed reference counts into any object by taking advantage of a feature of C’s layout of structs in memory:

``````typedef struct __SZThing {
SZObject object;
char *data;
} SZThing;
``````

I can cast a variable of type SZThing* to an SZObject* and treat it as though the former is an instance of the latter. The SZThing struct has all of the properties of the SZObject since the first field in the struct is an SZObject. Neat.

There’s a little more to it than this, of course. If the constructor function for the SZThing object mallocs its data property, then the release function needs to free that memory. This is all done with a callback struct that looks something like this:

``````typedef struct __SZObjectCallbacks {
int (*equals)(const void *, const void *);
int (*compare)(const void *, const void *);
unsigned long (*hash)(const void *);
void *(*copy)(const void*);
void (*dealloc)(void *);
} SZObjectCallbacks;
``````

Whenever an object includes an SZObject in its struct it is expected to create and populate one of these structs. The function pointers are called at the appropriate times if they are not NULL (dealloc is called by the SZObjectRelease() function, for example) and this sets us up for a bunch of exciting functionality: copying, comparisons, and hashing, in addition to worry-free memory management (not thought-free; there’s still a requirement to retain and release, but there’s no agonising over whether a given piece of code should free a block of memory or not).

With reference counting in place, the next thing to do is write a bunch of collection classes that take advantage of this functionality. I call them “classes”, even though C doesn’t support classes, because the idea is similar: a group of associated data that can only be accessed via a well-defined set of functions. Achieving this requires “opaque types”, which are basically structs with hidden definitions. An opaque type is created by adding a typedef to a header file and a struct definition to a .c file. Here’s a snippet of the header file:

``````typedef struct __SZHashMap *SZHashMapRef;

int SZHashMapCount(SZHashMapRef hashmap);
``````

Here’s part of the .c file:

``````typedef struct __SZHashMap {
SZObject object;
void **data;
int size;
int reservedSize;
SZHashMap;

int SZHashMapCount(SZHashMapRef hashmap) {
return hashmap->size;
}
``````

It seems to me to be extremely counter-intuitive that the compiler doesn’t go nuts when it encounters a typedef for something that doesn’t exist yet, but it works. The code to create and use one of these hashmaps looks like this:

``````SZHashMapRef hashmap = SZHashMapCreate();
printf("%d", SZHashMapCount(hashmap));
SZObjectRelease(hashmap);
``````

Outside of the hashmap’s .c file there’s no way to create a plain SZHashMap struct or poke at its internals (short of looking through the raw bytes in RAM).

Other than a hashmap (which I didn’t need to use) the base library includes a bunch of other things I didn’t really need or use: a set (which is currently backed by a binary tree that can quickly become unbalanced, so it needs to be replaced with a self-balancing tree structure), a linked list, an autorelease pool, a number wrapper (like a boxed int/float/double in Java), a value wrapper (slightly more versatile than a number) and a data wrapper (wraps a byte array; essentially just a very versatile value). Things I wrote and then did use were a string (ASCII only), a range (a struct containing a start and a length) and an array.

# SZGeometry

Once the core library was done I could move onto slightly more interesting things: a geometry library. This is mostly based on WoopsiGfx, and includes code for managing points, sizes, rects, lines and circles.

The most interesting features are the circle and line point enumerator functions. In modern languages with new-fangled features like closures it would be possible to do something like this:

``````SZRect rect = SZRectMake(0, 0, 10, 10);
[SZCircle enumeratePointsInRect:rect block:^(SZPoint point) {
// Do stuff with point here
}];
``````

In C that’s not possible, so we’re forced to use function pointers and a context object to simulate closures instead:

``````void enumerate() {}
SZRect rect = SZRectMake(0, 0, 10, 10);
SZCircleEnumeratePoints(rect, NULL, __callback);
}

void __callback(const void *context, SZPoint point) {
// Do stuff with point here
}
``````

It achieves the same thing, but it’s less elegant. Closures are probably the feature I’ve missed most when coding C. In either case, we’ve split up enumerating the points of a circle from rendering the circle (in WoopsiGfx they were part of the same operation), so we can do more interesting things with the circle’s description now.

# SZGraphics

With the geometry library in place, we can add a graphics library. Again, this is all back-ported from the C++ WoopsiGfx library (with the occasional bugfix). It includes a few different types of bitmap, a font class, and a graphics class for drawing to those bitmaps.

Achieving different types of bitmap without inheritance was tricky. The library uses a kind of base class that all of the other bitmaps include as the first field in their structs:

``````typedef struct __SZBitmap {
SZObject object;
SZSize size;
SZBitmapCallbacks callbacks;
} SZBitmap;

typedef struct __SZMutableBitmapWithInternalBuffer {
SZBitmap object;
SZColor *bitmap __attribute__ ((aligned (4)));
} SZMutableBitmapWithInternalBuffer;
``````

We’re starting to nest structs: the mutable bitmap contains a bitmap, and the bitmap contains an SZObject. I can therefore treat a mutable bitmap as a bitmap and as an SZObject and get all of the functionality of the combined set of 3 objects.

A very truncated version of the SZBitmapCallbacks struct looks like this:

``````typedef struct __SZBitmapCallbacks {
SZColor (*pixelAtCoordinates)(SZBitmapRef, SZPoint);
} SZBitmapCallbacks;
``````

Meanwhile, SZBitmap offers a function that does this:

``````SZColor SZBitmapPixelAtCoordinates(SZBitmapRef bitmap, SZPoint point) {
return bitmap->callbacks.pixelAtCoordinates(bitmap, point);
}
``````

Therefore, all a “subclass” of SZBitmap has to do is fill out its SZBitmap’s callback struct and, even if it has some wildly crazy internal representation of a bitmap, it can be treated like any other SZBitmap. All of the standard functions will work.

Making this subclassing scheme possible means working around the opaque types idea, unfortuntely, since each subclass needs to include an SZBitmap (the struct, not a pointer to a struct) in its own struct. I got around this by moving the struct definition to a “private” header file (you can tell it’s supposed to be private; its filename is suffixed with “_private”).

Fonts are based on Jeff Laing’s monochrome packed fonts from Woopsi, with one significant change. Rather than the SZFont being responsible for rendering itself to a bitmap, the font has a function for testing if the point at a given set of co-ordinates within a character glyph is set (1) or unset (0). The graphics class uses that information to render to a bitmap. This way all of the rendering logic is in one place and there’s no coupling between SZFont and SZBitmap. I haven’t bothered with multiple types of font or unicode because they’re two more things I won’t use.

The final piece of the graphics library is the class that does all of the drawing work. It includes the usual set of drawing functions: lines, text, rectangles, circles, flood fill, etc. The API is a vast improvement over WoopsiGfx because it takes advantage of all of the available primitives created in the lower-level libraries like rects and points, greatly reducing the number of arguments passed to each function. All of the functions accept a graphics context struct that includes a bitmap to draw to and a clipping rect, meaning all of the drawing functions are clipped. The clipping code is much saner in the new library.

# SZHAL

The last library is a hardware abstraction library. It includes a framebuffer class, a couple of functions for doing DMA-boosted `memset()` and `memfill()`, and code for interacting with the stylus, buttons, vblanks, and hardware setup and teardown. The library uses `#ifdefs` to switch between DS and SDL implementations where necessary. I started abstracting out the DS’ sprite system but quickly abandoned it when I realized how awkward it would be to get the graphics working in SDL.

# The Game

The downside to completely ignoring the capabilities of the DS hardware and doing all of the graphics work with just the CPU and a framebuffer is that it is insanely expensive. Early on it looked like HankyAlien wouldn’t work because drawing all of the aliens at 60fps wasn’t possible. Then I realised that the original game moved one alien at a time, so I’d only ever need to draw one alien at once.

Although the game’s background is black, in theory it would be trivial to drop a background image in with no meaningful drop in performance. Every time a sprite is drawn to the screen the game pushes a rect that describes the sprite onto a dirty rects array. Each time the game iterates it erases all of the dirty rects and flushes out the array. To get a background working, we’d just need to perform a bitmap copy for that rect from the background image to the framebuffer instead of drawing a black rect.

The code was starting to get a little unwieldy and I found myself wishing that I could somehow implement the delegate pattern to tidy it up. After some thought, I figured out that delegates are very easy to implement in C. First of all we have to define a delegate struct:

``````typedef struct __SZGameSpriteDelegate {
void *delegate;
void (*spriteDidFinishExploding)(void *, SZGameSpriteRef);
} SZGameSpriteDelegate;
``````

We can create one of these structs like so:

``````SZGameSpriteDelegate delegate;
delegate.delegate = game;
delegate.spriteDidFinishExploding = __SZGameSpriteDidFinishExploding;
``````

In this snippet, `__SZGameSpriteDidFinishExploding` is a static function within the `SZGame` class that we want to call when the sprite explosion animation ends. The `game` variable is a pointer to the `SZGame` instance that wants to receive the callback. We call it like this:

``````void SZGameSpriteFinishExploding(SZGameSpriteRef sprite) {
if (sprite->delegate.delegate && sprite->delegate.spriteDidFinishExploding) {
sprite->delegate.spriteDidFinishExploding(sprite->delegate.delegate, sprite);
}
}
``````

The function in `SZGame` looks like this:

``````static void __SZGameSpriteDidFinishExploding(void *game, SZGameSpriteRef sprite) {
SZGameRef gameRef = (SZGameRef)game;

// Do stuff with gameRef
}
``````

Therefore, the `SZGameSpriteDelegate` struct holds two pieces of information: the object that wants to respond to callbacks, which we pass as the first argument to the callback functions, and the set of optional functions that the delegate can respond to.

The shields are only drawn once per game. The game maintains a bitmap representation of each shield and does two-stage collision detection with other sprites: do the sprites intersect, and if so, do any pixels overlap? If there’s a collision it erases the other sprite’s rect from the shield bitmap. We erase the other sprite from the screen and the two representations are therefore kept in sync.