2016-10-05

Hanky Alien Refactored

I’ve been toying with the idea of writing another homebrew DS game using the libraries I developed for Hanky Alien. Unfortunately the existing code has one considerable flaw: the game logic is intimately entwined with the rendering system, to the point that writing a new game unavoidably involves rewriting a bunch of graphics code that really should have been separate.

To fix this I’ve pulled apart Hanky Alien and reassembled it into three distinct pieces:

  • A layering library that handles all of the rect stuff I insist on using instead of just writing a sprite simulator for SDL like any sensible person;
  • An MVC GUI framework with the components necessary to get the game running (view hierarchy, controllers, timers, label views, bitmap views, event-based stylus and button handling, transitions, etc);
  • The game itself.

The game logic is currently included in the same build target as its presentation layer, but those can be trivially separated once I get around to it.

The new structure has a multitude of benefits. In order to place text on the screen, the code no longer needs to figure out where the text will go, figure out if it needs to erase what was previously at those co-ordinates, and then render the text; it just updates the string in a label view and lets the underlying libraries do all of the hard work. There’s a complete separation between the game logic and all I/O. Timer events piped down from the controller kick the game’s update logic, and game events (such as movement, or the addition or removal of objects within the game) are sent up to the controller via delegate callbacks. The controller is responsible for creating, moving, updating and removing views. I can change the game logic without having to think about any rendering. Each scene in the game - the “Simian Zombie” scene, the title scene, and the game scene itself - are all separate controllers. The GUI framework handles animating the transition from one controller to the next, which means a bunch of complex custom state machine logic became redundant. On top of that, the whole shebang is written in pure C.

All of this structure came with a horrible cost, however. Performance was nowhere near acceptable.

One huge benefit of manually managing the layout and redraw of the screen means each situation can be as efficient as possible. A generic approach to updating the framebuffer means there’s no opportunity for taking shortcuts that make sense in one place but would break other code. This forced me to dig through the entire codebase and try to optimize everything.

First step: profiling with Instruments. Hanky Alien builds happily using SDL2, so I can profile it on my Mac. It doesn’t necessarily give me accurate information, obviously; the Mac and DS have wildly different architectures and capabilities. It does give me a starting point, though.

One mistake that quickly became apparent was the amount of boxing and unboxing of rect structs that the layering library was performing. Subtracting one rectangle from another produces an array of leftover rectangles, and those were returned in an array. The array provided by the core library would only store objects, so I’d opted for the naive approach of boxing the leftover rect structs into value objects and sticking those in an array. Unfortunately each boxing operation required a malloc(), and any attempt at using the data in the objects required them to be unboxed back into structs first. The fix for this was easy: create a new array class specifically for storing collections of rect structs. As long as a resize isn’t required - which inevitably demands a realloc() call - appending rects involves nothing more than changing an int and copying a struct.

Creating those arrays was also slow, particularly when multiple rectangles are divided in a tight loop. The code used to look something like this:

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectArrayRef a = SZRectCreateArrayBySubtractingRect(rect, r);

    // Do stuff with a

    SZRectArrayRelease(a);
}

It now looks like this:

static SZRectArrayRef a = NULL;

if (!a) {
    a = SZRectArrayCreate();
}

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The “remove all” function just sets the array’s size property to 0. We create a single static array and re-use it in each call of the function and each iteration of the loop. We can get away with this because all of the code is single-threaded.

I’ve been unrolling loops. The code above does a bunch of work in the loop that we could unroll. We can move the variable declarations out of the loop and stop querying the array length on each iteration:

int rectCount = SZRectArrayLength(rectsToDivide);
SZRect r;

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

We’re passing structs around by value; we should probably stop doing that too:

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(&rect, &r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The game logic itself had some inefficiencies. Something I’d annotated with a “todo” in the original game was the alien collision code. In order to check if the player’s bullets were colliding with an alien, the code looped over all aliens in the game. The new version treats each column of aliens as a unit and checks collisions with those before checking against the aliens themselves.

The code uses a huge number of rect calculations, so those needed to be as fast as possible. All of the functions had code to cater for rects with a negative width or height. That would never be possible with the data coming out of the various libraries, so I added a bunch of “fast” versions that didn’t do any sanity checking.

The rendering code now uses Cearn’s ARM assembly copy functions for blitting stuff to the framebuffer.

The code is much, much faster than it used to be, but it’s not enough to throw around as many bullets as the older version.

Some of the exciting failures were:

  • Flattening the view hierarchy into a single view and manually handling a bunch of the redrawing logic. Initially it seemed like this would lead to a massive speed boost, but as the redrawing code got more and more complex the game got slower and slower.
  • Replacing bitmaps with solid blocks of color to see if the DMA would help.
  • Using quadtrees to help partition the space and reduce the amount of collision detection being done.
  • Creating my own profiler to record the execution time of various functions.

My final idea was to ditch the DS completely and switch platforms to something with more CPU power. Requirements for the new platform:

  • Must be a console or handheld (rules out anything with a keyboard or without a joypad).
  • An easily-installed homebrew dev kit with great C support (rules out pretty much everything prior to 32 bit).
  • A faster CPU than the DS’ 66MHz ARM9 (no GBA).
  • No fighting firmware upgrades (so long, Vita/3DS/Wii U).
  • Well-designed input devices (no GPH devices).
  • Must be easy to test on the real device.
  • Must have a reasonably capable OSX emulator.

By my estimation, that leaves three potential devices: the PSP, the Wii, and the Dreamcast. I’ve had a look at the dev kits for the PSP and Wii and neither one really seemed to fit with the archaic software rendering that I’m trying to do. The Dreamcast, though, appears to have a wonderfully simplistic framebuffer graphics mode and even supports DS-style RGB555-encoded pixel data. It could use some more homebrew.

One trip to Amazon later and a new (old) NTSC Dreamcast is winging its way to me! After a few evenings of tinkering I managed to get this running in lxdream:

HankyAlien-Dreamcast

Unfortunately, despite the Dreamcast being fantastically more capable on paper (200MHz CPU, 16MB RAM, 8MB VRAM vs the DS’ 66MHz CPU, 4MB RAM and 656KB VRAM), it turns out that in terms of raw CPU performance there’s not a whole lot between them. On a more positive note, programming the Dreamcast looks like it’ll be fun, so once this refactoring project is complete I think I’ll try writing something exclusively for it.

2011-08-11

Really Bad Eggs - Objective-C Edition

The Objective-C version of Really Bad Eggs is coming along nicely. I’ll try to get some screenshots and maybe a video up, but in the meantime here’s some stream-of-consciousness pontificating on the port so far.

When I wrote DS code I used to complain about the lack of a debugger. I’ve written most of the Objective-C version of Really Bad Eggs whilst out and about on a Windows laptop, so I haven’t had a debugger or even a compiler. Despite the lack of tools and the game being my first Objective-C program it’s working well so far. Rewriting the code in a new language has allowed me to re-examine a lot of the original ideas. In some cases I’ve refactored both the C++ and Objective-C code to be simpler or more versatile. In others I’ve used features of Objective-C to improve on the original code.

Bug Fixes and General Improvements

The DS’s representation of incoming garbage eggs was inaccurate due to a typo. I’ve fixed this in both versions of the game. I’ve also improved the placement of garbage eggs that are dropped into the grid.

Garbage eggs are always dropped into the columns with the least eggs first. The (simplified) algorithm looks like this:

  • Get a list of columns and their heights, sorted in ascending order by height (insertion sort).
  • Add the leftmost column to the list of “active” columns.
  • While there are eggs to place:
    • Place a garbage egg in each active column.
    • If the column to the right of the last active column has the same height as the last active column, add it to the active column list.

It essentially fills the grid from bottom to top as though it were pouring in water.

There is a subtle problem with the algorithm. The sorting system produces a list of columns that are sorted by height and column index. In SQL, it would look like this:

SELECT
    columnHeight
   ,columnIndex
FROM
    columns
ORDER BY
    columnHeight ASC
   ,columnIndex ASC

If columns 0, 1, 2, and 5 are of height 2, whilst columns 3 and 4 are of height 1, and we add a garbage egg, we know that the egg will be added to column 3. Here’s the column state:

        000  0
        000000
        ------
Column: 012345

Here’s the sorted data:

Column: { 3, 4, 0, 1, 2, 5 }
Height: { 1, 1, 2, 2, 2, 2 }

The first column to receive an egg will be column 3, as it is the left-most column in the sorted list.

This predictability means that advanced players can start using it to their advantage. They can drop a block here because they know they’ve got an incoming garbage egg and want it to go there. They can change their plans because they know what’s going to happen next. To prevent this, the sorting algorithm now randomises columns of the same height within the sorted array.

Memory Management in Objective-C

Memory management is a pain. I can see exactly why Apple have implemented reference counting the way they have. I’ve worked extensively with low-level memory management in C++ due to the combo attack of the DS’ tiny stack/slow CPU. The combination makes both smart pointers and stack-based objects mostly unusable, leading to the same questions time and time again:

  • If I call a function, who owns the memory it allocates? Do I delete it or not?
  • Does this function expect to receive an existing block of memory that I allocate or will it allocate its own?
  • When writing functions, should I allocate memory or should I expect it to be allocated in advance?
  • Should I return allocated memory via the return statement or via a pointer-to-a-pointer passed as a parameter?
  • Should I return an object on the stack and hope RVO kicks in, or should I return a pointer to an object on the heap?
  • Should functions receive and return pointers or references to objects?
  • If an object is told to store a pointer to another object, should it store the pointer or a copy? What if the pointed-to object changes during the lifetime of my object?

I’ve become used to considering these issues every time I work with memory in C++, but Objective-C tries to solve the issues itself. Mostly it uses conventions to suggest the correct approach. Functions with “new” or “init” at the start of the names, for example, will retain the returned object and increase its reference count. The caller therefore becomes the owner of the object. Convenience methods, such as NSArray’s arrayWithObjects method, return autoreleased objects. These are released automatically by the Cocoa framework when the current iteration of the application’s state terminates. If the creating code needs them around longer then it needs to retain them. Collections retain objects added to them, so they can be deemed to be the owners of those objects. It is the responsibility of the object’s previous owner to ensure that he releases the object.

Getting the hang of the conventions is tricky, particularly when dealing with frameworks like cocos2d which aren’t the best architected examples I’ve seen. There are a number of edge cases, too. Suppose you retain a reference to object A, which creates and retains a reference to object B, which itself creates and retains a reference to object C. During its creation, object C retains object B. What happens to B and C when A is released? Why, nothing, of course. B and C have a cyclic relationship. Releasing object A reduces the reference count of B by one, but B is still referenced by C (and vice-versa) so neither will be deallocated. This isn’t an issue in C++ as the developer manually deletes objects. Deleting A will delete B which will delete C (assuming the destructors tidy things up correctly). Nor is it an issue for C#’s garbage collector because it can tell that the only references to B and C are each other.

For the most part, I’ve used what Objective-C calls “weak references” - I’ve got pointers to objects without retaining them, the same as I would in C++. My code probably looks hideous to experienced Objective-C coders as I’ve trampled conventions all over the place.

One of the main reasons I upgraded to OSX Lion so quickly was the promise of ARC, or “Automatic Reference Counting”. Skimming through the docs suggests that there are a lot of edge cases to it, though (that seems to be a common theme with Objective-C - watch out for the edge cases), so I’m sticking with the traditional method first in order to understand fully what ARC is replacing.

A bug that caught me out for a while was a classic Objective-C memory leak. The “Leaks” utility insisted I’d leaked memory from one class like crazy, but it also told me that the reference count for all leaked objects was 0. That makes no sense until you remember that Objective-C allows you to make boneheaded mistakes like this:

- (void)dealloc { }

Oops. All dealloc methods should call the superclass’ dealloc method:

- (void)dealloc {
    [super dealloc];
}

I’d have suggested that Apple use the template pattern for this instead. NSObject could have methods like this:

- (void)onDealloc { }

- (void)dealloc {
    [self onDealloc];
    [super dealloc];
}

User code would look like this:

- (void)onDealloc {
    // Cleanup here
}

It’s now impossible to leak memory by forgetting to dealloc the superclass. I imagine there’s a valid case for giving developers the power to shoot themselves in the foot, but there’s a lot of boilerplate code needed in classes that could be excised very easily by using the template pattern. The same is true for object initialisation. Here’s what you do now:

- (id)init {
    if ((self = [super init])) {
    // Initialise the object
    }

    return self;
}

This is what a templated NSObject could look like:

- (void)onInit { }

- (id)init {
    if ((self = [super init])) {
        [self onInit];
    }

    return self;
}

This is what a user class would look like:

- (void)onInit {
    // Initialise the object
}

Tidier, but potentially less powerful.

Performance

Running the OSX version of Really Bad Eggs through Xcode’s CPU profiler demonstrates how efficiently the game works. Most of the game code barely registers; cocos2d’s drawing routines use several magnitudes more CPU time than anything else in the game. It also demonstrated how abysmally slow the NSArray objects are. One function I used them in took up more CPU power than anything else in the entire game (except the cocos2d drawing routines), so I refactored the function without NSArrays and the problem disappeared.

Dynanism

The dynamic nature of Objective-C means I’m happy to use calls to an object’s isKindOfClass method, whereas in C++ I’d have been more inclined to refactor to remove the need for type checking or calls to dynamic_cast<>(). For example, the C++ function to test if two eggs can be connected looks like this:

void NormalBlock::connect(const BlockBase* top,
                          const BlockBase* right,
                          const BlockBase* bottom,
                          const BlockBase* left) {
    setConnections(top != NULL && top->getColour() == _colour && top->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   right != NULL && right->getColour() == _colour && right->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   bottom != NULL && bottom->getColour() == _colour && bottom->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   left != NULL && left->getColour() == _colour && left->getState() == BlockBase::BLOCK_STATE_NORMAL);
}

All egg classes inherit from the BlockBase class (descriptive name, I know) which includes a “_colour” member. Each subclass sets this to a colour representative of the bitmaps used to display the egg. The red egg has an RGB value of (31, 0, 0) whilst the blue egg has an RGB value of (0, 0, 31). Checking if two eggs are of the same type simply entails a comparison between two 16-bit integers.

The Objective-C function just compares the eggs’ classes instead:

- (void)connect:(BlockBase*)top right:(BlockBase*)right bottom:(BlockBase*)bottom left:(BlockBase*)left {

    BOOL topSet = top != NULL && [top class] == [self class] && top.state == BlockNormalState;
    BOOL rightSet = right != NULL && [right class] == [self class] && right.state == BlockNormalState;
    BOOL bottomSet = bottom != NULL && [bottom class] == [self class] && bottom.state == BlockNormalState;
    BOOL leftSet = left != NULL && [left class] == [self class] && left.state == BlockNormalState;

    [self setConnectionTop:topSet right:rightSet bottom:bottomSet left:leftSet];
}

It’s not a major change, but it does remove some cruft from the code.

Blocks as Callback Functions

The Grid class represents the well of eggs and manages all of the eggs within it. It includes functions for rotating the live eggs, identifying and exploding chains of eggs, and so on. In the C++/DS version, the Grid class is also responsible for calling methods of the SoundPlayer class to trigger audio playback every time anything sound-worthy occurs. This works, but it’s not terribly clean - the Grid shouldn’t need to deal with sounds. The design ties the logic of the game to the platform it is running on.

A better solution would be to trigger an event, but if you’ve ever implemented an event system you’ll know that it’s a very verbose pattern. At the very least, you’ll need interfaces and classes that represent event arguments and event listeners. I’d like to tidy up the design but I’m not that bothered.

A considerably terser option would be to use a function pointer as a callback method, but that opens up a whole can of non-OO, global namespace-polluting nastiness. C++’s pointers-to-members are so limited - the exact type of the pointed-to object must be known - that they aren’t even worth considering.

Another possibility would be to create a message queue and have the grid push messages into it. A message pump would dispatch messages to the sound player object as necessary. Again, this is ugly and involves far too much new code for virtually no benefit.

It was only after working through this familiar thought process that I remembered that Apple has added closures (“blocks”) as a non-standard extension to C. With closures, the problems disappear. No bloated event system, no global namespace pollution with random C functions, and a simple way to extract the sound triggers from the Grid class.

The “Grid.h” file includes the following typedefs:

typedef void(^GridEvent)(Grid*);
typedef void(^GridBlockEvent)(Grid* grid, BlockBase* block);

I think of the typedefs as being analogous to C#’s delegate signatures. The class itself includes the following members:

GridEvent _onGarbageRowAdded;
GridEvent _onLand;
GridEvent _onGarbageLand;
GridBlockEvent _onBlockAdd;
GridBlockEvent _onBlockRemove;
GridBlockEvent _onGarbageBlockLand;

The class triggers the closures like so:

if (_onBlockLand != nil) _onBlockLand(self);

Wiring up code to an instance of the Grid class is simple:

Grid* grid = [[Grid alloc] initWithHeight:2 playerNumber:0];

grid.onBlockLand = ^(Grid* sender) {
    printf("A block has landed");
};

It looks like C, and it tastes like C, but I can perform JavaScript-like tricks with it. ‘Tis a beautiful abomination.

I’ve used this pattern all over the code to trigger sounds and visual effects as well as add/move/remove sprites. The game engine code is completely separate from the presentation layer.

Classes as Objects

The BlockFactory class has also benefited a little from the switch to Objective-C. When a grid needs to place a new egg, it fetches a new egg instance from the BlockFactory rather than create the egg itself.

The BlockFactory is necessary because all players receive the same sequence of eggs. Suppose player 1 requests his third pair of eggs and is given a red egg and a blue egg. When player 2 requests his third pair of eggs, he must also be given a red egg and a blue egg. No player can “cheat” due to a particularly advantageous sequence of eggs. Winning is a matter of skill and speed rather than luck.

The BlockFactory must, therefore, maintain a list of all egg types that have been generated. It also needs to remember which pair of eggs all players will receive next. Once a pair of eggs has been used by all players it can be removed from the list. Attempts to retrieve a pair of eggs not in the list will cause a new pair of random egg types to be created and appended to the list. The BlockFactory instance is shared across all players.

In the C++ version, the list stores enum values that represent an egg type. When a new egg is requested, the correct constructor is called for that value using a switch() statement. Here’s the code that adds a new item to the list:

BlockFactory::BlockType BlockFactory::getRandomBlockType() const {
    return (BlockFactory::BlockType)(rand() % _blockColourCount);
}

void BlockFactory::addRandomBlock() {
    _blockList.push_back(getRandomBlockType());
}

Here’s the code that creates a new egg from a given type:

BlockBase* BlockFactory::newBlockFromType(BlockFactory::BlockType type) const {
    switch (type) {
        case BLOCK_RED:
            return new RedBlock();
        case BLOCK_PURPLE:
            return new PurpleBlock();
        case BLOCK_YELLOW:
            return new YellowBlock();
        case BLOCK_BLUE:
            return new BlueBlock();
        case BLOCK_GREEN:
            return new GreenBlock();
        case BLOCK_ORANGE:
            return new OrangeBlock();
    }

    // Included to silence compiler warning
    return new RedBlock();
}

The BlockFactory class includes a BlockType enum necessary for this scheme to work.

The Objective-C version dumps the enum and uses a tidier solution. As classes in Objective-C are themselves objects, storing enum values in the list is unnecessary. Instead, the list stores pointers to the class objects. Here’s the code that adds a new item to the list:

- (void)addRandomBlockClass {
    [_blockList addObject:[self randomBlockClass]];
}

- (Class)randomBlockClass {
    int type = rand() % _blockColourCount;

    switch (type) {
        case 0:
            return [RedBlock class];
        case 1:
            return [BlueBlock class];
        case 2:
            return [YellowBlock class];
        case 3:
            return [PurpleBlock class];
        case 4:
            return [GreenBlock class];
        case 5:
            return [OrangeBlock class];
    }

    // Included to silence compiler warning
    return [RedBlock class];
}

Creating a new BlockBase object from the list becomes trivial. A single line of code will create a new BlockBase object from an item in the list:

BlockBase* block = [[[_blockList objectAtIndex:0] alloc] init];

The item in the list is a class, so calling its alloc and init methods gives us a new object automatically.

The difference between the two sets of code is minor, but conceptually it’s a big leap. Objective-C lets me store classes in arrays. I’m pretty certain that the only other language I’ve used that could do anything even remotely like this is JavaScript.