2020-10-19

GBA Entity Component System

A New Game

I’m writing a new game. Technically it’s an old game; it’s something I’ve been meaning to get around to for decades now, even if the plan has evolved over the years and what I’m trying to make now doesn’t have much, if anything, in common with my original ideas.

What I’m working on now is a Metroidvania-like platformer for the GBA. I have a few goals:

  • Like old LucasArts adventures, there should be no way to die. I’ve found that being forced to repeat the same few minutes of a game over and over until I get it right just isn’t interesting for me any more, and I don’t have time for it.

  • It should be possible to save anywhere. Castlevania, Metroid, and even more modern games like Bloodstained that really should know better, devolve into a frantic scramble from one save room to the next so that I can switch off the console and go deal with something more urgent.

  • The game should tell an interesting story.

  • The game should focus on exploration. Exploration should reward the player with new capabilities that allow access to new areas and more pieces of the story.

  • It shouldn’t be too large or complicated. No-one cares about GBA homebrew so there is no point.

I have the basics of story figured out, influenced too much by Greg Bear’s “Hull Zero Three”, Aldiss’ “Non-Stop”, Heinlein’s “Orphans of the Sky”, Galouye’s “Dark Universe”, and the original Star Trek episode “For The World is Hollow and I Have Touched the Sky”. Derivative? Yes. It’s even been done before in videogames. The Sega Genesis/Mega Drive game “Generations Lost” has an intro scene that is almost identical to the opening I’d envisioned for this game. I’ve got some original twists planned, though.

I don’t have a good answer for what the challenge is yet. There’s exploration; that’s good. And there’s a story; great. But what do you do in the game if there’s no way to die, and no monsters? Just wander around a big empty environment finding things to collect and chatting to NPCs? Is that a compelling game? Sounds a lot like Treasure Island Dizzy and that kept me entertained for hours - once I got the invulnerability cheat working - so maybe that’s enough if I throw in a few take-thing-x-to-place-y puzzles. Generations Lost is a more traditional platformer and has enemies and weapons, but that game only managed to keep my attention for a few minutes before I switched it off in annoyance. Is it a “walking simulator”? I’ve never played one, but it sounds like it might be. Perhaps I’ll start with that and see where I get to.

The other big concern is the game’s graphics. It takes me days to laboriously create terrible pixel art. The sub-Amiga-public-domain graphics in Professor Sinister are just about the limit of what I can do. I’m bad at pixel art and I don’t want to practice. Graphics are a chore that I’d rather someone else did.

The ECS Pattern

Coding the game is a different matter entirely. The game is an excuse for a fun experiment with the “entity component system” pattern. Anything that you’d think of as a “game object”, like the player’s character, NPCs, moving platforms, or collectible items, is represented by an “entity”. An entity is essentially just a container for a bunch of different “components”. A component is a struct that holds a set of basic information about some aspect of the entity. For example, a “velocity” component might contain the current x and y speed of the entity, the delta by which to mutate those values, and the maximum values that can be attained. Lastly, a “system” is a stateless function that iterates over all relevant entities (perhaps those with both a “velocity” component and a “position” component, for example), and mutates the state of their components using the components themselves as the inputs.

The ECS pattern is neat because giving game objects behaviors is simply a matter of giving the objects the relevant component(s). Want an entity to be affected by gravity? Add the “gravity” and “velocity” components. Now the gravity system will recognize the entity and mutate its velocity appropriately. It’s far more flexible than the approach I used in my unfinished shoot-em-up Chroma-X, which tried to achieve something similar with protocols/interfaces: a game object had a number of “sockets”, like an “engine” socket, and changing how the object behaved meant crafting a new class that implemented the appropriate interface and could be plugged into the appropriate socket. Want an enemy to move in a sine wave? Plug in an instance of the “sine wave” engine. Adding an additional behavior to a game object with that design meant modifying its class to support the behavior’s interface rather than just appending a component and letting the rest of the system do its thing automatically.

Professor Sinister used object-oriented C. Data was stored in reference-counted opaque types and everything had to be malloced. Dynamic memory management allowed me to create my own implementations of variable-length mutable collections like arrays, sets, hashmaps, strings, etc. This time around I’m going in exactly the opposite direction. There isn’t a single call to malloc anywhere in the code, unless I really want that chunk of memory to live in EWRAM.

Achieving the loosely-coupled ECS panacea (in which an entity can contain 0…n components, each of which has a different size) in C is impossible in any sane way without using malloc() and variably-sized collections, so my solution gives all entities all possible components. Components include an “enabled” flag that systems look for when determining if an entity is relevant or not. It’s wasteful of memory but efficient in terms of processing time, as it eliminates searching through an entity’s components.

(On the subject of “loose coupling”, it’s worth pointing out that the ECS pattern doesn’t achieve it. For example, the jump system is responsible for determining if an entity can jump, and it needs input from numerous components to make that decision: Does the jump component indicate that the entity is already jumping? Does the input component say that the jump button is pressed? Does the duck component indicate that the entity is ducking? Does the environment collision component indicate that the entity is colliding with a tile above it? Systems are dependent on multiple components; those dependencies require that the components are updated in a specific order; and that ordering creates implicit dependencies between systems. Those dependencies can’t be enforced by a compiler, and they’re an impediment to multithreading the game logic.)

Deciding on the boundaries between components is interesting. For example, I have a “duck” component that tracks whether or not an entity is ducking. The component consists of two bools: enabled and ducking. The ducking system checks the state of other components (is the entity jumping, or falling, or climbing?) and, if they are in a good state, sets the ducking bool to match the state of the down button on the d-pad. Meanwhile, running is implemented not as a separate component but as part of the velocity system. If left on the d-pad is held down then the entity will move left; if B and left are held, then the entity will move left more quickly. Should there be a separate component for the run state of the entity? A separate component feels wrong but I’m not entirely sure why, other than the additional overhead it would incur; possibly because it’s a modification of the walk function rather than a separate capability?

Similarly, it’s tricky to decide on which system should update the components. For example, if an entity is flying and grabs hold of a ladder, which system is responsible for setting the state of the flight component from flying to inactive? The climbing system? Or the flying system? Perhaps an appropriate rule is “any system can read a component’s state, but only the main system for that component can mutate that state”. In our example, the flying system mutates the flying component. That rule feels like it will quickly collapse under the weight of too many edge-cases.

Platforming Engine

The platforming engine is coming along nicely. It supports climbable tiles (ladders), solid tiles, tiles with solid edges (any or all of the edges can be solid, so it’s possible to walk into the tile from the left but then hit its solid right edge), tiles with one-way edges (you could jump up through the tile from beneath and then land on its solid top edge), and tiles with a height map. Height maps are expressed as an array of 8 values between 0…8 that specify the height of the tile at a given x co-ordinate. They allow me to implement slopes with arbitrary gradients, bumps, dips, etc. This page was particularly helpful when getting slopes to work:

http://higherorderfun.com/blog/2012/05/20/the-guide-to-implementing-2d-platformers/

Destructible Tiles

Map tiles can be marked as destructible. The player can stand next to destructible tiles and press a button, which results in the tiles being removed from the map. The GBA’s hardware limitations made that feature more complex to implement than you’d expect. Possible approaches I considered were:

  • Mutating the level data (this isn’t desirable because it would require copying the level data from ROM to EWRAM, which is limited to 256K);
  • Storing a dictionary of destroyed tile co-ordinates (this isn’t desirable because looking up the data in the dictionary would not necessarily be fast; each co-ordinate would require a couple of uint32_ts to store; and the maximum size for the dictionary would either need to be the size of the map (see previous problem regarding RAM limits) or some arbitrary smaller limit, which would mean we could unexpectedly run out of space in the dictionary);
  • Create a small ring buffer of destroyed tile co-ordinates. Tiles would be overwritten automatically with newly destroyed co-ordinates when the buffer wraps, which means the oldest tiles would heal themselves like Lode Runner when new tiles were destroyed (not exactly what I wanted, but it would work);
  • Create a bit array in which each bit represents the destroyed state of a tile in the map.

After playing around with the first idea for a while I realised it wasn’t going to work with the way I was representing tiles. Each tile was a complex struct that stored a large amount of metadata. I rewrote that to use something like the flyweight pattern. I maintain a palette of unique tile definitions and represent the map with a second array of indexes into the tile palette. That shaved about 20% off the size of the ROM, which currently only has a couple of mostly-empty test maps.

Even with that change, the bit array option was far more efficient. A single screen map consumes a minimum of 600 bytes if copied to RAM (assuming one byte per tile entry). The equivalent bit array consumes 75 bytes.

Fixed Point

Initially I had everything implemented with integer math, but it became increasingly clear that the engine needed to use fixed-point math instead. The jump was too stiff; gravity was too aggressive; and running was too fast. Making the switch introduced an exciting selection of new problems and bugs, most of which were related to rounding problems. For example, maxX() used to be implemented like this:

return rect.origin.x + rect.size.width - 1;

If I have a rect at point (0, 0) with width and height both equal to 2, maxX() will give me a value of 1. That’s what I’d expect: the maximum X coordinate contained within the rect is 1. I can use the rect functions to step through the columns in the rect like this:

for (int i = minX(rect); i <= maxX(rect); ++i)

That falls apart with fractional values, though. Suppose I have a rect with a width of 0.5. The integer version of the function would return a maximum X coordinate of -0.5, which is clearly wrong. If we were to retain the decision to have maxX() return the largest X coordinate that falls within the rect the correct implementation would look like this:

return rect.origin.x + rect.size.width - FIXED_EPSILON

…where FIXED_EPSILON is the smallest magnitude value representable by a fixed point number.

It was more effort, but less surprising for users of the function, to change maxX() so that it simply returned the first possible coordinate outside the bounds of the rectangle:

return rect.origin.x + rect.size.width;

Stepping through the columns in the rect then requires this alternative iterator:

for (int i = minX(rect); i < maxX(rect); ++i)

Performance Surprises

Writing anything for the GBA requires time fixing performance problems. The most significant cause of problems so far, and the most surprising, was the Div BIOS function. I’d taken care to use the BIOS Div and Mod functions everywhere thinking that they were the most performant way to divide on the system. Given that I’ve already experienced bugs in the DS’ BIOS (*cough*CpuFastSetcough) you’d think I’d know better. In any case, ROMs built using Div performed noticeably worse than ROMs using the standard C divide operator.

The Testing Dance

Testing ROMs requires an elaborate dance with three laptops and a GBA:

  • New laptop to write and build the code;
  • Old laptop with an SD slot to copy the ROM to an SD card;
  • GBA to test;
  • Very old laptop running Windows to check no$gba’s CPU meter.

Grabbing the REG_VCOUNT value gives me a reasonable estimate of CPU usage and should mean I don’t need no$gba, but my first attempt at writing a profiler wasn’t accurate enough. mGBA’s logging functions (which I was calling to see the contents of the profiler) are expensive. If I improve the profiler hopefully I won’t need the Windows machine again.

2019-04-24

Proportional Fonts in GBA Tile Modes

I wanted to create a system for showing text with proportional fonts on the GBA. Typically on tile-based systems you’ll see fixed-width text with characters that align to tile boundaries. This is easy to do and efficient for the hardware. It’s exactly what I did in Professor Sinister. The new project I’m working on is going to have significantly more text and legibility is crucial, so I wanted to use proportional fonts instead of fixed-width.

First, some information about the GBA’s video system. The GBA’s screen resolution is 240x160px. When in tile mode, the screen is divided into a grid of 8x8px tiles. The hardware supports up to 4 layers of these tile grids overlaid on top of each other. In 4-bit color tile mode, in which each byte holds two pixels, ~19K of VRAM would be required per layer to store a full-screen bitmap in tile form. In addition to the tile bitmaps each layer requires a map to tell the video system how to lay out the tile bitmaps on screen. Each entry in the map is 16 bits wide. That’s another ~1K required per tile layer. Presenting a unique full-screen bitmap in all 4 tile layers simultaneously would require 75K of VRAM for the tile bitmaps and ~4.6K for the maps. The GBA has 64K of VRAM.

The upshot of this is simple: the GBA has nowhere enough VRAM. It can’t possibly show 4 full-screen bitmaps in tile mode. The best it can do is ~3.5 screens of tiles (2048) in 4bpp tile mode or 1.7 screens of tiles (1024) in 8bpp tile mode, but neither scenario leaves room for the tile maps.

In bitmap modes the video system is even more limited: two bitmaps (mode 4; 8bpp; 37.5K each) or one bitmap (mode 3; 16bpp; 75K). These modes eat into sprite memory and sacrifice sprites for bitmap data.

Keeping all of that in mind, how do you use a tile-based layout to show proportional font glyphs? Why, you ignore the benefits and constraints of the tile system and treat it like a bitmap, of course.

The font rendering system I used in Chuckie Egg et al has its roots in the rendering system I created for Woopsi, but with the warts removed. Give it a bitmap object to render into and the renderer will happily use the bitmap’s setPixel() method to draw text. The bitmap itself translates from the screen co-ordinates supplied by the renderer into co-ordinates that match its internal structure. That’s how I got graphics showing up on the 3DS with its weirdly-oriented framebuffer: I created a new bitmap class specifically for the 3DS.

Making tile mode on the GBA behave like a bitmap is a similar problem, but it requires some more setup. The first thing to do is to decide how many tiles to use. Using the full size of a tile layer isn’t advisable because, as we’ve seen, the number of tiles available is severely limited. I wanted the text to pop over the bottom of the screen like the conversation text in Pokemon. I figured I’d need 5 lines of 10px tall text. Add a couple of pixels spacing between each line and that gives me 60px; round it off with a couple of pixels padding above and below the text and I need 64px, which neatly fits into 8 vertical tiles. The screen is 30 tiles wide, so I need to reserve 240 tiles solely for the use of the text renderer. That’s a sizeable chunk of the 2048 tiles available in 4bpp mode, but at least we’re not using the 600 tiles we’d need for a full screen of text.

I’m keeping the first (zeroth) tile bitmap in VRAM empty, so that means I’m reserving tile bitmaps 1 through 241. At this point I can set up the tile map for the tile layer I’m using to display text: rows 12 to 19 need to contain tile bitmaps 1 to 241. Note that the layer is actually 32 tiles wide, not 30, but as two columns of tiles are offscreen and I’ll never scroll the layer I’m leaving those extra columns empty. Also note that the tile bitmaps must be sequential in VRAM or you’ll introduce an extra layer of complexity.

With that done I can feed VRAM+32 (ie the second tile bitmap) into my new bitmap class as its bitmap origin address and teach it how to translate screen co-ordinates to tile bitmap co-ordinates. This is how pixels are laid out in VRAM:

+-----------------+-----------------+  +-----------------+-----------------+
| 000 001 002 003 | 004 005 006 007 |  | 064 065 066 067 | 068 069 070 071 |
+-----------------+-----------------+  +-----------------+-----------------+
| 008 009 010 011 | 012 013 014 015 |  | 072 073 074 075 | 076 077 078 079 |
+-----------------+-----------------+  +-----------------+-----------------+
| 016 017 018 019 | 020 021 022 023 |  | 080 081 082 083 | 084 085 086 087 |
+-----------------+-----------------+  +-----------------+-----------------+
| 024 025 026 027 | 028 029 030 031 |  | 088 089 090 091 | 092 093 094 095 |
+-----------------+-----------------+  +-----------------+-----------------+
| 032 033 034 035 | 036 037 038 039 |  | 096 097 098 099 | 100 101 102 103 |
+-----------------+-----------------+  +-----------------+-----------------+
| 040 041 042 043 | 044 045 046 047 |  | 104 105 106 107 | 108 109 110 111 |
+-----------------+-----------------+  +-----------------+-----------------+
| 048 049 050 051 | 052 053 054 055 |  | 112 113 114 115 | 116 117 118 119 |
+-----------------+-----------------+  +-----------------+-----------------+
| 056 057 058 059 | 060 061 062 063 |  | 120 121 122 123 | 124 125 126 127 |
+-----------------+-----------------+  +-----------------+-----------------+

Each large box represents a tile bitmap. Each cell in the box represents a u16. The numbers represent the index of each 4-byte pixel.

To translate from screen co-ordinates to VRAM index, locate the tile that contains the point:

const int tileSize = 8;
const int pixelsPerTile = 8;
const int tilesPerRow = 30;

int tileX = point.x / tileSize;
int tileY = point.y / tileSize;

Figure out the index of that tile:

int tileOffset = (tileX * tileSize * tileSize / pixelsPerTile) + (tileY * tileSize * tileSize * tilesPerRow / pixelsPerTile);

Find the pixel within the tile:

int x = point.x % 8;
int y = point.y % 8;

Find the offset of the byte containing the pixel:

int offset = tileOffset + x + (y * tileSize);

That has far too many mods and divides. We can take advantage of the fact that almost every calculation is a power of 2 and use some bitwise magic to produce this faster but mostly-illegible one-liner:

#define offsetForPoint(x, y) (((x) & ~0x7) << 3) + ((((y) & ~0x7) << 3) * 30) + ((x) & 0x7) + (((y) & 0x7) << 3)

The define above, coupled with the VRAM address we supplied to the class when we created it and the knowledge that VRAM can only be addressed as u16s, we can create a setPixel() method that the font renderer can write to.

What’s particularly neat about reusing the existing rendering routines is that they can efficiently clip to a rectangle, allowing me to avoid the jerkiness of text rendering seen in something like Pokemon. That game renders text to the screen one character at a time, so the speed at which text appears varies with the width of the characters being rendered. With this rendering system I can smoothly slide the clipping rect across the screen and rely on the renderer to draw fractions of a character - or multiple characters if necessary - at a consistent pace.

2019-04-15

Professor Sinister Updated Again

In celebration of Simian Zombie’s 12th birthday, here’s an updated version of Professor Sinister and his Marauding Mechanicals:

Changes:

  • Added ladder mount/dismount assist to make it easier to use ladders.
  • Added cheat codes for level skip, invulnerability and slow motion mode.
  • Added raster effect on title screen.
  • Added high score table with persistent scores.
  • Fixed robot position when they collect power packs.
  • Fixed memory leaks.
  • Fixed crashes.
  • Fixed graphical glitches when switching between scenes.
  • Levels reset correctly when the game loops around.
  • Game loops through correct set of levels.
  • Updated title screen scrolling text.
  • Optimizations.

2017-04-10

Chuckie Egg 20170410

With egg hunting season almost upon us, it seems appropriate to release a little update for my favorite egg hunting game. Grab it now:

I’d hoped to get a level editor into the next release, but it’s not finished. This version fixes a few bugs and adds a couple of minor new features.

Changelog:

  • Scores are persisted on DS and PSP.
  • PSP analog stick can control Harry.
  • Fixed crash when completing a level if a cheat label is showing.
  • Fixed crash when switching to high score table.
  • Harry dismounts ladders correctly when climbing and jump is pressed.
  • Player does not get points when hens eat grain.
  • Bonus cannot be negative.

I haven’t looked into working with the Dreamcast’s VMU yet.

2016-11-19

Chuckie Egg Releases

Just in time for the holidays, here are final versions of Chuckie Egg for DS and PSP, and an almost-final version for the Dreamcast:

Chuckie Egg DS

Post Mortem

PSP

The PSP has a huge screen. Well, relative to the DS; maybe not so huge if you compare it to a modern phone. What do you do with all that space if you’re remaking a game that fits in 256x192 pixels? The obvious thing to do is double up the pixels and make the graphics twice the size. But wait; at 272 pixels tall the PSP’s screen is only 70% larger vertically than the Dragon 32, which means we can’t double up the resolution without losing 30% of the game’s display. I imagine it’s possible to wrap the game’s bitmap onto a quad and get the GPU to stretch it, but I eventually decided just to centre the game at its native resolution.

The PSP’s screen is 480 pixels wide but its default framebuffer is 512 pixels wide. Trying to change the width of the framebuffer to match the screen does not work (it results in the display being smeared horizontally, which always means “the dimensions of your bitmap do not match the layout of memory”). I had to add in the concept of “usable” vs “available” framebuffer space (“physical” vs “logical”, I suppose) and base the layout of the various screens on that.

It’s possible to push the PSP’s “home” button and quit from a game back to the launcher. Making this work entails creating a background thread and wiring up a callback that the OS can use to signal to the game that it needs to quit. It’s very simplistic; if the game doesn’t implement the callback then the system just hangs forever. The OS doesn’t appear to forcibly quit rogue processes. I’m not even sure that it’s really a multi-process system. I imagine Sony handled errors like this in their QA process so they didn’t bother with anything more elaborate to catch badly-implemented games.

Different geographic regions have different PSP button layouts. I don’t quite know what Sony thought the advantage of swapping the O and X buttons was, but it means well-behaved homebrew has to work around it. Games typically either allow the buttons to be remapped or come in different builds for different layouts. Official games were region-locked so that wasn’t an issue; it was probably fixed during localization. I’m too lazy to deal with it.

Getting sound working was troublesome. The basic PSPSDK libraries offer a way - I think - to craft PCM data and play it back, which really doesn’t help if you just want to play a WAV and not put any thought into it. I ended up using libmikmod, which has some peculiarities. Its MikMod_Update() function must be called regularly in order for sounds to work, but that can be stuffed into another background thread and forgotten about. It offers two ways to play back samples: Sample_Play(), which will automatically choose a channel to play back on, and Voice_Play() which lets you choose the channel. I found that Sample_Play() always chose exactly the wrong channel and ended up stopping sounds that I wanted to continue playing, so I switched to the alternative function. Voice_Play() doesn’t have any useful documentation. Inspecting the library’s code reveals that the trick is to set the volume, frequency and panning of the voice each time a new sample begins:

Voice_Play(channel, sound, 0);
Voice_SetVolume(channel, sound->volume << 2);
Voice_SetPanning(channel, PAN_CENTER);
Voice_SetFrequency(channel, sound->speed);

DS

The DS has a second screen. What do you do with two screens when remaking a game designed for one? I had a few ideas:

  • Level editor;
  • Chuckie Egg title screen;
  • Simian Zombie logo;
  • Dragon 32 image.

In the end the SZ logo was simplest, so I went with that. Otherwise, the DS is my default platform, so there aren’t any surprises here.

Dreamcast

It appears that calling snd_sfx_stop() on a channel playing a stereo WAV file will only stop one of the channels, which explains why sounds weren’t behaving themselves in the beta. Switching to mono files fixed it. Another problem was caused by limitations of the audio hardware: the Dreamcast can only play back the first 65534 samples in a WAV file, which is far shorter than several of the sounds in the game. To work around this I halved the sample rates of the longer samples. That fixed most of them, but the title song and Spectrum loading sound were still too long. I converted those to MP3 and used the streaming API, but that created problems of its own. Sounds still cut out or don’t play, and the emulator I’ve been using to help development crashes whenever I call mp3_stop(). There’s still some work to do there.

The DC has a similar problem to the PSP: I’m using the 320x240 NTSC graphics mode, but a good portion of that is off-screen on my TV. The “usable” framebuffer space concept helped out here.

The DC feels noticeably faster than the other two platforms, which is odd, as I’d expect them to run at identical framerates. Perhaps it’s just because it runs on such a massive display.

The Dreamcast’s reversed ARGB ordering means I’ve had to convert bitmaps into both DS and DC sourcecode and use #ifdefs to choose the right version of the data for the current target platform at compile time.

Cross Platform

What was most surprising was how little effort I had to put into Chuckie Egg in order to make it performant on these different platforms. The effort that went into Hanky Alien presumably helped a lot. I was testing HankyAlien with up to ~140 views on screen at once and up to ~45 of those updating simultaneously. In contrast, despite superficially being a more complex game, Chuckie Egg has ~45 views on screen at once and only ~10 updating simultaneously. (It’s interesting to note that, while both the DS and DC struggled with the HankyAlien test scenario and their framerates suffered, the PSP sauntered through it without a hitch. I really must write something specifically for the PSP in future.)

Writing object-oriented C is fun, but sometimes I look at the amount of boilerplate necessary to create a new “class” and decide to do things another way, particularly if I’m creating a base class. The most simple class requires:

  • A static metaclass struct, which forms a linked list back through the chain of superclasses (it enables the program to determine the type of an object at runtime).
  • A static callback struct that includes pointers to equality/comparison/hash/copy/dealloc functions.
  • Declarations of those functions.
  • Implementations of those functions.
  • A struct with the member variables of the class.
  • A typedef that defines a pointer to the struct, allowing the struct to be hidden.
  • A constructor that calls the allocator and initializer methods of the class.
  • An initializer function that sets up the members of class instances and calls back to the superclass’ initializer.
  • A convenience release method that wraps the basic release method.

If the class is intended to be subclassed it must also:

  • Provide an additional private header including:
    • A callback struct definition that nests the superclass’ callback struct and enables “methods” to be “overridden” (really it’s the template pattern with function pointers).
    • The struct itself, so subclasses can directly access properties of the struct and include the struct as the first item in their own structs.
  • Call the functions in the callback struct when appropriate (including those for the superclass).

Every time I create a new class I do so by copying-and-pasting an existing class and modifying it rather than start it from scratch and have to retype all of the boilerplate. Sometimes I’ll think about the work involved in setting one of these things up, pine for C++ or Objective-C, and then use a different structure that’s less “correct” but will still get the job done without smelling too badly. The grain and eggs, for example, should really inherit from a single “collectible item” class, but the correctness of that design doesn’t in any way compensate for the amount of work involved in setting it all up.

Keeping a strict separation between the model (game logic), the views (the stuff you see on screen) and the view controller (delegate of the model; updates the views to match incoming events) results in some beautifully clean code but makes prototyping difficult, especially early on when none of the infrastructure is in place. I did most of the early prototyping in the view controller before moving it into a separate model.

Each level renders its static content (ladders, platforms and bird cage) as a bitmap when it begins. It passes that up through the model to the view controller, which uses the bitmap as the background view for the game. That eliminated the need to create dozens of views to represent tiles that never change during the course of a level.

There are now 134,000 lines of code in the project, but a sizable number of those are included in bitmaps that have been converted to C.

2016-11-14

PSP Surprise

I’ve mentioned before that SZLib presents challenges:

The downside to completely ignoring the capabilities of the DS hardware and doing all of the graphics work with just the CPU and a framebuffer is that it is insanely expensive.

However, its bare-bones approach - it just needs a framebuffer and a way to read a joystick, pretty much - made getting it working on the Dreamcast fairly trivial. Getting it working on other platforms should be simple too, but when choosing a new platform for the library I eliminated everything but the Dreamcast as a potential target.

I was too hasty when I crossed the PSP off that list. The handheld does offer the ability to write directly to the framebuffer. It took about an hour or so to get to this point:

ChuckieEggPSP

By far the hardest part was getting MikMod set up to play sound effects, which took a couple of hours on its own.

There’s a little bit of work to do. The PSP has a physical screen size of 480x272, but for some reason it seems that the frame buffer is 512 pixels wide. The disconnect between the physical and logical sizes means everything is slightly off-center. Also, the “home” button isn’t wired up yet. Overall, though, the game works just as well on the PSP as it doesn on the Dreamcast and NDS.

Excitingly, because they’re based on the same libraries, HankyAlien also now runs on the PSP.

2016-11-10

Chuckie Egg

Back in the early eighties, when the home computer boom was booming, my parents decided to buy themselves a computer. Following the advice of various computing magazines of the era they ended up with what was supposedly the best of the bunch: the Dragon 32. It featured 32K of RAM, a real keyboard and a Centronics parallel port. It was powered by Bill Gates’ favourite 8-bit CPU, the Motorola 6809. Unfortunately for my parents and their new Dragon, the manufacturer ran into financial troubles and collapsed not long after.

The unusual choice of the 6809 CPU meant very little software was written for or ported to the platform. One of the few games that made it was the BBC and Spectrum classic “Chuckie Egg”. This obscure port (the World of Spectrum doesn’t list it) is by far my favourite version, and it’s the game I spent the most time playing on our Dragon. There has never been a remake of it. Worse, neither the Dreamcast nor the Nintendo DS has a native version of Chuckie Egg. Let’s fix that.

Here’s a beta version of my remake of the Dragon 32 version of Chucky Egg for Nintendo DS and Dreamcast:

A couple of screenshots:

Chuckie Egg Logo Chuckie Egg Game

The game itself is fully playable, but there’s still some polishing to do:

  • Sound on the Dreamcast is misbehaving;
  • The Dreamcast version runs faster than the DS, so some timings are off;
  • Later levels do not have increased numbers of hens or faster moving hens;
  • Hens don’t eat grain;
  • The high score table is not functional (it’s just a bitmap);
  • There’s no “game over” screen;
  • The score/timer/etc display layout and font is all wrong;
  • Remaining lives aren’t displayed;
  • The font in the title screen scroller is wrong.

This is an inexhaustive list of bugs I spotted in the original game while making this:

  • On the first level, climb the largest ladder to the third platform on the right (the start of the staircase). Stand on the ladder just below the top of the floor and jump right. Instead of bouncing off the first step in the staircase, Harry lands inside it. Now press right and Harry will fall through the platform on the right. (The remake fixes this.)

  • On level 7, stand next to the two blocks immediately on the right when the level starts. Face them, hold “right”, and jump. Harry stands in the middle of the upper block. (The remake reproduces this.)

  • Animation frame doesn’t reset when Harry stands on an elevator. (The remake fixes this.)

  • When a new level rolls in the remaining time is set to 999; it immediately jumps to 898 when the level starts. After dying it resets to 899.

  • Harry gets killed if the elevator he is standing on hits the top of the screen, but if he is jumping then he falls through the elevator. (The remake doesn’t kill Harry; he just drops down when the elevator disappears.)

  • There are a bunch of inconsistencies with ladder behavior. Sometimes it is possible to walk from a platform all the way across a ladder; sometimes Harry falls through half-way along the ladder. This looks more like a bug than desired behavior.

  • One of the extra hens on level 3 (at the bottom of the twin ladders) starts one block below where it should do.

Here are some of the differences between the remake and the original:

  • Jump isn’t quite the same.
  • Frame rate is much better.
  • No flickering.
  • Hen patterns aren’t the same (but the Dragon doesn’t match the BBC or ZX, and they don’t match each other, so that’s OK).
  • Duck moves smoothly.
  • Hens move more smoothly.

I think this will be the last remake for a while. This is the list of remakes I’ve managed to complete (more or less) so far:

  • Pong (Blitz Basic/C/C++/Java/Flash; Amiga/Cybiko/GBA/DS)
  • Asteroids (Flash)
  • Mario Bros (Flash)
  • Tetris (Flash)
  • IK+ (Flash)
  • PacMan (Java; Linux)
  • Space War (Java; web applet)
  • Earth Shaker (C++; DS/SDL)
  • Super Foul Egg (C++/Objective-C; DS/SDL/OSX/iOS)
  • BioShock sub game (C++; DS/SDL)
  • Space Invaders (C; DS/SDL/DC)
  • Chuckie Egg (C; DS/SDL/DC)

This is the list of remakes I started but abandoned:

  • World Grand Prix (Flash)
  • Trick or Treat (Java; web applet)
  • Atic Atac (C/C++; Cybiko/DS/SDL)
  • Defender (C/C++; DS)
  • Dan Dare (C; DS/SDL)

2016-10-13

HankyAlien v20161013

New releases of HankyAlien for Dreamcast and Nintendo DS!

Changes for Dreamcast are:

  • Added sound;
  • Fixed palette;
  • Improved performance;
  • Fixed layout for NTSC TVs.

Note that the sound doesn’t work at all well with lxdream.

Aside from the massive behind-the-scenes rewrite, there shouldn’t be anything noticeably different between this DS version and the last release.

2016-10-10

HankyAlienDC

A few more evenings of reorganising things for the Dreamcast and here’s HankyAlienDC:

HankyAlienDC

The archive contains a .elf version that will run in lxdreams, and presumably other emulators, and a .bin version that will run via an SD adaptor and DreamShell RC4. It will presumably run if you somehow burn it to a CD, but it’s been so long since I’ve made a CD of Dreamcast homebrew that I’ve completely forgotten how to do it. Plus I’m on a different OS and don’t have an optical drive any more.

There are still a few issues to be fixed:

  • The colors are wrong. The components in DS pixels are ordered ABGR; DC pixels are ordered ARGB.
  • Elements at the top and bottom of the screen are cropped out. The display is set to 320x240@60Hz, but of course NTSC TVs don’t display 240 vertical pixels; they display 200.
  • There’s no sound yet.

Here’s my Dreamcast test hardware, acquired from Amazon and eBay:

Dreamcast

Things I was surprised to learn about the Dreamcast:

  • There were multiple revisions of the hardware. The final US revision doesn’t support homebrew because Sega removed support for the “Mil-CD” multimedia disc format that had enabled all piracy on the system. There was only one PAL revision, which did support Mil-CD, which explains why I’d never heard of incompatible hardware before.
  • On a modern flatscreen TV - or at least, on my TV - the quality of the composite video output is awful. Really, really awful.
  • The performance of the emulator isn’t at all representative of the performance of the hardware.

2016-10-05

Hanky Alien Refactored

I’ve been toying with the idea of writing another homebrew DS game using the libraries I developed for Hanky Alien. Unfortunately the existing code has one considerable flaw: the game logic is intimately entwined with the rendering system, to the point that writing a new game unavoidably involves rewriting a bunch of graphics code that really should have been separate.

To fix this I’ve pulled apart Hanky Alien and reassembled it into three distinct pieces:

  • A layering library that handles all of the rect stuff I insist on using instead of just writing a sprite simulator for SDL like any sensible person;
  • An MVC GUI framework with the components necessary to get the game running (view hierarchy, controllers, timers, label views, bitmap views, event-based stylus and button handling, transitions, etc);
  • The game itself.

The game logic is currently included in the same build target as its presentation layer, but those can be trivially separated once I get around to it.

The new structure has a multitude of benefits. In order to place text on the screen, the code no longer needs to figure out where the text will go, figure out if it needs to erase what was previously at those co-ordinates, and then render the text; it just updates the string in a label view and lets the underlying libraries do all of the hard work. There’s a complete separation between the game logic and all I/O. Timer events piped down from the controller kick the game’s update logic, and game events (such as movement, or the addition or removal of objects within the game) are sent up to the controller via delegate callbacks. The controller is responsible for creating, moving, updating and removing views. I can change the game logic without having to think about any rendering. Each scene in the game - the “Simian Zombie” scene, the title scene, and the game scene itself - are all separate controllers. The GUI framework handles animating the transition from one controller to the next, which means a bunch of complex custom state machine logic became redundant. On top of that, the whole shebang is written in pure C.

All of this structure came with a horrible cost, however. Performance was nowhere near acceptable.

One huge benefit of manually managing the layout and redraw of the screen means each situation can be as efficient as possible. A generic approach to updating the framebuffer means there’s no opportunity for taking shortcuts that make sense in one place but would break other code. This forced me to dig through the entire codebase and try to optimize everything.

First step: profiling with Instruments. Hanky Alien builds happily using SDL2, so I can profile it on my Mac. It doesn’t necessarily give me accurate information, obviously; the Mac and DS have wildly different architectures and capabilities. It does give me a starting point, though.

One mistake that quickly became apparent was the amount of boxing and unboxing of rect structs that the layering library was performing. Subtracting one rectangle from another produces an array of leftover rectangles, and those were returned in an array. The array provided by the core library would only store objects, so I’d opted for the naive approach of boxing the leftover rect structs into value objects and sticking those in an array. Unfortunately each boxing operation required a malloc(), and any attempt at using the data in the objects required them to be unboxed back into structs first. The fix for this was easy: create a new array class specifically for storing collections of rect structs. As long as a resize isn’t required - which inevitably demands a realloc() call - appending rects involves nothing more than changing an int and copying a struct.

Creating those arrays was also slow, particularly when multiple rectangles are divided in a tight loop. The code used to look something like this:

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectArrayRef a = SZRectCreateArrayBySubtractingRect(rect, r);

    // Do stuff with a

    SZRectArrayRelease(a);
}

It now looks like this:

static SZRectArrayRef a = NULL;

if (!a) {
    a = SZRectArrayCreate();
}

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The “remove all” function just sets the array’s size property to 0. We create a single static array and re-use it in each call of the function and each iteration of the loop. We can get away with this because all of the code is single-threaded.

I’ve been unrolling loops. The code above does a bunch of work in the loop that we could unroll. We can move the variable declarations out of the loop and stop querying the array length on each iteration:

int rectCount = SZRectArrayLength(rectsToDivide);
SZRect r;

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

We’re passing structs around by value; we should probably stop doing that too:

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(&rect, &r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The game logic itself had some inefficiencies. Something I’d annotated with a “todo” in the original game was the alien collision code. In order to check if the player’s bullets were colliding with an alien, the code looped over all aliens in the game. The new version treats each column of aliens as a unit and checks collisions with those before checking against the aliens themselves.

The code uses a huge number of rect calculations, so those needed to be as fast as possible. All of the functions had code to cater for rects with a negative width or height. That would never be possible with the data coming out of the various libraries, so I added a bunch of “fast” versions that didn’t do any sanity checking.

The rendering code now uses Cearn’s ARM assembly copy functions for blitting stuff to the framebuffer.

The code is much, much faster than it used to be, but it’s not enough to throw around as many bullets as the older version.

Some of the exciting failures were:

  • Flattening the view hierarchy into a single view and manually handling a bunch of the redrawing logic. Initially it seemed like this would lead to a massive speed boost, but as the redrawing code got more and more complex the game got slower and slower.
  • Replacing bitmaps with solid blocks of color to see if the DMA would help.
  • Using quadtrees to help partition the space and reduce the amount of collision detection being done.
  • Creating my own profiler to record the execution time of various functions.

My final idea was to ditch the DS completely and switch platforms to something with more CPU power. Requirements for the new platform:

  • Must be a console or handheld (rules out anything with a keyboard or without a joypad).
  • An easily-installed homebrew dev kit with great C support (rules out pretty much everything prior to 32 bit).
  • A faster CPU than the DS’ 66MHz ARM9 (no GBA).
  • No fighting firmware upgrades (so long, Vita/3DS/Wii U).
  • Well-designed input devices (no GPH devices).
  • Must be easy to test on the real device.
  • Must have a reasonably capable OSX emulator.

By my estimation, that leaves three potential devices: the PSP, the Wii, and the Dreamcast. I’ve had a look at the dev kits for the PSP and Wii and neither one really seemed to fit with the archaic software rendering that I’m trying to do. The Dreamcast, though, appears to have a wonderfully simplistic framebuffer graphics mode and even supports DS-style RGB555-encoded pixel data. It could use some more homebrew.

One trip to Amazon later and a new (old) NTSC Dreamcast is winging its way to me! After a few evenings of tinkering I managed to get this running in lxdream:

HankyAlien-Dreamcast

Unfortunately, despite the Dreamcast being fantastically more capable on paper (200MHz CPU, 16MB RAM, 8MB VRAM vs the DS’ 66MHz CPU, 4MB RAM and 656KB VRAM), it turns out that in terms of raw CPU performance there’s not a whole lot between them. On a more positive note, programming the Dreamcast looks like it’ll be fun, so once this refactoring project is complete I think I’ll try writing something exclusively for it.