2018-07-06

Professor Sinister Update

Here’s a video of the latest build of Professor Sinister:

The most obvious change is the inclusion of the score/level/etc indicators at the top of the screen. Those are implemented as tiles in one of the GBA’s 4 background layers, giving us one layer for the text, one layer for the platforms, one layer for the parallax background, and one spare layer.

It wasn’t really noticeable in the previous video, but there was a little flickering at the top of the screen in both the sprites and the background tiles. It turns out that my belief that there was no performance work to do in the game engine was incorrect. Representing the vacuum tubes and power packs as “game objects” (essentially a data structure representing a sprite) instead of just inspecting tiles in the level data meant that the game had to iterate over them all when testing for collisions. That taxed the GBA’s tiny brain just a little too far.

The game just about got away with it until I decided to support larger levels with larger quantities of collectible items (demoed in the video above), at which point the horizontal tearing was too awful to ignore. I had to rewrite the egg/chicken feed code in the SZChuckie library, and then update both the Professor Sinister and the Chuckie Egg front-ends to match. Larger levels will let me take advantage of features of the engine that Chuckie Egg can’t use, such as arbitrary numbers of vacuum tubes, power packs, elevators and robots.

The game now has a “Simian Zombie” logo screen and horizontal scrolling text on its title screen (but it’s really inefficient so I’ll probably rewrite it), and a functioning “get ready”/“game over” screen.

I rewrote the way that the duck moves. The old code was pretty nasty and only managed to move the duck in one of 8 directions (horizontal, vertical, diagonal), whereas the new code uses vectors (correctly this time) to calculate a more direct route to the player. I still haven’t come up with a new design for the duck, though, so it’s not visible in Professor Sinister yet.

Speaking of design, here’s some more detail on Professor Sinister’s graphics. I originally wanted the robots in Professor Sinister to look like something from a 1950s sci fi movie: clunky and boxy, and covered in flashing lights. Their bodies would be separated into two distinct parts, so that when they changed direction the bottom half would spin 180˚ before the top swiveled around to match. I eventually ditched that idea because it would slow down the gameplay and would require far too many changes to the Chuckie Egg library.

I refined the pool of robotic inspirations down to just two: the robots from “Wallace and Gromit” and “Forbidden Planet”. In the former, I liked the little wheel that the robot scoots around on. In the latter I liked the bulbous design, segmented arms, and pincers.

The first design I came up with was very similar to the version that ended up in the game:

First Robot Design

It was taller and thinner, however, so that when I tried transferring it into Aseprite I quickly realized that it was entirely the wrong shape. The robot was too tall for the 16x16 sprite and didn’t take up enough horizontal space. At that point it seemed that I’d have to ditch the design entirely, because trying to squash the robot into a wider, more cylindrical shape (from Robo to Mobo) didn’t work at all.

I gave up on the robots and switched to Professor Sinister. I was aiming for a cross between Dr Robotnik from “Sonic the Hedgeghog” and Dr Wily from “Mega Man”. Most of what I draw ends up looking like something from “The Far Side”, so that’s where I started. I knew I wanted thick, opaque glasses, a round head and body, and a lab coat, but the lab coat never looked right and neither did the body:

First Professor Sinister Design

In pursuit of new ideas I briefly toyed with the idea of making the professor female, but I didn’t get very far with that. Instead I tried out an egg-shaped head (referencing Chuckie Egg) with feet directly attached to it, like Dizzy, and it looked pretty good:

Amended Professor Sinister Design

Once I had a sketch to start from, getting the character into Aseprite was easier than I’d expected. The only feedback I got on the sprite was “it looks too much like an egg”, so I added a ring of white hair around the top of his head to hide his eggy-ness.

“Rick Dangerous 2” provided much of the palette for the professor. Its graphics are wonderfully cartoony and vibrant. I’ve always wanted to make a game that looked like Rick 2. Borrowing colors from its palette seemed like an obvious first step.

With the professor finished I realized that I could keep the egg theme for the robots. I took the egg shape, flipped it upside down, and suddenly the design worked:

Amended Robot Design

It took some flipping between Aseprite and Acorn to get the sprites into a form that Grit could convert to C code. Grit of course utterly failed to produce correct palettes. I had to figure out which index Grit had assigned to each of the 60 or so colors used in the game and the set up the palette in code. Each minor edit to the graphics completely changed the layout of the palette, meaning more manual palette layout. I later discovered that although Grit can’t produce a palette from BMP, PNG or PCX files, it will correctly convert palettes from GIF files.

Testing out the graphics revealed a problem with the professor sprite. In Chuckie Egg, Harry’s feet are precisely aligned with the collision area between Harry and the floor. It’s possible to be pixel-perfectly accurate in moving him to the edge of a platform. In the new graphics I’d made the professor’s feet much larger and had completely lost that accuracy. I had to redraw the feet to line up with Harry’s.

The ladder uses some of the palette from Rick Dangerous 2, as do the background tiles (which were heavily influenced by some background tiles in one of the Rick games). The platforms are also similar to platforms in Rick 2.

I created the vacuum tube and power pack tiles, and the elevator sprite, entirely from scratch. I had a look at a couple of vacuum tubes on Google for inspiration, but otherwise I didn’t have any reference for these graphics.

As for the duck, I had a couple of ideas: a sphere (in reference to “Impossible Mission”) or a demonic robot face (like “Sinistar”). Then I remembered that it needs to be roughly the same shape as the duck or the collisions won’t work correctly, so I’m now considering some kind of robot pterodactyl thing.

The todo list doesn’t seem to be getting any shorter:

  • Add a title screen bitmap.
  • Write music for the title screen.
  • Write music that plays when Sinister is killed.
  • Create new graphics for the duck and its cage.
  • Replace the sound samples.
  • Add a high score screen.
  • Replace all level designs.
  • Add an attract mode.

2016-10-05

Hanky Alien Refactored

I’ve been toying with the idea of writing another homebrew DS game using the libraries I developed for Hanky Alien. Unfortunately the existing code has one considerable flaw: the game logic is intimately entwined with the rendering system, to the point that writing a new game unavoidably involves rewriting a bunch of graphics code that really should have been separate.

To fix this I’ve pulled apart Hanky Alien and reassembled it into three distinct pieces:

  • A layering library that handles all of the rect stuff I insist on using instead of just writing a sprite simulator for SDL like any sensible person;
  • An MVC GUI framework with the components necessary to get the game running (view hierarchy, controllers, timers, label views, bitmap views, event-based stylus and button handling, transitions, etc);
  • The game itself.

The game logic is currently included in the same build target as its presentation layer, but those can be trivially separated once I get around to it.

The new structure has a multitude of benefits. In order to place text on the screen, the code no longer needs to figure out where the text will go, figure out if it needs to erase what was previously at those co-ordinates, and then render the text; it just updates the string in a label view and lets the underlying libraries do all of the hard work. There’s a complete separation between the game logic and all I/O. Timer events piped down from the controller kick the game’s update logic, and game events (such as movement, or the addition or removal of objects within the game) are sent up to the controller via delegate callbacks. The controller is responsible for creating, moving, updating and removing views. I can change the game logic without having to think about any rendering. Each scene in the game - the “Simian Zombie” scene, the title scene, and the game scene itself - are all separate controllers. The GUI framework handles animating the transition from one controller to the next, which means a bunch of complex custom state machine logic became redundant. On top of that, the whole shebang is written in pure C.

All of this structure came with a horrible cost, however. Performance was nowhere near acceptable.

One huge benefit of manually managing the layout and redraw of the screen means each situation can be as efficient as possible. A generic approach to updating the framebuffer means there’s no opportunity for taking shortcuts that make sense in one place but would break other code. This forced me to dig through the entire codebase and try to optimize everything.

First step: profiling with Instruments. Hanky Alien builds happily using SDL2, so I can profile it on my Mac. It doesn’t necessarily give me accurate information, obviously; the Mac and DS have wildly different architectures and capabilities. It does give me a starting point, though.

One mistake that quickly became apparent was the amount of boxing and unboxing of rect structs that the layering library was performing. Subtracting one rectangle from another produces an array of leftover rectangles, and those were returned in an array. The array provided by the core library would only store objects, so I’d opted for the naive approach of boxing the leftover rect structs into value objects and sticking those in an array. Unfortunately each boxing operation required a malloc(), and any attempt at using the data in the objects required them to be unboxed back into structs first. The fix for this was easy: create a new array class specifically for storing collections of rect structs. As long as a resize isn’t required - which inevitably demands a realloc() call - appending rects involves nothing more than changing an int and copying a struct.

Creating those arrays was also slow, particularly when multiple rectangles are divided in a tight loop. The code used to look something like this:

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectArrayRef a = SZRectCreateArrayBySubtractingRect(rect, r);

    // Do stuff with a

    SZRectArrayRelease(a);
}

It now looks like this:

static SZRectArrayRef a = NULL;

if (!a) {
    a = SZRectArrayCreate();
}

for (int i = 0; i < SZRectArrayLength(rectsToDivide); ++i) {
    SZRect r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The “remove all” function just sets the array’s size property to 0. We create a single static array and re-use it in each call of the function and each iteration of the loop. We can get away with this because all of the code is single-threaded.

I’ve been unrolling loops. The code above does a bunch of work in the loop that we could unroll. We can move the variable declarations out of the loop and stop querying the array length on each iteration:

int rectCount = SZRectArrayLength(rectsToDivide);
SZRect r;

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(rect, r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

We’re passing structs around by value; we should probably stop doing that too:

for (int i = 0; i < rectCount; ++i) {
    r = SZRectArrayRectAtIndex(rectsToDivide, i);
    SZRectSubtractRectAndPopulateArray(&rect, &r, a);

    // Do stuff with a

    SZRectArrayRemoveAll(a);
}

The game logic itself had some inefficiencies. Something I’d annotated with a “todo” in the original game was the alien collision code. In order to check if the player’s bullets were colliding with an alien, the code looped over all aliens in the game. The new version treats each column of aliens as a unit and checks collisions with those before checking against the aliens themselves.

The code uses a huge number of rect calculations, so those needed to be as fast as possible. All of the functions had code to cater for rects with a negative width or height. That would never be possible with the data coming out of the various libraries, so I added a bunch of “fast” versions that didn’t do any sanity checking.

The rendering code now uses Cearn’s ARM assembly copy functions for blitting stuff to the framebuffer.

The code is much, much faster than it used to be, but it’s not enough to throw around as many bullets as the older version.

Some of the exciting failures were:

  • Flattening the view hierarchy into a single view and manually handling a bunch of the redrawing logic. Initially it seemed like this would lead to a massive speed boost, but as the redrawing code got more and more complex the game got slower and slower.
  • Replacing bitmaps with solid blocks of color to see if the DMA would help.
  • Using quadtrees to help partition the space and reduce the amount of collision detection being done.
  • Creating my own profiler to record the execution time of various functions.

My final idea was to ditch the DS completely and switch platforms to something with more CPU power. Requirements for the new platform:

  • Must be a console or handheld (rules out anything with a keyboard or without a joypad).
  • An easily-installed homebrew dev kit with great C support (rules out pretty much everything prior to 32 bit).
  • A faster CPU than the DS’ 66MHz ARM9 (no GBA).
  • No fighting firmware upgrades (so long, Vita/3DS/Wii U).
  • Well-designed input devices (no GPH devices).
  • Must be easy to test on the real device.
  • Must have a reasonably capable OSX emulator.

By my estimation, that leaves three potential devices: the PSP, the Wii, and the Dreamcast. I’ve had a look at the dev kits for the PSP and Wii and neither one really seemed to fit with the archaic software rendering that I’m trying to do. The Dreamcast, though, appears to have a wonderfully simplistic framebuffer graphics mode and even supports DS-style RGB555-encoded pixel data. It could use some more homebrew.

One trip to Amazon later and a new (old) NTSC Dreamcast is winging its way to me! After a few evenings of tinkering I managed to get this running in lxdream:

HankyAlien-Dreamcast

Unfortunately, despite the Dreamcast being fantastically more capable on paper (200MHz CPU, 16MB RAM, 8MB VRAM vs the DS’ 66MHz CPU, 4MB RAM and 656KB VRAM), it turns out that in terms of raw CPU performance there’s not a whole lot between them. On a more positive note, programming the Dreamcast looks like it’ll be fun, so once this refactoring project is complete I think I’ll try writing something exclusively for it.

2010-09-10

Performance Problems - Not Where You'd Expect

The new rendering system in Woopsi is coming along nicely. In most areas it increases rendering speed and reduces flicker. In one area, though, it hasn’t been an improvement at all. Scrolling has become a problem.

In the old system, scrolling regions were extremely quick, mainly because most of the scrolling is done by just using the DMA to copy the existing pixels to a new location. In the new system, however, scrolling is sluggish and annoying. It seems to take more than one frame to perform a single scroll operation, which results in the display lagging behind the stylus and the system appearing to be, well, a bit rubbish.

I’ve already made one major optimisation. Instead of asking every gadget to draw every damaged rectangle until there are no rectangles left (which results in dozens of pointless, complex clipping operations), Woopsi uses the hierarchy of the gadgets to its advantage. If a parent gadget does not intersect a damaged rectangle, there is no way for its children to intersect it either. In avoiding trying to draw those children we save a lot of time.

However, this did not solve the scrolling problem. So what could it be? More to the point, how can it have happened, as the scrolling code hasn’t actually changed?

I had a few guesses. Perhaps the scrolling operation was running more than once. Putting break points into the Xcode version of Woopsi quickly killed that idea. Perhaps I need to cache the contents of ScrollingPanels in some sort of off-screen bitmap - maybe it’s all the redrawing that is a problem. But it couldn’t be that - there weren’t any problems before. Perhaps the list of newly-revealed rectangles that need to be redrawn that is returned by the scrolling routine is inaccurate. Testing it overturned that idea.

So I turn to Xcode’s “Instruments”, something that I wish Visual Studio would copy. Running Woopsi through the CPU Sampler suggests that the only thing Woopsi is doing is creating Rect objects. Creating rectangles is making Woopsi’s scrolling rubbish? That makes no sense.

Except, perhaps it does.

All of the new rendering functions create WoopsiArrays of Rect objects. As the inital size of a WoopsiArray is 100, and as they store objects, not pointers to objects, that means that each function call to one of these methods creates hundreds of Rect objects. Worse, these functions are recursive. There are probably tens of thousands of Rect objects being created every time the screen redraws.

I tried switching to arrays of pointers-to-rect objects, but quickly got bogged down in recursive memory allocation hell. Instead, I altered the WoopsiArray class so that it accepts an initial reserved size parameter. In most cases, none of the methods needs more than 4 rects in a list, so I’ve gone with that as the initial size. I’ve also changed the way that the array class resizes itself; it now doubles in size every time it needs to allocate more space instead of increasing by 100 elements. I believe this is the behaviour of the STL’s vector class.

Surprisingly enough, these very minor changes fixed the scrolling problems and boosted the speed of Woopsi everywhere else. I’m not sure I can remember it being this quick.

It’s a very old programmer’s cliche, but it’s true. Performance problems aren’t always where you’d expect.