Woopsi Updates and DMA Mayhem

I’ve been on a graphics and refactoring kick recently, and the latest set of changes reflects that. I’ve been ripping out, refactoring and bugfixing graphics code throughout Woopsi.

The Rect struct, which describes a rectangle (x/y/width/height), is used all over the Woopsi code. However, it was nested within the Gadget class, which made using it intensely annoying. Any attempt to create a Rect had to use the fully-qualified Gadget::Rect name (or WoopsiUI::Gadget::Rect if working outside of the WoopsiUI namespace). To fix this, I have moved the Rect struct into a separate header file. Rects can now be created simply by using the typename “Rect”. Much better.
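Here is a rough sketch of a free-standing Rect once it is no longer nested inside Gadget. It is not Woopsi’s actual header. The field types are assumed, and rectContains() is a hypothetical helper that exists only to show the short Rect name in use.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of Rect at namespace scope, e.g. in its own header, instead of
// nested inside Gadget. Field types are assumptions for illustration.
namespace WoopsiUI {

struct Rect {
    std::int16_t x;       // left edge
    std::int16_t y;       // top edge
    std::int32_t width;   // width in pixels
    std::int32_t height;  // height in pixels
};

}  // namespace WoopsiUI

// Code outside the namespace now needs only WoopsiUI::Rect (or this using
// declaration) rather than WoopsiUI::Gadget::Rect.
using WoopsiUI::Rect;

// Hypothetical helper: true if (px, py) lies inside the rectangle.
inline bool rectContains(const Rect& rect, std::int32_t px, std::int32_t py) {
    assert(rect.width >= 0 && rect.height >= 0);
    return px >= rect.x && px < rect.x + rect.width &&
           py >= rect.y && py < rect.y + rect.height;
}
```

Because the struct now sits at namespace scope, a declaration such as “Rect r = {10, 20, 100, 50};” compiles wherever the header is included.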

The SuperBitmap class included facade methods for drawing to the internal bitmap. For example, it had a “drawText()” method that would just call “_bitmap->drawText()”. Since the bitmap no longer includes drawing methods, this became “_graphics->drawText()”, where “_graphics” is a pointer to a Graphics object that can draw to the bitmap. However, this makes the SuperBitmap class more cumbersome than it needs to be. Why not simply expose the Graphics object and get rid of the facade methods?

I’ve now done this. Drawing to a SuperBitmap used to work like this (semi-pseudocode):

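SuperBitmap* bitmap = new SuperBitmap();
bitmap->drawText(x, y, font, text);     // facade method that forwards to _graphics->drawText()

It now works like this:

SuperBitmap* bitmap = new SuperBitmap();
Graphics* gfx = bitmap->getGraphics();  // accessor name assumed
gfx->drawText(x, y, font, text);        // draw directly via the exposed Graphics object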

The examples directory contains two new demos: bitmapdrawing and gadgetdrawing. They’re almost identical, except the first draws to a bitmap (displayed via a SuperBitmap gadget), whilst the second draws directly to an AmigaWindow. The first is, therefore, a demo of how to do persistent drawing, whilst the second is a demo of how to do non-persistent (but mildly faster) drawing.

Whilst writing these examples I came across a number of bugs. Some of them were created whilst consolidating the Bitmap and GraphicsPort drawing methods into a new hierarchy, but some have been around for a while. The GraphicsPort::drawPixel() and drawXORPixel() methods now both clip correctly. The GraphicsPort::drawLine() method now draws to the correct framebuffer (it previously only drew to the bottom framebuffer). Graphics::drawBitmap() now clips correctly if the co-ordinates for the destination exceed the size of the destination bitmap, instead of crashing as it did previously.
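To make “clip correctly” concrete, this sketch shows the kind of test a clipped pixel write needs. A pixel is written only if it falls inside both the clipping rectangle and the buffer. Off-screen co-ordinates are ignored instead of overwriting other memory. ClipRect and PixelBuffer are hypothetical stand-ins, not Woopsi’s GraphicsPort code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical clipping rectangle in bitmap co-ordinates.
struct ClipRect {
    int x, y, width, height;
};

// Hypothetical stand-in for a drawable surface (not Woopsi's GraphicsPort).
struct PixelBuffer {
    int width;
    int height;
    std::vector<std::uint16_t> pixels;

    PixelBuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {
        assert(w > 0 && h > 0);
    }

    // True if (x, y) lies inside both the clip rect and the buffer itself.
    bool visible(int x, int y, const ClipRect& clip) const {
        const bool inClip = x >= clip.x && y >= clip.y &&
                            x < clip.x + clip.width && y < clip.y + clip.height;
        const bool inBuffer = x >= 0 && y >= 0 && x < width && y < height;
        return inClip && inBuffer;
    }

    // Both draw methods return false, and write nothing, when clipped.
    bool drawPixel(int x, int y, std::uint16_t colour, const ClipRect& clip) {
        if (!visible(x, y, clip)) return false;
        pixels[static_cast<std::size_t>(y) * width + x] = colour;
        return true;
    }

    bool drawXORPixel(int x, int y, std::uint16_t colour, const ClipRect& clip) {
        if (!visible(x, y, clip)) return false;
        pixels[static_cast<std::size_t>(y) * width + x] ^= colour;
        return true;
    }
};
```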

Lastly, I’ve been trying to fix a long-standing problem with the DS’ data cache and its interaction with the DMA hardware. Here’s what happens when the ARM9 tries to access a piece of data:

  • ARM9 will attempt to read the data cache;
    • If data found, ARM9 will continue as normal;
    • If no data found, ARM9 will read main memory.

Now here’s what happens when the DMA hardware is used:

  • DMA will read data from main memory;
  • DMA will write data to main memory.

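The DMA hardware cannot see the cache. If the ARM9 has written data that is still sitting in the cache, the DMA will copy the stale version from main memory. Conversely, if the DMA hardware changes main memory but that memory is cached, the ARM9 will read the (outdated) cache instead of main memory.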

Using the DMA therefore requires that the cache is correctly updated like this:

  • Write cache to main memory;
  • Use DMA to copy from main memory to main memory;
  • Mark the cache as invalid so the ARM9 fetches new data from main memory.
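The three steps above can be illustrated with a toy model. It is written as a host-side C++ simulation rather than DS code. It gives each memory word its own cache entry, whereas a real ARM9 cache works in whole lines and evicts entries. Its flush(), dmaCopy() and invalidate() members are hypothetical names for the three steps. On the DS itself, the first and third steps correspond to libnds’s DC_FlushRange() and DC_InvalidateRange().

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy write-back cache between the CPU and main memory, plus a DMA engine
// that only ever touches main memory. Deliberately simplified.
struct ToySystem {
    static constexpr std::size_t kWords = 16;
    std::array<std::uint32_t, kWords> memory{};  // main RAM
    std::array<std::uint32_t, kWords> cache{};   // cached copy of each word
    std::array<bool, kWords> valid{};            // is a cached copy present?
    std::array<bool, kWords> dirty{};            // is the cached copy newer than RAM?

    // CPU read: use the cache if possible, otherwise fill it from RAM.
    std::uint32_t cpuRead(std::size_t i) {
        if (!valid[i]) { cache[i] = memory[i]; valid[i] = true; dirty[i] = false; }
        return cache[i];
    }

    // CPU write: write-back cache, so RAM is not updated yet.
    void cpuWrite(std::size_t i, std::uint32_t value) {
        cache[i] = value; valid[i] = true; dirty[i] = true;
    }

    // Step 1: write dirty cached data back to RAM.
    void flush(std::size_t start, std::size_t count) {
        for (std::size_t i = start; i < start + count; ++i) {
            if (valid[i] && dirty[i]) { memory[i] = cache[i]; dirty[i] = false; }
        }
    }

    // Step 2: DMA copies RAM to RAM and never looks at the cache.
    void dmaCopy(std::size_t src, std::size_t dst, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) memory[dst + i] = memory[src + i];
    }

    // Step 3: drop cached copies so the next CPU read refetches from RAM.
    void invalidate(std::size_t start, std::size_t count) {
        for (std::size_t i = start; i < start + count; ++i) { valid[i] = false; dirty[i] = false; }
    }
};
```

In this model, skipping flush() lets the DMA copy stale data out of main memory. Skipping invalidate() leaves the CPU reading its stale cached copy of the destination.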

Woopsi was only performing the first of these three actions, and it wasn’t doing so consistently. The result of this is that, no matter what I do, the last 4 pixels of every bitmap I attempt to blit to another bitmap are not drawn. Some research led me to two useful sources: a GBADEV topic and a blog post from cearn on coranac.com.

Cearn gives the code for replacement copy and fill functions that perform a variety of checks, such as ensuring that the cache is written to RAM before trying to copy and checking that the copy is working with legal data.

However, replacing Woopsi’s existing DMA code with cearn’s causes a very nasty and very immediate crash. I initially assumed that his solution was incorrect. From the information in the GBADEV forum, it seems that all calls to DC_FlushRange() and DC_InvalidateRange() must operate on memory aligned to 32-byte (cache-line) boundaries, and his functions neither check nor enforce this.

I wrote replacements that took the most useful parts of cearn’s code and mixed in some alignment-enforcing jiggery-pokery to ensure that all cache handling is done on the correct boundaries. This, however, failed in exactly the same way. The call to DC_InvalidateRange() kills Woopsi dead before it even appears on screen. Remove it, and everything works, except that those last damned pixels still aren’t drawn.
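The alignment fix amounts to rounding the start of each range down to a cache-line boundary and its end up to one. Here is a minimal sketch of that arithmetic. It assumes the 32-byte line size of the DS’s ARM9 data cache. The names are hypothetical, and this is not the code Woopsi uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size of the ARM9 data cache (32 bytes).
constexpr std::uintptr_t kCacheLine = 32;

// Hypothetical result type: a range that starts and ends on line boundaries.
struct AlignedRange {
    std::uintptr_t start;  // first byte, rounded down to a line boundary
    std::size_t size;      // length, chosen so the end lands on a boundary
};

inline AlignedRange alignToCacheLines(std::uintptr_t address, std::size_t size) {
    const std::uintptr_t mask = kCacheLine - 1;
    const std::uintptr_t start = address & ~mask;               // round start down
    const std::uintptr_t end = (address + size + mask) & ~mask;  // round end up
    return AlignedRange{start, static_cast<std::size_t>(end - start)};
}
```

One caveat: rounding outwards means an invalidate also covers whatever else shares the first and last lines. Any unflushed writes to those neighbouring bytes are discarded along with the stale data.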

Some more research on GBADEV threw up this thread, which concluded that memcpy() is actually faster than the DMA when working with main memory, whilst the DMA is faster when working with VRAM. This does make sense. VRAM is uncached, so memcpy() gains nothing from the cache and has to access VRAM directly for every read and write. The DMA, on the other hand, does not need the cache to be flushed before it can see the latest state. The situation is reversed when dealing with main memory: memcpy() may be able to use the cache, whilst the cache must be flushed before the DMA can do its job. Using the DMA with main memory, therefore, will always result in cache misses somewhere along the line.

It then occurred to me that I could write a function that would use the CPU (memcpy() or a plain loop) when working with main memory and the DMA code when working with VRAM. The most obvious way to tackle this is to check whether the source and destination pointers fall within the framebuffer address space. If so, use the DMA. If not, default to memcpy() instead. Some fiddling later, I have copy and fill methods that are theoretically faster than the original macros borrowed from PALib and that draw everything correctly.
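The address-based dispatch could look something like this. The sketch assumes that a copy only goes to the DMA when both buffers lie inside the given window. It takes the window bounds and the DMA routine as parameters so the logic can be exercised away from the DS. AddressWindow, DmaCopyFn and smartCopy() are all hypothetical names. On the hardware, the window would be the framebuffer address range and the hook would wrap the real DMA copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical address window treated as "VRAM" (half-open range).
struct AddressWindow {
    std::uintptr_t begin;  // first address inside the window
    std::uintptr_t end;    // one past the last address

    // True if [ptr, ptr + bytes) lies entirely inside the window.
    bool contains(const void* ptr, std::size_t bytes) const {
        const auto p = reinterpret_cast<std::uintptr_t>(ptr);
        return p >= begin && p <= end && bytes <= end - p;
    }
};

// Hypothetical hook for the hardware DMA copy.
using DmaCopyFn = void (*)(const void* src, void* dst, std::size_t bytes);

enum class CopyPath { Dma, Memcpy };

inline CopyPath smartCopy(const void* src, void* dst, std::size_t bytes,
                          const AddressWindow& vram, DmaCopyFn dma) {
    if (vram.contains(src, bytes) && vram.contains(dst, bytes)) {
        dma(src, dst, bytes);      // uncached VRAM: no cache maintenance needed
        return CopyPath::Dma;
    }
    std::memcpy(dst, src, bytes);  // main memory: let the CPU use its cache
    return CopyPath::Memcpy;
}
```

A fill routine could follow the same pattern, with a plain loop or std::fill standing in for memcpy() on the main-memory path.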

I’m considering removing external access to the mutable u16 array inside bitmaps that inherit from the MutableBitmapBase class. This would allow the FrameBuffer class to be a wrapper around an SDL screen (conditional compilation shenanigans ahoy) and remove the double-copying bottleneck that currently exists. It would also make Woopsi almost immediately portable to any handheld with a touchscreen and an SDL port.

To this end, all mutable bitmaps now have a blit() method, which can use the DMA hardware to copy a u16 array to a specific co-ordinate within the bitmap, and a blitFill() method, which can use the DMA hardware to fill memory with a specified colour. Other than speed, there is now no reason for any object to access the non-const raw bitmap array within a Bitmap class.

Related to those methods, bitmaps also have a getData() method that will return a const pointer to the internal bitmap array at a specific set of co-ordinates. Instead of mucking about trying to calculate where rows of pixels start within the array, it’s now possible to just ask a bitmap to give you a pointer straight to any pixel.
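Here is a rough sketch of a bitmap that hides its pixel array behind blit(), blitFill() and a const getData(x, y). The method names come from the post, but the signatures are assumptions. std::copy and std::fill_n stand in for the DMA hardware so the example runs anywhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mutable bitmap: the pixel array itself is never exposed
// as non-const.
class MutableBitmap {
public:
    MutableBitmap(int width, int height)
        : _width(width), _height(height),
          _data(static_cast<std::size_t>(width) * height, 0) {}

    // Copy a packed width x height block so its top-left corner lands at (x, y).
    void blit(int x, int y, const std::uint16_t* src, int width, int height) {
        assert(x >= 0 && y >= 0 && x + width <= _width && y + height <= _height);
        for (int row = 0; row < height; ++row) {
            const std::uint16_t* from = src + static_cast<std::size_t>(row) * width;
            std::copy(from, from + width,
                      _data.begin() + static_cast<std::ptrdiff_t>(index(x, y + row)));
        }
    }

    // Fill `count` consecutive pixels, starting at (x, y), with one colour.
    void blitFill(int x, int y, std::uint16_t colour, std::size_t count) {
        const std::size_t start = index(x, y);
        assert(start + count <= _data.size());
        std::fill_n(_data.begin() + static_cast<std::ptrdiff_t>(start), count, colour);
    }

    // Read-only pointer straight to the pixel at (x, y).
    const std::uint16_t* getData(int x, int y) const { return _data.data() + index(x, y); }

private:
    // Rows are stored one after another, so (x, y) lives at y * width + x.
    std::size_t index(int x, int y) const { return static_cast<std::size_t>(y) * _width + x; }

    int _width;
    int _height;
    std::vector<std::uint16_t> _data;
};
```

Because rows are stored contiguously, adding the bitmap’s width to getData(x, y) gives the pixel directly below (x, y).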


Woopsi, Bitmaps and Python 3

Back when I was programming mediocre games in AMOS, one of my friends decided he wanted to code too. I introduced him to the basics and he quickly advanced way beyond me. Whilst I was struggling to get simple things working in BASIC he was writing demo effects - fire, spinning 3D environments, toruses - in Pascal and C. One of the most impressive programs that he wrote, at least to me, was a BMP loader. I had no idea how to go about writing anything like that. I put it down to two things:

  • BASIC, and the way it hides all implementation details from the coder to the point that he finds himself doing nothing but bolting together existing library functions (i.e. “loadBMP()”, “saveBMP()”, “makeAnAmazingGame()”);
  • The total lack of any useful books on coding where I grew up.

You could point out that if he could learn with no books, I could too, and he’s probably just smarter than me. That’s probably true.

Anyhoo, having just spent a month manually serialising objects for network transmission in C++ (exciting new project to be released soon), I decided it was time to write my own BMP loader for Woopsi.

Instead of leaping straight into the C++ code, I figured I’d make a prototype in Python first:


This is a Python 3 class that can load BMP files in 1, 4, 8 or 24-bit colour depths. 16-bit support would be fairly simple to add, but I didn’t have a BMP in the right depth whilst writing this. It can only load uncompressed bitmaps. Finally, it supports all versions of the BMP DIB header, but it treats V4 and V5 as V3 and discards the extra data.
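The header handling described above can be sketched in C++, the language Woopsi itself uses. The sketch assumes the standard little-endian BMP layout: a 14-byte file header, then a DIB header whose size field identifies its version (12 for the old core header, 40 for V3, 108 for V4, 124 for V5). BmpInfo, parseBmpHeader() and bmpRowStride() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the header fields a loader needs.
struct BmpInfo {
    std::uint32_t dataOffset = 0;   // where the pixel array starts
    std::uint32_t headerSize = 0;   // 12 (core), 40 (V3), 108 (V4), 124 (V5)
    std::int32_t width = 0;
    std::int32_t height = 0;        // positive means rows are stored bottom-up
    std::uint16_t bitsPerPixel = 0;
    std::uint32_t compression = 0;  // 0 means uncompressed
};

// Little-endian readers.
inline std::uint32_t readU16(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint32_t>(b[at]) | (static_cast<std::uint32_t>(b[at + 1]) << 8);
}

inline std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t at) {
    return readU16(b, at) | (readU16(b, at + 2) << 16);
}

// Returns false for anything this sketch does not handle (including compression).
inline bool parseBmpHeader(const std::vector<std::uint8_t>& bytes, BmpInfo& info) {
    if (bytes.size() < 26 || bytes[0] != 'B' || bytes[1] != 'M') return false;
    info.dataOffset = readU32(bytes, 10);
    info.headerSize = readU32(bytes, 14);
    if (info.headerSize == 12) {
        // Old core header: 16-bit width/height and no compression field.
        info.width = static_cast<std::int32_t>(readU16(bytes, 18));
        info.height = static_cast<std::int32_t>(readU16(bytes, 20));
        info.bitsPerPixel = static_cast<std::uint16_t>(readU16(bytes, 24));
        info.compression = 0;
    } else if (info.headerSize >= 40) {
        // V3 fields; V4/V5 extras beyond them are simply ignored.
        if (bytes.size() < 14 + 40) return false;
        info.width = static_cast<std::int32_t>(readU32(bytes, 18));
        info.height = static_cast<std::int32_t>(readU32(bytes, 22));
        info.bitsPerPixel = static_cast<std::uint16_t>(readU16(bytes, 28));
        info.compression = readU32(bytes, 30);
    } else {
        return false;
    }
    return info.compression == 0;
}

// Bytes per pixel row: BMP pads every row to a multiple of 4 bytes.
inline std::uint32_t bmpRowStride(std::uint32_t width, std::uint16_t bitsPerPixel) {
    return ((width * bitsPerPixel + 31) / 32) * 4;
}
```

The row padding is why a 3-pixel-wide 24-bit image uses 12 bytes per row rather than 9.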

Once that was written, getting a C++ version created was simple. Woopsi now has two new classes in the “bonus” folder:

  • BinaryFile, which can read and write binary files (works with both the SDL and DS libfat versions, as libfat is POSIX compliant for files);
  • BitmapIO, which can load and save BMP files.

The Woopsi version is rather more limited than the Python version. It will only load and save 24-bit bitmaps. It can only deal with version 3 of the DIB header (the most popular BMP format, as far as I can tell). It will only load uncompressed bitmaps.
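For 24-bit files, loading and saving mostly comes down to converting between BMP’s blue-green-red byte triples and the DS’s 16-bit colours. The sketch assumes the usual DS layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14), with bit 15 set for opaque pixels. The function names are hypothetical and are not BitmapIO’s API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 24-bit BMP pixel (blue, green, red bytes) to a 15-bit DS colour.
inline std::uint16_t bgrToDs(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    // Keep the top 5 bits of each 8-bit channel and set bit 15.
    return static_cast<std::uint16_t>((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10) | (1 << 15));
}

// 15-bit DS colour to BMP byte order: {blue, green, red}.
inline std::array<std::uint8_t, 3> dsToBgr(std::uint16_t colour) {
    // Expand 5 bits to 8 by repeating the top bits, so 31 maps to 255.
    auto expand = [](std::uint16_t c) { return static_cast<std::uint8_t>((c << 3) | (c >> 2)); };
    return {expand((colour >> 10) & 31), expand((colour >> 5) & 31), expand(colour & 31)};
}

// Pack one row of DS pixels into BMP order, padded to a multiple of 4 bytes.
inline std::vector<std::uint8_t> packBmpRow(const std::vector<std::uint16_t>& row) {
    std::vector<std::uint8_t> out;
    for (std::uint16_t colour : row) {
        const auto bgr = dsToBgr(colour);
        out.insert(out.end(), bgr.begin(), bgr.end());
    }
    while (out.size() % 4 != 0) out.push_back(0);  // BMP row padding
    return out;
}
```

BMP rows are normally stored bottom-up as well, so a saver would write the bitmap’s last row first.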

Usage is like so:

// Loading
Bitmap* bitmap = BitmapIO::loadBMP("/mypicture.bmp");

// Saving

Writing the BMP code highlighted a number of bugs in other parts of Woopsi, all of which are now fixed. The Bitmap::getPixel() method didn’t work for bitmaps larger than 65535 pixels. The Bitmap::drawBitmap() method wasn’t calling DC_FlushRange() before trying to DMA copy the source bitmap, leading to rows of black pixels. The SuperBitmap class also now has an overload of drawBitmap() that accepts a Bitmap object.
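The post doesn’t say what caused the getPixel() bug. One classic way to hit a 65535-pixel limit is to store the y * width + x offset in a 16-bit variable, which wraps once the offset passes 65535. These functions are a hypothetical illustration of that failure mode, not Woopsi’s code.

```cpp
#include <cassert>
#include <cstdint>

// Offset stored in 16 bits: silently wraps modulo 65536 on large bitmaps.
inline std::uint16_t pixelIndex16(std::uint16_t x, std::uint16_t y, std::uint16_t width) {
    return static_cast<std::uint16_t>(y * width + x);
}

// Offset stored in 32 bits: wide enough for any bitmap that fits in memory.
inline std::uint32_t pixelIndex32(std::uint16_t x, std::uint16_t y, std::uint16_t width) {
    return static_cast<std::uint32_t>(y) * width + x;
}
```

On a 300-pixel-wide bitmap, pixel (10, 250) lives at offset 75010, but the 16-bit version returns 9474 and so reads the wrong pixel.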

Lastly, there’s a new example program demonstrating the BitmapIO functionality.


More Bitmaps and Calendar Fixes

The Bitmap class introduced in the last post is now part of the SuperBitmap. The SuperBitmap class has retained the same API, but it is now just a facade wrapped around an instance of the Bitmap class, which does all of the hard work.
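The arrangement is plain composition. The gadget-side class keeps its public drawing API, but it owns a Bitmap and forwards each call to it. Both classes here are simplified, hypothetical stand-ins rather than Woopsi’s real Bitmap and SuperBitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the class that does the real work.
class Bitmap {
public:
    Bitmap(int width, int height)
        : _width(width), _pixels(static_cast<std::size_t>(width) * height, 0) {}
    void setPixel(int x, int y, std::uint16_t colour) { _pixels[offset(x, y)] = colour; }
    std::uint16_t getPixel(int x, int y) const { return _pixels[offset(x, y)]; }

private:
    std::size_t offset(int x, int y) const { return static_cast<std::size_t>(y) * _width + x; }
    int _width;
    std::vector<std::uint16_t> _pixels;
};

// Simplified facade: same public API as before, every call forwarded.
class SuperBitmap {
public:
    SuperBitmap(int width, int height) : _bitmap(width, height) {}
    void drawPixel(int x, int y, std::uint16_t colour) { _bitmap.setPixel(x, y, colour); }
    std::uint16_t getPixel(int x, int y) const { return _bitmap.getPixel(x, y); }

private:
    Bitmap _bitmap;  // owned instance; no inheritance involved
};
```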

I’ve made a couple of amendments to the Calendar and Date classes, too. The Date class now has its equals and not-equals operators overloaded, whilst the Calendar now shows Monday as the first column (instead of Sunday). It also handles clicks on buttons correctly if that button represents the currently-selected day in a different month/year.
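Both calendar changes are small. The sketch assumes a Date made of day/month/year integers, with weekdays numbered from 0 (Sunday) to 6 (Saturday). These are illustrative choices, not necessarily how Woopsi’s Date class stores them.

```cpp
#include <cassert>

// Hypothetical minimal date value.
struct Date {
    int day;
    int month;
    int year;
};

// Two dates are equal only if every field matches.
inline bool operator==(const Date& a, const Date& b) {
    return a.day == b.day && a.month == b.month && a.year == b.year;
}

inline bool operator!=(const Date& a, const Date& b) { return !(a == b); }

// weekday: 0 = Sunday ... 6 = Saturday. Returns 0 for Monday ... 6 for Sunday.
inline int mondayFirstColumn(int weekday) { return (weekday + 6) % 7; }
```

Adding 6 and taking the remainder mod 7 moves Sunday from the first column to the last. Every other day moves one column to the left.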


Cursors and Bitmaps

Two changes today. I’ve optimised the MultiLineTextBox’s cursor drawing routines. It’s not as fast as it could be - the fastest way to draw it would probably be to inject it into the main drawText() routine - but as that class changes so much I think it’s best to keep it readable at this stage.

Secondly, I’ve replaced the Bitmap struct with a Bitmap class. The struct wasn’t being used anywhere. The class includes all of the drawing functionality previously only available in the SuperBitmap, so it is now possible to create and manipulate bitmaps entirely in memory without an associated gadget. This should, I think, answer one of Jeff’s earlier queries. I’ll get around to stripping the code from the SuperBitmap and replacing it with the Bitmap class later. Not sure if the SuperBitmap is going to inherit from the Bitmap or just contain an instance of it yet.