I’ve been on a graphics and refactoring kick recently, and the latest set of changes reflects that. I’ve been ripping out, refactoring and bugfixing graphics code throughout Woopsi.
The Rect struct, which describes a rectangle (x/y/width/height), is used all over the Woopsi code. However, it was nested within the Gadget class, which made using it intensely annoying. Any attempt to create a Rect had to use the fully-qualified Gadget::Rect name (or WoopsiUI::Gadget::Rect if working outside of the WoopsiUI namespace). To fix this, I have moved the Rect struct into a separate header file. Rects can now be created simply by using the typename “Rect”. Much better.
The SuperBitmap class included facade methods for drawing to the internal bitmap. For example, it had a “drawText()” method that would just call “_bitmap->drawText()“. Since the bitmap no longer includes drawing methods, this became “_graphics->drawText()“. “_graphics” is a pointer to a Graphics object that can draw to the bitmap. However, this means that the SuperBitmap class is more cumbersome than it needs to be. Why not simply expose the Graphics object and get rid of the facade methods?
I’ve now done this. Drawing to a SuperBitmap used to work like this (semi-pseudocode):
SuperBitmap* bitmap = new SuperBitmap();
It now works like this:
SuperBitmap* bitmap = new SuperBitmap();
The examples directory contains two new demos: bitmapdrawing and gadgetdrawing. They’re almost identical, except the first draws to a bitmap (displayed via a SuperBitmap gadget), whilst the second draws directly to an AmigaWindow. The first is, therefore, a demo of how to do persistent drawing, whilst the second is a demo of how to do non-persistent (but mildly faster) drawing.
Whilst writing these examples I came across a number of bugs. Some of them were created whilst consolidating the Bitmap and GraphicsPort drawing methods into a new hierarchy, but some have been around for a while. The GraphicsPort::drawPixel() and drawXORPixel() methods both clip correctly. The GraphicsPort::drawLine() method draws to the correct framebuffer (it previously only drew to the bottom framebuffer). Graphics::drawBitmap() clips correctly if the co-ordinates for the destination exceed the size of the destination bitmap, instead of crashing as it did previously.
Lastly, I’ve been trying to fix a long-standing problem with the DS’ data cache and its interaction with the DMA hardware. Here’s what happens when the ARM9 tries to access a piece of data:
- ARM9 will attempt to read the data cache;
- If data found, ARM9 will continue as normal;
- If no data found, ARM9 will read main memory.
Now here’s what happens when the DMA hardware is used:
- DMA will read data from main memory;
- DMA will write data to main memory.
The DMA hardware cannot see the cache. Also, if the DMA hardware changes main memory but that memory is cached, the ARM9 will read the (outdated) cache instead of main memory.
Using the DMA therefore requires that the cache is correctly updated like this:
- Write cache to main memory;
- Use DMA to copy from main memory to main memory;
- Mark the cache as invalid so the ARM9 fetches new data from main memory.
Woopsi was only performing the first of these three actions, and it wasn’t doing so consistently. The result of this is that, no matter what I do, the last 4 pixels of every bitmap I attempt to blit to another bitmap are not drawn. Some research led me to two useful sources: a GBADEV topic and a blog post from cearn on coranac.com.
Cearn gives the code for replacement copy and fill functions that perform a variety of checks, such as ensuring that the cache is written to RAM before trying to copy and checking that the copy is working with legal data.
However, trying to replace Woopsi’s existing DMA code with cearn’s results in a very nasty and very immediate crash. I initially assumed that his solution was incorrect. From the information in the GBADEV forum it seems that all calls to DC_FlushRange() and DC_InvalidateRange() must be performed on memory that is aligned to 32-bit boundaries. His functions do not check or enforce this.
I wrote replacements that took the most useful parts of cearn’s code and mixed in some alignment-enforcing jiggery-pokery to ensure that all cache handling is done to the correct boundaries. This, however, failed in exactly the same way. The call to DC_InvalidateRegion() kills Woopsi dead before it even appears on screen. Remove this, and it works - except those last damned pixels still aren’t drawn.
Some more research on GBADEV threw up this thread, in which it is determined that memcpy() is actually faster than the DMA when working with main memory. DMA is faster when working with VRAM. This does make sense. VRAM is uncached, so memcopy() will always have to go to main memory to fetch data. The DMA, on the other hand, does not need the cache to be flushed before it can see the latest state. The situation is reversed when dealing with main memory. memcopy() may be able to use the cache, whilst the cache must be flushed before the DMA can do its job. Using the DMA with main memory, therefore, will always result in cache misses somewhere along the line.
It then occurred to me that I could write a function that would use a for-loop when working with main memory and the DMA code when working with VRAM. The most obvious way to tackle this is to check if the source and destination pointers fall within the framebuffer address space. If so, use the DMA. If not, default to memcopy() instead. Some fiddling later and I have copy and fill methods that are theoretically faster than the original macros borrowed from PALib and draw everything correctly.
I’m considering removing external access to the mutable u16 array inside bitmaps that inherit from the MutableBitmapBase class. This would allow the FrameBuffer class to be a wrapper around an SDL screen (conditional compilation shinnanegans ahoy) and remove the double-copying bottleneck that currently exists. This would make Woopsi almost immediately portable to any handheld with a touchscreen and an SDL port.
To this end, all mutable bitmaps now have a blit() method, that can use the DMA hardware to copy a u16 array to a specific co-ordinate within itself, and a blitFill() method, that can use the DMA hardware to fill memory with a specified colour. There is now no reason, other than speed, for any object to get access to the non-const raw bitmap array within a Bitmap class.
Related to those methods, bitmaps also have a getData() method that will return a const pointer to the internal bitmap array at a specific set of co-ordinates. Instead of mucking about trying to calculate where rows of pixels start within the array, it’s now possible to just ask a bitmap to give you a pointer straight to any pixel.