Woopsi Updates and DMA Mayhem

I’ve been on a graphics and refactoring kick recently, and the latest set of changes reflects that. I’ve been ripping out, refactoring and bugfixing graphics code throughout Woopsi.

The Rect struct, which describes a rectangle (x/y/width/height), is used all over the Woopsi code. However, it was nested within the Gadget class, which made using it intensely annoying. Any attempt to create a Rect had to use the fully-qualified Gadget::Rect name (or WoopsiUI::Gadget::Rect if working outside of the WoopsiUI namespace). To fix this, I have moved the Rect struct into a separate header file. Rects can now be created simply by using the typename “Rect”. Much better.

The SuperBitmap class included facade methods for drawing to the internal bitmap. For example, it had a “drawText()” method that would just call “_bitmap->drawText()”. Since the bitmap no longer includes drawing methods, this became “_graphics->drawText()”. “_graphics” is a pointer to a Graphics object that can draw to the bitmap. However, this means that the SuperBitmap class is more cumbersome than it needs to be. Why not simply expose the Graphics object and get rid of the facade methods?

I’ve now done this. Drawing to a SuperBitmap used to work like this (semi-pseudocode):

SuperBitmap* bitmap = new SuperBitmap();

It now works like this:

SuperBitmap* bitmap = new SuperBitmap();
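
As a minimal sketch of the difference between the two styles, using hypothetical stand-in classes rather than Woopsi's real ones (the class names, the getGraphics() accessor name and the drawText() signature are all assumptions made for illustration):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Graphics class: it records draw calls
// instead of touching a real bitmap.
class Graphics {
public:
    void drawText(int x, int y, const std::string& text) {
        (void)x;
        (void)y;
        log.push_back(text);
    }
    std::vector<std::string> log;
};

// Facade style: the gadget forwards every drawing call itself, so each
// Graphics method needs a matching wrapper here.
class FacadeSuperBitmap {
public:
    void drawText(int x, int y, const std::string& text) {
        _graphics.drawText(x, y, text);
    }
    const Graphics& graphics() const { return _graphics; }
private:
    Graphics _graphics;
};

// Exposed style: the gadget hands out its Graphics object and needs no
// wrappers at all. The accessor name is assumed.
class ExposedSuperBitmap {
public:
    Graphics* getGraphics() { return &_graphics; }
private:
    Graphics _graphics;
};

// Both styles end up making the same Graphics call.
inline void demo() {
    FacadeSuperBitmap oldStyle;
    oldStyle.drawText(10, 10, "Hello");

    ExposedSuperBitmap newStyle;
    newStyle.getGraphics()->drawText(10, 10, "Hello");
    assert(newStyle.getGraphics()->log.size() == 1);
}
```

The second form means new Graphics methods become available to SuperBitmap users without touching the gadget class.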

The examples directory contains two new demos: bitmapdrawing and gadgetdrawing. They’re almost identical, except the first draws to a bitmap (displayed via a SuperBitmap gadget), whilst the second draws directly to an AmigaWindow. The first is, therefore, a demo of how to do persistent drawing, whilst the second is a demo of how to do non-persistent (but mildly faster) drawing.

Whilst writing these examples I came across a number of bugs. Some of them were created whilst consolidating the Bitmap and GraphicsPort drawing methods into a new hierarchy, but some have been around for a while. The GraphicsPort::drawPixel() and drawXORPixel() methods now both clip correctly. The GraphicsPort::drawLine() method now draws to the correct framebuffer (it previously only drew to the bottom framebuffer). Graphics::drawBitmap() now clips correctly if the co-ordinates for the destination exceed the size of the destination bitmap, instead of crashing as it did previously.

Lastly, I’ve been trying to fix a long-standing problem with the DS’ data cache and its interaction with the DMA hardware. Here’s what happens when the ARM9 tries to access a piece of data:

  • ARM9 will attempt to read the data cache;
    • If data found, ARM9 will continue as normal;
    • If no data found, ARM9 will read main memory.

Now here’s what happens when the DMA hardware is used:

  • DMA will read data from main memory;
  • DMA will write data to main memory.

The DMA hardware cannot see the cache. Also, if the DMA hardware changes main memory but that memory is cached, the ARM9 will read the (outdated) cache instead of main memory.

Using the DMA therefore requires that the cache is correctly updated like this:

  • Write cache to main memory;
  • Use DMA to copy from main memory to main memory;
  • Mark the cache as invalid so the ARM9 fetches new data from main memory.
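
The three steps above can be illustrated with a toy model that runs on a PC. This is only a sketch: ToySystem and its methods are invented for the illustration, the "cache" is a simple map, and flush() and invalidate() are only loosely analogous to DC_FlushRange() and DC_InvalidateRange().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model only: a write-back cache in front of "main memory", plus a
// DMA engine that can only see main memory.
struct ToySystem {
    std::vector<std::uint16_t> ram;
    std::unordered_map<std::size_t, std::uint16_t> cache;  // address -> value

    explicit ToySystem(std::size_t words) : ram(words, 0) {}

    // CPU write: lands in the cache only (write-back behaviour).
    void cpuWrite(std::size_t addr, std::uint16_t value) {
        assert(addr < ram.size());
        cache[addr] = value;
    }

    // CPU read: a cache hit wins; a miss fills the cache from RAM.
    std::uint16_t cpuRead(std::size_t addr) {
        assert(addr < ram.size());
        auto it = cache.find(addr);
        if (it != cache.end()) return it->second;
        cache[addr] = ram[addr];
        return ram[addr];
    }

    // Step 1: write cached values back to RAM.
    void flush(std::size_t addr, std::size_t count) {
        for (std::size_t i = addr; i < addr + count; ++i) {
            auto it = cache.find(i);
            if (it != cache.end()) ram[i] = it->second;
        }
    }

    // Step 2: the DMA copies RAM to RAM and never consults the cache.
    void dmaCopy(std::size_t src, std::size_t dst, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) ram[dst + i] = ram[src + i];
    }

    // Step 3: drop cached entries so the next read goes back to RAM.
    void invalidate(std::size_t addr, std::size_t count) {
        for (std::size_t i = addr; i < addr + count; ++i) cache.erase(i);
    }
};
```

In this model, skipping flush() makes the DMA copy stale source data, and skipping invalidate() makes the CPU keep reading the old destination value from its cache.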

Woopsi was only performing the first of these three actions, and it wasn’t doing so consistently. The result of this is that, no matter what I do, the last 4 pixels of every bitmap I attempt to blit to another bitmap are not drawn. Some research led me to two useful sources: a GBADEV topic and a blog post from cearn on coranac.com.

Cearn gives the code for replacement copy and fill functions that perform a variety of checks, such as ensuring that the cache is written to RAM before trying to copy and checking that the copy is working with legal data.

However, trying to replace Woopsi’s existing DMA code with cearn’s results in a very nasty and very immediate crash. I initially assumed that his solution was incorrect. From the information in the GBADEV forum it seems that all calls to DC_FlushRange() and DC_InvalidateRange() must be performed on memory that is aligned to 32-byte (cache line) boundaries. His functions do not check or enforce this.

I wrote replacements that took the most useful parts of cearn’s code and mixed in some alignment-enforcing jiggery-pokery to ensure that all cache handling is done to the correct boundaries. This, however, failed in exactly the same way. The call to DC_InvalidateRange() kills Woopsi dead before it even appears on screen. Remove this, and it works - except those last damned pixels still aren’t drawn.
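
For illustration, alignment enforcement of this kind usually boils down to rounding a range outwards to the nearest boundaries. The sketch below is not the code used in Woopsi; the helper names are invented, and the boundary size is a named constant that assumes a 32-byte cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed boundary size in bytes; change this if the target differs.
constexpr std::uintptr_t kCacheLine = 32;

struct AlignedRange {
    std::uintptr_t start;  // first byte, rounded down to a boundary
    std::size_t length;    // byte count, a multiple of kCacheLine
};

// Round the start down and the end up so that the returned range covers
// the requested bytes and begins and ends on a boundary.
inline AlignedRange alignToCacheLines(std::uintptr_t addr, std::size_t bytes) {
    assert(bytes > 0);
    const std::uintptr_t mask = ~(kCacheLine - 1);
    const std::uintptr_t start = addr & mask;
    const std::uintptr_t end = (addr + bytes + kCacheLine - 1) & mask;
    return { start, static_cast<std::size_t>(end - start) };
}

// True if the range already sits exactly on boundaries.
inline bool isCacheLineAligned(std::uintptr_t addr, std::size_t bytes) {
    return (addr % kCacheLine == 0) && (bytes % kCacheLine == 0);
}
```

One caveat with this approach: widening a range is harmless when writing the cache back, but widening a range that is about to be invalidated also throws away any unwritten cached data belonging to whatever sits next to the buffer.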

Some more research on GBADEV threw up this thread, in which it is determined that memcpy() is actually faster than the DMA when working with main memory. DMA is faster when working with VRAM. This does make sense. VRAM is uncached, so memcpy() will always have to go out to memory to fetch data. The DMA, on the other hand, does not need the cache to be flushed before it can see the latest state. The situation is reversed when dealing with main memory. memcpy() may be able to use the cache, whilst the cache must be flushed before the DMA can do its job. Using the DMA with main memory, therefore, will always result in cache misses somewhere along the line.

It then occurred to me that I could write a function that would use a for-loop when working with main memory and the DMA code when working with VRAM. The most obvious way to tackle this is to check if the source and destination pointers fall within the framebuffer address space. If so, use the DMA. If not, default to memcpy() instead. Some fiddling later and I have copy and fill methods that are theoretically faster than the original macros borrowed from PALib and draw everything correctly.
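
A rough sketch of that kind of dispatch follows. It is not Woopsi's implementation: the function names, the DmaCopyFn hook and the VRAM address window are assumptions, and the sketch takes the conservative reading that the DMA is only used when both pointers are in VRAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed address window for DS VRAM; verify against hardware documentation.
constexpr std::uintptr_t kVramStart = 0x06000000;
constexpr std::uintptr_t kVramEnd   = 0x07000000;  // exclusive

enum class CopyPath { Dma, Cpu };

// Pure address check, so the decision can be tested off-device.
inline bool isInVram(std::uintptr_t addr) {
    return addr >= kVramStart && addr < kVramEnd;
}

// Choose the DMA only when both ends of the copy live in VRAM.
inline CopyPath choosePath(std::uintptr_t src, std::uintptr_t dst) {
    return (isInVram(src) && isInVram(dst)) ? CopyPath::Dma : CopyPath::Cpu;
}

// Hypothetical hook standing in for whatever DMA routine is available.
using DmaCopyFn = void (*)(const std::uint16_t* src, std::uint16_t* dst,
                           std::size_t pixels);

// Copy 16-bit pixels, returning which path was actually taken.
inline CopyPath copyPixels(const std::uint16_t* src, std::uint16_t* dst,
                           std::size_t pixels, DmaCopyFn dmaCopy) {
    assert(src != nullptr && dst != nullptr);
    const CopyPath path = choosePath(reinterpret_cast<std::uintptr_t>(src),
                                     reinterpret_cast<std::uintptr_t>(dst));
    if (path == CopyPath::Dma && dmaCopy != nullptr) {
        dmaCopy(src, dst, pixels);
        return CopyPath::Dma;
    }
    // Main memory (or no DMA routine supplied): let the CPU and cache do it.
    std::memcpy(dst, src, pixels * sizeof(std::uint16_t));
    return CopyPath::Cpu;
}
```

Keeping the address test separate from the copy makes the decision easy to check on a PC, where no pointer will ever fall inside the VRAM window.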

I’m considering removing external access to the mutable u16 array inside bitmaps that inherit from the MutableBitmapBase class. This would allow the FrameBuffer class to be a wrapper around an SDL screen (conditional compilation shenanigans ahoy) and remove the double-copying bottleneck that currently exists. This would make Woopsi almost immediately portable to any handheld with a touchscreen and an SDL port.

To this end, all mutable bitmaps now have a blit() method, which can use the DMA hardware to copy a u16 array to a specific co-ordinate within itself, and a blitFill() method, which can use the DMA hardware to fill memory with a specified colour. There is now no reason, other than speed, for any object to get access to the non-const raw bitmap array within a Bitmap class.

Related to those methods, bitmaps also have a getData() method that will return a const pointer to the internal bitmap array at a specific set of co-ordinates. Instead of mucking about trying to calculate where rows of pixels start within the array, it’s now possible to just ask a bitmap to give you a pointer straight to any pixel.
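
The arithmetic that such a method hides is small but easy to get wrong. Here is a minimal sketch, assuming pixels are stored row by row in a single u16 array; ToyBitmap and setPixel() are hypothetical and exist only to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap with 16-bit pixels stored row by row.
class ToyBitmap {
public:
    ToyBitmap(int width, int height)
        : _width(width), _height(height),
          _data(static_cast<std::size_t>(width) * height, 0) {}

    // Const pointer to the pixel at (x, y): skip y whole rows, then x pixels.
    const std::uint16_t* getData(int x, int y) const {
        assert(x >= 0 && x < _width && y >= 0 && y < _height);
        return _data.data() + (y * _width) + x;
    }

    void setPixel(int x, int y, std::uint16_t colour) {
        assert(x >= 0 && x < _width && y >= 0 && y < _height);
        _data[static_cast<std::size_t>(y) * _width + x] = colour;
    }

private:
    int _width;
    int _height;
    std::vector<std::uint16_t> _data;
};
```

A caller that wants to copy a row starting at (2, 1) can ask for getData(2, 1) and read onwards from there, without knowing the bitmap's width.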


Scrolling Changes

Building on the GraphicsPort::copy() method reported last time, I’ve added a new GraphicsPort::scroll() method. This takes in x and y co-ordinates as the location of the region to scroll, in addition to the width and height of the region to be scrolled. This information represents a rectangle that will be shifted through a specified distance both vertically and horizontally, which are also parameters to the function.

The method uses the copy() method to do the actual pixel moving, but does some crazy clipping calculations in order to limit the region being copied to within the bounds of the visible regions of the relevant gadget. To be specific, it uses the copy() routine where possible, but there are a number of situations where no data is actually available to copy (i.e. destination falls outside the bounds of the clipping region). To cater for this, the function expects to be passed a pointer to an empty WoopsiArray of rects. Where the copy cannot be performed, a rect representing the region to be refreshed is appended to the array. These can be redrawn (using a loop around the draw(Rect) method) once the scroll function exits.

In addition to this, the function adds all revealed regions to the array too. So, if you scroll a gadget horizontally by 10 pixels, the array will contain a rect 10 pixels wide representing the newly-visible area that needs to be redrawn.
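
The revealed-region part of this can be sketched in isolation. The code below ignores clipping entirely and is not the scroll() implementation; the Rect stand-in, the function name and the sign convention (positive values move the contents right and down) are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Stand-in for a rectangle type with x/y/width/height.
struct Rect {
    int x, y, width, height;
};

// Given a region whose contents are shifted by (dx, dy), list the strips
// that the shift uncovers and that therefore need redrawing.
inline std::vector<Rect> revealedRegions(const Rect& r, int dx, int dy) {
    assert(r.width >= 0 && r.height >= 0);
    std::vector<Rect> out;

    // A shift at least as large as the region leaves nothing to copy.
    if (std::abs(dx) >= r.width || std::abs(dy) >= r.height) {
        out.push_back(r);
        return out;
    }

    // Vertical strip uncovered by horizontal movement (full height).
    if (dx > 0) {
        out.push_back({r.x, r.y, dx, r.height});
    } else if (dx < 0) {
        out.push_back({r.x + r.width + dx, r.y, -dx, r.height});
    }

    // Horizontal strip uncovered by vertical movement, trimmed so that it
    // does not overlap the vertical strip above.
    const int stripX = r.x + (dx > 0 ? dx : 0);
    const int stripW = r.width - std::abs(dx);
    if (dy > 0) {
        out.push_back({stripX, r.y, stripW, dy});
    } else if (dy < 0) {
        out.push_back({stripX, r.y + r.height + dy, stripW, -dy});
    }
    return out;
}
```

Scrolling a 100x50 region right by 10 pixels yields a single rect 10 pixels wide at the left edge, matching the example in the text.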

The scroll() method rendered much of the code in the ScrollingPanel redundant. In fact, it reduced it from ~500 lines of code down to ~200, and removed another set of functions that dealt directly with the framebuffer and recalculated visible regions. Unfortunately, it doesn’t appear to have made any noticeable difference to the speed of the ScrollingPanel’s dragging function - it’s still flickery. The only way around that would appear to be double-buffering the display, which is counter-productive given the way I’m using the DMA hardware to copy pixels around.


Woopsi Optimisations and Fixes

I’ve made a number of fixes to Woopsi today. First of all, sliders and scrollbars. The slider grip now automatically resizes itself, meaning there’s no need to call resizeGrip() when setting the max/min/value properties of a slider/scrollbar. The resizeGrip() method is now private. Still on the topic of scrollbars, the grip in the ScrollingTextbox’s scrollbar now adjusts to the correct position when the textbox is first initialised. It was previously placed at the top of the scrollbar regardless of the position of the text.

Next, some fixes related to the event refactoring. The Alert, Requester and WoopsiKeyboard gadgets now handle events from their decorations correctly. Previously, the XOR dragging rect wasn’t being drawn correctly when they were first clicked.

Lastly, I’ve made some improvements to the visible region caching system. There are two types of rectangles used in the system. The first, which was already being cached, represents the entire visible surface of a gadget, ignoring any children it might have. The closest Java analogy would be Swing’s “glass pane”. Drawing to these rects draws over the top of the entire gadget, even over the top of any children the gadget might have. The second type of rect represents the visible portions of a gadget once its children have been drawn; these can be thought of as background rects, behind everything else in the gadget. Woopsi only draws the background rects; everything else is left to children to draw (and so on, recursively, until the screen is filled).

Whilst looking at the Gadget::redraw() method, I noticed that the background rects were being recalculated every time redraw() was called. I’ve now changed this so that both foreground and background rects are cached, which should provide a small speed boost. Note that, as I type this, I’ve realised that calling the rects “foreground” and “background” is a far better convention than their current names, so I’ll probably rename them soon.

Still on the subject of caching, I’ve moved the caching code and the rect splitting code into a new class, “RectCache”. The Gadget class is overloaded with functionality at the moment, so I want to extract some of it out to make it easier to work with. As a byproduct, I’ve made the “removeOverlappedRects()” method non-recursive, which should make that a little faster, too.
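
The caching idea itself is the familiar "recalculate only when invalidated" pattern. The sketch below is not the RectCache class; ToyRectCache, its method names and the recalculation counter are invented to show the pattern in a testable form.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a rectangle type with x/y/width/height.
struct Rect {
    int x, y, width, height;
};

// Hypothetical cache: the expensive calculation runs only when the cached
// result has been marked as out of date.
class ToyRectCache {
public:
    using Calculator = std::function<std::vector<Rect>()>;

    explicit ToyRectCache(Calculator calculate)
        : _calculate(std::move(calculate)) {
        assert(static_cast<bool>(_calculate));
    }

    // Call when the gadget moves, resizes, or its siblings change.
    void invalidate() { _valid = false; }

    // Returns the cached rects, recalculating them first if necessary.
    const std::vector<Rect>& getRects() {
        if (!_valid) {
            _rects = _calculate();
            _valid = true;
            ++_recalculations;
        }
        return _rects;
    }

    int recalculations() const { return _recalculations; }

private:
    Calculator _calculate;
    std::vector<Rect> _rects;
    bool _valid = false;
    int _recalculations = 0;
};
```

With this shape, repeated redraws between layout changes cost one calculation in total rather than one per redraw.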

A major reason for trying to move this code into a separate class is to try and either optimise it (by making the rect splitting smarter) or replace it with the BSP tree code I came up with - oops - a year ago. I’ve fixed a raft of bugs in that code, but I’m still struggling to work out how exactly I can integrate it into Woopsi, or if I even need to. On a system with a little more CPU oomph and no DMA hardware (GP2x or Wiz, for example), it makes sense to replace the current system with the BSP code. It’s tidier, more efficient, and simpler. On the DS, however, I keep encountering optimisations that I’ve made in Woopsi that can’t be replicated using the BSP method and that offer significant speed gains. I might put up a longer post specifically on this topic later.


Woopsi Bugfix Bugfix; Rules of the DS DMA Hardware

A bugfix for the bugfix. I’ve rethought the locations of the DC_FlushRange() commands. It seems that you don’t need to worry about flushing the mem cache if you’re dealing with VRAM, so I’ve removed all of the DC_FlushRange() calls from the GraphicsPort. I’ve rewritten the filled rect function to use DMA_Force() instead of DMA_Copy().

The only calls left are now limited to the SuperBitmap. One is in the SuperBitmap::draw() command, since we want to make sure that when we blit the bitmap we’re blitting the latest data. The second is in the SuperBitmap::drawFilledRect() command, as we need to be able to duplicate a value in main RAM. The last is in SuperBitmap::drawHorizLine(), for the same reason.

Here are the basic rules of the DS’ DMA hardware:

  • Always check that DMA activity has finished on a DMA channel before attempting to use that channel for something else
  • Never try to use the DMA hardware with the stack
  • If you use the DMA to copy values from main memory, ensure you call DC_FlushRange() on the region of memory you’re copying from
  • The DMA hardware works best when used with VRAM
  • The smallest data size that the DMA can work with is 16-bit, so don’t try to copy 8-bit data around unless a) you’re copying an even number of bytes, and b) the memory is aligned to the 2-byte boundary.
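
Two of these rules (the busy check and the 16-bit restriction) are easy to express as guards. This is a sketch only: isHalfwordCopyLegal() and ToyDmaChannel are invented names, and the channel is a plain flag rather than real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 16-bit DMA transfer needs both addresses on a 2-byte boundary and an
// even number of bytes.
inline bool isHalfwordCopyLegal(std::uintptr_t src, std::uintptr_t dst,
                                std::size_t bytes) {
    return (src % 2 == 0) && (dst % 2 == 0) && (bytes % 2 == 0);
}

// Toy model of one DMA channel: a transfer may only start when the channel
// is idle and the request passes the alignment check.
struct ToyDmaChannel {
    bool busy = false;

    bool tryStart(std::uintptr_t src, std::uintptr_t dst, std::size_t bytes) {
        if (busy) return false;  // previous transfer not finished
        if (!isHalfwordCopyLegal(src, dst, bytes)) return false;
        busy = true;
        return true;
    }

    // Called when the (simulated) transfer completes.
    void finish() {
        assert(busy);
        busy = false;
    }
};
```

On real hardware the busy test would poll the channel's control register instead of a flag, and a rejected request would fall back to a CPU copy.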


Woopsi Bugfixes

A few minor bugfixes are now in the SVN repo. First of all, I’ve removed the post-DMA operation checks for DMA activity. They’ve been replaced by calls to DC_FlushRange(), which should fix all of the DMA problems; I’ve also added DC_FlushRange() to the SDL code. I’ve also fixed the vector pointer problem in Gadget::moveHiddenToChildList().