Speedy Rectangles

After a bit of testing, I discovered that my new rectangle drawing routine isn’t as fast as I’d thought. In fact, it’s slower than the original method of plotting individual pixels, because (obviously enough) the code to draw a line between two arbitrary points (i.e. the PALib line functions) is more complex than the code to draw a line between two points on the same row or column (i.e. my original code). More importantly, the PALib line drawing system doesn’t use the two- or four-pixel simultaneous writing mode, because it wouldn’t work: if you’re drawing a line from one row of pixels to the next, you’re not writing to consecutive pixels.
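To make the “two pixels at once” point concrete, here is a rough sketch in plain C of how 8-bit pixels sit inside 16-bit slots. It assumes a 256-pixel-wide framebuffer held in an ordinary array, with the even pixel in the low byte; put_pixel8 and put_pixel_pair are hypothetical helpers written for illustration, not PALib functions.

```c
#include <assert.h>
#include <stdint.h>

#define ROW_HALFWORDS 128  /* assumed: 256 pixels per row, two per 16-bit slot */

/* Hypothetical helper: plot one 8-bit pixel. The slot is read, one byte
   is replaced, and the whole 16-bit value is written back. */
static void put_pixel8(uint16_t *fb, int x, int y, uint8_t colour) {
    uint16_t *slot = &fb[y * ROW_HALFWORDS + (x >> 1)];
    if (x & 1) {
        /* Odd pixel: assumed to live in the high byte */
        *slot = (uint16_t)((*slot & 0x00FF) | (colour << 8));
    } else {
        /* Even pixel: assumed to live in the low byte */
        *slot = (uint16_t)((*slot & 0xFF00) | colour);
    }
}

/* Hypothetical helper: write two horizontally adjacent pixels with a
   single 16-bit store. Only valid when x is even. */
static void put_pixel_pair(uint16_t *fb, int x, int y, uint8_t colour) {
    assert((x & 1) == 0);
    fb[y * ROW_HALFWORDS + (x >> 1)] = (uint16_t)(colour | (colour << 8));
}
```

The pair write only helps when the two pixels are neighbours on the same row, which is why a general-purpose line routine can’t rely on it.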

All of this is patently obvious, so I don’t know why I didn’t realise it hours ago.

I had a ponder for a while, and realised that the best way to write data to the screen is, as always, the DMA hardware. All I need to do is plot the first row of the rectangle using the 8-bit pixel routines, and I can use the DMA hardware to duplicate that row down to the end of the rectangle.
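As a sketch of the idea (not the real hardware path), the row duplication looks something like this when simulated on an ordinary array. memcpy stands in for the DMA transfer here, and copy16 and duplicate_row are hypothetical names; the sketch also assumes the rectangle starts on an even x and has an even width, so it covers whole 16-bit slots only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROW_HALFWORDS 128  /* assumed: 256 pixels per row, two per 16-bit slot */

/* Stand-in for a DMA transfer of `count` 16-bit units. On the real
   hardware the DMA controller would do this instead of the CPU. */
static void copy16(const uint16_t *src, uint16_t *dst, size_t count) {
    memcpy(dst, src, count * sizeof(uint16_t));
}

/* Copy the already-drawn span on row y into the height - 1 rows below.
   x and width are in pixels and must both be even in this sketch. */
static void duplicate_row(uint16_t *fb, int x, int y, int width, int height) {
    assert((x & 1) == 0 && (width & 1) == 0);
    const uint16_t *src = fb + y * ROW_HALFWORDS + (x >> 1);
    for (int row = y + 1; row < y + height; row++) {
        copy16(src, fb + row * ROW_HALFWORDS + (x >> 1), (size_t)(width >> 1));
    }
}
```

Only the first row costs a per-pixel plot; every other row is a single block copy.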

The only complexity comes from the fact that two 8-bit pixels are squeezed into each 16-bit slot, and the DMA hardware works in 16-bit units. That means that if either edge of the rectangle falls in the middle of a slot (an odd x value on the left, or an odd x + width on the right), the DMA copier will either write over a part of the screen I don’t want it to, or it’ll leave gaps. So the function has to compensate: it copies only the whole slots that lie inside the rectangle, and fills in the leftover edge columns after the copy using the pixel plotter.
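The edge arithmetic is easy to get wrong, so here is a small standalone sketch of it. AlignedSpan and align_span are hypothetical names made up for this example; the function just works out which part of a row can be copied as whole 16-bit slots and which edge pixels are left over.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int firstSlotX;  /* even pixel x where the whole slots begin */
    int slotCount;   /* number of whole 16-bit slots to copy */
    bool leftGap;    /* pixel at x must be plotted individually */
    bool rightGap;   /* pixel at x + width - 1 must be plotted individually */
} AlignedSpan;

/* Split the pixel range [x, x + width) into whole slots plus edge pixels. */
static AlignedSpan align_span(int x, int width) {
    assert(x >= 0 && width > 0);
    int start = (x + 1) & ~1;    /* round the left edge up to an even x */
    int end = (x + width) & ~1;  /* round the right edge down to an even x */

    AlignedSpan span;
    span.firstSlotX = start;
    span.slotCount = end > start ? (end - start) / 2 : 0;
    span.leftGap = (x & 1) != 0;
    span.rightGap = ((x + width) & 1) != 0;
    return span;
}
```

For example, a rectangle at x = 5 with a width of 6 covers pixels 5 to 10: pixel 5 is a left gap, pixels 6 to 9 are two whole slots, and pixel 10 is a right gap. Note that the right-hand test depends on x + width rather than on the width alone.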

Eventually, after much fiddling (most of it involving a mental block with DMA_Copy and the nature of pointers), I’ve come up with a filled rectangle routine that in one stroke solves all of the performance issues with Woopsi. Once I get the optimised redraw logic in place it should be even faster.

Here’s the filled rectangle routine:

void drawRect(s16 x, s16 y, u16 width, u16 height, u8 colour) {

    // Draw top line
    for (u16 i = x; i < x + width; i++) {
        PA_Put8bitPixel(0, i, y, colour);
    }

    // Calculate DMA copy values: the copy must start and stop on an even
    // pixel, so round the left edge up and the right edge down
    u16 dmaX = x & 1 ? x + 1 : x;
    u16 dmaEnd = (x + width) & ~1;
    u16 dmaWidth = dmaEnd > dmaX ? dmaEnd - dmaX : 0;

    // Perform DMA copy (skipped if no whole 16-bit slot fits in the row)
    if (dmaWidth > 0) {
        for (u16 i = y + 1; i < y + height; i++) {
            DMA_Copy((PA_DrawBg[0] + (y << 7) + (dmaX >> 1)), (PA_DrawBg[0] + (i << 7) + (dmaX >> 1)), (dmaWidth >> 1), DMA_16NOW);
        }
    }

    // Fill in any gaps
    if (x & 1) {
        // Draw left-hand side
        for (u16 i = y; i < y + height; i++) {
            PA_Put8bitPixel(0, x, i, colour);
        }
    }

    if ((x + width) & 1) {
        // Draw right-hand side; the last column is x + width - 1
        for (u16 i = y; i < y + height; i++) {
            PA_Put8bitPixel(0, x + width - 1, i, colour);
        }
    }
}

Here’s a demo:

Woopsi Demo V0.03