2007-12-21

Racing Frustration?

I think I’ve hit a race condition with the DMA hardware. The SuperBitmap does this to draw the first row of pixels in its drawFilledRect() set of functions:

// Draw initial pixel
u16* pos = _bitmap + (y * _bitmapWidth) + x;
*pos = colour;

// Duplicate pixel
DMA_Force(*pos, (pos + 1), (width - 1), DMA_16NOW);

Simple enough. Plot the top-left pixel using the colour we want to fill with, then use DMA_Force to copy that pixel along the length of a row.

Except it doesn’t work. Well, it works sometimes on the real hardware, and always in the DS emulators. It seems to not work in the current demo set up, but works OK with just PacMan running. By “not work”, I mean that it draws the first pixel, but then doesn’t copy the pixel across the row.

Just to prove to myself that my clipping maths isn’t wrong, I swapped out the above code for this:

// Draw first row pixel by pixel
for (u16 i = 0; i < width; i++) {
    _bitmap[(y * _bitmapWidth) + x + i] = colour;
}

This pixel plotter draws the line correctly, and the DMA_Copy routine that follows it draws the rest of the filled rect. I can’t see that it’s anything but a race condition (DMA hardware running before the original pixel is drawn - is that even possible?) or a bug in some subsystem to which I don’t have access (the DS itself?). In any case, I’ve spent far too long trying to get DMA_Force to work for this, and I’ve decided to give up on it. The negligible performance gain doesn’t justify the effort.

In other news, some more progress with the ScrollingPanel. I’ve scrapped the “allowStylusScroll” concept and replaced it with the existing “draggable” concept, which means all of the pre-written dragging code now applies. Scrolling now clips to the client rect, so bordered mode is available. There are a few x/y co-ordinate confusion fixes, too (probably copy-and-paste errors).

One major bug with the scrolling code was the way it handled scrolling if the scroll distance was greater than the size of the gadget (ie. scroll 10 horizontal pixels in a 5 pixel wide gadget). The gadget previously didn’t handle this circumstance at all and caused strange graphical corruption and crashes when it tried to DMA copy negative widths.

The gadget now has maximum and minimum scroll values, and won’t scroll beyond those regions. It’s also got scroll width and height accessors which give the total scrollable dimensions of the gadget.

It sends key, drag, lid and focus/blur events to its children, so ScrollingPanels can now contain other ScrollingPanels, which can themselves contain more ScrollingPanels, etc. You could create some excitingly unusable interfaces by doing that. You could also attach a panel to a window, making it the full size of the window, and have windows that are apparently much wider than the screen that contains them. All sorts of potential uses.

I had a quick go at adding the ScrollingPanel functionality into the MultiLineTextBox. It looks very promising - just inherit from ScrollingPanel instead of Gadget and change the draw(rect) function, basically. That change isn’t in SVN yet because it’s nowhere near finished.

Comments

Jeff on 2007-12-22 at 06:55 said:

Hmmm, the DMA problem sounds odd though not impossible.

What about not trying to write then immediately read the same location? ie, rather than putting the first byte into buff[0] and using DMA to copy into buff[1…], just DMA directly out of ‘colour copy into buff[0…]

That would avoid cache-line inconsistency, which sounds like it might be your problem, though I’m not sure that the DS hardware suffers from that sort of thing…

I’ve also seen some suggestion that you need to wait for the busy bit to go clear on the DMA control registers before talking to the device again - does the problem happen on just the FIRST row you right, or alternately just NOT the first row?

Jeff on 2007-12-22 at 07:21 said:

ie,

u16* pos = _bitmap + (y * _bitmapWidth) + x; DMA_Force(&colour, (pos), (width), DMA_16NOW);

ant on 2007-12-22 at 12:36 said:

I’ve tried that one already, in the vague hope that it could be something to do with VRAM, but that doesn’t work at all. Not sure why exactly; guess it’s something to do with the “colour” variable being on the stack. Shouldn’t make any difference, but it seems to. I wonder what would happen if I malloced a new u16 and passed that in?

I might try checking the busy bit in a while loop and see if that helps.

Jeff on 2007-12-22 at 22:17 said:

When you say doesn’t work at all, do you mean that it didn’t RUN or didn’t COMPILE?

Its worth checking again, given the calling sequence on that macro. I just checked the sources and it looks like this:


#define DMA_Force(ulVal,dest, count, mode)
{
    REG_DMA3SRC=(u32)&ulVal;
    REG_DMA3DST = (u32)dest;
    REG_DMA3CNT = (count) |(mode) | DMA_SRC_FIX;
}

so the first argument is expected to be the ACTUAL SOURCE whereas the destination is the ADDRESS OF THE DEST. So what I suggested was wrong, it should have been:


u16* pos = _bitmap + (y * _bitmapWidth) + x;
DMA_Force(colour, (pos), (width), DMA_16NOW);

ie, no & on the colour. I would have thought you’d get a syntax error if you passed in the &

ant on 2007-12-23 at 13:02 said:

It compiles (when using the value rather than a pointer, natch), but everything gets drawn as a black rectangle. The weird thing is that the GraphicsPort contains almost identical code and that works fine; the only difference is that the GraphicsPort draws its top line in a separate function. The extra function call might be delaying the execution of the code just long enough for the DMA hardware to clear and accept the new instructions. Not sure. Need to work out which address to get the clear bit from…

EDIT: Yep, replacing the code within the function to a call to SuperBitmap::drawHorizLine() - which uses the DMA_Force macro - works fine. Trying to replace DMA_Copy in the loop with a call to DMA_Force results in black rectangles again. There must be some sort of problem with DMA_Force within the hardware itself. It doesn’t seem to run as quickly as DMA_Copy and ties up the hardware longer, so any other attempts to work with the DMA hardware cause race conditions.

That’s how it appears to me, anyway.

Jeff on 2007-12-23 at 22:44 said:

Well, thats just weird. And, of course, it may explain why PALib kept that great big block of zeroes lying around…

It would be so nice if people would comment “why” they code something they way they do. Everyone seems to think that comments are about “what” they are doing, and then slip into “but its self-evident, so I don’t need to explain”. Of course PALib has an excuse since all the comments are in French so I have no idea whether he explained those things or not…

Jeff on 2007-12-23 at 22:50 said:

You did try checking the busy bits, didn’t you? You said you were going to. This is one of those areas where PALib seems a little slack - contains the following:


static inline void dmaCopyWords(uint8 channel, const void* src, void* dest, uint32 size) {
    DMA_SRC(channel) = (uint32)src;
    DMA_DEST(channel) = (uint32)dest;
    DMA_CR(channel) = DMA_COPY_WORDS | (size>>2);
    while(DMA_CR(channel) & DMA_BUSY);
}

static inline void dmaCopyWordsAsynch(uint8 channel, const void* src, void* dest, uint32 size) {
    DMA_SRC(channel) = (uint32)src;
    DMA_DEST(channel) = (uint32)dest;
    DMA_CR(channel) = DMA_COPY_WORDS | (size>>2);

}

so in the case where they want synchronous execution, they definitely wait until the busy bit is unset before continuing. Personally, I don’t understand why their async version doesn’t loop BEFORE so that it would work regardless of whether you call it too often or not.

Nevertheless, this is another area where getting away from the PALib macros might be a good idea, to see if that fixes things. I started working on my own helper class to do the same thing before I discovered that libnds had already done it.

Jeff on 2007-12-23 at 22:51 said:

Warning: previous post is missing some < THe first paragraph is supposed to say <nds/dma.h> contains the following:

ant on 2007-12-23 at 22:57 said:

Nope, didn’t get around to checking the busy bit. Got distracted by the SDL idea, the promise of an Xcode version of Woopsi with debugging, re-formatting the Mac (just re-installing the apps now) and Christmas drinks in the pub (actually, re-installing tomorrow might be a more sensible idea…).