1:30 AM. Been hard at work on this for hours. My plan was to have a professional-looking logo sequence, and this basically means having high-quality images that cross-fade into each other. High-quality images generally means 16-bit. Making a fast, good-quality 16-bit cross-fade effect is, I’ve discovered, bloody difficult.
First of all, there’s the sheer amount of data involved. The DS has a resolution of 256x192, which gives us 49152 pixels. In 16-bit mode, each of these pixels is worth two bytes, which makes a total of 98304 bytes of memory, or 96K. To make a cross-fader, we need two images, which is 192K. To do it quickly, we need to pre-calculate each step of the fade, so we multiply the size of a single image by the number of steps in the fade and add it to our running total. Assuming 15 steps produces a smooth transition, that’s (15 * 96) + 192, or about 1.6MB of data.
The DS has 32MB of RAM.
Anyway, a 16-bit cross-fader works like this (different from the 8-bit version, which relies on producing a single intermediate image and performing palette adjustments on it):
- Loop through every pixel in the first image;
- Get the colour of the pixel;
- Subtract that colour from the colour of the same pixel in the second image (remembering to break it into RGB components first);
- Divide the result by the total number of steps in the fade;
- Multiply that result by the number of the current step;
- Add the pixel colour of the pixel from the first image.
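The steps above can be sketched as a single per-pixel function - a minimal sketch, with lerpPixel being my own name for it, and assuming the DS’s standard 15-bit BGR layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14):

```cpp
#include <cstdint>

// Sketch of one per-pixel fade step. Assumes the DS's 15-bit BGR
// colour layout: red in bits 0-4, green in 5-9, blue in 10-14.
uint16_t lerpPixel(uint16_t px1, uint16_t px2, int i, int t) {
    // Break both pixels into their 5-bit RGB components.
    int r1 = px1 & 31, g1 = (px1 >> 5) & 31, b1 = (px1 >> 10) & 31;
    int r2 = px2 & 31, g2 = (px2 >> 5) & 31, b2 = (px2 >> 10) & 31;
    // px3 = px1 + (i * (px2 - px1) / t), applied to each component.
    int r = r1 + (i * (r2 - r1)) / t;
    int g = g1 + (i * (g2 - g1)) / t;
    int b = b1 + (i * (b2 - b1)) / t;
    return (uint16_t)(r | (g << 5) | (b << 10));
}
```

Note that at i = 0 this returns px1 unchanged, and at i = t it returns px2 exactly, so the fade starts and ends on the two source images.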
As a formula:
px3 = px1 + (i * (px2 - px1) / t)
Where px3 is the faded pixel, i is the current fade step, px2 is one of the RGB components of the pixel from the second image, px1 is the same component from the first image, and t is the total number of steps in the fade routine. This is apparently called “linear interpolation”; I looked it up when I was halfway through coding the routine, only to discover that I was, yet again, re-inventing the wheel. Actually coding this was incredibly difficult, mainly because of bugs in PAlib, the compiler, the emulators, and my own inexperience with C++. Today I discovered:
- Using PA_Print or PA_OutputText crashes DeSmuME;
- Declaring an array with more than 32768 elements (i.e. greater than the maximum value of a signed 16-bit integer) crashes No$GBA.
I learnt how to create arrays dynamically using malloc(), free the memory again using free(), create pointers to arrays of structures, and even create pointers to arrays of structures containing pointers to arrays of structures and retrieve data from the whole mess afterwards. This was mainly to avoid the 32768 array limit - bitmaps have 49152 pixels! This only seems to happen with C++, though, for some reason.
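The heap-allocation pattern looks roughly like this - a sketch, with the FadeFrame name and the helper functions being my own invention rather than anything from PAlib:

```cpp
#include <cstdlib>
#include <cstdint>

const int PIXELS = 256 * 192;  // 49152 - too big for a static array

// One pre-computed frame of the fade: a structure holding a pointer
// to a heap-allocated block of 16-bit pixels.
struct FadeFrame {
    uint16_t* pixels;
};

// Allocate t frames, each holding a full screen's worth of pixels,
// using malloc() instead of static arrays to dodge the 32768 limit.
FadeFrame* allocFadeFrames(int t) {
    FadeFrame* frames = (FadeFrame*)malloc(t * sizeof(FadeFrame));
    for (int s = 0; s < t; s++)
        frames[s].pixels = (uint16_t*)malloc(PIXELS * sizeof(uint16_t));
    return frames;
}

// Free the pixel blocks first, then the array of structures itself.
void freeFadeFrames(FadeFrame* frames, int t) {
    for (int s = 0; s < t; s++)
        free(frames[s].pixels);
    free(frames);
}
```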
I also learnt how to output directly to the NDS’ screens without using the PALib functions - there’s a handy pointer called “PA_DrawBg”, which is actually a multidimensional array that points directly at the video memory for the two screens. This line of code draws a black pixel at the top-left of the bottom screen:
PA_DrawBg[0][0] = 0;
This was amazingly useful. Anyway, once I’d got this all written, I hit a snag. Initially, I pre-calculated the amount by which each pixel needed to increase (96K of RAM) and calculated the actual fade values on the fly. There was so much data flying around that it just wasn’t smooth enough - I could see the program redrawing the screen, so it was obvious that I needed to optimise. The solution was to have everything pre-calculated, hence the huge memory requirements outlined above. Even that was too slow - although the fades themselves were now perfect, the images sat on screen for far too long whilst the fade calculations were performed. So I allocated another 96K of RAM to cache the pre-calculated per-pixel increase amounts during that set-up stage as well. This made the algorithm about five times faster, and gives me a perfect system.
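A rough sketch of that final scheme - the names are my own guesses, and my cache here is laid out as one byte array per channel, so it won’t match the 96K figure exactly:

```cpp
#include <cstdint>
#include <cstdlib>

const int PIXELS = 256 * 192;

// Build every intermediate frame of the fade up front. Pass 1 caches
// the per-pixel channel differences once; pass 2 reuses them for all
// t frames, so each difference is computed once instead of t times.
void buildFrames(const uint16_t* img1, const uint16_t* img2,
                 uint16_t** frames, int t) {
    // Pass 1: cache (px2 - px1) for each channel of every pixel.
    static int8_t dr[PIXELS], dg[PIXELS], db[PIXELS];
    for (int p = 0; p < PIXELS; p++) {
        dr[p] = (img2[p] & 31) - (img1[p] & 31);
        dg[p] = ((img2[p] >> 5) & 31) - ((img1[p] >> 5) & 31);
        db[p] = ((img2[p] >> 10) & 31) - ((img1[p] >> 10) & 31);
    }
    // Pass 2: build frame i from the cached differences.
    for (int i = 1; i <= t; i++) {
        for (int p = 0; p < PIXELS; p++) {
            int r = (img1[p] & 31) + (i * dr[p]) / t;
            int g = ((img1[p] >> 5) & 31) + (i * dg[p]) / t;
            int b = ((img1[p] >> 10) & 31) + (i * db[p]) / t;
            frames[i - 1][p] = (uint16_t)(r | (g << 5) | (b << 10));
        }
    }
}
```

Once the frames exist, the fade itself is just a matter of copying each one to the screen in turn, which is why the playback ends up perfectly smooth.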
Today’s handy class is called “Colour16Bit”. It enables me to get or set the RGB components of a colour value independently.
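Something along these lines, presumably - the method names are my invention, and I’m again assuming the DS’s BGR555 layout:

```cpp
#include <cstdint>

// Sketch of a class like the "Colour16Bit" described above. Assumes
// the DS's 15-bit BGR layout: red in bits 0-4, green in bits 5-9,
// blue in bits 10-14, five bits (0-31) per component.
class Colour16Bit {
    uint16_t value;
public:
    explicit Colour16Bit(uint16_t v = 0) : value(v) {}
    int red() const   { return value & 31; }
    int green() const { return (value >> 5) & 31; }
    int blue() const  { return (value >> 10) & 31; }
    void setRed(int r)   { value = (value & ~31) | (r & 31); }
    void setGreen(int g) { value = (value & ~(31 << 5)) | ((g & 31) << 5); }
    void setBlue(int b)  { value = (value & ~(31 << 10)) | ((b & 31) << 10); }
    uint16_t raw() const { return value; }
};
```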