ARM Assembly First Steps

I’ve been reading about ARM assembly lately. At the moment, the only book I’ve got is this one:

ARM: Assembly Language Programming

It’s not too bad, especially as it’s free and seemingly the only book on ARM assembly still available. (If anyone has any suggestions of good books on the topic, let me know.) It’s easy to read and gives an overview of the basics of writing assembly for the ARM family of processors. It has some quirks and strange omissions, though, and it’s obviously still a work in progress. A lot of internal references to other chapters are incomplete, a lot of pages are repeated, and I’m 63 pages in and there’s still no mention of whether the ARM is little- or big-endian. This is rather important when you’re trying to add two 64-bit numbers with a 32-bit CPU (i.e. the example on page 63), because you need to know which of the two words in memory holds the low half. It also stumped me for a while with its buggy example programs. The “16-Bit Data Transfer” example on page 57 achieves the transfer (or, more accurately, doesn’t achieve it) by first loading a single byte from memory into R1 (line 8) and then writing it back to memory as a full word (line 9). It should, I’d have thought, have used the “LDRH” and “STRH” instructions to deal with half-word data instead.
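To make the 64-bit addition point concrete, here is a rough C sketch of what a 32-bit CPU has to do: add the low words, work out the carry, then add the high words plus that carry (the job ADDS and ADC do on the ARM). The names Pair64, add64 and is_little_endian are made up for this illustration and don't come from the book. Which word in memory counts as "lo" is exactly the thing the byte order decides, so there's a small run-time probe for that too.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, the way a 32-bit CPU handles it.
   (Hypothetical type for illustration only.) */
typedef struct {
    uint32_t lo;  /* least significant 32 bits */
    uint32_t hi;  /* most significant 32 bits */
} Pair64;

/* Add two 64-bit values one word at a time. */
Pair64 add64(Pair64 a, Pair64 b) {
    Pair64 result;
    uint32_t carry;

    result.lo = a.lo + b.lo;          /* wraps modulo 2^32, like ADDS */
    carry = (result.lo < a.lo);       /* the sum wrapped only if a carry occurred */
    result.hi = a.hi + b.hi + carry;  /* add the carry in, like ADC */
    return result;
}

/* Returns 1 if the lowest-addressed byte of a word is its least
   significant byte (little-endian), 0 otherwise. */
int is_little_endian(void) {
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Adding 0x00000000FFFFFFFF and 1 this way gives lo = 0 and hi = 1, because the low-word addition wraps and the carry moves into the high word.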

It doesn’t list the clock cycles required for each instruction, either. One of my 6502 asm books included this information, which I thought superfluous at the time, but which I now realise is essential for writing efficient assembly code. If you work on the assumption that a C compiler will try to choose the most efficient asm instruction for each given situation, anyone trying to better the compiler by dropping into asm himself must know how long each instruction takes to execute. If he doesn’t know that information, the chances are the compiler will do a better job with a lot less effort.

This brings me to my first asm function for the DS. This will draw a pixel to one of the screens (it should be in 16-bit framebuffer mode; I’m using Woopsi as the test environment, hence the use of Woopsi’s “DrawBg” framebuffer pointer):

void drawPixel(u8 screen, s16 x, s16 y, u16 colour) {
    // The asm below, with C names in place of the operands:
    //   mov  r4, y, LSL #9           r4 = y * 512 (256 pixels per row, 2 bytes per pixel)
    //   add  r4, r4, x, LSL #1       Add x * 2, the byte offset within the row
    //   add  r4, r4, DrawBg[screen]  Add address of screen bitmap to current value
    //   mov  r5, colour              Store new colour in r5
    //   strh r5, [r4]                Store r5 (colour) as a half-word at the address in r4
    asm volatile("\
        mov r4, %0, LSL #9\n\
        add r4, r4, %1, LSL #1\n\
        add r4, r4, %2\n\
        mov r5, %3\n\
        strh r5, [r4]\n"
        :   // No output operands
        : "r" (y), "r" (x), "r" (DrawBg[screen]), "r" (colour)
        : "r4", "r5", "memory");
}
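For comparison with the inline assembly, a plain C version might look something like the sketch below. It is not Woopsi code: frameBuffer, screenBitmap and drawPixelC are made-up stand-ins so that it compiles on its own (on the DS the pointers would be VRAM addresses), and the bounds assert is an extra that the asm doesn't have. The thing to notice is that indexing a 16-bit pointer scales by two bytes automatically, whereas address arithmetic in asm works in bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

/* Hypothetical stand-ins for the two screen bitmaps. */
static uint16_t screenBitmap[2][SCREEN_WIDTH * SCREEN_HEIGHT];
static uint16_t *frameBuffer[2] = { screenBitmap[0], screenBitmap[1] };

/* Plot one 16-bit pixel on the chosen screen. */
void drawPixelC(uint8_t screen, int16_t x, int16_t y, uint16_t colour) {
    /* Sketch-only sanity check; the asm version does no checking. */
    assert(screen < 2 && x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);

    /* (y << 8) + x is the pixel index; the compiler turns the index
       into a byte offset of index * 2 for the 16-bit store. */
    frameBuffer[screen][(y << 8) + x] = colour;
}
```

Compiling something like this with optimisation on and reading the generated asm gives a baseline to measure a hand-written version against.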

My current plan is to write a few functions in C and asm, then compare the compiler’s take on it with my own. Hopefully I’ll see where the compiler does a better job. Still getting the hang of inline assembly, too. No idea yet how labels work when inlined, or why GCC insists I use backslashes within a string literal spread over several lines (something to do with the @ symbol, I think), or if there’s a more efficient way of getting at the function parameters (I think they’re automatically loaded into the lower registers, but I don’t know what happens if there are more parameters than there are registers).
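On the backslash question, as far as I can tell this is ordinary C rather than anything specific to GCC or the assembler: a string literal can't contain a raw line break, so a trailing backslash is needed to splice the next source line onto it. C also joins adjacent string literals into one, which gives another way to lay out a multi-line asm string. The sketch below builds the same text both ways; the two instructions are placeholder text only and nothing is assembled, and the names spliced, adjacent and sameText are made up for the example.

```c
#include <string.h>

/* Backslash-newline splices the lines together before the literal is
   read, so any indentation on the continued lines would become part
   of the string. That is why these lines start in column one. */
static const char spliced[] = "\
mov r4, r0\n\
strh r5, [r4]\n";

/* Adjacent literals are concatenated by the compiler; the whitespace
   between them is not part of the string, so they can be indented. */
static const char adjacent[] =
    "mov r4, r0\n"
    "strh r5, [r4]\n";

/* Returns 1 if both layouts produced identical text. */
int sameText(void) {
    return strcmp(spliced, adjacent) == 0;
}
```

With the second layout each instruction sits in its own literal, which also leaves room for an ordinary C comment at the end of each line.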

None of this relates to Woopsi, though, as I don’t really have any intention of trying to out-code the compiler there. I have managed to make one change to Woopsi today - the Gadget::addGadget() and Gadget::insertGadget() functions now automatically draw the newly-added child gadget. Instead of forcing the developer to manually call draw() for each new gadget, this is done for him. Preventing new gadgets being drawn automatically is just a case of calling “disableDrawing()” on the parent gadget (or another gadget higher up the hierarchy).


Jeff on 2008-05-29 at 01:20 said:

I think the reason it doesn’t mention endianness is that the ARM is ambidextrous - that is, there’s a control register somewhere that determines exactly how it loads partial words.


ant on 2008-05-29 at 10:22 said:

True, but that only makes it even more surprising. There’s no mention of bi-endianness in the book, nor any mention of how to switch between modes. I think I’ve actually found a diagram of the byte order, though, which suggests that the authors assume the ARM defaults to big-endian (this apparently isn’t the case; ARM CPUs in Symbian devices are set to little-endian).