2008-10-09

Strings and Things

The Text class is fixed. It now wraps lines of text that are wider than the available display window but have no natural breaking points. The wrapping function was skipping over characters when attempting to force breaks into non-broken text and drifting off into memory after the end of the string.

Next up, there’s now a WoopsiString class. This wraps around a char array and provides all of the functionality I’ve been doing manually in the TextBox class - setting the string (and moving memory about), concatenating strings (and moving memory about), inserting one string into another (and moving memory about), and removing sections of a string (and moving- well, you get the idea).

I’ve taken the opportunity afforded by this rethink to improve the way the memory allocation is handled. Instead of blindly reallocating a new char array every time the string changes length, the new WoopsiString class attempts to work within the memory it has already allocated. For example, imagine we make this string:

WoopsiString* s = new WoopsiString("abcd");

Now we append another string to end:

s->append("efgh");

We’ve got no choice here but to expand the memory used, which means allocating a new char array of length 9 (initial string length plus second string length, plus terminator) and copying the two strings into it. However, if we then truncate the string there’s no reason for us to allocate a smaller array:

s->remove(2);   // Remove chars from index 2 until end of string

All we really need to do here in order to truncate the string from index 2 onwards is put a terminating character in the 3rd array element. We’ve then got 6 empty slots in the char array. Now, should we append anything else to our string, we can either fill in the empty slots (if there are enough slots) and save time, or we can go through the memory allocation and string copy process again (if there aren’t enough slots):

s->append("xyz");

In this instance, we can append “xyz” to the “ab” we’ve got stored in the WoopsiString s without allocating any new memory.

I imagine every string class ever written does something like this, so it’s not an amazing innovation, but it’s a definite improvement over the char array mangling and memory fragmenting that the TextBox was doing before.

There’s no change to the way the TextBox interfaces with the outside world; that’s still all done with char arrays or chars. This is just an internal change.

One more class done today - Label. The Label is a borderless, read-only (from the user’s point of view) gadget that allows text to be put onto the screen. Basically, it’s the TextBox with GADGET_BORDERLESS set in its flags minus the cursor stuff I’ve been adding to TextBox. TextBox now inherits from Label.

Comments

Jeff on 2008-10-09 at 20:35 said:

The thing about these sorts of optimisations is that people try to do them indepedently of whats actually going to happen with the class. What you’ve done cuts down on memory fragmentation a bit (since you don’t realloc every time you add a single character) but at the same time, a string that contained 20K which gets truncated to 2K is going to waste 18K unless you’ve got some way to explicitly claim that memory back.

But thats probably not a common occurrence.

What is a common occurrence in your string class is insertion of MULTIPLE characters into a string, starting at “the insertion point” and waiting as the user types a character. One solution I’ve seen to this involves exploiting that extra space you put at the end of the string and maintaining a ‘gap’ in the string. ie, because insertion of one character is probably going to be followed by insertion of more, when you make space for (say) 80 more and remember that the gap is present - the second insertion just writes into the gap, no memory realloc required. You get the next 79 insertions at NO memory cost.

If someone moves the selection point, you don’t need to relocate the entire back-end of the string, just the part between the old gap and the new insertion point.

DMA can help shift the string up/down (subject to IRAM limits, of course)

ant on 2008-10-10 at 06:58 said:

Interesting idea. I had a couple of other ideas for improving performance, one of which was fairly similar. Based on the assumption that the majority of string edits would just involve adding n characters to the end of the string, realloc operations could extend the string by a certain size (80 characters, say) each time the allocated memory fills up. That way, only every 80th append to the string causes a realloc.

Your suggestion extends that so that the gap isn’t fixed at the end of the string, but can jump to cope with inserts in addition to appends. Neat!

If the WoopsiString turns out to have performance problems I’ll implement the idea you’ve outlined, but otherwise I think I’ll leave it as it is.