2010-01-05

Unicode News

Unicode support in Woopsi is coming along nicely. Most of the gadgets now expect to receive WoopsiString objects instead of chars or char arrays when dealing with text. As the WoopsiString now works with UTF-8, that means most of the gadgets now also support UTF-8.

I’ve overloaded the WoopsiString’s “=” operator for WoopsiString objects, char arrays and u32s, which means that any parameter that should be a WoopsiString can accept char arrays or u32s instead. Converting from string literals to WoopsiStrings is therefore seamless and automatic.
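As a rough illustration of how this kind of conversion works in C++, here is a minimal sketch. `SketchString` and `labelByteLength` are hypothetical names made up for this example, not Woopsi's real classes or methods. In C++, passing a literal where a string object is expected goes through a converting constructor, while an overloaded "=" handles assignment to an existing object, so the sketch includes both.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a UTF-8 string class; NOT Woopsi's WoopsiString.
class SketchString {
public:
    SketchString() {}

    // Converting constructors: these let a function that takes a
    // "const SketchString&" parameter accept a literal or a codepoint.
    SketchString(const char* text) : _data(text) {}
    SketchString(std::uint32_t codepoint) { appendCodepoint(codepoint); }

    // Assignment overloads: these handle "s = ..." on an existing object.
    SketchString& operator=(const char* text) {
        _data = text;
        return *this;
    }
    SketchString& operator=(std::uint32_t codepoint) {
        _data.clear();
        appendCodepoint(codepoint);
        return *this;
    }

    std::size_t byteLength() const { return _data.size(); }
    const std::string& bytes() const { return _data; }

private:
    std::string _data;  // UTF-8 encoded bytes

    // Encodes a single codepoint as UTF-8 and appends it to the buffer.
    void appendCodepoint(std::uint32_t cp) {
        if (cp < 0x80) {
            _data += static_cast<char>(cp);
        } else if (cp < 0x800) {
            _data += static_cast<char>(0xC0 | (cp >> 6));
            _data += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            _data += static_cast<char>(0xE0 | (cp >> 12));
            _data += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            _data += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            _data += static_cast<char>(0xF0 | (cp >> 18));
            _data += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            _data += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            _data += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

// Hypothetical gadget-style function that expects a string object.
std::size_t labelByteLength(const SketchString& text) {
    return text.byteLength();
}
```

With this in place, a call such as `labelByteLength("abc")` compiles without the caller constructing a string object by hand.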

The glyphs are now stored in a separate “GlyphFont” font. The default glyph font is stored in the defaultGadgetStyle global variable, in the same way that the default system font is stored. Converting this unearthed a couple of bugs in the Bmp2Font program - it doesn’t write out the resultant font constructor properly (it misses out the transparent colour and uses the raw bitmap data instead of the bitmap wrapper object). I still need to fix those.

The most problematic changes were convincing buttons that displayed glyphs to centre their text properly (turns out that they were centring based on the wrong font) and getting the MultiLineTextBox to redraw at a reasonable speed. After some cunning hacking about with the WoopsiString and the MultiLineTextBox I’ve got it back up to something approaching its old speed.

Regarding speed, Unicode definitely has a negative impact. Having to scan through an entire string to locate a codepoint is a horrible performance killer. WoopsiString::getCharAt() is definitely not a function you should use if you’re reading out sequential characters, as it looks for the codepoint from the start of the string each time it is called. It’s much more advisable to use getToken() to get a pointer to the first token, read out the codepoint, then advance the pointer by the size of each character as it is read. Before I went back to optimise things, both the MultiLineTextBox and the PackedFontBase were using getCharAt() when drawing, meaning the string was being scanned through twice for each character being printed. Xcode’s “Shark” profiling utility noted that the MultiLineTextBox was using 31% of the total CPU time when printing a line of text solely because of all the UTF-8 parsing. Yikes.
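To make the difference concrete, here is a minimal sketch of the two access patterns. The names `decodeUtf8`, `charAtFromStart` and `readSequential` are hypothetical helpers written for this example; they are not Woopsi's getCharAt() or getToken(), and the sketch assumes well-formed, NUL-terminated UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one UTF-8 sequence starting at p and stores its byte length in
// size. Assumes well-formed input; hypothetical helper, not part of Woopsi.
std::uint32_t decodeUtf8(const char* p, std::size_t& size) {
    const unsigned char* b = reinterpret_cast<const unsigned char*>(p);
    if (b[0] < 0x80) {
        size = 1;
        return b[0];
    }
    if ((b[0] & 0xE0) == 0xC0) {
        size = 2;
        return static_cast<std::uint32_t>(((b[0] & 0x1F) << 6) | (b[1] & 0x3F));
    }
    if ((b[0] & 0xF0) == 0xE0) {
        size = 3;
        return static_cast<std::uint32_t>(
            ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F));
    }
    size = 4;
    return static_cast<std::uint32_t>(
        ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) |
        ((b[2] & 0x3F) << 6) | (b[3] & 0x3F));
}

// Slow pattern: every lookup walks from the start of the string, so reading
// n characters this way costs O(n^2) in total. The caller must keep index
// within the string.
std::uint32_t charAtFromStart(const char* text, std::size_t index) {
    std::size_t size = 0;
    std::uint32_t codepoint = decodeUtf8(text, size);
    for (std::size_t i = 0; i < index; ++i) {
        text += size;
        codepoint = decodeUtf8(text, size);
    }
    return codepoint;
}

// Fast pattern: a single pass that advances the pointer by the size of each
// character as it is read, so the whole string is visited once.
std::vector<std::uint32_t> readSequential(const char* text) {
    std::vector<std::uint32_t> codepoints;
    while (*text != '\0') {
        std::size_t size = 0;
        codepoints.push_back(decodeUtf8(text, size));
        text += size;
    }
    return codepoints;
}
```

Both functions return the same codepoints; the only difference is how many bytes get re-examined along the way.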

There are still a lot of gadgets to convert over, and probably a hefty set of bugfixes and optimisations to make.

Comments

Lakedaemon on 2010-01-05 at 23:02 said:

“Regarding speed, Unicode definitely has a negative impact.”

Actually, it shouldn’t have that much impact if iterating the strings is done right:
i.e. start on the first char and then “hop” to the next char by adding the value returned by getToken

regular string: +1 +1 +1 +1 +1 +1 +1
utf8 string: +2 +1 +3 +1 +4 +2 +1

iterating a string is still O(n)
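The hop sizes can be read straight off the lead byte of each UTF-8 sequence. The following is a minimal sketch of that idea; `utf8SequenceLength` and `hopSizes` are hypothetical names for this example (not Woopsi functions), and the code assumes valid, NUL-terminated UTF-8.

```cpp
#include <cstddef>
#include <vector>

// Byte length of a UTF-8 sequence, determined from its lead byte.
// Assumes valid input; continuation bytes are never passed in.
std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Walks the string once and records how far each "hop" moves the pointer.
std::vector<std::size_t> hopSizes(const char* text) {
    std::vector<std::size_t> hops;
    while (*text != '\0') {
        std::size_t size = utf8SequenceLength(static_cast<unsigned char>(*text));
        hops.push_back(size);
        text += size;
    }
    return hops;
}
```

Each character costs one lead-byte check and one pointer addition, which is why the walk stays linear in the length of the string.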

But as I lacked time and wanted to see if the whole thing was going to work, I did a quick and dirty workaround and implemented it in a quadratic way, as you described. The way I did it is O(n^2) …ouch… you really feel it when you put a 300+ character text in a multilinebox.

So, indeed, once I had made everything work, I was planning to modify the algorithms to iterate over UTF-8 strings the right way (i.e. O(n)).

for example, this piece of code for (s32 i = 0; i getCharWidth(lineData->getCharAt(i));

should be replaced by something O(n) like (with my old syntax, not the new one) :

u32 codepoint; char* Token= lineData->getCharAt(i); u32 i=0; while (i getCharWidth(codepoint); i++; }

There are just a few places in the code where this will have to be done (mainly in the wrapping code and the (displayport) graphing code). I did it in a few places but didn’t touch the wrapping code because it was quite complicated and I was in a hurry. (Besides, I’m a beginner in the C++ business ;) )

On a side note, I have ported 90% of my app to Woopsi (nice library by the way) and I should resume working on freetype/unicode support soon (and contribute a bit more).
