2008-02-28

Vectors and Linked Lists

I had a quick go at replacing the vector class this evening with a non-STL equivalent. The obvious route to go for is a linked list, so that’s what I created - a template class representing a doubly-linked list with both sequential access and simulated random access.

Random access is simulated by iterating along the list until the correct index is reached, which gives me bounds checking for free. Sequential access is wrapped up inside the class and hidden away, and is handled by an “iterator” member variable. It can be repositioned to the start or end of the list or moved to a random index, from where it can be moved up or down the list of items.

I got the class finished and plugged it into the code (after some tinkering). It didn’t work. One of the main problems with it is the internal iterator pointer - if one iterative function calls another, the iterator gets moved around and nothing works. I knew it wasn’t an ideal solution when I wrote it, and had the nagging feeling that it wasn’t going to work, but this obvious problem didn’t become apparent until I actually watched the code in a debugger.

I can see two ways out of this. The easiest way is to move the iterator code into a separate class, which will mean I can have as many iterators as I want (I’m thinking of something like .NET’s “GetEnumerator()” function). However, this doesn’t solve the basic problems of a linked list. Although I use a lot of sequential access in Woopsi, a significant amount of code uses random access of the gadget vectors. Linked lists aren’t designed for random access.

I’d love to have a C#/Javascript-style custom iterator…

What would be better is a standalone implementation of the vector that just supplies the features that Woopsi uses. First question, then - what is a vector? The Wikipedia article and the description on cplusplus.com suggest that the vector is really nothing more than a dynamic array, or an array that can resize itself. This is typically achieved by allocating a fixed size to start off with (16 elements, say), then allocating a larger array if the original array fills up (32 elements - the original array plus 16 more), copying from one to the other, and finally freeing the original memory. It offers less memory fragmentation than a linked list and true random access, but the resize operation is much more expensive. Insert and remove operations anywhere but at the end of the array are similarly more costly.

I think I’m going to go for the second option - a dynamic array class. It might even be possible to abuse the DMA hardware further and get it to help out with the resize.

Comments

jeff on 2008-02-28 at 01:11 said:

If you deferred the creation of the sub-gadget vector till you need it, you could have each container class initialise the vector with its preferred size ie, my Window knows its going to have max 12 gadgets so it tells you in advance to allocate an array that big - sure, if you go over, it costs on reallocate operations, but if that happens to you, you should have specified a better max at the outset.

One other question - why do you need random access to the vector? ie, when do you index a gadget by its position in the vector, other than when iterating through the content of the vector

For the gadget vector, I doubt highly that normal use involves deleting gadgets (though it would happen) - you’d probably be fine if you just NULL’d entries in the vector and skipped over them during iteration. And you could grow the vector by keeping an overflow pointer at the end that pointed to the next block of pointers.

ant on 2008-02-28 at 10:10 said:

Deferring creation of the vector would create extra steps for the user and extra complexity; I’d rather go with a sensible initial array size and not have to worry about it. I like APIs that allow me to be lazy.

There are a number of points in the code when random access would be an advantage. The other reason for needing it is that there are quite a few other places where I’ve used the vector class - the invalid rectangle system is a big one. That’ll need replacing too, so I need a container that is generic rather than lots of minor variations on the same theme.

Deleting gadgets will definitely be important to me - every time a window closes it gets deleted (that’s the default behaviour, anyway).

I thought about using the “overflow pointer” idea (it’s called an “unrolled linked list”, I think - each item in the list is itself an array), but although it reduces memory fragmentation it still doesn’t give you random access.

I’ll probably get the dynamic array class written then tidy up the existing linked list, and choose the best one for each instance of the vector class that I’m removing.

Jeff on 2008-02-28 at 20:24 said:

Deferring creation doesn’t have to be visible to the user. You simply NULL the pointer at constructor time, and any time you reference the list, you check if its NULL first, and allocate a new, default size, one if its not there. Of course, there is no need to allocate one if its not there yet and you are calling a function that does a find, or an iterate, etc. You just take the NULL path out.

If the user calls Gadget::SetSubgadgetArraySize(20), then you allocate an array of the desired size at that point.

“Optimal memory usage” isn’t compatible with “lazy”

I think you’ll find this would actually speed things up as well, since a NULL check will be a one-word-test that will make all gadgets that don’t actually have sub-gadgets faster because they won’t do anything.

As to the rectangle list, do you ever delete rectangles from the MIDDLE of the vector? As I recall, you are constantly splitting the current rectangle into a bunch of new ones that you APPEND and then process like a queue. (ie, from the front, not randomly) ?

ant on 2008-02-28 at 23:49 said:

Yep, rectangles are deleted from the middle of a vector. The routine loops through the entire rectangle vector for each gadget in the system splitting/deleting any rectangles that overlap a gadget. There’s no guarantee where the rectangle will be in the vector. Rectangles created during a split are inserted to the start of the vector (if I remember correctly) so that they aren’t examined again during the loop.

Anyhow, I’ve got both a linked list (with separate iterator class) and a dynamic array written. I’ll go through all instances of the vector and choose the best one for the given situation. The NULL idea is a good plan, but it’ll take some refactoring.