2007-09-24

Screen Refreshing Algorithms

The most notably “academic” aspect of Woopsi will be the screen refresh code. Optimised routines for drawing 2D primitives are important to the system, but I doubt that anyone ever received a PhD for coming up with a faster horizontal line routine. Screen refreshing is slightly more involved, and so it’s the kind of algorithm that people have probably spent years refining. I’m pretty sure that overlapping windows stumped the XEROC PARC guys, and I’m fairly certain that the Apple engineers who saw it demoed incorrectly remembered seeing overlapping windows and so put a considerable amount of effort into implementing it.

I could check those two facts on Wikipedia, but hey - blogs are notorious for containing hearsay and unconfirmed facts. Who am I to break to mold?

Woopsi’s current refresh code is the simplest possible system I could think of. Any time a window moves, it calls the draw() function of its parent screen, which recursively redraws every gadget in every window on the screen. It does this indiscriminantly, so windows and gadgets unaffected by the moving window are also drawn. This is really slow, but the quick fix let me get on with more interesting things. I’m now at the point where I really need to sort the refresh code out. My original plan for this algorithm, which I’ve documented somewhere, went like this:

  • When a window moves, send its previous co-ordinates to the screen.
  • Redraw the section of screen that falls within that region (this is just a filled rect job).
  • Send the co-ordinates to each window, from the lowest to the highest.
  • If the window falls within the invalid region, redraw that section of window (filled rect again), then send the co-ordinates to each gadget within the window.
  • If the gadget falls within the region, redraw the gadget.

Simple enough. However, it’s a solution with two horrible flaws. First of all, we’re drawing every single window that overlaps the invalid region, whether that window is visible or not. This is wasteful. Secondly, and more importantly, we’re drawing entire gadgets. A gadget may extend beyond the invalid region, and overwrite graphics that have not changed. Not a problem if the gadget is entirely visible, but if the gadget is partially obscured, and the obscured region lies outside the invalid rectangle, we’ve got a problem. We’ll overwrite another window, and that window won’t be redrawn.

There is a solution to this - the gadget could send another invalid rectangle to the screen. This has its own set of problems, though, the most important of which is that, for a complex display, we could end up stuck in a recursive loop, redrawing the entire display over and over again, just by invalidating one window.

I put some thought into this, and had a google for other solutions. There doesn’t seem to be much information about this kind of algorithm; perhaps I’m looking for the wrong things. I’d got a vague notion that clipping and BSP trees would come into the solution somewhere, so I tried looking for those and didn’t come up with much.

Undaunted, I took those two concepts and came up with an algorithm of my very own. I shall call it “Ant’s Refreshing Algorithm”, and very refreshing it is too. No doubt someone else has already come up with it, and SCO probably owns the patent on it. Even if they don’t, they’ll say they do and sue me anyway. Or at least, they would do if I was one of their customers and they still had any money left.

Anyway. This is a description of the routine:

  • When a refresh is needed (ie. content drawn to a window that is not at the top of the display, or a window is moved), we record the area changed. We store it as a rectangle struct (x, y, width, height).
  • Send the struct to the screen and request a refresh.
  • The screen creates a vector that will store rectangle structures. This vector represents all of the regions on the display that are still in an invalid state. The initial rectangle struct is inserted into the vector.
  • The screen creates a vector that will store gadgets that need to be redrawn (“dirty” gadgets).
  • The screen sends refresh to window vector, starting with the uppermost window:
    • Each window performs a collision check - does any entry in the the invalid rectangle vector overlap the window?
      • If yes, recurse through gadget hierarchy performing collision checks - does any the entry in the invalid rectangle vector overlap the gadget?
      • If yes, the gadget is added to the screen’s dirty gadget vector.
        • The overlapping rectangle is subtracted from the rectangle that it collided with. For simple collisions this just may mean altering the collision rectangle’s values - if the gadget overlaps exactly half of the collision rectangle, we can just reduce the width or height, and change the co-ordinates as necessary. If we overlap just a chunk of the rectangle (for example, the bottom-right corner), we split the rectangle up into chunks, remove the original rectangle from the vector, and add the new chunks in.
      • Add the window to the vector of “dirty” gadgets stored in the screen. It is important for the redrawing routine that children are added to the vector before parents.
    • Stop recursing once the rectangle vector is empty.
  • If we reach the end of the gadget hierarchy and there are any remaining entries in the invalid gadget, those portions are empty of other gadgets and so need to be filled by the screen itself.

What this routine gives us is a vector containing all of the visible gadgets within the screen that have been invalidated. We achieve this by working from the visible window downwards, removing filled portions of the invalidated rectangle as we go. Redrawing the display is then very easy:

  • Send the original dirty rectangle into the screen’s dirty gadget vector, starting at the back, which will always be the lowest window. This is why we added children before parents - we need parents to be closer to the end of the vector than their children, so that they get drawn first).
  • Draw the gadget, but clip to the original dirty rectangle’s dimensions - we only want to draw the portion of the gadget that falls inside the dirty rectangle.

The only really complex part of the whole routine would appear to be the rectangle divisions we need to do when we’re working out which gadgets to change. Even that shouldn’t be too complicated. I will need to change window/screen borders into gadgets, but then I was intending to make this change all along - I’ve got a plan for programmer-definable alternative window borders using a base “WindowBorder” class and some bitmap stuff.

EDIT:

Hmm, thinking about it, if I scrap the “dirty gadget” vector and have the gadgets re-draw the invalid section immediately, it’ll become a one-pass algorithm that does a lot less drawing. It’ll make drawing more complicated, but it’d be a tidier routine. If I went down this route I’d have to process parents before children, unlike the method outlined above.

Comments

Jeff on 2007-09-25 at 10:04 said:

Actually, in my opinion, optimising your horizontal line routine will have a serious impact on the rest of your system since most of your windows are orthogonal, aren’t they? In fact, optimising vertical lines would probably also be a sensible thing to tackle (and its an easy win).

As to your overlapping window idea, a friend on mine once built one of these for DOS (yes, that DOS) and his approach relied on a back-buffer. If you assign a unique identifier to each window (1, 2, 3, etc) and you can blit horizontal lines quickly, you can keep your back-buffer such that each pixel value identifies the window it is contained in. Whenever you move a window, you compute the difference between old and new windows and re-calculate the window indices in the backbuffer - since windows don’t move that often, its not that big a deal.

Now, when you are drawing into your front window, you take the back-buffer values into account - whenever you want to draw a pixel, you make sure that the backbuffer value matches the current window - it kills the ability to use DMA_Copy() but it does handle the overlapping relatively nicely.

setpixel(x,y,c,w) { int offset = y*WIDTH + x; if (backbuffer[offset]==w) frontbuffer[offset]=c }

looks pretty lumpy but if you drop down to assembler, and have w in a register, I think it becomes fairly straight-forward, given the conditional capabilities of the ARM. And since you will have an optimised sethorizontalline(), you don’t keep calculating offset all the time, you calculate it once at the start then increment it as x increases. ie,

setnpixels(x,y,c,w,n) { int offset = yWIDTH+x; ui8 backbuffer = …baseofbuffer…+offset; ui16 *frontbuffer = …baseofbuffer… + offset; while (n-- > 0) { if (*backbuffer++ == w) *frontbuffer = c; frontbuffer++; } }

If accessing the backbuffer memory is slow, you could always DMA_Copy it to a local buffer on the stack.

It might also be possible to maintain a screen-size BITMAP per window instead - ie, one bit per pixel used by each window. Its harder to visualise in C but the ARM has some nice bit fiddling instructions that might make it a fast solution, and it also has ‘find the first bit set in this word’ instructions that could be used to identify start and finish points along a scanline, perhaps. Not sure that this idea is a winner, but you get my drift, I hope.

The trick is building the bitmaps from front to back, making sure that no window at the back gets to take any bits already used by windows in front of it, which you could do by keeping one more temporary bitmap of “all the bits used already” and then for each window from front to back: a) build windows mask naively (another optimised setline :) b) xor with the ‘all bits used already’ c) or window mask into ‘all bits used already’

Again, you only do this when a window moves (or appears)

As to your idea of a vector of rectangles, thats pretty much how Apple’s “regions” work (from my naive understanding) and its a whole lot of work to get it straight - there’s a reason that Apple are better than Microsoft at that game.

ant on 2007-09-25 at 20:40 said:

Interesting ideas, but I think you’d lose me completely at the assembly level - I haven’t got around to learning ARM asm yet. Must get a book at some point.

The only issue I’d have is that pixel plotting is just too slow to use for large areas of the screen. The DMA hardware is so much faster that I’ve got to implement a solution that uses it at every opportunity.

Speaking of which, I’ve got the algorithm I described all coded up. Time for a new blog entry…