2012-06-29

iOS Data Synchronisation Strategies

I’m currently writing an iOS app that is essentially a front-end to a SQL database. Users see data formatted into an attractive hierarchical layout and can enter information using the usual set of lists, pickers and textboxes. However, what makes this app unusual is the requirement that it be usable regardless of whether or not the device has an internet connection. Data can be pulled from the server when the user has an internet connection and can be edited even if the connection drops. When the connection resumes, the device sends the updates to the server and fetches any other changes made.

Immediately this raises all sorts of questions. The really, really big question is this: How does the system resolve conflicts? What happens if two users try to change the same information at the same time? What happens if a user makes changes on a device without a connection, makes conflicting changes on a second device with a connection, and then tries to sync the first device?

Here’s another mindbending requirement: Users can never be expected to manually merge data. When you consider that Apple is trying to hide the filesystem because the average user can’t cope with the concept of hierarchies, this makes sense. How can someone who doesn’t understand a simple folder hierarchy be expected to perform a 3-way merge?

After putting some thought into the problem, I came up with three possible solutions.

Last Write Wins

This is the easiest solution to implement and the most likely to result in data loss. When a device sends its local changes to the server it simply overwrites anything stored there.

Consider this scenario:

  • A user makes some trivial changes to the data on his iPhone.
  • He switches off the phone.
  • He spends a week making extensive changes to the data on his iPad.
  • He switches on his iPhone.
  • His week of changes are entirely overwritten with the data from the iPhone.

The server’s database already exists and cannot have its schema altered, and unfortunately it doesn’t support versioning. Once the data is overwritten it is gone.

Checkin/Checkout

This is the TFS model of working. If I want to edit some data (which can be thought of as a document), I need to check it out first. The document is locked to me and no-one else can edit it in the meantime. Edits are therefore serialised so there’s no chance of conflicting edits being made.

In order to support this, each device must have a unique identifier. Checking a document out to a user isn’t specific enough, because a user could have two devices (as per the “last write wins” scenario) and make conflicting edits on both. As Apple no longer allow apps to access the iOS device’s unique identifier, each installation of the app must generate its own unique ID and store it on the device. This allows a document to be checked out by a specific user on a specific device.

But what if a user leaves his phone at home and needs to checkout the document on a different device? We’ll have to amend the system so that checkouts can be overridden. That creates a new problem: what do we do with documents at checkin time that have had their checkout overridden and are therefore subject to conflicting edits? We have two choices: overwrite everything on the server and lose all changes made on the other device, or pull down the server data and lose everything on this device. We’re losing data again.

Even if we’re happy to accept the possibility of lost data (at least we can blame the users for ignoring the lock warnings) there’s another scenario we have to deal with. What happens if a user has a document stored on his device and wants to edit it but doesn’t have an internet connection? The device can’t contact the server to obtain the lock. Do we allow the edits and hope that no-one else grabs the lock before we get a connection back? What if someone else updates the document and releases the lock before that happens? We won’t know that the document has changed and we lose data.

Checkin/checkout is clearly a bad model:

  • Obtaining a lock without a connection is impossible and any workaround will lead to lost data;
  • Not allowing editing without a lock will prevent the app being used without an internet connection;
  • Allowing locks to be overridden will lead to lost data;
  • Not allowing locks to be overridden will lead to severe usage limitations.

Distributed Version Control

My reference to TFS in the “checkin/checkout” model should suggest my thought process so far: It’s essentially a distributed version control problem. We have a central repository and multiple clients that will:

  • Pull the latest state;
  • Change their data offline;
  • Push back to the server.

Unlike a DVCS, we have two big limitations:

  • The server doesn’t store a history;
  • Merges must be entirely automatic.

It’s important that the clients do as little work as possible in resolving conflicts. It’s possible that clients for other platforms will get written, and their programmers won’t want to re-implement a bunch of merging code that should have been on the server in the first place.

How can you tell a server to merge changes from a client if the server has no idea what its data looked like when the client last performed a pull?

This is my solution:

  • Client pulls data from server.
  • Client stores two copies of the data: One is the “pristine” server state and is immutable; one will be used for editing.
  • When the client pushes, it sends both the pristine and edited states of the data.
  • The server receives the data and compares its current state, the pristine state and the edited state of the data.
  • If the pristine and edited data matches, no changes have been made and the data should not be altered regardless of the current state.
  • If the pristine and edited data doesn’t match, the current data is overwritten with the edited state.
  • If the edited data matches the current data, no changes are made.
  • The resulting dataset is sent back to the client.
  • The client updates its local data with the data received from the server.

Note that, unlike a text document, the data in question can be broken down into discrete pieces. For example, it could contain a person’s name and address, which in turn would be broken down into first name, last name, street, county, post code, etc. Changing any element of the address would change the meaning of the entire address, so any single change would cause all fields to be overwritten with the client’s data. However, changing the address does not imply that the person’s name should change, so that would be examined separately from the address and updated appropriately.

Data that hasn’t been changed by the client won’t overwrite data that has been changed by another client. Data that has been changed by the client will overwrite any other changes. The system automatically merges where there are no conflicts and resolves conflicting edits via “last write wins”.

Other Thoughts

There doesn’t seem to be a foolproof way of ordering overwrites such that the most recently changed data ends up as the final version. I could make changes on my phone, switch it off, make more changes on my iPad and then switch my phone back on. My phone’s older data becomes the canonical version. I could try using a timestamp, but there’s no guarantee that those are correct. Lamport clocks won’t help because, as far as they are concerned, the two edits happened simultaneously.

The problem can be generalised from being considered as a DVCS problem to a distributed database problem, which opens up some more potential research. Reading up on distributed databases led me to the CAP theorem, which states that you can’t have immediate consistency of data if your database is always available (even if the device has no internet connection) and is split into several partitions (ie. a central SQL instance and a local CoreData instance). That means conflicts and merging are inevitable, and the way around it is “eventual consistency”. The disparate datastores will eventually synchronise and will eventually all have the same data; in the meantime, the absolute truth is fractured into the various stores and can only be determined by considering the entire cloud of devices participating in the system.

I installed and played with CouchDB for a while, which quickly supplanted MongoDB as my new favourite NoSQL database. Its approach to handling conflicts during data replication between nodes? Push back to the application and let it deal with the problem. It seems there is no “correct” way to handle merge conflicts automatically. My merging system with its “last write wins” bodge is, I think, the best solution to the problem given the constraints.

2011-08-11

Really Bad Eggs - Objective-C Edition

The Objective-C version of Really Bad Eggs is coming along nicely. I’ll try to get some screenshots and maybe a video up, but in the meantime here’s some stream-of-consciousness pontificating on the port so far.

When I wrote DS code I used to complain about the lack of a debugger. I’ve written most of the Objective-C version of Really Bad Eggs whilst out and about on a Windows laptop, so I haven’t had a debugger or even a compiler. Despite the lack of tools and the game being my first Objective-C program it’s working well so far. Rewriting the code in a new language has allowed me to re-examine a lot of the original ideas. In some cases I’ve refactored both the C++ and Objective-C code to be simpler or more versatile. In others I’ve used features of Objective-C to improve on the original code.

Bug Fixes and General Improvements

The DS’s representation of incoming garbage eggs was inaccurate due to a typo. I’ve fixed this in both versions of the game. I’ve also improved the placement of garbage eggs that are dropped into the grid.

Garbage eggs are always dropped into the columns with the least eggs first. The (simplified) algorithm looks like this:

  • Get a list of columns and their heights, sorted in ascending order by height (insertion sort).
  • Add the leftmost column to the list of “active” columns.
  • While there are eggs to place:
    • Place a garbage egg in each active column.
    • If the column to the right of the last active column has the same height as the last active column, add it to the active column list.

It essentially fills the grid from bottom to top as though it were pouring in water.

There is a subtle problem with the algorithm. The sorting system produces a list of columns that are sorted by height and column index. In SQL, it would look like this:

SELECT
    columnHeight
   ,columnIndex
FROM
    columns
ORDER BY
    columnHeight ASC
   ,columnIndex ASC

If columns 0, 1, 2, and 5 are of height 2, whilst columns 3 and 4 are of height 1, and we add a garbage egg, we know that the egg will be added to column 3. Here’s the column state:

        000  0
        000000
        ------
Column: 012345

Here’s the sorted data:

Column: { 3, 4, 0, 1, 2, 5 }
Height: { 1, 1, 2, 2, 2, 2 }

The first column to receive an egg will be column 3, as it is the left-most column in the sorted list.

This predictability means that advanced players can start using it to their advantage. They can drop a block here because they know they’ve got an incoming garbage egg and want it to go there. They can change their plans because they know what’s going to happen next. To prevent this, the sorting algorithm now randomises columns of the same height within the sorted array.

Memory Management in Objective-C

Memory management is a pain. I can see exactly why Apple have implemented reference counting the way they have. I’ve worked extensively with low-level memory management in C++ due to the combo attack of the DS’ tiny stack/slow CPU. The combination makes both smart pointers and stack-based objects mostly unusable, leading to the same questions time and time again:

  • If I call a function, who owns the memory it allocates? Do I delete it or not?
  • Does this function expect to receive an existing block of memory that I allocate or will it allocate its own?
  • When writing functions, should I allocate memory or should I expect it to be allocated in advance?
  • Should I return allocated memory via the return statement or via a pointer-to-a-pointer passed as a parameter?
  • Should I return an object on the stack and hope RVO kicks in, or should I return a pointer to an object on the heap?
  • Should functions receive and return pointers or references to objects?
  • If an object is told to store a pointer to another object, should it store the pointer or a copy? What if the pointed-to object changes during the lifetime of my object?

I’ve become used to considering these issues every time I work with memory in C++, but Objective-C tries to solve the issues itself. Mostly it uses conventions to suggest the correct approach. Functions with “new” or “init” at the start of the names, for example, will retain the returned object and increase its reference count. The caller therefore becomes the owner of the object. Convenience methods, such as NSArray’s arrayWithObjects method, return autoreleased objects. These are released automatically by the Cocoa framework when the current iteration of the application’s state terminates. If the creating code needs them around longer then it needs to retain them. Collections retain objects added to them, so they can be deemed to be the owners of those objects. It is the responsibility of the object’s previous owner to ensure that he releases the object.

Getting the hang of the conventions is tricky, particularly when dealing with frameworks like cocos2d which aren’t the best architected examples I’ve seen. There are a number of edge cases, too. Suppose you retain a reference to object A, which creates and retains a reference to object B, which itself creates and retains a reference to object C. During its creation, object C retains object B. What happens to B and C when A is released? Why, nothing, of course. B and C have a cyclic relationship. Releasing object A reduces the reference count of B by one, but B is still referenced by C (and vice-versa) so neither will be deallocated. This isn’t an issue in C++ as the developer manually deletes objects. Deleting A will delete B which will delete C (assuming the destructors tidy things up correctly). Nor is it an issue for C#’s garbage collector because it can tell that the only references to B and C are each other.

For the most part, I’ve used what Objective-C calls “weak references” - I’ve got pointers to objects without retaining them, the same as I would in C++. My code probably looks hideous to experienced Objective-C coders as I’ve trampled conventions all over the place.

One of the main reasons I upgraded to OSX Lion so quickly was the promise of ARC, or “Automatic Reference Counting”. Skimming through the docs suggests that there are a lot of edge cases to it, though (that seems to be a common theme with Objective-C - watch out for the edge cases), so I’m sticking with the traditional method first in order to understand fully what ARC is replacing.

A bug that caught me out for a while was a classic Objective-C memory leak. The “Leaks” utility insisted I’d leaked memory from one class like crazy, but it also told me that the reference count for all leaked objects was 0. That makes no sense until you remember that Objective-C allows you to make boneheaded mistakes like this:

- (void)dealloc { }

Oops. All dealloc methods should call the superclass’ dealloc method:

- (void)dealloc {
    [super dealloc];
}

I’d have suggested that Apple use the template pattern for this instead. NSObject could have methods like this:

- (void)onDealloc { }

- (void)dealloc {
    [self onDealloc];
    [super dealloc];
}

User code would look like this:

- (void)onDealloc {
    // Cleanup here
}

It’s now impossible to leak memory by forgetting to dealloc the superclass. I imagine there’s a valid case for giving developers the power to shoot themselves in the foot, but there’s a lot of boilerplate code needed in classes that could be excised very easily by using the template pattern. The same is true for object initialisation. Here’s what you do now:

- (id)init {
    if ((self = [super init])) {
    // Initialise the object
    }

    return self;
}

This is what a templated NSObject could look like:

- (void)onInit { }

- (id)init {
    if ((self = [super init])) {
        [self onInit];
    }

    return self;
}

This is what a user class would look like:

- (void)onInit {
    // Initialise the object
}

Tidier, but potentially less powerful.

Performance

Running the OSX version of Really Bad Eggs through Xcode’s CPU profiler demonstrates how efficiently the game works. Most of the game code barely registers; cocos2d’s drawing routines use several magnitudes more CPU time than anything else in the game. It also demonstrated how abysmally slow the NSArray objects are. One function I used them in took up more CPU power than anything else in the entire game (except the cocos2d drawing routines), so I refactored the function without NSArrays and the problem disappeared.

Dynanism

The dynamic nature of Objective-C means I’m happy to use calls to an object’s isKindOfClass method, whereas in C++ I’d have been more inclined to refactor to remove the need for type checking or calls to dynamic_cast<>(). For example, the C++ function to test if two eggs can be connected looks like this:

void NormalBlock::connect(const BlockBase* top,
                          const BlockBase* right,
                          const BlockBase* bottom,
                          const BlockBase* left) {
    setConnections(top != NULL && top->getColour() == _colour && top->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   right != NULL && right->getColour() == _colour && right->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   bottom != NULL && bottom->getColour() == _colour && bottom->getState() == BlockBase::BLOCK_STATE_NORMAL,
                   left != NULL && left->getColour() == _colour && left->getState() == BlockBase::BLOCK_STATE_NORMAL);
}

All egg classes inherit from the BlockBase class (descriptive name, I know) which includes a “_colour” member. Each subclass sets this to a colour representative of the bitmaps used to display the egg. The red egg has an RGB value of (31, 0, 0) whilst the blue egg has an RGB value of (0, 0, 31). Checking if two eggs are of the same type simply entails a comparison between two 16-bit integers.

The Objective-C function just compares the eggs’ classes instead:

- (void)connect:(BlockBase*)top right:(BlockBase*)right bottom:(BlockBase*)bottom left:(BlockBase*)left {

    BOOL topSet = top != NULL && [top class] == [self class] && top.state == BlockNormalState;
    BOOL rightSet = right != NULL && [right class] == [self class] && right.state == BlockNormalState;
    BOOL bottomSet = bottom != NULL && [bottom class] == [self class] && bottom.state == BlockNormalState;
    BOOL leftSet = left != NULL && [left class] == [self class] && left.state == BlockNormalState;

    [self setConnectionTop:topSet right:rightSet bottom:bottomSet left:leftSet];
}

It’s not a major change, but it does remove some cruft from the code.

Blocks as Callback Functions

The Grid class represents the well of eggs and manages all of the eggs within it. It includes functions for rotating the live eggs, identifying and exploding chains of eggs, and so on. In the C++/DS version, the Grid class is also responsible for calling methods of the SoundPlayer class to trigger audio playback every time anything sound-worthy occurs. This works, but it’s not terribly clean - the Grid shouldn’t need to deal with sounds. The design ties the logic of the game to the platform it is running on.

A better solution would be to trigger an event, but if you’ve ever implemented an event system you’ll know that it’s a very verbose pattern. At the very least, you’ll need interfaces and classes that represent event arguments and event listeners. I’d like to tidy up the design but I’m not that bothered.

A considerably terser option would be to use a function pointer as a callback method, but that opens up a whole can of non-OO, global namespace-polluting nastiness. C++’s pointers-to-members are so limited - the exact type of the pointed-to object must be known - that they aren’t even worth considering.

Another possibility would be to create a message queue and have the grid push messages into it. A message pump would dispatch messages to the sound player object as necessary. Again, this is ugly and involves far too much new code for virtually no benefit.

It was only after working through this familiar thought process that I remembered that Apple has added closures (“blocks”) as a non-standard extension to C. With closures, the problems disappear. No bloated event system, no global namespace pollution with random C functions, and a simple way to extract the sound triggers from the Grid class.

The “Grid.h” file includes the following typedefs:

typedef void(^GridEvent)(Grid*);
typedef void(^GridBlockEvent)(Grid* grid, BlockBase* block);

I think of the typedefs as being analogous to C#’s delegate signatures. The class itself includes the following members:

GridEvent _onGarbageRowAdded;
GridEvent _onLand;
GridEvent _onGarbageLand;
GridBlockEvent _onBlockAdd;
GridBlockEvent _onBlockRemove;
GridBlockEvent _onGarbageBlockLand;

The class triggers the closures like so:

if (_onBlockLand != nil) _onBlockLand(self);

Wiring up code to an instance of the Grid class is simple:

Grid* grid = [[Grid alloc] initWithHeight:2 playerNumber:0];

grid.onBlockLand = ^(Grid* sender) {
    printf("A block has landed");
};

It looks like C, and it tastes like C, but I can perform JavaScript-like tricks with it. ‘Tis a beautiful abomination.

I’ve used this pattern all over the code to trigger sounds and visual effects as well as add/move/remove sprites. The game engine code is completely separate from the presentation layer.

Classes as Objects

The BlockFactory class has also benefited a little from the switch to Objective-C. When a grid needs to place a new egg, it fetches a new egg instance from the BlockFactory rather than create the egg itself.

The BlockFactory is necessary because all players receive the same sequence of eggs. Suppose player 1 requests his third pair of eggs and is given a red egg and a blue egg. When player 2 requests his third pair of eggs, he must also be given a red egg and a blue egg. No player can “cheat” due to a particularly advantageous sequence of eggs. Winning is a matter of skill and speed rather than luck.

The BlockFactory must, therefore, maintain a list of all egg types that have been generated. It also needs to remember which pair of eggs all players will receive next. Once a pair of eggs has been used by all players it can be removed from the list. Attempts to retrieve a pair of eggs not in the list will cause a new pair of random egg types to be created and appended to the list. The BlockFactory instance is shared across all players.

In the C++ version, the list stores enum values that represent an egg type. When a new egg is requested, the correct constructor is called for that value using a switch() statement. Here’s the code that adds a new item to the list:

BlockFactory::BlockType BlockFactory::getRandomBlockType() const {
    return (BlockFactory::BlockType)(rand() % _blockColourCount);
}

void BlockFactory::addRandomBlock() {
    _blockList.push_back(getRandomBlockType());
}

Here’s the code that creates a new egg from a given type:

BlockBase* BlockFactory::newBlockFromType(BlockFactory::BlockType type) const {
    switch (type) {
        case BLOCK_RED:
            return new RedBlock();
        case BLOCK_PURPLE:
            return new PurpleBlock();
        case BLOCK_YELLOW:
            return new YellowBlock();
        case BLOCK_BLUE:
            return new BlueBlock();
        case BLOCK_GREEN:
            return new GreenBlock();
        case BLOCK_ORANGE:
            return new OrangeBlock();
    }

    // Included to silence compiler warning
    return new RedBlock();
}

The BlockFactory class includes a BlockType enum necessary for this scheme to work.

The Objective-C version dumps the enum and uses a tidier solution. As classes in Objective-C are themselves objects, storing enum values in the list is unnecessary. Instead, the list stores pointers to the class objects. Here’s the code that adds a new item to the list:

- (void)addRandomBlockClass {
    [_blockList addObject:[self randomBlockClass]];
}

- (Class)randomBlockClass {
    int type = rand() % _blockColourCount;

    switch (type) {
        case 0:
            return [RedBlock class];
        case 1:
            return [BlueBlock class];
        case 2:
            return [YellowBlock class];
        case 3:
            return [PurpleBlock class];
        case 4:
            return [GreenBlock class];
        case 5:
            return [OrangeBlock class];
    }

    // Included to silence compiler warning
    return [RedBlock class];
}

Creating a new BlockBase object from the list becomes trivial. A single line of code will create a new BlockBase object from an item in the list:

BlockBase* block = [[[_blockList objectAtIndex:0] alloc] init];

The item in the list is a class, so calling its alloc and init methods gives us a new object automatically.

The difference between the two sets of code is minor, but conceptually it’s a big leap. Objective-C lets me store classes in arrays. I’m pretty certain that the only other language I’ve used that could do anything even remotely like this is JavaScript.