2012-11-27

Diffing NSManagedObjects

I’m currently working on a system to calculate the differences between two CoreData objects. Why would you want to do that, you ask? In my case, I’m trying to transmit the changes made to the object to a web server using the least amount of data and I don’t want to bother trying to store the changes as some kind of activity log. As discussed previously, I’m storing two copies of my objects: one that represents the last-known server state, and one that is changed by the user.

Here’s an imaginary CoreData model that we’ll use for all examples:

  • Person

    • firstName (string)
    • lastName (string)
    • age (int)
    • address (Address)
  • Address

    • street (string)
    • city (string)
    • state (string)
    • postalCode (string)

The “Person” entity has a related “Address” object, which gives the address at which the person lives.

In my first attempt, I came up with a particularly brittle solution and created the diff manually. The method takes in two parameters (the new and old objects that represent its different states) and computes the diff like this:


NSMutableDictionary *diff = [NSMutableDictionary dictionary];

if (![newObj.firstName isEqualToString:oldObj.firstName]) {
    diff[@"firstName"] = @{ @"old": oldObj.firstName, @"new": newObj.firstName };
}

Each attribute is compared and, if inequalities are found, the new and old values are stored in the dictionary. We can then transmit this minimal amount of information and the server can figure out what changed and how.
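
To give a rough idea of the transmission step, here’s a minimal sketch of turning a diff dictionary into JSON with NSJSONSerialization. The JSONDataFromDiff() helper is hypothetical and isn’t part of the code in this post; it also assumes the server accepts JSON, which nothing above requires.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): turns a diff dictionary
// into JSON bytes, or returns nil if the diff holds values JSON can't express.
static NSData *JSONDataFromDiff(NSDictionary *diff, NSError **error) {
    // Only strings, numbers, NSNull, arrays and dictionaries are valid JSON
    // values; an NSDate or NSData attribute would need converting first.
    if (![NSJSONSerialization isValidJSONObject:diff]) {
        return nil;
    }
    return [NSJSONSerialization dataWithJSONObject:diff options:0 error:error];
}
```

Passing the dictionary built above to JSONDataFromDiff() yields an NSData payload that can be used as the body of a request. The validity check matters because date or binary attributes would otherwise cause NSJSONSerialization to throw an exception.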

This solution is tied to the structure of the data model: any change to the model means we have to change the diff method. A better way would be to exploit features of the CoreData API that let us take a metaprogramming approach, one that has no foreknowledge of the data model and figures the whole thing out on the fly.

CoreData has a number of useful features that allow us to do this. Firstly, all NSManagedObject instances are composed of two types of data: attributes and relationships. Attributes are basic properties of an object, such as ints, strings, doubles, etc. Relationships connect different NSManagedObjects together. To demonstrate this on the example model:

  • Person

    • Attributes:
      • firstName (string)
      • lastName (string)
      • age (int)
    • Relationships:
      • address (Address)
  • Address

    • Attributes:
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)

Each NSManagedObject has an associated “entity” instance that contains metadata about its attributes and relationships. If we wanted to list all of the names of the attributes for a Person instance, we could do this:


NSArray *attributeNames = [[[person entity] attributesByName] allKeys];

for (NSString *name in attributeNames) {
    NSLog(@"%@", name);
}

The output would be something like this (the order of a dictionary’s keys isn’t guaranteed):

age
firstName
lastName

We can do the same with relationship names:


NSArray *relationshipNames = [[[person entity] relationshipsByName] allKeys];

for (NSString *name in relationshipNames) {
    NSLog(@"%@", name);
}

Here’s the output:

address

We’re now able to list all of the properties of an NSManagedObject. Armed with this, and adding in a little Objective-C metaprogramming, we can write a function that can identify differences between two managed objects. Assuming we have two valid NSManagedObject pointers, here’s a simple diff function:


- (NSDictionary *)diffNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {
    
    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    NSArray *attributeNames = [[[newObject entity] attributesByName] allKeys];

    for (NSString *name in attributeNames) {

        SEL selector = NSSelectorFromString(name);

        id newValue = [newObject performSelector:selector];
        id oldValue = [oldObject performSelector:selector];

        if (![newValue isEqual:oldValue]) {
            diff[name] = @{ @"new": newValue, @"old": oldValue };
        }
    }

    return diff;
}

One of the metaprogramming tricks we’re using is “NSSelectorFromString()”, which turns a string into a selector: the name of a method, which the runtime resolves when the message is sent. It’s tempting to think of a selector as a function pointer, though that isn’t really accurate. Since entity attributes all map to properties of the NSManagedObject instance, we can treat their names as selectors and call them using the “performSelector:” method. Once we’ve extracted the new and old values we can compare them and, if they are different, add them to the diff dictionary.

Let’s see what happens if we run it:


Person *person1 = [self createPerson];
person1.firstName = @"Joe";
person1.lastName = @"McJoe";
person1.age = @23;

Person *person2 = [self createPerson];
person2.firstName = @"Bob";
person2.lastName = @"McJoe";
person2.age = @23;

NSDictionary *diff = [self diffNewObject:person1 withOldObject:person2];

NSLog(@"%@", diff);

The output we get is this:

{
    firstName = {
        new = Joe;
        old = Bob;
    };
};

Now the attributes of the entities in our data model can change without the diff method breaking.

The method is still a little brittle, however. What happens if we’re trying to diff two objects of different types that share some of the same properties? What happens if some of the properties are nil? (NSDictionaries cannot store nil values.) What if one of the objects is nil?

Here’s an updated version of the method that accounts for all of these possibilities:


- (NSDictionary *)diffNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {
    
    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    NSArray *attributeNames = newObject == nil ? [[[oldObject entity] attributesByName] allKeys] : [[[newObject entity] attributesByName] allKeys];

    for (NSString *name in attributeNames) {

        SEL selector = NSSelectorFromString(name);

        id newValue = nil;
        id oldValue = nil;

        if (newObject != nil && [newObject respondsToSelector:selector]) newValue = [newObject performSelector:selector];
        if (oldObject != nil && [oldObject respondsToSelector:selector]) oldValue = [oldObject performSelector:selector];

        newValue = newValue ? newValue : [NSNull null];
        oldValue = oldValue ? oldValue : [NSNull null];

        if (![newValue isEqual:oldValue]) {
            diff[name] = @{ @"new": newValue, @"old": oldValue };
        }
    }

    return diff;
}

There are a few changes:

  • We ensure that we get the attribute list from a non-nil object;
  • We ensure that the objects respond to the selectors before trying to call them;
  • We replace nil values with NSNull.
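
As an aside, the same attribute diff could be written with key-value coding rather than selectors, which avoids the compiler warning that ARC raises for performSelector: calls whose selector isn’t known at compile time. This is only a sketch: the DiffAttributesUsingKVC() function is hypothetical, and it assumes both objects share the same entity, since valueForKey: raises an exception for a key the object doesn’t have.

```objc
#import <CoreData/CoreData.h>

// A sketch of the attribute diff using key-value coding instead of
// performSelector:. The function name is hypothetical, and both objects are
// assumed to be instances of the same entity (either may be nil).
static NSDictionary *DiffAttributesUsingKVC(NSManagedObject *newObject,
                                            NSManagedObject *oldObject) {
    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    // Take the attribute list from whichever object is non-nil.
    NSEntityDescription *entity = newObject ? newObject.entity : oldObject.entity;

    for (NSString *name in entity.attributesByName) {
        // Messaging nil returns nil, as does an unset attribute; swap in
        // NSNull so the value can be stored in a dictionary.
        id newValue = [newObject valueForKey:name] ?: [NSNull null];
        id oldValue = [oldObject valueForKey:name] ?: [NSNull null];

        if (![newValue isEqual:oldValue]) {
            diff[name] = @{ @"new": newValue, @"old": oldValue };
        }
    }

    return diff;
}
```

The trade-off is that the respondsToSelector: check disappears, so this version is stricter about mismatched object types than the method above.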

Unfortunately, finding the diff between two objects’ attributes is the easy part. Producing a diff of the relationships is rather harder. We made a mistake in our object model, so let’s correct that now and see one of the reasons why diffing relationships is hard:

  • Person

    • Attributes:
      • firstName (string)
      • lastName (string)
      • age (int)
    • Relationships:
      • address (Address)
  • Address

    • Attributes:
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)
    • Relationships:
      • person (Person)

Apple recommend including an inverse of all relationships. Our Person entity has a relationship to the Address entity, so our Address entity should include the inverse relationship back to the Person entity.

Hopefully the problem we’ve introduced is apparent. We’ve created a cycle in what was previously an acyclic graph. If we try and walk the graph from a Person to an Address, we’ll find that we need to walk back to the Person. And back to the Address. And back to the Person. And now we’ve recursed too far and blown the stack.

CoreData includes another feature that can help mitigate this problem. It isn’t a perfect solution, but it should suffice in most situations. Each relationship includes a deletion rule that tells CoreData what to do with related objects when an object is deleted. In this case, if we were to delete a Person we’d want their Address to be deleted. However, if we delete an Address (because the person sells their house), we don’t want to delete the Person.

We’ll add that metadata to our data model:

  • Person

    • Attributes:
      • firstName (string)
      • lastName (string)
      • age (int)
    • Relationships:
      • address (Address) (Delete: cascade)
  • Address

    • Attributes:
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)
    • Relationships:
      • person (Person) (Delete: nullify)

We can use the deletion rule to infer ownership of one entity by another. The Person entity can be said to “own” its related Address entity because deleting the Person deletes the Address. The Address entity can be said to be “owned” by its related Person entity because deleting the Address won’t delete the Person. We still have cycles in the graph, but now we can use its deletion rules as direction indicators that tell a walk algorithm which paths it can take through the graph.

Here’s a graph walk algorithm that uses this new information:


- (void)walkGraph:(NSManagedObject *)managedObject {

    NSLog(@"%@", [[managedObject entity] name]);
    
    NSDictionary *relationships = [[managedObject entity] relationshipsByName];

    for (NSString *name in relationships) {

        NSRelationshipDescription *relationship = relationships[name];

        if (relationship.deleteRule != NSCascadeDeleteRule) continue;

        SEL selector = NSSelectorFromString(name);

        NSManagedObject *child = [managedObject performSelector:selector];

        [self walkGraph:child];
    }
}

We’re using a familiar set of metaprogramming tools: NSSelectorFromString(), performSelector: and relationshipsByName. Here’s what it will output if we feed it a fully-populated Person instance:

Person
Address

Note that an alternative would be to maintain a list of walked objects instead of using the deletion rule/ownership approach, but this could introduce its own set of problems.
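
For comparison, here’s a rough sketch of that alternative. The WalkGraphOnce() function is hypothetical; it ignores deletion rules, follows every to-one relationship, and uses a set of objectIDs to avoid visiting the same object twice. To-many relationships are skipped to keep the sketch short.

```objc
#import <CoreData/CoreData.h>

// Hypothetical alternative to the deletion-rule walk: follow every to-one
// relationship, but remember which objects have already been visited.
static void WalkGraphOnce(NSManagedObject *managedObject, NSMutableSet *visited) {
    // Stop at missing objects and at anything we have already seen.
    if (managedObject == nil || [visited containsObject:managedObject.objectID]) {
        return;
    }

    [visited addObject:managedObject.objectID];

    NSLog(@"%@", managedObject.entity.name);

    NSDictionary *relationships = managedObject.entity.relationshipsByName;

    for (NSString *name in relationships) {
        NSRelationshipDescription *relationship = relationships[name];

        // To-many relationships return a set and are out of scope here.
        if (relationship.isToMany) continue;

        WalkGraphOnce([managedObject valueForKey:name], visited);
    }
}
```

Called with a Person and an empty NSMutableSet, this visits the Person and its Address once each and then stops. One drawback is that the walk now reaches every connected object, whether or not the starting object could be said to own it.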

At this point, we can add a couple of methods and produce a full diff between two Person objects:


- (NSDictionary *)diffNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {
    
    NSDictionary *attributeDiff = [self diffAttributesOfNewObject:newObject withOldObject:oldObject];

    NSDictionary *relationshipsDiff = [self diffRelationshipsOfNewObject:newObject withOldObject:oldObject];

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    if (attributeDiff.count > 0) {
        diff[@"attributes"] = attributeDiff;
    }

    if (relationshipsDiff.count > 0) {
        diff[@"relationships"] = relationshipsDiff;
    }

    return diff;
}

- (NSDictionary *)diffRelationshipsOfNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];
    
    NSDictionary *relationships = newObject == nil ? [[oldObject entity] relationshipsByName] : [[newObject entity] relationshipsByName];

    for (NSString *name in relationships) {

        NSRelationshipDescription *relationship = relationships[name];

        if (relationship.deleteRule != NSCascadeDeleteRule) continue;

        SEL selector = NSSelectorFromString(name);

        id newValue = nil;
        id oldValue = nil;

        if (newObject != nil && [newObject respondsToSelector:selector]) newValue = [newObject performSelector:selector];
        if (oldObject != nil && [oldObject respondsToSelector:selector]) oldValue = [oldObject performSelector:selector];

        NSDictionary *relationshipDiff = [self diffNewObject:newValue withOldObject:oldValue];

        if (relationshipDiff.count > 0) {
            diff[name] = relationshipDiff;
        }
    }

    return diff;
}

- (NSDictionary *)diffAttributesOfNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    NSArray *attributeNames = newObject == nil ? [[[oldObject entity] attributesByName] allKeys] : [[[newObject entity] attributesByName] allKeys];

    for (NSString *name in attributeNames) {

        SEL selector = NSSelectorFromString(name);

        id newValue = nil;
        id oldValue = nil;

        if (newObject != nil && [newObject respondsToSelector:selector]) newValue = [newObject performSelector:selector];
        if (oldObject != nil && [oldObject respondsToSelector:selector]) oldValue = [oldObject performSelector:selector];

        newValue = newValue ? newValue : [NSNull null];
        oldValue = oldValue ? oldValue : [NSNull null];

        if (![newValue isEqual:oldValue]) {
            diff[name] = @{ @"new": newValue, @"old": oldValue };
        }
    }

    return diff;
}

Our earlier attribute diff has been renamed diffAttributesOfNewObject:withOldObject:, and we’ve added two new methods. The new diffRelationshipsOfNewObject:withOldObject: method expands on our walkGraph: method by checking for nil, checking that the objects respond to a given selector, and producing a diff. The new diffNewObject:withOldObject: method collates the attribute and relationship diffs into a single dictionary. It is called recursively by the relationship diff method to ensure that the entire object graph is compared.

Although we can now successfully create a full diff of the graph described by our Person entity, we haven’t yet managed to cover all of the possibilities offered by CoreData. Suppose we modify our data model to accommodate people who own multiple houses?

  • Person

    • Attributes:
      • firstName (string)
      • lastName (string)
      • age (int)
    • Relationships:
      • address (Address) (Delete: cascade) (To many)
  • Address

    • Attributes:
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)
    • Relationships:
      • person (Person) (Delete: nullify) (To one)

Our relationship diff method no longer works. When we retrieve the value of the relationship we receive an NSSet pointer rather than an NSManagedObject pointer and we haven’t written any code to handle that. Even if we write code to iterate over the set and diff individual items, we run into some nasty problems:

  • If we have two addresses and change one, how do we match up the addresses in the two Person objects?
  • How can we tell if an address has been deleted?
  • How can we tell if an address has been added?

It’s clear that our Address entity needs to include an immutable identifier that is unique within the context of its set, but is shared between copies of a Person. NSManagedObjects include an objectID property, but every object in the dataset gets its own, so the two copies of an address never share one and it can’t be used to match them up. We’ll need to add a new attribute to the Address entity:

  • Address
    • Attributes:
      • guid (string)
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)
    • Relationships:
      • person (Person) (Delete: nullify) (To one)

I’ve plumped for a GUID, but you could use a bog-standard string or a number.

Now we can identify Address entities by their GUID, but we’ve got another problem. By design, our diff method is entirely ignorant of the structure of our object model. It doesn’t know that it is trying to diff Address objects and therefore can’t know that it should identify them using their “guid” attribute. To get around this, we’ll use yet another meta-feature of CoreData: user info.

Everything in a CoreData model includes a “user info” dictionary. This is a collection of key/value pairs of user-created metadata that can be examined at run time. By “everything”, I mean precisely that: entities, attributes and relationships all contain a “user info” dictionary.

We need the guid property of the Address entity to be identifiable as a unique identifier at run time, so we can exploit the user info dictionary to do this. We’ll add a single key/value pair to the Address entity’s user info that contains “id” as the key and “guid” as the value. At run time, any time we encounter a one-to-many relationship, we’ll examine the members of the related set and locate their “id” user info key. The value will tell us which property contains that entity’s unique identifier, which we can then use to match up objects in the two sets.

Our Address entity now looks like this:

  • Address (User info: @{ @"id": @"guid" })
    • Attributes:
      • guid (string)
      • street (string)
      • city (string)
      • state (string)
      • postalCode (string)
    • Relationships:
      • person (Person) (Delete: nullify) (To one)
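
Reading that metadata back at run time might look something like the sketch below. The UniqueIdentifierForObject() helper is hypothetical; it assumes the “id” user info key, if present, names an attribute that really exists on the entity, because valueForKey: raises an exception for an unknown key.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: returns the value of the attribute that the entity's
// user info names under the "id" key, or nil if no such key is configured.
static id UniqueIdentifierForObject(NSManagedObject *object) {
    NSString *idAttributeName = object.entity.userInfo[@"id"];

    if (idAttributeName == nil) {
        return nil;
    }

    // For the Address entity above this reads the "guid" attribute.
    return [object valueForKey:idAttributeName];
}
```

The function never mentions “guid” or Address, so the diff code stays ignorant of the model: a different entity could name a different attribute in its own user info.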

The method for diffing two sets looks like this:


- (NSArray *)diffNewSet:(NSSet *)newSet withOldSet:(NSSet *)oldSet {
    
    NSMutableArray *changes = [NSMutableArray array];

    // Find all items that have been newly created or updated.
    for (NSManagedObject *newItem in newSet) {

        NSString *idAttributeName = newItem.entity.userInfo[@"id"];

        NSAssert(idAttributeName, @"Entities must have an id property set in their user info.");

        id newItemId = [newItem valueForKey:idAttributeName];

        NSManagedObject *oldItem = nil;

        for (NSManagedObject *setItem in oldSet) {
            id setItemId = [setItem valueForKey:idAttributeName];

            if ([setItemId isEqual:newItemId]) {
                oldItem = setItem;
                break;
            }
        }

        NSDictionary *diff = [self diffNewObject:newItem withOldObject:oldItem];

        if (diff.count > 0) {
            [changes addObject:diff];
        }
    }

    // Find all items that have been deleted.
    for (NSManagedObject *oldItem in oldSet) {

        NSString *idAttributeName = oldItem.entity.userInfo[@"id"];

        NSAssert(idAttributeName, @"Entities must have an id property set in their user info.");

        id oldItemId = [oldItem valueForKey:idAttributeName];

        NSManagedObject *newItem = nil;

        for (NSManagedObject *setItem in newSet) {
            id setItemId = [setItem valueForKey:idAttributeName];

            if ([setItemId isEqual:oldItemId]) {
                newItem = setItem;
                break;
            }
        }

        if (!newItem) {
            NSDictionary *diff = [self diffNewObject:newItem withOldObject:oldItem];

            if (diff.count > 0) {
                [changes addObject:diff];
            }
        }
    }

    return changes;
}
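
The nested loops above compare every new item against every old item, which is fine for a handful of addresses. For larger sets, one option is to index a set by identifier first. The IndexObjectsByIdentifier() helper below is a hypothetical sketch rather than part of the final code; it assumes identifier values can be used as dictionary keys, which holds for strings and numbers.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: maps each object's identifier value to the object, so
// a match can be found with one dictionary lookup instead of an inner loop.
static NSDictionary *IndexObjectsByIdentifier(NSSet *objects) {
    NSMutableDictionary *index = [NSMutableDictionary dictionary];

    for (NSManagedObject *object in objects) {
        NSString *idAttributeName = object.entity.userInfo[@"id"];
        id identifier = idAttributeName ? [object valueForKey:idAttributeName] : nil;

        // Objects without an identifier cannot be matched, so skip them.
        if (identifier != nil) {
            index[identifier] = object;
        }
    }

    return index;
}
```

With an index built from oldSet, the inner loop in the first half of the method reduces to a single lookup using newItemId as the key, and the same applies in reverse when looking for deleted items.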

We’ll create a pair of example sets and see what we get out of the method. Here are our sets:

NewSet = (
    Address {
        guid = "123";
        street = "Somewhere";
        city = "Sometown";
        postalCode = "54532";
    },
    Address {
        guid = "567";
        street = "Elsewhere";
        city = "Another Town";
        postalCode = "222";
    }
);

OldSet = (
    Address {
        guid = "123";
        street = "Somewhere";
        city = "Townsville";
        postalCode = "54532";
    }
);

You should be able to see that we’ve added a new address and changed the existing address’s city attribute. This is the array produced by the method:

(
    {
        attributes: {
            city: {
                old: "Townsville",
                new: "Sometown"
            }
        }
    },
    {
        attributes: {
            guid: {
                old: "<null>",
                new: "567"
            },
            street: {
                old: "<null>",
                new: "Elsewhere"
            },
            city: {
                old: "<null>",
                new: "Another Town"
            },
            postalCode: {
                old: "<null>",
                new: "222"
            }
        }
    }
)

Looks good. One of the array entries contains just the old and new values for “city”, as we’d expect. The other entry contains values for all of the Address object’s attributes because it didn’t exist previously, so we’ve compared it to nil. The only issue is that we can’t tell which Address object the first entry represents because its guid attribute isn’t included. We can rectify this omission by modifying the diffNewObject:withOldObject: method:


- (NSDictionary *)diffNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {
    
    NSDictionary *attributeDiff = [self diffAttributesOfNewObject:newObject withOldObject:oldObject];

    NSDictionary *relationshipsDiff = [self diffRelationshipsOfNewObject:newObject withOldObject:oldObject];

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    if (attributeDiff.count > 0) {
        diff[@"attributes"] = attributeDiff;
    }

    if (relationshipsDiff.count > 0) {
        diff[@"relationships"] = relationshipsDiff;
    }

    if (diff.count > 0) {
        diff[@"entityName"] = newObject ? newObject.entity.name : oldObject.entity.name;

        NSString *idAttributeName = newObject ? newObject.entity.userInfo[@"id"] : oldObject.entity.userInfo[@"id"];

        if (idAttributeName) {
            id itemId = newObject ? [newObject valueForKey:idAttributeName] : [oldObject valueForKey:idAttributeName];

            if (itemId) {
                diff[idAttributeName] = itemId;
            }
        }
    }

    return diff;
}

We’re fetching the unique identifier attribute and adding its value to the diff. We’ve also added the entity’s name, which may help when dealing with the diff once it has been transmitted and received and needs to be parsed. The output from our previous pair of sets now looks like this:

(
    {
        attributes: {
            city: {
                old: "Townsville",
                new: "Sometown"
            }
        },
        entityName: "Address",
        guid: "123"
    },
    {
        attributes: {
            guid: {
                old: "<null>",
                new: "567"
            },
            street: {
                old: "<null>",
                new: "Elsewhere"
            },
            city: {
                old: "<null>",
                new: "Another Town"
            },
            postalCode: {
                old: "<null>",
                new: "222"
            }
        },
        entityName: "Address",
        guid: "567"
    }
)

The final piece of the puzzle is to plug this set diffing method into our relationship diff method. Fortunately, the relationship description class includes an “isToMany” property that will allow us to use the correct diffing method for a given relationship. Here’s the final set of methods required for diffing a pair of managed objects:


- (NSDictionary *)diffNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {
    
    NSDictionary *attributeDiff = [self diffAttributesOfNewObject:newObject withOldObject:oldObject];

    NSDictionary *relationshipsDiff = [self diffRelationshipsOfNewObject:newObject withOldObject:oldObject];

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    if (attributeDiff.count > 0) {
        diff[@"attributes"] = attributeDiff;
    }

    if (relationshipsDiff.count > 0) {
        diff[@"relationships"] = relationshipsDiff;
    }

    if (diff.count > 0) {
        diff[@"entityName"] = newObject ? newObject.entity.name : oldObject.entity.name;

        NSString *idAttributeName = newObject ? newObject.entity.userInfo[@"id"] : oldObject.entity.userInfo[@"id"];

        if (idAttributeName) {
            id itemId = newObject ? [newObject valueForKey:idAttributeName] : [oldObject valueForKey:idAttributeName];

            if (itemId) {
                diff[idAttributeName] = itemId;
            }
        }
    }

    return diff;
}

- (NSDictionary *)diffRelationshipsOfNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];
    
    NSDictionary *relationships = newObject == nil ? [[oldObject entity] relationshipsByName] : [[newObject entity] relationshipsByName];

    for (NSString *name in relationships) {

        NSRelationshipDescription *relationship = relationships[name];

        if (relationship.deleteRule != NSCascadeDeleteRule) continue;

        SEL selector = NSSelectorFromString(name);

        id newValue = nil;
        id oldValue = nil;

        if (newObject != nil && [newObject respondsToSelector:selector]) newValue = [newObject performSelector:selector];
        if (oldObject != nil && [oldObject respondsToSelector:selector]) oldValue = [oldObject performSelector:selector];

        if (relationship.isToMany) {

            NSArray *changes = [self diffNewSet:newValue withOldSet:oldValue];

            if (changes.count > 0) {
                diff[name] = changes;
            }

        } else {

            NSDictionary *relationshipDiff = [self diffNewObject:newValue withOldObject:oldValue];

            if (relationshipDiff.count > 0) {
                diff[name] = relationshipDiff;
            }
        }
    }

    return diff;
}

- (NSDictionary *)diffAttributesOfNewObject:(NSManagedObject *)newObject withOldObject:(NSManagedObject *)oldObject {

    NSMutableDictionary *diff = [NSMutableDictionary dictionary];

    NSArray *attributeNames = newObject == nil ? [[[oldObject entity] attributesByName] allKeys] : [[[newObject entity] attributesByName] allKeys];

    for (NSString *name in attributeNames) {

        SEL selector = NSSelectorFromString(name);

        id newValue = nil;
        id oldValue = nil;

        if (newObject != nil && [newObject respondsToSelector:selector]) newValue = [newObject performSelector:selector];
        if (oldObject != nil && [oldObject respondsToSelector:selector]) oldValue = [oldObject performSelector:selector];

        newValue = newValue ? newValue : [NSNull null];
        oldValue = oldValue ? oldValue : [NSNull null];

        if (![newValue isEqual:oldValue]) {
            diff[name] = @{ @"new": newValue, @"old": oldValue };
        }
    }

    return diff;
}

- (NSArray *)diffNewSet:(NSSet *)newSet withOldSet:(NSSet *)oldSet {
    
    NSMutableArray *changes = [NSMutableArray array];

    // Find all items that have been newly created or updated.
    for (NSManagedObject *newItem in newSet) {

        NSString *idAttributeName = newItem.entity.userInfo[@"id"];

        NSAssert(idAttributeName, @"Entities must have an id property set in their user info.");

        id newItemId = [newItem valueForKey:idAttributeName];

        NSManagedObject *oldItem = nil;

        for (NSManagedObject *setItem in oldSet) {
            id setItemId = [setItem valueForKey:idAttributeName];

            if ([setItemId isEqual:newItemId]) {
                oldItem = setItem;
                break;
            }
        }

        NSDictionary *diff = [self diffNewObject:newItem withOldObject:oldItem];

        if (diff.count > 0) {
            [changes addObject:diff];
        }
    }

    // Find all items that have been deleted.
    for (NSManagedObject *oldItem in oldSet) {

        NSString *idAttributeName = oldItem.entity.userInfo[@"id"];

        NSAssert(idAttributeName, @"Entities must have an id property set in their user info.");

        id oldItemId = [oldItem valueForKey:idAttributeName];

        NSManagedObject *newItem = nil;

        for (NSManagedObject *setItem in newSet) {
            id setItemId = [setItem valueForKey:idAttributeName];

            if ([setItemId isEqual:oldItemId]) {
                newItem = setItem;
                break;
            }
        }

        if (!newItem) {
            NSDictionary *diff = [self diffNewObject:newItem withOldObject:oldItem];

            if (diff.count > 0) {
                [changes addObject:diff];
            }
        }
    }

    return changes;
}

That’s it: diff created! At this point, if you find that you don’t like the way old and new values are grouped together, or want to change any other aspect of the way the diff is produced, you should have enough information about how to use metaprogramming with NSManagedObjects to be able to change things around.

There are still two unsolved problems that I’ll tackle in another post:

  • What do you do if you want to include a related object in the diff if it has its deletion rule set to nullify?
  • What do you do if you want to exclude an attribute or relationship from the diff?

Once those are solved, there’s another big one: how do you apply the diff to an object? I’ll tackle that in another post, too.