Core Data Queries Using Expressions

Core Data can have a steep learning curve for anybody new to either the Mac or iOS platforms. One of the key points for me was understanding that Core Data is not a relational database but a persistent object store with many features to manage the life-cycle of an object. I think some of the confusion comes from the fact that Core Data can use SQLite as the underlying object store, but that is an implementation detail that can lead you astray if you are not careful.

So if Core Data is not a relational database, how do you do those things that would be easy if you could just use an SQL query? A Core Data fetch request with a combination of predicates and sort descriptors is a very flexible mechanism that covers many of the most common queries you might need for retrieving objects. However, when you are more interested in querying for specific values, such as the minimum or maximum value of an attribute, an alternative approach using expressions can be easier and more efficient.

To illustrate the code snippets in this post I will assume a very simple Core Data model with a single entity, Task, that represents a task in a todo list. Each task has four attributes: complete, createdAt, note and title.
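As a rough sketch, the entity could be backed by an NSManagedObject subclass like the one below. The attribute names are taken from the columns that appear in the SQL debug output later in this post; the types chosen for title, note and complete are assumptions, since only createdAt is spelled out in the text.

```objc
#import <CoreData/CoreData.h>

// Hypothetical Task class for the single-entity model used in this post.
// Only createdAt's type is given in the post; the other types are assumed.
@interface Task : NSManagedObject
@property (nonatomic, retain) NSString *title;     // assumed String attribute
@property (nonatomic, retain) NSString *note;      // assumed String attribute
@property (nonatomic, retain) NSNumber *complete;  // assumed Boolean attribute
@property (nonatomic, retain) NSDate *createdAt;   // Date attribute
@end

@implementation Task
// Core Data supplies the accessor implementations at runtime.
@dynamic title, note, complete, createdAt;
@end
```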

Retrieving the Minimum Value of an Attribute

Each task object contains an NSDate attribute which indicates when the task was first created:

@property (nonatomic, retain) NSDate * createdAt;

Suppose I want to find out what the oldest creation date is for all of the tasks. A first approach might be to use a fetch request to retrieve the first task after sorting all of the tasks by the creation date (with an ascending sort order):

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription
    entityForName:@"Task"
    inManagedObjectContext:self.managedObjectContext];
[fetchRequest setEntity:entity];
[fetchRequest setFetchLimit:1];

NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc]
    initWithKey:@"createdAt"
    ascending:YES];
[fetchRequest setSortDescriptors:[NSArray arrayWithObject:sortDescriptor]];
[sortDescriptor release];

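NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext
                         executeFetchRequest:fetchRequest
                         error:&error];
[fetchRequest release];

// executeFetchRequest:error: returns nil if the fetch fails
if (fetchResults == nil) {
    NSLog(@"fetch failed: %@", error);
}

Task *oldest = [fetchResults lastObject];
NSLog(@"oldest = %@", oldest.createdAt);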

Note that the fetch limit (setFetchLimit:) is set to 1 as we only want the first object in the sorted list of all tasks. This is not a bad solution but, as we will see, we can do much better.

To understand what Core Data is doing under the covers it can be very useful to turn on some debugging. In particular, we can get Core Data to show us the underlying SQL queries it is using by passing the argument -com.apple.CoreData.SQLDebug 1 on application launch. With Xcode 4 the launch arguments are set in the scheme editor, under the Arguments tab of the Run action.

With the debugging enabled we can see the query that Core Data is using and the fetch execution time which will be useful when comparing the performance of different approaches:

2012-01-19 20:31:37.864 ToDoSync[3455:fb03] CoreData: sql: SELECT 0,
t0.Z_PK, t0.Z_OPT, t0.ZCOMPLETE, t0.ZCREATEDAT, t0.ZNOTE, t0.ZTITLE FROM
ZTASK t0 ORDER BY t0.ZCREATEDAT LIMIT 1
2012-01-19 20:31:37.875 ToDoSync[3455:fb03] CoreData: annotation: sql
connection fetch time: 0.0742s
2012-01-19 20:31:37.875 ToDoSync[3455:fb03] CoreData: annotation: total
fetch execution time: 0.0784s for 1 rows.

The debugging output shows us that Core Data is performing a select to retrieve all of the attributes of the task object, ordered by creation date with a query limit of 1. The total fetch execution time in this case was 0.0784s based on a database containing 5,000 tasks running on a fourth generation iPod touch test device. Note that any time you are looking to optimise performance it is good advice to run the code on an actual device. Running on the iOS Simulator will give you much faster performance due to the greater performance of the host computer.

Restricting the Properties to Fetch

Before looking at the use of expressions there is one minor optimisation that we could consider applying to the previous fetch request. Since we are interested only in the creation date we can modify the fetch request to make it only retrieve that one property:

[fetchRequest setResultType:NSDictionaryResultType]; 
[fetchRequest setPropertiesToFetch:[NSArray arrayWithObject:@"createdAt"]];

By default the result type of a fetch request is NSManagedObjectResultType which, as the name implies, means we get managed objects back from the fetch. To fetch only certain properties of an object, first make the fetch request return dictionaries by setting the result type to NSDictionaryResultType, then set the properties to fetch by passing an array containing the names of the properties you want back. In this case we just want the createdAt property. Now when we execute the fetch request we get back an array containing a single dictionary (since we set a fetch limit of 1), which contains the single property, in this case an NSDate:

NSDate *oldest = [[fetchResults lastObject] valueForKey:@"createdAt"];

Looking at the SQL debug you can see that the select statement now only retrieves the single attribute:

2012-01-19 20:59:26.484 ToDoSync[18564:707] CoreData: sql: SELECT
t0.ZCREATEDAT FROM ZTASK t0 ORDER BY t0.ZCREATEDAT LIMIT 1
2012-01-19 20:59:26.535 ToDoSync[18564:707] CoreData: annotation: sql
connection fetch time: 0.0505s
2012-01-19 20:59:26.539 ToDoSync[18564:707] CoreData: annotation: total
fetch execution time: 0.0545s for 1 rows.

There is a performance improvement: this fetch executes in around 0.055s compared to 0.078s for the previous example. However, this may often turn out to be a false optimisation if, shortly after determining the earliest creation date, we find we want to retrieve the actual task with this creation date. In that case it is generally better to just retrieve the full task as in the original query so that Core Data already has it cached ready for when we need it.
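To make that trade-off concrete, here is a minimal sketch of the follow-up fetch you would need if you only had the date and then wanted the task itself. FetchTaskCreatedAt is a hypothetical helper name, and the sketch assumes the Task entity and createdAt attribute used throughout this post.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: returns one Task whose createdAt equals `date`,
// or nil if there is no match or the fetch fails (check *error).
static NSManagedObject *FetchTaskCreatedAt(NSDate *date,
                                           NSManagedObjectContext *context,
                                           NSError **error)
{
    NSFetchRequest *request =
        [NSFetchRequest fetchRequestWithEntityName:@"Task"];
    [request setPredicate:
        [NSPredicate predicateWithFormat:@"createdAt == %@", date]];
    [request setFetchLimit:1];

    NSArray *results = [context executeFetchRequest:request error:error];
    // Messaging nil is safe, so this is nil for a failed or empty fetch.
    return [results lastObject];
}
```

This is a second round trip to the store, which is the cost the original full-object query avoids.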

Using an Expression

A better way to solve this type of query is to create an expression with the function that we want to perform. This requires a little more code, though we start, as in the previous example, by constructing a fetch request that will return a dictionary result:

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription
        entityForName:@"Task"
        inManagedObjectContext:self.managedObjectContext];
[fetchRequest setEntity:entity];
[fetchRequest setResultType:NSDictionaryResultType];

We then create an expression which specifies the function we want to use and the key-path of the property we want to apply it to. For our example, we want the minimum of the createdAt property:

NSExpression *keyPathExpression = [NSExpression
              expressionForKeyPath:@"createdAt"];
NSExpression *earliestExpression = [NSExpression 
              expressionForFunction:@"min:"
              arguments:[NSArray arrayWithObject:keyPathExpression]];

There is a wide range of functions that we could apply, including average:, sum:, min:, max:, median:, sqrt:, etc. For the full list check the documentation for the NSExpression class. Unfortunately that is not all we need to do, as we must also create an expression description to specify the result type we are expecting from the fetch request:

NSExpressionDescription *earliestExpressionDescription =
      [[NSExpressionDescription alloc] init];
[earliestExpressionDescription setName:@"earliestDate"];
[earliestExpressionDescription setExpression:earliestExpression];
[earliestExpressionDescription setExpressionResultType:NSDateAttributeType];

The key point is that we need to set a name for the expression, which we will use when retrieving the result. Remember that we have already specified that the fetch request should give us back a dictionary containing the result, so the name of the expression will be our key into that dictionary. We also need to specify that we expect the result type of the expression to be a date (NSDateAttributeType), which comes back as an NSDate object. Finally we can set the properties to fetch using our expression description and execute the fetch request:

[fetchRequest setPropertiesToFetch:[NSArray arrayWithObject:
              earliestExpressionDescription]];
NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext
                         executeFetchRequest:fetchRequest
                         error:&error];

The NSArray we get back as the fetch result should contain a single NSDictionary object which contains the NSDate object stored using the expression description name as the key:

NSDate *oldest = [[fetchResults lastObject]
                  valueForKey:@"earliestDate"]; 

Finally, just for completeness, and assuming you are not using ARC, we should release a few things:

[earliestExpressionDescription release];
[fetchRequest release];

This is a lot more code than the original solution but the SQL debug log shows some interesting results:

2012-01-19 21:47:39.292 ToDoSync[18639:707] CoreData: sql: SELECT
min( t0.ZCREATEDAT) FROM ZTASK t0
2012-01-19 21:47:39.304 ToDoSync[18639:707] CoreData: annotation: sql
connection fetch time: 0.0121s
2012-01-19 21:47:39.308 ToDoSync[18639:707] CoreData: annotation: total
fetch execution time: 0.0162s for 1 rows.

I find it somewhat amusing that the more code we write the smaller the underlying Core Data SQLite query gets :-) This fetch request, executed on the same device and dataset as before, takes 0.0162s, which is almost five times faster than the original query at 0.0784s. The reason is clear if you take a look at the SQL query being used: Core Data is using SQLite to perform the min function directly on the createdAt column in the database, instead of an ORDER BY over all 5,000 rows followed by a LIMIT.
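Much of the code above is boilerplate that is the same whichever function you use. As a rough sketch, it could be gathered into a small helper. AggregateDescription is a hypothetical name, not part of Core Data, and the autorelease call assumes manual reference counting as in the rest of this post (remove it under ARC).

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper (not a Core Data API): builds an expression
// description that applies `function` (for example @"min:") to `keyPath`
// and stores the result in the fetched dictionary under `name`.
static NSExpressionDescription *AggregateDescription(NSString *name,
                                                     NSString *function,
                                                     NSString *keyPath,
                                                     NSAttributeType resultType)
{
    NSExpression *keyPathExpression =
        [NSExpression expressionForKeyPath:keyPath];
    NSExpression *functionExpression =
        [NSExpression expressionForFunction:function
                                  arguments:[NSArray arrayWithObject:keyPathExpression]];

    NSExpressionDescription *description =
        [[NSExpressionDescription alloc] init];
    [description setName:name];
    [description setExpression:functionExpression];
    [description setExpressionResultType:resultType];

    // Assumes manual reference counting; delete this call under ARC.
    return [description autorelease];
}
```

With such a helper, the earliest-date description from this section would be AggregateDescription(@"earliestDate", @"min:", @"createdAt", NSDateAttributeType), ready to pass to setPropertiesToFetch: in an array.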

Where expressions really start to become effective is when you need to perform multiple calculations on the same dataset. So suppose that we want to calculate both the earliest and the latest creation dates. All we need to do is construct a second expression:

NSExpression *latestExpression =
    [NSExpression expressionForFunction:@"max:"
    arguments:[NSArray arrayWithObject:keyPathExpression]];
NSExpressionDescription *latestExpressionDescription =
    [[NSExpressionDescription alloc] init];
[latestExpressionDescription setName:@"latestDate"];
[latestExpressionDescription setExpression:latestExpression];
[latestExpressionDescription setExpressionResultType:NSDateAttributeType];

This time we are using the max: function and we have named our expression latestDate. Note that we do not need to use a separate fetch request for each of these expressions. We can set our properties to fetch to include both expressions and execute a single fetch request:

[fetchRequest setPropertiesToFetch:[NSArray 
     arrayWithObjects:earliestExpressionDescription,
     latestExpressionDescription, nil]];

Now when we execute the fetch request we get back an array containing a single dictionary with two entries, one for each of the earliestDate and latestDate results:

NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext
                         executeFetchRequest:fetchRequest
                         error:&error];
NSDate *oldest = [[fetchResults lastObject] valueForKey:@"earliestDate"];
NSDate *latest = [[fetchResults lastObject] valueForKey:@"latestDate"];

As the SQL debug shows us, this query to calculate both the earliest and latest dates executes almost as fast as the fetch for just the earliest date (0.0220s compared to 0.0162s) and is still more than three times faster than the original approach of sorting on the property:

2012-01-19 22:06:33.153 ToDoSync[18681:707] CoreData: sql: SELECT
min( t0.ZCREATEDAT), max( t0.ZCREATEDAT) FROM ZTASK t0
2012-01-19 22:06:33.170 ToDoSync[18681:707] CoreData: annotation: sql
connection fetch time: 0.0174s
2012-01-19 22:06:33.175 ToDoSync[18681:707] CoreData: annotation: total
fetch execution time: 0.0220s for 1 rows.
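The other functions mentioned earlier, such as sum: and average:, can be tried out without a Core Data store at all, because an NSExpression can also be evaluated in memory. The following is a minimal sketch; EvaluateFunction is a hypothetical helper name, and it says nothing about which functions a given persistent store will accept in a fetch request.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: evaluates one of NSExpression's built-in collection
// functions (such as @"min:", @"max:" or @"sum:") over an in-memory array
// of NSNumber objects. No Core Data store is involved.
static id EvaluateFunction(NSString *function, NSArray *numbers)
{
    // Wrap the array as a constant expression so it can be the argument.
    NSExpression *values = [NSExpression expressionForConstantValue:numbers];
    NSExpression *expression =
        [NSExpression expressionForFunction:function
                                  arguments:[NSArray arrayWithObject:values]];
    return [expression expressionValueWithObject:nil context:nil];
}
```

For example, calling EvaluateFunction(@"min:", ...) with an array holding the numbers 3, 1 and 2 should give back an NSNumber equal to 1.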

Wrapping Up

Using NSExpression is perhaps not the most intuitive way to perform complex queries and calculations on Core Data data sets. However, I think it is worth spending some time mastering expressions as the performance improvements can be significant, especially when you need to frequently repeat a calculation with a large data set.