5 Feb 2013

Querying Objective-C Data Collections

by Matt Long

In my Xcode LLDB Tutorial, I mention using the debugger to interrogate data collections. I wanted to elaborate on that idea a little, because there are some very powerful techniques you can use for querying Objective-C data collections.

If you develop apps for clients, you may be one of the lucky ones: the ones who actually get to model your data and use Core Data to store and access it. But I’m betting many of you aren’t so lucky, or at least not on all of your projects. From time to time you have to deal with data in whatever format your client gives it to you. Maybe you’ve even suggested taking the CSVs or plists (or whatever other formats clients have come up with to ruin your life) and actually loading those into Core Data. But they don’t get Core Data and they shoot down the idea. You may want to just walk away from the gig. However, if you’re like me, you’ve got bills to pay, and clients (the good ones at least) tend to help you accomplish that. Fortunately for us, Objective-C makes dealing with this kind of data manageable using a little technique known as Key-Value Coding (KVC), combined with array filtering and sorting.

This is not an advanced topic, so if you’re already familiar with how KVC and array filtering and sorting work, this post may not help you much. But if you are fairly new to iOS development, you need to know about this magical feature of the language: all the senior iOS developers use it, and you should too.

Querying Objective-C Data Collections Is As Easy As a Where Clause

I’m going to assume that you have some experience with relational databases. In a regular SQL query, when you want to filter a list of records based on some criteria, you use a WHERE clause to ensure that you only get back the records you want from the database. Say, for example, you wanted to query a list of hospitals by county. Your query might look something like this:

SELECT * FROM hospitals WHERE county = 'CLARKE'

The record set you get back from this query should contain only the hospitals found in Clarke county.

Let’s consider an array of dictionary objects, each of which has a set of keys and values representing the fields in a hospital record. You’ve loaded this data into an NSArray from a property list that’s been embedded in your app. Say each hospital record in the array is a dictionary and looks something like this when printed out in the debugger:

{
    Address = "220 Hospital Drive";
    City = Jackson;
    County = CLARKE;
    Hospital = "JACKSON MEDICAL CENTER--010128";
    State = AL;
    Zip = 36545;
}
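
For context, here is a minimal sketch of how such an array might get loaded in the first place. The helper name LoadRecords and the file name Hospitals.plist are my own assumptions for illustration; they are not taken from the sample project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads a property list whose root object is an array
// of dictionaries. Returns an empty array if the file is missing or is not
// an array-rooted plist, so callers never have to deal with nil.
static NSArray *LoadRecords(NSURL *plistURL) {
    NSArray *records = [NSArray arrayWithContentsOfURL:plistURL];
    return records ?: @[];
}

// In an app, the URL would typically come from the bundle, for example
// (the resource name is an assumption):
//   NSURL *url = [[NSBundle mainBundle] URLForResource:@"Hospitals"
//                                        withExtension:@"plist"];
//   NSArray *hospitals = LoadRecords(url);
```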

There are thousands of these in your array, so how do you get only the ones where the county is Clarke, for example?

Well, it’s simple and similar to our SQL query:

NSArray *filteredItems = [hospitals filteredArrayUsingPredicate:
                                       [NSPredicate predicateWithFormat:@"County == %@", @"CLARKE"]];

Now filteredItems contains only the hospital records where the county is Clarke.
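
One thing to watch for: a plain == on strings compares them exactly, including case, so the value in the predicate has to match the stored value letter for letter. Here is a small sketch that sidesteps the problem with the [c] (case-insensitive) modifier. The function names and the sample records are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sample data shaped like the record shown above.
static NSArray *SampleHospitals(void) {
    return @[
        @{@"Hospital": @"EXAMPLE HOSPITAL A", @"County": @"CLARKE", @"State": @"AL"},
        @{@"Hospital": @"EXAMPLE HOSPITAL B", @"County": @"CLARKE", @"State": @"AL"},
        @{@"Hospital": @"EXAMPLE HOSPITAL C", @"County": @"PASCO",  @"State": @"FL"}
    ];
}

// Returns the records whose County matches, ignoring case thanks to [c].
// The original order of the records is preserved.
static NSArray *HospitalsInCounty(NSArray *hospitals, NSString *county) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"County ==[c] %@", county];
    return [hospitals filteredArrayUsingPredicate:predicate];
}
```

With this in place, HospitalsInCounty(SampleHospitals(), @"Clarke") and HospitalsInCounty(SampleHospitals(), @"CLARKE") return the same two records.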

Let’s say that now you only want to get a list of the hospital names in the array. You can simply specify the name of that field as a key path, which will result in another array. Here’s what I mean:

NSArray *hospitalNamesInClarke = [filteredItems valueForKeyPath:@"Hospital"];

The array hospitalNamesInClarke now contains a list of strings, one for each of the hospitals in Clarke county, Alabama.
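
To show why it is called a key path rather than just a key, here is a small sketch that reaches into a nested dictionary. The nested Location field and both function names are hypothetical; the records in this post are flat.

```objc
#import <Foundation/Foundation.h>

// Pulls a single "column" out of an array of records. For a simple key,
// a record that lacks the key contributes NSNull, so the result still has
// one entry per record.
static NSArray *ColumnValues(NSArray *records, NSString *keyPath) {
    return [records valueForKeyPath:keyPath];
}

// Hypothetical usage with nested records: the dot walks from the
// Location dictionary down to its City value.
static NSArray *SampleCities(void) {
    NSArray *records = @[
        @{@"Hospital": @"EXAMPLE HOSPITAL A", @"Location": @{@"City": @"Jackson"}},
        @{@"Hospital": @"EXAMPLE HOSPITAL B", @"Location": @{@"City": @"Grove Hill"}}
    ];
    return ColumnValues(records, @"Location.City");
}
```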

Delving Deeper

For the sake of discussion, let’s think about what programmers often do when they first learn object-oriented programming. They often build complex object hierarchies to model all of the data they have to work with. Don’t get me wrong: when using Core Data, I take advantage of mogenerator to generate my managed objects so that I can add some smart methods to my model objects for convenience and clarity. However, when I get a data stream back from, for example, a JSON payload, it doesn’t make sense to create a separate model class to hold the records until I can parse them into Core Data. It makes much more sense to leave them as dictionaries and just query them directly using key-value coding. Consider a single record stored in memory as a dictionary. It might look something like this:

[Figure: Single hospital record]

Now take this dictionary and rotate it clockwise so that the keys become a header row, something like this:

[Figure: Single record table]

Then consider a bunch of records in rows like this:

[Figure: Multiple records]

It’s starting to look familiar, isn’t it? The database table analogy makes a lot more sense when you look at it this way. An array of dictionaries can be thought of as a table of records you can run queries on.

Sorting, Filtering, and Aggregating

Yep. It’s all possible. We’ve already talked about filtering. Remember this code from earlier:

NSArray *filteredItems = [hospitals filteredArrayUsingPredicate:
                                [NSPredicate predicateWithFormat:@"County == %@", @"CLARKE"]];

Well, sorting is just as easy:

NSArray *sortedItems = [hospitals sortedArrayUsingDescriptors:@[[NSSortDescriptor 
                                               sortDescriptorWithKey:@"County" ascending:YES]]];

Or you could have multiple sort descriptors:

NSArray *sortedItems = [hospitals sortedArrayUsingDescriptors:@[
                           [NSSortDescriptor sortDescriptorWithKey:@"State" ascending:YES],
                           [NSSortDescriptor sortDescriptorWithKey:@"County" ascending:YES]]];

This sorts on State first and then on County within each state.
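
Since client data tends to be inconsistent about capitalization, it can help to give the string descriptor a comparison selector. This is a sketch under that assumption; the function name is mine, and caseInsensitiveCompare: is the standard NSString method.

```objc
#import <Foundation/Foundation.h>

// Sorts records by State, then by County within each state.
static NSArray *SortedByStateThenCounty(NSArray *hospitals) {
    NSSortDescriptor *byState =
        [NSSortDescriptor sortDescriptorWithKey:@"State" ascending:YES];
    // caseInsensitiveCompare: makes "clarke" and "CLARKE" sort alike.
    NSSortDescriptor *byCounty =
        [NSSortDescriptor sortDescriptorWithKey:@"County"
                                      ascending:YES
                                       selector:@selector(caseInsensitiveCompare:)];
    return [hospitals sortedArrayUsingDescriptors:@[byState, byCounty]];
}
```

The order of the descriptors in the array is the sort priority, much like the column order in a SQL ORDER BY clause.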

Aggregating your data is also really powerful. Say you have added another “column” to your dictionary record called AnnualERVisitors. If you wanted to get a sum of all the ER visitors for all hospitals in a certain county, you could do a filter first, and then perform a sum on the results. Something like this:

NSArray *filteredItems = [hospitals filteredArrayUsingPredicate:
                                [NSPredicate predicateWithFormat:@"County == %@", @"CLARKE"]];
NSNumber *totalERVisitors = [filteredItems valueForKeyPath:@"@sum.AnnualERVisitors"];

The variable totalERVisitors now contains the sum of all of the AnnualERVisitors values in Clarke county. This special operator, @sum, provides the ability to automatically sum all of the values in the AnnualERVisitors field. You can now manipulate that NSNumber any way you like by getting its primitive value, for example:

NSInteger totalVisitorsCount = [totalERVisitors integerValue];
// Do a little math. Make a little love. Get down tonight.

So you’re probably curious about what else you could do. First, let’s consider the operators we have at our disposal. Here is the list of operators according to the Key-Value Coding Programming Guide:

Collection Operators

– @sum
– @avg
– @count
– @max
– @min

Object Operators

– @distinctUnionOfObjects
– @unionOfObjects

Array and Set Operators

– @distinctUnionOfArrays
– @unionOfArrays
– @distinctUnionOfSets

If you want detailed descriptions of each of these operators, see the Key-Value Coding Programming Guide on the Apple website.
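
As a quick illustration of the simple collection operators, here is a sketch that gathers several of them into one summary dictionary. The function name and the result keys are my own, and it assumes a non-empty array in which every AnnualERVisitors value is an NSNumber.

```objc
#import <Foundation/Foundation.h>

// Summarizes the AnnualERVisitors "column" of a non-empty array of records.
// Assumption: every record stores AnnualERVisitors as an NSNumber. With an
// empty array, @max and @min yield nil, which cannot go into a dictionary.
static NSDictionary *VisitorStats(NSArray *hospitals) {
    return @{
        @"count":    [hospitals valueForKeyPath:@"@count"],
        @"total":    [hospitals valueForKeyPath:@"@sum.AnnualERVisitors"],
        @"average":  [hospitals valueForKeyPath:@"@avg.AnnualERVisitors"],
        @"largest":  [hospitals valueForKeyPath:@"@max.AnnualERVisitors"],
        @"smallest": [hospitals valueForKeyPath:@"@min.AnnualERVisitors"]
    };
}
```

Each value comes back as an NSNumber, so you can pull out primitives with integerValue or doubleValue as needed.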

In my Xcode LLDB Tutorial I cover using the debug terminal to analyze and interrogate your data structures. In the sample project I’ve included with this post, you can use the same techniques from that tutorial: set a breakpoint at the end of the viewDidLoad method and run the app in the debugger. It should look something like this:

[Figure: Code for viewDidLoad]

When it breaks, enter this line into the debug terminal:

po (NSNumber*)[(NSArray*)[_hospitals filteredArrayUsingPredicate:
                                       (NSPredicate*)[NSPredicate predicateWithFormat:@"County == %@", @"Clarke"]] 
                                                                           valueForKeyPath:@"@sum.AnnualERVisitors"]

This will produce the following output:

(lldb) po (NSNumber*)[(NSArray*)[_hospitals filteredArrayUsingPredicate:(NSPredicate*)[NSPredicate predicateWithFormat:@"County == %@", @"Clarke"]] valueForKeyPath:@"@sum.AnnualERVisitors"]
$1 = 0x080688a0 9744
(lldb)

This has taken the AnnualERVisitors values for the hospitals in Clarke county and added them together, giving us a result of 9,744 annual visitors. Say we wanted to get the average number of visitors across all of the hospitals in our list. Try this command in the debug console:

po [_hospitals valueForKeyPath:@"@avg.AnnualERVisitors"]

And this will produce the following output:

(lldb) po [_hospitals valueForKeyPath:@"@avg.AnnualERVisitors"]
$2 = 0x08068a00 4575.25
(lldb)

The @avg operator has returned the average number of ER visitors across all of the hospitals in our collection, 4,575.25.

Distinct Values

There are instances where you want to grab all of the values for a particular property in your collection, but you only want distinct/unique values. In our same sample code, run to the breakpoint I mentioned earlier and then run this command in the debug console:

po [_hospitals valueForKeyPath:@"County"]

This will produce the following output:

(lldb) po [_hospitals valueForKeyPath:@"County"]
$3 = 0x08157290 <__NSArrayI 0x8157290>(
Clarke,
Clarke,
PASCO,
COOK
)

Notice that Clarke has shown up twice. This is because two of our records represent hospitals in the same county. In order to get a list of distinct values, we can use this command instead:

po [_hospitals valueForKeyPath:@"@distinctUnionOfObjects.County"]

And the output of this command is:

(lldb) po [_hospitals valueForKeyPath:@"@distinctUnionOfObjects.County"]
$4 = 0x07555c80 <__NSArrayI 0x7555c80>(
PASCO,
Clarke,
COOK
)

Notice Clarke only displays once now. The @distinctUnionOfObjects operator has returned only unique values. This is what we were looking for.
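
Distinct values combine nicely with filtering and @sum to get something like a SQL GROUP BY. Here is a rough sketch, assuming each record has a County string and an AnnualERVisitors number; the function name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Returns a dictionary mapping each distinct county to the sum of
// AnnualERVisitors for the hospitals in that county.
static NSDictionary *VisitorsByCounty(NSArray *hospitals) {
    // The order of the distinct values is not guaranteed.
    NSArray *counties =
        [hospitals valueForKeyPath:@"@distinctUnionOfObjects.County"];
    NSMutableDictionary *totals = [NSMutableDictionary dictionary];
    for (NSString *county in counties) {
        NSPredicate *inCounty =
            [NSPredicate predicateWithFormat:@"County == %@", county];
        NSArray *matches = [hospitals filteredArrayUsingPredicate:inCounty];
        totals[county] = [matches valueForKeyPath:@"@sum.AnnualERVisitors"];
    }
    return totals;
}
```

Note that this filters the whole array once per county, which is fine for a few thousand records but is exactly the kind of query that can get expensive on larger collections.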

Conclusion

So, while we’ve been using debugger commands in the examples here, you can use the same expressions in your code. Just remove the ‘po’ and assign the result of the valueForKeyPath: call to a variable, and you can manipulate the results in any way you like. Querying your collections is a very powerful coding technique you should master. It often provides a great way to reduce the amount of code you need to write to get the values you want. That being said, there are times when these types of data collection queries are just too expensive. If that’s the case, you’ll have to optimize. Just like everything else in programming, there are tradeoffs. These techniques won’t work in all situations, but they often will, and they are worth exploring when you need to obtain just the values you care about. Until next time.

Download The Sample Project, CollectionSearch