
Open Sourcing Identified Collections

Monday Jul 12, 2021

We are excited to announce the 0.1.0 release of Identified Collections and its first member, IdentifiedArray: a data structure for working with collections of identifiable elements in a performant way.

Motivation

When modeling a collection of elements in your application’s state, it is easy to reach for a standard Array. However, as your application becomes more complex, this approach can break down in many ways, including accidentally making mutations to the wrong elements or even crashing. 😬

For example, if you were building a “Todos” application in SwiftUI, you might model an individual todo in an identifiable value type:

struct Todo: Identifiable {
  var description = ""
  let id: UUID
  var isComplete = false
}

And you would hold an array of these todos as a published field in your app’s view model:

class TodosViewModel: ObservableObject {
  @Published var todos: [Todo] = []
}

A view can render a list of these todos quite simply, and because they are identifiable we can even omit the id parameter of List:

struct TodosView: View {
  @ObservedObject var viewModel: TodosViewModel

  var body: some View {
    List(self.viewModel.todos) { todo in
      …
    }
  }
}

If your deployment target is set to the latest version of SwiftUI, you may be tempted to pass along a binding to the list so that each row is given mutable access to its todo. This will work for simple cases, but as soon as you introduce side effects, like API clients or analytics, or want to write unit tests, you must push this logic into a view model instead. That means each row must be able to communicate its actions back to the view model.

You could do so by introducing some endpoints on the view model, like when a row’s completed toggle is changed:

class TodosViewModel: ObservableObject {
  …
  func todoCheckboxToggled(at id: Todo.ID) {
    guard let index = self.todos.firstIndex(where: { $0.id == id })
    else { return }

    self.todos[index].isComplete.toggle()
    // TODO: Update todo on backend using an API client
  }
}

This code is simple enough, but it can require a full traversal of the array to do its job.
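To make the cost of that traversal concrete, here is a minimal sketch that counts how many elements `firstIndex(where:)` inspects in the worst case. The array of 1,000 integer ids is hypothetical data standing in for the todos, and the `comparisons` counter exists only for illustration.

```swift
// Hypothetical data: 1,000 integer ids standing in for an array of todos.
let ids = Array(1...1_000)

// Count how many elements the search inspects before it finds a match.
var comparisons = 0
let index = ids.firstIndex(where: { candidate in
  comparisons += 1
  return candidate == 1_000  // the id we want happens to be the last one
})

print(index ?? -1, comparisons)  // prints "999 1000"
```

When the element sits at the end of the array, or is missing altogether, every element is visited before the search finishes.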

Perhaps it would be more performant for a row to communicate its index back to the view model instead, and then it could mutate the todo directly via its index subscript. But this makes the view more complicated:

List(self.viewModel.todos.enumerated(), id: \.element.id) { index, todo in
  …
}

This isn’t so bad, but at the moment it doesn’t even compile. An evolution proposal may change that soon, but in the meantime List and ForEach must be passed a RandomAccessCollection, which is perhaps most simply achieved by constructing another array:

List(Array(self.viewModel.todos.enumerated()), id: \.element.id) { index, todo in
  …
}

This compiles, but we’ve just moved the performance problem to the view: every time this body is evaluated there’s the possibility a whole new array is being allocated.

But even if it were possible to pass an enumerated collection directly to these views, identifying an element of mutable state by an index introduces a number of other problems.

While it’s true that we can greatly simplify and improve the performance of any view model methods that mutate an element through its index subscript:

class TodosViewModel: ObservableObject {
  …
  func todoCheckboxToggled(at index: Int) {
    self.todos[index].isComplete.toggle()
    // TODO: Update todo on backend using an API client
  }
}

Any asynchronous work that we add to this endpoint must take great care not to use this index later on. An index is not a stable identifier: todos can be moved and removed at any time, and an index identifying “Buy lettuce” at one moment may identify “Call Mom” the next, or worse, may be a completely invalid index and crash your application!

class TodosViewModel: ObservableObject {
  …
  func todoCheckboxToggled(at index: Int) async {
    self.todos[index].isComplete.toggle()

    do {
      // ❌ Could update the wrong todo, or crash!
      self.todos[index] = try await self.apiClient.updateTodo(self.todos[index])
    } catch {
      // Handle error
    }
  }
}
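The hazard can be reproduced without any asynchronous code at all. The following is a small self-contained sketch, not taken from the original post, in which a removal that would normally happen while the request is in flight is simulated by a direct call to `remove(at:)`. The `Todo` type mirrors the one defined earlier.

```swift
import Foundation

struct Todo: Identifiable {
  var description = ""
  let id: UUID
  var isComplete = false
}

var todos = [
  Todo(description: "Buy lettuce", id: UUID()),
  Todo(description: "Call Mom", id: UUID()),
]

// Capture both an index and an id for "Buy lettuce".
let index = 0
let id = todos[index].id

// Simulate another part of the app removing that todo in the meantime.
todos.remove(at: 0)

// The stale index now silently points at a different todo...
let staleDescription = todos[index].description

// ...whereas a lookup by id correctly reports that the todo is gone.
let lookup = todos.first(where: { $0.id == id })

print(staleDescription, lookup == nil)  // prints "Call Mom true"
```

Had the removed todo been the last element, `todos[index]` would have trapped with an out-of-range index instead of returning the wrong todo.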

Whenever you need to access a particular todo after performing some asynchronous work, you must do the work of traversing the array:

class TodosViewModel: ObservableObject {
  …
  func todoCheckboxToggled(at index: Int) async {
    self.todos[index].isComplete.toggle()

    // 1️⃣ Get a reference to the todo's id before kicking off the async work
    let id = self.todos[index].id

    do {
      // 2️⃣ Update the todo on the backend
      let updatedTodo = try await self.apiClient.updateTodo(self.todos[index])

      // 3️⃣ Find the updated index of the todo after the async work is done,
      //    bailing out if the todo was removed in the meantime
      guard let updatedIndex = self.todos.firstIndex(where: { $0.id == id })
      else { return }

      // 4️⃣ Update the correct todo
      self.todos[updatedIndex] = updatedTodo
    } catch {
      // Handle error
    }
  }
}

Introducing: identified collections

Identified collections are designed to solve all of these problems by providing data structures for working with collections of identifiable elements in an ergonomic, performant way.

Most of the time, you can simply swap an Array out for an IdentifiedArray:

import IdentifiedCollections

class TodosViewModel: ObservableObject {
  @Published var todos: IdentifiedArrayOf<Todo> = []
  …
}

And then you can mutate an element directly via its id-based subscript, no traversals needed, even after asynchronous work is performed:

class TodosViewModel: ObservableObject {
  …
  func todoCheckboxToggled(at id: Todo.ID) async {
    self.todos[id: id]?.isComplete.toggle()

    do {
      // 1️⃣ Update todo on backend and mutate it in the todos identified array.
      self.todos[id: id] = try await self.apiClient.updateTodo(self.todos[id: id]!)
    } catch {
      // Handle error
    }

    // No step 2️⃣ 😆
  }
}

You can also simply pass the identified array to views like List and ForEach without any complications:

List(self.viewModel.todos) { todo in
  …
}

Identified arrays are designed to integrate with SwiftUI applications, as well as applications written in the Composable Architecture.

Design

IdentifiedArray is a lightweight wrapper around the OrderedDictionary type from Apple’s Swift Collections. It shares many of the same performance characteristics and design considerations, but is better adapted to solving the problem of holding onto a collection of identifiable elements in your application’s state.

IdentifiedArray does not expose any of the details of OrderedDictionary that may lead to breaking invariants. For example, an OrderedDictionary<ID, Identifiable> may freely hold a value whose identifier does not match its key, or multiple values that share the same id. IdentifiedArray does not allow either situation.
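As a rough illustration of those two broken invariants, the sketch below uses a standard library `Dictionary` as a stand-in for OrderedDictionary so that it runs without any package dependencies. The `User` type and its values are hypothetical.

```swift
// Hypothetical element type, used only for this illustration.
struct User: Identifiable {
  let id: Int
  var name: String
}

// A plain Dictionary stands in for OrderedDictionary here (an assumption
// made to keep the example dependency-free).
var users: [Int: User] = [:]
users[1] = User(id: 1, name: "Blob")

// Nothing stops us from storing a value under a key that differs from its id.
users[2] = User(id: 1, name: "Blob Jr.")

// One entry's key now disagrees with its element's id...
let mismatched = users.filter { $0.key != $0.value.id }

// ...and two stored elements share the same id.
let idsInUse = users.values.map { $0.id }.sorted()

print(mismatched.count, idsInUse)  // prints "1 [1, 1]"
```

A type that derives the key from the element itself, rather than accepting it separately, rules out both situations by construction.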

And unlike OrderedSet, IdentifiedArray does not require that its Element type conform to the Hashable protocol, a conformance that may be difficult or impossible to provide and that raises questions about the quality of hashing.

IdentifiedArray does not even require that its Element conforms to Identifiable. Just as SwiftUI’s List and ForEach views take an id key path to an element’s identifier, IdentifiedArrays can be constructed with a key path:

var numbers = IdentifiedArray(id: \Int.self)
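To give a feel for how a key path can drive identity, here is a deliberately tiny sketch of the general idea: an ordered list of ids paired with an id-to-element dictionary. `SketchIdentifiedArray` is a hypothetical type written for this illustration. It is not the library's implementation, which wraps OrderedDictionary and offers a much richer API.

```swift
// A toy stand-in for illustration only; not the library's implementation.
struct SketchIdentifiedArray<ID: Hashable, Element> {
  private let idKeyPath: KeyPath<Element, ID>
  private var ids: [ID] = []                // preserves insertion order
  private var storage: [ID: Element] = [:]  // id -> element lookup

  init(id: KeyPath<Element, ID>) {
    self.idKeyPath = id
  }

  /// The elements in insertion order.
  var elements: [Element] { ids.map { storage[$0]! } }

  /// Appends the element unless one with the same id already exists.
  @discardableResult
  mutating func append(_ element: Element) -> Bool {
    let id = element[keyPath: idKeyPath]
    guard storage[id] == nil else { return false }
    ids.append(id)
    storage[id] = element
    return true
  }

  subscript(id id: ID) -> Element? {
    get { storage[id] }
    set {
      // Only replace an existing element, and only with one whose id still
      // matches, so a key can never disagree with its element's identifier.
      guard let newValue = newValue,
            storage[id] != nil,
            newValue[keyPath: idKeyPath] == id
      else { return }
      storage[id] = newValue
    }
  }
}

var sketchNumbers = SketchIdentifiedArray(id: \Int.self)
sketchNumbers.append(3)
sketchNumbers.append(5)
let duplicateInserted = sketchNumbers.append(3)  // rejected: id 3 already exists

print(sketchNumbers.elements, duplicateInserted)  // prints "[3, 5] false"
```

Note that only the identifier needs to be Hashable in this sketch. The element type itself carries no protocol requirements, which is the property the paragraphs above describe.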

Performance

IdentifiedArray is designed to match the performance characteristics of OrderedDictionary. It has been benchmarked with Swift Collections Benchmark:

Benchmarking `IdentifiedArray`.

IdentifiedArray and the Composable Architecture

This data structure may sound familiar because it first shipped with the initial release of the Composable Architecture. When we open sourced the library over 15 months ago, it came with tools that assisted in breaking down larger features that work on collections of state into smaller features that work on individual elements of state, and this included IdentifiedArray. We even dedicated an episode to this topic recently.

While IdentifiedArray solved a real problem when we first introduced it, it wasn’t without its issues. Its original implementation made it easy to break the semantics of the type (e.g. having at most one element for each id), and it was mostly unoptimized, which gave it quite bad performance characteristics for certain collection operations.

The IdentifiedArray that comes with Identified Collections has been completely rewritten as a safer, more performant wrapper around OrderedDictionary. In order to avoid some of the pitfalls of the previous version of IdentifiedArray that shipped with the Composable Architecture, we took inspiration from Swift Collections by only partially conforming IdentifiedArray to some of the collection protocols that make it easy to break invariants. While this is a breaking change, it should help prevent a whole slew of bugs, and we hope these changes will not affect most users. If you encounter any issues with the upgrade, or have any questions, please start a GitHub discussion.

Try it today

Head over to the Identified Collections repository to try the library out today. If you’re building an application in the Composable Architecture, the latest release already uses Identified Collections, so upgrade today and take it for a spin.

