Episteme and Techne     About     Projects     Archive     Feed

Sequences and Generators

If you’ve written code in Swift, you may have seen the for in (or foreach) loop at some point. For those who aren’t acquainted with it, an simple example follows:

// prints "first", "second", and "third", each on its own line
for item in ["first", "second", "third"] {
    println(item)
}

This loop runs once for each object in the array. Each time it runs, it binds the current object to the constant item, allowing the code within the loop to perform some operation based on that object. Since in this case the array contains strings, item is automatically inferred to be of type String.

It turns out you can also write a foreach loop to iterate over the items in a dictionary…

for (number, name) in [1: "one", 2: "two", 3: "three"] {
  println("The number \(number) is named \(name)")
}

…or a range.

var buffer = "0"
for index in 1..<10 {
  buffer += " \(index)"
}
// buffer is now "0 1 2 3 4 5 6 7 8 9"

But Array, Dictionary, and Range are Swift built-in types. What if we wanted to build our own collection and iterate over it using a foreach loop? Let’s try doing so and see what happens.

Building a Linked List

The linked list is a fundamental data structure, one of the first data structures taught to computer science students. It consists of a chain of nodes which each contain a value, as well as a reference to the next node in the chain.

One elegant way to reason about linked lists is to first imagine an empty list, a special list containing zero elements. We can then construct a list using the following technique:

  • We start with the empty list: []
  • We prepend the first item A to the empty list to form a list with one element: [A]--> []
  • Then we prepend the second item B to that list to form a list with two elements: [B]--> [A]--> []
  • …and so forth

A Swift linked-list implementation using classes and inheritance follows:

class List<T> { }

final class ConsList<T> : List<T> {
  let value : T
  let next : List<T>

  init(_ value: T, next: List<T>) {
    self.value = value
    self.next = next
  }
}

final class NilList<T> : List<T> { }

Linked lists are defined recursively. The NilList class serves as our empty list and base case. All other lists are defined as a ConsList containing a value and followed by a slightly smaller linked list.

Sequences and Generators

Now that we have a linked list type, we want to be able to iterate through the linked list nodes in order using a foreach loop.

The secret to doing so lies in the SequenceType protocol. In Swift, protocols can confer special powers to types that choose to conform to them, and SequenceType is no exception. Indeed, any type that adopts SequenceType can be used with foreach loops.

The SequenceType protocol defines a single function: generate(), which takes no arguments and returns a Generator. Let’s go ahead and begin extending our List class:

// Initial attempt
extension List : SequenceType {
  func generate() -> Generator {
    // ???
  }
}

In order to proceed, we need to know what generate() is supposed to do, and what a Generator is.

Generators

A generator is a helper object that is related to the sequence we want to iterate through. Generator is not the name of a predefined Swift class, but rather an associated type belonging to the SequenceType protocol.

You can think of associated types as a ‘wildcard’ type that you have to fill in when you write the code that makes your class or struct conform to the protocol. In this case, SequenceType wants us to define our own generator type and then have our List’s generate() method return one of those generators.

Let’s sketch out a generator for our linked list type. If we look at the definition of Generator, we notice that our new generator has to conform to the GeneratorType protocol in order to qualify as a generator. GeneratorType in turn defines a single function, next():

// Initial attempt
struct ListGenerator: GeneratorType {
  mutating func next() -> Element? {
    // ???
  }
}

Note that Element is another associated type, this one belonging to the GeneratorType protocol. At this point, we have a bunch of interlocking components:

  • We have our List, which needs to conform to SequenceType
  • …and to do so, we’ve defined a ListGenerator.
  • In turn, our ListGenerator has to conform to GeneratorType
  • …which means we need to figure out what Element is.

Purpose

To figure out how these components all fit together, we’ll step back and discuss the roles that our generator, generate(), and next() functions are intended to fulfill.

As mentioned before, a generator is an object associated with a sequence. At the beginning of the foreach loop, a generator for the sequence being iterated through is created using the generate() method.

The job of the generator is to repeatedly vend out the ‘next’ item in the sequence whenever its next() method is called. When the sequence is completely consumed, next() returns nil. In order to perform this job, the generator might need to keep track of state, such as its current position within the sequence.

Building ListGenerator

Let’s return to our ListGenerator. We want next() to return whatever type of element its parent List contains. Since List is generic, we should also make ListGenerator generic, and replace Element with T:

struct ListGenerator<T> : GeneratorType {
  mutating func next() -> T? {
    // ???
  }
}

Our ListGenerator is going to walk through the list, advancing one node every time its next() method is called. In order to keep track of this state, let’s add a property tracking the generator’s current position, currentNode.

struct ListGenerator<T> : GeneratorType {
  // Note: our ListGenerator is a generator for a List of type T...
  var currentNode : List<T>

  // ...and our next() function returns an element of type T
  mutating func next() -> T? {
    // ???
  }
}

Now we can fill out our next() function. We’ll also add an initializer for good measure.

struct ListGenerator<T> : GeneratorType {
  var currentNode : List<T>

  init(head: List<T>) {
    currentNode = head
  }

  mutating func next() -> T? {
    switch currentNode {
    case let cons as ConsList<T>:
      currentNode = cons.next
      return cons.value
    default:  // e.g. nil node
      return nil
    }
  }
}

next() does one of two things. If the current node is a ConsList, it advances the currentNode to the next node and then returns the current value. If the current node is the NilList, it just returns nil.

Now let’s close the circle and finish implementing our generate() function.

extension List : SequenceType {
  func generate() -> ListGenerator<T> {
    return ListGenerator(head: self)
  }
}

All we’re doing here is creating a fresh ListGenerator set to the first node in the list and returning it.

Trying it out

At this point, we’re done! Let’s try it out. (Don’t leave out the type annotation on myList; Swift’s type inference engine still has issues.)

let myList : List<Int> = ConsList(10,
  next: ConsList(15,
    next: ConsList(18,
      next: NilList())))

for item in myList {
  println("the item is \(item)")
}
// Output:
// the item is 10
// the item is 15
// the item is 18

Pretty cool! Our foreach-compatible List type takes its deserved place among the ranks of such exalted built-in types as Array, Repeat, and UnsafeBufferPointer.

Lazy Sequences

What do collections like List, Array, and Dictionary have in common? They all have a finite number of items. But sequences don’t need to be collections, nor do they even need to have a finite number of elements. A sequence can be something as unlikely as a constant flow of updates from the USGS’s live earthquake feed or the digits of pi.

It’s usually not feasible to pre-generate all the elements in an infinite sequence, so most such sequences are lazy; their elements are generated on-demand or as they become available. It turns out that the SequenceType and GeneratorType machinery we just learned about can be used to support lazy sequences in Swift.

Collatz sequence

The Collatz sequence, also known as the hailstone sequence, is a sequence defined by a simple algorithm that has a bizarre property: no matter what number you start off with, the sequence always seems to eventually reach 1. (Whether this is actually true for all numbers is not known.)

Let’s build the Collatz sequence in Swift. In this case, our generator will be doing all the work, so our actual Hailstone struct is very simple:

struct Hailstone : SequenceType {
  let start : Int

  func generate() -> HailstoneGenerator {
    return HailstoneGenerator(value: start)
  }
}

The generator is responsible for advancing the sequence. Since the Collatz sequence is infinite in length, it probably won’t surprise you that next() never returns nil.

struct HailstoneGenerator : GeneratorType {
  var value : Int

  mutating func next() -> Int? {
    let this = value
    // If even, divide by 2. If odd, multiply by 3 and add 1.
    value = (value % 2 == 0) ? (value / 2) : (3*value + 1)
    return this
  }
}

Now we can try it out! For fun, we’ll use the enumerate() function (which works with any sequence type) to also get the indices of each hailstone number.

for (idx, hailstone) in enumerate(Hailstone(start: 91773)) {
  println("hailstone #\(idx + 1) is: \(hailstone)")
  if hailstone == 1 {
    break
  }
}

Can you build a Hailstone instance such that this loop runs forever? Go study mathematics for a decade, prove that it’s possible or impossible before you turn 40, and win a Fields Medal!

Lazy sequences don’t need to be infinite, of course. With a little bit of work we could add a take method to Hailstone which would return another Hailstone sequence consisting of the first n Collatz numbers.

let firstTwenty : Hailstone = Hailstone(start: 12345).take(20)

Or we could have our take method return a different type like HailstoneSlice, if we wanted to keep the distinction between infinite and finite sequences clear.

The map Function

If you prefer functional programming, you may be familiar with the higher-order map function. This function takes in a collection of items of type T, as well as a transforming function that maps inputs of type T to outputs of type U. It then returns another collection of items of type U, created by applying the transformer to each item from the input collection.

Here is a very simple example of map in action, taking in an array of integers and squaring each element:

func square(a: Int) -> Int { return a * a }

let output = map([1, 2, 3, 4, 5], square)
println("output is: \(output)")
// prints: output is: [1, 4, 9, 16, 25]

Conceptually, map is similar to the foreach loop. Whereas the foreach loop contains code inside the loop body which has access to each element in turn, the map function takes in a mapping function that is fed each element of the input collection in turn.

Swift’s standard library contains map as both methods defined on certain types like Array, as well as a map function that works with any sequence type and returns an array:

/// Return an `Array` containing the results of mapping `transform`
/// over `source`.
func map<S : SequenceType, T>(source: S, transform: (S.Generator.Element) -> T) -> [T]

The function signature may look a bit intimidating, but it’s actually pretty straightforward:

  • source is of type S, which is any type that’s a sequence (i.e. conforms to SequenceType).
  • transform is a function that takes an input of type S.Generator.Element and returns an output of type T.
  • S.Generator.Element can be read as “the type of the elements returned when next() is called on the generator type returned by S’s generate() function”. Remember that S has already been defined to be a sequence.
  • T can be whatever you want it to be.
  • The function outputs an array of T-typed objects.

If we wanted our List to have a map method, we’d need to define it ourselves. However, since our List conforms to SequenceType, we can use it with the map function without writing any additional code:

func square(a: Int) -> Int { return a * a }

let myList : List<Int> = ConsList(1,
  next: ConsList(2,
    next: ConsList(3,
      next: ConsList(4,
        next: ConsList(5, next: NilList())))))
let output = map(myList, square)
println("output is: \(output)")
// prints: output is: [1, 4, 9, 16, 25]

In fact, our List (or any other SequenceType that we might build in the future) is automatically compatible with any function that works with sequences. For example, if we have a list containing comparable elements such as integers, we can use the standard library’s minElement() and maxElement() functions to find the smallest or largest elements.

Conclusion

Sequences are an example of a powerful form of abstraction that Swift adopts to great effect. Rather than baking language functionality into the compiler and granting it to only a handful of privileged built-in types, Swift exposes language features as protocols that any type can choose to adopt:

  • Equatable types can adopt Equatable and be checked for equality using ==.
  • Types with an ‘empty’ or ‘identity’ value can adopt NilLiteralConvertible to have the compiler automatically treat nil as that identity value.
  • Types that can be abstracted as sequences can adopt SequenceType and be used in foreach loops (and many other places).

A final note: I am indebted to Colin Eberhardt’s blog post on sequences, which was one of the first good analyses of Swift’s sequence and generator system. It is somewhat outdated, but still worth a read.