This article is also available in Chinese.

While Swift’s actual native Dictionary type has a pretty complex implementation (undoubtedly for performance reasons), Swift gives us the tools to write a beautiful and simple one with very little code. We’ll start small and add features as we go.

A quick recap of how dictionaries work: dictionaries allow you to set and retrieve a value using a key, which can be any type. They are usually backed by an array, although they can be backed by a tree as well. We will explore an array-backed dictionary in this post, mostly because I don’t know how tree-backed dictionaries work yet.

Since our dictionary will be backed with an array, we need a way to convert the key that’s given into an integer, and then a way to force that integer into the bounds of our array. These two methods are the hashing function and the modulus operation, respectively. By hashing, we can consistently turn our key (usually a string, but can be any type that is Hashable) into a number, and by taking the modulus with respect to the length of the array, we’ll get a consistent slot in the array to set and retrieve the value.

I took some inspiration from Mike Ash’s Let’s Build NSMutableDictionary, particularly the rules for resizing.

Let’s get started. We know that we want this to be generic over Key and Value, and that Key has to be Hashable. Something that is Hashable is also Equatable, which we will need as well, but we will get for free.

struct Dictionary<Key, Value where Key: Hashable> {
	private var storage = //some Array...
    
    subscript(key: Key) -> Value? {
        get {
	        return nil
        }
        set {
        }
    }
}

This is the basic structure of our type. We know our array has to be generic over …something, but we don’t know what yet. Because there may be collisions, which are two different Key objects that hash and mod to the same position in the array, each placeholder object will need to support storing multiple values. Let’s design Placeholder:

struct Placeholder<Key, Value where Key: Hashable> {
    var values: [(Key, Value)] = []
}

This object will hold many keys and values that have all hashed and modded to the same spot in the dictionary. If we’ve designed our dictionary well, there won’t be more than one key-value pair in each Placeholder often, but it will happen. A nice implementation of this Dictionary might use a linked list for the values property of the Placeholder. I’ll leave that as an exercise for the reader.

Now that we know roughly what Placeholder looks like, we know what the storage will look like.

private var storage = Array(count: 8, repeatedValue: Placeholder<Key, Value>())

We start with a randomly chosen size for the array. It helps to pick a power of two, because modding by a power of 2 is a little bit faster than any other number. Until we implement resizing, it won’t really matter what size we pick. With the Array(count:repeatedValue:) constructor, each spot in the array now has a placeholder that we can add values to.

To set values in the dictionary, we need to hash the key (and absolute value it, since the hash can sometimes come back as negative), and then mod it by the size of the array.

set {
    let position = abs(key.hashValue) % storage.count
    //...
}

To actually add this value to the dictionary, we need to a) make sure the new value is not nil, b) find the placeholder at position, and c) add the key and value to the placeholder.

set {
    guard let value = newValue else { return }
    let position = abs(key.hashValue) % storage.count
    storage[position].values.append((key, value))
}

For a basic implementation of the setter, that’s really all we need. (I’ve left out some small details, like what happens when you try to set the same key twice. We’ll tackle that soon.)

For fetching the value, process is pretty similar. Hash the key, absolute value, mod, and then we have the placeholder at that position. To retrieve the value from the placeholder, I’m going to delegate that to the placeholder itself.

get {
    let position = abs(key.hashValue) % storage.count
    return storage[position].firstValue(matchingKey: key)
}

That method is where the real magic will happen. We need to find the first key-value pair that has that key. A method on SequenceType called first(where:) will land in Swift 3, but until we have that bounty, we will need to use the longhand lazy.filter( /* block */ ).first

func firstValue(matchingKey key: Key) -> Value? {
    let matchingKeyValuePair = values.lazy.filter({ $0.0 == key }).first
    //...
}

Once we have the tuple that represents the pair, we can call .1 on it to extract the value.

func firstValue(matchingKey key: Key) -> Value? {
    return values.lazy.filter({ $0.0 == key }).first?.1
}

That’s pretty much it for a basic implementation of Dictionary. 23 lines of Swift. You can find all the code together as a gist here.

There are a few interesting things left to implement on our type. First, an extremely lazy implementation of generate() so that our Dictionary conforms to SequenceType:

extension Dictionary: SequenceType {
    typealias Generator = IndexingGenerator<[(Key, Value)]>
    func generate() -> Dictionary.Generator {
        return storage.flatMap({ $0.values }).generate()
    }
}

Next, a way to remove keys. First, from the placeholder:

extension Placeholder {
    mutating func removeValue(key key: Key) {
        values = values.filter({ $0.0 != key })
    }
}

and then on Dictionary itself:

extension Dictionary {
    mutating func remove(key key: Key) {
        let position = abs(key.hashValue) % storage.count
        storage[position].removeValue(key: key)
    }
}

We’ll add a call to remove(key:) before we set any new values. This will ensure the same key can’t point to two different values.

Lastly, let’s take a look at resizing. When a dictionary has too many objects in it, its backing storage should resize itself. Typically, “too many objects” is defined as having a load factor (number of objects divided by backing array length) above 2/3 or 3/4. I chose 0.7.

extension Dictionary {
	private let maxLoadFactor = 0.7
	
	private var size: Int {
		return storage.count
	}
	
	var count: Int {
		return storage.flatMap({ $0.values }).count
	}
	
	var currentLoadFactor: Double {
		return Double(count) / Double(size)
	}
}

(The implementation of count is extremely lazy, again. Ideally, the Dictionary would keep track of how many objects have been added to — and removed from — it, but that is tougher than it looks.)

mutating func resizeIfNeeded() {
    if currentLoadFactor > maxLoadFactor {
        //resize storage
    }
}

This is the part where Swift’s value semantics get really really weird. When Mike Ash built his NSMutableDictionary, he created a fixed-size dictionary, and wrapped it in a mutable size dictionary. When he needed to resize, he would generate a new fixed-size dictionary (with double the size), and copy all the items to it manually.

We don’t have to do this in Swift. In a Swift struct, assigning self to a variable makes a full copy of it.

let oldDictionary = self
//...

Once we have a copy of the dictionary, we can reset our storage variable to an array with double the size. (Doubling the size ensures that we remain a power of two.)

//...
storage = Array<Placeholder<Key, Value>>(count: size*2, repeatedValue: Placeholder<Key, Value>())
//...

Once that’s done, our dictionary is empty, and we have to copy all the values from the oldDictionary to the current one:

//...
for (key, value) in oldDictionary {
    self[key] = value
}

Here’s the complete resizeIfNeeded() function:

mutating func resizeIfNeeded() {
    if Double(count) / Double(size) > maxLoadFactor {
        let oldDictionary = self
        storage = Array<Placeholder<Key, Value>>(count: size*2, repeatedValue: Placeholder<Key, Value>())
        for (key, value) in oldDictionary {
            self[key] = value
        }
    }
}

self, inside a Swift struct, is a way of accessing the values and functions of the current type, but it’s also a very moldable and mutable thing. You can set it to new values, copy it to others, and generally treat it as if it were just another variable reference.

I joked two weeks ago that Swift is a more dynamic language than Objective-C, Ruby, or Python because you can change its falsiness behavior, but here we have another situation where you can mutate something in Swift that you couldn’t in Objective-C: the reference to self itself. We could have written self = Dictionary(size: size*2) in this method, and it would have been perfectly valid Swift. To developers who are used to writing object-oriented code where an object’s identity is paramount, this is fundamentally weird.

The full implementation, with sequencing, removing, and resizing can be found as a gist. Besides the lazy count/generate() implementations, I’m pleased with the way this little project turned out.

If you are thinking that the performance of this Dictionary is must be absolutely treacherous, you would be right. This dictionary is written to prefer simple code and concepts to performance. If you want to write a much more performant Dictoinary, you can use a more advanced technique called linear probing..