Unowned or Weak? Lifetime and Performance

Posted on October 27, 2016

Update 10/17: Mike Ash wrote about some recent improvements to weak references handling here.

The usual advice, that when dealing with retain cycles you should choose between unowned and weak references by considering the lifetimes of the objects involved, is well known. Even so, you may sometimes still be in doubt about which one to use, and wonder whether defensively using only weak references is a good idea.

In this article, after a brief introduction, I’ll analyze the differences between the two in terms of lifetime and performance, with excerpts from the Swift sources, to help you choose which flavor of weak reference to use in different circumstances.


The Basics

As we all know, Swift relies on good old ARC (Automatic Reference Counting) to manage memory and, as a consequence, just as we were used to with Objective-C, we have to deal with retain cycles manually through a wise use of weak references.

If you are not familiar with ARC, you just need to know that every reference type instance will have a reference count (a simple integer value) associated with it, that will be used to keep count of the number of times an object instance is currently being referred to by a variable or a constant. Once that counter reaches zero, instances are deallocated and the memory and resources they held are made available again.

You have a retain cycle every time two instances refer to each other in some way (e.g. two class instances that each have a property referring to the other, as happens with two adjacent nodes in a doubly linked list). The cycle prevents those instances from being deallocated, because their retain counts always stay greater than zero.
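As a minimal sketch of such a cycle, here are two list nodes that point at each other strongly. The Node and DeinitCounter types are hypothetical helpers made up for this illustration; the counter only exists so that we can observe whether deinit ever runs.

```swift
// Hypothetical helper: records how many nodes have been deinitialized.
final class DeinitCounter { var count = 0 }

final class Node {
    var next: Node?      // strong reference
    var previous: Node?  // strong reference: together with `next` it closes the cycle
    private let counter: DeinitCounter

    init(counter: DeinitCounter) { self.counter = counter }
    deinit { counter.count += 1 }
}

let counter = DeinitCounter()
var first: Node? = Node(counter: counter)
var second: Node? = Node(counter: counter)

first?.next = second
second?.previous = first

// Dropping our own references is not enough: each node still retains the other.
first = nil
second = nil

print(counter.count) // Prints 0, neither node was deinitialized
```

Both nodes are now unreachable from our code, yet their retain counts never drop to zero, so their memory is leaked.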

To solve this, Swift, like many other languages, introduces the concept of weak references: references that are not counted by ARC and that, as such, do not increment the retain count of your objects.

Since weak references do not prevent instances from being deallocated, it’s essential to remember that at any point a weak reference may no longer point to a valid object. This is not an insurmountable problem, but it is something we need to consider when dealing with this kind of reference.
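To make this concrete, here is a small sketch in which the back-reference of a doubly linked node is declared weak. The Node and DeinitCounter types are again hypothetical and only serve the illustration.

```swift
// Hypothetical helper: records how many nodes have been deinitialized.
final class DeinitCounter { var count = 0 }

final class Node {
    var next: Node?
    weak var previous: Node?  // weak: does not increment the retain count
    private let counter: DeinitCounter

    init(counter: DeinitCounter) { self.counter = counter }
    deinit { counter.count += 1 }
}

let counter = DeinitCounter()
var first: Node? = Node(counter: counter)
var second: Node? = Node(counter: counter)

first?.next = second
second?.previous = first

// Nothing holds the first node strongly anymore, so it is deinitialized...
first = nil
let countAfterFirst = counter.count

// ...and the weak back-reference no longer points to a valid object.
let previousIsGone = second?.previous == nil

second = nil
print(countAfterFirst, previousIsGone, counter.count) // Prints 1 true 2
```

The cycle is gone, but any code reading previous now has to cope with finding nil there.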

Swift has two kinds of weak references: unowned and weak.

While they serve the same purpose, they differ slightly in the assumptions they make about the lifetime of your instances, and they have different performance characteristics.

Instead of looking at this from the perspective of retain cycles between classes, we’ll discuss it in the context of closures, which since the days of Objective-C blocks have probably been the most common situation in which you have to deal with retain cycles. As happens with classes, using an external instance inside a closure creates a strong reference to that instance (the closure captures it), blocking its deallocation.

In Objective-C, following the standard pattern, you would declare a weak reference to that instance outside the block and then a strong reference to it inside the block, to keep hold of the instance while the block executes. And, obviously, you had to check that the reference was still valid.

To help deal with retain cycles, Swift introduces a new construct that simplifies the capturing of external variables inside closures and makes it more explicit: the capture list. With a capture list, you declare at the top of your closure the external variables that will be used, specifying which kind of reference should be created for each of them.

Let’s look at a few examples of what happens when variables are captured in different ways.

When you don’t use capture lists, the closure will create a strong reference to the value from the outer scope:


var i1 = 1, i2 = 1

var fStrong = {
    i1 += 1
    i2 += 2
}

fStrong()
print(i1,i2) //Prints 2 and 3

Modifications made inside the closure alter the value of the original variable, as you would expect.

Using a capture list, a new constant, valid only inside the closure’s scope, is created instead. If you don’t specify a capture modifier, the constant is simply a copy of the original value, and this works with both value types and reference types.



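// Starting again from fresh values, as in the previous snippet
var i1 = 1, i2 = 1

var fStrong = {
    i1 += 1
    i2 += 2
}

// i1 is copied here, when the closure is created
var fCopy = { [i1] in
    print(i1,i2)
}

fStrong()
print(i1,i2) //Prints 2 and 3

fCopy()  //Prints 1 and 3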

In the example above we declare the fCopy closure before the call to fStrong, and it is when the closure is declared that its private constant is initialized. As you can see, when we call the second closure it still prints the original value of i1.
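With reference types, what gets copied is the reference, not the object. The following sketch (the Counter class is a hypothetical example type) shows that a captured constant keeps pointing to the original instance even after the outer variable is rebound:

```swift
final class Counter {
    var value = 0
}

var counter = Counter()

// `[counter]` copies the reference (not the object) when the closure is created.
let readCaptured = { [counter] in counter.value }

// Mutating the instance is visible inside the closure: same object.
counter.value = 10
let afterMutation = readCaptured()

// Rebinding the outer variable does not affect the captured constant.
counter = Counter()
counter.value = 99
let afterRebinding = readCaptured()

print(afterMutation, afterRebinding) // Prints 10 10
```

Note that this plain capture is still a strong reference: the closure keeps the first Counter instance alive.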

If you specify either weak or unowned before the name of an external variable with a reference type, the constant is instead initialized as a weak reference to the original value. This specific form of capturing is the one we use to break retain cycles.


class aClass{
    var value = 1
}

var c1 = aClass()
var c2 = aClass()

var fSpec = { [unowned c1, weak c2] in
    c1.value += 1
    if let c2 = c2 {
        c2.value += 1
    }
}


fSpec()
print(c1.value,c2.value) //Prints 2 and 2

The difference in how the two aClass captured instances are handled inside the closure is a consequence of their different characteristics.

Unowned references are used when the original instance will never be deallocated while the closure is reachable, and they behave much like implicitly unwrapped optionals: you use them without unwrapping. Trying to use a captured unowned value after the original instance has been deallocated will result in a crash.

If instead the original instance we want to capture could be deallocated at some point while the closure is still reachable, we must declare the reference as weak and verify that it is still valid before using it.
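Here is a minimal sketch of the weak case, in which the closure outlives the captured instance. The Account class and the deposit closure are hypothetical names used only for this example.

```swift
final class Account {
    var balance = 0
}

var account: Account? = Account()

// The closure tolerates the account going away: it reports whether it could act.
let deposit: (Int) -> Bool = { [weak account] amount in
    guard let account = account else { return false } // instance already deallocated
    account.balance += amount
    return true
}

let firstAttempt = deposit(10)           // the account is still alive
let balanceAfterFirst = account?.balance ?? -1

account = nil                            // last strong reference released
let secondAttempt = deposit(10)          // the weak reference is now nil

print(firstAttempt, balanceAfterFirst, secondAttempt) // Prints true 10 false
```

Had the capture been declared unowned instead, the second call would have crashed at runtime rather than returning false.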

The Question: Unowned or Weak?

Which one of the two weak reference types should you use?

This question can be answered simply by reasoning about the lifetimes of the original object and of the closure that references it.

[Figure: unowned vs weak]

There are two possible scenarios:

  • The closure has the same lifetime as the captured object, so the closure is reachable only as long as the object is reachable (e.g. simple back-references between an object and its parent). In this case, you should declare the reference as unowned.

    A common example is the [unowned self] used in many examples of small closures that do something in the context of their parent and that, not being referenced or passed anywhere else, do not outlive it.

  • The closure’s lifetime is independent of that of the captured object, so the closure could still be referenced when the object is no longer reachable. In this case, you should declare the reference as weak and verify that it’s not nil before using it (don’t force unwrap).

    A common example is the [weak delegate = self.delegate!] you can see in some examples of closures referencing a completely unrelated (lifetime-wise) delegate object.
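The two scenarios can be sketched side by side. Everything here is hypothetical: ProfileScreen stands for any object owning closures, and Downloader is a stand-in for an asynchronous API that stores a completion handler and runs it later.

```swift
// Hypothetical stand-in for an asynchronous API: handlers are stored and run later.
final class Downloader {
    private var pending: [() -> Void] = []
    private(set) var completedCount = 0

    func fetch(completion: @escaping () -> Void) { pending.append(completion) }

    func finishAll() {
        for handler in pending {
            handler()
            completedCount += 1
        }
        pending.removeAll()
    }
}

final class ProfileScreen {
    var title = "Loading"
    var refreshCount = 0

    // Scenario 1: the closure is owned by self and never handed out,
    // so it cannot outlive self and unowned is appropriate.
    lazy var refresh: () -> Void = { [unowned self] in
        self.refreshCount += 1
    }

    // Scenario 2: the closure escapes into the downloader and may outlive
    // the screen, so it captures weakly and checks before use.
    func load(using downloader: Downloader) {
        downloader.fetch { [weak self] in
            guard let self = self else { return }
            self.title = "Loaded"
        }
    }
}

let downloader = Downloader()
var screen: ProfileScreen? = ProfileScreen()
weak var observedScreen = screen   // only used to observe deallocation

screen?.refresh()
let refreshesBeforeRelease = screen?.refreshCount ?? 0
screen?.load(using: downloader)

screen = nil             // the screen goes away while the request is pending
downloader.finishAll()   // the handler runs, finds nil and simply returns

print(refreshesBeforeRelease, observedScreen == nil, downloader.completedCount) // Prints 1 true 1
```

Neither closure keeps the screen alive, but only the weak one is safe to run after the screen is gone.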

What if you are unsure about the lifetime relationship between two objects and you don’t want to risk having an invalid unowned reference? Could always capturing defensively as weak be a good approach?

No, and not only because having a clear idea of your objects’ lifetimes is a good thing: the two attributes also have wildly different performance characteristics.

The most common implementation for weak references requires that each time a new reference is created it must be registered in a side-table where every weak reference is associated with the object it refers to.

When an object no longer has any strong references pointing to it, the runtime starts the deallocation process, but before this happens it sets to nil all the weak references that were pointing to the object. Because of this behavior, weak references implemented this way are called zeroing weak references.

This implementation has a tangible overhead if you consider that an additional data structure needs to be maintained and that the correctness of all operations on these global reference-holding structures must be guaranteed even in the presence of concurrent access. It should not be possible under any circumstances to access the value pointed to by a weak reference once the deallocation process has started.
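To give an idea of the bookkeeping involved, here is a toy model of a side-table with zeroing references. It is a deliberately simplified sketch, not the code of any real runtime: objects are represented by plain integer ids, and the SideTable and WeakSlot types are names invented for this illustration.

```swift
import Foundation

// In this toy model an object is just an integer id standing in for its address.
typealias ObjectID = Int

// A weak reference: a slot that the table can zero out.
final class WeakSlot {
    var target: ObjectID?
    init(target: ObjectID) { self.target = target }
}

final class SideTable {
    private var slots: [ObjectID: [WeakSlot]] = [:]
    private let lock = NSLock()  // the table is shared, so every access is serialized

    // Creating a weak reference registers its slot under the target object.
    func makeWeakReference(to object: ObjectID) -> WeakSlot {
        lock.lock(); defer { lock.unlock() }
        let slot = WeakSlot(target: object)
        slots[object, default: []].append(slot)
        return slot
    }

    // Called when the last strong reference is gone, before the memory is freed.
    func zeroReferences(to object: ObjectID) {
        lock.lock(); defer { lock.unlock() }
        for slot in slots.removeValue(forKey: object) ?? [] {
            slot.target = nil
        }
    }
}

let table = SideTable()
let a = table.makeWeakReference(to: 1)
let b = table.makeWeakReference(to: 1)
let c = table.makeWeakReference(to: 2)

table.zeroReferences(to: 1)  // object 1 is being deallocated

print(a.target == nil, b.target == nil, c.target == 2) // Prints true true true
```

Even in this reduced form, every creation and every deallocation has to take a lock and touch a shared dictionary, which is where the overhead comes from.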

Weak references (unowned and with some variations weak too) in Swift employ a less convoluted and faster mechanism instead.

Every object in Swift keeps two reference counters: the usual strong reference counter, used to decide when ARC can safely deinitialize the object, and an additional weak reference counter that counts how many unowned or weak references to the object have been created. When this second counter also reaches zero, the object is deallocated.

It is important to understand that an object is not really deallocated until all its unowned references have been released. Until then its memory is kept around, but in an uninitialized state: after deinitialization occurs, its content is just garbage.

Every time an unowned reference is declared, the object’s unowned reference counter is incremented atomically (using atomic gcc/llvm operations, which perform basic operations like increment, decrement, compare and compare-and-swap in a fast and thread-safe way). Each time the reference is used, the strong reference count is checked to verify that the object is still valid before safely retaining it.

Trying to access an invalid object will result in a failed assertion, and your application will stop with a runtime error instead of touching invalid memory (that’s why this unowned implementation is called unowned(safe)).

As a further optimization, if your application is compiled with -Ounchecked (the optimization level originally named -Ofast), unowned references will no longer be checked for object validity and will behave like the __unsafe_unretained references you have in Objective-C. If the object is invalid, your reference will point to deinitialized garbage memory (this implementation is known as unowned(unsafe)).

When an unowned reference is released and there are no more strong or unowned references, the object will finally be deallocated. This is why the object cannot simply be deallocated completely when the strong counter reaches zero: the reference counters must remain accessible so that the unowned and strong counts can be checked.
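The lifecycle described in the last few paragraphs can be summarized with a toy model. This is only a sketch of the behavior as described here, under simplifying assumptions: the ToyObject type is invented for the example, the counters are plain integers rather than atomics, and an invalid access returns false where the real runtime would stop with an error.

```swift
// Toy model of an object header with two counters; not the runtime's real layout.
struct ToyObject {
    enum State { case live, deinitialized, deallocated }

    private(set) var strongCount = 1   // the reference returned by init
    private(set) var unownedCount = 0
    private(set) var state = State.live

    // Declaring an unowned reference only bumps the unowned counter.
    mutating func unownedRetain() { unownedCount += 1 }

    // Using an unowned reference: check validity first, then retain strongly.
    mutating func strongRetainUnowned() -> Bool {
        guard state == .live else { return false } // the real runtime fails an assertion here
        strongCount += 1
        return true
    }

    mutating func strongRelease() {
        strongCount -= 1
        if strongCount == 0 {
            state = .deinitialized     // deinit has run, the content is now garbage
            freeMemoryIfUnreferenced()
        }
    }

    mutating func unownedRelease() {
        unownedCount -= 1
        freeMemoryIfUnreferenced()
    }

    // Memory is freed only when both counters are zero.
    private mutating func freeMemoryIfUnreferenced() {
        if strongCount == 0 && unownedCount == 0 { state = .deallocated }
    }
}

var object = ToyObject()
object.unownedRetain()                 // e.g. a closure captures it as unowned
object.strongRelease()                 // the last strong reference goes away

let stateAfterStrongRelease = object.state         // memory is still around
let canStillBeUsed = object.strongRetainUnowned()  // the validity check fails

object.unownedRelease()                // the closure is released too
let finalState = object.state

print(stateAfterStrongRelease, canStillBeUsed, finalState) // Prints deinitialized false deallocated
```

The intermediate deinitialized state is the reason the counters have to outlive the object’s content.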

Swift’s weak references add an additional layer of indirection, wrapping unowned references in an optional container, which is useful to cleanly handle all those cases in which the referenced object could become nil after a deallocation. But this does not come for free: additional machinery is required to manage this optional correctly.

Considering all this, unowned should be preferred over weak whenever lifetime relationships allow it. But this is not the end of our story: let’s talk about performance[1] now.

Performance: A Look Under The Hood

Before we look into the Swift project sources to verify what we said in the previous section, we need to understand how each kind of reference is managed by ARC. To do that, I need to explain a few things about swiftc, LLVM and SIL.

I’ll try to give you a short overview and explain only what is strictly necessary. If you want to learn more, you’ll find some useful links in the footnotes.

Let’s start with a diagram that contains the basic functional blocks of swiftc, the Swift compiler, to give you an idea of what the whole compilation process entails:

[Figure: swiftc block diagram]

For the most part, swiftc follows an approach similar to that of other compilers built on top of LLVM, like clang.

In the first part of the compilation process, managed by a language-specific frontend, the source code is parsed to produce an AST representation[2], and the resulting AST is then analyzed from a semantic point of view to identify semantic errors.

At this point, in other LLVM-based compilers, after an additional step that performs a static analysis of your code (and if necessary displays errors and warnings) through a dedicated component, the content of the AST is converted to a light-weight and low-level machine independent representation called LLVM IR (LLVM Intermediate Representation) by the IRGen component.

These two components, the static analyzer and IRGen, are separate, and since some checks need to be performed in both of them, there is usually a lot of code duplication between the two modules.

The IR is a Static Single Assignment form (SSA-form) compliant language and can be considered the RISC-style assembly language of the LLVM register-based virtual machine. Being SSA-based greatly simplifies the next step of the compilation process, where multiple optimization passes are applied to the IR obtained from the internal representation provided by the language frontend.

It’s important to know that one of the characteristics of IR is that it can be represented in three different forms: an in-memory representation (used internally), a serialized bitcode representation (the same bitcode you already know) and a human-readable form.

This last form is quite useful to verify the final structure of the IR code that will be passed to the last step of the process, which converts our machine-independent IR to a platform-specific representation (e.g. x86, ARM, etc.). This last step is performed by dedicated LLVM platform backends.

But what makes swiftc different from other compilers based on LLVM?

The fundamental structural difference between swiftc and other compilers is the presence of an additional component, SILGen, right before IRGen. It performs diagnostic and optimization passes on your source, producing an intermediate high-level representation called SIL (Swift Intermediate Language), which is then converted to LLVM IR. This consolidates all the language-specific checks in a single software component and simplifies IRGen.

The conversion from AST to IR is a two-step process. SILGen converts the source, represented as an AST, into raw SIL; the compiler then performs Swift diagnostic checks (printing errors or warnings if necessary) and optimizes the validated raw SIL through multiple passes, producing canonical SIL. As the diagram above shows, the canonical SIL is then converted into LLVM IR.

SIL[3] is, again, an SSA-form language and extends the Swift syntax with additional constructs. It relies on Swift’s type system and is able to understand Swift declarations, but it’s important to remember that top-level Swift code and function bodies are ignored when compiling hand-written SIL sources (yes, we can write SIL and compile it).

In the second half of this section, we’ll analyze an example of canonical SIL to understand how unowned and weak references are handled by the compiler. Looking at the SIL generated from our code, a basic closure with a capture list, we’ll be able to see all the ARC-related function calls added by the compiler.

Deconstructing capture lists handling

Let’s start with this simple Swift example that declares two variables and captures them weakly in a closure:


class aClass{
    var value = 1
}

var c1 = aClass()
var c2 = aClass()

var fSpec = { 
    [unowned c1, weak c2] in
    c1.value = 42
    if let c2o = c2 {
        c2o.value = 42
    }
}

fSpec()

To generate canonical SIL for this sample, just compile the Swift source file with xcrun swiftc -emit-sil sample.swift. Raw SIL can be generated using the -emit-silgen option instead.

If you run the command above, you’ll notice that swiftc produces quite a lot of code. Let’s take a look at an excerpt of the output to learn what some basic SIL directives do and to understand the overall structure.

I’ve added a few multiline comments with explanations where needed (the convenient single-line comments are generated by the compiler); they should be enough to clarify what’s happening:


/*
  This file contains canonical SIL 
*/
sil_stage canonical             

/* 
  Some special imports, available only internally, that can be used in SIL 
*/
import Builtin                  
import Swift
import SwiftShims

/* 
 Definitions of three global variables: c1, c2 and the fSpec closure.
 @_Tv4sample2c1CS_6aClass is the symbol name of the first variable and $aClass 
 is its type (types start with $). Variable names are mangled here but can 
 be turned into something more readable, as we'll see below.  
*/
// c1
sil_global hidden @_Tv4sample2c1CS_6aClass : $aClass

// c2
sil_global hidden @_Tv4sample2c2CS_6aClass : $aClass

// fSpec
sil_global hidden @_Tv4sample5fSpecFT_T_ : $@callee_owned () -> ()

...

/*
  A hierarchical scope definition that refers to positions in the original source.
  Each SIL instruction will point to the sil_scope it was generated from.
*/
sil_scope 1 {  parent @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 }
sil_scope 2 { loc "sample.swift":14:1 parent 1 }


/* 
  An autogenerated @main function that contains the code of our original global
  scope.
 
  It follows the familiar C main() structure, accepting the number of
  arguments and an arguments array. The function conforms to the C calling convention.
  This function contains the instructions needed to invoke our closure.
*/
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
/*
  Register names start with a % followed by a numeric id.
  Every time a new register is defined (or, for function parameters, at the beginning of
  the function) the compiler adds a trailing comment listing the registers or instructions
  that depend on its value (called users). 
  For other instructions, the id of the current instruction is provided.

  In this case, register 0 will be used to calculate the content of register 4 and register 1
  will be used to create the value of register 10.
*/
// %0                                             // user: %4
// %1                                             // user: %10
/*
  Every function is decomposed in a series of basic blocks of instructions and each block ends 
  with a terminating instruction (a branch or a return). 
  This graph of blocks represents all the possible execution paths of the function.
*/
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  ...
  /*
    Each SIL instruction has a reference to the source location that contains the Swift 
    instruction from which it originated and a reference to the scope it's part of.
    We'll look at some of these below when analyzing this function.
  */
  unowned_retain %27 : $@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %28
  store %27 to %2 : $*@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %29
  %30 = alloc_box $@sil_weak Optional<aClass>, var, name "c2", loc "sample.swift":9:23, scope 2 // users: %46, %44, %43, %31
  %31 = project_box %30 : $@box @sil_weak Optional<aClass>, loc "sample.swift":9:23, scope 2 // user: %35
  %32 = load %19 : $*aClass, loc "sample.swift":9:23, scope 2 // users: %34, %33
  ...
}

...

/* 
  A series of autogenerated methods for aClass, init/deinit,  
  setter/getter and other utility methods. 
  
  The comments added by the compiler clarify what they do.
*/

/*
  Hidden functions are visible only inside their module.

  @convention(method) is the default Swift method calling convention: an additional 
  parameter is added at the end to contain a reference to self.
*/
// aClass.__deallocating_deinit
sil hidden @_TFC4clos6aClassD : $@convention(method) (@owned aClass) -> () {
    ...
}

/*
  @guaranteed parameters are guaranteed to be valid for all the duration of the call.
*/
// aClass.deinit
sil hidden @_TFC4clos6aClassd : $@convention(method) (@guaranteed aClass) -> @owned Builtin.NativeObject {
    ...
}

/*
  Functions annotated with [transparent] are small functions that can be inlined.
*/
// aClass.value.getter
sil hidden [transparent] @_TFC4clos6aClassg5valueSi : $@convention(method) (@guaranteed aClass) -> Int {
    ...
}

// aClass.value.setter
sil hidden [transparent] @_TFC4clos6aClasss5valueSi : $@convention(method) (Int, @guaranteed aClass) -> () {
    ...
}

// aClass.value.materializeForSet
sil hidden [transparent] @_TFC4clos6aClassm5valueSi : $@convention(method) (Builtin.RawPointer, @inout Builtin.UnsafeValueBuffer, @guaranteed aClass) -> (Builtin.RawPointer, Optional<Builtin.RawPointer>) {
    ...
}

/*
  @owned specifies that the object is owned by the caller.
*/
// aClass.init() -> aClass
sil hidden @_TFC4clos6aClasscfT_S0_ : $@convention(method) (@owned aClass) -> @owned aClass {
    ...
}

// aClass.__allocating_init() -> aClass
sil hidden @_TFC4clos6aClassCfT_S0_ : $@convention(method) (@thick aClass.Type) -> @owned aClass {
    ...
}

/* 
  The closure.
*/
// (closure #1)
sil shared @_TF4closU_FT_T_ : $@convention(thin) (@owned @sil_unowned aClass, @owned @box @sil_weak Optional<aClass>) -> () {
    ...
    /* SIL for the closure, see below */
    ...
}

...

/* 
  sil_vtable defines the virtual function table for the aClass class.

  It contains as expected all the autogenerated methods.
*/
sil_vtable aClass {
  #aClass.deinit!deallocator: _TFC4clos6aClassD	// aClass.__deallocating_deinit
  #aClass.value!getter.1: _TFC4clos6aClassg5valueSi	// aClass.value.getter
  #aClass.value!setter.1: _TFC4clos6aClasss5valueSi	// aClass.value.setter
  #aClass.value!materializeForSet.1: _TFC4clos6aClassm5valueSi	// aClass.value.materializeForSet
  #aClass.init!initializer.1: _TFC4clos6aClasscfT_S0_	// aClass.init() -> aClass
}

Now let’s go back to the main function, to see how the two class instances are retrieved and passed to the closure when it’s invoked.

This time all the symbols are demangled[4] to make the snippet slightly more readable:

 
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
// %0                                             // user: %4
// %1                                             // user: %10
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  ...
  /*
    References to the global variables are placed in three registers.
  */
  %13 = global_addr @clos.c1 : $*aClass, loc "sample.swift":5:5, scope 1 // users: %26, %17
  ...
  %19 = global_addr @clos.c2 : $*aClass, loc "sample.swift":6:5, scope 1 // users: %32, %23
  ...
  %25 = global_addr @clos.fSpec : $*@callee_owned () -> (), loc "sample.swift":8:5, scope 1 // users: %48, %45
  /*
    c1 is unowned_retained. 
    This instruction increments the unowned reference count of the variable.
  */
  %26 = load %13 : $*aClass, loc "sample.swift":9:14, scope 2 // user: %27
  %27 = ref_to_unowned %26 : $aClass to $@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // users: %47, %38, %39, %29, %28
  unowned_retain %27 : $@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %28
  store %27 to %2 : $*@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %29
  /*
    For c2 the process is more complex.
    alloc_box creates a reference-counted container for this variable that will be stored
    on the heap.

    After the box has been created, an optional variable is initialized to point to c2 and stored
    in the box. The box retains the value it contains, so, as you see below, once the box is 
    populated, the optional can be released.

    At one point, while the value of c2 is being stored in the optional, the object is
    temporarily strong_retained and then released.
  */
  %30 = alloc_box $@sil_weak Optional<aClass>, var, name "c2", loc "sample.swift":9:23, scope 2 // users: %46, %44, %43, %31
  %31 = project_box %30 : $@box @sil_weak Optional<aClass>, loc "sample.swift":9:23, scope 2 // user: %35
  %32 = load %19 : $*aClass, loc "sample.swift":9:23, scope 2 // users: %34, %33
  strong_retain %32 : $aClass, loc "sample.swift":9:23, scope 2 // id: %33
  %34 = enum $Optional<aClass>, #Optional.some!enumelt.1, %32 : $aClass, loc "sample.swift":9:23, scope 2 // users: %36, %35
  store_weak %34 to [initialization] %31 : $*@sil_weak Optional<aClass>, loc "sample.swift":9:23, scope 2 // id: %35
  release_value %34 : $Optional<aClass>, loc "sample.swift":9:23, scope 2 // id: %36
  /*
    A reference to the closure is retrieved.
  */
  // function_ref (closure #1)
  %37 = function_ref @sample.(closure #1) : $@convention(thin) (@owned @sil_unowned aClass, @owned @box @sil_weak Optional<aClass>) -> (), loc "sample.swift":8:13, scope 2 // user: %44
  /*
    c1 is tagged with @unowned and the variable is then unowned_retained.
  */
  strong_retain_unowned %27 : $@sil_unowned aClass, loc "sample.swift":8:13, scope 2 // id: %38
  %39 = unowned_to_ref %27 : $@sil_unowned aClass to $aClass, loc "sample.swift":8:13, scope 2 // users: %42, %40
  %40 = ref_to_unowned %39 : $aClass to $@sil_unowned aClass, loc "sample.swift":8:13, scope 2 // users: %44, %41
  unowned_retain %40 : $@sil_unowned aClass, loc "sample.swift":8:13, scope 2 // id: %41
  strong_release %39 : $aClass, loc "sample.swift":8:13, scope 2 // id: %42
  /*
    The box containing an optional with the value of c2 is strong_retained.
  */
  strong_retain %30 : $@box @sil_weak Optional<aClass>, loc "sample.swift":8:13, scope 2 // id: %43
  /*
    Creates a closure object binding the function to its parameters.
  */
  %44 = partial_apply %37(%40, %30) : $@convention(thin) (@owned @sil_unowned aClass, @owned @box @sil_weak Optional<aClass>) -> (), loc "sample.swift":8:13, scope 2 // user: %45
  store %44 to %25 : $*@callee_owned () -> (), loc "sample.swift":8:13, scope 2 // id: %45
  /*
    Releases c1 and the box of c2 (using the matching *_release functions).
  */
  strong_release %30 : $@box @sil_weak Optional<aClass>, loc "sample.swift":14:1, scope 2 // id: %46
  unowned_release %27 : $@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %47
  /*
     Loads the previously stored closure object, retains it strongly and invokes the function.
  */
  %48 = load %25 : $*@callee_owned () -> (), loc "sample.swift":17:1, scope 2 // users: %50, %49
  strong_retain %48 : $@callee_owned () -> (), loc "sample.swift":17:1, scope 2 // id: %49
  %50 = apply %48() : $@callee_owned () -> (), loc "sample.swift":17:7, scope 2
  ...
}

The closure has a more complex structure:

 
/*
  The closure parameters are annotated with @sil annotations that specify how they will be 
  retained: we have an unowned aClass, c1, and a weak box with an optional containing c2.
*/
// (closure #1)
sil shared @clos.fSpec: $@convention(thin) (@owned @sil_unowned aClass, @owned @box @sil_weak Optional<aClass>) -> () {
// %0                                             // users: %24, %6, %5, %2
// %1                                             // users: %23, %3
/*
  This function has three blocks, with the last two being executed conditionally depending
  on the value of the c2 optional.
*/
bb0(%0 : $@sil_unowned aClass, %1 : $@box @sil_weak Optional<aClass>):
  ...
  /*
    c1 is strongly retained.
  */
  strong_retain_unowned %0 : $@sil_unowned aClass, loc "sample.swift":10:5, scope 17 // id: %5
  %6 = unowned_to_ref %0 : $@sil_unowned aClass to $aClass, loc "sample.swift":10:5, scope 17 // users: %11, %10, %9
  /*
    Using the internal Builtin package, an Int with value 42 is initialized using an integer 
    literal as parameter for an Int struct.

    This value is then set as new value of c1 and once done the variable is released.
    The class_method instruction that we see here for the first time, retrieves a reference 
    to a function from the vtable of an object.
  */
  %7 = integer_literal $Builtin.Int64, 42, loc "sample.swift":10:16, scope 17 // user: %8
  %8 = struct $Int (%7 : $Builtin.Int64), loc "sample.swift":10:16, scope 17 // user: %10
  %9 = class_method %6 : $aClass, #aClass.value!setter.1 : (aClass) -> (Int) -> () , $@convention(method) (Int, @guaranteed aClass) -> (), loc "sample.swift":10:14, scope 17 // user: %10
  %10 = apply %9(%8, %6) : $@convention(method) (Int, @guaranteed aClass) -> (), loc "sample.swift":10:14, scope 17
  strong_release %6 : $aClass, loc "sample.swift":10:16, scope 17 // id: %11
  /*
    And now it's the turn of c2.
    The optional is retrieved and a branch to one of the last two blocks is performed depending
    on its content. 

    If the optional has a value the bb2 block will be executed before jumping 
    to bb3, if it doesn't after a brief jump to bb1, the function will proceed to bb3 releasing
    the retained parameters.
  */
  %12 = load_weak %3 : $*@sil_weak Optional<aClass>, loc "sample.swift":11:18, scope 18 // user: %13
  switch_enum %12 : $Optional<aClass>, case #Optional.some!enumelt.1: bb2, default bb1, loc "sample.swift":11:18, scope 18 // id: %13

bb1:                                              // Preds: bb0
  /*
    Jumps to the end of the closure.
  */
  br bb3, loc "sample.swift":11:18, scope 16        // id: %14

// %15                                            // users: %21, %20, %19, %16
bb2(%15 : $aClass):                               // Preds: bb0
  /*
    Invokes the setter for aClass, setting a value of 42, and proceeds.
  */
  ...
  %17 = integer_literal $Builtin.Int64, 42, loc "sample.swift":12:21, scope 19 // user: %18
  %18 = struct $Int (%17 : $Builtin.Int64), loc "sample.swift":12:21, scope 19 // user: %20
  %19 = class_method %15 : $aClass, #aClass.value!setter.1 : (aClass) -> (Int) -> () , $@convention(method) (Int, @guaranteed aClass) -> (), loc "sample.swift":12:19, scope 19 // user: %20
  %20 = apply %19(%18, %15) : $@convention(method) (Int, @guaranteed aClass) -> (), loc "sample.swift":12:19, scope 19
  strong_release %15 : $aClass, loc "sample.swift":13:5, scope 18 // id: %21
  br bb3, loc "sample.swift":13:5, scope 18         // id: %22

bb3:                                              // Preds: bb1 bb2
  /*
    Releases both captured parameters and returns.
  */
  strong_release %1 : $@box @sil_weak Optional<aClass>, loc "sample.swift":14:1, scope 17 // id: %23
  unowned_release %0 : $@sil_unowned aClass, loc "sample.swift":14:1, scope 17 // id: %24
  %25 = tuple (), loc "sample.swift":14:1, scope 17 // user: %26
  return %25 : $(), loc "sample.swift":14:1, scope 17 // id: %26
}

At this point, ignoring for a moment the performance characteristics of the various ARC instructions, we can do a quick recap of what needs to be done for each kind of captured variable at the different stages:

| Action | Unowned | Weak |
|---|---|---|
| Pre-call #1 | unowned_retain the object | Create a @box, strong_retain the object, create an optional and store it in the @box, release the optional |
| Pre-call #2 | strong_retain_unowned, unowned_retain and strong_release | strong_retain |
| Closure execution | strong_retain_unowned, unowned_release | load_weak, switch on Optional, strong_release |
| Post-call | unowned_release | strong_release |

As we saw in the SIL excerpts above, handling weak references involves more work because they make use of an optional that needs to be handled.

Here is a brief explanation of what each of the ARC instructions listed above does, as described in the documentation:

  • unowned_retain: Increments the unowned reference count of the heap object.
  • strong_retain_unowned: Asserts that the strong reference count of the object is still positive, then increases it by one.
  • strong_retain: Increases the strong retain count of the object.
  • load_weak: Not strictly an ARC call, but it increments the strong reference count of the object referenced by the optional.
  • strong_release: Decrements the strong reference count of the object. If the release operation brings the strong reference count of the object to zero, the object is destroyed and the weak references are cleared. When both its strong and unowned reference counts reach zero, the object’s memory is deallocated.
  • unowned_release: Decrements the unowned reference count of the object. When both its strong and unowned reference counts reach zero, the object’s memory is deallocated.

Now let’s dig deeper into the Swift runtime to see how these instructions are implemented. The files that contain what we need are HeapObject.cpp, HeapObject.h, RefCount.h and, for a few minor definitions, Heap.cpp and SwiftObject.mm. The implementation of boxes can be found in MetadataImpl.h, but I will not talk about them in this post.

Many of the ARC functions declared in these files come in three variants: a basic implementation for Swift objects and two additional implementations for non-native Swift objects, Bridge objects and Unknown objects. The last two variants will not be discussed here.

The first set of instructions we’ll discuss is the one related to unowned references.

The functions that implement unowned_retain and unowned_release can be found halfway through HeapObject.cpp:

 
SWIFT_RT_ENTRY_VISIBILITY
void swift::swift_unownedRetain(HeapObject *object)
    SWIFT_CC(RegisterPreservingCC_IMPL) {
  if (!object)
    return;

  object->weakRefCount.increment();
}

SWIFT_RT_ENTRY_VISIBILITY
void swift::swift_unownedRelease(HeapObject *object)
    SWIFT_CC(RegisterPreservingCC_IMPL) {
  if (!object)
    return;

  if (object->weakRefCount.decrementShouldDeallocate()) {
    // Only class objects can be weak-retained and weak-released.
    auto metadata = object->metadata;
    assert(metadata->isClassObject());
    auto classMetadata = static_cast<const ClassMetadata*>(metadata);
    assert(classMetadata->isTypeMetadata());
    SWIFT_RT_ENTRY_CALL(swift_slowDealloc)
        (object, classMetadata->getInstanceSize(),
         classMetadata->getInstanceAlignMask());
  }
}

While swift_unownedRetain, the implementation of unowned_retain, simply increments the unowned reference count atomically (here called weakRefCount), swift_unownedRelease is more complex because, as described above, it also needs to handle object deallocation, performing it only when there are no other unowned references left.

Still, there is nothing particularly complex here: the doDecrementShouldDeallocate function, called by the similarly named function in the snippet above, doesn’t do much, and swift_slowDealloc just frees the given pointer.

And once we have an unowned reference to an object, another instruction, strong_retain_unowned, is used to create a strong reference:


SWIFT_RT_ENTRY_VISIBILITY
void swift::swift_unownedRetainStrong(HeapObject *object)
    SWIFT_CC(RegisterPreservingCC_IMPL) {
  if (!object)
    return;
  assert(object->weakRefCount.getCount() &&
         "object is not currently weakly retained");

  if (! object->refCount.tryIncrement())
    _swift_abortRetainUnowned(object);
}

Since this object should already be referenced through an unowned reference, an assert first verifies that its unowned count (weakRefCount) is not zero, and then an attempt to increment its strong retain count is performed. The attempt will fail if the object is in the process of being deallocated, in which case _swift_abortRetainUnowned is called.

All the functions like tryIncrement that modify in some way the retain counters are located in RefCount.h and require just a few atomic operations to perform their task.

Let’s talk about weak references now. As we saw before, swift_weakLoadStrong is used to obtain a strong reference to the object contained in the optional:


HeapObject *swift::swift_weakLoadStrong(WeakReference *ref) {
  if (ref->Value == (uintptr_t)nullptr) {
    return nullptr;
  }

  // ref might be visible to other threads
  auto ptr = __atomic_fetch_or(&ref->Value, WR_READING, __ATOMIC_RELAXED);
  while (ptr & WR_READING) {
    short c = 0;
    while (__atomic_load_n(&ref->Value, __ATOMIC_RELAXED) & WR_READING) {
      if (++c == WR_SPINLIMIT) {
        std::this_thread::yield();
        c -= 1;
      }
    }
    ptr = __atomic_fetch_or(&ref->Value, WR_READING, __ATOMIC_RELAXED);
  }

  auto object = (HeapObject*)(ptr & ~WR_NATIVE);
  if (object == nullptr) {
    __atomic_store_n(&ref->Value, (uintptr_t)nullptr, __ATOMIC_RELAXED);
    return nullptr;
  }
  if (object->refCount.isDeallocating()) {
    __atomic_store_n(&ref->Value, (uintptr_t)nullptr, __ATOMIC_RELAXED);
    SWIFT_RT_ENTRY_CALL(swift_unownedRelease)(object);
    return nullptr;
  }
  auto result = swift_tryRetain(object);
  __atomic_store_n(&ref->Value, ptr, __ATOMIC_RELAXED);
  return result;
}

Obtaining a strong reference in this case requires more complex synchronization that will reduce performance under heavy thread contention.

The WeakReference object we see here for the first time is a simple struct that contains an integer Value field pointing to the target object, which, like every Swift object, is represented in the runtime by the HeapObject class.

Right after the weak reference is acquired for the current thread by setting the WR_READING flag, the Swift object is retrieved from the WeakReference container. If it’s not valid anymore, or if it has become eligible for deallocation while we were waiting to acquire the resource, the current reference is set to null.

If the object is still valid, an attempt to retain is performed as expected.
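The three possible outcomes of the load can be summarized with a small model. This is only a sketch under stated assumptions: `ToyTarget` and `ToyWeakReference` are hypothetical names, the WR_READING spin loop and all atomics are left out, the retain always succeeds when the target is not deallocating, and the idea that a weak reference holds one unowned reference on its target is inferred from the swift_unownedRelease call in the excerpt above.

```swift
// Hypothetical stand-in for a heap object with its two counters.
final class ToyTarget {
    var strongCount = 1
    var unownedCount = 0
    var isDeallocating = false
}

// Hypothetical stand-in for the runtime's WeakReference struct.
struct ToyWeakReference {
    var target: ToyTarget?   // plays the role of the integer Value field

    init(_ target: ToyTarget) {
        target.unownedCount += 1   // assumption: the weak reference keeps the memory alive
        self.target = target
    }

    mutating func loadStrong() -> ToyTarget? {
        // Outcome 1: the reference was already cleared.
        guard let object = target else { return nil }

        // Outcome 2: the target is going away, so clear the slot
        // and give up the unowned reference.
        if object.isDeallocating {
            target = nil
            object.unownedCount -= 1
            return nil
        }

        // Outcome 3: the target is alive, hand out a new strong reference.
        object.strongCount += 1
        return object
    }
}
```

Note that the first load after the target starts deallocating is the one that clears the slot, so every later load takes the cheap early exit.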

Therefore, even from this point of view, we can expect the performance of weak references during common operations to be lower than that of the simpler unowned references (although from what I’ve seen, the major overhead seems to be optional handling).

Conclusion

Does defensively using only weak references make sense? No, from the point of view of both performance and code clarity.

Using the right type of capture modifier makes some lifetime characteristics of our code explicit and makes it harder to reach wrong conclusions about how the code behaves when someone else, or future you, reads what you wrote.

Footnotes


1: The first discussion on the weak/unowned dilemma with input from Apple can be found here, and a later discussion on Twitter with Joe Groff has been summarized here by Michael Tsai. This article starts from there with the intention of providing a thorough and approachable explanation.
2: A good description of ASTs can be found on Wikipedia while this article from Slava Pestov has more details on how this is implemented for the Swift compiler.
3: For more information about SIL, check out the detailed official SIL guide and this video from the 2015 LLVM Developers' Meeting. A quick reference for SIL instructions written by Lex Chou is available here.
4: To learn more about how name mangling is performed in Swift, read this reference from Lex Chou.
5: Mike Ash talked about weak references with an experimental approach in one of his Friday Q&A posts. It's not completely up to date with the current way things are named and implemented in Swift, but the explanations are still valid.
