The Importance of Being `final`

[UPDATE - Swift 1.2 from Xcode 6.3 Beta 2 brings performance benefits to non-final properties and methods, making final unnecessary for performance from that point on - see my initial reaction to Beta 2. I still recommend final where possible, as it avoids the need to consider the effects of an object being subclassed and its methods overridden in ways that change the behaviour.]

This post is intended to quantify, explain and show the performance hit that you take from not adding the word final to your classes (or their properties and methods). The post was triggered by a blog post complaining about Swift performance and showing some performance figures where Swift was significantly slower than Objective-C. In the optimised builds that gap could be closed just by adding a single final keyword. I'm grateful to David Owens for showing the code he was having trouble with, which gives me a base to demonstrate the difference final can make without me cherry-picking code of my own choosing.

As with most performance issues this doesn't matter most of the time. Most code is waiting on user input, network responses or other slow things and it really doesn't matter that much if the code is being accessed a few times a second. However when you have got performance critical code and in particular that inner loop that is being executed hundreds of thousands of times a second it can make a huge difference.

What does final do

From The Swift Programming Language:

I don't think that there is much that I can usefully add to that about the syntax and the semantics but the rest of this post is largely going to be about what it enables the compiler to do.

Do note especially though that if the class itself is marked as final no subclasses are possible so effectively all properties and methods are automatically final.
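As a minimal sketch of the two ways of applying final (the types here are hypothetical and written in current Swift syntax, they are not taken from David's code):

```swift
// Hypothetical types for illustration only.

// Marking the whole class final: no subclass can exist,
// so every property and method is effectively final.
final class Counter {
    var count = 0
    func increment() { count += 1 }
}

// Marking individual members final: the class can still be subclassed,
// but the marked members cannot be overridden.
class Shape {
    final var name = "shape"
    final func describe() -> String { return "I am a \(name)" }
    func area() -> Double { return 0 }   // not final, so still overridable
}

class Square: Shape {
    var side = 2.0
    override func area() -> Double { return side * side }
    // Overriding describe() or name here would be a compile-time error.
}

let counter = Counter()
counter.increment()

let square = Square()
square.name = "square"
print(counter.count, square.describe(), square.area())
```

Trying to declare a subclass of Counter, or to override describe() in Square, is rejected by the compiler.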

What about structs and enums?

As you can't subclass these they are always effectively final anyway so get all the performance benefits.

But why does it work faster

To access a final method or property there are simply fewer steps of indirection and lookup to perform at runtime. In the case of a method there is no need for any indirection, and the compiler can simply emit a direct call to the correct method wherever you access it from an object. In places the method can even be inlined completely, so the cost of the call, the jump and the register shuffling to match the calling convention can all be bypassed and only the essential code executed. On occasion everything may even be eliminated.

On the other hand, when you have a non-final method the compiler has to assume that it may actually be calling into a subclass and that there may be an overridden method that needs to be called instead. So before it calls the method it must look up the required method in the object and then jump to the code. To be able to inline it there would need to be two code paths, and it would need to check the table to see if the inlined code path is allowable (that is, if the address of the method is the same as for the base type it is being compiled against).
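A small sketch of why the lookup is needed (hypothetical classes, current Swift syntax). The function below is compiled against Base, but nothing stops a caller passing in a subclass:

```swift
// Hypothetical classes for illustration only.
class Base {
    func value() -> Int { return 1 }          // non-final: may be overridden
    final func fixed() -> Int { return 10 }   // final: the call target is known at compile time
}

class Derived: Base {
    override func value() -> Int { return 2 }
}

// The parameter is typed as Base, but the caller may pass a Derived.
func total(_ object: Base) -> Int {
    // value() has to be resolved at runtime because it may be overridden;
    // fixed() cannot be overridden, so it can be called directly or inlined.
    return object.value() + object.fixed()
}

print(total(Base()))     // prints 11
print(total(Derived()))  // prints 12
```

The same source line, object.value(), runs different code depending on the runtime type, which is exactly what stops the compiler from hard-wiring or inlining the call.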

The same applies to properties, if anything even more so. An overriding class may add didSet or willSet observers to the property, so at the very least they need to be checked for. The subclass may also turn the property into a computed one by providing an explicit get and set. This all means that the object's table needs to be checked before the property can be accessed. At the moment this seems to happen every time you access it.
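To illustrate both kinds of property override, here is a sketch with hypothetical classes (not from the code being measured). The brighten function only knows about Pixel, yet its one line of property access can end up running observers or a custom setter:

```swift
// Hypothetical classes for illustration only.
class Pixel {
    var brightness = 0   // plain stored property in the base class
}

class LoggingPixel: Pixel {
    var log: [String] = []
    // The subclass adds observers to the inherited stored property.
    override var brightness: Int {
        willSet { log.append("will set \(newValue)") }
        didSet { log.append("did set from \(oldValue)") }
    }
}

class ClampedPixel: Pixel {
    // The subclass replaces plain storage access with an explicit get and set.
    override var brightness: Int {
        get { return super.brightness }
        set { super.brightness = min(max(newValue, 0), 255) }
    }
}

// Compiled against Pixel only; the property access must go through
// whatever accessors the runtime type provides.
func brighten(_ pixel: Pixel) {
    pixel.brightness += 300
}

let plain = Pixel()
let logging = LoggingPixel()
let clamped = ClampedPixel()
brighten(plain)
brighten(logging)
brighten(clamped)
print(plain.brightness, logging.log, clamped.brightness)
```

Had brightness been declared final in Pixel, neither subclass would compile and brighten could read and write the storage directly.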

Difference at Assembly Level

I added this to the non-final class to get a nice small manageable block of assembler to compare the results of compilation:


Looking at the resulting assembly for the final version (below) you can see how short it is and also that there are no function calls (call) in it. In fact there is only a single jump on overflow (jo).
On the other hand the non-final version is substantially larger, has an extra jump and a call to imp__stubs__objc_retain. Note that this is for a very simple method that only accesses the Int and updates it. The main method operating on the buffer accesses the array on every pass through the inner nested loop. It would have to do this type of indirect load and store for each pixel in the array, and that adds up.


How much difference does it make?

Well in debug unoptimised builds very little. They still make all the explicit calls without inlining and will also make various checks anyway. There may be some improvement possible here in future but it will never be as effective. However in optimised builds in critical code it can speed it up by over ten times.

Measurement

Starting from David's day005_badperf branch at the time I saw his blog post I created this branch with two simple changes. Firstly release (optimised) builds are enabled. Secondly the measurement point is moved to focus on the effect on the Swift code rather than the API (probably in C). This makes the absolute figures impossible to compare directly with David's and we are running on different computers anyway.

Then I tested with and without final applied to the class (it was not specifically applied to any properties either).


|                     | Without final | final class |
|---------------------|---------------|-------------|
| Absolute time (avg) | 0.0165        | 0.00097     |
| Relative speed      | 1             | 17.04       |

This isn't always going to be the effect. In fact it is an extreme case, but accessing an array property in an inner loop is something that might come up at other times.
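For readers who haven't looked at the repository, the pattern in question looks roughly like the sketch below. This is a hypothetical renderer of my own with made-up names and a made-up pixel formula, not David's code, written in current Swift syntax:

```swift
// Hypothetical renderer for illustration only; not the measured code.
final class Renderer {   // remove `final` here to compare the two builds
    var pixels: [UInt32]
    let width: Int
    let height: Int

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
        pixels = [UInt32](repeating: 0, count: width * height)
    }

    func render(frame: UInt32) {
        for y in 0..<height {
            for x in 0..<width {
                // Every pass through the inner loop goes through the
                // `pixels` property to reach the array storage.
                pixels[y * width + x] = UInt32(x) &+ UInt32(y) &+ frame
            }
        }
    }
}

let renderer = Renderer(width: 4, height: 3)
renderer.render(frame: 1)
print(renderer.pixels[11])   // x = 3, y = 2, so 3 + 2 + 1 = 6
```

The property access inside the nested loop is the part that final (on the class or on pixels) lets the compiler resolve once rather than on every iteration.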

The code for this measurement can be found on my Github fork. Just toggle the last commit or add/remove final manually to compare the builds. The timings above are averages calculated in LibreOffice (Numbers really didn't like me pasting the numbers for some reason and was treating them as text).

Is final Just for Performance

Absolutely not. Inheritance is something that should really be designed into the superclass, as it is easy for assumptions about behaviour to be broken. I would go so far as to say that almost all classes should be final unless you are specifically planning to inherit from them. I would much prefer it if Swift defaulted to final for all classes, methods and properties and had a keyword such as overridable or inheritable that could be applied in the specific places where you wanted to allow overriding.

Overcoming Swift Resistance

The remainder of this post is largely my reaction to David Owens' Swift Resistance blog post yesterday, which triggered me to write this whole post. I've read a few of his posts and it always feels like he is looking for reasons to complain about Swift and make it look bad.

He wrote about some performance issues he was having with some fairly low level Swift code. The valid point he makes is that Swift debug builds are much (100x in some realistic cases) slower than release builds. This is definitely a nuisance (but not in my view the killer he makes it out to be). However, he also included a performance table where Swift was about 70% slower in the release version.

The original figures are here just for reference so that you can see if they have been updated:

Swift is actually Fast (in Optimised builds -O, -Ounchecked)

If the post had stated more clearly (as I think he now believes) that debug was the issue but that in release mode Swift was just as fast as (or faster than) Objective-C in his test, I don't think I would have been triggered to investigate and see what I could do about it. As it was, the performance figures showed Swift 70% slower than his Objective-C version and that was what I thought I could fix.

Unoptimised is always going to be slower - Sometimes infinitely

Apple's Swift team have mentioned several times on the forums that unoptimised speed is one of the issues that they are working on but I don't think it is realistic to expect that to be able to get anywhere close to the optimised performance in every case. At the pathological limit there are cases where code can be completely optimised away for an infinite speedup on those sections.

Objective-C suffers less, but it partially avoids the issue by largely using optimised code even in debug builds. This is because you are always calling into Apple's standard library (provided as an optimised build) for things like array handling, whereas in Swift the library is built into the language and can actually be optimised further and inlined where appropriate, but not in unoptimised builds.

Working Around the Issue

He already found that using an UnsafeMutableBufferPointer kept the speed up in both debug and release. Joe Groff (@jckarter) of Apple's Swift team also pointed out that most of the code could use the normal array and that you can treat it as an unsafe buffer in just the hottest sections of the code, using the withUnsafeMutableBufferPointer method of Array to abandon safety and go for performance in specific areas.
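A minimal sketch of that approach, using a hypothetical fill function of my own (current Swift syntax) rather than the code from the repository:

```swift
// Hypothetical helper for illustration only.
// The caller keeps an ordinary [UInt32]; only the hot loop sees the raw buffer.
func fill(_ pixels: inout [UInt32], width: Int, height: Int, frame: UInt32) {
    precondition(pixels.count >= width * height, "buffer too small")
    pixels.withUnsafeMutableBufferPointer { buffer in
        for y in 0..<height {
            for x in 0..<width {
                // Writes go straight to the array's storage. Nothing here
                // protects against a bad index, so the size was checked
                // once above, outside the loop.
                buffer[y * width + x] = UInt32(x) &+ UInt32(y) &+ frame
            }
        }
    }
}

var framePixels = [UInt32](repeating: 0, count: 12)
fill(&framePixels, width: 4, height: 3, frame: 1)
print(framePixels[11])   // x = 3, y = 2, so 3 + 2 + 1 = 6
```

The rest of the program keeps the safety of a normal array, and the unsafe access is confined to the closure.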

Using the Debugger on Optimised Code

It is also worth noting that it is possible to use the debugger on optimised code. It is just harder, because the optimisations that may have taken place (inlining methods etc.) mean that the source code does not correspond exactly to the binary, and some information not needed at runtime may have been removed. You can still step through the instructions. You can still set breakpoints, even conditional and symbolic ones (in some cases they will not fire if the code has been optimised away, so you might need to adjust things).

Old School - println()

You can also debug without the debugger using print statements and other code tweaks. Yes these approaches are slower but I have debugged in far worse environments (embedded devices). However there is definitely a cost in time and effort in these approaches.

Frameworks and Modules

At least for development you could separate your performance critical code into a separate framework from the bulk of your project. You can then build the performance critical section as an optimised build and the bulk of the project as debug. This should also help with compilation speed. If you need to target iOS 7 then you can have a separate monolithic target that you use to build your actual release builds.

Notes about the original Swift Resistance Article

The performance figures given in the Swift Resistance article at this time have since been significantly improved on by changes David has already made (including making the class final). According to him, release performance has reached a similar level to the original Objective-C speed, although I suspect it has surpassed it.

There is no secret about the update but I'm not sure if or how it will be reflected in the original blogpost.

You can see a discussion between us in a pull request to add a final. I think that the reason David isn't seeing a speedup is that he updated his code to include final between me seeing his post and forking the repo, and his inspection of the pull request.