A screenshot of Xcode and the iOS Simulator — The loaded Xcode Project is the Cookpad application and the simulator is displaying the login screen within the Cookpad app.

Solving Mysterious Logout Issues on iOS 15

Liam Nichols
Source Diving
Published in
7 min readNov 25, 2021

--

At Cookpad, after the public launch of iOS 15 we started to receive reports from users telling us that they were being repeatedly logged out when opening our app. This came as quite a surprise since this wasn’t something that we spotted during our testing with the iOS beta releases.

If you are here for a fix then scroll right down to the conclusion, but if you want to understand more on how we debugged this particular issue then let’s begin.

Reproducing the reported issue

The details in the user reports were limited and all we knew was that starting with iOS 15, people would return to the app and find themselves logged out.

We had no video or no concrete steps to reproduce the issue so I found myself trying to launch the app in every way possible hoping to see it for myself. I tried reinstalling the app, I tried launching with and without a network connection, I tried force quitting and after a good thirty minutes with no luck I gave up and started responding on the ticket with my (lack of) findings.

It wasn’t until after that when I unlocked the phone again and launched Cookpad without even trying that I found myself at the login screen just as our users had been reporting!

After that, I wasn’t able to reproduce accurately but it seemed to be related to leaving my phone for a period of time before coming back to use it again.

Narrowing down the cause

I was afraid that reinstalling the app from Xcode could potentially stop the issue from being reproducible so before doing so, it was time to look through the code and try to narrow the issue down. I came up with three potential causes based on our implementation:

  1. Data was being cleared from UserDefaults.
  2. An unexpected API call returning HTTP 401 and triggering a logout.
  3. An error being thrown from the Keychain.

I was able to eliminate the first two potential causes thanks to a few subtle behaviours that I had observed after reproducing the issue myself:

  • The login screen didn’t ask me to pick a region — This indicated that data in the UserDefaults was fine because our ‘has shown region selection’ preference was still being persisted.
  • The main UI was not shown, even briefly — This indicated that there was no attempt to make a network request so it was probably too soon for the API to have been the cause.

This left us with the Keychain leading me onto the next question: What has changed and why is it so difficult to reproduce?

What has changed and why is it so difficult to reproduce?

I skimmed through the release notes, ran a quick search in google and I couldn’t find anything so I had to keep digging to better understand the issue.

Access to the Keychain data is provided via the Security framework which can be notoriously tricky to grasp. While there are lots of third-party libraries that wrap this framework to make things easier, we maintain our own tiny abstraction based on some Apple sample code.

Looking at this code, we call the SecItemCopyMatching method to load our access token and it returns the data along with an OSStatus code that describes the result. Unfortunately for us however, while our wrapper would throw non-success results with the status code for debugging purposes, we were discarding this information in the next layer up by just treating an error as nil.

We operate a weekly release schedule thanks to a lot of automation and at this time, the next cut-off point for our upcoming release (code freeze) was the following day. Because we still didn’t fully understand how widespread the issue was and we weren’t sure if we were going to be able to ship a fix before code freeze, I took this opportunity to fix the lack of observability by adding some extra non-fatal logging using Crashlytics:

A diff of code showing how we improved error recording and documentation around behaviours when loading session information.
While we weren’t in a position to change the behaviour of how we load the session, we were able to start logging errors and to better document the current behaviour of our implementation.

The result gave us some great insights that we could then observe over the following weeks:

A screenshot from the Crashlytics console showing the metrics of the non-fatal issue slowly decrease as a fix was rolled out
The number of impacted users slowly decreases in version 10.58.0 and 10.59.0 due to a mitigation we introduced while we worked to identify the root cause that was fixed in 10.60.0

At this point, I was able to catch the exact error code being returned. The culprit was errSecInteractionNotAllowed:

Interaction with the Security Server is not allowed.

This error is telling us that we’re trying to read from the Keychain at a point in time when the data is not available to us. This can typically happen if you try to read data that has been stored with the accessibility set to something like kSecAttrAccessibleWhenUnlocked while the device is still locked.

Now this makes total sense but the only problem is that in Cookpad, we only ever read from the Keychain when the app is launching, and my assumption was that the user must have tapped the app icon to launch the app and therefore the device should always be unlocked at that point, right?

So what has actually changed then? Even when I am able to reproduce the issue, I am 100% sure that my phone was unlocked at the point of me tapping the app icon so I can’t understand why this Keychain error would have been thrown.

Determined to find the cause, I replaced the implementation of our app with a debugging tool that would attempt and log Keychain reads at different points in its lifecycle:

A screenshot of a debugging interface in the app showing log messages that indicate failure or success when attempting  to read keychain data at different points in time.

In scenarios where I was able to reproduce the issue, I observed the following results:

  • main.swift — Failed (errSecInteractionNotAllowed)
  • AppDelegate.init()— Failed (errSecInteractionNotAllowed)
  • AppDelegate.applicationProtectedDataDidBecomeAvailable(_:) — Success
  • AppDelegate.application(_:didFinishLaunchingWithOptions:) — Success
  • ViewController.viewDidAppear(_:) — Success

So that (half) explains it. To avoid holding some implicitly unwrapped optional properties on our AppDelegate, we perform some setup in the init() method and part of this involved reading the access token from the Keychain. This was why the read would fail and ultimately why some users would find themselves logged out.

I learned an important lesson here that I shouldn’t assume that protected data is available when the AppDelegate is initialised, but to be honest, I still wasn’t happy because I couldn’t understand why it wasn’t available. After all, we hadn’t changed this particular bit of code for years and it had been working fine in iOS 12, 13 and 14, so what gives?

Finding the root cause

My debugging interface was useful, but it was missing a vital bit of information that would help to answer everything: The time.

I knew that ‘protected data’ wasn’t available up until AppDelegate.application(_:didFinishLaunchingWithOptions:), but it still didn’t really make sense because to reproduce the issue, I was doing the following:

  1. Launch the app
  2. Use it briefly
  3. Force quit the app
  4. Lock my device and leave it for ~30 minutes
  5. Unlock the device
  6. Launch the app again

I was 100% sure that the device was unlocked whenever I launched the app again in step 6 and therefore I was adamant that I should be able to read from the Keychain in AppDelegate.init().

It wasn’t until I looked at the timing of all of these steps that things started making a bit more sense:

A screenshot of a debugging interface in the app showing log messages that indicate failure or success when attempting to read keychain data at different points in time. This time, the logs include timestamps.

Looking carefully again at the timestamps:

  • main.swift — 11:38:47
  • AppDelegate.init() — 11:38:47
  • AppDelegate.application(_:didFinishLaunchingWithOptions:) — 12:03:04
  • ViewController.viewDidAppear(_:) — 12:03:04

The app process itself is being launched 25 minutes before I was actually unlocking the phone and tapping on the app icon!

Now I would have never have actually thought that there was this big of a delay, it was actually @_saagarjha that suggested I check the timestamps and after, he pointed me to this tweet:

Twitter: Home of the Apple Developer Documentation

Now everything makes sense. We never detected it initially because we likely didn’t give the iOS 15 betas enough time to ‘learn’ our patterns so the issue was only reproduced in real-world scenarios where the device thought I was going to launch the app soon. I still have no idea how this prediction is formed but I’m just going to put it down to “Siri Intelligence” and call it a day.

Conclusion

Starting with iOS 15, the system might decide to “pre-warm” your app before the user actually tries to open it and this can increase the chances of protected data not being available when you think it should be.

Protect yourself from this by waiting for the application(_:didFinishLaunchingWithOptions:) delegate callback, and if possible, keep an eye on UIApplication.isProtectedDataAvailable (or the delegate callbacks/notifications) and handle the situation accordingly.

We still get a very small number of non-fatals that report isProtectedDataAvailable as false inside of application(_:didFinishLaunchingWithOptions:) but getting the app to a point where we can defer reading the access token from the Keychain beyond this point would be a massive task and right now it’s not worth the time to investigate further.

This was a pretty tough bug to debug, and the fact that the change in behaviour seems to be entirely undocumented really did not help me. If you were also caught out by this, please consider duplicating FB9780579.

I learned a lot from this and I hope you do too! If you find yourself interested in working on issues like this and more, we’re hiring iOS Engineers for both our product and platform teams at Cookpad so why don’t you apply?

Update: Since publishing this article, numerous people have in fact pointed me to relatively well written documentation from Apple regarding the pre-warming behaviour. Other people however have also told me that they have still observed different behaviour to what is documented in some scenarios so do proceed with caution.

--

--