The Sorrows of Offline-First Mobile Apps

Open-Closed Software Model

Bertrand Meyer, the inventor of the Eiffel programming language, is a strong advocate of the open-closed principle of software development: once released, code is closed to modification but open to extension. If you want different functionality, you extend an existing class (or perhaps create a sibling of the class you would otherwise modify) and make the new class do what you want.
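To make the idea concrete, here is a minimal Java sketch of extending rather than modifying. The class names (`PasspointChecker`, `TolerantPasspointChecker`) and the tolerance rule are hypothetical, invented for illustration; they are not PixelPin's actual matching code.

```java
import java.util.List;

// Hypothetical point clicked on an image, in pixels.
record Point(int x, int y) {}

// Original, "closed" class: requires every point to match exactly.
class PasspointChecker {
    boolean matches(List<Point> expected, List<Point> actual) {
        if (expected.size() != actual.size()) return false;
        for (int i = 0; i < expected.size(); i++) {
            if (!pointMatches(expected.get(i), actual.get(i))) return false;
        }
        return true;
    }

    // Extension point: subclasses may redefine what "the same point" means.
    protected boolean pointMatches(Point expected, Point actual) {
        return expected.equals(actual);
    }
}

// Extension: the new behaviour lives in a subclass; PasspointChecker is untouched.
class TolerantPasspointChecker extends PasspointChecker {
    private final int tolerance;

    TolerantPasspointChecker(int tolerance) { this.tolerance = tolerance; }

    @Override
    protected boolean pointMatches(Point expected, Point actual) {
        return Math.abs(expected.x() - actual.x()) <= tolerance
            && Math.abs(expected.y() - actual.y()) <= tolerance;
    }
}
```

Note that this only works because the original class already exposed `pointMatches` as a hook. When the change you need cuts across the overall flow, there is often no such hook to extend.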

That always sounds great, but my experience is that, with the best will in the world, most software reaches a point where it needs fairly major rework. This will not take the form of the mythical "rewrite" that we all pine for, and we are not planning on keeping the old classes around, so we do what we should not do: we modify the code that we have.

It's OK To Modify Software If...

Actually, I think this is OK if we define software at two levels. There is the architectural level, which defines the top-level program flows and use cases, and there is the detail level, which dictates what happens within them. So the architectural level might dictate an authentication sequence, whereas the detail level says how this data is presented to the user. Clearly, changes that only affect details are less likely to cause bugs; for a start, with encapsulation, the effect of a single method change should be very small. Equally clearly, changes that affect the architecture are much more likely to cause bugs. A recent article I read described how often we drop a method call in somewhere, it works the first time, and then we restructure something slightly, miss the badly placed call and introduce a bug.

The PixelPin App

Anyway, we have an app at PixelPin that started life as a fairly simple beast that did two main things. Either you logged in directly to the app, or you used the PixelPin app to log into a third-party app, which involved slightly different steps to keep the trust relationship intact. Both of these flows became calls to a web service, and it all pretty much did what it needed to.

Except it was slooooowwwwwwwwww

It turns out that, despite mobile providers quoting very high headline speeds for their data networks, the reality is that connection setup is very slow, so performing, say, 4 calls to a web service would add about 5-10 seconds EACH! to the time taken to register on the app.
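As a rough worked example using the figures above, and treating each call as paying the full setup cost while ignoring the time to transfer the payload itself, the connection overhead alone for $n$ calls adds up as:

```latex
T_{\text{overhead}} = n \times t_{\text{setup}} = 4 \times (5\ \text{to}\ 10\,\text{s}) = 20\ \text{to}\ 40\,\text{s}
```

Reducing $n$ to 1 brings the overhead down to the cost of a single setup, 5 to 10 s, which is why the number of calls matters far more than the advertised bandwidth.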

Simple Optimisations First

Initially, I improved this quite simply: instead of calling lots of web service methods, the app saved all the data up and called a single method at the end of registration. It also no longer uploaded an image (PixelPin is an image-based authentication system), since registration restricted the user to choosing a gallery image. All good so far.
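A minimal sketch of the "save it all up and call once" approach follows. The `Transport` interface stands in for the web service client, and the method name `Register` and the field names are hypothetical; the real PixelPin registration payload is not described here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction over the web service: each send() is one network round trip.
interface Transport {
    void send(String method, Map<String, String> payload);
}

// Collects registration data locally during the flow and uploads it once at the end.
class RegistrationBatch {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final Transport transport;
    private boolean submitted = false;

    RegistrationBatch(Transport transport) { this.transport = transport; }

    // Called at each registration step instead of making a web service call.
    void put(String key, String value) {
        if (submitted) throw new IllegalStateException("already submitted");
        fields.put(key, value);
    }

    // The single round trip, made at the end of registration.
    void submit() {
        if (submitted) throw new IllegalStateException("already submitted");
        transport.send("Register", Collections.unmodifiableMap(new LinkedHashMap<>(fields)));
        submitted = true;
    }
}
```

The trade-off is that nothing reaches the server until the end, so an interrupted registration has to be restarted (or its partial state kept on the device).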

But it was still too slow when logging in.

Good is Sometimes Not Good Enough

Most people expect login to be fairly instant but, again, the slowness of the data networks meant that even a single call with a small payload of data might take 5 seconds or longer and sometimes time out. Although this is understandable, it is not acceptable and doesn't make for a sellable product.

What we needed was offline login!

Offline-First Sounds Simple, Right?

One of the things we had to accept from the outset was that mobile devices do not have very mature security controls. You can save data onto the device but, ultimately, especially on a rooted phone, that data is accessible. We have encrypted the data, but the key, by necessity, is also stored on the device. That meant we had to include a range of measures both to make this information hard to extract and to limit the damage if it was somehow obtained.
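Encrypting the cached data is the easy part. Below is a minimal sketch using the standard `javax.crypto` AES-GCM API; the post doesn't say which cipher PixelPin actually uses, so this is an illustrative choice, not the app's scheme. It also illustrates the caveat above: anyone holding the key bytes can decrypt, so a key stored next to the ciphertext only raises the bar.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Encrypts cached login data with AES-GCM; the 12-byte IV is prepended to the ciphertext.
final class CacheCrypto {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    // The "key generator": a random 256-bit AES key (which, per the post, ends up on the device).
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    static byte[] encrypt(SecretKey key, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per message; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Throws (AEADBadTagException) if the blob was tampered with or the key is wrong.
    static String decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

GCM is chosen here because it authenticates as well as encrypts, so a modified cache file is rejected rather than silently decrypted into garbage.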

That said, the ideal was that login always happens offline, for speed, and the data is updated in the background by a Push Service to ensure it is always up to date. We can still mark the data as stale after a period of time, forcing the app to check with the server that it is still correct, but this provides the basis of offline-first functionality.
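A minimal sketch of the "offline first, but stale after a while" rule. The class name and the `maxAge` policy are assumptions for illustration; the post doesn't say how long PixelPin's cached data stays fresh. Passing the current time in explicitly keeps the logic easy to test.

```java
import java.time.Duration;
import java.time.Instant;

// Decides whether login may use cached data or must first re-check with the server.
final class OfflineCachePolicy {
    private final Duration maxAge;
    private Instant lastRefreshed; // null until the first successful server sync

    OfflineCachePolicy(Duration maxAge) { this.maxAge = maxAge; }

    // Called after the initial web service call (or, later, a push update) refreshes the data.
    void markRefreshed(Instant now) { lastRefreshed = now; }

    // Stale if never synced, or if the last sync is older than maxAge.
    boolean isStale(Instant now) {
        return lastRefreshed == null
            || Duration.between(lastRefreshed, now).compareTo(maxAge) > 0;
    }
}
```

At login the app would call `isStale(Instant.now())`: if it returns true, go online to re-validate the data; otherwise verify the user against the local data with no network round trip at all.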

The problem was not the theory. The problem was that I already had an app that I had spent quite a lot of time designing, implementing and analysing from a security perspective. How was I going to retro-fit offline login and avoid the rewrite?

Retro-fit, What Could Possibly Go Wrong?

I made a mistake: I assumed that the use cases were basically the same, just with an "offline module" in place of the calls to the web service. Ignoring the push functionality (which I deferred for now, replacing it with a single initial web service call), this should have been simple, right?
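That assumption can be sketched as one interface with online and offline implementations. All names here are hypothetical, and the web service is replaced by a stand-in function. Writing it out shows the catch: the online version naturally gets a session id back from the server, while the offline version has nothing to return, which is exactly the problem that surfaces later in this post.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

// Outcome of a login: whether it succeeded, and a server session id if one exists.
record LoginResult(boolean success, Optional<String> sessionId) {}

// The "swap online for offline behind one interface" idea.
interface Authenticator {
    LoginResult login(String user, String passpoints);
}

final class OnlineAuthenticator implements Authenticator {
    // Stand-in for the web service call: returns a session id, or empty if rejected.
    private final BiFunction<String, String, Optional<String>> webService;

    OnlineAuthenticator(BiFunction<String, String, Optional<String>> webService) {
        this.webService = webService;
    }

    public LoginResult login(String user, String passpoints) {
        Optional<String> session = webService.apply(user, passpoints);
        return new LoginResult(session.isPresent(), session);
    }
}

final class OfflineAuthenticator implements Authenticator {
    // Local copy of credentials; a real app would keep a salted hash, encrypted.
    private final Map<String, String> localPasspoints;

    OfflineAuthenticator(Map<String, String> localPasspoints) {
        this.localPasspoints = localPasspoints;
    }

    public LoginResult login(String user, String passpoints) {
        boolean ok = passpoints.equals(localPasspoints.get(user));
        // The server never saw this login, so there is no session id to return.
        return new LoginResult(ok, Optional.empty());
    }
}
```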

I added the offline data store and an encryption mechanism with a key generator, and got a basic login working offline. Great, eh?

But, as for many a developer, the joy quickly wore off when my team tested it and reported a whole host of problems: problems after logging in, particularly when selecting your own image, and other seemingly random errors.

Two of these had nothing really to do with offline. One came down to the Android lifecycle: you need to save important data because the system may kill off an activity on a device with limited resources. The other was simply an error in one of the web service methods that had been modified for the new registration system. I felt I was reaching the summit and was almost ready for release when...

The Last 5%

I've heard this saying in a number of forms, but it is something like: the last 5% of the work takes 95% of the time. We get the bulk of the app up and running very quickly, as we should with our modern tools and debugging suites (although I'm not sure I count Eclipse as modern!), but it's when we get to the last little bits that we learn, for example, that Android doesn't support our kind of encryption without adding a library (with all the joys of getting Java libraries into the build AND the deployment). Then a specific problem reared its head, and I don't know why I didn't think of it before.

The original design used a session system on the server. Once you logged in, you had a session, so when you called subsequent methods such as UpdatePasspoints, you passed the session id. This was a useful identifier, but it also ensured that the given user was authenticated, which stopped somebody simply calling the web API with a random user and changing their passpoints.
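A minimal server-side sketch of that session scheme, assuming an in-memory map and hypothetical method names loosely mirroring UpdatePasspoints; the real service's storage, expiry and transport aren't described in the post. The important property is that `updatePasspoints` derives the user from the session id, rather than trusting a user name supplied by the caller.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class SessionService {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> sessions = new ConcurrentHashMap<>();   // session id -> user
    private final Map<String, String> passpoints = new ConcurrentHashMap<>(); // user -> passpoints (a hash in practice)

    // Only reached after the server itself has verified the user's credentials.
    String createSession(String authenticatedUser) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes); // unguessable id
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessions.put(id, authenticatedUser);
        return id;
    }

    // The caller supplies only the session id; the user is looked up server-side.
    boolean updatePasspoints(String sessionId, String newPasspoints) {
        String user = (sessionId == null) ? null : sessions.get(sessionId);
        if (user == null) return false; // no session: this caller never logged in with the server
        passpoints.put(user, newPasspoints);
        return true;
    }

    Optional<String> passpointsFor(String user) {
        return Optional.ofNullable(passpoints.get(user));
    }
}
```

An offline login never reaches `createSession`, so the app has no valid id to pass.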

What happens when you log in offline? The server is not in the circuit and doesn't create a session. I have logged in locally, but how do I convince the server that a given user has actually logged in and that it is not just a fake app pretending to have logged someone in?

I initially considered simply creating a session when the user wants to do something like change their picture or passpoints, which requires an online connection anyway (no big deal), but they are already logged in and I can't ask them to log in again using the online mechanism. Perhaps I could always log in to the server when the app is used directly, on the basis that the only reason to do that is to change your picture or passpoints, but then we're back to the original problem of slowness.

Conclusion

I basically have to go back to the drawing board and redesign how this works for offline-first. Google provides a really handy mechanism for Android that lets the app authenticate itself with the server, proving it isn't a fake app, and this covers part of the job. The other part is proving to the server that the user has logged in locally without exposing any private data or storing additional data on the device. Perhaps some challenge/response mechanism would work, but that is for another day...
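For what it's worth, one possible shape for such a challenge/response is sketched below, under loud assumptions that are not PixelPin's design: (1) at registration the server stores a per-user key `k` derived with PBKDF2 from the passpoints and a per-user salt; (2) after a local login the device re-derives `k` from what the user just entered (the salt could be sent by the server with the challenge), so no extra secret is stored on the device; (3) the server sends a fresh, single-use random nonce and the device replies with HMAC-SHA256(k, nonce). This only works if the passpoints can be reduced to an exact value before derivation, which fuzzy click matching may not allow, and a low-entropy input means a leaked server-side `k` could be brute-forced.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

final class LocalLoginProof {
    private static final int ITERATIONS = 100_000;

    // ASSUMPTION: both sides derive the same key from the (exact) passpoints and a per-user salt.
    static byte[] deriveKey(char[] passpoints, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(passpoints, salt, ITERATIONS, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword();
        }
    }

    // Server: issue a fresh random challenge (it must be accepted at most once to stop replays).
    static byte[] newChallenge(SecureRandom random) {
        byte[] nonce = new byte[32];
        random.nextBytes(nonce);
        return nonce;
    }

    // Device: prove knowledge of the key without sending the key or the passpoints.
    static byte[] respond(byte[] key, byte[] challenge) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(challenge);
    }

    // Server: recompute the expected response and compare in constant time.
    static boolean verify(byte[] key, byte[] challenge, byte[] response) throws GeneralSecurityException {
        return MessageDigest.isEqual(respond(key, challenge), response);
    }
}
```

Combined with the app-authentication mechanism mentioned above, a successful `verify` could be the point at which the server creates the session that the offline login never got.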