Upgrade Easily

Recently I’ve worked on an older code base that had been gathering dust for a while. After giving it some thought, we decided our task would be easier if we upgraded the main library, the REST framework.

We had to jump from version 2.4 to 3.1. Doing a major version upgrade of the main library never sounds like fun, especially if you haven’t touched the code for a while.

Much to my delight, Tom Christie did a great job, making our upgrade a smooth and clear process. So much so that it got me thinking about API upgrades and, as a result, I decided to write a post about it. It’s also my way of saying thank you to the REST framework developers.

In this post I’ll talk about things that authors of libraries can do to make upgrades easy for their users. Think this post isn’t for you because you’re not writing a library? Well, let me challenge you on that. The truth is that if you’re working on a module in your application, that module has some API. If you’re writing a microservice used by others, that service has an API, too. All software is built from smaller pieces. In this post I call those pieces “libraries”, so today we’re all library writers.

Why Bother With Upgrades

As you may know or will soon realize, treating upgrades seriously takes time. But if a library is to be taken seriously, its authors need to do it.
Otherwise users faced with bugs won’t consider an upgrade, but a switch to a different library instead. Projects that are hard to upgrade get a bad reputation for being buggy and hard to maintain. Even if the authors fix all issues in a new version, it won’t matter to users who are stuck in the past, unable to progress.

Library writers spend a lot of their time looking at the library from different perspectives. It’s fun to fix bugs and ship new features. So much so that it’s easy to forget that users don’t actually enjoy upgrades.

Any non-trivial system has tons of dependencies that won’t upgrade themselves. As a user, I never have time to read release notes for all of those. Most often, I’m looking for a new version only when there’s a problem with the current one. And when I have a problem, first I need to decide if upgrading is an option to solve it.

And so, let me start with the most important thing. The changelog.

Maintain a Changelog

For authors of a library, a changelog may act as a record of “what happened when”, so they remember which features are available in which versions. For users, on the other hand, a changelog should answer a simple question: “Will this upgrade benefit me?”

When a user like me reads release notes, I’m probably trying to figure out whether an upgrade may solve my problem. I want the release notes to tell me as much as possible, because otherwise I have to actually perform the upgrade to see if it helped. This can be very frustrating.

What exactly am I looking for? This depends on the problem at hand.

Performance — When the changelog only says “lowered memory requirements for generating XML”, it merely sparks my curiosity. The authors can help enormously by publishing benchmarks alongside the release notes. I want to know how much I’ll gain based on those benchmarks, without doing additional measurements of my own. Authors, you’ve probably run that benchmark anyway, so please publish it!

Bugs — Saying that bug X was fixed is nice. What’s even better is specifying which versions are affected by the bug, so it is easy to figure out whether the upgrade will help at all. I guess we’ve all seen release notes with something like “fixed several application crashes”. It’s cool you’ve fixed something, but is it the bug I’m experiencing? No idea. Being verbose about what’s wrong is even more important when the bug has anything to do with security. Serious projects often have a dedicated place to track security issues, for example PyPI, Django, and Ubuntu.

Missing feature — Have you ever reread the whole documentation looking for a good way to do your task? That’s another case when I become interested in checking newer versions. Dear authors, please don’t just name your new features. Add a link to the documentation and an example. Seeing a new API in action is so much better than trying to guess whether “support for alpha channels” will really help me combine semi-transparent PNG images.

Upgrade Guide

Now that the users know there is something promising in the new release, they’ll gladly learn more about the process of getting there. Such a guide should mostly focus on things that need to be changed. New features become important only once users manage to upgrade and run their old code. So to begin with, authors should focus on deprecations and removals.

I like it when guides refer to changes using full names of methods, constants and so on, because then I can grep my code while reading and make a list of changes that affect me.

See REST framework’s 3.0 announcement for a great example of an upgrade guide. It even includes a video. Hats off. The before- and after-upgrade code samples were very helpful. The rationale presented in the guide explained how I should re-model my code to make good use of the REST framework. Best of all, this information was in one place, so I didn’t have to jump around to other parts of the documentation.

Automated Tool

Some upgrades are easy in nature, but time-consuming because there are so many places to change. If the community around a library is big, it is a great idea to build an automated tool that can perform at least part of the upgrade. Let me use Python itself as an example. Jumping from version 2.x to 3.x involved lots of simple code changes. Nobody would enjoy adding parentheses to all those print statements by hand. Luckily, there’s a tool for that: 2to3.

Upgrade-Friendly Implementations

When migrating old code, it is easy to miss a few places, leaving behind some now-invalid uses of a library. In an ideal world, when an application is not correct, it won’t even start. In Python, I like solutions that get close to that ideal.

If authors of a library can detect improper usage at runtime, they should be brutal about it. Aborting execution is the right answer. If this sounds too drastic for you, ask yourself if you’re not missing something in your testing and monitoring frameworks.

Silent errors that make services seem to function properly are a source of bugs that are among the worst to spot and investigate. I much prefer to have a service stop outright, so I can diagnose it immediately.

Let’s see how Python’s built-in mechanisms can be used to facilitate that in an idiomatic manner.

Deprecated API

The warnings module has been available since Python 2.1. Among other things, it lets you tell users that an API is deprecated. What’s great about it is that it works similarly to logging libraries: users can decide what to do with those warnings. What I like to do is turn them into fatal errors when running tests. See the module’s documentation on how to do it.

Here’s a usage example from the REST framework:

@property
def DATA(self):
    ...
    warnings.warn(
        "`request.DATA` is deprecated. Use `request.data` instead.",
        DeprecationWarning,
        stacklevel=1
    )
    ...
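Turning warnings into errors in a test run takes just one filter. Here’s a minimal sketch; the Request class below is a simplified stand-in for illustration, not the framework’s real code:

```python
import warnings

class Request:
    """Simplified stand-in for a framework request object."""

    @property
    def data(self):
        return {"key": "value"}

    @property
    def DATA(self):
        # Deprecated accessor kept for backwards compatibility.
        warnings.warn(
            "`request.DATA` is deprecated. Use `request.data` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.data

# In a test suite, escalate deprecations so leftover uses fail loudly:
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        Request().DATA
        deprecated_use_detected = False
    except DeprecationWarning:
        deprecated_use_detected = True
```

pytest users can get a similar effect with a `filterwarnings = error::DeprecationWarning` entry in their configuration file.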

There is a separate warning type for things that are not yet deprecated, but will be in a future version. Such pending deprecations are useful in a number of cases, for example when authors maintain several major releases.

Removed API

The language will complain on its own if users try to call non-existent methods, so some removals need no further attention. However, when using an API means providing hooks, overriding methods, or setting global variables, the language itself will not notice the misuse.

Luckily, authors can easily handle such cases. Here’s an example of how REST framework detects when it is run with a class written according to a contract that was fine in 2.x but is no longer valid in 3.x. Notice the assertion. There’s no way to make this code run correctly, so just abort it. What’s also great about the assertion is the detailed explanation of what’s wrong and how to fix it.

def _handle_backwards_compat(self, view):
    assert not (
        getattr(view, 'pagination_serializer_class', None) or
        getattr(api_settings, 'DEFAULT_PAGINATION_SERIALIZER_CLASS', None)
    ), (
        "The pagination_serializer_class attribute and "
        "DEFAULT_PAGINATION_SERIALIZER_CLASS setting have been removed as "
        "part of the 3.1 pagination API improvement. See the pagination "
        "documentation for details on the new API."
    )

Another example, this time showing what users get in the console while running the code:

AssertionError: Serializer `webapp.serializers.ProductSerializer` has
old-style version 2 `.restore_object()` that is no longer compatible
with REST framework 3. Use the new-style `.create()` and `.update()`
methods instead.

I ran my unit tests and they told me what to fix. Isn’t it great?

getattr is very versatile: it can be used with modules, classes, and instances. Usually the only thing left to check are method parameters. Checking them is straightforward. Here’s another example from REST framework:

def __init__(self, read_only=False, write_only=False,
             required=None, default=empty, initial=empty, source=None,
             label=None, help_text=None, style=None,
             error_messages=None, validators=None, allow_null=False):
    ...
    # Some combinations of keyword arguments do not make sense.
    assert not (read_only and write_only), NOT_READ_ONLY_WRITE_ONLY
    assert not (read_only and required), NOT_READ_ONLY_REQUIRED
    assert not (required and default is not empty), NOT_REQUIRED_DEFAULT
    assert not (read_only and self.__class__ == Field), USE_READONLYFIELD
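The same pattern, distilled into a self-contained sketch (the names here are illustrative, not DRF’s actual classes): validating keyword arguments eagerly means an impossible configuration aborts at construction time, not deep inside a request.

```python
class Field:
    """Illustrative field with eager keyword-argument validation."""

    def __init__(self, read_only=False, write_only=False):
        # These two flags contradict each other, so fail immediately
        # with a message that names both offending arguments.
        assert not (read_only and write_only), (
            "May not set both `read_only` and `write_only`."
        )
        self.read_only = read_only
        self.write_only = write_only

try:
    Field(read_only=True, write_only=True)
    construction_failed = False
except AssertionError:
    construction_failed = True
```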

These kinds of checks can verify that clients are not using an old API, but as seen in the example above, they can also enforce proper usage of the current version.

Plan to Upgrade

Thoughtful preparation goes a long way toward making upgrades easy when they finally have to happen. It’s not rocket science, so I’ll just briefly mention some good practices.

Semantic versioning

If you haven’t read http://semver.org/ already, please, go there and read it. Consider it a part of this post.

Documentation

Documentation should be versioned, just like the main project. Authors should host all major versions of the documentation, even for releases they no longer support, because it’s easy to do and there are still users on those versions.

When methods are introduced or deprecated after the initial release, their descriptions should include the version number in which it happened. That helps users see how long a method will remain available, or which version they need to upgrade to in order to use it.

You’ve seen such documentation already:

  • REST framework 2.x and 3.x
  • Django 1.4 and 1.8 (notice the neat selector in the bottom-right corner)
  • Python 2.x and 3.5

Don’t Forget the Code

When writing documentation, remember that people still read code. If you keep your documentation separate, make sure that when reading code, it is still obvious what is and what isn’t considered a public API.

If you come from outside the Python world, let me stress that I’m not talking about what’s considered “private” or “public”, for at least two reasons. First, in Python there’s no public or private; there is only a convention of naming private things with an underscore prefix, and that convention isn’t followed consistently. Second, the API that is meant to be used by users is probably a small subset of all the public things in the library.

Here’s an example of code that’s good or bad, depending on the author’s intention. Is length supposed to be public, or is it there just to cache a common calculation that would otherwise be repeated in many methods? Since there’s no extra context suggesting otherwise, as a user I would assume it is OK to use length in my own code, and I would expect any renaming of length to be mentioned in the changelog. If that’s not the intention, the code is misleading.

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.length = math.sqrt(x*x + y*y)
    ...
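If length were meant as an internal cache, the underscore convention would make that intent visible in the code itself. A hypothetical variant of the class above:

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        # Leading underscore signals: internal detail, free to be
        # renamed or removed without a changelog entry.
        self._length = math.sqrt(x * x + y * y)
```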

And here’s a funny example from Factory Boy, where the underscored method _build is part of the public API. So much for the convention.

@classmethod
def _build(cls, model_class, *args, **kwargs):
    """Actually build an instance of the model_class.

    Customization point, will be called once the full set of args and kwargs
    has been computed.
    ...

Keep your dependencies simple

Let’s say you’re an author of a library web_crawler that uses http_request 1.0 to fetch pages and I’m your happy user. I’m also using a library called notifications that happens to also use http_request 1.0. If your new release of web_crawler depends on http_request 2.0, but library notifications was not upgraded, I won’t be able to easily upgrade web_crawler.

What can authors do about it? The simplest solution is to limit your dependencies to the essentials. Use well-established low-level libraries that have matured and no longer go through major redesigns. Stick to Python’s built-in libraries whenever possible. If a new feature would require pulling in a non-trivial dependency, consider skipping that feature. Instead, use the documentation to show users how they can do it themselves.

Another approach is to support several versions of your dependency. Often it requires little extra work, while the gain for your users is enormous.
Let me show you another example from the REST framework. The compat module encapsulates everything that changes between supported versions of Django. As a design policy, all such things are placed in wrappers inside this module, so the rest of the code stays clean and simple. Very elegant.

I’ll quote the most interesting cases. First, a way to import OrderedDict from wherever it is available. Note that the standard library is tried first.

try:
    from collections import OrderedDict
except ImportError:
    from django.utils.datastructures import SortedDict as OrderedDict

Handling ImportError can also be used to provide optional features. Setting postgres_fields to None simplifies later if-statements.

try:
    from django.contrib.postgres import fields as postgres_fields
except ImportError:
    postgres_fields = None
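Later code can then branch on that sentinel. A minimal sketch of how such an optional dependency might be consumed; the helper function here is made up for illustration, not DRF’s API:

```python
try:
    from django.contrib.postgres import fields as postgres_fields
except ImportError:
    # Django (or its postgres contrib package) is not installed;
    # the related features are simply switched off.
    postgres_fields = None

def has_postgres_field_support():
    # Feature detection boils down to a single None check.
    return postgres_fields is not None
```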

When a name is always present in a module but its semantics change, ImportError won’t do the trick. That’s where version numbers come in handy. Based on a version number, you can define a custom wrapper to unify the interface. Remember that in Python an if-statement creates no scope, so classes defined in the else block are still available to the whole module.

if django.VERSION >= (1, 5):
    from django.views.generic import View
else:
    from django.views.generic import View as DjangoView
    class View(DjangoView):
        def _allowed_methods(self):
            return [m.upper() for m in self.http_method_names if hasattr(self, m)]

Version your module

I’ve just mentioned how important it can be to support several versions of a library. Being able to check a library’s version programmatically is a game changer here. So if you’re an author, please provide the version of your library. It is as simple as putting

__version__ = '1.0.3'

in your top-level module. This convention is suggested in PEP 396 and PEP 8. Many people follow this style even though it isn’t officially standardized (yet?).
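With __version__ in place, code can adapt to whichever version is installed. A small sketch; somelib and the version values here are made up:

```python
def version_tuple(version):
    # Turn '1.0.3' into (1, 0, 3); comparing raw strings goes wrong
    # as soon as a component reaches two digits ('1.10' < '1.9').
    return tuple(int(part) for part in version.split("."))

SOMELIB_VERSION = "1.0.3"  # would normally be somelib.__version__
# Guard a feature that only exists since (the fictional) somelib 2.1:
has_new_api = version_tuple(SOMELIB_VERSION) >= (2, 1)
```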

Maintain Several Releases

This is a must for authors of any respectable library. As a bare minimum, authors should allow downloads, keep tags, and host documentation for older releases. Major releases should be kept around for at least a year, preferably two. How long exactly depends on the number of users you have.

Serious bug fixes, especially security-related ones, should be back-ported to old major versions. Otherwise you can put your users in a situation where they have a known vulnerability in their code and are unable to upgrade, because a jump to the latest major version is too much for them.

The biggest projects often include Long Term Support (LTS) versions. This is a great thing for clients with a gigantic code base that’s hard to upgrade often.

Test!

The main fear people have when upgrading is regression bugs. Authors must test their library thoroughly. Always add regression tests when fixing bugs. When in doubt, err on the side of too many tests rather than too few. Perhaps you know how not to break your code, but your contributors may not.

When Upgrade is Too Hard

In practice, many things can go wrong, making the upgrade process hard despite all preparations. Here are two tricks to consider when an upgrade isn’t smooth.

Go Back in Time

Let’s say you’re using version 3.2 and have just discovered there’s a 4.10 available. Imagine the library’s authors at the time they introduced 4.0. Back then, they probably cared about 3.x users. By 4.10, they may not. They probably prepared great upgrade instructions, but only for going from the last 3.x (let’s call it 3.9) to 4.0. Your upgrade may be much easier if you first move from 3.2 to 3.9, then to 4.0, and then to 4.10. Try even more steps if this is still too hard.

It works because in 3.9 you may find pending deprecation warnings and prepare your code for 4.0 before the upgrade happens. Jumping to 4.0 is then easier, because there’s a detailed upgrade guide for it and you’ve already done part of the work. Moving on to 4.10 shouldn’t involve much work at all (thanks to proper use of semantic versioning!), but some functions may have become deprecated, so now you only need to deal with those.

Once more, let me use REST framework as an example. Let’s see how request.py evolved over time.

In version 2.4, there was a QUERY_PARAMS property.

@property
def QUERY_PARAMS(self):
    """
    More semantically correct name for request.GET.
    """
    return self._request.GET

In version 3.0 the authors decided to rename it to query_params, but the old name wasn’t removed immediately. Instead, it started with a PendingDeprecationWarning:

@property
def QUERY_PARAMS(self):
    """
    Synonym for `.query_params`, for backwards compatibility.
    """
    warnings.warn(
        "`request.QUERY_PARAMS` is pending deprecation. Use `request.query_params` instead.",
        PendingDeprecationWarning,
        stacklevel=1
    )
    return self._request.GET

In version 3.1, the name became deprecated:

@property
def QUERY_PARAMS(self):
    """
    Synonym for `.query_params`, for backwards compatibility.
    """
    warnings.warn(
        "`request.QUERY_PARAMS` is deprecated. Use `request.query_params` instead.",
        DeprecationWarning,
        stacklevel=1
    )
    return self._request.GET

And finally, in version 3.2 it no longer works. But it has not been removed yet, to give users a sensible error. Great!

@property
def QUERY_PARAMS(self):
    raise NotImplementedError(
        '`request.QUERY_PARAMS` has been deprecated in favor of `request.query_params` '
        'since version 3.0, and has been fully removed as of version 3.2.'
    )

Postpone the Trouble

Sometimes there’s just no time to do an upgrade. Consider using both versions of the library side by side. I don’t really recommend it, because by doing so you’re asking for trouble; all I’m saying is that it’s good to have options. A library’s authors can make it practically impossible, or relatively easy. While hardly any language supports it directly, sometimes it’s as easy as forking the project and renaming the top-level module.

Parallel installation should be easy with a sensible library design, one that:

  • avoids global variables
  • avoids excessive dependencies
  • doesn’t change global state

Summary

Writing a library in a way that helps users upgrade it is easy, but it’s not something authors think about when they start their work. Library writers must realize how much impact it can have on the success of their library. Users must realize how critical it may become to be able to upgrade their code quickly. After all, a poorly addressed security fix can put an end to a whole organisation.

There’s really no rocket science behind upgrades, as long as you follow good coding practices. Upgrades are just one more important reason to follow them, even when no one is watching.

What’s your experience with upgrades? Can you remember some particularly successful ones? What’s the worst upgrade you’ve had?

Happy upgrading!


Cover photo by Mike Rastiello licensed with Attribution-NonCommercial 2.0 Generic License

Posted by Tomek Rydzyński