Algebraic Data Types in (typed) Python

By properly utilizing Algebraic Data Types (ADTs, not to be confused with abstract data types), you can transform certain types of invalid states from runtime errors into type-checking errors, making them an excellent method for representing data and managing state. Although ADTs may sound complex, they represent a fairly straightforward concept. They are composite types, meaning they reference or contain other types, and primarily consist of two basic categories: product and sum types. If you have experience with tuples, you've worked with product types, and if you've used booleans, you've dealt with sum types. While there are additional types, we'll focus on these two fundamental categories for now.

We'll be using typed Python, since ADTs really shine with the help of type-checking.

Here's the short version: both product and sum types are composite types that reference other types and let's pretend those other types are str and int. Following this, a product type would contain str AND int, and a sum type contains str OR int.

So, what's up with the naming? Product and sum types do sound kind of pretentious.

Sum types are called sum types because the number of possible different values in a sum type is simply the sum of all possible values of its containing types. The number of possible different values of a product type is... A Cartesian product of all the values of its containing types. This actually came as a surprise to me while researching this post, since I had a different notion of where the naming comes from[1].

Product types

Python has a ton of product types; tuples being the OG. A tuple[str, int] is a product type over str and int - it contains a string AND an integer. The number of different values of this tuple is: the number of different values of str, multiplied by the number of different values of int. Admittedly that's not very useful in this context since both strings and integers in Python are bounded by available memory, but the theory checks out.

Other product types include named tuples, dataclasses, attrs classes, msgspec classes, TypedDicts, you get the idea. You use a ton of product types every day without thinking about it so I won't dwell on them too much.

Sum types

bool is the quintessential sum type in Python, being logically equivalent to a union of two singletons, True and False. Barring bool, Python sum types come in the form of enums, literals or unions.

Enums have been here a while, being added back in 3.4. Here's an example from the docs:

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

This is a sum type of 3 singletons. Color.RED, Color.GREEN and Color.BLUE are, effectively, values inhabiting the Color type. Meaning if a function parameter is annotated with Color, passing in Color.RED is valid. Enums can only contain singletons, although the singletons themselves can be of any value. So this is also legal:

class Color(Enum):
    RED = (1, "a")
    GREEN = (2, "b")
    BLUE = (3, "c")

(Given that these are singletons, you want to be using immutable values here.)

If you've ever used the Rust programming language, you may have noticed it has enum types of its own. Even though they're called the same, Rust enums are different; they are equivalent to Python unions which we'll get to later.

Literals were added to Python in 3.8 (although since they're mostly a typing construct I think they are usable on earlier versions via typing_extensions), and they are essentially a simple, anonymous enum with some restrictions.

Here's Color as a literal:

from typing import Literal

ColorLiteral = Literal["RED", "GREEN", "BLUE"]

If a function argument was annotated as a ColorLiteral, Mypy would ensure only these three strings were allowed to be passed in.

I think literals are super fun since they're so easy to define and use. But again, literals can only contain string and int values, and singletons; you can't have a Literal["RED", tuple[str, int]]. I've been using them more and more since they're just more airy than enums.

Unions are the heavyweight of Python sum types. They've been around for ages, and in Python 3.10 they've received dedicated syntax to make them more readable (which I'll be using here).

If you want a sum type that can be either a string or an int, you need to use a union:

StringOrInt = str | int

Now if your function takes an instance of StringOrInt, you'll know it's either an instance of a string or an int.

You might think unions only work for classes, so you can't have a union of a product type and a sum type, and you'd be wrong - unions can contain literals and enums too.

Sum types in practice

Here's an example. Your web app has a user model, and your users have a privilege assigned to them. The privilege can be one of:

  • normal user
  • admin
  • banned

But for banned users, you want to know when they were banned. You might go about modeling your user class like so:

class User:
    privilege: str
    banned_at: datetime | None

This definition sucks. First of all, the privilege field is just a string, but not any string can be a valid privilege. So let's fix that.

class User:
    privilege: Literal["normal", "admin", "banned"]
    banned_at: datetime | None

Ok, great, now we can leverage our type-checker to ensure privilege is always valid. But our state modeling is still lacking; it's possible for a user to exist with the normal privilege and a banned_at date, and a banned user with no banned_at field. We're using two independent fields of a product type (User) where we should be using a sum type instead.

A product type in a trench coat pretending to be a sum type is something I see all the time, especially in contexts where sum types don't exist or are hard to use. The most famous example is maybe found in the Go programming language, where a function that can return a successful result T or an error will return a tuple[T, error], and the callers are expected to check if the error is nil before proceeding. Following example taken from the offical docs:

f, err := os.Open("filename.ext")
if err != nil {
    log.Fatal(err)
}
// do something with the open *File f

There's no help from the type-checker if you return both (or neither) by accident. Good times.

We can, and will, do better:

class Banned:
    banned_at: datetime

class User:
    privilege: Literal["normal", "admin"] | Banned

So there, as long as we run Mypy on this (and the rest of our toolkit supports these constructs), it's actually impossible for our Users to get into an invalid state. Not only that, but intended valid states are obvious from the types themselves, so we don't need endless comments, tests and Notion pages explaining how the banned_at field is only there for banned users.

Handling Sum Types

Booleans and enum members are singletons, so you'll need a chain of ifstatements using identity comparisons (the is operator):

def process_color(c: Color):
    if c is Color.RED:
        print("Got Red!")
    elif c is Color.BLUE:
        print("Got Blue!")
    # and so on

Literals are values so you'll need ordinary equality, unless the literal contains an enum member, in which case you'll want identity comparisons again.

def process_color_literal(c: ColorLiteral):
    if c == "RED":
        print("Got Red!")
    # an so on

Unions are a little more complex since they can contain literals, enums, and a lot of different product types at once. For union members that happen to be classes, you should use an isinstance check.

def process_user(user: User):
    if isinstance(user.privilege, Banned):
        print("The user is banned!")
    elif user.privilege == "admin":
        print("The user is an admin"
    # an so on

Python 3.10 introduced the match statement to the language. Matching is cool since it presents a unified way of handling both product and sum types, and I feel it expresses the intent of the code better. The cons are its complexity (I've been using it for a while and I still need to go look up the guides sometimes) and the fact it requires an additional indentation level. Here's the last example written using match:

def process_user(user: User):
    match user.privilege:
        case Banned(banned_at=banned_at):
            print(f"The user was banned at {banned_at}!")
        case "admin":
            print("The user is an admin!")
        # an so on

Handling Sum Types Exhaustively

Here's a useful trick for exhaustively handling sum types. It can often be the case that all possible variants of a sum type need to be handled, and neither the if chain nor the match statement provide this by default. For example, in the last example, the case of the normal user privilege wasn't handled. Exhaustiveness checking is great because it can add a layer of safety to refactoring sum types: new variants can be added without them causing havoc somewhere downcode.

The typingmodule contains a special function called assert_never. If a static type checker detects this function in a place that's not unreachable, it will generate an error. We can make use of this to make our checks exhaustive.

from typing import assert_never

def process_user(user: User):
    match user.privilege:
        case Banned(banned_at=banned_at):
            print(f"The user was banned at {banned_at}!")
        case "admin":
            print("The user is an admin!")
        case _:
            assert_never(user.privilege)  # ERROR

(For the if chain approach, instead of case _ just use a bare else.)

If we actually handle the missing variant, the error goes away.

from typing import assert_never

def process_user(user: User):
    match user.privilege:
        case Banned(banned_at=banned_at):
            print(f"The user was banned at {banned_at}!")
        case "admin":
            print("The user is an admin!")
        case "normal":
            print("The user is a normal user!")
        case _:
            assert_never(user.privilege)  # No error any more.

More on Naming

Ever since some ancient, long-forgotten college course I've had this notion that multiplication maps to conjunction (i.e. logical AND), and addition maps to disjunction (i.e. logical OR). In other words, * means AND, and + means OR. Probably because someone taught me Boolean algebra for a digital electronics class. So it was always logical to me that product types (product, *) means those types reference type A AND type B, and sum types (+) mean those types reference type A OR type B. Which is ultimately correct.
But it turns out that's not exactly where product and sum types get their name.

Tin
Zagreb, Croatia