Reduce - The Power of a Single Python Function

While Python is not a pure functional programming language, you still can do a lot of functional programming in it. In fact, just one function - reduce - can do most of it and in this article I will (try) show you all the things one can do just with reduce.

Reduce

Let's begin by looking at what reduce is. Formally, Python's functools.reduce is a higher-order function that folds a binary operation into each pair of items in an iterable. In simpler words - it's a function that takes function and an iterable as parameters, applying the function to pairs of values from the iterable until only one value is left.

The easiest way to really understand what it does, is to look at an example. Here's how you can implement factorial using reduce:


from functools import reduce
import operator

# Signature:
reduce(function, iterable[, initializer])

# Factorial
reduce(lambda x, y: y*x, range(1, 6), 1)
reduce(operator.mul, range(1, 6))
(((((1 * 1) * 2) * 3) * 4) * 5)
# 120

What we know as reduce in Python is commonly known as fold (or more specifically foldl) in functional programming languages, such as Haskell:


factorial n = foldl (*) 1 [1..n]

main = do
  let f = factorial 5
  print(f)

# 120

The above shows equivalent implementation of factorial in Haskell using foldl, which doesn't look that much different from the Python version.

Various versions of a fold (left, right, with or without initializer) are very powerful, general-purpose functions. If you ever played with Haskell, then you know that fold can be used to implement literally anything. So, let's take a look at how does that claim translate to Python and its reduce...

Folding

As already noted, reducing/folding is very powerful. That's because anything that can be computed from a sequence of successive elements can (if awkwardly) be expressed as a reduction. So, just to prove a point, here are other builtin Python functions implemented using reduce:


from functools import reduce
import operator

_sum = lambda d: reduce(operator.add, d, 0)  # sum()

f = str
_map = lambda d: reduce(lambda x, y: x + [f(y)], d, [])  # map()

is_prime = lambda n: all(n%j for j in range(2, int(n**0.5)+1)) and n > 1
_filter = lambda d: reduce(lambda x, y: x + [y] if is_prime(y) else x, d, [])  # filter(is_prime, range(10))

_reversed = lambda d: reduce(lambda x, y: [y] + x, d, [])  # reversed(data)

_min = lambda d: reduce(lambda x, y: x if x < y else y, d)  # min(data)

_max = lambda d: reduce(lambda x, y: x if x > y else y, d)  # max(data)

As you can see from the above, you can implement other basic functional programming concepts such as map and filter using reduce. So, while not practical, it's definitely possible to write proper functional programming code just with reduce.

As mentioned earlier reduce is also known as foldl because it folds from left, what about the opposite - foldr? Python doesn't have builtin version of reduce that "folds right", but you can implement anything with reduce, right?


data = [7, 4, 3, 6, 2]
foldr = lambda f, d: reduce(lambda x, y: f(y, x), d[::-1])
foldr(data) == (7 - (4 - (3 - (6 - 2))))  # True

reduce(operator.sub, [7, 4, 3, 6, 2]) == ((((7 - 4) - 3) - 6) - 2)  # True

reduce works well for left associative operations like / (such as ratio) or generally associative operations like * or +, e.g., (((((1 * 1) * 2) * 3) * 4) * 5). If you however would like to perform the operations from the other side, then you'd need foldr and above is a quick implementation.

Beyond Folds

While folding is a fun exercise, there are many more practical examples of how to use reduce in Python. First being function composition.

Function composition is fundamental concept in functional programming, and it's very useful for chaining multiple functions:


import functools

def compose(*funcs):
    return lambda x: functools.reduce(lambda acc, f: f(acc), funcs, x)

# f3(f2(f1(value)))
func = compose(f1, f2, f3)
func(value)

The above one-liner needs only reduce and lambda to chain together any number of functions.

Notice also, that order matters here - we pass the functions to compose as (f1, f2, f3), but they are computed as f3(f2(f1(...))). If we wanted it to be computed as f1(f2(f3(...))) we'd need to reverse the list of function as funcs[::-1].

Moving on from the world of pure functional programming, let's say we have a large JSON (dictionary) returned by some API and we're looking for some deeply nested key:


data = {
    "deeply": {
        "nested": {
            "python": {
                "dict": "value"
            }
        }
    }
}

print(reduce(lambda d, key: d[key], ["deeply", "nested", "python", "dict"], data))
# value

Here we use dictionary getter to go through the levels of a dictionary. If you don't like lambda, then this can be also done with dict.get.

In similar vein, we can also use attribute getter to reach deep into class attributes:


class SomeClass:
    pass

c = SomeClass()
c.one = SomeClass()
c.one.two = SomeClass()
c.one.two.three = "data"

attrs = ["one", "two", "three"]

reduce(getattr, attrs, c)
# prints: "data"

Hashing function are a good example of reduction - they compute a constant length digest of a value of arbitrary length by continuously applying the same function to the chunks of the value until it's reduced to the constant length:


import operator

from random import choice
from string import ascii_uppercase

# 100 random uppercase characters
random_string = ''.join(choice(ascii_uppercase) for i in range(100))

def _hash(data):
    n = 5
    # Break into chunks of 5 chars and compute their hash
    chunks = [hash(data[i:i+n]) for i in range(0, len(data), n)]
    # Reduce hash components into single value
    return reduce(operator.xor, chunks, 0)

print(_hash(random_string))
# 5469166689487367977

In this example, we first break the text we want to hash into number of chunks. We then compute hash of each chunk and store it in a list. To create the final hash, we combine reduce with operator.xor to iteratively combine hashes of each chunk until only one value is left.

Now, let's say you have a dictionary with numerical values and you want to sum them all up. You could obviously use for loop to do it, but reduce allows for nice succinct solution:


data = {"key": 4, "other-key": 6, "another": 7}
print(reduce(lambda x, key: x + data[key], data, 0))
# 17

Naturally, reductions are quite useful for many mathematical operations, such as for computing nCr or combinations:


import operator as op
from functools import reduce

# https://en.wikipedia.org/wiki/Combination
def comb(n, k):
    k = min(k, n-k)
    numerator = reduce(op.mul, range(n, n-k, -1), 1)
    denominator = reduce(op.mul, range(1, k+1), 1)
    return numerator // denominator

I intentionally named it comb because there is a function with the same name in math module, which does the same thing. As we all know Python comes with batteries included, so it always makes sense to look around in the standard library before implementing your own solution.

With that said, there a few functions in math module that could be implemented using reduce, which might be a good exercise in using and understanding how reduce really works. Some candidates would be: factorial, fsum, gcd, lcm or prod.

While on the topic of math, we can also use reduce in combination with pandas. For example to compute union of data frames:


import pandas as pd

dfs = [pd.DataFrame([(1, 2, 3), (4, 5, 6), (7, 8, 9)]),
       pd.DataFrame([(3, 7, 9), (1, 3, 5), (0, 1, 2)]),
       pd.DataFrame([(9, 7, 1), (6, 2, 5), (1, 2, 4)])]

df = reduce(pd.DataFrame.add, dfs)
print(df)

#     0   1   2
# 0  13  16  13
# 1  11  10  16
# 2   8  11  15

Usually, we would use x.DataFrame.add(y) as it's more natural in Python, but here we provide pd.DataFrame.add as a reduction function instead. The trick here is that first argument of DataFrame.add is self, so it's possible to write this as pd.DataFrame.add(x, y) instead of lambda x, y: x.DataFrame.add(y).

And expanding on the previous example, we can use reduce one more time with pandas, this time to perform column-wise reduction:


dfs = ...
df = reduce(pd.DataFrame.add, dfs)

sums = df.apply(lambda x: reduce(operator.add, x), axis=0)
print(sums.to_string())
# 0    42
# 1    37
# 2    34
sums = [reduce(operator.add, df[row]) for row in df]
print(sums)
# [32, 37, 44]

Similar to what we did earlier with dictionaries, we can also use reduce to sum elements of other iterables. So, let's say we want to sum specifically 3rd elements of list of tuples:


data = [("John", "Smith", 5300.0), ("Ben", "Greene", 1532.30), ("Amy", "Holmes", 3550.75)]

# With list comprehension
print(reduce(lambda a, b: a+b, [sub[2] for sub in data]))
# Simpler
print(reduce(lambda a, b: a+b[2], data, 0))
# 10383.05

And while reducing various Python iterables, let's also show an example for set:


data = [
  [1, 6, 8, 2, 3],
  [2, 9, 4, 1, 6],
  [1, 7, 5, 6, 2]
]

print(reduce(set.intersection, map(set, data)))
# {1, 2, 6}

Here with the above snippet we can use set.intersection operator to compute intersection of any number of iterables.

And finally, if you're a fan of type-checking or static typing, or you want to clearly see what can go into the reduce expression and what comes out, then you might consider adding some type hints to your reductions. This can be done even with the inline lambda expressions:


from collections.abc import Callable
from functools import reduce
from typing import cast, TypeAlias, TypeVar

FloatFunc:  TypeAlias = Callable[[float, float], float]
ReverseFunc: TypeAlias = Callable[[list, str], list]
T = TypeVar("T")
MapFunc: TypeAlias = Callable[[list, Callable[T, T]], list]

_sum      = lambda d: reduce(cast(FloatFunc, lambda x, y: x+y), d, 0.0)
_min      = lambda d: reduce(cast(FloatFunc, lambda x, y: x if x < y else y), d)
_max      = lambda d: reduce(cast(FloatFunc, lambda x, y: x if x > y else y), d)
_reversed = lambda d: reduce(cast(ReverseFunc, lambda x, y: [y] + x), d, [])
_map      = lambda d: reduce(cast(MapFunc, lambda x, y: x + [f(y)]), d, [])

This uses cast function to signal to type checker that return value will be of the designated type. Be aware though that this doesn't actually change anything at runtime, rather it's just information for type checker such as mypy.

reduce expressions aren't particularly easy to understand to begin with, so knowing input and output types might be useful. However, while this surely adds clarity, it also makes the expressions more verbose, which doesn't exactly help with readability, so you decide whether it's worth it to include type hints in these kinds of one-liners.

Conclusion

Fold - and therefore also reduce - is a general purpose abstraction, useful for more than just pure, functional programming. These reductions offer a way to remove the boilerplate and solve problems that can be naturally expressed recursively, which there are plenty of.

With that said, while Python provides some tools for functional programming (itertools, functools, lambdas, decorators), it is not a functional programming language, so don't try to make it into one. Some of the examples above are here to prove a point and using them might not be the best idea.

Also, even in the situations where folds or other functional programming concepts might provide a good solution, do also consider that many (or most?) people who will read your code might not be familiar with them and might have hard time making sense of what it does.

Subscribe: