Exciting New Features in Django 3.2

And my wishlist for the future...


Django 3.2 is just around the corner and it's packed with new features. Django versions are usually not that exciting (it's a good thing!), but this time many features were added to the ORM, so I find it especially interesting!

This is a list of my favorite features in Django 3.2.

Image from the Django welcome page

A lot of great people worked on this release and none of them is me. I included links to the tickets of each new feature to show my appreciation to the people behind it.


⚙ Setup local environment with the latest version of Django

To set up an environment with the latest version of Django, start by creating a new directory and a virtual environment:

$ mkdir django32
$ cd django32
$ python3.9 -m venv venv
$ source venv/bin/activate

To install the latest version of Django you can either install it with pip, or, if it hasn't been released yet, install it directly from git:

(venv) $ pip install git+https://github.com/django/django@3.2a1

Start a new project and app:

(venv) $ django-admin startproject project
(venv) $ cd project
(venv) $ ./manage.py startapp store

Add the new app to the list of INSTALLED_APPS, and configure a PostgreSQL database:

# settings.py

INSTALLED_APPS = [
    # ...
    'store',
]

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'django32',
        'USER': 'postgres',
    }
}

To try out some of the new features, create a Customer model:

# store/models.py
import datetime

from django.db import models

class Customer(models.Model):
    joined_at: datetime.datetime = models.DateTimeField()
    name: str = models.CharField(max_length=1000)

Finally, create the DB, generate and apply the migrations:

(venv) $ createdb django32 -O postgres
(venv) $ ./manage.py makemigrations
(venv) $ ./manage.py migrate

Great! Now add some random customer data:

import string
import datetime
import random
import pytz

starting_at = pytz.UTC.localize(datetime.datetime(2021, 1, 1))
DAY = datetime.timedelta(days=1)

Customer.objects.bulk_create((Customer(
    joined_at = starting_at + (DAY * random.random() * 365),
    name = ''.join(random.choices(string.ascii_letters + ' ' * 10, k=random.randint(10, 20))),
) for __ in range(10_000)))

Congratulations! You now have 10K new customers and you are ready to go!


Covering Indexes

Ticket #30913

Covering indexes let you store additional columns in an index. The main benefit of a covering index is that when a query only uses fields that are present in the index, the database can use an index-only scan, meaning the actual table is not accessed at all. This can make queries faster.

Django 3.2 added support for PostgreSQL covering indexes:

The new Index.include and UniqueConstraint.include attributes allow creating covering indexes and covering unique constraints on PostgreSQL 11+.

If, for example, you are searching for names of customers that joined during a certain period of time, you can create an index on joined_at and include the field name in the index:

class Customer(models.Model):
    class Meta:
        indexes = (
            models.Index(
                name='%(app_label)s_%(class)s_joined_at_ix',
                fields=('joined_at',),
                include=('name',),
            ),
        )

The include argument makes this a covering index.

For queries that only use the fields joined_at and name, the database will be able to satisfy queries using just the index:

>>> print(
...   Customer.objects
...   .filter(joined_at__lt=pytz.UTC.localize(datetime.datetime(2021, 2, 1)))
...   .values_list('name')
...   .explain()
... )

Index Only Scan using store_customer_joined_at_ix on store_customer
  Index Cond: (joined_at < '2021-02-01 00:00:00+00'::timestamp with time zone)

The query above finds names of customers that joined before February 2021. According to the execution plan, the database was able to satisfy the query using just the index, without even accessing the table. This is called an "index only scan".

Index only scans can be a bit confusing at first. As described in the official documentation of PostgreSQL, it might take some time before PostgreSQL can actually use just the index:

But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query's MVCC snapshot [...]. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway.

Another way to check if a table page can be viewed by the current transaction is to check the table's visibility map, which is significantly smaller and faster to access than the table itself. It may take some time for PostgreSQL to update the visibility map, so until then you might see an execution plan like this one:

Bitmap Heap Scan on store_customer  (cost=27.07..117.02 rows=876 width=16)
  Recheck Cond: (joined_at < '2021-02-01 00:00:00+00'::timestamp with time zone)
  ->  Bitmap Index Scan on store_customer_joined_at_ix  (cost=0.00..26.86 rows=876 width=0)
        Index Cond: (joined_at < '2021-02-01 00:00:00+00'::timestamp with time zone)

To check whether your index can really be used for index-only scans, you can speed up the process by manually issuing VACUUM ANALYZE on the table:

VACUUM ANALYZE store_customer;

Executing VACUUM will also reclaim some unused space and make it available for re-use.

UPDATE 2020-03-04: I originally suggested using VACUUM FULL instead of plain VACUUM. A commenter on Twitter mentioned that plain VACUUM is sufficient for this purpose, and is much less intrusive and disruptive, so use that instead!

It is also important to keep in mind that covering indexes are not free. Additional fields in the index make the index bigger.


Provide Timezone to TruncDate

Ticket #31948

I write a lot about mistakes in SQL and timezones are usually at the top of the list. One of the most dangerous mistakes when working with timestamps is truncating without explicitly specifying a timezone, which can lead to incorrect and inconsistent results.

In Django 3.2 it becomes easier to avoid this mistake:

The new tzinfo parameter of the TruncDate and TruncTime database functions allows truncating datetimes in a specific timezone.

In previous Django versions, the timezone was set internally according to the current timezone:

# django/db/models/functions/datetime.py
tzname = timezone.get_current_timezone_name() if settings.USE_TZ else None

As of Django 3.2, you can explicitly provide a timezone to the TruncDate family of functions:

import pytz
from django.db.models.functions import TruncDay

(Customer.objects
    .annotate(joined_at_day=TruncDay('joined_at', tzinfo=pytz.UTC))
    .values('joined_at_day'))

# SELECT DATE_TRUNC('day', "store_customer"."joined_at" AT TIME ZONE 'UTC') AS "joined_at_day"
# FROM "store_customer"

(Customer.objects
    .annotate(joined_at_day=TruncDay('joined_at', tzinfo=pytz.timezone('America/New_York')))
    .values('joined_at_day'))

# SELECT DATE_TRUNC(
#     'day',
#     "store_customer"."joined_at" AT TIME ZONE 'America/New_York'
# ) AS "joined_at_day"
# FROM "store_customer"

A step in the right direction!


Building JSON Objects

Ticket #32179

Building JSON objects in PostgreSQL is very handy, especially if you are working with unstructured data.

As of Django 3.2, PostgreSQL's json_build_object function, which accepts arbitrary key-value pairs, is available in the ORM:

Added the JSONObject database function.

One interesting use case is serializing objects directly in the DB, bypassing the need to create ORM objects:

>>> from django.db.models import F
>>> from django.db.models.functions import JSONObject, TruncDay
>>> Customer.objects.annotate(obj=JSONObject(
...     id=F('id'),
...     name=F('name'),
...     joined_at_day=TruncDay('joined_at', tzinfo=pytz.UTC),
... )).values_list('obj').first()

({
    'id': 1,
    'name': 'Haki Benita',
    'joined_at_day': '2021-04-25T00:00:00',
},)

We already showed how crucial serialization performance can be, so this is something to consider.


Loud Signal Receiver

Ticket #32261

A while back I tweeted about a mysterious bug I had that went unnoticed for a long time because it happened inside a signal receiver.

When you use send_robust to broadcast a signal, if a receiver fails, Django catches the error and moves on to the next receiver. After all the receivers have processed the signal, Django returns a list of tuples with each receiver's return value or exception. To check whether any of the receivers failed, you need to go over the list and look for instances of Exception. Signals are often used to decouple modules, and handling exceptions from receivers this way defeats that purpose.
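Checking the results by hand looks something like this. This is a minimal sketch; the helper name is made up, and the commented usage assumes a signal called my_signal exists:

```python
def failed_receivers(results):
    """Given the list of (receiver, response) tuples returned by
    Signal.send_robust(), return the entries whose response is an exception."""
    return [
        (receiver, response)
        for receiver, response in results
        if isinstance(response, Exception)
    ]

# Typical usage with a Django signal (assumed to exist):
#
#     results = my_signal.send_robust(sender=None)
#     for receiver, error in failed_receivers(results):
#         logger.error('receiver %r failed', receiver, exc_info=error)
```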

To make sure I don't miss exceptions in signal receivers again, I created a "loud signal receiver" that logs exceptions:

from django.dispatch import receiver

def loud_receiver(signal, logger, **kwargs):
    """Subscribe to Django signal and log errors from the receiver function.

    When using `send_robust` to send Django Signals, errors happening in the
    receivers are kept and returned. Because signals are mostly used for decoupling
    modules, the return value from `send_robust` is often dismissed.
    To make it easier not to miss errors from Django signal receivers, use this decorator
    instead to log the exceptions to a specific logger.

    NOTE: Not necessary as of Django 3.2

    Example:
        logger = logging.getLogger('some.logger')
        @loud_receiver(signals.SomeSignal, logger=logger, dispatch_uid='uid')
        def receiver_func():
            pass
    """
    def _decorator(func):
        def loud_func(*func_args, **func_kwargs):
            try:
                func(*func_args, **func_kwargs)
            except Exception:
                logger.exception('exception from signal receiver')
                raise
        return receiver(signal, **kwargs)(loud_func)
    return _decorator

As of Django 3.2 this is no longer necessary:

Signal.send_robust() now logs exceptions.

Great!
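The exceptions are written to the django.dispatch logger, so make sure that logger is routed somewhere visible. A minimal stdlib logging setup might look like this (the handler setup is just an example; only the logger name comes from Django):

```python
import logging.config

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Signal.send_robust() logs receiver exceptions to this logger in Django 3.2+
        'django.dispatch': {'handlers': ['console'], 'level': 'ERROR'},
    },
})
```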


QuerySet Alias

Ticket #27719

The alias function is an entirely new feature in Django 3.2:

The new QuerySet.alias() method allows creating reusable aliases for expressions that don’t need to be selected but are used for filtering, ordering, or as a part of complex expressions.

I often use Subquery and OuterRef to write complex queries, and there is a little gotcha when they are combined with annotate:

from django.db.models import Subquery, OuterRef

Customer.objects.annotate(
    id_of_previous_customer=Subquery(
        Customer.objects
        .filter(joined_at__lt=OuterRef('joined_at'))
        .order_by('-joined_at')
        .values('id')[:1],
    )
).filter(id_of_previous_customer__isnull=True)

The query above is a complicated way to find the first customer that joined. The queryset uses Subquery to find, for every customer, the previous customer by joined_at, and then filters for the customer that has no previous customer. This is very inefficient, but it illustrates my point.

To understand the problem, inspect the query this queryset is producing:

SELECT
    "store_customer"."id",
    "store_customer"."joined_at",
    "store_customer"."name",
    (
        SELECT U0."id"
        FROM "store_customer" U0
        WHERE U0."joined_at" < "store_customer"."joined_at"
        ORDER BY U0."joined_at" DESC
        LIMIT 1
    ) AS "id_of_previous_customer"
FROM
    "store_customer"
WHERE
    (
        SELECT U0."id"
        FROM "store_customer" U0
        WHERE U0."joined_at" < "store_customer"."joined_at"
        ORDER BY U0."joined_at" DESC
        LIMIT 1
    ) IS NULL

The annotated subquery appears in both the SELECT and the WHERE clauses. This affects the execution plan:

Seq Scan on store_customer  (cost=0.00..4401.05 rows=50 width=32)
  Filter: ((SubPlan 2) IS NULL)
  SubPlan 1
    ->  Limit  (cost=0.29..0.42 rows=1 width=12)
        ->  Index Scan Backward using store_customer_joined_at_ix on store_customer u0
            (cost=0.29..450.59 rows=3333 width=12)
            Index Cond: (joined_at < store_customer.joined_at)
  SubPlan 2
    ->  Limit  (cost=0.29..0.42 rows=1 width=12)
        ->  Index Scan Backward using store_customer_joined_at_ix on store_customer u0_1
            (cost=0.29..450.59 rows=3333 width=12)
            Index Cond: (joined_at < store_customer.joined_at)

The subquery is executed twice!

To solve this in Django versions prior to 3.2, you can provide a values() that excludes the annotated subquery from the SELECT clause:

# Django 3.1
from django.db.models import Subquery, OuterRef

Customer.objects.annotate(
    id_of_previous_customer=Subquery(
        Customer.objects
        .filter(joined_at__lt=OuterRef('joined_at'))
        .order_by('-joined_at')
        .values('id')[:1],
    )
).filter(id_of_previous_customer__isnull=True).values('id')

Side note: You might think that instead of using values() in this case you can omit the annotated field using .defer('id_of_previous_customer'). This won't work. Django will throw a KeyError: 'id_of_previous_customer' at you!

Starting with Django 3.2, you can replace annotate with alias and the field will not be added to the SELECT clause:

# Django 3.2
from django.db.models import Subquery, OuterRef

Customer.objects.alias(
    id_of_previous_customer=Subquery(
        Customer.objects
        .filter(joined_at__lt=OuterRef('joined_at'))
        .order_by('-joined_at')
        .values('id')[:1],
    )
).filter(id_of_previous_customer__isnull=True)

The generated SQL now uses the subquery only once:

SELECT "store_customer"."id", "store_customer"."joined_at", "store_customer"."name"
FROM "store_customer"
WHERE (
    SELECT U0."id"
    FROM "store_customer" U0
    WHERE U0."joined_at" < "store_customer"."joined_at"
    ORDER BY U0."joined_at" DESC
    LIMIT 1
) IS NULL

The execution plan is simpler:

Seq Scan on store_customer  (cost=0.00..4380.04 rows=50 width=28)
  Filter: ((SubPlan 1) IS NULL)
  SubPlan 1
    ->  Limit  (cost=0.29..0.42 rows=1 width=12)
        ->  Index Scan Backward using store_customer_joined_at_ix on store_customer u0
            (cost=0.29..450.59 rows=3333 width=12)
            Index Cond: (joined_at < store_customer.joined_at)

One less way to shoot yourself in the foot!


New Admin Decorators

Ticket #16117

Before Django 3.2, to customize a calculated field in Django admin you first added a function, and then assigned some attributes to it:

# Django 3.1
from django.contrib import admin

from .models import Customer

@admin.register(Customer)
class CustomerAdmin(admin.ModelAdmin):
    list_display = (
        'id',
        'joined_at',
        'joined_at_year',
        'name',
    )

    def joined_at_year(self, obj: Customer) -> str:
        return obj.joined_at.year
    joined_at_year.admin_order_field = 'joined_at__year'
    joined_at_year.short_description = 'Year joined'

This is the kind of weird API that is only really possible in dynamic languages such as Python.

If you are using Mypy (and you should), this code will trigger an annoying warning, and the only way to silence it is to add a type: ignore comment:

...
joined_at_year.admin_order_field = 'joined_at__year'  # type: ignore[attr-defined]
joined_at_year.short_description = 'Year joined'  # type: ignore[attr-defined]

If you are using Django Admin and Mypy as much as I do, this can be pretty annoying.

The new display decorator solves this problem:

The new display() decorator allows for easily adding options to custom display functions that can be used with list_display or readonly_fields. Likewise, the new action() decorator allows for easily adding options to action functions that can be used with actions.

Adjusting the code to use the new display decorator:

@admin.display(ordering='joined_at__year', description='Year joined')
def joined_at_year(self, obj: Customer) -> str:
    return obj.joined_at.year

No type errors!
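For the curious, the decorator is essentially sugar for setting the same attributes as before, in one place that type checkers understand. A rough pure-Python sketch of the idea (not Django's actual implementation, which also handles boolean, empty_value, and more):

```python
def display(*, ordering=None, description=None):
    """Sketch of what admin.display() roughly does: attach the legacy
    admin attributes to the wrapped function."""
    def decorator(func):
        if ordering is not None:
            func.admin_order_field = ordering
        if description is not None:
            func.short_description = description
        return func
    return decorator

@display(ordering='joined_at__year', description='Year joined')
def joined_at_year(obj):
    return obj.joined_at.year
```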

Another useful decorator is action, which takes a similar approach to customizing admin actions.


Value Expression Detects Type

Ticket #30446

This is a small feature that addresses a small nuisance in the ORM:

Value() expression now automatically resolves its output_field to the appropriate Field subclass based on the type of its provided value for bool, bytes, float, int, str, datetime.date, datetime.datetime, datetime.time, datetime.timedelta, decimal.Decimal, and uuid.UUID instances. As a consequence, resolving an output_field for database functions and combined expressions may now crash with mixed types when using Value(). You will need to explicitly set the output_field in such cases.

In previous Django versions, if you wanted to use a constant value in a query, you had to explicitly set an output_field, otherwise the query would fail:

>>> # Django 3.1
>>> from django.db.models import Value
>>> Customer.objects.annotate(
...     number=Value(1),
...     text=Value('text'),
...     boolean=Value(True),
...     date_=Value(datetime.date(2020, 1, 1)),
...     datetime_=Value(pytz.UTC.localize(datetime.datetime(2020, 1, 1))),
... ).values_list('number', 'text', 'boolean', 'date_', 'datetime_').first()

FieldError: Cannot resolve expression type, unknown output_field

In Django 3.2, the ORM figures it out on its own:

>>> # Django 3.2
>>> from django.db.models import Value
>>> Customer.objects.annotate(
...     number=Value(1),
...     text=Value('text'),
...     boolean=Value(True),
...     date_=Value(datetime.date(2020, 1, 1)),
...     datetime_=Value(pytz.UTC.localize(datetime.datetime(2020, 1, 1))),
... ).values_list('number', 'text', 'boolean', 'date_', 'datetime_').first()

(1, 'text', True, datetime.date(2020, 1, 1), datetime.datetime(2020, 1, 1, 0, 0, tzinfo=<UTC>))

Very cool!


More Mentionable Features

There are plenty more features in Django 3.2 that the documentation explains better than I can. To name a few:

  • Navigable links in the admin (Ticket #31181): Read-only related fields are now rendered as navigable links if the target models are registered in the admin. I'm still using a decorator to add links to Django Admin; I guess I'll have less use for it now.

  • Durable argument for atomic() (Ticket #32220): When code runs inside a database transaction and the transaction finishes without errors, you expect it to be committed to the database. However, if the caller executed your code inside a database transaction of their own and the parent transaction is rolled back, so is yours. To prevent this from happening, you can now mark your transaction as durable. When there is an attempt to open a durable transaction inside another transaction, a RuntimeError is raised.

  • Cached templates are reloaded on Django's development server (Ticket #25791): If you use Django's runserver command to develop locally, you are probably used to it reloading when a Python file changes. However, if you use the django.template.loaders.cached.Loader template loader, the dev server does not reload when an HTML file changes, and you have to restart it to see the changes. This was annoying enough that so far I had to disable the cached loader in dev. Starting with Django 3.2 this is no longer necessary, because cached templates are correctly reloaded in development.

  • Support for function based indexes (Ticket #26167): FBIs are useful when you often query an expression and want to index it. A classic example is indexing lower-cased text.


Wishlist

Django ORM is comprehensive and feature rich, but there are still some things on my wishlist for future versions:

  • Custom joins: Django is currently able to perform joins only between tables that are connected via a ForeignKey. There are situations where you want to join tables that are not necessarily connected with a foreign key, or join using more complex conditions. One common example is slowly changing dimensions, where a join condition requires a BETWEEN operator.

  • Update returning: When updating many rows, it's sometimes useful to fetch them immediately. This is a well-known (and very useful) feature in SQL. Django currently has no support for it, but I hear it might land soon.

  • Database views: There are many hacks for getting database views to work with the ORM. These hacks usually involve creating a view directly in the database or in a manual migration, and then setting managed=False on the model. They get the job done, but not in a very graceful way. I wish there were a way to define database views so that migrations can detect changes. Maybe even an option to create views using a Django queryset.

  • Database partitions: Database partitions are extremely useful in data modeling. When used correctly they can make queries much faster and maintenance a lot easier. Some database engines such as Oracle already provide very mature implementations of database partitioning, and other engines such as PostgreSQL are getting there. At the moment, there is no native support for database partitioning in Django, and most implementations I've seen resort to manually managing tables. As a result, I often avoid partitions altogether, and that's unfortunate.

  • Require authentication by default: Django currently permits access to any view unless it is explicitly marked otherwise, usually using the login_required decorator. This makes it easier to get started with Django, but it can potentially cause security issues down the road if you are not careful. I know there are solutions for this, usually involving custom middleware and decorators. I really wish Django had an option to flip the condition, so that access is restricted by default unless marked otherwise.

  • Typing: If you are following this blog you know I'm a big fan of type hinting in Python. At the moment, Django does not come with type hinting or official stubs. Shiny new frameworks such as Starlette and FastAPI advertise themselves as being 100% type annotated, but Django is still lagging behind. There is a project called django-stubs that is making some progress in this regard.

  • Database connection pooling: Django currently supports two modes for managing database connections - creating a new connection per request, or a new connection per thread (persistent connections). Creating a database connection in common deployments is a relatively heavy operation: it requires setting up a TCP connection, often a TLS connection, and initializing the connection, all of which add significant latency. In PostgreSQL in particular, it also consumes many database server resources, so creating a new connection per request is a really bad idea.

    Persistent connections are much better. They work well with the way Django is usually deployed - a small number of worker processes and/or threads. But such deployments tend to break down under real-world conditions. Whenever your database or one of your upstreams starts taking longer to process requests for some reason, the workers get tied up, requests back up, and the entire system chokes. Even with strict timeouts, this will still happen.

    To improve upon this catastrophic failure mode, a common solution is to use async workers such as gevent greenlets or, in the future, asyncio tasks. But now each request gets its own lightweight thread, hence its own connection, which renders Django's persistent connections feature useless.

    It would be great if Django included a high-quality connection pool, which maintains a certain number of connections and hands them out to requests as needed. External solutions like PgBouncer exist, but they add operational overhead. A built-in solution would often be sufficient.



