SQLAlchemy Pattern: Raiseload Under Test

SQLAlchemy’s default lazy loading is, most of the time, what I want. We access a relationship attribute, the ORM emits the query, we get only the data that we need when we need it. But it will do the same thing in a loop. Add an attribute access to a serialiser or template, not realising it’s an unloaded relationship, and a list endpoint that fired a handful of queries starts firing one per row, or worse. That’s the N+1 problem, and it tends to get through tests and review quietly, because nothing actually breaks - the page still renders, just with hundreds of queries behind it.

This post describes a simple pattern for catching that class of regression under test, without changing anything in production. Loud in tests, silent in prod - that’s the idea.

It’s easily done, especially when making a small change in a complex system with normalised data. I run into it less when building a new feature, because “how do I make this performant?” is already at the forefront of my mind. It’s the seemingly innocuous change to something that already works that tends to get me.

The obvious fix is raiseload: set it on a query and touching any unloaded relationship raises instead of going back to the database. Running it everywhere is too inconvenient, and I really don’t want it switched on in production. A missing eager load shouldn’t blow up a request - it should just make it a bit slower until it shows up in the performance monitoring.

What I do want is for the queries that feed list views, exports and reports to be loud about it under test. Those are the ones where an unexpected lazy load almost always means a regression. Everywhere else, the default behaviour is fine.

So I put raiseload behind a switch that’s only on under test, and opt into it per query:

# myapp/db_utils.py
_RAISELOAD_UNDER_TEST = False


def raiseload_under_test():
    """Return raiseload options when running under pytest, an empty list otherwise.

    Splat into the ``options=`` list of a query so tests catch
    accidental lazy loads: ``options=[selectinload(cls.foo), *raiseload_under_test()]``.
    """
    return [raiseload("*", sql_only=True)] if _RAISELOAD_UNDER_TEST else []

sql_only=True avoids raising on relationships that are already loaded in the session’s identity map, which we run into frequently in the real world (e.g. parent → child → parent accesses). An already-loaded relationship should be used, not needlessly re-fetched, and not treated as an error.

The conftest switches the flag on before the tests run:

# conftest.py
import myapp.db_utils as db_utils

db_utils._RAISELOAD_UNDER_TEST = True

# Other imports ...

The helper reads the flag when a query is built, not when the module is imported, so it’s enough to set this before any test runs a query. A fixture or a pytest_configure hook would do just as well.

In use, a list-feeding classmethod eager-loads what it needs and splats in the raiseload:

class Author(Base):
    # ...
    books: Mapped[list["Book"]] = relationship(back_populates="author")

    @classmethod
    def list_with_books(cls, session):
        stmt = select(cls).options(
            selectinload(cls.books),
            *raiseload_under_test(),
        )
        return session.scalars(stmt).all()

Define the relationships the query is meant to eager-load, explicitly lazyload if necessary, then splat in the raiseload. In production the splat expands to nothing. Under test, anything you forgot to eager-load raises InvalidRequestError whenever a test touches it. The wildcard isn’t shallow, so a forgotten eager load one level down (say book.genre when the query only loaded books) raises too.

It catches a specific class of regression. If you have a test that exercises the endpoint, it’ll fail the next test run instead of getting deployed. It does nothing for queries that are slow for other reasons (bad indexes, missing constraints, fundamentally expensive joins), and it’s no substitute for application performance monitoring or load testing, especially if performance is critical.

One last thing worth pointing out: if you’re hitting this a lot, the database itself is often the better answer. To perhaps state the obvious, your SQL database isn’t just a data store, it’s a powerful system for processing data, and it’s generally much more efficient to do that processing on the server than to transfer the data back to your Python process. One of SQLAlchemy’s strengths is how capable it is for arbitrary queries as well as classic ORM-style access, and the first defence against inefficient querying is to write statements that match the shape of what you’re trying to access. But there are plenty of times you want the mapped instance, and writing queries for everything defeats the point of an ORM. When you do write the join yourself, contains_eager lets you route those joined rows straight onto the relationship, so you get the mapped instance without a second query. For the everyday “forgot to eager-load that” regression, this keeps it out of production. Another tool in the toolbox.