ORMs vs Plain SQL in Python
Introduction
As part of the Marketing R&D team at Lightricks, we encountered performance and clean code issues when using Django ORM for our database management. We decided to move to a plain SQL approach in newer projects to facilitate database optimizations and better boundaries in our code.
In this article I’d like to discuss using ORMs vs writing plain SQL queries.
In the next article, if you decide to use plain SQL, I’ll explain how we use Flyway as our migration tool for schema management in a CI/CD workflow.
ORM vs Plain SQL
Working with python, it’s common to use an SQL abstraction like Django ORM or SQL alchemy. While these approaches work well for new applications, they quickly fail at scaling your data or your team.
When working with large applications, optimizations to the database structure and queries often need to be made. For example, as our database grows we may want to partition it into multiple sub-tables. Or perhaps we need control of our query to make database specific optimizations.
We encountered both problems, and had to write most of the select queries ourselves, and redefine tables created by Django.
On the surface, using an SQL abstraction is very tempting:
- Tests are very clean and don’t require extra code.
- Record inserts are very easy to implement, assuming we are passing ORM objects in our code, since we just need to call
.save()
- SQL Abstraction — ability to easily switch between database implementations, without writing any SQL.
No need to write SQL queries - Migrations are automatically generated and come with a built-in migration tool.
This saves a lot of development time when you first start the project, but comes at a cost:
Coupling
Since ORM models already contain all the data you need, your team will be tempted to use it outside of database related code.
For example, say you’re fetching data for a Todo list, you’ll probably create the model instance in the parsing code and start passing it around to other functions. However, ORM models are often mutable, and have information that has nothing to do with most of your code. Any function that receives the model can change it, which makes code hard to debug and test.
Instead, if you create proper boundaries with immutable objects(using a NamedTuple
/ attrs
/ dataclasses
), you’ll make your code much simpler and easier to test, at the cost of a few more lines of code.
Schema/Query Optimizations
While ORMs work fine when you first start out, as your data grows you’ll want to optimize the table structure and queries:
- Partition tables by timestamp/country/client etc.
- Use engine specific tricks to increase performance tenfold.
When you encounter these problems, you’ll need to write SQL queries manually anyway, and write migrations yourselves, so using ORMs suddenly doesn’t save that much time.
Summary
If you’re planning on working in a small group on a small project, an ORM approach will save you plenty of time. But if you anticipate big amounts of data and scaling your app to more developers, I suggest you try using plain SQL instead.
If you do use an ORM, I suggest you only use the database objects in separate data layer files, to make code more functional, cleaner and easier to test.
Please feel free to leave your thoughts below, as this is written from personal experience and getting input is always neat.