Django

In Identify Distribution Strategy we discussed the framework-agnostic database changes required for using Citus in the multi-tenant use case. Here we investigate specifically how to migrate multi-tenant Django applications to a Citus storage backend with the help of the django-multitenant library.

This process will be in 5 steps:

  • Introducing the tenant column to models missing it that we want to distribute
  • Changing the primary keys of distributed tables to include the tenant column
  • Updating the models to use the TenantModelMixin
  • Distributing the data
  • Updating the Django Application to scope queries

Preparing to scale-out a multi-tenant application

Initially you’ll start with all tenants placed on a single database node. To be able to scale out django, some simple changes will have to be made to your models.

Let’s consider this simplified model:

from django.utils import timezone
from django.db import models

class Country(models.Model):
    name = models.CharField(max_length=255)

class Account(models.Model):
    name = models.CharField(max_length=255)
    domain = models.CharField(max_length=255)
    subdomain = models.CharField(max_length=255)
    country = models.ForeignKey(Country, on_delete=models.SET_NULL)

class Manager(models.Model):
    name = models.CharField(max_length=255)
    account = models.ForeignKey(Account, on_delete=models.CASCADE,
                                related_name='managers')

class Project(models.Model):
    name = models.CharField(max_length=255)
    account = models.ForeignKey(Account, related_name='projects',
                                on_delete=models.CASCADE)
    managers = models.ManyToManyField(Manager)

class Task(models.Model):
    name = models.CharField(max_length=255)
    project = models.ForeignKey(Project, on_delete=models.CASCADE,
                                related_name='tasks')

The tricky thing with this pattern is that in order to find all tasks for an account, you’ll have to query for all of an account’s project first. This becomes a problem once you start sharding data, and in particular when you run UPDATE or DELETE queries on nested models like task in this example.

1. Introducing the tenant column to models belonging to an account

1.1 Introducing the column to models belonging to an account

In order to scale out a multi-tenant model, it’s essential for queries to quickly locate all records that belong to an account. Consider an ORM call such as:

Project.objects.filter(account_id=1).prefetch_related('tasks')

It generates these underlying SQL queries:

SELECT *
FROM myapp_project
WHERE account_id = 1;

SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3);

However, the second query would go faster with an extra filter:

-- the AND clause identifies the tenant
SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3)
      AND account_id = 1;

This way you can easily query the tasks belonging to one account. The easiest way to achieve this is to simply add a account_id column on every object that belongs to an account.

In our case:

class Task(models.Model):
    name = models.CharField(max_length=255)
    project = models.ForeignKey(Project, on_delete=models.CASCADE,
                                related_name='tasks')
    account = models.ForeignKey(Account, related_name='tasks',
                                on_delete=models.CASCADE)

Create a migration to reflect the change: python manage.py makemigrations.

1.2. Introduce a column for the account_id on every ManyToMany model that belongs to an account

The goal is the same as previously. We want to be able to have ORM calls and queries routed to one account. We also want to be able to distribute the ManyToMany relationship related to an account on the account_id.

So the calls generated by:

Project.objects.filter(account_id=1).prefetch_related('managers')

Can include in their WHERE clause the account_id like this:

SELECT *
FROM "myapp_project" WHERE "myapp_project"."account_id" = 1;

SELECT *
FROM myapp_manager manager
INNER JOIN myapp_projectmanager projectmanager
ON (manager.id = projectmanager.manager_id
AND  projectmanager.account_id = manager.account_id)
WHERE projectmanager.project_id IN (1, 2, 3)
AND manager.account_id = 1;

For that we need to introduce through models. In our case:

class Project(models.Model):
    name = models.CharField(max_length=255)
    account = models.ForeignKey(Account, related_name='projects',
                                on_delete=models.CASCADE)
    managers = models.ManyToManyField(Manager, through='ProjectManager')

class ProjectManager(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
    account = models.ForeignKey(Account, on_delete=models.CASCADE)

Create a migration to reflect the change: python manage.py makemigrations.

2. Include the account_id in all primary keys and unique constraints

Primary-key and unique constraints on values other than the tenant_id will present a problem in any distributed system, since it’s difficult to make sure that no two nodes accept the same unique value. Enforcing the constraint would require expensive scans of the data across all nodes.

To solve this problem, for the models which are logically related to an account (the tenant for our app), you should add account_id to the primary keys and unique constraints, effectively scoping objects unique inside a given account. This helps add the concept of tenancy to your models, thereby making the multi-tenant system more robust.

2.1 Including the account_id to primary keys

Django automatically creates a simple “id” primary key on models, so we will need to circumvent that behavior with a custom migration of our own. Run python manage.py makemigrations appname --empty --name remove_simple_pk, and edit the result to look like this:

from django.db import migrations

class Migration(migrations.Migration):

  dependencies = [
    # leave this as it was generated
  ]

  operations = [
    # Django considers "id" the primary key of these tables, but
    # we want the primary key to be (account_id, id)
    migrations.RunSQL("""
      ALTER TABLE myapp_manager
      DROP CONSTRAINT myapp_manager_pkey CASCADE;

      ALTER TABLE myapp_manager
      ADD CONSTRAINT myapp_manager_pkey
      PRIMARY KEY (account_id, id);
    """),

    migrations.RunSQL("""
      ALTER TABLE myapp_project
      DROP CONSTRAINT myapp_project_pkey CASCADE;

      ALTER TABLE myapp_project
      ADD CONSTRAINT myapp_product_pkey
      PRIMARY KEY (account_id, id);
    """),

    migrations.RunSQL("""
      ALTER TABLE myapp_task
      DROP CONSTRAINT myapp_task_pkey CASCADE;

      ALTER TABLE myapp_task
      ADD CONSTRAINT myapp_task_pkey
      PRIMARY KEY (account_id, id);
    """),

    migrations.RunSQL("""
      ALTER TABLE myapp_projectmanager
      DROP CONSTRAINT myapp_projectmanager_pkey CASCADE;

      ALTER TABLE myapp_projectmanager
      ADD CONSTRAINT myapp_projectmanager_pkey PRIMARY KEY (account_id, id);
    """),
  ]

2.2 Including the account_id to unique constraints

The same thing needs to be done for UNIQUE constraints. You can have explicit constraints that you might have set in your model with unique=True or unique_together like:

class Project(models.Model):
    name = models.CharField(max_length=255, unique=True)
    account = models.ForeignKey(Account, related_name='projects',
                                on_delete=models.CASCADE)
    managers = models.ManyToManyField(Manager, through='ProjectManager')

class Task(models.Model):
    name = models.CharField(max_length=255)
    project = models.ForeignKey(Project, on_delete=models.CASCADE,
                                related_name='tasks')
    account = models.ForeignKey(Account, related_name='tasks',
                                on_delete=models.CASCADE)

    class Meta:
        unique_together = [('name', 'project')]

For these constraints, you can simply change in the models the constraints:

class Project(models.Model):
    name = models.CharField(max_length=255)
    account = models.ForeignKey(Account, related_name='projects',
                                on_delete=models.CASCADE)
    managers = models.ManyToManyField(Manager, through='ProjectManager')

    class Meta:
        unique_together = [('account', 'name')]

class Task(models.Model):
    name = models.CharField(max_length=255)
    project = models.ForeignKey(Project, on_delete=models.CASCADE,
                                related_name='tasks')
    account = models.ForeignKey(Account, related_name='tasks',
                                on_delete=models.CASCADE)

    class Meta:
        unique_together = [('account', 'name', 'project')]

Then generate the migration with:

python manage.py makemigrations

Some UNIQUE constraints are created by the ORM and you will need to explicitly drop them. This is the case for OneToOneField and ManyToMany fields.

For these cases you will need to: 1. Find the constraints 2. Do a migration to drop them 3. Re-create constraints including the account_id field

To find the constraints, connect to your database using psql and run \d+ myapp_projectmanager You will see the ManyToMany (or OneToOneField) constraint:

"myapp_projectmanager" UNIQUE CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq,
btree (project_id, manager_id)

Drop this constraint in a migration:

from django.db import migrations

class Migration(migrations.Migration):

  dependencies = [
    # leave this as it was generated
  ]

  operations = [
    migrations.RunSQL("""
      DROP CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq;
    """),

Then change your models to have a unique_together including the account\_id

class ProjectManager(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
    account = models.ForeignKey(Account, on_delete=models.CASCADE)

    class Meta:
        unique_together=(('account', 'project', 'manager'))

And finally apply the changes by creating a new migration to generate these constraints:

python manage.py makemigrations

3. Updating the models to use TenantModelMixin and TenantForeignKey

Next, we’ll use the django-multitenant library to add account_id to foreign keys, and make application queries easier later on.

In requirements.txt for your Django application, add

django_multitenant>=2.0.0, <3

Run pip install -r requirements.txt.

In settings.py, change the database engine to the customized engine provided by django-multitenant:

'ENGINE': 'django_multitenant.backends.postgresql'

3.1 Introducing the TenantModelMixin and TenantManager

The models will now not only inherit from models.Model but also from the TenantModelMixin.

To do that in your models.py file you will need to do the following imports

from django_multitenant.mixins import *

Previously our example models inherited from just models.Model, but now we need to change them to also inherit from TenantModelMixin. The models in real projects may inherit from other mixins too like django.contrib.gis.db, which is fine.

You will also, at this point, introduce the tenant_id to define which column is the distribution column.

class TenantManager(TenantManagerMixin, models.Manager):
    pass

class Account(TenantModelMixin, models.Model):
    ...
    tenant_id = 'id'
    objects = TenantManager()

class Manager(TenantModelMixin, models.Model):
    ...
    tenant_id = 'account_id'
    objects = TenantManager()

class Project(TenantModelMixin, models.Model):
    ...
    tenant_id = 'account_id'
    objects = TenantManager()

class Task(TenantModelMixin, models.Model):
    ...
    tenant_id = 'account_id'
    objects = TenantManager()

class ProjectManager(TenantModelMixin, models.Model):
    ...
    tenant_id = 'account_id'
    objects = TenantManager()

3.2 Handling ForeignKey constraints

For ForeignKey and OneToOneField constraint, we have a few different cases:

  • Foreign keys (or One to One) between distributed tables, for which you should use the TenantForeignKey (or TenantOneToOneField).
  • Foreign keys between a distributed table and a reference table don’t require a change.
  • Foreign keys between a distributed table and a local table, which require to drop the constraint by using models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False).

Finally your models should look like this:

from django.db import models
from django_multitenant.fields import TenantForeignKey
from django_multitenant.mixins import *

class Country(models.Model):  # This table is a reference table
  name = models.CharField(max_length=255)

class TenantManager(TenantManagerMixin, models.Manager):
    pass

class Account(TenantModelMixin, models.Model):
    name = models.CharField(max_length=255)
    domain = models.CharField(max_length=255)
    subdomain = models.CharField(max_length=255)
    country = models.ForeignKey(Country, on_delete=models.SET_NULL)  # No changes needed

    tenant_id = 'id'
    objects = TenantManager()

class Manager(TenantModelMixin, models.Model):
    name = models.CharField(max_length=255)
    account = models.ForeignKey(Account, related_name='managers',
                                on_delete=models.CASCADE)
    tenant_id = 'account_id'
    objects = TenantManager()

class Project(TenantModelMixin, models.Model):
    account = models.ForeignKey(Account, related_name='projects',
                                on_delete=models.CASCADE)
    managers = models.ManyToManyField(Manager, through='ProjectManager')
    tenant_id = 'account_id'
    objects = TenantManager()

class Task(TenantModelMixin, models.Model):
    name = models.CharField(max_length=255)
    project = TenantForeignKey(Project, on_delete=models.CASCADE,
                             related_name='tasks')
    account = models.ForeignKey(Account, on_delete=models.CASCADE)

    tenant_id = 'account_id'
    objects = TenantManager()

class ProjectManager(TenantModelMixin, models.Model):
    project = TenantForeignKey(Project, on_delete=models.CASCADE)
    manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
    account = models.ForeignKey(Account, on_delete=models.CASCADE)

    tenant_id = 'account_id'
    objects = TenantManager()

3.3 Handling ManyToMany constraints

In the second section of this article, we introduced the fact that with citus, ManyToMany relationships require a through model with the tenant column. Which is why we have the model:

class ProjectManager(TenantModelMixin, models.Model):
    project = TenantForeignKey(Project, on_delete=models.CASCADE)
    manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
    account = models.ForeignKey(Account, on_delete=models.CASCADE)

    tenant_id = 'account_id'
    objects = TenantManager()

After installing the library, changing the engine, and updating the models, run python manage.py makemigrations. This will produce a migration to make the foreign keys composite when necessary.

4. Distribute data in Citus

We need one final migration to tell Citus to mark tables for distribution. Create a new migration python manage.py makemigrations appname --empty --name distribute_tables. Edit the result to look like this:

from django.db import migrations
from django_multitenant.db import migrations as tenant_migrations

class Migration(migrations.Migration):
  dependencies = [
    # leave this as it was generated
  ]

  operations = [
    tenant_migrations.Distribute('Country', reference=True),
    tenant_migrations.Distribute('Account'),
    tenant_migrations.Distribute('Manager'),
    tenant_migrations.Distribute('Project'),
    tenant_migrations.Distribute('ProjectManager'),
    tenant_migrations.Distribute('Task'),
  ]

With all the migrations created from the steps so far, apply them to the database with python manage.py migrate.

At this point the Django application models are ready to work with a Citus backend. You can continue by importing data to the new system and modifying views as necessary to deal with the model changes.

Updating the Django Application to scope queries

The django-multitenant library discussed in the previous section is not only useful for migrations, but also for simplifying application queries. The library allows application code to easily scope queries to a single tenant. It automatically adds the correct SQL filters to all statements, including fetching objects through relations.

For instance, in a view simply set_current_tenant and all the queries or joins afterward will include a filter to scope results to a single tenant.

# set the current tenant to the first account
s = Account.objects.first()
set_current_tenant(s)

# now this count query applies only to Project for that account
Project.objects.count()

# Find tasks for very important projects in the current account
Task.objects.filter(project__name='Very important project')

In the context of an application view, the current tenant object can be stored as a SESSION variable when a user logs in, and view actions can set_current_tenant to this value. See the README in django-multitenant for more examples.

The set_current_tenant function can also take an array of objects, like

set_current_tenant([s1, s2, s3])

which updates the internal SQL query with a filter like tenant_id IN (a,b,c).

Automating with middleware

Rather than calling set_current_tenant() in each view, you can create and install a new middleware class in your Django application to do it automatically.

# src/appname/middleware.py

from django_multitenant.utils import set_current_tenant

class MultitenantMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.user and not request.user.is_anonymous:
            set_current_tenant(request.user.employee.company)
        response = self.get_response(request)
        return response

Enable the middleware by updating the MIDDLEWARE array in src/appname/settings/base.py:

MIDDLEWARE = [
    # ...
    # existing items
    # ...

    'appname.middleware.MultitenantMiddleware'
]