Partitioning the database
=========================

In the database used by datanommer and datagrepper one table stands out from the other
ones by its size, the ``messages`` table. This can be observed in :ref:`datanommer`.

One possibility to speed things up in datagrepper is to partition that table into a set
of smaller sized partitions.

Here are some resources regarding partitioning postgresql tables:

- Table partitioning at postgresql's documentation:
  https://www.postgresql.org/docs/13/ddl-partitioning.html
- How to use table partitioning to scale PostgreSQL:
  https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
- Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/

Attempt #1
----------

For our first attempt at partitioning the `messages` table, we thought we would
partition it by year. Having a different partition for each year. We thus started by
adding a ``year`` field to the table and fill it by extracting the year from the
``timestamp`` field of the table.

However, one thing to realize when using partitioned table is that each partition needs
to be considered as an independant table. Meaning an unique constraint has to involve
the field on which the table is partitioned. In other words, if you partition the table
by a year field, that year field will need to be part of the primary key as well as any
``UNIQUE`` constraint on the table.

So to partition the `messages` table on ``year``, we had to add the ``year`` field to
the primary key. However, that broke the foreign key constraints on the
``user_messages`` and ``package_messages`` tables which rely on the ``id`` field to link
the tables.

Attempt #2
----------

Since partitioning on ``year`` did not work, we reconsidered and decided to partition on
the ``id`` field instead using `RANGE PARTITION`.

We partitioned the ``messages`` table on the ``id`` field with partition of 10 million
records each. This has the advantage of making each partition of similar sizes.

More resources
--------------

These are a few more resources we looked at and thought were worth bookmarking:

- Automatic partitioning by day - PostgreSQL:
  https://stackoverflow.com/questions/55642326/
- pg_partman, partition manager: https://github.com/pgpartman/pg_partman
- How to scale PostgreSQL 10 using table inheritance and declarative partitioning:
  https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/