Partitioning the database¶
In the database used by datanommer and datagrepper one table stands out from the
other ones by its size, the
messages table. This can be observed in
One possibility to speed things up in datagrepper is to partition that table into a set of smaller sized partitions.
Here are some resources regarding partitioning postgresql tables:
Table partitioning at postgresql’s documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/
For our first attempt at partitioning the messages table, we thought we would
partition it by year. Having a different partition for each year.
We thus started by adding a
year field to the table and fill it by extracting
the year from the
timestamp field of the table.
However, one thing to realize when using partitioned table is that each partition
needs to be considered as an independant table. Meaning an unique constraint has
to involve the field on which the table is partitioned.
In other words, if you partition the table by a year field, that year field will
need to be part of the primary key as well as any
UNIQUE constraint on the
So to partition the messages table on
year, we had to add the
field to the primary key. However, that broke the foreign key constraints on
package_messages tables which rely on the
field to link the tables.
Since partitioning on
year did not work, we reconsidered and decided to
partition on the
id field instead using RANGE PARTITION.
We partitioned the
messages table on the
id field with partition of 10
million records each. This has the advantage of making each partition of similar
These are a few more resources we looked at and thought were worth bookmarking:
Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
pg_partman, partition manager: https://github.com/pgpartman/pg_partman
How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/