Partitioning the database
In the database used by datanommer and datagrepper one table stands out from the other
ones by its size, the messages
table. This can be observed in Datanommer.
One possibility to speed things up in datagrepper is to partition that table into a set of smaller sized partitions.
Here are some resources regarding partitioning postgresql tables:
Table partitioning at postgresql’s documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/
Attempt #1
For our first attempt at partitioning the messages table, we thought we would
partition it by year. Having a different partition for each year. We thus started by
adding a year
field to the table and fill it by extracting the year from the
timestamp
field of the table.
However, one thing to realize when using partitioned table is that each partition needs
to be considered as an independant table. Meaning an unique constraint has to involve
the field on which the table is partitioned. In other words, if you partition the table
by a year field, that year field will need to be part of the primary key as well as any
UNIQUE
constraint on the table.
So to partition the messages table on year
, we had to add the year
field to
the primary key. However, that broke the foreign key constraints on the
user_messages
and package_messages
tables which rely on the id
field to link
the tables.
Attempt #2
Since partitioning on year
did not work, we reconsidered and decided to partition on
the id
field instead using RANGE PARTITION.
We partitioned the messages
table on the id
field with partition of 10 million
records each. This has the advantage of making each partition of similar sizes.
More resources
These are a few more resources we looked at and thought were worth bookmarking:
Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
pg_partman, partition manager: https://github.com/pgpartman/pg_partman
How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/