Partitioning the database
In the database used by datanommer and datagrepper one table stands out from the other
ones by its size, the
messages table. This can be observed in Datanommer.
One possibility to speed things up in datagrepper is to partition that table into a set of smaller sized partitions.
Here are some resources regarding partitioning postgresql tables:
Table partitioning at postgresql’s documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/
For our first attempt at partitioning the messages table, we thought we would
partition it by year. Having a different partition for each year. We thus started by
year field to the table and fill it by extracting the year from the
timestamp field of the table.
However, one thing to realize when using partitioned table is that each partition needs
to be considered as an independant table. Meaning an unique constraint has to involve
the field on which the table is partitioned. In other words, if you partition the table
by a year field, that year field will need to be part of the primary key as well as any
UNIQUE constraint on the table.
So to partition the messages table on
year, we had to add the
year field to
the primary key. However, that broke the foreign key constraints on the
package_messages tables which rely on the
id field to link
Since partitioning on
year did not work, we reconsidered and decided to partition on
id field instead using RANGE PARTITION.
We partitioned the
messages table on the
id field with partition of 10 million
records each. This has the advantage of making each partition of similar sizes.
These are a few more resources we looked at and thought were worth bookmarking:
Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
pg_partman, partition manager: https://github.com/pgpartman/pg_partman
How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/