Using the array type for user and package queries¶
Currently, we use auxiliary tables to query for messages related to packages or users, in the standard RDBS fashion.
We came to some problems when trying to enforce foreign key constrains while using the timescaledb extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
Array columns support indexing with Generalized Inverted Index, GIN, that allows for fast searches on membership and intersection. Because we mostly search for memebership, array column could be performant enough for our purposes.
To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
To add new column
alter table messages2 add column packages text;
To populate it
update messages2 set packages=t_agg.p_agg from (select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
We need to enable the btree_gin extension to be able to create index with array as well as timestamp
CREATE EXTENSION btree_gin;
To create the index
CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");
To help reuse our testing script, we setup postgrest locally
podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
Because we focused only on package queries, as user colun couldn’t be populated due to constraints on size, we chose two as representative. There is implicit limit to return just 25 rows.
A simple membership:
A simple membership ordered by time.
Querying just the package membership¶
The queries were surprisingly fast, with maximum under 4 seconds and mean around half a second. This encouraged us to do further experiments.
test_filter_by_package Requests: 300, pass: 300, fail: 0, exception: 0 For pass requests: Request per Second - mean: 3.63 Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
Querying just the package membership ordered by timestamp desc¶
Usually we want to see most recent messages. So we ammended the query, to include “order by timestamp desc”. The result was less encouraging, with longest succesful query taking more than 90 seconds and several timing out.
This seems to be the result of GIN index not supporting order in the index.
test_filter_by_package Requests: 300, pass: 280, fail: 0, exception: 20 For pass requests: Request per Second - mean: 0.53 Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
While array support seems interesting, and for simple queries very fast, indexes that require ordering don’t seem to be supported. This makes strong case against using them.