Seek to a position in an ordered relation using PostgreSQL
I need to seek to a given position within a sorted relation to fetch an object and it’s neighbors. To get the sorted relation I also need to filter by categories and a status column and then assign a position to each tuple in the relation so that I can seek to the position that I want.
Here is an example. The where clause specifies the position and how many neighbors are needed.
The problem with is that with 2M+ rows in the objects table this query takes about 3000ms on a modern laptop and about 900ms on production database hardware. Even at 900ms that’s an order of magnitude (maybe 2) longer than most high traffic websites can tolerate.
Because of the filtering and the need to find the object at a given position you need to materialize the sorted and filtered relation somehow before you seek to the position that you want. That’s what the CTE is doing. I upload an explain from sample data on explain.depesz.com here so you can see the costs. Building that CTE for every page load is expensive.
One way to get around this problem would be to materialize or denormalize the data. This adds some storage overhead but should allow for much faster query times.
Something like the following should suffice to store the data. The assumption
is that we will populate and maintain this relation with the status values that
we want which only leaves the category_id
to filter by. Ideally you wouldn’t
have to filter by the category_id
either which would keep you from having
to compute the position each time.
Then to populate the relation we can use the following.
Then we can use this query to get the tuples matching the categories we want.
This works well and gives us a 20x speedup on a laptop (about 10x on production hardware). The cost is about 80-100MB of data and indexes for the table and the maintenance of the table which I won’t go into here (changing categories, removing those that don’t match the status we are looking for, etc).
Another approach would be to use arrays and the PostgreSQL type system. PostgreSQL allows you to create compound types which can be used as the member type for arrays.
Here is an example. First we need to create the type. This will be the element type for the array.
Next, we create the table to hold the array of objects in the heterogeneous
(by category_id
) container. Each tuple will be uniquely identified by the
container_id
and the type
. The type
will identify the type of ordering.
Now we can populate the array with the tuples that match the status criteria so that at query time we only need to to filter by the category. We’ll also have to maintain the arrays via triggers or a cron job to add or remove objects when the status changes. One of the nice properties of using arrays like this is that the updates to an array can be done atomically.
With the arrays populated by container_id
we can now get the tuples
we want using UNNEST
.
This gives us about the same speedup as object_container_order_rollup
above but at one tenth the storage costs (8K vs 80MB). You can probably shave
off another 80% of the query cost by indexing directly into the array if you
don’t have to do any filtering at display time.
To conclude, if you need to seek to a given position within a sorted relation to fetch an object and it’s neighbors you have a few options for materializing the data. You can materialize the data in a table or an array and filter at query time. Materializing to a table is straightforward should probably be the default for most scenarios. Using a custom type and an array is also an option but should be used with caution and may not scale well with a lorge number of elements.
In PostgreSQL 9.3 you could also create a materialized view and see if that will give you the performance you need.
The SQL to setup the tables and the data can be found here if you want to experiment with it. Here is the SQL to create and populate the rollup table and the SQL to create and populate the arrays.