Seek to a position in an ordered relation using PostgreSQL
I need to seek to a given position within a sorted relation to fetch an object
and it’s neighbors. To get the sorted relation I also need to filter by
categories and a status column and then assign a position to each tuple in
the relation so that I can seek to the position that I want.
Here is an example. The where clause specifies the position and how many
neighbors are needed.
The problem with is that with 2M+ rows in the objects table this query takes
about 3000ms on a modern laptop and about 900ms on production database hardware.
Even at 900ms that’s an order of magnitude (maybe 2) longer than most high
traffic websites can tolerate.
Because of the filtering and the need to find the object at a given position you
need to materialize the sorted and filtered relation somehow before you seek to
the position that you want. That’s what the CTE is doing. I upload an explain
from sample data on explain.depesz.comhere so you can see the costs.
Building that CTE for every page load is expensive.
One way to get around this problem would be to materialize or denormalize the
data. This adds some storage overhead but should allow for much faster query
Something like the following should suffice to store the data. The assumption
is that we will populate and maintain this relation with the status values that
we want which only leaves the category_id to filter by. Ideally you wouldn’t
have to filter by the category_id either which would keep you from having
to compute the position each time.
Then to populate the relation we can use the following.
Then we can use this query to get the tuples matching the categories
This works well and gives us a 20x speedup on a laptop (about 10x on production
hardware). The cost is about 80-100MB of data and indexes for the table and
the maintenance of the table which I won’t go into here (changing categories,
removing those that don’t match the status we are looking for, etc).
Another approach would be to use arrays and the PostgreSQL type system.
PostgreSQL allows you to create compound types which can be used as the
member type for arrays.
Here is an example. First we need to create the type. This will be the element
type for the array.
Next, we create the table to hold the array of objects in the heterogeneous
(by category_id) container. Each tuple will be uniquely identified by the
container_id and the type. The type will identify the type of ordering.
Now we can populate the array with the tuples that match the status criteria so
that at query time we only need to to filter by the category. We’ll also have
to maintain the arrays via triggers or a cron job to add or remove objects when
the status changes. One of the nice properties of using arrays like this is that
the updates to an array can be done atomically.
With the arrays populated by container_id we can now get the tuples
we want using UNNEST.
This gives us about the same speedup as object_container_order_rollup
above but at one tenth the storage costs (8K vs 80MB). You can probably shave
off another 80% of the query cost by indexing directly into the array if you
don’t have to do any filtering at display time.
To conclude, if you need to seek to a given position within a sorted relation
to fetch an object and it’s neighbors you have a few options for materializing
the data. You can materialize the data in a table or an array and filter at
query time. Materializing to a table is straightforward should probably be the
default for most scenarios. Using a custom type and an array is also an option
but should be used with caution and may not scale well with a lorge number of
In PostgreSQL 9.3 you could also create a materialized view and see if that
will give you the performance you need.