We are experiencing significant slowdown due to our use of Docker repository hosting with Artifactory. One of the issues we are facing seems to be inefficient Docker database queries for nodes with a particular SHA-256. This query is not making use of the index on node_props.prop_value:
This is the node_props table on PostgreSQL:
The "node_props_prop_value_idx" index is on substr(prop_value::text, 1, 255). However, this Docker query does not specify substr() in the same way, so this index is not used. PostgreSQL does not have special support for substr() in the query planner, so this index is useless in the above query.
I am guessing that the substr() use was added as a result of issues such as this:
The problem is that there is a limit on the amount of data that can be stored in the index row. A typical solution to this may include truncating the value being searched upon. However, the PostgreSQL query planner does not know about substr(), and it does not recognize that it might be possible to use it to help find the SHA-256 value in the node_props table. Consequently, it actually looks for every value in the table, where prop_key = 'sha256'. The more 'sha256' properties in the database, the slower this will behave. In our case, this can take several hundred milliseconds per query, and these queries are happening many times a second during peak.
To leverage PostgreSQL properly in this regard, you need to change the query to specify the substr() call in a manner that matches the index.
As a temporary work-around, I have manually created on an index on prop_value:
However, this is fragile, as it relies on the prop_value always being able to fit in an index row. I believe the list of solutions here includes:
1) Use substr() in a manner that would allow the index to be effectively used. For example:
2) Store some sort of checksum or message digest of the prop_value inside the node_props table, and match on this. Have an index on the checksum or message digest. For example:
3) Store a short or canonical "searchable" version of the prop_value in a varchar(255), and search on this instead. Have an index on this whole field.
4) Store the SHA-256 value in a separate table dedicated for this purpose, and stop using the node_props for this purpose. Have an index on this whole field.