Uploaded image for project: 'Artifactory Binary Repository'
  1. Artifactory Binary Repository
  2. RTFACT-12908

Database queries for node_props do not use prop_value index on PostgreSQL

    Details

    • Type: Performance
    • Status: Resolved
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 4.12.1, 4.14.2
    • Fix Version/s: 5.2.0
    • Component/s: Docker
    • Labels:
      None

      Description

      We are experiencing significant slowdown due to our use of Docker repository hosting with Artifactory. One of the issues we are facing seems to be inefficient Docker database queries for nodes with a particular SHA-256. This query is not making use of the index on node_props.prop_value:

      [2016-11-28 08:35:39.748 UTC] <artifactory@127.0.0.1(59677)> artifactory: [unknown]: LOG:  duration: 270.904 ms  execute S_5: select distinct  n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where (( np100.prop_key = $1 and  np100.prop_value = $2) and n.node_type = $3) and(n.repo != $4 or n.repo is null) 
      [2016-11-28 08:35:39.748 UTC] <artifactory@127.0.0.1(59677)> artifactory: [unknown]: DETAIL:  parameters: $1 = 'sha256', $2 = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $3 = '1', $4 = 'auto-trashcan'
      

      This is the node_props table on PostgreSQL:

      artifactory=# \d node_props
                  Table "public.node_props"
         Column   |          Type           | Modifiers 
      ------------+-------------------------+-----------
       prop_id    | bigint                  | not null
       node_id    | bigint                  | not null
       prop_key   | character varying(255)  | 
       prop_value | character varying(4000) | 
      Indexes:
          "node_props_pk" PRIMARY KEY, btree (prop_id)
          "node_props_node_id_idx" btree (node_id)
          "node_props_prop_key_idx" btree (prop_key)
          "node_props_prop_value_idx" btree (substr(prop_value::text, 1, 255))
      Foreign-key constraints:
          "node_props_nodes_fk" FOREIGN KEY (node_id) REFERENCES nodes(node_id)
      

      The "node_props_prop_value_idx" index is on substr(prop_value::text, 1, 255). However, this Docker query does not specify substr() in the same way, so this index is not used. PostgreSQL does not have special support for substr() in the query planner, so this index is useless in the above query.

      I am guessing that the substr() use was added as a result of issues such as this:

      https://www.jfrog.com/jira/browse/RTFACT-5764

      The problem is that there is a limit on the amount of data that can be stored in the index row. A typical solution to this may include truncating the value being searched upon. However, the PostgreSQL query planner does not know about substr(), and it does not recognize that it might be possible to use it to help find the SHA-256 value in the node_props table. Consequently, it actually looks for every value in the table, where prop_key = 'sha256'. The more 'sha256' properties in the database, the slower this will behave. In our case, this can take several hundred milliseconds per query, and these queries are happening many times a second during peak.

      To leverage PostgreSQL properly in this regard, you need to change the query to specify the substr() call in a manner that matches the index.

      As a temporary work-around, I have manually created on an index on prop_value:

      create index node_props_prop_value_idx_2 on node_props(prop_value);
      

      However, this is fragile, as it relies on the prop_value always being able to fit in an index row. I believe the list of solutions here includes:

      1) Use substr() in a manner that would allow the index to be effectively used. For example:

      select ... from node_props where substr(prop_value::text, 1, 255) = substr(?, 1, 255) and prop_value = ?;
      

      2) Store some sort of checksum or message digest of the prop_value inside the node_props table, and match on this. Have an index on the checksum or message digest. For example:

      select ... from node_props where sha1sum(prop_value::text) = sha1sum(?) and prop_value = ?;
      create index on node_props(sha1sum(prop_value::text));
      

      3) Store a short or canonical "searchable" version of the prop_value in a varchar(255), and search on this instead. Have an index on this whole field.

      4) Store the SHA-256 value in a separate table dedicated for this purpose, and stop using the node_props for this purpose. Have an index on this whole field.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                yuvalr Yuval Reches
                Reporter:
                mark.mielke Mark Mielke
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 4 days
                  4d
                  Remaining:
                  Remaining Estimate - 4 days
                  4d
                  Logged:
                  Time Spent - Not Specified
                  Not Specified