Artifactory Binary Repository / RTFACT-12909

Docker hitting scaling issues when a particular base layer is a component of thousands of images

    Details

    • Type: Performance
    • Status: Resolved
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 4.12.1, 4.14.2
    • Fix Version/s: 5.2.0
    • Component/s: Docker
    • Labels: None

      Description

      We are experiencing a significant slowdown in our use of Artifactory as a Docker repository host. One of the issues appears to be a scaling problem in Artifactory's handling of Docker layers: when the same layer is used by thousands of images, Docker operations become progressively slower. This is the problematic query:

      [2016-11-28 08:35:39.748 UTC] <artifactory@127.0.0.1(59677)> artifactory: [unknown]: LOG:  duration: 270.904 ms  execute S_5: select distinct  n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where (( np100.prop_key = $1 and  np100.prop_value = $2) and n.node_type = $3) and(n.repo != $4 or n.repo is null) 
      [2016-11-28 08:35:39.748 UTC] <artifactory@127.0.0.1(59677)> artifactory: [unknown]: DETAIL:  parameters: $1 = 'sha256', $2 = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $3 = '1', $4 = 'auto-trashcan'
      

      In our case, we have a particular layer that shows up in 10,000+ images. Note that in the results below, the query takes advantage of an index I created, as described here:

      https://www.jfrog.com/jira/browse/RTFACT-12908
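      The ticket above covers the index in detail. Judging from the plan output below (the Bitmap Index Scan on node_props_prop_value_idx_2 with an index condition on prop_value), it is presumably a single-column index along these lines — a sketch inferred from the plan, not the exact definition from RTFACT-12908:

      ```sql
      -- Assumed definition, inferred from the index name and the
      -- "Index Cond: ((prop_value)::text = $2)" line in the plan below;
      -- see RTFACT-12908 for the actual statement.
      CREATE INDEX node_props_prop_value_idx_2
          ON node_props (prop_value);
      ```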

      This previously took 250+ milliseconds with 12,000+ matching nodes. We removed several hundred obsolete Docker images, which brought it down to just over 10,000 matching nodes. The query now takes 170 - 200 milliseconds:

      artifactory=# select count(*) from node_props where prop_value = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' and prop_key = 'sha256';
       count 
      -------
       10181
      (1 row)
      
      Time: 9.996 ms
      artifactory=# explain analyze execute statement1('sha256', 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', 1, 'auto-trashcan');         
                                                                                                          QUERY PLAN                                                                                                     
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Unique  (cost=120.46..120.50 rows=1 width=258) (actual time=162.251..171.768 rows=10181 loops=1)
         ->  Sort  (cost=120.46..120.46 rows=1 width=258) (actual time=162.250..166.517 rows=10181 loops=1)
               Sort Key: n.repo, n.node_path, n.node_name, n.created, n.modified, n.updated, n.created_by, n.modified_by, n.bin_length, n.node_id, n.depth, n.sha1_actual, n.sha1_original, n.md5_actual, n.md5_original
               Sort Method: external merge  Disk: 2880kB
               ->  Nested Loop  (cost=5.18..120.45 rows=1 width=258) (actual time=2.276..46.881 rows=10181 loops=1)
                     ->  Bitmap Heap Scan on node_props np100  (cost=4.76..111.99 rows=1 width=8) (actual time=2.246..11.370 rows=10181 loops=1)
                           Recheck Cond: ((prop_value)::text = $2)
                           Filter: ((prop_key)::text = $1)
                           Heap Blocks: exact=4703
                           ->  Bitmap Index Scan on node_props_prop_value_idx_2  (cost=0.00..4.76 rows=28 width=0) (actual time=1.532..1.532 rows=10231 loops=1)
                                 Index Cond: ((prop_value)::text = $2)
                     ->  Index Scan using nodes_pk on nodes n  (cost=0.42..8.45 rows=1 width=258) (actual time=0.003..0.003 rows=1 loops=10181)
                           Index Cond: (node_id = np100.node_id)
                           Filter: ((((repo)::text <> $4) OR (repo IS NULL)) AND (node_type = $3))
       Execution time: 172.888 ms
      (15 rows)
      
      Time: 173.297 ms
      

      Joining and outputting 10,000 records takes 173 milliseconds. Does the Docker plugin really need ALL of the results from this query, or might it be satisfied with only a few?
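      If only a handful of matches are needed, a LIMIT clause would let PostgreSQL stop the nested loop after the first few rows instead of sorting and de-duplicating all 10,000+ of them. A hypothetical rewrite of the generated query (abbreviated select list; this is a sketch of the suggestion, not Artifactory's actual behavior):

      ```sql
      -- Hypothetical variant: with LIMIT 1 (and DISTINCT dropped), the
      -- executor can return as soon as one matching row is found, skipping
      -- the external-merge Sort and Unique steps seen in the plan above.
      select n.repo      as itemRepo,
             n.node_path as itemPath,
             n.node_name as itemName,
             n.node_id   as itemId      -- remaining columns as in the original query
        from nodes n
        join node_props np100 on np100.node_id = n.node_id
       where np100.prop_key   = 'sha256'
         and np100.prop_value = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
         and n.node_type = 1
         and (n.repo != 'auto-trashcan' or n.repo is null)
       limit 1;
      ```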

      In our case, this query runs many times per second during peak load, which coincides with continuous-build operations that invoke Docker push and pull from Artifactory. Before I implemented the index hack, the slow-query log reported a variety of SHA-256 values. Now that the index speeds up finding SHA-256 values, only a few values such as the one above remain a concern. Because the 10,000+ results are prepared and processed multiple times per second, this significantly affects the performance of Artifactory as a whole, making it laggy during the day for all users - whether via the UI, Maven, Docker, or other clients.


              People

              • Assignee: yuvalr Yuval Reches
              • Reporter: mark.mielke Mark Mielke
              • Votes: 1
              • Watchers: 5