[RTFACT-14296] Docker hitting scaling issues when a particular base layer is a component of thousands of images (Part 2) Created: 28/May/17  Updated: 15/Jul/19

Status: Open
Project: Artifactory Binary Repository
Component/s: Docker
Affects Version/s: 4.12.1, 4.14.2, 5.3.1
Fix Version/s: None

Type: Improvement Priority: High
Reporter: Mark Mielke Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: None

Issue Links:
Duplicate
is duplicated by RTFACT-19629 docker push is slow when layers exist... Open
Relationship
relates to RTFACT-17630 Improving docker virtual manifest sync Resolved
is related to RTFACT-12909 Docker hitting scaling issues when a ... Resolved

 Description   

This issue was partially addressed in RTFACT-12909. However, the query is still very slow, and still needs to be addressed. I have updated the text to reflect only the remaining work required.

We are experiencing a significant slowdown in our use of Artifactory to host Docker repositories. One of the issues we are facing appears to be a scaling problem in the handling of Docker layers: when the same layer is used by thousands of images, Docker operations become progressively slower.

Query #1:

[2017-05-28 04:06:39.326 UTC] <artifactory@127.0.0.1(37430)> artifactory: PostgreSQL JDBC Driver: LOG:  duration: 145.988 ms  execute S_14: select distinct  n.repo as itemRepo,n.node_id as itemId,n.node_path as itemPath,n.node_name as itemName,n.node_type as itemType from  nodes n where ( n.repo = $1 and n.depth >= $2 and n.node_name = $3 and n.node_type = $4) 
[2017-05-28 04:06:39.326 UTC] <artifactory@127.0.0.1(37430)> artifactory: PostgreSQL JDBC Driver: DETAIL:  parameters: $1 = 'docker-v2-local', $2 = '1', $3 = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $4 = '1'
[2017-05-28 04:06:39.340 UTC] <artifactory@127.0.0.1(33921)> artifactory: PostgreSQL JDBC Driver: LOG:  duration: 169.690 ms  execute S_11: select distinct  n.repo as itemRepo,n.node_id as itemId,n.node_path as itemPath,n.node_name as itemName,n.node_type as itemType from  nodes n where ( n.repo = $1 and n.depth >= $2 and n.node_name = $3 and n.node_type = $4) 
[2017-05-28 04:06:39.340 UTC] <artifactory@127.0.0.1(33921)> artifactory: PostgreSQL JDBC Driver: DETAIL:  parameters: $1 = 'docker-v2-local', $2 = '1', $3 = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $4 = '1'
[2017-05-28 04:06:39.349 UTC] <artifactory@127.0.0.1(39528)> artifactory: PostgreSQL JDBC Driver: LOG:  duration: 183.926 ms  execute <unnamed>: select distinct  n.repo as itemRepo,n.node_id as itemId,n.node_path as itemPath,n.node_name as itemName,n.node_type as itemType from  nodes n where ( n.repo = $1 and n.depth >= $2 and n.node_name = $3 and n.node_type = $4) 
[2017-05-28 04:06:39.349 UTC] <artifactory@127.0.0.1(39528)> artifactory: PostgreSQL JDBC Driver: DETAIL:  parameters: $1 = 'docker-v2-local', $2 = '1', $3 = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $4 = '1'
[2017-05-28 04:06:39.364 UTC] <artifactory@127.0.0.1(39484)> artifactory: PostgreSQL JDBC Driver: LOG:  duration: 164.312 ms  execute <unnamed>: select distinct  n.repo as itemRepo,n.node_id as itemId,n.node_path as itemPath,n.node_name as itemName,n.node_type as itemType from  nodes n where ( n.repo = $1 and n.depth >= $2 and n.node_name = $3 and n.node_type = $4) 
[2017-05-28 04:06:39.364 UTC] <artifactory@127.0.0.1(39484)> artifactory: PostgreSQL JDBC Driver: DETAIL:  parameters: $1 = 'docker-v2-local', $2 = '1', $3 = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $4 = '1'

Here is the PostgreSQL "explain analyze" output for this query:

artifactory=# explain analyze select distinct  n.repo as itemRepo,n.node_id as itemId,n.node_path as itemPath,n.node_name as itemName,n.node_type as 
itemType from  nodes n where ( n.repo = 'docker-v2-local' and n.depth >= 1 and n.node_name = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' and n.node_type = 1);
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=21833.97..21863.11 rows=3885 width=101) (actual time=107.094..124.709 rows=15495 loops=1)
   ->  Sort  (cost=21833.97..21843.68 rows=3885 width=101) (actual time=107.088..115.748 rows=15495 loops=1)
         Sort Key: node_id, node_path
         Sort Method: external sort  Disk: 2200kB
         ->  Bitmap Heap Scan on nodes n  (cost=627.29..21602.35 rows=3885 width=101) (actual time=15.222..66.333 rows=15495 loops=1)
               Recheck Cond: ((node_name)::text = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text)
               Filter: ((depth >= 1) AND ((repo)::text = 'docker-v2-local'::text) AND (node_type = 1))
               Rows Removed by Filter: 142
               Heap Blocks: exact=11839
               ->  Bitmap Index Scan on nodes_node_name_idx  (cost=0.00..626.32 rows=15436 width=0) (actual time=10.049..10.049 rows=15641 loops=1)
                     Index Cond: ((node_name)::text = 'sha256__a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text)
 Planning time: 0.532 ms
 Execution time: 129.258 ms
(13 rows)

Because this SHA-256 appears roughly 15,000 times, a large portion of the table is scanned and then returned. The timestamps also show that this query runs many times per second.
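One mitigation worth considering (a hedged sketch; the index name is hypothetical, and the columns are taken from the query above) is a composite index covering the whole filter, so the repo and node_type conditions are applied in the index instead of being rechecked against 15,000+ heap rows. Note it would narrow the scan but not eliminate the cost of returning all matching rows:

```sql
-- Hypothetical composite index covering the WHERE clause of Query #1.
-- depth is left out: "depth >= 1" excludes almost nothing, so the
-- three equality columns do the real filtering work.
CREATE INDEX nodes_repo_name_type_idx
    ON nodes (repo, node_name, node_type);
```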

Query #2:

This is the query I had originally raised a concern about.

[2017-05-28 04:08:57.262 UTC] <artifactory@127.0.0.1(39532)> artifactory: PostgreSQL JDBC Driver: LOG:  duration: 724.550 ms  execute S_13: select distinct  n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where ( n.repo = $1 and( np100.prop_key = $2 and  substr(np100.prop_value::text, 1, 255) = substr($3, 1, 255))) and n.node_type = $4 limit  10 
[2017-05-28 04:08:57.262 UTC] <artifactory@127.0.0.1(39532)> artifactory: PostgreSQL JDBC Driver: DETAIL:  parameters: $1 = 'docker-v2-local', $2 = 'sha256', $3 = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', $4 = '1'

The explain analyze output:

artifactory=# explain analyze select distinct  n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where ( n.repo = 'docker-v2-local' and( np100.prop_key = 'sha256' and  substr(np100.prop_value::text, 1, 255) = substr('a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', 1, 255))) and n.node_type = 1 limit  10;
                                                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=37076.77..37077.15 rows=10 width=262) (actual time=475.014..475.037 rows=10 loops=1)
   ->  Unique  (cost=37076.77..37130.77 rows=1440 width=262) (actual time=475.011..475.033 rows=10 loops=1)
         ->  Sort  (cost=37076.77..37080.37 rows=1440 width=262) (actual time=475.010..475.019 rows=10 loops=1)
               Sort Key: n.node_path, n.node_name, n.created, n.modified, n.updated, n.created_by, n.modified_by, n.bin_length, n.node_id, n.depth, n.sha1_actual, n.sha1_original, n.md5_actual, n.md5_original
               Sort Method: external merge  Disk: 4424kB
               ->  Nested Loop  (cost=706.84..37001.23 rows=1440 width=262) (actual time=14.082..280.463 rows=15495 loops=1)
                     ->  Bitmap Heap Scan on node_props np100  (cost=706.42..10819.79 rows=5719 width=8) (actual time=14.038..90.829 rows=15637 loops=1)
                           Recheck Cond: (substr((prop_value)::text, 1, 255) = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text)
                           Filter: ((prop_key)::text = 'sha256'::text)
                           Heap Blocks: exact=7311
                           ->  Bitmap Index Scan on node_props_prop_value_idx  (cost=0.00..704.99 rows=15925 width=0) (actual time=12.277..12.277 rows=31172 loops=1)
                                 Index Cond: (substr((prop_value)::text, 1, 255) = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text)
                     ->  Index Scan using nodes_pk on nodes n  (cost=0.42..4.57 rows=1 width=262) (actual time=0.010..0.011 rows=1 loops=15637)
                           Index Cond: (node_id = np100.node_id)
                           Filter: (((repo)::text = 'docker-v2-local'::text) AND (node_type = 1))
                           Rows Removed by Filter: 0
 Planning time: 1.467 ms
 Execution time: 477.136 ms
(18 rows)

This involves the same layer, and is much more expensive. Which of the two queries is the bigger concern depends on what users are doing. When this one runs multiple times a second, it causes our server to become overloaded. The added "LIMIT 10", as well as the effective use of the substr index, help, but the core problem is still here.

I think the fundamental problem here is that:

1) nodes has tens of thousands of matches for the repo key.
2) node_props has 15,000+ matches for the SHA-256.
3) The results cannot be prepared without a complex table merge.

If you have the option, I would strongly suggest pulling the SHA-256 up into the nodes table. That would give more efficient storage and more efficient searching: an index on the SHA-256 together with the repo key would identify results very quickly.
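As a hedged sketch of that suggestion (the column and index names are hypothetical, not Artifactory's actual schema):

```sql
-- Hypothetical schema change: promote the sha256 property to a
-- first-class column on nodes, indexed together with the repo key.
ALTER TABLE nodes ADD COLUMN sha256 CHAR(64);
CREATE INDEX nodes_repo_sha256_idx ON nodes (repo, sha256);

-- Query #2 would then collapse to a single-table lookup, with no join
-- against node_props and no DISTINCT required:
SELECT n.repo, n.node_path, n.node_name, n.node_id
  FROM nodes n
 WHERE n.repo = 'docker-v2-local'
   AND n.sha256 = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
   AND n.node_type = 1
 LIMIT 10;
```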

For the second query, I have seen evidence that dropping DISTINCT can bring the query under 5 milliseconds, compared to the current 400+ ms. This is because, with the LIMIT 10, the nested loop join can stop as soon as it finds the first results, instead of sorting the full result set first. But this doesn't seem to help with the first query.

I think this issue still needs to be addressed to ensure that Artifactory is scalable in its Docker support. This SHA-256 refers to the image layer produced when a Dockerfile step adds environment variables but does not affect any file content. This is an extremely common situation with Docker.
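For illustration, a hypothetical minimal Dockerfile (the base image and variable name are placeholders) whose final step produces exactly this kind of content-less layer:

```dockerfile
FROM alpine
# Metadata-only step: no files change, so the resulting layer carries
# no file content, and its blob digest is shared by every image whose
# build includes a step like this.
ENV APP_ENV=production
```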



 Comments   
Comment by Mark Mielke [ 28/May/17 ]

Some fields are messed up above, as I tried to use "Clone". JFrog will have to fix them... Sprint ID, Fix Version, ... Sorry about that.

Comment by Mark Mielke [ 28/May/17 ]

Here is the result of the second query if I drop the "DISTINCT":

artifactory=# explain analyze select n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where ( n.repo = 'docker-v2-local' and( np100.prop_key = 'sha256' and  substr(np100.prop_value::text, 1, 255) = substr('a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', 1, 255))) and n.node_type = 1 limit  1;
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..34.14 rows=1 width=262) (actual time=0.097..0.097 rows=1 loops=1)
   ->  Nested Loop  (cost=0.42..48079.47 rows=1426 width=262) (actual time=0.095..0.095 rows=1 loops=1)
         ->  Seq Scan on node_props np100  (cost=0.00..22064.65 rows=5663 width=8) (actual time=0.056..0.056 rows=1 loops=1)
               Filter: (((prop_key)::text = 'sha256'::text) AND (substr((prop_value)::text, 1, 255) = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text))
               Rows Removed by Filter: 89
         ->  Index Scan using nodes_pk on nodes n  (cost=0.42..4.58 rows=1 width=262) (actual time=0.031..0.031 rows=1 loops=1)
               Index Cond: (node_id = np100.node_id)
               Filter: (((repo)::text = 'docker-v2-local'::text) AND (node_type = 1))
 Planning time: 1.093 ms
 Execution time: 0.193 ms
(10 rows)

Much, much faster. It is enough to make me wonder whether the PostgreSQL planner could use some help here, or whether this is just a tough problem to solve in a general fashion and, in this particular case, a nested loop join happens to be faster.

Comment by Mark Mielke [ 28/May/17 ]

For the second query, using "DISTINCT ON (ItemId)" instead of "DISTINCT" causes the planner to deduplicate with a streaming Unique over index-ordered input instead of a full sort, and this seems to work very well too:

artifactory=# explain analyze select distinct on (itemId) n.repo as itemRepo,n.node_path as itemPath,n.node_name as itemName,n.created as itemCreated,n.modified as itemModified,n.updated as itemUpdated,n.created_by as itemCreatedBy,n.modified_by as itemModifiedBy,n.node_type as itemType,n.bin_length as itemSize,n.node_id as itemId,n.depth as itemDepth,n.sha1_actual as itemActualSha1,n.sha1_original as itemOriginalSha1,n.md5_actual as itemActualMd5,n.md5_original as itemOriginalMd5 from  nodes n left outer join node_props np100 on np100.node_id = n.node_id where ( n.repo = 'docker-v2-local' and( np100.prop_key = 'sha256' and  substr(np100.prop_value::text, 1, 255) = substr('a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4', 1, 255))) and n.node_type = 1 limit  10;
                                                                                      QUERY PLAN                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.85..476.53 rows=10 width=262) (actual time=3.645..4.958 rows=10 loops=1)
   ->  Unique  (cost=0.85..67832.23 rows=1426 width=262) (actual time=3.644..4.955 rows=10 loops=1)
         ->  Nested Loop  (cost=0.85..67828.66 rows=1426 width=262) (actual time=3.640..4.942 rows=10 loops=1)
               ->  Index Scan using node_props_node_id_idx on node_props np100  (cost=0.42..41813.84 rows=5663 width=8) (actual time=3.607..4.868 rows=10 loops=1)
                     Filter: (((prop_key)::text = 'sha256'::text) AND (substr((prop_value)::text, 1, 255) = 'a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'::text))
                     Rows Removed by Filter: 2194
               ->  Index Scan using nodes_pk on nodes n  (cost=0.42..4.58 rows=1 width=262) (actual time=0.005..0.005 rows=1 loops=10)
                     Index Cond: (node_id = np100.node_id)
                     Filter: (((repo)::text = 'docker-v2-local'::text) AND (node_type = 1))
 Planning time: 1.025 ms
 Execution time: 5.055 ms
(11 rows)

I think you use "DISTINCT" because, when building the query, you cannot be sure whether it will produce unique per-node results. In this particular case, since every node has only one "sha256" property, the results are always distinct and need no special handling; in other cases, when you join against multiple tables, they might not be. "DISTINCT ON (ItemId)", where ItemId is the primary key of nodes, should produce the result you want while helping the query planner do a much better job, by making clear what you actually require: not that the entire row be distinct, only one row per node. "DISTINCT ON (ItemId)" achieves exactly this.

Comment by Mark Mielke [ 29/May/17 ]

For the second query... the use of DISTINCT ON (...) turns out to have worked because it encouraged using the primary key as a faster means of identifying DISTINCT results. We have two Docker repositories that both contain this problematic layer, one with many more instances of it than the other (10,000+ vs 100+). The consequence is that although DISTINCT ON (...) encourages the nested loop join and quickly finds and returns 10 results, a "docker pull" against the smaller repository incurred a 100+ ms wait, because the nested loop had to skip past the many records for the problematic layer in the other repository.

I adjusted the code in our instance to try DISTINCT ON (...) for "items" ("nodes") in the AQL processing, and I am still monitoring whether it really has a valuable impact. The main conclusion remains that the SHA-256 is more of an identity field than a property field, and the use of a property table is extremely inefficient for this extremely common use case.

Comment by Mark Mielke [ 30/Oct/17 ]

It looks like Artifactory 5.5 now supports "sha256" as a column in the "nodes" table. Can you confirm whether the Docker plugin will take advantage of this new capability once we have completed the migration steps to calculate the SHA-256 for all existing artifacts? It looks like all the right hooks exist, so it should just be a question of whether the Docker plugin uses the new hooks properly?

Generated at Wed Aug 21 12:43:31 UTC 2019 using JIRA 7.6.3#76005-sha1:8a4e38d34af948780dbf52044e7aafb13a7cae58.