[RTFACT-16906] default keepalive settings in our nginx container not sufficient to handle heavy workloads
Created: 08/Jun/18  Updated: 28/Nov/18  Resolved: 28/Nov/18

Status: Resolved
Project: Artifactory Binary Repository
Component/s: Docker Image
Affects Version/s: 6.0.0
Fix Version/s: 6.6.0

Type: Bug
Priority: Normal
Reporter: Shivaram Radhakrishna
Assignee: Unassigned
Resolution: Fixed
Votes: 0
Labels: None
Environment: Canonical-Kubernetes on AWS


Issue Links:
Dependency
Relationship
Assigned QA: Gal Ben Ami

 Description   

Problem:
 
Creating users in bulk via the Java client fails. If the default timeouts are not high enough, some API requests are lost: they fail with timeout errors, i.e. the client receives no HTTP response.
 
The solution according to Shiva is to use the same timeout configuration that the DevOps team sets, since it has been running in production without any issues so far. Once the default timeout was increased from 900s to 2400s, the load tests succeeded.
 
More details below:

In a Kubernetes cluster (Canonical Kubernetes on AWS), soldev-qa runs stress tests such as creating 1000 groups and 2000 users.

With Artifactory 6.0 installed from Helm (stable/artifactory-ha), these tests fail at around the 200 mark, and the error is usually HTTP_EMPTY_RESPONSE. Increasing the ELB timeout or the ELB connection draining timeout has no effect. Setting keepalive_timeout to 0 and keepalive_requests to 0 (disabling keepalive completely) solves this problem for us.
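
For reference, the workaround described above would look roughly like the following nginx fragment. This is only a sketch: the ticket does not say in which context (http, server or location) the directives were set, so the placement is left open here.

```nginx
# Workaround from this ticket: turn HTTP keepalive off entirely.
# Both directives are valid in the http, server and location contexts;
# which one was used is not stated in the ticket.
keepalive_timeout  0;   # 0 disables keep-alive client connections
keepalive_requests 0;   # value as reported in the ticket
```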

 

Disabling keepalive may, however, have an adverse effect on performance. The solution recommended by the Nginx support team was to leave keepalive_timeout at the default value of 65 seconds, but to change keepalive_requests to the maximum supported value (2147483647 on 32-bit and 4294967295 on 64-bit platforms). This should ensure that connections in keepalive are not closed by nginx after a fixed number of requests, so that the ELB does not get an empty response.
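
A minimal sketch of the recommended settings, using only the values quoted above. The 32-bit maximum is shown because, per the recommendation, it is accepted on both platforms; the context in which to place the directives is again an assumption left to the reader.

```nginx
# Recommendation from Nginx support as quoted in this ticket:
# keep the keepalive timeout, but effectively remove the per-connection request cap.
keepalive_timeout  65s;          # value given in the ticket
keepalive_requests 2147483647;   # 32-bit maximum; 4294967295 is quoted for 64-bit platforms
```

With a low keepalive_requests value, nginx closes each client connection after that many requests, which is the behavior this change is meant to avoid during long bulk runs.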



 Comments   
Comment by Ankush Chadha [ 22/Jun/18 ]

 
 
A production version of nginx.conf was pushed into the Docker image (https://git.jfrog.info/projects/ARTIFACTORY/repos/artifactory-pro/pull-requests/367/commits/9f05d9a25b5b0391949081cdb5ed2ef4afdb30bb).

The only change that is still open is to update the timeout value in artifactory.conf from 900s to 2400s, to be consistent with the production version of nginx.conf.
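
As an illustration only, the open change might look like the fragment below. The ticket does not name the directive that holds the 900s timeout, so proxy_read_timeout is purely an assumption here; the actual artifactory.conf should be checked before applying anything like this.

```nginx
# Hypothetical sketch: the directive name is NOT given in the ticket.
# proxy_read_timeout is shown only as a plausible candidate for "the timeout value".
proxy_read_timeout 2400s;   # previously 900 according to the comment above
```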

Generated at Sun Oct 20 05:59:45 UTC 2019 using JIRA 7.6.16#76018-sha1:9ed376192612a49536ac834c64177a0fed6290f5.