The issue is that creating users in bulk via the Java client does not work. If the default timeouts are not high enough, requests time out and some API requests are lost (they fail with timeout errors, i.e. no HTTP response).
According to Shiva, the solution is to use the same timeout configuration that the DevOps team sets, since it has been running in prod without any issues so far. Once the default timeout was raised from 900 to 2400s, the load tests were successful.
More details below:
In the Kubernetes cluster (Canonical Kubernetes on AWS), soldev-qa runs stress tests (such as creating 1000 groups, 2000 users, etc.).
With Artifactory 6.0 installed from Helm (stable/artifactory-ha), these tests fail after roughly 200 requests, usually with the error HTTP_EMPTY_RESPONSE. Increasing the ELB timeout or the ELB connection draining timeout has no effect. Setting keepalive_timeout to 0 and keepalive_requests to 0 in nginx (disabling keepalive completely) solves this problem for us.
Disabling keepalive may have an adverse effect on performance. The solution recommended by the Nginx support team was to leave keepalive_timeout at the default value of 65 seconds but change keepalive_requests to the maximum supported value (2147483647 on 32-bit and 4294967295 on 64-bit platforms). This should ensure that nginx does not close keepalive connections after a fixed number of requests, so the ELB will not get an empty response.
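For reference, the recommended settings could look roughly like the fragment below. This is only a sketch: placing the directives in the http block is an assumption (nginx also accepts them at server or location level), and how the nginx configuration is supplied to the stable/artifactory-ha chart is not covered here. The values are the ones quoted above.

```nginx
http {
    # Keep idle client connections open for 65 seconds (value from the text above).
    keepalive_timeout 65;

    # Allow an effectively unlimited number of requests per keepalive connection.
    # 2147483647 is the 32-bit maximum quoted above; 4294967295 is the 64-bit one.
    keepalive_requests 2147483647;

    # Earlier workaround, which disables keepalive completely:
    # keepalive_timeout 0;
    # keepalive_requests 0;
}
```

After changing these values, nginx has to reload its configuration before the stress tests are rerun.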