When large Docker images are replicated via event-based push replication, there is a window on the target during which not all layers have been replicated, yet Artifactory returns a 200 when the image is requested. This affects automated build processes that span multiple sites: CI systems and users in the target region cannot tell when replication is actually complete, so their builds may fail.
This is causing numerous Docker image pull failures and operational delays in our production Artifactory environments.
Steps for reproduction:
- Have 2 Artifactory instances, a source and target.
- Create local Docker repositories on both the source and the target.
- Set up event-based push replication from source to target.
- Perform a docker push of a large (1.5 GB+) image to the source Artifactory repository.
- Send a docker pull request to the target repository.
- Observe a 200 in request.log, followed by an error from the Docker client, which fails to resolve the image.
- we expect the GET for the manifest to return:
- Wait until replication has completed, and retry the pull. The image will successfully be downloaded.
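Until the server reports this state itself, a client can detect it by checking each blob that the manifest references. The following is a minimal sketch, not a supported tool. It assumes the target exposes the standard Docker Registry HTTP API v2 (`GET /v2/<name>/manifests/<reference>` and `HEAD /v2/<name>/blobs/<digest>`) and that the image uses a schema 2 manifest; multi-architecture manifest lists are not handled. `base_url`, `image`, `tag`, and `headers` are placeholders that depend on how the Docker repository is exposed and authenticated in your Artifactory setup.

```python
import json
import urllib.error
import urllib.request

# Media type for a Docker image manifest, schema 2.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def blob_digests(manifest):
    """Return the config and layer digests referenced by a schema 2 manifest."""
    digests = []
    if "config" in manifest:
        digests.append(manifest["config"]["digest"])
    digests.extend(layer["digest"] for layer in manifest.get("layers", []))
    return digests


def missing_blobs(manifest, blob_exists):
    """Return the digests for which blob_exists(digest) is false."""
    return [d for d in blob_digests(manifest) if not blob_exists(d)]


def fetch_manifest(base_url, image, tag, headers=None):
    """GET /v2/<image>/manifests/<tag> and parse the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/v2/{image}/manifests/{tag}",
        headers={"Accept": MANIFEST_V2, **(headers or {})},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def http_blob_exists(base_url, image, digest, headers=None):
    """HEAD /v2/<image>/blobs/<digest>; treat 404 as 'not replicated yet'."""
    request = urllib.request.Request(
        f"{base_url}/v2/{image}/blobs/{digest}",
        method="HEAD",
        headers=headers or {},
    )
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # Any other status is a real error, not "still replicating".
```

A build step could call `missing_blobs(fetch_manifest(...), lambda d: http_blob_exists(base_url, image, d))` and treat a non-empty result as "replication still in progress". The HEAD requests avoid downloading layer data.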
Suggestions for improvement:
- Return a response code other than 200 while some layers have not yet been replicated.
- Expose the status of in-progress replications (possibly through a REST API) so build tools can check and wait instead of failing a build.
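To illustrate the "check and wait" behaviour requested in the last suggestion, here is a hedged sketch of a polling helper that a build tool could use. It is not an Artifactory API: `wait_until` and its parameters are hypothetical names, and `is_complete` stands in for whatever completeness check is available (a future replication-status endpoint, or a client-side check that every layer is present).

```python
import time


def wait_until(is_complete, timeout_s=600.0, interval_s=10.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_complete() until it returns True or timeout_s elapses.

    Returns True if the check succeeded, False if the timeout was reached.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_complete():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A CI job would run `docker pull` only after `wait_until(...)` returns True, and fail with a clear "replication not complete" message otherwise. The default timeout and interval are arbitrary and should be tuned to the observed replication time for large images.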