Artifactory Binary Repository / RTFACT-14763

Binaries deployed to secondary nodes when primary is down do not propagate with sharding-cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 5.2.1, 5.3.1, 5.4.6
    • Fix Version/s: 5.9.0
    • Component/s: Filestore, HA
    • Labels: None

      Description

      When you have (for example) a two-node HA cluster using the sharding-cluster template (see the binarystore.xml in the steps below), deploying a file to one node redundantly copies it to the other node, as expected. However, if the primary node is down and files are deployed to the secondary node, the binaries will exist only on the secondary node, also as expected. Once the primary node is back up, the expectation is that running the following command will cause the binaries deployed while the primary was down to be copied to the primary at the next GC:

      curl -u<user> -XPOST http://<host>/artifactory/api/system/storage/optimize

      With this, the next time GC runs, the binaries should be copied to the primary and exist with a full redundancy of 2. This does not appear to happen. The reverse scenario does work: if files are deployed to the primary while a secondary node is down, the binaries are propagated to the secondary once it is back up.
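      For context, below is a rough sketch of what an explicit sharding-cluster chain with a redundancy of 2 looks like. The provider types, the <redundancy> parameter, and the sub-provider layout follow JFrog's binarystore.xml documentation for the cluster templates, but the exact expansion varies by version, so treat this as an illustration rather than the configuration used in the reproduction below (which relies on the cluster-file-system template defaults):

      <!-- Illustrative sketch only: a sharding-cluster chain that keeps two
           copies of every binary. Verify provider ids and parameters against
           the binarystore reference for your Artifactory version. -->
      <config version="2">
          <chain>
              <provider id="cache-fs" type="cache-fs">
                  <provider id="sharding-cluster" type="sharding-cluster">
                      <sub-provider id="state-aware" type="state-aware"/>
                      <dynamic-provider id="remote-fs" type="remote"/>
                  </provider>
              </provider>
          </chain>
          <provider id="sharding-cluster" type="sharding-cluster">
              <redundancy>2</redundancy>
          </provider>
      </config>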

      Steps to reproduce the issue

      1. Create an HA setup with the following binarystore.xml file:

      <config version="1">
          <chain template="cluster-file-system"/>
      </config>

      2. Any supported database can be used.
      3. Deploy a file and notice that it is copied to both nodes' $ARTIFACTORY_HOME/data/filestore directories.
      4. Now shut down the primary node and deploy another, brand-new file to the secondary node.
      5. Start up the primary node and run the optimize API:
      curl -u<user> -XPOST http://<host>/artifactory/api/system/storage/optimize
      6. Run GC (see the command sketch after this list) and note that the logs indicate the job completed when the following line appears:
      "Checksum synchronization took XXms"
      7. Note that the binary is never copied from the secondary back to the primary; you can check its SHA-1 value and confirm that it does not exist in the primary node's filestore (see the verification sketch after this list).
      8. Note that trying to download this binary from the secondary while the primary is down fails with a 500 error.
      9. Again, note that the reverse direction works as expected: shut down the secondary, deploy to the primary, start the secondary back up, run the optimize API, and run GC, and the binary is propagated.
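
      To make steps 6 and 7 concrete, below is a hedged sketch of the verification commands. The GC call uses Artifactory's documented storage endpoint; the filestore layout (a directory named after the first two characters of the binary's checksum, containing a file named after the full checksum) matches the default file-system binary provider. <user>, <host>, and the sample file name are placeholders:

      # Trigger garbage collection via the REST API
      curl -u<user> -XPOST http://<host>/artifactory/api/system/storage/gc

      # Compute the checksum of the file deployed while the primary was down
      sha1sum ./brand-new-file.bin          # note the <sha1> it prints

      # On the secondary node the blob exists under the filestore:
      ls $ARTIFACTORY_HOME/data/filestore/<first 2 chars of sha1>/<sha1>

      # On the primary node, after optimize + GC, the same path should also
      # exist (redundancy 2), but it does not; this is the bug:
      ls $ARTIFACTORY_HOME/data/filestore/<first 2 chars of sha1>/<sha1>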

    People

    • Assignee: tamirh (Tamir Hadad)
    • Reporter: daniela (Daniel Augustine)
    • Assigned QA: Konstantin Shenderov
    • Votes: 2
    • Watchers: 9
