  1. Artifactory Binary Repository
  2. RTFACT-8735

Improving the HA recovery mechanism when a cluster member experiences an OOM issue

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: 3.9.4, 4.3.0
    • Fix Version/s: 4.11.1
    • Component/s: High Availability
    • Labels:
      None

      Description

      When an HA cluster member experiences an OutOfMemory (OOM) error, the JVM can be left in an unstable state: some Artifactory processes and operations are forcibly stopped or fail because of the OOM, while others may keep running.
      For example, a cluster node that has hit an OOM may no longer function correctly, yet the job responsible for updating the DB with the node's last heartbeat still runs. Because the DB then shows the node as active, the other cluster members may try to re-join it, which can lead to further issues.
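
      One way to picture the failure mode, and a possible guard against it, is the minimal Java sketch below. It is illustrative only and does not show the fix shipped in 4.11.1: the class HeartbeatGuard, its methods, and the in-memory stand-in for the DB heartbeat column are all hypothetical names, not Artifactory APIs. The idea is that once any thread reports an OutOfMemoryError, the heartbeat job stops refreshing its timestamp, so the other members see the node go stale instead of active.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names come from Artifactory. */
class HeartbeatGuard {
    // Set once any thread reports an OutOfMemoryError; never cleared.
    private final AtomicBoolean oomSeen = new AtomicBoolean(false);
    // Assumption: stands in for the node's "last heartbeat" value in the shared DB.
    private final AtomicLong lastHeartbeatMillis = new AtomicLong(0L);

    /** Call from an uncaught-exception handler or a catch block. */
    void report(Throwable t) {
        // Walk the cause chain, since an OOM is often wrapped in another exception.
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof OutOfMemoryError) {
                oomSeen.set(true);
                return;
            }
        }
    }

    /**
     * One run of the heartbeat job.
     * Returns true if the heartbeat was written, false if it was suppressed
     * because the node has already seen an OOM.
     */
    boolean beat(long nowMillis) {
        if (oomSeen.get()) {
            return false; // leave the old timestamp so peers see the node as stale
        }
        lastHeartbeatMillis.set(nowMillis);
        return true;
    }

    long lastHeartbeat() {
        return lastHeartbeatMillis.get();
    }
}
```

      In such a design, report would typically be wired in through Thread.setDefaultUncaughtExceptionHandler, and beat would be called by the scheduled heartbeat job. An alternative at the JVM level is the HotSpot flag -XX:+ExitOnOutOfMemoryError (available since JDK 8u92), which terminates the process on the first OOM so that no background job keeps running.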

            People

            • Assignee:
              oferc Ofer Cohen (Inactive)
              Reporter:
              shayb Shay Bagants
              Assigned QA:
              Matan Katz
            • Votes:
              1
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved: