  1. Artifactory Binary Repository
  2. RTFACT-8735

Improve the HA recovery mechanism when a cluster member experiences an OOM issue

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: 3.9.4, 4.3.0
    • Fix Version/s: 4.11.1
    • Component/s: High Availability
    • Labels:
      None

      Description

      When an HA cluster member experiences an OutOfMemory (OOM) error, the JVM can be left in an unstable state: some Artifactory processes and operations are forcibly stopped or fail because of the OOM, while others may keep running.
      For example, a cluster node that hit an OOM may no longer function correctly, yet the job responsible for updating the DB with the node's last heartbeat still runs. Because the DB shows the node as active, the other cluster members may try to re-join it, which can lead to further issues.
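
      The failure mode described above can be illustrated with a minimal Java sketch. It is not Artifactory's code and the issue does not say how the fix was implemented; the class name HeartbeatGuard, its methods, and the in-memory field standing in for the heartbeat row in the DB are all hypothetical. The idea shown is only that the heartbeat job checks a shared health flag and stops reporting the node as alive once an OutOfMemoryError has been observed anywhere in the process.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: none of these names come from Artifactory. */
final class HeartbeatGuard {
    // Flipped to false once any thread reports an OutOfMemoryError.
    private final AtomicBoolean healthy = new AtomicBoolean(true);
    // Stands in for the "last heartbeat" column of the node's DB row (assumption).
    private final AtomicLong lastHeartbeat = new AtomicLong(-1L);

    /** Call from catch blocks or an uncaught-exception handler. */
    void report(Throwable t) {
        // Walk the cause chain so a wrapped OOM is detected as well.
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof OutOfMemoryError) {
                healthy.set(false);
                return;
            }
        }
    }

    /** Heartbeat job body: writes the timestamp only while the node is healthy. */
    boolean beat(long nowMillis) {
        if (!healthy.get()) {
            return false; // stop advertising this node as active
        }
        lastHeartbeat.set(nowMillis);
        return true;
    }

    boolean isHealthy() {
        return healthy.get();
    }

    long lastHeartbeat() {
        return lastHeartbeat.get();
    }

    public static void main(String[] args) {
        HeartbeatGuard guard = new HeartbeatGuard();
        // Route uncaught errors from any thread into the guard.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> guard.report(error));

        System.out.println("beat accepted: " + guard.beat(System.currentTimeMillis()));
        // Simulate an OOM that some worker caught and wrapped.
        guard.report(new RuntimeException("worker failed", new OutOfMemoryError("simulated")));
        System.out.println("beat accepted after OOM: " + guard.beat(System.currentTimeMillis()));
    }
}
```

      With a guard like this, the heartbeat in the DB goes stale after the OOM, so the other members would see the node as inactive instead of trying to re-join it. A coarser alternative at the JVM level is the HotSpot option -XX:+ExitOnOutOfMemoryError (available from JDK 8u92), which terminates the process on the first OOM; whether either approach matches the actual fix is not stated in this issue.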

            People

            Assignee:
            oferc Ofer Cohen (Inactive)
            Reporter:
            shayb Shay Bagants
            Assigned QA:
            Matan Katz
            Votes:
            1
            Watchers:
            6

              Dates

              Created:
              Updated:
              Resolved: