  1. Artifactory Binary Repository
  2. RTFACT-17959

Heartbeat not up-to-date after SQL-connection failed on some nodes


    Details

    • Type: Bug
    • Status: Open
    • Priority: 4 - Normal
    • Resolution: Unresolved
    • Affects Version/s: 6.5.0
    • Fix Version/s: None
    • Component/s: High Availability
    • Labels:
    • Environment:

      RedHat-7

    • Severity:
      Medium

      Description

      Hi JFrog Team,
      Following a network (SAN) problem, our PostgreSQL database was no longer reachable from 2 of our 4 nodes.
      After this incident, all services returned to normal, but the heartbeat on both nodes continues to fail. This marks the nodes as "Unavailable" in the HA system, and Artifactory does not attempt to propagate changes to unavailable nodes.
      A workaround is to restart the Artifactory service on both impacted nodes, but I don't understand why the heartbeat does not recover once the nodes are working again.
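To make the expected behaviour concrete, here is a minimal sketch of a heartbeat writer that survives a failed database write and succeeds again once the connection is back. It is purely illustrative and is not Artifactory's implementation: `HeartbeatWriter`, `write_fn`, and `max_age` are hypothetical names, and the staleness threshold is an assumption.

```python
import time


class HeartbeatWriter:
    """Illustrative model only; all names here are hypothetical."""

    def __init__(self, write_fn, clock=time.monotonic):
        # write_fn stands in for "update this node's heartbeat in the database".
        self.write_fn = write_fn
        self.clock = clock
        self.last_success = None

    def beat(self):
        """Attempt one heartbeat; a failure is reported but never fatal."""
        try:
            self.write_fn()
        except Exception:
            # Swallow the error so the next scheduled beat can retry.
            return False
        self.last_success = self.clock()
        return True

    def is_stale(self, max_age):
        """True if no heartbeat has succeeded within max_age seconds."""
        if self.last_success is None:
            return True
        return self.clock() - self.last_success > max_age
```

With a writer like this, a node would be stale only while the database is unreachable and would become fresh again on the first successful beat after recovery, without a service restart. That is the behaviour we expected to see.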

      Our last error on the node is:
      Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
      Caused by: java.net.SocketTimeoutException: connect timed out

      In our ha-node.properties file, we use a custom port:

      # The port that should be used to communicate with this server within the cluster. (Optional)
      membership.port=10001

      Artifactory-HA version 6.5.0 Revision 60500900

      All curl requests to port 8081 succeed: curl http://ip:8081/artifactory/api/system/ping
      All curl requests to port 10001 are refused: curl http://ip:10001
      curl: (7) Failed connect to ip:10001; Connection refused
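The same reachability check can be sketched as a plain TCP probe, which does not assume that the membership port speaks HTTP. This is a minimal sketch: `port_open` and `probe_nodes` are hypothetical helper names, and the address in the example is a placeholder for the real node IPs.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe_nodes(hosts, ports=(8081, 10001)):
    """Probe every host/port pair and return {(host, port): bool}."""
    return {(h, p): port_open(h, p) for h in hosts for p in ports}


if __name__ == "__main__":
    # Placeholder address; replace with the IPs of the cluster nodes.
    for (host, port), ok in probe_nodes(["127.0.0.1"]).items():
        state = "open" if ok else "closed or unreachable"
        print(f"{host}:{port} {state}")
```

Running it against all 4 nodes would show whether port 10001 is refused everywhere or only on the 2 impacted nodes.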

      We run a 4-node HA cluster with this configuration:
      <config version="7">
          <chain>
              <provider id="cache-fs" type="cache-fs">
                  <provider id="sharding-cluster" type="sharding-cluster">
                      <sub-provider id="state-aware" type="state-aware"/>
                      <dynamic-provider id="remote-fs" type="remote"/>
                  </provider>
              </provider>
          </chain>
          <provider id="state-aware" type="state-aware">
              <fileStoreDir>/jdss/data/artifactory/ha</fileStoreDir>
              <zone>local</zone>
          </provider>

          <!-- Shard dynamic remote provider configuration -->
          <provider id="remote-fs" type="remote">
              <zone>remote</zone>
          </provider>
          <provider id="sharding-cluster" type="sharding-cluster">
              <readBehavior>crossNetworkStrategy</readBehavior>
              <writeBehavior>crossNetworkStrategy</writeBehavior>
              <redundancy>3</redundancy>
              <lenientLimit>2</lenientLimit>
              <property name="zones" value="local,remote"/>
          </provider>
      </config>
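As a quick sanity check of the configuration above, the sharding settings can be extracted and compared with the node count. This is a sketch using only the Python standard library, not an official validation: `sharding_settings` is a hypothetical helper, and comparing `redundancy` with the number of nodes is our own assumption about what is worth checking.

```python
import xml.etree.ElementTree as ET

# Copy of the configuration shown above.
CONFIG = """
<config version="7">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding-cluster" type="sharding-cluster">
                <sub-provider id="state-aware" type="state-aware"/>
                <dynamic-provider id="remote-fs" type="remote"/>
            </provider>
        </provider>
    </chain>
    <provider id="state-aware" type="state-aware">
        <fileStoreDir>/jdss/data/artifactory/ha</fileStoreDir>
        <zone>local</zone>
    </provider>
    <provider id="remote-fs" type="remote">
        <zone>remote</zone>
    </provider>
    <provider id="sharding-cluster" type="sharding-cluster">
        <readBehavior>crossNetworkStrategy</readBehavior>
        <writeBehavior>crossNetworkStrategy</writeBehavior>
        <redundancy>3</redundancy>
        <lenientLimit>2</lenientLimit>
        <property name="zones" value="local,remote"/>
    </provider>
</config>
"""


def sharding_settings(xml_text):
    """Return the settings of the top-level sharding-cluster provider as a dict."""
    root = ET.fromstring(xml_text)
    # findall("provider") matches direct children only, so the providers
    # nested inside <chain> are skipped.
    for provider in root.findall("provider"):
        if provider.get("type") != "sharding-cluster":
            continue
        settings = {c.tag: c.text for c in provider if c.tag != "property"}
        for prop in provider.findall("property"):
            settings[prop.get("name")] = prop.get("value")
        return settings
    return {}


if __name__ == "__main__":
    settings = sharding_settings(CONFIG)
    node_count = 4  # from the description: HA cluster with 4 nodes
    print(settings)
    print("redundancy fits node count:", int(settings["redundancy"]) <= node_count)
```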

      Could this be a configuration problem on our side?
      Do you have any idea what the problem is and why the heartbeat does not refresh?

      Regards.

            People

            Assignee:
            Unassigned
            Reporter:
            Fabien Legros
            Votes:
            1
            Watchers:
            4
