GemFire handles network outages by using a weighting system to determine whether the
remaining available members have a sufficient quorum to continue as a distributed system.
Individual members are each assigned a weight, and the quorum is determined by
comparing the total weight of currently responsive members to the previous total
weight of responsive members.
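
The weight comparison described above can be illustrated with a minimal sketch. It is not GemFire's implementation: the member names, the weight values, the `loss_threshold` default, and the function names are all hypothetical, chosen only to show how lost weight might be compared against the previous total.

```python
from typing import Dict, Set


def total_weight(weights: Dict[str, int], members: Set[str]) -> int:
    """Sum the weights of the given members."""
    return sum(weights[m] for m in members)


def lost_weight_fraction(weights: Dict[str, int],
                         previous: Set[str],
                         current: Set[str]) -> float:
    """Fraction of the previous total weight held by members that are gone."""
    previous_total = total_weight(weights, previous)
    if previous_total == 0:
        return 0.0  # nothing to lose in an empty previous view
    lost = total_weight(weights, previous - current)
    return lost / previous_total


def quorum_lost(weights: Dict[str, int],
                previous: Set[str],
                current: Set[str],
                loss_threshold: float = 0.5) -> bool:
    """True when the lost weight reaches the (assumed) threshold."""
    return lost_weight_fraction(weights, previous, current) >= loss_threshold


# Hypothetical weights, for illustration only.
example_weights = {"locator1": 3, "server1": 15, "server2": 10, "server3": 10}
previous_view = set(example_weights)
current_view = {"locator1", "server1"}
print(quorum_lost(example_weights, previous_view, current_view))
```

In this example the two missing servers carry 20 of the previous 38 weight units, so the sketch reports a loss of quorum under the assumed 50 percent threshold.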
Your distributed system can split into separate running systems when members lose the
ability to see each other. The typical cause of this problem is a failure in the
network. To guard against this, Pivotal GemFire offers network partitioning
detection so that when a failure occurs, only one side of the system keeps running
and the other side automatically shuts down.
The network partitioning detection feature is only enabled when
enable-network-partition-detection is set to true in gemfire.properties. By
default, this property is set to false. See Configure Pivotal GemFire to Handle Network Partitioning
for details. Quorum weight calculations are always performed and logged by the
locator regardless of this configuration setting.
The overall process for detecting a network partition is as follows:
- The distributed system starts up.
When you start up a distributed system, you typically start the locators
first, then the cache servers, and then other members such as applications or
processes that access distributed system data.
- After the locators start up, the
oldest locator assumes the role of the membership coordinator. Peer
discovery occurs as members come up and locators generate a membership
discovery list for the distributed system. Locators hand out the membership
discovery list as each member process starts up. This list typically
contains a hint about which member is the current membership coordinator.
- Members join and, if necessary,
depart the distributed system:
- Whenever the coordinator is alerted
of a membership change (a member either joins or leaves the distributed
system), it generates a new membership view. The membership view is
generated by a two-phase protocol:
- In the first phase, the
membership coordinator sends out a view preparation message to all
members and waits 12 seconds for a view preparation ack return
message from each member. If the coordinator does not receive an ack
message from a member within 12 seconds, the coordinator attempts to
connect to the member's failure-detection socket and then attempts
to connect to its direct-channel socket. If the coordinator cannot
connect to the member's sockets, it declares the member dead and
starts the process over again with a new view.
- In the second phase, the
coordinator sends out the new membership view to all members that
acknowledged the view preparation message. The coordinator waits
another 12 seconds for an acknowledgment of receiving the new view
from each member. Any members that fail to acknowledge the view are
removed from the view.
- Each time the membership
coordinator sends a view, it calculates the total weight of members in the
current membership view and compares it to the total weight of the previous
membership view. Some conditions to note:
- When the first membership
view is sent out, there are no accumulated losses. The first view
only has additions and usually contains the initial
- A new coordinator may have
a stale view of membership if it did not see the last membership
view sent by the previous (failed) coordinator. If new members were
added during that failure, then the new members may be ignored when
the first new view is sent out.
- If members were removed
during the failover to the new coordinator, then the new coordinator
will have to determine these losses during the view preparation
- If GemFire detects that the total
membership weight has dropped by a configured percentage within a
single membership view change (loss of quorum), then GemFire will declare a
network partition event if network partitioning detection has been enabled
(enable-network-partition-detection set to true). The
coordinator sends a network-partitioned-detected UDP message to all members
(even to the non-responsive ones) and then closes the distributed system
with a ForcedDisconnectException. If a member fails to
receive the message before the coordinator closes the system, the member is
responsible for detecting the event on its own.
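
The two-phase view protocol in the steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not GemFire code: the `Member` and `ViewResult` types and the `run_view_change` function are hypothetical, the 12-second waits are modeled as boolean flags rather than real timers, and a member that misses the preparation ack but is still reachable is assumed to stay in the view for this round.

```python
from dataclasses import dataclass, field
from typing import List

ACK_TIMEOUT_SECONDS = 12  # wait stated in the text; not enforced in this simulation


@dataclass
class Member:
    name: str
    acks_prepare: bool = True       # acks the view preparation message in time
    sockets_reachable: bool = True  # failure-detection or direct-channel socket connects
    acks_view: bool = True          # acks the new view in time


@dataclass
class ViewResult:
    view: List[str]
    declared_dead: List[str] = field(default_factory=list)
    removed_after_send: List[str] = field(default_factory=list)


def run_view_change(candidates: List[Member]) -> ViewResult:
    members = list(candidates)
    declared_dead: List[str] = []

    # Phase 1: prepare the view. A member that neither acks nor accepts a
    # socket connection is declared dead, and preparation starts over.
    while True:
        dead = [m.name for m in members
                if not m.acks_prepare and not m.sockets_reachable]
        if not dead:
            break
        declared_dead.extend(dead)
        members = [m for m in members if m.name not in dead]

    # Phase 2: send the view to members that acked preparation and drop
    # any recipient that fails to ack the view itself.
    recipients = [m for m in members if m.acks_prepare]
    removed = [m.name for m in recipients if not m.acks_view]
    view = [m.name for m in members if m.name not in removed]
    return ViewResult(view, declared_dead, removed)


result = run_view_change([
    Member("a"),
    Member("b", acks_prepare=False, sockets_reachable=False),
    Member("c", acks_view=False),
    Member("d", acks_prepare=False),  # silent but reachable
])
print(result)
```

Here member `b` is declared dead in the first phase, `c` is removed after failing to acknowledge the view, and `d` remains for now because its sockets were still reachable.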
The presumption is that when a network partition is declared, the members that can
generate a quorum of membership will continue operations and assume the role of the
"surviving" side. The surviving members will elect a new coordinator, designate a
lead member, and so on.
Note that a member can fail during view transmission and some other process
can then reuse its fd-sock or direct-channel port, causing a false
positive in the member verification step. This is acceptable because it means that
the machine that hosted the process is still reachable. No network failure has
occurred and the member that did not acknowledge the view preparation message will
be removed in a subsequent view.