Overview of Multi-site Caching

A multi-site installation consists of two or more distributed systems that are loosely coupled. Each site manages its own distributed system, but region data is distributed to remote sites using one or more logical connections.

The logical connections consist of a gateway sender in the sending site, and a gateway receiver in the receiving site. In a client/server installation, gateway senders and gateway receivers are configured in the server layer.

Gateway senders and receivers are defined at startup in the distributed system member caches. A site can use serial and/or parallel gateway sender configurations, as described in Gateway Senders.

Consistency for WAN Updates

GemFire ensures that all copies of a region eventually reach a consistent state on all members and clients that host the region, including GemFire members that distribute region events across a WAN.

By default, potential WAN conflicts are resolved using a timestamp mechanism. You can optionally install a custom conflict resolver to apply custom logic when determining whether to apply a potentially conflicting update received over a WAN.

Consistency for Region Updates describes how GemFire ensures consistency within a cluster, in client caches, and when applying updates over a WAN. Resolving Conflicting Events provides more details about implementing a custom conflict resolver for WAN updates.

Discovery for Multi-Site Systems

Each GemFire cluster in a WAN configuration uses locators to discover remote GemFire clusters as well as local GemFire members.

Each locator in a WAN configuration defines a unique distributed-system-id property that identifies the local cluster to which it belongs. A locator uses the remote-locators property to define the addresses of one or more locators in remote GemFire clusters to use for WAN distribution.

When a locator starts up, it contacts each locator that is configured in the remote-locators property to exchange information about the available locators and gateway receivers in the cluster. The locator also shares information about locators and gateway receivers in any other GemFire clusters that have connected to the cluster. Connected clusters can then use the shared gateway receiver information to distribute region events according to their configured gateway senders.

Each time a new locator starts up or an existing locator shuts down, the changed information is broadcast to other connected GemFire clusters.

Note: When you configure a multi-site system, you must use locators for discovery rather than multicast.

Gateway Senders

A GemFire cluster uses a gateway sender to distribute region events to another, remote GemFire cluster. You can create multiple gateway sender configurations to distribute region events to multiple remote clusters, and/or to distribute region events concurrently to another remote cluster.

A gateway sender always communicates with a gateway receiver in a remote cluster. Gateway senders do not communicate directly with other cache server instances. See Gateway Receivers.

GemFire provides two types of gateway sender configurations: serial gateway senders and parallel gateway senders.

Serial Gateway Senders

A serial gateway sender distributes region events from a single GemFire server in the local cluster to a remote GemFire cluster. Although multiple regions can use the same serial gateway for distribution, a serial gateway uses a single logical event queue to dispatch events for all regions that use the gateway sender.

Because a serial gateway sender distributes all of a region's events from a single GemFire member, it provides the most control over ordering region events as they are distributed across the WAN. However, a serial gateway sender provides only a finite amount of throughput for distributing events. As you add more regions and servers to the local cluster, you may need to configure additional serial gateway senders manually and isolate individual regions on specific serial gateway senders to handle the increased distribution traffic.

Parallel Gateway Senders

A parallel gateway sender distributes region events simultaneously from each GemFire server that hosts the region. For a partitioned region, this means that each GemFire server that hosts buckets for the region uses its own logical queue to distribute events for those buckets. As you add new GemFire servers to scale the partitioned region, WAN distribution throughput scales automatically with each new instance of the parallel gateway sender.

A parallel gateway sender uses multiple queues on multiple GemFire members to simultaneously distribute region events to a remote GemFire cluster. Each queue distributes only part of the events for a configured region. For example, for a partitioned region, each queue distributes only those events for the local partition.

Distributed, non-replicated regions can also use a parallel gateway sender for distribution. With a distributed region, GemFire creates a separate gateway sender and queue on each member that hosts the region, and then hashes region events to place them into a distinct queue. This provides a level of scalability for distributed regions that is similar to that provided for partitioned regions.

Note: Replicated regions cannot use a parallel gateway sender.

Although parallel gateway senders provide the best throughput for WAN distribution, they provide less control for ordering events. With a parallel gateway sender, you cannot preserve event ordering for the region as a whole because multiple GemFire servers distribute the regions events at the same time. However, the ordering of events for a given partition (or in a given queue of a distributed region) can be preserved. See Configuring Event Queues.

Gateway Sender Queues

The queue that a gateway sender uses to distribute events to a remote site overflows to disk as needed, in order to prevent the GemFire member from running out of memory. You can configure the maximum amount of memory that each queue uses, as well as the batch size and frequency for processing batches in the queue. You can also configure these queues to persist to disk, so that a gateway sender can pick up where it left off when its member shuts down and is later restarted.

Optionally, a serial gateway sender queue can use multiple threads to dispatch queued events. When you configure multiple threads for a serial gateway sender, the logical queue that is hosted on a GemFire member is divided into multiple physical queues, each with a dedicated dispatcher thread. You can then configure whether those threads dispatch queued events by key, by thread, or in the same order in which events were added to the queue.

Note: You cannot configure multiple dispatcher threads or the ordering policy for a parallel gateway sender queue.

See Configuring Event Queues.

High Availability for Gateway Senders

When a serial gateway sender configuration is deployed to multiple GemFire members, only one "primary" sender is active at a given time. All other serial gateway sender instances are inactive "secondaries" that are available as backups if the primary sender shuts down. GemFire designates the first gateway sender to start up as the primary sender, and all other senders become secondaries. As gateway senders start and shut down in the distributed system, GemFire ensures that the oldest running gateway sender operates as the primary.

A parallel gateway sender is deployed to multiple GemFire members by default, and each GemFire member that hosts primary buckets for a partitioned region actively distributes data to the remote GemFire site. When you use parallel gateway senders, high availability for WAN distribution is provided if you configure the partitioned region for redundancy. With a redundant partitioned region, if a member that hosts primary buckets fails or is shut down, then a GemFire member that hosts a redundant copy of those buckets takes over WAN distribution for those buckets.

Stopping Gateway Senders

The scope of the gateway sender stop operation is the VM on which it is invoked. When you stop a parallel gateway sender using the GatewaySender.stop() or gfsh stop gateway-sender, the gateway sender is stopped on the individual node where this API is called. If the gateway sender is not parallel (serial), then the gateway sender will stop on the local VM, and the secondary gateway sender will become primary and start dispatching events. The gateway sender will wait for GatewaySender.MAXIMUM_SHUTDOWN_WAIT_TIME seconds before stopping itself (by default, this value is set to 0.) You can set this Java system property when starting the server member in gfsh. If the Java system property is set to -1, then the gateway sender process will wait until all events are dispatched from the queue before stopping.

Note: Pivotal recommends that you use extreme caution when stopping parallel gateway senders by using the GatewaySender.stop() API or gfsh stop gateway-sender command.

The API and gfsh command stops the parallel gateway sender in one member, which causes data loss because events to buckets in that member will be dropped by the stopped sender. The partitioned region does not failover in this scenario since the member is still running. Instead, to ensure that the remaining events are sent, shut down the entire member to ensure proper failover of partition region events. When a member with the stopped parallel sender is shut down, the other parallel gateway sender members hosting the partition region become primary and deliver the remaining events. In addition, if the whole cluster is brought down after stopping an individual parallel gateway sender, then events queued on that gateway sender can be lost.

Pausing Gateway Senders

Similar to stopping a gateway sender, the scope of pausing a gateway sender is the VM on which it is invoked. Pausing a gateway sender temporarily stops the dispatching of events from the underlying queue. Note that events are still queued into the queue. In case where the gateway sender is parallel, the gateway sender is paused on the individual node where the GatewaySender.pause() API is called or the gfsh pause gateway-sender command is invoked. The parallel gateway senders on other members can still dispatch events. In case where the paused gateway sender is not parallel (serial) and is not primary, then the primary gateway sender will still continue dispatching events. The batch of events that are in the process of being dispatched are dispatched regardless of the state of the pause operation. We can expect a maximum of one batch of events being received at the gateway receiver even after the gateway senders have been paused.

Gateway Receivers

A gateway receiver configures a physical connection for receiving region events from gateway senders in one or more remote GemFire clusters.

A gateway receiver applies each region event to the same region or partition that is hosted in the local GemFire member. (An exception is thrown if the receiver receives an event for a region that it does not define.)

Gateway senders use any available gateway receiver in the target cluster to send region events. You can deploy gateway receiver configurations to multiple GemFire members as needed for high availability and load balancing.

See Configure Gateway Receivers.