Transactions by Region Type

A transaction is managed on a per-cache basis, so multiple regions in the cache can participate in a single transaction. The data scope of a Pivotal GemFire cache transaction is the cache that hosts the transactional data. For partitioned regions, this may be a remote host to the host running the transaction application. Any transaction that includes one or more partitioned regions is run on the member storing the primary copy of the partitioned region data. Otherwise, the transaction host is the same one running the application.
  • The member running the transaction code is called the transaction initiator.

  • The member that hosts the data—and the transaction—is called the transactional data host.

So the transactional data host may be local or remote to the transaction initiator. In either case, when the transaction commits, data distribution is done from the transactional data host in the same way.
Note: If you have consistency checking enabled in your region, the transaction will generate all necessary version information for the region update when the transaction commits. See Transactions and Consistent Regions for more details.

Transactions and Partitioned Regions

In partitioned regions, transaction operations are done first on the primary data store then distributed to other members from there, regardless of which member initializes the cache operation. This is the same as is done for normal cache operations on partitioned regions.

In this figure, M1 runs two transactions.
  • The first, T1, works on data whose primary buckets are stored in M1, so M1 is both initiator and data host for the transaction.
  • The second transaction, T2, works on data whose primary buckets are stored in M2, so M1 is the transaction initiator and M2 is the transactional data host.
Transaction on a Partitioned Region:



The transaction is managed on the data host. This includes the transactional view, all operations, and all local cache event handling. In the figure, when T2 is committed, the cache on M2 is updated and the transaction events distributed throughout the system, exactly as if the transaction had originated on M2.

The first region operation in the transaction determines the transactional data host. All other operations must also work with that as their transactional data host:
  • All partitioned region data managed inside the transaction must use the transactional data host as their primary data store. In the figure above, if transaction T2 tried to put entry W or transaction T1 tried to put entry Z, they would get a TransactionDataNotColocatedException. For information on partitioning your data so it is grouped properly for your transactions, see Understanding Custom Partitioning and Data Colocation. In addition, the data must not be moved during the transaction. Plan any partitioned region rebalancing to avoid rebalancing while transactions are running. See Rebalancing Partitioned Region Data.
  • All non-partitioned region data managed inside the transaction must be available on the transactional data host and must be distributed. Operations on regions with local scope are not allowed in transactions with partitioned regions.

The next figure shows a transaction that uses two partitioned regions and one replicated region. As with the single region example, all local event handling is done on the transactional data host.

For a transaction in these data keys to work, the first operation must be on one of the partitioned regions, to establish M2 as the transactional data host. Running the first operation on a key in the replicated region would establish M1 as the transactional data host, and subsequent operations on the partitioned region data would fail with a TransactionDataNotColocated exception.

Transaction on a Partitioned Region with Other Regions:



Transactions and Replicated Regions

For replicated regions, the transaction and its operations are applied to the local member and the resulting transaction state is distributed to other members according to the attributes of each region.

Note: If possible, use distributed-ack scope for your regions where you will run transactions. The REPLICATE region shortcuts use distributed-ack scope.
The region’s scope affects how data is distributed during the commit phase. Transactions are supported for these region scopes:
  • distributed-ack. Handles transactional conflicts both locally and between members. The distributed-ack scope is designed to protect data consistency. This scope provides the highest level of coordination among transactions in different members. When the commit call returns for a transaction run on all distributed-ack regions, you can be sure that the transaction’s changes have already been sent and processed. In addition, any callbacks in the remote member have been invoked.
  • distributed-no-ack. Handles transactional conflicts locally, less coordination between members. This provides the fastest transactions with distributed regions, but doesn't work for all situations. This scope is appropriate for:
    • Applications with only one writer
    • Applications with multiple writers that write to different data sets
  • local. No distribution, handles transactional conflicts locally. Transactions on regions with local scope have no distribution, but they perform conflict checks in the local member. You can have conflict between two threads when their transactions change the same entry, like object Y in this figure.

Transactions on non-replicated regions (regions that use the old API with DataPolicy EMPTY, NORMAL and PRELOADED) are always transaction initiators, and the transaction data host is always a member with a replicated region. This is similar to the way transactions using the PARTITION_PROXY shortcut are forwarded to members with primary bucket.

Note: When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that the GemFire property conserve-sockets is set to false to avoid distributed deadlocks. An empty region is a region created with the API RegionShortcut.REPLICATE_PROXY or a region with that uses the old API of DataPolicy set to EMPTY.

Conflicting Transactions with Local Scope

When encountering conflicts with local scope, the first transaction to start the commit process "wins." The other transaction’s commit fails with a conflict, and its changes are dropped. In the diagram below, the resulting value for entry "Y" depends on which transaction commits first.

Transactions and Persistent Regions

By default, GemFire does not allow transactions on persistent regions. You can enable the use of transactions on persistent regions by setting the gemfire property gemfire.ALLOW_PERSISTENT_TRANSACTIONS to true.

When executing transactions on persistent regions, we recommend using the TransactionWriter to log all transactions along with a time stamp. This will allow you to recover in the event that all nodes fail simultaneously while a transaction is being committed. You can use the log to recover the data manually.