Main Features of Pivotal GemFire

Read about Pivotal GemFire main features and key functionality.

This section summarizes Pivotal GemFire main features and describes key functionality. For more specific information about functionality that has been added to GemFire since the 6.5 release, refer to the Release Notes.

Feature Summary

GemFire includes the following features:
  • Combines redundancy, WAN replication, and a "shared nothing" persistence architecture to deliver fail-safe reliability and performance.
  • Continuous querying to provide active data change notifications.
  • Horizontally scalable to thousands of cache members, with multiple cache topologies to meet different enterprise needs. The cache can be distributed across multiple computers.
  • Asynchronous and synchronous cache update propagation.
  • Delta propagation distributes only the difference between old and new versions of an object (delta) instead of the entire object, resulting in significant distribution cost savings.
  • Reliable asynchronous event notifications and guaranteed message delivery through optimized, low latency distribution layer.
  • Applications run 4 to 40 times faster with no additional hardware.
  • Data awareness and real-time business intelligence. If data changes as you retrieve it, you see the changes immediately.
  • Integration with Spring Framework to speed and simplify the development of scalable, transactional enterprise applications.
  • JTA compliant transaction support.
  • Native support for Java, C++, and C# applications.

High Read-and-Write Throughput

GemFire uses concurrent main-memory data structures and a highly optimized distribution infrastructure to provide more than 10 times the read-and-write throughput of traditional disk-based databases. Applications can make copies of data dynamically in memory through synchronous or asynchronous replication for high read throughput, or partition the data across many GemFire system members to achieve high read-and-write throughput. Data partitioning doubles the aggregate throughput if the data access is fairly balanced across the entire data set. Linear increase in throughput is limited only by the backbone network capacity.

Low and Predictable Latency

GemFire's optimized caching layer minimizes context switches between threads and processes. It manages data in highly concurrent structures to minimize contention points. Communication to peer members is synchronous if the receivers can keep up, which keeps the latency for data distribution to a minimum. Servers manage object graphs in serialized form to reduce the strain on the garbage collector.

GemFire partitions subscription management (interest registration and continuous queries) across server data stores, ensuring that a subscription is processed only once for all interested clients. The resulting improvements in CPU use and bandwidth utilization improve throughput and reduce latency for client subscriptions.

High Scalability

GemFire achieves scalability through dynamic partitioning of data across many members and spreading the data load uniformly across the servers. For "hot" data, you can configure the system to expand dynamically to create more copies of the data. You can also provision application behavior to run in a distributed manner in close proximity to the data it needs.

If you need to support high and unpredictable bursts of concurrent client load, you can increase the number of servers managing the data and distribute the data and behavior across them to provide uniform and predictable response times. Clients are continuously load balanced to the server farm based on continuous feedback from the servers on their load conditions. With data partitioned and replicated across servers, clients can dynamically move to different servers to uniformly load the servers and deliver the best response times.

You can also improve scalability by implementing asynchronous "write behind" of data changes to external data stores like a database. GemFire avoids a bottleneck by queuing all updates in order and redundantly. You can also conflate updates and propagate them in batch to the database.

Continuous Availability

In addition to guaranteed consistent copies of data in memory, applications can persist data to disk on one or more GemFire members synchronously or asynchronously, by using GemFire's "shared nothing disk architecture". All asynchronous events (store-forward events) are redundantly managed in at least two members such that if one server fails, the redundant one takes over. All clients connect to logical servers, and the client fails over automatically to alternate servers in a group during failures or when servers become unresponsive.

Reliable Event Notifications

Publish/subscribe systems offer a data distribution service where new events are published into the system and routed to all interested subscribers in a reliable manner. Traditional messaging platforms focus on message delivery, but often the receiving applications need access to related data before they can process the event. This often requires them to access a standard database when the event is delivered, making the subscriber limited by the speed of the database.

GemFire offers data and events through a single system. Data is managed as objects in one or more distributed data regions, similar to tables in a database. Applications simply insert, update, or delete objects in data regions, and the platform delivers the object changes to the subscribers. The subscriber receiving the event has direct access to the related data in local memory or can fetch the data from one of the other members through a single hop.

Continuous Querying

In messaging systems like JMS, clients subscribe to topics and queues. Any message delivered to a topic is sent to the subscriber. GemFire allows applications to express complex interest using the OQL query language and referred to as continuous querying.

Parallelized Application Behavior on Data Stores

You can execute application business logic in parallel on the GemFire members. GemFire's data-aware function execution service permits execution of arbitrary data-dependent application functions on the members where the data is partitioned for locality of reference and scale.

By co-locating the relevant data and parallelizing the calculation, you dramatically increase overall throughput. More importantly, the calculation latency is inversely proportional to the number of members on which it can be parallelized.

The fundamental premise is to route the function transparently to the application that carries the data subset required by the application function and avoid moving data around on the network. Application function can be executed on only one member, executed in parallel on a subset of members or in parallel across all members. This programming model is similar to the popular Map-Reduce model from Google. Data-aware function routing is most appropriate for applications that require iteration over multiple data items (such as a query or custom aggregation function).

Shared-Nothing Disk Persistence

Each GemFire system member manages data on disk files independent of other members. Failures in disks or cache failures in one member do not affect the ability of another cache instance to operate safely on its disk files. This "shared nothing" persistence architecture allows applications to be configured such that different classes of data are persisted on different members across the system, dramatically increasing the overall throughput of the application even when disk persistence is configured for application objects.

Unlike a traditional database system, GemFire does not manage data and transaction logs in separate files. All data updates are appended to files that are similar to transactional logs of traditional databases. You can avoid disk seek times if the disk is not concurrently used by other processes, and the only cost incurred is the rotational latency. Even some of the best disk technology today takes about 2ms to seek to a track.

Multisite Data Distribution

Scalability problems can result from data sites being spread out geographically across a WAN. GemFire offers a novel model to address these topologies, ranging from a single peer-to-peer cluster to reliable communications between data centers across the WAN. This model allows distributed systems to scale out in an unbounded and loosely-coupled fashion without loss of performance, reliability or data consistency.

At the core of this architecture is the gateway sender configuration used for distributing region events to a remote site. You can deploy gateway sender instances in parallel, which enables GemFire to increase the throughput for distributing region events across the WAN. You can also configure gateway sender queues for persistence and high availability to avoid data loss in the case of a member failure.

Reduced Cost of Ownership

You can configure caching in tiers. The client application process can host a cache locally (in memory and overflow to disk) and delegate to a cache server farm on misses. Even a 30 percent hit ratio on the local cache translates to significant savings in costs. The total cost associated with every single transaction comes from the CPU cycles spent, the network cost, the access to the database, and intangible costs associated with database maintenance. By managing the data as application objects, you avoid the additional cost (CPU cycles) associated with mapping SQL rows to objects.

Single-Hop Capability for Client/Server

Clients can send individual data requests, such as put and delete, in a single hop, directly to the server holding the data key. The clients store metadata about bucket locations for partitioned region data. This feature improves performance and client access to partitioned regions in the server tier.

Heterogeneous Data Sharing

C#, C++ and Java applications can share application business objects without going through a transformation layer such as SOAP or XML. The server side behavior, though implemented in Java, provides a unique native cache for C++ and .NET applications. Application objects can be managed in the C++ process heap and distributed to other processes using a common "on-the-wire" representation for objects. A C++ serialized object can be directly deserialized as an equivalent Java or C# object. A change to a business object in one language can trigger reliable notifications in applications written in the other supported languages.

Client/Server Security

GemFire supports running multiple, distinct users in client applications. This feature accommodates installations where GemFire clients are embedded in application servers, and each application server supports data requests from many users. Each user may be authorized to access a small subset of data on the servers, as in a customer application where each customer can access only their own orders and shipments. Each user in the client connects to the server with its own set of credentials, and is given its own access authorization to the server cache.

Client/server communication has increased security against replay attacks. The server now sends the client a unique, random identifier with each response, to be used in the next client request. Because of the identifier, even a repeated client operation call is sent as a unique request to the server.