GemFire solutions architects need to estimate resource requirements for meeting application performance, scalability and availability goals.
The information here is only a guideline, and assumes a basic understanding of GemFire. While no two applications or use cases are exactly alike, the information here should be a solid starting point, based on real-world experience. Much like with physical database design, ultimately the right configuration and physical topology for deployment is based on the performance requirements, application data access characteristics, and resource constraints (i.e., memory, CPU, and network bandwidth) of the operating environment.
The following guidelines should provide a rough estimate of the amount of memory consumed by your system. A worksheet is available to help calculate your capacity using this information.
The memory calculated for keys, entries (objects), and their region overhead can be divided by the number of members of the distributed system only for data placed in partitioned regions. For other regions, the calculation applies to each member that hosts the region. Memory used by sockets, threads, and the small amount of application overhead for GemFire is per member.
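As a rough illustration of this rule, the sketch below divides an estimated region footprint across members for a partitioned region and charges the full amount to each hosting member otherwise. The function name and parameters are hypothetical, and the partitioned case ignores redundant copies, which this section does not cover.

```python
def per_member_region_bytes(total_region_bytes, member_count, partitioned):
    """Estimate the share of a region's memory held by one member.

    Hypothetical helper: only partitioned-region data is divided among
    members; any other region costs the full amount on each member that
    hosts it. Redundant copies are not modeled.
    """
    if member_count < 1:
        raise ValueError("member_count must be at least 1")
    if partitioned:
        return total_region_bytes / member_count
    return total_region_bytes


# Illustrative numbers: 12 GB of keys, values, and entry overhead, 4 members.
total = 12 * 1024**3
partitioned_share = per_member_region_bytes(total, 4, partitioned=True)
replicated_share = per_member_region_bytes(total, 4, partitioned=False)
```

Socket, thread, and application overhead would then be added per member on top of either figure.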
For each entry added to a region, the GemFire cache API consumes a certain amount of memory to store and manage the data. This overhead is required even when an entry is overflowed or persisted to disk. Thus objects on disk take up some JVM memory, even when they are paged to disk. The Java cache overhead introduced by a region, using a 32-bit JVM, can be approximated as listed below.
Actual memory use varies based on a number of factors, including the JVM you are using and the platform you are running on. For 64-bit JVMs, the usage will usually be larger than with 32-bit JVMs. As much as 80% more memory may be required for 64-bit JVMs, due in large part to the fact that addresses (object references) are 64 bits rather than 32 bits. The companion spreadsheet does provide for 64-bit JVMs, but it is necessarily an approximation due to the platform and JVM issues mentioned above.
There are several additional considerations for calculating your memory requirements:
Objects in GemFire are serialized for storage into partitioned regions and for all distribution activities, including moving data to disk for overflow and persistence. For optimum performance, GemFire tries to reduce the number of times an object is serialized and deserialized, so your objects may be stored in serialized or non-serialized form in the cache.
This table gives estimates for the cache overhead in a 32-bit JVM. The overhead is required even when an entry is overflowed or persisted to disk. Actual memory use varies based on a number of factors, including the JVM type and the platform you run on. For 64-bit JVMs, the usage will usually be larger than with 32-bit JVMs and may be as much as 80% more.
|When calculating cache overhead...||You should add...|
|For each region (Note: Memory consumption for object headers and object references can vary for 64-bit JVMs, different JVM implementations, and different JDK versions.)||81 bytes per entry|
|And concurrency checking is enabled (enabled by default)||16 bytes per entry|
|And statistics are enabled for the member||16 bytes per entry|
|And the region is partitioned||16 bytes per entry|
|And the region is persisted and/or overflowed||40 bytes per entry|
|And the region has an LRU eviction controller||16 bytes per entry|
|And the region has global scope||90 bytes per entry|
|And the region has entry expiration configured||147 bytes per entry|
|For each optional user attribute||52 bytes per entry|
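The table can be applied mechanically, as in the following minimal sketch. The byte figures are copied from the table; the dictionary keys, function name, and the 10 million entry count are hypothetical, and the 64-bit figure simply applies the "as much as 80% more" upper bound mentioned above.

```python
# Per-entry overhead in bytes for a 32-bit JVM, copied from the table.
BASE_PER_ENTRY = 81
OPTIONAL_OVERHEAD = {  # key names are illustrative, not GemFire settings
    "concurrency_checks": 16,
    "statistics": 16,
    "partitioned": 16,
    "persisted_or_overflowed": 40,
    "lru_eviction": 16,
    "global_scope": 90,
    "entry_expiration": 147,
}
PER_USER_ATTRIBUTE = 52


def entry_overhead_bytes(features, user_attributes=0):
    """Return the estimated cache overhead for one entry on a 32-bit JVM."""
    total = BASE_PER_ENTRY + PER_USER_ATTRIBUTE * user_attributes
    for name in features:
        total += OPTIONAL_OVERHEAD[name]
    return total


# A partitioned, persisted region with concurrency checking left enabled.
per_entry = entry_overhead_bytes(
    ["concurrency_checks", "partitioned", "persisted_or_overflowed"]
)
region_overhead = per_entry * 10_000_000      # overhead for 10 million entries
upper_bound_64bit = region_overhead * 18 // 10  # 80% more, integer arithmetic
```

This covers only the cache overhead; the size of the keys and values themselves has to be added separately.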
GemFire's JMX management and monitoring system contributes to memory overhead and should be accounted for when establishing the memory requirements for your deployment. Specifically, the memory footprint of any processes (such as locators) that are running as JMX managers can increase.
For each resource in the distributed system that is being managed and monitored by the JMX Manager (for example, each MXBean such as MemberMXBean, RegionMXBean, DiskStoreMXBean, LockServiceMXBean and so on), you should add 10 KB of required memory to the JMX Manager node.
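A quick way to apply the 10 KB rule is to count the managed resources and multiply, as sketched below. The resource counts are made-up illustrative values, the function name is hypothetical, and 1 KB is assumed to mean 1024 bytes.

```python
JMX_BYTES_PER_RESOURCE = 10 * 1024  # 10 KB per managed resource (1 KB = 1024 bytes assumed)


def jmx_manager_overhead_bytes(resource_counts):
    """Sum the extra JMX Manager memory for the given MXBean counts."""
    return JMX_BYTES_PER_RESOURCE * sum(resource_counts.values())


# Hypothetical deployment: counts of each kind of managed resource.
overhead = jmx_manager_overhead_bytes(
    {
        "MemberMXBean": 20,
        "RegionMXBean": 50,
        "DiskStoreMXBean": 10,
        "LockServiceMXBean": 5,
    }
)
```

With these counts, 85 resources add roughly 850 KB to the JMX Manager node.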
Objects in GemFire are serialized for storage into partitioned regions and for all distribution activities, including overflow and persistence to disk. For optimum performance, GemFire tries to reduce the number of times an object is serialized and deserialized. Because of this, your objects may be stored in serialized form or non-serialized form in the cache. To do capacity planning for your data, therefore, use the larger of the serialized and deserialized sizes. For example, if your object classes are DataSerializable, the non-serialized form will generally be the larger of the two.
In addition to better performance, GemFire PDX serialization can provide significant space savings over Java Serializable. In some cases we have seen savings of up to 65%, but the savings vary depending on the domain objects. Of the available options, PDX serialization is the most likely to provide the greatest overall space savings. DataSerializable produces a more compact serialized form, but it requires that objects be deserialized on access, which should be taken into account. PDX serialization, on the other hand, does not require deserialization for most operations, so it may provide greater space savings in practice.
In any case, consider the kinds and volumes of operations performed on the server side when choosing a serialization method, because GemFire has to deserialize data for some types of access. For example, if a function invokes a get operation on the server side, the value returned from the get operation will be deserialized in most cases (the only time it will not be deserialized is when PDX serialization is used and the read-serialized attribute is set). The only way to find out the actual overhead is to run tests and examine the memory usage.
|String||(String type + length (3 to 5 bytes)) + String.length|
|Domain Object||9 bytes (for PDX header) + object serialization length (total all member fields) + 1 to 4 extra bytes (depends on the total size of Domain object)|
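The two rows above can be turned into a low/high size range, as in this sketch. The function names are hypothetical, String.length is assumed to count one byte per character, and string fields inside a domain object are assumed to carry the same per-string overhead as standalone strings; measure real objects before relying on the result.

```python
def pdx_string_bytes(length):
    """Return (low, high) serialized size for a string of the given length.

    Type plus length takes 3 to 5 bytes, followed by the string itself.
    """
    return (3 + length, 5 + length)


def pdx_object_bytes(field_ranges):
    """Return (low, high) serialized size for a domain object.

    field_ranges is a list of (low, high) sizes, one per member field.
    Adds the 9-byte PDX header and the 1 to 4 extra bytes that depend on
    the total object size.
    """
    low = 9 + sum(lo for lo, _ in field_ranges) + 1
    high = 9 + sum(hi for _, hi in field_ranges) + 4
    return (low, high)


# Hypothetical object with two string fields of 10 and 30 characters.
fields = [pdx_string_bytes(10), pdx_string_bytes(30)]
size_range = pdx_object_bytes(fields)
```

For capacity planning, the high end of the range is the safer number to carry forward.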
Servers always maintain two outgoing connections to each of their peers. So for each peer a server has, there are four total connections: two going out to the peer and two coming in from the peer.
The server threads that service client requests also communicate with peers to distribute events and forward client requests. If the server's GemFire connection property conserve-sockets is set to true (the default), these threads use the already-established peer connections for this communication.
Each client connection takes one server socket on a thread that handles the connection. For partitioned regions, the server also acts as a proxy to get results or to execute the function service on behalf of the client. If conserve-sockets is set to false, this results in the server opening a new socket to each peer, so N sockets are opened, where N is the number of peers. A large number of clients simultaneously connecting to a large set of peers with a partitioned region, with conserve-sockets set to false, can cause a huge amount of memory to be consumed by sockets. Set conserve-sockets to true in these instances.
32,768 bytes per socket (configurable)
Default value per socket should be set to a number > 100 + sizeof (largest object in region) + sizeof (largest key)
|If server (for example, if there are clients that connect to it)||= (lesser of max-threads property on server or max-connections) * (socket buffer size + thread overhead for the JVM)|
|Per member of the distributed system, if conserve-sockets is set to true||4 * number of peers|
|Per member, if conserve-sockets is set to false||4 * number of peers hosting that region * number of threads|
|If the member hosts a partitioned region, conserve-sockets is set to false, and it is a server (this is cumulative with the above)||=< max-threads * 2 * number of peers (Note: the actual number is 2 * current number of clients connected * number of peers. Each connection spawns a thread.)|
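The socket formulas in this table can be combined into a small estimate, sketched below. The function names and all input values are hypothetical, the buffer size of 32,768 is taken to be in bytes, and `peers` stands for the number of peers hosting the region in the conserve-sockets=false case.

```python
def min_socket_buffer_bytes(largest_value_bytes, largest_key_bytes):
    """Smallest size satisfying: size > 100 + largest object + largest key."""
    return 100 + largest_value_bytes + largest_key_bytes + 1


def peer_socket_count(peers, conserve_sockets, threads=1):
    """Peer-to-peer sockets on one member, following the table rows."""
    if conserve_sockets:
        return 4 * peers
    return 4 * peers * threads


def partitioned_server_extra_sockets(max_threads, peers):
    """Upper bound on extra sockets for a server hosting a partitioned
    region when conserve-sockets is false (cumulative with the above)."""
    return max_threads * 2 * peers


def client_handling_bytes(max_threads, max_connections, buffer_bytes, thread_bytes):
    """Memory for servicing clients: the lesser of the two limits times
    (socket buffer size + per-thread overhead)."""
    return min(max_threads, max_connections) * (buffer_bytes + thread_bytes)


# Illustrative server with 9 peers and 50 threads talking to peers.
shared = peer_socket_count(9, conserve_sockets=True)
unshared = peer_socket_count(9, conserve_sockets=False, threads=50)
worst_case = unshared + partitioned_server_extra_sockets(100, 9)
worst_case_buffer_bytes = worst_case * 32_768
buffer_floor = min_socket_buffer_bytes(20_000, 64)
client_bytes = client_handling_bytes(100, 800, 32_768, 256 * 1024)
```

In this example, turning conserve-sockets off raises the peer socket count from 36 to as many as 3,600, which is the effect the text above warns about.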
Queue memory is consumed per server and depends on whether you limit the queue size. If you do, you can specify the number of megabytes or the number of entries allowed before the queue overflows to disk. When possible, entries on the queue are references, to minimize memory impact. The queue consumes memory not only for the key and the entry but also for the client ID or thread ID and for the operation type. Since you can limit the queue to as little as 1 MB, this number is completely configurable and thus there is no simple formula.
|1 MB +|
|GemFire classes and JVM overhead||Roughly 50MB|
Each concurrent client connection into a server results in a thread being spawned, up to the max-threads setting. After that, a thread services multiple clients, up to the max-clients setting.
|There is a thread stack overhead per connection (at a minimum 256 KB to 512 KB; on many JVMs you can set it as low as 128 KB.)|
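Thread stack memory can be estimated from the connection count and the stack size, as in this sketch. The function name and the client counts are hypothetical; it assumes one thread per client connection until max-threads is reached, as described above, and 1 KB = 1024 bytes.

```python
def thread_stack_bytes(concurrent_clients, max_threads, stack_kb=256):
    """Estimate stack memory for the threads that service client connections.

    One thread is spawned per client connection until max_threads is
    reached; beyond that, existing threads are shared among clients.
    """
    threads = min(concurrent_clients, max_threads)
    return threads * stack_kb * 1024


# 500 clients against max-threads of 100, with 512 KB stacks.
capped = thread_stack_bytes(500, 100, stack_kb=512)
# 60 clients, below the cap, with the 256 KB minimum.
uncapped = thread_stack_bytes(60, 100)
```

The result is added to the roughly 50 MB of GemFire class and JVM overhead listed above.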