General
What is an Enterprise Data Fabric (EDF)?
Also referred to as a data grid, an Enterprise Data Fabric is a distributed, memory-based data management platform that pools memory (and CPU, network and, optionally, local disk) across multiple processes on multiple nodes to manage live application data and behavior. For a detailed description of the Enterprise Data Fabric, please review the page 'What is an Enterprise Data Fabric (EDF)?'
What is GemFire Enterprise (GFE)?
GemFire Enterprise (GFE) is a high-performance, distributed data management platform that serves as the core component of the GemFire Enterprise Data Fabric (EDF). GFE supports multiple layouts and topologies, allowing it to be easily incorporated into various architectures such as J2EE, .NET, Grid, SOA and web portals while providing extremely high performance, throughput, and scalability. Learn more about GemFire Enterprise here.
Where can I get GemFire Enterprise?
You can download a current version of GemFire Enterprise (GFE) after registering on our website at http://www.gemstone.com/download
How can I install GemFire Enterprise (GFE)?
Detailed installation instructions are provided with each download. See 'Where can I get GemFire Enterprise?' above for details on downloading.
Please note that, for Java, GFE is essentially a series of Java library (.jar) files. The GemFire distribution includes additional Java libraries, .dll files and .so files that are used by the monitoring and management system. The .dll files and .so files, in particular, are used to collect Operating System statistics concurrently with cluster statistics. You do not necessarily need all of these libraries for a development system. Installing GemFire Server does not modify any operating system environments or registries. The installation step for the GFE server (Java) executes the GemFire click license and unpacks the distribution file.
For more information on the installation and requirements for the GemFire Native Client, please refer to the Native Client documentation.
What is a Distributed System (DS)?
A Distributed System (DS) is formed when a set of GFE instances are clustered, typically over a LAN, to provide the collective data storage and replication behavior (although a single process may run as its own Distributed System). The term 'Distributed System' is used throughout the documentation to refer to the set of processes that tightly correlate activity with each other and that are capable of providing replication and high availability services to each other. Clients are not considered part of a Distributed System. Distributed Systems may be loosely coupled to each other using Gateways. Multiple Distributed Systems may operate on the same node.
How do my applications access GFE?
Typically, your applications use language-appropriate APIs to access GFE. They integrate with the appropriate GFE library and make API calls in the language of your choice (Java, C++, C#/.NET). For supported platforms and languages, see the platform support matrix.
Can I use GFE as my primary or sole data source (data of record)?
Yes, of course! Make sure to define and implement an appropriate strategy for recovering from catastrophic failure. GFE provides various configurable options in this area.
What can I store in GFE?
The basic element of storage within GFE is the object. You can store any data in GFE by defining appropriate objects to represent it. You can store data blobs, Java, C++ or C#/.NET objects, or any binary data.
How can I use GFE as a database?
GemFire provides the same basic functionality as a database: data access, querying, indexing, server-side functions, transactions and archival persistence. Because GFE is designed to run in the memory space of applications, it exposes data as ready-formed objects, rather than disjointed tables.
While GFE is often used as a replacement or front for an OLTP database, it is not a suitable replacement for an OLAP database. This is because the data structures in OLAP databases (data warehouses) are optimized for ad-hoc queries, which is significantly different from the way both GFE and OLTP databases store data.
Where can I run GemFire Enterprise (GFE)?
GemFire Enterprise can be run on a variety of platforms including Linux, Solaris and Windows. For details on system requirements, please refer to the GemFire Enterprise (GFE) system administration guide.
Why do I want to use GFE?
GFE is being used to solve performance and scalability bottlenecks as well as to overcome availability and resiliency limitations in a variety of architectures and environments. Please refer to the section 'What problems does GemFire solve?' below to see how you can benefit by using GFE.
What problems does GemFire solve?
GemFire is designed for many diverse data management situations, but is especially useful for high-volume, latency-sensitive, mission-critical, transactional systems. Below is a list of production use cases where GemFire is utilized:
- Distributed Object caching
- Pluggable Database caching
- Distributed data and event management layer for Event Driven Architectures
- Continuous view maintenance for GUI clients
- Continuous Analytics on streaming data for Event Driven Architectures
- Guaranteed Messaging and Data Distribution
- Session state management in J2EE environments
- Data Grid for Grid Deployments
- Web Services Caching for SOA architectures
- High Performance Data Integration Cache
- Real Time Alerting
- File Transport
Standards, Platforms and Language Support
What OS platform(s) does your software support?
For the supported platform matrix please refer to the platform support matrix here.
What languages are supported by GFE?
GFE supports Java, C++ and C#/.NET with full interoperability between these languages. This is done in a highly efficient manner with wire-level protocol compatibility between Java, C++ and C#/.NET.
The GFE server side is implemented in Java, and as such, server-side pluggable code (cache plug-ins and functions) is only supported in Java.
If required, third-party software such as Codemesh can be used to implement server-side pluggable code in C++/C#.
Is your solution JSR 107 (JCache) compliant?
Yes, the implementation has been based on the publicly visible initial JSR document.
Architecture
Describe GFE's architecture.
Please review the GemFire architecture described here.
What is a data region in GemFire Enterprise?
A data region is similar to a Map that stores data as key-value pairs. A region can be used to store one type of object (e.g. Employee) where all keys are of the same type (e.g. employeeId) or it can store objects of multiple types (e.g. Company, Department and Employee). The API of a data region is similar to that of a Map: data is stored and retrieved using the put() and get() methods respectively. Additionally, GemFire Enterprise provides query capabilities where multiple regions can be joined together in a query to retrieve objects of interest.
It often helps to think of GFE Regions as similar to RDBMS Tables, because the relationship between a GFE Cache and a GFE Region is similar to that between an RDBMS Database and an RDBMS Table. GFE Regions, however, are much more flexible than RDBMS Tables.
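The Map analogy can be illustrated with a plain Java collection. The following is only a sketch: it uses java.util.concurrent.ConcurrentHashMap as a stand-in for a region and does not call any GemFire API; the class and method names (RegionAnalogy, newEmployeeRegion) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a data region: a plain java.util.Map, not the GemFire API.
class RegionAnalogy {
    // Creates an empty "region" keyed by employeeId (hypothetical helper).
    static Map<Integer, String> newEmployeeRegion() {
        return new ConcurrentHashMap<>();
    }

    public static void main(String[] args) {
        Map<Integer, String> employees = newEmployeeRegion();
        employees.put(1001, "Ada");              // store a value under its key
        String name = employees.get(1001);       // retrieve it by key
        System.out.println(name);
        System.out.println(employees.get(9999)); // an absent key yields null
    }
}
```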
Can data be partitioned across nodes? If so, how is this configured? Are requests for data handled and routed automatically to the correct partition?
When using GemFire Enterprise's Partitioned Region technology, GemFire Enterprise will automatically distribute a large data set across multiple cache servers. A data update operation in application code uses the same API as it would with a static data storage configuration, but behind the scenes, GemFire Enterprise's distributed hashing algorithm very efficiently manages the allocation of data to different processes. User-defined policies/configurations control the memory management and redundancy (for high availability) options of the partitioned regions. Data redundancy is also maintained, and the loss of any one server can be made to be transparent to application code running within the cluster. When a failure does occur in a process that is part of a Partitioned Region with data redundancy, GemFire Enterprise automatically selects the least-loaded server to assume the failed server's responsibilities. As with a plain full mirror, the newly selected process pulls its image of data from other peers in the cluster without interrupting concurrent processing activity.
GemFire Enterprise also provides disjoint distribution across cache nodes. Each member cache manages a subset of the entire data set (i.e. Region's data). Member caches pull data from other nodes on demand, leaving only the data most frequently used by the application in the local cache node. This pull on-demand model is another way to partition the data.
Can distributed caches be partitioned according to keys? Can a new partitioning algorithm be plugged in? Do partitioned keys support warm startup?
The dynamic cache partitioning (Partitioned Regions) implementation uses a hashing algorithm to assign each key to a bucket, and buckets are automatically mapped to cache nodes. Applications have the option to specify an 'edge' cache on any cache node, so that frequently accessed data items are readily available locally. Partitioned caching also provides the option of built-in, transparent redundancy/failover. By default, GemFire uses a hashing policy in which the key is hashed to determine a target 'bucket', and the buckets are then mapped to the members that store the data. The physical location of the key-value pair is abstracted away from the application, so the application does not control the location of the data. Custom partitioning allows applications to control colocation of related data entries. GemFire also allows the configuration of entry colocation across multiple partitioned regions.
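The two-step mapping described above (key to bucket, bucket to member) can be sketched in a few lines of Java. This is an illustration under stated assumptions, not GemFire's actual algorithm: the modulo hash and the round-robin assignment of buckets to members are assumptions, and the BucketRouting class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative key -> bucket -> member routing; not GemFire's implementation.
class BucketRouting {
    private final int bucketCount;
    private final List<String> members;

    BucketRouting(int bucketCount, List<String> members) {
        if (bucketCount <= 0 || members.isEmpty()) {
            throw new IllegalArgumentException("need at least one bucket and one member");
        }
        this.bucketCount = bucketCount;
        this.members = new ArrayList<>(members);
    }

    // Step 1: hash the key to a bucket. floorMod keeps the result
    // non-negative even when hashCode() is negative.
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Step 2: map the bucket to a member (assumed round-robin assignment).
    String memberFor(Object key) {
        return members.get(bucketFor(key) % members.size());
    }
}
```

Because the application only calls memberFor-style routing indirectly through the region API, the physical location of an entry stays hidden; a custom partitioning scheme would replace step 1 with a function of a shared routing value so that related entries land in the same bucket.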
Does GemFire support 64 bit addressing?
Yes. The product has been tested and certified for 64-bit environments. For the supported 64-bit platforms, see the support matrix.
Can the maximum size of the cache be configured? If so, does your solution provide paging of the cache when it's full?
Yes. Capacity controllers can be installed on any region to limit the amount of memory consumed by that data region. GemFire comes with multiple built-in capacity controllers. A HeapLRUCapacityController controls the contents of a region based on the percentage of Java VM heap memory that is currently being used. If the percentage of Java VM heap memory in use exceeds the given percentage, then the least recently used entry of the region is offloaded to disk or evicted from the cache. GemFire disk persistence (for disaster recovery or overflow) is highly optimized and extremely fast.
Does GemFire provide LAN/MAN/WAN replication?
GemFire supports LAN/MAN/WAN environments through a variety of topologies and consistency policies. GemFire offers a novel model to address these topologies ranging from a single cluster all the way to replication across multiple data centers across a WAN. This model allows distributed systems to potentially scale-out in an unbounded and loosely coupled fashion, without loss of performance and data consistency. At the core of this architecture is a gateway hub/gateway model to connect and configure distributed systems/sites in a loosely coupled fashion. Each GemFire distributed system can assign a process as its gateway hub, which contains multiple gateways that connect to other Distributed Systems. Backup gateways and gateway hubs can be set up and configured to handle automatic fail-over.
The list of configuration options for data distribution and replication includes optimistic, pessimistic, transactional, store-and-forward guaranteed delivery, and automated management of 'slow receiver' processes. If configured, GemFire guarantees event callback notifications in any failover/failback scenario in addition to assuring cache consistency. All these replication options are configured via the cache.xml file.
For replicating data across many cache instances within a LAN, GemFire offers the following options:
- 'Replication on demand': Data object is replicated to where it's used. (A 'PULL' model). In this model, the object resides only in the member cache that originally created it. Objects arrive to other cache members only when the connected applications request the object. The object is lazily pulled from other member caches. Once the object arrives, it will automatically receive updates to the object as long as the member cache retains interest in the object.
- 'Key replication': Only the keys of the cached objects are replicated. (A 'PUSH' model). This model can preserve network bandwidth and be used for low-bandwidth networks.
- 'Total replication': All data is replicated. (A 'PUSH' model).
Can the number of replicas be configured?
Yes. The selective mirroring model in GemFire permits applications to dynamically add nodes with mirrored cache regions to increase the availability of the cached data. For instance, an application configured with just one mirror could be expanded dynamically to increase the number of mirrors by simply adding a cache member with mirroring turned ON. GemFire also supports dynamic cache partitioning, allowing you to 'stripe' data across multiple processes and dynamically add capacity as well as redundancy.
Does GemFire fail-over automatically to a backup node? Does it provide fail-back to the primary node?
Yes and Yes.
There are two key logical components to deal with: failure detection and actual failover. Failure detection is managed through GemFire Enterprise's Group Membership Services, which monitors all connections through various mechanisms including heartbeats. Configuration options let you tune failure-detection sensitivity by choosing timeout, retry interval, and retry count parameters. Once a failure is determined, failover is nearly instantaneous as GemFire maintains open connections to hot-backup servers at all times.
Failover/Failback is generally transparent to application logic.
Can GemFire be dynamically configured with changes in the cluster membership? Is this re-configuration automatic?
Yes. Membership discovery is handled either through an IP multicast channel or through a TCP/IP locator service (when multicast is not an option). New member information is automatically relayed to all members. The Distributed System uses a heartbeat mechanism to monitor the health of the entire system. All members are notified of a sudden member departure once the departure has been confirmed. Locks, connections, and other resources owned by the departed member are automatically released.
Does your solution need to be backed by a database?
No.
For Replicated regions, GemFire has a built-in, high-performance persistence mechanism. Any metadata required during recovery from a failure is automatically obtained either from the cache XML descriptor or from other connected members.
For Partitioned regions, partition replicates are used to ensure that multiple coherent copies of the data are stored on multiple machines.
For database integration, GemFire provides a write-through and a guaranteed write-behind API callback capability where you can place O/R mapping and other database integration code.
Performance and Scalability
With increased work load, does GemFire Enterprise provide linear throughput with consistent latency?
GemFire Enterprise delivers consistent end-to-end latency at a throughput of thousands of messages per second. This has been demonstrated in benchmarks produced by the Securities Technology Analysis Center, LLC (STAC®), an independent provider of solution design, benchmarking, and validation services in the securities industry. The complete Data Transactions benchmark can be accessed here: http://www.gemstone.com/pdf/GemCache_Data_Transactions_STAC_Benchmark.pdf
How does GemFire Enterprise handle excessive data that cannot fit into the total memory of the distributed system?
Users can programmatically and declaratively (via xml file) manage the amount of memory a region consumes through data entry and region expiration settings and with least recently used (LRU) eviction settings. Entry expiration allows removal of data that has not been updated or accessed for a specified amount of time. When eviction is configured, GemFire Enterprise monitors region size or entry count and, when capacity is reached, stops region growth by removing or overflowing to disk the stalest, LRU entries. If it is necessary to potentially recall the excessive data, overflow or write-through features should be used.
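As a rough illustration of entry-count LRU eviction with overflow, the sketch below builds on java.util.LinkedHashMap in access order. It is not GemFire code: the LruOverflowRegion class is hypothetical, and an in-memory HashMap stands in for the disk store.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative entry-count LRU with overflow; a HashMap stands in for disk.
class LruOverflowRegion<K, V> {
    private final Map<K, V> overflow = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    LruOverflowRegion(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxEntries) {
                    // Capacity reached: move the LRU entry to the overflow store.
                    overflow.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        overflow.remove(key);   // a fresh value supersedes any overflowed copy
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);          // also marks the entry as recently used
        if (value == null) {
            value = overflow.remove(key);   // recall an overflowed entry, if any
            if (value != null) {
                memory.put(key, value);     // may push another entry to overflow
            }
        }
        return value;
    }

    boolean inMemory(K key) { return memory.containsKey(key); }
    boolean overflowed(K key) { return overflow.containsKey(key); }
}
```

With plain eviction instead of overflow, the evicted entry would simply be dropped in removeEldestEntry, which is why overflow or write-through is needed when the data may have to be recalled.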
Security
How does GFE address my security needs?
GemFire provides an authentication and authorization framework on the server side to allow applications to secure access to the data containers.
GemFire Enterprise can be configured to authenticate peer system members, clients, and remote gateways. GemFire also provides for authorization of cache operations on a server from clients. This allows GFE to block unauthenticated access to a GemFire Distributed System, or block cache operations as per user defined policies. When the peer system members are configured for authentication, then the GemFire system should make use of locators for discovery (i.e. using multicast for discovery is incompatible with peer authentication). For more details see the security developer notes.
Is it possible to secure the data replication between cluster members?
GemFire manages data in many nodes where access to data has to be protected. GemFire Enterprise provides 'On-the-wire' protection which is used to prove that the information has not been modified by a third party (some entity other than the source of the information). All communication between member caches can be made tamper proof by configuring SSL (key signing). The use of SSL in GemFire Enterprise communications is enabled in an all-or-nothing fashion. The choice of providers for certificates, protocol and cipher suites are all configurable.
Can I use my existing authentication and authorization framework to secure access to data in GemFire Enterprise?
GemFire Enterprise augments the current SSL based security model with a plug-in architecture, allowing enterprises to use existing authentication and authorization frameworks to secure cache members and clients. By caching the security credentials in memory, this plug-in security framework is designed to be minimally intrusive and have negligible impact on performance. With authentication enabled, the distributed system bars malicious clients or cache peers without valid credentials and deters inadvertent access to its cache from other systems.
Concurrency and Data Consistency
Does GemFire provide pessimistic and/or optimistic concurrency?
Both. Pessimistic concurrency control is provided through a distributed lock service built into the product. Transactional updates through the GemFire Transaction Manager are ACID, two-phase, optimistic, and READ_COMMITTED.
Does GemFire support single-phase and two-phase (XA) transactions?
Yes. GemFire has a built-in JTA-compliant transaction manager. It may enlist other registered JTA resources in a transaction, be enlisted by container-managed transactions, or be used for GemFire-only transactions.
It should be noted that the 2-phase operation is used internally by GemFire for synchronizing changes across many cache instances participating in the transaction and involving a single external resource such as a database.
Do you provide time-to-live and eviction policy functionality?
Yes. Applications can configure TTL on memory regions and set an eviction policy such as global destruction of the cache entries, local invalidation, or simply spilling over to disk. Besides TTL, applications can also configure eviction based on 'idleTime', the amount of time the object may remain in the cache after its last access. Subsequent access to an entry that has timed out automatically results in the entry being recovered from disk, fetched from a different cache instance, or loaded from the data source.
Do you provide a read-write facility governed by a distributed lock manager?
Yes. Data regions configured as global in scope automatically engage the distributed lock manager for all read/write operations. The locks obtained during a read can be prolonged through the use of the distributed lock service API.
Does your solution provide an API or other means by which a distributed lock manager can be utilized?
Yes. Applications can explicitly manage locks using the GemFire Enterprise distributed lock service. This service is typically used when a pessimistic application use case requires multiple locks over multiple GemFire regions.
Does GemFire Enterprise support REPEATABLE READ isolation level?
Yes. By using the Repeatable Read isolation level, data region fetch requests issued multiple times within the same transaction will always produce the same result.
Data Management
What network/transport protocols does it support/rely upon?
TCP/IP, Reliable Multicast, and Unicast.
Multicast may be used for member discovery and/or for data distribution. In all scenarios, GemFire Enterprise also supports TCP/IP-only configurations.
What data consistency options are available in GemFire?
GemFire offers a complete range of consistency options, allowing you to trade off strength of consistency for performance. An important differentiator between GemFire and competing products is the attention we pay to event notifications in response to cache updates. Performance, cache consistency, event notification reliability, and network utilization must all be considered when choosing your model. Some of GFE's options (in descending order of performance, but ascending order of consistency):
• Asynchronous distribution over reliable multicast - This is extremely fast and scalable, but prone to data loss if the network isn't clean.
• Asynchronous distribution over TCP/IP - More reliable, but increases network load through shared network infrastructure components as updates are sent separately to each peer in a cluster.
• Synchronous distribution with 'Early-Ack' - Each receiving process for an update acknowledges immediately after reading an update from the socket. Deserialization and event callback notifications are asynchronous.
• Synchronous distribution with Acknowledgment - Each receiving process for an update acknowledges AFTER reading the update from the socket, deserializing the object, and invoking any associated callbacks.
• Transactional Updates - Spans multiple updates to one or more data Regions. The transaction context is synchronously distributed, all included objects are deserialized, and all relevant API callback notifications must complete.
• Global Pessimistic Distribution - Before the update is distributed, a global distributed lock is first obtained from all relevant processes.
Note that for all cases, GemFire has the ability to dynamically detect slow processes and quarantine them if necessary. Thus, if one slow process is preventing rapid updates to others, GemFire will dynamically spawn a store-and-forward queue for the slow receiver and remove it from the synchronous distribution.
Does GemFire Enterprise provide synchronous and asynchronous replication between nodes?
Due to the way they are used, Partitioned Regions and their backups are always updated synchronously with respect to the updating application thread. For Replicated Data Regions, GemFire Enterprise provides a full menu of configuration options for data replication and distribution:
- Synchronous communication without application acknowledgment: Applications that do not have very strict consistency requirements and have very low latency requirements should use the synchronous communication model without acknowledgments to synchronize data across cache nodes. This is the default distribution model and provides the fastest response time and the highest throughput. Though the communication is 'out-of-band', the sender cache instance makes every attempt to dispatch messages as soon as possible, diminishing the probability of data conflicts.
- Synchronous communication with application acknowledgment: Replicated regions can also be configured to perform synchronous updates with other cache members. With this configuration, control returns to the application only after the receiving caches have all acknowledged receipt of the message. This pessimistic mode should be used with prudence, especially as the number of cache members accessing the same region increases.
- Synchronous communication with distributed global locking: Finally, for pessimistic application scenarios, global locks can first be obtained before sending updates to other cache members. A distributed lock service manages acquiring, releasing and timing out locks. Any Replicated Region can be configured to implicitly use global locks behind the scenes through simple configuration. Applications can also explicitly request locks on cached objects if they want to prevent dirty reads on objects replicated across many nodes.
- Automatic switching to asynchronous mode for slow consumers: A distribution timeout can be associated with each node, so that if a message publisher (cache) does not receive message acknowledgments within the timeout period from a consuming node, it can switch from the default synchronous communication mode to an asynchronous mode for that consumer. This kind of switch is primarily used for publishing regions that do not require application acknowledgments from the consuming side. When the asynchronous communication mode is used, a producer batches messages to be sent to a consumer via a queue, the size of which is controlled either via a queue timeout policy or a queue max size parameter. Events being sent to this queue can also be conflated if the receiver is interested only in the most recent value of a data entity. Once the queue is empty, the producer switches back to the synchronous distribution mode, so that message latencies are removed and cache consistency is ensured at all times.
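Conflation in a slow-consumer queue can be pictured as a queue keyed by entry, where a newer update for the same key replaces the pending one. The sketch below is an assumption-laden illustration, not GemFire's queue: the ConflatingQueue class is hypothetical, and it omits the timeout and maximum-size policies mentioned above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative conflating queue for a slow receiver; not GemFire's implementation.
class ConflatingQueue<K, V> {
    // Insertion-ordered: a re-put of an existing key keeps its original position.
    private final LinkedHashMap<K, V> pending = new LinkedHashMap<>();

    // Queues an update; an older pending value for the same key is replaced.
    void enqueue(K key, V value) {
        pending.put(key, value);
    }

    // Hands the whole batch to the consumer and empties the queue. Once the
    // queue is empty, a producer could return to synchronous distribution.
    List<Map.Entry<K, V>> drain() {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        for (Map.Entry<K, V> e : pending.entrySet()) {
            batch.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        pending.clear();
        return batch;
    }

    int size() { return pending.size(); }
    boolean isEmpty() { return pending.isEmpty(); }
}
```

Conflation trades completeness for queue size: the receiver sees the latest value of each entry but not every intermediate update, which is only appropriate when intermediate values do not matter.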
Applications that use Partitioned Regions may explicitly use the Lock Service (which, for a Partitioned Region, does not need to communicate with other peers) for pessimistic locking. All applications using that region must be coded to respect the lock-before-use protocol, because the Partitioned Region will not enforce it.
Can the synchrony parameter (i.e. control of consistency vs. performance) of the WAN, MAN and LAN replication be configured independently?
All distribution models discussed above, as well as store-and-forward queues, are configured on each cache region and scoped by a cache instance. In other words, different cache instances can specify different distribution scope for the same region. So, it is possible for a remote site to configure the region with store-and-forward queue distribution, whereas the local members can use a pessimistic (synchronous) replication strategy.
How is distribution reliability controlled in GFE?
At a transport level, the usage of TCP/IP or Reliable UDP Multicast guarantees message delivery.
In addition, GFE provides a novel, declarative (user-defined) approach for managing data distribution with the required levels of reliability and consistency across several hundreds or even thousands of nodes. Application architects can define 'roles' relating to specific member configurations and identify certain roles as 'required roles' for a given operation/application. For instance, 'DB Writer' can be defined as a role that describes a member in the GemFire Enterprise Distributed System that writes cache updates to a database. The 'DB Writer' role can then be associated as a 'required role' for another application (a data feeder), whose function is to receive data streams (e.g., price quotes) from multiple sources and put them into the cache. Once the system is configured in such a fashion, the data feeder will check whether at least one of the applications with the role 'DB Writer' is online and functional before it propagates any data updates. If, for some reason, none of the 'DB Writers' are available, the data feeder application can be configured to respond in one of the following ways: (a) block all cache operations, (b) allow certain specific cache operations, (c) allow all cache operations, or (d) disconnect and reconnect a specified number of times to check whether the required roles have been restored.
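The required-role check and the four configurable responses can be sketched as a small decision function. Everything here is hypothetical and for illustration only: the RequiredRoles class, the LossPolicy and Decision enum names, and the boolean flag for 'certain specific' operations are assumptions, not GemFire API names.

```java
import java.util.Set;

// Illustrative required-role check; all names here are hypothetical.
class RequiredRoles {
    // One constant per response (a) to (d) in the text above.
    enum LossPolicy { BLOCK_ALL, ALLOW_LIMITED, ALLOW_ALL, RECONNECT }

    enum Decision { PROCEED, REJECT, RECONNECT }

    // Decides what to do with a cache operation, given which roles are
    // currently online and how the application is configured to react.
    static Decision decide(Set<String> requiredRoles,
                           Set<String> onlineRoles,
                           LossPolicy policy,
                           boolean operationAllowedWhenLimited) {
        if (onlineRoles.containsAll(requiredRoles)) {
            return Decision.PROCEED;            // every required role is present
        }
        switch (policy) {
            case BLOCK_ALL:
                return Decision.REJECT;
            case ALLOW_LIMITED:
                return operationAllowedWhenLimited ? Decision.PROCEED : Decision.REJECT;
            case ALLOW_ALL:
                return Decision.PROCEED;
            case RECONNECT:
                return Decision.RECONNECT;      // caller retries a bounded number of times
            default:
                throw new IllegalStateException("unknown policy: " + policy);
        }
    }
}
```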
Describe how certain volatile scenarios (such as split-brain syndrome) are handled.
GemFire includes a membership algorithm that is used to detect a split-brain scenario. The solution to the problem requires 3 members at a minimum. When a member detects the ungraceful exit of another member, it announces this to the other members. The other members do the same. If more than one member reports the unexpected exit of the same member, that member is presumed unreachable. On the other hand, a member that has lost its network link will notice that all the other members have ungracefully left the system and can take a different course of action.
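The 'more than one reporter' rule can be sketched as a small bookkeeping structure. This is a simplified illustration, not the actual membership algorithm: the SuspectTracker class is hypothetical and ignores timing, message loss and the isolated member's own view.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative suspicion bookkeeping; not GemFire's membership algorithm.
class SuspectTracker {
    // For each suspected member, the distinct members that reported its exit.
    private final Map<String, Set<String>> reportersBySuspect = new HashMap<>();

    // Records that 'reporter' saw 'suspect' leave ungracefully.
    // Returns true once the suspect is presumed unreachable.
    boolean report(String reporter, String suspect) {
        reportersBySuspect
                .computeIfAbsent(suspect, s -> new HashSet<>())
                .add(reporter);
        return isPresumedUnreachable(suspect);
    }

    // A member is presumed unreachable when more than one distinct
    // member has reported its unexpected exit.
    boolean isPresumedUnreachable(String suspect) {
        Set<String> reporters = reportersBySuspect.get(suspect);
        return reporters != null && reporters.size() > 1;
    }
}
```

Requiring two distinct reporters is also why at least three members are needed: with only two, neither side can tell whether it or its peer lost the network link.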
GFE can also automatically handle and recover from issues such as network segmentation, which causes the GemFire Distributed System to become disjointed into two or more smaller systems. In such a scenario, each member in a disjointed partition evaluates the availability of all the required roles (discussed above) in that partition, and if all such roles are available, then that partition automatically reconfigures itself and sustains itself as an independent GemFire distributed system. On the other hand, if all the required roles are not found in a partition, then the primary member in that partition can be configured to disconnect and attempt reconnection with the lost members for a specified number of attempts. If this reconnect protocol fails, the member shuts down after logging an appropriate error message. With this approach, a network segmentation/partitioning is handled by the Distributed System without any human intervention. As a result, complex and expensive network 'merge' operations are not required once the segmentation issues have been fixed, and data inconsistency is avoided. This level of built-in intelligence and self-healing capabilities greatly enhances the productivity and efficiency of network engineers and application architects, who often spend several hours monitoring, identifying and debugging such issues in complex IT environments.
How does a cluster member recover after a failure in GemFire Enterprise?
Once a failed member node recovers, it may reload data from the local persistent store or it may be configured to restore its data image from any other available peer in its Distributed System.
In a client-server topology, how does a client handle connection failures with the cache server?
If a connection to a server is abnormally terminated or if the server is unreachable, the client automatically retries the operation a pre-configured number of times, as defined by the value of the retryAttempts setting. If retryAttempts is exceeded, the server in question is added to a dead server list, and the client selects another server to connect to, based on the selected load balancing policy. This automatic failover prevents any server from being a single point of failure. Any pending updates to the client can be recovered from the new server.
The servers in the dead server list and live server list are periodically pinged with an intelligent ping that ensures cache health. If a dead server is determined to be healthy again, it is promoted back to the list of healthy servers, thereby improving the load balancing on the system. You can configure the time period between server pings with the retryInterval parameter.
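The retry-then-failover behavior described above can be sketched as follows. This is an illustration, not the GemFire client: the FailoverClient class is hypothetical, 'pick the first live server' stands in for the configured load balancing policy, and markHealthy stands in for the result of a successful periodic ping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Illustrative client-side retry and failover; not the GemFire client API.
class FailoverClient {
    private final List<String> liveServers;
    private final Set<String> deadServers = new LinkedHashSet<>();
    private final int retryAttempts;

    FailoverClient(List<String> servers, int retryAttempts) {
        this.liveServers = new ArrayList<>(servers);
        this.retryAttempts = retryAttempts;
    }

    // Runs the operation against a live server. Each server gets one initial
    // attempt plus retryAttempts retries before it is moved to the dead list.
    <T> T execute(Function<String, T> operation) {
        while (!liveServers.isEmpty()) {
            String server = liveServers.get(0);   // assumed load balancing policy
            for (int attempt = 0; attempt <= retryAttempts; attempt++) {
                try {
                    return operation.apply(server);
                } catch (RuntimeException connectionFailure) {
                    // fall through and retry against the same server
                }
            }
            liveServers.remove(0);
            deadServers.add(server);              // fail over to the next server
        }
        throw new IllegalStateException("no live servers available");
    }

    // Called when a periodic ping finds a dead server healthy again.
    void markHealthy(String server) {
        if (deadServers.remove(server)) {
            liveServers.add(server);
        }
    }

    List<String> liveServers() { return Collections.unmodifiableList(liveServers); }
    Set<String> deadServers() { return Collections.unmodifiableSet(deadServers); }
}
```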
Does GemFire Enterprise provide data location transparency?
Yes. By distributing data in-memory and providing a consistent API to access data, the location of data sources may be made to be transparent to applications. Moreover, if data requested by the applications is not found in the cache, the data is loaded from a (potentially remote) data source without the application knowing about it. The API for data read remains the same irrespective of whether the data was found in the cache or not.
Is it possible to add and remove cluster members without any loss of service?
Data redundancy can be defined programmatically or via the XML configuration file descriptor in GemFire Enterprise. Based on the redundancy level, multiple coherent copies of the data may be maintained in the Distributed System. The loss of a member node is automatically detected by other members. Additional copies of the data may then need to be established in existing members in order to maintain the requested level of redundancy. If a new member node is added, data is distributed to the new member (under application or administrative control) from existing members so as to provide wider data distribution.
Does GemFire Enterprise support InfiniBand?
GemFire Enterprise supports InfiniBand and is deployed in production in such environments today.
What are the cache entry expiration options?
It is possible to expire entries based on (a) time to live and (b) idle timeout. Both may be applied either at the entry or at the region level. View the Expiration types section on the developer's guide for additional information.
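The difference between the two options can be sketched in plain Java. This is a simplified model and not GemFire code: ExpiringEntry is a hypothetical class, time to live is assumed to be measured from creation, idle timeout from the last access, and the current time is passed in explicitly so the example is deterministic.

```java
/** Hypothetical entry-level expiration check; all times are in milliseconds. */
class ExpiringEntry<V> {
    private final V value;
    private final long timeToLiveMs;  // assumed: measured from creation
    private final long idleTimeoutMs; // assumed: measured from the last access
    private final long createdAt;
    private long lastAccessedAt;

    ExpiringEntry(V value, long timeToLiveMs, long idleTimeoutMs, long now) {
        this.value = value;
        this.timeToLiveMs = timeToLiveMs;
        this.idleTimeoutMs = idleTimeoutMs;
        this.createdAt = now;
        this.lastAccessedAt = now;
    }

    /** An entry expires when either limit is reached. */
    boolean isExpired(long now) {
        return now - createdAt >= timeToLiveMs
            || now - lastAccessedAt >= idleTimeoutMs;
    }

    /** Returns the value and resets the idle clock, or null if the entry has expired. */
    V get(long now) {
        if (isExpired(now)) {
            return null;
        }
        lastAccessedAt = now;
        return value;
    }

    public static void main(String[] args) {
        ExpiringEntry<String> entry = new ExpiringEntry<>("quote", 100, 30, 0);
        System.out.println(entry.get(20)); // accessed in time, idle clock reset
        System.out.println(entry.get(80)); // idle for 60 ms, past the 30 ms limit
    }
}
```

Frequent access keeps an entry alive under idle timeout but cannot extend it past its time to live.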
Messaging
Does GemFire Enterprise guarantee message ordering?
Events and messages in a GemFire distributed system always maintain update ordering.
Does GemFire Enterprise provide publish/subscribe support?
GemFire supports publish/subscribe by key, wildcards on key-based subscriptions, and content-based continuous queries. With a continuous query, you submit a where clause to the distributed system; it is evaluated in real time as data changes, and deltas are delivered to the subscriber based on the result of that evaluation.
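To make the content-based case more concrete, here is a minimal plain-Java sketch of a where clause that is re-evaluated on every update. It does not use the GemFire continuous query API: ContinuousQuerySketch is a hypothetical class, the where clause is modeled as a Predicate, and the CREATE/UPDATE/DESTROY event labels are an assumption about how deltas relative to the result set might be reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical content-based subscription: a predicate is re-evaluated on every update. */
class ContinuousQuerySketch {
    private final Map<String, Integer> region = new HashMap<>();
    private final Predicate<Integer> where; // stands in for the query's where clause
    final List<String> events = new ArrayList<>();

    ContinuousQuerySketch(Predicate<Integer> where) {
        this.where = where;
    }

    /** Applies the update, then records how the query's result set changed. */
    void put(String key, int newValue) {
        Integer oldValue = region.put(key, newValue);
        boolean matchedBefore = oldValue != null && where.test(oldValue);
        boolean matchesNow = where.test(newValue);
        if (matchesNow) {
            events.add((matchedBefore ? "UPDATE " : "CREATE ") + key);
        } else if (matchedBefore) {
            events.add("DESTROY " + key); // the entry left the result set
        }
    }

    public static void main(String[] args) {
        ContinuousQuerySketch query = new ContinuousQuerySketch(price -> price > 100);
        query.put("IBM", 90);
        query.put("IBM", 120);
        query.put("IBM", 80);
        System.out.println(query.events);
    }
}
```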
Configuration
Describe the configuration process and capabilities. What components can be configured?
GemFire can be configured declaratively (through configuration files), programmatically (through a Java admin API) or through a JMX interface.
GemFire caches can be configured through two configuration files: gemfire.properties and cache.xml (both may be renamed). The gemfire.properties file contains the settings required to join a distributed system. The cache.xml file contains region and entry configurations that are used to initialize a cache at creation time.
The cache.xml is the central place for describing all the data regions, their distribution and consistency characteristics, eviction model, external data loading and synchronization callbacks, etc. Static data that needs to be preloaded into the cache during startup may also be specified in this descriptor.
GemFire also includes management tools for Distributed System configuration. The GemFire Console and the GemFire command-line utility allow you to view and modify configuration attributes for distributed systems and individual system members.
The components that can be configured include: embedded cache, cache servers, high availability and failover of cache servers, distribution system - IP Multicast or TCP locator setup, etc.
Can configuration changes be applied dynamically to a running system, or does it need to be restarted?
Yes, with some restrictions. For instance, you can add a new mirror cache to the system to increase the availability of the system and data, add capacity to a partitioned cache, change the eviction characteristics, and so on. However, certain operations, such as dynamically reconfiguring a local cache to be partitioned, are not permitted. The allowed configuration changes are designed so that they do not compromise data consistency or introduce race conditions.
If a configuration API is provided, is it JMX-based?
There is a proprietary Java admin API and a JMX API. The Java API offers significant simplicity in comparison to JMX.
How is configuration information stored? In an XML file or another database repository?
Configuration information is stored in XML and properties files.
Does your solution allow for registering call-backs on occurrence of changes in the configuration of the distributed cache, including topology and cluster membership changes?
Yes. The GemFire admin API permits applications to register a listener for membership-level callbacks. The callback provides member arrival or departure information, including the identifier for the member cache and, when the cache exits, a severity.
Does your system provide alerts? If so, under what conditions are alerts triggered? How are these alerts delivered?
Alerts can be configured using the GemFire admin API. Alert severity level can be configured. For instance, if the distribution system notices that the ACKs received take longer than the configured time limit, alerts can be raised (in the system log and in GFMon). Alerts are triggered under other exception conditions too.
How easy is it to customize the configuration ?
Most of the configuration options have defaults. Developers can start using the distributed system with almost no explicit configuration and gradually tune it.
Can I change the configuration at runtime?
Yes. You can update your cache properties and contents programmatically and from new cache.xml file declarations. Programmatic management is performed through Cache and Region instance methods. Through these methods, you can change cache configuration parameters and modify region structure. In addition, you can update your cache through new XML declarations using the Cache.loadCacheXml method. Where possible, declarations in the new cache.xml file supersede existing cache definitions.
For example, if a region declared in the cache.xml file already exists in the cache, its mutable attributes are modified according to the file declarations. Immutable attributes are not affected. If a region does not already exist, it is created. Entries and indexes are created or updated according to the state of the cache and the file declarations.
Mutable region attributes include: expiration, eviction, and application plug-in settings.
What is the copy-on-read setting and why would I use it?
You can set the copy-on-read attribute through the API or in the declarative cache.xml file like this:
<cache copy-on-read="true">
</cache>
Setting copy-on-read to 'true' causes the GemFire get() method to return a copy of the object in memory. Setting copy-on-read to 'false' causes the get() method to return a pointer to the cached object, and should only be used for read-only applications, or read-mostly applications if the following caveat is followed.
If you do not set the cache copy-on-read attribute to true, do not modify the objects returned from the Java entry access methods. If the application must modify it, create a copy of the object, then modify the copy and return it to the cache using the Java put() method. Modifying a value in place bypasses the entire distribution framework provided by GemFire Enterprise, including cache listeners and expiration activities, and can produce undesired results.
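The caveat above can be illustrated with a small plain-Java model. This is not GemFire code: CopyOnReadSketch is a hypothetical class backed by a HashMap, and the putEvents counter merely stands in for the listener notifications and distribution that a real put() would trigger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of reference versus copy semantics on get(). */
class CopyOnReadSketch {
    private final Map<String, List<String>> store = new HashMap<>();
    private final boolean copyOnRead;
    int putEvents = 0; // stands in for listener callbacks and distribution

    CopyOnReadSketch(boolean copyOnRead) {
        this.copyOnRead = copyOnRead;
    }

    void put(String key, List<String> value) {
        store.put(key, value);
        putEvents++;
    }

    /** Returns either the stored object itself or a fresh copy of it. */
    List<String> get(String key) {
        List<String> value = store.get(key);
        if (value == null) {
            return null;
        }
        return copyOnRead ? new ArrayList<>(value) : value;
    }

    public static void main(String[] args) {
        CopyOnReadSketch byReference = new CopyOnReadSketch(false);
        byReference.put("k", new ArrayList<>(Arrays.asList("a")));
        byReference.get("k").add("b"); // silent in-place change, no put event
        System.out.println(byReference.get("k") + " after " + byReference.putEvents + " put(s)");

        CopyOnReadSketch byCopy = new CopyOnReadSketch(true);
        byCopy.put("k", new ArrayList<>(Arrays.asList("a")));
        List<String> copy = byCopy.get("k");
        copy.add("b");
        byCopy.put("k", copy); // explicit put makes the change visible
        System.out.println(byCopy.get("k") + " after " + byCopy.putEvents + " put(s)");
    }
}
```

With copy-on-read off, the in-place change alters the cached value without any put event, which is the situation the text warns about.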
System Monitoring and Management
What system management tools are available for GemFire based systems?
There are several ways to monitor GemFire:
1. GemFire provides a GUI monitoring tool, GFMon, that runs separately from the core Distributed System connectivity framework, allowing you to perform GemFire system monitoring over dedicated management networks and so isolate monitoring traffic from application traffic. This tool allows you to define your expected system environment and see its health at a glance, including up/down status, queue sizes, Region sizes, WAN distribution activity, peer-to-peer connectivity status, bridge client connectivity status, and more.
2. Through the GemFire Admin APIs in Java, C++, or C#.
3. Through the GemFire JMX management APIs. The exposed GemFire JMX MBeans are available to third-party tools (e.g., JConsole, JManage).
4. Using standard log-scraping monitoring tools to watch for various Java cache member warnings and exceptions. Note that GemFire uses standard Log4J for all logging functions.
5. GemFire Statistics. A statistics-gathering module tracks a very large number of cache and operating system activity metrics with negligible impact on system performance. To analyze system performance, GemFire's Visual Statistics Display tool can overlay graphs from many different processes and maintain them in near real time. This gives you, for example, centralized views of cache hits, cache misses, data updates, memory usage, invalidations, transactions, user-defined statistics, and hundreds of other stats across an entire distributed system.
Does GemFire have an API for monitoring a running system?
The health monitoring API allows you to configure and monitor system health indicators for GemFire Enterprise Distributed Systems and their components. There are three levels of health: Good health that indicates that all GemFire components are behaving reasonably, OK health that indicates that one or more GemFire components is slightly unhealthy and may need some attention, and Poor health that indicates that a GemFire component is unhealthy and needs immediate attention.
Because each GemFire application has its own definition of what it means to be 'healthy', the metrics that are used to determine health are configurable. GemFireHealthConfig provides methods for configuring the health of the Distributed System, system managers, and members that host cache instances. Health can be configured on both a global and per-machine basis. GemFireHealthConfig also allows you to configure how often GemFire's health is evaluated.
The health administration APIs allow you to configure performance thresholds for each component type in the Distributed System (including the Distributed System itself). These threshold settings are compared to system statistics to obtain a report on each component's health. A component is considered to be in good health if all of the user-specified criteria for that component are satisfied. The other possible health settings, OK and Poor, are assigned to a component as fewer of the health criteria are met.
What performance runtime statistics can be gathered and monitored? What do these statistics determine?
GemFire Enterprise incorporates a sophisticated statistics management system. The statistics sub-system collects statistics on cache utilization, performance, distribution, locking, memory footprint, CPU utilization, custom application statistics and more at run time. The collected statistics are periodically written to a statistics archive file. GemFire offers real-time statistics aggregation and correlation capabilities in GFMon as well as a graphical tool called Visual Statistics Display (VSD). GemFire exposes the statistics API to the applications making it possible to correlate application level events with events in the cache. The run time statistics serve multiple purposes: from debugging hard-to-find performance related issues and memory leaks to fine tuning the deployment in production.
How is the information presented? Is this centralized?
Statistics are presented graphically in GFMon as either charts or numerical values in cells. VSD presents all stats as charts.
VSD is capable of reading multiple statistic archives (from many members of a distributed system) to analyze the behavior across the entire Distributed System. It may also be used to compare statistics information from multiple test runs.
How easy is it to create customized reports? Describe the process of creating one.
Charts can be custom built in VSD. Any number of related stats can be combined and viewed in a single chart. For instance the administrator might want to analyze the impact of CPU utilization on the cache access performance. These custom charts can be saved as templates for later reuse.
Do you provide a JMX-based monitoring API?
All GemFire Enterprise managed stats are accessible through the JMX interface.
Does GemFire Enterprise support Log4J logging services?
GemFire Enterprise uses the standard Log4J logging services with many different log levels. Logging may be used for various debugging and exception tracking purposes, as well as for providing an audit trail of many GemFire operations. Most customers integrate existing audit solutions into their GemFire systems through our event listener APIs.
Programming
What is the programming model of GemFire Enterprise (GFE)?
Java, C++, and C#/.NET are the primary programming models for GFE.
What language is GFE written in?
GFE Server is written in Java, and Clients are written in Java, C++ and C#/.NET.
Is there a Java Process running in my C++ or C#/.NET application if I use GFE?
No, the C++ and C#/.NET applications integrate with client libraries that are written in native C++ and C#. The client libraries communicate with servers using a highly optimized wire level protocol.
What APIs can be used to access the GFE ?
The primary APIs are in Java, C++ and C#/.NET.
Does GFE support a query language?
GFE supports OQL (Object Query Language). We have found that developers familiar with SQL can quickly adapt to OQL, as the concepts are similar.
What are functions? Why do I need them?
Functions allow server side execution of behavior that can be initiated from a client. Moving the behavior to the data can often be much more effective than moving the data to the behavior. Only the final results/data are sent to the clients. For further details on functions and data aware function behavior please read this.
Can I have a list of all the GemFire cross-language serialization requirements for Java, C++ and C#/.NET cache members?
- Your C#/.NET classes must implement the IGFSerializable interfaces (i.e. implement toData() and fromData() methods, optionally the objectSize() method, and, must have a unique classID).
- Your C++ classes must implement the Serializable interface (i.e. implement toData() and fromData() methods, optionally the objectSize() method, and, must have a unique classID).
- Each of your C#/.NET IGFSerializable cacheable classes must have a unique classID in the range 0 to ((2^31)-1), both inclusive.
- You should have the same version of your project jar(s) in the Java cache server's classpath and in the Java cache client's CLASSPATH. Running with different class versions is possible - contact your GemStone support engineer or support@gemstone.com for more information.
- For every class you store in a Java cache member (i.e. Java cache server or Java cache client) you would need to register an instantiator. The normal method of registering an instantiator is to do it in a static initialization block.
- As a best practice for your application class version management, please make sure you define the serialVersionUID static final variable in each Java cacheable class.
- Every time you make changes to your java classes (e.g., adding serialVersionUID, modifying toData/fromData), you would need to restart the cache server and the cache client instances for the changes to be applied. Running with different class versions is possible - contact your GemStone support engineer or support@gemstone.com for more information.
- All Java cache servers in the same GemFire Distributed System need to use the same gemfire.jar (a.k.a. same GemFire product version). Java cache clients connected to a 6.0 GemFire Cache Server can be GemFire v5.7.X or v6.0 based clients.
- Every custom Java class used as a cache entry key must implement the hashCode() and equals() methods. This is part of the contract of com.gemstone.gemfire.cache.Region interface as it extends the java.util.Map interface. If you use simple Java classes such as String or Integer, their implementation of hashCode() and equals() is fine. The default implementation of Object.hashCode() and Object.equals() is based on object identity, which can and will change when copies of objects are made and objects are serialized/deserialized between containers.
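The last requirement in the list can be demonstrated with plain Java. The sketch below uses standard Java serialization and a HashMap as stand-ins for GemFire serialization and a Region, which is an assumption made to keep the example self-contained; GoodKey, IdentityKey and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class KeyContractSketch {
    /** Key with value-based equals() and hashCode(). */
    static class GoodKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    /** Key that inherits identity-based equals() and hashCode() from Object. */
    static class IdentityKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        IdentityKey(String id) { this.id = id; }
    }

    /** Serializes and deserializes an object, producing a distinct copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<Object, String> region = new HashMap<>(); // stands in for a Region
        GoodKey good = new GoodKey("42");
        IdentityKey identity = new IdentityKey("42");
        region.put(good, "found");
        region.put(identity, "found");
        // Look up each entry with a deserialized copy of its key.
        System.out.println("value-based key: " + region.get(roundTrip(good)));
        System.out.println("identity-based key: " + region.get(roundTrip(identity)));
    }
}
```

The copy of GoodKey still finds its entry, while the copy of IdentityKey does not, because it is a different object from the one used in put().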
How can I measure the size of the GemFire serialized object?
The following code demonstrates how to measure the size of the serialized object byte[]. The length of the serialized object byte[] in both Java and C#/.NET should be the same.
C# cache client code snippet - generating a serialized object as a byte[] and measuring its size:
// "do" is a reserved word in C#, so the variable is named "output"
DataOutput output = new DataOutput();
obj.ToData(output);
uint len = output.BufferLength;
byte[] ptr1 = output.GetBuffer();
Console.WriteLine(ptr1.Length);
For completeness:
Java cache client code snippet - generating a serialized object as a byte[] and measuring its size:
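import com.gemstone.gemfire.DataSerializer;
import com.gemstone.gemfire.internal.HeapDataOutputStream;
HeapDataOutputStream hdos = new HeapDataOutputStream();
// writeObject throws IOException, so call it from a method that handles or declares it
DataSerializer.writeObject(obj, hdos);
byte[] bytes = hdos.toByteArray();
// Java arrays expose their size through the length field
System.out.println(bytes.length);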
What is the correct way to implement a Java Cache Server main method loop?
GemFire includes a script named cacheserver that provides an easy way to start a server using the declarative (cache.xml) configuration style.
If you want to start a GemFire server using APIs, your top level class should take the following form:
/** Object for main thread to wait on */
private final Object go = new Object();
/** True while the application is running */
private volatile boolean running = true;
/**
 * Adds a shutdown hook and loops until the user hits CTRL-C. JMX Notifications
 * will be coming in on a separate thread.
 */
public void go()
{
    // disconnect() is assumed to be defined elsewhere in this class
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run()
        {
            disconnect();
        }
    });
    try {
        synchronized (go) {
            // the loop guards against spurious wakeups
            while (this.running) {
                go.wait();
            }
        }
    } catch (InterruptedException e) {
        // restore the interrupt status and return
        Thread.currentThread().interrupt();
    }
}
Transaction Support
Does GemFire Enterprise provide JTA support?
JTA is a standard Java interface that can coordinate GemFire Enterprise transactions and JDBC transactions under one umbrella. GemFire Enterprise includes an implementation of JTA, but it can also enlist in transactions controlled by any implementation of JTA. When deployed in a J2EE container that has its own built-in JTA implementation, GemFire Enterprise works with the container's transaction manager.
Can multiple data regions be updated as part of one transaction?
The data scope of a GemFire Enterprise cache transaction's view is a Cache. Because a transaction is managed on a per Cache basis, multiple regions in the distributed system can participate in a single transaction.
Is the transaction distributed for a distributed region?
The committed operations from a Cache transaction are distributed in the same way that region changes are distributed. As always, the scope of the region determines the rules of the distribution. Data distribution happens at commit time. Until then, changes made in a transaction are not distributed, even among threads in the same process.
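A simplified way to picture commit-time visibility across several regions is the plain-Java sketch below. It is not the GemFire transaction API: TxSketch is a hypothetical class, regions are modeled as nested maps in a single process, and conflict detection and distribution are deliberately left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: buffered writes to several regions become visible only at commit. */
class TxSketch {
    private final Map<String, Map<String, String>> regions = new HashMap<>();
    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

    /** Buffers a change; nothing is visible to readers yet. */
    void put(String region, String key, String value) {
        pending.computeIfAbsent(region, r -> new LinkedHashMap<>()).put(key, value);
    }

    /** Applies all buffered changes, across every region touched, in one step. */
    void commit() {
        pending.forEach((region, changes) ->
            regions.computeIfAbsent(region, r -> new HashMap<>()).putAll(changes));
        pending.clear();
    }

    /** Discards all buffered changes. */
    void rollback() {
        pending.clear();
    }

    /** Reads only committed state. */
    String committedValue(String region, String key) {
        Map<String, String> entries = regions.get(region);
        return entries == null ? null : entries.get(key);
    }

    public static void main(String[] args) {
        TxSketch tx = new TxSketch();
        tx.put("orders", "o1", "new");
        tx.put("customers", "c1", "alice");
        System.out.println("before commit: " + tx.committedValue("orders", "o1"));
        tx.commit();
        System.out.println("after commit: " + tx.committedValue("orders", "o1"));
    }
}
```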
Integration with data sources
How do I integrate GFE with other data sources?
GemFire can be integrated with other data sources using user-defined plug-ins (i.e. CacheLoader, CacheWriter, CacheListener and GatewayListener). Generally, implement a CacheLoader/CacheWriter for read-through/write-through behavior and a GatewayListener for write-behind. For details on plug-ins see Controlling Data Flow With Application Plug-ins and for details on write behind cache listener read this.
Do you provide write-through to a database?
GemFire provides a plug-in interface that can be programmed for write-through to a database. For more information, read this.
Please describe the write-through architecture?
The synchronous write-through architecture in GemFire captures and applies the update operation to the external data source first and, if it succeeds, applies the update to the GFE data container. The data entry is locked while the DB synchronization occurs. Applications configure a 'writer' (CacheWriter plug-in, see above) that takes a data event and applies it to the underlying external data source. A 'writer' can be configured on any node in the system. A data update operation triggered on a different node automatically triggers the remote writer first. This feature is referred to as a 'netWrite'.
Writers can use entity beans or straight JDBC to write to the database and automatically participate in any ongoing transaction.
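The ordering described above (external data source first, cache second) can be sketched in plain Java as follows. This does not use the GemFire CacheWriter interface: WriteThroughSketch and its Writer interface are hypothetical, a failed write is reported by returning false rather than by a GemFire exception, and locking is simplified to one lock for the whole cache instead of one per entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical write-through: the external store is updated before the cache. */
class WriteThroughSketch {
    /** Stands in for a writer plug-in; throwing an exception vetoes the update. */
    interface Writer {
        void beforeUpdate(String key, String value) throws Exception;
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Writer writer;

    WriteThroughSketch(Writer writer) {
        this.writer = writer;
    }

    /** Returns true only if both the external store and the cache were updated. */
    synchronized boolean put(String key, String value) {
        try {
            writer.beforeUpdate(key, value); // external data source first
        } catch (Exception e) {
            return false;                    // cache is left untouched
        }
        cache.put(key, value);               // then the cache
        return true;
    }

    synchronized String get(String key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>(); // stands in for a real database
        WriteThroughSketch region = new WriteThroughSketch(database::put);
        System.out.println("stored: " + region.put("k", "v"));

        WriteThroughSketch failing = new WriteThroughSketch((k, v) -> {
            throw new Exception("database unavailable");
        });
        System.out.println("stored: " + failing.put("k", "v"));
    }
}
```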
GemFire provides example plug-ins ( CacheLoader, CacheWriter, CacheListener) that use the Hibernate O-R mapping tool to synchronize with a relational database.
When using the GFE write-behind (GatewayListener) capability, your system can continue to run even during a lengthy database outage. Updates written to the data container with write-behind activated are queued (with HA) and then written to the database on a low-priority thread.
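Write-behind can be pictured with a similar plain-Java sketch, in which cache updates succeed immediately and database writes are queued. It is not the GatewayListener API: WriteBehindSketch and Database are hypothetical names, the queue is an in-memory ArrayDeque without high availability, and flush() is called explicitly where a real system would drain the queue on a background thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical write-behind: cache updates succeed at once; database writes are queued. */
class WriteBehindSketch {
    /** Stands in for the external database. */
    interface Database {
        void write(String key, String value) throws Exception;
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Queue<String[]> queue = new ArrayDeque<>();

    /** Updates the cache and queues the change; never blocks on the database. */
    void put(String key, String value) {
        cache.put(key, value);
        queue.add(new String[] {key, value});
    }

    String get(String key) {
        return cache.get(key);
    }

    /** Drains the queue in order; stops at the first failure and keeps the rest queued. */
    int flush(Database database) {
        int written = 0;
        while (!queue.isEmpty()) {
            String[] event = queue.peek();
            try {
                database.write(event[0], event[1]);
            } catch (Exception e) {
                break; // database outage: retry on a later flush
            }
            queue.remove();
            written++;
        }
        return written;
    }

    int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        WriteBehindSketch region = new WriteBehindSketch();
        region.put("a", "1");
        region.put("b", "2");
        int duringOutage = region.flush((k, v) -> { throw new Exception("outage"); });
        System.out.println("written during outage: " + duringOutage
            + ", still queued: " + region.pending());
        Map<String, String> database = new HashMap<>();
        System.out.println("written after recovery: " + region.flush(database::put));
    }
}
```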
Does GemFire allow write through to more than one database?
GFE allows only one CacheWriter plug-in configured per data region. Inside of the CacheWriter, the application code may update multiple data sources for each invocation.
How does GemFire Enterprise handle failure in the case that the member node reading or updating the database using the CacheWriter or CacheLoader plug-in fails?
In the case that a server executing a CacheWriter or CacheLoader plug-in fails, the GemFire Enterprise process that initiated the data update instructs another server process that has a CacheLoader or CacheWriter plug-in defined for the data region to execute the plug-in logic.
Debugging
How do I set the GemFire log and statistic file sizes?
Make sure you allocate enough disk space for GemFire logs and stats. For example, to allow 200MB of disk space for all the GemFire statistics files and to set the maximum stat file size to 10MB, include the following in your gemfire.properties file (or set these cache properties programmatically):
archive-disk-space-limit=200
archive-file-size-limit=10
The same but for GemFire log files would be:
log-disk-space-limit=200
log-file-size-limit=10
How do I turn on native statistics generation?
On Linux, to turn on generation of native stats, set LD_LIBRARY_PATH to include the $GEMFIRE/lib directory, which contains the associated .so files (i.e. libgemfire64.so for a 64-bit Sun JVM, libgemfire.so for a 32-bit Sun JVM):
export LD_LIBRARY_PATH=/opt/test/gemfire/lib/
On Windows, set the LIBPATH to include the gemfire.dll file location, typically stored under the lib directory under installation directory.
How to take a thread stack dump on a cache member?
At the command prompt, run kill -3 pid or kill -QUIT pid. This writes a stack trace for each thread to the GemFire log file for that cache member.
Why must I have a time synchronization service across the nodes where my caches are running?
You must run a time synchronization service such as NTP on all hosts where you are running GemFire cache members (i.e. cache servers and cache clients) to produce useful logs (and statistics) for troubleshooting. Synchronized time stamps ensure that log messages (and stats) on different hosts can be merged to accurately reproduce a chronological history of a distributed run.
How do I turn on GemFire log generation on a Java cache member JVM?
Option A:
Add the following entries to the cache server's gemfire.properties file:
# default and recommended log level
log-level=config
# setting path and base name for server log file
log-file=cache-server-1.log
# log-disk-space-limit - The maximum size in megabytes of all inactive log files combined.
log-disk-space-limit=500
# maximum size in MB to which a log file can grow before it is closed and logging rolls on to a new (child) log file.
log-file-size-limit=20
Option B:
When you start your Java cache member, add the following flags to the cache member JVM start up command:
-J-Dgemfire.log-level=config -J-Dgemfire.log-file=cache-server-1.log
-J-Dgemfire.log-disk-space-limit=500 -J-Dgemfire.log-file-size-limit=20
Settings and Startup
How do I indicate the location of the GemFire product license file?
Add -Dgemfire.license-file=gemfireLicense.zip to the Java command line.
Do not unpack the license zip file.
How do I start up a locator service?
If the GemFire product has been installed, you can run the following commands to start/stop a locator:
gemfire start-locator -address=<ip address> -port=<port number>
gemfire stop-locator -address=<ip address> -port=<port number>
Object Query Language (OQL)
What is the difference between a remote query and a local query?
A Java cache client can run two types of ad-hoc queries:
- Region.query will execute the query on the server side (this is a remote query). The query gets executed on the server and the resultset sent to the cache client.
- QueryService will query the data on the client cache (this is a local query).
What are generic OQL guidelines for best query performance?
Best practices are typically dependent on the amount of data and on the specific data access patterns. GemStone's Professional Services team can help you with your architecture design and fine tuning to meet your application's requirements and SLA.
These are some generic OQL guidelines:
- Please remember GemFire is an in-memory distributed object cache, and NOT an in-memory database.
- In general, it is recommended to simplify and minimize your OQL usage for best performance.
- It is recommended to ensure your query search criteria narrows down the results. That is, the first expression in the where clause should return the smallest result set, then the next and so forth.
- It is recommended to use the exact index expressions in your query. Please note that index expressions are case sensitive.
- It is recommended to use compact indexes to best manage/minimize memory usage on the cache servers. When using compact indexes, in general, it is recommended to use synchronous index updates.
- For optimal query performance, it is recommended to set the value-constraint and key-constraint region attributes on every region that needs to be queried. Both attributes give the query engine object type information for querying and indexing, which reduces (but does not eliminate) the cost of Java reflection. This affects all query executions.
- If you are looking for best query performance, it is strongly recommended that all queries that execute on the server do "select *" and avoid projections. In addition to that, avoid "select distinct".
- For best performance, when possible, use int indexes over String indexes: int indexes will have less overhead than String indexes as the hash lookup of an int index is always faster than the hash lookup of a String index.
- For OQL query/indexing guidelines and examples, please check out the following web link: Querying and Indexing
How do queries work with overflow regions?
Queries work on overflow regions. However, indexes are currently not supported on them, and queries cause the overflowed values to be fetched from disk, so performance is impaired.
Do OQL-based queries invoke any CacheLoaders?
No, ad-hoc and continuous OQL queries running on a Java cache server or Java cache client don't invoke any CacheLoaders. More specifically, there is no way to tell the OQL query engine that some data is missing and needs to be loaded from a database (or another data source) into the cache. The developer is responsible for loading the relevant operational data into the GemFire cache before running the query (e.g. via region.putAll() or region.put() method calls).