Preventing and Recovering from Disk Full Errors

Monitor the disk usage of GemFire members. If a member lacks sufficient disk space for a disk store, the member attempts to shut down the disk store and its associated cache, and logs an error message. After you make sufficient disk space available, you can restart the member. A shutdown caused by a member running out of disk space can result in data loss, data file corruption, log file corruption, and other error conditions that negatively affect your applications.
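As one way to monitor disk usage from outside the member, the following minimal sketch uses only the standard Java NIO API to report how full the volume holding a disk store directory is. It is not a GemFire API. The class name DiskUsageCheck, the directory argument, and the 90 and 99 percent thresholds are illustrative assumptions; substitute your own disk store directory and limits.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of GemFire: classifies how full a volume is.
public class DiskUsageCheck {

    public enum Level { OK, WARNING, CRITICAL }

    // Percentage of the volume in use, given its total and usable bytes.
    public static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Compares a usage percentage against the two thresholds.
    public static Level classify(double usedPercent, double warningPercent, double criticalPercent) {
        if (usedPercent >= criticalPercent) {
            return Level.CRITICAL;
        }
        if (usedPercent >= warningPercent) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    // Looks up the volume that holds the given directory and classifies its usage.
    public static Level check(Path dir, double warningPercent, double criticalPercent) throws IOException {
        FileStore store = Files.getFileStore(dir);
        double used = usedPercent(store.getTotalSpace(), store.getUsableSpace());
        return classify(used, warningPercent, criticalPercent);
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: the disk store directory is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Illustrative thresholds only; choose values that suit your deployment.
        System.out.println(dir.toAbsolutePath() + ": " + check(dir, 90.0, 99.0));
    }
}
```

Run it periodically (for example, from a scheduler) against each disk store directory, and alert on WARNING so that space can be freed before a disk write fails.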

You can prevent disk full errors using the following techniques:

When a disk write fails due to disk full conditions, the disk store and any regions associated with the disk store on that member are closed.

Recovering from Disk Full Errors

If a member of your GemFire distributed system fails due to a disk full error condition, add disk capacity or free up space, and then attempt to restart the member normally. If the member does not restart and there is a redundant copy of its regions in a disk store on another member, you can restore the member using the following steps:

  1. Delete or move the disk store files from the failed member.
  2. Use the gfsh show missing-disk-stores command to identify any missing data. You may need to restore this data manually.
  3. Revoke the missing disk stores using the gfsh revoke missing-disk-store command.
  4. Restart the member.
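
Step 1 above can be scripted. The following minimal sketch moves every regular file out of a disk store directory into a backup directory, so that the files are preserved rather than deleted. It uses only the standard Java NIO API and is not a GemFire tool. The class name MoveDiskStoreFiles and both directory arguments are hypothetical, and the sketch assumes that the directory contains only disk store files and that the member is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GemFire: moves disk store files aside.
public class MoveDiskStoreFiles {

    // Moves each regular file in diskStoreDir into backupDir and returns the new paths.
    // Subdirectories are left in place. An existing file in backupDir is never
    // overwritten; Files.move throws FileAlreadyExistsException instead.
    public static List<Path> moveFiles(Path diskStoreDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);

        // Collect the files first so the directory is not modified while it is being listed.
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(diskStoreDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    sources.add(entry);
                }
            }
        }

        List<Path> moved = new ArrayList<>();
        for (Path source : sources) {
            Path target = backupDir.resolve(source.getFileName());
            Files.move(source, target);
            moved.add(target);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MoveDiskStoreFiles <disk-store-dir> <backup-dir>");
            return;
        }
        for (Path p : moveFiles(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("moved to " + p);
        }
    }
}
```

Place the backup directory on a different volume from the full disk, because moving files within the same volume does not free any space. Steps 2 through 4 are then performed with gfsh as described above.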

See Handling Missing Disk Stores for more information.