Published by sumanth-pasupuleti over 6 years ago
Avoid overly verbose logging by changing info to debug for start and stop execution, and logging postrestorehook command.
Published by sumanth-pasupuleti over 6 years ago
PostRestoreHook gets executed once the files are downloaded as part of the restore process, before starting C*.
There are several configurations for PostRestoreHook:
CONFIG_POST_RESTORE_HOOK_ENABLED - indicates whether postrestorehook is enabled.
CONFIG_POST_RESTORE_HOOK - contains the command, with arguments, to be executed as the postrestorehook. Priam waits for this hook to complete before proceeding to start C*.
CONFIG_POST_RESTORE_HOOK_HEARTBEAT_FILENAME - heartbeat file that postrestorehook emits. Priam keeps tabs on this file to make sure postrestorehook is making progress. If it is not, a new postrestorehook process is spawned (after killing the existing process, if it still exists).
CONFIG_POST_RESTORE_HOOK_DONE_FILENAME - 'done' file that postrestorehook creates upon completion of execution.
CONFIG_POST_RESTORE_HOOK_TIMEOUT_IN_DAYS - maximum time (in days) that Priam should wait before killing the postrestorehook process (if not already complete).
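The way the heartbeat file, the 'done' file and the timeout interact can be sketched as a small decision function. This is a minimal illustration, not Priam's implementation: the names HookState and next_action are hypothetical, and the heartbeat staleness threshold (heartbeat_stale_after) is an assumption, since the configuration list above does not say how long Priam tolerates a silent heartbeat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HookState:
    """What a supervisor can observe about the hook (hypothetical structure)."""
    started_at: float                 # epoch seconds when the hook was first launched
    heartbeat_mtime: Optional[float]  # last modification time of the heartbeat file, None if absent
    done_file_exists: bool            # True once the hook has created its 'done' file


def next_action(state: HookState, now: float,
                heartbeat_stale_after: float, timeout: float) -> str:
    """Return 'proceed', 'kill', 'restart' or 'wait' for the observed state."""
    if state.done_file_exists:
        # Hook reported completion: the restore can move on to starting C*.
        return "proceed"
    if now - state.started_at >= timeout:
        # Overall deadline (CONFIG_POST_RESTORE_HOOK_TIMEOUT_IN_DAYS) exceeded.
        return "kill"
    # Before the first heartbeat, measure silence from the launch time.
    last_sign_of_life = (state.heartbeat_mtime
                         if state.heartbeat_mtime is not None
                         else state.started_at)
    if now - last_sign_of_life >= heartbeat_stale_after:
        # No progress: kill any surviving process and spawn a new one.
        return "restart"
    return "wait"
```

A supervisor would call next_action periodically, reading the two files' status from disk each time; keeping the decision separate from the file and process handling makes it easy to test.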
Published by arunagrawal84 over 6 years ago
(#680): Mark the snapshot as a failure if there is an issue with uploading a file, so that we fail fast. Previously, the snapshot would "ignore" any failure in the upload of a file and still be marked as "success".
Since it was not truly a "success", marking it as "failure" is the right thing to do. Also, meta.json should be uploaded only in case of "success" and not in case of "failure", as the presence of meta.json marks the backup as successful.
The case for fail-fast: if an issue occurs, say, at the start of the backup, it makes more sense to fail fast than to keep uploading other files (wasting bandwidth and backup resources). The remediation step for a backup failure is, in any case, to take a full snapshot again.
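The fail-fast behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than Priam's code: run_snapshot and the upload callable are hypothetical names, and the upload is assumed to signal failure by raising an exception.

```python
from typing import Callable, Iterable, List, Tuple


def run_snapshot(files: Iterable[str],
                 upload: Callable[[str], None]) -> Tuple[str, List[str]]:
    """Upload files in order, stopping at the first error.

    meta.json is queued last, so it is only uploaded when every data file
    has already succeeded; its presence therefore marks a successful backup.
    """
    uploaded: List[str] = []
    for name in [*files, "meta.json"]:
        try:
            upload(name)
        except Exception:
            # Fail fast: skip the remaining files (and meta.json) and
            # report the snapshot as failed instead of "success".
            return "failure", uploaded
        uploaded.append(name)
    return "success", uploaded
```

Because a failed backup is remediated by taking a full snapshot again, nothing is gained by continuing after the first error, which is why the loop returns immediately.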
Published by arunagrawal84 over 6 years ago
(#679): Mark the snapshot as a failure if there is an issue with uploading a file, so that we fail fast. Previously, the snapshot would "ignore" any failure in the upload of a file and still be marked as "success".
Since it was not truly a "success", marking it as "failure" is the right thing to do. Also, meta.json should be uploaded only in case of "success" and not in case of "failure", as the presence of meta.json marks the backup as successful.
The case for fail-fast: if an issue occurs, say, at the start of the backup, it makes more sense to fail fast than to keep uploading other files (wasting bandwidth and backup resources). The remediation step for a backup failure is, in any case, to take a full snapshot again.
Published by arunagrawal84 over 6 years ago
Published by arunagrawal84 over 6 years ago
Published by jolynch over 6 years ago
gracefulDrainHealthWaitSeconds option. If this option is set to a non-negative integer (>= 0), then before calling the provided stop script Priam first reports itself as unhealthy (InstanceState.isHealthy) for the configured number of seconds, then issues a nodetool drain with a 30s timeout (since drain can hang), and finally calls the provided stop script. By default this is set to -1 to disable the feature for backwards compatibility. This is useful if you want to gracefully drain Cassandra clients off a node before running drain (which kills the Native/Thrift server and resets any TCP connections that were established; in-flight requests can get dropped), then run drain to safely stop Cassandra, and then call your stop script. If your service discovery system does not integrate with Priam's health system, or your stop script already does all these things, leave this functionality disabled.
/v1/cassadmin/stop HTTP API call now takes an optional force parameter (e.g. /v1/cassadmin/stop?force=true) which will skip the graceful path for that particular stop; default value is false.
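The ordering of the graceful stop path, and how force and the -1 default bypass it, can be sketched as below. This is a hedged illustration, not Priam's implementation: stop_cassandra and its callable parameters are hypothetical, and continuing to the stop script when drain fails or times out is an assumption made here. The only external command used is nodetool drain.

```python
import subprocess
from typing import Callable, List


def nodetool_drain(timeout_seconds: float = 30.0) -> None:
    """Run `nodetool drain`, giving up after the timeout since drain can hang."""
    subprocess.run(["nodetool", "drain"], check=True, timeout=timeout_seconds)


def stop_cassandra(force: bool,
                   graceful_drain_health_wait_seconds: int,
                   mark_unhealthy: Callable[[], None],
                   sleep: Callable[[float], None],
                   run_drain: Callable[[], None],
                   run_stop_script: Callable[[], None]) -> List[str]:
    """Run the stop sequence and return the steps taken, in order."""
    steps: List[str] = []
    # The graceful path is skipped when forced or when the option is negative (-1 default).
    if not force and graceful_drain_health_wait_seconds >= 0:
        mark_unhealthy()                           # start failing health checks
        steps.append("unhealthy")
        sleep(graceful_drain_health_wait_seconds)  # let clients move off the node
        steps.append("wait")
        try:
            run_drain()
            steps.append("drain")
        except Exception:
            # Assumption: a hung or failed drain must not block the stop script.
            steps.append("drain-failed")
    run_stop_script()
    steps.append("stop")
    return steps
```

In real use the callables would be wired to the health state, time.sleep, nodetool_drain and the configured stop script; passing them in keeps the ordering testable without a running Cassandra node.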
jmxUsername and jmxPassword options. By default these are null and not provided.
commons-io, aws-java-sdk, snakeyaml
ICassandraProcess: internally the start method has been refactored to take a boolean force parameter. If you implement this interface you can supply false to preserve previous behavior.
Published by jolynch over 6 years ago
gracefulDrainHealthWaitSeconds option. If this option is set to a non-negative integer (>= 0), then before calling the provided stop script Priam first reports itself as unhealthy (InstanceState.isHealthy) for the configured number of seconds, then issues a nodetool drain with a 30s timeout (since drain can hang), and finally calls the provided stop script. By default this is set to -1 to disable the feature for backwards compatibility. This is useful if you want to gracefully drain Cassandra clients off a node before running drain (which kills the Native/Thrift server and resets any TCP connections that were established; in-flight requests can get dropped), then run drain to safely stop Cassandra, and then call your stop script. If your service discovery system does not integrate with Priam's health system, or your stop script already does all these things, leave this functionality disabled.
/v1/cassadmin/stop HTTP API call now takes an optional force parameter (e.g. /v1/cassadmin/stop?force=true) which will skip the graceful path for that particular stop; default value is false.
jmxUsername and jmxPassword options. By default these are null and not provided.
Snapshotstatus to actually contain bkupMetadata
commons-io, aws-java-sdk, snakeyaml
ICassandraProcess: internally the start method has been refactored to take a boolean force parameter. If you implement this interface you can supply false to preserve previous behavior.
Published by tulumvinh over 6 years ago
Eliminate assumption that existence of an element in a data structure means successful backup.
Published by jolynch over 6 years ago
Published by jolynch over 6 years ago
Published by vinaykumarchella almost 7 years ago
priam.backup.status.location.
priam.sdb.instanceIdentity.region.
Published by jolynch almost 7 years ago
Published by jolynch almost 7 years ago
Published by jolynch almost 7 years ago
priam.remediate.dead.cassandra.rate configuration option. If
Published by jolynch almost 7 years ago
priam.remediate.dead.cassandra.rate configuration option. If
Published by arunagrawal84 almost 7 years ago
priam.backup.notification.topic.arn to enable this.