[Redis] Redis Persistence

1) Persistence Methods

1.1) No Persistence, dataset in memory 

You can disable persistence if you want your data to exist only as long as the server is running.

1.2) Redis Database File (RDB)

Takes point-in-time snapshots of the dataset at specified intervals.

1.3) Append Only File (AOF)

Logs every write operation; the log is replayed at server startup to reconstruct the original dataset.

1.4) Hybrid (RDB & AOF)

Both methods are enabled on the same instance. Note that when Redis restarts, the AOF file is used to reconstruct the original dataset, since it is guaranteed to be the most complete.

2) RDB (Redis Database File)

Allows you to snapshot the dataset every N seconds if there are at least M changes.

2.1) How does it work? (also called snapshotting)

When Redis creates a snapshot, this happens:
  1. Redis forks. A new child process starts alongside the parent process.
  2. The child process begins to write the dataset to a temporary RDB file.
  3. Once the child process finishes writing the new RDB file, the new file replaces the old one.
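
The "write to a temporary file, then replace the old one" part of these steps can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it stores the dataset as JSON rather than in the real RDB format, it skips the fork step, and the names write_snapshot and load_snapshot are made up for this example.

```python
import json
import os
import tempfile


def write_snapshot(dataset, path):
    """Write the dataset to a temp file, then atomically replace the old snapshot."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(dataset, tmp)  # assumption: JSON stands in for the RDB format
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        # Atomic swap: a reader sees either the old snapshot or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the half-written temp file if the swap never happened.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(path):
    """Read a snapshot written by write_snapshot."""
    with open(path) as f:
        return json.load(f)
```

If the process dies while the temporary file is being written, the previous snapshot is still intact, which is the point of step 3.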

2.2) Commands

  • SAVE: takes the snapshot synchronously, blocking all other clients while the DB is saved.
  • BGSAVE: saves the DB in a background process (preferred in production, since you don’t want to block Redis until the snapshot completes).
  • In redis.conf, "save 30 100" means: take a snapshot after 30 seconds if at least 100 keys have changed.
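
To make the "save 30 100" rule concrete, here is a small Python sketch of the condition it describes. The function name should_snapshot and the list-of-tuples representation are assumptions for illustration, not Redis internals.

```python
def should_snapshot(seconds_since_last_save, changes_since_last_save, save_points):
    """Return True if any (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# Mirrors "save 30 100": at least 30 seconds elapsed AND at least 100 changes.
SAVE_POINTS = [(30, 100)]
```

Both conditions must hold at once: 500 changes after 10 seconds does not trigger a snapshot, and neither does 99 changes after a minute.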

2.3) Pros & Cons

2.3.1) Pros

  • Straightforward approach to back up and restore your data; enabled by default in the redis.conf file.
  • For instance, you may want to archive your RDB files every hour for the latest 24 hours and save an RDB snapshot every day for 30 days. This allows you to restore different versions of your dataset in case of disaster.
  • RDB is good for disaster recovery. It is a single compact file that can be transferred to remote data centers or any other server, possibly encrypted as well.
  • Maximizes Redis performance, since the only work the Redis parent process needs to do in order to persist is to fork a child that does all the rest. The parent process never performs disk I/O or similar work.

2.3.2) Cons:

  • You may lose any data written after the last snapshot.
  • Forking a child process for each snapshot can be time-consuming if the dataset is big, so frequent snapshots get expensive.

3) AOF (Append Only File)

Logs each write operation received by the server; the log is replayed at server startup to restore the original dataset.
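
The replay idea can be sketched in Python as follows. This is a toy model, not the real AOF format: it assumes one whitespace-separated command per line, supports only SET, DEL and INCR, and the helper names are made up for the example.

```python
def apply_command(dataset, command):
    """Apply one logged write command (a list of strings) to the dataset."""
    name, args = command[0].upper(), command[1:]
    if name == "SET":
        key, value = args
        dataset[key] = value
    elif name == "DEL":
        for key in args:
            dataset.pop(key, None)
    elif name == "INCR":
        (key,) = args
        dataset[key] = str(int(dataset.get(key, "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")


def replay(log_lines):
    """Rebuild the dataset by applying every logged command in order."""
    dataset = {}
    for line in log_lines:
        if line.strip():
            apply_command(dataset, line.split())
    return dataset
```

For example, replaying "SET a 1", "INCR a", "SET b x", "DEL b" leaves only the key a with the value "2".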

3.1) Details

  • When the AOF file gets too big, Redis rewrites a completely new AOF with the minimal set of operations needed to create the current dataset. The rewrite is completely safe: Redis continues appending to the old file while the new one is being written, and once the new file is ready, Redis switches the two and starts appending to the new one.
  • The fsync policy defines how often appended data is flushed to disk. There are 3 policies:
    • appendfsync always: fsync every time a new command is appended. Very, very slow, very safe.
    • appendfsync everysec (recommended): fsync every second. You can lose 1 second of data if there’s a disaster.
    • appendfsync no: never fsync, just put your data in the hands of the OS. The fastest and least safe method. Normally, Linux will flush data every 30 seconds with this configuration, but the exact timing depends on kernel tuning.
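
The difference between the three policies can be illustrated with a toy append-only log in Python. The class AppendOnlyLog and its fsync_count counter are assumptions made for this sketch. It also simplifies "everysec" by checking the clock on each append instead of flushing from a background task.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log with always / everysec / no fsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown fsync policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0  # only here so the sketch is easy to observe

    def _fsync(self):
        os.fsync(self.file.fileno())  # force the OS to write its cache to disk
        self.last_fsync = time.monotonic()
        self.fsync_count += 1

    def append(self, line):
        self.file.write(line.encode() + b"\n")
        self.file.flush()  # hand the bytes to the OS page cache
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and time.monotonic() - self.last_fsync >= 1.0:
            self._fsync()
        # policy "no": never fsync here; the kernel decides when to flush

    def close(self):
        self.file.close()
```

With "always" every append pays for a disk flush, while with "no" the data may sit in the OS cache until the kernel writes it out.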

3.2) How does it work?

Log rewriting uses the same copy-on-write trick used in snapshotting (RDB).
  • Redis forks; now we have a child and a parent process.
  • The child starts writing the new AOF to a temporary file.
  • The parent accumulates all new changes in an in-memory buffer (at the same time, it writes the new changes to the old append-only file, so if the rewrite fails, we are safe).
  • When the child is done rewriting the file, the parent gets a signal and appends the in-memory buffer to the end of the file generated by the child.
  • Redis atomically renames the new file over the old one and starts appending new data to the new file.
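
A single-process Python sketch of the rewrite steps might look like this. It is not how Redis implements it: there is no fork, the log uses a made-up "SET key value" line format (values without whitespace), and rewrite_log and the ".rewrite.tmp" suffix are hypothetical names.

```python
import os


def rewrite_log(dataset, buffered_commands, aof_path):
    """Write a minimal log for the dataset, append buffered writes, swap it in."""
    tmp_path = aof_path + ".rewrite.tmp"  # hypothetical temp file name
    with open(tmp_path, "w") as tmp:
        # "Child" step: one SET per key is enough to rebuild the current dataset.
        for key, value in dataset.items():
            tmp.write(f"SET {key} {value}\n")
        # "Parent" step: append the writes that arrived during the rewrite.
        for command in buffered_commands:
            tmp.write(command + "\n")
        tmp.flush()
        os.fsync(tmp.fileno())
    # Atomic rename: the new file takes over the old file's name.
    os.replace(tmp_path, aof_path)
```

A log that contained five commands for a dataset that ends up holding one key shrinks to a single SET, plus whatever was buffered while the rewrite ran.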

3.3) Commands:

In redis.conf, set “appendonly yes” and ‘appendfilename “<SOME FILE NAME>”’

3.4) Pros & Cons

3.4.1) Pros:

  • More durable: you can choose among 3 fsync policies: no fsync, fsync every second, fsync on every query.
  • The default is fsync every second.
  • No corruption problems, because it is an append-only log. (Even if the log ends with a half-written command, the redis-check-aof tool can fix this.)
  • Redis automatically rewrites the AOF in the background when it gets too big.
  • The AOF contains a log of all operations, one after the other, in an easy-to-understand and easy-to-parse format.

3.4.2) Cons:

  • AOF files are usually larger than RDB files for the same dataset.
  • AOF logs every write operation to disk, which adds I/O overhead.
  • AOF can be slower than RDB, depending on the fsync policy.

3.5) Corrupted AOF file:

The best thing to do is to run the redis-check-aof utility, initially without the --fix option. Then understand the problem, jump to the given offset in the file, and see if you can repair the file manually. Otherwise, let the utility fix the file for you.
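
To get a feel for what "the given offset" means, here is a rough Python sketch that finds where the last complete command ends in a log encoded with the Redis protocol (for example "*3\r\n$3\r\nSET\r\n..."). It is not the redis-check-aof implementation; it assumes a plain command log with no RDB preamble, and the function names are made up.

```python
def _read_line(data, pos):
    """Return (line, next_pos) for a CRLF-terminated line, or None if incomplete."""
    end = data.find(b"\r\n", pos)
    if end == -1:
        return None
    return data[pos:end], end + 2


def last_valid_offset(data):
    """Return the byte offset just past the last complete command in data."""
    pos = 0
    while pos < len(data):
        header = _read_line(data, pos)
        if header is None or not header[0].startswith(b"*"):
            return pos
        try:
            argc = int(header[0][1:])  # "*3" means the command has 3 arguments
        except ValueError:
            return pos
        if argc < 1:
            return pos
        cursor = header[1]
        for _ in range(argc):
            length_line = _read_line(data, cursor)
            if length_line is None or not length_line[0].startswith(b"$"):
                return pos
            try:
                length = int(length_line[0][1:])  # "$3" means a 3-byte argument
            except ValueError:
                return pos
            end = length_line[1] + length
            # The argument must be fully present and followed by CRLF.
            if length < 0 or data[end:end + 2] != b"\r\n":
                return pos
            cursor = end + 2
        pos = cursor  # the whole command parsed; move past it
    return pos
```

Everything before the returned offset parses cleanly; anything after it is the half-written tail you would inspect or cut off.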

4) Hybrid

The best practice is to use both persistence methods.

Redis handles both well: the RDB snapshot and AOF rewrite processes never run at the same time. But keep in mind that at startup, Redis always loads the AOF file, as it provides the more complete record of the data.

Resources

https://redis.io/topics/persistence
https://www.inovex.de/blog/redis-backup/
