Tuesday, October 30, 2018

Remember your history

PostgreSQL keeps track of which WAL files go with which timelines in small history files. Each time you make a base backup, a history file is born. The file is written once and never updated. It's a simple system, and it works well and silently.

In fact, sometimes it works a little too silently.

At PostgreSQL Experts we've run into the problem where a client's history files disappear because they are stored in S3, and there's a lifecycle configuration in place that says to move everything over a certain age to Glacier. That's a good policy for WAL files!

Unfortunately, it's not a good policy for history files: without the latest history file you can't restore the latest backup, and without past history files, you are unable to do PITR to certain points in time.

The solution we used was to move the whole WAL archive to S3 Standard-Infrequent Access storage, dissolving the problem with lifecycle configurations while controlling costs. But you could also fix this by editing the lifecycle configuration.

The important thing is this: hold on to all history files. They're tiny text files, and when you need them, you really need them. This is also a good reason to test restores, not just of the latest backup, but of database states at arbitrary points in time.

*    *    *

Addendum: another very common problem we see is WAL archives that become corrupted because a client accidentally pointed a two primaries at the same WAL archive (for instance, they might have copied a postgresql.conf file by hand, or via a DevOps tool like Puppet). In this case, the whole archive is corrupted, and you're best off starting with a fresh S3 bucket or an empty directory and doing a new base backup immediately.

One of the many nice features of pgBackRest is that it will notice this and prevent you from doing it. Fewer footguns → better backups.

No comments:

Post a Comment