Wednesday, March 29, 2017

Change autovacuum_freeze_max_age without a restart (sort of…)

This blog post is kind of involved, so I'm giving a short version at the top, with some background for beginners at the bottom. The middle section explains the motivation for using this hack in the first place.

Short version

I came up with a useful and/or terrible hack the other day: setting autovacuum_freeze_max_age as a storage parameter. I definitely don't recommend doing this routinely, but it unblocked us during a critical maintenance window.

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

Don't forget to set it back when you're done! Otherwise you will incur an even longer autovacuum freeze, probably when you least expect it.

Medium-length version

My colleague Kacey Holston was in the midst of upgrading a client from PostgreSQL 9.4 to 9.6, using Slony for minimal downtime. As planned, the client took a few minutes of downtime so Kacey could do the switchover. She was ready to reverse the direction of replication (so that the 9.6 server would replicate to the 9.4 server, in case our client needed to fall back to it). But an autovacuum freeze (a.k.a. "autovacuum (to prevent wraparound)") was keeping Slony from getting the brief ExclusiveLock it needed.
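
For readers who want to spot this situation on their own server, here is a minimal sketch of a query that lists running autovacuum workers. It assumes the worker's activity text starts with "autovacuum:", which is how autovacuum workers appear in pg_stat_activity on the 9.x releases discussed here. It is an illustration, not necessarily the query Kacey ran.

    -- List autovacuum workers and how long each has been running.
    -- An anti-wraparound run has "(to prevent wraparound)" in its query text.
    SELECT pid,
           now() - xact_start AS running_for,
           query
    FROM pg_stat_activity
    WHERE query LIKE 'autovacuum:%'
    ORDER BY xact_start;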

She knew from experience that this table takes three hours to freeze. But the client had only minutes of downtime scheduled – that was the whole point of using Slony!

If only it were possible to change autovacuum_freeze_max_age on the fly; then we could bump it up to stop that autovacuum. Unfortunately, you have to restart the database in order to change it. Except…

You can set it on a per-table basis, as follows. This took effect immediately:

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

If you do this, don't forget to set it back to the normal value (by default, 200000000) once you're done! Otherwise autovacuum freezes on this table will come around less often and take even longer.
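
A small sketch of how to check and undo the override, using the same placeholder table name. Per-table storage parameters are stored in pg_class.reloptions, and RESET removes the override so the table falls back to the server-wide setting:

    -- Show any per-table storage parameters currently set on the table.
    SELECT reloptions FROM pg_class WHERE oid = 'my_table'::regclass;

    -- Remove the per-table override rather than hard-coding a value.
    ALTER TABLE my_table RESET (autovacuum_freeze_max_age);

Using RESET instead of SET means you don't have to remember what the server-wide value is, which matters if it has been changed from the default.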

Background for beginners

When the oldest transaction ID on any row in a table is more than autovacuum_freeze_max_age transactions old (200 million, by default), an "autovacuum (to prevent wraparound)" process runs on the table to reclaim old transaction IDs. For large tables, this can be a problem, because it can generate a lot of CPU and I/O activity during busy hours. Also, as we saw here, it locks the table (in SHARE UPDATE EXCLUSIVE mode); this blocks DDL changes (a.k.a. migrations).
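
To see which tables are getting close to that threshold, a rough sketch like the following can help. It compares age(relfrozenxid) for each table against the server-wide autovacuum_freeze_max_age; the LIMIT of 10 is an arbitrary choice, and per-table overrides are not taken into account.

    -- Tables (plus materialized views and TOAST tables) with the oldest
    -- unfrozen transaction IDs, alongside the server-wide threshold.
    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
    FROM pg_class c
    WHERE c.relkind IN ('r', 'm', 't')
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 10;

A table whose xid_age is approaching freeze_max_age is a candidate for a manual VACUUM FREEZE at a quiet time, before autovacuum picks the moment for you.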

For more-technical background, see the official PostgreSQL docs on how transaction IDs work, and for a friendlier intro, see this series of blog posts.

1 comment:

  1. This doesn't really work.

    First, you can't actually *increase* autovacuum_freeze_max_age using the table storage parameters. This is documented here https://www.postgresql.org/docs/9.6/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE and the code is here https://github.com/postgres/postgres/blob/01fd6f8f2d15a9369768921d6fc95ac481779430/src/backend/postmaster/autovacuum.c#L2803 . You can only decrease it.

    Secondly, a freezing autovacuum does not automatically give way to an ALTER TABLE. You *must* have called pg_cancel_backend() on the autovacuum process. And once you do so, Slony gets its lock and your ALTER TABLE runs. But, as mentioned, the parameter can't be increased during runtime, so the autovacuum immediately restarted and put itself at the end of the lock queue.

    And that's the trick your post should have focused on. If you have a freezing autovacuum and you need to get a conflicting lock on the table and don't mind losing any progress (unsure what happens in 9.6 due to freeze map; perhaps not all progress is lost), you can queue the lock you want and cancel the autovacuum. The autovacuum will restart immediately afterwards, but that's not really a problem if you just want to run a quick ALTER TABLE or similar.
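
    A minimal sketch of that sequence, assuming two psql sessions. The table name, the column name, and the LIKE pattern are placeholders for illustration, not values from the incident:

        -- Session 1: issue the DDL; it waits in the lock queue
        -- behind the freezing autovacuum.
        ALTER TABLE my_table ADD COLUMN note text;

        -- Session 2: cancel the anti-wraparound autovacuum on that table.
        -- Progress of the cancelled run may be lost, and it will restart.
        SELECT pid, pg_cancel_backend(pid)
        FROM pg_stat_activity
        WHERE query LIKE 'autovacuum:%my_table%(to prevent wraparound)%';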
