Thursday, October 11, 2018

HyperLogLog at SFPUG

HyperLogLog is one of those simple, useful, and ingenious algorithms that everyone should know. In SQL terms, it's a way to do COUNT(DISTINCT …) estimates with small, predictable error bounds, using only a tiny, fixed amount of memory.
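
For a concrete flavor, here's a minimal sketch using the postgresql-hll extension; the page_views table and its integer user_id column are made up for illustration:

    -- Exact distinct count: accurate, but has to look at every distinct value.
    SELECT count(DISTINCT user_id) FROM page_views;

    -- HLL estimate: hash each value into a small fixed-size sketch,
    -- then estimate the cardinality from the sketch.
    CREATE EXTENSION IF NOT EXISTS hll;
    SELECT hll_cardinality(hll_add_agg(hll_hash_integer(user_id)))
      FROM page_views;

The sketches can also be stored and combined later with hll_union_agg, which is what makes the technique so handy when the data is spread across many nodes.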

At this month's San Francisco PostgreSQL Users' Group, Sai Srirampur will explain HLL's workings, the postgresql-hll extension, and its applications in distributed PostgreSQL à la Citus.

Check it out!

Wednesday, September 5, 2018

Locks talk this Friday, at PostgresOpen! (2018-09-07)

UPDATE: my slides are available; the video has yet to be published.

Attending PostgresOpen?

Come join me Friday for a gentle introduction to locks in PostgreSQL. My example-driven talk covers basic lock theory, tools for lock debugging, and common pitfalls and solutions. I hope to see you there!

Time and place info is on the PostgresOpen SV website.

Sunday, July 9, 2017

A word of praise for TextBelt

Quite often I need to kick off a long-running process for a client, then resume work immediately once it's done. pg_restore, pg_basebackup, pg_upgrade, and vacuumdb --analyze-only all come to mind as examples. The ideal thing is to get a text message upon completion. When I'm working on my own boxes, I can just mail(1) my mobile carrier's text gateway, but I can't count on clients' servers having sendmail or the like set up (they usually don't).

Enter TextBelt. This service is a dead-simple HTTP-to-SMS gateway. Adapted from their docs:

curl -X POST https://textbelt.com/text \
       --data-urlencode phone="$MY_MOBILE_NUMBER" \
       --data-urlencode message='The process completed.' \
       -d key="$MY_TEXTBELT_API_KEY"
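
In practice I just chain the curl call after whatever long-running command I'm babysitting, something like this (the restore command and dump path are made up):

pg_restore -j 4 -d my_database /tmp/my_database.dump; \
curl -X POST https://textbelt.com/text \
       --data-urlencode phone="$MY_MOBILE_NUMBER" \
       --data-urlencode message="pg_restore finished (exit status $?)" \
       -d key="$MY_TEXTBELT_API_KEY"

Because $? is expanded only when the curl command runs, the text message reports the exit status of the command that just finished.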

Cleartext HTTP is also supported, in case the client box has broken SSL libraries. My texts are always nondescript, so I don't mind sending them in the clear.

The whole thing is open-source, so you can set up your own TextBelt server. Or you can be lazy and throw a few dollars their way for a certain number of texts. I rather like this business model, actually, as a way to support open-source work.

Friday, July 7, 2017

The PATH of cron

Short version

Postgres Plus keeps psql out of /usr/bin, so you need to set PATH in your cron jobs (including for WAL-E).

Longer version

Like all good people, I set up cron jobs to run nightly WAL-E base backups. With one client, this failed the first time:
wal_e.main ERROR MSG: could not run one or more external programs WAL-E depends upon DETAIL: Could not run the following programs, are they installed? psql STRUCTURED: time=2017-07-07T03:00:01.957961-00 pid=25833
Turns out they were using Postgres Plus, and it puts psql in /opt/PostgresPlus/9.3AS/bin. That directory is in the enterprisedb user's PATH, of course, but not in the minimal PATH that cron jobs get by default. So I had to log in as enterprisedb, echo $PATH, and then paste PATH = [what echo said] at the top of my cron job. Moral of the story: take care when setting up cron jobs for PostgreSQL forks, or for non-standard community PostgreSQL installations.
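
For the record, the resulting crontab looked roughly like this (the schedule and paths are illustrative; use whatever echo $PATH actually reported):

PATH=/opt/PostgresPlus/9.3AS/bin:/usr/local/bin:/usr/bin:/bin

# Nightly WAL-E base backup at 03:00
0 3 * * * envdir /etc/wal-e.d/env wal-e backup-push /opt/PostgresPlus/9.3AS/data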

Wednesday, March 29, 2017

Change autovacuum_freeze_max_age without a restart (sort of…)

This blog post is kind of involved, so I'm giving a short version at the top, with some background for beginners at the bottom. The middle section explains the motivation for using this hack in the first place.

Short version

I came up with a useful and/or terrible hack the other day: setting autovacuum_freeze_max_age as a storage parameter. I definitely don't recommend doing this routinely, but it unblocked us during a critical maintenance window.

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

Don't forget to set it back when you're done! Otherwise you will incur an even longer autovacuum freeze, probably when you least expect it.

Medium-length version

My colleague Kacey Holston was in the midst of upgrading a client from PostgreSQL 9.4 to 9.6, using Slony for minimal downtime. As planned, the client took a few minutes of downtime so Kacey could cut them over to the 9.6 server. She was ready to reverse the direction of replication (so the 9.6 server would replicate back to the 9.4 server, in case our client needed to fall back to it). But there was an autovacuum freeze (a.k.a. "autovacuum (to prevent wraparound)") that was keeping Slony from getting the brief ExclusiveLock it needed.

She knew from experience that this table takes three hours to freeze. But the client had only minutes of downtime scheduled – that was the whole point of using Slony!

If only it were possible to change autovacuum_freeze_max_age on the fly; then we could bump it up to stop that autovacuum. Unfortunately, you have to restart the database in order to change it. Except…

You can set it on a per-table basis, as follows. This took effect immediately:

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

If you do this, don't forget to set it back to the normal value (by default, 200000000) once you're done! Otherwise autovacuum freezes on this table will come around less often and take even longer.
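
The tidiest way to set it back is to drop the per-table override entirely, so the table reverts to the server-wide setting:

    ALTER TABLE my_table RESET (autovacuum_freeze_max_age);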

Background for beginners

When the oldest transaction ID on any row in a table is more than autovacuum_freeze_max_age old (200 million transactions old, by default), an "autovacuum (to prevent wraparound)" process runs on the table to freeze old transaction IDs so they can be reused. For large tables this can be a problem, because it can generate a lot of CPU and I/O activity during busy hours. Also, as we saw here, it locks the table (in SHARE UPDATE EXCLUSIVE mode); this blocks DDL changes (a.k.a. migrations).
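
If you're curious how close your own tables are to that threshold, a query along these lines shows the oldest ones (run it in the database you care about):

    -- Tables with the oldest unfrozen transaction IDs; once age() passes
    -- autovacuum_freeze_max_age, an anti-wraparound autovacuum starts.
    SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY age(relfrozenxid) DESC
     LIMIT 10;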

For more-technical background, see the official PostgreSQL docs on how transaction IDs work, and for a friendlier intro, see this series of blog posts.

Friday, January 29, 2016

Language thoughts

All the languages I've used heavily have one primitive data structure you fall into using as a golden hammer:

  • C: arrays
  • Lisp: lists
  • Perl: hashes
  • JS: "objects"
Et cetera. Python is borderline; it's tempting to abuse dicts, but there's a bit less friction to using proper objects than with Perl.

Saturday, January 16, 2016

Blocked by rdsadmin

One of our clients on RDS had a VACUUM FREEZE that was hanging for a long time. "I bet it's waiting on a lock," I thought. Yup, but the curious part is what it was waiting on: a mysterious process run by the rdsadmin role (see RECORD 2, below):
SELECT pg_stat_activity.*, pg_locks.mode
  FROM pg_stat_activity
  JOIN pg_locks USING (pid)
  JOIN pg_class ON pg_locks.relation = pg_class.oid
 WHERE pg_class.relname = 'users'
   AND pg_locks.mode IN ('ShareUpdateExclusiveLock', 'ShareLock', 'ShareRowExclusiveLock', 'ExclusiveLock', 'AccessExclusiveLock');

-[ RECORD 1 ]----+------------------------------
datid            | 1234
datname          | my_database
pid              | 14641
usesysid         | 16396
usename          | postgres
application_name | psql
client_addr      | 198.162.0.0
client_hostname  |
client_port      | 5430
backend_start    | 2016-01-15 22:05:06.161987+00
xact_start       | 2016-01-15 22:14:39.301425+00
query_start      | 2016-01-15 22:14:39.301425+00
state_change     | 2016-01-15 22:14:39.301429+00
waiting          | t
state            | active
query            | VACUUM FREEZE verbose users;
mode             | ShareUpdateExclusiveLock
-[ RECORD 2 ]----+------------------------------
datid            | 1234
datname          | my_database
pid              | 22328
usesysid         | 10
usename          | rdsadmin
application_name |
client_addr      |
client_hostname  |
client_port      |
backend_start    |
xact_start       |
query_start      |
state_change     |
waiting          |
state            |
query            | 
mode             | ShareUpdateExclusiveLock
Further examination showed that this process was locking only the users table (and its indexes):
SELECT locktype, relation::regclass AS tablename
  FROM pg_locks
  JOIN pg_stat_activity USING (pid)
 WHERE pid = 22328;

locktype    |                  tablename
------------+---------------------------------------------
 relation   | user_index_a
 relation   | user_index_b
 relation   | user_index_c
 relation   | users_pkey
 virtualxid |
 relation   | users
(13 rows)

Has anyone else seen such a process? I'm curious as to what it is. My current best guess is a vacuum. We opened a ticket with Amazon to ask them, so I'll update this post when Amazon replies.

EDIT 2016-01-17: the above is a braino for "an autovacuum." An Amazon rep wrote back the next day to say that it was, indeed, an autovacuum process, and included some vacuum-tuning tips. So good on them, although it's unfortunate that RDS's privilege system doesn't allow you to see the pg_stat_activity.query field when the rdsadmin role is working on a table you own.