Wednesday, September 17, 2014

RDS for Postgres: List of Supported Extensions

Today I learned that Amazon doesn't keep any list of extensions supported in PostgreSQL. Instead, their documentation tells you to start a psql session and run 'SHOW rds.extensions'. But that creates a chicken-and-egg situation if you have an app that needs extensions, and you're trying to decide whether to migrate.

So here's a list of extensions supported as of today, 2014-09-17 (RDS PostgreSQL 9.3.3). I'll try to keep this current.

btree_gin
btree_gist
chkpass
citext
cube
dblink
dict_int
dict_xsyn
earthdistance
fuzzystrmatch
hstore
intagg
intarray
isn
ltree
pgcrypto
pgrowlocks
pg_trgm
plperl
plpgsql
pltcl
postgis
postgis_tiger_geocoder
postgis_topology
sslinfo
tablefunc
tsearch2
unaccent
uuid-ossp
btree_gin
btree_gist
chkpass
citext
cube
dblink
dict_int
dict_xsyn
earthdistance
fuzzystrmatch
hstore
intagg
intarray
isn
ltree
pgcrypto
pgrowlocks
pg_trgm
plperl
plpgsql
pltcl
postgis
postgis_tiger_geocoder
postgis_topology
sslinfo
tablefunc
tsearch2
unaccent
uuid-ossp

Wednesday, April 2, 2014

Putting stats_temp_directory on a ramdisk

This hack is an old chestnut among PostgreSQL performance tuners, but it doesn't seem to be widely known elsewhere. That's a shame, because it's pure win, and it's ridiculously easy to set up. You don't even need to restart PostgreSQL.

Here's the situation: PostgreSQL writes certain temporary statistics. These go in the dir given by the stats_temp_directory setting. By default, that's pg_stat_tmp in the data dir. Temp files get written a lot, but there's no need for them to persist.

That makes them perfect candidates for a ramdisk (a.k.a. RAM drive). A ramdisk is a chunk of memory treated as a block device by the OS. Because it's RAM, it's super-fast. As far as the app is concerned, the ramdisk just holds a filesystem that it can read and write like any other. Moreover, PostgreSQL generally only needs a few hundred kilobytes for stats_temp_directory; any modern server can fit that in RAM.

In Linux, you set up a ramdisk like this:

As root:

'mkdir /var/lib/pgsql_stats_tmp' [1]

'chmod 777 /var/lib/pgsql_stats_tmp'

'chmod +t /var/lib/pgsql_stats_tmp'

Add this line to /etc/fstab. That 2G is an upper limit; the system will use only as much as it needs.

tmpfs /var/lib/pgsql_stats_tmp tmpfs size=2G,uid=postgres,gid=postgres 0 0

'mount /var/lib/pgsql_stats_tmp'

Then, as postgres:

Change the stats_temp_directory setting in postgresql.conf:

stats_temp_directory = '/var/lib/pgsql_stats_tmp'

Tell PostgreSQL to re-read its configuration:

'pg_ctl -D YOUR_DATA_DIR reload'

And that's it!

Other operating systems have different ways to set up ramdisks. Perhaps I'll cover them in a later post.

[1] The directory /var/lib/pgsql_stats_tmp is an arbitrary choice, but it works well for Debian's filesystem layout.

Thursday, August 22, 2013

mysql_auto_reconnect

I prefer PostgreSQL, which is the core of our business at pgExperts. But I still have some legacy clients on MySQL. For one of them, I'm writing new features in Perl/MySQL. The mysql_auto_reconnect option just saved my day. I'd been getting errors like "DBD::mysql::st execute failed: MySQL server has gone away." To use it in DBI, do
my $dbh = DBI->connect(
    "DBI:mysql:database=${MY_DATABASE};host=${MY_HOST};port=${MY_PORT}",
    $MY_USER,
    $MY_PASSWORD,
    { mysql_auto_reconnect => 1 },
);
To use it in Rose::DB::Object, do
File MyApp::DB;
our @ISA = qw( Rose::DB );

__PACKAGE__->default_connect_options({
    …,
    mysql_auto_reconnect => 1,
});

Monday, February 18, 2013

Python Type Gripe

I've been digging into Python lately. On Friday, I ran into an unintuitive behavior where two sets containing identical objects (of a custom class) were comparing unequal. I bet you can guess the reason: I hadn't defined __hash__ and __eq__ for my class. Take a look at this documentation for __repr__:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.
"For many?" "makes an attempt?!" "often including?!!" To my way of thinking, this is a little crazy. There should be a single root class in the inheritance hierarchy, and that root should define __repr__, __hash__, and __eq__ methods that operate on/check for equality of each attribute in the class (by value, not by memory address!) Then they would behave consistently for all classes. Then two sets, each containing three objects of the same class, each containing the same attributes with the same values, would compare equal when you used == on them, following the principle of least surprise.

Of course, I can't rewrite Python to make this happen. I'm tempted just to make my own root class that implements these behaviors via metaprogramming, and make every class I ever define a subclass of it. Is there a better way?

PS: A colleague pointed out that this behavior reflects a deliberate decision in Python design philosophy: "Python […] favors informal protocols over inheritance trees […] You can think of __eq__ and __hash__ as kind of being parts of an informal 'collectable' protocol." That sort of makes sense. Of course, the thing about informal protocols is that there's nothing to enforce consistency.

Friday, February 1, 2013

Why PostgreSQL Doesn't Allow Unsafe Implicit Type Coercions

I got bitten by a MySQL bug today, and it reminded me why we don't allow¹ unsafe type coercions in PostgreSQL. It's easy to talk about them in the abstract, but harder to explain why they're a problem in practice. Here's a really clear example.

I do most of my consulting through pgExperts these days, but I still have a legacy client on MySQL. Today I was writing a new feature for them. Using test-driven development, I was debugging a particular invocation of a five-way join. I found it was returning many rows from a certain right-hand table, when it should have been returning none.

I took a closer look at my SQL and found this code²:

    …
    WHERE order.status_id NOT IN ('completed', 'cancelled')
    …

But that's not right! status_id isn't a varchar. It's an int FK to a lookup table:

    CREATE TABLE statuses (
        id int(10) unsigned NOT NULL AUTO_INCREMENT,
        name varchar(16) NOT NULL,
        PRIMARY KEY (id),
        UNIQUE KEY name (name)
    ) ENGINE=InnoDB;

MySQL was silently coercing my integer order.status_id to a varchar, comparing it to 'completed' and 'cancelled' (which of course it could never match), and returning a bunch of right-hand table rows as a result. The filter was broken.

PostgreSQL would never haved allowed this. It would have complained about the type mismatch between status_is and 'completed' or 'cancelled'. Rather than a confusing resultset, I'd get a clear error message at SQL compile time:

    ERROR: invalid input syntax for integer: "cancelled"
    LINE 8: select * from frobs where order.status_id NOT IN ('cancelled', 'c...
                                                                                               ^

Fortunately, I caught and fixed this quickly (hurrah for test-driven development!) The fix was simple:

    …
    JOIN statuses stat on stat.id = order.status_id
    WHERE stat.name NOT IN ('completed', 'cancelled')
    …

But under different circumstances this could have seriously burned me.

And that's why we disallow certain implicit type coercions. This post is not intended as a rant against MySQL. Rather, it's an illustration of how a seemingly abstract issue can seriously hose your code. Caveat hacker!


1. Since PostgreSQL 8.3 (2008), that is. Before that, we did allow unsafe coercions.

2. I've changed the code and identifiers to protect client confidentiality.