Monday, October 29, 2018

It's just an expression

PostgreSQL has lots of great features for text search.

In particular, there are a bunch of ways to do case-insensitive searches for text:
  • There's standard SQL ILIKE… but that can be expensive — especially if you put %'s at both start and end — and it's overkill if you want an exact string match.
  • The same goes for case-insensitive regexp matching: overkill for simple case-insensitive matches. It does work with indexes, though (a pg_trgm trigram index, for example)!
  • Then there's the citext extension, which is pretty much the perfect answer. It lets you use indexes and still get case-insensitive matching, which is really cool. It Just Works. (A quick sketch of all three options follows this list.)
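For concreteness, here's a rough sketch of all three. The email_addresses_ci table is hypothetical, just to show citext in isolation; the real email_addresses table appears below:

-- ILIKE: simple, but a plain btree index won't help it
select addy from email_addresses where addy ilike 'quinn@example.com';

-- case-insensitive regexp matching with ~*
select addy from email_addresses where addy ~* '^quinn@example\.com$';

-- citext: make the column itself case-insensitive
create extension if not exists citext;
create table email_addresses_ci (addy citext);
select addy from email_addresses_ci where addy = 'Quinn@Example.COM';
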
OK, but what if you didn't have the foresight to use citext? And what if you don't want to go through the pain of changing the data type of that column? In a case like this, an expression index can be really handy.

Without an index, a case-insensitive match like this…

sandbox# explain analyze select addy from email_addresses where lower(addy) = 'quinn@example.com';

… is forced to use a sequential scan, lowercasing the addy column of each row before comparing it to the desired address:

                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on email_addresses  (cost=0.00..1.04 rows=1 width=32) (actual time=0.031..0.032 rows=1 loops=1)
   Filter: (lower(addy) = 'quinn@example.com'::text)
   Rows Removed by Filter: 3
 Planning time: 0.087 ms
 Execution time: 0.051 ms
(5 rows)


For my toy example, a four-row table, that's not too expensive, but for a real table with thousands or millions of rows it can become quite a pain point. And a regular index on email_addresses(addy) won't help: it indexes the raw addy values, not lower(addy), so the planner can't use it for this query and falls back to a sequential scan.
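
If you want to convince yourself, create a plain index and look at the plan (the index name here is just for illustration):

create index email_addresses__addy on email_addresses (addy);
-- the query compares lower(addy), which this index doesn't cover...
explain select addy from email_addresses where lower(addy) = 'quinn@example.com';
-- ...so the plan still shows a Seq Scan with the same Filter line.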

But an expression index will do the trick. An astute commenter noted that, in my toy example, walking the index is actually slower than a sequential scan. For a table with many rows, though (most of which are not the one email address I'm looking for!), an index scan will be dramatically faster.

sandbox# create index email_addresses__lower__addy on email_addresses (lower(addy));
CREATE INDEX
sandbox# explain analyze select addy from email_addresses where lower(addy) = 'quinn@example.com';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using email_addresses__lower__addy on email_addresses  (cost=0.13..8.15 rows=1 width=32) (actual time=0.093..0.095 rows=1 loops=1)
   Index Cond: (lower(addy) = 'quinn@example.com'::text)
 Planning time: 0.105 ms
 Execution time: 0.115 ms
(4 rows)
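
Incidentally, the same mechanism buys you case-insensitive uniqueness, too. That's a side note beyond the example above, and the index name here is made up, but it's the same trick:

create unique index email_addresses__lower__addy__key on email_addresses (lower(addy));
-- now 'Quinn@example.com' and 'quinn@example.com' can't both be inserted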

In short, expression indexes are one of those neat tricks that can save you a lot of pain.

2 comments:

  1. Probably I'm missing the obvious, but to my eyes the index version is half as fast as plain vanilla?

    Replies
    1. Good catch! The reason for this is that I used a toy example, with just a few rows in the email_addresses table. In a case like this, sequentially scanning the rows is faster than walking an index.

      In fact, to make PostgreSQL use the index at all, I had to do 'SET enable_seqscan = off' in the psql session where I ran the examples. Otherwise the planner would opt for the clearly-better option.

      You can see a hint of this in the first EXPLAIN ANALYZE plan, where it shows 'Rows Removed by Filter: 3': it scanned the entire table just to return a single row. In a real-life example, this could be thousands or millions of rows, which would change the Seq Scan/Index Scan tradeoff.
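
      For anyone who wants to reproduce this, it's one setting in the psql session (and a RESET puts it back):

      SET enable_seqscan = off;  -- strongly discourage seq scans
      RESET enable_seqscan;      -- restore the default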
