How to use GNU/Parallel and pg_dump ?

I work with the PostgreSQL database server and for the backup process, I use the pg_dump utility.

If you have many databases, you know there is a problem with this tool. In fact, it’s a serial process, it saves database after another and if you have a big one, you risk to wait for longtime. In my situation, this is problematic and I looked to use an other tool to parallelize the process.

But I discovered the tool GNU/Parallel.

GNU/Parallel is a shell tool for executing jobs in parallel using one or more computers.

Installation

wget http://ftp.gnu.org/gnu/parallel/parallel-20110722.tar.bz2
tar xfj parallel-20110722.tar.bz2
cd parallel-20110722
./configure --prefix=/opt/local
make
make install

The Script

#!/usr/bin/env bash
# Path to the GNU/Parallel tool
PARALLEL=/opt/local/bin/parallel

# We select all the databases we want to backup except the 'template0', 
# 'template1' and 'postgres' databases
INPUTLIST=`psql -d postgres -A -t <<EOF
    SELECT datname FROM pg_database 
    WHERE datname NOT IN ('template0','template1','postgres')
    ORDER BY datname ASC;
EOF`

# And we call the parallel tool with the right arguments
CMD='pg_dump --no-owner {} | gzip -9 > ${OUTPUT_DIR}/{}.dump.sql.gz'
(echo "$INPUTLIST") | $PARALLEL "$CMD"

If you have an other solution to this problem, I’m interrested

In a future post, I will show you how to use GNU/Parallel and rsnapshot to parallelize the rsync tool.