Headaches migrating an old postgis database server

Tags: nelenschuurmans, python

Years ago, a database server was installed. Ubuntu 14.04 with postgres 9.3 and postgis 2.1, the standard stuff for 14.04. Over the years, some 450 databases were created by our consultants for various geo-water-related projects.

14.04, so it should have been updated at least four years ago, but we never got around to doing it. It was “only” an internal server, so no priority. Last week, we created a new server and wanted to do a simple pg_dumpall on the old and a psql < all.dump on the new one. Nothing fancy.

It was a big 250 GB sql dump file. Ideally, you’d remove unused databases beforehand. But how? It has been in use for eight years. Cleaning up isn’t our company’s biggest strength. We did send out an email “what can be removed?” but almost nobody answers.

  • Colleagues who made databases seven years ago and don’t work at the company don’t even receive the email :-)

  • If you don’t answer, the database is probably just left alone.

  • “It might be useful later”.

Well, so a quick pg_dumpall was done. But the restore failed. Then I got called in to see if I could figure something out. It took a couple of days before I got something working as it was quite hard to figure out what was actually going wrong.

When restoring the sql dump with psql, it would fail halfway like this:

ERROR:  relation "raster.crop" does not exist
STATEMENT:  COPY raster.crop (rid, rast, filename) FROM stdin;
LOG:  invalid message length
LOG:  disconnection: session time: 0:00:34.943 user=postgres database=work_r0146_zutphen_2d host=[local]

That led me to think it was related to some “crop” table in a “raster” schema that was missing. But it was there in the dump just fine (which was of course a bit of a bummer to extract from a 250 GB textfile…). Looking a bit better at the error output, there were quite some errors like that:

CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
psql:/dump/db.out:51310622: ERROR:  type "public.raster" does not exist
LINE 3:     rast public.raster,
                 ^
psql:/dump/db.out:51310625: ERROR:  relation "raster.crop" does not exist
CREATE SEQUENCE
ALTER TABLE
psql:/dump/db.out:51310645: ERROR:  relation "raster.crop" does not exist
psql:/dump/db.out:51310667: ERROR:  type "public.raster" does not exist
LINE 3:     rast public.raster,
                 ^
psql:/dump/db.out:51310670: ERROR:  relation "raster.dem" does not exist
CREATE SEQUENCE
ALTER TABLE
psql:/dump/db.out:51310690: ERROR:  relation "raster.dem" does not exist
psql:/dump/db.out:51310712: ERROR:  type "public.raster" does not exist
LINE 3:     rast public.raster,

So something is wrong with rasters. But loads of errors were also with normal postgis data types.

Problem 1: the new server was installed just with postgres, not with postgis. There were of course no installation instructions for the old server, and my colleague who set up the (manually installed) server didn’t know that we always include postgis.

Ok, installing postgis… now there were way fewer errors, though rasters still didn’t seem to work. I was still baffled by the sudded death halfway of psql:

ERROR:  relation "raster.crop" does not exist
STATEMENT:  COPY raster.crop (rid, rast, filename) FROM stdin;
LOG:  invalid message length
LOG:  disconnection: session time: 0:00:34.943 user=postgres database=work_r0146_zutphen_2d host=[local]

I mean, there were quite some errors, for instance failing row constraints. So why would it fail with a disconnect on this particular error? Let’s look at the logs in /var/logs/postgresql/. Problem 2: empty logfiles????

I spend a while trying to figure it out until I spotted that log_min_messages and log_min_error_statement were set to panic, so effectively off. My colleague did that to keep the main disk from filling up too quickly with a logfile bulging with postgis-is-missing related errors. That was solved in the mean time, so I could set the minimum log level back to a normal value (“warning”).

After crashing again, I looked at the error message in the logfile, which looked a bit different:

psql:/dump/db.out:51646175: ERROR:  relation "raster.crop" does not exist
psql:/dump/db.out:51663992: error: invalid command \.
psql:/dump/db.out:51663999: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
psql:/dump/db.out:51663999: fatal: connection to server was lost

The second line, with invalid command \. was strange. It is the line after a huge number of lines where some ascii-encoded binary geotiff is loaded into a raster. So probably loading the geotiff into the raster fails with invalid message length.

When googling ‘postgres invalid message length’, I basically only found some message from 2003 related to a jdbc driver… When I get so few useful hits, I normally suspect there’s something very basic wrong with what I’m doing. Problem 3: no helpful google results.

The geotiff with the error was the first geotiff getting loaded, so it was probable that loading such rasters was the thing that failed. During lunch, I talked with someone about the problem and he mentioned that there was a breaking change between postgis 2 and 3: previously, the postgis_raster extension was included with the postgis extension, in version 3 you have to enable it manually.

So… The nice create extension postgis statements in the huge sql dump were not enough to also get the postgis_raster extension working. Hence, probably, the error.

In the end, it turned that most of the databases on the database server were more of the “it might be handy someday but it is not strictly necessary” variant. Apparently the source data was often stored somewhere else or could be re-created if needed… I heard this by pure luck during lunch. Ok… so why did I spend so much time on this???

Anyway, I did a dump of just the schemas and grepped a bit to find the databases that used rasters:

$ pg_dumpall --schema-only -v -f /tmp/schemas.dump

There were only three… I deleted them. The alternative was to handle the dumps separately, creating the databases by hand including enabling the postgis_raster extension and then loading the dumps. But it had taken enough time as is and the data had 90% chance of not being used and 99.99% chance of not being desperately needed, so… delete.

With that out of the way, I wanted to make individual dumps with pgdump --format=custom, as the custom format is way smaller than a plain sql dump (and apparently nicer to postgres). Only: pg_dumpall doesn’t support it. So I grabbed a list of the databases and whipped up a quick script:

#!/bin/bash
set -x
# ^^^ This prints every command: nice to monitor the progress.
cd /dump
# ^^^ A temp mount to dump the dumps.
pg_dump --format=custom -f dw_pacific.dump dw_pacific
pg_dump --format=custom -f geul_midden.dump geul_midden
...

Then a similar script to load them all on the other server:

#!/bin/bash
set -x
pg_restore -d postgres --clean --create dw_pacific.dump
pg_restore -d postgres --clean --create geul_midden.dump
...

So…. perhaps this long blog post helps someone to figure out why he’s getting an invalid message length message :-)

 
vanrees.org logo

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):