PostgreSQL Logical Replication Gotchas – pgDash
PostgreSQL 10 came with the welcome addition of the logical replication feature. This provides a more flexible and easier way to replicate your tables than the regular streaming replication mechanism. However, it does have some limitations that may or may not prevent you from employing it for replication. Read on to learn more.
What Is Logical Replication Anyway?
Streaming Replication
Prior to v10, the only way to replicate data residing in a server was to replicate the changes at the WAL level. During its operation, a PostgreSQL server (the primary) generates a sequence of WAL files. The basic idea is to get these files over to another PostgreSQL server (the standby), which takes in these files and "replays" them to recreate the same changes happening on the primary server. The standby server stays in a read-only mode called recovery mode, and no changes to the standby server are permitted (that is, only read-only transactions are allowed).
The process of shipping the WAL files from the primary to the standby is called log shipping, and can be done manually (scripts to rsync changes from the primary's
$PGDATA/pg_wal
directory to the secondary's) or via streaming replication.
Various features like replication slots, standby feedback and failover have been added over time to improve the reliability and usefulness of streaming replication.
One big "feature" of streaming replication is that it's all or nothing. All changes to all objects from all databases on the primary have to be shipped to the standby, and the standby has to apply every change. It is not possible to selectively replicate a part of your database.
Logical Replication
Logical replication, added in v10, makes it possible to do just that – replicate only a set of tables to other servers. It is best explained with an example. Let's take a database called src
on one server, and
create a table in it:
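For instance (the table and column names `t`, `col1` and `col2` come from the discussion below; the sample rows are illustrative):

```sql
-- on the source server, in database "src"
CREATE TABLE t (col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES ('foo', 'bar'), ('baz', 'qux');
```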
We're also going to create a publication in this database (note that you need superuser privileges to do this):
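A publication for the table might be created like this (the publication name `mypub` is illustrative):

```sql
-- on the source, as a superuser
CREATE PUBLICATION mypub FOR TABLE t;
```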
Now let's go to a database dst
on another server and create a similar table:
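For example, reusing the table definition above but with one extra column (the text below refers to a destination-only column `col3`):

```sql
-- on the destination server, in database "dst"
CREATE TABLE t (col1 text, col2 text, col3 text);
```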
And we now set up a subscription here that will connect to the publication on the source and start pulling in the changes. (Note that you need to have a user
repuser
on the source server with replication privileges and read access to the tables.)
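A subscription along these lines (the subscription name `mysub`, the host name and the connection details are illustrative; `mypub` stands for whatever the publication at the source was named, and `repuser` is the user mentioned above):

```sql
-- on the destination
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=src.example.com dbname=src user=repuser'
  PUBLICATION mypub;
```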
The changes are synced, and you can see the rows on the destination side:
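A quick check at the destination might look like this:

```sql
-- on the destination
SELECT * FROM t;
-- col1 and col2 contain the replicated values; col3 is untouched (NULL)
```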
The destination table has an extra column "col3", which is not touched by the replication. The changes are replicated "logically" – so, as long as it is possible to insert a row with t.col1 and t.col2 alone, the replication process will do so.
Compared to streaming replication, the logical replication feature is perfect for replicating, say, a single schema or a set of tables in a specific database to another server.
Replication of Schema Changes
Assume you have a Django application with its set of tables living in the source database. It's easy and efficient to set up logical replication to bring over all these tables into another server, where you can run reporting, analytics, batch jobs, developer/customer support apps and the like without touching the "real" data and without affecting the production app.
Possibly the biggest limitation of logical replication currently is that it does not replicate schema changes – any DDL command executed at the source database does not cause a similar change at the destination database, unlike with streaming replication. For example, if we do this at the source database:
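For instance, a schema change along these lines (the column name `col4` is illustrative):

```sql
-- on the source
ALTER TABLE t ADD COLUMN col4 text;
INSERT INTO t (col1, col2, col4) VALUES ('quux', 'corge', 'x');
```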
this gets logged in the destination's log file:
and the replication stops. The column has to be added "manually" at the destination, at which point the replication resumes:
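Assuming, say, a column `col4` was added at the source, the matching manual fix at the destination would be:

```sql
-- on the destination
ALTER TABLE t ADD COLUMN col4 text;
```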
This means that if your Django application has added a new feature that needs new columns or tables, and you have to run django-admin migrate
on the source
database, the replication setup breaks.
Workaround
Your best bet to fix this issue is to pause the subscription at the destination, migrate the destination first, then the source, and then resume the subscription. You can pause and resume subscriptions like this:
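Assuming a subscription named `mysub`, disabling and re-enabling it looks like this:

```sql
-- on the destination
ALTER SUBSCRIPTION mysub DISABLE;
-- run the schema migration at the destination, then at the source
ALTER SUBSCRIPTION mysub ENABLE;
```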
If new tables are added and your publication is not "FOR ALL TABLES", you'll need to add them to the publication manually:
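For a publication named `mypub` and a newly created table (here given the illustrative name `newtable`):

```sql
-- on the source
ALTER PUBLICATION mypub ADD TABLE newtable;
```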
You'll also need to "refresh" the subscription on the destination side to tell Postgres to start syncing the new tables:
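Again assuming a subscription named `mysub`:

```sql
-- on the destination
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```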
Sequences
Consider this table at the source, which has a sequence:
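A minimal version of such a table (the names `s` and `a` match the sequence `s_a_seq` discussed below; the column `b` and the rows are illustrative):

```sql
-- on the source
CREATE TABLE s (a serial, b text);
INSERT INTO s (b) VALUES ('foo'), ('bar');
```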
The sequence s_a_seq
was created to back the a
column, which is of serial
type.
This generates the autoincrementing values for s.a
. Now let's replicate this
into dst
, and insert another row:
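Sketching that insert at the destination (assuming the same table `s` was created there and subscribed as before):

```sql
-- on the destination
INSERT INTO s (b) VALUES ('baz');
-- the new row gets a = 1: the destination's own sequence starts from scratch
```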
Oops, what just happened? The destination tried to start the sequence from scratch and generated a value of 1 for a
. This is because logical replication does not replicate the values of sequences, since the next value of a sequence is not stored in the table itself.
Workaround
If you think about it logically, you can't modify the same "autoincrement" value from two places without bidirectional synchronization. If you really need an incrementing number in each row of a table, and need to insert into that table from multiple servers, you could:
- use an external source for the number, like ZooKeeper or etcd,
- use non-overlapping ranges – for example, the first server generates and
  inserts numbers in the range 1 to 1 million, the second in the range 1
  million to 2 million, and so on.
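The second option can be sketched with per-server sequences that use disjoint ranges (names and bounds are illustrative):

```sql
-- on the first server
CREATE SEQUENCE s_a_seq MINVALUE 1 MAXVALUE 1000000;
-- on the second server
CREATE SEQUENCE s_a_seq MINVALUE 1000001 MAXVALUE 2000000;
```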
Tables Without Unique Rows
Let's try creating a table without a primary key, and replicating it:
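For example (the table name `nopk` and its contents are illustrative):

```sql
-- on the source
CREATE TABLE nopk (name text);
INSERT INTO nopk VALUES ('alice'), ('bob');
-- then add it to a publication and subscribe, as before
```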
And the rows are now on the destination too:
Now let's try deleting the second row at the source:
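With an illustrative table named `nopk`, the delete is rejected at the source:

```sql
-- on the source
DELETE FROM nopk WHERE name = 'bob';
-- fails: the table has no replica identity, but the publication publishes deletes
```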
This happens because the destination won't be able to uniquely identify the row that needs to be deleted (or updated) without a primary key.
Workaround
You can, of course, alter the schema to include a primary key. If you don't want to do that, you can ALTER TABLE
and set the "replica identity" to the full
row or a unique index. For example:
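For an illustrative table named `nopk`, setting the replica identity to the full row looks like this:

```sql
-- on the source
ALTER TABLE nopk REPLICA IDENTITY FULL;
```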
The delete now succeeds, and so does the replication:
If your table really has no way to uniquely identify rows, then you're a bit stuck. See the REPLICA IDENTITY
section of ALTER
TABLE for more information.
Differently Partitioned Destinations
Wouldn't it be nice to have a source that is partitioned one way and a destination that is partitioned another way? For example, at the source we could keep partitions for each month, and at the destination for each year. Presumably the destination is a bigger machine, and we need to keep historical data, but need that data only rarely.
Let's create a monthly-partitioned table at the source:
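A sketch of such a table (the table name, columns and ranges are all illustrative):

```sql
-- on the source: partitioned by month
CREATE TABLE measurement (ts timestamptz, val real)
  PARTITION BY RANGE (ts);
CREATE TABLE measurement_2019_01 PARTITION OF measurement
  FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
```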
And try creating a yearly-partitioned table at the destination:
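Along the same illustrative lines, a yearly layout at the destination would be:

```sql
-- on the destination: partitioned by year
CREATE TABLE measurement (ts timestamptz, val real)
  PARTITION BY RANGE (ts);
CREATE TABLE measurement_2019 PARTITION OF measurement
  FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
```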
Postgres complains that it needs the partition table for Jan 2019, which we have no intention of creating at the destination.
This happens because logical replication works not at the base table level, but at the child table level. There is no real workaround for this – if you're using partitions, the partition hierarchy has to be the same on both sides of a logical replication setup.
Large Objects
Large objects cannot be replicated using logical replication. This is probably not a big deal nowadays, as storing large objects is not a common modern-day practice. It is also easier to store a reference to a large object on some external, redundant storage (like NFS, S3, etc.) and replicate that reference rather than storing and replicating the object itself.
About pgDash
pgDash is a modern, in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.
pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, alerting, teams and more.
pgDash can monitor replication, locks, queries and more. It can also provide quick diagnostics and extensive alerts. Learn more here or sign up today for a free trial.