MySQL to PostgreSQL replica system
BSD-2-CLAUSE License
Bot releases are visible (Hide)
Published by the4thdoctor almost 7 years ago
This release candidate comes with few bug fixes and few usability improvements.
Previously when adding a table with a replicated DDL having an unique key, the table's creation failed because of the fields were
set as NULLable . Now the command works properly.
The system now checks if the MySQL configuration allows the replica when initialising or refreshing replicated entities.
A new class rollbar_notifier
was added in order to simplyfi the message management within the source and engine classes.
Now the commands init_replica,refresh_schema,sync_tables
send an info notification to rollbar when they complete successfully or
an error if they don't.
The command sync_tables
now allows the special name --tables disabled
to have all the tables with replica disabled
re synchronised at once.
--tables disabled
when syncing the tables to re synchronise all the tables excluded from the replicaPublished by the4thdoctor almost 7 years ago
The first beta for the milestone 2.0 adds fixes a long standing bug to the replica process and adds more features to the postgresql support.
The race condition fixed was caused by a not tokenised DDL preceeded by row images, causing the collected binlog rows to be added several times to the log_table.
It was quite hard to debug as the only visible effect was a primary key violation on random tables.
The issue is caused if a set of rows lesser than the replica_batch_size
are followed by a DDL that is not tokenised (e.g. CREATE TEMPORARY TABLE `foo`;
)
which coincides with the end of read from the binary log.
In that case the batch is not closed and the next read replica attempt will restart from the previous position reading and storing again the same set of rows.
When the batch is closed the replay function will eventually fail because of a primary/unique key violation.
The tokeniser now works properly when an ALTER TABLE ADD COLUMN
's definition is surrounded by parentheses e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
There are now error handlers when wrong table names, wrong schema names, wrong source name and wrong commands are specified to chameleon.py
When running commands that require a source name tye system checks if the source is registered.
The init_replica
for source pgsql now can read from an hot standby but the copy is not consistent as it's not possible to export a snapshot from the hot standbys.
Also the * init_replica
for source pgsql adds the copied tables as fake "replicated tables" for better show_status display.
For the source type pgsql
the following restrictions apply.
init_replica
is currently supportedALTER TABLE foo ADD COLUMN(bar varchar(30));
Published by the4thdoctor almost 7 years ago
please note this is a not production release. do not use it in production
The third and final alpha3 for the milestone 2.0 fixes some issues and add more features to the system.
As there are changes in the replica catalog if upgrading from the alpha1 there will be need to do a drop_replica_schema
followed by a create_replica_schema
. This will drop any existing replica and will require re adding the sources and
re initialise them with init_replica
.
The system now supports a source type pgsql
with the following limitations.
init_replica
is currently supportedshow_status
detailed command the replicated tables counters are always zeroA stack trace capture is now added on the log and the rollbar message for better debugging.
A new parameter on_error_replay
is available for the sources to set whether the replay process should skip the tables or exit on error.
This release adds the command upgrade_replica_schema
for upgrading the replica schema from the version 1.8 to the 2.0.
The upgrade procedure is described in the documentation.
Please read it carefully before any upgrade and backup the schema sch_chameleon before attempting any upgrade.
on_error_replay
to set whether the replay process should skip the tables or exit on errorPublished by the4thdoctor almost 7 years ago
The version 1.8.2 is the bugfix for the final release 1.8 for the branch v1.
There are few bugfixes, ans some backports from the version 2.0, which is currently in alpha.
This release upgrades the replica catalogue to the version 1.7, adding a new field t_source_schema
to the table t_sources
.
The field is used only for the migration to the version 2.0.0 and is updated every time a sourceid is requested from the class pg_engine
.
The show_version command now displays the source schema as well.
show_status
to display the source schemaPublished by the4thdoctor almost 7 years ago
The second alpha of the milestone 2.0 comes after a week of full debugging. This release is more usable and stable than the
alpha1. As there are changes in the replica catalog if upgrading from the alpha1 there will be need to do a drop_replica_schema
followed by a create_replica_schema
. This will drop any existing replica and will require re adding the sources and
re initialise them with init_replica
.
The full list of changes is in the CHANGELOG file. However there are few notable remarks.
There is a detailed display of the show_status
command when a source is specified. In particular the number of replicated and
not replicated tables is displayed. Also if any table as been pulled out from the replica it appears on the bottom.
From this release there is an error log which saves the exception's data during the replay phase.
The error log can be queried with the new command show_errors
.
A new source parameter replay_max_rows
has been added to set the amount of rows to replay.
Previously the value was set by the parameter replica_batch_size
. If upgrading from alpha1 you may need to add
this parameter to your existing configuration.
Finally there is a new class called pgsql_source
, not yet functional though.
This class will add a very basic support for the postgres source type.
More details will come in the alpha3.
replay_max_rows
to set the amount of rows to replay. Previously the value was set by replica_batch_size
type_override
to be emptyPublished by the4thdoctor almost 7 years ago
please note this is a not production release. do not use it in production
The documentation is available at http://www.pgchameleon.org/documents_v2/index.html
This is the first alpha of the milestone 2.0.
The project has been restructured in many ways thanks to the user's feedback.
Hopefully this will make the system much simple to use.
The main changes in the version 2 are the following.
The system is Python 3 only compatible. Python 3 is the future and there is no reason why to keep developing thing in 2.7.
The system now can read from multiple MySQL schemas in the same database and replicate them it into a target PostgreSQL database.
The source and target schema names can be different.
The system now use a conservative approach to the replica. The tables which generate errors during the replay are automatically excluded from the replica.
The init_replica process runs in background unless the logging is on the standard output or the debug option is passed to the command line.
The replica process now runs in background with two separated subprocess, one for the read and one for the replay.
If the logging is on the standard output or the debug option is passed to the command line the main process stays in foreground though.
The system now use a soft approach when initialising the replica .
The tables are locked only when copied. Their log coordinates will be used by the replica damon to put the database in a consistent status gradually.
The system can now use the rollbark key and environment to setup the Rollbar integration, for a better error detection.
Published by the4thdoctor almost 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor about 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor about 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor about 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor about 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
The command sync_replica is now replaced by sync_tables as this new name better reflects the concept behind the process.
The command requires the option --table followed by a comma separated list of table names.
If the specified table is not present in the origin's schema the table is silently skipped.
When a table is synchronised the existing copy in the target database is dropped and recreated from scratch.
In order to get the table in consistent state the log coordinates are saved in the the t_replica_tables.
The replica process will ignore the table until the log position reaches the table's snapsot position,
ensuring a consistent state for the replica target.
Published by the4thdoctor over 7 years ago
I discovered a regression when running the init_replica caused by a wrong handling of missing time in master coordinates. Sorry about that.
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
The sync_replica command do not work as expected when running in single table mode.
As the issue requires time to be fixed this release temporarly disables the sync_replica command.
The initial implementation for the relay data was to have two log tables t_log_replica_1 and t_log_replica_2 with the
replica process accessing one table at a time.
This approach allows autovacuum to take care of the unused partition meanwhile the other is written.
The method worked fine with only one replica worker. However as the flip flop between the tables is calculated indepentently
for each source this could cause unwanted bloat on the log tables if several sources are replicating all togheter.
In this scenario autovacuum will struggle to truncate the empty space in the table's end.
The pg_chameleon version 1.3 implements the log tables per source. Each source have a dedicated couple of tables still inherited from
the root partition t_log_replica.
The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.
For example if have a python 3.6 virtualenv in the directory ~/venv you'll find the upgrade files in
~/venv/lib/python-3.6/site-packages/pg_chameleon/sql/upgrade/
The migration performs the following operations.
please read carefully before attempting any upgrade
The schema upgrade happen automatically at the first run.
Because this one involves a data copy could take more time than the usual. If the process seems frozen do not stop it otherwise you may lose your replica setup .
pip install pg_chameleon --upgrade
chameleon.py upgrade_schema --config <your_config> --debug
If something goes wrong in the upgrade process you shall restore the sch_chameleon's backup,
Then you should downgrade the installation to pg_chameleon 1.2 and start the replica as usual.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
The sync_replica command do not work as expected when running in single table mode.
As the issue requires time to be fixed this release temporarly disables the sync_replica command.
The initial implementation for the relay data was to have two log tables t_log_replica_1 and t_log_replica_2 with the
replica process accessing one table at a time.
This approach allows autovacuum to take care of the unused partition meanwhile the other is written.
The method worked fine with only one replica worker. However as the flip flop between the tables is calculated indepentently
for each source this could cause unwanted bloat on the log tables if several sources are replicating all togheter.
In this scenario autovacuum will struggle to truncate the empty space in the table's end.
The pg_chameleon version 1.3 implements the log tables per source. Each source have a dedicated couple of tables still inherited from
the root partition t_log_replica.
The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.
For example if have a python 3.6 virtualenv in the directory ~/venv you'll find the upgrade files in
~/venv/lib/python-3.6/site-packages/pg_chameleon/sql/upgrade/
The migration performs the following operations.
please read carefully before attempting any upgrade
The schema upgrade happen automatically at the first run.
Because this one involves a data copy could take more time than the usual. If the process seems frozen do not stop it otherwise you may lose your replica setup .
pip install pg_chameleon --upgrade
chameleon.py upgrade_schema --config <your_config> --debug
If something goes wrong in the upgrade process you shall restore the sch_chameleon's backup,
Then you should downgrade the installation to pg_chameleon 1.2 and start the replica as usual.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
The race condition fixed in the 1.1 affects servers with low write activity and using the temporary tables.
The following statement executed on mysql with the version 1.0 will cause the read_replica function restarting from the binlog position marked by the CREATE table.
DROP TABLE IF EXISTS test;
CREATE TABLE test(
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
value1 VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TEMPORARY TABLE tmp_test(
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
value1 VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into tmp_test (value1) values('blah'),('blah');
insert into test (value1) values('blah');
DROP TEMPORARY TABLE if exists tmp_test ;
This is caused by the batch not marked as closed when the query is received. The batch was closed only when the query were parsed. The statements CREATE TEMPORARY TABLE are not parsed (by design) and therefore the batch were not closed correctly. The subsequent row images generated by the inserts were read at each loop causing a replay issue because of the primary key violation.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
Published by the4thdoctor over 7 years ago
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.