pg_chameleon

MySQL to PostgreSQL replica system

BSD 2-Clause License

Downloads: 889
Stars: 375
Committers: 9


pg_chameleon - v2.0.0.rc1

Published by the4thdoctor almost 7 years ago

This release candidate comes with a few bug fixes and a few usability improvements.

Previously, when adding a table with a replicated DDL having a unique key, the table's creation failed because the fields were
set as NULLable. Now the command works properly.

The system now checks if the MySQL configuration allows the replica when initialising or refreshing replicated entities.
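A minimal sketch of such a pre-flight check, assuming the relevant MySQL variables have already been fetched (e.g. with SHOW GLOBAL VARIABLES). The helper name and the exact set of required values are illustrative, not pg_chameleon's actual API:

```python
# Row-based replication generally needs binary logging enabled with full
# row images; the values below mirror that requirement (illustrative).
REQUIRED = {
    "log_bin": "ON",            # binary logging must be enabled
    "binlog_format": "ROW",     # row images are needed to rebuild the rows
    "binlog_row_image": "FULL", # full images keep every column in the event
}

def check_mysql_config(variables):
    """Return a list of misconfigured settings; empty means the replica can run."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual or 'unset'}, expected {expected}")
    return problems

# Example: a server still running with statement-based logging
print(check_mysql_config({"log_bin": "ON", "binlog_format": "STATEMENT",
                          "binlog_row_image": "FULL"}))
# → ['binlog_format is STATEMENT, expected ROW']
```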

A new class rollbar_notifier was added in order to simplify the message management within the source and engine classes.

Now the commands init_replica, refresh_schema and sync_tables send an info notification to rollbar when they complete successfully, or
an error notification if they don't.
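A hedged sketch of what such a notifier class might look like, using the real rollbar package's init/report_message calls when available and falling back to plain logging otherwise. The class and method names are illustrative, not pg_chameleon's actual implementation:

```python
import logging

try:
    import rollbar  # optional dependency: pip install rollbar
except ImportError:
    rollbar = None

class RollbarNotifier:
    """Illustrative notifier: sends to Rollbar when configured, always logs."""
    def __init__(self, key=None, environment="production"):
        self.enabled = bool(key and rollbar)
        if self.enabled:
            rollbar.init(key, environment=environment)

    def send_message(self, message, level="info"):
        if self.enabled:
            rollbar.report_message(message, level)
        log_level = logging.ERROR if level == "error" else logging.INFO
        logging.getLogger("pg_chameleon").log(log_level, message)

# Without a key the notifier degrades gracefully to local logging only.
notifier = RollbarNotifier()
notifier.send_message("init_replica completed", "info")
```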

The command sync_tables now accepts the special name --tables disabled to re-synchronise at once all the tables with the replica
disabled.

Changelog from v2.0beta1

  • Fix for issue #52: when adding a unique key the table's creation fails because of the NULLable fields
  • Add check for the MySQL configuration when initialising or refreshing replicated entities
  • Add class rollbar_notifier for simpler message management
  • Add end of init_replica,refresh_schema,sync_tables notification to rollbar
  • Allow --tables disabled when syncing the tables to re-synchronise all the tables excluded from the replica
pg_chameleon - v2.0.0.beta1

Published by the4thdoctor almost 7 years ago

The first beta for the milestone 2.0 fixes a long-standing bug in the replica process and adds more features to the PostgreSQL support.

The race condition was caused by a non-tokenised DDL preceded by row images, causing the collected binlog rows to be added several times to the log_table.
It was quite hard to debug as the only visible effect was a primary key violation on random tables.

The issue occurs if a set of rows smaller than replica_batch_size is followed by a DDL that is not tokenised (e.g. CREATE TEMPORARY TABLE `foo`; )
which coincides with the end of the read from the binary log.
In that case the batch is not closed and the next replica read attempt restarts from the previous position, reading and storing the same set of rows again.
When the batch is finally closed the replay function eventually fails because of a primary/unique key violation.

The tokeniser now works properly when an ALTER TABLE ADD COLUMN's definition is surrounded by parentheses, e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
There are now error handlers for wrong table names, wrong schema names, wrong source names and wrong commands specified to chameleon.py.
When running commands that require a source name the system checks if the source is registered.
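The parenthesised form can be tolerated by making the parenthesis optional in the parsing regular expression. A simplified sketch of that idea (pg_chameleon's real tokeniser is considerably more elaborate):

```python
import re

# Matches ALTER TABLE ... ADD COLUMN in both forms, with or without
# parentheses around the column definition. Illustrative only.
ADD_COLUMN = re.compile(
    r"ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+ADD\s+COLUMN\s*"
    r"\(?\s*`?(?P<name>\w+)`?\s+(?P<type>\w+(?:\(\d+(?:,\d+)?\))?)",
    re.IGNORECASE,
)

for ddl in ("ALTER TABLE foo ADD COLUMN bar varchar(30)",
            "ALTER TABLE foo ADD COLUMN(bar varchar(30))"):
    m = ADD_COLUMN.search(ddl)
    print(m.group("table"), m.group("name"), m.group("type"))
# → foo bar varchar(30)   (for both forms)
```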

The init_replica for source pgsql can now read from a hot standby, but the copy is not consistent as it's not possible to export a snapshot from hot standbys.
Also, init_replica for source pgsql adds the copied tables as fake "replicated tables" for a better show_status display.

For the source type pgsql the following restrictions apply.

  • There is no support for real time replica
  • The data copy always happens with the file method
  • The copy_max_memory doesn't apply
  • The type override doesn't apply
  • Only init_replica is currently supported
  • The source connection string requires a database name

Changelog from v2.0alpha3

  • fix a race condition where an unrelated DDL can cause the collected binlog rows to be added several times to the log_table
  • fix regression in write ddl caused by the change of private method
  • fix wrong ddl parsing when a column definition is surrounded by parentheses e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
  • error handling for wrong table names, wrong schema names, wrong source name and wrong commands
  • init_replica for source pgsql can now read from a hot standby but the copy is not consistent
  • init_replica for source pgsql adds "replicated tables" for better show_status display
  • check if the source is registered when running commands that require a source name
pg_chameleon - v2.0.0.alpha3

Published by the4thdoctor almost 7 years ago

Please note this is not a production release. Do not use it in production.

The third and final alpha for the milestone 2.0 fixes some issues and adds more features to the system.

As there are changes in the replica catalogue, if upgrading from the alpha1 you will need to run drop_replica_schema
followed by create_replica_schema. This will drop any existing replica and will require re-adding the sources and
re-initialising them with init_replica.

The system now supports a source type pgsql with the following limitations.

  • There is no support for real time replica
  • The data copy always happens with the file method
  • The copy_max_memory doesn't apply
  • The type override doesn't apply
  • Only init_replica is currently supported
  • The source connection string requires a database name
  • In the show_status detailed command the replicated tables counters are always zero

A stack trace capture is now added to the log and the rollbar message for better debugging.
A new parameter on_error_replay is available for the sources to set whether the replay process should skip the tables or exit on error.
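The two behaviours can be sketched as follows; replay_table, the policy values and the surrounding loop are illustrative stand-ins, not pg_chameleon's actual internals:

```python
import logging
import traceback

def replay_batch(tables, replay_table, on_error_replay="continue"):
    """Replay each table; on error either skip the table or re-raise.

    Returns the list of tables excluded because their replay failed.
    """
    excluded = []
    for table in tables:
        try:
            replay_table(table)
        except Exception:
            # capture the full stack trace for the log / rollbar message
            logging.error("replay failed for %s\n%s", table,
                          traceback.format_exc())
            if on_error_replay == "exit":
                raise          # stop the replay process on the first error
            excluded.append(table)  # skip the table, keep the replica running
    return excluded
```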

This release adds the command upgrade_replica_schema for upgrading the replica schema from version 1.8 to 2.0.

The upgrade procedure is described in the documentation.

Please read it carefully before any upgrade and backup the schema sch_chameleon before attempting any upgrade.

Changelog from v2.0alpha2

  • Remove limit_tables from the binlogreader initialisation; as we can read from multiple schemas we should only exclude tables, not limit them
  • Fix wrong formatting for default value when altering a field
  • Add upgrade procedure from version 1.8.2 to 2.0
  • Improve error logging and table exclusion in replay function
  • Add stack trace capture to the rollbar and log message when one of the replica daemons crashes
  • Add on_error_replay to set whether the replay process should skip the tables or exit on error
  • Add init_replica support for source type pgsql (EXPERIMENTAL)
pg_chameleon - v1.8.2

Published by the4thdoctor almost 7 years ago

Version 1.8.2 is a bugfix release for 1.8, the final release of the branch v1.
There are a few bugfixes and some backports from the version 2.0, which is currently in alpha.

This release upgrades the replica catalogue to the version 1.7, adding a new field t_source_schema to the table t_sources.
The field is used only for the migration to the version 2.0.0 and is updated every time a source id is requested from the class pg_engine.

The show_version command now displays the source schema as well.

Changelog from 1.8.1

  • Fix for issue #33 pg can't handle NUL characters in string
  • Fix exception in b64 conversion when saving a discarded row
  • Add t_source_schema in table t_sources, used for the upgrade to the upcoming version 2.0
  • change log line formatting inspired by the super clean look in pgbackrest (thank you guys)
  • Update show_status to display the source schema
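The NUL-character fix (issue #33) boils down to stripping chr(0) from strings before the insert, since PostgreSQL text fields cannot store the NUL byte. A minimal sketch; the helper name is hypothetical:

```python
def strip_nul(row):
    """Remove NUL bytes from every string value in a row dictionary."""
    return {col: val.replace("\x00", "") if isinstance(val, str) else val
            for col, val in row.items()}

# MySQL happily stores NUL inside VARCHAR; PostgreSQL rejects it in text.
print(strip_nul({"id": 1, "name": "ab\x00c"}))  # {'id': 1, 'name': 'abc'}
```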
pg_chameleon - v2.0.0.alpha2

Published by the4thdoctor almost 7 years ago

The second alpha of the milestone 2.0 comes after a week of full debugging. This release is more usable and stable than the
alpha1. As there are changes in the replica catalogue, if upgrading from the alpha1 you will need to run drop_replica_schema
followed by create_replica_schema. This will drop any existing replica and will require re-adding the sources and
re-initialising them with init_replica.

The full list of changes is in the CHANGELOG file. However there are a few notable remarks.

There is a detailed display of the show_status command when a source is specified. In particular the number of replicated and
not replicated tables is displayed. Also, if any table has been pulled out from the replica it appears at the bottom.

From this release there is an error log which saves the exception's data during the replay phase.
The error log can be queried with the new command show_errors.

A new source parameter replay_max_rows has been added to set the amount of rows to replay.
Previously the value was set by the parameter replica_batch_size. If upgrading from alpha1 you may need to add
this parameter to your existing configuration.

Finally there is a new class called pgsql_source, though it is not yet functional.
This class will add very basic support for the postgres source type.
More details will come in the alpha3.

Changelog from v2.0alpha1

  • Fix wrong position when determining the destination schema in read_replica_stream
  • Fix wrong log position stored in the source's high watermark
  • Fix wrong table inclusion/exclusion in read_replica_stream
  • Add source parameter replay_max_rows to set the amount of rows to replay. Previously the value was set by replica_batch_size
  • Fix crash when an alter table affected a table not replicated
  • Fixed issue with alter table during the drop/set default for the column (thanks to psycopg2's sql.Identifier)
  • add type display to source status
  • Add fix for issue #33 cleanup NUL markers from the rows before trying to insert them in PostgreSQL
  • Fix broken save_discarded_row
  • Add more detail to show_status when specifying the source with --source
  • Changed some methods to private
  • ensure the matches for the alter table's commands are enclosed by word boundaries
  • add if exists when trying to drop the table in swap tables. previously adding a new table failed because the table wasn't there
  • fix wrong drop enum type when adding a new field
  • add log error for storing the errors generated during the replay
  • add not functional class pgsql_source for source type pgsql
  • allow type_override to be empty
  • add show_errors command for displaying the log error entries
  • add separate logs per source
  • change log line formatting inspired by the super clean look in pgbackrest (thank you guys)
pg_chameleon - v2.0alpha1

Published by the4thdoctor almost 7 years ago

release notes

Please note this is not a production release. Do not use it in production.

The documentation is available at http://www.pgchameleon.org/documents_v2/index.html

This is the first alpha of the milestone 2.0.
The project has been restructured in many ways thanks to the users' feedback.
Hopefully this will make the system much simpler to use.

The main changes in the version 2 are the following.

The system is now Python 3 only. Python 3 is the future and there is no reason to keep developing things in 2.7.

The system can now read from multiple MySQL schemas in the same database and replicate them into a target PostgreSQL database.
The source and target schema names can be different.

The system now uses a conservative approach to the replica. The tables which generate errors during the replay are automatically excluded from the replica.

The init_replica process runs in the background unless the logging is on the standard output or the debug option is passed on the command line.

The replica process now runs in the background with two separate subprocesses, one for the read and one for the replay.
If the logging is on the standard output or the debug option is passed on the command line, the main process stays in the foreground.

The system now uses a soft approach when initialising the replica.
The tables are locked only when copied. Their log coordinates are used by the replica daemon to put the database in a consistent status gradually.
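A minimal sketch of the two-subprocess layout described above, assuming Unix's fork start method; read_source and replay_target are hypothetical stand-ins for the worker entry points:

```python
import multiprocessing as mp

def run_replica(read_source, replay_target):
    """Run the read and replay workers as two subprocesses.

    Returns once either worker exits; the surviving worker is terminated,
    mirroring the idea that read and replay must run as a pair.
    """
    ctx = mp.get_context("fork")  # assumed Unix; pg_chameleon targets Unix
    reader = ctx.Process(target=read_source, name="read_replica")
    replayer = ctx.Process(target=replay_target, name="replay_replica")
    reader.start()
    replayer.start()
    # Poll both workers; leave the loop as soon as one of them exits.
    while reader.is_alive() and replayer.is_alive():
        reader.join(timeout=1)
        replayer.join(timeout=1)
    for worker in (reader, replayer):
        worker.terminate()  # harmless on a worker that already exited
        worker.join()
```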

The system can now use the rollbar key and environment to set up the Rollbar integration, for better error detection.

changelog from version 1.8

  • Python 3 only development
  • Add support for reading from multiple MySQL schemas and replicating them into a target PostgreSQL database. The source and target schema names can be different.
  • Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica.
  • Daemonised init_replica process.
  • Daemonised replica process with two separate subprocesses, one for the read and one for the replay.
  • Soft replica initialisation. The tables are locked only when needed and stored with their log coordinates. The replica daemon will put the database in a consistent status gradually.
  • Rollbar integration for a simpler error detection.
pg_chameleon - v1.8.1

Published by the4thdoctor almost 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLES WITH READ LOCK;.

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.8

  • Fix for issue #31, MySQL numeric password breaks pg_chameleon
  • Fix for issue #32, pg Schema name upper case is dropped to lowercase in init_replica
  • Fix for issue #34, Missing escape for table names
pg_chameleon - v1.8

Published by the4thdoctor about 7 years ago


Changelog from 1.7

  • Fix wrong check in thread alive when running with --thread option
  • Add support for RENAME statement
pg_chameleon - v1.7

Published by the4thdoctor about 7 years ago


Changelog from 1.6

  • add optional threaded mode for read and replay as requested in issue #14
  • fix for issue #16: when running init_replica convert 0000-00-00 00:00:00 values to NULL if the data type is date or datetime
  • Handling of datatypes when migrating schema as requested in issue #21
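The zero-date fix for issue #16 can be sketched as follows: MySQL's "zero" timestamps have no PostgreSQL equivalent, so they are converted to NULL during the copy. The helper name fix_temporal is hypothetical:

```python
# MySQL accepts 0000-00-00 dates; PostgreSQL rejects them outright.
ZERO_DATES = {"0000-00-00", "0000-00-00 00:00:00"}

def fix_temporal(value, data_type):
    """Map MySQL zero dates to None (NULL on the PostgreSQL side)."""
    if data_type in ("date", "datetime", "timestamp") and value in ZERO_DATES:
        return None
    return value

print(fix_temporal("0000-00-00 00:00:00", "datetime"))  # None
print(fix_temporal("2017-01-01", "date"))               # 2017-01-01
```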
pg_chameleon - v1.6

Published by the4thdoctor about 7 years ago


Changelog from 1.5

  • fix wrong table name when parsing CREATE TABLE schema_name.table_name
  • fix missing parse for numeric_scale in sql_utils
  • lock only the affected tables when running sync_tables
  • improve performance for the replay plpgsql function
  • rename change lag to read lag and add replay lag in the show_status output
  • add TravisCI configuration to the source tree
  • add set_config for initial config dir creation (needed by the CI tests)
  • add a regexp match to exclude the keywords in the parse alter table
  • add FOREIGN KEY to the excluded keywords when parsing alter table
  • fix issue #22 add KEY to the excluded keywords when parsing alter table
pg_chameleon - v1.5

Published by the4thdoctor about 7 years ago


Changelog from 1.4

  • fix wrong evaluation in table's consistent state (row and DDL)
  • fix wrong dimensions when building floating point data types
  • add support for DEFAULT value on ADD COLUMN,CHANGE,MODIFY
  • add indices to the t_log_replica tables in order to speed up the batch cleanup
  • Fix for Issue #5: add cast to char with character set in column generation in order to override the collation mix; the charset is the MySQL general character set
  • Improve logging messages to be more descriptive
  • Remove total_events and evaluate when writing the batch using the real number of events stored from the mysql replica
  • fix ALTER TABLE...CHANGE parsing to lower the data type
  • add warning when a table is without primary key
  • Fix Issue #9 add configuration check for MySQL before starting the replica
  • Fix Issue #10 clarify running replica log messages to describe better what's happening
  • Add --nolock option as per request on Issue #13 . When started with --nolock the replica doesn't create the lock file in case of error.
pg_chameleon - v1.4

Published by the4thdoctor over 7 years ago


sync_replica replaced by sync_tables

The command sync_replica is now replaced by sync_tables, as the new name better reflects the concept behind the process.
The command requires the option --table followed by a comma separated list of table names.

If a specified table is not present in the origin's schema, the table is silently skipped.
When a table is synchronised, the existing copy in the target database is dropped and recreated from scratch.
In order to get the table in a consistent state the log coordinates are saved in t_replica_tables.
The replica process will ignore the table until the log position reaches the table's snapshot position,
ensuring a consistent state for the replica target.
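The gating logic can be sketched by comparing binlog coordinates as (file index, position) pairs; the function name and tuple model are illustrative, not the actual implementation:

```python
def table_is_replayable(table_snapshot, replica_position):
    """True once the replica has reached the table's snapshot coordinates.

    Coordinates are modelled as (binlog file index, position) tuples, so
    Python's lexicographic tuple comparison gives the right ordering.
    """
    return replica_position >= table_snapshot

snapshot = (102, 4400)  # saved when the table is copied
print(table_is_replayable(snapshot, (102, 1200)))  # False: replica still behind
print(table_is_replayable(snapshot, (103, 90)))    # True: past the snapshot
```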

Changelog from 1.3.1

  • add varbinary to the type conversion dictionary
  • fix wrong quoting when key field is surrounded by backtick `
  • add geometry to the supported types
  • add varbinary and geometry to hexify in config-example.yaml
  • add INDEX and UNIQUE to excluded words when parsing alter table. This prevents the DDL replica from crashing when the alter table adds an index
  • Fix for Issue #4 add generic exception when fallback on inserts to trap unexpected data not handled by psycopg2
  • Replace sync_replica with sync_tables. Check the release notes for implementation.
  • Add --version to display the program version.
  • Move documentation on pgchameleon.org
pg_chameleon - Release v1.3.1

Published by the4thdoctor over 7 years ago

Bugfix emergency release v1.3.1

I discovered a regression when running init_replica, caused by a wrong handling of the missing time in the master coordinates. Sorry about that.


sync_replica disabled

The sync_replica command does not work as expected when running in single table mode.
As the issue requires time to be fixed, this release temporarily disables the sync_replica command.

Change in replica storage

The initial implementation for the relay data used two log tables, t_log_replica_1 and t_log_replica_2, with the
replica process accessing one table at a time.

This approach allows autovacuum to take care of the unused table while the other is being written.
The method worked fine with only one replica worker. However, as the flip-flop between the tables is calculated independently
for each source, this could cause unwanted bloat on the log tables if several sources are replicating together.
In this scenario autovacuum will struggle to truncate the empty space at the table's end.

pg_chameleon version 1.3 implements the log tables per source. Each source has a dedicated pair of tables, still inherited from
the root partition t_log_replica.
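The per-source flip-flop can be sketched as follows; the table naming scheme shown here is illustrative, not necessarily the one used by the migration:

```python
def log_table_pair(source_id):
    """Each source owns its own pair of log tables (names illustrative)."""
    return (f"t_log_replica_1_src_{source_id}",
            f"t_log_replica_2_src_{source_id}")

def next_log_table(source_id, current_table):
    """Alternate writes between the pair, so autovacuum can reclaim the
    idle table without interference from other sources."""
    first, second = log_table_pair(source_id)
    return second if current_table == first else first

print(next_log_table(1, "t_log_replica_1_src_1"))  # t_log_replica_2_src_1
```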

The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.

For example, if you have a Python 3.6 virtualenv in the directory ~/venv you'll find the upgrade files in
~/venv/lib/python-3.6/site-packages/pg_chameleon/sql/upgrade/

The migration performs the following operations.

  • add a field v_log_table to t_sources
  • add a helper plpgsql function fn_refresh_parts() which creates the source's log tables if not present
  • create the new log tables for the existing sources with a DO block and copy the data from the old t_log_replica_x to the new log tables
  • drops the old log tables
  • removes the field v_log_table from t_replica_batch

Upgrade

please read carefully before attempting any upgrade

The schema upgrade happens automatically at the first run.
Because it involves a data copy it could take more time than usual. If the process seems frozen do not stop it, otherwise you may lose your replica setup.

Upgrade steps

  • Stop all the replica sources. The show_status command must show all the rows in stopped status
  • Take a backup of the schema sch_chameleon with pg_dump
  • Upgrade pg_chameleon with pip install pg_chameleon --upgrade
  • Run chameleon.py upgrade_schema --config <your_config> --debug
  • When the upgrade is finished start the replica process as usual

Rollback

If something goes wrong in the upgrade process you should restore the backup of sch_chameleon,
then downgrade the installation to pg_chameleon 1.2 and start the replica as usual.

Changelog from 1.2

  • each source now uses two dedicated log tables for better performance
  • set descriptive application_name in postgres process to track the replica activity
  • fix race condition when two sources have a batch with the same value in t_replica_batch.ts_created
  • add switch --debug for having the log on console with DEBUG verbosity without need to change configuration file
  • fix regexp for foreign keys when omitting CONSTRAINT keyword
  • change lag display in show_status from seconds to interval for better display
  • add quote_col method in sql_token class to ensure all index columns are correctly quoted
  • add a release_notes file for the details on the releases 1.3+
  • fix wrong timestamp save when updating the last received event in sources
  • temporarily disable sync_replica because it is not working as expected
pg_chameleon - Release v1.3

Published by the4thdoctor over 7 years ago

Changelog from 1.2

  • each source now uses two dedicated log tables for better performance
  • set descriptive application_name in postgres process to track the replica activity
  • fix race condition when two sources have a batch with the same value in t_replica_batch.ts_created
  • add switch --debug for having the log on console with DEBUG verbosity without need to change configuration file
  • fix regexp for foreign keys when omitting CONSTRAINT keyword
  • change lag display in show_status from seconds to interval for better display
  • add quote_col method in sql_token class to ensure all index columns are correctly quoted
  • add a release_notes file for the details on the releases 1.3+
  • fix wrong timestamp save when updating the last received event in sources
  • temporarily disable sync_replica because it is not working as expected
pg_chameleon - Release v1.2

Published by the4thdoctor over 7 years ago


Changelog from 1.1

  • fix deadlock when replicating from multiple sources
  • add source id when cleaning batches
  • add missing marker when outputting failed mysql query in copy_tabl_data
  • fix wrong decimal handling in build_tab_ddl
  • add bool to the data dictionary
  • exclude ddl when coming from schemas different from the one replicated
  • fix wrong quoting when capturing primary key inline
  • add error handling in read_replica
  • move the version view management in the pg_engine code
pg_chameleon - Release v1.1

Published by the4thdoctor over 7 years ago


Changelog from 1.0

  • completed docstrings in sql_util.py
  • fix race condition when capturing queries not tokenised that leave the binlog position unchanged

The race condition fixed in the 1.1 affects servers with low write activity which use temporary tables.
The following statements, executed on MySQL with the version 1.0, cause the read_replica function to restart from the binlog position marked by the CREATE TABLE.

DROP TABLE IF EXISTS test;
CREATE TABLE test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TEMPORARY TABLE tmp_test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO tmp_test (value1) VALUES ('blah'),('blah');
INSERT INTO test (value1) VALUES ('blah');
DROP TEMPORARY TABLE IF EXISTS tmp_test;

This is caused by the batch not being marked as closed when the query is received. The batch was closed only when the query was parsed. The statements CREATE TEMPORARY TABLE are not parsed (by design) and therefore the batch was not closed correctly. The subsequent row images generated by the inserts were read at each loop, causing a replay issue because of the primary key violation.
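The fix can therefore be sketched as: always close the open batch when a query event arrives, whether or not the statement is tokenised. A simplified illustration with hypothetical names:

```python
def handle_query_event(batch, statement, tokenise):
    """Close the batch on every query event, parsed or not."""
    batch["closed"] = True        # the fix: close even for ignored DDL
    tokens = tokenise(statement)  # returns None for e.g. CREATE TEMPORARY TABLE
    if tokens:
        batch["ddl"] = tokens
    return batch

# A CREATE TEMPORARY TABLE is not tokenised, yet the batch is now closed,
# so the previously stored row images are not read again on the next loop.
batch = handle_query_event({"closed": False},
                           "CREATE TEMPORARY TABLE tmp_test(id int)",
                           lambda s: None)
print(batch["closed"])  # True
```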

pg_chameleon - Release v1.0

Published by the4thdoctor over 7 years ago


Changelog from 1.0-RC.2

  • Completed docstrings in pg_lib.py
pg_chameleon - Release v1.0-RC.2

Published by the4thdoctor over 7 years ago


Changelog from 1.0-RC.1

  • Completed docstrings in global_lib.py and mysql_lib.py
  • Partial writing for docstrings in pg_lib.py
  • Restore fallback on inserts when the copy batch data fails
pg_chameleon - Release v1.0-RC.1

Published by the4thdoctor over 7 years ago


Changelog from 1.0 Beta 2

  • add support for primary key defined as column constraint
  • fix regression if null constraint is omitted in a primary key column
  • add foreign key generation to detach replica. keys are added invalid and a validation attempt is performed.
  • add support for truncate table
  • add parameter out_dir to set the output destination for the csv files during init_replica
  • add set tables_only to table_limit when streaming the mysql replica
  • force a close batch on rotate event if binlog changes without any row events collected
  • fixed replica regression with python 3.x and empty binary data
  • added event_update in hexlify strategy
  • add tabulate for nice display for sources/status
  • logs are rotated on a daily basis
  • removed parameter log_append
  • add parameter log_days_keep to specify how many days to keep the logs
  • feature freeze
pg_chameleon - Release v1.0-beta.2

Published by the4thdoctor over 7 years ago


Changelog from 1.0 Beta 1

  • add detach replica with automatic sequence reset (no FK generation yet)
  • replica speed improvement with the exclusion of BEGIN and COMMIT when capturing the queries from MySQL
  • fix the capturing regexp for primary keys and foreign keys
  • fix version in documentation