pg_chameleon - v2.0.0.rc1

Published by the4thdoctor almost 7 years ago

This release candidate comes with few bug fixes and few usability improvements.

Previously when adding a table with a replicated DDL having an unique key, the table's creation failed because of the fields were
set as NULLable . Now the command works properly.

The system now checks if the MySQL configuration allows the replica when initialising or refreshing replicated entities.

A new class rollbar_notifier was added in order to simplyfi the message management within the source and engine classes.

Now the commands init_replica,refresh_schema,sync_tables send an info notification to rollbar when they complete successfully or
an error if they don't.

The command sync_tables now allows the special name --tables disabled to have all the tables with replica disabled
re synchronised at once.

Changelog from v2.0beta1

Fix for issue #52, When adding a unique key the table's creation fails because of the NULLable field
Add check for the MySQL configuration when initialising or refreshing replicated entities
Add class rollbar_notifier for simpler message management
Add end of init_replica,refresh_schema,sync_tables notification to rollbar
Allow --tables disabled when syncing the tables to re synchronise all the tables excluded from the replica

pg_chameleon - v2.0.0.beta1

Published by the4thdoctor almost 7 years ago

The first beta for the milestone 2.0 adds fixes a long standing bug to the replica process and adds more features to the postgresql support.

The race condition fixed was caused by a not tokenised DDL preceeded by row images, causing the collected binlog rows to be added several times to the log_table.
It was quite hard to debug as the only visible effect was a primary key violation on random tables.

The issue is caused if a set of rows lesser than the replica_batch_size are followed by a DDL that is not tokenised (e.g. CREATE TEMPORARY TABLE `foo`; )
which coincides with the end of read from the binary log.
In that case the batch is not closed and the next read replica attempt will restart from the previous position reading and storing again the same set of rows.
When the batch is closed the replay function will eventually fail because of a primary/unique key violation.

The tokeniser now works properly when an ALTER TABLE ADD COLUMN's definition is surrounded by parentheses e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
There are now error handlers when wrong table names, wrong schema names, wrong source name and wrong commands are specified to chameleon.py
When running commands that require a source name tye system checks if the source is registered.

The init_replica for source pgsql now can read from an hot standby but the copy is not consistent as it's not possible to export a snapshot from the hot standbys.
Also the * init_replica for source pgsql adds the copied tables as fake "replicated tables" for better show_status display.

For the source type pgsql the following restrictions apply.

There is no support for real time replica
The data copy happens always with file method
The copy_max_memory doesn't apply
The type override doesn't apply
Only init_replica is currently supported
The source connection string requires a database name

Changelog from v2.0alpha3

fix a race condition where an unrelated DDL can cause the collected binlog rows to be added several times to the log_table
fix regression in write ddl caused by the change of private method
fix wrong ddl parsing when a column definition is surrounded by parentheses e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
error handling for wrong table names, wrong schema names, wrong source name and wrong commands
init_replica for source pgsql now can read from an hot standby but the copy is not consistent
init_replica for source pgsql adds "replicated tables" for better show_status display
check if the source is registered when running commands that require a source nam

pg_chameleon - v2.0.0.alpha3

Published by the4thdoctor almost 7 years ago

please note this is a not production release. do not use it in production

The third and final alpha3 for the milestone 2.0 fixes some issues and add more features to the system.

As there are changes in the replica catalog if upgrading from the alpha1 there will be need to do a drop_replica_schema
followed by a create_replica_schema. This will drop any existing replica and will require re adding the sources and
re initialise them with init_replica.

The system now supports a source type pgsql with the following limitations.

There is no support for real time replica
The data copy happens always with file method
The copy_max_memory doesn't apply
The type override doesn't apply
Only init_replica is currently supported
The source connection string requires a database name
In the show_status detailed command the replicated tables counters are always zero

A stack trace capture is now added on the log and the rollbar message for better debugging.
A new parameter on_error_replay is available for the sources to set whether the replay process should skip the tables or exit on error.

This release adds the command upgrade_replica_schema for upgrading the replica schema from the version 1.8 to the 2.0.

The upgrade procedure is described in the documentation.

Please read it carefully before any upgrade and backup the schema sch_chameleon before attempting any upgrade.

Changelog from v2.0alpha2

Remove limit_tables from binlogreader initialisation, as we can read from multiple schemas we should only exclude the tables not limit
Fix wrong formatting for default value when altering a field
Add upgrade procedure from version 1.8.2 to 2.0
Improve error logging and table exclusion in replay function
Add stack trace capture to the rollbar and log message when one of the replica daemon crash
Add on_error_replay to set whether the replay process should skip the tables or exit on error
Add init_replica support for source type pgsql (EXPERIMENTAL)

pg_chameleon - v1.8.2

Published by the4thdoctor almost 7 years ago

The version 1.8.2 is the bugfix for the final release 1.8 for the branch v1.
There are few bugfixes, ans some backports from the version 2.0, which is currently in alpha.

This release upgrades the replica catalogue to the version 1.7, adding a new field t_source_schema to the table t_sources .
The field is used only for the migration to the version 2.0.0 and is updated every time a sourceid is requested from the class pg_engine.

The show_version command now displays the source schema as well.

Changelog from 1.8.1

Fix for issue #33 pg can't handle NUL characters in string
Fix exception in b64 conversion when saving a discarded row
Add t_source_schema in table t_sources, used for the upgrade to the upcoming version 2.0
change log line formatting inspired by the super clean look in pgbackrest (thanks you guys)
Update show_status to display the source schema

pg_chameleon - v2.0.0.alpha2

Published by the4thdoctor almost 7 years ago

The second alpha of the milestone 2.0 comes after a week of full debugging. This release is more usable and stable than the
alpha1. As there are changes in the replica catalog if upgrading from the alpha1 there will be need to do a drop_replica_schema
followed by a create_replica_schema. This will drop any existing replica and will require re adding the sources and
re initialise them with init_replica.

The full list of changes is in the CHANGELOG file. However there are few notable remarks.

There is a detailed display of the show_status command when a source is specified. In particular the number of replicated and
not replicated tables is displayed. Also if any table as been pulled out from the replica it appears on the bottom.

From this release there is an error log which saves the exception's data during the replay phase.
The error log can be queried with the new command show_errors.

A new source parameter replay_max_rows has been added to set the amount of rows to replay.
Previously the value was set by the parameter replica_batch_size. If upgrading from alpha1 you may need to add
this parameter to your existing configuration.

Finally there is a new class called pgsql_source, not yet functional though.
This class will add a very basic support for the postgres source type.
More details will come in the alpha3.

Changelog from v2.0alpha1

Fix wrong position when determining the destination schema in read_replica_stream
Fix wrong log position stored in the source's high watermark
Fix wrong table inclusion/exclusion in read_replica_steam
Add source parameter replay_max_rows to set the amount of rows to replay. Previously the value was set by replica_batch_size
Fix crash when an alter table affected a table not replicated
Fixed issue with alter table during the drop/set default for the column (thanks to psycopg2's sql.Identifier)
add type display to source status
Add fix for issue #33 cleanup NUL markers from the rows before trying to insert them in PostgreSQL
Fix broken save_discarded_row
Add more detail to show_status when specifying the source with --source
Changed some methods to private
ensure the match for the alter table's commands are enclosed by word boundaries
add if exists when trying to drop the table in swap tables. previously adding a new table failed because the table wasn't there
fix wrong drop enum type when adding a new field
add log error for storing the errors generated during the replay
add not functional class pgsql_source for source type pgsql
allow type_override to be empty
add show_status command for displaying the log error entries
add separate logs for per source
change log line formatting inspired by the super clean look in pgbackrest (thanks you guys)

pg_chameleon - v2.0alpha1

Published by the4thdoctor almost 7 years ago

release notes

please note this is a not production release. do not use it in production

The documentation is available at http://www.pgchameleon.org/documents_v2/index.html

This is the first alpha of the milestone 2.0.
The project has been restructured in many ways thanks to the user's feedback.
Hopefully this will make the system much simple to use.

The main changes in the version 2 are the following.

The system is Python 3 only compatible. Python 3 is the future and there is no reason why to keep developing thing in 2.7.

The system now can read from multiple MySQL schemas in the same database and replicate them it into a target PostgreSQL database.
The source and target schema names can be different.

The system now use a conservative approach to the replica. The tables which generate errors during the replay are automatically excluded from the replica.

The init_replica process runs in background unless the logging is on the standard output or the debug option is passed to the command line.

The replica process now runs in background with two separated subprocess, one for the read and one for the replay.
If the logging is on the standard output or the debug option is passed to the command line the main process stays in foreground though.

The system now use a soft approach when initialising the replica .
The tables are locked only when copied. Their log coordinates will be used by the replica damon to put the database in a consistent status gradually.

The system can now use the rollbark key and environment to setup the Rollbar integration, for a better error detection.

changelog from version 1.8

Python 3 only development
Add support for reading from multiple MySQL schemas and restore them it into a target PostgreSQL database. The source and target schema names can be different.
Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica.
Daemonised init_replica process.
Daemonised replica process with two separated subprocess, one for the read and one for the replay.
Soft replica initialisation. The tables are locked when needed and stored with their log coordinates. The replica damon will put the database in a consistent status gradually.
Rollbar integration for a simpler error detection.

pg_chameleon - v1.8.1

Published by the4thdoctor almost 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.8

Fix for issue #31, MySQL numeric password breaks pg_chameleon
Fix for issue #32, pg Schema name upper case is dropped to lowercase in init_replica
Fix for issue #34, Missing escape for table names

pg_chameleon - v1.8

Published by the4thdoctor about 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.7

Fix wrong check in thread alive when running with --thread option
Add support for RENAME statement

pg_chameleon - v1.7

Published by the4thdoctor about 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.6

add optional threaded mode for read and replay as requested in issue #14
fix for issue #16 when running init_replica convert to NULL 0000-00-00 00:00:00 values if the data type is date or datetime
Handling of datatypes when migrating schema as requested in issue #21

pg_chameleon - v1.6

Published by the4thdoctor about 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.5

fix wrong table name when parsing CREATE TABLE schema_name.table_name
fix missing parse for numeric_scale in sql_utils
lock only the affected tables when running sync_tables
improve performance for the replay plpgsql function
rename change lag to read lag and add replay lag in the the show_status output
add TravisCI configuration to the source tree
add set_config for initial config dir creation (needed by the CI tests)
add a regexp match to exclude the keywords in the parse alter table
add FOREIGN KEY to the excluded keyworkds when parsing alter table
fix issue #22 add KEY to the excluded keyworkds when parsing alter table

pg_chameleon - v1.5

Published by the4thdoctor about 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.4

fix wrong evaluation in table's consistent state (row and DDL)
fix wrong dimensions when building floating point data types
add support for DEFAULT value on ADD COLUMN,CHANGE,MODIFY
add indices to the t_log _replica tables in order to speedup the batch cleanup
Fix for Issue #5 add cast to char with character set in column generation in order to override collation mix. the charset is the mysql general character set
Improve logging messages to be more descriptive
Remove total_events and evaluate when writing the batch using the real number of events stored from the mysql replica
fix ALTER TABLE...CHANGE parsing to lower the data type
add warning when a table is without primary key
Fix Issue #9 add configuration check for MySQL before starting the replica
Fix Issue #10 clarify running replica log messages to describe better what's happening
Add --nolock option as per request on Issue #13 . When started with --nolock the replica doesn't create the lock file in case of error.

pg_chameleon - v1.4

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

sync_replica replaced by sync_tables

The command sync_replica is now replaced by sync_tables as this new name better reflects the concept behind the process.
The command requires the option --table followed by a comma separated list of table names.

If the specified table is not present in the origin's schema the table is silently skipped.
When a table is synchronised the existing copy in the target database is dropped and recreated from scratch.
In order to get the table in consistent state the log coordinates are saved in the the t_replica_tables.
The replica process will ignore the table until the log position reaches the table's snapsot position,
ensuring a consistent state for the replica target.

Changelog from 1.3.1

add varbinary to the type conversion dictionary
fix wrong quoting when key field is surrounded by backtick `
add geometry to the supported types
add varbinary and geometry to hexify in config-example.yaml
add INDEX and UNIQUE to excluded words when parsing alter table. this prevents the ddl replica to crash when the alter table adds an index
Fix for Issue #4 add generic exception when fallback on inserts to trap unexpected data not handled by psycopg2
Replace sync_replica with sync_tables. Check the release notes for implementation.
Add --version to display the program version.
Move documentation on pgchameleon.org

pg_chameleon - Release v1.3.1

Published by the4thdoctor over 7 years ago

Bugfix emergency release v1.3.1

I discovered a regression when running the init_replica caused by a wrong handling of missing time in master coordinates. Sorry about that.

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

sync_replica disabled

The sync_replica command do not work as expected when running in single table mode.
As the issue requires time to be fixed this release temporarly disables the sync_replica command.

Change in replica storage

The initial implementation for the relay data was to have two log tables t_log_replica_1 and t_log_replica_2 with the
replica process accessing one table at a time.

This approach allows autovacuum to take care of the unused partition meanwhile the other is written.
The method worked fine with only one replica worker. However as the flip flop between the tables is calculated indepentently
for each source this could cause unwanted bloat on the log tables if several sources are replicating all togheter.
In this scenario autovacuum will struggle to truncate the empty space in the table's end.

The pg_chameleon version 1.3 implements the log tables per source. Each source have a dedicated couple of tables still inherited from
the root partition t_log_replica.

The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.

For example if have a python 3.6 virtualenv in the directory ~/venv you'll find the upgrade files in
~/venv/lib/python-3.6/site-packages/pg_chameleon/sql/upgrade/

The migration performs the following operations.

add a field v_log_table to t_sources
add an helper plpgsql function fn_refresh_parts() which creates the source's log tables if not present
with a DO block creates the new log tables for the existing sources and copies the data from the old t_log_replica_x to the new log tables
drops the old log tables
removes the field v_log_table from t_replica_batch

Upgrade

please read carefully before attempting any upgrade

The schema upgrade happen automatically at the first run.
Because this one involves a data copy could take more time than the usual. If the process seems frozen do not stop it otherwise you may lose your replica setup .

Upgrade steps

Stop all the replica sources. The show_status command must show all the rows in stopped status
Take a backup of the schema sch_chameleon with pg_dump
Upgrade pg_chameleon with pip install pg_chameleon --upgrade
Run chameleon.py upgrade_schema --config <your_config> --debug
When the upgrade is finished start the replica process as usual

Rollback

If something goes wrong in the upgrade process you shall restore the sch_chameleon's backup,
Then you should downgrade the installation to pg_chameleon 1.2 and start the replica as usual.

Changelog from 1.2

each source now uses two dedicated log tables for better performance
set descriptive application_name in postgres process to track the replica activity
fix race condition when two sources have a batch with the same value in t_replica_batch.ts_created
add switch --debug for having the log on console with DEBUG verbosity without need to change configuration file
fix regexp for foreign keys when omitting CONSTRAINT keyword
change lag display in show_status from seconds to interval for better display
add quote_col method in sql_token class to ensure all index columns are correctly quoted
add a release_notes file for the details on the releases 1.3+
fix wrong timestamp save when updating the last received event in sources
temporarly disable sync_replica because is not working as expected

pg_chameleon - Release v1.3

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

sync_replica disabled

The sync_replica command do not work as expected when running in single table mode.
As the issue requires time to be fixed this release temporarly disables the sync_replica command.

Change in replica storage

The initial implementation for the relay data was to have two log tables t_log_replica_1 and t_log_replica_2 with the
replica process accessing one table at a time.

This approach allows autovacuum to take care of the unused partition meanwhile the other is written.
The method worked fine with only one replica worker. However as the flip flop between the tables is calculated indepentently
for each source this could cause unwanted bloat on the log tables if several sources are replicating all togheter.
In this scenario autovacuum will struggle to truncate the empty space in the table's end.

The pg_chameleon version 1.3 implements the log tables per source. Each source have a dedicated couple of tables still inherited from
the root partition t_log_replica.

The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.

For example if have a python 3.6 virtualenv in the directory ~/venv you'll find the upgrade files in
~/venv/lib/python-3.6/site-packages/pg_chameleon/sql/upgrade/

The migration performs the following operations.

add a field v_log_table to t_sources
add an helper plpgsql function fn_refresh_parts() which creates the source's log tables if not present
with a DO block creates the new log tables for the existing sources and copies the data from the old t_log_replica_x to the new log tables
drops the old log tables
removes the field v_log_table from t_replica_batch

Upgrade

please read carefully before attempting any upgrade

The schema upgrade happen automatically at the first run.
Because this one involves a data copy could take more time than the usual. If the process seems frozen do not stop it otherwise you may lose your replica setup .

Upgrade steps

Stop all the replica sources. The show_status command must show all the rows in stopped status
Take a backup of the schema sch_chameleon with pg_dump
Upgrade pg_chameleon with pip install pg_chameleon --upgrade
Run chameleon.py upgrade_schema --config <your_config> --debug
When the upgrade is finished start the replica process as usual

Rollback

If something goes wrong in the upgrade process you shall restore the sch_chameleon's backup,
Then you should downgrade the installation to pg_chameleon 1.2 and start the replica as usual.

Changelog from 1.2

each source now uses two dedicated log tables for better performance
set descriptive application_name in postgres process to track the replica activity
fix race condition when two sources have a batch with the same value in t_replica_batch.ts_created
add switch --debug for having the log on console with DEBUG verbosity without need to change configuration file
fix regexp for foreign keys when omitting CONSTRAINT keyword
change lag display in show_status from seconds to interval for better display
add quote_col method in sql_token class to ensure all index columns are correctly quoted
add a release_notes file for the details on the releases 1.3+
fix wrong timestamp save when updating the last received event in sources
temporarly disable sync_replica because is not working as expected

pg_chameleon - Release v1.2

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.1

fix deadlock when replicating from multiple sources:
add source id when cleaning batches
add missing marker when outputting failed mysql query in copy_tabl_data
fix wrong decimal handling in build_tab_ddl
add bool to the data dictionary
exclude ddl when coming from schemas different from the one replicated
fix wrong quoting when capturing primary key inline
add error handling in read_replica
move the version view management in the pg_engine code

pg_chameleon - Release v1.1

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.0

completed docstrings in sql_util.py
fix race condition when capturing queries not tokenised that leave the binglog position unchanged

The race condition fixed in the 1.1 affects servers with low write activity and using the temporary tables.
The following statement executed on mysql with the version 1.0 will cause the read_replica function restarting from the binlog position marked by the CREATE table.

DROP TABLE IF EXISTS test;
CREATE TABLE test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TEMPORARY TABLE tmp_test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into tmp_test (value1) values('blah'),('blah');
insert into test (value1) values('blah');
DROP TEMPORARY TABLE if exists tmp_test ;

This is caused by the batch not marked as closed when the query is received. The batch was closed only when the query were parsed. The statements CREATE TEMPORARY TABLE are not parsed (by design) and therefore the batch were not closed correctly. The subsequent row images generated by the inserts were read at each loop causing a replay issue because of the primary key violation.

pg_chameleon - Release v1.0

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.0-RC.2

Completed docstrings in pg_lib.py

pg_chameleon - Release v1.0-RC.2

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.0-RC.1

Completed docstrings in global_lib.py and mysql_lib.py
Partial writing for docstrings in pg_lib.py
Restore fallback on inserts when the copy batch data fails

pg_chameleon - Release v1.0-RC.1

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.0 Beta 2

add support for primay key defined as column constraint
fix regression if null constraint is omitted in a primary key column
add foreign key generation to detach replica. keys are added invalid and a validation attempt is performed.
add support for truncate table
add parameter out_dir to set the output destination for the csv files during init_replica
add set tables_only to table_limit when streaming the mysql replica
force a close batch on rotate event if binlog changes without any row events collected
fixed replica regression with python 3.x and empty binary data
added event_update in hexlify strategy
add tabulate for nice display for sources/status
logs are rotated on a daily basis
removed parameter log_append
add parameter log_days_keep to specify how many days keep the logs
feature freeze

pg_chameleon - Release v1.0-beta.2

Published by the4thdoctor over 7 years ago

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+
The system relies on the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK; .

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.0 Beta 1

add detach replica with automatic sequence reset (no FK generation yet)
replica speed improvement with the exclusion of BEGIN,COMMIT when capturing the queries from MySQL
fix the capturing regexp for primary keys and foreign keys
fix version in documentation

pg_chameleon

Changelog from v2.0beta1

Changelog from v2.0alpha3

Changelog from v2.0alpha2

Changelog from 1.8.1

Changelog from v2.0alpha1

release notes

changelog from version 1.8

Changelog from 1.8

Changelog from 1.7

Changelog from 1.6

Changelog from 1.5

Changelog from 1.4

sync_replica replaced by sync_tables

Changelog from 1.3.1

Bugfix emergency release v1.3.1

sync_replica disabled

Change in replica storage

Upgrade

Upgrade steps

Rollback

Changelog from 1.2

sync_replica disabled

Change in replica storage

Upgrade

Upgrade steps

Rollback

Changelog from 1.2

Changelog from 1.1

Changelog from 1.0

Changelog from 1.0-RC.2

Changelog from 1.0-RC.1

Changelog from 1.0 Beta 2

Changelog from 1.0 Beta 1

Related Projects

pgloader

dbmate

sql_exporter

db2ch

migrate