
dolt - 1.41.4

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

8107: Optimize JSON_SET and JSON_REPLACE on IndexedJsonDocument
This PR includes a new implementation of the JSON_SET and JSON_REPLACE functions that leverage the new indexed JSON storage format.
For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the value being set or replaced instead of the size of the entire document.
8103: Optimize JSON_REMOVE on IndexedJsonDocument
This PR includes a new implementation of the JSON_REMOVE function that leverages the new indexed JSON storage format.
For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the removed value instead of the size of the entire document.

go-mysql-server

2591: Bug fix: Fix @@binlog_row_metadata, add @@binlog_row_image
Fixes an error in the definition for the @@binlog_row_metadata system variable that prevented it from being queried. Adds the @@binlog_row_image system variable that was missing.
2590: Fix databases iter for information_schema tables
Doltgres information_schema tables include the schemas for the current database, not all databases
2588: Only register version function if it doesn't already exist
2587: Fix information_schema catalog and schema names
We had already made this change for Doltgres; this PR just applies it to more information_schema tables.
2580: Remove a duplicate column from information_schema
Just what it says on the tin. This duplicate column causes problems for DuckDB when attempting to connect to doltdb databases.
2578: Catch panics in listener
We had several goroutines in which panics would crash the server process. These were being triggered by panics due to errors in doltgres. Added a recover block to each goroutine.

vitess

357: Bug fix: Send an error response when the server fails to handle COM_BINLOG_DUMP_GTID
A MySQL primary needs to be able to send back an error response when handling the COM_BINLOG_DUMP_GTID command. Previously, when the integrator returned an error, it was logged in the primary server logs, but it was not sent back to the replica that sent the command. This change causes an error packet to be sent to the replica, containing the details of the error the integrator returned.
This change is difficult to test in isolation, but I have tests in dolt that will exercise this codepath.
356: Bug fix: Off-by-one error when parsing multiple statements
An off-by-one error in multistatement parsing prevents us from parsing multistatements without a space between the delimiter and the next statement. For example: "SELECT 1;SELECT 2;" would previously be parsed as "SELECT 1;S" and "ELECT 2;".
Found while testing changes for https://github.com/dolthub/driver/issues/28

Closed Issues

8109: ArmDocker

dolt - 1.41.3

Published by github-actions[bot] 4 months ago

Merged PRs

dolt

8102: Bug fix: binlog heartbeat nextLogPosition field
Heartbeat binlog events must have a correct NextLogPosition field that matches up with the previous events that have been sent in the stream. If not, the replica will shut down the binlog stream. https://github.com/dolthub/dolt/pull/8087 fixed this issue, but didn't account for a heartbeat event that is sent after the initial Format Description event but before any user-initiated requests. To trigger this, run START REPLICA; on the replica, don't run any commands on the primary, and let the first heartbeat go out after the binlog stream has been up for 30s.
8101: go.mod: Migrate from gopkg.in/square/go-jose.v2 to gopkg.in/go-jose/go-jose.v2. Bump version. Picks up fix for CVE-2024-28180.
8088: [no-release-info] Add additional tests for manipulating large JSON documents and fix corner case bugs in JSON_LOOKUP and JSON_INSERT
This PR adds additional tests for calling JSON_INSERT on large JSON documents. It also fixes three issues with IndexedJsonDocuments:
1. Some operations are not supported by the new optimized implementation for JSON_LOOKUP, such as wildcards on array paths (e.g. $[*]). Instead of returning an error, we detect the error and fall back on the original implementation.
2. Attempting to insert a value into a document could cause an infinite loop.
3. We would fail to read some keys from an IndexedJsonDocument's StaticMap if the document contained arrays.

go-mysql-server

2583: [stats] Disable histogram bucket merging for now because it mutated shared memory
Merging buckets in the current format is unsafe:
- we collect statistics for an index where two buckets have overlapping values
- we execute a join using the index with overlapping values, and use a merge algorithm to combine those buckets. The merged bucket is synthetic, but the statistics used for the join are also synthetic, so this all works as expected.
- a future indexscan selects the compressed range from before, accessing one of the synthetic buckets created by the join
- at the end of the indexscan, we get the error invalid bucket type: *stats.Bucket when adding the filtered histogram with a synthetic bucket back to the implementor-type statistic
Edited mergeOverlappingBuckets to not share memory, but I'm also not sure merging buckets is a performance win in most cases, so it is disabled for now.
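One common way Go code ends up mutating shared memory is by returning a value whose slices still point at the inputs' backing arrays. The sketch below uses a hypothetical, simplified Bucket type (not GMS's stats.Bucket or the real mergeOverlappingBuckets) to show a merge that allocates fresh storage instead, so editing the merged bucket leaves the originals untouched.

```go
package main

import "fmt"

// Bucket is a hypothetical, simplified histogram bucket.
type Bucket struct {
	UpperBound int
	RowCount   int
	MCVs       []int
}

// mergeBuckets combines two overlapping buckets into a brand-new bucket. The
// MCV slice is freshly allocated, so later edits to the merged bucket cannot
// write into memory still referenced by the original statistics.
func mergeBuckets(a, b *Bucket) *Bucket {
	mcvs := make([]int, 0, len(a.MCVs)+len(b.MCVs))
	mcvs = append(mcvs, a.MCVs...)
	mcvs = append(mcvs, b.MCVs...)
	upper := a.UpperBound
	if b.UpperBound > upper {
		upper = b.UpperBound
	}
	return &Bucket{UpperBound: upper, RowCount: a.RowCount + b.RowCount, MCVs: mcvs}
}

func main() {
	a := &Bucket{UpperBound: 10, RowCount: 5, MCVs: []int{3}}
	b := &Bucket{UpperBound: 12, RowCount: 7, MCVs: []int{11}}
	m := mergeBuckets(a, b)
	m.MCVs[0] = -1 // mutate the merged bucket only
	fmt.Println(a.MCVs, m.MCVs, m.RowCount) // [3] [-1 11] 12
}
```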
2580: Remove a duplicate column from information_schema
Just what it says on the tin. This duplicate column causes problems for DuckDB when attempting to connect to doltdb databases.

Closed Issues

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.97	1.4
groupby_scan	13.7	17.32	1.3
index_join	1.34	5.28	3.9
index_join_scan	1.27	2.57	2.0
index_scan	34.33	54.83	1.6
oltp_point_select	0.18	0.46	2.6
oltp_read_only	3.49	7.7	2.2
select_random_points	0.33	0.77	2.3
select_random_ranges	0.39	0.9	2.3
table_scan	34.95	56.84	1.6
types_table_scan	75.82	144.97	1.9
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.13	6.09	0.7
oltp_insert	3.82	3.02	0.8
oltp_read_write	8.58	13.95	1.6
oltp_update_index	3.89	3.07	0.8
oltp_update_non_index	3.89	3.02	0.8
oltp_write_only	5.37	6.43	1.2
types_delete_insert	7.7	6.67	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	98.43	32.02	3.1
tpcc_tps_multiplier			3.1

Overall Mean Multiple	2.07

dolt - 1.41.2

Published by github-actions[bot] 4 months ago

Merged PRs

dolt

8093: go/libraries/doltcore/remotestorage: internal/reliable: Recv: Fix a race where a completed state machine run and a canceled parent context could return a nil response message with a nil error.
8090: dolt sql shell slash redux
8087: Bug fix: Use correct log position in Dolt to MySQL replication heartbeats
Ensure heartbeat events sent from a Dolt primary to a MySQL replica have the latest nextLogPosition populated, otherwise the MySQL replica will close the binlog event stream.
8086: define schema for dolt_schemas table
https://github.com/dolthub/doltgresql/pull/454 depends on this PR.
8082: /docker/{docker-entrypoint.sh,serverDockerfile}: change image to pass all args to dolt sql-server command
This PR fixes https://github.com/dolthub/dolt/issues/8079. Now, when running the dolthub/dolt-sql-server image, passing dolt sql-server as arguments to the image results in an error. This also prevents accidentally starting two Dolt servers in the container.
8081: Fixed keyless secondary indexing for Doltgres
- Companion PR: https://github.com/dolthub/doltgresql/pull/452
This PR fixes two issues with creating secondary indexes for Doltgres types. The first deals with handlers: we were not adding a nil handler for the additional hash type, which would cause a panic because the counts were not equal (all non-Extended types should have a matching nil handler).
The second issue was due to the reuse of an ExtendedTupleComparator. When creating a new ExtendedTupleComparator, we pass in the previous TupleTypeHandler to handle all non-Extended types. If the previous TupleTypeHandler was an ExtendedTupleComparator and the new one was also an ExtendedTupleComparator, the data could be misinterpreted, leading to incorrect results, because the handler assumed a different type than the actual type. This has been changed so that an ExtendedTupleComparator always uses the inner comparator of a previous ExtendedTupleComparator. For now this will always be the default comparator, but if we ever add another one, this should handle that change properly.
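The comparator fix can be sketched in Go with hypothetical, simplified types: TupleComparator, DefaultComparator and ExtendedComparator stand in for Dolt's TupleTypeHandler and ExtendedTupleComparator. The constructor unwraps a previous extended comparator so wrappers never nest.

```go
package main

import (
	"bytes"
	"fmt"
)

// TupleComparator is a hypothetical stand-in for a tuple comparison handler.
type TupleComparator interface {
	Compare(a, b []byte) int
	Name() string
}

// DefaultComparator handles all non-extended types.
type DefaultComparator struct{}

func (DefaultComparator) Compare(a, b []byte) int { return bytes.Compare(a, b) }
func (DefaultComparator) Name() string            { return "default" }

// ExtendedComparator delegates non-extended types to an inner comparator.
type ExtendedComparator struct {
	inner TupleComparator
}

func (e ExtendedComparator) Compare(a, b []byte) int { return e.inner.Compare(a, b) }
func (e ExtendedComparator) Name() string            { return "extended(" + e.inner.Name() + ")" }

// NewExtendedComparator never nests wrappers: if prev is itself an
// ExtendedComparator, its inner comparator is reused instead.
func NewExtendedComparator(prev TupleComparator) ExtendedComparator {
	if ext, ok := prev.(ExtendedComparator); ok {
		prev = ext.inner
	}
	return ExtendedComparator{inner: prev}
}

func main() {
	first := NewExtendedComparator(DefaultComparator{})
	second := NewExtendedComparator(first)
	fmt.Println(second.Name()) // extended(default), not extended(extended(default))
}
```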
8080: Re-enable doltgres sysbench scripts
8078: Archive index rework to make loading faster
The initial implementation of archive indexes over-optimized for space. As a result, loading the index of an archive was 10x slower than loading the index of a noms table file. To address this:
- Dropped the end-to-end compression of the index
- Dropped the use of varints for offset deltas and chunk refs
- Replaced byte span offsets with an end-offset approach, which requires no delta processing on load.
- Used only slices of primitive types in the index memory. The read path is constant time with a little more complexity, but this allows us to read directly off disk into memory.
Testing indicates that on a 41 GB archive file, this returned load performance to match classic table files, and the size of the index increased by about 350 MB (total ~1 GB).
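A hypothetical sketch of why storing end offsets avoids delta processing: if the index holds cumulative end offsets, chunk i's byte span is simply [ends[i-1], ends[i]), an O(1) read with nothing to sum at load time. The layout and names below are illustrative assumptions, not Dolt's archive format.

```go
package main

import "fmt"

// chunkSpan returns the [start, end) byte range of chunk i when the index
// stores cumulative end offsets. Chunk i starts where chunk i-1 ends, so each
// lookup is O(1) and no deltas need to be summed when the index loads.
func chunkSpan(ends []uint64, i int) (start, end uint64) {
	if i > 0 {
		start = ends[i-1]
	}
	return start, ends[i]
}

// endsFromLengths is the one-time conversion a writer could do from chunk
// lengths to end offsets (the form that is then stored on disk).
func endsFromLengths(lengths []uint64) []uint64 {
	ends := make([]uint64, len(lengths))
	var total uint64
	for i, n := range lengths {
		total += n
		ends[i] = total
	}
	return ends
}

func main() {
	ends := endsFromLengths([]uint64{100, 150, 150})
	for i := range ends {
		s, e := chunkSpan(ends, i)
		fmt.Printf("chunk %d: [%d, %d)\n", i, s, e)
	}
}
```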

8077: /go/libraries/doltcore/remotestorage/chunk_fetcher.go: fix nil pointer
We have observed dolthubapi crashing with the following nil pointer error:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x29e14d1]
goroutine 399548427 [running]:
github.com/dolthub/dolt/go/libraries/doltcore/remotestorage.fetcherRPCDownloadLocsThread.func3()
external/com_github_dolthub_dolt_go/libraries/doltcore/remotestorage/chunk_fetcher.go:266 +0xf1
golang.org/x/sync/errgroup.(*Group).Go.func1()
external/org_golang_x_sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 399548420
external/org_golang_x_sync/errgroup/errgroup.go:75 +0x96
```

This PR aims to prevent this.

8076: Bump golang.org/x/image from 0.10.0 to 0.18.0 in /go
Bumps golang.org/x/image from 0.10.0 to 0.18.0.
8073: Added schema to index creation
In Doltgres, whenever we created an index, we used the empty schema (the default value for the schema name) as the destination. This meant that the updated table with the index was saved into the empty schema, which is incorrect since Doltgres always has a schema. This PR adds the schema to index creation, as well as to several other places where it was missing.

go-mysql-server

2583: [stats] Disable histogram bucket merging for now because it mutated shared memory
Merging buckets in the current format is unsafe:
- we collect statistics for an index where two buckets have overlapping values
- we execute a join using the index with overlapping values, and use a merge algorithm to combine those buckets. The merged bucket is synthetic, but the statistics used for the join are also synthetic, so this all works as expected.
- a future indexscan selects the compressed range from before, accessing one of the synthetic buckets created by the join
- at the end of the indexscan, we get the error invalid bucket type: *stats.Bucket when adding the filtered histogram with a synthetic bucket back to the implementor-type statistic
Edited mergeOverlappingBuckets to not share memory, but I'm also not sure merging buckets is a performance win in most cases, so it is disabled for now.
2581: [stats] populate types for nil zeroing
2577: calling JSON_EXTRACT and JSON_VALUE with a path that has an out-of-bounds array access should return SQL NULL, not an error.
The jsonpath module returns an error when performing a lookup with an out-of-bounds array index. We need to capture that error and return nil for the lookup operation instead.
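The pattern amounts to translating one specific lookup error into a nil result. A minimal Go sketch, where lookupIndex and errIndexOutOfRange are hypothetical stand-ins for the jsonpath module's behavior:

```go
package main

import (
	"errors"
	"fmt"
)

// errIndexOutOfRange is a hypothetical sentinel for an out-of-bounds lookup.
var errIndexOutOfRange = errors.New("array index out of range")

// lookupIndex mimics a path library that errors on an out-of-bounds index.
func lookupIndex(arr []any, i int) (any, error) {
	if i < 0 || i >= len(arr) {
		return nil, errIndexOutOfRange
	}
	return arr[i], nil
}

// extract turns the out-of-range error into a nil result (SQL NULL) and
// propagates every other error unchanged.
func extract(arr []any, i int) (any, error) {
	v, err := lookupIndex(arr, i)
	if errors.Is(err, errIndexOutOfRange) {
		return nil, nil
	}
	return v, err
}

func main() {
	v, err := extract([]any{"a", "b"}, 5)
	fmt.Println(v, err) // <nil> <nil>
}
```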
2576: fix type and precision for unix_timestamp
builds off of: https://github.com/dolthub/go-mysql-server/pull/2573
2572: fix for table_catalog for information_schema.tables

vitess

355: New functions to create PreviousGtids events, and to update event checksum

Closed Issues

8079: dolthub/dolt-sql-server image doesn't work correctly when regular dolt commands are supplied to it as arguments
8054: Checked out a branch but data is still obtained from the main branch
8051: Flyway and dolt: missing performance_schema
8052: select from a subquery with information_schema: command denied to user 'restadmin'@'%'

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.91	1.4
groupby_scan	13.22	17.01	1.3
index_join	1.34	5.37	4.0
index_join_scan	1.27	2.57	2.0
index_scan	34.33	53.85	1.6
oltp_point_select	0.18	0.46	2.6
oltp_read_only	3.49	7.56	2.2
select_random_points	0.34	0.75	2.2
select_random_ranges	0.39	0.89	2.3
table_scan	34.33	54.83	1.6
types_table_scan	74.46	142.39	1.9
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.13	6.09	0.7
oltp_insert	3.75	3.02	0.8
oltp_read_write	8.58	13.95	1.6
oltp_update_index	3.89	3.07	0.8
oltp_update_non_index	3.89	3.02	0.8
oltp_write_only	5.37	6.43	1.2
types_delete_insert	7.7	6.67	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	99.28	32.71	3.0
tpcc_tps_multiplier			3.0

Overall Mean Multiple	2.03

dolt - 1.41.1

Published by github-actions[bot] 4 months ago

Merged PRs

dolt

8062: add SchemaName to DatabaseSchema interface
Depends on https://github.com/dolthub/go-mysql-server/pull/2569
8059: Add initial, no-op implementation for ListBinaryLogs API changes
Adds a simple no-op implementation for DoltBinlogPrimaryController.ListBinaryLogs to keep it in sync with API changes in GMS.
Depends on https://github.com/dolthub/go-mysql-server/pull/2567

go-mysql-server

2572: fix for table_catalog for information_schema.tables
2570: Added infoschema to privilege check
This fixes: https://github.com/dolthub/dolt/issues/8052
In the analyzer, we make a check to determine if we're querying the information schema. The queries provided in the issue that do not work are regarded as subqueries, and these are explicitly ignored. This causes the privilege checker to look for the information schema tables by name, which is not the intended behavior.
This PR just adds an additional information schema check at a lower layer, which should remove the inconsistencies found from the queries provided in the issue.
2569: add SchemaName to DatabaseSchema interface
This method returns the schema name: the schema name for Doltgres and the database name for Dolt.
2567: Add support for SHOW BINARY LOGS
When the SHOW BINARY LOGS statement is executed, GMS will invoke the registered BinlogPrimaryController to ask it for the list of binary logs and send them back to the client.
2566: Renamed index functions and enums to be public
This renames the indexScanOp enum so that it's accessible from outside the package, and also replaces the type switch in newLeaf with a replaceable function that can be overridden from outside the package to support types that are not native to GMS.
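Replacing a type switch with an overridable function usually means exposing a package-level function variable. A minimal sketch of that pattern, where LeafFunc, NewLeaf and withFloatSupport are hypothetical names rather than the actual GMS API:

```go
package main

import "fmt"

// LeafFunc is a hypothetical signature for turning a value into a range leaf
// (represented here as a string for simplicity).
type LeafFunc func(v any) (string, error)

// defaultNewLeaf handles only the types native to this package.
func defaultNewLeaf(v any) (string, error) {
	switch t := v.(type) {
	case int:
		return fmt.Sprintf("int:%d", t), nil
	case string:
		return "string:" + t, nil
	default:
		return "", fmt.Errorf("unsupported type %T", v)
	}
}

// NewLeaf is an exported package-level variable instead of a hard-coded type
// switch, so another package can swap in its own implementation.
var NewLeaf LeafFunc = defaultNewLeaf

// withFloatSupport wraps an existing LeafFunc, handling float64 itself and
// delegating every other type to the previous implementation.
func withFloatSupport(prev LeafFunc) LeafFunc {
	return func(v any) (string, error) {
		if f, ok := v.(float64); ok {
			return fmt.Sprintf("float:%g", f), nil
		}
		return prev(v)
	}
}

func main() {
	NewLeaf = withFloatSupport(NewLeaf) // e.g. done once during initialization
	for _, v := range []any{1, "x", 1.5} {
		leaf, err := NewLeaf(v)
		fmt.Println(leaf, err)
	}
}
```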
2565: When a subroutine (like CREATE PROCEDURE) contains a subquery, correctly index into it.
Fixes https://github.com/dolthub/dolt/issues/8028
We didn't have tests for constructs containing nested subroutines (like CREATE PROCEDURE foo() CREATE PROCEDURE bar() SELECT 1;).
This PR adds tests for those, but tests where we don't currently match MySQL are disabled. There's enough enabled tests to show that statements like this no longer cause a panic.
Making sure that we match MySQL for these statements should be done in a follow-up.
2564: validate all string types for unicode
MySQL throws errors on invalid utf8-encoded strings. A previous PR detected those, but only for []byte string conversions. Prepared statements receive string parameters as a string type, so this PR moves the check so that it applies to all conversions.
Additionally, it adds bindings to AssertErr and AssertErrWithCtx methods.
related pr: https://github.com/dolthub/go-mysql-server/pull/2562
dolt pr: https://github.com/dolthub/dolt/pull/8060
fixes https://github.com/dolthub/dolt/issues/8040
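Since Go strings and []byte can both hold arbitrary bytes, one way to apply the same check to every conversion is a small generic helper built on unicode/utf8. This is a sketch with a hypothetical validateUTF8 name and error text, not the GMS code:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateUTF8 rejects invalid UTF-8 regardless of whether the value arrived
// as a []byte or as a string (for example, a prepared-statement parameter).
func validateUTF8[T ~string | ~[]byte](v T) error {
	s := string(v)
	if !utf8.ValidString(s) {
		return fmt.Errorf("incorrect string value: %q", s)
	}
	return nil
}

func main() {
	fmt.Println(validateUTF8("héllo")) // <nil>
	fmt.Println(validateUTF8([]byte{0xff, 0xfe}))
}
```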
2563: small changes to stats bucket counting
Joins don't track output MCVs anymore; they aren't in a format where they'd be useful anyway. Also assume MCVs are sorted for faster matching.

Closed Issues

8052: select from a subquery with information_schema: command denied to user 'restadmin'@'%'
8040: Inserting BINARY into VARCHAR(26) should result in error Incorrect string value

dolt - 1.41.0

Published by github-actions[bot] 4 months ago

This release includes a backwards-incompatible change to table statistics type encoding. Old statistics will not load with the new client; they must be manually updated with ANALYZE, deleted with call dolt_stats_drop(), or removed from the filesystem. Additionally, table statistics will load on startup by default for databases with fewer than 2 million rows. This is usually a one-time penalty of a few seconds.

Merged PRs

dolt

8056: Revert "Merge pull request #7940 from dolthub/macneale4/slash-cmds"
This reverts commit dd7071f8711a6d33bbd5aa8d64ea6b8a90094456, reversing changes made to 02f450318cb34a09c448bbde5640d512d8408931.
Reverting slash commands to fix: https://github.com/dolthub/dolt/issues/8050 and https://github.com/dolthub/dolt/issues/8022
8036: [statspro] Bootstrap database statistics once on startup
Load database statistics once on SQL engine startup. If auto refresh is enabled, the bootstrap is not performed. This behavior is on by default and can be turned off:
```
dolt sql -q "set @@PERSIST.dolt_stats_bootstrap_enabled = 0;"
```
(calling the command above with non-empty tables will still bootstrap statistics once)
This includes a small change to the way we encode column types for stats. We previously split on a comma (","), but enums and other types can include commas, so we now use a line break ("\n"). Old versions of stats will fail to load with the newer version.

Closed Issues

8050: Why the Division Operator can't work in dolt's sql?
8028: Panic on invalid query

dolt - 1.40.3

Published by github-actions[bot] 4 months ago

Merged PRs

dolt

8041: Truncate MCVs
Sort and truncate MCVs. Only keep values whose frequency is > twice the uniform frequency. This prevents us from manually summing non-outliers (which is expensive).
8025: [prolly] Float keyRange increment bug
Incrementing the [n, n+1) key range is a lot faster than a binary search with a tuple comparison callback. But it is subject to at least two edge cases where (n+1) is not a valid stop range: (1) n+1 == n, because of precision loss, and (2) n+1 < n, because of overflow.
I added a series of GMS tests here: https://github.com/dolthub/go-mysql-server/pull/2554. I couldn't find a DECIMAL failure case; I think DECIMAL always encodes a valid 1's place and is not subject to overflow, AFAICT.

go-mysql-server

2563: small changes to stats bucket counting
Joins don't track output MCVs anymore; they aren't in a format where they'd be useful anyway. Also assume MCVs are sorted for faster matching.
2562: throw error on invalid utf8 encoding for strings
fixes https://github.com/dolthub/dolt/issues/8040
2561: Fixes unexpected timezone converting when passing TIMESTAMP to unix_timestamp()
see #2111
2560: fix GetField indexes for UpdateJoin with Update Trigger
This PR addresses an issue where we were incorrectly assigning GetField indexes to an update join query.
The fix involved:
- adding a case for triggerIters to rowUpdateAccumulator
- not picking ResolvedTable references under SubqueryAliases when there are multiple
- correctly setting the scope node for update joins
fixes https://github.com/dolthub/dolt/issues/7943
2559: Implement support for DECLARE CONTINUE HANDLER
Fixes https://github.com/dolthub/dolt/issues/7971
Previously, we would always terminate a block when encountering an error, even if there was a matching handler. Additionally, there was no mechanism to resume execution after an error that happened inside a LOOP construct.
This correctly implements DECLARE CONTINUE HANDLER by making the following changes:
- Checks for handlers while executing the Block node instead of the BeginEnd node.
- For DECLARE EXIT HANDLER, the Block returns a special error value that propagates to the containing BeginEnd node in order to terminate just that node.
2556: Compute GetField indexes in procedure if-conditions.
Fixes https://github.com/dolthub/dolt/issues/7994
It seems like we aren't running the assignExecIndexes analysis pass on if-conditions when invoking stored procedures, which can cause execution failures if the condition has a sub-expression that has a GetField node.
Fixing this revealed a related issue: when constructing the scope of the if-condition to determine the correct indexes, we were incorrectly including any columns from the condition's body in the scope. This was also causing incorrect index calculations for GetFields in the if-condition, and is also fixed here.
(This only affected conditionals in stored procedures, not conditionals in triggers, because the analysis has a separate execution path for each; analyzeProcedureBodies is not called for triggers.)
2555: [stats] simplify stats comparison and mcv logic
Lazier comparison logic. Skip promoting/converting types when the index types match.
Remove an expensive and seemingly unnecessary bucket compression step that was re-evaluating mcvs.
2552: support VALUES statement
fixes https://github.com/dolthub/dolt/issues/8012
syntax: https://github.com/dolthub/vitess/pull/354

vitess

355: New functions to create PreviousGtids events, and to update event checksum
354: support VALUES statement
This PR adds syntax support for the VALUES statement as an alias for SELECT * FROM ...
We are still missing SELECT (VALUES ...) (support for values as a select_expression).
syntax for https://github.com/dolthub/dolt/issues/8012

Closed Issues

8034: Users are able to create branches with a "-" at the start. If you try to delete the branch after that, it looks like dolt is understanding it as an option for dolt_branch
8040: Inserting BINARY into VARCHAR(26) should result in error Incorrect string value
7943: Update with subquery join clause causes field index error/panic for table with trigger
8042: mysql:latest - missing error: [HY000][1524] Plugin 'mysql_native_password' is not loaded
7971: NOT FOUND handlers in procedures cause ERROR 1105 (HY000): EOF
7994: "Unable to find field with index" error on INSERT in procedure following IF (SELECT ...)

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.91	1.4
groupby_scan	13.22	17.32	1.3
index_join	1.34	5.37	4.0
index_join_scan	1.27	2.22	1.7
index_scan	34.33	53.85	1.6
oltp_point_select	0.18	0.52	2.9
oltp_read_only	3.49	8.28	2.4
select_random_points	0.33	0.81	2.5
select_random_ranges	0.39	0.97	2.5
table_scan	34.95	54.83	1.6
types_table_scan	75.82	142.39	1.9
reads_mean_multiplier			2.2

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.13	6.09	0.7
oltp_insert	3.82	3.02	0.8
oltp_read_write	8.58	15.0	1.7
oltp_update_index	3.89	3.13	0.8
oltp_update_non_index	3.89	3.07	0.8
oltp_write_only	5.37	6.55	1.2
types_delete_insert	7.7	6.79	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	98.85	32.66	3.0
tpcc_tps_multiplier			3.0

Overall Mean Multiple	2.07

dolt - 1.40.2

Published by github-actions[bot] 4 months ago

Merged PRs

dolt

7963: Create pg_catalog by default for Doltgres
This makes it such that pg_catalog is created by default when Doltgres is using Dolt. In addition, adds a new function to hook into schema functionality.

go-mysql-server

2552: support VALUES statement
fixes https://github.com/dolthub/dolt/issues/8012
syntax: https://github.com/dolthub/vitess/pull/354
2551: unwrap parenthesized table references
fixes https://github.com/dolthub/dolt/issues/8009

vitess

354: support VALUES statement
This PR adds syntax support for the VALUES statement as an alias for SELECT * FROM ...
We are still missing SELECT (VALUES ...) (support for values as a select_expression).
syntax for https://github.com/dolthub/dolt/issues/8012
353: allow backticks in system and user variables
This PR allows the use of backticks in system and user variables.
We are more lenient than MySQL when it comes to backticks in set statements.
For example, we allow set @abc.def = 10, while MySQL throws an error.
This is because we treat this as a qualified column identifier and automatically strip the backticks.
test bump https://github.com/dolthub/go-mysql-server/pull/2548
fixes https://github.com/dolthub/dolt/issues/8010

Closed Issues

8012: VALUES statement not supported

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.81	1.4
groupby_scan	13.46	17.32	1.3
index_join	1.37	5.37	3.9
index_join_scan	1.27	10.84	8.5
index_scan	34.95	53.85	1.5
oltp_point_select	0.18	0.46	2.6
oltp_read_only	3.49	7.56	2.2
select_random_points	0.34	0.75	2.2
select_random_ranges	0.39	0.9	2.3
table_scan	34.95	54.83	1.6
types_table_scan	75.82	137.35	1.8
reads_mean_multiplier			2.7

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.09	0.8
oltp_insert	3.82	3.02	0.8
oltp_read_write	8.58	13.95	1.6
oltp_update_index	3.89	3.07	0.8
oltp_update_non_index	3.89	3.02	0.8
oltp_write_only	5.37	6.32	1.2
types_delete_insert	7.7	6.67	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	99.66	26.16	3.8
tpcc_tps_multiplier			3.8

Overall Mean Multiple	2.50

dolt - 1.40.1

Published by github-actions[bot] 4 months ago

Previous releases 1.39.5 and 1.40.0 contained a bug when updating floats that would produce incorrect data. The change that caused this bug has been reverted in this release. Releases 1.39.5 and 1.40.0 have been deleted. If you are using those releases, we highly encourage you to upgrade to this release.

Note that only tables containing float types would be affected by the above bug, and then only if a value was updated. The affected releases were only in the wild for 48 hours, so we think the impact of this bug is small. If you are impacted by the bug, please come by our Discord and we will help further.

The bug was caught by our nightly fuzzer testing.

https://github.com/dolthub/fuzzer

Merged PRs

dolt

8024: Revert "[prolly] filteredIter optimization for exact prefix ranges (#7966)"
This reverts commit 6ae4251c9d85916c83a95291282d5ebd5ff4089a.
8018: Archive DDict cache and multi-file bug fixes
Two primary issues addressed in the dolt admin archive command:
1. Add caching to dictionaries. This improved performance significantly.
2. Fix multiple bugs related to having multiple table files. That was a gap in testing, so added a bats test for the command.
8001: Feature: Support restore subcommand in dolt_backup()
The dolt_backup() stored procedure now supports the restore subcommand. Customers can use this support to create a new database from an existing backup, or to sync an existing database from a backup. Note that the restore subcommand currently requires root/superuser access to execute, since it can change database state (particularly when the --force argument is used).
Example usage to create a database named db1 from a backup on disk:
```
call dolt_backup('restore', 'file:///opt/local/dolt-backups/db1', 'db1');
```
Related to https://github.com/dolthub/dolt/issues/7993
Fixes https://github.com/dolthub/dolt/issues/6074
7999: Generate TEMPORARY TABLE tags the same as normal TABLEs
This PR fixes this particular collision, and probably makes collisions with other temporary tables less likely, by using the deterministic random number generator used for generating tags for normal persisted tables.
fixes https://github.com/dolthub/dolt/issues/7995
7990: support auto_increment on temporary tables
fixes https://github.com/dolthub/dolt/issues/7972
7988: /.github/scripts/fuzzer/get-fuzzer-job-json.sh: add app label to fuzzer
7966: [prolly] filteredIter optimization for exact prefix ranges
Index range iteration uses a callback that is arbitrarily flexible but expensive. I changed index table access to only perform partial index scans for complete prefixes, and when the prefix fields are equality conditions, the generality of the index range callback is overkill. We just need to scan from the partial key (field1, ..., fieldn, nil, ...) to one higher than the partial key (field1, ..., fieldn+1, nil, ...).
This PR distinguishes between the RangeField.StrictKey and .Equal attributes to differentiate a max-1-row restriction from an equality restriction.
Still need to do follow-up tracing, but this is in response to the queries from TPC-C below. The string ones are much more common. Each of these uses a set of equality filters that only partially completes a secondary index prefix. All of them spend ~5ms of CPU time executing Range.Matches, which is mostly eliminated with this PR.
```
SELECT o_entry_d FROM orders1  WHERE o_w_id = 1  AND o_d_id = 5  AND o_c_id = 1891  ORDER BY o_id DESC;
SELECT c_id  FROM customer1 WHERE c_w_id = 1 AND c_d_id= 6 AND c_last='ABLECALLYABLE' ORDER BY c_first;
SELECT o_id, o_carrier_id, o_entry_d FROM orders1 WHERE o_w_id = 1 AND o_d_id = 9 AND o_c_id = 1709 ORDER BY o_id DESC
```
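The exact-prefix scan can be sketched over an in-memory sorted slice of integer tuples (a hypothetical stand-in for a prolly index). The start bound is the prefix itself and the stop bound is the prefix with its last field incremented, so no per-row range callback is needed. The increment assumes the last prefix field cannot overflow; #8025 above covers the cases where it can.

```go
package main

import (
	"fmt"
	"sort"
)

// comparePrefix compares the first n fields of a and b lexicographically.
func comparePrefix(a, b []int, n int) int {
	for i := 0; i < n; i++ {
		switch {
		case a[i] < b[i]:
			return -1
		case a[i] > b[i]:
			return 1
		}
	}
	return 0
}

// prefixRange returns the half-open index range [lo, hi) of sorted keys whose
// leading fields equal prefix. lo is the first key >= prefix, and hi is the
// first key >= prefix with its last field incremented, i.e. the scan runs from
// (f1, ..., fn, nil, ...) to (f1, ..., fn+1, nil, ...).
func prefixRange(keys [][]int, prefix []int) (lo, hi int) {
	n := len(prefix)
	if n == 0 {
		return 0, len(keys)
	}
	stop := append([]int(nil), prefix...)
	stop[n-1]++ // assumes the last prefix field does not overflow
	lo = sort.Search(len(keys), func(i int) bool { return comparePrefix(keys[i], prefix, n) >= 0 })
	hi = sort.Search(len(keys), func(i int) bool { return comparePrefix(keys[i], stop, n) >= 0 })
	return lo, hi
}

func main() {
	// Hypothetical (w_id, d_id, c_id) index entries, already sorted.
	keys := [][]int{{1, 5, 1}, {1, 5, 2}, {1, 6, 1}, {1, 6, 3}, {2, 1, 1}}
	lo, hi := prefixRange(keys, []int{1, 6})
	fmt.Println(keys[lo:hi]) // [[1 6 1] [1 6 3]]
}
```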
7914: Feature: Binlog replication
Initial support for Dolt to stream binlog events to a MySQL replica.
In this initial iteration, binlog events are streamed directly to connected replicas, instead of being written to a log file first. This enables customers to test out the initial binlog replication support, but it means that replicas will only receive the events that happen while they are connected, since they are not persisted in a log file yet. The next iteration will persist binlog events to a log file and will enable replicas to receive events that happened while they were not connected.
To enable binlog replication, you must persist the system variables below. Similar to Dolt's other replication formats, the Dolt server must come up with the replication system variables set in order for replication to be enabled. You cannot set these system variables on a running Dolt sql-server to turn on binlog replication – you must persist the values and then restart the sql-server.
```
SET @@PERSIST.log_bin=1;
SET @@PERSIST.enforce_gtid_consistency=ON;
SET @@PERSIST.gtid_mode=ON;
```
Related to https://github.com/dolthub/dolt/issues/7512
7912: Add IndexedJsonDocument, a JSONWrapper implementation that stores JSON documents in a prolly tree with probabilistic hashing.
tl;dr: We store a JSON document in a prolly tree, where the leaf nodes of the tree are blob nodes that each contain a fragment of the document, and the intermediate nodes are address map nodes, where the keys describe a JSONPath.
The new logic for reading and writing JSON documents is cleanly separated into the following files:
- IndexedJsonDocument - The new JSONWrapper implementation. It holds the root hash of the prolly tree.
- JsonChunker - A wrapper around a regular chunker. Used to write new JSON documents or apply edits to existing documents.
- JsonCursor - A wrapper around a regular cursor, with added functionality allowing callers to seek to a specific location in the document.
- JsonScanner - A custom JSON parser that tracks the current JSONPath.
- JsonLocation - A custom representation of a JSON path suitable for use as a prolly tree key.
Each added file has additional documentation with more details about the individual components.
Throughout every iteration of this project, the core idea has always been to represent a JSON document as a mapping from JSONPath locations to the values stored at those locations, so that we could store that map in a prolly tree and get all the benefits that we currently get from storing tables in prolly trees: fast diffing and merging, fast point lookups and mutations, etc.
This goal has three major challenges:
- For deeply nested JSON documents, simply listing every JSONPath requires asymptotically more space than the original document.
- We need to do this in a way that doesn't compromise performance on simply reading JSON documents from a table, which I understand is the most common use pattern.
- Ideally, users should not need to migrate their databases, or update their clients in order to read newer dbs, or have to choose between different configurations based on their use case.
This design achieves all three of these requirements:
- While it requires additional storage, this additional storage cannot exceed the size of the original document, and is in practice much smaller.
- It has indistinguishable performance for reading JSON documents from storage, while also allowing asymptotically faster diff and merge operations when the size of the changes is much smaller than the size of the document. (There is a cost: initial inserts of JSON documents are currently around 20% slower, but this is a one-time cost that does not impact subsequent reads and could potentially be optimized further.)
- Documents written by the new JSONChunker are backwards compatible with current Dolt binaries and can be read back by existing versions of Dolt. (Although they will have different hashes than equivalent documents that those versions would write.)

go-mysql-server

2551: unwrap parenthesized table references
fixes https://github.com/dolthub/dolt/issues/8009
2546: Add support for tracking the Aborted_connects status variable
Adds support for MySQL's Aborted_connects status variable.
Depends on: https://github.com/dolthub/vitess/pull/351
2542: When casting json to a string, always call StringifyJSON.
This ensures we match MySQL.
We previously weren't calling StringifyJSON in ConvertToString because that same method was being used when printing JSON to the screen or a MySQL client, which favored speed over matching MySQL exactly. But for casts we must be precise.
By adding an extra case to StringType.SQL we can distinguish between these cases and handle them properly.
2541: resolve default values for views
This was somewhat of a regression caused by https://github.com/dolthub/go-mysql-server/pull/2465.
However, before that PR views always had NULL as their default values, which did not match MySQL.
Now, we just resolve the default values in the schema, similar to ResolvedTables.
fixes https://github.com/dolthub/dolt/issues/7997
2540: [planbuilder] More update join table name validation
2539: fix UPDATE IGNORE ... JOIN
fixes: https://github.com/dolthub/dolt/issues/7986
2534: Implement row alias expressions (INSERT ... VALUES (...) AS new_tbl ON DUPLICATE KEY UPDATE x = new_tbl.x)
When inserting values, the user can specify names for both the source table and columns which are used in ON DUPLICATE expressions. It looks like either of the below options:
```
INSERT INTO tbl VALUES (1, 2) AS tbl_new ON DUPLICATE KEY UPDATE b = tbl_new.b;
INSERT INTO tbl VALUES (1, 2) AS tbl_new(a_new, b_new) ON DUPLICATE KEY UPDATE b = b_new;
```
This replaces the previous (now-deprecated) syntax:
```
INSERT INTO tbl VALUES (1, 2) ON DUPLICATE KEY UPDATE b = VALUES(b);
```
Supporting both syntaxes together was non-trivial because it means there are now two different ways to refer to the same column. While we had an existing way to "redirect" one column name to another, it only worked for unqualified names (no table name), and it overrode the normal name resolution rules, which meant we would fail to detect cases that should be reported as ambiguous.
Previously, we implemented references to the inserted values by using a special table named "__new_ins". I kept that as the default, but use the row alias instead if one is provided. We then create a map from the destination table's column names to the column aliases, and use that map to rewrite expressions that appear inside the VALUES() function.
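Here is a toy sketch of the redirection described above. A `VALUES(col)` reference is rewritten to point at the inserted-row table. That table is `__new_ins` unless a row alias was given, and column aliases are applied through a map keyed by destination column name. The function and its string-based signature are hypothetical; go-mysql-server rewrites expression trees, not strings.

```go
package main

import "fmt"

// defaultInsertAlias is the special table name used for the inserted row
// when the statement provides no row alias.
const defaultInsertAlias = "__new_ins"

// resolveValuesRef rewrites VALUES(col) into a qualified reference on the
// inserted row. rowAlias models "AS tbl_new" and colAliases models
// "AS tbl_new(a_new, b_new)" as destination column -> alias column.
func resolveValuesRef(col, rowAlias string, colAliases map[string]string) string {
	table := defaultInsertAlias
	if rowAlias != "" {
		table = rowAlias
	}
	if alias, ok := colAliases[col]; ok {
		col = alias
	}
	return table + "." + col
}

func main() {
	// ON DUPLICATE KEY UPDATE b = VALUES(b), without and with row aliases.
	fmt.Println(resolveValuesRef("b", "", nil))
	fmt.Println(resolveValuesRef("b", "tbl_new", nil))
	fmt.Println(resolveValuesRef("b", "tbl_new", map[string]string{"a": "a_new", "b": "b_new"}))
}
```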

vitess

353: allow backticks in system and user variables
This PR allows the use of backticks in system and user variables.
We are more lenient than MySQL when it comes to backticks in set statements.
For example, we allow set @abc.def = 10, while MySQL throws an error.
This is because we treat this as a qualified column identifier and automatically strip the backticks.
test bump https://github.com/dolthub/go-mysql-server/pull/2548
fixes https://github.com/dolthub/dolt/issues/8010
352: Add support for the CONSTRAINT keyword when adding a foreign key without a constraint name
Customer issue: https://github.com/dolthub/dolt/issues/8008
351: Add ConnectionAborted() callback to Handler interface
In order to support the Aborted_connects status variable, GMS needs to be notified when a connection attempt is aborted in the Vitess layer. This change adds a ConnectionAborted() callback method to Vitess' Handler interface and calls it whenever a connection attempt errors out before it's fully established.
Coordinated with https://github.com/dolthub/go-mysql-server/pull/2546
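The callback pattern can be sketched in Go as follows. The connection layer notifies its handler when setup fails, and the handler keeps a counter that could back Aborted_connects. `Handler` here is a pared-down, hypothetical stand-in for Vitess' interface, and `acceptConn`, `statusHandler` and `errHandshake` are illustrative names only.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Handler keeps only the new callback from a much larger interface.
type Handler interface {
	ConnectionAborted()
}

// statusHandler counts aborted connection attempts; an atomic counter is
// used because connections are accepted concurrently.
type statusHandler struct {
	abortedConnects atomic.Uint64
}

func (h *statusHandler) ConnectionAborted() { h.abortedConnects.Add(1) }

var errHandshake = errors.New("handshake failed")

// acceptConn runs a connection setup step and reports an abort to the
// handler if setup errors out before the connection is established.
func acceptConn(h Handler, handshake func() error) error {
	if err := handshake(); err != nil {
		h.ConnectionAborted()
		return err
	}
	return nil
}

func main() {
	h := &statusHandler{}
	_ = acceptConn(h, func() error { return nil })
	_ = acceptConn(h, func() error { return errHandshake })
	fmt.Println("Aborted_connects =", h.abortedConnects.Load())
}
```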
350: Refactoring BinlogStream type into BinlogMetadata
The mysql.BinlogStream type from Vitess was a little awkward to use, and seems to have been mostly intended as test code. This gives it a more descriptive name and makes it a little easier to pass around struct copies without concurrency issues from a shared instance.

Closed Issues

8011: Deprecated := assignment syntax in UPDATE queries causes syntax error in Dolt
8009: Parenthesised table references in JOIN clauses cause syntax errors if not followed by nested JOINs
8010: Backtick escaping doesn't work for variables
8008: ADD CONSTRAINT FOREIGN KEY causes syntax error in Dolt
7993: After Dolt CLI restore procedure database is not visible through the SQL client
7638: Syntax Error Occurs When Using AS Clause with ON DUPLICATE KEY UPDATE
6074: Support CALL DOLT_RESTORE() and support a -f force option
7995: Creating temporary tables can cause tag collisions
7997: error: plan is not resolved because of node '*plan.ShowColumns' when executing SHOW FULL COLUMNS or DESCRIBE for specific views
7958: UPDATE ... JOIN fails for tables containing capital letters
7972: Temporary tables don't support AUTO_INCREMENT
7986: UPDATE IGNORE ... JOIN queries fail with "failed to apply rowHandler" error
7973: dolt pull fails in the presence of ignored tables
7961: error reading server preface: http2: frame too large
7957: Dolt returns wrong number of affected rows for UPDATE ... JOIN with clientFoundRows=true
7956: Foreign keys disappear after merge for tables created with FOREIGN_KEY_CHECKS=0
7959: Auto-generated FK names don't match MySQL for renamed tables
7960: Auto-generated index names don't match MySQL for composite keys

dolt - 1.39.4

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7979: Allow pulling from a remote if the only changes are to ignored tables.
This loosens the restrictions for pulling from remotes: instead of requiring that there are no working changes, we allow working changes but only to ignored tables.
If there are conflicting changes to ignored tables (which is rare, since ignored tables shouldn't be pushed to remotes in the first place), this will still abort the pull later when it's computing the new root hash.
This is, IMO, better than what git does. Git will just overwrite the ignored files without warning.
7977: integration-tests/bats: Add some waits on the remotesrv_pid exit for bats tests which spawn a background remotesrv.
Attempts to fix some observed flakiness in bats tests.
7975: implemented dolt_hashof_db function
Implemented dolt_hashof_db() function which returns the root hash of a database.
7967: go/libraries/doltcore/remotesrv: grpc.go: Respect X-Forwarded-Proto when generating HTTP download links in the gRPC server.
Fixes #7961.
7965: Bug fix: Allow unresolved FKs to merge with resolved FKs
Dolt foreign keys can be in a "resolved" or "unresolved" state. A resolved FK has resolved the table and columns it references, and contains unique identifiers for the referenced columns. An unresolved FK only knows the table and column names that it references. Because of these two states, the way Dolt matches FKs differs depending on whether each key is resolved or unresolved.
Dolt has logic (ForeignKeyCollection.GetMatchingKey()) to match a resolved FK with an unresolved FK, but this function didn't support matching an unresolved FK with a resolved FK. That code assumed the ForeignKeyCollection would always come from an ancestor root value. It therefore treated it as invalid for the ancestor's FK to be resolved while the FK in a more recent root value was unresolved. Since then, however, we have started using this logic in our root merging code, which breaks that assumption.
In a multi-session environment, one client can create a table with an unresolved FK, then a second session can load that table, resolve the FK, and commit the changes to disk. If the first session still contains references to the unresolved FK, then when it goes to commit, Dolt's merge logic wasn't able to match the unresolved FK in the session with the resolved FK that was written to disk, and the FK constraints were silently dropped from the new table version.
This PR adds a new parameter to ForeignKeyCollection.GetMatchingKey() to allow the caller to control whether a resolved FK should match with unresolved FKs or not. This means ForeignKeyCollection.GetMatchingKey() doesn't have to assume its receiver instance is a ForeignKeyCollection from an ancestor root value, and instead the caller is responsible for specifying which behavior is needed.
Related to https://github.com/dolthub/dolt/issues/7956
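Here is a simplified Go model of the matching rule. Resolved keys compare by column tags and unresolved keys compare by names. A caller-supplied flag decides whether a resolved key may match an unresolved one in either argument order. `ForeignKey` and `matches` are hypothetical and much smaller than Dolt's types. The flag only plays the role of the new GetMatchingKey() parameter.

```go
package main

import "fmt"

// ForeignKey is a hypothetical model: a resolved key carries the referenced
// columns' unique tags, an unresolved key only knows their names.
type ForeignKey struct {
	Name       string
	RefTable   string
	RefColumns []string
	RefTags    []uint64 // nil while the key is unresolved
}

func (fk ForeignKey) Resolved() bool { return fk.RefTags != nil }

func equal[T comparable](a, b []T) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// matches reports whether a and b describe the same foreign key. When
// matchMixed is set, a resolved key can match an unresolved one by names.
func matches(a, b ForeignKey, matchMixed bool) bool {
	if a.Name != b.Name || a.RefTable != b.RefTable {
		return false
	}
	switch {
	case a.Resolved() && b.Resolved():
		return equal(a.RefTags, b.RefTags)
	case !a.Resolved() && !b.Resolved():
		return equal(a.RefColumns, b.RefColumns)
	default:
		return matchMixed && equal(a.RefColumns, b.RefColumns)
	}
}

func main() {
	resolved := ForeignKey{Name: "fk1", RefTable: "parent", RefColumns: []string{"id"}, RefTags: []uint64{42}}
	unresolved := ForeignKey{Name: "fk1", RefTable: "parent", RefColumns: []string{"id"}}
	fmt.Println(matches(unresolved, resolved, false), matches(unresolved, resolved, true))
}
```

In the multi-session scenario above, the session's unresolved key and the committed resolved key only line up when mixed matching is allowed. Without it, the merge would treat them as different keys.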
7955: [types] cache frequently read value store chunks (like working set roots)
7952: Made drop table work with search path

go-mysql-server

2537: Update generated index names to match MySQL
A customer pointed out that when we add indexes with generated names, we don't generate the same names as MySQL. Specifically:
- When a FK is added with an explicit constraint name, that name should be used to name the automatically created index, if one is created.
- Secondary indexes are named after the first column in the index in MySQL, not by joining all the columns together.
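The two naming rules above can be sketched in Go as follows. On a name collision, the helper appends `_2`, `_3`, and so on. That suffix scheme is our assumption about MySQL's behavior rather than something this PR states. `generatedIndexName` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// generatedIndexName picks a name for an index the user did not name: an
// explicit FK constraint name wins, otherwise the first column is used, and a
// numeric suffix is added on collision. existing holds lowercased names,
// since index names are case-insensitive.
func generatedIndexName(columns []string, fkConstraint string, existing map[string]bool) string {
	base := columns[0]
	if fkConstraint != "" {
		base = fkConstraint
	}
	name := base
	for i := 2; existing[strings.ToLower(name)]; i++ {
		name = fmt.Sprintf("%s_%d", base, i)
	}
	return name
}

func main() {
	fmt.Println(generatedIndexName([]string{"a", "b"}, "", map[string]bool{"a": true}))
	fmt.Println(generatedIndexName([]string{"parent_id"}, "fk_parent", nil))
}
```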
2536: Rename generated FK names when their table is renamed
Updates our rename table logic to match MySQL's behavior of updating auto-generated foreign key names to match the new table name.
Customer issue: https://github.com/dolthub/dolt/issues/7959
Dolt companion PR: https://github.com/dolthub/dolt/pull/7968
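MySQL's auto-generated foreign key names have the form `<table>_ibfk_<n>`. Here is a minimal Go sketch of the rename rule described above, assuming that pattern. The helper is hypothetical and does not check that the suffix is numeric.

```go
package main

import (
	"fmt"
	"strings"
)

// renamedFKName rewrites an auto-generated FK name so it follows a table
// rename; names the user chose are returned unchanged.
func renamedFKName(fkName, oldTable, newTable string) string {
	prefix := oldTable + "_ibfk_"
	if strings.HasPrefix(fkName, prefix) {
		return newTable + "_ibfk_" + strings.TrimPrefix(fkName, prefix)
	}
	return fkName
}

func main() {
	fmt.Println(renamedFKName("orders_ibfk_1", "orders", "purchases"))
	fmt.Println(renamedFKName("fk_customer", "orders", "purchases"))
}
```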
2535: Fix UPDATE JOIN matchedRows
fixes: https://github.com/dolthub/dolt/issues/7957
Main question is how thorough we want to make the child iter check. Should all iterators implement a ChildIter interface?
2534: Implement row alias expressions (INSERT ... VALUES (...) AS new_tbl ON DUPLICATE KEY UPDATE x = new_tbl.x)
When inserting values, the user can specify names for both the source table and columns which are used in ON DUPLICATE expressions. It looks like either of the below options:
```
INSERT INTO tbl VALUES (1, 2) AS tbl_new ON DUPLICATE KEY UPDATE b = tbl_new.b;
INSERT INTO tbl VALUES (1, 2) AS tbl_new(a_new, b_new) ON DUPLICATE KEY UPDATE b = b_new;
```
This replaces the previous (now-deprecated) syntax:
```
INSERT INTO tbl VALUES (1, 2) ON DUPLICATE KEY UPDATE b = VALUES(b);
```
Supporting both syntaxes together was non-trivial because it means there are now two different ways to refer to the same column. While we had an existing way to "redirect" one column name to another, it only worked for unqualified names (no table name), and it overrode the normal name resolution rules, which meant we would fail to detect cases that should be reported as ambiguous.
Previously, we implemented references to the inserted values by using a special table named "__new_ins". I kept that as the default, but use the row alias instead if one is provided. We then create a map from the destination table's column names to the column aliases, and use that map to rewrite expressions that appear inside the VALUES() function.
2533: Table name validation folds strings
fixes: https://github.com/dolthub/dolt/issues/7958
2532: Move json_function_tests.go and json tests that depend on it to their own package.
This ensures that non-test code in sql/expression/function/json doesn't depend on testify, which is a library that we only want to depend on for tests.
2530: Bug Fix: Index name case-insensitivity
MySQL index names are case-insensitive, but GMS' memory implementation wasn't handling them that way. This makes index names case-insensitive.
Related to https://github.com/dolthub/dolt/issues/7945
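The sketch below shows one common way to make names case-insensitive while keeping their declared casing: key a map by the lowercased name. `indexSet` is a hypothetical stand-in, not GMS's memory implementation. The error text is illustrative and is not MySQL's exact message.

```go
package main

import (
	"fmt"
	"strings"
)

// indexSet stores index names case-insensitively while remembering the
// casing each name was declared with.
type indexSet struct {
	byLower map[string]string // lowercased name -> declared name
}

func newIndexSet() *indexSet { return &indexSet{byLower: make(map[string]string)} }

// add rejects a name that differs from an existing one only by case.
func (s *indexSet) add(name string) error {
	key := strings.ToLower(name)
	if existing, ok := s.byLower[key]; ok {
		return fmt.Errorf("duplicate index name %q (already declared as %q)", name, existing)
	}
	s.byLower[key] = name
	return nil
}

// lookup finds an index regardless of the casing used in the query.
func (s *indexSet) lookup(name string) (string, bool) {
	declared, ok := s.byLower[strings.ToLower(name)]
	return declared, ok
}

func main() {
	s := newIndexSet()
	fmt.Println(s.add("MyIdx"), s.add("myidx"))
	fmt.Println(s.lookup("MYIDX"))
}
```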

Closed Issues

7970: Add support for DOLT_HASHOF_DB()
7957: Dolt returns wrong number of affected rows for UPDATE ... JOIN with clientFoundRows=true
7958: UPDATE ... JOIN fails for tables containing capital letters
7945: Dolt panics when renaming index containing capital letters
7944: Dolt panics on subquery in IF statement in procedure

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.11	2.97	1.4
groupby_scan	13.22	17.32	1.3
index_join	1.34	5.47	4.1
index_join_scan	1.27	2.26	1.8
index_scan	34.33	54.83	1.6
oltp_point_select	0.18	0.52	2.9
oltp_read_only	3.55	8.28	2.3
select_random_points	0.34	0.83	2.4
select_random_ranges	0.39	0.97	2.5
table_scan	34.33	54.83	1.6
types_table_scan	77.19	139.85	1.8
reads_mean_multiplier			2.2

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.13	6.21	0.8
oltp_insert	3.82	3.02	0.8
oltp_read_write	8.58	15.0	1.7
oltp_update_index	3.89	3.13	0.8
oltp_update_non_index	3.89	3.07	0.8
oltp_write_only	5.47	6.55	1.2
types_delete_insert	7.7	6.79	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	98.38	26.01	4.0
tpcc_tps_multiplier			3.8

Overall Mean Multiple	2.33

dolt - 1.39.3

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7947: Bug Fix: Index name case-insensitivity
MySQL index names are case-insensitive, but Dolt's index implementation wasn't handling them that way. This makes index names case-insensitive.
Customer issue: https://github.com/dolthub/dolt/issues/7945
New enginetests added in GMS PR: https://github.com/dolthub/go-mysql-server/pull/2530
7941: Bug fix for AllSchemas method for schemas
7940: dolt sql slash cmds
Add the ability to run some dolt commands directly from the dolt sql shell.
Fixes: https://github.com/dolthub/dolt/issues/6874
7933: Update get-mysql-dolt-job-json.sh
TPS comparison is inverted compared to the latency_p95 comparison.
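The inversion comes from the two metrics pointing in opposite directions. Latency is lower-is-better, while TPS is higher-is-better. To keep "above 1 means Dolt is slower", the ratio must be flipped for throughput. This Go function only illustrates that convention. It is not the logic of get-mysql-dolt-job-json.sh, which is a shell script not shown here, and `multiple` is a hypothetical name.

```go
package main

import "fmt"

// multiple returns a Dolt-vs-MySQL ratio in which a value above 1 always
// means Dolt is slower: dolt/mysql for lower-is-better metrics such as
// latency, mysql/dolt for higher-is-better metrics such as TPS.
func multiple(mysql, dolt float64, higherIsBetter bool) float64 {
	if higherIsBetter {
		return mysql / dolt
	}
	return dolt / mysql
}

func main() {
	fmt.Printf("covering_index_scan: %.1f\n", multiple(2.11, 2.97, false))
	fmt.Printf("tpcc-scale-factor-1: %.1f\n", multiple(101.66, 22.89, true))
}
```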
7931: Update get-mysql-dolt-job-json.sh

go-mysql-server

2531: Bug Fix: Finalize subqueries in IfConditionals when applying stored procedures
When applying a stored procedure to a CALL statement, we weren't calling finalizeSubqueries() on any subqueries in IfConditional expressions, which caused the subquery to not have a NodeExecBuilder populated.
Customer issue: https://github.com/dolthub/dolt/issues/7944
2529: Fix global decimal.MarshalJSONWithoutQuotes overwrite
The decimal.MarshalJSONWithoutQuotes is a global variable.
Setting this value can cause problems for any other code that does not expect it to change.
Using a custom encoder instead gives the expected marshalling behaviour without changing the global value, which ensures this will not cause compatibility issues with other projects.
This code is covered both by existing tests, and an additional one in this PR.
(if the custom encode switch case is not added, but the global variables are, then the tests fail).
2528: Bug fix for unwrapping a privileged db
2524: Adding @@max_binlog_size system variable
https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_max_binlog_size
2523: Added additional analyzer hooks for integrators
2522: More INSERT short-circuits
Only run an "on update" code block when expressions are non-nil. Directly compare sql mode default string, rather than lowercasing every time.
2519: IndexedTableAccess gets indexing fast path

vitess

350: Refactoring BinlogStream type into BinlogMetadata
The mysql.BinlogStream type from Vitess was a little awkward to use, and seems to have been mostly intended as test code. This gives it a more descriptive name and makes it a little easier to pass around struct copies without concurrency issues from a shared instance.
349: Fixed timestamp bindvar formatting to match MySQL string expectation
348: Allowing caching plugin to be specified in string quotes
The CREATE USER ... IDENTIFIED WITH syntax (MySQL ref) allows the caching plugin to be specified in string quotes, but our parser only supported identifier quotes.
This came up as part of binlog replication testing – MySQL was sending a CREATE USER statement from the primary to a Dolt replica, but Dolt wasn't able to parse the statement because of the use of string quotes around the caching plugin name.

Closed Issues

6874: Embed cli command in dolt sql
2289: First Unique Key in a keyless table should be represented as a primary key

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.97	1.4
groupby_scan	13.22	17.01	1.3
index_join	1.34	5.28	3.9
index_join_scan	1.27	2.22	1.7
index_scan	34.95	52.89	1.5
oltp_point_select	0.18	0.5	2.8
oltp_read_only	3.49	8.13	2.3
select_random_points	0.34	0.81	2.4
select_random_ranges	0.39	0.95	2.4
table_scan	34.95	54.83	1.6
types_table_scan	75.82	137.35	1.8
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.21	0.8
oltp_insert	3.82	3.07	0.8
oltp_read_write	8.58	14.73	1.7
oltp_update_index	3.89	3.19	0.8
oltp_update_non_index	3.89	3.13	0.8
oltp_write_only	5.37	6.55	1.2
types_delete_insert	7.7	6.79	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	99.71	25.77	4.2
tpcc_tps_multiplier			3.9

Overall Mean Multiple	2.33

dolt - 1.39.2

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7930: Bump mysql2 from 3.9.7 to 3.9.8 in /integration-tests/mysql-client-tests/node
Bumps mysql2 from 3.9.7 to 3.9.8.
7929: dolt fetch default spec from empty repo should return silently
Git fetch returns without error when you fetch the default refspec from an empty repo, but returns an error when you fetch a specific ref. Dolt now matches this behavior.
Fixes: https://github.com/dolthub/dolt/issues/7928
7925: apply filter-branch changes to working/staged changes
This PR adds support for a --apply-to-uncommitted option to dolt filter-branch, which applies the filter-branch changes to the working and staged roots.
fixes https://github.com/dolthub/dolt/issues/7902
7923: [dsess] Cache checks lookup for TPC-C update
7922: [writer] skip more deserialization steps in getTableWriter
7900: prevent dolt filter branch when it would overwrite unchecked branch's working set
It turns out other branches can have working sets, and dolt filter-branch would drop them. This PR prevents that from happening.
It also adds tests for review comments I missed here:
https://github.com/dolthub/dolt/pull/7895
7898: Added workflow for checking DoltgreSQL
This adds a new workflow that runs a subset of the tests in DoltgreSQL to check for any major integration errors. The workflow does not fail if errors are encountered. Instead, it creates a comment stating that failures were found. If no failures were found, then no comment is made.
7892: dolt admin archive
This hidden admin command will convert the table files in oldgen into archive files, then update the manifest so that you can run queries against the archive for performance testing. Currently we assume that dolt gc has been run immediately prior to using this command.
After the build is complete, we lookup every chunk in the archive using the index of the originating table file. We then verify each chunk's key checks out. If this verification fails, exit status 1.
There are still a lot of rough edges:
- There is currently no feedback as the build progresses. This is annoying because the build can take a fair amount of time.
- The ChunkSource interface is single-threaded, so getMany and hasMany are not going to perform well.
- There are no checks to ensure that the server isn't running and that we hold the LOCK on oldgen.
- There are no bats tests, and this is kind of a temporary thing. There are go tests on key bits.
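Verifying "each chunk's key checks out" means re-hashing the bytes read back for each address and comparing the result to that address. The Go sketch below assumes a 20-byte address taken from the start of a SHA-512 digest. Treat that hash choice, the `addr` type and `verifyChunks` as illustrative assumptions rather than the archive code.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"os"
)

// addr is an assumed 20-byte content address.
type addr [20]byte

// addressOf derives an address from chunk bytes (assumed: first 20 bytes of SHA-512).
func addressOf(data []byte) addr {
	sum := sha512.Sum512(data)
	var a addr
	copy(a[:], sum[:20])
	return a
}

// verifyChunks looks up every expected address through lookup and checks
// that the returned bytes still hash to that address.
func verifyChunks(keys []addr, lookup func(addr) ([]byte, bool)) error {
	for _, k := range keys {
		data, ok := lookup(k)
		if !ok {
			return fmt.Errorf("chunk %x missing from archive", k)
		}
		if addressOf(data) != k {
			return fmt.Errorf("chunk %x failed verification", k)
		}
	}
	return nil
}

func main() {
	store := map[addr][]byte{}
	data := []byte("row data")
	k := addressOf(data)
	store[k] = data
	lookup := func(a addr) ([]byte, bool) { d, ok := store[a]; return d, ok }
	if err := verifyChunks([]addr{k}, lookup); err != nil {
		fmt.Println(err)
		os.Exit(1) // exit status 1 on a failed verification, as described above
	}
	fmt.Println("all chunks verified")
}
```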
7863: Use the search path to resolve table names in Doltgres
Doltgres enables the UseSearchPath global at startup, which triggers this behavior.
This is a shim to get a proof of concept of this behavior working faster. A better solution, coming next, involves making this behavior pluggable and putting this logic in the Doltgres package, not in Dolt.
Companion PRs:
https://github.com/dolthub/go-mysql-server/pull/2498
https://github.com/dolthub/doltgresql/pull/269
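Search-path resolution follows PostgreSQL's `search_path` semantics: an unqualified table name resolves to the first schema on the path that contains it. Below is a minimal Go sketch of that lookup order. The catalog layout (schema to set of table names) and `resolveTable` are hypothetical simplifications.

```go
package main

import "fmt"

// resolveTable returns the first schema on searchPath that contains table.
func resolveTable(table string, searchPath []string, catalog map[string]map[string]bool) (string, bool) {
	for _, schema := range searchPath {
		if catalog[schema][table] {
			return schema, true
		}
	}
	return "", false
}

func main() {
	catalog := map[string]map[string]bool{
		"public": {"users": true},
		"app":    {"users": true, "orders": true},
	}
	fmt.Println(resolveTable("users", []string{"app", "public"}, catalog))
	fmt.Println(resolveTable("orders", []string{"public"}, catalog))
}
```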

go-mysql-server

2520: Default sql mode for common path
Bit strange & verbose, but has a noticeable effect for small queries.
perf here: https://github.com/dolthub/dolt/pull/7915
2519: IndexedTableAccess gets indexing fast path
2518: Short circuit for update/delete
Simple updates and deletes skip most of analysis.
perf here: https://github.com/dolthub/dolt/pull/7907

2517: Improve correctness and error messages for JSON functions.
MySQL doesn't do this and neither should we.
MySQL:

```
mysql> select JSON_INSERT("null", "$.a", 1);
+-------------------------------+
| JSON_INSERT("null", "$.a", 1) |
+-------------------------------+
| null                          |
+-------------------------------+
1 row in set (0.00 sec)
mysql> select JSON_INSERT("null", "$.a", 1) is null;
+---------------------------------------+
| JSON_INSERT("null", "$.a", 1) is null |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
```

The only time we should be coercing a JSON-null document into SQL-null is for JSON_EXTRACT (for paths other than "$") and JSON_VALUE (for all paths). But these are already handled separately.
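Here is a minimal sketch of the distinction, assuming a simplified JSON_INSERT that only handles a single top-level `$.key` path. SQL NULL is modelled as a nil `*string`, while the JSON literal `null` is an ordinary document that passes through unchanged, matching the MySQL output above. `jsonInsertTopLevel` is hypothetical, and its output uses Go's compact encoding rather than MySQL's spaced format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonInsertTopLevel models JSON_INSERT(doc, '$.key', val). A nil result
// means SQL NULL; a non-nil result is a JSON document, possibly JSON null.
func jsonInsertTopLevel(doc *string, key string, val any) (*string, error) {
	if doc == nil {
		return nil, nil // SQL NULL in, SQL NULL out
	}
	var v any
	if err := json.Unmarshal([]byte(*doc), &v); err != nil {
		return nil, err
	}
	obj, ok := v.(map[string]any)
	if !ok {
		// "$.key" cannot exist in a non-object, so the document is returned
		// unchanged: a JSON null document stays JSON null, not SQL NULL.
		return doc, nil
	}
	if _, exists := obj[key]; !exists {
		obj[key] = val // JSON_INSERT never overwrites existing values
	}
	out, err := json.Marshal(obj)
	if err != nil {
		return nil, err
	}
	s := string(out)
	return &s, nil
}

func main() {
	for _, doc := range []string{"null", `{"b": 2}`} {
		d := doc
		res, err := jsonInsertTopLevel(&d, "a", 1)
		if err != nil {
			panic(err)
		}
		fmt.Println(doc, "->", *res)
	}
}
```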

2515: Zachmu/schemas2 merge
2513: Added workflows for checking integrators
This adds a new workflow that runs a subset of tests in Dolt and DoltgreSQL to check for any major integration errors. The workflows do not fail if errors are encountered. Instead, they'll create a comment stating which projects had failures. If no failures were found, then no comment is made.
2498: New interfaces for resolving table names for databases with schemas
This is a proof of concept to get schema resolution working quickly, and I'm not super happy with the separation of concerns. A better solution would implement table name resolution in the Catalog directly, rather than in the integrator. That effort is significantly hindered by the Catalog being a concrete analyzer implementation with many analyzer-specific details that can't be easily substituted for another implementation. The longer term plan is to perform the extensive refactoring necessary to make the relevant parts of the Catalog swappable, rather than (effectively) having to swap only DatabaseProvider and friends.

Closed Issues

7902: filter-branch option to apply query to WORKING and STAGED roots
7928: CLI dolt fetch <remote> failed to use the default ref spec
7897: Pomelo Entity Framework connector is not able to commit changes
7909: [Question] How to init Dolt database programatically?

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.11	2.97	1.4
groupby_scan	13.22	17.32	1.3
index_join	1.34	5.18	3.9
index_join_scan	1.27	2.18	1.7
index_scan	33.72	52.89	1.6
oltp_point_select	0.17	0.5	2.9
oltp_read_only	3.36	8.13	2.4
select_random_points	0.32	0.8	2.5
select_random_ranges	0.38	0.95	2.5
table_scan	34.33	54.83	1.6
types_table_scan	73.13	137.35	1.9
reads_mean_multiplier			2.2

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.21	0.8
oltp_insert	3.75	3.07	0.8
oltp_read_write	8.43	15.0	1.8
oltp_update_index	3.82	3.19	0.8
oltp_update_non_index	3.82	3.13	0.8
oltp_write_only	5.37	6.55	1.2
types_delete_insert	7.7	6.91	0.9
writes_mean_multiplier			1.0

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	101.2	25.57	4.1
tpcc_tps_multiplier			0.3

Overall Mean Multiple	1.17

dolt - 1.39.1

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7901: Zachmu/schemas2 merge
7899: Properly add database collation change when using -a option in dolt commit
This PR fixes a case where we don't properly handle database collation changes with the -a option in dolt commit.
fixes https://github.com/dolthub/dolt/issues/7897
7888: [sort] index build streams sorted edits
Use sorting to skip more steps building a prolly map. Shaves maybe 20-25% off of external index rebuilds.
This also fixes a bug where we were incorrectly using only the prefix descriptor to sort secondary index keys.

go-mysql-server

2512: Spooling shortcut for one/zero return schemas
Nodes that return zero or one row don't need a beefy channel/wait group setup to execute. They just need to grab the first row and close the iterator. There are several nodes that incorrectly reported their schemas previously, which I've updated to be more accurate. There are some nodes that optionally return rows, which I've simplified to return an empty schema that can be differentiated from the nil schema. We could make the distinction more explicit, also.
bump with perf here: https://github.com/dolthub/dolt/pull/7894
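The shortcut boils down to this: when a node yields at most one row, read it synchronously and close the iterator. This avoids spooling rows through a goroutine, channel and WaitGroup. In the Go sketch below, `RowIter`, `sliceIter` and `firstRow` are hypothetical, pared-down stand-ins for GMS's iterator types.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// RowIter is a minimal row iterator; Next returns io.EOF when exhausted.
type RowIter interface {
	Next() ([]any, error)
	Close() error
}

// sliceIter is a toy iterator over in-memory rows.
type sliceIter struct {
	rows   [][]any
	closed bool
}

func (s *sliceIter) Next() ([]any, error) {
	if len(s.rows) == 0 {
		return nil, io.EOF
	}
	row := s.rows[0]
	s.rows = s.rows[1:]
	return row, nil
}

func (s *sliceIter) Close() error { s.closed = true; return nil }

// firstRow grabs the first row (if any) and closes the iterator.
func firstRow(iter RowIter) ([]any, error) {
	row, err := iter.Next()
	closeErr := iter.Close()
	if errors.Is(err, io.EOF) {
		return nil, closeErr
	}
	if err != nil {
		return nil, err
	}
	return row, closeErr
}

func main() {
	row, err := firstRow(&sliceIter{rows: [][]any{{int64(1), "ok"}}})
	fmt.Println(row, err)
}
```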

vitess

348: Allowing caching plugin to be specified in string quotes
The CREATE USER ... IDENTIFIED WITH syntax (MySQL ref) allows the caching plugin to be specified in string quotes, but our parser only supported identifier quotes.
This came up as part of binlog replication testing – MySQL was sending a CREATE USER statement from the primary to a Dolt replica, but Dolt wasn't able to parse the statement because of the use of string quotes around the caching plugin name.
347: Added InjectedStatement
This is the same as InjectedExpr, except for statements instead of expressions.

Closed Issues

7891: filter-branch destroys working and staged roots
7897: Pomelo Entity Framework connector is not able to commit changes
7890: Pomelo Entity Framework connector is not able to recreate database.

dolt - 1.39.0

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7895: prevent filter-branch when there are local changes
This PR changes filter-branch to detect any local changes so working/staged changes aren't lost.
A future PR should include changes to have the working set just be applied over the result of dolt filter-branch.
partially addresses: https://github.com/dolthub/dolt/issues/7891

go-mysql-server

2512: Spooling shortcut for one/zero return schemas
Nodes that return zero or one row don't need a beefy channel/wait group setup to execute. They just need to grab the first row and close the iterator. There are several nodes that incorrectly reported their schemas previously, which I've updated to be more accurate. There are some nodes that optionally return rows, which I've simplified to return an empty schema that can be differentiated from the nil schema. We could make the distinction more explicit, also.
bump with perf here: https://github.com/dolthub/dolt/pull/7894
2511: Adding mapping to error code 1049 for ErrDatabaseNotFound errors
When a database doesn't exist, MySQL returns error code 1049. This change adds a mapping to error code 1049 for ErrDatabaseNotFound errors, and updates our handler so that ComInitDB messages will map errors to MySQL error codes.
This is needed because tooling (e.g. Pomelo EntityFramework MySQL library) can rely on this error code in application logic.
Related to https://github.com/dolthub/dolt/issues/7890

Closed Issues

7890: Pomelo Entity Framework connector is not able to recreate database.

dolt - 1.38.3

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7884: [tree] return blob builders to pool after use
I added a builder pool but never returned the objects to it; this adds the missing Put().
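Below is the `sync.Pool` pattern the fix completes: Get, use, then reset and Put back. Without the Put, every Get falls through to `New`, and the pool never saves an allocation. The sketch uses `bytes.Buffer` as a hypothetical stand-in for Dolt's blob builders, and `buildBlob` is an illustrative name.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// builderPool hands out reusable buffers.
var builderPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func buildBlob(parts ...string) string {
	buf := builderPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()          // clear state so the next borrower starts empty
		builderPool.Put(buf) // return the object; this is the step that was missing
	}()
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String() // copied before the deferred Reset runs
}

func main() {
	fmt.Println(buildBlob("chunk-", "1"))
	fmt.Println(buildBlob("chunk-", "2"))
}
```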
7882: Bug fix: no-op dolt_pull() was leaving working set dirty
Customer-reported bug. Two dolt_pull() operations on two branches in the same session when local branches are already up to date, with @@autocommit off, leave the session unable to commit because two branch heads are considered dirty. See new bats test for details on reproducing.
The issue is that DoltSession.SetWorkingSet() marks that branch head dirty until the transaction is committed. Most merge code paths used by pull involve performing a dolt_commit(), which has the side effect of zeroing out the current transaction, meaning the next statement would get a new transaction and fresh working sets loaded from disk, avoiding the dirty state problem. Only the code path where the branch head is already up to date is affected by this bug. All the merge library code that actually needs to call DoltSession.SetWorkingSet() (only necessary before a dolt_commit happens, or in the case of a squash where changes should remain in the working set) already does so, making the additional call in dolt_pull.go redundant and leading to this buggy behavior in the no-change case.
There are probably still related bugs for session state management during pull and merge operations, but I want to keep this fix narrow to address the customer issue while I build up more robust (non-bats) tests for pull.
7878: Move sql patch statement generation APIs to the sqlfmt package
We have a few different APIs scattered around for generating SQL patch statements. I needed to make some functions from dolt_patch_table_function.go public to generate DDL statements for binlog support, so I moved them into the sqlfmt package and cleaned up some package import cycles along the way.
7872: Various test utils and small fixes
As part of the work for binlog source support (on fulghum/binlog_prototype branch), these are various smaller changes to tidy up docs, packaging, small bug fixes, and add new test utils that I've pulled out into this PR to review separately.
Notable changes:
- Adds the third version component to the Go version in our go.mod. Two-component versions indicate a development version rather than a release version, and cause an error about not being able to download a toolchain.
- Allows Dolt binlog replicas to accept the SOURCE_AUTO_POSITION config parameter, and errors if a user attempts to disable GTID auto positioning.
- Adds several new test util functions specific to binlog testing.
7870: go/utils/publishrelease: Bump MUSL toolchains used for cutting releases.
The new toolchain uses MUSL + mimalloc.
Includes the mimalloc license in our released LICENSES notice.
7859: Cache table and schema indexes on schema address
The bulk of ~1ms read and write TPC-C queries benefit from caching table and index schemas, which have a lifecycle between schema migrations/alter statements/new table additions. This is in contrast to how we've typically cached objects using the root value hash, which is great for read-only workflows, but has a much shorter half-life.
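Here is a minimal Go sketch of caching deserialized schemas by schema address instead of by root value hash. Data-only writes change the root hash but not the schema address, so cached entries stay valid until DDL actually changes the schema. `addr`, `Schema` and `schemaCache` are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// addr is a hypothetical content address of a serialized schema.
type addr [20]byte

// Schema is a toy stand-in for a deserialized table schema.
type Schema struct{ Columns []string }

// schemaCache memoizes deserialized schemas keyed by schema address.
type schemaCache struct {
	mu     sync.Mutex
	byAddr map[addr]*Schema
}

func newSchemaCache() *schemaCache { return &schemaCache{byAddr: make(map[addr]*Schema)} }

// get returns the cached schema for a, calling load only on a miss. Two
// concurrent misses may both call load; the later store overwrites an
// equivalent value, which is harmless for immutable content-addressed data.
func (c *schemaCache) get(a addr, load func() (*Schema, error)) (*Schema, error) {
	c.mu.Lock()
	s, ok := c.byAddr[a]
	c.mu.Unlock()
	if ok {
		return s, nil
	}
	s, err := load()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.byAddr[a] = s
	c.mu.Unlock()
	return s, nil
}

func main() {
	c := newSchemaCache()
	s, err := c.get(addr{1}, func() (*Schema, error) { return &Schema{Columns: []string{"id", "name"}}, nil })
	fmt.Println(s.Columns, err)
}
```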

go-mysql-server

2511: Adding mapping to error code 1049 for ErrDatabaseNotFound errors
When a database doesn't exist, MySQL returns error code 1049. This change adds a mapping to error code 1049 for ErrDatabaseNotFound errors, and updates our handler so that ComInitDB messages will map errors to MySQL error codes.
This is needed because tooling (e.g. Pomelo EntityFramework MySQL library) can rely on this error code in application logic.
Related to https://github.com/dolthub/dolt/issues/7890
2510: Fix race errors with memory tables
We use this library for running our tests. These run with the -race flag, and we were seeing some errors related to concurrent updates of the tables map.
I've added a sync.Mutex to all the places where this map is updated, and our tests now pass.
2504: Added InjectedStatement as an AST node
This is the same as InjectedExpr, except for statements instead of expressions.
2502: Use Uint32 for SEQ_IN_INDEX in 'SHOW INDEXES' queries.
This is seemingly the correct type for this field.
MySQL Connector/NET expects this for servers >8.0.1: https://github.com/mysql/mysql-connector-net/blob/8.4.0/MySQL.Data/src/SchemaProvider.cs#L298-L300
Fixes dolthub/go-mysql-server#2501

vitess

347: Added InjectedStatement
This is the same as InjectedExpr, except for statements instead of expressions.
346: support DATE, TIME, and TIMESTAMP literal parsing
The SQL standard has special syntax for date, time, and timestamp literals.
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html
This PR adds support for that.
Code was mostly taken from vitessio.
The types are still left as string types, as type conversion later on handles it just fine.
345: parse type aliases in cast
Adds support for statements like:
- select cast(<str> as character)
- select cast(<str> as double precision)
- select cast(<str> as real)

Closed Issues

2501: Problems with MySQL Connector/NET (Mysql.Data) and go-mysql-server
2503: "ON UPDATE CURRENT_TIMESTAMP" not come into effect

dolt - 1.38.2

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7880: Migrate dolt remote to SQL
Fixes: https://github.com/dolthub/dolt/issues/7622
7879: avoid NewEmptyIndex when table does not exist
It would be better for NewEmptyIndex not to write to the chunkstore, but I'm not sure that's possible right now.
The workaround is to avoid calling it altogether in this particular case.

Closed Issues

7622: Migrate dolt remote to SQL

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	3.02	1.5
groupby_scan	13.22	17.32	1.3
index_join	1.34	5.18	3.9
index_join_scan	1.27	2.18	1.7
index_scan	35.59	53.85	1.5
oltp_point_select	0.17	0.51	3.0
oltp_read_only	3.36	8.28	2.5
select_random_points	0.33	0.8	2.4
select_random_ranges	0.39	0.95	2.4
table_scan	35.59	55.82	1.6
types_table_scan	75.82	137.35	1.8
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.67	0.8
oltp_insert	3.75	3.25	0.9
oltp_read_write	8.43	15.83	1.9
oltp_update_index	3.82	3.49	0.9
oltp_update_non_index	3.82	3.43	0.9
oltp_write_only	5.37	7.56	1.4
types_delete_insert	7.7	7.56	1.0
writes_mean_multiplier			1.1

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	101.66	22.89	4.4
tpcc_tps_multiplier			4.4

Overall Mean Multiple	2.53
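The summary rows appear to be derived as follows (inferred by checking the published numbers, not taken from the benchmark harness): each read or write Multiple is Dolt latency divided by MySQL latency, the TPC-C Multiple is MySQL TPS divided by Dolt TPS, each mean multiplier averages its rows' multiples, and the Overall Mean Multiple averages the three category multipliers. This Go sketch reproduces the 1.38.2 figures under those assumptions:

```go
package main

import (
	"fmt"
	"math"
)

// round1 rounds to one decimal place, matching the precision of the tables.
func round1(x float64) float64 { return math.Round(x*10) / 10 }

// meanOfRounded averages per-test multiples after rounding each to one decimal.
func meanOfRounded(ratios []float64) float64 {
	sum := 0.0
	for _, r := range ratios {
		sum += round1(r)
	}
	return sum / float64(len(ratios))
}

func main() {
	// {MySQL, Dolt} latencies from the 1.38.2 read table.
	reads := [][2]float64{
		{2.07, 3.02}, {13.22, 17.32}, {1.34, 5.18}, {1.27, 2.18},
		{35.59, 53.85}, {0.17, 0.51}, {3.36, 8.28}, {0.33, 0.8},
		{0.39, 0.95}, {35.59, 55.82}, {75.82, 137.35},
	}
	// {MySQL, Dolt} latencies from the 1.38.2 write table.
	writes := [][2]float64{
		{7.98, 6.67}, {3.75, 3.25}, {8.43, 15.83}, {3.82, 3.49},
		{3.82, 3.43}, {5.37, 7.56}, {7.7, 7.56},
	}
	ratios := func(rows [][2]float64) []float64 {
		out := make([]float64, len(rows))
		for i, r := range rows {
			out[i] = r[1] / r[0] // Dolt latency / MySQL latency
		}
		return out
	}
	readMean := round1(meanOfRounded(ratios(reads)))
	writeMean := round1(meanOfRounded(ratios(writes)))
	tpcc := round1(101.66 / 22.89) // MySQL TPS / Dolt TPS (higher TPS is better)
	overall := (readMean + writeMean + tpcc) / 3
	fmt.Printf("%.1f %.1f %.1f %.2f\n", readMean, writeMean, tpcc, overall)
}
```

Running it prints `2.1 1.1 4.4 2.53`, matching the table. Averaging the unrounded ratios instead gives the same one-decimal results for this release.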

dolt - 1.38.1

Published by github-actions[bot] 5 months ago

Merged PRs

dolt

7877: keep sql.Schema for conflicts table schema
Previously, we would create a new schema of NomStringKind for every column. Now, we just reuse the underlying sql.Schema.
https://github.com/dolthub/dolt/issues/7874
7876: Improve error messages for CLI commands when a sql-server is running
Related to https://github.com/dolthub/dolt/issues/7873
Resolves https://github.com/dolthub/dolt/issues/7875
7866: update release process for new config refactor
7864: move config
7862: Bug fix: sql-server should initialize persisted global vars
The local config store (.dolt/config.json) can hold persisted global variable values, but when a sql-server was started with --data-dir, the local configuration wasn't loaded properly.
7860: Bug fix: load local config when using --data-dir
When using the --data-dir flag to work on a Dolt directory outside of the current working directory, the local configuration in the Dolt directory wasn't getting correctly loaded. This change evaluates the --data-dir parameter earlier, so that the first time we load the Dolt environment, we can pass the data directory and get the local configuration loaded correctly.
7858: [nbs] safer peek root hash record
7848: Added additional function to RootValue
This just adds a function to the RootValue for special merge logic, which is used by Doltgres.
7846: [dsess] session trigger cache

go-mysql-server

2499: fix LIKE NULL edge case
This PR fixes an edge case where SELECT <str> LIKE NULL returned false instead of NULL.
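A minimal Go sketch of the intended three-valued semantics, using `*string` and `*bool` with `nil` standing in for SQL NULL: a NULL on either side short-circuits to NULL before any matching happens. The `likeToRegexp` helper is purely illustrative; it is case-sensitive and ignores collations, unlike MySQL.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeToRegexp translates a LIKE pattern (% = any run, _ = any one character,
// backslash = escape) into an anchored regular expression.
func likeToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString(`(?s)^`) // (?s) lets . match newlines too
	escaped := false
	for _, r := range pattern {
		switch {
		case escaped:
			b.WriteString(regexp.QuoteMeta(string(r)))
			escaped = false
		case r == '\\':
			escaped = true
		case r == '%':
			b.WriteString(`.*`)
		case r == '_':
			b.WriteString(`.`)
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	if escaped { // a trailing backslash matches itself
		b.WriteString(`\\`)
	}
	b.WriteString(`$`)
	return regexp.MustCompile(b.String())
}

// like evaluates s LIKE pattern; nil means SQL NULL.
func like(s, pattern *string) *bool {
	if s == nil || pattern == nil {
		return nil // NULL LIKE x and x LIKE NULL are both NULL, not false
	}
	matched := likeToRegexp(*pattern).MatchString(*s)
	return &matched
}

func strPtr(s string) *string { return &s }

func main() {
	fmt.Println(like(strPtr("abc"), nil) == nil)
	fmt.Println(*like(strPtr("abc"), strPtr("a%")))
}
```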
2497: trim whitespace when converting strings to numbers
fixes https://github.com/dolthub/dolt/issues/7854
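A minimal sketch of the idea, assuming a hypothetical `stringToFloat` helper rather than the actual GMS conversion code: surrounding whitespace is trimmed before parsing, so `' 42 '` converts to 42 instead of failing to parse. MySQL's full string-to-number rules (such as ignoring trailing non-numeric text with a warning) are not modeled here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stringToFloat trims leading and trailing whitespace before parsing, so
// values such as " 42 " or "\t3.5\n" convert cleanly.
func stringToFloat(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSpace(s), 64)
}

func main() {
	v, err := stringToFloat("  42 ")
	fmt.Println(v, err)
}
```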
2495: fix panic in VALUES constructor
When the rows in an ... VALUES ROW(...), ROW(...) statement did not all contain the same number of values, we would panic.
This PR also unskips some tests that are now fixed.
Companion PR: https://github.com/dolthub/dolt/issues/6849
fixes: https://github.com/dolthub/dolt/issues/6849
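One way to avoid the panic is to validate row widths up front and return an error instead of indexing past the end of a shorter row. A hypothetical Go sketch: the function name is illustrative, and the message is modeled on MySQL's "Column count doesn't match value count at row N" error rather than copied from GMS.

```go
package main

import "fmt"

// checkRowWidths returns an error if the rows of a VALUES constructor do not
// all have the same number of values as the first row.
func checkRowWidths(rows [][]any) error {
	if len(rows) == 0 {
		return nil
	}
	want := len(rows[0])
	for i, row := range rows {
		if len(row) != want {
			// Row numbers are 1-based, as in MySQL's message.
			return fmt.Errorf("Column count doesn't match value count at row %d", i+1)
		}
	}
	return nil
}

func main() {
	// Equivalent to: VALUES ROW(1, 2), ROW(3)
	fmt.Println(checkRowWidths([][]any{{1, 2}, {3}}))
}
```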
2494: Replace count star also matches single column pk
2493: Implement status variables for Slow_queries, Max_used_connections, Com_select, and Connections
Adds support for four new status variables:
- Slow_queries
- Max_used_connections
- Com_select
- Connections
Note that Connections currently only reports successful connection attempts, whereas MySQL includes all connection attempts in that status variable. To capture failed attempts, we'll need to expose that information from the Vitess layer.
Also removes a mutex that covered all status variables. Now that each status variable stores its value in an atomic instance, we don't need to synchronize at a larger scope.
Related to https://github.com/dolthub/dolt/issues/7646
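A minimal sketch of per-variable atomics, assuming Go 1.19+ `sync/atomic` types and a hypothetical `statusVars` struct rather than GMS's actual implementation. Counters use `Add`, and `Max_used_connections` uses a compare-and-swap loop so concurrent connections can never lower the recorded maximum, all without a mutex spanning every variable. The `threadsConnected` field is an assumption added here to track the current connection count.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// statusVars holds each status variable in its own atomic value.
type statusVars struct {
	connections        atomic.Uint64 // successful connection attempts
	threadsConnected   atomic.Int64  // currently open connections
	maxUsedConnections atomic.Int64
	comSelect          atomic.Uint64
}

// onConnect records a new connection and raises the high-water mark if needed.
func (s *statusVars) onConnect() {
	s.connections.Add(1)
	cur := s.threadsConnected.Add(1)
	for {
		prev := s.maxUsedConnections.Load()
		if cur <= prev || s.maxUsedConnections.CompareAndSwap(prev, cur) {
			return
		}
	}
}

func (s *statusVars) onDisconnect() { s.threadsConnected.Add(-1) }

func main() {
	var s statusVars
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.onConnect()
			s.comSelect.Add(1)
			s.onDisconnect()
		}()
	}
	wg.Wait()
	fmt.Println(s.connections.Load(), s.comSelect.Load(), s.maxUsedConnections.Load())
}
```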
2492: skip source values analyze when it only contains simple types

vitess

345: parse type aliases in cast
Adds support for statements like:
- select cast(<str> as character)
- select cast(<str> as double precision)
- select cast(<str> as real)
344: make row optional in VALUES constructor and insert statement
This PR adds additional syntax support for the VALUES constructor.
fixes https://github.com/dolthub/dolt/issues/6849
fixes https://github.com/dolthub/dolt/issues/7853
338: Add a schema qualifier to table names

Closed Issues

7873: Running sql-server from an empty state make inconsistent repository
7874: Failed to write conflicts table
7875: Confusing error messages when using Dolt CLI from within a running Dolt sql-server directory
7854: trim whitespace when casting from string
7853: make ROW keyword optional in VALUES statement
6849: support INSERT INTO ... (VALUES ROW(...)) statement
7845: Make sure deleting a branch behaves similarly to drop database

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	3.02	1.5
groupby_scan	13.22	17.63	1.3
index_join	1.34	5.18	3.9
index_join_scan	1.27	2.22	1.7
index_scan	34.33	54.83	1.6
oltp_point_select	0.17	0.51	3.0
oltp_read_only	3.36	8.43	2.5
select_random_points	0.32	0.8	2.5
select_random_ranges	0.39	0.95	2.4
table_scan	34.33	55.82	1.6
types_table_scan	73.13	137.35	1.9
reads_mean_multiplier			2.2

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.67	0.8
oltp_insert	3.75	3.25	0.9
oltp_read_write	8.28	15.83	1.9
oltp_update_index	3.82	3.49	0.9
oltp_update_non_index	3.82	3.43	0.9
oltp_write_only	5.28	7.56	1.4
types_delete_insert	7.56	7.56	1.0
writes_mean_multiplier			1.1

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	101.35	22.45	4.9
tpcc_tps_multiplier			4.9

Overall Mean Multiple	2.73

dolt - 1.38.0

Published by github-actions[bot] 5 months ago

Merged PRs

This minor release adds a new entry to the dolt_status and dolt_diff system tables for database collation changes, which makes these tables backwards incompatible for some SELECT statements. Changes to a Dolt database's collation show up as table changes with the name __DATABASE__<db>. Additionally, table names starting with this prefix are not allowed.
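A hypothetical Go sketch of the naming rule described above. The helper names are illustrative, and only the `__DATABASE__<db>` form comes from these notes, not from Dolt's source: a collation change is reported under a synthetic table name, and user tables may not claim that prefix. The check here is case-sensitive; Dolt's actual rule may differ.

```go
package main

import (
	"fmt"
	"strings"
)

const databaseTablePrefix = "__DATABASE__"

// collationChangeName returns the synthetic table name under which a database
// collation change would appear in dolt_status / dolt_diff.
func collationChangeName(db string) string { return databaseTablePrefix + db }

// validateTableName rejects user table names that use the reserved prefix.
func validateTableName(name string) error {
	if strings.HasPrefix(name, databaseTablePrefix) {
		return fmt.Errorf("invalid table name %q: prefix %s is reserved", name, databaseTablePrefix)
	}
	return nil
}

func main() {
	fmt.Println(collationChangeName("mydb"))
	fmt.Println(validateTableName("__DATABASE__mydb"))
}
```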

dolt

7823: handle database charset/collation changes
This PR makes dolt aware of database collation changes.
We treat database collation changes similarly to a collation change to a table.
To properly show a dolt diff we need to add support for show create database as of ..., which would require changes to vitess and gms. For now, we just show the new create statement.
Additionally, we should add support for resolving database collation merge conflicts.
Affected commands are:
- dolt add
- dolt commit
- dolt status
- dolt diff
- dolt merge
Addresses: https://github.com/dolthub/dolt/issues/7815
7819: use parser interface in engine
7803: Avoid escaping HTML characters when displaying them to the user.
This fixes an issue where if a JSON document in the storage layer contains escaped characters, those escape sequences could end up being displayed to the user via the dolt sql -r json command.
7764: Bump golang.org/x/net from 0.17.0 to 0.23.0 in /go
Bumps golang.org/x/net from 0.17.0 to 0.23.0.

go-mysql-server

2492: skip source values analyze when it only contains simple types
2491: ValidateInsertColumns avoids allocating hash map
2490: Avoid escaping HTML when Marshalling JSON
Due to a misconfiguration, HTML characters were being escaped when marshaling JSON. This is unnecessary, and since we now potentially display marshalled JSON to the user, we shouldn't be doing this.
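In Go's `encoding/json`, `json.Marshal` always escapes `<`, `>`, and `&` as `\u003c`, `\u003e`, and `\u0026`; turning that off requires a `json.Encoder` with `SetEscapeHTML(false)`. A minimal sketch of the difference (not the GMS code itself). Note that `Encoder.Encode` appends a trailing newline, which is trimmed here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalNoHTMLEscape encodes v as JSON without escaping <, > and &.
func marshalNoHTMLEscape(v any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return string(bytes.TrimRight(buf.Bytes(), "\n")), nil
}

func main() {
	doc := map[string]string{"html": "<b>a & b</b>"}
	escaped, _ := json.Marshal(doc)
	plain, _ := marshalNoHTMLEscape(doc)
	fmt.Println(string(escaped)) // {"html":"\u003cb\u003ea \u0026 b\u003c/b\u003e"}
	fmt.Println(plain)           // {"html":"<b>a & b</b>"}
}
```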
2488: System Variables: Add log_bin and change the default of performance_schema
The log_bin system variable controls whether a MySQL server logs to the binary log or not.
The performance_schema system variable was previously defaulted to 1, to match MySQL's default, but this can cause tools (e.g. Datadog) to believe that the performance_schema system tables are available, and then error out when trying to query them. Since we don't provide a performance_schema database, the new default for the performance_schema system variable is 0.
2487: Expand literals in comparisons when safe
2486: add parser interface in engine
This PR creates a sql.Parser interface. The interface is defined in the engine and should be used instead of calling the MySQL parser directly.
It also adds a GlobalParser variable that, for now, exposes the Doltgres parser for parsing view definitions. It can also be used in places that need Doltgres-specific syntax parsing.

Closed Issues

7812: Entity Framework updating working set when there are no changes
7815: dolt workflows for manipulating the collation of a DB needed
6161: Optimizer statistics v1

dolt - 1.37.0

Published by github-actions[bot] 6 months ago

The previous (now deleted) release, 1.36.1, had a startup-time issue for databases larger than 10GB, which this release patches. That release was only available for an hour or so, so it is unlikely anyone installed it. Still, we bumped this release to 1.37.0 as a warning, just in case.

This minor release includes an internal interface change to the chunk journal index. The first startup of a database with the old index format will rewrite the index. This rewrite is a one-time penalty that, in testing, takes <5% of the time it would take to reimport the database.

Merged PRs

dolt

7833: Bug fix: Apply replication settings for newly cloned databases
Dolt SQL servers using remote-based replication will pull new databases if the @@dolt_replication_remote_url_template system variable is configured, but those new databases weren't getting configured to continue pulling updates from the remote.
This change registers the newly cloned databases as ReadReplicaDatabase instances, so that they will poll their remote and pull new commits. It also adds some additional logging to help debug issues with remote-based replication.
7829: Changed RootValue into an interface
Companion:
- https://github.com/dolthub/doltgresql/pull/232
This changes RootValue into an interface. Every function that seems unique to Dolt's RootValue has been changed into a function variable, which Doltgres overwrites to point to a different function.
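The function-variable approach can be sketched in Go as follows. All names here are hypothetical, not Dolt's actual identifiers: the package stores a default implementation in a package-level variable, and a downstream program such as Doltgres reassigns it once during startup.

```go
package main

import "fmt"

// RootValue is a hypothetical, trimmed-down version of the interface.
type RootValue interface {
	TableNames() []string
}

type doltRoot struct{ tables []string }

func (r doltRoot) TableNames() []string { return r.tables }

// NewRootValue is a function variable: Dolt installs its own constructor by
// default, and another program may overwrite it before first use.
var NewRootValue = func(tables []string) RootValue {
	return doltRoot{tables: tables}
}

// doltgresRoot embeds doltRoot, so it inherits TableNames.
type doltgresRoot struct{ doltRoot }

func main() {
	// A Doltgres-style override, installed once at startup.
	NewRootValue = func(tables []string) RootValue {
		return doltgresRoot{doltRoot{tables: tables}}
	}
	root := NewRootValue([]string{"public.users"})
	fmt.Printf("%T %v\n", root, root.TableNames())
}
```

The trade-off of this design is that overrides are global and must happen before any caller uses the variable; in exchange, shared code never needs to know which implementation it is running with.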
7799: Archive Serialization and Deserialization
This PR doesn't directly change any Dolt behavior. It just lays the groundwork for archive creation and reading. Currently, this code doesn't materialize any file, since the unit tests exercise it with ByteSinks.
7780: Reformat journal index
Change the way we write journal index lookups. Each write appends a lookup to a bufio.Writer that lazily writes to disk, and periodically we flush a CRC/root value record used to consistency-check the index during bootstrap. This avoids big stalls from flushing a batch of index records at once. We also now write only an addr16, because that's what we load into the default chunk address map.
Databases with the older format will pay a one-time startup penalty to rewrite the journal index. In testing this appears to be 5-10% of the import time for the database.
7836: Journal index offset 8bytes
On >10GB datasets, offsets overflow uint32. Bug from previous PR https://github.com/dolthub/dolt/pull/7780
7834: minver refactor to be used by doltgres
7821: Prevent panic when dropping columns in schema merge
Fixes https://github.com/dolthub/dolt/issues/7762
In certain cases, performing a schema merge when the merged schema had fewer columns than the base schema would cause a panic.
We actually had a test for this, but the test was disabled because a limitation in how the test harness generated column tags was causing incorrect detection of merge conflicts.
To re-enable these tests, this PR slightly relaxes the merge-conflict logic with respect to column tags. This is safe because column tags shouldn't influence the result of a merge beyond helping to identify renamed columns, as long as the merge behaves the same in both directions.

Closed Issues

7762: Panic during schema merge

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.07	2.97	1.4
groupby_scan	13.22	17.63	1.3
index_join	1.37	5.18	3.8
index_join_scan	1.27	2.22	1.7
index_scan	34.33	53.85	1.6
oltp_point_select	0.17	0.51	3.0
oltp_read_only	3.36	8.43	2.5
select_random_points	0.33	0.8	2.4
select_random_ranges	0.39	0.95	2.4
table_scan	34.33	54.83	1.6
types_table_scan	74.46	134.9	1.8
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.91	0.9
oltp_insert	3.75	3.43	0.9
oltp_read_write	8.43	16.12	1.9
oltp_update_index	3.82	3.55	0.9
oltp_update_non_index	3.82	3.43	0.9
oltp_write_only	5.37	7.84	1.5
types_delete_insert	7.7	7.56	1.0
writes_mean_multiplier			1.1

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	102.12	22.29	4.6
tpcc_tps_multiplier			4.6

Overall Mean Multiple	2.60

dolt - 1.36.0

Published by github-actions[bot] 6 months ago

This version does not include an interface change, but does include large changes to the performance and network utilization behavior of dolt fetch and related functionality, such as shallow clone and Dolt cluster replication.

Merged PRs

dolt

7828: Bug fix: Allow dolt init to use --data-dir param
Previously, dolt init would use the value of --data-dir for almost all of the repository initialization, but the code that set up repository configuration would always use the current directory. This change allows callers to use dolt init with the --data-dir param to initialize directories other than the current working directory as Dolt repositories.
7825: Bug fix: Decimal type binlog serialization
7824: dolt fetch: Implement pipelined, continuous downloads during pulls from DoltHub and dolt sql-server remotes.
Dolt fetch, pull, shallow clone, and cluster replication now make more aggressive use of available network resources.
7816: allow database alters and case insensitive check for info schema
fixes: https://github.com/dolthub/dolt/issues/7814

go-mysql-server

2488: System Variables: Add log_bin and change the default of performance_schema
The log_bin system variable controls whether a MySQL server logs to the binary log or not.
The performance_schema system variable was previously defaulted to 1, to match MySQL's default, but this can cause tools (e.g. Datadog) to believe that the performance_schema system tables are available, and then error out when trying to query them. Since we don't provide a performance_schema database, the new default for the performance_schema system variable is 0.
2485: Have LazyJSONDocument implement fmt.Stringer and driver.Valuer, in order to interoperate with other go SQL libraries.

Closed Issues

7814: dolt sql fails to run DDL operations
6624: Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables

dolt - 1.35.13

Published by github-actions[bot] 6 months ago

Merged PRs

dolt

7818: Apply a factor to better estimate information_schema.TABLES.DATA_LENGTH
information_schema.TABLES.DATA_LENGTH currently reports the max possible table size for a table, and doesn't take into account table file compression or that variable length fields (e.g. TEXT) are not always fully used. Tools such as DBeaver use this metadata to display table sizes, and since the estimates can easily be orders of magnitude greater than the actual size on disk, it can cause customers to be concerned by the reported sizes (e.g. https://github.com/dolthub/dolt/issues/6624).
As a short-term fix to make these estimates more accurate, we apply a constant factor to the max table size. I chose this scaling factor by measuring a best-case scenario (where no fields are variable length) and a worst-case scenario (where all fields are variable length and only use a few bytes), then picking a value roughly in the middle. Longer term, a better way to estimate table size on disk will be to use statistics data.
7810: fix output for dolt diff --stat -r json
This PR tidies up the code for printing diffs, specifically for the JSON result format, and prints --stat correctly for the JSON result format.
Additionally, we now throw an error for the SQL result format instead of returning incorrect output. It might be worth implementing now, but I can just make an issue for it.
fixes: https://github.com/dolthub/dolt/issues/7800
7809: go/libraries/doltcore/sqle/dprocedures: dolt_pull.go: Improve CPU utilization of call dolt_pull.
7805: Fix: allow jsonSerializer to load JSON from LazyJSONDocument
7804: Changing database init/drop hooks to be a slice of hooks
The Dolt database provider currently has a single init hook and a single drop hook. We have a few hooks, and in order to support multiple hooks, we chain them together. Binlog replication will also need to register a similar init and drop hook to capture database create/drop actions, so to prepare for that, this PR turns the single init hook and single drop hook into a slice of init hooks and a slice of drop hooks.
7802: adding --name-only option for dolt diff
This PR adds support for the --name-only option for dolt diff, which prints only the tables that changed between the two commits. This mirrors git diff --name-only.
fixes: https://github.com/dolthub/dolt/issues/7797
7795: Serialization code for binlog events
Provides support for serializing all Dolt data types into MySQL's binary encoding used in binlog events. Vitess provides good support for deserializing binary values from binlog events into Go datatypes, but doesn't provide any support for serializing types into MySQL's binary format. This PR pulls data out of Dolt's storage system and encodes it into MySQL's binary format. It would be interesting to split out the Dolt storage system specific code and the core MySQL serialization logic in the future, but this seems like the right first step.
Related to https://github.com/dolthub/dolt/issues/7512
7785: Use LazyJSONDocument when reading from a JSON column.
This is the Dolt side of https://github.com/dolthub/dolt/issues/7749
The GMS PR is https://github.com/dolthub/go-mysql-server/pull/2470
LazyJSONDocument is an alternate implementation of sql.JSONWrapper that takes a string of serialized JSON and defers deserialization until it's actually required.
This is useful because in the most common use case (selecting a JSON column), deserialization is never required.
In an extreme example, I created a table with 8000 rows, each containing an 80KB JSON document.
dolt sql -q "SELECT * FROM test_table" ran in 47 seconds using JSONDocument, and 28 seconds using LazyJSONDocument, nearly half the time.
Even in cases where we do need to deserialize the JSON in order to filter on it, we can avoid reserializing it afterward, which is still a performance win.
Of note: In some cases we use a special serializer (defined in json_encode.go::marshalToMySqlString) in order to produce a string that is, according to the docstring "compatible with MySQL's JSON output, including spaces."
This currently gets used
- In Query Diff
- When hashing values for fulltext tables
- When casting JSON columns to a text type
- When writing values along the wire
The last one is the most worrying, because it means that we can't avoid the serialization round-trip when connecting to a Dolt server remotely. I discussed with Max whether we consider it a requirement to match MySQL's wire responses exactly for JSON, and we agreed that we could probably relax that requirement. Casting a document to a text type will still produce the same output as MySQL.
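A minimal sketch of the lazy-wrapper idea, assuming a hypothetical `lazyJSON` type rather than the real `LazyJSONDocument` API: the serialized string is kept as-is, parsing happens at most once via `sync.Once`, and `String` and `Value` return the original text without a round trip. `fmt.Stringer` and `database/sql/driver.Valuer` are the standard-library interfaces mentioned in go-mysql-server PR 2485.

```go
package main

import (
	"database/sql/driver"
	"encoding/json"
	"fmt"
	"sync"
)

// lazyJSON keeps the serialized document and parses it only on demand.
type lazyJSON struct {
	raw    string
	once   sync.Once
	parsed any
	err    error
	parses int // counts real parses, for illustration only
}

func newLazyJSON(raw string) *lazyJSON { return &lazyJSON{raw: raw} }

// Doc returns the deserialized value, parsing on first use only.
func (j *lazyJSON) Doc() (any, error) {
	j.once.Do(func() {
		j.parses++
		j.err = json.Unmarshal([]byte(j.raw), &j.parsed)
	})
	return j.parsed, j.err
}

// String implements fmt.Stringer without re-serializing.
func (j *lazyJSON) String() string { return j.raw }

// Value implements driver.Valuer by returning the original text.
func (j *lazyJSON) Value() (driver.Value, error) { return j.raw, nil }

// Compile-time checks that both interfaces are satisfied.
var (
	_ fmt.Stringer  = (*lazyJSON)(nil)
	_ driver.Valuer = (*lazyJSON)(nil)
)

func main() {
	j := newLazyJSON(`{"key": "foo"}`)
	fmt.Println(j.String(), j.parses) // selecting the column: no parse
	doc, _ := j.Doc()                 // filtering on a field: parse once
	fmt.Println(doc.(map[string]any)["key"], j.parses)
}
```

Keeping `raw` after parsing is the memory-for-speed trade-off the notes describe: unchanged documents can be written back out without re-serializing.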
7754: Index rebuilds with external key sorting
Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the default approach, which rebuilds the prolly tree each time we flush keys from memory. Because those flushes contain unsorted keys, the old approach touches most of the tree with random reads and writes. The new approach structures the work for sequential IO by flushing sorted runs that are incrementally merge sorted. Sequential IO is dramatically faster on disk-based systems.

go-mysql-server

2485: Have LazyJSONDocument implement fmt.Stringer and driver.Valuer, in order to interoperate with other go SQL libraries.
2470: Add LazyJSONDocument, which wraps a JSON string and only deserializes it if needed.
This is the GMS side of https://github.com/dolthub/dolt/issues/7749
This is a new JSONWrapper implementation. It isn't used by the GMS in-memory storage, but it will be used in Dolt to speed up SELECT queries that don't care about the structure of the JSON.
A big difference between this and JSONDocument is that even after it deserializes the JSON into a Go value, it keeps the string in memory. This is good in cases where we want to re-serialize the JSON later without changing it (so statements like SELECT json FROM table WHERE json->>"$.key" = "foo"; will still be faster), but with the downside of using more memory than JSONDocument.
2469: refactor index validation and prevent indexes over json columns
This PR consolidates the logic for validating indexes.
Additionally, it fixes a bug where create table t (i int, index (i, i)); was allowed.
fixes: https://github.com/dolthub/dolt/issues/6064
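A hypothetical Go sketch of consolidated index validation covering the cases the PR mentions: a repeated column such as `index (i, i)` and an index over a JSON column. The `column` type and the error messages are illustrative, not GMS's. Column names are compared case-insensitively, as in MySQL.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical, minimal column description.
type column struct {
	Name string
	Type string // e.g. "int", "json"
}

// validateIndex checks that every index column exists, appears once, and is not JSON.
func validateIndex(cols []column, indexCols []string) error {
	byName := make(map[string]column, len(cols))
	for _, c := range cols {
		byName[strings.ToLower(c.Name)] = c
	}
	seen := make(map[string]bool, len(indexCols))
	for _, name := range indexCols {
		key := strings.ToLower(name)
		c, ok := byName[key]
		switch {
		case !ok:
			return fmt.Errorf("key column '%s' doesn't exist in table", name)
		case seen[key]:
			return fmt.Errorf("duplicate column name '%s'", name)
		case strings.EqualFold(c.Type, "json"):
			return fmt.Errorf("JSON column '%s' cannot be indexed directly", name)
		}
		seen[key] = true
	}
	return nil
}

func main() {
	cols := []column{{"i", "int"}, {"j", "json"}}
	fmt.Println(validateIndex(cols, []string{"i", "i"}))
	fmt.Println(validateIndex(cols, []string{"j"}))
}
```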
2466: Schema-qualified table names
This PR also fixes a couple unrelated issues:
- IMDB query plans are brought up to date (this is most of the change lines)
- Fixed bugs in certain show statements (information_schema tests)

Closed Issues

7813: Ability to export diffs as SQL
6624: Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables
7800: dolt diff --stat -r json produces invalid JSON
7749: Dolt serializes and deserializes JSON unnecessarily.
7797: dolt diff ... that only shows the tables changed in a simpler format

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.14	3.13	1.5
groupby_scan	13.46	17.95	1.3
index_join	1.37	5.28	3.9
index_join_scan	1.27	2.26	1.8
index_scan	34.33	54.83	1.6
oltp_point_select	0.17	0.51	3.0
oltp_read_only	3.43	8.43	2.5
select_random_points	0.33	0.8	2.4
select_random_ranges	0.39	0.97	2.5
table_scan	34.33	54.83	1.6
types_table_scan	74.46	137.35	1.8
reads_mean_multiplier			2.2

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	7.98	6.91	0.9
oltp_insert	3.75	3.43	0.9
oltp_read_write	8.43	16.41	1.9
oltp_update_index	3.82	3.55	0.9
oltp_update_non_index	3.82	3.49	0.9
oltp_write_only	5.37	7.98	1.5
types_delete_insert	7.7	7.56	1.0
writes_mean_multiplier			1.1

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	101.88	22.32	4.9
tpcc_tps_multiplier			4.9

Overall Mean Multiple	2.73
