dolt

Dolt – Git for Data

APACHE-2.0 License

Downloads: 2.4K
Stars: 17.1K
Committers: 143

dolt - 1.42.20

Published by github-actions[bot] about 1 month ago

Merged PRs

dolt

  • 8353: Support dolt diff --staged
    The common way to look at the staged diff in git is the --staged flag. --cached is still supported, but not recommended (and confusing). This change makes dolt more like git.
  • 8352: Fix dolt_merge_base string validation for doltgres
    dolt_merge_base was failing in doltgres with an "invalid type: unknown" error.
  • 8350: [statspro] Restart drop db
    Quick fix for at least one of the problems in https://github.com/dolthub/dolt/issues/8345.
    The specific example fails because the restart function refreshes the stats database instance, but does not reload the stats contained within. So the contents in memory did not track the contents on disk. This was only noticeable when restart and read were called within the same shell context.
  • 8348: Testing for dolt add --patch
  • 8343: support for schemas in various version control operations
    This change adds support for schemas in dolt_add, dolt_merge, and dolt_status. Many places that had a string for a table name now have a doltdb.TableName.
    Also fixes a newly discovered bug with creating a database:
    1. creating a DB with a collation left the working set dirty
    2. DBs created in a transaction couldn't be queried until starting a new tx
  • 8341: Fix more stats collection races
    fixes: https://github.com/dolthub/dolt/issues/8339
    The concurrency tests stopped working at some point when we were rearranging the default stats behavior. I caught a couple other issues after I got those tests running again.
  • 8320: Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
    Bumps actions/download-artifact from 3 to 4.1.7.
  • 8310: Bump com.mysql:mysql-connector-j from 8.0.31 to 8.2.0 in /integration-tests/orm-tests/hibernate/DoltHibernateSmokeTest
    Bumps com.mysql:mysql-connector-j from 8.0.31 to 8.2.0.

Closed Issues

  • 8307: Load data error
  • 8339: Panic when connection to a running server using dolt sql when stats are on

dolt - 1.42.19

Published by github-actions[bot] about 1 month ago

Merged PRs

dolt

  • 8336: dolt add --patch
    CLI update to enable the "--patch" option for dolt add.
    This option is only supported in a CLI context (sql shell included) because the dolt_add stored procedure doesn't allow for a user interactive workflow.
    Currently this change lacks tests. I'll work on that after I ship the blog post.
    Fixes: https://github.com/dolthub/dolt/issues/2465
  • 8335: [statspro] Avoid stopping the world during stats updates
    Previously, the stats provider held a single lock that was required both to (1) update stats and (2) access statistics, so long-running updates blocked readers.
    The new behavior:
    • Reject duplicate update requests on the same database/branch/table. So an analyze table <t> will error if a conflicting job is active.
    • Only lock the provider in critical sections. Update threads only grab the lock to read current statistics or write updates. So regular reads/writes will not hang waiting for the stats provider mutex longer than it takes to finish a critical section.
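    The two behaviors above can be sketched roughly as follows. This is a minimal illustration, not Dolt's implementation: the StatsProvider class, its method names, and the (db, branch, table) key shape are all hypothetical.

```python
import threading


class StatsProvider:
    """Toy stats provider: short critical sections plus duplicate-job rejection."""

    def __init__(self):
        self._mu = threading.Lock()
        self._stats = {}      # (db, branch, table) -> stats payload
        self._active = set()  # keys with an update currently in flight

    def get(self, key):
        # Readers hold the lock only long enough to look up the entry.
        with self._mu:
            return self._stats.get(key)

    def update(self, key, compute):
        # Critical section 1: register the job, rejecting duplicates.
        with self._mu:
            if key in self._active:
                raise RuntimeError(f"stats update already running for {key}")
            self._active.add(key)
        try:
            # The expensive work runs without the lock held.
            result = compute()
            # Critical section 2: publish the result.
            with self._mu:
                self._stats[key] = result
        finally:
            # Always clear the in-flight marker, even if compute() failed.
            with self._mu:
                self._active.discard(key)
```

    Because compute() runs outside the lock, a concurrent get() waits only for one of the short critical sections rather than for the whole update.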
  • 8334: Fix issue where stats db created in wrong location
    fixes: https://github.com/dolthub/dolt/issues/8324

Closed Issues

  • 2465: Support dolt add -p
  • 8324: call dolt_stats_restart() executed using dolt sql -q in the root of the database puts stats in the wrong place
  • 8317: SELECT WHERE (id = x AND id = y) returns row matching y instead of no results
  • 8316: Sqlalchemy can't reflect schema with foreign key

dolt - 1.42.18

Published by github-actions[bot] about 2 months ago

Merged PRs

dolt

  • 8321: Optional String Argument
    Supports optional string arguments in the middle of command-line args.
  • 8319: reset auto increment counter on dolt_reset('--hard')
    When calling dolt_reset(--hard), we don't properly reload the auto increment value from the global tracker.
    The bug is a side effect from maintaining auto increment value between branches. We track the last greatest auto increment value across all branches in the session, and dropping a table sets the previous auto_increment value to 0.
    The fix is to get the tracker to reload values when resetting.
    Some notable behavior here is that a dolt_reset over insert will not reset the auto_increment tracker.
    fixes: https://github.com/dolthub/dolt/issues/8272
  • 8314: Fix Integration Workflow
    This is Dolt's counterpart to the following PR:
  • 8313: Fixed table resolution for reset
  • 8309: Minor doc updates
  • 8306: Export DoltSystemVariables var so that it can be used by doltgres

go-mysql-server

  • 2651: fix collations with recursive hash for HashInTuple expressions
    When hashing tuples, we don't take into account the collation of the individual elements, which results in incorrect comparisons for case insensitive collations.
    The fix here was to recursively hash each individual element in a tuple, and hash the slice of hashes.
    fixes https://github.com/dolthub/dolt/issues/8316
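    A rough sketch of the idea, assuming a case-insensitive collation can be approximated by case folding. The hash_value function is hypothetical and is not the go-mysql-server code.

```python
import hashlib


def hash_value(value, case_insensitive=True):
    """Hash a scalar or (nested) tuple, folding case for strings when asked."""
    if isinstance(value, tuple):
        # Hash each element on its own, then hash the sequence of hashes.
        parts = [hash_value(v, case_insensitive) for v in value]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()
    if isinstance(value, str) and case_insensitive:
        value = value.casefold()  # stand-in for a real collation's sort key
    return hashlib.sha256(repr(value).encode()).hexdigest()
```

    With this shape, ("A", 1) and ("a", 1) hash to the same value under the case-insensitive setting, which is what a tuple IN lookup needs in order to match them.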
  • 2650: conflict in EqualityRangeBuilder
    Fix bug in equality range builder where duplicate equality filters to one value would stomp each other.
    fixes: https://github.com/dolthub/dolt/issues/8317
    dolt-side: https://github.com/dolthub/dolt/pull/8322
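    The intended behavior can be illustrated with a small sketch. The build_equality_lookup function and its (column, value) input format are hypothetical, not the actual EqualityRangeBuilder API.

```python
def build_equality_lookup(filters):
    """Combine `column = value` filters that are joined by AND.

    Returns a dict of column -> value, or None when two filters on the same
    column demand different values (the conjunction can match no rows).
    """
    lookup = {}
    for column, value in filters:
        if column in lookup and lookup[column] != value:
            return None  # id = x AND id = y with x != y is unsatisfiable
        lookup[column] = value
    return lookup
```

    The buggy behavior corresponds to dropping the conflict check, so the last filter silently overwrites the earlier one and the query returns the row matching y.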
  • 2645: Fix revision databases not showing up for schema databases

Closed Issues

  • 8317: SELECT WHERE (id = x AND id = y) returns row matching y instead of no results
  • 8330: Unique index not used when there is an IN expression in conditional
  • 8333: Class of Query intermittently slow on Wikipedia Import to Media Wiki
  • 8332: Slow query on where clause in composite primary key
  • 8316: Sqlalchemy can't reflect schema with foreign key
  • 8331: Slow query on wikipedia import to Media Wiki
  • 8272: dolt_reset(--hard) after a drop table resets the auto_increment counter to 1
  • 8308: Empty result set from query with INNER JOIN + WHERE

dolt - 1.42.17

Published by github-actions[bot] about 2 months ago

Merged PRs

dolt

  • 8311: [kvexec] fix more lookup bugs related to schema/projection inconsistencies
    We were using the full table schema to index a covering index KV pair. The fix has to be sensitive to covering, non-covering, and keyless tables, which required extra tests on the GMS side to check all cases.
    fixes: https://github.com/dolthub/dolt/issues/8308
    re: https://github.com/dolthub/go-mysql-server/pull/2646
  • 8303: Perf: Optimize SQL transaction commits in binlog replication applier
    The binlog replication applier process is responsible for applying the replicated updates from a MySQL source to the Dolt replica. Previously, when committing SQL transactions after applying a change, the applier would attempt to commit on all known databases. This worked when the number of databases was small, but when the server is handling hundreds of databases, it adds noticeable and unnecessary delay. This change switches to use the dirty tracking data of the session to more efficiently commit to only the databases that have had changes applied.

    Performance Comparison

    The old SQL transaction commit logic scaled exponentially, while the new logic scales linearly.
    | Num Databases | Code Path | Replication Time (s) |
    | 100           | Old       | 11.02                |
    | 300           | Old       | 144.34               |
    | 500           | Old       | 633.44               |
    | 100           | New       | 6.79                 |
    | 300           | New       | 20.51                |
    | 500           | New       | 36.04                |
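    The difference between the two commit strategies can be sketched as below. This is a simplified model under stated assumptions: the Session class, its fields, and the method names are hypothetical and do not mirror Dolt's session API.

```python
class Session:
    """Toy session that remembers which databases have uncommitted changes."""

    def __init__(self, databases):
        self.databases = list(databases)
        self.dirty = set()
        self.commit_calls = []  # records every per-database commit attempt

    def apply_change(self, db):
        self.dirty.add(db)

    def commit_all(self):
        # Old approach: attempt a commit on every known database.
        for db in self.databases:
            self.commit_calls.append(db)
        self.dirty.clear()

    def commit_dirty(self):
        # New approach: commit only databases touched since the last commit.
        for db in sorted(self.dirty):
            self.commit_calls.append(db)
        self.dirty.clear()
```

    With 500 databases and a replicated change to one of them, commit_all touches 500 databases per transaction while commit_dirty touches one.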
  • 8294: Feature: Data conflict resolution during interactive rebase
    When data conflicts occur during an interactive rebase, customers can now use Dolt's conflict resolution tools to resolve the conflicts and continue the rebase operation. Note that schema conflicts are not supported yet.
    Example:
    > select * from dolt_rebase;
    +--------------+--------+----------------------------------+----------------------------+
    | rebase_order | action | commit_hash                      | commit_message             |
    +--------------+--------+----------------------------------+----------------------------+
    | 1.00         | drop   | iumpo2t0hd6drcn11jo8osdack1jmafp | inserting row 1 on branch1 |
    | 2.00         | pick   | us2uaji20dj77cnf1i2l698itr5392se | updating row 1 on branch1  |
    +--------------+--------+----------------------------------+----------------------------+
    > call dolt_rebase('--continue');
    data conflict detected while rebasing commit us2uaji20dj77cnf1i2l698itr5392se (updating row 1 on branch1).
    Resolve the conflicts and remove them from the dolt_conflicts_<table> tables, then continue the rebase by calling dolt_rebase('--continue')
    > select * from dolt_conflicts_t;
    +----------------------------------+---------+---------+--------+--------+---------------+----------+----------+-----------------+------------------------+
    | from_root_ish                    | base_pk | base_c1 | our_pk | our_c1 | our_diff_type | their_pk | their_c1 | their_diff_type | dolt_conflict_id       |
    +----------------------------------+---------+---------+--------+--------+---------------+----------+----------+-----------------+------------------------+
    | us2uaji20dj77cnf1i2l698itr5392se | 1       | one     | NULL   | NULL   | removed       | 1        | uno      | modified        | B1dLxGtGIlHRKJo88tigpw |
    +----------------------------------+---------+---------+--------+--------+---------------+----------+----------+-----------------+------------------------+
    > call dolt_conflicts_resolve('--theirs', 't');
    +--------+
    | status |
    +--------+
    | 0      |
    +--------+
    > call dolt_add('t');
    > call dolt_rebase('--continue');
    +--------+-----------------------------------------------------+
    | status | message                                             |
    +--------+-----------------------------------------------------+
    | 0      | Successfully rebased and updated refs/heads/branch1 |
    +--------+-----------------------------------------------------+
    
    Customer issue: https://github.com/dolthub/dolt/issues/7820

go-mysql-server

  • 2644: Fix Integration Workflow
    This makes a few changes to how the integration workflow is handled:
    • main is now merged into this PR before testing. Previously, we would just use the PR as-is, which created a situation where the main branch included changes that the PR had not yet merged in. This created an issue as Dolt and DoltgreSQL would expect some changes to be present that were not yet merged into the PR, causing compilation errors. By merging main, we bypass this. In addition, the workflow will automatically pass if a merge conflict is detected. I think this is fine, since conflicts must be resolved before the PR can be merged anyway. This does mean that some errors may not be caught for as long as merge conflicts against main exist.
    • We only pulled comments, and the PR description does not count as a comment. This made it seem a bit inconsistent with how PR detection was handled. This has now been added, and we're now doing a basic string search instead of using a JSON parser, as concatenating the comments and description does not result in a valid JSON object.
    • Workflow should automatically run when a comment is added, modified, or deleted. This was already supposed to happen, but the event structure differed between a comment and push in a subtle way, causing the workflow to immediately error and for the UI to continue displaying the previous run. This made it seem as though the workflow did not respond to comment updates.
    • Additional logging messages have been added, so it's easier to debug if something goes wrong in the future.
  • 2641: Correctly handle indexes on virtual columns
    Fixes https://github.com/dolthub/dolt/issues/8276
    Lots of small behaviors around virtual columns were not working correctly:
    • Adding an index on a virtual column triggered a table rebuild even when this wasn't necessary
    • Rebuilding a table that contained virtual columns could lead to incorrect results
    • Inserting into a table with a virtual column could update indexes incorrectly
    • Adding a generated column to the start of a table could lead to incorrect results
    This PR adds tests for these cases and fixes them by tweaking the logic for projections on tables with generated columns.

Closed Issues

  • 8308: Empty result set from query with INNER JOIN + WHERE

Performance

Read Tests MySQL Dolt Multiple
covering_index_scan 2.07 0.65 0.3
groupby_scan 13.7 16.71 1.2
index_join 1.37 2.66 1.9
index_join_scan 1.3 2.11 1.6
index_scan 34.95 54.83 1.6
oltp_point_select 0.18 0.3 1.7
oltp_read_only 3.49 5.77 1.7
select_random_points 0.34 0.67 2.0
select_random_ranges 0.39 0.69 1.8
table_scan 34.95 54.83 1.6
types_table_scan 75.82 144.97 1.9
reads_mean_multiplier 1.6
Write Tests MySQL Dolt Multiple
oltp_delete_insert 8.13 5.99 0.7
oltp_insert 3.82 2.97 0.8
oltp_read_write 8.58 11.87 1.4
oltp_update_index 3.89 3.02 0.8
oltp_update_non_index 3.89 2.97 0.8
oltp_write_only 5.37 6.09 1.1
types_delete_insert 7.7 6.43 0.8
writes_mean_multiplier 0.9
TPC-C TPS Tests MySQL Dolt Multiple
tpcc-scale-factor-1 98.27 39.08 2.5
tpcc_tps_multiplier 2.5
Overall Mean Multiple 1.67

dolt - 1.42.16

Published by github-actions[bot] about 2 months ago

Merged PRs

dolt

  • 8300: Fix nil panic in using dolt show to inspect secondary indexes.
  • 8299: More consistent return types for dolt procedures
    Most procedures that return integers return int64, but there are a few exceptions. The procedures that return int break in doltgres.
  • 8297: dolt_workspace_* update and delete support
    This change adds the ability to update dolt_workspace_ tables. Updates can take two forms:
    1. The "staged" column of the table may be toggled from its current state. If setting from false to true, the working value will be written into the staged table. Setting from true to false will remove the row from staging, and leave the value in working as is.
    2. You can delete any row which has a "staged" column of false. This will revert the workspace changes and return them to the original value.
  • 8186: Add support for Doltgres indexes

go-mysql-server

  • 2641: Correctly handle indexes on virtual columns
    Fixes https://github.com/dolthub/dolt/issues/8276
    Lots of small behaviors around virtual columns were not working correctly:
    • Adding an index on a virtual column triggered a table rebuild even when this wasn't necessary
    • Rebuilding a table that contained virtual columns could lead to incorrect results
    • Inserting into a table with a virtual column could update indexes incorrectly
    • Adding a generated column to the start of a table could lead to incorrect results
    This PR adds tests for these cases and fixes them by tweaking the logic for projections on tables with generated columns.
  • 2638: Change ranges to an interface

Closed Issues

  • 8276: Indexes on virtual generated columns generate incorrect results.

Performance

Read Tests MySQL Dolt Multiple
covering_index_scan 2.07 0.65 0.3
groupby_scan 13.22 16.41 1.2
index_join 1.37 2.66 1.9
index_join_scan 1.27 2.11 1.7
index_scan 34.33 54.83 1.6
oltp_point_select 0.18 0.3 1.7
oltp_read_only 3.49 5.77 1.7
select_random_points 0.34 0.65 1.9
select_random_ranges 0.39 0.69 1.8
table_scan 34.33 54.83 1.6
types_table_scan 75.82 142.39 1.9
reads_mean_multiplier 1.6
Write Tests MySQL Dolt Multiple
oltp_delete_insert 8.13 5.99 0.7
oltp_insert 3.82 3.02 0.8
oltp_read_write 8.58 11.87 1.4
oltp_update_index 3.89 3.02 0.8
oltp_update_non_index 3.89 2.97 0.8
oltp_write_only 5.37 6.21 1.2
types_delete_insert 7.7 6.43 0.8
writes_mean_multiplier 0.9
TPC-C TPS Tests MySQL Dolt Multiple
tpcc-scale-factor-1 99.27 39.96 2.5
tpcc_tps_multiplier 2.5
Overall Mean Multiple 1.67

dolt - 1.42.15

Published by github-actions[bot] about 2 months ago

Merged PRs

dolt

  • 8296: fix case insensitive column match in kv iter
    When building the keyLookupMapper for the special prolly tree kvIter, we match column names by exact casing. This leads to incorrect lookup indexes.
    The fix was to use lowercase when matching.
    companion pr: https://github.com/dolthub/go-mysql-server/pull/2640
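    The fix amounts to normalizing case on both sides of the name comparison. A minimal sketch follows; build_key_lookup_mapper is a hypothetical stand-in for the keyLookupMapper construction, and the list-of-positions return shape is an assumption.

```python
def build_key_lookup_mapper(index_columns, table_columns):
    """Map each index column to its position in the table schema, ignoring case."""
    # Lowercase the schema names once so lookups are case-insensitive.
    positions = {name.lower(): i for i, name in enumerate(table_columns)}
    return [positions[name.lower()] for name in index_columns]
```

    Matching on exact casing would fail to find "ID" in a schema that declares "id", which is the kind of mismatch that produced incorrect lookup indexes.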
  • 8295: Update README.md
  • 8292: working?
  • 8291: [commands] faster sqldump query scanner
    The default sqldump scanner works as follows:
    • (1) read +4k lines into buffer
    • (2) scan entire buffer for delimiter (starting from index 0)
    • (3) failing to find delimiter, go to step 1
    So if the delimiter is at character 16k, we will execute the loop 4 times. From a cursory glance, I noticed some possibly odd things about the default buffered reader that I might be misunderstanding: (i) it copies bytes between at least 3 buffers (src, dst, target), (ii) it does double IO for reads (the bufio docs say this, I didn't completely follow where it happens), and (iii) when doubling the buffer capacity, it rereads the initial contents (tracking the size of the buffer, it seems to increase and then decrease slightly at each doubling; there might be something else going on here).
    Anyway, this prototype just doesn't do any of that. It just streams the bytes, seeking a delimiter.
    The prototype goes from ~10 seconds to 1 second to import 10k lines on the motivating test dataset.
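    The streaming idea can be sketched as follows. This is an illustration only, not the Dolt scanner: split_statements is a hypothetical name, and the sketch ignores delimiters that appear inside quoted strings or comments.

```python
import io


def split_statements(stream, delimiter=";", chunk_size=4096):
    """Yield delimiter-terminated statements without rescanning old input."""
    pending = ""
    scanned = 0  # prefix of `pending` already known to hold no delimiter
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        while True:
            idx = pending.find(delimiter, scanned)
            if idx < 0:
                # Resume the next search where this one stopped, backing up
                # just enough to catch a delimiter split across two chunks.
                scanned = max(0, len(pending) - len(delimiter) + 1)
                break
            yield pending[:idx].strip()
            pending = pending[idx + len(delimiter):]
            scanned = 0
    if pending.strip():
        yield pending.strip()
```

    The scanned offset is what avoids step (2) above: each new chunk is searched from where the previous search ended rather than from index 0.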

    Update with tests passing:
    before
    maxhoffman@Maxs-MacBook-Pro ~/D/d/indexer> time dolt sql < doltdump.sql &> log
    ________________________________________________________
    Executed in  276.23 secs    fish           external
    usr time  276.57 secs    0.19 millis  276.57 secs
    sys time    2.70 secs    1.72 millis    2.70 secs
    after:
    maxhoffman@Maxs-MacBook-Pro ~/D/d/indexer> time dolt sql < doltdump.sql &> log
    ________________________________________________________
    Executed in   33.79 secs    fish           external
    usr time   38.18 secs    0.21 millis   38.18 secs
    sys time    0.64 secs    1.78 millis    0.64 secs
    

    Update after not double-parsing every query:
    maxhoffman@Maxs-MacBook-Pro ~/D/d/indexer> time dolt sql < doltdump.sql &> log
    ________________________________________________________
    Executed in   25.60 secs    fish           external
    usr time   30.31 secs    0.28 millis   30.31 secs
    sys time    0.56 secs    1.21 millis    0.56 secs
    

dolt - 1.42.14

Published by github-actions[bot] about 2 months ago

Merged PRs

go-mysql-server

  • 2637: Adding a test combining duplicate indexes through create table and alter table
    Additional test case for https://github.com/dolthub/go-mysql-server/pull/2634
  • 2635: Fix multi-statements in nested triggers
    Changes:
    • have different savepoint names
      • this fixes nested triggers overwriting savepoints and clearing the same savepoint
    • handle aliases independently for each statement in trigger blocks
    • sync up prepend and scope nesting for triggers
    • the applyTrigger rule wraps triggerExecutors over individual statements in BeginEndBlocks
      • this prevents wrapping triggerExecutors over the wrong statements (not matching event or table)
    related: https://github.com/dolthub/dolt/issues/8213
  • 2634: Adding tests for supporting duplicate secondary indexes
    New tests asserting that multiple indexes over the same set of columns can be created on tables.
    https://github.com/dolthub/dolt/pull/8274 fixes Dolt for these tests to pass.

Closed Issues

  • 8221: Fix Grammar Typos in Go Comments
  • 8213: nested (delete) triggers have the wrong context in the nested call
  • 8250: Permission for the home directory: panic: runtime error: invalid memory address or nil pointer dereference

Performance

Read Tests MySQL Dolt Multiple
covering_index_scan 2.18 0.68 0.3
groupby_scan 12.98 16.41 1.3
index_join 1.39 2.66 1.9
index_join_scan 1.3 2.14 1.6
index_scan 34.95 54.83 1.6
oltp_point_select 0.18 0.3 1.7
oltp_read_only 3.49 5.77 1.7
select_random_points 0.34 0.65 1.9
select_random_ranges 0.39 0.69 1.8
table_scan 34.95 54.83 1.6
types_table_scan 75.82 144.97 1.9
reads_mean_multiplier 1.6
Write Tests MySQL Dolt Multiple
oltp_delete_insert 7.98 5.99 0.8
oltp_insert 3.75 2.97 0.8
oltp_read_write 8.58 11.87 1.4
oltp_update_index 3.89 3.02 0.8
oltp_update_non_index 3.89 2.97 0.8
oltp_write_only 5.37 6.09 1.1
types_delete_insert 7.7 6.43 0.8
writes_mean_multiplier 0.9
TPC-C TPS Tests MySQL Dolt Multiple
tpcc-scale-factor-1 100.08 39.36 2.5
tpcc_tps_multiplier 2.5
Overall Mean Multiple 1.67

dolt - 1.42.13

Published by github-actions[bot] 2 months ago

Merged PRs

dolt

  • 8274: Allow duplicate indexes, to match MySQL behavior
    This change allows Dolt to create multiple indexes on a table to cover the same set of columns, to match MySQL's behavior. While duplicate indexes are not generally useful, some MySQL tooling (e.g. Django) can create duplicate indexes as part of generated schema migration code.
    Tests for creating duplicate indexes were added to GMS, in the PR below. I also tested merge behavior with duplicate indexes and confirmed that we already have a guardrail in place to prevent merging multiple indexes that cover the same set of columns, and a test for that guardrail.
    GMS PR: https://github.com/dolthub/go-mysql-server/pull/2634
    Customer issue: https://github.com/dolthub/dolt/issues/8254
  • 8273: ResolveDefaultExpression should return a ColumnDefaultValue instance, because it has information about the type of the column and performs conversions.
    Fixes https://github.com/dolthub/dolt/issues/8269
    We were removing the ColumnDefaultValue expression node when building secondary indexes, which was a problem when the type of the default value needed to be converted to the column type: it's the ColumnDefaultValue expression that does that conversion.
  • 8268: Db/dolt ci events
  • 8260: Fixes for dolt installer bash-script
    1. Fixed error message mistakenly implying that dolt can be installed on 32-bit systems
    2. Fixed color printing on bash (and zsh running in bash compatibility mode) installed on an OS with non-GNU coreutils (Alpine Linux, macOS)
    3. Linted install.sh and fixed all the errors and warnings that had obvious fixes. The only one left is local error_code="$1" in the fail function, which should be removed but may be reused for debugging purposes by manually editing the script
  • 8253: Add dolt_workspace_* system tables
    This is the read-only addition of the dolt_workspace_{table} tables. These dynamically generated tables are always relative to HEAD for the given session of the caller. There are no commits as a result, and the schema of the output looks like:
    | ID (int) | STAGED (bool) | DIFF_TYPE (string) | to_A | to_B | ... | from_A | from_B | ... |
    Currently there is no mechanism to update this table directly, but in the future that will make it possible to have fine-grained modification of your workspace, similar to git add --patch.
  • 8242: Skip filterIter match check when a key range is contiguous

Closed Issues

  • 8254: Adding a duplicate constraint with a new name overwrites the existing constraint rather than coexisting
  • 8269: Panic when creating index on generated column
  • 7830: Support --empty flag for handling empty commits during rebase
  • 8250: Permission for the home directory: panic: runtime error: invalid memory address or nil pointer dereference

Performance

Read Tests MySQL Dolt Multiple
covering_index_scan 2.07 1.06 0.5
groupby_scan 13.22 17.01 1.3
index_join 1.32 2.66 2.0
index_join_scan 1.25 2.14 1.7
index_scan 34.33 54.83 1.6
oltp_point_select 0.18 0.3 1.7
oltp_read_only 3.49 5.77 1.7
select_random_points 0.34 0.65 1.9
select_random_ranges 0.39 0.7 1.8
table_scan 34.33 54.83 1.6
types_table_scan 74.46 147.61 2.0
reads_mean_multiplier 1.6
Write Tests MySQL Dolt Multiple
oltp_delete_insert 8.13 5.99 0.7
oltp_insert 3.82 3.02 0.8
oltp_read_write 8.58 11.87 1.4
oltp_update_index 3.89 3.02 0.8
oltp_update_non_index 3.89 2.97 0.8
oltp_write_only 5.37 6.09 1.1
types_delete_insert 7.7 6.43 0.8
writes_mean_multiplier 0.9
TPC-C TPS Tests MySQL Dolt Multiple
tpcc-scale-factor-1 96.5 38.71 2.5
tpcc_tps_multiplier 2.5
Overall Mean Multiple 1.67

dolt - 1.42.11

Published by github-actions[bot] 2 months ago

Merged PRs

dolt

  • 8261: liuliu/graph-spaces-fix
  • 8259: liuliu/graph-remove-extra-s
  • 8258: Bug fix: Testing for invalid global configuration dir permissions earlier, to prevent a panic
    Customer issue: https://github.com/dolthub/dolt/issues/8250
  • 8252: go/store/nbs: Fix table_index for table files with so many chunks that certain index slice operations overflow a uint32.
  • 8251: liuliu/color-string-fix
  • 8248: support \checkout, \merge, \show
    Until now there was an awkward behavior in dolt sql shell where the \checkout and \merge commands didn't really play nice when they could have. Specifically, if you used dolt sql and it was connected to a remote host, then running \checkout would give you an error telling you to stop the server. That is no longer the case. \checkout and \merge will work well in the dolt sql shell when connected to a remote host now.
    Also added the \show command
    And made the expect tests more correct and expanded.
  • 8247: Fix displaying AddressMap non-leaf nodes in noms show
    This fixes an index out of bounds panic when trying to display AddressMap non-leaf nodes in noms show.
    I didn't add a regression test, because it turns out that all noms show tests only work against the old deprecated format, so adding more tests to the test suite wouldn't actually do anything. It's also mostly unused: we basically only use it as part of splunk.pl to visualize chunks for debugging.
    Instead of fixing the test suite, it would be a better use of my time to fully deprecate noms show and switch splunk.pl to use dolt show instead. Especially now that https://github.com/dolthub/dolt/pull/8143 added support for visualizing prolly tree chunks in dolt show, bringing it to feature parity with noms show.
  • 8245: Feature: support for --empty=[drop|keep] in dolt_rebase()
    Adds support for the --empty option to dolt_rebase(). This option controls how commits that become empty are handled. For example, if a branch is rebased and all the changes in one commit on that branch have already been applied to the upstream branch, then when that commit is reapplied, it will end up being empty. The two initially supported values are drop and keep. (Git also supports a stop value, which lets the user manually intervene.)
    Also adds support for the --allow-empty flag for dolt_cherry_pick(). This flag controls whether Dolt will cherry-pick empty commits (i.e. commits that start off as empty, not commits that become empty after they are applied).
    These behaviors are slightly confusing for two reasons: 1) Git distinguishes between a commit that starts off empty and a commit that becomes empty while applying its changes, and 2) rebase and cherry-pick have slightly different default values for these two options. The differences are summarized below.
    |             | Commits that start empty | Commits that become empty |
    | rebase      | default: keep, can be overridden with --no-keep-empty (--keep-empty is also supported) | default: drop, can be overridden with --empty=keep. For interactive rebases, the default changes to stop, which is not supported by Dolt yet. |
    | cherry-pick | default: fail, can be overridden with --allow-empty | default: stop, can be overridden with --empty=keep or --empty=drop. Dolt does not support stop yet, so Dolt's default is to fail. |
    Related issue: https://github.com/dolthub/dolt/issues/7830
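    As a rough illustration of the rebase row of the table above, the sketch below decides which commits a non-interactive rebase would reapply. The plan_rebase function and its (name, starts_empty, becomes_empty) tuples are hypothetical; only the drop and keep values are modeled.

```python
def plan_rebase(commits, empty="drop"):
    """Return the names of the commits that a rebase would reapply.

    Each commit is (name, starts_empty, becomes_empty). `empty` mirrors the
    --empty=[drop|keep] option.
    """
    if empty not in ("drop", "keep"):
        raise ValueError("only 'drop' and 'keep' are modeled here")
    kept = []
    for name, starts_empty, becomes_empty in commits:
        if starts_empty:
            kept.append(name)  # commits that start empty are kept by default
        elif becomes_empty and empty == "drop":
            continue  # changes already upstream: drop the commit
        else:
            kept.append(name)
    return kept
```

    For example, a commit whose changes are already upstream is dropped by default and retained when empty="keep" is passed.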
  • 8239: Rewrite the dolt show implementation.
    dolt show had an issue where it would not correctly display the SerialMessage for commits if provided with a hash. This came about as part of a refactor to make dolt show not depend on the env.DoltEnv object, which only exists on locally running servers, and not when connected to a remote server. Unfortunately, it looks like that refactor didn't actually remove the dependency either, as DoltEnv was still used in every possible invocation of dolt show.
    To get it working, I essentially rewrote the implementation of dolt show in such a way that it now actually only uses DoltEnv when it can't get the necessary information from a running server: Basically, if we need to display SerialMessages or resolve branch names, we still rely on a locally running server. This can likely be improved in the future. But calls like dolt show #hash should now work against remote servers.
  • 8226: Liuliu/log one line graph
    An example of the graph of us-jails:
    [Image: Screenshot 2024-08-07 at 2 04 03 PM]
  • 8222: [kvexec] customized operator for count aggregation
    Operators that count the number of rows in a relation don't have to deserialize those KV's from storage (ex: select count(y) from xy where x > 1). There are some circumstances where we have to check for field nullability, but otherwise we can just count the KV's returned by the source iterator.
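    A minimal sketch of the optimization, assuming rows arrive as opaque key/value pairs and that decoding is the expensive step. The count_rows function, its parameters, and the decode callback are hypothetical.

```python
def count_rows(kv_iter, nullable_field=None, decode=None):
    """Count rows from a key/value iterator, decoding only when NULLs matter."""
    if nullable_field is None:
        # count(*) or count(non-nullable column): every KV pair is one row.
        return sum(1 for _ in kv_iter)
    # count(nullable column): rows whose field is NULL must be skipped,
    # so each pair has to be decoded to inspect the field.
    return sum(1 for kv in kv_iter if decode(kv)[nullable_field] is not None)
```

    The fast path never calls decode, which is where the savings come from when the counted column cannot be NULL.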
  • 8129: Add optimized diffing and three-way merge of indexed JSON Documents.
    This PR adds some additional tests, but I plan on adding more tests around large documents before merging. Still, the implementation is ready for review.
    This adds a new JSON diffing algorithm designed for IndexedJSONDocument. Because three way merge only operates on values read from a Dolt table, which are always returned as an IndexedJSONDocument, this should mean that the original implementation is no longer used.

go-mysql-server

  • 2629: normalize column defaults
    This PR adds a new analyzer rule to normalize literal column default values.
    This rule ensures that the default value is consistent for the column type (float defaults over int columns are rounded properly).
    It does this by evaluating the column default, and placing that into a NewLiteral of the proper type.
    Additionally, this ensures that dolt serialization receives consistent values (normalized floats and proper types).
    fixes: https://github.com/dolthub/dolt/issues/8190
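    The normalization step can be pictured as evaluating the default once and storing the result in the column's own type. The sketch below is an assumption-laden illustration: normalize_default, the string type names, and the round-half-away-from-zero rule are all chosen for the example and are not taken from the analyzer rule itself.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize_default(column_type, default):
    """Evaluate a literal default once and return it in the column's own type."""
    if column_type == "int":
        # Round ties away from zero (an assumed rounding rule for this sketch).
        rounded = Decimal(str(default)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        return int(rounded)
    if column_type == "float":
        return float(default)
    if column_type == "varchar":
        return str(default)
    raise ValueError(f"unsupported column type: {column_type}")
```

    Storing the already-converted literal means later consumers, such as serialization, see the same value and type every time.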
  • 2627: Fix error when comparing incompatible types in IndexLookups
    When building lookups for IndexedTableAccess, we always convert the key type to the column's type.
    This is problematic when the key can't be converted to the column type without error.
    The expressions used in Filters properly handle this conversion, so we should default to that.
    Example:
    tmp/main*> create table t (i int primary key);
    tmp/main*> select * from t where i = json_array();
    error: '[]interface {}' is not a valid value type for 'int'
    
    This doesn't error in MySQL. Also, without a primary key or secondary index, the query succeeds in dolt.
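    One way to picture the fix: attempt the key conversion, and if it fails, skip the index lookup and let the ordinary filter evaluate the comparison. The sketch below is hypothetical (select_where_equals, the dict-based index, and the column name i are illustrative) and does not reflect the go-mysql-server types.

```python
def select_where_equals(rows, index, key):
    """Return rows whose int primary key `i` equals `key`.

    `index` maps int key -> row. If `key` cannot be converted to the column
    type, skip the index and let a plain filter do the comparison instead.
    """
    try:
        converted = int(key)
    except (TypeError, ValueError):
        # No index lookup is possible; fall back to filtering every row.
        return [row for row in rows if row["i"] == key]
    return [index[converted]] if converted in index else []
```

    With an empty list as the key, analogous to json_array() in the example above, the conversion fails and the fallback filter returns no rows instead of raising an error.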
  • 2625: Bug fix: the timestamp function should convert to a datetime type
    MySQL's timestamp function, despite its name, actually returns a datetime type and not a timestamp type.
    MySQL example:
    mysql -uroot --protocol TCP -e "select timestamp('1000-01-01 00:00:00');" --column-type-info
    Field   1:  `timestamp('1000-01-01 00:00:00')`
    Catalog:    `def`
    Database:   ``
    Table:      ``
    Org_table:  ``
    Type:       DATETIME
    Collation:  binary (63)
    Length:     19
    Max_length: 19
    Decimals:   0
    Flags:      BINARY
    +----------------------------------+
    | timestamp('1000-01-01 00:00:00') |
    +----------------------------------+
    | 1000-01-01 00:00:00              |
    +----------------------------------+
    
    Note: We still need to add support for the second, optional parameter to timestamp().
    Customer issue: https://github.com/dolthub/dolt/issues/8236

Closed Issues

  • 4367: Add support for --graph option in dolt log
  • 8190: Table schema stores unnormalized expression for default values, which leads to unexpected behaviors.
  • 8236: Issue with datetime(6) column using timestamp(6) range

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.07     1.16     0.6
groupby_scan             12.98    16.71    1.3
index_join               1.34     2.66     2.0
index_join_scan          1.27     2.14     1.7
index_scan               34.33    55.82    1.6
oltp_point_select        0.18     0.3      1.7
oltp_read_only           3.43     5.88     1.7
select_random_points     0.33     0.65     2.0
select_random_ranges     0.39     0.81     2.1
table_scan               34.95    55.82    1.6
types_table_scan         75.82    144.97   1.9
reads_mean_multiplier                      1.7

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       7.98     5.88     0.7
oltp_insert              3.75     2.97     0.8
oltp_read_write          8.43     11.87    1.4
oltp_update_index        3.82     2.97     0.8
oltp_update_non_index    3.89     2.91     0.7
oltp_write_only          5.37     6.09     1.1
types_delete_insert      7.7      6.43     0.8
writes_mean_multiplier                     0.9

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      99.52    39.06    2.5
tpcc_tps_multiplier                        2.5

Overall Mean Multiple                      1.70

dolt - 1.42.10

Published by github-actions[bot] 2 months ago

Merged PRs

dolt

go-mysql-server

  • 2625: Bug fix: the timestamp function should convert to a datetime type
    MySQL's timestamp function, despite its name, actually returns a datetime type and not a timestamp type.
    MySQL example:
    mysql -uroot --protocol TCP -e "select timestamp('1000-01-01 00:00:00');" --column-type-info
    Field   1:  `timestamp('1000-01-01 00:00:00')`
    Catalog:    `def`
    Database:   ``
    Table:      ``
    Org_table:  ``
    Type:       DATETIME
    Collation:  binary (63)
    Length:     19
    Max_length: 19
    Decimals:   0
    Flags:      BINARY
    +----------------------------------+
    | timestamp('1000-01-01 00:00:00') |
    +----------------------------------+
    | 1000-01-01 00:00:00              |
    +----------------------------------+
    
    Note: We still need to add support for the second, optional parameter to timestamp().
    Customer issue: https://github.com/dolthub/dolt/issues/8236
  • 2624: Use ctx.Done() as a faster check for ctx.Err()
    The err call is noticeable for queries that read a lot of rows.
  • 2623: Fix anti-join correctness bug
    We had some strange logic for accepting a join anti-match; we ripped it out, and everything seems to be working correctly now.
  • 2621: implement icu_version function
    MySQL Docs: https://dev.mysql.com/doc/refman/8.4/en/information-functions.html#function_icu-version
  • 2619: Assume text index comparisons are exact
    We currently do not eliminate filters of the form column(VARCHAR) = text literal (longtext) when pushing filters into index lookups. The safety check is necessary at least for datetimes, spatial/fulltext and partial TEXT indexes. It's not clear whether it is necessary for full varchar indexes.
    dolt side seems OK: https://github.com/dolthub/dolt/pull/8218
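
The idea behind PR 2624 above can be sketched with the standard library alone. This is a minimal illustration, not the go-mysql-server code: `isDone` and `scan` are hypothetical names, and the row loop is a stand-in for a real row iterator. The loop does a non-blocking receive on `ctx.Done()` for every row and calls `ctx.Err()` only once cancellation has been observed.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports whether ctx has been cancelled, using a non-blocking
// receive on ctx.Done() instead of calling ctx.Err().
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// scan is a hypothetical row loop. It cancels its own context when it
// reaches index cancelAfter (a negative value means "never cancel") and
// returns how many rows were processed before cancellation was seen.
func scan(rows []int, cancelAfter int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	n := 0
	for i := range rows {
		if i == cancelAfter {
			cancel()
		}
		if isDone(ctx) {
			// Err() is only consulted after Done() has fired.
			return n, ctx.Err()
		}
		n++
	}
	return n, nil
}

func main() {
	fmt.Println(scan([]int{1, 2, 3}, -1)) // all rows processed, nil error
	fmt.Println(scan([]int{1, 2, 3}, 1))  // stops early with a cancellation error
}
```

The per-row cost is then a single channel poll, which is the property the PR description relies on for queries that read many rows.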

Closed Issues

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.07     2.86     1.4
groupby_scan             13.22    17.01    1.3
index_join               1.34     2.66     2.0
index_join_scan          1.27     2.11     1.7
index_scan               34.33    53.85    1.6
oltp_point_select        0.18     0.3      1.7
oltp_read_only           3.49     5.88     1.7
select_random_points     0.34     0.65     1.9
select_random_ranges     0.39     0.83     2.1
table_scan               34.33    54.83    1.6
types_table_scan         74.46    142.39   1.9
reads_mean_multiplier                      1.7

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     5.99     0.7
oltp_insert              3.82     2.97     0.8
oltp_read_write          8.58     12.08    1.4
oltp_update_index        3.89     3.02     0.8
oltp_update_non_index    3.89     2.97     0.8
oltp_write_only          5.37     6.09     1.1
types_delete_insert      7.7      6.43     0.8
writes_mean_multiplier                     0.9

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      98.97    39.23    2.5
tpcc_tps_multiplier                        2.5

Overall Mean Multiple                      1.70

dolt - 1.42.9

Published by github-actions[bot] 2 months ago

Merged PRs

dolt

go-mysql-server

  • 2623: Fix anti-join correctness bug
    We had some strange logic for accepting a join anti-match; we ripped it out, and everything seems to be working correctly now.
  • 2620: implement name_const function
    MySQL docs: https://dev.mysql.com/doc/refman/8.4/en/miscellaneous-functions.html#function_name-const
  • 2618: More aggressively elide IN filters used for indexed lookups
    re: https://github.com/dolthub/dolt/pull/8215
  • 2617: More QueryProps, missed max1rowiter usage
  • 2613: Query properties rule filtering
    Edit most of the analyzer interfaces to pass a new context object that accumulates query specific properties. Currently the object is called QueryFlags, and accumulates information about the query to inform better rule filtering and more efficient spooling strategies.
    The change that has the biggest effect on oltp_point_select perf is the sql.QFlagMax1Row setting, which lets us skip the default results iter boilerplate when we're only returning one row. Added a couple other skips for rules that are easy to whitelist correctly and show prominently on CPU profiles, like aggregations and subqueries.
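
As a rough illustration of the rule-filtering idea in PR 2613, the sketch below accumulates per-query flags in a small bitset and uses them to skip analyzer rules that cannot apply. Every name here (`queryFlag`, `queryFlags`, `rule`, `selectRules`, and the flag constants) is hypothetical and chosen for this example; the real QueryFlags type in go-mysql-server may be shaped quite differently.

```go
package main

import "fmt"

// queryFlag identifies one property observed while building a query plan.
// These names are illustrative only, not the go-mysql-server identifiers.
type queryFlag uint

const (
	flagMax1Row queryFlag = iota
	flagAggregation
	flagSubquery
)

// queryFlags accumulates the properties seen for a single query.
type queryFlags struct{ bits uint64 }

func (f *queryFlags) Set(flag queryFlag) { f.bits |= uint64(1) << flag }

func (f *queryFlags) IsSet(flag queryFlag) bool {
	return f.bits&(uint64(1)<<flag) != 0
}

// rule is a hypothetical analyzer rule. A rule either always runs or runs
// only when the flag it depends on was set for this query.
type rule struct {
	name   string
	needs  queryFlag
	always bool
}

// selectRules returns the names of the rules worth running for this query.
func selectRules(rules []rule, flags *queryFlags) []string {
	var out []string
	for _, r := range rules {
		if r.always || flags.IsSet(r.needs) {
			out = append(out, r.name)
		}
	}
	return out
}

func main() {
	var flags queryFlags
	flags.Set(flagAggregation) // e.g. the query contained a GROUP BY

	rules := []rule{
		{name: "resolveColumns", always: true},
		{name: "planAggregations", needs: flagAggregation},
		{name: "planSubqueries", needs: flagSubquery},
	}
	fmt.Println(selectRules(rules, &flags))
}
```

Skipping a rule is only safe when the flag is set conservatively, which matches the description's point about choosing rules that are easy to whitelist correctly.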

vitess

  • 361: adding instant as non-reserved keyword
    The INSTANT keyword isn't in the MySQL docs, but it is a non-reserved keyword.
    MySQL Docs: https://dev.mysql.com/doc/refman/8.4/en/keywords.html
    fixes: https://github.com/dolthub/dolt/issues/8220
  • 360: Bug fix: Preserve sign for integers in prepared statements
    Bound integer values for prepared statements are parsed from the wire and packaged into int64 values that are then passed to the SQL engine to execute with the prepared statement. For int8, int16, int24, and int32 types, those bytes from the wire weren't getting cast to the correct type first, before they were cast to int64, which meant that if the sign bit was set, the value was interpreted incorrectly.
    Customer issue: https://github.com/dolthub/dolt/issues/8085
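
The sign-extension problem described in PR 360 can be reproduced in a few lines. This is a minimal sketch, not the vitess code: `widenBuggy` and `widenFixed` are hypothetical helpers, and the 4-byte little-endian layout is an assumption made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// widenBuggy widens the raw 32-bit pattern straight to int64, so the
// sign bit is treated as part of the magnitude.
func widenBuggy(wire []byte) int64 {
	return int64(binary.LittleEndian.Uint32(wire))
}

// widenFixed converts to the narrow signed type first, so the final
// conversion to int64 sign-extends.
func widenFixed(wire []byte) int64 {
	return int64(int32(binary.LittleEndian.Uint32(wire)))
}

func main() {
	// The 32-bit two's-complement encoding of -1.
	wire := []byte{0xFF, 0xFF, 0xFF, 0xFF}

	fmt.Println(widenBuggy(wire)) // 4294967295
	fmt.Println(widenFixed(wire)) // -1
}
```

The same pattern applies to the narrower types: convert to int8 or int16 before widening, so that negative values keep their sign.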

Closed Issues

  • 8220: Correctness Issue, queries run differently on MySQL and Dolt
  • 8206: Unknown JSON struct tag in schema_marshaling.go
  • 8167: Scheduled events fail to execute after a server restart
  • 8114: dolt_status table unhelpful during merge when there are constraint violations.

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.07     3.02     1.5
groupby_scan             13.7     17.63    1.3
index_join               1.37     2.66     1.9
index_join_scan          1.3      2.18     1.7
index_scan               34.95    54.83    1.6
oltp_point_select        0.18     0.3      1.7
oltp_read_only           3.49     5.99     1.7
select_random_points     0.34     0.65     1.9
select_random_ranges     0.39     0.83     2.1
table_scan               34.95    55.82    1.6
types_table_scan         75.82    144.97   1.9
reads_mean_multiplier                      1.7

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     5.99     0.7
oltp_insert              3.82     3.02     0.8
oltp_read_write          8.58     12.08    1.4
oltp_update_index        3.89     3.02     0.8
oltp_update_non_index    3.89     2.97     0.8
oltp_write_only          5.47     6.21     1.1
types_delete_insert      7.7      6.55     0.9
writes_mean_multiplier                     0.9

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      98.12    38.64    2.5
tpcc_tps_multiplier                        2.5

Overall Mean Multiple                      1.70

dolt - 1.42.8

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8211: Bug fix: binlog replication: logging levels, use MySQL's older date serialization format
    Small bug fixes for binlog replication while testing with the python-mysql-replication library:
    • Adjusting DEBUG logging to be less verbose
    • Using MySQL's older date serialization format for compatibility
    • Adding a nil check for gtidPosition
  • 8207: removing unconditional recursive calls
    fixes:
  • 8201: chore: update lib:eventsapi 2022-> 2024
    Ran into issues when I ran:
    go install github.com/dolthub/dolt/go/cmd/dolt@main
    # github.com/dolthub/dolt/go/cmd/dolt/commands
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/cherry-pick.go:79:19: undefined: eventsapi.ClientEventType_CHERRY_PICK
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/dump.go:112:19: undefined: eventsapi.ClientEventType_DUMP
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/filter-branch.go:95:19: undefined: eventsapi.ClientEventType_FILTER_BRANCH
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/gc.go:76:19: undefined: eventsapi.ClientEventType_GARBAGE_COLLECTION
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/migrate.go:74:19: undefined: eventsapi.ClientEventType_MIGRATE
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/profile.go:89:19: undefined: eventsapi.ClientEventType_PROFILE
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/rebase.go:74:19: undefined: eventsapi.ClientEventType_REBASE
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/reflog.go:63:19: undefined: eventsapi.ClientEventType_REFLOG
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/show.go:72:19: undefined: eventsapi.ClientEventType_SHOW
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/tag.go:81:19: undefined: eventsapi.ClientEventType_TAG
    /go/pkg/mod/github.com/dolthub/dolt/[email protected]/cmd/dolt/commands/tag.go:81:19: too many errors
    
    I realized the eventsapi library version in go.mod is outdated.
  • 8187: Feature: Auto-start binlog replication on server restart
    Once replication has been started with START REPLICA;, replication now automatically restarts when the server is restarted. If replication is stopped with STOP REPLICA;, it is not restarted automatically on server restart. This matches MySQL's behavior.
    This PR also includes a few other small improvements to binlog replication:
    • allows 'show replica status' before replication is started
    • moves logging of binlog messages from DEBUG to TRACE level
    • removes several references to GetRunningServer() global function
    Resolves https://github.com/dolthub/dolt/issues/8168
  • 8184: [dtables] fix from_commit index behavior
  • 8182: go/libraries/doltcore/env/actions: Improve error message when a clone fails mid-download.
    Fixes a dropped error return in the clone implementation.
  • 8181: Bug fix: resolving case-sensitive table name for TableEditor
    A MySQL primary server may send table names in TableMap binlog events that do not match the case of the table name. We use a case-insensitive table name lookup when getting the table's schema, but not when creating the TableWriter that applies row updates/inserts/deletes. This change fixes that so that mixed case table names can replicate correctly.
  • 8177: Bump fast-xml-parser and @aws-sdk/client-ses in /.github/actions/ses-email-action
    Bumps fast-xml-parser to 4.4.1 and updates ancestor dependency @aws-sdk/client-ses. These dependencies need to be updated together.
    Updates fast-xml-parser from 4.2.5 to 4.4.1
  • 8163: chore: init go workspace
    • migrate local replace directives to go.work as per go1.18+ standards
  • 8161: Liuliu/log-graph
    command: dolt log --graph
    Steps for Drawing the Commit Graph
    1. Calculate the Positions of Each Commit:
      • Row Position: Determined by the commit's topological order, initialized using the index in the list. The vertical spacing between commits is adjusted based on the commit message's height in the expandGraph function.
      • Column Position: Calculated based on the positions of child commits, implemented in the computeColumnEnds function:
        • No Children: Places the commit in a new column (furthest right) if it has no child commits, indicating the start of a new branch.
        • Branch Children: If a commit has branch children, it is positioned in the same column as its leftmost branch child.
        • Merge Children: For commits with merge children, diagonal lines are used to connect to these children, positioning the commit in an available column starting from its rightmost child.
    2. Render the Graph:
      • Place a * at each commit's calculated position to mark it.
      • Connect commits using vertical (|) and diagonal (\ or /) lines to represent branches and merges.
    An example of the graph of us-jails:
    [Screenshot: dolt log --graph output for us-jails]

go-mysql-server

  • 2616: fix insert id
    The logic setting the InsertID in OkResult did not match the results returned from last_insert_id().
    This was made apparent due to changes from https://github.com/dolthub/go-mysql-server/pull/2614.
    For a single insert statement, MySQL sets InsertID exactly once when the AutoIncrement on the column is first triggered.
    While the linked PR fixes that issue and properly sets the session variable, our insertRowHandler (which is responsible for returning OkResult structs) was setting InsertID incorrectly.
    The fix is to just read the LastInsertID from the session, since it is already set to the right value.
  • 2615: faster status updates
    System variables can be session, global, or both. sql.IncrementStatusVariable is a helper method that primarily helps the "both" category increment the global and session counters for certain variables. Threads_running is a global only variable that is incremented/decremented every begin/end query, and gets a lot of traffic. The old code used sql.IncrementStatusVariable to increment Threads_running, which was a particularly expensive way to increment a global var because (1) we'd make a new error for every call to the session updater, and (2) the extra map lookup is unnecessary. We don't do the extra map lookup now, and we weren't using the error return so I removed the return variable.
    Note: this also refactors status variables to be explicitly initialized in the engine.
    bump/perf here: https://github.com/dolthub/dolt/pull/8189
  • 2614: update LAST_INSERT_ID when auto incrementing from empty, NULL, and DEFAULT
    Our logic for determining whether or not we needed to update last insert id only looked at the insertSource schema.
    This does not take into consideration empty, NULL or DEFAULT values.
    Additionally, the value that last insert id is set to depends on what the auto increment value will be.
    This PR addresses those issues.
    It also includes some refactoring for readability.
    fixes: https://github.com/dolthub/dolt/issues/7565
  • 2613: Query properties rule filtering
    Edit most of the analyzer interfaces to pass a new context object that accumulates query specific properties. Currently the object is called QueryFlags, and accumulates information about the query to inform better rule filtering and more efficient spooling strategies.
    The change that has the biggest effect on oltp_point_select perf is the sql.QFlagMax1Row setting, which lets us skip the default results iter boilerplate when we're only returning one row. Added a couple other skips for rules that are easy to whitelist correctly and show prominently on CPU profiles, like aggregations and subqueries.
  • 2609: fix output type for DateAdd() and DateSub() functions
    The output type of DateAdd(), AddDate(), DateSub(), and SubDate() changes depending on whether the input is a properly formatted string or a date/datetime/time/timestamp.
    fixes: https://github.com/dolthub/dolt/issues/7304
  • 2604: Index searchable edits
    We previously added support for integrators choosing their own indexes with an sql.IndexSearchable interface. This was for a customer use case. This PR expands the interface to let Dolt cache information about strict key lookups.
    The motivation is that (1) strict key lookups will always be the best-case scenario result of index costing, (2) caching this information in-between ALTER statements is usually a long enough lifecycle for the overhead to be worth it.
    I added a streamlined range builder as part of this optimization that only accepts literal values in the order expected by the target lookup. The user of this range builder takes responsibility for feeding the values in the correct order. As a result, we sidestep expensive string formatting, map creation, and map lookups during range building.
    Follow-on fixes to functional dependencies permuted plans a bit more. Inner joins are chosen more frequently in some of our test plans now that we're reflecting strict key max-1-row cardinalities.
  • 2601: Fixed error in converting panics to errors

Closed Issues

  • 8206: Unknown JSON struct tag in schema_marshaling.go
  • 8202: Unconditional recursive call in prolly_index_writer_keyless.go
  • 8203: Identical blocks for then and else branches of conditional in typeinfo_test.go
  • 8204: Unconditional recursive call in cs_metrics_wrapper.go
  • 8205: Unconditional recursive call in indexed_dolt_table.go
  • 8168: Support restarting replication on server restart
  • 8185: chore: initialize go workspaces
  • 7565: Dolt doesn't update last_insert_id() when DEFAULT is used
  • 8022: Setting the delimiter to begin with / causes the dolt shell to incorrectly interpret lines as slash commands.
  • 8173: prisma migrate dev, generating changes to columns of the date type in loop
  • 8085: CJException: 3878257648 out of range for int
  • 7304: Date functions should omit 00:00:00 time component when dealing with string arguments

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.07     2.91     1.4
groupby_scan             13.46    16.71    1.2
index_join               1.39     2.61     1.9
index_join_scan          1.3      2.11     1.6
index_scan               34.33    52.89    1.5
oltp_point_select        0.18     0.39     2.2
oltp_read_only           3.43     6.79     2.0
select_random_points     0.33     0.7      2.1
select_random_ranges     0.39     0.81     2.1
table_scan               34.33    53.85    1.6
types_table_scan         74.46    139.85   1.9
reads_mean_multiplier                      1.8

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     5.88     0.7
oltp_insert              3.82     2.97     0.8
oltp_read_write          8.58     12.52    1.5
oltp_update_index        3.89     2.97     0.8
oltp_update_non_index    3.89     2.91     0.7
oltp_write_only          5.37     5.99     1.1
types_delete_insert      7.7      6.43     0.8
writes_mean_multiplier                     0.9

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      98.89    38.07    2.6
tpcc_tps_multiplier                        2.6

Overall Mean Multiple                      1.77

dolt - 1.42.7

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8174: check for pg_catalog table when resolving a table
    Fixes:
    https://github.com/dolthub/doltgresql/issues/442
    https://github.com/dolthub/doltgresql/issues/513
  • 8172: Reveal dolt archive command
    Move the dolt admin archive command to dolt archive
  • 8164: archive guard rails
    Add features to ensure that archive files are not used when running a remotesrv or when pushing backups.
    Also provide the --revert flag which can return a database back to the classic format.
  • 8137: Dolt index searchable, lookup caching
    Two main changes:
    • Cache schema hashes on root values, so that we don't have to do a table lookup to get a schema hash.
    • Implement sql.IndexSearchable in a way that we cache strict key lookups for a given table schema. The lifecycle for these objects is the span between ALTER TABLE statements.

go-mysql-server

  • 2610: use datetime precision in information_schema.columns.datetime_precision
    When determining if a schema change occurred, one of the tables Prisma looks at is information_schema.columns.
    Here, we incorrectly marked all datetime and timestamp columns as having a precision of 6.
    If a column has type DATETIME(3), Prisma would think there was a schema change, and perform a migration when one isn't needed.
    This PR addresses this issue by having the information_schema.columns table accurately reflect the datetime precision of the columns.
    fixes: https://github.com/dolthub/dolt/issues/8173
  • 2604: Index searchable edits
    We previously added support for integrators choosing their own indexes with an sql.IndexSearchable interface. This was for a customer use case. This PR expands the interface to let Dolt cache information about strict key lookups.
    The motivation is that (1) strict key lookups will always be the best-case scenario result of index costing, (2) caching this information in-between ALTER statements is usually a long enough lifecycle for the overhead to be worth it.
    I added a streamlined range builder as part of this optimization that only accepts literal values in the order expected by the target lookup. The user of this range builder takes responsibility for feeding the values in the correct order. As a result, we sidestep expensive string formatting, map creation, and map lookups during range building.
    Follow-on fixes to functional dependencies permuted plans a bit more. Inner joins are chosen more frequently in some of our test plans now that we're reflecting strict key max-1-row cardinalities.

vitess

  • 360: Bug fix: Preserve sign for integers in prepared statements
    Bound integer values for prepared statements are parsed from the wire and packaged into int64 values that are then passed to the SQL engine to execute with the prepared statement. For int8, int16, int24, and int32 types, those bytes from the wire weren't getting cast to the correct type first, before they were cast to int64, which meant that if the sign bit was set, the value was interpreted incorrectly.
    Customer issue: https://github.com/dolthub/dolt/issues/8085
  • 359: fix detection of multi-statements in ComPrepare
    Currently, preparing multi-statements is not supported; so we can't prepare a query like select ?; select ?;.
    However, the check for this condition just looked for any characters after the first ;, which meant that queries like select ?; \n would incorrectly throw an error.
    This was made apparent using the Prisma ORM, which runs the query:
    SELECT TABLE_NAME AS view_name, VIEW_DEFINITION AS view_sql
    FROM INFORMATION_SCHEMA.VIEWS
    WHERE TABLE_SCHEMA = ?;
    
    The above query ends in a newline character.
    The fix is to use SplitStatementToPieces(), which trims these white space characters, and check if there's exactly one piece; this was taken from the vitessio repo: https://github.com/vitessio/vitess/blob/main/go/mysql/conn.go#L1204
    fixes https://github.com/dolthub/dolt/issues/8157
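
The difference between the old and new checks in PR 359 can be sketched as follows. This is a simplified illustration under stated assumptions: `naiveIsMulti`, `splitPieces`, and `isSingleStatement` are hypothetical names, and `splitPieces` is only a stand-in for SplitStatementToPieces() that ignores semicolons inside string literals and comments.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveIsMulti mimics the old check: any character after the first ';'
// is taken to mean a second statement, even if it is only whitespace.
func naiveIsMulti(query string) bool {
	i := strings.Index(query, ";")
	return i >= 0 && i < len(query)-1
}

// splitPieces is a simplified stand-in for a real statement splitter.
// It splits on ';' and drops pieces that are empty after trimming
// whitespace. It does NOT handle ';' inside literals or comments.
func splitPieces(query string) []string {
	var pieces []string
	for _, p := range strings.Split(query, ";") {
		if t := strings.TrimSpace(p); t != "" {
			pieces = append(pieces, t)
		}
	}
	return pieces
}

// isSingleStatement reports whether the query holds exactly one statement.
func isSingleStatement(query string) bool {
	return len(splitPieces(query)) == 1
}

func main() {
	q := "select ?; \n"
	fmt.Println(naiveIsMulti(q))      // true: trailing whitespace is misread
	fmt.Println(isSingleStatement(q)) // true: one statement after trimming
	fmt.Println(isSingleStatement("select ?; select ?;")) // false
}
```

A production splitter also has to respect quoting and comments, which is why the PR reuses an existing parser helper rather than splitting on the raw character.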

Closed Issues

  • 4127: Support for date and time literals
  • 8173: prisma migrate dev, generating changes to columns of the date type in loop
  • 7200: Weird behavior for datetime literal syntax.
  • 8157: Error when using Prisma ORM: "Error: unknown error: can not prepare multiple statements"

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.11     3.02     1.4
groupby_scan             13.22    17.01    1.3
index_join               1.34     2.81     2.1
index_join_scan          1.27     2.22     1.7
index_scan               38.25    54.83    1.4
oltp_point_select        0.18     0.44     2.4
oltp_read_only           3.49     7.56     2.2
select_random_points     0.34     0.78     2.3
select_random_ranges     0.39     0.92     2.4
table_scan               38.94    55.82    1.4
types_table_scan         80.03    144.97   1.8
reads_mean_multiplier                      1.9

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     6.09     0.7
oltp_insert              3.82     3.02     0.8
oltp_read_write          8.58     13.7     1.6
oltp_update_index        3.89     3.07     0.8
oltp_update_non_index    3.89     3.02     0.8
oltp_write_only          5.47     6.32     1.2
types_delete_insert      7.7      6.67     0.9
writes_mean_multiplier                     1.0

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      99.65    36.43    2.7
tpcc_tps_multiplier                        2.7

Overall Mean Multiple                      1.87

dolt - 1.42.6

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8166: [kvexec] Fix panic in non-covering strict lookup
    I missed a test case in nonCovLaxSecondaryLookupGen. I tried to add more tests surrounding the specific panic query here: https://github.com/dolthub/go-mysql-server/pull/2607.
    Also rename the rowexec to kvexec.
  • 8139: Feature: Log binlog events to disk
    Major changes:
    • Moved binlogging initialization code from sqlengine to server, so that binlogging only happens when in sql-server mode.
    • binlogProducer now sends events to the new logManager type that writes binlog events to files on disk. binlogStreamer now reads events from those logs and streams them to replicas (instead of receiving events directly from binlogProducer).
    • DoltBinlogPrimaryController now validates that the missing GTIDs from a replica are available in the binlog files and sends an error if the primary doesn't have enough binlog data on disk to get a replica in sync.

go-mysql-server

  • 2606: [memo] assume self-join stats cardinality continuity
    Self-join stats estimation is particularly expensive because all of the buckets exactly overlap. If the index is unique, the cardinality distribution will not change. If the index is non-unique, the cardinality will expand proportional to rowCount/distinctCount.
    before
    BenchmarkOltpJoinScan-12    	    1766	    694524 ns/op	  462834 B/op	    8240 allocs/op
    after
    BenchmarkOltpJoinScan-12    	    2460	    481166 ns/op	  193569 B/op	    7129 allocs/op
    
    sysbench perf here: https://github.com/dolthub/dolt/pull/8159
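
The shortcut described in PR 2606 can be written as a closed-form estimate instead of a bucket-by-bucket histogram join. The sketch below is an illustration only: `selfJoinCardinality` is a hypothetical name, and it assumes rows are spread evenly over the distinct keys, which the real estimator may not assume.

```go
package main

import "fmt"

// selfJoinCardinality estimates the output row count of an equality
// self-join on one index. Each row matches rowCount/distinctCount rows
// on the other side, assuming keys are evenly distributed.
func selfJoinCardinality(rowCount, distinctCount float64) float64 {
	if distinctCount <= 0 {
		return 0
	}
	return rowCount * (rowCount / distinctCount)
}

func main() {
	// Unique index: every key appears once, so cardinality is unchanged.
	fmt.Println(selfJoinCardinality(1000, 1000)) // 1000

	// Non-unique index: 100 distinct keys over 1000 rows, so each row
	// matches 10 rows and the output expands tenfold.
	fmt.Println(selfJoinCardinality(1000, 100)) // 10000
}
```

Because the estimate needs only two numbers per index, it avoids walking every overlapping bucket pair, which is where the description says the cost was going.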

vitess

  • 359: fix detection of multi-statements in ComPrepare
    Currently, preparing multi-statements is not supported; so we can't prepare a query like select ?; select ?;.
    However, the check for this condition just looked for any characters after the first ;, which meant that queries like select ?; \n would incorrectly throw an error.
    This was made apparent using the Prisma ORM, which runs the query:
    SELECT TABLE_NAME AS view_name, VIEW_DEFINITION AS view_sql
    FROM INFORMATION_SCHEMA.VIEWS
    WHERE TABLE_SCHEMA = ?;
    
    The above query ends in a newline character.
    The fix is to use SplitStatementToPieces(), which trims these white space characters, and check if there's exactly one piece; this was taken from the vitessio repo: https://github.com/vitessio/vitess/blob/main/go/mysql/conn.go#L1204
    fixes https://github.com/dolthub/dolt/issues/8157
  • 358: Feature: parser support for PURGE BINARY LOGS syntax
    https://dev.mysql.com/doc/refman/8.4/en/purge-binary-logs.html
  • 357: Bug fix: Send an error response when the server fails to handle COM_BINLOG_DUMP_GTID
    A MySQL primary needs to be able to send back an error response when handling the COM_BINLOG_DUMP_GTID command. Previously, when the integrator returned an error, it was logged in the primary server logs, but it was not being sent back to the replica who sent the command. This change causes an error packet to be sent to the replica, containing the details of the error the integrator returned.
    This change is difficult to test in isolation, but I have tests in dolt that will exercise this codepath.

Closed Issues

  • 8157: Error when using Prisma ORM: "Error: unknown error: can not prepare multiple statements"
  • 6816: subquery insert into not null column throws error

dolt - 1.42.5

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8162: Better error message when attempting to push from a shallow clone
    Pushing from a shallow clone is possible, but we should report a clearer message when it's not possible. To do this, we need the Generational Chunk Store to return a custom error.
    Fixes: https://github.com/dolthub/dolt/issues/8156
  • 8158: fix no-op merge msg in cli
    Fixes: https://github.com/dolthub/dolt/issues/8148
  • 8154: When removing from a secondary index during merging, use the pre-merge ordinal mapping, not the merged mapping, to identify the secondary key to be removed.
    Basically, we have a bug where dropping a column on the remote side of a merge can interfere with updating secondary indexes.
    Example:
    Base Schema: (pk INT PRIMARY KEY, a TINYINT, b INT, UNIQUE KEY b_idx (b))
    "theirs" drops column a
    Merged Schema: (pk INT PRIMARY KEY, b INT, UNIQUE KEY b_idx (b))
    In effect, the merger would see that in the final table, b is the second column. Then, when updating b_idx for each resolved row, it would use the second column of "ours" to find the index entry to remove. But this is incorrect, because the second column of "ours" is a.
    If the user is lucky, these two columns will be different sizes and the merger will panic. But if the two columns are the same size, merge proceeds with an incorrect value. This will cause it to either fail to remove the old row from the secondary index, or remove a different row. Either way, the secondary index is now incorrect.
  • 8150: fix warnings in dolt sql shell
    This PR fixes a bug where warnings were incorrectly being suppressed in the dolt sql shell.
    The bug is caused by the shell making queries to check if the working set is dirty, what database we're on, and what branch we're on. The fix is to set a special flag in the session to not clear the warnings for those specific queries.
    companion pr: https://github.com/dolthub/go-mysql-server/pull/2605
    fixes: https://github.com/dolthub/dolt/issues/8016
  • 8124: [stats] limit stats bootstrap to server start
    fixes: https://github.com/dolthub/dolt/issues/8123
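
The ordinal-mapping bug described in PR 8154 above is easy to see with the schemas from its example. This is a toy model, not Dolt's merge code: `ordinal` and `indexKey` are hypothetical helpers, and rows are plain integer slices.

```go
package main

import "fmt"

// ordinal returns the position of col in schema, or -1 if it is absent.
func ordinal(schema []string, col string) int {
	for i, c := range schema {
		if c == col {
			return i
		}
	}
	return -1
}

// indexKey reads the value of col from row, locating the column with the
// supplied schema. Passing the wrong schema reads the wrong position.
func indexKey(row []int, schema []string, col string) int {
	return row[ordinal(schema, col)]
}

func main() {
	oursSchema := []string{"pk", "a", "b"} // pre-merge schema on "ours"
	mergedSchema := []string{"pk", "b"}    // "theirs" dropped column a
	oursRow := []int{1, 7, 42}             // pk=1, a=7, b=42

	// Merged mapping: b is at position 1, which is column a in "ours".
	fmt.Println(indexKey(oursRow, mergedSchema, "b")) // 7

	// Pre-merge mapping: b is at position 2, the value actually indexed.
	fmt.Println(indexKey(oursRow, oursSchema, "b")) // 42
}
```

With the merged mapping, the entry removed from b_idx would be keyed on 7 rather than 42, which is the stale-index outcome the PR describes.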

go-mysql-server

  • 2606: [memo] assume self-join stats cardinality continuity
    Self-join stats estimation is particularly expensive because all of the buckets exactly overlap. If the index is unique, the cardinality distribution will not change. If the index is non-unique, the cardinality will expand proportional to rowCount/distinctCount.
    before
    BenchmarkOltpJoinScan-12    	    1766	    694524 ns/op	  462834 B/op	    8240 allocs/op
    after
    BenchmarkOltpJoinScan-12    	    2460	    481166 ns/op	  193569 B/op	    7129 allocs/op
    
    sysbench perf here: https://github.com/dolthub/dolt/pull/8159
  • 2605: add lock to prevent warnings from being cleared
    This PR adds two functions to BaseSession that toggle a boolean, so integrators can prevent warnings from being cleared.
    This is mostly useful for dolt sql shell.
    addresses https://github.com/dolthub/dolt/issues/8016

Closed Issues

  • 8123: stats refresh warning a little too zealous
  • 8156: Unknown push Error When Pushing Large DB to Remote
  • 8016: dolt sql suppresses warnings.
  • 8148: Ghost Commit Error On Empty Merge
  • 7638: Syntax Error Occurs When Using AS Clause with ON DUPLICATE KEY UPDATE

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.11     2.97     1.4
groupby_scan             13.46    17.32    1.3
index_join               1.37     2.81     2.1
index_join_scan          1.3      2.22     1.7
index_scan               34.33    53.85    1.6
oltp_point_select        0.18     0.46     2.6
oltp_read_only           3.49     7.7      2.2
select_random_points     0.34     0.77     2.3
select_random_ranges     0.39     0.89     2.3
table_scan               34.33    54.83    1.6
types_table_scan         74.46    142.39   1.9
reads_mean_multiplier                      1.9

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     6.09     0.7
oltp_insert              3.82     3.02     0.8
oltp_read_write          8.58     13.95    1.6
oltp_update_index        3.82     3.07     0.8
oltp_update_non_index    3.89     3.02     0.8
oltp_write_only          5.37     6.43     1.2
types_delete_insert      7.7      6.67     0.9
writes_mean_multiplier                     1.0

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      99.43    34.46    2.9
tpcc_tps_multiplier                        2.9

Overall Mean Multiple                      1.93

dolt - 1.42.4 Latest Release

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8155: Bug fix: loading persisted @@server_id from config.json
    This fixes binlog replicas loading @@server_id values persisted in config.json so that they are properly converted from strings to the right type.
    Also added extra connection error logging.
  • 8153: make tags case insensitive
    This PR makes it so tags are case-insensitive.
    Additionally, fixes a display error in the sql shell.
    fixes https://github.com/dolthub/dolt/issues/8147
  • 8152: Revive ability to read really old dolt_schema tables
    We have several old databases which we don't touch but are public on DoltHub. These databases present an error to users with the removal of the migration ability in 1.42.0. This change restores the ability to read them.
  • 8135: Build Archives more robustly
    This change is long in coming. Several findings from building a large user database are incorporated here. Specifically:
    • No longer storing all chunks in memory
    • Cancellable process with ^C
    • Progress reporting for major stages of the build
    • No chunk grouping by default, --group-chunks to enable. Pathological cgo dictionary building problem needs to be fixed before we enable by default.
    • Writing a useful metadata json block with dolt version and origin table file ID.

go-mysql-server

  • 2605: add lock to prevent warnings from being cleared
    This PR adds two functions to BaseSession that toggle a boolean, so integrators can prevent warnings from being cleared.
    This is mostly useful for dolt sql shell.
    addresses https://github.com/dolthub/dolt/issues/8016
  • 2603: [memo] reorder should add new plans to intermediate expr join child
    There was a bug where we'd add reordered join plans to project or distinct nodes, rather than their join children. Code comment explains more clearly how this works.
  • 2602: disallow forward slash in database name
    fixes https://github.com/dolthub/dolt/issues/8126
  • 2600: Support information_schema.columns hook for doltgres
  • 2593: custom row exec
    Additions for custom row operators on Dolt side: https://github.com/dolthub/dolt/pull/8072

Closed Issues

  • 8147: Tag releases: database not found
dolt - 1.42.3

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8144: Don't panic when performing a GC on a shallow clone
  • 8143: Add support for visualizing prolly tree and blob messages in dolt show
    This PR does a couple things:
    1. dolt show #address can now display the internals of a ProllyTreeMap message (typically used for storing indexes). Previously, only splunk/noms show could do this.
    2. Both dolt show and splunk can now display the contents of a Blob message.
    3. If a ProllyTreeMap leaf node contains a value that is itself an address (example: the value of text and json columns), that value is shown as a human readable address, which can be fed back into dolt show or splunk to explore the whole tree.
  • 8141: support STAGED as commit hash
    This PR adds support for STAGED as a commit_hash when filtering dolt diff system tables.
    fixes https://github.com/dolthub/dolt/issues/7978
  • 8140: Continue to support writes when archives are in play
    An error in hasMany effectively prevented writes after moving a database to archives.
  • 8084: enable key range iter
    There was an issue merging https://github.com/dolthub/dolt/pull/8025 on top of a revert. This PR enables the key iteration optimization.
  • 8072: [rowexec] dolt-side lookup execution operator
    This PR adds custom Dolt execution operators for lookup joins. When building an execution plan, we try to replace joinIter with a Dolt equivalent that inlines the key building and map get. This is a lot faster than repeatedly building the secondary iterator and materializing sql.Rows in-between lookups.
    The main downside is that this PR hoists filters in join children to after materializing lookup join rows.
    This brings index_join from 5.18 ms/query to 2.64 ms/query, which will be about 2.0x MySQL's latency.
    This PR falls short of some aspirational goals:
    • We hoist table filters until after the final join row is built because we don't have a way to call scalar expressions on val.Tuple yet. There are edge case queries that might be dramatically slower because of this. To fix this, we would need to convert sql.Expression filters into a format that we could execute on val.Tuple KV pairs.
    • We do not yet try to optimize consecutive lookup joins. I'm not sure if a materialization block would be better represented iteratively or recursively beyond a simple string of lookups. There are a lot of interfaces and indexing considerations to think about there.
    Safety comments:
    • We fall back to GMS when lookup source/dest keys are not prolly.Encoding compatible.
    • The source iterators are the same as what we used before, but without projection mapping to sql.Rows. The keyless iterator required a change to return duplicate rows at the KV layer (vs. the SQL layer).
    • The secondary iterators are a generalization of what we currently use, but return KV pairs instead of rows.
    • Projection mapping is the same but generalized to merge an arbitrary list of KV pairs after the join.
    There are extra tests here: https://github.com/dolthub/go-mysql-server/pull/2593
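
To make the shape of the lookup operator in PR 8072 easier to picture, here is a minimal Python sketch of the idea it describes: for each source KV pair, build the lookup key inline, do a direct map get against the secondary, and only then merge the pairs into an output row, with any filter applied after the row is built. This is an illustration and not Dolt's implementation. The function name, the parameters, the use of plain tuples in place of val.Tuple, and the use of a dict with unique keys as the secondary are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for a (key, value) pair of val.Tuple values.
KV = Tuple[tuple, tuple]


def lookup_join(
    source: Iterable[KV],
    secondary: Dict[tuple, tuple],
    key_positions: Tuple[int, ...],
    row_filter: Optional[Callable[[tuple], bool]] = None,
) -> Iterator[tuple]:
    """Inner lookup join over KV pairs (illustrative sketch only).

    key_positions selects which source fields form the lookup key.
    row_filter, if given, runs only after the joined row is built,
    mirroring the filter hoisting described in the PR notes.
    """
    for src_key, src_val in source:
        fields = src_key + src_val
        # Build the lookup key inline instead of materializing a row first.
        lookup_key = tuple(fields[i] for i in key_positions)
        # Direct map get; assumes at most one match per key.
        match = secondary.get(lookup_key)
        if match is None:
            continue  # inner join: unmatched source rows are dropped
        # Merge the KV pairs into the output row only after the lookup.
        row = fields + match
        if row_filter is None or row_filter(row):
            yield row
```

With a source of order rows keyed by order id and a secondary keyed by customer id, `lookup_join(orders, customers, key_positions=(1,))` yields one merged row per order that has a matching customer. Running the filter last keeps the sketch simple, and it is also why a selective filter on a join child can be slower here than if it ran before the lookup.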

go-mysql-server

Closed Issues

  • 8126: CREATE DATABASE statements should not create databases hierarchicaly
  • 7698: Incorrect return type for UNIX_TIMESTAMP
  • 7978: Should be able to use "STAGED" as commits in dolt_commit_diff_[tablename]
  • 8131: Docker Mysql Server Resets When Connecting Immediately After Launch
dolt - 1.42.2

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

  • 8132: Detect reading from non-indexed JSON documents and store it in a LazyJSONDocument instead of an IndexedJSONDocument.
    This adds tests that we properly read JSON documents written before the migration to the new indexed JSON storage format. When loading such documents from storage, we should represent them as a LazyJSONDocument instead of an IndexedJSONDocument. This ensures we don't accidentally use the optimized versions of the JSON functions when there aren't any keys in the prolly tree to guide the optimization.

go-mysql-server

  • 2596: Fix information_schema.columns for databases with schemas
    Missed this table in my original PR
  • 2595: Adding user name and host length validation to CREATE USER
    This change matches MySQL's behavior of limiting user names to 32 chars and host names to 255 chars. Attempting to create a user with a name or host longer than that limit now returns a validation error.
    Customer issue: https://github.com/dolthub/dolt/issues/8120
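
The length rule from PR 2595 can be sketched in a few lines of Python. This only illustrates the limits stated above (32 characters for the user name, 255 for the host). The function name, the constant names, and the error messages are hypothetical and are not taken from go-mysql-server.

```python
# Limits as stated in the PR description above.
MAX_USER_NAME_CHARS = 32
MAX_HOST_NAME_CHARS = 255


def validate_create_user(user: str, host: str = "%") -> None:
    """Raise ValueError if the user or host name exceeds its limit.

    Hypothetical helper for illustration; the real check lives in the
    server's CREATE USER handling and reports its own error.
    """
    if len(user) > MAX_USER_NAME_CHARS:
        raise ValueError(
            f"user name is {len(user)} characters; limit is {MAX_USER_NAME_CHARS}"
        )
    if len(host) > MAX_HOST_NAME_CHARS:
        raise ValueError(
            f"host name is {len(host)} characters; limit is {MAX_HOST_NAME_CHARS}"
        )
```

A name of exactly 32 characters passes and a name of 33 characters is rejected. Rejecting at creation time avoids the situation in the linked issue, where the user was created and a later SELECT failed.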

Performance

Read Tests               MySQL    Dolt     Multiple
covering_index_scan      2.07     3.02     1.5
groupby_scan             13.22    17.32    1.3
index_join               1.37     5.37     3.9
index_join_scan          1.27     2.57     2.0
index_scan               34.33    54.83    1.6
oltp_point_select        0.18     0.46     2.6
oltp_read_only           3.49     7.7      2.2
select_random_points     0.34     0.75     2.2
select_random_ranges     0.39     0.9      2.3
table_scan               34.33    56.84    1.7
types_table_scan         74.46    144.97   1.9
reads_mean_multiplier                      2.1

Write Tests              MySQL    Dolt     Multiple
oltp_delete_insert       8.13     6.09     0.7
oltp_insert              3.82     3.02     0.8
oltp_read_write          8.58     13.95    1.6
oltp_update_index        3.89     3.07     0.8
oltp_update_non_index    3.89     3.02     0.8
oltp_write_only          5.37     6.32     1.2
types_delete_insert      7.7      6.67     0.9
writes_mean_multiplier                     1.0

TPC-C TPS Tests          MySQL    Dolt     Multiple
tpcc-scale-factor-1      99.02    35.06    2.8
tpcc_tps_multiplier                        2.8

Overall Mean Multiple                      1.97
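
The Multiple column in this table can be reproduced with a short Python sketch. It rests on assumptions inferred from the figures themselves rather than from any stated formula: latency rows are taken as Dolt divided by MySQL, the TPS row as MySQL divided by Dolt (so a value above 1.0 means Dolt is slower in both cases), and the overall figure as the mean of the three category multipliers. The function names and the rounding steps are likewise assumptions.

```python
from statistics import mean


def latency_multiple(mysql_latency: float, dolt_latency: float) -> float:
    """Latency rows: Dolt over MySQL, rounded to one decimal (assumed)."""
    return round(dolt_latency / mysql_latency, 1)


def tps_multiple(mysql_tps: float, dolt_tps: float) -> float:
    """Throughput rows: MySQL over Dolt, rounded to one decimal (assumed)."""
    return round(mysql_tps / dolt_tps, 1)


def overall_mean_multiple(reads: float, writes: float, tpcc: float) -> float:
    """Mean of the three category multipliers, rounded to two decimals (assumed)."""
    return round(mean([reads, writes, tpcc]), 2)
```

Under these assumptions, `latency_multiple(2.07, 3.02)` gives the 1.5 shown for covering_index_scan, `tps_multiple(99.02, 35.06)` gives 2.8, and `overall_mean_multiple(2.1, 1.0, 2.8)` gives 1.97, matching the rows above.
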
dolt - 1.42.1

Published by github-actions[bot] 3 months ago

Merged PRs

dolt

Closed Issues

  • 6272: CALL DOLT_RESET('--hard') does not implicitly commit the transaction
  • 8120: CREATE USER allows username over 32 chars, then SELECT throws error
  • 8117: REGEXP isn't collation aware
dolt - 1.42.0

Published by github-actions[bot] 3 months ago

Backwards incompatible changes in this release:

  • The dolt_schemas and dolt_procedures tables now return empty results and disallow edits. In previous releases, select * from either table would result in an error if there was no data. These tables are internal tables to Dolt which are intended to be managed by Dolt itself, and not modified by users. If your application is currently writing to these tables, it will no longer be able to do so and changes will be required in order to upgrade your database.
  • Username and host names are now restricted in length (32 and 255 respectively) to match the behavior of MySQL. Attempting to create a user with a name or host longer than that limit now returns a validation error. Existing users and hostnames which are longer than those limits are not affected.

Per Dolt’s versioning policy, this is a minor version bump because these changes may impact existing applications. Please reach out to us on GitHub or Discord if you have questions or need help with any of these changes.

Merged PRs

dolt

  • 8116: Return empty dolt_procedures and dolt_schemas tables
    Previously, the dolt_procedures and dolt_schemas tables could not be selected from if there were no procedures or views. That resulted in errors when a dolt CLI client attempted to run dolt diff. The error was swallowed by the client, but it made the server logs dirty for no reason.
    Now, select * from dolt_schemas will result in an empty result set.
    I did not add new tests for this as several existing tests were broken by this behavior change. I fixed those instead. Because this changes existing behavior that people may depend on, it requires a minor version bump.

go-mysql-server

  • 2595: Adding user name and host length validation to CREATE USER
    This change matches MySQL's behavior of limiting user names to 32 chars and host names to 255 chars. Attempting to create a user with a name or host longer than that limit now returns a validation error.
    Customer issue: https://github.com/dolthub/dolt/issues/8120
  • 2594: Fixed REGEXP
    This fixes the case-sensitivity issue found in: https://github.com/dolthub/dolt/issues/8117
    Although we had moved REGEXP_LIKE to the ICU engine, we forgot to also move REGEXP, which is a synonym for REGEXP_LIKE according to the docs. This makes that change, and also completely removes all remnants of the old regex code.

Closed Issues

  • 8110: ARM64 Docker Image does not contain arm binaries