dolt

Dolt – Git for Data

APACHE-2.0 License


dolt - 0.17.1

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.17.1.

In this release we extend the index support we discussed in the prior release to UNIQUE indexes. Our type inference now also supports the DATE, TIME, and DATETIME types.

Merged PRs

  • 713: removed pkg resultset
  • 710: New TableEditor, thread-safety, and table import
    So I created a new TableEditor under doltdb, similar to the IndexEditor. I renamed the old editor to sqlTableEditor, and it's now using TableEditor underneath. It's just to comply with the SQL interfaces.
    Perhaps the most striking change is the set of synchronization changes. The motivation came from a comment from @zachmu, which implied that some of the write paths are parallelized for performance, and the old editor was not thread-safe by any means. In order to support these new write paths, it was necessary to make TableEditor thread-safe. Because of this, IndexEditor also received the same changes. I verified that there were no data races using go test -race, and also ran the concurrency tests with thousands more iterations and edits (they're lower now so that we're not spending an hour in Jenkins). To verify that my tests worked, I made temporary intentional changes that would allow data races, and both the tests and go test -race caught them.
    In addition to the above changes, dolt table import and dolt table cp both use the TableEditor now. Due to the concerns mentioned in the previous paragraph, I decided to benchmark a few metrics to get an idea of the performance impact of these two major changes.
    To measure just the impact of the threading changes, I imported 1,000,000 rows through SQL and timed the results:
    Before: 67.074s
    After:  67.363s
    
    The results are close enough to be within the margin of error, so it's safe to say the new TableEditor is just as performant as the old implementation.
    The table import is a different story though. This time, I imported a 10,000,000 row .psv, with the results being:
    Before: 53.521s
    After:  107.816s
    
    It takes roughly twice as long now, which is slower, but not tragically so. The difference in speed is primarily that the old code wrote directly to the edit accumulator, whereas the TableEditor, being a generalized editor, does far more bookkeeping.
  • 709: testing large numeric types
  • 708: Extra BATS for table import & schema import
  • 707: validates index columns
  • 706: reset hard fixes
  • 705: always calculate merge stats
  • 703: Zachmu/indexes
    Killed off index driver and let tables declare indexes natively.
  • 702: create table bats
  • 701: Bh/remotes bat fixes
  • 700: swish
  • 699: Andy/arg parsing
  • 698: fixed some bats tests
  • 697: support HEAD and ancestor spec syntax
  • 694: fixing zach's bats
  • 693: optimize diff and hist tables
  • 692: floats literals can be used in int columns in mysql 8
  • 691: quoting time types
  • 690: fix panic for select 1/0 from dual
    depends on go-mysql-server PR
  • 689: fixed panic for empty string args
    one down
  • 688: o/libraries/doltcore/remotestorage/events_interceptor.go: Add missing ADD_TABLE_FILES instrumentation.
  • 687: go/{store/nbs,libraries/doltcore/remotestorage}: Expose repository size as Size() on TableFileSource.
  • 686: go/libraries/doltcore/remotestorage: Move retrying and metrics instrumentation for doltremoteapi interactions to grpc client interceptors.
  • 685: go/store/nbs/table.go: Thread ctx at Open() for table file pull interface.
  • 684: proto: remoatesapi: GetRepoMetadataResponse: Add repository_size field for communicating approximate repository size.
  • 683: Added UNIQUE constraint
    Largest change you'll see here is that I've extracted the index changes in table_editor.go into its own file and built upon it. I've also changed everything that modifies indexes, from rebuilding the indexes to the SQL path, to make use of the new index editor.
  • 682: table import inferrence
  • 681: Diff SQL handles escape sequences
  • 679: Index performance improvements
  • 678: diff v2
  • 677: Updated version for release of version 0.17.0
  • 110: Zachmu/index refactor
    Rewrote index interfaces. Indexes are now available either through a driver, or natively via the table implementations themselves.
    Also re-enabled auth tests for windows by removing the pilosa dependency, and fixed several test suites that were broken on Linux.
  • 109: 1.0/0.0 == NULL, 1 div 0 == NULL
  • 108: Andy/div by zero
    This fixes a panic for select 1/0 from dual
    MySQL also errors on divide by zero in at least some configs
    I can change this to return NULL if that's preferable.
  • 107: sql/session.go: Use new view and index registries per sql.Context, instead of global ones.
  • 106: Zachmu/table naming regression
    Fixed regression caused by update to alias handling. Removed a couple places that allowed aliased tables to be referred to by their unaliased names, which is an error.
dolt - 0.17.0

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.17.0!

Dolt now supports the main index statements: CREATE INDEX, DROP INDEX, and all ALTER TABLE commands involving an index. Some less common index types are not yet supported: PARTIAL, UNIQUE, SPATIAL, and FULLTEXT. Finally, SHOW INDEX is not yet supported.

As usual, please file an issue if you find any bugs, or have a feature request.

Merged PRs

  • 676: Db/tester fix
  • 674: SELECT comparisons and BETWEEN now use indexes
    SELECT <, <=, >, >=, and BETWEEN all now use indexes. Previously, only = used indexes or primary keys, so this should be a pretty huge win.
  • 673: Andy/baseball databank fix
    Baseball databank was breaking the previous migration because it has null values in primary keys
    (see table allstarfull)
  • 671: Fix issue with renaming temp files across volumes
  • 668: Added partial key lookups for indexes and primary keys
  • 665: Zachmu/case insensitive
    Updated tests for new case-insensitivity improvements in go-mysql-server. Relies on https://github.com/liquidata-inc/go-mysql-server/pull/102, which is not on master yet.
  • 664: hotfix
    not sure why Jenkins didn't catch these on https://github.com/liquidata-inc/dolt/pull/658 but it's fixed
  • 663: Verify System Table Tag Constants
    Tags for Dolt system tables are defined with const definitions and need to be consistent across Dolt versions. Using iota within const definitions can lead to unexpected results:
    const (
        name = "Andy"
        mood = "ugh"
        a    = iota + 1
        b
        c
    )

    func main() {
        fmt.Println(a)
        fmt.Println(b)
        fmt.Println(c)
    }

    3
    4
    5

    Because iota counts every entry in the const block, including name and mood, a starts at 3 rather than 1. These unit tests lock these constants in place.
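    A sketch of the safer pattern such tests enforce: pin each tag to an explicit literal, so reordering or extending the const block can never silently renumber a tag out from under stored data. The names and values below are illustrative, not Dolt's real tags.

```go
package main

import "fmt"

// Illustrative only: with explicit values, inserting a new entry above
// these lines cannot change what any existing tag resolves to.
const (
	docsTag    = 3
	queryTag   = 4
	historyTag = 5
)

func main() {
	fmt.Println(docsTag, queryTag, historyTag) // prints: 3 4 5
}
```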
  • 662: Updated version for release of version 0.16.4
  • 103: Zachmu/user privileges
    Adds an empty user_privileges table to the information_schema database. This is necessary for the latest version of datagrip.
    Stacked on top of #102.
  • 102: Zachmu/case sensitivity
    Better (but not perfect) case insensitivity for table, alias and column names. Better enforcement of unique table / alias names in queries.

Closed Issues

  • 672: Installation script quietly fails
dolt - 0.16.4

Published by oscarbatori over 4 years ago

Merged PRs

  • 661: disable auto completion in multi-repo environment
  • 657: rename publishrelease.sh and remove unsupported builds
  • 653: bump go-mysql-server version
  • 652: Fix dolt dump-docs
    Fix markdown for dolt sql command, and fix code that swallowed templating error that allowed broken docs in
  • 650: Secondary Indexes Pt. 1 Episode 2
    This PR is strictly the tests. All of the PR feedback and whatnot can be seen from the commit history found on the parent PR https://github.com/liquidata-inc/dolt/pull/495.
  • 649: Disallowed tag changes
  • 646: bug fix
  • 645: fix panic
    fix of 630
  • 643: Zachmu/go mysql server fork
    Use the new hard-fork of go-mysql-server.
    This is based on #641, so the diffs will be more reasonable when that is merged.
  • 641: Zachmu/test refactor
    Refactored the SQL engine tests to make it possible to validate schemas of result sets are as expected. This involved moving a lot of code around to avoid import cycles. Also unskipped or deleted a bunch of tests that had been skipped.
  • 638: /benchmark/sql_regressions/run_regressions.sh: Update jobs to push to master, not regressions
    nightly and releases now push to master. (regressions) has been merged into master
  • 637: Bumped version
  • 100: remove schema caching
  • 99: exposing ddl parse method
  • 98: fixes nil value conversion
  • 95: server: fix tests on linux
  • 94: Added RAND() function
  • 93: s/src-d/liquidata-inc/g
  • 92: Zachmu/keyword case
  • 91: Fixed pilosa using statements

Closed Issues

  • 660: show views doesn't work
  • 639: publish mysql benchmarks
dolt - 0.16.3

Published by oscarbatori over 4 years ago

Merged PRs

  • 636: sql-server command documentation
    NAME
    dolt sql-server - Start a MySQL-compatible server.
    SYNOPSIS
    dolt sql-server --config <file>
    dolt sql-server [-H <host>] [-P <port>] [-u <user>] [-p <password>] [-t <timeout>] [-l <loglevel>] [--multi-db-dir <directory>] [-r]
    DESCRIPTION
    By default, starts a MySQL-compatible server which allows only one user connection at a time to the dolt repository in the current directory. Any edits made through this server will be automatically reflected
    in the working set.  This behavior can be modified using a yaml configuration file passed to the server via --config <file>, or by using the supported switches and flags to configure the server directly on
    the command line (if --config <file> is provided, all other command line arguments are ignored).
    This is an example yaml configuration file showing all supported items and their default values:
    log_level: info
    behavior:
      read_only: false
      autocommit: true
    user:
      name: root
      password: ""
    listener:
      host: localhost
      port: 3306
      max_connections: 1
      read_timeout_millis: 30000
      write_timeout_millis: 30000
    databases: []
    SUPPORTED CONFIG FILE FIELDS:
    log_level - Level of logging provided. Options are: `trace`, `debug`, `info`, `warning`, `error`, and `fatal`.
    behavior.read_only - If true database modification is disabled.
    behavior.autocommit - If true write queries will automatically alter the working set.  When working with autocommit enabled it is highly recommended that listener.max_connections be set to 1 as concurrency
    issues will arise otherwise.
    user.name - The username that connections should use for authentication.
    user.password - The password that connections should use for authentication.
    listener.host - The host address that the server will run on.  This may be `localhost` or an IPv4 or IPv6 address.
    listener.port - The port that the server should listen on.
    listener.max_connections - The number of simultaneous connections that the server will accept.
    listener.read_timeout_millis - The number of milliseconds that the server will wait for a read operation.
    listener.write_timeout_millis - The number of milliseconds that the server will wait for a write operation.
    databases - a list of dolt data repositories to make available as SQL databases. If databases is missing or empty then the working directory must be a valid dolt data repository which will be made available
    as a SQL database
    databases[i].path - A path to a dolt data repository.
    databases[i].name - The name that the database corresponding to the given path should be referenced via SQL.
    If a config file is not provided many of these settings may be configured on the command line.
    OPTIONS
    --config=<file>
    When provided configuration is taken from the yaml config file and all command line parameters are ignored.
    -H <Host address>, --host=<Host address>
    Defines the host address that the server will run on (default `localhost`)
    -P <Port>, --port=<Port>
    Defines the port that the server will run on (default `3306`)
    -u <User>, --user=<User>
    Defines the server user (default `root`)
    -p <Password>, --password=<Password>
    Defines the server password (default ``)
    -t <Connection timeout>, --timeout=<Connection timeout>
    Defines the timeout, in seconds, used for connections
    A value of `0` represents an infinite timeout (default `30000`)
    -r, --readonly
    Disables modification of the database
    -l <Log level>, --loglevel=<Log level>
    Defines the level of logging provided
    Options are: `trace`, `debug`, `info`, `warning`, `error`, `fatal` (default `info`)
    --multi-db-dir=<directory>
    Defines a directory whose subdirectories should all be dolt data repositories accessible as independent databases.
    --no-auto-commit
    When provided sessions will not automatically commit their changes to the working set. Anything not manually committed will be lost.
    
  • 635: Bumped version for release
  • 89: Secondary Indexes
    This has been cleaned up and whatnot, so you can give this an earnest look. Most things are a deletion, since I chose to go about things a bit differently than it was already coded in, but overall there are relatively few new additions. I also removed a lot of the temporary code that was hanging around, as it turns out most of it wasn't needed.
  • 88: Zachmu/analyzer
    Analyzer improvements, mostly related to treatment of aliases to support selecting the same table twice, but lots of other improvements as well.
  • 87: ctx in Session.Set
dolt - 0.16.2

Published by oscarbatori over 4 years ago

We are pleased to release a couple of bug fixes and a new configuration feature.

In particular:

  • a bug affecting queries against dolt_diff_<table> tables is now fixed
  • users can now configure their MySQL Server instance with a YAML file that will allow them to select multiple repos, among other options

Merged PRs

  • 625: Added skipped bats test for bad cloning behavior
  • 622: configuration changes
  • 621: Db/push to regressions latest
    idea here is to maintain two shallow branches used to help with the sqllogictest-tester jenkins job. whenever we run the releases job, we update branch regressions-tip-latest to the latest release results and also update regressions-tip-previous to be what regressions-tip-latest was...
    Decided to have two branches since we may not always get fixing a regression right away, and this gives us a little time to do so...
  • 618: Merging release with version bump back to master
  • 617: Andy/dolt diff panic
  • 616: /bats/arg-parsing.bats: Add skipped test for checkout panic
  • 615: Bh/partial key2
    handle composite set for partial key iteration
    refactor: pulled some of the initialization code out of the creation functions
  • 614: partial key iteration
    This is simple partial key iteration. Certainly not optimal.
  • 613: Zachmu/sql batch errors
    Real tokenization for SQL statements in batch processing mode, instead of line-based scans with hacks for embedded semicolons. Print out the line number failed query during batch processing failure.
    Fixes https://github.com/liquidata-inc/dolt/issues/555
  • 612: Db/sqllogictest tester
  • 611: filter reader tests
  • 609: change default cmItr which used to rely on WithFilters call

Closed Issues

  • 623: AWS URL Formats ?
dolt - 0.16.1

Published by oscarbatori over 4 years ago

This release contains a bug fix to the SQL server implementation that may cause the server to have issues with starting.

Merged PRs

  • 607: Jenkinsfile: Enable AWS remote bats tests in Jenkins.
  • 605: {bats,libraries/doltcore/env}: Fix init stomp bug
    Dont stomp/save any docs if one or more docs already exist in repo
  • 604: Skipped bats tests for dolt init stomping existing LICENSE.md and REA…
    …DME.md
  • 603: Schema export specification in bats tests
  • 600: Added test for bad error message on only passing one argument to dolt…
    … push
  • 599: Fixed bug for multi-key indexes
  • 596: go/store/config: config_test.go: Fix test that fails when running test suite as root.
    This can happen when building and running in a golang docker container, for
    example.
  • 595: [WIP] Writable Branches
    Still a WIP. Writing additional bats and unit tests.
  • 594: Tim/bats schema import tags
    Skipped bats test for two sequential schema imports causing a guaranteed tag collision
  • 592: /benchmark/sql_regressions/DoltRegressionsJenkinsfile: Refactor sql-watchers failure email
  • 590: Add json to supported output types in dolt sql --help
    Fixes #588
  • 586: Andy/Init commits don't need migration
    If a repo created with an old client ( < 0.16.0) has a branch with only the init commit, newer clients will always register that repo as un-migrated.
    This change ignores init commits when checking if a repo has been migrated
  • 584: Added skipped bats test for DATETIME support in schema import
  • 583: go/utils/publishrelease: Run the builds in a docker container to a get managed toolchain.

Closed Issues

  • 598: enum types can't be inserted
  • 589: Incorrect sql command help
dolt - 0.16.0 [ACTION REQUIRED]

Published by oscarbatori over 4 years ago

Dolt 0.16.0 is a very exciting release. It contains an important change to how we store columns, as well as a host of exciting features. The change to how we store columns does require users to migrate their repositories when they upgrade. We will provide some background, as well as the (very simple) migration procedure, before discussing features in the release.

We are absolutely committed to making this as painless as possible for our users, so don't hesitate to shoot us a note at [email protected] if you face any difficulty with the migration, or need to discuss it further due to the sensitive nature of your data.

Unique Tags Migration

Dolt uses an integer value called a tag to identify columns. This contrived example illustrates that:

$ dolt init
$ cat > text.csv
name,id
novak,1
$ dolt table import -c --pk=id peeps text.csv
CREATE TABLE `peeps` (
  `name` LONGTEXT COMMENT 'tag:0',
  `id` LONGTEXT NOT NULL COMMENT 'tag:1',
  PRIMARY KEY (`id`)
);

Versions of Dolt prior to this release only required tag uniqueness per table in a given commit, not across tables and across every commit. This caused issues when diffing and merging between commits where column tags had been reused. We decided to bite the bullet and make the fix. Going forward, all column tags will be unique across all tables and history.

Existing Dolt repositories must be migrated to be used with Dolt client 0.16.0 and later. Running a command with the new client on an old repository will result in an error message prompting you to migrate the repository. The migration process is heavily tested, deterministic, and makes no changes to underlying data. It merely fixes the format to satisfy the requirements of new versions. After upgrading Dolt, run a single command in your repository to migrate:

$ dolt migrate

Once that's complete you should be done. If you have Dolt data living at a remote, and collaborators, there's just one additional step. The first user to upgrade will need to run:

$ dolt migrate --push

This will force push the migrated branches. Subsequent collaborators will have to then run:

$ dolt migrate --pull

This will sync their migrated repo with the migrated remote, and preserve any local changes they have, applying them on top.

SQL

We are committed to making our SQL implementation as close to 100% correct as possible, and this release represents a big step towards that goal. The improvements include:

  • SHOW CREATE VIEW now works, so views can be inspected in the canonical manner
  • views now appear in SHOW TABLES statements
  • added support for new types: DECIMAL, TIME, SET, ENUM, NCHAR, NVARCHAR, and aliases to many more
  • dolt sql-server and dolt sql now support accessing multiple dolt repositories in a single SQL session. Each repository is exposed as a database. See databases with SHOW DATABASES, and select one to query with USE $database. Joins across repositories are supported. Start dolt sql or dolt sql-server with the new --multi-db-dir argument, which must name a directory containing dolt repositories to expose in queries.
  • dolt sql-server now supports writes, which means that the working set will be updated by UPDATE, INSERT, DELETE and other statements which change data. The SQL server was previously read-only. This fixes https://github.com/liquidata-inc/dolt/issues/549. Important caveat: only one concurrent connection is allowed to prevent data races. Support for better concurrency is being tracked in https://github.com/liquidata-inc/dolt/issues/579
  • functions user(), left(), if() are now supported, motivated by getting Dolt working with DataGrip
  • saved queries now execute at the command line with dolt sql -x option
  • more complete implementation of information_schema database
  • JSON output option for SQL query results with dolt sql -r json
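A multi-repository session might look like the following sketch. The database and table names here are illustrative, not from the release; only the commands themselves (SHOW DATABASES, USE, qualified joins) are described above.

```sql
SHOW DATABASES;   -- one database per repository found under --multi-db-dir
USE employees;    -- select a repository to query

-- Joins across repositories use database-qualified table names
-- (this schema is hypothetical):
SELECT e.name, s.amount
FROM employees.people AS e
JOIN payroll.salaries AS s ON s.person_id = e.id;
```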

VCS in SQL

As well as making our SQL implementation compliant with MySQL, we are also committed to implementing all VCS operations available on the command line available via the SQL interface.

  • we now have a dolt_branch system table where the list of branches on a repo is surfaced in SQL

Remotes

We have now fixed AWS S3 remotes, so if you want to use your own S3 infrastructure as a backend, you can do that. See the dolt remote CLI documentation for details. While we love to see data and users on DoltHub, we are committed to making Dolt an open standard, and that means being useful to the broadest possible audience.

Bug Fixes etc.

As well as fixing the issue with remotes, we fixed a number of other bugs:

  • checking out or merging working set doc files
  • Better SQL error messages
  • SQL queries respect case of column aliases, issue here
  • Queries required by Jetbrains DataGrip are now supported issue here.

Merged PRs

  • 577: streaming map edits
  • 576: Help Fix
    As identified by Asgavar in https://github.com/liquidata-inc/dolt/pull/553 there is a segfault caused by differences in logic between isHelp and the Parse function of the ArgParser. I found that changing the Parser to be like the isHelp function caused issues for some commands if you have a branch named help or a table named help. As a result I opted to change the isHelp logic instead.
    Thank you @Asgavar
  • 572: More Types V2
    Have fun @zachmu
  • 571: Skipping git-dolt bats tests on Windows due to flakiness
    Is this fine? By putting it at the end of the setup function, it's equivalent to manually putting a skip on every test. Then whenever we fix it, we can just delete it in one place.
  • 569: Andy/migrate push pull
  • 568: Zachmu/sql updates
    Implemented auto-commit database flavor, and use it in SQL server. Also:
  • 565: SQL Reserved word and SQL Keyword in column name tests
  • 564: go/libraries/doltcore/env: paths.go: Consult HOME environment variable for home location before consulting os/user.
  • 563: Dockerfile: Bump golang version; use mod=readonly.
  • 559: Andy/migration refactor
  • 556: Basic SQL batch mode bats tests
  • 554: Skipped bats test for help command segfault
  • 552: using correct root
  • 551: Zachmu/sql json
    Added JSON result output to sql command. Also fixed handling of NULL values in CSV output.
    This fixes https://github.com/liquidata-inc/dolt/issues/533
  • 548: read only version of branches table
  • 547: Km/doc checkout bug
    Taylor brought a bad docs bug to my attention. If you had a modified doc, and then dolt checkout <branch> or dolt checkout -b <branch>, your local docs would be overwritten by the working root (your changes vanish).
    The intended behavior is to keep the working set exactly as it is on dolt checkout <branch>, given there are no conflicts. Since your "working set" for docs is technically on the filesystem, and not on the roots, it was getting wiped. Now I'm pulling the unstagedDocDiffs from the filesystem, and excluding those when it comes time to saveDocsOnCheckout.
    Added bats test coverage too
  • 546: /bats/1pk5col-strings.bats: Add skipped test for exporting to csv then reimporting
  • 545: /benchmark/sql_regressions/DoltRegressionsJenkinsfile: Add sql watcher 3
  • 544: Zachmu/bheni patch
    Fixed bug in parsing URLs of relative file paths.
  • 543: bats/aws-remotes.bats: Enable test for push.
  • 542: Added skipped bats test for issue 538
    https://github.com/liquidata-inc/dolt/issues/538
  • 540: interface for multi-db and tests
  • 539: Db/dolt harness test
    Pretty simple tests, but I think they're effective. Tested against the commit that initially caused the breaks, and these tests failed... should also have caught the "unable to find table" errors that occurred during the harness-fixing iteration process. Feels fine to me! LMK
  • 537: Jenkinsfile: Add environment variables for running AWS remote bats tests.
  • 536: bats/aws-remotes.bats: Add some smoke tests for AWS remotes interactions.
    Currently these get skipped in CI. Will follow up with CI changes after this lands as they will require some external state creation and some small infra work.
    Push is skipped here and can be unskipped after #531 lands.
    Clone is skipped here and will remain skipped until another PR fixes it. I took a pass this morning but clone logic has gotten a little hairy and I wasn't happy with the progress I was making. Going to take another pass soon.
  • 535: /benchmark/sql_regressions/run_regressions.sh: Silently exit if dolt version exists in dolt-sql-performance repo
  • 534: go/cmd/dolt: commands/clone: In the case that Clone bailed early, we hung indefinitely.
  • 531: store/datas/database.go: Update datas.CanUsePuller to check for write support in TableFileStore.
    The TableFileStore optimizations accidentally broke push support for
    non-dotlremoteapi/non-filepath chunk stores. Attempting to push an AWS remote
    currently results in an error.
    This is a quick minimal PR to check for write support in a TableFileStore and
    not use the push/pull fast path if it isn't supported.
  • 530: store/datas/database.go: Update datas.CanUsePuller to check for write support in TableFileStore.
    The TableFileStore optimizations accidentally broke push support for
    non-dotlremoteapi/non-filepath chunk stores. Attempting to push an AWS remote
    currently results in an error.
    This is a quick minimal PR to check for write support in a TableFileStore and
    not use the push/pull fast path if it isn't supported.
  • 528: go/cmd/dolt/commands: Handle some verrs which were previously being discarded
    I noticed some places where we were discarding some verr values. Looked quickly for open issues related to these and didn't find any, but the errors seem pretty important. The refactorings are maybe a little tricky so I appreciate as many eyes as I can get.
  • 527: Andy/autogen import
  • 526: /benchmark/sql_regressions/run_regressions.sh: Fix error messages
  • 525: Fixed bad regexes in diff --sql tests.
    Now these are failed tests that need to be fixed.
    Andy, sorry I did not catch these bad regexes in review the first time. Unfortunately, these are working regexes now and the tests do not test what you think they do. We need to go back and fix the tests or fix the bad behavior in the diff --sql
  • 524: Tim/new diff sql tests
  • 523: fix logictest
  • 522: go/libraries/doltcore/env/dolt_docs.go: Add newline to end of initial LICENSE/README text
    Fixes this:
    image
  • 519: bats/sql.bats: Add skipped bats test for ON DUPLICATE KEY UPDATE insert
  • 518: Andy/autogen migrate
    use the same tag generation method in the migration that we do in creating tables
    The old migration just assigned sequential tags to all of the columns in the repo.
  • 517: Multi DB
  • 514: CSV and JSON Import Behavior Changes
    Based on what was discussed before, this PR modifies the import behavior so that, when importing CSV files, if the destination schema has a bool replacement (BIT(1) or TINYINT/BOOLEAN) then it will parse TRUE and FALSE (and other variations) as their respective integers. This restores a behavior that was removed when the new type system was introduced, and all of the code paths were coalesced.
    In addition, this PR also fixes a bug where importing a null value through JSON would ignore a column having NOT NULL set.
  • 513: Fixed a bug in the newly created skipped bats test for diff --cached …
    …where I referred to instead of correctly using
  • 512: Added skipped bats test for dolt diff --cached
  • 511: Fix doltharness
  • 508: /benchmark/sql_regressions/run_regressions.sh: Fix mean query to include group by test_file
    Need to group by both PKs
  • 506: Andy/force push & force fetch
  • 503: Bh/puller error fix
  • 500: go/cmd/dolt/commands/diff.go: Handle error from rowconv.NewJoiner in diffRows
    I was just poking around a little bit and noticed that we weren't handling this error.
  • 497: Changes to support per connection sql state
  • 494: Bumped version
  • 492: fix return value on failure to push
  • 86: Zachmu/autocommit
    Bug fix for interpreting autocommit
  • 85: Zachmu/autocommit
    Auto commit
  • 84: Zachmu/update results
    Added support for OkResult in result sets, which mirrors the OK packet that MySQL sends on non-SELECT queries.
    Also:
  • 82: Zachmu/alias case
    Case sensitive column aliases in result schema.
  • 81: Zachmu/datagrip
    Lots of changes related to getting datagrip working:
    • Better information_schema support. Several tables have no rows, but are defined with the correct.
    • Better view support, including "show create view"
    • Added several new functions: LEFT, INSTR, IF, SCHEMA, USER
    • Added support for unquoted strings in SET VARIABLE statements
    • Eliminated several instances of custom parsing and used vitess directly instead
  • 80: Fixed case branches worrying about type
    For issue https://github.com/liquidata-inc/dolt/issues/529
  • 79: Error handling for USE DB_NAME when DB_NAME is not valid
  • 78: Added support for queries that don't specify a table when there is no table selected.
  • 77: Current db in session
  • 76: per connection state
    Want your thoughts on moving the IndexRegistry and the ViewRegistry out of the Catalog object and making them accessible as part of the context. It's now up to the SessionBuilder implementation to provide the IndexRegistry and ViewRegistry for a session. In dolt we would register indexes and views at that point, and would be altered when the connection changes what head is pointed at.

Closed Issues

  • 533: Feature: dolt sql --format=json
  • 529: Unable to cast default values to matching type
dolt - 0.15.2

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.15.2.

AS OF Further Enhanced

In our last release, 0.15.1, we highlighted the ability to query AS OF a branch or commit. In this release we expand this functionality by allowing users to query AS OF a timestamp. This represents a version of the syntax familiar to users of other SQL databases with versioned data support. Thus we allow users to treat the underlying commit graph as either wall-clock time or relative to a commit.
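As a sketch, assuming the persons table from the earlier release notes (the timestamp literal format shown is illustrative):

```sql
-- Query the table as it existed at a point in wall-clock time:
SELECT name FROM persons AS OF '2020-03-01 12:00:00';

-- The branch/commit form from the prior release still works:
SELECT name FROM persons AS OF 'add-new-employees';
```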

TIMESTAMP and DATETIME functions

Dolt SQL now supports TIMESTAMP and DATETIME functions, which can be used to construct a time object of the given type. DATETIME('2020-03-01') and DATETIME() will return the given time or the current time, respectively.
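For example (the second query's result depends on the current clock, so no output is shown):

```sql
SELECT DATETIME('2020-03-01');  -- constructs a DATETIME from the given string
SELECT DATETIME();              -- the current time as a DATETIME
```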

Other

We continue to make enhancements to our underlying SQL engine, along with other assorted bug fixes.

As usual, let us know if you have any thoughts by filing an issue!

Merged PRs

  • 487: fix issue with merge blowing away changes in working
    Previously we would fail if there was anything checked out and you tried to merge. Then we moved to the Git model, where we allow merging as long as the changes in the working set don't touch the same tables as the changes in the commit being merged. For fast-forward merges this was stomping working-set table changes, as we'd just set the root and not re-apply the changes to the tables that were modified in the working set.
  • 485: Andy/ci compatibility
  • 482: Db/bats merge stats
    One test for inaccurate merge stats, one for confusion with the checkout command.
    @timsehn how do you want checkout to work when branch name and table name are identical?
  • 481: go/go.mod: Upgrade dependencies.
  • 480: /go/{go.mod, go.sum}: Update go.mod github.com/liquidata-inc/sqllogictest
  • 478: execute saved query
    Implements:
    dolt sql -x <saved_query_name>
    dolt sql --list-saved
    
    Changes
    • dolt sql -s <name> -q <query> now saves the query with id = name
  • 476: Zachmu/as of timestamp
    Support for AS OF queries with a timestamp
  • 475: bump version
  • 75: Added database qualifiers to view resolution.
  • 74: Zachmu/datefns
    Added support for DATETIME() and TIMESTAMP() functions
  • 73: Zachmu/as of
    Support for pushing AS OF clauses down to tables in a view. Kind of janky, but works great!
  • 72: update vitess dep
dolt - Dolt 0.15.1 released

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.15.1.

AS OF

We now support AS OF queries, similar to the Microsoft SQL Server implementation described here. Timestamps are not yet supported for AS OF expressions, but users can use branch names, commit hashes, and other commit specs:

SELECT name FROM persons AS OF 'add-new-employees'
SELECT * FROM persons AS OF 'hvbsl13cbi03ptft78k0pgtkgpd68ehj'
SELECT * FROM persons AS OF 'HEAD~'

These queries will retrieve rows from the table as it existed at the named revision. Different tables in a join can use different AS OF clauses:

SELECT name FROM persons as of 'HEAD~' NATURAL JOIN addresses AS OF 'HEAD~2'

Other

Elsewhere we continued to improve the performance and correctness of SQL, expand test coverage, and fix bugs.

Merged PRs

  • 474: Fixed bug caused by overzealous refactoring
  • 471: Zachmu/batch bug
    This fixes https://github.com/liquidata-inc/dolt/issues/467.
    The SQL script in question deleted and recreated a subset of the table, and should have resulted in no diff. Before this change, it resulted in some subset of rows being deleted. The issue was that DoltDatabase in batch mode was using mapEditor.Remove and mapEditor.Add for the same keys, which doesn't work in all (most) cases. The solution is to flush the cache before and after any non-insert statement when running in batch mode.
    This change also makes the output for batch mode more sensible.
    Brian to review, Andy and Aaron FYI.
  • 469: dolt sql-server bats test support
  • 468: Zachmu/as of
    Dolt support for AS OF queries.
    This won't build until https://github.com/liquidata-inc/go-mysql-server/pull/71 is merged and dependencies updated.
  • 465: Fixed error where we weren't logging an error that resulted from flush…
    …ing a batch SQL import
  • 461: add information_schema database
  • 460: /go/libraries/doltcore/sqle/logictest/main/main.go: Add timeout result for sqllogictests
  • 456: Andy/datetime fix
  • 455: Fix the SQL server command by providing it the appropriate variable when instantiating the SQL server object.
  • 453: reorganized to create testcommands package
    This PR is simply a reorganization to allow for the testcommands package, see go/libraries/doltcore/dtestutils/testcommands/command.go.
    These sorts of end-to-end tests have been very useful so far in writing rebase and super schema, and I'd like to expand their use across the codebase.
  • 452: /bats/remotes-file-system.bats: Add skipped test for failed branch deletes after fetch
  • 451: /bats/remotes-file-system.bats: Add skipped bats test for fetch display branch bug
  • 449: adding skipped bats tests for table merge panics
  • 447: First cut at generating CLI docs for docs site
    Our CLI has help text; we also use that text to generate docs for our documentation site at [DoltHub docs](dolthub.com/docs).
    Generating docs and command line output from the same content entails three separate concerns:
    1. the raw content
    2. modifying the content so the rendering is correct (in this case shell output and Gatsby build of .mdx files, JSX version of .md)
    3. IO for writing to console (CLI help text) and files (.md files for building docs)
      The goal of this PR is to move us towards a clean separation of these three concerns using Go templates. Specifically
    • implement helper types and methods for modifying content to render correctly in CLI output or .mdx
    • modify the raw content to the new data structures and text format that can be templated
    • update markdown generation code to simply request a template, rather than building document manually
      Future work will be to use templates for the CLI output.
  • 446: go/store/nbs: s3_table_reader: Be certain to close body readers when reading from S3.
    It is necessary and correct that we close these readers. Helps with persistent
    connection reuse, accurate logging and timing metrics, timely resource
    finalization, etc.
  • 443: Added a succeeding bats test for push --set-upstream.
    Could not repro panic from #442.
  • 441: Added skipped bats test for update a datetime field
  • 439: get rid of unique counts
  • 438: Db/dolt sqllogic add version to DoltRecordResult
  • 437: bumped version for release
  • 436: pull stomp fix
  • 71: Zachmu/as of
    AS OF implementation.
    This change also removes context.Context references from the core.go interfaces, replacing them with sql.Context to be consistent.
  • 69: added conversion logic to SetField expression

Closed Issues

  • 467: Running piped SQL file that drops and adds a table produces different results on subsequent runs
  • 466: Repo got in broken state
dolt - Dolt 0.15.0 released

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.15.0.

SQL Type System

Previously Dolt had a much narrower type system than MySQL. For ease of use, we mapped types that we did not support to their "super type". For example, using the previous Dolt release:

doltsql> CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20), species VARCHAR(20), sex CHAR(1), birth DATE, death DATE, PRIMARY KEY (name));
doltsql> describe pet;
+---------+----------+------+-----+---------+-------+
| Field   | Type     | Null | Key | Default | Extra |
+---------+----------+------+-----+---------+-------+
| name    | LONGTEXT | NO   | PRI |         |       |
| owner   | LONGTEXT | YES  |     |         |       |
| species | LONGTEXT | YES  |     |         |       |
| sex     | LONGTEXT | YES  |     |         |       |
| birth   | DATETIME | YES  |     |         |       |
| death   | DATETIME | YES  |     |         |       |
+---------+----------+------+-----+---------+-------+

Using this release of Dolt, we can see that richer type choices are respected:

doltsql> CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20), species VARCHAR(20), sex CHAR(1), birth DATE, death DATE, PRIMARY KEY (name));
doltsql> describe pet;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name    | VARCHAR(20) | NO   | PRI |         |       |
| owner   | VARCHAR(20) | YES  |     |         |       |
| species | VARCHAR(20) | YES  |     |         |       |
| sex     | CHAR(1)     | YES  |     |         |       |
| birth   | DATE        | YES  |     |         |       |
| death   | DATE        | YES  |     |         |       |
+---------+-------------+------+-----+---------+-------+

We hope this makes it easier for users to use Dolt in the context of their existing data infrastructure. This change is backward compatible, so old versions of Dolt can read repos written by new versions, and vice versa. That said, you should upgrade to the latest and greatest!

Unions

We also now support unions, a powerful tool for tabulating the results of analyses produced by different queries:

doltsql> (select 1 as this) union (select 2 as this);
+------+
| this |
+------+
| 1    |
| 2    |
+------+

Other

We also improved the performance of the Dolt log table, and made our usual assortment of bug fixes and improvements.

Merged PRs

  • 435: /go/libraries/utils/iohelp/read_test.go: Skipping TestReadWithMinThroughput due to flakiness
  • 434: Added skipped divide by zero bats test
  • 432: go.mod: Bump go-mysql-server to support SQL UNION.
  • 426: topo sort for log
  • 424: Dolt log bats test rework
  • 423: /go/libraries/doltcore/sqle/logictest/main/main.go: Add withdurations option
  • 422: Made a file system remotes bats test file, added some tests, and move…
    …d appropriate tests from remotes.bats. Added a skipped test in remotes.bats for dolt pull stomping a dirty working set
  • 419: add user agent to grpc calls
  • 418: filter commits used by history table
    replaces https://github.com/liquidata-inc/dolt/pull/409
  • 417: optimize the map iterator used by the history table
  • 416: bh/set-algebra
    Package setalgebra provides the ability to perform algebraic set operations on mathematical sets built directly on noms
    types. Unlike standard sets in computer science, which define a finitely sized collection of unordered unique values,
    sets in mathematics are defined as well-defined collections of distinct objects. This can include infinitely sized
    groupings such as the set of all real numbers greater than 0.
    See https://en.wikipedia.org/wiki/Set_(mathematics)
    There are 3 types of sets defined in this package: FiniteSet, Interval, and CompositeSet.
    FiniteSet is your typical computer science set representing a finite number of unique objects stored in a map. An
    example would be the set of strings {"red","blue","green"}, or the set of numbers {5, 73, 127}.
    Interval is a set which can be written as an inequality such as {n | n > 0} (set of all numbers n such that n > 0) or a
    chained comparison {n | 0.0 <= n <= 1.0 } (set of all floating point values between 0.0 and 1.0)
    CompositeSet is a set which is made up of a FiniteSet and one or more non-overlapping intervals, such as
    {n | n < 0 or n > 100} (the set of all numbers n below 0 or greater than 100); this set contains 2 non-overlapping intervals
    and an empty finite set. Alternatively, {n | n < 0 or {5,10,15}} (the set of all numbers n below 0 or n equal to 5, 10, or 15)
    would be represented by one Interval and a FiniteSet containing 5, 10, and 15.
    There are 2 special sets also defined in this package: EmptySet, UniversalSet.
    The EmptySet is a set that has no values in it. It has the property that when unioned with any set X, X will be the
    result, and if intersected with any set X, EmptySet will be returned.
    The UniversalSet is the set containing all values. It has the property that when unioned with any set X, UniversalSet is
    returned and when intersected with any set X, X will be returned.
  • 415: publishrelease/install.sh: install -d /usr/local/bin if it does not exist.
  • 414: added RepoStateReader interface
  • 413: Added a couple more test cases in the limit test
  • 412: Skipped bats test for DATE_ADD and DATE_SUB in the where clause
  • 411: Added new test repository with TypeInfo changes
  • 408: Added a group by bats test highlighting inconsistent behavior
  • 406: Bumped version for release
  • 404: iterate a map backward
    How much do you hate this?
  • 403: Tim/dateformat bats
  • 68: sql/{parse,plan}: Add support union parsing and execution.
    This is still partial. We need type coercion and schema validation to be done
    in the analysis phase.
  • 67: Zachmu/datemath
    Fixed panic when using interval expressions in WHERE clauses.
  • 66: Negative Numbers
    Somehow I was able to not only forget to include logic to handle negative numbers, but I forgot to also write a test for them too, and even implemented this library in dolt and didn't test for negative numbers there. I'm actually surprised. It wasn't even caught in peer review. It's the simplest things that we forget to check, and those are the ones that can cause the most havoc.

Closed Issues

  • 420: Dolt log -n 10 returns wrong results. Dolt log produces correct result.
dolt - 0.14.0

Published by oscarbatori over 4 years ago

We are pleased to announce the release of Dolt version 0.14.0.

Query Catalog

The major new feature in this release is the query catalog, implemented by adding new options to the dolt sql command. Users can now pass --save and --message options to SQL queries to save and version them at the repo level:

	-s <saved query name>, --save=<saved query name>
	  Used with --query, save the query to the query catalog with the name provided. Saved queries can be examined in the dolt_query_catalog system table.

	-m <saved query description>, --message=<saved query description>
	  Used with --query and --save, saves the query with the descriptive message given. See also --name

This will allow users to document their data with versioned SQL queries. In the future we hope to make this a validation tool.
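A sketch of the workflow (the query, name, and message below are illustrative): after saving a query from the command line with something like dolt sql -q "SELECT count(*) FROM persons" -s person_count -m "row count", the saved query can be inspected through the versioned system table:

```sql
-- Saved queries live in the dolt_query_catalog system table,
-- versioned alongside the rest of the repo.
SELECT * FROM dolt_query_catalog;
```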

Assorted Fixes and Improvements

We continue to make progress on our SQL implementation, both in terms of correctness and performance. For example, GREATEST() now supports timestamps. We are prioritizing correctness over performance, though we are making gains in performance.

As usual, if you see anything amiss, please don't hesitate to file an issue.

Merged PRs

  • 401: Fixed arg-parsing.bats not working on Windows
  • 400: Added 2 skipped argument parsing bats tests.
    We don't support Nix style argument parsing completely right now
  • 399: Quote view name on create view
  • 395: go/cmd/dolt/commands/sql.go: Always close row iters. Improves robustness when SQL returns an error.
  • 394: Km/doc diff print bug
  • 393: Bumped go-mysql-server version
    Integrates https://github.com/liquidata-inc/go-mysql-server/pull/65 into dolt
  • 390: bats/sql.bats: Document some unsupported SQL features with some failing bats tests.
  • 389: go.mod: Upgrade dependencies.
  • 386: delete cell values on drop column
  • 383: deadlock fix
  • 382: README.md: Add sudo invocation to install instructions.
  • 379: no whitespace column names
  • 378: Bumped go-mysql-server version
  • 376: Added skipped test for select as
  • 375: Zachmu/query save feature
    Implemented query saving via new dolt_query_catalog table, created by dolt sql -q -s.
    Also:
    • Unified business logic for validating table names in every code path where a table is created
    • Separated out read-only, read-write, and alterable SQL tables
    • Refactored deeply nested error handling logic in mv and cp commands
    • Refactored import command to separate validation from execution logic
    • Added many tests
  • 372: check for nil or empty column headers
    @timsehn do we want to allow whitespace column names?
  • 371: unskip test after fixing query
  • 369: fix metrics bug
  • 368: chunk store metrics wrapper
  • 367: Changed add all shortcut from lowercase a to uppercase A to match git.
    Added corresponding bats test.
    Fixes #345
  • 366: name already existing file in error message
  • 365: Bumped version for 0.13.2 release
  • 364: Added a basic tests for dolt_diff_ and dolt_history_ system tables.
    Found weird behavior in dolt_diff_.
  • 363: Fix error message when table doesn't exist
    Fixes https://github.com/liquidata-inc/dolt/issues/275.
  • 65: Added DATETIME handling to GREATEST/LEAST functions
    Fixes https://github.com/liquidata-inc/dolt/issues/380
  • 63: sql/expression/function: Add UNIX_TIMESTAMP function.
  • 62: Fix stupid compilation issue
  • 61: small fixes
  • 60: Fixed Bit type
  • 59: Fixed a bug in drop table logic caused by variable shadowing. No test…
    …s of this behavior because we don't have an easy way to make an operation like DropTable return an error in the in-memory database.

Closed Issues

  • 391: IGNORE
  • 388: dolt diff printing bug for doc diffs
  • 381: Installation problems with macOS Catalina
  • 380: SQL: greatest function does not work for datetime type
  • 370: Nil reference exception importing what seems to be a valid CSV
dolt - 0.13.2

Published by oscarbatori over 4 years ago

The set of changes for this version are:

  • Bug fixes and improvements
  • Output query results in CSV format
  • Standards compliant CSV exports

As usual, please let us know if you have any questions or concerns.

Merged PRs

  • 362: Zachmu/sql csv
    CSV output for SQL
  • 361: Added skipped bats test for REPLACE counting issue
  • 360: Added three simple explain bats tests
  • 355: Fixed dolt documentation and improved relevant bats test
  • 354: fix casing issue
  • 353: A start on system tables bats tests
    @bheni will finish this and unskip the tests that are specced
  • 352: ls --system and --all flags
  • 351: correct schema merge
  • 349: go/cmd/dolt: commands/credcmds/use.go: Implement dolt creds use to select a credential.
  • 348: fixed panic on merge of non-existant branch
  • 347: go/cmd/dolt: commands/credcmds/new: Use a newly created credential if there is no existing selected credential.
  • 344: Bump go-mysql-server
  • 343: Changed the install instructions on dolt README.
    Install instructions now point to new install scripts.
  • 342: Zachmu/fix clone empty
    Cloning an empty repo now works as expected. Fixes #217
  • 341: Bumping version on tip of master to fix shell script
  • 339: Added tags tests
  • 338: fixed conditional that was preventing batching
    Batching happens in two places: on initialization and each time the current batch is exhausted. Incorrect logic on lines 53 & 54 was preventing batching once the initial batch was exhausted.
  • 337: bats,{go/cmd/dolt/commands}: Hide doc schema for schema show w/no table args
  • 57: Added ChangeCollation to StringType
    Also renamed CreateBlob and MustCreateBlob to CreateBinary and MustCreateBinary because those names make more sense.

Closed Issues

  • 358: Can't clone a public repo from DoltHub when not logged in.
  • 340: Install Process Failing
dolt - 0.13.1

Published by oscarbatori over 4 years ago

We are releasing a patch to Dolt, as the 0.13.0 release contained a bug: when cloning a repository using dolt clone, the new license and readme documents were not updated. This patch ensures that functionality works correctly.

Since this is a patch of a recent release, see 0.13.0 release notes for details about the new features recently introduced.

Merged PRs

  • 335: Km/doc clone bug
  • 333: fix regex
dolt - 0.13.0

Published by oscarbatori over 4 years ago

We are excited to announce the release of Dolt 0.13.0, hot on the heels of relaunching DoltHub.

Easy Install Script

It's now incredibly easy to install Dolt, so if you haven't tried it yet, you can now obtain a copy with a single command, and start playing with datasets:

$ curl -L https://github.com/liquidata-inc/dolt/releases/latest/download/install.sh | bash

The installer script works on Mac and Linux. For Windows, download the MSI installer below.

System Tables

We released a blog post detailing some exciting new functionality for surfacing versioning data in Dolt. This is the first of a set of features that will eventually expose all the Git-like internals of Dolt to SQL, and facilitate automated use of Dolt by allowing users to define their default choices inside SQL statements.

  • dolt_log: Access the same information as the dolt log command via a SQL query
  • dolt_diff_$table: A system table for each of your tables, which lets you query the diff between two commits. See the blog post for more details.
  • dolt_history_$table: A system table for each of your tables, which lets you query past values of rows in the table at any commit in its history. See the blog post for more details.
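For example, assuming a table named persons (the table name is illustrative; see the blog post for the full column layout of the diff and history tables):

```sql
-- Commit history, equivalent to `dolt log`:
SELECT * FROM dolt_log;

-- Row-level changes to `persons` between commits:
SELECT * FROM dolt_diff_persons;

-- Past values of each row in `persons` across its history:
SELECT * FROM dolt_history_persons;
```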

LICENSE and README functionality

We now allow users to create License and Readme documents as part of their Dolt repository; these appear as LICENSE.md and README.md files in the root of your repo. Edit them with the text editor of your choice, then add them to a commit with dolt add, the same as a table. Their contents are versioned alongside your tables' data. License and Readme files will soon be visible on DoltHub for repositories that provide them. Allowing users to specify the terms on which data is available is an important step towards creating a vibrant data-sharing community.

Views

Our SQL implementation now supports persistent views, taking us closer to having a fully functioning SQL engine. Create a view using the standard SQL syntax:

CREATE VIEW myview AS SELECT col1, col2 FROM mytable

Then query it like any other table:

SELECT * FROM myview

Other

We made performance enhancements to SQL, including supporting indexed joins on a table's primary key columns. This should make the engine usable for joins on the primary key columns of two tables. Additional improvements in join performance are in the works. We also fixed assorted bugs and made performance improvements in other areas of the SQL engine.

Merged PRs

  • 331: Removed Windows carriage-return and trailing whitespace from bats tests
  • 329: CSV export compliant with RFC 4180
  • 328: bats/helper/windows-compat.bash: Try mktemp on Windows.
  • 326: one down
    The other 32 skipped bats tests are confirmed to fail
  • 324: Removed old table and schema commands from the command line
  • 323: fix buffered sequence iterator and put it back in row iterator
  • 322: reverting buffered iter
  • 320: bats/creds.bats: Debug windows failures.
  • 319: Added a bats test for committing views and referencing them later
    Added some checks for checked in views.
  • 318: Added test case for dolt reset --hard on new tables
  • 316: Buffered Sequence Iterator
    • Created a new interface sequenceIterator for the use case when sequenceCursor is simply accessing elements in its sequence (i.e. MapIterator, SetIterator, and ListIterator)
    • Created a new buffered implementation of sequenceIterator designed by @reltuk to batch chunk fetching from the ValueStore. In use cases such as DoltHub, where chunk fetching IO is slow, this will dramatically improve performance.
  • 310: Km/non-trivial merge of master into doc feature branch
    This is just a merge from master into my doc feature branch. So you can ignore that there are many commits authored by not-me.
    I wanted to get eyes on the last commit before I merge it (https://github.com/liquidata-inc/dolt/pull/310/commits/d4de25964219e9b038728553de2ff0b298834bce). I had to remove 2 of the HasDoltPrefix checks that was breaking create-views.bats. Now i'm checking for DocTableName explicitly. I left the HasDoltPrefix function since I'm using it in the commands package, and presume we'll eventually need to use it again the sqle package.
  • 308: Updated to latest go-mysql-server. Re-enabled indexes by default, and…
    … un-skipped an integration test of indexed join behavior.
  • 307: go/utils/publishrelease: First pass at an install.sh
  • 306: Bumped go-mysql-server version
  • 305: bats/creds.bats: Some initial bats tests for dolt creds new, ls and rm.
  • 302: Km/doc tests
    This PR:
    • Simplifies tests in docs.bats
    • Adds tests for some helper functions in doltdb/root_val_test.go
      Will do more testing tomorrow, but wanted to get this in
  • 301: dumps docs
    This code dumps the standard command line help pages for every command that isn't hidden.
    Because we only had functions for each command it was difficult to add a new method that would be implemented for each command, so I had to refactor all of that code. The refactor makes up the bulk of the PR.
  • 299: dolt checkout, and merge with dolt docs, with bats coverage
    This PR includes:
    • Fixed a bug where dEnv.Docs was not always matching the docs of the current repo state (working root). This required changing the Docs type in the env package to []doltdb.DocDetails from []*doltdb.DocDetails. You'll see some reformatting to accommodate this change.
    • checkout <doc>
    • checkout <branch>
    • merge <branch> (one scenario is still buggy, need help identifying solution)
    • FF merge - docs on the FS get updated to target branch
    • Merge with conflicts - docs on the FS remain as is
    • Merge auto resolved conflicts (currently buggy) - docs on the FS should be updated to targetBranch, but they should not be added to the new working root. This would allow dolt status to indicate that the doc needs to be added and committed to finish merging. Right now it appears the doc is getting added to the working root.
  • 298: go/cmd/dolt/commands/sql: Add view persistence into dolt database.
  • 296: go/cmd/dolt: credcmds/check: Add dolt creds check command.
  • 295: update go-mysql-server to be the latest from liquidata-inc/go-mysql-s…
    …erver@ld-master
  • 294: Added indexes to dolt sqllogictest harness and updated dependency on …
    …go-mysql-server.
  • 293: go/cmd/dolt/commands/credcmds: Add documentation and a little bit of chrome to dolt creds commands.
  • 291: Fixed ignoring an error in put-row
  • 290: go/go.mod: Run go get -u all. Migrate to dbr/v2.
  • 289: Tim/add docs bats
    This is the test for branch, merge, and conflict resolve behavior. You can break it into multiple tests if you want but I think this is fine.
  • 288: add diff_type column to be able to select where diff_type is added, r…
    …emoved, or modified
  • 287: fixes casing issue with system tables
  • 285: Added bad describe bats test per testing session with Katie
  • 284: {go,bats}: Implement dolt diff by parsing docs from args, with …
    …bats test
  • 283: Zachmu/explain
    Fixed describe table statements, and unskipped related tests.
  • 282: change the date field to be a Sql.DateTime
    Output of the date field was in a format that wasn't able to be sorted properly.
  • 281: fix select on system table that doesn't exist
    Fixes select on a system table that has a valid prefix but whose suffix does not match a valid table.
    What makes this a little bit tough is that you can query diffs or the history of a table that no longer exists, so we need to process the entire history and then see if at any time there was a schema'd table with the given name.
  • 280: {bats, go/libraries/doltcore/sqle/database.go}: Remove DoltNamespace from dolt sql command
  • 279: {go,bats}: Remove DocTableName from dolt table, schema, ls, add, reset, diff
    This PR removes DocTableName from the outstanding commands so we don't expose the dolt docs table.
  • 278: {go,bats}: Add dolt docs to dolt diff
    This PR adds docs to the dolt diff command. Diffing individual docs dolt diff <doc> will land in a subsequent PR.
    Here are some example outputs:
    Removing a file that has already been committed:
    rm LICENSE.md
    dolt diff
    diff --dolt a/LICENSE.md b/LICENSE.md
    deleted doc
    - new license
    
    Adding docs that don't already exist on the staged root:
    touch README.md
    touch LICENSE.md
    dolt diff
    diff --dolt a/README.md b/README.md
    added doc
    diff --dolt a/LICENSE.md b/LICENSE.md
    added doc
    
    Modifying a doc that has already been committed:
    dolt diff
    diff --dolt a/LICENSE.md b/LICENSE.md
    --- a/LICENSE.md
    +++ b/LICENSE.md
    this is my license
    +
    + How to use this repository:
    + Step 1)....
    
  • 274: Zachmu/alter table engine
    Alter table statements on the go-mysql-server engine, with new support for modify column statements.
  • 272: go/store/nbs: Fix CopySource in UploadPartCopy for namespaced aws table persisters.
  • 271: go/store/nbs: Disable persisting table data in dynamodb.
  • 270: Removed some extraneous bash commands I found in bats
  • 269: go/store/util/verbose: Add ability to override Log function.
  • 268: added args check to dolt sql command
  • 267: Updated go.mod to point to newest go-mysql-server
  • 266: fix and unit test
  • 265: fixed bool conversion
  • 264: Small fix to change strings back to LONGTEXT
  • 263: dolt reset --soft and --hard for dolt docs
    This PR includes:
    • Adding docs to dolt reset --hard, with tests
    • Adding docs to dolt reset --soft, dolt reset ., dolt reset, dolt reset <doc>, with tests
    • Refactor doltcore/env/dolt_docs to more general use
    • ...and other helper functions in doltcore/env/environment.go to help stage and unstage docs
  • 262: Include required length value for "varchar" column type in README tutorial
  • 261: Dockerfile: First pass at a simple Dockerfile for building an image with dolt installed.
  • 260: Now referencing newest go-mysql-server version
  • 259: fix csv parsing
    Currently there is a panic if a CSV line contains nothing but whitespace.
  • 256: Andy/sqlweb
    I needed to reuse logic from SQL processing in Dolt, but as written it needed a CLI environment. Did some refactoring to de-couple env.DoltEnv and env.RepoState:
    • Removed env.DoltEnv as a dependency for sqlEngine in commands/sql
    • Removed FileSystem as a dependency in env.RepoState
      👀 that symmetric stat line tho
  • 255: Changed Datasets to clone section
  • 254: Fix diff where clause
    Fixes diff where so that you no longer need to use to_col or from_col.
    --where col=val
    is evaluated as:
    where to_col=val || from_col=val
  • 249: go-mysql-server types
  • 248: bh/skip describe bats
  • 247: {go,bats}: dolt add
    I will change the base on this PR to the feature branch (#222) once #242 is merged
    I updated the base to the feature branch 👍
  • 246: slow history tables
  • 245: change super_schema to use a CommitItr
  • 242: {go,bats}: dolt add . for dolt_docs, with updated dolt status and bats tests
  • 55: Fixed integer overflow when compiling to 32 bit platform
  • 54: Zachmu/join improvements
    Improved indexed join analysis to match more kinds of queries. Added analysis tests to ensure that queries get the expected join plans.
  • 53: Zachmu/multi column index joins
    Multi-column joins. Also bug fixes for order by clauses.
  • 52: sql/plan: {create,drop}_view: Improve ViewDropper and ViewCreator interactions to better support OR REPLACE / IF EXISTS.
  • 51: absolute value function for SQL
  • 50: Standardized error messages for unsupported syntax / features.
  • 49: Bug fix for sorter where both values are nil.
  • 48: Indexed joins
    Indexed joins for single-column indexes. Also:
    • Standardized capitalization of keywords in engine tests
    • Fixed some bugs in non-indexed join logic
    • Refactored and renamed a few things
      This PR exposes (but does not create) a bug in sort logic for float columns. One new test query fails sometimes on tests of parallel query execution, depending on the race outcome. I'll fix that in a separate PR.
  • 47: Zachmu/describe
  • 46: Implemented SET
  • 45: Implemented ENUM
  • 44: Zachmu/alter table
    Alter table implementation
  • 43: Added DECIMAL type
  • 42: Internal TEXT to LONGTEXT
    Changed all of the internal locations where we're using TEXT to now use LONGTEXT.
  • 41: Fixed BETWEEN and IN issues due to type changes
  • 40: sql: Add interfaces for signaling to database when views are created and dropped.
  • 39: sql/analyzer: resolve_views: Make resolving views an independent analyzer rules that runs before resolve_subqueries.
  • 38: sql/plan/create_view.go: Store the original parsed definition of the view, not the analyzed version.
    A problem occurs when things like the resolved table are stored in the view
    definition. This changes the analyze path to analyze the view definition anew
    each time, and stores the unanalyzed view definition in the view registry.
    CREATE VIEW still completely analyzes the view at creation time, in order to
    catch any errors in the view definition.
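The storage scheme described above can be sketched with a toy registry that keeps only the original SQL text, leaving re-analysis to the engine on every reference. The names below (viewRegistry, create, resolve) are hypothetical and not go-mysql-server's actual API:

```go
package main

import "fmt"

// viewRegistry stores the ORIGINAL definition text of each view; the engine
// re-analyzes it on every reference, so stale analyzed state (for example a
// resolved table pointer) is never cached. Hypothetical sketch only.
type viewRegistry struct {
	defs map[string]string // view name -> unanalyzed SQL text
}

func newViewRegistry() *viewRegistry {
	return &viewRegistry{defs: map[string]string{}}
}

// create records the raw definition. A real engine would fully analyze it
// first, purely to surface errors, then discard the analyzed form.
func (r *viewRegistry) create(name, definition string) {
	r.defs[name] = definition
}

// resolve returns the stored text for the analyzer to process anew.
func (r *viewRegistry) resolve(name string) (string, bool) {
	d, ok := r.defs[name]
	return d, ok
}

func main() {
	r := newViewRegistry()
	r.create("recent_logs", "SELECT * FROM logs WHERE ts > NOW() - INTERVAL 1 DAY")
	def, ok := r.resolve("recent_logs")
	fmt.Println(ok, def)
}
```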
  • 37: Zachmu/drop create fixes
    Added support for IF NOT EXISTS and column comments to CREATE TABLE statements
  • 36: Zachmu/more index fixes
    Fixed bugs in the implementation of AscendIndex and DescendIndex for memory index lookups, and fixed a bug in assigning indexes to OR expressions.
  • 35: Fixed null comparison panics in arithmetic operations and added tests
  • 34: Add basic VIEWS support.
    Basically takes https://github.com/src-d/go-mysql-server/pull/860 and adapts it for our sqlparser changes.
  • 33: Better types compliance
    First off, I'm going to list the types that are missing from this PR:
    DECIMAL
    ENUM
    SET
    GEOMETRY
    GEOMETRYCOLLECTION
    LINESTRING
    MULTILINESTRING
    POINT
    MULTIPOINT
    POLYGON
    MULTIPOLYGON
    
    The first three mentioned are the last ones I'm going to implement for now, as the GEOMETRY types are very, very complex. However, DECIMAL proved to be a bit tougher than expected, and I didn't want to delay getting this PR out since I'll be gone over the entire Thanksgiving holiday (technically I'm on it now but I promised a PR by the 25th). Besides the ones listed above, every other type is in this PR in some capacity (NCHAR and friends are missing, but that type can be replicated as it's just a charset/collation shortcut).
    Some notes, because they're important. There are no tests. Changing the types has broken a fair amount, and I'll need to fix those things before I can get started writing tests.
    I'm making use of higher-level interfaces that are specific to some type or group of types. I played around with the idea of adding a new function to the Type interface that returned an attribute map, but decided against it since I wanted to be able to add more than raw attributes, and a map of functions just seemed too complex for the tradeoff.
    Character sets and collations are complete, in that every single supported one in MySQL is here...which is why there are so many. I had many ideas on how to model this, but decided on using string constants as the base so that it's easy to reference them from outside code, and then have multiple maps for each specific attribute of a charset/collation. For example, if an implementer wants to specifically support the hebrew character set, then they can just reference CharacterSet_hebrew, rather than defining their own variable with custom_var, err := ParseCharacterSet("hebrew").
    JSON is actually a very complex type, so I didn't really change anything from the logic that was already present before. This does mean that we don't truly support JSON, but implementing it right may take 1-2 full weeks, which I don't feel is worth it right now.
    It pains me to say, but in the interest of full compliance, I am modeling a BOOLEAN as an int8. But that's how MySQL does it, so it's probably for the best.
    You'll notice that some types have global variables while others don't, and that's because some types don't make sense to reference without other information. Int8 can be a global type because it is fully self-contained in its information. Varchar, OTOH, requires a length to be meaningful.
    On the topic of the string types, if any logic looks weird, then it's to mimic something that MySQL is doing. I did a lot of testing to see what MySQL actually did when the documentation wasn't clear, so some rules seem inconsistent because MySQL is apparently inconsistent. I'm also doing nothing with the character set and collation stuff regarding comparisons, so we just support utf8mb4 as a character set (which mimics Go's string type implementation to the best of my knowledge), and changing it to something else doesn't change any actual behavior. That can come later because that's A LOT to do...
    The TIME type is weird, in that the minimum precision is microseconds, so I chose to base the entire type around that. All of the parseable variations are present as well. I was initially going to use time.Duration from Go, but it actually didn't suit the requirements. Working with it turned out to be more difficult than just using microseconds. For example, the only way to make a duration is to parse a string of a specific format, which would be parsing a string, then recreating a different string to parse, which is too much.
    YEAR is the most useless type I've ever encountered in anything, but hey it's included and it's fully supported now, hooray...
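The "base everything on microseconds" design for TIME can be sketched as a small parser that converts "HH:MM:SS[.ffffff]" into a microsecond count. parseTimeMicros is a hypothetical helper for illustration; dolt's real TIME parsing handles many more input variants (negative times, abbreviated forms, and so on):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTimeMicros converts "HH:MM:SS[.ffffff]" into a microsecond count,
// the minimum TIME precision described above. Illustrative sketch only.
func parseTimeMicros(s string) (int64, error) {
	frac := int64(0)
	if dot := strings.IndexByte(s, '.'); dot >= 0 {
		f := s[dot+1:]
		s = s[:dot]
		// Right-pad to 6 digits so ".5" means 500000 microseconds.
		for len(f) < 6 {
			f += "0"
		}
		v, err := strconv.ParseInt(f[:6], 10, 64)
		if err != nil {
			return 0, err
		}
		frac = v
	}
	parts := strings.Split(s, ":")
	if len(parts) != 3 {
		return 0, fmt.Errorf("expected HH:MM:SS, got %q", s)
	}
	var hms [3]int64
	for i, p := range parts {
		v, err := strconv.ParseInt(p, 10, 64)
		if err != nil {
			return 0, err
		}
		hms[i] = v
	}
	return (hms[0]*3600+hms[1]*60+hms[2])*1_000_000 + frac, nil
}

func main() {
	us, err := parseTimeMicros("01:02:03.5")
	fmt.Println(us, err) // 3723500000 <nil>
}
```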
  • 32: Zachmu/unary plus
  • 31: bh/lazy loading
  • 30: Zachmu/fix pushdown
    Bug fixes for table pushdowns and index lookups:
    1. Failing to renumber fields in a filter after projecting a subset of columns onto tables in some cases
    2. Incorrectly applying index lookups to OR clauses involving more than one table, which inappropriately restricts the indexed table to only matching values.
      Added a couple test tables to engine_test, and broke out the tables used for information schema into their own set of test definitions.
  • 28: Zachmu/insert update delete perf
  • 27: Fixed replace tests (broken by my changes to type handling code in Up…
    …date)
  • 26: DATETIME handling & microsecond support
  • 25: Zachmu/better type checking
    Better type checking for inserts (results in an error message before any inserts are attempted)
  • 24: Zachmu/insert into select
    First draft. Works, but error messages could be better. Working on an extension to the Type so that we can compare column types for compatibility (right now type mismatches fail at execution with e.g. "expecting int64 but got string" messages)
  • 23: Removed unused variable assignments
  • 22: Zachmu/index or bug fix
    Fixed a bug in indexing code that would cause queries to return incorrect results. Along the way, implemented index capabilities for the in-memory tables (only for correctness verification), and expanded engine_test to test every combination of indexes, partitions, and parallelism.
  • 21: Fixed PK NN behavior to match MySQL
    Primary Keys should always be NOT NULL according to the MySQL documentation.
    https://dev.mysql.com/doc/refman/8.0/en/create-table.html#idm139638853954816
  • 20: Fixed test error
  • 19: PR feedback: added engine tests, removed Alterable interface, added c…
    …omments.
  • 18: Zachmu/drop table
    Drop and create table support, as well as support for 24-bit integers.
  • 17: Throw an error on CREATE VIEW statements, which were parsing as CREAT…
    …E TABLE with an empty table spec prior to this change.
  • 16: Zachmu/drop table 2
    Drop and create table
  • 14: Compare nulls
  • 13: Implemented UPDATE
    Take a look at this. I have yet to implement all of the tests, as this is mainly to check the logic to make sure that I'm on the correct track and all. First time working with their pipeline stuff, it's kind of interesting.
  • 12: Implemented REPLACE
    I decided to go with the Delete then Insert way instead of adding a Replacer interface. The insert-only functionality in the event of no primary key isn't what the "expected" behavior would be in my opinion, and since we don't have a way to check from this library, I think we should just do it this way instead. On the other side, if we have a Replacer interface, then we can't ensure that the implementer will properly choose whether it should be insert-only. I'd rather enforce a behavior that will be applicable and correct in the most obvious case, and it reduces the necessary code from implementers too since they won't need to duplicate any trigger functionality or anything.
  • 11: Deletes
    DELETE has been implemented!
  • 10: Fixes for column duplicates and existence
    Fixes for both the duplicate columns not erroring and also for invalid column names.
  • 9: Windows support
  • 8: Insert Fixes
    Work so far for fixing issues in imports.

Closed Issues

  • 312: Creating a column with type Null
  • 251: dolt 0.12.0 schema import crash
dolt - 0.12.0

Published by oscarbatori almost 5 years ago

We are excited to announce the release of Dolt 0.12.0!

Community

We have our first open-source committer to the Dolt project! Thanks to @namdnguyen for providing a helpful fix to our documentation. We are hoping this will be the first of many open-source contributions to Dolt.

SQL

As discussed in this blog post, we use sqllogictest to test our SQL implementation's logical correctness. It contains 5 million tests! This release marks a huge jump in compliance, with our implementation now hitting 89%, up from well under 50% just a few weeks ago.

Diff With Predicate

The --diff-where option allows the user to add a predicate on the table being diffed, reducing the surface area of the diff output and letting the user drill into specific data of interest.

Override Commit Date

When a user commits data, a timestamp is associated with the commit. By letting users customize that timestamp, Dolt supports an implicit bi-temporal database (based on commit time) while maintaining the ordinal integrity of the commit graph for querying history and reasoning about the sequence of updates.

SQL Diffs

Using the SQL diff command, that is dolt diff -q or dolt diff --sql, users can produce SQL output that will transform one branch into another. In other words, this command produces the difference, as data and schema transformations, between two refspecs in the commit log of a Dolt repository.

As usual, this release also contains bug fixes and performance improvements. Please create an issue if you have any questions or find a bug.

Merged PRs

  • 241: Bumped version and added release script
  • 239: bats/create-views.bats: Pick up go-mysql-server support for views.
  • 237: go/performance/benchmarks: remove id from results
  • 236: Noticed an alter table test that now works was skipped. Unskipped.
  • 235: Fix typo in README for table import
    I ran into this typo while using Dolt yesterday. The command keywords were in the incorrect order in the README.
  • 233: Zachmu/sql batch
    Killed off original sql batch inserter and implemented equivalent functionality for new engine.
  • 232: Andy/sqldiffrefactor
  • 230: go/store/nbs: table_set.go: Rebase: Reuse upstream table file instances when supplied table specs correspond to them.
  • 229: fix schema diff primary key changes
    Output looks like this for changing a pk:
    --- a/test @ 4uvb6bb3p7dqudnuidh9oh4ccsehik7n
    +++ b/test @ 2tl4quv92ot0jg4v3ai204rld00trbo4
    CREATE TABLE test (
    `pk` BIGINT NOT NULL COMMENT 'tag:0'
    -   `c1` BIGINT COMMENT 'tag:1'
    `c2` BIGINT COMMENT 'tag:2'
    `c3` BIGINT COMMENT 'tag:3'
    `c4` BIGINT COMMENT 'tag:4'
    `c5` BIGINT COMMENT 'tag:5'
    <    PRIMARY KEY (`pk`, `c1`)
    >    PRIMARY KEY (`pk`)
    );
    
    Also adds the primary key constraint so it shows even when it is not changing:
    --- a/test @ idfqe6c5s2i9ohihkk4r4tj70tf3l8c7
    +++ b/test @ 2tl4quv92ot0jg4v3ai204rld00trbo4
    CREATE TABLE test (
    `pk` BIGINT NOT NULL COMMENT 'tag:0'
    `c1` BIGINT COMMENT 'tag:1'
    `c2` BIGINT COMMENT 'tag:2'
    <          `c3` BIGINT COMMENT 'tag:3'
    >   `newColName3` BIGINT COMMENT 'tag:3'
    `c4` BIGINT COMMENT 'tag:4'
    `c5` BIGINT COMMENT 'tag:5'
    PRIMARY KEY (`pk`, `c1`)
    );
    
  • 228: bh/add commit date
  • 227: Bug fixes for sqllogictest dolt harness:
    • More inclusive types
    • Better error handling for panics
    • Cheat on tables without primary keys to allow more tests (~40%) to succeed.
  • 225: disable benchmarking dolt sql imports
  • 224: go/cmd/dolt: commands/sql: Keep the sql engine around throughout the lifetime of the shell / batch import.
  • 221: improved super schema names
  • 220: update go-mysql-server dependency
  • 219: dolt benchmarking
    Initial approach is to write a script that will run n benchmarks, collect their results, then serialize those results to later be imported into dolt. Looking for feedback on approach before I head too far down this path, if it is suboptimal.
    In its current state, there are a lot of switch statements and panics and it only accounts for types int and string and only accounts for .csv style test data formats, but I'd like to make my data generation functions robust enough to be able to account for all file formats that dolt supports and all noms types...
  • 218: Added skipped bats test for schema diffs on adding a primary key
  • 216: Andy/sqlschemadiffs
    Adding schema changes to dolt diff --sql output. Supports:
    • add/drop table
    • add/drop column
    • rename table
    • rename column
  • 215: diff table
  • 214: Bh/super schema
  • 213: Zachmu/sql performance
  • 212: Zachmu/sql indexes2
  • 211: Added time to the handled cases in DATETIME & changed tests
    This won't compile until https://github.com/liquidata-inc/go-mysql-server/pull/26 is referenced in go.mod.
dolt - Dolt 0.11.0 released

Published by oscarbatori almost 5 years ago

We are excited to announce the release of Dolt 0.11.0.

SQL

System Table

We implemented a dolt log table, our first attempt to surface Dolt version-control concepts in SQL by exposing commit data. This will allow users to leverage commit data in an automated setting via SQL. Clone a public repo to see how it works:

$ dolt clone Liquidata/ip-to-country
$ cd ip-to-country
$ dolt sql -q "select * from dolt_log"
$ dolt sql -q "select committer,date from dolt_log order by date desc"
+-------------+--------------------------------+
| committer   | date                           |
+-------------+--------------------------------+
| Tim Sehn    | Wed Sep 25 12:30:43 -0400 2019 |
| Tim Sehn    | Wed Sep 18 18:27:02 -0400 2019 |
.
.
.

Timestamps

We added support for DATETIME data type in SQL. This is a major milestone in achieving compatibility with existing RDBMS solutions.

Performance

We continue to rapidly improve our SQL implementation. On the performance side, some degenerate query cases saw large improvements. We also resolved some issues where update statements had to be "over parenthesized", with the parser now matching the standard.

Other

We now support null values in CSV files imported via the command line, and made minor bug fixes under the hood.

If you find any bugs, or have any questions or feature requests, please create an issue and we will take a look.

Merged PRs

  • 208: go/libraries/doltcore/row: tagged_values.go: Fix n^2 behavior in ParseTaggedValues.
    ParseTaggedValues used to call Tuple.Get(0)...Tuple.Get(n), but Tuple.Get(x)
    has O(n) perf, so the function did O(n^2) decoding work to decode a tuple.
    Use a TupleIterator instead.
  • 206: go/store/types: Improve perf of value decoding for primitive types.
    This fixes a performance regression in value decoding after the work to make it easier to add primitive types to the storage layer.
    First, we change some map lookups into slice lookups, because hashing the small integers on hot decode paths dominates CPU profiles.
    Next, we inline logic for some frequently used primitive types in value_decoder.go, as opposed to going through the table indirection. This is about a 30% perf improvement for linear scans on skipValue(), which is worth the duplication here.
    Code for adding a kind remains correct if the decoder isn't changed to include an inlined decode path. We omit inlining UUID and InlineBlob here for now.
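The map-to-slice change above can be sketched as follows: a map dispatch hashes the kind value on every hot-path decode, while a slice indexed by the kind byte is a plain array load. The kind names and toy decoders here are illustrative only, not the storage layer's actual types:

```go
package main

import "fmt"

type kind uint8

const (
	intKind kind = iota
	stringKind
	numKinds
)

type decodeFn func([]byte) interface{}

func decodeInt(b []byte) interface{}    { return int64(b[0]) } // toy decode
func decodeString(b []byte) interface{} { return string(b) }

// Map lookup: every decode on a hot path pays to hash the kind value.
var decodeByMap = map[kind]decodeFn{
	intKind:    decodeInt,
	stringKind: decodeString,
}

// Slice lookup: the kind byte indexes directly into an array; no hashing.
// Entries are positional: index 0 is intKind, index 1 is stringKind.
var decodeBySlice = [numKinds]decodeFn{decodeInt, decodeString}

func main() {
	payload := []byte("hello")
	fmt.Println(decodeByMap[stringKind](payload))   // hello
	fmt.Println(decodeBySlice[stringKind](payload)) // hello
}
```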
  • 199: Km/redo import nulls
  • 198: checkout a remote only branch
  • 196: Bh/log table
  • 195: Added Timestamp to Dolt and Datetime to SQL
    Have a look!
    I ran into an import cycle issue that I just could not figure out how to avoid, except by putting the tests into their own test folder (sqle/types/tests), so that's why they're in there. In particular, the cycle was that sqle imports sqle/types, and the tests rely on (and thus must import) sqle, causing the cycle.
    I'm thinking of adding tests for the other SQL types later so that we have a few more built-in tests using the server portion, rather than everything using the -q pathway. That will be a different/future PR though.
  • 193: diff where and limit
  • 191: fix branch name panic with period
    Looked into supporting periods in branch names, but it looks like noms relies on periods specifically pretty heavily. Periods seem to be excluded from the regex below by design, since noms builds some types on the expectation that a branch name or ref does not contain a period.
    My understanding is that a user's branch name is used to look up a particular dataset within the noms layer and this variable (go/store/datas/dataset.go):
    // DatasetRe is a regexp that matches a legal Dataset name anywhere within the
    // target string.
    var DatasetRe = regexp.MustCompile(`[a-zA-Z0-9\-_/]+`)
    
    acts as the regex source of "truth" for branch name / dataset lookups, and I believe more.
    Noms also expects to be able to append a . to this string in order to parse the string later and correctly create its Path types...
    I went down a rabbit hole trying to change all of the noms Path delimiters to be a different character, but the changes go pretty deep and start breaking a lot of things. Happy to continue down that course in order to support periods in branch names, but it might take me a bit of time to change everything. I'm also not sure what character should replace the period... asterisk? Anyway, this PR seemed like a low-hanging-fruit fix to resolve the panic at least.
  • 190: Missed Kind to String
    In my last PR, it looks like I missed that we were using the old DoltToSQLType hardcoded map from the original SQL implementation. I didn't change it everywhere (it's used heavily in the old SQL code that isn't even being called anymore), but it's changed where it matters. Added a new interface function and changed the printing code to be a bit more consistent (we were mixing uppercase with lowercase).
    I'm also returning different values, such as BIGINT for sql.Int64, as int parses in MySQL to a 32-bit integer, which isn't correct. Essentially made it so that if you took the CREATE statement exactly as-is and exported your data to a bunch of inserts and ran it in MySQL then it wouldn't error out, as it previously would have.
  • 189: diff source refactor
  • 188: go/cmd/git-dolt/README.md: Add comparison to git-lfs and note on updates
  • 187: Remove skip on test for / in branch names. Added a skipped test for .…
    … in branch names. . in branch names panics right now.
  • 186: go/go.mod: Pick up sqlparser improvements for ADD COLUMN, RENAME COLUMN. Fix some tests.
  • 185: Moved command line SQL processing to use new engine for CREATE and DROP
  • 183: add appid to logevents requests
    Need to update the requests so that they reflect the current proto definitions.
  • 182: Moved SQL types to an interface
    Have a look! Just make an empty struct type that implements SqlTypeInit and add the struct to sqlTypeInitializers and you've got a type that works in SQL now!
  • 179: clone reliability
  • 178: go/store/nbs: store.go: Be more careful about updates to nbs field values until all operations have completed successfully.
  • 177: go/cmd/dolt: Bump version to 0.10.0

Closed Issues

  • 194: Wide tables create poor query performance
dolt - 0.10.0

Published by mjesuele almost 5 years ago

We are excited to announce the latest release of Dolt, which includes a new feature, substantial improvements to existing features, and a new Windows installer.

Dolt Blame

Dolt now has a blame command, which provides row audit functionality familiar to Git users. (Blame for individual cells is in the works.) We have a deep dive on the implementation of dolt blame on our blog, so definitely check that out if you're interested.

One of the long-term goals of Dolt is to provide a database with fine-grained audit capabilities to support hygienic management of valuable human-scale data, and this feature is a huge step towards realizing that vision.

SQL Enhancements

One of our major goals for the product is full SQL compliance; this release contains steps towards achieving that. In particular, the following commands are now supported:

  • CREATE TABLE
  • DROP TABLE
  • INSERT VALUES & INSERT SET (no IGNORE or ON DUPLICATE KEY UPDATE support yet, also no INSERT SELECT support yet)
  • UPDATE (Single table, no IGNORE support yet)
  • REPLACE VALUES and REPLACE SET (no REPLACE SELECT support yet)

As well as making progress against our goal of full compliance, we also created a test suite that will help validate our SQL implementation. Check out the test suite and harness, and the related blog post. This is an important step in creating a fully transparent mechanism for our progress against our compliance goal.

We also fixed some bugs and made some performance improvements.

Schema Import

We now support schema inference from a CSV file. This is a convenience function to make importing a CSV with a correct schema easier. The command is best understood by looking at the help details:

$ dolt schema import --help
NAME
	dolt schema import - Creates a new table with an inferred schema.

SYNOPSIS
	dolt schema import [--create|--replace] [--force] [--dry-run] [--lower|--upper] [--keep-types] [--file-type <type>] [--float-threshold] [--map <mapping-file>] [--delim <delimiter>] --pks <field>,... <table> <file>

Windows Installer Packages

We now provide both 32- and 64-bit MSI packages for easy installation of Dolt on Windows. These may be used instead of manually extracting the archives (which are now provided in .zip format instead of .tar.gz). Please let us know if you encounter any issues.

Other

Various bug fixes and enhancements, including improvements to dolt clone, which had a problematic race condition.

As always, bug reports, feedback, and feature requests are very much appreciated. We hope you enjoy using Dolt!

Merged PRs

  • 175: {bats, go}: Make commit spec truly optional in blame
    $ dolt blame lunch-places
    +--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
    | NAME               | COMMIT MSG                                         | AUTHOR          | TIME                         | COMMIT                           |
    +--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
    | Boa                | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Chipotle           | lunch-places: Added Chipotle                       | katie mcculloch | Thu Aug 29 11:38:00 PDT 2019 | m2jbro89ou8g6rv71rs7q9f3jsmjuk1d |
    | Sidecar            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Wendy's            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Bangkok West Thai  | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Jamba Juice        | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Kazu Nori          | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | McDonald's         | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Sunnin             | change rating                                      | bheni           | Thu Apr  4 15:43:00 PDT 2019 | 137qgvrsve1u458briekqar5f7iiqq2j |
    | Bruxie             | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Espresso Cielo     | added Espresso Cielo                               | Matt Jesuele    | Wed Jul 10 12:20:39 PDT 2019 | 314hls5ncucpol2qfdphf923s21luk16 |
    | Seasalt Fish Grill | fixed ratings                                      | bheni           | Thu Apr  4 14:07:36 PDT 2019 | rqpd7ga1nic3jmc54h44qa05i8124vsp |
    | Starbucks          | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Tocaya             | update tocaya rating                               | bheni           | Thu Jun  6 17:22:24 PDT 2019 | qi331vjgoavqpi5am334cji1gmhlkdv5 |
    | Sake House         | fixed ratings                                      | bheni           | Thu Apr  4 14:07:36 PDT 2019 | rqpd7ga1nic3jmc54h44qa05i8124vsp |
    | Swingers           | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Art's Table        | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Bay Cities         | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Benny's Tacos      | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Bibibop            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Curious Palate     | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    | Meat on Ocean      | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
    +--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
    
  • 174: Update dolt blame description
  • 170: Skipping the right SQL test that has a hanging race condition (joins …
    …on legacy engine, code to be deleted soon)
  • 168: Bh/correctness fixes
  • 166: clone bug fix
  • 165: README.md: Remove Tim's username from shell prompt
  • 161: Zachmu/sql logictest
    Improved main method for running or parsing sqllogic tests.
  • 160: Bh/upload error checking
  • 159: Zachmu/sql logictest
    Removed code for sqllogic test and took a dependency on the new module instead.
  • 158: Disabling sql server tests on linux, since they appear to be hanging …
    …waiting for the server to start
  • 157: Basic dolt blame
    This still needs some more BATS tests and maybe some UI touches like what @Hydrocharged suggested, but overall I think it's ready for some eyes.
    The biggest thing I don't like is the logic surrounding pretty-printing of primary keys (you'll see why) but please do tell me if you notice other things that are janky.
    In general, feedback and suggestions are very welcome.
  • 156: Bh/schema import
  • 155: Zachmu/sql logictest
    Implementation of sqllogictest for dolt on go-mysql-server. After getting this merged, I plan to fork off the non-dolt portions to a separate repo.
  • 154: Bumping dependency of go-mysql-server to head of ld-master branch. Al…
    …so fixing several issues that come up when doing so.
  • 152: go/store/nbs: Add recover in table_set Rebase goroutines. (Saw a SIGSEGV which crashed doltremoteapi).
  • 148: Added new InlineBlob type
    This turned out to be far smaller than I thought as far as changes go. This types change might be simpler than I first thought! I probably felt it was harder just because finding all of these locations was a major pain...
  • 146: proto/dolt/services/eventsapi: Adopt the version of eventsapi that lives in ld repo instead of here.
  • 145: Update README.md
  • 144: Miscellaneous small changes
    Just some small stuff I cherry-picked out of the dolt blame PR I'm gonna drop tomorrow. In particular, I use getCommitSpec in that.
  • 142: fixes batch building issue
  • 140: go/libraries/doltcore/doltdb/doltdb.go: Change initial commit message to imperative tense
    This is the style recommended for contributions to Git itself and used by GitHub by default:
    image
  • 137: Added skipped bats test for the non-existent primary key bug
  • 135: fix tag number on pb remote_url_scheme
  • 134: Changed IsValidTableName to allow single character table names, added…
    … tests
  • 132: updated schema command text
  • 131: bats: Factor setup and teardown logic out of tests, greatly reducing duplication
    This has been bothering me for a while.
  • 130: fix active remote url to only save the url scheme
  • 129: Batch inserts handle backslashes better
    Added a new bats test to demonstrate, but originally the function didn't handle \\'' properly, and now it does. Had to add 4 backslashes on the test because bash does its own escaping before passing it to dolt.
  • 128: bh/schema sub commands
    This is a refactor of the existing dolt schema functionality. Previously dolt schema had many different ways it could be run:
    "[<commit>] [<table>...]",
    "--export <table> <file>",
    "--add-column [--default <default_value>] [--not-null] [--tag <tag-number>] <table> <name> <type>",
    "--rename-column <table> <old> <new>",
    "--drop-column <table> <column>",
    
    These have all been split into separate commands:
    dolt schema show
    dolt schema export
    dolt schema add-column
    dolt schema rename-column
    dolt schema drop-column
  • 127: Increased the length of the buffer to read from STDIN for SQL stateme…
    …nts.
    My first Golang code.
  • 126: Fixed not returning error on invalid batch inserts
    Fixes Bats test as introduced in: https://github.com/liquidata-inc/dolt/pull/125
  • 125: Added skipped bats test for sql shell continuing after bad statement …
    …and not exiting 0
  • 124: Added skipped bats test for piped sql
  • 123: Fix cell count function for diff summary cells modified
  • 122: Fixed SQL batch insert breaking on semicolons in strings
    Fixes SQL bug as found in new Bats test introduced here: https://github.com/liquidata-inc/dolt/pull/124
  • 118: README.md: Minor updates
  • 117: Bh/diff summary opt

Closed Issues

  • 120: Semicolon bug in piped sql
dolt - 0.9.9

Published by reltuk about 5 years ago

Contained in this release

  1. remote performance improvements (clone, push, and pull)
  2. better support for MySQL in server mode, including DROP, UPDATE, INSERT
  3. SQL performance improvement
  4. diff summary
  5. more metrics
  6. other assorted bug fixes and improvements

If you find any bugs, have a feature request, or an interesting use-case, please raise an issue.

Merged PRs

  • 114: go/libraries/doltcore/sqle: types: Make SqlValToNomsVal compile for 32bit by checking for overflow on uint -> int64 differently.
  • 112: Zachmu/drop table
  • 110: go/utils/checkcommitters: Oscar is an allowed committer and author.
  • 109: attempted deadlock fix
  • 108: Correct the installation instructions
  • 105: dolt diff --summary
    Example output using Liquidata/tatoeba-sentence-translations:
    $ dolt diff --summary rnfm50gmumlettuebt2latmer617ni3t
    diff --dolt a/sentences b/sentences
    --- a/sentences @ gd1v6fsc04k5676c105d046m04hla3ia
    +++ b/sentences @ 2ttci8id13mijhv8u94qlioqegh7lgpo
    7,800,102 Rows Unmodified (99.99%)
    15,030 Rows Added (0.19%)
    108 Rows Deleted (0.00%)
    960 Rows Modified (0.01%)
    1,888 Cells Modified (0.00%)
    (7,801,170 Entries vs 7,816,092 Entries)
    diff --dolt a/translations b/translations
    --- a/translations @ p2355o6clst8ssvr9jha2bfgqbrstkmm
    +++ b/translations @ 62ri8lmohbhs1mc01m9o4rbvj6rbl8ee
    5,856,845 Rows Unmodified (90.91%)
    468,173 Rows Added (7.27%)
    578,242 Rows Deleted (8.98%)
    7,626 Rows Modified (0.12%)
    7,626 Cells Modified (0.06%)
    (6,442,713 Entries vs 6,332,494 Entries)
    
    Fixes #77
  • 104: Bh/output updates3
  • 103: dolt/go/store: Stop panicking on sequence walks when expected hashes are not in the ValueReader.
  • 101: go/{store,libraries/doltcore/remotestorage}: Make the code peddling in nbs table file formats a little more explicit about it.
  • 100: newline changes
  • 99: Implemented UPDATE
    I think we should delete the old SQL methods that are in the sql.go file. I know at first you mentioned keeping them there for reference, but they're not being used at all at this point, and they're still in git history if we want to look at them again in the future for some reason. It's clutter at this point.
    I'm skipping that one test at the end because of a WHERE decision in go-mysql-server. The code looks intentional, in that converting strings to ints will return 0 if the string is not parsable. I'll file it as a non-conforming bug on their end, but for now I'm skipping the test.
  • 98: Bh/output updates
  • 97: store/{nbs,chunks}: Make ChunkStore#GetMany{,Compressed} take send-only channels.
  • 96: update status messages for push/pull
  • 94: Update README.md
    Ensure that installing from source is properly documented, including go-gotchas.
  • 93: Reverts the revert of my push/pull changes with fixes.
  • 92: content length fix
  • 91: go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation.
    When we do things like push, pull or (soon-to-be) garbage collection, we have large sets of chunk addresses that we pass into ChunkStore#GetMany and then go off and process. Clients largely try to control the memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the implementation of GetMany will keep an amount of data in flight at any given time that is reasonably proportional to the channel size.
    In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one go routine spawned for each byte range we want to read. This results in absolutely massive (virtual) heap utilization and unreasonable I/O parallelism and context switch thrashing in large repo push/pull situations.
    This is a small PR to change the concurrency paradigm inside getManyAtOffsetsWithReadFunc so that we only have 4 concurrent dispatched reads per table_reader instance at a time.
    This is still not the behavior we actually want.
    • I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of tableReaders), and not depend on the number of tableReaders which happen to back the chunk store.
    • Memory overhead is still not correctly bounded here, since read ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
      I'm landing this as a big incremental improvement over status quo. Here are some non-reproducible one-shot test results from a test program. The test program walks the entire chunk graph, assembles every chunk address, and then does a GetManyCompressed on every chunk address and copies their contents to /dev/null. It was run on a ~10GB (compressed) data set:
      Before:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 16628128568
    161.29 real        67.29 user       456.38 sys
    5106425856  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    10805008  page reclaims
    23881  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    8  signals received
    652686  voluntary context switches
    21071339  involuntary context switches
    
    After:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 4590759160
    32.17 real        30.53 user        29.62 sys
    4561879040  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    1228770  page reclaims
    67100  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    14  signals received
    456898  voluntary context switches
    2954503  involuntary context switches
    
    On these runs, sys time, wallclock time, vm page reclaims and virtual memory used are all improved pretty substantially.
    Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.
  • 90: Implemented REPLACE
    Mostly tests, since this just uses the Delete and Insert functions we already have. The previous Delete ignored deletes of non-existent rows; I changed it to return the correct error when the row does not exist, so REPLACE now works properly (otherwise it would always report that a REPLACE did both a delete and an insert).
  • 89: Push and Pull v2
  • 88: Add metrics attributes
    Similar to previous PR db/event-metrics, but this time, no byte measurements on clone as the implementation is different. Some things in the events package have been refactored to prevent circular dependencies. Adding StandardAttributes will help me generate the info for my new metrics.
  • 87: {go, bats}: Replace table works with file with schema in different order
  • 86: dolt table import -r
    Fixes #76
    Replaces existing table with the contents of the file while preserving the original schema
  • 85: Bh/cmp chunks
  • 84: revert nil check and always require stats to match aws behavior
  • 83: Bh/clone2
    This version of clone works on the table files directly. It enumerates all the table files and downloads them. It does not inspect the chunks as v1 did.
  • 82: Naked deletes now just delete everything instead of iterating
    I mean this works but it's ugly and I'm not sure of a better way to do it really
  • 81: Progress on switching deletes to new engine
    Currently works for deletes, but not yet thoroughly tested.
  • 80: go/store/nbs: store.go: Make global index cache 64MB instead of 8MB.
  • 79: Removed skips for tests that will now work
    This will fail for now, waiting on https://github.com/liquidata-inc/go-mysql-server/pull/10 to be approved before I merge this in. Super small stuff though.
  • 73: go/libraries/doltcore/remotestorage: Add the ability to have a noop cache on DoltChunkStore.
  • 72: proto: Use fully qualified paths for go_packages.
    This allows cross-package references within proto files to work appropriately.
  • 71: Db/events dir lock
    initial implementation of making event flush concurrency safe
  • 70: go/store/spec: Move to aws://[table:bucket] for NBS on AWS specs because of Go URL parsing changes.
    See https://go.googlesource.com/go/+/61bb56ad63992a3199acc55b2537c8355ef887b6
    for context on the changes.
  • 69: proto: remotesapi: chunkstore: Update message names and fields to clarify between chunk hashes on downloads and table file hashes on uploads.
  • 68: doltcore: commitwalk: Implement GetDotDotRevisions.
    Roughly mimics git log master..feature. Useful for displaying the commit log
    of a pull request, for example.
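    A two-dot revision walk of this kind can be sketched as: collect every commit reachable from master, then emit the commits reachable from feature that are not in that set. The commit type and function names below are hypothetical stand-ins, not Dolt's actual API:

```go
package main

import "fmt"

// commit is a minimal stand-in for a commit with parent links.
type commit struct {
	name    string
	parents []*commit
}

// ancestors returns the set of commits reachable from c, including c.
func ancestors(c *commit) map[*commit]bool {
	seen := map[*commit]bool{}
	stack := []*commit{c}
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[cur] {
			continue
		}
		seen[cur] = true
		stack = append(stack, cur.parents...)
	}
	return seen
}

// dotDot mimics `git log master..feature`: commits reachable from
// feature but not from master.
func dotDot(master, feature *commit) []string {
	exclude := ancestors(master)
	var out []string
	for c := range ancestors(feature) {
		if !exclude[c] {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	base := &commit{name: "base"}
	master := &commit{name: "master", parents: []*commit{base}}
	f1 := &commit{name: "f1", parents: []*commit{base}}
	f2 := &commit{name: "f2", parents: []*commit{f1}}
	fmt.Println(len(dotDot(master, f2))) // 2: f1 and f2
}
```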
  • 67: Add file emitter that writes event data file
    Added file emitter that saves event data to files, and a flush that parses the files and sends them to the grpc server.
  • 63: Update README.md
    @timsehn pointed out a shortcoming in the README file.
  • 7: Merge upstream master
  • 6: Fixed bug in comparisons for negative float literals
  • 5: Zachmu/is true
  • 4: Instead of adding offset to rowCount, just reverse the wrapping betwe…
    …en offset and limit nodes.
  • 3: Zachmu/float bugfixes
  • 2: Zachmu/limit bug fixes
  • 1: Replace of vitess dependency with our forked one, and commented local…
    … override

Closed Issues

  • 106: Installation instructions are incorrect
  • 95: dolt push segmentation fault
  • 77: dolt diff --summary
  • 76: dolt table import -r
  • 75: DoltHub: Add repo size to Dataset detail page
dolt - 0.9.8

Published by reltuk about 5 years ago

We have released version 0.9.8 of Dolt, which as you probably know is now open source. A quick reminder that you can freely host the awesome public data you put in Dolt at DoltHub.

This release contains performance improvements and bug fixes, but no major new features. Please let me know if you have any questions.

Merged PRs

  • 60: bump version
  • 57: Added a PID to a directory. This was causing jenkins on windows to fa…
    …il if it ran twice on the same instance.
  • 55: {bats,go}: Log successful commits
    This closes https://github.com/liquidata-inc/ld/issues/1744
    Before:
    $ dolt commit -m "commit ints"
    
    After:
    $ dolt commit -m "commit ints"
    commit 3cvbeh6bn94hlhfaig5pa65peiribrhn
    Author: Matt Jesuele <[email protected]>
    Date:   Mon Aug 26 19:10:17 -0700 2019
    commit ints
    
  • 50: add dustin to approved committers/authors
  • 49: [WIP] Add client events to dolt commands
    Added events to all of the dolt commands.
    Turned logging back on while I work on this PR. (will remove before merge)
    I need to write tests for these; should I create a test file for each command file, testing that the command has an event and the appropriate metrics? Would love input on this.
  • 48: client events
  • 47: Threading context from app launch
  • 46: Add client_event.proto and compiled .go file
  • 45: Add support to get the last modified time from the filesys
  • 44: Changed default remote host to use the env constant
    Before we were using dolthub.com as the default, which is incorrect. I've changed it to the appropriate environment constant so that it also properly updates when we change from our beta domain.
  • 43: Created skipped test for newlines on CSV
  • 42: README.md: Remove erroneous go install instructions.
  • 41: Make the InMemFS thread safe
    The current InMemFS was failing in a multithreaded context because it edits a map, which is not thread-safe. Something to note is that Go locks are not re-entrant; some of the refactoring follows from that. Locks are taken in the exported methods, not the internal methods.
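    The locking pattern described (locks taken only in exported methods, because Go mutexes are not re-entrant) can be sketched like this; memFS and its methods are hypothetical stand-ins for the real InMemFS:

```go
package main

import (
	"fmt"
	"sync"
)

// memFS is a toy in-memory file map. Go's sync.Mutex is not re-entrant,
// so the lock is taken only in exported methods; internal helpers assume
// the caller already holds it.
type memFS struct {
	mu    sync.Mutex
	files map[string][]byte
}

func newMemFS() *memFS { return &memFS{files: map[string][]byte{}} }

// WriteFile is the exported, locking entry point.
func (fs *memFS) WriteFile(path string, data []byte) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.writeFileLocked(path, data)
}

// MoveFile needs two map edits; calling WriteFile from here would
// deadlock on the non-re-entrant mutex, so it uses the unlocked helper.
func (fs *memFS) MoveFile(oldPath, newPath string) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	data := fs.files[oldPath]
	delete(fs.files, oldPath)
	fs.writeFileLocked(newPath, data)
}

// writeFileLocked must only be called with fs.mu held.
func (fs *memFS) writeFileLocked(path string, data []byte) {
	fs.files[path] = data
}

func main() {
	fs := newMemFS()
	fs.WriteFile("a.txt", []byte("hi"))
	fs.MoveFile("a.txt", "b.txt")
	fmt.Println(string(fs.files["b.txt"])) // hi
}
```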
  • 40: Fixed JSON imports and disallowed schemas on import updates
    Fixes #36
  • 39: Add move file functionality to the filesys package
  • 38: Fixes a panic that occurs if multiple bad rows are found during import
    When a pipeline is being run, any stage can write to the bad row channel when an error is encountered. A goroutine reads from this channel and will not exit until the channel is closed or an error is encountered. In typical operation the pipeline's sink closes the bad row channel once the pipeline finishes (either via an error-triggered stoppage or successful completion). However, when multiple goroutines write errors to the bad row channel, one write can stop the pipeline and close the channel while another goroutine still has a write pending; that goroutine then writes to the closed channel and panics.
    The fix is to not close the channel in the sink, but instead write a marker to the channel that causes the goroutine watching for errors to exit.
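    The marker pattern described above can be sketched like this; badRow, endMarker, and collectBadRows are hypothetical names, not the actual pipeline types:

```go
package main

import "fmt"

// badRow is a stand-in for a row that failed during import.
type badRow struct{ table, err string }

// endMarker is written by the sink instead of calling close(), so a
// straggling writer on another goroutine can never panic on a closed channel.
var endMarker = badRow{}

// collectBadRows watches the channel and reports how many bad rows were
// seen before the end marker arrived; it exits on the marker, not on close.
func collectBadRows(ch <-chan badRow, done chan<- int) {
	count := 0
	for r := range ch {
		if r == endMarker {
			break
		}
		count++
	}
	done <- count
}

func main() {
	// Buffered so concurrent pipeline stages reporting errors don't block.
	badRowCh := make(chan badRow, 16)
	done := make(chan int)
	go collectBadRows(badRowCh, done)

	// Two stages report errors; the channel is never closed, so neither
	// write can panic even if it lands after the pipeline stops.
	badRowCh <- badRow{"t1", "bad int"}
	badRowCh <- badRow{"t2", "bad date"}
	badRowCh <- endMarker // sink signals completion via the marker
	fmt.Println(<-done)   // 2
}
```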
  • 37: go/go.mod: Do not depend on //proto/third_party/golang-protobuf.
    Development ergonomics are much worse, and the runtime library will maintain compatibility with the generator major version anyway, or it will explicitly break compilation.
  • 35: dolt/go: Fix spelling on ancestor
  • 34: proto/Makefile: Use submodule for protoc-gen-go instead of whatever is on the path.
  • 33: Jenkinsfile: Use goimports from go.mod for check_fmt.sh
  • 31: support importing and exporting data to and from stdin and stdout
    In the current releases it was possible to chain dolt with other programs via stdout/stdin like so:
    dolt table export table_name --file-type csv /dev/stdout -f|python row_cleaner.py|dolt table import cleaned_data -u --file-type csv /dev/stdin
    This only works in environments where stdin/stdout are mapped to files on the filesystem. With this change, the stdin/stdout streams are used for import/export when a file is not provided.
  • 30: Added column lengths for schema output to varchar columns so that the…
    …y can be re-imported
  • 29: go/cmd/dolt: dolt ls -v shows number of rows in each table.
  • 27: Refer to newest version of mmap-go
    We now refer strictly to our own fork of mmap-go. Also cleaned up the go.mod; since we have git history, we don't need the comments.
  • 25: Added .idea directory (goland) to top-level .gitignore file
  • 24: fix race condition which caused reproducible crash
    The variables readStart, readEnd, and batch are declared outside the for loop, so their values can change before the goroutine calls readAtOffsets, leaving some or all of them incorrect. The fix is to copy them into variables scoped to the loop before spawning the goroutine.
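    The bug and its fix can be illustrated as follows; sumRanges is a hypothetical stand-in for the real read-dispatch loop:

```go
package main

import (
	"fmt"
	"sync"
)

// sumRanges spawns one goroutine per iteration, as the buggy code did,
// but copies readStart/readEnd into loop-scoped variables first (the fix),
// so each goroutine sees the values from its own iteration.
func sumRanges(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	sum := 0
	// Declared outside the loop, like the variables in the PR. If the
	// goroutines captured these directly, later iterations could overwrite
	// them before the goroutine reads them: a data race.
	var readStart, readEnd int
	for i := 0; i < n; i++ {
		readStart = i * 10
		readEnd = readStart + 10
		start, end := readStart, readEnd // the fix: loop-scoped copies
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			sum += end - start // always 10, thanks to the copies
			mu.Unlock()
		}()
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(sumRanges(4)) // 40
}
```

    Running the buggy version under go test -race flags the shared reads, which is how this class of crash is usually confirmed.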
  • 23: Fixed a bug on windows when redirecting STDIN for SQL import, e.g. do…
    …lt sql < dump.sql. Also fixed up ip2nation sample so that it successfully imports

Closed Issues

  • 54: Just a note to say
  • 36: Unable to update tables using JSON files
dolt - 0.9.7

Published by reltuk about 5 years ago

The first public binary release of Dolt, git for data.