delta-rs

A native Rust library for Delta Lake, with bindings into Python

APACHE-2.0 License

Downloads: 5.6M
Stars: 2K
Committers: 79

delta-rs - python-v0.15.2: predicate overwrite, improved table state replay

Published by ion-elgreco 9 months ago

New features

Bug Fixes

Other Changes

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.15.1...python-v0.15.2

delta-rs - python-v0.15.1

Published by ion-elgreco 10 months ago

New features

Bug Fixes

Other Changes

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.15.0...python-v0.15.1

delta-rs - python-v0.15.0: check constraints operation, and faster MERGE

Published by ion-elgreco 10 months ago

New features

Bug Fixes

Breaking Changes

To control the writer properties in .update, you now need to pass an instance of the deltalake.WriterProperties class instead of a dictionary.
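A minimal sketch of the new call shape, assuming a table with hypothetical `status` and `id` columns. The function name, the predicate, and the `max_row_group_size` value are illustrative assumptions, so check the `WriterProperties` signature of the installed version before relying on them.

```python
def mark_processed(table_uri: str) -> None:
    """Update rows while passing writer settings as a WriterProperties object."""
    # Imported lazily so this sketch can be loaded without deltalake installed.
    from deltalake import DeltaTable, WriterProperties

    dt = DeltaTable(table_uri)
    # A plain dict is no longer accepted here; build the class instead.
    props = WriterProperties(max_row_group_size=128_000)  # illustrative value
    dt.update(
        updates={"status": "'processed'"},  # values are SQL expression strings
        predicate="id = 42",                # hypothetical filter
        writer_properties=props,
    )
```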

Other Changes

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.14.0...python-v0.15.0

delta-rs - python-v0.14.0

Published by wjones127 11 months ago

New features

Bug Fixes

Other Changes

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.13.0...python-v0.14.0

delta-rs - rust-v0.16.5

Published by rtyler 11 months ago

⚠️ If you are upgrading from any release other than 0.16.4, please also read the rust-v0.16.4 release notes ⚠️

This release includes a number of minor bug fixes, including one for users of create_checkpoint_for(). Previously, that function allowed the caller to specify a version that did not match the loaded table state, leading to incorrect _last_checkpoint files and a broken Delta table.
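The kind of guard this fix implies can be sketched as follows. This is not the crate's code: `LoadedTable` and `checked_checkpoint_version` are hypothetical names used only to show the check that the requested version must equal the version of the loaded table state.

```python
from dataclasses import dataclass


@dataclass
class LoadedTable:
    """Hypothetical stand-in for a table whose state has been loaded."""
    version: int


def checked_checkpoint_version(table: LoadedTable, requested: int) -> int:
    """Return the version to checkpoint, refusing a mismatched request."""
    if requested != table.version:
        # Checkpointing a version other than the loaded one would describe
        # the wrong state in _last_checkpoint.
        raise ValueError(
            f"requested version {requested} does not match "
            f"loaded table version {table.version}"
        )
    return requested
```

A caller that wants to checkpoint an older version would first load the table at that version and then request the checkpoint.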

delta-rs - rust-v0.16.4

Published by rtyler 11 months ago

The v0.16.4 version of the deltalake crate contains one notable and important fix: an upgrade to the dynamodb_lock crate to v0.6.1.

That release changes the expected format of leaseDuration in DynamoDb from String to Number. The String format was a long-overlooked bug in the lock code that prevented stale locks from being reaped automatically using DynamoDb's TTL attribute.

⚠️ CAUTION: Users of DynamoDb-based locking should use caution when upgrading their applications. ⚠️

Pre-existing locks should be properly respected by this newer version of dynamodb_lock. However, a lock that is not respected can result in data corruption of Delta tables. It is therefore recommended that, when upgrading:

  • All writers using a given DynamoDb table for locking are stopped.
  • DynamoDb is inspected and stale locks are cleared.
  • TTL is enabled on the table on the leaseDuration attribute (adjust if the application uses a different attribute name for lease duration).
  • Writers are restarted.
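
The inspection step above can be sketched offline against exported lock items. This is an illustration under stated assumptions: items are in DynamoDB's attribute-value JSON form, leaseDuration holds an expiry time in epoch seconds (which is what DynamoDb TTL expects of a Number attribute), and the helper names are hypothetical rather than part of dynamodb_lock.

```python
import time
from typing import Optional

LEASE_ATTR = "leaseDuration"  # attribute name taken from the release notes


def lease_as_number(item: dict) -> dict:
    """Return a copy of the item with leaseDuration stored as a Number."""
    value = item[LEASE_ATTR]
    if "S" in value:
        # Legacy String form; DynamoDB JSON keeps numbers as strings under "N".
        return {**item, LEASE_ATTR: {"N": value["S"]}}
    return item


def is_stale(item: dict, now: Optional[float] = None) -> bool:
    """Treat a lock as stale when its assumed expiry time has passed."""
    now = time.time() if now is None else now
    expiry = float(lease_as_number(item)[LEASE_ATTR]["N"])
    return expiry < now
```

Items flagged by `is_stale` are candidates for manual review before deletion; the sketch does not talk to DynamoDb itself.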

delta-rs - python-v0.13.0: Repair operation and PyArrow 13+ support

Published by wjones127 12 months ago

New features

Bug fixes

Other changes

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.12.0...python-v0.13.0

delta-rs - python-v0.12.0: Delete, Update, and Merge

Published by wjones127 about 1 year ago

What's Changed

New features

Bug fixes

Other contributions

Breaking changes

The DeltaTable.history() method now returns transactions in reverse chronological order. This matches the Spark implementation.
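
Code that relied on the old ordering can sort explicitly instead of depending on what history() returns. A minimal sketch, assuming each history entry is a dict with a numeric `timestamp` key; the sample entries below are made up for illustration.

```python
def oldest_first(history: list) -> list:
    """Return history entries in chronological order, whatever the input order."""
    return sorted(history, key=lambda entry: entry["timestamp"])


# Hypothetical entries in the new, newest-first order.
sample = [
    {"timestamp": 3000, "operation": "OPTIMIZE"},
    {"timestamp": 2000, "operation": "WRITE"},
    {"timestamp": 1000, "operation": "CREATE TABLE"},
]
chronological = oldest_first(sample)
```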

DeltaTable.files_by_partitions() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.file_uris() instead.

DeltaTable.pyarrow_schema() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.schema().to_pyarrow() instead.
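
A short migration sketch for the two removals above, using only the replacement calls named in these notes. The function name and the table path argument are hypothetical.

```python
def list_files_and_schema(table_uri: str):
    """Fetch file URIs and the PyArrow schema with the replacement calls."""
    # Imported lazily so this sketch can be loaded without deltalake installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri)
    uris = dt.file_uris()                    # replaces files_by_partitions()
    arrow_schema = dt.schema().to_pyarrow()  # replaces pyarrow_schema()
    return uris, arrow_schema
```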

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.11.0...python-v0.12.0

delta-rs - rust-v0.16.0

Published by rtyler about 1 year ago

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when performing append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading detlatable #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

delta-rs - python-v0.11.0

Published by wjones127 about 1 year ago

What's Changed

New Features

Performance Improvements

Other

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.10.2...python-v0.11.0

delta-rs - python-v0.10.2

Published by wjones127 about 1 year ago

What's Changed

New features

Bug fixes

Other

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.10.1...python-v0.10.2

delta-rs - rust-v0.14.0

Published by rtyler about 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/rust-v0.13.0...rust-v0.14.0

delta-rs - python-v0.10.1

Published by wjones127 about 1 year ago

What's Changed

New features

Fixes

Other

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.10.0...python-v0.10.1

delta-rs - rust-v0.13.0

Published by rtyler over 1 year ago

Full Changelog

Implemented enhancements:

  • Add nested struct supports #1518
  • Support FixedLenByteArray UUID statistics as a logical scalar #1483
  • Exposing create_add in the API #1458
  • Update features table on README #1404
  • docs(python): show data catalog options in Python API reference #1347
  • Add optimization to only list log files starting at a certain name #1252
  • Support configuring parquet compression #1235
  • parallel processing in Optimize command #1171

Fixed bugs:

  • get_add_actions() MAX is not showing complete value #1534
  • Can't get stats's minValues in add actions #1515
  • Pyarrow is_null filter not working as expected after loading using deltalake #1496
  • Can't write to table that uses generated columns #1495
  • Json error: Binary is not supported by JSON when writing checkpoint files #1493
  • _last_checkpoint size field is incorrect #1468
  • Error when Z Ordering a larger dataset #1459
  • Timestamp parsing issue #1455
  • File options are ignored when writing delta #1444
  • Slack Invite Link No Longer Valid #1425
  • cleanup_metadata doesn't remove .checkpoint.parquet files #1420
  • The test of reading the data from the blob storage located in Azurite container failed #1415
  • The test of reading the data from the bucket located in Minio container failed #1408
  • Datafusion: unreachable code reached when parsing statistics with missing columns #1374
  • vacuum is very slow on Cloudflare R2 #1366

Closed issues:

  • Expose Compression Options or WriterProperties for writing to Delta #1469
  • Support out-of-core Z-order using DataFusion #1460
  • Expose Z-order in Python #1442

Merged pull requests:

delta-rs - python-v0.10.0: Z-order, faster optimize and vacuum

Published by wjones127 over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.9.0...python-v0.10.0

delta-rs -

Published by rtyler over 1 year ago

delta-rs - rust-v0.11.0

Published by rtyler over 1 year ago

Full Changelog

Implemented enhancements:

  • Implement simple delete case #832

Merged pull requests:

  • chore: update Rust package version #1346 (rtyler)
  • fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
  • feat: delete operation #1176 (Blajda)
  • feat: add wasbs to known schemes #1345 (iajoiner)
  • test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
  • feat: write command improvements #1267 (roeap)
  • feat: added support for Databricks Unity Catalog #1331 (nohajc)
  • fix: double url encode of partition key #1324 (mrjoe7)

delta-rs - python-v0.9.0

Published by wjones127 over 1 year ago

What's Changed

New features

Fixes

New Contributors

Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.8.1...python-v0.9.0

delta-rs - rust-v0.10.0

Published by rtyler over 1 year ago

Full Changelog

Implemented enhancements:

  • Support Optimize on non-append-only tables #1125

Fixed bugs:

  • DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
  • Datafusion: SQL projection returns wrong column for partitioned data #1292
  • Unable to query partitioned tables #1291

Merged pull requests:

  • chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
  • fix: handle local paths on windows #1322 (roeap)
  • fix: scan partitioned tables with datafusion #1303 (roeap)
  • fix: allow special characters in storage prefix #1311 (wjones127)
  • feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
  • Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
  • Enable the json feature for the parquet crate #1300 (rtyler)

delta-rs - rust-v0.9.0

Published by rtyler over 1 year ago

Full Changelog

Implemented enhancements:

  • hdfs support #300
  • Add decimal primitive type to document #1280
  • Improve error message when filtering on non-existent partition columns #1218

Fixed bugs:

  • Datafusion table provider: issues with timestamp types #441
  • Not matching column names when creating a RecordBatch from MapArray #1257
  • All stores created using DeltaObjectStore::new have an identical object_store_url #1188

Merged pull requests:

  • Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
  • chore: df / arrow changes after update #1288 (roeap)
  • feat: read schema from parquet files in datafusion scans #1266 (roeap)
  • HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
  • Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
  • Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
  • Simplify the Store Backend Configuration code #1265 (mrjoe7)
  • feat: optimistic transaction protocol #632 (roeap)
  • Write support for additional Arrow datatypes #1044 (chitralverma)
  • Unique delta object store url #1212 (gruuya)
  • improve err msg on use of non-partitioned column #1221 (marijncv)