DataFusion Comet 0.2.0 Changelog

This release consists of 87 commits from 14 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

fix: dictionary decimal vector optimization #705 (kazuyukitanimura)
fix: Unsupported window expression should fall back to Spark #710 (viirya)
fix: ReusedExchangeExec can be child operator of CometBroadcastExchangeExec #713 (viirya)
fix: Fallback to Spark for window expression with range frame #719 (viirya)
fix: Remove skip.surefire.tests mvn property #739 (wForget)
fix: subquery execution under CometTakeOrderedAndProjectExec should not fail #748 (viirya)
fix: skip negative scale checks for creating decimals #723 (kazuyukitanimura)
fix: Fallback to Spark for unsupported partitioning #759 (viirya)
fix: Unsupported types for SinglePartition should fall back to Spark #765 (viirya)
fix: unwrap dictionaries in CreateNamedStruct #754 (andygrove)
fix: Fallback to Spark for unsupported input besides ordering #768 (viirya)
fix: Native window operator should be CometUnaryExec #774 (viirya)
fix: Fallback to Spark when shuffling on struct with duplicate field name #776 (viirya)
fix: withInfo was overwriting information in some cases #780 (andygrove)
fix: Improve support for nested structs #800 (eejbyfeldt)
fix: Sort on single struct should fall back to Spark #811 (viirya)
fix: Check sort order of SortExec instead of child output #821 (viirya)
fix: Fix panic in avg aggregate and disable stddev by default #819 (andygrove)
fix: Support nested types in HashJoin #735 (eejbyfeldt)

Performance related:

perf: Improve performance of CASE .. WHEN expressions #703 (andygrove)
perf: Optimize IfExpr by delegating to CaseExpr #681 (andygrove)
fix: optimize isNullAt #732 (kazuyukitanimura)
perf: decimal decode improvements #727 (parthchandra)
fix: Remove casting on decimals with a small precision to decimal256 #741 (kazuyukitanimura)
fix: optimize some bit functions #718 (kazuyukitanimura)
fix: Optimize getDecimal for small precision #758 (kazuyukitanimura)
perf: add metrics to CopyExec and ScanExec #778 (andygrove)
fix: Optimize decimal creation macros #764 (kazuyukitanimura)
perf: Improve count aggregate performance #784 (andygrove)
fix: Optimize read_side_padding #772 (kazuyukitanimura)
perf: Remove some redundant copying of batches #816 (andygrove)
perf: Remove redundant copying of batches after FilterExec #835 (andygrove)
fix: Optimize CheckOverflow #852 (kazuyukitanimura)
perf: Add benchmarks for Spark Scan + Comet Exec #863 (andygrove)

Implemented enhancements:

feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704 (akhilss99)
feat: Support count AggregateUDF for window function #736 (huaxingao)
feat: Implement basic version of RLIKE #734 (andygrove)
feat: show executed native plan with metrics when in debug mode #746 (andygrove)
feat: Add GetStructField expression #731 (Kimahriman)
feat: Add config to enable native upper and lower string conversion #767 (andygrove)
feat: Improve native explain #795 (andygrove)
feat: Add support for null literal with struct type #797 (eejbyfeldt)
feat: Optimize CreateNamedStruct to preserve dictionaries #789 (eejbyfeldt)
feat: CreateArray support #793 (Kimahriman)
feat: Add native thread configs #828 (viirya)
feat: Add specific configs for converting Spark Parquet and JSON data to Arrow #832 (andygrove)
feat: Support sum in window function #802 (huaxingao)
feat: Simplify configs for enabling/disabling operators #855 (andygrove)
feat: Enable clippy::clone_on_ref_ptr on proto and spark_exprs crates #859 (comphead)
feat: Enable clippy::clone_on_ref_ptr on core crate #860 (comphead)
feat: Use CometPlugin as main entrypoint #853 (andygrove)

Documentation updates:

doc: Update outdated spark.comet.columnar.shuffle.enabled configuration doc #738 (wForget)
docs: Add explicit configs for enabling operators #801 (andygrove)
doc: Document CometPlugin to start Comet in cluster mode #836 (comphead)

Other:

chore: Make rust clippy happy #701 (Xuanwo)
chore: Update version to 0.2.0 and add 0.1.0 changelog #696 (andygrove)
chore: Use rust-toolchain.toml for better toolchain support #699 (Xuanwo)
chore(native): Make sure all targets in workspace are covered by clippy #702 (Xuanwo)
Apache DataFusion Comet Logo #697 (aocsa)
chore: Add logo to rat exclude list #709 (andygrove)
chore: Use new logo in README and website #724 (andygrove)
build: Add Comet logo files into exclude list #726 (viirya)
chore: Remove TPC-DS benchmark results #728 (andygrove)
chore: make Cast's logic reusable for other projects #716 (Blizzara)
chore: move scalar_funcs into spark-expr #712 (Blizzara)
chore: Bump DataFusion to rev 35c2e7e #740 (andygrove)
chore: add more aggregate functions to benchmark test #706 (huaxingao)
chore: Add criterion benchmark for decimal_div #743 (andygrove)
build: Re-enable TPCDS q72 for broadcast and hash join configs #781 (viirya)
chore: bump DataFusion to rev f4e519f #783 (huaxingao)
chore: Upgrade to DataFusion rev bddb641 and disable "skip partial aggregates" feature #788 (andygrove)
chore: Remove legacy code for adding a cast to a coalesce #790 (andygrove)
chore: Use DataFusion 41.0.0-rc1 #794 (andygrove)
chore: rename CometRowToColumnar and fix duplication bug #785 (Kimahriman)
chore: Enable shuffle in micro benchmarks #806 (andygrove)
Minor: ScanExec code cleanup and additional documentation #804 (andygrove)
chore: Make it possible to run 'make benchmark-%' using jvm 17+ #823 (eejbyfeldt)
chore: Add more unsupported cases to supportedSortType #825 (viirya)
chore: Enable Comet shuffle with AQE coalesce partitions #834 (viirya)
chore: Add GitHub workflow to publish Docker image #847 (andygrove)
chore: Revert "fix: change the not exists base image apache/spark:3.4.3 to 3.4.2" #854 (haoxins)
chore: fix docker-publish attempt 1 #851 (andygrove)
minor: stop warning that AQEShuffleRead cannot run natively #842 (andygrove)
chore: Improve ObjectHashAggregate fallback error message #849 (andygrove)
chore: Fix docker image publishing (specify ghcr.io in tag) #856 (andygrove)
chore: Use Git tag as Comet version when publishing Docker images #857 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    36	Andy Grove
    16	Liang-Chi Hsieh
     9	KAZUYUKI TANIMURA
     5	Emil Ejbyfeldt
     4	Huaxin Gao
     3	Adam Binford
     3	Oleks V
     3	Xuanwo
     2	Arttu
     2	Zhen Wang
     1	Akhil S S
     1	Alexander Ocsa
     1	Parth Chandra
     1	Xin Hao

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

datafusion-comet - 0.1.0

Published by andygrove about 2 months ago

DataFusion Comet 0.1.0 Changelog

This release consists of 343 commits from 41 contributors. See credits at the end of this changelog for more information.