datafusion-comet

Apache DataFusion Comet Spark Accelerator

APACHE-2.0 License

Downloads: 1.3K | Stars: 761 | Committers: 48

datafusion-comet - 0.2.0 Latest Release

Published by andygrove about 2 months ago

DataFusion Comet 0.2.0 Changelog

This release consists of 87 commits from 14 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: dictionary decimal vector optimization #705 (kazuyukitanimura)
  • fix: Unsupported window expression should fall back to Spark #710 (viirya)
  • fix: ReusedExchangeExec can be child operator of CometBroadcastExchangeExec #713 (viirya)
  • fix: Fallback to Spark for window expression with range frame #719 (viirya)
  • fix: Remove skip.surefire.tests mvn property #739 (wForget)
  • fix: subquery execution under CometTakeOrderedAndProjectExec should not fail #748 (viirya)
  • fix: skip negative scale checks for creating decimals #723 (kazuyukitanimura)
  • fix: Fallback to Spark for unsupported partitioning #759 (viirya)
  • fix: Unsupported types for SinglePartition should fallback to Spark #765 (viirya)
  • fix: unwrap dictionaries in CreateNamedStruct #754 (andygrove)
  • fix: Fallback to Spark for unsupported input besides ordering #768 (viirya)
  • fix: Native window operator should be CometUnaryExec #774 (viirya)
  • fix: Fallback to Spark when shuffling on struct with duplicate field name #776 (viirya)
  • fix: withInfo was overwriting information in some cases #780 (andygrove)
  • fix: Improve support for nested structs #800 (eejbyfeldt)
  • fix: Sort on single struct should fallback to Spark #811 (viirya)
  • fix: Check sort order of SortExec instead of child output #821 (viirya)
  • fix: Fix panic in avg aggregate and disable stddev by default #819 (andygrove)
  • fix: Support nested types in HashJoin #735 (eejbyfeldt)

Performance related:

  • perf: Improve performance of CASE .. WHEN expressions #703 (andygrove)
  • perf: Optimize IfExpr by delegating to CaseExpr #681 (andygrove)
  • fix: optimize isNullAt #732 (kazuyukitanimura)
  • perf: decimal decode improvements #727 (parthchandra)
  • fix: Remove casting on decimals with a small precision to decimal256 #741 (kazuyukitanimura)
  • fix: optimize some bit functions #718 (kazuyukitanimura)
  • fix: Optimize getDecimal for small precision #758 (kazuyukitanimura)
  • perf: add metrics to CopyExec and ScanExec #778 (andygrove)
  • fix: Optimize decimal creation macros #764 (kazuyukitanimura)
  • perf: Improve count aggregate performance #784 (andygrove)
  • fix: Optimize read_side_padding #772 (kazuyukitanimura)
  • perf: Remove some redundant copying of batches #816 (andygrove)
  • perf: Remove redundant copying of batches after FilterExec #835 (andygrove)
  • fix: Optimize CheckOverflow #852 (kazuyukitanimura)
  • perf: Add benchmarks for Spark Scan + Comet Exec #863 (andygrove)

Implemented enhancements:

  • feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704 (akhilss99)
  • feat: Support count AggregateUDF for window function #736 (huaxingao)
  • feat: Implement basic version of RLIKE #734 (andygrove)
  • feat: show executed native plan with metrics when in debug mode #746 (andygrove)
  • feat: Add GetStructField expression #731 (Kimahriman)
  • feat: Add config to enable native upper and lower string conversion #767 (andygrove)
  • feat: Improve native explain #795 (andygrove)
  • feat: Add support for null literal with struct type #797 (eejbyfeldt)
  • feat: Optimize CreateNamedStruct to preserve dictionaries #789 (eejbyfeldt)
  • feat: CreateArray support #793 (Kimahriman)
  • feat: Add native thread configs #828 (viirya)
  • feat: Add specific configs for converting Spark Parquet and JSON data to Arrow #832 (andygrove)
  • feat: Support sum in window function #802 (huaxingao)
  • feat: Simplify configs for enabling/disabling operators #855 (andygrove)
  • feat: Enable clippy::clone_on_ref_ptr on proto and spark_exprs crates #859 (comphead)
  • feat: Enable clippy::clone_on_ref_ptr on core crate #860 (comphead)
  • feat: Use CometPlugin as main entrypoint #853 (andygrove)

Documentation updates:

  • doc: Update outdated spark.comet.columnar.shuffle.enabled configuration doc #738 (wForget)
  • docs: Add explicit configs for enabling operators #801 (andygrove)
  • doc: Document CometPlugin to start Comet in cluster mode #836 (comphead)

Other:

  • chore: Make rust clippy happy #701 (Xuanwo)
  • chore: Update version to 0.2.0 and add 0.1.0 changelog #696 (andygrove)
  • chore: Use rust-toolchain.toml for better toolchain support #699 (Xuanwo)
  • chore(native): Make sure all targets in workspace been covered by clippy #702 (Xuanwo)
  • Apache DataFusion Comet Logo #697 (aocsa)
  • chore: Add logo to rat exclude list #709 (andygrove)
  • chore: Use new logo in README and website #724 (andygrove)
  • build: Add Comet logo files into exclude list #726 (viirya)
  • chore: Remove TPC-DS benchmark results #728 (andygrove)
  • chore: make Cast's logic reusable for other projects #716 (Blizzara)
  • chore: move scalar_funcs into spark-expr #712 (Blizzara)
  • chore: Bump DataFusion to rev 35c2e7e #740 (andygrove)
  • chore: add more aggregate functions to benchmark test #706 (huaxingao)
  • chore: Add criterion benchmark for decimal_div #743 (andygrove)
  • build: Re-enable TPCDS q72 for broadcast and hash join configs #781 (viirya)
  • chore: bump DataFusion to rev f4e519f #783 (huaxingao)
  • chore: Upgrade to DataFusion rev bddb641 and disable "skip partial aggregates" feature #788 (andygrove)
  • chore: Remove legacy code for adding a cast to a coalesce #790 (andygrove)
  • chore: Use DataFusion 41.0.0-rc1 #794 (andygrove)
  • chore: rename CometRowToColumnar and fix duplication bug #785 (Kimahriman)
  • chore: Enable shuffle in micro benchmarks #806 (andygrove)
  • Minor: ScanExec code cleanup and additional documentation #804 (andygrove)
  • chore: Make it possible to run 'make benchmark-%' using jvm 17+ #823 (eejbyfeldt)
  • chore: Add more unsupported cases to supportedSortType #825 (viirya)
  • chore: Enable Comet shuffle with AQE coalesce partitions #834 (viirya)
  • chore: Add GitHub workflow to publish Docker image #847 (andygrove)
  • chore: Revert "fix: change the not exists base image apache/spark:3.4.3 to 3.4.2" #854 (haoxins)
  • chore: fix docker-publish attempt 1 #851 (andygrove)
  • minor: stop warning that AQEShuffleRead cannot run natively #842 (andygrove)
  • chore: Improve ObjectHashAggregate fallback error message #849 (andygrove)
  • chore: Fix docker image publishing (specify ghcr.io in tag) #856 (andygrove)
  • chore: Use Git tag as Comet version when publishing Docker images #857 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    36	Andy Grove
    16	Liang-Chi Hsieh
     9	KAZUYUKI TANIMURA
     5	Emil Ejbyfeldt
     4	Huaxin Gao
     3	Adam Binford
     3	Oleks V
     3	Xuanwo
     2	Arttu
     2	Zhen Wang
     1	Akhil S S
     1	Alexander Ocsa
     1	Parth Chandra
     1	Xin Hao

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

datafusion-comet - 0.1.0

Published by andygrove about 2 months ago

DataFusion Comet 0.1.0 Changelog

This release consists of 343 commits from 41 contributors. See credits at the end of this changelog for more information.

Implemented enhancements:

  • feat: Add native shuffle and columnar shuffle #30 (viirya)
  • feat: Support Emit::First for SumDecimalGroupsAccumulator #47 (viirya)
  • feat: Nested map support for columnar shuffle #51 (viirya)
  • feat: Support Count(Distinct) and similar aggregation functions #42 (huaxingao)
  • feat: Upgrade to jni-rs 0.21 #50 (sunchao)
  • feat: Handle exception thrown from native side #61 (sunchao)
  • feat: Support InSet expression in Comet #59 (viirya)
  • feat: Add CometNativeException for exceptions thrown from the native side #62 (sunchao)
  • feat: Add cause to native exception #63 (viirya)
  • feat: Pull based native execution #69 (viirya)
  • feat: Add executeColumnarCollectIterator to CometExec to collect Comet operator result #71 (viirya)
  • feat: Add CometBroadcastExchangeExec to support broadcasting the result of Comet native operator #80 (viirya)
  • feat: Reduce memory consumption when writing sorted shuffle files #82 (sunchao)
  • feat: Add struct/map as unsupported map key/value for columnar shuffle #84 (viirya)
  • feat: Support multiple input sources for CometNativeExec #87 (viirya)
  • feat: Date and timestamp trunc with format array #94 (parthchandra)
  • feat: Support First/Last aggregate functions #97 (huaxingao)
  • feat: Add support of TakeOrderedAndProjectExec in Comet #88 (viirya)
  • feat: Support Binary in shuffle writer #106 (advancedxy)
  • feat: Add license header by spotless:apply automatically #110 (advancedxy)
  • feat: Add dictionary binary to shuffle writer #111 (viirya)
  • feat: Minimize number of connections used by parallel reader #126 (parthchandra)
  • feat: Support CollectLimit operator #100 (advancedxy)
  • feat: Enable min/max for boolean type #165 (huaxingao)
  • feat: Introduce CometTaskMemoryManager and native side memory pool #83 (sunchao)
  • feat: Fix old style names #201 (comphead)
  • feat: enable comet shuffle manager for comet shell #204 (zuston)
  • feat: Support bitwise aggregate functions #197 (huaxingao)
  • feat: Support BloomFilterMightContain expr #179 (advancedxy)
  • feat: Support sort merge join #178 (viirya)
  • feat: Support HashJoin operator #194 (viirya)
  • feat: Remove use of nightly int_roundings feature #228 (psvri)
  • feat: Support Broadcast HashJoin #211 (viirya)
  • feat: Enable Comet broadcast by default #213 (viirya)
  • feat: Add CometRowToColumnar operator #206 (advancedxy)
  • feat: Document the class path / classloader issue with the shuffle manager #256 (holdenk)
  • feat: Port Datafusion Covariance to Comet #234 (huaxingao)
  • feat: Add manual test to calculate spark builtin functions coverage #263 (comphead)
  • feat: Support ANSI mode in CAST from String to Bool #290 (andygrove)
  • feat: Add extended explain info to Comet plan #255 (parthchandra)
  • feat: Improve CometSortMergeJoin statistics #304 (planga82)
  • feat: Add compatibility guide #316 (andygrove)
  • feat: Improve CometHashJoin statistics #309 (planga82)
  • feat: Support Variance #297 (huaxingao)
  • feat: Support murmur3_hash and sha2 family hash functions #226 (advancedxy)
  • feat: Disable cast string to timestamp by default #337 (andygrove)
  • feat: Improve CometBroadcastHashJoin statistics #339 (planga82)
  • feat: Implement Spark-compatible CAST from string to integral types #307 (andygrove)
  • feat: Implement Spark-compatible CAST from string to timestamp types #335 (vaibhawvipul)
  • feat: Implement Spark-compatible CAST float/double to string #346 (mattharder91)
  • feat: Only allow incompatible cast expressions to run in comet if a config is enabled #362 (andygrove)
  • feat: Implement Spark-compatible CAST between integer types #340 (ganeshkumar269)
  • feat: Supports Stddev #348 (huaxingao)
  • feat: Improve cast compatibility tests and docs #379 (andygrove)
  • feat: Implement Spark-compatible CAST from non-integral numeric types to integral types #399 (rohitrastogi)
  • feat: Implement Spark unhex #342 (tshauck)
  • feat: Enable columnar shuffle by default #250 (viirya)
  • feat: Implement Spark-compatible CAST from floating-point/double to decimal #384 (vaibhawvipul)
  • feat: Add logging to explain reasons for Comet not being able to run a query stage natively #397 (andygrove)
  • feat: Add support for TryCast expression in Spark 3.2 and 3.3 #416 (vaibhawvipul)
  • feat: Supports UUID column #395 (huaxingao)
  • feat: correlation support #456 (huaxingao)
  • feat: Implement Spark-compatible CAST from String to Date #383 (vidyasankarv)
  • feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode #460 (viirya)
  • feat: Add random row generator in data generator #451 (advancedxy)
  • feat: Add xxhash64 function support #424 (advancedxy)
  • feat: add hex scalar function #449 (tshauck)
  • feat: Add "Comet Fuzz" fuzz-testing utility #472 (andygrove)
  • feat: Use enum to represent CAST eval_mode in expr.proto #415 (prashantksharma)
  • feat: Implement ANSI support for UnaryMinus #471 (vaibhawvipul)
  • feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing #514 (andygrove)
  • feat: Add fuzz testing for arithmetic expressions #519 (andygrove)
  • feat: Add HashJoin support for BuildRight #437 (viirya)
  • feat: Fix Comet error message #544 (comphead)
  • feat: Support Ansi mode in abs function #500 (planga82)
  • feat: Enable xxhash64 by default #583 (andygrove)
  • feat: Add experimental support for Apache Spark 3.5.1 #587 (andygrove)
  • feat: add nullOnDivideByZero for Covariance #564 (huaxingao)
  • feat: Implement more efficient version of xxhash64 #575 (andygrove)
  • feat: Enable Spark SQL tests for Spark 3.5.1 #603 (andygrove)
  • feat: Initial support for Window function #599 (huaxingao)
  • feat: IsNaN expression in Comet #612 (eejbyfeldt)
  • feat: Add support for CreateNamedStruct #620 (eejbyfeldt)
  • feat: add cargo machete to remove udeps #641 (vaibhawvipul)
  • feat: Upgrade to DataFusion 40.0.0-rc1 #644 (andygrove)
  • feat: Use unified allocator for execution iterators #613 (viirya)
  • feat: Create new datafusion-comet-spark-expr crate containing Spark-compatible DataFusion expressions #638 (andygrove)
  • feat: Move IfExpr to spark-expr crate #653 (andygrove)
  • feat: Upgrade to DataFusion 40 #657 (andygrove)
  • feat: Show user a more intuitive message when queries fall back to Spark #656 (andygrove)
  • feat: Enable remaining Spark 3.5.1 tests #676 (andygrove)
  • feat: Spark-4.0 widening type support #604 (kazuyukitanimura)
  • feat: add scalar subquery pushdown to scan #678 (parthchandra)

Fixed bugs:

  • fix: Comet sink operator should not have children operators #26 (viirya)
  • fix: Fix the UnionExec match branches in CometExecRule #68 (wankunde)
  • fix: Appending null values to element array builders of StructBuilder for null row in a StructArray #78 (viirya)
  • fix: Fix compilation error for CometBroadcastExchangeExec #86 (viirya)
  • fix: Avoid exception caused by broadcasting empty result #92 (wForget)
  • fix: Add num_rows when building RecordBatch #103 (advancedxy)
  • fix: Cast string to boolean not compatible with Spark #107 (erenavsarogullari)
  • fix: Another attempt to fix libcrypto.dylib loading issue #112 (advancedxy)
  • fix: Fix compilation error for Spark 3.2 & 3.3 #117 (sunchao)
  • fix: Fix corrupted AggregateMode when transforming plan parameters #118 (viirya)
  • fix: bitwise shift with different left/right types #135 (viirya)
  • fix: Avoid null exception in removeSubquery #147 (viirya)
  • fix: rat check error in vscode ide #161 (thexiay)
  • fix: Final aggregation should not bind to the input of partial aggregation #155 (viirya)
  • fix: coalesce should return correct datatype #168 (viirya)
  • fix: attempt to divide by zero error on decimal division #172 (viirya)
  • fix: Aggregation without aggregation expressions should use correct result expressions #175 (viirya)
  • fix: Comet native operator can be executed after ReusedExchange #187 (viirya)
  • fix: Try to convert a static list into a set in Rust #184 (advancedxy)
  • fix: Include active spiller when computing peak shuffle memory #196 (sunchao)
  • fix: CometExecRule should handle ShuffleQueryStage and ReusedExchange #186 (viirya)
  • fix: Use makeCopy to change relation in FileSourceScanExec #207 (viirya)
  • fix: Remove duplicate byte array allocation for CometDictionary #224 (viirya)
  • fix: Remove redundant data copy in columnar shuffle #233 (viirya)
  • fix: Only maps FIXED_LEN_BYTE_ARRAY to String for uuid type #238 (huaxingao)
  • fix: Reduce RowPartition memory allocation #244 (viirya)
  • fix: Remove wrong calculation for Murmur3Hash for float with null input #245 (advancedxy)
  • fix: Deallocate row addresses and size arrays after exporting #246 (viirya)
  • fix: Fix wrong children expression order in IfExpr #249 (viirya)
  • fix: Average expression in Comet Final should handle all null inputs from partial Spark aggregation #261 (viirya)
  • fix: Only trigger Comet Final aggregation on Comet partial aggregation #264 (viirya)
  • fix: incorrect result on Comet multiple column distinct count #268 (viirya)
  • fix: Avoid using CometConf #266 (snmvaughan)
  • fix: Fix arrow error when sorting on empty batch #271 (viirya)
  • fix: Include license using # instead of using XML comment #274 (snmvaughan)
  • fix: Comet should not translate try_sum to native sum expression #277 (viirya)
  • fix: incorrect result with aggregate expression with filter #284 (viirya)
  • fix: Comet should not fail on negative limit parameter #288 (viirya)
  • fix: Comet columnar shuffle should not be on top of another Comet shuffle operator #296 (viirya)
  • fix: Iceberg scan transition should be in front of other data source v2 #302 (viirya)
  • fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes #299 (viirya)
  • fix: CometShuffleExchangeExec logical link should be correct #324 (viirya)
  • fix: SortMergeJoin with unsupported key type should fall back to Spark #355 (viirya)
  • fix: limit with offset should return correct results #359 (viirya)
  • fix: Disable Comet shuffle with AQE coalesce partitions enabled #380 (viirya)
  • fix: Unknown operator id when explain with formatted mode #410 (leoluan2009)
  • fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule #441 (viirya)
  • fix: newFileScanRDD should not take constructor from custom Spark versions #412 (ceppelli)
  • fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec #447 (viirya)
  • fix: Enable cast string to int tests and fix compatibility issue #453 (andygrove)
  • fix: Compute murmur3 hash with dictionary input correctly #433 (advancedxy)
  • fix: Only delegate to DataFusion cast when we know that it is compatible with Spark #461 (andygrove)
  • fix: ColumnReader.loadVector should initiate CometDictionary after re-import arrays #473 (viirya)
  • fix: substring with negative indices should produce correct result #470 (sonhmai)
  • fix: CometReader.loadVector should not overwrite dictionary ids #476 (viirya)
  • fix: Reuse previous CometDictionary Java arrays #489 (viirya)
  • fix: Fallback to Spark for LIKE with custom escape character #478 (sujithjay)
  • fix: Incorrect input schema when preparing result expressions for HashAggregation #501 (viirya)
  • fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size #523 (viirya)
  • fix: Fix integer overflow in date_parser #529 (eejbyfeldt)
  • fix: null character not permitted in chr function #513 (vaibhawvipul)
  • fix: Overflow when reading Timestamp from parquet file #542 (eejbyfeldt)
  • fix: Re-implement some Parquet decode methods without copy_nonoverlapping #558 (andygrove)
  • fix: requested character too large for encoding in chr function #552 (vaibhawvipul)
  • fix: Running cargo build always triggers rebuild #579 (eejbyfeldt)
  • fix: Avoid recursive call to canonicalizePlans #582 (viirya)
  • fix: Return error in pre_timestamp_cast instead of panic #543 (eejbyfeldt)
  • perf: Add criterion benchmark for xxhash64 function #560 (andygrove)
  • fix: Fix range out of index error with a temporary workaround #584 (viirya)
  • fix: Improve error "BroadcastExchange is not supported" #577 (parthchandra)
  • fix: Avoid creating huge duplicate of canonicalized plans for CometNativeExec #639 (viirya)
  • fix: Tag ignored tests that require SubqueryBroadcastExec #647 (parthchandra)
  • fix: Optimize some functions to rewrite dictionary-encoded strings #627 (vaibhawvipul)
  • fix: Remove nightly flag in release-nogit target in Makefile #667 (andygrove)
  • fix: change the not exists base image apache/spark:3.4.3 to 3.4.2 #686 (haoxins)
  • fix: Spark 4.0 SparkArithmeticException test #688 (kazuyukitanimura)
  • fix: address failure caused by method signature change in SPARK-48791 #693 (parthchandra)

Documentation updates:

  • doc: Add Quickstart Comet doc section #125 (comphead)
  • doc: Minor fix Getting started reformatting #128 (comphead)
  • doc: Add initial doc how to expand Comet exceptions #170 (comphead)
  • doc: Update README.md with shuffle configs #208 (viirya)
  • doc: Update supported expressions #237 (viirya)
  • doc: Fix a small typo in README.md #272 (rz-vastdata)
  • doc: Update DataFusion project name and url #300 (viirya)
  • docs: Move existing documentation into new Contributor Guide and add Getting Started section #334 (andygrove)
  • docs: Add more content to the user guide #347 (andygrove)
  • docs: Generate configuration guide in mvn build #349 (andygrove)
  • docs: Add a plugin overview page to the contributors guide #345 (andygrove)
  • doc: Fix target typo in development.md #364 (jc4x4)
  • doc: Clean up supported JDKs in README #366 (edmondop)
  • doc: add contributing in README.md #382 (caicancai)
  • docs: fix the docs url of installation instructions #393 (haoxins)
  • docs: Running ScalaTest suites from the CLI #404 (edmondop)
  • docs: Remove spark.comet.exec.broadcast.enabled from config docs #421 (andygrove)
  • docs: fix various sphinx warnings #428 (tshauck)
  • doc: Add Plan Stability Testing to development guide #432 (viirya)
  • docs: Update Spark shell command to include setting additional class path #435 (andygrove)
  • doc: Add Tuning Guide with shuffle configs #443 (viirya)
  • docs: Add benchmarking guide #444 (andygrove)
  • docs: add guide to adding a new expression #422 (tshauck)
  • docs: changes in documentation #512 (SemyonSinchenko)
  • docs: Improve user documentation for supported operators and expressions #520 (andygrove)
  • docs: Proposal for source release process #556 (andygrove)
  • docs: Update benchmark results #687 (andygrove)
  • docs: Update percentage speedups in benchmarking guide #691 (andygrove)
  • doc: Add memory tuning section to user guide #684 (viirya)

Other:

  • Initial PR #1 (sunchao)
  • build: Add Maven wrapper to the project #13 (sunchao)
  • build: Add basic CI test pipelines #18 (sunchao)
  • Bump com.google.protobuf:protobuf-java from 3.17.3 to 3.19.6 #5 (dependabot[bot])
  • build: Add PR template #23 (sunchao)
  • build: Create ticket templates #24 (comphead)
  • build: Re-enable Scala style checker and spotless #21 (sunchao)
  • build: Remove license header from pull request template #28 (viirya)
  • build: Exclude .github from apache-rat-plugin check #32 (viirya)
  • build: Add CI for MacOS (x64 and aarch64) #35 (sunchao)
  • fix broken link in README.md #39 (nairbv)
  • test: Add some fuzz testing for cast operations #16 (andygrove)
  • test: Fix CI failure on libcrypto #41 (sunchao)
  • test: Reduce test time spent in CometShuffleSuite #40 (sunchao)
  • test: Add test for RoundRobinPartitioning #54 (viirya)
  • build: Fix potential libcrypto lib loading issue for X86 mac runners #55 (advancedxy)
  • refactor: Remove a few duplicated occurrences #53 (sunchao)
  • build: Fix mvn cache for containerized runners #48 (advancedxy)
  • test: Ensure traversed operators during finding first partial aggregation are all native #58 (viirya)
  • build: Upgrade arrow-rs to 50.0.0 and DataFusion to 35.0.0 #65 (viirya)
  • build: Support built with java 1.8 #45 (advancedxy)
  • test: Add golden files for TPCDSPlanStabilitySuite #73 (sunchao)
  • test: Add TPC-DS test results #77 (sunchao)
  • build: Upgrade spotless version to 2.43.0 #85 (viirya)
  • test: Expose thrown exception when executing query in CometTPCHQuerySuite #96 (viirya)
  • test: Enable TPCDS q41 in CometTPCDSQuerySuite #98 (viirya)
  • build: Add CI for TPCDS queries #99 (viirya)
  • build: Add tpcds-sf-1 to license header excluded list #108 (viirya)
  • build: Show time duration for scala test #116 (advancedxy)
  • test: Move MacOS (x86) pipelines to post-commit #122 (sunchao)
  • build: Upgrade DF to 36.0.0 and arrow-rs 50.0.0 #66 (comphead)
  • test: Reduce end-to-end test time #109 (sunchao)
  • build: Separate and speedup TPC-DS benchmark #130 (advancedxy)
  • build: Re-enable TPCDS queries q34 and q64 in CometTPCDSQuerySuite #133 (viirya)
  • build: Refine names in benchmark.yml #132 (advancedxy)
  • build: Make the build system work out of box #136 (advancedxy)
  • minor: Update README.md with system diagram #148 (alamb)
  • test: Add golden files for test #150 (snmvaughan)
  • build: Add checker for PR title #151 (sunchao)
  • build: Support CI pipelines for Spark 3.2, 3.3 and 3.4 #153 (advancedxy)
  • minor: Only trigger PR title checker on pull requests #154 (sunchao)
  • chore: Fix warnings in both compiler and test environments #164 (advancedxy)
  • build: Upload test reports and coverage #163 (advancedxy)
  • minor: Remove unnecessary logic #169 (sunchao)
  • minor: Make QueryPlanSerde warning log less confusing #181 (viirya)
  • refactor: Skipping slicing on shuffle arrays in shuffle reader #189 (viirya)
  • build: Run Spark SQL tests for 3.4 #166 (sunchao)
  • build: Enforce scalafix check in CI #203 (advancedxy)
  • test: Follow up on Spark 3.4 diff #209 (sunchao)
  • build: Avoid confusion by using profile with clean #215 (snmvaughan)
  • test: Add TPC-H test results #218 (viirya)
  • build: Add CI for TPC-H queries #220 (viirya)
  • test: Enable Comet shuffle in Spark SQL tests #210 (sunchao)
  • test: Disable spark ui in unit test by default #235 (beryllw)
  • chore: Replace deprecated temporal methods #229 (snmvaughan)
  • build: Use specified branch of arrow-rs with workaround to invalid offset buffers from Java Arrow #239 (viirya)
  • test: Enable string-to-bool cast test #251 (andygrove)
  • test: Restore tests in CometTPCDSQuerySuite #252 (viirya)
  • test: Enable all remaining TPCDS queries #254 (viirya)
  • test: Enable all remaining TPCH queries #257 (viirya)
  • chore: Remove some calls to unwrap when calling create_expr in planner.rs #269 (andygrove)
  • chore: Fix typo in info message #279 (andygrove)
  • chore: Fix NPE when running CometTPCHQueriesList directly #285 (advancedxy)
  • chore: Update Comet repo description #291 (viirya)
  • Chore: Cleanup how datafusion session config is created #289 (psvri)
  • build: Update asf.yaml to use @datafusion.apache.org #294 (sunchao)
  • chore: Remove unused functions #301 (kazuyukitanimura)
  • chore: Ignore unused variables #306 (snmvaughan)
  • chore: Update documentation publishing domain and path #310 (andygrove)
  • chore: Add documentation publishing infrastructure #314 (andygrove)
  • build: Move shim directories #318 (kazuyukitanimura)
  • test: Suppress decimal random number tests for 3.2 and 3.3 #319 (kazuyukitanimura)
  • chore: Add allocation source to StreamReader #332 (viirya)
  • chore: Add more cast tests and improve test framework #351 (andygrove)
  • chore: Implement remaining CAST tests #356 (andygrove)
  • build: Add Spark SQL test pipeline with ANSI mode enabled #321 (parthchandra)
  • chore: Store EXTENSION_INFO as Set[String] instead of newline-delimited String #386 (andygrove)
  • build: Add scala-version to matrix #396 (snmvaughan)
  • chore: Add criterion benchmarks for casting between integer types #401 (andygrove)
  • chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config #413 (viirya)
  • chore: Rename some columnar shuffle configs for code consistency #418 (leoluan2009)
  • chore: Remove an unused config #430 (andygrove)
  • tests: Move random data generation methods from CometCastSuite to new DataGenerator class #426 (andygrove)
  • test: Fix explain with extended info comet test #436 (kazuyukitanimura)
  • chore: Add cargo bench for shuffle writer #438 (andygrove)
  • chore: improve fallback message when comet native shuffle is not enabled #445 (andygrove)
  • Coverage: Add a manual test to show what Spark built in expression the DF can support directly #331 (comphead)
  • build: Add spark-4.0 profile and shims #407 (kazuyukitanimura)
  • build: bump spark version to 3.4.3 #292 (huaxingao)
  • chore: Removing copying data from dictionary values into CometDictionary #490 (viirya)
  • chore: Update README to highlight Comet benefits #497 (andygrove)
  • test: fix ClassNotFoundException for Hive tests #499 (kazuyukitanimura)
  • build: Enable comet tests with spark-4.0 profile #493 (kazuyukitanimura)
  • chore: Switch to stable Rust #505 (andygrove)
  • Minor: Generate the supported Spark builtin expression list into MD file #455 (comphead)
  • chore: Simplify code in CometExecIterator and avoid some small overhead #522 (andygrove)
  • chore: Upgrade spark to 4.0.0-preview1 #526 (advancedxy)
  • chore: Add UnboundColumn to carry datatype for unbound reference #518 (viirya)
  • chore: Remove 3.4.2.diff #528 (kazuyukitanimura)
  • build: Switch back to official DataFusion repo and arrow-rs after Arrow Java 16 is released #403 (viirya)
  • chore: Add CometEvalMode enum to replace string literals #539 (andygrove)
  • chore: Create initial release process scripts for official ASF source release #429 (andygrove)
  • build: Use DataFusion 39.0.0 release #550 (viirya)
  • chore: disable xxhash64 by default #548 (andygrove)
  • chore: Remove unsafe use of from_raw_parts in Parquet decoder #549 (andygrove)
  • test: Add tests for Scalar and Interval values for UnaryMinus #538 (vaibhawvipul)
  • chore: Add changelog generator #545 (andygrove)
  • chore: Remove unused hash_utils.rs #561 (andygrove)
  • chore: Use in_list func directly #559 (advancedxy)
  • chore: Fix most of the scala/java build warnings #562 (andygrove)
  • chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code #546 (andygrove)
  • chore: Remove spark.comet.xxhash64.enabled from the config document #586 (viirya)
  • build: Drop Spark 3.2 support #581 (huaxingao)
  • test: Enable Spark 4.0 tests #537 (kazuyukitanimura)
  • refactor: Remove method get_global_jclass #580 (eejbyfeldt)
  • chore: Move some utility methods to submodules of scalar_funcs #590 (advancedxy)
  • chore: Upgrade to Rust 1.79 #570 (andygrove)
  • chore: Remove some calls to unwrap #598 (andygrove)
  • chore: Improve JNI safety #600 (andygrove)
  • chore: remove some unwraps from shuffle module #601 (andygrove)
  • chore: Use proper constructor of IndexShuffleBlockResolver #610 (viirya)
  • chore: Update benchmark results #614 (andygrove)
  • build: Upgrade to 2.13.14 for scala-2.13 profile #626 (viirya)
  • chore: Rename shuffle write metric #624 (andygrove)
  • minor: replace .downcast_ref::<T>().is_some() with .is::<T>() #635 (andygrove)
  • test: Add CometTPCDSQueryTestSuite #628 (viirya)
  • chore: Convert Rust project into a workspace #637 (andygrove)
  • chore: Add Miri workflow #636 (andygrove)
  • test: Run optimized version of q72 derived from TPC-DS #652 (viirya)
  • chore: Refactoring of CometError/SparkError #655 (andygrove)
  • chore: Move cast to spark-expr crate #654 (andygrove)
  • chore: Remove utils crate and move utils into spark-expr crate #658 (andygrove)
  • chore: Move temporal kernels and expressions to spark-expr crate #660 (andygrove)
  • chore: Move protobuf files to separate crate #661 (andygrove)
  • Use IfExpr to check when input to log2 is <=0 and return null #506 (PedroMDuarte)
  • chore: Change suffix on some expressions from Exec to Expr #673 (andygrove)
  • chore: Fix some regressions with Spark 3.5.1 #674 (andygrove)
  • chore: Improve fuzz testing coverage #668 (andygrove)
  • Create Comet docker file #675 (comphead)
  • chore: Add microbenchmarks #671 (andygrove)
  • build: Exclude protobuf generated code from apache-rat check #683 (viirya)
  • chore: Disable abs and signum because they return incorrect results #695 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

   100	Liang-Chi Hsieh
    82	Andy Grove
    28	advancedxy
    27	Chao Sun
    14	Huaxin Gao
    11	KAZUYUKI TANIMURA
     9	Vipul Vaibhaw
     8	Parth Chandra
     7	Emil Ejbyfeldt
     7	Steve Vaughan
     7	comphead
     4	Oleks V
     4	Pablo Langa
     4	Trent Hauck
     2	Edmondo Porcu
     2	Vrishabh
     2	Xin Hao
     2	Xuedong Luan
     1	Andrew Lamb
     1	Brian Vaughan
     1	Cancai Cai
     1	Eren Avsarogullari
     1	Holden Karau
     1	JC
     1	Junbo wang
     1	Junfan Zhang
     1	Pedro M Duarte
     1	Prashant K. Sharma
     1	RickestCode
     1	Rohit Rastogi
     1	Roman Zeyde
     1	Semyon
     1	Son
     1	Sujith Jay Nair
     1	Zhen Wang
     1	ceppelli
     1	dependabot[bot]
     1	thexia
     1	vidyasankarv
     1	wankun
     1	గణేష్

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
