datafusion-comet

Apache DataFusion Comet Spark Accelerator

APACHE-2.0 License

Downloads: 1.3K, Stars: 786, Committers: 48

datafusion-comet - 0.2.0 Latest Release

Published by andygrove about 2 months ago

DataFusion Comet 0.2.0 Changelog

This release consists of 87 commits from 14 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: dictionary decimal vector optimization #705 (kazuyukitanimura)
  • fix: Unsupported window expression should fall back to Spark #710 (viirya)
  • fix: ReusedExchangeExec can be child operator of CometBroadcastExchangeExec #713 (viirya)
  • fix: Fallback to Spark for window expression with range frame #719 (viirya)
  • fix: Remove skip.surefire.tests mvn property #739 (wForget)
  • fix: subquery execution under CometTakeOrderedAndProjectExec should not fail #748 (viirya)
  • fix: skip negative scale checks for creating decimals #723 (kazuyukitanimura)
  • fix: Fallback to Spark for unsupported partitioning #759 (viirya)
  • fix: Unsupported types for SinglePartition should fallback to Spark #765 (viirya)
  • fix: unwrap dictionaries in CreateNamedStruct #754 (andygrove)
  • fix: Fallback to Spark for unsupported input besides ordering #768 (viirya)
  • fix: Native window operator should be CometUnaryExec #774 (viirya)
  • fix: Fallback to Spark when shuffling on struct with duplicate field name #776 (viirya)
  • fix: withInfo was overwriting information in some cases #780 (andygrove)
  • fix: Improve support for nested structs #800 (eejbyfeldt)
  • fix: Sort on single struct should fallback to Spark #811 (viirya)
  • fix: Check sort order of SortExec instead of child output #821 (viirya)
  • fix: Fix panic in avg aggregate and disable stddev by default #819 (andygrove)
  • fix: Supported nested types in HashJoin #735 (eejbyfeldt)

Performance related:

  • perf: Improve performance of CASE .. WHEN expressions #703 (andygrove)
  • perf: Optimize IfExpr by delegating to CaseExpr #681 (andygrove)
  • fix: optimize isNullAt #732 (kazuyukitanimura)
  • perf: decimal decode improvements #727 (parthchandra)
  • fix: Remove casting on decimals with a small precision to decimal256 #741 (kazuyukitanimura)
  • fix: optimize some bit functions #718 (kazuyukitanimura)
  • fix: Optimize getDecimal for small precision #758 (kazuyukitanimura)
  • perf: add metrics to CopyExec and ScanExec #778 (andygrove)
  • fix: Optimize decimal creation macros #764 (kazuyukitanimura)
  • perf: Improve count aggregate performance #784 (andygrove)
  • fix: Optimize read_side_padding #772 (kazuyukitanimura)
  • perf: Remove some redundant copying of batches #816 (andygrove)
  • perf: Remove redundant copying of batches after FilterExec #835 (andygrove)
  • fix: Optimize CheckOverflow #852 (kazuyukitanimura)
  • perf: Add benchmarks for Spark Scan + Comet Exec #863 (andygrove)

Implemented enhancements:

  • feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704 (akhilss99)
  • feat: Support count AggregateUDF for window function #736 (huaxingao)
  • feat: Implement basic version of RLIKE #734 (andygrove)
  • feat: show executed native plan with metrics when in debug mode #746 (andygrove)
  • feat: Add GetStructField expression #731 (Kimahriman)
  • feat: Add config to enable native upper and lower string conversion #767 (andygrove)
  • feat: Improve native explain #795 (andygrove)
  • feat: Add support for null literal with struct type #797 (eejbyfeldt)
  • feat: Optimize CreateNamedStruct preserve dictionaries #789 (eejbyfeldt)
  • feat: CreateArray support #793 (Kimahriman)
  • feat: Add native thread configs #828 (viirya)
  • feat: Add specific configs for converting Spark Parquet and JSON data to Arrow #832 (andygrove)
  • feat: Support sum in window function #802 (huaxingao)
  • feat: Simplify configs for enabling/disabling operators #855 (andygrove)
  • feat: Enable clippy::clone_on_ref_ptr on proto and spark_exprs crates #859 (comphead)
  • feat: Enable clippy::clone_on_ref_ptr on core crate #860 (comphead)
  • feat: Use CometPlugin as main entrypoint #853 (andygrove)

Documentation updates:

  • doc: Update outdated spark.comet.columnar.shuffle.enabled configuration doc #738 (wForget)
  • docs: Add explicit configs for enabling operators #801 (andygrove)
  • doc: Document CometPlugin to start Comet in cluster mode #836 (comphead)
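
Several entries in this release refer to CometPlugin as the entry point and to explicit configs for enabling operators. As a rough illustration only, the sketch below renders a small set of Spark settings as spark-submit arguments. The config keys and the plugin class name are assumptions taken from memory of the Comet user guide, not from this changelog, and `to_submit_args` is a hypothetical helper; check the configuration guide for the release you run.

```python
# Assumed settings: these keys and the plugin class name are NOT stated in the
# changelog above; verify them against the Comet configuration guide.
COMET_CONF = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
}


def to_submit_args(conf):
    """Render a config mapping as a flat list of spark-submit --conf arguments.

    Hypothetical helper for illustration; keys are emitted in sorted order.
    """
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments on one line so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_args(COMET_CONF)))
```

The same mapping could instead be passed key by key to a Spark session builder; it is kept as plain data here so the sketch runs without a Spark installation.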

Other:

  • chore: Make rust clippy happy #701 (Xuanwo)
  • chore: Update version to 0.2.0 and add 0.1.0 changelog #696 (andygrove)
  • chore: Use rust-toolchain.toml for better toolchain support #699 (Xuanwo)
  • chore(native): Make sure all targets in workspace been covered by clippy #702 (Xuanwo)
  • Apache DataFusion Comet Logo #697 (aocsa)
  • chore: Add logo to rat exclude list #709 (andygrove)
  • chore: Use new logo in README and website #724 (andygrove)
  • build: Add Comet logo files into exclude list #726 (viirya)
  • chore: Remove TPC-DS benchmark results #728 (andygrove)
  • chore: make Cast's logic reusable for other projects #716 (Blizzara)
  • chore: move scalar_funcs into spark-expr #712 (Blizzara)
  • chore: Bump DataFusion to rev 35c2e7e #740 (andygrove)
  • chore: add more aggregate functions to benchmark test #706 (huaxingao)
  • chore: Add criterion benchmark for decimal_div #743 (andygrove)
  • build: Re-enable TPCDS q72 for broadcast and hash join configs #781 (viirya)
  • chore: bump DataFusion to rev f4e519f #783 (huaxingao)
  • chore: Upgrade to DataFusion rev bddb641 and disable "skip partial aggregates" feature #788 (andygrove)
  • chore: Remove legacy code for adding a cast to a coalesce #790 (andygrove)
  • chore: Use DataFusion 41.0.0-rc1 #794 (andygrove)
  • chore: rename CometRowToColumnar and fix duplication bug #785 (Kimahriman)
  • chore: Enable shuffle in micro benchmarks #806 (andygrove)
  • Minor: ScanExec code cleanup and additional documentation #804 (andygrove)
  • chore: Make it possible to run 'make benchmark-%' using jvm 17+ #823 (eejbyfeldt)
  • chore: Add more unsupported cases to supportedSortType #825 (viirya)
  • chore: Enable Comet shuffle with AQE coalesce partitions #834 (viirya)
  • chore: Add GitHub workflow to publish Docker image #847 (andygrove)
  • chore: Revert "fix: change the not exists base image apache/spark:3.4.3 to 3.4.2" #854 (haoxins)
  • chore: fix docker-publish attempt 1 #851 (andygrove)
  • minor: stop warning that AQEShuffleRead cannot run natively #842 (andygrove)
  • chore: Improve ObjectHashAggregate fallback error message #849 (andygrove)
  • chore: Fix docker image publishing (specify ghcr.io in tag) #856 (andygrove)
  • chore: Use Git tag as Comet version when publishing Docker images #857 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    36	Andy Grove
    16	Liang-Chi Hsieh
     9	KAZUYUKI TANIMURA
     5	Emil Ejbyfeldt
     4	Huaxin Gao
     3	Adam Binford
     3	Oleks V
     3	Xuanwo
     2	Arttu
     2	Zhen Wang
     1	Akhil S S
     1	Alexander Ocsa
     1	Parth Chandra
     1	Xin Hao
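
The per-contributor breakdown above can be cross-checked against the headline figure of 87 commits from 14 contributors. The snippet below is a minimal sketch of that arithmetic; the `tally` helper is hypothetical and the data is copied from the credits list above.

```python
# Credits for 0.2.0, copied from the list above as "<commits> <name>" lines.
CREDITS = """\
36 Andy Grove
16 Liang-Chi Hsieh
9 KAZUYUKI TANIMURA
5 Emil Ejbyfeldt
4 Huaxin Gao
3 Adam Binford
3 Oleks V
3 Xuanwo
2 Arttu
2 Zhen Wang
1 Akhil S S
1 Alexander Ocsa
1 Parth Chandra
1 Xin Hao
"""


def tally(text):
    """Return (total commits, number of contributors) for a credits listing."""
    rows = [line.split(maxsplit=1) for line in text.splitlines() if line.strip()]
    counts = {name: int(commits) for commits, name in rows}
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    total, contributors = tally(CREDITS)
    print(total, contributors)  # 87 14
```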

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

datafusion-comet - 0.1.0

Published by andygrove about 2 months ago

DataFusion Comet 0.1.0 Changelog

This release consists of 343 commits from 41 contributors. See credits at the end of this changelog for more information.

Implemented enhancements:

  • feat: Add native shuffle and columnar shuffle #30 (viirya)
  • feat: Support Emit::First for SumDecimalGroupsAccumulator #47 (viirya)
  • feat: Nested map support for columnar shuffle #51 (viirya)
  • feat: Support Count(Distinct) and similar aggregation functions #42 (huaxingao)
  • feat: Upgrade to jni-rs 0.21 #50 (sunchao)
  • feat: Handle exception thrown from native side #61 (sunchao)
  • feat: Support InSet expression in Comet #59 (viirya)
  • feat: Add CometNativeException for exceptions thrown from the native side #62 (sunchao)
  • feat: Add cause to native exception #63 (viirya)
  • feat: Pull based native execution #69 (viirya)
  • feat: Add executeColumnarCollectIterator to CometExec to collect Comet operator result #71 (viirya)
  • feat: Add CometBroadcastExchangeExec to support broadcasting the result of Comet native operator #80 (viirya)
  • feat: Reduce memory consumption when writing sorted shuffle files #82 (sunchao)
  • feat: Add struct/map as unsupported map key/value for columnar shuffle #84 (viirya)
  • feat: Support multiple input sources for CometNativeExec #87 (viirya)
  • feat: Date and timestamp trunc with format array #94 (parthchandra)
  • feat: Support First/Last aggregate functions #97 (huaxingao)
  • feat: Add support of TakeOrderedAndProjectExec in Comet #88 (viirya)
  • feat: Support Binary in shuffle writer #106 (advancedxy)
  • feat: Add license header by spotless:apply automatically #110 (advancedxy)
  • feat: Add dictionary binary to shuffle writer #111 (viirya)
  • feat: Minimize number of connections used by parallel reader #126 (parthchandra)
  • feat: Support CollectLimit operator #100 (advancedxy)
  • feat: Enable min/max for boolean type #165 (huaxingao)
  • feat: Introduce CometTaskMemoryManager and native side memory pool #83 (sunchao)
  • feat: Fix old style names #201 (comphead)
  • feat: enable comet shuffle manager for comet shell #204 (zuston)
  • feat: Support bitwise aggregate functions #197 (huaxingao)
  • feat: Support BloomFilterMightContain expr #179 (advancedxy)
  • feat: Support sort merge join #178 (viirya)
  • feat: Support HashJoin operator #194 (viirya)
  • feat: Remove use of nightly int_roundings feature #228 (psvri)
  • feat: Support Broadcast HashJoin #211 (viirya)
  • feat: Enable Comet broadcast by default #213 (viirya)
  • feat: Add CometRowToColumnar operator #206 (advancedxy)
  • feat: Document the class path / classloader issue with the shuffle manager #256 (holdenk)
  • feat: Port Datafusion Covariance to Comet #234 (huaxingao)
  • feat: Add manual test to calculate spark builtin functions coverage #263 (comphead)
  • feat: Support ANSI mode in CAST from String to Bool #290 (andygrove)
  • feat: Add extended explain info to Comet plan #255 (parthchandra)
  • feat: Improve CometSortMergeJoin statistics #304 (planga82)
  • feat: Add compatibility guide #316 (andygrove)
  • feat: Improve CometHashJoin statistics #309 (planga82)
  • feat: Support Variance #297 (huaxingao)
  • feat: Support murmur3_hash and sha2 family hash functions #226 (advancedxy)
  • feat: Disable cast string to timestamp by default #337 (andygrove)
  • feat: Improve CometBroadcastHashJoin statistics #339 (planga82)
  • feat: Implement Spark-compatible CAST from string to integral types #307 (andygrove)
  • feat: Implement Spark-compatible CAST from string to timestamp types #335 (vaibhawvipul)
  • feat: Implement Spark-compatible CAST float/double to string #346 (mattharder91)
  • feat: Only allow incompatible cast expressions to run in comet if a config is enabled #362 (andygrove)
  • feat: Implement Spark-compatible CAST between integer types #340 (ganeshkumar269)
  • feat: Supports Stddev #348 (huaxingao)
  • feat: Improve cast compatibility tests and docs #379 (andygrove)
  • feat: Implement Spark-compatible CAST from non-integral numeric types to integral types #399 (rohitrastogi)
  • feat: Implement Spark unhex #342 (tshauck)
  • feat: Enable columnar shuffle by default #250 (viirya)
  • feat: Implement Spark-compatible CAST from floating-point/double to decimal #384 (vaibhawvipul)
  • feat: Add logging to explain reasons for Comet not being able to run a query stage natively #397 (andygrove)
  • feat: Add support for TryCast expression in Spark 3.2 and 3.3 #416 (vaibhawvipul)
  • feat: Supports UUID column #395 (huaxingao)
  • feat: correlation support #456 (huaxingao)
  • feat: Implement Spark-compatible CAST from String to Date #383 (vidyasankarv)
  • feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode #460 (viirya)
  • feat: Add random row generator in data generator #451 (advancedxy)
  • feat: Add xxhash64 function support #424 (advancedxy)
  • feat: add hex scalar function #449 (tshauck)
  • feat: Add "Comet Fuzz" fuzz-testing utility #472 (andygrove)
  • feat: Use enum to represent CAST eval_mode in expr.proto #415 (prashantksharma)
  • feat: Implement ANSI support for UnaryMinus #471 (vaibhawvipul)
  • feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing #514 (andygrove)
  • feat: Add fuzz testing for arithmetic expressions #519 (andygrove)
  • feat: Add HashJoin support for BuildRight #437 (viirya)
  • feat: Fix Comet error message #544 (comphead)
  • feat: Support Ansi mode in abs function #500 (planga82)
  • feat: Enable xxhash64 by default #583 (andygrove)
  • feat: Add experimental support for Apache Spark 3.5.1 #587 (andygrove)
  • feat: add nullOnDivideByZero for Covariance #564 (huaxingao)
  • feat: Implement more efficient version of xxhash64 #575 (andygrove)
  • feat: Enable Spark SQL tests for Spark 3.5.1 #603 (andygrove)
  • feat: Initial support for Window function #599 (huaxingao)
  • feat: IsNaN expression in Comet #612 (eejbyfeldt)
  • feat: Add support for CreateNamedStruct #620 (eejbyfeldt)
  • feat: add cargo machete to remove udeps #641 (vaibhawvipul)
  • feat: Upgrade to DataFusion 40.0.0-rc1 #644 (andygrove)
  • feat: Use unified allocator for execution iterators #613 (viirya)
  • feat: Create new datafusion-comet-spark-expr crate containing Spark-compatible DataFusion expressions #638 (andygrove)
  • feat: Move IfExpr to spark-expr crate #653 (andygrove)
  • feat: Upgrade to DataFusion 40 #657 (andygrove)
  • feat: Show user a more intuitive message when queries fall back to Spark #656 (andygrove)
  • feat: Enable remaining Spark 3.5.1 tests #676 (andygrove)
  • feat: Spark-4.0 widening type support #604 (kazuyukitanimura)
  • feat: add scalar subquery pushdown to scan #678 (parthchandra)

Fixed bugs:

  • fix: Comet sink operator should not have children operators #26 (viirya)
  • fix: Fix the UnionExec match branches in CometExecRule #68 (wankunde)
  • fix: Appending null values to element array builders of StructBuilder for null row in a StructArray #78 (viirya)
  • fix: Fix compilation error for CometBroadcastExchangeExec #86 (viirya)
  • fix: Avoid exception caused by broadcasting empty result #92 (wForget)
  • fix: Add num_rows when building RecordBatch #103 (advancedxy)
  • fix: Cast string to boolean not compatible with Spark #107 (erenavsarogullari)
  • fix: Another attempt to fix libcrypto.dylib loading issue #112 (advancedxy)
  • fix: Fix compilation error for Spark 3.2 & 3.3 #117 (sunchao)
  • fix: Fix corrupted AggregateMode when transforming plan parameters #118 (viirya)
  • fix: bitwise shift with different left/right types #135 (viirya)
  • fix: Avoid null exception in removeSubquery #147 (viirya)
  • fix: rat check error in vscode ide #161 (thexiay)
  • fix: Final aggregation should not bind to the input of partial aggregation #155 (viirya)
  • fix: coalesce should return correct datatype #168 (viirya)
  • fix: attempt to divide by zero error on decimal division #172 (viirya)
  • fix: Aggregation without aggregation expressions should use correct result expressions #175 (viirya)
  • fix: Comet native operator can be executed after ReusedExchange #187 (viirya)
  • fix: Try to convert a static list into a set in Rust #184 (advancedxy)
  • fix: Include active spiller when computing peak shuffle memory #196 (sunchao)
  • fix: CometExecRule should handle ShuffleQueryStage and ReusedExchange #186 (viirya)
  • fix: Use makeCopy to change relation in FileSourceScanExec #207 (viirya)
  • fix: Remove duplicate byte array allocation for CometDictionary #224 (viirya)
  • fix: Remove redundant data copy in columnar shuffle #233 (viirya)
  • fix: Only maps FIXED_LEN_BYTE_ARRAY to String for uuid type #238 (huaxingao)
  • fix: Reduce RowPartition memory allocation #244 (viirya)
  • fix: Remove wrong calculation for Murmur3Hash for float with null input #245 (advancedxy)
  • fix: Deallocate row addresses and size arrays after exporting #246 (viirya)
  • fix: Fix wrong children expression order in IfExpr #249 (viirya)
  • fix: Average expression in Comet Final should handle all null inputs from partial Spark aggregation #261 (viirya)
  • fix: Only trigger Comet Final aggregation on Comet partial aggregation #264 (viirya)
  • fix: incorrect result on Comet multiple column distinct count #268 (viirya)
  • fix: Avoid using CometConf #266 (snmvaughan)
  • fix: Fix arrow error when sorting on empty batch #271 (viirya)
  • fix: Include license using # instead of using XML comment #274 (snmvaughan)
  • fix: Comet should not translate try_sum to native sum expression #277 (viirya)
  • fix: incorrect result with aggregate expression with filter #284 (viirya)
  • fix: Comet should not fail on negative limit parameter #288 (viirya)
  • fix: Comet columnar shuffle should not be on top of another Comet shuffle operator #296 (viirya)
  • fix: Iceberg scan transition should be in front of other data source v2 #302 (viirya)
  • fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes #299 (viirya)
  • fix: CometShuffleExchangeExec logical link should be correct #324 (viirya)
  • fix: SortMergeJoin with unsupported key type should fall back to Spark #355 (viirya)
  • fix: limit with offset should return correct results #359 (viirya)
  • fix: Disable Comet shuffle with AQE coalesce partitions enabled #380 (viirya)
  • fix: Unknown operator id when explain with formatted mode #410 (leoluan2009)
  • fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule #441 (viirya)
  • fix: newFileScanRDD should not take constructor from custom Spark versions #412 (ceppelli)
  • fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec #447 (viirya)
  • fix: Enable cast string to int tests and fix compatibility issue #453 (andygrove)
  • fix: Compute murmur3 hash with dictionary input correctly #433 (advancedxy)
  • fix: Only delegate to DataFusion cast when we know that it is compatible with Spark #461 (andygrove)
  • fix: ColumnReader.loadVector should initiate CometDictionary after re-import arrays #473 (viirya)
  • fix: substring with negative indices should produce correct result #470 (sonhmai)
  • fix: CometReader.loadVector should not overwrite dictionary ids #476 (viirya)
  • fix: Reuse previous CometDictionary Java arrays #489 (viirya)
  • fix: Fallback to Spark for LIKE with custom escape character #478 (sujithjay)
  • fix: Incorrect input schema when preparing result expressions for HashAggregation #501 (viirya)
  • fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size #523 (viirya)
  • fix: Fix integer overflow in date_parser #529 (eejbyfeldt)
  • fix: null character not permitted in chr function #513 (vaibhawvipul)
  • fix: Overflow when reading Timestamp from parquet file #542 (eejbyfeldt)
  • fix: Re-implement some Parquet decode methods without copy_nonoverlapping #558 (andygrove)
  • fix: requested character too large for encoding in chr function #552 (vaibhawvipul)
  • fix: Running cargo build always triggers rebuild #579 (eejbyfeldt)
  • fix: Avoid recursive call to canonicalizePlans #582 (viirya)
  • fix: Return error in pre_timestamp_cast instead of panic #543 (eejbyfeldt)
  • perf: Add criterion benchmark for xxhash64 function #560 (andygrove)
  • fix: Fix range out of index error with a temporary workaround #584 (viirya)
  • fix: Improve error "BroadcastExchange is not supported" #577 (parthchandra)
  • fix: Avoid creating huge duplicate of canonicalized plans for CometNativeExec #639 (viirya)
  • fix: Tag ignored tests that require SubqueryBroadcastExec #647 (parthchandra)
  • fix: Optimize some functions to rewrite dictionary-encoded strings #627 (vaibhawvipul)
  • fix: Remove nightly flag in release-nogit target in Makefile #667 (andygrove)
  • fix: change the not exists base image apache/spark:3.4.3 to 3.4.2 #686 (haoxins)
  • fix: Spark 4.0 SparkArithmeticException test #688 (kazuyukitanimura)
  • fix: address failure caused by method signature change in SPARK-48791 #693 (parthchandra)

Documentation updates:

  • doc: Add Quickstart Comet doc section #125 (comphead)
  • doc: Minor fix Getting started reformatting #128 (comphead)
  • doc: Add initial doc how to expand Comet exceptions #170 (comphead)
  • doc: Update README.md with shuffle configs #208 (viirya)
  • doc: Update supported expressions #237 (viirya)
  • doc: Fix a small typo in README.md #272 (rz-vastdata)
  • doc: Update DataFusion project name and url #300 (viirya)
  • docs: Move existing documentation into new Contributor Guide and add Getting Started section #334 (andygrove)
  • docs: Add more content to the user guide #347 (andygrove)
  • docs: Generate configuration guide in mvn build #349 (andygrove)
  • docs: Add a plugin overview page to the contributors guide #345 (andygrove)
  • doc: Fix target typo in development.md #364 (jc4x4)
  • doc: Clean up supported JDKs in README #366 (edmondop)
  • doc: add contributing in README.md #382 (caicancai)
  • docs: fix the docs url of installation instructions #393 (haoxins)
  • docs: Running ScalaTest suites from the CLI #404 (edmondop)
  • docs: Remove spark.comet.exec.broadcast.enabled from config docs #421 (andygrove)
  • docs: fix various sphinx warnings #428 (tshauck)
  • doc: Add Plan Stability Testing to development guide #432 (viirya)
  • docs: Update Spark shell command to include setting additional class path #435 (andygrove)
  • doc: Add Tuning Guide with shuffle configs #443 (viirya)
  • docs: Add benchmarking guide #444 (andygrove)
  • docs: add guide to adding a new expression #422 (tshauck)
  • docs: changes in documentation #512 (SemyonSinchenko)
  • docs: Improve user documentation for supported operators and expressions #520 (andygrove)
  • docs: Proposal for source release process #556 (andygrove)
  • docs: Update benchmark results #687 (andygrove)
  • docs: Update percentage speedups in benchmarking guide #691 (andygrove)
  • doc: Add memory tuning section to user guide #684 (viirya)

Other:

  • Initial PR #1 (sunchao)
  • build: Add Maven wrapper to the project #13 (sunchao)
  • build: Add basic CI test pipelines #18 (sunchao)
  • Bump com.google.protobuf:protobuf-java from 3.17.3 to 3.19.6 #5 (dependabot[bot])
  • build: Add PR template #23 (sunchao)
  • build: Create ticket templates #24 (comphead)
  • build: Re-enable Scala style checker and spotless #21 (sunchao)
  • build: Remove license header from pull request template #28 (viirya)
  • build: Exclude .github from apache-rat-plugin check #32 (viirya)
  • build: Add CI for MacOS (x64 and aarch64) #35 (sunchao)
  • fix broken link in README.md #39 (nairbv)
  • test: Add some fuzz testing for cast operations #16 (andygrove)
  • test: Fix CI failure on libcrypto #41 (sunchao)
  • test: Reduce test time spent in CometShuffleSuite #40 (sunchao)
  • test: Add test for RoundRobinPartitioning #54 (viirya)
  • build: Fix potential libcrypto lib loading issue for X86 mac runners #55 (advancedxy)
  • refactor: Remove a few duplicated occurrences #53 (sunchao)
  • build: Fix mvn cache for containerized runners #48 (advancedxy)
  • test: Ensure traversed operators during finding first partial aggregation are all native #58 (viirya)
  • build: Upgrade arrow-rs to 50.0.0 and DataFusion to 35.0.0 #65 (viirya)
  • build: Support built with java 1.8 #45 (advancedxy)
  • test: Add golden files for TPCDSPlanStabilitySuite #73 (sunchao)
  • test: Add TPC-DS test results #77 (sunchao)
  • build: Upgrade spotless version to 2.43.0 #85 (viirya)
  • test: Expose thrown exception when executing query in CometTPCHQuerySuite #96 (viirya)
  • test: Enable TPCDS q41 in CometTPCDSQuerySuite #98 (viirya)
  • build: Add CI for TPCDS queries #99 (viirya)
  • build: Add tpcds-sf-1 to license header excluded list #108 (viirya)
  • build: Show time duration for scala test #116 (advancedxy)
  • test: Move MacOS (x86) pipelines to post-commit #122 (sunchao)
  • build: Upgrade DF to 36.0.0 and arrow-rs 50.0.0 #66 (comphead)
  • test: Reduce end-to-end test time #109 (sunchao)
  • build: Separate and speedup TPC-DS benchmark #130 (advancedxy)
  • build: Re-enable TPCDS queries q34 and q64 in CometTPCDSQuerySuite #133 (viirya)
  • build: Refine names in benchmark.yml #132 (advancedxy)
  • build: Make the build system work out of box #136 (advancedxy)
  • minor: Update README.md with system diagram #148 (alamb)
  • test: Add golden files for test #150 (snmvaughan)
  • build: Add checker for PR title #151 (sunchao)
  • build: Support CI pipelines for Spark 3.2, 3.3 and 3.4 #153 (advancedxy)
  • minor: Only trigger PR title checker on pull requests #154 (sunchao)
  • chore: Fix warnings in both compiler and test environments #164 (advancedxy)
  • build: Upload test reports and coverage #163 (advancedxy)
  • minor: Remove unnecessary logic #169 (sunchao)
  • minor: Make QueryPlanSerde warning log less confusing #181 (viirya)
  • refactor: Skipping slicing on shuffle arrays in shuffle reader #189 (viirya)
  • build: Run Spark SQL tests for 3.4 #166 (sunchao)
  • build: Enforce scalafix check in CI #203 (advancedxy)
  • test: Follow up on Spark 3.4 diff #209 (sunchao)
  • build: Avoid confusion by using profile with clean #215 (snmvaughan)
  • test: Add TPC-H test results #218 (viirya)
  • build: Add CI for TPC-H queries #220 (viirya)
  • test: Enable Comet shuffle in Spark SQL tests #210 (sunchao)
  • test: Disable spark ui in unit test by default #235 (beryllw)
  • chore: Replace deprecated temporal methods #229 (snmvaughan)
  • build: Use specified branch of arrow-rs with workaround to invalid offset buffers from Java Arrow #239 (viirya)
  • test: Enable string-to-bool cast test #251 (andygrove)
  • test: Restore tests in CometTPCDSQuerySuite #252 (viirya)
  • test: Enable all remaining TPCDS queries #254 (viirya)
  • test: Enable all remaining TPCH queries #257 (viirya)
  • chore: Remove some calls to unwrap when calling create_expr in planner.rs #269 (andygrove)
  • chore: Fix typo in info message #279 (andygrove)
  • chore: Fix NPE when running CometTPCHQueriesList directly #285 (advancedxy)
  • chore: Update Comet repo description #291 (viirya)
  • chore: Cleanup how datafusion session config is created #289 (psvri)
  • build: Update asf.yaml to use @datafusion.apache.org #294 (sunchao)
  • chore: Remove unused functions #301 (kazuyukitanimura)
  • chore: Ignore unused variables #306 (snmvaughan)
  • chore: Update documentation publishing domain and path #310 (andygrove)
  • chore: Add documentation publishing infrastructure #314 (andygrove)
  • build: Move shim directories #318 (kazuyukitanimura)
  • test: Suppress decimal random number tests for 3.2 and 3.3 #319 (kazuyukitanimura)
  • chore: Add allocation source to StreamReader #332 (viirya)
  • chore: Add more cast tests and improve test framework #351 (andygrove)
  • chore: Implement remaining CAST tests #356 (andygrove)
  • build: Add Spark SQL test pipeline with ANSI mode enabled #321 (parthchandra)
  • chore: Store EXTENSION_INFO as Set[String] instead of newline-delimited String #386 (andygrove)
  • build: Add scala-version to matrix #396 (snmvaughan)
  • chore: Add criterion benchmarks for casting between integer types #401 (andygrove)
  • chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config #413 (viirya)
  • chore: Rename some columnar shuffle configs for code consistency #418 (leoluan2009)
  • chore: Remove an unused config #430 (andygrove)
  • tests: Move random data generation methods from CometCastSuite to new DataGenerator class #426 (andygrove)
  • test: Fix explain with extended info comet test #436 (kazuyukitanimura)
  • chore: Add cargo bench for shuffle writer #438 (andygrove)
  • chore: improve fallback message when comet native shuffle is not enabled #445 (andygrove)
  • Coverage: Add a manual test to show what Spark built in expression the DF can support directly #331 (comphead)
  • build: Add spark-4.0 profile and shims #407 (kazuyukitanimura)
  • build: bump spark version to 3.4.3 #292 (huaxingao)
  • chore: Removing copying data from dictionary values into CometDictionary #490 (viirya)
  • chore: Update README to highlight Comet benefits #497 (andygrove)
  • test: fix ClassNotFoundException for Hive tests #499 (kazuyukitanimura)
  • build: Enable comet tests with spark-4.0 profile #493 (kazuyukitanimura)
  • chore: Switch to stable Rust #505 (andygrove)
  • Minor: Generate the supported Spark builtin expression list into MD file #455 (comphead)
  • chore: Simplify code in CometExecIterator and avoid some small overhead #522 (andygrove)
  • chore: Upgrade spark to 4.0.0-preview1 #526 (advancedxy)
  • chore: Add UnboundColumn to carry datatype for unbound reference #518 (viirya)
  • chore: Remove 3.4.2.diff #528 (kazuyukitanimura)
  • build: Switch back to official DataFusion repo and arrow-rs after Arrow Java 16 is released #403 (viirya)
  • chore: Add CometEvalMode enum to replace string literals #539 (andygrove)
  • chore: Create initial release process scripts for official ASF source release #429 (andygrove)
  • build: Use DataFusion 39.0.0 release #550 (viirya)
  • chore: disable xxhash64 by default #548 (andygrove)
  • chore: Remove unsafe use of from_raw_parts in Parquet decoder #549 (andygrove)
  • test: Add tests for Scalar and Interval values for UnaryMinus #538 (vaibhawvipul)
  • chore: Add changelog generator #545 (andygrove)
  • chore: Remove unused hash_utils.rs #561 (andygrove)
  • chore: Use in_list func directly #559 (advancedxy)
  • chore: Fix most of the scala/java build warnings #562 (andygrove)
  • chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code #546 (andygrove)
  • chore: Remove spark.comet.xxhash64.enabled from the config document #586 (viirya)
  • build: Drop Spark 3.2 support #581 (huaxingao)
  • test: Enable Spark 4.0 tests #537 (kazuyukitanimura)
  • refactor: Remove method get_global_jclass #580 (eejbyfeldt)
  • chore: Move some utility methods to submodules of scalar_funcs #590 (advancedxy)
  • chore: Upgrade to Rust 1.79 #570 (andygrove)
  • chore: Remove some calls to unwrap #598 (andygrove)
  • chore: Improve JNI safety #600 (andygrove)
  • chore: remove some unwraps from shuffle module #601 (andygrove)
  • chore: Use proper constructor of IndexShuffleBlockResolver #610 (viirya)
  • chore: Update benchmark results #614 (andygrove)
  • build: Upgrade to 2.13.14 for scala-2.13 profile #626 (viirya)
  • chore: Rename shuffle write metric #624 (andygrove)
  • minor: replace .downcast_ref::<T>().is_some() with .is::<T>() #635 (andygrove)
  • test: Add CometTPCDSQueryTestSuite #628 (viirya)
  • chore: Convert Rust project into a workspace #637 (andygrove)
  • chore: Add Miri workflow #636 (andygrove)
  • test: Run optimized version of q72 derived from TPC-DS #652 (viirya)
  • chore: Refactoring of CometError/SparkError #655 (andygrove)
  • chore: Move cast to spark-expr crate #654 (andygrove)
  • chore: Remove utils crate and move utils into spark-expr crate #658 (andygrove)
  • chore: Move temporal kernels and expressions to spark-expr crate #660 (andygrove)
  • chore: Move protobuf files to separate crate #661 (andygrove)
  • Use IfExpr to check when input to log2 is <=0 and return null #506 (PedroMDuarte)
  • chore: Change suffix on some expressions from Exec to Expr #673 (andygrove)
  • chore: Fix some regressions with Spark 3.5.1 #674 (andygrove)
  • chore: Improve fuzz testing coverage #668 (andygrove)
  • Create Comet docker file #675 (comphead)
  • chore: Add microbenchmarks #671 (andygrove)
  • build: Exclude protobuf generated code from apache-rat check #683 (viirya)
  • chore: Disable abs and signum because they return incorrect results #695 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

   100	Liang-Chi Hsieh
    82	Andy Grove
    28	advancedxy
    27	Chao Sun
    14	Huaxin Gao
    11	KAZUYUKI TANIMURA
     9	Vipul Vaibhaw
     8	Parth Chandra
     7	Emil Ejbyfeldt
     7	Steve Vaughan
     7	comphead
     4	Oleks V
     4	Pablo Langa
     4	Trent Hauck
     2	Edmondo Porcu
     2	Vrishabh
     2	Xin Hao
     2	Xuedong Luan
     1	Andrew Lamb
     1	Brian Vaughan
     1	Cancai Cai
     1	Eren Avsarogullari
     1	Holden Karau
     1	JC
     1	Junbo wang
     1	Junfan Zhang
     1	Pedro M Duarte
     1	Prashant K. Sharma
     1	RickestCode
     1	Rohit Rastogi
     1	Roman Zeyde
     1	Semyon
     1	Son
     1	Sujith Jay Nair
     1	Zhen Wang
     1	ceppelli
     1	dependabot[bot]
     1	thexia
     1	vidyasankarv
     1	wankun
     1	గణేష్

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.