cudf

cuDF - GPU DataFrame Library

APACHE-2.0 License

Downloads: 13.3K
Stars: 7.2K
Committers: 246

cudf - [NIGHTLY] v23.02.00

Published by rapids-bot[bot] over 1 year ago

🚨 Breaking Changes

  • Pin dask and distributed for release (#12695) @galipremsagar
  • Change ways to access ptr in Buffer (#12587) @galipremsagar
  • Remove column names (#12578) @vuule
  • Default cudf::io::read_json to nested JSON parser (#12544) @vuule
  • Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
  • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
  • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
  • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
  • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
  • Remove deprecated code for 23.02 (#12281) @vyasr
  • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
  • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
  • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
  • Remove JIT type names, refactor id_to_type. (#12158) @bdice
  • Floor division uses integer division for integral arguments (#12131) @wence-

🐛 Bug Fixes

  • Fix update-version.sh (#12745) @raydouglass
  • Fix a mask data corruption in UDF (#12647) @galipremsagar
  • pre-commit: Update isort version to 5.12.0 (#12645) @wence-
  • tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
  • Revert regex program java APIs and tests (#12639) @cindyyuanjiang
  • Fix leaks in ColumnVectorTest (#12625) @jlowe
  • Handle when spillable buffers own each other (#12607) @madsbk
  • Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
  • lists: Transfer dtypes correctly through list.get (#12586) @wence-
  • timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
  • Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
  • Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
  • partition_by_hash(): support index (#12554) @madsbk
  • Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
  • Update List Lexicographical Comparator (#12538) @divyegala
  • Dynamically read PTX version (#12534) @brandon-b-miller
  • build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
  • Loosen runtime arrow pinning (#12522) @vyasr
  • Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
  • Fix issues with parquet chunked reader (#12488) @nvdbaranec
  • Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
  • Rename libcudf substring source files to slice (#12484) @davidwendt
  • Fix compile issue with arrow 10 (#12465) @ttnghia
  • Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
  • Fix xfail incompatibilities (#12423) @vyasr
  • Fix bug in Parquet column index encoding (#12404) @etseidl
  • When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
  • Fix get_json_object to return empty column on empty input (#12384) @davidwendt
  • Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
  • Fix reductions any/all return value for empty input (#12374) @davidwendt
  • Fix debug compile errors in parquet.hpp (#12372) @davidwendt
  • Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
  • Use correct memory resource in io::make_column (#12364) @vyasr
  • Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
  • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
  • Fix NumericPairIteratorTest for float values (#12306) @davidwendt
  • Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
  • Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
  • Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
  • Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
  • Change reductions any/all to return valid values for empty input (#12279) @davidwendt
  • Only exclude join keys that are indices from key columns (#12271) @wence-
  • Fix spill to device limit (#12252) @madsbk
  • Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
  • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
  • Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
  • Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
  • Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
  • Fix page size calculation in Parquet writer (#12182) @etseidl
  • Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
  • Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
  • Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

  • Fix link to NVTX (#12598) @sameerz
  • Include missing groupby functions in documentation (#12580) @quasiben
  • Fix documentation author (#12527) @bdice
  • Update libcudf reduction docs for casting output types (#12526) @davidwendt
  • Add JSON reader page in user guide (#12499) @GregoryKimball
  • Link unsupported iteration API docstrings (#12482) @galipremsagar
  • strings_udf doc update (#12469) @brandon-b-miller
  • Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
  • Update pre-commit hooks guide (#12395) @bdice
  • Update test docs to not use detail comparison utilities (#12332) @PointKernel
  • Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
  • Add eval to docs. (#12322) @vyasr
  • Turn on xfail_strict=true (#12244) @wence-
  • Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

  • Use kvikIO as the default IO backend (#12574) @vuule
  • Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
  • Add strings methods removeprefix and removesuffix (#12557) @davidwendt
  • Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
  • Default cudf::io::read_json to nested JSON parser (#12544) @vuule
  • Make string quoting optional on CSV write (#12539) @mythrocks
  • Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
  • Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
  • one_hot_encode to use experimental row comparators (#12478) @divyegala
  • Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
  • Add JSON Writer (#12474) @karthikeyann
  • Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
  • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
  • Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
  • JNI bindings to write CSV (#12425) @mythrocks
  • Nested JSON depth benchmark (#12371) @karthikeyann
  • Implement lists::reverse (#12336) @ttnghia
  • Use device_read in experimental read_json (#12314) @vuule
  • Implement JNI for strings::reverse (#12283) @ttnghia
  • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
  • Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
  • Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
  • Add cudf::strings::reverse function (#12227) @davidwendt
  • Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
  • Support replace in strings_udf (#12207) @brandon-b-miller
  • Add support to read binary encoded decimals in parquet (#12205) @PointKernel
  • Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
  • Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
  • Add device buffer datasource (#12024) @PointKernel
  • Implement groupby apply with JIT (#11452) @bwyogatama

🛠️ Improvements

  • Update shared workflow branches (#12696) @ajschmidt8
  • Pin dask and distributed for release (#12695) @galipremsagar
  • Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
  • Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
  • Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
  • Change ways to access ptr in Buffer (#12587) @galipremsagar
  • Version a parquet writer xfail (#12579) @galipremsagar
  • Remove column names (#12578) @vuule
  • Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
  • Add support for category dtypes in CSV reader (#12571) @galipremsagar
  • Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
  • Optimize cudf::make_lists_column (#12547) @ttnghia
  • Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
  • Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
  • Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
  • Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
  • Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
  • Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
  • More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
  • Guard CUDA runtime APIs with error checking (#12531) @PointKernel
  • Update TODOs from issue 10432. (#12528) @bdice
  • Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
  • Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
  • Fix SUM/MEAN aggregation type support. (#12503) @bdice
  • Stop using pandas._testing (#12492) @vyasr
  • Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
  • Fix erroneously skipped ORC ZSTD test (#12486) @vuule
  • Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
  • Raise warnings as errors in the test suite (#12468) @vyasr
  • Remove int32 hard-coding in python (#12467) @galipremsagar
  • Use cudaMemcpyDefault. (#12466) @bdice
  • Update workflows for nightly tests (#12462) @ajschmidt8
  • Build CUDA 11.8 and Python 3.10 Packages (#12457) @ajschmidt8
  • JNI build image default as cuda11.8 (#12441) @pxLi
  • Re-enable Recently Updated Check (#12435) @ajschmidt8
  • Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule
  • Build wheels alongside conda CI (#12427) @sevagh
  • Remove arguments for checking exception messages in Python (#12424) @vyasr
  • Clean up cuco usage (#12421) @PointKernel
  • Fix warnings in remaining modules (#12406) @vyasr
  • Update ops-bot.yaml (#12402) @ajschmidt8
  • Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt
  • Use numpy.empty() instead of bytearray to allocate host memory for spilling (#12399) @madsbk
  • Deprecate chunksize from dask_cudf.read_csv (#12394) @rjzamora
  • Expose the RMM pool size in JNI (#12390) @revans2
  • Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt
  • Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt
  • Use make_strings_children in parse_data nested json reader (#12382) @karthikeyann
  • Fix warnings in test_datetime.py (#12381) @vyasr
  • Mixed Join Benchmarks (#12375) @divyegala
  • Fix warnings in dataframe.py (#12369) @vyasr
  • Update conda recipes. (#12368) @bdice
  • Use gpu-latest-1 runner tag (#12366) @bdice
  • Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule
  • Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr
  • JSON column performance optimization - struct column nulls (#12354) @karthikeyann
  • Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt
  • Add size check to make_offsets_child_column utility (#12345) @davidwendt
  • Enable max compression ratio small block optimization for ZSTD (#12338) @vuule
  • Fix warnings in test_monotonic.py (#12334) @vyasr
  • Improve JSON column creation performance (list offsets) (#12330) @karthikeyann
  • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
  • Fix warnings in test_orc.py (#12326) @vyasr
  • Fix warnings in test_groupby.py (#12324) @vyasr
  • Fix test_notebooks.sh (#12323) @ajschmidt8
  • Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt
  • Fix check_style.sh script (#12320) @ajschmidt8
  • Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt
  • Fix warnings in test_index.py (#12313) @vyasr
  • Fix warnings in test_multiindex.py (#12310) @vyasr
  • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
  • Fix warnings in test_indexing.py (#12305) @vyasr
  • Fix warnings in test_joining.py (#12304) @vyasr
  • Unpin dask and distributed for development (#12302) @galipremsagar
  • Re-enable sccache for Jenkins builds (#12297) @ajschmidt8
  • Define needs for pr-builder workflow. (#12296) @bdice
  • Forward merge 22.12 into 23.02 (#12294) @vyasr
  • Fix warnings in test_stats.py (#12293) @vyasr
  • Fix table gtests coded in namespace cudf::test (#12292) @davidwendt
  • Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt
  • Improved error reporting when reading multiple JSON files (#12285) @vuule
  • Deprecate Frame.sum_of_squares (#12284) @vyasr
  • Remove deprecated code for 23.02 (#12281) @vyasr
  • Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl
  • Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt
  • Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar
  • Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt
  • Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt
  • Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt
  • Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar
  • Replace column/table test utilities with macros (#12242) @PointKernel
  • Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt
  • Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt
  • Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk
  • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
  • Cover parsing to decimal types in read_json tests (#12229) @vuule
  • Spill Statistics (#12223) @madsbk
  • Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice
  • Clean up of test_spilling.py (#12220) @madsbk
  • Simplify repetitive boolean logic (#12218) @vuule
  • Add Series.hasnans and Index.hasnans (#12214) @galipremsagar
  • Add cudf::strings::udf::replace function (#12210) @davidwendt
  • Adds in new java APIs for appending byte arrays to host columnar data (#12208) @revans2
  • Remove Python dependencies from Java CI. (#12193) @bdice
  • Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala
  • Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt
  • Clean up existing JNI scalar to column code (#12173) @revans2
  • Remove JIT type names, refactor id_to_type. (#12158) @bdice
  • Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi
  • Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl
  • Add codespell as a linter (#12097) @benfred
  • Enable specifying exceptions in error macros (#12078) @vyasr
  • Move _label_encoding from Series to Column (#12040) @shwina
  • Add GitHub Actions Workflows (#12002) @ajschmidt8
  • Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca

cudf - v23.02.00

Published by raydouglass over 1 year ago

🚨 Breaking Changes

  • Pin dask and distributed for release (#12695) @galipremsagar
  • Change ways to access ptr in Buffer (#12587) @galipremsagar
  • Remove column names (#12578) @vuule
  • Default cudf::io::read_json to nested JSON parser (#12544) @vuule
  • Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
  • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
  • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
  • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
  • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
  • Remove deprecated code for 23.02 (#12281) @vyasr
  • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
  • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
  • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
  • Remove JIT type names, refactor id_to_type. (#12158) @bdice
  • Floor division uses integer division for integral arguments (#12131) @wence-

🐛 Bug Fixes

  • Fix a mask data corruption in UDF (#12647) @galipremsagar
  • pre-commit: Update isort version to 5.12.0 (#12645) @wence-
  • tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
  • Revert regex program java APIs and tests (#12639) @cindyyuanjiang
  • Fix leaks in ColumnVectorTest (#12625) @jlowe
  • Handle when spillable buffers own each other (#12607) @madsbk
  • Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
  • lists: Transfer dtypes correctly through list.get (#12586) @wence-
  • timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
  • Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
  • Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
  • partition_by_hash(): support index (#12554) @madsbk
  • Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
  • Update List Lexicographical Comparator (#12538) @divyegala
  • Dynamically read PTX version (#12534) @brandon-b-miller
  • build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
  • Loosen runtime arrow pinning (#12522) @vyasr
  • Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
  • Fix issues with parquet chunked reader (#12488) @nvdbaranec
  • Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
  • Rename libcudf substring source files to slice (#12484) @davidwendt
  • Fix compile issue with arrow 10 (#12465) @ttnghia
  • Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
  • Fix xfail incompatibilities (#12423) @vyasr
  • Fix bug in Parquet column index encoding (#12404) @etseidl
  • When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
  • Fix get_json_object to return empty column on empty input (#12384) @davidwendt
  • Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
  • Fix reductions any/all return value for empty input (#12374) @davidwendt
  • Fix debug compile errors in parquet.hpp (#12372) @davidwendt
  • Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
  • Use correct memory resource in io::make_column (#12364) @vyasr
  • Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
  • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
  • Fix NumericPairIteratorTest for float values (#12306) @davidwendt
  • Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
  • Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
  • Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
  • Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
  • Change reductions any/all to return valid values for empty input (#12279) @davidwendt
  • Only exclude join keys that are indices from key columns (#12271) @wence-
  • Fix spill to device limit (#12252) @madsbk
  • Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
  • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
  • Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
  • Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
  • Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
  • Fix page size calculation in Parquet writer (#12182) @etseidl
  • Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
  • Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
  • Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

  • Fix link to NVTX (#12598) @sameerz
  • Include missing groupby functions in documentation (#12580) @quasiben
  • Fix documentation author (#12527) @bdice
  • Update libcudf reduction docs for casting output types (#12526) @davidwendt
  • Add JSON reader page in user guide (#12499) @GregoryKimball
  • Link unsupported iteration API docstrings (#12482) @galipremsagar
  • strings_udf doc update (#12469) @brandon-b-miller
  • Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
  • Update pre-commit hooks guide (#12395) @bdice
  • Update test docs to not use detail comparison utilities (#12332) @PointKernel
  • Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
  • Add eval to docs. (#12322) @vyasr
  • Turn on xfail_strict=true (#12244) @wence-
  • Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

  • Use kvikIO as the default IO backend (#12574) @vuule
  • Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
  • Add strings methods removeprefix and removesuffix (#12557) @davidwendt
  • Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
  • Default cudf::io::read_json to nested JSON parser (#12544) @vuule
  • Make string quoting optional on CSV write (#12539) @mythrocks
  • Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
  • Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
  • one_hot_encode to use experimental row comparators (#12478) @divyegala
  • Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
  • Add JSON Writer (#12474) @karthikeyann
  • Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
  • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
  • Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
  • JNI bindings to write CSV (#12425) @mythrocks
  • Nested JSON depth benchmark (#12371) @karthikeyann
  • Implement lists::reverse (#12336) @ttnghia
  • Use device_read in experimental read_json (#12314) @vuule
  • Implement JNI for strings::reverse (#12283) @ttnghia
  • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
  • Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
  • Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
  • Add cudf::strings::reverse function (#12227) @davidwendt
  • Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
  • Support replace in strings_udf (#12207) @brandon-b-miller
  • Add support to read binary encoded decimals in parquet (#12205) @PointKernel
  • Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
  • Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
  • Add device buffer datasource (#12024) @PointKernel
  • Implement groupby apply with JIT (#11452) @bwyogatama

🛠️ Improvements

  • Update shared workflow branches (#12696) @ajschmidt8
  • Pin dask and distributed for release (#12695) @galipremsagar
  • Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
  • Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
  • Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
  • Change ways to access ptr in Buffer (#12587) @galipremsagar
  • Version a parquet writer xfail (#12579) @galipremsagar
  • Remove column names (#12578) @vuule
  • Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
  • Add support for category dtypes in CSV reader (#12571) @galipremsagar
  • Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
  • Optimize cudf::make_lists_column (#12547) @ttnghia
  • Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
  • Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
  • Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
  • Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
  • Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
  • Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
  • More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
  • Guard CUDA runtime APIs with error checking (#12531) @PointKernel
  • Update TODOs from issue 10432. (#12528) @bdice
  • Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
  • Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
  • Fix SUM/MEAN aggregation type support. (#12503) @bdice
  • Stop using pandas._testing (#12492) @vyasr
  • Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
  • Fix erroneously skipped ORC ZSTD test (#12486) @vuule
  • Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
  • Raise warnings as errors in the test suite (#12468) @vyasr
  • Remove int32 hard-coding in python (#12467) @galipremsagar
  • Use cudaMemcpyDefault. (#12466) @bdice
  • Update workflows for nightly tests (#12462) @ajschmidt8
  • Build CUDA 11.8 and Python 3.10 Packages (#12457) @ajschmidt8
  • JNI build image default as cuda11.8 (#12441) @pxLi
  • Re-enable Recently Updated Check (#12435) @ajschmidt8
  • Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule
  • Build wheels alongside conda CI (#12427) @sevagh
  • Remove arguments for checking exception messages in Python (#12424) @vyasr
  • Clean up cuco usage (#12421) @PointKernel
  • Fix warnings in remaining modules (#12406) @vyasr
  • Update ops-bot.yaml (#12402) @ajschmidt8
  • Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt
  • Use numpy.empty() instead of bytearray to allocate host memory for spilling (#12399) @madsbk
  • Deprecate chunksize from dask_cudf.read_csv (#12394) @rjzamora
  • Expose the RMM pool size in JNI (#12390) @revans2
  • Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt
  • Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt
  • Use make_strings_children in parse_data nested json reader (#12382) @karthikeyann
  • Fix warnings in test_datetime.py (#12381) @vyasr
  • Mixed Join Benchmarks (#12375) @divyegala
  • Fix warnings in dataframe.py (#12369) @vyasr
  • Update conda recipes. (#12368) @bdice
  • Use gpu-latest-1 runner tag (#12366) @bdice
  • Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule
  • Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr
  • JSON column performance optimization - struct column nulls (#12354) @karthikeyann
  • Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt
  • Add size check to make_offsets_child_column utility (#12345) @davidwendt
  • Enable max compression ratio small block optimization for ZSTD (#12338) @vuule
  • Fix warnings in test_monotonic.py (#12334) @vyasr
  • Improve JSON column creation performance (list offsets) (#12330) @karthikeyann
  • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
  • Fix warnings in test_orc.py (#12326) @vyasr
  • Fix warnings in test_groupby.py (#12324) @vyasr
  • Fix test_notebooks.sh (#12323) @ajschmidt8
  • Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt
  • Fix check_style.sh script (#12320) @ajschmidt8
  • Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt
  • Fix warnings in test_index.py (#12313) @vyasr
  • Fix warnings in test_multiindex.py (#12310) @vyasr
  • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
  • Fix warnings in test_indexing.py (#12305) @vyasr
  • Fix warnings in test_joining.py (#12304) @vyasr
  • Unpin dask and distributed for development (#12302) @galipremsagar
  • Re-enable sccache for Jenkins builds (#12297) @ajschmidt8
  • Define needs for pr-builder workflow. (#12296) @bdice
  • Forward merge 22.12 into 23.02 (#12294) @vyasr
  • Fix warnings in test_stats.py (#12293) @vyasr
  • Fix table gtests coded in namespace cudf::test (#12292) @davidwendt
  • Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt
  • Improved error reporting when reading multiple JSON files (#12285) @vuule
  • Deprecate Frame.sum_of_squares (#12284) @vyasr
  • Remove deprecated code for 23.02 (#12281) @vyasr
  • Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl
  • Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt
  • Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar
  • Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt
  • Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt
  • Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt
  • Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar
  • Replace column/table test utilities with macros (#12242) @PointKernel
  • Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt
  • Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt
  • Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk
  • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
  • Cover parsing to decimal types in read_json tests (#12229) @vuule
  • Spill Statistics (#12223) @madsbk
  • Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice
  • Clean up of test_spilling.py (#12220) @madsbk
  • Simplify repetitive boolean logic (#12218) @vuule
  • Add Series.hasnans and Index.hasnans (#12214) @galipremsagar
  • Add cudf::strings::udf::replace function (#12210) @davidwendt
  • Adds in new java APIs for appending byte arrays to host columnar data (#12208) @revans2
  • Remove Python dependencies from Java CI. (#12193) @bdice
  • Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala
  • Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt
  • Clean up existing JNI scalar to column code (#12173) @revans2
  • Remove JIT type names, refactor id_to_type. (#12158) @bdice
  • Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi
  • Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl
  • Add codespell as a linter (#12097) @benfred
  • Enable specifying exceptions in error macros (#12078) @vyasr
  • Move _label_encoding from Series to Column (#12040) @shwina
  • Add GitHub Actions Workflows (#12002) @ajschmidt8
  • Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca
cudf - v22.12.01

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

  • Add JNI for substring without 'end' parameter. (#12113) @firestarman
  • Refactor purge_nonempty_nulls (#12111) @ttnghia
  • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
  • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
  • Fix type promotion edge cases in numerical binops (#12074) @wence-
  • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
  • Rollback of DeviceBufferLike (#12009) @madsbk
  • Remove unused managed_allocator (#12005) @vyasr
  • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
  • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
  • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
  • Remove validation that requires introspection (#11938) @vyasr
  • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
  • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
  • Support nested types as groupby keys in libcudf (#11792) @PointKernel
  • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
  • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
  • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

  • strings_udf: use libcudf caching of character tables (#12343) @wence-
  • Fix include line for IO Cython modules (#12250) @vyasr
  • Make dask pinning looser (#12231) @vyasr
  • Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
  • Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
  • Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
  • Fix compression in ORC writer (#12194) @vuule
  • Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
  • Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
  • Fix decimal binary operations (#12142) @galipremsagar
  • Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
  • Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
  • Fix/disable jitify lto (#12122) @robertmaynard
  • Fix conditional_full_join benchmark (#12121) @GregoryKimball
  • Fix regex working-memory-size refactor error (#12119) @davidwendt
  • Add in negative size checks for columns (#12118) @revans2
  • Add JNI for substring without 'end' parameter. (#12113) @firestarman
  • Fix reading of CSV files with blank second row (#12098) @vuule
  • Fix an error in IO with GzipFile type (#12085) @galipremsagar
  • Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
  • Fix alignment of compressed blocks in ORC writer (#12077) @vuule
  • Fix singleton-range __setitem__ edge case (#12075) @wence-
  • Fix type promotion edge cases in numerical binops (#12074) @wence-
  • Force using old fmt in nvbench. (#12067) @vyasr
  • Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
  • Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
  • Force black exclusions for pre-commit. (#12036) @bdice
  • Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
  • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
  • Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
  • Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
  • Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
  • Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
  • Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
  • Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
  • Fix maximum page size estimate in Parquet writer (#11962) @vuule
  • Fix local offset handling in bgzip reader (#11918) @upsj
  • Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
  • Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
  • Fix type casting in Series.setitem (#11904) @wence-
  • Fix memcheck error in get_dremel_data (#11903) @davidwendt
  • Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
  • Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
  • Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
  • Fix writing of Parquet files with many fragments (#11869) @etseidl
  • Fix RangeIndex unary operators. (#11868) @vyasr
  • JNI Avoid NPE for reading host binary data (#11865) @revans2
  • Fix decimal benchmark input data generation (#11863) @karthikeyann
  • Fix pre-commit copyright check (#11860) @galipremsagar
  • Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
  • Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
  • Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
  • Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
  • add V2 page header support to parquet reader (#11778) @etseidl
  • Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
  • Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

  • Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
  • Add symlinks to notebooks. (#12128) @bdice
  • Add truncate API to python doc pages (#12109) @galipremsagar
  • Update Numba docs links. (#12107) @bdice
  • Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
  • Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
  • Add pivot_table and crosstab to docs. (#12014) @bdice
  • Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
  • Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
  • Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
  • Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
  • Rename libcudf++ to libcudf. (#11953) @bdice
  • Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
  • Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
  • Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
  • Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
  • Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

  • Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
  • Support + in strings_udf (#12117) @brandon-b-miller
  • Support upper and lower in strings_udf (#12099) @brandon-b-miller
  • Add wheel builds (#12096) @vyasr
  • Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
  • Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
  • Mark nvcomp zstd compression stable (#12059) @jbrennan333
  • Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
  • Enable building against the libarrow contained in pyarrow (#12034) @vyasr
  • Add strings like jni and native method (#12032) @cindyyuanjiang
  • Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
  • byte_range support for JSON Lines format (#12017) @karthikeyann
  • Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
  • Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
  • Implement JNI for chunked Parquet reader (#11961) @ttnghia
  • Add method argument to DataFrame.quantile (#11957) @rjzamora
  • Add gpu memory watermark apis to JNI (#11950) @abellina
  • Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
  • Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
  • Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
  • Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
  • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
  • Enable CEC for strings_udf (#11884) @brandon-b-miller
  • ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
  • Implement chunked Parquet reader (#11867) @ttnghia
  • Add read_orc_metadata to libcudf (#11815) @vuule
  • Support nested types as groupby keys in libcudf (#11792) @PointKernel
  • Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

  • Reduce number of tests marked spilling (#12197) @madsbk
  • Pin dask and distributed for release (#12165) @galipremsagar
  • Don't rely on GNU find in headers_test.sh (#12164) @wence-
  • Update cp.clip call (#12148) @quasiben
  • Enable automatic column projection in groupby().agg (#12124) @rjzamora
  • Refactor purge_nonempty_nulls (#12111) @ttnghia
  • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
  • Spilling to host memory (#12106) @madsbk
  • First pass of pd.read_orc changes in tests (#12103) @galipremsagar
  • Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
  • Remove CUDA 10 compatibility code. (#12088) @bdice
  • Move and update dask nightly install in CI (#12082) @galipremsagar
  • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
  • Remove macros that inspect the contents of exceptions (#12076) @vyasr
  • Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
  • Remove overflow error during decimal binops (#12063) @galipremsagar
  • Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
  • Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
  • Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar
  • Refactor Parquet reader (#12046) @ttnghia
  • Forward merge 22.10 into 22.12 (#12045) @vyasr
  • Standardize newlines at ends of files. (#12042) @bdice
  • Trim trailing whitespace from all files. (#12041) @bdice
  • Use nosync policy in gather and scatter implementations. (#12038) @bdice
  • Remove smart quotes from all docstrings. (#12035) @bdice
  • Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
  • Add cython-lint to pre-commit checks. (#12020) @bdice
  • Use pragma once (#12019) @bdice
  • New GHA to add issues/prs to project board (#12016) @jarmak-nv
  • Add DataFrame.pivot_table. (#12015) @bdice
  • Rollback of DeviceBufferLike (#12009) @madsbk
  • Remove default parameters for nvtext::detail functions (#12007) @davidwendt
  • Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
  • Remove unused managed_allocator (#12005) @vyasr
  • Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
  • Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
  • Ignore python docs build artifacts (#12000) @galipremsagar
  • Use rapids-cmake for google benchmark. (#11997) @vyasr
  • Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
  • Remove stale labeler (#11995) @raydouglass
  • Move protobuf compilation to CMake (#11986) @vyasr
  • Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
  • Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
  • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
  • Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
  • Feature/remove default streams (#11967) @vyasr
  • Add pool memory resource to libcudf basic example (#11966) @davidwendt
  • Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
  • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
  • Add deprecation warning for set_allocator. (#11958) @vyasr
  • Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
  • Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
  • Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
  • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
  • Add strip_delimiters option to read_text (#11946) @upsj
  • Refactor multibyte_split output_builder (#11945) @upsj
  • Remove validation that requires introspection (#11938) @vyasr
  • Add .str.find_multiple API (#11928) @galipremsagar
  • Add regex_program class for use with all regex APIs (#11927) @davidwendt
  • Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
  • Performance improvement in JSON Tree traversal (#11919) @karthikeyann
  • Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
  • Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
  • Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
  • Pin mimesis version in setup.py. (#11906) @bdice
  • Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
  • Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
  • Relax codecov threshold diff (#11899) @galipremsagar
  • Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
  • Add coverage for string UDF tests. (#11891) @vyasr
  • Provide data_chunk_source wrapper for datasource (#11886) @upsj
  • Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
  • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
  • Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
  • Add ngroup (#11871) @shwina
  • Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
  • Unpin dask and distributed for development (#11859) @galipremsagar
  • Remove unused includes for table/row_operators (#11857) @GregoryKimball
  • Use conda-forge's pyorc (#11855) @jakirkham
  • Add libcudf strings examples (#11849) @davidwendt
  • Remove cudf_io namespace alias (#11827) @vuule
  • Test/remove thrust vector usage (#11813) @vyasr
  • Add BGZIP reader to python read_text (#11802) @upsj
  • Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
  • Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
  • Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
  • Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
  • Add BGZIP multibyte_split benchmark (#11723) @upsj
  • Bifurcate Dependency Lists (#11674) @bdice
  • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
  • Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
  • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
  • Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
  • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
  • Make all nvcc warnings into errors (#8916) @trxcllnt
cudf - v22.12.00

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

  • Add JNI for substring without 'end' parameter. (#12113) @firestarman
  • Refactor purge_nonempty_nulls (#12111) @ttnghia
  • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
  • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
  • Fix type promotion edge cases in numerical binops (#12074) @wence-
  • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
  • Rollback of DeviceBufferLike (#12009) @madsbk
  • Remove unused managed_allocator (#12005) @vyasr
  • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
  • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
  • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
  • Remove validation that requires introspection (#11938) @vyasr
  • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
  • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
  • Support nested types as groupby keys in libcudf (#11792) @PointKernel
  • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
  • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
  • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

  • Fix include line for IO Cython modules (#12250) @vyasr
  • Make dask pinning looser (#12231) @vyasr
  • Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
  • Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
  • Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
  • Fix compression in ORC writer (#12194) @vuule
  • Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
  • Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
  • Fix decimal binary operations (#12142) @galipremsagar
  • Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
  • Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
  • Fix/disable jitify lto (#12122) @robertmaynard
  • Fix conditional_full_join benchmark (#12121) @GregoryKimball
  • Fix regex working-memory-size refactor error (#12119) @davidwendt
  • Add in negative size checks for columns (#12118) @revans2
  • Add JNI for substring without 'end' parameter. (#12113) @firestarman
  • Fix reading of CSV files with blank second row (#12098) @vuule
  • Fix an error in IO with GzipFile type (#12085) @galipremsagar
  • Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
  • Fix alignment of compressed blocks in ORC writer (#12077) @vuule
  • Fix singleton-range __setitem__ edge case (#12075) @wence-
  • Fix type promotion edge cases in numerical binops (#12074) @wence-
  • Force using old fmt in nvbench. (#12067) @vyasr
  • Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
  • Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
  • Force black exclusions for pre-commit. (#12036) @bdice
  • Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
  • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
  • Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
  • Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
  • Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
  • Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
  • Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
  • Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
  • Fix maximum page size estimate in Parquet writer (#11962) @vuule
  • Fix local offset handling in bgzip reader (#11918) @upsj
  • Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
  • Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
  • Fix type casting in Series.setitem (#11904) @wence-
  • Fix memcheck error in get_dremel_data (#11903) @davidwendt
  • Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
  • Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
  • Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
  • Fix writing of Parquet files with many fragments (#11869) @etseidl
  • Fix RangeIndex unary operators. (#11868) @vyasr
  • JNI Avoid NPE for reading host binary data (#11865) @revans2
  • Fix decimal benchmark input data generation (#11863) @karthikeyann
  • Fix pre-commit copyright check (#11860) @galipremsagar
  • Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
  • Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
  • Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
  • Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
  • add V2 page header support to parquet reader (#11778) @etseidl
  • Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
  • Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

  • Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
  • Add symlinks to notebooks. (#12128) @bdice
  • Add truncate API to python doc pages (#12109) @galipremsagar
  • Update Numba docs links. (#12107) @bdice
  • Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
  • Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
  • Add pivot_table and crosstab to docs. (#12014) @bdice
  • Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
  • Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
  • Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
  • Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
  • Rename libcudf++ to libcudf. (#11953) @bdice
  • Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
  • Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
  • Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
  • Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
  • Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

  • Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
  • Support + in strings_udf (#12117) @brandon-b-miller
  • Support upper and lower in strings_udf (#12099) @brandon-b-miller
  • Add wheel builds (#12096) @vyasr
  • Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
  • Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
  • Mark nvcomp zstd compression stable (#12059) @jbrennan333
  • Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
  • Enable building against the libarrow contained in pyarrow (#12034) @vyasr
  • Add strings like jni and native method (#12032) @cindyyuanjiang
  • Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
  • byte_range support for JSON Lines format (#12017) @karthikeyann
  • Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
  • Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
  • Implement JNI for chunked Parquet reader (#11961) @ttnghia
  • Add method argument to DataFrame.quantile (#11957) @rjzamora
  • Add gpu memory watermark apis to JNI (#11950) @abellina
  • Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
  • Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
  • Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
  • Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
  • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
  • Enable CEC for strings_udf (#11884) @brandon-b-miller
  • ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
  • Implement chunked Parquet reader (#11867) @ttnghia
  • Add read_orc_metadata to libcudf (#11815) @vuule
  • Support nested types as groupby keys in libcudf (#11792) @PointKernel
  • Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

  • Reduce number of tests marked spilling (#12197) @madsbk
  • Pin dask and distributed for release (#12165) @galipremsagar
  • Don't rely on GNU find in headers_test.sh (#12164) @wence-
  • Update cp.clip call (#12148) @quasiben
  • Enable automatic column projection in groupby().agg (#12124) @rjzamora
  • Refactor purge_nonempty_nulls (#12111) @ttnghia
  • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
  • Spilling to host memory (#12106) @madsbk
  • First pass of pd.read_orc changes in tests (#12103) @galipremsagar
  • Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
  • Remove CUDA 10 compatibility code. (#12088) @bdice
  • Move and update dask nightly install in CI (#12082) @galipremsagar
  • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
  • Remove macros that inspect the contents of exceptions (#12076) @vyasr
  • Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
  • Remove overflow error during decimal binops (#12063) @galipremsagar
  • Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
  • Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
  • Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar
  • Refactor Parquet reader (#12046) @ttnghia
  • Forward merge 22.10 into 22.12 (#12045) @vyasr
  • Standardize newlines at ends of files. (#12042) @bdice
  • Trim trailing whitespace from all files. (#12041) @bdice
  • Use nosync policy in gather and scatter implementations. (#12038) @bdice
  • Remove smart quotes from all docstrings. (#12035) @bdice
  • Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
  • Add cython-lint to pre-commit checks. (#12020) @bdice
  • Use pragma once (#12019) @bdice
  • New GHA to add issues/prs to project board (#12016) @jarmak-nv
  • Add DataFrame.pivot_table. (#12015) @bdice
  • Rollback of DeviceBufferLike (#12009) @madsbk
  • Remove default parameters for nvtext::detail functions (#12007) @davidwendt
  • Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
  • Remove unused managed_allocator (#12005) @vyasr
  • Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
  • Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
  • Ignore python docs build artifacts (#12000) @galipremsagar
  • Use rapids-cmake for google benchmark. (#11997) @vyasr
  • Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
  • Remove stale labeler (#11995) @raydouglass
  • Move protobuf compilation to CMake (#11986) @vyasr
  • Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
  • Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
  • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
  • Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
  • Feature/remove default streams (#11967) @vyasr
  • Add pool memory resource to libcudf basic example (#11966) @davidwendt
  • Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
  • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
  • Add deprecation warning for set_allocator. (#11958) @vyasr
  • Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
  • Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
  • Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
  • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
  • Add strip_delimiters option to read_text (#11946) @upsj
  • Refactor multibyte_split output_builder (#11945) @upsj
  • Remove validation that requires introspection (#11938) @vyasr
  • Add .str.find_multiple API (#11928) @galipremsagar
  • Add regex_program class for use with all regex APIs (#11927) @davidwendt
  • Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
  • Performance improvement in JSON Tree traversal (#11919) @karthikeyann
  • Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
  • Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
  • Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
  • Pin mimesis version in setup.py. (#11906) @bdice
  • Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
  • Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
  • Relax codecov threshold diff (#11899) @galipremsagar
  • Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
  • Add coverage for string UDF tests. (#11891) @vyasr
  • Provide data_chunk_source wrapper for datasource (#11886) @upsj
  • Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
  • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
  • Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
  • Add ngroup (#11871) @shwina
  • Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
  • Unpin dask and distributed for development (#11859) @galipremsagar
  • Remove unused includes for table/row_operators (#11857) @GregoryKimball
  • Use conda-forge's pyorc (#11855) @jakirkham
  • Add libcudf strings examples (#11849) @davidwendt
  • Remove cudf_io namespace alias (#11827) @vuule
  • Test/remove thrust vector usage (#11813) @vyasr
  • Add BGZIP reader to python read_text (#11802) @upsj
  • Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
  • Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
  • Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
  • Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
  • Add BGZIP multibyte_split benchmark (#11723) @upsj
  • Bifurcate Dependency Lists (#11674) @bdice
  • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
  • Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
  • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
  • Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
  • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
  • Make all nvcc warnings into errors (#8916) @trxcllnt
cudf - [NIGHTLY] v22.10.00

Published by rapids-bot[bot] almost 2 years ago

🚨 Breaking Changes

  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice
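
For the `zfill` entry above (#11634), the reference behavior is Python's built-in `str.zfill`. The sketch below shows only what plain Python does; it makes no claim about cuDF's earlier output, and `zfill_reference` is a hypothetical helper name used purely for illustration.

```python
def zfill_reference(values, width):
    """Pad each string with leading zeros using Python's built-in str.zfill."""
    # str.zfill keeps a leading '+' or '-' in front of the inserted zeros
    # and returns the string unchanged when it is already at least `width` long.
    return [value.zfill(width) for value in values]


samples = ["5", "-5", "+5", "abc", "12345"]
print(zfill_reference(samples, 4))
# prints ['0005', '-005', '+005', '0abc', '12345']
```

In cuDF the corresponding call is `Series.str.zfill(width)`; it needs a GPU to run, so it is not exercised here.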

🐛 Bug Fixes

  • Force using old fmt in nvbench. (#12064) @vyasr
  • Update cuda-python dependency to 11.7.1 (#11994) @shwina
  • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
  • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
  • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
  • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
  • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
  • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
  • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
  • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
  • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
  • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
  • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
  • Fix ORC string sum statistics (#11740) @vuule
  • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
  • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
  • Don't assume stream is a compile-time constant expression (#11725) @vyasr
  • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
  • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
  • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
  • Fix compile error due to missing header (#11697) @ttnghia
  • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
  • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
  • Transfer correct dtype to exploded column (#11687) @wence-
  • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
  • Maintain the index name after .loc (#11677) @shwina
  • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
  • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
  • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
  • Fix multi-file remote datasource bug (#11655) @rjzamora
  • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
  • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
  • fixes overflows in benchmarks (#11649) @elstehle
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
  • Fix host scalars construction of nested types (#11612) @galipremsagar
  • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
  • Add is_timestamp test for leap second (60) (#11594) @davidwendt
  • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
  • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
  • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
  • Correct distribution data type in quantiles benchmark (#11584) @vuule
  • Fix multibyte_split benchmark for host buffers (#11583) @upsj
  • xfail custreamz display test for now (#11567) @shwina
  • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
  • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
  • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
  • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
  • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
  • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
  • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
  • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
  • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
  • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
  • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
  • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
  • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
  • Fix regex quantifier check to include capture groups (#11373) @davidwendt
  • Fix read_text when byte_range is aligned with field (#11371) @upsj
  • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
  • column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

  • Update guide-to-udfs notebook (#11861) @brandon-b-miller
  • Update docstring for cudf.read_text (#11799) @GregoryKimball
  • Add doc section for list & struct handling (#11770) @galipremsagar
  • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
  • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
  • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
  • Enable more Pydocstyle rules (#11582) @bdice
  • Remove unused cpp/img folder (#11554) @davidwendt
  • Publish C++ developer docs (#11475) @vyasr
  • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
  • Update contributing doc to include links to the developer guides (#11390) @davidwendt
  • Fix table_view_base doxygen format (#11340) @davidwendt
  • Create main developer guide for Python (#11235) @vyasr
  • Add developer documentation for benchmarking (#11122) @vyasr
  • cuDF error handling document (#7917) @isVoid

🚀 New Features

  • Add hasNull statistic reading ability to ORC (#11747) @devavret
  • Add istitle to string UDFs (#11738) @brandon-b-miller
  • JSON Column creation in GPU (#11714) @karthikeyann
  • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
  • Add BGZIP data_chunk_reader (#11652) @upsj
  • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
  • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
  • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
  • Generic type casting to support the new nested JSON reader (#11613) @elstehle
  • JSON tree traversal (#11610) @karthikeyann
  • Add casting operators to masked UDFs (#11578) @brandon-b-miller
  • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
  • Add strings 'like' function (#11558) @davidwendt
  • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
  • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
  • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
  • Add gdb pretty-printers for simple types (#11499) @upsj
  • Add create_random_column function to the data generator (#11490) @vuule
  • Add fluent API builder to data_profile (#11479) @vuule
  • Adds Nested Json benchmark (#11466) @karthikeyann
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Python API for the future experimental JSON reader (#11426) @vuule
  • Return schema info from JSON reader (#11419) @vuule
  • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
  • Truncate parquet column indexes (#11403) @etseidl
  • Adds the end-to-end JSON parser implementation (#11388) @elstehle
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Add placeholder for the experimental JSON reader (#11334) @vuule
  • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
  • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
  • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
  • Adds JSON tokenizer (#11264) @elstehle
  • List lexicographic comparator (#11129) @devavret
  • Add generic type inference for cuIO (#11121) @PointKernel
  • Fully support nested types in cudf::contains (#10656) @ttnghia
  • Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

  • Pin dask and distributed for release (#11822) @galipremsagar
  • Add examples for Nested JSON reader (#11814) @GregoryKimball
  • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
  • Update strings udf version updater script (#11772) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
  • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
  • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
  • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
  • Add missing copyright headers. (#11712) @bdice
  • Fix copyright check issues in pre-commit (#11711) @bdice
  • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
  • Disable very large column gtest for contiguous-split (#11706) @davidwendt
  • Drop split_out=None test from groupby.agg (#11704) @wence-
  • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
  • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
  • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
  • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
  • Remove isort exclusions (#11680) @bdice
  • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
  • Check conda recipe headers with pre-commit (#11669) @bdice
  • Remove redundant style check for clang-format. (#11668) @bdice
  • Add support for group_keys in groupby (#11659) @galipremsagar
  • Fix pandoc pinning. (#11658) @bdice
  • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
  • Update git metadata (#11647) @bdice
  • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
  • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
  • Update to mypy 0.971 (#11640) @wence-
  • Refactor strings strip functor to details header (#11635) @davidwendt
  • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
  • Simplify hostdevice_vector (#11631) @upsj
  • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
  • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
  • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
  • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
  • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
  • Use stream in Java API. (#11601) @bdice
  • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
  • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
  • Tune multibyte_split kernel (#11587) @upsj
  • Move split_utils.cuh to strings/detail (#11585) @davidwendt
  • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
  • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
  • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
  • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
  • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
  • JNI support for writing binary columns in parquet (#11556) @revans2
  • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
  • Refactor string/numeric conversion utilities (#11545) @davidwendt
  • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
  • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
  • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
  • Add hexadecimal value separators (#11527) @bdice
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
  • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
  • Fix Feather test warning. (#11511) @bdice
  • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
  • Upgrade to arrow-9.x (#11507) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Single-pass multibyte_split (#11500) @upsj
  • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
  • Unpin dask and distributed for development (#11492) @galipremsagar
  • Move SparkMurmurHash3_32 functor. (#11489) @bdice
  • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Add reduction distinct_count benchmark (#11473) @ttnghia
  • Add groupby nunique aggregation benchmark (#11472) @ttnghia
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Add groupby max aggregation benchmark (#11464) @ttnghia
  • Extract Dremel encoding code from Parquet (#11461) @vyasr
  • Add missing Thrust #includes. (#11457) @bdice
  • Make CMake hooks verbose (#11456) @vyasr
  • Control Parquet page size through Python API (#11454) @etseidl
  • Add control of Parquet column index creation to python (#11453) @etseidl
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Update to Thrust 1.17.0 (#11437) @bdice
  • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
  • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
  • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Add Spark list hashing Java tests (#11379) @bdice
  • Move cmake to the build section. (#11376) @vyasr
  • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
  • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
  • Remove unused custreamz thirdparty directory (#11343) @vyasr
  • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
  • Enable using upstream jitify2 (#11287) @shwina
  • Cache cudf.Scalar (#11246) @shwina
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.10.01

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice

🐛 Bug Fixes

  • Update cuda-python dependency to 11.7.1 (#11994) @shwina
  • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
  • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
  • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
  • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
  • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
  • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
  • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
  • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
  • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
  • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
  • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
  • Fix ORC string sum statistics (#11740) @vuule
  • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
  • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
  • Don't assume stream is a compile-time constant expression (#11725) @vyasr
  • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
  • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
  • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
  • Fix compile error due to missing header (#11697) @ttnghia
  • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
  • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
  • Transfer correct dtype to exploded column (#11687) @wence-
  • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
  • Maintain the index name after .loc (#11677) @shwina
  • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
  • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
  • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
  • Fix multi-file remote datasource bug (#11655) @rjzamora
  • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
  • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
  • fixes overflows in benchmarks (#11649) @elstehle
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
  • Fix host scalars construction of nested types (#11612) @galipremsagar
  • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
  • Add is_timestamp test for leap second (60) (#11594) @davidwendt
  • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
  • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
  • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
  • Correct distribution data type in quantiles benchmark (#11584) @vuule
  • Fix multibyte_split benchmark for host buffers (#11583) @upsj
  • xfail custreamz display test for now (#11567) @shwina
  • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
  • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
  • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
  • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
  • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
  • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
  • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
  • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
  • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
  • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
  • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
  • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
  • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
  • Fix regex quantifier check to include capture groups (#11373) @davidwendt
  • Fix read_text when byte_range is aligned with field (#11371) @upsj
  • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
  • column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

  • Update guide-to-udfs notebook (#11861) @brandon-b-miller
  • Update docstring for cudf.read_text (#11799) @GregoryKimball
  • Add doc section for list & struct handling (#11770) @galipremsagar
  • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
  • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
  • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
  • Enable more Pydocstyle rules (#11582) @bdice
  • Remove unused cpp/img folder (#11554) @davidwendt
  • Publish C++ developer docs (#11475) @vyasr
  • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
  • Update contributing doc to include links to the developer guides (#11390) @davidwendt
  • Fix table_view_base doxygen format (#11340) @davidwendt
  • Create main developer guide for Python (#11235) @vyasr
  • Add developer documentation for benchmarking (#11122) @vyasr
  • cuDF error handling document (#7917) @isVoid

🚀 New Features

  • Add hasNull statistic reading ability to ORC (#11747) @devavret
  • Add istitle to string UDFs (#11738) @brandon-b-miller
  • JSON Column creation in GPU (#11714) @karthikeyann
  • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
  • Add BGZIP data_chunk_reader (#11652) @upsj
  • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
  • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
  • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
  • Generic type casting to support the new nested JSON reader (#11613) @elstehle
  • JSON tree traversal (#11610) @karthikeyann
  • Add casting operators to masked UDFs (#11578) @brandon-b-miller
  • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
  • Add strings 'like' function (#11558) @davidwendt
  • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
  • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
  • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
  • Add gdb pretty-printers for simple types (#11499) @upsj
  • Add create_random_column function to the data generator (#11490) @vuule
  • Add fluent API builder to data_profile (#11479) @vuule
  • Adds Nested Json benchmark (#11466) @karthikeyann
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Python API for the future experimental JSON reader (#11426) @vuule
  • Return schema info from JSON reader (#11419) @vuule
  • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
  • Truncate parquet column indexes (#11403) @etseidl
  • Adds the end-to-end JSON parser implementation (#11388) @elstehle
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Add placeholder for the experimental JSON reader (#11334) @vuule
  • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
  • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
  • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
  • Adds JSON tokenizer (#11264) @elstehle
  • List lexicographic comparator (#11129) @devavret
  • Add generic type inference for cuIO (#11121) @PointKernel
  • Fully support nested types in cudf::contains (#10656) @ttnghia
  • Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

  • Pin dask and distributed for release (#11822) @galipremsagar
  • Add examples for Nested JSON reader (#11814) @GregoryKimball
  • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
  • Update strings udf version updater script (#11772) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
  • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
  • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
  • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
  • Add missing copyright headers. (#11712) @bdice
  • Fix copyright check issues in pre-commit (#11711) @bdice
  • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
  • Disable very large column gtest for contiguous-split (#11706) @davidwendt
  • Drop split_out=None test from groupby.agg (#11704) @wence-
  • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
  • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
  • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
  • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
  • Remove isort exclusions (#11680) @bdice
  • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
  • Check conda recipe headers with pre-commit (#11669) @bdice
  • Remove redundant style check for clang-format. (#11668) @bdice
  • Add support for group_keys in groupby (#11659) @galipremsagar
  • Fix pandoc pinning. (#11658) @bdice
  • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
  • Update git metadata (#11647) @bdice
  • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
  • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
  • Update to mypy 0.971 (#11640) @wence-
  • Refactor strings strip functor to details header (#11635) @davidwendt
  • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
  • Simplify hostdevice_vector (#11631) @upsj
  • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
  • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
  • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
  • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
  • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
  • Use stream in Java API. (#11601) @bdice
  • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
  • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
  • Tune multibyte_split kernel (#11587) @upsj
  • Move split_utils.cuh to strings/detail (#11585) @davidwendt
  • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
  • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
  • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
  • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
  • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
  • JNI support for writing binary columns in parquet (#11556) @revans2
  • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
  • Refactor string/numeric conversion utilities (#11545) @davidwendt
  • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
  • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
  • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
  • Add hexadecimal value separators (#11527) @bdice
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
  • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
  • Fix Feather test warning. (#11511) @bdice
  • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
  • Upgrade to arrow-9.x (#11507) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Single-pass multibyte_split (#11500) @upsj
  • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
  • Unpin dask and distributed for development (#11492) @galipremsagar
  • Move SparkMurmurHash3_32 functor. (#11489) @bdice
  • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Add reduction distinct_count benchmark (#11473) @ttnghia
  • Add groupby nunique aggregation benchmark (#11472) @ttnghia
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Add groupby max aggregation benchmark (#11464) @ttnghia
  • Extract Dremel encoding code from Parquet (#11461) @vyasr
  • Add missing Thrust #includes. (#11457) @bdice
  • Make CMake hooks verbose (#11456) @vyasr
  • Control Parquet page size through Python API (#11454) @etseidl
  • Add control of Parquet column index creation to python (#11453) @etseidl
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Update to Thrust 1.17.0 (#11437) @bdice
  • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
  • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
  • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Add Spark list hashing Java tests (#11379) @bdice
  • Move cmake to the build section. (#11376) @vyasr
  • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
  • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
  • Remove unused custreamz thirdparty directory (#11343) @vyasr
  • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
  • Enable using upstream jitify2 (#11287) @shwina
  • Cache cudf.Scalar (#11246) @shwina
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.10.00

Published by GPUtester about 2 years ago

🚨 Breaking Changes

  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice

🐛 Bug Fixes

  • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
  • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
  • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
  • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
  • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
  • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
  • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
  • Disable nvCOMP DEFLATE integration (#11811) @vuule
  • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
  • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
  • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
  • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
  • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
  • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
  • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
  • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
  • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
  • Fix ORC string sum statistics (#11740) @vuule
  • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
  • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
  • Don't assume stream is a compile-time constant expression (#11725) @vyasr
  • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
  • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
  • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
  • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
  • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
  • Fix compile error due to missing header (#11697) @ttnghia
  • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
  • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
  • Transfer correct dtype to exploded column (#11687) @wence-
  • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
  • Maintain the index name after .loc (#11677) @shwina
  • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
  • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
  • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
  • Fix multi-file remote datasource bug (#11655) @rjzamora
  • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
  • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
  • fixes overflows in benchmarks (#11649) @elstehle
  • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
  • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
  • Update zfill to match Python output (#11634) @davidwendt
  • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
  • Fix host scalars construction of nested types (#11612) @galipremsagar
  • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
  • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
  • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
  • Add is_timestamp test for leap second (60) (#11594) @davidwendt
  • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
  • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
  • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
  • Correct distribution data type in quantiles benchmark (#11584) @vuule
  • Fix multibyte_split benchmark for host buffers (#11583) @upsj
  • xfail custreamz display test for now (#11567) @shwina
  • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
  • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
  • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
  • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
  • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
  • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
  • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
  • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
  • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
  • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
  • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
  • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
  • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
  • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
  • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
  • Fix regex quantifier check to include capture groups (#11373) @davidwendt
  • Fix read_text when byte_range is aligned with field (#11371) @upsj
  • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
  • column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

  • Update guide-to-udfs notebook (#11861) @brandon-b-miller
  • Update docstring for cudf.read_text (#11799) @GregoryKimball
  • Add doc section for list & struct handling (#11770) @galipremsagar
  • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
  • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
  • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
  • Enable more Pydocstyle rules (#11582) @bdice
  • Remove unused cpp/img folder (#11554) @davidwendt
  • Publish C++ developer docs (#11475) @vyasr
  • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
  • Update contributing doc to include links to the developer guides (#11390) @davidwendt
  • Fix table_view_base doxygen format (#11340) @davidwendt
  • Create main developer guide for Python (#11235) @vyasr
  • Add developer documentation for benchmarking (#11122) @vyasr
  • cuDF error handling document (#7917) @isVoid

🚀 New Features

  • Add hasNull statistic reading ability to ORC (#11747) @devavret
  • Add istitle to string UDFs (#11738) @brandon-b-miller
  • JSON Column creation in GPU (#11714) @karthikeyann
  • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
  • Add BGZIP data_chunk_reader (#11652) @upsj
  • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
  • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
  • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
  • Generic type casting to support the new nested JSON reader (#11613) @elstehle
  • JSON tree traversal (#11610) @karthikeyann
  • Add casting operators to masked UDFs (#11578) @brandon-b-miller
  • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
  • Add strings 'like' function (#11558) @davidwendt
  • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
  • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
  • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
  • Adding optional parquet reader schema (#11524) @hyperbolic2346
  • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
  • Add gdb pretty-printers for simple types (#11499) @upsj
  • Add create_random_column function to the data generator (#11490) @vuule
  • Add fluent API builder to data_profile (#11479) @vuule
  • Adds Nested Json benchmark (#11466) @karthikeyann
  • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
  • Python API for the future experimental JSON reader (#11426) @vuule
  • Return schema info from JSON reader (#11419) @vuule
  • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
  • Truncate parquet column indexes (#11403) @etseidl
  • Adds the end-to-end JSON parser implementation (#11388) @elstehle
  • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
  • Add placeholder for the experimental JSON reader (#11334) @vuule
  • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
  • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
  • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
  • Adds JSON tokenizer (#11264) @elstehle
  • List lexicographic comparator (#11129) @devavret
  • Add generic type inference for cuIO (#11121) @PointKernel
  • Fully support nested types in cudf::contains (#10656) @ttnghia
  • Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

  • Pin dask and distributed for release (#11822) @galipremsagar
  • Add examples for Nested JSON reader (#11814) @GregoryKimball
  • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
  • Update strings udf version updater script (#11772) @galipremsagar
  • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
  • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
  • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
  • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
  • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
  • Add missing copyright headers. (#11712) @bdice
  • Fix copyright check issues in pre-commit (#11711) @bdice
  • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
  • Disable very large column gtest for contiguous-split (#11706) @davidwendt
  • Drop split_out=None test from groupby.agg (#11704) @wence-
  • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
  • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
  • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
  • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
  • Remove isort exclusions (#11680) @bdice
  • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
  • Check conda recipe headers with pre-commit (#11669) @bdice
  • Remove redundant style check for clang-format. (#11668) @bdice
  • Add support for group_keys in groupby (#11659) @galipremsagar
  • Fix pandoc pinning. (#11658) @bdice
  • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
  • Update git metadata (#11647) @bdice
  • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
  • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
  • Update to mypy 0.971 (#11640) @wence-
  • Refactor strings strip functor to details header (#11635) @davidwendt
  • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
  • Simplify hostdevice_vector (#11631) @upsj
  • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
  • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
  • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
  • Upgrade pandas to 1.5 (#11617) @galipremsagar
  • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
  • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
  • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
  • Use stream in Java API. (#11601) @bdice
  • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
  • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
  • Tune multibyte_split kernel (#11587) @upsj
  • Move split_utils.cuh to strings/detail (#11585) @davidwendt
  • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
  • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
  • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
  • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
  • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
  • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
  • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
  • JNI support for writing binary columns in parquet (#11556) @revans2
  • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
  • Refactor string/numeric conversion utilities (#11545) @davidwendt
  • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
  • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
  • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
  • Add hexadecimal value separators (#11527) @bdice
  • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
  • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
  • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
  • Fix Feather test warning. (#11511) @bdice
  • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
  • Upgrade to arrow-9.x (#11507) @galipremsagar
  • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
  • Single-pass multibyte_split (#11500) @upsj
  • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
  • Unpin dask and distributed for development (#11492) @galipremsagar
  • Move SparkMurmurHash3_32 functor. (#11489) @bdice
  • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
  • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
  • Add reduction distinct_count benchmark (#11473) @ttnghia
  • Add groupby nunique aggregation benchmark (#11472) @ttnghia
  • Disable Arrow S3 support by default. (#11470) @bdice
  • Add groupby max aggregation benchmark (#11464) @ttnghia
  • Extract Dremel encoding code from Parquet (#11461) @vyasr
  • Add missing Thrust #includes. (#11457) @bdice
  • Make CMake hooks verbose (#11456) @vyasr
  • Control Parquet page size through Python API (#11454) @etseidl
  • Add control of Parquet column index creation to python (#11453) @etseidl
  • Remove unused is_struct trait. (#11450) @bdice
  • Refactor the Buffer class (#11447) @madsbk
  • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
  • Update to Thrust 1.17.0 (#11437) @bdice
  • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
  • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
  • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
  • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
  • Add Spark list hashing Java tests (#11379) @bdice
  • Move cmake to the build section. (#11376) @vyasr
  • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
  • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
  • Remove unused custreamz thirdparty directory (#11343) @vyasr
  • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
  • Enable using upstream jitify2 (#11287) @shwina
  • Cache cudf.Scalar (#11246) @shwina
  • Remove deprecated Series.applymap. (#11031) @bdice
  • Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.08.01

Published by GPUtester about 2 years ago

🚨 Breaking Changes

  • Pin numpy to <1.23 (#11824) @galipremsagar
  • Remove legacy join APIs (#11274) @vyasr
  • Remove lists::drop_list_duplicates (#11236) @ttnghia
  • Remove Index.replace API (#11131) @vyasr
  • Remove deprecated Index methods from Frame (#11073) @vyasr
  • Remove public API of cudf.merge_sorted. (#11032) @bdice
  • Drop python 3.7 in code-base (#11029) @galipremsagar
  • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
  • Remove Arrow CUDA IPC code (#10995) @shwina
  • Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

  • Fix out-of-bound access in cudf::detail::label_segments (#11497) @ttnghia
  • Fix distributed error related to loop_in_thread (#11428) @galipremsagar
  • Fix atomic operations on NaN values (#11420) @ttnghia
  • Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
  • Revert "Allow CuPy 11" (#11409) @jakirkham
  • Fix moto timeouts (#11369) @galipremsagar
  • Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
  • Fix memory_usage() for ListSeries (#11355) @thomcom
  • Fix constructing Column from column_view with expired mask (#11354) @shwina
  • Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
  • Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
  • Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
  • Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
  • Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
  • Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
  • Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
  • Fix issue related to numpy array and category dtype (#11282) @galipremsagar
  • Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
  • Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
  • Returns DataFrame When Concating Along Axis 1 (#11263) @isVoid
  • Fix compile error due to missing header (#11257) @ttnghia
  • Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
  • Fix tests/rolling/empty_input_test (#11238) @ttnghia
  • Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
  • Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
  • Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
  • Fix cumulative count index behavior (#11188) @brandon-b-miller
  • Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
  • Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
  • Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
  • Ensure cuco export set is installed in cmake build (#11147) @jlowe
  • Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
  • Fix compile error due to missing header (#11126) @ttnghia
  • Fix __cuda_array_interface__ failures (#11113) @galipremsagar
  • Support octal and hex within regex character class pattern (#11112) @davidwendt
  • Fix split_re matching logic for word boundaries (#11106) @davidwendt
  • Handle multiple files metadata in read_parquet (#11105) @galipremsagar
  • Fix index alignment for Series objects with repeated index (#11103) @shwina
  • FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
  • Fix regex word boundary logic to include underline (#11099) @davidwendt
  • Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
  • Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
  • Maintain the input index in the result of a groupby-transform (#11068) @shwina
  • Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
  • Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
  • Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
  • Fix warn_unused_result error in parquet test (#11026) @karthikeyann
  • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
  • Fix small error in page row count limiting (#10991) @etseidl
  • Fix a row index entry error in ORC writer issue (#10989) @vuule
  • Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice

📖 Documentation

  • Defer loading of custom.js (#11465) @galipremsagar
  • Fix issues with day & night modes in python docs (#11400) @galipremsagar
  • Update missing data handling APIs in docs (#11345) @galipremsagar
  • Add lists filtering APIs to doxygen group. (#11336) @bdice
  • Remove unused import in README sample (#11318) @vyasr
  • Note null behavior in where docs (#11276) @brandon-b-miller
  • Update docstring for spans in get_row_data_range (#11271) @vyasr
  • Update nvCOMP integration table (#11231) @vuule
  • Add dev docs for documentation writing (#11217) @vyasr
  • Documentation fix for concatenate (#11187) @dagardner-nv
  • Fix unresolved links in markdown (#11173) @karthikeyann
  • Fix cudf version in README.md install commands (#11164) @jvanstraten
  • Switch language from None to "en" in docs build (#11133) @galipremsagar
  • Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
  • Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
  • Add docs to rolling var, std, count. (#11035) @bdice
  • Fix docs for Numba UDFs. (#11020) @bdice
  • Replace column comparison utilities functions with macros (#11007) @karthikeyann
  • Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
  • Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
  • Fix Doxygen warnings in table header files (#10964) @karthikeyann
  • Fix Doxygen warnings in column header files (#10963) @karthikeyann
  • Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
  • Generate Doxygen Tag File for Libcudf (#10932) @isVoid
  • Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
  • Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
  • Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
  • fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
  • fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
  • Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
  • Add missing documentation in aggregation.hpp (#10887) @karthikeyann
  • Revise PR template. (#10774) @bdice

🚀 New Features

  • Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
  • Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
  • Adding byte array view structure (#11322) @hyperbolic2346
  • Adding byte_array statistics (#11303) @hyperbolic2346
  • Add column indexes to Parquet writer (#11302) @etseidl
  • Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
  • FST benchmark (#11243) @karthikeyann
  • Adds the Finite-State Transducer algorithm (#11242) @elstehle
  • Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
  • Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
  • Add 24 bit dictionary support to Parquet writer (#11216) @devavret
  • Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
  • JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
  • Add JNI bindings for extractAllRecord (#11196) @anthony-chang
  • Add cudf.options (#11193) @isVoid
  • Add thrift support for parquet column and offset indexes (#11178) @etseidl
  • Adding binary read/write as options for parquet (#11160) @hyperbolic2346
  • Support nth_element for window functions (#11158) @mythrocks
  • Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
  • Implement Groupby pct_change (#11144) @skirui-source
  • Add JNI for set operations (#11143) @ttnghia
  • Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
  • Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
  • Feature/python benchmarking (#11125) @vyasr
  • Support nan_equality in cudf::distinct (#11118) @ttnghia
  • Added JNI for getMapValueForKeys (#11104) @razajafri
  • Refactor semi_anti_join (#11100) @ttnghia
  • Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
  • Adds the Logical Stack algorithm (#11078) @elstehle
  • Add doxygen-check pre-commit hook (#11076) @karthikeyann
  • Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
  • Add Doxygen CI check (#11057) @karthikeyann
  • Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
  • Support set operations (#11043) @ttnghia
  • Support for ZLIB compression in ORC writer (#11036) @vuule
  • Adding feature swaplevels (#11027) @VamsiTallam95
  • Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
  • Function for bfill, ffill #9591 (#11022) @Sreekiran096
  • Generate group offsets from element labels (#11017) @ttnghia
  • Feature axes (#10979) @VamsiTallam95
  • Generate group labels from offsets (#10945) @ttnghia
  • Add missing cuIO benchmark coverage for duration types (#10933) @vuule
  • Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
  • Reindex Improvements (#10815) @brandon-b-miller
  • Implement value_counts for DataFrame (#10813) @martinfalisse

🛠️ Improvements

  • Pin numpy to <1.23 (#11824) @galipremsagar
  • Make Index Join Tests on Default Precisions Deterministic (#11451) @isVoid
  • Pin dask & distributed for release (#11433) @galipremsagar
  • Use documented header template for doxygen (#11430) @galipremsagar
  • Relax arrow version in dev env (#11418) @galipremsagar
  • Added Java bindings for Parquet options for binary read (#11410) @razajafri
  • Allow CuPy 11 (#11393) @jakirkham
  • Improve multibyte_split performance (#11347) @cwharris
  • Switch death test to use explicit trap. (#11326) @vyasr
  • Add --output-on-failure to ctest args. (#11321) @vyasr
  • Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
  • Add JNI support for the join_strings API (#11309) @revans2
  • Add cupy version to setup.py install_requires (#11306) @vyasr
  • removing some unused code (#11305) @hyperbolic2346
  • Add test of wildcard selection (#11300) @vyasr
  • Update parquet reader to take stream parameter (#11294) @PointKernel
  • Spark list hashing (#11292) @bdice
  • Remove legacy join APIs (#11274) @vyasr
  • Fix cudf recipes syntax (#11273) @ajschmidt8
  • Fix cudf recipe (#11267) @ajschmidt8
  • Cleanup config files (#11266) @vyasr
  • Run mypy on all packages (#11265) @vyasr
  • Update to isort 5.10.1. (#11262) @vyasr
  • Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
  • Remove redundant black config specifications. (#11258) @vyasr
  • Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
  • Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
  • Move rolling impl details to detail/ directory. (#11250) @mythrocks
  • Remove lists::drop_list_duplicates (#11236) @ttnghia
  • Use cudf::lists::distinct in Python binding (#11234) @ttnghia
  • Use cudf::lists::distinct in Java binding (#11233) @ttnghia
  • Use cudf::distinct in Java binding (#11232) @ttnghia
  • Pin dask-cuda in dev environment (#11229) @galipremsagar
  • Remove cruft in map_lookup (#11221) @mythrocks
  • Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
  • Remove Frame._index (#11210) @vyasr
  • Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
  • Document why Development component is needed for CMake. (#11200) @vyasr
  • cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
  • Standardize join internals around DataFrame (#11184) @vyasr
  • Move character case table declarations from src to detail (#11183) @davidwendt
  • Remove usage of Frame in StringMethods (#11181) @vyasr
  • Expose get_json_object_options to Python (#11180) @SrikarVanavasam
  • Fix decimal128 stats in parquet writer (#11179) @etseidl
  • Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
  • Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
  • Refactor and optimize Frame.where (#11168) @vyasr
  • Add npos const static member to cudf::string_view (#11166) @davidwendt
  • Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
  • Clean up _copy_type_metadata (#11156) @vyasr
  • Add nvcc conda package in dev environment (#11154) @galipremsagar
  • Struct binary comparison op functionality for spark rapids (#11153) @rwlee
  • Refactor inline conditionals. (#11151) @bdice
  • Refactor Spark hashing tests (#11145) @bdice
  • Add new _from_data_like_self factory (#11140) @vyasr
  • Update get_cucollections to use rapids-cmake (#11139) @vyasr
  • Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
  • Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
  • Remove Index.replace API (#11131) @vyasr
  • Move char-type table function declarations from src to detail (#11127) @davidwendt
  • Clean up repo root (#11124) @bdice
  • Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
  • Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
  • Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
  • Take iterators by value in clamp.cu. (#11084) @bdice
  • Performance improvements for row to column conversions (#11075) @hyperbolic2346
  • Remove deprecated Index methods from Frame (#11073) @vyasr
  • Use per-page max compressed size estimate for compression (#11066) @devavret
  • column to row refactor for performance (#11063) @hyperbolic2346
  • Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
  • Unpin dask & distributed for development (#11058) @galipremsagar
  • Add support for Series.between (#11051) @galipremsagar
  • Fix groupby include (#11046) @bwyogatama
  • Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
  • Remove public API of cudf.merge_sorted. (#11032) @bdice
  • Drop python 3.7 in code-base (#11029) @galipremsagar
  • Addition & integration of the integer power operator (#11025) @AtlantaPepsi
  • Refactor lists::contains (#11019) @ttnghia
  • Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
  • Clean up parquet unit test (#11005) @PointKernel
  • Add missing #pragma once to header files (#11004) @karthikeyann
  • Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
  • Refactor cudf::contains (#10997) @ttnghia
  • Remove Arrow CUDA IPC code (#10995) @shwina
  • Change file extension for groupby benchmark (#10985) @ttnghia
  • Sort recipe include checks. (#10984) @bdice
  • Update cuCollections for thrust upgrade (#10983) @PointKernel
  • Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
  • Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
  • Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
  • Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
  • Include <optional> for GCC 11 compatibility. (#10927) @bdice
  • Enable builds with scikit-build (#10919) @vyasr
  • Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
  • update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
  • Improve the capture of fatal cuda error (#10884) @sperlingxx
  • Cleanup regex compiler operators and operands source (#10879) @davidwendt
  • Buffer: make .ptr read-only (#10872) @madsbk
  • Configurable NaN handling in device_row_comparators (#10870) @rwlee
  • Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
  • Upgrade to arrow-8 (#10816) @galipremsagar
  • Remove getattr method in RangeIndex class (#10538) @skirui-source
  • Adding bins to value counts (#8247) @marlenezw

cudf - v22.08.00

Published by GPUtester about 2 years ago

🚨 Breaking Changes

  • Remove legacy join APIs (#11274) @vyasr
  • Remove lists::drop_list_duplicates (#11236) @ttnghia
  • Remove Index.replace API (#11131) @vyasr
  • Remove deprecated Index methods from Frame (#11073) @vyasr
  • Remove public API of cudf.merge_sorted. (#11032) @bdice
  • Drop python 3.7 in code-base (#11029) @galipremsagar
  • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
  • Remove Arrow CUDA IPC code (#10995) @shwina
  • Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

  • Fix distributed error related to loop_in_thread (#11428) @galipremsagar
  • Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
  • Revert "Allow CuPy 11" (#11409) @jakirkham
  • Fix moto timeouts (#11369) @galipremsagar
  • Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
  • Fix memory_usage() for ListSeries (#11355) @thomcom
  • Fix constructing Column from column_view with expired mask (#11354) @shwina
  • Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
  • Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
  • Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
  • Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
  • Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
  • Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
  • Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
  • Fix issue related to numpy array and category dtype (#11282) @galipremsagar
  • Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
  • Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
  • Returns DataFrame When Concating Along Axis 1 (#11263) @isVoid
  • Fix compile error due to missing header (#11257) @ttnghia
  • Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
  • Fix tests/rolling/empty_input_test (#11238) @ttnghia
  • Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
  • Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
  • Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
  • Fix cumulative count index behavior (#11188) @brandon-b-miller
  • Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
  • Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
  • Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
  • Ensure cuco export set is installed in cmake build (#11147) @jlowe
  • Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
  • Fix compile error due to missing header (#11126) @ttnghia
  • Fix __cuda_array_interface__ failures (#11113) @galipremsagar
  • Support octal and hex within regex character class pattern (#11112) @davidwendt
  • Fix split_re matching logic for word boundaries (#11106) @davidwendt
  • Handle multiple files metadata in read_parquet (#11105) @galipremsagar
  • Fix index alignment for Series objects with repeated index (#11103) @shwina
  • FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
  • Fix regex word boundary logic to include underline (#11099) @davidwendt
  • Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
  • Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
  • Maintain the input index in the result of a groupby-transform (#11068) @shwina
  • Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
  • Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
  • Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
  • Fix warn_unused_result error in parquet test (#11026) @karthikeyann
  • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
  • Fix small error in page row count limiting (#10991) @etseidl
  • Fix a row index entry error in ORC writer issue (#10989) @vuule
  • Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice

📖 Documentation

  • Fix issues with day & night modes in python docs (#11400) @galipremsagar
  • Update missing data handling APIs in docs (#11345) @galipremsagar
  • Add lists filtering APIs to doxygen group. (#11336) @bdice
  • Remove unused import in README sample (#11318) @vyasr
  • Note null behavior in where docs (#11276) @brandon-b-miller
  • Update docstring for spans in get_row_data_range (#11271) @vyasr
  • Update nvCOMP integration table (#11231) @vuule
  • Add dev docs for documentation writing (#11217) @vyasr
  • Documentation fix for concatenate (#11187) @dagardner-nv
  • Fix unresolved links in markdown (#11173) @karthikeyann
  • Fix cudf version in README.md install commands (#11164) @jvanstraten
  • Switch language from None to "en" in docs build (#11133) @galipremsagar
  • Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
  • Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
  • Add docs to rolling var, std, count. (#11035) @bdice
  • Fix docs for Numba UDFs. (#11020) @bdice
  • Replace column comparison utilities functions with macros (#11007) @karthikeyann
  • Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
  • Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
  • Fix Doxygen warnings in table header files (#10964) @karthikeyann
  • Fix Doxygen warnings in column header files (#10963) @karthikeyann
  • Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
  • Generate Doxygen Tag File for Libcudf (#10932) @isVoid
  • Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
  • Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
  • Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
  • fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
  • fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
  • Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
  • Add missing documentation in aggregation.hpp (#10887) @karthikeyann
  • Revise PR template. (#10774) @bdice

🚀 New Features

  • Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
  • Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
  • Adding byte array view structure (#11322) @hyperbolic2346
  • Adding byte_array statistics (#11303) @hyperbolic2346
  • Add column indexes to Parquet writer (#11302) @etseidl
  • Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
  • FST benchmark (#11243) @karthikeyann
  • Adds the Finite-State Transducer algorithm (#11242) @elstehle
  • Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
  • Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
  • Add 24 bit dictionary support to Parquet writer (#11216) @devavret
  • Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
  • JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
  • Add JNI bindings for extractAllRecord (#11196) @anthony-chang
  • Add cudf.options (#11193) @isVoid
  • Add thrift support for parquet column and offset indexes (#11178) @etseidl
  • Adding binary read/write as options for parquet (#11160) @hyperbolic2346
  • Support nth_element for window functions (#11158) @mythrocks
  • Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
  • Implement Groupby pct_change (#11144) @skirui-source
  • Add JNI for set operations (#11143) @ttnghia
  • Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
  • Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
  • Feature/python benchmarking (#11125) @vyasr
  • Support nan_equality in cudf::distinct (#11118) @ttnghia
  • Added JNI for getMapValueForKeys (#11104) @razajafri
  • Refactor semi_anti_join (#11100) @ttnghia
  • Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
  • Adds the Logical Stack algorithm (#11078) @elstehle
  • Add doxygen-check pre-commit hook (#11076) @karthikeyann
  • Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
  • Add Doxygen CI check (#11057) @karthikeyann
  • Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
  • Support set operations (#11043) @ttnghia
  • Support for ZLIB compression in ORC writer (#11036) @vuule
  • Adding feature swaplevels (#11027) @VamsiTallam95
  • Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
  • Function for bfill, ffill #9591 (#11022) @Sreekiran096
  • Generate group offsets from element labels (#11017) @ttnghia
  • Feature axes (#10979) @VamsiTallam95
  • Generate group labels from offsets (#10945) @ttnghia
  • Add missing cuIO benchmark coverage for duration types (#10933) @vuule
  • Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
  • Reindex Improvements (#10815) @brandon-b-miller
  • Implement value_counts for DataFrame (#10813) @martinfalisse

🛠️ Improvements

  • Pin dask & distributed for release (#11433) @galipremsagar
  • Use documented header template for doxygen (#11430) @galipremsagar
  • Relax arrow version in dev env (#11418) @galipremsagar
  • Allow CuPy 11 (#11393) @jakirkham
  • Improve multibyte_split performance (#11347) @cwharris
  • Switch death test to use explicit trap. (#11326) @vyasr
  • Add --output-on-failure to ctest args. (#11321) @vyasr
  • Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
  • Add JNI support for the join_strings API (#11309) @revans2
  • Add cupy version to setup.py install_requires (#11306) @vyasr
  • removing some unused code (#11305) @hyperbolic2346
  • Add test of wildcard selection (#11300) @vyasr
  • Update parquet reader to take stream parameter (#11294) @PointKernel
  • Spark list hashing (#11292) @bdice
  • Remove legacy join APIs (#11274) @vyasr
  • Fix cudf recipes syntax (#11273) @ajschmidt8
  • Fix cudf recipe (#11267) @ajschmidt8
  • Cleanup config files (#11266) @vyasr
  • Run mypy on all packages (#11265) @vyasr
  • Update to isort 5.10.1. (#11262) @vyasr
  • Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
  • Remove redundant black config specifications. (#11258) @vyasr
  • Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
  • Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
  • Move rolling impl details to detail/ directory. (#11250) @mythrocks
  • Remove lists::drop_list_duplicates (#11236) @ttnghia
  • Use cudf::lists::distinct in Python binding (#11234) @ttnghia
  • Use cudf::lists::distinct in Java binding (#11233) @ttnghia
  • Use cudf::distinct in Java binding (#11232) @ttnghia
  • Pin dask-cuda in dev environment (#11229) @galipremsagar
  • Remove cruft in map_lookup (#11221) @mythrocks
  • Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
  • Remove Frame._index (#11210) @vyasr
  • Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
  • Document why Development component is needed for CMake. (#11200) @vyasr
  • cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
  • Standardize join internals around DataFrame (#11184) @vyasr
  • Move character case table declarations from src to detail (#11183) @davidwendt
  • Remove usage of Frame in StringMethods (#11181) @vyasr
  • Expose get_json_object_options to Python (#11180) @SrikarVanavasam
  • Fix decimal128 stats in parquet writer (#11179) @etseidl
  • Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
  • Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
  • Refactor and optimize Frame.where (#11168) @vyasr
  • Add npos const static member to cudf::string_view (#11166) @davidwendt
  • Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
  • Clean up _copy_type_metadata (#11156) @vyasr
  • Add nvcc conda package in dev environment (#11154) @galipremsagar
  • Struct binary comparison op functionality for spark rapids (#11153) @rwlee
  • Refactor inline conditionals. (#11151) @bdice
  • Refactor Spark hashing tests (#11145) @bdice
  • Add new _from_data_like_self factory (#11140) @vyasr
  • Update get_cucollections to use rapids-cmake (#11139) @vyasr
  • Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
  • Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
  • Remove Index.replace API (#11131) @vyasr
  • Move char-type table function declarations from src to detail (#11127) @davidwendt
  • Clean up repo root (#11124) @bdice
  • Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
  • Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
  • Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
  • Take iterators by value in clamp.cu. (#11084) @bdice
  • Performance improvements for row to column conversions (#11075) @hyperbolic2346
  • Remove deprecated Index methods from Frame (#11073) @vyasr
  • Use per-page max compressed size estimate for compression (#11066) @devavret
  • column to row refactor for performance (#11063) @hyperbolic2346
  • Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
  • Unpin dask & distributed for development (#11058) @galipremsagar
  • Add support for Series.between (#11051) @galipremsagar
  • Fix groupby include (#11046) @bwyogatama
  • Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
  • Remove public API of cudf.merge_sorted. (#11032) @bdice
  • Drop python 3.7 in code-base (#11029) @galipremsagar
  • Addition & integration of the integer power operator (#11025) @AtlantaPepsi
  • Refactor lists::contains (#11019) @ttnghia
  • Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
  • Clean up parquet unit test (#11005) @PointKernel
  • Add missing #pragma once to header files (#11004) @karthikeyann
  • Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
  • Refactor cudf::contains (#10997) @ttnghia
  • Remove Arrow CUDA IPC code (#10995) @shwina
  • Change file extension for groupby benchmark (#10985) @ttnghia
  • Sort recipe include checks. (#10984) @bdice
  • Update cuCollections for thrust upgrade (#10983) @PointKernel
  • Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
  • Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
  • Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
  • Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
  • Include <optional> for GCC 11 compatibility. (#10927) @bdice
  • Enable builds with scikit-build (#10919) @vyasr
  • Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
  • update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
  • Improve the capture of fatal cuda error (#10884) @sperlingxx
  • Cleanup regex compiler operators and operands source (#10879) @davidwendt
  • Buffer: make .ptr read-only (#10872) @madsbk
  • Configurable NaN handling in device_row_comparators (#10870) @rwlee
  • Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
  • Upgrade to arrow-8 (#10816) @galipremsagar
  • Remove getattr method in RangeIndex class (#10538) @skirui-source
  • Adding bins to value counts (#8247) @marlenezw

cudf - v22.06.01

Published by GPUtester over 2 years ago

v22.06.01

cudf - v22.06.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
  • Rename sliced_child to get_sliced_child. (#10885) @bdice
  • Add parameters to control page size in Parquet writer (#10882) @etseidl
  • Make cudf::test::expect_columns_equal() to fail when comparing unsanitary lists. (#10880) @nvdbaranec
  • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
  • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
  • Generic serialization of all column types (#10784) @wence-
  • Return per-file metadata from readers (#10782) @vuule
  • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
  • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
  • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
  • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
  • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
  • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
  • Add default= kwarg to .list.get() accessor method (#10547) @shwina
  • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
  • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
  • Fix findall_record to return empty list for no matches (#10491) @davidwendt
  • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
  • Additional refactoring of hash functions (#10462) @bdice
  • Fix default value of str.split expand parameter. (#10457) @bdice
  • Remove deprecated code. (#10450) @vyasr

🐛 Bug Fixes

  • Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar
  • Make SerializedTableHeader(numRows) public (#10949) @gerashegalov
  • Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar
  • Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar
  • Make cudf::test::expect_columns_equal() to fail when comparing unsanitary lists. (#10880) @nvdbaranec
  • Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt
  • Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca
  • Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel
  • Fix constness / references in weak ordering operator() signatures. (#10846) @bdice
  • Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard
  • Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca
  • Fix compile warning in search.cu (#10827) @davidwendt
  • Fix element access const correctness in hostdevice_vector (#10804) @vuule
  • Update cuco git tag (#10788) @PointKernel
  • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
  • Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346
  • Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar
  • Fix construction of nested structs with EMPTY child (#10761) @shwina
  • Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt
  • Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec
  • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
  • Fix cupy function in notebook (#10737) @ajschmidt8
  • Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar
  • Fix scatter for all-empty-string column case (#10724) @davidwendt
  • Retain series name in Series.apply (#10716) @brandon-b-miller
  • Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard
  • Fix list of testing requirements in setup.py. (#10678) @bdice
  • Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt
  • cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard
  • Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt
  • Verify compression type in Parquet reader (#10610) @vuule
  • Fix struct row comparator's exception on empty structs (#10604) @sperlingxx
  • Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt
  • Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333
  • Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice
  • Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe
  • pin more cmake versions (#10570) @robertmaynard
  • Re-enable Build Metrics Report (#10562) @davidwendt
  • Remove statically linked CUDA runtime check in Java build (#10532) @jlowe
  • Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller
  • Update pre-commit to run black 22.3.0 (#10523) @vyasr
  • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
  • Fix findall_record to return empty list for no matches (#10491) @davidwendt
  • Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule
  • Fix default value of str.split expand parameter. (#10457) @bdice
  • Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca
  • Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca
  • In-place updates with loc or iloc don't work correctly when the LHS has more than one column (#9918) @skirui-source

📖 Documentation

  • Clarify append deprecation notice. (#10930) @bdice
  • Use full name of GPUDirect Storage SDK in docs (#10904) @vuule
  • Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque
  • Add missing documentation in cudf/types.hpp (#10895) @karthikeyann
  • Add strong index iterator docs. (#10888) @bdice
  • spell check fixes (#10865) @karthikeyann
  • Add missing documentation in scalar/ headers (#10861) @karthikeyann
  • Remove typo in ngram documentation (#10859) @miguelusque
  • fix doxygen warnings (#10842) @karthikeyann
  • Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr
  • Add NumPy to intersphinx references. (#10809) @bdice
  • Add a section to the docs that compares cuDF with Pandas (#10796) @shwina
  • Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt
  • Enable pydocstyle for all packages. (#10759) @bdice
  • Enable pydocstyle rules involving quotes (#10748) @vyasr
  • Revise 10 minutes notebook. (#10738) @bdice
  • Reorganize cuDF Python docs (#10691) @shwina
  • Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller
  • Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty
  • add data generation to benchmark documentation (#10677) @karthikeyann
  • Fix some docs build warnings (#10674) @galipremsagar
  • Update UDF notebook in User Guide. (#10668) @bdice
  • Improve User Guide docs (#10663) @bdice
  • Fix some docstrings formatting (#10660) @galipremsagar
  • Remove implementation details from apply docstrings (#10651) @brandon-b-miller
  • Revise CONTRIBUTING.md (#10644) @bdice
  • Add missing APIs to documentation. (#10643) @bdice
  • Use cudf.read_json as documented API name. (#10640) @bdice
  • Fix docstring section headings. (#10639) @bdice
  • Document cudf.read_text and cudf.read_avro. (#10638) @bdice
  • Fix type-o in docstring for json_reader_options (#10627) @dagardner-nv
  • Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller
  • Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt
  • Add Replace Backreferences section to Regex Features page (#10560) @davidwendt
  • Introduce deprecation policy to developer guide. (#10252) @vyasr

🚀 New Features

  • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
  • Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec
  • Strong index types for equality comparator (#10883) @ttnghia
  • Add parameters to control page size in Parquet writer (#10882) @etseidl
  • Support for Zstandard decompression in ORC reader (#10873) @vuule
  • Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard
  • Support for Zstandard decompression in Parquet reader (#10847) @vuule
  • Add JNI support for apply_boolean_mask (#10812) @res-life
  • Segmented Min/Max for Fixed Point Types (#10794) @isVoid
  • Return per-file metadata from readers (#10782) @vuule
  • Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks
  • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
  • Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks
  • Add detail::hash_join (#10695) @PointKernel
  • Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346
  • Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina
  • JNI: Add generateListOffsets API (#10683) @sperlingxx
  • Support args in groupby apply (#10682) @brandon-b-miller
  • Enable segmented_gather in Java package (#10669) @sperlingxx
  • Add row hasher with nested column support (#10641) @devavret
  • Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse
  • First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346
  • Add support for struct columns to the random table generator (#10566) @vuule
  • Enable passing a sequence for the index argument to .list.get() (#10564) @shwina
  • Add python bindings for cudf::list::index_of (#10549) @ChrisJar
  • Add default= kwarg to .list.get() accessor method (#10547) @shwina
  • Add cudf.DataFrame.applymap (#10542) @brandon-b-miller
  • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
  • Add column field ID control in parquet writer (#10504) @PointKernel
  • Deprecate Series.applymap (#10497) @brandon-b-miller
  • Add option to drop cache in cuIO benchmarks (#10488) @vuule
  • move benchmark input generation in device in reduction nvbench (#10486) @karthikeyann
  • Support Segmented Min/Max Reduction on String Type (#10447) @isVoid
  • List element Equality comparator (#10289) @devavret
  • Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann
  • Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

🛠️ Improvements

  • Use conda compilers in env file (#10915) @galipremsagar
  • Remove C style artifacts in cuIO (#10886) @vuule
  • Rename sliced_child to get_sliced_child. (#10885) @bdice
  • Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333
  • Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia
  • Changing list_view.cuh to list_view.hpp (#10854) @ttnghia
  • More error checking in from_dlpack (#10850) @wence-
  • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
  • Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina
  • Add missing cuda-python dependency to cudf (#10833) @bdice
  • Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt
  • Split up search.cu to improve compile time (#10831) @davidwendt
  • Add tests for null scalar binaryops (#10828) @brandon-b-miller
  • Cleanup regex compile optimize functions (#10825) @davidwendt
  • Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar
  • Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller
  • Refactor regex builtin character-class identifiers (#10814) @davidwendt
  • Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt
  • Make the JNI API to get list offsets as a view public. (#10807) @revans2
  • Add cudf JNI docker build github action (#10806) @pxLi
  • Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi
  • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
  • Handle closed property in IntervalDtype.from_pandas (#10798) @wence-
  • Return weak orderings from device_row_comparator. (#10793) @rwlee
  • Rework Scalar imports (#10791) @brandon-b-miller
  • Enable ccache for cudfjni build in Docker (#10790) @gerashegalov
  • Generic serialization of all column types (#10784) @wence-
  • simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346
  • Use column_views instead of column_device_views in binary operations. (#10780) @bdice
  • Add struct utility functions. (#10776) @bdice
  • Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt
  • Refactor host decompression in ORC reader (#10764) @vuule
  • Flush output streams before creating a process to drop caches (#10762) @vuule
  • Refactor binaryop/compiled/util.cpp (#10756) @bdice
  • Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt
  • Use generator expressions in any/all functions. (#10736) @bdice
  • Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice
  • Improve use of isinstance. (#10734) @bdice
  • Rename tests from multiIndex to multiindex. (#10732) @bdice
  • Two-table comparators with strong index types (#10730) @bdice
  • Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann
  • Use structured bindings instead of std::tie (#10726) @karthikeyann
  • Missing f prefix on f-strings fix (#10721) @code-review-doctor
  • Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar
  • Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid
  • Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora
  • Remove or simplify various utility functions (#10705) @vyasr
  • Allow building arrow with parquet and not python (#10702) @revans2
  • Partial cuIO GPU decompression refactor (#10699) @vuule
  • Cython API refactor: merge.pyx (#10698) @isVoid
  • Fix random string data length to become variable (#10697) @galipremsagar
  • Add bindings for index_of with column search key (#10696) @ChrisJar
  • Deprecate index merging (#10689) @vyasr
  • Remove cudf::strings::string namespace (#10684) @davidwendt
  • Standardize imports. (#10680) @bdice
  • Standardize usage of collections.abc. (#10679) @bdice
  • Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid
  • Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt
  • Split up mixed-join kernels source files (#10671) @davidwendt
  • Use std::filesystem for temporary directory location and deletion (#10664) @vuule
  • cleanup benchmark includes (#10661) @karthikeyann
  • Use upstream clang-format pre-commit hook. (#10659) @bdice
  • Clean up C++ includes to use <> instead of "". (#10658) @bdice
  • Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina
  • Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe
  • Use conda to build python packages during GPU tests (#10648) @Ethyling
  • Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr
  • Update pinning to allow newer CMake versions. (#10646) @vyasr
  • Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]
  • Remove concurrent_unordered_multimap. (#10642) @bdice
  • Improve parquet dictionary encoding (#10635) @PointKernel
  • Improve cudf::cuda_error (#10630) @sperlingxx
  • Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711
  • Branch 22.06 merge 22.04 (#10624) @vyasr
  • Unpin dask & distributed for development (#10623) @galipremsagar
  • Slightly improve accuracy of stod in to_floats (#10622) @davidwendt
  • Allow libcudfjni to be built as a static library (#10619) @jlowe
  • Change stack-based regex state data to use global memory (#10600) @davidwendt
  • Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar
  • KvikIO as an alternative GDS backend (#10593) @madsbk
  • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
  • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
  • Refactor binary ops for timedelta and datetime columns (#10581) @vyasr
  • Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt
  • Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk
  • Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov
  • Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt
  • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
  • Cleanup libcudf strings regex classes (#10573) @davidwendt
  • Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr
  • Reduce kernel calls to build strings findall results (#10559) @davidwendt
  • Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice
  • Update strings contains benchmark to measure varying match rates (#10555) @davidwendt
  • JNI: throw CUDA errors more specifically (#10551) @sperlingxx
  • Enable building static libs (#10545) @trxcllnt
  • Remove pip requirements files. (#10543) @bdice
  • Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr
  • Refactor memory_usage to improve performance (#10537) @galipremsagar
  • Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx
  • add accidentally removed comment. (#10526) @vyasr
  • Update conda environment. (#10525) @vyasr
  • Remove ColumnBase.getitem (#10516) @vyasr
  • Optimize left_semi_join by materializing the gather mask (#10511) @cheinger
  • Define proper binary operation APIs for columns (#10509) @vyasr
  • Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar
  • Update to Thrust 1.16 (#10489) @bdice
  • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
  • Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi
  • Use Lists of Columns for Various Files (#10463) @isVoid
  • Additional refactoring of hash functions (#10462) @bdice
  • Fix Series.str.findall behavior for expand=False. (#10459) @bdice
  • Remove deprecated code. (#10450) @vyasr
  • Update cmake-format version. (#10440) @vyasr
  • Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8
  • Use conda compilers (#10275) @Ethyling
  • Add row bitmask as a detail::hash_join member (#10248) @PointKernel

cudf - v22.04.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
  • Refactor stream compaction APIs (#10370) @PointKernel
  • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
  • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
  • Rewrites sample API (#10262) @isVoid
  • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
  • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
  • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
  • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
  • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
  • Remove deprecated code (#10124) @vyasr
  • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
  • Optimize compaction operations (#10030) @PointKernel
  • Remove deprecated method Series.set_index. (#9945) @bdice
  • Add cudf::strings::findall_record API (#9911) @davidwendt
  • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

  • Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
  • Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
  • Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
  • Fix for integer overflow in contiguous-split (#10437) @jbrennan333
  • Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
  • Fix empty reduce with List output and non-List input (#10435) @sperlingxx
  • Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
  • Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
  • Support cupy array in quantile input (#10429) @galipremsagar
  • Fix benchmarks to work with new aggregation types (#10428) @davidwendt
  • Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
  • Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
  • Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
  • Limiting async allocator using alignment of 512 (#10395) @rongou
  • Include <optional> in multibyte split. (#10385) @bdice
  • Fix issue with column and scalar re-assignment (#10377) @galipremsagar
  • Fix floating point data generation in benchmarks (#10372) @vuule
  • Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
  • Remove is_relationally_comparable for table device views (#10342) @davidwendt
  • Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
  • Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
  • Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
  • Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
  • Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
  • Fix documentation issues (#10307) @ajschmidt8
  • Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
  • Fix incorrect slicing of GDS read/write calls (#10274) @vuule
  • Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
  • Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
  • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
  • Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
  • Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
  • Fix small leak in explode (#10245) @revans2
  • Yet another small JNI memory leak (#10238) @revans2
  • Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
  • Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
  • Fix JNI leak on copy to device (#10229) @revans2
  • Fix the data generator element size for decimal types (#10225) @vuule
  • Fix decimal metadata in parquet writer (#10224) @galipremsagar
  • Fix strings handling of hex in regex pattern (#10220) @davidwendt
  • Fix docs builds (#10216) @ajschmidt8
  • Fix a leftover _has_nulls change from Nullate (#10211) @devavret
  • Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
  • Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
  • Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
  • Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
  • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
  • Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
  • Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
  • benchmark fixture - static object pointer fix (#10145) @karthikeyann
  • Fix UDF Caching (#10133) @brandon-b-miller
  • Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
  • Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
  • Encode values from python callback for C++ (#10103) @jdye64
  • Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
  • Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
  • Column equality testing fixes (#10011) @brandon-b-miller
  • Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

  • Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
  • Add cut to API docs (#10479) @shwina
  • Remove documentation for methods removed in #10124. (#10366) @bdice
  • Fix documentation issues (#10306) @ajschmidt8
  • Fix fixed_point binary operation documentation (#10198) @codereport
  • Remove cleaned up methods from docs (#10189) @galipremsagar
  • Update developer guide to recommend no default stream parameter. (#10136) @bdice
  • Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

  • Add StringIO support to read_text (#10465) @cwharris
  • Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
  • JNI support for Collect Ops in Reduction (#10427) @sperlingxx
  • Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
  • Add cudf::stable_sort_by_key (#10387) @PointKernel
  • Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
  • Support Java bindings for Avro reader (#10373) @HaoYang670
  • Refactor stream compaction APIs (#10370) @PointKernel
  • Support collect aggregations in reduction (#10353) @sperlingxx
  • Refactor array_ufunc for Index and unify across all classes (#10346) @vyasr
  • Add JNI for extract_list_element with index column (#10341) @firestarman
  • Support min and max operations for structs in rolling window (#10332) @ttnghia
  • Add device create_sequence_table for benchmarks (#10300) @karthikeyann
  • Enable numpy ufuncs for DataFrame (#10287) @vyasr
  • move input generation for json benchmark to device (#10281) @karthikeyann
  • move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
  • move input generation for copy benchmark to device (#10279) @karthikeyann
  • generate url decode benchmark input in device (#10278) @karthikeyann
  • device input generation in join bench (#10277) @karthikeyann
  • Add nvtext::byte_pair_encoding API (#10270) @davidwendt
  • Prevent internal usage of expensive APIs (#10263) @vyasr
  • Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
  • Support percent_rank() aggregation (#10227) @mythrocks
  • Refactor Series.array_ufunc (#10217) @vyasr
  • Reduce pytest runtime (#10203) @brandon-b-miller
  • Add regex flags parameter to python cudf strings split (#10185) @davidwendt
  • Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
  • Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
  • Add file size counter to cuIO benchmarks (#10154) @vuule
  • byte_range support for multibyte_split/read_text (#10150) @cwharris
  • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
  • Add maxSplit parameter to Java binding for strings:split (#10137) @ttnghia
  • Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
  • generate benchmark input in device (#10109) @karthikeyann
  • Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
  • Add Dataframe and Index nunique (#10077) @martinfalisse
  • Support nanosecond timestamps in parquet (#10063) @PointKernel
  • Java bindings for mixed semi and anti joins (#10040) @jlowe
  • Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
  • Optimize compaction operations (#10030) @PointKernel
  • Support args= in Series.apply (#9982) @brandon-b-miller
  • Add cudf::strings::findall_record API (#9911) @davidwendt
  • Add covariance for sort groupby (python) (#9889) @mayankanand007
  • Implement DataFrame diff() (#9817) @skirui-source
  • Implement DataFrame pct_change (#9805) @skirui-source
  • Support segmented reductions and null mask reductions (#9621) @isVoid
  • Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

🛠️ Improvements

  • Add scipy skip for a test (#10502) @galipremsagar
  • Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
  • Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
  • Pin dask and distributed (#10481) @galipremsagar
  • MD5 refactoring. (#10445) @bdice
  • Remove or split up Frame methods that use the index (#10439) @vyasr
  • Centralization of tdigest aggregation code. (#10422) @nvdbaranec
  • Simplify column binary operations (#10421) @vyasr
  • Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
  • Use list of columns for methods in Groupby.pyx (#10419) @isVoid
  • Remove warnings in test_timedelta.py (#10418) @galipremsagar
  • Fix some warnings in test_parquet.py (#10416) @galipremsagar
  • JNI support for segmented reduce (#10413) @revans2
  • Clean up null mask after purging null entries (#10412) @sperlingxx
  • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
  • Use str instead of builtins.str. (#10410) @bdice
  • Fix warnings in test_rolling (#10405) @bdice
  • Enable codecov github-check in CI (#10404) @galipremsagar
  • Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice
  • Set column names in _from_columns_like_self factory (#10400) @isVoid
  • Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar
  • Consolidate .cov and .corr for sort groupby (#10386) @skirui-source
  • Consolidate some Frame APIs (#10381) @vyasr
  • Refactor hash functions and hash_combine (#10379) @bdice
  • Add nvtx annotations for Series and Index (#10374) @galipremsagar
  • Refactor filling.repeat API (#10371) @isVoid
  • Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt
  • Remove doc for deprecated function one_hot_encoding (#10367) @isVoid
  • Refactor array function (#10364) @vyasr
  • Fix warnings in test_csv.py. (#10362) @bdice
  • Implement a mixin for binops (#10360) @vyasr
  • Refactor cython interface: copying.pyx (#10359) @isVoid
  • Implement a mixin for scans (#10358) @vyasr
  • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
  • Add cleanup of python artifacts (#10355) @galipremsagar
  • Fix warnings in test_categorical.py. (#10354) @bdice
  • Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt
  • Fix codecov in CI (#10347) @galipremsagar
  • Enable caching for memory_usage calculation in Column (#10345) @galipremsagar
  • C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann
  • JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx
  • multibyte_split test improvements (#10328) @vuule
  • Fix warnings in test_binops.py. (#10327) @bdice
  • Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice
  • Update upload script (#10321) @ajschmidt8
  • Move hash type declarations to hashing.hpp (#10320) @davidwendt
  • C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann
  • Remove internal columns usage (#10315) @vyasr
  • Remove extraneous build.sh parameter (#10313) @ajschmidt8
  • Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt
  • Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8
  • Add conversions between column_view and device_span<T const>. (#10302) @bdice
  • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
  • Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar
  • Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard
  • Add copyright check as pre-commit hook. (#10290) @vyasr
  • DataFrame insert and creation optimizations (#10285) @galipremsagar
  • Improve hash join detail functions (#10273) @PointKernel
  • Replace custom cached_property implementation with functools (#10272) @shwina
  • Rewrites sample API (#10262) @isVoid
  • Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]
  • Remove making redundant copy across code-base (#10257) @galipremsagar
  • Add more nvtx annotations (#10256) @galipremsagar
  • Add copyright check in cudf (#10253) @galipremsagar
  • Remove redundant copies in fillna to improve performance (#10241) @galipremsagar
  • Remove std::numeric_limit specializations for timestamp & durations (#10239) @codereport
  • Optimize DataFrame creation across code-base (#10236) @galipremsagar
  • Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar
  • Add environment variables for I/O thread pool and slice sizes (#10218) @vuule
  • Add regex flags to strings findall functions (#10208) @davidwendt
  • Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca
  • Remove unnecessary nunique function in Series. (#10205) @martinfalisse
  • Refactor DataFrame tests. (#10204) @bdice
  • Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid
  • Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe
  • Fix docstrings alignment in Frame methods (#10199) @galipremsagar
  • Fix cuco pair issue in hash join (#10195) @PointKernel
  • Replace dask groupby .index usages with .by (#10193) @galipremsagar
  • Add regex flags to strings extract function (#10192) @davidwendt
  • Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice
  • Add CMake install rule for tests (#10190) @ajschmidt8
  • Unpin dask & distributed (#10182) @galipremsagar
  • Add comments to explain test validation (#10176) @galipremsagar
  • Reduce warnings in pytest output (#10168) @bdice
  • Some consolidation of indexed frame methods (#10167) @vyasr
  • Refactor isin implementations (#10165) @vyasr
  • Faster struct row comparator (#10164) @devavret
  • Refactor groupby::get_groups. (#10161) @bdice
  • Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule
  • Replace ccache with sccache (#10146) @ajschmidt8
  • Murmur3 hash kernel cleanup (#10143) @rwlee
  • Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar
  • Run pyupgrade 2.31.0. (#10141) @bdice
  • Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice
  • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
  • Update cmake-format script for branch 22.04. (#10132) @bdice
  • Accept r-value references in convert_table_for_return(): (#10131) @mythrocks
  • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
  • Remove deprecated code (#10124) @vyasr
  • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
  • Remove benchmarks suffix (#10112) @bdice
  • Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi
  • Remove unnecessary docker files. (#10069) @vyasr
  • Limit benchmark iterations using environment variable (#10060) @karthikeyann
  • Add timing chart for libcudf build metrics report page (#10038) @davidwendt
  • JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx
  • Reduce redundant code in CUDF JNI (#10019) @mythrocks
  • Make snappy decompress check more efficient (#9995) @cheinger
  • Remove deprecated method Series.set_index. (#9945) @bdice
  • Implement a mixin for reductions (#9925) @vyasr
  • JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx
  • Add assert_column_memory_* (#9882) @isVoid
  • Add CUDF_UNREACHABLE macro. (#9727) @bdice
  • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

cudf - v22.02.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • decimal128 Support for to/from_arrow (#9986) @codereport
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Remove deprecated method parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
  • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
  • Break tie for top categorical columns in Series.describe (#9867) @isVoid
  • Add partitioning support in parquet writer (#9810) @devavret
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
  • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Add decimal128 support to Parquet reader and writer (#9765) @vuule
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
  • Add parameters to control row group size in Parquet writer (#9677) @vuule
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Add support for decimal128 in cudf python (#9533) @galipremsagar
  • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
  • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

  • Add check for negative stripe index in ORC reader (#10074) @vuule
  • Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
  • Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
  • fix gcc 11 compilation errors (#10067) @rongou
  • Fix columns ordering issue in parquet reader (#10066) @galipremsagar
  • Fix dataframe setitem with ndarray types (#10056) @galipremsagar
  • Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
  • Include <optional> in headers that use std::optional (#10044) @robertmaynard
  • Fix repr and concat of StructColumn (#10042) @galipremsagar
  • Include row group level stats when writing ORC files (#10041) @vuule
  • build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
  • Fix memory leaks in JNI native code. (#10029) @mythrocks
  • Update JNI to use new arena mr constructor (#10027) @rongou
  • Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
  • Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
  • cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
  • Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
  • Add missing list filling header in meta.yaml (#10007) @devavret
  • Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
  • Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
  • Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
  • Fix octal pattern matching in regex string (#9993) @davidwendt
  • decimal128 Support for to/from_arrow (#9986) @codereport
  • Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
  • Fix the overflow problem of decimal rescale (#9966) @sperlingxx
  • Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
  • Fix cudf java build error. (#9958) @firestarman
  • Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
  • Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
  • Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
  • Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
  • Resolve racecheck errors in ORC kernels (#9916) @vuule
  • Fix the java build after parquet partitioning support (#9908) @revans2
  • Fix compilation of benchmark for parquet writer. (#9905) @bdice
  • Fix a memcheck error in ORC writer (#9896) @vuule
  • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
  • Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
  • Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
  • TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
  • Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
  • Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
  • Break tie for top categorical columns in Series.describe (#9867) @isVoid
  • Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
  • Add one-level list encoding support in parquet reader (#9848) @PointKernel
  • Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
  • Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
  • Fix caching in Series.applymap (#9821) @brandon-b-miller
  • Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
  • Fix ORC writer crash with empty input columns (#9808) @vuule
  • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
  • Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
  • Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
  • Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
  • Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
  • Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
  • Fix missing streams (#9767) @karthikeyann
  • Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
  • Update cmake and conda to 22.02 (#9746) @devavret
  • Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
  • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
  • Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
  • Fixed build by adding more checks for int8, int16 (#9707) @razajafri
  • Fix null handling when boolean dtype is passed (#9691) @galipremsagar
  • Fix stream usage in segmented_gather() (#9679) @mythrocks

📖 Documentation

  • Update decimal dtypes related docs entries (#10072) @galipremsagar
  • Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
  • Fix cudf compilation instructions. (#9956) @esoha-nvidia
  • Fix see also links for IO APIs (#9895) @galipremsagar
  • Fix build instructions for libcudf doxygen (#9837) @davidwendt
  • Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
  • update cuda version in local build (#9736) @karthikeyann
  • Fix doxygen for enum types in libcudf (#9724) @davidwendt
  • Spell check fixes (#9682) @karthikeyann
  • Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

  • Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
  • Allow CuPy 10 (#10048) @jakirkham
  • Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
  • Add groupby.transform (only support for aggregations) (#10005) @shwina
  • Add partitioning support to Parquet chunked writer (#10000) @devavret
  • Add jni for sequences (#9972) @wbo4958
  • Java bindings for mixed left, inner, and full joins (#9941) @jlowe
  • Java bindings for JSON reader support (#9940) @wbo4958
  • Enable transpose for string columns in cudf python (#9937) @galipremsagar
  • Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
  • Implement mixed equality/conditional joins (#9917) @vyasr
  • Add cudf::strings::extract_all API (#9909) @davidwendt
  • Implement JNI for cudf::scatter APIs (#9903) @ttnghia
  • JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
  • Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
  • add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
  • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
  • Add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
  • Add JNI for cudf::drop_duplicates (#9841) @ttnghia
  • Implement per-list sequence (#9839) @ttnghia
  • adding series.transpose (#9835) @mayankanand007
  • Adding support for Series.autocorr (#9833) @mayankanand007
  • Support round operation on datetime64 datatypes (#9820) @mayankanand007
  • Add partitioning support in parquet writer (#9810) @devavret
  • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
  • Add decimal128 support to Parquet reader and writer (#9765) @vuule
  • Optimize groupby::scan (#9754) @PointKernel
  • Add sample JNI API (#9728) @res-life
  • Support min and max in inclusive scan for structs (#9725) @ttnghia
  • Add first and last method to IndexedFrame (#9710) @isVoid
  • Support min and max reduction for structs (#9697) @ttnghia
  • Add parameters to control row group size in Parquet writer (#9677) @vuule
  • Run compute-sanitizer in nightly build (#9641) @karthikeyann
  • Implement Series.datetime.floor (#9571) @skirui-source
  • ceil/floor for DatetimeIndex (#9554) @mayankanand007
  • Add support for decimal128 in cudf python (#9533) @galipremsagar
  • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
  • custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64
  • Add Pearson correlation for sort groupby (python) (#9166) @skirui-source
  • Interchange dataframe protocol (#9071) @iskode
  • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🛠️ Improvements

  • Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
  • Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
  • Unpin dask and distributed in CI (#10028) @galipremsagar
  • Add _from_column_like_self factory (#10022) @isVoid
  • Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
  • Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
  • Clean up CUDA stream use in cuIO (#9991) @vuule
  • Use addressed-ordered first fit for the pinned memory pool (#9989) @rongou
  • Add strings tests to transpose_test.cpp (#9985) @davidwendt
  • Use gpuci_mamba_retry on Java CI. (#9983) @bdice
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Minor cleanup of unused Python functions (#9974) @vyasr
  • Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
  • Remove deprecated method parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • use ninja in java ci build (#9933) @rongou
  • Add build-time publish step to cpu build script (#9927) @davidwendt
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Remove various unused functions (#9922) @vyasr
  • Raise in query if dtype is not supported (#9921) @brandon-b-miller
  • Add missing imports tests (#9920) @Ethyling
  • Spark Decimal128 hashing (#9919) @rwlee
  • Replace thrust/std::get with structured bindings (#9915) @codereport
  • Upgrade thrust version to 1.15 (#9912) @robertmaynard
  • Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
  • Return count of set bits from inplace_bitmask_and. (#9904) @bdice
  • Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
  • Update ucx-py version on release using rvc (#9897) @Ethyling
  • Remove IncludeCategories from .clang-format (#9876) @codereport
  • Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
  • Add clang-tidy to libcudf (#9860) @codereport
  • Remove deprecated methods from Java Table class (#9853) @jlowe
  • Add test for map column metadata handling in ORC writer (#9852) @vuule
  • Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
  • add templated benchmark with fixture (#9838) @karthikeyann
  • Use list of column inputs for apply_boolean_mask (#9832) @isVoid
  • Added a few more tests for Decimal to String cast (#9818) @razajafri
  • Run doctests. (#9815) @bdice
  • Avoid overflow for fixed_point round (#9809) @sperlingxx
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Use vector factories for host-device copies. (#9806) @bdice
  • Refactor host device macros (#9797) @vyasr
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
  • Improve build time of libcudf iterator tests (#9788) @davidwendt
  • Copy Java native dependencies directly into classpath (#9787) @jlowe
  • Add decimal types to cuIO benchmarks (#9776) @vuule
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
  • Use CTAD with Thrust function objects (#9768) @codereport
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use Java classloader to find test resources (#9760) @jlowe
  • Allow cast decimal128 to string and add tests (#9756) @razajafri
  • Load balance optimization for contiguous_split (#9755) @nvdbaranec
  • Consolidate and improve reset_index (#9750) @isVoid
  • Update to UCX-Py 0.24 (#9748) @pentschev
  • Skip cufile tests in JNI build script (#9744) @pxLi
  • Enable string to decimal 128 cast (#9742) @razajafri
  • Use stop instead of stop_. (#9735) @bdice
  • Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
  • Improve cmake format script (#9723) @vyasr
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
  • Use stream allocator adaptor for hash join table (#9704) @PointKernel
  • Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
  • Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
  • Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
  • Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
  • Add utility to format ninja-log build times (#9631) @davidwendt
  • Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
  • Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
  • Simplify merge internals and reduce overhead (#9516) @vyasr
  • Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
  • Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris

cudf - v21.12.02

Published by GPUtester almost 3 years ago

v21.12.02

cudf - v21.12.01

Published by GPUtester almost 3 years ago

v21.12.01

cudf - v21.12.00

Published by GPUtester almost 3 years ago

🚨 Breaking Changes

  • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
  • Remove sizeof and standardize on memory_usage (#9544) @vyasr
  • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
  • Refactor sorting APIs (#9464) @vyasr
  • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
  • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
  • JNI: Support nested types in ORC writer (#9334) @firestarman
  • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
  • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
  • Various internal MultiIndex improvements (#9243) @vyasr
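
The first entry above (#9616) changes what bitmask_and and bitmask_or return: per the PR title, callers now get both the resulting mask and the count of unset bits. The pure-Python sketch below only illustrates that return shape. It is not libcudf's C++ API; the list-of-bools mask representation, the signature, and the variable names are assumptions made for illustration.

```python
from typing import List, Tuple


def bitmask_and(masks: List[List[bool]]) -> Tuple[List[bool], int]:
    """AND validity masks element-wise and return (mask, unset-bit count).

    Hypothetical stand-in: a mask is modeled as a list of bools, where
    True means the row is valid and False means it is null.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same length")

    # A row stays valid only if it is valid in every input mask.
    result = [all(bits) for bits in zip(*masks)]
    # Returning the count with the mask saves callers a second pass.
    unset_count = result.count(False)
    return result, unset_count


mask, null_count = bitmask_and([
    [True, True, False, True],
    [True, False, True, True],
])
print(mask, null_count)
```

Code that previously expected a single mask back would need to unpack the pair, which is presumably why the change is listed as breaking.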

🐛 Bug Fixes

  • Fix read_parquet bug for bytes input (#9669) @rjzamora
  • Use _gather internal for sort_* (#9668) @isVoid
  • Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
  • Don't recompute output size if it is already available (#9649) @abellina
  • Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
  • add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
  • Fix debrotli issue on CUDA 11.5 (#9632) @vuule
  • Use std::size_t when computing join output size (#9626) @jlowe
  • Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
  • Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
  • Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
  • Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
  • Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
  • Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
  • Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
  • Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
  • compile libnvcomp with PTDS if requested (#9540) @jbrennan333
  • Fix segmented_gather() for null LIST rows (#9537) @mythrocks
  • Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
  • Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
  • Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
  • Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
  • Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
  • Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
  • Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
  • Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
  • Fix regex handling of embedded null characters (#9470) @davidwendt
  • Fix memcheck error in copy-if-else (#9467) @davidwendt
  • Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
  • Preserve the decimal scale when creating a default scalar (#9449) @revans2
  • Push down parent nulls when flattening nested columns. (#9443) @mythrocks
  • Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
  • Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
  • Allow int-like objects for the decimals argument in round (#9428) @shwina
  • Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
  • Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
  • Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
  • Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
  • Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
  • Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
  • Fix the crash in stats code (#9368) @devavret
  • Make Series.hash_encode results reproducible. (#9366) @bdice
  • Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
  • Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
  • Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
  • Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
  • Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
  • Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
  • Use f-string in join helper warning message. (#9325) @bdice
  • Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
  • Fix null count in statistics for parquet (#9303) @devavret
  • Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
  • Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
  • Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
  • Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
  • BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

  • Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
  • Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
  • Update docstring of DataFrame.merge (#9572) @galipremsagar
  • Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
  • Add example to docstrings in rolling.apply (#9522) @isVoid
  • Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
  • Improve Python docstring formatting. (#9493) @bdice
  • Update table of I/O supported types (#9476) @vuule
  • Document invalid regex patterns as undefined behavior (#9473) @davidwendt
  • Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
  • Fix many documentation errors in libcudf. (#9355) @karthikeyann
  • Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
  • Improved deprecation warnings. (#9347) @bdice
  • doc reorder mr, stream to stream, mr (#9308) @karthikeyann
  • Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
  • Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

  • Enable Series.divide and DataFrame.divide (#9630) @vyasr
  • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
  • Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
  • Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
  • Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
  • Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
  • Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
  • Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
  • Support args= in apply (#9514) @brandon-b-miller
  • Add groupby scan min/max support for strings values (#9502) @davidwendt
  • Add list output option to character_ngrams() function (#9499) @davidwendt
  • More granular column selection in ORC reader (#9496) @vuule
  • add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
  • Implement Series.datetime.floor (#9488) @skirui-source
  • Enable linting of CMake files using pre-commit (#9484) @vyasr
  • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
  • Augment order_by to Accept a List of null_precedence (#9455) @isVoid
  • Add format API for list column of strings (#9454) @davidwendt
  • Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
  • Add cudf python groupby.diff (#9446) @karthikeyann
  • Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
  • add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
  • Support Unary Operations in Masked UDF (#9409) @isVoid
  • Move Several Series Function to Frame (#9394) @isVoid
  • MD5 Python hash API (#9390) @bdice
  • Add cudf strings is_title API (#9380) @davidwendt
  • Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
  • Add support for writing ORC with map columns (#9369) @vuule
  • extract_list_elements() with column_view indices (#9367) @mythrocks
  • Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
  • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
  • JNI: Support nested types in ORC writer (#9334) @firestarman
  • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
  • Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
  • Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
  • Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
  • Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
  • Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
  • New array conversion methods (#9236) @vyasr
  • Series apply method backed by masked UDFs (#9217) @brandon-b-miller
  • Grouping by frequency and resampling (#9178) @shwina
  • Pure-python masked UDFs (#9174) @brandon-b-miller
  • Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
  • Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

  • Followup to PR 9088 comments (#9659) @cwharris
  • Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
  • Add 11.5 dev.yml to cudf (#9617) @galipremsagar
  • Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
  • remove deprecated Rmm.initialize method (#9607) @rongou
  • Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx
  • Set RMM pool to a fixed size in JNI (#9583) @rongou
  • Use nvCOMP for Snappy compression/decompression (#9582) @vuule
  • Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling
  • Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia
  • Enable CMake format in CI and fix style (#9570) @vyasr
  • Add NVTX Start/End Ranges to JNI (#9563) @abellina
  • Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64
  • Add offsets_begin/end() to strings_column_view (#9559) @davidwendt
  • remove alignment options for RMM jni (#9550) @rongou
  • Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd
  • Remove sizeof and standardize on memory_usage (#9544) @vyasr
  • Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina
  • Generalize comparison binary operations (#9542) @vyasr
  • Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe
  • Add scan sum support for duration types to libcudf (#9536) @davidwendt
  • Force inlining to improve AST performance (#9530) @vyasr
  • Generalize some more indexed frame methods (#9529) @vyasr
  • Add Java bindings for rolling window stddev aggregation (#9527) @razajafri
  • catch rmm::out_of_memory exceptions in jni (#9525) @rongou
  • Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia
  • Accelerate conditional inner joins with larger right tables (#9523) @vyasr
  • Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar
  • Cleanup for flattening nested columns (#9509) @rwlee
  • Enable running tests using RMM arena and async memory resources (#9506) @rongou
  • Remove dependency on six. (#9495) @bdice
  • Cleanup some libcudf strings gtests (#9489) @davidwendt
  • Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt
  • Refactor sorting APIs (#9464) @vyasr
  • Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice
  • Deprecate Series.hash_encode. (#9457) @bdice
  • Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8
  • Small clean up to simplify column selection code in ORC reader (#9444) @vuule
  • add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann
  • Adds Deprecation Warnings to one_hot_encoding and Implement get_dummies with Cython API (#9435) @isVoid
  • Update pre-commit hook URLs. (#9433) @bdice
  • Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca
  • Miscellaneous improvements for UDFs (#9422) @isVoid
  • Use pre-commit for CI (#9412) @vyasr
  • Update to UCX-Py 0.23 (#9407) @pentschev
  • Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina
  • Improvements to tdigest aggregation code. (#9403) @nvdbaranec
  • Add Java API to deserialize a table to host columns (#9402) @jlowe
  • Frame copy to use class instead of type() (#9397) @madsbk
  • Change all DeprecationWarnings to FutureWarning. (#9392) @bdice
  • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
  • Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr
  • Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora
  • Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora
  • Add multi-threaded writing to GDS writes (#9372) @devavret
  • Miscellaneous column cleanup (#9370) @vyasr
  • Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt
  • Consolidate binary ops into Frame (#9357) @isVoid
  • Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt
  • Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice
  • Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid
  • Fix Cython compilation warnings. (#9327) @bdice
  • Fix some unused variable warnings in libcudf (#9326) @davidwendt
  • Use optional-iterator for copy-if-else kernel (#9324) @davidwendt
  • Remove Table class (#9315) @vyasr
  • Unpin dask and distributed in CI (#9307) @galipremsagar
  • Add optional-iterator support to indexalator (#9306) @davidwendt
  • Consolidate more methods in Frame (#9305) @vyasr
  • Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora
  • Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice
  • Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt
  • Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar
  • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
  • Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt
  • Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi
  • Various internal MultiIndex improvements (#9243) @vyasr
  • Add detail interface for split and slice(table_view), refactors both function with host_span (#9226) @isVoid
  • Refactor MD5 implementation. (#9212) @bdice
  • Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann
  • Use nvcomp's snappy decompressor in avro reader (#9181) @devavret
  • Add isocalendar API support (#9169) @marlenezw
  • Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris
  • Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris
  • Refactor hash join with cuCollections multimap (#8934) @PointKernel

cudf - v21.10.01

Published by GPUtester about 3 years ago

v21.10.01

cudf - v21.10.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

  • Remove Cython APIs for table view generation (#9199) @vyasr
  • Upgrade pandas version in cudf (#9147) @galipremsagar
  • Make AST operators nullable (#9096) @vyasr
  • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
  • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
  • Support additional format specifiers in from_timestamps (#9047) @davidwendt
  • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
  • Add support for struct type in ORC writer (#9025) @vuule
  • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
  • Java bindings for conditional join output sizes (#9002) @jlowe
  • Move compute_column API out of ast namespace (#8957) @vyasr
  • cudf.dtype function (#8949) @shwina
  • Refactor Frame reductions (#8944) @vyasr
  • Add nested column selection to parquet reader (#8933) @devavret
  • JNI Aggregation Type Changes (#8919) @revans2
  • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
  • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
  • Change cudf docs theme to pydata theme (#8746) @galipremsagar
  • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
  • Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

  • fixed_point cudf::groupby for mean aggregation (#9296) @codereport
  • Fix interleave_columns when the input string lists column has an empty child column (#9292) @ttnghia
  • Update nvcomp to include fixes for installation of headers (#9276) @devavret
  • Fix Java column leak in testParquetWriteMap (#9271) @jlowe
  • Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
  • Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
  • Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
  • Dataframe.sort_index optimizations (#9238) @galipremsagar
  • Temporarily disabling problematic test in parquet writer (#9230) @devavret
  • Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
  • Fix gather for sliced input structs column (#9218) @ttnghia
  • Fix JNI code for left semi and anti joins (#9207) @jlowe
  • Only install thrust when using a non 'system' version (#9206) @robertmaynard
  • Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
  • Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
  • Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
  • get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
  • rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
  • Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
  • Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
  • Approximate overflow detection in ORC statistics (#9163) @vuule
  • Use decimal precision metadata when reading from parquet files (#9162) @shwina
  • Fix variable name in Java build script (#9161) @jlowe
  • Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
  • Fix conditional joins with empty left table (#9146) @vyasr
  • Fix joining on indexes with duplicate level names (#9137) @shwina
  • Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
  • Apply type metadata after column is slice-copied (#9131) @isVoid
  • Fix a bug: inner_join_size returns zero if build table is empty (#9128) @PointKernel
  • Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
  • Support null literals in expressions (#9117) @vyasr
  • Fix cudf::hash_join output size for struct joins (#9107) @jlowe
  • Import fix (#9104) @shwina
  • Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
  • Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
  • Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
  • Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
  • Preserve float16 upscaling (#9069) @galipremsagar
  • Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
  • Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
  • Various multiindex related fixes (#9036) @shwina
  • Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
  • Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
  • cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
  • Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
  • Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
  • Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
  • Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
  • Fix ORC reading of files with struct columns that have null values (#9005) @vuule
  • Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
  • Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
  • Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
  • Fix debug compile error for csv_test.cpp (#8981) @davidwendt
  • Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
  • Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
  • Java conditional joins should not require matching column counts (#8955) @jlowe
  • Fix concatenate empty structs (#8947) @sperlingxx
  • Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
  • Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
  • cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
  • Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
  • Support casting with Pandas dtype aliases (#8920) @sarahyurick
  • Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
  • Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
  • Fix libcudf memory errors (#8884) @karthikeyann
  • Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
  • replace auto with auto& ref for cast<&> (#8866) @karthikeyann
  • Add missing include<optional> in binops (#8864) @karthikeyann
  • Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
  • Re-enable JSON tests (#8843) @vuule
  • Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

  • Add IO docs page in cudf documentation (#9145) @galipremsagar
  • use correct namespace in cuio code examples (#9037) @cwharris
  • Restructuring Contributing doc (#9026) @iskode
  • Update stable version in readme (#9008) @galipremsagar
  • Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
  • Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
  • List GDS-enabled formats in the docs (#8805) @vuule
  • Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

  • Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
  • Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
  • Add struct type support for drop_list_duplicates (#9202) @ttnghia
  • support CUDA async memory resource in JNI (#9201) @rongou
  • Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
  • Superimpose null masks for STRUCT columns. (#9144) @mythrocks
  • Implemented bindings for ceil timestamp operation (#9141) @shaneding
  • Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
  • Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
  • Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
  • Make AST operators nullable (#9096) @vyasr
  • Java bindings for approx_percentile (#9094) @andygrove
  • Add dseries.struct.explode (#9086) @isVoid
  • Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
  • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
  • Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
  • Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
  • Support nested types for nth_element reduction (#9043) @sperlingxx
  • Update sort groupby to use non-atomic operation (#9035) @karthikeyann
  • Add support for struct type in ORC writer (#9025) @vuule
  • Implement interleave_columns for structs columns (#9012) @ttnghia
  • Add groupby first and last aggregations (#9004) @shwina
  • Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
  • Python/Cython bindings for multibyte_split (#8998) @jdye64
  • Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
  • Added Series.dt.is_month_end (#8989) @TravisHester
  • Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
  • Support "unflatten" of columns flattened via flatten_nested_columns(): (#8956) @mythrocks
  • Implement timestamp ceil (#8942) @shaneding
  • Add nested column selection to parquet reader (#8933) @devavret
  • Expose conditional join size calculation (#8928) @vyasr
  • Support Nulls in Timeseries Generator (#8925) @isVoid
  • Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
  • Add dot product binary op (#8909) @charlesbluca
  • Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
  • Series string repeat (#8882) @sarahyurick
  • Python binding for quarters (#8862) @shaneding
  • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
  • Add Java bindings for AST transform (#8846) @jlowe
  • Series datetime is_month_start (#8844) @sarahyurick
  • Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt
  • Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid
  • Add quarters to libcudf datetime (#8779) @shaneding
  • Linear Interpolation of nans via cupy (#8767) @brandon-b-miller
  • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
  • Make groupby transform-like op order match original data order (#8720) @isVoid
  • multibyte_split (#8702) @cwharris
  • Implement JNI for strings:repeat_strings that repeats each string separately by different numbers of times (#8572) @ttnghia

πŸ› οΈ Improvements

  • Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar
  • Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora
  • Skip dask-cudf tests on arm64 (#9252) @Ethyling
  • Use nvcomp's snappy compressor in ORC writer (#9242) @devavret
  • Only run imports tests on x86_64 (#9241) @Ethyling
  • Remove unnecessary call to device_uvector::release() (#9237) @harrism
  • Use nvcomp's snappy decompression in ORC reader (#9235) @devavret
  • Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks
  • Optimize cudf.concat for axis=0 (#9222) @galipremsagar
  • Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt
  • Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar
  • Improve performance of expression evaluation (#9210) @vyasr
  • Misc optimizations in cudf (#9203) @galipremsagar
  • Remove Cython APIs for table view generation (#9199) @vyasr
  • Add JNI support for drop_list_duplicates (#9198) @revans2
  • Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar
  • Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport
  • Explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid
  • Remove _source_index from MultiIndex (#9191) @vyasr
  • Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt
  • Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt
  • Fix cufilejni build include path (#9168) @pxLi
  • dask_cudf dispatch registering cleanup (#9160) @galipremsagar
  • Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt
  • Upgrade pandas version in cudf (#9147) @galipremsagar
  • make data chunk reader return unique_ptr (#9129) @cwharris
  • Add backend for percentile_lookup dispatch (#9118) @galipremsagar
  • Refactor implementation of column setitem (#9110) @vyasr
  • Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt
  • Update to UCX-Py 0.22 (#9099) @pentschev
  • Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris
  • Allowing %f in format to return nanoseconds (#9081) @marlenezw
  • Java bindings for cudf::hash_join (#9080) @jlowe
  • Remove stale code in ColumnBase._fill (#9078) @isVoid
  • Add support for get_group in GroupBy (#9070) @galipremsagar
  • Remove remaining "support" methods from DataFrame (#9068) @vyasr
  • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
  • Added method to remove null_masks if the column has no nulls (#9061) @razajafri
  • Consolidate Several Series and Dataframe Methods (#9059) @isVoid
  • Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar
  • Remove some debug print statements from gtests (#9048) @davidwendt
  • Support additional format specifiers in from_timestamps (#9047) @davidwendt
  • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
  • move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris
  • Refactor Index hierarchy (#9039) @vyasr
  • cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard
  • Add support for STRUCT input to groupby (#9024) @mythrocks
  • Refactor Frame scans (#9021) @vyasr
  • Remove duplicate set_categories code (#9018) @isVoid
  • Map support for ParquetWriter (#9013) @razajafri
  • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
  • Java bindings for conditional join output sizes (#9002) @jlowe
  • Remove _copy_construct factory (#8999) @vyasr
  • ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan
  • A small optimization for JNI copy column view to column vector (#8985) @revans2
  • Fix nvcc warnings in ORC writer (#8975) @devavret
  • Support nested structs in rank and dense rank (#8962) @rwlee
  • Move compute_column API out of ast namespace (#8957) @vyasr
  • Series datetime is_year_end and is_year_start (#8954) @marlenezw
  • Make Java AstNode public (#8953) @jlowe
  • Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt
  • cudf.dtype function (#8949) @shwina
  • Refactor Frame reductions (#8944) @vyasr
  • Add deprecation warning for Series.set_mask API (#8943) @galipremsagar
  • Move AST evaluator into a separate header (#8930) @vyasr
  • JNI Aggregation Type Changes (#8919) @revans2
  • Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt
  • Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar
  • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
  • Move structs_column_tests.cu to .cpp. (#8902) @mythrocks
  • Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt
  • Combine linearizer and ast_plan (#8900) @vyasr
  • Add Java bindings for conditional join gather maps (#8888) @jlowe
  • Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar
  • fix cufilejni build w/ c++17 (#8877) @pxLi
  • Add struct accessor to dask-cudf (#8874) @NV-jpt
  • Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora
  • Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2
  • Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt
  • Replace is_same<>::value with is_same_v<> (#8852) @codereport
  • Add min pytorch version to importorskip in pytest (#8851) @galipremsagar
  • Java bindings for regex replace (#8847) @jlowe
  • Remove make strings children with null mask (#8830) @davidwendt
  • Refactor conditional joins (#8815) @vyasr
  • Small cleanup (unused headers / commented code removals) (#8799) @codereport
  • ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan
  • Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi
  • Refactor and improve join benchmarks with nvbench (#8734) @PointKernel
  • Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr
  • Optimize URL Decoding (#8622) @gaohao95
  • Parquet writer dictionary encoding refactor (#8476) @devavret
  • Use nvcomp's snappy decompression in parquet reader (#8252) @devavret
  • Use nvcomp's snappy compressor in parquet writer (#8229) @devavret
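
For readers unfamiliar with the cleanup named in #8852 above, the following is a minimal sketch of the two spellings, assuming the entry refers to the standard `std::is_same` trait and its C++17 `std::is_same_v` variable template. The helper names `same_via_value` and `same_via_v` are hypothetical and are not taken from the cudf code base.

```cpp
#include <type_traits>

// Hypothetical helper names, for illustration only (not taken from cudf).

// Pre-C++17 spelling: read the nested ::value member of the trait.
template <typename T, typename U>
constexpr bool same_via_value() { return std::is_same<T, U>::value; }

// C++17 spelling: the _v variable template yields the same constant.
template <typename T, typename U>
constexpr bool same_via_v() { return std::is_same_v<T, U>; }

// Both spellings agree for matching and for non-matching types.
static_assert(same_via_value<int, int>() == same_via_v<int, int>());
static_assert(same_via_value<int, long>() == same_via_v<int, long>());
```

The two forms are interchangeable at compile time; the `_v` form is only shorter, which is presumably why such a change is listed as a cleanup rather than a behavior change.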
cudf - v21.08.03

Published by GPUtester about 3 years ago

v21.08.03

cudf - v21.08.02

Published by GPUtester about 3 years ago

v21.08.02
