cudf | Python Ecosystem Directory

cudf - [NIGHTLY] v23.02.00

Published by rapids-bot[bot] over 1 year ago

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-
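
The floor-division entry (#12131) is easiest to see with large integers. The sketch below uses plain Python rather than cudf and compares a hypothetical float-based path with true integer floor division. The helper names are illustrative assumptions and do not reflect cudf internals.

```python
import math


def floordiv_via_float(a: int, b: int) -> int:
    """Floor-divide by converting both operands to float64 first."""
    return math.floor(float(a) / float(b))


def floordiv_integer(a: int, b: int) -> int:
    """Floor-divide using exact integer arithmetic."""
    return a // b


# 2**53 + 1 is the smallest positive integer that float64 cannot represent.
big = 2**53 + 1

exact = floordiv_integer(big, 1)     # keeps every digit
lossy = floordiv_via_float(big, 1)   # rounded when converted to float64
```

For small operands both helpers agree, including for negative values, where floor division rounds toward negative infinity. They diverge only once an operand exceeds the 53-bit float64 mantissa.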

🐛 Bug Fixes

Fix update-version.sh (#12745) @raydouglass
Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-
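
The two any/all entries above (#12374, #12279) concern what a reduction returns when there is nothing to reduce. The entries do not state the values, so the sketch below assumes the conventional choice: the identity element of each logical operation. It is plain Python, not the libcudf implementation.

```python
from functools import reduce
import operator


def reduce_all(values):
    """Logical AND over values; True (the identity of AND) when empty."""
    return reduce(operator.and_, (bool(v) for v in values), True)


def reduce_any(values):
    """Logical OR over values; False (the identity of OR) when empty."""
    return reduce(operator.or_, (bool(v) for v in values), False)


empty_all = reduce_all([])   # no element is falsy
empty_any = reduce_any([])   # no element is truthy
```

Starting each fold from its identity means the empty case needs no special branch, and the result matches Python's built-in `all([])` and `any([])`.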

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama
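
For the removeprefix and removesuffix entry (#12557), the sketch below shows the semantics these names carry in Python's own `str` type: the affix is stripped at most once, and only when it is actually present. It is an assumption that the cudf string methods follow the same rules. The helpers are written out by hand so the example does not depend on a particular Python version.

```python
def removeprefix(s: str, prefix: str) -> str:
    """Return s without a leading prefix, or s unchanged if absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def removesuffix(s: str, suffix: str) -> str:
    """Return s without a trailing suffix, or s unchanged if absent."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


names = ["cudf_io.json", "cudf_strings.json", "other.csv"]
# Apply both operations element-wise, as a string column method would.
stems = [removesuffix(removeprefix(n, "cudf_"), ".json") for n in names]
```

Unlike `str.strip`-style methods, which treat their argument as a set of characters, these operate on the whole affix.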

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
Fix erroneously skipped ORC ZSTD test (#12486) @vuule
Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
Raise warnings as errors in the test suite (#12468) @vyasr
Remove int32 hard-coding in python (#12467) @galipremsagar
Use cudaMemcpyDefault. (#12466) @bdice
Update workflows for nightly tests (#12462) @ajschmidt8
Build CUDA 11.8 and Python 3.10 Packages (#12457) @ajschmidt8
JNI build image default as cuda11.8 (#12441) @pxLi
Re-enable Recently Updated Check (#12435) @ajschmidt8
Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule
Build wheels alongside conda CI (#12427) @sevagh
Remove arguments for checking exception messages in Python (#12424) @vyasr
Clean up cuco usage (#12421) @PointKernel
Fix warnings in remaining modules (#12406) @vyasr
Update ops-bot.yaml (#12402) @ajschmidt8
Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt
Use numpy.empty() instead of bytearray to allocate host memory for spilling (#12399) @madsbk
Deprecate chunksize from dask_cudf.read_csv (#12394) @rjzamora
Expose the RMM pool size in JNI (#12390) @revans2
Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt
Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt
Use make_strings_children in parse_data nested json reader (#12382) @karthikeyann
Fix warnings in test_datetime.py (#12381) @vyasr
Mixed Join Benchmarks (#12375) @divyegala
Fix warnings in dataframe.py (#12369) @vyasr
Update conda recipes. (#12368) @bdice
Use gpu-latest-1 runner tag (#12366) @bdice
Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule
Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr
JSON column performance optimization - struct column nulls (#12354) @karthikeyann
Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt
Add size check to make_offsets_child_column utility (#12345) @davidwendt
Enable max compression ratio small block optimization for ZSTD (#12338) @vuule
Fix warnings in test_monotonic.py (#12334) @vyasr
Improve JSON column creation performance (list offsets) (#12330) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fix warnings in test_orc.py (#12326) @vyasr
Fix warnings in test_groupby.py (#12324) @vyasr
Fix test_notebooks.sh (#12323) @ajschmidt8
Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt
Fix check_style.sh script (#12320) @ajschmidt8
Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt
Fix warnings in test_index.py (#12313) @vyasr
Fix warnings in test_multiindex.py (#12310) @vyasr
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Fix warnings in test_indexing.py (#12305) @vyasr
Fix warnings in test_joining.py (#12304) @vyasr
Unpin dask and distributed for development (#12302) @galipremsagar
Re-enable sccache for Jenkins builds (#12297) @ajschmidt8
Define needs for pr-builder workflow. (#12296) @bdice
Forward merge 22.12 into 23.02 (#12294) @vyasr
Fix warnings in test_stats.py (#12293) @vyasr
Fix table gtests coded in namespace cudf::test (#12292) @davidwendt
Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt
Improved error reporting when reading multiple JSON files (#12285) @vuule
Deprecate Frame.sum_of_squares (#12284) @vyasr
Remove deprecated code for 23.02 (#12281) @vyasr
Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl
Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt
Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar
Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt
Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt
Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt
Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar
Replace column/table test utilities with macros (#12242) @PointKernel
Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt
Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt
Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Cover parsing to decimal types in read_json tests (#12229) @vuule
Spill Statistics (#12223) @madsbk
Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice
Clean up of test_spilling.py (#12220) @madsbk
Simplify repetitive boolean logic (#12218) @vuule
Add Series.hasnans and Index.hasnans (#12214) @galipremsagar
Add cudf::strings::udf::replace function (#12210) @davidwendt
Adds in new java APIs for appending byte arrays to host columnar data (#12208) @revans2
Remove Python dependencies from Java CI. (#12193) @bdice
Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala
Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt
Clean up existing JNI scalar to column code (#12173) @revans2
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi
Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl
Add codespell as a linter (#12097) @benfred
Enable specifying exceptions in error macros (#12078) @vyasr
Move _label_encoding from Series to Column (#12040) @shwina
Add GitHub Actions Workflows (#12002) @ajschmidt8
Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca

cudf - v23.02.00

Published by raydouglass over 1 year ago

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-
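
Two reader entries above (#12309, #12272) change how missing or unparseable numeric fields are represented. The sketch below is a plain-Python illustration of the general idea, keeping integers as integers and marking bad fields as null. `parse_int_column` is a hypothetical helper, not a cudf API, and the real readers' inference rules may differ.

```python
def parse_int_column(tokens):
    """Parse text fields as integers; missing or invalid fields become None."""
    column = []
    for tok in tokens:
        try:
            column.append(int(tok.strip()))
        except (ValueError, AttributeError):
            # ValueError: empty or non-integer text; AttributeError: tok is None.
            column.append(None)
    return column


raw = ["1", "", "9007199254740993", "abc", None]
parsed = parse_int_column(raw)

# Falling back to float64 to hold the nulls would alter the large value.
as_float = float("9007199254740993")
```

Keeping an integer type with a separate null marker preserves values above 2**53, which a float64 column cannot hold exactly.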

🐛 Bug Fixes

Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-
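
The regex entry above (#12282) refers to the difference between the strict anchors `\A` and `\Z` and the line anchors `^` and `$`. The sketch below demonstrates that difference with Python's `re` module, used here only as a familiar reference; the libcudf regex engine is a separate implementation and may differ in other respects.

```python
import re

text_with_newline = "abc\n"
two_lines = "x\nabc"

# `$` may match just before a trailing newline; `\Z` matches only at the very end.
dollar_match = re.search(r"abc$", text_with_newline)
strict_end_match = re.search(r"abc\Z", text_with_newline)

# With MULTILINE, `^` matches after any newline; `\A` matches only at the very start.
caret_match = re.search(r"^abc", two_lines, re.MULTILINE)
strict_begin_match = re.search(r"\Aabc", two_lines, re.MULTILINE)
```

Here the line anchors find a match while the strict anchors do not.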

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
Fix erroneously skipped ORC ZSTD test (#12486) @vuule
Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
Raise warnings as errors in the test suite (#12468) @vyasr
Remove int32 hard-coding in python (#12467) @galipremsagar
Use cudaMemcpyDefault. (#12466) @bdice
Update workflows for nightly tests (#12462) @ajschmidt8
Build CUDA 11.8 and Python 3.10 Packages (#12457) @ajschmidt8
JNI build image default as cuda11.8 (#12441) @pxLi
Re-enable Recently Updated Check (#12435) @ajschmidt8
Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule
Build wheels alongside conda CI (#12427) @sevagh
Remove arguments for checking exception messages in Python (#12424) @vyasr
Clean up cuco usage (#12421) @PointKernel
Fix warnings in remaining modules (#12406) @vyasr
Update ops-bot.yaml (#12402) @ajschmidt8
Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt
Use numpy.empty() instead of bytearray to allocate host memory for spilling (#12399) @madsbk
Deprecate chunksize from dask_cudf.read_csv (#12394) @rjzamora
Expose the RMM pool size in JNI (#12390) @revans2
Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt
Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt
Use make_strings_children in parse_data nested json reader (#12382) @karthikeyann
Fix warnings in test_datetime.py (#12381) @vyasr
Mixed Join Benchmarks (#12375) @divyegala
Fix warnings in dataframe.py (#12369) @vyasr
Update conda recipes. (#12368) @bdice
Use gpu-latest-1 runner tag (#12366) @bdice
Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule
Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr
JSON column performance optimization - struct column nulls (#12354) @karthikeyann
Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt
Add size check to make_offsets_child_column utility (#12345) @davidwendt
Enable max compression ratio small block optimization for ZSTD (#12338) @vuule
Fix warnings in test_monotonic.py (#12334) @vyasr
Improve JSON column creation performance (list offsets) (#12330) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fix warnings in test_orc.py (#12326) @vyasr
Fix warnings in test_groupby.py (#12324) @vyasr
Fix test_notebooks.sh (#12323) @ajschmidt8
Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt
Fix check_style.sh script (#12320) @ajschmidt8
Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt
Fix warnings in test_index.py (#12313) @vyasr
Fix warnings in test_multiindex.py (#12310) @vyasr
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Fix warnings in test_indexing.py (#12305) @vyasr
Fix warnings in test_joining.py (#12304) @vyasr
Unpin dask and distributed for development (#12302) @galipremsagar
Re-enable sccache for Jenkins builds (#12297) @ajschmidt8
Define needs for pr-builder workflow. (#12296) @bdice
Forward merge 22.12 into 23.02 (#12294) @vyasr
Fix warnings in test_stats.py (#12293) @vyasr
Fix table gtests coded in namespace cudf::test (#12292) @davidwendt
Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt
Improved error reporting when reading multiple JSON files (#12285) @vuule
Deprecate Frame.sum_of_squares (#12284) @vyasr
Remove deprecated code for 23.02 (#12281) @vyasr
Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl
Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt
Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar
Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt
Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt
Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt
Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar
Replace column/table test utilities with macros (#12242) @PointKernel
Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt
Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt
Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Cover parsing to decimal types in read_json tests (#12229) @vuule
Spill Statistics (#12223) @madsbk
Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice
Clean up of test_spilling.py (#12220) @madsbk
Simplify repetitive boolean logic (#12218) @vuule
Add Series.hasnans and Index.hasnans (#12214) @galipremsagar
Add cudf::strings::udf::replace function (#12210) @davidwendt
Adds in new java APIs for appending byte arrays to host columnar data (#12208) @revans2
Remove Python dependencies from Java CI. (#12193) @bdice
Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala
Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt
Clean up existing JNI scalar to column code (#12173) @revans2
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi
Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl
Add codespell as a linter (#12097) @benfred
Enable specifying exceptions in error macros (#12078) @vyasr
Move _label_encoding from Series to Column (#12040) @shwina
Add GitHub Actions Workflows (#12002) @ajschmidt8
Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca

cudf - v22.12.01

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
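
The two "Default to equal NaNs" entries above (#11952, #11621) change whether NaN values count as duplicates of one another when building a set. The sketch below shows the two behaviours in plain Python. `collect_set` and its `nans_equal` flag are hypothetical names for illustration and are not the libcudf signatures.

```python
import math


def collect_set(values, nans_equal=True):
    """Collect distinct values in first-seen order.

    With nans_equal=True all NaNs collapse into one entry; with
    nans_equal=False every NaN is kept, because NaN != NaN under IEEE 754.
    """
    distinct = []
    seen_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            if nans_equal and seen_nan:
                continue
            seen_nan = True
            distinct.append(v)
        elif v not in distinct:
            distinct.append(v)
    return distinct


data = [1.0, float("nan"), 2.0, float("nan"), 1.0]
merged = collect_set(data, nans_equal=True)
separate = collect_set(data, nans_equal=False)
```

With the equal-NaN default, `merged` holds a single NaN alongside 1.0 and 2.0, while the unequal setting keeps both NaNs.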

🐛 Bug Fixes

strings_udf: use libcudf caching of character tables (#12343) @wence-
Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow error during decimal binops (#12063) @galipremsagar
Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
Add support for DataFrame.from_dict/to_dict and Series.to_dict (#12048) @galipremsagar
Refactor Parquet reader (#12046) @ttnghia
Forward merge 22.10 into 22.12 (#12045) @vyasr
Standardize newlines at ends of files. (#12042) @bdice
Trim trailing whitespace from all files. (#12041) @bdice
Use nosync policy in gather and scatter implementations. (#12038) @bdice
Remove smart quotes from all docstrings. (#12035) @bdice
Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
Add cython-lint to pre-commit checks. (#12020) @bdice
Use pragma once (#12019) @bdice
New GHA to add issues/prs to project board (#12016) @jarmak-nv
Add DataFrame.pivot_table. (#12015) @bdice
Rollback of DeviceBufferLike (#12009) @madsbk
Remove default parameters for nvtext::detail functions (#12007) @davidwendt
Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
Remove unused managed_allocator (#12005) @vyasr
Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
Ignore python docs build artifacts (#12000) @galipremsagar
Use rapids-cmake for google benchmark. (#11997) @vyasr
Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
Remove stale labeler (#11995) @raydouglass
Move protobuf compilation to CMake (#11986) @vyasr
Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
Feature/remove default streams (#11967) @vyasr
Add pool memory resource to libcudf basic example (#11966) @davidwendt
Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Add deprecation warning for set_allocator. (#11958) @vyasr
Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Add strip_delimiters option to read_text (#11946) @upsj
Refactor multibyte_split output_builder (#11945) @upsj
Remove validation that requires introspection (#11938) @vyasr
Add .str.find_multiple API (#11928) @galipremsagar
Add regex_program class for use with all regex APIs (#11927) @davidwendt
Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
Performance improvement in JSON Tree traversal (#11919) @karthikeyann
Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
Pin mimesis version in setup.py. (#11906) @bdice
Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
Relax codecov threshold diff (#11899) @galipremsagar
Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
Add coverage for string UDF tests. (#11891) @vyasr
Provide data_chunk_source wrapper for datasource (#11886) @upsj
Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
Add ngroup (#11871) @shwina
Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
Unpin dask and distributed for development (#11859) @galipremsagar
Remove unused includes for table/row_operators (#11857) @GregoryKimball
Use conda-forge's pyorc (#11855) @jakirkham
Add libcudf strings examples (#11849) @davidwendt
Remove cudf_io namespace alias (#11827) @vuule
Test/remove thrust vector usage (#11813) @vyasr
Add BGZIP reader to python read_text (#11802) @upsj
Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
Add BGZIP multibyte_split benchmark (#11723) @upsj
Bifurcate Dependency Lists (#11674) @bdice
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
Make all nvcc warnings into errors (#8916) @trxcllnt

cudf - v22.12.00

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow error during decimal binops (#12063) @galipremsagar
Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
Add support for DataFrame.from_dict/to_dict and Series.to_dict (#12048) @galipremsagar
Refactor Parquet reader (#12046) @ttnghia
Forward merge 22.10 into 22.12 (#12045) @vyasr
Standardize newlines at ends of files. (#12042) @bdice
Trim trailing whitespace from all files. (#12041) @bdice
Use nosync policy in gather and scatter implementations. (#12038) @bdice
Remove smart quotes from all docstrings. (#12035) @bdice
Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
Add cython-lint to pre-commit checks. (#12020) @bdice
Use pragma once (#12019) @bdice
New GHA to add issues/prs to project board (#12016) @jarmak-nv
Add DataFrame.pivot_table. (#12015) @bdice
Rollback of DeviceBufferLike (#12009) @madsbk
Remove default parameters for nvtext::detail functions (#12007) @davidwendt
Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
Remove unused managed_allocator (#12005) @vyasr
Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
Ignore python docs build artifacts (#12000) @galipremsagar
Use rapids-cmake for google benchmark. (#11997) @vyasr
Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
Remove stale labeler (#11995) @raydouglass
Move protobuf compilation to CMake (#11986) @vyasr
Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
Feature/remove default streams (#11967) @vyasr
Add pool memory resource to libcudf basic example (#11966) @davidwendt
Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Add deprecation warning for set_allocator. (#11958) @vyasr
Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Add strip_delimiters option to read_text (#11946) @upsj
Refactor multibyte_split output_builder (#11945) @upsj
Remove validation that requires introspection (#11938) @vyasr
Add .str.find_multiple API (#11928) @galipremsagar
Add regex_program class for use with all regex APIs (#11927) @davidwendt
Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
Performance improvement in JSON Tree traversal (#11919) @karthikeyann
Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
Pin mimesis version in setup.py. (#11906) @bdice
Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
Relax codecov threshold diff (#11899) @galipremsagar
Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
Add coverage for string UDF tests. (#11891) @vyasr
Provide data_chunk_source wrapper for datasource (#11886) @upsj
Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
Add ngroup (#11871) @shwina
Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
Unpin dask and distributed for development (#11859) @galipremsagar
Remove unused includes for table/row_operators (#11857) @GregoryKimball
Use conda-forge's pyorc (#11855) @jakirkham
Add libcudf strings examples (#11849) @davidwendt
Remove cudf_io namespace alias (#11827) @vuule
Test/remove thrust vector usage (#11813) @vyasr
Add BGZIP reader to python read_text (#11802) @upsj
Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
Add BGZIP multibyte_split benchmark (#11723) @upsj
Bifurcate Dependency Lists (#11674) @bdice
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
Make all nvcc warnings into errors (#8916) @trxcllnt

cudf - [NIGHTLY] v22.10.00

Published by rapids-bot[bot] almost 2 years ago

🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Disable nvCOMP DEFLATE integration (#11811) @vuule
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Upgrade pandas to 1.5 (#11617) @galipremsagar
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Adding optional parquet reader schema (#11524) @hyperbolic2346
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Disable Arrow S3 support by default. (#11470) @bdice
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice
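
For readers wondering what "match Python output" refers to in the zfill entry (#11634): the sketch below shows the behavior of Python's built-in `str.zfill`, on the assumption that this is the reference the entry means. It uses plain Python only and does not call cudf; the sample values are illustrative and not taken from the release notes.

```python
# Reference behavior of Python's built-in str.zfill (no cudf or GPU involved).
# Assumption: this is the "Python output" the zfill entry refers to.
samples = ["42", "-42", "+7", "abc", ""]

# zfill left-pads with zeros up to the requested width; a leading sign
# character stays in front of the padding.
padded = [s.zfill(5) for s in samples]

for original, result in zip(samples, padded):
    print(repr(original), "->", repr(result))

# Strings already at least as long as the width are returned unchanged.
unchanged = "abcdef".zfill(3)
```

The sign handling is the part that typically differs between zero-padding implementations: the padding is inserted after a leading `+` or `-` rather than before it.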

🐛 Bug Fixes

Force using old fmt in nvbench. (#12064) @vyasr
Update cuda-python dependency to 11.7.1 (#11994) @shwina
Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
Handle ptx file paths during strings_udf import (#11862) @galipremsagar
Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
Fix is_valid checks in Scalar._binaryop (#11818) @wence-
Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
Disable nvCOMP DEFLATE integration (#11811) @vuule
Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
Fix ORC string sum statistics (#11740) @vuule
Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
Don't assume stream is a compile-time constant expression (#11725) @vyasr
Fix get_thrust.cmake format at patch command (#11715) @davidwendt
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
Fix compile error due to missing header (#11697) @ttnghia
Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
Transfer correct dtype to exploded column (#11687) @wence-
Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
Maintain the index name after .loc (#11677) @shwina
Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
Fix multi-file remote datasource bug (#11655) @rjzamora
Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
Fix overflows in benchmarks (#11649) @elstehle
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
Fix host scalars construction of nested types (#11612) @galipremsagar
Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
Add is_timestamp test for leap second (60) (#11594) @davidwendt
Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
Fix exception in segmented-reduce benchmark (#11588) @davidwendt
Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
Correct distribution data type in quantiles benchmark (#11584) @vuule
Fix multibyte_split benchmark for host buffers (#11583) @upsj
xfail custreamz display test for now (#11567) @shwina
Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
Fix groupby failures in dask_cudf CI (#11561) @rjzamora
Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
Fix regex quantifier check to include capture groups (#11373) @davidwendt
Fix read_text when byte_range is aligned with field (#11371) @upsj
Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller
Update docstring for cudf.read_text (#11799) @GregoryKimball
Add doc section for list & struct handling (#11770) @galipremsagar
Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
Enable more Pydocstyle rules (#11582) @bdice
Remove unused cpp/img folder (#11554) @davidwendt
Publish C++ developer docs (#11475) @vyasr
Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
Update contributing doc to include links to the developer guides (#11390) @davidwendt
Fix table_view_base doxygen format (#11340) @davidwendt
Create main developer guide for Python (#11235) @vyasr
Add developer documentation for benchmarking (#11122) @vyasr
cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret
Add istitle to string UDFs (#11738) @brandon-b-miller
JSON Column creation in GPU (#11714) @karthikeyann
Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
Add BGZIP data_chunk_reader (#11652) @upsj
Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
Generic type casting to support the new nested JSON reader (#11613) @elstehle
JSON tree traversal (#11610) @karthikeyann
Add casting operators to masked UDFs (#11578) @brandon-b-miller
Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
Add strings 'like' function (#11558) @davidwendt
Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
Adds support for json lines format to the nested JSON reader (#11534) @elstehle
Adding optional parquet reader schema (#11524) @hyperbolic2346
Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
Add gdb pretty-printers for simple types (#11499) @upsj
Add create_random_column function to the data generator (#11490) @vuule
Add fluent API builder to data_profile (#11479) @vuule
Adds Nested Json benchmark (#11466) @karthikeyann
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Python API for the future experimental JSON reader (#11426) @vuule
Return schema info from JSON reader (#11419) @vuule
Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
Truncate parquet column indexes (#11403) @etseidl
Adds the end-to-end JSON parser implementation (#11388) @elstehle
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Add placeholder for the experimental JSON reader (#11334) @vuule
Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
Adds JSON tokenizer (#11264) @elstehle
List lexicographic comparator (#11129) @devavret
Add generic type inference for cuIO (#11121) @PointKernel
Fully support nested types in cudf::contains (#10656) @ttnghia
Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar
Add examples for Nested JSON reader (#11814) @GregoryKimball
Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
Update strings udf version updater script (#11772) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
Add ability to construct ListColumn when size is None (#11745) @galipremsagar
Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
Add missing copyright headers. (#11712) @bdice
Fix copyright check issues in pre-commit (#11711) @bdice
Include decimal in supported types for range window order-by columns (#11710) @mythrocks
Disable very large column gtest for contiguous-split (#11706) @davidwendt
Drop split_out=None test from groupby.agg (#11704) @wence-
Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
Special-case multibyte_split for single-byte delimiter (#11681) @upsj
Remove isort exclusions (#11680) @bdice
Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
Check conda recipe headers with pre-commit (#11669) @bdice
Remove redundant style check for clang-format. (#11668) @bdice
Add support for group_keys in groupby (#11659) @galipremsagar
Fix pandoc pinning. (#11658) @bdice
Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
Update git metadata (#11647) @bdice
Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
Update to mypy 0.971 (#11640) @wence-
Refactor strings strip functor to details header (#11635) @davidwendt
Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
Simplify hostdevice_vector (#11631) @upsj
Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
Upgrade pandas to 1.5 (#11617) @galipremsagar
Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
Use stream in Java API. (#11601) @bdice
Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
Improve ORC writer benchmark with nvbench (#11598) @PointKernel
Tune multibyte_split kernel (#11587) @upsj
Move split_utils.cuh to strings/detail (#11585) @davidwendt
Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
JNI support for writing binary columns in parquet (#11556) @revans2
Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
Refactor string/numeric conversion utilities (#11545) @davidwendt
Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
Add hexadecimal value separators (#11527) @bdice
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Struct support for NULL_EQUALS binary operation (#11520) @rwlee
Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
Fix Feather test warning. (#11511) @bdice
copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
Upgrade to arrow-9.x (#11507) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Single-pass multibyte_split (#11500) @upsj
Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
Unpin dask and distributed for development (#11492) @galipremsagar
Move SparkMurmurHash3_32 functor. (#11489) @bdice
Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Add reduction distinct_count benchmark (#11473) @ttnghia
Add groupby nunique aggregation benchmark (#11472) @ttnghia
Disable Arrow S3 support by default. (#11470) @bdice
Add groupby max aggregation benchmark (#11464) @ttnghia
Extract Dremel encoding code from Parquet (#11461) @vyasr
Add missing Thrust #includes. (#11457) @bdice
Make CMake hooks verbose (#11456) @vyasr
Control Parquet page size through Python API (#11454) @etseidl
Add control of Parquet column index creation to python (#11453) @etseidl
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Update to Thrust 1.17.0 (#11437) @bdice
Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Add Spark list hashing Java tests (#11379) @bdice
Move cmake to the build section. (#11376) @vyasr
Remove use of CUDA driver API calls from libcudf (#11370) @shwina
Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
Remove unused custreamz thirdparty directory (#11343) @vyasr
Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
Enable using upstream jitify2 (#11287) @shwina
Cache cudf.Scalar (#11246) @shwina
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.10.01

Published by GPUtester almost 2 years ago

🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Disable nvCOMP DEFLATE integration (#11811) @vuule
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Upgrade pandas to 1.5 (#11617) @galipremsagar
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Adding optional parquet reader schema (#11524) @hyperbolic2346
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Disable Arrow S3 support by default. (#11470) @bdice
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice

🐛 Bug Fixes

Update cuda-python dependency to 11.7.1 (#11994) @shwina
Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
Handle ptx file paths during strings_udf import (#11862) @galipremsagar
Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
Fix is_valid checks in Scalar._binaryop (#11818) @wence-
Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
Disable nvCOMP DEFLATE integration (#11811) @vuule
Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
Fix ORC string sum statistics (#11740) @vuule
Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
Don't assume stream is a compile-time constant expression (#11725) @vyasr
Fix get_thrust.cmake format at patch command (#11715) @davidwendt
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
Fix compile error due to missing header (#11697) @ttnghia
Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
Transfer correct dtype to exploded column (#11687) @wence-
Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
Maintain the index name after .loc (#11677) @shwina
Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
Fix multi-file remote datasource bug (#11655) @rjzamora
Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
Fix overflows in benchmarks (#11649) @elstehle
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
Fix host scalars construction of nested types (#11612) @galipremsagar
Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
Add is_timestamp test for leap second (60) (#11594) @davidwendt
Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
Fix exception in segmented-reduce benchmark (#11588) @davidwendt
Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
Correct distribution data type in quantiles benchmark (#11584) @vuule
Fix multibyte_split benchmark for host buffers (#11583) @upsj
xfail custreamz display test for now (#11567) @shwina
Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
Fix groupby failures in dask_cudf CI (#11561) @rjzamora
Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
Fix regex quantifier check to include capture groups (#11373) @davidwendt
Fix read_text when byte_range is aligned with field (#11371) @upsj
Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller
Update docstring for cudf.read_text (#11799) @GregoryKimball
Add doc section for list & struct handling (#11770) @galipremsagar
Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
Enable more Pydocstyle rules (#11582) @bdice
Remove unused cpp/img folder (#11554) @davidwendt
Publish C++ developer docs (#11475) @vyasr
Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
Update contributing doc to include links to the developer guides (#11390) @davidwendt
Fix table_view_base doxygen format (#11340) @davidwendt
Create main developer guide for Python (#11235) @vyasr
Add developer documentation for benchmarking (#11122) @vyasr
cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret
Add istitle to string UDFs (#11738) @brandon-b-miller
JSON Column creation in GPU (#11714) @karthikeyann
Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
Add BGZIP data_chunk_reader (#11652) @upsj
Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
Generic type casting to support the new nested JSON reader (#11613) @elstehle
JSON tree traversal (#11610) @karthikeyann
Add casting operators to masked UDFs (#11578) @brandon-b-miller
Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
Add strings 'like' function (#11558) @davidwendt
Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
Adds support for json lines format to the nested JSON reader (#11534) @elstehle
Adding optional parquet reader schema (#11524) @hyperbolic2346
Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
Add gdb pretty-printers for simple types (#11499) @upsj
Add create_random_column function to the data generator (#11490) @vuule
Add fluent API builder to data_profile (#11479) @vuule
Adds Nested Json benchmark (#11466) @karthikeyann
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Python API for the future experimental JSON reader (#11426) @vuule
Return schema info from JSON reader (#11419) @vuule
Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
Truncate parquet column indexes (#11403) @etseidl
Adds the end-to-end JSON parser implementation (#11388) @elstehle
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Add placeholder for the experimental JSON reader (#11334) @vuule
Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
Adds JSON tokenizer (#11264) @elstehle
List lexicographic comparator (#11129) @devavret
Add generic type inference for cuIO (#11121) @PointKernel
Fully support nested types in cudf::contains (#10656) @ttnghia
Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar
Add examples for Nested JSON reader (#11814) @GregoryKimball
Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
Update strings udf version updater script (#11772) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
Add ability to construct ListColumn when size is None (#11745) @galipremsagar
Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
Add missing copyright headers. (#11712) @bdice
Fix copyright check issues in pre-commit (#11711) @bdice
Include decimal in supported types for range window order-by columns (#11710) @mythrocks
Disable very large column gtest for contiguous-split (#11706) @davidwendt
Drop split_out=None test from groupby.agg (#11704) @wence-
Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
Special-case multibyte_split for single-byte delimiter (#11681) @upsj
Remove isort exclusions (#11680) @bdice
Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
Check conda recipe headers with pre-commit (#11669) @bdice
Remove redundant style check for clang-format. (#11668) @bdice
Add support for group_keys in groupby (#11659) @galipremsagar
Fix pandoc pinning. (#11658) @bdice
Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
Update git metadata (#11647) @bdice
Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
Update to mypy 0.971 (#11640) @wence-
Refactor strings strip functor to details header (#11635) @davidwendt
Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
Simplify hostdevice_vector (#11631) @upsj
Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
Upgrade pandas to 1.5 (#11617) @galipremsagar
Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
Use stream in Java API. (#11601) @bdice
Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
Improve ORC writer benchmark with nvbench (#11598) @PointKernel
Tune multibyte_split kernel (#11587) @upsj
Move split_utils.cuh to strings/detail (#11585) @davidwendt
Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
JNI support for writing binary columns in parquet (#11556) @revans2
Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
Refactor string/numeric conversion utilities (#11545) @davidwendt
Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
Add hexadecimal value separators (#11527) @bdice
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Struct support for NULL_EQUALS binary operation (#11520) @rwlee
Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
Fix Feather test warning. (#11511) @bdice
copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
Upgrade to arrow-9.x (#11507) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Single-pass multibyte_split (#11500) @upsj
Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
Unpin dask and distributed for development (#11492) @galipremsagar
Move SparkMurmurHash3_32 functor. (#11489) @bdice
Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Add reduction distinct_count benchmark (#11473) @ttnghia
Add groupby nunique aggregation benchmark (#11472) @ttnghia
Disable Arrow S3 support by default. (#11470) @bdice
Add groupby max aggregation benchmark (#11464) @ttnghia
Extract Dremel encoding code from Parquet (#11461) @vyasr
Add missing Thrust #includes. (#11457) @bdice
Make CMake hooks verbose (#11456) @vyasr
Control Parquet page size through Python API (#11454) @etseidl
Add control of Parquet column index creation to python (#11453) @etseidl
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Update to Thrust 1.17.0 (#11437) @bdice
Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Add Spark list hashing Java tests (#11379) @bdice
Move cmake to the build section. (#11376) @vyasr
Remove use of CUDA driver API calls from libcudf (#11370) @shwina
Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
Remove unused custreamz thirdparty directory (#11343) @vyasr
Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
Enable using upstream jitify2 (#11287) @shwina
Cache cudf.Scalar (#11246) @shwina
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.10.00

Published by GPUtester about 2 years ago

🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Disable nvCOMP DEFLATE integration (#11811) @vuule
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Upgrade pandas to 1.5 (#11617) @galipremsagar
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Adding optional parquet reader schema (#11524) @hyperbolic2346
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Disable Arrow S3 support by default. (#11470) @bdice
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice
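
For the zfill entry above (#11634), the stated target is Python's own output. The sketch below shows only what Python's built-in str.zfill returns, including how it treats a leading sign. It does not call cuDF, and the sample strings are illustrative assumptions rather than cases taken from the PR.

```python
# Reference behavior of Python's built-in str.zfill (not cuDF).
# A leading '+' or '-' stays in front and the zeros are inserted after it;
# strings already at or beyond the requested width are returned unchanged.
samples = ["42", "-42", "+7", "abc", "12345"]
padded = [s.zfill(5) for s in samples]

for original, result in zip(samples, padded):
    print(f"{original!r:>8} -> {result!r}")
```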

🐛 Bug Fixes

Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
Handle ptx file paths during strings_udf import (#11862) @galipremsagar
Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
Fix is_valid checks in Scalar._binaryop (#11818) @wence-
Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
Disable nvCOMP DEFLATE integration (#11811) @vuule
Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
Fix ORC string sum statistics (#11740) @vuule
Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
Don't assume stream is a compile-time constant expression (#11725) @vyasr
Fix get_thrust.cmake format at patch command (#11715) @davidwendt
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
Fix compile error due to missing header (#11697) @ttnghia
Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
Transfer correct dtype to exploded column (#11687) @wence-
Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
Maintain the index name after .loc (#11677) @shwina
Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
Fix multi-file remote datasource bug (#11655) @rjzamora
Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
Fix overflows in benchmarks (#11649) @elstehle
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
Fix host scalars construction of nested types (#11612) @galipremsagar
Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
Add is_timestamp test for leap second (60) (#11594) @davidwendt
Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
Fix exception in segmented-reduce benchmark (#11588) @davidwendt
Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
Correct distribution data type in quantiles benchmark (#11584) @vuule
Fix multibyte_split benchmark for host buffers (#11583) @upsj
xfail custreamz display test for now (#11567) @shwina
Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
Fix groupby failures in dask_cudf CI (#11561) @rjzamora
Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
Fix regex quantifier check to include capture groups (#11373) @davidwendt
Fix read_text when byte_range is aligned with field (#11371) @upsj
Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller
Update docstring for cudf.read_text (#11799) @GregoryKimball
Add doc section for list & struct handling (#11770) @galipremsagar
Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
Enable more Pydocstyle rules (#11582) @bdice
Remove unused cpp/img folder (#11554) @davidwendt
Publish C++ developer docs (#11475) @vyasr
Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
Update contributing doc to include links to the developer guides (#11390) @davidwendt
Fix table_view_base doxygen format (#11340) @davidwendt
Create main developer guide for Python (#11235) @vyasr
Add developer documentation for benchmarking (#11122) @vyasr
cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret
Add istitle to string UDFs (#11738) @brandon-b-miller
JSON Column creation in GPU (#11714) @karthikeyann
Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
Add BGZIP data_chunk_reader (#11652) @upsj
Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
Generic type casting to support the new nested JSON reader (#11613) @elstehle
JSON tree traversal (#11610) @karthikeyann
Add casting operators to masked UDFs (#11578) @brandon-b-miller
Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
Add strings 'like' function (#11558) @davidwendt
Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
Adds support for json lines format to the nested JSON reader (#11534) @elstehle
Adding optional parquet reader schema (#11524) @hyperbolic2346
Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
Add gdb pretty-printers for simple types (#11499) @upsj
Add create_random_column function to the data generator (#11490) @vuule
Add fluent API builder to data_profile (#11479) @vuule
Adds Nested Json benchmark (#11466) @karthikeyann
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Python API for the future experimental JSON reader (#11426) @vuule
Return schema info from JSON reader (#11419) @vuule
Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
Truncate parquet column indexes (#11403) @etseidl
Adds the end-to-end JSON parser implementation (#11388) @elstehle
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Add placeholder for the experimental JSON reader (#11334) @vuule
Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
Adds JSON tokenizer (#11264) @elstehle
List lexicographic comparator (#11129) @devavret
Add generic type inference for cuIO (#11121) @PointKernel
Fully support nested types in cudf::contains (#10656) @ttnghia
Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar
Add examples for Nested JSON reader (#11814) @GregoryKimball
Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
Update strings udf version updater script (#11772) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
Add ability to construct ListColumn when size is None (#11745) @galipremsagar
Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
Add missing copyright headers. (#11712) @bdice
Fix copyright check issues in pre-commit (#11711) @bdice
Include decimal in supported types for range window order-by columns (#11710) @mythrocks
Disable very large column gtest for contiguous-split (#11706) @davidwendt
Drop split_out=None test from groupby.agg (#11704) @wence-
Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
Special-case multibyte_split for single-byte delimiter (#11681) @upsj
Remove isort exclusions (#11680) @bdice
Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
Check conda recipe headers with pre-commit (#11669) @bdice
Remove redundant style check for clang-format. (#11668) @bdice
Add support for group_keys in groupby (#11659) @galipremsagar
Fix pandoc pinning. (#11658) @bdice
Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
Update git metadata (#11647) @bdice
Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
Update to mypy 0.971 (#11640) @wence-
Refactor strings strip functor to details header (#11635) @davidwendt
Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
Simplify hostdevice_vector (#11631) @upsj
Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
Upgrade pandas to 1.5 (#11617) @galipremsagar
Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
Use stream in Java API. (#11601) @bdice
Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
Improve ORC writer benchmark with nvbench (#11598) @PointKernel
Tune multibyte_split kernel (#11587) @upsj
Move split_utils.cuh to strings/detail (#11585) @davidwendt
Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
JNI support for writing binary columns in parquet (#11556) @revans2
Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
Refactor string/numeric conversion utilities (#11545) @davidwendt
Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
Add hexadecimal value separators (#11527) @bdice
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Struct support for NULL_EQUALS binary operation (#11520) @rwlee
Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
Fix Feather test warning. (#11511) @bdice
copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
Upgrade to arrow-9.x (#11507) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Single-pass multibyte_split (#11500) @upsj
Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
Unpin dask and distributed for development (#11492) @galipremsagar
Move SparkMurmurHash3_32 functor. (#11489) @bdice
Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Add reduction distinct_count benchmark (#11473) @ttnghia
Add groupby nunique aggregation benchmark (#11472) @ttnghia
Disable Arrow S3 support by default. (#11470) @bdice
Add groupby max aggregation benchmark (#11464) @ttnghia
Extract Dremel encoding code from Parquet (#11461) @vyasr
Add missing Thrust #includes. (#11457) @bdice
Make CMake hooks verbose (#11456) @vyasr
Control Parquet page size through Python API (#11454) @etseidl
Add control of Parquet column index creation to python (#11453) @etseidl
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Update to Thrust 1.17.0 (#11437) @bdice
Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Add Spark list hashing Java tests (#11379) @bdice
Move cmake to the build section. (#11376) @vyasr
Remove use of CUDA driver API calls from libcudf (#11370) @shwina
Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
Remove unused custreamz thirdparty directory (#11343) @vyasr
Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
Enable using upstream jitify2 (#11287) @shwina
Cache cudf.Scalar (#11246) @shwina
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice

cudf - v22.08.01

Published by GPUtester about 2 years ago

🚨 Breaking Changes

Pin numpy to <1.23 (#11824) @galipremsagar
Remove legacy join APIs (#11274) @vyasr
Remove lists::drop_list_duplicates (#11236) @ttnghia
Remove Index.replace API (#11131) @vyasr
Remove deprecated Index methods from Frame (#11073) @vyasr
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Remove Arrow CUDA IPC code (#10995) @shwina
Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

Fix out-of-bound access in cudf::detail::label_segments (#11497) @ttnghia
Fix distributed error related to loop_in_thread (#11428) @galipremsagar
Fix atomic operations on NaN values (#11420) @ttnghia
Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
Revert "Allow CuPy 11" (#11409) @jakirkham
Fix moto timeouts (#11369) @galipremsagar
Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
Fix memory_usage() for ListSeries (#11355) @thomcom
Fix constructing Column from column_view with expired mask (#11354) @shwina
Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
Fix issue related to numpy array and category dtype (#11282) @galipremsagar
Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
Returns DataFrame When Concatenating Along Axis 1 (#11263) @isVoid
Fix compile error due to missing header (#11257) @ttnghia
Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
Fix tests/rolling/empty_input_test (#11238) @ttnghia
Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
Fix cumulative count index behavior (#11188) @brandon-b-miller
Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
Ensure cuco export set is installed in cmake build (#11147) @jlowe
Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
Fix compile error due to missing header (#11126) @ttnghia
Fix __cuda_array_interface__ failures (#11113) @galipremsagar
Support octal and hex within regex character class pattern (#11112) @davidwendt
Fix split_re matching logic for word boundaries (#11106) @davidwendt
Handle multiple files metadata in read_parquet (#11105) @galipremsagar
Fix index alignment for Series objects with repeated index (#11103) @shwina
FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
Fix regex word boundary logic to include underline (#11099) @davidwendt
Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
Maintain the input index in the result of a groupby-transform (#11068) @shwina
Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
Fix warn_unused_result error in parquet test (#11026) @karthikeyann
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Fix small error in page row count limiting (#10991) @etseidl
Fix a row index entry error in ORC writer issue (#10989) @vuule
Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice
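
The min/max identity entry above (#11357) rests on a general property of reductions: the starting value must leave every result unchanged, which for floating-point min is +infinity and for max is -infinity. The sketch below illustrates that property in plain Python. The helper names reduce_min and reduce_max are hypothetical and this is not the libcudf device-operator code.

```python
import math
from functools import reduce


def reduce_min(values):
    # +inf is the identity for min: min(x, +inf) == x for any finite x.
    return reduce(min, values, math.inf)


def reduce_max(values):
    # -inf is the identity for max: max(x, -inf) == x for any finite x.
    return reduce(max, values, -math.inf)


print(reduce_min([3.0, 1.5, 2.25]), reduce_max([-2.0, -7.0]))
print(reduce_min([]), reduce_max([]))  # empty input yields the identity itself
```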

📖 Documentation

Defer loading of custom.js (#11465) @galipremsagar
Fix issues with day & night modes in python docs (#11400) @galipremsagar
Update missing data handling APIs in docs (#11345) @galipremsagar
Add lists filtering APIs to doxygen group. (#11336) @bdice
Remove unused import in README sample (#11318) @vyasr
Note null behavior in where docs (#11276) @brandon-b-miller
Update docstring for spans in get_row_data_range (#11271) @vyasr
Update nvCOMP integration table (#11231) @vuule
Add dev docs for documentation writing (#11217) @vyasr
Documentation fix for concatenate (#11187) @dagardner-nv
Fix unresolved links in markdown (#11173) @karthikeyann
Fix cudf version in README.md install commands (#11164) @jvanstraten
Switch language from None to "en" in docs build (#11133) @galipremsagar
Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
Add docs to rolling var, std, count. (#11035) @bdice
Fix docs for Numba UDFs. (#11020) @bdice
Replace column comparison utilities functions with macros (#11007) @karthikeyann
Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
Fix Doxygen warnings in table header files (#10964) @karthikeyann
Fix Doxygen warnings in column header files (#10963) @karthikeyann
Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
Generate Doxygen Tag File for Libcudf (#10932) @isVoid
Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
Add missing documentation in aggregation.hpp (#10887) @karthikeyann
Revise PR template. (#10774) @bdice

🚀 New Features

Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
Adding byte array view structure (#11322) @hyperbolic2346
Adding byte_array statistics (#11303) @hyperbolic2346
Add column indexes to Parquet writer (#11302) @etseidl
Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
FST benchmark (#11243) @karthikeyann
Adds the Finite-State Transducer algorithm (#11242) @elstehle
Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
Add 24 bit dictionary support to Parquet writer (#11216) @devavret
Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
Add JNI bindings for extractAllRecord (#11196) @anthony-chang
Add cudf.options (#11193) @isVoid
Add thrift support for parquet column and offset indexes (#11178) @etseidl
Adding binary read/write as options for parquet (#11160) @hyperbolic2346
Support nth_element for window functions (#11158) @mythrocks
Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
Implement Groupby pct_change (#11144) @skirui-source
Add JNI for set operations (#11143) @ttnghia
Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
Feature/python benchmarking (#11125) @vyasr
Support nan_equality in cudf::distinct (#11118) @ttnghia
Added JNI for getMapValueForKeys (#11104) @razajafri
Refactor semi_anti_join (#11100) @ttnghia
Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
Adds the Logical Stack algorithm (#11078) @elstehle
Add doxygen-check pre-commit hook (#11076) @karthikeyann
Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
Add Doxygen CI check (#11057) @karthikeyann
Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
Support set operations (#11043) @ttnghia
Support for ZLIB compression in ORC writer (#11036) @vuule
Adding feature swaplevels (#11027) @VamsiTallam95
Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
Function for bfill, ffill #9591 (#11022) @Sreekiran096
Generate group offsets from element labels (#11017) @ttnghia
Feature axes (#10979) @VamsiTallam95
Generate group labels from offsets (#10945) @ttnghia
Add missing cuIO benchmark coverage for duration types (#10933) @vuule
Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
Reindex Improvements (#10815) @brandon-b-miller
Implement value_counts for DataFrame (#10813) @martinfalisse

🛠️ Improvements

Pin numpy to <1.23 (#11824) @galipremsagar
Make Index Join Tests on Default Precisions Deterministic (#11451) @isVoid
Pin dask & distributed for release (#11433) @galipremsagar
Use documented header template for doxygen (#11430) @galipremsagar
Relax arrow version in dev env (#11418) @galipremsagar
Added Java bindings for Parquet options for binary read (#11410) @razajafri
Allow CuPy 11 (#11393) @jakirkham
Improve multibyte_split performance (#11347) @cwharris
Switch death test to use explicit trap. (#11326) @vyasr
Add --output-on-failure to ctest args. (#11321) @vyasr
Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
Add JNI support for the join_strings API (#11309) @revans2
Add cupy version to setup.py install_requires (#11306) @vyasr
removing some unused code (#11305) @hyperbolic2346
Add test of wildcard selection (#11300) @vyasr
Update parquet reader to take stream parameter (#11294) @PointKernel
Spark list hashing (#11292) @bdice
Remove legacy join APIs (#11274) @vyasr
Fix cudf recipes syntax (#11273) @ajschmidt8
Fix cudf recipe (#11267) @ajschmidt8
Cleanup config files (#11266) @vyasr
Run mypy on all packages (#11265) @vyasr
Update to isort 5.10.1. (#11262) @vyasr
Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
Remove redundant black config specifications. (#11258) @vyasr
Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
Move rolling impl details to detail/ directory. (#11250) @mythrocks
Remove lists::drop_list_duplicates (#11236) @ttnghia
Use cudf::lists::distinct in Python binding (#11234) @ttnghia
Use cudf::lists::distinct in Java binding (#11233) @ttnghia
Use cudf::distinct in Java binding (#11232) @ttnghia
Pin dask-cuda in dev environment (#11229) @galipremsagar
Remove cruft in map_lookup (#11221) @mythrocks
Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
Remove Frame._index (#11210) @vyasr
Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
Document why Development component is needed for CMake. (#11200) @vyasr
cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
Standardize join internals around DataFrame (#11184) @vyasr
Move character case table declarations from src to detail (#11183) @davidwendt
Remove usage of Frame in StringMethods (#11181) @vyasr
Expose get_json_object_options to Python (#11180) @SrikarVanavasam
Fix decimal128 stats in parquet writer (#11179) @etseidl
Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
Refactor and optimize Frame.where (#11168) @vyasr
Add npos const static member to cudf::string_view (#11166) @davidwendt
Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
Clean up _copy_type_metadata (#11156) @vyasr
Add nvcc conda package in dev environment (#11154) @galipremsagar
Struct binary comparison op functionality for spark rapids (#11153) @rwlee
Refactor inline conditionals. (#11151) @bdice
Refactor Spark hashing tests (#11145) @bdice
Add new _from_data_like_self factory (#11140) @vyasr
Update get_cucollections to use rapids-cmake (#11139) @vyasr
Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
Remove Index.replace API (#11131) @vyasr
Move char-type table function declarations from src to detail (#11127) @davidwendt
Clean up repo root (#11124) @bdice
Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
Take iterators by value in clamp.cu. (#11084) @bdice
Performance improvements for row to column conversions (#11075) @hyperbolic2346
Remove deprecated Index methods from Frame (#11073) @vyasr
Use per-page max compressed size estimate for compression (#11066) @devavret
column to row refactor for performance (#11063) @hyperbolic2346
Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
Unpin dask & distributed for development (#11058) @galipremsagar
Add support for Series.between (#11051) @galipremsagar
Fix groupby include (#11046) @bwyogatama
Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Addition & integration of the integer power operator (#11025) @AtlantaPepsi
Refactor lists::contains (#11019) @ttnghia
Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
Clean up parquet unit test (#11005) @PointKernel
Add missing #pragma once to header files (#11004) @karthikeyann
Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
Refactor cudf::contains (#10997) @ttnghia
Remove Arrow CUDA IPC code (#10995) @shwina
Change file extension for groupby benchmark (#10985) @ttnghia
Sort recipe include checks. (#10984) @bdice
Update cuCollections for thrust upgrade (#10983) @PointKernel
Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
Include <optional> for GCC 11 compatibility. (#10927) @bdice
Enable builds with scikit-build (#10919) @vyasr
Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
Improve the capture of fatal cuda error (#10884) @sperlingxx
Cleanup regex compiler operators and operands source (#10879) @davidwendt
Buffer: make .ptr read-only (#10872) @madsbk
Configurable NaN handling in device_row_comparators (#10870) @rwlee
Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
Upgrade to arrow-8 (#10816) @galipremsagar
Remove getattr method in RangeIndex class (#10538) @skirui-source
Adding bins to value counts (#8247) @marlenezw

cudf - v22.08.00

Published by GPUtester about 2 years ago

🚨 Breaking Changes

Remove legacy join APIs (#11274) @vyasr
Remove lists::drop_list_duplicates (#11236) @ttnghia
Remove Index.replace API (#11131) @vyasr
Remove deprecated Index methods from Frame (#11073) @vyasr
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Remove Arrow CUDA IPC code (#10995) @shwina
Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

Fix distributed error related to loop_in_thread (#11428) @galipremsagar
Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
Revert "Allow CuPy 11" (#11409) @jakirkham
Fix moto timeouts (#11369) @galipremsagar
Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
Fix memory_usage() for ListSeries (#11355) @thomcom
Fix constructing Column from column_view with expired mask (#11354) @shwina
Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
Fix issue related to numpy array and category dtype (#11282) @galipremsagar
Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
Return DataFrame when concatenating along axis 1 (#11263) @isVoid
Fix compile error due to missing header (#11257) @ttnghia
Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
Fix tests/rolling/empty_input_test (#11238) @ttnghia
Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
Fix cumulative count index behavior (#11188) @brandon-b-miller
Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
Ensure cuco export set is installed in cmake build (#11147) @jlowe
Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
Fix compile error due to missing header (#11126) @ttnghia
Fix __cuda_array_interface__ failures (#11113) @galipremsagar
Support octal and hex within regex character class pattern (#11112) @davidwendt
Fix split_re matching logic for word boundaries (#11106) @davidwendt
Handle multiple files metadata in read_parquet (#11105) @galipremsagar
Fix index alignment for Series objects with repeated index (#11103) @shwina
FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
Fix regex word boundary logic to include underscore (#11099) @davidwendt
Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
Maintain the input index in the result of a groupby-transform (#11068) @shwina
Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
Fix warn_unused_result error in parquet test (#11026) @karthikeyann
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Fix small error in page row count limiting (#10991) @etseidl
Fix a row index entry error in ORC writer (#10989) @vuule
Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice

📖 Documentation

Fix issues with day & night modes in python docs (#11400) @galipremsagar
Update missing data handling APIs in docs (#11345) @galipremsagar
Add lists filtering APIs to doxygen group. (#11336) @bdice
Remove unused import in README sample (#11318) @vyasr
Note null behavior in where docs (#11276) @brandon-b-miller
Update docstring for spans in get_row_data_range (#11271) @vyasr
Update nvCOMP integration table (#11231) @vuule
Add dev docs for documentation writing (#11217) @vyasr
Documentation fix for concatenate (#11187) @dagardner-nv
Fix unresolved links in markdown (#11173) @karthikeyann
Fix cudf version in README.md install commands (#11164) @jvanstraten
Switch language from None to "en" in docs build (#11133) @galipremsagar
Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
Add docs to rolling var, std, count. (#11035) @bdice
Fix docs for Numba UDFs. (#11020) @bdice
Replace column comparison utilities functions with macros (#11007) @karthikeyann
Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
Fix Doxygen warnings in table header files (#10964) @karthikeyann
Fix Doxygen warnings in column header files (#10963) @karthikeyann
Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
Generate Doxygen Tag File for Libcudf (#10932) @isVoid
Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
Add missing documentation in aggregation.hpp (#10887) @karthikeyann
Revise PR template. (#10774) @bdice

🚀 New Features

Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
Adding byte array view structure (#11322) @hyperbolic2346
Adding byte_array statistics (#11303) @hyperbolic2346
Add column indexes to Parquet writer (#11302) @etseidl
Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
FST benchmark (#11243) @karthikeyann
Adds the Finite-State Transducer algorithm (#11242) @elstehle
Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
Add 24 bit dictionary support to Parquet writer (#11216) @devavret
Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
Add JNI bindings for extractAllRecord (#11196) @anthony-chang
Add cudf.options (#11193) @isVoid
Add thrift support for parquet column and offset indexes (#11178) @etseidl
Adding binary read/write as options for parquet (#11160) @hyperbolic2346
Support nth_element for window functions (#11158) @mythrocks
Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
Implement Groupby pct_change (#11144) @skirui-source
Add JNI for set operations (#11143) @ttnghia
Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
Feature/python benchmarking (#11125) @vyasr
Support nan_equality in cudf::distinct (#11118) @ttnghia
Added JNI for getMapValueForKeys (#11104) @razajafri
Refactor semi_anti_join (#11100) @ttnghia
Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
Adds the Logical Stack algorithm (#11078) @elstehle
Add doxygen-check pre-commit hook (#11076) @karthikeyann
Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
Add Doxygen CI check (#11057) @karthikeyann
Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
Support set operations (#11043) @ttnghia
Support for ZLIB compression in ORC writer (#11036) @vuule
Adding feature swaplevels (#11027) @VamsiTallam95
Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
Function for bfill, ffill #9591 (#11022) @Sreekiran096
Generate group offsets from element labels (#11017) @ttnghia
Feature axes (#10979) @VamsiTallam95
Generate group labels from offsets (#10945) @ttnghia
Add missing cuIO benchmark coverage for duration types (#10933) @vuule
Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
Reindex Improvements (#10815) @brandon-b-miller
Implement value_counts for DataFrame (#10813) @martinfalisse

🛠️ Improvements

Pin dask & distributed for release (#11433) @galipremsagar
Use documented header template for doxygen (#11430) @galipremsagar
Relax arrow version in dev env (#11418) @galipremsagar
Allow CuPy 11 (#11393) @jakirkham
Improve multibyte_split performance (#11347) @cwharris
Switch death test to use explicit trap. (#11326) @vyasr
Add --output-on-failure to ctest args. (#11321) @vyasr
Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
Add JNI support for the join_strings API (#11309) @revans2
Add cupy version to setup.py install_requires (#11306) @vyasr
removing some unused code (#11305) @hyperbolic2346
Add test of wildcard selection (#11300) @vyasr
Update parquet reader to take stream parameter (#11294) @PointKernel
Spark list hashing (#11292) @bdice
Remove legacy join APIs (#11274) @vyasr
Fix cudf recipes syntax (#11273) @ajschmidt8
Fix cudf recipe (#11267) @ajschmidt8
Cleanup config files (#11266) @vyasr
Run mypy on all packages (#11265) @vyasr
Update to isort 5.10.1. (#11262) @vyasr
Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
Remove redundant black config specifications. (#11258) @vyasr
Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
Move rolling impl details to detail/ directory. (#11250) @mythrocks
Remove lists::drop_list_duplicates (#11236) @ttnghia
Use cudf::lists::distinct in Python binding (#11234) @ttnghia
Use cudf::lists::distinct in Java binding (#11233) @ttnghia
Use cudf::distinct in Java binding (#11232) @ttnghia
Pin dask-cuda in dev environment (#11229) @galipremsagar
Remove cruft in map_lookup (#11221) @mythrocks
Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
Remove Frame._index (#11210) @vyasr
Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
Document why Development component is needed for CMake. (#11200) @vyasr
cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
Standardize join internals around DataFrame (#11184) @vyasr
Move character case table declarations from src to detail (#11183) @davidwendt
Remove usage of Frame in StringMethods (#11181) @vyasr
Expose get_json_object_options to Python (#11180) @SrikarVanavasam
Fix decimal128 stats in parquet writer (#11179) @etseidl
Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
Refactor and optimize Frame.where (#11168) @vyasr
Add npos const static member to cudf::string_view (#11166) @davidwendt
Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
Clean up _copy_type_metadata (#11156) @vyasr
Add nvcc conda package in dev environment (#11154) @galipremsagar
Struct binary comparison op functionality for spark rapids (#11153) @rwlee
Refactor inline conditionals. (#11151) @bdice
Refactor Spark hashing tests (#11145) @bdice
Add new _from_data_like_self factory (#11140) @vyasr
Update get_cucollections to use rapids-cmake (#11139) @vyasr
Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
Remove Index.replace API (#11131) @vyasr
Move char-type table function declarations from src to detail (#11127) @davidwendt
Clean up repo root (#11124) @bdice
Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
Take iterators by value in clamp.cu. (#11084) @bdice
Performance improvements for row to column conversions (#11075) @hyperbolic2346
Remove deprecated Index methods from Frame (#11073) @vyasr
Use per-page max compressed size estimate for compression (#11066) @devavret
column to row refactor for performance (#11063) @hyperbolic2346
Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
Unpin dask & distributed for development (#11058) @galipremsagar
Add support for Series.between (#11051) @galipremsagar
Fix groupby include (#11046) @bwyogatama
Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Addition & integration of the integer power operator (#11025) @AtlantaPepsi
Refactor lists::contains (#11019) @ttnghia
Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
Clean up parquet unit test (#11005) @PointKernel
Add missing #pragma once to header files (#11004) @karthikeyann
Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
Refactor cudf::contains (#10997) @ttnghia
Remove Arrow CUDA IPC code (#10995) @shwina
Change file extension for groupby benchmark (#10985) @ttnghia
Sort recipe include checks. (#10984) @bdice
Update cuCollections for thrust upgrade (#10983) @PointKernel
Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
Include <optional> for GCC 11 compatibility. (#10927) @bdice
Enable builds with scikit-build (#10919) @vyasr
Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
Improve the capture of fatal cuda error (#10884) @sperlingxx
Cleanup regex compiler operators and operands source (#10879) @davidwendt
Buffer: make .ptr read-only (#10872) @madsbk
Configurable NaN handling in device_row_comparators (#10870) @rwlee
Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
Upgrade to arrow-8 (#10816) @galipremsagar
Remove getattr method in RangeIndex class (#10538) @skirui-source
Adding bins to value counts (#8247) @marlenezw

cudf - v22.06.01

Published by GPUtester over 2 years ago

v22.06.01

cudf - v22.06.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
Rename sliced_child to get_sliced_child. (#10885) @bdice
Add parameters to control page size in Parquet writer (#10882) @etseidl
Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
Generic serialization of all column types (#10784) @wence-
Return per-file metadata from readers (#10782) @vuule
HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
Update groupby::hash to use new row operators for keys (#10770) @PointKernel
update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
Add default= kwarg to .list.get() accessor method (#10547) @shwina
Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
Fix findall_record to return empty list for no matches (#10491) @davidwendt
Namespace/Docstring Fixes for Reduction (#10471) @isVoid
Additional refactoring of hash functions (#10462) @bdice
Fix default value of str.split expand parameter. (#10457) @bdice
Remove deprecated code. (#10450) @vyasr

🐛 Bug Fixes

Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar
Make SerializedTableHeader(numRows) public (#10949) @gerashegalov
Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar
Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar
Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt
Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca
Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel
Fix constness / references in weak ordering operator() signatures. (#10846) @bdice
Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard
Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca
Fix compile warning in search.cu (#10827) @davidwendt
Fix element access const correctness in hostdevice_vector (#10804) @vuule
Update cuco git tag (#10788) @PointKernel
HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346
Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar
Fix construction of nested structs with EMPTY child (#10761) @shwina
Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt
Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec
update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
Fix cupy function in notebook (#10737) @ajschmidt8
Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar
Fix scatter for all-empty-string column case (#10724) @davidwendt
Retain series name in Series.apply (#10716) @brandon-b-miller
Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard
Fix list of testing requirements in setup.py. (#10678) @bdice
Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt
cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard
Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt
Verify compression type in Parquet reader (#10610) @vuule
Fix struct row comparator's exception on empty structs (#10604) @sperlingxx
Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt
Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333
Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice
Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe
pin more cmake versions (#10570) @robertmaynard
Re-enable Build Metrics Report (#10562) @davidwendt
Remove statically linked CUDA runtime check in Java build (#10532) @jlowe
Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller
Update pre-commit to run black 22.3.0 (#10523) @vyasr
Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
Fix findall_record to return empty list for no matches (#10491) @davidwendt
Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule
Fix default value of str.split expand parameter. (#10457) @bdice
Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca
Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca
Fix in-place updates with loc or iloc when the LHS has more than one column (#9918) @skirui-source

📖 Documentation

Clarify append deprecation notice. (#10930) @bdice
Use full name of GPUDirect Storage SDK in docs (#10904) @vuule
Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque
Add missing documentation in cudf/types.hpp (#10895) @karthikeyann
Add strong index iterator docs. (#10888) @bdice
spell check fixes (#10865) @karthikeyann
Add missing documentation in scalar/ headers (#10861) @karthikeyann
Remove typo in ngram documentation (#10859) @miguelusque
fix doxygen warnings (#10842) @karthikeyann
Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr
Add NumPy to intersphinx references. (#10809) @bdice
Add a section to the docs that compares cuDF with Pandas (#10796) @shwina
Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt
Enable pydocstyle for all packages. (#10759) @bdice
Enable pydocstyle rules involving quotes (#10748) @vyasr
Revise 10 minutes notebook. (#10738) @bdice
Reorganize cuDF Python docs (#10691) @shwina
Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller
Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty
add data generation to benchmark documentation (#10677) @karthikeyann
Fix some docs build warnings (#10674) @galipremsagar
Update UDF notebook in User Guide. (#10668) @bdice
Improve User Guide docs (#10663) @bdice
Fix some docstrings formatting (#10660) @galipremsagar
Remove implementation details from apply docstrings (#10651) @brandon-b-miller
Revise CONTRIBUTING.md (#10644) @bdice
Add missing APIs to documentation. (#10643) @bdice
Use cudf.read_json as documented API name. (#10640) @bdice
Fix docstring section headings. (#10639) @bdice
Document cudf.read_text and cudf.read_avro. (#10638) @bdice
Fix typo in docstring for json_reader_options (#10627) @dagardner-nv
Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller
Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt
Add Replace Backreferences section to Regex Features page (#10560) @davidwendt
Introduce deprecation policy to developer guide. (#10252) @vyasr

🚀 New Features

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec
Strong index types for equality comparator (#10883) @ttnghia
Add parameters to control page size in Parquet writer (#10882) @etseidl
Support for Zstandard decompression in ORC reader (#10873) @vuule
Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard
Support for Zstandard decompression in Parquet reader (#10847) @vuule
Add JNI support for apply_boolean_mask (#10812) @res-life
Segmented Min/Max for Fixed Point Types (#10794) @isVoid
Return per-file metadata from readers (#10782) @vuule
Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks
Update groupby::hash to use new row operators for keys (#10770) @PointKernel
Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks
Add detail::hash_join (#10695) @PointKernel
Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346
Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina
JNI: Add generateListOffsets API (#10683) @sperlingxx
Support args in groupby apply (#10682) @brandon-b-miller
Enable segmented_gather in Java package (#10669) @sperlingxx
Add row hasher with nested column support (#10641) @devavret
Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse
First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346
Add support for struct columns to the random table generator (#10566) @vuule
Enable passing a sequence for the index argument to .list.get() (#10564) @shwina
Add python bindings for cudf::list::index_of (#10549) @ChrisJar
Add default= kwarg to .list.get() accessor method (#10547) @shwina
Add cudf.DataFrame.applymap (#10542) @brandon-b-miller
Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
Add column field ID control in parquet writer (#10504) @PointKernel
Deprecate Series.applymap (#10497) @brandon-b-miller
Add option to drop cache in cuIO benchmarks (#10488) @vuule
Move benchmark input generation to device in reduction nvbench (#10486) @karthikeyann
Support Segmented Min/Max Reduction on String Type (#10447) @isVoid
List element Equality comparator (#10289) @devavret
Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann
Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

🛠️ Improvements

Use conda compilers in env file (#10915) @galipremsagar
Remove C style artifacts in cuIO (#10886) @vuule
Rename sliced_child to get_sliced_child. (#10885) @bdice
Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333
Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia
Changing list_view.cuh to list_view.hpp (#10854) @ttnghia
More error checking in from_dlpack (#10850) @wence-
Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina
Add missing cuda-python dependency to cudf (#10833) @bdice
Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt
Split up search.cu to improve compile time (#10831) @davidwendt
Add tests for null scalar binaryops (#10828) @brandon-b-miller
Cleanup regex compile optimize functions (#10825) @davidwendt
Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar
Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller
Refactor regex builtin character-class identifiers (#10814) @davidwendt
Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt
Make the JNI API to get list offsets as a view public. (#10807) @revans2
Add cudf JNI docker build github action (#10806) @pxLi
Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi
Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
Handle closed property in IntervalDtype.from_pandas (#10798) @wence-
Return weak orderings from device_row_comparator. (#10793) @rwlee
Rework Scalar imports (#10791) @brandon-b-miller
Enable ccache for cudfjni build in Docker (#10790) @gerashegalov
Generic serialization of all column types (#10784) @wence-
simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346
Use column_views instead of column_device_views in binary operations. (#10780) @bdice
Add struct utility functions. (#10776) @bdice
Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt
Refactor host decompression in ORC reader (#10764) @vuule
Flush output streams before creating a process to drop caches (#10762) @vuule
Refactor binaryop/compiled/util.cpp (#10756) @bdice
Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt
Use generator expressions in any/all functions. (#10736) @bdice
Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice
Improve use of isinstance. (#10734) @bdice
Rename tests from multiIndex to multiindex. (#10732) @bdice
Two-table comparators with strong index types (#10730) @bdice
Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann
Use structured bindings instead of std::tie (#10726) @karthikeyann
Missing f prefix on f-strings fix (#10721) @code-review-doctor
Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar
Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid
Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora
Remove or simplify various utility functions (#10705) @vyasr
Allow building arrow with parquet and not python (#10702) @revans2
Partial cuIO GPU decompression refactor (#10699) @vuule
Cython API refactor: merge.pyx (#10698) @isVoid
Fix random string data length to become variable (#10697) @galipremsagar
Add bindings for index_of with column search key (#10696) @ChrisJar
Deprecate index merging (#10689) @vyasr
Remove cudf::strings::string namespace (#10684) @davidwendt
Standardize imports. (#10680) @bdice
Standardize usage of collections.abc. (#10679) @bdice
Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid
Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt
Split up mixed-join kernels source files (#10671) @davidwendt
Use std::filesystem for temporary directory location and deletion (#10664) @vuule
cleanup benchmark includes (#10661) @karthikeyann
Use upstream clang-format pre-commit hook. (#10659) @bdice
Clean up C++ includes to use <> instead of "". (#10658) @bdice
Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina
Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe
Use conda to build python packages during GPU tests (#10648) @Ethyling
Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr
Update pinning to allow newer CMake versions. (#10646) @vyasr
Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]
Remove concurrent_unordered_multimap. (#10642) @bdice
Improve parquet dictionary encoding (#10635) @PointKernel
Improve cudf::cuda_error (#10630) @sperlingxx
Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711
Branch 22.06 merge 22.04 (#10624) @vyasr
Unpin dask & distributed for development (#10623) @galipremsagar
Slightly improve accuracy of stod in to_floats (#10622) @davidwendt
Allow libcudfjni to be built as a static library (#10619) @jlowe
Change stack-based regex state data to use global memory (#10600) @davidwendt
Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar
KvikIO as an alternative GDS backend (#10593) @madsbk
Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
Refactor binary ops for timedelta and datetime columns (#10581) @vyasr
Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt
Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk
Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov
Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt
Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
Cleanup libcudf strings regex classes (#10573) @davidwendt
Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr
Reduce kernel calls to build strings findall results (#10559) @davidwendt
Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice
Update strings contains benchmark to measure varying match rates (#10555) @davidwendt
JNI: throw CUDA errors more specifically (#10551) @sperlingxx
Enable building static libs (#10545) @trxcllnt
Remove pip requirements files. (#10543) @bdice
Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr
Refactor memory_usage to improve performance (#10537) @galipremsagar
Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx
Add accidentally removed comment. (#10526) @vyasr
Update conda environment. (#10525) @vyasr
Remove ColumnBase.getitem (#10516) @vyasr
Optimize left_semi_join by materializing the gather mask (#10511) @cheinger
Define proper binary operation APIs for columns (#10509) @vyasr
Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar
Update to Thrust 1.16 (#10489) @bdice
Namespace/Docstring Fixes for Reduction (#10471) @isVoid
Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi
Use Lists of Columns for Various Files (#10463) @isVoid
Additional refactoring of hash functions (#10462) @bdice
Fix Series.str.findall behavior for expand=False. (#10459) @bdice
Remove deprecated code. (#10450) @vyasr
Update cmake-format version. (#10440) @vyasr
Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8
Use conda compilers (#10275) @Ethyling
Add row bitmask as a detail::hash_join member (#10248) @PointKernel

cudf - v22.04.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
Refactor stream compaction APIs (#10370) @PointKernel
Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
Rewrites sample API (#10262) @isVoid
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
Remove deprecated code (#10124) @vyasr
Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
Optimize compaction operations (#10030) @PointKernel
Remove deprecated method Series.set_index. (#9945) @bdice
Add cudf::strings::findall_record API (#9911) @davidwendt
Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
Fix for integer overflow in contiguous-split (#10437) @jbrennan333
Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
Fix empty reduce with List output and non-List input (#10435) @sperlingxx
Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
Support cupy array in quantile input (#10429) @galipremsagar
Fix benchmarks to work with new aggregation types (#10428) @davidwendt
Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
Limiting async allocator using alignment of 512 (#10395) @rongou
Include <optional> in multibyte split. (#10385) @bdice
Fix issue with column and scalar re-assignment (#10377) @galipremsagar
Fix floating point data generation in benchmarks (#10372) @vuule
Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
Remove is_relationally_comparable for table device views (#10342) @davidwendt
Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
Fix documentation issues (#10307) @ajschmidt8
Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
Fix incorrect slicing of GDS read/write calls (#10274) @vuule
Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
Fix small leak in explode (#10245) @revans2
Yet another small JNI memory leak (#10238) @revans2
Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
Fix JNI leak on copy to device (#10229) @revans2
Fix the data generator element size for decimal types (#10225) @vuule
Fix decimal metadata in parquet writer (#10224) @galipremsagar
Fix strings handling of hex in regex pattern (#10220) @davidwendt
Fix docs builds (#10216) @ajschmidt8
Fix a leftover _has_nulls change from Nullate (#10211) @devavret
Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
benchmark fixture - static object pointer fix (#10145) @karthikeyann
Fix UDF Caching (#10133) @brandon-b-miller
Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
Encode values from python callback for C++ (#10103) @jdye64
Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
Column equality testing fixes (#10011) @brandon-b-miller
Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
Add cut to API docs (#10479) @shwina
Remove documentation for methods removed in #10124. (#10366) @bdice
Fix documentation issues (#10306) @ajschmidt8
Fix fixed_point binary operation documentation (#10198) @codereport
Remove cleaned up methods from docs (#10189) @galipremsagar
Update developer guide to recommend no default stream parameter. (#10136) @bdice
Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

Add StringIO support to read_text (#10465) @cwharris
Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
JNI support for Collect Ops in Reduction (#10427) @sperlingxx
Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
Add cudf::stable_sort_by_key (#10387) @PointKernel
Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
Support Java bindings for Avro reader (#10373) @HaoYang670
Refactor stream compaction APIs (#10370) @PointKernel
Support collect aggregations in reduction (#10353) @sperlingxx
Refactor __array_ufunc__ for Index and unify across all classes (#10346) @vyasr
Add JNI for extract_list_element with index column (#10341) @firestarman
Support min and max operations for structs in rolling window (#10332) @ttnghia
Add device create_sequence_table for benchmarks (#10300) @karthikeyann
Enable numpy ufuncs for DataFrame (#10287) @vyasr
move input generation for json benchmark to device (#10281) @karthikeyann
move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
move input generation for copy benchmark to device (#10279) @karthikeyann
generate url decode benchmark input in device (#10278) @karthikeyann
device input generation in join bench (#10277) @karthikeyann
Add nvtext::byte_pair_encoding API (#10270) @davidwendt
Prevent internal usage of expensive APIs (#10263) @vyasr
Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
Support percent_rank() aggregation (#10227) @mythrocks
Refactor Series.__array_ufunc__ (#10217) @vyasr
Reduce pytest runtime (#10203) @brandon-b-miller
Add regex flags parameter to python cudf strings split (#10185) @davidwendt
Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
Add file size counter to cuIO benchmarks (#10154) @vuule
byte_range support for multibyte_split/read_text (#10150) @cwharris
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Add maxSplit parameter to Java binding for strings::split (#10137) @ttnghia
Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
generate benchmark input in device (#10109) @karthikeyann
Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
Add Dataframe and Index nunique (#10077) @martinfalisse
Support nanosecond timestamps in parquet (#10063) @PointKernel
Java bindings for mixed semi and anti joins (#10040) @jlowe
Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
Optimize compaction operations (#10030) @PointKernel
Support args= in Series.apply (#9982) @brandon-b-miller
Add cudf::strings::findall_record API (#9911) @davidwendt
Add covariance for sort groupby (python) (#9889) @mayankanand007
Implement DataFrame diff() (#9817) @skirui-source
Implement DataFrame pct_change (#9805) @skirui-source
Support segmented reductions and null mask reductions (#9621) @isVoid
Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

🛠️ Improvements

Add scipy skip for a test (#10502) @galipremsagar
Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
Pin dask and distributed (#10481) @galipremsagar
MD5 refactoring. (#10445) @bdice
Remove or split up Frame methods that use the index (#10439) @vyasr
Centralization of tdigest aggregation code. (#10422) @nvdbaranec
Simplify column binary operations (#10421) @vyasr
Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
Use list of columns for methods in Groupby.pyx (#10419) @isVoid
Remove warnings in test_timedelta.py (#10418) @galipremsagar
Fix some warnings in test_parquet.py (#10416) @galipremsagar
JNI support for segmented reduce (#10413) @revans2
Clean up null mask after purging null entries (#10412) @sperlingxx
Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
Use str instead of builtins.str. (#10410) @bdice
Fix warnings in test_rolling (#10405) @bdice
Enable codecov github-check in CI (#10404) @galipremsagar
Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice
Set column names in _from_columns_like_self factory (#10400) @isVoid
Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar
Consolidate .cov and .corr for sort groupby (#10386) @skirui-source
Consolidate some Frame APIs (#10381) @vyasr
Refactor hash functions and hash_combine (#10379) @bdice
Add nvtx annotations for Series and Index (#10374) @galipremsagar
Refactor filling.repeat API (#10371) @isVoid
Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt
Remove doc for deprecated function one_hot_encoding (#10367) @isVoid
Refactor array function (#10364) @vyasr
Fix warnings in test_csv.py. (#10362) @bdice
Implement a mixin for binops (#10360) @vyasr
Refactor cython interface: copying.pyx (#10359) @isVoid
Implement a mixin for scans (#10358) @vyasr
Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
Add cleanup of python artifacts (#10355) @galipremsagar
Fix warnings in test_categorical.py. (#10354) @bdice
Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt
Fix codecov in CI (#10347) @galipremsagar
Enable caching for memory_usage calculation in Column (#10345) @galipremsagar
C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann
JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx
multibyte_split test improvements (#10328) @vuule
Fix warnings in test_binops.py. (#10327) @bdice
Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice
Update upload script (#10321) @ajschmidt8
Move hash type declarations to hashing.hpp (#10320) @davidwendt
C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann
Remove internal columns usage (#10315) @vyasr
Remove extraneous build.sh parameter (#10313) @ajschmidt8
Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt
Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8
Add conversions between column_view and device_span<T const>. (#10302) @bdice
Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar
Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard
Add copyright check as pre-commit hook. (#10290) @vyasr
DataFrame insert and creation optimizations (#10285) @galipremsagar
Improve hash join detail functions (#10273) @PointKernel
Replace custom cached_property implementation with functools (#10272) @shwina
Rewrites sample API (#10262) @isVoid
Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]
Remove redundant copies across the code base (#10257) @galipremsagar
Add more nvtx annotations (#10256) @galipremsagar
Add copyright check in cudf (#10253) @galipremsagar
Remove redundant copies in fillna to improve performance (#10241) @galipremsagar
Remove std::numeric_limit specializations for timestamp & durations (#10239) @codereport
Optimize DataFrame creation across code-base (#10236) @galipremsagar
Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar
Add environment variables for I/O thread pool and slice sizes (#10218) @vuule
Add regex flags to strings findall functions (#10208) @davidwendt
Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca
Remove unnecessary nunique function in Series. (#10205) @martinfalisse
Refactor DataFrame tests. (#10204) @bdice
Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid
Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe
Fix docstrings alignment in Frame methods (#10199) @galipremsagar
Fix cuco pair issue in hash join (#10195) @PointKernel
Replace dask groupby .index usages with .by (#10193) @galipremsagar
Add regex flags to strings extract function (#10192) @davidwendt
Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice
Add CMake install rule for tests (#10190) @ajschmidt8
Unpin dask & distributed (#10182) @galipremsagar
Add comments to explain test validation (#10176) @galipremsagar
Reduce warnings in pytest output (#10168) @bdice
Some consolidation of indexed frame methods (#10167) @vyasr
Refactor isin implementations (#10165) @vyasr
Faster struct row comparator (#10164) @devavret
Refactor groupby::get_groups. (#10161) @bdice
Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule
Replace ccache with sccache (#10146) @ajschmidt8
Murmur3 hash kernel cleanup (#10143) @rwlee
Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar
Run pyupgrade 2.31.0. (#10141) @bdice
Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice
Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
Update cmake-format script for branch 22.04. (#10132) @bdice
Accept r-value references in convert_table_for_return() (#10131) @mythrocks
Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
Remove deprecated code (#10124) @vyasr
Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
Remove benchmarks suffix (#10112) @bdice
Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi
Remove unnecessary docker files. (#10069) @vyasr
Limit benchmark iterations using environment variable (#10060) @karthikeyann
Add timing chart for libcudf build metrics report page (#10038) @davidwendt
JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx
Reduce redundant code in CUDF JNI (#10019) @mythrocks
Make snappy decompress check more efficient (#9995) @cheinger
Remove deprecated method Series.set_index. (#9945) @bdice
Implement a mixin for reductions (#9925) @vyasr
JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx
Add assert_column_memory_* (#9882) @isVoid
Add CUDF_UNREACHABLE macro. (#9727) @bdice
Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

cudf - v22.02.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

ORC writer API changes for granular statistics (#10058) @mythrocks
decimal128 Support for to/from_arrow (#9986) @codereport
Remove deprecated method one_hot_encoding (#9977) @isVoid
Remove str.subword_tokenize (#9968) @VibhuJawa
Remove deprecated method parameter from merge and join. (#9944) @bdice
Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
Remove deprecated method Series.hash_encode. (#9942) @bdice
Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Add partitioning support in parquet writer (#9810) @devavret
Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Add parameters to control row group size in Parquet writer (#9677) @vuule
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
Add support for decimal128 in cudf python (#9533) @galipremsagar
Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

Add check for negative stripe index in ORC reader (#10074) @vuule
Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
fix gcc 11 compilation errors (#10067) @rongou
Fix columns ordering issue in parquet reader (#10066) @galipremsagar
Fix dataframe setitem with ndarray types (#10056) @galipremsagar
Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
Include <optional> in headers that use std::optional (#10044) @robertmaynard
Fix repr and concat of StructColumn (#10042) @galipremsagar
Include row group level stats when writing ORC files (#10041) @vuule
build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
Fix memory leaks in JNI native code. (#10029) @mythrocks
Update JNI to use new arena mr constructor (#10027) @rongou
Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
Add missing list filling header in meta.yaml (#10007) @devavret
Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
Fix octal pattern matching in regex string (#9993) @davidwendt
decimal128 Support for to/from_arrow (#9986) @codereport
Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
Fix the overflow problem of decimal rescale (#9966) @sperlingxx
Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
Fix cudf java build error. (#9958) @firestarman
Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
Resolve racecheck errors in ORC kernels (#9916) @vuule
Fix the java build after parquet partitioning support (#9908) @revans2
Fix compilation of benchmark for parquet writer. (#9905) @bdice
Fix a memcheck error in ORC writer (#9896) @vuule
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
Add one-level list encoding support in parquet reader (#9848) @PointKernel
Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
Fix caching in Series.applymap (#9821) @brandon-b-miller
Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
Fix ORC writer crash with empty input columns (#9808) @vuule
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
Fix missing streams (#9767) @karthikeyann
Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
Update cmake and conda to 22.02 (#9746) @devavret
Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
Fixed build by adding more checks for int8, int16 (#9707) @razajafri
Fix null handling when boolean dtype is passed (#9691) @galipremsagar
Fix stream usage in segmented_gather() (#9679) @mythrocks

📖 Documentation

Update decimal dtypes related docs entries (#10072) @galipremsagar
Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
Fix cudf compilation instructions. (#9956) @esoha-nvidia
Fix see also links for IO APIs (#9895) @galipremsagar
Fix build instructions for libcudf doxygen (#9837) @davidwendt
Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
update cuda version in local build (#9736) @karthikeyann
Fix doxygen for enum types in libcudf (#9724) @davidwendt
Spell check fixes (#9682) @karthikeyann
Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
Allow CuPy 10 (#10048) @jakirkham
Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
Add groupby.transform (only support for aggregations) (#10005) @shwina
Add partitioning support to Parquet chunked writer (#10000) @devavret
Add jni for sequences (#9972) @wbo4958
Java bindings for mixed left, inner, and full joins (#9941) @jlowe
Java bindings for JSON reader support (#9940) @wbo4958
Enable transpose for string columns in cudf python (#9937) @galipremsagar
Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
Implement mixed equality/conditional joins (#9917) @vyasr
Add cudf::strings::extract_all API (#9909) @davidwendt
Implement JNI for cudf::scatter APIs (#9903) @ttnghia
JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Add add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
Add JNI for cudf::drop_duplicates (#9841) @ttnghia
Implement per-list sequence (#9839) @ttnghia
adding series.transpose (#9835) @mayankanand007
Adding support for Series.autocorr (#9833) @mayankanand007
Support round operation on datetime64 datatypes (#9820) @mayankanand007
Add partitioning support in parquet writer (#9810) @devavret
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Optimize groupby::scan (#9754) @PointKernel
Add sample JNI API (#9728) @res-life
Support min and max in inclusive scan for structs (#9725) @ttnghia
Add first and last method to IndexedFrame (#9710) @isVoid
Support min and max reduction for structs (#9697) @ttnghia
Add parameters to control row group size in Parquet writer (#9677) @vuule
Run compute-sanitizer in nightly build (#9641) @karthikeyann
Implement Series.datetime.floor (#9571) @skirui-source
ceil/floor for DatetimeIndex (#9554) @mayankanand007
Add support for decimal128 in cudf python (#9533) @galipremsagar
Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64
Add Pearson correlation for sort groupby (python) (#9166) @skirui-source
Interchange dataframe protocol (#9071) @iskode
Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🛠️ Improvements

Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
ORC writer API changes for granular statistics (#10058) @mythrocks
Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
Unpin dask and distributed in CI (#10028) @galipremsagar
Add _from_column_like_self factory (#10022) @isVoid
Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
Clean up CUDA stream use in cuIO (#9991) @vuule
Use address-ordered first fit for the pinned memory pool (#9989) @rongou
Add strings tests to transpose_test.cpp (#9985) @davidwendt
Use gpuci_mamba_retry on Java CI. (#9983) @bdice
Remove deprecated method one_hot_encoding (#9977) @isVoid
Minor cleanup of unused Python functions (#9974) @vyasr
Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
Remove str.subword_tokenize (#9968) @VibhuJawa
Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
Remove deprecated method parameter from merge and join. (#9944) @bdice
Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
Remove deprecated method Series.hash_encode. (#9942) @bdice
use ninja in java ci build (#9933) @rongou
Add build-time publish step to cpu build script (#9927) @davidwendt
Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
Remove various unused functions (#9922) @vyasr
Raise in query if dtype is not supported (#9921) @brandon-b-miller
Add missing imports tests (#9920) @Ethyling
Spark Decimal128 hashing (#9919) @rwlee
Replace thrust/std::get with structured bindings (#9915) @codereport
Upgrade thrust version to 1.15 (#9912) @robertmaynard
Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
Return count of set bits from inplace_bitmask_and. (#9904) @bdice
Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
Update ucx-py version on release using rvc (#9897) @Ethyling
Remove IncludeCategories from .clang-format (#9876) @codereport
Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
Add clang-tidy to libcudf (#9860) @codereport
Remove deprecated methods from Java Table class (#9853) @jlowe
Add test for map column metadata handling in ORC writer (#9852) @vuule
Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
add templated benchmark with fixture (#9838) @karthikeyann
Use list of column inputs for apply_boolean_mask (#9832) @isVoid
Added a few more tests for Decimal to String cast (#9818) @razajafri
Run doctests. (#9815) @bdice
Avoid overflow for fixed_point round (#9809) @sperlingxx
Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
Use vector factories for host-device copies. (#9806) @bdice
Refactor host device macros (#9797) @vyasr
Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
Improve build time of libcudf iterator tests (#9788) @davidwendt
Copy Java native dependencies directly into classpath (#9787) @jlowe
Add decimal types to cuIO benchmarks (#9776) @vuule
Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
Use CTAD with Thrust function objects (#9768) @codereport
Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
Use Java classloader to find test resources (#9760) @jlowe
Allow cast decimal128 to string and add tests (#9756) @razajafri
Load balance optimization for contiguous_split (#9755) @nvdbaranec
Consolidate and improve reset_index (#9750) @isVoid
Update to UCX-Py 0.24 (#9748) @pentschev
Skip cufile tests in JNI build script (#9744) @pxLi
Enable string to decimal128 cast (#9742) @razajafri
Use stop instead of stop_. (#9735) @bdice
Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
Improve cmake format script (#9723) @vyasr
Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
Use stream allocator adaptor for hash join table (#9704) @PointKernel
Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
Add utility to format ninja-log build times (#9631) @davidwendt
Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
Simplify merge internals and reduce overhead (#9516) @vyasr
Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris

cudf - v21.12.02

Published by GPUtester almost 3 years ago

v21.12.02

cudf - v21.12.01

Published by GPUtester almost 3 years ago

v21.12.01

cudf - v21.12.00

Published by GPUtester almost 3 years ago

🚨 Breaking Changes

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Various internal MultiIndex improvements (#9243) @vyasr

🐛 Bug Fixes

Fix read_parquet bug for bytes input (#9669) @rjzamora
Use _gather internal for sort_* (#9668) @isVoid
Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
Don't recompute output size if it is already available (#9649) @abellina
Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
Fix debrotli issue on CUDA 11.5 (#9632) @vuule
Use std::size_t when computing join output size (#9626) @jlowe
Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
compile libnvcomp with PTDS if requested (#9540) @jbrennan333
Fix segmented_gather() for null LIST rows (#9537) @mythrocks
Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
Fix regex handling of embedded null characters (#9470) @davidwendt
Fix memcheck error in copy-if-else (#9467) @davidwendt
Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
Preserve the decimal scale when creating a default scalar (#9449) @revans2
Push down parent nulls when flattening nested columns. (#9443) @mythrocks
Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
Allow int-like objects for the decimals argument in round (#9428) @shwina
Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
Fix the crash in stats code (#9368) @devavret
Make Series.hash_encode results reproducible. (#9366) @bdice
Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
Use f-string in join helper warning message. (#9325) @bdice
Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
Fix null count in statistics for parquet (#9303) @devavret
Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
Update docstring of DataFrame.merge (#9572) @galipremsagar
Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
Add example to docstrings in rolling.apply (#9522) @isVoid
Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
Improve Python docstring formatting. (#9493) @bdice
Update table of I/O supported types (#9476) @vuule
Document invalid regex patterns as undefined behavior (#9473) @davidwendt
Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
Fix many documentation errors in libcudf. (#9355) @karthikeyann
Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
Improved deprecation warnings. (#9347) @bdice
doc reorder mr, stream to stream, mr (#9308) @karthikeyann
Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

Enable Series.divide and DataFrame.divide (#9630) @vyasr
Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
Support args= in apply (#9514) @brandon-b-miller
Add groupby scan min/max support for strings values (#9502) @davidwendt
Add list output option to character_ngrams() function (#9499) @davidwendt
More granular column selection in ORC reader (#9496) @vuule
add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
Implement Series.datetime.floor (#9488) @skirui-source
Enable linting of CMake files using pre-commit (#9484) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Augment order_by to Accept a List of null_precedence (#9455) @isVoid
Add format API for list column of strings (#9454) @davidwendt
Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
Add cudf python groupby.diff (#9446) @karthikeyann
Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
Support Unary Operations in Masked UDF (#9409) @isVoid
Move Several Series Function to Frame (#9394) @isVoid
MD5 Python hash API (#9390) @bdice
Add cudf strings is_title API (#9380) @davidwendt
Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
Add support for writing ORC with map columns (#9369) @vuule
extract_list_elements() with column_view indices (#9367) @mythrocks
Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
New array conversion methods (#9236) @vyasr
Series apply method backed by masked UDFs (#9217) @brandon-b-miller
Grouping by frequency and resampling (#9178) @shwina
Pure-python masked UDFs (#9174) @brandon-b-miller
Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

Followup to PR 9088 comments (#9659) @cwharris
Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
Add 11.5 dev.yml to cudf (#9617) @galipremsagar
Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
remove deprecated Rmm.initialize method (#9607) @rongou
Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx
Set RMM pool to a fixed size in JNI (#9583) @rongou
Use nvCOMP for Snappy compression/decompression (#9582) @vuule
Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling
Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia
Enable CMake format in CI and fix style (#9570) @vyasr
Add NVTX Start/End Ranges to JNI (#9563) @abellina
Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64
Add offsets_begin/end() to strings_column_view (#9559) @davidwendt
remove alignment options for RMM jni (#9550) @rongou
Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina
Generalize comparison binary operations (#9542) @vyasr
Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe
Add scan sum support for duration types to libcudf (#9536) @davidwendt
Force inlining to improve AST performance (#9530) @vyasr
Generalize some more indexed frame methods (#9529) @vyasr
Add Java bindings for rolling window stddev aggregation (#9527) @razajafri
catch rmm::out_of_memory exceptions in jni (#9525) @rongou
Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia
Accelerate conditional inner joins with larger right tables (#9523) @vyasr
Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar
Cleanup for flattening nested columns (#9509) @rwlee
Enable running tests using RMM arena and async memory resources (#9506) @rongou
Remove dependency on six. (#9495) @bdice
Cleanup some libcudf strings gtests (#9489) @davidwendt
Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice
Deprecate Series.hash_encode. (#9457) @bdice
Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8
Small clean up to simplify column selection code in ORC reader (#9444) @vuule
add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann
Adds Deprecation Warnings to one_hot_encoding and Implement get_dummies with Cython API (#9435) @isVoid
Update pre-commit hook URLs. (#9433) @bdice
Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca
Miscellaneous improvements for UDFs (#9422) @isVoid
Use pre-commit for CI (#9412) @vyasr
Update to UCX-Py 0.23 (#9407) @pentschev
Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina
Improvements to tdigest aggregation code. (#9403) @nvdbaranec
Add Java API to deserialize a table to host columns (#9402) @jlowe
Frame copy to use class instead of type() (#9397) @madsbk
Change all DeprecationWarnings to FutureWarning. (#9392) @bdice
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr
Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora
Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora
Add multi-threaded writing to GDS writes (#9372) @devavret
Miscellaneous column cleanup (#9370) @vyasr
Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt
Consolidate binary ops into Frame (#9357) @isVoid
Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt
Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice
Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid
Fix Cython compilation warnings. (#9327) @bdice
Fix some unused variable warnings in libcudf (#9326) @davidwendt
Use optional-iterator for copy-if-else kernel (#9324) @davidwendt
Remove Table class (#9315) @vyasr
Unpin dask and distributed in CI (#9307) @galipremsagar
Add optional-iterator support to indexalator (#9306) @davidwendt
Consolidate more methods in Frame (#9305) @vyasr
Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora
Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice
Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt
Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt
Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi
Various internal MultiIndex improvements (#9243) @vyasr
Add detail interface for split and slice(table_view), refactor both functions with host_span (#9226) @isVoid
Refactor MD5 implementation. (#9212) @bdice
Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann
Use nvcomp's snappy decompressor in avro reader (#9181) @devavret
Add isocalendar API support (#9169) @marlenezw
Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris
Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris
Refactor hash join with cuCollections multimap (#8934) @PointKernel

cudf - v21.10.01

Published by GPUtester about 3 years ago

v21.10.01

cudf - v21.10.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

Remove Cython APIs for table view generation (#9199) @vyasr
Upgrade pandas version in cudf (#9147) @galipremsagar
Make AST operators nullable (#9096) @vyasr
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
Add support for struct type in ORC writer (#9025) @vuule
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Move compute_column API out of ast namespace (#8957) @vyasr
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add nested column selection to parquet reader (#8933) @devavret
JNI Aggregation Type Changes (#8919) @revans2
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

fixed_point cudf::groupby for mean aggregation (#9296) @codereport
Fix interleave_columns when the input string lists column has an empty child column (#9292) @ttnghia
Update nvcomp to include fixes for installation of headers (#9276) @devavret
Fix Java column leak in testParquetWriteMap (#9271) @jlowe
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
Dataframe.sort_index optimizations (#9238) @galipremsagar
Temporarily disabling problematic test in parquet writer (#9230) @devavret
Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
Fix gather for sliced input structs column (#9218) @ttnghia
Fix JNI code for left semi and anti joins (#9207) @jlowe
Only install thrust when using a non 'system' version (#9206) @robertmaynard
Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
Approximate overflow detection in ORC statistics (#9163) @vuule
Use decimal precision metadata when reading from parquet files (#9162) @shwina
Fix variable name in Java build script (#9161) @jlowe
Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
Fix conditional joins with empty left table (#9146) @vyasr
Fix joining on indexes with duplicate level names (#9137) @shwina
Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
Apply type metadata after column is slice-copied (#9131) @isVoid
Fix a bug: inner_join_size returns zero if build table is empty (#9128) @PointKernel
Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
Support null literals in expressions (#9117) @vyasr
Fix cudf::hash_join output size for struct joins (#9107) @jlowe
Import fix (#9104) @shwina
Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
Preserve float16 upscaling (#9069) @galipremsagar
Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
Various multiindex related fixes (#9036) @shwina
Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
Fix ORC reading of files with struct columns that have null values (#9005) @vuule
Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
Fix debug compile error for csv_test.cpp (#8981) @davidwendt
Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
Java conditional joins should not require matching column counts (#8955) @jlowe
Fix concatenate empty structs (#8947) @sperlingxx
Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
Support casting with Pandas dtype aliases (#8920) @sarahyurick
Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
Fix libcudf memory errors (#8884) @karthikeyann
Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
replace auto with auto& ref for cast<&> (#8866) @karthikeyann
Add missing include<optional> in binops (#8864) @karthikeyann
Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
Re-enable JSON tests (#8843) @vuule
Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

Add IO docs page in cudf documentation (#9145) @galipremsagar
use correct namespace in cuio code examples (#9037) @cwharris
Restructuring Contributing doc (#9026) @iskode
Update stable version in readme (#9008) @galipremsagar
Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
List GDS-enabled formats in the docs (#8805) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
Add struct type support for drop_list_duplicates (#9202) @ttnghia
support CUDA async memory resource in JNI (#9201) @rongou
Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
Superimpose null masks for STRUCT columns. (#9144) @mythrocks
Implemented bindings for ceil timestamp operation (#9141) @shaneding
Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
Make AST operators nullable (#9096) @vyasr
Java bindings for approx_percentile (#9094) @andygrove
Add dseries.struct.explode (#9086) @isVoid
Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
Support nested types for nth_element reduction (#9043) @sperlingxx
Update sort groupby to use non-atomic operation (#9035) @karthikeyann
Add support for struct type in ORC writer (#9025) @vuule
Implement interleave_columns for structs columns (#9012) @ttnghia
Add groupby first and last aggregations (#9004) @shwina
Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
Python/Cython bindings for multibyte_split (#8998) @jdye64
Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
Added Series.dt.is_month_end (#8989) @TravisHester
Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
Support "unflatten" of columns flattened via flatten_nested_columns() (#8956) @mythrocks
Implement timestamp ceil (#8942) @shaneding
Add nested column selection to parquet reader (#8933) @devavret
Expose conditional join size calculation (#8928) @vyasr
Support Nulls in Timeseries Generator (#8925) @isVoid
Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
Add dot product binary op (#8909) @charlesbluca
Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
Series string repeat (#8882) @sarahyurick
Python binding for quarters (#8862) @shaneding
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Add Java bindings for AST transform (#8846) @jlowe
Series datetime is_month_start (#8844) @sarahyurick
Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt
Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid
Add quarters to libcudf datetime (#8779) @shaneding
Linear Interpolation of nans via cupy (#8767) @brandon-b-miller
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid
multibyte_split (#8702) @cwharris
Implement JNI for strings::repeat_strings that repeats each string separately by different numbers of times (#8572) @ttnghia

🛠️ Improvements

Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar
Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora
Skip dask-cudf tests on arm64 (#9252) @Ethyling
Use nvcomp's snappy compressor in ORC writer (#9242) @devavret
Only run imports tests on x86_64 (#9241) @Ethyling
Remove unnecessary call to device_uvector::release() (#9237) @harrism
Use nvcomp's snappy decompression in ORC reader (#9235) @devavret
Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks
Optimize cudf.concat for axis=0 (#9222) @galipremsagar
Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt
Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar
Improve performance of expression evaluation (#9210) @vyasr
Misc optimizations in cudf (#9203) @galipremsagar
Remove Cython APIs for table view generation (#9199) @vyasr
Add JNI support for drop_list_duplicates (#9198) @revans2
Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar
Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport
Explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid
Remove _source_index from MultiIndex (#9191) @vyasr
Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt
Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt
Fix cufilejni build include path (#9168) @pxLi
dask_cudf dispatch registering cleanup (#9160) @galipremsagar
Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt
Upgrade pandas version in cudf (#9147) @galipremsagar
make data chunk reader return unique_ptr (#9129) @cwharris
Add backend for percentile_lookup dispatch (#9118) @galipremsagar
Refactor implementation of column setitem (#9110) @vyasr
Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt
Update to UCX-Py 0.22 (#9099) @pentschev
Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris
Allowing %f in format to return nanoseconds (#9081) @marlenezw
Java bindings for cudf::hash_join (#9080) @jlowe
Remove stale code in ColumnBase._fill (#9078) @isVoid
Add support for get_group in GroupBy (#9070) @galipremsagar
Remove remaining "support" methods from DataFrame (#9068) @vyasr
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Added method to remove null_masks if the column has no nulls (#9061) @razajafri
Consolidate Several Series and Dataframe Methods (#9059) @isVoid
Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar
Remove some debug print statements from gtests (#9048) @davidwendt
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris
Refactor Index hierarchy (#9039) @vyasr
cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard
Add support for STRUCT input to groupby (#9024) @mythrocks
Refactor Frame scans (#9021) @vyasr
Remove duplicate set_categories code (#9018) @isVoid
Map support for ParquetWriter (#9013) @razajafri
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Remove _copy_construct factory (#8999) @vyasr
ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan
A small optimization for JNI copy column view to column vector (#8985) @revans2
Fix nvcc warnings in ORC writer (#8975) @devavret
Support nested structs in rank and dense rank (#8962) @rwlee
Move compute_column API out of ast namespace (#8957) @vyasr
Series datetime is_year_end and is_year_start (#8954) @marlenezw
Make Java AstNode public (#8953) @jlowe
Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add deprecation warning for Series.set_mask API (#8943) @galipremsagar
Move AST evaluator into a separate header (#8930) @vyasr
JNI Aggregation Type Changes (#8919) @revans2
Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt
Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Move structs_column_tests.cu to .cpp. (#8902) @mythrocks
Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt
Combine linearizer and ast_plan (#8900) @vyasr
Add Java bindings for conditional join gather maps (#8888) @jlowe
Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar
fix cufilejni build w/ c++17 (#8877) @pxLi
Add struct accessor to dask-cudf (#8874) @NV-jpt
Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora
Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2
Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt
Replace is_same<>::value with is_same_v<> (#8852) @codereport
Add min pytorch version to importorskip in pytest (#8851) @galipremsagar
Java bindings for regex replace (#8847) @jlowe
Remove make strings children with null mask (#8830) @davidwendt
Refactor conditional joins (#8815) @vyasr
Small cleanup (unused headers / commented code removals) (#8799) @codereport
ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan
Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi
Refactor and improve join benchmarks with nvbench (#8734) @PointKernel
Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr
Optimize URL Decoding (#8622) @gaohao95
Parquet writer dictionary encoding refactor (#8476) @devavret
Use nvcomp's snappy decompression in parquet reader (#8252) @devavret
Use nvcomp's snappy compressor in parquet writer (#8229) @devavret

cudf - v21.08.03

Published by GPUtester about 3 years ago

v21.08.03

cudf - v21.08.02

Published by GPUtester about 3 years ago

v21.08.02

Package Rankings

Top 5.32% on Pypi.org

Top 8.17% on Proxy.golang.org

Top 4.8% on Repo1.maven.org
