cudf | Python Ecosystem Directory


cudf - v22.06.01

Published by GPUtester over 2 years ago

v22.06.01

cudf - v22.06.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
Rename sliced_child to get_sliced_child. (#10885) @bdice
Add parameters to control page size in Parquet writer (#10882) @etseidl
Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
Generic serialization of all column types (#10784) @wence-
Return per-file metadata from readers (#10782) @vuule
HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
Update groupby::hash to use new row operators for keys (#10770) @PointKernel
update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
Add default= kwarg to .list.get() accessor method (#10547) @shwina
Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
Fix findall_record to return empty list for no matches (#10491) @davidwendt
Namespace/Docstring Fixes for Reduction (#10471) @isVoid
Additional refactoring of hash functions (#10462) @bdice
Fix default value of str.split expand parameter. (#10457) @bdice
Remove deprecated code. (#10450) @vyasr
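
For downstream C++ code that still uses the old macro names from #10589 above, the rename can be applied mechanically. The following is a minimal sketch, not part of cudf: the helper name `migrate_macros` and the sample input are hypothetical, and it assumes the old names should only be replaced when they appear as whole identifiers, so already-prefixed names such as `CUDF_CUDA_TRY` or another library's `RMM_CUDA_TRY` are left alone.

```python
import re

# Old macro name -> new macro name, as listed in #10589.
RENAMES = {
    "CUDA_TRY": "CUDF_CUDA_TRY",
    "CHECK_CUDA": "CUDF_CHECK_CUDA",
}

# Match the old names only as whole identifiers. The lookbehind rejects any
# identifier character (including "_") directly before the name, so prefixed
# macros such as CUDF_CUDA_TRY or RMM_CUDA_TRY are not rewritten.
_PATTERN = re.compile(r"(?<![A-Za-z0-9_])(CUDA_TRY|CHECK_CUDA)(?![A-Za-z0-9_])")


def migrate_macros(source: str) -> str:
    """Return `source` with the old macro names replaced by the new ones.

    Hypothetical helper for illustration; it operates on text only and does
    not parse C++, so names inside comments and strings are rewritten too.
    """
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


# Made-up input line mixing old, prefixed, and unrelated macro uses.
example = "CUDA_TRY(cudaMemcpy(dst, src, n, kind)); RMM_CUDA_TRY(x); CHECK_CUDA(stream);"
print(migrate_macros(example))
```

Because the pattern never matches an already-renamed macro, running the helper a second time over the same text changes nothing, which makes it safe to re-run over a partially migrated tree.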

🐛 Bug Fixes

Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar
Make SerializedTableHeader(numRows) public (#10949) @gerashegalov
Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar
Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar
Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt
Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca
Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel
Fix constness / references in weak ordering operator() signatures. (#10846) @bdice
Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard
Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca
Fix compile warning in search.cu (#10827) @davidwendt
Fix element access const correctness in hostdevice_vector (#10804) @vuule
Update cuco git tag (#10788) @PointKernel
HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346
Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar
Fix construction of nested structs with EMPTY child (#10761) @shwina
Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt
Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec
update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
Fix cupy function in notebook (#10737) @ajschmidt8
Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar
Fix scatter for all-empty-string column case (#10724) @davidwendt
Retain series name in Series.apply (#10716) @brandon-b-miller
Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard
Fix list of testing requirements in setup.py. (#10678) @bdice
Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt
cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard
Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt
Verify compression type in Parquet reader (#10610) @vuule
Fix struct row comparator's exception on empty structs (#10604) @sperlingxx
Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt
Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333
Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice
Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe
pin more cmake versions (#10570) @robertmaynard
Re-enable Build Metrics Report (#10562) @davidwendt
Remove statically linked CUDA runtime check in Java build (#10532) @jlowe
Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller
Update pre-commit to run black 22.3.0 (#10523) @vyasr
Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
Fix findall_record to return empty list for no matches (#10491) @davidwendt
Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule
Fix default value of str.split expand parameter. (#10457) @bdice
Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca
Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca
In-place updates with loc or iloc don't work correctly when the LHS has more than one column (#9918) @skirui-source

📖 Documentation

Clarify append deprecation notice. (#10930) @bdice
Use full name of GPUDirect Storage SDK in docs (#10904) @vuule
Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque
Add missing documentation in cudf/types.hpp (#10895) @karthikeyann
Add strong index iterator docs. (#10888) @bdice
spell check fixes (#10865) @karthikeyann
Add missing documentation in scalar/ headers (#10861) @karthikeyann
Remove typo in ngram documentation (#10859) @miguelusque
fix doxygen warnings (#10842) @karthikeyann
Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr
Add NumPy to intersphinx references. (#10809) @bdice
Add a section to the docs that compares cuDF with Pandas (#10796) @shwina
Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt
Enable pydocstyle for all packages. (#10759) @bdice
Enable pydocstyle rules involving quotes (#10748) @vyasr
Revise 10 minutes notebook. (#10738) @bdice
Reorganize cuDF Python docs (#10691) @shwina
Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller
Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty
add data generation to benchmark documentation (#10677) @karthikeyann
Fix some docs build warnings (#10674) @galipremsagar
Update UDF notebook in User Guide. (#10668) @bdice
Improve User Guide docs (#10663) @bdice
Fix some docstrings formatting (#10660) @galipremsagar
Remove implementation details from apply docstrings (#10651) @brandon-b-miller
Revise CONTRIBUTING.md (#10644) @bdice
Add missing APIs to documentation. (#10643) @bdice
Use cudf.read_json as documented API name. (#10640) @bdice
Fix docstring section headings. (#10639) @bdice
Document cudf.read_text and cudf.read_avro. (#10638) @bdice
Fix typo in docstring for json_reader_options (#10627) @dagardner-nv
Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller
Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt
Add Replace Backreferences section to Regex Features page (#10560) @davidwendt
Introduce deprecation policy to developer guide. (#10252) @vyasr

🚀 New Features

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec
Strong index types for equality comparator (#10883) @ttnghia
Add parameters to control page size in Parquet writer (#10882) @etseidl
Support for Zstandard decompression in ORC reader (#10873) @vuule
Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard
Support for Zstandard decompression in Parquet reader (#10847) @vuule
Add JNI support for apply_boolean_mask (#10812) @res-life
Segmented Min/Max for Fixed Point Types (#10794) @isVoid
Return per-file metadata from readers (#10782) @vuule
Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks
Update groupby::hash to use new row operators for keys (#10770) @PointKernel
Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks
Add detail::hash_join (#10695) @PointKernel
Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346
Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina
JNI: Add generateListOffsets API (#10683) @sperlingxx
Support args in groupby apply (#10682) @brandon-b-miller
Enable segmented_gather in Java package (#10669) @sperlingxx
Add row hasher with nested column support (#10641) @devavret
Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse
First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346
Add support for struct columns to the random table generator (#10566) @vuule
Enable passing a sequence for the index argument to .list.get() (#10564) @shwina
Add python bindings for cudf::list::index_of (#10549) @ChrisJar
Add default= kwarg to .list.get() accessor method (#10547) @shwina
Add cudf.DataFrame.applymap (#10542) @brandon-b-miller
Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
Add column field ID control in parquet writer (#10504) @PointKernel
Deprecate Series.applymap (#10497) @brandon-b-miller
Add option to drop cache in cuIO benchmarks (#10488) @vuule
move benchmark input generation in device in reduction nvbench (#10486) @karthikeyann
Support Segmented Min/Max Reduction on String Type (#10447) @isVoid
List element Equality comparator (#10289) @devavret
Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann
Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

🛠️ Improvements

Use conda compilers in env file (#10915) @galipremsagar
Remove C style artifacts in cuIO (#10886) @vuule
Rename sliced_child to get_sliced_child. (#10885) @bdice
Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333
Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia
Changing list_view.cuh to list_view.hpp (#10854) @ttnghia
More error checking in from_dlpack (#10850) @wence-
Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina
Add missing cuda-python dependency to cudf (#10833) @bdice
Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt
Split up search.cu to improve compile time (#10831) @davidwendt
Add tests for null scalar binaryops (#10828) @brandon-b-miller
Cleanup regex compile optimize functions (#10825) @davidwendt
Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar
Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller
Refactor regex builtin character-class identifiers (#10814) @davidwendt
Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt
Make the JNI API to get list offsets as a view public. (#10807) @revans2
Add cudf JNI docker build github action (#10806) @pxLi
Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi
Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
Handle closed property in IntervalDtype.from_pandas (#10798) @wence-
Return weak orderings from device_row_comparator. (#10793) @rwlee
Rework Scalar imports (#10791) @brandon-b-miller
Enable ccache for cudfjni build in Docker (#10790) @gerashegalov
Generic serialization of all column types (#10784) @wence-
simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346
Use column_views instead of column_device_views in binary operations. (#10780) @bdice
Add struct utility functions. (#10776) @bdice
Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt
Refactor host decompression in ORC reader (#10764) @vuule
Flush output streams before creating a process to drop caches (#10762) @vuule
Refactor binaryop/compiled/util.cpp (#10756) @bdice
Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt
Use generator expressions in any/all functions. (#10736) @bdice
Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice
Improve use of isinstance. (#10734) @bdice
Rename tests from multiIndex to multiindex. (#10732) @bdice
Two-table comparators with strong index types (#10730) @bdice
Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann
Use structured bindings instead of std::tie (#10726) @karthikeyann
Missing f prefix on f-strings fix (#10721) @code-review-doctor
Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar
Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid
Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora
Remove or simplify various utility functions (#10705) @vyasr
Allow building arrow with parquet and not python (#10702) @revans2
Partial cuIO GPU decompression refactor (#10699) @vuule
Cython API refactor: merge.pyx (#10698) @isVoid
Fix random string data length to become variable (#10697) @galipremsagar
Add bindings for index_of with column search key (#10696) @ChrisJar
Deprecate index merging (#10689) @vyasr
Remove cudf::strings::string namespace (#10684) @davidwendt
Standardize imports. (#10680) @bdice
Standardize usage of collections.abc. (#10679) @bdice
Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid
Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt
Split up mixed-join kernels source files (#10671) @davidwendt
Use std::filesystem for temporary directory location and deletion (#10664) @vuule
cleanup benchmark includes (#10661) @karthikeyann
Use upstream clang-format pre-commit hook. (#10659) @bdice
Clean up C++ includes to use <> instead of "". (#10658) @bdice
Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina
Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe
Use conda to build python packages during GPU tests (#10648) @Ethyling
Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr
Update pinning to allow newer CMake versions. (#10646) @vyasr
Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]
Remove concurrent_unordered_multimap. (#10642) @bdice
Improve parquet dictionary encoding (#10635) @PointKernel
Improve cudf::cuda_error (#10630) @sperlingxx
Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711
Branch 22.06 merge 22.04 (#10624) @vyasr
Unpin dask & distributed for development (#10623) @galipremsagar
Slightly improve accuracy of stod in to_floats (#10622) @davidwendt
Allow libcudfjni to be built as a static library (#10619) @jlowe
Change stack-based regex state data to use global memory (#10600) @davidwendt
Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar
KvikIO as an alternative GDS backend (#10593) @madsbk
Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
Refactor binary ops for timedelta and datetime columns (#10581) @vyasr
Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt
Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk
Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov
Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt
Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
Cleanup libcudf strings regex classes (#10573) @davidwendt
Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr
Reduce kernel calls to build strings findall results (#10559) @davidwendt
Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice
Update strings contains benchmark to measure varying match rates (#10555) @davidwendt
JNI: throw CUDA errors more specifically (#10551) @sperlingxx
Enable building static libs (#10545) @trxcllnt
Remove pip requirements files. (#10543) @bdice
Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr
Refactor memory_usage to improve performance (#10537) @galipremsagar
Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx
add accidentally removed comment. (#10526) @vyasr
Update conda environment. (#10525) @vyasr
Remove ColumnBase.getitem (#10516) @vyasr
Optimize left_semi_join by materializing the gather mask (#10511) @cheinger
Define proper binary operation APIs for columns (#10509) @vyasr
Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar
Update to Thrust 1.16 (#10489) @bdice
Namespace/Docstring Fixes for Reduction (#10471) @isVoid
Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi
Use Lists of Columns for Various Files (#10463) @isVoid
Additional refactoring of hash functions (#10462) @bdice
Fix Series.str.findall behavior for expand=False. (#10459) @bdice
Remove deprecated code. (#10450) @vyasr
Update cmake-format version. (#10440) @vyasr
Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8
Use conda compilers (#10275) @Ethyling
Add row bitmask as a detail::hash_join member (#10248) @PointKernel

cudf - v22.04.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
Refactor stream compaction APIs (#10370) @PointKernel
Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
Rewrites sample API (#10262) @isVoid
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
Remove deprecated code (#10124) @vyasr
Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
Optimize compaction operations (#10030) @PointKernel
Remove deprecated method Series.set_index. (#9945) @bdice
Add cudf::strings::findall_record API (#9911) @davidwendt
Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
Fix for integer overflow in contiguous-split (#10437) @jbrennan333
Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
Fix empty reduce with List output and non-List input (#10435) @sperlingxx
Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
Support cupy array in quantile input (#10429) @galipremsagar
Fix benchmarks to work with new aggregation types (#10428) @davidwendt
Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
Limiting async allocator using alignment of 512 (#10395) @rongou
Include <optional> in multibyte split. (#10385) @bdice
Fix issue with column and scalar re-assignment (#10377) @galipremsagar
Fix floating point data generation in benchmarks (#10372) @vuule
Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
Remove is_relationally_comparable for table device views (#10342) @davidwendt
Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
Fix documentation issues (#10307) @ajschmidt8
Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
Fix incorrect slicing of GDS read/write calls (#10274) @vuule
Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
Fix small leak in explode (#10245) @revans2
Yet another small JNI memory leak (#10238) @revans2
Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
Fix JNI leak on copy to device (#10229) @revans2
Fix the data generator element size for decimal types (#10225) @vuule
Fix decimal metadata in parquet writer (#10224) @galipremsagar
Fix strings handling of hex in regex pattern (#10220) @davidwendt
Fix docs builds (#10216) @ajschmidt8
Fix a leftover _has_nulls change from Nullate (#10211) @devavret
Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
benchmark fixture - static object pointer fix (#10145) @karthikeyann
Fix UDF Caching (#10133) @brandon-b-miller
Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
Encode values from python callback for C++ (#10103) @jdye64
Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
Column equality testing fixes (#10011) @brandon-b-miller
Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
Add cut to API docs (#10479) @shwina
Remove documentation for methods removed in #10124. (#10366) @bdice
Fix documentation issues (#10306) @ajschmidt8
Fix fixed_point binary operation documentation (#10198) @codereport
Remove cleaned up methods from docs (#10189) @galipremsagar
Update developer guide to recommend no default stream parameter. (#10136) @bdice
Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

Add StringIO support to read_text (#10465) @cwharris
Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
JNI support for Collect Ops in Reduction (#10427) @sperlingxx
Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
Add cudf::stable_sort_by_key (#10387) @PointKernel
Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
Support Java bindings for Avro reader (#10373) @HaoYang670
Refactor stream compaction APIs (#10370) @PointKernel
Support collect aggregations in reduction (#10353) @sperlingxx
Refactor array_ufunc for Index and unify across all classes (#10346) @vyasr
Add JNI for extract_list_element with index column (#10341) @firestarman
Support min and max operations for structs in rolling window (#10332) @ttnghia
Add device create_sequence_table for benchmarks (#10300) @karthikeyann
Enable numpy ufuncs for DataFrame (#10287) @vyasr
move input generation for json benchmark to device (#10281) @karthikeyann
move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
move input generation for copy benchmark to device (#10279) @karthikeyann
generate url decode benchmark input in device (#10278) @karthikeyann
device input generation in join bench (#10277) @karthikeyann
Add nvtext::byte_pair_encoding API (#10270) @davidwendt
Prevent internal usage of expensive APIs (#10263) @vyasr
Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
Support percent_rank() aggregation (#10227) @mythrocks
Refactor Series.array_ufunc (#10217) @vyasr
Reduce pytest runtime (#10203) @brandon-b-miller
Add regex flags parameter to python cudf strings split (#10185) @davidwendt
Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
Add file size counter to cuIO benchmarks (#10154) @vuule
byte_range support for multibyte_split/read_text (#10150) @cwharris
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Add maxSplit parameter to Java binding for strings:split (#10137) @ttnghia
Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
generate benchmark input in device (#10109) @karthikeyann
Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
Add Dataframe and Index nunique (#10077) @martinfalisse
Support nanosecond timestamps in parquet (#10063) @PointKernel
Java bindings for mixed semi and anti joins (#10040) @jlowe
Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
Optimize compaction operations (#10030) @PointKernel
Support args= in Series.apply (#9982) @brandon-b-miller
Add cudf::strings::findall_record API (#9911) @davidwendt
Add covariance for sort groupby (python) (#9889) @mayankanand007
Implement DataFrame diff() (#9817) @skirui-source
Implement DataFrame pct_change (#9805) @skirui-source
Support segmented reductions and null mask reductions (#9621) @isVoid
Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

🛠️ Improvements

Add scipy skip for a test (#10502) @galipremsagar
Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
Pin dask and distributed (#10481) @galipremsagar
MD5 refactoring. (#10445) @bdice
Remove or split up Frame methods that use the index (#10439) @vyasr
Centralization of tdigest aggregation code. (#10422) @nvdbaranec
Simplify column binary operations (#10421) @vyasr
Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
Use list of columns for methods in Groupby.pyx (#10419) @isVoid
Remove warnings in test_timedelta.py (#10418) @galipremsagar
Fix some warnings in test_parquet.py (#10416) @galipremsagar
JNI support for segmented reduce (#10413) @revans2
Clean up null mask after purging null entries (#10412) @sperlingxx
Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
Use str instead of builtins.str. (#10410) @bdice
Fix warnings in test_rolling (#10405) @bdice
Enable codecov github-check in CI (#10404) @galipremsagar
Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice
Set column names in _from_columns_like_self factory (#10400) @isVoid
Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar
Consolidate .cov and .corr for sort groupby (#10386) @skirui-source
Consolidate some Frame APIs (#10381) @vyasr
Refactor hash functions and hash_combine (#10379) @bdice
Add nvtx annotations for Series and Index (#10374) @galipremsagar
Refactor filling.repeat API (#10371) @isVoid
Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt
Remove doc for deprecated function one_hot_encoding (#10367) @isVoid
Refactor array function (#10364) @vyasr
Fix warnings in test_csv.py. (#10362) @bdice
Implement a mixin for binops (#10360) @vyasr
Refactor cython interface: copying.pyx (#10359) @isVoid
Implement a mixin for scans (#10358) @vyasr
Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
Add cleanup of python artifacts (#10355) @galipremsagar
Fix warnings in test_categorical.py. (#10354) @bdice
Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt
Fix codecov in CI (#10347) @galipremsagar
Enable caching for memory_usage calculation in Column (#10345) @galipremsagar
C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann
JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx
multibyte_split test improvements (#10328) @vuule
Fix warnings in test_binops.py. (#10327) @bdice
Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice
Update upload script (#10321) @ajschmidt8
Move hash type declarations to hashing.hpp (#10320) @davidwendt
C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann
Remove internal columns usage (#10315) @vyasr
Remove extraneous build.sh parameter (#10313) @ajschmidt8
Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt
Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8
Add conversions between column_view and device_span<T const>. (#10302) @bdice
Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar
Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard
Add copyright check as pre-commit hook. (#10290) @vyasr
DataFrame insert and creation optimizations (#10285) @galipremsagar
Improve hash join detail functions (#10273) @PointKernel
Replace custom cached_property implementation with functools (#10272) @shwina
Rewrites sample API (#10262) @isVoid
Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]
Remove making redundant copy across code-base (#10257) @galipremsagar
Add more nvtx annotations (#10256) @galipremsagar
Add copyright check in cudf (#10253) @galipremsagar
Remove redundant copies in fillna to improve performance (#10241) @galipremsagar
Remove std::numeric_limit specializations for timestamp & durations (#10239) @codereport
Optimize DataFrame creation across code-base (#10236) @galipremsagar
Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar
Add environment variables for I/O thread pool and slice sizes (#10218) @vuule
Add regex flags to strings findall functions (#10208) @davidwendt
Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca
Remove unnecessary nunique function in Series. (#10205) @martinfalisse
Refactor DataFrame tests. (#10204) @bdice
Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid
Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe
Fix docstrings alignment in Frame methods (#10199) @galipremsagar
Fix cuco pair issue in hash join (#10195) @PointKernel
Replace dask groupby .index usages with .by (#10193) @galipremsagar
Add regex flags to strings extract function (#10192) @davidwendt
Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice
Add CMake install rule for tests (#10190) @ajschmidt8
Unpin dask & distributed (#10182) @galipremsagar
Add comments to explain test validation (#10176) @galipremsagar
Reduce warnings in pytest output (#10168) @bdice
Some consolidation of indexed frame methods (#10167) @vyasr
Refactor isin implementations (#10165) @vyasr
Faster struct row comparator (#10164) @devavret
Refactor groupby::get_groups. (#10161) @bdice
Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule
Replace ccache with sccache (#10146) @ajschmidt8
Murmur3 hash kernel cleanup (#10143) @rwlee
Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar
Run pyupgrade 2.31.0. (#10141) @bdice
Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice
Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
Update cmake-format script for branch 22.04. (#10132) @bdice
Accept r-value references in convert_table_for_return() (#10131) @mythrocks
Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
Remove deprecated code (#10124) @vyasr
Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
Remove benchmarks suffix (#10112) @bdice
Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi
Remove unnecessary docker files. (#10069) @vyasr
Limit benchmark iterations using environment variable (#10060) @karthikeyann
Add timing chart for libcudf build metrics report page (#10038) @davidwendt
JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx
Reduce redundant code in CUDF JNI (#10019) @mythrocks
Make snappy decompress check more efficient (#9995) @cheinger
Remove deprecated method Series.set_index. (#9945) @bdice
Implement a mixin for reductions (#9925) @vyasr
JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx
Add assert_column_memory_* (#9882) @isVoid
Add CUDF_UNREACHABLE macro. (#9727) @bdice
Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

cudf - v22.02.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

ORC writer API changes for granular statistics (#10058) @mythrocks
decimal128 Support for to/from_arrow (#9986) @codereport
Remove deprecated method one_hot_encoding (#9977) @isVoid
Remove str.subword_tokenize (#9968) @VibhuJawa
Remove deprecated method parameter from merge and join. (#9944) @bdice
Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
Remove deprecated method Series.hash_encode. (#9942) @bdice
Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Add partitioning support in parquet writer (#9810) @devavret
Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Add parameters to control row group size in Parquet writer (#9677) @vuule
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
Add support for decimal128 in cudf python (#9533) @galipremsagar
Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

Add check for negative stripe index in ORC reader (#10074) @vuule
Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
fix gcc 11 compilation errors (#10067) @rongou
Fix columns ordering issue in parquet reader (#10066) @galipremsagar
Fix dataframe setitem with ndarray types (#10056) @galipremsagar
Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
Include <optional> in headers that use std::optional (#10044) @robertmaynard
Fix repr and concat of StructColumn (#10042) @galipremsagar
Include row group level stats when writing ORC files (#10041) @vuule
build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
Fix memory leaks in JNI native code. (#10029) @mythrocks
Update JNI to use new arena mr constructor (#10027) @rongou
Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
Add missing list filling header in meta.yaml (#10007) @devavret
Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
Fix octal pattern matching in regex string (#9993) @davidwendt
decimal128 Support for to/from_arrow (#9986) @codereport
Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
Fix the overflow problem of decimal rescale (#9966) @sperlingxx
Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
Fix cudf java build error. (#9958) @firestarman
Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
Resolve racecheck errors in ORC kernels (#9916) @vuule
Fix the java build after parquet partitioning support (#9908) @revans2
Fix compilation of benchmark for parquet writer. (#9905) @bdice
Fix a memcheck error in ORC writer (#9896) @vuule
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
Add one-level list encoding support in parquet reader (#9848) @PointKernel
Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
Fix caching in Series.applymap (#9821) @brandon-b-miller
Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
Fix ORC writer crash with empty input columns (#9808) @vuule
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
Fix missing streams (#9767) @karthikeyann
Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
Update cmake and conda to 22.02 (#9746) @devavret
Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
Fixed build by adding more checks for int8, int16 (#9707) @razajafri
Fix null handling when boolean dtype is passed (#9691) @galipremsagar
Fix stream usage in segmented_gather() (#9679) @mythrocks

📖 Documentation

Update decimal dtypes related docs entries (#10072) @galipremsagar
Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
Fix cudf compilation instructions. (#9956) @esoha-nvidia
Fix see also links for IO APIs (#9895) @galipremsagar
Fix build instructions for libcudf doxygen (#9837) @davidwendt
Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
update cuda version in local build (#9736) @karthikeyann
Fix doxygen for enum types in libcudf (#9724) @davidwendt
Spell check fixes (#9682) @karthikeyann
Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
Allow CuPy 10 (#10048) @jakirkham
Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
Add groupby.transform (only support for aggregations) (#10005) @shwina
Add partitioning support to Parquet chunked writer (#10000) @devavret
Add jni for sequences (#9972) @wbo4958
Java bindings for mixed left, inner, and full joins (#9941) @jlowe
Java bindings for JSON reader support (#9940) @wbo4958
Enable transpose for string columns in cudf python (#9937) @galipremsagar
Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
Implement mixed equality/conditional joins (#9917) @vyasr
Add cudf::strings::extract_all API (#9909) @davidwendt
Implement JNI for cudf::scatter APIs (#9903) @ttnghia
JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Add add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
Add JNI for cudf::drop_duplicates (#9841) @ttnghia
Implement per-list sequence (#9839) @ttnghia
adding series.transpose (#9835) @mayankanand007
Adding support for Series.autocorr (#9833) @mayankanand007
Support round operation on datetime64 datatypes (#9820) @mayankanand007
Add partitioning support in parquet writer (#9810) @devavret
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Optimize groupby::scan (#9754) @PointKernel
Add sample JNI API (#9728) @res-life
Support min and max in inclusive scan for structs (#9725) @ttnghia
Add first and last method to IndexedFrame (#9710) @isVoid
Support min and max reduction for structs (#9697) @ttnghia
Add parameters to control row group size in Parquet writer (#9677) @vuule
Run compute-sanitizer in nightly build (#9641) @karthikeyann
Implement Series.datetime.floor (#9571) @skirui-source
ceil/floor for DatetimeIndex (#9554) @mayankanand007
Add support for decimal128 in cudf python (#9533) @galipremsagar
Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64
Add Pearson correlation for sort groupby (python) (#9166) @skirui-source
Interchange dataframe protocol (#9071) @iskode
Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🛠️ Improvements

Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
ORC writer API changes for granular statistics (#10058) @mythrocks
Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
Unpin dask and distributed in CI (#10028) @galipremsagar
Add _from_column_like_self factory (#10022) @isVoid
Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
Clean up CUDA stream use in cuIO (#9991) @vuule
Use address-ordered first fit for the pinned memory pool (#9989) @rongou
Add strings tests to transpose_test.cpp (#9985) @davidwendt
Use gpuci_mamba_retry on Java CI. (#9983) @bdice
Remove deprecated method one_hot_encoding (#9977) @isVoid
Minor cleanup of unused Python functions (#9974) @vyasr
Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
Remove str.subword_tokenize (#9968) @VibhuJawa
Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
Remove deprecated method parameter from merge and join. (#9944) @bdice
Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
Remove deprecated method Series.hash_encode. (#9942) @bdice
use ninja in java ci build (#9933) @rongou
Add build-time publish step to cpu build script (#9927) @davidwendt
Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
Remove various unused functions (#9922) @vyasr
Raise in query if dtype is not supported (#9921) @brandon-b-miller
Add missing imports tests (#9920) @Ethyling
Spark Decimal128 hashing (#9919) @rwlee
Replace thrust/std::get with structured bindings (#9915) @codereport
Upgrade thrust version to 1.15 (#9912) @robertmaynard
Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
Return count of set bits from inplace_bitmask_and. (#9904) @bdice
Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
Update ucx-py version on release using rvc (#9897) @Ethyling
Remove IncludeCategories from .clang-format (#9876) @codereport
Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
Add clang-tidy to libcudf (#9860) @codereport
Remove deprecated methods from Java Table class (#9853) @jlowe
Add test for map column metadata handling in ORC writer (#9852) @vuule
Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
add templated benchmark with fixture (#9838) @karthikeyann
Use list of column inputs for apply_boolean_mask (#9832) @isVoid
Added a few more tests for Decimal to String cast (#9818) @razajafri
Run doctests. (#9815) @bdice
Avoid overflow for fixed_point round (#9809) @sperlingxx
Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
Use vector factories for host-device copies. (#9806) @bdice
Refactor host device macros (#9797) @vyasr
Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
Improve build time of libcudf iterator tests (#9788) @davidwendt
Copy Java native dependencies directly into classpath (#9787) @jlowe
Add decimal types to cuIO benchmarks (#9776) @vuule
Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
Use CTAD with Thrust function objects (#9768) @codereport
Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
Use Java classloader to find test resources (#9760) @jlowe
Allow cast decimal128 to string and add tests (#9756) @razajafri
Load balance optimization for contiguous_split (#9755) @nvdbaranec
Consolidate and improve reset_index (#9750) @isVoid
Update to UCX-Py 0.24 (#9748) @pentschev
Skip cufile tests in JNI build script (#9744) @pxLi
Enable string to decimal128 cast (#9742) @razajafri
Use stop instead of stop_. (#9735) @bdice
Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
Improve cmake format script (#9723) @vyasr
Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
Use stream allocator adaptor for hash join table (#9704) @PointKernel
Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
Add utility to format ninja-log build times (#9631) @davidwendt
Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
Simplify merge internals and reduce overhead (#9516) @vyasr
Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris

cudf - v21.12.02

Published by GPUtester almost 3 years ago

v21.12.02

cudf - v21.12.01

Published by GPUtester almost 3 years ago

v21.12.01

cudf - v21.12.00

Published by GPUtester almost 3 years ago

🚨 Breaking Changes

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Various internal MultiIndex improvements (#9243) @vyasr

🐛 Bug Fixes

Fix read_parquet bug for bytes input (#9669) @rjzamora
Use _gather internal for sort_* (#9668) @isVoid
Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
Don't recompute output size if it is already available (#9649) @abellina
Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
Fix debrotli issue on CUDA 11.5 (#9632) @vuule
Use std::size_t when computing join output size (#9626) @jlowe
Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
compile libnvcomp with PTDS if requested (#9540) @jbrennan333
Fix segmented_gather() for null LIST rows (#9537) @mythrocks
Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
Fix regex handling of embedded null characters (#9470) @davidwendt
Fix memcheck error in copy-if-else (#9467) @davidwendt
Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
Preserve the decimal scale when creating a default scalar (#9449) @revans2
Push down parent nulls when flattening nested columns. (#9443) @mythrocks
Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
Allow int-like objects for the decimals argument in round (#9428) @shwina
Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
Fix the crash in stats code (#9368) @devavret
Make Series.hash_encode results reproducible. (#9366) @bdice
Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
Use f-string in join helper warning message. (#9325) @bdice
Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
Fix null count in statistics for parquet (#9303) @devavret
Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
Update docstring of DataFrame.merge (#9572) @galipremsagar
Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
Add example to docstrings in rolling.apply (#9522) @isVoid
Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
Improve Python docstring formatting. (#9493) @bdice
Update table of I/O supported types (#9476) @vuule
Document invalid regex patterns as undefined behavior (#9473) @davidwendt
Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
Fix many documentation errors in libcudf. (#9355) @karthikeyann
Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
Improved deprecation warnings. (#9347) @bdice
doc reorder mr, stream to stream, mr (#9308) @karthikeyann
Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

Enable Series.divide and DataFrame.divide (#9630) @vyasr
Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
Support args= in apply (#9514) @brandon-b-miller
Add groupby scan min/max support for strings values (#9502) @davidwendt
Add list output option to character_ngrams() function (#9499) @davidwendt
More granular column selection in ORC reader (#9496) @vuule
add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
Implement Series.datetime.floor (#9488) @skirui-source
Enable linting of CMake files using pre-commit (#9484) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Augment order_by to Accept a List of null_precedence (#9455) @isVoid
Add format API for list column of strings (#9454) @davidwendt
Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
Add cudf python groupby.diff (#9446) @karthikeyann
Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
Support Unary Operations in Masked UDF (#9409) @isVoid
Move Several Series Function to Frame (#9394) @isVoid
MD5 Python hash API (#9390) @bdice
Add cudf strings is_title API (#9380) @davidwendt
Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
Add support for writing ORC with map columns (#9369) @vuule
extract_list_elements() with column_view indices (#9367) @mythrocks
Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
New array conversion methods (#9236) @vyasr
Series apply method backed by masked UDFs (#9217) @brandon-b-miller
Grouping by frequency and resampling (#9178) @shwina
Pure-python masked UDFs (#9174) @brandon-b-miller
Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

Followup to PR 9088 comments (#9659) @cwharris
Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
Add 11.5 dev.yml to cudf (#9617) @galipremsagar
Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
remove deprecated Rmm.initialize method (#9607) @rongou
Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx
Set RMM pool to a fixed size in JNI (#9583) @rongou
Use nvCOMP for Snappy compression/decompression (#9582) @vuule
Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling
Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia
Enable CMake format in CI and fix style (#9570) @vyasr
Add NVTX Start/End Ranges to JNI (#9563) @abellina
Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64
Add offsets_begin/end() to strings_column_view (#9559) @davidwendt
remove alignment options for RMM jni (#9550) @rongou
Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina
Generalize comparison binary operations (#9542) @vyasr
Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe
Add scan sum support for duration types to libcudf (#9536) @davidwendt
Force inlining to improve AST performance (#9530) @vyasr
Generalize some more indexed frame methods (#9529) @vyasr
Add Java bindings for rolling window stddev aggregation (#9527) @razajafri
catch rmm::out_of_memory exceptions in jni (#9525) @rongou
Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia
Accelerate conditional inner joins with larger right tables (#9523) @vyasr
Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar
Cleanup for flattening nested columns (#9509) @rwlee
Enable running tests using RMM arena and async memory resources (#9506) @rongou
Remove dependency on six. (#9495) @bdice
Cleanup some libcudf strings gtests (#9489) @davidwendt
Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice
Deprecate Series.hash_encode. (#9457) @bdice
Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8
Small clean up to simplify column selection code in ORC reader (#9444) @vuule
add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann
Adds Deprecation Warnings to one_hot_encoding and Implement get_dummies with Cython API (#9435) @isVoid
Update pre-commit hook URLs. (#9433) @bdice
Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca
Miscellaneous improvements for UDFs (#9422) @isVoid
Use pre-commit for CI (#9412) @vyasr
Update to UCX-Py 0.23 (#9407) @pentschev
Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina
Improvements to tdigest aggregation code. (#9403) @nvdbaranec
Add Java API to deserialize a table to host columns (#9402) @jlowe
Frame copy to use class instead of type() (#9397) @madsbk
Change all DeprecationWarnings to FutureWarning. (#9392) @bdice
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr
Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora
Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora
Add multi-threaded writing to GDS writes (#9372) @devavret
Miscellaneous column cleanup (#9370) @vyasr
Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt
Consolidate binary ops into Frame (#9357) @isVoid
Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt
Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice
Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid
Fix Cython compilation warnings. (#9327) @bdice
Fix some unused variable warnings in libcudf (#9326) @davidwendt
Use optional-iterator for copy-if-else kernel (#9324) @davidwendt
Remove Table class (#9315) @vyasr
Unpin dask and distributed in CI (#9307) @galipremsagar
Add optional-iterator support to indexalator (#9306) @davidwendt
Consolidate more methods in Frame (#9305) @vyasr
Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora
Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice
Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt
Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt
Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi
Various internal MultiIndex improvements (#9243) @vyasr
Add detail interface for split and slice(table_view), refactor both functions with host_span (#9226) @isVoid
Refactor MD5 implementation. (#9212) @bdice
Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann
Use nvcomp's snappy decompressor in avro reader (#9181) @devavret
Add isocalendar API support (#9169) @marlenezw
Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris
Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris
Refactor hash join with cuCollections multimap (#8934) @PointKernel
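
For readers unfamiliar with the term in "Add isocalendar API support" above: an ISO calendar expresses a date as an ISO year, an ISO week number, and an ISO weekday (Monday = 1). The sketch below uses Python's standard-library `datetime.date.isocalendar()` purely to illustrate that convention. The helper name `iso_parts` is hypothetical, and nothing here describes the signature or return type of the cuDF API.

```python
from datetime import date


def iso_parts(d):
    """Return (ISO year, ISO week, ISO weekday) for a date as a plain tuple."""
    iso = d.isocalendar()
    return (iso[0], iso[1], iso[2])


# 1 January 2021 was a Friday that still belongs to ISO week 53 of 2020,
# so the ISO year differs from the calendar year.
print(iso_parts(date(2021, 1, 1)))   # (2020, 53, 5)

# A mid-year date falls in the ISO year matching its calendar year.
print(iso_parts(date(2021, 6, 15)))  # (2021, 24, 2)
```

The first case is the one that usually surprises people: dates near a year boundary can belong to the neighbouring ISO year.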

cudf - v21.10.01

Published by GPUtester about 3 years ago

v21.10.01

cudf - v21.10.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

Remove Cython APIs for table view generation (#9199) @vyasr
Upgrade pandas version in cudf (#9147) @galipremsagar
Make AST operators nullable (#9096) @vyasr
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
Add support for struct type in ORC writer (#9025) @vuule
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Move compute_column API out of ast namespace (#8957) @vyasr
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add nested column selection to parquet reader (#8933) @devavret
JNI Aggregation Type Changes (#8919) @revans2
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid
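
As a rough illustration of what "transform-like op order match original data order" (#8720) refers to: a transform-like groupby operation produces one output value per input row, and the question is whether those values come back in the original row order or in grouped-key order. The pure-Python sketch below assumes a per-group running sum as the example operation. The function name `group_cumsum` is hypothetical and this is not cuDF code.

```python
from collections import defaultdict


def group_cumsum(keys, values):
    """Per-group running sum, returned in the original row order."""
    running = defaultdict(int)  # current running total for each key
    out = []
    for key, value in zip(keys, values):
        running[key] += value
        out.append(running[key])
    return out


keys = ["b", "a", "b", "a"]
values = [1, 2, 3, 4]

# Output row i corresponds to input row i, regardless of how keys sort.
print(group_cumsum(keys, values))  # [1, 2, 4, 6]
```

If results were instead emitted in sorted-key order, the same data would yield `[2, 6, 1, 4]` (the two "a" rows first), which no longer lines up with the input rows.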

🐛 Bug Fixes

fixed_point cudf::groupby for mean aggregation (#9296) @codereport
Fix interleave_columns when the input string lists column has an empty child column (#9292) @ttnghia
Update nvcomp to include fixes for installation of headers (#9276) @devavret
Fix Java column leak in testParquetWriteMap (#9271) @jlowe
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
Fix crash in getMapValue on empty input (#9262) @hyperbolic2346
Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
Dataframe.sort_index optimizations (#9238) @galipremsagar
Temporarily disabling problematic test in parquet writer (#9230) @devavret
Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
Fix gather for sliced input structs column (#9218) @ttnghia
Fix JNI code for left semi and anti joins (#9207) @jlowe
Only install thrust when using a non 'system' version (#9206) @robertmaynard
Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
Approximate overflow detection in ORC statistics (#9163) @vuule
Use decimal precision metadata when reading from parquet files (#9162) @shwina
Fix variable name in Java build script (#9161) @jlowe
Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
Fix conditional joins with empty left table (#9146) @vyasr
Fix joining on indexes with duplicate level names (#9137) @shwina
Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
Apply type metadata after column is slice-copied (#9131) @isVoid
Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel
Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
Support null literals in expressions (#9117) @vyasr
Fix cudf::hash_join output size for struct joins (#9107) @jlowe
Import fix (#9104) @shwina
Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
Preserve float16 upscaling (#9069) @galipremsagar
Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
Various multiindex related fixes (#9036) @shwina
Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
Fix ORC reading of files with struct columns that have null values (#9005) @vuule
Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
Fix debug compile error for csv_test.cpp (#8981) @davidwendt
Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
Java conditional joins should not require matching column counts (#8955) @jlowe
Fix concatenate empty structs (#8947) @sperlingxx
Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
Support casting with Pandas dtype aliases (#8920) @sarahyurick
Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
Fix libcudf memory errors (#8884) @karthikeyann
Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
replace auto with auto& ref for cast<&> (#8866) @karthikeyann
Add missing include<optional> in binops (#8864) @karthikeyann
Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
Re-enable JSON tests (#8843) @vuule
Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

Add IO docs page in cudf documentation (#9145) @galipremsagar
use correct namespace in cuio code examples (#9037) @cwharris
Restructuring Contributing doc (#9026) @iskode
Update stable version in readme (#9008) @galipremsagar
Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
List GDS-enabled formats in the docs (#8805) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
Add struct type support for drop_list_duplicates (#9202) @ttnghia
support CUDA async memory resource in JNI (#9201) @rongou
Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
Superimpose null masks for STRUCT columns. (#9144) @mythrocks
Implemented bindings for ceil timestamp operation (#9141) @shaneding
Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
Make AST operators nullable (#9096) @vyasr
Java bindings for approx_percentile (#9094) @andygrove
Add dseries.struct.explode (#9086) @isVoid
Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
Support nested types for nth_element reduction (#9043) @sperlingxx
Update sort groupby to use non-atomic operation (#9035) @karthikeyann
Add support for struct type in ORC writer (#9025) @vuule
Implement interleave_columns for structs columns (#9012) @ttnghia
Add groupby first and last aggregations (#9004) @shwina
Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
Python/Cython bindings for multibyte_split (#8998) @jdye64
Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
Added Series.dt.is_month_end (#8989) @TravisHester
Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
Support "unflatten" of columns flattened via flatten_nested_columns() (#8956) @mythrocks
Implement timestamp ceil (#8942) @shaneding
Add nested column selection to parquet reader (#8933) @devavret
Expose conditional join size calculation (#8928) @vyasr
Support Nulls in Timeseries Generator (#8925) @isVoid
Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
Add dot product binary op (#8909) @charlesbluca
Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
Series string repeat (#8882) @sarahyurick
Python binding for quarters (#8862) @shaneding
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Add Java bindings for AST transform (#8846) @jlowe
Series datetime is_month_start (#8844) @sarahyurick
Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt
Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid
Add quarters to libcudf datetime (#8779) @shaneding
Linear Interpolation of nans via cupy (#8767) @brandon-b-miller
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid
multibyte_split (#8702) @cwharris
Implement JNI for strings::repeat_strings that repeats each string separately by different numbers of times (#8572) @ttnghia
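
Several entries in this list add calendar predicates (`is_quarter_start`, `is_quarter_end`, `is_month_end`, `is_month_start`, `days_in_month`, quarters). The sketch below shows the calendar logic these names conventionally denote, assuming calendar quarters that begin in January, April, July, and October. It uses only the Python standard library on scalar dates. The function names mirror the entries for readability but are otherwise hypothetical, and this is not how cuDF implements them on GPU columns.

```python
import calendar
from datetime import date


def days_in_month(d):
    # calendar.monthrange returns (weekday of the 1st, number of days in month)
    return calendar.monthrange(d.year, d.month)[1]


def quarter(d):
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    return (d.month - 1) // 3 + 1


def is_month_start(d):
    return d.day == 1


def is_month_end(d):
    return d.day == days_in_month(d)


def is_quarter_start(d):
    # First day of the first month of a calendar quarter
    return is_month_start(d) and d.month in (1, 4, 7, 10)


def is_quarter_end(d):
    # Last day of the last month of a calendar quarter
    return is_month_end(d) and d.month in (3, 6, 9, 12)


print(days_in_month(date(2020, 2, 10)))    # 29 (leap year)
print(quarter(date(2021, 8, 5)))           # 3
print(is_quarter_end(date(2021, 9, 30)))   # True
```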

🛠️ Improvements

Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar
Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora
Skip dask-cudf tests on arm64 (#9252) @Ethyling
Use nvcomp's snappy compressor in ORC writer (#9242) @devavret
Only run imports tests on x86_64 (#9241) @Ethyling
Remove unnecessary call to device_uvector::release() (#9237) @harrism
Use nvcomp's snappy decompression in ORC reader (#9235) @devavret
Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks
Optimize cudf.concat for axis=0 (#9222) @galipremsagar
Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt
Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar
Improve performance of expression evaluation (#9210) @vyasr
Misc optimizations in cudf (#9203) @galipremsagar
Remove Cython APIs for table view generation (#9199) @vyasr
Add JNI support for drop_list_duplicates (#9198) @revans2
Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar
Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport
Be explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid
Remove _source_index from MultiIndex (#9191) @vyasr
Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt
Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt
Fix cufilejni build include path (#9168) @pxLi
dask_cudf dispatch registering cleanup (#9160) @galipremsagar
Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt
Upgrade pandas version in cudf (#9147) @galipremsagar
make data chunk reader return unique_ptr (#9129) @cwharris
Add backend for percentile_lookup dispatch (#9118) @galipremsagar
Refactor implementation of column setitem (#9110) @vyasr
Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt
Update to UCX-Py 0.22 (#9099) @pentschev
Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris
Allowing %f in format to return nanoseconds (#9081) @marlenezw
Java bindings for cudf::hash_join (#9080) @jlowe
Remove stale code in ColumnBase._fill (#9078) @isVoid
Add support for get_group in GroupBy (#9070) @galipremsagar
Remove remaining "support" methods from DataFrame (#9068) @vyasr
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Added method to remove null_masks if the column has no nulls (#9061) @razajafri
Consolidate Several Series and Dataframe Methods (#9059) @isVoid
Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar
Remove some debug print statements from gtests (#9048) @davidwendt
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris
Refactor Index hierarchy (#9039) @vyasr
cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard
Add support for STRUCT input to groupby (#9024) @mythrocks
Refactor Frame scans (#9021) @vyasr
Remove duplicate set_categories code (#9018) @isVoid
Map support for ParquetWriter (#9013) @razajafri
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Remove _copy_construct factory (#8999) @vyasr
ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan
A small optimization for JNI copy column view to column vector (#8985) @revans2
Fix nvcc warnings in ORC writer (#8975) @devavret
Support nested structs in rank and dense rank (#8962) @rwlee
Move compute_column API out of ast namespace (#8957) @vyasr
Series datetime is_year_end and is_year_start (#8954) @marlenezw
Make Java AstNode public (#8953) @jlowe
Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add deprecation warning for Series.set_mask API (#8943) @galipremsagar
Move AST evaluator into a separate header (#8930) @vyasr
JNI Aggregation Type Changes (#8919) @revans2
Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt
Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Move structs_column_tests.cu to .cpp. (#8902) @mythrocks
Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt
Combine linearizer and ast_plan (#8900) @vyasr
Add Java bindings for conditional join gather maps (#8888) @jlowe
Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar
fix cufilejni build w/ c++17 (#8877) @pxLi
Add struct accessor to dask-cudf (#8874) @NV-jpt
Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora
Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2
Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt
Replace is_same<>::value with is_same_v<> (#8852) @codereport
Add min pytorch version to importorskip in pytest (#8851) @galipremsagar
Java bindings for regex replace (#8847) @jlowe
Remove make strings children with null mask (#8830) @davidwendt
Refactor conditional joins (#8815) @vyasr
Small cleanup (unused headers / commented code removals) (#8799) @codereport
ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan
Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi
Refactor and improve join benchmarks with nvbench (#8734) @PointKernel
Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr
Optimize URL Decoding (#8622) @gaohao95
Parquet writer dictionary encoding refactor (#8476) @devavret
Use nvcomp's snappy decompression in parquet reader (#8252) @devavret
Use nvcomp's snappy compressor in parquet writer (#8229) @devavret

cudf - v21.08.03

Published by GPUtester about 3 years ago

v21.08.03

cudf - v21.08.02

Published by GPUtester about 3 years ago

v21.08.02

cudf - v21.08.01

Published by GPUtester about 3 years ago

v21.08.01

cudf - v21.08.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Remove unused cudf::strings::create_offsets (#8663) @davidwendt
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Add accurate hash join size functions (#8453) @PointKernel
Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
Remove special Index class from the general index class hierarchy (#8309) @vyasr
Add first-class dtype utilities (#8308) @vyasr
ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🐛 Bug Fixes

Fix contains check in string column (#8834) @galipremsagar
Remove unused variable from row_bit_count_test. (#8829) @mythrocks
Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
Handle empty child columns in row_bit_count() (#8791) @mythrocks
Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
Fix isort error in utils.pyx (#8771) @charlesbluca
Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca
Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar
Write pandas extension types to parquet file metadata (#8749) @devavret
Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar
Fix replace to handle null values correctly (#8744) @galipremsagar
Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar
Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
Fix orc reader assert on create data_type in debug (#8706) @davidwendt
Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid
Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca
JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
Pin *arrow to use *cuda in run (#8651) @jakirkham
Add proper support for tolerances in testing methods. (#8649) @vyasr
Support multi-char case conversion in capitalize function (#8647) @davidwendt
Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
Temporarily disable libcudf example build tests (#8642) @isVoid
Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
Fix bug where columns are only initialized once when columns and index are specified in the DataFrame constructor (#8628) @isVoid
Propagate **kwargs through to as_*_column methods (#8618) @shwina
Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
Fix missed renumbering of Aggregation values (#8600) @revans2
Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca
Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar
Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
Properly retrieve last column when -1 is specified for column index (#8529) @isVoid
Fix importing apply from dask (#8517) @galipremsagar
Fix offset of the string dictionary length stream (#8515) @vuule
Fix double counting of selected columns in CSV reader (#8508) @ochan1
Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
Disallow groupby aggs for StructColumns (#8499) @charlesbluca
Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
Adding support for writing empty dataframe (#8490) @shaneding
Fix exclusive scan when including nulls and improve testing (#8478) @harrism
Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
Add nightly version for ucx-py in ci script (#8419) @galipremsagar
Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

📖 Documentation

Update Python UDFs notebook (#8810) @brandon-b-miller
Fix dask.dataframe API docs links after reorg (#8772) @jsignell
Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
Custom Sphinx Extension: PandasCompat (#8643) @isVoid
Fix README.md (#8535) @ajschmidt8
Change namespace contains_nulls to struct (#8523) @davidwendt
Add info about NVTX ranges to dev guide (#8461) @jrhemstad
Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

🚀 New Features

Fix concatenating structs (#8811) @shaneding
Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia
Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca
Implement __setitem__ for StructColumn (#8737) @shaneding
Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid
Add struct.explode() method (#8729) @shwina
Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina
Add support for list type in ORC writer (#8723) @vuule
Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
Add datetime::is_leap_year (#8711) @isVoid
Accessing struct columns from dask_cudf (#8675) @shaneding
Added pct_change to Series (#8650) @TravisHester
Add strings support to cudf::shift function (#8648) @davidwendt
Support Scatter struct_scalar (#8630) @isVoid
Struct scalar from host dictionary (#8629) @shaneding
Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
JNI support for capitalize (#8624) @firestarman
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Add NVBench in CMake (#8619) @PointKernel
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
ListColumn __setitem__ (#8606) @brandon-b-miller
Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
Benchmark for strings::repeat_strings APIs (#8589) @ttnghia
Nested scalar support for copy if else (#8588) @gerashegalov
User specified decimal columns to float64 (#8587) @jdye64
Add get_element for struct column (#8578) @isVoid
Python changes for adding __getitem__ for struct (#8577) @shaneding
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia
Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
Decimal support csv reader (#8511) @elstehle
Add column type tests (#8505) @isVoid
Warn when downscaling decimal columns (#8492) @ChrisJar
Add JNI for strings::repeat_strings (#8491) @ttnghia
Add Index.get_loc for Numerical, String Index support (#8489) @isVoid
Expose half_up rounding in cuDF (#8477) @shwina
Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
Add str.edit_distance_matrix (#8463) @isVoid
Support constructing cudf.Scalar objects from host side lists (#8459) @brandon-b-miller
Add accurate hash join size functions (#8453) @PointKernel
Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt
Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller
JNI bindings for sort_lists (#8439) @sperlingxx
Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
Replace all_null() and all_valid() by iterator_all_nulls() and iterator_no_null() in tests (#8437) @ttnghia
Implement groupby MERGE_LISTS and MERGE_SETS aggregates (#8436) @ttnghia
Add public libcudf match_dictionaries API (#8429) @davidwendt
Add move constructors for string_scalar and struct_scalar (#8428) @ttnghia
Implement strings::repeat_strings (#8423) @ttnghia
STRUCT column support for cudf::merge. (#8422) @nvdbaranec
Implement reverse in libcudf (#8410) @shaneding
Support multiple input files/buffers for read_json (#8403) @jdye64
Improve test coverage for struct search (#8396) @ttnghia
Add groupby.fillna (#8362) @isVoid
Enable AST-based joining (#8214) @vyasr
Generalized null support in user defined functions (#8213) @brandon-b-miller
Add compiled binary operation (#8192) @karthikeyann
Implement .describe() for DataFrameGroupBy (#8179) @skirui-source
ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
Add Python bindings for lists::concatenate_list_elements and expose them as .list.concat() (#8006) @shwina
Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64
Example to build custom application and link to libcudf (#7671) @isVoid
Upgrade arrow to 4.0.1 (#7495) @galipremsagar
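
To make "half_up rounding" (#8477) concrete: half-up rounding resolves ties (a trailing 5) by rounding away from the midpoint in a fixed direction, whereas half-even ("banker's") rounding sends ties to the nearest even digit. The sketch below contrasts the two using Python's standard-library `decimal` module, where `ROUND_HALF_UP` rounds ties away from zero. The helper `round_decimal` is hypothetical, and the sketch makes no claim about the cuDF parameter names or about how cuDF treats negative ties.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_decimal(text, places, mode):
    """Round a decimal string to `places` fractional digits using `mode`."""
    quantum = Decimal(1).scaleb(-places)  # places=1 -> Decimal('0.1')
    return Decimal(text).quantize(quantum, rounding=mode)


for text in ("0.25", "0.35", "-0.25"):
    half_up = round_decimal(text, 1, ROUND_HALF_UP)
    half_even = round_decimal(text, 1, ROUND_HALF_EVEN)
    print(text, "half_up:", half_up, "half_even:", half_even)
```

The two modes agree on `0.35` (both give `0.4`) and differ only on ties whose preceding digit is even, such as `0.25` (`0.3` versus `0.2`).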

🛠️ Improvements

Provide a better error message when CUDA::cuda_driver not found (#8794) @robertmaynard
Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec
Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard
Pin mimesis to <4.1 (#8745) @galipremsagar
Update conda environment name for CI (#8692) @ajschmidt8
Remove flatbuffers dependency (#8671) @Ethyling
Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt
Remove unused cudf::strings::create_offsets (#8663) @davidwendt
Update GDS lib version to 1.0.0 (#8654) @pxLi
Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee
Fix usage of deprecated arrow ipc API (#8632) @revans2
Use absolute imports in cudf (#8631) @galipremsagar
ENH Add Java CI build script (#8627) @dillon-cullinan
Add DeprecationWarning to ser.str.subword_tokenize (#8603) @VibhuJawa
Rewrite binary operations for improved performance and additional type support (#8598) @vyasr
Fix mypy errors surfacing because of numpy-1.21.0 (#8595) @galipremsagar
Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt
Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard
Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt
Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk
Remove checking if an unsigned value is less than zero (#8579) @robertmaynard
Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt
Make cudf.api.types imports consistent (#8571) @galipremsagar
Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid
Rename concatenate_tests.cu to .cpp (#8555) @davidwendt
enable window lead/lag test on struct (#8548) @wbo4958
Add Java methods to split and write column views (#8546) @razajafri
Small cleanup (#8534) @codereport
Unpin dask version in CI (#8533) @galipremsagar
Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64
Minor clean up of various internal column and frame utilities (#8528) @vyasr
Rename some copying_test source files .cu to .cpp (#8527) @davidwendt
Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard
Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard
Correct unused parameter warnings in string algorithms (#8509) @robertmaynard
Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2
Fix 21.08 forward-merge conflicts (#8502) @ajschmidt8
Fix Cython formatting command in Contributing.md. (#8496) @marlenezw
Correct unused parameter warnings in reshape and text (#8495) @robertmaynard
Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard
Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard
Refactor index construction (#8485) @vyasr
Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard
Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard
Correct unused parameter warnings in io algorithms (#8480) @robertmaynard
Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard
Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard
Correct unused parameter warnings in groupby (#8467) @robertmaynard
use libcu++ time_point as timestamp (#8466) @karthikeyann
Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt
Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev
Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar
Fix conflicts in 8447 (#8448) @ajschmidt8
Add serialization methods for List and StructDtype (#8441) @charlesbluca
Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt
JNI bindings for get_element (#8433) @revans2
Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
Unpin dask version on CI (#8425) @galipremsagar
Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt
Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
Add benchmark for strings/integers convert APIs (#8402) @davidwendt
Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora
Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard
Correct unused parameters in column round and search (#8389) @robertmaynard
Add functionality to apply Dtype metadata to ColumnBase (#8373) @charlesbluca
Refactor setting stack size in regex code (#8358) @davidwendt
Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi
Replace remaining uses of device_vector (#8343) @harrism
Statically link libnvcomp into libcudfjni (#8334) @jlowe
Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar
Minor code refactor for sorted_order (#8326) @wbo4958
Remove special Index class from the general index class hierarchy (#8309) @vyasr
Add first-class dtype utilities (#8308) @vyasr
Add option to link Java bindings with Arrow dynamically (#8307) @jlowe
Refactor ColumnMethods and its subclasses to remove column argument and require parent argument (#8306) @shwina
Refactor scatter for list columns (#8255) @isVoid
Expose pack/unpack API to Python (#8153) @charlesbluca
Adding cudf.cut method (#8002) @marlenezw
Optimize string gather performance for large strings (#7980) @gaohao95
Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret
Updating Clang Version to 11.0.0 (#6695) @codereport

cudf - v21.06.01

Published by GPUtester over 3 years ago

cudf - v21.06.00

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid

🐛 Bug Fixes

Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks
Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar
Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar
Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar
Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Update Java string concatenate test for single column (#8330) @tgravescs
Use empty_like in scatter (#8314) @revans2
Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
Update io util to convert path like object to string (#8275) @ayushdg
Fix result column types for empty inputs to rolling window (#8274) @mythrocks
Actually test equality in assert_groupby_results_equal (#8272) @shwina
CMake: always explicitly specify a source file's extension (#8270) @robertmaynard
Fix struct binary search and struct flattening (#8268) @ttnghia
Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
upgrade dlpack to 0.5 (#8262) @cwharris
Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
Fix incorrect assertion in Java concat (#8258) @sperlingxx
Copy nested types upon construction (#8244) @isVoid
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Clip decimal binary op precision at max precision (#8194) @ChrisJar

📖 Documentation

Add docstring for dask_cudf.read_csv (#8355) @galipremsagar
Fix cudf release version in readme (#8331) @galipremsagar
Fix structs column description in dev docs (#8318) @isVoid
Update readme with correct CUDA versions (#8315) @raydouglass
Add description of the cuIO GDS integration (#8293) @vuule
Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

🚀 New Features

Add support for merging between categorical data (#8332) @galipremsagar
Java: Support struct scalar (#8327) @sperlingxx
added _is_homogeneous property (#8299) @shaneding
Added decimal writing for CSV writer (#8296) @kaatish
Java: Support creating a scalar from utf8 string (#8294) @firestarman
Add Java API for Concatenate strings with separator (#8289) @tgravescs
strings::join_list_elements options for empty list inputs (#8285) @ttnghia
Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
add unit tests for lead/lag on list for row window (#8259) @wbo4958
Create a String column from UTF8 String byte arrays (#8257) @firestarman
Support scattering list_scalar (#8256) @isVoid
Implement lists::concatenate_list_elements (#8231) @ttnghia
Support for struct scalars. (#8220) @nvdbaranec
Add support for decimal types in ORC writer (#8198) @vuule
Support create lists column from a list_scalar (#8185) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid
Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid

🛠️ Improvements

Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
Add aliases for string methods (#8353) @shwina
Update environment variable used to determine cuda_version (#8321) @ajschmidt8
JNI: Refactor the code of making column from scalar (#8310) @firestarman
Update CHANGELOG.md links for calver (#8303) @ajschmidt8
Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8
use address and length for GDS reads/writes (#8301) @rongou
Update cudfjni version to 21.06.0 (#8292) @pxLi
Update docs build script (#8284) @ajschmidt8
Make device_buffer streams explicit and enforce move construction (#8280) @harrism
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
Update cudfjni version to 21.06 (#8267) @pxLi
support RMM aligned resource adapter in JNI (#8266) @rongou
Pass compiler environment variables to conda python build (#8260) @Ethyling
Remove abc inheritance from Serializable (#8254) @vyasr
Move more methods into SingleColumnFrame (#8253) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
Correct unused parameters in the copying algorithms (#8232) @robertmaynard
IO statistics cleanup (#8191) @kaatish
Refactor of rolling_window implementation. (#8158) @nvdbaranec
Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
Column refactoring 2 (#8130) @vyasr
support space in workspace (#7956) @jolorunyomi
Support collect_set on rolling window (#7881) @sperlingxx

cudf - v0.19.2

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

unsnap: busy wait a number of cycles (#8073) @vuule
Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.1

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in DataFrame.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.0

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in DataFrame.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport
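
As a note on the logical_cast to bit_cast rename (#7373) listed above: a bit cast reinterprets the same bytes as another type of equal width instead of converting the numeric value. The sketch below illustrates that general idea in plain Python with the standard struct module. It is not cudf code, the helper names are hypothetical, and which type pairs libcudf actually accepts is not stated in this changelog.

```python
import struct


def bit_cast_f32_to_i32(x: float) -> int:
    """Reinterpret the 4 bytes of an IEEE-754 float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def value_cast_f32_to_i32(x: float) -> int:
    """Ordinary cast for comparison: converts the numeric value, truncating toward zero."""
    return int(x)


# Hypothetical helper names; this only demonstrates the concept.
print(bit_cast_f32_to_i32(1.0))    # 1065353216 (0x3F800000)
print(value_cast_f32_to_i32(1.0))  # 1
```

The two results differ because the bit cast never looks at the value 1.0, only at its byte pattern.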

cudf - v0.18.1

Published by GPUtester over 3 years ago

cudf - v0.18.0

Published by GPUtester over 3 years ago

Breaking Changes 🚨

Default groupby to sort=False (#7180) @isVoid
Add libcudf API for parsing of ORC statistics (#7136) @vuule
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Replace parquet writer api with class (#7058) @rgsl888prabhu
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
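
To show why "Default groupby to sort=False" (#7180) is listed as breaking: code that relied on group keys coming back sorted now has to ask for it. The sketch below is a plain-Python stand-in, not cudf itself, and group_sum is a hypothetical helper. In this stand-in the unsorted order happens to be first appearance; with cuDF it seems safer to treat the order as unspecified unless sort=True is passed.

```python
from collections import defaultdict


def group_sum(keys, values, sort=False):
    """Sum `values` per key; sort the group keys only when asked to."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    items = list(totals.items())
    if sort:
        items.sort()  # the extra ordering step that sort=False skips
    return items


keys = ["b", "a", "b", "c", "a"]
values = [1, 2, 3, 4, 5]

unsorted_result = group_sum(keys, values)           # same totals, key order not sorted
sorted_result = group_sum(keys, values, sort=True)  # [('a', 7), ('b', 4), ('c', 4)]
```

Callers that compare results against pandas output, or index into the result by position, are the ones most likely to need sort=True after this change.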

Bug Fixes 🐛

Remove incorrect std::move call on return variable (#7319) @davidwendt
Fix failing CI ORC test (#7313) @vuule
Disallow constructing frames from a ColumnAccessor (#7298) @shwina
fix java cuFile tests (#7296) @rongou
Fix style issues related to NumPy (#7279) @shwina
Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
Move lists utility function definition out of header (#7266) @mythrocks
Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
Disallow picking output columns from nested columns. (#7248) @devavret
Fix loc for Series with a MultiIndex (#7243) @shwina
Fix Arrow column test leaks (#7241) @tgravescs
Fix test column vector leak (#7238) @kuhushukla
Fix some bugs in java scalar support for decimal (#7237) @revans2
Improve assert_eq handling of scalar (#7220) @isVoid
Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
Remove floating point types from radix sort fast-path (#7215) @davidwendt
Fixing parquet benchmarks (#7214) @rgsl888prabhu
Handle various parameter combinations in replace API (#7207) @galipremsagar
Export mock aws credentials for s3 tests (#7176) @ayushdg
Add MultiIndex.rename API (#7172) @isVoid
Fix importing list & struct types in from_arrow (#7162) @galipremsagar
Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
Update s3 tests to use moto_server (#7144) @ayushdg
Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
Fix compilation errors in libcudf (#7138) @galipremsagar
Fix compilation failure caused by -Wall addition. (#7134) @codereport
Add informative error message for sep in CSV writer (#7095) @galipremsagar
Add JIT cache per compute capability (#7090) @devavret
Implement __hash__ method for ListDtype (#7081) @galipremsagar
Only upload packages that were built (#7077) @raydouglass
Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
Fix read_orc for decimal type (#7034) @rgsl888prabhu
Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
Decimal casts in JNI became a NOOP (#7032) @revans2
Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
Skip Thrust sort patch if already applied (#7009) @harrism
Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
Fix Thrust unroll patch command (#7002) @harrism
Fix loc behaviour when key of incorrect type is used (#6993) @shwina
Fix int to datetime conversion in csv_read (#6991) @kaatish
fix excluding cufile tests by default (#6988) @rongou
Fix java cufile tests when cufile is not installed (#6987) @revans2
Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
Fix type comparison for java (#6970) @revans2
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
Fix typo in numerical.py (#6957) @rgsl888prabhu
fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
fix libcu++ include path for jni (#6948) @rongou
Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
Fix N/A detection for empty fields in CSV reader (#6922) @vuule
Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
Correct the sampling range when sampling with replacement (#6884) @ChrisJar
Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
Fix columns & index handling in dataframe constructor (#6838) @galipremsagar
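
The HALF_EVEN fix for negative integers (#7014) concerns a case that is easy to get wrong with integer arithmetic. The following is a minimal sketch of half-even rounding of integers to a negative number of decimal places. It is written in plain Python, is not the libcudf implementation, and the function name is hypothetical. It relies on Python's floor division, so the remainder is never negative, which is what keeps negative inputs correct.

```python
def round_half_even(value: int, decimal_places: int) -> int:
    """Round an integer to tens, hundreds, ... (negative decimal_places), ties to even."""
    if decimal_places >= 0:
        return value  # an integer has no fractional digits to round away
    factor = 10 ** (-decimal_places)
    # divmod floors toward negative infinity, so 0 <= remainder < factor
    quotient, remainder = divmod(value, factor)
    twice = 2 * remainder
    if twice > factor or (twice == factor and quotient % 2 == 1):
        quotient += 1  # round up past the midpoint, or on a tie when the lower multiple is odd
    return quotient * factor


print(round_half_even(25, -1))   # 20
print(round_half_even(35, -1))   # 40
print(round_half_even(-25, -1))  # -20
print(round_half_even(-35, -1))  # -40
```

For integers this agrees with Python's built-in round(value, decimal_places), which also rounds ties to even.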

Documentation 📖

Update readme (#7318) @shwina
Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
Update doxyfile project number (#7161) @davidwendt
Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
Add documentation for supported dtypes in all IO formats (#7139) @galipremsagar
Add groupby docs (#7100) @shwina
Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
Make Doxygen comments formatting consistent (#7041) @vuule
Add docs for working with missing data (#7010) @galipremsagar
Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
libcudf Developer Guide (#6977) @harrism
Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

New Features 🚀

Support numeric_only field for rank() (#7213) @isVoid
Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
Implement COLLECT rolling window aggregation (#7189) @mythrocks
Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
Default groupby to sort=False (#7180) @isVoid
Add libcudf lists column count_elements API (#7173) @davidwendt
Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
Adding support for explode to cuDF (#7140) @hyperbolic2346
Add libcudf API for parsing of ORC statistics (#7136) @vuule
update GDS/cuFile location for 0.9 release (#7131) @rongou
Add Segmented sort (#7122) @karthikeyann
Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
Add scale and value methods to fixed_point (#7109) @codereport
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Improve digitize API (#7071) @isVoid
Add List types support in data generator (#7064) @galipremsagar
cudf::scan support for decimal32 and decimal64 (#7063) @codereport
cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
Replace parquet writer api with class (#7058) @rgsl888prabhu
Support contains() on lists of primitives (#7039) @mythrocks
Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
Add ffill and bfill to string columns (#7036) @isVoid
Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
Add method field to fillna for fixed width columns (#6998) @isVoid
Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
Add Index.set_names api (#6929) @galipremsagar
Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
Implement update() function (#6883) @skirui-source
Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
Implement cudf.DateOffset for months (#6775) @brandon-b-miller
Add Python DecimalColumn (#6715) @shwina
Add dictionary support to libcudf groupby functions (#6585) @davidwendt

Improvements 🛠️

Update stale GHA with exemptions & new labels (#7395) @mike-wendt
Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
Unpin from numpy < 1.20 (#7335) @shwina
Prepare Changelog for Automation (#7309) @galipremsagar
Prepare Changelog for Automation (#7272) @ajschmidt8
Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
Add dictionary column support to rolling_window (#7186) @davidwendt
Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
Fast path single column sort (#7167) @davidwendt
Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
Refactor cudf::string_view host and device code (#7159) @davidwendt
Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
Add Java interface for the new API 'explode' (#7151) @firestarman
Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
Update JNI for contiguous_split packed results (#7127) @jlowe
Add JNI and Java bindings for list_contains (#7125) @kuhushukla
Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
verify window operations on decimal with java tests (#7120) @sperlingxx
Adds in JNI support for creating a list column from existing columns (#7112) @revans2
Build libcudf with -Wall (#7105) @trxcllnt
Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
Add pyorc to dev environment (#7085) @galipremsagar
JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
Fastpath single strings column in cudf::sort (#7075) @davidwendt
Upgrade nvcomp to 1.2.1 (#7069) @rongou
Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
Add Java tests for decimal casts (#7051) @sperlingxx
Auto-label PRs based on their content (#7044) @jolorunyomi
Create sort gbenchmark for strings column (#7040) @davidwendt
Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
Spark Murmur3 hash functionality (#7024) @rwlee
Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
Adding decimal writing support to parquet (#7017) @hyperbolic2346
Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
Check output size overflow on strings gather (#6997) @davidwendt
Improve representation of MultiIndex (#6992) @galipremsagar
Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
Minor cudf::round internal refactoring (#6976) @codereport
Add Java bindings for URL conversion (#6972) @jlowe
Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
Add in basic support to JNI for logical_cast (#6954) @revans2
Remove duplicate file array_tests.cpp (#6953) @karthikeyann
Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
Use simplified rmm::exec_policy (#6939) @harrism
Add null count test for apply_boolean_mask (#6903) @harrism
Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
Remove **kwargs from string/categorical methods (#6750) @shwina
Refactor rolling.cu to reduce compile time (#6512) @mythrocks
Add static type checking via Mypy (#6381) @shwina
Update to official libcu++ on Github (#6275) @trxcllnt
