cudf

cuDF - GPU DataFrame Library

APACHE-2.0 License

Downloads: 13.3K · Stars: 7.2K · Committers: 246

cudf - v22.06.01

Published by GPUtester over 2 years ago

v22.06.01

cudf - v22.06.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
  • Rename sliced_child to get_sliced_child. (#10885) @bdice
  • Add parameters to control page size in Parquet writer (#10882) @etseidl
  • Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
  • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
  • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
  • Generic serialization of all column types (#10784) @wence-
  • Return per-file metadata from readers (#10782) @vuule
  • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
  • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
  • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
  • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
  • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
  • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
  • Add default= kwarg to .list.get() accessor method (#10547) @shwina
  • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
  • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
  • Fix findall_record to return empty list for no matches (#10491) @davidwendt
  • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
  • Additional refactoring of hash functions (#10462) @bdice
  • Fix default value of str.split expand parameter. (#10457) @bdice
  • Remove deprecated code. (#10450) @vyasr

🐛 Bug Fixes

  • Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar
  • Make SerializedTableHeader(numRows) public (#10949) @gerashegalov
  • Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar
  • Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar
  • Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
  • Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt
  • Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca
  • Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel
  • Fix constness / references in weak ordering operator() signatures. (#10846) @bdice
  • Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard
  • Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca
  • Fix compile warning in search.cu (#10827) @davidwendt
  • Fix element access const correctness in hostdevice_vector (#10804) @vuule
  • Update cuco git tag (#10788) @PointKernel
  • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
  • Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346
  • Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar
  • Fix construction of nested structs with EMPTY child (#10761) @shwina
  • Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt
  • Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec
  • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
  • Fix cupy function in notebook (#10737) @ajschmidt8
  • Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar
  • Fix scatter for all-empty-string column case (#10724) @davidwendt
  • Retain series name in Series.apply (#10716) @brandon-b-miller
  • Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard
  • Fix list of testing requirements in setup.py. (#10678) @bdice
  • Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt
  • cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard
  • Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt
  • Verify compression type in Parquet reader (#10610) @vuule
  • Fix struct row comparator's exception on empty structs (#10604) @sperlingxx
  • Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt
  • Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333
  • Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice
  • Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe
  • pin more cmake versions (#10570) @robertmaynard
  • Re-enable Build Metrics Report (#10562) @davidwendt
  • Remove statically linked CUDA runtime check in Java build (#10532) @jlowe
  • Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller
  • Update pre-commit to run black 22.3.0 (#10523) @vyasr
  • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
  • Fix findall_record to return empty list for no matches (#10491) @davidwendt
  • Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule
  • Fix default value of str.split expand parameter. (#10457) @bdice
  • Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca
  • Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca
  • Fix in-place updates with loc or iloc when the LHS has more than one column (#9918) @skirui-source

📖 Documentation

  • Clarify append deprecation notice. (#10930) @bdice
  • Use full name of GPUDirect Storage SDK in docs (#10904) @vuule
  • Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque
  • Add missing documentation in cudf/types.hpp (#10895) @karthikeyann
  • Add strong index iterator docs. (#10888) @bdice
  • spell check fixes (#10865) @karthikeyann
  • Add missing documentation in scalar/ headers (#10861) @karthikeyann
  • Remove typo in ngram documentation (#10859) @miguelusque
  • fix doxygen warnings (#10842) @karthikeyann
  • Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr
  • Add NumPy to intersphinx references. (#10809) @bdice
  • Add a section to the docs that compares cuDF with Pandas (#10796) @shwina
  • Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt
  • Enable pydocstyle for all packages. (#10759) @bdice
  • Enable pydocstyle rules involving quotes (#10748) @vyasr
  • Revise 10 minutes notebook. (#10738) @bdice
  • Reorganize cuDF Python docs (#10691) @shwina
  • Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller
  • Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty
  • add data generation to benchmark documentation (#10677) @karthikeyann
  • Fix some docs build warnings (#10674) @galipremsagar
  • Update UDF notebook in User Guide. (#10668) @bdice
  • Improve User Guide docs (#10663) @bdice
  • Fix some docstrings formatting (#10660) @galipremsagar
  • Remove implementation details from apply docstrings (#10651) @brandon-b-miller
  • Revise CONTRIBUTING.md (#10644) @bdice
  • Add missing APIs to documentation. (#10643) @bdice
  • Use cudf.read_json as documented API name. (#10640) @bdice
  • Fix docstring section headings. (#10639) @bdice
  • Document cudf.read_text and cudf.read_avro. (#10638) @bdice
  • Fix typo in docstring for json_reader_options (#10627) @dagardner-nv
  • Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller
  • Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt
  • Add Replace Backreferences section to Regex Features page (#10560) @davidwendt
  • Introduce deprecation policy to developer guide. (#10252) @vyasr

🚀 New Features

  • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
  • Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec
  • Strong index types for equality comparator (#10883) @ttnghia
  • Add parameters to control page size in Parquet writer (#10882) @etseidl
  • Support for Zstandard decompression in ORC reader (#10873) @vuule
  • Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard
  • Support for Zstandard decompression in Parquet reader (#10847) @vuule
  • Add JNI support for apply_boolean_mask (#10812) @res-life
  • Segmented Min/Max for Fixed Point Types (#10794) @isVoid
  • Return per-file metadata from readers (#10782) @vuule
  • Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks
  • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
  • Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks
  • Add detail::hash_join (#10695) @PointKernel
  • Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346
  • Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina
  • JNI: Add generateListOffsets API (#10683) @sperlingxx
  • Support args in groupby apply (#10682) @brandon-b-miller
  • Enable segmented_gather in Java package (#10669) @sperlingxx
  • Add row hasher with nested column support (#10641) @devavret
  • Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse
  • First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346
  • Add support for struct columns to the random table generator (#10566) @vuule
  • Enable passing a sequence for the index argument to .list.get() (#10564) @shwina
  • Add python bindings for cudf::list::index_of (#10549) @ChrisJar
  • Add default= kwarg to .list.get() accessor method (#10547) @shwina
  • Add cudf.DataFrame.applymap (#10542) @brandon-b-miller
  • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
  • Add column field ID control in parquet writer (#10504) @PointKernel
  • Deprecate Series.applymap (#10497) @brandon-b-miller
  • Add option to drop cache in cuIO benchmarks (#10488) @vuule
  • move benchmark input generation in device in reduction nvbench (#10486) @karthikeyann
  • Support Segmented Min/Max Reduction on String Type (#10447) @isVoid
  • List element Equality comparator (#10289) @devavret
  • Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann
  • Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

🛠️ Improvements

  • Use conda compilers in env file (#10915) @galipremsagar
  • Remove C style artifacts in cuIO (#10886) @vuule
  • Rename sliced_child to get_sliced_child. (#10885) @bdice
  • Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333
  • Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia
  • Changing list_view.cuh to list_view.hpp (#10854) @ttnghia
  • More error checking in from_dlpack (#10850) @wence-
  • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
  • Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina
  • Add missing cuda-python dependency to cudf (#10833) @bdice
  • Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt
  • Split up search.cu to improve compile time (#10831) @davidwendt
  • Add tests for null scalar binaryops (#10828) @brandon-b-miller
  • Cleanup regex compile optimize functions (#10825) @davidwendt
  • Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar
  • Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller
  • Refactor regex builtin character-class identifiers (#10814) @davidwendt
  • Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt
  • Make the JNI API to get list offsets as a view public. (#10807) @revans2
  • Add cudf JNI docker build github action (#10806) @pxLi
  • Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi
  • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
  • Handle closed property in IntervalDtype.from_pandas (#10798) @wence-
  • Return weak orderings from device_row_comparator. (#10793) @rwlee
  • Rework Scalar imports (#10791) @brandon-b-miller
  • Enable ccache for cudfjni build in Docker (#10790) @gerashegalov
  • Generic serialization of all column types (#10784) @wence-
  • simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346
  • Use column_views instead of column_device_views in binary operations. (#10780) @bdice
  • Add struct utility functions. (#10776) @bdice
  • Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt
  • Refactor host decompression in ORC reader (#10764) @vuule
  • Flush output streams before creating a process to drop caches (#10762) @vuule
  • Refactor binaryop/compiled/util.cpp (#10756) @bdice
  • Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt
  • Use generator expressions in any/all functions. (#10736) @bdice
  • Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice
  • Improve use of isinstance. (#10734) @bdice
  • Rename tests from multiIndex to multiindex. (#10732) @bdice
  • Two-table comparators with strong index types (#10730) @bdice
  • Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann
  • Use structured bindings instead of std::tie (#10726) @karthikeyann
  • Missing f prefix on f-strings fix (#10721) @code-review-doctor
  • Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar
  • Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid
  • Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora
  • Remove or simplify various utility functions (#10705) @vyasr
  • Allow building arrow with parquet and not python (#10702) @revans2
  • Partial cuIO GPU decompression refactor (#10699) @vuule
  • Cython API refactor: merge.pyx (#10698) @isVoid
  • Fix random string data length to become variable (#10697) @galipremsagar
  • Add bindings for index_of with column search key (#10696) @ChrisJar
  • Deprecate index merging (#10689) @vyasr
  • Remove cudf::strings::string namespace (#10684) @davidwendt
  • Standardize imports. (#10680) @bdice
  • Standardize usage of collections.abc. (#10679) @bdice
  • Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid
  • Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt
  • Split up mixed-join kernels source files (#10671) @davidwendt
  • Use std::filesystem for temporary directory location and deletion (#10664) @vuule
  • cleanup benchmark includes (#10661) @karthikeyann
  • Use upstream clang-format pre-commit hook. (#10659) @bdice
  • Clean up C++ includes to use <> instead of "". (#10658) @bdice
  • Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina
  • Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe
  • Use conda to build python packages during GPU tests (#10648) @Ethyling
  • Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr
  • Update pinning to allow newer CMake versions. (#10646) @vyasr
  • Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]
  • Remove concurrent_unordered_multimap. (#10642) @bdice
  • Improve parquet dictionary encoding (#10635) @PointKernel
  • Improve cudf::cuda_error (#10630) @sperlingxx
  • Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711
  • Branch 22.06 merge 22.04 (#10624) @vyasr
  • Unpin dask & distributed for development (#10623) @galipremsagar
  • Slightly improve accuracy of stod in to_floats (#10622) @davidwendt
  • Allow libcudfjni to be built as a static library (#10619) @jlowe
  • Change stack-based regex state data to use global memory (#10600) @davidwendt
  • Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar
  • KvikIO as an alternative GDS backend (#10593) @madsbk
  • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
  • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
  • Refactor binary ops for timedelta and datetime columns (#10581) @vyasr
  • Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt
  • Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk
  • Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov
  • Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt
  • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
  • Cleanup libcudf strings regex classes (#10573) @davidwendt
  • Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr
  • Reduce kernel calls to build strings findall results (#10559) @davidwendt
  • Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice
  • Update strings contains benchmark to measure varying match rates (#10555) @davidwendt
  • JNI: throw CUDA errors more specifically (#10551) @sperlingxx
  • Enable building static libs (#10545) @trxcllnt
  • Remove pip requirements files. (#10543) @bdice
  • Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr
  • Refactor memory_usage to improve performance (#10537) @galipremsagar
  • Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx
  • add accidentally removed comment. (#10526) @vyasr
  • Update conda environment. (#10525) @vyasr
  • Remove ColumnBase.getitem (#10516) @vyasr
  • Optimize left_semi_join by materializing the gather mask (#10511) @cheinger
  • Define proper binary operation APIs for columns (#10509) @vyasr
  • Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar
  • Update to Thrust 1.16 (#10489) @bdice
  • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
  • Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi
  • Use Lists of Columns for Various Files (#10463) @isVoid
  • Additional refactoring of hash functions (#10462) @bdice
  • Fix Series.str.findall behavior for expand=False. (#10459) @bdice
  • Remove deprecated code. (#10450) @vyasr
  • Update cmake-format version. (#10440) @vyasr
  • Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8
  • Use conda compilers (#10275) @Ethyling
  • Add row bitmask as a detail::hash_join member (#10248) @PointKernel
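Several of the Python clean-ups listed above (#10736, #10735, #10721) are general idioms rather than cuDF-specific changes. The sketch below illustrates them in plain Python. The `Point` class and `has_negative` helper are hypothetical names invented for this example and are not taken from the cuDF code base.

```python
class Point:
    """Hypothetical class used only for illustration (not part of cuDF)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # The f prefix is required; without it the braces are kept literally.
        return f"Point(x={self.x}, y={self.y})"


def has_negative(values):
    # Generator expression: any() can stop at the first match and no
    # intermediate list is built, unlike any([v < 0 for v in values]).
    return any(v < 0 for v in values)


p = Point(1, -2)
text = repr(p)  # preferred over calling p.__repr__() directly
found = has_negative([p.x, p.y])
print(text, found)
```

Calling the built-in `repr()` keeps the call site independent of how the method is implemented, and passing a generator expression to `any()` or `all()` avoids materializing a temporary list.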

cudf - v22.04.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
  • Refactor stream compaction APIs (#10370) @PointKernel
  • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
  • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
  • Rewrites sample API (#10262) @isVoid
  • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
  • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
  • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
  • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
  • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
  • Remove deprecated code (#10124) @vyasr
  • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
  • Optimize compaction operations (#10030) @PointKernel
  • Remove deprecated method Series.set_index. (#9945) @bdice
  • Add cudf::strings::findall_record API (#9911) @davidwendt
  • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

  • Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
  • Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
  • Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
  • Fix for integer overflow in contiguous-split (#10437) @jbrennan333
  • Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
  • Fix empty reduce with List output and non-List input (#10435) @sperlingxx
  • Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
  • Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
  • Support cupy array in quantile input (#10429) @galipremsagar
  • Fix benchmarks to work with new aggregation types (#10428) @davidwendt
  • Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
  • Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
  • Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
  • Limiting async allocator using alignment of 512 (#10395) @rongou
  • Include <optional> in multibyte split. (#10385) @bdice
  • Fix issue with column and scalar re-assignment (#10377) @galipremsagar
  • Fix floating point data generation in benchmarks (#10372) @vuule
  • Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
  • Remove is_relationally_comparable for table device views (#10342) @davidwendt
  • Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
  • Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
  • Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
  • Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
  • Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
  • Fix documentation issues (#10307) @ajschmidt8
  • Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
  • Fix incorrect slicing of GDS read/write calls (#10274) @vuule
  • Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
  • Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
  • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
  • Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
  • Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
  • Fix small leak in explode (#10245) @revans2
  • Yet another small JNI memory leak (#10238) @revans2
  • Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
  • Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
  • Fix JNI leak on copy to device (#10229) @revans2
  • Fix the data generator element size for decimal types (#10225) @vuule
  • Fix decimal metadata in parquet writer (#10224) @galipremsagar
  • Fix strings handling of hex in regex pattern (#10220) @davidwendt
  • Fix docs builds (#10216) @ajschmidt8
  • Fix a leftover _has_nulls change from Nullate (#10211) @devavret
  • Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
  • Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
  • Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
  • Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
  • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
  • Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
  • Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
  • benchmark fixture - static object pointer fix (#10145) @karthikeyann
  • Fix UDF Caching (#10133) @brandon-b-miller
  • Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
  • Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
  • Encode values from python callback for C++ (#10103) @jdye64
  • Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
  • Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
  • Column equality testing fixes (#10011) @brandon-b-miller
  • Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

  • Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
  • Add cut to API docs (#10479) @shwina
  • Remove documentation for methods removed in #10124. (#10366) @bdice
  • Fix documentation issues (#10306) @ajschmidt8
  • Fix fixed_point binary operation documentation (#10198) @codereport
  • Remove cleaned up methods from docs (#10189) @galipremsagar
  • Update developer guide to recommend no default stream parameter. (#10136) @bdice
  • Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

  • Add StringIO support to read_text (#10465) @cwharris
  • Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
  • JNI support for Collect Ops in Reduction (#10427) @sperlingxx
  • Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
  • Add cudf::stable_sort_by_key (#10387) @PointKernel
  • Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
  • Support Java bindings for Avro reader (#10373) @HaoYang670
  • Refactor stream compaction APIs (#10370) @PointKernel
  • Support collect aggregations in reduction (#10353) @sperlingxx
  • Refactor array_ufunc for Index and unify across all classes (#10346) @vyasr
  • Add JNI for extract_list_element with index column (#10341) @firestarman
  • Support min and max operations for structs in rolling window (#10332) @ttnghia
  • Add device create_sequence_table for benchmarks (#10300) @karthikeyann
  • Enable numpy ufuncs for DataFrame (#10287) @vyasr
  • move input generation for json benchmark to device (#10281) @karthikeyann
  • move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
  • move input generation for copy benchmark to device (#10279) @karthikeyann
  • generate url decode benchmark input in device (#10278) @karthikeyann
  • device input generation in join bench (#10277) @karthikeyann
  • Add nvtext::byte_pair_encoding API (#10270) @davidwendt
  • Prevent internal usage of expensive APIs (#10263) @vyasr
  • Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
  • Support percent_rank() aggregation (#10227) @mythrocks
  • Refactor Series.array_ufunc (#10217) @vyasr
  • Reduce pytest runtime (#10203) @brandon-b-miller
  • Add regex flags parameter to python cudf strings split (#10185) @davidwendt
  • Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
  • Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
  • Add file size counter to cuIO benchmarks (#10154) @vuule
  • byte_range support for multibyte_split/read_text (#10150) @cwharris
  • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
  • Add maxSplit parameter to Java binding for strings::split (#10137) @ttnghia
  • Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
  • generate benchmark input in device (#10109) @karthikeyann
  • Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
  • Add DataFrame and Index nunique (#10077) @martinfalisse
  • Support nanosecond timestamps in parquet (#10063) @PointKernel
  • Java bindings for mixed semi and anti joins (#10040) @jlowe
  • Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
  • Optimize compaction operations (#10030) @PointKernel
  • Support args= in Series.apply (#9982) @brandon-b-miller
  • Add cudf::strings::findall_record API (#9911) @davidwendt
  • Add covariance for sort groupby (python) (#9889) @mayankanand007
  • Implement DataFrame diff() (#9817) @skirui-source
  • Implement DataFrame pct_change (#9805) @skirui-source
  • Support segmented reductions and null mask reductions (#9621) @isVoid
  • Add 'spearman' correlation method for DataFrame.corr and Series.corr (#7141) @dominicshanshan

🛠️ Improvements

  • Add scipy skip for a test (#10502) @galipremsagar
  • Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
  • Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
  • Pin dask and distributed (#10481) @galipremsagar
  • MD5 refactoring. (#10445) @bdice
  • Remove or split up Frame methods that use the index (#10439) @vyasr
  • Centralization of tdigest aggregation code. (#10422) @nvdbaranec
  • Simplify column binary operations (#10421) @vyasr
  • Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
  • Use list of columns for methods in Groupby.pyx (#10419) @isVoid
  • Remove warnings in test_timedelta.py (#10418) @galipremsagar
  • Fix some warnings in test_parquet.py (#10416) @galipremsagar
  • JNI support for segmented reduce (#10413) @revans2
  • Clean up null mask after purging null entries (#10412) @sperlingxx
  • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
  • Use str instead of builtins.str. (#10410) @bdice
  • Fix warnings in test_rolling (#10405) @bdice
  • Enable codecov github-check in CI (#10404) @galipremsagar
  • Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice
  • Set column names in _from_columns_like_self factory (#10400) @isVoid
  • Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar
  • Consolidate .cov and .corr for sort groupby (#10386) @skirui-source
  • Consolidate some Frame APIs (#10381) @vyasr
  • Refactor hash functions and hash_combine (#10379) @bdice
  • Add nvtx annotations for Series and Index (#10374) @galipremsagar
  • Refactor filling.repeat API (#10371) @isVoid
  • Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt
  • Remove doc for deprecated function one_hot_encoding (#10367) @isVoid
  • Refactor array function (#10364) @vyasr
  • Fix warnings in test_csv.py. (#10362) @bdice
  • Implement a mixin for binops (#10360) @vyasr
  • Refactor cython interface: copying.pyx (#10359) @isVoid
  • Implement a mixin for scans (#10358) @vyasr
  • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
  • Add cleanup of python artifacts (#10355) @galipremsagar
  • Fix warnings in test_categorical.py. (#10354) @bdice
  • Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt
  • Fix codecov in CI (#10347) @galipremsagar
  • Enable caching for memory_usage calculation in Column (#10345) @galipremsagar
  • C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann
  • JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx
  • multibyte_split test improvements (#10328) @vuule
  • Fix warnings in test_binops.py. (#10327) @bdice
  • Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice
  • Update upload script (#10321) @ajschmidt8
  • Move hash type declarations to hashing.hpp (#10320) @davidwendt
  • C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann
  • Remove internal columns usage (#10315) @vyasr
  • Remove extraneous build.sh parameter (#10313) @ajschmidt8
  • Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt
  • Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8
  • Add conversions between column_view and device_span<T const>. (#10302) @bdice
  • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
  • Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar
  • Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard
  • Add copyright check as pre-commit hook. (#10290) @vyasr
  • DataFrame insert and creation optimizations (#10285) @galipremsagar
  • Improve hash join detail functions (#10273) @PointKernel
  • Replace custom cached_property implementation with functools (#10272) @shwina
  • Rewrites sample API (#10262) @isVoid
  • Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]
  • Remove making redundant copy across code-base (#10257) @galipremsagar
  • Add more nvtx annotations (#10256) @galipremsagar
  • Add copyright check in cudf (#10253) @galipremsagar
  • Remove redundant copies in fillna to improve performance (#10241) @galipremsagar
  • Remove std::numeric_limit specializations for timestamp & durations (#10239) @codereport
  • Optimize DataFrame creation across code-base (#10236) @galipremsagar
  • Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar
  • Add environment variables for I/O thread pool and slice sizes (#10218) @vuule
  • Add regex flags to strings findall functions (#10208) @davidwendt
  • Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca
  • Remove unnecessary nunique function in Series. (#10205) @martinfalisse
  • Refactor DataFrame tests. (#10204) @bdice
  • Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid
  • Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe
  • Fix docstrings alignment in Frame methods (#10199) @galipremsagar
  • Fix cuco pair issue in hash join (#10195) @PointKernel
  • Replace dask groupby .index usages with .by (#10193) @galipremsagar
  • Add regex flags to strings extract function (#10192) @davidwendt
  • Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice
  • Add CMake install rule for tests (#10190) @ajschmidt8
  • Unpin dask & distributed (#10182) @galipremsagar
  • Add comments to explain test validation (#10176) @galipremsagar
  • Reduce warnings in pytest output (#10168) @bdice
  • Some consolidation of indexed frame methods (#10167) @vyasr
  • Refactor isin implementations (#10165) @vyasr
  • Faster struct row comparator (#10164) @devavret
  • Refactor groupby::get_groups. (#10161) @bdice
  • Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule
  • Replace ccache with sccache (#10146) @ajschmidt8
  • Murmur3 hash kernel cleanup (#10143) @rwlee
  • Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar
  • Run pyupgrade 2.31.0. (#10141) @bdice
  • Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice
  • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
  • Update cmake-format script for branch 22.04. (#10132) @bdice
  • Accept r-value references in convert_table_for_return(): (#10131) @mythrocks
  • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
  • Remove deprecated code (#10124) @vyasr
  • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
  • Remove benchmarks suffix (#10112) @bdice
  • Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi
  • Remove unnecessary docker files. (#10069) @vyasr
  • Limit benchmark iterations using environment variable (#10060) @karthikeyann
  • Add timing chart for libcudf build metrics report page (#10038) @davidwendt
  • JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx
  • Reduce redundant code in CUDF JNI (#10019) @mythrocks
  • Make snappy decompress check more efficient (#9995) @cheinger
  • Remove deprecated method Series.set_index. (#9945) @bdice
  • Implement a mixin for reductions (#9925) @vyasr
  • JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx
  • Add assert_column_memory_* (#9882) @isVoid
  • Add CUDF_UNREACHABLE macro. (#9727) @bdice
  • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

cudf - v22.02.00

Published by GPUtester over 2 years ago

🚨 Breaking Changes

  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • decimal128 Support for to/from_arrow (#9986) @codereport
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Remove deprecated method parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
  • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
  • Break tie for top categorical columns in Series.describe (#9867) @isVoid
  • Add partitioning support in parquet writer (#9810) @devavret
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
  • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Add decimal128 support to Parquet reader and writer (#9765) @vuule
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
  • Add parameters to control row group size in Parquet writer (#9677) @vuule
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Add support for decimal128 in cudf python (#9533) @galipremsagar
  • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
  • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

  • Add check for negative stripe index in ORC reader (#10074) @vuule
  • Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
  • Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
  • fix gcc 11 compilation errors (#10067) @rongou
  • Fix columns ordering issue in parquet reader (#10066) @galipremsagar
  • Fix dataframe setitem with ndarray types (#10056) @galipremsagar
  • Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
  • Include <optional> in headers that use std::optional (#10044) @robertmaynard
  • Fix repr and concat of StructColumn (#10042) @galipremsagar
  • Include row group level stats when writing ORC files (#10041) @vuule
  • build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
  • Fix memory leaks in JNI native code. (#10029) @mythrocks
  • Update JNI to use new arena mr constructor (#10027) @rongou
  • Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
  • Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
  • cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
  • Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
  • Add missing list filling header in meta.yaml (#10007) @devavret
  • Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
  • Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
  • Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
  • Fix octal pattern matching in regex string (#9993) @davidwendt
  • decimal128 Support for to/from_arrow (#9986) @codereport
  • Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
  • Fix the overflow problem of decimal rescale (#9966) @sperlingxx
  • Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
  • Fix cudf java build error. (#9958) @firestarman
  • Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
  • Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
  • Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
  • Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
  • Resolve racecheck errors in ORC kernels (#9916) @vuule
  • Fix the java build after parquet partitioning support (#9908) @revans2
  • Fix compilation of benchmark for parquet writer. (#9905) @bdice
  • Fix a memcheck error in ORC writer (#9896) @vuule
  • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
  • Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
  • Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
  • TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
  • Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
  • Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
  • Break tie for top categorical columns in Series.describe (#9867) @isVoid
  • Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
  • Add one-level list encoding support in parquet reader (#9848) @PointKernel
  • Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
  • Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
  • Fix caching in Series.applymap (#9821) @brandon-b-miller
  • Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
  • Fix ORC writer crash with empty input columns (#9808) @vuule
  • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
  • Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
  • Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
  • Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
  • Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
  • Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
  • Fix missing streams (#9767) @karthikeyann
  • Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
  • Update cmake and conda to 22.02 (#9746) @devavret
  • Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
  • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
  • Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
  • Fixed build by adding more checks for int8, int16 (#9707) @razajafri
  • Fix null handling when boolean dtype is passed (#9691) @galipremsagar
  • Fix stream usage in segmented_gather() (#9679) @mythrocks

📖 Documentation

  • Update decimal dtypes related docs entries (#10072) @galipremsagar
  • Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
  • Fix cudf compilation instructions. (#9956) @esoha-nvidia
  • Fix see also links for IO APIs (#9895) @galipremsagar
  • Fix build instructions for libcudf doxygen (#9837) @davidwendt
  • Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
  • update cuda version in local build (#9736) @karthikeyann
  • Fix doxygen for enum types in libcudf (#9724) @davidwendt
  • Spell check fixes (#9682) @karthikeyann
  • Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

  • Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
  • Allow CuPy 10 (#10048) @jakirkham
  • Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
  • Add groupby.transform (only support for aggregations) (#10005) @shwina
  • Add partitioning support to Parquet chunked writer (#10000) @devavret
  • Add jni for sequences (#9972) @wbo4958
  • Java bindings for mixed left, inner, and full joins (#9941) @jlowe
  • Java bindings for JSON reader support (#9940) @wbo4958
  • Enable transpose for string columns in cudf python (#9937) @galipremsagar
  • Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
  • Implement mixed equality/conditional joins (#9917) @vyasr
  • Add cudf::strings::extract_all API (#9909) @davidwendt
  • Implement JNI for cudf::scatter APIs (#9903) @ttnghia
  • JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
  • Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
  • add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
  • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
  • Add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
  • Add JNI for cudf::drop_duplicates (#9841) @ttnghia
  • Implement per-list sequence (#9839) @ttnghia
  • adding series.transpose (#9835) @mayankanand007
  • Adding support for Series.autocorr (#9833) @mayankanand007
  • Support round operation on datetime64 datatypes (#9820) @mayankanand007
  • Add partitioning support in parquet writer (#9810) @devavret
  • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
  • Add decimal128 support to Parquet reader and writer (#9765) @vuule
  • Optimize groupby::scan (#9754) @PointKernel
  • Add sample JNI API (#9728) @res-life
  • Support min and max in inclusive scan for structs (#9725) @ttnghia
  • Add first and last method to IndexedFrame (#9710) @isVoid
  • Support min and max reduction for structs (#9697) @ttnghia
  • Add parameters to control row group size in Parquet writer (#9677) @vuule
  • Run compute-sanitizer in nightly build (#9641) @karthikeyann
  • Implement Series.datetime.floor (#9571) @skirui-source
  • ceil/floor for DatetimeIndex (#9554) @mayankanand007
  • Add support for decimal128 in cudf python (#9533) @galipremsagar
  • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
  • custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64
  • Add Pearson correlation for sort groupby (python) (#9166) @skirui-source
  • Interchange dataframe protocol (#9071) @iskode
  • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🛠️ Improvements

  • Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
  • Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
  • Unpin dask and distributed in CI (#10028) @galipremsagar
  • Add _from_column_like_self factory (#10022) @isVoid
  • Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
  • Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
  • Clean up CUDA stream use in cuIO (#9991) @vuule
  • Use addressed-ordered first fit for the pinned memory pool (#9989) @rongou
  • Add strings tests to transpose_test.cpp (#9985) @davidwendt
  • Use gpuci_mamba_retry on Java CI. (#9983) @bdice
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Minor cleanup of unused Python functions (#9974) @vyasr
  • Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
  • Remove deprecated method parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • use ninja in java ci build (#9933) @rongou
  • Add build-time publish step to cpu build script (#9927) @davidwendt
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Remove various unused functions (#9922) @vyasr
  • Raise in query if dtype is not supported (#9921) @brandon-b-miller
  • Add missing imports tests (#9920) @Ethyling
  • Spark Decimal128 hashing (#9919) @rwlee
  • Replace thrust/std::get with structured bindings (#9915) @codereport
  • Upgrade thrust version to 1.15 (#9912) @robertmaynard
  • Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
  • Return count of set bits from inplace_bitmask_and. (#9904) @bdice
  • Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
  • Update ucx-py version on release using rvc (#9897) @Ethyling
  • Remove IncludeCategories from .clang-format (#9876) @codereport
  • Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
  • Add clang-tidy to libcudf (#9860) @codereport
  • Remove deprecated methods from Java Table class (#9853) @jlowe
  • Add test for map column metadata handling in ORC writer (#9852) @vuule
  • Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
  • add templated benchmark with fixture (#9838) @karthikeyann
  • Use list of column inputs for apply_boolean_mask (#9832) @isVoid
  • Added a few more tests for Decimal to String cast (#9818) @razajafri
  • Run doctests. (#9815) @bdice
  • Avoid overflow for fixed_point round (#9809) @sperlingxx
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Use vector factories for host-device copies. (#9806) @bdice
  • Refactor host device macros (#9797) @vyasr
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
  • Improve build time of libcudf iterator tests (#9788) @davidwendt
  • Copy Java native dependencies directly into classpath (#9787) @jlowe
  • Add decimal types to cuIO benchmarks (#9776) @vuule
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
  • Use CTAD with Thrust function objects (#9768) @codereport
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use Java classloader to find test resources (#9760) @jlowe
  • Allow cast decimal128 to string and add tests (#9756) @razajafri
  • Load balance optimization for contiguous_split (#9755) @nvdbaranec
  • Consolidate and improve reset_index (#9750) @isVoid
  • Update to UCX-Py 0.24 (#9748) @pentschev
  • Skip cufile tests in JNI build script (#9744) @pxLi
  • Enable string to decimal 128 cast (#9742) @razajafri
  • Use stop instead of stop_. (#9735) @bdice
  • Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
  • Improve cmake format script (#9723) @vyasr
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
  • Use stream allocator adaptor for hash join table (#9704) @PointKernel
  • Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
  • Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
  • Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
  • Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
  • Add utility to format ninja-log build times (#9631) @davidwendt
  • Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
  • Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
  • Simplify merge internals and reduce overhead (#9516) @vyasr
  • Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
  • Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris

cudf - v21.12.02

Published by GPUtester almost 3 years ago

v21.12.02

cudf - v21.12.01

Published by GPUtester almost 3 years ago

v21.12.01

cudf - v21.12.00

Published by GPUtester almost 3 years ago

🚨 Breaking Changes

  • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
  • Remove sizeof and standardize on memory_usage (#9544) @vyasr
  • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
  • Refactor sorting APIs (#9464) @vyasr
  • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
  • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
  • JNI: Support nested types in ORC writer (#9334) @firestarman
  • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
  • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
  • Various internal MultiIndex improvements (#9243) @vyasr

🐛 Bug Fixes

  • Fix read_parquet bug for bytes input (#9669) @rjzamora
  • Use _gather internal for sort_* (#9668) @isVoid
  • Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
  • Don't recompute output size if it is already available (#9649) @abellina
  • Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
  • add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
  • Fix debrotli issue on CUDA 11.5 (#9632) @vuule
  • Use std::size_t when computing join output size (#9626) @jlowe
  • Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
  • Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
  • Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
  • Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
  • Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
  • Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
  • Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
  • Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
  • compile libnvcomp with PTDS if requested (#9540) @jbrennan333
  • Fix segmented_gather() for null LIST rows (#9537) @mythrocks
  • Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
  • Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
  • Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
  • Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
  • Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
  • Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
  • Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
  • Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
  • Fix regex handling of embedded null characters (#9470) @davidwendt
  • Fix memcheck error in copy-if-else (#9467) @davidwendt
  • Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
  • Preserve the decimal scale when creating a default scalar (#9449) @revans2
  • Push down parent nulls when flattening nested columns. (#9443) @mythrocks
  • Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
  • Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
  • Allow int-like objects for the decimals argument in round (#9428) @shwina
  • Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
  • Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
  • Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
  • Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
  • Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
  • Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
  • Fix the crash in stats code (#9368) @devavret
  • Make Series.hash_encode results reproducible. (#9366) @bdice
  • Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
  • Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
  • Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
  • Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
  • Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
  • Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
  • Use f-string in join helper warning message. (#9325) @bdice
  • Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
  • Fix null count in statistics for parquet (#9303) @devavret
  • Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
  • Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
  • Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
  • Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
  • BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

  • Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
  • Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
  • Update docstring of DataFrame.merge (#9572) @galipremsagar
  • Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
  • Add example to docstrings in rolling.apply (#9522) @isVoid
  • Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
  • Improve Python docstring formatting. (#9493) @bdice
  • Update table of I/O supported types (#9476) @vuule
  • Document invalid regex patterns as undefined behavior (#9473) @davidwendt
  • Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
  • Fix many documentation errors in libcudf. (#9355) @karthikeyann
  • Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
  • Improved deprecation warnings. (#9347) @bdice
  • doc reorder mr, stream to stream, mr (#9308) @karthikeyann
  • Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
  • Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

  • Enable Series.divide and DataFrame.divide (#9630) @vyasr
  • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
  • Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
  • Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
  • Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
  • Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
  • Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
  • Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
  • Support args= in apply (#9514) @brandon-b-miller
  • Add groupby scan min/max support for strings values (#9502) @davidwendt
  • Add list output option to character_ngrams() function (#9499) @davidwendt
  • More granular column selection in ORC reader (#9496) @vuule
  • add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
  • Implement Series.datetime.floor (#9488) @skirui-source
  • Enable linting of CMake files using pre-commit (#9484) @vyasr
  • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
  • Augment order_by to Accept a List of null_precedence (#9455) @isVoid
  • Add format API for list column of strings (#9454) @davidwendt
  • Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
  • Add cudf python groupby.diff (#9446) @karthikeyann
  • Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
  • add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
  • Support Unary Operations in Masked UDF (#9409) @isVoid
  • Move Several Series Function to Frame (#9394) @isVoid
  • MD5 Python hash API (#9390) @bdice
  • Add cudf strings is_title API (#9380) @davidwendt
  • Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
  • Add support for writing ORC with map columns (#9369) @vuule
  • extract_list_elements() with column_view indices (#9367) @mythrocks
  • Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
  • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
  • JNI: Support nested types in ORC writer (#9334) @firestarman
  • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
  • Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
  • Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
  • Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
  • Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
  • Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
  • New array conversion methods (#9236) @vyasr
  • Series apply method backed by masked UDFs (#9217) @brandon-b-miller
  • Grouping by frequency and resampling (#9178) @shwina
  • Pure-python masked UDFs (#9174) @brandon-b-miller
  • Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
  • Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

  • Followup to PR 9088 comments (#9659) @cwharris
  • Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
  • Add 11.5 dev.yml to cudf (#9617) @galipremsagar
  • Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
  • remove deprecated Rmm.initialize method (#9607) @rongou
  • Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx
  • Set RMM pool to a fixed size in JNI (#9583) @rongou
  • Use nvCOMP for Snappy compression/decompression (#9582) @vuule
  • Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling
  • Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia
  • Enable CMake format in CI and fix style (#9570) @vyasr
  • Add NVTX Start/End Ranges to JNI (#9563) @abellina
  • Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64
  • Add offsets_begin/end() to strings_column_view (#9559) @davidwendt
  • remove alignment options for RMM jni (#9550) @rongou
  • Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd
  • Remove sizeof and standardize on memory_usage (#9544) @vyasr
  • Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina
  • Generalize comparison binary operations (#9542) @vyasr
  • Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe
  • Add scan sum support for duration types to libcudf (#9536) @davidwendt
  • Force inlining to improve AST performance (#9530) @vyasr
  • Generalize some more indexed frame methods (#9529) @vyasr
  • Add Java bindings for rolling window stddev aggregation (#9527) @razajafri
  • catch rmm::out_of_memory exceptions in jni (#9525) @rongou
  • Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia
  • Accelerate conditional inner joins with larger right tables (#9523) @vyasr
  • Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar
  • Cleanup for flattening nested columns (#9509) @rwlee
  • Enable running tests using RMM arena and async memory resources (#9506) @rongou
  • Remove dependency on six. (#9495) @bdice
  • Cleanup some libcudf strings gtests (#9489) @davidwendt
  • Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt
  • Refactor sorting APIs (#9464) @vyasr
  • Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice
  • Deprecate Series.hash_encode. (#9457) @bdice
  • Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8
  • Small clean up to simplify column selection code in ORC reader (#9444) @vuule
  • add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann
  • Adds Deprecation Warnings to one_hot_encoding and Implement get_dummies with Cython API (#9435) @isVoid
  • Update pre-commit hook URLs. (#9433) @bdice
  • Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca
  • Miscellaneous improvements for UDFs (#9422) @isVoid
  • Use pre-commit for CI (#9412) @vyasr
  • Update to UCX-Py 0.23 (#9407) @pentschev
  • Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina
  • Improvements to tdigest aggregation code. (#9403) @nvdbaranec
  • Add Java API to deserialize a table to host columns (#9402) @jlowe
  • Frame copy to use class instead of type() (#9397) @madsbk
  • Change all DeprecationWarnings to FutureWarning. (#9392) @bdice
  • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
  • Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr
  • Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora
  • Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora
  • Add multi-threaded writing to GDS writes (#9372) @devavret
  • Miscellaneous column cleanup (#9370) @vyasr
  • Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt
  • Consolidate binary ops into Frame (#9357) @isVoid
  • Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt
  • Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice
  • Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid
  • Fix Cython compilation warnings. (#9327) @bdice
  • Fix some unused variable warnings in libcudf (#9326) @davidwendt
  • Use optional-iterator for copy-if-else kernel (#9324) @davidwendt
  • Remove Table class (#9315) @vyasr
  • Unpin dask and distributed in CI (#9307) @galipremsagar
  • Add optional-iterator support to indexalator (#9306) @davidwendt
  • Consolidate more methods in Frame (#9305) @vyasr
  • Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora
  • Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice
  • Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt
  • Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar
  • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
  • Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt
  • Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi
  • Various internal MultiIndex improvements (#9243) @vyasr
  • Add detail interface for split and slice(table_view), refactors both function with host_span (#9226) @isVoid
  • Refactor MD5 implementation. (#9212) @bdice
  • Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann
  • Use nvcomp's snappy decompressor in avro reader (#9181) @devavret
  • Add isocalendar API support (#9169) @marlenezw
  • Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris
  • Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris
  • Refactor hash join with cuCollections multimap (#8934) @PointKernel

cudf - v21.10.01

Published by GPUtester about 3 years ago

v21.10.01

cudf - v21.10.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

  • Remove Cython APIs for table view generation (#9199) @vyasr
  • Upgrade pandas version in cudf (#9147) @galipremsagar
  • Make AST operators nullable (#9096) @vyasr
  • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
  • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
  • Support additional format specifiers in from_timestamps (#9047) @davidwendt
  • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
  • Add support for struct type in ORC writer (#9025) @vuule
  • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
  • Java bindings for conditional join output sizes (#9002) @jlowe
  • Move compute_column API out of ast namespace (#8957) @vyasr
  • cudf.dtype function (#8949) @shwina
  • Refactor Frame reductions (#8944) @vyasr
  • Add nested column selection to parquet reader (#8933) @devavret
  • JNI Aggregation Type Changes (#8919) @revans2
  • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
  • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
  • Change cudf docs theme to pydata theme (#8746) @galipremsagar
  • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
  • Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

  • fixed_point cudf::groupby for mean aggregation (#9296) @codereport
  • Fix interleave_columns when the input string lists column having empty child column (#9292) @ttnghia
  • Update nvcomp to include fixes for installation of headers (#9276) @devavret
  • Fix Java column leak in testParquetWriteMap (#9271) @jlowe
  • Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
  • Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
  • Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
  • Dataframe.sort_index optimizations (#9238) @galipremsagar
  • Temporarily disabling problematic test in parquet writer (#9230) @devavret
  • Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
  • Fix gather for sliced input structs column (#9218) @ttnghia
  • Fix JNI code for left semi and anti joins (#9207) @jlowe
  • Only install thrust when using a non 'system' version (#9206) @robertmaynard
  • Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
  • Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
  • Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
  • get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
  • rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
  • Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
  • Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
  • Approximate overflow detection in ORC statistics (#9163) @vuule
  • Use decimal precision metadata when reading from parquet files (#9162) @shwina
  • Fix variable name in Java build script (#9161) @jlowe
  • Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
  • Fix conditional joins with empty left table (#9146) @vyasr
  • Fix joining on indexes with duplicate level names (#9137) @shwina
  • Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
  • Apply type metadata after column is slice-copied (#9131) @isVoid
  • Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel
  • Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
  • Support null literals in expressions (#9117) @vyasr
  • Fix cudf::hash_join output size for struct joins (#9107) @jlowe
  • Import fix (#9104) @shwina
  • Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
  • Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
  • Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
  • Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
  • Preserve float16 upscaling (#9069) @galipremsagar
  • Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
  • Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
  • Various multiindex related fixes (#9036) @shwina
  • Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
  • Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
  • cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
  • Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
  • Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
  • Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
  • Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
  • Fix ORC reading of files with struct columns that have null values (#9005) @vuule
  • Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
  • Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
  • Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
  • Fix debug compile error for csv_test.cpp (#8981) @davidwendt
  • Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
  • Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
  • Java conditional joins should not require matching column counts (#8955) @jlowe
  • Fix concatenate empty structs (#8947) @sperlingxx
  • Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
  • Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
  • cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
  • Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
  • Support casting with Pandas dtype aliases (#8920) @sarahyurick
  • Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
  • Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
  • Fix libcudf memory errors (#8884) @karthikeyann
  • Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
  • replace auto with auto& ref for cast<&> (#8866) @karthikeyann
  • Add missing include<optional> in binops (#8864) @karthikeyann
  • Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
  • Re-enable JSON tests (#8843) @vuule
  • Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

  • Add IO docs page in cudf documentation (#9145) @galipremsagar
  • use correct namespace in cuio code examples (#9037) @cwharris
  • Restructuring Contributing doc (#9026) @iskode
  • Update stable version in readme (#9008) @galipremsagar
  • Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
  • Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
  • List GDS-enabled formats in the docs (#8805) @vuule
  • Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

  • Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
  • Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
  • Add struct type support for drop_list_duplicates (#9202) @ttnghia
  • support CUDA async memory resource in JNI (#9201) @rongou
  • Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
  • Superimpose null masks for STRUCT columns. (#9144) @mythrocks
  • Implemented bindings for ceil timestamp operation (#9141) @shaneding
  • Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
  • Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
  • Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
  • Make AST operators nullable (#9096) @vyasr
  • Java bindings for approx_percentile (#9094) @andygrove
  • Add dseries.struct.explode (#9086) @isVoid
  • Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
  • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
  • Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
  • Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
  • Support nested types for nth_element reduction (#9043) @sperlingxx
  • Update sort groupby to use non-atomic operation (#9035) @karthikeyann
  • Add support for struct type in ORC writer (#9025) @vuule
  • Implement interleave_columns for structs columns (#9012) @ttnghia
  • Add groupby first and last aggregations (#9004) @shwina
  • Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
  • Python/Cython bindings for multibyte_split (#8998) @jdye64
  • Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
  • Added Series.dt.is_month_end (#8989) @TravisHester
  • Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
  • Support "unflatten" of columns flattened via flatten_nested_columns(): (#8956) @mythrocks
  • Implement timestamp ceil (#8942) @shaneding
  • Add nested column selection to parquet reader (#8933) @devavret
  • Expose conditional join size calculation (#8928) @vyasr
  • Support Nulls in Timeseries Generator (#8925) @isVoid
  • Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
  • Add dot product binary op (#8909) @charlesbluca
  • Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
  • Series string repeat (#8882) @sarahyurick
  • Python binding for quarters (#8862) @shaneding
  • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
  • Add Java bindings for AST transform (#8846) @jlowe
  • Series datetime is_month_start (#8844) @sarahyurick
  • Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt
  • Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid
  • Add quarters to libcudf datetime (#8779) @shaneding
  • Linear Interpolation of nans via cupy (#8767) @brandon-b-miller
  • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
  • Make groupby transform-like op order match original data order (#8720) @isVoid
  • multibyte_split (#8702) @cwharris
  • Implement JNI for strings:repeat_strings that repeats each string separately by different numbers of times (#8572) @ttnghia

🛠️ Improvements

  • Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar
  • Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora
  • Skip dask-cudf tests on arm64 (#9252) @Ethyling
  • Use nvcomp's snappy compressor in ORC writer (#9242) @devavret
  • Only run imports tests on x86_64 (#9241) @Ethyling
  • Remove unnecessary call to device_uvector::release() (#9237) @harrism
  • Use nvcomp's snappy decompression in ORC reader (#9235) @devavret
  • Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks
  • Optimize cudf.concat for axis=0 (#9222) @galipremsagar
  • Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt
  • Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar
  • Improve performance of expression evaluation (#9210) @vyasr
  • Misc optimizations in cudf (#9203) @galipremsagar
  • Remove Cython APIs for table view generation (#9199) @vyasr
  • Add JNI support for drop_list_duplicates (#9198) @revans2
  • Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar
  • Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport
  • Explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid
  • Remove _source_index from MultiIndex (#9191) @vyasr
  • Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt
  • Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt
  • Fix cufilejni build include path (#9168) @pxLi
  • dask_cudf dispatch registering cleanup (#9160) @galipremsagar
  • Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt
  • Upgrade pandas version in cudf (#9147) @galipremsagar
  • make data chunk reader return unique_ptr (#9129) @cwharris
  • Add backend for percentile_lookup dispatch (#9118) @galipremsagar
  • Refactor implementation of column setitem (#9110) @vyasr
  • Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt
  • Update to UCX-Py 0.22 (#9099) @pentschev
  • Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris
  • Allowing %f in format to return nanoseconds (#9081) @marlenezw
  • Java bindings for cudf::hash_join (#9080) @jlowe
  • Remove stale code in ColumnBase._fill (#9078) @isVoid
  • Add support for get_group in GroupBy (#9070) @galipremsagar
  • Remove remaining "support" methods from DataFrame (#9068) @vyasr
  • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
  • Added method to remove null_masks if the column has no nulls (#9061) @razajafri
  • Consolidate Several Series and Dataframe Methods (#9059) @isVoid
  • Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar
  • Remove some debug print statements from gtests (#9048) @davidwendt
  • Support additional format specifiers in from_timestamps (#9047) @davidwendt
  • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
  • move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris
  • Refactor Index hierarchy (#9039) @vyasr
  • cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard
  • Add support for STRUCT input to groupby (#9024) @mythrocks
  • Refactor Frame scans (#9021) @vyasr
  • Remove duplicate set_categories code (#9018) @isVoid
  • Map support for ParquetWriter (#9013) @razajafri
  • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
  • Java bindings for conditional join output sizes (#9002) @jlowe
  • Remove _copy_construct factory (#8999) @vyasr
  • ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan
  • A small optimization for JNI copy column view to column vector (#8985) @revans2
  • Fix nvcc warnings in ORC writer (#8975) @devavret
  • Support nested structs in rank and dense rank (#8962) @rwlee
  • Move compute_column API out of ast namespace (#8957) @vyasr
  • Series datetime is_year_end and is_year_start (#8954) @marlenezw
  • Make Java AstNode public (#8953) @jlowe
  • Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt
  • cudf.dtype function (#8949) @shwina
  • Refactor Frame reductions (#8944) @vyasr
  • Add deprecation warning for Series.set_mask API (#8943) @galipremsagar
  • Move AST evaluator into a separate header (#8930) @vyasr
  • JNI Aggregation Type Changes (#8919) @revans2
  • Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt
  • Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar
  • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
  • Move structs_column_tests.cu to .cpp. (#8902) @mythrocks
  • Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt
  • Combine linearizer and ast_plan (#8900) @vyasr
  • Add Java bindings for conditional join gather maps (#8888) @jlowe
  • Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar
  • fix cufilejni build w/ c++17 (#8877) @pxLi
  • Add struct accessor to dask-cudf (#8874) @NV-jpt
  • Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora
  • Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2
  • Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt
  • Replace is_same<>::value with is_same_v<> (#8852) @codereport
  • Add min pytorch version to importorskip in pytest (#8851) @galipremsagar
  • Java bindings for regex replace (#8847) @jlowe
  • Remove make strings children with null mask (#8830) @davidwendt
  • Refactor conditional joins (#8815) @vyasr
  • Small cleanup (unused headers / commented code removals) (#8799) @codereport
  • ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan
  • Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi
  • Refactor and improve join benchmarks with nvbench (#8734) @PointKernel
  • Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr
  • Optimize URL Decoding (#8622) @gaohao95
  • Parquet writer dictionary encoding refactor (#8476) @devavret
  • Use nvcomp's snappy decompression in parquet reader (#8252) @devavret
  • Use nvcomp's snappy compressor in parquet writer (#8229) @devavret

cudf - v21.08.03

Published by GPUtester about 3 years ago

v21.08.03

cudf - v21.08.02

Published by GPUtester about 3 years ago

v21.08.02

cudf - v21.08.01

Published by GPUtester about 3 years ago

v21.08.01

cudf - v21.08.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

  • Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
  • Remove unused cudf::strings::create_offsets (#8663) @davidwendt
  • Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
  • Change default datetime index resolution to ns to match pandas (#8611) @vyasr
  • Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
  • Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
  • String-to-boolean conversion is different from Pandas (#8549) @skirui-source
  • Add accurate hash join size functions (#8453) @PointKernel
  • Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
  • Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
  • Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
  • Remove special Index class from the general index class hierarchy (#8309) @vyasr
  • Add first-class dtype utilities (#8308) @vyasr
  • ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
  • Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🐛 Bug Fixes

  • Fix contains check in string column (#8834) @galipremsagar
  • Remove unused variable from row_bit_count_test. (#8829) @mythrocks
  • Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
  • Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
  • Handle empty child columns in row_bit_count() (#8791) @mythrocks
  • Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
  • Fix isort error in utils.pyx (#8771) @charlesbluca
  • Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
  • Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca
  • Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar
  • Write pandas extension types to parquet file metadata (#8749) @devavret
  • Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar
  • Fix replace to handle null values correctly (#8744) @galipremsagar
  • Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
  • Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
  • Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar
  • Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
  • Fix orc reader assert on create data_type in debug (#8706) @davidwendt
  • Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
  • JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
  • Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
  • Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid
  • Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
  • Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca
  • JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
  • Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
  • Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
  • Pin *arrow to use *cuda in run (#8651) @jakirkham
  • Add proper support for tolerances in testing methods. (#8649) @vyasr
  • Support multi-char case conversion in capitalize function (#8647) @davidwendt
  • Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
  • Temporarily disable libcudf example build tests (#8642) @isVoid
  • Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
  • Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
  • Fix bug that columns only initialized once when specified columns and index in dataframe ctor (#8628) @isVoid
  • Propagate **kwargs through to as_*_column methods (#8618) @shwina
  • Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
  • Fix missed renumbering of Aggregation values (#8600) @revans2
  • Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
  • Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
  • Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca
  • Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
  • Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
  • String-to-boolean conversion is different from Pandas (#8549) @skirui-source
  • Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar
  • Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
  • Properly retrieve last column when -1 is specified for column index (#8529) @isVoid
  • Fix importing apply from dask (#8517) @galipremsagar
  • Fix offset of the string dictionary length stream (#8515) @vuule
  • Fix double counting of selected columns in CSV reader (#8508) @ochan1
  • Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
  • replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
  • Disallow groupby aggs for StructColumns (#8499) @charlesbluca
  • Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
  • Adding support for writing empty dataframe (#8490) @shaneding
  • Fix exclusive scan when including nulls and improve testing (#8478) @harrism
  • Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
  • Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
  • Add nightly version for ucx-py in ci script (#8419) @galipremsagar
  • Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
  • CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
  • Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
  • Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
  • Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
  • BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
  • Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
  • Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

📖 Documentation

  • Update Python UDFs notebook (#8810) @brandon-b-miller
  • Fix dask.dataframe API docs links after reorg (#8772) @jsignell
  • Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
  • Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
  • Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
  • Custom Sphinx Extension: PandasCompat (#8643) @isVoid
  • Fix README.md (#8535) @ajschmidt8
  • Change namespace contains_nulls to struct (#8523) @davidwendt
  • Add info about NVTX ranges to dev guide (#8461) @jrhemstad
  • Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

🚀 New Features

  • Fix concatenating structs (#8811) @shaneding
  • Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia
  • Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca
  • Implement __setitem__ for StructColumn (#8737) @shaneding
  • Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid
  • Add struct.explode() method (#8729) @shwina
  • Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina
  • Add support for list type in ORC writer (#8723) @vuule
  • Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
  • Add datetime::is_leap_year (#8711) @isVoid
  • Accessing struct columns from dask_cudf (#8675) @shaneding
  • Added pct_change to Series (#8650) @TravisHester
  • Add strings support to cudf::shift function (#8648) @davidwendt
  • Support Scatter struct_scalar (#8630) @isVoid
  • Struct scalar from host dictionary (#8629) @shaneding
  • Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
  • JNI support for capitalize (#8624) @firestarman
  • Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
  • Add NVBench in CMake (#8619) @PointKernel
  • Change default datetime index resolution to ns to match pandas (#8611) @vyasr
  • ListColumn __setitem__ (#8606) @brandon-b-miller
  • Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia
  • Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
  • Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
  • Benchmark for strings::repeat_strings APIs (#8589) @ttnghia
  • Nested scalar support for copy if else (#8588) @gerashegalov
  • User specified decimal columns to float64 (#8587) @jdye64
  • Add get_element for struct column (#8578) @isVoid
  • Python changes for adding __getitem__ for struct (#8577) @shaneding
  • Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
  • Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia
  • Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
  • Decimal support csv reader (#8511) @elstehle
  • Add column type tests (#8505) @isVoid
  • Warn when downscaling decimal columns (#8492) @ChrisJar
  • Add JNI for strings::repeat_strings (#8491) @ttnghia
  • Add Index.get_loc for Numerical, String Index support (#8489) @isVoid
  • Expose half_up rounding in cuDF (#8477) @shwina
  • Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
  • Add str.edit_distance_matrix (#8463) @isVoid
  • Support constructing cudf.Scalar objects from host side lists (#8459) @brandon-b-miller
  • Add accurate hash join size functions (#8453) @PointKernel
  • Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt
  • Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller
  • JNI bindings for sort_lists (#8439) @sperlingxx
  • Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
  • Replace all_null() and all_valid() by iterator_all_nulls() and iterator_no_null() in tests (#8437) @ttnghia
  • Implement groupby MERGE_LISTS and MERGE_SETS aggregates (#8436) @ttnghia
  • Add public libcudf match_dictionaries API (#8429) @davidwendt
  • Add move constructors for string_scalar and struct_scalar (#8428) @ttnghia
  • Implement strings::repeat_strings (#8423) @ttnghia
  • STRUCT column support for cudf::merge. (#8422) @nvdbaranec
  • Implement reverse in libcudf (#8410) @shaneding
  • Support multiple input files/buffers for read_json (#8403) @jdye64
  • Improve test coverage for struct search (#8396) @ttnghia
  • Add groupby.fillna (#8362) @isVoid
  • Enable AST-based joining (#8214) @vyasr
  • Generalized null support in user defined functions (#8213) @brandon-b-miller
  • Add compiled binary operation (#8192) @karthikeyann
  • Implement .describe() for DataFrameGroupBy (#8179) @skirui-source
  • ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
  • Add Python bindings for lists::concatenate_list_elements and expose them as .list.concat() (#8006) @shwina
  • Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64
  • Example to build custom application and link to libcudf (#7671) @isVoid
  • Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🛠️ Improvements

  • Provide a better error message when CUDA::cuda_driver not found (#8794) @robertmaynard
  • Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec
  • Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard
  • Pin mimesis to <4.1 (#8745) @galipremsagar
  • Update conda environment name for CI (#8692) @ajschmidt8
  • Remove flatbuffers dependency (#8671) @Ethyling
  • Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt
  • Remove unused cudf::strings::create_offsets (#8663) @davidwendt
  • Update GDS lib version to 1.0.0 (#8654) @pxLi
  • Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee
  • Fix usage of deprecated arrow ipc API (#8632) @revans2
  • Use absolute imports in cudf (#8631) @galipremsagar
  • ENH Add Java CI build script (#8627) @dillon-cullinan
  • Add DeprecationWarning to ser.str.subword_tokenize (#8603) @VibhuJawa
  • Rewrite binary operations for improved performance and additional type support (#8598) @vyasr
  • Fix mypy errors surfacing because of numpy-1.21.0 (#8595) @galipremsagar
  • Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt
  • Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard
  • Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt
  • Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk
  • Remove checking if an unsigned value is less than zero (#8579) @robertmaynard
  • Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt
  • Make cudf.api.types imports consistent (#8571) @galipremsagar
  • Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid
  • Rename concatenate_tests.cu to .cpp (#8555) @davidwendt
  • enable window lead/lag test on struct (#8548) @wbo4958
  • Add Java methods to split and write column views (#8546) @razajafri
  • Small cleanup (#8534) @codereport
  • Unpin dask version in CI (#8533) @galipremsagar
  • Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64
  • Minor clean up of various internal column and frame utilities (#8528) @vyasr
  • Rename some copying_test source files .cu to .cpp (#8527) @davidwendt
  • Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard
  • Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard
  • Correct unused parameter warnings in string algorithms (#8509) @robertmaynard
  • Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2
  • Fix 21.08 forward-merge conflicts (#8502) @ajschmidt8
  • Fix Cython formatting command in Contributing.md. (#8496) @marlenezw
  • Bug/correct unused parameters in reshape and text (#8495) @robertmaynard
  • Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard
  • Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard
  • Refactor index construction (#8485) @vyasr
  • Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard
  • Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard
  • Correct unused parameter warnings in io algorithms (#8480) @robertmaynard
  • Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard
  • Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard
  • Correct unused parameter warnings in groupby (#8467) @robertmaynard
  • use libcu++ time_point as timestamp (#8466) @karthikeyann
  • Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt
  • Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev
  • Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar
  • Fix conflicts in 8447 (#8448) @ajschmidt8
  • Add serialization methods for List and StructDtype (#8441) @charlesbluca
  • Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt
  • JNI bindings for get_element (#8433) @revans2
  • Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
  • Unpin dask version on CI (#8425) @galipremsagar
  • Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt
  • Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
  • Add benchmark for strings/integers convert APIs (#8402) @davidwendt
  • Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora
  • Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard
  • Correct unused parameters in column round and search (#8389) @robertmaynard
  • Add functionality to apply Dtype metadata to ColumnBase (#8373) @charlesbluca
  • Refactor setting stack size in regex code (#8358) @davidwendt
  • Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi
  • Replace remaining uses of device_vector (#8343) @harrism
  • Statically link libnvcomp into libcudfjni (#8334) @jlowe
  • Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar
  • Minor code refactor for sorted_order (#8326) @wbo4958
  • Remove special Index class from the general index class hierarchy (#8309) @vyasr
  • Add first-class dtype utilities (#8308) @vyasr
  • Add option to link Java bindings with Arrow dynamically (#8307) @jlowe
  • Refactor ColumnMethods and its subclasses to remove column argument and require parent argument (#8306) @shwina
  • Refactor scatter for list columns (#8255) @isVoid
  • Expose pack/unpack API to Python (#8153) @charlesbluca
  • Adding cudf.cut method (#8002) @marlenezw
  • Optimize string gather performance for large strings (#7980) @gaohao95
  • Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret
  • Updating Clang Version to 11.0.0 (#6695) @codereport

cudf - v21.06.01

Published by GPUtester over 3 years ago

cudf - v21.06.00

Published by GPUtester over 3 years ago

🚨 Breaking Changes

  • Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
  • Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
  • Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
  • Update ORC statistics API to use C++17 standard library (#8241) @vuule
  • Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
  • Groupby.shift c++ API refactor and python binding (#8131) @isVoid

🐛 Bug Fixes

  • Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
  • Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks
  • Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar
  • Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
  • Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
  • Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar
  • Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar
  • Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
  • Update Java string concatenate test for single column (#8330) @tgravescs
  • Use empty_like in scatter (#8314) @revans2
  • Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
  • Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
  • COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
  • Update io util to convert path like object to string (#8275) @ayushdg
  • Fix result column types for empty inputs to rolling window (#8274) @mythrocks
  • Actually test equality in assert_groupby_results_equal (#8272) @shwina
  • CMake always explicitly specify a source files extension (#8270) @robertmaynard
  • Fix struct binary search and struct flattening (#8268) @ttnghia
  • Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
  • upgrade dlpack to 0.5 (#8262) @cwharris
  • Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
  • Fix incorrect assertion in Java concat (#8258) @sperlingxx
  • Copy nested types upon construction (#8244) @isVoid
  • Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
  • Clip decimal binary op precision at max precision (#8194) @ChrisJar

📖 Documentation

  • Add docstring for dask_cudf.read_csv (#8355) @galipremsagar
  • Fix cudf release version in readme (#8331) @galipremsagar
  • Fix structs column description in dev docs (#8318) @isVoid
  • Update readme with correct CUDA versions (#8315) @raydouglass
  • Add description of the cuIO GDS integration (#8293) @vuule
  • Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

🚀 New Features

  • Add support merging b/w categorical data (#8332) @galipremsagar
  • Java: Support struct scalar (#8327) @sperlingxx
  • added _is_homogeneous property (#8299) @shaneding
  • Added decimal writing for CSV writer (#8296) @kaatish
  • Java: Support creating a scalar from utf8 string (#8294) @firestarman
  • Add Java API for Concatenate strings with separator (#8289) @tgravescs
  • strings::join_list_elements options for empty list inputs (#8285) @ttnghia
  • Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
  • add unit tests for lead/lag on list for row window (#8259) @wbo4958
  • Create a String column from UTF8 String byte arrays (#8257) @firestarman
  • Support scattering list_scalar (#8256) @isVoid
  • Implement lists::concatenate_list_elements (#8231) @ttnghia
  • Support for struct scalars. (#8220) @nvdbaranec
  • Add support for decimal types in ORC writer (#8198) @vuule
  • Support create lists column from a list_scalar (#8185) @isVoid
  • Groupby.shift c++ API refactor and python binding (#8131) @isVoid
  • Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid

🛠️ Improvements

  • Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
  • Add aliases for string methods (#8353) @shwina
  • Update environment variable used to determine cuda_version (#8321) @ajschmidt8
  • JNI: Refactor the code of making column from scalar (#8310) @firestarman
  • Update CHANGELOG.md links for calver (#8303) @ajschmidt8
  • Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8
  • use address and length for GDS reads/writes (#8301) @rongou
  • Update cudfjni version to 21.06.0 (#8292) @pxLi
  • Update docs build script (#8284) @ajschmidt8
  • Make device_buffer streams explicit and enforce move construction (#8280) @harrism
  • Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
  • Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
  • Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
  • Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
  • Update cudfjni version to 21.06 (#8267) @pxLi
  • support RMM aligned resource adapter in JNI (#8266) @rongou
  • Pass compiler environment variables to conda python build (#8260) @Ethyling
  • Remove abc inheritance from Serializable (#8254) @vyasr
  • Move more methods into SingleColumnFrame (#8253) @vyasr
  • Update ORC statistics API to use C++17 standard library (#8241) @vuule
  • Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
  • Correct unused parameters in the copying algorithms (#8232) @robertmaynard
  • IO statistics cleanup (#8191) @kaatish
  • Refactor of rolling_window implementation. (#8158) @nvdbaranec
  • Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
  • Column refactoring 2 (#8130) @vyasr
  • support space in workspace (#7956) @jolorunyomi
  • Support collect_set on rolling window (#7881) @sperlingxx

cudf - v0.19.2

Published by GPUtester over 3 years ago

🚨 Breaking Changes

  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Join APIs that return gathermaps (#7454) @shwina
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Refactor strings column factories (#7397) @harrism
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

  • unsnap: busy wait a number of cycles (#8073) @vuule
  • Fix returned column type when extracting from an empty list column (#8031) @jlowe
  • Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
  • Fix a NameError in meta dispatch API (#7996) @galipremsagar
  • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
  • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
  • Use dynamic cudart for nvcomp in java build (#7896) @abellina
  • fix "incompatible redefinition" warnings (#7894) @cwharris
  • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
  • disable verbose output for jitify_preprocess (#7886) @cwharris
  • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
  • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
  • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
  • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
  • Sort by index in groupby tests more consistently (#7802) @shwina
  • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
  • Add decimal column handling in copy_type_metadata (#7788) @shwina
  • Add column names validation in parquet writer (#7786) @galipremsagar
  • Fix Java explode outer unit tests (#7782) @jlowe
  • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
  • User resource fix for replace_nulls (#7769) @magnatelee
  • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
  • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
  • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
  • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
  • Implement scatter for struct columns (#7752) @ttnghia
  • Fix data corruption in string columns (#7746) @galipremsagar
  • Fix string length in stripe dictionary building (#7744) @kaatish
  • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
  • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
  • Fix dictionary size computation in ORC writer (#7737) @vuule
  • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
  • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix return type of DataFrame.argsort (#7706) @galipremsagar
  • Fix/correct cudf installed package requirements (#7688) @robertmaynard
  • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
  • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
  • Fix internal compiler error during JNI Docker build (#7645) @jlowe
  • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
  • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
  • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
  • Fix specifying GPU architecture in JNI build (#7612) @jlowe
  • Fix ORC writer OOM issue (#7605) @vuule
  • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
  • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
  • Fix missing Dask imports (#7580) @kkraus14
  • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
  • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
  • Fix ORC writer output corruption with string columns (#7565) @vuule
  • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
  • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
  • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
  • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
  • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
  • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Decimal32 Build Fix (#7544) @razajafri
  • FIX Retry conda output location (#7540) @dillon-cullinan
  • fix missing renames of dask git branches from master to main (#7535) @kkraus14
  • Remove detail from device_span (#7533) @rwlee
  • Change dask and distributed branch to main (#7532) @dantegd
  • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
  • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
  • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
  • Change jit launch to safe_launch (#7510) @devavret
  • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
  • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
  • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
  • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
  • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
  • Correctly compile benchmarks (#7485) @robertmaynard
  • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
  • Fix __repr__ for categorical dtype (#7476) @galipremsagar
  • Java cleaner synchronization (#7474) @abellina
  • Fix java float/double parsing tests (#7473) @revans2
  • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
  • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
  • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
  • fix cuFile JNI compile errors (#7445) @rongou
  • Support Series.__setitem__ with key to a new row (#7443) @isVoid
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
  • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
  • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
  • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
  • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
  • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
  • fix Arrow CMake file (#7358) @rongou
  • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
  • Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
  • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
  • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

  • Fix join API doxygen (#7890) @shwina
  • Add Resources to README. (#7697) @bdice
  • Add isin examples in Docstring (#7479) @galipremsagar
  • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
  • Fix typo in regex.md doc page (#7363) @davidwendt
  • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

  • Enable basic reductions for decimal columns (#7776) @ChrisJar
  • Enable join on decimal columns (#7764) @ChrisJar
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
  • Add support for unique groupby aggregation (#7726) @shwina
  • Expose libcudf's label_bins function to cudf (#7724) @vyasr
  • Adding support for equi-join on struct (#7720) @hyperbolic2346
  • Add decimal column comparison operations (#7716) @isVoid
  • Implement scan operations for decimal columns (#7707) @ChrisJar
  • Enable typecasting between decimal and int (#7691) @ChrisJar
  • Enable decimal support in parquet writer (#7673) @devavret
  • Adds list.unique API (#7664) @isVoid
  • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
  • Add lists.sort_values API (#7657) @isVoid
  • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
  • Adds explode API (#7607) @isVoid
  • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
  • Implement cudf::label_bins() (#7554) @vyasr
  • Add Python bindings for lists::contains (#7547) @skirui-source
  • cudf::row_bit_count() support. (#7534) @nvdbaranec
  • Implement drop_list_duplicates (#7528) @ttnghia
  • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
  • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Enable type conversion from float to decimal type (#7450) @ChrisJar
  • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
  • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
  • Implement groupby collect_set (#7420) @ttnghia
  • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
  • Refactor strings column factories (#7397) @harrism
  • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
  • Add cudf::explode_position (#7376) @hyperbolic2346
  • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
  • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
  • Add Series.drop api (#7304) @isVoid
  • get_json_object() implementation (#7286) @nvdbaranec
  • Python API for ListMethods.len() (#7283) @isVoid
  • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
  • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
  • Fix inplace update of data and add Series.update (#7201) @galipremsagar
  • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
  • Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

  • fix GDS include path for version 0.95 (#7877) @rongou
  • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
  • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
  • Add USE_GDS as an option in build script (#7833) @pxLi
  • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
  • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
  • Revert dask versioning of concat dispatch (#7823) @galipremsagar
  • add copy methods in Java memory buffer (#7791) @rongou
  • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Turn on NVTX by default in java build (#7761) @tgravescs
  • Add Java bindings to join gather map APIs (#7751) @jlowe
  • Add replacements column support for Java replaceNulls (#7750) @jlowe
  • Add Java bindings for row_bit_count (#7749) @jlowe
  • Remove unused JVM array creation (#7748) @jlowe
  • Added JNI support for new is_integer (#7739) @revans2
  • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
  • Support groupby operations for decimal dtypes (#7731) @vyasr
  • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
  • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
  • Use stream in groupby calls (#7705) @karthikeyann
  • Update codeowners file (#7701) @ajschmidt8
  • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
  • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
  • Misc Python/Cython optimizations (#7686) @shwina
  • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
  • Add column_device_view to orc writer (#7676) @kaatish
  • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
  • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
  • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
  • Feature/optimize accessor copy (#7660) @vyasr
  • Fix find_package(cudf) (#7658) @trxcllnt
  • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
  • Add in JNI support for count_elements (#7651) @revans2
  • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
  • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
  • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
  • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
  • Add in JNI support for table partition (#7637) @revans2
  • Add explicit fixed_point merge test (#7635) @codereport
  • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
  • Java support on explode_outer (#7625) @sperlingxx
  • Java support of casting string from/to decimal (#7623) @sperlingxx
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
  • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
  • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
  • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
  • Add gbenchmarks for string substrings functions (#7603) @davidwendt
  • Refactor string conversion check (#7599) @ttnghia
  • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
  • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
  • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
  • Fix auto-detecting GPU architectures (#7593) @trxcllnt
  • Reduce cudf library size (#7583) @robertmaynard
  • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
  • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
  • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
  • Add gbenchmark for strings::concatenate (#7560) @davidwendt
  • Update Changelog Link (#7550) @ajschmidt8
  • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
  • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
  • Support Decimal DIV changes in cudf (#7527) @razajafri
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
  • Add gbenchmarks for strings extract function (#7522) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Reduce compile time/size for scan.cu (#7516) @davidwendt
  • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
  • Removed unneeded includes from traits.hpp (#7509) @davidwendt
  • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
  • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
  • JNI bit cast (#7493) @revans2
  • Combine rolling window function tests (#7480) @mythrocks
  • Prepare Changelog for Automation (#7477) @ajschmidt8
  • Java support for explode position (#7471) @sperlingxx
  • Update 0.18 changelog entry (#7463) @ajschmidt8
  • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
  • Join APIs that return gathermaps (#7454) @shwina
  • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
  • Use cuFile for Parquet IO when available (#7444) @vuule
  • Statistics cleanup (#7439) @kaatish
  • Add gbenchmarks for strings filter functions (#7438) @davidwendt
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Improve string gather performance (#7433) @jlowe
  • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
  • Detail APIs for datetime functions (#7430) @magnatelee
  • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
  • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
  • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
  • Java support for casting of nested child columns (#7417) @razajafri
  • Improve scalar string replace performance for long strings (#7415) @jlowe
  • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
  • bitmask_or implementation with bitmask refactor (#7406) @rwlee
  • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
  • Clean up included headers in device_operators.cuh (#7401) @codereport
  • Move nullable index iterator to indexalator factory (#7399) @davidwendt
  • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
  • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
  • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
  • Added in JNI support for out of core sort algorithm (#7381) @revans2
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • jitify 2 support (#7372) @cwharris
  • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
  • Add string scalar replace benchmark (#7369) @jlowe
  • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
  • Update orc reader and writer fuzz tests (#7357) @galipremsagar
  • Improve url_decode performance for long strings (#7353) @jlowe
  • cudf::ast Small Refactorings (#7352) @codereport
  • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
  • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
  • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
  • Partial clean up of ORC writer (#7324) @vuule
  • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
  • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
  • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
  • Use string literals in fixed_point release_asserts (#7303) @codereport
  • Fix merge conflicts for #7295 (#7297) @ajschmidt8
  • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
  • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
  • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
  • Refactor dictionary support for reductions any/all (#7242) @davidwendt
  • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
  • Interval index and interval_range (#7182) @marlenezw
  • avro reader integration tests (#7156) @cwharris
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
  • Adding Interval Dtype (#6984) @marlenezw
  • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.1

Published by GPUtester over 3 years ago

🚨 Breaking Changes

  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Join APIs that return gathermaps (#7454) @shwina
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Refactor strings column factories (#7397) @harrism
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

  • Fix returned column type when extracting from an empty list column (#8031) @jlowe
  • Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
  • Fix a NameError in meta dispatch API (#7996) @galipremsagar
  • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
  • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
  • Use dynamic cudart for nvcomp in java build (#7896) @abellina
  • fix "incompatible redefinition" warnings (#7894) @cwharris
  • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
  • disable verbose output for jitify_preprocess (#7886) @cwharris
  • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
  • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
  • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
  • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
  • Sort by index in groupby tests more consistently (#7802) @shwina
  • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
  • Add decimal column handling in copy_type_metadata (#7788) @shwina
  • Add column names validation in parquet writer (#7786) @galipremsagar
  • Fix Java explode outer unit tests (#7782) @jlowe
  • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
  • User resource fix for replace_nulls (#7769) @magnatelee
  • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
  • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
  • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
  • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
  • Implement scatter for struct columns (#7752) @ttnghia
  • Fix data corruption in string columns (#7746) @galipremsagar
  • Fix string length in stripe dictionary building (#7744) @kaatish
  • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
  • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
  • Fix dictionary size computation in ORC writer (#7737) @vuule
  • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
  • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix return type of DataFrame.argsort (#7706) @galipremsagar
  • Fix/correct cudf installed package requirements (#7688) @robertmaynard
  • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
  • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
  • Fix internal compiler error during JNI Docker build (#7645) @jlowe
  • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
  • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
  • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
  • Fix specifying GPU architecture in JNI build (#7612) @jlowe
  • Fix ORC writer OOM issue (#7605) @vuule
  • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
  • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
  • Fix missing Dask imports (#7580) @kkraus14
  • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
  • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
  • Fix ORC writer output corruption with string columns (#7565) @vuule
  • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
  • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
  • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
  • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
  • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
  • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Decimal32 Build Fix (#7544) @razajafri
  • FIX Retry conda output location (#7540) @dillon-cullinan
  • fix missing renames of dask git branches from master to main (#7535) @kkraus14
  • Remove detail from device_span (#7533) @rwlee
  • Change dask and distributed branch to main (#7532) @dantegd
  • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
  • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
  • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
  • Change jit launch to safe_launch (#7510) @devavret
  • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
  • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
  • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
  • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
  • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
  • Correctly compile benchmarks (#7485) @robertmaynard
  • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
  • Fix __repr__ for categorical dtype (#7476) @galipremsagar
  • Java cleaner synchronization (#7474) @abellina
  • Fix java float/double parsing tests (#7473) @revans2
  • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
  • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
  • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
  • fix cuFile JNI compile errors (#7445) @rongou
  • Support Series.__setitem__ with key to a new row (#7443) @isVoid
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
  • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
  • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
  • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
  • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
  • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
  • fix Arrow CMake file (#7358) @rongou
  • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
  • Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
  • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
  • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

  • Fix join API doxygen (#7890) @shwina
  • Add Resources to README. (#7697) @bdice
  • Add isin examples in Docstring (#7479) @galipremsagar
  • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
  • Fix typo in regex.md doc page (#7363) @davidwendt
  • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

  • Enable basic reductions for decimal columns (#7776) @ChrisJar
  • Enable join on decimal columns (#7764) @ChrisJar
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
  • Add support for unique groupby aggregation (#7726) @shwina
  • Expose libcudf's label_bins function to cudf (#7724) @vyasr
  • Adding support for equi-join on struct (#7720) @hyperbolic2346
  • Add decimal column comparison operations (#7716) @isVoid
  • Implement scan operations for decimal columns (#7707) @ChrisJar
  • Enable typecasting between decimal and int (#7691) @ChrisJar
  • Enable decimal support in parquet writer (#7673) @devavret
  • Adds list.unique API (#7664) @isVoid
  • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
  • Add lists.sort_values API (#7657) @isVoid
  • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
  • Adds explode API (#7607) @isVoid
  • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
  • Implement cudf::label_bins() (#7554) @vyasr
  • Add Python bindings for lists::contains (#7547) @skirui-source
  • cudf::row_bit_count() support. (#7534) @nvdbaranec
  • Implement drop_list_duplicates (#7528) @ttnghia
  • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
  • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Enable type conversion from float to decimal type (#7450) @ChrisJar
  • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
  • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
  • Implement groupby collect_set (#7420) @ttnghia
  • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
  • Refactor strings column factories (#7397) @harrism
  • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
  • Add cudf::explode_position (#7376) @hyperbolic2346
  • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
  • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
  • Add Series.drop api (#7304) @isVoid
  • get_json_object() implementation (#7286) @nvdbaranec
  • Python API for ListMethods.len() (#7283) @isVoid
  • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
  • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
  • Fix inplace update of data and add Series.update (#7201) @galipremsagar
  • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
  • Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

  • fix GDS include path for version 0.95 (#7877) @rongou
  • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
  • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
  • Add USE_GDS as an option in build script (#7833) @pxLi
  • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
  • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
  • Revert dask versioning of concat dispatch (#7823) @galipremsagar
  • add copy methods in Java memory buffer (#7791) @rongou
  • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Turn on NVTX by default in java build (#7761) @tgravescs
  • Add Java bindings to join gather map APIs (#7751) @jlowe
  • Add replacements column support for Java replaceNulls (#7750) @jlowe
  • Add Java bindings for row_bit_count (#7749) @jlowe
  • Remove unused JVM array creation (#7748) @jlowe
  • Added JNI support for new is_integer (#7739) @revans2
  • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
  • Support groupby operations for decimal dtypes (#7731) @vyasr
  • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
  • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
  • Use stream in groupby calls (#7705) @karthikeyann
  • Update codeowners file (#7701) @ajschmidt8
  • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
  • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
  • Misc Python/Cython optimizations (#7686) @shwina
  • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
  • Add column_device_view to orc writer (#7676) @kaatish
  • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
  • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
  • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
  • Feature/optimize accessor copy (#7660) @vyasr
  • Fix find_package(cudf) (#7658) @trxcllnt
  • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
  • Add in JNI support for count_elements (#7651) @revans2
  • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
  • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
  • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
  • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
  • Add in JNI support for table partition (#7637) @revans2
  • Add explicit fixed_point merge test (#7635) @codereport
  • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
  • Java support on explode_outer (#7625) @sperlingxx
  • Java support of casting string from/to decimal (#7623) @sperlingxx
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
  • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
  • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
  • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
  • Add gbenchmarks for string substrings functions (#7603) @davidwendt
  • Refactor string conversion check (#7599) @ttnghia
  • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
  • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
  • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
  • Fix auto-detecting GPU architectures (#7593) @trxcllnt
  • Reduce cudf library size (#7583) @robertmaynard
  • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
  • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
  • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
  • Add gbenchmark for strings::concatenate (#7560) @davidwendt
  • Update Changelog Link (#7550) @ajschmidt8
  • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
  • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
  • Support Decimal DIV changes in cudf (#7527) @razajafri
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
  • Add gbenchmarks for strings extract function (#7522) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Reduce compile time/size for scan.cu (#7516) @davidwendt
  • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
  • Removed unneeded includes from traits.hpp (#7509) @davidwendt
  • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
  • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
  • JNI bit cast (#7493) @revans2
  • Combine rolling window function tests (#7480) @mythrocks
  • Prepare Changelog for Automation (#7477) @ajschmidt8
  • Java support for explode position (#7471) @sperlingxx
  • Update 0.18 changelog entry (#7463) @ajschmidt8
  • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
  • Join APIs that return gathermaps (#7454) @shwina
  • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
  • Use cuFile for Parquet IO when available (#7444) @vuule
  • Statistics cleanup (#7439) @kaatish
  • Add gbenchmarks for strings filter functions (#7438) @davidwendt
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Improve string gather performance (#7433) @jlowe
  • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
  • Detail APIs for datetime functions (#7430) @magnatelee
  • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
  • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
  • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
  • Java support for casting of nested child columns (#7417) @razajafri
  • Improve scalar string replace performance for long strings (#7415) @jlowe
  • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
  • bitmask_or implementation with bitmask refactor (#7406) @rwlee
  • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
  • Clean up included headers in device_operators.cuh (#7401) @codereport
  • Move nullable index iterator to indexalator factory (#7399) @davidwendt
  • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
  • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
  • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
  • Added in JNI support for out of core sort algorithm (#7381) @revans2
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • jitify 2 support (#7372) @cwharris
  • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
  • Add string scalar replace benchmark (#7369) @jlowe
  • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
  • Update orc reader and writer fuzz tests (#7357) @galipremsagar
  • Improve url_decode performance for long strings (#7353) @jlowe
  • cudf::ast Small Refactorings (#7352) @codereport
  • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
  • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
  • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
  • Partial clean up of ORC writer (#7324) @vuule
  • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
  • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
  • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
  • Use string literals in fixed_point release_asserts (#7303) @codereport
  • Fix merge conflicts for #7295 (#7297) @ajschmidt8
  • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
  • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
  • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
  • Refactor dictionary support for reductions any/all (#7242) @davidwendt
  • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
  • Interval index and interval_range (#7182) @marlenezw
  • avro reader integration tests (#7156) @cwharris
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
  • Adding Interval Dtype (#6984) @marlenezw
  • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.0

Published by GPUtester over 3 years ago

🚨 Breaking Changes

  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Join APIs that return gathermaps (#7454) @shwina
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Refactor strings column factories (#7397) @harrism
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

  • Fix a NameError in meta dispatch API (#7996) @galipremsagar
  • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
  • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
  • Use dynamic cudart for nvcomp in java build (#7896) @abellina
  • fix "incompatible redefinition" warnings (#7894) @cwharris
  • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
  • disable verbose output for jitify_preprocess (#7886) @cwharris
  • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
  • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
  • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
  • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
  • Sort by index in groupby tests more consistently (#7802) @shwina
  • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
  • Add decimal column handling in copy_type_metadata (#7788) @shwina
  • Add column names validation in parquet writer (#7786) @galipremsagar
  • Fix Java explode outer unit tests (#7782) @jlowe
  • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
  • User resource fix for replace_nulls (#7769) @magnatelee
  • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
  • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
  • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
  • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
  • Implement scatter for struct columns (#7752) @ttnghia
  • Fix data corruption in string columns (#7746) @galipremsagar
  • Fix string length in stripe dictionary building (#7744) @kaatish
  • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
  • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
  • Fix dictionary size computation in ORC writer (#7737) @vuule
  • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
  • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
  • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
  • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
  • Don't identify decimals as strings. (#7710) @vyasr
  • Fix return type of DataFrame.argsort (#7706) @galipremsagar
  • Fix/correct cudf installed package requirements (#7688) @robertmaynard
  • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
  • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
  • Fix Java Parquet write after writer API changes (#7655) @revans2
  • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
  • Fix internal compiler error during JNI Docker build (#7645) @jlowe
  • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
  • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
  • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
  • Fix specifying GPU architecture in JNI build (#7612) @jlowe
  • Fix ORC writer OOM issue (#7605) @vuule
  • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
  • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
  • Fix missing Dask imports (#7580) @kkraus14
  • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
  • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
  • Fix ORC writer output corruption with string columns (#7565) @vuule
  • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
  • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
  • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
  • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
  • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
  • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
  • Update missing docstring examples in python public APIs (#7546) @galipremsagar
  • Decimal32 Build Fix (#7544) @razajafri
  • FIX Retry conda output location (#7540) @dillon-cullinan
  • fix missing renames of dask git branches from master to main (#7535) @kkraus14
  • Remove detail from device_span (#7533) @rwlee
  • Change dask and distributed branch to main (#7532) @dantegd
  • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
  • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
  • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
  • Change jit launch to safe_launch (#7510) @devavret
  • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
  • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
  • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
  • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
  • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
  • Correctly compile benchmarks (#7485) @robertmaynard
  • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
  • Fix __repr__ for categorical dtype (#7476) @galipremsagar
  • Java cleaner synchronization (#7474) @abellina
  • Fix java float/double parsing tests (#7473) @revans2
  • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
  • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
  • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
  • fix cuFile JNI compile errors (#7445) @rongou
  • Support Series.__setitem__ with key to a new row (#7443) @isVoid
  • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
  • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
  • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
  • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
  • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
  • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
  • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
  • fix Arrow CMake file (#7358) @rongou
  • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
  • Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
  • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
  • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

  • Fix join API doxygen (#7890) @shwina
  • Add Resources to README. (#7697) @bdice
  • Add isin examples in Docstring (#7479) @galipremsagar
  • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
  • Fix typo in regex.md doc page (#7363) @davidwendt
  • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

  • Enable basic reductions for decimal columns (#7776) @ChrisJar
  • Enable join on decimal columns (#7764) @ChrisJar
  • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
  • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
  • Add support for unique groupby aggregation (#7726) @shwina
  • Expose libcudf's label_bins function to cudf (#7724) @vyasr
  • Adding support for equi-join on struct (#7720) @hyperbolic2346
  • Add decimal column comparison operations (#7716) @isVoid
  • Implement scan operations for decimal columns (#7707) @ChrisJar
  • Enable typecasting between decimal and int (#7691) @ChrisJar
  • Enable decimal support in parquet writer (#7673) @devavret
  • Adds list.unique API (#7664) @isVoid
  • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
  • Add lists.sort_values API (#7657) @isVoid
  • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
  • Adds explode API (#7607) @isVoid
  • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
  • Implement cudf::label_bins() (#7554) @vyasr
  • Add Python bindings for lists::contains (#7547) @skirui-source
  • cudf::row_bit_count() support. (#7534) @nvdbaranec
  • Implement drop_list_duplicates (#7528) @ttnghia
  • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
  • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
  • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
  • Add struct support to parquet writer (#7461) @devavret
  • Enable type conversion from float to decimal type (#7450) @ChrisJar
  • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
  • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
  • Implement groupby collect_set (#7420) @ttnghia
  • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
  • Refactor strings column factories (#7397) @harrism
  • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
  • Add cudf::explode_position (#7376) @hyperbolic2346
  • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
  • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
  • Add Series.drop api (#7304) @isVoid
  • get_json_object() implementation (#7286) @nvdbaranec
  • Python API for ListMethods.len() (#7283) @isVoid
  • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
  • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
  • Fix inplace update of data and add Series.update (#7201) @galipremsagar
  • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
  • Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

  • fix GDS include path for version 0.95 (#7877) @rongou
  • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
  • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
  • Add USE_GDS as an option in build script (#7833) @pxLi
  • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
  • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
  • Revert dask versioning of concat dispatch (#7823) @galipremsagar
  • add copy methods in Java memory buffer (#7791) @rongou
  • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
  • Allow hash_partition to take a seed value (#7771) @magnatelee
  • Turn on NVTX by default in java build (#7761) @tgravescs
  • Add Java bindings to join gather map APIs (#7751) @jlowe
  • Add replacements column support for Java replaceNulls (#7750) @jlowe
  • Add Java bindings for row_bit_count (#7749) @jlowe
  • Remove unused JVM array creation (#7748) @jlowe
  • Added JNI support for new is_integer (#7739) @revans2
  • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
  • Support groupby operations for decimal dtypes (#7731) @vyasr
  • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
  • Replace device_vector with device_uvector in null_mask (#7715) @harrism
  • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
  • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
  • Use stream in groupby calls (#7705) @karthikeyann
  • Update codeowners file (#7701) @ajschmidt8
  • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
  • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
  • Misc Python/Cython optimizations (#7686) @shwina
  • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
  • Add column_device_view to orc writer (#7676) @kaatish
  • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
  • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
  • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
  • Feature/optimize accessor copy (#7660) @vyasr
  • Fix find_package(cudf) (#7658) @trxcllnt
  • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
  • Add in JNI support for count_elements (#7651) @revans2
  • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
  • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
  • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
  • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
  • Add in JNI support for table partition (#7637) @revans2
  • Add explicit fixed_point merge test (#7635) @codereport
  • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
  • Java support on explode_outer (#7625) @sperlingxx
  • Java support of casting string from/to decimal (#7623) @sperlingxx
  • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
  • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
  • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
  • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
  • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
  • Add gbenchmarks for string substrings functions (#7603) @davidwendt
  • Refactor string conversion check (#7599) @ttnghia
  • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
  • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
  • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
  • Fix auto-detecting GPU architectures (#7593) @trxcllnt
  • Reduce cudf library size (#7583) @robertmaynard
  • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
  • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
  • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
  • Add gbenchmark for strings::concatenate (#7560) @davidwendt
  • Update Changelog Link (#7550) @ajschmidt8
  • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
  • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
  • Support Decimal DIV changes in cudf (#7527) @razajafri
  • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
  • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
  • Add gbenchmarks for strings extract function (#7522) @davidwendt
  • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
  • Reduce compile time/size for scan.cu (#7516) @davidwendt
  • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
  • Removed unneeded includes from traits.hpp (#7509) @davidwendt
  • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
  • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
  • JNI bit cast (#7493) @revans2
  • Combine rolling window function tests (#7480) @mythrocks
  • Prepare Changelog for Automation (#7477) @ajschmidt8
  • Java support for explode position (#7471) @sperlingxx
  • Update 0.18 changelog entry (#7463) @ajschmidt8
  • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
  • Join APIs that return gathermaps (#7454) @shwina
  • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
  • Use cuFile for Parquet IO when available (#7444) @vuule
  • Statistics cleanup (#7439) @kaatish
  • Add gbenchmarks for strings filter functions (#7438) @davidwendt
  • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
  • Improve string gather performance (#7433) @jlowe
  • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
  • Detail APIs for datetime functions (#7430) @magnatelee
  • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
  • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
  • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
  • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
  • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
  • Java support for casting of nested child columns (#7417) @razajafri
  • Improve scalar string replace performance for long strings (#7415) @jlowe
  • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
  • bitmask_or implementation with bitmask refactor (#7406) @rwlee
  • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
  • Clean up included headers in device_operators.cuh (#7401) @codereport
  • Move nullable index iterator to indexalator factory (#7399) @davidwendt
  • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
  • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
  • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
  • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
  • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
  • Added in JNI support for out of core sort algorithm (#7381) @revans2
  • Upgrade pandas to 1.2 (#7375) @galipremsagar
  • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
  • jitify 2 support (#7372) @cwharris
  • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
  • Add string scalar replace benchmark (#7369) @jlowe
  • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
  • Update orc reader and writer fuzz tests (#7357) @galipremsagar
  • Improve url_decode performance for long strings (#7353) @jlowe
  • cudf::ast Small Refactorings (#7352) @codereport
  • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
  • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
  • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
  • Partial clean up of ORC writer (#7324) @vuule
  • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
  • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
  • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
  • Use string literals in fixed_point release_asserts (#7303) @codereport
  • Fix merge conflicts for #7295 (#7297) @ajschmidt8
  • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
  • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
  • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
  • Refactor dictionary support for reductions any/all (#7242) @davidwendt
  • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
  • Interval index and interval_range (#7182) @marlenezw
  • avro reader integration tests (#7156) @cwharris
  • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
  • Adding Interval Dtype (#6984) @marlenezw
  • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.18.1

Published by GPUtester over 3 years ago

cudf - v0.18.0

Published by GPUtester over 3 years ago

Breaking Changes 🚨

  • Default groupby to sort=False (#7180) @isVoid
  • Add libcudf API for parsing of ORC statistics (#7136) @vuule
  • Replace ORC writer api with class (#7099) @rgsl888prabhu
  • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
  • Replace parquet writer api with class (#7058) @rgsl888prabhu
  • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
  • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
  • Align Series.groupby API to match Pandas (#6964) @kkraus14
  • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

Bug Fixes 🐛

  • Remove incorrect std::move call on return variable (#7319) @davidwendt
  • Fix failing CI ORC test (#7313) @vuule
  • Disallow constructing frames from a ColumnAccessor (#7298) @shwina
  • fix java cuFile tests (#7296) @rongou
  • Fix style issues related to NumPy (#7279) @shwina
  • Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
  • Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
  • Move lists utility function definition out of header (#7266) @mythrocks
  • Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
  • Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
  • Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
  • Disallow picking output columns from nested columns. (#7248) @devavret
  • Fix loc for Series with a MultiIndex (#7243) @shwina
  • Fix Arrow column test leaks (#7241) @tgravescs
  • Fix test column vector leak (#7238) @kuhushukla
  • Fix some bugs in java scalar support for decimal (#7237) @revans2
  • Improve assert_eq handling of scalar (#7220) @isVoid
  • Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
  • Remove floating point types from radix sort fast-path (#7215) @davidwendt
  • Fixing parquet benchmarks (#7214) @rgsl888prabhu
  • Handle various parameter combinations in replace API (#7207) @galipremsagar
  • Export mock aws credentials for s3 tests (#7176) @ayushdg
  • Add MultiIndex.rename API (#7172) @isVoid
  • Fix importing list & struct types in from_arrow (#7162) @galipremsagar
  • Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
  • Update s3 tests to use moto_server (#7144) @ayushdg
  • Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
  • Fix compilation errors in libcudf (#7138) @galipremsagar
  • Fix compilation failure caused by -Wall addition. (#7134) @codereport
  • Add informative error message for sep in CSV writer (#7095) @galipremsagar
  • Add JIT cache per compute capability (#7090) @devavret
  • Implement __hash__ method for ListDtype (#7081) @galipremsagar
  • Only upload packages that were built (#7077) @raydouglass
  • Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
  • Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
  • Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
  • Fix read_orc for decimal type (#7034) @rgsl888prabhu
  • Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
  • Decimal casts in JNI became a NOOP (#7032) @revans2
  • Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
  • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
  • Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
  • Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
  • Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
  • Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
  • Skip Thrust sort patch if already applied (#7009) @harrism
  • Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
  • Fix Thrust unroll patch command (#7002) @harrism
  • Fix loc behaviour when key of incorrect type is used (#6993) @shwina
  • Fix int to datetime conversion in csv_read (#6991) @kaatish
  • fix excluding cufile tests by default (#6988) @rongou
  • Fix java cufile tests when cufile is not installed (#6987) @revans2
  • Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
  • Fix type comparison for java (#6970) @revans2
  • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
  • Align Series.groupby API to match Pandas (#6964) @kkraus14
  • Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
  • Fix typo in numerical.py (#6957) @rgsl888prabhu
  • fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
  • fix libcu++ include path for jni (#6948) @rongou
  • Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
  • Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
  • Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
  • Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
  • Fix N/A detection for empty fields in CSV reader (#6922) @vuule
  • Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
  • Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
  • Correct the sampling range when sampling with replacement (#6884) @ChrisJar
  • Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
  • Fix columns & index handling in dataframe constructor (#6838) @galipremsagar

Documentation 📖

  • Update readme (#7318) @shwina
  • Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
  • Update doxyfile project number (#7161) @davidwendt
  • Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
  • Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
  • Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
  • Add groupby docs (#7100) @shwina
  • Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
  • Make Doxygen comments formatting consistent (#7041) @vuule
  • Add docs for working with missing data (#7010) @galipremsagar
  • Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
  • libcudf Developer Guide (#6977) @harrism
  • Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

New Features 🚀

  • Support numeric_only field for rank() (#7213) @isVoid
  • Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
  • Implement COLLECT rolling window aggregation (#7189) @mythrocks
  • Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
  • Default groupby to sort=False (#7180) @isVoid
  • Add libcudf lists column count_elements API (#7173) @davidwendt
  • Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
  • Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
  • cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
  • Adding support for explode to cuDF (#7140) @hyperbolic2346
  • Add libcudf API for parsing of ORC statistics (#7136) @vuule
  • update GDS/cuFile location for 0.9 release (#7131) @rongou
  • Add Segmented sort (#7122) @karthikeyann
  • Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
  • Add scale and value methods to fixed_point (#7109) @codereport
  • Replace ORC writer api with class (#7099) @rgsl888prabhu
  • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
  • Improve digitize API (#7071) @isVoid
  • Add List types support in data generator (#7064) @galipremsagar
  • cudf::scan support for decimal32 and decimal64 (#7063) @codereport
  • cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
  • Replace parquet writer api with class (#7058) @rgsl888prabhu
  • Support contains() on lists of primitives (#7039) @mythrocks
  • Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
  • Add ffill and bfill to string columns (#7036) @isVoid
  • Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
  • Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
  • Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
  • Add method field to fillna for fixed width columns (#6998) @isVoid
  • Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
  • Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
  • Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
  • Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
  • Add Index.set_names api (#6929) @galipremsagar
  • Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
  • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
  • Implement update() function (#6883) @skirui-source
  • Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
  • Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
  • Implement cudf.DateOffset for months (#6775) @brandon-b-miller
  • Add Python DecimalColumn (#6715) @shwina
  • Add dictionary support to libcudf groupby functions (#6585) @davidwendt

Improvements 🛠️

  • Update stale GHA with exemptions & new labels (#7395) @mike-wendt
  • Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
  • Unpin from numpy < 1.20 (#7335) @shwina
  • Prepare Changelog for Automation (#7309) @galipremsagar
  • Prepare Changelog for Automation (#7272) @ajschmidt8
  • Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
  • Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
  • Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
  • Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
  • Add dictionary column support to rolling_window (#7186) @davidwendt
  • Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
  • Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
  • Fast path single column sort (#7167) @davidwendt
  • Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
  • Refactor cudf::string_view host and device code (#7159) @davidwendt
  • Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
  • Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
  • Add Java interface for the new API 'explode' (#7151) @firestarman
  • Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
  • Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
  • Update JNI for contiguous_split packed results (#7127) @jlowe
  • Add JNI and Java bindings for list_contains (#7125) @kuhushukla
  • Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
  • verify window operations on decimal with java tests (#7120) @sperlingxx
  • Adds in JNI support for creating a list column from existing columns (#7112) @revans2
  • Build libcudf with -Wall (#7105) @trxcllnt
  • Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
  • Add pyorc to dev environment (#7085) @galipremsagar
  • JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
  • Fastpath single strings column in cudf::sort (#7075) @davidwendt
  • Upgrade nvcomp to 1.2.1 (#7069) @rongou
  • Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
  • Add Java tests for decimal casts (#7051) @sperlingxx
  • Auto-label PRs based on their content (#7044) @jolorunyomi
  • Create sort gbenchmark for strings column (#7040) @davidwendt
  • Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
  • Spark Murmur3 hash functionality (#7024) @rwlee
  • Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
  • Adding decimal writing support to parquet (#7017) @hyperbolic2346
  • Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
  • Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
  • Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
  • Check output size overflow on strings gather (#6997) @davidwendt
  • Improve representation of MultiIndex (#6992) @galipremsagar
  • Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
  • Minor cudf::round internal refactoring (#6976) @codereport
  • Add Java bindings for URL conversion (#6972) @jlowe
  • Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
  • Add in basic support to JNI for logical_cast (#6954) @revans2
  • Remove duplicate file array_tests.cpp (#6953) @karthikeyann
  • Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
  • Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
  • Use simplified rmm::exec_policy (#6939) @harrism
  • Add null count test for apply_boolean_mask (#6903) @harrism
  • Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
  • Remove **kwargs from string/categorical methods (#6750) @shwina
  • Refactor rolling.cu to reduce compile time (#6512) @mythrocks
  • Add static type checking via Mypy (#6381) @shwina
  • Update to official libcu++ on Github (#6275) @trxcllnt