cudf | Python Ecosystem Directory

cudf - v21.08.01

Published by GPUtester about 3 years ago

cudf - v21.08.00

Published by GPUtester about 3 years ago

🚨 Breaking Changes

Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Remove unused cudf::strings::create_offsets (#8663) @davidwendt
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Add accurate hash join size functions (#8453) @PointKernel
Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
Remove special Index class from the general index class hierarchy (#8309) @vyasr
Add first-class dtype utilities (#8308) @vyasr
ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
Upgrade arrow to 4.0.1 (#7495) @galipremsagar
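
Two of the string changes listed above (#8620 and #8561) are easier to picture with a small example. The following is a plain-Python sketch of the behavior as described in the entry titles, not libcudf or cuDF code. The function names `capitalize` and `repeat_each`, and the choice to lower-case all non-leading characters, are assumptions made for illustration.

```python
def capitalize(text, delimiters=""):
    """Upper-case the first character and any character that follows a delimiter.

    Assumption for this sketch: every other character is lower-cased.
    """
    out = []
    upper_next = True
    for ch in text:
        if ch in delimiters:
            # Delimiters are copied through and re-arm capitalization.
            out.append(ch)
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch.lower())
    return "".join(out)


def repeat_each(strings, counts):
    """Repeat each string its own number of times; None (null) stays None."""
    return [
        None if s is None or n is None else s * n
        for s, n in zip(strings, counts)
    ]


print(capitalize("a Test"))                    # no delimiters
print(capitalize("a Test", delimiters=" "))    # space as delimiter
print(repeat_each(["ab", None, "c"], [2, 3, 0]))
```

With an empty delimiter set only the first character is capitalized; passing `" "` capitalizes each space-separated word.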

🐛 Bug Fixes

Fix contains check in string column (#8834) @galipremsagar
Remove unused variable from row_bit_count_test. (#8829) @mythrocks
Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
Handle empty child columns in row_bit_count() (#8791) @mythrocks
Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
Fix isort error in utils.pyx (#8771) @charlesbluca
Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca
Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar
Write pandas extension types to parquet file metadata (#8749) @devavret
Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar
Fix replace to handle null values correctly (#8744) @galipremsagar
Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar
Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
Fix orc reader assert on create data_type in debug (#8706) @davidwendt
Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid
Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca
JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
Pin *arrow to use *cuda in run (#8651) @jakirkham
Add proper support for tolerances in testing methods. (#8649) @vyasr
Support multi-char case conversion in capitalize function (#8647) @davidwendt
Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
Temporarily disable libcudf example build tests (#8642) @isVoid
Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
Fix bug that columns only initialized once when specified columns and index in dataframe ctor (#8628) @isVoid
Propagate **kwargs through to as_*_column methods (#8618) @shwina
Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
Fix missed renumbering of Aggregation values (#8600) @revans2
Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca
Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar
Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
Properly retrieve last column when -1 is specified for column index (#8529) @isVoid
Fix importing apply from dask (#8517) @galipremsagar
Fix offset of the string dictionary length stream (#8515) @vuule
Fix double counting of selected columns in CSV reader (#8508) @ochan1
Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
Disallow groupby aggs for StructColumns (#8499) @charlesbluca
Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
Adding support for writing empty dataframe (#8490) @shaneding
Fix exclusive scan when including nulls and improve testing (#8478) @harrism
Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
Add nightly version for ucx-py in ci script (#8419) @galipremsagar
Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

📖 Documentation

Update Python UDFs notebook (#8810) @brandon-b-miller
Fix dask.dataframe API docs links after reorg (#8772) @jsignell
Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
Custom Sphinx Extension: PandasCompat (#8643) @isVoid
Fix README.md (#8535) @ajschmidt8
Change namespace contains_nulls to struct (#8523) @davidwendt
Add info about NVTX ranges to dev guide (#8461) @jrhemstad
Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

🚀 New Features

Fix concatenating structs (#8811) @shaneding
Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia
Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca
Implement __setitem__ for StructColumn (#8737) @shaneding
Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid
Add struct.explode() method (#8729) @shwina
Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina
Add support for list type in ORC writer (#8723) @vuule
Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
Add datetime::is_leap_year (#8711) @isVoid
Accessing struct columns from dask_cudf (#8675) @shaneding
Added pct_change to Series (#8650) @TravisHester
Add strings support to cudf::shift function (#8648) @davidwendt
Support Scatter struct_scalar (#8630) @isVoid
Struct scalar from host dictionary (#8629) @shaneding
Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
JNI support for capitalize (#8624) @firestarman
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Add NVBench in CMake (#8619) @PointKernel
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
ListColumn __setitem__ (#8606) @brandon-b-miller
Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
Benchmark for strings::repeat_strings APIs (#8589) @ttnghia
Nested scalar support for copy if else (#8588) @gerashegalov
User specified decimal columns to float64 (#8587) @jdye64
Add get_element for struct column (#8578) @isVoid
Python changes for adding __getitem__ for struct (#8577) @shaneding
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia
Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
Decimal support csv reader (#8511) @elstehle
Add column type tests (#8505) @isVoid
Warn when downscaling decimal columns (#8492) @ChrisJar
Add JNI for strings::repeat_strings (#8491) @ttnghia
Add Index.get_loc for Numerical, String Index support (#8489) @isVoid
Expose half_up rounding in cuDF (#8477) @shwina
Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
Add str.edit_distance_matrix (#8463) @isVoid
Support constructing cudf.Scalar objects from host side lists (#8459) @brandon-b-miller
Add accurate hash join size functions (#8453) @PointKernel
Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt
Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller
JNI bindings for sort_lists (#8439) @sperlingxx
Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
Replace all_null() and all_valid() by iterator_all_nulls() and iterator_no_null() in tests (#8437) @ttnghia
Implement groupby MERGE_LISTS and MERGE_SETS aggregates (#8436) @ttnghia
Add public libcudf match_dictionaries API (#8429) @davidwendt
Add move constructors for string_scalar and struct_scalar (#8428) @ttnghia
Implement strings::repeat_strings (#8423) @ttnghia
STRUCT column support for cudf::merge. (#8422) @nvdbaranec
Implement reverse in libcudf (#8410) @shaneding
Support multiple input files/buffers for read_json (#8403) @jdye64
Improve test coverage for struct search (#8396) @ttnghia
Add groupby.fillna (#8362) @isVoid
Enable AST-based joining (#8214) @vyasr
Generalized null support in user defined functions (#8213) @brandon-b-miller
Add compiled binary operation (#8192) @karthikeyann
Implement .describe() for DataFrameGroupBy (#8179) @skirui-source
ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
Add Python bindings for lists::concatenate_list_elements and expose them as .list.concat() (#8006) @shwina
Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64
Example to build custom application and link to libcudf (#7671) @isVoid
Upgrade arrow to 4.0.1 (#7495) @galipremsagar
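
Several entries in this list name small numeric or date operations (`pct_change` in #8650, `is_leap_year` in #8736, half-up rounding in #8477). The sketch below shows what those operations compute, using only the Python standard library. It does not call cuDF, and the helper names `pct_change` and `round_half_up` are hypothetical. Tie handling for negative numbers follows Python's `ROUND_HALF_UP` (ties away from zero); whether cuDF behaves identically there is an assumption.

```python
import calendar
from decimal import Decimal, ROUND_HALF_UP


def pct_change(values):
    """Fractional change from the previous element; the first result is None.

    Assumes no previous value is zero (plain Python would raise on division).
    """
    if not values:
        return []
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out


def round_half_up(x, places=0):
    """Round to `places` decimals with ties going away from zero."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 -> Decimal('0.01')
    return Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)


print(pct_change([10.0, 12.0, 9.0]))
print(calendar.isleap(2020), calendar.isleap(2100))
# Python's built-in round() uses ties-to-even, so the two differ on 2.5.
print(round_half_up(2.5), round(2.5))
```

The last line shows why a separate half-up mode matters: the built-in `round` sends 2.5 to 2, while half-up rounding sends it to 3.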

🛠️ Improvements

Provide a better error message when CUDA::cuda_driver not found (#8794) @robertmaynard
Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec
Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard
Pin mimesis to <4.1 (#8745) @galipremsagar
Update conda environment name for CI (#8692) @ajschmidt8
Remove flatbuffers dependency (#8671) @Ethyling
Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt
Remove unused cudf::strings::create_offsets (#8663) @davidwendt
Update GDS lib version to 1.0.0 (#8654) @pxLi
Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee
Fix usage of deprecated arrow ipc API (#8632) @revans2
Use absolute imports in cudf (#8631) @galipremsagar
ENH Add Java CI build script (#8627) @dillon-cullinan
Add DeprecationWarning to ser.str.subword_tokenize (#8603) @VibhuJawa
Rewrite binary operations for improved performance and additional type support (#8598) @vyasr
Fix mypy errors surfacing because of numpy-1.21.0 (#8595) @galipremsagar
Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt
Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard
Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt
Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk
Remove checking if an unsigned value is less than zero (#8579) @robertmaynard
Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt
Make cudf.api.types imports consistent (#8571) @galipremsagar
Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid
Rename concatenate_tests.cu to .cpp (#8555) @davidwendt
enable window lead/lag test on struct (#8548) @wbo4958
Add Java methods to split and write column views (#8546) @razajafri
Small cleanup (#8534) @codereport
Unpin dask version in CI (#8533) @galipremsagar
Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64
Minor clean up of various internal column and frame utilities (#8528) @vyasr
Rename some copying_test source files .cu to .cpp (#8527) @davidwendt
Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard
Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard
Correct unused parameter warnings in string algorithms (#8509) @robertmaynard
Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2
Fix 21.08 forward-merge conflicts (#8502) @ajschmidt8
Fix Cython formatting command in Contributing.md. (#8496) @marlenezw
Bug/correct unused parameters in reshape and text (#8495) @robertmaynard
Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard
Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard
Refactor index construction (#8485) @vyasr
Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard
Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard
Correct unused parameter warnings in io algorithms (#8480) @robertmaynard
Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard
Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard
Correct unused parameter warnings in groupby (#8467) @robertmaynard
use libcu++ time_point as timestamp (#8466) @karthikeyann
Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt
Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev
Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar
Fix conflicts in 8447 (#8448) @ajschmidt8
Add serialization methods for List and StructDtype (#8441) @charlesbluca
Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt
JNI bindings for get_element (#8433) @revans2
Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
Unpin dask version on CI (#8425) @galipremsagar
Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt
Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
Add benchmark for strings/integers convert APIs (#8402) @davidwendt
Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora
Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard
Correct unused parameters in column round and search (#8389) @robertmaynard
Add functionality to apply Dtype metadata to ColumnBase (#8373) @charlesbluca
Refactor setting stack size in regex code (#8358) @davidwendt
Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi
Replace remaining uses of device_vector (#8343) @harrism
Statically link libnvcomp into libcudfjni (#8334) @jlowe
Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar
Minor code refactor for sorted_order (#8326) @wbo4958
Remove special Index class from the general index class hierarchy (#8309) @vyasr
Add first-class dtype utilities (#8308) @vyasr
Add option to link Java bindings with Arrow dynamically (#8307) @jlowe
Refactor ColumnMethods and its subclasses to remove column argument and require parent argument (#8306) @shwina
Refactor scatter for list columns (#8255) @isVoid
Expose pack/unpack API to Python (#8153) @charlesbluca
Adding cudf.cut method (#8002) @marlenezw
Optimize string gather performance for large strings (#7980) @gaohao95
Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret
Updating Clang Version to 11.0.0 (#6695) @codereport

cudf - v21.06.01

Published by GPUtester over 3 years ago

cudf - v21.06.00

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid

🐛 Bug Fixes

Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks
Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar
Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar
Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar
Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Update Java string concatenate test for single column (#8330) @tgravescs
Use empty_like in scatter (#8314) @revans2
Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
Update io util to convert path like object to string (#8275) @ayushdg
Fix result column types for empty inputs to rolling window (#8274) @mythrocks
Actually test equality in assert_groupby_results_equal (#8272) @shwina
CMake always explicitly specify a source files extension (#8270) @robertmaynard
Fix struct binary search and struct flattening (#8268) @ttnghia
Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
upgrade dlpack to 0.5 (#8262) @cwharris
Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
Fix incorrect assertion in Java concat (#8258) @sperlingxx
Copy nested types upon construction (#8244) @isVoid
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Clip decimal binary op precision at max precision (#8194) @ChrisJar

📖 Documentation

Add docstring for dask_cudf.read_csv (#8355) @galipremsagar
Fix cudf release version in readme (#8331) @galipremsagar
Fix structs column description in dev docs (#8318) @isVoid
Update readme with correct CUDA versions (#8315) @raydouglass
Add description of the cuIO GDS integration (#8293) @vuule
Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

🚀 New Features

Add support for merging between categorical data (#8332) @galipremsagar
Java: Support struct scalar (#8327) @sperlingxx
added _is_homogeneous property (#8299) @shaneding
Added decimal writing for CSV writer (#8296) @kaatish
Java: Support creating a scalar from utf8 string (#8294) @firestarman
Add Java API for Concatenate strings with separator (#8289) @tgravescs
strings::join_list_elements options for empty list inputs (#8285) @ttnghia
Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
add unit tests for lead/lag on list for row window (#8259) @wbo4958
Create a String column from UTF8 String byte arrays (#8257) @firestarman
Support scattering list_scalar (#8256) @isVoid
Implement lists::concatenate_list_elements (#8231) @ttnghia
Support for struct scalars. (#8220) @nvdbaranec
Add support for decimal types in ORC writer (#8198) @vuule
Support create lists column from a list_scalar (#8185) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid
Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid
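
The `groupby::replace_nulls(replace_policy)` entry (#7118) describes filling nulls from a neighboring non-null value within the same group. A minimal plain-Python sketch of that idea follows. It is not the libcudf implementation: the function name, the `"preceding"` / `"following"` policy strings, and keeping the output in original row order are all assumptions for illustration.

```python
def groupby_replace_nulls(keys, values, policy="preceding"):
    """Fill None entries with the nearest non-null value from the same group.

    policy="preceding" looks backwards in row order, "following" looks forwards.
    Entries with no such neighbor in their group stay None.
    """
    if policy == "preceding":
        order = range(len(values))
    elif policy == "following":
        order = range(len(values) - 1, -1, -1)
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    last_seen = {}          # group key -> most recent non-null value
    out = list(values)
    for i in order:
        key = keys[i]
        if out[i] is None:
            out[i] = last_seen.get(key)
        else:
            last_seen[key] = out[i]
    return out


keys = ["a", "b", "a", "b", "a"]
values = [1, None, None, 4, None]
print(groupby_replace_nulls(keys, values, "preceding"))
print(groupby_replace_nulls(keys, values, "following"))
```

Values never cross group boundaries: the null in group `b` at row 1 is not filled from the `a` value at row 0, even though that row comes directly before it.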

🛠️ Improvements

Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
Add aliases for string methods (#8353) @shwina
Update environment variable used to determine cuda_version (#8321) @ajschmidt8
JNI: Refactor the code of making column from scalar (#8310) @firestarman
Update CHANGELOG.md links for calver (#8303) @ajschmidt8
Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8
use address and length for GDS reads/writes (#8301) @rongou
Update cudfjni version to 21.06.0 (#8292) @pxLi
Update docs build script (#8284) @ajschmidt8
Make device_buffer streams explicit and enforce move construction (#8280) @harrism
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
Update cudfjni version to 21.06 (#8267) @pxLi
support RMM aligned resource adapter in JNI (#8266) @rongou
Pass compiler environment variables to conda python build (#8260) @Ethyling
Remove abc inheritance from Serializable (#8254) @vyasr
Move more methods into SingleColumnFrame (#8253) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
Correct unused parameters in the copying algorithms (#8232) @robertmaynard
IO statistics cleanup (#8191) @kaatish
Refactor of rolling_window implementation. (#8158) @nvdbaranec
Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
Column refactoring 2 (#8130) @vyasr
support space in workspace (#7956) @jolorunyomi
Support collect_set on rolling window (#7881) @sperlingxx

cudf - v0.19.2

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

unsnap: busy wait a number of cycles (#8073) @vuule
Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source
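
Several entries above add explode, explode_outer and explode_position. The following is a minimal pure-Python sketch of how such operations are commonly defined (the semantics are assumed to follow Spark's explode family; this is not the libcudf or cuDF API). The function names and the (key, list) row layout are illustrative only.

```python
# Illustrative only: rows are (key, list_or_None) tuples, not cuDF columns.

def explode(rows):
    """One output row per list element; null or empty lists produce no rows."""
    out = []
    for key, items in rows:
        for item in items or []:
            out.append((key, item))
    return out


def explode_outer(rows):
    """Like explode, but a null or empty list still yields one row with a null element."""
    out = []
    for key, items in rows:
        if not items:
            out.append((key, None))
        else:
            for item in items:
                out.append((key, item))
    return out


def explode_position(rows):
    """Like explode, but also reports each element's position within its source list."""
    out = []
    for key, items in rows:
        for pos, item in enumerate(items or []):
            out.append((key, pos, item))
    return out


rows = [("a", [10, 20]), ("b", []), ("c", None)]
print(explode(rows))
print(explode_outer(rows))
print(explode_position(rows))
```

The only difference between the two variants is whether rows with nothing to explode survive into the output.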

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.1

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in DataFrame.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source
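
The label_bins entries above refer to assigning each value the index of the bin that contains it. As a rough pure-Python sketch of that idea (not the libcudf signature; the parameter names and the choice of None for "no bin" are assumptions made for illustration):

```python
# Illustrative only: plain lists stand in for columns; None stands in for null.

def label_bins(values, left_edges, right_edges,
               left_inclusive=True, right_inclusive=False):
    """Return, for each value, the index of the first bin containing it, else None."""
    if len(left_edges) != len(right_edges):
        raise ValueError("left_edges and right_edges must have the same length")

    def contains(v, lo, hi):
        above = v >= lo if left_inclusive else v > lo
        below = v <= hi if right_inclusive else v < hi
        return above and below

    labels = []
    for v in values:
        label = None
        if v is not None:
            # Linear scan keeps the sketch simple; a real implementation
            # would likely use a binary search over sorted edges.
            for i, (lo, hi) in enumerate(zip(left_edges, right_edges)):
                if contains(v, lo, hi):
                    label = i
                    break
        labels.append(label)
    return labels


values = [0.5, 1.0, 2.0, 5.0, None]
print(label_bins(values, [0, 1], [1, 2]))
print(label_bins(values, [0, 1], [1, 2],
                 left_inclusive=False, right_inclusive=True))
```

Flipping the inclusivity flags changes which bin a value sitting exactly on an edge falls into, which is why both flags are exposed separately in this sketch.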

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.19.0

Published by GPUtester over 3 years ago

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

cudf - v0.18.1

Published by GPUtester over 3 years ago

cudf - [NIGHTLY] v0.18.0

Published by rapids-bot[bot] over 3 years ago

🚨 Breaking Changes

Default groupby to sort=False (#7180) @isVoid
Add libcudf API for parsing of ORC statistics (#7136) @vuule
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Replace parquet writer api with class (#7058) @rgsl888prabhu
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

🐛 Bug Fixes

Fix null-bounds calculation for ranged window queries (#7568) @mythrocks
Remove incorrect std::move call on return variable (#7319) @davidwendt
Fix failing CI ORC test (#7313) @vuule
Disallow constructing frames from a ColumnAccessor (#7298) @shwina
fix java cuFile tests (#7296) @rongou
Fix style issues related to NumPy (#7279) @shwina
Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
Move lists utility function definition out of header (#7266) @mythrocks
Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
Disallow picking output columns from nested columns. (#7248) @devavret
Fix loc for Series with a MultiIndex (#7243) @shwina
Fix Arrow column test leaks (#7241) @tgravescs
Fix test column vector leak (#7238) @kuhushukla
Fix some bugs in java scalar support for decimal (#7237) @revans2
Improve assert_eq handling of scalar (#7220) @isVoid
Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
Remove floating point types from radix sort fast-path (#7215) @davidwendt
Fixing parquet benchmarks (#7214) @rgsl888prabhu
Handle various parameter combinations in replace API (#7207) @galipremsagar
Export mock aws credentials for s3 tests (#7176) @ayushdg
Add MultiIndex.rename API (#7172) @isVoid
Fix importing list & struct types in from_arrow (#7162) @galipremsagar
Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
Update s3 tests to use moto_server (#7144) @ayushdg
Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
Fix compilation errors in libcudf (#7138) @galipremsagar
Fix compilation failure caused by -Wall addition. (#7134) @codereport
Add informative error message for sep in CSV writer (#7095) @galipremsagar
Add JIT cache per compute capability (#7090) @devavret
Implement __hash__ method for ListDtype (#7081) @galipremsagar
Only upload packages that were built (#7077) @raydouglass
Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
Fix read_orc for decimal type (#7034) @rgsl888prabhu
Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
Decimal casts in JNI became a NOOP (#7032) @revans2
Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
Skip Thrust sort patch if already applied (#7009) @harrism
Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
Fix Thrust unroll patch command (#7002) @harrism
Fix loc behaviour when key of incorrect type is used (#6993) @shwina
Fix int to datetime conversion in csv_read (#6991) @kaatish
fix excluding cufile tests by default (#6988) @rongou
Fix java cufile tests when cufile is not installed (#6987) @revans2
Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
Fix type comparison for java (#6970) @revans2
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
Fix typo in numerical.py (#6957) @rgsl888prabhu
fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
fix libcu++ include path for jni (#6948) @rongou
Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
Fix N/A detection for empty fields in CSV reader (#6922) @vuule
Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
Correct the sampling range when sampling with replacement (#6884) @ChrisJar
Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
Fix columns & index handling in dataframe constructor (#6838) @galipremsagar

📖 Documentation

Update readme (#7318) @shwina
Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
Update doxyfile project number (#7161) @davidwendt
Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
Add groupby docs (#7100) @shwina
Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
Make Doxygen comments formatting consistent (#7041) @vuule
Add docs for working with missing data (#7010) @galipremsagar
Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
libcudf Developer Guide (#6977) @harrism
Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

🚀 New Features

Support numeric_only field for rank() (#7213) @isVoid
Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
Implement COLLECT rolling window aggregation (#7189) @mythrocks
Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
Default groupby to sort=False (#7180) @isVoid
Add libcudf lists column count_elements API (#7173) @davidwendt
Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
Adding support for explode to cuDF (#7140) @hyperbolic2346
Add libcudf API for parsing of ORC statistics (#7136) @vuule
update GDS/cuFile location for 0.9 release (#7131) @rongou
Add Segmented sort (#7122) @karthikeyann
Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
Add scale and value methods to fixed_point (#7109) @codereport
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Improve digitize API (#7071) @isVoid
Add List types support in data generator (#7064) @galipremsagar
cudf::scan support for decimal32 and decimal64 (#7063) @codereport
cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
Replace parquet writer api with class (#7058) @rgsl888prabhu
Support contains() on lists of primitives (#7039) @mythrocks
Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
Add ffill and bfill to string columns (#7036) @isVoid
Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
Add method field to fillna for fixed width columns (#6998) @isVoid
Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
Add Index.set_names api (#6929) @galipremsagar
Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
Implement update() function (#6883) @skirui-source
Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
Implement cudf.DateOffset for months (#6775) @brandon-b-miller
Add Python DecimalColumn (#6715) @shwina
Add dictionary support to libcudf groupby functions (#6585) @davidwendt

🛠️ Improvements

Update stale GHA with exemptions & new labels (#7395) @mike-wendt
Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
Unpin from numpy < 1.20 (#7335) @shwina
Prepare Changelog for Automation (#7309) @galipremsagar
Prepare Changelog for Automation (#7272) @ajschmidt8
Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
Add dictionary column support to rolling_window (#7186) @davidwendt
Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
Fast path single column sort (#7167) @davidwendt
Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
Refactor cudf::string_view host and device code (#7159) @davidwendt
Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
Add Java interface for the new API 'explode' (#7151) @firestarman
Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
Update JNI for contiguous_split packed results (#7127) @jlowe
Add JNI and Java bindings for list_contains (#7125) @kuhushukla
Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
verify window operations on decimal with java tests (#7120) @sperlingxx
Adds in JNI support for creating a list column from existing columns (#7112) @revans2
Build libcudf with -Wall (#7105) @trxcllnt
Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
Add pyorc to dev environment (#7085) @galipremsagar
JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
Fastpath single strings column in cudf::sort (#7075) @davidwendt
Upgrade nvcomp to 1.2.1 (#7069) @rongou
Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
Add Java tests for decimal casts (#7051) @sperlingxx
Auto-label PRs based on their content (#7044) @jolorunyomi
Create sort gbenchmark for strings column (#7040) @davidwendt
Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
Spark Murmur3 hash functionality (#7024) @rwlee
Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
Adding decimal writing support to parquet (#7017) @hyperbolic2346
Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
Check output size overflow on strings gather (#6997) @davidwendt
Improve representation of MultiIndex (#6992) @galipremsagar
Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
Minor cudf::round internal refactoring (#6976) @codereport
Add Java bindings for URL conversion (#6972) @jlowe
Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
Add in basic support to JNI for logical_cast (#6954) @revans2
Remove duplicate file array_tests.cpp (#6953) @karthikeyann
Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
Use simplified rmm::exec_policy (#6939) @harrism
Add null count test for apply_boolean_mask (#6903) @harrism
Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
Remove **kwargs from string/categorical methods (#6750) @shwina
Refactor rolling.cu to reduce compile time (#6512) @mythrocks
Add static type checking via Mypy (#6381) @shwina
Update to official libcu++ on Github (#6275) @trxcllnt

cudf - v0.18.0

Published by GPUtester over 3 years ago

Breaking Changes 🚨

Default groupby to sort=False (#7180) @isVoid
Add libcudf API for parsing of ORC statistics (#7136) @vuule
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Replace parquet writer api with class (#7058) @rgsl888prabhu
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

Bug Fixes 🐛

Remove incorrect std::move call on return variable (#7319) @davidwendt
Fix failing CI ORC test (#7313) @vuule
Disallow constructing frames from a ColumnAccessor (#7298) @shwina
fix java cuFile tests (#7296) @rongou
Fix style issues related to NumPy (#7279) @shwina
Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
Move lists utility function definition out of header (#7266) @mythrocks
Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
Disallow picking output columns from nested columns. (#7248) @devavret
Fix loc for Series with a MultiIndex (#7243) @shwina
Fix Arrow column test leaks (#7241) @tgravescs
Fix test column vector leak (#7238) @kuhushukla
Fix some bugs in java scalar support for decimal (#7237) @revans2
Improve assert_eq handling of scalar (#7220) @isVoid
Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
Remove floating point types from radix sort fast-path (#7215) @davidwendt
Fixing parquet benchmarks (#7214) @rgsl888prabhu
Handle various parameter combinations in replace API (#7207) @galipremsagar
Export mock aws credentials for s3 tests (#7176) @ayushdg
Add MultiIndex.rename API (#7172) @isVoid
Fix importing list & struct types in from_arrow (#7162) @galipremsagar
Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
Update s3 tests to use moto_server (#7144) @ayushdg
Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
Fix compilation errors in libcudf (#7138) @galipremsagar
Fix compilation failure caused by -Wall addition. (#7134) @codereport
Add informative error message for sep in CSV writer (#7095) @galipremsagar
Add JIT cache per compute capability (#7090) @devavret
Implement __hash__ method for ListDtype (#7081) @galipremsagar
Only upload packages that were built (#7077) @raydouglass
Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
Fix read_orc for decimal type (#7034) @rgsl888prabhu
Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
Decimal casts in JNI became a NOOP (#7032) @revans2
Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
Skip Thrust sort patch if already applied (#7009) @harrism
Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
Fix Thrust unroll patch command (#7002) @harrism
Fix loc behaviour when key of incorrect type is used (#6993) @shwina
Fix int to datetime conversion in csv_read (#6991) @kaatish
fix excluding cufile tests by default (#6988) @rongou
Fix java cufile tests when cufile is not installed (#6987) @revans2
Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
Fix type comparison for java (#6970) @revans2
Fix default parameter values of write_csv and write_parquet (#6967) @vuule
Align Series.groupby API to match Pandas (#6964) @kkraus14
Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
Fix typo in numerical.py (#6957) @rgsl888prabhu
fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
fix libcu++ include path for jni (#6948) @rongou
Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
Fix N/A detection for empty fields in CSV reader (#6922) @vuule
Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
Correct the sampling range when sampling with replacement (#6884) @ChrisJar
Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
Fix columns & index handling in dataframe constructor (#6838) @galipremsagar
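
For readers unfamiliar with the HALF_EVEN mode named in #7014 above, the following is a minimal sketch of round-half-to-even semantics for negative integers. It uses only Python's standard `decimal` module, not cuDF or libcudf; the helper name `round_half_even` is hypothetical, and the sketch does not reproduce the bug fixed in that PR, which the entry does not describe.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: int, decimal_places: int) -> int:
    """Round an integer using round-half-to-even (hypothetical helper).

    A negative `decimal_places` rounds to the left of the decimal point,
    e.g. -1 rounds to the nearest ten.
    """
    # Build the quantum 10**(-decimal_places), e.g. Decimal('1E+1') for -1.
    quantum = Decimal(1).scaleb(-decimal_places)
    return int(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))


# Ties go to the neighbour whose last kept digit is even, regardless of sign.
print(round_half_even(-25, -1))  # -20
print(round_half_even(-35, -1))  # -40
print(round_half_even(25, -1))   # 20
```

Under this mode a tie such as -25 rounds to -20 rather than -30, mirroring how 25 rounds to 20.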

Documentation 📖

Update readme (#7318) @shwina
Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
Update doxyfile project number (#7161) @davidwendt
Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
Add groupby docs (#7100) @shwina
Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
Make Doxygen comments formatting consistent (#7041) @vuule
Add docs for working with missing data (#7010) @galipremsagar
Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
libcudf Developer Guide (#6977) @harrism
Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

New Features 🚀

Support numeric_only field for rank() (#7213) @isVoid
Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
Implement COLLECT rolling window aggregation (#7189) @mythrocks
Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
Default groupby to sort=False (#7180) @isVoid
Add libcudf lists column count_elements API (#7173) @davidwendt
Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
Adding support for explode to cuDF (#7140) @hyperbolic2346
Add libcudf API for parsing of ORC statistics (#7136) @vuule
update GDS/cuFile location for 0.9 release (#7131) @rongou
Add Segmented sort (#7122) @karthikeyann
Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
Add scale and value methods to fixed_point (#7109) @codereport
Replace ORC writer api with class (#7099) @rgsl888prabhu
Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
Improve digitize API (#7071) @isVoid
Add List types support in data generator (#7064) @galipremsagar
cudf::scan support for decimal32 and decimal64 (#7063) @codereport
cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
Replace parquet writer api with class (#7058) @rgsl888prabhu
Support contains() on lists of primitives (#7039) @mythrocks
Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
Add ffill and bfill to string columns (#7036) @isVoid
Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
Add method field to fillna for fixed width columns (#6998) @isVoid
Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
Add Index.set_names api (#6929) @galipremsagar
Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
Implement update() function (#6883) @skirui-source
Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
Implement cudf.DateOffset for months (#6775) @brandon-b-miller
Add Python DecimalColumn (#6715) @shwina
Add dictionary support to libcudf groupby functions (#6585) @davidwendt

Improvements 🛠️

Update stale GHA with exemptions & new labels (#7395) @mike-wendt
Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
Unpin from numpy < 1.20 (#7335) @shwina
Prepare Changelog for Automation (#7309) @galipremsagar
Prepare Changelog for Automation (#7272) @ajschmidt8
Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
Add dictionary column support to rolling_window (#7186) @davidwendt
Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
Fast path single column sort (#7167) @davidwendt
Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
Refactor cudf::string_view host and device code (#7159) @davidwendt
Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
Add Java interface for the new API 'explode' (#7151) @firestarman
Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
Update JNI for contiguous_split packed results (#7127) @jlowe
Add JNI and Java bindings for list_contains (#7125) @kuhushukla
Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
Verify window operations on decimal with Java tests (#7120) @sperlingxx
Adds in JNI support for creating a list column from existing columns (#7112) @revans2
Build libcudf with -Wall (#7105) @trxcllnt
Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
Add pyorc to dev environment (#7085) @galipremsagar
JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
Fastpath single strings column in cudf::sort (#7075) @davidwendt
Upgrade nvcomp to 1.2.1 (#7069) @rongou
Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
Add Java tests for decimal casts (#7051) @sperlingxx
Auto-label PRs based on their content (#7044) @jolorunyomi
Create sort gbenchmark for strings column (#7040) @davidwendt
Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
Spark Murmur3 hash functionality (#7024) @rwlee
Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
Adding decimal writing support to parquet (#7017) @hyperbolic2346
Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
Check output size overflow on strings gather (#6997) @davidwendt
Improve representation of MultiIndex (#6992) @galipremsagar
Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
Minor cudf::round internal refactoring (#6976) @codereport
Add Java bindings for URL conversion (#6972) @jlowe
Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
Add in basic support to JNI for logical_cast (#6954) @revans2
Remove duplicate file array_tests.cpp (#6953) @karthikeyann
Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
Use simplified rmm::exec_policy (#6939) @harrism
Add null count test for apply_boolean_mask (#6903) @harrism
Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
Remove **kwargs from string/categorical methods (#6750) @shwina
Refactor rolling.cu to reduce compile time (#6512) @mythrocks
Add static type checking via Mypy (#6381) @shwina
Update to official libcu++ on Github (#6275) @trxcllnt

cudf - v0.17.0

Published by GPUtester almost 4 years ago

v0.17.0 Release

cudf - v0.16.0

Published by GPUtester almost 4 years ago

v0.16.0 Release

cudf - v0.15.0

Published by raydouglass about 4 years ago

v0.15.0 Release

Package Rankings

Top 5.32% on Pypi.org

Top 8.17% on Proxy.golang.org

Top 4.8% on Repo1.maven.org
