cupy - v9.0.0a1

Published by kmaehashi almost 4 years ago

This is the release note of v9.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.1 Support

Support for CUDA 11.1 is added in #4184. With CUDA 11.1, GeForce RTX 30 series and Quadro RTX series GPUs can now be used in CuPy.

Notes on Wheel Packages

Update (2020-11-25): cupy-cuda111 is now available on PyPI.
CuPy for CUDA 11.1 (cupy-cuda111) wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda111 -f https://github.com/cupy/cupy/releases/tag/v9.0.0a1).

New Features

Add compressed sparse __setitem__ (#3533)
Add cupy.polyfit (#3747)
Support sparse pointwise division by vectors or matrices (#3838)
Add cudaGetDeviceProperties (#3858)
Support sparse pointwise maximum and minimum (#3860)
Add all binary morphology functions to cupyx.scipy.ndimage (#3907)
Support cublasXgetrsBatched and add cupy.cublas.batched_gesv (#3936)
Add cupy.testing.shaped_sparse_random (#3944)
Add sparse pointwise equality & inequality functions (#3945)
Add remaining grayscale morphology operations to cupyx.scipy.ndimage (#3946)
Add histogram2d and histogramdd (#3947)
Add cupy.gradient (#3963)
Add several functions to cupyx.scipy.ndimage.measurements (#3979)
Add cupyx.scipy.linalg.lu (#3995)
Add cupy.apply_along_axis (#4008)
Add cupyx.scipy.sparse.linalg.norm (#4017)
Add missing sparse matrix constructors (#4052)
Add cupy.cusolver.gels (#4064)
Add @ operator support to cupyx.scipy.sparse (#4075)
Add cupy.nancumsum and cupy.nancumprod (#4077)
Add order option in cupy.testing.shaped_random (#4091)
Add cupy.nanmedian (#4092)
Add complex dtype support in cupy.nanmin and cupy.nanmax (#4097)
Add cupy.append and cupy.resize (#4112)
Add cupyx.scipy.sparse.linalg.eigsh (#4138)
Add support for CUDA 11.1 (#4184)

Enhancements

Support list bins with histogram (#3542)
Add a cuFFT plan cache (#3730)
Support transforming NumPy arrays with multi-GPU Plan1d (#3766)
Show numpy and scipy versions in show_config (#3768)
Add cuTENSOR 1.2 support (#3884)
Update FP16 header to CUDA 11.0 Update 1 (11.0.3) (#3888)
Check format of sparse matrix in numpy_cupy_array_equal (#3897)
Improve accuracy of cupy.around (#3904)
Bump cuDNN version to v8.0.3 (#3985)
Add complex dtype support to cupyx.scipy.linalg.lu_factor/solve (#4002)
Add cython bindings to cuSPARSE csrsv2/csrsm2 related functions (#4031)
Support pickling cupy.RawKernel (#4055)
Allow non-contiguous array input to binary morphology functions (#4058)
Improve performance of binary morphology for fully nonzero structuring elements (#4059)
Bump cuDNN to v8.0.4 (#4065)
Add *svdjBatched prototypes (#4071)
Defer import in cupy/_environment.py (#4162)
Record Cython build and runtime versions (#4164)

Performance Improvements

Use cuTENSOR in cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#3765)
Use _csr_row_index for CSR matrix major-axis slicing with step (#3852)
Improve CSR matrix column fancy indexing (#3886)
Use LU-decomposition based solver in cupy.linalg.solve (#3942)
Improve cupyx.scipy.sparse int x int indexing (#3981)
Avoid using CUlinkState unless absolutely necessary (#3992)
Improve cupy.in1d (#4018)
Improve cupy.cuda.cub.device_segmented_reduce() (#4161)

Bug Fixes

Fix cooperative kernel launch (#3894)
Fix dtype in CSR matrix division (#3905)
Fix csr2csc for zero-size matrix (#3919)
Handle transfer to cupy view (#3928)
Fix _compressed_sparse_matrix._minor_slice for step > 1 case (#3948)
Fix csr_matrix._get_intXslice for step < 0 case (#3951)
Fix sparse.__getitem__ not to return view of input (#3975)
ROCm: fix rocBLAS and rocSOLVER version displays (#3988)
Add a kernel for integer GEMM (#3994)
Fix typos in cupy.cuda.cufft (#4014)
Fix managed memory leak (#4015)
Fix potential segfault when reduction axis is empty (#4024)
Use __dealloc__ instead of __del__ for cdef class (#4036)
Fix typo in _binary_erosion (#4038)
Fix CUB block reduction for F-order arrays with ndim > 2 (#4062)
Add work-around for issue in cutensorReduction of cuTENSOR 1.2.1 (#4081)
Handle np.nan and np.inf constant values properly in ndimage functions (#4083)
Fix argmax and argmin for F-order inputs (#4084)
Workaround cudaPointerGetAttributes error in CUDA 10.2+ (#4085)
Fix argmax/argmin in CUB block reduction for F-order arrays with ndim > 1 (#4096)
Fix getDeviceProperties for HIP (#4108)
Add compute capability checking for cublasGemmEx() (#4114)
Fix 64-bit int types in type_dispatcher.cuh (#4124)
Fix mode='opencv' case in cupyx.scipy.ndimage.affine_transform (#4130)
Add compute_35 for CUDA 11.0+ (#4137)
Fix device properties for cuda 9.2 (#4142)
Fix cupyx.seterr() when linalg not supplied (#4150)
Fix broadcasting behavior in ndimage.measurements functions (#4151)
Fix argwhere for 0d inputs (#4167)
Fix nonzero for 0d inputs (#4168)
Fix to use current stream properly with CUDA-related libraries (#4173)

Code Fixes

Split cupy cuda header (#3616)
Rename cupy.io submodule to cupy._io (#3712)
Rename cupy.logic submodule to cupy._logic (#3715)
Rename cupy.manipulation submodule to cupy._manipulation (#3716)
Rename cupy.math submodule to cupy._math (#3717)
Rename submodules under cupy.linalg package (#3741)
Rename cupy.statistics submodule to cupy._statistics (#3774)
Rename cupy.util submodule to cupy._util (#3779)
Rename submodules under cupyx.linalg package (#3784)
Refactor CSR sparse matrix row fancy indexing (#3865)
Rename submodule under cupy.prof package (#3869)
Rename submodule under cupy.fft package (#3870)
Hide private names in cupy/__init__.py (#3871)
Rename cupyx.rsqrt submodule (#3873)
Rename cupyx.runtime submodule (#3874)
Rename cupyx.scatter submodule (#3875)
Rename submodule under cupyx.scipy.fft (#3899)
Rename submodule under cupyx.scipy.fftpack (#3900)
Rename submodules under cupyx.scipy.sparse (#3901)
Rename submodules under cupyx.scipy.special (#3902)
Hide private names in cupyx/scipy/__init__.py (#3912)
Hide private names in cupyx.time (#3965)
Hide private names in cupy.cudnn (#3966)
Hide private names in cupy.cusolver (#3967)
Hide private names in cupy.cusparse (#3968)
Hide private names in cupy.cutensor (#3969)
Move _normalize_axis_index to cupy/core/internal.pyx (#4057)
Move matmul from core.pyx to _routine_linalg.pyx (#4060)

Documentation

Fix wrong curand enum names (#3840)
Add cupy.searchsorted to doc (#3908)
Update cupyx.scipy API documentation (#3954)
Fix docs of cupyx.scipy.linalg.lu_factor (#4011)
Improve the plan cache documentation (#4013)
Update README and docs for unified tagline (#4047)
Simplify ROCm install guide (#4048)
Fix typo (#4053)
Add note about starting nvprof with profiling off (#4144)
Fix docstrings of cupyx.scipy.ndimage.{minimum,maximum}_position (#4146)

Installation

Add CUDA_VERSION define for Cython compilation (#3877)

Tests

Code fix on tests for cupyx.scipy.ndimage stats functions (#3426)
Add different dtype input test in histogram (#3618)
Fix 32-bit boundary test to run on Windows (#3859)
Fix cupy.ndim test style (#3890)
Fix test fail when cuDNN is unavailable (#3906)
Add v8 to list of known branch in FlexCI script (#3911)
Fix side effects in some tests (#3934)
Fix some tests to check compatibility with SciPy's behavior (#3955)
Refactor sparse indexing tests (#3958)
Require SciPy 1.2 for sparse comparison (#4033)
Add generate_matrix to cupy.testing (#4070)
Make parameterized dtype test skip by pytest.skip (#4094)
ROCm gpg url changed (#4127)
Fix tests that have side effects (#4149)
Enhance dtype error message in testing helpers (#4156)
Fix polyfit tests tolerance (#4159)
Use testing.assert_warns (#4169)

HIP/ROCm

ROCm: Fix bugs and test suites to make ROCm/HIP happy - Part 1 (#3823)
ROCm: Fix bugs and test suites to make ROCm/HIP happy - Part 2 (#3835)
ROCm: Support rocTX (#3843)
ROCm: Support rocFFT/hipFFT (#3896)
ROCm: Support more hipBLAS/rocBLAS and rocSOLVER functions (#3950)
ROCm: Support hipCUB/rocPRIM (#4027)
ROCm: Support RCCL (#4099)
ROCm: Build on latest ROCm (#4110)

Others

Disable GitHub checks annotations of Codecov (#4020)
Bump version to v9.0.0a1 (#4194)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @carterbox @cjnolet @Dahlia-Chehata @garanews @grlee77 @kalvdans @leofang @mrkwjc @saswatpp

cupy - v8.0.0

Published by kmaehashi about 4 years ago

Highlights

The CuPy v8.0.0 release includes a number of new features, as well as enhanced NumPy/SciPy functionality coverage.

TensorFloat-32 (TF32) Support
- CuPy now supports TensorFloat-32, a new feature available on NVIDIA Ampere GPUs with CUDA 11. Set the CUPY_TF32=1 environment variable to boost the performance of matrix multiplications in routines such as cupy.matmul or cupy.tensordot.
Official support for NVIDIA cuTENSOR and CUB libraries
- Several routines in CuPy now support using the cuTENSOR and CUB libraries to further improve performance. Set the CUPY_ACCELERATORS=cub,cutensor environment variable to benefit from these libraries.
Enhanced kernel fusion
- Previously, when combining multiple kernels into a single one using cupy.fuse, only a single reduction operation (cupy.sum, etc.) could be used, and only at the end. With the new kernel fusion mechanism available in CuPy v8, it is now possible to combine multiple element-wise operations with interleaved reductions.
Automatic tuning of kernel launch parameters
- CuPy now supports discovering the optimal CUDA kernel launch parameters depending on the data and device properties for better performance. See the API reference (cupyx.optimizing.optimize) for details.
Memory pool sharing with external libraries
- With the new PythonFunctionAllocator API, you can let CuPy use arbitrary Python functions instead of a built-in memory pool when managing GPU memory. This improves interoperability with external libraries; for example, you can flexibly use CuPy to preprocess data or use its custom CUDA kernel features inside PyTorch. With the allocator bundled in pytorch-pfn-extras, CuPy can easily use the PyTorch memory pool.
Improved NumPy/SciPy function coverage
- Many functions added, including the NumPy Polynomials package (results of Google Summer of Code 2020, thanks @Dahlia-Chehata!), the SciPy image processing package, and extended support for the SciPy sparse matrices package.
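The two opt-in settings named in these highlights can also be applied from Python. The following is a minimal sketch, assuming CuPy reads both variables when it is first imported (so they are set before the import). `configure_cupy_env` and `matmul_demo` are hypothetical helper names used only for illustration; they are not part of CuPy.

```python
import os


def configure_cupy_env(accelerators=("cub", "cutensor"), tf32=True, environ=None):
    """Set the opt-in variables named in the highlights.

    `configure_cupy_env` is a hypothetical helper, not a CuPy API.
    """
    environ = os.environ if environ is None else environ
    # Comma-separated backend list, e.g. "cub,cutensor".
    environ["CUPY_ACCELERATORS"] = ",".join(accelerators)
    # "1" requests TF32 for matrix multiplications on supporting GPUs.
    environ["CUPY_TF32"] = "1" if tf32 else "0"
    return environ


# Assumption: CuPy reads both variables at import time, so set them first.
configure_cupy_env()

try:
    import cupy as cp
except ImportError:
    cp = None  # CuPy is not installed; matmul_demo cannot be used.


def matmul_demo(n=1024):
    """Multiply two float32 matrices; needs CuPy and a CUDA device."""
    a = cp.ones((n, n), dtype=cp.float32)
    b = cp.ones((n, n), dtype=cp.float32)
    return cp.matmul(a, b)
```

On a machine with CuPy and an Ampere GPU, calling `matmul_demo()` would exercise the TF32 path; on other hardware the settings should simply have no effect.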

For the list of all backward-incompatible changes in v8, please refer to the Upgrade Guide.

Notes on Wheel Packages

CuPy for CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X. It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8), or to install it manually and set the LD_LIBRARY_PATH environment variable.
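As an illustration of how the X.X placeholder in the command above is filled in, here is a small sketch that builds the same command for CUDA 10.2. `cudnn_install_command` is a hypothetical helper, not something CuPy ships; using `sys.executable` is a design choice so that the download runs under the same interpreter that has CuPy installed.

```python
import sys


def cudnn_install_command(cuda_version):
    """Build the argv for the cuDNN install command from the notes.

    `cudnn_install_command` is a hypothetical helper; `cuda_version`
    replaces the X.X placeholder, e.g. "10.2".
    """
    return [
        sys.executable, "-m", "cupyx.tools.install_library",
        "--library", "cudnn",
        "--cuda", cuda_version,
    ]


command = cudnn_install_command("10.2")
print(" ".join(command))
# To actually download, pass `command` to subprocess.run(command, check=True).
```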

Changes since v8.0.0rc1

See here for the complete list of merged PRs after v8.0.0rc1 release. For all changes since v7 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, beta4, beta5, rc1).

Highlights

Add a cache to reuse FFT plans that greatly improves CPU time. (thanks @leofang!)
Support for cuTENSOR 1.2 and acceleration of cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean by means of CUPY_ACCELERATORS
Sparse matrix support is greatly improved with the addition of new operators and the ability to set items.

New Features

Support sparse matrix pointwise maximum and minimum (#3943)
Support sparse matrix pointwise division by vectors or matrices (#3964)
Add cupy.testing.shaped_sparse_random (#3976)
Add compressed sparse __setitem__ (#3998)
Add sparse.linalg.norm (#4040)
Add cuTENSOR 1.2 support (#3970)
Add a cuFFT plan cache (#4010)

Enhancements

Update FP16 header to CUDA 11.0 Update 1 (11.0.3) (#3986)
Bump cuDNN version to v8.0.3 (#3996)

Performance Improvements

Use _csr_row_index for CSR matrix major-axis slicing with step (#3898)
Improve CSR matrix column fancy indexing (#3960)
Improve cupyx.scipy.sparse int x int indexing (#4003)
Avoid using CUlinkState unless absolutely necessary (#4016)
Use cuTENSOR in cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#4046)

Bug Fixes

Fix dtype in CSR matrix division (#3924)
Fix _compressed_sparse_matrix._minor_slice for step > 1 case (#3952)
Fix csr_matrix._get_intXslice for step < 0 case (#3957)
Handle transfer to cupy view (#3962)
Fix sparse.__getitem__ not to return view of input (#3993)
Fix managed memory leak (#4032)
Use __dealloc__ instead of __del__ for cdef class (#4037)

Code Fixes

Rename cupyx.scatter submodule (#3921)
Hide private names in cupyx/scipy/__init__.py (#3923)
Rename submodule under cupyx.scipy.fftpack (#3926)
Refactor CSR sparse matrix row fancy indexing (#3930)
Rename cupyx.runtime submodule (#3937)
Rename cupy.util submodule to cupy._util (#3938)
Rename cupy.statistics submodule to cupy._statistics (#3939)
Rename submodule under cupy.prof package (#3940)
Hide private names in cupyx.time (#3990)
Hide private names in cupy.cusparse (#4005)
Rename cupy.math submodule to cupy._math (#4028)
Hide private names in cupy.cudnn (#4029)
Rename cupy.logic submodule to cupy._logic (#4030)
Hide private names in cupy/__init__.py (#4039)

Documentation

Add cupy.searchsorted to doc (#3925)
Update cupyx.scipy API documentation (#3997)

Tests

Fix test fail when cuDNN is unavailable (#3910)
Fix 32-bit boundary test to run on Windows (#3913)
Add v8 to list of known branch in FlexCI script (#3914)
Fix side effects in some tests (#3953)
Fix some test to check compatibility with SciPy's behavior (#3956)
Refactor sparse indexing tests (#3977)
Fix cupy.ndim test style (#4034)
Fix bugs and test suites to make ROCm/HIP happy - Part 1 (#3929)

Others

Disable GitHub checks annotations of Codecov (#4022)
Bump version to v8.0.0 (#4049)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @cjnolet @grlee77 @kalvdans @leofang @saswatpp

cupy - v7.8.0.post1

Published by kmaehashi about 4 years ago

Fixed the following errors when building v7.8.0 source published on PyPI:

RuntimeError: Missing file: cupy/cuda/cub.cpp (when CUB is configured via the environment variable or using CUDA 11.0)
RuntimeError: Missing file: cupy/cuda/cutensor.cpp (when cuTENSOR is configured via the environment variable)

This release is only for packaging fix; there is no code difference since v7.8.0.

cupy - v8.0.0rc1

Published by kmaehashi about 4 years ago

This is the release note of v8.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are planning to release the final v8.0.0 on October 1st. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.

Highlights

This release adds support for CUDA 11, NumPy 1.19, and SciPy 1.5.
Several performance improvements have been made for cuTENSOR, sparse matrix indexing, and matrix multiplication with TF32 on CUDA 11.
Compatibility with numpy.poly is being increased thanks to our GSoC student @Dahlia-Chehata!
Added an interface (#3126) to support using external memory allocators such as the PyTorch one (https://github.com/pytorch/pytorch/pull/33860).
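A rough sketch of how the external-allocator interface can be wired up follows. It assumes `PythonFunctionAllocator` takes two callbacks with the shapes `malloc_func(size, device_id) -> pointer` and `free_func(pointer, device_id)`, and that it is reachable as `cupy.cuda.memory.PythonFunctionAllocator`. `make_tracking_allocator` is a hypothetical name; here the callbacks just forward to the CUDA runtime and record live allocations, whereas a real integration would call into the external library's memory pool.

```python
def make_tracking_allocator(cp):
    """Route CuPy allocations through Python callbacks (sketch).

    `cp` is the imported `cupy` module. `make_tracking_allocator` is a
    hypothetical helper, not part of CuPy.
    """
    runtime = cp.cuda.runtime
    live = {}  # pointer -> size in bytes

    def malloc_func(size, device_id):
        # Forward to the CUDA runtime and remember the allocation.
        ptr = runtime.malloc(size)
        live[ptr] = size
        return ptr

    def free_func(ptr, device_id):
        # Forget the allocation, then release the device memory.
        live.pop(ptr, None)
        runtime.free(ptr)

    allocator = cp.cuda.memory.PythonFunctionAllocator(malloc_func, free_func)
    return allocator, live
```

With CuPy and a GPU available, `cupy.cuda.set_allocator(allocator.malloc)` would make subsequent array allocations go through the two callbacks.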

Notes on Wheel Packages

Update on 2020-09-23: cupy-cuda110 package is now available on PyPI! CuPy for CUDA 11.0 (cupy-cuda110) wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v8.0.0rc1). Those wheels will be removed once we publish the package on PyPI.
CuPy for CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X.
It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8), or to install it manually and set the LD_LIBRARY_PATH environment variable.

Changes without compatibility

Deprecate `cupy.sparse` package (#3839, #3856)

CuPy's sparse matrix support was initially implemented in the cupy.sparse package. It was moved to the cupyx.scipy.sparse namespace in CuPy v5, while keeping the cupy.sparse one for backward compatibility.
Since NumPy has no equivalent package, cupy.sparse is now deprecated and will eventually be removed.

Deprecate `*_enabled` flags under `cupy.cuda` (#3732)

Previously, cupy.cuda.nccl_enabled and similar flags were used to detect whether NCCL, cuTENSOR, or other optional CUDA libraries were available. This pull request introduces per-module flags (cupy.cuda.nccl.available, cupy.cuda.cutensor.available) that provide the same information.
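The per-module flags can be queried along these lines. This is a sketch only: `optional_library_status` is a hypothetical helper, and it assumes each optional library is exposed as an attribute of `cupy.cuda` carrying an `available` attribute, as the two names quoted above suggest. A missing attribute is treated as "not available".

```python
def optional_library_status(cp, names=("nccl", "cutensor")):
    """Return {library name: bool} from the per-module `available` flags.

    `cp` is the imported `cupy` module. `optional_library_status` is a
    hypothetical helper, not part of CuPy.
    """
    status = {}
    for name in names:
        # e.g. cp.cuda.nccl; None if the attribute does not exist at all.
        module = getattr(cp.cuda, name, None)
        status[name] = bool(getattr(module, "available", False))
    return status
```

With CuPy installed, `optional_library_status(cupy)` would report which of the listed libraries can be used in the current environment.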

Bump version in Docker images (#3733)

The current base Docker images have been updated from Ubuntu 16.04, CUDA 9.2, and Python 3.5 to Ubuntu 18.04, CUDA 10.2, and Python 3.6.

New Features

Add cupy.ndim (#3060)
Add PythonFunctionAllocator (#3126)
Compressed Sparse Inner Indexing (#3486)
Add cupy.polyadd (#3548)
Add cupy.polymul (#3590)
Add cupy.polysub (#3593)
Add most of scipy.linalg.special_matrices (#3641)
Add scipy.signal functions that are simple wrappers of ndimage functions (#3645)
Add cupyx.scipy.ndimage.fourier_shift, fourier_gaussian, fourier_uniform (#3654)
Add 2D Sparse Slicing (#3657)
Add 2D Sparse Slicing + Row Indexing (#3658)
Add 2D Sparse Slicing + Row & Column Indexing (#3659)
Add cupy.roots for Hermitian or symmetric matrix (#3703)
Add cupy.polyval (#3725)
Support __cuda_array_interface__ in cupy.poly1d (#3729)
Implement library preloading for wheels (#3731)
Add cupy.poly1d.__pow__ (#3734)
Add scipy.signal.convolve and correlate functions (#3748)
Add trimcoef (#3793)

Enhancements

Avoid disk I/O in compiler (#3164)
Add check for method in RandomState seed (#3282)
Support negative axis in sparse min/max/argmin/argmax (#3497)
Mark nonzero parameters experimental in sparse min/max (#3583)
Add a compile method for RawKernel and RawModule (#3644)
Handle __cuda_array_interface__ in asnumpy (#3718)
Use cublasGemmEx in tensordot_core when CUDA11 (#3719)
Deprecate *_enabled flags under cupy.cuda (#3732)
Fix handle types to intptr_t (#3746)
Support TF32 (#3810)
Deprecate cupy.sparse package (#3839)
Add path and readonly options to cupyx.optimizing.optimize (#3845)
Adding a workaround for even-length inputs to scipy.signal.sepfir2d (#3750)
Add multi-axis support to cupy.flip (#3742)

Performance Improvements

Speed up cupy.vdot (#3678)
Improve cupy.cutensor (#3700)
More improvement of cupy.cutensor (#3744)
Improve 2D sparse row slicing (#3782)
Improve median_filter, rank_filter and percentile_filter (#3813)
Improve CSR matrix getrow, getcol and some slicing (#3851)

Bug Fixes

Fix float16 ndarray input in histogram with CUB (#3617)
Support order argument in cupy.ones, cupy.full and cupy.eye (#3655)
Work around a known CUB SpMV bug (#3679)
Fix broken message format (#3691)
Fix can_use_device_segmented_reduce() for incompatible axes (#3740)
Fix circular imports (#3743)
Skip FFT input checks for some CUDA >= 10.1 cases (#3763)
Fix CUDA 11 multi-GPU FFT bug (#3775)
Temporary fixes for cudnn v8 (#3790)
Fix cupy.correlate (#3801)
Copy input by default for C2R transform (#3848)
Fix cupy.sparse.* deprecation (#3856)
Fix cub not bundled in wheels (#3879)
Fix wheel not loading bundled cuDNN on Windows (#3880)
Add option to include wheel metadata (#3881)
Fix not to use cupy.cuda.* from CuPy codebase (#3883)

Code Fixes

Add cupy_backends/cuda/libs/cutensor.pxd (#3595)
Refactor _make_decorator in helper.py (#3697)
Refactor cupy.poly1d tests (#3704)
Remove unnecessary imports in cupy._sorting (#3706)
Rename cupy.binary submodule to cupy._binary (#3707)
Rename cupy.creation submodule to cupy._creation (#3708)
Rename cupy.functional submodule to cupy._functional (#3710)
Rename cupy.indexing submodule to cupy._indexing (#3711)
Remove unnecessary imports of cupy.linalg (#3714)
Rename cupy.misc submodule to cupy._misc (#3726)
Rename cupy.padding submodule to cupy._padding (#3727)
Rename submodules under cupy.random package (#3772)
Refactor logical routines from core.pyx (#3804)
Refactor binary-op routines from core.pyx (#3816)
Fix typo (#3850)
Resolve circular imports between cupy and cupyx.scipy (#3854)

Documentation

Correct format of docstrings in creation routines (#3752)
Update docs for v8 (#3802)
Fix a broken document (#3807)
Add cupy-cuda110 package to README (#3817)
Fix documents to reflect CUPY_ACCELERATORS (#3818)
Support Optuna v2 (install docs) (#3842)
Add upgrade guide for v8 (#3863)
Fix broken link in the installation guide (#3864)

Installation

Bump version in Docker images (#3733)
Update classifiers in setup.py (#3814)
Install SciPy and Optuna to Docker image (#3844)

Tests

Fix wrong test file name (#3722)
Fix test to run without NCCL (#3735)
Avoid mutation of os.environ (#3749)
Relax tolerance in TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3758)
More on using unittest.mock (#3791)
Fix test to run without cuDNN (#3846)

Others

Bump version to v8.0.0rc1 (#3882)
Make nvrtc getPTX use bytes instead of unicode (#3237)
Add hiprtc support (#3238)
Fix build and import errors for ROCm (#3786)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse, @cjnolet, @coderforlife, @Dahlia-Chehata, @jakirkham, @leofang, @niteya-shah, @pentschev

cupy - v7.8.0

Published by emcastillo about 4 years ago

This is the release note of v7.8.0. See here for the complete list of solved issues and merged PRs.

Highlights

This release adds support for CUDA 11, NumPy 1.19, and SciPy 1.5.
We expect this version to be the final release for v7.x series. Please start testing your workloads with the latest v8.x pre-release.

Notes on CUDA 11.0 support

Update on 2020-09-23: cupy-cuda110 package is now available on PyPI! cupy-cuda110 wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Update on 2020-08-21: Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v7.8.0). Those wheels will be removed once we publish the package on PyPI.
cupy-cuda110 packages are built with cuDNN support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8), or install it manually and set the LD_LIBRARY_PATH (Linux) or PATH (Windows) environment variable.
When building CuPy from source with CUDA 11.0, g++-6 or later is required. See the installation guide for the detailed instructions.

New Features

Support CUDA 11.0 (#3720)
Support cuSPARSE generic API (#3721)

Enhancements

Update CUDA 11.0 FP16 header to production release version (11.0.2) (#3799)

Performance Improvements

Improve cuDNN performance when using deterministic mode (#3798)

Bug Fixes

Fix broken message format (#3698)
Support order argument in cupy.ones, cupy.full and cupy.eye (#3699, thanks @grlee77!)
Fix sparse matrix related test failures on CUDA11 (#3761)
Allow MatDescriptor to be pickle-able (#3771)
Skip FFT input checks for some CUDA >= 10.1 cases (#3792)
Add temporary fixes for cuDNN v8 (#3794)
Fix error message broken (#3800)
Fix cuSparse build failure on Windows (#3809)

Documentation

Fix format of docstrings in creation routines (#3767)
Update requirements (#3803)
Update install doc: source devtoolset needed in CentOS (#3806)

Tests

Fix wrong test file name (#3754)
Relax tolerance in TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3762)
Skip tests failing due to exception type changes in NumPy 1.19 (#3787)
Avoid testing exception type match on NumPy 1.19 (#3797)
Skip TestDiaMatrixScipyComparison failing with scipy>=1.5.0 (#3805)

Others

Bump version to v7.8.0 (#3812)

cupy - v8.0.0b5

Published by emcastillo about 4 years ago

This is the release note of v8.0.0b5. See here for the complete list of solved issues and merged PRs.

Highlights

CUB is now bundled with CuPy so that everyone can use it out-of-the-box (thanks @leofang!). This release also introduces a mechanism to enable acceleration using different libraries: the CUPY_ACCELERATORS environment variable. You can enable CUB and cuTENSOR by running export CUPY_ACCELERATORS=cub,cutensor.

The new features include an implementation of the SciPy ndimage filters contributed by @coderforlife and the introduction of the cupy_backends library, used to decouple the CUDA ecosystem APIs from CuPy itself.
Currently, cupy_backends is considered an undocumented API and it is subject to further refactoring. In the meantime, you can still continue to use cupy.cuda.* APIs.

Changes without compatibility

Supported Platform (#3670)

As announced previously, we dropped support for CUDA 8.0 and 9.1. We are also going to drop support for NumPy 1.15 and SciPy 1.2 or earlier in the upcoming release.

CUB (#2584, #3461, #3562)

CUB is now bundled in the source tree. As a consequence, gcc-6 or later is required for the CuPy v8 build. If you are building CuPy from source on systems with legacy gcc, follow the instructions below. These steps are not necessary for general users using wheel packages.

### Ubuntu 16
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"

### CentOS 6 and 7:
$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable

CUB-related environment variables (CUB_PATH, CUB_DISABLED) are no longer effective. You need to enable CUB by setting the CUPY_ACCELERATORS=cub environment variable to boost reduction kernels and several functions such as min, max, sum, and scan.

cuTENSOR (#3592)

In response to the introduction of CUPY_ACCELERATORS, you need to explicitly specify the option CUPY_ACCELERATORS=cutensor to enable cuTENSOR.

Others

Avoid early compilation when initializing a RawModule instance (#3534)
Remove CHAINER_SEED (#3674)
Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)

New Features

Support multistage reduction and indexing in cupy.fuse (#2734, thanks @xuzijian629!)
Implementation of ndimage filters (#3184, thanks @coderforlife!)
Add cupy.convolve (#3371, thanks @Dahlia-Chehata!)
Move CUDA low-level API to cupy_backends namespace (#3386)
Add choose_conv_method (#3464, thanks @Dahlia-Chehata!)
Add cupy.poly1d (#3466, thanks @Dahlia-Chehata!)
Sparse mean (#3487, thanks @cjnolet!)
Add support for cusolverDn<t>syevj and cusolverDn<t>syevjBatched (#3488, thanks @dmargala!)
ndimage rank-based filters (#3500, thanks @coderforlife!)
ndimage common linear filters (#3505, thanks @coderforlife!)
Implement flatiter.__iter__() (#3508)
Implement has_sorted_indices, has_canonical_format, sort(ed)_indices() for sparse matrices (#3509)
Add multi-gpu support to time (#3519)
Add cupy.correlate (#3525, thanks @Dahlia-Chehata!)
Add cupyx.scipy.sparse.kron() (#3528)
Support ncclSend / ncclRecv from NCCL 2.7 (#3567)
Add cupyx.scipy.fft.next_fast_len (#3571)
ndimage generic filters (#3614, thanks @coderforlife!)
Support CSR matrix multiply (#3647)
Support CSR matrix division (#3680)

Enhancements

Build the cupy.cuda.cub module by default (#2584)
Expose cuda IPC runtime calls (#3290)
Merge CUPY_CUB_BLOCK_REDUCTION_DISABLED and CUB_DISABLED (#3461)
Support CUB histogram (#3473)
Support cuTENSOR 1.1 (#3477)
Added functionality to print nvcc and nvrtc output (#3485, thanks @mnicely!)
Support axis=None in sparse min/max (#3515)
Small fixes for CUB block reduction kernels (#3520)
Avoid early compilation when initializing a RawModule instance (#3534)
Improve _prepare_mask_indexing_single (#3539)
Support batched slogdet with complex numbers (#3551, thanks @yoshipon!)
Fix hip header files (#3566)
Remove compute_30 when CUDA 11 (#3578)
Change einsum not to use cuTENSOR when accelerator is not set (#3592)
Update CUDA 11.0 FP16 header to production release version (11.0.2) (#3668)
Drop support for CUDA 8.0 and 9.1 (#3670)
Remove CHAINER_SEED (#3674)

Performance Improvements

Use cuTENSOR in cupy.sum (#2939)
Reduce numpy.ndarray creation in cuTENSOR operation preparation (#3393)
Improve scan operation (#3540)
Improve _ArgInfo init (#3549)
Fix small performance issue (#3550)
Improve _fft_convolve (#3560)
Reduce device synchronization in poly1d instantiation (#3563, thanks @Dahlia-Chehata!)
Reuse FFT plan for convolve/correlate (#3587)
Improve efficiency of cupy.fft.fftfreq and cupy.fft.rfftfreq (#3653, thanks @grlee77!)

Bug Fixes

Fix cupyx.scipy.ndimage.sum taking zero-dimensional input (#3425)
Use CUSPARSE_VERSION instead of CUDA_VERSION (#3491)
Fix sparse min/max to return sparse matrix (#3536)
Fix boolean indexing (#3538)
Support 0-size ndarray and fix possible error in __del__ at fft (#3543)
Fix cupy.percentile type assignment in asarray (#3570)
Fix array creation for ndarray list of arrays of different dtypes (#3605)
Change sorting order of COO sparse matrix for cuSPARSE (#3620)
Add __name__ to custom kernels (#3626)
Fix sparse argmin/argmax return shape (#3639)
Fix missing imports and cupy.show_config (#3642)
Fix sparse matrix related test failures on CUDA 11 (#3649)
Fix error message broken (#3669)
Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)
Fix broken imports for cupy.cuda.* (#3685)
Fix Windows build failure of cuSparse generic API (#3690)
Fix compile option on HIP environment (#3604)

Code Fixes

Use .data() for std::vector (#3022)
Add short comments for the internals (#3475)
Use absolute import (#3496)
Make type dispatcher from cupy.cuda.cub reusable (#3546)
Clean up CUB-related stuff (#3562)
Suppress compile warnings (#3573)
Remove unused descriptor definition (#3594)

Documentation

Add sample code for image resizing (#3559, thanks @pmixer!)
Update documentation of CUPY_ACCELERATORS (#3596)
Update url and email (#3608)
Add a warning for sum_duplicates (#3624)
Remove Chainer related docs (#3673)

Installation

Add missing cupy_cub.cu in package data (#3572)
Fix rpath for wheel build (#3689)

Tests

Test against scipy.fft when available (#3032)
Add tests for _cub_reduction (#3462)
Add mock tests to ensure cupy.cuda.cub is used (#3467)
Fix to set testing.slow correctly (#3501)
Check NumPy compatibility in flatiter tests (#3514)
Fix slogdet tests to check dtypes of return values (#3577)
Fix negative value test in test_helper (#3579)
Deprecate numpy_cupy_array_list_equal (#3582)
Use numpy_cupy_array_equal instead of numpy_cupy_array_list_equal (#3599)
Checks return types in testing.numpy_cupy_* (#3621)
Add tests for sparse max with axis=None (#3638)
Parameterize sparse min/max/argmin/argmax tests (#3656)
Expose accelerator internal API to one level up (#3664)

Others

Fix to raise ValueError for invalid order (#3498)
Fix to raise ValueError for invalid clipmode (#3499)
Fix to raise TypeError for invalid subscripts in einsum (#3502)
Use builtins directly (#3651, thanks @larsoner!)
Add link to Twitter account (#3529)
Update style checker version for Python 3.7 (#3585)
Bump version to v8.0.0b5 (#3687)

cupy - v7.7.0

Published by emcastillo about 4 years ago

This is the release note of v7.7.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Support cusparse<t>csrgeam2 and cusparse<t>csrgemm2 (#3666)

Bug Fixes

Fix for cupy.cuda.thrust (#3422)
Fix sorting order of COO sparse matrix for cuSPARSE (#3623)
Fix array creation for ndarray list of arrays of different dtypes (#3663)

Code Fixes

Suppress compile warnings (#3580)

Documentation

Update url and email (#3635)
Add a warning for sum_duplicates (#3636)
Update Installation Guide (#3660)

Tests

Fix negative value test in test_helper (#3622)
Skip csc and erf tests for scipy>1.2 (#3628)

Others

Update style checker version for Python 3.7 (#3589)
Add link to Twitter account (#3634)
Bump version to v7.7.0 (#3688)
Use builtins directly (#3667, thanks @larsoner!)

cupy - v8.0.0b4

Published by asi1024 over 4 years ago

This is the release note of v8.0.0b4. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy v8.0.0b4 focuses on performance improvements by adding a general CUB-based reduction kernel contributed by @leofang (#3244). We also introduce support for the upcoming CUDA 11 (#3405), although we don’t provide wheels for it yet. Last but not least, several new routines are added to improve coverage of NumPy and SciPy functions.

Changes without compatibility

Change the behavior of dia_matrix.diagonal to follow the SciPy 1.5.0 specification. It no longer raises ValueError for invalid values; an empty array is returned instead. (#3469)
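
To make the new rule concrete, here is a minimal pure-Python model of the behavior described above. It is an illustration only: the function `diagonal` and the nested-list matrix are hypothetical stand-ins, not the CuPy or SciPy implementation, and the validity check (an offset k is in range when -rows < k < cols) is an assumption about what counts as an "invalid value".

```python
def diagonal(matrix, k=0):
    """Return the k-th diagonal of a dense nested-list matrix.

    Hypothetical helper: models only the "empty result instead of
    ValueError" rule, not the DIA storage format itself.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # Assumed validity rule: offsets outside (-rows, cols) have no elements.
    if k <= -rows or k >= cols:
        return []  # previously this case would have raised ValueError
    r0, c0 = max(0, -k), max(0, k)      # first element of the diagonal
    n = min(rows - r0, cols - c0)       # number of elements on it
    return [matrix[r0 + i][c0 + i] for i in range(n)]


m = [[1, 2, 3],
     [4, 5, 6]]
print(diagonal(m, 0))   # [1, 5]
print(diagonal(m, 2))   # [3]
print(diagonal(m, 3))   # [] (out of range, no exception)
```

Callers that relied on catching ValueError to detect a bad offset would need to check for an empty result instead.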

New Features

Add cupy.shape (#3229)
CUB-backed _SimpleReductionKernel (#3244, thanks @leofang!)
Add cupyx.scipy.ndimage sum, mean, standard deviation and variance (#3259, thanks @niteya-shah!)
Support C++ template code in cupy.RawModule (#3319, thanks @leofang!)
Add cupy.piecewise (#3329, thanks @Dahlia-Chehata!)
Add cupy.trim_zeros (#3340, thanks @Dahlia-Chehata!)
Add cupy.sort_complex (#3348, thanks @Dahlia-Chehata!)
Add cupy.who (#3361)
Support cudaDeviceGetLimit / cudaDeviceSetLimit (#3387, thanks @leofang!)
Add polycompanion (#3398, thanks @Dahlia-Chehata!)
Add wrappers for cusolverDn<t>potrfBatched and cusolverDn<t>potrsBatched (#3399, thanks @IvanYashchuk!)
Add polyvander (#3404, thanks @Dahlia-Chehata!)
Support CUDA 11.0 (#3405)
Add cupy.shares_memory (#3432)
Detect and show Thrust build version (#3444, thanks @leofang!)

Enhancements

Refactor cuTENSOR handle initialization (#2772)
Deprecate testing.numpy_cupy_raises (#3098)
Align vector access with #3020 #3022 (#3228)
Get arch per device and support CUDA 9.2+ (#3366, thanks @leofang!)
Fix cuTENSOR routines to raise ValueError for invalid arguments (#3374)
Support ignore_error in kernel optimization (#3410)
Support boolean in cupyx.scipy.ndimage stats functions (#3419)
Raise TypeError in cupy.ndarray.__array__ (#3421)
Make Optuna optional to allow import (#3427)
Implement flatiter.copy() (#3442)

Performance Improvements

Speed up CSR SpMV by orders of magnitude (#3430, thanks @leofang!)
Index CArray using 32-bit indexes (#3448)

Bug Fixes

Assert that all the pointers are in the same device in concatenate (#3285)
Fix _count_non_nan datatype for windows (#3350)
Fix cupyx.time.repeat to accumulate duration after GPU synchronization (#3375)
Fix PerfCaseResult changing _ts (#3400)
Fix intermediate dtypes for float16 inputs in cupyx.scipy.ndimage stats functions (#3402)
Properly reset current stream in case null stream is destroyed (#3423)
Fix cupy.power(0j, 0j) (#3449)
Fix TypeError in parameterize test catching CUDADriverError (#3451)
Fix scipy.dia_matrix.diagonal for scipy==1.5.0 (#3469)

Code Fixes

Fix array() for readability (#2935)
Remove unnecessary comparison in cupy.linalg.svd (#3373)
Fix initial values in cupy._environment (#3413, thanks @leofang!)
Use find_packages in setup.py (#3424)
Refactor CUB-backed _SimpleReductionKernel (#3443)

Documentation

Add documentation for cupyx.optimizing.optimize (#3397)
Fix sphinx version for travis (#3416)
Document cupy.fromfile (#3439, thanks @jakirkham!)
Fix typos in cupy.linalg.det docstring (#3456, thanks @grlee77!)
Fix docstring of tofile() (#3460, thanks @leofang!)

Installation

Add optuna and remove theano for doctest requirement (#3446)

Tests

Add tests for cupy.cuda.cub (#2598, thanks @leofang!)
Remove chainercv CI configs (#3055)
Add a test to cover accepting large-size arrays via __cuda_array_interface__ (#3297, thanks @leofang!)
Add __init__.py to allow importing test packages (#3395)
Fix ChainerCV tests failing in master branch (#3411)
Test CUB SpMV (#3428, thanks @leofang!)
Deprecate testing.empty (#3438)
Skip some RawModule tests for wrong condition (#3453)
Use unittest.mock (#3468)

Others

Bump version to v8.0.0b4 (#3481)

cupy - v7.6.0

Published by emcastillo over 4 years ago

This is the release note of v7.6.0. See here for the complete list of solved issues and merged PRs.

New Features

Support all dtypes in every sorting function in cupy.cuda.thrust (#3415, thanks @leofang!)

Enhancements

Get arch per device and support CUDA 9.2+ (#3396, thanks @leofang!)

Bug Fixes

Fix _count_non_nan datatype for windows (#3391)
Properly reset current stream in case null stream is destroyed (#3437)
Fix TypeError in parameterize test catching CUDADriverError (#3459)
Assert that all the pointers are in the same device in concatenate (#3472)

Code Fixes

Use find_packages in setup.py (#3436)

Documentation

Fix sphinx version for travis (#3417)
Document cupy.fromfile (#3447, thanks @jakirkham!)
Fix typos in cupy.linalg.det docstring (#3458, thanks @grlee77!)
Fix docstring of tofile() (#3471, thanks @leofang!)

Installation

Remove theano for doctest requirement (#3463)

Tests

Add __init__.py to allow importing test packages (#3409)

Others

Bump version to v7.6.0 (#3480)

cupy - v7.5.0

Published by kmaehashi over 4 years ago

This is the release note of v7.5.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Show versions of CUB and cuTENSOR on cupy.show_config (#3353)
Support sorting complex arrays (#3336, thanks @leofang!)

Bug Fixes

Fix byte buffer handling to support PyPy (#3227)
Fix put when using scalars (#3332)
Remove some xfails in sorting tests (#3345)
Fix linalg.svd for 0-sized matrices (#3355)
Assign a workspace to ormqr functions in _solve (#3356)
Fix windows build issue with CUDA 8.0 (#3379)

Documentation

Remove upper restrictions for numpy and scipy in doc build (#3338)
Add PFN to the README (#3352)

Others

Bump version to v7.5.0 (#3377)

cupy - v8.0.0b3

Published by kmaehashi over 4 years ago

This is the release note of v8.0.0b3. See here for the complete list of solved issues and merged PRs.

As announced in the previous release, we are dropping support for CUDA 8.0 / 9.1 in v8 releases (#3301). Based on the feedback from users, we will continue to provide cuDNN support (#3303).

Highlights

CuPy v8.0.0b3 introduces a mechanism for optimizing internal parameters when launching reduction kernels using Optuna. Depending on your GPU and the kernels you execute, you can use this feature to improve the performance of your code by letting Optuna automatically find the best parameters for your GPU.
To take advantage of this, call functions that perform reductions inside the following context manager:

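import cupy
import cupyx.optimizing

x = cupy.arange(10 ** 6, dtype=cupy.float32)  # placeholder input array
with cupyx.optimizing.optimize(key=None):
    y = cupy.sum(x)  # any CuPy reduction function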

CuPy is also taking part in GSoC 2020 and we keep adding new functions to improve our compatibility with NumPy.

New Features

Optimize kernel launch parameters using Optuna (#2731)
Support cuSPARSE generic API (#3242)
Implement flatiter.base property (#3250)
Implement flatiter.__len__() special method (#3251)
Implement flatiter.__next__() special method (#3252)
Implement putmask function (#3261, thanks @rushabh-v!)
Show versions of CUB and cuTENSOR on cupy.show_config (#3271)
Enable getting R2C/C2R FFT plans from get_fft_plan() (#3293, thanks @leofang!)
Support surface memory in RawKernel (#3294, thanks @leofang!)
Add cupy.bartlett (#3307, thanks @niteya-shah!)
Add mean for sparse matrices (#3333)
Support max_duration argument in cupyx.time.repeat (#3357)
Support OptimizeContext serialization (#3367)

Enhancements

Support primitive complex scalar in RawKernel (#2606)
Fix the internal streams in multi-GPU Plan1d (#3260, thanks @leofang!)
Support additional dtypes and axis sequences in cupy.median (#3280, thanks @grlee77!)
Support multiple architectures in CUPY_NVCC_GENERATE_CODE (#3330, thanks @leofang!)
Fix too small max_total_time_per_trial (#3365)

Performance Improvements

Rewrite cupyx.scipy.ndimage.interpolation using ElementwiseKernel (#3166, thanks @grlee77!)
Improve ElementwiseKernel cpu time (#3298)
Performance improvements to blackman, hanning and hamming methods (#3312, thanks @niteya-shah!)
Use local cache in cupy.RawKernel (#3341, thanks @leofang!)
Reduce memory usage of cupy.linalg.svd (#3347)

Bug Fixes

Fix SciPy version check in cupyx.scipy.fft (#3311, thanks @grlee77!)
Ensure runtime context on a per-device basis (#3321, thanks @leofang!)
Fix put when using scalars (#3328)
Assign a work space to ormqr functions in _solve (#3331)
Fix linalg.svd for 0-sized matrices (#3354)
Fix wrong parameter names in kernel launch optimizers (#3364)
Fix cupy.around behaving differently from NumPy for EVEN_NUMBER+0.5 (#3335)

Code Fixes

Add alias of shape type (#3310)
Use shape_t instead of tuple (#3315)

Documentation

Add PFN to the README (#3276)
Remove upper restrictions for numpy and scipy in doc build (#3337)

Tests

Add tests for optimizer for kernel launch parameters (#3363)

Others

Bump version to v8.0.0b3 (#3376)

cupy - v8.0.0b2

Published by emcastillo over 4 years ago

This is the release note of v8.0.0b2. See here for the complete list of solved issues and merged PRs.

We are planning to drop support for CUDA 8.0 / 9.1 (#3301) and cuDNN (#3303) in future v8 releases. If you have any concerns, please feel free to leave a comment in these issues.

New Features

Add notification support for fallback_mode (#2279, thanks @Piyush-555!)
Support multi-GPU cupy.cuda.cufft.Plan1d (#2644, thanks @leofang!)
Add cupy.median (#3134, thanks @Harshan01!)
Add cupy.flatiter (#3165)
Add cupy.gcd and cupy.lcm (#3190, thanks @niteya-shah!)
Support cusolverDn<t>gesvdj and cusolverDn<t>gesvdaStridedBatched (#3192)
Add cupyx.scipy.ndimage.label (#3210)
Add cupyx.scipy.ndimage.grey_erosion and cupyx.scipy.ndimage.grey_dilation (#3216)
Add cupy.diag_indices and cupy.diag_indices_from (#3217, thanks @rushabh-v!)
Support cusparse<t>csrgeam2 and cusparse<t>csrgemm2 (#3220)
Add minimum_filter, maximum_filter, grey_closing, grey_opening to scipy.ndimage (#3239)
Support cusolverDn<t>gesvdjBatched (#3247)
Add cupy.kaiser (#3268, thanks @niteya-shah!)
Support all dtypes in every sorting function in cupy.cuda.thrust (#3286, thanks @leofang!)

Enhancements

Add R2C/C2R support to cupy.cuda.cufft.PlanNd (#3102, thanks @leofang!)
Make RawKernel and RawModule aware of CUDA context (alt) (#3201, thanks @leofang!)
Make diff return AxisError for an invalid axis (#3231, thanks @grlee77!)
Improve the efficiency of cupy.pad for some simple cases (#3281, thanks @grlee77!)
HIP
- Support einsum with complex in HIP (#3203)
- Add complex support to HIP Blas (#3206)

Performance Improvements

Reduce list and tuple creation in _kernel and reduction (#2702)
Remove unnecessary Arg instantiation in cuda/function.pyx (#3253)
Improve norm (#3278)

Bug Fixes

Fix: n-dimensional FFTs must preserve array contiguity when copying a view (#3034, thanks @grlee77!)
Use larger type to represent index range in cupy.take (#3118)
Fix byte buffer handling to support PyPy (#3225)
Fix _reduce_dims call in reduction (#3262)
Raise IndexError for R2C/C2R FFT with axes=() (#3264, thanks @leofang!)
Code fix + bug fix for cupy.cuda.thrust (#3291, thanks @leofang!)

Code Fixes

Remove cupy/cuda/_environment.py (#3145, thanks @leofang!)
Fix cupy.fill_diagonal to implement with cupy.flatiter (#3207)
Remove unreachable code (#3235)
Refactor __array_function__ (#3236)
Simplify TestEigenvalue (#3288)

Documentation

Small typo/spelling fixes (#3243, thanks @svlandeg!)
Use Sphinx 2.x on Read the Docs (#3272)

Tests

Fix overflow in matmul test (#2403)
Add cuTENSOR test (#3037)
Rewrite some tests not to use numpy_cupy_raises (#3155)
Rewrite tests not to use numpy_cupy_raises (#3256)

cupy - v7.4.0

Published by asi1024 over 4 years ago

This is the release note of v7.4.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Add CUDA 10.2 support (#3218, thanks @jakirkham!)

Bug Fixes

Fix: n-dimensional FFTs must preserve array contiguity when copying a view (#3249, thanks @grlee77!)
Use larger type to represent index range in cupy.take (#3265)

Documentation

Update installation guide for conda-forge (#3198, thanks @leofang!)
Small typo/spelling fixes (#3248, thanks @svlandeg!)

Tests

Fix overflow in matmul test (#3245)

Others

Bump version to v7.4.0 (#3300)

cupy - v8.0.0b1

Published by kmaehashi over 4 years ago

This is the release note of v8.0.0b1. See here for the complete list of solved issues and merged PRs.

Known packaging issues:

CuPy build fails when using CUDA 8.0 on Windows (#3076). Due to this issue, cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.

Highlights

CuPy gets faster and more stable towards its v8.0.0 release. This version adds a handful of new routines, brings library-wide performance improvements, and corrects several bugs.

Changes without compatibility

Removed cupy.scatter_add, which had been deprecated since CuPy v4. Use cupyx.scatter_add instead.
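
For readers migrating old code, the semantics of scatter-add can be modeled in plain Python. This is a sketch under stated assumptions, not CuPy code: `scatter_add` and `buffered_add` below are hypothetical list-based helpers. The first models the accumulate-on-duplicates behavior that cupyx.scatter_add shares with numpy.add.at; the second models NumPy-style `a[indices] += values`, where a repeated index is only updated once.

```python
def scatter_add(a, indices, values):
    """Add values[i] to a[indices[i]] in place; repeated indices accumulate."""
    for i, v in zip(indices, values):
        a[i] += v
    return a


def buffered_add(a, indices, values):
    """Model of `a[indices] += values`: sums are computed from the old
    contents first, then written back, so the last write per index wins."""
    updates = {i: a[i] + v for i, v in zip(indices, values)}
    for i, total in updates.items():
        a[i] = total
    return a


print(scatter_add([0, 0, 0], [0, 0, 2], [1, 1, 1]))   # [2, 0, 1]
print(buffered_add([0, 0, 0], [0, 0, 2], [1, 1, 1]))  # [1, 0, 1]
```

The difference only shows up when an index is repeated, which is exactly the case scatter-add exists for.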

New Features

Add get_global() to cupy.RawModule (#2510, thanks @leofang!)
Support multi-GPU in cupy.cuda.cufft.Plan1d (#2644, thanks @leofang!)
Add hstack, vstack, and bmat to cupyx.scipy.sparse (#2665, thanks @cjnolet!)
Add cupy.require (#3083, thanks @niteya-shah!)
Add cupy.compress (#3103, thanks @Harshan01!)
Add cupy.ravel_multi_index (#3104, thanks @grlee77!)
Add cupy.extract (#3109, thanks @Harshan01!)
Add cupy.bitwise_not as alias to invert (#3120, thanks @Harshan01!)
Add cupy.argwhere (#3135, thanks @rushabh-v!)
Add cupy.select (#3138, thanks @niteya-shah!)
Add cupy.cuda.ExternalStream (#3141)
Add cupy.array_equal (#3189, thanks @rushabh-v!)

Enhancements

Add ndarray variants AND inplace support in fallback_mode (#2391, thanks @Piyush-555!)
Support array-like start/stop and add axis argument to linspace (#2461, thanks @grlee77!)
Add fp16 support of CUB (#2600, thanks @y1r!)
Raise errors instead of assertion on array type checks (#2795)
Drop support for NumPy 1.15 or earlier (#2938)
Import using_allocator in cupy.cuda (#2951, thanks @jakirkham!)
Remove __future__ imports (#2995)
Support CUB prod (#3067, thanks @leofang!)
Remove deprecated cupy.scatter_add (#3074)
Update cupy.pad to use cupy.linspace instead of numpy.linspace internally (#3101, thanks @grlee77!)
Histogram update: support range, weights and density (#3124, thanks @grlee77!)
Add support for ord = 2, -2, and 'nuc' in cupy.linalg.norm (#3130, thanks @rushabh-v!)
Use ElementwiseKernel in cupy.fill_diagonal (#3139)
Allow dia_matrix creation from SciPy equivalent (#3160, thanks @jakirkham!)
Add labels in the benchmark and add kwargs to repeat (#3172, thanks @rushabh-v!)
Add out parameter to cupy.concatenate and cupy.stack (#2983)
Fix reshape to raise ValueError for order 'K' (#3123)

Performance Improvements

Improve cuDNN performance when using deterministic mode (#1380)
Improve performance of cumsum and cumprod (#2907)
Improve ndimage convolve and correlate (#3179)
Add check of c_contiguous when indexing CArray (#3191)

Bug Fixes

Fix an issue using non-existing attribute in cub.pyx (#2985)
Use size_t nbytes in __cuda_array_interface__ (#3009, thanks @jakirkham!)
Fix fill_diagonal (#3011)
Fix cupy.random.multivariate_normal (#3018, thanks @espg!)
Use Python scalar as random seed (#3054)
Properly decrement total bytes in memory pool (#3068)
Fix condition to use slice copy in ndarray.__setitem__ (#3088)
Fix compiler bug when building cupy.cuda.cub with CUDA < 9.2 (#3089, thanks @leofang!)
Fix cub_reduction for CUPY_CUB_MIN and float16 arrays (#3100)
Use time.process_time instead of time.clock (#3128, thanks @rushabh-v!)
Add support for 0 sized matrices in svd (#3140, thanks @rushabh-v!)
Fix CUB-based cupy.prod for half precision (#3148, thanks @leofang!)
Fix error type and message in coo_matrix (#3150)
Allow MatDescriptor to be pickle-able (#3157, thanks @jakirkham!)
Fix erfinv & erfcinv in cupyx.scipy.special (#3159, thanks @leofang!)
Remove some xfails in sorting tests (#3167)
Fix Event.__del__ behavior on shutdown (#3176)
Add the missing initialization value in the reduction test (#3194, thanks @leofang!)

Code Fixes

Clean up internal.pyx (get_contiguous_strides) (#1950)
Remove custom tempdir context manager (#3003)
Use intptr_t instead of size_t for cuSPARSE and cuBLAS handles (#3081, thanks @Harshan01!)
Use intptr_t for cuDNN handles (#3082, thanks @Harshan01!)
Minor fix to using_allocator (#3094)
Remove IndexOrValueError (#3096)
Remove unused argument in fill_diagonal (#3171)
Silence sign comparison warning (cont'd) (#3181, thanks @leofang!)
Avoid enum comparison (-Wenum-compare) (#3182, thanks @leofang!)
Remove unused, deprecated fields from cudaPointerAttributes (#3183, thanks @leofang!)

Documentation

Update installation guide for conda-forge (#3052, thanks @leofang!)
Include UnownedMemory in the API docs (#3086, thanks @jakirkham!)
Fix gencode example in the doc (#3147, thanks @leofang!)
Document convolve and correlate (#3161, thanks @jakirkham!)

Examples

Add mpi4py examples (#3049, thanks @leofang!)

Tests

Added a compute capability check for testing grid sync (#3051)
Fix tolerance of fft tests (#3056)
Skip irfft tests for compute capability != 7 (#3084)
Rewrite cupyx.* tests not to use numpy_cupy_raises (#3099)
Rewrite manipulation tests not to use numpy_cupy_raises (#3122)
Remove python 2.7 builds (#3162)

Others

Add copyright notice for Random Kit (#3107)
Bump version to v8.0.0b1 (#3204)

cupy - v7.3.0

Published by asi1024 over 4 years ago

This is the release note of v7.3.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Import using_allocator in cupy.cuda (#3087, thanks @jakirkham!)

Bug Fixes

Properly decrement total bytes in memory pool (#3093)
Use Python scalar as random seed (#3106)
Use time.process_time instead of time.clock (#3132, thanks @rushabh-v!)
Fix condition to use slice copy in ndarray.__setitem__ (#3143)
Readd ndarray type to y (#3152, thanks @jakirkham!)
Fix fill_diagonal (#3156)
Fix Event.__del__ behavior on shutdown (#3180)
Add the missing initialization value in the reduction test (#3195, thanks @leofang!)

Code Fixes

Remove unused argument in fill_diagonal (#3177)

Documentation

Include UnownedMemory in the API docs (#3090, thanks @jakirkham!)
Document convolve and correlate (#3168, thanks @jakirkham!)

Examples

Fix GMM example for matplotlib 3 (#3046)

Tests

Some fixes to .pfnci/script.sh (#3047)
Fix atol of fft tests (#3105)

Others

Add copyright notice for Random Kit (#3121)
Bump version to v7.3.0 (#3205)

cupy - v7.2.0

Published by toslunar over 4 years ago

This is the release note of v7.2.0. See here for the complete list of solved issues and merged PRs.

Known packaging issues:

~~Wheel packages for CUDA 10.2 (cupy-cuda102) are currently unavailable on PyPI. Packages will be published after getting approval of the file size limit increase.~~ (resolved on 2020-02-21)

Highlights

This release adds support for CUDA 10.2 and NumPy 1.18.

Enhancements

Fix linspace(..., num=1, endpoint=False, retstep=True) (#2990)
Add nogil to CUB (#3000, thanks @y1r!)

Bug Fixes

Fix ParameterInfo as a cache key (#2961)
Fix import of _get_axis (#2972, thanks @jakirkham!)
Fix an issue using non-existing attribute in cub.pyx (#3001)
Use size_t nbytes in __cuda_array_interface__ (#3015, thanks @jakirkham!)
Fix empty vector access (#3021)
Fix get_fft_plan() and some FFT tests (#3031, thanks @leofang!)
Fix imag for 0-size array (#3039)
Fix nvcc command lookup (#3040)

Code Fixes

Remove code paths for unsupported Python versions (#3045)

Documentation

Fix typo in note (#3019, thanks @Schoyen!)
Add NumPy 1.18 to installation guide (#3042)
Add cupy-cuda102 (#3073)

Installation

Fix an issue that cuComplex_bridge.h is not installed (#3043)
Do not let Python 2 users build CuPy v7+ (#3044, thanks @leofang!)

Tests

Require scipy in test_gmm (#3050)
Print installed packages in pytest (#3065)
Set CUPY_CI environment variable in Travis CI and AppVeyor (#3066)

cupy - v8.0.0a1

Published by emcastillo over 4 years ago

This is the release note of v8.0.0a1. See here for the complete list of solved issues and merged PRs.

Known packaging issues:

CuPy build fails when using CUDA 8.0 on Windows (#3076). Due to this issue, cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.
~~Wheel packages for CUDA 10.2 (cupy-cuda102) are currently unavailable on PyPI. Packages will be published after getting approval of the file size limit increase.~~ (resolved on 2020-02-21)

Highlights

This release adds support for CUDA 10.2 and NumPy 1.18.
CuPy 8.0.0a1 comes with several new features, such as better sparse matrix support. Users who write their own CUDA kernels can now use grid synchronization in RawKernel and RawModule and tune the block size of ElementwiseKernel. There are also noticeable performance improvements thanks to the extended support of CUB in several CuPy functions.

Changes without compatibility

Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776)
- Fixed to follow SciPy, so that empty slices are returned for such cases.
Separate code and path arguments in RawModule (#2784)
Avoid device synchronization in cupy.allclose (#2799)
- Changed cupy.isclose to return a 0-dim cupy.ndarray instead of a float value to avoid device synchronization.
Remove dtype argument from min/max (#2875)
Rename arg of isscalar (#2974)
- Renamed the argument of cupy.isscalar to element, previously named as num.

New Features

Added min, max, argmin, argmax to sparse csr and csc matrices (#2711, thanks @dloney!)
Add helpers to measure execution times (#2740)
Add digitize (#2758)
Support loading PTX in cupy.RawModule (#2782, thanks @leofang!)
Fix cupyx.scipy.ndimage.map_coordinates for cases with coords > 2d (#2813, thanks @grlee77!)
Detect synchronization (#2819)
Add ptp ndarray method and function (#2859, thanks @grlee77!)
Add convex analysis ufuncs to cupyx.scipy.special (#2861, thanks @grlee77!)
Allow ElementwiseKernel to set the block_size (#2914)
Support grid synchronization in RawKernel and RawModule (#2925)
Add cupy.conjugate and make cupy.conj its alias (#2982)
Add a keyword-only plan argument to cupyx.scipy.fft.* (#2998, thanks @leofang!)

Enhancements

Support sorting complex arrays (#2745, thanks @leofang!)
Fix slow import of cupy (#2759, thanks @cgohlke!)
Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776, thanks @grlee77!)
Add nogil to CUB (#2787, thanks @y1r!)
Avoid device synchronization in cupy.allclose (#2799)
Skip zero valued coefficients in cupyx.scipy.ndimage.convolve (#2846, thanks @grlee77!)
Add CUB reduction support to mean (#2860, thanks @grlee77!)
Sort type map in _kernel.pyx (#2881)
Make test helper decorators pdb-friendly (#2888)
Declare device synchronization at runtime.free() (#2898)
Ignore error when peer access is already enabled (#2901, thanks @leofang!)
Add CUDA 10.2 support (#2910, thanks @ksangeek!)
Show warning for cuFFT bug in irfftn (#2922)
Use cuTensor for einsum (#2928)
Improve error message for wrong number of arguments in elementwise kernels (#2932)
Use asynchronous copy in cupy.copyto (#2942)
MemoryPointer.__repr__ (#2981)
Allow multiple axes in expand_dims (#2992)
Check size before accessing empty vectors data ptr (#3025)
Improve compatibility of random.randint (#2828)
Support 64 bit extent randint (#2829)
Disallow boolean subtraction (#2874)
Remove dtype argument from min/max (#2875)
Fix handling of dtypes in cupy.mean (#2903, thanks @grlee77!)
Disallow boolean negative (#2973)
Rename arg of isscalar (#2974)
Fix linspace(..., num=1, endpoint=False, retstep=True) (#2975)

Performance Improvements

Avoid numpy.can_cast call to improve guess routine (#2673)
Improve caching in ElementwiseKernel (#2688)
Remove memory copy to improve memory range checking (#2699)
Avoid can_cast calling to reduce overhead (#2704)
Use getrfBatched in linalg.slogdet (#2735)
Reduce overhead in calls to multi-dimensional FFTs (#2746, thanks @grlee77!)
Allow squashing f-contiguous axes for faster reduction (#2822)
Support CUB prefix sum & product (#2919, thanks @leofang!)
Improve performance of element-wise einsum where no contraction is necessary (#2960)

Bug Fixes

Fix true_divide with dtype argument (#2076)
keepdims should always preserve all dimensions in CUB-based reductions (#2725, thanks @grlee77!)
Update thrust::complex headers with a bug fix (#2741, thanks @leofang!)
Separate code and path arguments in RawModule (#2784)
Avoid looking up null pointers' attributes (#2802, thanks @leofang!)
Fix range used in cupyx.scipy.ndimage filter origin check (#2805, thanks @grlee77!)
Detect interpreter shutdown for proper __del__ behavior (#2809)
Fix split and array_split with indices overrun (#2814)
Fix split and array_split with unordered indices supplied (#2815)
Fix compilation error caused when thrust is enabled (#2838)
Fix testing.shaped_random for shape () (#2870)
Fix argmin/argmax dtype argument (#2872)
Fix imag for 0-size array (#2886)
Fix logic to check explicit size argument in ElementwiseKernel (#2909)
Sets the default value for thread_local.linalg if not defined (#2915)
Fix cupy.cuda.cub.device_segmented_reduce() not being used (#2921, thanks @leofang!)
Fix complex type checks in _correlate_or_convolve (#2923)
Fix ParameterInfo as a cache key (#2941)
Avoid invalid in-place division in CUB-based mean (#2943, thanks @grlee77!)
Fix empty vector access (#3020)
Fix nvcc command lookup (#3028)

Code Fixes

Use intptr_t for cuSOLVER handles (#2718)
Merge reduction implementations (#2732)
Rename and reorder private functions in reduction.pxi (#2767)
Avoid using PyThread API (#2769)
Remove unused cuParamSetTexRef() (#2770, thanks @leofang!)
Separate reduction code from _kernel.pyx (#2785)
Refactor reduction code (#2801)
Refactor ops (#2817)
Separate CArray and family from core.pyx (#2831)
Add missing blank lines (#2887)
Readability fix in memory.pyx (#2899)
Clean up _scalar.pyx (#2917)
Enhance type and argument manipulation in elementwise and reduction kernels (#2940)
Remove intermediate aliases of cupy.sort (#2944, thanks @rushabh-v!)
Silence sign comparison warnings (#2949, thanks @leofang!)
Fix typos in comments (#2978)
Remove dependency to six (#2980)
A nit-picking code fix (#2988)
Rename _op variable in cub.pyx (#3002)
Remove code paths for unsupported Python versions (#3004)

Documentation

Fix docs of options argument in RawKernel and RawModule (#2643)
Document device synchronization (#2798)
Fix typo in scipy.fft docs (#2804, thanks @grlee77!)
Fix the docstring format of cupy.asarray (#2821, thanks @leofang!)
Update cuTENSOR version in docs (#2948)
Document get_allocator function (#2953, thanks @jakirkham!)
Add NumPy 1.18 to installation guide (#3005)
Fix typo in note (#3012, thanks @Schoyen!)
Add cupy-cuda102 (#3057)

Installation

Do not let Python 2 users build CuPy v7+ (#2766, thanks @leofang!)
Fix an issue that cuComplex_bridge.h is not installed (#2984)
Fix ROCm build errors (#3071)

Examples

Fix GMM example for matplotlib 3 (#2996)
Use cupy.random in kmeans example (#3026)

Tests

Test cuTENSOR v1.0.0 (#2727)
Use more stable input to test linalg.matrix_power (#2788)
Remove Python 3.4 matrix from Travis CI (#2794)
Drop ChainerCV's test in master branch. (#2803)
Refactor array testing decorators (#2818)
Fix decorator usage in tests (#2820)
Add f-contiguous reduction tests (#2830)
Test ifloordiv with numpy 1.18 (#2852)
Fix test_helper.py for NumPy 1.18 (#2883)
Avoid 0s in the diagonal of TestSolveTriangular inputs (#2927)
Add tests for size argument with no input (#2931)
Print installed packages in pytest (#2979)
Make testing.parameterize pdb-friendly (#3024)
Require scipy in test_gmm (#3048)

Others

Allow install without thrust (#2730)
Add Mergify configuration file (#2894)
Make cupyx.time.repeat experimental (#2897)
Make cupyx.allow_synchronize experimental (#2947)
Some fixes to .pfnci/script.sh (#3041)
Set CUPY_CI environment variable in Travis CI and AppVeyor (#3058)
Bump version to v8.0.0a1 (#3069)

cupy - v7.1.1

Published by kmaehashi over 4 years ago

This is a hot-fix release for v7.1.0 to address an issue in CUB support. Only users manually building CuPy from source with CUB support enabled are affected; wheel package users (cupy-cudaXXX) are not affected by this issue as CUB support is not enabled in wheels.

This is the release note of v7.1.1. See here for the complete list of solved issues and merged PRs.

Bug Fixes

Fix import of _get_axis in cupy.cub (#2986, thanks @jakirkham!)

cupy - v7.1.0

Published by emcastillo almost 5 years ago

This is the release note of v7.1.0. See here for the complete list of solved issues and merged PRs.

Changes without compatibility

code_or_path argument of cupy.RawModule has been replaced with two keyword arguments (code and path) to avoid ambiguity. (#2786)
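
The design choice behind this change can be illustrated with a small, library-independent sketch. The function `load_module` below is hypothetical and is not cupy.RawModule; it only shows the general pattern of replacing one ambiguous positional argument with two keyword-only arguments, exactly one of which must be supplied. The TypeError used here is an assumption about how such a check might be reported.

```python
def load_module(*, code=None, path=None):
    """Hypothetical loader: accept source text OR a file path, never both.

    With a single `code_or_path` argument, a string such as "kernel.cu"
    could be read either way; separate keywords remove the guesswork.
    """
    if (code is None) == (path is None):
        raise TypeError("exactly one of 'code' or 'path' must be given")
    if code is not None:
        return ("source", code)
    return ("file", path)


print(load_module(code="extern \"C\" __global__ void f() {}")[0])  # source
print(load_module(path="kernel.cubin")[0])                          # file
```

Because both parameters are keyword-only, a call site always states which interpretation is intended.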

Enhancements

Fix slow import of cupy (#2773, thanks @cgohlke!)
Support 64 bit extent randint (#2854)
Improve compatibility of random.randint (#2862)
Show warning for cuFFT bug in irfftn (#2962)

Bug Fixes

Separate code and path arguments in RawModule (#2786)
Fix range used in cupyx.scipy.ndimage filter origin check (#2810, thanks @grlee77!)
Detect interpreter shutdown for proper __del__ behavior (#2811)
Update thrust::complex headers with a bug fix (#2833, thanks @leofang!)
Fix true_divide with dtype argument (#2834)
Fix compilation error caused when thrust is enabled (#2839)
keepdims should always preserve all dimensions in CUB-based reductions (#2848, thanks @grlee77!)
Fix split and array_split with indices overrun (#2851)
Fix split and array_split with unordered indices supplied (#2857)
Avoid looking up null pointers' attributes (#2866, thanks @leofang!)
Fix testing.shaped_random for shape () (#2889)
Fix argmin/argmax dtype argument (#2890)
Fix complex type checks in _correlate_or_convolve (#2924)
Fix cupy.cuda.cub.device_segmented_reduce() not being used (#2936, thanks @leofang!)
Sets the default value for thread_local.linalg if not defined (#2937)

Documentation

Fix docs of options argument in RawKernel and RawModule (#2774)
Fix typo in scipy.fft docs (#2807, thanks @grlee77!)
Fix the docstring format of cupy.asarray (#2825, thanks @leofang!)
Document get_allocator function (#2954, thanks @jakirkham!)
Update cuTENSOR version in docs (#2955)
Document device synchronization (#2957)

Tests

Test cuTENSOR v1.0.0 (#2775)
Use more stable input to test linalg.matrix_power (#2793)
Fix decorator usage in tests (#2847)
Remove Python 3.4 matrix from Travis CI (#2867)
Refactor array testing decorators (#2877)
Test ifloordiv with NumPy 1.18 (#2880)
Fix test_helper.py for NumPy 1.18 (#2913)
Avoid 0s in the diagonal of TestSolveTriangular inputs (#2929)

Others

Add Mergify configuration file (#2933)

cupy - v6.7.0

Published by niboshi almost 5 years ago

This is the release note of v6.7.0. See here for the complete list of solved issues and merged PRs.

As announced previously, this is the final release of the v6 series, which is the last series supporting Python 2.

Enhancements

Fix slow import of cupy (#2765, thanks @cgohlke!)
Support 64 bit extent randint (#2855)
Fix @testing.numpy_cupy_ decorators for skips (#2892)
Show warning for cuFFT bug in irfftn (#2959)

Bug Fixes

Fix axes handling in _fftn (#2752)
Detect interpreter shutdown for proper __del__ behavior (#2812)
Fix true_divide with dtype argument (#2835)
Fix split and array_split with indices overrun (#2837)
Fix split and array_split with unordered indices supplied (#2856)
Avoid looking up null pointers' attributes (#2868, thanks @leofang!)
Fix testing.shaped_random for shape () (#2895)
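
For the true_divide entry above, the following sketch shows what a `dtype` argument is expected to do, again using NumPy as the reference. It assumes the fix makes `cupy.true_divide` honour `dtype` the same way NumPy does; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([2, 2, 2], dtype=np.int32)

# Without dtype, integer inputs produce a float64 result.
default_result = np.true_divide(a, b)

# With dtype, the computation and the result use the requested type.
single_result = np.true_divide(a, b, dtype=np.float32)

print(default_result.dtype, single_result.dtype)
```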

Documentation

Fix the docstring format of cupy.asarray (#2826, thanks @leofang!)

Tests

Use more stable input to test linalg.matrix_power (#2791)
Fix decorator usage in tests (#2850)
Test ifloordiv with NumPy 1.18 (#2879)
Fix test_helper.py for NumPy 1.18 (#2912)
Avoid 0s in the diagonal of TestSolveTriangular inputs (#2930)

Others

Add Mergify configuration file (#2934)

Package Rankings

Top 0.96% on Pypi.org

Top 5.87% on Conda-forge.org

Top 8.17% on Proxy.golang.org

Top 19.57% on Anaconda.org

Related Projects

cudf

cuDF - GPU DataFrame Library

07 May 2017 7,236

chainer

A flexible framework of neural networks for deep learning

05 Jun 2015 5,883

sit4onnx

Tools for simple inference testing using TensorRT, CUDA and OpenVINO CPU/GPU and CPU providers. S...

12 May 2022 18

tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings

24 Mar 2018 1,782

CuVec

Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory

16 Jan 2021 80

tensorly

TensorLy: Tensor Learning in Python.

21 Oct 2016 1,504

pycuda

CUDA integration for Python, plus shiny features

06 Apr 2011 1,827

cupoch

Robotics with GPU computing

22 Oct 2019 898

Neuromorphic-Computing-Guide

Learn about the Neuromorphic engineering process of creating large-scale integration (VLSI) systems...

03 Oct 2021 191

Deep-Learning-in-Production

In this repository, I will share some useful notes and references about deploying deep learning-b...

03 May 2018 4,294

spconv

Spatial Sparse Convolution Library

19 Jan 2019 1,847

nice-slam

[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

28 Mar 2022 1,418

libpython-clj

Python bindings for Clojure

16 May 2019 1,078

klongpy

High-Performance Klong array language in Python.

06 Jul 2022 117

CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer...

23 Aug 2022 2,338
