Published by kmaehashi almost 4 years ago
This is the release note of v9.0.0a1. See here for the complete list of solved issues and merged PRs.
Support for CUDA 11.1 was added in #4184. With CUDA 11.1, GeForce RTX 30 series and Quadro RTX series GPUs can now be used in CuPy.
Update (2020-11-25): cupy-cuda111 is now available on PyPI.
CuPy for CUDA 11.1 (cupy-cuda111) wheel packages are currently only available for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda111 -f https://github.com/cupy/cupy/releases/tag/v9.0.0rc1).
__setitem__ (#3533)
cupy.polyfit (#3747)
cudaGetDeviceProperties (#3858)
cupyx.scipy.ndimage (#3907)
cublasXgetrsBatched and add cupy.cublas.batched_gesv (#3936)
cupy.testing.shaped_sparse_random (#3944)
cupyx.scipy.ndimage (#3946)
histogram2d and histogramdd (#3947)
cupy.gradient (#3963)
cupyx.scipy.ndimage.measurements (#3979)
cupyx.scipy.linalg.lu (#3995)
cupy.apply_along_axis (#4008)
cupyx.scipy.sparse.linalg.norm (#4017)
cupy.cusolver.gels (#4064)
@ operator support to cupyx.scipy.sparse (#4075)
cupy.nancumsum and cupy.nancumprod (#4077)
order option in cupy.testing.shaped_random (#4091)
cupy.nanmedian (#4092)
cupy.nanmin and cupy.nanmax (#4097)
cupy.append and cupy.resize (#4112)
cupyx.scipy.sparse.linalg.eigsh (#4138)
histogram (#3542)
Plan1d (#3766)
show_config (#3768)
numpy_cupy_array_equal (#3897)
cupy.around (#3904)
cupyx.scipy.linalg.lu_factor/solve (#4002)
csrsv2/csrsm2 related functions (#4031)
cupy.RawKernel (#4055)
*svdjBatched prototypes (#4071)
cupy/_environment.py (#4162)
cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#3765)
_csr_row_index for CSR matrix major-axis slicing with step (#3852)
cupy.linalg.solver (#3942)
cupyx.scipy.sparse int x int indexing (#3981)
CUlinkState unless absolutely necessary (#3992)
cupy.in1d (#4018)
cupy.cuda.cub.device_segmented_reduce() (#4161)
csr2csc for zero-size matrix (#3919)
_compressed_sparse_matrix._minor_slice for step > 1 case (#3948)
csr_matrix._get_intXslice for step < 0 case (#3951)
sparse.__getitem__ not to return view of input (#3975)
cupy.cuda.cufft (#4014)
__dealloc__ instead of __del__ for cdef class (#4036)
_binary_erosion (#4038)
cutensorReduction of cuTENSOR 1.2.1 (#4081)
np.nan and np.inf constant values properly in ndimage functions (#4083)
argmax and argmin for F-order inputs (#4084)
cudaPointerGetAttributes error in CUDA 10.2+ (#4085)
argmax/argmin in CUB block reduction for F-order arrays with ndim > 1 (#4096)
getDeviceProperties for HIP (#4108)
cublasGemmEx() (#4114)
type_dispatcher.cuh (#4124)
compute_35 for CUDA 11.0+ (#4137)
cupyx.seterr() when linalg not supplied (#4150)
ndimage.measurements functions (#4151)
argwhere for 0d inputs (#4167)
nonzero for 0d inputs (#4168)
cupy.io submodule to cupy._io (#3712)
cupy.logic submodule to cupy._logic (#3715)
cupy.manipulation submodule to cupy._manipulation (#3716)
cupy.math submodule to cupy._math (#3717)
cupy.linalg package (#3741)
cupy.statistics submodule to cupy._statistics (#3774)
cupy.util submodule to cupy._util (#3779)
cupyx.linalg package (#3784)
cupy.prof package (#3869)
cupy.fft package (#3870)
cupy/__init__.py (#3871)
cupyx.rsqrt submodule (#3873)
cupyx.runtime submodule (#3874)
cupyx.scatter submodule (#3875)
cupyx.scipy.fft (#3899)
cupyx.scipy.fftpack (#3900)
cupyx.scipy.sparse (#3901)
cupyx.scipy.special (#3902)
cupyx/scipy/__init__.py (#3912)
cupyx.time (#3965)
cupy.cudnn (#3966)
cupy.cusolver (#3967)
cupy.cusparse (#3968)
cupy.cutensor (#3969)
_normalize_axis_index to cupy/core/internal.pyx (#4057)
matmul from core.pyx to _routine_linalg.pyx (#4060)
cupy.searchsorted to doc (#3908)
cupyx.scipy API documentation (#3954)
cupyx.scipy.ndimage.{minimum,maximum}_position (#4146)
CUDA_VERSION define for Cython compilation (#3877)
cupyx.scipy.ndimage stats functions (#3426)
cupy.ndim test style (#3890)
generate_matrix to cupy.testing (#4070)
pytest.skip (#4094)
polyfit tests tolerance (#4159)
testing.assert_warns (#4169)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @carterbox @cjnolet @Dahlia-Chehata @garanews @grlee77 @kalvdans @leofang @mrkwjc @saswatpp
Published by kmaehashi about 4 years ago
The CuPy v8.0.0 release includes a number of new features, as well as enhanced NumPy/SciPy functionality coverage.
TensorFloat-32 (TF32) Support
CUPY_TF32=1 environment variable to boost the performance of matrix multiplications in routines such as cupy.matmul or cupy.tensordot.
Official support for NVIDIA cuTENSOR and CUB libraries
Enhanced kernel fusion
cupy.fuse, it was only possible to use a single reduction operation (cupy.sum, etc.) at the end. With the new kernel fusion mechanism available in CuPy v8, now it is possible to combine multiple element-wise operations with interleaved reductions.
Automatic tuning of kernel launch parameters
cupyx.optimizing.optimize) for details.
Memory pool sharing with external libraries
PythonFunctionAllocator API, you can let CuPy use arbitrary Python functions instead of a built-in memory pool when managing GPU memory. This improves interoperability with external libraries; for example, you can flexibly use CuPy to preprocess data or use its custom CUDA kernel features inside PyTorch. With pytorch-pfn-extras bundled allocator it is possible to easily use the PyTorch memory pool from CuPy.
Improved NumPy/SciPy function coverage
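As a rough illustration of the memory pool sharing item above, the sketch below wires two plain Python functions into CuPy through PythonFunctionAllocator. The helper names (make_tracking_functions, install_tracking_allocator) and the bookkeeping dictionary are hypothetical and exist only for this example. The callback signatures, malloc_func(size, device_id) returning a pointer and free_func(ptr, device_id), follow the CuPy API reference for cupy.cuda.PythonFunctionAllocator; delegating to cupy.cuda.runtime.malloc and cupy.cuda.runtime.free is an assumption, one possible backend among many.

```python
def make_tracking_functions(raw_malloc, raw_free):
    """Build (malloc_func, free_func, live) around an arbitrary backend.

    raw_malloc(size) -> int pointer and raw_free(ptr) are supplied by the
    caller; `live` maps pointers that are still allocated to their sizes.
    """
    live = {}

    def malloc_func(size, device_id):
        # Delegate the real allocation, then remember it.
        ptr = raw_malloc(size)
        live[ptr] = size
        return ptr

    def free_func(ptr, device_id):
        # Forget the pointer before handing it back to the backend.
        live.pop(ptr, None)
        raw_free(ptr)

    return malloc_func, free_func, live


def install_tracking_allocator():
    """Route CuPy allocations through the tracking functions (needs a GPU)."""
    import cupy  # imported lazily so the helpers above work without CuPy

    malloc_func, free_func, live = make_tracking_functions(
        cupy.cuda.runtime.malloc, cupy.cuda.runtime.free)
    allocator = cupy.cuda.PythonFunctionAllocator(malloc_func, free_func)
    cupy.cuda.set_allocator(allocator.malloc)
    return live
```

Keeping the backend functions as parameters means the bookkeeping can be exercised with fake allocators on a machine without a GPU; only install_tracking_allocator touches CuPy.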
For the list of all backward-incompatible changes in v8, please refer to the Upgrade Guide.
CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X. It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or to install it manually and set the LD_LIBRARY_PATH environment variable.
See here for the complete list of merged PRs after the v8.0.0rc1 release. For all changes since the v7 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, beta4, beta5, rc1).
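For setup scripts, the cuDNN download command quoted above can be assembled programmatically. This is a minimal sketch: the function names cudnn_install_command and install_cudnn are hypothetical, and the CUDA version string ("10.2" in the usage line) is an assumed example value that stands in for the X.X placeholder.

```python
import subprocess
import sys


def cudnn_install_command(cuda_version):
    """Return the argv list for the cuDNN installer quoted in the notes."""
    return [
        sys.executable, "-m", "cupyx.tools.install_library",
        "--library", "cudnn", "--cuda", cuda_version,
    ]


def install_cudnn(cuda_version):
    """Run the installer; raises CalledProcessError if it fails."""
    subprocess.run(cudnn_install_command(cuda_version), check=True)


# Only builds and shows the command; call install_cudnn("10.2") to run it.
print(" ".join(cudnn_install_command("10.2")[1:]))
```

Using sys.executable keeps the download tied to the interpreter that has CuPy installed, which matters when several Python environments coexist.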
cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean by means of CUPY_ACCELERATORS
cupy.testing.shaped_sparse_random (#3976)
__setitem__ (#3998)
sparse.linalg.norm (#4040)
_csr_row_index for CSR matrix major-axis slicing with step (#3898)
cupyx.scipy.sparse int x int indexing (#4003)
CUlinkState unless absolutely necessary (#4016)
cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#4046)
_compressed_sparse_matrix._minor_slice for step > 1 case (#3952)
csr_matrix._get_intXslice for step < 0 case (#3957)
sparse.__getitem__ not to return view of input (#3993)
__dealloc__ instead of __del__ for cdef class (#4037)
cupyx.scatter submodule (#3921)
cupyx/scipy/__init__.py (#3923)
cupyx.scipy.fftpack (#3926)
cupyx.runtime submodule (#3937)
cupy.util submodule to cupy._util (#3938)
cupy.statistics submodule to cupy._statistics (#3939)
cupy.prof package (#3940)
cupyx.time (#3990)
cupy.cusparse (#4005)
cupy.math submodule to cupy._math (#4028)
cupy.cudnn (#4029)
cupy.logic submodule to cupy._logic (#4030)
cupy/__init__.py (#4039)
cupy.searchsorted to doc (#3925)
cupyx.scipy API documentation (#3997)
cupy.ndim test style (#4034)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @cjnolet @grlee77 @kalvdans @leofang @saswatpp
Published by kmaehashi about 4 years ago
Fixed the following errors when building v7.8.0 source published on PyPI:
RuntimeError: Missing file: cupy/cuda/cub.cpp (when CUB is configured via the environment variable or using CUDA 11.0)
RuntimeError: Missing file: cupy/cuda/cutensor.cpp (when cuTENSOR is configured via the environment variable)
This release contains only a packaging fix; there is no code difference from v7.8.0.
Published by kmaehashi about 4 years ago
This is the release note of v8.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are planning to release the final v8.0.0 on October 1st. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.
numpy.poly is being increased thanks to our GSoC student @Dahlia-Chehata!
cupy-cuda110 package is now available on PyPI!
CuPy for CUDA 11.0 (cupy-cuda110) wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. (Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v8.0.0rc1). Those wheels will be removed once we publish the package on PyPI.)
CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X. It is also possible to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or to install it manually and set the LD_LIBRARY_PATH environment variable.
cupy.sparse package (#3839, #3856)
CuPy's sparse matrix support was initially implemented in the cupy.sparse package. It was moved to the cupyx.scipy.sparse namespace in CuPy v5, while keeping the cupy.sparse one for backward compatibility. Since there is no equivalent package in NumPy, it was decided that cupy.sparse will be deprecated and eventually removed.
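For codebases that still reference the old namespace, a small text rewrite can handle the most common case. The sketch below is an assumption about how one might automate the rename, not a tool shipped with CuPy; the helper name migrate_sparse_references is hypothetical.

```python
import re

# Match "cupy.sparse" only when it is not part of a longer dotted or
# alphanumeric name (so "mycupy.sparse" and "pkg.cupy.sparse" are left alone).
_OLD_SPARSE = re.compile(r"(?<![\w.])cupy\.sparse\b")


def migrate_sparse_references(source):
    """Rewrite cupy.sparse references to cupyx.scipy.sparse in source text."""
    return _OLD_SPARSE.sub("cupyx.scipy.sparse", source)


print(migrate_sparse_references("m = cupy.sparse.csr_matrix(a)"))
```

This covers attribute-style references and `import cupy.sparse`; statements such as `from cupy import sparse` need to be changed by hand.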
*_enabled flags under cupy.cuda (#3732)
Previously, it was possible to use cupy.cuda.nccl_enabled or similar flags to detect whether NCCL, cuTENSOR, or other optional CUDA libraries were available. This pull request introduces per-module flags (cupy.cuda.nccl.available, cupy.cuda.cutensor.available) that provide the same information.
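Code that needs to probe these per-module flags without assuming CuPy is installed could use a helper along the following lines. The function name optional_library_available is hypothetical; the only CuPy-specific assumption is the `available` attribute named in the note above.

```python
import importlib


def optional_library_available(name):
    """Return True if cupy.cuda.<name> imports and reports available=True."""
    try:
        module = importlib.import_module("cupy.cuda." + name)
    except Exception:
        # CuPy missing, CUDA libraries not loadable, or no such submodule.
        return False
    return bool(getattr(module, "available", False))


for lib in ("nccl", "cutensor"):
    print(lib, optional_library_available(lib))
```

The broad except is deliberate: on machines without a working CUDA installation, importing CuPy may fail with errors other than ImportError.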
The current base Docker images have been updated from Ubuntu 16.04, CUDA 9.2, and Python 3.5 to Ubuntu 18.04, CUDA 10.2, and Python 3.6.
cupy.ndim (#3060)
PythonFunctionAllocator (#3126)
cupy.polyadd (#3548)
cupy.polymul (#3590)
cupy.polysub (#3593)
scipy.linalg.special_matrices (#3641)
scipy.signal functions that are simple wrappers of ndimage functions (#3645)
cupyx.scipy.ndimage.fourier_shift, fourier_gaussian, fourier_uniform (#3654)
cupy.roots for Hermitian or symmetric matrix (#3703)
cupy.polyval (#3725)
__cuda_array_interface__ in cupy.poly1d (#3729)
cupy.poly1d.__pow__ (#3734)
scipy.signal.convolve and correlate functions (#3748)
trimcoef (#3793)
axis in sparse min/max/argmin/argmax (#3497)
nonzero parameters experimental in sparse min/max (#3583)
compile method for RawKernel and RawModule (#3644)
__cuda_array_interface__ in asnumpy (#3718)
cublasGemmEx in tensordot_core when CUDA11 (#3719)
*_enabled flags under cupy.cuda (#3732)
intptr_t (#3746)
cupy.sparse package (#3839)
path and readonly options to cupyx.optimizing.optimize (#3845)
scipy.signal.sepfir2d (#3750)
cupy.flip (#3742)
cupy.vdot (#3678)
cupy.cutensor (#3700)
cupy.cutensor (#3744)
getrow, getcol and some slicing (#3851)
float16 ndarray input in histogram with CUB (#3617)
cupy.ones, cupy.full and cupy.eye (#3655)
can_use_device_segmented_reduce() for incompatible axes (#3740)
cupy.correlate (#3801)
cupy.sparse.* deprecation (#3856)
cupy.cuda.* from CuPy codebase (#3883)
cupy_backends/cuda/libs/cutensor.pxd (#3595)
_make_decorator in helper.py (#3697)
cupy.poly1d tests (#3704)
cupy._sorting (#3706)
cupy.binary submodule to cupy._binary (#3707)
cupy.creation submodule to cupy._creation (#3708)
cupy.functional submodule to cupy._functional (#3710)
cupy.indexing submodule to cupy._indexing (#3711)
cupy.linalg (#3714)
cupy.misc submodule to cupy._misc (#3726)
cupy.padding submodule to cupy._padding (#3727)
cupy.random package (#3772)
core.pyx (#3804)
core.pyx (#3816)
cupy and cupyx.scipy (#3854)
cupy-cuda110 package to README (#3817)
CUPY_ACCELERATORS (#3818)
classifiers in setup.py (#3814)
os.environ (#3749)
TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3758)
unittest.mock (#3791)
getPTX use bytes instead of unicode (#3237)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse, @cjnolet, @coderforlife, @Dahlia-Chehata, @jakirkham, @leofang, @niteya-shah, @pentschev
Published by emcastillo about 4 years ago
This is the release note of v7.8.0. See here for the complete list of solved issues and merged PRs.
cupy-cuda110 package is now available on PyPI!
cupy-cuda110 wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. (Update on 2020-08-21: Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v7.8.0). Those wheels will be removed once we publish the package on PyPI.)
cupy-cuda110 packages are built with cuDNN support but without bundled cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, you need to install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8) or install it manually and set the LD_LIBRARY_PATH (Linux) or PATH (Windows) environment variable.
MatDescriptor to be pickle-able (#3771)
source devtoolset needed in CentOS (#3806)
TestArrayElementwiseOp::test_doubly_broadcasted_pow (#3762)
TestDiaMatrixScipyComparison failing with scipy>=1.5.0 (#3805)
Published by emcastillo about 4 years ago
This is the release note of v8.0.0b5. See here for the complete list of solved issues and merged PRs.
CUB is now bundled with CuPy so that everyone can use it out-of-the-box (thanks @leofang!). This release also introduces a mechanism to enable acceleration using different libraries: the CUPY_ACCELERATORS environment variable. You can enable CUB and cuTENSOR by setting export CUPY_ACCELERATORS=cub,cutensor.
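When the variable should apply to a single job rather than the whole shell session, it can be passed to a child process instead. The sketch below assumes nothing about CuPy internals: the helper name run_with_accelerators is hypothetical, and the child process only echoes the variable so the example runs without CuPy or a GPU.

```python
import os
import subprocess
import sys


def run_with_accelerators(code, accelerators=("cub", "cutensor")):
    """Run `code` in a child Python process with CUPY_ACCELERATORS set."""
    env = dict(os.environ)
    env["CUPY_ACCELERATORS"] = ",".join(accelerators)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout


# The child only echoes the variable here; a real workload would import cupy.
echo = "import os; print(os.environ['CUPY_ACCELERATORS'])"
print(run_with_accelerators(echo).strip())  # prints: cub,cutensor
```

Setting the variable before the child interpreter starts sidesteps any question of when CuPy reads it during import.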
The new features include an implementation of the SciPy ndimage filters contributed by @coderforlife and the introduction of the cupy_backends library, used to decouple the CUDA ecosystem APIs from CuPy itself.
Currently, cupy_backends is considered an undocumented API and it is subject to further refactoring. In the meantime, you can still continue to use cupy.cuda.* APIs.
As announced previously, we dropped support for CUDA 8.0 and 9.1. We are also going to drop support for NumPy 1.15 and SciPy 1.2 or earlier in the upcoming release.
CUB is now bundled in the source tree. As a consequence, gcc-6 or later is required for the CuPy v8 build. If you are building CuPy from source on systems with legacy gcc, follow the instructions below. These steps are not necessary for general users using wheel packages.
### Ubuntu 16
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"
### CentOS 6 and 7:
$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable
CUB-related environment variables (CUB_PATH, CUB_DISABLED) are no longer effective. To enable CUB, set the CUPY_ACCELERATORS=cub environment variable; this boosts reduction kernels and several functions such as min, max, sum, and scan.
With the introduction of CUPY_ACCELERATORS, you also need to explicitly specify CUPY_ACCELERATORS=cutensor to enable cuTENSOR.
RawModule instance (#3534)
CHAINER_SEED (#3674)
sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)
cupy.fuse (#2734, thanks @xuzijian629!)
cupy.convolve (#3371, thanks @Dahlia-Chehata!)
cupy_backends namespace (#3386)
choose_conv_method (#3464, thanks @Dahlia-Chehata!)
cupy.poly1d (#3466, thanks @Dahlia-Chehata!)
cusolverDn<t>syevj and cusolverDn<t>syevjBatched (#3488, thanks @dmargala!)
ndimage rank-based filters (#3500, thanks @coderforlife!)
ndimage common linear filters (#3505, thanks @coderforlife!)
flatiter.__iter__() (#3508)
has_sorted_indices, has_canonical_format, sort(ed)_indices() for sparse matrices (#3509)
cupy.correlate (#3525, thanks @Dahlia-Chehata!)
cupyx.scipy.sparse.kron() (#3528)
ncclSend / ncclRecv from NCCL 2.7 (#3567)
cupyx.scipy.fft.next_fast_len (#3571)
ndimage generic filters (#3614, thanks @coderforlife!)
cupy.cuda.cub module by default (#2584)
CUPY_CUB_BLOCK_REDUCTION_DISABLED and CUB_DISABLED (#3461)
axis=None in sparse min/max (#3515)
_prepare_mask_indexing_single (#3539)
compute_30 when CUDA 11 (#3578)
einsum not to use cuTENSOR when accelerator is not set (#3592)
CHAINER_SEED (#3674)
cupy.sum (#2939)
numpy.ndarray creation in cuTENSOR operation preparation (#3393)
_ArgInfo init (#3549)
_fft_convolve (#3560)
poly1d instantiation (#3563, thanks @Dahlia-Chehata!)
convolve/correlate (#3587)
cupy.fft.fftfreq and cupy.fft.rfftfreq (#3653, thanks @grlee77!)
cupyx.scipy.ndimage.sum taking zero-dimensional input (#3425)
CUSPARSE_VERSION instead of CUDA_VERSION (#3491)
min/max to return sparse matrix (#3536)
ndarray and fix possible error in __del__ at fft (#3543)
cupy.percentile type assignment in asarray (#3570)
__name__ to custom kernels (#3626)
argmin/argmax return shape (#3639)
cupy.show_config (#3642)
sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)
cupy.cuda.* (#3685)
.data() for std::vector (#3022)
cupy.cuda.cub reusable (#3546)
CUPY_ACCELERATORS (#3596)
sum_duplicates (#3624)
cupy_cub.cu in package data (#3572)
scipy.fft when available (#3032)
_cub_reduction (#3462)
cupy.cuda.cub is used (#3467)
testing.slow correctly (#3501)
flatiter tests (#3514)
slogdet tests to check dtypes of return values (#3577)
test_helper (#3579)
numpy_cupy_array_list_equal (#3582)
numpy_cupy_array_equal instead of numpy_cupy_array_list_equal (#3599)
testing.numpy_cupy_* (#3621)
axis=None (#3638)
min/max/argmin/argmax tests (#3656)
ValueError for invalid order (#3498)
ValueError for invalid clipmode (#3499)
TypeError for invalid subscripts in einsum (#3502)
Published by emcastillo about 4 years ago
This is the release note of v7.7.0. See here for the complete list of solved issues and merged PRs.
cusparse<t>csrgeam2 and cusparse<t>csrgemm2 (#3666)
cupy.cuda.thrust (#3422)
cuSPARSE (#3623)
ndarray list of arrays of different dtypes (#3663)
sum_duplicates (#3636)
test_helper (#3622)
csc and erf tests for scipy>1.2 (#3628)
Published by asi1024 over 4 years ago
This is the release note of v8.0.0b4. See here for the complete list of solved issues and merged PRs.
CuPy v8.0.0b4 focuses on performance improvements by adding a general CUB-based reduction kernel contributed by @leofang (#3244). We also introduce support for the upcoming CUDA 11 (#3405), although we don't provide wheels for it yet. Last but not least, several new routines are added to improve NumPy and SciPy function coverage.
The behavior of dia_matrix.diagonal has been changed to follow the SciPy 1.5.0 specification: it no longer raises ValueError for invalid values and returns an empty array instead. (#3469)
cupy.shape (#3229)
_SimpleReductionKernel (#3244, thanks @leofang!)
cupyx.scipy.ndimage sum, mean, standard deviation and variance (#3259, thanks @niteya-shah!)
cupy.RawModule (#3319, thanks @leofang!)
cupy.piecewise (#3329, thanks @Dahlia-Chehata!)
cupy.trim_zeros (#3340, thanks @Dahlia-Chehata!)
cupy.sort_complex (#3348, thanks @Dahlia-Chehata!)
cupy.who (#3361)
cudaDeviceGetLimit / cudaDeviceSetLimit (#3387, thanks @leofang!)
polycompanion (#3398, thanks @Dahlia-Chehata!)
cusolverDn<t>potrfBatched and cusolverDn<t>potrsBatched (#3399, thanks @IvanYashchuk!)
polyvander (#3404, thanks @Dahlia-Chehata!)
cupy.shares_memory (#3432)
testing.numpy_cupy_raises (#3098)
ValueError for invalid arguments (#3374)
ignore_error in kernel optimization (#3410)
cupyx.scipy.ndimage stats functions (#3419)
TypeError in cupy.ndarray.__array__ (#3421)
flatiter.copy() (#3442)
CArray using 32-bit indexes (#3448)
concatenate (#3285)
_count_non_nan datatype for windows (#3350)
cupyx.time.repeat to accumulate duration after GPU synchronization (#3375)
PerfCaseResult changing _ts (#3400)
cupyx.scipy.ndimage stats functions (#3402)
cupy.power(0j, 0j) (#3449)
TypeError in parameterize test catching CUDADriverError (#3451)
scipy.dia_matrix.diagonal for scipy==1.5.0 (#3469)
cupy.linalg.svd (#3373)
cupy._environment (#3413, thanks @leofang!)
find_packages in setup.py (#3424)
_SimpleReductionKernel (#3443)
cupyx.optimizing.optimize (#3397)
cupy.fromfile (#3439, thanks @jakirkham!)
cupy.linalg.det docstring (#3456, thanks @grlee77!)
tofile() (#3460, thanks @leofang!)
cupy.cuda.cub (#2598, thanks @leofang!)
__cuda_array_interface__ (#3297, thanks @leofang!)
__init__.py to allow importing test packages (#3395)
testing.empty (#3438)
RawModule tests for wrong condition (#3453)
unittest.mock (#3468)
Published by emcastillo over 4 years ago
This is the release note of v7.6.0. See here for the complete list of solved issues and merged PRs.
cupy.cuda.thrust (#3415, thanks @leofang!)
_count_non_nan datatype for windows (#3391)
TypeError in parameterize test catching CUDADriverError (#3459)
concatenate (#3472)
find_packages in setup.py (#3436)
cupy.fromfile (#3447, thanks @jakirkham!)
cupy.linalg.det docstring (#3458, thanks @grlee77!)
tofile() (#3471, thanks @leofang!)
__init__.py to allow importing test packages (#3409)
Published by kmaehashi over 4 years ago
This is the release note of v7.5.0. See here for the complete list of solved issues and merged PRs.
cupy.show_config (#3353)
put when using scalars (#3332)
xfails in sorting tests (#3345)
linalg.svd for 0-sized matrices (#3355)
ormqr functions in _solve (#3356)
Published by kmaehashi over 4 years ago
This is the release note of v8.0.0b3. See here for the complete list of solved issues and merged PRs.
As announced in the previous release, we are dropping support for CUDA 8.0 / 9.1 in v8 releases (#3301). Based on the feedback from users, we will continue to provide cuDNN support (#3303).
CuPy v8.0.0b3 introduces a mechanism for optimizing internal parameters when launching reduction kernels using Optuna. Depending on your GPU and the kernels you execute, you can take advantage of this feature and improve the performance of your code by letting Optuna automatically find the best parameters for your GPU.
To take advantage of this, wrap the calls that perform reductions as follows:
import cupy
import cupyx.optimizing

with cupyx.optimizing.optimize(key=None):
    # cupy reduction function (x is an existing cupy.ndarray)
    y = cupy.sum(x)
CuPy is also taking part in GSoC 2020 and we keep adding new functions to improve our compatibility with NumPy.
flatiter.base property (#3250)
flatiter.__len__() special method (#3251)
flatiter.__next__() special method (#3252)
putmask function (#3261, thanks @rushabh-v!)
cupy.show_config (#3271)
get_fft_plan() (#3293, thanks @leofang!)
RawKernel (#3294, thanks @leofang!)
cupy.bartlett (#3307, thanks @niteya-shah!)
mean for sparse matrices (#3333)
max_duration argument in cupyx.time.repeat (#3357)
OptimizeContext serialization (#3367)
RawKernel (#2606)
CUPY_NVCC_GENERATE_CODE (#3330, thanks @leofang!)
max_total_time_per_trial (#3365)
cupyx.scipy.ndimage.interpolation using ElementwiseKernel (#3166, thanks @grlee77!)
ElementwiseKernel cpu time (#3298)
blackman, hanning and hamming methods (#3312, thanks @niteya-shah!)
cupy.RawKernel (#3341, thanks @leofang!)
cupy.linalg.svd (#3347)
cupyx.scipy.fft (#3311, thanks @grlee77!)
put when using scalars (#3328)
ormqr functions in _solve (#3331)
linalg.svd for 0-sized matrices (#3354)
cupy.around behaves differently from NumPy for EVEN_NUMBER+0.5 (#3335)
shape_t instead of tuple (#3315)
Published by emcastillo over 4 years ago
This is the release note of v8.0.0b2. See here for the complete list of solved issues and merged PRs.
We are planning to drop support for CUDA 8.0 / 9.1 (#3301) and cuDNN (#3303) in future v8 releases. If you have any concerns, please feel free to leave a comment in these issues.
fallback_mode (#2279, thanks @Piyush-555!)
cupy.cuda.cufft.Plan1d (#2644, thanks @leofang!)
cupy.median (#3134, thanks @Harshan01!)
cupy.flatiter (#3165)
cupy.gcd and cupy.lcm (#3190, thanks @niteya-shah!)
cusolverDn<t>gesvdj and cusolverDn<t>gesvdaStridedBatched (#3192)
cupyx.scipy.ndimage.label (#3210)
cupyx.scipy.ndimage.grey_erosion and cupyx.scipy.ndimage.grey_dilation (#3216)
cupy.diag_indices and cupy.diag_indices_from (#3217, thanks @rushabh-v!)
cusparse<t>csrgeam2 and cusparse<t>csrgemm2 (#3220)
minimum_filter, maximum_filter, grey_closing, grey_opening to scipy.ndimage (#3239)
cusolverDn<t>gesvdjBatched (#3247)
cupy.kaiser (#3268, thanks @niteya-shah!)
cupy.cuda.thrust (#3286, thanks @leofang!)
Add R2C/C2R support to cupy.cuda.cufft.PlanNd (#3102, thanks @leofang!)
Make RawKernel and RawModule aware of CUDA context (alt) (#3201, thanks @leofang!)
Make diff return AxisError for an invalid axis (#3231, thanks @grlee77!)
Improve the efficiency of cupy.pad for some simple cases (#3281, thanks @grlee77!)
HIP
einsum with complex in HIP (#3203)
_kernel and reduction (#2702)
Arg instantiation in cuda/function.pyx (#3253)
norm (#3278)
cupy.take (#3118)
_reduce_dims call in reduction (#3262)
IndexError for R2C/C2R FFT with axes=() (#3264, thanks @leofang!)
cupy.cuda.thrust (#3291, thanks @leofang!)
cupy/cuda/_environment.py (#3145, thanks @leofang!)
cupy.fill_diagonal to implement with cupy.flatiter (#3207)
__array_function__ (#3236)
TestEigenvalue (#3288)
matmul test (#2403)
numpy_cupy_raises (#3155)
numpy_cupy_raises (#3256)
Published by asi1024 over 4 years ago
This is the release note of v7.4.0. See here for the complete list of solved issues and merged PRs.
cupy.take (#3265)
matmul test (#3245)
Published by kmaehashi over 4 years ago
This is the release note of v8.0.0b1. See here for the complete list of solved issues and merged PRs.
Known packaging issues:
cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.
CuPy gets faster and more stable towards its v8.0.0 release. This version adds a handful of new routines, brings library-wide performance improvements, and corrects several bugs.
cupy.scatter_add, which had been deprecated since CuPy v4. Use cupyx.scatter_add instead.
get_global() to cupy.RawModule (#2510, thanks @leofang!)
cupy.cuda.cufft.Plan1d (#2644, thanks @leofang!)
hstack, vstack, and bmat to cupyx.scipy.sparse (#2665, thanks @cjnolet!)
cupy.require (#3083, thanks @niteya-shah!)
cupy.compress (#3103, thanks @Harshan01!)
cupy.ravel_multi_index (#3104, thanks @grlee77!)
cupy.extract (#3109, thanks @Harshan01!)
cupy.bitwise_not as alias to invert (#3120, thanks @Harshan01!)
cupy.argwhere (#3135, thanks @rushabh-v!)
cupy.select (#3138, thanks @niteya-shah!)
cupy.cuda.ExternalStream (#3141)
cupy.array_equal (#3189, thanks @rushabh-v!)
ndarray variants AND inplace support in fallback_mode (#2391, thanks @Piyush-555!)
axis argument to linspace (#2461, thanks @grlee77!)
using_allocator in cupy.cuda (#2951, thanks @jakirkham!)
__future__ imports (#2995)
prod (#3067, thanks @leofang!)
cupy.scatter_add (#3074)
cupy.pad to use cupy.linspace instead of numpy.linspace internally (#3101, thanks @grlee77!)
range, weights and density (#3124, thanks @grlee77!)
ord = 2, -2, and 'nuc' in cupy.linalg.norm (#3130, thanks @rushabh-v!)
ElementwiseKernel in cupy.fill_diagonal (#3139)
dia_matrix creation from SciPy equivalent (#3160, thanks @jakirkham!)
labels in the benchmark and add kwargs to repeat (#3172, thanks @rushabh-v!)
out parameter to cupy.concatenate and cupy.stack (#2983)
reshape to raise ValueError for order 'K' (#3123)
cumsum and cumprod (#2907)
ndimage convolve and correlate (#3179)
c_contiguous when indexing CArray (#3191)
size_t nbytes in __cuda_array_interface__ (#3009, thanks @jakirkham!)
fill_diagonal (#3011)
cupy.random.multivariate_normal (#3018, thanks @espg!)
ndarray.__setitem__ (#3088)
cupy.cuda.cub with CUDA < 9.2 (#3089, thanks @leofang!)
cub_reduction for CUPY_CUB_MIN and float16 arrays (#3100)
time.process_time instead of time.clock (#3128, thanks @rushabh-v!)
svd (#3140, thanks @rushabh-v!)
cupy.prod for half precision (#3148, thanks @leofang!)
coo_matrix (#3150)
MatDescriptor to be pickle-able (#3157, thanks @jakirkham!)
erfinv & erfcinv in cupyx.scipy.special (#3159, thanks @leofang!)
Event.__del__ behavior on shutdown (#3176)
internal.pyx (get_contiguous_strides) (#1950)
tempdir context manager (#3003)
intptr_t instead of size_t for cuSPARSE and cuBLAS handles (#3081, thanks @Harshan01!)
intptr_t for cuDNN handles (#3082, thanks @Harshan01!)
using_allocator (#3094)
IndexOrValueError (#3096)
fill_diagonal (#3171)
cudaPointerAttributes (#3183, thanks @leofang!)
UnownedMemory in the API docs (#3086, thanks @jakirkham!)
convolve and correlate (#3161, thanks @jakirkham!)
irfft tests for compute capability != 7 (#3084)
numpy_cupy_raises cupyx.* tests (#3099)
numpy_cupy_raises (#3122)
Published by asi1024 over 4 years ago
This is the release note of v7.3.0. See here for the complete list of solved issues and merged PRs.
using_allocator in cupy.cuda (#3087, thanks @jakirkham!)
time.process_time instead of time.clock (#3132, thanks @rushabh-v!)
ndarray.__setitem__ (#3143)
ndarray type to y (#3152, thanks @jakirkham!)
fill_diagonal (#3156)
Event.__del__ behavior on shutdown (#3180)
fill_diagonal (#3177)
UnownedMemory in the API docs (#3090, thanks @jakirkham!)
convolve and correlate (#3168, thanks @jakirkham!)
.pfnci/script.sh (#3047)
atol of fft tests (#3105)
Published by toslunar over 4 years ago
This is the release note of v7.2.0. See here for the complete list of solved issues and merged PRs.
Known packaging issues:
CUDA 10.2 wheel packages (cupy-cuda102) are currently unavailable on PyPI. Packages will be published after getting approval of the file size limit increase.
This release adds support for CUDA 10.2 and NumPy 1.18.
linspace(..., num=1, endpoint=False, retstep=True) (#2990)
nogil to CUB (#3000, thanks @y1r!)
ParameterInfo as a cache key (#2961)
_get_axis (#2972, thanks @jakirkham!)
cub.pyx (#3001)
size_t nbytes in __cuda_array_interface__ (#3015, thanks @jakirkham!)
get_fft_plan() and some FFT tests (#3031, thanks @leofang!)
imag for 0-size array (#3039)
cupy-cuda102 (#3073)
cuComplex_bridge.h is not installed (#3043)
scipy in test_gmm (#3050)
CUPY_CI environment variable in Travis CI and AppVeyor (#3066)
Published by emcastillo over 4 years ago
This is the release note of v8.0.0a1. See here for the complete list of solved issues and merged PRs.
Known packaging issues:
cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.
CUDA 10.2 wheel packages (cupy-cuda102) are currently unavailable on PyPI. Packages will be published after getting approval of the file size limit increase.
This release adds support for CUDA 10.2 and NumPy 1.18.
CuPy 8.0.0a1 comes with several exciting new features, such as better sparse matrix support. Users who write their own CUDA kernels can now use grid synchronization in RawKernel and RawModule and tune the block size for ElementwiseKernels. There are also some noticeable performance improvements thanks to the extended support of CUB in several CuPy functions.
RawModule (#2784)
cupy.allclose (#2799)
cupy.isclose to return a 0-dim cupy.ndarray instead of a float value to avoid device synchronization.
dtype argument from min/max (#2875)
isscalar (#2974)
cupy.isscalar to element, previously named as num.
digitize (#2758)
cupy.RawModule (#2782, thanks @leofang!)
cupyx.scipy.ndimage.map_coordinates for cases with coords > 2d (#2813, thanks @grlee77!)
ptp ndarray method and function (#2859, thanks @grlee77!)
cupyx.scipy.special (#2861, thanks @grlee77!)
ElementwiseKernel to set the block_size (#2914)
RawKernel and RawModule (#2925)
cupy.conjugate and make cupy.conj its alias (#2982)
plan argument to cupyx.scipy.fft.* (#2998, thanks @leofang!)
nogil to CUB (#2787, thanks @y1r!)
cupy.allclose (#2799)
mean (#2860, thanks @grlee77!)
_kernel.pyx (#2881)
runtime.free() (#2898)
irfftn (#2922)
einsum (#2928)
cupy.copyto (#2942)
MemoryPointer.__repr__ (#2981)
expand_dims (#2992)
random.randint (#2828)
randint (#2829)
dtype argument from min/max (#2875)
cupy.mean (#2903, thanks @grlee77!)
negative (#2973)
isscalar (#2974)
linspace(..., num=1, endpoint=False, retstep=True) (#2975)
numpy.can_cast call to improve guess routine (#2673)
ElementwiseKernel (#2688)
can_cast calling to reduce overhead (#2704)
getrfBatched in linalg.slogdet (#2735)
einsum where no contraction is necessary (#2960)
true_divide with dtype argument (#2076)
keepdims should always preserve all dimensions in CUB-based reductions (#2725, thanks @grlee77!)
RawModule (#2784)
cupyx.scipy.ndimage filter origin check (#2805, thanks @grlee77!)
__del__ behavior (#2809)
split and array_split with indices overrun (#2814)
split and array_split with unordered indices supplied (#2815)
testing.shaped_random for shape () (#2870)
argmin/argmax dtype argument (#2872)
imag for 0-size array (#2886)
size argument in ElementwiseKernel (#2909)
thread_local.linalg if not defined (#2915)
cupy.cuda.cub.device_segmented_reduce() not being used (#2921, thanks @leofang!)
_correlate_or_convolve (#2923)
ParameterInfo as a cache key (#2941)
nvcc command lookup (#3028)
intptr_t for cuSOLVER handles (#2718)
reduction.pxi (#2767)
cuParamSetTexRef() (#2770, thanks @leofang!)
_kernel.pyx (#2785)
CArray and family from core.pyx (#2831)
memory.pyx (#2899)
_scalar.pyx (#2917)
cupy.sort (#2944, thanks @rushabh-v!)
_op variable in cub.pyx (#3002)
RawKernel and RawModule (#2643)
cupy.asarray (#2821, thanks @leofang!)
get_allocator function (#2953, thanks @jakirkham!)
cupy-cuda102 (#3057)
cuComplex_bridge.h is not installed (#2984)
cupy.random in kmeans example (#3026)
linalg.matrix_power (#2788)
ifloordiv with numpy 1.18 (#2852)
test_helper.py for NumPy 1.18 (#2883)
TestSolveTriangular inputs (#2927)
testing.parameterize pdb-friendly (#3024)
scipy in test_gmm (#3048)
cupyx.time.repeat experimental (#2897)
cupyx.allow_synchronize experimental (#2947)
.pfnci/script.sh (#3041)
CUPY_CI environment variable in Travis CI and AppVeyor (#3058)
Published by kmaehashi over 4 years ago
This is a hot-fix release for v7.1.0 to address an issue in CUB support. Only users manually building CuPy from source with CUB support enabled are affected; wheel package users (cupy-cudaXXX) are not affected by this issue as CUB support is not enabled in wheels.
This is the release note of v7.1.1. See here for the complete list of solved issues and merged PRs.
_get_axis in cupy.cub (#2986, thanks @jakirkham!)
Published by emcastillo almost 5 years ago
This is the release note of v7.1.0. See here for the complete list of solved issues and merged PRs.
code_or_path argument of cupy.RawModule has been replaced with two keyword arguments (code and path) to avoid ambiguity. (#2786)
randint (#2854)
random.randint (#2862)
irfftn (#2962)
RawModule (#2786)
cupyx.scipy.ndimage filter origin check (#2810, thanks @grlee77!)
__del__ behavior (#2811)
thrust::complex headers with a bug fix (#2833, thanks @leofang!)
true_divide with dtype argument (#2834)
keepdims should always preserve all dimensions in CUB-based reductions (#2848, thanks @grlee77!)
array_split with indices overrun (#2851)
array_split with unordered indices supplied (#2857)
testing.shaped_random for shape () (#2889)
argmin/argmax dtype argument (#2890)
_correlate_or_convolve (#2924)
cupy.cuda.cub.device_segmented_reduce() not being used (#2936, thanks @leofang!)
thread_local.linalg if not defined (#2937)
RawKernel and RawModule (#2774)
scipy.fft docs (#2807, thanks @grlee77!)
cupy.asarray (#2825, thanks @leofang!)
get_allocator function (#2954, thanks @jakirkham!)
linalg.matrix_power (#2793)
ifloordiv with NumPy 1.18 (#2880)
test_helper.py for NumPy 1.18 (#2913)
TestSolveTriangular inputs (#2929)
Published by niboshi almost 5 years ago
This is the release note of v6.7.0. See here for the complete list of solved issues and merged PRs.
As announced previously, this is the final release of the v6 series, which is the last series to support Python 2.
randint (#2855)
@testing.numpy_cupy_ decorators for skips (#2892)
irfftn (#2959)
_fftn (#2752)
__del__ behavior (#2812)
true_divide with dtype argument (#2835)
testing.shaped_random for shape () (#2895)
cupy.asarray (#2826, thanks @leofang!)
linalg.matrix_power (#2791)
ifloordiv with numpy 1.18 (#2879)
test_helper.py for NumPy 1.18 (#2912)
TestSolveTriangular inputs (#2930)