Published by emcastillo about 2 years ago
This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
- `cupyx.scipy.special.log_softmax` (#6966)
- `cupy.array_api` (#6929)
- `kind` in `sort`/`argsort`, and fix `cupy.array_api.{sort,argsort}` accordingly (#6951)
- `augassign` target is evaluated twice in JIT (#6964)
- `cupy.array_api` (cont'd) (#6973)
- `__device__` option is missing (#6991)
- `_compile.py` (#6993)
- `@pytest.mark.parametrize` in some cases (#7010)
- `keepdims` parameter for `average` (#6897)
- `equal_nan` parameter for `unique` (#6904)
- `argpartition` use the `kth` argument properly (#7020)
- `matmul` supports `out` (#6899)
- `XFAIL` for `tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py` when `scipy>=1.9.0rc2` (#6963)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar
Published by asi1024 about 2 years ago
This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since v11.0.0rc1 release. Check out our blog for highlights in the v11 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
`cupy-wheel` package

Previously, downstream projects depending on CuPy had a hard time specifying a binary wheel as a dependency, and it was the users' responsibility to install the correct package in their environments. CuPy v10 introduced the experimental `cupy-wheel` meta-package. In this release, we declare this feature ready for production environments. `cupy-wheel` examines the user's environment and automatically selects the matching CuPy binary wheel to install.
For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).
- `deg` in `cupy.angle` (#6909)
- `cupy-wheel` for v11 (#6913)
- `dtype` of different size (#6850)
- `cupy.win.cuda117` (#6885)

The CuPy Team would like to thank all those who contributed to this release!
@emcastillo @kmaehashi @takagi
Published by kmaehashi over 2 years ago
This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
). See the Upgrade Guide for the list of possible breaking changes in v11.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117
- `cupy.array_api` say "cupy" instead of "numpy" (#6795)
- `cupy.median` for NaN inputs (#6760)
- `ndimage.filter` tests for ROCm 4.0 (#6676)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi
Published by kmaehashi over 2 years ago
This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
). See the Upgrade Guide for the list of possible breaking changes.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
CuPy v11 provides a unified binary package named `cupy-cuda11x` that supports all CUDA 11.2+ releases. This replaces the per-CUDA-version binary packages (`cupy-cuda112`, `cupy-cuda113`, …, `cupy-cuda117`) provided in CuPy v10 or earlier.

Note that CUDA 11.1 or earlier still requires per-CUDA-version binary packages: `cupy-cuda102`, `cupy-cuda110`, and `cupy-cuda111` will be provided for CUDA 10.2, 11.0, and 11.1, respectively.

CuPy v11 provides the `cupy-cuda11x` binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5. These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
`ndarray` subclassing (#6720, #6755)

This release allows users to subclass `cupy.ndarray`, using the same protocol as NumPy:
class C(cupy.ndarray):
    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'
Note that view casting and new from template mechanisms are also supported as described by the NumPy documentation.
`cupyx.distributed` for Sparse Matrices

All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install `mpi4py` in order to perform an efficient metadata exchange.

We would like to give a warm welcome to @khushi-411, who will be working on adding support for the `cupyx.scipy.interpolate` APIs as part of her GSoC internship!
CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.
- `cupy.setxor1d` (#6582)
- `cupyx.spatial.distance` support from pylibraft (#6690)
- `cupy.ndarray` subclassing - Part 2 - View casting (#6720)
- `broadcast` (#6758)
- `reduce` (#6761)
- `all_reduce` and minor fixes (#6762)
- `all_to_all`, `reduce_scatter`, `send_recv` (#6765)
- `cupy.ndarray` subclassing - Part 3 - New from template (ufunc) (#6775)
- `cupyx.scipy.special.log_ndtr` (#6776)
- `cupyx.scipy.special.expn` (#6790)
- `cupy-cuda11x` wheel (#6800)
- `CUPY_CUDA_VERSION` as much as possible (#6810)
- `cupy.cuda.compile_with_cache` (#6818)
- `cupy.poly1d.__pow__` (#6770)
- `cupy.median` for NaN inputs (#6759)
- `_cuda_types.py` (#6726)
- `ndarray_base` (#6782)
- `cupy-cuda11x` wheel (#6803)

The CuPy Team would like to thank all those who contributed to this release!
@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi
Published by emcastillo over 2 years ago
This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)
- `ifdef` (#6740)
- `ifdef` for ROCm >= 4.2 (#6751)
- `scipy==1.8.1` sparse dot bugfix (#6728)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @kmaehashi @leofang @takagi
Published by emcastillo over 2 years ago
This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
`einsum` backend (#6677) (thanks @leofang!)

A new accelerator for CuPy has been added (`CUPY_ACCELERATORS=cutensornet`). This feature requires `cuquantum-python >= 22.03` and `cuTENSOR >= 1.5.0`, and is used to accelerate and support large array sizes in the `cupy.linalg.einsum` API.
CuPy v11 will drop support for ROCm 4.2. We recommend using ROCm 4.3 or 5.0 instead.
As per NEP29, NumPy 1.18/1.19 support was dropped in 2021. Supported SciPy versions are those released close to the supported NumPy versions.
- `einsum` backend (#6677)
- `cupy.poly` (#6697)
- `ifdef` (#6739)
- `bincount`, `histogram2d`, `histogramdd` with CUB (#6701)
- `ifdef` for ROCm >= 4.2 (#6750)
- `Dim3` class (#6644)
- `scatter_add` example (#6696)
- `LOBPCG` on ROCm 5.0+ (#6603)
- `scipy==1.8.1` sparse dot bugfix (#6727)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi
Published by asi1024 over 2 years ago
This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
CuPy JIT has been further enhanced thanks to @leofang and @eternalphane!
It is now possible to use CUDA cooperative groups and access .shape
and .strides
attributes of ndarrays.
import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)
print(kernel.cached_code)
The above program emits the CUDA code as follows:
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
`cupyx.distributed` (#6628, #6658)

CuPy v10 added the `cupyx.distributed` API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in `cupyx.scipy.sparse`. Currently only `send`/`recv` primitives are supported, but we will be adding support for collective calls in the following releases.

Additionally, it is now possible to use MPI (through the `mpi4py` Python package) to initialize the NCCL communicator. This avoids launching the TCP server otherwise used to exchange CPU values. Moreover, we recommend enabling MPI for sparse matrix communication, as each communication call requires a metadata exchange that leads to device synchronization when MPI is not enabled.
# run with mpiexec -n N python …
from mpi4py import MPI
import cupyx.distributed

comm = MPI.COMM_WORLD
workers = comm.Get_size()
rank = comm.Get_rank()
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)
`cupy-wheel` (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called `cupy-wheel`. This meta-package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.

This feature is currently experimental and subject to change, so we recommend users not distribute packages relying on it for now. Your suggestions or comments are highly welcome (please visit #6688).
- `cupyx.distributed` (#6628)
- `.shape` and `.strides` (#6668)
- `flatten(order)` (#6613)
- `__repr__` for `cupyx.profiler._time._PerfCaseResult` (#6617)
- `cudaDevAttrMemoryPoolsSupported` to hip (#6621)
- `kernel.cached_code` test (#6643)
- `cupyx.distributed` (#6658)
- `cupy.intersect1d` (#6586)
- `float16::operator-()` only for ROCm 5.0+ (#6624)
- `cupy.polyval` (#6664)
- `memcpy_async` on CUDA 11.0 (#6671)
- `--pre` option to instructions installing pre-releases (#6612)
- `jenkins` requirements (#6632)
- `TestIncludesCompileCUDA` for HEAD tests (#6646)
- `/test mini` (#6653)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar
Published by asi1024 over 2 years ago
This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
`cupy-wheel` (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called `cupy-wheel`. This meta-package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.

This feature is currently experimental and subject to change, so we recommend users not distribute packages relying on it for now. Your suggestions or comments are highly welcome (please visit #6688).
- `cudaDevAttrMemoryPoolsSupported` to hip (#6626)
- `float16::operator-()` only for ROCm 5.0+ (#6629)
- `cupy.polyval` (#6666)
- `--pre` option to instructions installing pre-releases (#6614)
- `jenkins` requirements (#6634)
- `TestIncludesCompileCUDA` for HEAD tests (#6650)
- `/test mini` (#6655)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi
Published by emcastillo over 2 years ago
This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.
This is a hot-fix release for v10.3.0 which contained a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).
The CuPy Team would like to thank all those who contributed to this release!
@kmaehashi @takagi
Published by kmaehashi over 2 years ago
This is the release note of v10.3.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
We have published a hot-fix release v10.3.1 which addresses a regression that prevents CuPy from working in older CUDA GPUs (Maxwell or earlier).
Full support for CUDA 11.6 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-cuda116
Full support for ROCm 5.0 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-rocm-5-0
- `cupy.array_api` (#6550)
- `cupy.copyto` to take NumPy array scalars (#6593)
- `vectorize` (#6515)
- `cupy.cumsum` on ROCm 5.0 (#6525)
- `out` args parser of ufunc (#6547)
- `cupy.fill` to properly take zero-dim `cupy.ndarray` (#6548)
- `may_share_memory` algorithm (#6565)
- `MemoryAsyncPool` (#6596)
- `CUPY_SETUP_ENABLE_THRUST=0` environment variable (#6488)
- `--compiler-bindir` if `cl.exe` is already on `PATH` (#6514)
- `async_malloc` tests on unsupported device (#6544)
- `push` event of FlexCI via GitHub Actions (#6554)

The CuPy Team would like to thank all those who contributed to this release!
@anaruse @asi1024 @kmaehashi @leofang @Onkar627 @takagi @toslunar @tushxr16
Published by kmaehashi over 2 years ago
This is the release note of v11.0.0b1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
We have identified that this release contains a regression that prevents CuPy from working in older CUDA GPUs (Maxwell or earlier). We are planning to fix this issue in the next pre-release. See #6615 for the details.
`cupyx.scipy.special` APIs (#6461, #6582, #6571)

A series of `scipy.special` routines have been added to `cupyx` with optimized CUDA raw kernel implementations. `loggamma`, `multigammaln`, fast Hankel transforms, and several other utility special functions were added in this series of PRs by @grlee77 and @khushi-411.
Full support for CUDA 11.6 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda116 -f https://pip.cupy.dev/pre
Full support for ROCm 5.0 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-rocm-5-0 -f https://pip.cupy.dev/pre
CUB support in CuPy is now enabled by default. This results in faster general reductions, with routines such as `sum`, `argmax`, and `argmin` seeing increased performance. Notice that CUB may introduce some non-deterministic behavior; it can be disabled by setting the `CUPY_ACCELERATORS=""` environment variable.
CuPy v11 will drop support for ROCm 4.0. We recommend using ROCm 4.3 or 5.0 instead.
- `cupyx.scipy.special` statistical distributions (#6461)
- `cupy.real_if_close` API (#6475)
- `cupyx.scipy.special` loggamma, multigammaln and fast Hankel transforms (#6528)
- `cupyx.scipy.special.{i0e, i1e}` (#6571)
- `cupy.array_api` (#6486)
- `cupy.copyto` to take NumPy array scalars (#6584)
- `ndarray.ravel(order="K")` (#6585)
- `cusparseSpGEMM()` (#6511)
- `cupy.in1d` (#6583)
- `cupy.fill` to properly take zero-dim `cupy.ndarray` (#6481)
- `vectorize` (#6499)
- `cupy.cumsum` on ROCm 5.0 (#6520)
- `out` args parser of ufunc (#6546)
- `may_share_memory` algorithm (#6560)
- `MemoryAsyncPool` (#6590)
- `--compiler-bindir` if `cl.exe` is already on `PATH` (#6510)
- `push` event of FlexCI via GitHub Actions (#6538)
- `async_malloc` tests on unsupported device (#6541)

The CuPy Team would like to thank all those who contributed to this release!
@anaruse @asi1024 @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @Onkar627 @peterbell10 @pri1311 @Smit-create @takagi @toslunar @tushxr16
Published by emcastillo over 2 years ago
This is the release note of v10.2.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.
- (`cupyx.scipy.sparse`) (#6379)
- `__cupy_get_ndarray__` dunder method to transform objects to arrays (#6465)
- `cupy.show_config()` (#6476)
- `cupyx.ndimage.spline_filter1d` for HIP (#6411)
- `cupy.nan_to_num` (#6431)
- `use_hip` flag in setup (#6398)
- `cupy.__version__` instead of `pkg_resources` (#6380)
- `eigh()` for CUDA 11.6 (#6376)
- `cupy.testing.installed` (#6387)
- `generate.py` (#6428)
- `skip` tag (#6477)

The CuPy Team would like to thank all those who contributed to this release!
@anaruse @emcastillo @grlee77 @kmaehashi @takagi
Published by emcastillo over 2 years ago
This is the release note of v11.0.0a2 See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
A series of NumPy routines have been proposed as good-first-issues, and as a result an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented routines is available at #6078.
`cupy.typing` (#6251)

An API equivalent to `numpy.typing` has been added, allowing the introduction of data types in CuPy and user code.
Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Initial support for ROCm 5.0 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

CuPy v11 will drop support for ROCm 4.0. We recommend using ROCm 4.2/4.3 instead.
- `cupy.isneginf` and `cupy.isposinf` (#6089)
- `cupy.typing` (#6251)
- `asarray_chkfinite` API (#6275)
- `cupyx.scipy.special` (#6302)
- `log1p` for `cupyx.scipy.special.log1p` (#6315)
- `beta` functions to `cupyx.scipy.special` (#6318)
- `cupy.union1d` API (#6357)
- `cupy.float_power` (#6371)
- `cupy.intersect1d` API (#6402)
- `cupy.setdiff1d` API (#6433)
- `cupy.format_float_scientific` API (#6474)
- `mypy` introduction (#4955)
- (`cupyx.scipy.sparse`) (#6321)
- `__cupy_get_ndarray__` dunder method to transform objects to arrays (#6414)
- `cupy.show_config()` (#6472)
- `cupy.sort` (#6392)
- `cupyx.ndimage.spline_filter1d` for HIP (#6406)
- `cupy.nan_to_num` (#6408)
- `cupyx.special.gammainc`, `lpmv` and `sph_harm` for hip (#6409)
- `use_hip` flag in setup (#6391)
- `cupyx.scipy.linalg` (#6449)
- `cupyx.scipy.ndimage` (#6450)
- `cupyx.scipy.signal` (#6451)
- `cupyx.scipy.sparse` (#6454)
- `cupyx.scipy.stats` (#6456)
- `cupy.__version__` instead of `pkg_resources` (#6332)
- `CUPY_SETUP_ENABLE_THRUST=0` environment variable (#6390)
- `eigh()` for CUDA 11.6 (#6347)
- `cupy.testing.installed` (#6381)
- `generate.py` (#6424)
- `skip` tag (#6468)

The CuPy Team would like to thank all those who contributed to this release!
@amanchhaparia @anaruse @asi1024 @emcastillo @grlee77 @IvanYashchuk @khushi-411 @kmaehashi @pri1311 @saswatpp @takagi
Published by asi1024 over 2 years ago
This is the release note of v10.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
- `matmul` (#6241)
- `cupy.linalg.qr` to align with NumPy 1.22 (#6263)
- `cupy.eye()` (#6213)
- `compile_with_cache` returning None (#6236)
- `flip` ()-shaped array (#6237)
- `linalg.eigh` and `linalg.eigvalsh` on empty inputs (#6238)
- `array_api` namespace (#6291)
- `cp.linalg.solve()` implementation (#6235)
- `cupy.positive` in API Reference (#6276)
- `eigsh` doc (#6292)
- `convolve2d` (#6194)
- `percentile` and `quantile` to support NumPy 1.22 (#6247)
- `setuptools<60` in Windows CI (#6270)
- `cuda-slow` test in FlexCI (#6339)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @kmaehashi @leofang @ptim0626 @SauravMaheshkar @takagi @thomasjpfan @toslunar @WiseroOrb
Published by asi1024 over 2 years ago
This is the release note of v11.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
A series of NumPy routines have been proposed as good-first-issues, and as a result an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented routines is available at #6078.
`cupyx.scipy.special` functions (#5687)

Spherical harmonics, Legendre and Gamma functions are implemented using highly performant dedicated CUDA kernels. Thanks to @grlee77!
This PR adds the ability to use the CUDA Graph API to greatly reduce the overhead of kernel launching. This is done through the stream capture API; an example follows. Thanks to @leofang!
import cupy as cp

a = cp.random.randint(0, 10, 100, dtype=cp.int32)
s = cp.cuda.Stream(non_blocking=True)
with s:
    s.begin_capture()
    a += 3
    a = cp.abs(a)
    g = s.end_capture()  # work is queued, but not yet launched
g.launch()
s.synchronize()
`__device__` function in CuPy JIT (#6265)

The new interface `cupyx.jit.rawkernel(device=True)` is supported to define a CUDA device function.
from cupyx import jit

@jit.rawkernel(device=True)
def getitem(x, tid):
    return x[tid]

@jit.rawkernel()
def elementwise_copy(x, y):
    tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
    y[tid] = getitem(x, tid)
The following CUDA code is generated from the above Python code:
__device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
  return x[tid];
}

extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
  unsigned int tid;
  tid = (threadIdx.x + (blockDim.x * blockIdx.x));
  y[tid] = getitem_1(x, tid);
}
- `cupy.asfarray` (#6085)
- `cupy.trapz` (#6107)
- `cupy.array_api.linalg` (#6131)
- `cupy.mask_indices` (#6156)
- `cupy.array_equiv` API (#6254)
- `cupy.cublas.syrk` and `cupy.cublas.sbmv` (#6278)
- `cupy.vander` API (#6279)
- `cupy.ediff1d` API (#6280)
- `cupy.fabs` API (#6282)
- `cupyx.scipy.fft` (#6288)
- `logit`, `expit` and `log_expit` to `cupyx.scipy.special` (#6300)
- `xlogy` and `xlog1py` to `cupyx.scipy.special` (#6301)
- `tril_indices` and `tril_indices_from` API (#6305)
- `cupy.format_float_positional` (#6308)
- `cupy.row_stack` API (#6312)
- `triu_indices` and `triu_indices_from` API (#6316)
- `cupy.array_api` (#6086)
- `cupy.vectorize` (#6170)
- `matmul` support ufunc kwargs (#6195)
- `None` and `Ellipsis` (#6222)
- `__device__` function (#6265)
- `__eq__` (#6287)
- `cupy.linalg.qr` to align with NumPy 1.22 (#6225)
- `percentile` and `quantile` to support NumPy 1.22 (#6228)
- `__all__` in `cupyx.scipy.fft` (#6071)
- `__getitem__` on Ellipsis and advanced indexing dimension (#6081)
- `copyto` (#6121)
- `solve` (#6167)
- `flip` ()-shaped array (#6169)
- `logaddexp` and `logaddexp2` (#6172)
- `cupy.eye()` (#6208)
- `linalg.eigh` and `linalg.eigvalsh` on empty inputs (#6210)
- `out` in `matmul` and (tensor)`dot` (#6216)
- `compile_with_cache` returning None (#6232)
- `array_api` namespace (#6289)
- `__all__` from `cupyx/scipy/*` (#6149)
- `from os import path` (#6152)
- `cp.linalg.solve()` implementation (#6161)
- `kernel_version` from comparison table (#6072)
- `cupy.trapz` docstring (#6239)
- `eigsh` doc (#6266)
- `cupy.positive` in API Reference (#6274)
- `distutils` with `setuptools` in Windows `cl.exe` detection (#6025)
- `testing.multi_gpu` to add pytest marker (#6015)
- `multi_gpu` annotation in tests (#6098)
- `convolve2d` (#6171)
- `setuptools<60` in Windows CI (#6260)
- `tril_indices` test (#6322)
- `get_include` instead of `array_equiv` for fallback test (#6333)
- `cuda-slow` test in FlexCI (#6335)

The CuPy Team would like to thank all those who contributed to this release!
@akochepasov @amanchhaparia @asi1024 @ColmTalbot @emcastillo @eternalphane @grlee77 @haesleinhuepf @khushi-411 @kmaehashi @leofang @okuta @ptim0626 @SauravMaheshkar @shwina @takagi @thomasjpfan @tom24d @toslunar @twmht @WiseroOrb @Yutaro-Sanada
Published by kmaehashi almost 3 years ago
This is the release note of v10.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since v10.0.0rc1 release. Check out our blog for highlights in the v10 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
The support for advanced indexing using boolean masks has been completed in CuPy v10.
Now it is possible to index arrays using combinations of `Ellipsis`, boolean flags, and regular indexes, such as `a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]]` and `a[..., [[False, True]]]`.
`cupy.vectorize` (#6217)

A long-awaited feature to ensure compatibility with NumPy `vectorize` has been implemented. In this release, it is now possible to transpile lambda functions. This is especially handy when using JIT in conjunction with `cupy.vectorize`:
import cupy
a = cupy.array([0.4, -0.2, 1.8, -1.2])
relu = cupy.vectorize(lambda x: (x > 0.0) * x)
print(relu(a)) # [ 0.4 -0. 1.8 -0. ]
As per the RFC in #5717 and Twitter, the minimum CUDA version that is supported by CuPy v10 is CUDA 10.2.
The minimum supported version for CuPy v10 is NCCL 2.8 as it implements the required primitives for cupyx.distributed
to work.
Following the Python 3.6 end-of-life in December 2021, and in line with NumPy's supported versions, Python 3.6 is no longer supported starting with CuPy v10.
As per NEP29, NumPy 1.17 support has been dropped on July 26, 2021.
- `cupy.array_api.linalg` (#6199)
- `cupy.array_api` (#6105)
- `cupy.vectorize` (#6217)
- `__all__` in `cupyx.scipy.fft` (#6083)
- `__getitem__` on Ellipsis and advanced indexing dimension (#6113)
- `copyto` (#6155)
- `logaddexp` and `logaddexp2` (#6176)
- `solve` (#6183)
- `from os import path` (#6165)
- `kernel_version` from comparison table (#6090)
- `LLVM_PATH` note on document (#6101)
- `linkcode` implementation (#6206)
- `distutils` with `setuptools` in Windows `cl.exe` detection (#6138)
- `testing.multi_gpu` to add pytest marker (#6096)
- `multi_gpu` annotation in tests (#6100)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar @twmht @Yutaro-Sanada
Published by emcastillo almost 3 years ago
This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
This is expected to be the last release of the CuPy v9 series. Please start trying your workflow with CuPy v10.0.0rc1 and let us know if you have any feedback!
Wheels for CUDA 11.5 (`cupy-cuda115`) are now available.

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
- `show_config` runnable without GPU (#5839)
- `cupy.random.shuffle` (#5887)
- `ndarray.clip` to match numpy (#5916)
- `__repr__` of mode and scalar in cuTENSOR (#5917)
- `blocksize` used in `cupyx.optimizing.optimize` for HIP (#5931)
- `ravel` for strides 0 (#5998)
- `cholesky` (#5960)
- `CUPY_ACCELERATORS` (#5975)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar
Published by emcastillo almost 3 years ago
This is the release note of v10.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
`cupyx.distributed` (#5590)

This new module provides a wrapper over NVIDIA's NCCL library to perform communication in an MPI-like style. Currently, point-to-point and collective communication primitives are supported. Check the documentation for a complete reference of the functions.
Wheels for CUDA 11.5 (`cupy-cuda115`) are now available.

Python 3.10 wheels are also available for all supported CUDA / ROCm versions.

Wheels for Jetson can be found in the attached artifacts (pip install cupy-cuda112 -f https://pip.cupy.dev/pre).
`Generator` random API in ROCm 4.3 (#5895)

ROCm 4.3 fixes a series of issues that prevented the `Generator` random API (#4177) from running on AMD devices.
Refer to the Upgrade Guide for the detailed description.
Peer access is enabled by default when a CuPy ndarray is stored in a different device as long as the machine topology allows it.
`Device.use()` semantics to align with `Stream.use()` (#5853)

When exiting a context, the current device is now reverted to the device of the parent context scope, not the device last `use()`d.
`numpy.ndarray` to little-endian in `cupy.array()` and its variants (#5828)

Previously, CuPy copied the given `numpy.ndarray` to the GPU as-is, regardless of its endianness. In CuPy v10, big-endian arrays are converted to little-endian before the transfer, as that is the native byte order on GPUs. This change eliminates the need to manually change the array's endianness before creating a CuPy array.
`cupyx.profiler` module (#5940)

A new module `cupyx.profiler` has been added to host all profiling-related APIs in CuPy. Accordingly, the following APIs are relocated to this module:

- `cupy.prof.TimeRangeDecorator()` -> `cupyx.profiler.time_range()`
- `cupy.prof.time_range()` -> `cupyx.profiler.time_range()`
- `cupy.cuda.profile()` -> `cupyx.profiler.profile()`
- `cupyx.time.repeat()` -> `cupyx.profiler.benchmark()`
The old routines are deprecated.
`cupy.cuda.compile_with_cache` (#5858)

The internal API `cupy.cuda.compile_with_cache()` has been marked as deprecated, as there are better alternatives (`RawModule`, `RawKernel`). While it has a long-standing history, this API has never been meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs.
As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.
The minimum supported version for CuPy v10 will be NCCL 2.8, as it implements the primitives required for `cupyx.distributed` to work.
Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.
As per NEP 29, NumPy 1.17 support was dropped on July 26, 2021.
As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., `pip install cupy-cudaXXX -f https://pip.cupy.dev/pre`). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
We are planning to drop cuSPARSELt v0.1.0 support in CuPy v10 final release. (#6045)
- `cupyx.distributed` (#5590)
- `cupy.positive()` (#5774)
- `cupy.array_api` (#5783)
- `cupy.array_api` typing (#5821)
- `trim_mean` from scipy.stats to cupyx (#5900)
- `dtype` and `casting` arguments to `cupy.concatenate()` (#5759)
- `cupy.array()` and its variants (#5828)
- `connected_components` (#5830)
- `show_config` runnable without GPU (#5835)
- `NotImplementedError` clarity (#5841)
- `Device.use()` semantics to align with `Stream.use()` (#5853)
- `cupy.cuda.compile_with_cache` (#5858)
- `cupy.array_api` with Python 3.7 (#5873)
- `bitorder` option to `cupy.packbits` (#5898)
- `LLVM_PATH` issue in hipRTC (#5933)
- `cupyx.profiler` module (#5940)
- `index_t` for faster address calculation (#5981)
- `cudaRuntimeGetVersion` instead of `CUDA_VERSION` for CUDA Python support (#5723)
- `int` (#5785)
- `cupy.random.shuffle` (#5838)
- `driver.get_build_version` (#5861)
- `nan_to_num` to comply with NumPy API (#5870)
- `_get_cuda_build_version` for ROCm (#5888)
- `__repr__` of mode and scalar in cuTENSOR (#5901)
- `setDevice` succeed (#5904)
- `ndarray.clip` to match numpy (#5910)
- `copyto` with non-contiguous multidevice (#5913)
- `setDevice` in CuPy codebase (#5915)
- `blocksize` used in `cupyx.optimizing.optimize` for HIP (#5921)
- `with device` in code base (#5963)
- `__dlpack__` protocol (#5970)
- `cupyx.tools.install_library` for windows (#5977)
- `ravel` for strides 0 (#5978)
- `with` context for streams (#5985)
- `correlate/convolve` (#6046)
- `cupy.array()` (#5842)
- `cupy.array()` (#5844)
- `cupyx.scipy.ndimage.interpolation.map_coordinates` (#5845)
- `addAddNameExpression` with `addNameExpression` in NVRTC binding (#5938)
- `_loops` (#5967)
- `CUPY_DLPACK_EXPORT_VERSION` consistent (#5982)
- `setDaemon` method (#6059)
- `driver.get_build_version` (#5860)
- `ppc64le` and `aarch64` are supported on conda-forge (#5865)
- `compile_with_cache()` in upgrade guide (#5883)
- `scipy.sparse.csgraph` module (#5903)
- `cupy.linalg.cholesky` (#5941)
- `CUPY_ACCELERATORS` (#5948)
- `np.matrix` in the difference section (#5966)
- `RawKernel` example to docs (#5973)
- `sphinx-copybutton` (#5976)
- `TestJoin` (#5764)
- `LLVM_PATH` (#5849)
- `cupy_builder` (#5856)
- `array-api-tests` in FlexCI (#5862)
- `pylibcugraph` (#5874)
- `ccache` path to support CentOS (#5882)
- `trim_mean` test (#5944)
- `ccache` in Pre-review Test (#6027)

The CuPy Team would like to thank all those who contributed to this release!
@Anubha13kumari @SaharCarmel @SwastikTripathi @amathews-amd @anaruse @asi1024 @carterbox @drbeh @emcastillo @iskode @kmaehashi @lanttu1243 @leofang @okuta @prkhrsrvstv1 @spiralray @takagi @toslunar
Published by kmaehashi about 3 years ago
This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., `pip install cupy-cudaXXX -f https://pip.cupy.dev/pre`). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
- `cupyx.optimize` to save file when no optimization ran (#5760)
- `stdexcept` in hip headers (#5777)
- `MAX_NDIM` and add guards/tests (#5798)
- `--pre` from ROCm source build instructions (#5782)
- `setup.py` (#5758)
- `test_eigenvalue.py` (#5643)
- `TestSplineFilter1dLargeArray` (#5694)
- `unittest.TestCase` for performance (#5710)
- `TestSplineFilter1dLargeArray` marked slow and reduced memory usage (#5729)

The CuPy Team would like to thank all those who contributed to this release!
@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar
Published by kmaehashi about 3 years ago
This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.
As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.
Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.
As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., `pip install cupy-cudaXXX -f https://pip.cupy.dev/pre`). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
- `numpy.array_api` module as `cupy.array_api` (#5698)
- `cupyx.optimize` to save file when no optimization ran (#5757)
- `bitorder` support to `cupy.unpackbits` (#5765)
- `MAX_NDIM` and add guards/tests (#5749)
- `stdexcept` in hip headers (#5769)
- `compile_time_env` with `CUPY_` (#5740)
- `--pre` from ROCm source build instructions (#5773)
- `setup.py` (#5745)
- `cupy_setup_options` (#5754)
- `setup.py` (#5756)
- `unittest.TestCase` for performance (#5599)
- `TestSplineFilter1dLargeArray` (#5693)
- `TestSplineFilter1dLargeArray` marked slow and reduced memory usage (#5724)

The CuPy Team would like to thank all those who contributed to this release!
@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar