Published by emcastillo about 3 years ago
This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
CuPy is one of the first libraries providing support for the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy, as described in the documentation.
Support for ROCm 4.3 has been added in the latest release and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.
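As a rough illustration of the install command above, the following sketch assembles the pip arguments for a given wheel name and release tag. The helper name prerelease_install_args is hypothetical and not part of CuPy; only the -f flag and the release-page URL pattern are taken from the note, and cupy-cuda114 is just an example package name.

```python
import sys

# URL pattern taken from the example in the release note.
RELEASE_PAGE = "https://github.com/cupy/cupy/releases/tag/{tag}"


def prerelease_install_args(package, tag):
    """Build a pip command line (hypothetical helper, not part of CuPy)."""
    return [
        sys.executable, "-m", "pip", "install", package,
        "-f", RELEASE_PAGE.format(tag=tag),
    ]


args = prerelease_install_args("cupy-cuda114", "v10.0.0b2")
print(" ".join(args[1:]))
```

The resulting list can be handed to subprocess.run(args, check=True) when the installation should be scripted.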
We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.
cupyx.scipy.sparse.linalg.minres (#5585)
cupy.random.Generator (#5618)
cupy.random.Generator (#5624)
cupy.random.Generator (#5645)
cupy.random.Generator (#5648)
cupy.random.Generator (#5655)
ncclAvg and ncclBfloat16 for NCCL (#5545)
rocSOLVER (#5555)
beta distribution of cupy.random.Generator (#5573)
unique for empty array (#5654)
batch_identity helper (#5614)
__array_function__ feature by default (#5644)
skipTest in test_decomp_lu (#5593)
lsmr tests xfail for CSR matrices on HIP (#5597)
for_all_dtypes_combination tests (#5629)
cupy-cuda114 to duplicate detection (#5621)
The CuPy Team would like to thank all those who contributed to this release!
@hauntsaninja @leofang @povinsahu1909 @yashasvimisra2798
Published by kmaehashi about 3 years ago
This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.
Changes NVRTC compile process to produce SASS (CUBIN files) instead of PTX so that kernels compiled with a new CUDA Toolkit version can be run with earlier CUDA Drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information. We believe most users will not be affected by this change, but you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable just in case.
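For scripts that cannot rely on the shell environment, the variable can also be set from Python. This is a minimal sketch; the helper name request_ptx_compilation is hypothetical, and it assumes that setting the variable before CuPy is imported is early enough for it to take effect.

```python
import os


def request_ptx_compilation(env=None):
    """Set CUPY_COMPILE_WITH_PTX=1 in a mapping (defaults to os.environ)."""
    if env is None:
        env = os.environ
    env["CUPY_COMPILE_WITH_PTX"] = "1"
    return env


# Demonstrated on a plain dict so the real process environment is untouched.
# In a real script, call request_ptx_compilation() before `import cupy`
# (assumption: the variable is read no earlier than import time).
demo_env = request_ptx_compilation({})
print(demo_env)
```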
Support for ROCm 4.3 has been added in the latest release and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
ncclAvg and ncclBfloat16 for NCCL (#5656)
unique for empty array (#5658)
__array_function__ feature by default (#5653)
for_all_dtypes_combination tests (#5639)
skipTest in test_decomp_lu (#5672)
The CuPy Team would like to thank all those who contributed to this release!
@grlee77 @leofang @yashasvimisra2798
Published by kmaehashi about 3 years ago
This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.
(cupy-cuda114)
Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.
Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.
CuPy is participating in Google Summer of Code under the NumFOCUS organization.
Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.
Changes NVRTC compile process to produce SASS (CUBIN files) instead of PTX so that kernels compiled with a new CUDA Toolkit version can be run with earlier CUDA Drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information.
Following the adoption of the new DLPack exchange protocol proposed in the Python array API standard, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
_GUFunc through cupyx (#5408)
jit.gridsize() syntax in CuPy JIT (#5461)
jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
compute_86 (#5434)
setup.py (#5453)
matrix_power support stacked matrices (#5458)
cudaDeviceDisablePeerAccess wrapper (#5495)
atomic_add on HIP (#5383)
ndarray.view (#5428)
types attribute of ufunc (#5448)
texture_memory option in affine_transform not supported by HIP (#5464)
linalg.lstsq for empty matrix (#5467)
integers (#5479)
cudaMemoryType in the pointer attributes and fix for HIP (#5544)
cudnnGetVersion on import (#5326)
__call__() for built-in functions (#5361)
use_32bit_indexing from CArray (#5376)
dtype.name instead dtype.char (#5444)
-I in hipRTC (#5486)
__HIP_PLATFORM_AMD__ at build time (#5554)
apply_along_axis (#5432)
Returns section (#5433)
user_guide/basic.rst device agnostic section (#5435)
from_dlpack instead of fromDlpack (#5488)
fromDlpack() (#5509)
scipy.fft backend usage (#5514)
envvar construct (#5570)
license_file option in setup.cfg (#5406)
numpy<1.21 (#5384)
all requirements (#5577)
hip label (#5538)
pull_request_target instead for auto notify bot (#5541)
numpy.unwrap for NumPy 1.21 (#5385)
medfilt for scipy>=1.7.0 (#5386)
numpy.typeDict utilization (#5388)
The CuPy Team would like to thank all those who contributed to this release!
@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay
Published by kmaehashi about 3 years ago
This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.
(cupy-cuda114)
Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.
Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.
cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
setup.py (#5471)
compute_86 (#5519)
matrix_power support stacked matrices (#5525)
atomic_add on HIP (#5405)
ndarray.view (#5442)
types attribute of ufunc (#5455)
integers (#5484)
linalg.lstsq for empty matrix (#5506)
_setStream() (#5507)
cudaMemoryType in the pointer attributes and fix for HIP (#5571)
use_32bit_indexing from CArray (#5414)
__call__() for built-in functions (#5422)
cudnnGetVersion on import (#5446)
-I in hipRTC (#5502)
__HIP_PLATFORM_AMD__ at build time (#5565)
apply_along_axis (#5441)
Returns section (#5452)
user_guide/basic.rst device agnostic section (#5456)
eigh and eigvalsh (#5499)
scipy.fft backend usage (#5532)
envvar construct (#5586)
license_file option in setup.cfg (#5411)
numpy<1.21 (#5412)
unwrap tests for v9 (#5426)
all requirements (#5582)
hip label (#5540)
pull_request_target instead for auto notify bot (#5542)
numpy.typeDict utilization (#5403)
The CuPy Team would like to thank all those who contributed to this release!
@12rambau @leofang @maxim-belkin @Palash-Vishnani
Published by asi1024 over 3 years ago
This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.
(cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2) and binary wheels are now available on PyPI.
cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
cupy.show_config() (#5285)
MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
check_availablity for cupy.cusolver (#5336)
CUDAArray (#5375)
cdef instead of cpdef where appropriate (#5274)
matmul docstring (#5281)
ExternalStream (#5312)
user_guide/basic.rst: various improvements (#5356)
setup_requires (#5273)
The CuPy Team would like to thank all those who contributed to this release!
@leofang @maxim-belkin
Published by asi1024 over 3 years ago
This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.
(cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2) and binary wheels are now available on PyPI.
len, min, max Python built-ins.
len(arr): Equivalent to arr.shape[0].
min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
.ndim, .size attributes of ndarray.
(x, y), z = ...
jit.grid() API, similar to numba.cuda.grid.
x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x.)
cupyx.jit.shfl_down_sync(mask, var, val_id) (__shfl_down_sync(mask, var, val_id))
cupyx.scipy.sparse.{coo,csr,csc}_matrix now provides the reshape method.
CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.
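The index formula quoted for jit.grid() above (x = threadIdx.x + blockIdx.x * blockDim.x) can be checked on the CPU. The sketch below is a pure-Python emulation of a 1-D launch, not a call into cupyx.jit; the function name emulate_grid_1d and the launch dimensions are made up for illustration.

```python
def emulate_grid_1d(grid_dim, block_dim):
    """Yield (blockIdx.x, threadIdx.x, global x) for a 1-D launch (CPU emulation)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same expression as the one given for jit.grid() in the note.
            yield block_idx, thread_idx, thread_idx + block_idx * block_dim


# 4 blocks of 8 threads: each thread gets a distinct global index.
indices = [x for _, _, x in emulate_grid_1d(grid_dim=4, block_dim=8)]
print(indices[:10])
```

Every thread of the emulated launch receives a distinct index, and together the indices cover 0 through grid_dim * block_dim - 1, which is why the expression is commonly used to assign one array element per thread.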
The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call cudaStreamDestroy) if the stream is the current stream of any thread.
cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
len, min, max, .ndim, .size in jit (#5319)
jit.grid() syntax in CuPy JIT (#5334)
cupy.show_config() (#5073)
nan, posinf, neginf in cupy.nan_to_num (#5295)
cupy.einsum (#5203)
check_availablity for cupy.cusolver (#5207)
MemoryAsync to keep a weakref to stream (#5264)
sm_61 etc (#5304)
CUDAArray (#5342)
cdef instead of cpdef where appropriate (#5274)
matmul docstring (#5174)
ExternalStream (#5305)
user_guide/basic.rst: various improvements (#5356)
setup_requires (#5273)
TestAllocator (#5308)
MemoryPoolAsync tests (#5350)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909
Published by emcastillo over 3 years ago
This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.
In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.
CuPy now automatically manages the stream switching when changing a device, so the user is not responsible for changing the stream anymore.
This pull request also includes a bug fix for #5143. Existing code that mixes with stream: blocks and stream.use() may get different results, because the stream set via the use() API is no longer reactivated when exiting a stream context.
import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()

with s1:
    s2.use()
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.
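The behavior described above can be modeled without a GPU. The following is a toy sketch, not CuPy's implementation: ToyStream is a hypothetical stand-in for cupy.cuda.Stream, and it assumes that leaving a with block reactivates the stream of the enclosing with block, ignoring any use() call made in between. None stands in for the default stream once no with block is active.

```python
class ToyStream:
    """Toy stand-in for a stream; NOT cupy.cuda.Stream."""

    _current = None   # stream most recently made current
    _with_stack = []  # streams entered through `with`, innermost last

    def __init__(self, name):
        self.name = name

    def use(self):
        # Makes this stream current without touching the `with` stack.
        ToyStream._current = self
        return self

    def __enter__(self):
        ToyStream._with_stack.append(self)
        ToyStream._current = self
        return self

    def __exit__(self, *exc_info):
        stack = ToyStream._with_stack
        stack.pop()
        # Reactivate the enclosing `with` stream, not the one set via use().
        ToyStream._current = stack[-1] if stack else None
        return False


s1, s2, s3 = ToyStream("s1"), ToyStream("s2"), ToyStream("s3")
with s1:
    s2.use()
    with s3:
        pass
    inner = ToyStream._current.name  # "s1", matching the v10 behavior above
after = ToyStream._current
print(inner, after)
```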
cupy.cuda.Device context manager interface thread safe (#5083)
Using a single cupy.cuda.Device context manager object from multiple threads has led to incorrect behavior when restoring the previous device since the first versions of CuPy. The correct device is now restored, so user code relying on the incorrect behavior might need to be updated.
cupyx.allow_synchronize and cupyx.DeviceSynchronized APIs (#5226)
These APIs, used for detecting when synchronization with a device was happening, have been deprecated because they do not provide reliable behavior.
Note: many of these PRs are backported to the v9 series and available since the release.
MemoryAsyncPool to support malloc_async (#4592)
random for uniform [0, 1) generation (#4906)
poisson distribution to random API (#4927)
hfft2, ihfft2, hfftn, and ihfftn to cupyx.scipy.fft (#4996)
cupyx.jit.atomic_add (#5169)
MemoryAsyncPool statistics and limits (#5177)
cupy_backends.cuda.libs (#4930)
cupyx.jit.rawkernel as experimental (#5005)
-ftz=true (#5007)
show_config (#5054)
cupy.cuda.Device context manager interface thread safe (#5083)
out to cupy.asnumpy() (#5155)
sum_labels to cupyx.scipy.ndimage.measure (#5200)
show_config (#5215)
syncdetect APIs (#5226)
THRUST_OPTIONAL_CPP11_CONSTEXPR (#5002)
ndarray.copy (#5004)
lanes (#5045)
poisson to support lam array (#5087)
svds (#5140)
cupyx.scipy.sparse.linalg.spsolve (#5168)
scatter_add failure on Windows (#5173)
MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5191)
matmul for input with relaxed strides (#5205)
check_availability for cuTensor routines (#5206)
constexpr (#5233)
cupy.random.Generator.integers (#5247)
cupy.core submodule to cupy._core (#3820)
cpdef functions to cdef in _kernel.pyx (#5084)
cupy.cupy (#5121)
cupyx.time.repeat (#5015)
cupy.cuda.runtime.getDeviceProperties (#5016)
CFunctionAllocator and ManagedMemory (#5025)
scatter_add (#5129)
coo.py (#5139)
stream.pyx (#5144)
cupyx.scipy.ndimage.sum_labels to docs (#5223)
cutensor import in the test (#4965)
install_tests runnable without depending on current path (#4969)
pip install -e on Windows CI for performance (#4970)
TestStream cleanup (#5042)
testing.slow (#5061)
hfftn for HIP/ROCm (#5099)
cupyx.jit.atomic_add test (#5186)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce
Published by emcastillo over 3 years ago
This is the release note of v9.1.0. See here for the complete list of solved issues and merged PRs.
In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.
cupy.cuda.Device context manager interface thread safe (#5083)
Using a single cupy.cuda.Device context manager object from multiple threads has led to incorrect behavior when restoring the previous device since the first versions of CuPy. The correct device is now restored, so user code relying on the incorrect behavior might need to be updated.
cupyx.jit.atomic_add (#5181)
getsource option in CuPy JIT (#5089)
cupy.cuda.Device context manager interface thread safe (#5147)
sum_labels to cupyx.scipy.ndimage.measure (#5222)
show_config (#5230)
lanes (#5094)
svds (#5161)
scatter_add failure on Windows (#5178)
cupyx.scipy.sparse.linalg.spsolve (#5180)
poisson to support lam array (#5182)
matmul for input with relaxed strides (#5240)
check_availability for cuTensor routines (#5244)
constexpr (#5250)
cupy.random.Generator.integers (#5261)
cupy.cupy (#5137)
coo.py (#5141)
stream.pyx (#5150)
cupyx.scipy.ndimage.sum_labels to docs (#5245)
cupyx.jit.atomic_add test (#5187)
pip install -e on Windows CI for performance (#5242)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @leofang
Published by kmaehashi over 3 years ago
This is the release note of v9.0.0.
This release note only covers the changes since v9.0.0rc1 release. Read the blog for the details of new features introduced in CuPy v9!
CuPy now integrates the Python binding for the cuSPARSELt library that accelerates sparse matrix multiplications on NVIDIA Ampere GPUs. We are planning to start using it in CuPy sparse APIs to transparently improve performance.
cupyx.scipy.sparse.csgraph is added to the API with support for the connected_components method. cuGraph support is optional; it can be enabled by installing CuPy through conda-forge or by building CuPy manually. Currently, PyPI wheels do not have built-in support for cuGraph.
MemoryAsyncPool to support malloc_async (#5034)
By using cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc), it is now possible to use the stream-ordered memory allocations introduced in CUDA 11.2.
By using cupyx.empty_pinned(), cupyx.empty_like_pinned(), cupyx.zeros_pinned(), and cupyx.zeros_like_pinned(), it is possible to obtain NumPy ndarrays with their storage located in pinned memory to improve the performance of data movement.
In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.
See here for the complete list of solved issues and merged PRs after the v9.0.0rc1 release. For all changes in the v9 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, rc1).
random for uniform [0, 1) generation (#5003)
MemoryAsyncPool to support malloc_async (#5034)
connected_components (#5113)
cupy_backends.cuda.libs (#5014)
-ftz=true (#5035)
cupyx.jit.rawkernel as experimental (#5057)
show_config (#5065)
ndarray.copy (#5078)
cupy.core submodule to cupy._core (#4987)
cpdef functions to cdef in _kernel.pyx (#5098)
cupyx.time.repeat (#5027)
cupy.cuda.runtime.getDeviceProperties (#5029)
CFunctionAllocator and ManagedMemory (#5059)
cutensor import in the test (#4981)
TestStream cleanup (#5052)
testing.slow (#5063)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @leofang @povinsahu1909
Published by asi1024 over 3 years ago
This is the release note of v9.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are planning to release the final v9.0.0 on April 22nd. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.
Creating raw kernels from Python functions is now possible thanks to the introduction of the @cupyx.jit.rawkernel decorator.
import cupy
import numpy
from cupyx import jit

@jit.rawkernel()
def f(x, y, z, n):
    # Grid-stride loop: each thread starts at its global index and
    # advances by the total number of launched threads.
    tid = jit.threadIdx.x + jit.blockIdx.x * jit.blockDim.x
    ntid = jit.blockDim.x * jit.gridDim.x
    for i in range(tid, n, ntid):
        z[i] = x[i] + y[i]

n = numpy.uint32(1024)
x = cupy.arange(n)
y = cupy.arange(n)
z = cupy.empty((n,), dtype='l')
f[16, 16](x, y, z, n)
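To see why the loop in the kernel above touches every element exactly once, the same indexing can be replayed on the CPU. This is a pure-Python emulation rather than a CuPy call; it assumes that f[16, 16] launches 16 blocks of 16 threads, and emulate_grid_stride_add is a hypothetical name.

```python
def emulate_grid_stride_add(x, y, grid_dim, block_dim):
    """Replay the kernel's grid-stride loop sequentially on the CPU."""
    n = len(x)
    z = [None] * n
    writes = [0] * n  # how many times each output slot is written
    ntid = block_dim * grid_dim  # total number of emulated threads
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            tid = thread_idx + block_idx * block_dim
            for i in range(tid, n, ntid):
                z[i] = x[i] + y[i]
                writes[i] += 1
    return z, writes


x = list(range(1024))
y = list(range(1024))
z, writes = emulate_grid_stride_add(x, y, grid_dim=16, block_dim=16)
print(z[:4], set(writes))
```

With 256 emulated threads and 1024 elements, each thread handles four elements spaced 256 apart, so no element is skipped or written twice.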
We have added an interface to support Generalized Universal Functions based on the one in Dask. Currently, it is used in matmul to ensure compatibility with NumPy's __array_ufunc__ dispatching.
cuTENSOR support is now enabled in wheel packages. To use cuTENSOR features, you will need to install the shared library using python -m cupyx.tools.install_library --cuda 11.2 --library cutensor after installing wheels.
Following NumPy, we have adopted the pydata_sphinx_theme in our documentation site starting from this release.
In the meantime they can be downloaded from the Assets section below. See #4971 for the detailed instructions.
cupy.cuda.nccl is hidden by default (#4919)
The NCCL wrapper is no longer imported in cupy/cuda/__init__.py, so it must be explicitly imported from cupy.cuda.nccl.
NCCL and cuDNN shared libraries are no longer bundled in all wheels. To activate features using NCCL / cuDNN in CuPy v9, you will need to install these libraries using the python -m cupyx.tools.install_library tool after installing CuPy wheels. See the Installation Guide for details.
By eliminating the default bundling of cuDNN & NCCL we have achieved further reductions in the wheel size averaging 5x.
cupy.bool, cupy.int, cupy.float and cupy.complex (#4790)
Following the NumPy 1.20 API, these aliases for the Python scalar types have been deprecated. cupy.bool_, cupy.int_, cupy.float_ and cupy.complex_ should be used instead when required.
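One way to locate and rename the deprecated aliases in existing source text is a small regular-expression pass. This is only a sketch: migrate_aliases is a hypothetical helper, it assumes the module is referenced by its full name cupy (not an alias such as cp), and it rewrites matches inside strings and comments as well, so the result should be reviewed by hand.

```python
import re

# Matches cupy.bool / cupy.int / cupy.float / cupy.complex as whole names only;
# the trailing \b keeps cupy.int32, cupy.float64 and cupy.bool_ untouched.
_DEPRECATED_ALIAS = re.compile(r"\bcupy\.(bool|int|float|complex)\b")


def migrate_aliases(source):
    """Append an underscore to each deprecated alias (hypothetical helper)."""
    return _DEPRECATED_ALIAS.sub(lambda m: m.group(0) + "_", source)


before = "a = cupy.float(1); b = cupy.int32(2); c = cupy.bool_(0)"
after = migrate_aliases(before)
print(after)
```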
The official Docker image is now updated to use CUDA 11.2 and Python 3.8.
cupyx.scipy.sparse.linalg.lobpcg (#4281)
pinv (#4686)
cupy.random.Generator.standard_normal (#4885)
__syncthreads() in CuPy JIT (#4941)
nvrtcGetSupportedArchs (#4691)
norm='forward'/'backward' in cupy.fft functions (#4797)
norm='forward'/'backward' in cupyx.scipy.fft functions (#4816)
cupy.ndarray type shift in cupy.roll (#4884)
--threads option when building CuPy (#4908)
lib directory to support cuTENSOR/NCCL (#4912)
cupy_backends.cuda.libs (#4919)
cupy/cuda/cutensor.py (#4920)
cupy.linalg (#4363)
histogram test failures (#4777)
randint with NumPy (#4808)
UnboundLocalError on copy_from_host_async (#4900)
out arg verifier in new random interface. (#4904)
_SimpleReductionKernel (#4909)
CUDAarray tests (#4946)
CArray._indexing() only in CuPy JIT mode (#4951)
cupy.testing package (#3868)
cupy-cuda112 support from documentation (#4761)
cupy-cuda112 support from documentation" (#4785)
cupyx.scipy.statistics.correlation under ROCm/HIP (#4781)
sepfir2d tests (#4804)
test_poly1d_pow_scalar (#4854)
TestPolyArithmeticDiffTypes under HIP/ROCm (#4657)
TestPolyfitParametersCombinations when deg == 0 under ROCm/HIP (#4758)
TestPolyfitCovMode when deg == 0 under ROCm/HIP (#4759)
TestInvh under ROCm/HIP (#4760)
HCC_AMDGPU_TARGET at runtime (#4766)
MT19937 not implemented in hipRAND (#4769)
CODEOWNERS file (#4757)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @aryamccarthy @grlee77 @leofang @mattvend @povinsahu1909 @venkywonka @viantirreau @withshubh
Published by asi1024 over 3 years ago
This is the release note of v8.6.0. See here for the complete list of solved issues and merged PRs.
We expect this version to be the final release for v8.x series. Please start testing your workloads with the latest v9.x pre-release.
In the meantime they can be downloaded from the Assets section below. See #4971 for the detailed instructions.
linalg.pinv on empty matrices (#4783)
cupy.linalg (#4839)
csc/csr argmax/argmin (#4858)
cupy-cuda112 support from documentation (#4762)
cupy-cuda112 support from documentation" (#4786)
test_poly1d_pow_scalar (#4889)
CODEOWNERS file (#4788)
cupy.testing package (#4876)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @aryamccarthy @leofang @povinsahu1909 @withshubh
Published by emcastillo over 3 years ago
This is the release note of v8.5.0. See here for the complete list of solved issues and merged PRs.
When installing CuPy from the regular sdist package, Cython files are provided instead of .cpp ones, so an environment capable of running the latest Cython (0.29.22) is required.
cupy.fuse (#4639)
install_library (#4703)
cupyx.scipy.ndimage - Part 1 (#4642)
shape=None to mean shape=() (#4622)
poly1d return types for NumPy 1.20 (#4623)
linspace to NumPy 1.20 (Fix tests for incompatible behavior of NumPy 1.20 linspace) (#4625)
order=0 interpolation (#4570)
cupy.array from nested list of zero-dim ndarray (#4571)
normal and lognormal support array args (#4626)
uniform-based random distributions support array args (#4671)
math_constants.h (#4679)
constexpr in cupy_thrust.cu (#4732)
constexpr only for windows (#4740)
gesvdj_batched info array size (#4747)
numpy.bool (#4589)
numpy.complex (#4624)
cupy-cuda112 support from documentation (#4762)
pyproject.toml (#4748)
numpy.VisibleDeprecationWarning (#4581)
The CuPy Team would like to thank all those who contributed to this release!
@grlee77 @leofang @wphicks
Published by emcastillo over 3 years ago
This is the release note of v9.0.0b3. See here for the complete list of solved issues and merged PRs.
In addition, we have ensured compatibility with the newly released NumPy 1.20 and SciPy 1.6.
cupy.vectorize Development
The development of the CUDA JIT for CuPy is progressing steadily. We currently support regular if/while/for statements and constant declarations. Currently, the JIT is only used in cupy.vectorize, with almost complete support. Its uses will be extended in upcoming releases.
Starting in v9.0.0b3, we are providing cupy-rocm-4-0 binary packages (wheels) for ROCm 4.0. Check the installation guide for the details.
Support for ROCm/HIP is being addressed by fixing bugs and providing a stable CI environment in order to ensure a smooth development.
When installing CuPy from the regular sdist package, Cython files are provided instead of .cpp ones, so an environment capable of running the latest Cython (0.29.22) is required.
The current base Docker images have been updated from CUDA 10.2 to CUDA 11.2.
splu, spilu and factorized to cupyx.scipy.sparse.linalg (#4392)
SparseEfficiencyWarning and warn inefficient comparison (#4213)
cupy.fuse (#4492)
nvrtcGetCUBIN API (#4558)
install_library (#4680)
cupy.cublas under HIP (#4652)
cupy._indexing.generate under HIP (#4654)
TestPoly1dMathArithmetic under HIP/ROCm (#4656)
eigenvalue under HIP/ROCm (#4661)
convolve under HIP/ROCm (#4668)
linspace to NumPy 1.20 (#4604)
poly1d return types for NumPy 1.20 (#4611)
shape=None to mean shape=() (#4616)
has_sorted_indices of sparse array (#4564)
normal and lognormal support array args (#4615)
uniform-based random distributions support array args (#4638)
_get_arch (#4682)
constexpr in cupy_thrust.cu (#4730)
constexpr only for windows (#4735)
cupy.core._dtype.to_cuda_dtype whenever possible (#3853)
numpy.bool (#4586)
numpy.complex (#4620)
CodeBlock to generate human-readable code in JIT (#4633)
cupyx.scipy.linalg.lu_factor (#4561)
cupy-cuda112 support from documentation (#4761)
pyproject.toml (#4734)
numpy.VisibleDeprecationWarning (#4498)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @leofang @pentschev @wphicks
Published by emcastillo over 3 years ago
This is the release note of v8.4.0. See here for the complete list of solved issues and merged PRs.
As announced in #4360, we have removed pre-release wheels earlier than v6.0.0rc1 from PyPI. Those wheels can be found on the GitHub release page of each version, and can be installed by specifying the -f option:
pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1
kron (#4547)
polynomial.__eq__ (#4555)
--device-c for RDC compile (#4505)
cupy.concatenate typecheck for out with different dtype (#4528)
cupy.take from an empty array (#4542)
FutureWarning (#4510)
The CuPy Team would like to thank all those who contributed to this release!
@leofang @mor2code
Published by kmaehashi over 3 years ago
This is the release note of v9.0.0b2. See here for the complete list of solved issues and merged PRs.
This release adds preliminary support for the new random API introduced in NumPy 1.17.
Since our implementation is based on cuRAND, we currently support the following BitGenerator objects: XORWOW, MRG32k3a, and Philox4x3210. Notice that they are different from NumPy ones. The new random module is currently in development and only a few distributions are supported in this first release (#4557). Please check the documentation for further reference.
Several bugs have been fixed for AMD devices, and support for ROCm 3.9 has been added. Almost all of the CuPy core functionality is now checked to work with HIP/ROCm. However, there are still some issues that require support from AMD, such as using arrays with more than 2^32 elements in element-wise and reduction routines.
As announced in #4360, we have removed pre-release wheels earlier than v6.0.0rc1 from PyPI. Those wheels can be found on the GitHub release page of each version, and can be installed by specifying the -f option:
pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1
Generator (#4177)
cufftXtMakePlanMany and cufftXtExec (#4407)
cupy.ElementwiseKernel (#4433)
cupyx.scipy.signal.oaconvolve (#4468)
cupy.round (#4539)
cupyx.scipy.signal functions (#4525)
__format__ in ndarray (#4544)
kron (#4536)
polynomial.__eq__ (#4554)
cupy.argwhere and cupy.nonzero (#4367)
--device-c for RDC compile (#4470)
axis of cupy.gradient (#4523)
__cuda_array_interface__ and __array_ufunc__ on HIP (#4524)
cupy.concatenate typecheck for out with different dtype (#4527)
cupy.take from an empty array (#4530)
order=0 interpolation (#4552)
polynomial.__eq__ to numpy.bool_ (#4563)
cupyx.scipy.ndimage.zoom for outputs of size 1 (#4568)
cupy.array from nested list of zero-dim ndarray (#4569)
random.uniform may return high (#4509)
FutureWarning (#4226)
numpy_cupy_* decorators (#4403)
return statements (#4519)
accept_error was not raised from test code (#4566)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @coderforlife @grlee77 @leofang
Published by asi1024 almost 4 years ago
This is the release note of v8.3.0. See here for the complete list of solved issues and merged PRs.
cupy.random.bytes not working (#4323)
rcond arg of linalg.lstsq (#4408)
linalg.lstsq for complex types (#4426)
cupy.searchsorted on HIP (#4447)
solve_triangular (#4459)
cupy.lib (#4353)
Test (#4372)
scipy.fft module to the API comparison table (#4391)
cupy.random functions/methods (#4474)
extra_compile_args for each module (#4384)
linalg.lstsq (#4425)
[jenkins] requirement (#4473)
TestOrderFilter (#4480)
v8.3.0 (#4500)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @grlee77 @leofang
Published by asi1024 almost 4 years ago
This is the release note of v9.0.0b1. See here for the complete list of solved issues and merged PRs.
As announced in #4300, CuPy v9 no longer supports the following out-dated components:
(cupy-cuda90 will not be released for v9)
As announced in #4360, we are going to remove pre-release wheels earlier than v6.0.0rc1 from PyPI on 2021-01-28. Those wheels can be found on the GitHub release page of each version, and can be installed by specifying the -f option:
pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1
cupyx.scipy.signal.fftconvolve (#3828)
cupyx.scipy.sparse.linalg.gmres (#4236)
cupyx.scipy.sparse.linalg.interface that exposes LinearOperator Feature (#4258)
cupyx.scipy.sparse.csr_matrix.diagonal and cupyx.scipy.sparse.csr_matrix.setdiag (#4284)
cupy.ReductionKernel and @cupy.fuse (#4289)
cupyx.ndimage.measurements: add center_of_mass, histogram, labeled_comprehension (#4311)
cudaLaunchHostFunc (#4338)
cupyx.scipy.sparse.linalg.spsolve_triangular (#4356)
cupyx.scipy.ndimage.fourier_ellipsoid (#4361)
cupyx.scipy.stats.entropy (#4369)
cupy.quantile (#4370)
cupyx.scipy.sparse.linalg.spsolve (#4375)
tril, triu and find to cupyx.scipy.sparse (#4382)
cupy.interp (#4418)
LinearOperator to cg and gmres in cupyx.scipy.sparse.linalg (#4422)
CArray and CIndexer (#3683)
NotImplemented (#4198)
cupyx.scipy.ndimage.spline_filter (#4314)
poly1d (#4399)
ndimage: support all interpolation boundary modes (#4400)
ndimage: add grid_mode option to zoom (#4401)
LinearOperator to eigsh and svds in cupyx.scipy.sparse.linalg (#4428)
__cuda_array_interface__ for HIP (#4482)
scan_core in cupy/core/_routines_math.pyx (#4316)
scan in cupy/core/_routines_math.pyx (#4366)
scan (#4464)
rcond arg of linalg.lstsq (#4365)
linalg.lstsq for complex types (#4390)
cupy.searchsorted on HIP (#4437)
solve_triangular (#4452)
cupy.random.normal fix broadcasting of scale and loc arguments (#4457)
__cuda_array_interface__ for HIP (#4458)
cumsum and cumprod (#4460)
cupy.lib (#3713)
cupy_backends (#4088)
cupyx.scipy.sparse.linalg.eigsh (#4275)
cupy.linalg.inv (#4293)
Test (#4320)
testing.helper (#4442)
scipy.fft module to the API comparison table (#4278)
cupy.random functions/methods (#4319)
spsolve_triangular to the reference (#4438)
LinearOperator (#4441)
__module__ attr to parameterized class (#4239)
testing.numpy_cupy_allclose with per-dtype tolerance (#4269)
fft_tests: unittest -> pytest (#4287)
testing.helper to work without unittest (#4304)
[jenkins] requirement (#4325)
cupy.vectorize (#4341)
linalg.lstsq (#4388)
TestGmres (#4394)
TestLinearOperator (#4413)
TestLinearOperator (#4420)
RawKernel test (#4472)
TestEntropy: Requires scipy>=1.4.0 for axis parameter (#4481)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @carterbox @coderforlife @dmargala @grlee77 @leofang @mor2code @venkywonka @wphicks
Published by asi1024 almost 4 years ago
This is the release note of v8.2.0. See here for the complete list of solved issues and merged PRs.
cupy/_environment.py (#4329)
ndimage.measurements functions (#4204)
AssertFunctionIsCalled (#4253)
cupyx.linalg package (#4202)
assert statement instead of self.assert* methods (#4297)
__bytes__ (#4255)
numpy_cupy_equal for the case where both NumPy and CuPy raise errors (#4260)
linalg.norm (#4230)
The CuPy Team would like to thank all those who contributed to this release!
@grlee77 @leofang
Published by emcastillo almost 4 years ago
This is the release note of v9.0.0a2. See here for the complete list of solved issues and merged PRs.
Update (2020-12-02): Unfortunately, the Windows build of this release is not working. We have taken down Windows wheels from PyPI, but if you need one for reference purposes you can still download them from the Assets section below. We are working hard to resolve this issue towards the next v9.0.0b1 release.
cupy.vectorize & Initial CUDA JIT support
With this release, we are including a very early version of a Python-to-CUDA transpiler that allows users to write their own CUDA kernels in Python, similar to what Numba does. However, while Numba works on the bytecode and directly outputs PTX code using LLVM, our approach uses the Python AST to translate the source code directly to CUDA C and compile it with the NVIDIA toolchain, aiming for higher performance in the long run.
import cupy
def f(x, y):
    # This code will be compiled to a CUDA kernel by our JIT
    return x * x + y * y
x = cupy.linspace(0, 10, 6)
y = cupy.linspace(0, 20, 6)
func = cupy.vectorize(f)
out = func(x, y)
# out is [ 0. 20. 80. 180. 320. 500.]
The initial version provides limited support for primitive operators, and we will extend it in upcoming releases. Check out #4290 if you are interested.
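As a CPU-side cross-check of the cupy.vectorize example above, the same element-wise function can be evaluated in plain Python. This is only a sketch for verifying the expected values; the linspace helper below is a hypothetical stand-in for cupy.linspace and is not part of CuPy.

```python
def linspace(start, stop, num):
    """Hypothetical stdlib stand-in for cupy.linspace (endpoint included)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def f(x, y):
    # Same scalar function that the release note passes to cupy.vectorize.
    return x * x + y * y


x = linspace(0, 10, 6)   # 0, 2, 4, 6, 8, 10
y = linspace(0, 20, 6)   # 0, 4, 8, 12, 16, 20

# Apply f element by element, which is what a vectorized call does.
out = [f(xi, yi) for xi, yi in zip(x, y)]
print(out)  # [0.0, 20.0, 80.0, 180.0, 320.0, 500.0]
```

The values agree with the output quoted in the example, so the list comprehension can serve as a reference when testing a vectorized kernel on small inputs.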
Thanks to @leofang, it is now possible to use headers and libraries in RawKernel or RawModule that previously could not be used because of the reliance on NVRTC. With the new jitify=True option, Jitify is applied to your code so that you can use libraries such as the cuRAND device API or CUB device routines in your raw kernels.
cupyx.lapack now as a public interface to cuBLAS
Until now, cuBLAS & cuSOLVER bindings were not publicly exposed in the API. However, with the introduction of cupyx.lapack by @anaruse, it is now possible to use LAPACK-compatible routines backed by cuBLAS & cuSOLVER with a much simpler interface.
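For readers unfamiliar with LAPACK naming, gesv (which appears in the change list of this release) denotes solving a general dense linear system A x = b. The following is a minimal pure-Python sketch of that problem using Gaussian elimination with partial pivoting. It is a CPU illustration only: gesv_reference is a hypothetical name, and the code does not show how cupyx.lapack or cuSOLVER implement the routine.

```python
def gesv_reference(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Hypothetical CPU reference for the problem a LAPACK-style gesv
    routine solves; this is not CuPy code.
    """
    n = len(a)
    # Work on an augmented copy so the caller's inputs stay untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the diagonal in this column.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


# 2x + y = 3 and x + 3y = 5 have the solution x = 0.8, y = 1.4.
solution = gesv_reference([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

A reference like this can be handy for checking a GPU solver on small systems before moving to large ones.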
We are going to drop support for Python 3.5 and obsolete libraries such as CUDA 9.0 and NumPy 1.16. Leave a comment in #4300 if you have any concerns in your use-case.
cupy.cusolver.gesv that uses cusolverDn<t1><t2>gesv (#3917)
cupy.cusolver.gels that uses cusolverDn<t1><t2>gels (#4073)
cupy.vectorize (#4135)
spline_filter1d and spline_filter to cupyx.scipy.ndimage.interpolation (#4145)
cupyx.scipy.sparse.linalg.svds (#4155)
cupyx.scipy.sparse.linalg.cg (#4222)
cupyx.lapack (#4235)
cupy.testing.NumpyError (#4225)
cupyx.scipy.sparse.linalg.eigsh with CUDA 9.2 (#4231)
cupy.random.randint (#4160)
cupyx.scipy.sparse.linalg.eigsh (#4214)
convolve/correlate (#4248)
AssertFunctionIsCalled (#4233)
cupy.fft._callback (#4276)
cupy.fft._callback (#4283)
cupy.random.bytes not working (#4318)
.imag = 0 at hipFFT workaround (#4234)
assert statement instead of self.assert* methods (#4292)
free_all_blocks reference (#4196)
spline_filter functions to ndimage docs (#4265)
cupy-cuda111 package now on PyPI (#4333)
extra_compile_args for each module (#4336)
CUPY_NUM_BUILD_JOBS option (#4339)
cupyx.scipy.ndimage - Part 1 (#4271)
testing.parameterized (#4178)
testing.parameterize (#4192)
numpy_cupy_equal for the case where both NumPy and CuPy raise errors (#4244)
__bytes__ (#4252)
equal_nan toggle for NaN values in array_equal (#4203)
linalg.norm (#4227)
DeprecationWarning on truth value on empty array (#4308)
The CuPy Team would like to thank all those who contributed to this release!
@grlee77 @aitikgupta @anaruse @leofang
Published by kmaehashi almost 4 years ago
This is the release note of v8.1.0. See here for the complete list of solved issues and merged PRs.
Support for CUDA 11.1 was added in #4184. With CUDA 11.1, GeForce RTX 30 series and Quadro RTX series GPUs can now be used with CuPy.
Update (2020-11-25): cupy-cuda111 is now available on PyPI.
CuPy for CUDA 11.1 (cupy-cuda111) wheel packages are currently only available for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda111 -f https://github.com/cupy/cupy/releases/tag/v8.1.0).
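When several environments need wheels from a release page, the pip invocation quoted above can be generated rather than typed by hand. The sketch below assumes the URL pattern shown in the note (releases/tag/ followed by the version tag); release_install_command is a hypothetical helper, not something CuPy or pip provides.

```python
def release_install_command(package, tag, pre=False):
    """Build a pip command that looks for wheels on a GitHub release page.

    Hypothetical helper; it only formats a string and runs nothing.
    """
    # URL pattern taken from the example in the release note.
    url = f"https://github.com/cupy/cupy/releases/tag/{tag}"
    parts = ["pip", "install"]
    if pre:
        # --pre lets pip consider pre-release versions.
        parts.append("--pre")
    # -f (--find-links) tells pip to search the given page for archives.
    parts += [package, "-f", url]
    return " ".join(parts)


command = release_install_command("cupy-cuda111", "v8.1.0")
print(command)
```

The printed string matches the command given in the note and can be passed to a shell or a provisioning script.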
cudaGetDeviceProperties (#4103)
order option in cupy.testing.shaped_random (#4104)
show_config (#4079)
cupy.RawKernel (#4154)
csr2csc for zero-size matrix (#3922)
argmax and argmin for F-order inputs (#4106)
getDeviceProperties for HIP (#4113)
argmax/argmin in CUB block reduction for F-order arrays with ndim > 1 (#4115)
cupy.cuda.cufft (#4117)
np.nan and np.inf constant values properly in ndimage functions (#4133)
type_dispatcher.cuh (#4134)
compute_35 for CUDA 11.0+ (#4140)
argwhere for 0d inputs (#4174)
linalg not supplied (#4189)
nonzero for 0d inputs (#4190)
cupyx.scipy.sparse (#3959)
cupy.fft package (#4066)
cupy.cusolver (#4076)
_normalize_axis_index to cupy/core/internal.pyx (#4086)
cupyx.rsqrt submodule (#4116)
cupyx.scipy.special (#4119)
matmul from core.pyx to _routine_linalg.pyx (#4123)
cupy.cutensor (#4147)
cupy.manipulation submodule to cupy._manipulation (#4181)
cupy.io submodule to cupy._io (#4183)
cupyx.scipy.fft (#4186)
cupy.linalg package (#4187)
CUDA_VERSION define for Cython compilation (#4035)
pytest.skip (#4179)
cupyx.scipy.ndimage stats functions (#4182)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @garanews @grlee77 @leofang