cupy - v10.0.0b2

Published by emcastillo about 3 years ago

This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA Python (#5638)

CuPy is one of the first libraries to support the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy, as described in the documentation.

Support for AMD ROCm 4.3

Support for ROCm 4.3 has been added in this release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
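
For scripts that cannot rely on the shell environment, the same workaround can be applied from Python. The following is a minimal sketch, not part of CuPy: the helper name set_rocm_llvm_path is hypothetical, the default prefix /opt/rocm-4.3 is taken from the example above, and it assumes that setting the variable in-process before CuPy is imported has the same effect as the export.

```python
import os


def set_rocm_llvm_path(rocm_root="/opt/rocm-4.3"):
    """Point LLVM_PATH at the llvm folder of a ROCm installation.

    Hypothetical helper: an LLVM_PATH that is already set is left untouched.
    """
    os.environ.setdefault("LLVM_PATH", os.path.join(rocm_root, "llvm"))
    return os.environ["LLVM_PATH"]


# Call this before `import cupy` so the variable is in place when CuPy loads.
print(set_rocm_llvm_path())
```

Using setdefault keeps a value that the user exported explicitly, so the helper only fills in the variable when nothing was configured.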

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.
We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.

Changes

New Features

Support batched QR solver (#5583)
Add cupyx.scipy.sparse.linalg.minres (#5585)
Add Log Series distribution to cupy.random.Generator (#5618)
Add Power distribution to cupy.random.Generator (#5624)
Add support for CUDA Python (#5638)
Add Chi-square distribution to cupy.random.Generator (#5645)
Add Dirichlet distribution to cupy.random.Generator (#5648)
Add F distribution to cupy.random.Generator (#5655)

Enhancements

Add ncclAvg and ncclBfloat16 for NCCL (#5545)
Add new eigensolvers from rocSOLVER (#5555)
Add support for array input in beta distribution of cupy.random.Generator (#5573)
Release the GIL for several NCCL ops (#5574)
Allow to compile using PTX with an envvar (#5622)
Show CUDA Python version (#5651)
Fix version check for new ROCm version definition (#5657)
Rest of version check fix for new ROCm version definition (#5660)
Add ROCm 4.3 in duplicate detection (#5669)

Bug Fixes

Fix compute capability check (#5600)
Fix FFT convolve for shapes containing 1 (#5609)
Fix squareness checks (#5642)
Fix unique for empty array (#5654)

Code Fixes

Add batch_identity helper (#5614)
Remove unnecessary comments (#5631)

Documentation

Update Sphinx to 4.1.2 (#5612)
Fix random docstring (#5628)
Support ROCm v4.3 in document (#5633)
__array_function__ feature by default (#5644)

Tests

Fix skipTest in test_decomp_lu (#5593)
Mark lsmr tests xfail for CSR matrices on HIP (#5597)
Increase test timeout (#5601)
Fix cubic for_all_dtypes_combination tests (#5629)
Add CI for ROCm 4.3 (#5630)
Reload GPG key for ROCm 4.2 test (#5636)
Fix branch name of cuda-python (#5650)
Add a workaround for ROCm 4.3.0 for testing (#5662)

Others

Add cupy-cuda114 to duplicate detection (#5621)
Bump version to v10.0.0b2 (#5679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@hauntsaninja @leofang @povinsahu1909 @yashasvimisra2798

cupy - v9.4.0

Published by kmaehashi about 3 years ago

This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.

Highlights

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

This release changes the NVRTC compile process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run with earlier CUDA drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information. We believe most users will not be affected by this change, but you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable.
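
One way to opt back into PTX for a single run, without changing the shell configuration, is to launch the program in a child process with the variable set. This is only a sketch: the helper name run_with_ptx is hypothetical, and the child command used here is a stand-in that reports the variable instead of running a real CuPy script.

```python
import os
import subprocess
import sys


def run_with_ptx(args):
    """Run a Python child process with CUPY_COMPILE_WITH_PTX=1 set.

    Hypothetical helper; `args` are the arguments passed to the interpreter.
    """
    env = dict(os.environ, CUPY_COMPILE_WITH_PTX="1")
    return subprocess.run([sys.executable, *args], env=env, check=True)


# Stand-in for a real CuPy script: the child only reports the variable.
run_with_ptx(["-c", "import os; print(os.environ['CUPY_COMPILE_WITH_PTX'])"])
```

Passing the variable through the child's environment means it is present from process start, so the sketch does not depend on when CuPy reads it.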

Support for AMD ROCm 4.3

Changes

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5611)
Allow to compile using PTX with an envvar (#5634)
Add ncclAvg and ncclBfloat16 for NCCL (#5656)
Fix version check for new ROCm version definition (#5661)
Rest of version check fix for new ROCm version definition (#5670)

Bug Fixes

Fix FFT convolve for shapes containing 1 (#5613)
Fix the RTC call path for HIP (#5620)
Fix compute capability check (#5646)
Fix squareness checks (#5652)
Fix unique for empty array (#5658)

Code Fixes

Fix kernel names to be consistent (#5625)
Remove unnecessary comments (#5635)

Documentation

Update Sphinx to 4.1.2 (#5616)
__array_function__ feature by default (#5653)
Support ROCm v4.3 in document (#5674)

Tests

Increase test timeout (#5615)
Increase timeout for CUDA 11.4 tests (#5617)
Add CI for ROCm 4.3 (#5632)
Reload GPG key for ROCm 4.2 test (#5637)
Fix cubic for_all_dtypes_combination tests (#5639)
Add a workaround for ROCm 4.3.0 for testing (#5663)
Fix skipTest in test_decomp_lu (#5672)

Others

Bump version to v9.4.0 (#5680)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang @yashasvimisra2798

cupy - v10.0.0b1

Published by kmaehashi about 3 years ago

This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.4 (`cupy-cuda114`)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.

Google Summer of Code

CuPy is participating in Google Summer of Code under the NumFOCUS organization.

Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

Changes without compatibility

Support the new DLPack exchange protocol (#5306)

By adopting the new DLPack exchange protocol proposed in the Python array API standard, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
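
Code that has to run on both older and newer CuPy versions could select the API at run time. The sketch below is an illustration under assumptions, not an official migration recipe: the helper name from_dlpack_compat is hypothetical, xp is expected to be the cupy module, and the fallback path assumes the input is a cupy.ndarray (which provides toDlpack()).

```python
def from_dlpack_compat(xp, array):
    """Import `array` into the array module `xp` through DLPack.

    Hypothetical helper. `xp` is expected to be `cupy`; `array` is any
    object accepted by the API that ends up being selected.
    """
    new_api = getattr(xp, "from_dlpack", None)
    if new_api is not None:
        # New protocol: the consumer receives the array object itself.
        return new_api(array)
    # Older CuPy: the producer first exports a DLPack capsule.
    return xp.fromDlpack(array.toDlpack())


# Intended usage (requires CuPy and a GPU, so it is not executed here):
#   b = from_dlpack_compat(cupy, a)   # `a` is a cupy.ndarray
```

The main difference between the two calls is what they accept: cupy.from_dlpack takes the array object, while cupy.fromDlpack takes a capsule that was exported beforehand.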

Known Issues

cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

Texture memory 2D/3D affine transformations (#5171)
Support the new DLPack exchange protocol (#5306)
Add cupyx.scipy.sparse.linalg.lsmr (#5331)
JIT: Support all atomic intrinsics (#5387)
Expose _GUFunc through cupyx (#5408)
Add geometric distribution to new Generator (#5443)
Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)
Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
Add cupyx.scipy.sparse.linalg.cgs (#5524)
Add hypergeometric distribution to new Generator (#5560)

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5097)
Support NCCL v2.9.9 (#5268)
Support CUDA 11.4 and compute_86 (#5434)
Update NumPy/SciPy pinning in setup.py (#5453)
Make matrix_power support stacked matrices (#5458)
Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)
Add cudaDeviceDisablePeerAccess wrapper (#5495)
Support cuDNN v8.2.2 (#5516)
Support NCCL v2.10.3: library installer and document (#5521)

Bug Fixes

JIT: Fix supported dtype of atomic_add on HIP (#5383)
Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)
Fix astype from boolean (#5410)
Fix compatibility issues of ndarray.view (#5428)
Fix types attribute of ufunc (#5448)
Fix new DLPack protocol error messages and tests (#5449)
texture_memory option in affine_transform not supported by HIP (#5464)
Fix linalg.lstsq for empty matrix (#5467)
Fix reshape (#5470)
Fix random generator output not being raveled (#5478)
Fix random integers (#5479)
Fix availability tests in cuSOLVER and cuSPARSE (#5492)
Add missing hipSPARSE include to builder (#5515)
prune cuFFT static lib by major cc ver (#5531)
Fix casts from bool in ufunc inputs (#5539)
Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)
Fix casts in ufunc outputs (#5550)
Code fix for {cu, roc}SOLVER (#5558)
Fix CUDA API call on module initialization (#5561)
Fix the RTC call path for HIP (#5569)
Fix broadcast error messages (#5579)

Code Fixes

Do not call cudnnGetVersion on import (#5326)
JIT: Fix __call__() for built-in functions (#5361)
Add HIP symbol redefinitions (#5362)
Remove the data member use_32bit_indexing from CArray (#5376)
Use dtype.name instead dtype.char (#5444)
Try to use -I in hipRTC (#5486)
Hide modules from public APIs (#5522)
consistent kernel names (#5551)
Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)

Documentation

Add upgrade guide for v10 (#5278)
Update tag lines in package description and docs index (#5399)
Fix typo in apply_along_axis (#5432)
Fix indent of Returns section (#5433)
Update user_guide/basic.rst device agnostic section (#5435)
Support CUDA 11.4 on documents (#5447)
Update install guide with new NumPy/SciPy versions (#5454)
Use from_dlpack instead of fromDlpack (#5488)
Use Sphinx 4.1.0 (#5489)
Bump ReadTheDocs configuration to version 2 (#5491)
Fix docs of eigh and eigvalsh (#5494)
Add a lingering doc page for fromDlpack() (#5509)
Document scipy.fft backend usage (#5514)
Replaced the links for NumPy docs as per issue #3418 (#5548)
Use Sphinx's envvar construct (#5570)
Fix intersphinx for SciPy 1.7.1 docs (#5587)

Installation

Fix license_file option in setup.cfg (#5406)
Import numpy before Cython (#5482)

Tests

Add tests for num_to_num's optional parameters (#5337)
Add script for ROCm CI on Jenkins (#5378)
Skip unwrap tests for numpy<1.21 (#5384)
Enable strict xfail in pytest (#5407)
Remove xfail in windows jitify test (#5409)
Fix preloading slow tests (#5440)
Add script for CUDA 11.4 CI on FlexCI (#5457)
Increase memory for CUDA 11.4 tests (#5477)
Fix DLPack test for ROCm/HIP (#5485)
Fix "Revert test decorators order" (#5498)
Fix some tests for HIP (#5501)
Fix FlexCI Linux tests (#5505)
Add CUDA 11.4 for FlexCI helper script (#5528)
Increase timeout for CUDA 11.4 tests (#5575)
Update tests to install all requirements and add PATH (#5576)
Add Cython to all requirements (#5577)

Others

Notify conflict by mergify (#5371)
Fix mergify to only comment when pull-request is open (#5439)
Fix mergify condition (#5513)
Add auto notify bot for hip label (#5538)
Use pull_request_target instead for auto notify bot (#5541)
Fix auto notify bot for issues (#5546)
Disable Mergify's auto-merge (#5556)
Bump version to v10.0.0b1 (#5595)
Fix signal tests for scipy 1.7.0 (#5368)
Fix numpy.unwrap for NumPy 1.21 (#5385)
Fix signaltools medfilt for scipy>=1.7.0 (#5386)
Fix deprecated numpy.typeDict utilization (#5388)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay

cupy - v9.3.0

Published by kmaehashi about 3 years ago

This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.4 (`cupy-cuda114`)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.

Known Issues

cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

Support NCCL v2.9.9 (#5402)
Update NumPy/SciPy pinning in setup.py (#5471)
Support CUDA 11.4 and support compute_86 (#5519)
Support cuDNN v8.2.2 (#5523)
Make matrix_power support stacked matrices (#5525)
Support NCCL v2.10.3: library installer and document (#5526)

Bug Fixes

JIT: Fix supported dtype of atomic_add on HIP (#5405)
Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5416)
Fix compatibility issues of ndarray.view (#5442)
Fix types attribute of ufunc (#5455)
Fix random integers (#5484)
Fix random generator output not being raveled (#5487)
Fix astype from boolean (#5490)
Fix reshape (#5504)
Fix linalg.lstsq for empty matrix (#5506)
Add missing checks and _setStream() (#5507)
Fix availability tests in cuSOLVER and cuSPARSE (#5534)
prune cufft static lib by major cc ver (#5536)
Fix casts from bool in ufunc inputs (#5549)
Code fix for {cu, roc}SOLVER (#5566)
Access cudaMemoryType in the pointer attributes and fix for HIP (#5571)
Fix broadcast error messages (#5584)
Fix casts in ufunc outputs (#5589)
Fix broken build on CUDA 9.2 (#5598)

Code Fixes

Remove the data member use_32bit_indexing from CArray (#5414)
JIT: Fix __call__() for built-in functions (#5422)
Do not call cudnnGetVersion on import (#5446)
Add HIP symbol redefinitions (#5475)
Try to use -I in hipRTC (#5502)
Hide modules from public APIs (#5533)
Use the new macro __HIP_PLATFORM_AMD__ at build time (#5565)

Documentation

Update tag lines in package description and docs index (#5415)
Fix typo in apply_along_axis (#5441)
Fix indent of Returns section (#5452)
Update user_guide/basic.rst device agnostic section (#5456)
Update install guide with new NumPy/SciPy versions (#5465)
Bump ReadTheDocs configuration to version 2 (#5497)
Fix docs of eigh and eigvalsh (#5499)
Use Sphinx 4.1.0 (#5500)
Document scipy.fft backend usage (#5532)
Support CUDA 11.4 on documents (#5535)
Replaced the links for NumPy docs as per issue #3418 (#5553)
Use Sphinx's envvar construct (#5586)
Fix intersphinx for SciPy 1.7.1 docs (#5588)

Installation

Fix license_file option in setup.cfg (#5411)
Import numpy before Cython (#5483)

Examples

Tests

Skip unwrap tests for numpy<1.21 (#5412)
Remove xfail in windows jitify test (#5418)
Enable strict xfail in pytest (#5423)
Add missing DLPack test for complex numbers (#5425)
Fix unwrap tests for v9 (#5426)
Fix preloading slow tests (#5445)
Add script for ROCm CI on Jenkins (#5468)
Add script for CUDA 11.4 CI on FlexCI (#5473)
Increase memory for CUDA 11.4 tests (#5480)
Fix "Revert test decorators order" (#5518)
Fix FlexCI Linux tests (#5520)
Add CUDA 11.4 for FlexCI helper script (#5543)
Fix scipy requirement in tests (#5563)
Fix some tests for HIP (#5578)
Update tests to install all requirements and add PATH (#5581)
Add Cython to all requirements (#5582)

Others

Notify conflict by mergify (#5419)
Fix mergify to only comment when pull-request is open (#5510)
Fix mergify condition (#5517)
Add auto notify bot for hip label (#5540)
Use pull_request_target instead for auto notify bot (#5542)
Fix auto notify bot for issues (#5547)
Disable Mergify's auto-merge (#5562)
Bump version to v9.3.0 (#5596)
Fix deprecated numpy.typeDict utilization (#5403)
Fix signal tests for SciPy 1.7.0 (#5413)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @leofang @maxim-belkin @Palash-Vishnani

cupy - v9.2.0

Published by asi1024 over 3 years ago

This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are now available on PyPI.

Known Issues

cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

Add CUDA 11.3 headers (#5232)
Do not use handles unless requested in cupy.show_config() (#5285)
Use independent version of hipFFT for ROCm 4.1 and later (#5351)
Support cuTENSOR v1.3.1 (#5370)
Support cuDNN v8.2.1 (#5372)

Bug Fixes

MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
Fix MemoryAsync to keep a weakref to stream (#5307)
Fix cuFFT callback for sm_61 etc (#5325)
Fix large arrays assignment (#5333)
Fix check_availablity for cupy.cusolver (#5336)
Fix cuDNN preloading (#5365)
Ensure source array is C-contiguous before copying to CUDAArray (#5375)
Remove unnecessary print (#5377)

Code Fixes

Use cdef instead of cpdef where appropriate (#5274)
Fix cub repository url (#5288)

Documentation

Fix matmul docstring (#5281)
Update list of wheels in README (#5284)
Add user guide for FFT (#5286)
Fix deadlink to tutorial and reorder in README (#5291)
Add user guide for streams & events (#5302)
Document ExternalStream (#5312)
user_guide/basic.rst: various improvements (#5356)
Add ROCm 4.2 support to install docs (#5360)

Installation

Exclude Cython 3 from setup_requires (#5273)
Add upper restrictions to NumPy/SciPy versions (#5321)

Tests

Fix threading memory pool tests (#5289)
Fix Windows CI kernel cache (#5317)
Xfail random generator tests for HIP (#5359)
Tentatively pin to SciPy 1.6 in Windows CI (#5369)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@leofang @maxim-belkin

cupy - v10.0.0a2

Published by asi1024 over 3 years ago

This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are now available on PyPI.
The following Python syntax and new APIs can now be used in JIT target functions.
- Calling len, min, max Python built-ins.
  - len(arr): Equivalent to arr.shape[0].
  - min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
  - max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
- Accessing .ndim, .size attributes of ndarray.
- Unpacking nested tuples.
  - (x, y), z = ...
- jit.grid() API, similar to numba.cuda.grid.
  - x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x.)
- Warp shuffle and sync functions.
  - cupyx.jit.shfl_down_sync(mask, var, val_id) (__shfl_down_sync(mask, var, val_id))
cupyx.scipy.sparse.{coo,csr,csc}_matrix now provides the reshape method.
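
The JIT additions listed above can be combined as in the following sketch. It is an illustration rather than code from the CuPy repository: the names global_thread_index and square_on_gpu are hypothetical, the block size of 128 is an arbitrary choice, and the GPU function is only defined, not called, because it needs CuPy and a CUDA device. The first function is a plain-Python reference for the 1-D index that jit.grid(1) is described as returning.

```python
def global_thread_index(thread_idx, block_idx, block_dim):
    """CPU reference for the 1-D index: threadIdx.x + blockIdx.x * blockDim.x."""
    return thread_idx + block_idx * block_dim


def square_on_gpu(values):
    """Square a 1-D sequence with a CuPy JIT kernel (needs CuPy and a GPU)."""
    import cupy
    from cupyx import jit

    @jit.rawkernel()
    def square(x, y):
        i = jit.grid(1)      # global thread index along x
        if i < len(x):       # len(x) is equivalent to x.shape[0]
            y[i] = x[i] * x[i]

    x = cupy.asarray(values, dtype=cupy.float32)
    y = cupy.empty_like(x)
    threads = 128
    blocks = max(1, (x.size + threads - 1) // threads)
    square((blocks,), (threads,), (x, y))  # (grid, block, arguments)
    return y


# Thread 3 of block 2, with 128 threads per block:
print(global_thread_index(3, 2, 128))
```

The bounds check against len(x) matters because the grid is rounded up to a whole number of blocks, so the last block may contain threads past the end of the array.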

Changes without compatibility

Drop CUDA 9.2 & NCCL 2.4 Support (#5214)

CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.

Changes in Stream behavior (#5251)

The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call cudaStreamDestroy) if the stream is the current stream of any thread.
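
The sharing pattern this change enables might look like the sketch below. The helper name run_on_shared_stream is hypothetical, and the demonstration call uses contextlib.nullcontext() as a CPU stand-in so that the code runs without a GPU; with CuPy v10 the intent is to pass a cupy.cuda.Stream() instead.

```python
import contextlib
import threading


def run_on_shared_stream(stream, work, n_threads=4):
    """Run work(i) in n_threads threads, each making `stream` current."""
    results = [None] * n_threads

    def target(i):
        with stream:  # every thread enters the same stream object
            results[i] = work(i)

    threads = [threading.Thread(target=target, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# CPU stand-in so the sketch runs anywhere; not a real CUDA stream.
print(run_on_shared_stream(contextlib.nullcontext(), lambda i: i * i))
```

Each thread writes to its own slot in results, so the sketch needs no extra locking on the Python side.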

Known Issues

cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

Add reshape method for COO, CSR and CSC matrices (#5301)
Support len, min, max, .ndim, .size in jit (#5319)
Support nested tuple unpack in CuPy JIT (#5332)
Support Numba-like jit.grid() syntax in CuPy JIT (#5334)
Support warp shuffle and sync functions in CuPy JIT (#5335)

Enhancements

Do not use handles unless requested in cupy.show_config() (#5073)
Fix to allow sharing a Stream instance between threads (#5251)
Adding GUFunc order, dtype and casting kwarg support (#5260)
Support nan, posinf, neginf in cupy.nan_to_num (#5295)
Use independent version of hipFFT for ROCm 4.1 and later (#5318)
Support cuTENSOR v1.3.1 (#5338)
Support cuDNN v8.2.1 (#5357)

Performance Improvements

Make cuTENSOR available in cupy.einsum (#5203)

Bug Fixes

Fix check_availablity for cupy.cusolver (#5207)
Fix MemoryAsync to keep a weakref to stream (#5264)
Fix cuFFT callback for sm_61 etc (#5304)
Fix cuDNN preloading (#5327)
Fix large arrays assignment (#5330)
Ensure source array is C-contiguous before copying to CUDAArray (#5342)
Increase test coverage for Generalized Universal Functions (#5344)
Remove unnecessary print (#5374)

Code Fixes

Fix cub repository url (#5236)
Code and comment fixes for stream (#5243)
Use cdef instead of cpdef where appropriate (#5274)

Documentation

Fix matmul docstring (#5174)
Update list of wheels in README (#5267)
Add user guide for FFT (#5272)
Bump CuPy version in docs (#5277)
Add user guide for streams & events (#5283)
Fix deadlink to tutorial and reorder in README (#5287)
Document ExternalStream (#5305)
Add ROCm 4.2 support to install docs (#5354)
user_guide/basic.rst: various improvements (#5356)

Installation

Drop support for CUDA 9.2 & NCCL 2.4 (#5214)
Add upper restrictions to NumPy/SciPy versions (#5225)
Exclude Cython 3 from setup_requires (#5273)

Tests

Fix threading memory pool tests (#5263)
Temporarily remove the async pool test from TestAllocator (#5308)
Fix Windows CI kernel cache (#5310)
Tentatively skip unstable MemoryPoolAsync tests (#5350)
Xfail random generator tests for HIP (#5355)
Tentatively pin to SciPy 1.6 in Windows CI (#5366)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909

cupy - v10.0.0a1

Published by emcastillo over 3 years ago

This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

Current stream is now managed per device (#5172)

CuPy now automatically manages the stream switching when changing a device, so the user is not responsible for changing the stream anymore.

This pull request also includes a bug fix for #5143. Existing code that mixes with stream: blocks and stream.use() may get different results, because the stream set via the use() API is not reactivated when exiting a stream context.

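import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()  # switch the current stream inside the `with s1:` block
    with s3:
        pass
    # Exiting `with s3:` does not reactivate `s2`.
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.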

Make `cupy.cuda.Device` context manager interface thread safe (#5083)

Since the first versions of CuPy, using a single cupy.cuda.Device context manager object from multiple threads led to incorrect behavior when restoring the previous device. The correct device is now restored, so user code that relied on the incorrect behavior might need to be updated.

Deprecate `cupyx.allow_synchronize` and `cupyx.DeviceSynchronized` APIs (#5226)

These APIs, which were used to detect when synchronization with a device happens, have been deprecated because they do not behave reliably.

Changes

Note: many of these PRs are backported to the v9 series and available since the release.

New Features

CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#4592)
Add APIs for creating NumPy arrays backed by pinned memory (#4870)
Support cuSPARSELt (#4883)
Add gamma distributions to random API (#4905)
Add random for uniform [0, 1) generation (#4906)
Add poisson distribution to random API (#4927)
Add SciPy compatible connected_components (#4940)
Support shared memory in CuPy JIT (#4950)
Add cupyx.scipy.sparse.kronsum() (#4968)
Add hfft2, ihfft2, hfftn, and ihfftn to cupyx.scipy.fft (#4996)
CuPy JIT: Print kernel code (#5017)
Add cupyx.jit.atomic_add (#5169)
CUDA 11.2/11.3: Support MemoryAsyncPool statistics and limits (#5177)

Enhancements

Ability to pass structured data types by value as kernel parameters (#4829)
Move the NVTX module to cupy_backends.cuda.libs (#4930)
Disable CUB SpMV on CUDA 11.x (#4949)
CuPy JIT: Readable compile error messages (#4991)
Fix JIT test failures on ROCm (#4998)
Mark cupyx.jit.rawkernel as experimental (#5005)
HIP: add -ftz=true (#5007)
Give gufunc a name (#5013)
CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5028)
Add PCI Bus ID to show_config (#5037)
Print cuSPARSELt version in show_config (#5054)
Support custom getsource option in CuPy JIT (#5071)
Make cupy.cuda.Device context manager interface thread safe (#5083)
Add a new argument out to cupy.asnumpy() (#5155)
Support cuSPARSELt v0.1.0 (#5158)
Per device stream (#5172)
cuTENSOR v1.3.0 for library installer (#5192)
Add sum_labels to cupyx.scipy.ndimage.measure (#5200)
Support NCCL v2.9.8 (#5201)
Fix thrust compilation for ROCm 4.2.0 (#5209)
Add NVCC path and Python version to show_config (#5215)
Add CUDA 11.3 headers (#5218)
Add libraries for CUDA 11.3 (#5219)
Remove syncdetect APIs (#5226)

Bug Fixes

Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5002)
Use async memcpy in ndarray.copy (#5004)
Fix DLPack lanes (#5045)
Disable cuFFT plan cache on CUDA 11.1 (#5046)
Support PTDS in CuPy memory pool (#5072)
CuPy JIT: Fix range type (#5077)
Fix poisson to support lam array (#5087)
Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5103)
Bugfix for typing rule of CuPy JIT (#5125)
Fix TypeError in svds (#5140)
Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5168)
Fix integer scatter_add failure on Windows (#5173)
MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5191)
Fix matmul for input with relaxed strides (#5205)
Add check_availability for cuTensor routines (#5206)
Fix windows constexpr (#5233)
Remove duplicated subtraction in cupy.random.Generator.integers (#5247)

Code Fixes

Rename cupy.core submodule to cupy._core (#3820)
Fix some internal cpdef functions to cdef in _kernel.pyx (#5084)
Remove cupy.cupy (#5121)
Cosmetic change in cuSPARSELt stub header (#5149)
Cosmetic changes of CuPy JIT implementation (#5152)

Documentation

Follow the latest NumPy/SciPy docs style (#4945)
Fix docs: cupy-cuda112 now on PyPI (#4957)
Update installation guide for Conda-Forge (#4985)
CuPy JIT documentation (#5012)
Document cupyx.time.repeat (#5015)
Document cupy.cuda.runtime.getDeviceProperties (#5016)
More documentation on the supported backends (#5019)
Add links to Anaconda, Gitter, StackOverflow (#5020)
Improve the documentation on interoperability (#5023)
Document CFunctionAllocator and ManagedMemory (#5025)
Fix code block in installation guide (#5033)
Improve comments for memory and stream API usage (#5060)
Point to the correct numpy random docs (#5088)
Add user guide (#5093)
Add ROCm limitations to docs (#5107)
Reorganize API reference pages (#5108)
Revise ROCm doc (#5122)
Fix docs of scatter_add (#5129)
Mention baseline API change in upgrade guide (#5131)
Fix ROCm wheel install steps (#5133)
Fix docstring in coo.py (#5139)
Fix docs in stream.pyx (#5144)
cuDNN v8.2 on documentation (#5148)
Mention PTDS in ROCm Limitation (#5159)
Use Sphinx 4 (#5188)
cuTENSOR v1.3 on documentation (#5196)
Fix cuSPARSELt not covered in docs (#5221)
Add cupyx.scipy.ndimage.sum_labels to docs (#5223)
Improve README (#5254)
Update logo image (#5255)
Tentatively remove CUDA 11.3 from support list (#5256)

Installation

Fix Windows dll loading for Conda (#4974)
Add warnings for duplicate installation (#5032)
cuDNN v8.2.0 for library installer (#5146)
Bump version to v10.0.0a1 (#5269)

Examples

Fix cuSPARSELt example not to use internal function (#4995)
Update examples for current version of CuPy (#4999)

Tests

Refactor random tests (#4907)
Tentatively pin CI to ROCm 4.0.1 (#4961)
Fix cutensor import in the test (#4965)
Make install_tests runnable without depending on current path (#4969)
Avoid using pip install -e on Windows CI for performance (#4970)
Update known base branches in flexCI config (#4973)
Update list of known branches (#4982)
Fix TestStream cleanup (#5042)
Mark some memory tests as testing.slow (#5061)
Fix stream usage on D2D copy test under HIP (#5091)
Xfail tests for random distribution generator under HIP/ROCm (#5096)
Adjust testing tolerance for hfftn for HIP/ROCm (#5099)
Use current device in tests (#5127)
Fix for updated FlexCI base image (#5164)
Relax tolerance of cupyx.jit.atomic_add test (#5186)
Test build for ROCm 4.0 and latest (#5224)
Fix mergify configuration (#5248)

Others

Use bot mode in automatic backport (#5051)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce

cupy - v9.1.0

Published by emcastillo over 3 years ago

This is the release note of v9.1.0. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

Make `cupy.cuda.Device` context manager interface thread safe (#5083)

Changes

Enhancements

Add cupyx.jit.atomic_add (#5181)
Support custom getsource option in CuPy JIT (#5089)
Fix JIT test failures on ROCm (#5101)
Make cupy.cuda.Device context manager interface thread safe (#5147)
Fix thrust compilation for ROCm 4.2.0 (#5212)
Add sum_labels to cupyx.scipy.ndimage.measure (#5222)
Support cuSPARSELt v0.1.0 (#5227)
Fix Stream destructor not taking care of PTDS (#5228)
NCCL v2.9.8 (#5229)
Add NVCC path and Python version to show_config (#5230)
cuTENSOR v1.3.0 for library installer (#5234)
Add libraries for CUDA 11.3 (#5235)

Bug Fixes

Fix DLPack lanes (#5094)
Fix TypeError in svds (#5161)
Fix integer scatter_add failure on Windows (#5178)
Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5180)
Fix poisson to support lam array (#5182)
Fix matmul for input with relaxed strides (#5240)
Add check_availability for cuTensor routines (#5244)
Fix windows constexpr (#5250)
Remove duplicated subtraction in cupy.random.Generator.integers (#5261)

Code Fixes

Remove cupy.cupy (#5137)
Cosmetic change in cuSPARSELt stub header (#5160)
Cosmetic changes of CuPy JIT implementation (#5162)

Documentation

Mention baseline API change in upgrade guide (#5132)
Fix docstring in coo.py (#5141)
Fix docs in stream.pyx (#5150)
Fix docs of scatter_add (#5153)
Fix ROCm wheel install steps (#5154)
Mention PTDS in ROCm Limitation (#5166)
Use Sphinx 4 (#5198)
cuDNN v8.2 on documentation (#5217)
Fix cuSPARSELt not covered in docs (#5231)
cuTENSOR v1.3 on documentation (#5238)
Add cupyx.scipy.ndimage.sum_labels to docs (#5245)
Update logo image (#5257)
Improve README (#5259)

Installation

cuDNN v8.2.0 for library installer (#5216)
Bump version to v9.1.0 (#5270)

Tests

Use current device in tests (#5151)
Fix stream usage on D2D copy test under HIP (#5157)
Fix for updated FlexCI base image (#5167)
Relax tolerance of cupyx.jit.atomic_add test (#5187)
Test build for ROCm 4.0 and latest (#5239)
Avoid using pip install -e on Windows CI for performance (#5242)
Fix mergify configuration (#5249)

Others

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @leofang

cupy - v9.0.0

Published by kmaehashi over 3 years ago

This is the release note of v9.0.0.

This release note only covers the changes since v9.0.0rc1 release. Read the blog for the details of new features introduced in CuPy v9!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

NVIDIA cuSPARSELt

CuPy now integrates the Python binding for the cuSPARSELt library that accelerates sparse matrix multiplications on NVIDIA Ampere GPUs. We are planning to start using it in CuPy sparse APIs to transparently improve performance.

RAPIDS cuGraph

cupyx.scipy.sparse.csgraph is added to the API with support for the connected_components method. The support for cuGraph is optional and can be installed through conda-forge or by manually building CuPy. Currently, PyPI wheels do not have built-in support for cuGraph.

Add `MemoryAsyncPool` to support `malloc_async` (#5034)

By using cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc) it is now possible to use the stream ordered memory allocations introduced in CUDA 11.2.
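
The call above can be wrapped so that it degrades gracefully. The following is a minimal sketch, not an official recipe: `use_async_pool` is a hypothetical helper name, and the broad `except` is an assumption about how an unsupported environment (no GPU, or a CUDA runtime or driver older than 11.2) reports failure.

```python
try:
    import cupy
except ImportError:  # CuPy is not installed in this environment
    cupy = None


def use_async_pool():
    """Route CuPy allocations through the stream-ordered allocator.

    Returns True when the async pool was installed, False otherwise.
    The helper name is illustrative; only MemoryAsyncPool and
    set_allocator come from CuPy.
    """
    if cupy is None:
        return False
    try:
        pool = cupy.cuda.MemoryAsyncPool()
    except Exception:
        # Assumed failure mode: no usable GPU, or no malloc_async support.
        return False
    cupy.cuda.set_allocator(pool.malloc)
    return True


enabled = use_async_pool()
print("stream-ordered allocator enabled:", enabled)
```

When the helper returns False, CuPy keeps using its default memory pool.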

APIs for creating NumPy arrays backed by pinned memory (#5100)

By using cupyx.empty_pinned(), cupyx.empty_like_pinned(), cupyx.zeros_pinned(), and cupyx.zeros_like_pinned(), it is possible to obtain NumPy ndarrays whose storage is located in pinned memory, which improves the performance of data movement.
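
As a rough illustration of these functions, the sketch below allocates a zero-filled host buffer in pinned memory when CuPy can provide it. `zeros_host` is a hypothetical helper, and the fallback to ordinary NumPy memory is an assumption added so the snippet also runs on machines without a GPU.

```python
import numpy


def zeros_host(shape, dtype=numpy.float32):
    """Return a zero-filled NumPy array, pinned when CuPy can provide it.

    zeros_host is an illustrative helper, not a CuPy API.
    """
    try:
        import cupyx
        return cupyx.zeros_pinned(shape, dtype=dtype)
    except Exception:
        # CuPy missing or no usable GPU: fall back to pageable memory.
        return numpy.zeros(shape, dtype=dtype)


host = zeros_host((4, 4))
host += 1.0  # the result is a regular numpy.ndarray in either case
print(type(host).__name__, host.sum())
```

With a GPU available, passing `host` to `cupy.asarray` then copies the buffer to the device.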

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes

See here for the complete list of solved issues and merged PRs after the v9.0.0rc1 release. For all changes in the v9 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, rc1).

New Features

Support shared memory in CuPy JIT (#4977)
Support cuSPARSELt (#4994)
Add random for uniform [0, 1) generation (#5003)
CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#5034)
Add poisson distribution to random API (#5036)
CuPy JIT: Print kernel code (#5038)
Add gamma distributions to random API (#5086)
Add APIs for creating NumPy arrays backed by pinned memory (#5100)
Add SciPy compatible connected_components (#5113)

Enhancements

Disable CUB SpMV on CUDA 11.x (#4978)
Move the NVTX module to cupy_backends.cuda.libs (#5014)
HIP: add -ftz=true (#5035)
CuPy JIT: Readable compile error messages (#5041)
CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5053)
Mark cupyx.jit.rawkernel as experimental (#5057)
Add PCI Bus ID to show_config (#5062)
Print cuSPARSELt version in show_config (#5065)
Give gufunc a name (#5085)

Bug Fixes

Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5011)
Disable cuFFT plan cache on CUDA 11.1 (#5068)
Use async memcpy in ndarray.copy (#5078)
CuPy JIT: Fix range type (#5081)
Support PTDS in CuPy memory pool (#5082)
Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5116)

Code Fixes

Rename cupy.core submodule to cupy._core (#4987)
Fix some internal cpdef functions to cdef in _kernel.pyx (#5098)

Documentation

Fix docs: cupy-cuda112 now on PyPI (#4990)
Update installation guide for Conda-Forge (#4993)
Document cupyx.time.repeat (#5027)
Document cupy.cuda.runtime.getDeviceProperties (#5029)
Doc: Add links to Anaconda, Gitter, StackOverflow (#5030)
More documentation on the supported backends (#5039)
Fix code block in installation guide (#5043)
Document CFunctionAllocator and ManagedMemory (#5059)
Improve the documentation on interoperability (#5064)
CuPy JIT documentation (#5076)
Improve comments for memory and stream API usage (#5079)
Add user guide (#5109)
Reorganize API reference pages (#5114)
Point to the correct numpy random docs (#5115)
Follow the latest NumPy/SciPy docs style (#5118)
Add ROCm limitations to docs (#5119)
Revise ROCm doc (#5123)

Installation

Fix Windows dll loading for Conda (#5106)

Examples

Update examples for current version of CuPy (#5009)
Fix cuSPARSELt example not to use internal function (#5066)

Tests

Tentatively pin CI to ROCm 4.0.1 (#4976)
Update known base branches in flexCI config (#4980)
Fix cutensor import in the test (#4981)
Update list of known branches (#4989)
Make install_tests runnable without depending on current path (#4992)
Fix TestStream cleanup (#5052)
Mark some memory tests as testing.slow (#5063)
Refactor random tests (#5102)

Others

Use bot mode in automatic backport (#5058)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @leofang @povinsahu1909

cupy - v9.0.0rc1

Published by asi1024 over 3 years ago

This is the release note of v9.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are planning to release the final v9.0.0 on April 22nd. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy JIT (#4774)

Creating raw kernels from Python functions is now possible thanks to the new @cupyx.jit.rawkernel decorator.

import cupy
import numpy
from cupyx import jit

@jit.rawkernel()
def f(x, y, z, n):
    tid = jit.threadIdx.x + jit.blockIdx.x * jit.blockDim.x
    ntid = jit.blockDim.x * jit.gridDim.x
    for i in range(tid, n, ntid):
        z[i] = x[i] + y[i]

n = numpy.uint32(1024)
x = cupy.arange(n)
y = cupy.arange(n)
z = cupy.empty((n,), dtype='l')
f[16, 16](x, y, z, n)

Support for Generalized Universal Functions (#4675)

We have added an interface to support Generalized Universal Functions based on the one in Dask. Currently, it is used in matmul to ensure compatibility with NumPy's __array_ufunc__ dispatching.

cuTENSOR Support in Binary Packages (#4600)

cuTENSOR support is now enabled in wheel packages. To use cuTENSOR features you will need to install the shared library using python -m cupyx.tools.install_library --cuda 11.2 --library cutensor after installing wheels.

New Sphinx Theme in Documentation (#4351)

Following NumPy, we have adopted the pydata_sphinx_theme in our documentation site starting from this release.

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

`cupy.cuda.nccl` is hidden by default (#4919)

The NCCL wrapper is no longer imported in cupy/cuda/__init__.py, so it must now be explicitly imported from cupy.cuda.nccl.
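
In practice this means code that relied on the implicit import needs an explicit one. The following is a small sketch under the assumption that a missing CuPy installation or missing NCCL support surfaces as an `ImportError`; the `nccl_available` flag is a name introduced here for illustration.

```python
try:
    # The submodule has to be imported by name.
    from cupy.cuda import nccl
except ImportError:
    # Assumed failure mode: CuPy or its NCCL support is not installed.
    nccl = None

nccl_available = nccl is not None
print("NCCL wrapper importable:", nccl_available)
```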

Drop NCCL & cuDNN shared libraries from wheels (#4850, #4932)

NCCL and cuDNN shared libraries are no longer bundled in wheels. To activate features using NCCL / cuDNN in CuPy v9, you will need to install these libraries using the python -m cupyx.tools.install_library tool after installing CuPy wheels. See the Installation Guide for details.

By eliminating the default bundling of cuDNN & NCCL we have achieved further reductions in the wheel size averaging 5x.

Deprecate `cupy.bool`, `cupy.int`, `cupy.float` and `cupy.complex` (#4790)

Following the NumPy 1.20 API, these aliases for the Python scalar types have been deprecated.
cupy.bool_, cupy.int_, cupy.float_ and cupy.complex_ should be used instead when required.

Docker image updated to CUDA 11.2 and Python 3.8

The official Docker image is now updated to use CUDA 11.2 and Python 3.8.

Changes

New Features

LOBPCG solver - cupyx.scipy.sparse.linalg.lobpcg (#4281)
Add diagonal and setdiag methods for COO sparse matrices (#4664)
Support for Generalized Universal Functions (#4675)
Support batched pinv (#4686)
Add CuPy JIT Kernel definition (#4774)
Add cupy.random.Generator.standard_normal (#4885)
Support tuple in CuPy JIT (#4890)
Add exponential distribution to random API (#4915)
Support tuple indexing in CuPy JIT (#4939)
Support __syncthreads() in CuPy JIT (#4941)

Enhancements

Support nvrtcGetSupportedArchs (#4691)
Update DLPack support (#4695)
Bump cuDNN to v8.1.1 in library installer tool (#4780)
Support norm='forward'/'backward' in cupy.fft functions (#4797)
Fix for flake8 F541 (#4803)
Complete build only when all of the essential modules are available (#4815)
Support norm='forward'/'backward' in cupyx.scipy.fft functions (#4816)
Support cuSparse functions for matrix conversion added in CUDA 11.2 (#4844)
Add NCCL to library installer (#4848)
Improve cuTENSOR installer (#4852)
Support cupy.ndarray type shift in cupy.roll (#4884)
Fix uniform random generation interval (#4894)
Use NVCC --threads option when building CuPy (#4908)
Bump headers to CUDA 11.2.2 (#4911)
Update preload to look for lib directory to support cuTENSOR/NCCL (#4912)
Move the NCCL module to cupy_backends.cuda.libs (#4919)
Add cupy/cuda/cutensor.py (#4920)

Performance Improvements

Improve batched SVD (#4731)
Avoid evaluating PTDS environment variable every time (#4842)

Bug Fixes

Fix dtypes in cupy.linalg (#4363)
Fix: avoid redeclaring attributes (#4764)
Windows: Fix compiler error for CUB block reduction kernels (#4771)
Support int argument for Dirichlet shape (#4772)
Windows: Fix histogram test failures (#4777)
Windows: fix sparse matrix indexing type (#4778)
Unify linux/windows randint with NumPy (#4808)
Improve/fix csc/csr argmax/argmin (#4813)
ROCm: Fix sorting bug (#4823)
Fixed choice function for 0 samples from 0 candidates (#4830)
Fix redeclaration of sparse warning classes (#4837)
Fix cuFFT callback compilations - v2 (#4853)
Solve UnboundLocalError on copy_from_host_async (#4900)
Add out arg verifier in new random interface. (#4904)
Fix compilation error due to invalid complex-to-real casting in _SimpleReductionKernel (#4909)
Fix C++ compilation error (#4922)
Fix cutensor import (#4933)
Fix flaky CUDAarray tests (#4946)
Declare CArray._indexing() only in CuPy JIT mode (#4951)

Code Fixes

Rename submodules under cupy.testing package (#3868)
Fix: code quality issues (#4587)
Use newest versions of stylecheck packages (#4694)
Clean-up sparse max/min argmax/argmin (#4860)

Documentation

Use pydata_sphinx_theme in Sphinx (#4351)
Remove cupy-cuda112 support from documentation (#4761)
Revert "Remove cupy-cuda112 support from documentation" (#4785)
Fix broken Stream docs (#4843)
Reformat environment variables table (#4845)
Revert memory back to reference (#4857)
Update wheel list in README (#4910)
Merge ROCm installation guide (#4928)
Document that cuDNN and NCCL are no longer included (#4932)
Update install docs (#4943)

Installation

Support optional dependencies from Conda-Forge (#4873)
Bump version to v9.0.0rc1 (#4953)
Bump Docker image to use CUDA 11.2 (#4972)

Tests

Show config on Windows CI (#4649)
Windows: Fix test condition for CUB device kernels (#4776)
Xfail some tests for cupyx.scipy.statistics.correlation under ROCm/HIP (#4781)
Windows: fix vectorize tests (#4794)
Windows: fix OOM errors in the CI (#4801)
Windows: Fix sepfir2d tests (#4804)
Windows: Fix cuTENSOR tests (#4806)
Windows: Fix cuTENSOR tests (#4818)
Remove AppVeyor configurations (#4836)
Windows: Fix test_poly1d_pow_scalar (#4854)
Fix for flake8 E741 (#4888)
Windows: Skip failing cuDNN tests (#4893)
Add names for workflows (#4913)
Prioritize FlexCI daemon in Windows CI (#4916)
Fix to work with scheduled FlexCI job (#4929)
Change irfft tests tolerance (#4937)
Xfail tests for ndarray indexing under HIP (#4653)
Adjust tolerance of TestPolyArithmeticDiffTypes under HIP/ROCm (#4657)
Xfail tests in polynomial roots (#4658)
Xfail tests for manipulation dims under HIP/ROCm (#4662)
Xfail TestPolyfitParametersCombinations when deg == 0 under ROCm/HIP (#4758)
Xfail TestPolyfitCovMode when deg == 0 under ROCm/HIP (#4759)
Xfail TestInvh under ROCm/HIP (#4760)
ROCm: remove the need to set HCC_AMDGPU_TARGET at runtime (#4766)
Assert MT19937 not implemented in hipRAND (#4769)
Xfail chi-squared test for some random functions under ROCm/HIP (#4770)
Remove duplicated typedef in example when HIP (#4782)
Xfail cuDNN version check test under ROCm/HIP (#4791)
Remove solved xfail mark for msort (#4792)
Fix to test checking HIP version (#4859)
Xfail test on sparse handle under ROCm/HIP (#4861)
Xfail some tests under ROCm/HIP (#4868)
Xfail some conditions of ndimage filter under ROCm/HIP (#4877)
Xfail some conditions of ndimage interpolation tests under ROCm/HIP (#4878)
Xfail some conditions of ndimage measurements under ROCm/HIP (#4879)
Xfail some conditions of signal tests under ROCm/HIP (#4880)

Others

Add CODEOWNERS file (#4757)
Add GitHub Actions workflow for automatic backport (#4812)
Fix pytest opts for Windows CI (#4820)
Use access token for automated backport (#4833)
Fix automated backport workflow (#4835)
Use pull_request_target trigger in backport automation (#4841)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @aryamccarthy @grlee77 @leofang @mattvend @povinsahu1909 @venkywonka @viantirreau @withshubh

cupy - v8.6.0

Published by asi1024 over 3 years ago

This is the release note of v8.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notes

Final release for v8.x series

We expect this version to be the final release for v8.x series. Please start testing your workloads with the latest v9.x pre-release.

CUDA 11.0 and 11.1 wheels for Windows not available yet in PyPI (#4971)

In the meantime they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes

Enhancements

Bump cuDNN to v8.1.1 in library installer tool (#4795)
Update DLPack support (#4849)
Bump headers to CUDA 11.2.2 (#4917)

Bug Fixes

[v8] Fix linalg.pinv on empty matrices (#4783)
Windows: Fix histogram test failures (#4784)
Windows: fix sparse matrix indexing type (#4796)
Support int argument for Dirichlet shape (#4798)
Windows: Fix compiler error for CUB block reduction kernels (#4814)
ROCm: Fix sorting bug (#4826)
Unify linux/windows randint with NumPy (#4827)
Fix dtypes in cupy.linalg (#4839)
Fixed choice function for 0 samples from 0 candidates (#4851)
Improve/fix csc/csr argmax/argmin (#4858)
Fix cooperative kernel launch (#4887)

Code Fixes

Use newest versions of stylecheck packages (#4800)
Fix: code quality issues (#4832)

Documentation

Remove cupy-cuda112 support from documentation (#4762)
Revert "Remove cupy-cuda112 support from documentation" (#4786)
Reformat environment variables table (#4856)

Installation

Bump version to v8.6.0 (#4954)

Tests

Windows: Fix test condition for CUB device kernels (#4793)
Windows: Fix cuTENSOR tests (#4818)
Remove AppVeyor configurations (#4846)
Windows: fix OOM errors in the CI (#4862)
Fix raw kernel test (#4871)
Windows: Fix test_poly1d_pow_scalar (#4889)
Windows: Skip failing cuDNN tests (#4901)
Add names for workflows (#4914)
Show config on Windows CI (#4918)
Prioritize FlexCI daemon in Windows CI (#4921)
Fix to work with scheduled FlexCI job (#4931)

Others

Add CODEOWNERS file (#4788)
Fix pytest opts for Windows CI (#4822)
Rename submodules under cupy.testing package (#4876)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @aryamccarthy @leofang @povinsahu1909 @withshubh

cupy - v8.5.0

Published by emcastillo over 3 years ago

This is the release note of v8.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes without compatibility

Always run cythonize on sdist installation (#4619)

When installing CuPy from the regular sdist package, Cython files are provided instead of .cpp ones, so an environment capable of running the latest Cython (0.29.22) is required.

Changes

Enhancements

Use NVRTC for grid synchronization in cupy.fuse (#4639)
Fix DLPack header version (#4640)
Bump cuDNN to v8.1.0 (#4674)
Fix tests for cuDNN 8.1 (#4699)
Use MSVC 14.0 with CUDA 11.2 (#4701)
Add cudnn and cutensor for CUDA 11.2 in install_library (#4703)
Fix Windows CI test script to avoid caching kernel when in pull-request test (#4707)
Support CUDA 11.2 (#4708)
ROCm: Fix filters in cupyx.scipy.ndimage - Part 1 (#4642)
Stop using deprecated type aliases (#4612)
Deprecate passing shape=None to mean shape=() (#4622)
Fix poly1d return types for NumPy 1.20 (#4623)
Update spec of linspace to NumPy 1.20 (Fix tests for incompatible behavior of NumPy 1.20 linspace) (#4625)

Bug Fixes

ndimage: coordinate rounding fix for order=0 interpolation (#4570)
Fix cupy.array from nested list of zero-dim ndarray (#4571)
Fix warning message on cuDNN version (#4580)
Fix empty NVRTC program name (#4599)
Fix device property default name (#4602)
Make normal and lognormal support array args (#4626)
Make uniform-based random distributions support array args (#4671)
Remove dependency on CUDA's math_constants.h (#4679)
Fix files not closed (#4681)
Fix cuSPARSE module build failure with CUDA 10.1 on Windows (#4715)
Always run cythonize on sdist installation (#4725)
Fix thrust compilation in MSVC 14 for CUDA 11.2 (#4728)
Missing constexpr in cupy_thrust.cu (#4732)
Disable some constexpr only for windows (#4740)
Eliminate read past end of array in percentile_weightnening (#4742)
Fix gesvdj_batched info array size (#4747)

Code Fixes

Remove use of numpy.bool (#4589)
Avoid test discovery failure in NumPy 1.20 (#4603)
Stop using deprecated numpy.complex (#4624)

Documentation

Update installation guide for cuTENSOR on Conda-Forge (#4634)
Update documentation requirements (#4637)
Support building docs against pip installed cupy (#4688)
Update document for cuDNN 8.1 (#4700)
Update Python / NumPy / SciPy requirements (#4712)
Add CUDA 11.2 to docs (#4716)
Remove unneeded environment variable from ROCm install guide (#4723)
Document that only x86_64 wheels are provided (#4724)
Add NCCL 2.8 to supported version (#4726)
Fix dead link to issue (#4744)
Remove cupy-cuda112 support from documentation (#4762)

Installation

Remove broken support for macOS (#4641)
Remove maximum cuDNN version check (#4697)
Bump Cython version requirement for Python 3.9 (#4733)
Remove pyproject.toml (#4748)
Fix Cython setup requirement out of sync (#4754)

Tests

Test numpy.VisibleDeprecationWarning (#4581)
Add FlexCI for Windows (#4645)
Publish results even if build failed in Windows CI (#4722)

Others

Bump version to v8.5.0 (#4752)
Bump Dockerfile version to v8.5.0 (#4753)

Contributors

The CuPy Team would like to thank all those who contributed to this release!
@grlee77 @leofang @wphicks

cupy - v9.0.0b3

Published by emcastillo over 3 years ago

This is the release note of v9.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

NumPy 1.20 and SciPy 1.6 support

We have ensured compatibility with the newly released NumPy 1.20 and SciPy 1.6.

`cupy.vectorize` Development

The development of the CUDA JIT for CuPy is progressing steadily. We currently support regular if/while/for statements and constant declarations. For now, the JIT is only used in cupy.vectorize, where support is almost complete. Its use will be extended in upcoming releases.
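
To make the supported subset concrete, here is a minimal sketch of a scalar function with an if/else passed to `vectorize`. The function name `my_func` and the NumPy fallback are assumptions added for illustration, so the snippet also runs where no GPU is available.

```python
try:
    import cupy as xp
    xp.zeros(1)  # allocating fails early when no usable GPU is present
except Exception:
    import numpy as xp  # fall back so the sketch still runs on CPU


def my_func(a, b):
    # A scalar function with an ordinary if/else; with CuPy this body is
    # compiled by the JIT, with NumPy it runs as plain Python.
    if a > b:
        return a - b
    else:
        return a + b


vfunc = xp.vectorize(my_func)
a = xp.array([1, 2, 3, 4])
b = xp.array([2, 2, 2, 2])
result = vfunc(a, b).tolist()
print(result)
```

Both inputs are passed as arrays of the same shape here to avoid relying on scalar broadcasting, which may differ between versions.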

ROCm (HIP) 4.0 support

Starting in v9.0.0b3, we are providing cupy-rocm-4-0 binary packages (wheels) for ROCm 4.0. Check the installation guide for the details.

We are improving ROCm/HIP support by fixing bugs and providing a stable CI environment to ensure smooth development.

Changes without compatibility

Always run cythonize on sdist installation (#4619)

When installing CuPy from the regular sdist package, Cython files are provided instead of .cpp ones, so an environment capable of running the latest Cython (0.29.22) is required.

Bump CUDA version for Docker images (#4709, #4738)

The current base Docker images have been updated from CUDA 10.2 to CUDA 11.2.

Changes

New Features

Support for Per Thread Default Stream (PTDS) (#4322)
Add splu, spilu and factorized to cupyx.scipy.sparse.linalg (#4392)
Support the built-in Stream Ordered Memory Allocator (#4537)
Add cuTENSOR to library download tool (#4560)
Support batched SVD (#4628)
Support if-statement in CuPy JIT (#4646)
Support while/for-statement in CuPy JIT (#4677)

Enhancements

Add SparseEfficiencyWarning and warn inefficient comparison (#4213)
Support ufunc call in CuPy JIT (#4347)
Use NVRTC for grid synchronization in cupy.fuse (#4492)
Wrap the new nvrtcGetCUBIN API (#4558)
Support explicit typecast in CuPy JIT (#4595)
Bump fastrlock requirement (#4632)
Bump cuDNN to v8.1.0 (#4636)
Fix Windows CI test script to avoid caching kernel when in pull-request test (#4650)
Update CUDA Array Interface to v3 - Part 2 (#4659)
Support assignment of compile-time constants in CuPy JIT (#4672)
Support cuTENSOR download tool on Windows (#4678)
Add cudnn and cutensor for CUDA 11.2 in install_library (#4680)
Fix tests for cuDNN 8.1 (#4689)
Use MSVC 14.0 with CUDA 11.2 (#4698)
Support CUDA 11.2 (#4702)
Slight code improvements for the CUB reduction kernel (#4504)
ROCm: Fix device attributes; Record HIP_VERSION; Add more device properties; etc (#4556)
Fix/skip some tests for HIP (#4588)
Skip failing tests for cupy.cublas under HIP (#4652)
Xfail test for cupy._indexing.generate under HIP (#4654)
Skip TestPoly1dMathArithmetic under HIP/ROCm (#4656)
Xfail tests for eigenvalue under HIP/ROCm (#4661)
Xfail tests for convolve under HIP/ROCm (#4668)
Fix definition order in preamble (#4669)
Fix invalid command line generated when rpath is empty (#4717)
Update spec of linspace to NumPy 1.20 (#4604)
Stop using deprecated type aliases (#4605)
Fix poly1d return types for NumPy 1.20 (#4611)
Deprecate passing shape=None to mean shape=() (#4616)
Improve floating point accuracy in percentile (#4617)

Bug Fixes

Fix cuSPARSE module build failure with CUDA 10.1 on Windows (#4419)
Fix warning message on cuDNN version (#4487)
Fix Jitify version detection on CUDA 11.0+ (#4514)
CUDA 11.2: Fix empty NVRTC program name (#4538)
Fix return type of has_sorted_indices of sparse array (#4564)
Fix device property default name (#4594)
Make normal and lognormal support array args (#4615)
Always run cythonize on sdist installation (#4619)
Make uniform-based random distributions support array args (#4638)
Fix typecast from complex to bool in JIT (#4648)
ROCm: disable PTDS (#4651)
Fix files not closed (#4667)
Revert _get_arch (#4682)
Fix random errors in batched SVD (#4690)
Fix thrust compilation in MSVC 14 for CUDA 11.2 (#4713)
Missing constexpr in cupy_thrust.cu (#4730)
Disable some constexpr only for windows (#4735)
Add missing attrs (#4739)

Code Fixes

Use cupy.core._dtype.to_cuda_dtype whenever possible (#3853)
Remove use of numpy.bool (#4586)
Avoid test discovery failure in NumPy 1.20 (#4598)
Stop using deprecated numpy.complex (#4620)
Use CodeBlock to generate human-readable code in JIT (#4633)

Documentation

Update doctest of cupyx.scipy.linalg.lu_factor (#4561)
Update installation guide for cuTENSOR on Conda-Forge (#4614)
Update documentation requirements (#4630)
Update document for cuDNN 8.1 (#4643)
Support building docs against pip installed cupy (#4685)
Add CUDA 11.2 to docs (#4705)
Add NCCL 2.8 to supported version (#4710)
Update Python / NumPy / SciPy requirements (#4711)
Remove unneeded environment variable from ROCm install guide (#4720)
Document that only x86_64 wheels are provided (#4721)
Fix dead link to issue (#4736)
Remove cupy-cuda112 support from documentation (#4761)

Installation

Update install requirements (#4631)
Remove maximum cuDNN version check (#4683)
Bump Cython version requirement for Python 3.9 (#4706)
Bump CUDA version for Docker images (#4709)
Fix Docker image to use CUDA 11.1 (#4763)
Remove pyproject.toml (#4734)
Fix Cython setup requirement out of sync (#4741)
Add Docker image for ROCm and documentation (#4737)

Tests

Add FlexCI for Windows (#4362)
Test numpy.VisibleDeprecationWarning (#4498)
Fix test import failure in ROCm (#4585)
Publish results even if build failed in Windows CI (#4714)
Skip callback tests for this release (#4745)

Others

Bump version to v9.0.0b3 (#4751)

Contributors

The CuPy Team would like to thank all those who contributed to this release!
@anaruse @leofang @pentschev @wphicks

cupy - v8.4.0

Published by emcastillo over 3 years ago

This is the release note of v8.4.0. See here for the complete list of solved issues and merged PRs.

Highlights

Gitter Community

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes without compatibility

Removal of older pre-release packages from PyPI

As announced in #4360, we have removed pre-release wheels earlier than v6.0.0rc1 from PyPI. Wheels for those versions can be found on the GitHub release page of each version and can be installed by specifying the -f option:

pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1

Changes

Enhancements

Import DLPack header file & Fix multiple issues (#4535)
Fix sparse format of kron (#4547)
Fix return type of polynomial.__eq__ (#4555)

Bug Fixes

Fix dev info allocation (#4501)
Use --device-c for RDC compile (#4505)
Fix cupy.concatenate typecheck for out with different dtype (#4528)
Fix cupy.take from an empty array (#4542)
Fix integer GEMM (#4551)

Tests

Test FutureWarning (#4510)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@leofang @mor2code

cupy - v9.0.0b2

Published by kmaehashi over 3 years ago

This is the release note of v9.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

Support for NumPy 1.17 New Random API (#4177)

This release adds preliminary support for the new random API introduced in NumPy 1.17.
Since our implementation is based on cuRAND, we currently support the following BitGenerator objects: XORWOW, MRG32k3a, and Philox4x3210. Note that they differ from NumPy's bit generators. The new random module is currently in development, and only a few distributions are supported in this first release (#4557). Please check the documentation for further reference.
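
The sketch below shows how the Generator-style API is typically used with one of the cuRAND-based bit generators. It assumes that `integers` is available in the installed CuPy version, and it falls back to NumPy's `PCG64` (an assumption added here) when no GPU is present, which also illustrates that the bit generator classes differ between the two libraries.

```python
try:
    import cupy as xp
    xp.zeros(1)  # allocating fails early when no usable GPU is present
    bit_generator = xp.random.XORWOW(1234)  # cuRAND-based bit generator
except Exception:
    import numpy as xp
    bit_generator = xp.random.PCG64(1234)  # NumPy has no XORWOW class

# The Generator wraps the bit generator, as in the NumPy 1.17 API.
rng = xp.random.Generator(bit_generator)
samples = rng.integers(0, 10, size=5).tolist()
print(samples)
```

Because the underlying bit generators differ, the same seed is not expected to give the same values in CuPy and NumPy.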

AMD/HIP support improved

Several bugs have been fixed for AMD devices, and support for ROCm 3.9 has been added. Almost all of the CuPy core functionality is now verified to work with HIP/ROCm. However, there are still some issues that require support from AMD, such as using arrays with more than 2^32 elements in element-wise and reduction routines.

Gitter Community

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes without compatibility

Removal of older pre-release packages from PyPI

pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1

Changes

New Features

Add NumPy v1.17 new random API with Generator (#4177)
Add cufftXtMakePlanMany and cufftXtExec (#4407)
Support texture object in cupy.ElementwiseKernel (#4433)
Add cupyx.scipy.signal.oaconvolve (#4468)
Add cupy.round (#4539)

Enhancements

ndimage: implement higher order spline interpolation (#4402)
ndimage: complex dtype support in filters and interpolation (#4444)
Fix potential race when compiling cuFFT callbacks (#4508)
Import DLPack header file & Fix multiple issues (#4517)
Remove experimental warnings from cupyx.scipy.ndimage (#4518)
Add complex support to cupyx.scipy.signal functions (#4525)
Implement __format__ in ndarray (#4544)
Fix DLPack header version (#4559)
Support passing pointers as integers in memory copy routines (#4562)
Fix sparse format of kron (#4536)
Fix return type of polynomial.__eq__ (#4554)
ROCm: Improve compiler (#4102)
ROCm: Cover more new BLAS/SOLVER functions (#4217)
Fixes a bunch of AMD issues (#4485)
Fix fusion tests for HIP (#4521)
ROCm: skip multi-GPU FFT tests (#4553)

Performance Improvements

Improve cupy.argwhere and cupy.nonzero (#4367)

Bug Fixes

Use --device-c for RDC compile (#4470)
Fix integer GEMM (#4512)
Fix axis of cupy.gradient (#4523)
Fix bug on bad interaction between __cuda_array_interface__ and __array_ufunc__ on HIP (#4524)
Fix cupy.concatenate typecheck for out with different dtype (#4527)
Fix cupy.take from an empty array (#4530)
Fix windows build of random module (#4543)
ndimage: coordinate rounding fix for order=0 interpolation (#4552)
Fix return type of polynomial.__eq__ to numpy.bool_ (#4563)
Remove dependency on CUDA's math_constants.h (#4565)
Fix cupyx.scipy.ndimage.zoom for outputs of size 1 (#4568)
Fix cupy.array from nested list of zero-dim ndarray (#4569)
Fix Windows build failure in new random API (#4574)

Code Fixes

Refactor axis errors (#4488)
Remove deprecated stubs for CUDA<9.2 (#4545)

Documentation

Document random.uniform may return high (#4509)

Tests

Test FutureWarning (#4226)
Show type of error in numpy_cupy_* decorators (#4403)
Update PyTest (#4507)
Remove unnecessary return statements (#4519)
Check if the accept_error was not raised from test code (#4566)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @coderforlife @grlee77 @leofang

cupy - v8.3.0

Published by asi1024 almost 4 years ago

This is the release note of v8.3.0. See here for the complete list of solved issues and merged PRs.

Changes

Enhancements

Inherit environment variable and detect cl.exe automatically (#4417)
Update CUDA Array Interface to v3 - Part 1 (#4446)

Bug Fixes

Fix cupy.random.bytes not working (#4323)
Fix rcond arg of linalg.lstsq (#4408)
Fix linalg.lstsq for complex types (#4426)
Fix cupy.searchsorted on HIP (#4447)
Fix out-of-bound access in ndimage rank filters (#4449)
Support complex types in solve_triangular (#4459)

Code Fixes

Rename submodules under cupy.lib (#4353)
Make names of test classes start with Test (#4372)

Documentation

Update links to forums in README (#4346)
Fix comment in docs/source/reference/statistics.rst (#4386)
add scipy.fft module to the API comparison table (#4391)
Fix docs of cupy.random functions/methods (#4474)

Installation

Fix parallel build (#4349)
Reset extra_compile_args for each module (#4384)
Disentangle HIP from CUDA in the build script (#4430)
Add support for cuTENSOR 1.2.2 (#4462)

Tests

Remove travis (#4376)
Refactor test of linalg.lstsq (#4425)
Update [jenkins] requirement (#4473)
Exclude unsupported dtypes for TestOrderFilter (#4480)

Others

Configure Mergify to check GitHub Actions instead of Travis (#4381)
Bump version to v8.3.0 (#4500)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @grlee77 @leofang

cupy - v9.0.0b1

Published by asi1024 almost 4 years ago

This is the release note of v9.0.0b1. See here for the complete list of solved issues and merged PRs.

Changes without compatibility

Deprecations

As announced in #4300, CuPy v9 no longer supports the following outdated components:

Python 3.5
CUDA 9.0 (cupy-cuda90 will not be released for v9)
cuDNN v7.5 (or earlier) and NCCL v2.3 (or earlier)
NumPy 1.16 and SciPy 1.3

Removal of older pre-release packages from PyPI

As announced in #4360, we are going to remove pre-release wheels earlier than v6.0.0rc1 from PyPI on 2021-01-28. Wheels for those versions can be found on the GitHub release page of each version and can be installed by specifying the -f option:

pip install --pre cupy-cuda101 -f https://github.com/cupy/cupy/releases/v6.0.0rc1

Changes

New Features

Adding cupyx.scipy.signal.fftconvolve (#3828)
Add cupyx.scipy.sparse.linalg.gmres (#4236)
Add cupyx.scipy.sparse.linalg.interface that exposes LinearOperator Feature (#4258)
Add cupyx.scipy.sparse.csr_matrix.diagonal and cupyx.scipy.sparse.csr_matrix.setdiag (#4284)
Support CUB-backed cupy.ReductionKernel and @cupy.fuse (#4289)
Add batched cholesky solver (#4291)
cupyx.ndimage.measurements: add center_of_mass, histogram, labeled_comprehension (#4311)
Add cudaLaunchHostFunc (#4338)
Add cupyx.scipy.sparse.linalg.spsolve_triangular (#4356)
Add cupyx.scipy.ndimage.fourier_ellipsoid (#4361)
Add cupyx.scipy.stats.entropy (#4369)
Add cupy.quantile (#4370)
Add cupyx.scipy.sparse.linalg.spsolve (#4375)
Add tril, triu and find to cupyx.scipy.sparse (#4382)
Add cupy.interp (#4418)
Add support for LinearOperator to cg and gmres in cupyx.scipy.sparse.linalg (#4422)

Enhancements

Adding C++ constructors for CArray and CIndexer (#3683)
Fix ndarray binary operations with non-CuPy arrays to return NotImplemented (#4198)
Change the wrapper arguments for cuBLAS L3 functions (#4263)
Revise dtype handling in cupyx.scipy.ndimage.spline_filter (#4314)
Update CUDA Array Interface to v3 - Part 1 (#4357)
Inherit environment variable and detect cl.exe automatically (#4397)
Fix poly1d (#4399)
ndimage: support all interpolation boundary modes (#4400)
ndimage: add grid_mode option to zoom (#4401)
Add support for LinearOperator to eigsh and svds in cupyx.scipy.sparse.linalg (#4428)
Update to use the new CArray and CIndexer constructors (#4463)
Deprecations for v9 (#4479)
Disable __cuda_array_interface__ for HIP (#4482)

Performance Improvements

Improve scan_core in cupy/core/_routines_math.pyx (#4316)
Improve scan in cupy/core/_routines_math.pyx (#4366)
Fixing minor TODO in scan (#4464)

Bug Fixes

Apply fixes to the FFT callback module (#4355)
Fix rcond arg of linalg.lstsq (#4365)
Fix cufft callback manager error in Windows (#4377)
Fix linalg.lstsq for complex types (#4390)
Fix cupy.searchsorted on HIP (#4437)
Fix out-of-bound access in ndimage rank filters (#4439)
Support complex types in solve_triangular (#4452)
Eliminate read past end of array in percentile_weightnening (#4453)
cupy.random.normal fix broadcasting of scale and loc arguments (#4457)
Fix __cuda_array_interface__ for HIP (#4458)
Fix overwritten issue in cumsum and cumprod (#4460)
Fix dev info allocation (#4491)

Code Fixes

Rename submodules under cupy.lib (#3713)
Clean up cupy_backends (#4088)
Replace stubs in ndimage by Jitify (#4264)
Refactor cupyx.scipy.sparse.linalg.eigsh (#4275)
Simplify cupy.linalg.inv (#4293)
Make names of test classes start with Test (#4320)
Refactor traceback in testing.helper (#4442)

Documentation

Add scipy.fft module to the API comparison table (#4278)
Fix docs of cupy.random functions/methods (#4319)
Update links to forums in README (#4345)
Incorrect environment variable description (#4410)
Add spsolve_triangular to the reference (#4438)
Fix doc of LinearOperator (#4441)

Installation

Remove broken support for macOS (#3857)
Fix parallel build (#4348)
Add support for cuTENSOR 1.2.2 (#4404)
Disentangle HIP from CUDA in the build script (#4416)

Tests

Set __module__ attr to parameterized class (#4239)
testing.numpy_cupy_allclose with per-dtype tolerance (#4269)
Refactor fft_tests: unittest -> pytest (#4287)
Fix testing.helper to work without unittest (#4304)
Refactor ndimage_tests (#4307)
Update [jenkins] requirement (#4325)
Add broadcast test for cupy.vectorize (#4341)
Parameterize sparse matrix tests (#4368)
Remove travis (#4373)
Fix some FFT test issues (#4385)
Refactor test of linalg.lstsq (#4388)
Require SciPy >= 1.4 in TestGmres (#4394)
Run FFT callback test only on Linux (#4411)
Refactor TestLinearOperator (#4413)
Require SciPy 1.4 in TestLinearOperator (#4420)
Fix RawKernel test (#4472)
Exclude unsupported dtypes for TestOrderFilter (#4477)
Fix TestEntropy: Requires scipy>=1.4.0 for axis parameter (#4481)
Skip large size test for hip (#4461)

Others

Drop support for Python 3.5 in CuPy v9 (#4340)
Configure Mergify to check GitHub Actions instead of Travis (#4378)
Bump version to v9.0.0b1 (#4499)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @carterbox @coderforlife @dmargala @grlee77 @leofang @mor2code @venkywonka @wphicks

cupy - v8.2.0

Published by asi1024 almost 4 years ago

This is the release note of v8.2.0. See here for the complete list of solved issues and merged PRs.

Changes

Enhancements

Record Cython build version (#4188)
Add parallel build feature (#4273)
Bump cuDNN to v8.0.5 (#4313)
Defer import in cupy/_environment.py (#4329)

Bug Fixes

Fix broadcasting behavior in ndimage.measurements functions (#4204)
Refactor AssertFunctionIsCalled (#4253)

Code Fixes

Rename submodules under cupyx.linalg package (#4202)
Use assert statement instead of self.assert* methods (#4297)

Documentation

Add cupy-cuda111 to README (#4212)
Add missing functions to the API reference (#4257)
cupy-cuda111 package now on PyPI (#4335)

Tests

Fix tests of __bytes__ (#4255)
Fix numpy_cupy_equal for the case where both NumPy and CuPy raise errors (#4260)
Use GitHub Actions (#4286)
Skip some failing tests for fp16 + CUDA 9.0 (#4324)
Add import test for ROCm (#4334)

Others

Bump version to v8.2.0 (#4332)
ROCm: Support hipCUB/rocPRIM (#4327)
Fix output dtype of linalg.norm (#4230)
Warn non-tuple sequence for multidimensional indexing (#4285)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang

cupy - v9.0.0a2

Published by emcastillo almost 4 years ago

This is the release note of v9.0.0a2. See here for the complete list of solved issues and merged PRs.

Update (2020-12-02): Unfortunately, the Windows build of this release is not working. We have taken down Windows wheels from PyPI, but if you need one for reference purposes you can still download them from the Assets section below. We are working hard to resolve this issue towards the next v9.0.0b1 release.

Highlights

`cupy.vectorize` & Initial CUDA JIT support

With this release, we are including a very early version of a Python-to-CUDA transpiler that lets users write their own CUDA kernels in Python, similar to what Numba does. However, while Numba works on the bytecode and emits PTX directly using LLVM, our approach uses the Python AST to translate the source code to CUDA C and compiles it with the NVIDIA toolchain, aiming for higher performance in the long run.

import cupy


def f(x, y):
    # This code will be compiled to a CUDA kernel by our JIT
    return x * x + y * y


x = cupy.linspace(0, 10, 6)
y = cupy.linspace(0, 20, 6)


func = cupy.vectorize(f)
out = func(x, y)
# out is [  0.  20.  80. 180. 320. 500.]

This initial version supports only a limited set of primitive operators, and we will extend it in upcoming releases. Check out #4290 if you are interested.
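
To make the AST-based approach described above more concrete, here is a toy sketch that turns a single-return Python function into a C-like device function using only the standard ast module. It is not CuPy's JIT implementation: the function names, the fixed double type, and the supported operator subset are all assumptions chosen for illustration.

```python
import ast

# Map a few Python AST operator node types to their C spelling.
_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node):
    """Translate a small subset of Python expressions to a C expression string."""
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        op = _BINOPS[type(node.op)]
        return f"({expr_to_c(node.left)} {op} {expr_to_c(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    raise NotImplementedError(ast.dump(node))


def function_to_c(source, ctype="double"):
    """Translate `def f(args): return <expr>` into a C-like device function."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    if len(func.body) != 1 or not isinstance(func.body[0], ast.Return):
        raise NotImplementedError("only a single return statement is handled")
    args = ", ".join(f"{ctype} {a.arg}" for a in func.args.args)
    body = expr_to_c(func.body[0].value)
    return f"__device__ {ctype} {func.name}({args}) {{ return {body}; }}"


SRC = "def f(x, y):\n    return x * x + y * y\n"
print(function_to_c(SRC))
# prints: __device__ double f(double x, double y) { return ((x * x) + (y * y)); }
```

A real transpiler also needs type inference, control flow, and kernel launch glue; the sketch only shows why working from the AST makes a source-to-source translation straightforward.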

Jitify for raw kernels and modules

Thanks to @leofang, RawKernel and RawModule can now use headers and libraries that previously could not be used because of the reliance on NVRTC. With the new jitify=True option, Jitify is applied to your code so that you can use libraries such as the cuRAND device API or CUB device routines in your raw kernels.
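
As a minimal sketch of the jitify=True option described above, the following raw kernel includes a CUB header and computes one partial sum per thread block. The kernel, the block_sums helper, and the sizes are hypothetical examples rather than code from the CuPy repository, and running the helper assumes a CUDA GPU and a CuPy build with Jitify support.

```python
THREADS = 128  # must match the block size baked into the kernel source below

KERNEL_SOURCE = r'''
#include <cub/block/block_reduce.cuh>

extern "C" __global__
void block_sum(const float* x, float* out) {
    // One partial sum per block of 128 threads, computed by CUB.
    typedef cub::BlockReduce<float, 128> BlockReduce;
    __shared__ typename BlockReduce::TempStorage temp;
    float v = x[blockIdx.x * blockDim.x + threadIdx.x];
    float s = BlockReduce(temp).Sum(v);
    if (threadIdx.x == 0) out[blockIdx.x] = s;
}
'''


def block_sums(n_blocks=4):
    """Compile the kernel with Jitify and return one sum per block."""
    import cupy  # imported lazily so the module loads without a GPU

    # jitify=True lets Jitify resolve the CUB include before NVRTC compiles it.
    kernel = cupy.RawKernel(KERNEL_SOURCE, "block_sum", jitify=True)
    x = cupy.ones(n_blocks * THREADS, dtype=cupy.float32)
    out = cupy.zeros(n_blocks, dtype=cupy.float32)
    kernel((n_blocks,), (THREADS,), (x, out))
    return out
```

Calling block_sums() on a machine with a supported GPU compiles the kernel on first use; without jitify=True the CUB include would be expected to fail under plain NVRTC, which is the limitation this highlight addresses.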

`cupyx.lapack` now as a public interface to cuBLAS

Until now, cuBLAS & cuSOLVER bindings were not publicly exposed in the API. However, with the introduction of cupyx.lapack by @anaruse, now it is possible to use LAPACK compatible routines backed by cuBLAS & cuSOLVER with a much simpler interface.

Deprecations in upcoming releases

We are going to drop support for Python 3.5 and obsolete libraries such as CUDA 9.0 and NumPy 1.16. Leave a comment in #4300 if you have any concerns in your use-case.

Changes

New Features

Add cupy.cusolver.gesv that uses cusolverDn<t1><t2>gesv (#3917)
Add cupy.cusolver.gels that uses cusolverDn<t1><t2>gels (#4073)
Add cupy.vectorize (#4135)
Support cuFFT callbacks (#4141)
Add spline_filter1d and spline_filter to cupyx.scipy.ndimage.interpolation (#4145)
Add cupyx.scipy.sparse.linalg.svds (#4155)
Improve coverage of cuBLAS L1 functions (#4205)
Change the wrapper arguments for cuBLAS L2 functions (#4221)
Add cupyx.scipy.sparse.linalg.cg (#4222)
Support Jitify (#4228)
Add cupyx.lapack (#4235)

Enhancements

Remove cupy.testing.NumpyError (#4225)
Fix an issue in cupyx.scipy.sparse.linalg.eigsh with CUDA 9.2 (#4231)
Add parallel build feature (#4240)
Support compile-time constants in CuPy JIT (#4241)
Detect the CC of the device when building (#4242)
Reduce the file size of cuFFT callback modules (#4267)
Bump cuDNN to v8.0.5 (#4303)
Detect Jitify version (#4306)

Performance Improvements

Improve cupy.random.randint (#4160)
Improve performance of cupyx.scipy.sparse.linalg.eigsh (#4214)
Improve convolve/correlate (#4248)
Improve Jitify performance (#4277)

Bug Fixes

Respect user-supplied output array in all binary morphology functions (#4157)
Refactor AssertFunctionIsCalled (#4233)
Fix possible redefinition of "-ccbin" in cupy.fft._callback (#4276)
Fix undefined symbols in cupy.fft._callback (#4283)
Fix issues when coo sparse matrix is created from dense matrix (#4295)
Fix cupy.random.bytes not working (#4318)

Code Fixes

Use .imag = 0 at hipFFT workaround (#4234)
Use assert statement instead of self.assert* methods (#4292)

Documentation

Add a note in free_all_blocks reference (#4196)
Add cupy-cuda111 to README (#4210)
Add missing functions to the API reference (#4215)
Add description of env vars for parallel build and auto cc detection (#4250)
Add spline_filter functions to ndimage docs (#4265)
cupy-cuda111 package now on PyPI (#4333)

Installation

Reset extra_compile_args for each module (#4336)
Tentatively hide CUPY_NUM_BUILD_JOBS option (#4339)

HIP/ROCm

ROCm: Fix filters in cupyx.scipy.ndimage - Part 1 (#4271)
ROCm: fix ndimage interpolation (#4301)

Tests

Remove unused features in testing.parameterized (#4178)
Add pytest backend implementation of testing.parameterize (#4192)
Use Python 3.6 in Travis CI (#4206)
Add repr of parameterized test and stop adding error message (#4211)
Fix numpy_cupy_equal for the case where both NumPy and CuPy raise errors (#4244)
Fix tests of __bytes__ (#4252)
Use GitHub Actions (#4261)
Skip some failing tests for fp16 + CUDA 9.0 (#4299)
Assert the results are bool (#4310)
Add import test for ROCm (#4326)

Others

Bump version to v9.0.0a2 (#4331)
Add equal_nan toggle for NaN values in array_equal (#4203)
Fix output dtype of linalg.norm (#4227)
Warn non-tuple sequence for multidimensional indexing (#4245)
DeprecationWarning on truth value of an empty array (#4308)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @aitikgupta @anaruse @leofang

cupy - v8.1.0

Published by kmaehashi almost 4 years ago

This is the release note of v8.1.0. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.1 Support

Support for CUDA 11.1 was added in #4184. With CUDA 11.1, GeForce RTX 30 series and Quadro RTX series GPUs can now be used with CuPy.

Notes on Wheel Packages

Update (2020-11-25): cupy-cuda111 is now available on PyPI.
CuPy for CUDA 11.1 (cupy-cuda111) wheel packages are currently only available for Windows. We are going to publish Linux wheels once we get approval from the PyPI team. Meanwhile, Linux wheels can be downloaded from the Assets section below (or pip install cupy-cuda111 -f https://github.com/cupy/cupy/releases/tag/v8.1.0).

New Features

Add sparse pointwise equality & inequality functions (#4004)
Add cudaGetDeviceProperties (#4103)
Add order option in cupy.testing.shaped_random (#4104)
Add support for CUDA 11.1 (#4191)

Enhancements

Bump cuDNN to v8.0.4 (#4069)
Show numpy and scipy versions in show_config (#4079)
Support pickling cupy.RawKernel (#4154)

Bug Fixes

Fix csr2csc for zero-size matrix (#3922)
Add a kernel for integer GEMM (#4067)
Fix potential segfault when reduction axis is empty (#4068)
Workaround cudaPointerGetAttributes error in CUDA 10.2+ (#4089)
Add work-around for issue in cutensorReduction of cuTENSOR 1.2.1 (#4098)
Fix argmax and argmin for F-order inputs (#4106)
Fix CUB block reduction for F-order arrays with ndim > 2 (#4109)
ROCm: Fix getDeviceProperties for HIP (#4113)
Fix argmax/argmin in CUB block reduction for F-order arrays with ndim > 1 (#4115)
Fix typos in cupy.cuda.cufft (#4117)
Handle np.nan and np.inf constant values properly in ndimage functions (#4133)
Fix 64-bit int types in type_dispatcher.cuh (#4134)
Add compute_35 for CUDA 11.0+ (#4140)
Fix device properties for cuda 9.2 (#4152)
Fix mode='opencv' case in cupyx.scipy.ndimage.affine_transform (#4158)
Fix argwhere for 0d inputs (#4174)
Fix to use current stream properly with CUDA-related libraries (#4175)
Add compute capability checking for cublasGemmEx() (#4180)
Fix cupyx.seterr() when linalg not supplied (#4189)
Fix nonzero for 0d inputs (#4190)

Code Fixes

Rename submodules under cupyx.scipy.sparse (#3959)
Rename submodule under cupy.fft package (#4066)
Hide private names in cupy.cusolver (#4076)
Move _normalize_axis_index to cupy/core/internal.pyx (#4086)
Rename cupyx.rsqrt submodule (#4116)
Rename submodules under cupyx.scipy.special (#4119)
Move matmul from core.pyx to _routine_linalg.pyx (#4123)
Hide private names in cupy.cutensor (#4147)
Rename cupy.manipulation submodule to cupy._manipulation (#4181)
Rename cupy.io submodule to cupy._io (#4183)
Rename submodule under cupyx.scipy.fft (#4186)
Rename submodules under cupy.linalg package (#4187)

Documentation

Fix typo (#4056)
Update README and docs for a unified tagline (#4074)
Improve the plan cache documentation (#4087)
Simplify ROCm install guide (#4128)

Installation

Add CUDA_VERSION define for Cython compilation (#4035)

Tests

Require SciPy 1.2 for sparse comparison (#4041)
Make parameterized dtype test skip by pytest.skip (#4179)
Code fix on tests for cupyx.scipy.ndimage stats functions (#4182)
Fix tests that have side effects (#4185)

HIP/ROCm

ROCm: Fix bugs and test suites to make ROCm/HIP happy - Part 2 (#4063)
ROCm: Build on the latest ROCm (#4126)

Others

Bump version to v8.1.0 (#4195)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @garanews @grlee77 @leofang
