cupy

NumPy & SciPy for GPU

MIT License

Downloads
758.5K
Stars
7.7K
Committers
370


cupy - v11.1.0

Published by emcastillo about 2 years ago

This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

Changes

New Features

  • Add cupyx.scipy.special.log_softmax (#6966)
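As a reference for what log_softmax computes, here is a pure-Python sketch of the standard definition, log_softmax(x)_i = x_i - log(sum_j exp(x_j)). It is a CPU stand-in for illustration only and not CuPy's implementation; the helper name log_softmax_ref is hypothetical.

```python
import math


def log_softmax_ref(xs):
    """Reference log-softmax over a flat list of floats (illustrative only)."""
    # Subtract the maximum first so exp() cannot overflow for large inputs.
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


out = log_softmax_ref([1.0, 2.0, 3.0])
# Exponentiating the outputs yields a probability distribution.
total = sum(math.exp(v) for v in out)
```

The max-subtraction step is the usual way to keep the computation numerically stable; whether the GPU routine uses the same formulation is not stated in this release note.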

Enhancements

  • Update cupy.array_api (#6929)
  • Add CC 8.7 for Jetson Orin (#6950)
  • Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6951)
  • Fix augassign target is evaluated twice in JIT (#6964)
  • Update cupy.array_api (cont'd) (#6973)
  • Support SciPy 1.9 (#6981)
  • Enhance JIT error message when __device__ option is missing (#6991)
  • JIT: Add type annotation in _compile.py (#6993)
  • Make testing decorators able to use with @pytest.mark.parametrize in some cases (#7010)
  • Support keepdims parameter for average (#6897)
  • Support equal_nan parameter for unique (#6904)

Bug Fixes

  • Fix CUB reduction with zero-size arrays (#6968)
  • Fix cuTensorNet shim layer (#6979)
  • Fix SciPy dependency leak (#6980)
  • Fix occasional hang in sparse distributed (#6997)
  • Let argpartition use the kth argument properly (#7020)

Code Fixes

  • Remove proxy functions for softlink (#6946)
  • Suppress nvcc warning (#6970)

Documentation

  • Document matmul supports out (#6899)
  • Bump documentation build requirements (#6930)
  • Expand breaking change & best practice on device management (#6952)
  • Fix docs build failure (#6967)

Tests

  • Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6963)
  • Use ubuntu-22.04 as GitHub Actions runner image (#6992)
  • Revert comment fix (#6996)
  • Filter warnings from setuptools 65 (#7004)
  • CI: bump CUDA version used in cuda-python test (#7023)
  • CI: Show all errors when doc build fail (#6945)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar

cupy - v11.0.0

Published by asi1024 about 2 years ago

This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since v11.0.0rc1 release. Check out our blog for highlights in the v11 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

cupy-wheel package

Until now, downstream projects depending on CuPy have had a hard time specifying a binary wheel as a dependency, and it was the users’ responsibility to install the correct package in their environments. CuPy v10 introduced the experimental cupy-wheel meta-package. In this release, we declare this feature ready for production environments. cupy-wheel examines the user’s environment and automatically selects the matching CuPy binary wheel to install.

Changes

For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).

Enhancements

  • Support deg in cupy.angle (#6909)
  • Update cupy-wheel for v11 (#6913)
  • Relaxed C-contiguous requirement for changing dtype of different size (#6850)

Bug Fixes

  • Generate CUBIN for all supported GPUs at build time (#6881)
  • Fix real and imag in subclass (#6907)

Code Fixes

  • Fix function names (#6878)

Documentation

  • Fix ROCm supported versions in compat matrix (#6851)
  • Generate docs for private classes in one location (#6858)

Installation

  • Bump version to v11.0.0 (#6915)

Tests

  • Update tags for FlexCI projects (#6860)
  • CI: Add ROCm 5.1 and 5.2 (#6861)
  • Add config for cupy.win.cuda117 (#6885)

Others

  • Bump branch version to v11 (#6845)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@emcastillo @kmaehashi @takagi

cupy - v10.6.0

Published by kmaehashi over 2 years ago

This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.

This is the last planned release for CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes in v11.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117

Changes

Enhancements

  • Improve warning message in sparse (#6675)
  • Support CUDA 11.7 (#6794)
  • Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6795)
  • cupy-wheel: Use NVRTC to infer the toolkit version (#6831)

Bug Fixes

  • Fix cupy.median for NaN inputs (#6760)
  • Fix batched matmul for integral numbers (#6777)

Documentation

  • Add CUDA 11.7 on documents (#6801)

Tests

  • Fix Dockerfile broken for array-api tests (#6518)
  • Skip ndimage.filter tests for ROCm 4.0 (#6676)
  • Xfail a test of LOBPCG on ROCm 5.0+ (#6733)
  • CI: Fix prep script to show build failure details (#6784)
  • Fix a potential variable misuse bug (#6788)
  • Fix CI Docker image build failing in head test (#6808)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi

cupy - v11.0.0rc1

Published by kmaehashi over 2 years ago

This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre

Unified Binary Package for CUDA 11.2 or later (#6730)

CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces per-CUDA version binary packages (cupy-cuda112, cupy-cuda113, …, cupy-cuda117) provided in CuPy v10 or earlier.

Note that CUDA 11.1 or earlier still requires per-CUDA version binary packages. cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.
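The package naming described above can be summarized as a small lookup. This is only a sketch of the mapping stated in this release note; the function name binary_package_for_cuda is hypothetical, and CUDA versions not mentioned here are deliberately rejected rather than guessed.

```python
def binary_package_for_cuda(major, minor):
    """Map a CUDA toolkit version to the CuPy v11 binary package name."""
    # Per-version packages that remain for CUDA 11.1 or earlier.
    per_version = {
        (10, 2): "cupy-cuda102",
        (11, 0): "cupy-cuda110",
        (11, 1): "cupy-cuda111",
    }
    # All CUDA 11.2+ releases share the unified package.
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    if (major, minor) in per_version:
        return per_version[(major, minor)]
    # Anything else is not covered by this release note.
    raise ValueError(f"no package listed for CUDA {major}.{minor}")


package = binary_package_for_cuda(11, 7)
```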

Binary Package for Arm Platform (#6705)

CuPy v11 provides a cupy-cuda11x binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64

Support for ndarray subclassing (#6720, #6755)

This release allows users to subclass cupy.ndarray, using the same protocol as NumPy:

import cupy

class C(cupy.ndarray):

    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'

Note that view casting and new from template mechanisms are also supported as described by the NumPy documentation.

Add Collective Communication APIs in cupyx.distributed for Sparse Matrices

All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install mpi4py in order to perform an efficient metadata exchange.

Google Summer of Code 2022

We would like to give a warm welcome to @khushi-411, who will be working on adding support for the cupyx.scipy.interpolate APIs as part of her GSoC internship!

Changes without compatibility

Bump base Docker image to the latest supported one (#6802)

CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.

Changes

New Features

  • Add cupy.setxor1d (#6582)
  • Add initial cupyx.spatial.distance support from pylibraft (#6690)
  • Support cupy.ndarray subclassing - Part 2 - View casting (#6720)
  • Add sparse broadcast (#6758)
  • Add sparse reduce (#6761)
  • Add sparse all_reduce and minor fixes (#6762)
  • Add sparse all_to_all, reduce_scatter, send_recv (#6765)
  • Support cupy.ndarray subclassing - Part 3 - New from template (ufunc) (#6775)
  • Add cupyx.scipy.special.log_ndtr (#6776)
  • Add cupyx.scipy.special.expn (#6790)

Enhancements

  • Utilize CUDA Enhanced Compatibility (#6730)
  • Fix to return correct CUDA version when in CUDA Python mode (#6736)
  • Support CUDA 11.7 (#6767)
  • Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6791)
  • Utilize CUDA Enhanced Compatibility in all wrappers (#6799)
  • Add support for cupy-cuda11x wheel (#6800)
  • Bump base Docker image to the latest supported one (#6802)
  • Remove CUPY_CUDA_VERSION as much as possible (#6810)
  • Raise UserWarning in cupy.cuda.compile_with_cache (#6818)
  • cupy-wheel: Use NVRTC to infer the toolkit version (#6819)
  • Support NumPy 1.23 (#6820)
  • Fix for NumPy 1.23 (#6807)

Performance Improvements

  • Improved integer matrix multiplication performance by modifying tuning parameters (#6703)
  • Use fast convolution algorithm in cupy.poly1d.__pow__ (#6770)

Bug Fixes

  • Fix polynomial tests (#6721)
  • Fix batched matmul for integral numbers (#6725)
  • Fix cupy.median for NaN inputs (#6759)
  • Fix required cusparse symbol not loaded in CUDA 11.1.1 (#6806)

Code Fixes

  • Add type annotation in _cuda_types.py (#6726)
  • Subclass rename (#6746)
  • Add type annotation to JIT internal types (#6778)

Documentation

  • Add CUDA 11.7 on documents (#6768)
  • Improved NVTX documentation (#6774)
  • Fix docs to hide ndarray_base (#6782)
  • Update docs for cupy-cuda11x wheel (#6803)
  • Bump NumPy version used in docs (#6824)
  • Add upgrade guide for CuPy v11 (#6826)

Tests

  • Fix mempool tests (#6591)
  • CI: Fix prep script to show build failure details (#6781)
  • Fix a potential variable misuse bug (#6786)
  • Fix CI Docker image build failing in head test (#6804)
  • Tiny clean up in CI script (#6809)

Others

  • Fix docker workflow to push to latest image (#6832)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi

cupy - v10.5.0

Published by emcastillo over 2 years ago

This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)

Changes

Enhancements

  • Fix compilation warning caused by ifdef (#6740)
  • Support cuDNN 8.4 (#6741)

Bug Fixes

  • Fix memory leak in the FFT plan cache during multi-threading (#6732)
  • Fix ifdef for ROCm >= 4.2 (#6751)

Documentation

  • Minor improvement on the array API docs (#6714)
  • Document the returned benchmark object (#6742)

Tests

  • CI: Update repo for libcudnn7 in cuda10.2 (#6709)
  • Pin mypy version in setup.py (#6711)
  • Follow scipy==1.8.1 sparse dot bugfix (#6728)
  • Support testing CUDA 11.6+ in FlexCI (#6737)
  • Fix GPG key issue in FlexCI base image (#6743)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @kmaehashi @leofang @takagi

cupy - v11.0.0b3

Published by emcastillo over 2 years ago

This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support cuTensorNet as an einsum backend (#6677) (thanks @leofang!)

A new accelerator for CuPy has been added (CUPY_ACCELERATORS=cutensornet).
This feature requires cuquantum-python >= 22.03 and cuTENSOR >= 1.5.0, and is used to accelerate the cupy.linalg.einsum API and to support large array sizes in it.

Changes without compatibility

Drop Support for ROCm 4.2 (#6734)

CuPy v11 will drop support for ROCm 4.2. We recommend that users use ROCm 4.3 or 5.0 instead.

Drop Support for NumPy 1.18/1.19 and SciPy 1.4/1.5 (#6735)

As per NEP 29, support for NumPy 1.18/1.19 was dropped in 2021. The supported SciPy versions are those released close to the supported NumPy versions.

Changes

New Features

  • Support cuTensorNet (from cuQuantum) as an einsum backend (#6677)
  • Add cupy.poly (#6697)
  • Support cupy.ndarray subclassing - Part 1 - Direct constructor call (#6716)

Enhancements

  • Support cuDNN 8.4 (#6641)
  • Support cuTENSOR 1.5.0 (#6665)
  • JIT: Use C++14 (#6670)
  • Support cuTENSOR 1.5.0 (#6722)
  • Drop support for ROCm 4.2 in CuPy v11 (#6734)
  • Drop support for NumPy 1.18/1.19 and SciPy 1.4/1.5 in CuPy v11 (#6735)
  • Fix compilation warning caused by ifdef (#6739)

Performance Improvements

  • Accelerate bincount, histogram2d, histogramdd with CUB (#6701)

Bug Fixes

  • Fix memory leak in the FFT plan cache during multi-threading (#6704)
  • Fix ifdef for ROCm >= 4.2 (#6750)

Code Fixes

  • JIT: Cosmetic change of Dim3 class (#6644)

Documentation

  • Fix imports of scatter_add example (#6696)
  • Minor improvement on the array API docs (#6706)
  • Document the returned benchmark object (#6712)
  • Use exposed name in user guide (#6718)

Tests

  • Xfail a test of LOBPCG on ROCm 5.0+ (#6603)
  • CI: Update repo for libcudnn7 in cuda10.2 (#6708)
  • Bump pinned mypy version (#6710)
  • Follow scipy==1.8.1 sparse dot bugfix (#6727)
  • Support testing CUDA 11.6+ in FlexCI (#6731)
  • Fix GPG key issue in FlexCI base image (#6738)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi

cupy - v11.0.0b2

Published by asi1024 over 2 years ago

This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

JIT Improvements (#6620, #6640, #6649, #6668)

CuPy JIT has been further enhanced thanks to @leofang and @eternalphane!
It is now possible to use CUDA cooperative groups and access .shape and .strides attributes of ndarrays.

import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)

print(kernel.cached_code)

The above program emits the CUDA code as follows:

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
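The grid-stride loop in the kernel above can be checked on the CPU. The following pure-Python sketch replays the same index arithmetic for every (block, thread) pair of the 2 x 32 launch and confirms that each of the 200 elements is copied exactly once. The function simulate_grid_stride_copy is a hypothetical helper for illustration and does not use CuPy.

```python
def simulate_grid_stride_copy(x, grid_dim, block_dim):
    """Replay the kernel's grid-stride loop sequentially on the CPU."""
    size = len(x)
    y = [0] * size
    writes = [0] * size  # how many times each index was written
    ntid = grid_dim * block_dim  # total number of threads in the launch
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            tid = block_idx * block_dim + thread_idx
            # Same loop as in the JIT kernel: start at tid, step by ntid.
            for i in range(tid, size, ntid):
                y[i] = x[i]
                writes[i] += 1
    return y, writes


x = list(range(200))
y, writes = simulate_grid_stride_copy(x, grid_dim=2, block_dim=32)
```

Because the stride equals the total thread count, the 64 threads cover all indices without overlap even though the array is larger than the launch.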

Initial MPI and sparse matrix support in cupyx.distributed (#6628, #6658)

CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently only send/recv primitives are supported but we will be adding support for collective calls in the following releases.

Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server used to exchange CPU values. Moreover, we recommend enabling MPI for sparse matrix communication, since each communication call requires a metadata exchange that leads to device synchronization if MPI is not enabled.

# run with mpiexec -n N python …

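import cupyx.distributed
from mpi4py import MPI  # "import mpi4py" alone does not load the MPI submodule

mpi_comm = MPI.COMM_WORLD
workers = mpi_comm.Get_size()
rank = mpi_comm.Get_rank()

# Initialize the NCCL communicator, using MPI for management
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)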

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta-package allows other libraries to add a dependency on CuPy and transparently install the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are very welcome (please visit #6688).

Changes

New Features

  • Support cooperative group in JIT compiler (#6620)
  • Add support for sparse matrices in cupyx.distributed (#6628)
  • JIT: Support compile-time for-loop unrolling (#6649)
  • JIT: Support .shape and .strides (#6668)

Enhancements

  • Add a few driver/runtime/nvrtc API wrappers (#6604)
  • Implement flatten(order) (#6613)
  • Implemented a __repr__ for cupyx.profiler._time._PerfCaseResult (#6617)
  • JIT: Avoid calling default constructor if possible (#6619)
  • Add missing cudaDevAttrMemoryPoolsSupported to hip (#6621)
  • Add CC 3.2 to Tegra arch list (#6631)
  • JIT: Add more cooperative group APIs (#6640)
  • JIT: Add kernel.cached_code test (#6643)
  • Use MPI for management in cupyx.distributed (#6658)
  • Improve warning message in sparse (#6669)

Performance Improvements

  • Improve copy and assign operation (#6181)
  • Performance improvement of cupy.intersect1d (#6586)

Bug Fixes

  • Define float16::operator-() only for ROCm 5.0+ (#6624)
  • JIT: fix access to cached codes (#6639)
  • Fix cuda python CI (#6652)
  • Fix int64 overflow in cupy.polyval (#6664)
  • JIT: Disable memcpy_async on CUDA 11.0 (#6671)

Documentation

  • Add --pre option to instructions installing pre-releases (#6612)
  • JIT: fix function signatures in the docs (#6648)
  • Fix typo in performance guide (#6657)

Installation

  • Add universal CuPy package (#6012)

Tests

  • Run daily benchmark with head branch against latest release (#6598)
  • CI: Trigger FlexCI for hotfix branches (#6625)
  • Remove jenkins requirements (#6632)
  • Fix TestIncludesCompileCUDA for HEAD tests (#6646)
  • Trigger CUDA Python tests with /test mini (#6653)
  • Fix missing f prefix on f-strings fix (#6674)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar

cupy - v10.4.0

Published by asi1024 over 2 years ago

This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta-package allows other libraries to add a dependency on CuPy and transparently install the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are very welcome (please visit #6688).

Changes

Enhancements

  • Add missing cudaDevAttrMemoryPoolsSupported to hip (#6626)
  • Add CC 3.2 to Tegra arch list (#6647)
  • Add a few driver/runtime/nvrtc API wrappers (#6651)

Bug Fixes

  • Define float16::operator-() only for ROCm 5.0+ (#6629)
  • JIT: fix access to cached codes (#6642)
  • [v10] Fix Mempool attr for Cuda Python (#6654)
  • Fix int64 overflow in cupy.polyval (#6666)

Documentation

  • Documentation update for ROCm 5.0 (#6607)
  • Add --pre option to instructions installing pre-releases (#6614)
  • Fix typo in performance guide (#6659)
  • JIT: fix function signatures in the docs (#6660)

Installation

  • Add universal CuPy package (#6683)

Tests

  • Remove jenkins requirements (#6634)
  • CI: Trigger FlexCI for hotfix branches (#6636)
  • Fix TestIncludesCompileCUDA for HEAD tests (#6650)
  • Trigger CUDA Python tests with /test mini (#6655)
  • Fix missing f prefix on f-strings fix (#6679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi

cupy - v10.3.1

Published by emcastillo over 2 years ago

This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.

This is a hot-fix release for v10.3.0 which contained a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Changes

Bug Fixes

  • Define float16::operator-() only for ROCm 5.0+ (#6630)

Installation

  • Bump version to v10.3.1 (#6633)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @takagi

cupy - v10.3.0

Published by kmaehashi over 2 years ago

This is the release note of v10.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-08)

We have published a hot-fix release, v10.3.1, which addresses a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Highlights

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-cuda116

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-rocm-5-0

Changes

Enhancements

  • Support ROCm 5.0 (#6496)
  • Support cuSPARSELt 0.2.0 (repost) (#6507)
  • Update cupy.array_api (#6550)
  • Fix cupy.copyto to take NumPy array scalars (#6593)
  • Fix for supporting ROCm 5.0 (#6599)
  • Make einsum accept subscripts in numpy int (#6516)

Bug Fixes

  • Fix error message in vectorize (#6515)
  • Fix cupy.cumsum on ROCm 5.0 (#6525)
  • Fix coo_matrix.diagonal (#6533)
  • Fix out args parser of ufunc (#6547)
  • Fix cupy.fill to properly take zero-dim cupy.ndarray (#6548)
  • Fix cuSPARSELt 0.1.0 support in v10 (#6563)
  • Fix may_share_memory algorithm (#6565)
  • Avoid using the same kernel from different devices in JIT (#6581)
  • Fix array creation shape (#6592)
  • Fix cupy.full and cupy.full_like to make unsafe casting (#6595)
  • Fix device context management in MemoryAsyncPool (#6596)

Code Fixes

  • mypy: array_api (#6552)

Documentation

  • Remove description about issues from contribution guide (#6542)
  • Fix documents for CUDA 11.6 (#6543)

Installation

  • Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6488)
  • Skip appending --compiler-bindir if cl.exe is already on PATH (#6514)
  • Bump version to v10.3.0 (#6602)

Tests

  • Ignore warnings from Optuna 3.0 pre-releases (#6490)
  • Disable CentOS 8 test (#6519)
  • Add FlexCI projects for Windows (#6540)
  • Skip async_malloc tests on unsupported device (#6544)
  • CI: Trigger push event of FlexCI via GitHub Actions (#6554)
  • CI: regenerate matrix (#6557)
  • CI: Fix rule name in dispatcher (#6558)
  • CI: Fix event name in dispatcher (#6559)
  • Fix flaky test_inverse_indices_shape (#6573)
  • Trigger CUDA 11.6 Windows CI when push/pull-request (#6578)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @kmaehashi @leofang @Onkar627 @takagi @toslunar @tushxr16

cupy - v11.0.0b1

Published by kmaehashi over 2 years ago

This is the release note of v11.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-05)

We have identified that this release contains a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier). We are planning to fix this issue in the next pre-release. See #6615 for the details.

Highlights

Increase coverage of cupyx.scipy.special APIs (#6461, #6582, #6571)

A series of scipy.special routines has been added to cupyx with optimized CUDA raw kernel implementations. loggamma, multigammaln, fast Hankel transforms and several other utility special functions were added in this series of PRs by @grlee77 and @khushi-411.

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda116 -f https://pip.cupy.dev/pre

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-rocm-5-0 -f https://pip.cupy.dev/pre

Changes without compatibility

Use CUB by default (#6549)

CUB support in CuPy is now enabled by default. This results in faster general reductions and better performance for routines such as sum, argmax, and argmin. Note that CUB may introduce some non-deterministic behavior; it can be disabled by setting the CUPY_ACCELERATORS="" environment variable.
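One way to apply this setting without touching the shell is to build the environment for a child process in Python. The sketch below assumes CUPY_ACCELERATORS takes a comma-separated list of backend names and that an empty string disables them; env_with_accelerators and my_script.py are hypothetical names used only for illustration.

```python
import os


def env_with_accelerators(accelerators, base=None):
    """Return a copy of the environment with CUPY_ACCELERATORS set."""
    env = dict(os.environ if base is None else base)
    # An empty list yields "", which turns the accelerators (including CUB) off.
    env["CUPY_ACCELERATORS"] = ",".join(accelerators)
    return env


no_cub = env_with_accelerators([], base={})
only_cub = env_with_accelerators(["cub"], base={})

# Example launch (not executed here; my_script.py is a placeholder):
# subprocess.run([sys.executable, "my_script.py"], env=env_with_accelerators([]))
```

Passing the environment to a fresh process sidesteps the question of when CuPy reads the variable, since it is in place before the interpreter starts.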

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend that users use ROCm 4.3 or 5.0 instead.

Changes

New Features

  • Add cupyx.scipy.special statistical distributions (#6461)
  • Add cupy.real_if_close API (#6475)
  • Add cupyx.scipy.special loggamma, multigammaln and fast Hankel transforms (#6528)
  • Add cupyx.scipy.special.{i0e, i1e} (#6571)

Enhancements

  • Update cupy.array_api (#6486)
  • Fix for supporting ROCm 5.0 (#6524)
  • Use CUB by default (#6549)
  • Fix cupy.copyto to take NumPy array scalars (#6584)
  • Implement ndarray.ravel(order="K") (#6585)
  • Make einsum accept subscripts in numpy int (#6506)

Performance Improvements

  • Support cusparseSpGEMM() (#6511)
  • eigsh: Prefer gemv over gemm (#6570)
  • Performance improvement of cupy.in1d (#6583)

Bug Fixes

  • Fix cupy.fill to properly take zero-dim cupy.ndarray (#6481)
  • Fix error message in vectorize (#6499)
  • Fix cupy.cumsum on ROCm 5.0 (#6520)
  • Fix coo_matrix.diagonal (#6522)
  • Fix array creation shape (#6545)
  • Fix out args parser of ufunc (#6546)
  • Fix may_share_memory algorithm (#6560)
  • Avoid using the same kernel from different devices in JIT (#6575)
  • Fix cupy.full and cupy.full_like to make unsafe casting (#6587)
  • Fix device context management in MemoryAsyncPool (#6590)

Code Fixes

  • mypy: array_api (#6438)
  • Minor fixes on uarray backend support (#6526)

Documentation

  • Fix documents for CUDA 11.6 (#6405)
  • Remove description about issues from contribution guide (#6497)
  • Documentation update for ROCm 5.0 (#6530)

Installation

  • Skip appending --compiler-bindir if cl.exe is already on PATH (#6510)
  • Bump version to v11.0.0b1 (#6601)

Tests

  • Add FlexCI projects for Windows (#5889)
  • Run cupy-benchmark on CI (#6417)
  • Disable CentOS 8 test (#6492)
  • Fix Dockerfile broken for array-api tests (#6508)
  • CI: Trigger push event of FlexCI via GitHub Actions (#6538)
  • Skip async_malloc tests on unsupported device (#6541)
  • Fix flaky test_inverse_indices_shape (#6551)
  • Trigger CUDA 11.6 Windows CI when push/pull-request (#6553)
  • CI: Fix event name in dispatcher (#6555)
  • CI: Fix rule name in dispatcher (#6556)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @Onkar627 @peterbell10 @pri1311 @Smit-create @takagi @toslunar @tushxr16

cupy - v10.2.0

Published by emcastillo over 2 years ago

This is the release note of v10.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Changes

Enhancements

  • Support cuDNN 8.3.2 (#6328)
  • Support cuTENSOR 1.4.0 (#6330)
  • Support CUDA 11.5.1 (#6331)
  • Support NumPy 1.22 (#6354)
  • Avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6379)
  • Fix thrust related build issue with CUDA 11.6 (#6386)
  • Fix type annotations in installer (#6395)
  • Support CUDA 11.6 (#6422)
  • Bump Jitify version to fix memory leak (#6432)
  • Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6465)
  • Warn if unexpectedly failed to detect device count in cupy.show_config() (#6476)
  • Fix verbose LOBPCG for SciPy 1.8 (#6394)

Bug Fixes

  • Fix JIT to support notebook environment (#6356)
  • Fix cuDNN installer not working (#6368)
  • Fix cupyx.ndimage.spline_filter1d for HIP (#6411)
  • Fix boolean views for HIP (#6418)
  • Fix cupy.nan_to_num (#6431)
  • Fix reduction contiguous size calculation (#6464)

Code Fixes

  • Remove global use_hip flag in setup (#6398)

Documentation

  • Use cupy.__version__ instead of pkg_resources (#6380)
  • Tentatively pin intersphinx to SciPy 1.7.1 docs (#6442)
  • Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6480)

Installation

  • Fix for cuDNN directory structure in Windows (#6369)
  • Install lib directory on Windows in cuDNN installer (#6370)
  • Avoid monkeypatching distutils (#6373)
  • Eliminate unnecessary configuration pass in setup (#6399)
  • Bump version to v10.2.0 (#6502)

Tests

  • CI: use CUDA docker images for CUDA Python CI (#6338)
  • Avoid empty notification message for scheduled tests (#6364)
  • CI: allow discarding docker image cache manually (#6372)
  • Parameterize library installer tests (#6374)
  • Fix tests for eigh() for CUDA 11.6 (#6376)
  • Add cupy.testing.installed (#6387)
  • Mark XFAIL for SciPy 1.8 release candidate (#6396)
  • CI: build docs in parallel (#6419)
  • CI: Bump ROCm version from 4.3 to 4.3.1 (#6421)
  • CI: Use default schema/matrix path in generate.py (#6428)
  • CI: Manage test tags in yaml (#6441)
  • Support SciPy 1.8 (#6444)
  • CI: coverage in reST (#6447)
  • CI: fix NCCL 2.10 unit test not covered (#6452)
  • Skip hfft related tests in HIP (#6458)
  • CI: Fix CUDA 11.6 driver update steps (#6471)

Others

  • CI: allow specifying special skip tag (#6477)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @emcastillo @grlee77 @kmaehashi @takagi

cupy - v11.0.0a2

Published by emcastillo over 2 years ago

This is the release note of v11.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines has been proposed as good-first-issue tasks, and as a result an increasing number of contributors have sent pull requests that expand the set of available APIs. A tracking issue listing the routines implemented so far is available at #6078.

Initial support for cupy.typing (#6251)

An API equivalent to numpy.typing has been added so that type annotations can be used in CuPy and in user code.
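
As a rough illustration, the sketch below annotates a helper with an array type from cupy.typing. It assumes that cupy.typing exposes NDArray in the same way numpy.typing does, and the helper name scale is hypothetical. The import sits under TYPE_CHECKING and the annotations are written as strings, so the snippet runs even where CuPy is not installed.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers such as mypy.
    # Assumption: cupy.typing mirrors numpy.typing and provides NDArray.
    from cupy.typing import NDArray


def scale(x: "NDArray[Any]", factor: float) -> "NDArray[Any]":
    """Multiply an array by a scalar (hypothetical helper for illustration)."""
    return x * factor
```

A type checker can then flag call sites that pass something other than an array, while the runtime behaviour of the function is unchanged.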

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed and users are expected to build CuPy from source meanwhile.

Support for ROCm 5.0 (#6466)

Initial support for ROCm 5.0 has been added as of this release. However, binary wheels are not yet distributed and users are expected to build CuPy from source meanwhile.

Changes without compatibility

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend that users use ROCm 4.2/4.3 instead.

Changes

New Features

  • Add cupy.isneginf and cupy.isposinf (#6089)
  • Add cupy.typing (#6251)
  • Add asarray_chkfinite API. (#6275)
  • Add Box-Cox transformations to cupyx.scipy.special (#6302)
  • Use CUDA's log1p for cupyx.scipy.special.log1p (#6315)
  • Add special functions from the CUDA Math API (#6317)
  • Add beta functions to cupyx.scipy.special (#6318)
  • Add cupy.union1d API. (#6357)
  • Add cupy.float_power (#6371)
  • Add cupy.intersect1d API. (#6402)
  • Add cupy.setdiff1d api. (#6433)
  • Add cupy.format_float_scientific API (#6474)

Enhancements

  • First step of mypy introduction (#4955)
  • Fix CI failure to support SciPy 1.8.0 (#6249)
  • Implement overwrite_input in cupy.{percentile,quantile} (#6298)
  • Avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6321)
  • Support NumPy 1.22 (#6323)
  • Remove batched QR solver's experimental mark (#6327)
  • Make scipy.special ufuncs work with CuPy inputs (#6341)
  • Fix thrust related build issue with CUDA 11.6 (#6346)
  • Support CUDA 11.6 (#6349)
  • Fix CI failure to support SciPy 1.8.0 (#6362)
  • Fix type annotations in installer (#6382)
  • Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6414)
  • Bump Jitify version to fix memory leak (#6430)
  • Support cuSPARSELt 0.2.0 (repost) (#6436)
  • Support ROCm 5.0 (#6466)
  • Warn if unexpectedly failed to detect device count in cupy.show_config() (#6472)
  • Fix verbose LOBPCG for SciPy 1.8 (#6388)

Performance Improvements

  • Reduce memory usage in cupy.sort (#6392)

Bug Fixes

  • Fix JIT to support notebook environment (#6329)
  • Fix cupyx.ndimage.spline_filter1d for HIP (#6406)
  • Fix cupy.nan_to_num (#6408)
  • Fix cupyx.special.gammainc, lpmv and sph_harm for hip (#6409)
  • Fix boolean views for HIP (#6412)
  • Fix reduction contiguous size calculation (#6457)

Code Fixes

  • Remove global use_hip flag in setup (#6391)
  • Hide private names in cupyx.scipy.linalg (#6449)
  • Hide private names in cupyx.scipy.ndimage (#6450)
  • Hide private names in cupyx.scipy.signal (#6451)
  • Hide private names in cupyx.scipy.sparse (#6454)
  • Hide private names in cupyx.scipy.stats (#6456)

Documentation

  • Use cupy.__version__ instead of pkg_resources (#6332)
  • Tentatively pin intersphinx to SciPy 1.7.1 docs (#6440)
  • Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6479)

Installation

  • Avoid monkeypatching distutils (#6273)
  • Eliminate unnecessary configuration pass in setup (#6389)
  • Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6390)
  • Drop support for ROCm 4.0 (#6420)
  • Bump version to v11.0.0a2 (#6501)

Tests

  • CI: allow discarding docker image cache manually (#6269)
  • Add slow tests for stable branch (#6340)
  • Parameterize library installer tests (#6343)
  • Fix tests for eigh() for CUDA 11.6 (#6347)
  • Avoid empty notification message for scheduled tests (#6363)
  • Support SciPy 1.8 (#6365)
  • Add cupy.testing.installed (#6381)
  • Mark XFAIL for SciPy 1.8 release candidate (#6385)
  • CI: Bump ROCm version from 4.3 to 4.3.1 (#6415)
  • CI: build docs in parallel (#6416)
  • CI: Add HEAD tests for stable branch (#6423)
  • CI: Use default schema/matrix path in generate.py (#6424)
  • Skip hfft related tests in HIP (#6427)
  • CI: Manage test tags in yaml (#6429)
  • CI: coverage in reST (#6445)
  • CI: fix NCCL 2.10 unit test not covered (#6448)
  • CI: Fix CUDA 11.6 driver update steps (#6467)
  • Ignore warnings from Optuna 3.0 pre-releases (#6470)
  • Fix failing tests in ROCm (#6482)

Others

  • CI: allow specifying special skip tag (#6468)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@amanchhaparia @anaruse @asi1024 @emcastillo @grlee77 @IvanYashchuk @khushi-411 @kmaehashi @pri1311 @saswatpp @takagi

cupy - v10.1.0

Published by asi1024 over 2 years ago

This is the release note of v10.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

  • Remove memory copy in matmul (#6241)
  • Fix cupy.linalg.qr to align with NumPy 1.22 (#6263)

Bug Fixes

  • Fix edge case compatibility in cupy.eye() (#6213)
  • Fix compile_with_cache returning None (#6236)
  • Allow flip ()-shaped array (#6237)
  • Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6238)
  • Fix overloading ambiguity in ndimage filters (#6242)
  • Fixing index calculation for random constructor (#6267)
  • BUG: Fix the .T attribute in the array_api namespace (#6291)

Code Fixes

  • Remove legacy cp.linalg.solve() implementation (#6235)

Documentation

  • Docs: CentOS installation from source (#6230)
  • Add cupy.positive in API Reference (#6276)
  • Fix eigsh doc (#6292)

Tests

  • Add tests for convolve2d (#6194)
  • Change a parameter name in percentile and quantile to support NumPy 1.22 (#6247)
  • Tentatively pin to setuptools<60 in Windows CI (#6270)
  • Fix cache key for github actions (#6286)
  • Remove XFAIL for XPASS tests on ROCm (#6297)
  • Use NVIDIA docker images for CUDA 11.5 (#6304)
  • Tentatively pin to CUDA Driver 495 (#6311)
  • CI: Add cuda-slow test in FlexCI (#6339)

Others

  • Bump version to v10.1.0 (#6345)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @kmaehashi @leofang @ptim0626 @SauravMaheshkar @takagi @thomasjpfan @toslunar @WiseroOrb

cupy - v11.0.0a1

Published by asi1024 over 2 years ago

This is the release note of v11.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines has been proposed as good-first-issue tasks, and as a result an increasing number of contributors have sent pull requests that expand the set of available APIs. A tracking issue listing the routines implemented so far is available at #6078.

Add cupyx.scipy.special functions (#5687)

Spherical harmonics, Legendre and Gamma functions are implemented using highly performant specific CUDA kernels. Thanks to @grlee77!

Initial support for CUDA Graph API by means of stream capture API (#4567)

This PR adds the ability to use the CUDA Graph API to greatly reduce the overhead of kernel launching. This is done by using the stream capture API; an example follows.
Thanks to @leofang!

import cupy as cp

a = cp.random.randint(0, 10, 100, dtype=cp.int32)
s = cp.cuda.Stream(non_blocking=True)

with s:
    s.begin_capture()
    a += 3
    a = cp.abs(a)
    g = s.end_capture()  # work is queued, but not yet launched
g.launch()
s.synchronize()

Support __device__ function in CuPy JIT (#6265)

The new interface cupyx.jit.rawkernel(device=True) is supported to define a CUDA device function.

from cupyx import jit

@jit.rawkernel(device=True)
def getitem(x, tid):
    return x[tid]

@jit.rawkernel()
def elementwise_copy(x, y):
    tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
    y[tid] = getitem(x, tid)

The following CUDA code is generated from the above Python code.

__device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
  return x[tid];
}
extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
  unsigned int tid;
  tid = (threadIdx.x + (blockDim.x * blockIdx.x));
  y[tid] = getitem_1(x, tid);
}

Changes

New Features

  • Support stream capture (#4567)
  • Add additional special functions (spherical harmonics, Legendre, Gamma functions) (#5687)
  • Add cupy.asfarray (#6085)
  • Add cupy.trapz (#6107)
  • Add cupy.array_api.linalg (#6131)
  • Add cupy.mask_indices (#6156)
  • Add cupy.array_equiv API. (#6254)
  • Add cupy.cublas.syrk and cupy.cublas.sbmv (#6278)
  • Add cupy.vander API. (#6279)
  • Add cupy.ediff1d API. (#6280)
  • Add cupy.fabs API. (#6282)
  • Add discrete cosine and sine transforms to cupyx.scipy.fft (#6288)
  • Add logit, expit and log_expit to cupyx.scipy.special (#6300)
  • Add xlogy and xlog1py to cupyx.scipy.special (#6301)
  • Add tril_indices and tril_indices_from API. (#6305)
  • Add cupy.format_float_positional (#6308)
  • Add cupy.row_stack API. (#6312)
  • Add triu_indices and triu_indices_from API. (#6316)

Enhancements

  • Raise better message when importing CPU array via DLPack (#6051)
  • Borrow more non-GPU APIs from NumPy (#6074)
  • Add more aliases for compatibility with NumPy (#6075)
  • Import more dtype aliases from NumPy (#6076)
  • Borrow indexing APIs from NumPy (#6077)
  • Apply upstream patch to cupy.array_api (#6086)
  • Compile cub/thrust with no unique symbol (#6106)
  • Support cuDNN 8.3.0 (#6108)
  • Support all advanced indexing (#6127)
  • Support CUDA 11.5.1 (#6166)
  • Support lambda function in cupy.vectorize (#6170)
  • Support eigenvalue solver 64bit API (#6178)
  • Support cuTENSOR 1.4.0 (#6187)
  • Make matmul support ufunc kwargs (#6195)
  • Alias NumPy error classes (#6212)
  • Support comparison to None and Ellipsis (#6222)
  • JIT: Fix if expr typing rule (#6234)
  • Support comparison with more objects (#6250)
  • JIT: Support __device__ function (#6265)
  • More clear warning message (#6283)
  • Make streams hashable (#6285)
  • Check isinstance before comparison in __eq__ (#6287)
  • Support cuDNN 8.3.2 (#6314)
  • Deprecate MachAr (support NumPy 1.22) (#6188)
  • Fix cupy.linalg.qr to align with NumPy 1.22 (#6225)
  • Change a parameter name in percentile and quantile to support NumPy 1.22 (#6228)

Performance Improvements

  • Avoid 64bit division to reduce register consumption (#6019)
  • Remove memory copy in matmul (#6179)

Bug Fixes

  • Detect repeated axis in reduction (#5964)
  • Fix __all__ in cupyx.scipy.fft (#6071)
  • Fix __getitem__ on Ellipsis and advanced indexing dimension (#6081)
  • Allow leading unit dimensions in copy source (#6118)
  • Always test broadcast in copyto (#6121)
  • Fix overloading ambiguity in ndimage filters (#6162)
  • Fix empty Cholesky (#6164)
  • Fix empty solve (#6167)
  • Allow flip ()-shaped array (#6169)
  • Handles infinities of the same sign in logaddexp and logaddexp2 (#6172)
  • Fix #4675 on resolving TODO in #4198 (#6197)
  • Eigenvalue solver 64bit API on CUDA 11.1 (#6201)
  • Fix edge case compatibility in cupy.eye() (#6208)
  • Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6210)
  • Fix overlapping out in matmul and (tensor)dot (#6216)
  • Fix compile_with_cache returning None (#6232)
  • Fixing index calculation for random constructor (#6257)
  • BUG: Fix the .T attribute in the array_api namespace (#6289)
  • Fix stream capture in ROCm (#6296)
  • Fix cuDNN installer not working (#6337)

Code Fixes

  • Remove __all__ from cupyx/scipy/* (#6149)
  • Delete from os import path (#6152)
  • Remove legacy cp.linalg.solve() implementation (#6161)

Documentation

  • Add link to compatibility matrix (#6055)
  • Update upgrade guide (#6058)
  • Add v11 to compatibility matrix (#6067)
  • Exclude kernel_version from comparison table (#6072)
  • Doc: Add more footnotes to comparison table (#6073)
  • Add polynomial modules to comparison table (#6082)
  • Add CITATION.bib and update README (#6091)
  • Remove LLVM_PATH note on document (#6093)
  • Docs: Update linkcode implementation (#6126)
  • Update footnotes in comparison table (#6142)
  • Update conda-forge installation guide (#6186)
  • Revise Overview for CuPy v10 (#6209)
  • Docs: CentOS installation from source (#6218)
  • Fix cupy.trapz docstring (#6239)
  • Fix eigsh doc (#6266)
  • Add cupy.positive in API Reference (#6274)

Installation

  • Replace distutils with setuptools in Windows cl.exe detection (#6025)
  • Fix for cuDNN directory structure in Windows (#6342)

Tests

  • Fix testing.multi_gpu to add pytest marker (#6015)
  • CI: add link to ROCm projects in CI coverage matrix (#6037)
  • CI: use separate project for multi-GPU tests (#6050)
  • Fix CI result notification message format (#6066)
  • Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6084)
  • Workaround DeprecationWarning raised from pkg_resources (#6094)
  • Fix missing multi_gpu annotation in tests (#6098)
  • Fix exception handling in cupyx.distributed (#6114)
  • Improve FlexCI test scripts (#6117)
  • CI: Add timeout to show_config (#6120)
  • Trigger FlexCI from GitHub Actions (#6130)
  • CI: Fix package override sometimes fails in CentOS (#6141)
  • CI: Need to update CUDA driver in cuda115.multi (#6144)
  • Add tests for convolve2d (#6171)
  • CI: Update limits to reduce cache size (#6174)
  • CI: Fix unquoted specifiers (#6175)
  • Support pre-release NumPy version in tests (#6190)
  • Remove XFAIL for XPASS tests on ROCm (#6259)
  • Tentatively pin to setuptools<60 in Windows CI (#6260)
  • Fix cache key for github actions (#6281)
  • Use NVIDIA docker images for CUDA 11.5 (#6303)
  • Tentatively pin to CUDA Driver 495 (#6310)
  • Remove unused dtype parameterizing in tril_indices test (#6322)
  • Use get_include instead of array_equiv for fallback test (#6333)
  • CI: Add cuda-slow test in FlexCI (#6335)
  • CI: use CUDA docker images for CUDA Python CI (#6336)

Others

  • Add doc issue template (#6294)
  • Bump version to v11.0.0a1 (#6344)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@akochepasov @amanchhaparia @asi1024 @ColmTalbot @emcastillo @eternalphane @grlee77 @haesleinhuepf @khushi-411 @kmaehashi @leofang @okuta @ptim0626 @SauravMaheshkar @shwina @takagi @thomasjpfan @tom24d @toslunar @twmht @WiseroOrb @Yutaro-Sanada

cupy - v10.0.0

Published by kmaehashi almost 3 years ago

This is the release note of v10.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since v10.0.0rc1 release. Check out our blog for highlights in the v10 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support all advanced indexing (#6196)

The support for advanced indexing using boolean masks has been completed in CuPy v10.
It is now possible to index arrays using combinations of Ellipsis, boolean masks and regular indexes, such as a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]] and a[..., [[False, True]]].
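
To make the first expression concrete, here is a minimal pure-Python sketch of the NumPy indexing rule it relies on: a boolean mask used as an advanced index behaves like the integer positions of its True entries, which are then broadcast against the other index arrays. The 3x4 input and the helper mask_to_indices are made up for illustration; this models the semantics only and is not how CuPy implements indexing.

```python
def mask_to_indices(mask):
    """Return the positions of the True entries, like nonzero() on a 1-D mask."""
    return [i for i, flag in enumerate(mask) if flag]


# A 3x4 "array" stored as nested lists (illustrative data).
a = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

rows = [[1, 1, -3], [0, 2, 2]]                     # integer index, shape (2, 3)
cols = mask_to_indices([True, False, True, True])  # boolean mask -> [0, 2, 3]

# The column indices (shape (3,)) broadcast against each row of `rows`,
# so the result has shape (2, 3).
result = [[a[r][c] for r, c in zip(row, cols)] for row in rows]
print(result)  # [[4, 6, 3], [0, 10, 11]]
```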

Support lambda functions in cupy.vectorize (#6217)

A long-awaited feature to ensure compatibility with NumPy vectorize has been implemented. In this release, it is now possible to transpile lambda functions. This is especially handy when using JIT in conjunction with cupy.vectorize:

import cupy

a = cupy.array([0.4, -0.2, 1.8, -1.2])
relu = cupy.vectorize(lambda x: (x > 0.0) * x)
print(relu(a))  # [ 0.4 -0.   1.8 -0. ]

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that is supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 is NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021 and the compatibility lines with NumPy, Python 3.6 is no longer supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP29, NumPy 1.17 support was dropped on July 26, 2021.

Changes

New Features

  • Add cupy.array_api.linalg (#6199)

Enhancements

  • Add more aliases for compatibility with NumPy (#6080)
  • Raise better message when importing CPU array via DLPack (#6097)
  • Apply upstream patch to cupy.array_api (#6105)
  • Borrow more non-GPU APIs from NumPy (#6109)
  • Import more dtype aliases from NumPy (#6110)
  • Borrow indexing APIs from NumPy (#6111)
  • Compile cub/thrust with no unique symbol (#6140)
  • Support cuDNN 8.3.0 (#6150)
  • Support eigenvalue solver 64bit API (#6192)
  • Support all advanced indexing (#6196)
  • Support lambda functions in cupy.vectorize (#6217)
  • Deprecate MachAr (support NumPy 1.22) (#6189)

Performance Improvements

  • Avoid 64bit division to reduce register consumption (#6102)

Bug Fixes

  • Fix __all__ in cupyx.scipy.fft (#6083)
  • Detect repeated axis in reduction (#6103)
  • Fix __getitem__ on Ellipsis and advanced indexing dimension (#6113)
  • Allow leading unit dimensions in copy source (#6153)
  • Always test broadcast in copyto (#6155)
  • Handles infinities of the same sign in logaddexp and logaddexp2 (#6176)
  • Fix empty solve (#6183)
  • Fix empty Cholesky (#6184)
  • Fix #4675 on resolving TODO in #4198 (#6204)
  • Eigenvalue solver 64bit API on CUDA 11.1 (#6220)

Code Fixes

  • Avoid from os import path (#6165)

Documentation

  • Update stable branch (#6065)
  • Update labels of Docs column (#6068)
  • Add more footnotes to comparison table (#6079)
  • Exclude kernel_version from comparison table (#6090)
  • Remove LLVM_PATH note on document (#6101)
  • Add polynomial modules to comparison table (#6122)
  • Add link to compatibility matrix (#6135)
  • Update footnotes in comparison table (#6143)
  • Update conda-forge installation guide (#6200)
  • Update upgrade guide (#6203)
  • Update linkcode implementation (#6206)
  • Revise Overview for CuPy v10 (#6215)

Installation

  • Replace distutils with setuptools in Windows cl.exe detection (#6138)
  • Bump version to v10.0.0 (#6224)

Tests

  • Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6087)
  • Workaround DeprecationWarning raised from pkg_resources (#6095)
  • Fix testing.multi_gpu to add pytest marker (#6096)
  • Fix missing multi_gpu annotation in tests (#6100)
  • Fix exception handling in cupyx.distributed (#6116)
  • Improve FlexCI test scripts (#6119)
  • Fix CI result notification message format (#6124)
  • CI: Add timeout to show_config (#6132)
  • CI: use separate project for multi-GPU tests (#6145)
  • CI: Need to update CUDA driver in cuda115.multi (#6146)
  • CI: Fix package override sometimes fails in CentOS (#6147)
  • CI: add link to ROCm projects in CI coverage matrix (#6148)
  • CI: Fix unquoted specifiers (#6182)
  • CI: Update limits to reduce cache size (#6185)
  • Trigger FlexCI from GitHub Actions (#6191)
  • Support pre-release NumPy version in tests (#6193)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar @twmht @Yutaro-Sanada

cupy - v9.6.0

Published by emcastillo almost 3 years ago

This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Final release for v9.x series

This is expected to be the last release of the CuPy v9 series. Please start trying your workflow with CuPy v10.0.0rc1 and let us know if you have any feedback!

CuPy now supports CUDA 11.5

Wheels for CUDA 11.5 (cupy-cuda115) are now available.

Removal of Alpha/Beta/RC Wheels from PyPI

  • As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available in PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

Enhancements

  • Make show_config runnable without GPU (#5839)
  • Merge fp16 headers for CUDA 11.2+ (#6004)
  • Support cuTENSOR 1.3.3 (#6005)
  • Support CUDA 11.5 for library installer (#6010)
  • Display license terms when downloading libraries (#6041)
  • Fix error type/message for duplicate value in axis (#5987)

Bug Fixes

  • Do not use cuTENSOR unless available (#5885)
  • Fix non-deterministic behavior in cupy.random.shuffle (#5887)
  • Fix ndarray.clip to match numpy (#5916)
  • Fix __repr__ of mode and scalar in cuTENSOR (#5917)
  • Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5931)
  • Fix ravel for strides 0 (#5998)
  • Fix cuTENSOR installation on Windows (#6022)
  • Allow generating cubins for the max known CC (#6024)

Documentation

  • Update upgrade guide (#5834)
  • Document ppc64le and aarch64 are supported on conda-forge (#5869)
  • Improve the comparison table (#5911)
  • Add footnotes for functions unimplemented in CuPy (#5954)
  • Update the docstring for cholesky (#5960)
  • Document CUPY_ACCELERATORS (#5975)
  • Add favicon to docs (#5983)
  • Support CUDA 11.5 on documents (#6006)
  • Replace favicon with high resolution one (#6008)
  • Fix typo in copyright line (#6035)

Tests

  • Clean up plan cache in a FFT slow test (#5825)
  • Copy source directory to support pip 21.3 (#5896)
  • Simplify legacy ROCm test script for FlexCI (#5936)
  • Relax sparse linalg testing tolerance (#5958)
  • CI: Fix ROCm build test (FlexCI) failing (#5965)
  • Improve handling of FlexCI test runs (#6002)
  • Upload cache even when test failed in FlexCI (#6003)
  • CI: Increase timeout for CUDA 11.4 / 11.5 tests (#6040)
  • CI: Do not run full combination test even for branch tests for ROCm (#5974)

Others

  • Avoid triggering docker workflow on release of forked repos (#5886)
  • Bump version to v9.6.0 (#6043)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar

cupy - v10.0.0rc1

Published by emcastillo almost 3 years ago

This is the release note of v10.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Add cupyx.distributed (#5590)

This new version provides a wrapper over NVIDIA’s NCCL library to perform communication in an MPI-like style. Currently, point-to-point and collective communication primitives are supported. Check the documentation for a complete reference of the functions.

CuPy now supports CUDA 11.5, Python 3.10, and NVIDIA Jetson

Wheels for CUDA 11.5 (cupy-cuda115) are now available.
Python 3.10 wheels are also available for all supported CUDA / ROCm versions.

Wheels for Jetson can be found in the attached artifacts (pip install cupy-cuda112 -f https://pip.cupy.dev/pre).

Enable Generator random API in ROCm 4.3 (#5895)

ROCm 4.3 fixes a series of issues that prevented the Generator random API (#4177) from running on AMD devices.

Changes without compatibility

Refer to the Upgrade Guide for the detailed description.

Automatically enable peer access (#5496)

Peer access is enabled by default when a CuPy ndarray is stored in a different device as long as the machine topology allows it.

Change Device.use() semantics to align with Stream.use() (#5853)

When exiting a context, the current device is now reverted back to the device of the parent's context scope, not the device last use()d.

Automatically convert big-endian numpy.ndarray to little-endian in cupy.array() and its variants (#5828)

Previously CuPy was copying the given numpy.ndarray to GPU as-is, regardless of the endianness. In CuPy v10, big-endian arrays are converted to little-endian before the transfer, which is the native byte order on GPUs. This change eliminates the need to manually change the array endianness before creating the CuPy array.
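
The standard-library sketch below illustrates why the byte order matters. It does not use CuPy and is not CuPy's implementation; it only shows that big-endian bytes read as little-endian produce wrong values, and that converting first preserves them. The sample values are arbitrary.

```python
import struct

values = [1.0, 2.5, -3.0]

# Three float32 values serialized in big-endian byte order.
big_endian = struct.pack(">3f", *values)

# Reading those bytes as little-endian without converting gives wrong numbers.
misread = list(struct.unpack("<3f", big_endian))

# Converting first (decode as big-endian, re-encode as little-endian)
# preserves the original values.
little_endian = struct.pack("<3f", *struct.unpack(">3f", big_endian))
restored = list(struct.unpack("<3f", little_endian))
```

Here restored equals the original values while misread does not, which is the situation the automatic conversion is meant to avoid.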

Add cupyx.profiler module (#5940)

A new module cupyx.profiler is added to host all profiling related APIs in CuPy. Accordingly, the following APIs are relocated to this module:

  • cupy.prof.TimeRangeDecorator() -> cupyx.profiler.time_range()
  • cupy.prof.time_range() -> cupyx.profiler.time_range()
  • cupy.cuda.profile() -> cupyx.profiler.profile()
  • cupyx.time.repeat() -> cupyx.profiler.benchmark()

The old routines are deprecated.
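
For codebases that need to audit their use of the old names, the relocation can be captured as a small lookup table. This is only a sketch: the mapping is copied from the list above, and the helper new_name is hypothetical, not part of CuPy.

```python
# Old name -> new name, taken from the relocation list above.
RELOCATED = {
    "cupy.prof.TimeRangeDecorator": "cupyx.profiler.time_range",
    "cupy.prof.time_range": "cupyx.profiler.time_range",
    "cupy.cuda.profile": "cupyx.profiler.profile",
    "cupyx.time.repeat": "cupyx.profiler.benchmark",
}


def new_name(old_name):
    """Return the cupyx.profiler replacement, or the name unchanged if it was not relocated."""
    return RELOCATED.get(old_name, old_name)
```

For example, new_name("cupyx.time.repeat") returns "cupyx.profiler.benchmark", while names that were not relocated are returned unchanged.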

Deprecate cupy.cuda.compile_with_cache (#5858)

An internal API cupy.cuda.compile_with_cache() has been marked as deprecated as there are better alternatives (RawModule, RawKernel). While it has a long-standing history, this API has never been meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs.

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 will be NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021 and the compatibility lines with NumPy, Python 3.6 will no longer be supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP29, NumPy 1.17 support was dropped on July 26, 2021.

Alpha/Beta/RC wheels no longer distributed through PyPI

  • As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available in PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes of supported cuSPARSELt version

We are planning to drop cuSPARSELt v0.1.0 support in the CuPy v10 final release (#6045).

Changes

New Features

  • Add cupyx.distributed (#5590)
  • Add cupy.positive() (#5774)
  • Update cupy.array_api (#5783)
  • Update cupy.array_api typing (#5821)
  • Add trim_mean from scipy.stats to cupyx (#5900)
  • Implement more array creation & serialization methods (#5925)

Enhancements

  • Automatically enable peer access (#5496)
  • Update DLPack header to v0.6 to support exchanging arrays backed by managed memory (#5512)
  • Lazy-preload cuDNN (#5677)
  • Support ROCm managed memory (#5685)
  • Fix import failure when pytest namespace is available (#5703) (#5707)
  • Support cuTENSOR 1.3.3 (#5732)
  • Add dtype and casting arguments to cupy.concatenate() (#5759)
  • Automatically convert big-endian data to little-endian in cupy.array() and its variants (#5828)
  • Use pylibcugraph for connected_components (#5830)
  • Make show_config runnable without GPU (#5835)
  • NotImplementedError clarity (#5841)
  • Change Device.use() semantics to align with Stream.use() (#5853)
  • Drop support for NumPy 1.17 (#5857)
  • Deprecate cupy.cuda.compile_with_cache (#5858)
  • Show error when importing cupy.array_api with Python 3.7 (#5873)
  • Enable new random api in ROCm 4.3 (#5895)
  • Add bitorder option to cupy.packbits (#5898)
  • Support using cuTENSOR in elementwise ufuncs (#5902)
  • Workaround ROCm 4.3 LLVM_PATH issue in hipRTC (#5933)
  • Update the Array API module (#5939)
  • Add cupyx.profiler module (#5940)
  • Use SHA1 hash for kernel cache key to support Linux in FIPS-compliant mode (#5988)
  • Merge fp16 headers for CUDA 11.2+ (#5993)
  • Support CUDA 11.5 for library installer (#5996)
  • Add cupy-cuda115 to duplicate detection (#5999)
  • Suggest using binary packages when build failed (#6028)
  • Improve import error message (#6029)
  • Display license terms when downloading libraries (#6032)
  • Fix error type/message for duplicate value in axis (#5953)

Performance Improvements

  • Use index_t for faster address calculation (#5981)

Bug Fixes

  • Use cudaRuntimeGetVersion instead of CUDA_VERSION for CUDA Python support (#5723)
  • Allow generating cubins for the max known CC (#5779)
  • Fix hypergeometric distribution implementation to use int (#5785)
  • Fix non-deterministic behavior in cupy.random.shuffle (#5838)
  • Avoid using driver.get_build_version (#5861)
  • Fix nan_to_num to comply with NumPy API (#5870)
  • Do not use cuTENSOR unless available (#5872)
  • Fix _get_cuda_build_version for ROCm (#5888)
  • Fix __repr__ of mode and scalar in cuTENSOR (#5901)
  • Fix to push device after setDevice succeed (#5904)
  • Fix ndarray.clip to match numpy (#5910)
  • Fix copyto with non-contiguous multidevice (#5913)
  • Avoid use of setDevice in CuPy codebase (#5915)
  • Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5921)
  • Do not use with device in code base (#5963)
  • Fix __dlpack__ protocol (#5970)
  • Fix cupyx.tools.install_library for windows (#5977)
  • Fix ravel for strides 0 (#5978)
  • Avoid using with context for streams (#5985)
  • Fix cuTENSOR installation on Windows (#6007)
  • Fix hash length for SHA1 (#6023)
  • Fix: Add missing output dtype check for direct correlate/convolve (#6046)
  • Fix cuDNN version not displayed in wheel installation (#6054)

Code Fixes

  • Code-fix on cupy.array() (#5842)
  • Successive code fix on cupy.array() (#5844)
  • Fix kernel name of cupyx.scipy.ndimage.interpolation.map_coordinates (#5845)
  • Replace addAddNameExpression with addNameExpression in NVRTC binding (#5938)
  • Split loop testing helpers into _loops (#5967)
  • Make CUPY_DLPACK_EXPORT_VERSION consistent (#5982)
  • Fix comment in device switching (#5984)
  • Avoid using deprecated setDaemon method (#6059)

Documentation

  • Update upgrade guide (#5824)
  • Update list of supported OS (#5854)
  • Drop support for NCCL 2.6 and 2.7 (#5855)
  • Add docs for driver.get_build_version (#5860)
  • Document ppc64le and aarch64 are supported on conda-forge (#5865)
  • Mention deprecation of compile_with_cache() in upgrade guide (#5883)
  • Add docs for scipy.sparse.csgraph module (#5903)
  • Refine SciPy-compatible API documentation (#5905)
  • Improve the comparison table (#5907)
  • Remove CUDA 10.0 / 10.1 from README (#5924)
  • Improve some docs on interoperability and cupy.linalg.cholesky (#5941)
  • Add footnotes for functions unimplemented in CuPy (#5942)
  • Document CUPY_ACCELERATORS (#5948)
  • Fix section heading level (#5962)
  • Mention np.matrix in the difference section (#5966)
  • Add PyTorch with RawKernel example to docs (#5973)
  • Add sphinx-copybutton (#5976)
  • Add favicon to docs (#5980)
  • Replace favicon with high resolution one (#5986)
  • Update upgrade guide for v10 (#5994)
  • Cover a bit more of cuTENSOR in perf guide (#5995)
  • Support CUDA 11.5 on documents (#5997)
  • Fix typo in copyright line (#6030)
  • Add Python 3.10.0 to support list (#6038)
  • Added Compatibility Matrix to Upgrade Guide (#6053)

Installation

  • Bump CUDA/ROCm version in docker images (#5859)
  • Fix library installer to limit architecture (#5926)

Tests

  • Introduce new toolset for CI (#5474)
  • Simplify legacy ROCm test script for FlexCI (#5753)
  • Use pytest in TestJoin (#5764)
  • Clean up plan cache in a FFT slow test (#5811)
  • Improve handling of FlexCI test runs (#5814)
  • Tentatively disable pytest-xdist (#5826)
  • Add FlexCI projects for Linux (#5836)
  • Fix ROCm tests not exporting LLVM_PATH (#5849)
  • Add test for CI generator (#5850)
  • Remove CUDA 10.0/10.1 and Python 3.6 from FlexCI tests (#5851)
  • Add mypy test for cupy_builder (#5856)
  • Add array-api-tests in FlexCI (#5862)
  • Upload cache even when test failed in FlexCI (#5867)
  • Improve CI generator to emit warning on uncovered axis (#5871)
  • Add CI for pylibcugraph (#5874)
  • Build CI docker images using BuildKit to utilize cache from registry (#5875)
  • Show hint to reproduce CI result locally in shell target (#5877)
  • Show time taken for build in CI (#5878)
  • Increase parallelism of CuPy build in CI (#5880)
  • Copy source directory to support pip 21.3 (#5881)
  • Fix ccache path to support CentOS (#5882)
  • Upload Docker image after running branch test (#5884)
  • Avoid cache download failure in CI when conflicting with cache upload (#5890)
  • Add FlexCI project for doctest, example and head test (#5891)
  • FlexCI test against Python 3.10 (#5899)
  • Fix fft test skip condition (#5908)
  • Add Slack/Gitter notification when branch test fail (#5914)
  • Declare the same environment variables as Linux in Windows CI (#5923)
  • Fix trim_mean test (#5944)
  • Temporarily skip Array API tests on ROCm (#5945)
  • Relax sparse linalg testing tolerance (#5952)
  • CI: Fix ROCm build test (FlexCI) failing (#5956)
  • CI: Fix ccache not working (#6016)
  • CI: Use ccache in Pre-review Test (#6027)
  • CI: Migrate ROCm build test from FlexCI to GitHub Actions (using ROCm docker image) (#6034)
  • CI: Merge doctest to example test in FlexCI (#6036)
  • CI: allow running tests selectively (#6039)
  • CI: Fix pip command use in FlexCI instance (#6049)
  • Fix notifier to work on Python 3.6 (#6056)
  • CI: Do not run full combination test even for branch tests for ROCm (#5955)

Others

  • Avoid triggering docker workflow on release of forked repos (#5863)
  • Refine issue templates using Issue Forms (#5868)
  • Bump version to v10.0.0rc1 (#6042)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@Anubha13kumari @SaharCarmel @SwastikTripathi @amathews-amd @anaruse @asi1024 @carterbox @drbeh @emcastillo @iskode @kmaehashi @lanttu1243 @leofang @okuta @prkhrsrvstv1 @spiralray @takagi @toslunar

cupy - v9.5.0

Published by kmaehashi about 3 years ago

This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

  • As per the discussion in #5671, we have stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package remains available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

Enhancements

  • Support cuDNN 8.2.4 (#5744)
  • Support NCCL 2.11.4 (#5747)
  • Fix cupyx.optimize to save file when no optimization ran (#5760)

Bug Fixes

  • Fix spline filter with large array (#5686)
  • Fix exception for indexing with multiple ellipses (#5739)
  • Fix docstring for fallback modules (#5742)
  • Include stdexcept in hip headers (#5777)
  • Fixed typo in error message in sparse.csr_matrix (#5788)
  • Fix MAX_NDIM and add guards/tests (#5798)
  • Disable spmm on Windows CUDA 10.2 (#5805)

Documentation

  • Fix random docstring (#5708)
  • Remove --pre from ROCm source build instructions (#5782)
  • Use custom index for pre-release wheels (#5793)

Installation

  • Add maintainers in setup.py (#5758)
  • Bump version to v9.5.0 (#5808)

Tests

  • Update test_eigenvalue.py (#5643)
  • Improve performance of TestSplineFilter1dLargeArray (#5694)
  • Stop inheriting unittest.TestCase for performance (#5710)
  • TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5729)
  • Make testing helpers support non-methods (#5731)
  • Make test parameter names static (#5733)
  • Update pip and setuptools in Windows CI (#5738)
  • Improve FlexCI output (#5796)
  • Fix error message comparison (#5806)

Others

  • Add workflow to test/build/push docker images on pull-request/release (#5752)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar

cupy - v10.0.0b3

Published by kmaehashi about 3 years ago

This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Array API initial support (#5698)

This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.
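
As a rough illustration of what the Array API namespace enables, the sketch below writes one helper against whichever namespace is passed in. It assumes cupy.array_api exposes the standard asarray, mean and sum functions; the helper name mean_center and the NumPy fallback are hypothetical additions for illustration, not part of the release.

```python
import warnings

# Prefer the Array API namespace shipped with CuPy; fall back to plain NumPy
# (an assumption made only so the sketch also runs without a GPU).
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence any import-time warnings
        import cupy.array_api as xp
    xp.asarray([0.0])  # probe: raises if no usable device is available
except Exception:
    import numpy as xp


def mean_center(x, xp):
    """Subtract the mean using only functions from the given namespace."""
    return x - xp.mean(x)


x = xp.asarray([1.0, 2.0, 3.0])
centered = mean_center(x, xp)
total = float(xp.sum(centered))
print(total)
```

Because the helper only touches functions named in the standard, the same code can be handed a different conforming namespace without modification.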

Changes without compatibility

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and on Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 end of life in December 2021, and in line with NumPy's supported Python versions, Python 3.6 is no longer supported starting with CuPy v10.

Alpha/Beta/RC wheels no longer distributed through PyPI

  • As per the discussion in #5671, we have stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package remains available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

New Features

  • Add binomial distribution to new Generator (#5429)
  • Adopt the numpy.array_api module as cupy.array_api (#5698)
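
A minimal sketch of drawing binomial samples from the new Generator (#5429) is shown below. It assumes the CuPy Generator mirrors NumPy's binomial(n, p, size) signature; the seed, parameters and NumPy fallback are illustrative choices only.

```python
import numpy as np

# Use CuPy when a working GPU is present; otherwise fall back to NumPy,
# whose Generator offers the same binomial(n, p, size) call.
try:
    import cupy as xp
    xp.zeros(1)  # probe: raises if no usable device is available
except Exception:
    xp = np

rng = xp.random.default_rng(0)               # new-style Generator, seeded
samples = rng.binomial(10, 0.5, size=1000)   # 1000 draws of 10 trials each

# Every sample counts successes out of 10 trials, so it lies in [0, 10].
print(samples.shape, int(samples.min()), int(samples.max()))
```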

Enhancements

  • Improve stream mismatch error message (#5706)
  • Support cuDNN 8.2.4 (#5726)
  • Support NCCL 2.11.4 (#5734)
  • Fix cupyx.optimize to save file when no optimization ran (#5757)
  • Adding bitorder support to cupy.unpackbits (#5765)
  • Drop support for CUDA 10.1 or earlier (#5770)
  • Drop support for Python 3.6 (#5771)
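
To illustrate the bitorder argument added to cupy.unpackbits (#5765), the sketch below unpacks the same bytes in both orders. It assumes the argument behaves like the NumPy parameter of the same name; the NumPy fallback is there only so the example runs without a GPU.

```python
import numpy as np

# Use CuPy when a working GPU is present; otherwise fall back to NumPy,
# whose unpackbits accepts the same bitorder argument.
try:
    import cupy as xp
    xp.zeros(1)  # probe: raises if no usable device is available
except Exception:
    xp = np

packed = xp.asarray([1, 128], dtype=xp.uint8)

big = xp.unpackbits(packed)                        # default: most significant bit first
little = xp.unpackbits(packed, bitorder='little')  # least significant bit first

print(big.tolist())     # [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(little.tolist())  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```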

Bug Fixes

  • Fix spline filter with large array (#5673)
  • Fix exception for indexing with multiple ellipses (#5718)
  • Fix docstring for fallback modules (#5728)
  • Fix MAX_NDIM and add guards/tests (#5749)
  • Fixed typo in error message in sparse.csr_matrix (#5767)
  • Include stdexcept in hip headers (#5769)
  • Disable spmm on Windows CUDA 10.2 (#5802)

Code Fixes

  • Prefix Cython compile_time_env with CUPY_ (#5740)

Documentation

  • Use custom index for pre-release wheels (#5772)
  • Remove --pre from ROCm source build instructions (#5773)

Installation

  • Reorganize build scripts, part 1 (#5730)
  • Reorganize build scripts, part 2: separate modules (#5743)
  • Reorganize build scripts, part 3: simplify setup.py (#5745)
  • Reorganize build scripts, part 4: remove global cupy_setup_options (#5754)
  • Reorganize build scripts, part 5: remove Cython version check (#5755)
  • Add maintainers in setup.py (#5756)
  • Bump version to v10.0.0b3 (#5807)

Tests

  • Make testing helpers support non-methods (#5594)
  • Stop inheriting unittest.TestCase for performance (#5599)
  • Eliminate random test ids (#5659)
  • Improve performance of TestSplineFilter1dLargeArray (#5693)
  • TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5724)
  • Make test parameter names static (#5727)
  • Update pip and setuptools in Windows CI (#5735)
  • Improve FlexCI output (#5786)
  • Skip tests for bug cases (FFT on CUDA 10.2 + Pascal) (#5791)
  • Fix error message comparison (#5799)
  • Fix test skip issue (#5801)

Others

  • Update auto-notify bot for array-api label (#5725)
  • Fix backport trigger (#5741)
  • Add workflow to test/build/push docker images on pull-request/release (#5746)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar
