cupy

NumPy & SciPy for GPU

MIT License

Downloads: 758.5K · Stars: 7.7K · Committers: 370

cupy - v13.0.0 Latest Release

Published by emcastillo 9 months ago

This is the release note of v13.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since the v13.0.0rc1 release. Check out our blog for highlights of the v13 release!

See the Upgrade Guide for the list of possible breaking changes in v13.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

📝 Changes

For all changes in v13, please refer to the release notes of the pre-releases (alpha1, beta1, rc1).

New Features

  • Add cupyx.signal.pulse_compression from cuSignal's non SciPy-compat API (#8039)
  • Add cupyx.signal.convolve1d3o from cuSignal's non SciPy-compat API (#8067)
  • add cupyx.signal.{pulse_doppler, cfar_alpha} (#8069)
  • Add cupyx.signal.convolve1d2o (#8113)

Enhancements

  • Make cupyx.signal.radartools private (#8053)
  • Fix csrmatrix.__pow__ to raise ValueError for non-int other (#8085)

Performance Improvements

  • Speed up cupy environment duplicate detection (#8042)

Bug Fixes

  • Fix lfilter_zi and sosfilt_zi when any IIR coefficient is zero (#8036)
  • Fix argmax/argmin for large reduction axis (#8041)
  • Fix cupyx.scipy.fft.{dst,dstn} in type 2/3 (#8082)
  • Do not use from-import (#8114)

Code Fixes

  • Refactor convolve1d3o (#8100)
  • Refactor radartools (#8106)

Documentation

  • Generate signature for ufunc documentation (#8044)
  • Use modern dlpack interface in torch interoperability document (#8048)

Installation

  • Skip CUDA_PATH warning in Conda installation (#8076)
  • Bump version to v13.0.0 (#8119)

Tests

  • Bump stable branch to v13 (#8026)
  • Remove some signal.vectorstrength xfail tests (#8083)
  • Fix scipy.linalg not to raise DeprecationWarning for zero-size inputs (#8086)
  • scipy.special.{btdtr,btdtri} are deprecated since SciPy (#8094)
  • Refactor radartools tests (#8099)
  • Fix slow test (#8117)

👥 Contributors

@andfoy @asi1024 @emcastillo @hauntsaninja @kmaehashi @takagi

The CuPy Team would like to thank all those who contributed to this release!

cupy - v13.0.0rc1

Published by asi1024 11 months ago

This is the release note of v13.0.0rc1. See here for the complete list of solved issues and merged PRs.

This is a release candidate of the CuPy v13 series. Please start testing your workload with this release to prepare for the final v13 release. To install:

$ pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre

See the Upgrade Guide for the list of possible breaking changes in v13.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

NVIDIA cuTENSOR 2.0

NVIDIA cuTENSOR is a performant and flexible library for accelerating tensor linear algebra. CuPy v13 supports cuTENSOR 2.0, the latest major release of the library, which achieves higher performance than the cuTENSOR 1.x series.

NVIDIA RAPIDS cuSignal Integration

cuSignal is a library developed by the NVIDIA RAPIDS project that provides GPU-accelerated implementations of signal processing algorithms using CuPy as a backend. cuSignal includes scipy.signal-compatible APIs, so the two projects share the same goals. After a discussion with the cuSignal team, we agreed to merge cuSignal into CuPy to give users a better experience through a unified library for SciPy routines on GPU.

Currently, most of the functions provided in cuSignal have been merged into CuPy, and the remaining items are expected to be merged into CuPy v13 in due course.

We would like to acknowledge and thank @awthomp and everyone involved in the cuSignal development for creating a great library and agreeing to this transition.

Distributed NDArray (experimental) (#7881)

Added initial support for sharding ndarrays across multiple GPU devices connected to the same host.

import numpy

from cupyx.distributed.array import distributed_array

shape = (16, 16)
cpu_array = numpy.random.rand(*shape)
# Set the chunk indexes for each device
# device 0 holds rows 0..7 and device 1 holds rows 8..15
mapping = {
        0: [(slice(8), slice(None, None))],
        1: [(slice(8, None), slice(None, None))],
}
# The array is allocated in devices 0 and 1
multi_gpu_array = distributed_array(cpu_array, mapping)

This work was done by @shino16 during the Preferred Networks 2023 summer internship.
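
The mapping in the snippet above can also be generated programmatically. The following is a minimal sketch in plain Python (no GPU required) of a hypothetical helper, even_row_mapping, which is not part of CuPy. It assumes the rows divide evenly among the devices and builds a dictionary with the same layout as the hand-written mapping.

```python
def even_row_mapping(n_rows, n_devices):
    """Build a {device_id: [(row_slice, col_slice)]} dict that splits rows evenly.

    Hypothetical helper for illustration only; it is not a CuPy API.
    """
    if n_devices <= 0 or n_rows % n_devices != 0:
        raise ValueError("n_rows must be divisible by a positive n_devices")
    chunk = n_rows // n_devices
    return {
        dev: [(slice(dev * chunk, (dev + 1) * chunk), slice(None))]
        for dev in range(n_devices)
    }


mapping = even_row_mapping(16, 2)

# Check that the chunks cover every row exactly once.
owned_rows = sorted(
    row
    for chunks in mapping.values()
    for row_slice, _ in chunks
    for row in range(16)[row_slice]
)
```

The slices produced here select the same rows as slice(8) and slice(8, None) in the example above, so the result should be usable as the second argument of distributed_array in the same way.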

Support for Python 3.12

Binary packages are now available for Python 3.12.

🛠️ Changes without compatibility

CUDA Runtime API is now statically linked

CuPy is now shipped with CUDA Runtime statically linked. Due to this, cupy.cuda.runtime.runtimeGetVersion() always returns the version of CUDA Runtime that CuPy is built with, regardless of the version of CUDA Runtime installed locally. If you need to retrieve the version of CUDA Runtime shared library installed locally, use cupy.cuda.get_local_runtime_version() instead.
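
As a rough illustration of the difference between the two calls, the sketch below compares the version CuPy was built with against the locally installed one. The helper names decode_cuda_version and report_runtime_versions are hypothetical. The decoding assumes the usual CUDA integer encoding (major * 1000 + minor * 10), and the CuPy calls are wrapped in a function so that nothing touches the GPU stack until it is called.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (major * 1000 + minor * 10) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def report_runtime_versions():
    """Return (built_with, installed_locally) as (major, minor) tuples.

    Needs CuPy v13 and a locally installed CUDA Runtime, so it is only
    defined here and is not called at import time.
    """
    import cupy

    built_with = cupy.cuda.runtime.runtimeGetVersion()
    installed = cupy.cuda.get_local_runtime_version()
    return decode_cuda_version(built_with), decode_cuda_version(installed)


example = decode_cuda_version(12020)  # integer encoding of CUDA 12.2
```

On a machine with CuPy installed, calling report_runtime_versions() would show whether the two versions differ.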

📝 Changes

New Features

  • Port lombscargle from cuSignal to cupyx.scipy.signal (#7563)
  • Port periodogram, welch and csd from cuSignal to cupyx.signal (#7564)
  • Port cusignal windows module to cupyx.scipy.signal (#7568)
  • Add cupy.lib.stride_tricks.sliding_window_view (#7575)
  • cupyx/scipy/signal: add place poles (#7666)
  • Add check_{NOLA, COLA} to cupyx.scipy.signal (#7675)
  • Port argrel{extrema, max, min} to cupyx.scipy.signal from cusignal (#7694)
  • Port waveforms from cusignal to cupyx.scipy.signal (#7696)
  • Port wavelets module from cusignal to cupyx.scipy.signal (#7700)
  • Add 2D signal b-splines to cupyx.scipy.signal (#7721)
  • Port firwin/firwin2 from cuSignal (#7722)
  • port upfirdn from cuSignal (#7749)
  • Support boolean COO sparse matrix (#7764)
  • Port gauss_spline from cuSignal (#7837)
  • Port stft/istft from CuSignal to cupyx.scipy.signal (#7838)
  • Port vectorstrength, coherence and spectrogram from CuSignal to cupyx.scipy.signal (#7853)
  • Port decimate, resample and resample_poly from cuSignal to cupyx.scipy.signal (#7855)
  • Add max_len_seq to cupyx.scipy.signal (#7867)
  • Add distributed ndarray (#7942)

Enhancements

  • Implement axis parameter on cupy.unique (#6886)
  • Load cuTENSOR from wheel distribution (#7025)
  • Soft link NVRTC for cupy_backends.cuda.libs.nvrtc (#7621)
  • Add a property to get access to the nccl handle. (#7823)
  • Remove cusolver_enabled, cub_enabled, thrust_enabled flags (#7840)
  • Lazy import cuSOLVER (#7843)
  • Lazy import cuSPARSE (#7847)
  • Lazy import cuFFT (#7849)
  • Static link to CUDA Runtime in CUB module (#7850)
  • Bundle CCCL in CuPy (#7851)
  • Lazy import cuRAND (#7856)
  • Use NVRTC for compiling kernels calling cupyx.jit.cub APIs (#7869)
  • Add optional argument device_id=-1 to get_current_stream (#7885)
  • Prohibit conversion from Variable to Python scalar in fusion (#7887)
  • Add __slots__ to cupy.ndarray (#7891)
  • Lazy import cuBLAS (#7921)
  • Allow Jitify to only cache CuPy-owned headers (#7934)
  • Ensure D2H copies are stream ordered and by default blocking (#7938)
  • Accelerate H2D copies when the source is on pinned memory (#7939)
  • Add Linux CI for Python 3.12 (#7940)
  • MNT: Suppress CUB compilation warnings (#7943)
  • Static link CUDA Runtime (#7954)
  • Add debug feature to preloading and softlink (#7977)
  • Support cuTensor 2.0 (#7984)
  • Bump supported NumPy & SciPy versions (#7992)
  • Softlink CUDA Driver (#7994)
  • Show local runtime version in cupy.show_config() (#7995)
  • Avoid using numpy.find_common_type (#7651)
  • ENH: Remove NINF, PINF, Inf,... usages (#7800)
  • Fix cupy.empty_like parameter name to prototype (#7827)
  • Make stream kwonly argument in ndarray.__dlpack__ (#7829)
  • Remove conversions of array with ndim > 0 to a scalar (#7886)
  • scipy.linalg.{tri/tril/triu} are deprecated in SciPy 1.11.0 (#7889)
  • Fix signal.medfilt complex error type for SciPy>=1.11 (#7890)
  • Fix cupyx.scipy.sparse._base tests for SciPy 1.11 (#7905)
  • Fix return type of division of csr_matrix and dense array for SciPy 1.11 (#7906)
  • Fix maxiter in TestLOBPCG (#7908)

Performance Improvements

  • Optimize spmatrix._set_many (#7888)

Bug Fixes

  • Fix csr2dense to avoid race conditions (#7724)
  • Fix cuTENSOR contraction descriptor cache (#7814)
  • Fix handling of scalars in cupy.r_ (#7815)
  • Fix cupy.r_ for scalar inputs (#7896)
  • Fixed Improper Method Call: Replaced NotImplementedError with NotImplemented (#7900)
  • Provide .stop() method for cupyx.distributed._Backend (#7952)
  • Fix NVRTCError not calling initialize() (#7955)
  • Import cupyx.lapack inside cupy.linalg.solve (#7966)
  • Add lazy load for cupyx.lapack (#7993)
  • Fix issues with the initial state when a SOS filter has no IIR part (#7998)
  • Avoid using pkg_resources for cuTENSOR wheel discovery (#8012)

Code Fixes

  • MNT: suppress compiler warning from cupyx.cusolver (#7714)
  • Add type annotation in _creation.basic (#7739)
  • Fix nvrtc initialize not inlined for CUDA Python (#7842)
  • Fix coding style (#7844)
  • Reorganize directory structure around CCCL (#7920)
  • Remove deprecated ast expr in CuPy JIT (#7941)
  • Reorganize third party code under third_party directory (#7956)

Documentation

  • Add -U to pre-release installation command (#7803)
  • Fix get_window docstring reference (#7835)
  • Clarify sparse .transpose() return type in docstrings (#7868)
  • DOC: cupyx/scipy: add missing names (#7898)
  • Fix CUDA 12.2 for Windows notice (#7922)
  • Bump CuPy version in install.rst (#8002)
  • Update installation guide to note about cuTENSOR 2.0 support (#8003)
  • Update wheels list in README (#8006)

Installation

  • Avoid warning when uploading packages (#7792)
  • Fix ROCm Dockerfile not working (#7797)
  • Add cuSignal license (#7816)
  • Improve symlink handling and preflight (#7945)
  • Bump docker cuda version to 12 (#7973)

Tests

  • Add timeout to Windows CI (#7775)
  • Fix mypy not installed in pre-review test (#7832)
  • Execution tests for typing tests passing rows in typing_tests (#7836)
  • CI: Remove path length limitation on Windows CI image (#7857)
  • Fix Windows CI failures (#7862)
  • Skip test_pos_boolarray if numpy>=1.25 (#7893)
  • Add NumPy 1.25/1.26 & SciPy 1.11 to CI (#7897)
  • Skip some LOBPCG tests failing with SciPy 1.11 (#7924)
  • Support Python 3.12, add Windows CI (#7947)
  • Skip logspace test in NumPy 1.25 & 1.26 (#7946) (#7948)
  • Fix Windows test scripts (#7957)
  • Skip test_parameterize_pytest_impl test for pytest 7.4.3 (#7965)
  • Fix TestLOBPCG.test_maxit_None CUDA 12.2 CI failure (#8000)

Others

  • Fix publish workflow permission and output for review (#7788)
  • Fix backport workflow (#7831)
  • Avoid triggering Project Updates for updates from assignees (#7861)
  • Bump version to v13.0.0rc1 (#8015)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @ev-br @fazledyn-or @kerry-vorticity @kmaehashi @leofang @loganbvh @milesvant @mtsokol @mvnvidia @negin513 @shino16 @takagi

cupy - v12.3.0

Published by asi1024 11 months ago

This is the release note of v12.3.0. See here for the complete list of solved issues and merged PRs.

This is the last planned release for the CuPy v12 series. Please start testing your workload with the v13 release candidate to get ready for the final v13 release. To install:

$ pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre

See the Upgrade Guide for the list of possible breaking changes in v13.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

Support for Python 3.12

Binary packages are now available for Python 3.12.

📝 Changes

Enhancements

  • Add a property to get access to the nccl handle. (#7824)
  • Add Linux CI for Python 3.12 (#7949)
  • Bump supported NumPy & SciPy versions (#8001)
  • ENH: Remove NINF, PINF, Inf,... usages (#7805)
  • Avoid using numpy.find_common_type (#7810)
  • Remove conversions of array with ndim > 0 to a scalar (#7895)
  • scipy.linalg.{tri/tril/triu} are deprecated in SciPy 1.11.0 (#7902)
  • Fix signal.medfilt complex error type for SciPy>=1.11 (#7909)
  • Fix return type of division of csr_matrix and dense array for SciPy 1.11 (#7912)
  • Skip TestSpmatrix on SciPy 1.11 or later (#7918)
  • Fix test of product, cumproduct, alltrue and sometrue for deprecation (#7936)
  • Skip fusion round_ tests (#7937)

Bug Fixes

  • Fix csr2dense to avoid race conditions (#7808)
  • Fix cuTENSOR contraction descriptor cache (#7817)
  • Provide .stop() method for cupyx.distributed._Backend (#7960)

Code Fixes

  • MNT: suppress compiler warning from cupyx.cusolver (#7819)
  • Fix coding style (#7846)
  • Remove deprecated ast expr in CuPy JIT (#7944)
  • Remove unnecessary CUB files from CuPy distribution (#7975)

Documentation

  • Add -U to pre-release installation command (#7806)
  • Fix CUDA 12.2 for Windows notice (#7926)

Installation

  • Fix ROCm Dockerfile not working (#7799)
  • Avoid warning when uploading packages (#7807)

Tests

  • Add timeout to Windows CI (#7859)
  • CI: Remove path length limitation on Windows CI image (#7860)
  • Fix Windows CI failures (#7865)
  • Fix Windows + CUDA 12.2 CI (#7910)
  • Skip test_pos_boolarray if numpy>=1.25 (#7913)
  • Skip some LOBPCG tests failing with SciPy 1.11 (#7931)
  • Add NumPy 1.25/1.26 & SciPy 1.11 to CI (#7932)
  • Skip logspace test in NumPy 1.25 & 1.26 (#7946) (#7951)
  • Support Python 3.12, add Windows CI (#7958)
  • Fix Windows test scripts (#7961)
  • Skip test_parameterize_pytest_impl test for pytest 7.4.3 (#7968)
  • Filter DeprecationWarning for distutils.dep_util used in Cython (#7999)
  • Fix TestLOBPCG.test_maxit_None CUDA 12.2 CI failure (#8007)

Others

  • Fix backport workflow (#7833)
  • Bump version to v12.3.0 (#8016)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @kmaehashi @leofang @mtsokol @mvnvidia

cupy - v12.2.0

Published by kmaehashi about 1 year ago

This is the release note of v12.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

✨ Highlights

Support for CUDA 12.2

CuPy now supports CUDA 12.2. Note that there is a known issue on CUDA 12.2 for Windows. See #7776 for details.

GitHub Sponsors

As a part of our effort to make CuPy sustainable, we have enrolled in GitHub Sponsors to accept donations. Your support helps fund CuPy's development and offsets the infrastructure costs of GPU-enabled CI platforms and the resources needed to build binary packages.

As a NumFOCUS Sponsored Project, funds sponsored through the GitHub Sponsors are collected and disbursed via NumFOCUS, a 501(c)(3) public charity in the United States, which acts as the fiscal sponsor for the project.

🛠️ Changes without compatibility

Deprecation of cupy-wheel Package

Due to the recent specification change in Pip 23.1, it became difficult for cupy-wheel to reliably detect the installed CUDA version. As discussed in RFC #7628, we have decided to remove this package in CuPy v13. To allow existing projects using cupy-wheel to continue to work, the package remains available for v12 releases.

📝 Changes

Enhancements

  • Minor updates for cuQuantum/cuTensorNet support (#7730)
  • Bump mypy version to 1.4.1 (#7736)
  • Support CUDA 12.2 (#7752)

Performance Improvements

  • Fix random module performance regression (#7592)

Bug Fixes

  • Fix returned CUDA statuses not being checked (#7618)
  • Fix cuSPARSE error message (#7684)
  • Fix memory pool to try resolve fragmentation when limit is set (#7685)
  • Fix type/exception annotations in cuSPARSE binding (#7703)
  • Update pylibcugraph weakly connected components call (#7704)
  • Improve detection for package installation source on Windows (#7711)
  • Temporarily disable CUB histogram (#7716)
  • Fix aweights type not checked in cupy.cov (#7717)
  • Revert FP16 headers from CUDA 12.2.0 to CUDA 12.1.1 (#7773)

Code Fixes

  • Introduce cython-lint (#7612)

Documentation

  • Improve README and Installation Guide (#7599)
  • update badges (#7600)
  • Fix small typos in docstrings (#7657)
  • Fix docstring of asarray (#7695)
  • Add CUDA 12.2 to list of supported CUDA (#7756)
  • Remove incorrect cupyx.distributed.NCCLBackend.all_gather comment (#7765)
  • Fix Note highlight sections in README (#7770)
  • Add notes for CUDA 12.2 on Windows support (#7778)

Installation

  • Fix cupy-wheel package installation fails with pip 23.1+ (#7624)

Tests

  • Bump versions of static checkers (#7598)
  • Fix build-cuda test restore-keys not working (#7614)
  • [v12] Require numpy<1.25 for round_ tests (#7642)
  • Ignore pkg_resources deprecation warning on import (#7656)
  • Skip TestLOBPCG::test_maxit_None in CUDA 12.1.1 & cuSOLVER 11.4.5 (#7670)
  • Bump CUDA minor versions used in CI (#7683)
  • [v12] Allow specifying Docker repository for CI images (#7690)
  • Use "/test" tag configuration from pull-request base branch (#7706)
  • XFAIL known test failures in cuSPARSE module (#7725)
  • Fix test_fht not to feed cupy.ndarray to scipy.fft.fhtoffset (#7728)
  • CI: remove explicit Cython installation (#7731)
  • Fix test_sum_duplicates_incompatibility for SciPy 1.11 (#7768)

Others

  • Fix flake8-cython not working (#7606)
  • Add env var to disable RPATH (#7718)
  • Bump version to v12.2.0 (#7755)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @asi1024 @emcastillo @jnke2016 @kmaehashi @leofang @pelmers @pri1311 @RandomY-2 @takagi

cupy - v13.0.0b1

Published by kmaehashi about 1 year ago

This is the release note of v13.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

✨ Highlights

Improved Coverage of cupyx.scipy.signal and cupyx.scipy.interpolate APIs (#7507, #7537, #7543 and others)

More than 20 new APIs are now included in cupyx.scipy.signal.

Acknowledgments: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.

Support for CUDA 12.2

CuPy now supports CUDA 12.2. Note that there is a known issue on CUDA 12.2 for Windows. See #7776 for details.

Removal of cupy-wheel Package

Due to the recent specification change in Pip 23.1, it became difficult for cupy-wheel to reliably detect the installed CUDA version. As discussed in RFC #7628, we have decided to remove this package in CuPy v13. To allow existing projects using cupy-wheel to continue to work, the package remains available for v12 releases.
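
Projects that relied on cupy-wheel to pick a binary package may need to choose one explicitly. The sketch below shows one way this could look. The helper cupy_wheel_for_cuda is hypothetical, and the cutoffs are an assumption based on the v13 support list in these notes (CUDA 10.2, 11.0, and 11.1 are dropped) together with the cupy-cuda11x and cupy-cuda12x package names.

```python
def cupy_wheel_for_cuda(major, minor):
    """Return a CuPy v13 binary package name for a given CUDA version.

    Hypothetical helper for illustration; the cutoffs are assumptions
    derived from the v13 support list, not an official lookup table.
    """
    if major == 12:
        return "cupy-cuda12x"
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    raise ValueError(f"CUDA {major}.{minor} has no CuPy v13 binary package")


package = cupy_wheel_for_cuda(12, 2)
```

The returned name can then be passed to pip install in a project's own setup tooling.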

Support us via GitHub Sponsors!

As a part of our effort to make CuPy sustainable, we have enrolled in GitHub Sponsors to accept donations. Your support helps fund CuPy's development and offsets the infrastructure costs of GPU-enabled CI platforms and the resources needed to build binary packages.

As a NumFOCUS Sponsored Project, funds sponsored through the GitHub Sponsors are collected and disbursed via NumFOCUS, a 501(c)(3) public charity in the United States, which acts as the fiscal sponsor for the project.

🛠️ Changes without compatibility

  • Support for the following platforms is removed in CuPy v13. (#7647)
    • CUDA 10.2, 11.0, and 11.1
    • Python 3.8
    • NumPy 1.21
    • cuTENSOR 1.5 or earlier
    • NCCL 2.15 or earlier
    • cuDNN 8.7 or earlier
    • Ubuntu 18.04
  • APIs deprecated in NumPy 1.25 (product, cumproduct, alltrue, and sometrue) now emit DeprecationWarning in CuPy as well. (#7645)
  • cupy.cuda.compile_with_cache, which is a private API and has been marked deprecated since CuPy v10, has been removed. Please use RawKernel or RawModule instead. (#5297, #7734)
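
To prepare for the deprecated aliases listed above, it can help to scan a code base for them. The following is a minimal sketch using only the standard library. The helper find_deprecated_calls is hypothetical, the replacement names follow the NumPy 1.25 deprecation (prod, cumprod, all, any), and the sketch assumes the module is referenced as cupy or cp.

```python
import re

# Deprecated alias -> preferred name (same replacements as in NumPy 1.25).
REPLACEMENTS = {
    "product": "prod",
    "cumproduct": "cumprod",
    "alltrue": "all",
    "sometrue": "any",
}

# Assumes the module is referenced as "cupy" or "cp" in the scanned source.
_PATTERN = re.compile(r"\b(?:cupy|cp)\.(" + "|".join(REPLACEMENTS) + r")\b")


def find_deprecated_calls(source):
    """Return (line_number, old_name, new_name) for each deprecated alias found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, REPLACEMENTS[old]))
    return hits


sample = (
    "import cupy as cp\n"
    "x = cp.arange(4)\n"
    "y = cp.product(x)\n"
    "z = cp.alltrue(x > 0)\n"
)
hits = find_deprecated_calls(sample)
```

This is a text-level check only, so it will miss aliases imported under other names.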

📝 Changes

New Features

  • Add cupyx.scipy.ndimage.value_indices (#7410)
  • Add Euclidean distance transform (scipy.ndimage.distance_transform_edt) (#7413)
  • Add cupyx.scipy.linalg.bandwidth functionality (#7507)
  • cupyx.scipy.signal: add firls and freqz, freqz_zpk, sosfreqz (#7537)
  • cupyx.scipy.signal: add CZT and ZoomFFT (#7543)
  • Add sosfilt_zi to cupyx.scipy.signal (#7552)
  • filter design "prototypes" (#7553)
  • Add sosfiltfilt to cupyx.scipy.signal (#7558)
  • Add iirfilter and related filter design functions (#7591)
  • cupyx.scipy.signal: add hilbert and hilbert2 (#7607)
  • cupyx/scipy.signal: port *ord filter design functions from scipy.signal (#7632)
  • Add gammatone, group_delay to cupyx.scipy.signal (#7633)
  • Add iir{notch,comb,peak} design functions (#7634)
  • Add kaiser{ord,_beta,_atten} functions (#7635)
  • cupyx/scipy/signal: add abcd_normalize (#7637)
  • Add cupyx.scipy.signal.minimum_phase (#7638)
  • Add find_peaks, peak_prominences, peak_widths to cupyx.scipy.signal (#7640)
  • Add cupyx.signal.{freqs, freqs_zpk, findfreqs} (#7641)
  • cupyx/scipy/signal: add unique_roots, invres{z}, residue{z} (#7644)
  • cupyx/scipy/signal: add missing LTI format conversions (#7652)
  • Implement scipy.linalg.khatri_rao (#7659)
  • Add LTI class hierarchy and lti/dlti related functions (#7660)
  • cupyx/scipy/signal: add correlation_lags (#7707)
  • Add 1D signal b-splines to cupyx.scipy.signal (#7715)
  • Add the matrix exponential expm (#7744)

Enhancements

  • Let the user specify a starting vector for eigsh (#7487)
  • cupy.kron can accept numeric arguments, replicating numpy.kron behavior (#7608)
  • Deprecate support for out-dated platforms in CuPy v13 (#7647)
  • Add mixed precision (FP16) support for ROCm (#7663)
  • Minor updates for cuQuantum/cuTensorNet support (#7723)
  • Bump mypy version to 1.4.1 (#7735)
  • Support CUDA 12.2 (#7748)
  • Avoid overflow warnings in test_astype_strides (#7622)
  • Deprecate cupy.round_ (#7623)
  • Deprecate product, cumproduct, alltrue and sometrue (#7645)

Bug Fixes

  • Fix returned CUDA statuses not being checked (#7613)
  • Fix memory pool to try resolve fragmentation when limit is set (#7679)
  • Fix cuSPARSE error message (#7680)
  • Fix type/exception annotations in cuSPARSE binding (#7692)
  • Update pylibcugraph weakly connected components call (#7693)
  • Fix aweights type not checked in cupy.cov (#7701)
  • Temporarily disable CUB histogram (#7708)
  • Improve detection for package installation source on Windows (#7709)
  • Revert FP16 headers from CUDA 12.2.0 to CUDA 12.1.1 (#7758)

Code Fixes

  • Introduce cython-lint (#7508)
  • Remove cupy.cuda.compile_with_cache (#7734)
  • Cosmetic changes of cupy/typing (#7738)
  • MAINT: centralize np.roots calls (#7740)

Documentation

  • Improve README and Installation Guide (#7580)
  • update badges (#7594)
  • Fix small typos in docstrings (#7655)
  • Fix docstring of asarray (#7668)
  • Remove incorrect cupyx.distributed.NCCLBackend.all_gather comment (#7746)
  • Add CUDA 12.2 to list of supported CUDA (#7753)
  • Fix Note highlight sections in README (#7767)
  • Note device sync on some cupyx.scipy.signal API documents (#7771)
  • Add notes for CUDA 12.2 on Windows support (#7777)

Installation

  • Fix cupy-wheel package installation fails with pip 23.1+ (#7597)
  • Remove cupy-wheel package (#7745)

Tests

  • Bump versions of static checkers (#7595)
  • Fix build-cuda test restore-keys not working (#7610)
  • Fix hilbert and hilbert2 test condition (#7619)
  • Change test pass condition for TestChoiceChi::test_goodness_of_fit (#7626)
  • cupyx/signal: skip tests against older SciPy versions, adjust test tolerances (#7627)
  • Fix test_fht not to feed cupy.ndarray to scipy.fft.fhtoffset (#7643)
  • Ignore pkg_resources deprecation warning on import (#7653)
  • Skip TestLOBPCG::test_maxit_None in CUDA 12.1.1 & cuSOLVER 11.4.5 (#7669)
  • Bump CUDA minor versions used in CI (#7682)
  • XFAIL known test failures in cuSPARSE module (#7688)
  • Allow specifying Docker repository for CI images (#7689)
  • Use "/test" tag configuration from pull-request base branch (#7705)
  • CI: remove explicit Cython installation (#7729)
  • Fix test_sum_duplicates_incompatibility for SciPy 1.11 (#7763)

Others

  • Fix command in issue template for Windows (#7601)
  • Fix flake8-cython not working (#7602)
  • Add workflow to automatically updating Pull-Request dashboard project (#7631)
  • Dump event payload for debugging (#7646)
  • Pass pull-request number via artifact (#7648)
  • Use GitHub App to update org-wide project (#7649)
  • Do not trigger on closed pull-requests and fix to work with issue_comment trigger (#7650)
  • Fix project automation to skip when pull-request is already closed (#7658)
  • Add env var to disable RPATH (#7691)
  • Fix project update workflow (#7710)
  • Fix github-token used in workflow (#7712)
  • Bump version to v13.0.0b1 (#7754)
  • Add workflow to publish to PyPI (#7779)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @andfoy @asi1024 @ev-br @grlee77 @jglaser @jmbr @jnke2016 @kmaehashi @KyanCheung @leofang @pelmers @pnunna93 @pri1311 @RandomY-2 @sametz @takagi

cupy - v12.1.0.post1

Published by kmaehashi over 1 year ago

This is a hot-fix release for v12.1.0 to address an issue reported in #7593 that pip install cupy-wheel raises an error with Pip v23.1 or later. See here for the complete list of solved issues and merged PRs.

This fix only applies to the cupy-wheel meta package. As there are no differences in CuPy functionalities with v12.1.0, no releases are made for CuPy's source/binary packages.

We are also considering removing cupy-wheel meta package in CuPy v13. Join the discussion in #7628 if you have any suggestions or comments.

cupy - v12.1.0

Published by takagi over 1 year ago

This is the release note of v12.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

New Features

  • Add array_api.take function (#7513)

Enhancements

  • Support SciPy 1.10 (#7586)

Bug Fixes

  • Fixup array/asarray call to prefer C order on plain NumPy arrays (#7493)
  • Fix cudart errors raised by texture APIs swallowed by Cython (#7566)
  • Dispatch ufunc methods (#7583)

Code Fixes

  • Fix cythonize warnings (#7502)

Documentation

  • Update aarch64 install instructions (#7503)
  • Fix RTD build failure (#7554)

Installation

  • Use -Xfatbin=-compress-all (#7505)
  • Fix _depends.json not included in wheel (#7584)

Tests

  • Bump platform versions used in actions (#7501)
  • Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7525)
  • Remove unused test decorators (#7535)
  • Restore GitHub Actions cache with prefix match (#7571)
  • Fix CUDA Python CI failure (#7582)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @arogozhnikov @asi1024 @kmaehashi @leofang @seberg @takagi

cupy - v13.0.0a1

Published by takagi over 1 year ago

This is the release note of v13.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy v13 Roadmap and Revised Release Schedule

  • We have published the feature roadmap for CuPy v13, which is planned for release in October 2023. See #7555 for the details.
  • Starting in the CuPy v13 development cycle, we have adjusted our release frequency to once every two months. Mid-term or hot-fix releases may be provided depending on necessity, such as for new CUDA/Python version support or critical bug fixes. This new policy also applies to v12 releases.
  • RFC: We plan to drop CUDA 10.2/11.0/11.1 support in CuPy v13. Please leave a comment on #7557 if you have any suggestions.
  • RFC: We are thinking of improving PyTorch interoperability features in CuPy. If you are interested, please join the discussion in #7556.

Improved Coverage of cupyx.scipy.signal and cupyx.scipy.interpolate APIs (#7442, #7496 and others)

The lfilter, lfilter_zi, filtfilt, and sosfilt APIs are now included in cupyx.scipy.signal, and NdPPoly is now included in cupyx.scipy.interpolate.
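
For readers unfamiliar with lfilter, the sketch below evaluates the standard IIR difference equation that scipy.signal.lfilter is defined by, a[0]*y[n] = b[0]*x[n] + ... + b[M]*x[n-M] - a[1]*y[n-1] - ... - a[N]*y[n-N], in plain Python. The function lfilter_reference is a hypothetical reference implementation for checking small cases by hand. It is not how CuPy computes the filter on the GPU.

```python
def lfilter_reference(b, a, x):
    """Evaluate the IIR difference equation directly, with zero initial state.

    Hypothetical pure-Python reference for illustration only.
    """
    y = []
    for n in range(len(x)):
        # Feed-forward part: weighted sum of current and past inputs.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: weighted sum of past outputs.
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# Impulse response of y[n] = x[n] + 0.5 * y[n-1]
impulse_response = lfilter_reference([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

A small reference like this can be useful for spot-checking results from the GPU implementation on short inputs.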

Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.

Random number generator performance improved (#7517)

Sampling with the cupy.random.Generator.* methods was slower than the equivalent cupy.random.* function calls in the old random API. This regression is now fixed, and the performance of the cupy.random.Generator API has improved by more than 4x.

Changes without compatibility

Drop support for Python 3.8

In line with NumPy's NEP 29, Python 3.8 is no longer supported as of CuPy v13.

Changes

New Features

  • Add NdPPoly to cupyx.scipy.interpolate (#7357)
  • Implement delete function, add documentation (#7359)
  • add array_api.take function (#7432)
  • Add lfilter/IIR utilities to cupyx.scipy.signal (#7442)
  • Added scipy.special.binom functionality to CuPy (#7463)
  • cupyx/scipy/signal: add savgol_coeffs and savgol_filter (#7469)
  • Add scipy.special.zetac to cupyx (#7470)
  • add cupyx.scipy.special.exprel (#7474)
  • Add lfiltic and lfilter_zi to cupyx.scipy.signal (#7477)
  • Add filtfilt to cupyx.scipy.signal (#7496)
  • Add deconvolve to cupyx.scipy.signal (#7509)
  • Add symiirorder1 to cupyx.scipy.signal (#7511)
  • Add symiirorder2 to cupyx.scipy.signal (#7518)
  • Add scipy.special.spherical_yn (#7520)
  • Add sosfilt to cupyx.scipy.signal (#7528)
  • ENH: scipy.signal: add detrend (#7536)
  • cupyx.scipy.signal: add bilinear & bilinear_zpk (#7541)

Enhancements

  • Support SciPy 1.10 (#7367)
  • ROCm5.3.0+ rocPrim C++14 extension requirement. (#7412)
  • Support cuDNN 8.8 (#7472)
  • Support CUDA 12.1 (#7473)
  • Support NumPy 1.24: dtype and casting keyword arguments for hstack, vstack, stack (#7490)
  • Replace concatenate by slice manipulation in lfilter (#7522)
  • Support NumPy 1.24: Adding strict option to testing.assert_array_equal (#7481)

Performance Improvements

  • Fix random module performance regression (#7517)
  • Improve symiirorder2 performance (#7526)

Bug Fixes

  • Fix new strides when array is both C and F-contiguous (#7438)
  • Fixup array/asarray call to prefer C order on plain NumPy arrays (#7457)
  • Fix cudart errors raised by texture APIs swallowed by Cython (#7540)
  • Dispatch ufunc methods (#7572)

Code Fixes

  • Rename type_test to type_testing (#7456)
  • Fix cythonize warnings (#7480)

Documentation

  • Add comparison table for scipy.interpolate module (#7433)
  • Update list of supported libraries (#7478)
  • Update aarch64 install instructions (#7500)
  • Fix RTD build failure (#7547)

Installation

  • Bump version to v13.0.0a1 (#7494)
  • Use -Xfatbin=-compress-all (#7497)
  • Fix _depends.json not included in wheel (#7578)

Tests

  • Remove unused test decorators (#7453)
  • Remove xfail for invh (#7476)
  • Bump platform versions used in actions (#7488)
  • Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7521)
  • Mark scipy required in a test (#7523)
  • Require newer SciPy in a test (#7524)
  • Import SciPy in tests (#7531)
  • Restore GitHub Actions cache with prefix match (#7546)
  • Try to fix nan value mismatches in filtfilt tests (#7567)
  • Fix CUDA Python CI failure (#7574)

Others

  • Bump stable branch to v12 (#7447)
  • Update branch name from master to main (#7448)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@AdrianAbeyta @Anas20001 @andfoy @arogozhnikov @asi1024 @chettub @emcastillo @ev-br @kmaehashi @KyanCheung @leofang @pri1311 @Raghav323 @seberg @takagi @tysonwu

cupy - v12.0.0

Published by emcastillo over 1 year ago

This is the release note of v12.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since the v12.0.0rc1 release. Check out our blog for highlights of the v12 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 12.1 & cuDNN 8.8 (#7484 & #7475)

CuPy now supports CUDA 12.1 and cuDNN 8.8. Binary packages are available for Linux (x86_64/aarch64) and Windows as cupy-cuda12x.

$ pip install cupy-cuda12x

Announcements

Arm packages available in PyPI

Binary packages for aarch64 (Jetson and Arm servers) can now be installed from PyPI.

$ pip install cupy-cuda102
$ pip install cupy-cuda11x
$ pip install cupy-cuda12x

Note: At the time of the release, the Arm wheel of cupy-cuda11x for Python 3.8 (cupy_cuda11x-12.0.0-cp38-cp38-manylinux2014_aarch64.whl) is not available on PyPI. We are working on resolving this issue. Meanwhile, this wheel can be installed from the CuPy index.

$ pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

This issue was resolved on 2023-04-03.

Changes

For all changes in v12, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).

Enhancements

  • ROCm5.3.0+ rocPrim C++14 extension requirement (#7454)
  • Support cuDNN 8.8 (#7475)
  • Support CUDA 12.1 (#7484)

Bug Fixes

  • Fix new strides when array is both C and F-contiguous (#7451)

Code Fixes

  • Rename type_test to type_testing (#7461)

Documentation

  • Add comparison table for scipy.interpolate module (#7450)
  • Update list of supported libraries (#7486)

Tests

  • Remove xfail for invh (#7485)

Others

  • Bump version to v12.0.0 (#7492)
  • Bump branch version to v12 (#7446)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@AdrianAbeyta @asi1024 @emcastillo @kmaehashi @seberg

cupy - v11.6.0

Published by asi1024 over 1 year ago

This is the release note of v11.6.0. See here for the complete list of solved issues and merged PRs.

This is the last planned release for the CuPy v11 series. Please start testing your workload with the v12 release candidate to get ready for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre. See the Upgrade Guide for the list of possible breaking changes in v12.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Fixed Performance Issue with CUDA 12.0

This release fixes a critical performance regression with CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in each Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.

Changes

Enhancements

  • Use warp size from runtime.getDeviceProperties (#7353)
  • Update DLPack to v0.8 to support bool arrays (#7376)
  • Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7379)
  • Work around a potential OOM error raised by CUB histogram (#7388)
  • Use NumPy 1.24 in CI and bump baseline API (#7423)
  • Fix arange() to raise TypeError in boolean case (#7407)

Bug Fixes

  • Fix kernel cache not working in CUDA 12.0 (#7348)
  • Improves stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7361)
  • Do not test NumPy version for private APIs (#7370)

Documentation

  • Downgrade pydata-sphinx-theme to v0.11.0 (#7380)

Installation

  • Bump version to v11.6.0 (#7435)

Tests

  • CI: tentatively use SciPy 1.9 in Windows (#7336)
  • CI: Add optuna 3.0 (#7337)
  • Remove invalid pytest markers and turn on strict mode (#7354)
  • Avoid int8 overflow warning in TestRoundHalfway (#7362)
  • Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7365)
  • Add CI for CUDA 12.0 on Windows (#7371)
  • Fix pre-commit configuration error (#7373)
  • Avoid casting nan value to integer type in nanargmin/max tests (#7381)
  • Avoid int8 overflow in some tests (#7382)
  • Fix int8 overflow in vectorize tests (#7384)
  • Fix sumprod test to avoid uint overflow (#7398)
  • Avoid fillvalue overflow in cupyx.scipy.signal test (#7401)
  • Fix ndarray.fill to raise ComplexWarning (#7408)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @kmaehashi @leofang @RisaKirisu

cupy - v12.0.0rc1

Published by asi1024 over 1 year ago

This is the release note of v12.0.0rc1. See here for the complete list of solved issues and merged PRs.

This is a release candidate of the CuPy v12 series. Please start testing your workload with this release to prepare for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre. See the Upgrade Guide for the list of possible breaking changes in v12.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved Coverage of cupyx.scipy.interpolate

The following interpolators have been implemented: BPoly, Akima1DInterpolator, PchipInterpolator.
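
As a rough illustration of one of these interpolators, the sketch below fits a PchipInterpolator to monotone samples and evaluates it on a finer grid. The sample data is made up, and the fallback to NumPy/SciPy on machines without a usable GPU is an assumption added for portability, not part of the release.

```python
# Assumption: fall back to NumPy/SciPy when CuPy or a GPU is unavailable,
# so the same calls can be tried on any machine.
try:
    import cupy as xp
    from cupyx.scipy.interpolate import PchipInterpolator
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    import numpy as xp
    from scipy.interpolate import PchipInterpolator

# Made-up, monotonically increasing samples.
x = xp.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = xp.array([0.0, 0.5, 2.0, 2.5, 4.0])

interp = PchipInterpolator(x, y)

at_nodes = interp(x)                       # evaluated at the sample points
fine = interp(xp.linspace(0.0, 4.0, 41))   # evaluated between the samples
```

PCHIP is shape-preserving, so for monotone input such as this the values in `fine` should not overshoot the samples.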

Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.

DLPack v0.8 Support

CuPy is now compatible with DLPack v0.8 to allow importing/exporting bool arrays.

Fixed Performance Issue with CUDA 12.0

This release fixes a critical performance regression with CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in each Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.

Changes without compatibility

Change cupy.cuda.Device Behavior (#7427)

The CUDA current device (set via cupy.cuda.Device.use() or underlying CUDA API cudaSetDevice()) will now be reactivated when exiting a cupy.cuda.Device context manager. This reverts the change introduced in CuPy v10, making the behavior identical to the one in CuPy v9 or earlier. Please refer to the Upgrade Guide for the background of this decision.
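
The restored behavior can be checked with a sketch like the one below. It assumes a machine with at least two GPUs and returns None otherwise; the helper name `device_after_context` is hypothetical and only serves the illustration.

```python
def device_after_context():
    """Return the current device ID after leaving a Device block.

    Returns None when CuPy or a second GPU is unavailable (assumed guard).
    """
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() < 2:
            return None
    except Exception:
        return None

    original = cupy.cuda.runtime.getDevice()
    try:
        cupy.cuda.Device(1).use()       # make device 1 the current device
        with cupy.cuda.Device(0):       # temporarily switch to device 0
            pass
        # CuPy v12 (like v9 or earlier): device 1 is current again here.
        return cupy.cuda.runtime.getDevice()
    finally:
        cupy.cuda.Device(original).use()  # restore the caller's device


result = device_after_context()
```

On CuPy v12 with two or more GPUs, `result` is expected to be 1, because the device selected with use() is reactivated when the with block exits.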

Requirement Changes (#7405)

As per NEP 29, CuPy v12 drops support for Python 3.7 and NumPy 1.20. Support for SciPy 1.6 has been dropped as well.
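
A pre-upgrade check along these lines may help when auditing an environment. The minimum versions are inferred from the dropped ones (Python 3.8, NumPy 1.21, SciPy 1.7), and the helper names are hypothetical.

```python
import sys

# Minimums inferred from the versions dropped in v12 (assumption):
# Python 3.7 -> 3.8+, NumPy 1.20 -> 1.21+, SciPy 1.6 -> 1.7+.
MINIMUMS = {"python": (3, 8), "numpy": (1, 21), "scipy": (1, 7)}


def parse_version(text):
    """Turn '1.24.3' into (1, 24); assumes a numeric 'major.minor' prefix."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def unmet_requirements(python, numpy, scipy=None):
    """Return the names whose (major, minor) version is below the minimum.

    SciPy is optional, so None means "not installed" and is skipped.
    """
    found = {"python": python, "numpy": numpy, "scipy": scipy}
    return [name for name, version in found.items()
            if version is not None and version < MINIMUMS[name]]


# Check the running interpreter against an example NumPy version string.
print(unmet_requirements(sys.version_info[:2], parse_version("1.24.3")))
```

An empty list means every checked component meets the inferred minimum.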

Remove Texture Reference APIs (#7308)

Texture reference features (RawModule.get_texref() and TextureReference), which were marked deprecated in CUDA 10.1 and removed in CUDA 12.0, have been removed from CuPy.

Changes

New Features

  • Initial experimental & private cupyx.distributed._array implementation (#7040)
  • Add PchipInterpolator to cupyx.scipy.interpolate (#7255)
  • Add Akima1DInterpolator to cupyx.scipy.interpolate (#7260)
  • Add cached_code to ElementwiseKernel and ReductionKernel (#7265)
  • Enable spline methods on RegularGridInterpolator (#7334)
  • Add BPoly to cupyx.scipy.interpolate module (#7343)

Enhancements

  • Use NumPy 1.24 in CI and bump baseline API (#7248)
  • Use warp size from runtime.getDeviceProperties (#7302)
  • Update DLPack to v0.8 to support bool arrays (#7307)
  • Remove texture reference completely (#7308)
  • Work around a potential OOM error raised by CUB histogram (#7316)
  • Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7377)
  • Drop support for Python 3.7, NumPy 1.20, and SciPy 1.6 (#7405)
  • Raise RuntimeError if pylibraft is unavailable (#7411)
  • Revert cupy.cuda.Device behavior to v9 (#7427)
  • Fix ndarray.fill to raise ComplexWarning (#7393)
  • Fix arange() to raise TypeError in boolean case (#7394)

Performance Improvements

  • Change implementation of fftshift and ifftshift (#7399)

Bug Fixes

  • Fix kernel cache not working in CUDA 12.0 (#7345)
  • Improves stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7356)
  • Do not test NumPy version for private APIs (#7368)

Code Fixes

  • Small fixes and refactor of casting related things (#7322)

Documentation

  • Doc: fix wrong time unit (#7312)
  • Doc: add docs for contiguity policy (#7344)
  • Doc: downgrade pydata-sphinx-theme to v0.11.0 (#7375)
  • Fix typo in docstring (#7402)
  • DOC: cupyx.interpolate: document limitations on ROCm (#7419)
  • Add upgrade guide for v12 (#7430)

Installation

  • Add CUPY_INCLUDE_PATH and CUPY_LIBRARY_PATH env vars (#7305)
  • Bump docker image to CUDA 11.8.0 (#7429)
  • Bump version to v12.0.0rc1 (#7434)

Tests

  • CI: tentatively use SciPy 1.9 in Windows (#7326)
  • CI: Add optuna 3.0 (#7333)
  • Avoid int8 overflow warning in TestRoundHalfway (#7338)
  • Avoid int8 overflow in some tests (#7339)
  • Fix int8 overflow in vectorize tests (#7340)
  • Avoid casting nan value to integer type in nanargmin/max tests (#7341)
  • Add CI for CUDA 12.0 on Windows (#7349)
  • Remove invalid pytest markers and turn on strict mode (#7350)
  • Drop support for Optuna v2 (#7363)
  • Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7364)
  • Fix pre-commit configuration error (#7369)
  • Avoid int8 overflow in core test (#7387)
  • Fix sumprod test to avoid uint overflow (#7395)
  • Avoid fillvalue overflow in cupyx.scipy.signal test (#7397)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @asi1024 @emcastillo @ev-br @kmaehashi @leofang @Nordicus @Raghav323 @RisaKirisu @seberg @wstolp

cupy - v11.5.0

Published by kmaehashi over 1 year ago

This is the release note of v11.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CUDA 12 & H100 Support

CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.

$ pip install cupy-cuda12x

For aarch64:
$ pip install cupy-cuda12x -f https://pip.cupy.dev/aarch64

Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.

Changes

Enhancements

  • Support CUDA 12.0 (#7238)
  • Conditionally change identifiers for ROCm (#7244)
  • Extra fixes for CUDA 12.0 (#7257)
  • Support NCCL 2.16 (#7288)
  • Bump to cuTENSOR 1.6.2 (#7290)
  • Support cuDNN 8.7 (#7296)
  • Lazy load dtypes deprecated in NumPy 1.24 (#7297)
  • Add cupy-cuda12x to cupy-wheel (#7327)
  • Update for deprecations in NumPy 1.24 (#7263)
  • Update array_api (#7321)

Bug Fixes

  • Fix interpreting Sparse init arguments (#7230)
  • Fix race condition in Jitify (#7266)
  • Support passing int as shape to broadcast_to (#7291)
  • Update cuTENSOR installer for CUDA 12.x (#7301)

Documentation

  • Bump docs requirements (#7258)
  • Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7320)

Installation

  • Bump version to v11.5.0 (#7324)

Tests

  • CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7241)
  • Filter SQLAlchemy's warning on which optuna depends in test (#7277)
  • Fix tests for NumPy 1.24 (c.f. #7286) (#7287)
  • Add CI for CUDA 12.0 (#7317)
  • CI: Use NVTX1 in FlexCI image (#7325)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @hubertlu-tw @kmaehashi @leofang @takagi

cupy - v12.0.0b3

Published by kmaehashi over 1 year ago

This is the release note of v12.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CUDA 12 & H100 Support

CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.

$ pip install cupy-cuda12x --pre -f https://pip.cupy.dev/pre

Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.

NVTX3

NVTX support in CuPy is now backed by NVTX3 instead of the legacy NVTX1.

Changes

New Features

  • Add cupyx.scipy.interpolate.make_interp_spline (#7195)
  • Implementing RegularGridInterpolator and interpn from scipy.interpolate (#7197)
  • Add PPoly to cupyx.scipy.interpolate (#7204)
  • Add uniform() to random generator (#7205)
  • Implement make_interp_spline(..., bc_type="periodic") (#7206)
  • JIT: Enhance thrust functions coverage (#7233)
  • Add CubicHermiteSpline to cupyx.scipy.interpolate (#7242)

Enhancements

  • Conditionally change identifiers for ROCm (#7079)
  • cupyx.scipy.sparse.linalg.spsolve : allow two-dimensional right-hand sides in A @ X = B (#7219)
  • Support CUDA 12.0 (#7235)
  • Extra fixes for CUDA 12.0 (#7236)
  • Adding smaller eigenvalues option in cupyx.scipy.sparse.linalg.eigsh (#7269)
  • Performance optimization of RegularGridInterpolator (#7275)
  • Add function to diagnose Windows DLL load issue (#7279)
  • Support NCCL 2.16 (#7283)
  • Bump to cuTENSOR 1.6.2 (#7284)
  • Support cuDNN 8.7 (#7285)
  • Add cupy-cuda12x to cupy-wheel (#7300)
  • Migrate to NVTX3 (#7304)
  • Update for deprecations in NumPy 1.24 (#7245)
  • Check if the slice does not have inhomogeneous shape before converting it to array (#7286)
  • Update array_api (#7313)

Bug Fixes

  • Fix interpreting Sparse init arguments (#7222)
  • Fix race condition in Jitify (#7259)
  • Support passing int as shape to broadcast_to (#7271)
  • Update cuTENSOR installer for CUDA 12.x (#7298)

Documentation

  • Bump docs requirements (#7247)
  • Add explanation for JIT kernel. (#7252)
  • Doc: Add interop example using raw pointers (#7278)
  • Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7310)

Installation

  • Bump version to v12.0.0b3 (#7323)

Tests

  • CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7237)
  • Skip tests if SciPy is unavailable (#7239)
  • Fix CI failures related to cupyx.scipy.interpolate (#7262)
  • Filter SQLAlchemy's warning on which optuna depends in test (#7276)
  • Add CI for CUDA 12.0 (#7299)
  • CI: Use NVTX1 in FlexCI image (#7311)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @ev-br @hubertlu-tw @ideasrule @kmaehashi @leofang @mandal-saswata @oishigyunyu @takagi

cupy - v11.4.0

Published by takagi almost 2 years ago

This is the release note of v11.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

  • Use cuSPARSE Generic API instead of older one documented to be removed (#7209)

Bug Fixes

  • Fix 1-dim lexsort (#7191)
  • Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7202)
  • Split inputs to random routines (#7207)
  • Use list(kwargs) instead of list(kwargs.keys) (#7213)
  • Fix cusparseSpSM compatibility (#7220)

Tests

  • CI: Generate coverage count just after the parameter axis in table (#7188)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@emcastillo @hadipash @jjmortensen @kmaehashi @takagi

cupy - v12.0.0b2

Published by takagi almost 2 years ago

This is the release note of v12.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

More cupyx.scipy.interpolate APIs (#7086, #7190 and #7215)

Increased coverage of cupyx.scipy.interpolate APIs, which now includes BSpline, RBFInterpolator, splantider and splder.

Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.

Use CUB reduction classes in cupyx.jit (#7145)

Now it is possible to use the CUB reduction classes, cub::WarpReduce and cub::BlockReduce, in kernels written using CuPy JIT.

import cupy, cupyx
from cupy.cuda import runtime
from cupyx import jit

@jit.rawkernel()
def warp_reduce_sum(x, y):
    WarpReduce = jit.cub.WarpReduce[cupy.int32]
    temp_storage = jit.shared_memory(
        dtype=WarpReduce.TempStorage, size=1)
    i, j = jit.blockIdx.x, jit.threadIdx.x
    value = x[i, j]
    aggregator = WarpReduce(temp_storage[0])
    aggregate = aggregator.Reduce(value, jit.cub.Sum())
    if j == 0:
        y[i] = aggregate

warp_size = 64 if runtime.is_hip else 32
h, w = (32, warp_size)
x = cupy.arange(h * w, dtype=cupy.int32).reshape(h, w)
cupy.random.shuffle(x)
y = cupy.zeros(h, dtype=cupy.int32)
warp_reduce_sum[h, w](x, y)

Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.

Changes

New Features

  • Add 1-D BSpline to interpolate module (#7086)
  • JIT: Support cub::WarpReduce and cub::BlockReduce (#7145)
  • Add cupyx.scipy.interpolate.RBFInterpolator (#7190)
  • Expose splder and splantider (#7215)

Enhancements

  • Use cuSPARSE Generic API instead of older one documented to be removed (#7052)
  • Improve _PerfCaseResult.to_str format (#7152)

Bug Fixes

  • Split inputs to random routines (#7173)
  • Fix 1-dim lexsort (#7178)
  • Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7192)
  • Fix wrong argument in warnings.warn() (#7194)
  • Use list(kwargs) instead of list(kwargs.keys) (#7203)
  • Fix cusparseSpSM compatibility (#7214)
  • Remove scipy import (#7218)
  • Use naive comb() for Python 3.7 (#7221)

Tests

  • CI: Generate coverage count just after the parameter axis in table (#7175)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @ev-br @hadipash @jjmortensen @kmaehashi @takagi @TsutsuiMasayoshi

cupy - v11.3.0

Published by asi1024 almost 2 years ago

This is the release note of v11.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 11.8 & NVIDIA H100 GPUs

This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.

Support for Python 3.11

Wheels are now available for Python 3.11.

Changes

Enhancements

  • Add wrapper for cutensorPermutation (#7083)
  • Fix compile error from inf/nan in cupy.fuse (#7128)
  • Support CUDA 11.8 (#7134)
  • Add CUDA 11.8 on documents (#7148)
  • Support NCCL 2.15 (#7160)
  • Support CUDA 11.8 H100 GPUs (#7169)
  • Support Python 3.11 (#7179)
  • Fix indexing sparse matrix with empty index arguments (#7155)

Bug Fixes

  • Make sure weibull distribution support ndarrays (#7055)
  • Make sure that cupy (array-api) Array objects can be composed using asarray (#7095)
  • JIT: Fix compile error for op.routine including in0_type (#7096)
  • Don't use __del__ in TCPStore (#7111)
  • Fix cupy.nansum in fusing (#7114)
  • Fusion TypeError in cupy._core.fusion._call_ufunc() (#7130)
  • JIT: Fix compile error of minmax function (#7174)

Documentation

  • Docs: Add missing functions (#7112)

Installation

  • Support force-overwriting Docker image via workflow (#7091)
  • Bump version to v11.3.0 (#7182)

Tests

  • CI: Add ROCm 5.3 (#7125)
  • CI: Allow /test jenkins to trigger Jenkins only (#7129)
  • Install zlib for CUDA 11.8 Windows CI (#7138)
  • Fix for pytest 7.2 (#7149)
  • CI: improve use of cache in GitHub Actions (#7156)
  • CI: Add support for the latest FlexCI Windows image (#7172)

Others

  • CI: use pre-commit in GitHub Actions (#7132)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @kmaehashi @leofang @takagi

cupy - v12.0.0b1

Published by asi1024 almost 2 years ago

This is the release note of v12.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 11.8 & NVIDIA H100 GPUs

This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.

Support for Python 3.11

Wheels are now available for Python 3.11.

ufunc Methods

This release adds ufunc.reduce, ufunc.accumulate, ufunc.reduceat, and ufunc.at methods. See the documentation for more details.
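
As a minimal sketch of three of these methods, the snippet below applies reduce, accumulate, and reduceat to `add`. The fallback to NumPy when no GPU is usable is an assumption made so the snippet runs anywhere; the calls are the same in both libraries.

```python
# Assumption: use NumPy as a stand-in when CuPy or a GPU is unavailable.
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    import numpy as xp

x = xp.arange(8)                      # 0, 1, ..., 7

total = xp.add.reduce(x)              # sum of all elements
running = xp.add.accumulate(x)        # running (prefix) sums

# Sum each segment that starts at the given indices: x[0:4] and x[4:].
segments = xp.add.reduceat(x, xp.array([0, 4]))
```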

Use Thrust in cupyx.jit (#7054, #7139)

Now it is possible to use the Thrust library device functions in kernels written using CuPy JIT.

import cupy, cupyx

@cupyx.jit.rawkernel()
def sort_by_key(x, y):
    i = cupyx.jit.threadIdx.x
    x_array = x[i]
    y_array = y[i]
    cupyx.jit.thrust.sort_by_key(
        cupyx.jit.thrust.device,
        x_array.begin(),
        x_array.end(),
        y_array.begin(),
    )

h, w = (256, 256)
x = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(x)
x = x.reshape(h, w)
y = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(y)
y = y.reshape(h, w)
sort_by_key[1, 256](x, y)

Currently supported Thrust functions are count, copy, find, mismatch, sort, sort_by_key.

Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.

Changes without compatibility

Deprecates ndarray.scatter_{add,max,min} (#7097)

cupy.ndarray.scatter_{add,max,min} methods are marked as deprecated. Use the corresponding ufunc methods (cupy.{add,maximum,minimum}.at) instead.
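
The migration can be sketched as follows, with made-up data. Unlike plain fancy-index assignment, the `at` methods apply every update even when an index repeats. The NumPy fallback is an assumption for machines without a GPU.

```python
# Assumption: use NumPy as a stand-in when CuPy or a GPU is unavailable.
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    import numpy as xp

idx = xp.array([0, 1, 1, 3])                          # index 1 appears twice
vals = xp.array([1.0, 2.0, 3.0, 4.0], dtype=xp.float32)

# Replaces a.scatter_add(idx, vals): repeated indices accumulate.
a = xp.zeros(4, dtype=xp.float32)
xp.add.at(a, idx, vals)

# Replaces b.scatter_max(idx, vals): keeps the largest value per index.
b = xp.zeros(4, dtype=xp.float32)
xp.maximum.at(b, idx, vals)

# Replaces c.scatter_min(idx, vals): keeps the smallest value per index.
c = xp.full(4, 10.0, dtype=xp.float32)
xp.minimum.at(c, idx, vals)
```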

CUDA library wrappers now live in cupyx (#7013)

Previously, CuPy provided high-level wrappers for CUDA libraries as cupy.cudnn, cupy.cusolver, cupy.cusparse, and cupy.cutensor. These modules have been moved to cupyx as part of the cupy namespace cleanup. The old modules are still available but marked as deprecated. Note that these modules are still undocumented and may be subject to change.

Changes

New Features

  • Add axis to cupy.logspace (#6797)
  • Support thrust::count, device in CuPy JIT (#7054)
  • Add cupy.ndarray.searchsorted (#7059)
  • Support add.at, maximum.at, minimum.at (#7077)
  • Add pdist implementation to distance functions (#7078)
  • Support subtract.at, bitwise_and.at, bitwise_or.at, bitwise_xor.at (#7099)
  • Add ufunc.reduce and ufunc.accumulate (#7105)
  • Add cupy.add.reduceat (#7115)
  • Implement cupy.min_scalar_type (#7136)
  • JIT: Support more thrust functions (#7139)

Enhancements

  • Move cupy.cudnn cupy.cusolver cupy.cutensor cupy.cusparse to cupyx (#7013)
  • Allow randint to support array bounds (#7051)
  • Deprecate ndarray.scatter_{add, max, min} (#7097)
  • Support CUDA 11.8 H100 GPUs (#7100)
  • Support CUDA 11.8 (#7117)
  • Add CUDA 11.8 on documents (#7119)
  • Fix compile error from inf/nan in cupy.fuse (#7122)
  • Raise TypeError instead of ValueError in cupy.from_dlpack when CPU tensor is passed (#7133)
  • Support NCCL 2.15 (#7153)
  • Support Python 3.11 (#7159)
  • Fix indexing sparse matrix with empty index arguments (#7143)

Bug Fixes

  • Make sure that cupy (array-api) Array objects can be composed using asarray (#6874)
  • Don't use __del__ in TCPStore (#6989)
  • JIT: Fix compile error for op.routine including in0_type (#7076)
  • Fix cupy.nansum in fusing (#7102)
  • Fusion TypeError in cupy._core.fusion._call_ufunc() (#7113)
  • Fix a typo (#7163)
  • JIT: Fix compile error of minmax function (#7167)

Code Fixes

  • Remove _ufunc_method directory (#7116)
  • Add missing base type to cdef declarations (#7170)

Documentation

  • Docs: Add missing functions (#7103)
  • Docs: ufunc methods (#7104)
  • Improve benchmark documentation (#7176)

Installation

  • Bump version to v12.0.0b1 (#7181)

Tests

  • CI: Add ROCm 5.3 (#7124)
  • CI: Allow /test jenkins to trigger Jenkins only (#7126)
  • Install zlib for CUDA 11.8 Windows CI (#7137)
  • CI: improve use of cache in GitHub Actions (#7141)
  • Fix for pytest 7.2 (#7147)
  • CI: Add support for the latest FlexCI Windows image (#7161)
  • JIT: Skip HIP thrust::sort test (#7162)
  • CI: use pre-commit in GitHub Actions (#7123)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @Diwakar-Gupta @emcastillo @IncubatorShokuhou @kmaehashi @MarcoGorelli @takagi @TsutsuiMasayoshi

cupy - v12.0.0a2

Published by kmaehashi about 2 years ago

This is the release note of v12.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Increased cupyx.scipy APIs (#6773, #6990, #7014, #7015, #7036)

The coverage of SciPy interpolate & special APIs has increased. (Thanks @khushi-411 & @1MrEnot!)

Initial support for ufunc methods (#7049)

Starting from v12, CuPy will support the corresponding NumPy ufunc methods.
This release adds compatibility with ufunc.outer. Check the tracking issue (#7082) for detailed information.
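
For instance, ufunc.outer applies a binary ufunc to every pair of elements from two arrays. The sketch below builds a small multiplication table; the NumPy fallback is an assumption for machines without a GPU.

```python
# Assumption: use NumPy as a stand-in when CuPy or a GPU is unavailable.
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    import numpy as xp

a = xp.arange(1, 4)                   # [1, 2, 3]
b = xp.arange(1, 3)                   # [1, 2]

# table[i, j] == a[i] * b[j]; the result has shape a.shape + b.shape.
table = xp.multiply.outer(a, b)
```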

Changes

New Features

  • Add cupyx.scipy.special.logsumexp (#6773)
  • Add cupyx.scipy.interpolate.KroghInterpolator (#6990)
  • Add scipy.special.expi and scipy.special.exp1 (#7014)
  • Add cupy.byte_bounds (#7015)
  • Adds cupyx.scipy.special.k0, cupyx.scipy.special.k1, cupyx.scipy.special.k0e, cupyx.scipy.special.k1e (#7036)
  • Add ufunc.outer (#7049)
  • Expose pairwise distance functions (#7063)

Enhancements

  • Support NCCL 2.12 ~ 2.14 (#6534)
  • Support cuDNN 8.5 (#7008)
  • Fix cupy.apply_along_axis for tuple retval (#7068)
  • Add wrapper for cutensorPermutation (#7070)

Bug Fixes

  • Fix JIT for scalar argument (#6948)
  • Make sparse argmin/max return a scalar array containing the index (#6976)
  • Fix csrsm2 memory leak (#7039)
  • Make sure weibull distribution support ndarrays (#7048)
  • Fix bessel test to pass ROCm CI (#7081)

Code Fixes

  • Cosmetic change in _routine_indexing.pyx (#7053)

Documentation

  • Fixes docstring for interpolation prefiltering (#6998)
  • Typo fix (#7045)

Tests

  • CI: Create a status for FlexCI dashboard (#7024)
  • CI: Migrate to GAR from GCR (#7064)
  • CI: tentatively fix hypothesis version (#7072)

Others

  • Introduce pre-commit (#6987)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@1MrEnot @andfoy @asi1024 @betatim @khushi-411 @kmaehashi @leofang @maronuu @takagi @wyli

cupy - v11.2.0

Published by kmaehashi about 2 years ago

This is the release note of v11.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

  • Support NCCL 2.12 ~ 2.14 (#7069)
  • Support cuDNN 8.5 (#7071)

Bug Fixes

  • Fix csrsm2 memory leak (#7041)
  • Fix JIT for scalar argument (#7043)
  • Make sparse argmin/max return a scalar array containing the index (#7057)

Code Fixes

  • Cosmetic change in _routine_indexing.pyx (#7056)

Documentation

  • Fixes docstring for interpolation prefiltering (#7037)
  • Typo fix (#7047)

Installation

  • Remove use of distutils.utils (#7009)

Tests

  • CI: Create a status for FlexCI dashboard (#7034)
  • CI: Migrate to GAR from GCR (#7066)
  • CI: tentatively fix hypothesis version (#7073)

Others

  • Introduce pre-commit (#7067)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @asi1024 @betatim @kmaehashi @leofang @takagi @wyli

cupy - v12.0.0a1

Published by emcastillo about 2 years ago

This is the release note of v12.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Increased cupyx.scipy APIs (#6823, #6849, #6855, #6890, #6958, #6971)

The coverage of SciPy interpolate, stats & special APIs has increased. (Thanks @khushi-411 & @andoorve!)

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64

Changes

New Features

  • Add cupy.heaviside api. (#6798)
  • Add cupyx.scipy.special.log_softmax (#6823)
  • Add cupyx.scipy.stats.boxcox_llf (#6849)
  • Add cupyx.scipy.stats.{zmap, zscore} (#6855)
  • Add cupyx.scipy.special.softmax (#6890)
  • Add dtype, fweights, aweights to cupy.cov (#6892)
  • Add cupyx.scipy.interpolate.BarycentricInterpolator (#6958)
  • Add scipy.special.cosm1 to cupyx (#6971)

Enhancements

  • Enhance JIT error message when __device__ option is missing (#6837)
  • Fix augassign target is evaluated twice in JIT (#6844)
  • JIT: Add type annotation in _compile.py (#6859)
  • Add complex support for nanvar and nanstd (#6869)
  • Update cupy.array_api (#6871)
  • Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6872)
  • Add CC 8.7 for Jetson Orin (#6876)
  • Update cupy-wheel for v11 (#6903)
  • Support deg in cupy.angle (#6905)
  • Make sure that uniform sampling respects broadcasting (#6928)
  • Update cupy.array_api (cont'd) (#6932)
  • Support SciPy 1.9 (#6962)
  • Make testing decorators able to use with @pytest.mark.parametrize in some cases (#6984)
  • Relaxed C-contiguous requirement for changing dtype of different size (#6848)
  • Support keepdims parameter for average (#6852)
  • Support equal_nan parameter for unique (#6853)

Performance Improvements

  • Efficiency improvements in cupyx.scipy.ndimage utilities (#6953)

Bug Fixes

  • Generate CUBIN for all supported GPUs at build time (#6875)
  • Fix boxcox_llf (#6884)
  • Fix real and imag in subclass (#6896)
  • Fix cupy.clip to match numpy (#6920)
  • Let argpartition use the kth argument properly (#6921)
  • Fix cuTensorNet shim layer (#6934)
  • Fix occasional hang in sparse distributed (#6942)
  • Fix SciPy dependency leak (#6947)
  • Fix CUB reduction with zero-size arrays (#6960)

Code Fixes

  • Fix function names (#6877)
  • Remove proxy functions for softlink (#6879)
  • Suppress nvcc warning (#6954)

Documentation

  • Bump documentation build requirements (#6825)
  • Reverting to v10 installation instruction until v11 stable release (#6836)
  • Fix ROCm supported versions in compat matrix (#6846)
  • Generate docs for private classes in one location (#6857)
  • Expand breaking change & best practice on device management (#6883)
  • Update installation guide for v11 (aarch64) (#6888)
  • Update install instructions on README (#6889)
  • Document matmul supports out (#6898)
  • Fix docs build failure (#6955)

Installation

  • Reorganize build scripts: define compile options declaratively (#6911)
  • Parallelize Cythonize (#6975)
  • Remove use of distutils.utils (#7006)

Examples

  • Make matrix in CG example positive definite (#6939)

Tests

  • Update tags for FlexCI projects (#6814)
  • Add config for cupy.win.cuda117 (#6880)
  • Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6894)
  • Use ubuntu-22.04 as GitHub Actions runner image (#6988)
  • Revert comment fix (#6995)
  • Filter warnings from setuptools 65 (#7000)
  • CI: bump CUDA version used in cuda-python test (#7022)
  • CI: Add ROCm 5.1 and 5.2 (#6828)
  • CI: Show all errors when doc build fail (#6910)

Others

  • Bump version to v12.0.0a1 (#7027)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @andoorve @asi1024 @BasLaa @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @pri1311 @takagi @tom24d @toslunar @tpkessler
