cupy

NumPy & SciPy for GPU

MIT License

cupy - v7.0.0

Published by asi1024 almost 5 years ago

This is the release note of v7.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the differences from v7.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases (v7.0.0a1, v7.0.0b1, v7.0.0b2, v7.0.0b3, v7.0.0b4, and v7.0.0rc1).

See the Upgrade Guide if you are upgrading from previous versions. Also, note that support for Python 2.7 and 3.4 was dropped in CuPy v7.

Highlights

  • Added experimental support for cuTENSOR 1.0.0. cuTENSOR is a library for high-performance tensor operations and is available for CUDA GPUs with compute capability of 70+. See cuTENSOR examples for the details.

Changes without compatibility

  • Stopped raising some errors in CuPy linalg functions by default to improve performance. NumPy compatibility can be kept by setting cupyx.seterr(linalg=True), but doing so sometimes decreases performance because checking cuSOLVER devInfo and cuBLAS infoArray requires device synchronization.
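The error that is no longer raised by default can be illustrated with NumPy alone. The following is a minimal sketch, not CuPy's implementation: it only shows the NumPy behavior that the check preserves, and the comments restate the opt-in named in the note above. The sample matrices and the helper name `cholesky_raises` are illustrative assumptions.

```python
import numpy as np


def cholesky_raises(matrix):
    """Return True if numpy.linalg.cholesky rejects the matrix."""
    try:
        np.linalg.cholesky(matrix)
    except np.linalg.LinAlgError:
        return True
    return False


# Symmetric but not positive definite (eigenvalues are 3 and -1).
not_positive_definite = np.array([[1.0, 2.0], [2.0, 1.0]])
positive_definite = np.array([[2.0, 0.0], [0.0, 2.0]])

numpy_rejects_bad_input = cholesky_raises(not_positive_definite)
numpy_accepts_good_input = not cholesky_raises(positive_definite)

# NumPy detects the failure on the host and raises LinAlgError.
# Per the note above, CuPy v7 skips the equivalent cuSOLVER devInfo check
# by default to avoid a device synchronization; the note names
# cupyx.seterr(linalg=True) as the way to turn the check back on.
```

Code that relies on catching such errors should therefore enable the check explicitly, or validate inputs before calling the linalg routine.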

New Features

  • Add scipy.fft to cupyx (#2355, thanks @peterbell10!)
  • Support separate compilation in RawKernel (#2426, thanks @leofang!)
  • Introduce errstate configuration to control cuSOLVER devInfo and cuBLAS infoArray checks (#2437)
  • Inverse for Hermitian matrix (#2495)
  • Introduce errstate and related functions (#2535)
  • Implement tobytes for CuPy arrays (#2617, thanks @jakirkham!)
  • Add fromfile (#2626, thanks @jakirkham!)
  • Add using_allocator context manager (#2627)
  • Add cuDNN new batch normalization (#2651)
  • Support CUB reduction for F-contiguous arrays (#2682, thanks @leofang!)
  • Support cuTENSOR 1.0.0 (#2709)
  • Add searchsorted (#2726)

Enhancements

  • Support cuComplex.h in cupy.RawKernel and cupy.RawModule (#2551, thanks @leofang and @grlee77!)
  • Reduce compile warnings (#2556)
  • Normalize strides of cuDNN descriptors (#2564)
  • Display versions of CUDA libraries (#2578)
  • Improve ROCm error handling (#2639)
  • Add support of complex dtypes for sinc (#2646)
  • Support thrust with ROCm (#2666)
  • Improve ndarray.reduced_view (#2694)
  • Make set_allocator and get_allocator symmetric (#2707)
  • Remove cupy.cupyx (#2722)
  • Add a few missing stubs for ROCm/HIP build (#2737, thanks @leofang!)
  • Support Python 3.8 on Windows (#2738)
  • Implement ParameterInfo.__repr__ (#2747)

Performance Improvements

  • Refactor CUB to support an explicit axis argument; Fix alignments for Thrust's complex types (#2562, thanks @leofang!)
  • Add CUB support for argmax() and argmin() (#2596, thanks @leofang!)
  • Avoid __init__ function call overhead in memory allocation (#2671)
  • Avoid with overhead in memory allocation (#2672)
  • Avoid use of slow numpy.find_common_type (#2683, thanks @grlee77!)
  • Cache dtype object for speed in _scalar (#2684)
  • Cache ElementwiseKernel object (#2685)
  • Avoid threading.local() object overhead (#2687)
  • Add prod_sequence to avoid creating vector (#2689)
  • Remove memory allocation in set_contiguous_strides (#2690)
  • Avoid __init__ call when creating CArray object (#2691)
  • Reduce memory allocation in get_reduced_dims (#2692)
  • Improve _reduce_dims (#2693)
  • Improve small issues (#2695)
  • Improve broadcast (#2696)
  • CUB-based CSR sparse matrix vector multiply (#2698, thanks @grlee77!)
  • Avoid module level lookup in _dtype (#2700)
  • Add _ndarray_init to reduce ndarray creation cost (#2701)

Bug Fixes

  • Cache ElementwiseKernel kernel globally instead of per instance (#2474)
  • Use != instead of is not for literal (#2561, thanks @Dobatymo!)
  • Remove cuSPARSE APIs dropped in CUDA 10.1 Update 2 (#2573)
  • Support 0-sized arrays for linalg.qr (#2586, thanks @IvanYashchuk!)
  • Fix __cuda_array_interface__ data pointer for 0-size arrays (#2611, thanks @leofang!)
  • Fix ROCm build error (#2632)
  • Fix bugs in CUB (#2636, thanks @leofang!)
  • Avoid using __align__ in ROCm (#2638)
  • Remove stubs for APIs dropped in CUDA 10.1 Update 2 (#2641)
  • Do not allow reshape on empty arrays (#2648)
  • Fix pinv for complex datatypes (#2657, thanks @YoujinShin!)
  • Fix det and slogdet on singular inputs (#2660)
  • Handle tuple with value 0 and return empty array (#2662, thanks @quasiben!)
  • Fix AttributeError of stride_tricks (#2679)

Code Fixes

  • Remove redundant definitions in cupy_cufft.h (#2560, thanks @leofang!)
  • Type dumps return value as bytes (#2619, thanks @jakirkham!)
  • Remove std::map for simple implementation (#2670)
  • Improve reduction core (#2697)
  • Remove insignificant assertion (#2714)
  • Avoid tricky initialization of block stride (#2729)
  • Remove cupy/internal.py (#2739, thanks @leofang!)

Documentation

  • Add CUDA API runtime API list (#2557)
  • Document more environment variables (#2593, thanks @leofang!)
  • Update CODE_OF_CONDUCT typo (#2609)
  • Expand TOC to improve document index page (#2642)
  • Fix document format of as_strided (#2680)
  • Update requirements (#2756)

Installation

  • Package tests in sdist (#2563, thanks @jakirkham!)
  • Fix url to use the home page address (#2580)
  • Add software description to setup.py (#2582)
  • Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2592)
  • Fix invalid requirements (#2630)

Examples

  • Show better performance improvement in examples/stream/map_reduce.py (#2588, thanks @leofang!)

Tests

  • Add CI configuration for ROCm (#2408)
  • Add backward compatibility test for __cuda_array_interface__ (#2536, thanks @leofang!)
  • Include .git in ChainerCV compatibility CI (#2577)
  • Update testing.parameterize using the latest version from Chainer (#2633, thanks @grlee77!)
  • Add FlexCI configurations (#2649)
  • Add test for get_c_contiguity (#2686)
  • Skip tests that segfault when using SciPy 1.3.x (#2712, thanks @grlee77!)
  • Fix broken version specification in FlexCI dockerfiles (#2728)
cupy - v6.6.0

Published by toslunar almost 5 years ago

This is the release note of v6.6.0. See here for the complete list of solved issues and merged PRs.

Highlights

Python 3.8 is now officially supported and we are providing wheels for this version.

Enhancements

  • Support complex dtypes in cupy.where (#2615, thanks @AntoineDujardin!)
  • Update __cuda_array_interface__ to protocol version 2 (#2669, thanks @leofang!)
  • Remove cupy.cupyx (#2724)
  • Support Python 3.8 on Windows (#2751)

Bug Fixes

  • Make exceptions picklable (#2567)
  • Support 0-sized arrays for linalg.qr (#2602, thanks @IvanYashchuk!)
  • Do not allow reshape on empty arrays (#2652)
  • Cache ElementwiseKernel kernel globally instead of per instance (#2659)
  • Handle tuple with value 0 and return empty array (#2706, thanks @quasiben!)

Code Fixes

  • Remove insignificant assertion (#2719)

Documentation

  • Document more environment variables (#2612, thanks @leofang!)
  • Update CODE_OF_CONDUCT typo (#2628)
  • Fix document format of as_strided (#2713)
  • Update requirements (#2755)

Tests

  • Add missing FlexCI configurations (#2591)
  • Add FlexCI configurations (#2715)
  • Skip tests that segfault when using SciPy 1.3.x (#2721, thanks @grlee77!)
  • Fix broken version specification in FlexCI dockerfiles (#2736)

Installation

  • Package tests in sdist (#2572, thanks @jakirkham!)
  • Fix url to use the home page address (#2590)
  • Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2605)
  • Fix invalid requirements (#2640)
  • Add software description to setup.py (#2653)

Examples

  • Show better performance improvement in examples/stream/map_reduce.py (#2603, thanks @leofang!)

Others

  • Fix AttributeError of stride_tricks (#2705)
cupy - v6.5.0

Published by asi1024 almost 5 years ago

This is the release note of v6.5.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Ignore warning caused by fastrlock (#2501)
  • Fix cupy.repeat error message about repeats argument type (#2506)

Bug Fixes

  • Fix coosort (#2487, thanks @econtal!)

Documentation

  • Fix some typos (#2527, thanks @garanews!)
  • Fix dead links in NumPy docs in random functions (#2554)

Tests

  • Fix memory pool disabled during tests (#2502)
  • Move CI requirements to CuPy repository (#2543)
  • Fix for NumPy 1.14.x compatibility (#2546)
cupy - v7.0.0rc1

Published by beam2d almost 5 years ago

This is the release note of v7.0.0rc1. See here for the complete list of solved issues and merged PRs.

Announcements

This time, we will keep the current branches for active development (master for v7.x, v6 for v6.x) after the RC. We will maintain the v6.x series until the Python 2 EOL, so for now we will not cut a new development version, to avoid increasing the number of branches to maintain. For a while, new features will be included directly in v7, and maintenance changes will be backported to v6.

Highlights

  • Experimental support for AMD GPUs has been added (#1094). See the installation guide for how to install CuPy with AMD support. Note that this feature is still experimental, and we do not guarantee API stability.

Changes without compatibility

  • Avoid casting inputs to cupy.ndarray in cupy.pad (#2504)
    • From this release, cupy.pad no longer converts the input to cupy.ndarray automatically. This follows the design principle of not implicitly synchronizing the host and the device, which most of the other APIs also follow.
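In practice, code that passed lists or NumPy arrays to cupy.pad now has to convert them explicitly. A minimal sketch of that pattern follows. It assumes cupy.asarray for the conversion and falls back to NumPy when CuPy or a CUDA device is unavailable, so the snippet can run anywhere; the helper name `pad_explicit` is illustrative.

```python
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    import numpy as xp  # fallback so the sketch runs without a GPU


def pad_explicit(values, pad_width):
    # Convert on purpose: with CuPy this is the visible host-to-device copy
    # that cupy.pad no longer performs implicitly.
    array = xp.asarray(values)
    return xp.pad(array, pad_width, mode="constant")


padded = pad_explicit([1, 2, 3], 2)
```

Keeping the conversion in the caller makes the transfer explicit at the call site instead of hiding it inside the padding routine.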

New Features

  • Experimental support of AMD GPU via HIP (ROCm2.7.0+) (#1094)
  • Adds nvcc as a RawKernel backend (#1941, thanks @sjperkins and @leofang!)
  • Support cuTENSOR 0.2 (#2341)
  • Implement isin and in1d (#2388, thanks @UmashankarTriforce!)
  • Support scipy.ndimage compatible convolve and correlate (#2483)
  • Added cupy.cuda.memory.get_allocator interface (#2489)
  • Handle PCI bus ID (#2531, thanks @jameshclrk!)
  • Expand coverage of cuSolverSP APIs (#2539)
  • Add cuSPARSE routines for preconditioners (#2542)

Enhancements

  • Fix division by zero in mean/std/var functions for 0-length dimensions (#2201, thanks @pentschev!)
  • Improve error message in cupy.linalg.inv (#2342)
  • Replace cupy.pad with a heavily refactored version from NumPy 1.17 (#2399, thanks @grlee77!)
  • Fix cupy.repeat error message about repeats argument type (#2400)
  • Ignore warning caused by fastrlock (#2488)
  • Update __cuda_array_interface__ to protocol version 2 (#2491, thanks @leofang!)
  • Allow axis=None in concatenate (#2496, thanks @liwt31!)
  • Fix @testing.numpy_cupy_ decorators for skips (#2498)
  • Avoid implicit cast inputs to cupy.ndarray in cupy.pad (#2504)
  • Cholesky decomposition to support complex values (#2509)
  • Enhance shuffle-test of testing.for_dtypes_combination (#2511)
  • Allow to use real and imag on CUDA kernels (#2520)
  • Support complex numbers in cupy.linalg.qr() (#2526, thanks @leofang!)
  • Fix bug in CUB + Native support of complex numbers in CUB (#2538, thanks @leofang!)
  • Support cupyx.fallback_mode as an experimental feature (#2541)
  • Support stream in CUB (#2555, thanks @leofang!)

Performance Improvements

  • Performance improvement for cupy.var complex inputs (#2484)
  • Enable fast CUB-based reductions in more cases (cupy.linalg.norm, etc.) (#2517, thanks @grlee77!)

Code Fixes

  • Remove unnecessary check of cuSOLVER (#2529, thanks @grlee77!)

Documentation

  • Fix dead links in NumPy docs in random functions (#2384)
  • Update install_rocm.rst (#2512)
  • Fix some typos (#2523, thanks @garanews!)

Tests

  • Fix memory pool disabled during tests (#2452)
  • Skip bool-bool inputs in cupy.cross test (#2503)
  • Fix error in test_build.py (#2514, thanks @leofang!)
  • Move CI requirements to CuPy repository (#2533)
  • Fix for NumPy 1.14.x compatibility (#2544)
  • Workaround bug in NumPy 1.12.x or earlier (#2545)
cupy - v7.0.0b4

Published by hvy about 5 years ago

This is the release note of v7.0.0b4. See here for the complete list of solved issues and merged PRs.

New Features

  • Add cupy.cross (#2366, thanks @UmashankarTriforce!)
  • Add cupy.may_share_memory (#2417)
  • Add occupancy driver APIs (#2424, thanks @leofang!)
  • Add scatter_max and scatter_min (#2427)
  • Support texture memory in RawKernel (#2432, thanks @leofang!)

Enhancements

  • Extend CuDNNError to have more debugging information (#2404)
  • Add complex dtype support to cupy.std and cupy.var (#2411, thanks @grlee77!)
  • Support complex dtypes in cupy.linalg.inv() (#2468, thanks @leofang!)

Performance Improvements

  • Use CUB to speed up sum/min/max (#2090)
  • Avoid creating empty numpy ndarray in common_type (#2307)
  • cupy.linalg.norm: update docstring and improve performance for ord=2 cases (#2479, thanks @grlee77!)

Bug Fixes

  • Fix bug in coosort (#2410, thanks @econtal!)
  • Fix multithreading issue in cupy.cuda.cufft.get_current_plan() (#2435, thanks @leofang!)
  • Normalize hidden layer strides in cuDNN RNN (#2442)
  • Copy inputs in ufunc if they share the same memory with outputs (#2460)

Code Fixes

  • Fix cuSOLVER devInfo dtype in inv and unify how those are specified (#2454)
  • Fix code convention of statistics routines (#2459)
  • Remove unnecessary ndarray private methods (#2465)

Documentation

  • Document CUTENSOR_PATH environment variable (#2386)
  • Add appropriate pointer types in Cython to the contribute guide (#2455, thanks @leofang!)
  • Fix invalid escape sequence (#2470)

Installation

  • Fix NumPy version in Dockerfile (#2430)

Examples

  • Broadcast ValueError for n-clusters > 2 in k-means example (#2453, thanks @casheera!)

Tests

  • Simplify cupy.fuse tests (#2339)
  • Revert "Skip some ndarray-elementwise-op tests as temporary fix" (#2383)
  • Drop Python 2 Travis CI configuration (#2428)
  • Drop Python 2 PFN CI configuration (#2429)
  • Change URL to place test assets (#2434)
  • Add NumPy 1.17.1 to skip list of 0-length ifft test (#2441)
  • Remove unnecessary skip in 0-length ifft test (#2443)
  • Fix TestDLTensorMemory.test_delete (#2451)
  • Fix data race in advanced indexing test (#2472)
  • Fix pytest 5.x version errors (#2473)
  • Remove global state from tests (#2475, thanks @leofang!)
cupy - v6.4.0

Published by beam2d about 5 years ago

This is the release note of v6.4.0. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Support for shape argument in *_like functions (#2418, thanks @pentschev!)

Bug Fixes

  • Normalize hidden layer strides in cuDNN RNN (#2448)

Documentation

  • Add appropriate pointer types in Cython to the contribute guide (#2480, thanks @leofang!)

Installation

  • Fix NumPy version in Dockerfile (#2447)

Tests

  • Add NumPy 1.17.1 to skip list of 0-length ifft test (#2445)
  • Fix TestDLTensorMemory.test_delete (#2458)
  • Fix data race in advanced indexing test (#2482)
cupy - v7.0.0b3

Published by asi1024 about 5 years ago

This is the release note of v7.0.0b3. See here for the complete list of solved issues and merged PRs.

Highlights

cupy.RawModule has been introduced to allow users to access low-level features (CUDA modules).

Dropping Support of Python 2

Due to the end-of-life (EOL) of Python 2 in January 2020, Python 2 support has been dropped in this release. CuPy v6.x continues to support Python 2. See the blog post for details.

New Features

  • Add cupy.nanargmin and cupy.nanargmax (#2222, thanks @harshalchaudhari35!)
  • Support cuDNN Fused Ops API (#2246)
  • Add fallback_mode.ndarray (#2272, thanks @Piyush-555!)
  • Add NCCL broadcast (#2303)
  • Add take_along_axis (#2314)
  • Add cupy.nanmean (#2319, thanks @Piyush-555!)
  • Add more cuSolver APIs for dense linear solver (#2320)
  • Provide full coverage for NCCL APIs in CuPy (#2325, thanks @leofang!)
  • Add cupy.nanvar and cupy.nanstd (#2344, thanks @Piyush-555!)
  • Allow getting and setting CUDA kernel attributes (#2369, thanks @leofang, @grlee77 and @andravin!)
  • Add cupy.RawModule() (#2389, thanks @leofang!)
  • Add diagonal method for cupyx.scipy.sparse.dia_matrix (#2398, thanks @grlee77!)

Enhancements

  • Support for shape argument in *_like functions (#2171, thanks @pentschev!)
  • Support batched matrix inverse (#2300)
  • Add wrapper for numpy.vectorize in fallback_mode (#2350, thanks @Piyush-555!)
  • Allow cuFFT plans to be used as a context manager; Set stream before executing cuFFT plans (#2362, thanks @leofang!)
  • Raise DeprecationWarning on 0-dim arrays in numpy.nonzero to match NumPy 1.17 behavior (#2394)
  • Change check condition of accept_error (#2396)

Bug Fixes

  • Make exceptions picklable (#2318)
  • Make cupy.tensordot use Tensor Core also in case of compute-capability > 70 (#2328)

Code Fixes

  • Fix unused imports and import orders (#2312)
  • Fix variable names in tri kernel (#2326)
  • Some simplifications using isnan (#2364)
  • Small fix of fusion.pyx and docs (#2393, thanks @xuzijian629!)

Documentation

  • Add new functions to API reference (#2308)
  • Fix for linalg.svd documentation (#2321, thanks @IvanYashchuk!)
  • Fix markup in linalg.svd docs (#2323)
  • Fix doctest failure related to __array_function__ support (#2352)
  • Add 1.17 to supported NumPy versions (#2368)

Tests

  • Make wheel of master for CI (#2144)
  • Add ChainerCV's tests to pfnCI (#2186)
  • Fix tests of linalg.svd (#2338)
  • Check return type in test_type_routines.py (#2358)
  • Skip some FFT tests on NumPy 1.17.0 due to a NumPy bug (#2363, thanks @grlee77!)
  • Remove random.power test with forbidden value (#2375)
  • Skip some tests in TestArrayElementwiseOp as temporary fix (#2376)
  • Avoid using numpy.nonzero in _make_decorator (#2385)
  • Increase numbers of retries for K-S tests (#2397)

Others

  • Drop support for Python 2.7 and 3.4 (#2343)
cupy - v6.3.0

Published by beam2d about 5 years ago

This is the release note of v6.3.0. See here for the complete list of solved issues and merged PRs.

Highlights

  • NumPy 1.17 is now officially supported.

New Features

  • Add diagonal method for cupyx.scipy.sparse.dia_matrix (#2412, thanks @grlee77!)
  • Support allow_pickle in cupy.load and cupy.save (#2291)

Bug Fixes

  • Make cupy.tensordot use Tensor Core also in case of compute-capability > 70 (#2335)

Code Fixes

  • Fix unused imports and import orders (#2336)

Documentation

  • Add new functions to API reference (#2317)
  • Fix for linalg.svd documentation (#2322, thanks @IvanYashchuk!)
  • Fix markup in linalg.svd docs (#2327)
  • Fix doctest failure related to __array_function__ support (#2353)
  • Add 1.17 to supported NumPy versions (#2413)

Tests

  • Skip some ndarray-elementwise-op tests as temporary fix (#2378)
  • Remove random.power test with forbidden value (#2380)
  • Skip some FFT tests on NumPy 1.17.0 due to a NumPy bug (#2381, thanks @grlee77!)
  • Avoid using numpy.nonzero in _make_decorator (#2395)
  • Skip nonzero test for 0d arrays in NumPy 1.17 (#2401)
  • Increase numbers of retries for K-S tests (#2402)
cupy - v7.0.0b2

Published by hvy over 5 years ago

This is the release note of v7.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

  • cupy.cutensor, which wraps cuTENSOR, has been introduced to allow high-performance tensor operations. Examples are available here.

Changes without compatibility

  • cupy.load now specifies allow_pickle=False by default to follow the security fix made in NumPy 1.16.3 (see numpy/numpy #13359 and cupy/cupy #2290 for details). Most users should not be affected by this change; users loading ndarray serialized using pickle may need to explicitly specify allow_pickle=True.
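Because cupy.load follows numpy.load here, the effect can be sketched with NumPy (1.16.3 or later) and an in-memory buffer. This is an illustration only, under the assumption stated in the note that cupy.load takes the same allow_pickle keyword; the sample object array is made up.

```python
import io

import numpy as np

# An object-dtype array can only be stored via pickle.
pickled = np.array([{"a": 1}, None], dtype=object)
buffer = io.BytesIO()
np.save(buffer, pickled, allow_pickle=True)

# Default (allow_pickle=False): loading pickled data is refused.
buffer.seek(0)
try:
    np.load(buffer)
    refused_by_default = False
except ValueError:
    refused_by_default = True

# Explicit opt-in: only do this for files from a trusted source.
buffer.seek(0)
restored = np.load(buffer, allow_pickle=True)
```

Arrays with plain numeric dtypes are not stored with pickle and load the same way as before.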

New Features

  • Support cuTENSOR (#2210)
  • Add nansum and nanprod support (#2252, thanks @pentschev!)

Enhancements

  • Raise error when no available algorithm found by cudnnFindConvolution* (#2234)
  • Add NHWC layout support to batch normalization (#2235)
  • Remove unused code (#2255)
  • Support non-square matrices in lu_factor (#2286, thanks @econtal!)
  • Remove cupy.ndarray.{nansum/nanprod} (#2292)
  • Remove warning about nvcc absence (#2299)

Performance Improvements

  • Add support for merge path algorithm (csrmvEx) when csr_matrix multiply with a dense vector (#2287, thanks @wonghang!)

Bug Fixes

  • Add wrappers for can_cast, common_type and result_type functions (#2249, thanks @pentschev!)
  • Make __cuda_array_interface__()['strides'] be tuple (#2260, thanks @leofang!)
  • Avoid CUDNN_STATUS_BAD_PARAMS (#2261, thanks @himkt!)
  • Support allow_pickle in cupy.load and cupy.save (#2290)

Documentation

  • Update installation guide (#2184)
  • Document interoperability with mpi4py (#2270, thanks @leofang!)

Installation

  • Make nvcc code generation target configurable by env var (#2293)

Tests

  • Fix test failure when cuDNN is unavailable (#2284)
  • Enhance tests for type routines (#2306)
cupy - v6.2.0

Published by niboshi over 5 years ago

This is the release note of v6.2.0. See here for the complete list of solved issues and merged PRs.

New Features

  • Allow copying in the format cupy_array[:] = numpy_array (#2219, thanks @pentschev!)

Enhancements

  • Fix _preprocess_args to avoid calling hasattr (#2263)
  • Simplify _prepare_multiple_array_indexing (#2264)
  • Add constant modification to vector_equal (#2265)
  • Remove unused code (#2275)
  • Remove warning about nvcc absence (#2302)

Bug Fixes

  • Make __cuda_array_interface__()['strides'] be tuple (#2273, thanks @leofang!)
  • Avoid CUDNN_STATUS_BAD_PARAMS (#2285, thanks @himkt!)
  • Add wrappers for can_cast, common_type and result_type functions (#2304, thanks @pentschev!)

Documentation

  • Document interoperability with mpi4py (#2288, thanks @leofang!)
  • Update installation guide (#2297)

Tests

  • Enhance tests for type routines (#2311)
cupy - v7.0.0b1

Published by hvy over 5 years ago

This is the release note of v7.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

Host to device copy from NumPy ndarrays is now allowed as an experimental feature with the syntax cupy_array[:] = numpy_array. Set the environment variable CUPY_EXPERIMENTAL_SLICE_COPY to try it out.

Notes

  • Tensor Core in cuDNN convolution is tentatively disabled for Turing GPUs due to some test failures. The issue is under investigation and will hopefully be fixed in a future version.

New Features

  • Allow copying in the format cupy_array[:] = numpy_array (#2079, thanks @pentschev!)
  • Add linalg.lstsq (#2165, thanks @cjekel!)
  • Add fallback_mode (#2229, thanks @Piyush-555!)
  • Add CUDNN_POOLING_MAX_DETERMINISTIC (#2239)
  • Add API to retrieve installation info (#2245)

Enhancements

  • Improve atomicAdd in histogram and sample (#1345)
  • Set current device in cupy.ndarray.get()/set() (#2169, thanks @hyabe!)
  • Improve out-of-memory error message (#2242)
  • Simplify _prepare_multiple_array_indexing (#2254)
  • Fix _preprocess_args to avoid calling hasattr (#2256)
  • Add constant modification to vector_equal (#2257)

Performance Improvements

  • Optimize the initialization of List[ndarray] in cupy.array (#2081)

Bug Fixes

  • Revert "Fix usage check of Tensor Core" (#2197)
  • Fix power for large integrals (#2204)
  • Avoid division by zero in tensordot, allowing 0-length arrays (#2209, thanks @pentschev!)
  • Make RandomState.permutation compatible with random.permutation (#2250)

Code Fixes

  • Coding style fix (#2211)

Documentation

  • Add cupy-cuda101 to README (#2196)
  • Fix: duplicate object description of cupy (#2233)

Installation

  • Add setup option to copy include files in wheel (#2208)
  • Bump version to v7.0.0b1 (#2266)

Tests

  • Ignore invalid axis type test in NumPy 1.12.x or earlier (#2192)

Others

  • Fix array_split with non-equally dividing sections (#2207)
cupy - v6.1.0

Published by niboshi over 5 years ago

This is the release note of v6.1.0. See here for the complete list of solved issues and merged PRs.

Notes

  • Tensor Core in cuDNN convolution is tentatively disabled for Turing GPUs due to some test failures. The issue is under investigation and will hopefully be fixed in a future version.

Enhancements

  • Improve atomicAdd in histogram and sample (#2217)

Bug Fixes

  • Revert "Fix usage check of Tensor Core" (#2198)
  • Fix array_split with non-equally dividing sections (#2214)
  • Avoid division by zero in tensordot, allowing 0-length arrays (#2231, thanks @pentschev!)
  • Make RandomState.permutation compatible with random.permutation (#2253)

Documentation

  • Add cupy-cuda101 to README (#2203)
  • Fix duplicate object description of cupy (#2240)

Installation

  • Add setup option to copy include files in wheel (#2232)

Tests

  • Ignore invalid axis type test in NumPy 1.12.x or earlier (#2200)
cupy - v7.0.0a1

Published by kmaehashi over 5 years ago

This is the release note of v7.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

  • The CuPy memory pool now supports setting a hard limit on the amount of GPU memory allocated. Refer to the reference for the details.

New Features

  • Support cuDNN CTC functions (#1769, thanks @aonotas!)
  • Support NHWC format in convolution (#1885)
  • Add hostRegister and hostUnregister (#2102)
  • Implement limit to memory pool (#2113)
  • Add strides_check option in array testing functions (#2150)

Enhancements

  • Fix ascontiguousarray with 0-dim array input (#2078)
  • Emit kernel names with type names (#2151)
  • Support complex dtypes in cupy.where (#2175, thanks @AntoineDujardin!)

Bug Fixes

  • Fix __cuda_array_interface__ data pointer for sliced arrays (#2129, thanks @pentschev!)
  • Fix usage check of Tensor Core (#2168)
  • Avoid using Tensor Core with cuDNN deterministic mode in convolution backward (#2174)

Code Fixes

  • Avoid unnecessary weak pointer for null stream (#1539)
  • Avoid PyThread in stream.pyx (#1945)
  • Add a comment to the testing condition of einsum (#2131)
  • Fix style (#2167)

Documentation

  • Add upgrade guide for v6 (#2182)

Installation

  • Fix compile error on CUDA 10.1 and GCC 7 or 8 (#2147, thanks @grafi-tt!)

Examples

  • Make k-means example's custom kernels simpler (#2145)
  • Make k-means sample code cleaner for educational purpose (#2146)

Tests

  • Fix testing condition of diff and unwrap (#2124)
  • Add a test for assert_array_equal(strides_check=True) (#2156)
  • Fix test failure when cudnn is unavailable (#2161)
cupy - v6.0.0

Published by beam2d over 5 years ago

This is the release note of v6.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the differences from v6.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases.

See the Upgrade Guide if you are upgrading from previous versions.

Bug Fixes

  • Fix __cuda_array_interface__ data pointer for sliced arrays (#2134, thanks @pentschev!)
  • Fix usage check of Tensor Core (#2172)
  • Avoid using Tensor Core with cuDNN deterministic mode in convolution backward (#2176)

Code Fixes

  • Avoid unnecessary weak pointer for null stream (#2154)

Documentation

  • Add upgrade guide for v6 (#2187)

Installation

  • Fix compile error on CUDA 10.1 and GCC 7 or 8 (#2160, thanks @grafi-tt!)

Tests

  • Fix testing condition of diff and unwrap (#2142)
cupy - v5.4.0

Published by niboshi over 5 years ago

This is the release note of v5.4.0. This is the final release of v5.x series. See here for the complete list of solved issues and merged PRs.

Highlights

  • CUDA 10.1 and cuDNN 7.5 are now supported. CuPy also starts to compile for compute capability 7.5 for Turing GPUs.

Enhancements

  • Avoid using pytest attributes during import (#2057)
  • Fix fp16 issue in batch normalization (#2094, thanks @anaruse!)
  • Keep backward compatibility on cupy.cudnn.batch_normalization_forward_training (#2094)
  • Support CUDA 10.1 + cuDNN 7.5 + Turing (#2123)

Bug Fixes

  • Ensure that sparse matrix shapes are always a tuple of int (#2082, thanks @grlee77!)
  • Fix assigning from complex to float (#2092)
  • Check array contiguity in copy (#2093)
  • Fix assertion error in _Chunk.split (#2112, thanks @liwt31!)
  • Support dltensor with strides of NULL (#2119, thanks @crcrpar!)
  • Avoid sharing handles between threads (#2122)

Code Fixes

  • Remove unused variable (#2108, thanks @crcrpar!)

Documentation

  • Add NCCL v2.4 support to docs (#2089)
  • Add strides to docstring of ndarray (#2109, thanks @crcrpar!)

Tests

  • Do not fail by warnings when building docs (#2086)
cupy - v6.0.0rc1

Published by beam2d over 5 years ago

This is the release note of v6.0.0rc1. See here for the complete list of solved issues and merged PRs.

Highlights

  • CUDA 10.1 and cuDNN 7.5 are now supported. CuPy also starts to compile for compute capability 7.5 for Turing GPUs.
  • After this release, the master branch switches to development of the v7 series. Development of v6.0.0 will continue on the v6 branch.

New Features

  • New RNN API introduced in cuDNN v7.2 (#1609)
  • Add diff and unwrap (#1933, thanks @a2kiti!)
  • Support fusion feature of copyto method (#1983)
  • Add lu_factor and lu_solve to cupyx.scipy.linalg (#2051, thanks @msakai!)

Enhancements

  • More support __cuda_array_interface__ (#2058)
  • Fix fp16 issue in batch normalization (#2060)
  • Keep backward compatibility on cupy.cudnn.batch_normalization_forward_training (#2072)
  • Check if gc module is still available (#2116)
  • Support CUDA 10.1 + cuDNN 7.5 + Turing (#2117)

Performance Improvements

  • Do exact type comparison instead of isinstance for numpy.dtype (#2016)
  • Improve _routines_manipulation (#2038)

Bug Fixes

  • Fix assigning from complex to float (#1911)
  • Ensure that sparse matrix shapes are always a tuple of int (#1943, thanks @grlee77!)
  • Avoid sharing handles between threads (#2053)
  • Check array contiguity in copy (#2075)
  • Support dltensor with strides of NULL (#2097, thanks @crcrpar!)
  • Fix assertion error in _Chunk.split (#2103, thanks @liwt31!)

Code Fixes

  • Use single quote (#2049)
  • Use assert in helper.py instead of self.assertXXX (#2077)
  • Remove unused variable (#2105, thanks @crcrpar!)
  • Reorganize import in device.pyx (#2121)

Documentation

  • Document __array_function__ (#1979)
  • Add NCCL v2.4 support to docs (#2065)
  • Add strides to docstring of ndarray (#2096, thanks @crcrpar!)

Installation

  • Use deep copy in setting up RPATH (#2073)

Tests

  • Do not fail by warnings when building docs (#2080)
  • Test linalg.cholesky with more stable inputs (#2084)
cupy - v6.0.0b3

Published by beam2d over 5 years ago

This is the release note of v6.0.0b3. See here for the complete list of solved issues and merged PRs.

New Features

  • Implement cupy.put and cupy.place (#1787, thanks @grafi-tt!)
  • Add plan argument to FFT functions in cupyx.scipy.fftpack (#1942 #2033, thanks @leofang!)
  • Add out argument to ndarray.get for asynchronous device-to-host copy (#1970, thanks @jeng1220!)
  • Add NCCL 2.4 functions (#1992)

Enhancements

  • Remove experimental warning of cupy.fuse (#1379)
  • Support comparing complex-number arrays in cupy.allclose (#1947, thanks @leofang!)
  • Use version interface in NCCL 2.3.4 (#1985)
  • Enhance error messages of elementwise operation in fusion mode (#2007)
  • Fix vector handling (#2008)
  • Support cuDNN FP16 batch normalization (#2034)
  • Avoid import numpy (#2040)
  • Avoid using pytest attributes during import (#2055)
  • Support Python scalars in iscomplexobj (#1991)

Performance Improvements

  • Improve performance of reduction on outer axis (#2010, thanks @grafi-tt!)
  • Reduce vector copy in ndarray initialization (#2015)
  • Improve kind score operation (#2017)
  • Improve dictionary operation (#2018)
  • Improve get_ufunc_kernel (#2019)
  • Reduce get_device_id call (#2021)
  • Improve performance of broadcasting in elementwise kernels (#2022)
  • Improve _is_fusing performance (#2023)

Bug Fixes

  • Fix cupy.random.randint failing with size zero (#1967)
  • Call free_all_blocks to free CUDA memory (#1984)
  • Define __dealloc__ instead of __del__ for cdef-classes to fix memory leak (#1995, thanks @msakai!)
  • Fix NCCL version assignment for NCCL < 2.3.4 (#2009)
  • Fix __array_function__ bug (#2024)
  • Fix usage array function for CuPy modules (#2026, thanks @pentschev!)
  • Fix random generator seed type (#2036)

Code Fixes

  • Remove duplicate declaration of cudaMalloc (#2028, thanks @grlee77!)
  • Fix incorrect dtype for ipiv buffer in cupy.linalg.inv (#2043, thanks @msakai!)

Documentation

  • Add documentation of cupy.fuse (#1789)
  • Fix README image to use URL (#1977)
  • Add cupy-cuda100 to README (#1978)
  • Document NumPy 1.16 support (#1986)
  • Document new features in cupy.fft and cupyx.scipy.fftpack (#2035, thanks @leofang!)
  • Fix cupyx.scipy.get_array_module's docstring (#2050, thanks @msakai!)
  • Make plan argument in 2D/3D FFT functions experimental (#2056)

Installation

  • Eliminate RUNPATH to use correct cuDNN (#1770)
  • Fix eliminate-runpath (#2014)
  • Fix RPATH not set correctly (#2064)

Examples

  • Improve memcpy example (#1999, thanks @grafi-tt!)

Tests

  • Fix test of assert_array_list_equal (#1997)
  • Fix test with NumPy 1.16.1 (#1998)
  • Do not "accept error" if bad kwarg is passed to test itself (#2001)

cupy - v5.3.0

Published by hvy over 5 years ago

This is the release note of v5.3.0. See here for the complete list of solved issues and merged PRs.

New Features

  • Add NCCL 2.4 functions (#2052)

Enhancements

  • Use version interface in NCCL 2.3.4 (#2006)
  • Fix vector handling (#2013)
  • Avoid import numpy (#2054)
  • Support Python scalars in iscomplexobj (#1993)

Performance Improvements

  • Improve dictionary operation (#2039)

Bug Fixes

  • Fix cupy.random.randint failing with size zero (#1981)
  • Call free_all_blocks to free CUDA memory (#1987)
  • Define __dealloc__ instead of __del__ for cdef-classes to fix memory leak (#2000)
  • Fix NCCL version assignment for NCCL < 2.3.4 (#2012)
  • Fix random generator seed type (#2041)

Code Fixes

  • Remove duplicate declaration of cudaMalloc (#2037)

Documentation

  • Add cupy-cuda100 to README (#1980)
  • Fix README image to use URL (#1982)
  • Document NumPy 1.16 support (#1996)

Tests

  • Fix test to support SciPy 1.2 (#1971)
  • Fix test of assert_array_list_equal (#2003)
  • Fix test with NumPy 1.16.1 (#2004)

cupy - v6.0.0b2

Published by hvy over 5 years ago

This is the release note of v6.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

  • Compatibility with NumPy 1.16 and support for the __array_function__ interface allowing CuPy arrays to be passed to NumPy functions.
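
The dispatch mechanism behind the __array_function__ interface can be sketched without a GPU. The example below uses a hypothetical LoggedArray class (not part of CuPy or NumPy) that implements the protocol; CuPy arrays take part in the same way, so that a call such as numpy.sum on a CuPy array is handed over to CuPy. It assumes NumPy 1.17 or later, where the protocol is enabled by default (NumPy 1.16 required the NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 environment variable).

```python
import numpy as np


class LoggedArray:
    """Hypothetical wrapper used only to illustrate the protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []  # names of NumPy functions routed to this object

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of running its own implementation.
        self.calls.append(func.__name__)
        # Unwrap our objects and delegate to the plain NumPy function.
        unwrapped = [x.data if isinstance(x, LoggedArray) else x for x in args]
        return func(*unwrapped, **kwargs)


x = LoggedArray([1, 2, 3])
total = np.sum(x)      # dispatched through LoggedArray.__array_function__
print(total, x.calls)  # 6 ['sum']
```

A real array library would run its own implementation of the function inside the hook rather than delegating back to NumPy.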

New Features

  • Implement __array_function__ interface (#1650)
  • Implement SciPy-compatible FFT functions (#1745)
  • Add PlanNd class for faster 2D and 3D FFTs (#1746, thanks @grlee77!)
  • Add order argument to empty_like, zeros_like, etc. (#1819, thanks @grlee77!)
  • Add cupyx.scipy.sparse.diags (#1840, thanks @grlee77!)
  • Add the order kwarg to cupy.reshape and the underlying ndarray method (#1843, thanks @grlee77!)
  • Support order kwarg in asarray, asanyarray, ndarray.get, tonumpy (#1845, thanks @grlee77!)
  • cumsum, cumprod: support array-like input (#1847, thanks @grlee77!)
  • Add sparse option to meshgrid (#1848, thanks @grlee77!)
  • Add attributes property to Device (#1869, thanks @grlee77!)
  • Support new APIs in cuDNN v7.4.1 (#1884)
  • Add as_strided (#1897, thanks @fujiisoup!)
  • Add fnc and forc to core.flags.Flags (#1898, thanks @grlee77!)
  • CuPy external memory pool with memory pool and function pointers (#1904)
  • Allow multi-axis roll as in NumPy 1.12+ (#1818, thanks @grlee77!)
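
Several of the features above mirror existing NumPy keyword arguments, so they can be tried as in the sketch below. It assumes the NumPy-compatible signatures for the order argument, the sparse option of meshgrid, and multi-axis roll; the NumPy fallback is an addition for this example so that it also runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    xp = cp
except Exception:
    xp = np  # assumption for this sketch: fall back to NumPy on CPU

a = xp.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

# order argument: allocate with Fortran (column-major) layout.
z = xp.zeros_like(a, order='F')
print(z.flags.f_contiguous)  # True

# order kwarg in reshape: read and write elements in Fortran order.
f = xp.reshape(a, (3, 2), order='F')
print(f.tolist())  # [[0, 4], [3, 2], [1, 5]]

# sparse meshgrid: returns broadcastable coordinate arrays.
gx, gy = xp.meshgrid(xp.arange(3), xp.arange(2), sparse=True)
print(gx.shape, gy.shape)  # (1, 3) (2, 1)

# multi-axis roll: shift by 1 along both axes in a single call.
r = xp.roll(a, (1, 1), axis=(0, 1))
print(r.tolist())  # [[5, 3, 4], [2, 0, 1]]
```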

Enhancements

  • Add cupy.util.PerformanceWarning (#1607)
  • Support NumPy arrays as seeds in RandomState (#1689, thanks @mrocklin!)
  • Better support cuda-gdb (#1773)
  • Allow cupy.fuse to take any parameters when it fails to fuse (#1817)
  • Verbose broadcast failure message (#1836)
  • Bundle CUDA fp16 headers (#1837)
  • Refactor array creation for RNN API (#1842)
  • Support for documenting NCCL interfaces (#1857)
  • Use item instead of asscalar to support NumPy 1.16 (#1880)
  • Support NumPy 1.16 (#1881)
  • Allow passing None as parameters of Fusion.__call__ (#1965)

Performance Improvements

  • Improve cudnn.py performance (#1375)
  • Improve matmul performance (x3 faster) (#1547)
  • Improve matmul (small changes) (#1894)
  • Improve Fusion.__call__ (#1832)

Bug Fixes

  • Fix backward of batch normalization (#1822)
  • Fix RecursionError bug in conj method of sparse matrix classes (#1846, thanks @grlee77!)
  • Allow None as cx argument in cuDNN RNN functions (#1862)
  • Fix ndarray initialization bug (#1907)
  • Set size attribute of externally allocated memory (#1926)
  • Fix ndim not using cdef (#1935)
  • Ensure the input to the CUFFT Plan1D class is C contiguous (#1944, thanks @grlee77!)
  • Fix wrong variable name in cudnn.pyx (#1966)

Documentation

  • Document Numba CUDA array conversion (#1786)
  • Align with the latest NumPy docs (#1804)
  • Update docs for newly supported libraries (#1854)
  • Fix bad hyperlink in cupy.around documentation (#1856)
  • Add memory management documentation (#1871)
  • Add diags to documentation (#1872)
  • Fix stylecheck installation in contribution guide. (#1873, thanks @crcrpar!)
  • Avoid using _ufunc_wrapper and _reduction_wrapper (#1888)
  • Rename Code of Conduct filename (#1900)

Tests

  • Add tests for complex min and max (#1828)
  • Fix randint high value in test for Windows (#1892)
  • Fix test to support SciPy 1.2 (#1968)

Code Fixes

  • Refactor manipulation routines from core.pyx (#1620)
  • Use language level 3 in cythonize (#1792)
  • Change argument name of core.create_comparison (#1829)
  • Refactor cyclic imports (#1905)
  • Refactor math and indexing routines from core.pyx (#1949)
  • Clean up internal.pyx (prod) (#1951)
  • Fix int types (#1952)
  • Rename External* to CFunction* (#1958)
  • Refactor logic, sorting and statistics routines from core.pyx (#1959)
  • Use LF instead of CR+LF (#1909)

Others

  • Ignore W503 and W504 (#1953)

cupy - v5.2.0

Published by mitmul over 5 years ago

This is the release note of v5.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

  • CuPy now runs without CUDA development headers if you are using CUDA 9.2 or 10.0.
  • Improved compatibility with NumPy 1.16.

New Features

  • Add cupyx.scipy.sparse.diags (#1865, thanks @grlee77!)
  • Support new APIs in cuDNN v7.4.1 (#1893)

Enhancements

  • Bundle CUDA fp16 headers (#1858)
  • Support for documenting NCCL interfaces (#1868)
  • Verbose broadcast failure message (#1875)
  • Support NumPy arrays as seeds in RandomState (#1882, thanks @mrocklin!)
  • Use item instead of asscalar to support NumPy 1.16 (#1895)
  • Use LF instead of CR+LF (#1920)
  • Support NumPy 1.16 (#1923)

Performance Improvements

  • Fix ufunc performance degradation (#1929)

Bug Fixes

  • Fix backward of batch normalization (#1860)
  • Check if cx is None (#1863)
  • Fix RecursionError bug in conj method of sparse matrix classes (#1922, thanks @grlee77!)
  • Fix ndarray initialization bug (#1927)
  • Fix ndim not using cdef (#1939)
  • Ensure the input to the CUFFT Plan1D class is C contiguous (#1964, thanks @grlee77!)

Code Fixes

  • Clean up internal.pyx (prod) (#1960)

Documentation

  • Update docs for newly supported libraries (#1855)
  • Fix stylecheck installation in contribution guide. (#1876, thanks @crcrpar!)
  • Add memory management documentation (#1877)
  • Align with the latest NumPy docs (#1878)
  • Rename Code of Conduct filename (#1963)

Tests

  • Add tests for complex min and max (#1859)
  • Fix randint high value in test for Windows (#1902)

Others

  • Ignore W503 and W504 (#1957)