
cupy - v7.0.0

Published by asi1024 almost 5 years ago

This is the release note of v7.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v7.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

See the Upgrade Guide if you are upgrading from previous versions. Also note that we dropped support for Python 2.7 and 3.4 in CuPy v7.

Highlights

Added experimental support for cuTENSOR 1.0.0. cuTENSOR is a library for high-performance tensor operations and is available for CUDA GPUs with compute capability 7.0 or higher. See the cuTENSOR examples for details.

Changes without compatibility

Stopped raising some errors in CuPy linalg functions by default to improve performance. NumPy-compatible error checking can be restored with cupyx.seterr(linalg='raise'), but this may decrease performance, because checking cuSOLVER devInfo and cuBLAS infoArray requires device synchronization.
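For reference, here is a NumPy-only sketch of the behavior that the opt-in check restores: `numpy.linalg.inv` always raises `LinAlgError` for a singular matrix. The helper `inv_or_none` is a hypothetical name used only for illustration. The commented CuPy lines are an assumption: they use `cupyx.errstate` (from #2535) with `linalg='raise'`. They need a CUDA GPU, so they are not executed here.

```python
import numpy as np


def inv_or_none(a):
    """Return the inverse of `a`, or None if NumPy reports it as singular."""
    try:
        return np.linalg.inv(a)
    except np.linalg.LinAlgError:
        # NumPy always checks the LAPACK status code and raises here.
        return None


singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is 2 * first row
regular = np.array([[2.0, 0.0], [0.0, 4.0]])

singular_result = inv_or_none(singular)  # None: NumPy raised LinAlgError
regular_result = inv_or_none(regular)    # diagonal inverse, entries 0.5 and 0.25

# Assumed CuPy equivalent (requires a CUDA GPU; not run here):
#   import cupy, cupyx
#   with cupyx.errstate(linalg='raise'):
#       cupy.linalg.inv(cupy.asarray(singular))  # expected to raise LinAlgError
```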

New Features

Add scipy.fft to cupyx (#2355, thanks @peterbell10!)
Support separate compilation in RawKernel (#2426, thanks @leofang!)
Introduce errstate configuration to control cuSOLVER devInfo and cuBLAS infoArray checks (#2437)
Inverse for Hermitian matrix (#2495)
Introduce errstate and related functions (#2535)
Implement tobytes for CuPy arrays (#2617, thanks @jakirkham!)
Add fromfile (#2626, thanks @jakirkham!)
Add using_allocator context manager (#2627)
Add cuDNN new batch normalization (#2651)
Support CUB reduction for F-contiguous arrays (#2682, thanks @leofang!)
Support cuTENSOR 1.0.0 (#2709)
Add searchsorted (#2726)

Enhancements

Support cuComplex.h in cupy.RawKernel and cupy.RawModule (#2551, thanks @leofang and @grlee77!)
Reduce compile warnings (#2556)
Normalize strides of cuDNN descriptors (#2564)
Display versions of CUDA libraries (#2578)
Improve ROCm error handling (#2639)
Add support of complex dtypes for sinc (#2646)
Support thrust with ROCm (#2666)
Improve ndarray.reduced_view (#2694)
Make set_allocator and get_allocator symmetric (#2707)
Remove cupy.cupyx (#2722)
Add a few missing stubs for ROCm/HIP build (#2737, thanks @leofang!)
Support Python 3.8 on Windows (#2738)
Implement ParameterInfo.__repr__ (#2747)

Performance Improvements

Refactor CUB to support an explicit axis argument; Fix alignments for Thrust's complex types (#2562, thanks @leofang!)
Add CUB support for argmax() and argmin() (#2596, thanks @leofang!)
Avoid __init__ function call overhead in memory allocation (#2671)
Avoid with overhead in memory allocation (#2672)
Avoid use of slow numpy.find_common_type (#2683, thanks @grlee77!)
Cache dtype object for speed in _scalar (#2684)
Cache ElementwiseKernel object (#2685)
Avoid threading.local() object overhead (#2687)
Add prod_sequence to avoid creating vector (#2689)
Remove memory allocation in set_contiguous_strides (#2690)
Avoid __init__ call when creating CArray object (#2691)
Reduce memory allocation in get_reduced_dims (#2692)
Improve _reduce_dims (#2693)
Improve small issues (#2695)
Improve broadcast (#2696)
CUB-based CSR sparse matrix vector multiply (#2698, thanks @grlee77!)
Avoid module level lookup in _dtype (#2700)
Add _ndarray_init to reduce ndarray creation cost (#2701)

Bug Fixes

Cache ElementwiseKernel kernel globally instead of per instance (#2474)
Use != instead of is not for literal (#2561, thanks @Dobatymo!)
Remove cuSPARSE APIs dropped in CUDA 10.1 Update 2 (#2573)
Support 0-sized arrays for linalg.qr (#2586, thanks @IvanYashchuk!)
Fix __cuda_array_interface__ data pointer for 0-size arrays (#2611, thanks @leofang!)
Fix ROCm build error (#2632)
Fix bugs in CUB (#2636, thanks @leofang!)
Avoid using __align__ in ROCm (#2638)
Remove stubs for APIs dropped in CUDA 10.1 Update 2 (#2641)
Do not allow reshape on empty arrays (#2648)
Fix pinv for complex datatypes (#2657, thanks @YoujinShin!)
Fix det and slogdet on singular inputs (#2660)
Handle tuple with value 0 and return empty array (#2662, thanks @quasiben!)
Fix AttributeError of stride_tricks (#2679)

Code Fixes

Remove redundant definitions in cupy_cufft.h (#2560, thanks @leofang!)
Type dumps return value as bytes (#2619, thanks @jakirkham!)
Remove std::map for simple implementation (#2670)
Improve reduction core (#2697)
Remove insignificant assertion (#2714)
Avoid tricky initialization of block stride (#2729)
Remove cupy/internal.py (#2739, thanks @leofang!)

Documentation

Add CUDA API runtime API list (#2557)
Document more environment variables (#2593, thanks @leofang!)
Update CODE_OF_CONDUCT typo (#2609)
Expand TOC to improve document index page (#2642)
Fix document format of as_strided (#2680)
Update requirements (#2756)

Installation

Package tests in sdist (#2563, thanks @jakirkham!)
Fix url to use the home page address (#2580)
Add software description to setup.py (#2582)
Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2592)
Fix invalid requirements (#2630)

Examples

Show better performance improvement in examples/stream/map_reduce.py (#2588, thanks @leofang!)

Tests

Add CI configuration for ROCm (#2408)
Add backward compatibility test for __cuda_array_interface__ (#2536, thanks @leofang!)
Include .git in ChainerCV compatibility CI (#2577)
Update testing.parameterize using the latest version from Chainer (#2633, thanks @grlee77!)
Add FlexCI configurations (#2649)
Add test for get_c_contiguity (#2686)
Skip tests that segfault when using SciPy 1.3.x (#2712, thanks @grlee77!)
Fix broken version specification in FlexCI dockerfiles (#2728)

cupy - v6.6.0

Published by toslunar almost 5 years ago

This is the release note of v6.6.0. See here for the complete list of solved issues and merged PRs.

Highlights

Python 3.8 is now officially supported and we are providing wheels for this version.

Enhancements

Support complex dtypes in cupy.where (#2615, thanks @AntoineDujardin!)
Update __cuda_array_interface__ to protocol version 2 (#2669, thanks @leofang!)
Remove cupy.cupyx (#2724)
Support Python 3.8 on Windows (#2751)

Bug Fixes

Make exceptions picklable (#2567)
Support 0-sized arrays for linalg.qr (#2602, thanks @IvanYashchuk)
Do not allow reshape on empty arrays (#2652)
Cache ElementwiseKernel kernel globally instead of per instance (#2659)
Handle tuple with value 0 and return empty array (#2706, thanks @quasiben!)

Code Fixes

Remove insignificant assertion (#2719)

Documentation

Document more environment variables (#2612, thanks @leofang!)
Update CODE_OF_CONDUCT typo (#2628)
Fix document format of as_strided (#2713)
Update requirements (#2755)

Tests

Add missing FlexCI configurations (#2591)
Add FlexCI configurations (#2715)
Skip tests that segfault when using SciPy 1.3.x (#2721, thanks @grlee77!)
Fix broken version specification in FlexCI dockerfiles (#2736)

Installation

Package tests in sdist (#2572, thanks @jakirkham!)
Fix url to use the home page address (#2590)
Import CUDA headers from CUDA 10.1 Update 2 (10.1.243) (#2605)
Fix invalid requirements (#2640)
Add software description to setup.py (#2653)

Examples

Show better performance improvement in examples/stream/map_reduce.py (#2603, thanks @leofang!)

Others

Fix AttributeError of stride_tricks (#2705)

cupy - v6.5.0

Published by asi1024 almost 5 years ago

This is the release note of v6.5.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Ignore warning caused by fastrlock (#2501)
Fix cupy.repeat error message about repeats argument type (#2506)

Bug Fixes

Fix coosort (#2487, thanks @econtal!)

Documentation

Fix some typos (#2527, thanks @garanews!)
Fix dead links in NumPy docs in random functions (#2554)

Tests

Fix memory pool disabled during tests (#2502)
Move CI requirements to CuPy repository (#2543)
Fix for NumPy 1.14.x compatibility (#2546)

cupy - v7.0.0rc1

Published by beam2d almost 5 years ago

This is the release note of v7.0.0rc1. See here for the complete list of solved issues and merged PRs.

Announcements

This time, we will keep the current branches for active development (master for v7.x, v6 for v6.x) after the RC. We will maintain the v6.x series until the Python 2 EOL, so we are not cutting a new development branch for now, to avoid increasing the number of branches to maintain. New features will go directly into v7 for a while, and maintenance changes will be backported to v6.

Highlights

Experimental support for AMD GPUs has been added (#1094). See the installation guide for how to install CuPy with AMD support. Note that this feature is still experimental, and we do not guarantee API stability.

Changes without compatibility

Avoid casting inputs to cupy.ndarray in cupy.pad (#2504)
- From this release, cupy.pad no longer converts the input to cupy.ndarray automatically. This follows the design principle of not implicitly synchronizing the host and the device, which most other APIs also follow.

New Features

Experimental support of AMD GPU via HIP (ROCm2.7.0+) (#1094)
Adds nvcc as a RawKernel backend (#1941, thanks @sjperkins and @leofang!)
Support cuTENSOR 0.2 (#2341)
Implement isin and in1d (#2388, thanks @UmashankarTriforce!)
Support scipy.ndimage compatible convolve and correlate (#2483)
Added cupy.cuda.memory.get_allocator interface (#2489)
Handle PCI bus ID (#2531, thanks @jameshclrk!)
Expand coverage of cuSolverSP APIs (#2539)
Add cuSPARSE routines for preconditioners (#2542)

Enhancements

Fix division by zero in mean/std/var functions for 0-length dimensions (#2201, thanks @pentschev!)
Improve error message in cupy.linalg.inv (#2342)
Replace cupy.pad with a heavily refactored version from NumPy 1.17 (#2399, thanks @grlee77!)
Fix cupy.repeat error message about repeats argument type (#2400)
Ignore warning caused by fastrlock (#2488)
Update __cuda_array_interface__ to protocol version 2 (#2491, thanks @leofang!)
Allow axis=None in concatenate (#2496, thanks @liwt31!)
Fix @testing.numpy_cupy_ decorators for skips (#2498)
Avoid implicit cast inputs to cupy.ndarray in cupy.pad (#2504)
Cholesky decomposition to support complex values (#2509)
Enhance shuffle-test of testing.for_dtypes_combination (#2511)
Allow to use real and imag on CUDA kernels (#2520)
Support complex numbers in cupy.linalg.qr() (#2526, thanks @leofang!)
Fix bug in CUB + Native support of complex numbers in CUB (#2538, thanks @leofang!)
Support cupyx.fallback_mode as an experimental feature (#2541)
Support stream in CUB (#2555, thanks @leofang!)

Performance Improvements

Performance improvement for cupy.var complex inputs (#2484)
Enable fast CUB-based reductions in more cases (cupy.linalg.norm, etc.) (#2517, thanks @grlee77!)

Code Fixes

Remove unnecessary check of cuSOLVER (#2529, thanks @grlee77!)

Documentation

Fix dead links in NumPy docs in random functions (#2384)
Update install_rocm.rst (#2512)
Fix some typos (#2523, thanks @garanews!)

Tests

Fix memory pool disabled during tests (#2452)
Skip bool-bool inputs in cupy.cross test (#2503)
Fix error in test_build.py (#2514, thanks @leofang!)
Move CI requirements to CuPy repository (#2533)
Fix for NumPy 1.14.x compatibility (#2544)
Workaround bug in NumPy 1.12.x or earlier (#2545)

cupy - v7.0.0b4

Published by hvy about 5 years ago

This is the release note of v7.0.0b4. See here for the complete list of solved issues and merged PRs.

New Features

Add cupy.cross (#2366, thanks @UmashankarTriforce!)
Add cupy.may_share_memory (#2417)
Add occupancy driver APIs (#2424, thanks @leofang!)
Add scatter_max and scatter_min (#2427)
Support texture memory in RawKernel (#2432, thanks @leofang!)

Enhancements

Extend CuDNNError to have more debugging information (#2404)
Add complex dtype support to cupy.std and cupy.var (#2411, thanks @grlee77!)
Support complex dtypes in cupy.linalg.inv() (#2468, thanks @leofang!)

Performance Improvements

Use CUB to speed up sum/min/max (#2090)
Avoid creating empty numpy ndarray in common_type (#2307)
cupy.linalg.norm: update docstring and improve performance for ord=2 cases (#2479, thanks @grlee77!)

Bug Fixes

Fix bug in coosort (#2410, thanks @econtal!)
Fix multithreading issue in cupy.cuda.cufft.get_current_plan() (#2435, thanks @leofang!)
Normalize hidden layer strides in cuDNN RNN (#2442)
Copy inputs in ufunc if they share the same memory with outputs (#2460)

Code Fixes

Fix cuSOLVER devInfo dtype in inv and unify how those are specified (#2454)
Fix code convention of statistics routines (#2459)
Remove unnecessary ndarray private methods (#2465)

Documentation

Document CUTENSOR_PATH environment variable (#2386)
Add appropriate pointer types in Cython to the contribute guide (#2455, thanks @leofang!)
Fix invalid escape sequence (#2470)

Installation

Fix NumPy version in Dockerfile (#2430)

Examples

Broadcast ValueError for n-clusters > 2 in k-means example (#2453, thanks @casheera!)

Tests

Simplify cupy.fuse tests (#2339)
Revert "Skip some ndarray-elementwise-op tests as temporary fix" (#2383)
Drop Python 2 Travis CI configuration (#2428)
Drop Python 2 PFN CI configuration (#2429)
Change URL to place test assets (#2434)
Add NumPy 1.17.1 to skip list of 0-length ifft test (#2441)
Remove unnecessary skip in 0-length ifft test (#2443)
Fix TestDLTensorMemory.test_delete (#2451)
Fix data race in advanced indexing test (#2472)
Fix pytest 5.x version errors (#2473)
Remove global state from tests (#2475, thanks @leofang!)

cupy - v6.4.0

Published by beam2d about 5 years ago

This is the release note of v6.4.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Support for shape argument in *_like functions (#2418, thanks @pentschev!)

Bug Fixes

Normalize hidden layer strides in cuDNN RNN (#2448)

Documentation

Add appropriate pointer types in Cython to the contribute guide (#2480, thanks @leofang!)

Installation

Fix NumPy version in Dockerfile (#2447)

Tests

Add NumPy 1.17.1 to skip list of 0-length ifft test (#2445)
Fix TestDLTensorMemory.test_delete (#2458)
Fix data race in advanced indexing test (#2482)

cupy - v7.0.0b3

Published by asi1024 about 5 years ago

This is the release note of v7.0.0b3. See here for the complete list of solved issues and merged PRs.

Highlights

cupy.RawModule has been introduced to allow users to access low-level features (CUDA modules).

Dropping Support of Python 2

Due to the end-of-life (EOL) of Python 2 in January 2020, Python 2 support has been dropped in this release. CuPy v6.x continues to support Python 2. See the blog post for details.

New Features

Add cupy.nanargmin and cupy.nanargmax (#2222, thanks @harshalchaudhari35!)
Support cuDNN Fused Ops API (#2246)
Add fallback_mode.ndarray (#2272, thanks @Piyush-555!)
Add NCCL broadcast (#2303)
Add take_along_axis (#2314)
Add cupy.nanmean (#2319, thanks @Piyush-555!)
Add more cuSolver APIs for dense linear solver (#2320)
Provide full coverage for NCCL APIs in CuPy (#2325, thanks @leofang!)
Add cupy.nanvar and cupy.nanstd (#2344, thanks @Piyush-555!)
Allow getting and setting CUDA kernel attributes (#2369, thanks @leofang, @grlee77 and @andravin!)
Add cupy.RawModule() (#2389, thanks @leofang!)
Add diagonal method for cupyx.scipy.sparse.dia_matrix (#2398, thanks @grlee77!)

Enhancements

Support for shape argument in *_like functions (#2171, thanks @pentschev!)
Support batched matrix inverse (#2300)
Add wrapper for numpy.vectorize in fallback_mode (#2350, thanks @Piyush-555!)
Allow cuFFT plans to be used as a context manager; Set stream before executing cuFFT plans (#2362, thanks @leofang!)
Raise DeprecationWarning on 0-dim arrays in numpy.nonzero to match NumPy 1.17 behavior (#2394)
Change check condition of accept_error (#2396)

Bug Fixes

Make exceptions picklable (#2318)
Make cupy.tensordot use Tensor Core also in case of compute-capability > 70 (#2328)

Code Fixes

Fix unused imports and import orders (#2312)
Fix variable names in tri kernel (#2326)
Some simplifications using isnan (#2364)
Small fix of fusion.pyx and docs (#2393, thanks @xuzijian629!)

Documentation

Add new functions to API reference (#2308)
Fix for linalg.svd documentation (#2321, thanks @IvanYashchuk!)
Fix markup in linalg.svd docs (#2323)
Fix doctest failure related to __array_function__ support (#2352)
Add 1.17 to supported NumPy versions (#2368)

Tests

Make wheel of master for CI (#2144)
Add ChainerCV's tests to pfnCI (#2186)
Fix tests of linalg.svd (#2338)
Check return type in test_type_routines.py (#2358)
Skip some FFT tests on NumPy 1.17.0 due to a NumPy bug (#2363, thanks @grlee77!)
Remove random.power test with forbidden value (#2375)
Skip some tests in TestArrayElementwiseOp as temporary fix (#2376)
Avoid using numpy.nonzero in _make_decorator (#2385)
Increase numbers of retries for K-S tests (#2397)

Others

Drop support for Python 2.7 and 3.4 (#2343)

cupy - v6.3.0

Published by beam2d about 5 years ago

This is the release note of v6.3.0. See here for the complete list of solved issues and merged PRs.

Highlights

NumPy 1.17 is now officially supported.

New Features

Add diagonal method for cupyx.scipy.sparse.dia_matrix (#2412, thanks @grlee77!)
Support allow_pickle in cupy.load and cupy.save (#2291)

Bug Fixes

Make cupy.tensordot use Tensor Core also in case of compute-capability > 70 (#2335)

Code Fixes

Fix unused imports and import orders (#2336)

Documentation

Add new functions to API reference (#2317)
Fix for linalg.svd documentation (#2322, thanks @IvanYashchuk!)
Fix markup in linalg.svd docs (#2327)
Fix doctest failure related to __array_function__ support (#2353)
Add 1.17 to supported NumPy versions (#2413)

Tests

Skip some ndarray-elementwise-op tests as temporary fix (#2378)
Remove random.power test with forbidden value (#2380)
Skip some FFT tests on NumPy 1.17.0 due to a NumPy bug (#2381, thanks @grlee77!)
Avoid using numpy.nonzero in _make_decorator (#2395)
Skip nonzero test for 0d arrays in NumPy 1.17 (#2401)
Increase numbers of retries for K-S tests (#2402)

cupy - v7.0.0b2

Published by hvy over 5 years ago

This is the release note of v7.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

cupy.cutensor, a wrapper for cuTENSOR, has been introduced, allowing high-performance tensor operations. Examples are available here.

Changes without compatibility

cupy.load now specifies allow_pickle=False by default to follow the security fix made in NumPy 1.16.3 (see numpy/numpy #13359 and cupy/cupy #2290 for details). Most users should not be affected by this change; users loading arrays serialized with pickle may need to specify allow_pickle=True explicitly.
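The NumPy behavior that cupy.load now follows can be checked with NumPy alone. An object-dtype array can only be stored via pickle, so loading it back fails unless allow_pickle=True is passed. The sketch below uses only NumPy and an in-memory buffer. The helper `load` is a hypothetical name. It assumes, without checking, that `cupy.load` accepts the same `allow_pickle` argument as its NumPy counterpart.

```python
import io

import numpy as np

# An object array cannot be written as raw bytes, so np.save pickles it.
obj = np.empty(2, dtype=object)
obj[0] = {"a": 1}
obj[1] = [1, 2, 3]

buf = io.BytesIO()
np.save(buf, obj)  # saving with pickle is still allowed by default


def load(allow_pickle):
    """Re-read the buffer from the start with the given allow_pickle setting."""
    buf.seek(0)
    return np.load(buf, allow_pickle=allow_pickle)


try:
    load(allow_pickle=False)
    refused = False
except ValueError:
    # NumPy refuses to unpickle object arrays when allow_pickle=False.
    refused = True

restored = load(allow_pickle=True)  # explicit opt-in, as the note recommends
```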

New Features

Support cuTENSOR (#2210)
Add nansum and nanprod support (#2252, thanks @pentschev!)

Enhancements

Raise error when no available algorithm found by cudnnFindConvolution* (#2234)
Add NHWC layout support to batch normalization (#2235)
Remove unused code (#2255)
Support non-square matrices in lu_factor (#2286, thanks @econtal!)
Remove cupy.ndarray.{nansum/nanprod} (#2292)
Remove warning about nvcc absence (#2299)

Performance Improvements

Add support for merge path algorithm (csrmvEx) when csr_matrix multiply with a dense vector (#2287, thanks @wonghang!)

Bug Fixes

Add wrappers for can_cast, common_type and result_type functions (#2249, thanks @pentschev!)
Make __cuda_array_interface__()['strides'] be tuple (#2260, thanks @leofang!)
Avoid CUDNN_STATUS_BAD_PARAMS (#2261, thanks @himkt!)
Support allow_pickle in cupy.load and cupy.save (#2290)

Documentation

Update installation guide (#2184)
Document interoperability with mpi4py (#2270, thanks @leofang!)

Installation

Make nvcc code generation target configure by env var (#2293)

Tests

Fix test failure when cuDNN is unavailable (#2284)
Enhance tests for type routines (#2306)

cupy - v6.2.0

Published by niboshi over 5 years ago

This is the release note of v6.2.0. See here for the complete list of solved issues and merged PRs.

New Features

Allow copying in the format cupy_array[:] = numpy_array (#2219, thanks @pentschev!)

Enhancements

Fix _preprocess_args to avoid calling hasattr (#2263)
Simplify _prepare_multiple_array_indexing (#2264)
Add constant modification to vector_equal (#2265)
Remove unused code (#2275)
Remove warning about nvcc absence (#2302)

Bug Fixes

Make __cuda_array_interface__()['strides'] be tuple (#2273, thanks @leofang!)
Avoid CUDNN_STATUS_BAD_PARAMS (#2285, thanks @himkt!)
Add wrappers for can_cast, common_type and result_type functions (#2304, thanks @pentschev!)

Documentation

Document interoperability with mpi4py (#2288, thanks @leofang!)
Update installation guide (#2297)

Tests

Enhance tests for type routines (#2311)

cupy - v7.0.0b1

Published by hvy over 5 years ago

This is the release note of v7.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

Host to device copy from NumPy ndarrays is now allowed as an experimental feature with the syntax cupy_array[:] = numpy_array. Set the environment variable CUPY_EXPERIMENTAL_SLICE_COPY to try it out.

Notes

Tensor Core in cuDNN convolution is tentatively disabled for Turing GPUs due to some test failures. The issue is under investigation and will hopefully be fixed in a future version.

New Features

Allow copying in the format cupy_array[:] = numpy_array (#2079, thanks @pentschev!)
Add linalg.lstsq (#2165, thanks @cjekel!)
Add fallback_mode (#2229, thanks @Piyush-555!)
Add CUDNN_POOLING_MAX_DETERMINISTIC (#2239)
Add API to retrieve installation info (#2245)

Enhancements

Improve atomicAdd in histogram and sample (#1345)
Set current device in cupy.ndarray.get()/set() (#2169, thanks @hyabe!)
Improve out-of-memory error message (#2242)
Simplify _prepare_multiple_array_indexing (#2254)
Fix _preprocess_args to avoid calling hasattr (#2256)
Add constant modification to vector_equal (#2257)

Performance Improvements

Optimize the initialization of List[ndarray] in cupy.array (#2081)

Bug Fixes

Revert "Fix usage check of Tensor Core" (#2197)
Fix power for large integrals (#2204)
Avoid division by zero in tensordot, allowing 0-length arrays (#2209, thanks @pentschev!)
Make RandomState.permutation compatible with random.permutation (#2250)

Code Fixes

Coding style fix (#2211)

Documentation

Add cupy-cuda101 to README (#2196)
Fix: duplicate object description of cupy (#2233)

Installation

Add setup option to copy include files in wheel (#2208)
Bump version to v7.0.0b1 (#2266)

Tests

Ignore invalid axis type test in NumPy 1.12.x or earlier (#2192)

Others

Fix array_split with non-equally dividing sections (#2207)

cupy - v6.1.0

Published by niboshi over 5 years ago

This is the release note of v6.1.0. See here for the complete list of solved issues and merged PRs.

Notes

Tensor Core in cuDNN convolution is tentatively disabled for Turing GPUs due to some test failures. The issue is under investigation and will hopefully be fixed in a future version.

Enhancements

Improve atomicAdd in histogram and sample (#2217)

Bug Fixes

Revert "Fix usage check of Tensor Core" (#2198)
Fix array_split with non-equally dividing sections (#2214)
Avoid division by zero in tensordot, allowing 0-length arrays (#2231, thanks @pentschev!)
Make RandomState.permutation compatible with random.permutation (#2253)

Documentation

Add cupy-cuda101 to README (#2203)
Fix duplicate object description of cupy (#2240)

Installation

Add setup option to copy include files in wheel (#2232)

Tests

Ignore invalid axis type test in NumPy 1.12.x or earlier (#2200)

cupy - v7.0.0a1

Published by kmaehashi over 5 years ago

This is the release note of v7.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy memory pool now supports setting a hard limit on the amount of GPU memory allocated. Refer to the reference for details.

New Features

Support cuDNN CTC functions (#1769, thanks @aonotas!)
Support NHWC format in convolution (#1885)
Add hostRegister and hostUnregister (#2102)
Implement limit to memory pool (#2113)
Add strides_check option in array testing functions (#2150)

Enhancements

Fix ascontiguousarray with 0-dim array input (#2078)
Emit kernel names with type names (#2151)
Support complex dtypes in cupy.where (#2175, thanks @AntoineDujardin!)

Bug Fixes

Fix __cuda_array_interface__ data pointer for sliced arrays (#2129, thanks @pentschev!)
Fix usage check of Tensor Core (#2168)
Avoid using Tensor Core with cuDNN deterministic mode in convolution backward (#2174)

Code Fixes

Avoid unnecessary weak pointer for null stream (#1539)
Avoid PyThread in stream.pyx (#1945)
Add a comment to the testing condition of einsum (#2131)
Fix style (#2167)

Documentation

Add upgrade guide for v6 (#2182)

Installation

Fix compile error on CUDA 10.1 and GCC 7 or 8 (#2147, thanks @grafi-tt!)

Examples

Make k-means example's custom kernels simpler (#2145)
Make k-means sample code cleaner for educational purpose (#2146)

Tests

Fix testing condition of diff and unwrap (#2124)
Add a test for assert_array_equal(strides_check=True) (#2156)
Fix test failure when cudnn is unavailable (#2161)

cupy - v6.0.0

Published by beam2d over 5 years ago

This is the release note of v6.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v6.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

See the Upgrade Guide if you are upgrading from previous versions.

Bug Fixes

Fix __cuda_array_interface__ data pointer for sliced arrays (#2134, thanks @pentschev!)
Fix usage check of Tensor Core (#2172)
Avoid using Tensor Core with cuDNN deterministic mode in convolution backward (#2176)

Code Fixes

Avoid unnecessary weak pointer for null stream (#2154)

Documentation

Add upgrade guide for v6 (#2187)

Installation

Fix compile error on CUDA 10.1 and GCC 7 or 8 (#2160, thanks @grafi-tt!)

Tests

Fix testing condition of diff and unwrap (#2142)

cupy - v5.4.0

Published by niboshi over 5 years ago

This is the release note of v5.4.0. This is the final release of v5.x series. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 10.1 and cuDNN 7.5 are now supported. CuPy now also compiles for compute capability 7.5 (Turing GPUs).

Enhancements

Avoid using pytest attributes during import (#2057)
Fix fp16 issue in batch normalization (#2094, thanks @anaruse!)
Keep backward compatibility on cupy.cudnn.batch_normalization_forward_training (#2094)
Support CUDA 10.1 + cuDNN 7.5 + Turing (#2123)

Bug Fixes

Ensure that sparse matrix shapes are always a tuple of int (#2082, thanks @grlee77!)
Fix assigning from complex to float (#2092)
Check array contiguity in copy (#2093)
Fix assertion error in _Chunk.split (#2112, thanks @liwt31!)
Support dltensor with strides of NULL (#2119, thanks @crcrpar!)
Avoid sharing handles between threads (#2122)

Code Fixes

Remove unused variable (#2108, thanks @crcrpar!)

Documentation

Add NCCL v2.4 support to docs (#2089)
Add strides to docstring of ndarray (#2109, thanks @crcrpar!)

Tests

Do not fail by warnings when building docs (#2086)

cupy - v6.0.0rc1

Published by beam2d over 5 years ago

This is the release note of v6.0.0rc1. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 10.1 and cuDNN 7.5 are now supported. CuPy now also compiles for compute capability 7.5 (Turing GPUs).
After this release, the master branch switches to development of the v7 series. Development of v6.0.0 will continue on the v6 branch.

New Features

New RNN API introduced in cuDNN v7.2 (#1609)
Add diff and unwrap (#1933, thanks @a2kiti!)
Support fusion feature of copyto method (#1983)
Add lu_factor and lu_solve to cupyx.scipy.linalg (#2051, thanks @msakai!)

Enhancements

More support for __cuda_array_interface__ (#2058)
Fix fp16 issue in batch normalization (#2060)
Keep backward compatibility on cupy.cudnn.batch_normalization_forward_training (#2072)
Check if gc module is still available (#2116)
Support CUDA 10.1 + cuDNN 7.5 + Turing (#2117)

Performance Improvements

Do exact type comparison instead of isinstance for numpy.dtype (#2016)
Improve _routines_manipulation (#2038)

Bug Fixes

Fix assigning from complex to float (#1911)
Ensure that sparse matrix shapes are always a tuple of int (#1943, thanks @grlee77!)
Avoid sharing handles between threads (#2053)
Check array contiguity in copy (#2075)
Support dltensor with strides of NULL (#2097, thanks @crcrpar!)
Fix assertion error in _Chunk.split (#2103, thanks @liwt31!)

Code Fixes

Use single quote (#2049)
Use assert in helper.py instead of self.assertXXX (#2077)
Remove unused variable (#2105, thanks @crcrpar!)
Reorganize import in device.pyx (#2121)

Documentation

Document __array_function__ (#1979)
Add NCCL v2.4 support to docs (#2065)
Add strides to docstring of ndarray (#2096, thanks @crcrpar!)

Installation

Use deep copy in setting up RPATH (#2073)

Tests

Do not fail by warnings when building docs (#2080)
Test linalg.cholesky with more stable inputs (#2084)

cupy - v6.0.0b3

Published by beam2d over 5 years ago

This is the release note of v6.0.0b3. See here for the complete list of solved issues and merged PRs.

New Features

Implement cupy.put and cupy.place (#1787, thanks @grafi-tt!)
Add plan argument to FFT functions in cupyx.scipy.fftpack (#1942 #2033, thanks @leofang!)
Add out argument to ndarray.get for asynchronous device-to-host copy (#1970, thanks @jeng1220)
Add NCCL 2.4 functions (#1992)

Enhancements

Remove experimental warning of cupy.fuse (#1379)
Support comparing complex-number arrays in cupy.allclose (#1947, thanks @leofang!)
Use version interface in NCCL 2.3.4 (#1985)
Enhance error messages of elementwise operation in fusion mode (#2007)
Fix vector handling (#2008)
Support cuDNN FP16 batch normalization (#2034)
Avoid import numpy (#2040)
Avoid using pytest attributes during import (#2055)
Support Python scalars in iscomplexobj (#1991)

Performance Improvements

Improve performance of reduction on outer axis (#2010, thanks @grafi-tt!)
Reduce vector copy in ndarray initialization (#2015)
Improve kind score operation (#2017)
Improve dictionary operation (#2018)
Improve get_ufunc_kernel (#2019)
Reduce get_device_id call (#2021)
Improve performance of broadcasting in elementwise kernels (#2022)
Improve _is_fusing performance (#2023)

Bug Fixes

Fix cupy.random.randint fail with size zero (#1967)
Call free_all_blocks to free CUDA memory (#1984)
Define __dealloc__ instead of __del__ for cdef-classes to fix memory leak (#1995, thanks @msakai!)
Fix NCCL version assignment for NCCL < 2.3.4 (#2009)
Fix __array_function__ bug (#2024)
Fix usage array function for CuPy modules (#2026, thanks @pentschev!)
Fix random generator seed type (#2036)

Code Fixes

Remove duplicate declaration of cudaMalloc (#2028, thanks @grlee77!)
Fix incorrect dtype for ipiv buffer in cupy.linalg.inv (#2043, thanks @msakai!)

Documentation

Add documentation of cupy.fuse (#1789)
Fix README image to use URL (#1977)
Add cupy-cuda100 to README (#1978)
Document NumPy 1.16 support (#1986)
Document new features in cupy.fft and cupyx.scipy.fftpack (#2035, thanks @leofang!)
Fix cupyx.scipy.get_array_module's docstring (#2050, thanks @msakai!)
Make plan argument in 2D/3D FFT functions experimental (#2056)

Installation

Eliminate RUNPATH to use correct cuDNN (#1770)
Fix eliminate-runpath (#2014)
Fix RPATH not set correctly (#2064)

Examples

Improve memcpy example (#1999, thanks @grafi-tt!)

Tests

Fix test of assert_array_list_equal (#1997)
Fix test with NumPy 1.16.1 (#1998)
Do not "accept error" if bad kwarg is passed to test itself (#2001)

cupy - v5.3.0

Published by hvy over 5 years ago

This is the release note of v5.3.0. See here for the complete list of solved issues and merged PRs.

New Features

Add NCCL 2.4 functions (#2052)

Enhancements

Use version interface in NCCL 2.3.4 (#2006)
Fix vector handling (#2013)
Avoid import numpy (#2054)
Support Python scalars in iscomplexobj (#1993)

Performance Improvements

Improve dictionary operation (#2039)

Bug Fixes

Fix cupy.random.randint fail with size zero (#1981)
Call free_all_blocks to free CUDA memory (#1987)
Define __dealloc__ instead of __del__ for cdef-classes to fix memory leak (#2000)
Fix NCCL version assignment for NCCL < 2.3.4 (#2012)
Fix random generator seed type (#2041)

Code Fixes

Remove duplicate declaration of cudaMalloc (#2037)

Documentation

Add cupy-cuda100 to README (#1980)
Fix README image to use URL (#1982)
Document NumPy 1.16 support (#1996)

Tests

Fix test to support SciPy 1.2 (#1971)
Fix test of assert_array_list_equal (#2003)
Fix test with NumPy 1.16.1 (#2004)

cupy - v6.0.0b2

Published by hvy over 5 years ago

This is the release note of v6.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

Compatibility with NumPy 1.16, and support for the __array_function__ interface, which allows CuPy arrays to be passed to NumPy functions.
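To illustrate what the __array_function__ interface (NEP 18) does, here is a minimal NumPy-only sketch. `ToyDeviceArray` is a hypothetical class, not CuPy's implementation. When any argument defines `__array_function__`, NumPy calls that hook instead of running its own code, which is how a CuPy array can be handed to a NumPy function such as `np.sum`. Note that NumPy 1.16 only enables this dispatch when the environment variable `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1` is set; from NumPy 1.17 it is on by default.

```python
import numpy as np


class ToyDeviceArray:
    """Hypothetical stand-in for a device array (not CuPy's implementation)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy passes the public function that was called (e.g. np.sum),
        # plus the original positional and keyword arguments.
        if func is np.sum:
            # Unwrap our objects, compute, and re-wrap the result.
            unwrapped = [a.data if isinstance(a, ToyDeviceArray) else a for a in args]
            return ToyDeviceArray(np.sum(*unwrapped, **kwargs))
        # Returning NotImplemented makes NumPy raise TypeError for unsupported functions.
        return NotImplemented


x = ToyDeviceArray([1, 2, 3])
result = np.sum(x)  # dispatched to ToyDeviceArray.__array_function__
```

Returning `NotImplemented` for anything the class does not handle means unsupported calls fail loudly with a `TypeError`. They are never silently converted to host arrays.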

New Features

Implement __array_function__ interface (#1650)
Implement SciPy-compatible FFT functions (#1745)
Add PlanNd class for faster 2D and 3D FFTs (#1746, thanks @grlee77!)
Add order argument to empty_like, zeros_like, etc. (#1819, thanks @grlee77!)
Add cupyx.scipy.sparse.diags (#1840, thanks @grlee77!)
Add the order kwarg to cupy.reshape and the underlying ndarray method (#1843, thanks @grlee77!)
Support order kwarg in asarray, asanyarray, ndarray.get, tonumpy (#1845, thanks @grlee77!)
cumsum, cumprod: support array-like input (#1847, thanks @grlee77!)
Add sparse option to meshgrid (#1848, thanks @grlee77!)
Add attributes property to Device (#1869, thanks @grlee77!)
Support new APIs in cuDNN v7.4.1 (#1884)
Add as_strided (#1897, thanks @fujiisoup!)
Add fnc and forc to core.flags.Flags (#1898, thanks @grlee77!)
CuPy external memory pool with memory pool and function pointers (#1904)
Allow multi-axis roll as in NumPy 1.12+ (#1818, thanks @grlee77!)
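Several of the features above are NumPy-compatible additions: multi-axis `roll`, the `order` keyword on `reshape`, `sparse` in `meshgrid`, `as_strided`, and the `fnc`/`forc` flags. The sketch below runs against NumPy itself to show the semantics being matched. It assumes that the corresponding `cupy` calls behave the same way on the GPU, so check the CuPy documentation for your version before relying on that.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]], C-contiguous

# Multi-axis roll (NumPy 1.12+): shift by 1 along axis 0 and by 1 along axis 1.
rolled = np.roll(a, shift=(1, 1), axis=(0, 1))

# order keyword on reshape: 'F' fills the new shape in column-major order.
f_filled = np.arange(6).reshape((2, 3), order="F")

# sparse=True returns broadcastable grids of shape (1, 3) and (2, 1)
# instead of two full (2, 3) arrays.
xx, yy = np.meshgrid(np.arange(3), np.arange(2), sparse=True)

# as_strided: overlapping length-3 windows over a 1-D array, built as a view.
v = np.arange(5)
step = v.strides[0]
windows = as_strided(v, shape=(3, 3), strides=(step, step))

# fnc: F-contiguous and not C-contiguous; forc: F- or C-contiguous.
flags_summary = (a.flags.fnc, a.flags.forc, a.T.flags.fnc)
```

`as_strided` performs no bounds checking, so the shape and strides must keep every window inside the original buffer. In this case the last window ends at index 4.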

Enhancements

Add cupy.util.PerformanceWarning (#1607)
Support NumPy arrays as seeds in RandomState (#1689, thanks @mrocklin!)
Better support cuda-gdb (#1773)
Allow cupy.fuse to take any parameters when it fails to fuse (#1817)
Verbose broadcast failure message (#1836)
Bundle CUDA fp16 headers (#1837)
Refactor array creation for RNN API (#1842)
Support for documenting NCCL interfaces (#1857)
Use item instead of asscalar to support NumPy 1.16 (#1880)
Support NumPy 1.16 (#1881)
Allow passing None as parameters of Fusion.__call__ (#1965)

Performance Improvements

Improve cudnn.py performance (#1375)
Improve matmul performance (3x faster) (#1547)
Improve matmul (small changes) (#1894)
Improve Fusion.__call__ (#1832)

Bug Fixes

Fix backward of batch normalization (#1822)
Fix RecursionError bug in conj method of sparse matrix classes (#1846, thanks @grlee77!)
Allow None as cx argument in cuDNN RNN functions (#1862)
Fix ndarray initialization bug (#1907)
Set size attribute of externally allocated memory (#1926)
Fix ndim not using cdef (#1935)
Ensure the input to the CUFFT Plan1D class is C contiguous (#1944, thanks @grlee77!)
Fix wrong variable name in cudnn.pyx (#1966)

Documentation

Document Numba CUDA array conversion (#1786)
Align with the latest NumPy docs (#1804)
Update docs for newly supported libraries (#1854)
Fix bad hyperlink in cupy.around documentation (#1856)
Add memory management documentation (#1871)
Add diags to documentation (#1872)
Fix stylecheck installation in contribution guide (#1873, thanks @crcrpar!)
Avoid using _ufunc_wrapper and _reduction_wrapper (#1888)
Rename Code of Conduct filename (#1900)

Tests

Add tests for complex min and max (#1828)
Fix randint high value in test for Windows (#1892)
Fix test to support SciPy 1.2 (#1968)

Code Fixes

Refactor manipulation routines from core.pyx (#1620)
Use language level 3 in cythonize (#1792)
Change argument name of core.create_comparison (#1829)
Refactor cyclic imports (#1905)
Refactor math and indexing routines from core.pyx (#1949)
Clean up internal.pyx (prod) (#1951)
Fix int types (#1952)
Rename External* to CFunction* (#1958)
Refactor logic, sorting and statistics routines from core.pyx (#1959)
Use LF instead of CR+LF (#1909)

Others

Ignore W503 and W504 (#1953)

cupy - v5.2.0

Published by mitmul over 5 years ago

This is the release note of v5.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now runs without CUDA development headers if you are using CUDA 9.2 or 10.0.
Improved compatibility with NumPy 1.16.

New Features

Add cupyx.scipy.sparse.diags (#1865, thanks @grlee77!)
Support New APIs in cuDNN v7.4.1 (#1893)

Enhancements

Bundle CUDA fp16 headers (#1858)
Support for documenting NCCL interfaces (#1868)
Verbose broadcast failure message (#1875)
Support NumPy arrays as seeds in RandomState (#1882, thanks @mrocklin!)
Use item instead of asscalar to support NumPy 1.16 (#1895)
Use LF instead of CR+LF (#1920)
Support NumPy 1.16 (#1923)

Performance Improvements

Fix ufunc performance degradation (#1929)

Bug Fixes

Fix backward of batch normalization (#1860)
Check if cx is None (#1863)
Fix RecursionError bug in conj method of sparse matrix classes (#1922, thanks @grlee77!)
Fix ndarray initialization bug (#1927)
Fix ndim not using cdef (#1939)
Ensure the input to the CUFFT Plan1D class is C contiguous (#1964, thanks @grlee77!)