cupy - v6.0.0b1

Published by hvy almost 6 years ago

This is the release note of v6.0.0b1. See here for the complete list of solved issues and merged PRs.

New Features

Support comparison operators for complex numbers (#1636, thanks @themightyoarfish!)
Implement RNN high level API (#1659)
Support array conversion from numba to cupy w/o memcpy (#1760)
Add ndarray.item method (#1815)

Enhancements

Check if ElementwiseKernel doesn't broadcast mutable in_params (#1601)
Support ndim checking in Fusion (#1713)
Remove code for CUDA 7.0 / 7.5 and cuDNN v4 (#1733)
Fix DropoutStates interface to avoid using cuDNN handle (#1734)
Add cuDNN pooling interface (#1735)
Add cuDNN softmax interface (#1736)
Implement cuDNN batch normalization interface (#1737)
Assume CUDA v8 in cuSPARSE/cuSOLVER check (#1780)
Rename _make_rnn_workspace function (#1838)
Make a helper function to get the number of layers in RNN (#1839)

Bug Fixes

Fix invalid memory access during reduction whose output is large to exceed 32-bit (#1774)
Fix incorrect cast in element-wise op for float array and complex scalar (#1795)
Fix error in typecast from fp16 to complex dtype in cupy.fuse (#1799)

Documentation

Improve comparison table (#1622)
Add requirements to build documentation on RTD (#1751)
Add policy on compatibility of random sampling (#1754)
Fix description of CUDA_PATH environment variable (#1775)
Indicate alias in comparison table (#1791)
Add code of conduct (#1793)
Fix English in ndarray.nbytes documentation (#1797)
Add note about cupy.random.{get_set}_state (#1805)
Add docs about conversion from/to cupy ndarray and sparse (#1806)
Enable intersphinx to cupyx module (#1809, thanks @grafi-tt!)
Add 10.0 to supported CUDA versions (#1824)
Add v7.3 and v7.4 to supported cuDNN versions (#1823)
Fix bad source hyperlinks (#1835)

Tests

Fix PendingDeprecationWarning when running test against NumPy 1.15 (#1755)
Support skipping tests decorated by condition or parametrize (#1757)
Fix test failure of binomial distribution in Windows (#1758)
Fix userspace kernel test failure in Windows (#1759)

Other

Add issue template (#1808)

cupy - v5.1.0

Published by niboshi almost 6 years ago

This is the release note of v5.1.0. See here for the complete list of solved issues and merged PRs.

New Features

Support comparison operators for complex numbers (#1820, thanks @themightyoarfish!)
Add cuDNN RNN high-level API (#1821)
Add cuDNN softmax and pooling interface (#1752, #1753)
Add cuDNN batch normalization interface (#1784)

Enhancements

Add float16 and float64 support for scatter_add (#1756, thanks @uchida!, #1763)
Code enhancements
- Rename _make_rnn_workspace function (#1841)
- Make a helper function to get the number of layers in RNN (#1844)
- Remove code for CUDA 7.0 / 7.5 and cuDNN v4 (#1778)
- Remove unnecessary intermediate data in FP16 atomicAdd (#1777)
- Fix DropoutStates interface to avoid using cuDNN handle (#1796)
- Assume CUDA v8 in cuSPARSE/cuSOLVER check (#1782)

Bug Fixes

Fix error in typecast from fp16 to complex dtype in cupy.fuse (#1801)
Fix invalid memory access during reduction whose output is large to exceed 32-bit (#1810)
Fix incorrect cast in element-wise op for float array and complex scalar (#1803)

Tests

Workaround for test failure with Python 3.4 + SciPy 0.x + NumPy 1.15 (#1738)
Fix userspace kernel test failure in Windows (#1761)
Fix PendingDeprecationWarning when running test against NumPy 1.15 (#1762)
Support skipping tests decorated by condition or parametrize (#1767)
Fix test failure of binomial distribution in Windows (#1851)

Documentation

Add code of conduct (#1831)
Improve comparison table (#1750)
Add requirements to build documentation on RTD (#1768)
Add policy on compatibility of random sampling (#1771)
Fix description of CUDA_PATH environment variable (#1783)
Indicate alias in comparison table (#1794)
Fix English in ndarray.nbytes documentation (#1798)
Add note about cupy.random.{get_set}_state (#1814)
Enable intersphinx to cupyx module (#1816, thanks @grafi-tt!)
Add 10.0 to supported CUDA versions (#1826)
Add docs about conversion from/to cupy ndarray and sparse (#1830)
Add v7.3 and v7.4 to supported cuDNN versions (#1825)

cupy - v6.0.0a1

Published by kmaehashi almost 6 years ago

This is the release note of v6.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

Improved support of random sampling functions (cupy.random.*); see the list below for details.

New Features

Support astype in fused functions (#1586)
Add float16 and float64 support for scatter_add (#1707, #1684, thanks @uchida!)
Add random sampling functions
- cupy.random.hypergeometric (#1625)
- cupy.random.logistic (#1626)
- cupy.random.logseries (#1628)
- cupy.random.power (#1629)
- cupy.random.rayleigh (#1630)
- cupy.random.triangular (#1631)
- cupy.random.wald (#1632)
- cupy.random.weibull (#1633)
- cupy.random.negative_binomial (#1635)
- cupy.random.noncentral_chisquare (#1637)
- cupy.random.noncentral_f (#1637)

Enhancements

Update ndimage to support SciPy 1.0+ (#1606)
Improve reduction code for performance (#1668)
Device comparison with non-Device class (#1672)
Fix ufunc performance degradation (#1714)
Code enhancements
- Minor refactoring in random kernels (#1641)
- Reorder definitions of distribution kernels (#1681)
- Update einsum optimization routine (#1718)
- Remove unnecessary intermediate data in FP16 atomicAdd (#1706)

Bug Fixes

Fix float16 nextafter (#1665, thanks @toru-fukaya!)
Fix NVRTC error in cupy.random.hypergeometric (#1688)
Fix cupy.random.rand to not generate 1 (#1701)
Fix comparison with NumPy scalar (#1727)
Fix ndarray.__iter__ to raise TypeError correctly for 0-d arrays (#1697)
Add complex support in solve (#1524, #1674 thanks @boeddeker!)

Documentation

Fix typo in docstring for cupy.linalg.slogdet (#1667, thanks @fiarabbit!)
Add licenses of distribution kernels from numpy (#1680)
Fix math docs of cupy.random (#1685)
Explain that -ftz=true affects nextafter (#1717)
Drop support for CUDA 7.0 / 7.5 (#1722)
Add support for NumPy 1.15 in docs (#1728)
Add Python 3.7 support to installation docs (#1739)

Installation

Reorganize setup requirements (#1478)
Update base docker image (#1723)

Tests

Tentatively skip failing test_nextafter_combination (#1687)
Compare distributions between cupy and numpy by two-sample K-S test (#1300)
Improve floating point tests (#1673)
Mark slow cupy.random tests (#1677)
Add more K-S tests (#1678)
Minor fixes to cupy.random tests (#1679)
Fix TestRandintDtype was slow (#1682)
Ignore warning of scipy<1.0 which uses a deprecated feature of numpy>=1.15 (#1702)
Workaround for test failure with Python 3.4 + SciPy 0.x + NumPy 1.15 (#1732)
Fix test for Python 3.7 (#1743)
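
The two-sample K-S comparison mentioned in #1300 can be illustrated with a small pure-Python sketch. This is not CuPy's test code: the function name ks_statistic is hypothetical, and two independently seeded random.Random generators merely stand in for the CuPy and NumPy samplers being compared.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The largest gap is always attained at one of the observed points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Stand-ins for "the same distribution drawn by two different libraries".
rng_a = random.Random(0)
rng_b = random.Random(1)
same_a = [rng_a.expovariate(1.0) for _ in range(2000)]
same_b = [rng_b.expovariate(1.0) for _ in range(2000)]
shifted = [x + 1.0 for x in same_b]  # deliberately a different distribution

d_same = ks_statistic(same_a, same_b)
d_shifted = ks_statistic(same_a, shifted)
print(d_same, d_shifted)
```

A small statistic suggests the two samplers agree; a large one (as for the shifted sample) signals a mismatch. A real test would also turn the statistic into a p-value before deciding.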

Others

Add header file generated by Cython to gitignore (#1700)

cupy - v5.0.0

Published by beam2d almost 6 years ago

This is the release note of v5.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the differences from v5.0.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases.

See the Upgrade Guide if you are upgrading from previous versions.

Highlights

CuPy now supports Python 3.7. Wheel packages are also available.
Dropped support of CUDA 7.0/7.5. Please update the CUDA installation if you are using these versions.
Improved compatibility with NumPy 1.15 and SciPy 1.0.

New Features

Support astype in fused functions (#1705)

Enhancements

Update ndimage to support SciPy 1.0+ (#1670)
Reorganize setup requirements (#1696)
Improve reduction code for performance (#1709)
Device comparison with non-Device class (#1724)

Bug Fixes

Fix float16 nextafter (#1676, thanks @toru-fukaya!)
Fix cupy.random.rand to not generate 1 (#1710)
Fix comparison with NumPy scalar (#1731)
Fix ndarray.__iter__ to raise TypeError correctly for 0-d arrays (#1699)

Documentation

Fix typo in docstring for cupy.linalg.slogdet (#1671, thanks @fiarabbit!)
Add licenses of distribution kernels from numpy (#1683)
Add support for NumPy 1.15 in docs (#1729)
Explain that -ftz=true affects nextafter (#1730)
Add Python 3.7 support to installation docs (#1742)

Installation

Update base docker image (#1725)
Drop support for CUDA 7.0 / 7.5 (#1726)

Tests

Fix TestRandintDtype was slow (#1693)
Ignore warning of scipy<1.0 which uses a deprecated feature of numpy>=1.15 (#1708)
Minor fixes to cupy.random tests (#1712)
Fix test for Python 3.7 (#1744)

Others

Add header file generated by Cython to gitignore (#1704)

cupy - v5.0.0rc1

Published by beam2d about 6 years ago

These are the release notes for v5.0.0rc1. See here for the complete list of solved issues and merged PRs.

Highlights

Many routines have been added, especially for sampling from various distributions.
We now support SciPy 1.0+.

New Features

New sampling routines for various distributions
- Multivariate normal distribution (#1320)
- Chi-square distribution (#1414)
- Gamma distribution (#1416, #1616)
- Student’s t distribution (#1417)
- Poisson distribution (#1418)
- Cauchy distribution (#1419)
- Exponential distribution (#1420, #1624)
- F distribution (#1421)
- Geometric distribution (#1422)
- Pareto distribution (#1423)
- von Mises distribution (#1623)
- Zipf distribution (#1634)
New routines
- Add sparse cholesky decomposition (#1075, thanks @chengts95!)
- Implement round function (#1499)
- Add cbrt (#1559)
- Add nan_to_num (#1562)
- Add erf, erfc, and erfcx to cupyx (#1570)
- Add ndtr (#1571)
- Add erfinv and erfcinv (#1590)
Add new APIs for RNN (#1572)
Support complex in eigh and svd (#1518, thanks @infrub!)
Add cupy.cublas.dgetrfBatched (#1608)
General support for external memory pointer (#1610)
Add cupy.cublas.dgetriBatched (#1617)

Enhancements

Improve exception handling performance in Cython (#919)
Improve cudnn.pyx to up speed (#1378)
Improve take function (#1505)
Improve view method performance (#1543)
Split core.pyx to improve maintainability (#1550)
Improve memory pool performance (#1544)
Make ElementwiseKernel emit immutable in_params (#1554)
Use bit operation when the divisor is a power of 2 on GPU (#1555)
Use cuBLAS 9.1+ algorithm names (#1561)
Improve error messages for complex arguments of bessel functions (#1569)
Support SciPy 1.0+ (#1573)
Fix result dtypes of special functions (#1577)
Improve usability of special functions (#1578)
Improve readability of reduction kernel (only python) (#1580)
Fallback to synchronous transfer if pinned memory could not be allocated (#1593)
Unify fusion (#1599)
Add strides option in cupy.ndarray (#1611)
Fix laplace distribution (#1645)
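
The idea behind "Use bit operation when the divisor is a power of 2" (#1555) can be shown with plain host-side arithmetic. This is a sketch of the general technique for non-negative integers, not the GPU code CuPy generates; the helper names is_power_of_two and fast_divmod are hypothetical.

```python
def is_power_of_two(d):
    """True when d is a positive power of two (exactly one bit set)."""
    return d > 0 and (d & (d - 1)) == 0


def fast_divmod(x, d):
    """divmod for non-negative x, using shift and mask when d is a power of two."""
    if x >= 0 and is_power_of_two(d):
        shift = d.bit_length() - 1      # log2(d)
        return x >> shift, x & (d - 1)  # quotient, remainder
    return divmod(x, d)                 # general fallback


for x, d in [(37, 8), (1024, 1024), (5, 1), (37, 6)]:
    assert fast_divmod(x, d) == divmod(x, d)
    print(x, d, fast_divmod(x, d))
```

Shifts and masks are typically much cheaper than integer division on a GPU, which is why index calculations benefit when shapes or strides are powers of two.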

Bug Fixes

Fix invalid exception propagation in getDropoutReserveSpaceSize (#919, #1344)
Fix CuPy not working in threads other than the one that imported CuPy (#1581)
Reset cuda runtime error (#1592)
Fix cupy.random.standard_exponential bug (#1642)
Support non-tuple size in random distribution generators (#1648)
Fix random.exponential (#1649)

Documentation

Add documentation to sqrt (#1560)
Add interoperability docs (#1565)
Fix docstring of modified bessel function (#1567)
Add comparison table generator (#1582)
Support cuDNN v7.2 (#1584)
Update contribution guide to align with Chainer's one (#1604)
Rename my_sum to my_add in docs (#1612)

Tests

Import scipy.special to use it in test (#1568)
Use assertIsNone in test_interpolation.py (#1587)
Add test for thread use case of fusion (#1589)
Test reproducibility for any seed (#1613)
Fix invalid escape sequence warnings in Python 3.6 (#1619)
Improve boundary tests of erfinv (#1646)

Others

Fix around style checking (#1605)

cupy - v4.5.0

Published by mitmul about 6 years ago

This is the release note of v4.5.0. See here for the complete list of solved issues and merged PRs.

Highlights

This stable update adds SciPy 1.0+ support.

Enhancements

Support SciPy 1.0+ (#1588)
Improve exception handling performance in Cython (#1598)
Fallback to synchronous transfer if pinned memory could not be allocated (#1603)
Improve cudnn.pyx to up speed (#1652)

Bug Fixes

Fix CuPy not working in threads other than the one that imported CuPy (#1585)
Reset cuda runtime error (#1597)
Improve exception handling performance in Cython (#1598)

Tests

Add test for thread use case of fusion (#1595)
Fix invalid escape sequence warnings in Python 3.6 (#1627)

cupy - v4.4.1

Published by kmaehashi about 6 years ago

This is the release note of v4.4.1. See here for the complete list of solved issues and merged PRs.

This is a hot-fix release for v4.4.0 to address the issue reported in #1579 (thanks @BobLiu20 for reporting this!). Users calling CuPy functions on non-main threads may have been affected by this issue.

Bug Fixes

Fix CuPy not working in threads other than the one that imported CuPy (#1591)

Tests

Add test for thread use case of fusion (#1596)

cupy - v5.0.0b4

Published by beam2d about 6 years ago

This is the release note of v5.0.0b4. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports __cuda_array_interface__, the CUDA array interchange interface compatible with Numba>=0.39.0. This means you can now pass CuPy arrays to kernels JITed with Numba. The following is a simple example borrowed from numba/numba#2860:

import cupy
from numba import cuda

@cuda.jit
def add(x, y, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = x[i] + y[i]

a = cupy.arange(10)
b = a * 2
out = cupy.zeros_like(a)

print(out)  # => [0 0 0 0 0 0 0 0 0 0]

add[1, 32](a, b, out)

print(out)  # => [ 0  3  6  9 12 15 18 21 24 27]

Improved performance.
Added cumsum and cumprod methods to ndarray.
Implemented cupy.allclose.
Enhanced cuDNN RNN functionality, including FP16 support.

New Features

Implement __cuda_array_interface__ (#1144, thanks @seibert!)
Support FP16 and FP64 in cuDNN RNN related functions (#1471)
Implement <t>tpttr and <t>trttp of cuBLAS (#1492)
Add cumsum and cumprod to ndarray (#1500)
Add cupyx.scipy.get_array_module (#1513)
Implement cupy.allclose (#1522, thanks @tsurumeso!)
Add mem_info to Device (#1538, thanks @larsoner!)

Enhancements

Avoid keeping Device object in Memory and MemoryPointer (#946)
Speed up ElementwiseKernel launch (#1318)
Improve memory allocation performance (#1343)
Fix styles for latest autopep8 (#1352)
Use CScalar in elementwise and reduction (#1447)
Define ndarray.__iter__ (#1449)
Move cupy.sparse to cupyx.scipy.sparse (#1451)
Support negative indices in array_split (#1454)
Avoid collections.sequence (#1456)
Avoid variable name l to follow pep8 (#1460)
Simplify nonzero function (#1487)
Use TensorCore for matmul with fp32 matrixes (#1493)
Support nonzero for complex types (#1501)
Change type checking rules of Fusion ufuncs (#1507)
Fix minor issues on coding style (#1509)
Fix errors in NumPy 1.15 (#1514)
Use collections.abc to avoid DeprecationWarning in Python 3.7 (#1515)
Support loop_prep in ufunc (#1537)
Add _has_memory_hooks to avoid thread local dictionary operation (#1540)
Add specialized CUDA kernel for fill function (#1541)
Improve ndarray creation performance (#1542)
Use xorshift128 to reduce global memory access (#1546)
Add cdef and cpdef for better cythonize core.pyx (#1548)

Bug Fixes

Fix errors on 0-sized inputs (#1459)
Use cython.no_gc to avoid memory leak (#1463)
Fix cupy.random.dirichlet to behave same as numpy.random.dirichlet (#1468)
Fix indexing behavior when input is zero-sized array (#1503)
Fix cupy.real and cupy.imag (#1504)
Avoid compile error in old GCC (CentOS 6) (#1506)
Fix thrust memory allocation problem (#1511)
Fix dtype order of create_comparison (#1551)
Raise error when trying to broadcast out_params in ElementwiseKernel (#1552)

Documentation

Add upgrade guide for cupyx namespaces (#1467)
Fix docstring about free_all_free (#1519)
Update agnostic code tutorial (#1521, thanks @w-m!)

Tests

Fix sparse test (#1452)
Avoid hacking: Use the same test settings as Chainer (#1477)

cupy - v4.4.0

Published by kmaehashi about 6 years ago

This is the release note of v4.4.0. See here for the complete list of solved issues and merged PRs.

New Features

Add divmod function to cupy namespace (#1480)
Allow more natural fusion notation (#1481)

Enhancements

Avoid collections.sequence (#1472)
Support negative indices in array_split (#1475)
Avoid variable name l to follow pep8 (#1479)
Reduce Python function call in ElementwiseKernel (#1482)
Speed up ElementwiseKernel launch (#1488)
Improve memory allocation performance (#1490)
Fix type of return value of fused function (#1491)
Support composition of fused functions (#1494)
Add compilation methods in Fusion class (#1497)
Allow cupy.get_array_module take fusion parameters (#1498)
Use CScalar in elementwise and reduction (#1508)
Use collections.abc to avoid DeprecationWarning in Python 3.7 (#1517)
Fix unit test errors in NumPy 1.15 (#1549)

Bug Fixes

Use cython.no_gc to avoid memory leak (#1474)
Fix errors on 0-sized inputs (#1485)
Fix type of reduction (#1502)
Avoid compile error in old GCC (CentOS6) (#1516)
Fix thrust memory allocation problem (#1525)
Fix dtype order of create_comparison (#1553)
Raise error when trying to broadcast out_params in ElementwiseKernel (#1556)

Documentation

Add upgrade guide for cupyx namespaces (#1496)
Update agnostic code tutorial (#1528, thanks @w-m!)

Tests

Add .pytest_cache/ to .gitignore (#1530)
Avoid hacking: Use the same test settings as Chainer (#1536)

cupy - v5.0.0b3

Published by niboshi over 6 years ago

This is the release note of v5.0.0b3. See here for the complete list of solved issues and merged PRs.

Highlights

cupyx.scipy namespace has been introduced to provide SciPy-compatible APIs for CuPy ndarrays. cupy.sparse module has been renamed to cupyx.scipy.sparse; cupy.sparse is kept for backward compatibility.
A new user-defined kernel class, cupy.RawKernel, has been added. With raw kernels, you can define kernels directly from raw CUDA source. See the documentation for details.
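
As a rough illustration of the raw-kernel workflow, the sketch below defines a kernel from CUDA source and launches it. The kernel source, the kernel name my_add, and the helpers grid_size and run_on_gpu are illustrative assumptions rather than anything shipped with CuPy; the launch needs CuPy and a CUDA device, so it is guarded and skipped otherwise.

```python
KERNEL_SOURCE = r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
'''


def grid_size(n, block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block - 1) // block


def run_on_gpu(n=1000, block=128):
    import cupy  # imported lazily so the sketch loads without CuPy

    kernel = cupy.RawKernel(KERNEL_SOURCE, 'my_add')
    x1 = cupy.arange(n, dtype=cupy.float32)
    x2 = cupy.ones(n, dtype=cupy.float32)
    y = cupy.zeros(n, dtype=cupy.float32)
    # Call arguments: grid dimensions, block dimensions, kernel arguments.
    # n is passed as a 32-bit integer to match the `int n` parameter.
    kernel((grid_size(n, block),), (block,), (x1, x2, y, cupy.int32(n)))
    return y


try:
    print(run_on_gpu()[:5])
except Exception as exc:  # CuPy missing or no CUDA device available
    print("skipped GPU launch:", exc)
```

The bounds check inside the kernel matters because the grid is rounded up, so the last block may contain threads past the end of the arrays.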

New Features

Introduce SciPy namespace (#1079)
Logarithmic gamma and related functions (#1232)
Binomial distribution (#1356)
Implement cupyx.scipy.linalg.solve_triangular (#1383)
Implement RawKernel (#1398)
Beta distribution (#1413)
Dirichlet distribution (#1415)

Enhancements

Use fmin and fmax for HIP environment (#1116)
Improve reduce_dims for speed up (#1324)
Remove overhead in creation/basic.py (#1342)
Improve performance of host to device memory copy (#1367)
Fix cupy.cov for degrees of freedom <= 0 (#1370, thanks @tsurumeso!)
Add compilation methods in Fusion class (#1382)
Make the method cupy.random.RandomState.interval private (#1430)
Use get_cublas_handle to reduce creation of Device object (#1440)
Remove overhead in generator and distribution (#1442)
Remove stream option from RawKernel and add missing docs of arguments in ReductionKernel (#1444)
Allow cupy.get_array_module to take fusion parameters (#1446)
Use internal.clp2 in reduction (#1448)

Bug Fixes

Fix issue of cuDNN convolution math_type setting (#1428)
Fix Module and LinkState not freed (#1439)
Fix fromDlpack memory management (#1445, thanks @t-vi!)
Fix error in PooledMemory in Python 3.7 (#1457)

Documentation

Fix documentation of the option arg of ElementwiseKernel (#1437)
Convert cupy.sparse to cupyx.scipy.sparse in docstrings (#1450)

Installation

Change required Cython version to 0.28 or later (#1407)

Tests

Fix requirements of numpy in test_einsum.py (#1400)
Refactor TestOrder (#1405)

cupy - v4.3.0

Published by beam2d over 6 years ago

This is the release note of v4.3.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Improve reduce_dims for speed up (#1424)
Remove overhead in creation/basic.py (#1425)
Improve performance of host to device memory copy (#1426)

Bug Fixes

Fix to accept longer order names (#1395)
Fix Module and LinkState not freed (#1441)
Fix error in PooledMemory in Python 3.7 (#1462)

Documentation

Fix documentation of the option arg of ElementwiseKernel (#1438)

Installation

Support bundling dependent DLLs for Windows wheel support (#1410)
Change required Cython version to 0.28 or later (#1412)

Tests

Refactor TestOrder (#1408)

cupy - v4.2.0

Published by hvy over 6 years ago

This is the release note of v4.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

Allocation strategy of pinned memory has been improved to reduce host memory usage.
Fixed bugs in multiple functions with arrays with complex dtypes.

Enhancements

Use cuDNN v7 APIs to get conv algos for TensorCore (#1134)
Fix cupy.diag failures for array-like objects other than CuPy arrays (#1235, thanks @hyabe!)
Remove memory copy in the cupy.diag function (#1337)
Support weak reference to CuPy array (#1359)
Fix to preserve dtype of an input array in cupy.linalg.norm (#1376)
Improve performance of ndarray initialization (#1377)
Round-up pooled memory allocation size with clp2 (#1386)
Reduce GPU memory usage in (de)convolution (#1387)
Support 'f' and 'c' in the order option of ndarray (#1390)
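
"Round-up pooled memory allocation size with clp2" (#1386) rounds a requested size up to a power of two. Below is a stand-alone sketch, assuming clp2 means "ceiling power of 2"; the function here is an illustration and not CuPy's internal implementation.

```python
def clp2(x):
    """Smallest power of two that is >= x, for integer x >= 1."""
    if x < 1:
        raise ValueError("x must be >= 1")
    return 1 << (x - 1).bit_length()


for size in (1, 2, 3, 1000, 4096, 4097):
    print(size, "->", clp2(size))
```

Rounding requests into a small number of size classes makes it more likely that a freed block can serve a later request, at the cost of allocating up to roughly twice the requested size for a single block.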

Bug Fixes

Fix cupy.matmul when inputs contain zero-sized array(s) (#1238)
Fix default dtype of cupy.full (#1257)
Fix dtype option of cupy.sum and cupy.prod (#1259)
sort, lexsort, and argsort catch C++ exceptions from Thrust (#1290)
Fix view of zero-dim ndarray (#1291)
Fix real and imag of zero-dim ndarray (#1292)
Support cupy.expm1, cupy.log1p, cupy.log2 for complex type (#1293)
Fix unary functions in cupy.math.misc to support complex types (#1297)
Fix binary functions in cupy.math.misc to support complex types (#1298)
Fix OutOfMemoryError raised even when there are sufficient large freeable chunks (#1301, thanks @hyabe!)
Fix astype for complex dtypes (#1302)
Fix real, imag of non-contiguous complex ndarray (#1306)
Rounding functions support complex ndarrays (#1308)

Documentation

Update README to encourage use of wheels (#1296)
Add NumPy 1.14 to supported versions (#1305)
Fix typo in sparse matrix docs (#1310)
Force displaying known methods which are mis-recognized as attributes by Sphinx (#1314)
Expand reference on differences in zero-dimensional arrays (#1315)
Split LICENSE file (#1326)
Fix typo in profiler docs (#1328)
Reorganize license files (#1335)
Fix typos in cupy.zeros and cupy.zeros_like (#1360)
Update requirements for v5.0.0b2 / v4.2 release (#1373)

Installation

Remove deprecated imp.load_source in setup.py (#1332, thanks @vilyaair!)
Add a license file to wheel (#1348)

Tests

Fix .coveragerc (#1212)
Remove _multiprocess_can_split_ (#1267)

cupy - v5.0.0b2

Published by niboshi over 6 years ago

This is the release note of v5.0.0b2. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports DLPack to improve interoperability between frameworks. You can convert between cupy.ndarray and DLPack tensor using array.toDlpack() and cupy.fromDlpack(tensor). See the documentation for details.
CuPy ndarray now implements the __array_ufunc__ protocol to improve interoperability with NumPy. This makes NumPy ufuncs directly applicable to CuPy ndarrays (for example, numpy.exp(cupy.ones(3)) will call cupy.exp to compute the exponential and return a CuPy ndarray).
CuPy now supports CUDA 9.2 and NumPy 1.14.
More NumPy/SciPy compatible methods have been implemented: cupy.linalg.matrix_power, cupy.random.laplace, cupy.corrcoef, cupy.cov, cupy.i0, cupy.sinc, cupyx.scipy.special.* and more.
cupy.einsum has been rewritten to use cuBLAS. This significantly reduces the memory usage and also improves the performance.
Allocation strategy of pinned memory has been improved to reduce host memory usage.
Fixed bugs in multiple functions with arrays with complex dtypes.
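
The DLPack and __array_ufunc__ highlights above can be tried with a short sketch like the following. It uses only the calls named in these notes (toDlpack, cupy.fromDlpack, and a NumPy ufunc on a CuPy array); the function name demo is hypothetical, and the whole thing is guarded because it needs CuPy and a CUDA device, and because later CuPy versions may rename these DLPack methods.

```python
def demo():
    import numpy
    import cupy

    x = cupy.ones(3)

    # __array_ufunc__: a NumPy ufunc applied to a CuPy array dispatches
    # to the CuPy implementation and returns a cupy.ndarray.
    y = numpy.exp(x)
    assert isinstance(y, cupy.ndarray)

    # DLPack: export to a DLPack tensor and import it back.
    # The round trip is expected to share device memory, not copy it.
    tensor = x.toDlpack()
    z = cupy.fromDlpack(tensor)
    assert z.data.ptr == x.data.ptr
    return y, z


try:
    y, z = demo()
    print(type(y), z)
    ran = True
except Exception as exc:  # CuPy missing, no CUDA device, or API changed
    print("skipped:", exc)
    ran = False
```

The same DLPack tensor could instead be handed to another framework that understands DLPack, which is the interoperability use case this release targets.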

New Features

Support DLPack (#1082)
Implement cupy.corrcoef and cupy.cov (#1110, thanks @tsurumeso!)
Allow more natural fusion notation (#1167)
Implement special functions (#1233)
Implement __array_ufunc__ (#1247, thanks @martindurant!)
Improve performance of batch normalization (#1260)
Add complex dtype to sparse matrix (#1277, thanks @chengts95!)
Add cuDNN API for tensor operations and reduction (#1319, thanks @kashif!)
Add distribution laplace (#1321)
Implement cupy.linalg.matrix_power (#1374, thanks @ericmjl!)

Enhancements

Reduce Python function call in ElementwiseKernel (#725)
Check cuDNN convolution algorithm (#890)
Remove memory copy in diag function (#1129)
Fix linalg.matrix_rank casting for Windows (#1217)
Use cuBLAS in cupy.einsum (#1218)
Add divmod function to cupy namespace (#1286)
Improve ndarray initializing performance (#1341)
Fix type of return value of fused function (#1349)
Support composition of fused functions (#1350)
Support weak reference to CuPy array (#1355)
Remove cupy_stdint.h (#1361)
Round-up pooled memory allocation size with clp2 (#1372)
Reduce GPU memory usage in (de)convolution (#1381)
Support 'f' and 'c' in the order option of ndarray (#1385)

Bug Fixes

Fix OutOfMemoryError raised even when there are sufficient large freeable chunks (#1256, thanks @hyabe!)
Fix astype for complex dtypes (#1279)
Fix real and imag for zero-dim arrays (#1280)
Support rounding for complex types (#1282)
Support expm1, log1p, log2 for complex type (#1283)
Fix unary functions in misc for complex (#1284)
Fix binary functions in misc to support complex types (#1285)
Fix view of zerodim ndarray (#1287)
Catch C++ exceptions from Thrust (#1289)
Fix real, imag of non-contiguous complex arrays (#1303)
Fix rint syntax error (#1311)
Skip einsum test for NumPy versions with broken einsum (#1334)
Fix type of reduction (#1354)
Fix to accept longer order names (#1393)

Documentation

Add NumPy 1.14 to supported versions (#1139)
Update README to encourage use of wheels (#1208)
Improve sparse docs to show conversion from/to SciPy (#1213)
Force displaying known methods which are mis-recognized as attributes by Sphinx (#1250)
Expand reference on differences in zero-dimensional arrays (#1254)
Fix typo in sparse matrix docs (#1307)
Split LICENSE file (#1325)
Fix typo in profiler docs (#1327)
Reorganize license file (#1330)
Fix typos in zeros and zeros_like (#1357)
Update requirements for v5.0.0b2 / v4.2 release (#1369)

Installation

Use define macro in setup.py (#1121)
Support bundling dependent DLLs for Windows wheel support (#1253)
Remove deprecated imp.load_source in setup.py (#1329, thanks @vilyaair!)
Add license file to wheel (#1333)

Tests

Sparse complex ufunc (#1312)
Add scipy_name to testing helper functions (#1339)

cupy - v4.1.0

Published by niboshi over 6 years ago

This is the release note of v4.1.0. See here for the complete list of solved issues and merged PRs.

Enhancements

Add NumPy-compatibility constants (#1205, thanks @keisuke-umezawa!)
Support complex constants and functions in fuse (#1207)
Free pooled memory when cufftMakePlan1d cannot allocate memory (#1236)
Rename CuFftError to CuFFTError (#1244)

Bug Fixes

Fix regex in einsum to match empty input subscript (#1186)
Fix memory leak: mempool tried to find out-of-bounds bin when freeing chunk (#1189)
Fix scalar casting rule to support Windows (#1194)
Fix conversion from float16 to complex (#1252)

Installation

Separate NVTX module for better Windows support (#1237)

Tests

Separate tests for cupy.power against complex dtype (#1187)
Fix example test to pass on Windows (#1188)
Fix real and imag test for bool to pass on Windows (#1192)
Skip int8.max test on Windows due to NumPy bug (#1193)
Fix hacking version (#1229)
Fix 32-bit boundary test to support Windows (#1255)

Others

Fix .gitignore to exclude .pyd files (#1227)

cupy - v5.0.0b1

Published by beam2d over 6 years ago

This is the release note of v5.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

We have started providing wheels for Python 3.6 on Windows. This is currently considered experimental, and we'd love to hear your feedback. See the Installation Guide for details.

New Features

Fix incompatibility between cupy.random.permutation and numpy.random.permutation. (#1138)
Implement unique (#1140)
Add cupy.average (#1180)
Implement triangular array creation routines (#1195, thanks @tsurumeso!)

Enhancements

Fix to preserve dtype of input array in cupy.linalg.norm (#875)
Support complex constants and functions in fuse (#1090)
Fix cupy.diag() failures for array-like objects other than CuPy arrays (#1124, thanks @hyabe!)
Add NumPy-compatibility constants (#1163, thanks @keisuke-umezawa!)
Free pooled memory when cufftMakePlan1d cannot allocate memory (#1219)
Rename CuFftError to CuFFTError (#1234)

Bug Fixes

Fix memory leak: mempool tried to find out-of-bounds bin when freeing chunk (#1165)
Fix scalar casting rule to support Windows (#1169)
Fix regex in einsum to match empty input subscript (#1181)
Fix default dtype of full (#1209)
Fix matmul when inputs contain zero-sized array (#1231)
Fix dtype option of sum and prod (#1239)
Fix conversion from float16 to complex (#1241)
Fix file permissions (#1249)

Documentation

Fix missing documents (#1148)

Installation

Separate NVTX module for better Windows support (#1211)

Tests

Skip int8.max test on Windows due to NumPy bug (#1171)
Fix example test to pass on Windows (#1172)
Fix real and imag test for bool to pass on Windows (#1173)
Separate tests for cupy.power against complex dtypes (#1174)
Fix .coveragerc (#1210)
Fix 32-bit boundary test to support Windows (#1216)
Remove _multiprocess_can_split_ (#1220)
Fix hacking version (#1228)

Others

Fix .gitignore to exclude .pyd files (#1215)

cupy - v4.0.0

Published by kmaehashi over 6 years ago

This is the major release of CuPy v4.0.0. All of the updates since the previous major version (v2.5.0) can be found in the release notes below:

Summary of v4 update

We now provide wheel packages. You can install one using one of the following commands, depending on the CUDA version you are using.

$ pip install cupy-cuda80
$ pip install cupy-cuda90
$ pip install cupy-cuda91

If you already have an old version of CuPy installed, uninstall it before installing a wheel package. Note that these packages also include binaries of cuDNN and NCCL, so you do not need to install them yourself.

Memory pool is now the default allocator even if CuPy is used alone without Chainer (note that it does not affect those who are using Chainer).
Many new functions are added, including FFT support.
The version number is now aligned with that of Chainer, which means the “v3.x.x” series has been skipped.

See the Upgrade Guide if you are migrating from CuPy v2 to v4.

Updates from the release candidate are as follows.

New Features

Implement cupy.show_config and cupyx.get_runtime_info (#1120)

Enhancements

Support double precision atomicAdd on Maxwell or older GPUs (#1114, thanks @anaruse!)
Expose all supported dtypes from numpy (#1130)
Handle errors in cupy.show_config() (#1135)
Fix to capture CuDNNError in cupyx.runtime (#1151)

Bug Fixes

Fix diagflat fail if argument is not cupy.ndarray (#1058)
Fix moveaxis bug (#1059, thanks @fukatani!)
Fix duplicate declaration of EigMode in cuSPARSE (#1111)
Fix a.real and a.imag to return view (#1113)
Fix cupy.concatenate to support arrays with >= 2**31 elements (#1115)
Limit arch to the maximum value allowed in each NVRTC version (#1119)
Fix duplicate declaration of cudaError_t (#1145)
Use streams when calling libraries (#1153)
Fix cupy.linalg.inv() breaks its argument (#1154, thanks @hyabe!)
Do not use platform-specific CC (#1158)

Documentation

Update documentation for chainer.backends.cuda (#1050)
Fix typo (#1051)
Fix typo (#1080)
Fix document of for_unsigned_dtypes (#1081)
Fix wrong references of document (#1102)
Remove invalid argument description in cupy.tensordot (#1103)
Rewrite installation guide (#1127)
Enable flake8 in cupy/indexing/generate.py (#1146)
Fix document of r_ and c_ (#1149)
Fix document of MemoryHook (#1150)

Installation

Use --no-cache-dir in Dockerfile (#1061)
Avoid embedding CUDA_PATH to RPATH in wheels (#1083)

Examples

Avoid to import matplotlib to set its backend Agg in code like chainer (#1054)

Tests

Remove platform-dependent dtype (#1092)
Remove nose dependency (#1126)

cupy - v5.0.0a1

Published by niboshi over 6 years ago

This is the release note of v5.0.0a1. See here for the complete list of solved issues and merged PRs.

New Features

Expose context management API in driver (#977)
Add 'edge' and 'reflect' mode to cupy.pad (#1040, thanks @wkentaro!)
Implement histogram (#1049, thanks @IshitaTakeshi!)
Implement multi-dimensional image processing (#1066)
Implement cupy.show_config and cupyx.get_runtime_info (#1067)

Enhancements

Expose all supported dtypes from numpy (#1070)
Support double precision atomicAdd on Maxwell or older GPUs (#1071, thanks @anaruse!)
Use cuDNN v7 APIs to get convolution algorithms for TensorCore (#1095, thanks @anaruse!)
Handle errors in cupy.show_config() (#1132)
Fix to capture CuDNNError in cupyx.runtime (#1136)

Bug Fixes

Fix moveaxis bug (#1023, thanks @fukatani!)
Fix diagflat to fail if argument is not cupy.ndarray (#1036)
Limit arch to the maximum value allowed in each NVRTC version (#1055)
Fix ndarray.real and ndarray.imag to return view (#1089)
Fix cupy.concatenate to support arrays with >= 2**31 elements (#1101)
Use streams when calling libraries (#1107)
Fix duplicate declaration of EigMode in cuSPARSE (#1108)
Fix duplicate declaration of cudaError_t (#1112)
Fix cupy.linalg.inv() breaking its argument (#1123, thanks @hyabe!)
Use cusolverSpSetStream for cuSolverSP library calls (#1152)
Do not use platform-specific CC (#1157)

Documents

Fix typos: (#1046, #1077)
Update documentation for chainer.backends.cuda (#1047)
Rewrite installation guide (#1064)
Remove invalid argument description in cupy.tensordot (#1069)
Fix document of for_unsigned_dtypes (#1076)
Fix wrong references of document (#1078)
Fix document of ndimage (#1131)
Enable flake8 in cupy/indexing/generate.py (#1141)
Fix document of r_ and c_ (#1142)
Fix document of MemoryHook (#1143)

Installation

Use --no-cache-dir in Dockerfile (#1060)
Avoid embedding CUDA_PATH to RPATH in wheels (#1065)

Examples

Avoid importing matplotlib to set its backend to Agg (#976)

Tests

Remove platform-dependent dtype (#1091)
Remove nose dependency (#1125)

cupy - v4.0.0rc1

Published by hvy over 6 years ago

This is the release candidate of v4. See here for the complete list of solved issues and merged PRs.

Announcements

We have started supporting CUDA 9.1! A new wheel package, cupy-cuda91, is also available from this release. You can install it with pip install cupy-cuda91.
The master branch has been switched to v5 development. The development of v4 will continue in the v4 branch.
The major release of v4 is planned for Apr. 17.

New Features

Implement cuDNN convolution interface (#715)
Support multi dimensional arrays in solve (#845)
Add cupyx.rsqrt (#846)
Add destroy method to NcclCommunicator (#975)
Implement __setitem__ in fusion function (#1002)

Bug Fixes

Fix overflow in indices when indexing (#758, thanks @yuyu2172!)
Fix matrix multiplication when matrices have duplicated entries (#834)
Fix multithread bug with CUDA driver API (#916)
Remove trailing NULL from values returned from NVRTC (#942)
Fix ndarray.diagonal to accept appropriate argument of axis2 (#978, thanks @ronekko!)
Fix eliminate_zeros (#998)
Remove cudnn STATUS dict (#1012)
Fix temporary variables which are used when input_num is given (#1020)
Fix cupy.copyto ignoring the where argument when src is scalar (#1028)

Installation

Fix to use rpath only when wheel libs are specified (#980)
Update to CUDA 8.0 and use CuPy wheels in Dockerfiles (#991)
Support CUDA 9.1 (#997)
Fix exception handling failure on Windows with Python 3.x (#1000)

Enhancements

Improve matrix inverse speed using LU decomposition (#695, #927, thanks @stevendbrown!)
Use CUDA version to decide whether to import cuSOLVER (#832)
Simplify fp16 code in carray.cuh (#870)
Fix potential error at the stride for loop over the j-axis in ReductionKernel (#874, thanks @grafi-tt!)
Hide Chunk class (#933)
Improve concatenate and other functions (#949)
Use nogil in FFT (#950)
Improve error message when import failed (#970)
Use current stream in array method (#981)
Change group argument name of create_convolution_descriptor (#988)
Use default stream in _scatter_op (#989)
Expose CUDNN_BN_MIN_EPSILON from cudnn.h (#1011)

Documents

Add upgrade guide for v2 & v4 (#884)
Add wheels to installation guide (#955)
Update array docstring (#982, thanks @juniorrojas!)
Prefer pip in documentation (#985)
Add Docker update information to upgrade guide (#993)
Remove unnecessary heading from reference (#996)
Add URL to the directory in the documentation (#999)
Fix spelling mistake of NumPy and CuPy (#1013)
Fix typo (#1015)
Fix scatter_add docs (#1025)
Add complex dtypes on the overview (#1026)
Document more on CuPy/NumPy difference (#1027)
Add SciPy license to document (#1037)
Fix broken link to numpy.sum (#1039)

Tests

Free huge memory in slow test; Fix sum test to avoid contiguousness difference between CuPy and NumPy (#971)
Add AppVeyor configuration (#1001)
Fix shaped_random for complex number (#1017)
Skip cuDNN tests when cuDNN is unavailable (#1041)
Add Codecov.io configuration (#1003)

Others

Remove outdated TODO (#1014)

cupy - v2.5.0

Published by niboshi over 6 years ago

This is the release note of v2.5.0. See here for the complete list of solved issues and merged PRs.

Improvements

Improve error message when import failed (#1029)
Fix coding style for chained comparisons (#961)
Prefer double quoted docstring to single quoted docstring (#962)

Bug Fixes

Fix return type of linalg.norm when its input is complex (#869, thanks @kohr-h!)
Fix multithread bug with CUDA driver API (#972)
Fix overflow in indices when indexing (#984, thanks @yuyu2172!)
Fix ndarray.diagonal to accept an appropriate argument of axis2 (#992, thanks @ronekko!)
Remove trailing NULL from values returned from NVRTC (#1033)
Fix cupy.copyto ignoring the where argument when src is scalar (#1035)

Documentation

Add upgrade guide for v2 (#990)
Fix typo (#1016, thanks @juniorrojas!)
Use pip in documentation (#1031)
Document more on CuPy/NumPy differences (#1032)
Prefer NumPy dtype objects over character codes in documentation (#963)

Tests

Fix sum test to avoid contiguousness difference between CuPy and NumPy (#995)
Add Codecov.io configuration (#1004)

cupy - v4.0.0b4

Published by hvy over 6 years ago

This is the release note of v4.0.0b4. See here for the complete list of solved issues and merged PRs.

Important Changes

Starting from this release, CuPy supports wheel packages. The package name depends on the CUDA version. Installation is much faster with wheel packages than with the conventional installation from source.

To install CuPy from a wheel package, first uninstall any existing CuPy installation, then run the following command for your CUDA version.

$ pip install --pre cupy-cuda80
$ # or
$ pip install --pre cupy-cuda90

Note: the wheel packages include cuDNN and NCCL2 binaries. CuPy automatically uses these bundled binaries instead of any you have installed yourself.
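
Because the package name depends on the CUDA version, the choice can be scripted. The following is a minimal sketch, not part of CuPy: the helper names wheel_package_for and pip_command are hypothetical, and the table contains only the package names mentioned in these release notes (cupy-cuda91 was announced with v4.0.0rc1).

```python
# Hypothetical helpers (not part of CuPy) that pick the wheel package
# matching a CUDA toolkit version, using only names from these notes.
KNOWN_WHEELS = {
    "8.0": "cupy-cuda80",
    "9.0": "cupy-cuda90",
    "9.1": "cupy-cuda91",  # announced with v4.0.0rc1
}


def wheel_package_for(cuda_version):
    """Return the wheel package name for a version string such as "9.0"."""
    # Keep only "major.minor" so that "8.0.61" is treated as "8.0".
    major_minor = ".".join(cuda_version.strip().split(".")[:2])
    if major_minor not in KNOWN_WHEELS:
        raise ValueError(
            "no wheel listed for CUDA %s; install from source instead"
            % cuda_version
        )
    return KNOWN_WHEELS[major_minor]


def pip_command(cuda_version, pre_release=True):
    """Build the pip argument list; --pre is needed for beta releases."""
    command = ["pip", "install"]
    if pre_release:
        command.append("--pre")
    command.append(wheel_package_for(cuda_version))
    return command


if __name__ == "__main__":
    # Prints the command as a single line; it does not run pip.
    print(" ".join(pip_command("9.0")))
```

The sketch only builds the command and leaves running it to the user, since an existing CuPy installation has to be uninstalled first.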

New Features

Implement isclose (#825)
Introduce cupyx namespace (#894)
- cupy.scatter_add is marked as deprecated. Use cupyx.scatter_add instead.
Add logic type tests: iscomplex, iscomplexobj, isfortran, isreal, isrealobj (#904)
Import numpy complex dtypes into cupy namespace. (#907, thanks @ericmjl!)
Simplify free_all_blocks interface (#918)
Support CUDA stream on FFT (#926)
Add forward/backward support for cuDNN clipped ReLU and ELU (#938, thanks @tkerola!)

Improvements

Simplify __getitem__ and __setitem__ (#724)
Fix concatenate bug and improve performance (#747)
Improve cupy.partition performance (#763)
Support memory pool for FFT functions (#910)
Implement memory pool index compaction (#912)
Enable complex tensors for cupy.matmul and cupy.einsum (#921, thanks @hknerdgn!)
Move cupy.cufft to internal functions of cupy.fft.fft (#924)
Use cublas*gemmStridedBatched APIs in matmul (#930, thanks @anaruse!)
Use __dealloc__ in memory object (#934)
Embed build-time version of CUDA/cuDNN (#937)
Remove unnecessary code in cupy.fft (#951)
Prefer data type objects over character codes (#956)
Prefer double quoted docstring to single quoted docstring (#957)
Simplify chained comparisons (#958)
Remove warning in cupy.core.fusion (#960)

Bug Fixes

Fix concatenate bug and improve performance (#747)
Fix scalar broadcast (#789)
Test sparse with many types of sparse matrices (#833)
Fix array iteration index overflow (#882)
Fix get and set method with async mode (#900)
Fix bug about stream deletion (#917)
Fix test failure in cuDNN v5 which does not support CUDNN_ACTIVATION_ELU (#959)

Installation

Improve message during installation (#762)
Support wheel build (#899)
Exclude Cython files from sdist (#920)
Add --cupy-long-description option to inject long description for wheels (#953)