AMDGPU.jl

AMD GPU (ROCm) programming in Julia

OTHER License

AMDGPU.jl - v0.4.15

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.15

Diff since v0.4.14

Merged pull requests:

  • Declare compatibility with LLD_jll 15 (#435) (@giordano)

AMDGPU.jl - v0.4.14

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.14

Diff since v0.4.13

Closed issues:

  • Switching to device ≠ 1 hangs on multi-GPU node (#425)
  • @ROCDynamicLocalArray: add support for dynamic eltype and expressions for dims (#428)

Merged pull requests:

  • Fix host synchronization (#417) (@pxl-th)
  • Add device selection in current task by ID (#420) (@luraess)
  • Declare compatibility with LLVM_jll 15 (#426) (@giordano)
  • Remove buggy uses of default_device (#427) (@jpsamaroo)
  • at-ROC*LocalArray: Escape arguments (#430) (@jpsamaroo)

AMDGPU.jl - v0.4.13

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.13

Diff since v0.4.12

Merged pull requests:

  • Use TLS library state for MIOpen (#415) (@pxl-th)

AMDGPU.jl - v0.4.12

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.12

Diff since v0.4.11

Merged pull requests:

  • Fix huge MIOpen workspace allocations (#413) (@pxl-th)
  • Use accurate softmax (#414) (@pxl-th)

AMDGPU.jl - v0.4.11

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.11

Diff since v0.4.10

Merged pull requests:

  • Fix typo in LinkedList (#412) (@pxl-th)

AMDGPU.jl - v0.4.10

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.10

Diff since v0.4.9

Merged pull requests:

  • Share syncstate with reshaped/reinterpreted arrays (#411) (@pxl-th)

AMDGPU.jl - v0.4.9

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.9

Diff since v0.4.8

Closed issues:

  • State of queues and streams (#337)
  • rocBLAS: Remove old hand-wrapped code (#384)
  • HSA memory fault upon switching from default device on multi-GPU node (#385)
  • Tests fail locally with AssertionError: AMDGPU.Runtime.LOGGING_STATIC_ENABLED (#399)

Merged pull requests:

  • Switch to task-focused synchronization model (#374) (@jpsamaroo)
  • Use broadcast instead of copies to initialize mapreduce buffers. (#390) (@maleadt)
  • tests: Skip logging tests if disabled (#391) (@jpsamaroo)
  • Add blas wrappers for triangular matrix mul / div (#392) (@pxl-th)
  • Simplify signal pooling (#393) (@pxl-th)
  • Adapt to GPUCompiler 0.18 (#394) (@pxl-th)
  • Reduce memory usage (#395) (@pxl-th)
  • Add support for KernelAbstractions 0.9 (#398) (@vchuravy)
  • Update to GPUCompiler 0.19 & LLVM 5 (#407) (@pxl-th)
  • Fix compiler timespan logging (#408) (@pxl-th)
  • rocBLAS: define high-level dot, gemm, axpy functions for FP16 (#409) (@pxl-th)
  • Add KernelAbstractions.jl unsafe_free! (#410) (@pxl-th)

AMDGPU.jl - v0.4.8

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.8

Diff since v0.4.7

Merged pull requests:

  • ROCSignal: Pool signals in ctor (#369) (@jpsamaroo)
  • Reduce allocations (#376) (@pxl-th)
  • Report and exit on memory fault (#379) (@jpsamaroo)
  • versioninfo: Indicate if using JLLs or System (#381) (@jpsamaroo)
  • ROCSignal: Disable IPC by default (#383) (@jpsamaroo)

AMDGPU.jl - v0.4.7

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.7

Diff since v0.4.6

Closed issues:

  • Segfault on ROCm 5.0 (#239)

Merged pull requests:

  • Add more MIOpen functionality (#370) (@pxl-th)
  • Store HSA.Signal instead of Ref{HSA.Signal} in ROCSignal (#375) (@pxl-th)

AMDGPU.jl - v0.4.6

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.6

Diff since v0.4.5

Closed issues:

  • Implement occupancy API (#271)
  • getinfo should determine the Ref output container automatically (#273)

Merged pull requests:

  • Add timespan logging via TimespanLogging.jl (#263) (@jpsamaroo)
  • Add occupancy API and groupsize tuning (#326) (@jpsamaroo)
  • Reduce signal wait allocations (#361) (@jpsamaroo)
  • Add more intrinsics, enable always_inline (#362) (@jpsamaroo)
  • Simplify math intrinsics (#363) (@pxl-th)
  • Implement unified getinfo interface (#364) (@jpsamaroo)
  • Assorted fixes (#365) (@jpsamaroo)
  • Add memory allocation limiters (#366) (@jpsamaroo)
  • Specify return types for getinfo calls (#368) (@pxl-th)

AMDGPU.jl - v0.4.5

Published by github-actions[bot] over 1 year ago

AMDGPU v0.4.5

Diff since v0.4.4

Closed issues:

  • Mem.alloc: Allow using hipMalloc to service allocations (#286)
  • rocBLAS GEMM ignores @view (#319)
  • sincospi intrinsic is broken (#334)
  • #jps/dev segfaults on MI250x (#340)
  • method ambiguity in rand! (#343)
  • Add function or macro for AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#347)
  • Free KernelState in finalizer (#352)
  • --check-bounds=no is broken on Julia 1.9.0-beta3 (#354)

Merged pull requests:

  • Fix GEMM (regular & batched) and support batched GEMM for 3D array (#318) (@pxl-th)
  • Add MIOpen (#320) (@pxl-th)
  • Add support for 2D * 3D batched GEMM (#321) (@pxl-th)
  • Support NNlib batched gemm format (#322) (@pxl-th)
  • Add pointer() method for ROCArray and some library tests (#323) (@torrance)
  • Fix double unsafe_free calls (#324) (@jpsamaroo)
  • Mem: Allow using hipMalloc/hipFree for allocations (#325) (@jpsamaroo)
  • Cast to Ptr before checking NULL pointer (#328) (@torrance)
  • Resize! support (#333) (@matinraayai)
  • Add sincos/sincospi/frexp/ldexp intrinsics (#336) (@jpsamaroo)
  • Add local memory allocation helpers (#348) (@jpsamaroo)
  • Add GPUCompiler 0.17 to compat (#349) (@jpsamaroo)
  • Preserve UInt32 in indexing intrinsics (#351) (@pxl-th)
  • Fix unsafe_free! not actually freeing (#353) (@jpsamaroo)
  • Don't sync on default HIP stream every time (#356) (@pxl-th)
  • Make alignment generated (#358) (@pxl-th)
  • tests: Properly unwrap Distributed exceptions (#359) (@jpsamaroo)

AMDGPU.jl - v0.4.4

Published by github-actions[bot] almost 2 years ago

AMDGPU v0.4.4

Diff since v0.4.3

Closed issues:

  • Repetitive AMDGPU.ones calls crash runtime (#299)
  • Add AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#304)
  • Segfault with basic kernel from AMDGPU.jl doc on LUMI (#308)
  • ROC kernel faulting upon having AMDGPU and CUDA loaded (#312)
  • AMDGPU.rand failing to create a ROCArray (#315)

Merged pull requests:

  • Remove waiter and error monitor threads (#306) (@pxl-th)
  • Update bindeps search path (#307) (@luraess)
  • Prioritise ENV var to use or not artifacts (#310) (@luraess)
  • Add dynamic local memory support (#311) (@jpsamaroo)
  • random: Load definitions without rocRAND (#316) (@jpsamaroo)

AMDGPU.jl - v0.4.3

Published by github-actions[bot] about 2 years ago

AMDGPU v0.4.3

Diff since v0.4.2

Closed issues:

  • Queue selection test fails (#274)

Merged pull requests:

  • Add device quirks from CUDA.jl, enhance at-rocprintf (#269) (@jpsamaroo)
  • Use an optimized norm function for ROCBLASArray (#282) (@amontoison)
  • Add rocBLAS_jll and rocSPARSE_jll deps (#284) (@jpsamaroo)
  • active_kernels: Use WeakKeyDict (#285) (@jpsamaroo)
  • CI: Add gfx90a to more jobs (#289) (@jpsamaroo)
  • build: Remove build step, run at toplevel (#290) (@jpsamaroo)
  • Mapreducedim support for AnyROCArray (#291) (@matinraayai)
  • Parallelize tests (#293) (@jpsamaroo)
  • Fix precompilation (#294) (@pxl-th)
  • Do not rethrow EOF (#296) (@pxl-th)
  • Use correct queue for kernels (#297) (@pxl-th)
  • Implement kernel hashing system (#302) (@jpsamaroo)

AMDGPU.jl - v0.4.2

Published by github-actions[bot] about 2 years ago

AMDGPU v0.4.2

Diff since v0.4.1

Closed issues:

  • build failure on Julia 1.8.1 (#278)

Merged pull requests:

  • Mem: Retry failing allocations (#251) (@jpsamaroo)
  • Add device-to-device unsafe_copy3d test (#260) (@luraess)
  • Fix allocation retry mechanism, add slow allocation fallback (#262) (@jpsamaroo)
  • Run wavefront tests with detected wavefrontsize (#264) (@torrance)
  • During HostCall, ensure device has finished using buffers before freeing (#266) (@torrance)
  • Expand fft tests (#267) (@torrance)
  • Remove code that duplicates AbstractFFTs; add tests for casting (#268) (@torrance)
  • Don't embed the method table in the AST (#276) (@jpsamaroo)
  • deps: Don't access is_available unless using succeeds (#279) (@jpsamaroo)
  • device: Add ROCDevice() ctor (#280) (@jpsamaroo)

AMDGPU.jl - v0.4.1

Published by github-actions[bot] about 2 years ago

AMDGPU v0.4.1

Diff since v0.4.0

Closed issues:

  • Add option to disable automatic mark/wait of specific arrays (#126)
  • Limit multi-dimensional groupsize properly (#150)
  • Optimize kernarg allocations in kernel construction (#247)
  • Add priority kwarg to ROCQueue ctor (#256)

Merged pull requests:

  • Add BackToCPU struct to reduce 'view' allocations (#246) (@pxl-th)
  • LB GPUCompiler to 0.16.2 (#248) (@jpsamaroo)
  • Optimize kernel setup and launch (#249) (@jpsamaroo)
  • launch: Fix groupsize dimension check (#250) (@jpsamaroo)
  • device: Add device_id method (#253) (@jpsamaroo)
  • Re-export indexing intrinsics (#254) (@jpsamaroo)
  • CI: Switch GHA to 1.7 release (#257) (@jpsamaroo)
  • queue: Allow setting priority from ctor (#258) (@jpsamaroo)
  • math: Make signbit return Bool (#259) (@jpsamaroo)

AMDGPU.jl - v0.4.0

Published by github-actions[bot] about 2 years ago

AMDGPU v0.4.0

Diff since v0.3.7

Closed issues:

  • Use atomics/locking in hostcall (#4)
  • Bump Setfield compat (#242)

Merged pull requests:

  • Remove launch export (#232) (@matinraayai)
  • Remove indirection layer, use modules (#240) (@jpsamaroo)
  • Update Setfield compat (#243) (@luraess)

AMDGPU.jl - v0.3.7

Published by github-actions[bot] over 2 years ago

AMDGPU v0.3.7

Diff since v0.3.6

Closed issues:

  • Move standard warnings of using AMDGPU on system without AMD GPU to @debug? (#199)

Merged pull requests:

  • build/init: Skip if GPUs are not available (#231) (@jpsamaroo)

AMDGPU.jl - v0.3.6

Published by github-actions[bot] over 2 years ago

AMDGPU v0.3.6

Diff since v0.3.5

Merged pull requests:

  • chore(ci): add informational Codecov status checks (#227) (@thomasrockhu-codecov)
  • Fix incorrect agent target for allocations (#228) (@jpsamaroo)
  • Properly skip unsupported OS/arch configs (#229) (@jpsamaroo)

AMDGPU.jl - v0.3.5

Published by github-actions[bot] over 2 years ago

AMDGPU v0.3.5

Diff since v0.3.4

Closed issues:

  • Restrict agent to :gpu in HSA agent test (#223)

Merged pull requests:

  • Add async memcpy functionality (#220) (@luraess)
  • HSA: Don't always load hsa_rocr_jll (#221) (@jpsamaroo)
  • Support GPUCompiler 14/LLVM 13/Julia master (#222) (@jpsamaroo)
  • Restrict multi-GPU test to GPU agent only (#224) (@luraess)

AMDGPU.jl - v0.3.4

Published by github-actions[bot] over 2 years ago

AMDGPU v0.3.4

Diff since v0.3.3

Closed issues:

  • Add unsafe_wrap for AMDGPU (#203)
  • mapreducedim! failing in multi-AMDGPU configuration (#210)

Merged pull requests:

  • Refactor and update HSA wrappers, show agent UUID (#201) (@jpsamaroo)
  • HSAAgent: Add device()/device!() API (#202) (@jpsamaroo)
  • docs: Document device queries (#205) (@jpsamaroo)
  • Revert incorrect ones/zeros change (#206) (@jpsamaroo)
  • at-roc: Support custom completion signals (#211) (@jpsamaroo)
  • Mem: Fix pointerinfo (#212) (@jpsamaroo)
  • ROCArray: Add unsafe_wrap (#213) (@jpsamaroo)
  • Mem: Fix DL/UL/TX for wrapped ptr (#215) (@jpsamaroo)
  • HSA: Add memory pools API (#217) (@jpsamaroo)