Advent of Numba

Solutions to Advent of Code using Numba. Some notes on the solutions:

  • Numba 0.52 is required to run the solutions.
    • I may use features that only appear in the master branch in later days.
  • I will try to use CUDA for each solution.
  • Most solutions will be whatever is easiest for me to write.
    • This might mean a lot of brute force, given the capabilities of a GPU and
      the low effort needed to invent brute-force solutions.
    • The solutions will probably not be close to performance-optimal.
  • I will try to demonstrate something "interesting" about the CUDA target for
    each solution (a small atomics sketch follows this list), e.g.:
    • Atomic operations
    • Cooperative grids
    • etc.
  • I will try to annotate each solution to explain to a beginner the rationale
    behind the implementation.
  • I will probably fall a few days behind.
  • I am not optimistic about finishing all 24 days.
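
For a flavour of what those "interesting" features look like in a kernel, here is a
minimal, standalone sketch (not one of the actual solutions) that uses an atomic add
so that all threads can safely accumulate into a single output element:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def atomic_sum(values, result):
        i = cuda.grid(1)
        if i < values.size:
            # Atomic add avoids a race when every thread updates the same element.
            cuda.atomic.add(result, 0, values[i])

    values = np.arange(1000, dtype=np.float32)
    result = np.zeros(1, dtype=np.float32)
    atomic_sum[4, 256](values, result)  # 4 blocks of 256 threads covers 1000 elements
    print(result[0])                    # 499500.0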

Please direct comments / questions / criticisms / veneration to: @gmarkall.

Solutions

Links to solutions and some interesting features of them:

  • Day 1: 2D / 3D grids, atomic exchange for stores.
  • Day 2: Atomic increment, structured arrays.
  • Day 3: Building reduction kernels with @cuda.reduce, and
    host-to-device transfers to elide unnecessary copying (a minimal
    @cuda.reduce sketch appears after this list).
  • Day 4: I didn't finish doing this on the GPU.
  • Day 5: Cooperative Groups (grid group / grid sync) and
    device functions.
  • Day 6: Python solution only so far. Will need to
    revisit it to complete a CUDA implementation - should be doable, but I'm
    under time constraints.
  • Day 7: Python solution only so far. Will probably not
    do a CUDA implementation of this one as it doesn't easily map to a GPU.
  • Day 8: Sharing the core computation of an implementation
    on both the CPU and GPU targets by calling an @njit function from a
    @cuda.jit function (see the sketch after this list).
  • Day 9: Demonstrates some changes and workarounds needed when
    porting pure Python code to the CUDA target, e.g. involving lists, array
    slicing, and array functions.
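
To illustrate the @cuda.reduce mechanism mentioned for Day 3, here is a minimal
sketch (not the Day 3 code itself): the decorator turns a binary operation into a
full GPU array reduction.

    import numpy as np
    from numba import cuda

    @cuda.reduce
    def sum_reduce(a, b):
        # The binary operation; cuda.reduce builds the reduction around it.
        return a + b

    values = np.arange(1, 101, dtype=np.int64)
    print(sum_reduce(values))  # 5050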
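
To illustrate the Day 8 idea of sharing core logic between targets, here is a
minimal sketch (the real solution's computation differs): an @njit function can be
called both from host code and from inside a @cuda.jit kernel.

    import numpy as np
    from numba import cuda, njit

    @njit
    def core(x):
        # Shared computation: compiled for the CPU by @njit, and also compiled
        # as a device function when called from the kernel below.
        return x * x + 1

    @cuda.jit
    def apply_core(arr, out):
        i = cuda.grid(1)
        if i < arr.size:
            out[i] = core(arr[i])

    arr = np.arange(8, dtype=np.int64)
    out = np.zeros_like(arr)
    apply_core[1, 8](arr, out)
    print(out)       # result computed on the GPU
    print(core(4))   # the same function called on the CPU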

Other approaches

Notes

I'm using this section to collect thoughts I have, whilst working on the solutions, about improving the usability and accessibility of Numba and the CUDA target.

Nice-to-haves:

  • Ability to call atomic.inc without specifying a maximum, e.g.
    cuda.atomic.inc(arr, idx) (day 2). A sketch of the current workaround
    appears after this list.
  • The ability to return things from kernels (every day).
    • Kernel launches are asynchronous, so this could return a future.
    • Alternatively, allow an optional blocking launch to directly return the
      result.
  • A library of small sort functions (day 4).
    • E.g. a function for a block to cooperatively sort a small array,
    • A whole-grid sort for larger arrays,
    • etc.
  • Better string op support (day 2).
    • E.g. allow passing strings or arrays of bytes to kernels.
    • Lots of the lowering for string operations is missing in CUDA (but it is
      probably present for nopython mode).
  • Support for a better print, for "printf debugging" (all days). A sketch of
    what print() in a kernel can already do appears after this list.
    • There is a printf-like function somewhere (in libdevice?) that can format
      strings, which could be used.
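
As context for the atomic.inc item: a minimal sketch of what the call looks like
today. The third argument (the wrap-around value) is required, so passing the
largest value of the counter's dtype effectively means "never wrap".

    import numpy as np
    from numba import cuda

    WRAP = np.uint32(0xFFFFFFFF)  # largest uint32, so the counter never wraps

    @cuda.jit
    def count_matches(chars, target, counter):
        i = cuda.grid(1)
        if i < chars.size and chars[i] == target:
            # atomic.inc currently requires the wrap-around value as its third
            # argument; the wish above is to be able to omit it.
            cuda.atomic.inc(counter, 0, WRAP)

    chars = np.frombuffer(b"abcabcaaa", dtype=np.uint8).copy()
    counter = np.zeros(1, dtype=np.uint32)
    count_matches[1, 32](chars, ord("a"), counter)
    print(counter[0])  # 5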
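
For the "printf debugging" item: the CUDA target does already support a basic
print() inside kernels (constant strings and scalar arguments, with limited
formatting); the wish is for something richer. A minimal sketch of what works today:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def debug_kernel(arr):
        i = cuda.grid(1)
        if i < arr.size:
            # Basic print() works in kernels, but formatting is limited compared
            # to a full printf.
            print("thread", i, "value", arr[i])

    debug_kernel[1, 4](np.arange(4, dtype=np.int32))
    cuda.synchronize()  # ensure device-side output is flushed before exit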