Advent of Numba
Solutions to Advent of Code using Numba. Some notes
on the solutions:
- Numba 0.52 is required to run the solutions.
- I may use features that only appear in the master branch in later days.
- I will try to use CUDA for each solution.
- Most solutions will be whatever is easiest for me to write.
- This might mean a lot of brute force, due to the capabilities of a GPU and
the low effort needed to invent brute force solutions.
- The solutions will probably not be close to performance-optimal.
- I will try to demonstrate something "interesting" about the CUDA target for
each solution. E.g.:
- Atomic operations
- Cooperative grids
- etc.
- I will try to annotate each solution to explain to a beginner the rationale
behind the implementation.
- I will probably fall a few days behind.
- I am not optimistic about finishing all 24 days.
Please direct comments / questions / criticisms / veneration to:
@gmarkall.
Solutions
Links to solutions and some interesting features of them:
- Day 1: 2D / 3D grids, atomic exchange for stores.
- Day 2: Atomic increment, structured arrays.
- Day 3: Building reduction kernels with @cuda.reduce, host-to-device
  transfers to elide unnecessary copying.
- Day 4: I didn't finish doing this on the GPU.
- Day 5: Cooperative Groups (grid group / grid sync) and device functions.
- Day 6: Python solution only so far. Will need to re-visit to complete a CUDA
  implementation - should be doable, but I'm under time constraints.
- Day 7: Python solution only so far. Will probably not do a CUDA
  implementation of this one, as it doesn't easily map to a GPU.
- Day 8: Sharing the core computation of an implementation on both the CPU and
  GPU targets by calling an @njit function from a @cuda.jit function.
- Day 9: Demonstrates some changes and workarounds needed when porting pure
  Python code to the CUDA target - e.g. involving lists, array slicing, and
  array functions.
Other approaches
Notes
I'm using this section to collect thoughts I have whilst working on solutions
about improving the usability and accessibility of Numba and the CUDA target.
Nice-to-haves:
- Ability to call atomic inc without specifying a maximum, e.g.
cuda.atomic.inc(arr, idx) (day 2).
- The ability to return things from kernels (every day).
- Kernel launches are asynchronous, so this could return a future.
- Alternatively, allow an optional blocking launch to directly return the
result.
- A library of small sort functions (day 4).
- E.g. a function for a block to cooperatively sort a small array,
- A whole-grid sort for larger arrays,
- etc.
- Better string op support (day 2).
- E.g. allow passing strings or arrays of bytes to kernels.
- Lots of lowering for string operations is missing in CUDA (but probably
present in nopython mode).
- Support for a better print, for "printf debugging" (all days).
- There is a printf-like function somewhere (in libdevice?) that can format
strings, which could be used.