futhark | Cuda Ecosystem Directory

futhark - 0.20.7

Published by github-actions[bot] almost 3 years ago

Added

Better exploitation of parallelism in fused nested segmented
reductions.
Prelude function not for negating booleans.

Fixed

Some incorrect removal of copies (#1505).
Handling of parametric modules with top-level existentials (#1510).
Module substitution fixes (#1512, #1518).
Invalid in-place lowering (#1523).
Incorrect code generation for some intra-group parallel code versions.
Flattening crash in the presence of irregular parallelism (#1525).
Incorrect substitution of type abbreviations with hidden sizes (#1531).
Proper handling of NaN in min/max functions for
f16/f32/f64 in interpreter (#1528).
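
The NaN entry above concerns how min/max treat a NaN operand. As an illustrative sketch in Python (not Futhark code), the following contrasts a comparison-based maximum, whose result depends on argument order, with one that returns the non-NaN operand, as C's fmax does. That the interpreter fix targets fmax-like semantics is an assumption; the changelog entry does not spell out the chosen behaviour. Both helper names are hypothetical.

```python
import math

def naive_max(x, y):
    # Comparison-based maximum: every comparison with NaN is False,
    # so the result depends on which argument happens to be NaN.
    return x if x > y else y

def fmax_like(x, y):
    # Assumed convention (as in C's fmax): prefer the non-NaN operand.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y

nan = float("nan")
order_dependent = (naive_max(nan, 1.0), naive_max(1.0, nan))
order_independent = (fmax_like(nan, 1.0), fmax_like(1.0, nan))
```

With the naive version, swapping the arguments changes the answer, which is the kind of inconsistency a dedicated NaN rule removes.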

futhark - 0.20.6

Published by github-actions[bot] almost 3 years ago

Added

Much better code generation for segmented scans with vectorisable
operators.

Fixed

Fixes to extremely exotic GPU scans involving array operators.
Missing alias tracking led to invalid rewrites, causing a compiler
crash (#1499).
Top-level bindings with existential sizes were mishandled (#1500, #1501).
A variety of memory leaks in the multicore backend, mostly (or
perhaps exclusively) centered around context freeing or failing
programs - this should not have affected many people.
Various fixes to f16 handling in the GPU backends.

futhark - 0.20.5

Published by github-actions[bot] about 3 years ago

Added

Existential sizes can now be explicitly quantified in type
expressions (#1308).
Significantly expanded error index.
Attributes can now be numeric.
Patterns can now have attributes. None have any effect at the
moment.
futhark autotune and futhark bench now take a --spec-file
option for loading a test specification from another file.

Fixed

auto output reference datasets are now recreated when the program
is newer than the data files.
Exotic hoisting bug (#1490).

futhark - 0.20.4

Published by github-actions[bot] about 3 years ago

Added

Tuning parameters now (officially) exposed in the C API.
futhark autotune is now 2-3x faster on many programs, as it now
keeps the process running.
Negative numeric literals are now allowed in case patterns.

Fixed

futhark_context_config_set_profiling was missing for the c backend.
Correct handling of nested entry points (#1478).
Incorrect type information recorded when doing in-place lowering (#1481).

futhark - 0.20.3

Published by github-actions[bot] about 3 years ago

Added

Executables produced by C backends now take a --no-print-result option.
The C backends now generate a manifest when compiling with
--library. This can be used by FFI generators (#1465).
The beginnings of a Rust-style error index.
scan on newer CUDA devices is now much faster.

Fixed

Unique opaque types are named properly in entry points.
The CUDA backend in library mode no longer exit()s the process if
NVRTC initialisation fails.

futhark - 0.20.2

Published by github-actions[bot] about 3 years ago

Fixed

Simplification bug (#1455).
In-place-lowering bug (#1457).
Another in-place-lowering bug (#1460).
Don't try to tile inside loops with parameters with variant sizes (#1462).
Don't consider it an ICE when the user passes invalid command line
options (#1464).

futhark - 0.20.1

Published by github-actions[bot] about 3 years ago

Added

The #[trace] and #[break] attributes now replace the trace
and break functions (although they are still present in
slightly-reduced but compatible form).
The #[opaque] attribute replaces the opaque function, which is
now deprecated.
Tracing now works in compiled code, albeit with several caveats
(mainly, it does not work for code running on the GPU).
New wasm and wasm-multicore backends by Philip Lassen. Still
very experimental; do not expect API stability.
New intrinsic type f16, along with a prelude module f16.
Implemented with hardware support where it is available, and with
f32-based emulation where it is not.
Sometimes slightly more informative error message when input of
the wrong type is passed to a test program.

Changed

The ! function in the integer modules is now called not.
! is now builtin syntax. You can no longer define a function
called !. It is extremely unlikely this affects you. This
removes the last special-casing of prefix operators.
A prefix operator section (i.e. (!)) is no longer permitted
(and it never was according to the grammar).
The offset parameter for the "raw" array creation functions in the
C API is now int64_t instead of int.

Fixed

i64.abs was wrong for arguments that did not fit in an i32.
Some f32 operations (**, abs, max) would be done in double
precision on the CUDA backend.
Yet another defunctorisation bug (#1397).
The clz function would sometimes exhibit undefined behaviour in
CPU code (#1415).
The operator priority of prefix - (negation) was wrong; it is now the same
as that of ! (#1419).
futhark hash is now invariant to source location as well as
stable across OS/compiler/library versions.
futhark literate is now much better at avoiding unnecessary
recalculation.
Fixed a hole in size type checking that would usually lead to
compiler crashes (#1435).
Underscores now allowed in numeric literals in test data (#1440).
The cuda backend did not use single-pass segmented scans as
intended. Now it does.
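
The i64.abs entry above does not say what caused the bug. One common source of this class of error is routing a 64-bit value through a 32-bit operation, and the Python sketch below models that hypothetical failure mode. The helpers wrap_i32, abs_via_i32 and abs_i64 are illustrative names, not part of Futhark.

```python
def wrap_i32(x):
    # Keep only the low 32 bits and reinterpret them as a signed integer,
    # mimicking an implicit conversion to a 32-bit int.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def abs_via_i32(x):
    # Hypothetical buggy variant: the argument is truncated before abs.
    return abs(wrap_i32(x))

def abs_i64(x):
    # Intended behaviour for values in the 64-bit range (ignoring the
    # minimum value, whose absolute value does not fit in an i64).
    return abs(x)

small = -42
large = -5_000_000_000  # does not fit in an i32
```

For small the two functions agree; for large the truncating variant returns an unrelated value, which matches the symptom the entry describes.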

futhark - 0.19.7

Published by github-actions[bot] over 3 years ago

Added

A new memory reuse optimisation has been added. This results in
slightly lower footprint for many programs.
The cuda backend now uses a fast single-pass implementation for
segmented scans, due to Morten Tychsen Clausen (#1375).
futhark bench now prints interim results while it is running.

Fixed

futhark test now provides a better error message when asked to
test an undefined entry point (#1367).
futhark pkg now detects some nonsensical package paths (#1364).
FutharkScript now parses f x y as applying f to x and y,
rather than as f (x y).
Some internal array utility functions would not be generated if
entry points exposed both unit arrays and boolean arrays (#1374).
Nested reductions used (much) more memory for intermediate results
than strictly needed.
Size propagation bug in defunctionalisation (#1384).
In the C FFI, array types used only internally to implement opaque
types are no longer exposed (#1387).
futhark bench now copes with test programs that consume their
input (#1386). This required an extension of the server protocol
as well.
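
The FutharkScript parsing fix above is about the associativity of function application. A toy Python model (not the actual FutharkScript parser, and assuming the input is already split into atomic tokens) shows the two readings of f x y as nested tuples. Both function names are hypothetical.

```python
from functools import reduce

def parse_left_assoc(tokens):
    # Current reading: f x y  ==>  ((f x) y), i.e. f applied to x and y.
    head, *args = tokens
    return reduce(lambda fn, arg: ("apply", fn, arg), args, head)

def parse_right_nested(tokens):
    # Previous reading: f x y  ==>  f (x y).
    *init, last = tokens
    return reduce(lambda acc, tok: ("apply", tok, acc), reversed(init), last)

tokens = ["f", "x", "y"]
current = parse_left_assoc(tokens)
previous = parse_right_nested(tokens)
```

The left-associative tree is the one that matches how application works in Futhark proper.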

futhark - 0.19.6

Published by github-actions[bot] over 3 years ago

Added

f32.hypot and f64.hypot are now much more numerically exact in
the interpreter.
Generated code now contains a header with information about the
version of Futhark used (and maybe more information in the
future).
Testing/benchmarking with large input data (including randomly
generated data) is much faster, as each file is now only read
once.
Test programs may now use arbitrary FutharkScript expressions to
produce test input, in particular expressions that produce opaque
values. This affects testing, benchmarking, and autotuning.
Compilation is about 10% faster, especially for large programs.
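
The changelog does not describe how the interpreter's hypot was improved. As background on why the textbook formula is inexact, this Python sketch compares a naive sqrt(x*x + y*y) with the standard library's math.hypot at extreme magnitudes. naive_hypot is a hypothetical helper written for this comparison.

```python
import math

def naive_hypot(x, y):
    # Squaring can overflow to infinity or underflow to zero long before
    # the true result leaves the representable range.
    return math.sqrt(x * x + y * y)

big = 1e200
tiny = 1e-200

naive_big = naive_hypot(big, big)      # intermediate 1e400 overflows
naive_tiny = naive_hypot(tiny, tiny)   # intermediate 1e-400 underflows
robust_big = math.hypot(big, big)
robust_tiny = math.hypot(tiny, tiny)
```

A careful hypot keeps the intermediate values in range, so the result stays finite and nonzero whenever the true answer is.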

Fixed

futhark repl had trouble with declarations that produced unknown
sizes (#1347).
Entry points can now have the same name as (undocumented!) compiler intrinsics.
FutharkScript now detects too many arguments passed to functions.
Sequentialisation bug (#1350).
Missing causality check for index sections.
futhark test now reports mismatches using proper indexes (#1356).
Missing alias checking in fusion could lead to compiler crash (#1358).
The absolute value of NaN is no longer infinity in the interpreter (#1359).
Proper detection of zero strides in compiler (#1360).
Invalid memory accesses related to internal bookkeeping of bounds checking.

futhark - 0.19.5

Published by github-actions[bot] over 3 years ago

Added

Initial work on granting programmers more control over existential
sizes, starting with making type abbreviations function as
existential quantifiers (#1301).
FutharkScript now also supports arrays and scientific notation.
Added f32.epsilon and f64.epsilon for the difference between
1.0 and the next larger representable number.
Added f32.hypot and f64.hypot for your hypotenuse needs (#1344).

Local size bindings in let expressions, e.g.:

let [n] (xs': [n]i32) = filter (>0) xs
in ...
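
The epsilon entry above defines it as the difference between 1.0 and the next larger representable number. That definition can be checked outside Futhark. The Python sketch below (assuming Python 3.9 or later for math.nextafter) computes it for 64-bit floats directly and for 32-bit floats by stepping the bit pattern. next_f32_after is a hypothetical helper that is only valid for positive finite inputs.

```python
import math
import struct

def next_f32_after(x):
    # Reinterpret x as a 32-bit float, add one to its bit pattern, and
    # convert back. For positive finite x this yields the next larger f32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

# Difference between 1.0 and the next larger representable number.
f64_epsilon = math.nextafter(1.0, math.inf) - 1.0
f32_epsilon = next_f32_after(1.0) - 1.0
```

Under IEEE 754 these come out as 2^-52 and 2^-23 respectively, which is what one would expect f64.epsilon and f32.epsilon to denote.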

Fixed

futhark_context_report() now internally calls
futhark_context_sync() before collecting profiling information
(if applicable).
futhark literate: Parse errors for expression directives now
detected properly.
futhark autotune now works with the cuda backend (#1312).
Devious fusion bug (#1322) causing compiler crashes.
Memory expansion bug for certain complex GPU kernels (#1328).
Complex expressions in index sections (#1332).
Handling of sizes in abstract types in the interpreter (#1333).
Type checking of explicit size requirements in loop parameter (#1324).
Various alias checking bugs (#1300, #1340).

futhark - 0.19.4

Published by github-actions[bot] over 3 years ago

Fixed

Some uniqueness ignorance in fusion (#1291).
An invalid transformation could in rare cases cause race
conditions (#1292).
Generated Python and C code should now be warning-free.
Missing check for uses of size-lifted types (#1294).
Error in simplification of concatenations could cause compiler
crashes (#1296).

futhark - 0.19.3

Published by github-actions[bot] over 3 years ago

Added

Better futhark test/futhark bench errors when test data does
not have the expected type.

Fixed

Mismatch between how thresholds were printed and what the
autotuner was looking for (#1269).
zip now produces unique arrays (#1271).
futhark literate no longer chokes on lines beginning with --
without a following whitespace.
futhark literate: :loadimg was broken due to overzealous
type checking (#1276).
futhark literate: :loadimg now handles relative paths properly.
futhark hash no longer considers the built-in prelude.
Server executables had broken store/restore commands for opaque types.

futhark - 0.19.2

Published by github-actions[bot] over 3 years ago

Added

New subcommand: futhark hash.
futhark literate is now smart about when to regenerate image and
animation files.
futhark literate now produces better error messages when passing
expressions of the wrong type to directives.

Fixed

Type-checking of higher-order functions that take consuming
functional arguments.
Missing cases in causality checking (#1263).
f32.sgn was mistakenly defined with double precision arithmetic.
Only include double-precision atomics if actually needed by
program (this avoids problems on devices that only support single
precision).
A lambda lifting bug due to not handling existential sizes
produced by loops correctly (#1267).
Incorrect uniqueness attributes inserted by lambda lifting
(#1268).
FutharkScript record expressions were a bit too sensitive to
whitespace.

futhark - 0.19.1

Published by github-actions[bot] over 3 years ago

Added

futhark literate now supports a $loadimg builtin function for
passing images to Futhark programs.
The futhark literate directive for generating videos is now
:video.
Support for 64-bit atomics on CUDA and OpenCL for higher
performance with reduce_by_index in particular.
Double-precision float atomics are used on CUDA.
New functions: f32.recip and f64.recip for multiplicative inverses.
Executables produced with the c and multicore backends now
also accept --tuning and --size options (although there are
not yet any tunable sizes).
New functions: scatter_2d and scatter_3d for scattering to
multi-dimensional arrays (#1258).
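
To make the scatter_2d entry above concrete, here is a sequential Python model of what scattering into a two-dimensional array means: each value is written at its (row, column) index pair, and out-of-bounds pairs are skipped. This is a sketch of the semantics as understood here, not the Futhark implementation. It copies the destination rather than updating it in place, and with duplicate indices it simply lets the last write win.

```python
def scatter_2d(dest, indices, values):
    # Work on a copy so the caller's array is left untouched.
    out = [row[:] for row in dest]
    height = len(out)
    width = len(out[0]) if height else 0
    for (i, j), v in zip(indices, values):
        # Writes outside the array bounds are ignored.
        if 0 <= i < height and 0 <= j < width:
            out[i][j] = v
    return out

grid = [[0, 0, 0],
        [0, 0, 0]]
result = scatter_2d(grid, [(0, 1), (1, 2), (5, 5)], [7, 8, 9])
```

Here the pair (5, 5) lies outside the 2x3 grid, so the value 9 is dropped while 7 and 8 land at their positions.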

Removed

The math modules no longer define the name negate (use neg
instead).

Fixed

Exotic core language alias tracking bug (#1239).
Issue with entry points returning constant arrays (#1240).
Overzealous CSE collided with uniqueness types (#1241).
Defunctionalisation issue (#1242).
Tiling inside multiply nested loops (#1243).
Substitution bug in interpreter (#1250).
f32.sgn/f64.sgn now correct for NaN arguments.
CPU backends (c/multicore) are now more careful about staying
in single precision for f32 functions (#1253).
futhark test and futhark bench now detect program
initialisation errors in a saner way (#1246).
Partial application of operators with parameters used in a
size-dependent way now works (#1256).
An issue regarding abstract size-lifted sum types (#1260).

futhark - 0.18.6

Published by github-actions[bot] over 3 years ago

Added

The C API now exposes serialisation functions for opaque values.
The C API now lets you pick which stream (if any) is used for
logging prints (#1214).
New compilation mode: --server. For now used to support faster
benchmarking and testing tools, but can be used to build even
fancier things in the future (#1179).
Significantly faster reading/writing of large values. This mainly
means that validation of test and benchmark results is much faster
(close to an order of magnitude).
The experimental futhark literate command allows a vaguely
notebook-like programming experience.
All compilers now accept an --entry option for treating more
functions as entry points.
The negate function is now neg, but negate is kept around
for a short while for backwards compatibility.
Generated header-files are now declared extern "C" when
processed with a C++ compiler.
Parser errors in test blocks used by futhark bench and futhark test are now reported with much better error messages.

Fixed

Interaction between slice simplification and in-place updates
(#1222).
Problem with user-defined functions with the same name as intrinsics.
Names from transitive imports no longer leak into scope (#1231).
Pattern-matching unit values now works (#1232).

futhark - 0.18.5

Published by github-actions[bot] almost 4 years ago

Fixed

Fix tiling crash (#1203).
futhark run now does slightly more type-checking of its inputs
(#1208).
Sum type deduplication issue (#1209).
Missing parentheses when printing sum values in interpreter.

futhark - 0.18.4

Published by github-actions[bot] almost 4 years ago

Added

When compiling to binaries in the C-based backends, the compiler
now respects the CFLAGS and CC environment variables.
GPU backends: avoid some bounds-checks for parallel sections
inside intra-kernel loops.
The cuda backend now uses a much faster single-pass scan
implementation, although only for nonsegmented scans where the
operator operates on scalars.

Fixed

futhark dataset now correctly detects trailing commas in textual
input (#1189).
Fixed local memory capacity check for intra-group-parallel GPU kernels.
Fixed compiler bug on segmented rotates where the rotation amount
is variant to the nest (#1192).
futhark repl no longer crashes on type errors in given file (#1193).
Fixed a simplification error for certain arithmetic expressions
(#1194).
Fixed a small uniqueness-related bug in the compilation of
operator sections.
Sizes of opaque entry point arguments are now properly checked
(related to #1198).

futhark - 0.18.3

Published by github-actions[bot] almost 4 years ago

Fixed

Python backend now disables spurious NumPy overflow warnings for
both library and binary code (#1180).
Undid deadlocking over-synchronisation for freeing opaque objects.
futhark datacmp now handles bad input files better (#1181).

futhark - 0.18.2

Published by github-actions[bot] almost 4 years ago

Added

The GPU loop tiler can now handle loops where only a subset of the
input arrays are tiled. Matrix-vector multiplication is one
important program where this helps (#1145).
The number of threads used by the multicore backend is now
configurable (--num-threads and
futhark_context_config_set_num_threads()). (#1162)

Fixed

PyOpenCL backend would mistakenly still treat entry point
argument sizes as 32 bit.
Warnings are now reported even for programs with type errors.
Multicore backend now works properly for very large iteration
spaces.
A few internal generated functions (init_constants(),
free_constants()) were mistakenly declared non-static.
Process exit code is now nonzero when compiler bugs and
limitations are encountered.
Multicore backend crashed on reduce_by_index with nonempty target
and empty input.
Fixed a flattening issue for certain complex map nestings
(#1168).
Made API function futhark_context_clear_caches() thread safe
(#1169).
API functions for freeing opaque objects are now thread-safe
(#1169).
Tools such as futhark dataset no longer crash with an internal
error if writing to a broken pipe (but they will return a nonzero
exit code).
Defunctionalisation had a name shadowing issue that would crop up
for programs making very advanced use of functional
representations (#1174).
Type checker erroneously permitted pattern-matching on string
literals (this would fail later in the compiler).
New coverage checker for pattern matching, which is more correct.
However, it may not provide quite as nice counter-examples
(#1134).
Fix rare internalisation error (#1177).