Published by github-actions[bot] almost 3 years ago
Better exploitation of parallelism in fused nested segmented
reductions.
Prelude function `not` for negating booleans.
Some incorrect removal of copies (#1505).
Handling of parametric modules with top-level existentials (#1510).
Module substitution fixes (#1512, #1518).
Invalid in-place lowering (#1523).
Incorrect code generation for some intra-group parallel code versions.
Flattening crash in the presence of irregular parallelism (#1525).
Incorrect substitution of type abbreviations with hidden sizes (#1531).
Proper handling of NaN in `min`/`max` functions for
`f16`/`f32`/`f64` in the interpreter (#1528).
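As a rough illustration of the two user-visible items in this release, the sketch below uses the prelude function `not` and the `f32.min` function. The function names `count_unflagged` and `minimum` are made up for this example, and the top-level `def` keyword assumes a compiler version that accepts it (older versions spell top-level definitions with `let`).

```futhark
-- Count the elements that are not flagged, using the prelude
-- function `not` to negate each boolean.
def count_unflagged [n] (flags: [n]bool) : i64 =
  i64.sum (map (\b -> i64.bool (not b)) flags)

-- Minimum of an array of floats via `f32.min`. How NaN elements are
-- treated depends on the NaN handling mentioned in the notes above.
def minimum [n] (xs: [n]f32) : f32 =
  reduce f32.min f32.highest xs
```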
Published by github-actions[bot] almost 3 years ago
Fixes to extremely exotic GPU scans involving array operators.
Missing alias tracking led to invalid rewrites, causing a compiler
crash (#1499).
Top-level bindings with existential sizes were mishandled (#1500, #1501).
A variety of memory leaks in the multicore backend, mostly (or
perhaps exclusively) centered around context freeing or failing
programs - this should not have affected many people.
Various fixes to `f16` handling in the GPU backends.
Published by github-actions[bot] about 3 years ago
Existential sizes can now be explicitly quantified in type
expressions (#1308).
Significantly expanded error index.
Attributes can now be numeric.
Patterns can now have attributes. None have any effect at the
moment.
`futhark autotune` and `futhark bench` now take a `--spec-file`
option for loading a test specification from another file.
`auto output` reference datasets are now recreated when the program
is newer than the data files.
Exotic hoisting bug (#1490).
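A minimal sketch of explicit existential quantification in a type expression, as introduced by #1308. The function name `positives` is hypothetical, and `def` assumes a compiler version that accepts that keyword.

```futhark
-- The number of surviving elements is only known at run time, so the
-- result size `k` is quantified existentially with `?[k].`.
def positives [n] (xs: [n]i32) : ?[k].[k]i32 =
  filter (> 0) xs
```

Writing the return type as `[]i32` expresses the same thing implicitly; the explicit form is mainly useful when the same unknown size must appear in several places of one type.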
Published by github-actions[bot] about 3 years ago
Tuning parameters now (officially) exposed in the C API.
`futhark autotune` is now 2-3x faster on many programs, as it now
keeps the process running.
Negative numeric literals are now allowed in `case` patterns.
`futhark_context_config_set_profiling` was missing for the `c`
backend.
Correct handling of nested entry points (#1478).
Incorrect type information recorded when doing in-place lowering (#1481).
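The following sketch shows a negative numeric literal used directly in a `case` pattern. The function `classify` and its return codes are invented for illustration, and `def` assumes a compiler version that accepts that keyword.

```futhark
-- Map a direction code to a small classification number.
-- The first pattern matches the negative literal -1 directly.
def classify (x: i32) : i32 =
  match x
  case -1 -> 0
  case 0 -> 1
  case _ -> 2
```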
Published by github-actions[bot] about 3 years ago
Executables produced by C backends now take a `--no-print-result`
option.
The C backends now generate a manifest when compiling with
`--library`. This can be used by FFI generators (#1465).
The beginnings of a Rust-style error index.
`scan` on newer CUDA devices is now much faster.
Unique opaque types are named properly in entry points.
The CUDA backend in library mode no longer `exit()`s the process if
NVRTC initialisation fails.
Published by github-actions[bot] about 3 years ago
Simplification bug (#1455).
In-place-lowering bug (#1457).
Another in-place-lowering bug (#1460).
Don't try to tile inside loops with parameters with variant sizes (#1462).
Don't consider it an ICE when the user passes invalid command line
options (#1464).
Published by github-actions[bot] about 3 years ago
The `#[trace]` and `#[break]` attributes now replace the `trace`
and `break` functions (although they are still present in
slightly-reduced but compatible form).
The `#[opaque]` attribute replaces the `opaque` function, which is
now deprecated.
Tracing now works in compiled code, albeit with several caveats
(mainly, it does not work for code running on the GPU).
New `wasm` and `wasm-multicore` backends by Philip Lassen. Still
very experimental; do not expect API stability.
New intrinsic type `f16`, along with a prelude module `f16`.
Implemented with hardware support where it is available, and with
`f32`-based emulation where it is not.
Sometimes slightly more informative error message when input of
the wrong type is passed to a test program.
The `!` function in the integer modules is now called `not`.
`!` is now builtin syntax. You can no longer define a function
called `!`. It is extremely unlikely this affects you. This
removes the last special-casing of prefix operators.
A prefix operator section (i.e. `(!)`) is no longer permitted
(and it never was according to the grammar).
The offset parameter for the "raw" array creation functions in the
C API is now `int64_t` instead of `int`.
`i64.abs` was wrong for arguments that did not fit in an `i32`.
Some `f32` operations (`**`, `abs`, `max`) would be done in double
precision on the CUDA backend.
Yet another defunctorisation bug (#1397).
The `clz` function would sometimes exhibit undefined behaviour in
CPU code (#1415).
Operator priority of prefix `-` was wrong - it is now the same as
`!` (#1419).
`futhark hash` is now invariant to source location as well as
stable across OS/compiler/library versions.
`futhark literate` is now much better at avoiding unnecessary
recalculation.
Fixed a hole in size type checking that would usually lead to
compiler crashes (#1435).
Underscores now allowed in numeric literals in test data (#1440).
The `cuda` backend did not use single-pass segmented scans as
intended. Now it does.
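The language-level changes in this release can be sketched as follows. All function names are hypothetical, and `def` assumes a compiler version that accepts that keyword. Note that, as stated above, tracing in compiled code comes with caveats and does not work for code running on the GPU.

```futhark
-- `#[trace]` is an attribute on an expression; the value of the
-- attributed expression is reported when it is evaluated.
def double (x: i32) : i32 =
  #[trace] (x * 2)

-- `f16` is an ordinary numeric type with its own prelude module.
def sum_halves [n] (xs: [n]f16) : f16 =
  reduce (+) (f16.i32 0) xs

-- `!` is builtin syntax, while the integer modules expose the
-- same operation under the name `not`.
def complement (x: i32) : i32 = i32.not x
def invert (b: bool) : bool = !b
```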
Published by github-actions[bot] over 3 years ago
A new memory reuse optimisation has been added. This results in
slightly lower footprint for many programs.
The `cuda` backend now uses a fast single-pass implementation for
segmented `scan`s, due to Morten Tychsen Clausen (#1375).
`futhark bench` now prints interim results while it is running.
`futhark test` now provides a better error message when asked to
test an undefined entry point (#1367).
`futhark pkg` now detects some nonsensical package paths (#1364).
FutharkScript now parses `f x y` as applying `f` to `x` and `y`,
rather than as `f (x y)`.
Some internal array utility functions would not be generated if
entry points exposed both unit arrays and boolean arrays (#1374).
Nested reductions used (much) more memory for intermediate results
than strictly needed.
Size propagation bug in defunctionalisation (#1384).
In the C FFI, array types used only internally to implement opaque
types are no longer exposed (#1387).
`futhark bench` now copes with test programs that consume their
input (#1386). This required an extension of the server protocol
as well.
Published by github-actions[bot] over 3 years ago
`f32.hypot` and `f64.hypot` are now much more numerically exact in
the interpreter.
Generated code now contains a header with information about the
version of Futhark used (and maybe more information in the
future).
Testing/benchmarking with large input data (including randomly
generated data) is much faster, as each file is now only read
once.
Test programs may now use arbitrary FutharkScript expressions to
produce test input, in particular expressions that produce opaque
values. This affects testing, benchmarking, and autotuning.
Compilation is about 10% faster, especially for large programs.
`futhark repl` had trouble with declarations that produced unknown
sizes (#1347).
Entry points can now have the same name as (undocumented!) compiler intrinsics.
FutharkScript now detects too many arguments passed to functions.
Sequentialisation bug (#1350).
Missing causality check for index sections.
`futhark test` now reports mismatches using proper indexes (#1356).
Missing alias checking in fusion could lead to compiler crash (#1358).
The absolute value of NaN is no longer infinity in the interpreter (#1359).
Proper detection of zero strides in compiler (#1360).
Invalid memory accesses related to internal bookkeeping of bounds checking.
Published by github-actions[bot] over 3 years ago
Initial work on granting programmers more control over existential
sizes, starting with making type abbreviations function as
existential quantifiers (#1301).
FutharkScript now also supports arrays and scientific notation.
Added `f32.epsilon` and `f64.epsilon` for the difference between
1.0 and the next larger representable number.
Added `f32.hypot` and `f64.hypot` for your hypotenuse needs (#1344).
Local size bindings in `let` expressions, e.g.:
    let [n] (xs': [n]i32) = filter (>0) xs
    in ...
`futhark_context_report()` now internally calls
`futhark_context_sync()` before collecting profiling information
(if applicable).
`futhark literate`: Parse errors for expression directives are now
detected properly.
`futhark autotune` now works with the `cuda` backend (#1312).
Devious fusion bug (#1322) causing compiler crashes.
Memory expansion bug for certain complex GPU kernels (#1328).
Complex expressions in index sections (#1332).
Handling of sizes in abstract types in the interpreter (#1333).
Type checking of explicit size requirements in `loop`
parameters (#1324).
Various alias checking bugs (#1300, #1340).
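A small sketch of how the new `hypot` and `epsilon` definitions might be used. The function names `distance` and `nearly_equal` are hypothetical, the relative-tolerance comparison is just one possible use of `f64.epsilon`, and `def` assumes a compiler version that accepts that keyword.

```futhark
-- Euclidean distance from the origin, computed with `f32.hypot`.
def distance (x: f32) (y: f32) : f32 =
  f32.hypot x y

-- Compare two doubles up to a relative tolerance of one
-- `f64.epsilon`, scaled by the larger magnitude of the operands.
def nearly_equal (a: f64) (b: f64) : bool =
  f64.abs (a - b) <= f64.epsilon * f64.max (f64.abs a) (f64.abs b)
```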
Published by github-actions[bot] over 3 years ago
Some uniqueness ignorance in fusion (#1291).
An invalid transformation could in rare cases cause race
conditions (#1292).
Generated Python and C code should now be warning-free.
Missing check for uses of size-lifted types (#1294).
Error in simplification of concatenations could cause compiler
crashes (#1296).
Published by github-actions[bot] over 3 years ago
`futhark test`/`futhark bench` errors when test data does
Mismatch between how thresholds were printed and what the
autotuner was looking for (#1269).
`zip` now produces unique arrays (#1271).
`futhark literate` no longer chokes on lines beginning with `--`
without a following whitespace.
`futhark literate`: `:loadimg` was broken due to overzealous
type checking (#1276).
`futhark literate`: `:loadimg` now handles relative paths properly.
`futhark hash` no longer considers the built-in prelude.
Server executables had broken store/restore commands for opaque types.
Published by github-actions[bot] over 3 years ago
New subcommand: `futhark hash`.
`futhark literate` is now smart about when to regenerate image and
animation files.
`futhark literate` now produces better error messages when passing
expressions of the wrong type to directives.
Type-checking of higher-order functions that take consuming
functional arguments.
Missing cases in causality checking (#1263).
`f32.sgn` was mistakenly defined with double precision arithmetic.
Only include double-precision atomics if actually needed by
program (this avoids problems on devices that only support single
precision).
A lambda lifting bug due to not handling existential sizes
produced by loops correctly (#1267).
Incorrect uniqueness attributes inserted by lambda lifting
(#1268).
FutharkScript record expressions were a bit too sensitive to
whitespace.
Published by github-actions[bot] over 3 years ago
`futhark literate` now supports a `$loadimg` builtin function for
passing images to Futhark programs.
The `futhark literate` directive for generating videos is now
`:video`.
Support for 64-bit atomics on CUDA and OpenCL for higher
performance with `reduce_by_index` in particular.
Double-precision float atomics are used on CUDA.
New functions: `f32.recip` and `f64.recip` for multiplicative inverses.
Executables produced with the `c` and `multicore` backends now
also accept `--tuning` and `--size` options (although there are
not yet any tunable sizes).
New functions: `scatter_2d` and `scatter_3d` for scattering to
multi-dimensional arrays (#1258).
`negate` (use `neg`
Exotic core language alias tracking bug (#1239).
Issue with entry points returning constant arrays (#1240).
Overzealous CSE collided with uniqueness types (#1241).
Defunctionalisation issue (#1242).
Tiling inside multiply nested loops (#1243).
Substitution bug in interpreter (#1250).
`f32.sgn`/`f64.sgn` now correct for NaN arguments.
CPU backends (`c`/`multicore`) are now more careful about staying
in single precision for `f32` functions (#1253).
`futhark test` and `futhark bench` now detect program
initialisation errors in a saner way (#1246).
Partial application of operators with parameters used in a
size-dependent way now works (#1256).
An issue regarding abstract size-lifted sum types (#1260).
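The new `f32.recip` and `scatter_2d` functions from this release might be used as in the sketch below. The function names `scale_by_inverse` and `diagonal` are hypothetical, and `def` assumes a compiler version that accepts that keyword.

```futhark
-- Divide every element by `k` by multiplying with its reciprocal.
def scale_by_inverse [n] (k: f32) (xs: [n]f32) : [n]f32 =
  let r = f32.recip k
  in map (* r) xs

-- Build an n-by-n matrix of zeros and scatter `v` onto its diagonal.
-- `scatter_2d` takes the destination, an array of (row, column)
-- index pairs, and an array of values of the same length.
def diagonal (n: i64) (v: i32) : [n][n]i32 =
  let dest = replicate n (replicate n 0)
  let is = map (\i -> (i, i)) (iota n)
  let vs = replicate n v
  in scatter_2d dest is vs
```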
Published by github-actions[bot] over 3 years ago
The C API now exposes serialisation functions for opaque values.
The C API now lets you pick which stream (if any) is used for
logging prints (#1214).
New compilation mode: `--server`. For now used to support faster
benchmarking and testing tools, but can be used to build even
fancier things in the future (#1179).
Significantly faster reading/writing of large values. This mainly
means that validation of test and benchmark results is much faster
(close to an order of magnitude).
The experimental `futhark literate` command allows a vaguely
notebook-like programming experience.
All compilers now accept an `--entry` option for treating more
functions as entry points.
The `negate` function is now `neg`, but `negate` is kept around
for a short while for backwards compatibility.
Generated header files are now declared `extern "C"` when
processed with a C++ compiler.
Parser errors in test blocks used by `futhark bench` and
`futhark test` are now reported with much better error messages.
Interaction between slice simplification and in-place updates
(#1222).
Problem with user-defined functions with the same name as intrinsics.
Names from transitive imports no longer leak into scope (#1231).
Pattern-matching unit values now works (#1232).
Published by github-actions[bot] almost 4 years ago
Fix tiling crash (#1203).
`futhark run` now does slightly more type-checking of its inputs
(#1208).
Sum type deduplication issue (#1209).
Missing parentheses when printing sum values in interpreter.
Published by github-actions[bot] almost 4 years ago
When compiling to binaries in the C-based backends, the compiler
now respects the `CFLAGS` and `CC` environment variables.
GPU backends: avoid some bounds-checks for parallel sections
inside intra-kernel loops.
The `cuda` backend now uses a much faster single-pass scan
implementation, although only for nonsegmented scans where the
operator operates on scalars.
`futhark dataset` now correctly detects trailing commas in textual
input (#1189).
Fixed local memory capacity check for intra-group-parallel GPU kernels.
Fixed compiler bug on segmented rotates where the rotation amount
is variant to the nest (#1192).
`futhark repl` no longer crashes on type errors in given file (#1193).
Fixed a simplification error for certain arithmetic expressions
(#1194).
Fixed a small uniqueness-related bug in the compilation of
operator sections.
Sizes of opaque entry point arguments are now properly checked
(related to #1198).
Published by github-actions[bot] almost 4 years ago
Python backend now disables spurious NumPy overflow warnings for
both library and binary code (#1180).
Undid deadlocking over-synchronisation for freeing opaque objects.
`futhark datacmp` now handles bad input files better (#1181).
Published by github-actions[bot] almost 4 years ago
The GPU loop tiler can now handle loops where only a subset of the
input arrays are tiled. Matrix-vector multiplication is one
important program where this helps (#1145).
The number of threads used by the `multicore` backend is now
configurable (`--num-threads` and
`futhark_context_config_set_num_threads()`). (#1162)
PyOpenCL backend would mistakenly still treat entry point
argument sizes as 32 bit.
Warnings are now reported even for programs with type errors.
Multicore backend now works properly for very large iteration
spaces.
A few internal generated functions (`init_constants()`,
`free_constants()`) were mistakenly declared non-static.
Process exit code is now nonzero when compiler bugs and
limitations are encountered.
Multicore backend crashed on `reduce_by_index` with nonempty target
and empty input.
Fixed a flattening issue for certain complex `map` nestings
(#1168).
Made API function `futhark_context_clear_caches()` thread-safe
(#1169).
API functions for freeing opaque objects are now thread-safe
(#1169).
Tools such as `futhark dataset` no longer crash with an internal
error if writing to a broken pipe (but they will return a nonzero
exit code).
Defunctionalisation had a name shadowing issue that would crop up
for programs making very advanced use of functional
representations (#1174).
Type checker erroneously permitted pattern-matching on string
literals (this would fail later in the compiler).
New coverage checker for pattern matching, which is more correct.
However, it may not provide quite as nice counter-examples
(#1134).
Fix rare internalisation error (#1177).
Published by github-actions[bot] almost 4 years ago
Made API function `futhark_context_clear_caches()` thread-safe
(#1169).
API functions for freeing opaque objects are now thread-safe
(#1169).