DaCe - Data Centric Parallel Programming
BSD-3-Clause License
- mktemp by @fazledyn-or in https://github.com/spcl/dace/pull/1428
- auto_optimizer() never took effect, by @philip-paul-mueller in https://github.com/spcl/dace/pull/1410
- to_sdfg() ignores the auto_optimize flag (Issue #1380), by @philip-paul-mueller in https://github.com/spcl/dace/pull/1395
- SDFG.arg_names was not a member but a class variable, by @philip-paul-mueller in https://github.com/spcl/dace/pull/1457

Full Changelog: https://github.com/spcl/dace/compare/v0.15...v0.15.1rc1
Published by tbennun about 1 year ago
A new analysis engine allows SDFGs to be statically analyzed for work and depth / average parallelism. The analysis allows specifying a series of assumptions about symbolic program parameters that can help simplify and improve the analysis results. The following example shows how to use it:
from dace.sdfg.work_depth_analysis import work_depth
# A dictionary mapping each SDFG element to a tuple (work, depth)
work_depth_map = {}
# Assumptions about symbolic parameters
assumptions = ['N>5', 'M<200', 'K>N']
work_depth.analyze_sdfg(mysdfg, work_depth_map, work_depth.get_tasklet_work_depth, assumptions)
# A dictionary mapping each SDFG element to its average parallelism
average_parallelism_map = {}
work_depth.analyze_sdfg(mysdfg, average_parallelism_map, work_depth.get_tasklet_avg_par, assumptions)
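As intuition for what the analysis computes, here is a small hand calculation in plain Python (not the DaCe API; the function name is illustrative) of work, depth, and average parallelism for a program that maps over n elements and then sum-reduces them in a tree:

```python
# Work = total number of operations, depth = longest dependency chain,
# average parallelism = work / depth. For a map over n elements followed
# by a tree reduction, the map contributes n independent ops at depth 1,
# and the reduction contributes n - 1 ops at depth ceil(log2(n)).
import math

def work_depth_of_map_then_reduce(n: int):
    map_work, map_depth = n, 1                            # fully parallel map
    red_work, red_depth = n - 1, math.ceil(math.log2(n))  # tree reduction
    work = map_work + red_work
    depth = map_depth + red_depth
    return work, depth, work / depth

work, depth, avg_par = work_depth_of_map_then_reduce(1024)
print(work, depth)  # prints 2047 11
```

A large average parallelism (here about 186) indicates the program is dominated by parallel work rather than sequential chains.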
To improve our integration with external codes, we limit the symbolic parameters generated by DaCe to only the used symbols. Take the following code for example:
@dace
def addone(a: dace.float64[N]):
    for i in dace.map[0:10]:
        a[i] += 1
Since the internal code does not actually need N to process the array, it will not appear in the generated code. Before this release, the signature of the generated code would be:
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a, int N);
After this release it is:
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a);
Note that this is a major, breaking change: users who manually interact with the generated .so files will have to adapt their code.
A new allocation lifetime, dace.AllocationLifetime.External, has been introduced into DaCe. Now you can use your DaCe code with external memory allocators (such as PyTorch) and ask DaCe for: (a) how much transient memory it will need; and (b) to use a specific pre-allocated pointer. Example:
@dace
def some_workspace(a: dace.float64[N]):
    workspace = dace.ndarray([N], dace.float64, lifetime=dace.AllocationLifetime.External)
    workspace[:] = a
    workspace += 1
    a[:] = workspace
csdfg = some_workspace.to_sdfg().compile()
sizes = csdfg.get_workspace_sizes() # Returns {dace.StorageType.CPU_Heap: N*8}
wsp = # ...Allocate externally...
csdfg.set_workspace(dace.StorageType.CPU_Heap, wsp)
The same interface is available in the generated code:
size_t __dace_get_external_memory_size_CPU_Heap(programname_t *__state, int N);
void __dace_set_external_memory_CPU_Heap(programname_t *__state, char *ptr, int N);
// or GPU_Global...
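The query-then-set protocol above can be sketched in plain Python. The class and method names below mirror the interface shown, but this is an illustrative mock of the contract, not DaCe's implementation:

```python
# Mock of the external-memory protocol: first query the required bytes per
# storage type, then hand over a pre-allocated buffer of at least that size.
# Class and storage names are illustrative only.
class ExternalMemoryProgram:
    def __init__(self, n: int):
        self.n = n
        self.buffers = {}

    def get_workspace_sizes(self):
        # e.g., one double-precision workspace array of length n (8 bytes each)
        return {'CPU_Heap': self.n * 8}

    def set_workspace(self, storage: str, buf: bytearray):
        needed = self.get_workspace_sizes()[storage]
        if len(buf) < needed:
            raise ValueError(f'buffer too small: {len(buf)} < {needed}')
        self.buffers[storage] = buf

prog = ExternalMemoryProgram(20)
sizes = prog.get_workspace_sizes()                    # {'CPU_Heap': 160}
prog.set_workspace('CPU_Heap', bytearray(sizes['CPU_Heap']))
```

The key design point is that allocation stays entirely on the caller's side; the program only validates and records the pointer it is given.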
A new experimental feature allows you to analyze your SDFGs in a schedule-oriented format. It takes in SDFGs (even after applying transformations) and outputs a tree of elements that can be printed out in a Python-like syntax. For example:
@dace.program
def matmul(A: dace.float32[10, 10], B: dace.float32[10, 10], C: dace.float32[10, 10]):
    for i in range(10):
        for j in dace.map[0:10]:
            atile = dace.define_local([10], dace.float32)
            atile[:] = A[i]
            for k in range(10):
                with dace.tasklet:
                    # ...
sdfg = matmul.to_sdfg()
from dace.sdfg.analysis.schedule_tree.sdfg_to_tree import as_schedule_tree
stree = as_schedule_tree(sdfg)
print(stree.as_string())
will print:
for i = 0; (i < 10); i = i + 1:
  map j in [0:10]:
    atile = copy A[i, 0:10]
    for k = 0; (k < 10); k = (k + 1):
      C[i, j] = tasklet(atile[k], B(10) [k, j], C[i, j])
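As a rough sketch of how such a tree can be rendered, the following illustrative node class (not DaCe's schedule tree API) prints nested scopes with two-space indentation, similar to the output above:

```python
# Tiny schedule-tree renderer: each scope node indents its children,
# producing Python-like output. The Node class is illustrative only.
class Node:
    def __init__(self, header, children=None):
        self.header = header
        self.children = children or []

    def as_string(self, indent=0):
        lines = ['  ' * indent + self.header]
        for child in self.children:
            lines.append(child.as_string(indent + 1))
        return '\n'.join(lines)

tree = Node('for i = 0; (i < 10); i = i + 1:', [
    Node('map j in [0:10]:', [
        Node('atile = copy A[i, 0:10]'),
    ]),
])
print(tree.as_string())
```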
There are some new transformation classes and passes in dace.sdfg.analysis.schedule_tree.passes, for example, to remove empty control flow scopes:
class RemoveEmptyScopes(tn.ScheduleNodeTransformer):
    def visit_scope(self, node: tn.ScheduleTreeScope):
        if len(node.children) == 0:
            return None
        return self.generic_visit(node)
We hope you find new ways to analyze and optimize DaCe programs with this feature!
- CPU_Persistent map schedule (OpenMP parallel regions) by @tbennun in #1330
- array_equal of empty arrays by @tbennun in #1374

Full Changelog: https://github.com/spcl/dace/compare/v0.14.4...v0.15
Published by tbennun over 1 year ago
Minor release; adds support for Python 3.11.
Published by phschaad over 1 year ago
The schedule type of a scope (e.g., a Map) is now also determined by the surrounding storage. If the surrounding storage is ambiguous, DaCe will fail with a descriptive exception. This means that code such as the one below:
@dace.program
def add(a: dace.float32[10, 10] @ dace.StorageType.GPU_Global,
        b: dace.float32[10, 10] @ dace.StorageType.GPU_Global):
    return a + b @ b
will now automatically run the + and @ operators on the GPU.
(#1262 by @tbennun)
Easier interface for profiling applications: dace.profile and dace.instrument can now be used within Python with a simple API:
with dace.profile(repetitions=100) as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print all execution times of the last called program (other_program)
print(profiler.times[-1])
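Conceptually, the profiler is an object that runs each call repeatedly and records elapsed wall-clock times. The following is a minimal plain-Python sketch of that idea with illustrative names; it is not DaCe's implementation (which hooks program calls made inside the with block automatically):

```python
# Minimal sketch in the spirit of dace.profile: run a callable several
# times and record elapsed wall-clock times per profiled call.
import time

class SimpleProfiler:
    def __init__(self, repetitions: int = 100):
        self.repetitions = repetitions
        self.times = []  # one list of runtimes per profiled call

    def profile(self, func, *args, **kwargs):
        runs = []
        for _ in range(self.repetitions):
            start = time.perf_counter()
            func(*args, **kwargs)
            runs.append(time.perf_counter() - start)
        self.times.append(runs)

profiler = SimpleProfiler(repetitions=10)
profiler.profile(sum, range(1000))
profiler.profile(sorted, [3, 1, 2])
print(len(profiler.times[-1]))  # prints 10: recorded runs of the last call
```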
Where instrumentation is applied can be controlled with filters in the form of strings and wildcards, or with a function:
with dace.instrument(dace.InstrumentationType.GPU_Events,
                     filter='*add??') as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print instrumentation report for last call
print(profiler.reports[-1])
With dace.builtin_hooks.instrument_data, the same technique can be applied to instrument data containers.
(#1197 by @tbennun)
Data container instrumentation can further now be used conditionally, allowing saving and restoring of data container contents only if certain conditions are met. In addition to this, data instrumentation now saves the SDFG's symbol values at the time of dumping data, allowing an entire SDFG's state / context to be restored from data reports.
(#1202, #1208 by @phschaad)
Two new passes (ScalarFission and StrictSymbolSSA) allow fissioning of scalar data containers (or arrays of size 1) and symbols, respectively, into separate containers and symbols, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.
(#1198, #1214 by @phschaad)
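The core idea behind scalar fission can be sketched as classic SSA renaming: every write to the scalar starts a fresh container, and subsequent reads refer to the latest one. A minimal illustrative sketch over a toy statement list (not the DaCe pass, and using naive string substitution that would mishandle names merely containing the scalar's name):

```python
# Sketch of scalar fission as SSA renaming over (target, expression) pairs:
# each write to the scalar defines a new versioned container, and reads in
# later expressions refer to the most recent version. Illustrative only.
def fission_scalar(statements, scalar='s'):
    version = 0
    renamed = []
    for target, expr in statements:
        # Reads in the expression see the current version
        expr = expr.replace(scalar, f'{scalar}_{version}')
        if target == scalar:
            version += 1  # a write starts a new container
            target = f'{scalar}_{version}'
        renamed.append((target, expr))
    return renamed

prog = [('s', '1'), ('a', 's + 2'), ('s', 'a * 3'), ('b', 's - 1')]
print(fission_scalar(prog))
# [('s_1', '1'), ('a', 's_1 + 2'), ('s_2', 'a * 3'), ('b', 's_2 - 1')]
```

Because every version has exactly one write, the two lifetimes of s can now be placed in separate containers without interference.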
SDFG Cutouts can now be taken from more than one state. Additionally, taking cutouts that only access a subset of a data container (e.g., A[2:5] from a data container A of size N) results in the cutout receiving an "Alibi Node" to represent only that subset of the data (A_cutout[0:3] -> A[2:5], where A_cutout is of size 4). This allows cutouts to be significantly smaller and have a smaller memory footprint, simplifying debugging and localized optimization.
Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data before the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything that a cutout writes to, that may be read externally or may influence the behavior of the remaining SDFG. This allows isolating all side effects of changes to a particular cutout, allowing transformations to be tested and verified in isolation and simplifying debugging.
(#1201 by @phschaad)
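The input/output configuration can be approximated by a read/write-set sweep over the cutout's statements. The following is an illustrative sketch (not DaCe's implementation) that conservatively treats every write as part of the output configuration:

```python
# Compute a cutout's input/output configuration from per-statement read and
# write sets. Inputs: containers read before any write inside the cutout.
# Outputs: containers written inside the cutout (conservatively, all of
# them, since any write may be observed externally). Illustrative only.
def io_configuration(statements):
    written, inputs, outputs = set(), set(), set()
    for reads, writes in statements:
        inputs |= set(reads) - written  # read before any internal write
        written |= set(writes)
        outputs |= set(writes)
    return inputs, outputs

# Example cutout: tmp = A + 1; B = tmp * 2
cut = [({'A'}, {'tmp'}), ({'tmp'}, {'B'})]
print(io_configuration(cut))  # ({'A'}, {'tmp', 'B'})
```

Note that tmp does not appear in the input configuration: it is fully defined inside the cutout, so changes to how it is computed stay isolated.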
unsqueeze_memlet fixes by @alexnick83 in https://github.com/spcl/dace/pull/1203
Full Changelog: https://github.com/spcl/dace/compare/v0.14.2...v0.14.3
Please let us know if there are any regressions with this new release.
Published by phschaad over 1 year ago
Full Changelog: https://github.com/spcl/dace/compare/v0.14.1...v0.14.2
Published by tbennun about 2 years ago
This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.
Full Changelog: https://github.com/spcl/dace/compare/v0.14...v0.14.1
Published by tbennun about 2 years ago
This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.
Please let us know if there are any regressions with this new release.
- dace.constant type hint has now achieved stable status and was renamed to dace.compiletime
- ~/.dace.conf. The SDFG build folders still include the full configuration file. Old .dace.conf files are detected and migrated automatically.
- .instrument to dace.InstrumentationType.LIKWID_Counters
- mallocAsync API. To enable, set desc.pool = True on any GPU data descriptor.

import dace
from dace.dtypes import StorageType, ScheduleType

N = dace.symbol('N')

@dace
def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
               b: dace.float64[N] @ StorageType.GPU_Global):
    # This map will become a GPU kernel
    for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
@dace
def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
    always += 1  # "always" is always used, so it will not be optional
    if maybe is None:  # This condition will stay in the code
        return 1
    if always is None:  # This condition will be eliminated in simplify
        return 2
    return 3
- "string") use in the Python frontend
- einsum is now a library node
- pip
- __array_interface__ objects by @gronerl in https://github.com/spcl/dace/pull/1071
Full Changelog: https://github.com/spcl/dace/compare/v0.13.3...v0.14
Published by tbennun over 2 years ago
- sdfg.view() inside a VSCode console or debug session will open the file directly in the editor!

Full Changelog: https://github.com/spcl/dace/compare/v0.13.2...v0.13.3
Published by tbennun over 2 years ago
- arange, round, etc.

def potentially_parsed_by_dace():
    if not dace.in_program():
        print('Called by Python interpreter!')
    else:
        print('Compiled with DaCe!')

sdfg.save('myprogram.sdfgz', compress=True)  # or just run gzip on your old SDFGs

- sdfg.view(8080) (or any other port)

Full Changelog: https://github.com/spcl/dace/compare/v0.13.1...v0.13.2
Published by tbennun over 2 years ago
- (StateFusion, RedundantSecondArray)

Full Changelog: https://github.com/spcl/dace/compare/v0.13...v0.13.1
Published by tbennun over 2 years ago
Cutout allows developers to take large DaCe programs and cut out subgraphs reliably to create a runnable sub-program. This sub-program can be then used to check for correctness, benchmark, and transform a part of a program without having to run the full application.
* Example usage from Python:
def my_method(sdfg: dace.SDFG, state: dace.SDFGState):
    nodes = [n for n in state if isinstance(n, dace.nodes.LibraryNode)]  # Cut every library node
    cut_sdfg: dace.SDFG = cutout.cutout_state(state, *nodes)
    # The cut SDFG now includes each library node and all the necessary arrays to call it with
Also available in the SDFG editor:
Just like node instrumentation for performance analysis, data instrumentation allows users to set access nodes to be saved to an instrumented data report, and loaded later for exact reproducible runs.
* Data instrumentation natively works with CPU and GPU global memory, so there is no need to copy data back
* Combined with Cutout, this is a powerful interface to perform local optimizations in large applications with ease!
* Example use:
@dace.program
def tester(A: dace.float64[20, 20]):
    tmp = A + 1
    return tmp + 5

sdfg = tester.to_sdfg()
for node, _ in sdfg.all_nodes_recursive():  # Instrument every access node
    if isinstance(node, nodes.AccessNode):
        node.instrument = dace.DataInstrumentationType.Save

A = np.random.rand(20, 20)
result = sdfg(A)

# Get instrumented data from report
dreport = sdfg.get_instrumented_data()
assert np.allclose(dreport['A'], A)
assert np.allclose(dreport['tmp'], A + 1)
assert np.allclose(dreport['__return'], A + 6)
SDFG elements can now be grouped by any criteria, and they will be colored during visualization by default (by @phschaad). See example in action:
- (sdfg.add_constant) can now be used as access nodes in SDFGs. The constants are hard-coded into the generated program, so you can run code with the best performance possible.
- views connector to disambiguate which access node is being viewed
- else clause is now handled in for and while loops
- __dace_init generated function signature (by @orausch)

Full Changelog available at https://github.com/spcl/dace/compare/v0.12...v0.13
Published by tbennun almost 3 years ago
Important: The pattern-matching transformation API has been significantly simplified. Transformations using the old API must be ported! Summary of changes:

- SingleStateTransformation or MultiStateTransformation classes instead of using decorators
- PatternNodes
- can_be_applied and apply directly using self.nodename
- strict is now replaced with permissive (False by default). Permissive mode allows transformations to match in more cases, but may be dangerous to apply (e.g., create race conditions).
- can_be_applied is now a method of the transformation
- apply method accepts a graph and the SDFG.

Example of using the new API:
import dace
from dace import nodes
from dace.sdfg import utils as sdutil
from dace.transformation import transformation as xf

class ExampleTransformation(xf.SingleStateTransformation):
    # Define pattern nodes
    map_entry = xf.PatternNode(nodes.MapEntry)
    access = xf.PatternNode(nodes.AccessNode)

    # Define matching subgraphs
    @classmethod
    def expressions(cls):
        # MapEntry -> Access
        return [sdutil.node_path_graph(cls.map_entry, cls.access)]

    def can_be_applied(self, graph: dace.SDFGState, expr_index: int, sdfg: dace.SDFG, permissive: bool = False) -> bool:
        # Returns True if the transformation can be applied on a subgraph
        if permissive:  # In permissive mode, we will always apply this transformation
            return True
        return self.map_entry.schedule == dace.ScheduleType.CPU_Multicore

    def apply(self, graph: dace.SDFGState, sdfg: dace.SDFG):
        # Apply the transformation using the SDFG API
        pass
- Simplifying SDFGs is renamed from sdfg.apply_strict_transformations() to sdfg.simplify()
- AccessNodes no longer have an AccessType field.
Full Changelog: https://github.com/spcl/dace/compare/v0.11.4...v0.12
Published by tbennun almost 3 years ago
- @dace.programs in JIT mode

Full Changelog: https://github.com/spcl/dace/compare/v0.11.3...v0.11.4
Published by tbennun almost 3 years ago
Full Changelog: https://github.com/spcl/dace/compare/v0.11.2...v0.11.3
Published by tbennun almost 3 years ago
Full Changelog: https://github.com/spcl/dace/compare/v0.11.1...v0.11.2
Published by tbennun about 3 years ago
@dace programs! Some examples:

- @dataclass and general object field support
- (dace.unroll generator)
- (dace.constant as a type hint)
- (dace.map, dace.tasklet) work in pure Python as well
- torch.tensor integration in @dace program arguments
- @dace.program(auto_optimize=True, device=dace.DeviceType.CPU) to automatically run some transformations, such as turning loops into parallel maps.

Miscellaneous:

- @dace programs

Full Changelog: https://github.com/spcl/dace/compare/v0.10.8...v0.11.1
Published by tbennun over 3 years ago
Published by tbennun about 4 years ago
- @dace.programs can now call other programs or SDFGs
- .instrument property for SDFG nodes and states enables easy-to-use, localized performance reporting with timers, GPU events, and PAPI performance counters
- SubgraphFusion
- (Vectorization) made more robust to corner cases
- sdfgcc to quickly compile and optimize .sdfg files from the command line, generating header and library files. Great for interoperability and Makefiles.
- @dace
- (einsum) and new properties added, enabling faster performance and more productive high-performance coding than ever.

Published by tbennun almost 5 years ago
- ReduceExpansion transformation.

Published by tbennun about 5 years ago

- @dace.program and create SDFGs from implicit dataflow.