
Kernel Tuner

APACHE-2.0 License

Downloads: 659 · Stars: 248 · Committers: 26


kernel_tuner - Version 1.0 Latest Release

Published by benvanwerkhoven 7 months ago

Finally, the Version 1.0 release is here! The software has been stable and ready for production use for quite some time now, and after being in beta for about half a year, we are confident that the current version of the software deserves to mark the first major release of Kernel Tuner.

Version 1.0 integrates a lot of new functionality, including blazing fast search space construction, support for tuning HIP kernels on AMD GPUs, new functionality for mixed precision and accuracy tuning, experimental support for tuning OpenACC programs, a conda package installer for Kernel Tuner, and many more changes and additions.

I would like to thank everyone involved in the development of Kernel Tuner over the past years! Special thanks to the Kernel Tuner developer team for their continued support of the project!

From the Changelog

  • HIP backend to support tuning HIP kernels on AMD GPUs
  • Experimental features for mixed-precision and accuracy tuning
  • Experimental features for OpenACC tuning
  • Major speedup due to new parser and using revamped python-constraint for searchspace building
  • Implemented ability to use PySMT and ATF for searchspace building
  • Added Poetry for dependency and build management
  • Switched from setup.py and setup.cfg to pyproject.toml for centralized metadata, added relevant tests
  • Updated GitHub Action workflows to use Poetry
  • Updated dependencies, most notably NumPy is no longer version-locked as scikit-opt is no longer a dependency
  • Documentation now uses pyproject.toml metadata, minor fixes and changes to be compatible with updated dependencies
  • Set up Nox for testing on all supported Python versions in isolated environments
  • Added linting information, VS Code settings and recommendations
  • Discontinued use of OrderedDict, as all dictionaries in the Python versions used are already ordered
  • Dropped Python 3.7 support

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/0.4.5...1.0

kernel_tuner - Version 1.0.0b6

Published by fjwillemsen 11 months ago

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

  • Inclusion of tests in the source package, as requested in #225
  • Updated dependencies

kernel_tuner - Version 1.0.0b5

Published by fjwillemsen 12 months ago

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b4...1.0.0b5

kernel_tuner - Version 1.0.0b4

Published by fjwillemsen 12 months ago

This is a beta release for early access to the new features. Not intended for production use.

This release contains several improvements:

  • nvidia-ml-py added to tutorial extra dependencies.
  • Additional checks for coherent Poetry configuration and warning in case of outdated development environment.
  • Updated dependencies.

kernel_tuner - Version 1.0.0b3

Published by fjwillemsen about 1 year ago

This is a beta release for early access to the new features. Not intended for production use.

This version contains several bugfixes:

  • Fix snap_to_nearest on non-numeric parameters by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/221
  • Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
  • Fixed an issue where bayes_opt would not handle pruned parameters correctly.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b2...1.0.0b3

kernel_tuner - Version 1.0.0b2

Published by fjwillemsen about 1 year ago

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b1...1.0.0b2

kernel_tuner - Version 1.0.0 beta 1

Published by fjwillemsen about 1 year ago

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/0.4.5...1.0.0b1

kernel_tuner - Version 0.4.5

Published by benvanwerkhoven over 1 year ago

Version 0.4.5 adds support for using PMT in combination with Kernel Tuner, enabling power and energy measurements on a wide range of devices. In addition, we have worked extensively on the internals of Kernel Tuner and the interfaces of the separate components that together make up Kernel Tuner, along with a few bugfixes and corrections of small errors in the examples and documentation.

[0.4.5] - 2023-06-01

Added

  • PMTObserver to measure power and energy on various platforms

Changed

  • Improved functionality for storing output and metadata files
  • Updated PowerSensorObserver to support PowerSensor3
  • Refactored internal interfaces of runners and backends
  • Bugfix in interface to set objective and optimization direction

kernel_tuner - Version 0.4.4

Published by benvanwerkhoven over 1 year ago

Version 0.4.4 adds extended support for energy-efficiency tuning, in particular the new capability to fit a performance model to the target GPU's power-frequency curve. How to use these features is demonstrated in:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/going_green_performance_model.py

And described in the paper:

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022
https://arxiv.org/abs/2211.07260

Other than that, we've implemented a new output and metadata JSON format that adheres to the 'T4' auto-tuning schema created by the auto-tuning community at the Lorentz Center workshop in March 2022.

From the changelog:

[0.4.4] - 2023-03-09

Added

  • Support for using time_limit in simulation mode
  • Helper functions for energy tuning
  • Example to show ridge frequency and power-frequency model
  • Functions to store tuning output and metadata

Changed

  • Changed what timings are stored in cache files
  • No longer inserting partial loop unrolling factor of 0 in CUDA

kernel_tuner - Version 0.4.3

Published by benvanwerkhoven almost 2 years ago

The version 0.4.3 release consists of a large number of changes to the internals of Kernel Tuner, including a new backend based on Nvidia's official Python bindings for CUDA, as well as improved functionality for tuning energy efficiency: core voltages can now be measured, and both power measurement and the interface with NVML have improved a lot.

Some of the changes also concern the "externals" of Kernel Tuner, in the sense that we have migrated from https://github.com/benvanwerkhoven/ to https://github.com/KernelTuner. The goal of this move is to bring the collection of repositories belonging to the larger Kernel Tuner project under one organization.

From the Changelog:

[0.4.3] - 2022-10-19

Added

  • A new backend that uses Nvidia cuda-python
  • Support for locked clocks in NVMLObserver
  • Support for measuring core voltages using NVML
  • Support for custom preprocessor definitions
  • Support for boolean scalar arguments in PyCUDA backend

Changed

  • Migrated from github.com/benvanwerkhoven to github.com/KernelTuner
  • Significant update to the documentation pages
  • Unified benchmarking loops across backends
  • Backends are no longer context managers
  • Replaced the method for measuring power consumption using NVML
  • Improved NVML measurements of temperature and clock frequencies
  • bugfix in parse_restrictions when using and/or in expressions
  • bugfix in GreedyILS when using neighbor method "adjacent"
  • bugfix in Bayesian Optimization for small problems

kernel_tuner - Version 0.4.2

Published by benvanwerkhoven over 2 years ago

Version 0.4.2 includes a lot of work on the search space representation, the application of restrictions, and the optimization strategies. Besides several new optimization strategies, most existing strategies should see improved performance, both in terms of the number of evaluated kernel configurations and in execution time.

Added

  • new optimization strategies: dual annealing, greedy ILS, ordered greedy MLS, greedy MLS
  • support for constant memory in cupy backend
  • constraint solver to cut down time spent in creating search spaces
  • support for custom tuning objectives
  • support for max_fevals and time_limit in strategy_options of all strategies

Removed

  • alternative Bayesian Optimization strategies that could not be used directly
  • C++ wrapper module that was too specific and hardly used

Changed

  • string-based restrictions are compiled into functions for improved performance
  • genetic algorithm, MLS, ILS, random, and simulated annealing use new search space object
  • diff evo, firefly, PSO are initialized using population of all valid configurations
  • all strategies except brute_force strictly adhere to max_fevals and time_limit
  • simulated annealing adapts annealing schedule to max_fevals if supplied
  • minimize, basinhopping, and dual annealing start from a random valid config
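
The first "Changed" item can be illustrated with a small sketch. This is not Kernel Tuner's actual implementation, just the underlying idea: turn a restriction string into a callable once, instead of re-evaluating the string for every configuration.

```python
def compile_restriction(expr, param_names):
    """Compile a string-based restriction into a reusable function.

    Illustration only: builds a lambda whose arguments are the tunable
    parameter names, so the expression is parsed just once.
    """
    src = f"lambda {', '.join(param_names)}: {expr}"
    return eval(src)  # eval of a fixed template, for demonstration only

check = compile_restriction("block_size_x * block_size_y <= 1024",
                            ["block_size_x", "block_size_y"])

print(check(32, 32), check(64, 32))  # True False
```

Compiling once and calling the resulting function per configuration avoids repeated string parsing, which matters when a search space has thousands of configurations.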

kernel_tuner - Version 0.4.1

Published by benvanwerkhoven about 3 years ago

This version adds a brand new Bayesian Optimization strategy, as well as some smaller features and fixes.

[0.4.1] - 2021-09-10

Added

  • support for PyTorch Tensors as input data type for kernels
  • support for smem_args in run_kernel
  • support for (lambda) function and string for dynamic shared memory size
  • a new Bayesian Optimization strategy

Changed

  • optionally store the kernel_string with store_results
  • improved reporting of skipped configurations
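
The dynamic shared memory size can now be given as a function of the tunable parameters rather than as a fixed number. A minimal sketch of the idea (the parameter name and the 4-bytes-per-float sizing rule are invented for this example; in Kernel Tuner the callable would go into the smem_args option):

```python
# Dynamic shared memory size as a function of the tunable parameters:
# here, one 4-byte float per thread, as a made-up sizing rule.
smem_size = lambda p: 4 * p["block_size_x"]

# For a 64-thread block this yields 256 bytes of dynamic shared memory.
print(smem_size({"block_size_x": 64}))  # 256
```

Expressing the size as a function keeps the shared memory allocation consistent with whatever block size the tuner is currently evaluating.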

kernel_tuner - Version 0.4.0

Published by benvanwerkhoven over 3 years ago

This version adds a great deal of new functionality, giving the user extra flexibility and control over what is benchmarked and when. From the CHANGELOG:

Added

  • support for (lambda) function instead of list of strings for restrictions
  • support for (lambda) function instead of list for specifying grid divisors
  • support for (lambda) function instead of tuple for specifying problem_size
  • function to store the top tuning results
  • function to create header file with device targets from stored results
  • support for using tuning results in PythonKernel
  • option to control measurements using observers
  • support for NVML tunable parameters
  • option to simulate auto-tuning searches from existing cache files
  • Cupy backend to support C++ templated CUDA kernels
  • support for templated CUDA kernels using PyCUDA backend
  • documentation on tunable parameter vocabulary
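
As a rough sketch of how a function-based restriction prunes a search space (the parameter names and values are invented; in Kernel Tuner the callable would be passed as the restrictions argument of tune_kernel):

```python
from itertools import product

# Hypothetical tunable parameters for the example.
tune_params = {
    "block_size_x": [16, 32, 64],
    "block_size_y": [8, 16, 32],
}

# A restriction expressed as a function of the parameter dictionary,
# instead of a list of strings.
restrictions = lambda p: p["block_size_x"] * p["block_size_y"] <= 1024

# Enumerate the Cartesian product and keep only valid configurations,
# which is conceptually what a tuner does with the restriction.
configs = (dict(zip(tune_params, values))
           for values in product(*tune_params.values()))
valid = [c for c in configs if restrictions(c)]
print(len(valid))  # 8 of the 9 combinations satisfy the restriction
```

The same pattern applies to the lambda variants of problem_size and the grid divisors: each receives the parameter dictionary and returns the derived value.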

kernel_tuner - Version 0.3.2

Published by benvanwerkhoven almost 4 years ago

This version adds several new and recent features. Most important is the new ability to specify user-defined metrics for Kernel Tuner to compute along with the benchmarking results. User-defined metrics are composable, so you can define metrics that build on other metrics. The documentation pages have been updated to include this new feature and other recent changes.
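
In Kernel Tuner, metrics are passed to tune_kernel as a dictionary mapping a metric name to a function of the benchmark record, and later metrics can reference earlier ones by name. A minimal sketch of that composability, with invented numbers and an evaluation loop that only mimics what the tuner does internally:

```python
# Composable user-defined metrics: each metric is a function of the
# record p, and later metrics can build on earlier ones by name.
metrics = {
    "GFLOP/s": lambda p: (2 * p["n"] / 1e9) / (p["time"] / 1e3),  # time in ms
    "GFLOP/s/W": lambda p: p["GFLOP/s"] / p["power"],
}

# A fabricated benchmark record: n operations, time in ms, power in watts.
record = {"n": 1_000_000_000, "time": 2.0, "power": 100.0}

# Evaluate the metrics in insertion order, storing each result so that
# later metrics can use it.
for name, func in metrics.items():
    record[name] = func(record)

print(record["GFLOP/s"], record["GFLOP/s/W"])  # roughly 1000.0 and 10.0
```

Because the metrics are evaluated in order, the efficiency metric can simply read the throughput metric computed just before it.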

An important change that might influence benchmark results reported by Kernel Tuner is that the runner now performs a warm-up of the device using the first kernel in the parameter space. This removes any startup or cold-start delays that were significantly slowing down the first benchmarked kernel on many devices.

From the changelog:

[0.3.2] - 2020-11-04

Added

  • support loop unrolling using params that start with loop_unroll_factor
  • always insert "#define kernel_tuner 1" to allow preprocessor #ifdef kernel_tuner
  • support for user-defined metrics
  • support for choosing the optimization starting point x0 for most strategies

Changed

  • more compact output is printed to the terminal
  • sequential runner runs first kernel in the parameter space to warm up device
  • updated tutorials to demonstrate use of user-defined metrics

kernel_tuner - Version 0.3.1

Published by benvanwerkhoven over 4 years ago

A small release with two small new features and a bugfix for older GPUs.

[0.3.1] - 2020-06-11

Added

  • kernelbuilder functionality for including kernels in Python applications
  • smem_args option for dynamically allocated shared memory in CUDA kernels

Changed

  • bugfix for NVML Error on Nvidia devices without internal current sensor

kernel_tuner - Version 0.3.0

Published by benvanwerkhoven almost 5 years ago

This is the release of version 0.3.0 of Kernel Tuner. We have done a lot of work on the internals of Kernel Tuner. This release fixes several issues, adds and extends new features, and simplifies the user interface.

[0.3.0] - 2019-12-20

Changed

  • fix for output checking, custom verify functions are called just once
  • benchmarking now returns multiple results not only time
  • more sophisticated implementation of genetic algorithm strategy
  • how the "method" option is passed, now use strategy_options

Added

  • Bayesian Optimization strategy, use strategy="bayes_opt"
  • support for kernels that use texture memory in CUDA
  • support for measuring energy consumption of CUDA kernels
  • option to set strategy_options to pass strategy specific options
  • option to cache and restart from tuned kernel configurations cachefile

Removed

  • Python 2 support, it may still work but we no longer test for Python 2
  • Noodles parallel runner

kernel_tuner - Version 0.2.0

Published by benvanwerkhoven almost 6 years ago

Version 0.2.0 adds a large number of search optimization algorithms and basic support for testing and tuning Fortran kernels.

Changed

  • no longer replacing kernel names with instance strings during tuning
  • bugfix in tempfile creation that led to a "too many open files" error

Added

  • A minimal Fortran example and basic Fortran support
  • Particle Swarm Optimization strategy, use strategy="pso"
  • Simulated Annealing strategy, use strategy="simulated_annealing"
  • Firefly Algorithm strategy, use strategy="firefly_algorithm"
  • Genetic Algorithm strategy, use strategy="genetic_algorithm"

kernel_tuner - Version 0.1.9

Published by benvanwerkhoven over 6 years ago

[0.1.9] - 2018-04-18

Changed

  • bugfix for C backend for byte array arguments
  • argument type mismatches throw warning instead of exception

Added

  • wrapper functionality to wrap C++ functions
  • citation file and zenodo doi generation for releases

kernel_tuner - Version 0.1.8

Published by benvanwerkhoven almost 7 years ago

Version 0.1.8 brings many improvements, mostly focused on user friendliness. The installation process for optional dependencies has been simplified, as you can now use extras with pip. For example, pip install kernel_tuner[cuda] installs both Kernel Tuner and the optional dependency PyCUDA. In addition, version 0.1.8 introduces many more checks on the input that you pass to tune_kernel and run_kernel. For example, the kernel source code is parsed to see if the signature matches the argument list. These additional checks should make it easier to use and debug programs that use Kernel Tuner. For a more detailed overview of the changes, see below:

[0.1.8] - 2017-11-23

Changed

  • bugfix for when using iterations smaller than 3
  • the install procedure now uses extras, e.g. [cuda,opencl]
  • option quiet makes tune_kernel completely quiet
  • extensive updates to documentation

Added

  • type checking for kernel arguments and answers lists
  • checks for reserved keywords in tunable parameters
  • checks for whether thread block dimensions are specified
  • printing units for measured time with CUDA and OpenCL
  • option to print all measured execution times

kernel_tuner - Version 0.1.7

Published by benvanwerkhoven almost 7 years ago

[0.1.7] - 2017-10-11

Changed

  • bugfix install when scipy not present
  • bugfix for GPU cleanup when using Noodles runner
  • reworked the way strings are handled internally

Added

  • option to set compiler name, when using C backend