Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Apache-2.0 license
- … `kedro-viz` on Python 3.8.
- Use `kedro-sphinx-theme` for documentation.

Published by merelcht 6 months ago
- … `kedro run`.
- Added the `--telemetry` flag to `kedro new`, allowing the user to register consent to have user analytics collected at the same time as the project is created.
- Improved performance of `Pipeline` object creation and summing.
- Dropped `toposort` in favour of the built-in `graphlib` module.
- … the `--verbose` flag.
- Updated `kedro pipeline create` and `kedro pipeline delete` to read the base environment from the project settings.
- Updated `kedro catalog resolve` to read credentials properly.
- Moved the tests generated by `kedro pipeline create` from `<project root>/src/tests/pipelines/<pipeline name>` to `<project root>/tests/pipelines/<pipeline name>`.
- Updated `.gitignore` to prevent pushing the MLflow local runs folder to a remote forge when using MLflow and Git.
- Fixed a bug in `node` creation that allowed self-dependencies when using transcoding, that is, datasets named like `name@format`.
- `_is_project` and `_find_kedro_project` have been moved to `kedro.utils`. We recommend not using private methods in your code, but if you do, please update your code to use the new location.
- … the `merge_strategy` argument in `OmegaConfigLoader`.

Many thanks to the following Kedroids for contributing PRs to this release:
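The switch above from `toposort` to the standard library's `graphlib` concerns ordering pipeline nodes by their dependencies. A minimal stand-alone sketch (with made-up node names, not Kedro's internal code) of how `graphlib.TopologicalSorter` produces such an ordering:

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on (hypothetical pipeline).
dependencies = {
    "split_data": set(),
    "train_model": {"split_data"},
    "evaluate_model": {"train_model", "split_data"},
}

# static_order() yields nodes so that every node comes after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # dependencies always precede dependents
```

`graphlib` ships with Python 3.9+, which is one reason a third-party dependency could be dropped.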
Published by merelcht 8 months ago
- Created the debugging line magic `%load_node` for Jupyter Notebook and Jupyter Lab.
- Added better IPython and VS Code Notebook support for `%load_node`, and minimal support for Databricks.
- … `%load_node`.
- Updated `kedro catalog resolve` to work with dataset factories that use `PartitionedDataset`.
- Added the `_EPHEMERAL` attribute to `AbstractDataset` and other dataset classes that inherit from it.
- Added documentation about `kedro-telemetry` and the data collected by it.

Many thanks to the following Kedroids for contributing PRs to this release:
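The `_EPHEMERAL` attribute mentioned above marks datasets that exist only in memory for the duration of a run. A toy mimic of the idea (illustrative classes, not Kedro's real hierarchy):

```python
class AbstractDataset:  # simplified stand-in for kedro.io.AbstractDataset
    _EPHEMERAL = False  # persisted datasets keep the default


class CSVDataset(AbstractDataset):
    pass  # backed by a file, so not ephemeral


class MemoryDataset(AbstractDataset):
    _EPHEMERAL = True  # exists only for the lifetime of the run


datasets = {"model_input": CSVDataset(), "intermediate": MemoryDataset()}
# A class-level flag lets tooling detect in-memory datasets generically,
# e.g. to skip them when checking what is persisted.
ephemeral = [name for name, ds in datasets.items() if ds._EPHEMERAL]
print(ephemeral)
```

Because the flag lives on the base class, every subclass inherits a sensible default and only in-memory datasets need to override it.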
Published by merelcht 9 months ago
- … `tools`.
- … `source_dir` explicitly in `pyproject.toml` for non-src layout projects.
- `MemoryDataset` entries are now included in free outputs.
- … `ruff format`.
- … `SequentialRunner` and `ParallelRunner`.
- … `bootstrap_project` and `configure_project`.
- … `kedro run` and hook execution order.

Published by idanov 10 months ago
Release 0.19.1
- … `kedro-telemetry` by @merelcht in https://github.com/kedro-org/kedro/pull/3417
Published by idanov 10 months ago
🚀 Major Features and improvements

- … the `--starter` flag.
- Added the `--conf-source` option to `%reload_kedro`, allowing users to specify a source for project configuration.
- … `_ProjectSettings`. This enables the use of the config loader as a standalone class without affecting existing Kedro Framework users.

🪲 Bug fixes and other changes

💥 Breaking changes

- … `kedro.io` (import them from `kedro-datasets` instead).
- … `KEDRO_LOGGING_CONFIG`.
- Renamed `data_set` and `DataSet` to `dataset` and `Dataset` everywhere.
- Removed the `create_default_data_set()` method in the `Runner` in favour of using dataset factories to create default dataset instances.

✍️ Documentation changes

New Contributors

Full Changelog: https://github.com/kedro-org/kedro/compare/0.18.14...0.19.0
🚨 If you are upgrading from Kedro 0.18, have a look at the migration guide.
We welcome every community contribution, large or small. See what we're working on now and report bugs or suggest future features.
Until next time,
The Kedro Team 💛
Published by idanov about 1 year ago
- … the `--template` flag for `kedro pipeline create`, or via a `template/pipeline` folder.
- … the `runtime_params` resolver with `OmegaConfigLoader`.
- … `OmegaConfigLoader` to handle paths containing dots outside of `conf_source`.
- Made `settings.py` optional.
- … the `standalone-datacatalog` starter into its README file.
- … (`kedro.extras.datasets`). Install and import them from the `kedro-datasets` package instead.
- All dataset classes ending with `DataSet` are deprecated and will be removed in Kedro `0.19.0` and `kedro-datasets` `2.0.0`. Instead, use the updated class names ending with `Dataset`.
- The starters `pandas-iris`, `pyspark-iris`, `pyspark`, and `standalone-datacatalog` are deprecated and will be archived in Kedro 0.19.0.
- `PartitionedDataset` and `IncrementalDataset` have been moved to `kedro-datasets` and will be removed in Kedro `0.19.0`. Install and import them from the `kedro-datasets` package instead.

Many thanks to the following Kedroids for contributing PRs to this release:
Published by idanov about 1 year ago
Release 0.18.13
Published by idanov about 1 year ago
- … `OmegaConfigLoader` except for `oc.env`.
- Added a `kedro catalog rank` CLI command that ranks dataset factories in the catalog by matching priority.
- … `pyproject.toml`.
- Updated `kedro catalog list` to show datasets generated with factories.
- … `ruff` as the linter and removed mentions of `pylint`, `isort`, `flake8`.

Thanks to Laíza Milena Scheid Parizotto and Chris Schopp.
- `ConfigLoader` and `TemplatedConfigLoader` will be deprecated. Please use `OmegaConfigLoader` instead.

Published by idanov over 1 year ago
Published by idanov over 1 year ago
Published by idanov over 1 year ago
- `kedro run --params` now updates interpolated parameters correctly when using `OmegaConfigLoader`.
- Added a `metadata` attribute to `kedro.io` datasets. This is ignored by Kedro, but may be consumed by users or external plugins.
- Added `kedro.logging.RichHandler`. This replaces the default `rich.logging.RichHandler` and is more flexible; users can turn off the `rich` traceback if needed.
- `OmegaConfigLoader` will return a `dict` instead of `DictConfig`.
- `OmegaConfigLoader` does not show a `MissingConfigError` when the config files exist but are empty.
- `kedro package` no longer produces `.egg` files and now relies exclusively on `.whl` files.

Many thanks to the following Kedroids for contributing PRs to this release:
Published by idanov over 1 year ago
- Added the `KEDRO_LOGGING_CONFIG` environment variable, which can be used to configure logging from the beginning of the `kedro` process.
- … the `kedro run` CLI command to the session store to improve run reproducibility using Kedro-Viz experiment tracking.
- … `flake8` configuration.
- … `kedro.extras.datasets`.

Published by idanov over 1 year ago
- Added `kedro jupyter setup` to set up the Jupyter kernel for Kedro.
- `kedro package` now includes the project configuration in a compressed `tar.gz` file.
- … `OmegaConfigLoader` to load configuration from compressed files of `zip` or `tar` format. This feature requires `fsspec>=2023.1.0`.
- … `_ProjectPipeline`.

Published by idanov over 1 year ago
- … `s3a` or `s3n` filepaths.
- … the `--params` flag.
- … Kedro-Viz experiment tracking.

A regression introduced in Kedro version `0.18.5` caused the Kedro-Viz console to fail to show experiment tracking correctly. If you experienced this issue, you will need to:

- upgrade to Kedro version `0.18.6`
- … `<project-path>/data/session_store.db`.

Thanks to Kedroids tomohiko kato, tsanikgr and maddataanalyst for very detailed reports about the bug.
Published by idanov over 1 year ago
NOTE: This version of Kedro introduced a bug that caused the Kedro-Viz console to fail to show experiment tracking correctly. We recommend that you don't use it and instead upgrade to Kedro version `0.18.6`.
- Added the new `OmegaConfigLoader`, which uses `OmegaConf` for loading and merging configuration.
- Added the `--conf-source` option to `kedro run`, allowing users to specify a source for project configuration for the run.
- Added `omegaconf` syntax as an option for `--params`. Keys and values can now be separated by colons or equals signs.
- … `yield` instead of `return`.
- … `yield` before proceeding with the next chunk.
- … `OmegaConfigLoader`.
- Added a `--namespace` flag to `kedro run` to enable filtering by node namespace.
- … `node` for all four dataset hooks.
- Added `kedro run` flags `--nodes`, `--tags`, and `--load-versions` to replace `--node`, `--tag`, and `--load-version`.
- … `kedro run` options which take a list of nodes as inputs (`--from-nodes` and `--to-nodes`).
- Fixed a bug where the `micropkg` manifest section in `pyproject.toml` wasn't recognised as allowed configuration.
- Updated `load_ipython_extension` not to register the `%reload_kedro` line magic when called in a directory that does not contain a Kedro project.
- … `anyconfig`'s `ac_context` parameter to `kedro.config.commons` module functions for more flexible `ConfigLoader` customizations.
- Replaced the `kedro.pipeline.Pipeline` object throughout the test suite with the `kedro.modular_pipeline.pipeline` factory.
- Fixed a bug that caused the `after_dataset_saved` hook only to be called for one output dataset when multiple are saved in a single node and async saving is in use.
- … from `WARNING` to `DEBUG`.
- Updated `micropkg pull` to fix a vulnerability caused by CVE-2007-4559.
- … `kedro run`.

Many thanks to the following Kedroids for contributing PRs to this release:

- `project_version` will be deprecated in `pyproject.toml`; please use `kedro_init_version` instead.
- Deprecated `kedro run` flags `--node`, `--tag`, and `--load-version` in favour of `--nodes`, `--tags`, and `--load-versions`.
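The `yield`-related bullets above describe generator nodes: instead of `return`ing one result, a node can `yield` a sequence of chunks, and each chunk is saved before the next one is computed. A plain-Python sketch of the pattern (outside Kedro):

```python
def process_in_chunks(rows, chunk_size=2):
    """A generator 'node': yields one processed chunk at a time."""
    for i in range(0, len(rows), chunk_size):
        yield [row.upper() for row in rows[i : i + chunk_size]]


saved = []  # stand-in for a dataset's save() target

# The consumer saves each chunk as soon as it is yielded, so only one
# chunk needs to fit in memory at a time.
for chunk in process_in_chunks(["a", "b", "c", "d", "e"]):
    saved.extend(chunk)

print(saved)  # ['A', 'B', 'C', 'D', 'E']
```

This is why the changelog stresses saving "after each `yield` before proceeding with the next chunk": the consumer drives the generator, so persistence is interleaved with computation.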
Published by idanov almost 2 years ago
- … `kedro_datasets` with higher priority than `kedro.extras.datasets`. `kedro_datasets` is the namespace for the new `kedro-datasets` python package.
- The config loader objects now implement `UserDict`, and the configuration is accessed through `conf_loader['catalog']`.
- … `settings.py` without creating a custom config loader.

| Type | Description | Location |
| --- | --- | --- |
| `svmlight.SVMLightDataSet` | Work with svmlight/libsvm files using scikit-learn library | `kedro.extras.datasets.svmlight` |
| `video.VideoDataSet` | Read and write video files from a filesystem | `kedro.extras.datasets.video` |
| `video.video_dataset.SequenceVideo` | Create a video object from an iterable sequence to use with `VideoDataSet` | `kedro.extras.datasets.video` |
| `video.video_dataset.GeneratorVideo` | Create a video object from a generator to use with `VideoDataSet` | `kedro.extras.datasets.video` |
- … `dask.ParquetDataSet` to work with the `dask.to_parquet` API.
- … `kedro micropkg pull` for packages on PyPI.
- … `format` in `save_args` for `SparkHiveDataSet`; previously it didn't allow you to save it as delta format.
- … `TensorFlowModelDataset` when used without versioning; previously, it wouldn't overwrite an existing model.
- … `tf.device` in `TensorFlowModelDataset`.
- … `VersionNotFoundError` to handle insufficient permission issues for cloud storage.
- … `local_ns` rather than a global variable.
- Moved `ShelveStore` to its own module to ensure multiprocessing works with it.
- `kedro.extras.datasets.pandas.SQLQueryDataSet` now takes the optional argument `execution_options`.
- Removed the `attrs` upper bound to support newer versions of Airflow.
- … the `setuptools` dependency to `<=61.5.1`.
- `kedro test` and `kedro lint` will be deprecated.

We are grateful to the following for submitting PRs that contributed to this release: jstammers, FlorianGD, yash6318, carlaprv, dinotuku, williamcaicedo, avan-sh, Kastakin, amaralbf, BSGalvan, levimjoseph, daniel-falk, clotildeguinard, avsolatorio, and picklejuicedev for comments and input to documentation changes.
Published by idanov about 2 years ago
Implemented autodiscovery of project pipelines. A pipeline created with `kedro pipeline create <pipeline_name>` can now be accessed immediately, without needing to explicitly register it in `src/<package_name>/pipeline_registry.py`, either individually by name (e.g. `kedro run --pipeline=<pipeline_name>`) or as part of the combined default pipeline (e.g. `kedro run`). By default, the simplified `register_pipelines()` function in `pipeline_registry.py` looks like:
```python
from typing import Dict

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```
The Kedro IPython extension should now be loaded with `%load_ext kedro.ipython`.

The line magic `%reload_kedro` now accepts keyword arguments, e.g. `%reload_kedro --env=prod`.

Improved the resume-pipeline suggestion for `SequentialRunner`: it will backtrack to the closest persisted inputs to resume.
- … a `False` value for rich logging `show_locals`, to make sure credentials and other sensitive data isn't shown in logs.
- … `rich`.
- When using `kedro run -n [some_node]`, if `some_node` is missing a namespace the resulting error message will suggest the correct node name.
- … `rich` logging.
- … the `delta-spark` upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
- Added `gdrive` to the list of cloud protocols, enabling Google Drive paths for datasets.
- Deprecated `%load_ext kedro.extras.extensions.ipython`; use `%load_ext kedro.ipython` instead.
- `kedro jupyter convert`, `kedro build-docs`, `kedro build-reqs` and `kedro activate-nbstripout` will be deprecated.

Published by idanov over 2 years ago
- Added `abfss` to the list of cloud protocols, enabling `abfss` paths.
- `conf/base/logging.yml` is now optional. See our documentation for details.
- Added the `kedro.starters` entry point. This enables plugins to create custom starter aliases used by `kedro starter list` and `kedro new`.
- Reduced the `kedro new` prompts to just one question asking for the project name.
- … the `pyyaml` upper bound to make Kedro compatible with the pyodide stack.
- … `myst_parser` instead of `recommonmark`.
- … from `INFO` to `DEBUG` for low-priority messages.
- The `info.log`/`errors.log` files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
- The `root` logger is now set to the Python default level of `WARNING` rather than `INFO`. Kedro's logger is still set to emit `INFO` level messages.
- `SequentialRunner` now has consistent execution order across multiple runs with sorted nodes.
- `kedro jupyter notebook/lab` no longer reuses a Jupyter kernel.
- … `cookiecutter>=2.1.1` to address a known command injection vulnerability.
- … `getpass.getuser`.
- … `AbstractDataSet` and `AbstractVersionedDataSet`, as well as typing to all datasets.
- `kedro.config.default_logger` no longer exists; default logging configuration is now set automatically through `kedro.framework.project.LOGGING`. Unless you explicitly import `kedro.config.default_logger` you do not need to make any changes.
- `kedro.extras.ColorHandler` will be removed in 0.19.0.

Published by idanov over 2 years ago
- Added a new hook `after_context_created` that passes the `KedroContext` instance as `context`.
- … `after_command_run`.
- … the `ParserError` exception error message.
- … `SparkDataSet` to specify a `schema` load argument that allows for supplying a user-defined schema as opposed to relying on the schema inference of Spark.
- … `CONFIG_LOADER_CLASS` validation so that `TemplatedConfigLoader` can be specified in `settings.py`. Any `CONFIG_LOADER_CLASS` must be a subclass of `AbstractConfigLoader`.
- … the `run_params` dictionary used in pipeline hooks.
- Fixed `Jinja2` syntax loading with `TemplatedConfigLoader` using `globals.yml`.
- Removed `_active_session`, `_activate_session` and `_deactivate_session`. Plugins that need to access objects such as the config loader should now do so through `context` in the new `after_context_created` hook.
- `config_loader` is available as a public read-only attribute of `KedroContext`.
- Made the `hook_manager` argument optional for `runner.run`.
- `kedro docs` now opens an online version of the Kedro documentation instead of a locally built version.
- `kedro docs` will be removed in 0.19.0.
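Hooks like `after_context_created` follow a register-then-notify pattern (Kedro implements its hooks with the `pluggy` library). A toy sketch of the mechanism (a made-up class, not Kedro's hook manager):

```python
class TinyHookManager:
    """Minimal register/notify hook manager (illustrative only)."""

    def __init__(self):
        self._hooks = {}

    def register(self, event, callback):
        self._hooks.setdefault(event, []).append(callback)

    def call(self, event, **kwargs):
        # Notify every subscriber of this event, in registration order.
        for callback in self._hooks.get(event, []):
            callback(**kwargs)


seen = []
manager = TinyHookManager()
# A plugin subscribes to 'after_context_created' and receives the context.
manager.register("after_context_created", lambda context: seen.append(context))
manager.call("after_context_created", context={"env": "local"})
print(seen)  # [{'env': 'local'}]
```

This is why the breaking change above could remove `_active_session`: instead of plugins reaching into framework globals, the framework pushes the objects they need (here, `context`) to registered callbacks.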