Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Apache-2.0 license
- … `kedro-viz` on Python 3.8.
- Use `kedro-sphinx-theme` for documentation.

Published by merelcht 6 months ago
- … `kedro run`.
- Added the `--telemetry` flag to `kedro new`, allowing the user to register consent to have user analytics collected at the same time as the project is created.
- Improved performance of `Pipeline` object creation and summing.
- Dropped `toposort` in favour of the built-in `graphlib` module.
- … the `--verbose` flag.
- Updated `kedro pipeline create` and `kedro pipeline delete` to read the base environment from the project settings.
- Updated `kedro catalog resolve` to read credentials properly.
- Moved the tests generated by `kedro pipeline create` from `<project root>/src/tests/pipelines/<pipeline name>` to `<project root>/tests/pipelines/<pipeline name>`.
- Updated `.gitignore` to prevent pushing the MLflow local runs folder to a remote forge when using MLflow and Git.
- Fixed a bug in `node` creation that allowed self-dependencies when using transcoding, that is, datasets named like `name@format`.
- `_is_project` and `_find_kedro_project` have been moved to `kedro.utils`. We recommend not using private methods in your code, but if you do, please update your code to use the new location.
- … the `merge_strategy` argument in `OmegaConfigLoader`.

Many thanks to the following Kedroids for contributing PRs to this release:
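The switch above from `toposort` to the standard library's `graphlib` concerns ordering pipeline nodes by their dependencies. A minimal stand-alone sketch (with made-up node names, not Kedro's internal code) of how `graphlib.TopologicalSorter` produces such an ordering:

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on (hypothetical pipeline).
dependencies = {
    "split_data": set(),
    "train_model": {"split_data"},
    "evaluate_model": {"train_model", "split_data"},
}

# static_order() yields nodes so that every node comes after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # dependencies always precede dependents
```

`graphlib` ships with Python 3.9+, which is one reason a third-party dependency could be dropped.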
Published by merelcht 8 months ago
- Created the debugging line magic `%load_node` for Jupyter Notebook and Jupyter Lab.
- Added better IPython and VS Code Notebook support for `%load_node`, and minimal support for Databricks.
- … `%load_node`.
- Updated `kedro catalog resolve` to work with dataset factories that use `PartitionedDataset`.
- Added the `_EPHEMERAL` attribute to `AbstractDataset` and other dataset classes that inherit from it.
- Added documentation about `kedro-telemetry` and the data collected by it.

Many thanks to the following Kedroids for contributing PRs to this release:
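The `_EPHEMERAL` attribute mentioned above marks datasets that exist only in memory for the duration of a run. A toy mimic of the idea (illustrative classes, not Kedro's real hierarchy):

```python
class AbstractDataset:  # simplified stand-in for kedro.io.AbstractDataset
    _EPHEMERAL = False  # persisted datasets keep the default


class CSVDataset(AbstractDataset):
    pass  # backed by a file, so not ephemeral


class MemoryDataset(AbstractDataset):
    _EPHEMERAL = True  # exists only for the lifetime of the run


datasets = {"model_input": CSVDataset(), "intermediate": MemoryDataset()}
# A class-level flag lets tooling detect in-memory datasets generically,
# e.g. to skip them when checking what is persisted.
ephemeral = [name for name, ds in datasets.items() if ds._EPHEMERAL]
print(ephemeral)
```

Because the flag lives on the base class, every subclass inherits a sensible default and only in-memory datasets need to override it.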
Published by merelcht 9 months ago
- … `tools`.
- … `source_dir` explicitly in `pyproject.toml` for non-src layout projects.
- `MemoryDataset` entries are now included in free outputs.
- … `ruff format`.
- … `SequentialRunner` and `ParallelRunner`.
- … `bootstrap_project` and `configure_project`.
- … `kedro run` and hook execution order.

Published by idanov 10 months ago
Release 0.19.1
- … `kedro-telemetry` by @merelcht in https://github.com/kedro-org/kedro/pull/3417
Published by idanov 10 months ago
🚀 Major Features and improvements

- … the `--starter` flag.
- Added the `--conf-source` option to `%reload_kedro`, allowing users to specify a source for project configuration.
- … `_ProjectSettings`. This enables the use of the config loader as a standalone class without affecting existing Kedro Framework users.

🪲 Bug fixes and other changes

💥 Breaking changes

- … `kedro.io` (import them from `kedro-datasets` instead).
- … `KEDRO_LOGGING_CONFIG`.
- Renamed `data_set` and `DataSet` to `dataset` and `Dataset` everywhere.
- Removed the `create_default_data_set()` method in the `Runner` in favour of using dataset factories to create default dataset instances.

✍️ Documentation changes

New Contributors

Full Changelog: https://github.com/kedro-org/kedro/compare/0.18.14...0.19.0
🚨 If you are upgrading from Kedro 0.18, have a look at the migration guide.
We welcome every community contribution, large or small. See what we're working on now and report bugs or suggest future features.
Until next time,
The Kedro Team 💛
Published by idanov about 1 year ago
- … the `--template` flag for `kedro pipeline create`, or via a `template/pipeline` folder.
- … the `runtime_params` resolver with `OmegaConfigLoader`.
- … `OmegaConfigLoader` to handle paths containing dots outside of `conf_source`.
- Made `settings.py` optional.
- … the `standalone-datacatalog` starter into its README file.
- … (`kedro.extras.datasets`). Install and import them from the `kedro-datasets` package instead.
- All dataset classes ending with `DataSet` are deprecated and will be removed in Kedro `0.19.0` and `kedro-datasets` `2.0.0`. Instead, use the updated class names ending with `Dataset`.
- The starters `pandas-iris`, `pyspark-iris`, `pyspark`, and `standalone-datacatalog` are deprecated and will be archived in Kedro 0.19.0.
- `PartitionedDataset` and `IncrementalDataset` have been moved to `kedro-datasets` and will be removed in Kedro `0.19.0`. Install and import them from the `kedro-datasets` package instead.

Many thanks to the following Kedroids for contributing PRs to this release:
Published by idanov about 1 year ago
Release 0.18.13
Published by idanov about 1 year ago
- … `OmegaConfigLoader` except for `oc.env`.
- Added a `kedro catalog rank` CLI command that ranks dataset factories in the catalog by matching priority.
- … `pyproject.toml`.
- Updated `kedro catalog list` to show datasets generated with factories.
- … `ruff` as the linter and removed mentions of `pylint`, `isort`, `flake8`.

Thanks to Laíza Milena Scheid Parizotto and Chris Schopp.
- `ConfigLoader` and `TemplatedConfigLoader` will be deprecated. Please use `OmegaConfigLoader` instead.

Published by idanov over 1 year ago
Published by idanov over 1 year ago
Published by idanov over 1 year ago
- `kedro run --params` now updates interpolated parameters correctly when using `OmegaConfigLoader`.
- Added a `metadata` attribute to `kedro.io` datasets. This is ignored by Kedro, but may be consumed by users or external plugins.
- Added `kedro.logging.RichHandler`. This replaces the default `rich.logging.RichHandler` and is more flexible; users can turn off the `rich` traceback if needed.
- `OmegaConfigLoader` will return a `dict` instead of `DictConfig`.
- `OmegaConfigLoader` does not show a `MissingConfigError` when the config files exist but are empty.
- `kedro package` no longer produces `.egg` files and now relies exclusively on `.whl` files.

Many thanks to the following Kedroids for contributing PRs to this release:
Published by idanov over 1 year ago
- Added the `KEDRO_LOGGING_CONFIG` environment variable, which can be used to configure logging from the beginning of the `kedro` process.
- … the `kedro run` CLI command to the session store to improve run reproducibility using Kedro-Viz experiment tracking.
- … `flake8` configuration.
- … `kedro.extras.datasets`.

Published by idanov over 1 year ago
- Added `kedro jupyter setup` to set up the Jupyter kernel for Kedro.
- `kedro package` now includes the project configuration in a compressed `tar.gz` file.
- … `OmegaConfigLoader` to load configuration from compressed files of `zip` or `tar` format. This feature requires `fsspec>=2023.1.0`.
- … `_ProjectPipeline`.

Published by idanov over 1 year ago
- … `s3a` or `s3n` filepaths.
- … the `--params` flag.
- … Kedro-Viz experiment tracking.

A regression introduced in Kedro version `0.18.5` caused the Kedro-Viz console to fail to show experiment tracking correctly. If you experienced this issue, you will need to:

- upgrade to Kedro version `0.18.6`
- … `<project-path>/data/session_store.db`.

Thanks to Kedroids tomohiko kato, tsanikgr and maddataanalyst for very detailed reports about the bug.
Published by idanov over 1 year ago
NOTE: This version of Kedro introduced a bug that caused the Kedro-Viz console to fail to show experiment tracking correctly. We recommend that you don't use it and instead upgrade to Kedro version `0.18.6`.
- Added the new `OmegaConfigLoader`, which uses `OmegaConf` for loading and merging configuration.
- Added the `--conf-source` option to `kedro run`, allowing users to specify a source for project configuration for the run.
- Added `omegaconf` syntax as an option for `--params`. Keys and values can now be separated by colons or equals signs.
- … `yield` instead of `return`.
- … `yield` before proceeding with the next chunk.
- … `OmegaConfigLoader`.
- Added a `--namespace` flag to `kedro run` to enable filtering by node namespace.
- … `node` for all four dataset hooks.
- Added `kedro run` flags `--nodes`, `--tags`, and `--load-versions` to replace `--node`, `--tag`, and `--load-version`.
- … `kedro run` options which take a list of nodes as inputs (`--from-nodes` and `--to-nodes`).
- Fixed a bug where the `micropkg` manifest section in `pyproject.toml` wasn't recognised as allowed configuration.
- Updated `load_ipython_extension` not to register the `%reload_kedro` line magic when called in a directory that does not contain a Kedro project.
- … `anyconfig`'s `ac_context` parameter to `kedro.config.commons` module functions for more flexible `ConfigLoader` customizations.
- Replaced the `kedro.pipeline.Pipeline` object throughout the test suite with the `kedro.modular_pipeline.pipeline` factory.
- Fixed a bug that caused the `after_dataset_saved` hook only to be called for one output dataset when multiple are saved in a single node and async saving is in use.
- … from `WARNING` to `DEBUG`.
- Updated `micropkg pull` to fix a vulnerability caused by CVE-2007-4559.
- … `kedro run`.

Many thanks to the following Kedroids for contributing PRs to this release:

- `project_version` will be deprecated in `pyproject.toml`; please use `kedro_init_version` instead.
- Deprecated `kedro run` flags `--node`, `--tag`, and `--load-version` in favour of `--nodes`, `--tags`, and `--load-versions`.
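The `yield`-related bullets above describe generator nodes: instead of `return`ing one result, a node can `yield` a sequence of chunks, and each chunk is saved before the next one is computed. A plain-Python sketch of the pattern (outside Kedro):

```python
def process_in_chunks(rows, chunk_size=2):
    """A generator 'node': yields one processed chunk at a time."""
    for i in range(0, len(rows), chunk_size):
        yield [row.upper() for row in rows[i : i + chunk_size]]


saved = []  # stand-in for a dataset's save() target

# The consumer saves each chunk as soon as it is yielded, so only one
# chunk needs to fit in memory at a time.
for chunk in process_in_chunks(["a", "b", "c", "d", "e"]):
    saved.extend(chunk)

print(saved)  # ['A', 'B', 'C', 'D', 'E']
```

This is why the changelog stresses saving "after each `yield` before proceeding with the next chunk": the consumer drives the generator, so persistence is interleaved with computation.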
Published by idanov almost 2 years ago
- … `kedro_datasets` with higher priority than `kedro.extras.datasets`. `kedro_datasets` is the namespace for the new `kedro-datasets` python package.
- The config loader objects now implement `UserDict`, and the configuration is accessed through `conf_loader['catalog']`.
- … `settings.py` without creating a custom config loader.

| Type | Description | Location |
| --- | --- | --- |
| `svmlight.SVMLightDataSet` | Work with svmlight/libsvm files using scikit-learn library | `kedro.extras.datasets.svmlight` |
| `video.VideoDataSet` | Read and write video files from a filesystem | `kedro.extras.datasets.video` |
| `video.video_dataset.SequenceVideo` | Create a video object from an iterable sequence to use with `VideoDataSet` | `kedro.extras.datasets.video` |
| `video.video_dataset.GeneratorVideo` | Create a video object from a generator to use with `VideoDataSet` | `kedro.extras.datasets.video` |
- … `dask.ParquetDataSet` to work with the `dask.to_parquet` API.
- … `kedro micropkg pull` for packages on PyPI.
- … `format` in `save_args` for `SparkHiveDataSet`; previously it didn't allow you to save it as delta format.
- … `TensorFlowModelDataset` when used without versioning; previously, it wouldn't overwrite an existing model.
- … `tf.device` in `TensorFlowModelDataset`.
- … `VersionNotFoundError` to handle insufficient permission issues for cloud storage.
- … `local_ns` rather than a global variable.
- Moved `ShelveStore` to its own module to ensure multiprocessing works with it.
- `kedro.extras.datasets.pandas.SQLQueryDataSet` now takes the optional argument `execution_options`.
- Removed the `attrs` upper bound to support newer versions of Airflow.
- … the `setuptools` dependency to `<=61.5.1`.
- `kedro test` and `kedro lint` will be deprecated.

We are grateful to the following for submitting PRs that contributed to this release: jstammers, FlorianGD, yash6318, carlaprv, dinotuku, williamcaicedo, avan-sh, Kastakin, amaralbf, BSGalvan, levimjoseph, daniel-falk, clotildeguinard, avsolatorio, and picklejuicedev for comments and input to documentation changes.
Published by idanov about 2 years ago
Implemented autodiscovery of project pipelines. A pipeline created with `kedro pipeline create <pipeline_name>` can now be accessed immediately, without needing to explicitly register it in `src/<package_name>/pipeline_registry.py`, either individually by name (e.g. `kedro run --pipeline=<pipeline_name>`) or as part of the combined default pipeline (e.g. `kedro run`). By default, the simplified `register_pipelines()` function in `pipeline_registry.py` looks like:
```python
from typing import Dict

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```
The Kedro IPython extension should now be loaded with `%load_ext kedro.ipython`.

The line magic `%reload_kedro` now accepts keyword arguments, e.g. `%reload_kedro --env=prod`.

Improved the resume-pipeline suggestion for `SequentialRunner`: it will backtrack to the closest persisted inputs to resume.
- … a `False` value for rich logging `show_locals`, to make sure credentials and other sensitive data isn't shown in logs.
- … `rich`.
- When using `kedro run -n [some_node]`, if `some_node` is missing a namespace the resulting error message will suggest the correct node name.
- … `rich` logging.
- … the `delta-spark` upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
- Added `gdrive` to the list of cloud protocols, enabling Google Drive paths for datasets.
- Deprecated `%load_ext kedro.extras.extensions.ipython`; use `%load_ext kedro.ipython` instead.
- `kedro jupyter convert`, `kedro build-docs`, `kedro build-reqs` and `kedro activate-nbstripout` will be deprecated.

Published by idanov over 2 years ago
- Added `abfss` to the list of cloud protocols, enabling `abfss` paths.
- `conf/base/logging.yml` is now optional. See our documentation for details.
- Added the `kedro.starters` entry point. This enables plugins to create custom starter aliases used by `kedro starter list` and `kedro new`.
- Reduced the `kedro new` prompts to just one question asking for the project name.
- … the `pyyaml` upper bound to make Kedro compatible with the pyodide stack.
- … `myst_parser` instead of `recommonmark`.
- … from `INFO` to `DEBUG` for low-priority messages.
- The `info.log`/`errors.log` files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
- The `root` logger is now set to the Python default level of `WARNING` rather than `INFO`. Kedro's logger is still set to emit `INFO` level messages.
- `SequentialRunner` now has consistent execution order across multiple runs with sorted nodes.
- `kedro jupyter notebook/lab` no longer reuses a Jupyter kernel.
- … `cookiecutter>=2.1.1` to address a known command injection vulnerability.
- … `getpass.getuser`.
- … `AbstractDataSet` and `AbstractVersionedDataSet`, as well as typing to all datasets.
- `kedro.config.default_logger` no longer exists; default logging configuration is now set automatically through `kedro.framework.project.LOGGING`. Unless you explicitly import `kedro.config.default_logger` you do not need to make any changes.
- `kedro.extras.ColorHandler` will be removed in 0.19.0.

Published by idanov over 2 years ago
- Added a new hook `after_context_created` that passes the `KedroContext` instance as `context`.
- … `after_command_run`.
- … the `ParserError` exception error message.
- … `SparkDataSet` to specify a `schema` load argument that allows for supplying a user-defined schema as opposed to relying on the schema inference of Spark.
- … `CONFIG_LOADER_CLASS` validation so that `TemplatedConfigLoader` can be specified in `settings.py`. Any `CONFIG_LOADER_CLASS` must be a subclass of `AbstractConfigLoader`.
- … the `run_params` dictionary used in pipeline hooks.
- Fixed `Jinja2` syntax loading with `TemplatedConfigLoader` using `globals.yml`.
- Removed `_active_session`, `_activate_session` and `_deactivate_session`. Plugins that need to access objects such as the config loader should now do so through `context` in the new `after_context_created` hook.
- `config_loader` is available as a public read-only attribute of `KedroContext`.
- Made the `hook_manager` argument optional for `runner.run`.
- `kedro docs` now opens an online version of the Kedro documentation instead of a locally built version.
- `kedro docs` will be removed in 0.19.0.
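Hooks like `after_context_created` follow a register-then-notify pattern (Kedro implements its hooks with the `pluggy` library). A toy sketch of the mechanism (a made-up class, not Kedro's hook manager):

```python
class TinyHookManager:
    """Minimal register/notify hook manager (illustrative only)."""

    def __init__(self):
        self._hooks = {}

    def register(self, event, callback):
        self._hooks.setdefault(event, []).append(callback)

    def call(self, event, **kwargs):
        # Notify every subscriber of this event, in registration order.
        for callback in self._hooks.get(event, []):
            callback(**kwargs)


seen = []
manager = TinyHookManager()
# A plugin subscribes to 'after_context_created' and receives the context.
manager.register("after_context_created", lambda context: seen.append(context))
manager.call("after_context_created", context={"env": "local"})
print(seen)  # [{'env': 'local'}]
```

This is why the breaking change above could remove `_active_session`: instead of plugins reaching into framework globals, the framework pushes the objects they need (here, `context`) to registered callbacks.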