dagster

An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License

Downloads: 12.2M
Stars: 11.1K
Committers: 367

dagster - 1.7.2 (core) / 0.23.2 (libraries)

Published by elementl-devtools 6 months ago

New

  • Performance improvements when loading large asset graphs in the Dagster UI.
  • @asset_check functions can now be invoked directly for unit testing.
  • dagster-embedded-elt dlt resource DagsterDltResource can now be used from @op definitions in addition to assets.
  • UPathIOManager.load_partitions has been added to help UPathIOManager subclasses deal with serialization formats that support partitioning. Thanks @danielgafni!
  • [dagster-polars] Partitioning columns can now use data types other than string. PolarsDeltaIOManager also now supports MultiPartitionsDefinition with DeltaLake native partitioning. To enable this feature, specify the metadata value "partition_by": {"dim_1": "col_1", "dim_2": "col_2"}. Thanks @danielgafni!

Bugfixes

  • [dagster-airbyte] Auto materialization policies passed to load_assets_from_airbyte_instance and load_assets_from_airbyte_project will now be properly propagated to the created assets.
  • Fixed an issue where deleting a run that was intended to materialize a partitioned asset would sometimes leave the status of that asset as “Materializing” in the Dagster UI.
  • Fixed an issue with build_time_partition_freshness_checks where it would incorrectly intuit that an asset was not fresh in certain cases.
  • [dagster-k8s] Fix an error on transient ‘none’ responses for pod waiting reasons. Thanks @piotrmarczydlo!
  • [dagster-dbt] Failing to build column schema metadata will now result in a warning rather than an error.
  • Fixed an issue where incorrect asset keys would cause a backfill to fail loudly.
  • Fixed an issue where syncing unmaterialized assets could include source assets.

Breaking Changes

  • [dagster-polars] PolarsDeltaIOManager no longer supports loading natively partitioned DeltaLake tables as dictionaries. They should be loaded as a single pl.DataFrame/pl.LazyFrame instead.

Documentation

  • Renamed Dagster Cloud to Dagster+ throughout the docs.
  • Added a page about Change Tracking in Dagster+ branch deployments.
  • Added a section about user-defined metrics to the Dagster+ Insights docs.
  • Added a section about Asset owners to the asset metadata docs.

Dagster Cloud

  • Branch deployments now have Change Tracking. Assets in each branch deployment will be compared to the main deployment. New assets and changes to code version, dependencies, partitions definitions, tags, and metadata will be marked in the UI of the branch deployment.
  • Pagerduty alerting is now supported with Pro plans. See the documentation for more info.
  • Asset metadata is now included in the insights metrics for jobs materializing those assets.
  • Per-run Insights are now available on individual assets.
  • Previously, the before_storage_id / after_storage_id values in the AssetRecordsFilter class were ignored. This has been fixed.
  • Updated the output of dagster-cloud deployment alert-policies list to match the format of sync.
  • Fixed an issue where Dagster Cloud agents with many code locations would sometimes leave code servers running after the agent shut down.
dagster - 1.7.1 (core) / 0.23.1 (libraries)

Published by elementl-devtools 6 months ago

New

  • [dagster-dbt][experimental] A new CLI command, dagster-dbt project prepare-for-deployment, has been added in conjunction with DbtProject for managing the behavior of rebuilding the manifest during development and preparing a pre-built one for production.

Bugfixes

  • Fixed an issue with duplicate asset check keys when loading checks from a package.
  • A bug with the new build_last_update_freshness_checks and build_time_partition_freshness_checks has been fixed where multi_asset checks passed in would not be executable.
  • [dagster-dbt] Fixed some issues with building column lineage for incremental models, models with implicit column aliases, and models with columns that have multiple dependencies on the same upstream column.

Breaking Changes

  • [dagster-dbt] The experimental DbtArtifacts class has been replaced by DbtProject.

Documentation

  • Added a dedicated concept page for all things metadata and tags
  • Moved asset metadata content to a dedicated concept page: Asset metadata
  • Added section headings to the Software-defined Assets API reference, which groups APIs by asset type or use
  • Added a guide about user settings in the Dagster UI
  • Added AssetObservation to the Software-defined Assets API reference
  • Renamed Dagster Cloud GitHub workflow files to the new, consolidated dagster-cloud-deploy.yml
  • Miscellaneous formatting and copy updates
  • [community-contribution] [dagster-embedded-elt] Fixed get_asset_key API documentation (thanks @aksestok!)
  • [community-contribution] Updated Python version in contributing documentation (thanks @piotrmarczydlo!)
  • [community-contribution] Typo fix in README (thanks @MiConnell!)

Dagster Cloud

  • Fixed a bug where an incorrect value was being emitted for BigQuery bytes billed in Insights.
dagster - 1.7.0 (core) / 0.23.0 (libraries)

Published by elementl-devtools 7 months ago

Major Changes since 1.6.0 (core) / 0.22.0 (libraries)

  • Asset definitions can now have tags, via the tags argument on @asset, AssetSpec, and AssetOut. Tags are meant to be used for organizing, filtering, and searching for assets.
  • The Asset Details page has been revamped to include an “Overview” tab that centralizes the most important information about the asset – such as current status, description, and columns – in a single place.
  • Assets can now be assigned owners.
  • Asset checks are now considered generally available and will no longer raise experimental warnings when used.
  • Asset checks can now be marked blocking, which causes downstream assets in the same run to be skipped if the check fails with ERROR-level severity.
  • The new @multi_asset_check decorator enables defining a single op that executes multiple asset checks.
  • The new build_last_update_freshness_checks and build_time_partition_freshness_checks APIs allow defining asset checks that error or warn when an asset is overdue for an update. Refer to the Freshness checks guide for more info.
  • The new build_column_schema_change_checks API allows defining asset checks that warn when an asset’s columns have changed since its latest materialization.
  • In the asset graph UI, the “Upstream data”, “Code version changed”, and “Upstream code version” statuses have been collapsed into a single “Unsynced” status. Clicking on “Unsynced” displays more detailed information.
  • I/O managers are now optional. This enhances flexibility for scenarios where they are not necessary. For guidance, see When to use I/O managers.
    • Assets with None or MaterializeResult return type annotations won't use I/O managers; dependencies for these assets can be set using the deps parameter in the @asset decorator.
  • [dagster-dbt] Dagster’s dbt integration can now be configured to automatically collect metadata about column schema and column lineage.
  • [dagster-dbt] dbt tests are now pulled in as Dagster asset checks by default.
  • [dagster-dbt] dbt resource tags are now automatically pulled in as Dagster asset tags.
  • [dagster-dbt] dbt owners from dbt groups are now automatically pulled in as Dagster owners.
  • [dagster-snowflake] [dagster-gcp] The dagster-snowflake and dagster-gcp packages now both expose a fetch_last_updated_timestamps API, which makes it straightforward to collect data freshness information in source asset observation functions.
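
The blocking behavior of asset checks listed above can be pictured with a small plain-Python sketch. This is not Dagster code: plan_run, the Severity enum, and the linear chain of asset names are hypothetical stand-ins, and the sketch assumes only what the note states, namely that a blocking check failing with ERROR severity causes downstream assets in the same run to be skipped.

```python
from enum import Enum


class Severity(Enum):
    WARN = "WARN"
    ERROR = "ERROR"


def plan_run(chain, failed_checks, blocking):
    """Walk a linear chain of assets and decide which ones execute.

    chain: asset names in dependency order (each depends on the previous one).
    failed_checks: maps an asset name to the Severity of its failed check.
    blocking: set of asset names whose check is marked blocking.
    """
    executed, skipped = [], []
    blocked = False
    for name in chain:
        if blocked:
            skipped.append(name)
            continue
        executed.append(name)
        severity = failed_checks.get(name)
        # Only a blocking check that fails with ERROR severity stops
        # downstream assets in the same run from executing.
        if name in blocking and severity is Severity.ERROR:
            blocked = True
    return executed, skipped


executed, skipped = plan_run(
    chain=["raw_orders", "clean_orders", "order_report"],
    failed_checks={"raw_orders": Severity.ERROR},
    blocking={"raw_orders"},
)
```

In this sketch a WARN-level failure, or a failing check that is not marked blocking, leaves the downstream assets untouched.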

Changes since 1.6.14 (core) / 0.22.14 (libraries)

New

  • Metadata attached during asset or op execution can now be accessed in the I/O manager using OutputContext.output_metadata.
  • [experimental] Single-run backfills now support batched inserts of asset materialization events. This is a major performance improvement for large single-run backfills that have database writes as a bottleneck. The feature is off by default and can be enabled by setting the DAGSTER_EVENT_BATCH_SIZE environment variable in a code server to an integer (25 recommended, 50 max). It is only currently supported in Dagster Cloud and OSS deployments with a postgres backend.
  • [ui] The new Asset Details page is now enabled for new users by default. To turn this feature off, you can toggle the feature in the User Settings.
  • [ui] Queued runs now display a link to view all the potential reasons why a run might remain queued.
  • [ui] Starting a run status sensor with a stale cursor will now warn you in the UI that it will resume from the point that it was paused.
  • [asset-checks] Asset checks now support asset names that include ., which can occur when checks are ingested from dbt tests.
  • [dagster-dbt] The env var DBT_INDIRECT_SELECTION will no longer be set to empty when executing dbt tests as asset checks, unless specific asset checks are excluded. dagster-dbt will no longer explicitly select all dbt tests with the dbt cli, which had caused argument length issues.
  • [dagster-dbt] Singular tests with a single dependency are now ingested as asset checks.
  • [dagster-dbt] Singular tests with multiple dependencies must have the primary dependency specified using dbt meta:
{{
    config(
        meta={
            'dagster': {
                'ref': {
                    'name': <ref_name>,
                    'package': ... # Optional, if included in the ref.
                    'version': ... # Optional, if included in the ref.
                },
            }
        }
    )
}}

...
  • [dagster-dbt] Column lineage metadata can now be emitted when invoking dbt. See the documentation for details.
  • [experimental][dagster-embedded-elt] Added the data load tool (dlt) integration for easily building and integrating dlt ingestion pipelines with Dagster.
  • [dagster-dbt][community-contribution] You can now specify a custom schedule name for schedules created with build_schedule_from_dbt_selection. Thanks @dragos-pop!
  • [helm][community-contribution] You can now specify a custom job namespace for your user code deployments. Thanks @tmatthews0020!
  • [dagster-polars][community-contribution] Column schema metadata is now integrated using the dagster-specific metadata key in dagster_polars. Thanks @danielgafni!
  • [dagster-datadog][community-contribution] Added datadog.api module to the DatadogClient resource, enabling direct access to API methods. Thanks @shivgupta!
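
The batched-insert setting in the list above can be sketched in plain Python. Only the variable name DAGSTER_EVENT_BATCH_SIZE and the limits (25 recommended, 50 max, off by default) come from the note. The helper names read_batch_size and batches are hypothetical, and rejecting out-of-range values is an assumption of this sketch, not documented Dagster behavior.

```python
import os

MAX_BATCH_SIZE = 50  # upper bound stated in the release note


def read_batch_size(env=os.environ):
    """Return the configured batch size, or 0 when batching is off (the default)."""
    raw = env.get("DAGSTER_EVENT_BATCH_SIZE")
    if raw is None:
        return 0
    size = int(raw)
    # Assumption: this sketch rejects values outside 1..50; how Dagster
    # itself treats such values is not stated in the note.
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    return size


def batches(events, size):
    """Yield lists of at most `size` events; size 0 means one event per write."""
    step = size or 1
    for start in range(0, len(events), step):
        yield events[start:start + step]


# 60 materialization events with the recommended size give three writes.
write_sizes = [len(b) for b in batches(list(range(60)), 25)]
```

With batching, the number of database writes drops from one per event to one per batch, which is where the improvement for write-bound backfills would come from.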

Bugfixes

  • Fixed a bug where run status sensors configured to monitor a specific job would trigger for jobs with the same name in other code locations.
  • Fixed a bug where multi-line asset check result descriptions were collapsed into a single line.
  • Fixed a bug that caused a value to show up under “Target materialization” in the asset check UI even when an asset had had observations but never been materialized.
  • Changed typehint of metadata argument on multi_asset and AssetSpec to Mapping[str, Any].
  • [dagster-snowflake-pandas] Fixed a bug introduced in 0.22.4 where column names were not using quote identifiers correctly. Column names will now be quoted.
  • [dagster-aws] Fixed a race condition where simultaneously materializing the same asset more than once would sometimes raise an Exception when using the s3_io_manager.
  • [ui] Fixed a bug where resizable panels could inadvertently be hidden and never recovered, for instance the right panel on the global asset graph.
  • [ui] Fixed a bug where opening a run with an op selection in the Launchpad could lose the op selection setting for the subsequently launched run. The op selection is now correctly preserved.
  • [community-contribution] Fixed dagster-polars tests by excluding Decimal types. Thanks @ion-elgreco!
  • [community-contribution] Fixed a bug where auto-materialize rule evaluation would error on FIPS-compliant machines. Thanks @jlloyd-widen!
  • [community-contribution] Fixed an issue where an excessive DeprecationWarning was being issued for a ScheduleDefinition passed into the Definitions object. Thanks @2Ryan09!

Breaking Changes

  • Creating a run with a custom non-UUID run_id was previously private and only used for testing. It will now raise an exception.
  • [community-contribution] Previously, calling get_partition_keys_in_range on a MultiPartitionsDefinition would erroneously return partition keys that were within the one-dimensional range of alphabetically-sorted partition keys for the definition. Now, this method returns the cartesian product of partition keys within each dimension’s range. Thanks, @mst!
  • Added AssetCheckExecutionContext to replace AssetExecutionContext as the type of the context param passed in to @asset_check functions. @asset_check was an experimental decorator.
  • [experimental] @classmethod decorators have been removed from the dagster-embedded-elt.sling DagsterSlingTranslator.
  • [dagster-dbt] @classmethod decorators have been removed from DagsterDbtTranslator.
  • [dagster-k8s] The default merge behavior when raw kubernetes config is supplied at multiple scopes (for example, at the instance level and for a particular job) has been changed to be more consistent. Previously, configuration was merged shallowly by default, with fields replacing other fields instead of appending or merging. Now, it is merged deeply by default, with lists appended to each other and dictionaries merged, in order to be more consistent with how kubernetes configuration is combined in all other places. See the docs for more information, including how to restore the previous default merge behavior.
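
The difference between the old shallow merge and the new deep merge of kubernetes config can be illustrated with a generic sketch. It is not the dagster-k8s implementation: shallow_merge and deep_merge are hypothetical helpers, and the config keys are only illustrative. It follows the rule stated above, with lists appended and dictionaries merged.

```python
def shallow_merge(base, override):
    """Previous default: a top-level field in `override` replaces the one in `base`."""
    return {**base, **override}


def deep_merge(base, override):
    """New default: dictionaries are merged recursively and lists are appended."""
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = deep_merge(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = current + value
        else:
            merged[key] = value
    return merged


# Illustrative config supplied at two scopes.
instance_level = {
    "container_config": {
        "env": [{"name": "A", "value": "1"}],
        "resources": {"limits": {"cpu": "1"}},
    }
}
job_level = {
    "container_config": {
        "env": [{"name": "B", "value": "2"}],
        "resources": {"limits": {"memory": "1Gi"}},
    }
}

shallow = shallow_merge(instance_level, job_level)
deep = deep_merge(instance_level, job_level)
```

Under the shallow merge, the job-level container_config replaces the instance-level one entirely, so the A variable and the cpu limit are lost. Under the deep merge, both env entries and both limits survive.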

Deprecations

  • AssetSelection.keys() has been deprecated. Instead, you can now supply asset key arguments to AssetSelection.assets().
  • Run tag keys with long lengths and certain characters are now deprecated. For consistency with asset tags, run tag keys are expected to only contain alpha-numeric characters, dashes, underscores, and periods. Run tag keys can also contain a prefix section, separated with a slash. The main section and prefix section of a run tag are limited to 63 characters.
  • AssetExecutionContext has been simplified. Op-related methods and methods with existing access paths have been marked deprecated. For a full list of deprecated methods see this GitHub Discussion.
  • The metadata property on InputContext and OutputContext has been deprecated and renamed to definition_metadata.
  • FreshnessPolicy is now deprecated. For monitoring freshness, use freshness checks instead. If you are using AutoMaterializePolicy.lazy(), FreshnessPolicy is still recommended, and will continue to be supported until an alternative is provided.
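
The run tag key rules in the list above can be written down as a regular expression. This is a minimal sketch of the stated rules only; is_valid_run_tag_key is a hypothetical helper, the sketch assumes at most one prefix section, and Dagster's own validation may apply additional constraints.

```python
import re

# One section: 1 to 63 alphanumerics, dashes, underscores, or periods.
_SECTION = r"[A-Za-z0-9_.-]{1,63}"
# An optional prefix section, a slash, then the main section.
_TAG_KEY = re.compile(rf"(?:{_SECTION}/)?{_SECTION}")


def is_valid_run_tag_key(key: str) -> bool:
    """Return True when `key` follows the rules described in the note."""
    return _TAG_KEY.fullmatch(key) is not None


results = {
    "team": is_valid_run_tag_key("team"),
    "dagster/partition": is_valid_run_tag_key("dagster/partition"),
    "my tag": is_valid_run_tag_key("my tag"),
    "too long": is_valid_run_tag_key("a" * 64),
}
```

Keys containing spaces or a section longer than 63 characters fail the check, which matches the kinds of keys the note marks as deprecated.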

Dagster Cloud

  • The Dagster Cloud agent will now monitor the code servers that it spins to detect whether they have stopped serving requests, and will automatically redeploy the code server if it has stopped responding for an extended period of time.
  • New additions and bugfixes in Insights:
    • Added per-metric cost estimation. Estimates can be added via the “Insights settings” button, and will appear in the table and chart for that metric.
    • Branch deployments are now included in the deployment filter control.
    • In the Deployments view, fixed deployment links in the data table.
    • Added support for BigQuery cost metrics.
dagster - 1.6.14 (core) / 0.22.14 (libraries)

Published by elementl-devtools 7 months ago

Bugfixes

  • [dagster-dbt] Fixed some issues with building column lineage metadata.
dagster - 1.6.13 (core) / 0.22.13 (libraries)

Published by elementl-devtools 7 months ago

Bugfixes

  • Fixed a bug where an asset with a dependency on a subset of the keys of a parent multi-asset could sometimes crash asset job construction.
  • Fixed a bug where a Definitions object containing assets having integrated asset checks and multiple partitions definitions could not be loaded.
dagster - 1.6.12 (core) / 0.22.12 (libraries)

Published by elementl-devtools 7 months ago

New

  • AssetCheckResult now has a text description property. Check evaluation descriptions are shown in the Checks tab on the asset details page.
  • Introduced TimestampMetadataValue. Timestamp metadata values are represented internally as seconds since the Unix epoch. They can be constructed using MetadataValue.timestamp. In the UI, they’re rendered in the local timezone, like other timestamps in the UI.
  • AssetSelection.checks can now accept AssetCheckKeys as well as AssetChecksDefinition.
  • [community-contribution] Metadata attached to an output at runtime (via either add_output_metadata or by passing to Output) is now available on HookContext under the op_output_metadata property. Thanks @JYoussouf!
  • [experimental] @asset, AssetSpec, and AssetOut now accept a tags property. Tags are key-value pairs meant to be used for organizing asset definitions. If "__dagster_no_value" is set as the value, only the key will be rendered in the UI. AssetSelection.tag allows selecting assets that have a particular tag.
  • [experimental] Asset tags can be used in asset CLI selections, e.g. dagster asset materialize --select tag:department=marketing
  • [experimental][dagster-dbt] Tags can now be configured on dbt assets, using DagsterDbtTranslator.get_tags. By default, we take the dbt tags configured on your dbt models, seeds, and snapshots.
  • [dagster-gcp] Added get_gcs_keys sensor helper function.
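
Tag-based selection, as in the --select tag:department=marketing example above, amounts to filtering assets by a key-value pair. The following plain-Python sketch shows the idea without using Dagster. The select_by_tag helper and the asset names are hypothetical, and only the tag:key=value form shown in the note is handled.

```python
def select_by_tag(assets, selection):
    """Return the sorted names of assets matching a 'tag:key=value' selection.

    assets: maps an asset name to its dictionary of tags.
    """
    prefix = "tag:"
    if not selection.startswith(prefix):
        raise ValueError("selection must start with 'tag:'")
    key, separator, value = selection[len(prefix):].partition("=")
    if not separator:
        raise ValueError("this sketch only handles the key=value form")
    return sorted(name for name, tags in assets.items() if tags.get(key) == value)


assets = {
    "campaigns": {"department": "marketing"},
    "invoices": {"department": "finance"},
    "leads": {"department": "marketing", "tier": "gold"},
}
selected = select_by_tag(assets, "tag:department=marketing")
```

An asset is selected when it carries the tag key with exactly the requested value; other tags on the asset do not affect the match.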

Bugfixes

  • Fixed a bug that prevented external assets with dependencies from displaying properly in Dagster UI.
  • Fix a performance regression in loading code locations with large multi-assets.
  • [community-contribution] [dagster-databricks] Fix a bug with the DatabricksJobRunner that led to an inability to use dagster-databricks with Databricks instance pools. Thanks @smats0n!
  • [community-contribution] Fixed a bug that caused a crash when external assets had hyphens in their AssetKey. Thanks @maxfirman!
  • [community-contribution] Fix a bug with load_assets_from_package_module that would cause a crash when any submodule had the same directory name as a dependency. Thanks @CSRessel!
  • [community-contribution] Fixed a mypy type error, thanks @parthshyara!
  • [community-contribution][dagster-embedded-elt] Fixed an issue where Sling assets would not properly read group and description metadata from replication config, thanks @jvyoralek!
  • [community-contribution] Ensured annotations from the helm chart properly propagate to k8s run pods, thanks @maxfirman!

Dagster Cloud

  • Fixed an issue in Dagster Cloud Serverless runs where multiple runs simultaneously materializing the same asset would sometimes raise a “Key not found” exception.
  • Fixed an issue when using agent replicas where one replica would sporadically remove a code server created by another replica due to a race condition, leading to a “code server not found” or “Deployment not found” exception.
  • [experimental] The metadata key for specifying column schema that will be rendered prominently on the new Overview tab of the asset details page has been changed from "columns" to "dagster/column_schema". Materializations using the old metadata key will no longer result in the Columns section of the tab being filled out.
  • [ui] Fixed an Insights bug where loading a view filtered to a specific code location would not preserve that filter on pageload.
dagster - 1.6.11 (core) / 0.22.11 (libraries)

Published by elementl-devtools 7 months ago

Bugfixes

  • Fixed an issue where dagster dev or the Dagster UI would display an error when loading jobs created with op or asset selections.
dagster - 1.6.10 (core) / 0.22.10 (libraries)

Published by elementl-devtools 7 months ago

New

  • Latency improvements to the scheduler when running many simultaneous schedules.

Bugfixes

  • The performance of loading the Definitions snapshot from a code server when large @multi_asset definitions are in use has been drastically improved.
  • The snowflake quickstart example project now renames the “by” column to avoid reserved snowflake names. Thanks @jcampbell!
  • The existing group name (if any) for an asset is now retained if the_asset.with_attributes is called without providing a group name. Previously, the existing group name was erroneously dropped. Thanks @ion-elgreco!
  • [dagster-dbt] Fixed an issue where Dagster events could not be streamed from dbt source freshness.
  • [dagster university] Removed redundant use of MetadataValue in Essentials course. Thanks @stianthaulow!
  • [ui] Increased the max number of plots on the asset plots page to 100.

Breaking Changes

  • The tag_keys argument on DagsterInstance.get_run_tags is no longer optional. This has been done to remove an easy way of accidentally executing an extremely expensive database operation.

Dagster Cloud

  • The maximum number of concurrent runs across all branch deployments is now configurable. This setting can now be set via GraphQL or the CLI.
  • [ui] In Insights, fixed display of table rows with zero change in value from the previous time period.
  • [ui] Added deployment-level Insights.
  • [ui] Fixed an issue causing void invoices to show up as “overdue” on the billing page.
  • [experimental] Branch deployments can now indicate the new and modified assets in the branch deployment as compared to the main deployment. To enable this feature, turn on the “Enable experimental branch deployment asset graph diffing” user setting.
dagster - 1.6.9 (core) / 0.22.9 (libraries)

Published by elementl-devtools 8 months ago

New

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] Groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fix a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks hainenber)!

Experimental

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved the guides and reference for running multiple isolated agents with separate queues on ECS.

Dagster Cloud

  • Microsoft Teams is now supported for alerts. See the documentation for more info.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.
dagster - 1.6.8 (core) / 0.22.8 (libraries)

Published by elementl-devtools 8 months ago

Bugfixes

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental

  • [asset checks] graph_multi_assets with check_specs now support subsetting.
dagster - 1.6.7 (core) / 0.22.7 (libraries)

Published by elementl-devtools 8 months ago

New

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
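
A subprocess can use the DAGSTER_IS_DEV_CLI variable mentioned above to detect a development context. The sketch below checks only for the presence of the variable, because the note does not say what value dagster dev assigns to it. The launched_from_dagster_dev helper is a hypothetical name.

```python
import os


def launched_from_dagster_dev(env=os.environ):
    """Return True when the DAGSTER_IS_DEV_CLI variable is present in `env`."""
    # Presence check only: the value that `dagster dev` assigns is not
    # specified in the release note, so it is not inspected here.
    return "DAGSTER_IS_DEV_CLI" in env


# Example: pick a log level depending on how the process was launched.
log_level = "DEBUG" if launched_from_dagster_dev() else "INFO"
```

Passing a dictionary instead of os.environ makes the check easy to exercise in unit tests.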

Bugfixes

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] The new asset decorator @sling_assets and the new resource SlingConnectionResource have been added to the dagster-embedded-elt.sling package. build_sling_asset, SlingSourceConnection, and SlingTargetConnection are now deprecated.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.
dagster - 1.6.6 (core) / 0.22.6 (libraries)

Published by elementl-devtools 8 months ago

New

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted while it was being terminated.
dagster - 1.6.5 (core) / 0.22.5 (libraries)

Published by elementl-devtools 8 months ago

New

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
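
Lexicographical ordering of partition keys, as described in the first item above, is worth a quick illustration because it differs from numeric ordering. The sketch assumes the ordering behaves like Python's default string comparison, which the note does not spell out, and the keys are made up.

```python
# ISO-formatted date keys sort chronologically under string comparison.
date_keys = ["2024-01-10", "2024-01-02", "2023-12-31"]
date_order = sorted(date_keys)

# Numeric-looking keys do not: they are compared character by character.
numeric_keys = ["10", "9", "2"]
numeric_order = sorted(numeric_keys)
```

Here date_order is chronological, while numeric_order places "10" before "2" and "9". Zero-padding numeric partition keys would make the two orderings agree.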

Bugfixes

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O managers versus resources more prominent.
dagster - 1.6.4 (core) / 0.22.4 (libraries)

Published by elementl-devtools 8 months ago

New

  • build_schedule_from_partitioned_job now supports creating a schedule from a static-partitioned job (Thanks @craustin!)
  • [dagster-pipes] PipesK8sClient will now autodetect the namespace when using in-cluster config. (Thanks @aignas!)
  • [dagster-pipes] PipesK8sClient can now inject the context into multiple containers. (Thanks @aignas!)
  • [dagster-snowflake] The Snowflake Pandas I/O manager now uses the write_pandas method to load Pandas DataFrames in Snowflake. To support this change, the database connector was switched from SqlDbConnection to SnowflakeConnection.
  • [ui] On the overview sensors page you can now filter sensors by type.
  • [dagster-deltalake-polars] Added LazyFrame support (Thanks @ion-elgreco!)
  • [dagster-dbt] When using @dbt_assets and multiple dbt resources produce the same AssetKey, we now display an exception message that highlights the file paths of the misconfigured dbt resources in your dbt project.
  • [dagster-k8s] The debug info reported upon failure has been improved to include additional information from the Job. (Thanks @jblawatt!)
  • [dagster-k8s] Changed the Dagster Helm chart to apply automountServiceAccountToken: false to the default service account used by the Helm chart, in order to better comply with security policies. (Thanks @MattyKuzyk!)

Bugfixes

  • An unnecessary thread lock has been removed from the sensor daemon. This should improve sensor throughput for users with many sensors who have enabled threading.
  • Retry from failure behavior has been improved for cases where dynamic steps were interrupted.
  • Previously, when backfilling a set of assets which shared a BackfillPolicy and PartitionsDefinition, but had a non-default partition mapping between them, a run for the downstream asset could be launched at the same time as a separate run for the upstream asset, resulting in inconsistent partition ordering. Now, the downstream asset will only execute after the parents complete. (Thanks @ruizh22!)
  • Previously, asset backfills would raise an exception if the code server became unreachable mid-iteration. Now, the backfill will pause until the next evaluation.
  • Fixed a bug that was causing ranged backfills over dynamically partitioned assets to fail.
  • [dagster-pipes] PipesK8sClient has improved handling for init containers and additional containers. (Thanks @aignas!)
  • Fixed the last_sensor_start_time property of the SensorEvaluationContext, which would get cleared on ticks after the first tick after the sensor starts.
  • [dagster-mysql] Fixed the optional dagster instance migrate --bigint-migration, which caused some operational errors on MySQL storage.

Deprecations

  • The following methods on AssetExecutionContext have been marked deprecated, with their suggested replacements in parentheses:
    • context.op_config (context.op_execution_context.op_config)
    • context.node_handle (context.op_execution_context.node_handle)
    • context.op_handle (context.op_execution_context.op_handle)
    • context.op (context.op_execution_context.op)
    • context.get_mapping_key (context.op_execution_context.get_mapping_key)
    • context.selected_output_names (context.op_execution_context.selected_output_names)
    • context.dagster_run (context.run)
    • context.run_id (context.run.run_id)
    • context.run_config (context.run.run_config)
    • context.run_tags (context.run.tags)
    • context.has_tag (key in context.run.tags)
    • context.get_tag (context.run.tags.get(key))
    • context.get_op_execution_context (context.op_execution_context)
    • context.asset_partition_key_for_output (context.partition_key)
    • context.asset_partition_keys_for_output (context.partition_keys)
    • context.asset_partitions_time_window_for_output (context.partition_time_window)
    • context.asset_partition_key_range_for_output (context.partition_key_range)
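A few of the replacements listed above can be sketched as follows. This is only an illustration: the `context` object here is a stand-in built with `SimpleNamespace` (a real AssetExecutionContext is supplied by Dagster at runtime), and the run ID, tag names, and partition key are hypothetical values. Only the attribute paths mirror the list.

```python
from types import SimpleNamespace

# Stand-in for an AssetExecutionContext; not a real Dagster object.
# Only the attribute shape (context.run.tags, context.run.run_id,
# context.partition_key) follows the replacements listed above.
context = SimpleNamespace(
    run=SimpleNamespace(run_id="example-run-id", tags={"team": "data"}),
    partition_key="2024-01-01",
)

# Deprecated: context.run_id -> context.run.run_id
run_id = context.run.run_id

# Deprecated: context.has_tag("team") -> "team" in context.run.tags
has_team_tag = "team" in context.run.tags

# Deprecated: context.get_tag("owner") -> context.run.tags.get("owner")
owner = context.run.tags.get("owner")  # dict.get yields None for a missing key

# Deprecated: context.asset_partition_key_for_output -> context.partition_key
partition_key = context.partition_key
```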

Experimental

  • [asset checks] @asset_check now has a blocking parameter. When this is enabled, if the check fails with severity ERROR then any downstream assets in the same run won’t execute.

Documentation

  • The Branch Deployment docs have been updated to reflect support for backfills
  • Added Dagster’s maximum supported Python version (3.11) to Dagster University and relevant docs
  • Added documentation for recommended partition limits (a maximum of 25K per asset).
  • References to the Enterprise plan have been renamed to Pro, to reflect recent plan name changes
  • Added syntax example for setting environment variables in PowerShell to our dbt with Dagster tutorial
  • [Dagster University] Updated Dagster Essentials to Dagster v1.6 and introduced the usage of MaterializeResult
  • [Dagster University] Fixed a typo in the Dagster University section on adding partitions to an asset (Thanks Brandon Peebles!)
  • [Dagster University] Corrected lesson where sensors are covered (Thanks onefloid!)

Dagster Cloud

  • Agent tokens can now be locked down to particular deployments. Agents will not be able to run any jobs scheduled for deployments that they are not permitted to access. By default, agent tokens have access to all deployments in an organization. Use the Edit button next to an agent token on the Tokens tab in Org Settings to configure permissions for a particular token. You must be an Organization Admin to edit agent token permissions.
dagster - 1.6.3 (core) / 0.22.3 (libraries)

Published by elementl-devtools 9 months ago

New

  • Added support for the 3.0 release of the pendulum library, for Python versions 3.9 and higher.
  • Performance improvements when starting run worker processes or step worker processes for runs in code locations with a large number of jobs.
  • AllPartitionMapping now supports mapping to downstream partitions, enabling asset backfills with these dependencies. Thanks @craustin!
  • [asset checks][experimental] @asset_check has new fields additional_deps and additional_ins to allow dependencies on assets other than the asset being checked.
  • [ui] Asset graph group nodes now show status counts.
  • [dagster-snowflake] The Snowflake I/O Manager now has more specific error handling when a table doesn’t exist.
  • [ui] [experimental] A new experimental UI for the auto-materialize history of a specific asset has been added. This view can be enabled under your user settings by setting “Use new asset auto-materialize history page”.
  • [ui] Command clicking on an asset group will now select or deselect all assets in that group.
  • [dagster-k8s] Added the ability to customize resource limits for initContainers used by Dagster system components in the Dagster Helm chart. Thanks @MattyKuzyk!
  • [dagster-k8s] Added the ability to specify additional containers and initContainers in code locations in the Helm chart. Thanks @craustin!
  • [dagster-k8s] Explicitly listed the set of RBAC permissions used by the agent Helm chart role instead of using a wildcard. Thanks @easontm!
  • [dagster-dbt] Support for dbt-core==1.4.* is now removed because the version has reached end-of-life.

Bugfixes

  • Previously, calling get_partition_keys_not_in_subset on a BaseTimeWindowPartitionsSubset that targeted a partitions definition with no partitions (e.g. a future start date) would raise an error. Now, it returns an empty list.
  • Fixed issue which could cause invalid runs to be launched if a code location was updated during the course of an AMP evaluation.
  • Previously, some asset backfills raised an error when targeting multi-assets with internal asset dependencies. This has been fixed.
  • Previously, using the LocalComputeLogManager on Windows could result in errors relating to invalid paths. This has been resolved. Thanks @hainenber!
  • An outdated path in the contribution guide has been updated. Thanks @hainenber!
  • [ui] Previously an error was sometimes raised when attempting to create a dynamic partition within a multi-partitioned asset via the UI. This has been fixed.
  • [ui] The “Upstream materializations are missing” warning when launching a run has been expanded to warn about failed upstream materializations as well.
  • [ui] The community welcome modal now renders properly in dark mode and some elements of Asset and Op graphs have higher contrast in both themes.
  • [ui] Fixed dark mode colors for datepicker, error message, and op definition elements.
  • [ui] Pressing the arrow keys to navigate op/asset graphs while the layout is loading no longer causes errors.
  • [ui] Exporting asset and op graphs to SVG no longer fails when Chrome extensions inject additional stylesheets into Dagster’s UI.
  • [ui] Dagster now defaults to UTC when the user’s default timezone cannot be identified, rather than crashing with a date formatting error.
  • [ui] Fixed an issue in the asset graph sidebar that caused groups to only list their first asset.
  • [ui] Fixed an issue where sensor runs would undercount the number of dynamic partition requests added or deleted if there were multiple requests for additions/deletions.
  • [docs] Fixed a typo in the “Using Dagster with Delta Lake” guide. Thanks @avriiil!
  • [asset checks] Fixed an issue which could cause errors when using asset checks with step launchers.
  • [dagster-webserver] A bug preventing WebSocket connections from establishing on Python 3.11+ has been fixed.
  • [dagster-databricks] DatabricksJobRunner now ensures the correct databricks-sdk is installed. Thanks @zyd14!
  • [dagster-dbt] On run termination, an interrupt signal is now correctly forwarded to any in-progress dbt subprocesses.
  • [dagster-dbt] Descriptions for dbt tests ingested as asset checks can now be populated using the config.meta.description. Thanks @CapitanHeMo!
  • [dagster-dbt] Previously, the error message displayed when no dbt profiles information was found would display an incorrect path. This has been fixed. Thanks @zoltanctoth!
  • [dagster-k8s] PipesK8sClient can now correctly handle load_incluster_config. Thanks @aignas!

Documentation

  • Added a new category to Concepts: Automation. This page provides a high-level overview of the various ways Dagster allows you to run data pipelines without manual intervention.
  • Moved several concept pages under Concepts > Automation: Schedules, Sensors, Asset Sensors, and Auto-materialize Policies.

Dagster Cloud

  • Fixed an issue where configuring the agent_queue key in a dagster_cloud.yaml file incorrectly failed to validate when using the dagster-cloud ci init or dagster-cloud ci check commands during CI/CD.
dagster - 1.6.2 (core) / 0.22.2 (libraries)

Published by elementl-devtools 9 months ago

New

  • The warning for unloadable sensors and schedules in the Dagster UI has now been removed.
  • When viewing an individual sensor or schedule, we now provide a button to reset the status of the sensor or schedule back to its default status as defined in code.

Experimental

  • [asset-checks] dbt asset checks now respect warn_if / error_if severities.

Dagster Cloud

  • Fixed a bug introduced in 1.6.0 where run status sensors did not cursor correctly when deployed on Dagster Cloud.
  • Schedule and sensor mutations are now tracked in the audit log.
dagster - 1.6.1 (core) / 0.22.1 (libraries)

Published by elementl-devtools 9 months ago

New

  • Added experimental functionality which hides user code errors from the Dagster UI. You may enable this functionality by setting the DAGSTER_REDACT_USER_CODE_ERRORS environment variable to 1.
  • [dagster-dbt] @dbt_assets now accepts a required_resource_keys argument.
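One way to set the DAGSTER_REDACT_USER_CODE_ERRORS variable mentioned above is to pass it in the environment of the process that starts Dagster. The sketch below only builds that environment; the launch command in the comment is an assumption about how Dagster is started in your setup and is left commented out.

```python
import os

# Copy the current environment and enable redaction for a child process.
env = dict(os.environ)
env["DAGSTER_REDACT_USER_CODE_ERRORS"] = "1"

# Hypothetical launch step (not executed here); adjust to however you
# start Dagster in your deployment:
#   import subprocess
#   subprocess.run(["dagster", "dev"], env=env, check=True)
```

Exporting the variable in the shell or in your deployment configuration has the same effect.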

Bugfixes

  • Fixed a bug where a run that targets no steps is launched by an asset backfill when code updates are pushed after backfill launch time.
  • Previously a GraphQL error would be thrown on the asset backfill page if certain unpartitioned assets were changed to partitioned assets. This has been fixed.
  • [ui] Show run log timestamps in the user’s preferred hour cycle (12/24h) format.
  • [ui] The “Export to SVG” option now works as expected in the improved asset graph UI.
  • [ui] On the asset graph, hovering over a collapsed group or the title bar of an expanded group highlights all edges in/out of the group.
  • Fixed occasional CI/CD errors when building documentation on a feature branch

Community Contributions

  • fix: add missing volumes and volumeMounts in job-instance-migrate.yaml. Thanks @nhuray!

Documentation

  • Fixed typos in the docs.

Dagster Cloud

  • [ui] Fix dark theme colors for billing components.
  • [ui] Show the number of users for each grant type (admin, editor, etc.) on the Users page.
dagster - 1.6.0 (core) / 0.22.0 (libraries)

Published by elementl-devtools 9 months ago

Major Changes since 1.5.0 (core) / 0.21.0 (libraries)

Core

  • Asset lineage graph UI revamp, to make it easier to visualize and navigate large graphs
    • Lineage now flows left-to-right instead of top-to-bottom.
    • You can expand and collapse asset groups in the graph.
    • A new left-hand sidebar provides a list of assets, organized by asset group and code location.
    • You can right-click on assets or groups to filter or materialize them.
    • You can filter by compute kind.
  • Dark mode for the Dagster UI – By default, Dagster will match your system’s light or dark theme but you can adjust this in the user settings in the top right of the UI.
  • Report asset materializations from the UI – You can record an asset materialization event without executing the code to materialize the asset. This is useful in cases where you overwrote data outside of Dagster, and you want Dagster to know about it and represent it in the UI. It’s also useful when you have a preexisting partitioned asset and start managing it with Dagster: you want Dagster to show the historical partitions as materialized instead of missing.
  • MaterializeResult, AssetSpec, and AssetDep now marked stable – These APIs, introduced in Dagster 1.5, were previously marked experimental. They offer a more straightforward way of defining assets when you don’t want to use I/O managers.
  • Backfill previews – When launching a backfill that covers assets with different partitions, you can now click “Preview” to see the partitions for each asset that will be covered by the backfill.
  • Viewing logs for a sensor or schedule tick is no longer considered experimental – previously, accessing this functionality required turning on a feature flag in user settings.
  • Runs triggered by a sensor or schedule link to the tick that triggered them.

dagster-pipes

  • AWS Lambda Pipes client – PipesLambdaClient [guide].
  • Report arbitrary messages between pipes processes and the orchestrating process – with report_custom_message and get_custom_messages.
  • Termination forwarding – ensures that external processes are terminated when an orchestration process is.

Since 1.5.14 (core) / 0.21.14 (libraries)

New

  • Default op/asset concurrency limits are now configurable at the deployment level, using the concurrency > default_op_concurrency_limit configuration in your dagster.yaml (OSS) or Deployment Settings page (Dagster Cloud). In OSS, this feature first requires a storage migration (e.g. dagster instance migrate).
  • Zero-value op/asset concurrency limits are now supported. In OSS, this feature first requires a storage migration (e.g. dagster instance migrate).
  • When a Nothing-typed output is returned from an asset or op, the handle_output function of the I/O manager will no longer be called. Users of most Dagster-maintained I/O managers will see no behavioral changes, but users of the In-Memory I/O manager, or custom I/O managers that store Nothing-typed outputs should reference the migration guide for more information.
  • [ui] The updated asset graph is no longer behind an experimental flag. The new version features a searchable left sidebar, a horizontal DAG layout, context menus and collapsible groups!

Bugfixes

  • Previously, if a code location was re-deployed with modified assets during an iteration of the asset daemon, empty auto-materialize runs could be produced. This has been fixed.
  • The CLI command dagster asset materialize will now return a non-zero exit code upon failure.
  • [ui] The Dagster UI now shows resource descriptions as markdown instead of plain text.
  • [ui] Viewing stdout/stderr logs for steps emitting hundreds of thousands of messages is now much more performant and no longer renders the Run page unusable.
  • [ui] Fixed an issue where sensors with intervals that were less than 30 seconds were shown with an interval of “~30s” in the UI. The correct interval is now shown.
  • [dagster-graphql] Fixed an issue where the GraphQL Python client raised an unclear error if the request failed due to a permissions error.

Breaking Changes

  • A slight change has been made to run status sensor cursor values for Dagster instances using the default SQLite storage implementation. If you are using the default SQLite storage and you are upgrading directly from a version of dagster<1.5.1, you may see the first tick of your run status sensor skip runs that completed but were not yet registered by the sensor during your upgrade. This should not be common, but to avoid any chance of that, you may consider an interim upgrade to dagster>=1.5.1,<1.6.0 first.

Community Contributions

  • Fixed a typo in the docs. Thanks @tomscholz!
  • [dagster-pyspark] Added additional file exclude rules to the zip files created by Dagster Pyspark step launchers. Thanks @maxfirman!

Documentation

  • Added a high-level overview page for Logging.

Dagster Cloud

  • Added the ability to annotate code locations with custom agent queues, allowing you to route requests for code locations in a single deployment to different agents. For example, you can route requests for one code location to an agent running in an on-premise data center but requests for all other code locations to another agent running in the cloud. For more information, see the docs.
dagster - 1.5.14 (core) / 0.21.14 (libraries)

Published by elementl-devtools 10 months ago

New

  • Viewing logs for a sensor or schedule tick is now a generally available feature.
    • The feature flag to view sensor or schedule tick logs has been removed, as the feature is now enabled by default.
    • Logs can now be viewed even when the sensor or schedule tick fails.
    • The logs are now viewable in the sensor or schedule tick modal.
  • graph_multi_assets can now accept inputs as kwargs.
  • [ui] The tick timeline for schedules and sensors now defaults to showing all ticks, instead of excluding skipped ticks. The previous behavior can be enabled by unchecking the “Skipped” checkbox below the timeline view.
  • [ui] The updated asset graph is no longer behind an experimental flag. The new version features a searchable left sidebar, a horizontal DAG layout, context menus and collapsible groups!

Bugfixes

  • [ui] Fix layout and scrolling issues that arise when a global banner alert is displayed in the app.
  • [ui] Use a larger version of the run config dialog in the Runs list in order to maximize the amount of visible config yaml.
  • [ui] When a software-defined asset is removed from a code location, it will now also be removed from global search.
  • [ui] When selecting assets in the catalog, you can now opt to materialize only “changed and missing” items in your selection.
  • [ui] The “Open in Launchpad” option on asset run pages has been updated to link to the graph of assets or asset job instead of an unusable launchpad page.
  • [ui] Partition status dots of multi-dimensional assets no longer wrap on the Asset > Partitions page.
  • [asset checks] Fixed a bug that caused the resource_defs parameter of @asset_check to not be respected
  • [ui] Fixed an issue where schedules or sensors with the same name in two different code locations sometimes showed each other’s runs in the list of runs for that schedule or sensor.
  • [pipes] Fixed an issue with the PipesFileMessageReader that could cause a crash on Windows.
  • Previously, calling context.log in different threads within a single op could result in some of those log messages being dropped. This has been fixed. (Thanks @quantum-byte!)
  • [dagster-dbt] On Dagster run termination, the dbt subprocess now exits gracefully to terminate any inflight queries that are materializing models.

Breaking Changes

  • The file_manager property on OpExecutionContext and AssetExecutionContext has been removed. This is an ancient property that was deprecated prior to Dagster 1.0, and since then had been raising a NotImplementedError whenever invoked.

Community Contributions

  • Added the Hashicorp Nomad integration to the documentation’s list of community integrations. Thanks, @ThomAub!
  • [dagster-deltalake] Fixed an error when passing non-string valued options, and extended the data types supported by the arrow type handler to include pyarrow datasets, which allows delta tables to be loaded lazily. Thanks @roeap!

Experimental

  • [dagster-pipes] The subprocess and databricks clients now forward termination to the external process if the orchestration process is terminated. A forward_termination argument is available for opting out.

Documentation

  • Fixed an error in the asset checks factory code example.

Dagster Cloud

  • The UI now correctly displays failed partitions after a single-run backfill occurs. Previously, if a single-run backfill failed, the corresponding partitions would not display as failed.
  • Several performance improvements when submitting Snowflake metrics to Dagster Cloud Insights.
  • Fixed an error which would occur when submitting Snowflake metrics for a removed or renamed asset to Dagster Cloud Insights.
dagster - 1.5.13 (core) / 0.21.13 (libraries)

Published by elementl-devtools 10 months ago

New

  • The SensorEvaluationContext object has two new properties: last_sensor_start_time and is_first_tick_since_sensor_start. This enables sensor evaluation functions to vary behavior on the first tick vs subsequent ticks after the sensor has started.
  • The asset_selection argument to @sensor and SensorDefinition now accepts a sequence of AssetsDefinitions, a sequence of strings, or a sequence of AssetKeys, in addition to AssetSelections.
  • [dagster-dbt] Support for dbt-core==1.3.* has been removed.
  • [ui] In code locations view, link to git repo when it’s a valid URL.
  • [ui] To improve consistency and legibility, when displaying elapsed time, most places in the app will now no longer show milliseconds.
  • [ui] Runs that were launched by schedules or sensors now show information about the relevant schedule or sensor in the header, with a link to view other runs associated with the same tick.
  • [dagster-gcp] Added a show_url_only parameter to GCSComputeLogManager that allows you to configure the compute log manager so that it displays a link to the GCS console rather than loading the logs from GCS, which can be useful if giving Dagster access to GCS credentials is undesirable.
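The new is_first_tick_since_sensor_start property described in the first item above could be used to vary sensor behavior roughly as sketched here. The contexts are `SimpleNamespace` stand-ins rather than real SensorEvaluationContext objects, and the `choose_lookback_seconds` helper and its default values are hypothetical; only the property name comes from the release notes.

```python
from types import SimpleNamespace


def choose_lookback_seconds(context, default=60, on_start=3600):
    """Use a longer lookback window on the first tick after the sensor starts."""
    if context.is_first_tick_since_sensor_start:
        return on_start
    return default


# Stand-ins for SensorEvaluationContext; Dagster supplies the real object.
first_tick = SimpleNamespace(is_first_tick_since_sensor_start=True)
later_tick = SimpleNamespace(is_first_tick_since_sensor_start=False)

print(choose_lookback_seconds(first_tick))
print(choose_lookback_seconds(later_tick))
```

In a real sensor, the same check would sit inside the evaluation function, for example to catch up on events that arrived while the sensor was stopped.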

Bugfixes

  • Fixed behavior of loading partitioned parent assets when using the BranchingIOManager
  • [ui] Fixed an unwanted scrollbar that sometimes appears on the code location list.

Community Contributions

  • Fixed a bug where dagster would error on FIPS-enabled systems by explicitly marking callsites of hashlib.md5 as not used for security purposes (Thanks @jlloyd-widen!)
  • [dagster-k8s] Changed execute_k8s_job to be aware of run-termination and op failure by deleting the executing k8s job (Thanks @Taadas!).
  • [dagstermill] Fixed dagstermill integration with the Dagster web UI to allow locally-scoped static resources (required to show certain frontend-components like plotly graphs) when viewing dagstermill notebooks (Thanks @aebrahim!).
  • [dagster-dbt] Fixed type annotation typo in the DbtCliResource API docs (Thanks @akan72!)
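For context on the FIPS fix in the first item above: the Python standard library lets callers mark an MD5 digest as not security-relevant through the `usedforsecurity` keyword (available from Python 3.9). The sketch below shows that call in isolation; it is not Dagster's actual code, and the payload is a made-up value.

```python
import hashlib

payload = b"example payload"  # hypothetical data to fingerprint

# usedforsecurity=False (Python 3.9+) signals that the digest is only a
# non-cryptographic fingerprint, which FIPS-enabled builds can permit.
digest = hashlib.md5(payload, usedforsecurity=False).hexdigest()
print(digest)
```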

Experimental

  • [pipes] Methods have been added to facilitate passing non-Dagster data back from the external process (report_custom_message) to the orchestration process (get_custom_messages).
  • [ui] Added a “System settings” option for UI theming, which will use your OS preference to set light or dark mode.

Documentation

  • [graphql] - Removed experimental marker that was missed when the GraphQL client was fully released
  • [assets] - Add an example for using retries with assets to the SDA concept page
  • [general] - Fixed some typos and formatting issues