An orchestration platform for the development, production, and observation of data assets.
APACHE-2.0 License
Published by elementl-devtools 10 months ago

- Fixed a bug where `EnvVar`s used in Sling source and target configuration would not work properly in some circumstances.

Published by elementl-devtools 11 months ago

- When using `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev` in a new scaffolded project from `dagster-dbt project scaffold`, dbt logs from creating dbt artifacts to loading the project are now silenced.
- A new `connection_meta_to_group_fn` argument allows configuring loaded asset groups based on the connection's metadata dict.
- `QueuedRunCoordinatorDaemon` has been refactored to paginate over runs when applying priority sort and tag concurrency limits. Previously, it loaded all runs into memory, causing large memory spikes when many runs were enqueued.
- `UPathIOManager` has been updated to use the correct path delimiter when interacting with cloud storages from a Windows process.
- The `STEP_WORKER_STARTED` event now fires before importing code, in line with the other executors.
- Fixed a case where `EnvVar` did not work properly.
- Fixed a bug where `IAttachDifferentObjectToOpContext` would pass the incorrect object to schedules and sensors.
- Fixed an issue with the `materialize_on_cron` rule with dynamically partitioned assets.
- `without_checks` no longer fails by attempting to include the checks.
- Fixed an issue that occurred when `DATABRICKS_HOST` is set. Thanks @zyd14!
- `PipesLambdaClient`, an AWS Lambda pipes client, has been added to `dagster_aws`.
- Added documentation for using `PipesLambdaClient` with Dagster Pipes.

Published by elementl-devtools 11 months ago
- A new `MetadataValue.job` metadata type, which can be used to link to a Dagster job from other objects in the UI.
- The schema/dataset for an asset can now be specified in several ways: `schema` metadata set on the asset or op, I/O manager `schema`/`dataset` configuration, or a `key_prefix` set on the asset. Previously, all methods for setting the schema/dataset were mutually exclusive, and setting more than one would raise an exception.
- When using `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev` in a new scaffolded project from `dagster-dbt project scaffold`, dbt artifacts for loading the project are now created in a static `target/` directory.
- Fixed an issue with `ScheduleEvaluationContext` when testing via `build_schedule_context`.
- `metadata` from a `Failure` exception is now hoisted up to the failure that culminates when retry limits are exceeded.
- `PipesK8sClient` now correctly raises on failed containers.
- Added `maxCatchupRuns` and `maxTickRetries` configuration options for the scheduler in the Helm chart.
- `DBT_INDIRECT_SELECTION=empty`.
- Fixed a bug causing `@asset(check_specs=...)` to not cooperate with the `key_prefix` argument of the `load_assets_from_modules` method and its compatriots.
- `define_asset_job` now accepts an `op_retry_policy` argument, which specifies a default retry policy for all of the ops in the job. (Thanks Eugenio Contreras!)
- The `observable_source_asset` decorator now accepts a `key` argument.
- An `implicit_materializations` argument has been added to `get_results` and `get_materialize_result` to control whether an implicit materialization event is created or not.
- Added `SlingConnectionResource` to allow reusing sources and targets interoperably.
- `build_dbt_asset_selection` now also selects asset checks based on their underlying dbt tests. E.g. `build_dbt_asset_selection([my_dbt_assets], dbt_select="tag:data_quality")` will select the assets and checks for any models and tests tagged with 'data_quality'.
- Added a comparison of `EnvVar` vs. `os.getenv` to the Environment variables documentation.

Published by elementl-devtools 11 months ago
- The new `AutoMaterializeRule.materialize_on_cron()` rule makes it possible to create policies which materialize assets on a regular cadence.
- Returning a `SensorResult` from a sensor no longer overwrites a cursor if it was set via the context.
- Fixed an issue with multi-assets using `can_subset=True` alongside assets which were upstream of some assets in the multi-asset, and downstream of others.
- When using an `HourlyPartitionsDefinition` with a non-UTC timezone and the default format string (or any format string not including a UTC offset), there was no way to disambiguate between the first and second instance of the repeated hour during a daylight saving time transition. Now, for the one hour per year in which this ambiguity exists, the partition key of the second instance of the hour will have the UTC offset automatically appended to it.
- Added `check_specs` to `AssetsDefinition.from_graph`.
- Fixed a bug in `dagster-dbt` that caused some dbt tests to not be selected as asset checks.
- The `email_on_failure` sensor called deprecated methods on the context. This has been fixed.
- `DagsterInstance.report_runless_asset_event` is now public.
- `AutoMaterializeRule.materialize_on_parent_updated` now accepts an `updated_parents_filter` of type `AutoMaterializeAssetPartitionsFilter`, which allows only materializing based on updates from runs with a required set of tags.

Published by elementl-devtools 11 months ago
- Fixed a bug where `EnvVars` used in Airbyte or Fivetran resources would show up as their processed values in the launchpad when loading assets from a live Fivetran or Airbyte instance.

Published by elementl-devtools 11 months ago
- `OpExecutionContext` and `AssetExecutionContext` now have a `partition_keys` property.
- `dbt-core==1.7.*` is now supported.
- Fixed an issue that occurred when `build_schedule_from_partitioned_job` was used with a job with multi-partitioned assets and the `partitions_def` argument wasn't provided to `define_asset_job`.
- `config`.
- Fixed a bug in `dagster-embedded-elt` where sling's `updated_at` parameter was set to the incorrect type.
- `AssetCheckKey`.
- `PipesSubprocessClient` now inherits the environment variables of the parent process in the launched subprocess.

Published by elementl-devtools 12 months ago
- `PipesK8sClient` will now attempt to load the appropriate kubernetes config, and exposes arguments for controlling this process.
- Added `load_asset_checks_from_modules` functions for loading asset checks in tandem with `load_assets_from_modules`.
- `dagster-dbt project scaffold` now uses `~/.dbt/profiles.yml` if a `profiles.yml` is not present in the dbt project directory.
- `@dbt_assets` now support `PartitionMapping` using `DagsterDbtTranslator.get_partition_mapping`.
- Support for self-dependencies on `@dbt_assets`. To enable this, add the following metadata to your dbt model's metadata in your dbt project:

```yaml
meta:
  dagster:
    has_self_dependency: True
```
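For orientation, in a dbt project this `meta` block usually sits under a model entry in a properties file such as `schema.yml`. A minimal sketch, where the model name is hypothetical:

```yaml
# Hypothetical schema.yml entry; "my_incremental_model" is illustrative.
models:
  - name: my_incremental_model
    meta:
      dagster:
        has_self_dependency: True
```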
- Fixed an error that occurred when using `pydantic<2.0.0` but having `pydantic-core` installed.
- Using a `LastPartitionMapping` dependency would raise an error. This has been fixed.
- Fixed a case where an `instance is not available to load partitions` error was raised.
- `build_asset_with_blocking_check`.
- Fixed an issue in `1.5.0` where instances that haven't been migrated to the latest schema hit an error upon run deletion.
- Previously, if an `exclude` that didn't match any dbt nodes was used in `@dbt_assets`, an error would be raised. The error is now properly handled.
- When using `DbtCliResource.cli(...)` in an `op`, `AssetMaterialization`s instead of `Output`s are now emitted.
- The `report_asset_observation` REST endpoint for reporting runless events is now available.

Published by elementl-devtools 12 months ago
- Added the `report_asset_observation` REST API endpoint for runless external asset observation events.
- `DAGSTER_K8S_PG_PASSWORD_SECRET` and `DAGSTER_K8S_INSTANCE_CONFIG_MAP` will no longer be set in all pods.
- `build_pyspark_zip` now takes an `exclude` parameter that can be used to customize the set of excluded files.
- Fixed a bug where the `source_key_prefix` argument to `load_assets_from_current_module` and `load_assets_from_package_name` was ignored.
- Fixed a bug in `dagster_embedded_elt` where the mode parameter was not being passed to Sling, and only one asset could be created at a time.
- The `pipelineRun` configuration in the Helm chart is now deprecated. The same config can be set under `dagster-user-deployments`.
- Added `setup_for_execution` and `teardown_after_execution` calls to the inner IOManagers of the `BranchingIOManager`. Thank you @danielgafni!
- The `S3FakeResource.upload_fileobj()` signature is now consistent with boto3 `S3.Client.upload_fileobj()`. Thank you @jeanineharb!
- `dbt_assets` now have an optional name parameter. Thank you @AlexanderVR!

Published by elementl-devtools about 1 year ago
- Added the `report_asset_check` REST API endpoint for runless external asset check evaluation events. This is available in cloud as well.
- The `config` argument is now supported on `@graph_multi_asset`.
- `DbtCliResource`.
- Fixed the `AssetExecutionContext` type annotation for the `context` parameter in `@asset_check` functions.
- Fixed a bug where passing `profiles_dir=None` into `DbtCliResource` would cause incorrect validation.
- `DuckDBResource` and `DuckDBIOManager` accept a `connection_config` configuration that will be passed as `config` to the DuckDB connection. Thanks @xjhc!

Published by elementl-devtools about 1 year ago
- A new `--live-data-poll-rate` option allows configuring how often the UI polls for new asset data when viewing the asset graph, asset catalog, or overview assets page. It defaults to 2000 ms.
- Added the `report_asset_materialization` REST API endpoint for creating external asset materialization events. This is available in cloud as well.
- The `@dbt_assets` decorator now accepts a `backfill_policy` argument, for controlling how the assets are backfilled.
- The `@dbt_assets` decorator now accepts an `op_tags` argument, for passing tags to the op underlying the produced `AssetsDefinition`.
- Added `get_materialize_result` & `get_asset_check_result` to `PipesClientCompletedInvocation`.
- The `acryl-datahub` pin in the `dagster-datahub` package has been removed.
- `PipesDatabricksClient` now performs stdout/stderr forwarding from the Databricks master node to Dagster.
- The `dagster-dbt-cloud` CLI.
- `K8sRunLauncher`. See the docs for more information.
- Previously, the asset backfill page would display negative counts if failed partitions were manually re-executed. This has been fixed.
- Fixed an issue where the run list dialog for viewing the runs occupying global op concurrency slots did not expand to fit the content size.
- Fixed an issue where selecting a partition would clear the launchpad and typing in the launchpad would clear the partition selection.
- Fixed various issues with the asset graph displaying the wrong graph.
- The IO manager's `handle_output` method is no longer invoked when observing an observable source asset.
- [ui] Fixed an issue where the run config dialog could not be scrolled.
- [pipes] Fixed an issue in the `PipesDockerClient` with parsing logs fetched via the docker client.
- [external assets] Fixed an issue in `external_assets_from_specs` where providing multiple specs would error.
- [external assets] Correct copy in tooltip to explain why the Materialize button is disabled on an external asset.
- The `dagster-pipes` version used in the external process.
- `is_dagster_pipes_process` has been removed from the `dagster-pipes` package.

Published by elementl-devtools about 1 year ago
Published by elementl-devtools about 1 year ago
- The `EnvVar` utility will now raise an exception if it is used outside of the context of a Dagster resource or config class. The `get_value()` utility will retrieve the value outside of this context.
- When using `DbtCliResource`, an explicit `target_path` can now be specified.
- Added `DagsterDbtTranslator` and `DagsterDbtTranslatorSettings`: see the docs for more information.

Published by elementl-devtools about 1 year ago
- Improved ergonomics for execution dependencies in assets - We introduced a set of APIs to simplify working with Dagster that don't use the I/O manager system for handling data between assets. I/O manager workflows will not be affected.
  - The `AssetDep` type allows you to specify upstream dependencies with partition mappings when using the `deps` parameter of `@asset` and `AssetSpec`.
  - `MaterializeResult` can be optionally returned from an asset to report metadata about the asset when the asset handles any storage requirements within the function body and does not use an I/O manager.
  - `AssetSpec` has been added as a new way to declare the assets produced by `@multi_asset`. When using `AssetSpec`, the multi-asset does not need to return any values to be stored by the I/O manager. Instead, it should handle any storage requirements in the body of the function.
- Asset checks (experimental) - You can now define, execute, and monitor data quality checks in Dagster [docs].
  - The `@asset_check` decorator, as well as the `check_specs` argument to `@asset` and `@multi_asset`, enable defining asset checks.
- Auto materialize customization (experimental) - `AutoMaterializePolicies` can now be customized [docs].
  - `AutoMaterializeRule`s determine if an asset should be materialized or skipped.
- `PipesSubprocessClient`, `PipesDockerClient`, `PipesK8sClient`, and `PipesDatabricksClient`.
- `open_pipes_session`. One can augment existing invocations rather than replacing them wholesale.
- `AssetExecutionContext` is now a subclass of `OpExecutionContext`, not a type alias. The code

```python
def my_helper_function(context: AssetExecutionContext):
    ...

@op
def my_op(context: OpExecutionContext):
    my_helper_function(context)
```

will cause type checking errors. To migrate, update type hints to respect the new subclassing.
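The subclass change can be illustrated with a stdlib-only sketch; the classes below are stand-ins, not imports from dagster. A helper annotated with the subclass still runs when handed the base class, but a type checker such as mypy now flags the call, which is why helper type hints need updating.

```python
# Stand-in classes mirroring the new relationship: AssetExecutionContext
# is a true subclass of OpExecutionContext rather than a type alias.
class OpExecutionContext:
    pass

class AssetExecutionContext(OpExecutionContext):
    pass

def my_helper_function(context: AssetExecutionContext) -> str:
    # The narrow annotation is what a type checker enforces.
    return type(context).__name__

# Passing the subclass satisfies the annotation:
print(my_helper_function(AssetExecutionContext()))  # AssetExecutionContext

# Passing the base class still works at runtime (Python does not enforce
# annotations), but a type checker reports an incompatible argument here:
print(my_helper_function(OpExecutionContext()))  # OpExecutionContext
```

Under the old alias both calls type-checked identically; under subclassing only the first does, so shared helpers should widen their annotation to the base class.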
`AssetExecutionContext` cannot be used as the type annotation for `@op`s run in `@job`s. To migrate, update the type hint in the `@op` to `OpExecutionContext`. `@op`s that are used in `@graph_asset`s may still use the `AssetExecutionContext` type hint.

```python
# old
@op
def my_op(context: AssetExecutionContext):
    ...

# correct
@op
def my_op(context: OpExecutionContext):
    ...
```
- Add `backfill_policy=BackfillPolicy.single_run()` to your assets.
- The `has_dynamic_partition` implementation has been optimized. Thanks @edvardlindelof!
- Added a `stream_to_asset_map` argument to `build_airbyte_assets` to support the Airbyte prefix setting with special characters. Thanks @chollinger93!
- Fixed a bug where `DatabricksPysparkStepLauncher` fails to get logs when `job_run` doesn't have `cluster_id` at root level. Thanks @PadenZach!
- A new `dagster_cloud.dagster_insights` module contains utilities for capturing and submitting external metrics about data operations to Dagster Cloud via an API. Dagster Cloud Insights is a soon-to-be-released feature that improves visibility into usage and cost metrics such as run duration and Snowflake credits in the Cloud UI.

Published by elementl-devtools about 1 year ago
- `DbtCliResource` now enforces that the current installed version of `dbt-core` is at least version `1.4.0`.
- `DbtCliResource` now properly respects `DBT_TARGET_PATH` if it is set by the user. Artifacts from dbt invocations using `DbtCliResource` will now be placed in unique subdirectories of `DBT_TARGET_PATH`.
- The `partition_time_window` attribute on `OpExecutionContext` and `AssetExecutionContext` now returns the time range, instead of raising an error.
- Added a `job_name` property on the result object of `build_hook_context`.
- `AssetSpec` has been added as a new way to declare the assets produced by `@multi_asset`.
- The `AssetDep` type allows you to specify upstream dependencies with partition mappings when using the `deps` parameter of `@asset` and `AssetSpec`.
- A `report_asset_check` method was added to `ExtContext`.
- Use `yield from` to forward reported materializations and asset check results to Dagster. Results reported from ext that are not yielded will raise an error.
- Documentation comparing `os.getenv()` versus Dagster's `EnvVar`.

Published by elementl-devtools about 1 year ago
- `ExpectationResult`. This will be made irrelevant by upcoming data quality features.
- `SensorResult`.

Published by elementl-devtools about 1 year ago
- The `deps` parameter for `@asset` and `@multi_asset` now supports directly passing `@multi_asset` definitions. If an `@multi_asset` is passed to `deps`, dependencies will be created on every asset produced by the `@multi_asset`.
- `dagster instance migrate --bigint-migration`.
- `DbtCliResource` now validates at definition time that its `project_dir` and `profiles_dir` arguments are directories that respectively contain a `dbt_project.yml` and `profiles.yml`.
- Support for a `policy_id` for new clusters when using the `databricks_pyspark_step_launcher` (thanks @zyd14!).
- Fixed an issue where the `dagster-webserver` command was not indicating which port it was using in the command-line output.
- `MaterializeResult` has been added as a new return type to be used in `@asset`/`@multi_asset` materialization functions.

Published by elementl-devtools about 1 year ago
- The `dagster-ext` module, along with subprocess, docker, databricks, and k8s pod integrations, is now available. Read more at https://github.com/dagster-io/dagster/discussions/16319. Note that the module is temporarily being published to PyPI under `dagster-ext-process`, but is available in python as `import dagster_ext`.
- `@asset_check` or `AssetChecksDefinition`.
- Added the `check_specs` argument to `@graph_multi_asset`.
- Fixed a bug with `@graph_asset` that would raise an error about nonexistent checks.

Published by elementl-devtools about 1 year ago
- `OpExecutionContext.add_output_metadata` can now be called multiple times per output.
- (Thanks @janosroden!)
- Cron strings such as `29 2 *` no longer cause exceptions in the UI or daemon.
- Previously, when `observable_source_asset`s with different partition definitions existed in the same code location, runs targeting those assets could fail to launch. This has been fixed.
- Changing the `start_date` of their underlying `PartitionsDefinition` could result in runs being launched for partitions that no longer existed. This has been fixed.
- (Thanks @ldnicolasmay!)
- Fixed an issue where a custom `DagsterDbtTranslator` did not properly invoke `get_auto_materialize_policy` and `get_freshness_policy` for `load_assets_from_dbt_project`.
- `@graph_asset` support. This can be used to implement blocking checks, by raising an exception if the check fails.
- Asset checks now respect `@multi_asset` subsetting, so only checks which target assets in the subset will execute.
- `AssetCheckSpec`s will now cause an error at definition time if they target an asset other than the one they're defined on.
- `run_monitoring > free_slots_after_run_end_seconds`.

Published by elementl-devtools about 1 year ago
- The `context` object now has an `asset_key` property to get the `AssetKey` of the current asset.
- The `dagster dev` and `dagster-daemon run` commands now include a `--log-level` argument that allows you to customize the logger level threshold.
- `AirbyteResource` now includes a `poll_interval` key that allows you to configure how often it checks an Airbyte sync's status.
- Fixed an issue that caused the `DatabricksStepLauncher` to fail. Thanks @zyd14!
- Support for `workspace` and `volumes` init scripts in the databricks client. Thanks @zyd14!
- Asset checks are now displayed in the asset graph and sidebar.
- [Breaking] Asset check severity is now set at runtime on `AssetCheckResult` instead of in the `@asset_check` definition. Now you can define one check that either errors or warns depending on your check logic. `ERROR` severity no longer causes the run to fail. We plan to reintroduce this functionality with a different API.
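The runtime-severity pattern can be sketched without dagster; the `CheckResult` class and the threshold below are illustrative, not dagster's API. One check inspects the data and chooses its own severity, instead of the severity being fixed at definition time.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    """Illustrative stand-in for dagster's AssetCheckResult."""
    passed: bool
    severity: str  # "WARN" or "ERROR"

def null_rate_check(null_fraction: float) -> CheckResult:
    """A single check whose severity depends on how bad the data looks."""
    if null_fraction == 0.0:
        return CheckResult(passed=True, severity="WARN")
    # Escalate at runtime: a few nulls warn, many nulls error.
    # The 5% cutoff is an arbitrary example threshold.
    severity = "WARN" if null_fraction < 0.05 else "ERROR"
    return CheckResult(passed=False, severity=severity)

print(null_rate_check(0.01).severity)  # WARN
print(null_rate_check(0.20).severity)  # ERROR
```

With one definition covering both outcomes, there is no need to duplicate a check just to vary its severity.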
- [Breaking] `@asset_check` now requires the `asset=` argument, even if the asset is passed as an input to the decorated function. Example:

```python
@asset_check(asset=my_asset)
def my_check(my_asset) -> AssetCheckResult:
    ...
```

- [Breaking] `AssetCheckSpec` now takes `asset=` instead of `asset_key=`, and can accept either a key or an asset definition.
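Accepting "either a key or an asset definition" is a small normalization pattern. A stdlib-only sketch, with names that are illustrative rather than dagster's implementation:

```python
from typing import Union

class AssetDefinitionStub:
    """Illustrative stand-in for an asset definition carrying a key."""
    def __init__(self, key: str):
        self.key = key

def normalize_asset_arg(asset: Union[str, AssetDefinitionStub]) -> str:
    """Accept a raw key or a definition and always return the key."""
    if isinstance(asset, AssetDefinitionStub):
        return asset.key
    return asset

print(normalize_asset_arg("my_asset"))                       # my_asset
print(normalize_asset_arg(AssetDefinitionStub("my_asset")))  # my_asset
```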
- [Bugfix] Asset checks now work on assets with `key_prefix` set.
- [Bugfix] `Execution failure` asset checks are now displayed correctly on the checks tab.
- Added usage of `DbtCliResource` in a custom asset/op to the API docs.

Published by elementl-devtools about 1 year ago
- The `dagster execute job` CLI now accepts `--op-selection` (thanks @silent-lad!).
- `AssetsDefinition` instances.
- `@graph` and `@job` now work again, fixing a regression introduced in 1.4.5.
- The `ins` argument to `graph_asset` is now respected correctly.
- Fixed an issue where `dagster dev` failed on startup when the `DAGSTER_GRPC_PORT` environment variable was set in the environment.
- The `deps` arguments for an asset can now be specified as an iterable instead of a sequence, allowing for sets to be passed.
- The `securityContext` setting now applies correctly to all init containers (thanks @maowerner!).
- A new `AutoMaterializeRule.skip_on_not_all_parents_updated` that enforces that an asset can only be materialized if all parents have been materialized since the asset's last materialization.
- `AutoMaterializeRule.skip_on_parent_missing`, which is already part of the behavior of the default auto-materialize policy.
- `dagster-dbt project scaffold`.