An orchestration platform for the development, production, and observation of data assets.
Apache-2.0 License
Published by gibsondan over 2 years ago
- … `protobuf` version 3 due to a backwards-incompatible change in the protobuf version 4 release.
- … `AssetGroup.build_job()`, you can define selections which select subsets of the loaded dbt project.
- The `load_assets_from_dbt_manifest` function now supports an experimental `select` parameter. This allows you to use dbt selection syntax to select from an existing manifest.json file, rather than having Dagster re-compile the project on demand.
- `OpExecutionContext` now exposes an `asset_key_for_output` method, which returns the asset key that one of the op’s outputs corresponds to.
- … `python -m dagster.daemon`.
- The `non_argument_deps` parameter for the `asset` and `multi_asset` decorators can now be a set of strings in addition to a set of `AssetKey`.
- … `InvalidSubsetError`. This is now fixed.
- … `AssetKey`s.
- … `snowflake_io_manager` would sometimes raise an error with `pandas` 1.4 or later installed.
- … `DagsterExecutionStepNotFoundError`. This is now fixed.
- `AssetIn` can now accept a string that will be coerced to an `AssetKey`. Thanks @aroig!
- … `dagster-gcp` now have user-configurable timeout length. Thanks @3cham!

Full Changelog: https://github.com/dagster-io/dagster/compare/0.14.16...0.14.17
Published by rexledesma over 2 years ago
- `AssetsDefinition.from_graph` now accepts a `partitions_def` argument.
- `@asset`-decorated functions can now accept variable keyword arguments.
- `dagster instance info` now prints the current schema migration state for the configured instance storage.
- … `docs_url` on the `dbt_cli_resource`. If this value is set, AssetMaterializations associated with each dbt model will contain a link to the dbt docs for that model.
- … `dbt_cloud_host` on the `dbt_cloud_resource`, in the case that your dbt cloud instance is under a custom domain.
- `InputContext.upstream_output` was missing the `asset_key` when it referred to an asset outside the run.
- … `selection` parameter in `AssetGroup.build_job()`, the generated job would include an incorrect set of assets in certain situations. This has been fixed.
- … `DagsterInstanceSchemaOutdated` exception if the instance storage was not up to date with the latest schema. We no longer wrap these exceptions, allowing the underlying exceptions to bubble up.
- … `adls2_pickle_io_manager` would sometimes fail to recursively delete a folder when cleaning up an output.
- `instance.get_event_records` without an event type filter is now deprecated and will generate a warning. These calls will raise an exception starting in `0.15.0`.
- `@multi_asset` now supports partitioning. Thanks @aroig!
- … `max_concurrent` field to the `k8s_job_executor` that limits the number of concurrent Ops that will execute per run. Since this executor launches a Kubernetes Job per Op, this also limits the number of concurrent Kubernetes Jobs. Note that this limit is per run, not global. Thanks @kervel!
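As a sketch, the per-run `max_concurrent` cap would be supplied through the executor config in a job's run config; the exact nesting under `execution.config` is an assumption, not taken from these notes:

```yaml
# Run config for a job using the k8s_job_executor (illustrative nesting).
# max_concurrent caps concurrent Ops (and therefore Kubernetes Jobs)
# for this run only; it is not a global limit.
execution:
  config:
    max_concurrent: 4
```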
- … `externalConfigmap` field as an alternative to `dagit.workspace.servers` when running the user deployments chart in a separate release. This allows the workspace to be managed outside of the main Helm chart. Thanks @peay!
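A hedged sketch of where `externalConfigmap` might sit in the main chart's values.yaml; the configmap name and surrounding keys are illustrative assumptions:

```yaml
dagit:
  workspace:
    enabled: true
    # Reference a workspace configmap managed outside this Helm release,
    # instead of enumerating servers under dagit.workspace.servers.
    externalConfigmap: dagster-user-deployments-workspace
```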
- … `markupsafe<=2.0.1`. Thanks @bollwyvl!

Full Changelog: https://github.com/dagster-io/dagster/compare/0.14.15...0.14.16
Published by benpankow over 2 years ago
- … return `RunRequest` objects instead of yielding them.
- `OpExecutionContext` (provided as the `context` argument to Ops) now has fields for `run`, `job_def`, `job_name`, `op_def`, and `op_config`. These replace `pipeline_run`, `pipeline_def`, etc. (though they are still available).
- `OpExecutionContext` now offers a `partition_time_window` attribute, which returns a tuple of datetime objects that mark the bounds of the partition’s time window.
- `AssetsDefinition.from_graph` now accepts a `partitions_def` argument.
- … the `dagster-test-connection` pod from the Dagster Helm chart.
- The `k8s_job_executor` now polls the event log on a ~1 second interval (previously 0.1). Performance testing showed that this reduced DB load while not significantly impacting run time.
- … `Jinja2` and `nbconvert`.
- … `<meta>` tag to a response header, and several more security and privacy related headers have been added as well.
- … `foo/bar` in dagit, rather than appearing as `foo > bar` in some contexts.
- The `--log-level` flag is now available in the dagit cli for controlling the uvicorn log level.
- The `load_assets_from_dbt_project()` and `load_assets_from_dbt_manifest()` utilities now have a `use_build_command` parameter. If this flag is set, when materializing your dbt assets, Dagster will use the `dbt build` command instead of `dbt run`. Any tests run during this process will be represented with AssetObservation events attached to the relevant assets. For more information on `dbt build`, see the dbt docs.
- `build_snowflake_io_manager` offers a way to store assets and op outputs in Snowflake. The `PandasSnowflakeTypeHandler` stores Pandas `DataFrame`s in Snowflake.
- `dagit.logLevel` has been added to values.yaml to access the newly added `dagit --log-level` cli option.
- … `toposort<=1.6` was installed.
- `PickledObjectS3IOManager` now uses `list_objects` to check the access permission. Thanks @trevenrawr!
- The `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` functions now include the schemas of the dbt models in their asset keys. To revert to the old behavior: `dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"]))`.
- The `TableSchema` API is no longer experimental.

Published by prha over 2 years ago
- When viewing a config schema in the Dagit launchpad, default values are now shown. Hover over an underlined key in the schema view to see the default value for that key.
- dagster, dagit, and all extension libraries (`dagster-*`) now contain py.typed files. This exposes them as typed libraries to static type checking tools like mypy. If your project is using mypy or another type checker, this may surface new type errors. For mypy, to restore the previous state and treat dagster or an extension library as untyped (i.e. ignore Dagster’s type annotations), add the following to your configuration file:

```
[mypy-dagster] (or e.g. mypy-dagster-dbt)
follow_imports = "skip"
```

- Op retries now surface the underlying exception in Dagit.
- Made some internal changes to how we store schema migrations across our different storage implementations.
- `build_output_context` now accepts an `asset_key` argument.
- The `key` argument to the `SourceAsset` constructor now accepts values that are strings or sequences of strings and coerces them to `AssetKey`s.
- You can now use the `+` operator to add two AssetGroups together, which forms an AssetGroup that contains a union of the assets in the operands.
- `AssetGroup.from_package_module`, `from_modules`, `from_package_name`, and `from_current_module` now accept an `extra_source_assets` argument that includes a set of source assets into the group in addition to the source assets scraped from modules.
- AssetsDefinition and AssetGroup now both expose a `to_source_assets` method that returns SourceAsset versions of their assets, which can be used as source assets for downstream AssetGroups.
- Repositories can now include multiple AssetGroups.
- The new `prefixed` method on AssetGroup returns a new AssetGroup where a given prefix is prepended to the asset key of every asset in the group.
- Dagster now has a BoolMetadataValue representing boolean-type metadata. Specifying True or False values in metadata will automatically be cast to the boolean type.
- Tags on schedules can now be expressed as nested JSON dictionaries, instead of requiring that all tag values are strings.
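To illustrate what nested tag values mean in practice, here is a minimal, hypothetical sketch (not Dagster's implementation) of flattening nested dictionary tag values into the all-string form that was previously required, using only the standard library:

```python
import json

def normalize_tags(tags):
    # Leave string values untouched; JSON-encode nested structures so every
    # tag value becomes a string (the shape older versions insisted on).
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in tags.items()
    }

tags = {"team": "data", "retry_policy": {"max_retries": 3}}
print(normalize_tags(tags))  # {'team': 'data', 'retry_policy': '{"max_retries": 3}'}
```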
- If an exception is raised during an op, Dagster will now always run the failure hooks for that op. Before, certain system exceptions would prevent failure hooks from being run.
- `mapping_key` can now be provided as an argument to `build_op_context`/`build_solid_context`. Doing so will allow the use of `OpExecutionContext.get_mapping_key()`.
- `AssetGroup.build_job` now uses `>` instead of `.` for delimiting the components within asset keys, which is consistent with how selection works in Dagit.
- … `+`.
- … the `celery_docker_executor` would sometimes fail to execute with a JSON deserialization error when using Dagster resources that write to stdout.
- … `client.terminate_run(run_id)`. Thanks @Javier162380!

Published by jmsanders over 2 years ago
- … `values.yaml` file:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi
```

- … `includeConfigInLaunchedRuns: true` in a user code deployment will now launch runs using the same namespace and service account as the user code deployment.
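A sketch of where this flag might sit in the user code deployment's Helm values; the surrounding keys and names are assumptions based on the chart layout, not taken from these notes:

```yaml
dagster-user-deployments:
  deployments:
    - name: example-user-code        # illustrative deployment name
      image:
        repository: my-registry/my-user-code   # illustrative
        tag: latest
      # Launch runs with the same namespace and service account
      # as this user code deployment.
      includeConfigInLaunchedRuns:
        enabled: true
```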
- The `@asset` decorator now accepts an `op_tags` argument, which allows e.g. providing k8s resource requirements on the op that computes the asset.
- … `dagster api grpc-health-check` (previously it just returned via exit codes).
- The `emr_pyspark_step_launcher` now supports dynamic orchestration, `RetryPolicy`s defined on ops, and re-execution from failure. For failed steps, the stack trace of the root error will now be available in the event logs, as will logs generated with `context.log.info`.
- … `tags_fn_for_partition` function, instead of requiring that the dictionary have string keys and values.
- The `EcsRunLauncher` now registers new task definitions if the task’s execution role or task role changes.
- … `setuptools` as a runtime dependency.
- `In` can now accept `asset_partitions` without crashing.
- … `secretsmanager_resource`.
Published by gibsondan over 2 years ago
Published by gibsondan over 2 years ago
Published by Ramshackle-Jamathon over 2 years ago
- … `includeConfigInLaunchedRuns` flag to the Helm chart that can be used to automatically include configmaps, secrets, and volumes in any runs launched from code in a user code deployment. See https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#configure-your-user-deployment for more information.
- `BackfillParams` (used for launching backfills) now has an `allPartitions` boolean flag, which can be used instead of specifying all the individual partition names.
- … the `gevent` and `gevent-websocket` dependencies from `dagster-graphql`.
- … `sqlite3.OperationalError` error when viewing schedules/sensors pages in Dagit. This was affecting dagit instances using the default SQLite schedule storage with a SQLite version < `3.25.0`.
- … the `asset_partitions` arg passed to `In`.
- … `@graph` composition which would cause the wrong input definition to be used for type checks.
- … `--path-prefix`, large DAGs failed to render due to a WebWorker error, and the user would see an endless spinner instead. This has been fixed.
- `dagster-census` is a new library that includes a `census_resource` for interacting with the Census REST API, `census_trigger_sync_op` for triggering a sync and registering an asset once it has finished, and a `CensusOutput` type. Thanks @dehume!

Published by gibsondan over 2 years ago
- … in `dagster.yaml` that can be used to increase the time that Dagster waits when spinning up a gRPC server before timing out. For more information, see https://docs.dagster.io/deployment/dagster-instance#code-servers.
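The dagster.yaml setting referenced above could look roughly like this; the key names are an assumption, so consult the linked docs for the exact schema:

```yaml
# dagster.yaml
code_servers:
  # Seconds to wait for a gRPC code server to start before timing out.
  local_startup_timeout: 120
```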
- … `assetMaterializations` that can be queried off of a `DagsterRun` field. You can use this field to fetch the set of asset materialization events generated in a given run within a GraphQL query.
- … the `@resource` decorator will now be used as resource descriptions, if no description is explicitly provided.
- … `dagit -m` or `dagit -f` at a module or file that has asset definitions but no jobs or asset groups, and all the asset definitions will be loaded into Dagit.
- `AssetGroup` now has a `materialize` method which executes an in-process run to materialize all the assets in the group.
- `AssetGroup`s can now contain assets with different `partition_defs`.
- … `fs_asset_io_manager`, now include the path of the file where the values were saved.
- … the `max_concurrent_runs` limit on the `QueuedRunCoordinator` by setting it to `-1`. Use this if you only want to limit runs using `tag_concurrency_limits`.
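Disabling the global run cap while keeping tag-based limits might look like this in dagster.yaml (a sketch; the tag key, value, and limit are illustrative):

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: -1   # disable the global limit
    tag_concurrency_limits:
      - key: "database"
        value: "redshift"
        limit: 4              # still cap runs carrying this tag
```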
- … `get_runs()` function to get a list of runs matching certain parameters from the dbt Cloud API (thanks @kstennettlull!)
- … `authenticator` field to the connection arguments for the `snowflake_resource` (thanks @swotai!).
- … `container_kwargs` that allows you to specify additional arguments to pass to your docker containers when they are run.
- … a `:` character would fail to parse correctly, and filtering would therefore fail. This has been fixed.

Published by clairelin135 over 2 years ago
- `run_id` can now be provided as an argument to `execute_in_process`.
- `dagit`’s empty state no longer mentions the legacy concept “Pipelines”.
- … the `IOManager.load_input` method, you can add input metadata via `InputContext.add_input_metadata`. These metadata entries will appear on the `LOADED_INPUT` event and, if the input is an asset, be attached to an `AssetObservation`. This metadata is viewable in `dagit`.
- … `dagit` and `dagster-daemon` processes. This would manifest in schedules / sensors getting marked as “Unloadable” in `dagit`, and ticks not being registered correctly. The fix involves changing how Dagster stores schedule/sensor state and requires a schema change using the CLI command `dagster instance migrate`. Users who are not running into this class of bugs may consider the migration optional.
- `root_input_manager` can now be specified without a context argument.
- … `root_input_manager` from being used with `VersionStrategy`.
- … `dagit` writing to the same telemetry logs.
- In `dagit`, using the “Open in Launchpad” feature for a run could cause server errors if the run configuration yaml was too long. Runs can now be opened from this feature regardless of config length.
- In `dagit`, runs in the timeline view sometimes showed incorrect end times, especially batches that included in-progress runs. This has been fixed.
- In the `dagit` launchpad, reloading a repository should present the user with an option to refresh config that may have become stale. This feature was broken for jobs without partition sets, and has now been fixed.
- … a `typing` type as `dagster_type` to input and output definition was incorrectly being rejected.

Published by jfineberg over 2 years ago
…

```python
from dagster import AssetGroup
from dagster_azure import adls2_pickle_asset_io_manager, adls2_resource

asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": adls2_pickle_asset_io_manager, "adls2": adls2_resource},
)
```

- … `@hourly_partitioned_config`, `@daily_partitioned_config`, `@weekly_partitioned_config`, and `@monthly_partitioned_config`.
- … `PartitionedConfig.get_run_config_for_partition_key` function. This will allow the use of the `validate_run_config` function in unit tests.
- `PartitionedConfig` now takes an argument `tags_for_partition_fn` which allows for custom run tags for a given partition.

Published by prha over 2 years ago
- … `resource_defs` in `AssetGroup`:

```python
from dagster import AssetGroup, gcs_pickle_asset_io_manager, gcs_resource

asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": gcs_pickle_asset_io_manager, "gcs": gcs_resource},
)
```

- … `RetryRequested` exceptions.

Published by jmsanders over 2 years ago
- … `funcsigs.partial` that would cause incorrect `InvalidInvocationErrors` to be thrown.

Published by jmsanders over 2 years ago
- … `@hourly`, `@daily`, `@weekly`, and `@monthly` in addition to the standard 5-field cron strings (e.g. `* * * * *`).
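For reference, these aliases expand to the conventional crontab shorthands; the table below is a sketch of that de-facto convention, not Dagster's internal implementation:

```python
# De-facto crontab alias expansions (minute hour day-of-month month day-of-week).
CRON_ALIASES = {
    "@hourly": "0 * * * *",    # top of every hour
    "@daily": "0 0 * * *",     # midnight every day
    "@weekly": "0 0 * * 0",    # midnight on Sunday
    "@monthly": "0 0 1 * *",   # midnight on the 1st of the month
}

print(CRON_ALIASES["@daily"])  # 0 0 * * *
```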
- `value` is now an alias argument of `entry_data` (deprecated) for the `MetadataEntry` constructor.
- … `SourceAssets` and is rendered in `dagit`.
- … the `dagster` CLI.
- `dagster-k8s`, `dagster-celery-k8s`, and `dagster-docker` now name step workers `dagster-step-...` rather than `dagster-job-...`.
- … the `k8s_job_executor` for runs with many user logs.
- … the `dagster-k8s/config` tag to configure Dagster Kubernetes pods, the tags can now accept any valid Kubernetes config, and can be written in either snake case (`node_selector_terms`) or camel case (`nodeSelectorTerms`). See the docs for more information.
- … the `EcsRunLauncher` using the same syntax that you use to set secrets in the ECS API.
- The `EcsRunLauncher` now attempts to reuse task definitions instead of registering a new task definition for every run.
- The `EcsRunLauncher` now raises the underlying ECS API failure if it cannot successfully start a task.
- … `AssetGroup.from_package_name` and similar methods, lists of assets at module scope are now loaded.
- … `AssetGroup.from_modules` and `AssetGroup.from_current_module`, which automatically load assets at module scope from particular modules.
- `AssetGraph.from_modules` now correctly raises an error if multiple assets with the same key are detected.
- The `InputContext` object provided to `IOManager.load_input` previously did not include resource config. Now it does.
- … `build_fivetran_assets` will now be properly tagged with a `fivetran` pill in Dagit.
- … `++item`. This has been fixed.
- … the `SQLAlchemy` package to be `1.4` or greater to be installed. We are now using queries supported by `SQLAlchemy>=1.3`. Previously we would raise an error including the message: `'Select' object has no attribute 'filter'`.
- … `sqlite` to be `3.25.0` or greater to be installed. This has been relaxed to support older versions of sqlite. This was previously marked as fixed in our `0.14.0` notes, but a handful of cases that were still broken have now been fixed. Previously we would raise an error (`sqlite3.OperationalError`).
- … the `EcsRunLauncher` to use sidecars without you providing your own custom task definition. Now, you can continue to inherit sidecars from the launching task’s task definition by setting `include_sidecars: True` in your run launcher config.
- `dagster-snowflake` has dropped support for python 3.6. The library it is currently built on, `snowflake-connector-python`, dropped 3.6 support in their recent `2.7.5` release.
- `MetadataValue.path()` and `PathMetadataValue` now accept `os.PathLike` objects in addition to strings. Thanks @abkfenris!
- … `env_vars` on the `k8s_job_executor`. Thanks @kervel!
- … `AssetGroup` instead of `build_assets_job`, and it can now be run entirely from a local machine with no additional infrastructure (storing data inside DuckDB).
- … `AssetGroup` instead of `build_assets_job`.

Published by gibsondan over 2 years ago
- … `run_request_for_partition`, which returns a `RunRequest` that can be returned in a sensor or schedule evaluation function to launch a run for a particular partition for that job. See our documentation for more information.
- `PipelineRunsFilter` => `RunsFilter`.
- `load_assets_from_dbt_project` will now attach schema information to the generated assets if it is available in the dbt project (`schema.yml`).
- … the `AssetGroup` api is now the `fs_asset_io_manager`.
- `SourceAsset`s can now be partitioned, by setting the `partitions_def` argument.
- … `execute_in_process` was not updated properly.
- … `SQLAlchemy<1.4.0`.
- `load_assets_from_dbt_project` would fail if models were organized into subdirectories.
- `load_assets_from_dbt_project` would fail if seeds or snapshots were present in the project.
- [dagster-fivetran] A new fivetran_resync_op (along with a corresponding resync_and_poll method on the fivetran_resource) allows you to kick off Fivetran resyncs using Dagster (thanks @dwallace0723!)
- [dagster-shell] Fixed an issue where large log output could cause operations to hang (thanks @kbd!)
- [documentation] Fixed export message with dagster home path (thanks @proteusiq!)
- [documentation] Removed duplicate entries under integrations (thanks @kahnwong!)
- … `validate_run_config`.
- … the `reexecute_pipeline` API.
- `TableRecord`, `TableSchema` and its constituents are now documented in the API docs.
- … `MetadataEntry` and `MetadataValue` instead of old ones.

Published by benpankow over 2 years ago
- … `build_run_status_sensor_context` to help build context objects for run status sensors.
- … `default_value` on inputs has been resolved. Previously, a defensive error that did not take `default_value` into account was thrown.
- … `local_dagster_job_package_path` config option (thanks Iswariya Manivannan!).
- … the `reconstructable` API docs.

Published by alangenfeld over 2 years ago
- … `markupsafe` to function with existing `Jinja2` pin.

Published by clairelin135 over 2 years ago
…

```python
from dagster import sensor, DefaultSensorStatus

@sensor(job=my_job, default_status=DefaultSensorStatus.RUNNING)
def my_running_sensor():
    ...
```

or a schedule like this:

```python
from dagster import schedule, DefaultScheduleStatus, ScheduleEvaluationContext

@schedule(job=my_job, cron_schedule="0 0 * * *", default_status=DefaultScheduleStatus.RUNNING)
def my_running_schedule(context: ScheduleEvaluationContext):
    ...
```

As soon as schedules or sensors with the `default_status` field set to `RUNNING` are included in the workspace loaded by your Dagster Daemon, they will begin creating ticks and submitting runs.
- … `my_graph.my_op`. This is supported in both Dagit and Python APIs.
- `AssetMaterializations`, `ExpectationResults`, and `AssetObservations` can be logged via the context of an op using the `OpExecutionContext.log_event` method. Output metadata can also be logged using the `OpExecutionContext.add_output_metadata` method. Previously, Dagster expected these events to be yielded within the body of an op, which caused lint errors for many users, made it difficult to add mypy types to ops, and also forced usage of the verbose Output API. Here’s an example of the new invocations:

```python
from dagster import op, AssetMaterialization

@op
def the_op(context):
    context.log_event(AssetMaterialization(...))
    context.add_output_metadata({"foo": "bar"})
    ...
```
- `EventMetadata` > `MetadataValue`
- `EventMetadataEntry` > `MetadataEntry`
- `XMetadataEntryData` > `XMetadataValue` (e.g. `TextMetadataEntryData` > `TextMetadataValue`)
- The `metadata_entries` keyword argument to events and Dagster types is deprecated. Instead, users should use the metadata keyword argument, which takes a dictionary mapping string labels to `MetadataValue`s.
- … `EventMetadataEntry` is deprecated.
- … `EventMetadataEntry` (e.g. `EventMetadataEntry.text`) is deprecated. In 0.15.0, users should avoid constructing `EventMetadataEntry` objects directly, instead utilizing the metadata dictionary keyword argument, which maps string labels to `MetadataValues`.
…

```python
from flask import has_request_context, request

def submit_run(self, context: SubmitRunContext) -> PipelineRun:
    jwt_claims_header = (
        request.headers.get("X-Amzn-Oidc-Data", None) if has_request_context() else None
    )
```

Should be replaced by:

```python
def submit_run(self, context: SubmitRunContext) -> PipelineRun:
    jwt_claims_header = context.get_request_header("X-Amzn-Oidc-Data")
```
…

```python
from dagster import AssetGroup, repository, asset

@asset(required_resource_keys={"foo"})
def asset1():
    ...

@asset
def asset2():
    ...

@repository
def the_repo():
    asset_group = AssetGroup(assets=[asset1, asset2], resource_defs={"foo": ...})
    return [asset_group, asset_group.build_job(selection="asset1-")]
```

- `AssetGroup.build_job` supports a selection syntax similar to that found in op selection.
- … `asset_observations_for_node` method to `ExecuteInProcessResult` for fetching the AssetObservations from an in-process execution.
- `OpExecutionContext.log_event` provides a way to log AssetMaterializations, ExpectationResults, and AssetObservations from the body of an op without having to yield anything. Likewise, you can use `OpExecutionContext.add_output_metadata` to attach metadata to an output without having to explicitly use the Output object.
- `OutputContext.log_event` provides a way to log AssetMaterializations from within the handle_output method of an IO manager without yielding. Likewise, output metadata can be added using `OutputContext.add_output_metadata`.

Published by brad-alexander over 2 years ago
- The `EcsRunLauncher` can now override the `secrets_tag` parameter to None, which will cause it to not look for any secrets to be included in the tasks for the run. This can be useful in situations where the run launcher does not have permissions to query AWS Secretsmanager.
- … `0.13.17` / `0.13.18` might display an incorrect timestamp for its start time on the Runs page. Running the `dagster instance migrate` CLI command should resolve the issue.

Published by prha over 2 years ago
- … an op `my_op` inside a subgraph `my_graph`, you can now specify the query as `"my_graph.my_op"`.
- The `dagster asset wipe` CLI command now takes a `--noprompt` option.
- … `Map` config type, used to represent mappings between arbitrary scalar keys and typed values. For more information, see the Map ConfigType docs.
- `build_resources` has been added to the top level API. It provides a way to initialize resources outside of execution. This provides a way to use resources within the body of a sensor or schedule: https://github.com/dagster-io/dagster/issues/3794
- The `dagster-daemon` process now creates fewer log entries when no actions are taken (for example, if the run queue is empty).
- … the old `dagster-daemon` pod will now spin down completely before the new `dagster-daemon` pod is started.
- … the `K8sRunLauncher` should fail if the Dagster run fails. To enable this flag, set the `failPodOnRunFailure` key to true in the run launcher portion of the Helm chart.
- The `schema` and `data` arguments on the `DbtCliResource.test` function no longer need to be set to False to avoid errors, and the dbt output will no longer be displayed in json format in the event logs.
- The `DagsterGraphQLClient` now supports submitting runs with op/solid sub-selections.
- … `0.13.15` / `0.13.16` / `0.13.17` might display an incorrect timestamp for its start time on the Runs page. This would only happen if you had run a schema migration (using one of those versions) with the `dagster instance migrate` CLI command. Running the `dagster instance reindex` command should run a data migration that resolves this issue.
- The `namespace` argument of the `@asset` decorator now accepts a list of strings in addition to a single string.
- … `TableSchemaMetadataEntryData` and `TableMetadataEntryData` allow you to emit metadata representing the schema / contents of a table, to be displayed in Dagit.
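The `failPodOnRunFailure` Helm flag mentioned above might be set like this (a sketch; the nesting is assumed from the run launcher section of the chart):

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      # Mark the Kubernetes Job/pod for a run as failed when the Dagster run fails.
      failPodOnRunFailure: true
```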