
An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License



dagster -

Published by johannkm over 3 years ago


  • The Python GraphQL client now includes a shutdown_repository_location API call that shuts down a gRPC server. This is useful in situations where you want Kubernetes to restart your server and re-create your repository definitions, even though the underlying Python code hasn’t changed (for example, if your pipelines are loaded programmatically from a database).

  • io_manager_key and root_manager_key are disallowed on composite solids’ InputDefinitions and OutputDefinitions. Instead, custom IO managers on the solids inside composite solids will be respected:

    from dagster import InputDefinition, composite_solid, solid

    @solid(input_defs=[InputDefinition("data", dagster_type=str, root_manager_key="my_root")])
    def inner_solid(_, data):
      return data

    @composite_solid
    def my_composite():
      return inner_solid()
  • Schedules can now be directly invoked. This is intended to be used for testing. To learn more, see https://docs.dagster.io/master/concepts/partitions-schedules-sensors/schedules#testing-schedules


  • Dagster libraries (for example, dagster-postgres or dagster-graphql) are now pinned to the same version as the core dagster package. This should reduce issues caused by version incompatibilities between Dagster packages.
  • Due to a recent regression, when viewing a launched run in Dagit, the Gantt chart would inaccurately show the run as queued well after it had already started running. This has been fixed, and the Gantt chart will now accurately reflect incoming logs.
  • In some cases, navigation in Dagit led to overfetching a workspace-level GraphQL query that would unexpectedly reload the entire app. The excess fetches are now limited more aggressively, and the loading state will no longer reload the app when workspace data is already available.
  • Previously, execution would fail silently when trying to use memoization with a root input manager. The error message now more clearly states that this is not supported.

Breaking Changes

  • Invoking a generator solid now yields a generator, and output objects are not unpacked.

    from dagster import Output, solid

    @solid
    def my_solid():
      yield Output("hello")

    assert isinstance(list(my_solid())[0], Output)


  • Added an experimental EcsRunLauncher. This creates a new ECS Task Definition and launches a new ECS Task for each run. You can use the new ECS Reference Deployment to experiment with the EcsRunLauncher. We’d love your feedback in our #dagster-ecs Slack channel!


dagster -

Published by prha over 3 years ago



  • Supplying the "metadata" argument to InputDefinitions and OutputDefinitions is no longer considered experimental.
  • The "context" argument can now be omitted for solids that have required resource keys.
  • The S3ComputeLogManager now takes a boolean config argument skip_empty_files, which skips uploading empty log files to S3. This should enable a workaround for timeout errors when using the S3ComputeLogManager to persist logs to MinIO object storage.
  • The Helm subchart for user code deployments now allows for extra manifests.
  • Running dagit with flag --suppress-warnings will now ignore all warnings, such as ExperimentalWarnings.
  • PipelineRunStatus, which represents the run status, is now exported in the public API.


  • The asset catalog now has better backwards compatibility for supporting deprecated Materialization events. Previously, these events were causing loading errors.

Community Contributions

  • Improved documentation of the dagster-dbt library with some helpful tips and example code (thanks @makotonium!).
  • Fixed the example code in the dagster-pyspark documentation for providing and accessing the pyspark resource (thanks @Andrew-Crosby!).
  • Helm chart serviceaccounts now allow annotations (thanks @jrouly!).


  • Added a section on testing resources.
  • Revamped the IO manager testing section to use the build_input_context and build_output_context APIs.
dagster - 0.11.13

Published by gibsondan over 3 years ago


  • Added an example that demonstrates what a complete repository that takes advantage of many Dagster features might look like. Includes usage of IO Managers, modes / resources, unit tests, several cloud service integrations, and more! Check it out at examples/hacker_news!
  • retry_number is now available on SolidExecutionContext, allowing you to determine within a solid function how many times the solid has been previously retried.
  • Errors that are surfaced during solid execution now have clearer stack traces.
  • When using Postgres or MySQL storage, the database mutations that initialize Dagster tables on startup now happen in atomic transactions, rather than individual SQL queries.
  • The tags for Dagster-provided images in the Helm chart will now default to the current chart version.
  • Removed the PIPELINE_INIT_FAILURE event type. A failure that occurs during pipeline initialization will now produce a PIPELINE_FAILURE as with all other pipeline failures.


  • When viewing run logs in Dagit, in the stdout/stderr log view, switching the filtered step did not work. This has been fixed. Additionally, the filtered step is now present as a URL query parameter.
  • The get_run_status method on the Python GraphQL client now returns a PipelineRunStatus enum instead of the raw string value in order to align with the mypy type annotation. Thanks to Dylan Bienstock for surfacing this bug!
  • When a docstring on a solid doesn’t match the reST, Google, or Numpydoc formats, Dagster no longer raises an error.
  • Fixed a bug where memoized runs would sometimes fail to execute when specifying a non-default IO manager key.


  • Added the k8s_job_executor, which executes solids in separate Kubernetes jobs. With the addition of this executor, you can now choose at runtime between single-pod and multi-pod isolation for solids in your run. Previously this was only configurable for the entire deployment: you could either use the K8sRunLauncher with the default executors (in_process and multiprocess) for low isolation, or you could use the CeleryK8sRunLauncher with the celery_k8s_job_executor for pod-level isolation. Now, your instance can be configured with the K8sRunLauncher and you can choose between the default executors or the k8s_job_executor.
  • The DagsterGraphQLClient now allows you to specify whether to use HTTP or HTTPS when connecting to the GraphQL server. In addition, error messages during query execution or connecting to dagit are now clearer. Thanks to @emily-hawkins for raising this issue!
  • Added experimental hook invocation functionality. Invoking a hook will call the underlying decorated function. For example:
  from dagster import build_hook_context

  my_hook(build_hook_context(resources={"foo_resource": "foo"}))
  • Resources can now be directly invoked as functions. Invoking a resource will call the underlying decorated initialization function.
  from dagster import build_init_resource_context, resource

  @resource(config_schema=str)
  def my_basic_resource(init_context):
      return init_context.resource_config

  context = build_init_resource_context(config="foo")
  assert my_basic_resource(context) == "foo"
  • Improved the error message when a pipeline definition is incorrectly invoked as a function.


dagster -

Published by jmsanders over 3 years ago


  • ScheduleDefinition and SensorDefinition now carry over properties (for example, docstrings) from functions decorated by @sensor and @schedule.
  • Fixed a bug with configured on resources, where the version set on a ResourceDefinition was not being passed to the ResourceDefinition created by the call to configured.
  • Previously, if an error was raised in an IOManager handle_output implementation that was a generator, it would not be wrapped in DagsterExecutionHandleOutputError. Now, it is wrapped.
  • Dagit will now gracefully degrade if websockets are not available. Previously, launching runs and viewing the event logs would block on a websocket connection.


  • Added an example of run attribution via a custom run coordinator, which reads a user’s email from HTTP headers on the Dagster GraphQL server and attaches the email as a run tag. Custom run coordinators can now also be specified in the Helm chart, under queuedRunCoordinator. See the docs for more information on setup.
  • RetryPolicy now supports backoff and jitter settings, to allow for modulating the delay as a function of attempt number and randomness.
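
To make the idea of backoff and jitter concrete, here is a minimal standalone sketch in plain Python. It does not use Dagster's RetryPolicy and is not Dagster's implementation; the function name compute_delay and its parameters (base_delay, exponential, jitter_fraction, rng) are assumptions made only for this illustration.

```python
import random


def compute_delay(base_delay, attempt, exponential=True, jitter_fraction=0.0, rng=None):
    """Return the seconds to wait before retry number `attempt` (1-based).

    Hypothetical helper for illustration only; not part of Dagster.
    """
    if rng is None:
        rng = random.Random()
    # Backoff: the delay grows as a function of the attempt number.
    if exponential:
        delay = base_delay * (2 ** (attempt - 1))
    else:
        delay = base_delay * attempt
    # Jitter: add a random amount of up to jitter_fraction of the delay.
    return delay + rng.uniform(0, jitter_fraction * delay)


# With no jitter, the schedule is deterministic.
delays = [compute_delay(5, n) for n in (1, 2, 3)]

# With jitter, the delay for attempt 2 falls somewhere between 10 and 15 seconds.
jittered = compute_delay(5, 2, jitter_fraction=0.5, rng=random.Random(0))
```

Jitter spreads retries out in time, which can help when many steps fail at once and would otherwise all retry at the same moment.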


dagster -

Published by yuhan over 3 years ago



  • [Helm] Added dagit.enableReadOnly. When enabled, a separate Dagit instance is deployed in --read-only mode. You can use this feature to serve Dagit to users who you do not want to be able to kick off new runs or make other changes to application state.
  • [dagstermill] Dagstermill is now compatible with current versions of papermill (2.x). Previously we required papermill to be pinned to 1.x.
  • Added a new metadata type that links to the asset catalog, which can be invoked using EventMetadata.asset.
  • Added a new log event type LOGS_CAPTURED, which explicitly links to the captured stdout/stderr logs for a given step, as determined by the configured ComputeLogManager on the Dagster instance. Previously, these links were available on the STEP_START event.
  • The network key on DockerRunLauncher config can now be sourced from an environment variable.
  • The Workspace section of the Status page in Dagit now shows more metadata about your workspace, including the python file, python package, and Docker image of each of your repository locations.
  • In Dagit, settings for how executions are viewed now persist across sessions.

Breaking Changes

  • The get_execution_data method of SensorDefinition and ScheduleDefinition has been renamed to evaluate_tick. We expect few to no users of the previous name, and are renaming to prepare for improved testing support for schedules and sensors.

Community Contributions

  • README has been updated to remove typos (thanks @gogi2811).
  • Configured API doc examples have been fixed (thanks @jrouly).


  • Documentation on testing sensors using experimental build_sensor_context API. See Testing sensors.


  • Some mypy errors encountered when using the built-in Dagster types (e.g., dagster.Int) as type annotations on functions decorated with @solid have been resolved.
  • Fixed an issue where the K8sRunLauncher sometimes hung while launching a run due to holding a stale Kubernetes client.
  • Fixed an issue with direct solid invocation where default config values would not be applied.
  • Fixed a bug where resource dependencies to io managers were not being initialized during memoization.
  • Dagit can once again override pipeline tags that were set on the definition, and UI clarity around the override behavior has been improved.
  • Markdown event metadata rendering in Dagit has been repaired.


dagster -

Published by OwenKephart over 3 years ago



  • Sensors can now set a string cursor using context.update_cursor(str_value) that is persisted across evaluations to save unnecessary computation. This persisted string value is made available on the context as context.cursor. Previously, we encouraged cursor-like behavior by exposing last_run_key on the sensor context, to keep track of the last time the sensor successfully requested a run. This, however, was not useful for avoiding unnecessary computation when the sensor evaluation did not result in a run request.
  • Dagit may now be run in --read-only mode, which will disable mutations in the user interface and on the server. You can use this feature to run instances of Dagit that are visible to users who you do not want to be able to kick off new runs or make other changes to application state.
  • In dagster-pandas, the event_metadata_fn parameter to the function create_dagster_pandas_dataframe_type may now return a dictionary of EventMetadata values, keyed by their string labels. This should now be consistent with the parameters accepted by Dagster events, including the TypeCheck event.
# old
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: [
        EventMetadataEntry.int(len(df), "number of rows"),
        EventMetadataEntry.int(len(df.columns), "number of columns"),
    ],
)

# new
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: {
        "number of rows": len(df),
        "number of columns": len(df.columns),
    },
)
  • dagster-pandas’ PandasColumn.datetime_column() now has a new tz parameter, allowing you to constrain the column to a specific timezone (thanks @mrdavidlaing!)
  • The DagsterGraphQLClient now takes in an optional transport argument, which may be useful in cases where you need to authenticate your GQL requests:
authed_client = DagsterGraphQLClient(
    transport=RequestsHTTPTransport(..., auth=<some auth>),
)
  • Added an ecr_public_resource to get login credentials for the AWS ECR Public Gallery. This is useful if any of your pipelines need to push images.
  • Failed backfills may now be resumed in Dagit, by putting them back into a "requested" state. These backfill jobs should then be picked up by the backfill daemon, which will then attempt to create and submit runs for any of the outstanding requested partitions. This should help backfill jobs recover from any deployment or framework issues that occurred during the backfill prior to all the runs being launched. This will not, however, attempt to re-execute any of the individual pipeline runs that were successfully launched but resulted in a pipeline failure.
  • In the run log viewer in Dagit, links to asset materializations now include the timestamp for that materialization. This will bring you directly to the state of that asset at that specific time.
  • The Databricks step launcher now includes a max_completion_wait_time_seconds configuration option, which controls how long it will wait for a Databricks job to complete before exiting.
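
The sensor cursor described in the first item of this list can be illustrated with a small standalone sketch. It uses no Dagster imports: FakeSensorContext and evaluate are hypothetical stand-ins written for this example, and only the names cursor and update_cursor mirror the attributes mentioned above.

```python
class FakeSensorContext:
    """Hypothetical stand-in for a sensor context; not a Dagster class."""

    def __init__(self, cursor=None):
        self.cursor = cursor

    def update_cursor(self, value):
        # In Dagster the string is persisted across evaluations;
        # this sketch simply keeps it on the object.
        self.cursor = value


def evaluate(context, event_ids):
    """Return the event ids newer than the cursor, then advance the cursor."""
    last_seen = int(context.cursor) if context.cursor else 0
    new_ids = [i for i in event_ids if i > last_seen]
    if event_ids:
        # The cursor advances even if no run would be requested, so the
        # same events are not re-examined on the next evaluation.
        context.update_cursor(str(max(max(event_ids), last_seen)))
    return new_ids


context = FakeSensorContext()
first = evaluate(context, [1, 2, 3])
second = evaluate(context, [1, 2, 3, 4])
```

The second evaluation only has to consider event 4, because the cursor already records that events up to 3 were handled.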


  • Solids can now be invoked outside of composition. If your solid has a context argument, the build_solid_context function can be used to provide a context to the invocation.
from dagster import build_solid_context, solid

@solid
def basic_solid():
    return "foo"

assert basic_solid() == "foo"

@solid
def add_one(x):
    return x + 1

assert add_one(5) == 6

@solid(required_resource_keys={"foo_resource"})
def solid_reqs_resources(context):
    return context.resources.foo_resource + "bar"

context = build_solid_context(resources={"foo_resource": "foo"})
assert solid_reqs_resources(context) == "foobar"
  • build_schedule_context allows you to build a ScheduleExecutionContext using a DagsterInstance. This can be used to test schedules.
from dagster import DagsterInstance, build_schedule_context

with DagsterInstance.get() as instance:
    context = build_schedule_context(instance)
  • build_sensor_context allows you to build a SensorExecutionContext using a DagsterInstance. This can be used to test sensors.
from dagster import DagsterInstance, build_sensor_context

with DagsterInstance.get() as instance:
    context = build_sensor_context(instance)
  • build_input_context and build_output_context allow you to construct InputContext and OutputContext respectively. This can be used to test IO managers.
from dagster import build_input_context, build_output_context

# MyIoManager and val stand in for your own IO manager class and output value
io_manager = MyIoManager()

io_manager.handle_output(build_output_context(), val)
io_manager.load_input(build_input_context())
  • Resources can be provided to either of these functions. If you are using context manager resources, then build_input_context/build_output_context must be used as a context manager.
with build_input_context(resources={"cm_resource": my_cm_resource}) as context:
    io_manager.load_input(context)
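  • validate_run_config can be used to validate a run config blob against a pipeline definition and mode. If the run config is invalid for the pipeline and mode, this function will raise an error; if it is valid, this function will return a dictionary representing the validated run config that Dagster uses during execution.
from dagster import validate_run_config

# pipeline_requires_config and pipeline_no_required_config are placeholder pipeline definitions
validate_run_config(
    pipeline_requires_config,
    {"solids": {"a": {"config": {"foo": "bar"}}}},
) # usage for pipeline that requires config

validate_run_config(
    pipeline_no_required_config,
) # usage for pipeline that has no required config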
  • The ability to set a RetryPolicy has been added. This allows you to declare automatic retry behavior when exceptions occur during solid execution. You can set retry_policy on a solid invocation, @solid definition, or @pipeline definition.
from dagster import RetryPolicy, pipeline, solid

@solid(retry_policy=RetryPolicy(max_retries=3, delay=5))
def fickle_solid():
    ...

@pipeline(solid_retry_policy=RetryPolicy())  # set a default policy for all solids
def my_pipeline():
    some_solid()  # placeholder solid; will use the pipeline's policy by default

    # solid definition takes precedence over pipeline default
    fickle_solid()

    # invocation setting takes precedence over definition
    fickle_solid.with_retry_policy(RetryPolicy(max_retries=2))()
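
The precedence order described above (invocation over definition over pipeline default) can be summarized in a small standalone sketch. It does not use Dagster: resolve_retry_policy is a hypothetical helper, and the dictionaries merely stand in for policy objects.

```python
def resolve_retry_policy(pipeline_default=None, definition=None, invocation=None):
    """Return the most specific policy that was set, or None.

    Hypothetical helper illustrating precedence only; not Dagster's code.
    """
    # Check from most specific to least specific.
    for policy in (invocation, definition, pipeline_default):
        if policy is not None:
            return policy
    return None


pipeline_policy = {"max_retries": 1}
definition_policy = {"max_retries": 3}
invocation_policy = {"max_retries": 2}

chosen = resolve_retry_policy(pipeline_policy, definition_policy, invocation_policy)
```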


  • Previously, asset materializations were not working in dagster-dbt for dbt >= 0.19.0. This has been fixed.
  • Previously, using the dagster/priority tag directly on pipeline definitions would cause an error. This has been fixed.
  • In dagster-pandas, the create_dagster_pandas_dataframe_type() function would, in some scenarios, not use the specified materializer argument when provided. This has been fixed (thanks @drewsonne!)
  • dagster-graphql --remote now sends the query and variables as POST body data, avoiding URI length limit issues.
  • In the Dagit pipeline definition view, we no longer render config nubs for solids that do not need them.
  • In the run log viewer in Dagit, truncated row contents (including errors with long stack traces) now have a larger and clearer button to expand the full content in a dialog.
  • [dagster-mysql] Fixed a bug where database connections accumulated by sqlalchemy.Engine objects would be invalidated after 8 hours of idle time due to MySQL’s default configuration, resulting in an sqlalchemy.exc.OperationalError when attempting to view pages in Dagit in long-running deployments.


  • In 0.11.9, context was made an optional argument on the function decorated by @solid. The solids throughout tutorials and snippets that do not need a context argument have been altered to omit that argument, and better reflect this change.
  • In a previous docs revision, a tutorial section on accessing resources within solids was removed. This has been re-added to the site.
dagster -

Published by jmsanders over 3 years ago


  • In Dagit, assets can now be viewed with an asOf URL parameter, which shows a snapshot of the asset at the provided timestamp, including parent materializations as of that time.
  • [Dagit] Queries and Mutations now use HTTP instead of a websocket-based connection.


  • A regression in 0.11.8 where composites would fail to render in the right side bar in Dagit has been fixed.
  • A dependency conflict in make dev_install has been fixed.
  • [dagster-python-client] reload_repository_location and submit_pipeline_execution have been fixed - the underlying GraphQL queries had a missing inline fragment case.

Community Contributions

  • AWS S3 resources now support named profiles (thanks @deveshi!)
  • The Dagit ingress path is now configurable in our Helm charts (thanks @orf!)
  • Dagstermill’s use of temporary files is now supported across operating systems (thanks @slamer59!)
  • Deploying with Helm documentation has been updated to reflect the correct name for “dagster-user-deployments” (thanks @hebo-yang!)
  • Deploying with Helm documentation has been updated to suggest naming your release “dagster” (thanks @orf!)
  • Solids documentation has been updated to remove a typo (thanks @dwallace0723!)
  • Schedules documentation has been updated to remove a typo (thanks @gdoron!)
dagster -

Published by alangenfeld over 3 years ago


  • The @solid decorator can now wrap a function without a context argument, if no context information is required. For example, you can now do:
from dagster import solid

@solid
def basic_solid():
    return 5

@solid
def solid_with_inputs(x, y):
    return x + y

However, if your solid requires config or resources, then you will receive an error at definition time.

  • It is now simpler to provide structured metadata on events. Events that take a metadata_entries argument may now instead accept a metadata argument, which should allow for a more convenient API. The metadata argument takes a dictionary with string labels as keys and EventMetadata values. Some base types (str, int, float, and JSON-serializable list/dicts) are also accepted as values and will be automatically coerced to the appropriate EventMetadata value. For example:
from dagster import AssetMaterialization, EventMetadata, EventMetadataEntry, solid

@solid
def old_metadata_entries_solid(df):
    yield AssetMaterialization(
        "my_asset",  # placeholder asset key
        metadata_entries=[
            EventMetadataEntry.text("users_table", "table name"),
            EventMetadataEntry.int(len(df), "row count"),
            EventMetadataEntry.url("http://mysite/users_table", "data url"),
        ])

@solid
def new_metadata_solid(df):
    yield AssetMaterialization(
        "my_asset",  # placeholder asset key
        metadata={
            "table name": "users_table",
            "row count": len(df),
            "data url": EventMetadata.url("http://mysite/users_table"),
        })


  • The dagster-daemon process now has a --heartbeat-tolerance argument that allows you to configure how long the process can run before shutting itself down due to a hanging thread. This parameter can be used to troubleshoot failures with the daemon process.
  • When creating a schedule from a partition set using PartitionSetDefinition.create_schedule_definition, the partition_selector function that determines which partition to use for a given schedule tick can now return a list of partitions or a single partition, allowing you to create schedules that create multiple runs for each schedule tick.
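
The automatic coercion of base types mentioned in the metadata item above can be pictured with the following standalone sketch. It is not Dagster's implementation: coerce_metadata_value and coerce_metadata are hypothetical helpers, and the (kind, value) tuples are an assumed stand-in for EventMetadata values.

```python
import json


def coerce_metadata_value(value):
    """Map a raw Python value to a (kind, value) pair.

    Hypothetical helper for illustration only; not part of Dagster.
    """
    if isinstance(value, str):
        return ("text", value)
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, float):
        return ("float", value)
    if isinstance(value, (list, dict)):
        json.dumps(value)  # raises TypeError if not JSON-serializable
        return ("json", value)
    raise TypeError(f"Unsupported metadata value: {value!r}")


def coerce_metadata(metadata):
    """Coerce every value in a label -> value dictionary."""
    return {label: coerce_metadata_value(value) for label, value in metadata.items()}


coerced = coerce_metadata({"table name": "users_table", "row count": 42})
```

Values that are already explicit (such as a URL entry) would be passed through unchanged in a real API; this sketch covers only the base types listed above.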


  • Runs submitted via backfills can now correctly resolve the source run id when loading inputs from previous runs instead of encountering an unexpected KeyError.
  • Using nested Dict and Set types for solid inputs/outputs now works as expected. Previously a structure like Dict[str, Dict[str, Dict[str, SomeClass]]] could result in confusing errors.
  • Dagstermill now correctly loads the config for aliased solids instead of loading from the incorrect place which would result in empty solid_config.
  • Error messages when incomplete run config is supplied are now more accurate and precise.
  • An issue that would cause map and collect steps downstream of other map and collect steps to mysteriously not execute when using multiprocess executors has been resolved.


  • Typo fixes and improvements (thanks @elsenorbw!)
  • Improved documentation for scheduling partitions.
dagster -

Published by sidkmenon-zz over 3 years ago


  • For pipelines with tags defined in code, display these tags in the Dagit playground.
  • On the Dagit asset list page, use a polling query to regularly refresh the asset list.
  • When viewing the Dagit asset list, persist the user’s preference between the flattened list view and the directory structure view.
  • Added solid_exception on HookContext which returns the actual exception object thrown in a failed solid. See the example “Accessing failure information in a failure hook“ for more details.
  • Added solid_output_values on HookContext which returns the computed output values.
  • Added make_values_resource helper for defining a resource that passes in user-defined values. This is useful when you want multiple solids to share values. See the example for more details.
  • StartupProbes can now be set to disabled in Helm charts. This is useful if you’re running on a version earlier than Kubernetes 1.16.


  • Fixed an issue where partial re-execution was not referencing the right source run and failed to load the correct persisted outputs.
  • When running Dagit with --path-prefix, our color-coded favicons denoting the success or failure of a run were not loading properly. This has been fixed.
  • Hooks and tags defined on solid invocations now work correctly when executing a pipeline with a solid subselection.
  • Fixed an issue where heartbeats from the dagster-daemon process would not appear on the Status page in Dagit until the process had been running for 30 seconds.
  • When filtering runs, Dagit now suggests all “status:” values and other auto-completions in a scrolling list.
  • Fixed the asset catalog, where nested directory structure links flipped back to the flat view structure.

Community Contributions

  • [Helm] The Dagit service port is now configurable (thanks @trevenrawr!)
  • [Docs] Cleanup & updating visual aids (thanks @keypointt!)


  • [Dagster-GraphQL] Added an official Python Client for Dagster’s GraphQL API (GH issue #2674). Docs can be found here


  • Fixed a confusingly worded header on the Solids/Pipelines Testing page.
dagster -

Published by mgasner over 3 years ago

Breaking Changes

  • DagsterInstance.get() no longer falls back to an ephemeral instance if DAGSTER_HOME is not set. We don’t expect this to break normal workflows. This change allows our tooling to be more consistent around its expectations. If you were relying on getting an ephemeral instance, you can use DagsterInstance.ephemeral() directly.
  • Undocumented attributes on HookContext have been removed. step_key and mode_def have been documented as attributes.


  • Added a permanent, linkable panel in the Run view in Dagit to display the raw compute logs.
  • Added more descriptive / actionable error messages throughout the config system.
  • When viewing a partitioned asset in Dagit, display only the most recent materialization for a partition, with a link to view previous materializations in a dialog.
  • When viewing a run in Dagit, individual log line timestamps now have permalinks. When loading a timestamp permalink, the log table will highlight and scroll directly to that line.
  • The default config_schema for all configurable objects - solids, resources, IO managers, composite solids, executors, loggers - is now Any. This means that you can now use configuration without explicitly providing a config_schema. Refer to the docs for more details: https://docs.dagster.io/concepts/configuration/config-schema.
  • When launching an out-of-process run, resources are no longer initialized in the orchestrating process. This should give a performance boost for those using out-of-process execution with heavy resources (for example, a Spark context).
  • input_defs and output_defs on @solid will now flexibly combine data that can be inferred from the function signature that is not declared explicitly via InputDefinition / OutputDefinition. This allows for more concise defining of solids with reduced repetition of information.
  • [Helm] Postgres storage configuration now supports connection string parameter keywords.
  • The Status page in Dagit will now display errors that were surfaced in the dagster-daemon process within the last 5 minutes. Previously, it would only display errors from the last 30 seconds.
  • Hanging sensors and schedule functions will now raise a timeout exception after 60 seconds, instead of crashing the dagster-daemon process.
  • The DockerRunLauncher now accepts a container_kwargs config parameter, allowing you to specify any argument to the run container that can be passed into the Docker containers.run method. See https://docker-py.readthedocs.io/en/stable/containers.html#docker.models.containers.ContainerCollection.run for the full list of available options.
  • Added clearer error messages for when a Partition cannot be found in a Partition Set.
  • The celery_k8s_job_executor now accepts a job_wait_timeout allowing you to override the default of 24 hours.


  • Fixed the raw compute logs in Dagit, which were not live updating as the selected step was executing.
  • Fixed broken links in the Backfill table in Dagit when Dagit is started with a --path-prefix argument.
  • Dagit now shows the failed status of backfills in the Backfill table, along with an error stack trace. Previously, the backfill jobs were stuck in a Requested state.
  • Previously, if you passed a non-required Field to the output_config_schema or input_config_schema arguments of @io_manager, the config would still be required. Now, the config is not required.
  • Fixed nested subdirectory views in the Assets catalog, where the view switcher would flip back from the directory view to the flat view when navigating into subdirectories.
  • Fixed an issue where the dagster-daemon process would crash if it experienced a transient connection error while connecting to the Dagster database.
  • Fixed an issue where the dagster-airflow scaffold command would raise an exception if a preset was specified.
  • Fixed an issue where Dagit was not including the error stack trace in the Status page when a repository failed to load.
dagster -

Published by mgasner over 3 years ago
