dagster

An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License

Downloads
12.2M
Stars
11.1K
Committers
367

dagster - 0.14.17 (May 26, 2022)

Published by gibsondan over 2 years ago

New

  • Added a pin to protobuf version 3 due to a backwards-incompatible change in the protobuf version 4 release.
  • [helm] The name of the Dagit deployment can now be overridden in the Dagster Helm chart.
  • [dagit] The left navigation now shows jobs as expandable lists grouped by repository. You can opt out of this change using the feature flag in User Settings.
  • [dagit] In the left navigation, when a job has more than one schedule or sensor, clicking the schedule/sensor icon will now display a dialog containing the full list of schedules and sensors for that job.
  • [dagit] Assets on the runs page are now shown in more scenarios.
  • [dagster-dbt] dbt assets now support subsetting! In dagit, you can launch off a dbt command which will only refresh the selected models, and when you’re building jobs using AssetGroup.build_job(), you can define selections which select subsets of the loaded dbt project.
  • [dagster-dbt] [experimental] The load_assets_from_dbt_manifest function now supports an experimental select parameter. This allows you to use dbt selection syntax to select from an existing manifest.json file, rather than having Dagster re-compile the project on demand.
  • For software-defined assets, OpExecutionContext now exposes an asset_key_for_output method, which returns the asset key that one of the op’s outputs corresponds to.
  • The Backfills tab in Dagit loads much faster when there have been backfills that produced large numbers of runs.
  • Added the ability to run the Dagster Daemon as a Python module, by running python -m dagster.daemon.
  • The non_argument_deps parameter for the asset and multi_asset decorators can now be a set of strings in addition to a set of AssetKey.

Bugfixes

  • [dagit] In cases where Dagit is unable to make successful WebSocket connections, run logs could become stuck in a loading state. Dagit will now time out on the WebSocket connection attempt after a brief period of time. This allows run logs to fall back to http requests and move past the loading state.
  • In version 0.14.16, launching an asset materialization run with source assets would error with an InvalidSubsetError. This is now fixed.
  • Empty strings are no longer allowed as AssetKeys.
  • Fixed an issue where schedules built from partitioned job config always ran at midnight, ignoring any hour or minute offset that was specified on the config.
  • Fixed an issue where if the scheduler was interrupted and resumed in the middle of running a schedule tick that produced multiple RunRequests, it would show the same run ID multiple times on the list of runs for the schedule tick.
  • Fixed an issue where Dagit would raise a GraphQL error when a non-dictionary YAML string was entered into the Launchpad.
  • Fixed an issue where Dagster gRPC servers would sometimes raise an exception when loading repositories with many partition sets.
  • Fixed an issue where the snowflake_io_manager would sometimes raise an error with pandas 1.4 or later installed.
  • Fixed an issue where re-executing an entire set of dynamic steps together with their upstream step resulted in a DagsterExecutionStepNotFoundError.
  • [dagit] Added loading indicator for job-scoped partition backfills.
  • Fixed an issue that made it impossible to have graph-backed assets with upstream SourceAssets.
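
The WebSocket fix above follows a common "try with a deadline, then fall back" pattern. The sketch below illustrates that pattern in Python with asyncio; it is not Dagit's code (Dagit runs in the browser), and connect_websocket, choose_log_transport, and the 0.05 second timeout are hypothetical names and values chosen for illustration.

```python
import asyncio


async def connect_websocket(delay: float) -> str:
    # Hypothetical stand-in for a WebSocket handshake; `delay` simulates
    # a slow or blocked network path.
    await asyncio.sleep(delay)
    return "websocket"


async def choose_log_transport(delay: float, timeout: float = 0.05) -> str:
    """Return "websocket" if the handshake finishes in time, else "http"."""
    try:
        return await asyncio.wait_for(connect_websocket(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Stop waiting on the socket and fall back to plain HTTP requests,
        # so the caller is never stuck in a loading state.
        return "http"


fast = asyncio.run(choose_log_transport(delay=0.0))
slow = asyncio.run(choose_log_transport(delay=1.0))
print(fast, slow)
```

The key design point is that the timeout bounds the wait, so a connection that never completes degrades to the slower transport instead of blocking the log view.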

Community Contributions

  • AssetIn can now accept a string that will be coerced to an AssetKey. Thanks @aroig!
  • Runtime type checks improved for some asset-related functions. Thanks @aroig!
  • Docs grammar fixes. Thanks @dwinston!
  • Dataproc ops for dagster-gcp now have user-configurable timeout length. Thanks @3cham!

All Changes

https://github.com/dagster-io/dagster/compare/0.14.16...0.14.17

dagster - 0.14.16 (May 19, 2022)

Published by rexledesma over 2 years ago

New

  • AssetsDefinition.from_graph now accepts a partitions_def argument.
  • @asset-decorated functions can now accept variable keyword arguments.
  • Jobs executed in ECS tasks now report the health status of the ECS task
  • The CLI command dagster instance info now prints the current schema migration state for the configured instance storage.
  • [dagster-dbt] You can now configure a docs_url on the dbt_cli_resource. If this value is set, AssetMaterializations associated with each dbt model will contain a link to the dbt docs for that model.
  • [dagster-dbt] You can now configure a dbt_cloud_host on the dbt_cloud_resource, in the case that your dbt cloud instance is under a custom domain.

Bugfixes

  • Fixed a bug where InputContext.upstream_output was missing the asset_key when it referred to an asset outside the run.
  • When specifying a selection parameter in AssetGroup.build_job(), the generated job would include an incorrect set of assets in certain situations. This has been fixed.
  • Previously, a set of database operational exceptions were masked with a DagsterInstanceSchemaOutdated exception if the instance storage was not up to date with the latest schema. We no longer wrap these exceptions, allowing the underlying exceptions to bubble up.
  • [dagster-airbyte] Fixed issue where successfully completed Airbyte syncs would send a cancellation request on completion. While this did not impact the sync itself, if alerts were set up on that connection, they would get triggered regardless of if the sync was successful or not.
  • [dagster-azure] Fixed an issue where the Azure Data Lake Storage adls2_pickle_io_manager would sometimes fail to recursively delete a folder when cleaning up an output.
  • Previously, if two different jobs with the same name were provided to the same repo, and one was targeted by a sensor/schedule, the job provided by the sensor/schedule would silently overwrite the other job instead of failing. In this release, a warning is fired when this case is hit, which will turn into an error in 0.15.0.
  • Dagit will now display workspace errors after reloading all repositories.

Breaking Changes

  • Calls to instance.get_event_records without an event type filter are now deprecated and will generate a warning. These calls will raise an exception starting in 0.15.0.

Community Contributions

  • @multi_asset now supports partitioning. Thanks @aroig!
  • Orphaned process detection now works correctly across a broader set of platforms. Thanks @aroig!
  • [K8s] Added a new max_concurrent field to the k8s_job_executor that limits the number of concurrent Ops that will execute per run. Since this executor launches a Kubernetes Job per Op, this also limits the number of concurrent Kubernetes Jobs. Note that this limit is per run, not global. Thanks @kervel!
  • [Helm] Added a new externalConfigmap field as an alternative to dagit.workspace.servers when running the user deployments chart in a separate release. This allows the workspace to be managed outside of the main Helm chart. Thanks @peay!
  • Removed the pin on markupsafe<=2.0.1. Thanks @bollwyvl!
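
To make the per-run semantics of max_concurrent concrete, here is a small sketch that caps how many ops of a single run execute at once. It uses a thread pool as a stand-in for launching Kubernetes Jobs; run_ops and its bookkeeping are hypothetical and are not the k8s_job_executor implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_ops(op_names: List[str], max_concurrent: int) -> Tuple[List[str], int]:
    """Run every op of ONE run, never more than `max_concurrent` at a time.

    Returns the finished op names and the peak concurrency that was observed.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def launch(op_name: str) -> str:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the work done by one launched job
        with lock:
            running -= 1
        return op_name

    # The pool size is the per-run limit; a second run would get its own pool,
    # which is why the limit is per run rather than global.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        finished = list(pool.map(launch, op_names))
    return finished, peak


finished, peak = run_ops(["extract", "clean", "join", "load"], max_concurrent=2)
print(finished, peak)
```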

All Changes

https://github.com/dagster-io/dagster/compare/0.14.15...0.14.16

dagster - 0.14.15

Published by benpankow over 2 years ago

New

  • Sensors / schedules can now return a list of RunRequest objects instead of yielding them.
  • Repositories can now contain asset definitions and source assets for the same asset key.
  • OpExecutionContext (provided as the context argument to Ops) now has fields for run, job_def, job_name, op_def, and op_config. These replace pipeline_run, pipeline_def, etc. (though the old fields are still available).
  • When a job is partitioned using an hourly, daily, weekly, or monthly partitions definition, OpExecutionContext now offers a partition_time_window attribute, which returns a tuple of datetime objects that mark the bounds of the partition’s time window.
  • AssetsDefinition.from_graph now accepts a partitions_def argument.
  • [dagster-k8s] Removed an unnecessary dagster-test-connection pod from the Dagster Helm chart.
  • [dagster-k8s] The k8s_job_executor now polls the event log on a ~1 second interval (previously 0.1). Performance testing showed that this reduced DB load while not significantly impacting run time.
  • [dagit] Removed package pins for Jinja2 and nbconvert.
  • [dagit] When viewing a list of Runs, tags with information about schedules, sensors, and backfills are now more visually prominent and are sorted to the front of the list.
  • [dagit] The log view on Run pages now includes a button to clear the filter input.
  • [dagit] When viewing a list of Runs, you can now hover over a tag to see a menu with an option to copy the tag, and in filtered Run views, an option to add the tag to the filter.
  • [dagit] Configuration editors throughout Dagit now display clear indentation guides, and our previous whitespace indicators have been removed.
  • [dagit] The Dagit Content-Security-Policy has been moved from a <meta> tag to a response header, and several more security and privacy related headers have been added as well.
  • [dagit] Assets with multi-component key paths are always shown as foo/bar in dagit, rather than appearing as foo > bar in some contexts.
  • [dagit] The Asset graph now includes a “Reload definitions” button which reloads your repositories.
  • [dagit] On all DAGs, you can hold shift on the keyboard to switch from mouse wheel / touch pad zooming to panning. This makes it much easier to scroll horizontally at high speed without click-drag-click-drag-click-drag.
  • [dagit] A --log-level flag is now available in the dagit CLI for controlling the uvicorn log level.
  • [dagster-dbt] The load_assets_from_dbt_project() and load_assets_from_dbt_manifest() utilities now have a use_build_command parameter. If this flag is set, when materializing your dbt assets, Dagster will use the dbt build command instead of dbt run. Any tests run during this process will be represented with AssetObservation events attached to the relevant assets. For more information on dbt build, see the dbt docs.
  • [dagster-dbt] If a dbt project successfully runs some models and then fails, AssetMaterializations will now be generated for the successful models.
  • [dagster-snowflake] The new Snowflake IO manager, which you can create using build_snowflake_io_manager, offers a way to store assets and op outputs in Snowflake. The PandasSnowflakeTypeHandler stores Pandas DataFrames in Snowflake.
  • [helm] dagit.logLevel has been added to values.yaml to access the newly added dagit --log-level cli option.
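
As an illustration of what a partition time window looks like for a daily partition, the sketch below derives the two bounding datetimes from a partition key. It is a standalone approximation, not Dagster's implementation: the helper name, the "%Y-%m-%d" key format, and the half-open [start, end) convention are assumptions.

```python
from datetime import datetime, timedelta
from typing import Tuple


def daily_partition_time_window(partition_key: str) -> Tuple[datetime, datetime]:
    """Return (start, end) for a daily partition key such as "2022-05-26".

    Assumes keys are formatted as %Y-%m-%d and that the window is
    half-open: it includes `start` and excludes `end`.
    """
    start = datetime.strptime(partition_key, "%Y-%m-%d")
    return start, start + timedelta(days=1)


start, end = daily_partition_time_window("2022-05-26")
print(start.isoformat(), end.isoformat())
```

Inside an op, bounds like these are typically used to filter source data to the rows that belong to the partition being processed.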

Bugfixes

  • Fixed incorrect text in the error message that’s triggered when building a job and an asset can’t be found that corresponds to one of the asset dependencies.
  • An error is no longer raised when an op/job/graph/other definition has an empty docstring.
  • Fixed a bug where pipelines could not be executed if toposort<=1.6 was installed.
  • [dagit] Fixed an issue in global search where rendering and navigation broke when results included objects of different types but with identical names.
  • [dagit] Server errors regarding websocket send after close no longer occur.
  • [dagit] Fixed an issue where software-defined assets could be rendered improperly when the dagster and dagit versions were out of sync.

Community Contributions

  • [dagster-aws] PickledObjectS3IOManager now uses list_objects to check the access permission. Thanks @trevenrawr!

Breaking Changes

  • [dagster-dbt] The asset definitions produced by the experimental load_assets_from_dbt_project and load_assets_from_dbt_manifest functions now include the schemas of the dbt models in their asset keys. To revert to the old behavior: dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"])).

Experimental

  • The TableSchema API is no longer experimental.

Documentation

  • Docs site now has a new design!
  • Concepts pages now have links to code snippets in our examples that use those concepts.

dagster - 0.14.14

Published by prha over 2 years ago

New

  • When viewing a config schema in the Dagit launchpad, default values are now shown. Hover over an underlined key in the schema view to see the default value for that key.

  • dagster, dagit, and all extension libraries (dagster-*) now contain py.typed files. This exposes them as typed libraries to static type checking tools like mypy. If your project is using mypy or another type checker, this may surface new type errors. For mypy, to restore the previous state and treat dagster or an extension library as untyped (i.e. ignore Dagster’s type annotations), add the following to your configuration file:

    # For an extension library, use its module name instead, e.g. [mypy-dagster_dbt]
    [mypy-dagster]
    follow_imports = skip
    
  • Op retries now surface the underlying exception in Dagit.

  • Made some internal changes to how we store schema migrations across our different storage implementations.

  • build_output_context now accepts an asset_key argument.

  • The key argument to the SourceAsset constructor now accepts values that are strings or sequences of strings and coerces them to AssetKeys.

  • You can now use the + operator to add two AssetGroups together, which forms an AssetGroup that contains a union of the assets in the operands.

  • AssetGroup.from_package_module, from_modules, from_package_name, and from_current_module now accept an extra_source_assets argument that includes a set of source assets into the group in addition to the source assets scraped from modules.

  • AssetsDefinition and AssetGroup now both expose a to_source_assets method that returns SourceAsset versions of their assets, which can be used as source assets for downstream AssetGroups.

  • Repositories can now include multiple AssetGroups.

  • The new prefixed method on AssetGroup returns a new AssetGroup where a given prefix is prepended to the asset key of every asset in the group.

  • Dagster now has a BoolMetadataValue representing boolean-type metadata. True or False values specified in metadata will automatically be cast to the boolean type.

  • Tags on schedules can now be expressed as nested JSON dictionaries, instead of requiring that all tag values are strings.

  • If an exception is raised during an op, Dagster will now always run the failure hooks for that op. Before, certain system exceptions would prevent failure hooks from being run.

  • mapping_key can now be provided as an argument to build_op_context/build_solid_context. Doing so will allow the use of OpExecutionContext.get_mapping_key().
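
One detail worth knowing when raw Python values are coerced into typed metadata, as with the boolean metadata type above: bool is a subclass of int in Python, so a type dispatch has to test for bool first. The sketch below shows this with hypothetical BoolValue, IntValue, and TextValue classes; it is not Dagster's code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BoolValue:
    value: bool


@dataclass(frozen=True)
class IntValue:
    value: int


@dataclass(frozen=True)
class TextValue:
    value: str


def coerce_metadata_value(raw: object) -> Union[BoolValue, IntValue, TextValue]:
    """Wrap a raw Python value in a typed metadata value."""
    # isinstance(True, int) is True, so the bool check must come before
    # the int check or booleans would be stored as integers.
    if isinstance(raw, bool):
        return BoolValue(raw)
    if isinstance(raw, int):
        return IntValue(raw)
    if isinstance(raw, str):
        return TextValue(raw)
    raise TypeError(f"unsupported metadata value: {raw!r}")


print(coerce_metadata_value(True), coerce_metadata_value(1))
```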

Bugfixes

  • [dagit] Previously, when viewing a list of an asset’s materializations from a specified date/time, a banner would always indicate that it was a historical view. This banner is no longer shown when viewing the most recent materialization.
  • [dagit] Special cron strings like @daily were treated as invalid when converting to human-readable strings. These are now handled correctly.
  • The selection argument to AssetGroup.build_job now uses > instead of . for delimiting the components within asset keys, which is consistent with how selection works in Dagit.
  • [postgres] Passwords and usernames are now correctly URL-quoted when forming a connection string. Previously, spaces were replaced with +.
  • Fixed an issue where the celery_docker_executor would sometimes fail to execute with a JSON deserialization error when using Dagster resources that write to stdout.
  • [dagster-k8s] Fixed an issue where the Helm chart failed to work when the user code deployment subchart was used in a different namespace than the main dagster Helm chart, due to missing configmaps.
  • [dagster-airbyte] When a Dagster run is terminated while executing an Airbyte sync operation, the corresponding Airbyte sync will also be terminated.
  • [dagster-dbt] Log output from dbt cli commands will no longer have distracting color-formatting characters.
  • [dagit] Fixed issue where multi_assets would not show correct asset dependency information.
  • Fixed an issue with the sensor daemon, where the sensor would sometimes enter a race condition and overwrite the sensor status.
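
The [postgres] quoting fix above comes down to the difference between percent-encoding and form-encoding. The following sketch uses Python's urllib.parse to show both; the postgres_url helper and the sample credentials are hypothetical, and the code is not taken from dagster-postgres.

```python
from urllib.parse import quote, quote_plus


def postgres_url(username: str, password: str, host: str, db_name: str) -> str:
    """Build a connection URL with percent-encoded credentials."""
    # safe="" also encodes "/" and "@", which would otherwise be read as
    # URL structure rather than as part of the credential.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}/{db_name}"


url = postgres_url("data team", "p@ss word", "localhost:5432", "dagster")
form_encoded = quote_plus("p@ss word")  # form encoding turns the space into "+"
print(url)
print(form_encoded)
```

In the credentials part of a URL a "+" is a literal plus sign, so a form-encoded password that contained a space no longer matches the real password.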

Community Contributions

  • [dagster-graphql] The Python DagsterGraphQLClient now supports terminating in-progress runs using client.terminate_run(run_id). Thanks @Javier162380!

Experimental

  • Added an experimental view of the Partitions page / Backfill page, gated behind a feature flag in Dagit.

dagster - 0.14.13

Published by jmsanders over 2 years ago

New

  • [dagster-k8s] You can now specify resource requests and limits to the K8sRunLauncher when using the Dagster helm chart, that will apply to all runs. Before, you could only set resource configuration by tagging individual jobs. For example, you can set this config in your values.yaml file:
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi
  • [dagster-k8s] Specifying includeConfigInLaunchedRuns: true in a user code deployment will now launch runs using the same namespace and service account as the user code deployment.
  • The @asset decorator now accepts an op_tags argument, which allows e.g. providing k8s resource requirements on the op that computes the asset.
  • Added CLI output to dagster api grpc-health-check (previously it just returned via exit codes)
  • [dagster-aws] The emr_pyspark_step_launcher now supports dynamic orchestration, RetryPolicys defined on ops, and re-execution from failure. For failed steps, the stack trace of the root error will now be available in the event logs, as will logs generated with context.log.info.
  • Partition sets can now return a nested dictionary from the tags_fn_for_partition function, instead of requiring that the dictionary have string keys and values.
  • [dagit] It is now possible to perform bulk re-execution of runs from the Runs page. Failed runs can be re-executed from failure.
  • [dagit] Table headers are now sticky on Runs and Assets lists.
  • [dagit] Keyboard shortcuts may now be disabled from User Settings. This allows users with certain keyboard layouts (e.g. QWERTZ) to avoid inadvertently triggering unwanted shortcuts.
  • [dagit] Dagit no longer continues making some queries in the background, improving performance when many browser tabs are open.
  • [dagit] On the asset graph, you can now filter for multi-component asset keys in the search bar and see the “kind” tags displayed on assets with a specified compute_kind.
  • [dagit] Repositories are now displayed in a stable order each time you launch Dagster.
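
Nested tag dictionaries, as allowed for partition sets above, have to be flattened somewhere if tags are ultimately stored as strings. A minimal sketch of one way to do that follows, assuming non-string values are JSON-encoded; normalize_tags is a hypothetical helper and the encoding choice is an assumption, not a description of Dagster's storage format.

```python
import json
from typing import Any, Dict


def normalize_tags(tags: Dict[str, Any]) -> Dict[str, str]:
    """Keep string tag values as-is and JSON-encode everything else."""
    return {
        key: value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        for key, value in tags.items()
    }


tags = normalize_tags(
    {
        "team": "data",
        "resources": {"limits": {"cpu": "100m", "memory": "128Mi"}},
    }
)
print(tags)
```

Sorting keys keeps the encoded string stable across runs, which matters if tags are later compared or used for filtering.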

Bugfixes

  • [dagster-k8s] Fixed an issue where the Dagster helm chart sometimes failed to parse container images with numeric tags. Thanks @jrouly!
  • [dagster-aws] The EcsRunLauncher now registers new task definitions if the task’s execution role or task role changes.
  • Dagster now correctly includes setuptools as a runtime dependency.
  • In can now accept asset_partitions without crashing.
  • [dagit] Fixed a bug in the Launchpad, where default configuration failed to load.
  • [dagit] Global search now truncates the displayed list of results, which should improve rendering performance.
  • [dagit] When entering an invalid search filter on Runs, the user will now see an appropriate error message instead of a spinner and an alert about a GraphQL error.

Documentation

  • Added documentation for partitioned assets
  • [dagster-aws] Fixed example code of a job using secretsmanager_resource.

dagster - 0.14.12

Published by gibsondan over 2 years ago

Bugfixes

  • Fixed an issue where the Launchpad in Dagit sometimes incorrectly launched in an empty state.

dagster - 0.14.11

Published by gibsondan over 2 years ago

Bugfixes

  • Fixed an issue where schedules created from partition sets that launched runs for multiple partitions in a single schedule tick would sometimes time out while generating runs in the scheduler.
  • Fixed an issue where nested graphs would sometimes incorrectly determine the set of required resources for a hook.

dagster - 0.14.10

Published by Ramshackle-Jamathon over 2 years ago

New

  • [dagster-k8s] Added an includeConfigInLaunchedRuns flag to the Helm chart that can be used to automatically include configmaps, secrets, and volumes in any runs launched from code in a user code deployment. See https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#configure-your-user-deployment for more information.
  • [dagit] Improved display of configuration yaml throughout Dagit, including better syntax highlighting and the addition of line numbers.
  • The GraphQL input argument type BackfillParams (used for launching backfills), now has an allPartitions boolean flag, which can be used instead of specifying all the individual partition names.
  • Removed gevent and gevent-websocket dependencies from dagster-graphql
  • Memoization is now supported while using step selection
  • Cleaned up various warnings across the project
  • The default IO Managers now support asset partitions

Bugfixes

  • Fixed sqlite3.OperationalError error when viewing schedules/sensors pages in Dagit. This was affecting dagit instances using the default SQLite schedule storage with a SQLite version < 3.25.0.
  • Fixed a bug with type-checking of the asset_partitions arg passed to In
  • Fixed an issue where schedules and sensors would sometimes fail to run when the daemon and dagit were running in different Python environments.
  • Fixed an exception when the telemetry file is empty
  • Fixed a bug with @graph composition which would cause the wrong input definition to be used for type checks
  • [dagit] For users running Dagit with --path-prefix, large DAGs failed to render due to a WebWorker error, and the user would see an endless spinner instead. This has been fixed.
  • [dagit] Fixed a rendering bug in partition set selector dropdown on Launchpad.
  • [dagit] Fixed the ‘View Assets’ link in Job headers
  • Fixed an issue where root input managers with resource dependencies would not work with software defined assets
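
Since the sqlite3.OperationalError fix above depended on the SQLite version, it can be useful to check which SQLite library your Python interpreter is linked against. The sketch below uses only the standard library; the sqlite_is_at_least helper is hypothetical, and the 3.25.0 threshold is simply the version named in the note.

```python
import sqlite3
from typing import Sequence, Tuple

# Version threshold taken from the release note above.
MIN_SQLITE_VERSION: Tuple[int, int, int] = (3, 25, 0)


def sqlite_is_at_least(
    minimum: Sequence[int] = MIN_SQLITE_VERSION,
    current: Sequence[int] = sqlite3.sqlite_version_info,
) -> bool:
    """Return True if the linked SQLite library is at least `minimum`."""
    # Tuples compare element by element, so (3, 9, 0) < (3, 25, 0),
    # unlike a plain string comparison of "3.9.0" and "3.25.0".
    return tuple(current) >= tuple(minimum)


print(sqlite3.sqlite_version, sqlite_is_at_least())
```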

Community Contributions

  • dagster-census is a new library that includes a census_resource for interacting with the Census REST API, census_trigger_sync_op for triggering a sync and registering an asset once it has finished, and a CensusOutput type. Thanks @dehume!
  • Docs fix. Thanks @ascrookes!

dagster - 0.14.9

Published by gibsondan over 2 years ago

New

  • Added a parameter in dagster.yaml that can be used to increase the time that Dagster waits when spinning up a gRPC server before timing out. For more information, see https://docs.dagster.io/deployment/dagster-instance#code-servers.
  • Added a new graphQL field assetMaterializations that can be queried off of a DagsterRun field. You can use this field to fetch the set of asset materialization events generated in a given run within a GraphQL query.
  • Docstrings on functions decorated with the @resource decorator will now be used as resource descriptions, if no description is explicitly provided.
  • You can now point dagit -m or dagit -f at a module or file that has asset definitions but no jobs or asset groups, and all the asset definitions will be loaded into Dagit.
  • AssetGroup now has a materialize method which executes an in-process run to materialize all the assets in the group.
  • AssetGroups can now contain assets with different partition_defs.
  • Asset materializations produced by the default asset IO manager, fs_asset_io_manager, now include the path of the file where the values were saved.
  • You can now disable the max_concurrent_runs limit on the QueuedRunCoordinator by setting it to -1. Use this if you only want to limit runs using tag_concurrency_limits.
  • [dagit] Asset graphs are now rendered asynchronously, which means that Dagit will no longer freeze when rendering a large asset graph.
  • [dagit] When viewing an asset graph, you can now double-click on an asset to zoom in, and you can use arrow keys to navigate between selected assets.
  • [dagit] The “show whitespace” setting in the Launchpad is now persistent.
  • [dagit] A bulk selection checkbox has been added to the repository filter in navigation or Instance Overview.
  • [dagit] A “Copy config” button has been added to the run configuration dialog on Run pages.
  • [dagit] An “Open in Launchpad” button has been added to the run details page.
  • [dagit] The Run page now surfaces more information about start time and elapsed time in the header.
  • [dagster-dbt] The dbt_cloud_resource has a new get_runs() function to get a list of runs matching certain parameters from the dbt Cloud API (thanks @kstennettlull!)
  • [dagster-snowflake] Added an authenticator field to the connection arguments for the snowflake_resource (thanks @swotai!).
  • [celery-docker] The celery docker executor has a new configuration entry container_kwargs that allows you to specify additional arguments to pass to your docker containers when they are run.

Bugfixes

  • Fixed an issue where loading a Dagster repository would fail if it included a function to lazily load a job, instead of a JobDefinition.
  • Fixed an issue where trying to stop an unloadable schedule or sensor within Dagit would fail with an error.
  • Fixed telemetry contention bug on windows when running the daemon.
  • [dagit] Fixed a bug where the Dagit homepage would claim that no jobs or pipelines had been loaded, even though jobs appeared in the sidebar.
  • [dagit] When filtering runs by tag, tag values that contained the : character would fail to parse correctly, and filtering would therefore fail. This has been fixed.
  • [dagster-dbt] When running the “build” command using the dbt_cli_resource, the run_results.json file will no longer be ignored, allowing asset materializations to be produced from the resulting output.
  • [dagster-airbyte] Responses from the Airbyte API with a 204 status code (like you would get from /connections/delete) will no longer raise an error (thanks @HAMZA310!)
  • [dagster-shell] Fixed a bug where shell ops would not inherit environment variables if any environment variables were added for ops (thanks @kbd!)
  • [dagster-postgres] Usernames are now URL-quoted in addition to passwords.
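
The tag-filter bug above is an instance of a general parsing pitfall: splitting on every occurrence of a delimiter breaks values that contain it. The sketch below shows the safer approach of splitting only on the first delimiter. It is a Python illustration, not Dagit's parser, and the tag:key=value token format and the parse_tag_filter name are assumptions.

```python
from typing import Tuple


def parse_tag_filter(token: str) -> Tuple[str, str]:
    """Parse a token shaped like "tag:key=value" into (key, value)."""
    # partition() splits on the FIRST delimiter only, so any later ":"
    # or "=" characters stay inside the value.
    field, colon, rest = token.partition(":")
    if not colon or field != "tag":
        raise ValueError(f"not a tag filter: {token!r}")
    key, equals, value = rest.partition("=")
    if not equals:
        raise ValueError(f"tag filter is missing '=': {token!r}")
    return key, value


token = "tag:source_url=https://example.com:8080"
naive_parts = token.split(":")  # splits inside the value as well
parsed = parse_tag_filter(token)
print(naive_parts)
print(parsed)
```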

Documentation

dagster - 0.14.8

Published by clairelin135 over 2 years ago

New

  • The MySQL implementations of Dagster storage are no longer marked as experimental.
  • run_id can now be provided as an argument to execute_in_process.
  • The text on dagit’s empty state no longer mentions the legacy concept “Pipelines”.
  • Now, within the IOManager.load_input method, you can add input metadata via InputContext.add_input_metadata. These metadata entries will appear on the LOADED_INPUT event and if the input is an asset, be attached to an AssetObservation. This metadata is viewable in dagit.

Bugfixes

  • Fixed a set of bugs where schedules and sensors would get out of sync between dagit and dagster-daemon processes. This would manifest in schedules / sensors getting marked as “Unloadable” in dagit, and ticks not being registered correctly. The fix involves changing how Dagster stores schedule/sensor state and requires a schema change using the CLI command dagster instance migrate. Users who are not running into this class of bugs may consider the migration optional.
  • root_input_manager can now be specified without a context argument.
  • Fixed a bug that prevented root_input_manager from being used with VersionStrategy.
  • Fixed a race condition between daemon and dagit writing to the same telemetry logs.
  • [dagit] In dagit, using the “Open in Launchpad” feature for a run could cause server errors if the run configuration yaml was too long. Runs can now be opened from this feature regardless of config length.
  • [dagit] On the Instance Overview page in dagit, runs in the timeline view sometimes showed incorrect end times, especially batches that included in-progress runs. This has been fixed.
  • [dagit] In the dagit launchpad, reloading a repository should present the user with an option to refresh config that may have become stale. This feature was broken for jobs without partition sets, and has now been fixed.
  • Fixed issue where passing a stdlib typing type as dagster_type to input and output definition was incorrectly being rejected.
  • [dagster-airbyte] Fixed issue where AssetMaterialization events would not be generated for streams that had no updated records for a given sync.
  • [dagster-dbt] Fixed issue where including multiple sets of dbt assets in a single repository could cause a conflict with the names of the underlying ops.

dagster - 0.14.7

Published by jfineberg over 2 years ago

New

  • [helm] Added configuration to explicitly enable or disable telemetry.
  • Added a new IO manager for materializing assets to Azure ADLS. You can specify this IO manager for your AssetGroups by using the following config:
from dagster import AssetGroup
from dagster_azure.adls2 import adls2_pickle_asset_io_manager, adls2_resource
asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": adls2_pickle_asset_io_manager, "adls2": adls2_resource}
)
  • Added ability to set a custom start time for partitions when using @hourly_partitioned_config , @daily_partitioned_config, @weekly_partitioned_config, and @monthly_partitioned_config
  • Run configs generated from partitions can be retrieved using the PartitionedConfig.get_run_config_for_partition_key function. This will allow the use of the validate_run_config function in unit tests.
  • [dagit] If a run is re-executed from failure, and the run fails again, the default action will be to re-execute from the point of failure, rather than to re-execute the entire job.
  • PartitionedConfig now takes an argument tags_for_partition_fn which allows for custom run tags for a given partition.

Bugfixes

  • Fixed a bug in the message for reporting Kubernetes run worker failures
  • [dagit] Fixed issue where re-executing a run that materialized a single asset could end up re-executing all steps in the job.
  • [dagit] Fixed issue where the health of an asset’s partitions would not always be up to date in certain views.
  • [dagit] Fixed issue where the “Materialize All” button would be greyed out if a job had SourceAssets defined.

Documentation

  • Updated resource docs to reference “ops” instead of “solids” (thanks @joe-hdai!)
  • Fixed formatting issues in the ECS docs

dagster - 0.14.6

Published by prha over 2 years ago

New

  • Added IO manager for materializing assets to GCS. You can specify the GCS asset IO manager by using the following config for resource_defs in AssetGroup:
from dagster import AssetGroup
from dagster_gcp.gcs import gcs_pickle_asset_io_manager, gcs_resource
asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": gcs_pickle_asset_io_manager, "gcs": gcs_resource}
)
  • Improved the performance of storage queries run by the sensor daemon to enforce the idempotency of run keys. This should reduce the database CPU when evaluating sensors with a large volume of run requests with run keys that repeat across evaluations.
  • [dagit] Added information on sensor ticks to show when a sensor has requested runs that did not result in the creation of a new run due to the enforcement of idempotency using run keys.
  • [k8s] Run and step workers are now labeled with the Dagster run id that they are currently handling.
  • If a step launched with a StepLauncher encounters an exception, that exception / stack trace will now appear in the event log.
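
The run-key idempotency mentioned above can be pictured as a simple filter over requested runs. The sketch below is a hypothetical in-memory version: select_new_runs is not a Dagster function, the real check runs against instance storage, and treating a missing run key as "always launch" is an assumption.

```python
from typing import Iterable, List, Optional, Set, Tuple


def select_new_runs(
    requested_run_keys: Iterable[Optional[str]], seen_run_keys: Set[str]
) -> Tuple[List[Optional[str]], List[str]]:
    """Split requested runs into (launched, skipped) by run key.

    A request whose key was already seen is skipped. A request without a
    key (None) is always launched, since there is nothing to deduplicate on.
    `seen_run_keys` is updated in place.
    """
    launched: List[Optional[str]] = []
    skipped: List[str] = []
    for run_key in requested_run_keys:
        if run_key is not None and run_key in seen_run_keys:
            skipped.append(run_key)
            continue
        if run_key is not None:
            seen_run_keys.add(run_key)
        launched.append(run_key)
    return launched, skipped


seen = {"file_a"}
launched, skipped = select_new_runs(["file_a", "file_b", "file_b", None], seen)
print(launched, skipped)
```

The skipped list corresponds to the requests that Dagit now reports on sensor ticks as not having produced a new run.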

Bugfixes

  • Fixed a race condition where canceled backfills would resume under certain conditions.
  • Fixed an issue where exceptions that were raised during sensor and schedule execution didn’t always show a stack trace in Dagit.
  • During execution, dependencies will now resolve correctly for certain dynamic graph structures that were previously resolving incorrectly.
  • When using the forkserver start_method on the multiprocess executor, preload_modules have been adjusted to prevent libraries that change namedtuple serialization from causing unexpected exceptions.
  • Fixed a naming collision between dagster decorators and submodules that sometimes interfered with static type checkers (e.g. pyright).
  • [dagit] Improved Postgres database connection management when watching actively executing runs.
  • [dagster-databricks] The databricks_pyspark_step_launcher now supports steps with RetryPolicies defined, as well as RetryRequested exceptions.

Community Contributions

  • Docs spelling fixes - thanks @antquinonez!
dagster - 0.14.5

Published by jmsanders over 2 years ago

Bugfixes

  • [dagit] Fixed issue where sensors could not be turned on/off in dagit.
  • Fixed a bug with direct op invocation when used with funcsigs.partial that would cause incorrect InvalidInvocationErrors to be thrown.
  • Internal code no longer triggers deprecation warnings for all runs.
dagster - 0.14.4

Published by jmsanders over 2 years ago

New

  • Dagster now supports non-standard vixie-style cron strings, like @hourly, @daily, @weekly, and @monthly in addition to the standard 5-field cron strings (e.g. * * * * *).
  • The MetadataEntry constructor now accepts a value argument, which is an alias for the deprecated entry_data argument.
  • Typed metadata can now be attached to SourceAssets and is rendered in dagit.
  • When a step fails to upload its compute log to Dagster, it will now add an event to the event log with the stack trace of the error instead of only logging the error to the process output.
  • [dagit] Made a number of improvements to the Schedule/Sensor pages in Dagit, including showing a paginated table of tick information, showing historical cursor state, and adding the ability to set a cursor from Dagit. Previously, we only showed tick information on the timeline view and cursors could only be set using the dagster CLI.
  • [dagit] When materializing assets, Dagit presents a link to the run rather than jumping to it, and the status of the materialization (pending, running, failed) is shown on nodes in the asset graph.
  • [dagit] Dagit now shows sensor and schedule information at the top of asset pages based on the jobs in which the asset appears.
  • [dagit] Dagit now performs "middle truncation" on gantt chart steps and graph nodes, making it much easier to differentiate long assets and ops.
  • [dagit] Dagit no longer refreshes data when tabs are in the background, lowering browser CPU usage.
  • dagster-k8s, dagster-celery-k8s, and dagster-docker now name step workers dagster-step-... rather than dagster-job-....
  • [dagit] The launchpad is significantly more responsive when you're working with very large partition sets.
  • [dagit] We now show an informative message on the Asset catalog table when there are no matching assets to display. Previously, we would show a blank white space.
  • [dagit] Running Dagit without a backfill daemon no longer generates a warning unless queued backfills are present. Similarly, a missing sensor or schedule daemon only yields a warning if sensors or schedules are turned on.
  • [dagit] On the instance summary page, hovering over a recent run’s status dot shows a more helpful tooltip.
  • [dagster-k8s] Improved performance of the k8s_job_executor for runs with many user logs
  • [dagster-k8s] When using the dagster-k8s/config tag to configure Dagster Kubernetes pods, the tags can now accept any valid Kubernetes config, and can be written in either snake case (node_selector_terms) or camel case (nodeSelectorTerms). See the docs for more information.
  • [dagster-aws] You can now set secrets on the EcsRunLauncher using the same syntax that you use to set secrets in the ECS API.
  • [dagster-aws] The EcsRunLauncher now attempts to reuse task definitions instead of registering a new task definition for every run.
  • [dagster-aws] The EcsRunLauncher now raises the underlying ECS API failure if it cannot successfully start a task.
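
As a rough illustration of the cron shortcuts mentioned in the first item of this list, the sketch below maps each of the four named aliases to its conventional 5-field equivalent. It is not Dagster's parser: expand_cron and CRON_ALIASES are hypothetical names, and only the aliases named in the note are covered.

```python
# Conventional 5-field equivalents of the four shortcuts named in the note.
# Other Vixie-style aliases exist but are deliberately left out of this sketch.
CRON_ALIASES = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
}


def expand_cron(cron_schedule: str) -> str:
    """Return the 5-field form of a cron string, expanding known aliases."""
    stripped = cron_schedule.strip()
    expanded = CRON_ALIASES.get(stripped, stripped)
    if len(expanded.split()) != 5:
        raise ValueError(f"not a 5-field cron string or known alias: {cron_schedule!r}")
    return expanded


print(expand_cron("@daily"))     # 0 0 * * *
print(expand_cron("* * * * *"))  # * * * * *
```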

Software-Defined Assets

  • When loading assets from modules using AssetGroup.from_package_name and similar methods, lists of assets at module scope are now loaded.
  • Added the static methods AssetGroup.from_modules and AssetGroup.from_current_module, which automatically load assets at module scope from particular modules.
  • Software-defined assets jobs can now load partitioned assets that are defined outside the job.
  • AssetGroup.from_modules now correctly raises an error if multiple assets with the same key are detected.
  • The InputContext object provided to IOManager.load_input previously did not include resource config. Now it does.
  • Previously, if an assets job had a partitioned asset as well as a non-partitioned asset that depended on another non-partitioned asset, it would fail to run. Now it runs without issue.
  • [dagit] The asset "View Upstream Graph" links no longer select the current asset, making it easier to click "Materialize All".
  • [dagit] The asset page's "partition health bar" highlights missing partitions better in large partition sets.
  • [dagit] The asset "Materialize Partitions" modal now presents an error when partition config or tags cannot be generated.
  • [dagit] The right sidebar of the global asset graph no longer defaults to 0% wide in fresh / incognito browser windows, which made it difficult to click nodes in the global graph.
  • [dagit] In the asset catalog, the search bar now matches substrings so it's easier to find assets with long path prefixes.
  • [dagit] Dagit no longer displays duplicate downstream dependencies on the Asset Details page in some scenarios.
  • [dagster-fivetran] Assets created using build_fivetran_assets will now be properly tagged with a fivetran pill in Dagit.
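
The duplicate-key check mentioned in this list can be sketched in a few lines. This is an illustration of the idea only, not the code Dagster runs: asset keys are modeled as plain tuples of strings, and find_duplicate_keys and check_unique are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stand-in for an asset key: a tuple of path components.
AssetPath = Tuple[str, ...]


def find_duplicate_keys(keys: Iterable[AssetPath]) -> List[AssetPath]:
    """Return, in sorted order, every key that appears more than once."""
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


def check_unique(keys: Iterable[AssetPath]) -> None:
    """Raise if any key is defined more than once across the loaded modules."""
    duplicates = find_duplicate_keys(keys)
    if duplicates:
        raise ValueError(f"duplicate asset keys: {duplicates}")


loaded = [("raw", "users"), ("raw", "orders"), ("raw", "users")]
print(find_duplicate_keys(loaded))  # [('raw', 'users')]
```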

Bugfixes

  • Fixed issue causing step launchers to fail in many scenarios involving re-execution or dynamic execution.
  • Previously, incorrect selections (generally, step selections) could be generated for strings of the form ++item. This has been fixed.
  • Fixed an issue where run status sensors sometimes logged the wrong status to the event log if the run moved into a different status while the sensor was running.
  • Fixed an issue where daily schedules sometimes produced an incorrect partition name on spring Daylight Saving Time boundaries.
  • [dagit] Certain workspace or repo-scoped pages required SQLAlchemy 1.4 or greater to be installed. They now use queries supported by SQLAlchemy>=1.3. Previously, these pages raised an error including the message: 'Select' object has no attribute 'filter'.
  • [dagit] Certain workspace or repo-scoped pages required sqlite 3.25.0 or greater to be installed. This requirement has been relaxed to support older versions of sqlite. The issue was marked as fixed in our 0.14.0 notes, but a handful of cases that were still broken have now been fixed. Previously, these pages raised an error (sqlite3.OperationalError).
  • [dagit] When changing presets / partitions in the launchpad, Dagit preserves user-entered tags and replaces only the tags inherited from the previous base.
  • [dagit] Dagit no longer hangs when rendering the run gantt chart for certain graph structures.
  • [dagster-airbyte] Fixed issues that could cause failures when generating asset materializations from an Airbyte API response.
  • [dagster-aws] 0.14.3 removed the ability for the EcsRunLauncher to use sidecars without you providing your own custom task definition. Now, you can continue to inherit sidecars from the launching task’s task definition by setting include_sidecars: True in your run launcher config.

Breaking Changes

  • dagster-snowflake has dropped support for python 3.6. The library it is currently built on, snowflake-connector-python, dropped 3.6 support in their recent 2.7.5 release.

Community Contributions

  • MetadataValue.path() and PathMetadataValue now accept os.PathLike objects in addition to strings. Thanks @abkfenris!
  • [dagster-k8s] Fixed configuration of env_vars on the k8s_job_executor. Thanks @kervel!
  • Typo fix on the Create a New Project page. Thanks @frcode!
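
For readers unfamiliar with os.PathLike, the sketch below shows the general pattern for accepting both strings and path objects, as in the first contribution above. It does not reproduce Dagster's implementation; normalize_path is a hypothetical helper built on the standard library's os.fspath.

```python
import os
from pathlib import PurePosixPath
from typing import Union


def normalize_path(path: Union[str, "os.PathLike[str]"]) -> str:
    """Hypothetical helper: accept a str or an os.PathLike and return a plain str."""
    # os.fspath returns str inputs unchanged and calls __fspath__ on path objects.
    return os.fspath(path)


print(normalize_path("/tmp/report.csv"))                 # /tmp/report.csv
print(normalize_path(PurePosixPath("/tmp/report.csv")))  # /tmp/report.csv
```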

Documentation

  • Concepts sections added for Op Retries and Dynamic Graphs
  • The Hacker News Assets demo now uses AssetGroup instead of build_assets_job, and it can now be run entirely from a local machine with no additional infrastructure (storing data inside DuckDB).
  • The Software-Defined Assets guide in the docs now uses AssetGroup instead of build_assets_job.
dagster - 0.14.3

Published by gibsondan over 2 years ago

New

  • When using an executor that runs each op in its own process, exceptions in the Dagster system code that result in the op process failing will now be surfaced in the event log.
  • Introduced new SecretsManager resources to the dagster-aws package to enable loading secrets into Jobs more easily. For more information, see the documentation.
  • Daemon heartbeats are now processed in a batch request to the database.
  • Job definitions now contain a method called run_request_for_partition, which returns a RunRequest that can be returned in a sensor or schedule evaluation function to launch a run for a particular partition for that job. See our documentation for more information.
  • Renamed the filter class from PipelineRunsFilter => RunsFilter.
  • Assets can now be directly invoked for unit testing.
  • [dagster-dbt] load_assets_from_dbt_project will now attach schema information to the generated assets if it is available in the dbt project (schema.yml).
  • [examples] Added an example that demonstrates using Software Defined Assets with Airbyte, dbt, and custom Python.
  • The default io manager used in the AssetGroup api is now the fs_asset_io_manager.
  • It's now possible to build a job where partitioned assets depend on partitioned assets that are maintained outside the job, and for those upstream partitions to show up on the context in the op and IOManager load_input function.
  • SourceAssets can now be partitioned, by setting the partitions_def argument.

Bugfixes

  • Fixed an issue where run status sensors would sometimes fire multiple times for the same run if the sensor function raised an error.
  • [ECS] Previously, setting cpu/memory tags on a job would override the ECS task’s cpu/memory, but not individual containers. If you were using a custom task definition that explicitly sets a container’s cpu/memory, the container would not resize even if you resized the task. Now, setting cpu/memory tags on a job overrides both the ECS task’s cpu/memory and the container's cpu/memory.
  • [ECS] Previously, if the EcsRunLauncher launched a run from a task with multiple containers - for example if both dagit and daemon were running in the same task - then the run would be launched with too many containers. Now, the EcsRunLauncher only launches tasks with a single container.
  • Fixed an issue where the run status of job invoked through execute_in_process was not updated properly.
  • Fixed some storage queries that were incompatible with versions of SQLAlchemy<1.4.0.
  • [dagster-dbt] Fixed issue where load_assets_from_dbt_project would fail if models were organized into subdirectories.
  • [dagster-dbt] Fixed issue where load_assets_from_dbt_project would fail if seeds or snapshots were present in the project.

Community Contributions

  • [dagster-fivetran] A new fivetran_resync_op (along with a corresponding resync_and_poll method on the fivetran_resource) allows you to kick off Fivetran resyncs using Dagster (thanks @dwallace0723!)

  • [dagster-shell] Fixed an issue where large log output could cause operations to hang (thanks @kbd!)

  • [documentation] Fixed export message with dagster home path (thanks @proteusiq)!

  • [documentation] Remove duplicate entries under integrations (thanks @kahnwong)!

UI

  • Added a small toggle to the right of each graph on the asset details page, allowing them to be toggled on and off.
  • Full asset paths are now displayed on the asset details page.

Documentation

  • Added API doc entries for validate_run_config.
  • Fixed the example code for the reexecute_pipeline API.
  • TableRecord, TableSchema and its constituents are now documented in the API docs.
  • Docs now correctly use new metadata names MetadataEntry and MetadataValue instead of old ones.
dagster - 0.14.2

Published by benpankow over 2 years ago

New

  • Run status sensors can now be invoked in unit tests. Added build_run_status_sensor_context to help build context objects for run status sensors.

Bugfixes

  • An issue preventing the use of default_value on inputs has been resolved. Previously, a defensive error that did not take default_value into account was thrown.
  • [dagster-aws] Fixed issue where re-emitting log records from the pyspark_step_launcher would occasionally cause a failure.
  • [dagit] The asset catalog now displays entries for materialized assets when only a subset of repositories were selected. Previously, it only showed the software-defined assets unless all repositories were selected in Dagit.

Community Contributions

  • Fixed an invariant check in the databricks step launcher that was causing failures when setting the local_dagster_job_package_path config option (Thanks Iswariya Manivannan!)

Documentation

  • Fixed the example code in the reconstructable API docs.
dagster - 0.14.1

Published by alangenfeld over 2 years ago

New

  • [dagit] The sensor tick timeline now shows cursor values in the tick tooltip if they exist.

Bugfixes

  • Pinned dependency on markupsafe to function with existing Jinja2 pin.
  • Sensors that have a default status can now be manually started. Previously, this would fail with an invariant exception.
dagster - 0.14.0

Published by clairelin135 over 2 years ago

“Never Felt Like This Before”

Major Changes

  • Software-defined assets, which offer a declarative approach to data orchestration on top of Dagster’s core job/op/graph APIs, have matured significantly. Improvements include partitioned assets, a revamped asset details page in Dagit, a cross-repository asset graph view in Dagit, Dagster types on assets, structured metadata on assets, and the ability to materialize ad-hoc selections of assets without defining jobs. Users can expect the APIs to only undergo minor changes before being declared fully stable in Dagster’s next major release. For more information, view the software-defined assets concepts page here.
  • We’ve made it easier to define a set of software-defined assets where each Dagster asset maps to a dbt model. All of the dependency information between the dbt models will be reflected in the Dagster asset graph, while still running your dbt project in a single step.
  • Dagit has a new homepage, dubbed the “factory floor” view, that provides an overview of recent runs of all the jobs. From it, you can monitor the status of each job’s latest run or quickly re-execute a job. The new timeline view reports the status of all recent runs in a convenient gantt chart.
  • You can now write schedules and sensors that default to running as soon as they are loaded in your workspace, without needing to be started manually in Dagit. For example, you can create a sensor like this:
    from dagster import sensor, DefaultSensorStatus
    
    @sensor(job=my_job, default_status=DefaultSensorStatus.RUNNING)
    def my_running_sensor():
        ...
    
    or a schedule like this:
    from dagster import schedule, DefaultScheduleStatus, ScheduleEvaluationContext
    
    @schedule(job=my_job, cron_schedule="0 0 * * *", default_status=DefaultScheduleStatus.RUNNING)
    def my_running_schedule(context: ScheduleEvaluationContext):
        ...
    
    As soon as schedules or sensors with the default_status field set to RUNNING are included in the workspace loaded by your Dagster Daemon, they will begin creating ticks and submitting runs.
  • Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as my_graph.my_op. This is supported in both Dagit and Python APIs.
  • Dagster Types can now have attached metadata. This allows TableSchema objects to be attached to Dagster Types via TableSchemaMetadata. A Dagster Type with a TableSchema will have the schema rendered in Dagit.
  • A new Pandera integration (dagster-pandera) allows you to use Pandera’s dataframe validation library to wrap dataframe schemas in Dagster types. This provides two main benefits: (1) Pandera’s rich schema validation can be used for runtime data validation of Pandas dataframes in Dagster ops/assets; (2) Pandera schema information is displayed in Dagit using a new TableSchema API for representing arbitrary table schemas.
  • The new AssetObservation event enables recording metadata about an asset without indicating that the asset has been updated.
  • AssetMaterializations, ExpectationResults, and AssetObservations can be logged via the context of an op using the OpExecutionContext.log_event method. Output metadata can also be logged using the OpExecutionContext.add_output_metadata method. Previously, Dagster expected these events to be yielded within the body of an op, which caused lint errors for many users, made it difficult to add mypy types to ops, and also forced usage of the verbose Output API. Here’s an example of the new invocations:
    from dagster import op, AssetMaterialization
    @op
    def the_op(context):
        context.log_event(AssetMaterialization(...))
        context.add_output_metadata({"foo": "bar"})
        ...
    
  • A new Airbyte integration (dagster-airbyte) allows you to kick off and monitor Airbyte syncs from within Dagster. The original contribution from @airbytehq’s own @marcosmarxm includes a resource implementation as well as a pre-built op for this purpose, and we’ve extended this library to support software-defined asset use cases as well. Regardless of which interface you use, Dagster will automatically capture the Airbyte log output (in the compute logs for the relevant steps) and track the created tables over time (via AssetMaterializations).
  • The EcsRunLauncher (introduced in Dagster 0.11.15) is no longer considered experimental. You can bootstrap your own Dagster deployment on ECS using our docker compose example or you can use it in conjunction with a managed Dagster Cloud deployment. Since its introduction, we’ve added the ability to customize Fargate container memory and CPU, mount secrets from AWS SecretsManager, and run with a variety of AWS networking configurations. Join us in #dagster-ecs in Slack!
  • [Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).
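
The dotted op-selection query described in this list (my_graph.my_op) can be pictured as a walk through nested graphs. The sketch below is a toy model under stated assumptions, not Dagster's selection resolver: the JOB layout and resolve_op are hypothetical, with nested dicts standing in for graphs and the string "op" marking an op.

```python
from typing import Any, Dict, List

# Hypothetical job layout: nested dicts are graphs, the string "op" marks an op.
JOB: Dict[str, Any] = {
    "load": "op",
    "my_graph": {"my_op": "op", "cleanup": "op"},
}


def resolve_op(job: Dict[str, Any], query: str) -> List[str]:
    """Resolve a dotted query such as 'my_graph.my_op' to its path of node names."""
    node: Any = job
    path: List[str] = []
    for part in query.split("."):
        # Each segment must name a child of the graph reached so far.
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"no node named {part!r} under {'.'.join(path) or 'the job'}")
        node = node[part]
        path.append(part)
    return path


print(resolve_op(JOB, "my_graph.my_op"))  # ['my_graph', 'my_op']
```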

Breaking Changes and Deprecations

  • The Dagster Daemon now uses the same workspace.yaml file as Dagit to locate your Dagster code. You should ensure that if you make any changes to your workspace.yaml file, they are included in both Dagit’s copy and the Dagster Daemon’s copy. When you make changes to the workspace.yaml file, you don’t need to restart either Dagit or the Dagster Daemon - in Dagit, you can reload the workspace from the Workspace tab, and the Dagster Daemon will periodically check the workspace.yaml file for changes every 60 seconds. If you are using the Dagster Helm chart, no changes are required to include the workspace in the Dagster Daemon.
  • Dagster’s metadata API has undergone a significant overhaul. Changes include:
    • To reflect the fact that metadata can be specified on definitions in addition to events, the following names are changing. The old names are deprecated, and will function as aliases for the new names until 0.15.0:
      • EventMetadata > MetadataValue
      • EventMetadataEntry > MetadataEntry
      • XMetadataEntryData > XMetadataValue (e.g. TextMetadataEntryData > TextMetadataValue)
    • The metadata_entries keyword argument to events and Dagster types is deprecated. Instead, users should use the metadata keyword argument, which takes a dictionary mapping string labels to MetadataValues.
    • Arbitrary metadata on In/InputDefinition and Out/OutputDefinition is deprecated. In 0.15.0, metadata passed for these classes will need to be resolvable to MetadataValue (i.e. function like metadata everywhere else in Dagster).
    • The description attribute of EventMetadataEntry is deprecated.
    • The static API of EventMetadataEntry (e.g. EventMetadataEntry.text) is deprecated. In 0.15.0, users should avoid constructing EventMetadataEntry objects directly, instead utilizing the metadata dictionary keyword argument, which maps string labels to MetadataValues.
  • In previous releases, it was possible to supply either an AssetKey, or a function that produced an AssetKey from an OutputContext as the asset_key argument to an Out/OutputDefinition. The latter behavior makes it impossible to gain information about these relationships without running a job, and has been deprecated. However, we still support supplying a static AssetKey as an argument.
  • We have renamed many of the core APIs that interact with ScheduleStorage, which keeps track of sensor/schedule state and ticks. The old term for the generic schedule/sensor “job” has been replaced by the term “instigator” in order to avoid confusion with the execution API introduced in 0.12.0. If you have implemented your own schedule storage, you may need to change your method signatures appropriately.
  • Dagit is now powered by Starlette instead of Flask. If you have implemented a custom run coordinator, you may need to make the following change:
    from flask import has_request_context, request
    def submit_run(self, context: SubmitRunContext) -> PipelineRun:
        jwt_claims_header = (
            request.headers.get("X-Amzn-Oidc-Data", None) if has_request_context() else None
        )
    
    Should be replaced by:
    def submit_run(self, context: SubmitRunContext) -> PipelineRun:
        jwt_claims_header = context.get_request_header("X-Amzn-Oidc-Data")
    
  • Dagit
    • Dagit no longer allows non-software-defined asset materializations to be graphed or grouped by partition. This feature could render in incorrect / incomplete ways because no partition space was defined for the asset.
    • Dagit’s “Jobs” sidebar now collapses by default on Instance, Job, and Asset pages. To show the left sidebar, click the “hamburger” icon in the upper left.
    • “Step Execution Time” is no longer graphed on the asset details page in Dagit, which significantly improves page load time. To view this graph, go to the asset graph for the job, uncheck “View as Asset Graph” and click the step to view its details.
    • The “experimental asset UI” feature flag has been removed from Dagit; this feature ships in 0.14.0!
  • The Dagster Daemon now requires a workspace.yaml file, much like Dagit.
  • Ellipsis (“...”) is now an invalid substring of a partition key. This is because Dagit accepts an ellipsis to specify partition ranges.
  • [Helm] The Dagster Helm chart now only supports Kubernetes clusters above version 1.18.
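
To see why an ellipsis inside a partition key would be ambiguous, consider the toy model below. The notes do not spell out the exact range syntax Dagit accepts, so the "start...end" form here is an assumption, and validate_partition_key and expand_range are hypothetical helpers rather than Dagster functions.

```python
from typing import List

RANGE_DELIMITER = "..."


def validate_partition_key(key: str) -> str:
    """Reject keys containing the range delimiter, mirroring the rule in the note."""
    if RANGE_DELIMITER in key:
        raise ValueError(f"partition key may not contain {RANGE_DELIMITER!r}: {key!r}")
    return key


def expand_range(selection: str, all_keys: List[str]) -> List[str]:
    """Expand an assumed 'start...end' selection into an inclusive slice of all_keys."""
    if RANGE_DELIMITER not in selection:
        return [selection]
    start, end = selection.split(RANGE_DELIMITER, 1)
    first, last = all_keys.index(start), all_keys.index(end)
    return all_keys[first : last + 1]


keys = ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]
print(expand_range("2022-01-02...2022-01-04", keys))
# ['2022-01-02', '2022-01-03', '2022-01-04']
```

If a key such as "a...b" were allowed, a selection of "a...b" could mean either that single key or the range from "a" to "b", which is the conflict the restriction avoids.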

New since 0.13.19

  • Software Defined Assets:
    • In Dagit, the Asset Catalog now offers a third display mode - a global graph of your software-defined assets.
    • The Asset Catalog now allows you to filter by repository to see a subset of your assets, and offers a “View in Asset Graph” button for quickly seeing software-defined assets in context.
    • The Asset page in Dagit has been split into two tabs, “Activity” and “Definition”.
    • Dagit now displays a warning on the Asset page if the most recent run including the asset’s step key failed without yielding a materialization, making it easier to jump to error logs.
    • Dagit now gives you the option to view jobs with software-defined assets as an Asset Graph (default) or as an Op Graph, and displays asset<>op relationships more prominently when a single op yields multiple assets.
    • You can now include your assets in a repository with the use of an AssetGroup. Each repository can only have one AssetGroup, and it can provide a jumping off point for creating the jobs you plan on using from your assets.
      from dagster import AssetGroup, repository, asset
      
      @asset(required_resource_keys={"foo"})
      def asset1():
          ...
      
      @asset
      def asset2():
          ...
      
      @repository
      def the_repo():
          asset_group = AssetGroup(assets=[asset1, asset2], resource_defs={"foo": ...})
          return [asset_group, asset_group.build_job(selection="asset1-")]
      
    • AssetGroup.build_job supports a selection syntax similar to that found in op selection.
  • Asset Observations:
    • You can now yield AssetObservations to log metadata about a particular asset from beyond its materialization site. AssetObservations appear on the asset details page alongside materializations and numerical metadata is graphed. For assets with software-defined partitions, materialized and observed metadata about each partition is rolled up and presented together. For more information, view the docs page here.
    • Added an asset_observations_for_node method to ExecuteInProcessResult for fetching the AssetObservations from an in-process execution.
  • Dagster Types with an attached TableSchemaMetadataValue now render the schema in Dagit UI.
  • [dagster-pandera] New integration library dagster-pandera provides runtime validation from the Pandera dataframe validation library and renders table schema information in Dagit.
  • OpExecutionContext.log_event provides a way to log AssetMaterializations, ExpectationResults, and AssetObservations from the body of an op without having to yield anything. Likewise, you can use OpExecutionContext.add_output_metadata to attach metadata to an output without having to explicitly use the Output object.
  • OutputContext.log_event provides a way to log AssetMaterializations from within the handle_output method of an IO manager without yielding. Likewise, output metadata can be added using OutputContext.add_output_metadata.
  • [dagster-dbt] The load_assets_from_dbt_project function now returns a set of assets that map to a single dbt run command (rather than compiling each dbt model into a separate step). It also supports a new node_info_to_asset_key argument which allows you to customize the asset key that will be used for each dbt node.
  • [dagster-airbyte] The dagster-airbyte integration now collects the Airbyte log output for each run as compute logs, and generates AssetMaterializations for each table that Airbyte updates or creates.
  • [dagster-airbyte] The dagster-airbyte integration now supports the creation of software-defined assets, with the build_airbyte_assets function.
  • [dagster-fivetran] The dagster-fivetran integration now supports the creation of software-defined assets with the build_fivetran_assets function.
  • The multiprocess executor now supports choosing between spawn or forkserver for how its subprocesses are created. When using forkserver we attempt to intelligently preload modules to reduce the per-op overhead.
  • [Helm] Labels can now be set on the Dagit and daemon deployments.
  • [Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).
  • [Helm] Ingress v1 is now supported.

Community Contributions

  • Typo fix from @jiafi, thank you!

Bugfixes

  • Fixed an issue where long job names were truncated prematurely in the Jobs page in Dagit.
  • Fixed an issue where loading the sensor timeline would sometimes load slowly or fail with a timeout error.
  • Fixed an issue where the first time a run_status_sensor executed, it would sometimes run very slowly or time out.
  • Fixed an issue where the Launchpad in Dagit mistakenly defaulted to an invalid subset error.
  • Multi-component asset keys can now be used in the asset graph filter bar.
  • Increased the storage query statement timeout to better handle more complex batch queries.
  • Added fallback support for older versions of sqlite to service top-level repository views in Dagit (e.g. the top-level jobs, schedule, and sensor pages).

Documentation

  • Images in the documentation now enlarge when clicked.
  • New example in examples/bollinger demonstrates dagster-pandera, TableSchema, and software-defined assets in the context of analyzing stock price data.
dagster - 0.13.19

Published by brad-alexander over 2 years ago

New

  • [dagit] Various performance improvements for asset graph views.
  • [dagster-aws] The EcsRunLauncher can now override the secrets_tag parameter to None, which will cause it to not look for any secrets to be included in the tasks for the run. This can be useful in situations where the run launcher does not have permissions to query AWS Secretsmanager.

Bugfixes

  • [dagster-mysql] For instances using MySQL for their run storage, runs created using dagster versions 0.13.17 / 0.13.18 might display an incorrect timestamp for their start time on the Runs page. Running the dagster instance migrate CLI command should resolve the issue.
dagster - 0.13.18

Published by prha over 2 years ago

New

  • Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as "my_graph.my_op".
  • The error message raised on failed Dagster type check on an output now includes the description provided on the TypeCheck object.
  • The dagster asset wipe CLI command now takes a --noprompt option.
  • Added the new Map config type, used to represent mappings between arbitrary scalar keys and typed values. For more information, see the Map ConfigType docs.
  • build_resources has been added to the top level API. It provides a way to initialize resources outside of execution. This provides a way to use resources within the body of a sensor or schedule: https://github.com/dagster-io/dagster/issues/3794
  • The dagster-daemon process now creates fewer log entries when no actions are taken (for example, if the run queue is empty)
  • [dagster-k8s] When upgrading the Dagster helm chart, the old dagster-daemon pod will now spin down completely before the new dagster-daemon pod is started.
  • [dagster-k8s] A flag can now be set in the Dagster helm chart to control whether the Kubernetes Jobs and Pods created by the K8sRunLauncher should fail if the Dagster run fails. To enable this flag, set the failPodOnRunFailure key to true in the run launcher portion of the Helm chart.
  • [dagster-dbt] Fixed compatibility issues with dbt 1.0. The schema and data arguments on the DbtCliResource.test function no longer need to be set to False to avoid errors, and the dbt output will no longer be displayed in json format in the event logs.
  • Dagster Types can now have metadata entries attached to them.
  • DagsterGraphQLClient now supports submitting runs with op/solid sub-selections.
  • [dagit] The Asset Catalog view will now include information from both AssetMaterialization and AssetObservation events for each asset.
  • [dagit] [software-defined-assets] A warning will now be displayed if you attempt to backfill partitions of an asset whose upstream dependencies are missing.

Bugfixes

  • Previously, when Dagit failed to load a list of ops, the error message used the legacy term “solids”. It now uses “ops”.
  • Runs created using dagster versions 0.13.15 / 0.13.16 / 0.13.17 might display an incorrect timestamp for their start time on the Runs page. This would only happen if you had run a schema migration (using one of those versions) with the dagster instance migrate CLI command. Running the dagster instance reindex command should run a data migration that resolves this issue.
  • Attempting to directly invoke a run status sensor or run failure sensor now raises an error. Run status/failure sensor invocation is not yet supported.
  • [dagster-k8s] Fixed a bug in the sanitization of K8s label values with uppercase characters and underscores.
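
For context on the K8s label fix above: Kubernetes label values may be at most 63 characters, may contain alphanumerics, dashes, underscores, and dots, and must begin and end with an alphanumeric character, so uppercase letters and underscores are valid and should survive sanitization. The following is a minimal sketch of such a sanitizer under those rules. It is not Dagster's implementation, and sanitize_label_value is a hypothetical name.

```python
import re

MAX_LABEL_LENGTH = 63  # Kubernetes limit for label values


def sanitize_label_value(value: str) -> str:
    # Replace every character Kubernetes does not allow with a dash.
    # Uppercase letters and underscores are allowed and left untouched.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "-", value)
    # Truncate first, then trim the ends so the result starts and ends
    # with an alphanumeric character (or is empty).
    cleaned = cleaned[:MAX_LABEL_LENGTH]
    return cleaned.strip("-_.")
```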

Community Contributions

  • [software-defined-assets] Language in dagit has been updated from “refreshing” to “rematerializing” assets (thanks @Sync271!)
  • [docs] The changelog page is now mobile friendly (thanks @keyz!)
  • [docs] The loading shimmer for text on docs pages now has correct padding (also @keyz!)

Experimental

  • [software-defined-assets] The namespace argument of the @asset decorator now accepts a list of strings in addition to a single string.
  • [memoization] Added a missing space to the error thrown when trying to use memoization without a persistent Dagster instance.
  • [metadata] Two new metadata types, TableSchemaMetadataEntryData and TableMetadataEntryData, allow you to emit metadata representing the schema / contents of a table, to be displayed in Dagit.