dagster

An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License

Downloads
12.2M
Stars
11.1K
Committers
367


dagster - 0.13.17

Published by prha over 2 years ago

New

  • When a user-generated context.log call fails while writing to the event log, it will now log a system error in the event log instead of failing the run.
  • [dagit] Made performance improvements to the Runs page, which can be realized after running an optional storage schema migration using dagster instance migrate.
  • When a job is created from a graph, it will now use the graph’s description if a description is not explicitly provided to override it. (Thanks @AndreaGiardini!)
  • [dagit] Long job names are now truncated in Dagit.
  • [dagit] The execution timezone is shown beside schedule cron strings, since their timezone may be UTC or a custom value.
  • [dagit] Graph filter inputs now default to using quoted strings, and this syntax matches ops, steps, or assets via an exact string match. "build_table"+ will select that asset and its downstream children without selecting another asset whose name merely contains that string, such as build_table_result. Removing the quotes restores the old substring-matching behavior.
  • [dagster-aws] When using the emr_pyspark_step_launcher to run Dagster ops in an Amazon EMR cluster, the raw stdout output of the Spark driver is now written to stdout and will appear in the compute logs for the op in dagit, rather than being written to the Dagster event log.
  • [dagit] Improved performance loading the Asset entry page in Dagit.
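
The difference between quoted and unquoted graph filter terms described above can be pictured with a small sketch. This is not Dagit's implementation: match_names is a hypothetical helper, and traversal operators such as the trailing + are deliberately left out.

```python
def match_names(term, names):
    """Return the names selected by one filter term (traversal operators omitted)."""
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        wanted = term[1:-1]
        # Quoted term: exact string match only.
        return [n for n in names if n == wanted]
    # Unquoted term: substring-style matching, as in the older behavior.
    return [n for n in names if term in n]


names = ["build_table", "build_table_result", "load_raw"]
print(match_names('"build_table"', names))  # ['build_table']
print(match_names("build_table", names))    # ['build_table', 'build_table_result']
```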

Bugfixes

  • [dagster-mysql] Added a schema migration script that was mistakenly omitted from 0.13.16. Migrating instance storage using dagster instance migrate should now complete without error.
  • [dagster-airbyte] Fixed a packaging dependency issue with dagster-airbyte. (Thanks bollwyvl!)
  • Fixed a bug where config provided to the config arg on to_job required environment variables to exist at definition time.
  • [dagit] The asset graph view now supports ops that yield multiple assets and renders long asset key paths correctly.
  • [dagit] The asset graph’s filter input now allows you to filter on assets with multi-component key paths.
  • [dagit] The asset graph properly displays downstream asset links to other asset jobs in your workspace.

Experimental

  • [dagster-celery-k8s] Experimental run monitoring is now supported with the CeleryK8sRunLauncher. This will detect when a run worker K8s Job has failed (due to an OOM, a Node shutting down, etc.) and mark the run as failed so that it doesn’t hang in STARTED. To enable this feature, set dagsterDaemon.runMonitoring.enabled to true in your Helm values.

Documentation

  • [dagster-snowflake] Fixed some example code in the API doc for snowflake_resource, which incorrectly constructed a Dagster job using the snowflake resource.
dagster - 0.13.16

Published by sryza over 2 years ago

New

  • Added an integration with Airbyte, under the dagster-airbyte package (thanks Marcos Marx).
  • An op that has a config schema is no longer required to have a context argument.

Bugfixes

  • Fixed an issue introduced in 0.13.13 where jobs with DynamicOutputs would fail when using the k8s_job_executor due to a label validation error when creating the step pod.
  • In Dagit, when searching for asset keys on the Assets page, string matches beyond a certain character threshold on deeply nested key paths were ignored. This has been fixed, and all keys in the asset path are now searchable.
  • In Dagit, links to Partitions views were broken in several places due to recent URL querystring changes, resulting in page crashes due to JS errors. These links have been fixed.
  • The “Download Debug File” menu link is fixed on the Runs page in Dagit.
  • In the “Launch Backfill” dialog on the Partitions page in Dagit, the range input sometimes discarded user input due to page updates. This has been fixed. Additionally, pressing the return key now commits changes to the input.
  • When using a mouse wheel or touchpad gestures to zoom on a DAG view for a job or graph in Dagit, the zoom behavior sometimes was applied to the entire browser instead of just the DAG. This has been fixed.
  • Dagit fonts now load correctly when using the --path-prefix option.
  • Date strings in tool tips and tick labels on time-based charts no longer duplicate the meridiem indicator.

Experimental

  • Software-defined assets can now be partitioned. The @asset decorator has a partitions_def argument, which accepts a PartitionsDefinition value. The asset details page in Dagit now represents which partitions are filled in.

Documentation

  • Fixed the documented return type for the sync_and_poll method of the dagster-fivetran resource (thanks Marcos Marx).
  • Fixed a typo in the Ops concepts page (thanks Oluwashina Aladejubelo).
dagster - 0.13.14

Published by Ramshackle-Jamathon almost 3 years ago

New

  • When you produce a PartitionedConfig object using a decorator like daily_partitioned_config or static_partitioned_config, you can now directly invoke that object to invoke the decorated function.
  • The end_offset argument to PartitionedConfig can now be negative. This allows you to define a schedule that fills in partitions further in the past than the current partition (for example, you could define a daily schedule that fills in the partition from two days ago by setting end_offset to -1).
  • The runConfigData argument to the launchRun GraphQL mutation can now be either a JSON-serialized string or a JSON object, instead of being required to be passed in as a JSON object. This makes it easier to use the mutation in typed languages where passing in unserialized JSON objects as arguments can be cumbersome.
  • Dagster now always uses the local working directory when resolving local imports in job code, in all workspaces. In the case where you want to use a different base folder to resolve local imports in your code, the working_directory argument can now always be specified (before, it was only available when using the python_file key in your workspace). See the Workspace docs (https://docs.dagster.io/concepts/repositories-workspaces/workspaces#loading-relative-imports) for more information.
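
To make the negative end_offset example above concrete, here is a rough date-arithmetic sketch. It does not call Dagster. latest_daily_partition is a hypothetical helper, and it assumes that with end_offset=0 the latest daily partition is the last fully elapsed day.

```python
from datetime import date, timedelta


def latest_daily_partition(today, end_offset=0):
    """Start date of the most recent daily partition under the stated assumption.

    Assumption: end_offset=0 means the latest partition is yesterday, and each
    unit of end_offset shifts that boundary by one day.
    """
    return today - timedelta(days=1) + timedelta(days=end_offset)


today = date(2021, 12, 16)
print(latest_daily_partition(today))      # 2021-12-15
print(latest_daily_partition(today, -1))  # 2021-12-14, i.e. two days ago
```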

Bugfixes

  • In Dagit, when viewing an in-progress run, the logic used to render the “Terminate” button was backward: it would appear for a completed run, but not for an in-progress run. This bug was introduced in 0.13.13, and is now fixed.
  • Previously, errors in the instance’s configured compute log manager would cause runs to fail. Now, these errors are logged but do not affect job execution.
  • The full set of DynamicOutputs returned by an op is no longer retained in memory if there is no hook to receive the values. This allows DynamicOutput to be used for breaking up a large data set that cannot fit in memory.

Breaking Changes

  • When running your own gRPC server to serve Dagster code, jobs that launch in a container using code from that server will now default to using dagster as the entry point. Previously, the jobs would run using PYTHON_EXECUTABLE -m dagster, where PYTHON_EXECUTABLE was the value of sys.executable on the gRPC server. For the vast majority of Dagster jobs, these entry points will be equivalent. To keep the old behavior (for example, if you have multiple Python virtualenvs in your image and want to ensure that runs also launch in a certain virtualenv), you can launch the gRPC server using the new --use-python-environment-entry-point command-line arg.

Community Contributions

  • [bugfix] Fixed an issue where log levels on handlers defined in dagster.yaml would be ignored (thanks @lambdaTW!)

Documentation

  • Typo fix in the jobs page (thanks @kmiku7!)
  • Added docs on how to modify k8s job TTL

UI

  • When re-launching a run, the log/step filters are now preserved in the new run’s page.
  • Step execution times/recent runs now appear in the job/graph sidebar.
dagster - 0.13.13

Published by benpankow almost 3 years ago

New

  • [dagster-dbt] dbt rpc resources now surface dbt log messages in the Dagster event log.
  • [dagster-databricks] The databricks_pyspark_step_launcher now streams Dagster logs back from Databricks rather than waiting for the step to completely finish before exporting all events. Fixed an issue where all events from the external step would share the same timestamp. Immediately after execution, stdout and stderr logs captured from the Databricks worker will be automatically surfaced to the event log, removing the need to set the wait_for_logs option in most scenarios.
  • [dagster-databricks] The databricks_pyspark_step_launcher now supports dynamically mapped steps.
  • If the scheduler is unable to reach a code server when executing a schedule tick, it will now wait until the code server is reachable again before continuing, instead of marking the schedule tick as failed.
  • The scheduler will now check every 5 seconds for new schedules to run, instead of every 30 seconds.
  • The run viewer and workspace pages of Dagit are significantly more performant.
  • Dagit loads large (100+ node) asset graphs faster and retrieves information about the assets being rendered only.
  • When viewing an asset graph in Dagit, you can now rematerialize the entire graph by clicking a single “Refresh” button, or select assets to rematerialize them individually. You can also launch a job to rebuild an asset directly from the asset details page.
  • When viewing a software-defined asset, Dagit displays its upstream and downstream assets in two lists instead of a mini-graph for easier scrolling and navigation. The statuses of these assets are updated in real-time. This new UI also resolves a bug where only one downstream asset would appear.

Bugfixes

  • Fixed bug where execute_in_process would not work for graphs with nothing inputs.
  • In the Launchpad in Dagit, the Ctrl+A command did not correctly allow select-all behavior in the editor for non-Mac users. This has now been fixed.
  • When viewing a DAG in Dagit and hovering on a specific input or output for an op, the connections between the highlighted inputs and outputs were too subtle to see. These are now a bright blue color.
  • In Dagit, when viewing an in-progress run, a caching bug prevented the page from updating in real time in some cases. For instance, runs might appear to be stuck in a queued state long after being dequeued. This has been fixed.
  • Fixed a bug in the k8s_job_executor where the same step could start twice in rare cases.
  • Enabled faster queries for the asset catalog by migrating asset database entries to store extra materialization data.
  • [dagster-aws] Viewing the compute logs for in-progress ops for instances configured with the S3ComputeLogManager would cause errors in Dagit. This is now fixed.
  • [dagster-pandas] Fixed bug where Pandas categorical dtype did not work by default with dagster-pandas categorical_column constraint.
  • Fixed an issue where schedules that yielded a SkipReason from the schedule function did not display the skip reason in the tick timeline in Dagit, or output the skip message in the dagster-daemon log output.
  • Fixed an issue where the snapshot link of a finished run in Dagit would sometimes fail to load with a GraphQL error.
  • Dagit now supports software-defined assets that are defined in multiple jobs within a repo, and displays a warning when assets in two repos share the same name.

Breaking Changes

  • We previously allowed schedules to be defined with cron strings like @daily rather than 0 0 * * *. However, these schedules would fail to actually run successfully in the daemon and would also cause errors when viewing certain pages in Dagit. We now raise a DagsterInvalidDefinitionError for schedules that do not have a cron expression consisting of 5 space-separated fields.
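
If you want to check your own cron strings before upgrading, a field-count check along these lines may help. It is only a sketch of the rule stated above, not Dagster's validation code. has_five_cron_fields is a hypothetical helper, and it does not verify that each field is itself valid.

```python
def has_five_cron_fields(cron_schedule):
    """Return True if the string consists of exactly 5 whitespace-separated fields."""
    return len(cron_schedule.split()) == 5


print(has_five_cron_fields("0 0 * * *"))  # True
print(has_five_cron_fields("@daily"))     # False
```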

Community Contributions

  • In dagster-dask, a schema can now be conditionally specified for ops materializing outputs to parquet files, thank you @kudryk!
  • Dagster-gcp change from @AndreaGiardini that replaces get_bucket() calls with bucket(), to avoid unnecessary bucket metadata fetches, thanks!
  • Typo fix from @sebastianbertoli, thank you!
  • [dagster-k8s] Kubernetes jobs and pods created by Dagster now have labels identifying the name of the Dagster job or op they are running. Thanks @skirino!

Experimental

  • [dagit] Made performance improvements for loading the asset graph.
  • [dagit] The debug console logging output now tracks calls to fetch data from the database, to help track inefficient queries.
dagster - 0.13.12

Published by gibsondan almost 3 years ago

New

  • The dagit and dagster-daemon processes now use a structured Python logger for command-line output.
  • Dagster command-line logs now include the system timezone in the logging timestamp.
  • When running your own Dagster gRPC code server, the server process will now log a message to stdout when it starts up and when it shuts down.
  • [dagit] The sensor details page and sensor list page now display links to the assets tracked by @asset_sensors.
  • [dagit] Improved the instance warning in Dagit. Previously, Dagit warned that the daemon was not running even when no repos had schedules or sensors.
  • [dagster-celery-k8s] You can now specify volumes and volume mounts to runs using the CeleryK8sRunLauncher that will be included in all launched jobs.
  • [dagster-databricks] You are no longer required to specify storage configuration when using the databricks_pyspark_step_launcher.
  • [dagster-databricks] The databricks_pyspark_step_launcher can now be used with dynamic mapping and collect steps.
  • [dagster-mlflow] The end_mlflow_on_run_finished hook is now a top-level export of the dagster mlflow library. The API reference also now includes an entry for it.
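
For your own scripts, the standard library can produce similar timezone-aware timestamps. The sketch below is not the format Dagster uses. The format string and the logger name example_daemon are assumptions chosen for illustration.

```python
import logging

# %z appends the local UTC offset (for example +0000) to each timestamp.
formatter = logging.Formatter(
    fmt="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S %z",
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("example_daemon")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

logger.info("daemon started")
```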

Bugfixes

  • Better backwards-compatibility for fetching asset keys materialized from older versions of dagster.
  • Fixed an issue where jobs running with op subsets required some resource configuration as part of the run config, even when they weren’t required by the selected ops.
  • RetryPolicy is now respected when execution is interrupted.
  • [dagit] Fixed "Open in Playground" link on the scheduled ticks.
  • [dagit] Fixed the run ID links on the Asset list view.
  • [dagit] When viewing an in-progress run, the run status sometimes failed to update as new logs arrived, resulting in a Gantt chart that either never updated from a “queued” state or did so only after a long delay. The run status and Gantt chart now accurately match incoming logs.

Community Contributions

  • [dagster-k8s] Fixed an issue where specifying job_metadata in tags did not correctly propagate to Kubernetes jobs created by Dagster. Thanks @ibelikov!

Experimental

  • [dagit] Made performance improvements for loading the asset graph.

Documentation

  • The Versioning and Memoization guide has been updated to reflect a new set of core memoization APIs.
  • [dagster-dbt] Updated the dagster-dbt integration guide to mention the new dbt Cloud integration.
  • [dagster-dbt] Added documentation for the default_flags property of DbtCliResource.
dagster -

Published by alangenfeld almost 3 years ago

New

  • [dagit] Made performance improvements to the Run page.
  • [dagit] Highlighting a specific sensor / schedule tick is now reflected in a shareable URL.

Bugfixes

  • [dagit] On the Runs page, when filtering runs with a tag containing a comma, the filter input would incorrectly break the tag apart. This has been fixed.
  • [dagit] For sensors that do not target a specific job (e.g. run_status_sensor), we are now hiding potentially confusing Job details.
  • [dagit] Fixed an issue where some graph explorer views generated multiple scrollbars.
  • [dagit] Fixed an issue with the Run view where the Gantt view incorrectly showed in-progress steps when the run had exited.
  • [dagster-celery-k8s] Fixed an issue where setting a custom Celery broker URL but not a custom Celery backend URL in the helm chart would produce an incorrect Celery configuration.
  • [dagster-k8s] Fixed an issue where Kubernetes volumes using list or dict types could not be set in the Helm chart.

Community Contributions

  • [dagster-k8s] Added the ability to set a custom location name when configuring a workspace in the Helm chart. Thanks @pcherednichenko!

Experimental

  • [dagit] Asset jobs now display with spinners on assets that are currently in progress.
  • [dagit] Asset jobs that are in progress will now display a dot icon on all assets that are not yet running but will be re-materialized in the run.
  • [dagit] Fixed broken links to the asset catalog entries from the explorer view of asset jobs.
  • The AssetIn input object now accepts an asset key so upstream assets can be explicitly specified (e.g. AssetIn(asset_key=AssetKey("asset1")))
  • The @asset decorator now has an optional non_argument_deps parameter that accepts AssetKeys of assets that do not pass data but are upstream dependencies.
  • ForeignAsset objects now have an optional description attribute.

Documentation

  • “Validating Data with Dagster Type Factories” guide added.
dagster - 0.13.10

Published by johannkm almost 3 years ago

New

  • run_id, job_name, and op_exception have been added as parameters to build_hook_context.
  • You can now define inputs on the top-level job / graph. Those inputs can be configured as an inputs key on the top level of your run config. For example, consider the following job:
from dagster import job, op

@op
def add_one(x):
    return x + 1

@job
def my_job(x):
    add_one(x)

You can now add config for x at the top level of your run_config like so:

run_config = {
  "inputs": {
    "x": {
      "value": 2
    }
  }
}
  • You can now create partitioned jobs and reference a run’s partition from inside an op body or IOManager load_input or handle_output method, without threading partition values through config. For example, where previously you might have written:
@op(config_schema={"partition_key": str})
def my_op(context):
    print("partition_key: " + context.op_config["partition_key"])

@static_partitioned_config(partition_keys=["a", "b"])
def my_static_partitioned_config(partition_key: str):
    return {"ops": {"my_op": {"config": {"partition_key": partition_key}}}}

@job(config=my_static_partitioned_config)
def my_partitioned_job():
    my_op()

You can now write:

@op
def my_op(context):
    print("partition_key: " + context.partition_key)

@job(partitions_def=StaticPartitionsDefinition(["a", "b"]))
def my_partitioned_job():
    my_op()
  • Added op_retry_policy to @job. You can also specify op_retry_policy when invoking to_job on graphs.
  • [dagster-fivetran] The fivetran_sync_op will now be rendered with a fivetran tag in Dagit.
  • [dagster-fivetran] The fivetran_sync_op now supports producing AssetMaterializations for each table updated during the sync. To this end, it now outputs a structured FivetranOutput containing this schema information, instead of an unstructured dictionary.
  • [dagster-dbt] AssetMaterializations produced from the dbt_cloud_run_op now include a link to the dbt Cloud docs for each asset (if docs were generated for that run).
  • You can now use the @schedule decorator with RunRequest - based evaluation functions. For example, you can now write:
@schedule(cron_schedule="* * * * *", job=my_job)
def my_schedule(context):
    yield RunRequest(run_key="a", ...)
    yield RunRequest(run_key="b", ...)
  • [dagster-k8s] You may now configure instance-level python_logs settings using the Dagster Helm chart.
  • [dagster-k8s] You can now manage the secret that contains the Celery broker and backend URLs yourself, rather than having the Helm chart manage it.
  • [dagster-slack] Improved the default messages in make_slack_on_run_failure_sensor to use Slack layout blocks and include a clickable link to Dagit. Previously, it sent a plain text message.

Dagit

  • Made performance improvements to the Run page.
  • The Run page now has a pane control that splits the Gantt view and log table evenly on the screen.
  • The Run page now includes a list of succeeded steps in the status panel next to the Gantt chart.
  • In the Schedules list, execution timezone is now shown alongside tick timestamps.
  • If no repositories are successfully loaded when viewing Dagit, we now redirect to /workspace to quickly surface errors to the user.
  • Increased the size of the reload repository button
  • Repositories that had been hidden from the left nav became inaccessible when loaded in a workspace containing only that repository. Now, when loading a workspace containing a single repository, jobs for that repository will always appear in the left nav.
  • In the Launchpad, selected ops were incorrectly hidden in the lower right panel.
  • Repaired asset search input keyboard interaction.
  • In the Run page, the list of previous runs was incorrectly ordered based on run ID, and is now ordered by start time.
  • Using keyboard commands with the / key (e.g. toggling commented code) in the config editor incorrectly triggered global search. This has been fixed.

Bugfixes

  • Previously, if an asset in a software-defined assets job depended on a ForeignAsset, the repository containing that job would fail to load. This has been fixed.
  • Fix type on tags of EMR cluster config (thanks Chris)!
  • Fixes to the tests in dagster new-project, which were previously using an outdated result API (thanks Vašek)!

Experimental

  • You can now mount AWS Secrets Manager secrets as environment variables in runs launched by the EcsRunLauncher.
  • You can now specify the CPU and Memory for runs launched by the EcsRunLauncher.
  • The EcsRunLauncher now dynamically chooses between assigning a public IP address or not based on whether it’s running in a public or private subnet.
  • The @asset and @multi_asset decorators now return AssetsDefinition objects instead of OpDefinitions.

Documentation

  • The tutorial now uses get_dagster_logger instead of context.log.
  • In the API docs, most configurable objects (such as ops and resources) now have their configuration schema documented in-line.
  • Removed typo from CLI readme (thanks Kan (https://github.com/zkan))!
dagster - 0.13.9

Published by gibsondan almost 3 years ago

New

  • Memoization can now be used with the multiprocess, k8s, celery-k8s, and dask executors.
dagster - 0.13.8

Published by gibsondan almost 3 years ago

New

  • Improved the error message for situations where you try a, b = my_op(), inside @graph or @job, but my_op only has a single Out.
  • [dagster-dbt] A new integration with dbt Cloud allows you to launch dbt Cloud jobs as part of your Dagster jobs. This comes complete with rich error messages, links back to the dbt Cloud UI, and automatically generated Asset Materializations to help keep track of your dbt models in Dagit. It provides a pre-built dbt_cloud_run_op, as well as a more flexible dbt_cloud_resource for more customized use cases. Check out the api docs to learn more!
  • [dagster-gcp] Pinned the google-cloud-bigquery dependency to <3, because the new 3.0.0b1 version was causing some problems in tests.
  • [dagit] Verbiage update to make it clear that wiping an asset means deleting the materialization events for that asset.

Bugfixes

  • Fixed a bug with the pipeline launch / job launch CLIs that would spin up an ephemeral dagster instance for the launch, then tear it down before the run actually executed. Now, the CLI will enforce that your instance is non-ephemeral.
  • Fixed a bug with re-execution when an upstream step skips some outputs. Previously, it mistakenly tried to load inputs from parent runs. Now, if an upstream step doesn’t yield outputs, the downstream step is skipped.
  • [dagit] Fixed a bug where config for unsatisfied inputs wasn’t properly resolved when an op selection is specified in the Launchpad.
  • [dagit] Restored local font files for Inter and Inconsolata instead of using the Google Fonts API. This allows correct font rendering for offline use.
  • [dagit] Improved initial workspace loading screen to indicate loading state instead of showing an empty repository message.

Breaking Changes

  • The pipeline argument of the InitExecutorContext constructor has been changed to job.

Experimental

  • The @asset decorator now accepts a dagster_type argument, which determines the DagsterType for the output of the asset op.
  • build_assets_job accepts an executor_def argument, which determines the executor for the job.

Documentation

  • A docs section on context manager resources has been added.
  • Removed the versions of the Hacker News example jobs that used the legacy solid & pipeline APIs.
dagster - 0.13.7

Published by gibsondan almost 3 years ago

New

  • The Runs page in Dagit now loads much more quickly.

Bugfixes

  • Fixed an issue where Dagit would sometimes display a red "Invalid JSON" error message.

Dependencies

  • google-cloud-bigquery is temporarily pinned to be prior to version 3 due to a breaking change in that version.
dagster - 0.13.6

Published by gibsondan almost 3 years ago

Bugfixes

  • Previously, the EcsRunLauncher tagged each ECS task with its corresponding Dagster Run ID. ECS tagging isn't supported for AWS accounts that have not yet migrated to using the long ARN format. Now, the EcsRunLauncher only adds this tag if your AWS account has the long ARN format enabled.
  • Fixed a bug in the k8s_job_executor and docker_executor that could result in jobs exiting as SUCCESS before all ops have run.
  • Fixed a bug in the k8s_job_executor and docker_executor that could result in jobs failing when an op is skipped.
  • [dagit] Improved performance of the Runs page.

Dependencies

  • graphene is temporarily pinned to be prior to version 3 to unbreak Dagit dependencies.
dagster - 0.13.5

Published by rexledesma almost 3 years ago

New

  • [dagster-fivetran] A new dagster-fivetran integration allows you to launch Fivetran syncs and monitor their progress from within Dagster. It provides a pre-built fivetran_sync_op, as well as a more flexible fivetran_resource for more customized use cases. Check out the api docs to learn more!
  • When inferring a graph/job/op/solid/pipeline description from the docstring of the decorated function, we now dedent the docstring even if the first line isn’t indented. This allows descriptions to be formatted nicely even when the first line is on the same line as the triple-quotes.
  • The SourceHashVersionStrategy class has been added, which versions op and resource code. It can be provided to a job like so:
from dagster import job, SourceHashVersionStrategy

@job(version_strategy=SourceHashVersionStrategy())
def my_job():
    ...
  • [dagit] Improved performance on the initial page load of the Run page, as well as the partitions UI / launch backfill modal
  • [dagit] Fixed a bug where top-level graphs in the repo could not be viewed in the Workspace > Graph view.
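
The docstring dedent change in this release addresses a common pitfall, which can be reproduced with the standard library alone. The sketch below is not Dagster's code. It uses a plain string shaped like a docstring whose first line sits on the same line as the triple quotes, and compares textwrap.dedent with inspect.cleandoc.

```python
import inspect
import textwrap

# Shaped like a docstring whose summary starts right after the triple quotes:
# the first line has no indentation, the remaining lines are indented.
raw = (
    "Runs the nightly load.\n"
    "\n"
    "    Steps:\n"
    "        1. extract\n"
    "        2. load\n"
    "    "
)

# dedent removes only the indentation common to all lines. The unindented
# first line makes that common prefix empty, so nothing is removed.
print(repr(textwrap.dedent(raw)))

# cleandoc treats the first line separately and dedents the rest.
print(repr(inspect.cleandoc(raw)))
```

With cleandoc, the Steps: line loses its four leading spaces, while dedent leaves them in place.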

Bugfixes

  • Fixed an issue where turning a partitioned schedule off and on again would sometimes result in unexpected past runs being created. (#5604)
  • Fixed an issue where partition sets that didn’t return a new copy of run configuration on each function call would sometimes apply the wrong config to partitions during backfills.
  • Fixed rare issue where using dynamic outputs in combination with optional outputs would cause errors when using certain executors.
  • [dagster-celery-k8s] Fixed bug where CeleryK8s executor would not respect job run config
  • [dagit] Fixed bug where graphs would sometimes appear off-center.
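
The partition set fix above involves functions that hand back the same config object on every call. The sketch below shows only that aliasing pitfall in plain Python, not Dagster's fix. BASE_CONFIG, shared_run_config, fresh_run_config, and partition_keys are hypothetical names.

```python
import copy

BASE_CONFIG = {"ops": {"my_op": {"config": {"partition_key": None}}}}


def shared_run_config(partition_key):
    # Pitfall: mutates and returns the same dict object on every call.
    BASE_CONFIG["ops"]["my_op"]["config"]["partition_key"] = partition_key
    return BASE_CONFIG


def fresh_run_config(partition_key):
    # Safer: build an independent copy for each partition.
    config = copy.deepcopy(BASE_CONFIG)
    config["ops"]["my_op"]["config"]["partition_key"] = partition_key
    return config


def partition_keys(configs):
    return [c["ops"]["my_op"]["config"]["partition_key"] for c in configs]


shared = [shared_run_config(k) for k in ["a", "b"]]
fresh = [fresh_run_config(k) for k in ["a", "b"]]

print(partition_keys(shared))  # ['b', 'b'] because both entries are the same dict
print(partition_keys(fresh))   # ['a', 'b']
```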

Breaking Changes

  • In 0.13.0, job CLI commands executed via dagster job selected both pipelines and jobs. This release changes the dagster job command to select only jobs and not pipelines.

Community Contributions

  • [dagster-dask] Updated DaskClusterTypes to have the correct import paths for certain cluster managers (thanks @kudryk!)
  • [dagster-azure] Updated version requirements for Azure to be more recent and more permissive (thanks @roeap !)
  • [dagster-shell] Ops will now copy the host environment variables at runtime, rather than copying them from the environment that their job is launched from (thanks @alexismanuel !)

Documentation

  • The job, op, graph migration guide was erroneously marked experimental. This has been fixed.
dagster - 0.13.4

Published by gibsondan almost 3 years ago

New

  • [dagster-k8s] The k8s_job_executor is no longer experimental, and is recommended for production workloads. This executor runs each op in a separate Kubernetes job. We recommend this executor for Dagster jobs that require greater isolation than the multiprocess executor can provide within a single Kubernetes pod. The celery_k8s_job_executor will still be supported, but is recommended only for use cases where Celery is required (The most common example is to offer step concurrency limits using multiple Celery queues). Otherwise, the k8s_job_executor is the best way to get Kubernetes job isolation.
  • [dagster-airflow] Updated dagster-airflow to better support job/op/graph changes by adding a make_dagster_job_from_airflow_dag factory function. Deprecated pipeline_name argument in favor of job_name in all the APIs.
  • Removed a version pin of the chardet library that was required due to an incompatibility with an old version of the aiohttp library, which has since been fixed.
  • We now raise a more informative error if the wrong type is passed to the ins argument of the op decorator.
  • In the Dagit Launchpad, the button for launching a run now says “Launch Run” instead of “Launch Execution”

Bugfixes

  • Fixed an issue where job entries from Dagit search navigation were not linking to the correct job pages.
  • Fixed an issue where jobs / pipelines were showing up instead of the underlying graph in the list of repository graph definitions.
  • Fixed a bug with using custom loggers with default config on a job.
  • [dagster-slack] The slack_on_run_failure_sensor now says “Job” instead of “Pipeline” in its default message.

Community Contributions

  • Fixed a bug that was incorrectly causing a DagsterTypeCheckDidNotPass error when a Dagster Type contained a List inside a Tuple (thanks @jan-eat!)
  • Added information for setting DAGSTER_HOME in Powershell and batch for windows users. (thanks @slamer59!)

Experimental

  • Changed the job explorer view in Dagit to show asset-based graphs when the experimental Asset API flag is turned on for any job that has at least one software-defined asset.

Documentation

  • Updated API docs and integration guides to reference job/op/graph for various libraries (dagstermill, dagster-pandas, dagster-airflow, etc)
  • Improved documentation when attempting to retrieve output value from execute_in_process, when job does not have a top-level output.
dagster -

Published by johannkm almost 3 years ago

Bugfixes

  • [dagster-k8s] Fixed a bug that caused retries to occur twice with the k8s_job_executor
dagster - 0.13.2

Published by benpankow almost 3 years ago

New

  • Updated dagstermill to better support job/op/graph changes by adding a define_dagstermill_op factory function. Also updated documentation and examples to reflect these changes.
  • Changed run history for jobs in Dagit to include legacy mode tags for runs that were created from pipelines that have since been converted to use jobs.
  • The new get_dagster_logger() method is now importable from the top level dagster module (from dagster import get_dagster_logger)
  • [dagster-dbt] All dagster-dbt resources (dbt_cli_resource, dbt_rpc_resource, and dbt_rpc_sync_resource) now support the dbt ls command: context.resources.dbt.ls().
  • Added ins and outs properties to OpDefinition.
  • Updated the run status favicon of the Run page in Dagit.
  • There is now a resources_config argument on build_solid_context. The config argument has been renamed to solid_config.
  • [helm] When deploying Redis using the Dagster helm chart, by default the new cluster will not require authentication to start a connection to it.
  • [dagster-k8s] The component name on Kubernetes jobs for run and step workers is now run_worker and step_worker, respectively.
  • Improved performance for rendering the Gantt chart on the Run page for runs with very long event logs.

Bugfixes

  • Fixed a bug where decorating a job with a hook would create a pipeline.
  • Fixed a bug where providing default logger config to a job would break with a confusing error.
  • Fixed a bug with retrieving output results from a mapped input on execute_in_process
  • Fixed a bug where schedules referencing a job were not creating runs using that job’s default run config.
  • [dagster-k8s] Fixed a bug where the retry mode was not being passed along through the k8s executor.

Breaking Changes

  • The first argument on Executor.execute(...) has changed from pipeline_context to plan_context

Community Contributions

  • When using multiple Celery workers in the Dagster helm chart, each worker can now be individually configured. See the helm chart for more information. Thanks @acrulopez!
  • [dagster-k8s] Changed Kubernetes job containers to use the fixed name dagster, rather than repeating the job name. Thanks @skirino!

Experimental

  • [dagster-docker] Added a new docker_executor which executes steps in separate Docker containers.

  • The dagster-daemon process can now detect hanging runs and restart crashed run workers. Currently only supported for jobs using the docker_executor and k8s_job_executor. Enable this feature in your dagster.yaml with:

    run_monitoring:
      enabled: true
    

    Documentation coming soon. Reach out in the #dagster-support Slack channel if you are interested in using this feature.

Documentation

  • Added “Python Logging” back to the navigation pane.
  • Updated documentation for dagster-aws, dagster-github, and dagster-slack to reference job/op/graph APIs.
dagster - 0.13.1

Published by gibsondan almost 3 years ago

New

Docs

  • Various fixes to broken links on pages in 0.13.0 docs release

Bug fixes

  • Previously, the Dagster CLI would use a completely ephemeral dagster instance if $DAGSTER_HOME was not set. Since the new job abstraction by default requires a non-ephemeral dagster instance, this has been changed to instead create a persistent instance that is cleaned up at the end of an execution.

Dagit

  • Run-status-colorized dagster logo is back on job execution page
  • Improvements to Gantt chart color scheme
dagster - 0.13.0

Published by gibsondan almost 3 years ago

0.13.0 "Get the Party Started"

Major Changes

  • The job, op, and graph APIs now represent the stable core of the system, and replace pipelines, solids, composite solids, modes, and presets as Dagster’s core abstractions. All of Dagster’s documentation - tutorials, examples, table of contents - is in terms of these new core APIs. Pipelines, modes, presets, solids, and composite solids are still supported, but are now considered “Legacy APIs”. We will maintain backwards compatibility with the legacy APIs for some time; however, we believe the new APIs represent an elegant foundation for Dagster going forward. As time goes on, we will be adding new features that only apply to the new core. All in all, the new APIs provide increased clarity - they unify related concepts, make testing more lightweight, and simplify operational workflows in Dagit. For comprehensive instructions on how to transition to the new APIs, refer to the migration guide.
  • Dagit has received a complete makeover. This includes a refresh to the color palette and general design patterns, as well as functional changes that make common Dagit workflows more elegant. These changes are designed to go hand in hand with the new set of core APIs to represent a stable core for the system going forward.
  • You no longer have to pass a context object around to do basic logging. Many updates have been made to our logging system to make it more compatible with the python logging module. You can now capture logs produced by standard python loggers, set a global python log level, and set python log handlers that will be applied to every log message emitted from the Dagster framework. Check out the docs here!
  • The Dagit “playground” has been renamed to the Dagit “launchpad”. This reflects a vision of the tool closer to how our users actually interact with it: not just as a testing/development tool, but also as a first-class starting point for many one-off workflows.
  • Introduced a new integration with Microsoft Teams, which includes a connection resource and support for sending messages to Microsoft Teams. See details in the API Docs (thanks @iswariyam!).
  • Intermediate storages, which were deprecated in 0.10.0, have now been removed. Refer to the “Deprecation: Intermediate Storage” section of the 0.10.0 release notes for how to use IOManagers instead.
  • The pipeline-level event types in the run log have been renamed so that the PIPELINE prefix has been replaced with RUN. For example, the PIPELINE_START event is now the RUN_START event.
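
For code that matches run log events by type name, the rename above amounts to a prefix swap. The following is a minimal sketch, assuming event types are handled as plain strings. The helper to_run_event_type is hypothetical and not part of Dagster; only the PIPELINE_START to RUN_START mapping is stated in these notes.

```python
LEGACY_PREFIX = "PIPELINE_"
NEW_PREFIX = "RUN_"


def to_run_event_type(event_type: str) -> str:
    """Map a legacy PIPELINE_* event type name to its RUN_* equivalent.

    Hypothetical helper for illustration. Names without the legacy
    prefix are returned unchanged.
    """
    if event_type.startswith(LEGACY_PREFIX):
        return NEW_PREFIX + event_type[len(LEGACY_PREFIX):]
    return event_type


print(to_run_event_type("PIPELINE_START"))  # RUN_START
print(to_run_event_type("STEP_OUTPUT"))     # STEP_OUTPUT (no legacy prefix)
```

A helper like this can be useful while both old and new event names still appear in stored queries or dashboards.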

New since 0.12.15

  • Added the get_dagster_logger function, which creates a Python logger whose output messages will be captured and converted into Dagster log messages.

Community Contributions

  • The run_config attribute is now available on ops/solids built using the build_op_context or build_solid_context functions. Thanks @jiafi!
  • Limit configuration of applyLimitPerUniqueValue in k8s environments. Thanks @cvb!
  • Fix for a solid’s return statement in the intro tutorial. Thanks @dbready!
  • Fix for a bug with output keys in the s3_pickle_io_manager. Thanks @jiafi!

Breaking Changes

  • We have renamed a lot of our GraphQL Types to reflect our emphasis on the new job/op/graph APIs. We have made the existing types backwards compatible so that GraphQL fragments should still work. However, if you are making custom GraphQL requests to your Dagit webserver, you may need to change your code to handle the new types.
  • We have paired our GraphQL changes with changes to our Python GraphQL client. If you have upgraded the version of your Dagit instance, you will most likely also want to upgrade the version of your Python GraphQL client.

Improvements

  • Solid, op, pipeline, job, and graph descriptions that are inferred from docstrings now have leading whitespaces stripped out.
  • Improvements to how we cache and store step keys should speed up dynamic workflows with many dynamic outputs significantly.
  • The asset catalog is now paginated, which should result in better initial load times.
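
To illustrate what stripping leading whitespace from a docstring-derived description looks like, here is a small sketch using the standard library's inspect.cleandoc. This is an assumption made for illustration only; these notes do not say how Dagster performs the stripping. The docstring text is passed as a plain string so the result does not depend on the Python version.

```python
import inspect

# The raw text of an indented, triple-quoted docstring as it might be
# written under a decorated function (illustrative content).
raw_docstring = (
    "\n"
    "    Build the table.\n"
    "\n"
    "    Reads raw rows and writes a cleaned copy.\n"
    "    "
)

# cleandoc removes the common indentation and the blank first/last lines.
description = inspect.cleandoc(raw_docstring)

print(repr(description))
```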

Bugfixes

  • Fixed a bug where kwargs could not be used to set the context when directly invoking a solid, e.g. my_solid(context=context_obj).
  • Fixed a bug where celery-k8s config did not work in the None case:

    execution:
      celery-k8s:

Experimental

  • Removed the lakehouse library, whose functionality is subsumed by @asset and build_assets_job in Dagster core.

Documentation

  • Removed the trigger_pipeline example, which was not referenced in docs.
  • dagster-mlflow APIs have been added to API docs.
dagster -

Published by jmsanders about 3 years ago

0.12.15

Community Contributions

  • You can now configure credentials for the GCSComputeLogManager using a string or environment variable instead of passing a path to a credentials file. Thanks @silentsokolov!
  • Fixed a bug in the dagster-dbt integration that caused the DBT RPC solids not to retry when they received errors from the server. Thanks @cdchan!
  • Improved helm schema for the QueuedRunCoordinator config. Thanks @cvb!

Bugfixes

  • Fixed a bug where dagster instance migrate would run out of memory when migrating over long run histories.

Experimental

  • Fixed broken links in the Dagit workspace table view for the experimental software-defined assets feature.
dagster - 0.12.14

Published by gibsondan about 3 years ago

Community Contributions

  • Updated click version, thanks @ashwin153!
  • Typo fix, thanks @geoHeil!

Bugfixes

  • Fixed a bug in dagster_aws.s3.sensor.get_s3_keys that would return no keys if an invalid s3 key was provided
  • Fixed a bug with capturing python logs where statements of the form my_log.info("foo %s", "bar") would cause errors in some scenarios.
  • Fixed a bug where the scheduler would sometimes hang during fall Daylight Saving Time transitions when Pendulum 2 was installed.
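
The logging fix above concerns calls that pass format arguments separately from the message. The sketch below uses only the standard logging module to show how such a record carries its template and arguments until a handler formats it. ListHandler is a hypothetical class for illustration and is not how Dagster captures logs.

```python
import logging


class ListHandler(logging.Handler):
    """Collect fully formatted messages in a list (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # record.msg is the template ("foo %s") and record.args holds ("bar",);
        # getMessage() merges the two into the final string.
        self.messages.append(record.getMessage())


my_log = logging.getLogger("my_log")
my_log.setLevel(logging.INFO)
my_log.propagate = False  # keep the example's output away from the root logger

handler = ListHandler()
my_log.addHandler(handler)

my_log.info("foo %s", "bar")
print(handler.messages)  # ['foo bar']
```

Any code that intercepts log records needs to apply the arguments to the template, for example through getMessage(), rather than treating record.msg as the final text.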

Experimental

  • Dagit now uses an asset graph to represent jobs built using build_assets_job. The asset graph shows each node in the job’s graph with metadata about the asset it corresponds to - including asset materializations. It also contains links to upstream jobs that produce assets consumed by the job, as well as downstream jobs that consume assets produced by the job.
  • Fixed a bug in load_assets_from_dbt_project and load_assets_from_dbt_manifest that would cause runs to fail if no runtime_metadata_fn argument were supplied.
  • Fixed a bug that caused @asset not to infer the type of inputs and outputs from type annotations of the decorated function.
  • @asset now accepts a compute_kind argument. You can supply values like “spark”, “pandas”, or “dbt”, and see them represented as a badge on the asset in the Dagit asset graph.
dagster - 0.12.13

Published by prha about 3 years ago

0.12.13

Community Contributions

  • Changed VersionStrategy.get_solid_version and VersionStrategy.get_resource_version to take in a SolidVersionContext and ResourceVersionContext, respectively. This gives VersionStrategy access to the config (in addition to the definition object) when determining the code version for memoization. (Thanks @RBrossard!).

    Note: This is a breaking change for anyone using the experimental VersionStrategy API. Instead of directly being passed solid_def and resource_def, you should access them off of the context object using context.solid_def and context.resource_def respectively.
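
    The shape of the migration can be sketched without Dagster installed. In the example below, the StandIn classes are placeholders for the real definition and context objects, and ConfigAwareVersionStrategy is a hypothetical strategy. The attribute name solid_config is an assumption; these notes confirm only context.solid_def and that the context exposes config.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class StandInSolidDef:
    """Placeholder for a solid definition; only the name is modeled."""

    name: str


@dataclass
class StandInSolidVersionContext:
    """Placeholder for SolidVersionContext (solid_config is an assumed name)."""

    solid_def: StandInSolidDef
    solid_config: Any = None


class ConfigAwareVersionStrategy:
    """Hypothetical strategy written against the context-based signature."""

    def get_solid_version(self, context: StandInSolidVersionContext) -> str:
        # Before this release the method received solid_def directly; now the
        # definition is read off the context, alongside the config.
        payload = json.dumps(
            {"solid": context.solid_def.name, "config": context.solid_config},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


strategy = ConfigAwareVersionStrategy()
solid_def = StandInSolidDef(name="build_table")
version_a = strategy.get_solid_version(
    StandInSolidVersionContext(solid_def, {"limit": 10})
)
version_b = strategy.get_solid_version(
    StandInSolidVersionContext(solid_def, {"limit": 20})
)
print(version_a, version_b)
```

    Because the config feeds into the hash, changing it yields a different code version, which is the kind of behavior the context-based signature makes possible.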

New

  • [dagster-k8s] When launching a pipeline using the K8sRunLauncher or k8s_job_executor, you can now specify a list of volumes to be mounted in the created pod. See the API docs for more information.
  • [dagster-k8s] When specifying a list of environment variables to be included in a pod using custom configuration, you can now specify the full set of parameters allowed by a V1EnvVar in Kubernetes.

Bugfixes

  • Fixed a bug where mapping inputs through nested composite solids incorrectly caused validation errors.
  • Fixed a bug in Dagit, where WebSocket reconnections sometimes led to logs being duplicated on the Run page.
  • Fixed a bug in Dagit, where log views that were scrolled all the way down would not auto-scroll as new logs came in.

Documentation