dagster

An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License

Downloads
12.2M
Stars
11.1K
Committers
367

dagster -

Published by chenbobby almost 4 years ago

0.9.21

Community Contributions

  • Fixed helm chart to only add flower to the K8s ingress when enabled (thanks @PenguinToast!)
  • Updated helm chart to use more lenient timeouts for liveness probes on user code deployments (thanks @PenguinToast!)

Bugfixes

  • [Helm/K8s] Due to Flower being incompatible with Celery 5.0, the Helm chart for Dagster now uses a specific image mher/flower:0.9.5 for the Flower pod.
dagster -

Published by dpeng817 almost 4 years ago

New

  • [Dagit] Show recent runs on individual schedule pages
  • [Dagit] It’s no longer required to run dagster schedule up or press the Reconcile button before turning on a new schedule for the first time
  • [Dagit] Various improvements to the asset view: expanded the Last Materialization Event view, and extended the materializations-over-time view to offer both a list view and a graphical view of materialization data.

Community Contributions

  • Updated many dagster-aws tests to use mocked resources instead of depending on real cloud resources, making it possible to run these tests locally. (thanks @jmsanders!)

Bugfixes

  • Fixed an issue with retries in step launchers
  • [Dagit] Bugfixes and improvements
  • Fixed an issue where dagit sometimes left hanging processes behind after exiting

Experimental

  • [K8s] The dagster daemon is now optionally deployed by the helm chart. This enables run-level queuing with the QueuedRunCoordinator.
dagster -

Published by dpeng817 almost 4 years ago

dagster -

Published by rexledesma almost 4 years ago

dagster -

Published by rexledesma almost 4 years ago

New

  • Improved error handling when the intermediate storage stores and retrieves objects.
  • New URL scheme in Dagit, with repository details included on all paths for pipelines, solids, and schedules
  • Relaxed constraints for the AssetKey constructor, to enable arbitrary strings as part of the key path.
  • When executing a subset of a pipeline, configuration that does not apply to the current subset but would be valid in the original pipeline is now allowed and ignored.
  • GCSComputeLogManager was added, allowing for compute logs to be persisted to Google cloud storage
  • The step-partition matrix in Dagit now auto-reloads runs
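
The subset-execution rule in the list above (config for solids outside the current subset is accepted and ignored, as long as it would be valid for the original pipeline) can be pictured with a small standard-library sketch. This is an illustration, not Dagster's implementation: config_for_subset is a hypothetical helper, and it assumes run config keyed by solid name under a top-level "solids" entry.

```python
def config_for_subset(run_config, pipeline_solids, selected_solids):
    """Drop config for solids outside the subset; reject solids unknown to the pipeline.

    Hypothetical helper for illustration only.
    """
    solids_config = run_config.get("solids", {})

    # Config for a solid that does not exist in the full pipeline is still an error.
    unknown = set(solids_config) - set(pipeline_solids)
    if unknown:
        raise ValueError("Config refers to unknown solids: %s" % sorted(unknown))

    # Config for solids that exist but are not selected is silently ignored.
    pruned = dict(run_config)
    pruned["solids"] = {
        name: cfg for name, cfg in solids_config.items() if name in selected_solids
    }
    return pruned


full_config = {"solids": {"load": {"config": {"path": "in.csv"}}, "report": {"config": {"top": 5}}}}
subset_config = config_for_subset(full_config, ["load", "clean", "report"], ["load", "clean"])
```

Here subset_config keeps only the entry for load; the entry for report is dropped rather than rejected.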

Bugfixes

  • Dagit bugfixes and improvements
  • When specifying a namespace during helm install, the same namespace will now be used by the K8sScheduler or K8sRunLauncher, unless overridden.
  • @pipeline decorated functions with -> None typing no longer cause unexpected problems.
  • Fixed an issue where compute logs might not always be complete on Windows.
dagster -

Published by rexledesma almost 4 years ago

dagster -

Published by rexledesma almost 4 years ago

Breaking Changes

  • CliApiRunLauncher and GrpcRunLauncher have been combined into DefaultRunLauncher.
    If you had one of these run launchers in your dagster.yaml, replace it with DefaultRunLauncher
    or remove the run_launcher: section entirely.

New

  • Added a type loader for typed dictionaries: can now load typed dictionaries from config.

Bugfixes

  • Dagit bugfixes and improvements
    • Added error handling for repository errors on startup and reload
    • Repaired timezone offsets
    • Fixed pipeline explorer state for empty pipelines
    • Fixed Scheduler table
  • User-defined k8s config in the pipeline run tags (with key dagster-k8s/config) will now be
    passed to the k8s jobs when using the dagster-k8s and dagster-celery-k8s run launchers.
    Previously, only user-defined k8s config in the pipeline definition’s tag was passed down.

Experimental

  • Run queuing: the new QueuedRunCoordinator enables limiting the number of concurrent runs.
    The DefaultRunCoordinator launches jobs directly from Dagit, preserving existing behavior.
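
The idea behind run queuing can be shown with a toy model. This is a sketch under stated assumptions, not Dagster's API: ToyQueuedCoordinator and its methods are hypothetical names, and the real QueuedRunCoordinator is configured on the Dagster instance rather than used directly like this.

```python
from collections import deque


class ToyQueuedCoordinator:
    """Hypothetical model: submitted runs wait in a queue and concurrency is capped."""

    def __init__(self, max_concurrent_runs):
        self.max_concurrent_runs = max_concurrent_runs
        self.queued = deque()
        self.in_progress = set()

    def submit_run(self, run_id):
        # Submitting only enqueues the run; nothing is launched here.
        self.queued.append(run_id)

    def launch_queued(self):
        """Launch queued runs in FIFO order while there is spare capacity."""
        launched = []
        while self.queued and len(self.in_progress) < self.max_concurrent_runs:
            run_id = self.queued.popleft()
            self.in_progress.add(run_id)
            launched.append(run_id)
        return launched

    def finish_run(self, run_id):
        # Completing a run frees a slot for the next queued run.
        self.in_progress.discard(run_id)
```

A coordinator that launches directly, as the DefaultRunCoordinator is described as doing, would correspond to skipping the queue and starting each run at submission time.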
dagster -

Published by johannkm almost 4 years ago

New

  • [dagster-dask] Allow connecting to an existing scheduler via its address
  • [dagster-aws] Importing dagster_aws.emr no longer transitively imports dagster_spark
  • [dagster-dbt] CLI solids now emit materializations

Community contributions

  • Docs fix (Thanks @kaplanbora!)

Bug fixes

  • PipelineDefinitions that do not meet the resource requirements of their types will now fail at definition time
  • Dagit bugfixes and improvements
  • Fixed an issue where a run could be left hanging if there was a failure during launch

Deprecated

  • We now warn if you return anything from a function decorated with @pipeline. This return value actually had no impact at all and was ignored, but we are making changes that will use that value in the future. By changing your code to not return anything now you will avoid any breaking changes with zero user-visible impact.
dagster -

Published by johannkm almost 4 years ago

Breaking Changes

  • Removed DagsterKubernetesPodOperator in dagster-airflow.
  • Removed the execute_plan mutation from dagster-graphql.
  • ModeDefinition, PartitionSetDefinition, PresetDefinition, @repository, @pipeline, and ScheduleDefinition names must pass the regular expression r"^[A-Za-z0-9_]+$" and not be python keywords or disallowed names. See DISALLOWED_NAMES in dagster.core.definitions.utils for exhaustive list of illegal names.
  • dagster-slack is now upgraded to use slackclient 2.x - this means that this resource will only support Python 3.6 and above.
  • [K8s] Added a health check to the helm chart for user deployments, which relies on a new dagster api grpc-health-check cli command present in Dagster 0.9.16 and later.
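
The naming rule in the list above can be approximated in a few lines of standard-library Python. This is a minimal sketch, not Dagster's implementation: is_valid_definition_name is a hypothetical helper, and EXAMPLE_DISALLOWED_NAMES is a placeholder standing in for the real DISALLOWED_NAMES list, whose contents are not reproduced here.

```python
import keyword
import re

# Regular expression quoted in the release note.
VALID_NAME_REGEX = re.compile(r"^[A-Za-z0-9_]+$")

# Placeholder only: the real list lives in dagster.core.definitions.utils.
EXAMPLE_DISALLOWED_NAMES = {"example_reserved_name"}


def is_valid_definition_name(name):
    """Return True if `name` passes the three checks described in the note."""
    if VALID_NAME_REGEX.match(name) is None:
        return False  # contains characters outside A-Z, a-z, 0-9, _
    if keyword.iskeyword(name):
        return False  # Python keywords such as "class" or "def"
    return name not in EXAMPLE_DISALLOWED_NAMES
```

For example, is_valid_definition_name("my_pipeline_1") is True, while "my-pipeline" fails the regular expression and "class" is rejected as a Python keyword.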

New

  • Add helm chart configurations to allow users to configure a K8sRunLauncher, in place of the CeleryK8sRunLauncher.
  • “Copy URL” button to preserve filter state on Run page in dagit

Community Contributions

  • Dagster CLI options can now be passed in via environment variables (Thanks @xinbinhuang!)
  • New --limit flag on the dagster run list command (Thanks @haydarai!)

Bugfixes

  • Addressed performance issues loading the /assets table in dagit. Requires a data migration to create a secondary index by running dagster instance reindex.
  • Dagit bugfixes and improvements
dagster - 0.9.15

Published by hellendag about 4 years ago

Breaking Changes

  • CeleryDockerExecutor no longer requires a repo_location_name config field.
  • executeRunInProcess was removed from dagster-graphql

New

  • Dagit: Warn on tab removal in playground
  • Display versions CLI: Added a new CLI that displays version information for a memoized run. Called via dagster pipeline list_versions.
  • CeleryDockerExecutor accepts a network field to configure the network settings for the Docker container it connects to for execution.
  • Dagit will now set a statement timeout on supported instance DBs. Defaults to 5s and can be controlled with the --db-statement-timeout flag

Community Contributions

  • dagster grpc requirements are now more friendly for users (thanks @jmo-qap!)
  • dagster.utils now has is_str (thanks @monicayao!)
  • dagster-pandas can now load dataframes from pickle (thanks @mrdrprofuroboros!)
  • dagster-ge validation solid factory now accepts name (thanks @haydarai!)

Bugfixes

  • Dagit bugfixes and improvements
  • Fixed an issue where dagster could fail to load large pipelines.
  • Fixed a bug where experimental arg warning would be thrown even when not using versioned dagster type loaders.
  • Fixed a bug where CeleryDockerExecutor was failing to execute pipelines unless they used a legacy workspace config.
  • Fixed a bug where pipeline runs using IntMetadataEntryData could not be visualized in dagit.

Experimental

  • Improve the output structure of dagster-dbt solids.
  • Version-based memoization over outputs stored in the intermediate store now works

Documentation

  • Fix a code snippet rendering issue in Overview: Assets & Materializations
  • Fixed all python code snippets alignment across docs examples
dagster -

Published by alangenfeld about 4 years ago

0.9.14

New

  • Steps downstream of a failed step no longer report skip events and instead simply do not execute.
  • dagit-debug can load multiple debug files.
  • dagit now has a Debug Console Logging feature flag accessible at /flags.
  • Telemetry metrics are now taken when scheduled jobs are executed.
  • With memoized reexecution, we now only copy outputs that the current plan won't generate
  • Document titles throughout dagit

Community Contributions

  • [dagster-ge] solid factory can now handle arbitrary types (thanks @sd2k!)
  • [dagster-dask] utility options are now available in loader/materializer for Dask DataFrame (thanks @kinghuang!)

Bugfixes

  • Fixed an issue where run termination would sometimes be ignored or leave the execution process hanging
  • [dagster-k8s] fixed issue that would cause timeouts on clusters with many jobs
  • Fixed an issue where reconstructable was unusable in an interactive environment, even when the pipeline is defined in a different module.
  • Bugfixes and UX improvements in dagit

Experimental

  • AssetMaterializations now have an optional “partition” attribute
dagster - 0.9.12

Published by gibsondan about 4 years ago

Breaking Changes

  • Dagster now warns when a solid, pipeline, or other definition is created with an invalid name (for example, a Python keyword). This warning will become an error in the 0.9.13 release.

Community Contributions

  • Added an int type to EventMetadataEntry (Thanks @ChocoletMousse!)
  • Added a build_composite_solid_definition method to Lakehouse (Thanks @sd2k!)
  • Improved broken link detection in Dagster docs (Thanks @keyz!)

New

  • Improvements to log filtering on Run view in Dagit
  • Improvements to instance level scheduler page
  • Emit engine events when pipeline termination is initiated
  • Published the Lakehouse module to PyPI

Bugfixes

  • Syntax errors in user code now display the file and line number with the error in Dagit.
  • Dask executor no longer fails when using intermediate_storage
  • Fixed an issue using build_reconstructable_pipeline
  • In the Celery K8s executor, we now mark the step as failed when the step job fails
  • Changed DagsterInvalidAssetKey error so that it no longer fails upon being thrown.

Documentation

  • Added API docs for dagster-dbt experimental library.
  • Fixed some cosmetic issues with docs.dagster.io.
  • Added code snippets from Solids examples to test path, and fixed some inconsistencies regarding parameter ordering.
  • Changed to using markers instead of exact line numbers to mark out code snippets
dagster -

Published by rexledesma about 4 years ago

Breaking Changes

  • [dagster-dask] Removed the compute option from Dask DataFrame materialization configs for all output types. Setting this option to False (default True) would result in a future that is never computed, leading to missing materializations

Community Contributions

New

  • Console log messages are now streamlined to live on a single line per message
  • Added better messaging around $DAGSTER_HOME if it is not set or is improperly set up when starting up a Dagster instance
  • Tools for exporting a file for debugging a run have been added:
    • dagster debug export - a new CLI entry added for exporting a run by id to a file
    • dagit-debug - a new CLI added for loading dagit with a run to debug
    • dagit now has a button to download the debug file for a run via the action menu on the runs page
  • The dagster api grpc command now defaults to the current working directory if none is specified
  • Added retries to dagster-postgres connections
  • Fixed faulty warning message when invoking the same solid multiple times in the same context
  • Added ability to specify custom liveness probe for celery workers in kubernetes deployment

Bugfixes

  • Fixed a bug where Dagster types like List/Set/Tuple/Dict/Optional were not displaying properly on dagit logs
  • Fixed endless spinners on dagit --empty-workspace
  • Fixed incorrect snapshot banner on pipeline view
  • Fixed visual overlapping of overflowing dagit logs
  • Fixed a bug where hanging runs when executing against a gRPC server could cause the Runs page to be unable to load
  • Fixed a bug in celery integration where celery tasks could return None when an iterable is expected, causing errors in the celery execution loop.

Experimental

  • [lakehouse] Each time a Lakehouse solid updates an asset, it automatically generates an AssetMaterialization event
  • [lakehouse] Lakehouse computed_assets now accept a version argument that describes the version of the computation
  • Setting the “dagster/is_memoized_run” tag to true will cause the run to skip any steps whose versions match the versions of outputs produced in prior runs.
  • [dagster-dbt] Solids for running dbt CLI commands
  • Added extensive documentation to illuminate how versions are computed
  • Added versions for step inputs from config, default values, and from other step outputs
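
The memoization behaviour described in the list above (skip a step when its version matches the version of an output produced in a prior run) can be sketched with the standard library. Everything here is an assumption made for illustration: step_version and plan_memoized_run are hypothetical names, and the hashing scheme is not the one Dagster uses.

```python
import hashlib


def step_version(code_version, input_versions):
    """Combine a step's code version with the versions of its inputs."""
    payload = "\0".join([code_version] + sorted(input_versions))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def plan_memoized_run(steps, stored_versions):
    """Split steps into those to run and those to skip.

    `steps` maps step name -> (code_version, [upstream step names]) and must be
    listed in topological order. `stored_versions` maps step name -> version of
    the output kept from prior runs.
    """
    versions, to_run, skipped = {}, [], []
    for name, (code_version, upstream) in steps.items():
        versions[name] = step_version(code_version, [versions[u] for u in upstream])
        if stored_versions.get(name) == versions[name]:
            skipped.append(name)  # a matching output already exists
        else:
            to_run.append(name)
    return to_run, skipped, versions


steps = {"load": ("v1", []), "clean": ("v1", ["load"])}
first_run, first_skipped, stored = plan_memoized_run(steps, {})
second_run, second_skipped, _ = plan_memoized_run(steps, stored)
```

On the first call nothing is stored, so both steps run; on the second call both versions match and both steps are skipped. Changing the code version of load would also change the version of clean, because step versions depend on the versions of upstream outputs.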
dagster - 0.9.9

Published by chenbobby about 4 years ago

New

  • [Databricks] solids created with create_databricks_job_solid now log a URL for accessing the job in the Databricks UI.
  • The pipeline execute command now defaults to using your current directory if you don’t specify a working directory.

Bugfixes

  • [Celery-K8s] Surface errors to Dagit that previously were not caught in the Celery workers.
  • Fix issues with calling add_run_tags on tags that already exist.
  • Added an “Unknown” step state to Dagit’s pipeline run logs view for when a pipeline has completed but a step has not emitted a completion event

Experimental

  • Version tags for resources and external inputs.

Documentation

  • Fix rendering of example solid config in “Basics of Solids” tutorial.
dagster - 0.9.8

Published by yuhan about 4 years ago

New

  • Support for the Dagster step selection DSL: reexecute_pipeline now takes step_selection, which accepts queries like *solid_a.compute++ (i.e., solid_a.compute, all of its ancestors, its immediate descendants, and their immediate descendants). steps_to_execute is deprecated and will be removed in 0.10.0.
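
How a query such as *solid_a.compute++ expands can be sketched over a small dependency graph. This is an illustrative resolver, not Dagster's parser: select_steps is a hypothetical function, the graph is invented, and the sketch assumes a leading or trailing "*" means unlimited depth while each "+" adds one level.

```python
def _walk(start, edges, depth):
    """Collect nodes reachable from `start`; depth=None means no limit."""
    seen, frontier, level = set(), {start}, 0
    while frontier and (depth is None or level < depth):
        frontier = {n for node in frontier for n in edges[node]} - seen
        seen |= frontier
        level += 1
    return seen


def select_steps(query, parents):
    """Resolve `query` against a DAG given as {step: [parent steps]}.

    Every step must appear as a key in `parents`, even if it has no parents.
    """
    # Invert the parent map to get each step's children.
    children = {step: [] for step in parents}
    for step, ups in parents.items():
        for up in ups:
            children[up].append(step)

    step = query.strip("*+")
    start = query.index(step)
    prefix, suffix = query[:start], query[start + len(step):]

    def depth(marker):
        return None if "*" in marker else len(marker)

    ancestors = _walk(step, parents, depth(prefix))
    descendants = _walk(step, children, depth(suffix))
    return {step} | ancestors | descendants


# Invented linear graph: x -> load -> solid_a -> b -> c -> d
parents = {
    "x.compute": [],
    "load.compute": ["x.compute"],
    "solid_a.compute": ["load.compute"],
    "b.compute": ["solid_a.compute"],
    "c.compute": ["b.compute"],
    "d.compute": ["c.compute"],
}
selected = select_steps("*solid_a.compute++", parents)
```

With this graph, selected contains every step except d.compute, which sits three levels below solid_a.compute and so falls outside the two "+" levels.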

Community contributions

  • [dagster-databricks] Improved setup of Databricks environment (Thanks @sd2k!)
  • Enabled frozenlist pickling (Thanks @kinghuang!)

Bugfixes

  • Fixed a bug where pipeline-level hooks were not correctly applied on a pipeline subset.
  • Improved error messages when execute command can't load a code pointer.
  • Fixed a bug that prevented serializing Spark intermediates with configured intermediate storages.

Dagit

  • Enabled subset reexecution via Dagit when part of the pipeline is still running.
  • Made Schedules clickable and link to View All page in the schedule section.
  • Various Dagit UI improvements.

Experimental

  • [lakehouse] Added CLI command for building and executing a pipeline that updates a given set of assets: house update --module package.module --assets my_asset*

Documentation

  • Fixes and improvements.
dagster - 0.9.7

Published by catherinewu about 4 years ago

Bugfixes

  • Fixed an issue in the dagstermill library that caused solid config fetch to be non-deterministic.
  • Fixed an issue in the K8sScheduler where multiple pipeline runs were kicked off for each scheduled
    execution.
dagster - 0.9.6

Published by catherinewu about 4 years ago

New

  • Added ADLS2 storage plugin for Spark DataFrame (Thanks @sd2k!)
  • Added feature in the Dagit Playground to automatically remove extra configuration that does not conform to a pipeline’s config schema.
  • [Dagster-Celery/Celery-K8s/Celery-Docker] Added Celery worker names and pods to the logs for each step execution

Community contributions

  • Re-enabled dagster-azure integration tests in dagster-databricks tests (Thanks @sd2k!)
  • Moved dict_without_keys from dagster-pandas into dagster.utils (Thanks @DavidKatz-il!)
  • Moved Dask DataFrame read/to options under read/to keys (Thanks @kinghuang!)

Bugfixes

  • Fixed helper for importing data from GCS paths into Bigquery (Thanks @grabangomb!)
  • Postgres event storage now waits to open a thread to watch runs until it is needed

Experimental

  • Added version computation function for DagsterTypeLoader. (Actual versioning will be supported in 0.10.0)
  • Added version attribute to solid and SolidDefinition. (Actual versioning will be supported in 0.10.0)
dagster -

Published by prha about 4 years ago

New

  • UI improvements to the backfill partition selector
  • Enabled sorting of steps by failure in the partition run matrix in Dagit

Bugfixes

  • [dagstermill] fixes an issue with output notebooks and s3 storage
  • [dagster_celery] bug fixed in pythonpath calculation (thanks @enima2648!)
  • [dagster_pandas] marked create_structured_dataframe_type and ConstraintWithMetadata as experimental APIs
  • [dagster_k8s] reduced default job backoff limit to 0

Docs

  • Various docs site improvements
dagster - 0.9.4

Published by helloworld about 4 years ago

Breaking Changes

  • When using the configured API on a solid or composite solid, a new solid name must be provided.
  • The image used by the K8sScheduler to launch scheduled executions is now specified under the “scheduler” section of the Helm chart (previously under “pipeline_run” section).

New

  • Added an experimental mode that speeds up interactions in dagit by launching a gRPC server on startup for each repository location in your workspace. To enable it, add the following to your dagster.yaml:
      opt_in:
        local_servers: true
  • Intermediate Storage and System Storage now default to the first provided storage definition when no configuration is provided. Previously, it would be necessary to provide a run config for storage whenever providing custom storage definitions, even if that storage required no run configuration. Now, if the first provided storage definition requires no run configuration, the system will default to using it.
  • Added a timezone picker to Dagit, and made all timestamps timezone-aware
  • Added solid_config to the hook context, which provides access to the config schema variable of the corresponding solid.
  • Hooks can be directly set on PipelineDefinition or @pipeline, e.g. @pipeline(hook_defs={hook_a}). It will apply the hooks on every single solid instance within the pipeline.
  • Added Partitions tab for partitioned pipelines, with new backfill selector.
dagster -

Published by dpeng817 about 4 years ago

New

  • Added step-level run history for partitioned schedules on the schedule view
  • Added great_expectations integration, through the dagster_ge library. Example usage is under a new example, called ge_example, and documentation for the library can be found under the libraries section of apidocs.
  • PythonObjectDagsterType can now take a tuple of types as well as a single type, more closely mirroring isinstance and allowing Union types to be represented in Dagster.
  • The configured API can now be used on all definition types (including CompositeDefinition). Example usage has been updated in the configuration documentation (https://docs.dagster.io/overview/configuration).
  • Descriptions for solid inputs and outputs will now be inferred from doc blocks if available (thanks @AndersonReyes (https://github.com/dagster-io/dagster/commits?author=AndersonReyes)!)
  • Various documentation improvements (thanks @jeriscc (https://github.com/dagster-io/dagster/commits?author=jeriscc)!)
  • Load inputs from pyspark dataframes (thanks @davidkatz-il (https://github.com/dagster-io/dagster/commits?author=davidkatz-il)!)
  • Updated Helm chart to include auto-generated user code configmap in user code deployment by default
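
The tuple form of PythonObjectDagsterType noted in the list above mirrors how isinstance treats a tuple of types. The snippet below shows only that underlying Python behaviour; make_type_check is a hypothetical helper and does not use or reproduce the Dagster class.

```python
def make_type_check(python_type):
    """Build a check from a single type or a tuple of types, as isinstance accepts."""

    def type_check(value):
        # With a tuple, isinstance returns True if the value matches any member.
        return isinstance(value, python_type)

    return type_check


is_int = make_type_check(int)
is_number = make_type_check((int, float))  # behaves like a Union of int and float
```

Here is_number accepts both 1 and 2.5 but rejects "3", while is_int rejects 2.5.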

Bugfixes

  • Databricks now checks intermediate storage instead of system storage
  • Fixes a bug where applying hooks on a pipeline with composite solids would flatten the top-level solids. Now applying hooks on pipelines or composite solids means attaching hooks to every single solid instance within the pipeline or the composite solid.
  • Fixes the GraphQL playground hosted by dagit
  • Fixes a bug where K8s CronJobs were stopped unnecessarily during schedule reconciliation
  • Removed deprecated env param from CLI
  • Renamed --host CLI param to --grpc_host to avoid conflict with dagit --host param

Experimental

  • New dagster-k8s/config tag that lets users pass in custom configuration to the Kubernetes Job, Job metadata, JobSpec, PodSpec, and PodTemplateSpec metadata.
    • This allows users to specify settings like eviction policy annotations and node affinities.
    • Example:
          from dagster import solid

          # Kubernetes overrides are passed through the dagster-k8s/config tag.
          @solid(
            tags={
              'dagster-k8s/config': {
                # Resource requests and limits for the job's container.
                'container_config': {
                  'resources': {
                    'requests': { 'cpu': '250m', 'memory': '64Mi' },
                    'limits': { 'cpu': '500m', 'memory': '2560Mi' },
                  }
                },
                # Annotations applied to the pod template metadata.
                'pod_template_spec_metadata': {
                  'annotations': { "cluster-autoscaler.kubernetes.io/safe-to-evict": "true"}
                },
                # Node affinity set on the pod spec.
                'pod_spec_config': {
                  'affinity': {
                    'nodeAffinity': {
                      'requiredDuringSchedulingIgnoredDuringExecution': {
                        'nodeSelectorTerms': [{
                          'matchExpressions': [{
                            'key': 'beta.kubernetes.io/os', 'operator': 'In', 'values': ['windows', 'linux'],
                          }]
                        }]
                      }
                    }
                  }
                },
              },
            },
          )
          def my_solid(context):
            context.log.info('running')