dagster

An orchestration platform for the development, production, and observation of data assets.

APACHE-2.0 License

dagster - 0.9.2

Published by ajnadel about 4 years ago

New

  • Added ResourceDefinition.mock_resource helper for magic mocking resources (see the sketch after this list). Example usage can be found here
  • Removed the row_count metadata entry from the Dask DataFrame type check (thanks @kinghuang!)
  • Added orient to the config options when materializing a Dask DataFrame to json (thanks @kinghuang!)
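
A minimal sketch of using the new helper in a test, assuming the standard ModeDefinition resource wiring; the solid and resource key names here are hypothetical:

    from dagster import ModeDefinition, ResourceDefinition, execute_pipeline, pipeline, solid

    @solid(required_resource_keys={"warehouse"})
    def persist_row(context):
        # The mock resource is a MagicMock, so any method call succeeds.
        context.resources.warehouse.insert_row("some_value")

    @pipeline(
        mode_defs=[ModeDefinition(resource_defs={"warehouse": ResourceDefinition.mock_resource()})]
    )
    def warehouse_pipeline():
        persist_row()

    execute_pipeline(warehouse_pipeline)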

Bugfixes

  • Fixed a bug where applying configured to a solid definition would overwrite inputs from run config.
  • Fixed a bug where pipeline tags would not apply to solid subsets.
  • Improved error messages for repository-loading errors in CLI commands.
  • Fixed a bug where pipeline execution error messages were not being surfaced in Dagit.
dagster - 0.9.1

Published by johannkm about 4 years ago

Bugfixes

  • Fixes an issue in the dagster-celery-k8s executor when executing solid subsets
dagster - 0.9.0

Published by helloworld about 4 years ago

Breaking Changes

  • The dagit key is no longer part of the instance configuration schema and must be removed from dagster.yaml files before they can be used.
  • -d can no longer be used as a command-line argument to specify a mode. Use --mode instead.
  • Use --preset instead of --preset-name to specify a preset to the pipeline launch command.
  • We have removed the config argument to the ConfigMapping, @composite_solid, @solid, SolidDefinition, @executor, ExecutorDefinition, @logger, LoggerDefinition, @resource, and ResourceDefinition APIs, which we deprecated in 0.8.0. Use config_schema instead.

New

  • Python 3.8 is now fully supported.
  • -d or --working-directory can be used to specify a working directory in any command that
    takes a -f or --python-file argument.
  • Removed the deprecation of create_dagster_pandas_dataframe_type. This is the currently
    supported API for custom pandas data frame type creation.
  • Removed gevent dependency from dagster
  • New configured API for predefining configuration for various definitions: https://docs.dagster.io/overview/configuration/#configured
  • Added hooks to enable success and failure handling policies on pipelines. This enables users to set up policies on all solids within a pipeline or on a per-solid basis (a sketch follows this list). Example usage can be found here
  • New instance level view of Scheduler and running schedules
  • dagster-graphql is now only required in dagit images.
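
A minimal sketch of a failure-handling policy, assuming the @failure_hook decorator introduced with this feature; the resource and solid names are hypothetical:

    from dagster import ModeDefinition, failure_hook, pipeline, resource, solid

    @resource
    def console_alerts(init_context):
        return lambda msg: print(msg)

    @failure_hook(required_resource_keys={"alerts"})
    def notify_on_failure(context):
        context.resources.alerts("Solid {} failed".format(context.solid.name))

    @solid
    def flaky_solid(context):
        raise Exception("boom")

    @notify_on_failure  # applies the policy to every solid in the pipeline
    @pipeline(mode_defs=[ModeDefinition(resource_defs={"alerts": console_alerts})])
    def my_etl_pipeline():
        flaky_solid()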
dagster - 0.8.9

Published by helloworld about 4 years ago

New

  • CeleryK8sRunLauncher supports termination of pipeline runs. This can be accessed via the
    “Terminate” button in Dagit’s Pipeline Run view or via “Cancel” in Dagit’s All Runs page. This
    will terminate the run master K8s Job along with all running step job K8s Jobs; steps that are
    still in the Celery queue will not create K8s Jobs. The pipeline and all impacted steps will
    be marked as failed. We recommend implementing resources as context managers, since their
    finally blocks will be executed upon termination (see the sketch after this list).
  • K8sRunLauncher supports termination of pipeline runs.
  • AssetMaterialization events display the asset key in the Runs view.
  • Added a new "Actions" button in Dagit to allow to cancel or delete mulitple runs.

Bugfixes

  • Fixed an issue where DagsterInstance was leaving database connections open due to not being
    garbage collected.
  • Fixed an issue with fan-in inputs skipping when upstream solids have skipped.
  • Fixed an issue with getting results from composites with skippable outputs in python API.
  • Fixed an issue where using Enum in resource config schemas resulted in an error.
dagster - 0.8.10

Published by helloworld about 4 years ago

New

  • Added new GCS and Azure file manager resources
  • AssetMaterializations can now have type information attached as metadata. See the materializations tutorial for more
  • Added verification for resource arguments (previously only validated at runtime)

Bugfixes

  • Fixed bug with order-dependent python module resolution seen with some packages (e.g. numpy)
  • Fixed bug where Airflow's context['ts'] was not passed properly
  • Fixed a bug in celery-k8s when using task_acks_late: true that resulted in a 409 Conflict error from Kubernetes. The creation of a Kubernetes Job will now be aborted if another Job with the same name exists
  • Fixed a bug with composite solid output results when solids are skipped
  • Hide the re-execution button in Dagit when the pipeline is not re-executable in the currently loaded repository

Docs

  • Fixed code example in the advanced scheduling doc (Thanks @wingyplus!)
  • Various other improvements
dagster -

Published by gibsondan over 4 years ago

New

  • The new configured API makes it easy to create configured versions of resources (see the sketch after this list).
  • Deprecated the Materialization event type in favor of the new AssetMaterialization event type,
    which requires the asset_key parameter. Solids yielding Materialization events will continue
    to work as before, though the Materialization event will be removed in a future release.
  • We have added an intermediate_store_defs argument to ModeDefinition, which will eventually
    replace system storage. You can only use one or the other for now. We will eventually deprecate
    system storage entirely, but continued usage for the time being is fine.
  • The help panel in the dagit config editor can now be resized and toggled open or closed, to
    enable easier editing on smaller screens.
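
A minimal sketch using the functional configured(...)({...}) form on a hypothetical resource:

    from dagster import configured, resource

    @resource(config_schema={"region": str, "use_unsigned_session": bool})
    def s3_session(init_context):
        # Stand-in for real client construction.
        return dict(init_context.resource_config)

    # Pre-fill all of the config so pipelines using this version need none.
    east_unsigned_session = configured(s3_session)(
        {"region": "us-east-1", "use_unsigned_session": False}
    )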

Bugfixes

  • Opening new Dagit browser windows maintains your current repository selection. #2722
  • Pipelines with the same name in different repositories no longer incorrectly share playground state. #2720
  • Setting default_value config on a field now works as expected. #2725
  • Fixed rendering bug in the dagit run reviewer where yet-to-be executed execution steps were
    rendered on left-hand side instead of the right.
dagster -

Published by alangenfeld over 4 years ago

Breaking Changes

  • Loading python modules reliant on the working directory being on the PYTHONPATH is no longer
    supported. The dagster and dagit CLI commands no longer add the working directory to the
    PYTHONPATH when resolving modules, which may break some imports. Explicitly installed python
    packages can be specified in workspaces using the python_package workspace yaml config option.
    The python_module config option is deprecated and will be removed in a future release.

New

  • Dagit can be hosted on a sub-path by passing --path-prefix to the dagit CLI. #2073
  • The date_partition_range util function now accepts an optional inclusive boolean argument. By default, the function does not include the partition whose date range ends after the current time. If inclusive=True, the list of partitions returned will include that extra partition (see the sketch after this list).
  • MultiDependency or fan-in inputs will now only cause the solid step to skip if all of the
    fanned-in upstream outputs were skipped
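
A minimal sketch, assuming date_partition_range is importable from dagster.utils.partitions and returns a function that produces the partition list:

    from datetime import datetime

    from dagster.utils.partitions import date_partition_range

    # Daily partitions from Jan 1, 2020; inclusive=True also returns the
    # partition whose date range has not yet fully elapsed.
    partition_fn = date_partition_range(start=datetime(2020, 1, 1), inclusive=True)
    partitions = partition_fn()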

Bugfixes

  • Fixed accidental breaking change with input_hydration_config arguments
  • Fixed an issue with yaml merging (thanks @shasha79!)
  • Invoking alias on a solid output will produce a useful error message (thanks @iKintosh!)
  • Restored missing run pagination controls
  • Fixed error resolving partition-based schedules created via dagster schedule decorators (e.g. daily_schedule) for certain workspace.yaml formats
dagster -

Published by johannkm over 4 years ago

Breaking Changes

  • The dagster-celery module has been broken apart to manage dependencies more coherently. There are now three modules: dagster-celery, dagster-celery-k8s, and dagster-celery-docker.
  • Related to the above, the dagster-celery worker start command now takes a required -A parameter which must point to the app.py file within the appropriate module. E.g. if you are using the celery_k8s_job_executor then you must use the -A dagster_celery_k8s.app option when using the celery or dagster-celery CLI tools. Similarly for the celery_docker_executor: -A dagster_celery_docker.app must be used.
  • Renamed the input_hydration_config and output_materialization_config decorators to dagster_type_loader and dagster_type_materializer respectively. Renamed DagsterType's input_hydration_config and output_materialization_config arguments to loader and materializer respectively.

New

  • New pipeline scoped runs tab in Dagit

  • Add the following Dask Job Queue clusters: moab, sge, lsf, slurm, oar (thanks @DavidKatz-il!)

  • K8s resource-requirements for run coordinator pods can be specified using the dagster-k8s/resource_requirements tag on pipeline definitions:

    @pipeline(
        tags={
            'dagster-k8s/resource_requirements': {
                'requests': {'cpu': '250m', 'memory': '64Mi'},
                'limits': {'cpu': '500m', 'memory': '2560Mi'},
            }
        },
    )
    def foo_bar_pipeline():
        # solid invocations elided in the original example
        ...

  • Added better error messaging in dagit for partition set and schedule configuration errors

  • An initial version of the CeleryDockerExecutor was added (thanks @mrdrprofuroboros!). The celery workers will launch tasks in docker containers.

  • Experimental: Great Expectations integration is currently under development in the new library dagster-ge. Example usage can be found here

dagster - 0.8.5

Published by helloworld over 4 years ago

Breaking Changes

  • Python 3.5 is no longer under test.
  • Engine and ExecutorConfig have been deleted in favor of Executor. Instead of the @executor decorator decorating a function that returns an ExecutorConfig it should now decorate a function that returns an Executor.

New

  • The python built-in dict can be used as an alias for Permissive() within a config schema declaration (see the sketch after this list).
  • Use StringSource in the S3ComputeLogManager configuration schema to support using environment variables in the configuration (Thanks @mrdrprofuroboros!)
  • Improve Backfill CLI help text
  • Add options to spark_df_output_schema (Thanks @DavidKatz-il!)
  • Helm: Added support for overriding the PostgreSQL image/version used in the init container checks.
  • Update celery k8s helm chart to include liveness checks for celery workers and flower
  • Support step level retries to celery k8s executor
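
A minimal sketch of the dict alias; the solid and its config key are hypothetical:

    from dagster import solid

    # "options" accepts any dict of values, exactly as if it were Permissive().
    @solid(config_schema={"options": dict})
    def flexible_solid(context):
        context.log.info(str(context.solid_config["options"]))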

Bugfixes

  • Improve error message shown when a RepositoryDefinition returns objects that are not one of the allowed definition types (Thanks @sd2k!)
  • Show error message when $DAGSTER_HOME environment variable is not an absolute path (Thanks @AndersonReyes!)
  • Update default value for staging_prefix in the DatabricksPySparkStepLauncher configuration to be an absolute path (Thanks @sd2k!)
  • Improve error message shown when Databricks logs can't be retrieved (Thanks @sd2k!)
  • Fix errors in documentation fo input_hydration_config (Thanks @joeyfreund!)
dagster - 0.8.4

Published by ajnadel over 4 years ago

Bugfix

  • Reverted a change in 0.8.3 that caused errors during run launch in certain circumstances
  • Updated partition graphs on the schedule page to select the most recent run
  • Forced reload of partitions for partition sets to avoid serving stale data

New

  • Added reload button to dagit to reload current repository
  • Added option to wipe a single asset key by using dagster asset wipe <asset_key>
  • Simplified schedule page, removing ticks table, adding tags for last tick attempt
  • Better debugging tools for launch errors
dagster - 0.8.3

Published by natekupp over 4 years ago

Breaking Changes

  • Previously, the gcs_resource returned a GCSResource wrapper which had a single client property that returned a google.cloud.storage.client.Client. Now, the gcs_resource returns the client directly.

    To update solids that use the gcs_resource, change:

    context.resources.gcs.client
    

    To:

    context.resources.gcs
    

New

  • Introduced a new Python API reexecute_pipeline to reexecute an existing pipeline run.
  • Performance improvements in Pipeline Overview and other pages.
  • Long metadata entries in the asset details view are now scrollable.
  • Added a project field to the gcs_resource in dagster_gcp.
  • Added new CLI command dagster asset wipe to remove all existing asset keys.

Bugfix

  • Several Dagit bugfixes and performance improvements
  • Fixes pipeline execution issue with custom run launchers that call executeRunInProcess.
  • Updates dagster schedule up output to be repository location scoped
dagster - 0.8.2

Published by helloworld over 4 years ago

Bugfix

  • Fixes issues with dagster instance migrate.
  • Fixes bug in launch_scheduled_execution that would mask configuration errors.
  • Fixes bug in dagit where schedule related errors were not shown.
  • Fixes JSON-serialization error in dagster-k8s when specifying per-step resources.

New

  • Makes label an optional parameter for materializations with asset_key specified.
  • Changes Assets page to have a typeahead selector and hierarchical views based on asset_key path.
  • dagster-ssh
    • adds SFTP get and put functions to SSHResource, replacing sftp_solid.

Docs

  • Various docs corrections
dagster - 0.8.1

Published by helloworld over 4 years ago

Bugfix

  • Fixed a file descriptor leak that caused OSError: [Errno 24] Too many open files when enough
    temporary files were created.
  • Fixed an issue where an empty config in the Playground would unexpectedly be marked as invalid
    YAML.
  • Removed "config" deprecation warnings for dask and celery executors.

New

  • Improved performance of the Assets page.
dagster - 0.8.0 "In The Zone"

Published by mgasner over 4 years ago

Major Changes

Please see the 080_MIGRATION.md migration guide for details on updating existing code to be
compatible with 0.8.0.

  • Workspace, host and user process separation, and repository definition: Dagit and other tools no
    longer load a single repository containing user definitions such as pipelines into the same
    process as the framework code. Instead, they load a "workspace" that can contain multiple
    repositories sourced from a variety of different external locations (e.g., Python modules and
    Python virtualenvs, with containers and source control repositories soon to come).

    The repositories in a workspace are loaded into their own "user" processes distinct from the
    "host" framework process. Dagit and other tools now communicate with user code over an IPC
    mechanism. This architectural change has a couple of advantages:

    • Dagit no longer needs to be restarted when there is an update to user code.
    • Users can use repositories to organize their pipelines, but still work on all of their
      repositories using a single running Dagit.
    • The Dagit process can now run in a separate Python environment from user code so pipeline
      dependencies do not need to be installed into the Dagit environment.
    • Each repository can be sourced from a separate Python virtualenv, so teams can manage their
      dependencies (or even their own Python versions) separately.

    We have introduced a new file format, workspace.yaml, in order to support this new architecture.
    The workspace yaml encodes what repositories to load and their location, and supersedes the
    repository.yaml file and associated machinery.

    As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have
    written scripts or tests in which a pipeline is defined and then passed across a process boundary
    (e.g., using the multiprocess_executor or dagstermill), you may now need to wrap the pipeline
    in the reconstructable utility function for it to be reconstructed across the process boundary
    (see the sketch at the end of this item).

    In addition, rather than instantiate the RepositoryDefinition class directly, users should now
    prefer the @repository decorator. As part of this change, the @scheduler and
    @repository_partitions decorators have been removed, and their functionality subsumed under
    @repository.
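
    A minimal sketch of wrapping a pipeline in reconstructable for the multiprocess executor, assuming
    filesystem storage is enabled in the run config:

      from dagster import DagsterInstance, execute_pipeline, pipeline, reconstructable, solid

      @solid
      def do_work(context):
          context.log.info("working")

      @pipeline
      def parallel_pipeline():
          do_work()

      # reconstructable lets child processes re-import the pipeline by path.
      execute_pipeline(
          reconstructable(parallel_pipeline),
          run_config={"execution": {"multiprocess": {}}, "storage": {"filesystem": {}}},
          instance=DagsterInstance.get(),
      )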

  • Dagit organization: The Dagit interface has changed substantially and is now oriented around
    pipelines. Within the context of each pipeline in an environment, the previous "Pipelines" and
    "Solids" tabs have been collapsed into the "Definition" tab; a new "Overview" tab provides
    summary information about the pipeline, its schedules, its assets, and recent runs; the previous
    "Playground" tab has been moved within the context of an individual pipeline. Related runs (e.g.,
    runs created by re-executing subsets of previous runs) are now grouped together in the Playground
    for easy reference. Dagit also now includes more advanced support for display of scheduled runs
    that may not have executed ("schedule ticks"), as well as longitudinal views over scheduled runs,
    and asset-oriented views of historical pipeline runs.

  • Assets: Assets are named materializations that can be generated by your pipeline solids, which
    support specialized views in Dagit. For example, if we represent a database table with an asset
    key, we can now index all of the pipelines and pipeline runs that materialize that table, and
    view them in a single place. To use the asset system, you must enable an asset-aware storage such
    as Postgres. A sketch of yielding an asset-keyed materialization follows.
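
    A minimal sketch, assuming the asset_key parameter on Materialization accepts a dotted string;
    the table name is hypothetical:

      from dagster import Materialization, Output, solid

      @solid
      def update_users_table(context):
          # ... write to the warehouse here ...
          yield Materialization(
              label="users_table",
              asset_key="warehouse.users",  # becomes the path ["warehouse", "users"]
              description="Refreshed the users table",
          )
          yield Output(None)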

  • Run launchers: The distinction between "starting" and "launching" a run has been effaced. All
    pipeline runs instigated through Dagit now make use of the RunLauncher configured on the
    Dagster instance, if one is configured. Additionally, run launchers can now support termination of
    previously launched runs. If you have written your own run launcher, you may want to update it to
    support termination. Note also that as of 0.7.9, the semantics of RunLauncher.launch_run have
    changed; this method now takes the run_id of an existing run and should no longer attempt to
    create the run in the instance.

  • Flexible re-execution: Pipeline re-execution from Dagit is now fully flexible. You may
    re-execute arbitrary subsets of a pipeline's execution steps, and the re-execution now appears
    in the interface as a child run of the original execution.

  • Support for historical runs: Snapshots of pipelines and other Dagster objects are now persisted
    along with pipeline runs, so that historical runs can be loaded for review with the correct
    execution plans even when pipeline code has changed. This prepares the system to be able to diff
    pipeline runs and other objects against each other.

  • Step launchers and expanded support for PySpark on EMR and Databricks: We've introduced a new
    StepLauncher abstraction that uses the resource system to allow individual execution steps to
    be run in separate processes (and thus on separate execution substrates). This has made extensive
    improvements to our PySpark support possible, including the option to execute individual PySpark
    steps on EMR using the EmrPySparkStepLauncher and on Databricks using the
    DatabricksPySparkStepLauncher. The emr_pyspark example demonstrates how to use a step launcher.

  • Clearer names: What was previously known as the environment dictionary is now called the
    run_config, and the previous environment_dict argument to APIs such as execute_pipeline is
    now deprecated. We renamed this argument to focus attention on the configuration of the run
    being launched or executed, rather than on an ambiguous "environment". We've also renamed the
    config argument to all use definitions to be config_schema, which should reduce ambiguity
    between the configuration schema and the value being passed in some particular case. We've also
    consolidated and improved documentation of the valid types for a config schema. A before-and-after
    sketch follows.
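
    A minimal before-and-after sketch of the renames; the solid and its config field are hypothetical:

      from dagster import execute_pipeline, pipeline, solid

      # Before 0.8.0 (deprecated): @solid(config=...) and environment_dict=...
      @solid(config_schema={"greeting": str})
      def say_hello(context):
          context.log.info(context.solid_config["greeting"])

      @pipeline
      def hello_pipeline():
          say_hello()

      execute_pipeline(
          hello_pipeline,
          run_config={"solids": {"say_hello": {"config": {"greeting": "hi"}}}},
      )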

  • Lakehouse: We're pleased to introduce Lakehouse, an experimental, alternative programming model
    for data applications, built on top of Dagster core. Lakehouse allows developers to define data
    applications in terms of data assets, such as database tables or ML models, rather than in terms
    of the computations that produce those assets. The simple_lakehouse example gives a taste of
    what it's like to program in Lakehouse. We'd love feedback on whether this model is helpful!

  • Airflow ingest: We've expanded the tooling available to teams with existing Airflow installations
    that are interested in incrementally adopting Dagster. Previously, we provided only injection
    tools that allowed developers to write Dagster pipelines and then compile them into Airflow DAGs
    for execution. We've now added ingestion tools that allow teams to move to Dagster for execution
    without having to rewrite all of their legacy pipelines in Dagster. In this approach, Airflow
    DAGs are kept in their own container/environment, compiled into Dagster pipelines, and run via
    the Dagster orchestrator. See the airflow_ingest example for details, and the sketch below!
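
    A minimal sketch of the ingestion direction, assuming dagster_airflow exposes
    make_dagster_pipeline_from_airflow_dag; the DAG here is illustrative:

      from datetime import datetime

      from airflow.models import DAG
      from dagster_airflow.dagster_pipeline_factory import make_dagster_pipeline_from_airflow_dag

      legacy_dag = DAG(dag_id="legacy_etl", start_date=datetime(2020, 1, 1))
      # ... Airflow operators would be added to legacy_dag as usual ...

      # Compile the Airflow DAG into a Dagster pipeline for orchestration.
      ingested_pipeline = make_dagster_pipeline_from_airflow_dag(legacy_dag)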

Breaking Changes

  • dagster

    • The @scheduler and @repository_partitions decorators have been removed. Instances of
      ScheduleDefinition and PartitionSetDefinition belonging to a repository should be specified
      using the @repository decorator instead.

    • Support for the Dagster solid selection DSL, previously introduced in Dagit, is now uniform
      throughout the Python codebase, with the previous solid_subset arguments (--solid-subset in
      the CLI) being replaced by solid_selection (--solid-selection). In addition to the names of
      individual solids, this argument now supports selection queries like *solid_name++ (i.e.,
      solid_name, all of its ancestors, its immediate descendants, and their immediate descendants).

    • The built-in Dagster type Path has been removed.

    • PartitionSetDefinition names, including those defined by a PartitionScheduleDefinition,
      must now be unique within a single repository.

    • Asset keys are now sanitized for non-alphanumeric characters. All characters besides
      alphanumerics and _ are treated as path delimiters. Asset keys can also be specified using
      AssetKey, which accepts a list of strings as an explicit path. If you are running 0.7.10 or
      later and using assets, you may need to migrate your historical event log data for asset keys
      from previous runs to be attributed correctly. This event_log data migration can be invoked
      as follows:

      from dagster.core.storage.event_log.migration import migrate_event_log_data
      from dagster import DagsterInstance
      
      migrate_event_log_data(instance=DagsterInstance.get())
      
    • The interface of the Scheduler base class has changed substantially. If you've written a
      custom scheduler, please get in touch!

    • The partitioned schedule decorators now generate PartitionSetDefinition names using
      the schedule name, suffixed with _partitions.

    • The repository property on ScheduleExecutionContext is no longer available. If you were
      using this property to pass to Scheduler instance methods, this interface has changed
      significantly. Please see the Scheduler class documentation for details.

    • The execute_partition_set API has been removed.

    • The deprecated is_optional parameter to Field and OutputDefinition has been removed.
      Use is_required instead.

    • The deprecated runtime_type property on InputDefinition and OutputDefinition has been
      removed. Use dagster_type instead.

    • The deprecated has_runtime_type, runtime_type_named, and all_runtime_types methods on
      PipelineDefinition have been removed. Use has_dagster_type, dagster_type_named, and
      all_dagster_types instead.

    • The deprecated all_runtime_types method on SolidDefinition and CompositeSolidDefinition
      has been removed. Use all_dagster_types instead.

    • The deprecated metadata argument to SolidDefinition and @solid has been removed. Use
      tags instead.

    • The graphviz-based DAG visualization in Dagster core has been removed. Please use Dagit!

  • dagit

    • dagit-cli has been removed, and dagit is now the only console entrypoint.
  • dagster-aws

    • The AWS CLI has been removed.
    • dagster_aws.EmrRunJobFlowSolidDefinition has been removed.
  • dagster-bash

    • This package has been renamed to dagster-shell. The bash_command_solid and bash_script_solid
      solid factory functions have been renamed to create_shell_command_solid and
      create_shell_script_solid.
  • dagster-celery

    • The CLI option --celery-base-priority is no longer available for the command
      dagster pipeline backfill. Use the tags option to specify the celery priority (e.g.
      dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }').
  • dagster-dask

    • The config schema for the dagster_dask.dask_executor has changed. The previous config should
      now be nested under the key local.
  • dagster-gcp

    • The BigQueryClient has been removed. Use bigquery_resource instead.
  • dagster-dbt

    • The dagster-dbt package has been removed. This was inadequate as a reference integration, and
      will be replaced in 0.8.x.
  • dagster-spark

    • dagster_spark.SparkSolidDefinition has been removed - use create_spark_solid instead.
    • The SparkRDD Dagster type, which only worked with an in-memory engine, has been removed.
  • dagster-twilio

    • The TwilioClient has been removed. Use twilio_resource instead.

New

  • dagster

    • You may now set asset_key on any Materialization to use the new asset system. You will also
      need to configure an asset-aware storage, such as Postgres. The longitudinal_pipeline example
      demonstrates this system.
    • The partitioned schedule decorators now support an optional end_time.
    • Opt-in telemetry now reports the Python version being used.
  • dagit

    • Dagit's GraphQL playground is now available at /graphiql as well as at /graphql.
  • dagster-aws

    • The dagster_aws.S3ComputeLogManager may now be configured to override the S3 endpoint and
      associated SSL settings.
    • Config string and integer values in the S3 tooling may now be set using either environment
      variables or literals.
  • dagster-azure

    • We've added the dagster-azure package, with support for Azure Data Lake Storage Gen2; you can
      use the adls2_system_storage or, for direct access, the adls2_resource resource. (Thanks
      @sd2k!)
  • dagster-dask

    • Dask clusters are now supported by dagster_dask.dask_executor. For full support, you will need
      to install extras with pip install dagster-dask[yarn, pbs, kube]. (Thanks @DavidKatz-il!)
  • dagster-databricks

    • We've added the dagster-databricks package, with support for running PySpark steps on Databricks
      clusters through the databricks_pyspark_step_launcher. (Thanks @sd2k!)
  • dagster-gcp

    • Config string and integer values in the BigQuery, Dataproc, and GCS tooling may now be set
      using either environment variables or literals.
  • dagster-k8s

    • Added the CeleryK8sRunLauncher to submit execution plan steps to Celery task queues for
      execution as k8s Jobs.
    • Added the ability to specify resource limits on a per-pipeline and per-step basis for k8s Jobs.
    • Many improvements and bug fixes to the dagster-k8s Helm chart.
  • dagster-pandas

    • Config string and integer values in the dagster-pandas input and output schemas may now be set
      using either environment variables or literals.
  • dagster-papertrail

    • Config string and integer values in the papertrail_logger may now be set using either
      environment variables or literals.
  • dagster-pyspark

    • PySpark solids can now run on EMR, using the emr_pyspark_step_launcher, or on Databricks using
      the new dagster-databricks package. The emr_pyspark example demonstrates how to use a step
      launcher.
  • dagster-snowflake

    • Config string and integer values in the snowflake_resource may now be set using either
      environment variables or literals.
  • dagster-spark

    • dagster_spark.create_spark_solid now accepts a required_resource_keys argument, which
      enables setting up a step launcher for Spark solids, like the emr_pyspark_step_launcher.

Bugfix

  • dagster pipeline execute now sets a non-zero exit code when pipeline execution fails.
dagster - 0.7.16

Published by prha over 4 years ago

Bugfix

  • Enabled NoOpComputeLogManager to be configured as the compute_logs implementation in dagster.yaml
  • Suppressed noisy error messages in logs from skipped steps
dagster - 0.7.15

Published by natekupp over 4 years ago

New

  • Improve dagster scheduler state reconciliation.
dagster -

Published by catherinewu over 4 years ago

New

  • Dagit now allows re-executing arbitrary step subsets via step selector syntax, regardless of whether
    the previous pipeline run failed or not.
  • Added a search filter for the root Assets page
  • Adds tooltip explanations for disabled run actions
  • The last output of the cron job command created by the scheduler is now stored in a file. A new dagster schedule logs {schedule_name} command will show the log file for a given schedule. This helps uncover errors like missing environment variables and import errors.
  • The dagit schedule page will now show inconsistency errors between schedule state and the cron tab that were previously only displayed by the dagster schedule debug command. As before, these errors can be resolved using dagster schedule up

Bugfix

  • Fixes an issue with config schema validation on Arrays
  • Fixes an issue with initializing K8sRunLauncher when configured via dagster.yaml
  • Fixes a race condition in Airflow injection logic that happens when multiple Operators try to
    create PipelineRun entries simultaneously.
  • Fixed an issue with schedules that had invalid config not logging the appropriate error.
dagster -

Published by alangenfeld over 4 years ago

Breaking Changes

  • dagster pipeline backfill command no longer takes a mode flag. Instead, it uses the mode specified on the PartitionSetDefinition. Similarly, the runs created from the backfill also use the solid_subset specified on the PartitionSetDefinition

Bugfix

  • Fixes a bug where using solid subsets when launching pipeline runs would fail config validation.
  • (dagster-gcp) allow multiple "bq_solid_for_queries" solids to co-exist in a pipeline
  • Improve scheduler state reconciliation with the dagster-cron scheduler. The dagster schedule debug command will display issues related to missing cron jobs, extraneous cron jobs, and duplicate cron jobs. Running dagster schedule up will fix any issues.

New

  • The dagster-airflow package now supports loading Airflow dags without depending on an initialized Airflow database.
  • Improvements to the longitudinal partitioned schedule view, including live updates, run filtering, and better default states.
  • Added user warning for dagster library packages that are out of sync with the core dagster package.
dagster - 0.7.12

Published by helloworld over 4 years ago

Bugfix

  • We now only render the subset of an execution plan that has actually executed, and persist that subset information along with the snapshot.
  • @pipeline and @composite_solid now correctly capture __doc__ from the function they decorate.
  • Fixed a bug with using solid subsets in the Dagit playground
dagster - 0.7.11

Published by prha over 4 years ago

Bugfix

  • Fixed an issue with strict snapshot ID matching when loading historical snapshots, which caused
    errors on the Runs page when viewing historical runs.
  • Fixed an issue where dagster_celery had introduced a spurious dependency on dagster_k8s
    (#2435)
  • Fixed an issue where our Airflow, Celery, and Dask integrations required S3 or GCS storage and
    prevented use of filesystem storage. Filesystem storage is now also permitted, to enable use of
    these integrations with distributed filesystems like NFS (#2436).