An orchestration platform for the development, production, and observation of data assets.
APACHE-2.0 License
New
RepositoryDefinition
now takes schedule_defs
and partition_set_defs
directly. The loadingrepository.yaml
under the scheduler:
and partitions:
keysmake_dagster_repo_from_airflow_example_dags
).dagster-celery worker start -n my-worker -- --uid=42
will pass the--uid
flag to celery.PresetDefinition
that has no environment defined.dagster schedule debug
command to help debug scheduler state.SystemCronScheduler
now verifies that a cron job has been successfully added to the
Breaking Changes
dagster instance migrate is required for this release to support the new experimental assets
Path is no longer valid in config schemas. Use str or dagster.String instead.
@pyspark_solid decorator - its functionality, which was experimental, is subsumed by
Dagit
Experimental
asset_key
string parameter to Materializations and created a new “Assets” tab in Dagitemr_pyspark_step_launcher
that enables launching PySpark solids in EMR. TheBugfix
CompositeSolidResult
objects.Breaking Changes
DagsterInstance.launch_run, this method now takes a run id instead of an instance of PipelineRun. Additionally, DagsterInstance.create_run and DagsterInstance.create_empty_run have been replaced by DagsterInstance.get_or_create_run and DagsterInstance.create_run_for_pipeline.
RunLauncher, there are two required changes:
RunLauncher.launch_run takes a pipeline run that has already been created. You should remove any calls to instance.create_run in this method.
startPipelineExecution (defined in the dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION) in the run launcher, you should call startPipelineExecutionForCreatedRun (defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION). See RemoteDagitRunLauncher for an example implementation.
New
Bugfix
Documentation
Breaking Changes
execute_pipeline_with_mode
and execute_pipeline_with_preset
APIs have been dropped inexecute_pipeline
, mode
and preset
.RunConfig
to pass options to execute_pipeline
has been deprecated, and RunConfig
execute_solid_within_pipeline
and execute_solids_within_pipeline
APIs, intended to supportmode
and preset
.New
Runs
view will apply that tag as a filter.Bugfix
Experimental
make_dagster_pipeline_from_airflow_dag
). This is in the early experimentation phase.Schedules
detailed view.Documentation
Breaking Changes
The default sqlite and dagster-postgres implementations have been altered to extract the event step_key field as a column, to enable faster per-step queries. You will need to run dagster instance migrate to update the schema. You may optionally migrate your historical event log data to extract the step_key using the migrate_event_log_data function. This will ensure that your historical event log data will be captured in future step-key based views. This event_log data migration can be invoked as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
We have made pipeline metadata serializable and persist that along with run information. While there are no user-facing features to leverage this yet, it does require an instance migration. Run dagster instance migrate. If you have already run the migration for the event_log changes above, you do not need to run it again. Any unforeseen errors related to the new snapshot_id in the runs table or the new snapshots table are related to this migration.
dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and ColumnDTypeInSetConstraint.
New
FileManager
machinery.PandasColumn
constructors now support pandas 1.0 dtypes.env:
to load from environment variables.Bugfix
dagit
would not populate tags specified on the pipelineFailure
was not displayed in the error modal indagit
.dagstermill.get_context()
outside ofExperimental
Schedule
tab for scheduled, partitioned pipelines.Dagit
dagster_pandas
dagster_pandas
dataframes.dagster_aws
s3_resource
no longer uses an unsigned session by default.Bugfixes
Documentation
Docs
New
OutputDefinition
to take is_required
rather than is_optional
argument. This is toField
in 0.7.1 and to avoid confusionOptional
, which indicates None-ability,is_optional
is deprecated and will be removed in a future version.Bugfixes
New
dagster-k8s
Helm chart.dagster-k8s
Helm chart.Bugfix
SourceString
.dagster schedule up
would fail in certain scenariosSystemCronScheduler
.Pandas
Dagstermill
Docs
Experimental
RetryRequested
exception, was added.Other
runtime_type to dagster_type in definitions. The following are deprecated
InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
CompositeSolidDefinition.all_runtime_types is deprecated. Use CompositeSolidDefinition.all_dagster_types instead.
SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types instead.
PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type instead.
PipelineDefinition.runtime_type_named is deprecated. Use PipelineDefinition.dagster_type_named instead.
PipelineDefinition.all_runtime_types is deprecated. Use PipelineDefinition.all_dagster_types instead.
New
dagster_postgres.PostgresScheduleStorage
on the instance.execute_pipeline_with_mode
API to allow executing a pipeline in test with a specificRunConfig
.--celery-base-priority
to dagster pipeline backfill
.@weekly
schedule decorator.Deprecations
dagster-ge
library has been removed from this release due to drift from the underlyingdagster-pandas
PandasColumn
now includes an is_optional
flag, replacing the previousColumnExistsConstraint
.ignore_missing_values flag
to PandasColumn
in order to apply columndagster-k8s
Documentation
New
Added the IntSource
type, which lets integers be set from environment variables in config.
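As a rough illustration of the idea behind IntSource, the sketch below resolves a config value that is either a literal integer or a mapping that names an environment variable. The {"env": ...} shape follows the env: key mentioned elsewhere in these notes; the helper name resolve_int_source and the variable MY_PORT are hypothetical, and this is not Dagster's own implementation.

```python
import os


def resolve_int_source(value):
    """Return an int from a literal int or from {"env": "VAR_NAME"}.

    Hypothetical helper for illustration only; not part of Dagster.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("expected an int or a mapping of the form {'env': name}")
    if isinstance(value, int):
        return value
    if isinstance(value, dict) and set(value) == {"env"}:
        name = value["env"]
        raw = os.environ.get(name)
        if raw is None:
            raise KeyError(f"environment variable {name!r} is not set")
        # int() raises ValueError if the variable does not hold an integer.
        return int(raw)
    raise TypeError("expected an int or a mapping of the form {'env': name}")


# Example: one value read from the environment, one given literally.
os.environ["MY_PORT"] = "5432"
port = resolve_int_source({"env": "MY_PORT"})
literal = resolve_int_source(8080)
```

Failing loudly on a missing or non-integer variable keeps misconfiguration visible at config-resolution time rather than deep inside a run.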
You may now set tags on pipeline definitions. These will resolve in the following cases:
execute_pipeline
api will create a run with the unionRunConfig
tags, with RunConfig
tags taking precedence.
Output materialization configs may now yield multiple Materializations, and the tutorial has been updated to reflect this.
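The tag-resolution rule described above (a union of tags, with RunConfig tags taking precedence) can be sketched as a plain dictionary merge. This assumes the other side of the union is the tags set on the pipeline definition; the function name resolve_run_tags and the example tags are hypothetical and are not Dagster APIs.

```python
def resolve_run_tags(pipeline_tags, run_config_tags):
    """Union two tag mappings; run_config_tags win on key conflicts.

    Hypothetical helper for illustration only; not part of Dagster.
    """
    merged = dict(pipeline_tags or {})
    # update() overwrites existing keys, which gives RunConfig tags precedence.
    merged.update(run_config_tags or {})
    return merged


# Example: "priority" is set in both places, so the RunConfig value is kept.
pipeline_tags = {"team": "data", "priority": "low"}
run_config_tags = {"priority": "high"}
tags = resolve_run_tags(pipeline_tags, run_config_tags)
```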
We now export the SolidExecutionContext in the public API so that users can correctly type hint solid compute functions.
Dagit
Bugfix
None
.threads_per_worker
on Dask distributed clusters.dagster-postgres
dagster-aws
s3_resource
now exposes a list_objects_v2
method corresponding to the underlying boto3redshift_resource
to access Redshift databases.dagster-k8s
K8sRunLauncher
config now includes the load_kubeconfig
and kubeconfig_file
options.Documentation
Dependencies
Community
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.
We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.
If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:
telemetry:
  enabled: true
Thanks to @basilvetas and @hspak for their contributions!
Breaking Changes
default_value in Field no longer accepts native instances of python enums. Instead
default_value in Field no longer accepts callables.
dagster_aws imports have been reorganized; you should now import resources from dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
context.resources.s3.list_objects_v2. (#2292)
New
Playground
view in dagit
showing an interactive config mapInputDefinition
dagster pipeline launch
to launch runs using a configured RunLauncher
pdb
utility to SolidExecutionContext
to help with debugging, available within a solid as context.pdb
PresetDefinition.with_additional_config
to allow for config overridesBugfix
@weekly
partitioned schedule decoratordagstermill
dagster-dbt
dbt_solid
now has a Nothing
input to allow for sequencingdagster-k8s
get_celery_engine_config
to select celery engine, leveraging Celery infrastructureDocumentation
Published by asingh16 over 4 years ago
🎆 🚢 🎆 Dagster 0.7.0: Waiting To Exhale 😤 😌 🍵
We are pleased to announce version 0.7.0 of Dagster, codenamed “Waiting To Exhale”. We set out to make Dagster a solution for production-grade pipelines on modern cloud infrastructure. In service of that goal, we needed to fill gaps and incorporate feedback from the community at large.
Our last release, 0.6.0, expanded Dagster from local developer experience to a hostable product, allowing for scheduling, execution, and monitoring of pipelines in the cloud.
This release goes further, supporting pipelines with 100s and 1000s of nodes, deployable to modern, scalable cloud infrastructure, with dramatically improved monitoring tools, as well as other features.
Given this, 0.7.0 introduces the following:
https://media.giphy.com/media/Rhx6ujovXlvuKaLCGY/giphy.gif
Warning
There are a substantial number of breaking changes in the 0.7.0 release. These changes affect the scheduler system, config system, required resources, and the type system. We apologize for the thrash, and thank you for bearing with us!
For more info on changes check out the following resources:
Changelog: https://github.com/dagster-io/dagster/blob/master/CHANGES.md
0.7.0 migration guide: https://github.com/dagster-io/dagster/blob/master/070_MIGRATION.md
Published by natekupp over 5 years ago
API Changes
storage
which controls whether or notdagster
CLI now includes options to list and wipe pipeline runRunConfig
where the user can specifyOutputDefinition
now contains an explicit is_optional
parameter and defaults to beingdagster.check
: is_list
dagster.seven
: py23-compatible FileNotFoundError
, json.dump
,json.dumps
.Nothing
type now allows dependencies to be constructed between solids that do not havethrow_on_user_error
has been renamed to raise_on_error
in all APIs, public and privateGraphQL
startSubplanExecution
has been replaced by executePlan
.startPipelineExecution
now supports reexecution of pipeline subsets.Dagit
Execute
tab now opens runs in separate browser tabs and a new Runs
tab allows you toExecute
tabs. This functionality willExplore
tab is more performant on large DAGs.dagit -q
command line flag has been deprecated in favor of a separate command-linedagster-graphql
utility.Dagster-Airflow
DockerOperator
-based)PythonOperator
-based) Airflow DAGs from Dagster pipelines and config.Libraries
dagster_aws
, dagster_ge
, dagster_pandas
, dagster_pyspark
,dagster_snowflake
, and dagster_spark
.Examples
Documentation
Published by schrockn over 5 years ago
Hotfix to not put config values in error messages. Had to re-release because of packaging errors in the upload to PyPI (.pyc files or similar were included).
Published by schrockn almost 6 years ago
Pushing an update because dagit 0.2.8 was getting out-of-date code.
Published by schrockn almost 6 years ago
Published by schrockn almost 6 years ago
Version 0.2.7 Release Notes
The most notable changes in this release are a set of improvements to dagit, chiefly hot reloading and in-browser rendering of Python errors. In addition, the ability to scaffold configs from the command line is the first fruit of the rearchitecting of the config system.
Dagster improvements:
Dagit improvements:
Published by schrockn about 6 years ago
Changes:
This is a significant change in the config system. The top level environment objects (and all descendants) are now part of the dagster type system. Unique types are generated on a per-pipeline basis. This unlocks a few things:
Previously:
context:
  name: context_name
  config: some_config_value
Now:
context:
  context_name:
    config: some_config_value
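For anyone converting existing environment files, the format change above can be sketched as a small dictionary transformation. The helper name migrate_context_config is hypothetical and not something shipped with Dagster; it assumes the environment has already been loaded into a Python dict and only touches the context key.

```python
def migrate_context_config(environment):
    """Rewrite the old context format into the new one.

    Old: {"context": {"name": "context_name", "config": value}}
    New: {"context": {"context_name": {"config": value}}}

    Hypothetical helper for illustration only; not part of Dagster.
    """
    migrated = dict(environment)  # shallow copy; the input is left untouched
    old_context = environment.get("context")
    # Only rewrite entries that look like the old format (a string "name" key).
    if isinstance(old_context, dict) and isinstance(old_context.get("name"), str):
        new_entry = {}
        if "config" in old_context:
            new_entry["config"] = old_context["config"]
        migrated["context"] = {old_context["name"]: new_entry}
    return migrated


# Example using the values from the notes above.
old_env = {"context": {"name": "context_name", "config": "some_config_value"}}
new_env = migrate_context_config(old_env)
```

Environments that are already in the new format, or that have no context key, pass through unchanged.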
BREAKING CHANGE: Config format change. See above.
Published by schrockn about 6 years ago
Version bump to 0.2.5 (#227)
Published by schrockn about 6 years ago
This version bump contains a few changes (including one breaking change).
Published by schrockn about 6 years ago
The driving factor for this release is a bug in the command line interface in 0.2.2 (https://github.com/dagster-io/dagster/issues/207).
Other changes in this release: