An orchestration platform for the development, production, and observation of data assets.
Published by prha over 2 years ago
Published by sryza over 2 years ago
Fixed an issue with the `k8s_job_executor` where runs would sometimes fail due to a label validation error when creating the step pod.

Fixed an issue in Dagit when using the `--path-prefix` option.
The `@asset` decorator has a `partitions_def` argument, which accepts a `PartitionsDefinition` value. The asset details page in Dagit now represents which partitions are filled in.
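A minimal sketch of partitioning an asset this way (the asset and partition names are made up, and this assumes the asset function may take `context` as its first argument):

```python
from dagster import StaticPartitionsDefinition, asset

@asset(partitions_def=StaticPartitionsDefinition(["us", "eu", "apac"]))
def events(context):
    # A materialization of this asset fills in one partition,
    # identified by context.partition_key.
    return f"events for {context.partition_key}"
```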
Fixed an issue in the `sync_and_poll` method of the dagster-fivetran resource (thanks Marcos Marx).

Published by Ramshackle-Jamathon almost 3 years ago
Published by benpankow almost 3 years ago
The `databricks_pyspark_step_launcher` now streams Dagster logs back from Databricks rather than waiting for the step to completely finish before exporting all events. Fixed an issue where all events from the external step would share the same timestamp. Immediately after execution, stdout and stderr logs captured from the Databricks worker will be automatically surfaced to the event log, removing the need to set the `wait_for_logs` option in most scenarios.

The `databricks_pyspark_step_launcher` now supports dynamically mapped steps.

Fixed an issue where `execute_in_process` would not work for graphs with nothing inputs.

Fixed an issue where the `Ctrl+A` command did not correctly allow select-all behavior in the editor for non-Mac users.

Fixed an issue in the `k8s_job_executor` where the same step could start twice in rare cases.

Fixed an issue where the `S3ComputeLogManager` would cause errors in Dagit.

Fixed a bug with the `categorical_column` constraint.

Fixed an issue where returning a `SkipReason` from the schedule function did not display the skip reason in the tick timeline in Dagit, or output the skip message in the dagster-daemon log output.

Previously, schedules could be defined with cron strings like `@daily` rather than `0 0 * * *`. However, these schedules would fail to actually run successfully in the daemon and would also cause errors when viewing certain pages in Dagit. We now raise a `DagsterInvalidDefinitionError` for schedules that do not have a cron expression consisting of 5 space-separated fields.

Replaced `get_bucket()` calls with `bucket()` to avoid unnecessary bucket metadata fetches, thanks!

Published by gibsondan almost 3 years ago
Improved support for `@asset_sensor`s.

You can now specify configuration on the `CeleryK8sRunLauncher` that will be included in all launched jobs.

The `end_mlflow_on_run_finished` hook is now a top-level export of the dagster mlflow library. The API reference also now includes an entry for it.

`RetryPolicy` is now respected when execution is interrupted.

Fixed an issue where `job_metadata` in tags did not correctly propagate to Kubernetes jobs created by Dagster. Thanks @ibelikov!

Fixed an issue with the `default_flags` property of `DbtCliResource`.

The `AssetIn` input object now accepts an asset key so upstream assets can be explicitly specified (e.g. `AssetIn(asset_key=AssetKey("asset1"))`).

The `@asset` decorator now has an optional `non_argument_deps` parameter that accepts AssetKeys of assets that do not pass data but are upstream dependencies (see the sketch below).

`ForeignAsset` objects now have an optional `description` attribute.
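A minimal sketch combining `AssetIn` and `non_argument_deps` (the asset names here are hypothetical):

```python
from dagster import AssetIn, AssetKey, asset

@asset(
    ins={"upstream": AssetIn(asset_key=AssetKey("asset1"))},
    non_argument_deps={AssetKey("schema_migration")},
)
def downstream(upstream):
    # Reads data from "asset1"; depends on "schema_migration" only for
    # ordering, with no data passed from it.
    return upstream + 1
```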
Published by johannkm almost 3 years ago

`run_id`, `job_name`, and `op_exception` have been added as parameters to `build_hook_context`.
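A minimal sketch of unit-testing a hook with these new parameters (the hook and the argument values are made up; this assumes direct hook invocation with a built context, as Dagster's testing utilities allow):

```python
from dagster import build_hook_context, failure_hook

@failure_hook
def alert_on_failure(context):
    print(f"{context.job_name} run {context.run_id} failed: {context.op_exception}")

# Values below are illustrative.
alert_on_failure(
    build_hook_context(run_id="1234", job_name="my_job", op_exception=Exception("boom"))
)
```

Jobs now support top-level inputs, which can be configured via run config. For example, given: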
```python
from dagster import job, op

@op
def add_one(x):
    return x + 1

@job
def my_job(x):
    add_one(x)
```
You can now add config for `x` at the top level of your `run_config` like so:
```python
run_config = {
    "inputs": {
        "x": {
            "value": 2
        }
    }
}
```
The partition key for a partitioned run is now available directly on the op context. Previously, you had to thread it through op config:

```python
from dagster import job, op, static_partitioned_config

@op(config_schema={"partition_key": str})
def my_op(context):
    print("partition_key: " + context.op_config["partition_key"])

@static_partitioned_config(partition_keys=["a", "b"])
def my_static_partitioned_config(partition_key: str):
    return {"ops": {"my_op": {"config": {"partition_key": partition_key}}}}

@job(config=my_static_partitioned_config)
def my_partitioned_job():
    my_op()
```
You can now write:
```python
from dagster import StaticPartitionsDefinition, job, op

@op
def my_op(context):
    print("partition_key: " + context.partition_key)

@job(partitions_def=StaticPartitionsDefinition(["a", "b"]))
def my_partitioned_job():
    my_op()
```
Added `op_retry_policy` to `@job`. You can also specify `op_retry_policy` when invoking `to_job` on graphs.
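A minimal sketch of a job-wide retry policy (the op and the policy values are illustrative):

```python
from dagster import RetryPolicy, job, op

@op
def flaky_op():
    return 1

# Every op in this job inherits the policy unless it declares its own.
@job(op_retry_policy=RetryPolicy(max_retries=3, delay=5))
def my_flaky_job():
    flaky_op()
```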
The `fivetran_sync_op` will now be rendered with a fivetran tag in Dagit.

The `fivetran_sync_op` now supports producing `AssetMaterializations` for each table updated during the sync. To this end, it now outputs a structured `FivetranOutput` containing this schema information, instead of an unstructured dictionary.

`AssetMaterializations` produced from the `dbt_cloud_run_op` now include a link to the dbt Cloud docs for each asset (if docs were generated for that run).

You can now use the `@schedule` decorator with `RunRequest`-based evaluation functions. For example, you can now write:

```python
@schedule(cron_schedule="* * * * *", job=my_job)
def my_schedule(context):
    yield RunRequest(run_key="a", ...)
    yield RunRequest(run_key="b", ...)
```
You can now configure `python_logs` settings using the Dagster Helm chart.
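A sketch of what this could look like in the chart's values file; treat the exact keys as an assumption based on the chart's `pythonLogs` values block:

```yaml
# values.yaml (sketch)
pythonLogs:
  # Python log level at or above which captured logs become Dagster events.
  pythonLogLevel: INFO
  # Additional Python loggers to capture, beyond Dagster's own.
  managedPythonLoggers:
    - my_custom_logger
```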
Updated `make_slack_on_run_failure_sensor` to use Slack layout blocks and include a clickable link to Dagit. Previously, it sent a plain-text message.

Fixed an issue where, if a job depended on a `ForeignAsset`, the repository containing that job would fail to load.

Fixed issues with the `EcsRunLauncher`.

The `EcsRunLauncher` now dynamically chooses between assigning a public IP address or not, based on whether it's running in a public or private subnet.

The `@asset` and `@multi_asset` decorators now return `AssetsDefinition` objects instead of `OpDefinition`s.

Documentation examples now use `get_dagster_logger` instead of `context.log`.

Published by gibsondan almost 3 years ago
Published by gibsondan almost 3 years ago
Improved the error message when unpacking multiple outputs, e.g. `a, b = my_op()`, inside `@graph` or `@job`, when `my_op` only has a single `Out`.

Added a new `dbt_cloud_run_op`, as well as a more flexible `dbt_cloud_resource` for more customized use cases. Check out the API docs to learn more!

Fixed a bug in the `pipeline launch` / `job launch` CLIs that would spin up an ephemeral dagster instance for the launch, then tear it down before the run actually executed. Now, the CLI will enforce that your instance is non-ephemeral.

The `pipeline` argument of the `InitExecutorContext` constructor has been changed to `job`.

The `@asset` decorator now accepts a `dagster_type` argument, which determines the DagsterType for the output of the asset op.

`build_assets_job` accepts an `executor_def` argument, which determines the executor for the job.

Published by gibsondan almost 3 years ago
`google-cloud-bigquery` is temporarily pinned to be prior to version 3 due to a breaking change in that version.

Published by gibsondan almost 3 years ago
Previously, the `EcsRunLauncher` tagged each ECS task with its corresponding Dagster run ID. ECS tagging isn't supported for AWS accounts that have not yet migrated to using the long ARN format. Now, the `EcsRunLauncher` only adds this tag if your AWS account has the long ARN format enabled.

Fixed an issue in the `k8s_job_executor` and `docker_executor` that could result in jobs exiting as `SUCCESS` before all ops have run.

Fixed an issue in the `k8s_job_executor` and `docker_executor` that could result in jobs failing when an op is skipped.

`graphene` is temporarily pinned to be prior to version 3 to unbreak Dagit dependencies.

Published by rexledesma almost 3 years ago
Added a new `fivetran_sync_op`, as well as a more flexible `fivetran_resource` for more customized use cases. Check out the API docs to learn more!

The `SourceHashVersionStrategy` class has been added, which versions `op` and `resource` code. It can be provided to a job like so:

```python
from dagster import job, SourceHashVersionStrategy

@job(version_strategy=SourceHashVersionStrategy())
def my_job():
    ...
```
Fixed an issue in Dagit's Workspace > Graph view.

Previously, the `dagster job` CLI command selected both pipelines and jobs. This release changes the `dagster job` command to select only jobs and not pipelines.

Published by gibsondan almost 3 years ago
The `k8s_job_executor` is no longer experimental, and is recommended for production workloads. This executor runs each op in a separate Kubernetes job. We recommend this executor for Dagster jobs that require greater isolation than the `multiprocess` executor can provide within a single Kubernetes pod. The `celery_k8s_job_executor` will still be supported, but is recommended only for use cases where Celery is required (the most common example is offering step concurrency limits using multiple Celery queues). Otherwise, the `k8s_job_executor` is the best way to get Kubernetes job isolation.
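A minimal sketch of attaching the executor to a job (the op and job names are made up):

```python
from dagster import job, op
from dagster_k8s import k8s_job_executor

@op
def my_op():
    return 1

# Each op executes in its own Kubernetes job for isolation.
@job(executor_def=k8s_job_executor)
def isolated_job():
    my_op()
```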
Added the `make_dagster_job_from_airflow_dag` factory function. Deprecated the `pipeline_name` argument in favor of `job_name` in all the APIs.

Removed a pin on the `chardet` library that was required due to an incompatibility with an old version of the `aiohttp` library, which has since been fixed.

Fixed an issue with the `ins` argument of the `op` decorator.

The `slack_on_run_failure_sensor` now says “Job” instead of “Pipeline” in its default message.

Fixed a `DagsterTypeCheckDidNotPass` error when a Dagster Type contained a List inside a Tuple (thanks @jan-eat!).

Updated documentation for several extension libraries (dagstermill, dagster-pandas, dagster-airflow, etc).

Fixed an issue with `execute_in_process` when the job does not have a top-level output.

Published by benpankow almost 3 years ago
Added the `define_dagstermill_op` factory function. Also updated documentation and examples to reflect these changes.

`get_dagster_logger()` is now importable from the top level of the dagster package (`from dagster import get_dagster_logger`).
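A quick usage sketch: the logger can be used from plain helper functions without threading `context` through them:

```python
from dagster import get_dagster_logger, job, op

def helper():
    # Captured in the Dagster event log when run inside an op.
    get_dagster_logger().info("doing some work")

@op
def my_op():
    helper()

@job
def my_job():
    my_op()
```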
The dbt resources (`dbt_cli_resource`, `dbt_rpc_resource`, and `dbt_rpc_sync_resource`) now support the dbt `ls` command: `context.resources.dbt.ls()`.

Added `ins` and `outs` properties to `OpDefinition`.

Added a `resources_config` argument on `build_solid_context`. The `config` argument has been renamed to `solid_config`.

Run and step workers are now named `run_worker` and `step_worker`, respectively.

Fixed an issue with `execute_in_process`.

The first argument of `Executor.execute(...)` has changed from `pipeline_context` to `plan_context`.

Launched Kubernetes jobs are now prefixed with `dagster`, rather than repeating the job name. Thanks @skirino!

[dagster-docker] Added a new `docker_executor` which executes steps in separate Docker containers, as sketched below.
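A minimal sketch of opting a job into the new executor (the op and job names are made up):

```python
from dagster import job, op
from dagster_docker import docker_executor

@op
def my_op():
    return 1

# Each step runs in its own Docker container.
@job(executor_def=docker_executor)
def containerized_job():
    my_op()
```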
The dagster-daemon process can now detect hanging runs and restart crashed run workers. Currently only supported for jobs using the `docker_executor` and `k8s_job_executor`. Enable this feature in your dagster.yaml with:

```yaml
run_monitoring:
  enabled: true
```

Documentation coming soon. Reach out in the #dagster-support Slack channel if you are interested in using this feature.
Updated documentation for `dagster-aws`, `dagster-github`, and `dagster-slack` to reference job/op/graph APIs.

Published by gibsondan almost 3 years ago
Fixed an issue with the `ls` command.

Published by gibsondan almost 3 years ago
Fixed an issue with run config for the `celery-k8s` executor (the `execution: celery-k8s:` block).

You can now configure credentials for the `GCSComputeLogManager` using a string or environment variable instead of passing a path to a credentials file. Thanks @silentsokolov!

Fixed an issue where `dagster instance migrate` would run out of memory when migrating over long run histories.

Published by gibsondan about 3 years ago
Fixed a bug in `dagster_aws.s3.sensor.get_s3_keys` that would return no keys if an invalid S3 key was provided.

Fixed a bug where log calls with string interpolation arguments, e.g. `my_log.info("foo %s", "bar")`, would cause errors in some scenarios.

Dagit now displays an asset graph for jobs built with `build_assets_job`. The asset graph shows each node in the job's graph with metadata about the asset it corresponds to, including asset materializations. It also contains links to upstream jobs that produce assets consumed by the job, as well as downstream jobs that consume assets produced by the job.

Fixed a bug in `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` that would cause runs to fail if no `runtime_metadata_fn` argument were supplied.

Fixed a bug that caused `@asset` not to infer the type of inputs and outputs from type annotations of the decorated function.

`@asset` now accepts a `compute_kind` argument. You can supply values like “spark”, “pandas”, or “dbt”, and see them represented as a badge on the asset in the Dagit asset graph.
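For example (a hypothetical asset):

```python
from dagster import asset

# "pandas" shows up as a badge on this asset in Dagit's asset graph.
@asset(compute_kind="pandas")
def customers():
    return [{"name": "alice"}, {"name": "bob"}]
```

Published by prha about 3 years ago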
Changed `VersionStrategy.get_solid_version` and `VersionStrategy.get_resource_version` to take in a `SolidVersionContext` and `ResourceVersionContext`, respectively. This gives `VersionStrategy` access to the config (in addition to the definition object) when determining the code version for memoization. (Thanks @RBrossard!)

Note: This is a breaking change for anyone using the experimental `VersionStrategy` API. Instead of directly being passed `solid_def` and `resource_def`, you should access them off of the context object using `context.solid_def` and `context.resource_def`, respectively.
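A minimal sketch of a strategy written against the new signature (the versioning logic here is illustrative only):

```python
from dagster import VersionStrategy

class MyVersionStrategy(VersionStrategy):
    def get_solid_version(self, context):
        # The definition and config now hang off the context object.
        return context.solid_def.name + "-v1"

    def get_resource_version(self, context):
        # Bump manually when the resource's code changes.
        return "v1"
```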