Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
APACHE-2.0 License
Previously, when a DAG was paused or removed, incoming dataset events would still
trigger it, and the DAG would run when it was unpaused or added back to a DAG
file. This has been changed; a DAG's dataset schedule can now only be satisfied
by events that occur while the DAG is active. While this is a breaking change,
the previous behavior is considered a bug.

The behavior of time-based scheduling is unchanged, including the timetable part
of ``DatasetOrTimeSchedule``.
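To make the change concrete, here is a minimal sketch of a dataset-scheduled DAG (the DAG id, dataset URI, and task body are illustrative only); events for ``example_ds`` now only satisfy the schedule while ``consumer`` is active, i.e. unpaused and present in a DAG file:

.. code-block:: python

    import pendulum

    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    example_ds = Dataset("s3://bucket/processed/orders")  # hypothetical URI


    @dag(schedule=[example_ds], start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
    def consumer():
        @task
        def process():
            ...  # placeholder work

        process()


    consumer()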
``try_number`` is no longer incremented during task execution (#39336)

Previously, the try number (``try_number``) was incremented at the beginning of task execution on the worker. This was problematic for many reasons: it meant the try number was incremented when it was not supposed to be, namely when resuming from reschedule or deferral, and it resulted in the try number being "wrong" while the task had not yet started. The workarounds for these two issues caused a lot of confusion.

Now the try number for a task run is determined at the time the task is scheduled; it does not change in flight, and it is never decremented. So after the task runs, the observed try number remains the same as it was while the task was running; only when there is a "new try" is the try number incremented again.

One consequence of this change is that if users were "manually" running tasks (e.g. by calling ``ti.run()`` directly, or via the command line ``airflow tasks run``), the try number will no longer be incremented. Airflow assumes that tasks are always run after being scheduled by the scheduler, so we do not regard this as a breaking change.
``/logout`` endpoint in FAB Auth Manager is now CSRF protected (#40145)

The ``/logout`` endpoint's method in FAB Auth Manager has been changed from ``GET`` to ``POST`` in all existing AuthViews (``AuthDBView``, ``AuthLDAPView``, ``AuthOAuthView``, ``AuthOIDView``, ``AuthRemoteUserView``), and it now includes CSRF protection to enhance security and prevent unauthorized logouts.
This new feature adds the capability for Apache Airflow to emit 1) system traces of the scheduler, triggerer, executor, and processor, and 2) DAG run traces for deployed DAG runs, in OpenTelemetry format. Previously, only metrics were supported, emitted in OpenTelemetry format. This new feature gives users richer data for using the OpenTelemetry standard to emit and send their trace data to OTLP-compatible endpoints.
New decorators (``@skip_if``, ``@run_if``) make it simple to control whether or not to skip a Task (#41116)

This feature adds decorators to make it simple to conditionally skip a Task.
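A minimal sketch of how these decorators might be used, assuming they are importable from ``airflow.decorators`` and wrap TaskFlow tasks; the condition callables receive the task context (the conditions shown are illustrative):

.. code-block:: python

    from airflow.decorators import run_if, skip_if, task


    # Run the task only when the context-based condition is true...
    @run_if(lambda context: context["dag_run"].run_type == "manual")
    @task.bash
    def only_manual() -> str:
        return "echo 'triggered manually'"


    # ...or skip the task when the condition is true.
    @skip_if(lambda context: context["params"].get("dry_run", False))
    @task
    def heavy_work():
        ...  # placeholder work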
Previously known as hybrid executors, this new feature allows Airflow to use multiple executors concurrently. DAGs, or even individual tasks, can be configured to use the specific executor that suits their needs best. A single DAG can contain tasks all using different executors. Please see the Airflow documentation for more details. Note: This feature is still experimental. See the `Executor documentation <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#using-multiple-executors-concurrently>`_ for a more detailed description.
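As a sketch of what this enables (the executor name and task bodies are illustrative; the exact configuration is described in the linked documentation), an individual task can be pinned to one of the configured executors while the rest of the DAG uses the environment's default:

.. code-block:: python

    from airflow.decorators import dag, task


    @dag(schedule=None)
    def mixed_executors():
        @task  # uses the environment's default executor
        def light():
            ...

        @task(executor="KubernetesExecutor")  # pinned to a specific configured executor
        def isolated():
            ...

        light() >> isolated()


    mixed_executors()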
Airflow integrates Scarf to collect basic usage data during operation. Deployments can opt out of data collection by setting the ``[usage_data_collection] enabled`` option to ``False``, or by setting the ``SCARF_ANALYTICS=false`` environment variable. See the `FAQ on this <https://airflow.apache.org/docs/apache-airflow/stable/faq.html#does-airflow-collect-any-telemetry-data>`_ for more information.
- `AIP-61 <https://github.com/apache/airflow/pulls?q=is%3Apr+label%3Aarea%3Ahybrid-executors+is%3Aclosed+milestone%3A%22Airflow+2.10.0%22>`_
- `AIP-62 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-62+milestone%3A%22Airflow+2.10.0%22>`_
- `AIP-64 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-64+milestone%3A%22Airflow+2.10.0%22>`_
- `AIP-44 <https://github.com/apache/airflow/pulls?q=is%3Apr+label%3AAIP-44+milestone%3A%22Airflow+2.10.0%22+is%3Aclosed>`_
- ``accessors`` to read dataset events defined as inlet (#39367)
- ``dag test`` (#40010)
- ``endDate`` in task instance tooltip (#39547)
- ``accessors`` to read dataset events defined as inlet (#39367, #39893)
- ``run_if`` & ``skip_if`` decorators (#41116)
- ``renderedjson`` component (#40964)
- ``get_extra_dejson`` method with nested parameter, which allows you to specify if you want the nested JSON string to be deserialized as well (#39811)
- ``__getattr__`` to task decorator stub (#39425)
- ``RemovedIn20Warning`` in ``airflow task`` command (#39244)
- ``db migrate`` error messages (#39268)
- ``suppress_and_warn`` warning (#39263)
- ``declarative_base`` from ``sqlalchemy.orm`` instead of ``sqlalchemy.ext.declarative`` (#39134)
- ``on_task_instance_failed`` access to the error that caused the failure (#38155)
- ``output_processor`` parameter to ``BashProcessor`` (#40843)
- ``never_fail`` in BaseSensor (#40915)
- ``start_date`` (#40878)
- ``external_task_group_id`` to ``WorkflowTrigger`` (#39617)
- ``BaseSensorOperator``: introduce ``skip_policy`` parameter (#40924)
- ``__init__`` (#41086)
- ``OTel`` Traces (#40874)
- ``pydocstyle`` rules to pyproject.toml (#40569)
- ``pydocstyle`` rule D213 in ruff (#40448, #40464)
- ``Dag.test()`` to run with an executor if desired (#40205)
- ``AirflowInternalRuntimeError`` for raising non-catchable errors (#38778)
- ``pytest`` to 8.0+ (#39450)
- ``back_populates`` between ``DagScheduleDatasetReference.dag`` and ``DagModel.schedule_dataset_references`` (#39392)
- ``B028`` (no-explicit-stacklevel) in core (#39123)
- ``ImportError`` to ``ParseImportError`` to avoid shadowing the builtin exception (#39116)
- ``SubDagOperator`` examples warnings (#39057)
- ``model_dump`` instead of ``dict`` for serializing Pydantic V2 models (#38933)
- ``ws`` from 7.5.5 to 7.5.10 in ``/airflow/www`` (#40288)
- ``filesystems`` and ``dataset-uris`` to "how to create your own provider" page (#40801)
- ``otel_on`` to True in example airflow.cfg (#40712)
- ``task_id`` from ``send_email`` to ``send_email_notification`` in ``taskflow.rst`` (#41060)
Default Airflow image is updated to ``2.9.3`` (#40816)

The default Airflow image that is used with the Chart is now ``2.9.3``, previously it was ``2.9.2``.

The PgBouncer Exporter image has been updated to ``airflow-pgbouncer-exporter-2024.06.18-0.17.0``, which addresses CVE-2024-24786.
- ``dags.gitSync.sshKey``, which allows the git-sync private key to be configured in the values file directly (#39936)
- ``extraEnvFrom`` to git-sync containers (#39031)
- ``UIAlert`` to production guide when a dynamic webserver secret is used now opens in a new tab (#40635)
- ``extraConfigMaps`` and ``extraSecrets`` (#40294)
- ``safeToEvict`` annotations (#40554)
- ``triggerer.keda.usePgbouncer`` to values.yaml (#40614)
- ``//`` character using mysql backend (#40401)
- ``airflow-pgbouncer-exporter-2024.06.18-0.17.0`` (#40318)
- ``startupProbe`` timing comment (#40412)
``scheduled_duration`` and ``queued_duration`` metrics changed (#37936)

The ``scheduled_duration`` and ``queued_duration`` metrics are now emitted in milliseconds instead of seconds. By convention, all StatsD metrics should be emitted in milliseconds; this is expected downstream, e.g. by the Prometheus statsd-exporter.
Experimental support for OpenTelemetry was added in 2.7.0; since then, fixes and improvements have been added, and we now announce the feature as stable.
- ``[webserver] update_fab_perms`` to deprecated configs (#40317)
- ``httpx`` to ``requests`` in ``file_task_handler`` (#39799)
- ``SchedulerJobRunner._process_executor_events`` (#40563)
``ClusterRole`` and ``ClusterRoleBinding`` names have been updated to be unique (#37197)

The ``ClusterRole``\ s and ``ClusterRoleBinding``\ s created when ``multiNamespaceMode`` is enabled have been renamed to ensure unique names:

- ``{{ include "airflow.fullname" . }}-pod-launcher-role`` has been renamed to ``{{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-launcher-role``
- ``{{ include "airflow.fullname" . }}-pod-launcher-rolebinding`` has been renamed to ``{{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-launcher-rolebinding``
- ``{{ include "airflow.fullname" . }}-pod-log-reader-role`` has been renamed to ``{{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-log-reader-role``
- ``{{ include "airflow.fullname" . }}-pod-log-reader-rolebinding`` has been renamed to ``{{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-log-reader-rolebinding``
- ``{{ include "airflow.fullname" . }}-scc-rolebinding`` has been renamed to ``{{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-scc-rolebinding``
``workers.safeToEvict`` default changed to False (#40229)

``workers.safeToEvict`` now defaults to ``False``. This is a safer default, as it prevents the nodes that workers are running on from being scaled down by the `K8s Cluster Autoscaler <https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/#cluster-autoscaler>`_. If you would like to retain the previous behavior, you can set this config to ``True``.
Default Airflow image is updated to ``2.9.2`` (#40160)

The default Airflow image that is used with the Chart is now ``2.9.2``, previously it was ``2.8.3``.

Default StatsD image is updated to ``v0.26.1`` (#38416)

The default StatsD image that is used with the Chart is now ``v0.26.1``, previously it was ``v0.26.0``.
- ``valueFrom`` in env config of components (#40135)
- ``extraContainers`` and ``extraInitContainers`` (#38507)
- ``workers.command`` for KubernetesExecutor (#39132)
- ``priorityClassName`` to Jobs (#39133)
- ``workers.safeToEvict`` default to False (#40229)
- ``extraContainers`` and ``extraInitContainers`` that are templated (#40033)
- ``brokerUrlSecretName`` (#39115)
No significant changes.
- ``AirflowSecurityManagerV2`` leaving transactions in the ``idle in transaction`` state (#39935)
- ``SafeDogStatsdLogger`` to use ``get_validator`` to enable pattern matching (#39370)
- ``has_access`` (#39421)
- ``execution_date`` in ``@apply_lineage`` (#39327)
- ``sql_alchemy_engine_args`` config example (#38971)
- ``yandex`` provider to avoid ``mypy`` errors (#39990)
- ``provider_info_cache`` decorator (#39750)
- ``defer`` (#39742)
- ``idx_last_scheduling_decision`` on ``dag_run`` table (#39275)
- ``CronDataIntervalTimetable`` (#39780)
Stackdriver logging bugfix requires Google provider ``10.17.0`` or later (#38071)

If you use Stackdriver logging, you must use Google provider version ``10.17.0`` or later. Airflow ``2.9.1`` now passes ``gcp_log_name`` to the ``StackdriverTaskHandler`` instead of ``name``, and this will fail on earlier provider versions.

This fixes a bug where the log name configured in ``[logging] remote_base_log_folder`` was overridden when Airflow configured logging, resulting in task logs going to the wrong destination.
- ``href`` for nav bar (#39282)
- ``firefox`` (#39261)
- ``log_url`` (#39183)
- ``UX`` (#39119)
- ``ux`` in react dag page (#39122)
- ``AUTH_ROLE_PUBLIC`` is set in ``check_authentication`` (#39012)
- ``map_index_template`` so it renders for failed tasks as long as it was defined before the point of failure (#38902)
- Undeprecate ``BaseXCom.get_one`` method for now (#38991)
- ``inherit_cache`` attribute for ``CreateTableAs`` custom SA Clause (#38985)
- ``SAWarning`` 'Coercing Subquery object into a select() for use in IN()' (#38926)
- ``cartesian`` product in AirflowSecurityManagerV2 (#38913)
- ``methodtools.lru_cache`` instead of ``functools.lru_cache`` in class methods (#37757)
- ``airflow dags backfill`` only if ``-I`` / ``--ignore-first-depends-on-past`` provided (#38676)
- ``TriggerDagRunOperator``: deprecate ``execution_date`` in favor of ``logical_date`` (#39285)
- ``@deprecated`` decorator (#39205)
- ``is_authorized_custom_view`` from auth manager to handle custom actions (#39167)
- ``minischeduler`` skip (#38976)
- ``undici`` from 5.28.3 to 5.28.4 in ``/airflow/www`` (#38751)
- ``PythonOperator`` op_kwargs (#39242)
- ``user`` and ``role`` commands (#39224)
- ``k8s 1.29`` to supported version in docs (#39168)
- ``DagBag`` class docstring to include all params (#38814)
Lifecycle events:

- ``on_starting``
- ``before_stopping``

DagRun State Change Events:

- ``on_dag_run_running``
- ``on_dag_run_success``
- ``on_dag_run_failed``

TaskInstance State Change Events:

- ``on_task_instance_running``
- ``on_task_instance_success``
- ``on_task_instance_failed``
After a `discussion <https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4>`__ and a `voting process <https://lists.apache.org/thread/pgcgmhf6560k8jbsmz8nlyoxosvltph2>`__, the Airflow PMC and Committers have reached a resolution to no longer maintain MsSQL as a supported database backend.

As of Airflow 2.9.0, support for MsSQL as the Airflow database backend has been removed. A migration script which can help migrate the database before upgrading to Airflow 2.9.0 is available in the `airflow-mssql-migration repo on GitHub <https://github.com/apache/airflow-mssql-migration>`_. Note that the migration script is provided without support or warranty.

This does not affect the existing provider packages (operators and hooks); DAGs can still access and process data from MsSQL.
Datasets must use a URI that conforms to the rules laid down in AIP-60, and the value will be automatically normalized when the DAG file is parsed. See the `documentation on Datasets <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html>`_ for a more detailed description of the rules.

You may need to change your Dataset identifiers if they look like a URI but are used in a less mainstream way, such as relying on the URI's auth section, or having a case-sensitive protocol name.
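For illustration, a minimal sketch (the URIs here are hypothetical): a mainstream, conforming URI passes through normalization, while identifiers that only superficially look like URIs may be rejected or altered:

.. code-block:: python

    from airflow.datasets import Dataset

    # A mainstream, AIP-60-conforming URI.
    orders = Dataset("s3://my-bucket/orders/2024.csv")

    # Identifiers like this may need to change: relying on the auth section of
    # the URI, or on a case-sensitive protocol name, is no longer safe once
    # the value is normalized at DAG parse time.
    legacy = Dataset("myscheme://user:pass@host/path")  # hypothetical identifier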
``get_permitted_menu_items`` in ``BaseAuthManager`` has been renamed ``filter_permitted_menu_items`` (#37627)

The Audit Log ``event`` name for REST API events will be prepended with ``api.`` or ``ui.``, depending on whether it came from an external client or the Airflow UI.

There are a few caveats though:
- Pendulum 2 does not support Python 3.12. For Python 3.12 you need to use `Pendulum 3 <https://pendulum.eustace.io/blog/announcing-pendulum-3-0-0.html>`_.
- The minimum SQLAlchemy version supported when Pandas is installed for Python 3.12 is ``1.4.36``, released in April 2022. Airflow 2.9.0 increases the minimum supported version of SQLAlchemy to ``1.4.36`` for all Python versions.
- Not all providers support Python 3.12. At the initial release of Airflow 2.9.0, the following providers are released without support for Python 3.12:

  - ``apache.beam`` - pending on `Apache Beam support for 3.12 <https://github.com/apache/beam/issues/29149>`_
  - ``papermill`` - pending on releasing a Python 3.12 compatible papermill client version, including `this merged issue <https://github.com/nteract/papermill/pull/771>`_

There's now a limit to the length of data that can be stored in the Rendered Template Fields. The limit is set to 4096 characters. If the data exceeds this limit, it will be truncated. You can change this limit by setting the ``[core] max_template_field_length`` configuration option in your airflow config.
The XCom table column ``value`` type has changed from ``blob`` to ``longblob``. This will allow you to store relatively big data in XCom, but processing can take a significant amount of time if you have a lot of large data stored in XCom.

To downgrade from revision ``b4078ac230a1``, ensure that you don't have XCom values larger than 65,535 bytes. Otherwise, you'll need to clean those rows or run ``airflow db clean xcom`` to clean the XCom table.
- ``Matomo`` as an option for analytics_tool (#38221)
- ``hashable`` (#37465)
- ``queuedEvent`` endpoint to get/delete DatasetDagRunQueue (#37176)
- ``DatasetOrTimeSchedule`` (#36710)
- ``on_skipped_callback`` to ``BaseOperator`` (#36374)
- ``@task.bash`` TaskFlow decorator (#30176, #37875)
- ``ExternalPythonOperator`` use version from ``sys.version_info`` (#38377)
- ``run_id`` column to log table (#37731)
- ``tryNumber`` to grid task instance tooltip (#37911)
- ``ExternalPythonOperator`` (#37409)
- ``Pathlike`` (#36947)
- ``nowait`` and skip_locked into with_row_locks (#36889)
- ``dag/dagRun`` in the REST API (#36641)
- ``Connexion`` from auth manager interface (#36209)
- ``total_entries`` count on the event logs endpoint (#38625)
- ``tz`` in next run ID info (#38482)
- ``chakra`` styles to keep ``dropdowns`` in filter bar (#38456)
- ``__exit__`` is called in decorator context managers (#38383)
- ``BaseAuthManager.is_authorized_custom_view`` abstract (#37915)
- ``/get_logs_with_metadata`` endpoint (#37756)
- ``encoding`` to the SQL engine in SQLAlchemy v2 (#37545)
- ``consuming_dags`` attr eagerly before dataset listener (#36247)
- ``importlib_metadata`` with compat to Python 3.10/3.12 ``stdlib`` (#38366)
- ``__new__`` magic method of BaseOperatorMeta to avoid bad mixing of classic and decorated operators (#37937)
- ``sys.version_info`` for determining Python Major.Minor (#38372)
- ``blinker`` add where it is required (#38140)
- ``> 39.0.0`` (#38112)
- ``assert`` outside of the tests (#37718)
- ``flask._request_ctx_stack`` (#37522)
- ``login`` attribute in ``airflow.__init__.py`` (#37565)
- ``datetime.datetime.utcnow`` by ``airflow.utils.timezone.utcnow`` in core (#35448)
- ``is_authorized_cluster_activity`` from auth manager (#36175)
- ``exception`` to templates ref list (#36656)
No significant changes.
- ``FixedTimezone`` (#38139)
- ``ObjectStoragePath`` (#37769)
- ``pytest_rewrites`` (#38095, #38139)
- ``pandas`` to ``<2.2`` (#37748)
- ``croniter`` to fix an issue with 29 Feb cron expressions (#38198)
Default Airflow image is updated to ``2.8.3`` (#38036)

The default Airflow image that is used with the Chart is now ``2.8.3``, previously it was ``2.8.2``.

- ``.Values.airflowPodAnnotations`` (#37917)
- ``multiNamespace`` releases with the same name (#37197)
- ``airflow_pre_installed_providers.txt`` artifact (#37679)
- ``BranchDayOfWeekOperator`` (#37813)
- ``ERD`` generating doc improvement (#37808)
The default Airflow image that is used with the Chart is now 2.8.2, previously it was 2.8.1.
The ``allowed_deserialization_classes`` flag now follows a glob pattern (#36147).

For example, if one wants to add the class ``airflow.tests.custom_class`` to the ``allowed_deserialization_classes`` list, it can be done by writing the full class name (``airflow.tests.custom_class``) or a pattern such as the ones used in glob search (e.g., ``airflow.*``, ``airflow.tests.*``).

If you currently use a custom regexp path, make sure to rewrite it as a glob pattern. Alternatively, if you still wish to match it as a regexp pattern, add it under the new list ``allowed_deserialization_classes_regexp`` instead.
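A quick sketch of how such patterns behave, assuming fnmatch-style glob semantics (the class name is the illustrative one from above):

.. code-block:: python

    from fnmatch import fnmatch

    name = "airflow.tests.custom_class"

    fnmatch(name, "airflow.tests.custom_class")  # True - full class name
    fnmatch(name, "airflow.tests.*")             # True - glob wildcard
    fnmatch(name, "airflow.*")                   # True - "*" also crosses dots
    fnmatch(name, "airflow.tests.custom_.....")  # False - "." is literal in globs,
                                                 # unlike in the old regexp patterns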
This was done under the policy that we do not want users like Viewer, Ops, and other users apart from Admin to have access to audit logs. The intention behind this change is to restrict users with fewer permissions from viewing user details like first name, email etc. in the audit logs when they are not permitted to. The impact of this change is that existing users with non-admin rights won't be able to view or access the audit logs, either from the Browse tab or from the DAG run.
``AirflowTimeoutError`` is no longer caught by default through ``except Exception`` (#35653).

``AirflowTimeoutError`` now inherits from ``BaseException`` instead of ``AirflowException`` -> ``Exception``. See https://docs.python.org/3/library/exceptions.html#exception-hierarchy

This prevents code catching ``Exception`` from accidentally catching ``AirflowTimeoutError`` and continuing to run. ``AirflowTimeoutError`` is an explicit intent to cancel the task, and should not be caught in attempts to handle the error and return some default value.

Catching ``AirflowTimeoutError`` is still possible by explicitly ``except``-ing ``AirflowTimeoutError`` or ``BaseException``. This is discouraged, as it may allow the code to continue running even after such cancellation requests.

Code that previously depended on performing strict cleanup in every situation after catching ``Exception`` is advised to use ``finally`` blocks or context managers, to perform only the cleanup and then automatically re-raise the exception. See similar considerations about catching ``KeyboardInterrupt`` in https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
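A short sketch of the recommended pattern (the task body and resource are placeholders): cleanup goes in ``finally`` so it runs even when the timeout, which is now a ``BaseException``, propagates past the ``except Exception`` handler:

.. code-block:: python

    def run_task_logic():
        resource = open("/tmp/scratch.txt", "w")  # placeholder resource
        try:
            ...  # long-running work; the timeout may be raised here
        except Exception:
            # AirflowTimeoutError no longer lands here, so the task can
            # still be cancelled even if this handler swallows errors.
            pass
        finally:
            # Cleanup always runs; the timeout then propagates on its own.
            resource.close()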
- ``IMPORT_ERROR`` from DAG related permissions to view related permissions (#37292)
- ``AirflowTaskTimeout`` to inherit ``BaseException`` (#35653)
- ``namedtuple`` (#37168)
- ``Treeview`` function (#37162)
- ``access_entity`` is specified (#37290)
- ``dateTimeAttrFormat`` constant (#37285)
- ``@Sentry.enrich_errors`` (#37002)
- ``dryrun`` auto-fetch (#36941)
- ``/variables`` endpoint (#36820)
- ``pendulum.from_timestamp`` usage (#37160)
- ``CLI`` instead of specific one (#37651)
- ``undici`` from ``5.26.3`` to ``5.28.3`` in ``/airflow/www`` (#37493)
- ``3.12`` exclusions in ``providers/pyproject.toml`` (#37404)
- ``markdown`` from core dependencies (#37396)
- ``pageSize`` method (#37319)
- ``Python 3.11`` and ``3.12`` deprecations (#37478)
- ``airflow_pre_installed_providers.txt`` into ``sdist`` distribution (#37388)
- ``universal-pathlib`` to ``< 0.2.0`` (#37311)
- ``queue_when`` (#36997)
- ``config.yml`` for environment variable ``sql_alchemy_connect_args`` (#36526)
- ``Alembic`` to ``1.13.1`` (#36928)
- ``flask-session`` to ``<0.6`` (#36895)
- ``CLI`` flags available (#37231)
- ``otel`` config descriptions (#37229)
- ``Objectstore`` tutorial with ``prereqs`` section (#36983)
- ``package/module`` names (#36927)
- ``__init__`` of operators automatically (#33786)
``bitnami/postgresql`` dependency (#34817)

The version of the ``bitnami/postgresql`` subchart was upgraded from ``12.10.0`` to ``13.2.24``. The version of the ``PostgreSQL`` binaries was upgraded from ``11`` to ``16.1.0``.

The change requires existing ``bitnami/postgresql`` subchart users to perform a manual major version upgrade using ``pg_dumpall`` or ``pg_upgrade``.

As a reminder, it is recommended to set up an `external database <https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#database>`_ in production.
Default Airflow image is updated to ``2.8.1`` (#36907)

The default Airflow image that is used with the Chart is now ``2.8.1``, previously it was ``2.7.1``.

The PgBouncer and PgBouncer Exporter images are based on newer software/OS:

- ``pgbouncer``: 1.21.0 based on alpine 3.14 (``airflow-pgbouncer-2024.01.19-1.21.0``)
- ``pgbouncer-exporter``: 0.16.0 based on alpine 3.19 (``apache/airflow:airflow-pgbouncer-exporter-2024.01.19-0.16.0``)

Default StatsD image is updated to ``v0.26.0`` (#37187)

The default StatsD image that is used with the Chart is now ``v0.26.0``, previously it was ``v0.22.8``.

Default Redis image is updated to ``7-bookworm`` (#37187)

The default Redis image that is used with the Chart is now ``7-bookworm``, previously it was ``7-bullseye``.
- ``securityContexts`` in dag processors log groomer sidecar (#34499)
- ``securityContexts`` in dag processors wait-for-migrations container (#35593)
- ``storageClassName`` (#35581)
- ``volumeClaimTemplate`` for worker (#34986)
- ``priorityClassName`` on Redis pods (#34879)
- ``emptyDir`` config (#34837)
- ``AIRFLOW_HOME`` env var with ``airflowHome`` value (#34839)
- ``safeToEvict`` properly (#35130)
- ``useStandardNaming`` (#34825)
- ``usePgbouncer`` is false (#34741)
- ``useStandardNaming`` (#34787)
- ``bitnami/postgresql`` subchart to ``13.2.24`` (#36156)
- ``pgbouncer`` and ``pgbouncer-exporter`` images with newer versions (#36898)
- ``statsd`` and ``redis`` chart images (#37187)
The minimum ``pendulum`` package version is set to 3 (#36281). Support for pendulum 2.1.2 will be kept for a while, presumably until the next feature version of Airflow. It is advised to upgrade user code to use pendulum 3 as soon as possible.
We standardized Airflow dependency configuration to follow the latest developments in Python packaging by using ``pyproject.toml``. Airflow is now compliant with these accepted PEPs:

- `PEP-440 Version Identification and Dependency Specification <https://www.python.org/dev/peps/pep-0440/>`__
- `PEP-517 A build-system independent format for source trees <https://www.python.org/dev/peps/pep-0517/>`__
- `PEP-518 Specifying Minimum Build System Requirements for Python Projects <https://www.python.org/dev/peps/pep-0518/>`__
- `PEP-561 Distributing and Packaging Type Information <https://www.python.org/dev/peps/pep-0561/>`__
- `PEP-621 Storing project metadata in pyproject.toml <https://www.python.org/dev/peps/pep-0621/>`__
- `PEP-660 Editable installs for pyproject.toml based builds (wheel based) <https://www.python.org/dev/peps/pep-0660/>`__
- `PEP-685 Comparison of extra names for optional distribution dependencies <https://www.python.org/dev/peps/pep-0685/>`__

We also implement multiple license files support coming from a Draft PEP, not yet accepted (but supported by hatchling):

- `PEP 639 Improving License Clarity with Better Package Metadata <https://peps.python.org/pep-0639/>`__

This has almost no noticeable impact on users if they are using modern Python packaging and development tools; generally speaking, Airflow should behave as it did before when installing it from PyPI, and it should be much easier to install it for development purposes using ``pip install -e ".[devel]"``.
The differences from the user side are:

- Airflow extra names now use ``-`` (following PEP-685) instead of ``_`` and ``.``; when installing Airflow with such extras (e.g. ``dbt.core`` or ``all_dbs``) you should use ``-`` instead of ``_`` and ``.``.

In most modern tools this will work in a backwards-compatible way, but in some old versions of those tools you might need to replace ``_`` and ``.`` with ``-``. You can also get warnings that the extra you are installing does not exist - but usually this warning is harmless and the extra is installed anyway. It is, however, recommended to change to ``-`` in the extras in your dependency specifications for all Airflow extras.
The released airflow package does not contain the ``devel``, ``devel-*``, ``doc`` and ``doc-gen`` extras. Those extras are only available when you install Airflow from sources in ``--editable`` mode. This is because those extras are only used for development and documentation building purposes and are not needed when you install Airflow for production use. Those dependencies had unspecified and varying behaviour for released packages anyway, and you were not supposed to use them in released packages.

The ``all`` and ``all-*`` extras were not always working correctly when installing Airflow using constraints, because they were also considered development-only dependencies. With this change, those dependencies now properly handle constraints and will install properly with constraints, pulling the right set of providers and dependencies when constraints are used.
The ``graphviz`` dependency has been problematic as a required Airflow dependency - especially for ARM-based installations. Graphviz packages require binary graphviz libraries - which is already a limitation - but they also require the graphviz Python bindings to be built and installed. This does not work for older Linux installations, but - more importantly - when you try to install the Graphviz libraries for Python 3.8 or 3.9 on ARM M1 MacBooks, the packages fail to install because Python bindings compilation for M1 can only work for Python 3.10+.

This is not technically a breaking change - the CLIs to render the DAGs are still there, and if you already have graphviz installed, they will continue working as they did before. The only problem arises when you do not have graphviz installed: it will raise an error and inform you that you need it.

Graphviz will remain installed for most users:

- The only change will be a fresh installation of a new version of Airflow from scratch, where graphviz will need to be specified as an extra or installed separately in order to enable the DAG rendering option.
- ``taskinstance`` list (#36693)
- ``AUTH_ROLE_PUBLIC=admin`` (#36750)
- ``op`` subtypes (#35536)
- ``typing.Union`` in ``_infer_multiple_outputs`` for Python 3.10+ (#36728)
- ``multiple_outputs`` is inferred correctly even when using ``TypedDict`` (#36652)
- ``Dagrun.update_state`` (#36712)
- ``EventsTimetable`` schedule past events if ``catchup=False`` (#36134)
- ``tis_query`` in ``_process_executor_events`` (#36655)
- ``call_regular_interval`` (#36608)
- ``DagRun`` fails while running ``dag test`` (#36517)
- ``_manage_executor_state`` by refreshing TIs in batch (#36502)
- ``MAX_CONTENT_LENGTH`` (#36401)
- ``kubernetes`` decorator type annotation consistent with operator (#36405)
- ``api/dag/*/dagrun`` from anonymous user (#36275)
- ``DAG.is_fixed_time_schedule`` (#36370)
- ``httpx`` import in file_task_handler for performance (#36753)
- ``pyarrow-hotfix`` for ``CVE-2023-47248`` (#36697)
- ``graphviz`` dependency optional (#36647)
- ``pandas`` dependency to 1.2.5 for all providers and airflow (#36698)
- ``/airflow/www`` (#36700)
- ``docker`` decorator type annotations (#36406)
- ``batch_is_authorized_dag`` to check if user has permission to read DAGs (#36279)
- ``numpy`` example with practical exercise demonstrating top-level code (#35097)
- ``dags.rst`` with information on DAG pausing (#36540)
- ``metrics.rst`` for param ``dagrun.schedule_delay`` (#36404)
Raw HTML code in DAG docs and DAG params descriptions is disabled by default.

To ensure that no malicious javascript can be injected with DAG descriptions or trigger UI forms by DAG authors, a new parameter ``webserver.allow_raw_html_descriptions`` was added with a default value of ``False``. If you trust your DAG authors' code and want to allow using raw HTML in DAG descriptions and params, you can restore the previous behavior by setting the configuration value to ``True``.

To ensure Airflow is secure by default, the raw HTML support in the trigger UI has been superseded by markdown support via the ``description_md`` attribute. If you have been using ``description_html``, please migrate to ``description_md``. The ``custom_html_form`` is now deprecated. (#35460)
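A minimal sketch of the markdown-based replacement, assuming a Param declared with the ``description_md`` attribute mentioned above (the parameter name and text are illustrative):

.. code-block:: python

    from airflow.models.param import Param

    retries = Param(
        default=5,
        type="integer",
        # Rendered as markdown in the trigger UI form; replaces the
        # previous raw-HTML description_html attribute.
        description_md="Number of **retries** to attempt (see the *docs*).",
    )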
- ``prev_end_date_success`` method access (#34528)
- ``List Task Instances`` view (#34529)
- ``clear_number`` to track DAG run being cleared (#34126)
- ``multiselect`` to run state in grid view (#35403)
- ``Connection.get_hook`` in case of ImportError (#36005)
- ``taskinstance`` (#35810)
- ``AIRFLOW_CONFIG`` path (#35818)
- ``JSON-string`` connection representation generator (#35723)
- ``BaseOperatorLink`` into the separate module (#35032)
- ``cbreak`` in ``execute_interactive`` and handle ``SIGINT`` (#35602)
- ``synchronize_log_template`` function (#35366)
- ``BaseOperatorLink.operators`` (#35003)
- ``SA2-compatible`` syntax for TaskReschedule (#33720)
- ``EventScheduler`` (#34808)
- ``update_forward_refs`` (#34657)
- ``Dataset`` from ``airflow`` package in codebase (#34610)
- ``airflow.datasets.Dataset`` in examples and tests (#34605)
- ``version`` top-level element from docker compose files (#33831)
- ``NOT EXISTS`` subquery instead of ``tuple_not_in_condition`` (#33527)
- ``triggerer_heartbeat`` (#33320)
- ``airflow variables export`` to print to stdout (#33279)
- ``reset_user_sessions`` to work from either CLI or web (#36056)
- ``overscroll`` behaviour to auto (#35717)
- ``borderWidthRight`` to grid for Firefox ``scrollbar`` (#35346)
- ``processor_subdir`` in serialized_dag table (#35661)
- ``get_dag_by_pickle`` util function (#35339)
- ``mappedoperator`` (#35257)
- ``Literal`` from ``typing_extensions`` (#33794)
- ``4.3.10`` (#35991)
- ``Connection.to_json_dict`` to ``Connection.to_dict`` (#35894)
- ``moto`` version to ``>= 4.2.9`` (#35687)
- ``pyarrow-hotfix`` to mitigate CVE-2023-47248 (#35650)
- ``axios`` from 0.26.0 to ``1.6.0`` in ``/airflow/www/`` (#35624)
- ``navbar_text_color`` and ``rm`` condition in style (#35553)
- ``dag_next_execution`` (#35539)
- ``TCH004`` and ``TCH005`` rules (#35475)
- ``AirflowException`` from airflow (#34541)
- ``postcss`` from 8.4.25 to ``8.4.31`` in ``/airflow/www`` (#34770)
- ``airflow.models.dag.DAG`` in examples (#34617)
- ``re2`` regex engine in the .airflowignore documentation (#35663)
- ``best-practices.rst`` (#35692)
- ``dag-run.rst`` to mention Airflow's support for extended cron syntax through croniter (#35342)
- ``webserver.rst`` to include information on supported OAuth2 providers (#35237)
- ``rst`` code block format (#34708)
No significant changes.
- ``codemirror`` and extra (#35122)
- ``get_plugin_info`` for class based listeners (#35022)
- ``all_skipped`` trigger rule as ``skipped`` if any task is in ``upstream_failed`` state (#34392)
- ``pendulum`` requirement to ``<3.0`` (#35336)
- ``sentry_sdk`` to ``1.33.0`` (#35298)
- ``@babel/traverse`` from 7.16.0 to ``7.23.2`` in ``/airflow/www`` (#34988)
- ``undici`` from 5.19.1 to ``5.26.3`` in ``/airflow/www`` (#34971)
- ``SchedulerJobRunner`` (#34810)
- ``max_tis per query > parallelism`` (#34742)
- ``connexion<3.0`` upper bound (#35218)
- ``< 3.12`` (#35123)
- ``3.1.0`` (#34943)
- ``conn.extras`` (#35165)
- ``mysql-connector-python`` from recommended MySQL driver (#34287)
- ``set_downstream`` example (#35075)
- ``airflow_local_settings.py`` template (#34826)
- ``'>'`` in provider section name (#34813)

No significant changes.
- ``taskgroup`` is mapped (#34587)
- ``cluster_activity`` view not loading due to ``standaloneDagProcessor`` templating (#34274)
- ``loglevel=DEBUG`` in 'Not syncing ``DAG-level`` permissions' (#34268)
- ``access_control={}`` (#34114)
- ``ab_user`` table in the CLI session (#34120)
- ``next_run_datasets_summary`` endpoint (#34143)
- ``_run_task_session`` in mapped ``render_template_fields`` (#33309)
- ``version_added`` (#34011)
- ``AUTH_REMOTE_USER`` from FAB in WSGI middleware example (#34721)
- ``astroid`` version < 3 (#34658)
- ``os.path.splitext`` to ``Path.*`` (#34352, #33669)
- ``pyproject.toml`` (#34014)
- ``isinstance`` in fab_security manager (#33760)
- ``isinstance`` calls for the same object in a single call (#33767)
- ``str.splitlines()`` to split lines (#33592)
- ``len()`` (#33454)
This is a new opt-in switch, ``useStandardNaming``, kept off for backwards compatibility, to leverage the standard naming convention, which allows full use of ``fullnameOverride`` and ``nameOverride`` in all resources.

The following resources will be renamed using the default of ``useStandardNaming=false`` when upgrading to 1.11.0 or a higher version:

- ``{release}-airflow-config`` to ``{release}-config``
- ``{release}-airflow-metadata`` to ``{release}-metadata``
- ``{release}-airflow-result-backend`` to ``{release}-result-backend``
- ``{release}-airflow-ingress`` to ``{release}-ingress``

For existing installations, all your resources will be recreated with a new name and Helm will delete the previous resources. This won't delete existing PVCs for logs used by StatefulSets/Deployments, but it will recreate them with brand new PVCs. If you do want to preserve logs history, you'll need to manually copy the data of these volumes into the new volumes after deployment. Depending on what storage backend/class you're using, this procedure may vary. If you don't mind starting with fresh logs/redis volumes, you can just delete the old PVCs, for example:

.. code-block:: bash

    kubectl delete pvc -n airflow logs-gta-triggerer-0
    kubectl delete pvc -n airflow logs-gta-worker-0
    kubectl delete pvc -n airflow redis-db-gta-redis-0

If you do not change ``useStandardNaming`` or ``fullnameOverride`` after upgrade, you can proceed as usual and no unexpected behaviours will be presented.
``bitnami/postgresql`` subchart updated to ``12.10.0`` (#33747)

The PostgreSQL subchart that is used with the Chart is now ``12.10.0``, previously it was ``12.1.9``.

Default git-sync image is updated to ``3.6.9`` (#33748)

The default git-sync image that is used with the Chart is now ``3.6.9``, previously it was ``3.6.3``.

Default Airflow image is updated to ``2.7.1`` (#34186)

The default Airflow image that is used with the Chart is now ``2.7.1``, previously it was ``2.6.2``.
- ``startupProbe`` to scheduler and webserver (#33107)
- ``automountServiceAccountToken`` (#32808)
- ``runtimeClassName`` (#31868)
- ``containerSecurityContext`` for cleanup job (#34351)
- ``waitformigration`` containers ``extraVolumeMounts`` (#32100)
- ``airflow db migrate`` command to database migration job (#34178)
- ``workers.terminationGracePeriodSeconds`` into KubeExecutor pod template (#33514)
- ``--local`` and ``--job-type`` args (#32426)
- ``common.tplvalues.render`` with ``tpl`` in ingress template files (#33384)
- ``or`` function in template files (#34415)
When setting ``catchup=False``, CronTriggerTimetable no longer skips a run if the scheduler does not query the timetable immediately after the previous run has been triggered.

This should not affect scheduling in most cases, but it can change the behaviour if a DAG is paused and unpaused to manually skip a run. Previously, the timetable (with ``catchup=False``) would only start a run after the DAG is unpaused, but with this change, the scheduler will try to look back a little bit to schedule the previous run that covers part of the period when the DAG was paused. This means you will need to keep a DAG paused longer (namely, for the entire cron period to pass) to really skip a run.

Note that this is also the behaviour exhibited by various other cron-based scheduling tools, such as ``anacron``.
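For reference, a minimal sketch of a DAG using this timetable (the DAG id and cron expression are illustrative); with ``catchup=False``, the scheduler may now schedule the one run covering the pause period after the DAG is unpaused:

.. code-block:: python

    import pendulum

    from airflow.models.dag import DAG
    from airflow.timetables.trigger import CronTriggerTimetable

    with DAG(
        dag_id="nightly_report",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        # Trigger at 01:00 UTC every day, aligned with the cron expression.
        schedule=CronTriggerTimetable("0 1 * * *", timezone="UTC"),
        catchup=False,
    ):
        ...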
``conf.set()`` becomes case-insensitive to match ``conf.get()`` behavior (#33452). Also, ``conf.get()`` will now break if used with non-string parameters.

``conf.set(section, key, value)`` used to be case-sensitive, i.e. ``conf.set("SECTION", "KEY", value)`` and ``conf.set("section", "key", value)`` were stored as two distinct configurations. This was inconsistent with the behavior of ``conf.get(section, key)``, which always converted the section and key to lower case. As a result, configuration options set with upper-case characters in the section or key were unreachable. That's why we are now converting the section and key to lower case in ``conf.set`` too.

We also changed the behavior of ``conf.get()`` slightly. It used to allow objects that are not strings in the section or key. Doing this will now result in an exception. For instance, ``conf.get("section", 123)`` needs to be replaced with ``conf.get("section", "123")``.
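A short sketch of the new behavior using ``airflow.configuration.conf``:

.. code-block:: python

    from airflow.configuration import conf

    # Section and key are now lower-cased on write, matching conf.get().
    conf.set("WEBSERVER", "DAG_DEFAULT_VIEW", "graph")
    conf.get("webserver", "dag_default_view")  # reaches the same option

    # Non-string section/key arguments now raise instead of being accepted.
    conf.get("section", "123")  # OK
    # conf.get("section", 123)  # raises an exception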
- ``MappedTaskGroup`` tasks not respecting upstream dependency (#33732)
- ``SECURITY_MANAGER_CLASS`` should be a reference to class, not a string (#33690)
- ``get_url_for_login`` in security manager (#33660)
- ``2.7.0 db`` migration job errors (#33652)
- ``groupby`` in TIS duration calculation (#33535)
- ``dialect.name`` in custom SA types (#33503)
- ``end_date`` is less than ``utcnow`` (#33488)
- ``formatDuration`` method (#33486)
- ``conf.set`` case insensitive (#33452)
- ``soft_fail`` argument when ``poke`` is called (#33401)
- ``processor_subdir`` (#33357)
- ``<br>`` text in Provider's view (#33326)
- ``soft_fail`` argument when ExternalTaskSensor runs in deferrable mode (#33196)
- ``expand_kwargs`` method (#32272)
- ``Pydantic`` 1 compatibility (#34081, #33998)
- ``Pydantic`` 2 (#33956)
- ``devel_only`` extra in Airflow's setup.py (#33907)
- ``FAB`` to ``4.3.4`` in order to fix issues with filters (#33931)
- ``sqlalchemy`` to ``1.4.24`` (#33892)
- ``OrderedDict`` with plain dict (#33508)
- ``Pydantic`` warning about ``orm_mode`` rename (#33220)
- ``Pydantic`` limitation for version < 2 (#33507)
As of now, Python 3.7 is no longer supported by the Python community.
Therefore, to use Airflow 2.7.0, you must ensure your Python version is
either 3.8, 3.9, 3.10, or 3.11.
The old Graph View is removed. The new Graph View is the default view now.

If you are using the ``dag_run.conf`` dictionary and the web UI JSON entry to run your DAG you should either:

- `Add params to your DAG <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form>`_
- Enable ``show_trigger_form_if_no_params`` to bring back the old behaviour

Instead, you should use the ``airflow db migrate`` command to create or upgrade the database. This command will not create default connections. In order to create default connections you need to run ``airflow connections create-default-connections`` explicitly, after running ``airflow db migrate``.
The "default" context is Python's default_ssl_contest
instead of previously used "none". The
default_ssl_context
provides a balance between security and compatibility but in some cases,
when certificates are old, self-signed or misconfigured, it might not work. This can be configured
by setting "ssl_context" in "email" configuration of Airflow.
Setting it to "none" brings back the "none" setting that was used in Airflow 2.6 and before,
but it is not recommended due to security reasons ad this setting disables validation of certificates and allows MITM attacks.
For security reasons, the test connection functionality is disabled by default across the Airflow UI, API and CLI. The availability of the functionality can be controlled by the ``test_connection`` flag in the ``core`` section of the Airflow configuration (``airflow.cfg``). It can also be controlled by the environment variable ``AIRFLOW__CORE__TEST_CONNECTION``.

The following values are accepted for this config param:

1. ``Disabled``: Disables the test connection functionality and disables the Test Connection button in the UI. This is also the default value set in the Airflow configuration.
2. ``Enabled``: Enables the test connection functionality and activates the Test Connection button in the UI.
3. ``Hidden``: Disables the test connection functionality and hides the Test Connection button in the UI.

For more information on capabilities of users, see the documentation: https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html#capabilities-of-authenticated-ui-users

It is strongly advised to not enable the feature until you make sure that only highly trusted UI/API users have "edit connection" permissions.
``xcomEntries`` API disables support for the ``deserialize`` flag by default (#32176)

For security reasons, the ``/dags/*/dagRuns/*/taskInstances/*/xcomEntries/*`` API endpoint now disables the ``deserialize`` option to deserialize arbitrary XCom values in the webserver. For backward compatibility, server admins may set the ``[api] enable_xcom_deserialize_support`` config to ``True`` to enable the flag and restore backward compatibility.

However, it is strongly advised to not enable the feature, and to perform deserialization at the client side instead.
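A sketch of the client-side approach against the stable REST API (the URL, credentials and identifiers are placeholders): fetch the serialized value and interpret it in the client rather than enabling webserver-side deserialization:

.. code-block:: python

    import json

    import requests

    resp = requests.get(
        "http://localhost:8080/api/v1/dags/my_dag/dagRuns/my_run"
        "/taskInstances/my_task/xcomEntries/return_value",
        auth=("user", "pass"),  # placeholder credentials
    )
    resp.raise_for_status()

    # The endpoint returns the serialized representation; for a
    # JSON-serialized XCom, deserialize it on the client side.
    value = json.loads(resp.json()["value"])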
The default name of the Celery application changed from ``airflow.executors.celery_executor`` to ``airflow.providers.celery.executors.celery_executor``.

You should change both your configuration and health check command to use the new name:

- In the configuration (the ``celery_app_name`` option in the ``celery`` section), use ``airflow.providers.celery.executors.celery_executor``
- In your health check command, use ``airflow.providers.celery.executors.celery_executor.app``
The default value for ``scheduler.max_tis_per_query`` is changed from 512 to 16 (#32572)

This change is expected to make the scheduler more responsive. ``scheduler.max_tis_per_query`` needs to be lower than ``core.parallelism``. If both were left at their default values previously, the effective default value of ``scheduler.max_tis_per_query`` was 32 (because it was capped at ``core.parallelism``).

To keep the behavior as close as possible to the old config, one can set ``scheduler.max_tis_per_query = 0``, in which case it will always use the value of ``core.parallelism``.
In order to use the executors, you need to install the providers:

- ``apache-airflow-providers-celery`` package, version >= 3.3.0
- ``apache-airflow-providers-cncf-kubernetes`` package, version >= 7.4.0
- ``apache-airflow-providers-daskexecutor`` package, in any version

You can also achieve this by installing airflow with the ``[celery]``, ``[cncf.kubernetes]``, or ``[daskexecutor]`` extras respectively.

Users who base their images on the ``apache/airflow`` reference image (not slim) should be unaffected - the base reference image comes with all three providers installed.
This index seems to have a great positive effect in setups with tens of millions of such rows.
- `AIP-49 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-49+milestone%3A%22Airflow+2.7.0%22>`_
- `AIP-51 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-51+milestone%3A%22Airflow+2.7.0%22>`_
- `AIP-52 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-52+milestone%3A%22Airflow+2.7.0%22>`_
- `AIP-53 <https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-53+milestone%3A%22Airflow+2.7.0%22>`_

- ``BranchExternalPythonOperator`` (#32787, #33360)
- ``Per-LocalTaskJob`` Configuration (#32313)
- ``AirflowClusterPolicySkipDag`` exception (#32013)
- ``reactflow`` for datasets graph (#31775)
- ``chain`` which doesn't require matched lists (#31927)
- ``--retry`` and ``--retry-delay`` to ``airflow db check`` (#31836)
- ``section`` query param in get config rest API (#30936)
- ``Scheduled->Queued->Running`` task state transition times (#30612)
- ``db upgrade`` to ``db migrate`` and add ``connections create-default-connections`` (#32810, #33136)
- ``<=`` parallelism (#32572)
- ``isdisjoint`` instead of ``not intersection`` (#32616)
- ``dag_processor`` status (#32382)
- ``[triggers.running]`` (#32050)
- ``TriggerDagRunOperator``: Add ``wait_for_completion`` to ``template_fields`` (#31122)
- ``PythonVirtualenvOperator`` termination log in alert (#31747)
- ``airflow db`` commands to SQLAlchemy 2.0 style (#31486)
- ``validators`` into their own modules (#30802)
- ``get_log`` api (#30729)
- Gantt chart: Use earliest/oldest ti dates if different than dag run start/end (#33215)
- ``virtualenv`` detection for Python ``virtualenv`` operator (#33223)
- ``chmod`` ``airflow.cfg`` (#33118)
- ``max_active_runs`` reached its upper limit (#31414)
- ``get_task_instances`` query (#33054)
- ``$ref`` (#32887)
- ``PythonOperator`` sub-classes extend its decorator (#32845)
- ``virtualenv`` is installed in ``PythonVirtualenvOperator`` (#32939)
- ``__iter__`` in is_container() (#32850)
- ``dagRunTimeout`` (#32565)
- ``/blocked`` endpoint (#32571)
- ``cli.dags.trigger`` command output (#32548)
- ``whitespaces`` from airflow connections form (#32292)
- ``readonly`` property in our API (#32510)
- ``resizer`` wouldn't expand grid view (#31581)
- ``type_`` arg to drop_constraint (#31306)
- ``drop_constraint`` call in migrations (#31302)
- ``requirepass`` redis sentinel (#30352)
- ``/config`` (#31057)
- ``dag_processing`` (#33161)
- ``Pydantic`` to ``< 2.0.0`` (#33235)
- ``cncf.kubernetes`` provider (#32767, #32891)
- ``pydocstyle`` check - core Airflow only (#31297)
- ``1.2.3 to 1.2.4`` in ``/airflow/www`` (#32680)
- ``6.3.0 to 6.3.1`` in ``/airflow/www`` (#32506)
- ``4.18.0`` (#32445)
- ``stylelint`` from 13.13.1 to ``15.10.1`` in ``/airflow/www`` (#32435)
- ``4.0.0 to 4.1.3`` in ``/airflow/www`` (#32443)
- ``Pydantic`` 2 (#32366)
- ``enums`` (#31735)
- ``0.272`` (#31966)
- ``asynctest`` (#31664)
- ``2.0`` style (#31569, #31772, #32350, #32339, #32474, #32645)
- ``3.7`` support (#30963)
- ``0.0.262`` (#30809)
- ``1.2.0`` (#30687)
- ``DAGRun / DAG / Task`` in templates-ref.rst (#33013)