Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
APACHE-2.0 License
New Features
""""""""""""

- `PythonVirtualenvDecorator` to Taskflow API (#14761)
- `Taskgroup` decorator (#15034)
- `SubprocessHook` for running commands from operators (#13423)
- `WeekDayBranchOperator` (#13997)
- `worker_pod_pending_timeout` support (#15263)
- `template_fields_renderers` additions (#15130)
- `AirflowSkipException` on exit code 99 (by default, configurable) (#13421) (#14963)
- `airflow jobs check` CLI command to check health of jobs (Scheduler etc.) (#14519)
- `DateTimeBranchOperator` to `BranchDateTimeOperator` (#14720)

Improvements
""""""""""""
- `DbApiHook` (#15581)
- `apply_default` to subclasses of `BaseOperator` (#15667)
- `KubernetesExecutor` pod templates to allow access to IAM permissions (#15669)
- `airflow db check-migrations` (#15662)
- `secret_key` when Webserver > 1 (#15546)
- `JSONFormatter` (#15414)
- `on_failure_callback` when SIGTERM is received (#15172)
- `worker_refresh_interval` to 6000 seconds (#14970)
- `[celery] default_queue` config to `[operators] default_queue` to re-use between executors (#14699)

Bug Fixes
"""""""""
- `updateTaskInstancesState` API endpoint when `dry_run` not passed (#15889)
- `drawDagStatsForDag` in dags.html (#13884)
- `NotPreviouslySkippedDep` (#13933)
- `KubernetesExecutor` (#14795)
- `KubernetesPodOperator` (#15388)
- `dag.partial_subset` (#13700) (#15308)
- `pod_id` for `KubernetesPodOperator` (#15445)
- `pod_id` ends with hyphen in `KubernetesPodOperator` (#15443)
- `pool_slots > 1` (#15426)
- `sync-perm` to work correctly when `update_fab_perms = False` (#14847)
- `GCSObjectsWtihPrefixExistenceSensor` (#14179)
- `CeleryKubernetesExecutor` bug (#13247)
- `StackdriverTaskHandler` (#13784)
- `func.sum` may return `Decimal` that breaks REST APIs (#15585)
- `AlreadyExists` exception when the `execution_date` is the same (#15174)
- `sync_metadata` inside `DagFileProcessorManager` (#15121)
- `docker-py` update to resolve docker op issues (#15731)
- `user_id` from API schema (#15117)
- `airflow info` work with pipes (#14528)
- `CollectionInfo` in all Collections that have `total_entries` (#14366)
- `task_instance_mutation_hook` when importing airflow.models.dagrun (#15851)

Doc only changes
""""""""""""""""
- `markdownlint` and `yamllint` config files (#15682)
- `git_sync_template.yaml` (#13197)

Misc/Internal
"""""""""""""
- `logging.exception` redundancy (#14823)
- `stylelint` to remove vulnerable sub-dependency (#15784)
- `ssri` from 6.0.1 to 6.0.2 in /airflow/www (#15437)
- `datepicker` for task instance detail view (#15284)
- `tableau` extra (#13595)
- `cached_property` on Python 3.8 where possible (#14606)
- `flynt` (#13732)
- `jquery` ready instead of vanilla js (#15258)
- `Webpack` entries (#14551)
- `TypeError` when Serializing & sorting iterable properties of DAGs (#15395)
- `on_load` trigger for folder-based plugins (#15208)
- `kubernetes cleanup-pods` subcommand will only clean up Airflow-created Pods (#15204)
- `pod_template_file` in KubernetesExecutor (#15197)
- `executor_config` breaks Graph View in UI (#15199)
- `dagrun.schedule_delay` metric (#15105)
- `executor_config` is passed (#14323)
- `Lax` for `cookie_samesite` when empty string is passed (#14183)
- `dag.cli()` KeyError (#13647)
- `[kubernetes] enable_tcp_keepalive` for new installs to `True` (#15338)
- `libyaml` C library when available (#14577)
- `airflow dags show` command display TaskGroups (#14269)
- `extra` connection field (#12944)

Published by kaxil over 3 years ago
- `airflow db upgrade` to upgrade db as intended (#13267)
- `KubernetesExecutor` should accept images from `executor_config` (#13074)
- `airflow/contrib/executors` in the dist package
- [`kubernetes_generate_dag_yaml`] - Fix dag yaml generate function (#13816)
- `airflow tasks clear` cli command with `--yes` (#14188)
- `delete_dag` function of json_client (#14441)
- `KubernetesExecutor` (#14090)
- `airflow/www_rbac`
- `rbac_app`'s `db.session` use the same timezone with `@provide_session` (#14025)
- `StreamLogWriter`: Provide (no-op) close method (#10885)

Published by kaxil over 3 years ago
- `sql_alchemy_conn_secret` (#13260)
- `DROP CONSTRAINT` in MySQL during `airflow db upgrade` (#13239)
- `sync-perm` (#13377)
- `datatables.net` from 1.10.21 to 1.10.22 in /airflow/www (#13143)
- `datatables.net` JS to 1.10.23 (#13253)
- `dompurify` from 2.0.12 to 2.2.6 in /airflow/www (#13164)
- `cattrs` version (#13223)
- `python-daemon` limit for python 3.8+ to fix daemon crash (#13540)
- `worker_concurrency` to 16 (#13612)
- `dag_id` is None (#13619)
- `continue_token` for cleanup list pods (#13563)
- `max_tis_per_query` to `0` now correctly removes the limit (#13512)
- `BaseBranchOperator` will push to xcom by default (#13704) (#13763)
- `configuration.getsection` (#13804)
- `Website.can_read` access to default roles (#13923)
- `FileTaskHandler` (#14001)
- `v1/config` endpoint respect webserver `expose_config` setting (#14020)
- `min_file_process_interval` to decrease CPU Usage (#13664)
- `os.fork` & `CeleryExecutor` (#13265)
- `example_kubernetes_executor` example dag (#13216)
- `flask-swagger`, `funcsigs` (#13178)
- `queued_by_job_id` & `external_executor_id` Columns to TI View (#13266)
- `json-merge-patch` an optional library and unpin it (#13175)
- `setup.py` to better reflect changes in providers (#13314)
- `pyjwt` and Add integration tests for Apache Pinot (#13195)
- `setup.cfg` (#13409)
- `__eq__` methods in models Dag and BaseOperator (#13449)
- `contextdecorator` (#13455)
- `mysql-connector-python` to allow 8.0.22 (#13370)
- `NotFound` response for DELETE methods in OpenAPI YAML (#13550)
- `[core] lazy_load_plugins` is `False` (#13578)
- `colorlog` dependency (#13176)
- `python3-openid` dependency (#13714)
- `__repr__` for Executors (#13753)
- `conn_type` is missing (#13778)
- `get_connnection` REST endpoint (#13885)
- `airflow_local_settings.py` to fix an error message (#13927)
- `TriggerDagRunOperator` (#13964)
- `start_date` (REST API) (#13959)
- `OperationalError` (#14032)
- `rbac` UI (#13569)

Published by kaxil almost 4 years ago
The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I’ll simply share some of the major features in 2.0.0 compared to 1.10.14:
(Known in the 2.0.0 alphas as Functional DAGs.)
DAGs are now much nicer to author, especially when using PythonOperator: dependencies are handled more clearly, and XCom is easier to use.
A quick teaser of what DAGs can now look like:
```python
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago


@dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
def tutorial_taskflow_api_etl():
    @task
    def extract():
        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}

    @task
    def transform(order_data_dict: dict) -> dict:
        total_order_value = 0
        for value in order_data_dict.values():
            total_order_value += value
        return {"total_order_value": total_order_value}

    @task()
    def load(total_order_value: float):
        print("Total order value is: %.2f" % total_order_value)

    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])


tutorial_etl_dag = tutorial_taskflow_api_etl()
```
We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification. Read more here: REST API Documentation.
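As a sketch of what the stable API looks like in practice, the snippet below builds (but does not send) a request that would trigger a DAG run. The host, credentials, and DAG id are placeholders; the endpoint path and payload shape follow the v1 OpenAPI specification.

```python
import base64
import json
import urllib.request

# Placeholders for illustration only.
host = "http://localhost:8080"
dag_id = "tutorial_taskflow_api_etl"
auth = base64.b64encode(b"admin:admin").decode()

# POST /api/v1/dags/{dag_id}/dagRuns triggers a new DAG run.
req = urllib.request.Request(
    f"{host}/api/v1/dags/{dag_id}/dagRuns",
    data=json.dumps({"conf": {}}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {auth}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually submit it; not executed here.
```

Note that, unlike the old experimental API, every such call must carry credentials accepted by the configured auth backend.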
As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.
Over at Astronomer.io we’ve benchmarked the scheduler, and it’s fast (we had to triple-check the numbers, as we didn’t quite believe them at first!).
It’s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.
To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5, and MariaDB won’t work with more than one scheduler I’m afraid).
There’s no config or other set up required to run more than one scheduler—just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.
For more information, read the Scheduler HA documentation.
SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task in parallel!). To improve this experience, we’ve introduced “Task Groups”: a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.

SubDAGs will still work for now, but we think any previous use of SubDAGs can now be replaced with Task Groups. If you find an example where this isn’t the case, please let us know by opening an issue on GitHub.
For more information, check out the Task Group documentation.
We’ve given the Airflow UI a visual refresh and updated some of the styling. Check out the UI section of the docs for screenshots.
We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).
If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with “reschedule” mode. To improve this, we’ve added a new mode called “Smart Sensors”.
This feature is in “early access”: it’s been well tested by Airbnb and is “stable”/usable, but we reserve the right to make backwards-incompatible changes to it in a future release (if we have to; we’ll try very hard not to!).
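Enabling it is a deploy-time switch rather than a DAG change. A sketch of the relevant `airflow.cfg` section follows; the option names here reflect the 2.0 smart sensor documentation, so verify them against your version before relying on this fragment:

```ini
[smart_sensor]
# Turn the feature on; sensor tasks of the enabled classes are then
# consolidated into a small number of long-running smart sensor tasks.
use_smart_sensor = true
# Number of smart sensor tasks the poking work is sharded across.
shards = 5
# Comma-separated list of sensor classes routed through smart sensors.
sensors_enabled = NamedHivePartitionSensor
```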
For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.
We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.
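For example, a minimal `pod_template_file` might look like the fragment below. The image tag and resource values are placeholders; the worker container must be named `base` for the executor to find it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-template
spec:
  restartPolicy: Never
  containers:
    - name: base                      # KubernetesExecutor expects this name
      image: apache/airflow:2.0.0     # placeholder tag
      resources:
        requests:
          memory: "512Mi"
          cpu: "250m"
```

Deploy-wide defaults live in this file, while per-task deviations go through the `pod_override` parameter described above.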
Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from “building” blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.
The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.
But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.
Security
As part of the Airflow 2.0 effort, there has been a conscious focus on security and on reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
Configuration in the form of the airflow.cfg file has been rationalized further in distinct sections, specifically around “core”. Additionally, a significant amount of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.
We’ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: we re-organized the layout of operators (they now all live under airflow.providers.*), but the old names should continue to work - you’ll just notice a lot of DeprecationWarnings that need to be fixed up.
Published by kaxil almost 4 years ago
- `depends_on_past` or `task_concurrency` are stuck (#12663)
- `force_log_out_after` was not used (#12661)
- `setup_requires` (#12880)
- `[scheduler] max_threads` to `[scheduler] parsing_processes` (#12605)

Published by ashb almost 4 years ago
Published by ashb almost 4 years ago
The first release of the `apache-airflow-upgrade-check` module, based off the v1-10-stable branch.
Published by kaxil almost 4 years ago
- `kubernetes` to a max version of 11.0.0 (#11974)
- `airflow test` only works for tasks in 1.10, not whole dags (#11191)

Published by ashb almost 4 years ago
Major features since beta2:
Published by kaxil about 4 years ago
- `provide_context=True` (#8256)
- `image` in `KubernetesPodOperator` to be templated (#10068)

Published by kaxil over 4 years ago
- `/dag_stats` & `/task_stats` (#8742)
- `conf` parameter to Spark JDBC Hook (#8787)
- `|safe` filter in code, it's risky (#9180)
- `DAG.__init__` (#8225)

Published by kaxil over 4 years ago
- `none_failed` consistent with documentation (#7464)

Published by kaxil over 4 years ago
Published by kaxil over 4 years ago
- `in_cluster` value in KubernetesPodOperator respect config (#6124)
- `airflow dags show` command guide (#7014)
- `.autoenv_leave.zsh` to .gitignore (#6986)

Published by kaxil almost 5 years ago
- `tty` parameter in Docker related operators (#6542)
- `airflow worker` (#6709)
- `AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD` for example) (#6801)
- `[emr]` etc. extra still work)
- `email` parameters of BaseOperator (#6315)

Published by ashb about 5 years ago
- `aws_session_token` in extra_config of the aws hook (#6303)
- `airflow backfill` command (#6195)
- `airflow/utils/dag_processing.py` (#6314)
- `sudo` to kill cleared tasks when running with impersonation (#6026) (#6176)

Published by kaxil about 5 years ago
- `none_skipped` trigger rule (#5032)
- `airflow test` (#4828)
- `allowed_states` for ExternalTaskSensor (#4536)
- `fallback` arg in airflow.configuration.get (#4567)
- `query_params` in `BigQueryOperator` is wrong (#4876)