airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

APACHE-2.0 License

Downloads: 57.5M
Stars: 36.1K
Committers: 3.2K


airflow - Apache Airflow 2.10.0 Latest Release

Published by ephraimbuddy 2 months ago

Significant Changes

Datasets no longer trigger inactive DAGs (#38891)

Previously, when a DAG was paused or removed, incoming dataset events would still
trigger it, and the DAG would run once it was unpaused or added back to a DAG
file. This has been changed; a DAG's dataset schedule can now only be satisfied
by events that occur while the DAG is active. While this is a breaking change,
the previous behavior is considered a bug.

The behavior of time-based scheduling is unchanged, including the timetable part
of DatasetOrTimeSchedule.
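The rule above can be illustrated with a toy model. This is only a sketch of the described behavior, not Airflow code: ToyDag, its fields, and the dataset URIs are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDag:
    """Hypothetical stand-in for a dataset-scheduled DAG (not an Airflow class)."""

    dag_id: str
    is_active: bool = True
    queued_events: list[str] = field(default_factory=list)

    def receive(self, dataset_uri: str) -> None:
        # Behavior described for 2.10.0: events that arrive while the DAG
        # is paused or removed are dropped instead of being queued.
        if self.is_active:
            self.queued_events.append(dataset_uri)


dag = ToyDag("consumer")
dag.is_active = False
dag.receive("s3://bucket/a.csv")  # ignored: the DAG is inactive
dag.is_active = True
dag.receive("s3://bucket/b.csv")  # counted: the DAG is active
print(dag.queued_events)  # ['s3://bucket/b.csv']
```

Under the earlier behavior, the first event would also have been queued and would have triggered a run after unpausing.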

try_number is no longer incremented during task execution (#39336)

Previously, the try number (try_number) was incremented at the beginning of task execution on the worker. This was problematic for several reasons.
First, the try number was incremented when it should not have been, namely when resuming from reschedule or deferral. Second,
the try number was "wrong" when the task had not yet started. The workarounds for these two issues caused a lot of confusion.

Now the try number for a task run is determined when the task is scheduled; it does not change in flight and is never decremented.
After the task runs, the observed try number remains the same as it was while the task was running; only when there is a "new try" will the try number be incremented again.

One consequence of this change is that if users run tasks "manually" (e.g. by calling ti.run() directly, or with the airflow tasks run command),
the try number is no longer incremented. Airflow assumes that tasks are always run after being scheduled by the scheduler, so we do not regard this as a breaking change.
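A toy model may make the new rule easier to follow. It is a sketch under stated assumptions, not Airflow's implementation: ToyTaskInstance and its method names are hypothetical, and starting the counter at 0 is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class ToyTaskInstance:
    """Hypothetical model of the 2.10.0 try_number rules (not Airflow code)."""

    try_number: int = 0  # assumed starting value for this sketch

    def schedule_new_try(self) -> None:
        # The try number is fixed at the moment a new try is scheduled.
        self.try_number += 1

    def start_on_worker(self) -> None:
        # Starting execution on the worker no longer changes try_number.
        pass

    def resume_after_deferral(self) -> None:
        # Resuming from deferral or reschedule is still the same try.
        pass


ti = ToyTaskInstance()
ti.schedule_new_try()
ti.start_on_worker()
ti.resume_after_deferral()
first_try = ti.try_number   # stays the same during and after the run

ti.schedule_new_try()       # a retry is a "new try"
second_try = ti.try_number
print(first_try, second_try)  # 1 2
```

In this model, a task started without going through schedule_new_try() keeps its previous try number, which mirrors the note about manually run tasks.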

/logout endpoint in FAB Auth Manager is now CSRF protected (#40145)

The /logout endpoint's method in FAB Auth Manager has been changed from GET to POST in all existing
AuthViews (AuthDBView, AuthLDAPView, AuthOAuthView, AuthOIDView, AuthRemoteUserView), and
now includes CSRF protection to enhance security and prevent unauthorized logouts.
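For scripts that previously logged out with a plain GET, the change means sending a POST that carries a CSRF token. The sketch below only builds the request with the standard library and does not send it. The base URL, the token value, and the csrf_token form field name are assumptions for illustration; check what your deployment's login form actually uses. A real client would also need to send the session cookie.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_logout_request(base_url: str, csrf_token: str) -> Request:
    """Build (but do not send) a POST request to the /logout endpoint."""
    # "csrf_token" is an assumed form field name, not confirmed by the release note.
    body = urlencode({"csrf_token": csrf_token}).encode("ascii")
    return Request(f"{base_url.rstrip('/')}/logout", data=body, method="POST")


req = build_logout_request("http://localhost:8080", "token-from-session")
print(req.get_method(), req.full_url)  # POST http://localhost:8080/logout
```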

OpenTelemetry Traces for Apache Airflow (#37948)

This feature allows Apache Airflow to emit, in OpenTelemetry format, 1) Airflow system traces of the scheduler,
triggerer, executor, and processor, and 2) DAG run traces for deployed DAG runs. Previously, only metrics could be emitted in OpenTelemetry format.
Traces give users richer data, which can be sent to OTLP-compatible endpoints using the OpenTelemetry standard.

TaskFlow decorators (@skip_if, @run_if) to make it simple to conditionally skip or run a Task (#41116)

This feature adds decorators that make it simple to skip a Task based on a condition.

Using Multiple Executors Concurrently (#40701)

Previously known as hybrid executors, this new feature allows Airflow to use multiple executors concurrently. DAGs, or even individual tasks, can be configured
to use the specific executor that best suits their needs. A single DAG can contain tasks that all use different executors.
Note: This feature is still experimental. See the Executor documentation (https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#using-multiple-executors-concurrently) for a more detailed description.

Scarf based telemetry: Does Airflow collect any telemetry data? (#39510)

Airflow integrates Scarf to collect basic usage data during operation. Deployments can opt out of data collection by setting the [usage_data_collection] enabled option to False, or by setting the SCARF_ANALYTICS=false environment variable.
See the FAQ (https://airflow.apache.org/docs/apache-airflow/stable/faq.html#does-airflow-collect-any-telemetry-data) for more information.
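As a minimal sketch, both opt-out routes can be expressed as environment variables. The first line relies on Airflow's general AIRFLOW__{SECTION}__{KEY} convention for overriding config options; the second uses the variable named above.

```python
import os

# Option 1: the [usage_data_collection] enabled option, set through
# Airflow's AIRFLOW__{SECTION}__{KEY} environment-variable convention.
os.environ["AIRFLOW__USAGE_DATA_COLLECTION__ENABLED"] = "False"

# Option 2: the Scarf-specific variable named in the release note.
os.environ["SCARF_ANALYTICS"] = "false"
```

Setting these from Python only affects the current process and the children it starts. In a deployment they would normally be set in the environment of the Airflow components before those components start.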

New Features

  • AIP-61 Hybrid Execution (https://github.com/apache/airflow/pulls?q=is%3Apr+label%3Aarea%3Ahybrid-executors+is%3Aclosed+milestone%3A%22Airflow+2.10.0%22)
  • AIP-62 Getting Lineage from Hook Instrumentation (https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-62+milestone%3A%22Airflow+2.10.0%22)
  • AIP-64 TaskInstance Try History (https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-64+milestone%3A%22Airflow+2.10.0%22)
  • AIP-44 Internal API (https://github.com/apache/airflow/pulls?q=is%3Apr+label%3AAIP-44+milestone%3A%22Airflow+2.10.0%22+is%3Aclosed)
  • Enable ending the task directly from the triggerer without going into the worker. (#40084)
  • Extend dataset dependencies (#40868)
  • Feature/add token authentication to internal api (#40899)
  • Add example DAGs for inlet_events (#39893)
  • Decorator for Task Flow, to make it simple to apply whether or not to skip a Task. (#41116)
  • Add start execution from triggerer support to dynamic task mapping (#39912)
  • Add try_number to log table (#40739)
  • Added ds_format_locale method in macros which allows localizing datetime formatting using Babel (#40746)
  • Add DatasetAlias to support dynamic Dataset Event Emission and Dataset Creation (#40478, #40723, #40809, #41264, #40830, #40693, #41302)
  • Use sentinel to mark dag as removed on re-serialization (#39825)
  • Add parameter for the last number of queries to the DB in DAG file processing stats (#40323)
  • Add prototype version dark mode for Airflow UI (#39355)
  • Add ability to mark some tasks as successful in dag test (#40010)
  • Allow use of callable for template_fields (#37028)
  • Filter running/failed and active/paused dags on the home page (#39701)
  • Add metrics about task CPU and memory usage (#39650)
  • UI changes for DAG Re-parsing feature (#39636)
  • Add Scarf based telemetry (#39510, #41318)
  • Add dag re-parsing request endpoint (#39138)
  • Redirect to new DAGRun after trigger from Grid view (#39569)
  • Display endDate in task instance tooltip. (#39547)
  • Implement accessors to read dataset events defined as inlet (#39367, #39893)
  • Add color to log lines in UI for error and warnings based on keywords (#39006)
  • Add Rendered k8s pod spec tab to ti details view (#39141)
  • Make audit log before/after filterable (#39120)
  • Consolidate grid collapse actions to a single full screen toggle (#39070)
  • Implement Metadata to emit runtime extra (#38650)
  • Add executor field to the DB and parameter to the operators (#38474)
  • Implement context accessor for DatasetEvent extra (#38481)
  • Add dataset event info to dag graph (#41012)
  • Add button to toggle datasets on/off in dag graph (#41200)
  • Add run_if & skip_if decorators (#41116)
  • Add dag_stats rest api endpoint (#41017)
  • Add listeners for Dag import errors (#39739)
  • Allowing DateTimeSensorAsync, FileSensor and TimeSensorAsync to start execution from trigger during dynamic task mapping (#41182)

Improvements

  • Allow set Dag Run resource into Dag Level permission: extends Dag's access_control feature to allow Dag Run resource permissions. (#40703)
  • Improve security and error handling for the internal API (#40999)
  • Datasets UI Improvements (#40871)
  • Change DAG Audit log tab to Event Log (#40967)
  • Make standalone dag file processor work in DB isolation mode (#40916)
  • Show only the source on the consumer DAG page and only triggered DAG run in the producer DAG page (#41300)
  • Update metrics names to allow multiple executors to report metrics (#40778)
  • Format DAG run count (#39684)
  • Update styles for renderedjson component (#40964)
  • Improve ATTRIBUTE_REMOVED sentinel to use class and more context (#40920)
  • Make XCom display as react json (#40640)
  • Replace usages of task context logger with the log table (#40867)
  • Rollback for all retry exceptions (#40882) (#40883)
  • Support rendering ObjectStoragePath value (#40638)
  • Add try_number and map_index as params for log event endpoint (#40845)
  • Rotate fernet key in batches to limit memory usage (#40786)
  • Add gauge metric for 'last_num_of_db_queries' parameter (#40833)
  • Set parallelism log messages to warning level for better visibility (#39298)
  • Add error handling for encoding the dag runs (#40222)
  • Use params instead of dag_run.conf in example DAG (#40759)
  • Load Example Plugins with Example DAGs (#39999)
  • Stop deferring TimeDeltaSensorAsync task when the target_dttm is in the past (#40719)
  • Send important executor logs to task logs (#40468)
  • Open external links in new tabs (#40635)
  • Attempt to add ReactJSON view to rendered templates (#40639)
  • Speeding up regex match time for custom warnings (#40513)
  • Refactor DAG.dataset_triggers into the timetable class (#39321)
  • add next_kwargs to StartTriggerArgs (#40376)
  • Improve UI error handling (#40350)
  • Remove double warning in CLI when config value is deprecated (#40319)
  • Implement XComArg concat() (#40172)
  • Added get_extra_dejson method with nested parameter which allows you to specify if you want the nested json as string to be also deserialized (#39811)
  • Add executor field to the task instance API (#40034)
  • Support checking for db path absoluteness on Windows (#40069)
  • Introduce StartTriggerArgs and prevent start trigger initialization in scheduler (#39585)
  • Add task documentation to details tab in grid view (#39899)
  • Allow executors to be specified with only the class name of the Executor (#40131)
  • Remove obsolete conditional logic related to try_number (#40104)
  • Allow Task Group Ids to be passed as branches in BranchMixIn (#38883)
  • Javascript connection form will apply CodeMirror to all textarea's dynamically (#39812)
  • Determine needs_expansion at time of serialization (#39604)
  • Add indexes on dag_id column in referencing tables to speed up deletion of dag records (#39638)
  • Add task failed dependencies to details page (#38449)
  • Remove webserver try_number adjustment (#39623)
  • Implement slicing in lazy sequence (#39483)
  • Unify lazy db sequence implementations (#39426)
  • Add __getattr__ to task decorator stub (#39425)
  • Allow passing labels to FAB Views registered via Plugins (#39444)
  • Simpler error message when trying to offline migrate with sqlite (#39441)
  • Add soft_fail to TriggerDagRunOperator (#39173)
  • Rename "dataset event" in context to use "outlet" (#39397)
  • Resolve RemovedIn20Warning in airflow task command (#39244)
  • Determine fail_stop on client side when db isolated (#39258)
  • Refactor cloudpickle support in Python operators/decorators (#39270)
  • Update trigger kwargs migration to specify existing_nullable (#39361)
  • Allowing tasks to start execution directly from triggerer without going to worker (#38674)
  • Better db migrate error messages (#39268)
  • Add stacklevel into the suppress_and_warn warning (#39263)
  • Support searching by dag_display_name (#39008)
  • Allow sort by on all fields in MappedInstances.tsx (#38090)
  • Expose count of scheduled tasks in metrics (#38899)
  • Use declarative_base from sqlalchemy.orm instead of sqlalchemy.ext.declarative (#39134)
  • Add example DAG to demonstrate emitting approaches (#38821)
  • Give on_task_instance_failed access to the error that caused the failure (#38155)
  • Simplify dataset serialization (#38694)
  • Add heartbeat recovery message to jobs (#34457)
  • Remove select_column option in TaskInstance.get_task_instance (#38571)
  • Don't create session in get_dag if not reading dags from database (#38553)
  • Add a migration script for encrypted trigger kwargs (#38358)
  • Implement render_templates on TaskInstancePydantic (#38559)
  • Handle optional session in _refresh_from_db (#38572)
  • Make type annotation less confusing in task_command.py (#38561)
  • Use fetch_dagrun directly to avoid session creation (#38557)
  • Added output_processor parameter to BashProcessor (#40843)
  • Improve serialization for Database Isolation Mode (#41239)
  • Only orphan non-orphaned Datasets (#40806)
  • Adjust gantt width based on task history dates (#41192)
  • Enable scrolling on legend with high number of elements. (#41187)

Bug Fixes

  • Bugfix for get_parsing_context() when ran with LocalExecutor (#40738)
  • Validating provider documentation urls before displaying in views (#40933)
  • Move import to make PythonOperator working on Windows (#40424)
  • Fix dataset_with_extra_from_classic_operator example DAG (#40747)
  • Call listener on_task_instance_failed() after ti state is changed (#41053)
  • Add never_fail in BaseSensor (#40915)
  • Fix tasks API endpoint when DAG doesn't have start_date (#40878)
  • Fix and adjust URL generation for UI grid and older runs (#40764)
  • Rotate fernet key optimization (#40758)
  • Fix class instance vs. class type in validate_database_executor_compatibility() call (#40626)
  • Clean up dark mode (#40466)
  • Validate expected types for args for DAG, BaseOperator and TaskGroup (#40269)
  • Exponential Backoff Not Functioning in BaseSensorOperator Reschedule Mode (#39823)
  • local task job: add timeout, to not kill on_task_instance_success listener prematurely (#39890)
  • Move Post Execution Log Grouping behind Exception Print (#40146)
  • Fix triggerer race condition in HA setting (#38666)
  • Pass triggered or existing DAG Run logical date to DagStateTrigger (#39960)
  • Passing external_task_group_id to WorkflowTrigger (#39617)
  • ECS Executor: Set tasks to RUNNING state once active (#39212)
  • Only heartbeat if necessary in backfill loop (#39399)
  • Fix trigger kwarg encryption migration (#39246)
  • Fix decryption of trigger kwargs when downgrading. (#38743)
  • Fix wrong link in TriggeredDagRuns (#41166)
  • Pass MapIndex to LogLink component for external log systems (#41125)
  • Add NonCachingRotatingFileHandler for worker task (#41064)
  • Add argument include_xcom in method resolve an optional value (#41062)
  • Sanitizing file names in example_bash_decorator DAG (#40949)
  • Show dataset aliases in dependency graphs (#41128)
  • Render Dataset Conditions in DAG Graph view (#41137)
  • Add task duration plot across dagruns (#40755)
  • Add start execution from trigger support for existing core sensors (#41021)
  • add example dag for dataset_alias (#41037)
  • Add dataset alias unique constraint and remove wrong dataset alias removing logic (#41097)
  • Set "has_outlet_datasets" to true if "dataset alias" exists (#41091)
  • Make HookLineageCollector group datasets by (#41034)
  • Enhance start_trigger_args serialization (#40993)
  • Refactor BaseSensorOperator introduce skip_policy parameter (#40924)
  • Fix viewing logs from triggerer when task is deferred (#41272)
  • Refactor how triggered dag run url is replaced (#41259)
  • Added support for additional sql alchemy session args (#41048)
  • Allow empty list in TriggerDagRun failed_state (#41249)
  • Clean up the exception handler when run_as_user is the airflow user (#41241)
  • Collapse docs when click and folded (#41214)
  • Update updated_at when saving to db as session.merge does not trigger on-update (#40782)
  • Fix query count statistics when parsing DAG file (#41149)
  • Method Resolution Order in operators without __init__ (#41086)
  • Ensure try_number incremented for empty operator (#40426)

Miscellaneous

  • Remove the Experimental flag from OTel Traces (#40874)
  • Bump packaging version to 23.0 in order to fix issue with older otel (#40865)
  • Simplify _auth_manager_is_authorized_map function (#40803)
  • Use correct unknown executor exception in scheduler job (#40700)
  • Add D1 pydocstyle rules to pyproject.toml (#40569)
  • Enable enforcing pydocstyle rule D213 in ruff. (#40448, #40464)
  • Update Dag.test() to run with an executor if desired (#40205)
  • Update jest and babel minor versions (#40203)
  • Refactor BashOperator and Bash decorator for consistency and simplicity (#39871)
  • Add AirflowInternalRuntimeError for raising non-catchable errors (#38778)
  • ruff version bump 0.4.5 (#39849)
  • Bump pytest to 8.0+ (#39450)
  • Remove stale comment about TI index (#39470)
  • Configure back_populates between DagScheduleDatasetReference.dag and DagModel.schedule_dataset_references (#39392)
  • Remove deprecation warnings in endpoints.py (#39389)
  • Fix SQLA deprecations in Airflow core (#39211)
  • Use class-bound attribute directly in SA (#39198, #39195)
  • Fix stacklevel for TaskContextLogger (#39142)
  • Capture warnings during collect DAGs (#39109)
  • Resolve B028 (no-explicit-stacklevel) in core (#39123)
  • Rename model ImportError to ParseImportError to avoid shadowing the builtin exception (#39116)
  • Add option to support cloudpickle in PythonVenv/External Operator (#38531)
  • Suppress SubDagOperator examples warnings (#39057)
  • Add log for running callback (#38892)
  • Use model_dump instead of dict for serialize Pydantic V2 model (#38933)
  • Widen cheat sheet column to avoid wrapping commands (#38888)
  • Update hatchling to latest version (1.22.5) (#38780)
  • bump uv to 0.1.29 (#38758)
  • Add missing serializations found during provider tests fixing (#41252)
  • Bump ws from 7.5.5 to 7.5.10 in /airflow/www (#40288)
  • Improve typing for allowed/failed_states in TriggerDagRunOperator (#39855)

Doc Only Changes

  • Add filesystems and dataset-uris to "how to create your own provider" page (#40801)
  • Fix (TM) to (R) in Airflow repository (#40783)
  • Set otel_on to True in example airflow.cfg (#40712)
  • Add warning for _AIRFLOW_PATCH_GEVENT (#40677)
  • Update multi-team diagram proposal after Airflow 3 discussions (#40671)
  • Add stronger warning that MSSQL is not supported and no longer functional (#40565)
  • Fix misleading mac menu structure in howto (#40440)
  • Update k8s supported version in docs (#39878)
  • Add compatibility note for Listeners (#39544)
  • Update edge label image in documentation example with the new graph view (#38802)
  • Update UI doc screenshots (#38680)
  • Add section "Manipulating queued dataset events through REST API" (#41022)
  • Add information about lack of security guarantees for docker compose (#41072)
  • Add links to example dags in use params section (#41031)
  • Change task_id from send_email to send_email_notification in taskflow.rst (#41060)
  • Remove unnecessary nginx redirect rule from reverse proxy documentation (#38953)

airflow - Apache Airflow Helm Chart 1.15.0

Published by jedcunningham 3 months ago

Significant Changes

Default Airflow image is updated to 2.9.3 (#40816)

The default Airflow image that is used with the Chart is now 2.9.3; previously it was 2.9.2.

Default PgBouncer Exporter image has been updated (#40318)

The PgBouncer Exporter image has been updated to airflow-pgbouncer-exporter-2024.06.18-0.17.0, which addresses CVE-2024-24786.

New Features

  • Add git-sync container lifecycle hooks (#40369)
  • Add init containers for jobs (#40454)
  • Add persistent volume claim retention policy (#40271)
  • Add annotations for Redis StatefulSet (#40281)
  • Add dags.gitSync.sshKey, which allows the git-sync private key to be configured in the values file directly (#39936)
  • Add extraEnvFrom to git-sync containers (#39031)

Improvements

  • Link in UIAlert to production guide when a dynamic webserver secret is used now opens in a new tab (#40635)
  • Support disabling helm hooks on extraConfigMaps and extraSecrets (#40294)

Bug Fixes

  • Add git-sync ssh secret to DAG processor (#40691)
  • Fix duplicated safeToEvict annotations (#40554)
  • Add missing triggerer.keda.usePgbouncer to values.yaml (#40614)
  • Trim leading // character using mysql backend (#40401)

Doc only changes

  • Updating chart download link to use the Apache download CDN (#40618)

Misc

  • Update PgBouncer exporter image to airflow-pgbouncer-exporter-2024.06.18-0.17.0 (#40318)
  • Default airflow version to 2.9.3 (#40816)
  • Fix startupProbe timing comment (#40412)

airflow - Apache Airflow 2.9.3

Published by utkarsharma2 3 months ago

Significant Changes

Time unit for scheduled_duration and queued_duration changed (#37936)

scheduled_duration and queued_duration metrics are now emitted in milliseconds instead of seconds.

By convention, all statsd metrics should be emitted in milliseconds; downstream tools such as the Prometheus statsd-exporter expect this.
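Dashboards or alerts that assumed seconds for these two metrics need a unit conversion after upgrading. A minimal sketch of that arithmetic follows; the helper name and the sample value are made up for illustration.

```python
def duration_ms_to_seconds(value_ms: float) -> float:
    """Convert a duration reported in milliseconds back to seconds."""
    return value_ms / 1000.0


# Hypothetical sample: a queued_duration reading taken after the upgrade.
queued_duration_ms = 2500.0
print(duration_ms_to_seconds(queued_duration_ms))  # 2.5
```

An alert threshold written as "queued_duration > 30" (seconds) would correspondingly become "queued_duration > 30000" (milliseconds).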

Support for OpenTelemetry Metrics is no longer "Experimental" (#40286)

Experimental support for OpenTelemetry was added in 2.7.0. Since then, fixes and improvements have been added, and the feature is now announced as stable.

Bug Fixes

  • Fix calendar view scroll (#40458)
  • Validating provider description for urls in provider list view (#40475)
  • Fix compatibility with old MySQL 8.0 (#40314)
  • Fix dag (un)pausing not working in environments where dag files are missing (#40345)
  • Extra being passed to SQLalchemy (#40391)
  • Handle unsupported operand int + str when value of tag is int (job_id) (#40407)
  • Fix TriggeredDagRunOperator triggered link (#40336)
  • Add [webserver]update_fab_perms to deprecated configs (#40317)
  • Swap dag run link from legacy graph to grid with graph tab (#40241)
  • Change httpx to requests in file_task_handler (#39799)
  • Fix import future annotations in venv jinja template (#40208)
  • Ensures DAG params order regardless of backend (#40156)
  • Use a join for TI notes in TI batch API endpoint (#40028)
  • Improve trigger UI for string array format validation (#39993)
  • Disable jinja2 rendering for doc_md (#40522)
  • Skip checking sub dags list if taskinstance state is skipped (#40578)
  • Recognize quotes when parsing urls in logs (#40508)

Doc Only Changes

  • Add notes about passing secrets via environment variables (#40519)
  • Revamp some confusing log messages (#40334)
  • Add more precise description of masking sensitive field names (#40512)
  • Add slightly more detailed guidance about upgrading to the docs (#40227)
  • Metrics allow_list complete example (#40120)
  • Add warning to deprecated api docs that access control isn't applied (#40129)
  • Simpler command to check local scheduler is alive (#40074)
  • Add a note and an example clarifying the usage of DAG-level params (#40541)
  • Fix highlight of example code in dags.rst (#40114)
  • Add warning about the PostgresOperator being deprecated (#40662)
  • Updating airflow download links to CDN based links (#40618)
  • Fix import statement for DatasetOrTimetable example (#40601)
  • Further clarify triage process (#40536)
  • Fix param order in PythonOperator docstring (#40122)
  • Update serializers.rst to mention that bytes are not supported (#40597)

Miscellaneous

  • Upgrade build installers and dependencies (#40177)
  • Bump braces from 3.0.2 to 3.0.3 in /airflow/www (#40180)
  • Upgrade to another version of trove-classifier (new CUDA classifiers) (#40564)
  • Rename "try_number" increments that are unrelated to the airflow concept (#39317)
  • Update trove classifiers to the latest version as build dependency (#40542)
  • Upgrade to latest version of hatchling as build dependency (#40387)
  • Fix bug in SchedulerJobRunner._process_executor_events (#40563)
  • Remove logging for "blocked" events (#40446)

airflow - Apache Airflow Helm Chart 1.14.0

Published by jedcunningham 4 months ago

Significant Changes

ClusterRole and ClusterRoleBinding names have been updated to be unique (#37197)

ClusterRoles and ClusterRoleBindings created when multiNamespaceMode is enabled have been renamed to ensure unique names:

  • {{ include "airflow.fullname" . }}-pod-launcher-role has been renamed to {{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-launcher-role
  • {{ include "airflow.fullname" . }}-pod-launcher-rolebinding has been renamed to {{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-launcher-rolebinding
  • {{ include "airflow.fullname" . }}-pod-log-reader-role has been renamed to {{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-log-reader-role
  • {{ include "airflow.fullname" . }}-pod-log-reader-rolebinding has been renamed to {{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-pod-log-reader-rolebinding
  • {{ include "airflow.fullname" . }}-scc-rolebinding has been renamed to {{ .Release.Namespace }}-{{ include "airflow.fullname" . }}-scc-rolebinding
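The renaming pattern in the list above is a namespace prefix added to each name. As a sketch, the old-to-new mapping for a given release can be computed as follows. The namespace and fullname values are hypothetical; the real fullname is whatever the chart's airflow.fullname template renders for your release.

```python
# Suffixes taken from the list of renamed resources above.
SUFFIXES = (
    "pod-launcher-role",
    "pod-launcher-rolebinding",
    "pod-log-reader-role",
    "pod-log-reader-rolebinding",
    "scc-rolebinding",
)


def renamed_cluster_resources(namespace: str, fullname: str) -> dict[str, str]:
    """Map each pre-1.14.0 resource name to its 1.14.0 name."""
    return {
        f"{fullname}-{suffix}": f"{namespace}-{fullname}-{suffix}"
        for suffix in SUFFIXES
    }


# Hypothetical release: namespace "data-eng", rendered fullname "my-airflow".
names = renamed_cluster_resources("data-eng", "my-airflow")
for old, new in names.items():
    print(f"{old} -> {new}")
```

A mapping like this can help when auditing a cluster for resources left behind under the old names after an upgrade.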

workers.safeToEvict default changed to False (#40229)

workers.safeToEvict now defaults to False. This is a safer default
as it prevents the nodes workers are running on from being scaled down by the
K8s Cluster Autoscaler (https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/#cluster-autoscaler).
If you would like to retain the previous behavior, you can set this config to True.

Default Airflow image is updated to 2.9.2 (#40160)

The default Airflow image that is used with the Chart is now 2.9.2; previously it was 2.8.3.

Default StatsD image is updated to v0.26.1 (#38416)

The default StatsD image that is used with the Chart is now v0.26.1; previously it was v0.26.0.

New Features

  • Enable MySQL KEDA support for triggerer (#37365)
  • Allow AWS Executors (#38524)

Improvements

  • Allow valueFrom in env config of components (#40135)
  • Enable templating in extraContainers and extraInitContainers (#38507)
  • Add safe-to-evict annotation to pod-template-file (#37352)
  • Support workers.command for KubernetesExecutor (#39132)
  • Add priorityClassName to Jobs (#39133)
  • Add Kerberos sidecar to pod-template-file (#38815)
  • Add templated field support for extra containers (#38510)

Bug Fixes

  • Set workers.safeToEvict default to False (#40229)

Doc only changes

  • Document extraContainers and extraInitContainers that are templated (#40033)
  • Fix typo in HorizontalPodAutoscaling documentation (#39307)
  • Fix supported k8s versions in docs (#39172)
  • Fix typo in YAML path for brokerUrlSecretName (#39115)

Misc

  • Default Airflow version to 2.9.2 (#40160)
  • Limit Redis image to 7.2 (#38928)
  • Build Helm values schemas with Kubernetes 1.29 resources (#38460)
  • Add missing containers to resources docs (#38534)
  • Upgrade StatsD Exporter image to 0.26.1 (#38416)
  • Remove K8S 1.25 support (#38367)

airflow - Apache Airflow 2.9.2

Published by utkarsharma2 4 months ago

Significant Changes

No significant changes.

Bug Fixes

  • Fix bug that makes AirflowSecurityManagerV2 leave transactions in the idle in transaction state (#39935)
  • Fix alembic auto-generation and rename mismatching constraints (#39032)
  • Add the existing_nullable to the downgrade side of the migration (#39374)
  • Fix Mark Instance state buttons stay disabled if user lacks permission (#37451). (#38732)
  • Use SKIP LOCKED instead of NOWAIT in mini scheduler (#39745)
  • Remove DAG Run Add option from FAB view (#39881)
  • Add max_consecutive_failed_dag_runs in API spec (#39830)
  • Fix example_branch_operator failing in python 3.12 (#39783)
  • Fetch served logs also when task attempt is up for retry and no remote logs available (#39496)
  • Change dataset URI validation to raise warning instead of error in Airflow 2.9 (#39670)
  • Visible DAG RUN doesn't point to the same dag run id (#38365)
  • Refactor SafeDogStatsdLogger to use get_validator to enable pattern matching (#39370)
  • Fix custom actions in security manager has_access (#39421)
  • Fix HTTP 500 Internal Server Error if DAG is triggered with bad params (#39409)
  • Fix static file caching is disabled in Airflow Webserver. (#39345)
  • Fix TaskHandlerWithCustomFormatter now adds prefix only once (#38502)
  • Do not provide deprecated execution_date in @apply_lineage (#39327)
  • Add missing conn_id to string representation of ObjectStoragePath (#39313)
  • Fix sql_alchemy_engine_args config example (#38971)
  • Add Cache-Control "no-store" to all dynamically generated content (#39550)

Miscellaneous

  • Limit yandex provider to avoid mypy errors (#39990)
  • Warn on mini scheduler failures instead of debug (#39760)
  • Change type definition for provider_info_cache decorator (#39750)
  • Better typing for BaseOperator defer (#39742)
  • More typing in TimeSensor and TimeSensorAsync (#39696)
  • Re-raise exception from strict dataset URI checks (#39719)
  • Fix stacklevel for _log_state helper (#39596)
  • Resolve SA warnings in migrations scripts (#39418)
  • Remove unused index idx_last_scheduling_decision on dag_run table (#39275)

Doc Only Changes

  • Provide extra tip on labeling DynamicTaskMapping (#39977)
  • Improve visibility of links / variables / other configs in Configuration Reference (#39916)
  • Remove 'legacy' definition for CronDataIntervalTimetable (#39780)
  • Update plugins.rst examples to use pyproject.toml over setup.py (#39665)
  • Fix nit in pg set-up doc (#39628)
  • Add Matomo to Tracking User Activity docs (#39611)
  • Fix Connection.get -> Connection.get_connection_from_secrets (#39560)
  • Adding note for provider dependencies (#39512)
  • Update docker-compose command (#39504)
  • Update note about restarting triggerer process (#39436)
  • Updating S3LogLink with an invalid bucket link (#39424)
  • Update testing_packages.rst (#38996)
  • Add multi-team diagrams (#38861)

airflow - Apache Airflow 2.9.1

Published by ephraimbuddy 6 months ago

Significant Changes

Stackdriver logging bugfix requires Google provider 10.17.0 or later (#38071)

If you use Stackdriver logging, you must use Google provider version 10.17.0 or later. Airflow 2.9.1 now passes gcp_log_name to the StackdriverTaskHandler instead of name, and this will fail on earlier provider versions.

This fixes a bug where the log name configured in [logging] remote_base_log_folder was overridden when Airflow configured logging, resulting in task logs going to the wrong destination.
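Before upgrading, it can be worth checking the installed Google provider against the 10.17.0 minimum. The sketch below compares plain dotted version strings numerically. The function names are hypothetical, and it handles only simple numeric releases; versions with pre-release suffixes would need a proper version parser.

```python
MIN_GOOGLE_PROVIDER = (10, 17, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '10.17.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports_gcp_log_name(installed: str) -> bool:
    """Return True if the given provider version meets the 10.17.0 minimum."""
    return parse_version(installed) >= MIN_GOOGLE_PROVIDER


print(supports_gcp_log_name("10.16.0"))  # False
print(supports_gcp_log_name("10.17.0"))  # True
```

In a real environment, the installed version string could be read with importlib.metadata.version("apache-airflow-providers-google") and passed to supports_gcp_log_name.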

Bug Fixes

  • Make task log messages include run_id (#39280)
  • Copy menu_item href for nav bar (#39282)
  • Fix trigger kwarg encryption migration (#39246, #39361, #39374)
  • Add workaround for datetime-local input in firefox (#39261)
  • Add Grid button to Task Instance view (#39223)
  • Get served logs when remote or executor logs not available for non-running task try (#39177)
  • Fixed side effect of menu filtering causing disappearing menus (#39229)
  • Use grid view for Task Instance's log_url (#39183)
  • Improve task filtering UX (#39119)
  • Improve rendered_template ux in react dag page (#39122)
  • Graph view improvements (#38940)
  • Check that the dataset<>task exists before trying to render graph (#39069)
  • Hostname was "redacted", not "redact"; remove it when there is no context (#39037)
  • Check whether AUTH_ROLE_PUBLIC is set in check_authentication (#39012)
  • Move rendering of map_index_template so it renders for failed tasks as long as it was defined before the point of failure (#38902)
  • Undeprecate BaseXCom.get_one method for now (#38991)
  • Add inherit_cache attribute for CreateTableAs custom SA Clause (#38985)
  • Don't wait for DagRun lock in mini scheduler (#38914)
  • Fix calendar view with no DAG Run (#38964)
  • Changed the background color of external task in graph (#38969)
  • Fix dag run selection (#38941)
  • Fix SAWarning 'Coercing Subquery object into a select() for use in IN()' (#38926)
  • Fix implicit cartesian product in AirflowSecurityManagerV2 (#38913)
  • Fix problem that links in legacy log view can not be clicked (#38882)
  • Fix dag run link params (#38873)
  • Use async db calls in WorkflowTrigger (#38689)
  • Fix audit log events filter (#38719)
  • Use methodtools.lru_cache instead of functools.lru_cache in class methods (#37757)
  • Raise deprecated warning in airflow dags backfill only if -I / --ignore-first-depends-on-past provided (#38676)

Miscellaneous

  • TriggerDagRunOperator deprecate execution_date in favor of logical_date (#39285)
  • Force to use Airflow Deprecation warnings categories on @deprecated decorator (#39205)
  • Add warning about run/import Airflow under the Windows (#39196)
  • Update is_authorized_custom_view from auth manager to handle custom actions (#39167)
  • Add in Trove classifiers Python 3.12 support (#39004)
  • Use debug level for minischeduler skip (#38976)
  • Bump undici from 5.28.3 to 5.28.4 in /airflow/www (#38751)

Doc Only Changes

  • Fix supported k8s version in docs (#39172)
  • Dynamic task mapping PythonOperator op_kwargs (#39242)
  • Add link to user and role commands (#39224)
  • Add k8s 1.29 to supported version in docs (#39168)
  • Data aware scheduling docs edits (#38687)
  • Update DagBag class docstring to include all params (#38814)
  • Correcting an example taskflow example (#39015)
  • Remove decorator from rendering fields example (#38827)
airflow - Apache Airflow 2.9.0

Published by ephraimbuddy 7 months ago

Significant Changes

The following Listener API methods are now considered stable and can be used in production systems (they were an experimental feature in older Airflow versions) (#36376):

Lifecycle events:

  • on_starting
  • before_stopping

DagRun State Change Events:

  • on_dag_run_running
  • on_dag_run_success
  • on_dag_run_failed

TaskInstance State Change Events:

  • on_task_instance_running
  • on_task_instance_success
  • on_task_instance_failed
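
As a minimal sketch of what a listener built on these hooks might look like, the module below implements two of the methods listed above. The `events` list is a hypothetical in-memory sink chosen for illustration, and the import fallback exists only so the snippet can be run without Airflow installed; the hook argument names follow the Airflow 2.9 hook specifications as far as we know, so check them against the listener documentation for your version.

```python
try:
    from airflow.listeners import hookimpl
except ImportError:
    # Fallback for illustration only: lets the sketch run without Airflow.
    def hookimpl(func):
        return func

# Hypothetical in-memory sink; a real listener might emit metrics or alerts.
events = []


@hookimpl
def on_dag_run_failed(dag_run, msg):
    """Record every DAG run that ends in the failed state."""
    events.append(("dag_run_failed", dag_run.dag_id, dag_run.run_id))


@hookimpl
def on_task_instance_success(previous_state, task_instance, session):
    """Record every task instance that finishes successfully."""
    events.append(("task_success", task_instance.task_id))
```

Listeners are usually registered by listing the module in the `listeners` attribute of an Airflow plugin.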

Support for Microsoft SQL-Server for Airflow Meta Database has been removed (#36514)

After a discussion <https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4>
and a voting process <https://lists.apache.org/thread/pgcgmhf6560k8jbsmz8nlyoxosvltph2>,
the Airflow PMC and committers resolved to no longer maintain MsSQL as a supported database backend.

As of Airflow 2.9.0, MsSQL is no longer supported as an Airflow database backend.

A migration script that can help migrate the database before upgrading to Airflow 2.9.0 is available in the
airflow-mssql-migration repo on GitHub <https://github.com/apache/airflow-mssql-migration>.
Note that the migration script is provided without support or warranty.

This does not affect the existing provider packages (operators and hooks); DAGs can still access and process data from MsSQL.

Dataset URIs are now validated on input (#37005)

Datasets must use a URI that conforms to the rules laid down in AIP-60, and the value
will be automatically normalized when the DAG file is parsed. See the
documentation on Datasets <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html> for
a more detailed description of the rules.

You may need to change your Dataset identifiers if they look like a URI, but are
used in a less mainstream way, such as relying on the URI's auth section, or
have a case-sensitive protocol name.
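
To find identifiers that may be affected, a rough pre-upgrade check along the following lines could help. This is only a sketch using the standard library: `dataset_uri_concerns` is a hypothetical helper, not part of Airflow, and it covers just the two cases named above rather than the full AIP-60 rule set.

```python
from urllib.parse import urlsplit


def dataset_uri_concerns(uri: str) -> list[str]:
    """Return reasons a Dataset identifier may need attention (hypothetical helper)."""
    parts = urlsplit(uri)
    if not parts.scheme:
        # No scheme: the identifier does not look like a URI at all.
        return []
    concerns = []
    # urlsplit lowercases the scheme, so compare against the raw text instead.
    raw_scheme = uri.split(":", 1)[0]
    if raw_scheme != raw_scheme.lower():
        concerns.append("scheme is not lowercase")
    if parts.username is not None or parts.password is not None:
        concerns.append("URI contains an auth section")
    return concerns


for candidate in ["s3://bucket/key", "S3://bucket/key", "postgres://user:secret@host:5432/db"]:
    print(candidate, dataset_uri_concerns(candidate))
```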

The method get_permitted_menu_items in BaseAuthManager has been renamed to filter_permitted_menu_items (#37627)

Add REST API actions to Audit Log events (#37734)

The Audit Log event name for REST API events will be prefixed with ui. or api., depending on whether the request came from the Airflow UI or from an external caller.

Official support for Python 3.12 (#38025)

There are a few caveats though:

  • Pendulum 2 does not support Python 3.12. For Python 3.12 you need to use
    Pendulum 3 <https://pendulum.eustace.io/blog/announcing-pendulum-3-0-0.html>

  • Minimum SQLAlchemy version supported when Pandas is installed for Python 3.12 is 1.4.36 released in
    April 2022. Airflow 2.9.0 increases the minimum supported version of SQLAlchemy to 1.4.36 for all
    Python versions.

Not all Providers support Python 3.12. At the initial release of Airflow 2.9.0 the following providers
are released without support for Python 3.12:

  • apache.beam - pending Apache Beam support for 3.12 <https://github.com/apache/beam/issues/29149>
  • papermill - pending the release of a Python 3.12 compatible papermill client version
    that includes this merged issue <https://github.com/nteract/papermill/pull/771>

Prevent large string objects from being stored in the Rendered Template Fields (#38094)

There is now a limit on the length of data that can be stored in the Rendered Template Fields.
The limit is set to 4096 characters. If the data exceeds this limit, it will be truncated. You can change this limit
by setting the [core] max_templated_field_length configuration option in your Airflow config.

Change xcom table column value type to longblob for MySQL backend (#38401)

The type of the value column in the XCom table has changed from blob to longblob. This allows you to store relatively large data in XCom, but the migration can take a significant amount of time if you already have a lot of large data stored in XCom.

To downgrade from revision: b4078ac230a1, ensure that you don't have Xcom values larger than 65,535 bytes. Otherwise, you'll need to clean those rows or run airflow db clean xcom to clean the Xcom table.

New Features

  • Allow users to write dag_id and task_id in their national characters, added display name for dag / task (v2) (#38446)
  • Prevent large objects from being stored in the RTIF (#38094)
  • Use current time to calculate duration when end date is not present. (#38375)
  • Add average duration mark line in task and dagrun duration charts. (#38214, #38434)
  • Add button to manually create dataset events (#38305)
  • Add Matomo as an option for analytics_tool. (#38221)
  • Experimental: Support custom weight_rule implementation to calculate the TI priority_weight (#38222)
  • Adding ability to automatically set DAG to off after X times it failed sequentially (#36935)
  • Add dataset conditions to next run datasets modal (#38123)
  • Add task log grouping to UI (#38021)
  • Add dataset_expression to grid dag details (#38121)
  • Introduce mechanism to support multiple executor configuration (#37635)
  • Add color formatting for ANSI chars in logs from task executions (#37985)
  • Add the dataset_expression as part of DagModel and DAGDetailSchema (#37826)
  • Add TaskFail entries to Gantt chart (#37918)
  • Allow longer rendered_map_index (#37798)
  • Inherit the run_ordering from DatasetTriggeredTimetable for DatasetOrTimeSchedule (#37775)
  • Implement AIP-60 Dataset URI formats (#37005)
  • Introducing Logical Operators for dataset conditional logic (#37101)
  • Add post endpoint for dataset events (#37570)
  • Show custom instance names for a mapped task in UI (#36797)
  • Add excluded/included events to get_event_logs api (#37641)
  • Add datasets to dag graph (#37604)
  • Show dataset events above task/run details in grid view (#37603)
  • Introduce new config variable to control whether DAG processor outputs to stdout (#37439)
  • Make Datasets hashable (#37465)
  • Add conditional logic for dataset triggering (#37016)
  • Implement task duration page in react. (#35863)
  • Add queuedEvent endpoint to get/delete DatasetDagRunQueue (#37176)
  • Support multiple XCom output in the BaseOperator (#37297)
  • AIP-58: Add object storage backend for xcom (#37058)
  • Introduce DatasetOrTimeSchedule (#36710)
  • Add on_skipped_callback to BaseOperator (#36374)
  • Allow override of hovered navbar colors (#36631)
  • Create new Metrics with Tagging (#36528)
  • Add support for openlineage to AFS and common.io (#36410)
  • Introduce @task.bash TaskFlow decorator (#30176, #37875)
  • Added functionality to automatically ingest custom airflow.cfg file upon startup (#36289)

Improvements

  • More human friendly "show tables" output for db cleanup (#38654)
  • Improve trigger assign_unassigned by merging alive_triggerer_ids and get_sorted_triggers queries (#38664)
  • Add exclude/include events filters to audit log (#38506)
  • Clean up unused triggers in a single query for all dialects except MySQL (#38663)
  • Update Confirmation Logic for Config Changes on Sensitive Environments Like Production (#38299)
  • Improve datasets graph UX (#38476)
  • Only show latest dataset event timestamp after last run (#38340)
  • Add button to clear only failed tasks in a dagrun. (#38217)
  • Delete all old dag pages and redirect to grid view (#37988)
  • Check task attribute before use in sentry.add_tagging() (#37143)
  • Mysql change xcom value col type for MySQL backend (#38401)
  • ExternalPythonOperator use version from sys.version_info (#38377)
  • Replace too broad exceptions into the Core (#38344)
  • Add CLI support for bulk pause and resume of DAGs (#38265)
  • Implement methods on TaskInstancePydantic and DagRunPydantic (#38295, #38302, #38303, #38297)
  • Made filters bar collapsible and add a full screen toggle (#38296)
  • Encrypt all trigger attributes (#38233, #38358, #38743)
  • Upgrade react-table package. Use with Audit Log table (#38092)
  • Show if dag page filters are active (#38080)
  • Add try number to mapped instance (#38097)
  • Add retries to job heartbeat (#37541)
  • Add REST API events to Audit Log (#37734)
  • Make current working directory as templated field in BashOperator (#37968)
  • Add calendar view to react (#37909)
  • Add run_id column to log table (#37731)
  • Add tryNumber to grid task instance tooltip (#37911)
  • Session is not used in _do_render_template_fields (#37856)
  • Improve MappedOperator property types (#37870)
  • Remove provide_session decorator from TaskInstancePydantic methods (#37853)
  • Ensure the "airflow.task" logger used for TaskInstancePydantic and TaskInstance (#37857)
  • Better error message for internal api call error (#37852)
  • Increase tooltip size of dag grid view (#37782) (#37805)
  • Use named loggers instead of root logger (#37801)
  • Add Run Duration in React (#37735)
  • Avoid non-recommended usage of logging (#37792)
  • Improve DateTimeTrigger typing (#37694)
  • Make sure all unique run_ids render a task duration bar (#37717)
  • Add Dag Audit Log to React (#37682)
  • Add log event for auto pause (#38243)
  • Better message for exception for templated base operator fields (#37668)
  • Clean up webserver endpoints adding to audit log (#37580)
  • Filter datasets graph by dag_id (#37464)
  • Use new exception type inheriting BaseException for SIGTERMs (#37613)
  • Refactor dataset class inheritance (#37590)
  • Simplify checks for package versions (#37585)
  • Filter Datasets by associated dag_ids (GET /datasets) (#37512)
  • Enable "airflow tasks test" to run deferrable operator (#37542)
  • Make datasets list/graph width adjustable (#37425)
  • Speedup determine installed airflow version in ExternalPythonOperator (#37409)
  • Add more task details from rest api (#37394)
  • Add confirmation dialog box for DAG run actions (#35393)
  • Added shutdown color to the STATE_COLORS (#37295)
  • Remove legacy dag details page and redirect to grid (#37232)
  • Order XCom entries by map index in API (#37086)
  • Add data_interval_start and data_interval_end in dagrun create API endpoint (#36630)
  • Making links in task logs as hyperlinks by preventing HTML injection (#36829)
  • Improve ExternalTaskSensor Async Implementation (#36916)
  • Make Datasets Pathlike (#36947)
  • Simplify query for orphaned tasks (#36566)
  • Add deferrable param in FileSensor (#36840)
  • Run Trigger Page: Configurable number of recent configs (#36878)
  • Merge nowait and skip_locked into with_row_locks (#36889)
  • Return the specified field when get dag/dagRun in the REST API (#36641)
  • Only iterate over the items if debug is enabled for DagFileProcessorManager (#36761)
  • Add a fuzzy/regex pattern-matching for metric allow and block list (#36250)
  • Allow custom columns in cli dags list (#35250)
  • Make it possible to change the default cron timetable (#34851)
  • Some improvements to Airflow IO code (#36259)
  • Improve TaskInstance typing hints (#36487)
  • Remove dependency of Connexion from auth manager interface (#36209)
  • Refactor ExternalDagLink to not create ad hoc TaskInstances (#36135)

Bug Fixes

  • Load providers configuration when gunicorn workers start (#38795)
  • Fix grid header rendering (#38720)
  • Add a task instance dependency for mapped dependencies (#37498)
  • Improve stability of remove_task_decorator function (#38649)
  • Mark more fields on API as dump-only (#38616)
  • Fix total_entries count on the event logs endpoint (#38625)
  • Add padding to bottom of log block. (#38610)
  • Properly serialize nested attrs classes (#38591)
  • Fixing the tz in next run ID info (#38482)
  • Show abandoned tasks in Grid View (#38511)
  • Apply task instance mutation hook consistently (#38440)
  • Override chakra styles to keep dropdowns in filter bar (#38456)
  • Store duration in seconds and scale to handle case when a value in the series has a larger unit than the preceding durations. (#38374)
  • Don't allow defaults other than None in context parameters, and improve error message (#38015)
  • Make postgresql default engine args comply with SA 2.0 (#38362)
  • Add return statement to yield within a while loop in triggers (#38389)
  • Ensure __exit__ is called in decorator context managers (#38383)
  • Make the method BaseAuthManager.is_authorized_custom_view abstract (#37915)
  • Add upper limit to planned calendar events calculation (#38310)
  • Fix Scheduler in daemon mode doesn't create PID at the specified location (#38117)
  • Properly serialize TaskInstancePydantic and DagRunPydantic (#37855)
  • Fix graph task state border color (#38084)
  • Add back methods removed in security manager (#37997)
  • Don't log "403" from worker serve-logs as "Unknown error". (#37933)
  • Fix execution data validation error in /get_logs_with_metadata endpoint (#37756)
  • Fix task duration selection (#37630)
  • Refrain from passing encoding to the SQL engine in SQLAlchemy v2 (#37545)
  • Fix 'implicitly coercing SELECT object to scalar subquery' in latest dag run statement (#37505)
  • Clean up typing with max_execution_date query builder (#36958)
  • Optimize max_execution_date query in single dag case (#33242)
  • Fix list dags command for get_dagmodel is None (#36739)
  • Load consuming_dags attr eagerly before dataset listener (#36247)

Miscellaneous

  • Remove display of param from the UI (#38660)
  • Update log level to debug from warning about scheduled_duration metric (#38180)
  • Use importlib_metadata with compat to Python 3.10/3.12 stdlib (#38366)
  • Refactored __new__ magic method of BaseOperatorMeta to avoid bad mixing classic and decorated operators (#37937)
  • Use sys.version_info for determine Python Major.Minor (#38372)
  • Add missing deprecated Fab auth manager (#38376)
  • Remove unused loop variable from airflow package (#38308)
  • Adding max consecutive failed dag runs info in UI (#38229)
  • Bump minimum version of blinker add where it requires (#38140)
  • Bump follow-redirects from 1.15.4 to 1.15.6 in /airflow/www (#38156)
  • Bump Cryptography to > 39.0.0 (#38112)
  • Add Python 3.12 support (#36755, #38025, #36595)
  • Avoid use of assert outside of the tests (#37718)
  • Update ObjectStoragePath for universal_pathlib>=v0.2.2 (#37930)
  • Resolve G004: Logging statement uses f-string (#37873)
  • Update build and install dependencies. (#37910)
  • Bump sanitize-html from 2.11.0 to 2.12.1 in /airflow/www (#37833)
  • Update to latest installer versions. (#37754)
  • Deprecate smtp configs in airflow settings / local_settings (#37711)
  • Deprecate PY* constants into the airflow module (#37575)
  • Remove usage of deprecated flask._request_ctx_stack (#37522)
  • Remove redundant login attribute in airflow.__init__.py (#37565)
  • Upgrade to FAB 4.3.11 (#37233)
  • Remove SCHEDULED_DEPS which is no longer used anywhere since 2.0.0 (#37140)
  • Replace datetime.datetime.utcnow by airflow.utils.timezone.utcnow in core (#35448)
  • Bump aiohttp min version to avoid CVE-2024-23829 and CVE-2024-23334 (#37110)
  • Move config related to FAB auth manager to FAB provider (#36232)
  • Remove MSSQL support from Airflow core (#36514)
  • Remove is_authorized_cluster_activity from auth manager (#36175)
  • Create FAB provider and move FAB auth manager in it (#35926)

Doc Only Changes

  • Improve timetable documentation (#38505)
  • Reorder OpenAPI Spec tags alphabetically (#38717)
  • Update UI screenshots in the documentation (#38680, #38403, #38438, #38435)
  • Remove section as it's no longer true with dataset expressions PR (#38370)
  • Refactor DatasetOrTimeSchedule timetable docs (#37771)
  • Migrate executor docs to respective providers (#37728)
  • Add directive to render a list of URI schemes (#37700)
  • Add doc page with providers deprecations (#37075)
  • Add a cross reference to security policy (#37004)
  • Improve AIRFLOW__WEBSERVER__BASE_URL docs (#37003)
  • Update faq.rst with (hopefully) clearer description of start_date (#36846)
  • Update public interface doc re operators (#36767)
  • Add exception to templates ref list (#36656)
  • Add auth manager interface as public interface (#36312)
  • Reference fab provider documentation in Airflow documentation (#36310)
  • Create auth manager documentation (#36211)
  • Update permission docs (#36120)
  • Docstring improvement to _covers_every_hour (#36081)
  • Add note that task instance, dag and lifecycle listeners are non-experimental (#36376)
airflow - Apache Airflow 2.8.4

Published by jedcunningham 7 months ago

Significant Changes

No significant changes.

Bug Fixes

  • Fix incorrect serialization of FixedTimezone (#38139)
  • Fix excessive permission changing for log task handler (#38164)
  • Fix task instances list link (#38096)
  • Fix a bug where scheduler heartrate parameter was not used (#37992)
  • Add padding to prevent grid horizontal scroll overlapping tasks (#37942)
  • Fix hash caching in ObjectStoragePath (#37769)

Miscellaneous

  • Limit importlib_resources as it breaks pytest_rewrites (#38095, #38139)
  • Limit pandas to <2.2 (#37748)
  • Bump croniter to fix an issue with 29 Feb cron expressions (#38198)

Doc Only Changes

  • Tell users what to do if their scanners find issues in the image (#37652)
  • Add a section about debugging in Docker Compose with PyCharm (#37940)
  • Update deferrable docs to clarify kwargs when trigger resumes operator (#38122)
airflow - Apache Airflow Helm Chart 1.13.1

Published by jedcunningham 7 months ago

Significant Changes

Default Airflow image is updated to 2.8.3 (#38036)

The default Airflow image that is used with the Chart is now 2.8.3, previously it was 2.8.2.

Bug Fixes

  • Don't overwrite .Values.airflowPodAnnotations (#37917)
  • Fix cluster-wide RBAC naming clash when using multiple multiNamespace releases with the same name (#37197)

Misc

  • Chart: Default airflow version to 2.8.3 (#38036)
airflow - Apache Airflow 2.8.3

Published by ephraimbuddy 7 months ago

Significant Changes

The smtp provider is now pre-installed when you install Airflow. (#37713)

Bug Fixes

  • Add "MENU" permission in auth manager (#37881)
  • Fix external_executor_id being overwritten (#37784)
  • Make more MappedOperator members modifiable (#37828)
  • Set parsing context dag_id in dag test command (#37606)

Miscellaneous

  • Remove useless methods from security manager (#37889)
  • Improve code coverage for TriggerRuleDep (#37680)
  • The SMTP provider is now preinstalled when installing Airflow (#37713)
  • Bump min versions of openapi validators (#37691)
  • Properly include airflow_pre_installed_providers.txt artifact (#37679)

Doc Only Changes

  • Clarify lack of sync between workers and scheduler (#37913)
  • Simplify some docs around airflow_local_settings (#37835)
  • Add section about local settings configuration (#37829)
  • Fix docs of BranchDayOfWeekOperator (#37813)
  • Write to secrets store is not supported by design (#37814)
  • ERD generating doc improvement (#37808)
  • Update incorrect config value (#37706)
  • Update security model to clarify Connection Editing user's capabilities (#37688)
  • Fix ImportError on examples dags (#37571)
airflow - Apache Airflow Helm Chart 1.13.0

Published by jedcunningham 8 months ago

Significant Changes

Default Airflow image is updated to 2.8.2 (#37704)

The default Airflow image that is used with the Chart is now 2.8.2, previously it was 2.8.1.

New Features

  • Support labels specific to the database migration objects and pods (#37490)

Improvements

  • Flower K8s Probe config (#37528)

Bug Fixes

  • Remove duplicate ports key in webserver service (#37356)
  • Add AIRFLOW_HOME env var to log groomer sidecar (#37588)
  • Skip . path when preparing reproducible packages (#37402)

Misc

  • Default airflow version to 2.8.2 (#37704)
airflow - Apache Airflow 2.8.2

Published by ephraimbuddy 8 months ago

Significant Changes

The allowed_deserialization_classes flag now follows a glob pattern (#36147).

For example if one wants to add the class airflow.tests.custom_class to the
allowed_deserialization_classes list, it can be done by writing the full class
name (airflow.tests.custom_class) or a pattern such as the ones used in glob
search (e.g., airflow.*, airflow.tests.*).

If you currently use a custom regexp path, make sure to rewrite it as a glob pattern.

Alternatively, if you still wish to match it as a regexp pattern, add it under the new
list allowed_deserialization_classes_regexp instead.
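
The difference between the two matching styles can be illustrated with the standard library. This is a sketch of the idea only, not Airflow's implementation: `is_allowed` is a hypothetical helper, and `fnmatch` stands in for whatever glob matching Airflow uses internally.

```python
import re
from fnmatch import fnmatchcase


def is_allowed(classname, glob_patterns, regexp_patterns=()):
    """Return True if classname matches any glob or any regexp (hypothetical helper)."""
    if any(fnmatchcase(classname, pattern) for pattern in glob_patterns):
        return True
    return any(re.match(pattern, classname) for pattern in regexp_patterns)


name = "airflow.tests.custom_class"

# Glob patterns such as those in the text match as expected.
print(is_allowed(name, ["airflow.tests.*"]))

# A regexp written for the old behaviour does not work as a glob ...
print(is_allowed(name, [r"airflow\.tests\..*"]))

# ... but still matches when supplied through the separate regexp list.
print(is_allowed(name, [], [r"airflow\.tests\..*"]))
```

In a glob, a backslash is an ordinary character, which is why an old regexp entry silently stops matching unless it is rewritten or moved to the regexp list.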

The audit_logs permissions have been updated for heightened security (#37501).

This was done under the policy that we do not want users like Viewer, Ops,
and other users apart from Admin to have access to audit_logs. The intention behind
this change is to restrict users with fewer permissions from viewing user details
like First Name, Email etc. from the audit_logs when they are not permitted to.

The impact of this change is that existing users without admin rights won't be able
to view or access the audit_logs, either from the Browse tab or from the DAG run.

AirflowTimeoutError is no longer caught by default when catching Exception (#35653).

AirflowTimeoutError now inherits from BaseException instead of
AirflowException->Exception.
See https://docs.python.org/3/library/exceptions.html#exception-hierarchy

This prevents code catching Exception from accidentally
catching AirflowTimeoutError and continuing to run.
AirflowTimeoutError is an explicit intent to cancel the task, and should not
be caught in attempts to handle the error and return some default value.

Catching AirflowTimeoutError is still possible by explicitly excepting
AirflowTimeoutError or BaseException.
This is discouraged, as it may allow the code to continue running even after
such cancellation requests.
Code that previously depended on performing strict cleanup in every situation
after catching Exception is advised to use finally blocks or
context managers, which perform only the cleanup and then automatically
re-raise the exception.
See similar considerations about catching KeyboardInterrupt in
https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
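
The behaviour described above can be reproduced with plain Python. In this sketch, `TaskTimeout` is a hypothetical stand-in for the Airflow timeout exception (it only shares the BaseException base class), and `cleanup_log` is an illustrative way to observe that the finally block runs.

```python
class TaskTimeout(BaseException):
    """Stand-in for the Airflow timeout exception (hypothetical, not the Airflow class)."""


cleanup_log = []


def run_with_fallback(work):
    """Return a default value on ordinary errors; let timeouts propagate."""
    try:
        return work()
    except Exception:
        # Ordinary errors are handled here. A BaseException subclass is not caught.
        return "default"
    finally:
        # Runs on success, on handled errors, and while a timeout propagates.
        cleanup_log.append("cleanup")


def fails():
    raise ValueError("bad input")


def times_out():
    raise TaskTimeout("task exceeded its time limit")


print(run_with_fallback(fails))  # the ValueError is turned into the default value

try:
    run_with_fallback(times_out)
except TaskTimeout:
    print("timeout propagated; cleanup still ran:", cleanup_log)
```

Placing the cleanup in finally rather than in the except block is what keeps it running now that the timeout bypasses `except Exception`.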

Bug Fixes

  • Sort dag processing stats by last_runtime (#37302)
  • Allow pre-population of trigger form values via URL parameters (#37497)
  • Base date for fetching dag grid view must include selected run_id (#34887)
  • Check permissions for ImportError (#37468)
  • Move IMPORT_ERROR from DAG related permissions to view related permissions (#37292)
  • Change AirflowTaskTimeout to inherit BaseException (#35653)
  • Revert "Fix future DagRun rarely triggered by race conditions when max_active_runs reached its upper limit. (#31414)" (#37596)
  • Change margin to padding so first task can be selected (#37527)
  • Fix Airflow serialization for namedtuple (#37168)
  • Fix bug with clicking url-unsafe tags (#37395)
  • Set deterministic and new getter for Treeview function (#37162)
  • Fix permissions of parent folders for log file handler (#37310)
  • Fix permission check on DAGs when access_entity is specified (#37290)
  • Fix the value of dateTimeAttrFormat constant (#37285)
  • Resolve handler close race condition at triggerer shutdown (#37206)
  • Fixing status icon alignment for various views (#36804)
  • Remove superfluous @Sentry.enrich_errors (#37002)
  • Use execution_date= param as a backup to base date for grid view (#37018)
  • Handle SystemExit raised in the task. (#36986)
  • Revoking audit_log permission from all users except admin (#37501)
  • Fix broken regex for allowed_deserialization_classes (#36147)
  • Fix the bug that affected the DAG end date. (#36144)
  • Adjust node width based on task name length (#37254)
  • fix: PythonVirtualenvOperator crashes if any python_callable function is defined in the same source as DAG (#37165)
  • Fix collapsed grid width, line up selected bar with gantt (#37205)
  • Adjust graph node layout (#37207)
  • Revert the sequence of initializing configuration defaults (#37155)
  • Displaying "actual" try number in TaskInstance view (#34635)
  • Bugfix Triggering DAG with parameters is mandatory when show_trigger_form_if_no_params is enabled (#37063)
  • Secret masker ignores passwords with special chars (#36692)
  • Fix DagRuns with UPSTREAM_FAILED tasks get stuck in the backfill. (#36954)
  • Disable dryrun auto-fetch (#36941)
  • Fix copy button on a DAG run's config (#36855)
  • Fix bug introduced by replacing spaces by + in run_id (#36877)
  • Fix webserver always redirecting to home page if user was not logged in (#36833)
  • REST API set description on POST to /variables endpoint (#36820)
  • Sanitize the conn_id to disallow potential script execution (#32867)
  • Fix task id copy button copying wrong id (#34904)
  • Fix security manager inheritance in fab provider (#36538)
  • Avoid pendulum.from_timestamp usage (#37160)

Miscellaneous

  • Install latest docker CLI instead of specific one (#37651)
  • Bump undici from 5.26.3 to 5.28.3 in /airflow/www (#37493)
  • Add Python 3.12 exclusions in providers/pyproject.toml (#37404)
  • Remove markdown from core dependencies (#37396)
  • Remove unused pageSize method. (#37319)
  • Add more-itertools as dependency of common-sql (#37359)
  • Replace other Python 3.11 and 3.12 deprecations (#37478)
  • Include airflow_pre_installed_providers.txt into sdist distribution (#37388)
  • Turn Pydantic into an optional dependency (#37320)
  • Limit universal-pathlib to < 0.2.0 (#37311)
  • Allow running airflow against sqlite in-memory DB for tests (#37144)
  • Add description to queue_when (#36997)
  • Updated config.yml for environment variable sql_alchemy_connect_args (#36526)
  • Bump min version of Alembic to 1.13.1 (#36928)
  • Limit flask-session to <0.6 (#36895)

Doc Only Changes

  • Fix upgrade docs to reflect true CLI flags available (#37231)
  • Fix a bug in fundamentals doc (#37440)
  • Add redirect for deprecated page (#37384)
  • Fix the otel config descriptions (#37229)
  • Update Objectstore tutorial with prereqs section (#36983)
  • Add more precise description on avoiding generic package/module names (#36927)
  • Add airflow version substitution into Docker Compose Howto (#37177)
  • Add clarification about DAG author capabilities to security model (#37141)
  • Move docs for cron basics to Authoring and Scheduling section (#37049)
  • Link to release notes in the upgrade docs (#36923)
  • Prevent templated field logic checks in __init__ of operators automatically (#33786)
airflow - Apache Airflow Helm Chart 1.12.0

Published by jedcunningham 8 months ago

Significant Changes

The helm chart is now using a newer version of bitnami/postgresql dependency (#34817)

The version of the bitnami/postgresql subchart was upgraded from 12.10.0 to 13.2.24.
The version of the PostgreSQL binaries was upgraded from 11 to 16.1.0.

The change requires existing bitnami/postgresql subchart users to perform a manual major version upgrade using pg_dumpall or pg_upgrade.

As a reminder, it is recommended to set up an external database <https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#database> in production.

Default Airflow image is updated to 2.8.1 (#36907)

The default Airflow image that is used with the Chart is now 2.8.1, previously it was 2.7.1.

Default PgBouncer and PgBouncer Exporter images have been updated (#36898)

The PgBouncer and PgBouncer Exporter images are based on newer software/os.

  • pgbouncer: 1.21.0 based on alpine 3.14 (airflow-pgbouncer-2024.01.19-1.21.0)
  • pgbouncer-exporter: 0.16.0 based on alpine 3.19 (apache/airflow:airflow-pgbouncer-exporter-2024.01.19-0.16.0)

Default StatsD image is updated to v0.26.0 (#37187)

The default StatsD image that is used with the Chart is now v0.26.0, previously it was v0.22.8.

Default Redis image is updated to 7-bookworm (#37187)

The default Redis image that is used with the Chart is now 7-bookworm, previously it was 7-bullseye.

New Features

  • Enable native HPA for Airflow Workers (#36174)
  • Add init container + sidecar support for Airflow Kerberos (#35548)
  • Support MySQL backend as KEDA trigger (#36167)

Improvements

  • Improve PriorityClass to improve debuggability (#36365)
  • Add securityContexts in dag processors log groomer sidecar (#34499)
  • Add support for securityContexts in dag processors wait-for-migrations container (#35593)
  • Add templating for PVC storageClassName (#35581)
  • Add volumeClaimTemplate for worker (#34986)
  • Add support for priorityClassName on Redis pods (#34879)
  • Configurable mount path for DAGs volume (#35083)
  • Add support for custom emptyDir config (#34837)
  • Added ability to enable/disable scheduler and webserver (#36991)

Bug Fixes

  • Fix StatsD host in Airflow config (#35679)
  • Set AIRFLOW_HOME env var with airflowHome value (#34839)
  • Safer worker pod annotations (#35309)
  • Set worker safeToEvict properly (#35130)
  • Fix Redis broker URL with useStandardNaming (#34825)
  • Fix metadata DB & port in KEDA connection when usePgbouncer is false (#34741)
  • Fix PgBouncer connection with useStandardNaming (#34787)

Doc only changes

  • Add docs about extending the Airflow Helm chart (#36331)
  • Add comment for Elasticsearch connection scheme (#35588)
  • Add notes about Virtualenvs preventing the need for custom images (#35306)

Misc

  • Default Airflow version to 2.8.1 (#36907)
  • Support git-sync v4 (#34731)
  • Upgrade bitnami/postgresql subchart to 13.2.24 (#36156)
  • Change git sync container indent to 4 (#35824)
  • Remove K8S 1.24 support (#35214)
  • Rebuild pgbouncer and pgbouncer-exporter images with newer versions (#36898)
  • Update statsd and redis chart images (#37187)
airflow - Apache Airflow 2.8.1

Published by ephraimbuddy 9 months ago

Significant Changes

Target version for core dependency pendulum package set to 3 (#36281).

Support for pendulum 2.1.2 will be retained for a while, presumably until the next feature version of Airflow.
It is advised to upgrade user code to use pendulum 3 as soon as possible.

Airflow packaging specification follows modern Python packaging standards (#36537).

We standardized Airflow dependency configuration to follow the latest developments in Python packaging by
using pyproject.toml. Airflow is now compliant with the following accepted PEPs:

  • PEP-440 Version Identification and Dependency Specification <https://www.python.org/dev/peps/pep-0440/>
  • PEP-517 A build-system independent format for source trees <https://www.python.org/dev/peps/pep-0517/>
  • PEP-518 Specifying Minimum Build System Requirements for Python Projects <https://www.python.org/dev/peps/pep-0518/>
  • PEP-561 Distributing and Packaging Type Information <https://www.python.org/dev/peps/pep-0561/>
  • PEP-621 Storing project metadata in pyproject.toml <https://www.python.org/dev/peps/pep-0621/>
  • PEP-660 Editable installs for pyproject.toml based builds (wheel based) <https://www.python.org/dev/peps/pep-0660/>
  • PEP-685 Comparison of extra names for optional distribution dependencies <https://www.python.org/dev/peps/pep-0685/>

We also implement support for multiple license files, which comes from a draft PEP that is not yet accepted (but is supported by hatchling):

  • PEP 639 Improving License Clarity with Better Package Metadata <https://peps.python.org/pep-0639/>

This has almost no noticeable impact on users who use modern Python packaging and development tools. Generally
speaking, Airflow should behave as it did before when installed from PyPI, and it should be much easier to install
it for development purposes using pip install -e ".[devel]".

The differences from the user side are:

  • Airflow extras are now normalized to use - (following PEP-685) instead of _ and .
    (which some extras used before). When you install Airflow with such extras (for example dbt.core or
    all_dbs), you should use - instead of _ and . in the extra name.

In most modern tools this works in a backwards-compatible way, but in some old versions of those tools you might need to
replace _ and . with -. You may also get warnings that the extra you are installing does not exist, but usually
this warning is harmless and the extra is installed anyway. It is, however, recommended to switch to - in your dependency
specifications for all Airflow extras.

  • Released airflow package does not contain devel, devel-*, doc and doc-gen extras.
    Those extras are only available when you install Airflow from sources in --editable mode. This is
    because those extras are only used for development and documentation building purposes and are not needed
    when you install Airflow for production use. Those dependencies had unspecified and varying behaviour for
    released packages anyway and you were not supposed to use them in released packages.

  • The all and all-* extras were not always working correctly when installing Airflow using constraints
    because they were also considered development-only dependencies. With this change, those extras
    now handle constraints properly and install correctly with them, pulling the right set
    of providers and dependencies when constraints are used.
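
The extras normalization mentioned in the list above can be sketched with the rule given in PEP 685 (collapse runs of `-`, `_` and `.` into a single `-` and lowercase the result). The helper names `normalize_extra` and `airflow_requirement` are hypothetical and exist only for this illustration; they are not Airflow APIs.

```python
import re
from typing import Iterable


def normalize_extra(name: str) -> str:
    """Normalize an extra name using the rule described in PEP 685."""
    return re.sub(r"[-_.]+", "-", name).lower()


def airflow_requirement(extras: Iterable[str]) -> str:
    """Build a requirement string for apache-airflow with normalized extras."""
    normalized = ",".join(normalize_extra(extra) for extra in extras)
    return f"apache-airflow[{normalized}]"


if __name__ == "__main__":
    # Old-style names taken from the examples in the text above.
    print(airflow_requirement(["dbt.core", "all_dbs"]))
```

A helper like this could be used to rewrite old requirement files in bulk, although editing the extras by hand is usually enough.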

Graphviz dependency is now optional, not required (#36647).

The graphviz dependency has been problematic as a required Airflow dependency, especially for
ARM-based installations. Graphviz packages require binary graphviz libraries, which is already a
limitation, but they also require the graphviz Python bindings to be built and installed.
This does not work on older Linux installations and, more importantly, when you try to install
Graphviz libraries for Python 3.8 or 3.9 on ARM M1 MacBooks, the packages fail to install because
Python bindings compilation for M1 only works for Python 3.10+.

This is not technically a breaking change: the CLI commands that render DAGs are still there, and if you
already have graphviz installed, they will continue working as before. The only case where rendering
does not work is when graphviz is not installed; Airflow then raises an error informing you that you need it.

Graphviz will remain installed for most users:

  • the Airflow Image will still contain graphviz library, because
    it is added there as extra
  • when a previous version of Airflow has been installed already, then
    graphviz library is already installed there and Airflow will
    continue working as it did

The only change affects a fresh installation of a new version of Airflow, where graphviz will
need to be specified as an extra or installed separately in order to enable the DAG rendering option.
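
Code that relies on DAG rendering can guard against a missing graphviz package before calling into it. The following is a minimal sketch using only the standard library; `module_available` and `require_graphviz` are hypothetical helper names, and the error text is illustrative rather than the message Airflow itself prints.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(name) is not None


def require_graphviz() -> None:
    """Raise a descriptive error when the graphviz Python package is missing."""
    if not module_available("graphviz"):
        raise RuntimeError(
            "graphviz is not installed; install it separately or use the "
            "graphviz extra, e.g. pip install 'apache-airflow[graphviz]'"
        )


if __name__ == "__main__":
    try:
        require_graphviz()
        print("graphviz is available; DAG rendering can be used")
    except RuntimeError as err:
        print(err)
```

Note that the Python package also needs the binary graphviz libraries to be present, which this check does not verify.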

Bug Fixes

  • Fix airflow-scheduler exiting with code 0 on exceptions (#36800)
  • Fix Callback exception when a removed task is the last one in the taskinstance list (#36693)
  • Allow anonymous user edit/show resource when set AUTH_ROLE_PUBLIC=admin (#36750)
  • Better error message when sqlite URL uses relative path (#36774)
  • Explicit string cast required to force integer-type run_ids to be passed as strings instead of integers (#36756)
  • Add log lookup exception for empty op subtypes (#35536)
  • Remove unused index on task instance (#36737)
  • Fix check on subclass for typing.Union in _infer_multiple_outputs for Python 3.10+ (#36728)
  • Make sure multiple_outputs is inferred correctly even when using TypedDict (#36652)
  • Add back FAB constant in legacy security manager (#36719)
  • Fix AttributeError when using Dagrun.update_state (#36712)
  • Do not let EventsTimetable schedule past events if catchup=False (#36134)
  • Support encryption for triggers parameters (#36492)
  • Fix the type hint for tis_query in _process_executor_events (#36655)
  • Redirect to index when user does not have permission to access a page (#36623)
  • Avoid using dict as default value in call_regular_interval (#36608)
  • Remove option to set a task instance to running state in UI (#36518)
  • Fix details tab not showing when using dynamic task mapping (#36522)
  • Raise error when DagRun fails while running dag test (#36517)
  • Refactor _manage_executor_state by refreshing TIs in batch (#36502)
  • Add flask config: MAX_CONTENT_LENGTH (#36401)
  • Fix get_leaves calculation for teardown in nested group (#36456)
  • Stop serializing timezone-naive datetime to timezone-aware datetime with UTC tz (#36379)
  • Make kubernetes decorator type annotation consistent with operator (#36405)
  • Fix Webserver returning 500 for POST requests to api/dag/*/dagrun from anonymous user (#36275)
  • Fix the required access for get_variable endpoint (#36396)
  • Fix datetime reference in DAG.is_fixed_time_schedule (#36370)
  • Fix AirflowSkipException message raised by BashOperator (#36354)
  • Allow PythonVirtualenvOperator.skip_on_exit_code to be zero (#36361)
  • Increase width of execution_date input in trigger.html (#36278)
  • Fix logging for pausing DAG (#36182)
  • Stop deserializing pickle when enable_xcom_pickling is False (#36255)
  • Check DAG read permission before accessing DAG code (#36257)
  • Enable mark task as failed/success always (#36254)
  • Create latest log dir symlink as relative link (#36019)
  • Fix Python-based decorators templating (#36103)

Miscellaneous

  • Rename concurrency label to max active tasks (#36691)
  • Restore function scoped httpx import in file_task_handler for performance (#36753)
  • Add support of Pendulum 3 (#36281)
  • Standardize airflow build process and switch to Hatchling build backend (#36537)
  • Get rid of pyarrow-hotfix for CVE-2023-47248 (#36697)
  • Make graphviz dependency optional (#36647)
  • Announce MSSQL support end in Airflow 2.9.0, add migration script hints (#36509)
  • Set min pandas dependency to 1.2.5 for all providers and airflow (#36698)
  • Bump follow-redirects from 1.15.3 to 1.15.4 in /airflow/www (#36700)
  • Provide the logger_name param to base hook in order to override the logger name (#36674)
  • Fix run type icon alignment with run type text (#36616)
  • Follow BaseHook connection fields method signature in FSHook (#36444)
  • Remove redundant docker decorator type annotations (#36406)
  • Straighten typing in workday timetable (#36296)
  • Use batch_is_authorized_dag to check if user has permission to read DAGs (#36279)
  • Replace deprecated get_accessible_dag_ids and use get_readable_dags in get_dag_warnings (#36256)

Doc Only Changes

  • Metrics tagging documentation (#36627)
  • In docs use logical_date instead of deprecated execution_date (#36654)
  • Add section about live-upgrading Airflow (#36637)
  • Replace numpy example with practical exercise demonstrating top-level code (#35097)
  • Improve and add more complete description in the architecture diagrams (#36513)
  • Improve the error message displayed when there is a webserver error (#36570)
  • Update dags.rst with information on DAG pausing (#36540)
  • Update installation prerequisites after upgrading to Debian Bookworm (#36521)
  • Add description on the ways how users should approach DB monitoring (#36483)
  • Add branching based on mapped task group example to dynamic-task-mapping.rst (#36480)
  • Add further details to replacement documentation (#36485)
  • Use cards when describing priority weighting methods (#36411)
  • Update metrics.rst for param dagrun.schedule_delay (#36404)
  • Update admonitions in Python operator doc to reflect sentiment (#36340)
  • Improve audit_logs.rst (#36213)
  • Remove Redshift mention from the list of managed Postgres backends (#36217)
airflow - Apache Airflow 2.8.0

Published by ephraimbuddy 10 months ago

Significant Changes

  • Raw HTML code in DAG docs and DAG params descriptions is disabled by default

    To ensure that DAG authors cannot inject malicious JavaScript through DAG descriptions or trigger UI forms,
    a new parameter webserver.allow_raw_html_descriptions was added with a default value of False.
    If you trust your DAG authors' code and want to allow raw HTML in DAG descriptions and params, you can restore the previous
    behavior by setting the configuration value to True.

    To ensure Airflow is secure by default, the raw HTML support in the trigger UI has been superseded by markdown support via
    the description_md attribute. If you have been using description_html please migrate to description_md.
    The custom_html_form is now deprecated. (#35460)
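
Airflow configuration options can generally be set through environment variables named `AIRFLOW__{SECTION}__{KEY}`. The sketch below derives that name for the option described above; the helper `airflow_env_var` is a hypothetical name used only for this illustration.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    name = airflow_env_var("webserver", "allow_raw_html_descriptions")
    # Only opt back in to raw HTML if you trust your DAG authors' code.
    os.environ[name] = "True"
    print(f"{name}={os.environ[name]}")
```

Setting the variable from Python only affects the current process and its children; in a real deployment it would normally be set in the webserver's environment or in airflow.cfg.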

New Features

  • AIP-58: Add Airflow ObjectStore (AFS) (AIP-58)
  • Add XCom tab to Grid (#35719)
  • Add "literal" wrapper to disable field templating (#35017)
  • Add task context logging feature to allow forwarding messages to task logs (#32646, #32693, #35857)
  • Add Listener hooks for Datasets (#34418, #36247)
  • Allow override of navbar text color (#35505)
  • Add lightweight serialization for deltalake tables (#35462)
  • Add support for serialization of iceberg tables (#35456)
  • prev_end_date_success method access (#34528)
  • Add task parameter to set custom logger name (#34964)
  • Add pyspark decorator (#35247)
  • Add trigger as a valid option for the db clean command (#34908)
  • Add decorators for external and venv python branching operators (#35043)
  • Allow PythonVenvOperator using other index url (#33017)
  • Add Python Virtualenv Operator Caching (#33355)
  • Introduce a generic export for containerized executor logging (#34903)
  • Add ability to clear downstream tis in List Task Instances view (#34529)
  • Attribute clear_number to track DAG run being cleared (#34126)
  • Add BranchPythonVirtualenvOperator (#33356)
  • Add CLI notification commands to providers (#33116)
  • Use dropdown instead of buttons when there are more than 10 retries in log tab (#36025)

Improvements

  • Add multiselect to run state in grid view (#35403)
  • Fix warning message in Connection.get_hook in case of ImportError (#36005)
  • Add processor_subdir to import_error table to handle multiple dag processors (#35956)
  • Consolidate the call of change_state to fail or success in the core executors (#35901)
  • Relax mandatory requirement for start_date when schedule=None (#35356)
  • Use ExitStack to manage mutation of secrets_backend_list in dag.test (#34620)
  • improved visibility of tasks in ActionModal for taskinstance (#35810)
  • Create directories based on AIRFLOW_CONFIG path (#35818)
  • Implements JSON-string connection representation generator (#35723)
  • Move BaseOperatorLink into the separate module (#35032)
  • Set mark_end_on_close after set_context (#35761)
  • Move external logs links to top of react logs page (#35668)
  • Change terminal mode to cbreak in execute_interactive and handle SIGINT (#35602)
  • Make raw HTML descriptions configurable (#35460)
  • Allow email field to be templated (#35546)
  • Hide logical date and run id in trigger UI form (#35284)
  • Improved instructions for adding dependencies in TaskFlow (#35406)
  • Add optional exit code to list import errors (#35378)
  • Limit query result on DB rather than client in synchronize_log_template function (#35366)
  • Allow description to be passed in when using variables CLI (#34791)
  • Allow optional defaults in required fields with manual triggered dags (#31301)
  • Permitting airflow kerberos to run in different modes (#35146)
  • Refactor commands to unify daemon context handling (#34945)
  • Add extra fields to plugins endpoint (#34913)
  • Add description to pools view (#34862)
  • Move cli's Connection export and Variable export command print logic to a separate function (#34647)
  • Extract and reuse get_kerberos_principle func from get_kerberos_principle (#34936)
  • Change type annotation for BaseOperatorLink.operators (#35003)
  • Optimise and migrate to SA2-compatible syntax for TaskReschedule (#33720)
  • Consolidate the permissions name in SlaMissModelView (#34949)
  • Add debug log saying what's being run to EventScheduler (#34808)
  • Increase log reader stream loop sleep duration to 1 second (#34789)
  • Resolve pydantic deprecation warnings re update_forward_refs (#34657)
  • Unify mapped task group lookup logic (#34637)
  • Allow filtering event logs by attributes (#34417)
  • Make connection login and password TEXT (#32815)
  • Ban import Dataset from airflow package in codebase (#34610)
  • Use airflow.datasets.Dataset in examples and tests (#34605)
  • Enhance task status visibility (#34486)
  • Simplify DAG trigger UI (#34567)
  • Ban import AirflowException from airflow (#34512)
  • Add descriptions for airflow resource config parameters (#34438)
  • Simplify trigger name expression (#34356)
  • Move definition of Pod*Exceptions to pod_generator (#34346)
  • Add deferred tasks to the cluster_activity view Pools Slots (#34275)
  • heartbeat failure log message fix (#34160)
  • Rename variables for dag runs (#34049)
  • Clarify new_state in OpenAPI spec (#34056)
  • Remove version top-level element from docker compose files (#33831)
  • Remove generic trigger cancelled error log (#33874)
  • Use NOT EXISTS subquery instead of tuple_not_in_condition (#33527)
  • Allow context key args to not provide a default (#33430)
  • Order triggers by - TI priority_weight when assign unassigned triggers (#32318)
  • Add metric triggerer_heartbeat (#33320)
  • Allow airflow variables export to print to stdout (#33279)
  • Workaround failing deadlock when running backfill (#32991)
  • add dag_run_ids and task_ids filter for the batch task instance API endpoint (#32705)
  • Configurable health check threshold for triggerer (#33089)
  • Rework provider manager to treat Airflow core hooks like other provider hooks (#33051)
  • Ensure DAG-level references are filled on unmap (#33083)
  • Affix webserver access_denied warning to be configurable (#33022)
  • Add support for arrays of different data types in the Trigger Form UI (#32734)
  • Add a mechanism to warn if executors override existing CLI commands (#33423)

Bug Fixes

  • Account for change in UTC offset when calculating next schedule (#35887)
  • Add read access to pools for viewer role (#35352)
  • Fix gantt chart queued duration when queued_dttm is greater than start_date for deferred tasks (#35984)
  • Avoid crashing container when directory is not found on rm (#36050)
  • Update reset_user_sessions to work from either CLI or web (#36056)
  • Fix UI Grid error when DAG has been removed. (#36028)
  • Change Trigger UI to use HTTP POST in web ui (#36026)
  • Fix airflow db shell needing an extra key press to exit (#35982)
  • Change dag grid overscroll behaviour to auto (#35717)
  • Run triggers inline with dag test (#34642)
  • Add borderWidthRight to grid for Firefox scrollbar (#35346)
  • Fix for infinite recursion due to secrets_masker (#35048)
  • Fix write processor_subdir in serialized_dag table (#35661)
  • Reload configuration for standalone dag file processor (#35725)
  • Long custom operator name overflows in graph view (#35382)
  • Add try_number to extra links query (#35317)
  • Prevent assignment of non JSON serializable values to DagRun.conf dict (#35096)
  • Numeric values in DAG details are incorrectly rendered as timestamps (#35538)
  • Fix Scheduler and triggerer crashes in daemon mode when statsd metrics are enabled (#35181)
  • Infinite UI redirection loop after deactivating an active user (#35486)
  • Bug fix fetch_callback of Partial Subset DAG (#35256)
  • Fix DagRun data interval for DeltaDataIntervalTimetable (#35391)
  • Fix query in get_dag_by_pickle util function (#35339)
  • Fix TriggerDagRunOperator failing to trigger subsequent runs when reset_dag_run=True (#35429)
  • Fix weight_rule property type in mappedoperator (#35257)
  • Bugfix/prevent concurrency with cached venv (#35258)
  • Fix dag serialization (#34042)
  • Fix py/url-redirection by replacing request.referrer by get_redirect() (#34237)
  • Fix updating variables during variable imports (#33932)
  • Use Literal from airflow.typing_compat in Airflow core (#33821)
  • Always use Literal from typing_extensions (#33794)

Miscellaneous

  • Change default MySQL client to MariaDB (#36243)
  • Mark daskexecutor provider as removed (#35965)
  • Bump FAB to 4.3.10 (#35991)
  • Rename Connection.to_json_dict to Connection.to_dict (#35894)
  • Upgrade to Pydantic v2 (#35551)
  • Bump moto version to >= 4.2.9 (#35687)
  • Use pyarrow-hotfix to mitigate CVE-2023-47248 (#35650)
  • Bump axios from 0.26.0 to 1.6.0 in /airflow/www/ (#35624)
  • Make docker decorator's type annotation consistent with operator (#35568)
  • Add default to navbar_text_color and rm condition in style (#35553)
  • Avoid initiating session twice in dag_next_execution (#35539)
  • Work around typing issue in examples and providers (#35494)
  • Enable TCH004 and TCH005 rules (#35475)
  • Humanize log output about retrieved DAG(s) (#35338)
  • Switch from Black to Ruff formatter (#35287)
  • Upgrade to Flask Application Builder 4.3.9 (#35085)
  • D401 Support (#34932, #34933)
  • Use requires_access to check read permission on dag instead of checking it explicitly (#34940)
  • Deprecate lazy import AirflowException from airflow (#34541)
  • View util refactoring on mapped stuff use cases (#34638)
  • Bump postcss from 8.4.25 to 8.4.31 in /airflow/www (#34770)
  • Refactor Sqlalchemy queries to 2.0 style (#34763, #34665, #32883, #35120)
  • Change to lazy loading of io in pandas serializer (#34684)
  • Use airflow.models.dag.DAG in examples (#34617)
  • Use airflow.exceptions.AirflowException in core (#34510)
  • Check that dag_ids passed in request are consistent (#34366)
  • Refactors to make code better (#34278, #34113, #34110, #33838, #34260, #34409, #34377, #34350)
  • Suspend qubole provider (#33889)
  • Generate Python API docs for Google ADS (#33814)
  • Improve importing in modules (#33812, #33811, #33810, #33806, #33807, #33805, #33804, #33803,
    #33801, #33799, #33800, #33797, #33798, #34406, #33808)
  • Upgrade Elasticsearch to 8 (#33135)

Doc Only Changes

  • Add support for tabs (and other UX components) to docs (#36041)
  • Replace architecture diagram of Airflow with diagrams-generated one (#36035)
  • Add the section describing the security model of DAG Author capabilities (#36022)
  • Enhance docs for zombie tasks (#35825)
  • Reflect drop/add support of DB Backends versions in documentation (#35785)
  • More detail on mandatory task arguments (#35740)
  • Indicate usage of the re2 regex engine in the .airflowignore documentation. (#35663)
  • Update best-practices.rst (#35692)
  • Update dag-run.rst to mention Airflow's support for extended cron syntax through croniter (#35342)
  • Update webserver.rst to include information of supported OAuth2 providers (#35237)
  • Add back dag_run to docs (#35142)
  • Fix rst code block format (#34708)
  • Add typing to concrete taskflow examples (#33417)
  • Add concrete examples for accessing context variables from TaskFlow tasks (#33296)
  • Fix links in security docs (#33329)
airflow - Apache Airflow 2.7.3

Published by ephraimbuddy 12 months ago

Significant Changes

No significant changes.

Bug Fixes

  • Fix premature evaluation of tasks in mapped task group (#34337)
  • Add TriggerRule missing value in rest API (#35194)
  • Fix Scheduler crash looping when dagrun creation fails (#35135)
  • Fix test connection with codemirror and extra (#35122)
  • Fix usage of cron-descriptor since BC in v1.3.0 (#34836)
  • Fix get_plugin_info for class based listeners. (#35022)
  • Some improvements/fixes for dag_run and task_instance endpoints (#34942)
  • Fix the dags count filter in webserver home page (#34944)
  • Return only the TIs of the readable dags when ~ is provided as a dag_id (#34939)
  • Fix triggerer thread crash in daemon mode (#34931)
  • Fix wrong plugin schema (#34858)
  • Use DAG timezone in TimeSensorAsync (#33406)
  • Mark tasks with all_skipped trigger rule as skipped if any task is in upstream_failed state (#34392)
  • Add read only validation to read only fields (#33413)

Misc/Internal

  • Improve testing harness to separate DB and non-DB tests (#35160, #35333)
  • Add pytest db_test markers to our tests (#35264)
  • Add pip caching for faster build (#35026)
  • Upper bound pendulum requirement to <3.0 (#35336)
  • Limit sentry_sdk to 1.33.0 (#35298)
  • Fix subtle bug in mocking processor_agent in our tests (#35221)
  • Bump @babel/traverse from 7.16.0 to 7.23.2 in /airflow/www (#34988)
  • Bump undici from 5.19.1 to 5.26.3 in /airflow/www (#34971)
  • Remove unused set from SchedulerJobRunner (#34810)
  • Remove warning about max_tis per query > parallelism (#34742)
  • Improve modules import in Airflow core by moving some of them into a type-checking block (#33755)
  • Fix tests to respond to Python 3.12 handling of utcnow in sentry-sdk (#34946)
  • Add connexion<3.0 upper bound (#35218)
  • Limit Airflow to < 3.12 (#35123)
  • update moto version (#34938)
  • Limit WTForms to below 3.1.0 (#34943)

Doc Only Changes

  • Fix variables substitution in Airflow Documentation (#34462)
  • Added example for defaults in conn.extras (#35165)
  • Update datasets.rst issue with running example code (#35035)
  • Remove mysql-connector-python from recommended MySQL driver (#34287)
  • Fix syntax error in task dependency set_downstream example (#35075)
  • Update documentation to enable test connection (#34905)
  • Update docs errors.rst - Mention sentry "transport" configuration option (#34912)
  • Update dags.rst to put SubDag deprecation note right after the SubDag section heading (#34925)
  • Add info on getting variables and config in custom secrets backend (#34834)
  • Document BaseExecutor interface in more detail to help users in writing custom executors (#34324)
  • Fix broken link to airflow_local_settings.py template (#34826)
  • Fixes python_callable function assignment context kwargs example in params.rst (#34759)
  • Add missing multiple_outputs=True param in the TaskFlow example (#34812)
  • Remove extraneous '>' in provider section name (#34813)
  • Fix imports in extra link documentation (#34547)
airflow - Apache Airflow 2.7.2

Published by ephraimbuddy about 1 year ago

Significant Changes

No significant changes

Bug Fixes

  • Check whether the lowercased provided values are sensitive in config endpoint (#34712)
  • Add support for ZoneInfo and generic UTC to fix datetime serialization (#34683, #34804)
  • Fix AttributeError: 'Select' object has no attribute 'count' during the airflow db migrate command (#34348)
  • Make dry run optional for patch task instance (#34568)
  • Fix non deterministic datetime deserialization (#34492)
  • Use iterative loop to look for mapped parent (#34622)
  • Fix is_parent_mapped value by checking if any of the parent taskgroup is mapped (#34587)
  • Avoid top-level airflow import to avoid circular dependency (#34586)
  • Add more exemptions to lengthy metric list (#34531)
  • Fix dag warning endpoint permissions (#34355)
  • Fix task instance access issue in the batch endpoint (#34315)
  • Correcting wrong time showing in grid view (#34179)
  • Fix www cluster_activity view not loading due to standaloneDagProcessor templating (#34274)
  • Set loglevel=DEBUG in 'Not syncing DAG-level permissions' (#34268)
  • Make param validation consistent for DAG validation and triggering (#34248)
  • Ensure details panel is shown when any tab is selected (#34136)
  • Fix issues related to access_control={} (#34114)
  • Fix not found ab_user table in the CLI session (#34120)
  • Fix FAB-related logging format interpolation (#34139)
  • Fix query bug in next_run_datasets_summary endpoint (#34143)
  • Fix for TaskGroup toggles for duplicated labels (#34072)
  • Fix the required permissions to clear a TI from the UI (#34123)
  • Reuse _run_task_session in mapped render_template_fields (#33309)
  • Fix scheduler logic to plan new dag runs by ignoring manual runs (#34027)
  • Add missing audit logs for Flask actions add, edit and delete (#34090)
  • Hide Irrelevant Dag Processor from Cluster Activity Page (#33611)
  • Remove infinite animation for pinwheel, spin for 1.5s (#34020)
  • Restore rendering of provider configuration with version_added (#34011)

Doc Only Changes

  • Clarify audit log permissions (#34815)
  • Add explanation for Audit log users (#34814)
  • Import AUTH_REMOTE_USER from FAB in WSGI middleware example (#34721)
  • Add information about drop support MsSQL as DB Backend in the future (#34375)
  • Document how to use the system's timezone database (#34667)
  • Clarify what landing time means in doc (#34608)
  • Fix screenshot in dynamic task mapping docs (#34566)
  • Fix class reference in Public Interface documentation (#34454)
  • Clarify var.value.get and var.json.get usage (#34411)
  • Schedule default value description (#34291)
  • Docs for triggered_dataset_event (#34410)
  • Add DagRun events (#34328)
  • Provide tabular overview about trigger form param types (#34285)
  • Add link to Amazon Provider Configuration in Core documentation (#34305)
  • Add "security infrastructure" paragraph to security model (#34301)
  • Change links to SQLAlchemy 1.4 (#34288)
  • Add SBOM entry in security documentation (#34261)
  • Added more example code for XCom push and pull (#34016)
  • Add state utils to Public Airflow Interface (#34059)
  • Replace markdown style link with rst style link (#33990)
  • Fix broken link to the "UPDATING.md" file (#33583)

Misc/Internal

  • Update min-sqlalchemy version to account for latest features used (#34293)
  • Fix SesssionExemptMixin spelling (#34696)
  • Restrict astroid version < 3 (#34658)
  • Fail dag test if defer without triggerer (#34619)
  • Fix connections exported output (#34640)
  • Don't run isort when creating new alembic migrations (#34636)
  • Deprecate numeric type python version in PythonVirtualEnvOperator (#34359)
  • Refactor os.path.splitext to Path.* (#34352, #33669)
  • Replace = by is for type comparison (#33983)
  • Refactor integer division (#34180)
  • Refactor: Simplify comparisons (#34181)
  • Refactor: Simplify string generation (#34118)
  • Replace unnecessary dict comprehension with dict() in core (#33858)
  • Change "not all" to "any" for ease of readability (#34259)
  • Replace assert by if...raise in code (#34250, #34249)
  • Move default timezone to except block (#34245)
  • Combine similar if logic in core (#33988)
  • Refactor: Consolidate import and usage of random (#34108)
  • Consolidate importing of os.path.* (#34060)
  • Replace sequence concatenation by unpacking in Airflow core (#33934)
  • Refactor unneeded 'continue' jumps around the repo (#33849, #33845, #33846, #33848, #33839, #33844, #33836, #33842)
  • Remove [project] section from pyproject.toml (#34014)
  • Move the try outside the loop when this is possible in Airflow core (#33975)
  • Replace loop by any when looking for a positive value in core (#33985)
  • Do not create lists we don't need (#33519)
  • Remove useless string join from core (#33969)
  • Add TCH001 and TCH002 rules to pre-commit to detect and move type checking modules (#33865)
  • Add cancel_trigger_ids to to_cancel dequeue in batch (#33944)
  • Avoid creating unnecessary list when parsing stats datadog tags (#33943)
  • Replace dict.items by dict.values when key is not used in core (#33940)
  • Replace lambdas with comprehensions (#33745)
  • Improve modules import in Airflow core by moving some of them into a type-checking block (#33755)
  • Refactor: remove unused state - SHUTDOWN (#33746, #34063, #33893)
  • Refactor: Use in-place .sort() (#33743)
  • Use literal dict instead of calling dict() in Airflow core (#33762)
  • remove unnecessary map and rewrite it using list in Airflow core (#33764)
  • Replace lambda by a def method in Airflow core (#33758)
  • Replace type func by isinstance in fab_security manager (#33760)
  • Replace single quotes by double quotes in all Airflow modules (#33766)
  • Merge multiple isinstance calls for the same object in a single call (#33767)
  • Use a single statement with multiple contexts instead of nested statements in core (#33769)
  • Refactor: Use f-strings (#33734, #33455)
  • Refactor: Use random.choices (#33631)
  • Use str.splitlines() to split lines (#33592)
  • Refactor: Remove useless str() calls (#33629)
  • Refactor: Improve detection of duplicates and list sorting (#33675)
  • Simplify conditions on len() (#33454)
airflow - Apache Airflow Helm Chart 1.11.0

Published by jedcunningham about 1 year ago

Significant Changes

Support naming customization on helm chart resources, some resources may be renamed during upgrade (#31066)

This is a new opt-in switch useStandardNaming, for backwards compatibility, to leverage the standard naming convention, which allows full use of fullnameOverride and nameOverride in all resources.

The following resources will be renamed using default of useStandardNaming=false when upgrading to 1.11.0 or a higher version.

  • ConfigMap {release}-airflow-config to {release}-config
  • Secret {release}-airflow-metadata to {release}-metadata
  • Secret {release}-airflow-result-backend to {release}-result-backend
  • Ingress {release}-airflow-ingress to {release}-ingress

For existing installations, all your resources will be recreated with a new name and Helm will delete the previous resources.

This won't delete existing PVCs for logs used by StatefulSet/Deployments, but brand new PVCs will be created in their place.
If you want to preserve log history, you'll need to manually copy the data from the old volumes into the new volumes after
deployment. Depending on the storage backend/class you're using, this procedure may vary. If you don't mind starting
with fresh logs/redis volumes, you can just delete the old PVCs, which will be named, for example:

.. code-block:: bash

    kubectl delete pvc -n airflow logs-gta-triggerer-0
    kubectl delete pvc -n airflow logs-gta-worker-0
    kubectl delete pvc -n airflow redis-db-gta-redis-0

If you do not change useStandardNaming or fullnameOverride after the upgrade, you can proceed as usual and should not see any unexpected behaviour.
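
To review which resources are affected for a given release, the rename list above can be expressed as a small lookup. This is only a sketch built from the four renames listed in these notes; `Rename` and `renamed_resources` are hypothetical names, and "gta" is the example release name used in the kubectl commands above.

```python
from typing import List, NamedTuple


class Rename(NamedTuple):
    kind: str
    old_name: str
    new_name: str


# (kind, old suffix, new suffix) taken from the rename list in these notes.
_RENAMES = [
    ("ConfigMap", "airflow-config", "config"),
    ("Secret", "airflow-metadata", "metadata"),
    ("Secret", "airflow-result-backend", "result-backend"),
    ("Ingress", "airflow-ingress", "ingress"),
]


def renamed_resources(release: str) -> List[Rename]:
    """Return the old and new resource names for a Helm release."""
    return [
        Rename(kind, f"{release}-{old}", f"{release}-{new}")
        for kind, old, new in _RENAMES
    ]


if __name__ == "__main__":
    for rename in renamed_resources("gta"):
        print(f"{rename.kind}: {rename.old_name} -> {rename.new_name}")
```

The output can be compared against the resources in your namespace before upgrading to spot anything that references the old names.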

bitnami/postgresql subchart updated to 12.10.0 (#33747)

The PostgreSQL subchart that is used with the Chart is now 12.10.0, previously it was 12.1.9.

Default git-sync image is updated to 3.6.9 (#33748)

The default git-sync image that is used with the Chart is now 3.6.9, previously it was 3.6.3.

Default Airflow image is updated to 2.7.1 (#34186)

The default Airflow image that is used with the Chart is now 2.7.1, previously it was 2.6.2.

New Features

  • Add support for scheduler name to PODs templates (#33843)
  • Support KEDA scaling for triggerer (#32302)
  • Add support for container lifecycle hooks (#32349, #34677)
  • Support naming customization on helm chart resources (#31066)
  • Adding startupProbe to scheduler and webserver (#33107)
  • Allow disabling token mounts using automountServiceAccountToken (#32808)
  • Add support for defining custom priority classes (#31615)
  • Add support for runtimeClassName (#31868)
  • Add support for custom query in workers KEDA trigger (#32308)

Improvements

  • Add containerSecurityContext for cleanup job (#34351)
  • Add existing secret support for PGBouncer metrics exporter (#32724)
  • Allow templating in webserver ingress hostnames (#33142)
  • Allow templating in flower ingress hostnames (#33363)
  • Add configmap annotations to StatsD and webserver (#33340)
  • Add pod security context to PgBouncer (#32662)
  • Add an option to use a direct DB connection in KEDA when PgBouncer is enabled (#32608)
  • Allow templating in cleanup.schedule (#32570)
  • Template dag processor waitformigration containers extraVolumeMounts (#32100)
  • Ability to inject extra containers into PgBouncer (#33686)
  • Allowing ability to add custom env into PgBouncer container (#33438)
  • Add support for env variables in the StatsD container (#33175)

Bug Fixes

  • Add airflow db migrate command to database migration job (#34178)
  • Pass workers.terminationGracePeriodSeconds into KubeExecutor pod template (#33514)
  • CeleryExecutor namespace depends on Airflow version (#32753)
  • Fix dag processor not including webserver config volume (#32644)
  • Dag processor liveness probe include --local and --job-type args (#32426)
  • Revising flower_url_prefix considering default value (#33134)

Doc only changes

  • Add more explicit "embedded postgres" exclusion for production (#33034)
  • Update git-sync description (#32181)

Misc

  • Default Airflow version to 2.7.1 (#34186)
  • Update PostgreSQL subchart to 12.10.0 (#33747)
  • Update git-sync to 3.6.9 (#33748)
  • Remove unnecessary loops to load env from helm values (#33506)
  • Replace common.tplvalues.render with tpl in ingress template files (#33384)
  • Remove K8S 1.23 support (#32899)
  • Fix chart named template comments (#32681)
  • Remove outdated comment from chart values in the workers KEDA conf section (#32300)
  • Remove unnecessary or function in template files (#34415)
airflow - Apache Airflow 2.7.1

Published by ephraimbuddy about 1 year ago

Significant Changes

CronTriggerTimetable is now less aggressive when trying to skip a run (#33404)

When setting catchup=False, CronTriggerTimetable no longer skips a run if
the scheduler does not query the timetable immediately after the previous run
has been triggered.

This should not affect scheduling in most cases, but can change the behaviour if
a DAG is paused-unpaused to manually skip a run. Previously, the timetable (with
catchup=False) would only start a run after a DAG is unpaused, but with this
change, the scheduler would try to look a little bit back to schedule the
previous run that covers a part of the period when the DAG was paused. This
means you will need to keep a DAG paused longer (namely, for the entire cron
period to pass) to really skip a run.

Note that this is also the behaviour exhibited by various other cron-based
scheduling tools, such as anacron.

conf.set() becomes case insensitive to match conf.get() behavior (#33452)

Also, conf.get() now raises an exception if used with non-string parameters.

conf.set(section, key, value) used to be case sensitive, i.e. conf.set("SECTION", "KEY", value)
and conf.set("section", "key", value) were stored as two distinct configurations.
This was inconsistent with the behavior of conf.get(section, key), which was always converting the section and key to lower case.

As a result, configuration options set with upper case characters in the section or key were unreachable.
That's why we are now converting section and key to lower case in conf.set too.

We also slightly changed the behavior of conf.get(). It used to allow objects that are not strings in the section or key.
Doing this will now result in an exception. For instance, conf.get("section", 123) needs to be replaced with conf.get("section", "123").
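
The normalization described above can be illustrated with a small toy model. This is only a sketch: ToyConf is a hypothetical stand-in, not Airflow's configuration class, and the choice of TypeError is an assumption (the notes only say that an exception is raised).

```python
class ToyConf:
    """Hypothetical stand-in for a config store; NOT Airflow's implementation."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _normalize(section, key):
        # Assumption: the exception type is illustrative only.
        if not isinstance(section, str) or not isinstance(key, str):
            raise TypeError("section and key must be strings")
        # Both set() and get() lower-case the section and key.
        return section.lower(), key.lower()

    def set(self, section, key, value):
        self._values[self._normalize(section, key)] = value

    def get(self, section, key):
        return self._values[self._normalize(section, key)]


conf = ToyConf()
conf.set("SECTION", "KEY", "a")
conf.set("section", "key", "b")  # same entry as above, so it overwrites "a"
print(conf.get("Section", "Key"))
```

In this model both set() calls address a single entry, so an option written with upper case characters stays reachable through get().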

Bug Fixes

  • Ensure that tasks wait for running indirect setup (#33903)
  • Respect "soft_fail" for core async sensors (#33403)
  • Differentiate 0 and unset as a default param values (#33965)
  • Raise 404 from Variable PATCH API if variable is not found (#33885)
  • Fix MappedTaskGroup tasks not respecting upstream dependency (#33732)
  • Add limit 1 if required first value from query result (#33672)
  • Fix UI DAG counts including deleted DAGs (#33778)
  • Fix cleaning zombie RESTARTING tasks (#33706)
  • SECURITY_MANAGER_CLASS should be a reference to class, not a string (#33690)
  • Add back get_url_for_login in security manager (#33660)
  • Fix 2.7.0 db migration job errors (#33652)
  • Set context inside templates (#33645)
  • Treat dag-defined access_control as authoritative if defined (#33632)
  • Bind engine before attempting to drop archive tables (#33622)
  • Add a fallback in case no first name and last name are set (#33617)
  • Sort data before groupby in TIS duration calculation (#33535)
  • Stop adding values to rendered templates UI when there is no dagrun (#33516)
  • Set strict to True when parsing dates in webserver views (#33512)
  • Use dialect.name in custom SA types (#33503)
  • Do not return ongoing dagrun when a end_date is less than utcnow (#33488)
  • Fix a bug in formatDuration method (#33486)
  • Make conf.set case insensitive (#33452)
  • Allow timetable to slightly miss catchup cutoff (#33404)
  • Respect soft_fail argument when poke is called (#33401)
  • Create a new method used to resume the task in order to implement specific logic for operators (#33424)
  • Fix DagFileProcessor interfering with dags outside its processor_subdir (#33357)
  • Remove the unnecessary <br> text in Provider's view (#33326)
  • Respect soft_fail argument when ExternalTaskSensor runs in deferrable mode (#33196)
  • Fix handling of default value and serialization of Param class (#33141)
  • Check if the dynamically-added index is in the table schema before adding (#32731)
  • Fix rendering the mapped parameters when using expand_kwargs method (#32272)
  • Fix dependencies for celery and opentelemetry for Python 3.8 (#33579)

Misc/Internal

  • Bring back Pydantic 1 compatibility (#34081, #33998)
  • Use a trimmed version of README.md for PyPI (#33637)
  • Upgrade to Pydantic 2 (#33956)
  • Reorganize devel_only extra in Airflow's setup.py (#33907)
  • Bumping FAB to 4.3.4 in order to fix issues with filters (#33931)
  • Add minimum requirement for sqlalchemy to 1.4.24 (#33892)
  • Update version_added field for configs in config file (#33509)
  • Replace OrderedDict with plain dict (#33508)
  • Consolidate import and usage of itertools (#33479)
  • Static check fixes (#33462)
  • Import utc from datetime and normalize its import (#33450)
  • D401 Support (#33352, #33339, #33337, #33336, #33335, #33333, #33338)
  • Fix some missing type hints (#33334)
  • D205 Support - Stragglers (#33301, #33298, #33297)
  • Refactor: Simplify code (#33160, #33270, #33268, #33267, #33266, #33264, #33292, #33453, #33476, #33567,
    #33568, #33480, #33753, #33520, #33623)
  • Fix Pydantic warning about orm_mode rename (#33220)
  • Add MySQL 8.1 to supported versions. (#33576)
  • Remove Pydantic limitation for version < 2 (#33507)

Doc only changes

  • Add documentation explaining template_ext (and how to override it) (#33735)
  • Explain how users can check if python code is top-level (#34006)
  • Clarify that DAG authors can also run code in DAG File Processor (#33920)
  • Fix broken link in Modules Management page (#33499)
  • Fix secrets backend docs (#33471)
  • Fix config description for base_log_folder (#33388)
airflow - Apache Airflow 2.7.0

Published by ephraimbuddy about 1 year ago

Significant Changes

Remove Python 3.7 support (#30963)

As of now, Python 3.7 is no longer supported by the Python community.
Therefore, to use Airflow 2.7.0, you must ensure your Python version is
either 3.8, 3.9, 3.10, or 3.11.

Old Graph View is removed (#32958)

The old Graph View is removed. The new Graph View is the default view now.

The trigger UI form is skipped in web UI if no parameters are defined in a DAG (#33351)

If you are using the dag_run.conf dictionary and the web UI JSON entry to run your DAG, you should either:

  • Add params to your DAG (https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form)
  • Enable the new configuration show_trigger_form_if_no_params to bring back the old behaviour

The "db init", "db upgrade" commands and "[database] load_default_connections" configuration options are deprecated (#33136).

Instead, you should use the "airflow db migrate" command to create or upgrade the database. This command will not create default connections.
To create default connections, you need to run "airflow connections create-default-connections" explicitly,
after running "airflow db migrate".

In case of SMTP SSL connection, the context now uses the "default" context (#33070)

The "default" context is Python's default_ssl_context instead of the previously used "none". The
default_ssl_context provides a balance between security and compatibility, but in some cases,
when certificates are old, self-signed or misconfigured, it might not work. This can be configured
by setting "ssl_context" in the "email" configuration of Airflow.

Setting it to "none" brings back the "none" setting that was used in Airflow 2.6 and before,
but it is not recommended for security reasons, as this setting disables validation of certificates and allows MITM attacks.
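
To make the difference between the two settings concrete, here is a minimal sketch that uses only Python's standard ssl module. It does not reproduce Airflow's own code: the unverified context is an assumption, built by hand to show what "no certificate validation" means in practice.

```python
import ssl

# Python's default context: verifies certificates and host names.
default_ctx = ssl.create_default_context()

# Illustration only (assumption, not Airflow's exact code): a context with
# verification switched off, similar in spirit to the old "none" behavior.
unverified_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
unverified_ctx.check_hostname = False  # must be disabled before verify_mode
unverified_ctx.verify_mode = ssl.CERT_NONE

print(default_ctx.verify_mode, default_ctx.check_hostname)
print(unverified_ctx.verify_mode, unverified_ctx.check_hostname)
```

A server with an old, self-signed or misconfigured certificate would be rejected by the first context and accepted by the second, which is why the second one is open to MITM attacks.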

Disable default allowing the testing of connections in UI, API and CLI (#32052)

For security reasons, the test connection functionality is disabled by default across Airflow UI,
API and CLI. The availability of the functionality can be controlled by the
test_connection flag in the core section of the Airflow
configuration (airflow.cfg). It can also be controlled by the
environment variable AIRFLOW__CORE__TEST_CONNECTION.

The following values are accepted for this config param:

  1. Disabled: Disables the test connection functionality and
    disables the Test Connection button in the UI.
    This is also the default value set in the Airflow configuration.
  2. Enabled: Enables the test connection functionality and
    activates the Test Connection button in the UI.
  3. Hidden: Disables the test connection functionality and
    hides the Test Connection button in UI.

For more information on capabilities of users, see the documentation:
https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html#capabilities-of-authenticated-ui-users
It is strongly advised to not enable the feature until you make sure that only
highly trusted UI/API users have "edit connection" permissions.
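
The environment variable name above follows the AIRFLOW__{SECTION}__{KEY} pattern. As a rough sketch, the helpers below (airflow_env_var and connection_test_env are hypothetical names, not part of Airflow) build that name and check a value against the three accepted settings listed above.

```python
ALLOWED_TEST_CONNECTION_VALUES = ("Disabled", "Enabled", "Hidden")


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def connection_test_env(value: str) -> dict:
    """Return the environment entry for the test_connection flag."""
    if value not in ALLOWED_TEST_CONNECTION_VALUES:
        raise ValueError(f"unsupported value: {value!r}")
    return {airflow_env_var("core", "test_connection"): value}


print(connection_test_env("Hidden"))
```

The returned mapping could be merged into the environment of the Airflow processes, for example in a container definition.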

The xcomEntries API disables support for the deserialize flag by default (#32176)

For security reasons, the /dags/*/dagRuns/*/taskInstances/*/xcomEntries/*
API endpoint now disables, by default, the deserialize option that deserializes
arbitrary XCom values in the webserver. Server admins may set
the [api] enable_xcom_deserialize_support config to True to re-enable the
flag and restore the previous behavior.

However, it is strongly advised to not enable the feature, and perform
deserialization at the client side instead.

Change of the default Celery application name (#32526)

Default name of the Celery application changed from airflow.executors.celery_executor to airflow.providers.celery.executors.celery_executor.

You should change both your configuration and Health check command to use the new name:

  • in configuration (celery_app_name configuration in celery section) use airflow.providers.celery.executors.celery_executor
  • in your Health check command use airflow.providers.celery.executors.celery_executor.app

The default value for scheduler.max_tis_per_query is changed from 512 to 16 (#32572)

This change is expected to make the Scheduler more responsive.

scheduler.max_tis_per_query needs to be lower than core.parallelism.
If both were left to their default value previously, the effective default value of scheduler.max_tis_per_query was 32
(because it was capped at core.parallelism).

To keep the behavior as close as possible to the old config, one can set scheduler.max_tis_per_query = 0,
in which case it'll always use the value of core.parallelism.
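
The capping rule described above can be written as a small function. This is a sketch of the rule as stated in these notes, not the scheduler's actual code, and effective_max_tis_per_query is a hypothetical name.

```python
def effective_max_tis_per_query(max_tis_per_query: int, parallelism: int) -> int:
    """Return the value the scheduler would effectively use."""
    if max_tis_per_query == 0:
        # 0 means "always use core.parallelism".
        return parallelism
    # Otherwise the setting is capped at core.parallelism.
    return min(max_tis_per_query, parallelism)


# Old default (512) with a parallelism of 32: capped at 32.
print(effective_max_tis_per_query(512, 32))
# New default (16) with a parallelism of 32: 16.
print(effective_max_tis_per_query(16, 32))
# Setting 0 follows parallelism, matching the old effective value.
print(effective_max_tis_per_query(0, 32))
```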

Some executors have been moved to corresponding providers (#32767)

In order to use the executors, you need to install the providers:

  • for Celery executors you need to install the apache-airflow-providers-celery package >= 3.3.0
  • for Kubernetes executors you need to install the apache-airflow-providers-cncf-kubernetes package >= 7.4.0
  • for Dask executors you need to install the apache-airflow-providers-daskexecutor package in any version

You can also achieve this by installing Airflow with the [celery], [cncf.kubernetes], or [daskexecutor] extras, respectively.

Users who base their images on the apache/airflow reference image (not slim) should be unaffected: the base
reference image comes with all three providers installed.
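
For custom images or requirements files, the package names and minimum versions above can be turned into pip requirement strings. The lookup table and the requirement helper below are a hypothetical sketch; only the package names and version bounds come from these notes.

```python
# Executor family -> (provider package, minimum version or None for any).
PROVIDERS = {
    "celery": ("apache-airflow-providers-celery", "3.3.0"),
    "kubernetes": ("apache-airflow-providers-cncf-kubernetes", "7.4.0"),
    "dask": ("apache-airflow-providers-daskexecutor", None),
}


def requirement(executor_family: str) -> str:
    """Return a pip requirement string for the given executor family."""
    package, min_version = PROVIDERS[executor_family]
    if min_version is None:
        return package
    return f"{package}>={min_version}"


for family in PROVIDERS:
    print(requirement(family))
```

Each printed line can be placed in a requirements file or passed to pip install.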

Improvement Changes

PostgreSQL only improvement: Added index on taskinstance table (#30762)

This index appears to have a large positive effect in setups with tens of millions of such rows.

New Features

  • Add OpenTelemetry to Airflow (AIP-49: https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-49+milestone%3A%22Airflow+2.7.0%22)
  • Trigger Button - Implement Part 2 of AIP-50 (#31583)
  • Removing Executor Coupling from Core Airflow (AIP-51: https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-51+milestone%3A%22Airflow+2.7.0%22)
  • Automatic setup and teardown tasks (AIP-52: https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-52+milestone%3A%22Airflow+2.7.0%22)
  • OpenLineage in Airflow (AIP-53: https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-53+milestone%3A%22Airflow+2.7.0%22)
  • Experimental: Add a cache to Variable and Connection when called at dag parsing time (#30259)
  • Enable pools to consider deferred tasks (#32709)
  • Allows to choose SSL context for SMTP connection (#33070)
  • New gantt tab (#31806)
  • Load plugins from providers (#32692)
  • Add BranchExternalPythonOperator (#32787, #33360)
  • Add option for storing configuration description in providers (#32629)
  • Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration (#32313)
  • Add Executors discovery and documentation (#32532)
  • Add JobState for job state constants (#32549)
  • Add config to disable the 'deserialize' XCom API flag (#32176)
  • Show task instance in web UI by custom operator name (#31852)
  • Add default_deferrable config (#31712)
  • Introducing AirflowClusterPolicySkipDag exception (#32013)
  • Use reactflow for datasets graph (#31775)
  • Add an option to load the dags from db for command tasks run (#32038)
  • Add version of chain which doesn't require matched lists (#31927)
  • Use operator_name instead of task_type in UI (#31662)
  • Add --retry and --retry-delay to airflow db check (#31836)
  • Allow skipped task state task_instance_schema.py (#31421)
  • Add a new config for celery result_backend engine options (#30426)
  • UI Add Cluster Activity Page (#31123, #32446)
  • Adding keyboard shortcuts to common actions (#30950)
  • Adding more information to kubernetes executor logs (#29929)
  • Add support for configuring custom alembic file (#31415)
  • Add running and failed status tab for DAGs on the UI (#30429)
  • Add multi-select, proposals and labels for trigger form (#31441)
  • Making webserver config customizable (#29926)
  • Render DAGCode in the Grid View as a tab (#31113)
  • Add rest endpoint to get option of configuration (#31056)
  • Add section query param in get config rest API (#30936)
  • Create metrics to track Scheduled->Queued->Running task state transition times (#30612)
  • Mark Task Groups as Success/Failure (#30478)
  • Add CLI command to list the provider trigger info (#30822)
  • Add Fail Fast feature for DAGs (#29406)

Improvements

  • Improve graph nesting logic (#33421)
  • Configurable health check threshold for triggerer (#33089, #33084)
  • add dag_run_ids and task_ids filter for the batch task instance API endpoint (#32705)
  • Ensure DAG-level references are filled on unmap (#33083)
  • Add support for arrays of different data types in the Trigger Form UI (#32734)
  • Always show gantt and code tabs (#33029)
  • Move listener success hook to after SQLAlchemy commit (#32988)
  • Rename db upgrade to db migrate and add connections create-default-connections (#32810, #33136)
  • Remove old gantt chart and redirect to grid views gantt tab (#32908)
  • Adjust graph zoom based on selected task (#32792)
  • Call listener on_task_instance_running after rendering templates (#32716)
  • Display execution_date in graph view task instance tooltip. (#32527)
  • Allow configuration to be contributed by providers (#32604, #32755, #32812)
  • Reduce default for max TIs per query, enforce <= parallelism (#32572)
  • Store config description in Airflow configuration object (#32669)
  • Use isdisjoint instead of not intersection (#32616)
  • Speed up calculation of leaves and roots for task groups (#32592)
  • Kubernetes Executor Load Time Optimizations (#30727)
  • Save DAG parsing time if dag is not schedulable (#30911)
  • Updates health check endpoint to include dag_processor status. (#32382)
  • Disable default allowing the testing of connections in UI, API and CLI (#32052, #33342)
  • Fix config var types under the scheduler section (#32132)
  • Allow to sort Grid View alphabetically (#32179)
  • Add hostname to triggerer metric [triggers.running] (#32050)
  • Improve DAG ORM cleanup code (#30614)
  • TriggerDagRunOperator: Add wait_for_completion to template_fields (#31122)
  • Open links in new tab that take us away from Airflow UI (#32088)
  • Only show code tab when a task is not selected (#31744)
  • Add descriptions for celery and dask cert configs (#31822)
  • PythonVirtualenvOperator termination log in alert (#31747)
  • Migration of all DAG details to existing grid view dag details panel (#31690)
  • Add a diagram to help visualize timer metrics (#30650)
  • Celery Executor load time optimizations (#31001)
  • Update code style for airflow db commands to SQLAlchemy 2.0 style (#31486)
  • Mark uses of md5 as "not-used-for-security" in FIPS environments (#31171)
  • Add pydantic support to serde (#31565)
  • Enable search in note column in DagRun and TaskInstance (#31455)
  • Save scheduler execution time by adding new Index idea for dag_run (#30827)
  • Save scheduler execution time by caching dags (#30704)
  • Support for sorting DAGs by Last Run Date in the web UI (#31234)
  • Better typing for Job and JobRunners (#31240)
  • Add sorting logic by created_date for fetching triggers (#31151)
  • Remove DAGs.can_create on access control doc, adjust test fixture (#30862)
  • Split Celery logs into stdout/stderr (#30485)
  • Decouple metrics clients and validators into their own modules (#30802)
  • Description added for pagination in get_log api (#30729)
  • Optimize performance of scheduling mapped tasks (#30372)
  • Add sentry transport configuration option (#30419)
  • Better message on deserialization error (#30588)

Bug Fixes

  • Remove user sessions when resetting password (#33347)
  • Gantt chart: Use earliest/oldest ti dates if different than dag run start/end (#33215)
  • Fix virtualenv detection for Python virtualenv operator (#33223)
  • Correctly log when there are problems trying to chmod airflow.cfg (#33118)
  • Pass app context to webserver_config.py (#32759)
  • Skip served logs for non-running task try (#32561)
  • Fix reload gunicorn workers (#32102)
  • Fix future DagRun rarely triggered by race conditions when max_active_runs reached its upper limit. (#31414)
  • Fix BaseOperator get_task_instances query (#33054)
  • Fix issue with using the various state enum value in logs (#33065)
  • Use string concatenation to prepend base URL for log_url (#33063)
  • Update graph nodes with operator style attributes (#32822)
  • Affix webserver access_denied warning to be configurable (#33022)
  • Only load task action modal if user can edit (#32992)
  • OpenAPI Spec fix nullable alongside $ref (#32887)
  • Make the decorators of PythonOperator sub-classes extend its decorator (#32845)
  • Fix check if virtualenv is installed in PythonVirtualenvOperator (#32939)
  • Unwrap Proxy before checking __iter__ in is_container() (#32850)
  • Override base log folder by using task handler's base_log_folder (#32781)
  • Catch arbitrary exception from run_job to prevent zombie scheduler (#32707)
  • Fix depends_on_past work for dynamic tasks (#32397)
  • Sort extra_links for predictable order in UI. (#32762)
  • Fix prefix group false graph (#32764)
  • Fix bad delete logic for dagruns (#32684)
  • Fix bug in prune_dict where empty dict and list would be removed even in strict mode (#32573)
  • Add explicit browsers list and correct rel for blank target links (#32633)
  • Handle returned None when multiple_outputs is True (#32625)
  • Fix returned value when ShortCircuitOperator condition is falsy and there is not downstream tasks (#32623)
  • Fix returned value when ShortCircuitOperator condition is falsy (#32569)
  • Fix rendering of dagRunTimeout (#32565)
  • Fix permissions on /blocked endpoint (#32571)
  • Bugfix, prevent force of unpause on trigger DAG (#32456)
  • Fix data interval in cli.dags.trigger command output (#32548)
  • Strip whitespaces from airflow connections form (#32292)
  • Add timedelta support for applicable arguments of sensors (#32515)
  • Fix incorrect default on readonly property in our API (#32510)
  • Add xcom map_index as a filter to xcom endpoint (#32453)
  • Fix CLI commands when custom timetable is used (#32118)
  • Use WebEncoder to encode DagRun.conf in DagRun's list view (#32385)
  • Fix logic of the skip_all_except method (#31153)
  • Ensure dynamic tasks inside dynamic task group only marks the (#32354)
  • Handle the cases that webserver.expose_config is set to non-sensitive-only instead of boolean value (#32261)
  • Add retry functionality for handling process termination caused by database network issues (#31998)
  • Adapt Notifier for sla_miss_callback (#31887)
  • Fix XCOM view (#31807)
  • Fix for "Filter dags by tag" flickering on initial load of dags.html (#31578)
  • Fix where expanding resizer wouldn't expanse grid view (#31581)
  • Fix MappedOperator-BaseOperator attr sync check (#31520)
  • Always pass named type_ arg to drop_constraint (#31306)
  • Fix bad drop_constraint call in migrations (#31302)
  • Resolving problems with redesigned grid view (#31232)
  • Support requirepass redis sentinel (#30352)
  • Fix webserver crash when calling get /config (#31057)

Misc/Internal

  • Modify pathspec version restriction (#33349)
  • Refactor: Simplify code in dag_processing (#33161)
  • For now limit Pydantic to < 2.0.0 (#33235)
  • Refactor: Simplify code in models (#33181)
  • Add elasticsearch group to pre-2.7 defaults (#33166)
  • Refactor: Simplify dict manipulation in airflow/cli (#33159)
  • Remove redundant dict.keys() call (#33158)
  • Upgrade ruff to latest 0.0.282 version in pre-commits (#33152)
  • Move openlineage configuration to provider (#33124)
  • Replace State by TaskInstanceState in Airflow executors (#32627)
  • Get rid of Python 2 numeric relics (#33050)
  • Remove legacy dag code (#33058)
  • Remove legacy task instance modal (#33060)
  • Remove old graph view (#32958)
  • Move CeleryExecutor to the celery provider (#32526, #32628)
  • Move all k8S classes to cncf.kubernetes provider (#32767, #32891)
  • Refactor existence-checking SQL to helper (#32790)
  • Extract Dask executor to new daskexecutor provider (#32772)
  • Remove atlas configuration definition (#32776)
  • Add Redis task handler (#31855)
  • Move writing configuration for webserver to main (webserver limited) (#32766)
  • Improve getting the query count in Airflow API endpoints (#32630)
  • Remove click upper bound (#32634)
  • Add D400 pydocstyle check - core Airflow only (#31297)
  • D205 Support (#31742, #32575, #32213, #32212, #32591, #32449, #32450)
  • Bump word-wrap from 1.2.3 to 1.2.4 in /airflow/www (#32680)
  • Strong-type all single-state enum values (#32537)
  • More strong typed state conversion (#32521)
  • SQL query improvements in utils/db.py (#32518)
  • Bump semver from 6.3.0 to 6.3.1 in /airflow/www (#32506)
  • Bump jsonschema version to 4.18.0 (#32445)
  • Bump stylelint from 13.13.1 to 15.10.1 in /airflow/www (#32435)
  • Bump tough-cookie from 4.0.0 to 4.1.3 in /airflow/www (#32443)
  • upgrade flask-appbuilder (#32054)
  • Support Pydantic 2 (#32366)
  • Limit click until we fix mypy issues (#32413)
  • A couple of minor cleanups (#31890)
  • Replace State usages with strong-typed enums (#31735)
  • Upgrade ruff to 0.272 (#31966)
  • Better error message when serializing callable without name (#31778)
  • Improve the views module a bit (#31661)
  • Remove asynctest (#31664)
  • Refactor sqlalchemy queries to 2.0 style (#31569, #31772, #32350, #32339, #32474, #32645)
  • Remove Python 3.7 support (#30963)
  • Bring back min-airflow-version for preinstalled providers (#31469)
  • Docstring improvements (#31375)
  • Improve typing in SchedulerJobRunner (#31285)
  • Upgrade ruff to 0.0.262 (#30809)
  • Upgrade to MyPy 1.2.0 (#30687)

Docs only changes

  • Clarify UI user types in security model (#33021)
  • Add links to DAGRun / DAG / Task in templates-ref.rst (#33013)
  • Add docs of how to test for DAG Import Errors (#32811)
  • Clean-up of our new security page (#32951)
  • Cleans up Extras reference page (#32954)
  • Update Dag trigger API and command docs (#32696)
  • Add deprecation info to the Airflow modules and classes docstring (#32635)
  • Formatting installation doc to improve readability (#32502)
  • Fix triggerer HA doc (#32454)
  • Add type annotation to code examples (#32422)
  • Document cron and delta timetables (#32392)
  • Update index.rst doc to correct grammar (#32315)
  • Fixing small typo in python.py (#31474)
  • Separate out and clarify policies for providers (#30657)
  • Fix docs: add an "apache" prefix to pip install (#30681)