OpenLineage

An Open Standard for lineage metadata collection

APACHE-2.0 License

OpenLineage - OpenLineage 0.24.0

Published by merobi-hub over 1 year ago

Added

  • Support custom transport types #1795 @nataliezeller1
    Adds a new interface, TransportBuilder, for creating custom transport types without having to modify core components of OpenLineage.
  • Airflow: dbt Cloud integration #1418 @howardyoo
    Adds a new OpenLineage extractor for dbt Cloud that uses the dbt Cloud hook provided by Airflow to communicate with dbt Cloud via its API.
  • Spark: support dataset name modification using regex #1796 @pawel-big-lebowski
    It is a common scenario to write Spark output datasets with a location path ending with /year=2023/month=04. The Spark parameter spark.openlineage.dataset.removePath.pattern introduced here allows for removing certain elements from a path with a regex pattern.
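
The path-trimming idea behind spark.openlineage.dataset.removePath.pattern can be illustrated outside Spark. The sketch below is a minimal Python illustration, not the integration's code: it assumes the part of the path to drop is marked with a named group called remove, and the helper name remove_path_pattern and the sample bucket path are hypothetical.

```python
import re


def remove_path_pattern(path: str, pattern: str) -> str:
    """Drop the span matched by the named group 'remove' from a dataset path.

    Assumption: the pattern marks the removable part with a group named
    'remove'. If the pattern does not match, the path is returned unchanged.
    """
    match = re.search(pattern, path)
    if match is None or match.groupdict().get("remove") is None:
        return path
    start, end = match.span("remove")
    return path[:start] + path[end:]


# Hypothetical pattern that strips a trailing /year=YYYY/month=MM partition suffix.
PATTERN = r"(?P<remove>/year=\d{4}/month=\d{2})$"

print(remove_path_pattern("s3://bucket/warehouse/orders/year=2023/month=04", PATTERN))
# prints: s3://bucket/warehouse/orders
```

With such a pattern, every partition of the same table resolves to a single dataset name instead of one dataset per partition directory.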

Fixed

  • Spark: catch exception when trying to obtain details of non-existing table. #1798 @pawel-big-lebowski
    This mostly happens when table details are requested on a START event, before the table has been created.
  • Spark: LogicalPlanSerializer #1792 @pawel-big-lebowski
    Changes LogicalPlanSerializer to make use of non-shaded Jackson classes in order to serialize LogicalPlans. Note: class names are no longer serialized.
  • Flink: fix Flink CI #1801 @pawel-big-lebowski
    Specifies an older image version that succeeds on CI in order to fix the Flink integration.

OpenLineage - OpenLineage 0.23.0

Published by merobi-hub over 1 year ago

Added

  • SQL: parser improvements to support: copy into, create stage, pivot #1742 @pawel-big-lebowski
    Adds support for additional syntax available in sqlparser-rs.
  • dbt: add support for snapshots #1787 @JDarDagran
    Adds support for this special kind of table representing type-2 Slowly Changing Dimensions.

Changed

  • Spark: change custom column lineage visitors #1788 @pawel-big-lebowski
    Makes the CustomColumnLineageVisitor interface public to support custom column lineage.

Fixed

  • Spark: fix null pointer in JobMetricsHolder #1786 @pawel-big-lebowski
    Adds a null check before running put to fix an NPE occurring in JobMetricsHolder.
  • SQL: fix query with table generator #1783 @pawel-big-lebowski
    Allows TableFactor::TableFunction to support queries containing table functions.
  • SQL: fix rust code style bug #1785 @pawel-big-lebowski
    Fixes a minor style issue in visitor.rs.

Removed

  • Airflow: Remove explicit pass from several extract_on_complete methods #1771 @JDarDagran
    Removes the explicit pass statements from the extract_on_complete methods of three extractors.

OpenLineage - OpenLineage 0.22.0

Published by merobi-hub over 1 year ago

Added

  • Spark: properties facet #1717 by @tnazarew
    Adds a new facet to capture specified Spark properties.
  • SQL: SQLParser supports alter, truncate and drop statements #1695 by @pawel-big-lebowski
    Adds support for these statements to the parser.
  • Common/SQL: provide public interface for openlineage_sql package #1727 by @JDarDagran
    Provides a .pyi public interface file that supplies type hints.
  • Java client: add configurable headers to HTTP transport #1718 by @tnazarew
    Adds custom header handling to HttpTransport and the Spark integration.
  • Python client: create client from dictionary #1745 by @JDarDagran
    Adds a new from_dict method to the Python client to support creating it from a dictionary.
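
The construction pattern behind a from_dict method can be sketched in a few lines. The classes below (SketchClient, SketchTransport) are hypothetical stand-ins written for this illustration and are not the openlineage-python API; the assumption is only that a dictionary carries a transport type plus its options.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchTransport:
    """Hypothetical transport settings; not a class from openlineage-python."""

    type: str
    options: Dict[str, Any] = field(default_factory=dict)


class SketchClient:
    """Hypothetical client illustrating the from_dict construction pattern."""

    def __init__(self, transport: SketchTransport) -> None:
        self.transport = transport

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "SketchClient":
        # Copy so the caller's dictionary is not mutated.
        options = dict(config)
        try:
            transport_type = options.pop("type")
        except KeyError:
            raise ValueError("config must contain a 'type' key") from None
        return cls(SketchTransport(type=transport_type, options=options))


client = SketchClient.from_dict({"type": "http", "url": "http://localhost:5000"})
print(client.transport.type, client.transport.options)
```

A classmethod constructor of this kind keeps configuration loading (from YAML, environment, or code) separate from the client itself.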

Changed

  • Spark: remove URL parameters for JDBC namespaces #1708 by @tnazarew
    Makes the namespace value from an event conform to the naming convention specified in Naming.md.
  • Make OPENLINEAGE_DISABLED case-insensitive #1705 by @jedcunningham
    Makes the environment variable for disabling OpenLineage in the Python client and Airflow integration case-insensitive.
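
A case-insensitive check of OPENLINEAGE_DISABLED might look like the sketch below. The function name is_disabled is hypothetical, and treating only the value "true" as disabling is an assumption made for this illustration.

```python
import os
from typing import Mapping


def is_disabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when OPENLINEAGE_DISABLED is set to 'true' in any letter case.

    Assumption: only the value 'true' disables the integration.
    """
    return environ.get("OPENLINEAGE_DISABLED", "false").strip().lower() == "true"


print(is_disabled({"OPENLINEAGE_DISABLED": "TRUE"}))  # prints: True
print(is_disabled({"OPENLINEAGE_DISABLED": "False"}))  # prints: False
```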

Fixed

  • Spark: fix missing BigQuery class in column lineage #1698 by @pawel-big-lebowski
    The Spark integration now checks if the BigQuery classes are available on the classpath before attempting to use them.
  • DBT: throw UnsupportedDbtCommand when finding unsupported entry in args.which #1724 by @JDarDagran
    Adjusts the dbt-ol script to detect dbt commands in run_results.json only.

Removed

  • Spark: remove unnecessary warnings for column lineage #1700 by @pawel-big-lebowski
    Removes the warnings about OneRowRelation and LocalRelation nodes.
  • Spark: remove deprecated configs #1711 by @tnazarew
    Removes support for deprecated configs.

OpenLineage - OpenLineage 0.21.1

Published by merobi-hub over 1 year ago

Added

  • Clients: add DEBUG logging of events to transports #1633 by @mobuchowski
    Ensures that the DEBUG loglevel on properly configured loggers will always log events, regardless of the chosen transport.
  • Spark: add CustomEnvironmentFacetBuilder class #1545 by New contributor @Anirudh181001
    Enables the capture of custom environment variables from Spark.
  • Spark: introduce the new output visitors AlterTableAddPartitionCommandVisitor and AlterTableSetLocationCommandVisitor #1629 by New contributor @nataliezeller1
    Adds visitors for extracting table names from the Spark commands AlterTableAddPartitionCommand and AlterTableSetLocationCommand. The intended use case is a custom transport for the OpenMetadata lineage API.
  • Spark: add column lineage for JDBC relations #1636 by @tnazarew
    Adds column lineage information to JDBC events with data extracted from query by the SQL parser.
  • SQL: add Linux-aarch64 native library to Java SQL parser #1664 by @mobuchowski
    Adds a Linux-ARM version of the native library. The Java SQL parser interface had only Linux-x64 and MacOS universal binary variants previously.

Changed

  • Airflow: get table database in Athena extractor #1631 by New contributor @rinzool
    Changes the extractor to get a table's database from the table.schema field or the operator default if the field is None.

Fixed

  • dbt: add dbt seed to the list of dbt-ol events #1649 by New contributor @pohek321
    Ensures that dbt-ol test no longer fails when run against an event seed.
  • Spark: make column lineage extraction in Spark support caching #1634 by @pawel-big-lebowski
    Collects column lineage from Spark logical plans that contain cached datasets.
  • Spark: add support for a deprecated config #1586 by @tnazarew
    Maps the deprecated spark.openlineage.url to spark.openlineage.transport.url.
  • Spark: add error message in case of null in url #1590 by @tnazarew
    Improves error logging in the case of undefined URLs.
  • Spark: collect complete event for really quick Spark jobs #1650 by @pawel-big-lebowski
    Improves the collecting of OpenLineage events on SQL complete in the case of quick operations.
  • Spark: fix input/outputs for one node LogicalRelation plans #1668 by @pawel-big-lebowski
    For simple queries like select col1, col2 from my_db.my_table that do not write output, the Spark plan contained just a single node, which was wrongly treated as both an input and an output dataset.
  • SQL: fix file existence check in build script for openlineage-sql-java #1613 by @sekikn
    Ensures that the build script works if the library is compiled solely for Linux.
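
The mapping of the deprecated spark.openlineage.url key to spark.openlineage.transport.url can be pictured as a small renaming step over the configuration. This is a sketch only: the helper normalize_conf is hypothetical, and letting an explicitly set new key take precedence over the deprecated one is an assumption, not documented behavior.

```python
from typing import Dict

# Deprecated key -> replacement key, as named in the entry above.
DEPRECATED_KEYS = {"spark.openlineage.url": "spark.openlineage.transport.url"}


def normalize_conf(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with deprecated keys renamed to their replacements."""
    normalized = dict(conf)
    for old_key, new_key in DEPRECATED_KEYS.items():
        if old_key in normalized:
            value = normalized.pop(old_key)
            # Assumption: an explicitly set new key wins over the deprecated one.
            normalized.setdefault(new_key, value)
    return normalized


print(normalize_conf({"spark.openlineage.url": "http://localhost:5000"}))
# prints: {'spark.openlineage.transport.url': 'http://localhost:5000'}
```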

Removed

  • Airflow: remove JobIdMapping and update macros to better support Airflow version 2+ #1645 by @JDarDagran
    Updates macros to use OpenLineageAdapter's method to generate deterministic run UUIDs because using the JobIdMapping utility is incompatible with Airflow 2+.
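
A deterministic run UUID removes the need to store a job-to-run mapping, because the same inputs always yield the same identifier. The sketch below shows the general technique with a name-based UUID; the function name, the choice of uuid3, and the exact fields combined are assumptions for illustration and may differ from what OpenLineageAdapter does.

```python
import uuid


def deterministic_run_id(
    namespace: str, dag_id: str, task_id: str, execution_date: str, try_number: int
) -> str:
    """Derive a stable run ID from identifying fields of a task run.

    Assumption: these five fields identify a run; the real integration
    may combine different ones.
    """
    name = f"{namespace}.{dag_id}.{task_id}.{execution_date}.{try_number}"
    return str(uuid.uuid3(uuid.NAMESPACE_URL, name))


first = deterministic_run_id("default", "etl", "load", "2023-04-01T00:00:00+00:00", 1)
second = deterministic_run_id("default", "etl", "load", "2023-04-01T00:00:00+00:00", 1)
print(first == second)  # prints: True
```

Because the ID is a pure function of its inputs, a START event and a later COMPLETE event can be correlated even when they are emitted from different processes.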
OpenLineage - OpenLineage 0.20.6

Published by merobi-hub over 1 year ago

Added

  • Airflow: add new extractor for FTPFileTransmitOperator #1603 @sekikn
    Adds a new extractor for this Airflow operator serving legacy systems.

Changed

  • Airflow: make extractors for async operators work #1601 @JDarDagran
    Sends a deterministic Run UUID for Airflow runs.

Fixed

  • dbt: render actual profile only in profiles.yml #1599 @mobuchowski
    Adds an include_section argument for the Jinja render method to include only one profile if needed.
  • dbt: make compiled_code optional #1595 @JDarDagran
    Makes compiled_code optional for manifest > v7.

OpenLineage - OpenLineage 0.20.6

Published by merobi-hub over 1 year ago

Added

  • Airflow: add new extractor for GCSToGCSOperator #1495 @sekikn
    Adds a new extractor for this operator.
  • Flink: resolve topic names from regex, support 1.16.0 #1522 @pawel-big-lebowski
    Adds support for Flink 1.16.0 and makes the integration resolve topic names from Kafka topic patterns.
  • Proxy: implement lineage event validator for client proxy #1469 @fm100
    Implements logic in the proxy (which is still in development) for validating and handling lineage events.
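
For the Flink entry above, resolving topic names from a pattern amounts to matching the pattern against the topics that actually exist. The sketch below illustrates that idea in Python; the helper resolve_topics, the sample topic names, and the use of a full match are assumptions, not the Flink integration's code.

```python
import re
from typing import Iterable, List


def resolve_topics(pattern: str, available_topics: Iterable[str]) -> List[str]:
    """Return the available topic names that fully match the pattern, sorted."""
    compiled = re.compile(pattern)
    return sorted(topic for topic in available_topics if compiled.fullmatch(topic))


# Hypothetical topic names for illustration.
topics = ["orders.eu", "orders.us", "payments.eu"]
print(resolve_topics(r"orders\..*", topics))
# prints: ['orders.eu', 'orders.us']
```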

Changed

  • CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526 @mobuchowski
    Adopts the ruff package, which combines several linters and formatters into one fast binary.

Fixed

  • Airflow: make the Trino catalog non-mandatory #1572 @JDarDagran
    Makes the Trino catalog optional in the Trino extractor.
  • Common: add explicit SQL dependency #1532 @mobuchowski
    Addresses the breaking change to the Great Expectations (GE) integration in 0.19.2 by including the SQL dependency explicitly.
  • DBT: adjust tqdm logging in dbt-ol #1549 @JDarDagran
    Adjusts tqdm to show the correct number of iterations and adds START events for parent runs.
  • DBT: fix typo in log output #1493 @denimalpaca
    Fixes 'emittled' typo in log output.
  • Great Expectations, Airflow: follow Snowflake dataset naming rules #1527 @mobuchowski
    Normalizes Snowflake dataset and datasource naming rules across dbt, Airflow, and Great Expectations; canonicalizes old Snowflake account paths by expanding them to the full form with account, region, and cloud names.
  • Java and Python Clients: Kafka does not initialize properties if they are empty; check and notify about Confluent-Kafka requirement #1556 @mobuchowski
    Fixes the failure to initialize KafkaTransport in the Java client and adds an exception if the required confluent-kafka module is missing from the Python client.
  • Spark: add square brackets for list-based Spark configs #1507 @Varunvaruns9
    Adds a condition to treat configs with [] as lists. [] will be required for list-based configs starting with 0.21.0.
  • Spark: fix several Spark/BigQuery-related issues #1557 @mobuchowski
    Fixes the assumption that a version is always a number; adds support for HadoopMapReduceWriteConfigUtil; makes the integration access BigQueryUtil and getTableId using reflection, which supports all BigQuery versions; makes logs provide the full serialized LogicalPlan on debug.
  • SQL: only report partial failures #1479 @mobuchowski
    Changes the parser so it reports partial failures instead of failing the whole extraction.
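
For the list-based Spark configs entry above, parsing a bracketed value could be sketched as follows. The helper parse_list_config is hypothetical, and the semicolon separator is an assumption. The sketch enforces the brackets, which matches the stated requirement from 0.21.0 onward rather than the more lenient 0.20.4 behavior.

```python
from typing import List


def parse_list_config(value: str, separator: str = ";") -> List[str]:
    """Parse a config value of the form '[a;b;c]' into a list of strings.

    Assumption: items are separated by ';'. Values without the enclosing
    square brackets are rejected.
    """
    stripped = value.strip()
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError(f"list-based config must be enclosed in []: {value!r}")
    inner = stripped[1:-1]
    return [item.strip() for item in inner.split(separator) if item.strip()]


print(parse_list_config("[spark_unknown;spark.logicalPlan]"))
# prints: ['spark_unknown', 'spark.logicalPlan']
```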
OpenLineage - OpenLineage 0.19.2

Published by merobi-hub almost 2 years ago

Added

Fixed

  • Airflow: fix collect_ignore, add flags to Pytest for cleaner output #1437 @JDarDagran
    Removes the extractors directory from the ignored list, improving unit testing.
  • Spark & Java client: fix README typos @versaurabh
    Fixes typos in the SPDX license headers.

OpenLineage - OpenLineage 0.18.0

Published by merobi-hub almost 2 years ago

Added

  • Airflow: support SQLExecuteQueryOperator #1379 @JDarDagran
    Changes the SQLExtractor and adds support for the dynamic assignment of extractors based on conn_type.
  • Airflow: introduce a new extractor for SFTPOperator #1263 @sekikn
    Adds an extractor for tracing file transfers between local file systems.
  • Airflow: add Sagemaker extractors #1136 @fhoda
    Creates extractors for SageMakerProcessingOperator and SageMakerTransformOperator.
  • Airflow: add S3 extractor for Airflow operators #1166 @fhoda
    Creates an extractor for the S3CopyObject in the Airflow integration.
  • Spec: add spec file for ExternalQueryRunFacet #1262 @howardyoo
    Adds a spec file to make this facet available for the Java client. Includes a README.
  • Docs: add a TSC doc #1303 @merobi-hub
    Adds a document listing the members of the Technical Steering Committee.

Fixed

  • Spark: improve Databricks to send better events #1330 @pawel-big-lebowski
    Filters unwanted events and provides a meaningful job name.
  • Spark-BigQuery: fix a few of the common errors #1377 @mobuchowski
    Fixes a few of the common issues with the Spark-BigQuery integration, adds an integration test, and configures CI.
  • Python: validate eventTime field in Python client #1355 @pawel-big-lebowski
    Validates the eventTime of a RunEvent within the client library.
  • Databricks: Handle Databricks Runtime 11.3 changes to DbFsUtils constructor #1351 @wjohnson
    Recaptures lost mount point information from the DatabricksEnvironmentFacetBuilder and environment-properties facet by looking at the number of parameters in the DbFsUtils constructor to determine the runtime version.
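
For the eventTime validation entry above, a client-side check might resemble the sketch below. It is an illustration under stated assumptions, not the Python client's implementation: the function name validate_event_time is hypothetical, and requiring an explicit UTC offset is an assumption based on eventTime being a date-time string.

```python
from datetime import datetime


def validate_event_time(event_time: str) -> datetime:
    """Parse an ISO 8601 eventTime string and require a timezone offset."""
    # datetime.fromisoformat rejects a trailing 'Z' on older Python versions,
    # so rewrite it as an explicit UTC offset first.
    candidate = event_time[:-1] + "+00:00" if event_time.endswith("Z") else event_time
    try:
        parsed = datetime.fromisoformat(candidate)
    except ValueError as exc:
        raise ValueError(f"eventTime is not a valid ISO 8601 timestamp: {event_time!r}") from exc
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as invalid.
        raise ValueError(f"eventTime must include a timezone offset: {event_time!r}")
    return parsed


print(validate_event_time("2023-04-01T12:00:00Z").isoformat())
# prints: 2023-04-01T12:00:00+00:00
```

Validating before emission surfaces a malformed timestamp at the producer rather than at the backend that receives the event.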
OpenLineage - OpenLineage 0.17.0

Published by merobi-hub almost 2 years ago

Added

Changed

Fixed

Removed

OpenLineage - OpenLineage 0.16.1

Published by merobi-hub almost 2 years ago

Added

Changed

Fixed

Removed

OpenLineage - OpenLineage 0.15.1

Published by merobi-hub about 2 years ago

Added

Changed

Fixed

OpenLineage - OpenLineage 0.14.1

Published by merobi-hub about 2 years ago

Fixed

OpenLineage - OpenLineage 0.14.0

Published by merobi-hub about 2 years ago

Added

Changed

Fixed

OpenLineage - OpenLineage 0.13.1

Published by merobi-hub about 2 years ago

Fixed

OpenLineage - OpenLineage 0.13.0

Published by merobi-hub about 2 years ago

Added

Changed

  • Use RUNNING EventType in Flink integration for currently running jobs #985 @mzareba382
  • Convert task object into JSON encodable when creating Airflow version facet #1018 @fm100

Fixed

OpenLineage - OpenLineage 0.12.0

Published by merobi-hub about 2 years ago

Added

Changed

Fixed

OpenLineage - OpenLineage 0.11.0

Published by merobi-hub over 2 years ago

Added

Changed

Fixed

OpenLineage - OpenLineage 0.10.0

Published by merobi-hub over 2 years ago

Added

Changed

Fixed

OpenLineage - OpenLineage 0.9.0

Published by merobi-hub over 2 years ago

Added

Fixed

OpenLineage - OpenLineage 0.8.2

Published by merobi-hub over 2 years ago

Added

Fixed