Collect, aggregate, and visualize a data ecosystem's metadata
APACHE-2.0 License
Published by merobi-hub almost 2 years ago
dataset_symlinks table for SymlinkDatasetFacet https://github.com/MarquezProject/marquez/pull/2087 @pawel-big-lebowski
column-lineage.description column https://github.com/MarquezProject/marquez/pull/2205 @pawel-big-lebowski
parentRun facet as reported by older Airflow OpenLineage versions https://github.com/MarquezProject/marquez/pull/2130 @collado-mike
jobs_current_version_uuid_index and jobs_symlink_target_uuid_index to ignore NULL values https://github.com/MarquezProject/marquez/pull/2186 @collado-mike
Published by merobi-hub about 2 years ago
--metadata option to seed backend with OpenLineage events https://github.com/MarquezProject/marquez/pull/2082 @wslulciuc
--metadata option. (Metadata used in the command was not being defined using the OpenLineage standard.)
nodeId in the spec https://github.com/MarquezProject/marquez/pull/2084 @howardyoo
metadata cmd https://github.com/MarquezProject/marquez/pull/2091 @wslulciuc
metadata to generate OpenLineage events; generated events will be saved to a file called metadata.json that can be used to seed Marquez via the seed cmd. (We lacked a way to performance test the data model of Marquez with significantly large OL events.)
io.dropwizard.cli.Command instead of io.dropwizard.cli.ConfiguredCommand to no longer require passing marquez.yml as an argument to the seed cmd. (The marquez.yml argument is not used in the seed cmd.)
OpenLineageDao to handle Airflow run UUID conflicts https://github.com/MarquezProject/marquez/pull/2097 @collado-mike
Published by merobi-hub about 2 years ago
jobs_view to stop computing FQN on reads and to compute on writes instead #2036 @collado-mike
Run in the openapi spec to include a context field #2020 @esaych
Published by merobi-hub over 2 years ago
LifecycleStateChangeFacet with an ability to softly delete datasets #1847 @pawel-big-lebowski
operationId to openapi spec #1978 @phixMe
=, @, and ; #1936 @mobuchowski
Published by merobi-hub over 2 years ago
LoggingMdcFilter to include API method, path, and request ID @fm100
Java11 to Java17 @ucg8j
temurin enabling Marquez to run on multiple CPU architectures @ucg8j
The /api/v1-beta/lineage endpoint @wslulciuc
The marquez-airflow lib. has been removed. Please use the openlineage-airflow library instead. To migrate to using openlineage-airflow, make the following changes @wslulciuc:
# Update the import in your DAG definitions
-from marquez_airflow import DAG
+from openlineage.airflow import DAG
# Update the following environment variables in your Airflow instance
-MARQUEZ_URL
+OPENLINEAGE_URL
-MARQUEZ_NAMESPACE
+OPENLINEAGE_NAMESPACE
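The environment variable rename above can also be scripted. The following is a minimal sketch, not part of Marquez or OpenLineage: the helper name migrate_env and the rule that an already-set OPENLINEAGE_* value takes precedence are assumptions made for illustration.

```python
import os

# Old Marquez variable names mapped to their OpenLineage replacements,
# as listed in the migration notes above.
RENAMED_VARS = {
    "MARQUEZ_URL": "OPENLINEAGE_URL",
    "MARQUEZ_NAMESPACE": "OPENLINEAGE_NAMESPACE",
}


def migrate_env(env):
    """Return a copy of env with the old names replaced by the new ones.

    Hypothetical helper: an explicitly set OPENLINEAGE_* value wins over
    the value carried by the old MARQUEZ_* name.
    """
    migrated = dict(env)
    for old, new in RENAMED_VARS.items():
        if old in migrated:
            value = migrated.pop(old)
            migrated.setdefault(new, value)
    return migrated


if __name__ == "__main__":
    # Print the OpenLineage variables the current environment would end up with.
    for name, value in sorted(migrate_env(os.environ).items()):
        if name.startswith("OPENLINEAGE_"):
            print(f"{name}={value}")
```

The function only computes a new mapping and leaves the process environment untouched, so the result can be reviewed before it is written into an Airflow deployment's configuration.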
The marquez-spark lib. has been removed. Please use the openlineage-spark library instead. To migrate to using openlineage-spark, make the following changes @wslulciuc:
SparkSession.builder()
- .config("spark.jars.packages", "io.github.marquezproject:marquez-spark:0.20.+")
+ .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
- .config("spark.extraListeners", "marquez.spark.agent.SparkListener")
+ .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
.config("spark.openlineage.host", "https://api.demo.datakin.com")
.config("spark.openlineage.apiKey", "your datakin api key")
.config("spark.openlineage.namespace", "<NAMESPACE_NAME>")
.getOrCreate()
Published by wslulciuc almost 3 years ago
7.x @wslulciuc
eclipse-temurin for Marquez API base docker image @fm100
0.25.0. Please use the /lineage endpoint when collecting source, dataset, and job metadata @wslulciuc:
name column size for tables namespaces and sources @mmeasic
Published by collado-mike almost 3 years ago
struct<> @fm100
Published by wslulciuc almost 3 years ago
docker/up.sh to run in the background @rossturk
totalCount in lists of jobs and datasets @phixMe
dataset_fields table to TEXT @wslulciuc
ZonedDateTime parsing to support optional offsets and default to server timezone @collado-mike
Job.location and Source.connectionUrl should be in URI format on write @OleksandrDvornik
WriteOnly clients for java and python. Before OpenLineage, we added a WriteOnly implementation to our clients to emit calls to a backend. A backend enabled collecting raw HTTP requests to an HTTP endpoint, console, or file. This was our way of capturing lineage events that could then be used to automatically create resources on the Marquez backend. We soon worked on a standard that eventually became OpenLineage. That is, OpenLineage removed the need to make individual calls to create a namespace, a source, a dataset, etc.; instead, the backend accepts an event with metadata that it can process. @wslulciuc
Published by wslulciuc about 3 years ago
.env.example to override variables defined in docker-compose files @wslulciuc
marquez to marquez.tracing pkg
8080 when running the Marquez shadow jar @wslulciuc
examples/airflow to use openlineage-airflow and fix the SQL in DAG troubleshooting step @wslulciuc
job_versions_io_mapping_inputs and job_versions_io_mapping_outputs tables @OleksandrDvornik
Published by wslulciuc about 3 years ago
/api/v1/lineage endpoint to docs and deprecate run endpoints @wslulciuc
FieldType enum @wslulciuc
0.19.0). Please use the POST /api/v1/lineage endpoint when collecting job run metadata. @wslulciuc
openlineage-airflow library instead. @wslulciuc
openlineage-spark library instead. @wslulciuc
java and python (scheduled to be removed in 0.19.0) @wslulciuc
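For readers moving to the POST /api/v1/lineage endpoint named in these notes, the sketch below builds a small run event and prepares the request with the Python standard library only. The field names follow the OpenLineage run event format as commonly documented and should be checked against the spec; the base URL, the producer URI, and the helper names build_run_event and build_request are assumptions for illustration.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, namespace, job_name, run_id=None):
    """Build a minimal OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/my-producer",  # placeholder producer URI
    }


def build_request(base_url, event):
    """Prepare (but do not send) a POST request to /api/v1/lineage."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    event = build_run_event("START", "my-namespace", "my-job")
    request = build_request("http://localhost:5000", event)  # assumed local URL
    # Show what would be sent; nothing goes over the network here.
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it to a running backend. Keeping construction separate from sending makes the payload easy to inspect first.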
Published by wslulciuc over 3 years ago
DatasetVersionDao queries missing input and output facets @dominiquetipton
Run and JobData models @collado-mike
openlineage.* configuration parameters with spark.* @collado-mike
SqlParser used in Airflow integration @wslulciuc
Published by wslulciuc over 3 years ago
Published by collado-mike over 3 years ago
Published by wslulciuc over 3 years ago
requests dep in marquez-airflow integration @wslulciuc
attrs dep in marquez-airflow integration @wslulciuc
Published by wslulciuc over 3 years ago