marquez

Collect, aggregate, and visualize a data ecosystem's metadata

APACHE-2.0 License

Downloads
2.8K
Stars
1.6K

marquez - Latest Release

Published by merobi-hub 7 months ago

Changed

  • Web: various revisions #2770 @phixMe
    Includes clean up of issues in the UI and removal of non-useful elements.

Fixed

  • Streaming API: fix behaviour for COMPLETE/FAIL events within streaming jobs #2768 @pawel-big-lebowski
    New job_version is not created for a streaming job terminal event with no dataset information and existing version is kept.
marquez -

Published by merobi-hub 8 months ago

Added


Redesigned Web UI Featuring Column Lineage


Fixed

marquez - Marquez 0.45.0-rc.1

Published by merobi-hub 8 months ago

Added

Fixed

marquez - Marquez 0.44.0

Published by merobi-hub 9 months ago

Added

  • Web: add dataset tags tabs for adding/deleting of tags #2714 @davidsharp7
    Adds a dataset tags component so that datasets can have tags added/deleted.
  • API: Add endpoint to delete field-level tags #2705 @davidsharp7
    Adds delete endpoint to remove dataset field tags.

Fixed

  • Web: fix dataset tag reducers bug #2716 @davidsharp7
    Removes result from dataset tags reducer to fix a sidebar bug.
marquez - Marquez 0.43.1

Published by merobi-hub 10 months ago

Fixed

marquez - Marquez 0.43.0

Published by merobi-hub 10 months ago

Added

  • API: refactor the RunDao SQL query #2685 @sophiely
    Improves the performance of the SQL query used for listing all runs.
  • API: refactor dataset version query #2683 @sophiely
    Improves the performance of the SQL query used for the dataset version.
  • API: add support for a DatasetEvent #2641 #2654 @pawel-big-lebowski
    Adds a feature for saving into the Marquez model datasets sent via the DatasetEvent event type. Includes optimization of the lineage query.
  • API: add support for a JobEvent #2661 @pawel-big-lebowski
    Adds a feature for saving into the Marquez model jobs and datasets sent via the JobEvent event type.
  • API: add support for streaming jobs #2682 @pawel-big-lebowski
    Creates job version and reference rows at the beginning of the job instead of on complete. Updates the job version within the run if anything changes.
  • API/spec: implement upstream run-level lineage #2658 @julienledem
    Returns the version of each job and dataset a run is depending on.
  • API: add DELETE endpoint for dataset tags #2698 @davidsharp7
    Creates a new endpoint for removing the linkage between a dataset and a tag in datasets_tag_mapping to supply a way to delete a tag from a dataset via the API.
  • Web: add a dataset drawer #2672 @davidsharp7
    Adds a drawer to the dataset column view in the GUI.

Fixed

  • Client/Java: change url path encoding to match jersey decoding #2693 @davidjgoss
    Swaps out the implementation of MarquezPathV1::encode to use the UrlEscapers path segment escaper, which does proper URI encoding.
  • Web: fix pagination in the Jobs route #2655 @merobi-hub
    Hides job pagination in the case of no jobs.
  • Web: fix empty search experience #2679 @phixMe
    Use of the previous search value was resulting in a bad request for the first character of a search.
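
The path-encoding fix above turns on the difference between form-style encoding and path-segment encoding. The following is a minimal Python sketch of that distinction, using the standard library's `urllib.parse` as a stand-in. It is not the Java client's code, and the `encode_segment` helper and the sample name are hypothetical.

```python
from urllib.parse import quote, quote_plus, unquote


def encode_segment(segment: str) -> str:
    """Percent-encode one URL path segment (hypothetical helper).

    safe="" also escapes "/", so a name containing a slash stays
    inside a single path segment.
    """
    return quote(segment, safe="")


name = "my dataset/v1"  # hypothetical dataset name

# Form-style encoding turns a space into "+"; a path decoder keeps
# that "+" as a literal plus sign instead of restoring the space.
form_encoded = quote_plus(name)

# Path-segment encoding turns a space into "%20".
path_encoded = encode_segment(name)

# Decoding the path-encoded value as a path recovers the original name.
round_trip = unquote(path_encoded)
```

Decoding `form_encoded` with a path decoder yields `my+dataset/v1` rather than the original name, which is the kind of mismatch an encoder and decoder that disagree on conventions can produce.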

Removed

  • Client/Java: remove maven-archiver dependency from the Java client #2695 @davidjgoss
    Removes a dependency from build.gradle that was bringing some transitive vulnerabilities.
marquez - Marquez 0.42.0

Published by merobi-hub about 1 year ago

Added

  • Client: add Java client method for dataset/job lineage #2623 @davidjgoss
    To add a method for the dataset/job-level endpoint (GET /lineage) to the Java SDK, this adds a new method to the MarquezClient for the endpoint, along with tests, and the necessary new subclasses of NodeData for datasets and jobs.
  • Web: add IO tab #2613 @phixme
    Improves experience with large graphs by adding a new tab to move between graph elements without looking at the graph itself.
  • Web: add hover-over Tag tooltip to datasets #2630 @davidsharp7
    For parity with columns in the GUI, this adds a Tag tooltip to datasets.

Changed

  • Docker: upgrade to Docker Compose V2 #2644 @merobi-hub
    Docker Compose V1 has been at EOL since June, but docker/up.sh uses the V1 format. This upgrades the up command in up.sh to V2.

Removed

  • API: drop table job_contexts and usage #2621 @wslulciuc
    Removes usage of job_contexts, which has been replaced by OpenLineage facets, and adds a migration to drop the table.
  • API: remove usage of current_job_context_uuid column #2622 @wslulciuc
    Removes usage of job_context_uuid and current_job_context_uuid. Column to be removed in 0.43.0.

Fixed

  • Web: fix Unix epoch time display for null endedAt values #2647 @merobi-hub
    Fixes the issue of the GUI displaying Unix epoch time (midnight on January 1, 1970) in the case of running jobs/null endedAt values.
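
The epoch-time issue above is a common pattern: a missing end time gets coerced to zero and renders as January 1, 1970. A minimal sketch of the guard follows, written in Python for illustration only. The `format_ended_at` helper, the millisecond input, and the "N/A" placeholder are assumptions, not the web client's actual code.

```python
from datetime import datetime, timezone
from typing import Optional


def format_ended_at(ended_at_ms: Optional[int]) -> str:
    """Format a run's end time given in epoch milliseconds (hypothetical helper)."""
    if ended_at_ms is None:
        # A run that is still in progress has no end time. Coercing the
        # missing value to 0 would display 1970-01-01 00:00:00 instead.
        return "N/A"
    ended = datetime.fromtimestamp(ended_at_ms / 1000, tz=timezone.utc)
    return ended.strftime("%Y-%m-%d %H:%M:%S")
```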

marquez - Marquez 0.41.0

Published by merobi-hub about 1 year ago

Added

  • API: add support for the following parameters in the SearchDao #2556 @tati @wslulciuc
    Updates the search endpoint to enforce the YYYY-MM-DD format for date query params, handle them as LocalDate values, and support the following query params:
    • namespace - matches jobs or datasets within the given namespace.
    • before - matches jobs or datasets before YYYY-MM-DD.
    • after - matches jobs or datasets after YYYY-MM-DD.
  • Web: add paging on jobs and datasets #2614 @phixme
    Adds paging to jobs and datasets just like we already have on the lineage events page.
  • Web: add tag descriptions to tooltips #2612 @davidsharp7
    Gets the tag descriptions from the tags endpoint and, when a column has a tag, displays the corresponding description on hover.
  • Web: add available column-level tags #2606 @davidsharp7
    Adds a new column called "tags" to the dataset column view along with the tags associated with the dataset column.
  • Web: add HTML Tool Tip #2601 @davidsharp7
    Adds a Tool Tip to display basic node details.
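
The search parameters described in #2556 above can be sketched from the caller's side. This is a rough Python sketch, assuming a free-text parameter named `q` alongside the `namespace`, `before`, and `after` params listed in the entry. The `build_search_query` helper and the sample values are hypothetical, and the validation only mirrors the YYYY-MM-DD rule stated above.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlencode

# Exactly four digits, two digits, two digits, separated by hyphens.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def build_search_query(
    q: str,
    namespace: Optional[str] = None,
    before: Optional[str] = None,
    after: Optional[str] = None,
) -> str:
    """Build a query string for a search request (hypothetical helper)."""
    params = {"q": q}  # "q" is an assumed name for the search text
    if namespace is not None:
        params["namespace"] = namespace
    for key, value in (("before", before), ("after", after)):
        if value is None:
            continue
        if not _DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{key} must be YYYY-MM-DD, got {value!r}")
        # Raises ValueError for impossible dates such as 2023-02-30.
        date.fromisoformat(value)
        params[key] = value
    return urlencode(params)
```

For example, `build_search_query("orders", namespace="food_delivery", before="2023-09-01")` yields `q=orders&namespace=food_delivery&before=2023-09-01`.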

Fixed

  • Web: fix dataset saga for paging #2615 @phixme
    Updates the saga and changes the default page size.
  • API: perf/improve jobdao query #2609 @algorithmy1
    Optimizes the query to make use of Common Table Expressions to fetch the required data more efficiently and before the join, fixing a significant bottleneck.
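
The idea behind the Common Table Expression change in #2609 above, narrowing the rows before the join, can be illustrated with a small self-contained sketch. It uses Python's built-in `sqlite3` module and a simplified, hypothetical two-table schema with made-up sample rows. It is not the actual JobDao query or the Marquez schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables for illustration only.
    CREATE TABLE jobs (name TEXT PRIMARY KEY, namespace TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, job_name TEXT, ended_at TEXT);
    INSERT INTO jobs VALUES
        ('etl_orders', 'food_delivery'),
        ('etl_menus', 'food_delivery'),
        ('other_job', 'misc');
    INSERT INTO runs (job_name, ended_at) VALUES
        ('etl_orders', '2023-09-01'),
        ('etl_orders', '2023-09-02'),
        ('etl_menus', '2023-09-01'),
        ('other_job', '2023-09-03');
    """
)

QUERY = """
WITH selected_jobs AS (
    -- Narrow to the requested page of jobs first.
    SELECT name FROM jobs WHERE namespace = ? ORDER BY name LIMIT ?
),
latest_runs AS (
    -- Aggregate runs only for the jobs selected above.
    SELECT job_name, MAX(ended_at) AS last_ended_at
    FROM runs
    WHERE job_name IN (SELECT name FROM selected_jobs)
    GROUP BY job_name
)
SELECT j.name, r.last_ended_at
FROM selected_jobs AS j
LEFT JOIN latest_runs AS r ON r.job_name = j.name
ORDER BY j.name
"""

rows = conn.execute(QUERY, ("food_delivery", 10)).fetchall()
```

Because both CTEs restrict their input before the final join, the join only touches rows for the selected jobs rather than the full `runs` table.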

Changed

  • Docker: Postgres 14 #2607 @wslulciuc
    Bumps the recommended version of Postgres to 14.
    When deploying locally, you might need to run ./docker/down.sh to clean existing volumes.

Removed

  • Client: tolerate null transformation attrs in field model #2600 @davidjgoss
    Removes the @NonNull annotation from the client class and the @NotNull from the model class.
marquez - Marquez 0.40.0

Published by merobi-hub about 1 year ago

Added

  • API: lineage events paging update #2577 @phixme
    Updates the API for lineage events and restyles the lineage events page to fix a number of bugs and code duplication.
  • Chart: do not use hardcoded Postgres image for init container #2579 @terrpan
    Adds a template in chart/templates/helpers to use the global.imageRegistry input value for the wait-for-db container to improve performance on private registries.
  • Web: add copy button for lineage ID #2578 @AmandaYao00
    Adds a copy button to the IDs on the Events page.

Fixed

  • API: add defaults for idFromValue() and idFromValueAndType() #2581 @wslulciuc
    Replaces the null values in these functions in EventTypeResolver with defaults.
  • Client: correct example syntax #2575 @davidjgoss
    Removes errant parens from the sample code's client instantiation.
marquez - Marquez 0.39.0

Published by merobi-hub about 1 year ago

Added

  • Web: add full graph toggle #2569 @jlukenoff
    Adds a toggle to the Lineage UI to let users switch between viewing the full graph and only the selected paths.
  • Web: add ARIA labels to input fields #2562 @merobi-hub
    Adds i18next-compliant ARIA labels to input fields for improved accessibility.

Changed

  • Web: upgrade React to version 18 #2563 @Xavier-Cliquennois
    Upgrades the Web client in order to utilize the latest version of Node.js and update all dependencies to their respective latest versions.

Fixed

  • Web: fix the stylesheet for the date selector #2573 @phixme
    Fixes margins and moves the label to be more in line with the defaults, fixing issues caused by the recent Material-UI upgrade.
  • Web: update i18n for general search filter and runInfo facets search #2557 @merobi-hub
    Adds missing i18n support for runInfo and search.
  • Docker: update web proxy import #2571 @phixme
    Updates the import style for the http-proxy-middleware.
marquez - Marquez 0.38.0

Published by merobi-hub about 1 year ago

Added

  • API: add db retention support #2486 @wslulciuc
    Adds migration, a dbRetention config in marquez.yml for enabling a retention policy, and a db-retention command for executing a policy.
  • API: add runs state indices #2535 @phixme
    Adds four indices to help run retention faster.
  • API: define DbRetentionJob(Jdbi, DbRetentionConfig) #2549 @wslulciuc
    Adds @Positive to DbRetentionConfig instance variables for validating DbRetentionConfig properties internally within the class.
  • API: add log for when retention job starts #2551 @wslulciuc
    Adds logging of DbRetentionJob.

Fixed

  • API: fix slow dataset query updates #2534 @phixme
    Scopes down nested facet queries to be the same scope as the outer query.
  • Client/Python: increase namespace length to 1024 characters #2554 @hloomupgrade
    Changes the namespace length constraint to sync up with the Java client's.
  • Web: remove pagination in case of no content #2559 @Nisarg-Chokshi
    Updates Dataset & Event route rendering to remove pagination in the case of no content.
marquez - Marquez 0.37.0

Published by merobi-hub over 1 year ago

Added

  • API: add ability to decode static metadata events #2495 @pawel-big-lebowski
    Introduces an EventTypeResolver for using the schemaURL field to decode POST requests to /lineage with LineageEvents, DatasetEvents or JobEvents, as the first step in implementing static lineage support.
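
The resolution step described above, picking an event type from the schemaURL field, might look roughly like the sketch below. This is a Python illustration only, not the EventTypeResolver implementation. The URL shape (type name as the last element of the fragment), the example URL, and the fallback to LineageEvent are all assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

# Event types named in the entry above, other than the fallback.
KNOWN_EVENT_TYPES = {"DatasetEvent", "JobEvent"}
# Assumed fallback when the schema URL is missing or unrecognized.
DEFAULT_EVENT_TYPE = "LineageEvent"


def resolve_event_type(schema_url: Optional[str]) -> str:
    """Pick an event type from a schema URL (hypothetical helper)."""
    if not schema_url:
        return DEFAULT_EVENT_TYPE
    # Assumed shape: ".../OpenLineage.json#/definitions/DatasetEvent"
    fragment = urlsplit(schema_url).fragment
    type_name = fragment.rsplit("/", 1)[-1]
    return type_name if type_name in KNOWN_EVENT_TYPES else DEFAULT_EVENT_TYPE
```

Returning a default rather than `None` keeps callers from having to handle a missing type, which is the same general idea as the defaults later added in #2581.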

Fixed

  • API: remove unnecessary DB updates #2531 @pawel-big-lebowski
    Prevent updates that are not needed and are deadlock-prone.
  • Web: revert URL encoding when fetching lineage #2529 @jlukenoff
    Reverts the node ID from being URL-encoded and allows the backend to return lineage details successfully even when a node ID contains special characters.
marquez - Marquez 0.36.0

Published by merobi-hub over 1 year ago

Added

  • UI: add an option for configuring the depth of the lineage graph #2525 @jlukenoff
    Makes the lineage UI a bit easier to navigate, especially for larger lineage graphs.

Fixed

  • Docker: generate new uuid for etl_menus in seed data #2519 @wslulciuc
    Fixes a runID collision creating an invalid lineage graph when the seed command is used.
  • Docker: remove unnecessary copy command from Dockerfile #2516 @Nisarg-Chokshi
    Deletes redundant copy command.
  • Chart: enable RFC7230_LEGACY http compliance on application connectors by default #2524 @jlukenoff
    Adds this configuration to the helm chart by default to fix basic chart installation and ensure that the fix in #1419 does not revert.
marquez - Marquez 0.35.0

Published by merobi-hub over 1 year ago

Added

  • UI: add pagination to datasets #2512 @merobi-hub
    Adds pagination to the datasets route using the same approach employed for events.

Fixed

  • UI: handle lineage graph cycles on the client #2506 @jlukenoff
    Fixes a bug where the client overflows the call stack if the user selects a node that is part of a cycle in the graph.
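
A traversal that tolerates cycles usually tracks the nodes it has already visited. The sketch below shows that general technique in Python with an explicit stack. It is an illustration under that assumption, not the web client's code, and the `reachable_nodes` helper and the adjacency-list shape are hypothetical.

```python
from typing import Dict, List, Set


def reachable_nodes(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from start, including start itself."""
    visited: Set[str] = set()
    stack = [start]  # explicit stack, so depth is not bound by recursion limits
    while stack:
        node = stack.pop()
        if node in visited:
            # Already handled; skipping here is what breaks the cycle.
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))
    return visited
```

With `{"a": ["b"], "b": ["c"], "c": ["a"]}`, starting from `"a"` terminates and returns all three nodes, whereas a naive recursive walk without the visited check would never finish.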
marquez - Marquez 0.34.0

Published by merobi-hub over 1 year ago

Fixed

  • Chart: skip regex after postgresql in chart/values.yaml #2488 @wslulciuc
    Fixes regex for version bump of chart/values.yaml in new-version.sh.
marquez - Marquez 0.33.0

Published by merobi-hub over 1 year ago

Added

  • API: support inputFacets and outputFacets from OpenLineage specification #2417 @pawel-big-lebowski
    Adds the ability to store inputFacets / outputFacets sent within datasets, exposing them through the Marquez API as part of the Run resource.

Fixed

  • API: fix job update SQL to correctly use simple_name for job updates #2457 @collado-mike
    Fixes a bug in the job update logic stemming from use of the FQN rather than the simple_name and updates the relevant test.
  • API: update SQL in backfill script for facet tables to improve performance #2461 @collado-mike
    Dramatically improves migration performance by making the backfill script fetch events by run_uuid via a new temp table for tracking and sorting runs.
  • API: update v61 migration to handle duplicate job names before unique constraint #2464 @collado-mike
    To fix a bug in the case of duplicate job FQNs, this renames jobs that have been symlinked to point to newer versions of themselves so that the job FQN doesn't conflict and the unique constraint (without regard to parent job) can be applied. Note: Any installations that have already applied this migration will not see any new operations on their data, but installations that have duplicates will need this fix for the migration to complete successfully.
  • API: make improvements to lineage query performance #2472 @collado-mike
    Dramatically lessens the lineage query performance regression caused by removal of the jobs_fqn table in #2448.
  • UI: change color for selected node and edges on graph #2458 @tito12
    Improves the visibility of the selected node and edges by increasing the contrast with the background.
  • UI: handle null run.jobVersion in DatasetInfo.tsx to fix rendering issues #2471 @perttus
    Fixes cases in which the Marquez UI fails to render DatasetInfo.
  • UI: better handling of null job latestRun for Jobs page #2467 @perttus
    Fixes a bug where the Jobs view fails to load when some jobs have no latestRun.
marquez - Marquez 0.32.0

Published by merobi-hub over 1 year ago

Fixed

  • API: improve dataset facets access #2407 @pawel-big-lebowski
    Improves database query performance when accessing dataset facets by rewriting SQL queries in DatasetDao and DatasetVersionDao.
  • Chart: fix communication between the UI and the API #2430 @thomas-delrue
    Defines the value for MARQUEZ_PORT as .Values.marquez.port (80) in the Helm Chart so the Marquez Web component can communicate with the API.
  • UI: always render MqCode #2454 @JDarDagran
    Fixes rendering of DatasetInfo and RunInfo pages when no SqlJobFacet exists.

Removed

  • API: remove job context #2373 @JDarDagran
    Removes the use of job context and adds two endpoints for job/run facets per run. These are called from Web components to replace the job context with SQLJobFacet.
  • API: remove jobs_fqn table and move FQN into jobs directly #2448 @collado-mike
    Fixes loading of certain jobs caused by the inability to enforce uniqueness constraints on fully qualified job names.
marquez - Marquez 0.31.0

Published by merobi-hub over 1 year ago

Added

  • UI: add facet view enhancements #2336 @tito12
    Creates a dynamic component offering the ability to navigate and search the JSON, expand sections and click on links.
  • UI: highlight selected path on graph and display status of jobs and datasets based on last 14 runs or latest quality facets #2384 @tito12
    Adds highlighting of the visual graph based on upstream and downstream dependencies of selected nodes; makes displayed status reflect the last 14 runs in the case of jobs and the latest quality facets in the case of datasets.
  • UI: enable auto-accessibility feature on graph nodes #2388 @merobi-hub
    Adds attributes to the FontAwesomeIcons to enable a built-in accessibility feature.

Fixed

  • API: add index to jobs_fqn table using namespace_name and job_fqn columns #2357 @collado-mike
    Optimizes read queries by adding an index to this table.
  • API: add missing indices to column_lineage, dataset_facets, job_facets tables #2419 @pawel-big-lebowski
    Creates missing indices on reference columns in a number of database tables.
  • Spec: make data version and dataset types the same #2400 @phixme
    Makes the fields property the same for datasets and dataset versions, allowing type-generating systems to treat them the same way.
  • UI: show location button only when link to code exists #2409 @tito12
    Makes the button visible only if the link is not empty.
marquez - Marquez 0.30.0

Published by merobi-hub over 1 year ago

Added

  • Proposals: add proposal for OL facet tables #2076 @wslulciuc
    Adds the proposal Optimize query performance for OpenLineage facets.
  • UI: display column lineage of a dataset #2293 @pawel-big-lebowski @tito12
    Adds a JSON preview of column-level lineage of a selected dataset to the UI.
  • UI: Add soft delete option to UI #2343 @tito12
    Adds option to soft delete a data record with a dialog component and double confirmation.
  • API: split lineage_events table to dataset_facets, run_facets, and job_facets tables #2350 #2355 #2359 @wslulciuc @pawel-big-lebowski
    Performance improvement storing and querying facets.
    Migration procedure requires manual steps if database has more than 100K lineage events.
    We highly encourage users to review our migration plan.
  • Docker: add new script for stopping Docker #2380 @rossturk
    Provides a clean way to stop a deployment via docker-compose down.
  • Docker: seed data for column lineage #2381 @rossturk
    Adds some ColumnLineageDatasetFacet JSON snippets to docker/metadata.json to seed data for column-level lineage facets.

Fixed

  • API: validate RunLink and JobLink #2342 @pawel-big-lebowski
    Fixes validation of the ParentRunFacet to avoid NullPointerExceptions in the case of empty run sections.
  • Docker: use docker-compose.web.yml as base compose file #2360 @wslulciuc
    Fixes the Marquez HTTP server set in docker/up.sh so the script uses docker-compose.web.yml with overrides for dev set via docker-compose.web-dev.yml.
  • Docs: update copyright headers #2353 @merobi-hub
    Updates the headers with the current year.
  • Chart: fix Helm chart #2374 @perttus
    Fixes minor issues with the Helm chart.
  • Spec: update dataset version API spec #2389 @phixme
    Adds limit and offset to the openapi.yml spec file as query parameters.
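
The `limit` and `offset` query parameters mentioned above follow the usual paging convention: `limit` caps the page size and `offset` skips that many items. The following is a short Python sketch of how a caller might step through pages. The `page_queries` helper and the known total count are assumptions for illustration.

```python
from typing import Iterator
from urllib.parse import urlencode


def page_queries(total: int, limit: int) -> Iterator[str]:
    """Yield one query string per page needed to cover `total` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for offset in range(0, total, limit):
        yield urlencode({"limit": limit, "offset": offset})
```

For 25 items and a limit of 10, this yields `limit=10&offset=0`, `limit=10&offset=10`, and `limit=10&offset=20`.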
marquez - Marquez 0.29.0

Published by merobi-hub almost 2 years ago

Added

Fixed