Open, Multi-modal Catalog for Data & AI
APACHE-2.0 License
Unity Catalog is the industry's only universal catalog for data and AI.
The first release of Unity Catalog focuses on a core set of APIs for tables, unstructured data, and AI assets - with more to come soon on governance, access, and client interoperability. This is just the beginning!
This is a community effort. Unity Catalog is supported by
Unity Catalog is proud to be hosted by the LF AI & Data Foundation.
Let's take Unity Catalog for a spin. In this guide, we are going to do the following:
You have to ensure that your local environment has the following:
The JAVA_HOME environment variable in your terminal is configured to point to JDK 17.
The project is compiled with build/sbt package
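One way to sanity-check the JDK prerequisite before building is sketched below. The helper names java_major and check_jdk17 are hypothetical and not part of the repository. The sketch assumes that java -version prints a quoted version string such as "17.0.9", which is the usual format but may vary between distributions.

```shell
# Extract the major version from a Java version string.
# Handles both the modern scheme ("17.0.9") and the legacy one ("1.8.0_292").
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy "1.x" scheme: drop the leading "1."
  esac
  echo "${v%%[._-]*}"     # keep everything before the first ".", "_" or "-"
}

# Hypothetical helper: succeed only if JAVA_HOME points to a JDK 17.
check_jdk17() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  # `java -version` writes to stderr; the version is the first quoted field.
  full=$("$JAVA_HOME/bin/java" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  [ "$(java_major "$full")" = "17" ]
}
```

With these defined, running check_jdk17 && build/sbt package would only start the build when the check passes.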
If you prefer to run this using the Unity Catalog Dockerized Environment, please refer to the Docker README.md.
In a terminal, in the cloned repository root directory, start the UC server.
bin/start-uc-server
For the remaining steps, continue in a different terminal.
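Because the server runs in its own terminal, commands issued in the second terminal can fail if they run before the server is ready. A minimal sketch of a retry helper follows. The name wait_for is hypothetical, and the commented example assumes that curl is installed and that the server listens on http://127.0.0.1:8080, the endpoint used later in this guide.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any exit status of 0 from the command counts as "ready".
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here): poll once per second, up to 30 times,
# until something answers HTTP requests on the assumed endpoint.
# wait_for 30 1 curl -s -o /dev/null http://127.0.0.1:8080
```

The check only confirms that something answers on that port; it does not verify that the responder is a healthy UC server.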
Let's list the tables.
bin/uc table list --catalog unity --schema default
You should see a few tables. Some details are truncated because of the nested nature of the data.
To see all the content, you can add --output jsonPretty to any command.
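If you find yourself adding that flag often, a small wrapper can append it for you. This is only a sketch: uc_pretty and the UC_BIN variable are hypothetical names, and the wrapper assumes it is run from the repository root so that bin/uc resolves.

```shell
# Path to the UC CLI; override UC_BIN if you run from another directory.
UC_BIN="${UC_BIN:-bin/uc}"

# Run any UC CLI command with pretty-printed JSON output.
uc_pretty() {
  "$UC_BIN" "$@" --output jsonPretty
}

# Example (requires a running UC server):
# uc_pretty table list --catalog unity --schema default
```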
Next, let's get the metadata of one of those tables.
bin/uc table get --full_name unity.default.numbers
You can see that it is a Delta table. Now, specifically for Delta tables, this CLI can print a snippet of the contents of a Delta table (powered by the Delta Kernel Java project). Let's try that.
bin/uc table read --full_name unity.default.numbers
To operate on tables with DuckDB, you first need to install it (version 1.0).
Let's start DuckDB and install a couple of extensions. To start DuckDB, run the command duckdb in the terminal.
Then, in the DuckDB shell, run the following commands:
install uc_catalog from core_nightly;
load uc_catalog;
install delta;
load delta;
If you have installed these extensions before, you may have to run update extensions and restart DuckDB for the following steps to work.
Now that we have DuckDB all set up, let's try connecting to UC by specifying a secret.
CREATE SECRET (
TYPE UC,
TOKEN 'not-used',
ENDPOINT 'http://127.0.0.1:8080',
AWS_REGION 'us-east-2'
);
You should see it print a short table saying Success = true. Then we attach the unity catalog to DuckDB.
ATTACH 'unity' AS unity (TYPE UC_CATALOG);
Now we are ready to query. Try the following:
SHOW ALL TABLES;
SELECT * FROM unity.default.numbers;
You should see the tables listed and the contents of the numbers table printed.
To quit DuckDB, press Ctrl+D (if your platform supports it), press Ctrl+C, or use the .exit command in the DuckDB shell.
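The DuckDB statements above can also be collected into a file and run non-interactively. The sketch below only writes the file; the function name write_uc_demo_sql is hypothetical. Feeding the file to the DuckDB CLI on standard input, as in the commented line, assumes DuckDB is installed and the UC server is running.

```shell
# Write the DuckDB statements from this guide to the file given as $1.
write_uc_demo_sql() {
  cat > "$1" <<'EOF'
INSTALL uc_catalog FROM core_nightly;
LOAD uc_catalog;
INSTALL delta;
LOAD delta;
CREATE SECRET (
  TYPE UC,
  TOKEN 'not-used',
  ENDPOINT 'http://127.0.0.1:8080',
  AWS_REGION 'us-east-2'
);
ATTACH 'unity' AS unity (TYPE UC_CATALOG);
SHOW ALL TABLES;
SELECT * FROM unity.default.numbers;
EOF
}

# Example (not run here):
# write_uc_demo_sql uc_demo.sql
# duckdb < uc_demo.sql
```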
To use the Unity Catalog UI, start a new terminal and ensure you have already started the UC server (e.g., ./bin/start-uc-server).
Prerequisites
How to start the UI through yarn
cd ui
yarn install
yarn start
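Before running the yarn commands, it can help to confirm that the required tools are on your PATH. The helper below is a sketch; require_cmds is a hypothetical name, and which tools to pass (yarn here) depends on your setup.

```shell
# Return success if every named command is on PATH; report the missing ones.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here):
# require_cmds yarn || echo "Install yarn before starting the UI"
```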
You can interact with a Unity Catalog server to create and manage catalogs, schemas and tables, operate on volumes and functions from the CLI, and much more. See the cli usage for more details.
Unity Catalog can be built using sbt.
To build UC (including the Spark integration module), run the following command:
build/sbt clean package publishLocal spark/publishLocal
Refer to sbt docs for more commands.
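To create a tarball of the project, run the following:
build/sbt createTarball
This will create a tarball in the target directory. See the full deployment guide for more details.
Make sure JDK 17 is your default Java version (for example, via the JAVA_HOME environment variable).
To compile the code, run the following:
build/sbt clean compile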
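To compile and run the tests, run the following:
build/sbt -J-Xmx2G clean test
To run the tests with code coverage, run the following:
build/sbt -J-Xmx2G jacoco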
To update the API specification, edit api/all.yaml and then run the following:
build/sbt generate
This will regenerate the OpenAPI data models in the UC server and data models + APIs in the client SDK.
To format the code, run the following:
build/sbt javafmtAll
IntelliJ is the recommended IDE to use when developing Unity Catalog. The steps below outline how to add the project to IntelliJ:
Clone Unity Catalog into a local directory, for example ~/unitycatalog.
In IntelliJ, select File > New Project > Project from Existing Sources... and select ~/unitycatalog.
Under Import project from external model, select sbt. Click Next.
Click Finish.
Java code adheres to the Google style, which is verified via build/sbt javafmtCheckAll during builds.
In order to automatically fix Java code style issues, please use build/sbt javafmtAll.
Follow the instructions for Eclipse or IntelliJ to install the google-java-format plugin (note the required manual actions for IntelliJ).
The build script checks for a lower bound on the JDK, but the current sbt version imposes an upper bound. Please check the JDK compatibility documentation for more information.
Create a virtual environment:
# Create virtual environment
python -m venv uc_docs_venv
# Activate virtual environment (Linux/macOS)
source uc_docs_venv/bin/activate
# Activate virtual environment (Windows)
uc_docs_venv\Scripts\activate
Install the required dependencies:
pip install -r requirements-docs.txt
Then serve the docs with
mkdocs serve