Generate CREATE TABLE statements from JSON schema
MIT License
This is a command line utility to generate CREATE TABLE
statements compatible
with Trino from a table schema provided as JSON
Schema.
Grab the script.
$ # Or download just the Python script :)
$ git clone https://github.com/hashhar/jsonschema2sql.git
Run it against a schema (test.json is a sample schema present in the root of this repo).
$ python3 jsonschema2sql.py --jsonschema test.json \
--table my_test_table \
--schema my_test_db \
--location 's3://my/test/bucket' \
--table-format 'JSON' \
--partition-columns year month date
This generates a CREATE TABLE
DDL as below
CREATE TABLE "my_test_db"."my_test_table" (
"string_col" varchar,
"datetime_col" timestamp,
"datetime_string_col" varchar,
"date_col" date,
"date_string_col" varchar,
"time_col" time,
"time_string_col" varchar,
"decimal_string_col" decimal(10, 2),
"double_col" double,
"double_double_col" double,
"float_col" float,
"decimal_col" decimal(5, 3),
"action_date" bigint,
"boolean_col" boolean,
"array_col" array(varchar),
"array_object_col" array(ROW("string_col" varchar, "datetime_col" timestamp)),
"object_col" ROW("string_col" varchar, "integer_col" bigint)
) WITH (
external_location = 's3://my/test/bucket',
partitioned_by = ARRAY['year', 'month', 'date'],
format = 'JSON'
)
For all the possible command line arguments, pass the -h
or --help
command
line argument.
The JSON Schema to SQL types mapping is defined in code via a mapping of the
JSON Schema type
and format
pair against the corresponding SQL type.
For the complete mapping refer to the JSON_TYPE_TO_SQL_TYPE
dictionary in the
script.
I welcome you to use this tool (without any implied support contract 😛). In case you find any issues feel free to create one on GitHub.
For feature requests keep the following expectations in mind (I am open to being persuaded otherwise):
CREATE TABLE
statements.precision
, scale
and some values for the format
are extensions toFor contributing changes I recommend to open an issue before implementing anything. That makes it more likely to avoid repeated effort and allows other people to see if somebody else is already working on an issue or not.
master
in your fork.tools/reformat.sh
and tools/build.sh
(this might change the filesJoin us on Trino slack for more about
Trino. I can be found as hashhar
over there.
Create an issue in this repo or drop me an email (not that hard to find).