sq data wrangler
MIT License
Bot releases are hidden (Show)
Published by neilotoole over 1 year ago
sq version
now supports --yaml
output.sq version
now outputs host OS details with --verbose
, --json
--yaml
flags. The motivation behind this is bug submission: we wantsq version -j
:{
"version": "v0.38.1",
"commit": "eedc11ec46d1f0e78628158cc6fd58850601d701",
"timestamp": "2023-06-21T11:41:34Z",
"latest_version": "v0.39.0",
"host": {
"platform": "darwin",
"arch": "arm64",
"kernel": "Darwin",
"kernel_version": "22.5.0",
"variant": "macOS",
"variant_version": "13.4"
}
}
Published by neilotoole over 1 year ago
This release has significant improvements (and breaking changes) to SLQ (sq
's query language).
☢️ #254: The formerly-implicit "WHERE" mechanism now requires an explicit where()
function.
This, alas, is a fairly big breaking change. But it's necessary to remove an ambiguity roadblock.
See discussion in the issue.
# Previously
$ sq '.actor | .actor_id <= 2'
# Now
$ sq '.actor | where(.actor_id <= 2)'
#256: Column-only queries are now possible. This has the neat side effect
that sq
can now be used as a calculator.
$ sq 1+2
1+2
3
You may want to use --no-header
(-H
) when using sq
as a calculator.
$ sq -H 1+2
3
$ sq -H '(1+2)*3'
9
Literals can now be selected (docs).
$ sq '.actor | .first_name, "X":middle_name, .last_name | .[0:2]'
first_name middle_name last_name
PENELOPE X GUINESS
NICK X WAHLBERG
Lots of expressions that previously failed badly, now work.
$ sq '.actor | .first_name, (1+2):addition | .[0:2]'
first_name addition
PENELOPE 3
NICK 3
#258: Column aliases can now be arbitrary strings, instead of only a
valid identifier.
# Previously only valid identifier allowed
$ sq '.actor | .first_name:given_name | .[0:2]'
given_name
PENELOPE
NICK
# Now, any arbitrary string can be used
$ sq '.actor | .first_name:"Given Name" | .[0:2]'
Given Name
PENELOPE
NICK
Published by neilotoole over 1 year ago
# mysql "date_format" func
$ sq '@sakila/mysql | .payment | _date_format(.payment_date, "%m")'
# Postgres "date_trunc" func
$ sq '@sakila/postgres | .payment | _date_trunc("month", .payment_date)'
Published by neilotoole over 1 year ago
sq diff
: Renamed --count
flag to --counts
as intended.Published by neilotoole over 1 year ago
The major feature is the long-gestating sq diff
.
sq diff
compares two sources, or tables.sq inspect --dbprops
is a new mode that returns only the DB properties.sq inspect -v
previously returned DB properties in a field named db_variables
.db_properties
. The renaming reflects the factdb_variables
(now db_properties
) field has{"name": "XX", "value": "YY"}
values,bool
, int
, string
, etc.), or a nested value such as an array[777 bytes]
instead of dumping--tsv
) no longer has a shorthand form -T
. Apparently that-T
is needed elsewhere.--xml
no longer has shorthand -X
. And --markdown
has lost alias --md
.--text
, --json
, etc., there is now--format=FORMAT
flag, e.g. --format=json
. This will allow sq
toPublished by neilotoole over 1 year ago
--markdown
and the alias --md
are now supported.Published by neilotoole over 1 year ago
sq ls -jv
and sq ls -yv
produced no outputPublished by neilotoole over 1 year ago
This release significantly overhauls sq
's config mechanism (#199).
For an overview, see the new config docs.
Alas, this release has several minor breaking changes ☢️.
sq config ls
shows config.sq config get
gets individual config option.sq config set
sets config values.sq config edit
edits config.
$EDITOR
or $SQ_EDITOR
.sq config location
prints the location of the config dir.--config
flag is now honored globally.$ sq config set log true
$ sq config set log.level INFO
$ sq config set log.file /var/log/sq.log
There are also equivalent flags (--log
, --log.file
and --log.level
) andSQ_LOG
, SQ_LOG_FILE
and SQ_LOG_LEVEL
).sq
's config file (sq.yml
) has changed. The configsq
log file has changed. The new locationsq config get log.file -v
to view the location,sq config set log.file /path/to/sq.log
to set it.SQ_CONFIG
replaces SQ_CONFIGDIR
.SQ_LOG_FILE
replaces SQ_LOGFILE
.--table
is renamed to --text
. This is changed because while thetable
was not quite accurate.sq
has been--driver
to --ingest.driver
. This change aligns# previously
$ cat mystery.data | sq --driver=csv '.data'
# now
$ cat mystery.data | sq --ingest.driver=csv '.data'
sq add
no longer has the generic --opts x=y
mechanism. This flag was# previously
$ sq add ./actor.csv --opts=header=false
# now
$ sq add ./actor.csv --ingest.header=false
sq add --handle
flag has been changed from -h
to-n
. While this is not ideal, the -h
shorthand is already in use everywhere--header
.
# previously
$ sq add ./actor.csv -h @actor
# now
$ sq add ./actor.csv -n @actor
--pretty
flag has been removed. Its only previous use was with thejson
format, where if --pretty=false
would output the JSON in compact form.--compact
/ -c
flag that behavesjq
.--compact
/ -c
flag, the short form of the --csv
-c
to -C
. It's an unfortunate situation, but alignmentPublished by neilotoole over 1 year ago
The headline feature is source groups.
This is the biggest change to the sq
CLI in some time, and should make working with lots of sources much easier.
sq
now has a mechanism to group sources. A source handle can@sakila_prod
, @sakila_staging
, etc,@prod/sakila
, @staging/sakila
. Use sq group prod
tosq ls
respects). See docs.sq group GROUP
sets the active group to GROUP
.sq group
returns the active group (default is /
, the root group).sq ls GROUP
lists the sources in GROUP
.sq ls --group
(or sq ls -g
) lists all groups.sq mv
moves/renames sources and groups.sq ls
now shows the active item in a distinct color. It no longer addssq ls
now sorts alphabetically when using --table
format.sq ls
now shows the sources in the active group only. But note that/
(the root group), so the default behaviorsq ls
is the same as before.sq add hello.csv
will now generate the handle @hello
instead of @hello_csv
.@hello1
instead of @hello_csv_1
. Why_
character_
is a relative pain to type.sq ping
has changed to support groups. Instead of sq ping --all
, you cansq ping GROUP
, e.g. sq ping /
.Published by neilotoole over 1 year ago
#187: For csv
sources, sq
will now try to auto-detect if the CSV file
has a header row or not. Previously, this needed to be explicitly specified
via an awkward syntax:
$ sq add ./actor.csv --opts=header=true
This change makes working with CSV files significantly lower friction.
A command like the below now almost always works as expected:
$ cat ./actor.csv | sq .data
Support for Excel/XLSX header detection is in #191.
sq
is now better at detecting the (data) kind of CSV fields. It now more
accurately distinguishes between Decimal
and Int
, and knows how to
handle Datetime
.
#189: sq
now treats CSV empty fields as NULL
.
Published by neilotoole over 1 year ago
unique
function (docs):
$ sq '.actor | .first_name | unique'
This is equivalent to:
SELECT DISTINCT first_name FROM actor
count_unique
function (docs).
$ sq '.actor | count_unique(.first_name)'
count
function has been changed (docs)
.actor | count
equivalent to SELECT COUNT(*) AS "count" FROM "actor"
..actor | count(*)
) is no longer supported; use thesq
token, not the SQL token.
# previous behavior
$ sq '.actor | max(.actor_id)'
max("actor_id")
200
# now
$ sq '.actor | max(.actor_id)'
max(.actor_id)
200
Published by neilotoole over 1 year ago
group_by
now accepts function arguments.groupby
to group_by
to match jq.orderby
to order_by
to match jq.Published by neilotoole over 1 year ago
groupby()
to group results. See query guide.Published by neilotoole over 1 year ago
orderby()
to order results. See query guide.