Bot releases are hidden (Show)
--fixed-column column-name string-value
, --fixed-column-int column-name integer-value
and --fixed-column-float column-name float-value
. Thanks, William Rowell. #81
Published by simonw almost 4 years ago
Published by simonw about 4 years ago
--just-strings
feature to disable type detection and import columns as strings by default. Thanks, Dan Nguyen!. #58Published by simonw about 5 years ago
This release drops support for Python 2.x #55
Published by simonw over 5 years ago
Bumped dependencies and pinned pytest to version 4 (5 is incompatible with Python 2.7).
Published by simonw over 5 years ago
-f
option used FTS4 even when FTS5 was available (#41)Published by simonw almost 6 years ago
Published by simonw over 6 years ago
-d
and -df
options for specifying date/datetime columns, closes #33
Maintain lookup tables in SQLite, refs #17
--index
option to specify which columns to index, closes #24
Test confirming --shape
and --filename-column
and -c
work together #25
Use usecols when loading CSV if shape specified
--filename-column
is now compatible with --shape
, closes #10
--no-index-fks
option
By default, csvs-to-sqlite creates an index for every foreign key column that is
added using the --extract-column
option.
For large tables, this can dramatically increase the size of the resulting
database file on disk. The new --no-index-fks
option allows you to disable
this feature to save on file size.
Refs #24 which will allow you to explicitly list which columns SHOULD have
an index created.
Added --filename-column
option, refs #10
Fixes for Python 2, refs #25
Implemented new --shape
option - refs #25
--table
option for specifying table to write to, refs #10
Updated README to cover --skip-errors
, refs #20
Add --skip-errors
option (#20) [Jani Monoses]
Less verbosity (#19) [Jani Monoses]
Only log extract_columns
info when that option is passed.
Add option for field quoting behaviour (#15) [Jani Monoses]
Published by simonw almost 7 years ago
Published by simonw almost 7 years ago
-f and -c
now work for single table multiple columns.
Fixes #12
Published by simonw almost 7 years ago
Added --fts
option for setting up SQLite full-text search.
The --fts
option will create a corresponding SQLite FTS virtual table, using
the best available version of the FTS module.
https://sqlite.org/fts5.html
https://www.sqlite.org/fts3.html
Usage:
csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
Example generated with this option: https://sf-trees-search.now.sh/
Will be used in https://github.com/simonw/datasette/issues/131
--fts
and --extract-column
now cooperate.
If you extract a column and then specify that same column in the --fts
list,
csvs-to-sqlite
now uses the original value of that column in the index.
Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq
csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
-c qLegalStatus -c qSpecies -c qSiteInfo \
-c PlantType -c qCaretaker -c qCareAssistant \
-f qLegalStatus -f qSpecies -f qAddress \
-f qSiteInfo -f PlantType -f qCaretaker \
-f qCareAssistant -f PermitNotes
Closes #9
Handle column names with spaces in them.
Added csvs-to-sqlite --version
option.
Published by simonw almost 7 years ago
Pandas does a good job of figuring out which SQLite column types should be
used for a DataFrame - with one exception: due to a limitation of NumPy it
treats columns containing a mixture of integers and NaN (blank values) as
being of type float64, which means they end up as REAL columns in SQLite.
http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
To fix this, we now check to see if a float64 column actually consists solely
of NaN and integer-valued floats (checked using v.is_integer() in Python). If
that is the case, we over-ride the column type to be INTEGER instead.
See #5 - also a8ab524 and 0997b7b
Published by simonw almost 7 years ago
Mechanism for converting columns into separate tables
Let's say you have a CSV file that looks like this:
county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7
(Real example from https://github.com/openelections/openelections-data-sd/blob/ master/2016/20160607__sd__primary__clark__precinct.csv )
You can now convert selected columns into separate lookup tables using the new
--extract-column option (shortname: -c) - for example:
csvs-to-sqlite openelections-data-*/*.csv \
-c county:County:name \
-c precinct:Precinct:name \
-c office -c district -c party -c candidate \
openelections.db
The format is as follows:
column_name:optional_table_name:optional_table_value_column_name
If you just specify the column name e.g. -c office
, the following table will
be created:
CREATE TABLE "party" (
"id" INTEGER PRIMARY KEY,
"value" TEXT
);
If you specify all three options, e.g. -c precinct:Precinct:name
the table
will look like this:
CREATE TABLE "Precinct" (
"id" INTEGER PRIMARY KEY,
"name" TEXT
);
The original tables will be created like this:
CREATE TABLE "ca__primary__san_francisco__precinct" (
"county" INTEGER,
"precinct" INTEGER,
"office" INTEGER,
"district" INTEGER,
"party" INTEGER,
"candidate" INTEGER,
"votes" INTEGER,
FOREIGN KEY (county) REFERENCES County(id),
FOREIGN KEY (party) REFERENCES party(id),
FOREIGN KEY (precinct) REFERENCES Precinct(id),
FOREIGN KEY (office) REFERENCES office(id),
FOREIGN KEY (candidate) REFERENCES candidate(id)
);
They will be populated with IDs that reference the new derived tables.
Closes #2