A toolbox of simple solutions for common data cleaning problems.
MIT License
Compatible with any Tables.jl implementation.
Installation: at the Julia REPL, run: using Pkg; Pkg.add("Cleaner")
- Format column names in snake_case or camelCase style.
- … missing, "", "NA", "None"
- … Any.
- (… Julia with more than 1 thread).
- CleanTable implements the Tables.jl interface too.
- Apply Cleaner transformations on your original table implementation and have the resulting table be of the same type as the original.
- … join or merge problems caused by having different schemas.

- Non-mutating functions (those not ending in !) receive a table as argument and return a CleanTable.
- Mutating functions (those ending in !) receive a CleanTable and return a CleanTable.
- … receive a table as argument and return a table of the same type as the original.

So you can start your workflow with a non-mutating function and continue it using mutating ones if you prefer. E.g.
julia> df = DataFrame(" some bad Name" => [missing, missing, missing], "Another_weird name " => [1, 2, 3])
3×2 DataFrame
 Row │  some bad Name  Another_weird name
     │ Missing         Int64
─────┼─────────────────────────────────────
   1 │ missing                           1
   2 │ missing                           2
   3 │ missing                           3

julia> df |> polish_names |> compact_columns!
┌────────────────────┐
│ another_weird_name │
│              Int64 │
├────────────────────┤
│                  1 │
│                  2 │
│                  3 │
└────────────────────┘
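
The session above can also be written step by step, which makes the input and output type of each call easier to see. This is a minimal sketch, not a definitive recipe: it assumes both Cleaner and DataFrames are installed, and the last step (passing the CleanTable to the DataFrame constructor) relies only on CleanTable implementing the Tables.jl interface. The variable names ct and cleaned are illustrative.

```julia
using Cleaner      # assumed installed, e.g. via Pkg.add("Cleaner")
using DataFrames   # assumed installed; used as the source and target table type

df = DataFrame(" some bad Name" => [missing, missing, missing],
               "Another_weird name " => [1, 2, 3])

# Non-mutating call: receives a table (here a DataFrame) and returns a CleanTable.
ct = polish_names(df)

# Mutating call: receives a CleanTable and returns a CleanTable.
compact_columns!(ct)

# Assumption: since CleanTable implements the Tables.jl interface, a DataFrame
# can be rebuilt from it through the generic Tables.jl-based constructor.
cleaned = DataFrame(ct)
```

Going by the output shown above, cleaned should hold the single column another_weird_name with the values 1, 2, 3.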
Inspired by janitor from the R ecosystem.