What's Changed

Remove crate num_cpus from polars by @dandxy89 in https://github.com/pola-rs/polars/pull/2890
temporarily pin crossbeam-epoch by @ritchie46 in https://github.com/pola-rs/polars/pull/2902
fix unique and drop by @ritchie46 in https://github.com/pola-rs/polars/pull/2908
fix explode of empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/2910
fix function input expansion by @ritchie46 in https://github.com/pola-rs/polars/pull/2913
fix compilation lazy + string by @ritchie46 in https://github.com/pola-rs/polars/pull/2914
respect dtype overwrite when schema is overwritten in lazy csv scanner by @ritchie46 in https://github.com/pola-rs/polars/pull/2915
deprecate to_ and string cache in lazy by @ritchie46 in https://github.com/pola-rs/polars/pull/2916
Refactor: move most temporal related code to polars-time. by @ritchie46 in https://github.com/pola-rs/polars/pull/2918
improve datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2923
rename distinct to unique by @ritchie46 in https://github.com/pola-rs/polars/pull/2926
fix some warnings by @ritchie46 in https://github.com/pola-rs/polars/pull/2927
improve date/datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2925
fix fill_nan dtypes by @ritchie46 in https://github.com/pola-rs/polars/pull/2933
fix future calculation in groupby dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/2935
add tolerance to asof + by by @ritchie46 in https://github.com/pola-rs/polars/pull/2937
fix(scan_csv): handle empty csv file exception by @LuisCardosoOliveira in https://github.com/pola-rs/polars/pull/2934
handle Utf8Owned AnyValue for DataType by @cigrainger in https://github.com/pola-rs/polars/pull/2944
Fix argsort by @ritchie46 in https://github.com/pola-rs/polars/pull/2946
value_counts and unique_counts expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2947
use schema in 'with_columns' to amortize lookups and fix bug in emptr… by @ritchie46 in https://github.com/pola-rs/polars/pull/2949
add native log and entropy expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2952
csv parsing: skip whitespace on failed parse by @ritchie46 in https://github.com/pola-rs/polars/pull/2953
Literal in groupby context, arange and repeat by @ritchie46 in https://github.com/pola-rs/polars/pull/2958
Huge perf improvement of many expressions and ListChunked::from_iter perf by @ritchie46 in https://github.com/pola-rs/polars/pull/2962
update groups in count() agg and correctly update state by @ritchie46 in https://github.com/pola-rs/polars/pull/2963
add sign by @ritchie46 in https://github.com/pola-rs/polars/pull/2977
see kurtosis as aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/2993
fix groups state after apply by @ritchie46 in https://github.com/pola-rs/polars/pull/2992
Home directory support by @cjermain in https://github.com/pola-rs/polars/pull/2940
make sure that sort does not index empty list by @ritchie46 in https://github.com/pola-rs/polars/pull/2996
python: improve arithmetic consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3001
python: add apply on struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3003
fix null in non-fast-explode explode of numeric arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/3006
also expand rename in filters by @ritchie46 in https://github.com/pola-rs/polars/pull/3008
fix when then with literal by @ritchie46 in https://github.com/pola-rs/polars/pull/3009
fix groups update to match exploded offsets by @ritchie46 in https://github.com/pola-rs/polars/pull/3010
add duration expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3017
allow nested groupby in groupby_rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3018
Fix read_parquet with list having nested struct by @cjermain in https://github.com/pola-rs/polars/pull/2991
fix outer join schema by @ritchie46 in https://github.com/pola-rs/polars/pull/3021
lazy: fix drop all by @ritchie46 in https://github.com/pola-rs/polars/pull/3023
fix schemas of groupby rolling/dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/3028
fix div by zero by @ritchie46 in https://github.com/pola-rs/polars/pull/3031
fix incorrect match in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3030
check alias in whole expr on opt by @ritchie46 in https://github.com/pola-rs/polars/pull/3032
align groups in binary when they do not align by @ritchie46 in https://github.com/pola-rs/polars/pull/3033
only expand function inputs if wildcard expansion allows it by @ritchie46 in https://github.com/pola-rs/polars/pull/3039
fix when_then_chain containing nulls by @ritchie46 in https://github.com/pola-rs/polars/pull/3040
fixed typo in format_path docstring by @cnpryer in https://github.com/pola-rs/polars/pull/3045
fix when-then-chain by @ritchie46 in https://github.com/pola-rs/polars/pull/3048
throw error on empty keyed groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3049
compare expand_cols by variant not exact datatype by @ritchie46 in https://github.com/pola-rs/polars/pull/3050
dot: use apply instead of map by @ritchie46 in https://github.com/pola-rs/polars/pull/3051
check output length of all 'map' expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3052
error on invalid asof_join by input by @ritchie46 in https://github.com/pola-rs/polars/pull/3053
improve performance of asof_join by equal or more than 2 keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3055
remove unneeded expensive assert by @ritchie46 in https://github.com/pola-rs/polars/pull/3069
improve boolean null comparisons consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3068
fix entropy by @ritchie46 in https://github.com/pola-rs/polars/pull/3070
fix explode empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3083
Lazy: update schema in explode op by @ritchie46 in https://github.com/pola-rs/polars/pull/3084
CSV datetime inference 3x performance improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/2950
[polars-sql] Adding SQL Context, SELECT and GROUP BY by @potter420 in https://github.com/pola-rs/polars/pull/3024
Default sample n param to 1 by @cnpryer in https://github.com/pola-rs/polars/pull/3090
Expose 'rechunk' param from "read_ipc" for consistency (default behaviour unchanged) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3088
Add optional seeding for sampling by @cnpryer in https://github.com/pola-rs/polars/pull/3080
default to native strptime by @ritchie46 in https://github.com/pola-rs/polars/pull/3093
Raise error in sample() if n and frac are both passed by @cnpryer in https://github.com/pola-rs/polars/pull/3091
split up planner by @ritchie46 in https://github.com/pola-rs/polars/pull/3095
add test for #3097 by @ritchie46 in https://github.com/pola-rs/polars/pull/3098
Initial support for serde/pickling expressions. by @ritchie46 in https://github.com/pola-rs/polars/pull/3096
Adding nested struct support by fixing ArrayRef determination by @cjermain in https://github.com/pola-rs/polars/pull/3103
Enhanced columns param for DataFrame init, additionally allowing for inline type specification by @alexander-beedie in https://github.com/pola-rs/polars/pull/3100
Improve rolling agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3101
add estimate_size methods by @ritchie46 in https://github.com/pola-rs/polars/pull/3110
fix and test estimated_size by @ritchie46 in https://github.com/pola-rs/polars/pull/3113
remove unused datafusion integration by @ritchie46 in https://github.com/pola-rs/polars/pull/3115
Nodejs writejson fix & avro read/write by @universalmind303 in https://github.com/pola-rs/polars/pull/3116
Parquet statistics: don't panic by @ritchie46 in https://github.com/pola-rs/polars/pull/3127
lazy: expand cols in filter by @ritchie46 in https://github.com/pola-rs/polars/pull/3128
melt extra arguments by @ritchie46 in https://github.com/pola-rs/polars/pull/3133
Lazy: Don't materialize whole table in JOIN followed by SLICE by @ritchie46 in https://github.com/pola-rs/polars/pull/3136
Pushdown SLICE to GROUPBY nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3138
Switch from unmaintained jemallocator to maintained tikv-jemallocator. by @ghuls in https://github.com/pola-rs/polars/pull/3141
Polars vs Pivot: Round 3 🥊 ~2-25x improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/3143
DataFrame::partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3148
Add semi and anti joins. by @ritchie46 in https://github.com/pola-rs/polars/pull/3149
derive clone for lazy groupby by @elferherrera in https://github.com/pola-rs/polars/pull/3156
pushdown slice to sort nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3159
slice_pushdown projections by @ritchie46 in https://github.com/pola-rs/polars/pull/3160
lazy err on not found col by @ritchie46 in https://github.com/pola-rs/polars/pull/3169
improve inner join performance by @ritchie46 in https://github.com/pola-rs/polars/pull/3168
fix duration filters with different time units by @marcvanheerden in https://github.com/pola-rs/polars/pull/3179
fix overflow in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3183
list eval expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3185
Supporting Struct comparison and any/all API by @cjermain in https://github.com/pola-rs/polars/pull/3180
struct logical type arrow conversion by @ritchie46 in https://github.com/pola-rs/polars/pull/3193
make series comparisons fallible by @ritchie46 in https://github.com/pola-rs/polars/pull/3192
fix_pivot by @ritchie46 in https://github.com/pola-rs/polars/pull/3199
recursively convert arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3200
fix arr.eval type inference by @ritchie46 in https://github.com/pola-rs/polars/pull/3203
Improve Left join on chunked data by @ritchie46 in https://github.com/pola-rs/polars/pull/3177
polars-ops by @ritchie46 in https://github.com/pola-rs/polars/pull/3212
Fix tree traversal complexity by @ritchie46 in https://github.com/pola-rs/polars/pull/3213
Adding struct column tests by @ishmandoo in https://github.com/pola-rs/polars/pull/3209
struct: handle validity by @ritchie46 in https://github.com/pola-rs/polars/pull/3217
bug template bounce resolved bugs by @ritchie46 in https://github.com/pola-rs/polars/pull/3218
add duration minutes by @ritchie46 in https://github.com/pola-rs/polars/pull/3219
fix partition boundary by @ritchie46 in https://github.com/pola-rs/polars/pull/3223
Option to check column order when comparing polars dataframes by @physinet in https://github.com/pola-rs/polars/pull/3206
fix dispatch of quantile aggregations by @ritchie46 in https://github.com/pola-rs/polars/pull/3234
Improving array refs for to_list by @cjermain in https://github.com/pola-rs/polars/pull/3231
fix offsets in categorical merge by @ritchie46 in https://github.com/pola-rs/polars/pull/3242
Serialize/Deserialize LazyFrames/Logical plans by @ritchie46 in https://github.com/pola-rs/polars/pull/3244
setup serializable function + null_count expr by @ritchie46 in https://github.com/pola-rs/polars/pull/3247
improve ternary in groupby context by @ritchie46 in https://github.com/pola-rs/polars/pull/3248
fix skew autoexplode and add test by @marcvanheerden in https://github.com/pola-rs/polars/pull/3251
quantile agg; update grouptuples by @ritchie46 in https://github.com/pola-rs/polars/pull/3252
Only pass dtype to array, if not None: Fixes #3253 by @ghuls in https://github.com/pola-rs/polars/pull/3257
polars 0.21.0 by @ritchie46 in https://github.com/pola-rs/polars/pull/3258
do not write empty chunk to parquet by @ritchie46 in https://github.com/pola-rs/polars/pull/3259
Improve partitioned groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3263
improve sample_perf by @ritchie46 in https://github.com/pola-rs/polars/pull/3264
add iso strptime patterns by @ritchie46 in https://github.com/pola-rs/polars/pull/3265
add partial decompression in read_csv by @ritchie46 in https://github.com/pola-rs/polars/pull/3268
fix partitioned and error don't ignore errors by @ritchie46 in https://github.com/pola-rs/polars/pull/3273
fix row count for u64 idx by @ritchie46 in https://github.com/pola-rs/polars/pull/3285
Code coverage for Rust/Python by @cjermain in https://github.com/pola-rs/polars/pull/3278
Improve groupby states by @ritchie46 in https://github.com/pola-rs/polars/pull/3291
recursive list builder in rows by @ritchie46 in https://github.com/pola-rs/polars/pull/3293
Fix ipc_read_schema so Path() and filename which start with "~/" work. by @ghuls in https://github.com/pola-rs/polars/pull/3297
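
To illustrate what the semi and anti joins added in #3149 compute, here is a minimal sketch in plain Rust using only the standard library. It is not the polars API: the function names `semi_join` and `anti_join`, the `(u32, &str)` row type, and the key slice are all hypothetical and chosen for illustration. It only shows the usual relational meaning of the two joins: both return rows from the left side only, filtered by whether the key exists on the right.

```rust
use std::collections::HashSet;

/// Semi join (illustrative): keep the rows of `left` whose key appears in
/// `right_keys`. No columns from the right side are added.
fn semi_join<'a>(left: &[(u32, &'a str)], right_keys: &[u32]) -> Vec<(u32, &'a str)> {
    // Build a set of right-hand keys for constant-time membership checks.
    let keys: HashSet<u32> = right_keys.iter().copied().collect();
    left.iter()
        .filter(|(k, _)| keys.contains(k))
        .copied()
        .collect()
}

/// Anti join (illustrative): keep the rows of `left` whose key does NOT
/// appear in `right_keys`.
fn anti_join<'a>(left: &[(u32, &'a str)], right_keys: &[u32]) -> Vec<(u32, &'a str)> {
    let keys: HashSet<u32> = right_keys.iter().copied().collect();
    left.iter()
        .filter(|(k, _)| !keys.contains(k))
        .copied()
        .collect()
}
```

With left rows `(1, "a")`, `(2, "b")`, `(3, "c")` and right keys `2, 3, 4`, the semi join keeps the rows with keys 2 and 3, and the anti join keeps only the row with key 1.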

New Contributors

@LuisCardosoOliveira made their first contribution in https://github.com/pola-rs/polars/pull/2934
@keiv-fly made their first contribution in https://github.com/pola-rs/polars/pull/2930
@cigrainger made their first contribution in https://github.com/pola-rs/polars/pull/2944
@slonik-az made their first contribution in https://github.com/pola-rs/polars/pull/3124
@physinet made their first contribution in https://github.com/pola-rs/polars/pull/3215

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.20.0...rust-polars-v0.21.

polars - Rust polars 0.20.0

Published by ritchie46 over 2 years ago

New rust polars release! 🚀

This release of 286 commits is here thanks to the contributions of (in no specific order):

@moritzwilksch
@JakobGM
@illumination-k
@tamasfe
@ghuls
@alexander-beedie
@Maxyme
@universalmind303
@qiemem
@glennpierce
@nmandery
@ilsley
@marcvanheerden

Did I forget your contribution? Please ping me; I do this manually 🙈

Most notable changes are:

Many bug fixes.
Many performance improvements.

features

Made representation of groups tuples more cache friendly #2431
Remove Seek requirement of readers
Add groupby_rolling as new entrance to expression API.
Improve CSV parser stability and performance in several places
Horizontal aggregations are parallelized #2454
Reduce pivot code bloat and improve performance #2458
Struct data type added.
Extend methods that allow modification of the same memory if Arc::ref_count == 1
Avro readers and writers.
Improved rules of window expressions.
Support for the us (microsecond) time unit.
Parquet: use statistics in query optimizations.
Optimize projections in lazy computations. (Mostly useful when you deal with a large number of columns e.g. millions).
Improve performance and flexibility of melt operation #2799
new expressions
- str.split
- str.split_inclusive
- arr.join
- unique_stable
- str.split_exact
- count expression that does not require column names
- arr.arg_min
- arr.arg_max
- arr.diff
- arr.shift
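
As a rough guide to how the split flavors listed above differ, the sketch below uses only Rust's standard library, whose `str::split` and `str::split_inclusive` have the same names. It is not the polars expression API, and the wrapper functions `split`, `split_inclusive`, and `join` are hypothetical helpers written for this illustration. That the polars expressions follow the same keep-or-drop-separator convention is an assumption here.

```rust
/// Split on `sep` and drop the separator (std `str::split`).
fn split(s: &str, sep: &str) -> Vec<String> {
    s.split(sep).map(str::to_owned).collect()
}

/// Split on `sep` but keep the separator at the end of each piece
/// (std `str::split_inclusive`).
fn split_inclusive(s: &str, sep: &str) -> Vec<String> {
    s.split_inclusive(sep).map(str::to_owned).collect()
}

/// Join a list of strings back into one string with `sep` between items,
/// the inverse direction that a list-join expression covers.
fn join(parts: &[String], sep: &str) -> String {
    parts.join(sep)
}
```

For the input `"a,b,c"` with separator `","`, `split` yields `a`, `b`, `c`, while `split_inclusive` yields `a,`, `b,`, `c`. Joining the result of `split` with `"-"` gives `a-b-c`.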