polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

License: Other

Downloads: 9.7M | Stars: 26.3K | Committers: 213

polars - Rust Polars 0.31.1

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • Rolling min/max for partially sorted data (#9819)
  • use hash set in drop_many (#9807)
  • Faster is_sorted when no flag set (#9777)
  • optimize n_unique for integers (#9568)
  • remove sort columns on multiple-key OOC sort (#9545)
  • don't needlessly trigger bitcount (#9561)
  • don't initialize memory before row-encoding (#9435)
  • reduce page faults in q1 ~-30% (#9423)
  • reduce rayon/idle time in streaming (#9416)
  • use row format in streaming join ~15% (#9379)
  • row encode buffer reuse (#9371)
  • bytes row format for streaming groupby/unique keys >3.5x (#9346)
  • push slices down map functions (#9350)
  • increase streaming groupby spill size from 256 to 10_000 (#9312)
  • Improve rolling min and max for data without nulls (#9277)
  • slightly improve n_unique performance (#9286)
  • speed up write_csv for time-zone-aware columns (#9093)
  • parallelize rolling_window group materialization (#9095)

✨ Enhancements

  • pass through unknown schema in unnest (#9896)
  • access OptState in LazyFrame to unit-test optimization toggle methods. (#9883)
  • respect and allow more options in eager json parsing (#9882)
  • allow set_sorted in streaming (#9876)
  • Expr.cat.get_categories expression (#9869)
  • add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
  • polars_warn! macro (#9868)
  • Add Run-length Encoding functions (#9826)
  • add include_key parameter to partition_by (#9750)
  • add LEFT string function for SQL (#9836)
  • add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • add drop_many_amortized (#9814)
  • Dedicated horizontal aggregation functions (#9752)
  • implement with_row_count as private function (#9810)
  • add support for SQL SUBSTR function (#9803)
  • add SQL support for binary data and expand recognised SQL dtype strings (#9802)
  • reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
  • allow qcut in window expressions (#9745)
  • Improve cut and allow use in expressions (#9580)
  • clearer message when stringcache-related errors occur (#9715)
  • improve expression formatting (#9704)
  • set string cache in window functions (#9705)
  • raise on both sides of datetime/str comparison (#9692)
  • support deserializing struct json into df (#9688)
  • add tree formatter for expressions (#9684)
  • add .list.any() and .list.all() (#9573)
  • extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
  • add polars::VERSION (#9660)
  • add symmetric difference to list set operations (#9655)
  • add dt.base_utc_offset (#9636)
  • add dt.dst_offset feature (#9629)
  • allow specifying index order in to_numpy (#9592)
  • accept expressions in repeat (#9614)
  • set operations for list (#9599)
  • add drop_first parameter for to_dummies (issue #8246) (#9143)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • add infer schema len to json_extract (#9478)
  • Adds (Most) Remaining Trig Functions to SQLContext (#9453)
  • update error handling msg for sql functions (#9474)
  • add str.titlecase (#9457)
  • raise if period is negative in groupby_rolling (#9445)
  • add SQL round support (#9330)
  • don't error for time-zone-aware parsing if time zone is UTC (#9414)
  • support all numeric dtypes in serde (#9393)
  • ensure part of the plan is streaming if aggregati… (#9387)
  • add relaxed concatenation (#9382)
  • add sql DROP TABLE (#9355)
  • support ternary expressions in streaming (#9343)
  • add decoding support for row format (#9339)
  • add SQL support for null-aware equality checks (#9332)
  • add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
  • support // integer floordiv operator in the SQL engine (#9324)
  • serde for 'to_physical' expr (#9294)
  • add join cardinality validation (#9278)
  • keep sorted flag after Expr::truncate (#9275)
  • add "sql_expr" function (#9248)
  • rewrite correlation functions to expression architecture (#9258)
  • keep sorted flag on offset_by (#9253)
  • add intersection primitive for selector API (#9240)
  • building blocks for expression expansion sets (#9231)
  • Add ddof option to rolling_var and rolling_std (#8957)
  • immediately flatten nested unions (#9220)
  • support float expression on integers (#9210)
  • add binary to list<u8> cast (#9161)
  • add arr.unique expression (#9159)
  • implement explode for DataType::Array (#9157)
  • Decimal type: sum, min, max aggregations in select and agg context. (#9135)
  • Decimal arithmetic (#9123)
  • support decimals as cast types in csv parser (#9121)
  • Improve error handling for repeat (#9117)
  • conversion from Utf8 to Decimal. (#9090)

🐞 Bug fixes

  • respect original series dtype when constructing LitIter (#9886)
  • sum aggregation empty set is 0, not null (#9894)
  • Allow None as exponent (#9880)
  • preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
  • fmt unknown dtype (#9872)
  • fix row-encode of 32 byte payloads (#9843)
  • shrink_type on all-null columns (#9811)
  • don't go into streaming engine when groupby by list (#9834)
  • fix regex + exclude (#9827)
  • potential integer overflow in drop_many_amortized (#9829)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • fix array concat and Series::fill_null (#9825)
  • don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
  • Remove stray arr.eval references (#9821)
  • fix row-encode of null data (#9813)
  • allow +00:00 when loading from arrow (#9747)
  • fix row-count schema (#9797)
  • fix supertype detection (#9787)
  • merge rev-maps when building list arrays of categoricals. (#9742)
  • Loosen restrictions on cut expressions and add docs (#9730)
  • Fix list symmetric difference (#9732)
  • Fix list intersection (#9735)
  • don't clear rev_map when categorical series is cle… (#9720)
  • improve glob pattern testing (#9721)
  • don't run hstack checks when using cached names (#9709)
  • fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
  • increment seed between samples (#9694)
  • fix cse_plan invalid projection removal (#9700)
  • fix ne_missing for booleans vs lit (#9693)
  • raise if to_datetime would have parsed input incorrectly (#9675)
  • respect time_zone in lazy date_range (#8591)
  • redo weighted rolling var (#9609)
  • Correct weighted rolling quantile definition (#9608)
  • clear hashes buffer in generic streaming joins (#9612)
  • stable list namespace output when all elements are … (#9610)
  • validate time zone in cast and from_arrow operations (#9598)
  • make json feature depend on "dtype-struct" feature (#9589)
  • fix join suffix collision (#9579)
  • fix sum consistency (#9576)
  • fix take of array dtype (#9575)
  • fix predicate pushdown case before sort (#9574)
  • fix lazy schema of temporal_range functions when no alias is provided (#9543)
  • change the path parameter from to (#9531)
  • fix join validation when swapped (#9534)
  • fix race condition in out-of-core sort (#9521)
  • unset sortedness for local date and local datetime (#9515)
  • maintain sortedness flags on append/extend (#9496)
  • fix serde for small integer dtypes (#9495)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • groupby rolling with negative offset (#9428)
  • date_range with unit microseconds was producing incorrect results (#9413)
  • read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
  • Compute Spearman rank correlations using average ra… (#9415)
  • Fix rolling min/max when window is empty (#9406)
  • fix compilation of other rustc versions (#9392)
  • list zip with (#9367)
  • parquet + categorical (#9363)
  • respect startby in groupby_dynamic when every is greater than 1d (#9362)
  • raise groupby apply on empty frame (#9360)
  • raise more informative error on string arguments (#9352)
  • correct assertion (#9320)
  • fix rolling weighted mean (#9292)
  • raise on invalid sort_by (#9262)
  • correct ne_missing/eq_missing schema (#9257)
  • fix cached reproject offsets (#9254)
  • delay opening files in streaming engine (#9251)
  • ensure agg(F(lit)) == lit (#9222)
  • don't stack overflow on concat(expressions) (#9214)
  • clip window_size to length in rolling_apply (#9209)
  • rolling_apply window_size == len (#9181)
  • respect time zone in strptime/to_datetime when exact=False (#9171)
  • make null chunking behavior equal to other dtypes (#9176)
  • return single numpy array in Array dtype -> numpy (#9164)
  • fix regression in boolean nulls comparison (#9142)
  • fix struct null_count if fields are null arrays (#9151)
  • categorical construction from null values (#9145)
  • let apply caller determine if length needs to be checked. (#9140)
  • struct is_in should upcast numeric types (#9110)
  • json_extract on empty series (#9126)
  • bubble up dtype when converting from arrow (#9120)
  • groupby_rolling was returning incorrect results when offset was positive (#9082)

🛠️ Other improvements

  • Rolling quantile and median use DynArgs (#9867)
  • Clean up workspace definition (#9861)
  • Fix all clippy warnings in the test suite (#9839)
  • Refactor failing test (#9823)
  • Remove stray arr.eval references (#9821)
  • fix cut features (#9808)
  • cluster file scans in one node (#9799)
  • Remove old cut/qcut (#9763)
  • Small updates to issue templates (#9789)
  • unswap from_tz and to_tz in replace_timezone (#9768)
  • More cleanup around arange (#9769)
  • More cleanup for arange (#9681)
  • Fix small typo (#9714)
  • refactor arange and add int_range/int_ranges (#9666)
  • clean up inconsistencies in duration string language (#9551)
  • ensure date-range integration test runs in CI (#9554)
  • remove some redundancies in sort (#9541)
  • Fix some doc examples (#9405)
  • Remove outdated badges from README (#9532)
  • don't pickle pyarrow dataset (#9523)
  • Remove StdWindow in rolling (#9486)
  • remove unreachable code (#9463)
  • note that weekday is actually ISO weekday (#9440)
  • Add some documentation on the CI workflows (#9404)
  • fix typo in polars-lazy docs (#9354)
  • Utilize caching in test job (#9301)
  • Caching for benchmark workflow (#9267)
  • Further CI cleanup for Rust lints (#9260)
  • Separate workflow for Rust lints (#9245)
  • Fix itoap dependency specification (#9239)
  • Fix more broken links (#9230)
  • Fix some doc links (#9227)
  • Fix unused import warning in release build (#9224)
  • split up dsl::functions module (#9213)
  • update object_store requirement from 0.5.3 to 0.6.0 (#9154)
  • simplify slow datetime parser (#9183)
  • remove outdated struct, improve naming (#9172)
  • change decimal inference and argument order (#9133)
  • Include license file in polars-json crate (#9113)
  • Remove dbg statement from CoreJsonReader (#9114)
  • use concrete type for time zones (#9076)

Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @CloseChoice, @DeflateAwning, @EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @ankane, @avimallu, @baggiponte, @bfeif, @borchero, @braaannigan, @c-peters, @datapythonista, @dependabot, @dependabot[bot], @dkrako, @durandtibo, @eitsupi, @guanqun, @jeroenjanssens, @jonashaag, @jorisSchaller, @josh, @kljensen, @lorentzenchr, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @moritzwilksch, @ritchie46, @sorhawell, @stinodego, @tarrafil, @thomascamminady, @ttencate, @universalmind303 and @zundertj

polars - Python Polars 0.18.7

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • speed up python object to AnyValue construction (#9840)
  • use hash set in drop_many (#9807)
  • speed up in series 10x (#9794)
  • Faster is_sorted when no flag set (#9777)

✨ Enhancements

  • Add Run-length Encoding functions (#9826)
  • add include_key parameter to partition_by (#9750)
  • add LEFT string function for SQL (#9836)
  • add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • Dedicated horizontal aggregation functions (#9752)
  • support numpy datetime64 units (from 'ns' to 'D') in polars.from_numpy (#9783)
  • implement with_row_count as private function (#9810)
  • add support for SQL SUBSTR function (#9803)
  • add SQL support for binary data and expand recognised SQL dtype strings (#9802)
  • add new duration selector and improve selector typing (#9772)
  • reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
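
For readers unfamiliar with the idea behind the run-length encoding functions (#9826), here is a minimal plain-Python sketch of the concept. It is not the Polars implementation: the helper names `run_length_encode` and `run_ids` are hypothetical and exist only for illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (length, value) pairs."""
    return [(sum(1 for _ in run), value) for value, run in groupby(values)]


def run_ids(values):
    """Label each element with the index of the run it belongs to."""
    ids = []
    for run_index, (_, run) in enumerate(groupby(values)):
        ids.extend(run_index for _ in run)
    return ids


encoded = run_length_encode([1, 1, 2, 2, 2, 1])
# encoded == [(2, 1), (3, 2), (1, 1)]
labels = run_ids([1, 1, 2, 2, 2, 1])
# labels == [0, 0, 1, 1, 1, 2]
```

Only consecutive repeats are merged, so the trailing `1` forms its own run rather than joining the first one.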

🐞 Bug fixes

  • fix row-encode of 32 byte payloads (#9843)
  • shrink_type on all-null columns (#9811)
  • don't go into streaming engine when groupby by list (#9834)
  • fix regex + exclude (#9827)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • fix array concat and Series::fill_null (#9825)
  • don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
  • Remove stray arr.eval references (#9821)
  • fix row-encode of null data (#9813)
  • allow +00:00 when loading from arrow (#9747)
  • improve/fix write_database handling of db schema and quoted table names (#9788)
  • fix row-count schema (#9797)
  • fix supertype detection (#9787)
  • fix import error when writing parquet with pyarrow (#9760)

🛠️ Other improvements

  • Refactor failing test (#9823)
  • Remove stray arr.eval references (#9821)
  • Remove old cut/qcut (#9763)
  • improve note about the behaviour when converting from ns-precision temporal values to python-native types (#9798)
  • Small updates to issue templates (#9789)
  • More cleanup around arange (#9769)
  • add missing last entry (#9782)
  • Add rows_by_key docs (#9766)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @jonashaag, @magarick, @mcrumiller, @ritchie46 and @stinodego

polars - Python Polars 0.18.6

Published by github-actions[bot] over 1 year ago

✨ Enhancements

  • allow qcut in window expressions (#9745)

🐞 Bug fixes

  • merge rev-maps when building list arrays of categoricals. (#9742)
  • Loosen restrictions on cut expressions and add docs (#9730)
  • Fix list symmetric difference (#9732)
  • Fix list intersection (#9735)

Thank you to all our contributors for making this release possible!
@magarick and @ritchie46

polars - Python Polars 0.18.5

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimize n_unique for integers (#9568)
  • remove sort columns on multiple-key OOC sort (#9545)
  • don't needlessly trigger bitcount (#9561)
  • optimize _datetime_to_pl_timestamp (#9533)

✨ Enhancements

  • Improve cut and allow use in expressions (#9580)
  • clearer message when stringcache-related errors occur (#9715)
  • improve expression formatting (#9704)
  • set string cache in window functions (#9705)
  • raise on both sides of datetime/str comparison (#9692)
  • support deserializing struct json into df (#9688)
  • add tree formatter for expressions (#9684)
  • streamline adbc connectivity, adding snowflake support (#9600)
  • improve selector utility functions with better docstrings/examples (#9683)
  • add .list.any() and .list.all() (#9573)
  • extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
  • add symmetric difference to list set operations (#9655)
  • Pass through stdin/stderr buffer in to_csv (#9624)
  • add dt.base_utc_offset (#9636)
  • add dt.dst_offset feature (#9629)
  • allow specifying index order in to_numpy (#9592)
  • accept expressions in repeat (#9614)
  • set operations for list (#9599)
  • make LazyFrame.map pickle (#9597)
  • add a new rows_by_key method, returning a keyed-dictionary of row data (#9567)
  • implement apply object -> struct (#9578)
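
As a rough illustration of what the list set operations listed above (#9599, #9655) compute, the sketch below applies a set operation row by row to two columns of lists in plain Python. It is not Polars code: `list_set_op` is a hypothetical helper, and sorting the result is an assumption made here for deterministic output, so the element order Polars returns may differ.

```python
def list_set_op(left, right, op):
    """Apply a set operation row by row to two columns of lists."""
    ops = {
        "union": lambda a, b: a | b,
        "intersection": lambda a, b: a & b,
        "difference": lambda a, b: a - b,
        "symmetric_difference": lambda a, b: a ^ b,
    }
    combine = ops[op]
    # Sorting is only for reproducible output in this sketch.
    return [sorted(combine(set(a), set(b))) for a, b in zip(left, right)]


left = [[1, 2, 3], [4, 5]]
right = [[2, 3, 4], [5, 6]]

sym_diff = list_set_op(left, right, "symmetric_difference")
# sym_diff == [[1, 4], [4, 6]]
```

The symmetric difference keeps the elements that appear in exactly one of the two lists in each row.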

🐞 Bug fixes

  • don't clear rev_map when categorical series is cle… (#9720)
  • improve glob pattern testing (#9721)
  • don't run hstack checks when using cached names (#9709)
  • fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
  • increment seed between samples (#9694)
  • fix cse_plan invalid projection removal (#9700)
  • fix ne_missing for booleans vs lit (#9693)
  • raise if to_datetime would have parsed input incorrectly (#9675)
  • respect time_zone in lazy date_range (#8591)
  • Align dependency versions (#9661)
  • redo weighted rolling var (#9609)
  • Correct weighted rolling quantile definition (#9608)
  • clear hashes buffer in generic streaming joins (#9612)
  • stable list namespace output when all elements are … (#9610)
  • address schema edge-case with scalar-expanded data that resolves to an empty frame (#9593)
  • handle dictionary init with unsized iterators that also hits the scalar-expansion fast path (#9594)
  • validate time zone in cast and from_arrow operations (#9598)
  • ensure from_dicts drops columns explicitly omitted from schema (#9581)
  • fix join suffix collision (#9579)
  • fix sum consistency (#9576)
  • fix take of array dtype (#9575)
  • fix predicate pushdown case before sort (#9574)
  • fix lazy schema of temporal_range functions when no alias is provided (#9543)
  • fix join validation when swapped (#9534)

🛠️ Other improvements

  • More cleanup for arange (#9681)
  • Fix some more type hints (#9716)
  • Added trivial examples for the aggregation of columns in groupby (#9708)
  • Fix some type hints (#9695)
  • additional ADBC examples and docstring information for read_database (inc snowflake) (#9686)
  • drop Python 3.7 support (#9679)
  • improve selector utility functions with better docstrings/examples (#9683)
  • refactor arange and add int_range/int_ranges (#9666)
  • Clarify DataFrame.corr operates on columns (#9678)
  • remove false "eager=True" from date_range tests (#9663)
  • Add examples to .merge_sorted (#9664)
  • bump maturin from 1.0.1 to 1.1.0 in /py-polars (#9646)
  • remove deprecation warning of already-enforced valid timezones change (#9639)
  • fix failing ci test (#9638)
  • fix inconsistency in .list.difference() example (#9615)
  • Clean up doctests for rolling (#9626)
  • fix faulty test of to_numpy (#9619)
  • examples for .list.union(), .list.difference(), .list.intersection() (#9602)
  • fix see also broken links (#9607)
  • clarify sortedness condition of groupby_dynamic and groupby_rolling (#9606)
  • clean up inconsistencies in duration string language (#9551)
  • Adding examples to binary functions (#9553)
  • Minor cleanup of arange (#9544)
  • Remove outdated badges from README (#9532)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @datapythonista, @dependabot, @dependabot[bot], @eitsupi, @guanqun, @jeroenjanssens, @jorisSchaller, @kljensen, @magarick, @mcrumiller, @messense, @mishpat, @moritzwilksch, @ritchie46, @stinodego, @ttencate, @universalmind303 and @zundertj

polars - Python Polars 0.18.4

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • don't initialize memory before row-encoding (#9435)
  • further optimize datetime conversion (#9452)
  • speedup datetime conversion (#9432)
  • reduce page faults in q1 ~-30% (#9423)
  • reduce rayon/idle time in streaming (#9416)

✨ Enhancements

  • add drop_first parameter for to_dummies (issue #8246) (#9143)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • serializable python functions in expressions (#9462)
  • add infer schema len to json_extract (#9478)
  • Adds (Most) Remaining Trig Functions to SQLContext (#9453)
  • update error handling msg for sql functions (#9474)
  • Update LazyFrame.__repr__ (#9460)
  • support inversion of first & last selectors, additional minor repr improvements (#9456)
  • add str.titlecase (#9457)
  • raise if period is negative in groupby_rolling (#9445)
  • enhanced polars.selectors repr and implicit application of as_expr when broadcasting (#9450)
  • add SQL round support (#9330)
  • don't error for time-zone-aware parsing if time zone is UTC (#9414)

🐞 Bug fixes

  • ensure that trying to use a string as a dtype raises a consistent error on both DataFrame and Series init (#9493)
  • fix race condition in out-of-core sort (#9521)
  • unset sortedness for local date and local datetime (#9515)
  • maintain sortedness flags on append/extend (#9496)
  • fix serde for small integer dtypes (#9495)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • Fix empty list or Series selections on DF or Series (#8660)
  • groupby rolling with negative offset (#9428)
  • pl.lit with datetime was producing slightly incorrect results (#9438)
  • read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
  • Compute Spearman rank correlations using average ra… (#9415)
  • Fix rolling min/max when window is empty (#9406)

🛠️ Other improvements

  • don't pickle pyarrow dataset (#9523)
  • fix rendering of examples (#9482)
  • Warn for future change of closed default value in rolling functions (#9470)
  • Document aggregate_function=None in pivot (#9473)
  • Docstrings for expressions and dtypes (#9351)
  • fix typo in rolling_* docstrings (#9449)
  • Deprecate some expr input parsing behavior (#9455)
  • improve date-range docs (#9451)
  • Improve docstrings rolling functions (#9215)
  • Remove _tempdir module references (#9427)
  • fix typo in Series.qcut (#9421)
  • Add some documentation on the CI workflows (#9404)

Thank you to all our contributors for making this release possible!
@EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @baggiponte, @braaannigan, @datapythonista, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @ritchie46, @stinodego, @tarrafil, @universalmind303 and @zundertj

polars - Python Polars 0.18.3

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • use row format in streaming join ~15% (#9379)
  • row encode buffer reuse (#9371)
  • bytes row format for streaming groupby/unique keys >3.5x (#9346)
  • push slices down map functions (#9350)

✨ Enhancements

  • support all numeric dtypes in serde (#9393)
  • allow easy load/save of polars Config options to/from file (#9391)
  • ensure part of the plan is streaming if aggregati… (#9387)
  • add relaxed concatenation (#9382)
  • add sql DROP TABLE (#9355)
  • support ternary expressions in streaming (#9343)
  • add SQL support for null-aware equality checks (#9332)
  • add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
  • support // integer floordiv operator in the SQL engine (#9324)

🐞 Bug fixes

  • fix bug when comparing series (#9359)
  • list zip with (#9367)
  • parquet + categorical (#9363)
  • respect startby in groupby_dynamic when every is greater than 1d (#9362)
  • raise groupby apply on empty frame (#9360)
  • raise more informative error on string arguments (#9352)
  • Allow for tolerance when comparing nested dtype columns (#9272)
  • avoid is_in TypeError with sets of values containing 'None' (#9323)

🛠️ Other improvements

  • add top-k test for #9385 (#9388)
  • document apply 'return_dtype' requirement (#9361)
  • clarify when day of week takes effect in groupby_dynamic (#9342)
  • add "if you're coming from pandas" tip to groupby_dynamic (#9336)
  • fix string language formatting (#9341)
  • add doc entries for eq_missing and ne_missing expressions (#9331)
  • fixup options for validate arg in join (#9319)

Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @dkrako, @durandtibo, @ritchie46 and @universalmind303

polars - Python Polars 0.18.2

Published by ritchie46 over 1 year ago

🚀 Performance improvements

  • increase streaming groupby spill size from 256 to 10_000 (#9312)
  • Improve rolling min and max for data without nulls (#9277)

✨ Enhancements

  • allow use of StringCache object as a function decorator (#9309)
  • allow use of Config object as a function decorator (#9307)
  • serde for 'to_physical' expr (#9294)

🐞 Bug fixes

  • fix rolling weighted mean (#9292)
  • fix overly-broad string matching in selectors (#9303)
  • fix loading of model data from the upcoming pydantic 2.x release (#9296)

🛠️ Other improvements

  • fix extraneous indent in examples block (#9297)
  • Fix typo in Selectors documentation (#9295)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @thomascamminady

polars - Python Polars 0.18.1

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • add dedicated selectors module, consolidating/expanding existing selector capabilities (#9204)

🚀 Performance improvements

  • slightly improve n_unique performance (#9286)
  • use ciborium in Expression pickling (#9235)

✨ Enhancements

  • add join cardinality validation (#9278)
  • implement set operations for selector API (#9276)
  • keep sorted flag after Expr::truncate (#9275)
  • add "sql_expr" function (#9248)
  • rewrite correlation functions to expression architecture (#9258)
  • keep sorted flag on offset_by (#9253)
  • add expression json serde (#9236)
  • add intersection primitive for selector API (#9240)
  • building blocks for expression expansion sets (#9231)
  • Add ddof option to rolling_var and rolling_std (#8957)
  • immediately flatten nested unions (#9220)
  • Allow empty select/with_columns/groupby (#9205)
  • add a datetime selector (#9212)
  • support float expression on integers (#9210)
  • add dedicated selectors module, consolidating/expanding existing selector capabilities (#9204)
  • add binary to list<u8> cast (#9161)
  • groupby_dynamic by quarter. (#6842)
  • add arr.unique expression (#9159)
  • implement explode for DataType::Array (#9157)
  • Decimal type: sum, min, max aggregations in select and agg context. (#9135)
  • Decimal arithmetic (#9123)
  • support decimals as cast types in csv parser (#9121)
  • Improve error handling for repeat (#9117)
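
To make the idea behind join cardinality validation (#9278) concrete, here is a minimal plain-Python sketch of the kind of check involved: before joining, verify that the keys on each side are unique where the requested relationship demands it. The function name `validate_join_keys` and the option strings "1:1", "1:m", "m:1" and "m:m" are assumptions made for this sketch, not a description of the Polars implementation.

```python
from collections import Counter

ALLOWED = {"m:m", "1:m", "m:1", "1:1"}  # assumed option names


def validate_join_keys(left_keys, right_keys, validate="m:m"):
    """Raise ValueError if key uniqueness violates the requested cardinality."""
    if validate not in ALLOWED:
        raise ValueError(f"unknown validate option: {validate!r}")

    def has_duplicates(keys):
        return any(count > 1 for count in Counter(keys).values())

    left_side, right_side = validate.split(":")
    if left_side == "1" and has_duplicates(left_keys):
        raise ValueError("left keys are not unique")
    if right_side == "1" and has_duplicates(right_keys):
        raise ValueError("right keys are not unique")


# Passes: left keys are unique, right keys may repeat.
validate_join_keys([1, 2, 3], [1, 1, 2], validate="1:m")
```

Failing early on duplicate keys guards against a join silently multiplying rows when a one-to-one or one-to-many relationship was expected.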

🐞 Bug fixes

  • fix pyarrow dataset literal filter (#9274)
  • raise on invalid sort_by (#9262)
  • match missing Array and Struct classes in FromPyObject (#9271)
  • correct ne_missing/eq_missing schema (#9257)
  • fix cached reproject offsets (#9254)
  • delay opening files in streaming engine (#9251)
  • ensure agg(F(lit)) == lit (#9222)
  • don't stack overflow on concat(expressions) (#9214)
  • df.apply first rechunk (#9211)
  • clip window_size to length in rolling_apply (#9209)
  • raise error on invalid df.apply return (#9207)
  • Handle edge cases of named select input (#9198)
  • rolling_apply window_size == len (#9181)
  • respect time zone in strptime/to_datetime when exact=False (#9171)
  • make null chunking behavior equal to other dtypes (#9176)
  • return single numpy array in Array dtype -> numpy (#9164)
  • fix regression in boolean nulls comparison (#9142)
  • fix struct null_count if fields are null arrays (#9151)
  • Fix DataFrame.to_arrow() for 0x0 dataframes (#9144)
  • categorical construction from null values (#9145)
  • let apply caller determine if length needs to be checked. (#9140)
  • struct is_in should upcast numeric types (#9110)
  • Restore functionality of name arg for date_range (#9107)
  • bubble up dtype when converting from arrow (#9120)

🛠️ Other improvements

  • Fix grammar and add periods in Expr.over docs (#9244)
  • Update linting for py-polars crate (#9242)
  • Deprecate exprs=... input for select/with_columns/agg/struct (#9219)
  • Enable parallelization in Python Windows tests (#9232)
  • Use pytest tmp_path (#9206)
  • Build docs in parallel (#9229)
  • Unify Python docs workflows (#9228)
  • add docstring to __array__ methods (#8055)
  • Update expr parsing util to return PyExpr (#9166)
  • update pyo3 requirement from 0.18 to 0.19 (#9155)
  • clarify how the windows are formed in the rolling_* functions (#9192)
  • stabilise polars importtime check (#9196)
  • fix "to_decimal" docstring (#9197)
  • note that exact=False is a performance footgun (#9186)
  • change decimal inference and argument order (#9133)
  • Cache Rust build on main branch (#9130)
  • Improve df.clear() docs (#8809)
  • Bump maturin to 1.0.1 (#9115)
  • Bump lint dependency versions (#9116)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @ankane, @avimallu, @bfeif, @dependabot, @dependabot[bot], @jonashaag, @josh, @lorentzenchr, @magarick, @ritchie46, @stinodego, @universalmind303 and @zundertj

polars - Rust Polars 0.30.0

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Rename list namespace accessor from .arr to .list (#8999)
  • Array (backed by arrow::FixedSizeList datatype) (#8943)

⚠️ Breaking changes

  • propagate null in equality comparisons (#9053)
  • formalize implode -> explode relation (#9038)
  • consistently return list of date/datetime from lazy date_range (#8513)
  • Rename list namespace accessor from .arr to .list (#8999)
  • disallow time zones other than those in zoneinfo.available_timezones() (#8993)
  • remove window expression magic (#8992)
  • raise error when sorted flag not set (#8994)
  • in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
  • parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
  • Remove deprecated tz_aware argument (#8696)

🚀 Performance improvements

  • speed up write_csv for time-zone-aware columns (#9093)
  • parallelize rolling_window group materialization (#9095)
  • elide hot loop in hash joins (#9075)
  • improve list explode perf (#8974)
  • Improve explodes: offsets_to_indexes performance (#8964)
  • avoid quadratic exclude behaviour when selecting against dtypes and/or wildcards (#8953)
  • use simd-json for all json parsing (#8922)
  • improve json_extract (#8858)
  • add optimizer passes and change initial order (#8811)
  • fused multiply sub / sub multiply (#8799)
  • improve parallel work distribution of sort expression ~4x (#8775)
  • change default row-group size (#8758)

✨ Enhancements

  • conversion from Utf8 to Decimal. (#9090)
  • default to checking sortedness in groupby_rolling… (#9063)
  • propagate null in equality comparisons (#9053)
  • implement apply for rolling/dynamic_groupby (#9049)
  • implement strategy=nearest for join_asof (#9024)
  • arr.sum expression (#9041)
  • formalize implode -> explode relation (#9038)
  • add array namespace and min/max expression (#9032)
  • improve error message on row-wise overflow (#9021)
  • properly apply slice at UNION level (#9018)
  • consistently return list of date/datetime from lazy date_range (#8513)
  • disallow time zones other than those in zoneinfo.available_timezones() (#8993)
  • raise error when sorted flag not set (#8994)
  • in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
  • parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
  • error on invalid sortby expr (#8986)
  • Pushdown is_in to pyarrow dataset (#8930)
  • Array (backed by arrow::FixedSizeList datatype) (#8943)
  • multiple enhancements for SQLContext (#8944)
  • add sql UNION, UNION ALL & UNION DISTINCT (#8936)
  • add sql compound identifiers (#8934)
  • add sql EXCLUDE (#8913)
  • add sql CASE (#8911)
  • add sql EXPLAIN (#8897)
  • improve json_extract (#8858)
  • add support for sql DISTINCT ON (#8824)
  • add LazyFrame null_count (#8837)
  • check categorical cache on transpose (#8836)
  • add support for OFFSET keyword in SQL queries (#8833)
  • add a new time_range utility function (#8776)
  • Add hint to use _saturating on overflow (#8805)
  • support boolean addition (#8778)
  • improved detail in several error messages (#8747)

🐞 Bug fixes

  • rolling_groupby was returning incorrect results when offset was positive (#9082)
  • fix null/empty in List::take_unchecked (#9074)
  • repeat by (#9023)
  • raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
  • propagate nulls in broadcasting of order comparisons (#9050)
  • fix apply with passed date/datetime return_dtype (#9035)
  • raise error on invalid aggregation (#9013)
  • fix fused arithmetic in window functions (#9012)
  • JoinBuilder::force_parallel is modifying allow_parallel (#8617)
  • Fix erroneous warning in hist (#8982)
  • respect rechunk in parquet (#8935)
  • Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
  • sql qualified wildcards (#8916)
  • don't check sortedness in asof by (#8906)
  • check for object type in csv writer (#8894)
  • window function with filtered groups (#8880)
  • parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
  • free buffer, but not its contents (#8848)
  • improve agg expr field types (#8834)
  • sql BETWEEN bounds should be inclusive (#8818)
  • sort cached window groups (#8813)
  • check null data before take (#8812)
  • fix broadcasting on integer bitwise (#8798)
  • correct aggregation of overlapping groups (#8794)
  • modify join error (#8768)
  • don't parallelize sort within rayon job (#8774)
  • fix deadlock in cache and improve parallelism/work… (#8765)
  • check offset before doing owned mutation (#8760)
  • validate data on successful deserialization (#8757)
  • improve supertype coercion of functions (#8755)

🛠️ Other improvements

  • use concrete type for time zones (#9076)
  • factor add_month out of add_impl_month_week_or_day (#9066)
  • remove unnecessary timezone trait usage, use concrete type (#9065)
  • Fix broken links (#9072)
  • bump sqlparser version (#9043)
  • move list namespace functions to separate module (#9040)
  • Clean up arange/date_range/time_range (#9027)
  • Rename list namespace accessor from .arr to .list (#8999)
  • replace pattern match with unwrap (#9000)
  • remove window expression magic (#8992)
  • Remove deprecated tz_aware argument (#8696)
  • simplify take_every (#8971)
  • add readmes to all sub crates (#8770)
  • improve arithmetic reuse and don't allocate on binary… (#8781)
  • accumulate windows flag during translation (#8773)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @charliegallop, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat, @uchiiii and @universalmind303

polars - Python Polars 0.18.0

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Rename list namespace accessor from .arr to .list (#8999)

⚠️ Breaking changes

  • propagate null in equality comparisons (#9053)
  • formalize implode -> explode relation (#9038)
  • Drop subclassing support for DataFrame/LazyFrame (#9008)
  • consistently return list of date/datetime from lazy date_range (#8513)
  • Default date_range/ones/zeros to eager=False (#9007)
  • Rename list namespace accessor from .arr to .list (#8999)
  • disallow time zones other than those in zoneinfo.available_timezones() (#8993)
  • remove window expression magic (#8992)
  • raise error when sorted flag not set (#8994)
  • Drop subclassing support for GroupBy (#7746)
  • in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
  • parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
  • Remove deprecated tz_aware argument (#8696)

🚀 Performance improvements

  • speed up write_csv for time-zone-aware columns (#9093)
  • parallelize rolling_window group materialization (#9095)
  • elide hot loop in hash joins (#9075)

✨ Enhancements

  • conversion from Utf8 to Decimal. (#9090)
  • default to checking sortedness in groupby_rolling… (#9063)
  • propagate null in equality comparisons (#9053)
  • warn if constructing Series with time-zone-aware datetimes (#9058)
  • implement apply for rolling/dynamic_groupby (#9049)
  • Support more data types in lazy repeat (#9046)
  • implement strategy=nearest for join_asof (#9024)
  • arr.sum expression (#9041)
  • formalize implode -> explode relation (#9038)
  • add array namespace and min/max expression (#9032)
  • improve error message on row-wise overflow (#9021)
  • properly apply slice at UNION level (#9018)
  • consistently return list of date/datetime from lazy date_range (#8513)
  • Default date_range/ones/zeros to eager=False (#9007)
  • disallow time zones other than those in zoneinfo.available_timezones() (#8993)
  • raise error when sorted flag not set (#8994)
  • in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
  • parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)

🐞 Bug fixes

  • rolling_groupby was returning incorrect results when offset was positive (#9082)
  • don't underflow on list.tail (#9089)
  • fix null/empty in List::take_unchecked (#9074)
  • repeat by (#9023)
  • raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
  • Order of pl.Array arguments in docstring (#9059)
  • propagate nulls in broadcasting of order comparisons (#9050)
  • Improve read_parquet missing column error message (#8961)
  • fix apply with passed date/datetime return_dtype (#9035)
  • respect inner type in Array construction (#9020)
  • raise error on invalid aggregation (#9013)
  • fix fused arithmetic in window functions (#9012)
  • don't allow silent init of Series declared as int/temporal with floating point values (#9004)
  • deprecate time_unit property from Series (#8990)

🛠️ Other improvements

  • Improve expression parsing utils (#9094)
  • Refactor expression input parsing util (#9085)
  • Organize "as_datatype" functions (#9080)
  • Change eager path for repeat (#9048)
  • Clean up arange/date_range/time_range (#9027)
  • Drop subclassing support for DataFrame/LazyFrame (#9008)
  • minor SQLContext docstring cleanups (#9005)
  • Rename list namespace accessor from .arr to .list (#8999)
  • remove window expression magic (#8992)
  • Drop subclassing support for GroupBy (#7746)
  • Remove old deprecated functionality (#8995)
  • Remove deprecated tz_aware argument (#8696)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @charliegallop, @jonashaag, @mcrumiller, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat and @universalmind303

polars - Python Polars 0.17.15

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Array (backed by arrow::FixedSizeList datatype) (#8943)
  • Write dataframes as delta tables (#7616)

🚀 Performance improvements

  • improve list explode perf (#8974)
  • Improve explodes: offsets_to_indexes performance (#8964)
  • avoid quadratic exclude behaviour when selecting against dtypes and/or wildcards (#8953)
  • use simd-json for all json parsing (#8922)
  • improve performance of align_frames, and add new alignment option (#8899)

✨ Enhancements

  • error on invalid sortby expr (#8986)
  • Pushdown is_in to pyarrow dataset (#8930)
  • allow set column list input to 'drop' and 'drop_nulls' (#8962)
  • Array (backed by arrow::FixedSizeList datatype) (#8943)
  • Add dtype argument for repeat (#8946)
  • Use schema keys to define the columns if only the schema is provided to pl.struct (#8952)
  • multiple enhancements for SQLContext (#8944)
  • add sql UNION, UNION ALL & UNION DISTINCT (#8936)
  • add sql compound identifiers (#8934)
  • add sql EXCLUDE (#8913)
  • add sql CASE (#8911)
  • add sql EXPLAIN (#8897)
  • Write dataframes as delta tables (#7616)
  • improve performance of align_frames, and add new alignment option (#8899)
  • improved inference from type annotations (#8895)

🐞 Bug fixes

  • Fix erroneous warning in hist (#8982)
  • don't modify Series with empty names in-place on DataFrame init (#8956)
  • respect rechunk in parquet (#8935)
  • Add hint on PyArrow to ADBC import error (#8898)
  • Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
  • sql qualified wildcards (#8916)
  • address edge cases with in-place modification of Series objects (#8915)
  • don't check sortedness in asof by (#8906)
  • check for object type in csv writer (#8894)
  • improve performance of align_frames, and add new alignment option (#8899)
  • window function with filtered groups (#8880)

🛠️ Other improvements

  • deprecate rename "in_place" parameter (#8960)
  • Clean up tests for repeat (#8979)
  • Deprecate name argument for repeat (#8977)
  • simplify take_every (#8971)
  • Clean up repeat/ones/zeros (#8963)
  • further enhance SQLContext docstrings (#8948)
  • Fix typo in lazygroupby.rs error message (#8937)
  • fix docstring for time() (#8939)
  • refactor tzinfo-related tests (#8883)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.17.14

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)

✨ Enhancements

  • add an align option to pl.concat (#8835)
  • add support for sql DISTINCT ON (#8824)
  • add LazyFrame null_count (#8837)
  • check categorical cache on transpose (#8836)
  • add support for OFFSET keyword in SQL queries (#8833)
  • optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)

🐞 Bug fixes

  • parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
  • handle InitVar typing declarations on dataclass objects (#8856)
  • free buffer, but not its contents (#8848)
  • improve agg expr field types (#8834)
  • optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)
  • sql BETWEEN bounds should be inclusive (#8818)

🛠️ Other improvements

  • add examples for Config "set_tbl_formatting" and "set_fmt_str_lengths" methods (#8859)
  • Convert between Vec of Series/Pyseries using trait (#8846)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.17.13

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • add optimizer passes and change initial order (#8811)
  • fused multiply sub / sub multiply (#8799)
  • improve parallel work distribution of sort expression ~4x (#8775)
  • change default row-group size (#8758)
  • elide function calls in AnyValue::eq (#8725)

✨ Enhancements

  • add a new time_range utility function (#8776)
  • Add hint to use _saturating on overflow (#8805)
  • add a "restore_defaults" kwarg to Config init (#8797)
  • add lazy time expression (#8785)
  • support boolean addition (#8778)
  • support SQLContext registration of DataFrames (#8762)
  • support automatic SQLContext frame/table registration from local variables (#8749)
  • improved detail in several error messages (#8747)
  • support frame registration at SQLContext init time, and add an "unregister" method (#8744)
  • support repeat for all types (#8741)
  • add support for DISTINCT keyword in SQL select clauses (#8740)
  • support any day of the week in 'start_by' in groupby_dynamic (#8720)
  • add support for USING clause in SQL join operations (#8731)
  • add unit tests for extend_constant Expr (#8734)
  • add clean multi-frame registration to SQLContext (#8724)
  • add support for HAVING clause to SQL GROUP BY operations (#8704)
  • improved numpy string interop (#8703)

🐞 Bug fixes

  • sort cached window groups (#8813)
  • check null data before take (#8812)
  • fix broadcasting on integer bitwise (#8798)
  • Fix incorrect type hint for arange (#8796)
  • correct aggregation of overlapping groups (#8794)
  • don't parallelize sort within rayon job (#8774)
  • fix deadlock in cache and improve parallelism/work… (#8765)
  • check offset before doing owned mutation (#8760)
  • don't persist temporary column in disjoint calls to update (#8763)
  • validate data on successful deserialization (#8757)
  • improve supertype coercion of functions (#8755)
  • groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
  • ensure count aggregation has proper length when spilling (#8735)
  • fix return value of std for single-element sequence with ddof=1 (#8730)
  • don't take logical plan during streaming fmt (#8711)
  • Don't upcast in round() for f32 when decimal is 0 (#8706)

🛠️ Other improvements

  • add entry for lazy time func (#8786)
  • add unit tests for extend_constant Expr (#8734)
  • add rounding coverage for 32/64 bit floats (#8715)
  • Add warning to count methods on null (#8698)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @mcrumiller, @ritchie46, @stinodego, @uchiiii, @universalmind303 and @zundertj

polars - Rust Polars 0.29.0

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Out-of-core unique (#8573)

⚠️ Breaking changes

  • Rename concat_lst to concat_list (#8597)
  • Schema improvements (#8286)
  • don't create duplicate pivot names (#8002)
  • rename toggle_string_cache to enable_string_cache (#7970)
  • change top_k(descending) -> bottom_k (#7969)
  • in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)

🚀 Performance improvements

  • elide function calls in AnyValue::eq (#8725)
  • add fused multiply add optimization for expressions (#8690)
  • use expression for dot product (#8686)
  • improve nested grouptuples related code (#8618)
  • buffer spill partitions in ooc sort. ~10/20% (#8616)
  • improve OOC sort performance during partition phase (#8590)
  • remove some unnecessary calls and matches (#8490)
  • less naive count (#8473)
  • parallelize almost all flattens (#8468)
  • optimize horizontal min/max (#8463)
  • reinstate old behavior in numeric group-tuples (#8445)
  • remove false sharing in perfect hash table >2x (#8432)
  • further optimised conversions to python date/datetime (#8417)
  • optimize join inner materialization of single keys (#8405)
  • parallelize sorted group tuple materialization (#8387)
  • improve materialization of huge cardinality group tuples (#8382)
  • improve group_tuples materialization (#8375)
  • use online variance kernel for aggregation (#8306)
  • add specialized boolean aggregation for min/max (#8294)
  • fail fast on non-inferable strings in strptime if no fmt is provided (#8111)
  • make chunks search more resilient (#8229)
  • SIMD accelerated arg_min/arg_max (via argminmax) (#8074)
  • speed up csv parsing for slower datetimes formats (#8213)
  • arr.eval run on groupby expression engine when possible (#8199)
  • FromParalleIter<Option<str>> for Utf8Chunked ~1.9x (#8058)
  • speed up from_par_iter Option<bool> ~2.5x (#8057)
  • parallelize numeric ChunkedArray materialization ~2x. (#8053)
  • parallelize into_groups materialization ~-25% (#8036)
  • use a trusted anyvalue builder (#8001)
  • numeric grouptuples with nulls hash in single pass ~25% (#7980)
  • use perfect hash table for categoricals (#7951)
  • improve group_tuples of high cardinality data ~10% (#7938)
  • use streaming instead of partitioned groupby (#7907)
  • don't auto-stream groupby (#7906)
  • rechunk before aggs (#7903)
  • don't re-allocate groups in sorted to_dummies (#7897)

✨ Enhancements

  • add support for DISTINCT keyword in SQL select clauses (#8740)
  • support any day of the week in 'start_by' in groupby_dynamic (#8720)
  • add support for USING clause in SQL join operations (#8731)
  • add support for HAVING clause to SQL GROUP BY operations (#8704)
  • streaming unions (#8676)
  • expression cache (#8674)
  • rolling covariance and correlation (#8671)
  • Add dt.to_string alias for dt.strftime (#8290)
  • use temp dir for ooc spills (#8614)
  • make ooc-sort resilient against chunk_size (#8588)
  • Set strptime default strict/exact=true (#8587)
  • Out-of-core unique (#8573)
  • Add to_date, to_datetime, to_time to String namespace (#8579)
  • more detailed error message on failure to cast List dtype (#8583)
  • don't trigger unreachable code if no dtype is set (#8532)
  • accept expressions in groupby_dynamic/rolling (#8528)
  • expose quantile/mean for duration (#8491)
  • require explicitly sorted flag for upsample (#8488)
  • allow for _saturating suffix in duration strings (#8479)
  • let duration string accept "1mo_saturating" (#8469)
  • add dt.month_start and dt.month_end (#8435)
  • add SQL support for cumulative functions (#8457)
  • add str_slice method to StringNameSpace (#8427)
  • allow negative 'arange' expression (#8413)
  • warn if argument is not explicitly sorted (#8409)
  • Schema improvements (#8286)
  • add support for SQL "IN" expr (#8396)
  • cli output mode & sql read_json (#8336)
  • rename 'csv-file' to 'csv' (#8101)
  • preserve time zone in combine (#8263)
  • add use_earliest argument to replace_time_zone for dealing with ambiguous datetimes (#8087)
  • SQL CTEs (#8208)
  • add duration cumsum and remainder (#8219)
  • better algorithm for streaming unique (#8003)
  • Add approx distinct count via approx_unique() (#7937)
  • adopt FunctionExpr for cat namespace (#8173)
  • DatetimeArgs ergonomics (#8133)
  • Remove Seek constraint from IpcStreamReader and SerReader (#8166)
  • implement FunctionExpr for bound and round methods (#8172)
  • display skipped row if same number of rows (#8170)
  • move all boolean expressions into BooleanFunction enum (#8132)
  • rewrite log expressions to make them serializable (#8126)
  • make unique expr serde and cmp (#8153)
  • adopt FunctionExpr for abs to allow for serialization (#8129)
  • adopt FunctionExpr for cum* functions (#8130)
  • support negative index in pct_change (#8137)
  • add log1p to list of mathematical functions (#8102)
  • expand list of tz-aware formats which can be auto-inferred (#8085)
  • clearer error message if strptime without a fmt specified fails (#8086)
  • infer tz-aware formats with try_parse_dates in read_csv (#8084)
  • make 'mo' interval raise if the target date does not exist (#8078)
  • auto-infer fmt for tz-aware date strings (#7405)
  • multiple sql contexts & optional sql highlighting in cli (#8072)
  • implement arg_sort for struct dtype (#8051)
  • support struct in df.unique (#7976)
  • change top_k(descending) -> bottom_k (#7969)
  • optimize away nested unions in lp (#7861)
  • Add seed argument to rank for random (#7913)
  • auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz_aware argument (#7886)
  • deal with null values in cut/qcut (#7878)
  • support datetime/date subclasses (e.g. FreezeGun) (#7819)

🐞 Bug fixes

  • groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
  • ensure count aggregation has proper length when spilling (#8735)
  • fix return value of std for single-element sequence with ddof=1 (#8730)
  • don't take logical plan during streaming fmt (#8711)
  • Don't upcast in round() for f32 when decimal is 0 (#8706)
  • block predicate containing shifts and windows after sort (#8670)
  • ensure perfect hash table processes the nulls (#8668)
  • Reading more tiny CSVs than workers in parallel will deadlock (#8441)
  • respect maintain_order in partitioned groupby (#8653)
  • fix explode null series (#8654)
  • fix categorical agg type (#8645)
  • allow list<null> -> list<cat> (#8636)
  • maintain sorted info on top-k and empty sort (#8615)
  • maintain sortedness in date -> datetime cast (#8606)
  • fix determining of supertype for tz-aware and tz-naive datetimes (#8585)
  • fix csv reader with new line in header (#8580)
  • correct for nested offsets in json serialization (#8584)
  • fix wrong dtype init in streaming groupby (#8574)
  • fix categorical/string_cache fill_null panic (#8562)
  • fix window function contention in binary expression (#8544)
  • fix StructChunked not_equal comparator/operator (#8547)
  • fix struct pyarrow ffi (#8543)
  • don't trigger unreachable code if no dtype is set (#8532)
  • keep sorted info on agg_first and simple singleton… (#8526)
  • unset fast_unique coming from arrow (#8521)
  • correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes #8423) (#8508)
  • don't error on cast if column is not projected (#8495)
  • ensure window function succeeds on empty frame (#8492)
  • don't set verbose on union (#8487)
  • check literal/group length before claiming agg sta… (#8486)
  • fix error message of offset_by if offsetting by negative number of months (#8464)
  • fix sorted warning (#8462)
  • fix features serde and dtype-struct not compiling together (#8439)
  • respect dtype in anonymous list builder in case of… (#8428)
  • infer supertype in json serde (#8411)
  • duration on empty df (#8403)
  • don't inadvertently set Series initialised with nested tuple data as Object dtype (#8401)
  • use physical in streaming unique global table (#8390)
  • recursively bubble up all dtypes in list cast (#8386)
  • is_in struct logical types (#8378)
  • fix nested null parquet read (#8372)
  • fix logical type in ListChunked::new_from_index (#8367)
  • bubble up logical type in recursive list cast (#8356)
  • implement clone_inner for all series (#8357)
  • fix fill_null for categorical (#8353)
  • time.cast(str) as strftime (#8351)
  • fix logical dtypes in parallel list collection (#8349)
  • improve logical types of explode operation (#8348)
  • logical type in anonymous list builders (#8346)
  • escape csv header names if they contain special chars (#8331)
  • nested struct/list/categorical logical/physical (#8334)
  • fix deserialize empty list (#8326)
  • fix coalesce schema (#8324)
  • don't do null propagation (#8322)
  • ensure invalid list eval raises (#8317)
  • pass name to struct construction in aggregation (#8299)
  • Use three slashes for doc comments (#8284)
  • improve nested list construction (#8278)
  • Fix DataFrame.sum returning empty column names (#8283)
  • always sort in top_k fast path (#8275)
  • don't use fast paths for sorted join if there are … (#8272)
  • fix boolean par materialization (#8257)
  • improve null/empty list construction (#8255)
  • fix offsets in parallel utf8 materialization (#8254)
  • nested struct logical type consistency (#8249)
  • keep literal state if elementwise function is applied (#8195)
  • decimal ensure backed arrow arrays have correct dtype (#8193)
  • ensure cached nodes are initialized once (#8103)
  • validate map lengths (#8147)
  • fix row-wise init of UInt64 values that exceed Int64 upper bound (#8146)
  • implement list<null> constructor (#8143)
  • add all primitives to av_buffer builder (#8140)
  • struct is_in (#8139)
  • fix wrong display name of binary expressions (#8131)
  • lazy: fix boolean sum schema (#8108)
  • don't exponentially grow error messages (partial fix). (#8081)
  • check element count in multi-column explode (#8050)
  • set lower limit for chunk_size (#8048)
  • impl to_static for struct (#8037)
  • all/any empty sets (#8012)
  • struct null_count, cast string, transpose and describe (#8009)
  • fix pivot and transpose of struct data (#8005)
  • don't create duplicate pivot names (#8002)
  • fix chunked literals in expression engine (#7973)
  • in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
  • concat object types (#7958)
  • fix decimal conversion alignment (#7954)
  • Fix lazy encode schema (#7912)
  • respect skip_nulls in apply for temporal types (#7908)
  • fix lit agg (#7904)
  • disable ooc groupby (#7901)
  • fix abs logical type (#7895)
  • fix boolean min/max output type and null handling (#7894)
  • validate groupby_dynamic inputs (#7876)
  • correct for chunks in arg_where (#7873)
  • fix nested logical/physical list (#7872)
  • fix arbitrary nested logical types (#7869)
  • don't use fxhash in sink_sorted fast path (#7849)
  • parquet stats & all kernel (#7846)

🛠️ Other improvements

  • remove unnecessary feature flag requirement for start_by=monday in groupby_dynamic (#8716)
  • remove some branches (#8688)
  • streaming pipeline creation (#8656)
  • simplify replace_time_zone (#8644)
  • make slice attribute in UnionOptions consistent with … (#8639)
  • document the dispatcher (#8637)
  • Rename concat_lst to concat_list (#8597)
  • remove unreachable/duplicated code in get_supertype (#8592)
  • change partition strategy (#8561)
  • remove some unnecessary calls and matches (#8490)
  • improve sorted warning/ fix tests (#8484)
  • bubble up time_iter errors (#8467)
  • Minor update to strptime (#8345)
  • use concat_owned_array_unchecked when possible (#8274)
  • Rename strptime/strftime args (#8221)
  • change sampling ratio for groupby strategy (#8223)
  • Rename Expr.list to implode (#8165)
  • introduce FieldsMapper utility class for obtaining FunctionExpr schema (#8175)
  • don't panic on err in offset_by (#8210)
  • remove unused list_construction (#8197)
  • split dsl paragraph header (#8162)
  • feature flag guards (#8117)
  • use map_private where applicable to reduce code duplication (#8128)
  • remove unnecessary to_string (#8083)
  • Add note about -1 to show all rows. (#8080)
  • Fixed a bunch of clippy warnings (#7967)
  • rename toggle_string_cache to enable_string_cache (#7970)
  • Include license files in polars-error and polars-row crates (#7930)
  • quantile typo in qcut (#7936)
  • Improve Duration::parse docs (#7918)
  • improve shift and fill performance in case of periods >= ca.len() (#7843)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @JoonHong-Kim, @LdRoW, @MarcoGorelli, @Newtoniano, @StefanBRas, @alexander-beedie, @alonme, @ankane, @avimallu, @ayemjay, @borchero, @cgevans, @chitralverma, @clickingbuttons, @dependabot, @dependabot[bot], @ghuls, @grantmcdermott, @jonashaag, @josh, @jvdd, @lorentzenchr, @mcrumiller, @mzjp2, @n8henrie, @pgimalac, @rben01, @ritchie46, @stinodego, @uchiiii, @universalmind303, @utkarshgupta137, @zaynetro and @zundertj

polars - Python Polars 0.17.12

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • add fused multiply add optimization for expressions (#8690)
  • use expression for dot product (#8686)

✨ Enhancements

  • streaming unions (#8676)
  • allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
  • expression cache (#8674)
  • rolling covariance and correlation (#8671)
  • .to_physical() for List(Categorical) (#8499)
  • allow from_repr to handle parsing of table reprs with no dtype row (#8640)
  • Add dt.to_string alias for dt.strftime (#8290)
  • support DataFrame export to numpy structured/record arrays (#8628)
  • support transparent DataFrame init from numpy structured/record arrays. (#8620)
  • Prettify show_versions (#8627)

🐞 Bug fixes

  • allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
  • block predicate containing shifts and windows after sort (#8670)
  • ensure perfect hash table processes the nulls (#8668)
  • Reading more tiny CSVs than workers in parallel will deadlock (#8441)
  • respect maintain_order in partitioned groupby (#8653)
  • fix explode null series (#8654)
  • fix categorical agg type (#8645)
  • allow list<null> -> list<cat> (#8636)

🛠️ Other improvements

  • add notes/examples on use of inline regex flags to replace docstrings (#8685)
  • Add "See Also" sections for alias, map_alias, prefix, s… (#8682)
  • add notes/examples on use of inline regex flags to extract_all docstrings (#8675)
  • allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
  • add notes on the use of inline regex flags to extract docstrings (#8669)
  • Add missing implode to internal functions (#8667)
  • Clean up type checking imports (#8666)
  • Organize PySeries impl blocks (#8665)
  • clean-up some examples, extend pipe docstring (#8658)
  • add notes on the use of inline regex flags to contains docstrings (#8657)
  • fix/improve from_repr example/doctest (#8642)
  • Improve some bindings imports (#8630)
  • Move functions in Rust bindings to functions module (#8629)
  • only require typing_extensions before Python 3.8 (#8623)
  • Set up separate modules for lazy classes (#8624)
  • Remove duplicate util in Rust bindings (#8622)
  • Move Python version to env in release workflow (#8621)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @ghuls, @jonashaag, @josh, @mcrumiller, @ritchie46 and @stinodego

polars - Python Polars 0.17.11

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • improve nested grouptuples related code (#8618)
  • buffer spill partitions in ooc sort. ~10/20% (#8616)
  • avoid potentially redundant casts on Series init (#8613)

✨ Enhancements

  • add Expr.meta namespace eq and ne methods (#8599)
  • avoid potentially redundant casts on Series init (#8613)
  • use temp dir for ooc spills (#8614)
  • add strict dtype equality comparison methods (is_ and is_not) (#8600)
  • automatically convert series <op> expr to pl.lit(series) <op> expr (#8549)

🐞 Bug fixes

  • maintain sorted info on top-k and empty sort (#8615)
  • fix ooc sort regression; don't take IO-thread before init (#8607)
  • maintain sortedness in date -> datetime cast (#8606)

🛠️ Other improvements

  • document sortedness of return value of upsample (#8612)
  • Set up functions module in Rust bindings (#8598)
  • Split PyExpr impl block into modules (#8596)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @mcrumiller, @ritchie46 and @stinodego

polars - Python Polars 0.17.10

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Out-of-core unique (#8573)

🚀 Performance improvements

  • improve OOC sort performance during partition phase (#8590)
  • significant speedup for python iteration over Series data (#8501)

✨ Enhancements

  • make ooc-sort resilient against chunk_size (#8588)
  • Out-of-core unique (#8573)
  • Add to_date, to_datetime, to_time to String namespace (#8579)
  • enhance parametric strategy retrieval, enable List strategy by default (#8571)
  • Add default value for round (#8566)
  • don't trigger unreachable code if no dtype is set (#8532)
  • Ergonomic inputs for all, any, sum, and cumsum (#8541)
  • accept expressions in groupby_dynamic/rolling (#8528)
  • add is_nested property to dtypes (#8514)

🐞 Bug fixes

  • fix determining of supertype for tz-aware and tz-naive datetimes (#8585)
  • correct for nested offsets in json serialization (#8584)
  • fix wrong dtype init in streaming groupby (#8574)
  • fix edge-case with NamedTuple input that contains unhashable field data (#8578)
  • temporarily disable List dtype in parametric tests (#8581)
  • fix categorical/string_cache fill_null panic (#8562)
  • fix testing asserts for NaN values in Struct data (#8557)
  • fix window function contention in binary expression (#8544)
  • fix struct pyarrow ffi (#8543)
  • don't trigger unreachable code if no dtype is set (#8532)
  • fix testing asserts for NaN values in List data (#8537)
  • keep sorted info on agg_first and simple singleton… (#8526)
  • don't downcast Decimal to Float64 in truediv (#8523)
  • unset fast_unique coming from arrow (#8521)
  • correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes #8423) (#8508)
  • Clarify and fix behaviour in pl.min/max (#8509)

🛠️ Other improvements

  • warn about changing date_range default from lazy=False to eager=False (#8593)
  • Rename internals module to _reexport (#8554)
  • change partition strategy (#8561)
  • fix testing asserts for NaN values in Struct data (#8557)
  • note sortedness of results from groupby ops (#8540)
  • better type signature for set_sorted (#8529)
  • add test for categorical input that is not fast_unique (#8527)
  • Improvements to the Python release workflow (#8121)
  • Update docs requirements (#8200)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cgevans, @ritchie46, @stinodego and @uchiiii

polars - Python Polars 0.17.9

Published by github-actions[bot] over 1 year ago

Migration guide.

Operations that require sorted columns will now emit a warning unless the columns have been explicitly sorted or tagged as sorted.

# 1. inform polars that a column is sorted on the DataFrame / LazyFrame.
(
    df.set_sorted("foo")
    .groupby_dynamic(..)
)

# 2. inform polars inline via the `set_sorted` expression
df.join_asof(df2, on=pl.col("foo").set_sorted())

# 3. explicitly sort first
# this does unnecessary work if the data is already sorted
df.sort("foo")
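
As a rough illustration of why this warning exists, the sketch below shows an as-of style lookup in plain Python. It is not Polars code and makes no claim about how Polars implements `join_asof`; the helper names `asof_lookup` and `is_sorted` are hypothetical. It only assumes that such a lookup relies on binary search, which silently returns a wrong match when the keys are not sorted.

```python
from bisect import bisect_right


def is_sorted(keys):
    """Return True if keys are in non-decreasing order."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


def asof_lookup(keys, values, probe):
    """Return the value for the last key <= probe.

    Uses binary search, so the result is only meaningful
    when `keys` is sorted in ascending order.
    """
    i = bisect_right(keys, probe)
    return values[i - 1] if i > 0 else None


# Sorted keys: the lookup finds key 3, the last key <= 4.
sorted_keys = [1, 3, 5, 7]
sorted_vals = ["a", "b", "c", "d"]
good = asof_lookup(sorted_keys, sorted_vals, 4)

# Same pairs in a different order: no error is raised,
# but the binary search lands on the wrong key.
unsorted_keys = [5, 1, 7, 3]
unsorted_vals = ["c", "a", "d", "b"]
bad = asof_lookup(unsorted_keys, unsorted_vals, 4)

print(is_sorted(sorted_keys), good)
print(is_sorted(unsorted_keys), bad)
```

The second lookup returns a value without any error, which is the kind of silent mistake an explicit sortedness flag is meant to prevent. Checking sortedness, as `is_sorted` does here, costs a full pass over the data, so tagging a column that is already known to be sorted avoids that extra work.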

✨ Enhancements

  • expose quantile/mean for duration (#8491)
  • require explicitly sorted flag for upsample (#8488)
  • allow for _saturating suffix in duration strings (#8479)

🐞 Bug fixes

  • don't error on cast if column is not projected (#8495)
  • ensure window function succeeds on empty frame (#8492)
  • don't set verbose on union (#8487)
  • check literal/group length before claiming agg sta… (#8486)

🛠️ Other improvements

  • Remove unneeded operation in strptime (#8496)
  • additional parametric testing docs/examples (#8485)
  • improve sorted warning/ fix tests (#8484)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46 and @stinodego

polars - Python Polars 0.17.8

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • less naive count (#8473)
  • parallelise dataframe describe method (#8465)
  • parallelize almost all flattens (#8468)
  • optimize horizontal min/max (#8463)
  • reinstate old behavior in numeric group-tuples (#8445)

✨ Enhancements

  • apply thousand-separators to "shape" html output, consi… (#8472)
  • let duration string accept "1mo_saturating" (#8469)
  • add dt.month_start and dt.month_end (#8435)
  • add SQL support for cumulative functions (#8457)
  • improve utility of dtype groups (#8453)
  • improved parametric Decimal strategy (#8444)
  • improved hypothesis/parametric testing profile registration (#8433)

🐞 Bug fixes

  • fix error message of offset_by if offsetting by negative number of months (#8464)
  • fix sorted warning (#8462)
  • improve utility of dtype groups (#8453)

🛠️ Other improvements

  • bubble up time_iter errors (#8467)
  • additional test coverage for dtype groups (#8458)
  • integrate live refresh/reload facility while writing docs (#8452)
  • add a series of parametric/hypothesis example tests to the main testing docs page (#8454)
  • parametric testing docs improvements (#8447)
  • improved parametric Decimal strategy (#8444)
  • improved hypothesis/parametric testing profile registration (#8433)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @universalmind303 and @utkarshgupta137

polars - Python Polars 0.17.7

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • remove false sharing in perfect hash table >2x (#8432)
  • further optimised conversions to python date/datetime (#8417)

✨ Enhancements

  • initial parametric/hypothesis Decimal dtype testing strategy (note: disabled by default) (#8430)
  • add Series support to pl.from_repr (#8429)
  • Allow %f in strptime format strings (#8404)

🐞 Bug fixes

  • raise upon invalid use of zero_copy_only (#8418)
  • respect dtype in anonymous list builder in case of… (#8428)
  • str.strptime error message: utf -> utc (#8422)

🛠️ Other improvements

  • initial parametric/hypothesis Decimal dtype testing strategy (note: disabled by default) (#8430)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ayemjay, @jonashaag, @mzjp2, @pgimalac, @ritchie46 and @stinodego