polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

License: Other

Downloads: 9.7M · Stars: 26.3K · Committers: 213

polars - Python Polars 0.14.31

Published by ritchie46 almost 2 years ago

🚀 Performance improvements

  • improve streaming primitive groupby (#5575)
  • vectorize integer vec-hash by using very simple, … (#5572)

✨ Enhancements

  • prefer streaming groupby if partitionable (#5580)

🐞 Bug fixes

  • fix ub due to invalid dtype on splitting dfs (#5579)

🛠️ Other improvements

  • Remove old Python changelog file (#5577)
  • namespace registration docs update (#5565)
  • Improve contributing guide (#5558)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46 and @stinodego

polars - Python Polars 0.14.29

Published by github-actions[bot] almost 2 years ago

🚀 Performance improvements

  • specialized utf8 groupby in streaming (#5535)

✨ Enhancements

  • add dataframe.pearson_corr (#5533)
  • support namespace registration (#5531)
  • make map_alias fallible (#5532)
  • pl.min & pl.max accept wildcard similar to pl.sum (#5511)
  • additional support for using timedelta with duration-type arguments (#5487)

🐞 Bug fixes

  • fix projection pushdown in asof joins (#5542)
  • streaming hstack allow duplicates (#5538)
  • fix streaming empty join panic (#5534)
  • fix duplicate caches in cse and prevent quadratic … (#5528)
  • allow appending categoricals that are all null (#5526)
  • tz-aware strftime (#5525)
  • make 'truncate' tz-aware (#5522)
  • fix coalesce expression expansion (#5521)
  • fix nested aggregation in when then and window expr… (#5520)
  • fix sort_by expression if groups already aggregated (#5518)
  • fix bug in batched parquet reader that dropped dfs… (#5506)
  • preserve Series name when exporting to pandas (#5498)
  • Refactor is_between (#5491)
  • fix bugs in skew and kurtosis (#5484)

🛠️ Other improvements

  • support tabbed panels in sphinx, add namespace docs (#5540)
  • Update dev dependencies (#5517)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @braaannigan, @ghuls, @ritchie46, @sorhawell and @zundertj

polars - Python Polars 0.14.27

Published by github-actions[bot] almost 2 years ago

✨ Enhancements

  • additional autocomplete affordances for IPython users (#5477)
  • make streaming work with multiple sinks in a sing… (#5474)
  • add streaming slice operation (#5466)
  • run partial streaming queries (#5464)
  • streaming left joins (#5456)
  • file statistics so we only (try to) keep smallest table in memory (#5454)
  • streaming inner joins. (#5400)

🐞 Bug fixes

  • compute correct offset for streaming join on multi… (#5479)
  • return error on invalid sortby expression (#5478)
  • use json for expr pickle (#5476)
  • improved namespace/accessor behaviour (resolves VSCode autocomplete issue) (#5469)
  • further improved lazy loading (#5459)
  • fix for categorical inserts from row-oriented data (#5462)
  • use of fill_null with temporal literals (#5440)

🛠️ Other improvements

  • don't panic if part of query cannot run strea… (#5458)
  • add build_info() to the API doc (#5442)
  • Improved structure for DataFrame and LazyFrame API docs, misc design improvements (#5433)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @dannyvankooten, @ritchie46, @s1ck, @slonik-az, @stinodego and @universalmind303

polars - Python Polars 0.14.26

Published by github-actions[bot] almost 2 years ago

✨ Enhancements

  • build_info() provides detailed information on how polars was built (#5423)
  • add missing width property to LazyFrame (#5431)
  • enhanced Series.dot method and related interop (#5428)
  • allow regex and wildcard in groupby (#5425)
  • support DataFrame init from generators (#5424)
  • support Series init from generator (#5411)

🐞 Bug fixes

  • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
  • properly handle json with unclosed strings (#5427)
  • fix null poisoning in rank operation (#5417)
  • correct expr::diff dtype for temporal columns (#5416)
  • fix cse for nested caches (#5412)
  • don't set sorted flag in argsort (#5410)

🛠️ Other improvements

  • Fix dependencies on memory allocator (#5426)
  • Better docstring for keep_name (#5378) (#5421)

Thank you to all our contributors for making this release possible!
@CalOmnie, @alexander-beedie, @ghuls, @ritchie46, @slonik-az, @stinodego and @universalmind303

polars - Python Polars 0.14.25

Published by github-actions[bot] almost 2 years ago

✨ Enhancements

  • 30x speedup initialising Series from python range object (#5397)
  • r-associative support for commutative DataFrame operators (#5394)
  • pl.from_epoch function (#5330)
  • Streaming joins architecture and Cross join implementation. (#5339)
  • enable frame init from sequence of pandas series, and improve lazy typechecks (handle subclasses) (#5383)
  • add support for am/pm notation in parse_dates read_csv (#5373)
  • add reduce/cumreduce expression as an easier fold (#5364)

🐞 Bug fixes

  • explicit nan comparison in min/max agg (#5403)
  • lazy proxy module does not require global registration (#5390)
  • Correct CSV row indexing (#5385)

🛠️ Other improvements

  • Docstrings for frame, lazyframe and time series (#5398)
  • add integrated support for copying API examples, and auto-parallelise docs build (#5393)
  • improve rendering of API docs type signatures, mark PivotOps as deprecated, misc tidy-ups (#5388)
  • Expression docstrings (#5377)
  • minor navbar improvements; adds discord and twitter links, fixes github icon (#5379)
  • improve structure of sphinx-generated API docs (#5376)
  • Add with_time_zone to reference guide (#5369)

Thank you to all our contributors for making this release possible!
@YuRiTan, @alexander-beedie, @braaannigan, @owrior, @ritchie46 and @zundertj

polars - Rust Polars 0.25.0

Published by github-actions[bot] almost 2 years ago

The most notable change in this release is the start of out-of-core support in Polars, meaning we can process larger-than-RAM datasets. It is currently supported for the parts of a query that read from CSV or Parquet, and is limited to select, filter, and groupby operations. Many more operations will follow in upcoming releases.

See https://github.com/pola-rs/polars/pull/5139#issuecomment-1274687634 where we were able to process an 80GB dataset on a laptop with only 16GB RAM.
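
The general idea behind out-of-core processing can be sketched in plain Python. This is an illustration only, not Polars' engine or API: the function name `streaming_groupby_sum`, its parameters, and the sample data are all hypothetical. It reads rows in small batches, applies a filter, and keeps only the per-group partial sums in memory, so the full dataset never has to fit in RAM.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def streaming_groupby_sum(rows, key, value, predicate, batch_size=2):
    """Filter rows and sum `value` per `key`, reading one batch at a time.

    Hypothetical helper for illustration; not part of Polars.
    """
    totals = defaultdict(float)
    rows = iter(rows)
    while True:
        # Only this batch of rows is held in memory at any time.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        for row in batch:
            if predicate(row):  # filter step
                # select the needed columns and update the partial aggregate
                totals[row[key]] += float(row[value])
    return dict(totals)


# Made-up sample data; in practice the rows would stream from a large file.
data = io.StringIO("city,amount\nA,1\nB,2\nA,3\nB,-4\nA,5\n")
result = streaming_groupby_sum(
    csv.DictReader(data),
    key="city",
    value="amount",
    predicate=lambda r: float(r["amount"]) > 0,
)
print(result)
```

Memory use here grows with the number of distinct groups rather than the number of rows, which is why select, filter, and groupby are natural first candidates for this style of execution.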

Thanks to everyone who contributed to another release! 🙌

⚠️ Breaking changes

  • rename expand_at_index -> new_from_index (#5259)

🚀 Performance improvements

  • lower contention in out of core filter (#5311)
  • improve pivot performance by using faster series… (#5172)
  • improve streaming performance (~15%) (#5170)
  • don't block projection pushdown on unnest (#5123)
  • more conservative JIT sort settings (#5080)
  • sort and unsort join key if other side is sorted (#5069)
  • do not rechunk left joins (#5066)
  • Prune unneeded projections (#5032)
  • Improve predicate pushdown + with_columns (#5029)
  • Don't execute unused with_column expressions (#5026)

✨ Enhancements

  • shrink_type expression (#5351)
  • tz_localize expression (#5340)
  • accept expr in arr.get (#5337)
  • Implement forward strategy in groupby join_asof (#5335)
  • improve dynamic inference of struct types (#5297)
  • Add newline to Aggregate..FROM describe_optimization_plan (#5253)
  • date_range expression (#5267)
  • show expression where error originated if raised … (#5263)
  • improve error msg if window expressions length do… (#5262)
  • Add round for date and datetime (#5153)
  • new n_chars functionality for utf8 strings (#5252)
  • added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)
  • make date_range timezone aware (#5234)
  • Rust functions for typed JsonPath implementation (#5140)
  • allow polars Config options to be serialised/shared, and more easily unset (#5219)
  • batched csv reader (#5212)
  • accept expressions in arr.slice (#5191)
  • is_sorted aggregation fast path for Utf8Chunked (#5184)
  • hybrid streaming query engine (#5139)
  • add binary dtype (#5122)
  • improve function expansion (#5110)
  • add struct arithmetics (#5107)
  • add cumfold/cumsum expression (#5103)
  • error on invalid asof join inputs (#5100)
  • small plan and profile chart improvements (#5067)
  • Initial implementation of histogram algorithm (#4752)

🐞 Bug fixes

  • unnest only pushdown column if there are projections (#5360)
  • block is_null predicate in asof join (#5358)
  • ensure that no-projection is seen as select all in… (#5356)
  • resolve duplicated column names in pivot (#5349)
  • fix serde of expression (pickle) (#5333)
  • don't set auto-explode in apply_multiple (#5265)
  • export anonymousscan in lazy prelude (#5295)
  • fix explicit list + sort aggregation in groupby co… (#5317)
  • fix sort-merge dispatch of utf8 (#5315)
  • properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
  • don't block non matching groups in binary expression (#5273)
  • fix logical type of nested take (#5271)
  • tag IntoSeries trait as unsafe (#5258)
  • include single null value in global cat builder (#5254)
  • include slice in sort fast path (#5247)
  • determine supertype of datetimes with timezones an… (#5240)
  • fix groupby dynamic truncate for > days resolution (#5235)
  • set timezone on groupby_dynamic boundaries (#5233)
  • fix incorrect duration dtype (#5226)
  • set string cache if lazy schema contains categorical (#5225)
  • fix pipeline dtypes (#5224)
  • fix asof_join schema (#5213)
  • fix single thread loop if schema length is off by 1 (#5210)
  • improve numeric stability of rolling_variance (#5207)
  • fix overflow in partitioned groupby mean of int32/… (#5204)
  • don't allow categorical append that is not under s… (#5195)
  • include offset in arr.get (#5193)
  • fix rolling_float in case closure returns None (#5180)
  • Implement missing extract conversion for Time datatype (#5161)
  • implement missing conversion to python time object (#5152)
  • microsecond noise on date >> time cast (add 00:00:00 fast-path) (#5149)
  • wrong operator mapped for LtEq (#5120)
  • unique include null (#5112)
  • don't recurse assign unions as it SO > 5k files (#5098)
  • block projection pushdown on unnest (#5093)
  • projection_node always do projection locally if no… (#5090)
  • fix iso_year for Date dtype (#5074)
  • fix bug in unneeded projection pruning (#5071)
  • Improve printing controls of DataFrame and Series (#5047)
  • Double projections should be checked on input schema (#5058)
  • Apply flat overlapping row groups when possible (#5039)
  • Ensure all predicates use same key function when inserting… (#5034)
  • Only consider dt series equal if they have the same tz (#5025)
  • Special-case ewm_mean(alpha=1) (#5019)
  • Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (#5014)
  • Fix timezone cast (#5016)

🛠️ Other improvements

  • update rustc to nightly-2022-10-24 (#5312)
  • update ahash and add nightly features of hashbrown (#5310)
  • Update comfy-table and memchr. (#5276)
  • rename expand_at_index -> new_from_index (#5259)
  • ensure streaming groupby take slice into account (#5178)
  • move polars-sql under polars folder (#5176)
  • remove aggregate pushdown optimization (#5173)
  • relax sync requirement on Executor trait impls (#5142)
  • Get rid of unnecessary check in SplitLines iterator (#5141)
  • Constant instead of literal (#5088)
  • Use release-drafter to draft releases with changelogs (#5033)
  • Fix docs by activating docfg feature (#5028)
  • Split up polars-lazy crate. (#5020)

Thank you to all our contributors for making this release possible!
@AlecZorab, @YuRiTan, @alexander-beedie, @cjermain, @dannyvankooten, @dpatton-gr, @egorchakov, @ghuls, @hpux735, @matteosantama, @mcrumiller, @owrior, @ritchie46, @slonik-az, @sorhawell, @stinodego, @thatlittleboy, @universalmind303 and @zundertj

polars - Python Polars 0.14.24

Published by github-actions[bot] almost 2 years ago

✨ Enhancements

  • shrink_type expression (#5351)
  • don't raise error but print a warning if mp fork method… (#5342)
  • tz_localize expression (#5340)
  • accept expr in arr.get (#5337)
  • Implement forward strategy in groupby join_asof (#5335)

🐞 Bug fixes

  • unnest only pushdown column if there are projections (#5360)
  • block is_null predicate in asof join (#5358)
  • ensure that no-projection is seen as select all in… (#5356)
  • resolve duplicated column names in pivot (#5349)
  • remove unused branch in getitem (#5348)
  • nested dicts / list generation (#5336)
  • fix serde of expression (pickle) (#5333)
  • handle old-style module loaders such that we can still lazy load them (#5331)
  • explicit output type in apply (#5328)

🛠️ Other improvements

  • remove multiprocessing check, and leave it to the user (#5347)
  • Update dev, lint and docs dependencies (#5338)
  • lazy module proxy (obviate attribute access guards for missing modules) (#5320)

Thank you to all our contributors for making this release possible!
@AlecZorab, @alexander-beedie, @ghuls and @ritchie46

polars - Python Polars 0.14.23

Published by github-actions[bot] almost 2 years ago

🐞 Bug fixes

  • fix explicit list + sort aggregation in groupby co… (#5317)
  • fix sort-merge dispatch of utf8 (#5315)
  • close multi-threading pool in df creation (#5309)
  • fix and check all uninstalled imports in ci (#5304)

🛠️ Other improvements

  • Add "import polars.testing" to testing docstrings (#5316) (#5318)
  • streamline lazy imports (#5302)
  • Catch deprecation warnings in unit tests (#5306)
  • fix and check all uninstalled imports in ci (#5304)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46, @thatlittleboy, @universalmind303 and @zundertj

polars - Python Polars 0.14.22

Published by github-actions[bot] almost 2 years ago

🚀 Performance improvements

  • Make all expensive imports lazy - ~85% (#5287)
  • remove pandas imports (#5286)
  • never import hypothesis in user code (#5282)

✨ Enhancements

  • expose to_struct to series list namespace (#5298)
  • improve dynamic inference of struct types (#5297)
  • don't panic in failing apply (#5294)
  • improve error message in struct apply (#5291)
  • accept schema in read_dicts (#5290)
  • Do not import polars.testing by default (#5284)
  • Pass more options to pyarrow in write_parquet (#5278) (#5280)
  • date_range expression (#5267)
  • allow implicit None branch in when then otherwise (#5264)
  • show expression where error originated if raised … (#5263)
  • improve error msg if window expressions length do… (#5262)
  • pl.ones, pl.zeros and Series.new_from_index functions (#5260)
  • Add round for date and datetime (#5153)
  • new n_chars functionality for utf8 strings (#5252)
  • added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)

🐞 Bug fixes

  • throw error on invalid lazy concat strategy (#5292)
  • fix to_pandas edge case (#5293)
  • properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
  • respect schema overwrite in from rows (#5275)
  • don't block non matching groups in binary expression (#5273)
  • fix logical type of nested take (#5271)
  • Check if BatchedCsvReader.next_batches() is None befor… (#5256)
  • include single null value in global cat builder (#5254)
  • Check multiprocessing start_method on import (#3144) (#5237)

🛠️ Other improvements

  • Add ModuleType for import functions in import_check.py (#5289)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @owrior and @ritchie46

polars - Python Polars 0.14.21

Published by github-actions[bot] about 2 years ago

🐞 Bug fixes

  • include slice in sort fast path (#5247)
  • don't use zoneinfo globally (#5246)

Thank you to all our contributors for making this release possible!
@ritchie46

polars - Python Polars 0.14.20

Published by github-actions[bot] about 2 years ago

✨ Enhancements

  • make date_range timezone aware (#5234)
  • infer timezone and improve display (#5232)
  • allow Config to be used as a context manager, and update some docs (#5223)
  • allow polars Config options to be serialised/shared, and more easily unset (#5219)

🐞 Bug fixes

  • determine supertype of datetimes with timezones an… (#5240)
  • fix groupby dynamic truncate for > days resolution (#5235)
  • ensure that polars_type_to_constructor works with tz-aware Datetime dtypes (#5239)
  • set timezone on groupby_dynamic boundaries (#5233)
  • accept tuple[bool, bool] instead of Sequence[bool] for Expr.is_between (#5094)
  • fix incorrect duration dtype (#5226)
  • set string cache if lazy schema contains categorical (#5225)
  • fix pipeline dtypes (#5224)

🛠️ Other improvements

  • update lazyframe lazygroupby apply docstring (#5238)
  • Consistent naming for Python release workflow (#5229)

Thank you to all our contributors for making this release possible!
@YuRiTan, @alexander-beedie, @cjermain, @matteosantama, @ritchie46 and @stinodego

polars - Python Polars 0.14.19

Published by github-actions[bot] about 2 years ago

🚀 Performance improvements

  • improve pivot performance by using faster series… (#5172)
  • improve streaming performance (~15%) (#5170)
  • don't block projection pushdown on unnest (#5123)

✨ Enhancements

  • batched csv reader (#5212)
  • accept expressions in arr.slice (#5191)
  • is_sorted aggregation fast path for Utf8Chunked (#5184)
  • support DataFrame init with Datetime dtypes that specify a timezone (#5174)
  • frame-level n_unique() that can count unique rows or col/expr subsets (#5165)
  • hybrid streaming query engine (#5139)
  • return Datetime/Duration with appropriate timeunit when inferring from pytype (#5127)
  • add binary dtype (#5122)

🐞 Bug fixes

  • fix asof_join schema (#5213)
  • fix single thread loop if schema length is off by 1 (#5210)
  • improve numeric stability of rolling_variance (#5207)
  • fix apply function over object dtype (#5206)
  • fix overflow in partitioned groupby mean of int32/… (#5204)
  • don't allow categorical append that is not under s… (#5195)
  • include offset in arr.get (#5193)
  • DataFrame.fill_null include unsigned integers (#5192)
  • error on fill_nan on non float dtype (#5185)
  • infer missing columns in from_dicts (#5183)
  • fix rolling_float in case closure returns None (#5180)
  • Implement missing extract conversion for Time datatype (#5161)
  • implement missing conversion to python time object (#5152)
  • Rendering long docstring lines. (#5150)
  • add missing _NUMPY_AVAILABLE check in Series.__getitem__ (#5126)
  • wrong operator mapped for LtEq (#5120)

🛠️ Other improvements

  • skip failing test until #5177 is resolved (#5205)
  • ensure streaming groupby take slice into account (#5178)
  • remove aggregate pushdown optimization (#5173)
  • Add support for ruff python linter. (#5151)
  • improve typing; many list types are better defined as Sequence (#5164)
  • Get rid of unnecessary check in SplitLines iterator (#5141)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @dannyvankooten, @ghuls, @ritchie46 and @sorhawell

polars - Python Polars 0.14.18

Published by github-actions[bot] about 2 years ago

🚀 Performance improvements

  • take advantage of sorted join for frame alignment (#5106)

✨ Enhancements

  • improve function expansion (#5110)
  • add struct arithmetics (#5107)
  • add cumfold/cumsum expression (#5103)
  • error on invalid asof join inputs (#5100)

🐞 Bug fixes

  • unique include null (#5112)
  • don't recurse assign unions as it SO > 5k files (#5098)
  • block projection pushdown on unnest (#5093)
  • projection_node always do projection locally if no… (#5090)

🛠️ Other improvements

  • deprecate name argument in drop (#5099)
  • improve py-polars/Makefile (#5089)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @owrior, @ritchie46 and @slonik-az

polars - Python Polars 0.14.17

Published by github-actions[bot] about 2 years ago

🚀 Performance improvements

  • more conservative JIT sort settings (#5080)

Thank you to all our contributors for making this release possible!
@mcrumiller, @ritchie46 and @zundertj

polars - Python Polars 0.14.16

Published by github-actions[bot] about 2 years ago

🚀 Performance improvements

  • sort and unsort join key if other side is sorted (#5069)
  • do not rechunk left joins (#5066)

✨ Enhancements

  • deprecate boolean mask for Series indexing (#5075)
  • small plan and profile chart improvements (#5067)
  • add gantt chart plot to LazyFrame::profile (#5063)
  • Support Series init as struct from @dataclass and annotated NamedTuple (#5057)

🐞 Bug fixes

  • fix iso_year for Date dtype (#5074)
  • tz-aware get_idx (#5072)
  • Fix empty method detection when PYTHONOPTIMIZE=2 (#5043)
  • fix bug in unneeded projection pruning (#5071)
  • remove overloads for from_arrow (#5065)
  • Improve printing controls of DataFrame and Series (#5047)
  • Double projections should be checked on input schema (#5058)
  • Add missing cse param to LazyFrame "profile" method (#5054)

🛠️ Other improvements

  • Default to zstd parquet compression (#5060)
  • Refactor show_graph (#5059)
  • Use release-drafter to draft releases with changelogs (#5033)
  • Update Makefile (#5056)
  • Parametric test coverage for EWM functions (#5011)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @egorchakov, @matteosantama, @ritchie46, @slonik-az, @stinodego and @zundertj

polars - Python Polars 0.14.15

Published by stinodego about 2 years ago

polars - Rust Polars 0.24.3

Published by stinodego about 2 years ago

polars - Rust Polars 0.24.0

Published by ritchie46 about 2 years ago

New rust polars release! 🚀

This is the release of Rust Polars 0.24.0. It comes with many bug fixes, performance improvements, and added functionality. The changes that stand out are larger-than-RAM memory mapping of IPC files and a new common-subplan optimization that prunes duplicated sub-plans from the query plan and thereby potentially saves a lot of duplicated work.
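
To illustrate what pruning duplicated sub-plans buys, here is a minimal sketch in plain Python. It is not how Polars implements the optimization: the tuple-based plan format, the operator names (`scan`, `filter_gt`, `concat`), and the `execute` function are assumptions made for this example. Each distinct sub-plan is evaluated once and its result is reused wherever the same sub-plan appears again.

```python
def execute(plan, cache, counter):
    """Evaluate a plan tree, computing each distinct sub-plan only once.

    Hypothetical toy executor for illustration; not part of Polars.
    """
    if plan in cache:  # duplicated sub-plan: reuse the earlier result
        return cache[plan]
    op, *args = plan
    if op == "scan":
        counter["scans"] += 1  # count how often the source is actually read
        result = list(args[0])
    elif op == "filter_gt":
        child, threshold = args
        result = [x for x in execute(child, cache, counter) if x > threshold]
    elif op == "concat":
        result = [x for child in args for x in execute(child, cache, counter)]
    else:
        raise ValueError(f"unknown op: {op}")
    cache[plan] = result
    return result


scan = ("scan", (1, 5, 10))
shared = ("filter_gt", scan, 3)
plan = ("concat", shared, shared)  # the same sub-plan appears twice

counter = {"scans": 0}
out = execute(plan, {}, counter)
print(out, counter)
```

Although the filtered scan appears twice in the plan, the source is read only once; the second occurrence is served from the cache.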

Update to arrow2 0.14.0

See the arrow2 0.14.0 release for all upstream improvements.

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.23.0...rust-polars-v0.24.0

polars - Rust polars 0.23.0

Published by ritchie46 about 2 years ago

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.22.1...rust-polars-v0.23.0

polars - Rust polars 0.22.1

Published by ritchie46 over 2 years ago

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.21.1...rust-polars-v0.22.1