polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

OTHER License

Downloads: 9.7M · Stars: 26.3K · Committers: 213


polars - Python Polars 0.20.3-rc.2

Published by github-actions[bot] 10 months ago

🚀 Performance improvements

  • don't needlessly allocate validity in concat/rechunk (#13288)
  • add fast path to count_bits_set_by_offsets (#13253)
  • make .dt.truncate('*mo') more than 3x faster (#13192)

✨ Enhancements

  • change doc links to new url docs.pola.rs (#13290)
  • support horizontal concatenation of LazyFrames (#13139)
  • Rename Utf8 data type to String, keep Utf8 as alias (#13257)
  • dispatch strict_cast via cast (#13255)
  • Impl any/all for array type (#13250)
  • add cancellable queries (#13178)
  • add offset parameter to gather_every (#13156)
  • Support Array dtype AnyValue Series construction (#12817)
  • Allow step parameter in int_ranges to take an expression (#13148)
  • make python map_batches safer (#13181)
  • Implement count for DataFrame/LazyFrame (#13153)
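DataFrame/LazyFrame count (#13153) reports non-null values per column, in line with the 0.20.0 change that makes Expr.count ignore nulls (#12934). Here is a minimal pure-Python sketch of those semantics. It assumes a frame is modelled as a dict of column lists with None standing in for null. The helper name count_non_null is hypothetical, and this is not the Polars implementation.

```python
def count_non_null(frame: dict[str, list]) -> dict[str, int]:
    """Return the number of non-null (non-None) values in each column."""
    return {name: sum(v is not None for v in values) for name, values in frame.items()}


frame = {"a": [1, None, 3], "b": [None, None, "x"]}
print(count_non_null(frame))  # {'a': 2, 'b': 1}
```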

🐞 Bug fixes

  • fix bugs when sorting categoricals lexically with null values (#13271)
  • improve replace on categoricals (#13223)
  • round trip to JSON and back should preserve Enum type (#13267)
  • fix return type hint of list series any/all (#13265)
  • sink_csv deadlock (#13239)
  • Correctly use read_parquet for all binary inputs (#13218)
  • is_in operator for categoricals (#13205)
  • Better handle mismatched dtypes in replace (#13213)
  • Fix replace fast path by casting old input to the right data type (#13176)
  • ndjson nested null schema inference (#13206)
  • don't cast to unknown dtypes (#13197)
  • maintain old join behavior in window expression (#13179)

🛠️ Other improvements

  • Copy Makefile build commands to top level (#13293)
  • Fix release flags (#13298)
  • Re-enable consortium standard tests (#13296)
  • Update CODEOWNERS (#13292)
  • Add CPU compatibility check (#13134)
  • Change base url of docs/guide to docs.pola.rs (#13281)
  • Fix source link for dev docs (#13279)
  • fix return type hint of list series any/all (#13265)
  • Fix display of overloaded signatures (#13258)
  • clean up bytecode parsing a bit (#13221)
  • Add a couple of docstring examples to Series methods (#13244)
  • remove unnecessary arg unpacking (#13241)
  • update rustc (#13219)
  • fix horizontal concatenation documentation (#13141)
  • Replace blackdoc by ruff's new docstring formatter (#13182)
  • Update ruff & ruff settings (#13126)
  • Link to latest object_store docs in api doc (#13180)
  • Fix failing test (#13171)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego

polars - Python Polars 0.20.3-rc.1

Published by github-actions[bot] 10 months ago

🚀 Performance improvements

  • add fast path to count_bits_set_by_offsets (#13253)
  • make .dt.truncate('*mo') more than 3x faster (#13192)

✨ Enhancements

  • Rename Utf8 data type to String, keep Utf8 as alias (#13257)
  • dispatch strict_cast via cast (#13255)
  • Impl any/all for array type (#13250)
  • add cancellable queries (#13178)
  • add offset parameter to gather_every (#13156)
  • Support Array dtype AnyValue Series construction (#12817)
  • Allow step parameter in int_ranges to take an expression (#13148)
  • make python map_batches safer (#13181)
  • Implement count for DataFrame/LazyFrame (#13153)
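The new offset parameter (#13156) makes gather_every(n, offset) take every n-th value starting at position offset. In plain Python, that selection is an extended slice. This is a sketch that assumes n >= 1 and a non-negative offset. The function below is an illustration, not the Polars method.

```python
def gather_every(values: list, n: int, offset: int = 0) -> list:
    """Select every n-th element, starting at index `offset`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return values[offset::n]


print(gather_every(list(range(10)), 3, offset=1))  # [1, 4, 7]
```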

🐞 Bug fixes

  • fix bugs when sorting categoricals lexically with null values (#13271)
  • improve replace on categoricals (#13223)
  • round trip to JSON and back should preserve Enum type (#13267)
  • fix return type hint of list series any/all (#13265)
  • sink_csv deadlock (#13239)
  • Correctly use read_parquet for all binary inputs (#13218)
  • is_in operator for categoricals (#13205)
  • Better handle mismatched dtypes in replace (#13213)
  • Fix replace fast path by casting old input to the right data type (#13176)
  • ndjson nested null schema inference (#13206)
  • don't cast to unknown dtypes (#13197)
  • maintain old join behavior in window expression (#13179)

🛠️ Other improvements

  • Add CPU compatibility check (#13134)
  • Change base url of docs/guide to docs.pola.rs (#13281)
  • Fix source link for dev docs (#13279)
  • fix return type hint of list series any/all (#13265)
  • Fix display of overloaded signatures (#13258)
  • clean up bytecode parsing a bit (#13221)
  • Add a couple of docstring examples to Series methods (#13244)
  • remove unnecessary arg unpacking (#13241)
  • update rustc (#13219)
  • fix horizontal concatenation documentation (#13141)
  • Replace blackdoc by ruff's new docstring formatter (#13182)
  • Update ruff & ruff settings (#13126)
  • Link to latest object_store docs in api doc (#13180)
  • Fix failing test (#13171)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego

polars - Python Polars 0.20.2

Published by github-actions[bot] 10 months ago

🚀 Performance improvements

  • ensure single expression evaluation for replace (#13147)
  • drop the pyarrow conversion path in iter_rows; we can now do fully native conversion ~2-3x faster (#13122)

✨ Enhancements

  • Move from GA to a more privacy-friendly framework (#13155)
  • prune all/any_horizontals with single inputs (#13146)
  • ensure we get cleaner logical plans with any/all_horizontal (#13144)

🐞 Bug fixes

  • Fix comparison of categoricals (#13137)
  • Use the name of the leftmost expression in horizontal operations (#13143)
  • any_value should support cast to boolean (#13125)
  • Update offsets of null value correctly for all from_iter_xxx_trusted_len (#13132)
  • fix neq for series cmp str (#13128)
  • Fix off-by-one error in lit dtype determination for integers (#13129)
  • fix category list builder append series with multiple chunks (#13116)

🛠️ Other improvements

  • Fix release LTS CPU step (#13160)
  • Use the name of the leftmost expression in horizontal operations (#13143)
  • ensure we get cleaner logical plans with any/all_horizontal (#13144)
  • Minor cleanup of PyO3 bindings (#13067)
  • Update auto_explode param name to returns_scalar (#13119)
  • Mark whether the current package is the LTS-CPU version (#13068)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @orlp, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.20.1

Published by github-actions[bot] 10 months ago

🐞 Bug fixes

  • repeat_by should not raise if by contains nulls (#13105)
  • [csv] raise on single quote char (#13104)
  • Raise if scan zstd compressed csv file (#13102)
  • allow timeunit-less dtype in pl.lit creation (#12997)
  • Don't check map length if input is literal (#13098)
  • rolling_quantile can get incorrect state (#13088)

🛠️ Other improvements

  • Fix column name in contains_any example (#13090)
  • update user-defined-functions for 0.19.x (#13071)
  • Fix some links, and make map_batches warning more evident (#13081)
  • Linting updates (#13069)
  • take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
  • add Enum to dtype list (#13080)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.20.0

Published by github-actions[bot] 10 months ago

This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.

Check out the upgrade guide for help navigating the upgrade to this version.

Please bear with us while we continue to make Polars the best tool it can be!

🏆 Highlights

  • Add new Enum categorical data type which allows a fixed set of categories (#11822)
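The Enum type (#11822) fixes the set of categories up front. Release 0.20.0 also allows order operators on Enum values (#12982). The sketch below is a hypothetical pure-Python model of that idea, assuming values are stored as positions in the category list and ordered by that position rather than lexically. FixedEnum is an illustrative name, not a Polars class.

```python
class FixedEnum:
    """Toy model of a fixed-category type: values are stored as category indices."""

    def __init__(self, categories: list[str]):
        if len(set(categories)) != len(categories):
            raise ValueError("categories must be unique")
        self.categories = list(categories)
        self._index = {c: i for i, c in enumerate(categories)}

    def encode(self, values: list) -> list:
        """Map each value to its physical code; None stays null, unknown values raise."""
        codes = []
        for v in values:
            if v is None:
                codes.append(None)
            elif v in self._index:
                codes.append(self._index[v])
            else:
                raise ValueError(f"value {v!r} is not a valid category")
        return codes

    def less_than(self, a: str, b: str) -> bool:
        """Order by category position, not by string value."""
        return self._index[a] < self._index[b]


size = FixedEnum(["small", "medium", "large"])
print(size.encode(["large", None, "small"]))  # [2, None, 0]
print(size.less_than("medium", "large"))      # True, although "medium" > "large" lexically
```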

💥 Breaking changes

  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Reimplement replace expression on the Rust side (#13002)
  • Preserve left and right join keys in outer joins (#12963)
  • Update update signature (#12986)
  • Update Expr.count to ignore null values by default (#12934)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Allow all DataType objects to be instantiated (#12470)
  • Change value_counts resulting column name from counts to count (#12506)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Default to exact checking for integers in assertion utils (#12331)
  • Set default dtype for Series to Null when no data is present (#12807)
  • Update lit behavior for list/tuple inputs (#12559)
  • Change DataType.is_nested from property to classmethod (#12453)
  • Update constructors for Array and Decimal (#12837)
  • Smaller integer data types for datetime components (#12070)
  • Fix NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
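With #12840, null join keys no longer match each other by default, following SQL semantics. Setting join_nulls=True restores the old behavior. Below is a minimal pure-Python sketch of an inner hash join with that switch. It assumes rows are dicts and None represents null. The inner_join function and its signature are illustrative only.

```python
from collections import defaultdict


def inner_join(left: list[dict], right: list[dict], on: str, join_nulls: bool = False) -> list[dict]:
    """Hash-join two lists of row dicts on column `on`."""
    # Build phase: index right-side rows by key.
    table = defaultdict(list)
    for row in right:
        key = row[on]
        if key is None and not join_nulls:
            continue  # SQL semantics: a null key never matches
        table[key].append(row)
    # Probe phase: look up each left row's key.
    out = []
    for lrow in left:
        key = lrow[on]
        if key is None and not join_nulls:
            continue
        for rrow in table.get(key, []):
            out.append({**lrow, **rrow})
    return out


left = [{"k": 1, "a": "x"}, {"k": None, "a": "y"}]
right = [{"k": 1, "b": "p"}, {"k": None, "b": "q"}]
print(len(inner_join(left, right, "k")))                   # 1
print(len(inner_join(left, right, "k", join_nulls=True)))  # 2
```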

⚠️ Deprecations

  • Rename write_database parameter if_exists to if_table_exists (#12783)

🚀 Performance improvements

  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Elide allocation in outer join materialization (#12992)
  • Avoid dispatching Series.head/tail to the expression engine (#12946)
  • Ensure we reduce for any/all_horizontal (#12976)
  • Add fast paths for UTC in truncate (#12965)
  • Use select_seq for expression dispatch (#12962)
  • Improve rolling_median algorithm (#12704)
  • Use fast path for non-null data in new SQL-like null matching (#12874)
  • Optimize DataFrame.iter_rows for smaller buffer sizes (#12804)
  • Speed up initializing Series from a list of NumPy arrays (#12785)

✨ Enhancements

  • Add str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
  • Auto-infer credentials from .aws folder (#13062)
  • Support private cloud S3 storage in scan_parquet (#13060)
  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Allow order operators (<,>,>=,<=) on Enum types (#12982)
  • Reimplement replace expression on the Rust side (#13002)
  • Expand set of NumPy functions which emit inefficient map_* warning (#13039)
  • Use tokio semaphore for concurrency handling (#13026)
  • Improve and expressify hist (#13014)
  • Update describe to use new count implementation (#12990)
  • Add default to_struct Series name consistent with the usual default Series name (empty string) (#12998)
  • Preserve left and right join keys in outer joins (#12963)
  • Clarify "inefficient map_elements" warning message (#12978)
  • Allow end before start in date/time_range (#12964)
  • Update update signature (#12986)
  • Minor update to Array data type repr (#12973)
  • Implement group-tuples for Null dtype (#12975)
  • Cast to an enum from int (#12954)
  • Move categorical ordering into dtype (#12911)
  • Avoid importing interchange module by default (#12927)
  • Update Expr.count to ignore null values by default (#12934)
  • Raise if expression passed as scalar to DataFrame constructor (#12916)
  • Update repr of Struct data type class (#12922)
  • Enable partial predicate pushdown past window expressions (#12710)
  • Add merge mode to write_delta and remove pyarrow to delta conversions (#12392)
  • Add str.reverse (#12878)
  • Allow all DataType objects to be instantiated (#12470)
  • Specific performance warnings from Rust to Python (#12802)
  • Change value_counts resulting column name from counts to count (#12506)
  • Implement std and var for Duration columns (#12865)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Enhance write_database return (indicate the number of rows affected by the operation) (#12830)
  • Add dedicated Decimal selector (#12852)
  • Preserve base dtype when raising to UInt power (#10446)
  • Default to exact checking for integers in assertion utils (#12331)
  • Improve __repr__ implementation for Expr (#12770)
  • Support SQL subqueries for JOIN and FROM (#12819)
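str.contains_any and str.replace_many (#13073) rely on Aho-Corasick, which scans each string once no matter how many patterns are searched for. The following is a textbook pure-Python automaton that only answers the contains_any question. It is an illustration, not the Rust implementation Polars uses, and it does not model the match-selection rules of replace_many.

```python
from collections import deque


def build_automaton(patterns: list[str]):
    """Build an Aho-Corasick automaton: goto trie, failure links, output flags."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [False]  # True if some pattern ends at this state or at one of its suffix states
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state] = True
    # Breadth-first pass: children of the root keep fail = 0, deeper states are computed.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] or out[fail[nxt]]
    return goto, fail, out


def contains_any(text: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs in text, using a single left-to-right scan."""
    goto, fail, out = build_automaton(patterns)
    if out[0]:
        return True  # the empty pattern matches everywhere
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            return True
    return False


print(contains_any("the quick brown fox", ["cat", "fox"]))  # True
```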

🐞 Bug fixes

  • Fix off-by-one error in quantile(method="nearest") (#13058)
  • Fix incorrect schema inference on nested columns (#13057)
  • Don't raise for datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
  • Parse json_decode per max buffer length (#13029)
  • Parse 00:00 time zone as UTC (#13034)
  • Fix timeout errors in concurrent downloads (#13023)
  • Streamline align_frames and fix edge-case where the identical frame object appears more than once (#13007)
  • Fix SQL substring indexing (#13016)
  • Allow broadcasting in ranges (#11900)
  • Prevent deadlock in sink_csv (#12991)
  • Don't get mutable if buffer is sliced (#12979)
  • Support parameterized read_database calls against cursors that only take positional args (#12967)
  • Fix truncate when truncating by multiple weeks (#12948)
  • Fix segfault / memory corruption after plugins return Err result (#12953)
  • Raise a proper Python typed exception when IO writers try to write to a non-existent folder (#12936)
  • Don't panic when ambiguous parameter is not Utf8 (#12913)
  • Raise a proper Python typed exception when the CSV writer tries to write to a non-existent folder (#12919)
  • Patch rolling_var/rolling_std numerical stability (#12909)
  • Fix incorrect Int16 min/max due to incorrect SIMD mask construction (#12908)
  • Improve handling of decimal conversion with to_numpy in the absence of pyarrow (#12888)
  • Fix OOB error in list set operations on empty frame (#12845)
  • Fix error message for uninstantiated Enum types (#12886)
  • Fix repr of Expr.gather (which was still showing deprecated take) (#12864)
  • Fix Array dtype equality (#12853)
  • Fix nan_min/max incorrectly aggregating chunks with addition (#12848)
  • Revert type hint change on expression inputs (#12792)
  • More accurate type hinting for collect_all functions (#12796)
  • Use total float ordering in is_in (#12800)
  • Handle aggregation for all-NaN groups in group_by (#12304)
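The NaN ordering change (#12721) and the use of total float ordering in is_in (#12800) both treat NaN as greater than every other float and equal to itself. Here is a pure-Python sketch of that ordering, assuming a simple tuple sort key. The helper names are hypothetical, and the key deliberately does not try to model how Polars orders -0.0 relative to 0.0.

```python
import math


def total_order_key(x: float) -> tuple[int, float]:
    """Sort key placing NaN after every other float, including +inf."""
    return (1, 0.0) if math.isnan(x) else (0, x)


def total_eq(a: float, b: float) -> bool:
    """Equality under total ordering: NaN equals NaN."""
    return total_order_key(a) == total_order_key(b)


def is_in_total(x: float, haystack: list[float]) -> bool:
    """Membership test that uses total-order equality, so NaN can be found."""
    return any(total_eq(x, h) for h in haystack)


values = [3.0, float("nan"), -math.inf, math.inf, 1.0]
print(sorted(values, key=total_order_key))  # [-inf, 1.0, 3.0, inf, nan]
print(is_in_total(float("nan"), values))    # True
```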

🛠️ Other improvements

  • Update version switcher for 0.20 (#12844)
  • Add upgrade guide for Python Polars 0.20 (#12872)
  • Run doctests before other tests (#13047)
  • Update describe calculation of min/max (#13027)
  • Minor typo fix (#13003)
  • Resolve two interchange tests failing locally (#12999)
  • Update outdated links to API in Expressions/Functions page (#12981)
  • Expand docstrings for count (#12960)
  • Fix issue with docs for group_by_dynamic (#12906)
  • Prefer explicit --no-cov flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Fix references in deprecation notes (#12877)
  • Fix typo in hash docstring (#12879)
  • Fix docstring for deprecated list.take (#12873)
  • Note that list.take is deprecated (#12867)
  • Fix failing tests (#12859)
  • Add quotes to pip install with dependencies (#12799)
  • Fix parameter name reference in update docstring (#12797)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange

polars - Python Polars 0.19.19

Published by github-actions[bot] 11 months ago

✨ Enhancements

  • Parquet support required deltabyte encoding (#12836)

🐞 Bug fixes

  • Fix incorrect values from parquet RLE decoding (#12818)
  • Write only one dict page per row group (#12831)

Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego

polars - Python Polars 0.19.18

Published by github-actions[bot] 11 months ago

✨ Enhancements

  • support nested null in vstack/append/extend/concat (#12771)
  • Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
  • determine mode parallelism depending on current tasks (#12764)
  • enable slice push down past with_columns (#12742)
  • Improve write_database, accounting for latest adbc fixes/updates (#12713)

🐞 Bug fixes

  • don't use streaming engine if aggregate is unknown (#12769)
  • Enable special casing of sequence in list_to_struct (#12759)
  • hold align_chunks_invariant (#12738)
  • allow leading zero and plus in integer parsing (#12744)
  • csv lines iter, always return remainder (#12739)
  • fix oob in set operations (#12736)
  • undo regression in ability to read certain parquet files (#12731)

🛠️ Other improvements

  • Use latest atoi_simd release (#12748)
  • Fix invalid references to xlsx2csv dependency (#12741)
  • Remove pinned aiohttp dependency (#12733)

Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.17

Published by github-actions[bot] 11 months ago

✨ Enhancements

  • Automatically wrap NumPy array as lit (#12709)
  • Add DataFrame.iter_columns (#12653)
  • favour showing "adbc_driver_manager" over "adbc_driver_sqlite" in show_versions (#12690)

🐞 Bug fixes

  • corr returns NaN if the denominator is invalid (#12708)
  • parquet decimal statistics and schema (#12705)
  • support append/extend with null series (#11824) (#12686)
  • address a numpy ndarray init regression (#12701)
  • fix carrying over infinity into other windows (#12685)

🛠️ Other improvements

  • Update URI prefix in examples (prefer "postgresql" to "postgres") (#12707)
  • now that scan_parquet supports hive partitioning, remove note pointing to scan_pyarrow_dataset (#12706)
  • Minor docstring fixes (#12688)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela

polars - Python Polars 0.19.16

Published by github-actions[bot] 11 months ago

⚠️ Deprecations

  • Rename series_equal/frame_equal to equals (#12618)
  • Rename map_dict to replace and change default behavior (#12599)
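The behavioral change in #12599 concerns values that are missing from the mapping. By default, map_dict turned them into null, whereas replace keeps them unchanged. The plain-Python illustration below contrasts the two defaults on lists. Both function names are hypothetical.

```python
def map_dict_old(values: list, mapping: dict) -> list:
    """Old map_dict default: unmapped values become null (None)."""
    return [mapping.get(v) for v in values]


def replace_new(values: list, mapping: dict) -> list:
    """New replace default: unmapped values are kept unchanged."""
    return [mapping.get(v, v) for v in values]


values = ["a", "b", "c"]
mapping = {"a": "A"}
print(map_dict_old(values, mapping))  # ['A', None, None]
print(replace_new(values, mapping))   # ['A', 'b', 'c']
```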

🚀 Performance improvements

  • order(s) of magnitude speedup when initialising List dtype Series from 2D numpy array (#12672)
  • improve merge_local_rhs_categorical traversal (#12660)
  • make values_size estimate correct for sliced arrays (#12658)
  • improve parquet utf8 validation (#12655)
  • parquet pre-allocate buffer in binary plain encode (#12652)
  • optimize dict binary decoding in parquet (#12648)
  • ensure we only check the values within bounds (#12633)
  • parquet: elide recursion in hot path (#12625)
  • improve cov/corr algorithm (#12590)

✨ Enhancements

  • Join operations on local categoricals (#12657)
  • Implement PySeries.from_buffer for boolean buffers (#12654)
  • Implement PySeries.from_buffer for numeric types (#12646)
  • use RLE_DICTIONARY for integers in parquet (#12647)
  • extend recent filter syntax upgrades to when/then construct (#12603)
  • implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
  • implement 'DeltaByteArray' decoding for parquet (#12602)
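The Parquet DELTA_BYTE_ARRAY encoding (#12602) stores each value as the length of the prefix it shares with the previous value, plus the remaining suffix. The following sketch reconstructs the values, assuming the prefix lengths and suffixes have already been decoded from their own sub-encodings. The function is illustrative and not taken from Polars.

```python
def decode_delta_byte_array(prefix_lengths: list[int], suffixes: list[bytes]) -> list[bytes]:
    """Rebuild values: each is the first `prefix_len` bytes of the previous value plus its suffix."""
    if len(prefix_lengths) != len(suffixes):
        raise ValueError("prefix_lengths and suffixes must have equal length")
    values = []
    previous = b""
    for prefix_len, suffix in zip(prefix_lengths, suffixes):
        if prefix_len > len(previous):
            raise ValueError("prefix length exceeds previous value")
        current = previous[:prefix_len] + suffix
        values.append(current)
        previous = current
    return values


print(decode_delta_byte_array([0, 3, 4], [b"Hello", b"p", b"ful"]))
# [b'Hello', b'Help', b'Helpful']
```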

🐞 Bug fixes

  • json null inference (#12677)
  • cov/corr respect f32 type (#12676)
  • fix ternary zip_with null broadcast (#12668)
  • support negative slice on eager frame (#12644)
  • fix concurrency budget assertion (#12641)
  • fix oob in set operations (#12640)
  • panic reading parquet nested struct column (#12614)
  • Fix deprecation message for DataFrame.sum (#12619)
  • features: performant,lazy,random (#12600)

🛠️ Other improvements

  • Use range instead of np.arange in constructors (#12621)
  • update custom allocator instructions to include macOS (#12593)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.15

Published by github-actions[bot] 11 months ago

⚠️ Deprecations

  • Rename str.json_extract to str.json_decode (#12586)

🚀 Performance improvements

  • apply left side predicate pushdown also to right side on semi join (#12565)
  • ensure streaming parquet download remains concurrent ~7x (#12552)

✨ Enhancements

  • warn if by column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
  • struct -> json encoding expression (#12583)
  • Implement support for multi-character comments in read_csv (#12519)
  • Implement LazyFrame.sink_ndjson (#10786)
  • use JEMALLOC on all unix architectures (#12568)
  • improve concurrency parameters (#12567)
  • In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)
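Multi-character comments in read_csv (#12519) mean that a line is skipped only when it starts with the whole comment prefix, not just its first character. Below is a pure-Python sketch of that filtering built on the standard csv module. It assumes comments occupy entire lines, and it does not reproduce the Polars read_csv parameter or implementation.

```python
import csv
import io


def read_csv_skipping_comments(text: str, comment_prefix: str) -> list[list[str]]:
    """Parse CSV text, dropping lines that start with a (possibly multi-character) prefix."""
    lines = (line for line in io.StringIO(text) if not line.startswith(comment_prefix))
    return list(csv.reader(lines))


data = "## header note\na,b\n1,2\n#3,4\n"
print(read_csv_skipping_comments(data, "##"))  # [['a', 'b'], ['1', '2'], ['#3', '4']]
```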

🐞 Bug fixes

  • error when invalid list to array is given (#12584)
  • parquet: do not extend existing nested that is already complete (#12569)
  • accidental panic if predicate selects no files (#12575)
  • fix lazy parquet slice with nested columns (#12558)
  • ensure stats-evaluator exists (#12566)
  • list schema of list eval (#12563)
  • ensure concurrency budget never locks (#12555)
  • Fix lazy schema for group_by_dynamic and rolling (#12551)
  • address overflow on vec capacity calculation for int_ranges with negative step (#12548)

🛠️ Other improvements

  • convert all recursive parquet deserialize to iterative (#12560)
  • Minor cleanup in Expr class (#12549)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii

polars - Python Polars 0.19.14

Published by ritchie46 11 months ago

🏆 Highlights

  • Support Python 3.12 (#12094)
  • make 1D numpy to polars conversion zero-copy for numeric data (#12403)

⚠️ Deprecations

  • Rename DataFrame column index methods (#12542)
  • Rename Series.set_at_idx to scatter (#12540)
  • Deprecate Series.view (#12539)
  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate Series.inner_dtype property (#12494)
  • Deprecate parse_int in favor of to_integer (#12464)
  • Deprecate DataType method is_not (#12458)
  • Deprecate Series methods is_boolean and is_utf8 (#12457)
  • Add DataType.is_integer and other dtype groups (#12200)

🚀 Performance improvements

  • speed up parquet download of streaming engine (#12544)
  • speed up cov/corr with SIMD + strength-reduction (~3x faster than 0.19.13, ~2x faster than NumPy) (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • make 1D numpy to polars conversion zero-copy for numeric data (#12403)
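The online cov/corr algorithm (#12412) accumulates statistics in a single pass instead of first computing means. Here is a Welford-style sketch of such an accumulator. It is assumed to resemble the approach in spirit only; OnlineCov is a hypothetical class. It returns NaN when the correlation denominator is zero, similar to the behavior described in #12708.

```python
import math


class OnlineCov:
    """Single-pass (Welford-style) covariance/correlation accumulator."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # co-moment: sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x  # deviation from the old x mean
        self.mean_x += dx / self.n
        dy = y - self.mean_y  # deviation from the old y mean
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)  # old x deviation times new y deviation

    def cov(self, ddof: int = 1) -> float:
        return self.c_xy / (self.n - ddof) if self.n > ddof else math.nan

    def corr(self) -> float:
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else math.nan


acc = OnlineCov()
for x, y in zip([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]):
    acc.update(x, y)
print(acc.cov(), acc.corr())  # cov = 10/3, corr = 1.0
```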

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • Add DataType.is_integer and other dtype groups (#12200)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • pre-fetch struct columns in async projection pushdown (#12514)
  • rechunk cross join output in streaming (#12511)
  • Ensure behaviour of Series comparison with timedelta matches that of other types (#12497)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • ensure explicit "return_dtype" is respected by map_dicts (#12436)
  • empty csv no-raise (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • raw HTML output alignment was incorrect for dtype in header (#12422)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • Issue correct PolarsInefficientMapWarning for lshift/rshift operations (#12385)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)

🛠️ Other improvements

  • fix and improve ternary evaluation on groups (#12529)
  • Add polars-ds to list of community plugins (#12527)
  • Future-proof consortium standard test (#12524)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • add test for previous commit (#12510)
  • Update polars-hash reference (#12505)
  • Add note on hash stability and mention polars-hash (#12496)
  • Support Python 3.12 (#12094)
  • Improved import polars timing test; now much more consistent/reliable (#12478)
  • Use .with_columns() in all .list namespace examples (#12475)
  • update rustc (#12468)
  • Fix docs trigger (#12449)
  • Update for new maturin release (#12437)
  • Remove 'experimental' tag for auto-structify setting (#12435)
  • make "DataFrame" and "Series" case more consistent across docs/comments/errors (#12428)
  • dprint/markdown link checker minor updates (#12409)
  • Use manylinux_2_17 for building x86-64 wheel (#12408)
  • Use manylinux 2.24 instead of 2.28 for compatibility reasons (#12397)
  • use with_columns in is_in example, and fix some bullet points not rendering (#12383)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @abstractqqq, @alexander-beedie, @c-peters, @cmdlineluser, @hirohira9119, @ion-elgreco, @jerome3o, @nameexhaustion, @reswqa, @ritchie46, @stinodego and @uchiiii

polars - Rust Polars 0.35.0

Published by github-actions[bot] 11 months ago

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)
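A partitioned join (#12270) first splits both sides by key hash, so that equal keys always fall in the same partition. Each partition pair can then be joined independently, and in parallel. The following sketch illustrates the idea in pure Python, with several assumptions. It uses Python's built-in hash with a modulo, rather than radix bits of a dedicated hash, which #12225 suggests need not be a power of two. The partition count is arbitrary. The partitions are joined sequentially.

```python
from collections import defaultdict


def partitioned_join(left: list[tuple], right: list[tuple], n_partitions: int = 4) -> list[tuple]:
    """Inner join of (key, value) pairs via hash partitioning, then per-partition hash joins."""
    # Partition phase: rows with equal keys always land in the same partition.
    left_parts = [[] for _ in range(n_partitions)]
    right_parts = [[] for _ in range(n_partitions)]
    for key, val in left:
        left_parts[hash(key) % n_partitions].append((key, val))
    for key, val in right:
        right_parts[hash(key) % n_partitions].append((key, val))
    # Join phase: each partition pair is independent (parallelisable in a real engine).
    out = []
    for lp, rp in zip(left_parts, right_parts):
        table = defaultdict(list)
        for key, val in rp:
            table[key].append(val)
        for key, lval in lp:
            for rval in table.get(key, []):
                out.append((key, lval, rval))
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(partitioned_join(left, right)))  # [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```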

💥 Breaking changes

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate parse_int in favor of to_integer (#12464)
  • plugins add version and context (#12433)
  • Fix scan_csv error type (#12355)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Rename is_signed to is_signed_integer (#12220)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Rename ljust/rjust to pad_end/pad_start (#11975)

🚀 Performance improvements

  • speed up cov/corr with SIMD + strength-reduction (~3x faster than 0.19.13, ~2x faster than NumPy) (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • indexvec in group-by (#12371)
  • reduce allocations in hash join (#12368)
  • change concurrency parameters (#12321)
  • improve join performance through radix partitioned join (#12270)
  • remove extra multiplication in hash_to_partition (#12233)
  • allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)
  • improve parquet downloading (#12061)

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • Allow comparison of two local categories with the same hash (#12503)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • change concurrency parameters (#12321)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • auto infer ambiguous for truncate and round (#12204)
  • Rename is_signed to is_signed_integer (#12220)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • start prefetching all files immediately (#12201)
  • Add .list.to_array expression (#12192)
  • consolidate & improve all casting failure error messages (#12168)
  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • add concurrency budget (#12117)
  • Introduce ignore_nulls for str.concat (#12108)
  • casting utf8 to temporal (#12072)
  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)
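The round_sig_figs expression (#11959) rounds to a number of significant figures rather than decimal places. The usual formula shifts the rounding position by the order of magnitude of the leading digit. The following is a pure-Python sketch of that formula. Its handling of zero and non-finite inputs is an assumption, and the exact edge-case behavior in Polars may differ.

```python
import math


def round_sig_figs(x: float, digits: int) -> float:
    """Round x to `digits` significant figures."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if x == 0 or not math.isfinite(x):
        return x
    magnitude = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, digits - 1 - magnitude)


print(round_sig_figs(1234.5, 2))    # 1200.0
print(round_sig_figs(0.012345, 3))  # 0.0123
```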

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • pre-fetch struct columns in async projection pushdown (#12514)
  • rechunk cross join output in streaming (#12511)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • empty csv no-raise (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)
  • Do not cast lit if it has the same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • uint64 should be correctly extracted from python object (#12338)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • scan empty csv misses row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet, add row_count to empty file materialization (#12310)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Update null_count after arithmetic (#12280)
  • fix ambiguous aggregation type (#12269)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterating (#12189)
  • Raise if *_horizontal without inputs (#12106)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when reading from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
  • str.concat on empty list (#12066)
  • binary agg should be group-aware if literal is not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Fix padding for non-ASCII strings (#12008)
  • inline parts of unstable unicode module for stable (#12003)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)
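
The padding fix listed above (#12008) depends on measuring string width in characters rather than UTF-8 bytes. The pure-Python sketch below contrasts the two approaches. `pad_start_bytes` and `pad_start_chars` are hypothetical helpers, not Polars API. We assume, from the PR title only, that the fixed behavior matches the character-counting version. The sketch counts code points, so combining sequences and wide glyphs get no special treatment.

```python
def pad_start_bytes(s: str, width: int, fill: str = " ") -> str:
    # Buggy variant (hypothetical): width measured in UTF-8 bytes, so multi-byte
    # characters make the string look longer than it is and it gets under-padded.
    missing = max(0, width - len(s.encode("utf-8")))
    return fill * missing + s


def pad_start_chars(s: str, width: int, fill: str = " ") -> str:
    # Intended behavior (assumed): width measured in characters (code points).
    missing = max(0, width - len(s))
    return fill * missing + s


print(repr(pad_start_bytes("café", 6)))  # ' café'  ("café" is 5 bytes, 1 space added)
print(repr(pad_start_chars("café", 6)))  # '  café' (4 characters, 2 spaces added)
```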

🛠️ Other improvements

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • fix and improve ternary evaluation on groups (#12529)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Add polars-ds to list of community plugins (#12527)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • add test for previous commit (#12510)
  • Support Python 3.12 (#12094)
  • Fix some typos (#12485)
  • Deprecate parse_int in favor of to_integer (#12464)
  • update rustc (#12468)
  • rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
  • Replace outdated dev dependency tempdir (#12462)
  • move cov/corr to polars-ops (#12411)
  • use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
  • dprint/markdown link checker minor updates (#12409)
  • replace as_u64 with dirty_hash (#12327)
  • Fix ruff linting invocation (#12350)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Build and verify Rust examples in docs (#12334)
  • Fix some feature flags (#12325)
  • Organize Cargo.toml (#12323)
  • remove fxhash (#12322)
  • Run rustfmt on doc examples (#12319)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • simplify expr checking in predicate push down (#12287)
  • Replace dev dependency avro-rs with apache-avro (#12295)
  • Run clippy on all targets (#12293)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • remove unwrap from group_by (#12263)
  • update object_store (#12006) (#12273)
  • Remove recommended setting from IDE docs (#12275)
  • Add feature flag for list.eval (#12254)
  • factor out some shared code in truncate_impl (#12229)
  • update Cargo.lock (#12226)
  • Make all functions in string namespace non-anonymous (#12215)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • use enum for Ambiguous (#12193)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Refactor FunctionExpr module (#12162)
  • Fix tests for pyarrow 14 (#12170)
  • Fix triggers for docs deployment (#12159)
  • Make all functions in binary namespace non-anonymous (#12126)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor improvements to the docs website (#12084)
  • reshape and repeat_by non-anonymous (#12064)
  • upgrade zstd to 0.13 in polars-parquet (#12062)
  • Direct CONTRIBUTING to the docs website (#12042)
  • inline parquet2 (#12026)
  • remove parquet logic from polars-arrow and consolidate logic in polars-parquet crate. (#12022)
  • move abs to ops (#12005)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
  • Disable type checking for dataframe_api_compat dependency (#11997)
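
The change above that makes saturating the default (#12301) applies to calendar-month offsets such as "1mo". When the target month is shorter than the start month, the day is clamped to that month's last day instead of spilling into the next month. The standard-library sketch below shows the clamping rule. `add_months` is a hypothetical helper, not a Polars function.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping (saturating) the day to the target month's end."""
    # Work with a zero-based month index so negative offsets also work.
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(add_months(date(2023, 1, 31), 1))   # 2023-02-28
print(add_months(date(2024, 1, 31), 1))   # 2024-02-29 (leap year)
print(add_months(date(2023, 3, 31), -1))  # 2023-02-28
```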

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl

polars - Python Polars 0.19.13

Published by github-actions[bot] 12 months ago

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)
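
The highlight above refers to a radix-partitioned hash join. The general technique works in two steps. First, split both inputs into partitions using the low bits of each key's hash. Then build and probe a small hash table per partition, which keeps each table cache-friendly and lets partitions be processed in parallel. The sketch below is a single-threaded Python illustration of an inner join, not Polars' Rust implementation. The function name and default partition count are assumptions.

```python
from collections import defaultdict


def radix_partitioned_inner_join(left, right, n_partitions=4):
    """Inner-join lists of (key, value) pairs, returning (key, left_value, right_value)."""
    if n_partitions <= 0 or n_partitions & (n_partitions - 1):
        raise ValueError("this sketch needs a power-of-two partition count")

    def partition(rows):
        parts = [[] for _ in range(n_partitions)]
        for key, value in rows:
            # Use the low bits of the hash ("radix"), so equal keys share a partition.
            parts[hash(key) & (n_partitions - 1)].append((key, value))
        return parts

    out = []
    # Partition pairs are independent, so this loop could run in parallel.
    for lpart, rpart in zip(partition(left), partition(right)):
        table = defaultdict(list)
        for key, value in rpart:  # build a small hash table per partition
            table[key].append(value)
        for key, value in lpart:  # probe it with the matching left partition
            for rvalue in table.get(key, ()):
                out.append((key, value, rvalue))
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(radix_partitioned_inner_join(left, right)))
# [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```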

⚠️ Deprecations

  • Rename write_csv parameter has_header to include_header (#12351)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • Switch args for Decimal and set default scale=0 (#12224)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Deprecate DataFrame.as_dict positional input (#12131)

🚀 Performance improvements

  • indexvec in group-by (#12371)
  • reduce allocations in hash join (#12368)
  • change concurrency parameters (#12321)
  • improve join performance through radix partitioned join (#12270)
  • remove extra multiplication in hash_to_partition (#12233)
  • allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)
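
Two entries above (#12225, #12233) concern mapping a hash to a partition. Masking with `h & (n - 1)` only works when `n` is a power of two. A common alternative is multiply-shift range reduction, which maps a 64-bit hash roughly uniformly onto `[0, n)` for any `n` via `(h * n) >> 64`. We believe Polars uses a mapping of this style, but the Python sketch below is only an illustration under that assumption, not its source.

```python
MASK64 = (1 << 64) - 1


def hash_to_partition(h: int, n_partitions: int) -> int:
    # Treat h as an unsigned 64-bit hash: h / 2**64 is roughly uniform in [0, 1),
    # so floor(h * n / 2**64) is roughly uniform in [0, n) for any n > 0.
    return ((h & MASK64) * n_partitions) >> 64


for h in (0, 1 << 63, MASK64):
    print(hash_to_partition(h, 3))  # 0, then 1, then 2
```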

✨ Enhancements

  • updated BytecodeParser for Python 3.12 (#12348)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • change concurrency parameters (#12321)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • auto infer ambiguous for truncate and round (#12204)
  • allow construction of Datetime series from datetime.date array (#12175)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Update write_delta to write large arrow types without casting (#12260)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • start prefetching all files immediately (#12201)
  • expose more options to plugin registration (#12197)
  • Add .list.to_array expression (#12192)
  • consolidate & improve all casting failure error messages (#12168)
  • Add Binary dtype to hypothesis tests (#12140)
  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • Support decimals in assert utils (#12119)
  • add concurrency budget (#12117)
  • improved support for use of file-like objects with DataFrame "write" methods (#12113)
  • Introduce ignore_nulls for str.concat (#12108)
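
The new round_sig_figs expression (#11959) rounds to a number of significant figures rather than decimal places. The key step is locating the leading digit with `floor(log10(|x|))` and rounding relative to it. The sketch below is a plain-Python analogue, not the Polars kernel. It relies on Python's `round`, so tie-breaking may differ from Polars.

```python
import math


def round_sig_figs(x: float, digits: int) -> float:
    """Round x to `digits` significant figures; zero and non-finite values pass through."""
    if x == 0 or not math.isfinite(x):
        return x
    # Exponent of the leading digit: 1234.5 -> 3, 0.0123 -> -2.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


print(round_sig_figs(1234.5678, 3))  # 1230.0
print(round_sig_figs(0.012345, 2))   # 0.012
print(round_sig_figs(-98765, 2))     # -99000
```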

🐞 Bug fixes

  • Do not cast lit if it has the same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • uint64 should be correctly extracted from python object (#12338)
  • ignore IDE-mediated DeprecationWarning when debugging tests under 3.12 (#12343)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • scan of empty csv was missing row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet: add row_count to empty file materialization (#12310)
  • Fix invalid DeprecationWarning generated from date_range defined with 'saturating' interval (#12311)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • typo in exception message (#12278)
  • fix ambiguous aggregation type (#12269)
  • return frames from read_excel in the originally specified order (#12243)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • fix plugins system on Windows (#12230)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterating (#12189)
  • Fix interchange protocol boolean buffer size (#12177)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when reading from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
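
The zip_with fix above (#12309) extends broadcasting to the mask. The mask, like the value inputs, may be a single element that is repeated to the full height. The list-based sketch below illustrates that rule. The `zip_with` helper and its argument order are hypothetical. In Polars the method form is `s.zip_with(mask, other)`, taking values from `s` where the mask is true. Null mask entries are not modeled here.

```python
def zip_with(mask, left, right):
    """Pick left[i] where mask[i] is True, else right[i]; length-1 inputs broadcast."""
    n = max(len(mask), len(left), len(right))
    for seq in (mask, left, right):
        if len(seq) not in (1, n):
            raise ValueError("inputs must have the same length or length 1")

    def at(seq, i):
        # Broadcast: a length-1 input repeats its single value to the full height.
        return seq[0] if len(seq) == 1 else seq[i]

    return [at(left, i) if at(mask, i) else at(right, i) for i in range(n)]


print(zip_with([True], [1, 2, 3], [7, 8, 9]))  # [1, 2, 3]
```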

🛠️ Other improvements

  • updated BytecodeParser for Python 3.12 (#12348)
  • Workaround for maturin issue (#12370)
  • Fix incorrect boundary column name in group_by_dynamic docstrings (#12366)
  • Fix typo in rolling_* docstrings (#12362)
  • Fix ruff linting invocation (#12350)
  • Clean up conversion utils (#11789)
  • Organize Cargo.toml (#12323)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • Minor updates to prepare for Python 3.12 support (#12314)
  • Move script for testing map warning (#12306)
  • simplify expr checking in predicate push down (#12287)
  • Remove external link (#12223)
  • Fix rebase issue breaking CI (#12296)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Remove recommended setting from IDE docs (#12275)
  • Clean up Python test workflow (#12261)
  • clarify contains selector (#12265)
  • Add py-polars to Cargo workspace (#12256)
  • Use .with_columns in some docstrings (#12250)
  • Add test for scan_csv plus slice (#12239)
  • Fix emphasis formatting in docstring (#12240)
  • Fix emphasis formatting in docstring (#12237)
  • add deprecation notices to the docs for expressions moved into the new name namespace (#12236)
  • update Cargo.lock (#12226)
  • make sort test work with unstable sort (#12221)
  • Build Python wheels on manylinux_2_28 (#12211)
  • Include rust-toolchain.toml with sdist/wheels (#12184)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Improve strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
  • Fix tests for pyarrow 14 (#12170)
  • Fix rendering of note in DataFrame.fold (#12164)
  • Fix triggers for docs deployment (#12159)
  • Refactor some tests (#12121)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Render docstring text in single backticks as code (#12096)
  • use more ergonomic syntax in select/with_columns where possible (#12101)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor tweak in code example in section Expressions/Aggregation (#12033)
  • Minor tweak in code example in section Expressions/Missing data (#12080)
  • Minor improvements to the docs website (#12084)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @alexander-beedie, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jrycw, @mcrumiller, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego and @wsyxbcl

polars - Python Polars 0.19.13-rc.1

Published by github-actions[bot] 12 months ago

⚠️ Deprecations

  • Deprecate DataFrame.as_dict positional input (#12131)

🚀 Performance improvements

  • Reduce compute in error message for failed datetime parsing (#12147)

✨ Enhancements

  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • Support decimals in assert utils (#12119)
  • add concurrency budget (#12117)
  • improved support for use of file-like objects with DataFrame "write" methods (#12113)
  • Introduce ignore_nulls for str.concat (#12108)
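
The ignore_nulls option for str.concat (#12108) selects between two behaviors. Either nulls are skipped, or a single null makes the whole concatenated result null. The Python sketch below models null as `None`. `str_concat` is a hypothetical helper, and the argument names are illustrative rather than copied from the Polars signature.

```python
from typing import Optional, Sequence


def str_concat(values: Sequence[Optional[str]], delimiter: str,
               ignore_nulls: bool = True) -> Optional[str]:
    """Join all strings into one; None represents a null entry."""
    if not ignore_nulls and any(v is None for v in values):
        return None  # null propagation: one null nulls the result
    return delimiter.join(v for v in values if v is not None)


print(str_concat(["a", None, "b"], "-"))                      # a-b
print(str_concat(["a", None, "b"], "-", ignore_nulls=False))  # None
```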

🐞 Bug fixes

  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when read from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)

🛠️ Other improvements

  • pin ring (#12176)
  • Improve strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
  • Fix tests for pyarrow 14 (#12170)
  • Fix rendering of note in DataFrame.fold (#12164)
  • Fix triggers for docs deployment (#12159)
  • Refactor some tests (#12121)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Render docstring text in single backticks as code (#12096)
  • use more ergonomic syntax in select/with_columns where possible (#12101)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor tweak in code example in section Expressions/Aggregation (#12033)
  • Minor tweak in code example in section Expressions/Missing data (#12080)
  • Minor improvements to the docs website (#12084)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Priyansh121096, @alexander-beedie, @dependabot, @dependabot[bot], @jrycw, @moritzwilksch, @nameexhaustion, @reswqa, @ritchie46, @stefmolin and @stinodego

polars - Python Polars 0.17.5

Published by stinodego 12 months ago

🚀 Performance improvements

  • use online variance kernel for aggregation (#8306)
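
An "online" variance kernel computes the result in a single pass with a numerically stable update, rather than summing raw squares. The sketch below uses Welford's algorithm, a standard example of such a kernel. Whether #8306 uses exactly this formulation is an assumption.

```python
import math


def online_variance(values, ddof: int = 1) -> float:
    """Single-pass (Welford) variance; returns nan if there are <= ddof values."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if count <= ddof:
        return math.nan
    return m2 / (count - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(online_variance(data))  # approximately 4.5714 (= 32 / 7)
```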

Thank you to all our contributors for making this release possible!
@ritchie46

polars - Python Polars 0.19.12

Published by github-actions[bot] 12 months ago

⚠️ Deprecations

  • Deprecate nans_compare_equal parameter in assert utils (#12019)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
  • Deprecate shift_and_fill in favor of shift (#11955)
  • Deprecate clip_min/clip_max in favor of clip (#11961)
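
With shift_and_fill deprecated in favor of shift (#11955), the fill value is passed to shift itself. The list-based sketch below mirrors that behavior. It is not Polars code. The `fill_value` keyword is our assumption of the parameter name and should be checked against the Polars API reference.

```python
from typing import Any, List, Optional


def shift(values: List[Any], n: int = 1, fill_value: Optional[Any] = None) -> List[Any]:
    """Shift by n places (negative n shifts toward the start), filling vacated slots."""
    length = len(values)
    k = min(abs(n), length)
    filler = [fill_value] * k
    if n >= 0:
        return filler + values[: length - k]
    return values[k:] + filler


print(shift([1, 2, 3], 1, fill_value=0))  # [0, 1, 2]
print(shift([1, 2, 3], -1))               # [2, 3, None]
```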

🚀 Performance improvements

  • improve parquet downloading (#12061)
  • fix regression in non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)

✨ Enhancements

  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)
  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)
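
#11712 optimizes asof joins and allows string keys. With the default "backward" strategy, each left row matches the last right row whose key is less than or equal to its own. Any ordered key type works, strings included. The bisect-based sketch below assumes a single right side that is already sorted. It is a hypothetical helper, not Polars' implementation, and null keys are not modeled.

```python
import bisect


def join_asof_backward(left_keys, right_keys, right_values):
    """For each left key, return the right value with the largest key <= it, else None.

    right_keys must be sorted ascending.
    """
    out = []
    for key in left_keys:
        # Index of the last right key that is <= key (-1 if there is none).
        i = bisect.bisect_right(right_keys, key) - 1
        out.append(right_values[i] if i >= 0 else None)
    return out


print(join_asof_backward([1, 5, 10], [2, 4, 8], ["a", "b", "c"]))  # [None, 'b', 'c']
print(join_asof_backward(["b", "d"], ["a", "c"], [1, 2]))          # [1, 2]
```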

🐞 Bug fixes

  • Fix get_index/iteration for Array types (#12047)
  • improved xlsx2csv defaults for read_excel (#12081)
  • str.concat on empty list (#12066)
  • fix issue with invalid Mapping objects used as schema being silently ignored (#12027)
  • improve ingest from numpy scalar values (#12025)
  • binary agg should group aware if literal not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Allow inexact checking of nested integers (#12037)
  • Fix padding for non-ASCII strings (#12008)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)
  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • fix panic in format of anonymous scans (#11951)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)

🛠️ Other improvements

  • minor updates to lint-related dependencies (#12073)
  • Add Excel page to user guide (#12055)
  • Direct CONTRIBUTING to the docs website (#12042)
  • Replace black by ruff format (#11996)
  • Further assert utils refactor (#12015)
  • Remove stacklevels checker utility script (#11962)
  • Disable type checking for dataframe_api_compat dependency (#11997)
  • Fix release tag (#11994)
  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • include the "build" dir when running make clean for docs (#11970)
  • make cloning PyExpr consistent (#11956)
  • fix take return dtype in group context. (#11949)
  • warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
  • Add set_fmt_table_cell_list_len to API docs (#11942)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Rohxn16, @alexander-beedie, @braaannigan, @brayanjuls, @messense, @nameexhaustion, @orlp, @reswqa, @ritchie46, @squnit, @stinodego and @universalmind303

polars - Rust Polars 0.34.0

Published by github-actions[bot] 12 months ago

🏆 Highlights

  • postfix rolling expression as a special case of window functions. (#11445)
  • support 'hive partitioning' aware readers (#11284)
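
Hive-partitioning-aware readers (#11284) derive column values from directory names of the form key=value, for example .../year=2023/month=11/part-0.parquet. The sketch below parses such paths with the standard library. `parse_hive_partitions` is hypothetical. Values are kept as strings, whereas a real reader would infer dtypes. Rejecting a key that repeats within one path is an illustrative assumption.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict:
    """Extract key=value directory segments from a hive-style file path."""
    parts = {}
    for segment in PurePosixPath(path).parent.parts:
        if "=" not in segment:
            continue  # ordinary directory, not a partition
        key, _, value = segment.partition("=")
        if key in parts:
            # A key repeated within one path would be ambiguous.
            raise ValueError(f"duplicate hive partition key: {key!r}")
        parts[key] = value
    return parts


print(parse_hive_partitions("data/year=2023/month=11/part-0.parquet"))
# {'year': '2023', 'month': '11'}
```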

💥 Breaking changes

  • Rename .list.lengths and .str.lengths (#11613)
  • Rename write_csv parameter quote to quote_char (#11583)
  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • fix regression in non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)
  • support multiple files in a single scan parquet node. (#11922)
  • fix accidental quadratic behavior; cache null_count (#11889)
  • fix quadratic behavior in append sorted check (#11893)
  • properly push down slice before left/asof join (#11854)
  • Improve performance of cot (cotangent) (#11717)
  • rechunk before grouping on multiple keys (#11711)
  • process parquet statistics before downloading row-group (#11709)
  • push down predicates that refer to group_by keys (#11687)
  • slightly faster float equality (#11652)
  • actually use projection information in async parquet reader (#11637)
  • improve performance and fix panic in async parquet reader (#11607)
  • use try_binary_elementwise over try_binary_elementwise_values (#11596)
  • skip empty chunks in concat (#11565)
  • improve sparse sample performance (#11544)
  • early return in replace_time_zone if target and source time zones match (#11478)
  • greatly improve parquet cloud reading (#11479)
  • ensure we download row-groups concurrently. (#11464)
  • don't load N metadata files when globbing N files (#11422)
  • remove double memcopy (#11365)
  • address perf regression (#11354)
  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)
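
Caching regex compilation (#11183) avoids recompiling the same pattern on every str.contains call. The sketch below shows the idea with `functools.lru_cache`. The helper names are hypothetical. Python's `re` module already keeps a small internal cache, so the explicit cache here serves to make the mechanism visible.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=128)
def compiled(pattern: str) -> "re.Pattern[str]":
    # Each distinct pattern is compiled once; repeated calls are cache hits.
    return re.compile(pattern)


def contains(values, pattern):
    """Regex search per element; None (null) stays None."""
    regex = compiled(pattern)
    return [None if v is None else regex.search(v) is not None for v in values]


print(contains(["abc", "xyz", None], "b."))  # [True, False, None]
```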

✨ Enhancements

  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)
  • improve error handling in scan_parquet and deal with file limits (#11938)
  • support multiple files in a single scan parquet node. (#11922)
  • error instead of panic in unsupported sinks (#11915)
  • Introduce list.sample (#11845)
  • don't require empty config for cloud scan_parquet (#11819)
  • Expressify pct_change and move to ops (#11786)
  • add DATE function for SQL (#11541)
  • right-align numeric columns (#7475)
  • Add config setting to control how many List items are printed (#11409)
  • allow specifying schema in pl.scan_ndjson (#10963)
  • easier arrow2/arrow-rs conversion (#11666)
  • support multiple sources in scan_file (#11661)
  • allow coalesce in streaming (#11633)
  • Implement schema, schema_override for pl.read_json with array-like input (#11492)
  • add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
  • improve performance and fix panic in async parquet reader (#11607)
  • add time_unit argument to duration, default to "us" (#11586)
  • elide overflow checks on i64 (#11563)
  • add INITCAP string function for SQL (#9884)
  • Use IPC for (un)pickling dataframes/series (#11507)
  • support left and right anti/semi joins from the SQL interface (#11501)
  • expressify peak_min/peak_max (#11482)
  • IN(subquery) and SQL Subquery Infrastructure (#11218)
  • Format null arrays in Series (#11289)
  • postfix rolling expression as a special case of window functions. (#11445)
  • allow for "by" column to be of dtype Date in rolling_* functions (#11004)
  • support 'abfss' for azure (#11413)
  • multi-threaded async runtime (#11411)
  • async parquet. (#11403)
  • fail fast when invalid cloud settings; introduce retries arg (#11380)
  • modernize CPU features (#11351)
  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • add gather_skip_nulls implementation (#11329)
  • top_k and bottom_k support passing an expr (#11344)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr for the by argument (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • improve binary helper so we don't need to rechunk. (#11247)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • Support duration + date (#11190)
  • binary search and rechunk in chunked gather (#11199)
  • Expressify str.strip_prefix & suffix (#11197)
  • sql udfs (#10957)
  • run cloud parquet reader in default engine (#11196)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)
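
Several entries in this release limit or tune download concurrency for async parquet reads (for example #11971). A common pattern is a semaphore that caps the number of requests in flight. The asyncio sketch below uses a sleep in place of real network I/O. `download_all` and the file names are hypothetical, and Polars does this in Rust on its own async runtime.

```python
import asyncio


async def download_all(paths, max_concurrent: int = 4):
    """Fetch all paths with at most max_concurrent fetches in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fetch(path):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real network request
            in_flight -= 1
            return f"bytes of {path}"

    # gather preserves input order in its results.
    results = await asyncio.gather(*(fetch(p) for p in paths))
    return results, peak


results, peak = asyncio.run(
    download_all([f"part-{i}.parquet" for i in range(10)], max_concurrent=3)
)
print(len(results), peak)  # 10 3
```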

🐞 Bug fixes

  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)
  • avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
  • read_csv for empty lines (#11924)
  • predicate push-down remove predicate refers to alias for more branch (#11887)
  • use physical append (#11894)
  • recursively apply cast_unchecked in lists (#11884)
  • recursively check allowed streaming dtypes (#11879)
  • fix project pushdown for double projection contains count (#11843)
  • series.to_numpy fails with dtype=Null (#11858)
  • panic on hive scan from cloud (#11847)
  • Propagate validity when cast primitive to list (#11846)
  • Edge cases for list count formatting (#11780)
  • remove flag inconsistency 'map_many' (#11817)
  • ensure projections containing only hive columns are projected (#11803)
  • patch broken aHash AES intrinsics on ARM (#11801)
  • fix key in object-store cache (#11790)
  • handle logical types in plugins (#11788)
  • make PyLazyGroupby reusable (#11769)
  • only exclude final output names of group_by key expressions (#11768)
  • fix ambiguity wrt list aggregation states (#11758)
  • Correctly process subseconds in pl.duration (#11748)
  • LazyFrame.drop_columns overflow issue when columns.len()>schema.len() (#11716)
  • index_to_chunked_index's fast path is not correct (#11710)
  • use actual number of read rows for hive materialization (#11690)
  • return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
  • fix seg fault in concat_str of empty series (#11704)
  • Fix match on last item for join_asof with strategy="nearest" (#11673)
  • fix display str for peak_max and top_k (#11657)
  • Fix input replacement logic for slice (#11631)
  • slice expr can be taken in cse (#11628)
  • ensure nested logical types are converted to physical (#11621)
  • correctly convert nullability of nested parquet fields to arrow (#11619)
  • improve performance and fix panic in async parquet reader (#11607)
  • expand all literals before group_by (#11590)
  • mark take_group_last function as unsafe (#11587)
  • handle unary operators applied to numbers used in SQL IN clauses (#11574)
  • Align new_columns argument for scan_csv and read_csv (#11575)
  • don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (#11576)
  • incomplete reading of list types from parquet (#11578)
  • respect identity in horizontal sum (#11559)
  • bug in BitMask::get_u32 (#11560)
  • take slice into account in parallel unions (#11558)
  • correct schema empty df in hive partitioning read (#11557)
  • ensure ListChunked::full_null uses physical types (#11554)
  • respect 'hive_partitioning' argument in parquet (#11551)
  • fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
  • streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
  • catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
  • rework SQL join constraint processing to properly account for all USING columns (#11518)
  • literal hash (#11508)
  • Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
  • correct output schema of hive partition and projection at scan (#11499)
  • correct projection pushdown in hive partitioned read (#11486)
  • fix for write_csv when using non-default "quote" char (#11474)
  • fix deserialization of parquets with large string list columns causing stack overflow (#11471)
  • Fix SQL ANY and ALL behaviour (#10879)
  • address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)
  • raise on invalid sort_by group lengths (#11423)
  • fix outer join on bools (#11417)
  • fix categorical collect (#11414)
  • Free bitmap when slicing into a non-null array (#11405)
  • async parquet. (#11403)
  • Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
  • Fix empty check when building a list (#11378)
  • more cloud urls (#11361)
  • ensure cloud globbing can deal with spaces (#11360)
  • recognize more cloud urls (#11357)
  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • add missing feature flags on tests (#11305)
  • set partitions independent of thread pool (#11304)
  • parse sign for decimal properly (#11302)
  • consume duplicates in rolling_by window (#11261)
  • handle url encoded paths in objectpath creation (#11240)
  • use POOL when writing csv (#11222)
  • is_in for bool evaluated has_false incorrectly (#11217)
  • fix nullable filter mask in group_by (#11207)
  • replace n-th in filter (#11206)
  • fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
  • impl hash for more function expr (#11182)
  • list.join's separator can be expression (#11167)
  • Add some missing expr type hint for series (#11171)
  • Make pl.struct serializable (#11169)
  • Fix rust test for logical plan optimizer for categoricals (#11135)
  • propagate null value for str/binary starts/ends_with and contains (#11141)
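
#11624 makes interpolate with method="linear" return a float dtype for numeric inputs. Even integer inputs can produce fractional values: interpolating between 1 and 2 yields 1.5. The sketch below is a plain-Python version of linear gap filling. It is a hypothetical helper, not the Polars kernel. It assumes leading and trailing nulls stay null because they have no neighbor on one side.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps linearly; the output is always float (or None)."""
    out: List[Optional[float]] = [None if v is None else float(v) for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    # Interpolate between each pair of consecutive known positions.
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


print(interpolate_linear([1, None, 2, None, None, 5, None]))
# [1.0, 1.5, 2.0, 3.0, 4.0, 5.0, None]
```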

🛠️ Other improvements

  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • use ahash from crates.io release (#11964)
  • move unique_counts to ops (#11963)
  • fix take return dtype in group context. (#11949)
  • move moment to ops (#11941)
  • fix some typos and add polars-business to curated plugin list (#11916)
  • prepare for multiple files in a node (#11918)
  • load 40x40 avatar from github and add loading=lazy attribute. (#11886)
  • Fix Cargo warning for parquet2 dependency (#11882)
  • Allow manual trigger for docs deployment (#11881)
  • rename new_from_owned_with_null_bitmap (#11828)
  • add section about plugins (#11855)
  • fix incorrect example of valid time zones (#11873)
  • Bump docs dependencies (#11852)
  • add missing polars-ops tests to CI (#11859)
  • Update doc comments for with_column to reflect that columns can be updated (#11840)
  • Move round to ops (#11838)
  • arrow: remove unused arithmetic code and remove doctests (#11820)
  • Move diff to polars-ops (#11818)
  • remove redundant if branch in nested parquet (#11814)
  • Move ewma to polars-ops (#11794)
  • Make some functions in dsl::mod non-anonymous (#11799)
  • Move cum_agg to polars-ops (#11770)
  • more granular polars-ops imports (#11760)
  • Make all ewm function expr non-anonymous (#11638)
  • clarify polars-arrow <=> arrow2 license (#11755)
  • Version polars-arrow with the other crates (#11738)
  • fill missing fill_null strategies (#11751)
  • Minor fix in code example in section Coming from Pandas (#11745)
  • Update group_by_dynamic example (#11737)
  • merge nano-arrow/polars-arrow (#11719)
  • Improving the documentation of the SQL expressions (#11708)
  • *_horizontal dependent on reduce_expr to expression architecture (#11685)
  • update document of folds (#11705)
  • update rustc and fix future (#11696)
  • better align help command output following addition of some longer options (#11681)
  • sum_horizontal to expression architecture (#11659)
  • Cleanup the match block for date inference (#11677)
  • Adding feature annotation (#11671)
  • add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
  • improve rank implementation, especially around nulls (#11651)
  • Bring cloud monikers in line with the ones in is_cloud_url (#11629)
  • Rename .list.lengths and .str.lengths (#11613)
  • Make backwardfill and forwardfill function expr non-anonymous (#11630)
  • Make all expr in dt namespace non-anonymous (#11627)
  • Fix changelog for language-specific breaking changes (#11617)
  • avoid nightly rust for case conversion (#11610)
  • Make value_counts and unique_counts function expr non-anonymous (#11601)
  • Make arg_min/arg_max and diff in the list namespace non-anonymous (#11602)
  • Rename write_csv parameter quote to quote_char (#11583)
  • use a generic consistent total ordering, also for floats (#11468)
  • Move mode operation from core to ops crate (#11543)
  • fix lints (#11555)
  • use single threaded take under certain values size (#11539)
  • fix some features (#11529)
  • move (hor_)str_concat to polars-ops (#11488)
  • minor changes in peak-min/max (#11491)
  • align cloud url regex in rust and python (#11481)
  • move AnonymousScan into Scan node (#11502)
  • move repeat_by to polars-ops (#11461)
  • upgrade to nightly-10-02 (#11460)
  • Update contributing guide to include memory requirement (#11458)
  • remove unused order_by attribute (#11434)
  • cleanup sort_by expression impl (#11431)
  • large windows runner for release (#11370)
  • Fix error message reference to infer_schema_length (#11358)
  • move rank to polars-ops (#11349)
  • unify display for namespaced function expr (#11342)
  • Fix some cargo manifest warnings (#11327)
  • Use GITHUB_TOKEN to get contributor information for docs (#11321)
  • Add disable_string_cache (#11020)
  • remove default auto-explode for map_many_private (#11270)
  • Add API links for Rust user guide examples (#11294)
  • update a few dependencies (#11283)
  • move scan helpers to separate module (#11279)
  • update sponsors (#11271)
  • bump chrono to 0.4.31 (#11258)
  • bind all remaining methods in StringNameSpace to function expr (#11229)
  • Make some list function expr non-anonymous (#11230)
  • remove lz4_flex feature (#11253)
  • remove unnecessary transmute (#11250)
  • move (almost) all join-related code from polars-core to polars-ops (#11228)
  • Mention the performant feature only once (#11223)
  • remove unneeded indirection (#11233)
  • remove unneeded mutex around object-store (#11224)
  • bind struct.rename_fields to function expr (#11215)
  • fix uncompilable Rust example in user guide (#11214)
  • add various missing expression doc-comments (#11213)
  • Fix user_guide of str.split (#11185)
  • New take implementation (#11138)
  • Fix rust test for logical plan optimizer for categoricals (#11135)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @JulianCologne, @LaurynasMiksys, @MarcoGorelli, @Rohxn16, @SeanTroyUWO, @TheDataScientistNL, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @andysham, @billylanchantin, @bowlofeggs, @c-peters, @cmdlineluser, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jhorstmann, @jonashaag, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @ptiza, @rancomp, @reswqa, @ritchie46, @rjthoen, @romanovacca, @sd2k, @shenker, @squnit, @stinodego, @svaningelgem, @thomasjpfan, @uchiiii, @universalmind303 and Romano Vacca

polars - Python Polars 0.19.12-rc.1

Published by github-actions[bot] 12 months ago

⚠️ Deprecations

  • Deprecate shift_and_fill in favor of shift (#11955)
  • Deprecate clip_min/clip_max in favor of clip (#11961)

🚀 Performance improvements

  • fix regression in non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)

✨ Enhancements

  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)

🐞 Bug fixes

  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • fix panic in format of anonymous scans (#11951)
  • SQL IN should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)

🛠️ Other improvements

  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • include the "build" dir when running make clean for docs (#11970)
  • make cloning PyExpr consistent (#11956)
  • fix take return dtype in group context. (#11949)
  • warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
  • Add set_fmt_table_cell_list_len to API docs (#11942)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Rohxn16, @alexander-beedie, @messense, @orlp, @reswqa, @ritchie46, @squnit and @stinodego

polars - Python Polars 0.19.11

Published by github-actions[bot] 12 months ago

⚠️ Deprecations

  • Rename shift parameter from periods to n (#11923)
  • Fix Array data type initialization (#11907)

🚀 Performance improvements

  • support multiple files in a single scan parquet node. (#11922)

✨ Enhancements

  • improve error handling in scan_parquet and deal with file limits (#11938)
  • support multiple files in a single scan parquet node. (#11922)
  • error instead of panic in unsupported sinks (#11915)
  • upcast int->float and date->datetime for certain Series comparisons (#11779)

🐞 Bug fixes

  • avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
  • fix read_csv handling of empty lines (#11924)
  • raise suitable error on invalid predicates passed to filter method (#11928)
  • Fix Array data type initialization (#11907)
  • set null_count on categorical append (#11914)
  • predicate pushdown: remove predicates that refer to an alias for more branches (#11887)
  • address DataFrame construction error with lists of numpy arrays (#11905)
  • address issue with inadvertently shared options dict in read_excel (#11908)
  • raise a suitable error from read_excel and/or read_ods when target sheet does not exist (#11906)

🛠️ Other improvements

  • Fix typo in read_excel docstring (#11934)
  • Fix docstring for diff methods (#11921)
  • fix some typos and add polars-business to curated plugin list (#11916)
  • add missing 'diagonal_relaxed' to pl.concat "how" param docstring signature (#11909)

Thank you to all our contributors for making this release possible!
@LaurynasMiksys, @alexander-beedie, @mcrumiller, @reswqa, @ritchie46, @romanovacca, @shenker, @stinodego and @uchiiii

polars - Python Polars 0.19.10

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

  • Deprecate DataType.is_nested (#11844)

🚀 Performance improvements

  • fix accidental quadratic behavior; cache null_count (#11889)
  • fix quadratic behavior in append sorted check (#11893)
  • optimise read_database Databricks queries made using SQLAlchemy connections (#11885)
  • properly push down slice before left/asof join (#11854)

✨ Enhancements

  • Introduce list.sample (#11845)
  • don't require empty config for cloud scan_parquet (#11819)

🐞 Bug fixes

  • use physical append (#11894)
  • Add include_nulls parameter to update (#11830)
  • recursively apply cast_unchecked in lists (#11884)
  • recursively check allowed streaming dtypes (#11879)
  • fix frame slicing of a single column (#11825)
  • fix projection pushdown for a double projection that contains count (#11843)
  • Propagate validity when casting a primitive to list (#11846)
  • Edge cases for list count formatting (#11780)

🛠️ Other improvements

  • Further assert utils refactor (#11888)
  • load 40x40 avatar from GitHub and add loading=lazy attribute (#11886)
  • Fix Cargo warning for parquet2 dependency (#11882)
  • Allow manual trigger for docs deployment (#11881)
  • add section about plugins (#11855)
  • fix incorrect example of valid time zones (#11873)
  • fix typo in code example in section Expressions - Basic operators (#11848)
  • Bump docs dependencies (#11852)
  • add missing polars-ops tests to CI (#11859)
  • Assert utils refactor (#11813)

Thank you to all our contributors for making this release possible!
@Walnut356, @alexander-beedie, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jrycw, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @rjthoen, @romanovacca and @stinodego