polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

OTHER License

Downloads
9.7M
Stars
26.3K
Committers
213


polars - Python Polars 0.19.9

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

  • Deprecate non-keyword args for ewm methods (#11804)
  • Deprecate use_pyarrow param for Series.to_list (#11784)
  • Rename group_by_rolling to rolling (#11761)

🚀 Performance improvements

  • Improve DataFrame.get_column performance by ~35% (#11783)
  • rechunk before grouping on multiple keys (#11711)
  • process parquet statistics before downloading row-group (#11709)
  • push down predicates that refer to group_by keys (#11687)
  • slightly faster float equality (#11652)

✨ Enhancements

  • Expressify pct_change and move to ops (#11786)
  • primitive kwargs in plugins (#11268)
  • add DATE function for SQL (#11541)
  • extend filter capabilities with new support for *args predicates, **kwargs constraints, and chained boolean masks (#11740)
  • Add config setting to control how many List items are printed (#11409)
  • Use OrderedDict for schemas (#11742)
  • allow specifying schema in pl.scan_ndjson (#10963)
  • add support for "outer" mode to frame update method (#11688)
  • transparently support "qmark" parameterisation of SQLAlchemy queries in read_database (#11700)
  • support multiple sources in scan_file (#11661)
  • support batched frame iteration over read_database queries (#11664)
  • column selector support for DataFrame.melt and LazyFrame.unnest (#11662)

🐞 Bug fixes

  • ensure projections containing only hive columns are projected (#11803)
  • patch broken aHash AES intrinsics on ARM (#11801)
  • fix key in object-store cache (#11790)
  • handle logical types in plugins (#11788)
  • Fix values printed by assert_*_equal AssertionError when exact=False (#11781)
  • make PyLazyGroupby reusable (#11769)
  • only exclude final output names of group_by key expressions (#11768)
  • Fix subsecond parsing in timedelta conversions (#11759)
  • fix ambiguity wrt list aggregation states (#11758)
  • Correctly process subseconds in pl.duration (#11748)
  • use actual number of read rows for hive materialization (#11690)
  • return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
  • fix seg fault in concat_str of empty series (#11704)
  • fix sort_by regression (#11679)
  • Fix match on last item for join_asof with strategy="nearest" (#11673)

🛠️ Other improvements

  • Bump lint dependencies (#11802)
  • Minor updates to assertion utils and docstrings (#11798)
  • Remove unused _to_rust_syntax util (#11795)
  • Minor tweak in code example in section Coming from Pandas (#11764)
  • Fix Exception module paths (#11785)
  • Rename IntegralType to IntegerType (#11773)
  • more granular polars-ops imports (#11760)
  • Link to expand_selector in user guide (#11722)
  • Add parametric test for df.to_dict/series.to_list (#11757)
  • Minor fix in code example in section Coming from Pandas (#11745)
  • Move tests for group_by_dynamic into one module (#11741)
  • Update group_by_dynamic example (#11737)
  • reorder pl.duration arguments (#11641)
  • remove default features from some crates (#11680)
  • *_horizontal dependent on reduce_expr to expression architecture (#11685)
  • clarify that median is equivalent to the 50% percentile shown in describe metrics (#11694)
  • update rustc and fix future (#11696)
  • Publish release after uploading assets (#11686)
  • upgrade pyo3 to 0.20.0 (#11683)
  • better align help command output following addition of some longer options (#11681)
  • sum_horizontal to expression architecture (#11659)
  • add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
  • improve rank implementation, especially around nulls (#11651)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @cmdlineluser, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @rancomp, @reswqa, @ritchie46, @romanovacca, @sd2k, @stinodego, @svaningelgem and @thomasjpfan

polars - Python Polars 0.19.8

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • Enable additional flags for x86-64 wheels (#11487)

⚠️ Deprecations

  • Rename .list.lengths and .str.lengths (#11613)
  • Deprecate default value for radix in parse_int (#11615)
  • Rename write_csv parameter quote to quote_char (#11583)

🚀 Performance improvements

  • actually use projection information in async parquet reader (#11637)
  • improve performance and fix panic in async parquet reader (#11607)
  • use try_binary_elementwise over try_binary_elementwise_values (#11596)
  • skip empty chunks in concat (#11565)
  • improve sparse sample performance (#11544)

✨ Enhancements

  • Standardize error message format (#11598)
  • allow coalesce in streaming (#11633)
  • Implement schema, schema_override for pl.read_json with array-like input (#11492)
  • add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
  • improve performance and fix panic in async parquet reader (#11607)
  • add time_unit argument to duration, default to "us" (#11586)
  • support read_database options passthrough to the underlying connection's execute method (enables parameterised SQL queries, etc) (#11562)
  • elide overflow checks on i64 (#11563)
  • add INITCAP string function for SQL (#9884)

🐞 Bug fixes

  • Fix input replacement logic for slice (#11631)
  • slice expr can be taken in cse (#11628)
  • ensure nested logical types are converted to physical (#11621)
  • correctly convert nullability of nested parquet fields to arrow (#11619)
  • improve performance and fix panic in async parquet reader (#11607)
  • normalize filepath in sink_parquet (#11605)
  • parse time unit properly in pl.lit (#11573)
  • expand all literals before group_by (#11590)
  • fix as_dict with include_key=False for partition_by (#9865)
  • mark take_group_last function as unsafe (#11587)
  • handle unary operators applied to numbers used in SQL IN clauses (#11574)
  • Align new_columns argument for scan_csv and read_csv (#11575)
  • Add initialization support for python Timedeltas (#11566)
  • incomplete reading of list types from parquet (#11578)
  • respect identity in horizontal sum (#11559)
  • bug in BitMask::get_u32 (#11560)
  • take slice into account in parallel unions (#11558)
  • correct schema empty df in hive partitioning read (#11557)
  • ensure ListChunked::full_null uses physical types (#11554)
  • respect 'hive_partitioning' argument in parquet (#11551)
  • fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
  • streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
  • fix comparing tz-aware series with stdlib datetime (#11480)
  • catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
  • rework SQL join constraint processing to properly account for all USING columns (#11518)

🛠️ Other improvements

  • Improved user guide for cloud functionality (#11646)
  • Improve some docstrings (#11644)
  • Disable clippy lint "too many arguments" for py-polars (#11616)
  • Make backwardfill and forwardfill function expr non-anonymous (#11630)
  • Make all expr in dt namespace non-anonymous (#11627)
  • Fix changelog for language-specific breaking changes (#11617)
  • Make value_counts and unique_counts function expr non-anonymous (#11601)
  • Make arg_min(max), diff in list namespace non-anonymous (#11602)
  • Rename write_csv parameter quote to quote_char (#11583)
  • improve struct documentation (#11585)
  • Remove **kwargs from LazyFrame.collect() (#11567)
  • use a generic consistent total ordering, also for floats (#11468)
  • fix lints (#11555)
  • Remove toolchain specification workaround (#11552)
  • Trigger Python release from Actions workflow dispatch (#11538)
  • Enable additional flags for x86-64 wheels (#11487)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @TheDataScientistNL, @alexander-beedie, @andysham, @c-peters, @jhorstmann, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @romanovacca, @stinodego and @svaningelgem

polars - Python Polars 0.19.7

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • Postfix rolling expression as a special case of window functions. (#11445)
  • Use IPC for (un)pickling dataframes/series (#11507)

🚀 Performance improvements

  • early return in replace_time_zone if target and source time zones match (#11478)
  • greatly improve parquet cloud reading (#11479)
  • ensure we download row-groups concurrently. (#11464)

✨ Enhancements

  • support left and right anti/semi joins from the SQL interface (#11501)
  • Add left_on and right_on parameters to df.update (#11277)
  • expressify peak_min/peak_max (#11482)
  • IN(subquery) and SQL Subquery Infrastructure (#11218)
  • add ODBC connection string support to read_database (#11448)
  • postfix rolling expression as a special case of window functions. (#11445)
  • allow for "by" column to be of dtype Date in rolling_* functions (#11004)
  • rework ColumnFactory to additionally support tab-complete for col in IPython (#11435)

🐞 Bug fixes

  • literal hash (#11508)
  • Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
  • correct output schema of hive partition and projection at scan (#11499)
  • correct projection pushdown in hive partitioned read (#11486)
  • fix for write_csv when using non-default "quote" char (#11474)
  • fix deserialization of parquets with large string list columns causing stack overflow (#11471)
  • enable read_database fallback for Snowflake warehouses/connections that don't support Arrow resultsets (#11447)
  • Fix SQL ANY and ALL behaviour (#10879)
  • partially address some PyCharm tooltip/signature issues with decorated methods (#11428)
  • address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)

🛠️ Other improvements

  • minor changes in peak-min/max (#11491)
  • align cloud url regex in rust and python (#11481)
  • Test sdist before releasing (#11494)
  • Unpin maturin version, fix release workflow (#11483)
  • More release workflow refactor (#11472)
  • Set some env vars for release (#11463)
  • move repeat_by to polars-ops (#11461)
  • upgrade to nightly-10-02 (#11460)
  • Update contributing guide to include memory requirement (#11458)
  • add missing docs entry for rolling (#11456)
  • use with_columns in shift examples (#11453)
  • Add wheels as assets to GitHub release (#11452)
  • Build more wheels for polars-lts-cpu/polars-u64-idx (#11430)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @ritchie46, @romanovacca, @stinodego, @svaningelgem and Romano Vacca

polars - Python Polars 0.19.6

Published by github-actions[bot] about 1 year ago

🚀 Performance improvements

  • don't load N metadata files when globbing N files (#11422)

🐞 Bug fixes

  • raise on invalid sort_by group lengths (#11423)
  • fix outer join on bools (#11417)
  • fix categorical collect (#11414)
  • fix opaque python reader schema (#11412)
  • async parquet. (#11403)
  • Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
  • handle ambiguous datetimes in pl.lit (#11386)
  • fix panic in hive read of booleans (#11376)

🛠️ Other improvements

  • Split Python release into build / release jobs (#11421)
  • Refactor Python release workflow (#11382)
  • clarify use of "batch_size" for read_database (#11377)
  • large windows runner for release (#11370)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @bowlofeggs, @c-peters, @jonashaag, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.5

Published by github-actions[bot] about 1 year ago

🚀 Performance improvements

  • remove double memcopy (#11365)
  • address perf regression (#11354)

🐞 Bug fixes

  • revert invalid runtime check (#11363)
  • more cloud urls (#11361)
  • ensure cloud globbing can deal with spaces (#11360)
  • recognize more cloud urls (#11357)

🛠️ Other improvements

  • Disable version warning banner for now (#11359)
  • Fix error message reference to infer_schema_length (#11358)
  • Mark some tests as slow (#11350)
  • improve parametric tests for group_by_rolling by skipping overflowing cases (#11286)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @jonashaag, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.4

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • support 'hive partitioning' aware readers (#11284)
  • natively support reading parquet for aws, gcp and azure (#11210)
  • Add support for Iceberg (#10375)
  • The great expressification by @reswqa (#11320, #11344, #11313, #11257, #11288, #11275, #11197, #11167, #11155)

⚠️ Deprecations

  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • faster init from pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)

✨ Enhancements

  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • top_k and bottom_k support passing an expr (#11344)
  • add "pyxlsb" engine support to read_excel (for excel binary workbook files) (#11248)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr argument by (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • natively support reading parquet for aws, gcp and azure (#11210)
  • Expressify str.strip_prefix & suffix (#11197)
  • Add support for Iceberg (#10375)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • clarify has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • DataFrame init from collections.namedtuple values (#11314)
  • Exclude functools wrapper frames in find_stacklevel (#11292)
  • set partitions independent of thread pool (#11304)
  • address VSCode issue with autocomplete on selector expressions in editor/console (#11235)
  • consume duplicates in rolling_by window (#11261)
  • handle url encoded paths in objectpath creation (#11240)
  • use POOL when writing csv (#11222)
  • don't conflate saved Config JSON string with file path (#11098)
  • is_in for bool evaluate has_false incorrectly (#11217)
  • improve handling of database drivers that can return arrow data (#11201)
  • fix nullable filter mask in group_by (#11207)
  • replace n-th in filter (#11206)
  • fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
  • address unexpected expression name from use of unary - or + operators (#11158)
  • impl hash for more function expr (#11182)
  • list.join's separator can be expression (#11167)
  • Add some missing expr type hint for series (#11171)
  • consistently use negative every as the default for offset in group_by_dynamic (#11164)
  • Make pl.struct serializable (#11169)
  • only raise on actual parameter collision when "dtypes" specified in read_excel "read_csv_options" (#11162)
  • propagate null value for str/binary starts/ends_with and contains (#11141)

🛠️ Other improvements

  • simplify/clarify group_by_dynamic examples (#11335)
  • tighten assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
  • unify display for namespaced function expr (#11342)
  • add lazy pivot example (#11325)
  • Use GITHUB_TOKEN to get contributor information for docs (#11321)
  • Enable version warning banner (#11322)
  • cross-reference null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
  • Pin pydantic in dev requirements <2.4.0 (#11312)
  • remove default auto-explode for map_many_private (#11270)
  • Add type alias IntoExprColumn (#11296)
  • update a few dependencies (#11283)
  • Properly skip ADBC test (#11282)
  • Fix some minor Makefile issues (#11276)
  • update sponsors (#11271)
  • parametric tests for group_by_rolling (#11262)
  • Make some list function expr non-anonymous (#11230)
  • Mention the performant feature only once (#11223)
  • remove unneeded indirection (#11233)
  • remove unneeded mutex around object-store (#11224)
  • clarify every/period/offset in group_by_dynamic (#11175)
  • Fix read_database batch_size docstring (#11132)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303

polars - Rust Polars 0.33

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

  • empty product returns identity (#10842)
  • return f64 for rank when method="average" (#10734)
  • Rename groupby to group_by (#10654)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)

🚀 Performance improvements

  • improve performance of fast projection (#10945)
  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • json_extract supports extract static and string value to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • add support for nextest (#11048)
  • Add literal for str count_match (#10996)
  • More dtypes supports cast to list (#11025)
  • ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • Add datetime_range expression function (#10213)
  • add proper cache for Regex compilation (#10934)
  • implementation of array_to_string (#10839)
  • apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
  • accept expr in str.count_match (#10900)
  • accept expressions in .offset_by (#9967)
  • implement drop as special case of select (#10885)
  • Supports is_last operation (#10760)
  • activate cse for group_by (again) (#10749)
  • add pairwise float sum implementation (#10756)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Rename groupby to group_by (#10654)
  • preserve whitespace in notebook output (#10644)
  • Read/write support for IPC streams in DataFrames (#10606)
  • improve binary (arity) generics (#10622)
  • propagate null is in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Add failed column to cast exception (#10507)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • fix the construction of List<Null> (#10969)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)
  • Allow exactly one value in start/end for int_range (#10914)
  • count was falsely tagged as cse in group by (#10917)
  • Retain original dtype when deserializing an empty list (#10893)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • fix conversion from decimal to float (#10776)
  • Add broadcasting for list comparisons (#10857)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • tag amortized iter unsafe and add safe alternatives (#10881)
  • use pool in dataframe arithmetic (#10864)
  • remove debug println! from datetime fn (#10862)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
  • empty product returns identity (#10842)
  • never panic in hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divided by zero error when read empty csv in streaming mode (#10819)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)
  • get_single_leaf can't handle Expr::Count (#10790)
  • string to decimal parsing (#10712)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • fix unicode truncation in json parsing (#10761)
  • Error message of list unique should not display inner type (#10748)
  • create chunks_mut entry in vtable (#10745)
  • Prevent panic on sample_n with replacement from empty df (#10731)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • Build Series from empty Series vector (#10558)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • AllHorizontal format string (#10658)
  • List<null> chunked builder should take care of series name (#10642)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • join_asof missing tolerance implementation, address edge-cases (#10482)
  • Take input_schema to create physical expr for Selection (#10571)
  • fix serialization of empty lists (#10563)
  • Clear window cache after evaluate predication expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • fix build for wasm (#10536)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • fix build for wasm (#9502)
  • rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • complete intra-docs in api documentation (#11007)
  • genericize take implementation (#10976)
  • genericize PolarsDataType (#10952)
  • enhance internal crates readme with reference to main crate (#10928)
  • Add Duration method for checking full days (#10850)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • remove LocalProjection (#10886)
  • request verbose logging output of minimal reproducible examples (#10882)
  • Reorganize range expression module (#10871)
  • introduce with_name for Series/ChunkedArray (#10859)
  • Further refactor temporal range functions (#10844)
  • Refactor range related functions (#10830)
  • Fix the un-compile Black box function parts in polars lazy cookbook (#10809)
  • Fix some broken links / formatting (#10772)
  • Improve docs for polars-lazy (#10729)
  • update rustc nightly_2023-08-26 (#10467)
  • default to rust native flate2 lib (#10733)
  • Clear GitHub Actions caches weekly (#10715)
  • move 'is_in' to polars-ops (#10645)
  • Clean up schema calculation for date_range (#10653)
  • remove unused apply functions and add fallible generic apply functions (#10621)
  • Enforce up-to-date Cargo.lock (#10555)
  • make binary chunkedarray functions DRY (#10607)
  • bump MSRV to 1.65 (#10568)
  • genericize chunk implementation (#10506)
  • use ChunkArray::(try_)from_chunk_iter (#10497)
  • add VSCode rust-analyzer settings (#10498)
  • Update URLs for dev documentation (#10495)
  • Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj

polars - Python Polars 0.19.3

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • Polars plugins (#10924)

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)

🚀 Performance improvements

  • optimize _unpack_schema() (#11080)
  • optimize polars.utils._post_apply_columns() (#11086)
  • optimize polars.utils._post_apply_columns() (#11041)
  • optimize _unpack_schema() (#10960)
  • improve performance of fast projection (#10945)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Polars plugins (#10924)
  • better async_collect (#10912)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • add OpenOffice spreadsheet support via new pl.read_ods function (#11011)
  • json_extract supports extract static and string value to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • Add literal for str count_match (#10996)
  • More dtypes supports cast to list (#11025)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • improve read_excel table data identification (#10953)
  • Add from_dataframe fast path and improve typing (#10979)
  • add openpyxl as a new/optional engine for read_excel (#6183)
  • Add datetime_range expression function (#10213)

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Make Series.__getitem__ raise an IndexError (#11061)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • fix partial schema init with read_dicts and reduce latency of small-frame creation (#11047)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • ensure series_equal properly accounts for dtypes when strict=True (#11012)
  • fix the construction of List<Null> (#10969)
  • write_excel "hidden_columns" parameter fails when taking a selector (#10987)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)

🛠️ Other improvements

  • remove low contrast coloring from visited links (#11133)
  • Ignore matplotlib warning (#11129)
  • Do not run user guide examples by default (#11128)
  • Ignore matplotlib mypy warnings (#11126)
  • Add deprecation message in groupby docs (#11121)
  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove mentions of the deprecated random module (#11087)
  • simplify SchemaDefinition type alias (#11077)
  • put fetch explanation in a "notes" block to better highlight it in the docs (#11058)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • warn that "by" argument must be sorted for results to be correct in rolling_* functions (#11013)
  • Adds missing method refs in LazyDataFrame API docs (#11027)
  • Add lint for boolean trap (#11010)
  • Add private LazyFrame method for setting sink optimizations (#10988)
  • Enable a few more ruff lints (#10998)
  • document polars string duration language in temporal range functions (#10978)
  • Additional tests for interchange get_data_buffer (#10966)
  • genericize PolarsDataType (#10952)
  • Document that filter, drop_nulls, left join preserve order (#10955)
  • add note about adbc flight sql driver (#10949)
  • Revert pydantic >= 2.0.0 requirement (#10944)
  • note that pl.duration represents fixed durations, point to offset_by for non-fixed (#10927)
  • Test S3 functionality using moto server (#10164)

Thank you to all our contributors for making this release possible!
@I8dNLo, @KacpiW, @MarcoGorelli, @Object905, @Qqwy, @TNieuwdorp, @alexander-beedie, @antoniocali, @bvanelli, @cjackal, @henrikig, @jakob-keller, @mrogowski11, @nameexhaustion, @orlp, @reswqa, @ritchie46, @s-banach, @stinodego, @svaningelgem and @thomasjpfan

polars - Python Polars 0.19.2

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • Add syntactic sugar for col("foo") -> col.foo (#10874)
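
As a rough illustration of the highlight above, attribute-style access such as col.foo can be layered on top of call-style access such as col("foo") with Python's `__getattr__`. This is a pure-Python sketch of the general technique, not Polars' implementation; `ColumnRef` and `ColumnFactory` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a column expression."""

    name: str


class ColumnFactory:
    """Hypothetical factory supporting both col("foo") and col.foo."""

    def __call__(self, name: str) -> ColumnRef:
        return ColumnRef(name)

    def __getattr__(self, name: str) -> ColumnRef:
        # __getattr__ only runs when normal attribute lookup fails, so
        # real methods keep working; dunder lookups are rejected so that
        # tools probing for special attributes behave as usual.
        if name.startswith("__"):
            raise AttributeError(name)
        return ColumnRef(name)


col = ColumnFactory()
```

With this sketch, `col.foo` and `col("foo")` compare equal. Column names that are not valid Python identifiers (for example, names containing spaces) still need the call form.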

⚠️ Deprecations

  • Rename Expr.is_not() to not_() (#10838)

✨ Enhancements

  • allow individual Config options to be easily reset to their default value (#10922)
  • accept expr in str.count_match (#10900)
  • allow additional glimpse customisation, fix strings repr (#10895)
  • accept expressions in .offset_by (#9967)
  • support schema overrides for frames created from databases (#10884)
  • Add syntactic sugar for col("foo") -> col.foo (#10874)
  • support negative indexing in set_at_idx (#10891)
  • implement drop as special case of select (#10885)
  • raise a more helpful error when non-query statements passed to read_database (#10851)

🐞 Bug fixes

  • Allow exactly one value in start/end for int_range (#10914)
  • raise error when function didn't receive any inputs (#8635)
  • count was falsely tagged as cse in group by (#10917)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • use pool in dataframe arithmetic (#10864)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)

🛠️ Other improvements

  • Set minimum version for pydantic to 2.0.0 (#10923)
  • fix and clarify docs for Expr.map_elements (#10647)
  • fix rendering of bullet points in dt.round (#10911)
  • add test for 10875 (#10913)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • request verbose logging output of minimal reproducible examples (#10882)
  • add a note about read_database connection/cursor behaviour (#10873)
  • introduce with_name for Series/ChunkedArray (#10859)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @MarcoGorelli, @alexander-beedie, @c-peters, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @jeroenjanssens, @orlp, @ritchie46, @stinodego and @wdoppenberg

polars - Python Polars 0.19.1

Published by github-actions[bot] about 1 year ago

💥 Breaking changes

  • empty product returns identity and product ignores nulls (#10842)
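
The semantics described by this breaking change can be sketched in plain Python, with `None` standing in for a null value. This illustrates the stated behavior only (an empty product is the multiplicative identity, and nulls are skipped); `product_ignoring_nulls` is a hypothetical helper, not a Polars function.

```python
from math import prod
from typing import Iterable, Optional


def product_ignoring_nulls(values: Iterable[Optional[float]]) -> float:
    """Multiply the non-null values; an empty input yields the identity, 1."""
    # math.prod starts from 1, so an empty (or all-null) input returns 1.
    return prod(v for v in values if v is not None)
```

Under this sketch, `product_ignoring_nulls([])` and `product_ignoring_nulls([None])` both return 1, while `[2, None, 3]` yields 6.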

✨ Enhancements

  • add binary, boolean, categorical, date, object, and time selectors (#10806)
  • Supports is_last operation (#10760)
  • minor typing improvement for DataFrame.__iter__ (#10825)
  • Add custom error for allow_copy=False (#10822)

🐞 Bug fixes

  • empty product returns identity (#10842)
  • never panic if hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divided by zero error when read empty csv in streaming mode (#10819)
  • behaviour of reversed(df) (#10823)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)

🛠️ Other improvements

  • Refactor range related functions (#10830)
  • map-related docstring updates (#10779)
  • Move sink tests to streaming module (#10821)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @orlp, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.19.0

Published by github-actions[bot] about 1 year ago

An upgrade guide is available on our website.

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • Rename groupby to group_by (#10656)

💥 Breaking changes

  • return f64 for rank when method="average" (#10734)
  • Update a lot of error types (#10637)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Improve consistency of parsing expression input (#9512)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)
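
One of the breaking changes above concerns Kleene logic for all/any. Kleene (three-valued) logic treats a null as "unknown": the result is null whenever the nulls could still change the answer. The sketch below shows that logic in plain Python, with `None` standing in for null. It illustrates the semantics only and makes no claim about Polars' API or about which parameters enable this behavior; `kleene_any` and `kleene_all` are hypothetical names.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; otherwise None if a null was seen; else False."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True settles the answer
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; otherwise None if a null was seen; else True."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False settles the answer
        if v is None:
            saw_null = True
    return None if saw_null else True
```

For example, `kleene_all([True, None])` is `None` because the unknown value could be `False`, whereas `kleene_all([False, None])` is `False` regardless of what the null stands for.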

⚠️ Deprecations

  • Rename map to map_batches (#10801)
  • Rename GroupBy.apply to map_groups (#10799)
  • Rename DataFrame.apply to map_rows (#10797)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)
  • Rename Series/Expr.apply to map_elements (#10678)
  • Rename groupby to group_by (#10656)
  • Deprecate some parameters of cut/qcut (#10484)

🚀 Performance improvements

  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • activate cse for group_by (again) (#10749)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Explicitly implement Protocol for interchange classes (#10688)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • csv: add schema argument (#10665)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Update a lot of error types (#10637)
  • preserve whitespace in notebook output (#10644)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • support selector usage in write_excel arguments (#10589)
  • Add LazyFrame.collect_async and pl.collect_all_async (#10616)
  • Read/write support for IPC streams in DataFrames (#10606)
  • propagate null in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Improve consistency of parsing expression input (#9512)
  • Add failed column to cast exception (#10507)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • Remove deprecated get_idx_type - use get_index_type instead (#10556)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)
  • suggest str.to_datetime instead of apply and stdlib strptime (#10266)

🐞 Bug fixes

  • get_single_leaf can't handle Expr::Count (#10790)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • whitespace CSS in Notebook HTML updated to use pre-wrap instead of pre (#10739)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • use time zone from dtype to overwrite output time zone when initialising Series (#10689)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • raise exception with invalid on arg type for join_asof (#10690)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • Correctly handle time zones in write_delta (#10633)
  • fix apply for empty series in threading mode (#10651)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • Take input_schema to create physical expr for Selection (#10571)
  • Clear window cache after evaluate predication expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • Fix write_delta with schema in delta_write_options (#10541)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • respect pl.Config options relating to shape, column names, and types when rendering HTML (#10449)

🛠️ Other improvements

  • update cargo.lock (#10800)
  • Create .venv in repo root (#10789)
  • refactored write_database unit tests to properly separate concerns (#10773)
  • Fix some broken links / formatting (#10772)
  • Document chained when-then behaviour more prominently (#10759)
  • Fix test failing due to new adbc release (#10763)
  • Unpin connectorx and bump other Python dependencies (#10753)
  • add note to testing docs about module import (#10741)
  • Clear GitHub Actions caches weekly (#10715)
  • Update for new pyarrow 13.0.0 behavior (#10691)
  • Fix minor issue with sink_parquet docs (#10669)
  • Remove deprecate_renamed_methods util (#10537)
  • add "see also" entries to ne/eq_missing and update related examples (#10667)
  • fix potential memory leak from usage of inspect.currentframe (#10630)
  • give more relevant example for polars.apply (#10631)
  • Bump ruff and enable new setting (#10626)
  • Add docstrings for Expr.meta namespace (#10617)
  • Enforce up-to-date Cargo.lock (#10555)
  • deprecate DataFrame.replace (#10600)
  • ensure that make requirements fully refreshes unpinned packages/deps (#10591)
  • fix out-of-date explain default parameter (#10566)
  • Fix expr_dispatch decorator to work on methods with decorators (#10549)
  • Fix link to source code (#10542)
  • Add title to index page (#10539)
  • Disable SIM108 lint (#10519)
  • Keep versioned docs (#10500)
  • switch to pyo3/maturin-action (#10503)
  • Update URLs for dev documentation (#10495)
  • Skip failing test (#10496)
  • Add version switcher to API reference (#10488)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj

polars - Python Polars 0.18.15

Published by github-actions[bot] about 1 year ago

🐞 Bug fixes

  • rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

  • Mark import timing check as slow (#10487)
  • Gather all streaming tests (#10485)
  • Bump maturin to version 1.2.1 (#10479)

Thank you to all our contributors for making this release possible!
@ritchie46 and @stinodego

polars - Rust Polars 0.32.0

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • common subexpression elimination (#9632)
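
To give a feel for the highlighted optimization, the sketch below finds sub-expressions that occur more than once in a set of toy expressions, using hashable tuples as identifiers. It is a simplified illustration of the general idea, not the Polars optimizer; the tuple encoding and both function names are assumptions made for this example.

```python
from collections import Counter
from typing import Union

# A toy expression is either a column name (str) or a tuple of the
# form (operator, operand, ...). Tuples are hashable, so equal
# sub-expressions map to the same Counter key.
Expr = Union[str, tuple]


def count_subexpressions(expr: Expr, counts: Counter) -> None:
    """Count every non-leaf sub-expression of `expr`."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for operand in expr[1:]:
            count_subexpressions(operand, counts)


def common_subexpressions(exprs: list) -> set:
    """Return the sub-expressions that occur more than once across `exprs`."""
    counts: Counter = Counter()
    for expr in exprs:
        count_subexpressions(expr, counts)
    return {e for e, n in counts.items() if n > 1}


# (a + b) * 2 and (a + b) - c share the sub-expression a + b.
shared = ("+", "a", "b")
exprs = [("*", shared, "2"), ("-", shared, "c")]
```

Here `common_subexpressions(exprs)` returns a set containing only `("+", "a", "b")`. An optimizer can then evaluate each shared sub-expression once and reuse the result.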

💥 Breaking changes

  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)

⚠️ Deprecations

  • renaming approx_unique as approx_n_unique (#10290)
  • remove/deprecate cache and its logic (#10066)
  • Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

  • pre-alloc int_ranges (#10399)
  • use hash as CSE Identifier (#10385)
  • re-use regex capture allocation (#10302) (#10335)
  • don't parallelize literal expressions (#10321)
  • fix O(n^2) in sorted check during append (#10241)
  • speedup mode on sorted data (#10084)
  • speedup boolean apply (#10073)
  • shrink alp/lp ~2.5x (#10039)
  • Remove fused arithmetic from expressions with literals (#10011)

✨ Enhancements

  • quote style option for csv writer (#10422)
  • add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
  • be more permissive on predicate pushdown to left side of left join (#10442)
  • add use_earliest to to_datetime / strptime (#10426)
  • {any/all}_horizontal to expression architecture (#10412)
  • serialize flags (#10140)
  • allow unaligned pointers in arrow FFI (#10403)
  • add line_terminator option to write_csv (#10373)
  • Add is_local and to_local to categorical namespace (#10372)
  • cse for groupby.agg and reduced cse collisions (#10381)
  • re-use regex capture allocation (#10302) (#10335)
  • Add Series.cat.uses_lexical_ordering (#10325)
  • improve datetime parsing error message (#10332)
  • allow sequential runners in select/with_columns (#10322)
  • improve err msg parsing time, date, datetime (#10298)
  • Add str.extract_groups (#10179)
  • add extra build profiles (#10268)
  • Extend datetime expression function with time zone/time unit parameters (#10235)
  • added gcs to gcp cloud schema in polars-core::cloud #10206. (#10207)
  • support writing duration type in json (#10112)
  • inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
  • Move transpose naming to Rust (#10009)
  • cse in groupby's (#10062)
  • Adds sql CASE statement expressions (#10065)
  • Add date_ranges/time_ranges expression functions (#10005)
  • comm_subexpr_elim in streaming 'select/with_columns' (#10050)
  • common subexpression elimination (#9632)
  • Let qcut create evenly spaced probabilities (#9960)
  • sorted flag on singletons (#9933)
  • maintain sorted flag after partition_by (#9944)
  • keep sorted flag in streaming left join (#9932)
  • Add cloudpickle for serializing python UDFs (#9921)

🐞 Bug fixes

  • Fix incorrect handling of VisitRecursion::Skip. (#10452)
  • fix negative decimal parsing (#10444)
  • ensure sorted_sink hash equals the default path (#10464)
  • fix sum agg (#10459)
  • ensure last aggregation deals with default chunk (#10453)
  • fix cse input schema (#10450)
  • fix list groupby of array dtype (#10408)
  • correct AnyValue::hash (#10391)
  • finalize cast in partitioned groupby (#10359)
  • fix oob in 'last' (#10329)
  • fix categorical lexical sort (#10318)
  • Fix join validation (#10257)
  • Set correct dtype for .extract_groups() (#10306)
  • clear window cache and run windows on proper runners (#10303)
  • fix sorted fast path in streaming groupby wrt nulls (#10289)
  • fix nan aggregation in groupby (#10287)
  • check dtypes of single-column 'by' parameter in asof-join (#10284)
  • fix pyo3 link errors on macos (#10256)
  • fix empty streaming parquet file (#10252)
  • fix logical columns of streaming multi-column sort (#10250)
  • fix date/datetime parsing for short inputs with exact=False (#10231)
  • correct agg_sum for ChunkedArray. (#10243)
  • don't panic in wildcard apply (#10240)
  • fix cse profile (#10239)
  • correct struct null counts (#10142)
  • no cse in groupby until fixed (#10216)
  • fix is_in on empty series (#10195)
  • fix cse windows (#10197)
  • block predicate pushdown is_in and null producing … (#10194)
  • prevent re-ordering of dict keys inside .apply (#10172)
  • initialize fixed null values (#10192)
  • ensure window function run partitioned when cse is hit (#10170)
  • adjust for null values in str.replace fast path (#10132)
  • clear bit settings in list iteration (#10131)
  • use row-encoded for struct::is_sorted (#10129)
  • don't run file-caching in streaming mode (#10117)
  • Allow initialize of pl.Array in Dataframe using schema alone (#10100)
  • don't panic if masked out values are invalid in temporal kernels (#10114)
  • Fix struct get field by index out of bounds error. (#10097)
  • fix ub in simd-json (#10093)
  • fix invalid access when groupby rolling produces empty sets (#10109)
  • respect null_on_oob=False in list.take when pa… (#10105)
  • fix is_sorted for structs (#10099)
  • add file path to io error in scan_csv (#10076)
  • fix false positive in parquet stats evaluation (#10087)
  • fix error message from cast-timezone to replace-time-zone (#10089)
  • Address .col(regex).exclude() operations not executing. (#10025)
  • fix Boolean::isin(null values) (#10074)
  • predicate pushdown #10058 (#10071)
  • Fix weighted quantile for 0 weights (#10051)
  • fix incorrect state in projection pushdown with joins (#9987)
  • don't pass predicates referring to renamed literal… (#9965)
  • fix regression in regex expansion (#9952)
  • potential SO in csv infer schema (#9950)
  • raise on unsupported transpose and object types (#9946)
  • Fix as-of join when by groups are interleaved (#9938)

🛠️ Other improvements

  • fix and run polars-plan tests (#10465)
  • Simplify flag methods (#10429)
  • match_block_trailing_comma (#10414)
  • implement ChunkArray::(try_)from_chunk_iter (#10395)
  • add test for 10401 (#10405)
  • Bump some dependencies (#10396)
  • Move dependency version info to workspace level (#10295)
  • patch reedline until fix released (#10382)
  • remove wasm-timer dependency (#10347)
  • write down invariants of ChunkedArray (#10334)
  • fix typo in lib.rs (#10313)
  • Exclude examples from workspace default (#10309)
  • Update CODEOWNERS (#10261)
  • avoid outputting docs of dependencies (#10292)
  • Do not keep history in gh-pages branch (#10282)
  • Use workspace package info / organize dependencies section (#10279)
  • fix dead links in Rust documentation (#10251)
  • Fix make pre-commit command (#10205)
  • Fix make integration-tests command (#10202)
  • Replace "question" issues with link to Stack Overflow (#10230)
  • Update dependabot config (#10222)
  • Fix LICENSE symlink for moved crates (#10150)
  • Re-organize folder structure for Rust crates (#10141)
  • update to rustc nightly-2023-07-27 (#10139)
  • temporarily turn off fail-fast so that ubuntu tests run (#10133)
  • Refactor when/then/otherwise internals (#9922)
  • move replace_time_zone to polars-ops (#10078)
  • remove unneeded branch (#10082)
  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)
  • fix typo in contribution example (#10038)
  • correct example in API reference (#10032)
  • add developer contribution examples (#10013)
  • Update autolabeler again (#9984)
  • fix docs build and add to CI (#9904)
  • Minor makeover for Rust Makefile (#9874)

Thank you to all our contributors for making this release possible!
@0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj

polars - Python Polars 0.18.14

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • Native implementation of dataframe interchange protocol (#10267)

⚠️ Deprecations

  • Deprecate behavior of list/tuple inputs for lit (#10461)

🚀 Performance improvements

  • optimise retrieval of values from df.item (~4-5x speedup) (#10411)
  • pre-alloc int_ranges (#10399)
  • use hash as CSE Identifier (#10385)

✨ Enhancements

  • quote style option for csv writer (#10422)
  • add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
  • add use_earliest to to_datetime / strptime (#10426)
  • add new "header_format" option for write_excel (#10392)
  • {any/all}_horizontal to expression architecture (#10412)
  • Native implementation of dataframe interchange protocol (#10267)
  • allow unaligned pointers in arrow FFI (#10403)
  • add line_terminator option to write_csv (#10373)
  • add explicit selector variants for signed/unsigned integers (#10384)
  • Add is_local and to_local to categorical namespace (#10372)
  • enhance selectors expansion function, so it can operate on a schema as well as a frame (#10341)
  • Order percentiles in describe (#10378)
  • cse for groupby.agg and reduced cse collisions (#10381)
  • improve take_every(0) exception (#10352)
  • add offset and length to get_ptr (#10361)

🐞 Bug fixes

  • fix pyarrow write_to_dataset wrt check_not_directory parameter (#10471)
  • fix negative decimal parsing (#10444)
  • ensure sorted_sink hash equals the default path (#10464)
  • address inconsistency in init from square numpy arrays with/without an explicit schema (#10445)
  • ensure last aggregation deals with default chunk (#10453)
  • fix cse input schema (#10450)
  • Fix by argument handling in join_asof (#10447)
  • fix potential OverflowError in testing asserts with huge UInt64 diffs (#10437)
  • Create delta compatible schema during writing (#10165)
  • fix list groupby of array dtype (#10408)
  • correct AnyValue::hash (#10391)
  • finalize cast in partitioned groupby (#10359)

🛠️ Other improvements

  • add vertical_relaxed example for pl.concat (#10472)
  • Run all streaming tests on the same test runner (#10469)
  • Organize OOC tests (#10463)
  • add test for 10417 (#10420)
  • Clean up some Sphinx settings (#10400)
  • add test for 10401 (#10405)
  • Address Ruff per file ignores (#10258)
  • Small improvement for PySeries.get_buffer (#10363)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @OndrejSlamecka, @alexander-beedie, @c-peters, @cmdlineluser, @drgif, @ion-elgreco, @lfn3, @orlp, @potzenhotz, @rea1bacon, @reswqa, @ritchie46, @stinodego and @zundertj

polars - Python Polars 0.18.13

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

  • Rename LazyFrame.read/write_json to de/serialize (#10238)
  • Add categorical_as_str parameter to testing utils (#10350)

🚀 Performance improvements

  • don't parallelize literal expressions (#10321)

✨ Enhancements

  • support selectors in additional frame methods (#10255)
  • Add Series.cat.uses_lexical_ordering (#10325)
  • utility to get buffers and pointers (#10331)
  • improve datetime parsing error message (#10332)
  • add ptr for small integer types (#10330)
  • add offsets utility (#10328)
  • allow sequential runners in select/with_columns (#10322)
  • warn about inefficient apply json.loads if json is local import (#10310)
  • improve err msg parsing time, date, datetime (#10298)
  • Add categorical_as_str parameter to testing utils

🐞 Bug fixes

  • fix oob in 'last' (#10329)
  • show inefficient apply warning in ipython (#10312)
  • add cse to no_optimization in profile (#10317)
  • fix categorical lexical sort (#10318)
  • Fix join validation (#10257)
  • Set correct dtype for .extract_groups() (#10306)

Thank you to all our contributors for making this release possible!
@CanglongCl, @JulianCologne, @MarcoGorelli, @alexander-beedie, @cmdlineluser, @eltociear, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.18.12

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

  • renaming approx_unique as approx_n_unique (#10290)
  • Rename first qcut parameter to quantiles (#10253)
  • Deprecate avg alias for mean (#10236)

🚀 Performance improvements

  • fix O(n^2) in sorted check during append (#10241)

✨ Enhancements

  • Add str.extract_groups (#10179)
  • raise TypeError for all LazyFrame comparison operators (#10275)
  • support bytecode translation to map_dict where the lookup key is an expression (#10265)
  • add entry point to the Consortium DataFrame API (#10244)
  • Extend datetime expression function with time zone/time unit parameters (#10235)
  • add "batch_size" to scan_pyarrow_dataset parameters (#10249)

🐞 Bug fixes

  • clear window cache and run windows on proper runners (#10303)
  • fix sorted fast path in streaming groupby wrt nulls (#10289)
  • Fix interchange protocol allowing copy even when allow_copy was set to False (#10262)
  • fix nan aggregation in groupby (#10287)
  • don't panic on cse if function hasn't implemented __eq__ (#10286)
  • fix empty streaming parquet file (#10252)
  • fix logical columns of streaming multi-column sort (#10250)
  • fix date/datetime parsing for short inputs with exact=False (#10231)
  • don't panic in wildcard apply (#10240)
  • fix cse profile (#10239)

🛠️ Other improvements

  • Update CODEOWNERS (#10261)
  • add note about pyarrow partitioning (#10297)
  • Do not keep history in gh-pages branch (#10282)
  • make an explicit note in read_parquet and scan_parquet about hive-style partitioning (point to scan_pyarrow_dataset instead) (#10277)
  • Fix typo in error message (#10281)
  • Replace "question" issues with link to Stack Overflow (#10230)
  • Use sphinx' maximum_signature_line_length (#10228)
  • add warning about parallel eval of .then(..) branches (#10229)
  • Update Sphinx to 7.1.1 and bump related dependencies (#10221)
  • Update dependabot config (#10222)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @TLouf, @alexander-beedie, @cmdlineluser, @dependabot, @dependabot[bot], @duvenagep, @mcrumiller, @orlp, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.18.11

Published by github-actions[bot] about 1 year ago

🐞 Bug fixes

  • correct struct null counts (#10142)
  • no cse in groupby until fixed (#10216)
  • avoid false positives from multiple RETURN_VALUE ops when checking apply lambdas/functions (#10211)

🛠️ Other improvements

  • Improve deprecation utils (#10167)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @varunmittal91

polars - Python Polars 0.18.10

Published by github-actions[bot] about 1 year ago

✨ Enhancements

  • raise a better error message from read_database if not passed a string URI (#10191)
  • Add pyarrow write_to_dataset to write_parquet function (#9835)

🐞 Bug fixes

  • fix is_in on empty series (#10195)
  • fix cse windows (#10197)
  • block predicate pushdown is_in and null producing … (#10194)
  • prevent re-ordering of dict keys inside .apply (#10172)
  • initialize fixed null values (#10192)
  • Don't pickle _scan_impl (#10175)
  • ensure window function run partitioned when cse is hit (#10170)

🛠️ Other improvements

  • prepend set_ to set operations on lists (#10182)
  • Track version in deprecation utils (#10147)
  • Add a simple util issue_deprecation_warning (#10146)
  • more precise checks for inefficient apply warnings (#10135)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cjackal, @cmdlineluser, @potzenhotz, @ritchie46 and @stinodego

polars - Python Polars 0.18.9

Published by github-actions[bot] about 1 year ago

🏆 Highlights

  • common subexpression elimination (#9632)

⚠️ Deprecations

  • Deprecate parsing string inputs as literals for when-then-otherwise (#10122)
  • deprecate "connection_uri" → "connection" param in read/write database methods (#10134)
  • remove/deprecate cache and its logic (#10066)
  • Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

  • speedup mode on sorted data (#10084)
  • speedup boolean apply (#10073)
  • shrink alp/lp ~2.5x (#10039)

✨ Enhancements

  • suggest map_dict instead of lambda x: DICT[x] (#10123)
  • enable "inefficient apply" warnings from Series (#10104)
  • support writing duration type in json (#10112)
  • BytecodeParser can now handle mixed/nested and/or control flow (#10085)
  • inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
  • Add ArcTan2 to SQLContext (#9571)
  • cse in groupby's (#10062)
  • Adds sql CASE statement expressions (#10065)
  • Add date_ranges/time_ranges expression functions (#10005)
  • comm_subexpr_elim in streaming 'select/with_columns' (#10050)
  • add dataframe.flags property (#10037)
  • common subexpression elimination (#9632)
  • detect and warn about usage of str/int/float python-based casts with apply (#10026)
  • detect and warn about usage of json.loads in conjunction with apply (#10023)
  • detect and warn about bare numpy functions passed to apply (#10021)
  • support bytecode identification/mapping of python string-case functions in UDFs (#10007)
  • support bytecode identification of numpy functions in UDFs that we can map to native expressions (#10003)

🐞 Bug fixes

  • adjust for null values in str.replace fast path (#10132)
  • clear bit settings in list iteration (#10131)
  • use row-encoded for struct::is_sorted (#10129)
  • don't run file-caching in streaming mode (#10117)
  • Allow initialize of pl.Array in Dataframe using schema alone (#10100)
  • silence Series.apply inefficient apply warning when calling Expr.apply (#10116)
  • don't panic if masked out values are invalid in temporal kernels (#10114)
  • Fix struct get field by index out of bounds error. (#10097)
  • fix ub in simd-json (#10093)
  • fix invalid access when groupby rolling produces empty sets (#10109)
  • respect null_on_oob=False in list.take when pa… (#10105)
  • undo regression in scan_parquet from s3 (#10098)
  • fix is_sorted for structs (#10099)
  • add file path to io error in scan_csv (#10076)
  • fix false positive in parquet stats evaluation (#10087)
  • Address .col(regex).exclude() operations not executing. (#10025)
  • address an inadvertently shallow-copy issue on underlying PySeries (#10086)
  • fix Boolean::isin(null values) (#10074)
  • predicate pushdown #10058 (#10071)
  • map 'postgres' URI prefix to ADBC 'postgresql' module (#10018)
  • Fix weighted quantile for 0 weights (#10051)
  • eager time_range/date_range dimensions fix (#9996)

🛠️ Other improvements

  • get test_udfs running on all python versions again (#10136)
  • temporarily turn off fail-fast so that ubuntu tests run (#10133)
  • clarify "clones data" in to_numpy (#10095)
  • Refactor when/then/otherwise internals (#9922)
  • Properly format Returns sections of docstrings (#10064)
  • much-improved Instruction matching for BytecodeParser (#10040)
  • add pure-python tests and CI for bytecodeparser (#10027)
  • split-out expression translation and instruction-rewrite logic from BytecodeParser (#10012)
  • cleans api sections in docs (#10004)
  • Bump some dependencies (#9997)
  • Add patchelf extra to maturin (#9995)
  • restructure all UDF parsing/translation methods into a new BytecodeParser class (#9993)
  • Clean up date_range/time_range (#9985)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @cmdlineluser, @jonashaag, @magarick, @mcrumiller, @rikkaka, @ritchie46 and @stinodego

polars - Python Polars 0.18.8

Published by github-actions[bot] over 1 year ago

⚠️ Deprecations

  • Add Series.extend (#9901)
  • Deprecate functions series input (#9878)

🚀 Performance improvements

  • Rolling min/max for partially sorted data (#9819)
  • Use pyo3::intern to avoid needlessly recreating PyString (#9853)

✨ Enhancements

  • Name transpose from column (#9846)
  • adds SQRT, CBRT, PI functions to SQLContext (#9936)
  • Let qcut create evenly spaced probabilities (#9960)
  • add freeze_panes option to write_excel (#9974)
  • initial support for parsing the set of jump bytecode instructions required to reconstruct and/or logic (#9972)
  • suggest more efficient expression if user passes simple lambda to Expr.apply or DataFrame.apply (#9918)
  • sorted flag on singletons (#9933)
  • maintain sorted flag after partition_by (#9944)
  • keep sorted flag in streaming left join (#9932)
  • Add cloudpickle for serializing python UDFs (#9921)
  • Optional three-valued logic for any/all (#9848)
  • Add Series.extend (#9901)
  • pass through unknown schema in unnest (#9896)
  • convenience support for parsing a list of SQL strings with sql_expr (#9881)
  • respect and allow more options in eager json parsing (#9882)
  • allow set_sorted in streaming (#9876)
  • Expr.cat.get_categories expression (#9869)
  • add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
  • polars_warn! macro (#9868)
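
For readers unfamiliar with the "three-valued logic for any/all" entry (#9848) above: this most likely refers to Kleene logic, where a null means "unknown" rather than being skipped. The sketch below shows those semantics in plain Python, with `None` standing in for null. The function names are hypothetical and this is not the Polars API or implementation.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """any() under Kleene logic: True wins, otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """all() under Kleene logic: False wins, otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            saw_null = True
    return None if saw_null else True


print(kleene_any([False, None]), kleene_all([True, None]))
```

The design point is that a definite answer is returned whenever the known values already decide it; the result is unknown only when the nulls could change the outcome.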

🐞 Bug fixes

  • fix incorrect state in projection pushdown with joins (#9987)
  • don't pass predicates referring to renamed literal… (#9965)
  • fix regression in regex expansion (#9952)
  • potential stack overflow in csv infer schema (#9950)
  • raise on unsupported transpose and object types (#9946)
  • Fix as-of join when by groups are interleaved (#9938)
  • Handle DataFrame.extend extending by itself (#9897)
  • don't stack overflow on align_frames (#9911)
  • respect original series dtype when constructing LitIter (#9886)
  • Handle DataFrame.vstack stacking itself (#9895)
  • sum aggregation empty set is 0, not null (#9894)
  • preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
  • fmt unknown dtype (#9872)
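
The "sum aggregation empty set is 0, not null" entry (#9894) follows the usual convention that an empty sum equals the additive identity. A small plain-Python sketch of that behaviour, applied to grouped data, is shown here. The function `group_sums` and its arguments are hypothetical, chosen only for illustration, and this is not how Polars implements aggregation.

```python
from typing import Dict, Iterable, List, Tuple


def group_sums(keys: Iterable[str], rows: List[Tuple[str, int]]) -> Dict[str, int]:
    """Sum values per key; keys with no matching rows get 0 rather than None."""
    totals = {key: 0 for key in keys}  # start every group at the additive identity
    for key, value in rows:
        if key in totals:
            totals[key] += value
    return totals


print(group_sums(["a", "b", "c"], [("a", 1), ("a", 2), ("b", 5)]))
```

Here the group "c" has no rows, so its sum is 0 instead of a missing value, which keeps downstream arithmetic on the totals well-defined.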

🛠️ Other improvements

  • Update autolabeler again (#9984)
  • use param_name more in udfs for greater defensiveness (#9969)
  • fix or/and docstrings to say bitwise, not logical (#9964)
  • minor fix for apply docstring example text (#9953)
  • add note that collect_all returns result frames in the same order as input (#9951)
  • Improve docstrings for renaming operations (#9942)
  • Move sink_* methods to IO chapter (#9939)
  • Add 'nearest' in Expr.interpolation docstring with an example (#9935)
  • fix hyperlinks to pandas (#9937)
  • Address ignored Ruff doc rules (#9919)
  • improve weekday, day, ordinal_day examples (#9926)
  • deprecate bins argument and rename to breaks in Series.cut (#9913)
  • Use Pathlib everywhere (#9914)
  • Add various unit tests (#9903)
  • add big warnings about using apply (#9906)
  • Update autolabeler (#9885)
  • Workaround for PyCharm deprecation warning (#9907)
  • Mention func_horizontal on deprecated func docstrings (#9863)
  • note ordering guarantee for groupby (#9879)
  • add logo link entry to sphinx conf and factor-out website root paths (#9864)

Thank you to all our contributors for making this release possible!
@0xbe7a, @JulianCologne, @MarcoGorelli, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @alexander-beedie, @c-peters, @fsimkovic, @ion-elgreco, @magarick, @mcrumiller, @messense, @ritchie46, @sorhawell, @stinodego, @thomasaarholt and @zundertj