polars | Rust Ecosystem Directory

polars - Python Polars 0.19.9

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

Deprecate non-keyword args for ewm methods (#11804)
Deprecate use_pyarrow param for Series.to_list (#11784)
Rename group_by_rolling to rolling (#11761)

🚀 Performance improvements

Improve DataFrame.get_column performance by ~35% (#11783)
rechunk before grouping on multiple keys (#11711)
process parquet statistics before downloading row-group (#11709)
push down predicates that refer to group_by keys (#11687)
slightly faster float equality (#11652)

✨ Enhancements

Expressify pct_change and move to ops (#11786)
primitive kwargs in plugins (#11268)
add DATE function for SQL (#11541)
extend filter capabilities with new support for *args predicates, **kwargs constraints, and chained boolean masks (#11740)
Add config setting to control how many List items are printed (#11409)
Use OrderedDict for schemas (#11742)
allow specifying schema in pl.scan_ndjson (#10963)
add support for "outer" mode to frame update method (#11688)
transparently support "qmark" parameterisation of SQLAlchemy queries in read_database (#11700)
support multiple sources in scan_file (#11661)
support batched frame iteration over read_database queries (#11664)
column selector support for DataFrame.melt and LazyFrame.unnest (#11662)
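
For readers unfamiliar with the term in #11700, "qmark" is the DB-API placeholder style in which each `?` in the SQL text is bound positionally to a supplied value. The sketch below only illustrates that placeholder style using Python's standard-library sqlite3 driver (which uses qmark natively); it does not call read_database, and the table and values are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the placeholder style.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",  # two positional "?" placeholders per row
    [(1, "a"), (2, "b"), (3, "c")],
)

# "qmark" style: the single "?" is bound to the first (and only) parameter.
rows = conn.execute(
    "SELECT id, name FROM items WHERE id >= ? ORDER BY id",
    (2,),
).fetchall()
conn.close()
```

Binding values this way keeps them out of the SQL string itself, which is the usual reason to prefer parameterised queries over string formatting.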

🐞 Bug fixes

ensure projections containing only hive columns are projected (#11803)
patch broken aHash AES intrinsics on ARM (#11801)
fix key in object-store cache (#11790)
handle logical types in plugins (#11788)
Fix values printed by assert_*_equal AssertionError when exact=False (#11781)
make PyLazyGroupby reusable (#11769)
only exclude final output names of group_by key expressions (#11768)
Fix subsecond parsing in timedelta conversions (#11759)
fix ambiguity wrt list aggregation states (#11758)
Correctly process subseconds in pl.duration (#11748)
use actual number of read rows for hive materialization (#11690)
return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
fix seg fault in concat_str of empty series (#11704)
fix sort_by regression (#11679)
Fix match on last item for join_asof with strategy="nearest" (#11673)

🛠️ Other improvements

Bump lint dependencies (#11802)
Minor updates to assertion utils and docstrings (#11798)
Remove unused _to_rust_syntax util (#11795)
Minor tweak in code example in section Coming from Pandas (#11764)
Fix Exception module paths (#11785)
Rename IntegralType to IntegerType (#11773)
more granular polars-ops imports (#11760)
Link to expand_selector in user guide (#11722)
Add parametric test for df.to_dict/series.to_list (#11757)
Minor fix in code example in section Coming from Pandas (#11745)
Move tests for group_by_dynamic into one module (#11741)
Update group_by_dynamic example (#11737)
reorder pl.duration arguments (#11641)
remove default features from some crates (#11680)
*_horizontal dependent on reduce_expr to expression architecture (#11685)
clarify that median is equivalent to the 50% percentile shown in describe metrics (#11694)
update rustc and fix future (#11696)
Publish release after uploading assets (#11686)
upgrade pyo3 to 0.20.0 (#11683)
better align help command output following addition of some longer options (#11681)
sum_horizontal to expression architecture (#11659)
add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
improve rank implementation, especially around nulls (#11651)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @cmdlineluser, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @rancomp, @reswqa, @ritchie46, @romanovacca, @sd2k, @stinodego, @svaningelgem and @thomasjpfan

polars - Python Polars 0.19.8

Published by github-actions[bot] about 1 year ago

🏆 Highlights

Enable additional flags for x86-64 wheels (#11487)

⚠️ Deprecations

Rename .list.lengths and .str.lengths (#11613)
Deprecate default value for radix in parse_int (#11615)
Rename write_csv parameter quote to quote_char (#11583)

🚀 Performance improvements

actually use projection information in async parquet reader (#11637)
improve performance and fix panic in async parquet reader (#11607)
use try_binary_elementwise over try_binary_elementwise_values (#11596)
skip empty chunks in concat (#11565)
improve sparse sample performance (#11544)

✨ Enhancements

Standardize error message format (#11598)
allow coalesce in streaming (#11633)
Implement schema, schema_override for pl.read_json with array-like input (#11492)
add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
improve performance and fix panic in async parquet reader (#11607)
add time_unit argument to duration, default to "us" (#11586)
support read_database options passthrough to the underlying connection's execute method (enables parameterised SQL queries, etc) (#11562)
elide overflow checks on i64 (#11563)
add INITCAP string function for SQL (#9884)

🐞 Bug fixes

Fix input replacement logic for slice (#11631)
slice expr can be taken in cse (#11628)
ensure nested logical types are converted to physical (#11621)
correctly convert nullability of nested parquet fields to arrow (#11619)
improve performance and fix panic in async parquet reader (#11607)
normalize filepath in sink_parquet (#11605)
parse time unit properly in pl.lit (#11573)
expand all literals before group_by (#11590)
fix as_dict with include_key=False for partition_by (#9865)
mark take_group_last function as unsafe (#11587)
handle unary operators applied to numbers used in SQL IN clauses (#11574)
Align new_columns argument for scan_csv and read_csv (#11575)
Add initialization support for python Timedeltas (#11566)
incomplete reading of list types from parquet (#11578)
respect identity in horizontal sum (#11559)
bug in BitMask::get_u32 (#11560)
take slice into account in parallel unions (#11558)
correct schema empty df in hive partitioning read (#11557)
ensure ListChunked::full_null uses physical types (#11554)
respect 'hive_partitioning' argument in parquet (#11551)
fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
fix comparing tz-aware series with stdlib datetime (#11480)
catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
rework SQL join constraint processing to properly account for all USING columns (#11518)

🛠️ Other improvements

Improved user guide for cloud functionality (#11646)
Improve some docstrings (#11644)
Disable clippy lint "too many arguments" for py-polars (#11616)
Make backwardfill and forwardfill function expr non-anonymous (#11630)
Make all expr in dt namespace non-anonymous (#11627)
Fix changelog for language-specific breaking changes (#11617)
Make value_counts and unique_counts function expr non-anonymous (#11601)
Make arg_min(max), diff in list namespace non-anonymous (#11602)
Rename write_csv parameter quote to quote_char (#11583)
improve struct documentation (#11585)
Remove **kwargs from LazyFrame.collect() (#11567)
use a generic consistent total ordering, also for floats (#11468)
fix lints (#11555)
Remove toolchain specification workaround (#11552)
Trigger Python release from Actions workflow dispatch (#11538)
Enable additional flags for x86-64 wheels (#11487)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @TheDataScientistNL, @alexander-beedie, @andysham, @c-peters, @jhorstmann, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @romanovacca, @stinodego and @svaningelgem

polars - Python Polars 0.19.7

Published by github-actions[bot] about 1 year ago

🏆 Highlights

Postfix rolling expression as a special case of window functions. (#11445)
Use IPC for (un)pickling dataframes/series (#11507)

🚀 Performance improvements

early return in replace_time_zone if target and source time zones match (#11478)
greatly improve parquet cloud reading (#11479)
ensure we download row-groups concurrently. (#11464)

✨ Enhancements

support left and right anti/semi joins from the SQL interface (#11501)
Add left_on and right_on parameters to df.update (#11277)
expressify peak_min/peak_max (#11482)
IN(subquery) and SQL Subquery Infrastructure (#11218)
add ODBC connection string support to read_database (#11448)
postfix rolling expression as a special case of window functions. (#11445)
allow for "by" column to be of dtype Date in rolling_* functions (#11004)
rework ColumnFactory to additionally support tab-complete for col in IPython (#11435)

🐞 Bug fixes

literal hash (#11508)
Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
correct output schema of hive partition and projection at scan (#11499)
correct projection pushdown in hive partitioned read (#11486)
fix for write_csv when using non-default "quote" char (#11474)
fix deserialization of parquets with large string list columns causing stack overflow (#11471)
enable read_database fallback for Snowflake warehouses/connections that don't support Arrow resultsets (#11447)
Fix SQL ANY and ALL behaviour (#10879)
partially address some PyCharm tooltip/signature issues with decorated methods (#11428)
address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)

🛠️ Other improvements

minor changes in peak-min/max (#11491)
align cloud url regex in rust and python (#11481)
Test sdist before releasing (#11494)
Unpin maturin version, fix release workflow (#11483)
More release workflow refactor (#11472)
Set some env vars for release (#11463)
move repeat_by to polars-ops (#11461)
upgrade to nightly-10-02 (#11460)
Update contributing guide to include memory requirement (#11458)
add missing docs entry for rolling (#11456)
use with_columns in shift examples (#11453)
Add wheels as assets to GitHub release (#11452)
Build more wheels for polars-lts-cpu/polars-u64-idx (#11430)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @ritchie46, @romanovacca, @stinodego, @svaningelgem and Romano Vacca

polars - Python Polars 0.19.6

Published by github-actions[bot] about 1 year ago

🚀 Performance improvements

don't load N metadata files when globbing N files (#11422)

🐞 Bug fixes

raise on invalid sort_by group lengths (#11423)
fix outer join on bools (#11417)
fix categorical collect (#11414)
fix opaque python reader schema (#11412)
async parquet. (#11403)
Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
handle ambiguous datetimes in pl.lit (#11386)
fix panic in hive read of booleans (#11376)

🛠️ Other improvements

Split Python release into build / release jobs (#11421)
Refactor Python release workflow (#11382)
clarify use of "batch_size" for read_database (#11377)
large windows runner for release (#11370)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @bowlofeggs, @c-peters, @jonashaag, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.5

Published by github-actions[bot] about 1 year ago

🚀 Performance improvements

remove double memcopy (#11365)
address perf regression (#11354)

🐞 Bug fixes

revert invalid runtime check (#11363)
more cloud urls (#11361)
ensure cloud globbing can deal with spaces (#11360)
recognize more cloud urls (#11357)

🛠️ Other improvements

Disable version warning banner for now (#11359)
Fix error message reference to infer_schema_length (#11358)
Mark some tests as slow (#11350)
improve parametric tests for group_by_rolling by skipping overflowing cases (#11286)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @jonashaag, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.19.4

Published by github-actions[bot] about 1 year ago

🏆 Highlights

support 'hive partitioning' aware readers (#11284)
natively support reading parquet for aws, gcp and azure (#11210)
Add support for Iceberg (#10375)
The great expressification by @reswqa (#11320, #11344, #11313, #11257, #11288, #11275, #11197, #11167, #11155)

⚠️ Deprecations

Add disable_string_cache (#11020)

🚀 Performance improvements

improve dynamic_groupby_iter (#11341)
improve and fix rolling windows by linear scanning (#11326)
faster init from pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
improve outer join materialization (#11241)
use ryu and itoa for primitive serialization (#11193)
use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
Using cache for str.contains regex compilation (#11183)

✨ Enhancements

introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
Expressify list.shift (#11320)
top_k and bottom_k support passing an expr (#11344)
add "pyxlsb" engine support to read_excel (for excel binary workbook files) (#11248)
support 'hive partitioning' aware readers (#11284)
str.strip_chars supports taking an expr argument (#11313)
sample n can take an expr (#11257)
Add disable_string_cache (#11020)
clip supports expr arguments and physical numeric dtype (#11288)
Introduce list.drop_nulls (#11272)
str.splitn and split_exact can take an expr argument by (#11275)
introduce ambiguous option for dt.round (#11269)
Adds NULLIF and COALESCE SQL functions (#11124)
better tree-formatting representation (#11176)
natively support reading parquet for aws, gcp and azure (#11210)
Expressify str.strip_prefix & suffix (#11197)
Add support for Iceberg (#10375)
list.join's separator can be expression (#11167)
argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
don't panic on multi-nodes in streaming conversion (#11343)
ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
clarify has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
fix empty Series construction edge-case with Struct dtype (#11301)
DataFrame init from collections.namedtuple values (#11314)
Exclude functools wrapper frames in find_stacklevel (#11292)
set partitions independent of thread pool (#11304)
address VSCode issue with autocomplete on selector expressions in editor/console (#11235)
consume duplicates in rolling_by window (#11261)
handle url encoded paths in objectpath creation (#11240)
use POOL when writing csv (#11222)
don't conflate saved Config JSON string with file path (#11098)
is_in for bool evaluate has_false incorrectly (#11217)
improve handling of database drivers that can return arrow data (#11201)
fix nullable filter mask in group_by (#11207)
replace n-th in filter (#11206)
fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
address unexpected expression name from use of unary - or + operators (#11158)
impl hash for more function expr (#11182)
list.join's separator can be expression (#11167)
Add some missing expr type hint for series (#11171)
consistently use negative every as the default for offset in group_by_dynamic (#11164)
Make pl.struct serializable (#11169)
only raise on actual parameter collision when "dtypes" specified in read_excel "read_csv_options" (#11162)
propagate null value for str/binary starts/ends_with and contains (#11141)

🛠️ Other improvements

simplify/clarify group_by_dynamic examples (#11335)
tighten assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
unify display for namespaced function expr (#11342)
add lazy pivot example (#11325)
Use GITHUB_TOKEN to get contributor information for docs (#11321)
Enable version warning banner (#11322)
cross-reference null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
Pin pydantic in dev requirements <2.4.0 (#11312)
remove default auto-explode for map_many_private (#11270)
Add type alias IntoExprColumn (#11296)
update a few dependencies (#11283)
Properly skip ADBC test (#11282)
Fix some minor Makefile issues (#11276)
update sponsors (#11271)
parametric tests for group_by_rolling (#11262)
Make some list function expr non-anonymous (#11230)
Mention the performant feature only once (#11223)
remove unneeded indirection (#11233)
remove unneeded mutex around object-store (#11224)
clarify every/period/offset in group_by_dynamic (#11175)
Fix read_database batch_size docstring (#11132)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303

polars - Rust Polars 0.33

Published by github-actions[bot] about 1 year ago

🏆 Highlights

implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

empty product returns identity (#10842)
return f64 for rank when method="average" (#10734)
Rename groupby to group_by (#10654)
Read/write support for IPC streams in DataFrames (#10606)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
remove fixed_seed and add pl.set_random_seed (#10388)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
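
As background for the all/any entry above (#10564), Kleene (three-valued) logic treats a null as "unknown": the result is only definite when the known values already decide it. The following is a minimal pure-Python sketch of that rule, with None standing in for null. The function names are hypothetical and this is not the Polars implementation; whether Polars applies this behaviour by default or behind an option is not stated in these notes.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'all': a single False decides the result; otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # one False is enough, regardless of nulls
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'any': a single True decides the result; otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # one True is enough, regardless of nulls
        if v is None:
            saw_null = True
    return None if saw_null else False
```

Under this rule, `kleene_all([True, None])` is unknown (None) because the missing value could be False, while `kleene_all([False, None])` is False no matter what the missing value is.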

⚠️ Deprecations

Rename is_first/last to is_first/last_distinct (#11130)
Rename count_match to count_matches (#11028)
Rename strip to strip_chars (#10813)
Add datetime_range expression function (#10213)
Rename Series/Expr.rolling_apply to rolling_map (#10750)

🚀 Performance improvements

improve performance of fast projection (#10945)
parse time zones outside of downcast_iter() in replace_time_zone (#10713)
use binary abstraction for atan2 (#10588)
use binary abstraction in pow (#10562)

✨ Enhancements

Expressify str.split argument. (#11117)
Expressify argument of binary contains (#11091)
dt.offset_by supports broadcasting lhs (#11095)
Expressify argument of binary starts_with and ends_with (#11076)
json_extract supports extract static and string value to list dtype (#11057)
add quote_style="never" option for write_csv (#11015)
add support for nextest (#11048)
Add literal for str count_match (#10996)
More dtypes support cast to list (#11025)
ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
Add strip_prefix and strip_suffix to the string namespace (#10958)
Add datetime_range expression function (#10213)
add proper cache for Regex compilation (#10934)
implementation of array_to_string (#10839)
apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
accept expr in str.count_match (#10900)
accept expressions in .offset_by (#9967)
implement drop as special case of select (#10885)
Supports is_last operation (#10760)
activate cse for group_by (again) (#10749)
add pairwise float sum implementation (#10756)
implementing sink_csv for LazyFrame (#10682)
Supports series unique & arg_unique & n_unique for list (#10743)
repeat_by should also support broadcasting of LHS (#10735)
deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
is_first also supports numeric list type. (#10727)
improve slice pushdown in unions (#10723)
Support min and max strategy for binary & str columns fill null (#10673)
support broadcasting in list set operations (#10668)
add truncate_ragged_lines (#10660)
supports cast to list (#10623)
Rename groupby to group_by (#10654)
preserve whitespace in notebook output (#10644)
Read/write support for IPC streams in DataFrames (#10606)
improve binary (arity) generics (#10622)
propagate null in is_in and more generic array construction (#10614)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
frame-level cast support (#10504)
Add failed column to cast exception (#10507)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
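
The "pairwise float sum" entry above (#10756) refers to a general summation technique: split the input in half, sum each half recursively, and add the two partial sums, which keeps rounding error growth much lower than a single left-to-right accumulation. The sketch below shows the textbook form in pure Python. The function name and the block size of 128 are assumptions for illustration, not details taken from the Polars implementation.

```python
import math
from typing import Sequence


def pairwise_sum(xs: Sequence[float], block: int = 128) -> float:
    """Sum floats by recursive halving; small blocks are summed with a plain loop."""
    n = len(xs)
    if n <= block:
        total = 0.0
        for x in xs:
            total += x
        return total
    mid = n // 2
    # Each half is summed independently, so rounding errors do not pile up
    # in one ever-growing accumulator.
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)


values = [0.1] * 10_000
approx = pairwise_sum(values)
reference = math.fsum(values)  # correctly rounded sum, used only as a reference
```

The slicing here copies data and is for clarity only; a production version would work on index ranges instead.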

🐞 Bug fixes

Correct hash and fmt for struct expr (#11119)
enforce sortedness of by argument in rolling_* functions (#11002)
Filter on empty objectChunked should not throw error (#11073)
ensure null_count statistics accounts for null array (#11070)
toggle off cse if ext_context is used (#11051)
Correct field dtype of string concat (#11055)
pushed-down expr should be considered when evaluating ExternalContext (#11023)
fix rolling_* functions when "by" has nanosecond resolution (#11005)
Don't reuse member for Selector::Add (#11026)
fix the construction of List<Null> (#10969)
allow singular null in regex pattern (#10948)
compute length of null array in explode (#10946)
Allow exactly one value in start/end for int_range (#10914)
count was falsely tagged as cse in group by (#10917)
Retain original dtype when deserializing an empty list (#10893)
CSE don't accept opaque functions (#10905)
Make int_range(s) exclusive on the upper bound when step is negative (#10898)
fix conversion from decimal to float (#10776)
Add broadcasting for list comparisons (#10857)
don't overflow length before checking limit (#10883)
fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
tag amortized iter unsafe and add safe alternatives (#10881)
use pool in dataframe arithmetic (#10864)
remove debug println! from datetime fn (#10862)
repair polars_err string interpolation (#10863)
make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
empty product returns identity (#10842)
never panic in hash/equality doesn't hold in cse (#10836)
Improve bound checks on temporal ranges (#10837)
var/std behavior around few elements (#10828)
Fix division-by-zero error when reading an empty csv in streaming mode (#10819)
fix equality of quantile aggregation node (#10816)
Reading an only-header csv file in streaming mode should not panic (#10810)
get_single_leaf can't handle Expr::Count (#10790)
string to decimal parsing (#10712)
support groupby literal in streaming (#10771)
ORDER BY on unselected columns (#10752)
Fix is_in cannot cast list type for float (#10769)
fix unicode truncation in json parsing (#10761)
Error message of list unique should not display inner type (#10748)
create chunks_mut entry in vtable (#10745)
Prevent panic on sample_n with replacement from empty df (#10731)
only preserve sortedness flag in replace_time_zone when safe (#10738)
Error on value_counts on column named "counts" (#10737)
Build Series from empty Series vector (#10558)
return f64 for rank when method="average" (#10734)
Keep min/max and arg_min/arg_max consistent. (#10716)
Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
Cast small int type when scan csv in streaming mode. (#10679)
Reused input series in rolling_apply should not be orderly (#10694)
re-sort buffer when update window swap the whole buffer (#10696)
Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
AllHorizontal format string (#10658)
List<null> chunked builder should take care of series name (#10642)
respect 'ignore_errors=False' in csv parser (#10641)
fix rename + projection pushdown (#10624)
fix int/float downcast in is_in (#10620)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
Fix serialization for categorical chunked. (#10609)
join_asof missing tolerance implementation, address edge-cases (#10482)
Take input_schema to create physical expr for Selection (#10571)
fix serialization of empty lists (#10563)
Clear window cache after evaluate predication expr (#10505)
Parsing regex col in Expr::Columns (#10551)
sanitize column naming in boolean ops (#10531)
fix build for wasm (#10536)
remove fixed_seed and add pl.set_random_seed (#10388)
fix build for wasm (#9502)
rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

Removed duplicated example (#11109)
Add CODEOWNERS for docs folder (#11107)
Refactor starts_with and ends_with for string (#11085)
Integrate user guide (#11089)
remove feature gate join/groupby in polars-core (#10965)
Add Documentation issue type (#11042)
complete intra-docs in api documentation (#11007)
genericize take implementation (#10976)
genericize PolarsDataType (#10952)
enhance internal crates readme with reference to main crate (#10928)
Add Duration method for checking full days (#10850)
apply with_name in more places (#10899)
never compare opaque functions (#10906)
eliminate repetition in utf8 datetime functions (#10860)
Fix issue templates for bug reports (#10896)
remove LocalProjection (#10886)
request verbose logging output of minimal reproducible examples (#10882)
Reorganize range expression module (#10871)
introduce with_name for Series/ChunkedArray (#10859)
Further refactor temporal range functions (#10844)
Refactor range related functions (#10830)
Fix the non-compiling black-box function parts in the polars lazy cookbook (#10809)
Fix some broken links / formatting (#10772)
Improve docs for polars-lazy (#10729)
update rustc nightly_2023-08-26 (#10467)
default to rust native flate2 lib (#10733)
Clear GitHub Actions caches weekly (#10715)
move 'is_in' to polars-ops (#10645)
Clean up schema calculation for date_range (#10653)
remove unused apply functions and add fallible generic apply functions (#10621)
Enforce up-to-date Cargo.lock (#10555)
make binary chunkedarray functions DRY (#10607)
bump MSRV to 1.65 (#10568)
genericize chunk implementation (#10506)
use ChunkArray::(try_)from_chunk_iter (#10497)
add VSCode rust-analyzer settings (#10498)
Update URLs for dev documentation (#10495)
Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj

polars - Python Polars 0.19.3

Published by github-actions[bot] about 1 year ago

🏆 Highlights

Polars plugins (#10924)

⚠️ Deprecations

Rename is_first/last to is_first/last_distinct (#11130)
Rename count_match to count_matches (#11028)
Rename strip to strip_chars (#10813)
Add datetime_range expression function (#10213)

🚀 Performance improvements

optimize _unpack_schema() (#11080)
optimize polars.utils._post_apply_columns() (#11086)
optimize polars.utils._post_apply_columns() (#11041)
optimize _unpack_schema() (#10960)
improve performance of fast projection (#10945)

✨ Enhancements

Expressify str.split argument. (#11117)
Polars plugins (#10924)
better async_collect (#10912)
Expressify argument of binary contains (#11091)
dt.offset_by supports broadcasting lhs (#11095)
Expressify argument of binary starts_with and ends_with (#11076)
add OpenOffice spreadsheet support via new pl.read_ods function (#11011)
json_extract supports extract static and string value to list dtype (#11057)
add quote_style="never" option for write_csv (#11015)
Add literal for str count_match (#10996)
More dtypes support cast to list (#11025)
Add strip_prefix and strip_suffix to the string namespace (#10958)
improve read_excel table data identification (#10953)
Add from_dataframe fast path and improve typing (#10979)
add openpyxl as a new/optional engine for read_excel (#6183)
Add datetime_range expression function (#10213)

🐞 Bug fixes

Correct hash and fmt for struct expr (#11119)
enforce sortedness of by argument in rolling_* functions (#11002)
Make Series.__getitem__ raise an IndexError (#11061)
Filter on empty objectChunked should not throw error (#11073)
ensure null_count statistics accounts for null array (#11070)
toggle off cse if ext_context is used (#11051)
Correct field dtype of string concat (#11055)
fix partial schema init with read_dicts and reduce latency of small-frame creation (#11047)
pushed-down expr should be considered when evaluating ExternalContext (#11023)
fix rolling_* functions when "by" has nanosecond resolution (#11005)
Don't reuse member for Selector::Add (#11026)
ensure series_equal properly accounts for dtypes when strict=True (#11012)
fix the construction of List<Null> (#10969)
write_excel "hidden_columns" parameter fails when taking a selector (#10987)
allow singular null in regex pattern (#10948)
compute length of null array in explode (#10946)

🛠️ Other improvements

remove low contrast coloring from visited links (#11133)
Ignore matplotlib warning (#11129)
Do not run user guide examples by default (#11128)
Ignore matplotlib mypy warnings (#11126)
Add deprecation message in groupby docs (#11121)
Removed duplicated example (#11109)
Add CODEOWNERS for docs folder (#11107)
Refactor starts_with and ends_with for string (#11085)
Integrate user guide (#11089)
remove mentions of the deprecated random module (#11087)
simplify SchemaDefinition type alias (#11077)
put fetch explanation in a "notes" block to better highlight it in the docs (#11058)
remove feature gate join/groupby in polars-core (#10965)
Add Documentation issue type (#11042)
warn that "by" argument must be sorted for results to be correct in rolling_* functions (#11013)
Adds missing method refs in LazyDataFrame API docs (#11027)
Add lint for boolean trap (#11010)
Add private LazyFrame method for setting sink optimizations (#10988)
Enable a few more ruff lints (#10998)
document polars string duration language in temporal range functions (#10978)
Additional tests for interchange get_data_buffer (#10966)
genericize PolarsDataType (#10952)
Document that filter, drop_nulls, left join preserve order (#10955)
add note about adbc flight sql driver (#10949)
Revert pydantic >= 2.0.0 requirement (#10944)
note that pl.duration represents fixed durations, point to offset_by for non-fixed (#10927)
Test S3 functionality using moto server (#10164)

Thank you to all our contributors for making this release possible!
@I8dNLo, @KacpiW, @MarcoGorelli, @Object905, @Qqwy, @TNieuwdorp, @alexander-beedie, @antoniocali, @bvanelli, @cjackal, @henrikig, @jakob-keller, @mrogowski11, @nameexhaustion, @orlp, @reswqa, @ritchie46, @s-banach, @stinodego, @svaningelgem and @thomasjpfan

polars - Python Polars 0.19.2

Published by github-actions[bot] about 1 year ago

🏆 Highlights

Add syntactic sugar for col("foo") -> col.foo (#10874)

⚠️ Deprecations

Rename Expr.is_not() to not_() (#10838)

✨ Enhancements

allow individual Config options to be easily reset to their default value (#10922)
accept expr in str.count_match (#10900)
allow additional glimpse customisation, fix strings repr (#10895)
accept expressions in .offset_by (#9967)
support schema overrides for frames created from databases (#10884)
Add syntactic sugar for col("foo") -> col.foo (#10874)
support negative indexing in set_at_idx (#10891)
implement drop as special case of select (#10885)
raise a more helpful error when non-query statements passed to read_database (#10851)

🐞 Bug fixes

Allow exactly one value in start/end for int_range (#10914)
raise error when function didn't receive any inputs (#8635)
count was falsely tagged as cse in group by (#10917)
CSE don't accept opaque functions (#10905)
Make int_range(s) exclusive on the upper bound when step is negative (#10898)
don't overflow length before checking limit (#10883)
fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
use pool in dataframe arithmetic (#10864)
repair polars_err string interpolation (#10863)
make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
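
The int_range fix above (#10898) concerns the half-open convention: the upper bound stays excluded even when the step is negative. A minimal sketch of that convention using Python's built-in range as an analogy; int_range_sketch is a hypothetical helper written for illustration and is not the polars implementation:

```python
from __future__ import annotations


def int_range_sketch(start: int, end: int, step: int = 1) -> list[int]:
    """Half-open integer range: `end` is never included, whatever the sign of `step`."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Python's built-in range already follows the half-open convention.
    return list(range(start, end, step))


ascending = int_range_sketch(0, 3)
descending = int_range_sketch(3, 0, -1)
print(ascending)   # [0, 1, 2]
print(descending)  # [3, 2, 1]
```

With a negative step the sequence counts down from start and stops before reaching end, so 0 is absent from the second result.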

🛠️ Other improvements

Set minimum version for pydantic to 2.0.0 (#10923)
fix and clarify docs for Expr.map_elements (#10647)
fix rendering of bullet points in dt.round (#10911)
add test for 10875 (#10913)
apply with_name in more places (#10899)
never compare opaque functions (#10906)
eliminate repetition in utf8 datetime functions (#10860)
Fix issue templates for bug reports (#10896)
request verbose logging output of minimal reproducible examples (#10882)
add a note about read_database connection/cursor behaviour (#10873)
introduce with_name for Series/ChunkedArray (#10859)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @MarcoGorelli, @alexander-beedie, @c-peters, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @jeroenjanssens, @orlp, @ritchie46, @stinodego and @wdoppenberg

polars - Python Polars 0.19.1

Published by github-actions[bot] about 1 year ago

💥 Breaking changes

empty product returns identity and product ignores nulls (#10842)
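
The semantics named in this breaking change can be sketched in plain Python, with None standing in for a null. This is only an illustration of "empty product returns the multiplicative identity, nulls are skipped"; product_ignoring_nulls is a hypothetical helper, not a polars function:

```python
import math
from typing import Iterable, Optional


def product_ignoring_nulls(values: Iterable[Optional[float]]) -> float:
    """Multiply all non-null values; an empty input yields the identity, 1."""
    # math.prod returns its start value (1) when the iterable is empty.
    return math.prod(v for v in values if v is not None)


print(product_ignoring_nulls([2, None, 3]))  # 6
print(product_ignoring_nulls([]))            # 1
print(product_ignoring_nulls([None, None]))  # 1
```

Code that previously relied on a null result for an empty or all-null input would need an explicit check under these semantics.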

✨ Enhancements

add binary, boolean, categorical, date, object, and time selectors (#10806)
Supports is_last operation (#10760)
minor typing improvement for DataFrame.__iter__ (#10825)
Add custom error for allow_copy=False (#10822)

🐞 Bug fixes

empty product returns identity (#10842)
never panic if hash/equality doesn't hold in cse (#10836)
Improve bound checks on temporal ranges (#10837)
var/std behavior around few elements (#10828)
Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
behaviour of reversed(df) (#10823)
fix equality of quantile aggregation node (#10816)
Reading an only-header csv file in streaming mode should not panic (#10810)

🛠️ Other improvements

Refactor range related functions (#10830)
map-related docstring updates (#10779)
Move sink tests to streaming module (#10821)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @orlp, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.19.0

Published by github-actions[bot] about 1 year ago

An upgrade guide is available on our website.

🏆 Highlights

implementing sink_csv for LazyFrame (#10682)
Support DataFrame init from queries against users' existing database connections (#10649)
Rename groupby to group_by (#10656)

💥 Breaking changes

return f64 for rank when method="average" (#10734)
Update a lot of error types (#10637)
Remove deprecated behavior from vertical aggregations (#10602)
Read/write support for IPC streams in DataFrames (#10606)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
Improve consistency of parsing expression input (#9512)
allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
remove fixed_seed and add pl.set_random_seed (#10388)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
Improve some error types and messages (#10470)
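
The all/any entry above refers to Kleene (three-valued) logic. A minimal plain-Python sketch of that logic, with None standing in for null; kleene_any and kleene_all are hypothetical helpers, and whether polars applies these rules by default or behind a parameter is not stated in these notes:

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; otherwise null if a null was seen; otherwise False."""
    saw_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; otherwise null if a null was seen; otherwise True."""
    saw_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            saw_null = True
    return None if saw_null else True


print(kleene_any([False, None]))  # None
print(kleene_all([True, None]))   # None
print(kleene_all([False, None]))  # False
```

The null result means "unknown": a missing value could have changed the answer, so neither True nor False can be asserted.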

⚠️ Deprecations

Rename map to map_batches (#10801)
Rename GroupBy.apply to map_groups (#10799)
Rename DataFrame.apply to map_rows (#10797)
Rename Series/Expr.rolling_apply to rolling_map (#10750)
Rename Series/Expr.apply to map_elements (#10678)
Rename groupby to group_by (#10656)
Deprecate some parameters of cut/qcut (#10484)
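
For migration scripts, the renames listed above can be collected into a lookup table. A small sketch: the old and new names are taken verbatim from this list, while RENAMES and suggest_rename are hypothetical names introduced only for illustration:

```python
from typing import Optional

# Old name -> new name, copied from the 0.19.0 deprecation entries above.
RENAMES = {
    "map": "map_batches",
    "GroupBy.apply": "map_groups",
    "DataFrame.apply": "map_rows",
    "Series/Expr.rolling_apply": "rolling_map",
    "Series/Expr.apply": "map_elements",
    "groupby": "group_by",
}


def suggest_rename(old_name: str) -> Optional[str]:
    """Return the replacement for a deprecated name, or None if it is not listed."""
    return RENAMES.get(old_name)


print(suggest_rename("groupby"))          # group_by
print(suggest_rename("DataFrame.apply"))  # map_rows
print(suggest_rename("select"))           # None
```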

🚀 Performance improvements

parse time zones outside of downcast_iter() in replace_time_zone (#10713)
use binary abstraction for atan2 (#10588)
use binary abstraction in pow (#10562)

✨ Enhancements

activate cse for group_by (again) (#10749)
implementing sink_csv for LazyFrame (#10682)
Supports series unique & arg_unique & n_unique for list (#10743)
repeat_by should also support broadcasting of LHS (#10735)
deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
is_first also supports numeric list type. (#10727)
improve slice pushdown in unions (#10723)
Explicitly implement Protocol for interchange classes (#10688)
Support min and max strategy for binary & str columns fill null (#10673)
support broadcasting in list set operations (#10668)
csv: add schema argument (#10665)
Support DataFrame init from queries against users' existing database connections (#10649)
add truncate_ragged_lines (#10660)
supports cast to list (#10623)
Update a lot of error types (#10637)
preserve whitespace in notebook output (#10644)
Remove deprecated behavior from vertical aggregations (#10602)
support selector usage in write_excel arguments (#10589)
Add LazyFrame.collect_async and pl.collect_all_async (#10616)
Read/write support for IPC streams in DataFrames (#10606)
propagate nulls in is_in and more generic array construction (#10614)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
frame-level cast support (#10504)
Improve consistency of parsing expression input (#9512)
Add failed column to cast exception (#10507)
allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
Remove deprecated get_idx_type - use get_index_type instead (#10556)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
Improve some error types and messages (#10470)
suggest str.to_datetime instead of apply and stdlib strptime (#10266)

🐞 Bug fixes

get_single_leaf can't handle Expr::Count (#10790)
support groupby literal in streaming (#10771)
ORDER BY on unselected columns (#10752)
Fix is_in cannot cast list type for float (#10769)
whitespace CSS in Notebook HTML updated to use pre-wrap instead of pre (#10739)
only preserve sortedness flag in replace_time_zone when safe (#10738)
Error on value_counts on column named "counts" (#10737)
return f64 for rank when method="average" (#10734)
Keep min/max and arg_min/arg_max consistent. (#10716)
use time zone from dtype to overwrite output time zone when initialising Series (#10689)
Cast small int type when scan csv in streaming mode. (#10679)
raise exception with invalid on arg type for join_asof (#10690)
Reused input series in rolling_apply should not be orderly (#10694)
re-sort buffer when update window swap the whole buffer (#10696)
Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
Correctly handle time zones in write_delta (#10633)
fix apply for empty series in threading mode (#10651)
respect 'ignore_errors=False' in csv parser (#10641)
fix rename + projection pushdown (#10624)
fix int/float downcast in is_in (#10620)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
Fix serialization for categorical chunked. (#10609)
Take input_schema to create physical expr for Selection (#10571)
Clear window cache after evaluate predication expr (#10505)
Parsing regex col in Expr::Columns (#10551)
sanitize column naming in boolean ops (#10531)
Fix write_delta with schema in delta_write_options (#10541)
remove fixed_seed and add pl.set_random_seed (#10388)
respect pl.Config options relating to shape, column names, and types when rendering HTML (#10449)

🛠️ Other improvements

update cargo.lock (#10800)
Create .venv in repo root (#10789)
refactored write_database unit tests to properly separate concerns (#10773)
Fix some broken links / formatting (#10772)
Document chained when-then behaviour more prominently (#10759)
Fix test failing due to new adbc release (#10763)
Unpin connectorx and bump other Python dependencies (#10753)
add note to testing docs about module import (#10741)
Clear GitHub Actions caches weekly (#10715)
Update for new pyarrow 13.0.0 behavior (#10691)
Fix minor issue with sink_parquet docs (#10669)
Remove deprecate_renamed_methods util (#10537)
add "see also" entries to ne/eq_missing and update related examples (#10667)
fix potential memory leak from usage of inspect.currentframe (#10630)
give more relevant example for polars.apply (#10631)
Bump ruff and enable new setting (#10626)
Add docstrings for Expr.meta namespace (#10617)
Enforce up-to-date Cargo.lock (#10555)
deprecate DataFrame.replace (#10600)
ensure that make requirements fully refreshes unpinned packages/deps (#10591)
fix out-of-date explain default parameter (#10566)
Fix expr_dispatch decorator to work on methods with decorators (#10549)
Fix link to source code (#10542)
Add title to index page (#10539)
Disable SIM108 lint (#10519)
Keep versioned docs (#10500)
switch to pyo3/maturin-action (#10503)
Update URLs for dev documentation (#10495)
Skip failing test (#10496)
Add version switcher to API reference (#10488)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj

polars - Python Polars 0.18.15

Published by github-actions[bot] about 1 year ago

🐞 Bug fixes

rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

Mark import timing check as slow (#10487)
Gather all streaming tests (#10485)
Bump maturin to version 1.2.1 (#10479)

Thank you to all our contributors for making this release possible!
@ritchie46 and @stinodego

polars - Rust Polars 0.32.0

Published by github-actions[bot] about 1 year ago

🏆 Highlights

common subexpression elimination (#9632)

💥 Breaking changes

remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)

⚠️ Deprecations

renaming approx_unique as approx_n_unique (#10290)
remove/deprecate cache and its logic (#10066)
Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

pre-alloc int_ranges (#10399)
use hash as CSE Identifier (#10385)
re-use regex capture allocation (#10302) (#10335)
don't parallelize literal expressions (#10321)
fix O(n^2) in sorted check during append (#10241)
speedup mode on sorted data (#10084)
speedup boolean apply (#10073)
shrink alp/lp ~2.5x (#10039)
Remove fused arithmetic from expressions with literals (#10011)

✨ Enhancements

quote style option for csv writer (#10422)
add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
be more permissive on predicate pushdown to left side of left join (#10442)
add use_earliest to to_datetime / strptime (#10426)
{any/all}_horizontal to expression architecture (#10412)
serialize flags (#10140)
allow unaligned pointers in arrow FFI (#10403)
add line_terminator option to write_csv (#10373)
Add is_local and to_local to categorical namespace (#10372)
cse for groupby.agg and reduced cse collisions (#10381)
re-use regex capture allocation (#10302) (#10335)
Add Series.cat.uses_lexical_ordering (#10325)
improve datetime parsing error message (#10332)
allow sequential runners in select/with_columns (#10322)
improve err msg parsing time, date, datetime (#10298)
Add str.extract_groups (#10179)
add extra build profiles (#10268)
Extend datetime expression function with time zone/time unit parameters (#10235)
added gcs to gcp cloud schema in polars-core::cloud #10206. (#10207)
support writing duration type in json (#10112)
inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
Move transpose naming to Rust (#10009)
cse in groupby's (#10062)
Adds sql CASE statement expressions (#10065)
Add date_ranges/time_ranges expression functions (#10005)
comm_subexpr_elim in streaming 'select/with_columns' (#10050)
common subexpression elimination (#9632)
Let qcut create evenly spaced probabilities (#9960)
sorted flag on singletons (#9933)
maintain sorted flag after partition_by (#9944)
keep sorted flag in streaming left join (#9932)
Add cloudpickle for serializing python UDFs (#9921)

🐞 Bug fixes

Fix incorrect handling of VisitRecursion::Skip. (#10452)
fix negative decimal parsing (#10444)
ensure sorted_sink hash equals the default path (#10464)
fix sum agg (#10459)
ensure last aggregation deals with default chunk (#10453)
fix cse input schema (#10450)
fix list groupby of array dtype (#10408)
correct AnyValue::hash (#10391)
finalize cast in partitioned groupby (#10359)
fix oob in 'last' (#10329)
fix categorical lexical sort (#10318)
Fix join validation (#10257)
Set correct dtype for .extract_groups() (#10306)
clear window cache and run windows on proper runners (#10303)
fix sorted fast path in streaming groupby wrt nulls (#10289)
fix nan aggregation in groupby (#10287)
check dtypes of single-column 'by' parameter in asof-join (#10284)
fix pyo3 link errors on macos (#10256)
fix empty streaming parquet file (#10252)
fix logical columns of streaming multi-column sort (#10250)
fix date/datetime parsing for short inputs with exact=False (#10231)
correct agg_sum for ChunkedArray. (#10243)
don't panic in wildcard apply (#10240)
fix cse profile (#10239)
correct struct null counts (#10142)
no cse in groupby until fixed (#10216)
fix is_in on empty series (#10195)
fix cse windows (#10197)
block predicate pushdown is_in and null producing … (#10194)
prevent re-ordering of dict keys inside .apply (#10172)
initialize fixed null values (#10192)
ensure window function run partitioned when cse is hit (#10170)
adjust for null values in str.replace fast path (#10132)
clear bit settings in list iteration (#10131)
use row-encoded for struct::is_sorted (#10129)
don't run file-caching in streaming mode (#10117)
Allow initialization of pl.Array in DataFrame using schema alone (#10100)
don't panic if masked out values are invalid in temporal kernels (#10114)
Fix struct get field by index out of bounds error. (#10097)
fix ub in simd-json (#10093)
fix invalid access when groupby rolling produces empty sets (#10109)
respect null_on_oob=False in list.take when pa… (#10105)
fix is_sorted for structs (#10099)
add file path to io error in scan_csv (#10076)
fix false positive in parquet stats evaluation (#10087)
fix error message from cast-timezone to replace-time-zone (#10089)
Address .col(regex).exclude() operations not executing. (#10025)
fix Boolean::isin(null values) (#10074)
predicate pushdown #10058 (#10071)
Fix weighted quantile for 0 weights (#10051)
fix incorrect state in projection pushdown with joins (#9987)
don't pass predicates referring to renamed literal… (#9965)
fix regression in regex expansion (#9952)
potential SO in csv infer schema (#9950)
raise on unsupported transpose and object types (#9946)
Fix as-of join when by groups are interleaved (#9938)

🛠️ Other improvements

fix and run polars-plan tests (#10465)
Simplify flag methods (#10429)
match_block_trailing_comma (#10414)
implement ChunkArray::(try_)from_chunk_iter (#10395)
add test for 10401 (#10405)
Bump some dependencies (#10396)
Move dependency version info to workspace level (#10295)
patch reedline until fix released (#10382)
remove wasm-timer dependency (#10347)
write down invariants of ChunkedArray (#10334)
fix typo in lib.rs (#10313)
Exclude examples from workspace default (#10309)
Update CODEOWNERS (#10261)
avoid outputting docs of dependencies (#10292)
Do not keep history in gh-pages branch (#10282)
Use workspace package info / organize dependencies section (#10279)
fix dead links in Rust documentation (#10251)
Fix make pre-commit command (#10205)
Fix make integration-tests command (#10202)
Replace "question" issues with link to Stack Overflow (#10230)
Update dependabot config (#10222)
Fix LICENSE symlink for moved crates (#10150)
Re-organize folder structure for Rust crates (#10141)
update to rustc nightly-2023-07-27 (#10139)
temporarily turn off fail-fast so that ubuntu tests run (#10133)
Refactor when/then/otherwise internals (#9922)
move replace_time_zone to polars-ops (#10078)
remove unneeded branch (#10082)
remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)
fix typo in contribution example (#10038)
correct example in API reference (#10032)
add developer contribution examples (#10013)
Update autolabeler again (#9984)
fix docs build and add to CI (#9904)
Minor makeover for Rust Makefile (#9874)

Thank you to all our contributors for making this release possible!
@0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj

polars - Python Polars 0.18.14

Published by github-actions[bot] about 1 year ago

🏆 Highlights

Native implementation of dataframe interchange protocol (#10267)

⚠️ Deprecations

Deprecate behavior of list/tuple inputs for lit (#10461)

🚀 Performance improvements

optimise retrieval of values from df.item (~4-5x speedup) (#10411)
pre-alloc int_ranges (#10399)
use hash as CSE Identifier (#10385)

✨ Enhancements

quote style option for csv writer (#10422)
add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
add use_earliest to to_datetime / strptime (#10426)
add new "header_format" option for write_excel (#10392)
{any/all}_horizontal to expression architecture (#10412)
Native implementation of dataframe interchange protocol (#10267)
allow unaligned pointers in arrow FFI (#10403)
add line_terminator option to write_csv (#10373)
add explicit selector variants for signed/unsigned integers (#10384)
Add is_local and to_local to categorical namespace (#10372)
enhance selectors expansion function, so it can operate on a schema as well as a frame (#10341)
Order percentiles in describe (#10378)
cse for groupby.agg and reduced cse collisions (#10381)
improve take_every(0) exception (#10352)
add offset and length to get_ptr (#10361)

🐞 Bug fixes

fix pyarrow write_to_dataset wrt check_not_directory parameter (#10471)
fix negative decimal parsing (#10444)
ensure sorted_sink hash equals the default path (#10464)
address inconsistency in init from square numpy arrays with/without an explicit schema (#10445)
ensure last aggregation deals with default chunk (#10453)
fix cse input schema (#10450)
Fix by argument handling in join_asof (#10447)
fix potential OverflowError in testing asserts with huge UInt64 diffs (#10437)
Create delta compatible schema during writing (#10165)
fix list groupby of array dtype (#10408)
correct AnyValue::hash (#10391)
finalize cast in partitioned groupby (#10359)

🛠️ Other improvements

add vertical_relaxed example for pl.concat (#10472)
Run all streaming tests on the same test runner (#10469)
Organize OOC tests (#10463)
add test for 10417 (#10420)
Clean up some Sphinx settings (#10400)
add test for 10401 (#10405)
Address Ruff per file ignores (#10258)
Small improvement for PySeries.get_buffer (#10363)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @OndrejSlamecka, @alexander-beedie, @c-peters, @cmdlineluser, @drgif, @ion-elgreco, @lfn3, @orlp, @potzenhotz, @rea1bacon, @reswqa, @ritchie46, @stinodego and @zundertj

polars - Python Polars 0.18.13

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

Rename LazyFrame.read/write_json to de/serialize (#10238)
Add categorical_as_str parameter to testing utils (#10350)

🚀 Performance improvements

don't parallelize literal expressions (#10321)

✨ Enhancements

support selectors in additional frame methods (#10255)
Add Series.cat.uses_lexical_ordering (#10325)
utility to get buffers and pointers (#10331)
improve datetime parsing error message (#10332)
add ptr for small integer types (#10330)
add offsets utility (#10328)
allow sequential runners in select/with_columns (#10322)
warn about inefficient apply json.loads if json is local import (#10310)
improve err msg parsing time, date, datetime (#10298)
Add categorical_as_str parameter to testing utils

🐞 Bug fixes

fix oob in 'last' (#10329)
show inefficient apply warning in ipython (#10312)
add cse to no_optimization in profile (#10317)
fix categorical lexical sort (#10318)
Fix join validation (#10257)
Set correct dtype for .extract_groups() (#10306)

Thank you to all our contributors for making this release possible!
@CanglongCl, @JulianCologne, @MarcoGorelli, @alexander-beedie, @cmdlineluser, @eltociear, @orlp, @ritchie46 and @stinodego

polars - Python Polars 0.18.12

Published by github-actions[bot] about 1 year ago

⚠️ Deprecations

renaming approx_unique as approx_n_unique (#10290)
Rename first qcut parameter to quantiles (#10253)
Deprecate avg alias for mean (#10236)

🚀 Performance improvements

fix O(n^2) in sorted check during append (#10241)

✨ Enhancements

Add str.extract_groups (#10179)
raise TypeError for all LazyFrame comparison operators (#10275)
support bytecode translation to map_dict where the lookup key is an expression (#10265)
add entry point to the Consortium DataFrame API (#10244)
Extend datetime expression function with time zone/time unit parameters (#10235)
add "batch_size" to scan_pyarrow_dataset parameters (#10249)

🐞 Bug fixes

clear window cache and run windows on proper runners (#10303)
fix sorted fast path in streaming groupby wrt nulls (#10289)
Fix interchange protocol allowing copy even when allow_copy was set to False (#10262)
fix nan aggregation in groupby (#10287)
don't panic on cse if function hasn't implemented __eq__ (#10286)
fix empty streaming parquet file (#10252)
fix logical columns of streaming multi-column sort (#10250)
fix date/datetime parsing for short inputs with exact=False (#10231)
don't panic in wildcard apply (#10240)
fix cse profile (#10239)

🛠️ Other improvements

Update CODEOWNERS (#10261)
add note about pyarrow partitioning (#10297)
Do not keep history in gh-pages branch (#10282)
make an explicit note in read_parquet and scan_parquet about hive-style partitioning (point to scan_pyarrow_dataset instead) (#10277)
Fix typo in error message (#10281)
Replace "question" issues with link to Stack Overflow (#10230)
Use sphinx' maximum_signature_line_length (#10228)
add warning about parallel eval of .then(..) branches (#10229)
Update Sphinx to 7.1.1 and bump related dependencies (#10221)
Update dependabot config (#10222)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @TLouf, @alexander-beedie, @cmdlineluser, @dependabot, @dependabot[bot], @duvenagep, @mcrumiller, @orlp, @reswqa, @ritchie46 and @stinodego

polars - Python Polars 0.18.11

Published by github-actions[bot] about 1 year ago

🐞 Bug fixes

correct struct null counts (#10142)
no cse in groupby until fixed (#10216)
avoid false positives from multiple RETURN_VALUE ops when checking apply lambdas/functions (#10211)

🛠️ Other improvements

Improve deprecation utils (#10167)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @varunmittal91

polars - Python Polars 0.18.10

Published by github-actions[bot] about 1 year ago

✨ Enhancements

raise a better error message from read_database if not passed a string URI (#10191)
Add pyarrow write_to_dataset to write_parquet function (#9835)

🐞 Bug fixes

fix is_in on empty series (#10195)
fix cse windows (#10197)
block predicate pushdown is_in and null producing … (#10194)
prevent re-ordering of dict keys inside .apply (#10172)
initialize fixed null values (#10192)
Don't pickle _scan_impl (#10175)
ensure window function run partitioned when cse is hit (#10170)

🛠️ Other improvements

prepend set_ to set operations on lists (#10182)
Track version in deprecation utils (#10147)
Add a simple util issue_deprecation_warning (#10146)
more precise checks for inefficient apply warnings (#10135)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cjackal, @cmdlineluser, @potzenhotz, @ritchie46 and @stinodego

polars - Python Polars 0.18.9

Published by github-actions[bot] about 1 year ago

🏆 Highlights

common subexpression elimination (#9632)

⚠️ Deprecations

Deprecate parsing string inputs as literals for when-then-otherwise (#10122)
deprecate "connection_uri" → "connection" param in read/write database methods (#10134)
remove/deprecate cache and its logic (#10066)
Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

speedup mode on sorted data (#10084)
speedup boolean apply (#10073)
shrink alp/lp ~2.5x (#10039)

✨ Enhancements

suggest map_dict instead of lambda x: DICT[x] (#10123)
enable "inefficient apply" warnings from Series (#10104)
support writing duration type in json (#10112)
BytecodeParser can now handle mixed/nested and/or control flow (#10085)
inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
Add ArcTan2 to SQLContext (#9571)
cse in groupby's (#10062)
Adds sql CASE statement expressions (#10065)
Add date_ranges/time_ranges expression functions (#10005)
comm_subexpr_elim in streaming 'select/with_columns' (#10050)
add dataframe.flags property (#10037)
common subexpression elimination (#9632)
detect and warn about usage of str/int/float python-based casts with apply (#10026)
detect and warn about usage of json.loads in conjunction with apply (#10023)
detect and warn about bare numpy functions passed to apply (#10021)
support bytecode identification/mapping of python string-case functions in UDFs (#10007)
support bytecode identification of numpy functions in UDFs that we can map to native expressions (#10003)

🐞 Bug fixes

adjust for null values in str.replace fast path (#10132)
clear bit settings in list iteration (#10131)
use row-encoded for struct::is_sorted (#10129)
don't run file-caching in streaming mode (#10117)
Allow initialization of pl.Array in DataFrame using schema alone (#10100)
silence Series.apply inefficient apply warning when calling Expr.apply (#10116)
don't panic if masked out values are invalid in temporal kernels (#10114)
Fix struct get field by index out of bounds error. (#10097)
fix ub in simd-json (#10093)
fix invalid access when groupby rolling produces empty sets (#10109)
respect null_on_oob=False in list.take when pa… (#10105)
undo regression in scan_parquet from s3 (#10098)
fix is_sorted for structs (#10099)
add file path to io error in scan_csv (#10076)
fix false positive in parquet stats evaluation (#10087)
Address .col(regex).exclude() operations not executing. (#10025)
address an inadvertent shallow-copy issue on underlying PySeries (#10086)
fix Boolean::isin(null values) (#10074)
predicate pushdown #10058 (#10071)
map 'postgres' URI prefix to ADBC 'postgresql' module (#10018)
Fix weighted quantile for 0 weights (#10051)
eager time_range/date_range dimensions fix (#9996)

🛠️ Other improvements

get test_udfs running on all python versions again (#10136)
temporarily turn off fail-fast so that ubuntu tests run (#10133)
clarify "clones data" in to_numpy (#10095)
Refactor when/then/otherwise internals (#9922)
Properly format Returns sections of docstrings (#10064)
much-improved Instruction matching for BytecodeParser (#10040)
add pure-python tests and CI for bytecodeparser (#10027)
split-out expression translation and instruction-rewrite logic from BytecodeParser (#10012)
cleans api sections in docs (#10004)
Bump some dependencies (#9997)
Add patchelf extra to maturin (#9995)
restructure all UDF parsing/translation methods into a new BytecodeParser class (#9993)
Clean up date_range/time_range (#9985)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @cmdlineluser, @jonashaag, @magarick, @mcrumiller, @rikkaka, @ritchie46 and @stinodego

polars - Python Polars 0.18.8

Published by github-actions[bot] over 1 year ago

⚠️ Deprecations

Add Series.extend (#9901)
Deprecate functions series input (#9878)

🚀 Performance improvements

Rolling min/max for partially sorted data (#9819)
Use pyo3::intern to avoid needlessly recreating PyString (#9853)

✨ Enhancements

Name transpose from column (#9846)
adds SQRT, CBRT, PI functions to SQLContext (#9936)
Let qcut create evenly spaced probabilities (#9960)
add freeze_panes option to write_excel (#9974)
initial support for parsing the set of jump bytecode instructions required to reconstruct and/or logic (#9972)
suggest more efficient expression if user passes simple lambda to Expr.apply or DataFrame.apply (#9918)
sorted flag on singletons (#9933)
maintain sorted flag after partition_by (#9944)
keep sorted flag in streaming left join (#9932)
Add cloudpickle for serializing python UDFs (#9921)
Optional three-valued logic for any/all (#9848)
Add Series.extend (#9901)
pass through unknown schema in unnest (#9896)
convenience support for parsing a list of SQL strings with sql_expr (#9881)
respect and allow more options in eager json parsing (#9882)
allow set_sorted in streaming (#9876)
Expr.cat.get_categories expression (#9869)
add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
polars_warn! macro (#9868)
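
One reading of "Let qcut create evenly spaced probabilities" (#9960) is that a bin count can be turned into equally spaced interior quantile probabilities. The sketch below shows only that arithmetic, under that assumption; evenly_spaced_probabilities is a hypothetical helper and does not reproduce the qcut signature:

```python
from __future__ import annotations


def evenly_spaced_probabilities(n_bins: int) -> list[float]:
    """Interior breakpoints that split [0, 1] into `n_bins` equal parts."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # n_bins bins need n_bins - 1 interior breakpoints: 1/n, 2/n, ..., (n-1)/n.
    return [i / n_bins for i in range(1, n_bins)]


print(evenly_spaced_probabilities(4))  # [0.25, 0.5, 0.75]
print(evenly_spaced_probabilities(2))  # [0.5]
```

Four bins correspond to quartiles and two bins to a median split, so the caller does not have to spell the probabilities out by hand.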

🐞 Bug fixes

fix incorrect state in projection pushdown with joins (#9987)
don't pass predicates referring to renamed literal… (#9965)
fix regression in regex expansion (#9952)
potential SO in csv infer schema (#9950)
raise on unsupported transpose and object types (#9946)
Fix as-of join when by groups are interleaved (#9938)
Handle DataFrame.extend extending by itself (#9897)
don't SO on align_frames (#9911)
respect original series dtype when constructing LitIter (#9886)
Handle DataFrame.vstack stacking itself (#9895)
sum aggregation empty set is 0, not null (#9894)
preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
fmt unknown dtype (#9872)

🛠️ Other improvements

Update autolabeler again (#9984)
use param_name more in udfs for greater defensiveness (#9969)
fix or/and docstrings to say bitwise, not logical (#9964)
minor fix for apply docstring example text (#9953)
add note that collect_all returns result frames in the same order as input (#9951)
Improve docstrings for renaming operations (#9942)
Move sink_* methods to IO chapter (#9939)
Add 'nearest' in Expr.interpolation docstring with an example (#9935)
fix hyperlinks to pandas (#9937)
Address ignored Ruff doc rules (#9919)
improve weekday, day, ordinal_day examples (#9926)
deprecate bins argument and rename to breaks in Series.cut (#9913)
Use Pathlib everywhere (#9914)
Add various unit tests (#9903)
add big warnings about using apply (#9906)
Update autolabeler (#9885)
Workaround for PyCharm deprecation warning (#9907)
Mention func_horizontal on deprecated func docstrings (#9863)
note ordering guarantee for groupby (#9879)
add logo link entry to sphinx conf and factor-out website root paths (#9864)

Thank you to all our contributors for making this release possible!
@0xbe7a, @JulianCologne, @MarcoGorelli, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @alexander-beedie, @c-peters, @fsimkovic, @ion-elgreco, @magarick, @mcrumiller, @messense, @ritchie46, @sorhawell, @stinodego, @thomasaarholt and @zundertj