Dataframes powered by a multithreaded, vectorized query engine, written in Rust
OTHER License
Published by github-actions[bot] 10 months ago
count_bits_set_by_offsets (#13253)
.dt.truncate('*mo') more than 3x faster (#13192)
Utf8 data type to String, keep Utf8 as alias (#13257)
offset parameter to gather_every (#13156)
Array dtype AnyValue Series construction (#12817)
step parameter in int_ranges to take an expression (#13148)
map_batches safer (#13181)
count for DataFrame/LazyFrame (#13153)
read_parquet for all binary inputs (#13218)
is_in operator for categoricals (#13205)
replace (#13213)
replace fast path by casting old input to the right data type (#13176)
docs.pola.rs (#13281)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Published by github-actions[bot] 10 months ago
iter_rows; we can now do fully native conversion ~2-3x faster (#13122)
any/all_horizontal (#13144)
from_iter_xxx_trusted_len (#13132)
lit dtype determination for integers (#13129)
any/all_horizontal (#13144)
auto_explode param name to returns_scalar (#13119)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @orlp, @reswqa, @ritchie46 and @stinodego
Published by github-actions[bot] 10 months ago
pl.lit creation (#12997)
contains_any example (#13090)
map_batches warning more evident (#13081)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego
Published by github-actions[bot] 10 months ago
This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.
Check out the upgrade guide for help navigating the upgrade to this version.
Please bear with us while we continue to make Polars the best tool it can be!
Enum categorical data type which allows a fixed set of categories (#11822)
read_parquet (#13044)
replace expression on the Rust side (#13002)
update signature (#12986)
Expr.count to ignore null values by default (#12934)
DataType objects to be instantiated (#12470)
value_counts resulting column name from counts to count (#12506)
join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
Null when no data is present (#12807)
lit behavior for list/tuple inputs (#12559)
DataType.is_nested from property to classmethod (#12453)
NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
write_database parameter if_exists to if_table_exists (#12783)
Series methods (#13010)
Series.head/tail to the expression engine (#12946)
any/all_horizontal (#12976)
truncate (#12965)
select_seq for expression dispatch (#12962)
rolling_median algorithm (#12704)
DataFrame.iter_rows for smaller buffer sizes (#12804)
Series from a list of NumPy arrays (#12785)
str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
.aws folder (#13062)
scan_parquet (#13060)
read_parquet (#13044)
Series methods (#13010)
replace expression on the Rust side (#13002)
inefficient map_* warning (#13039)
hist (#13014)
describe to use new count implementation (#12990)
to_struct Series name consistent with the usual default Series name (empty string) (#12998)
"map_elements" warning message (#12978)
end before start in date/time_range (#12964)
update signature (#12986)
Array data type repr (#12973)
Null dtype (#12975)
Expr.count to ignore null values by default (#12934)
repr of Struct data type class (#12922)
merge mode to write_delta and remove pyarrow to delta conversions (#12392)
str.reverse (#12878)
DataType objects to be instantiated (#12470)
value_counts resulting column name from counts to count (#12506)
std and var for Duration columns (#12865)
join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
write_database return (indicate the number of rows affected by the operation) (#12830)
Decimal selector (#12852)
UInt power (#10446)
__repr__ implementation for Expr (#12770)
JOIN and FROM (#12819)
quantile(method="nearest") (#13058)
datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
json_decode per max buffer length (#13029)
00:00 time zone as UTC (#13034)
align_frames and fix edge-case where the identical frame object appears more than once (#13007)
ranges (#11900)
sink_csv (#12991)
read_database calls against cursors that only take positional args (#12967)
truncate when truncating by multiple weeks (#12948)
Err result (#12953)
ambiguous parameter is not Utf8 (#12913)
rolling_var/rolling_std numerical stability (#12909)
min/max due to incorrect SIMD mask construction (#12908)
to_numpy in the absence of pyarrow (#12888)
Enum types (#12886)
Expr.gather (which was still showing deprecated take) (#12864)
Array dtype equality (#12853)
nan_min/max incorrectly aggregating chunks with addition (#12848)
collect_all functions (#12796)
group_by (#12304)
0.20 (#12844)
describe calculation of min/max (#13027)
count (#12960)
group_by_dynamic (#12906)
--no-cov flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
hash docstring (#12879)
list.take (#12873)
list.take is deprecated (#12867)
pip install with dependencies (#12799)
update docstring (#12797)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange
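The NaN ordering entry in this release (#12721) states that NaNs compare greater than any other float and equal to themselves. The following is a minimal standard-library sketch of that ordering rule, not Polars code; the helper names `total_order_key` and `nan_aware_eq` are hypothetical and exist only for illustration.

```python
import math


def total_order_key(x: float) -> tuple:
    # Hypothetical helper: map a float to a sort key in which every NaN
    # ranks above all other floats (including +inf) and all NaNs share
    # one key, so they compare equal to each other.
    return (1, 0.0) if math.isnan(x) else (0, x)


def nan_aware_eq(a: float, b: float) -> bool:
    # Hypothetical helper: equality under the ordering above.
    return total_order_key(a) == total_order_key(b)


values = [3.0, float("nan"), -1.0, float("inf"), float("nan")]

# Sorting with the key places the NaNs last, after +inf.
ordered = sorted(values, key=total_order_key)
```

Plain Python floats follow IEEE 754 comparison, where `float("nan") == float("nan")` is false, which is why the sketch goes through an explicit key instead of comparing the values directly.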
Published by github-actions[bot] 11 months ago
Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego
Published by github-actions[bot] 11 months ago
with_columns (#12742)
write_database, accounting for latest adbc fixes/updates (#12713)
atoi_simd release (#12748)
xlsx2csv dependency (#12741)
aiohttp dependency (#12733)
Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Published by github-actions[bot] 11 months ago
DataFrame.iter_columns (#12653)
show_versions (#12690)
append/extend with null series (#11824) (#12686)
scan_parquet supports hive partitioning, remove note pointing to scan_pyarrow_dataset (#12706)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela
Published by github-actions[bot] 11 months ago
series_equal/frame_equal to equals (#12618)
map_dict to replace and change default behavior (#12599)
List dtype Series from 2D numpy array (#12672)
merge_local_rhs_categorical traversal (#12660)
PySeries.from_buffer for boolean buffers (#12654)
PySeries.from_buffer for numeric types (#12646)
filter syntax upgrades to when/then construct (#12603)
DataFrame.sum (#12619)
performant,lazy,random (#12600)
range instead of np.arange in constructors (#12621)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Published by github-actions[bot] 11 months ago
str.json_extract to str.json_decode (#12586)
~7x (#12552)
by column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
read_csv (#12519)
LazyFrame.sink_ndjson (#10786)
eval (#12563)
group_by_dynamic and rolling (#12551)
int_ranges with negative step (#12548)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii
Published by ritchie46 11 months ago
Series.set_at_idx to scatter (#12540)
Series.view (#12539)
cumsum -> cum_sum and similar (#12513)
take to gather (#12528)
DataFrame (#12492)
take_every to gather_every (#12531)
Series.inner_dtype property (#12494)
parse_int in favor of to_integer (#12464)
is_not (#12458)
is_boolean and is_utf8 (#12457)
DataType.is_integer and other dtype groups (#12200)
~3x 0.19.13/ ~2x numpy (#12471)
~2x (#12412)
DataFrame (#12492)
write_csv and sink_csv (#12253)
DataType.is_integer and other dtype groups (#12200)
Decimal type to parquet (#12532)
Series comparison with timedelta matches that of other types (#12497)
map_dicts (#12436)
scan_csv error type (#12355)
\n when reading file-like object wi… (#12333)
PolarsInefficientMapWarning for lshift/rshift operations (#12385)
polars-ds to list of community plugins (#12527)
polars-hash reference (#12505)
polars-hash (#12496)
import polars timing test; now much more consistent/reliable (#12478)
.with_columns() in all .list namespace examples (#12475)
manylinux_2_17 for building x86-64 wheel (#12408)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @abstractqqq, @alexander-beedie, @c-peters, @cmdlineluser, @hirohira9119, @ion-elgreco, @jerome3o, @nameexhaustion, @reswqa, @ritchie46, @stinodego and @uchiiii
Published by github-actions[bot] 11 months ago
cumsum -> cum_sum and similar (#12513)
take to gather (#12528)
DataFrame (#12492)
take_every to gather_every (#12531)
parse_int in favor of to_integer (#12464)
scan_csv error type (#12355)
write_csv parameter has_header to include_header (#12351)
is_signed to is_signed_integer (#12220)
dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
ljust/rjust to pad_end/pad_start (#11975)
~3x 0.19.13/ ~2x numpy (#12471)
~2x (#12412)
DataFrame (#12492)
write_csv and sink_csv (#12253)
round_sig_figs expression for rounding to significant figures (#11959)
_saturating in duration string language, make it the default (#12301)
ambiguous for truncate and round (#12204)
is_signed to is_signed_integer (#12220)
Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
name= in .write_avro to set schema name (#12255)
.list.to_array expression (#12192)
.arr.to_list expression (#12136)
List/Array (#12016)
name namespace for operations that affect expression names (#11973)
Decimal type to parquet (#12532)
scan_csv error type (#12355)
\n when reading file-like object wi… (#12333)
date_range (#12317)
offset==-period case (#12267)
reshape input (#12288)
null_count after arithmetic (#12280)
numpy ufuncs (#12212)
take should block predicate pushdown (#12130)
schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
cumsum -> cum_sum and similar (#12513)
take to gather (#12528)
DataFrame (#12492)
take_every to gather_every (#12531)
polars-ds to list of community plugins (#12527)
parse_int in favor of to_integer (#12464)
DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
tempdir (#12462)
write_csv parameter has_header to include_header (#12351)
_saturating in duration string language, make it the default (#12301)
avro-rs with apache-avro (#12295)
clippy on all targets (#12293)
make clippy, simplify Rust linting workflows (#12290)
.venv dirs (#12289)
list.eval (#12254)
truncate_impl (#12229)
dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
sqlparser to 0.39 (#12173)
FunctionExpr module (#12162)
polars-parquet (#12062)
polars-arrow and consolidate logic in polars-parquet crate. (#12022)
ljust/rjust to pad_end/pad_start (#11975)
dataframe_api_compat dependency (#11997)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl
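The dt.seconds to dt.total_seconds entry above (#12179) lines up with a distinction that Python's own `datetime.timedelta` already makes between a single component of a duration and the whole duration expressed in one unit. The sketch below uses only the standard library and does not call Polars; that the rename was motivated by this distinction is an assumption, not something the release notes state.

```python
from datetime import timedelta

# A duration of one day plus five seconds.
delta = timedelta(days=1, seconds=5)

# The `seconds` attribute is only the seconds component left over
# after whole days have been split off.
component_seconds = delta.seconds

# `total_seconds()` expresses the entire duration in seconds.
total_seconds = delta.total_seconds()
```

Here `component_seconds` is 5 while `total_seconds` is 86405.0, so a name containing "total" makes it explicit which of the two a caller gets.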
Published by github-actions[bot] 12 months ago
write_csv parameter has_header to include_header (#12351)
_saturating in duration string language, make it the default (#12301)
Decimal and set default scale=0 (#12224)
dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
DataFrame.as_dict positional input (#12131)
BytecodeParser for Python 3.12 (#12348)
round_sig_figs expression for rounding to significant figures (#11959)
_saturating in duration string language, make it the default (#12301)
ambiguous for truncate and round (#12204)
Datetime series from datetime.date array (#12175)
Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
name= in .write_avro to set schema name (#12255)
write_delta to write large arrow types without casting (#12260)
.list.to_array expression (#12192)
.arr.to_list expression (#12136)
DataFrame "write" methods (#12113)
date_range (#12317)
date_range defined with 'saturating' interval (#12311)
offset==-period case (#12267)
reshape input (#12288)
read_excel in the originally specified order (#12243)
numpy ufuncs (#12212)
take should block predicate pushdown (#12130)
schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
BytecodeParser for Python 3.12 (#12348)
group_by_dynamic docstrings (#12366)
rolling_* docstrings (#12362)
make clippy, simplify Rust linting workflows (#12290)
.venv dirs (#12289)
py-polars to Cargo workspace (#12256)
.with_columns in some docstrings (#12250)
scan_csv plus slice (#12239)
name namespace (#12236)
manylinux_2_28 (#12211)
rust-toolchain.toml with sdist/wheels (#12184)
sqlparser to 0.39 (#12173)
strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
DataFrame.fold (#12164)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @alexander-beedie, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jrycw, @mcrumiller, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego and @wsyxbcl
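The round_sig_figs entry above (#11959) concerns rounding to significant figures. As a rough illustration of what that operation means, here is a standard-library sketch; `round_to_sig_figs` is a hypothetical helper written for this note, and Polars' own implementation may differ in details such as tie-breaking or handling of special values.

```python
import math


def round_to_sig_figs(x: float, digits: int) -> float:
    # Hypothetical helper: keep `digits` significant figures of `x`.
    if digits < 1:
        raise ValueError("digits must be at least 1")
    # Zero, infinities and NaN have no meaningful magnitude; return as-is.
    if x == 0 or not math.isfinite(x):
        return x
    # Position of the leading digit, e.g. 5 for 123456.0, -3 for 0.0012.
    magnitude = math.floor(math.log10(abs(x)))
    # Shift the rounding position so `digits` figures survive.
    return round(x, digits - 1 - magnitude)


large = round_to_sig_figs(123456.0, 2)
small = round_to_sig_figs(0.0012345, 3)
negative = round_to_sig_figs(-98765.0, 1)
```

The key step is that the number of decimal places passed to `round` depends on the magnitude of the input, which is what separates significant-figure rounding from rounding to a fixed number of decimals.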
Published by github-actions[bot] 12 months ago
DataFrame.as_dict positional input (#12131)
.arr.to_list expression (#12136)
DataFrame "write" methods (#12113)
take should block predicate pushdown (#12130)
schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
DataFrame.fold (#12164)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Priyansh121096, @alexander-beedie, @dependabot, @dependabot[bot], @jrycw, @moritzwilksch, @nameexhaustion, @reswqa, @ritchie46, @stefmolin and @stinodego
Published by stinodego 12 months ago
Thank you to all our contributors for making this release possible!
@ritchie46
Published by github-actions[bot] 12 months ago
nans_compare_equal parameter in assert utils (#12019)
ljust/rjust to pad_end/pad_start (#11975)
shift_and_fill in favor of shift (#11955)
clip_min/clip_max in favor of clip (#11961)
List/Array (#12016)
name namespace for operations that affect expression names (#11973)
infer_schema_length to pl.read_json (#11724)
get_index/iteration for Array types (#12047)
read_excel (#12081)
Mapping objects used as schema being silently ignored (#12027)
numpy scalar values (#12025)
black by ruff format (#11996)
dataframe_api_compat dependency (#11997)
Development and Releases sections to the documentation (#11932)
make clean for docs (#11970)
PyExpr consistent (#11956)
set_fmt_table_cell_list_len to API docs (#11942)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Rohxn16, @alexander-beedie, @braaannigan, @brayanjuls, @messense, @nameexhaustion, @orlp, @reswqa, @ritchie46, @squnit, @stinodego and @universalmind303
Published by github-actions[bot] 12 months ago
rolling expression as a special case of window functions. (#11445)
.list.lengths and .str.lengths (#11613)
write_csv parameter quote to quote_char (#11583)
disable_string_cache (#11020)
cot (cotangent) (#11717)
infer_schema_length to pl.read_json (#11724)
DATE function for SQL (#11541)
pl.scan_ndjson (#10963)
schema, schema_override for pl.read_json with array-like input (#11492)
UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
INITCAP string function for SQL (#9884)
IN(subquery) and SQL Subquery Infrastructure (#11218)
rolling expression as a special case of window functions. (#11445)
label='right' (#11337)
disable_string_cache (#11020)
NULLIF and COALESCE SQL functions (#11124)
tree-formatting representation (#11176)
duration + date (#11190)
read_csv for empty lines (#11924)
cast_unchecked in lists (#11884)
PyLazyGroupby reusable (#11769)
pl.duration (#11748)
join_asof with strategy="nearest" (#11673)
IN clauses (#11574)
scan_csv and read_csv (#11575)
is_in handling of mismatched dtypes and fix a minor regression (#11533)
USING columns (#11518)
cut/qcut when allow_breaks=True (#11287)
write_csv when using non-default "quote" char (#11474)
ANY and ALL behaviour (#10879)
is_in values to the column dtype being searched (#11427)
Series.__contains__ for None values and implement is_in for null Series (#11345)
quote_style is non-numeric (#11328)
scan_pyarrow predicates (#11195)
Development and Releases sections to the documentation (#11932)
polars-arrow with the other crates (#11738)
help command output following addition of some longer options (#11681)
polars-lts-cpu for macOS x86-64/rosetta (#11660)
is_cloud_url (#11629)
.list.lengths and .str.lengths (#11613)
write_csv parameter quote to quote_char (#11583)
repeat_by to polars-ops (#11461)
infer_schema_length (#11358)
GITHUB_TOKEN to get contributor information for docs (#11321)
disable_string_cache (#11020)
performant feature only once (#11223)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @JulianCologne, @LaurynasMiksys, @MarcoGorelli, @Rohxn16, @SeanTroyUWO, @TheDataScientistNL, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @andysham, @billylanchantin, @bowlofeggs, @c-peters, @cmdlineluser, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jhorstmann, @jonashaag, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @ptiza, @rancomp, @reswqa, @ritchie46, @rjthoen, @romanovacca, @sd2k, @shenker, @squnit, @stinodego, @svaningelgem, @thomasjpfan, @uchiiii, @universalmind303 and Romano Vacca
Published by github-actions[bot] 12 months ago
shift_and_fill in favor of shift (#11955)
clip_min/clip_max in favor of clip (#11961)
infer_schema_length to pl.read_json (#11724)
Development and Releases sections to the documentation (#11932)
make clean for docs (#11970)
PyExpr consistent (#11956)
set_fmt_table_cell_list_len to API docs (#11942)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Rohxn16, @alexander-beedie, @messense, @orlp, @reswqa, @ritchie46, @squnit and @stinodego
Published by github-actions[bot] 12 months ago
shift parameter from periods to n (#11923)
Array data type initialization (#11907)
read_csv for empty lines (#11924)
filter method (#11928)
Array data type initialization (#11907)
numpy arrays (#11905)
read_excel (#11908)
read_excel and/or read_ods when target sheet does not exist (#11906)
read_excel docstring (#11934)
diff methods (#11921)
pl.concat "how" param docstring signature (#11909)
Thank you to all our contributors for making this release possible!
@LaurynasMiksys, @alexander-beedie, @mcrumiller, @reswqa, @ritchie46, @romanovacca, @shenker, @stinodego and @uchiiii
Published by github-actions[bot] about 1 year ago
DataType.is_nested (#11844)
read_database Databricks queries made using SQLAlchemy connections (#11885)
include_nulls parameter to update (#11830)
cast_unchecked in lists (#11884)
Thank you to all our contributors for making this release possible!
@Walnut356, @alexander-beedie, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jrycw, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @rjthoen, @romanovacca and @stinodego