Dataframes powered by a multithreaded, vectorized query engine, written in Rust
OTHER License
Published by github-actions[bot] over 1 year ago
~-30% (#9423)
~15% (#9379)
>3.5x (#9346)
OptState in LazyFrame to unit-test optimization toggle methods. (#9883)
LENGTH and OCTET_LENGTH string functions for SQL (#9860)
polars_warn! macro (#9868)
include_key parameter to partition_by (#9750)
LEFT string function for SQL (#9836)
REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
maintain_order argument to sort/top_k/bottom_k (#9672)
SUBSTR function (#9803)
.list.any() and .list.all() (#9573)
Datetime with a "*" wildcard for timezones (#9641)
to_numpy (#9592)
repeat (#9614)
SQLContext (#9453)
round support (#9330)
~, !~, ~*, and !~*) (#9327)
// integer floordiv operator in the SQL engine (#9324)
offset_by (#9253)
Decimal type: sum, min, max aggregations in select and agg context. (#9135)
repeat (#9117)
Utf8 to Decimal. (#9090)
LitIter (#9886)
pl.sql_expr (#9875)
maintain_order argument to sort/top_k/bottom_k (#9672)
arr.eval references (#9821)
apply caller determine if length needs to be checked. (#9140)
is_in should upcast numeric types (#9110)
arr.eval references (#9821)
arange (#9769)
arange (#9681)
arange and add int_range/int_ranges (#9666)
Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @CloseChoice, @DeflateAwning, @EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @ankane, @avimallu, @baggiponte, @bfeif, @borchero, @braaannigan, @c-peters, @datapythonista, @dependabot, @dependabot[bot], @dkrako, @durandtibo, @eitsupi, @guanqun, @jeroenjanssens, @jonashaag, @jorisSchaller, @josh, @kljensen, @lorentzenchr, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @moritzwilksch, @ritchie46, @sorhawell, @stinodego, @tarrafil, @thomascamminady, @ttencate, @universalmind303 and @zundertj
Published by github-actions[bot] over 1 year ago
in series 10x (#9794)
include_key parameter to partition_by (#9750)
LEFT string function for SQL (#9836)
REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
maintain_order argument to sort/top_k/bottom_k (#9672)
SUBSTR function (#9803)
duration selector and improve selector typing (#9772)
maintain_order argument to sort/top_k/bottom_k (#9672)
arr.eval references (#9821)
write_database handling of db schema and quoted table names (#9788)
arr.eval references (#9821)
arange (#9769)
last entry (#9782)
rows_by_key docs (#9766)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @jonashaag, @magarick, @mcrumiller, @ritchie46 and @stinodego
Published by github-actions[bot] over 1 year ago
Thank you to all our contributors for making this release possible!
@magarick and @ritchie46
Published by github-actions[bot] over 1 year ago
_datetime_to_pl_timestamp (#9533)
adbc connectivity, adding snowflake support (#9600)
selector utility functions with better docstrings/examples (#9683)
.list.any() and .list.all() (#9573)
Datetime with a "*" wildcard for timezones (#9641)
to_numpy (#9592)
repeat (#9614)
rows_by_key method, returning a keyed-dictionary of row data (#9567)
from_dicts drops columns explicitly omitted from schema (#9581)
arange (#9681)
read_database (inc snowflake) (#9686)
selector utility functions with better docstrings/examples (#9683)
arange and add int_range/int_ranges (#9666)
.list.difference() example (#9615)
to_numpy (#9619)
.list.union(), .list.difference(), .list.intersection() (#9602)
arange (#9544)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @datapythonista, @dependabot, @dependabot[bot], @eitsupi, @guanqun, @jeroenjanssens, @jorisSchaller, @kljensen, @magarick, @mcrumiller, @messense, @mishpat, @moritzwilksch, @ritchie46, @stinodego, @ttencate, @universalmind303 and @zundertj
Published by github-actions[bot] over 1 year ago
~-30% (#9423)
SQLContext (#9453)
first & last selectors, additional minor repr improvements (#9456)
polars.selectors repr and implicit application of as_expr when broadcasting (#9450)
round support (#9330)
Series.qcut (#9421)
Thank you to all our contributors for making this release possible!
@EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @baggiponte, @braaannigan, @datapythonista, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @ritchie46, @stinodego, @tarrafil, @universalmind303 and @zundertj
Published by github-actions[bot] over 1 year ago
~15% (#9379)
>3.5x (#9346)
Config options to/from file (#9391)
~, !~, ~*, and !~*) (#9327)
// integer floordiv operator in the SQL engine (#9324)
is_in TypeError with sets of values containing 'None' (#9323)
eq_missing and ne_missing expressions (#9331)
validate arg in join (#9319)
Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @dkrako, @durandtibo, @ritchie46 and @universalmind303
Published by ritchie46 over 1 year ago
StringCache object as a function decorator (#9309)
Config object as a function decorator (#9307)
pydantic 2.x release (#9296)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @thomascamminady
Published by github-actions[bot] over 1 year ago
selectors module, consolidating/expanding existing selector capabilities (#9204)
offset_by (#9253)
select/with_columns/groupby (#9205)
datetime selector (#9212)
selectors module, consolidating/expanding existing selector capabilities (#9204)
Decimal type: sum, min, max aggregations in select and agg context. (#9135)
repeat (#9117)
select input (#9198)
apply caller determine if length needs to be checked. (#9140)
is_in should upcast numeric types (#9110)
name arg for date_range (#9107)
Expr.over docs (#9244)
py-polars crate (#9242)
exprs=... input for select/with_columns/agg/struct (#9219)
tmp_path (#9206)
PyExpr (#9166)
exact=False is a performance footgun (#9186)
maturin to 1.0.1 (#9115)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @ankane, @avimallu, @bfeif, @dependabot, @dependabot[bot], @jonashaag, @josh, @lorentzenchr, @magarick, @ritchie46, @stinodego, @universalmind303 and @zundertj
Published by github-actions[bot] over 1 year ago
.arr to .list (#8999)
Array (backed by arrow::FixedSizeList datatype (#8943)
.arr to .list (#8999)
offsets_to_indexes performance (#8964)
exclude behaviour when selecting against dtypes and/or wildcards (#8953)
json_extract (#8858)
~4x (#8775)
Utf8 to Decimal. (#9090)
is_in to pyarrow dataset (#8930)
Array (backed by arrow::FixedSizeList datatype (#8943)
SQLContext (#8944)
json_extract (#8858)
null_count (#8837)
OFFSET keyword in SQL queries (#8833)
time_range utility function (#8776)
hist (#8982)
BETWEEN bounds should be inclusive (#8818)
arange/date_range/time_range (#9027)
.arr to .list (#8999)
take_every (#8971)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @charliegallop, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat, @uchiiii and @universalmind303
Published by github-actions[bot] over 1 year ago
.arr to .list (#8999)
DataFrame/LazyFrame (#9008)
date_range/ones/zeros to eager=False (#9007)
.arr to .list (#8999)
Utf8 to Decimal. (#9090)
repeat (#9046)
date_range/ones/zeros to eager=False (#9007)
Series declared as int/temporal with floating point values (#9004)
time_unit property from Series (#8990)
repeat (#9048)
arange/date_range/time_range (#9027)
DataFrame/LazyFrame (#9008)
SQLContext docstring cleanups (#9005)
.arr to .list (#8999)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @charliegallop, @jonashaag, @mcrumiller, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat and @universalmind303
Published by github-actions[bot] over 1 year ago
Array (backed by arrow::FixedSizeList datatype (#8943)
offsets_to_indexes performance (#8964)
exclude behaviour when selecting against dtypes and/or wildcards (#8953)
align_frames, and add new alignment option (#8899)
is_in to pyarrow dataset (#8930)
Array (backed by arrow::FixedSizeList datatype (#8943)
dtype argument for repeat (#8946)
pl.struct (#8952)
SQLContext (#8944)
align_frames, and add new alignment option (#8899)
hist (#8982)
Series with empty names in-place on DataFrame init (#8956)
Series objects (#8915)
align_frames, and add new alignment option (#8899)
rename "in_place" parameter (#8960)
repeat (#8979)
name argument for repeat (#8977)
take_every (#8971)
repeat/ones/zeros (#8963)
SQLContext docstrings (#8948)
lazygroupby.rs error message (#8937)
time() (#8939)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @ritchie46, @stinodego and @universalmind303
Published by github-actions[bot] over 1 year ago
align_frames and properly handle the case where the alignment key has duplicate values (#8825)
align option to pl.concat (#8835)
null_count (#8837)
OFFSET keyword in SQL queries (#8833)
align_frames and properly handle the case where the alignment key has duplicate values (#8825)
InitVar typing declarations on dataclass objects (#8856)
align_frames and properly handle the case where the alignment key has duplicate values (#8825)
BETWEEN bounds should be inclusive (#8818)
Config "set_tbl_formatting" and "set_fmt_str_lengths" methods (#8859)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego and @universalmind303
Published by github-actions[bot] over 1 year ago
~4x (#8775)
time_range utility function (#8776)
Config init (#8797)
time expression (#8785)
SQLContext registration of DataFrames (#8762)
SQLContext frame/table registration from local variables (#8749)
SQLContext init time, and add an "unregister" method (#8744)
DISTINCT keyword in SQL select clauses (#8740)
USING clause in SQL join operations (#8731)
extend_constant Expr (#8734)
SQLContext (#8724)
HAVING clause to SQL GROUP BY operations (#8704)
numpy string interop (#8703)
arange (#8796)
update (#8763)
time func (#8786)
extend_constant Expr (#8734)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @mcrumiller, @ritchie46, @stinodego, @uchiiii, @universalmind303 and @zundertj
Published by github-actions[bot] over 1 year ago
concat_lst to concat_list (#8597)
toggle_string_cache to enable_string_cache (#7970)
sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
~10/20% (#8616)
>2x (#8432)
fmt is provided (#8111)
arg_min/arg_max (via argminmax) (#8074)
arr.eval run on groupby expression engine when possible (#8199)
FromParalleIter<Option<str>> for Utf8Chunked ~1.9x (#8058)
~2.5x (#8057)
~2x. (#8053)
into_groups materialization ~-25% (#8036)
~25% (#7980)
~10% (#7938)
DISTINCT keyword in SQL select clauses (#8740)
USING clause in SQL join operations (#8731)
HAVING clause to SQL GROUP BY operations (#8704)
dt.to_string alias for dt.strftime (#8290)
strptime default strict/exact=true (#8587)
to_date, to_datetime, to_time to String namespace (#8579)
List dtype (#8583)
groupby_dynamic/rolling (#8528)
str_slice method to StringNameSpace (#8427)
use_earliest argument to replace_time_zone for dealing with ambiguous datetimes (#8087)
approx_unique() (#7937)
FunctionExpr for cat namespace (#8173)
DatetimeArgs ergonomics (#8133)
FunctionExpr for bound and round methods (#8172)
BooleanFunction enum (#8132)
FunctionExpr for abs to allow for serialization (#8129)
FunctionExpr for cum* functions (#8130)
pct_change (#8137)
log1p to list of mathematical functions (#8102)
not_equal comparator/operator (#8547)
Series initialised with nested tuple data as Object dtype (#8401)
top_k fast path (#8275)
map lengths (#8147)
UInt64 values that exceed Int64 upper bound (#8146)
is_in (#8139)
sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
concat_lst to concat_list (#8597)
strptime (#8345)
concat_owned_array_unchecked when possible (#8274)
strptime/strftime args (#8221)
Expr.list to implode (#8165)
FieldsMapper utility class for obtaining FunctionExpr schema (#8175)
map_private where applicable to reduce code duplication (#8128)
-1 to show all rows. (#8080)
toggle_string_cache to enable_string_cache (#7970)
Duration::parse docs (#7918)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @JoonHong-Kim, @LdRoW, @MarcoGorelli, @Newtoniano, @StefanBRas, @alexander-beedie, @alonme, @ankane, @avimallu, @ayemjay, @borchero, @cgevans, @chitralverma, @clickingbuttons, @dependabot, @dependabot[bot], @ghuls, @grantmcdermott, @jonashaag, @josh, @jvdd, @lorentzenchr, @mcrumiller, @mzjp2, @n8henrie, @pgimalac, @rben01, @ritchie46, @stinodego, @uchiiii, @universalmind303, @utkarshgupta137, @zaynetro and @zundertj
Published by github-actions[bot] over 1 year ago
arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
from_repr to handle parsing of table reprs with no dtype row (#8640)
dt.to_string alias for dt.strftime (#8290)
DataFrame export to numpy structured/record arrays (#8628)
DataFrame init from numpy structured/record arrays. (#8620)
arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
replace docstrings (#8685)
extract_all docstrings (#8675)
arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
extract docstrings (#8669)
implode to internal functions (#8667)
impl blocks (#8665)
pipe docstring (#8658)
contains docstrings (#8657)
from_repr example/doctest (#8642)
functions module (#8629)
typing_extensions before Python 3.8 (#8623)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @ghuls, @jonashaag, @josh, @mcrumiller, @ritchie46 and @stinodego
Published by github-actions[bot] over 1 year ago
~10/20% (#8616)
Series init (#8613)
Expr.meta namespace eq and ne methods (#8599)
Series init (#8613)
is_ and is_not) (#8600)
series <op> expr to pl.lit(series) <op> expr (#8549)
functions module in Rust bindings (#8598)
impl block into modules (#8596)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @mcrumiller, @ritchie46 and @stinodego
Published by github-actions[bot] over 1 year ago
Series data (#8501)
to_date, to_datetime, to_time to String namespace (#8579)
List strategy by default (#8571)
round (#8566)
all, any, sum, and cumsum (#8541)
groupby_dynamic/rolling (#8528)
is_nested property to dtypes (#8514)
NamedTuple input that contains unhashable field data (#8578)
List dtype in parametric tests (#8581)
NaN values in Struct data (#8557)
NaN values in List data (#8537)
Decimal to Float64 in truediv (#8523)
pl.min/max (#8509)
internals module to _reexport (#8554)
NaN values in Struct data (#8557)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cgevans, @ritchie46, @stinodego and @uchiiii
Published by github-actions[bot] over 1 year ago
Operations that require sorted columns now emit a warning if those columns are neither explicitly sorted nor tagged as sorted. The warning can be avoided in one of three ways:
# 1. inform polars that a column is sorted on the DataFrame / LazyFrame.
(
df.set_sorted("foo")
.groupby_dynamic(..)
)
# 2. inform polars inline via the `set_sorted` expression
df.join_asof(df2, on=pl.col("foo").set_sorted())
# 3. explicitly sort first
# this is expensive if the data is already sorted
df.sort("foo")
strptime (#8496)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46 and @stinodego
Published by github-actions[bot] over 1 year ago
describe method (#8465)
Decimal strategy (#8444)
Decimal strategy (#8444)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @universalmind303 and @utkarshgupta137
Published by github-actions[bot] over 1 year ago
>2x (#8432)
Decimal dtype testing strategy (note: disabled by default) (#8430)
Series support to pl.from_repr (#8429)
%f in strptime format strings (#8404)
str.strptime error message: utf -> utc (#8422)
Decimal dtype testing strategy (note: disabled by default) (#8430)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ayemjay, @jonashaag, @mzjp2, @pgimalac, @ritchie46 and @stinodego