Dataframes powered by a multithreaded, vectorized query engine, written in Rust
OTHER License
Published by github-actions[bot] about 1 year ago
ewm methods (#11804)
use_pyarrow param for Series.to_list (#11784)
group_by_rolling to rolling (#11761)
DataFrame.get_column performance by ~35% (#11783)
DATE function for SQL (#11541)
filter capabilities with new support for *args predicates, **kwargs constraints, and chained boolean masks (#11740)
OrderedDict for schemas (#11742)
pl.scan_ndjson (#10963)
update method (#11688)
read_database (#11700)
read_database queries (#11664)
DataFrame.melt and LazyFrame.unnest (#11662)
assert_*_equal AssertionError when exact=False (#11781)
PyLazyGroupby reusable (#11769)
pl.duration (#11748)
join_asof with strategy="nearest" (#11673)
_to_rust_syntax util (#11795)
IntegralType to IntegerType (#11773)
expand_selector in user guide (#11722)
df.to_dict/series.to_list (#11757)
group_by_dynamic into one module (#11741)
describe metrics (#11694)
help command output following addition of some longer options (#11681)
polars-lts-cpu for macOS x86-64/rosetta (#11660)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @cmdlineluser, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @rancomp, @reswqa, @ritchie46, @romanovacca, @sd2k, @stinodego, @svaningelgem and @thomasjpfan
Published by github-actions[bot] about 1 year ago
.list.lengths and .str.lengths (#11613)
radix in parse_int (#11615)
write_csv parameter quote to quote_char (#11583)
schema, schema_override for pl.read_json with array-like input (#11492)
UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
read_database options passthrough to the underlying connection's execute method (enables parameterised SQL queries, etc) (#11562)
INITCAP string function for SQL (#9884)
IN clauses (#11574)
scan_csv and read_csv (#11575)
is_in handling of mismatched dtypes and fix a minor regression (#11533)
USING columns (#11518)
py-polars (#11616)
write_csv parameter quote to quote_char (#11583)
**kwargs from LazyFrame.collect() (#11567)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @TheDataScientistNL, @alexander-beedie, @andysham, @c-peters, @jhorstmann, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @romanovacca, @stinodego and @svaningelgem
Published by github-actions[bot] about 1 year ago
rolling expression as a special case of window functions. (#11445)
left_on and right_on parameters to df.update (#11277)
IN(subquery) and SQL Subquery Infrastructure (#11218)
read_database (#11448)
rolling expression as a special case of window functions. (#11445)
ColumnFactory to additionally support tab-complete for col in IPython (#11435)
cut/qcut when allow_breaks=True (#11287)
write_csv when using non-default "quote" char (#11474)
read_database fallback for Snowflake warehouses/connections that don't support Arrow resultsets (#11447)
ANY and ALL behaviour (#10879)
is_in values to the column dtype being searched (#11427)
repeat_by to polars-ops (#11461)
polars-lts-cpu/polars-u64-idx (#11430)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @ritchie46, @romanovacca, @stinodego, @svaningelgem and Romano Vacca
Published by github-actions[bot] about 1 year ago
read_database (#11377)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @bowlofeggs, @c-peters, @jonashaag, @orlp, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
infer_schema_length (#11358)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @jonashaag, @orlp, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
disable_string_cache (#11020)
pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
label='right' (#11337)
read_excel (for excel binary workbook files) (#11248)
disable_string_cache (#11020)
NULLIF and COALESCE SQL functions (#11124)
tree-formatting representation (#11176)
Series.__contains__ for None values and implement is_in for null Series (#11345)
quote_style is non-numeric (#11328)
has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
collections.namedtuple values (#11314)
find_stacklevel (#11292)
selector expressions in editor/console (#11235)
Config JSON string with file path (#11098)
scan_pyarrow predicates (#11195)
- or + operators (#11158)
read_excel "read_csv_options" (#11162)
assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
GITHUB_TOKEN to get contributor information for docs (#11321)
null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
<2.4.0 (#11312)
IntoExprColumn (#11296)
performant feature only once (#11223)
read_database batch_size docstring (#11132)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303
Published by github-actions[bot] about 1 year ago
f64 for rank when method="average" (#10734)
groupby to group_by (#10654)
all - fix Kleene logic implementation for all/any (#10564)
arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
0.18 (#10527)
is_first/last to is_first/last_distinct (#11130)
count_match to count_matches (#11028)
strip to strip_chars (#10813)
datetime_range expression function (#10213)
Series/Expr.rolling_apply to rolling_map (#10750)
write_csv (#11015)
literal for str count_match (#10996)
strip_prefix and strip_suffix to the string namespace (#10958)
datetime_range expression function (#10213)
array_to_string (#10839)
str.count_match (#10900)
.offset_by (#9967)
select (#10885)
truncate_ragged_lines (#10660)
groupby to group_by (#10654)
is_in and more generic array construction (#10614)
all - fix Kleene logic implementation for all/any (#10564)
cast support (#10504)
arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
0.18 (#10527)
int_range (#10914)
int_range(s) exclusive on the upper bound when step is negative (#10898)
println! from datetime fn (#10862)
ORDER BY on unselected columns (#10752)
chunks_mut entry in vtable (#10745)
value_counts on column named "counts" (#10737)
f64 for rank when method="average" (#10734)
AllHorizontal format string (#10658)
is_in (#10620)
all - fix Kleene logic implementation for all/any (#10564)
tolerance implementation, address edge-cases (#10482)
Duration method for checking full days (#10850)
LocalProjection (#10886)
range expression module (#10871)
range related functions (#10830)
polars-lazy (#10729)
date_range (#10653)
Cargo.lock (#10555)
flate2 release (#10492)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj
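Several entries in this release mention a fix to the Kleene logic implementation for all/any (#10564). As a rough illustration of what Kleene (three-valued) logic means for these two reductions, here is a pure-Python sketch in which None stands in for a null value. The names kleene_all and kleene_any are hypothetical, and this is not Polars's implementation; the notes above do not say how the behaviour is exposed in the API.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'all': False wins, otherwise any None makes the result unknown."""
    saw_unknown = False
    for value in values:
        if value is False:
            return False  # a single definite False decides the result
        if value is None:
            saw_unknown = True
    return None if saw_unknown else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'any': True wins, otherwise any None makes the result unknown."""
    saw_unknown = False
    for value in values:
        if value is True:
            return True  # a single definite True decides the result
        if value is None:
            saw_unknown = True
    return None if saw_unknown else False
```

Under these rules, an unknown value only affects the result when no definite value has already decided it: kleene_all([False, None]) is False, while kleene_all([True, None]) stays unknown.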
Published by github-actions[bot] about 1 year ago
is_first/last to is_first/last_distinct (#11130)
count_match to count_matches (#11028)
strip to strip_chars (#10813)
datetime_range expression function (#10213)
_unpack_schema() (#11080)
polars.utils._post_apply_columns() (#11086)
polars.utils._post_apply_columns() (#11041)
_unpack_schema() (#10960)
pl.read_ods function (#11011)
write_csv (#11015)
literal for str count_match (#10996)
strip_prefix and strip_suffix to the string namespace (#10958)
read_excel table data identification (#10953)
from_dataframe fast path and improve typing (#10979)
openpyxl as a new/optional engine for read_excel (#6183)
datetime_range expression function (#10213)
Series.__getitem__ raise an IndexError (#11061)
read_dicts and reduce latency of small-frame creation (#11047)
series_equal properly accounts for dtypes when strict=True (#11012)
SchemaDefinition type alias (#11077)
fetch explanation in a "notes" block to better highlight it in the docs (#11058)
get_data_buffer (#10966)
pydantic >= 2.0.0 requirement (#10944)
Thank you to all our contributors for making this release possible!
@I8dNLo, @KacpiW, @MarcoGorelli, @Object905, @Qqwy, @TNieuwdorp, @alexander-beedie, @antoniocali, @bvanelli, @cjackal, @henrikig, @jakob-keller, @mrogowski11, @nameexhaustion, @orlp, @reswqa, @ritchie46, @s-banach, @stinodego, @svaningelgem and @thomasjpfan
Published by github-actions[bot] about 1 year ago
col("foo") -> col.foo (#10874)
Expr.is_not() to not_() (#10838)
Config options to be easily reset to their default value (#10922)
str.count_match (#10900)
glimpse customisation, fix strings repr (#10895)
.offset_by (#9967)
col("foo") -> col.foo (#10874)
select (#10885)
read_database (#10851)
int_range (#10914)
int_range(s) exclusive on the upper bound when step is negative (#10898)
2.0.0 (#10923)
Expr.map_elements (#10647)
read_database connection/cursor behaviour (#10873)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @MarcoGorelli, @alexander-beedie, @c-peters, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @jeroenjanssens, @orlp, @ritchie46, @stinodego and @wdoppenberg
Published by github-actions[bot] about 1 year ago
binary, boolean, categorical, date, object, and time selectors (#10806)
allow_copy=False (#10822)
reversed(df) (#10823)
range related functions (#10830)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @orlp, @reswqa, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
An upgrade guide is available on our website.
DataFrame init from queries against users' existing database connections (#10649)
groupby to group_by (#10656)
f64 for rank when method="average" (#10734)
all - fix Kleene logic implementation for all/any (#10564)
from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
0.18 (#10527)
map to map_batches (#10801)
GroupBy.apply to map_groups (#10799)
DataFrame.apply to map_rows (#10797)
Series/Expr.rolling_apply to rolling_map (#10750)
Series/Expr.apply to map_elements (#10678)
groupby to group_by (#10656)
cut/qcut (#10484)
Protocol for interchange classes (#10688)
DataFrame init from queries against users' existing database connections (#10649)
truncate_ragged_lines (#10660)
write_excel arguments (#10589)
LazyFrame.collect_async and pl.collect_all_async (#10616)
is_in and more generic array construction (#10614)
all - fix Kleene logic implementation for all/any (#10564)
cast support (#10504)
from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
get_idx_type - use get_index_type instead (#10556)
arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
0.18 (#10527)
ORDER BY on unselected columns (#10752)
pre-wrap instead of pre (#10739)
value_counts on column named "counts" (#10737)
f64 for rank when method="average" (#10734)
on arg type for join_asof (#10690)
write_delta (#10633)
is_in (#10620)
all - fix Kleene logic implementation for all/any (#10564)
write_delta with schema in delta_write_options (#10541)
pl.Config options relating to shape, column names, and types when rendering HTML (#10449)
.venv in repo root (#10789)
write_database unit tests to properly separate concerns (#10773)
adbc release (#10763)
connectorx and bump other Python dependencies (#10753)
testing docs about module import (#10741)
13.0.0 behavior (#10691)
sink_parquet docs (#10669)
deprecate_renamed_methods util (#10537)
inspect.currentframe (#10630)
Expr.meta namespace (#10617)
Cargo.lock (#10555)
make requirements fully refreshes unpinned packages/deps (#10591)
expr_dispatch decorator to work on methods with decorators (#10549)
pyo3/maturin-action (#10503)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj
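Many entries in this release pair an old name with a new one (for example groupby to group_by, or Series/Expr.apply to map_elements). Assuming these pairs are renames, which the upgrade guide mentioned at the top of this release should confirm, a small lookup table can help when reviewing older code. RENAMES and suggest_new_name are hypothetical helper names for illustration only and are not part of Polars.

```python
from typing import Optional

# Old -> new name pairs copied from the entries of this release.
# Assumption: every "X to Y" entry describes a rename; verify against
# the upgrade guide before relying on this table.
RENAMES = {
    "groupby": "group_by",
    "map": "map_batches",
    "GroupBy.apply": "map_groups",
    "DataFrame.apply": "map_rows",
    "Series.rolling_apply": "rolling_map",
    "Expr.rolling_apply": "rolling_map",
    "Series.apply": "map_elements",
    "Expr.apply": "map_elements",
}


def suggest_new_name(old_name: str) -> Optional[str]:
    """Return the newer name listed for old_name, or None if it is not in the table."""
    return RENAMES.get(old_name)
```

A name that does not appear in the table, such as select, returns None, so the helper never guesses at renames the notes do not list.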
Published by github-actions[bot] about 1 year ago
maturin to version 1.2.1 (#10479)
Thank you to all our contributors for making this release possible!
@ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
approx_unique as approx_n_unique (#10290)
date_ranges/time_ranges expression functions (#10005)
~2.5x (#10039)
read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
use_earliest to to_datetime / strptime (#10426)
is_local and to_local to categorical namespace (#10372)
Series.cat.uses_lexical_ordering (#10325)
time, date, datetime (#10298)
str.extract_groups (#10179)
datetime expression function with time zone/time unit parameters (#10235)
lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
CASE statement expressions (#10065)
date_ranges/time_ranges expression functions (#10005)
.extract_groups() (#10306)
is_in on empty series (#10195)
.apply (#10172)
null_on_oob=False in list.take when pa… (#10105)
.col(regex).exclude() operations not executing. (#10025)
by groups are interleaved (#9938)
gh-pages branch (#10282)
make pre-commit command (#10205)
make integration-tests command (#10202)
when/then/otherwise internals (#9922)
Thank you to all our contributors for making this release possible!
@0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj
Published by github-actions[bot] about 1 year ago
lit (#10461)
df.item (~4-5x speedup) (#10411)
read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
use_earliest to to_datetime / strptime (#10426)
write_excel (#10392)
selector variants for signed/unsigned integers (#10384)
is_local and to_local to categorical namespace (#10372)
selectors expansion function, so it can operate on a schema as well as a frame (#10341)
describe (#10378)
OverflowError in testing asserts with huge UInt64 diffs (#10437)
vertical_relaxed example for pl.concat (#10472)
Sphinx settings (#10400)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @OndrejSlamecka, @alexander-beedie, @c-peters, @cmdlineluser, @drgif, @ion-elgreco, @lfn3, @orlp, @potzenhotz, @rea1bacon, @reswqa, @ritchie46, @stinodego and @zundertj
Published by github-actions[bot] about 1 year ago
LazyFrame.read/write_json to de/serialize (#10238)
categorical_as_str parameter to testing utils (#10350)
selectors in additional frame methods (#10255)
Series.cat.uses_lexical_ordering (#10325)
time, date, datetime (#10298)
categorical_as_str parameter to testing utils
.extract_groups() (#10306)
Thank you to all our contributors for making this release possible!
@CanglongCl, @JulianCologne, @MarcoGorelli, @alexander-beedie, @cmdlineluser, @eltociear, @orlp, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
approx_unique as approx_n_unique (#10290)
qcut parameter to quantiles (#10253)
avg alias for mean (#10236)
str.extract_groups (#10179)
TypeError for all LazyFrame comparison operators (#10275)
map_dict where the lookup key is an expression (#10265)
datetime expression function with time zone/time unit parameters (#10235)
scan_pyarrow_dataset parameters (#10249)
allow_copy was set to False (#10262)
gh-pages branch (#10282)
read_parquet and scan_parquet about hive-style partitioning (point to scan_pyarrow_dataset instead) (#10277)
maximum_signature_line_length (#10228)
.then(..) branches (#10229)
Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @TLouf, @alexander-beedie, @cmdlineluser, @dependabot, @dependabot[bot], @duvenagep, @mcrumiller, @orlp, @reswqa, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
RETURN_VALUE ops when checking apply lambdas/functions (#10211)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @varunmittal91
Published by github-actions[bot] about 1 year ago
read_database if not passed a string URI (#10191)
is_in on empty series (#10195)
.apply (#10172)
_scan_impl (#10175)
issue_deprecation_warning (#10146)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cjackal, @cmdlineluser, @potzenhotz, @ritchie46 and @stinodego
Published by github-actions[bot] about 1 year ago
when-then-otherwise (#10122)
date_ranges/time_ranges expression functions (#10005)
~2.5x (#10039)
Series (#10104)
and/or control flow (#10085)
lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
SQLContext (#9571)
CASE statement expressions (#10065)
date_ranges/time_ranges expression functions (#10005)
apply (#10026)
json.loads in conjunction with apply (#10023)
numpy functions passed to apply (#10021)
numpy functions in UDFs that we can map to native expressions (#10003)
null_on_oob=False in list.take when pa… (#10105)
.col(regex).exclude() operations not executing. (#10025)
time_range/date_range dimensions fix (#9996)
when/then/otherwise internals (#9922)
Returns sections of docstrings (#10064)
Instruction matching for BytecodeParser (#10040)
BytecodeParser (#10012)
BytecodeParser class (#9993)
date_range/time_range (#9985)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @cmdlineluser, @jonashaag, @magarick, @mcrumiller, @rikkaka, @ritchie46 and @stinodego
Published by github-actions[bot] over 1 year ago
Series.extend (#9901)
pyo3::intern to avoid needlessly recreating PyString (#9853)
SQRT, CBRT, PI functions to SQLContext (#9936)
jump bytecode instructions required to reconstruct and/or logic (#9972)
Series.extend (#9901)
sql_expr (#9881)
LENGTH and OCTET_LENGTH string functions for SQL (#9860)
polars_warn! macro (#9868)
by groups are interleaved (#9938)
DataFrame.extend extending by itself (#9897)
LitIter (#9886)
DataFrame.vstack stacking itself (#9895)
pl.sql_expr (#9875)
apply docstring example text (#9953)
collect_all returns result frames in the same order as input (#9951)
sink_* methods to IO chapter (#9939)
weekday, day, ordinal_day examples (#9926)
bins argument and rename to breaks in Series.cut (#9913)
link entry to sphinx conf and factor-out website root paths (#9864)
Thank you to all our contributors for making this release possible!
@0xbe7a, @JulianCologne, @MarcoGorelli, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @alexander-beedie, @c-peters, @fsimkovic, @ion-elgreco, @magarick, @mcrumiller, @messense, @ritchie46, @sorhawell, @stinodego, @thomasaarholt and @zundertj