polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

OTHER License

Downloads: 9.7M · Stars: 26.3K · Committers: 213

polars - Python Polars 0.17.6

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimize join inner materialization of single keys (#8405)
  • parallelize sorted group tuple materialization (#8387)
  • improve materialization of huge cardinality group tuples (#8382)
  • improve group_tuples materialization (#8375)
  • conversion speedups from polars int64 timestamps to python temporal types:
    • ~35% faster → python date (#8339)
    • ~15% faster → python time (#8352)
    • ~10% faster → python datetime (#8339)
  • use online variance kernel for aggregation (#8306)
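
The "online variance kernel" entry refers to computing variance in a single pass, without storing the values or making a second pass for the mean. The release note does not say which formulation the kernel uses; the sketch below assumes Welford's update, one common choice, and all names in it (`OnlineVariance`, `online_variance`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OnlineVariance:
    """Running mean/variance via Welford's update (illustrative only)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        # Update count, mean and m2 from one new value in O(1).
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self, ddof: int = 1) -> float:
        # Not defined when there are too few values for the chosen ddof.
        if self.count <= ddof:
            return float("nan")
        return self.m2 / (self.count - ddof)


def online_variance(values, ddof: int = 1) -> float:
    """Single pass over `values`; nothing is buffered."""
    acc = OnlineVariance()
    for v in values:
        acc.push(v)
    return acc.variance(ddof)
```

Because the state is three numbers per group, this style of kernel fits aggregations where each group's values arrive incrementally.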

✨ Enhancements

  • allow existing item method to optionally take row/col indices (#8412)
  • allow negative 'arange' expression (#8413)
  • warn if argument is not explicitly sorted (#8409)
  • .to_numpy(use_pyarrow=False) for Object and Boolean (#8397)
  • new hypothesis strategy that can generate data for List dtypes (#8400)
  • offer cleaner usage pattern for Config object in context-manager context (#8394)
  • add support for SQL "IN" expr (#8396)
  • add a "signed" param to Series.is_integer (#8383)
  • add is_integer (#8373)
  • raise error on invalid dict aggregation (#8371)
  • cli output mode & sql read_json (#8336)
  • more informative keyerror on invalid getitem (#8320)

🐞 Bug fixes

  • infer supertype in json serde (#8411)
  • duration on empty df (#8403)
  • don't inadvertently set Series initialised with nested tuple data as Object dtype (#8401)
  • use physical in streaming unique global table (#8390)
  • recursively bubble up all dtypes in list cast (#8386)
  • is_in struct logical types (#8378)
  • fix nested null parquet read (#8372)
  • fix logical type in ListChunked::new_from_index (#8367)
  • fix unintentional loading of hypothesis profile (#8362)
  • bubble up logical type in recursive list cast (#8356)
  • ensure that iter_rows doesn't return nested Timestamp values (#8359)
  • implement clone_inner for all series (#8357)
  • add missing __hash__ support to Field, include "time_zone" in Datetime hash, fix Struct hash (#8354)
  • fix fill_null for categorical (#8353)
  • time.cast(str) as strftime (#8351)
  • fix logical dtypes in parallel list collection (#8349)
  • improve logical types of explode operation (#8348)
  • logical type in anonymous list builders (#8346)
  • address potential error caused by float division on time_unit scaling (#8337)
  • escape csv header names if they contain special chars (#8331)
  • nested struct/list/categorical logical/physical (#8334)
  • fix struct schema argument (#8327)
  • fix precision issue when converting pl.Datetime("ms") to Python datetime (#8332)
  • fix deserialize empty list (#8326)
  • List<Null> consistency (#8325)
  • fix coalesce schema (#8324)
  • don't do null propagation (#8322)
  • validate window_size user input in rolling_expr (#8318)
  • ensure invalid list eval raises (#8317)
  • fix typing overloads of read_excel (#8300)
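
To see why the CSV header fix (#8331) matters, here is a small sketch using Python's standard `csv` module rather than Polars itself. It shows that a header containing the delimiter or a quote character only round-trips if it is quoted on write; the helper name `write_csv` and the sample data are assumptions for illustration.

```python
import csv
import io


def write_csv(header, rows):
    """Write header and rows with minimal quoting (the csv module default)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)   # header fields are quoted only when needed
    writer.writerows(rows)
    return buf.getvalue()


# Header names that contain a comma and a double quote.
text = write_csv(["id", "city, country", 'say "hi"'], [[1, "Oslo, NO", "x"]])

# Reading the text back recovers the original three column names.
parsed = list(csv.reader(io.StringIO(text)))
```

Without quoting, the second header would be split into two columns and the header row would no longer match the data rows.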

🛠️ Other improvements

  • new hypothesis strategy that can generate data for List dtypes (#8400)
  • update duration docstring/example (#8392)
  • Upgrade ruff (#8380)
  • enhanced parametric testing for temporal dtypes (#8347)
  • Minor update to strptime (#8345)
  • adjust pytest config so as not to inadvertently prevent test debugging in IPython consoles (#8308)
  • add newline in pl.DataFrame.pivot docs (#8307)

Thank you to all our contributors for making this release possible!
@JoonHong-Kim, @MarcoGorelli, @StefanBRas, @alexander-beedie, @avimallu, @grantmcdermott, @jonashaag, @rben01, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.17.4

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • add specialized boolean aggregation for min/max (#8294)

✨ Enhancements

  • preserve time zone in combine (#8263)

🐞 Bug fixes

  • pass name to struct construction in aggregation (#8299)
  • improve nested list construction (#8278)
  • Truncate long column name in glimpse (#8281)
  • Fix DataFrame.sum returning empty column names (#8283)
  • always sort in top_k fast path (#8275)
  • don't use fast paths for sorted join if there are … (#8272)

🛠️ Other improvements

  • use concat_owned_array_unchecked when possible (#8274)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego, @zaynetro and @zundertj

polars - Python Polars 0.17.3

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • support DataFrame init from pydantic model data (#8178)

🚀 Performance improvements

  • fail fast on non-inferable strings in strptime if no fmt is provided (#8111)
  • make chunks search more resilient (#8229)
  • SIMD accelerated arg_min/arg_max (via argminmax) (#8074)
  • speed up csv parsing for slower datetimes formats (#8213)
  • improve datetime interpret perf (#8209)
  • arr.eval run on groupby expression engine when possible (#8199)
  • ~2-3x speedup for DataFrame init from pydantic models (#8181)

✨ Enhancements

  • add use_earliest argument to replace_time_zone for dealing with ambiguous datetimes (#8087)
  • fail loudly on .%f directive, as it differs from the Python standard library (#8237)
  • SQL CTEs (#8208)
  • automatically convert series OP expr -> pl.lit(series) OP expr where OP is arithmetic (#8225)
  • add pickle support for LazyFrame (#8220)
  • add duration cumsum and remainder (#8219)
  • support DataFrame init from nested dataclass, pydantic, and NamedTuple objects (#8185)
  • better algorithm for streaming unique (#8003)
  • Add approx distinct count via approx_unique() (#7937)
  • add percentiles to describe methods (#8169)
  • support DataFrame init from pydantic model data (#8178)
  • display skipped row if same number of rows (#8170)
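
The `approx_unique()` entry (#7937) is about estimating a distinct count in bounded memory instead of materializing every unique value. The note does not describe the estimator, so the following is only a sketch of one well-known technique, HyperLogLog, written with the standard library. The class name, the `p` parameter, and the choice of hash are assumptions and not the library's API.

```python
import hashlib
import math


class HyperLogLog:
    """Tiny HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item) -> None:
        # 64-bit hash of the item's repr; any well-mixed hash would do.
        digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # 1-based position of the leftmost set bit within those 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def approx_unique_count(items, p: int = 12) -> float:
    hll = HyperLogLog(p)
    for item in items:
        hll.add(item)
    return hll.estimate()
```

Memory use is fixed by `p` regardless of input size, and duplicates do not change the registers, which is what makes the estimate a distinct count.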

🐞 Bug fixes

  • add special numpy float branch in anyvalue conversion (#8259)
  • fix boolean par materialization (#8257)
  • improve null/empty list construction (#8255)
  • fix offsets in parallel utf8 materialization (#8254)
  • nested struct logical type consistency (#8249)
  • keep literal state if elementwise function is applied (#8195)
  • decimal ensure backed arrow arrays have correct dtype (#8193)

🛠️ Other improvements

  • parametric/hypothesis testing code cleanups (#8253)
  • Rename strptime/strftime args (#8221)
  • change sampling ratio for groupby strategy (#8223)
  • Rename Expr.list to implode (#8165)
  • don't panic on err in offset_by (#8210)
  • re-enable test parallelization for Windows tests (#8214)
  • Fix small typo: "im memory" -> "in memory" (#8187)
  • remove unused dtype_to_arrow_type (#8177)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @avimallu, @borchero, @chitralverma, @clickingbuttons, @ghuls, @josh, @jvdd, @rben01, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.17.2

Published by github-actions[bot] over 1 year ago

✨ Enhancements

  • make unique expr serde and cmp (#8153)
  • Enhanced parametric testing DataFrame generation (#8149)
  • support negative index in pct_change (#8137)
  • add log1p to list of mathematical functions (#8102)

🐞 Bug fixes

  • object conversion in anyvalue (#8155)
  • Address a ~15% regression in import polars speed (#8151)
  • validate map lengths (#8147)
  • fix row-wise init of UInt64 values that exceed Int64 upper bound (#8146)
  • implement list<null> constructor (#8143)
  • add all primitives to av_buffer builder (#8140)
  • struct is_in (#8139)
  • fix wrong display name of binary expressions (#8131)

🛠️ Other improvements

  • Enhanced parametric testing DataFrame generation (#8149)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @borchero, @dependabot, @dependabot[bot], @jonashaag, @ritchie46 and @stinodego

polars - Python Polars 0.17.1

Published by github-actions[bot] over 1 year ago

✨ Enhancements

  • Add median stat to Series.describe (#8118)
  • Support n expression passed to Expr.head/tail (#8098)
  • expand list of tz-aware formats which can be auto-inferred (#8085)
  • clearer error message if strptime without a fmt specified fails (#8086)
  • infer tz-aware formats with try_parse_dates in read_csv (#8084)
  • make 'mo' interval raise if the target date does not exist (#8078)
  • auto-infer fmt for tz-aware date strings (#7405)
  • multiple sql contexts & optional sql highlighting in cli (#8072)
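
The 'mo' interval entry (#8078) describes a strict rule: adding whole months should raise when the resulting calendar date does not exist, rather than silently picking another day. A minimal sketch of that rule in plain Python follows; `add_months_strict` is a hypothetical helper, not a Polars function, and it only models the behaviour as the entry words it.

```python
from datetime import date


def add_months_strict(d: date, months: int) -> date:
    """Shift `d` by whole months, keeping the day of month unchanged.

    Raises ValueError if that day does not exist in the target month.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    try:
        return date(year, month0 + 1, d.day)
    except ValueError as exc:
        raise ValueError(f"{d} + {months}mo does not exist") from exc
```

For example, shifting 2023-01-15 by one month gives 2023-02-15, while shifting 2023-01-31 by one month raises because February has no 31st.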

🐞 Bug fixes

  • fix detection of default integer indexes on win32 when loading from pandas frames (#8110)
  • fix stacklevel of some deprecation warnings (#8089)
  • lazy: fix boolean sum schema (#8108)
  • Expr.str.decode returns binary dtype (#8099)
  • Fix show_versions util (#8096)
  • don't exponentially grow error messages (partial fix). (#8081)
  • Fix regression with scan_parquet/ipc and fsspec (#8071)

🛠️ Other improvements

  • Improve some tests (#8043)
  • Do not parallelize Windows tests (#8097)
  • Fail doctest on deprecation warnings (#8091)
  • Fix Expr.apply docstring for return_dtype parameter (#8069)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @StefanBRas, @alexander-beedie, @josh, @n8henrie, @rben01, @ritchie46, @stinodego, @universalmind303 and @zundertj

polars - Python Polars 0.17.0

Published by github-actions[bot] over 1 year ago

⚠️ Breaking changes

  • rename some function arguments (#8017)
  • don't create duplicate pivot names (#8002)
  • Remove deprecated behaviour (#7978)
  • rename toggle_string_cache to enable_string_cache (#7970)
  • change top_k(descending) -> bottom_k (#7969)
  • in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
  • Use RowsError instead of RowsException as recommended … (#6009)
  • Use time_unit/time_zone instead of tu/tz (#7910)
  • More ergonomic args for struct, concat_str, and arg_sort_by (#7308)
  • swap arguments of shift_and_fill and add default… (#7192)
  • set maintain_order=False for df/lf.unique (#7468)
  • Rename pipe arg func to function (#7139)
  • Set some args for Series/Expr methods to keyword-only (#7860)
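
The breaking change to `descending` (#7957) amounts to an up-front length check: a single boolean applies to every sort column, while a sequence must supply exactly one flag per column. The sketch below models that check on plain Python dictionaries; `normalize_descending` and `sort_rows` are hypothetical names and the error message is invented.

```python
def normalize_descending(descending, n_columns):
    """Return one boolean per sort column, or raise on a length mismatch."""
    if isinstance(descending, bool):
        return [descending] * n_columns
    flags = list(descending)
    if len(flags) != n_columns:
        raise ValueError(
            f"got {len(flags)} 'descending' flags for {n_columns} sort columns"
        )
    return flags


def sort_rows(rows, by, descending=False):
    """Multi-column sort of dict rows using per-column direction flags."""
    flags = normalize_descending(descending, len(by))
    out = list(rows)
    # Stable sorts applied from the last key to the first give a
    # lexicographic ordering over all keys.
    for key, desc in reversed(list(zip(by, flags))):
        out.sort(key=lambda row: row[key], reverse=desc)
    return out
```

Raising early avoids the ambiguous case where a too-short sequence would otherwise be silently padded or truncated.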

🚀 Performance improvements

  • FromParallelIterator<Option<str>> for Utf8Chunked ~1.9x (#8058)
  • speed up from_par_iter Option<bool> ~2.5x (#8057)
  • parallelize numeric ChunkedArray materialization ~2x. (#8053)
  • parallelize into_groups materialization ~-25% (#8036)
  • use a trusted anyvalue builder (#8001)
  • numeric grouptuples with nulls hash in single pass ~25% (#7980)
  • ensure primitives are parsed first in anyvalue conversion (#7955)
  • use perfect hash table for categoricals (#7951)

✨ Enhancements

  • multiple sql contexts & optional sql highlighting in cli (#8072)
  • implement arg_sort for struct dtype (#8051)
  • Support DataFrame init from pyarrow RecordBatch objects, and improve init from Array (#8011)
  • allow write_ipc to take file=None (returning BytesIO) (#7997)
  • Add __array__ method to DataFrame (#7979)
  • support struct in df.unique (#7976)
  • change top_k(descending) -> bottom_k (#7969)
  • basic sanity-checks for some Config methods, reference POLARS_MAX_THREADS in threadpool_size docstring (#7965)
  • optimize away nested unions in lp (#7861)
  • Use RowsError instead of RowsException as recommended … (#6009)
  • More ergonomic args for struct, concat_str, and arg_sort_by (#7308)

🐞 Bug fixes

  • check element count in multi-column explode (#8050)
  • set lower limit for chunk_size (#8048)
  • impl to_static for struct (#8037)
  • create Series with list of only None with Float32 dtype (#8015)
  • version gate pyarrow version for `to_pandas=(use_pyarrow… (#8026)
  • Only allow correct type for get_column and to_series arg… (#7983)
  • Output correct dtype for values of remapping dict in map… (#8013)
  • all/any empty sets (#8012)
  • struct null_count, cast string, transpose and describe (#8009)
  • fix pivot and transpose of struct data (#8005)
  • don't create duplicate pivot names (#8002)
  • Fix test_literal_group_agg_chunked_7968 test (#7991)
  • fix chunked literals in expression engine (#7973)
  • in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
  • pandas 2.0 compat (#7962)
  • concat object types (#7958)
  • fix decimal conversion alignment (#7954)

🛠️ Other improvements

  • Fix Expr.apply docstring for return_dtype parameter (#8069)
  • rename some function arguments (#8017)
  • Remove deprecated behaviour (#7978)
  • Add docstring examples for top_k and bottom_k (#7987)
  • rename toggle_string_cache to enable_string_cache (#7970)
  • add remaining operator-equivalent method docstrings and a related html/docs entry (#7953)
  • Use time_unit/time_zone instead of tu/tz (#7910)
  • swap arguments of shift_and_fill and add default… (#7192)
  • set maintain_order=False for df/lf.unique (#7468)
  • Rename pipe arg func to function (#7139)
  • Set some args for Series/Expr methods to keyword-only (#7860)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @StefanBRas, @alexander-beedie, @ghuls, @rben01, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.16.18

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • improve group_tuples of high cardinality data ~10% (#7938)

✨ Enhancements

  • Add seed argument to rank for random (#7913)
  • Support Numpy ufunc with more than one expression (#7924)

🐞 Bug fixes

  • Fix lazy encode schema (#7912)
  • respect skip_nulls in apply for temporal types (#7908)

🛠️ Other improvements

  • Rename argument f to function in reduce docstring (#7925)
  • improve docstrings for numeric/math operator-equivalent methods (#7942)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @alonme, @ankane, @dependabot, @dependabot[bot], @lorentzenchr, @rben01, @ritchie46 and @zundertj

polars - Python Polars 0.16.17

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • use streaming instead of partitioned groupby (#7907)
  • don't auto-stream groupby (#7906)
  • rechunk before aggs (#7903)
  • don't re-allocate groups in sorted to_dummies (#7897)
  • fix hashing regression (#7833)
  • rechunk dataframe before unique computation (#7814)
  • improve hash quality (#7813)
  • always take sorted fast path group_tuples (#7787)

✨ Enhancements

  • auto-detect time-zone-awareness from the fmt argument in strptime; deprecate the tz_aware argument (#7886)
  • Add Series.pow() (#7898)
  • deal with null values in cut/qcut (#7878)
  • allow list/tuple lit values (#7879)
  • Support writing dynamic/live formula columns via write_excel (#7871)
  • support datetime/date subclasses (e.g. FreezeGun) (#7819)
  • support mode for floats and categoricals (#7827)
  • allow Series init with Unknown dtype to proceed as if dtype is None, to allow inference (#7830)
  • support sort by 'struct' type (#7822)
  • add to_repr methods to DataFrame and Series (#7802)
  • thousand separators in shape of repr DataFrame (#7775)
  • Improve automatic output dtype setting for map_dict. (#7797)
  • new utility from_repr function that reconstructs a DataFrame from its table repr (#7781)
  • deprecate default value of aggregation_function being 'first' in pivot. In a future version, it will default to None (#7784)

🐞 Bug fixes

  • fix lit agg (#7904)
  • disable ooc groupby (#7901)
  • Use check_exact for temporal types in assert_series_equal (#7896)
  • fix abs logical type (#7895)
  • fix boolean min/max output type and null handling (#7894)
  • Cast compound types to their simple string representation on export to Excel (#7887)
  • ensure _repr_html_ escapes column names in addition to data/body elements (#7877)
  • validate groupby_dynamic inputs (#7876)
  • correct for chunks in arg_where (#7873)
  • fix nested logical/physical list (#7872)
  • fix arbitrary nested logical types (#7869)
  • Relax type hints for when/then (#7857)
  • don't use fxhash in sink_sorted fast path (#7849)
  • parquet stats & all kernel (#7846)
  • Add missing type hint for is_between (#7835)
  • fill null list (#7836)
  • fix explode list[null] (#7832)
  • fix unicode lower/uppercase (#7826)
  • raise error on invalid series concat strategy (#7823)
  • don't use naive name in partitioned agg (#7810)
  • Ensure CsvReader always respects the n_rows parameter (#7789)

🛠️ Other improvements

  • Fix read_csv docstring formatting (#7875)
  • update concat docstring for how parameter (#7834)
  • don't run hash stability test on arm64 (#7825)
  • Improve pl.when documentation (#7793)
  • add description of ddof (#7811)
  • Rename venv folder to .venv (#7790)
  • add a make requirements option to install/refresh dependencies without having to recreate the venv (#7792)
  • fixup stacklevels (#7796)
  • Drop ruff target version (#7791)

Thank you to all our contributors for making this release possible!
@LdRoW, @MarcoGorelli, @Newtoniano, @advoet, @alexander-beedie, @duskmoon314, @foxcroftjn, @ghuls, @jonashaag, @ritchie46, @stinodego and @zundertj

polars - Rust Polars 0.28.0

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • out of core sort on multiple columns (#7244)

🚀 Performance improvements

  • rechunk dataframe before unique computation (#7814)
  • improve hash quality (#7813)
  • remove unnecessary copy in rolling function (#7801)
  • always take sorted fast path group_tuples (#7787)
  • change top_k algorithm (#7718)
  • runtime SIMD target detection for min/max/sum and impl SIMD mean ~2-5x (#7702)
  • implement top-k optimization (#7678)
  • ooc-sort dump in thread local if IO-thread is full. (#7668)
  • use perfect hash table for ooc partitioning (#7653)
  • optimize string kernels, (elide redundant allocs) (#7602)
  • optimize str_replace for same length replacements ~2x (#7580)
  • improve perf of str.replace_n and add n argument ~10x (#7575)
  • speedup replace_literal_all of single byte replacements ~15x. (#7565)
  • set sorted flags (#7558)
  • use atoi in favor of lexical in strptime -25% (#7501)
  • [csv] faster utf8 validation ~20% (#7500)
  • [csv] SIMD accelerate SplitFields -40% (#7498)
  • (csv) don't use memchr for splitfields -~0.15% (#7494)
  • csv-file use fast-float for csv float parsing (#7492)
  • speed up comparison of sorted arrays ~3.85x. (#7478)
  • improve performance for datetime parsing with %Z (#7369)
  • optimize str.replace_all (#7353)
  • optimize str.replace ~2x improvement (#7347)
  • ensure utf8 apply preallocates memory (#7345)
  • improve batched csv readers perf and memory perf (#7329)
  • use inlined strings for field and schema (#7272)
  • reuse groups in binary expressions (#7202)
  • improve perf of multi-args exprs in groupby context (#7186)
  • improve single argument elementwise expression pe… (#7180)
  • optimize arr.sum for list array with inner nulls (#7053)
  • optimize arr.min/arr.max (#7050)
  • optimize arr.mean (#7048)
  • optimize arr.sum (#7047)
  • optimize 'arg_where' (#7039)
  • add arr.count_match expression and optimize arr.sum for List<Boolean> (#7023)
  • remove O^2 behavior in melt (#7003)
  • improve vec_hash perf for boolean and utf8 (#6963)
  • don't pack utf8 columns in grouptuples ~5-15% (#6959)
  • don't pack integer keys in determining group tuples ~8-18% (#6956)
  • use fxhash for all integers (#6954)
  • speedup quantile/median ~2x (#6861)
  • remove unneeded series allocations in groupby aggs (#6855)
  • faster str.contains literal matching in the small-string regime (#6811)
  • optimize arg_min/arg_max (#6799)
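
The top-k entries above (#7678, #7718) concern returning the k largest values without fully sorting the input. The notes do not say which algorithm was adopted; as a sketch of the general idea, a bounded min-heap keeps only k candidates at a time. The function name `top_k` here is a plain Python helper, not the library method.

```python
import heapq


def top_k(values, k):
    """Return the k largest values in descending order using a size-k heap."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the best k values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # New value beats the smallest kept candidate: swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

This does O(n log k) work and holds k items in memory, compared with O(n log n) for sorting everything and slicing.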

✨ Enhancements

  • support mode for floats and categoricals (#7827)
  • support sort by 'struct' type (#7822)
  • thousand separators in shape of repr DataFrame (#7775)
  • deprecate default value of aggregation_function being 'first' in pivot. In a future version, it will default to None (#7784)
  • add dt.datetime, dt.date, dt.time (#7735)
  • add qcut (#7724)
  • add maintain_order option to Series.cut (#7723)
  • add maintain_order in arr.unique (#7721)
  • DataFrame.top_k/ LazyFrame.top_k (#7720)
  • clearer error message when replace_time_zone encounters ambiguous or non-existent datetimes (#7685)
  • anonymous_scan::as_any (#7715)
  • include set_fmt_float value in Config load/save state (#7696)
  • raise on descending date_range arguments (#7671)
  • add is_leap_year to temporal expressions (#7618)
  • full out-of-core support for streaming groupby (#7630)
  • clearer error message when creating duration string without integer (#7648)
  • out-of-core groupby/unique of groupby on integer keys (#7604)
  • slightly more space-efficient table output (use ellipsis char, not three periods) (#7599)
  • implement decimal -> dtype cast (#7600)
  • overwrite streaming chunk size (#7543)
  • slice pushdown in LazyFrame.unique (#7470)
  • streaming LazyFrame.unique (#7466)
  • automatically infer iso8601-like dates (#7457)
  • convert decimal 256 to 128 on entry (#7448)
  • dynamically change chunk_size in streaming `explo… (#7415)
  • add unary +,-,! to sql (#7399)
  • use IO backed reader when low_memory=True. (#7394)
  • The big error revamp (#7362)
  • parse year-month-day as Datetime in slow-path (#7373)
  • make melt streamable (#7364)
  • don't rechunk before writing to csv (#7365)
  • make LazyFrame.explode streamable. (#7341)
  • initial working version of Decimal Series (#7220)
  • implement serde for literal datetime and series (#7301)
  • improve error message if mmap fails in ipc (#7300)
  • add support for serializing categoricals to json (#7276)
  • enable min-max skipping for binary in parquet, enable min-max skipping for is_in exprs (#7169)
  • out of core sort on multiple columns (#7244)
  • support nulls_last for multi-column sort (#7242)
  • implement row encoding for boolean and binary (#7218)
  • allow passing utc=True when parsing time-zone-naive date strings (#7203)
  • add sql "ARRAY_AGG" (#7204)
  • show column name if read_csv errors (#7177)
  • add explode for binary (#7159)
  • improve error message when read_csv fails (#7150)
  • Improve usability of Null type. (#7136)
  • add sort maintaining order row encoding (#7117)
  • add glob support to scan_ndjson (#7143)
  • streaming: scale chunk_size on table width (#7119)
  • additional read functions (#7102)
  • add 'use_statistics' option to parquet readers (#7087)
  • add arr.count_match expression and optimize arr.sum for List<Boolean> (#7023)
  • add sort for struct dtype (#7021)
  • raise informative error if invalid datetime_format passed to write_csv (#7005)
  • rename parse_dates => try_parse_dates (#6987)
  • add is_duplicated/is_unique for struct dtype (#6940)
  • support nested fixedsizebinary conversion (#6923)
  • raise error on invalid aggregation expressions (#6921)
  • properly implement null array (#6817)
  • avoid panic error in strftime with invalid format (#6810)

🐞 Bug fixes

  • fill null list (#7836)
  • fix explode list[null] (#7832)
  • fix unicode lower/uppercase (#7826)
  • don't use naive name in partitioned agg (#7810)
  • Ensure CsvReader always respects the n_rows parameter (#7789)
  • ensure k is lower than height (#7779)
  • raise error on invalid categorical cast (#7686)
  • compile issue in polars-lazy (#7766)
  • compile issues in "polars-core" with default features (#7765)
  • make zip_with_same_type obligatory (#7761)
  • fix melt projection pushdown node (#7752)
  • fix predicate pushdown for 'unique' first/last (#7749)
  • fix null propagation (#7748)
  • avoid ambiguous time error when passing python Datetime to DataFrame constructor (#7711)
  • Fix inferring CSV schema when skip_rows_after_heade… (#7701)
  • fix race condition in null handling of window fast… (#7695)
  • respect time zone in groupby_rolling with negative offset (#7664)
  • fix empty case str.replace (#7662)
  • respect time zone in rolling_* functions (#7643)
  • fix schema of decimal type reads (#7652)
  • respect time zone in offset_by (#7626)
  • respect time zone in dt.round (#7611)
  • add decimal chunk_lengths (#7589)
  • fix ooc sort. the fast path was invalid (#7588)
  • Fix regression throwing AmbiguousTimeError in groupby_dynamic (#7454)
  • activate dtype-duration for polars-ops (#7582)
  • distinct project whole schema if not a subset (#7581)
  • sql window functions (#7458)
  • respect time zone in upsample (#7563)
  • fix rolling windows for windows that shrink from lhs (#7556)
  • pushdown key in merge sorted projection pd (#7542)
  • don't upcast column to string in 'is_in' operation (#7538)
  • Enable link to DateLikeNameSpace in the docs. (#7526)
  • respect time zone in date_range (#7503)
  • use physical types in sort-by args (#7518)
  • fix projection pushdown of asof_joins (#7487)
  • raise error on categorical by arguments if not fro… (#7464)
  • sql floor & ceil (#7456)
  • allow for hourly date_range to cross DST (#7430)
  • respect lexical/physical in multi-column categoric… (#7417)
  • fix null_dtype slice (#7414)
  • sort_by logical types (#7412)
  • parse single-digit months and dates when code would have gone down fastpath (#7391)
  • creating empty struct series with some unit fields (#7383)
  • don't panic when writing NullArray values to python row tuple (#7346)
  • fix projection pushdown on join with unused join key (#7326)
  • raise error on time -> datetime cast (#7325)
  • make pl.struct mappable (#7299)
  • err on duplicate with_column names (#7296)
  • don't panic on str.parse_int (#7072)
  • improve concat_list with empty list error message (#7236)
  • fix groupby_dynamic's binning when index_column is time-zone-aware (#7278)
  • fix preservation of microseconds when converting Python datetime (#7271)
  • no panic on empty cross join (#7266)
  • raise error on ambiguous filter predicates (#7265)
  • handle concat_list with first lit value (#7235)
  • add type annotation to avoid potential build errors (#7223)
  • floating point CSV parsing with escaping and whitespace (#7196)
  • make list function 'map' and refactor multi-arg ex… (#7185)
  • validate trees before inserting streaming node (#7179)
  • fix list take logical types (#7163)
  • fix null cmp fast paths (#7157)
  • don't panic on unsupported arithmetic type (#7154)
  • don't let a cast unset agg_state and keep logical … (#7151)
  • expose sort expressions to stack-optimizer (#7148)
  • improve error message when read_csv fails (#7150)
  • make cast unknown a no-op (#7147)
  • fix panic on cum_prod (#7141)
  • respect f32 schema in deep expressions (#7146)
  • fix deadlock in scan_csv()->sink_parquet() (#7118)
  • make CSV reader respect n_rows with globbing (#6969)
  • nested sql exprs (#7112)
  • fix logical types in arr.get (#7094)
  • allow fill_null in eager if type now known (#7092)
  • do projection just before concat to ensure same sizes (#7089)
  • fix 'filter' in groupby context when expression is… (#7041)
  • reflect time zone conversion in lazy dataframe schema (#7022)
  • ensure set_sorted never panics (#7013)
  • fix struct append 0 sliced (#7012)
  • fix coalesce supertype (#7000)
  • fix fill_null for categoricals (#6998)
  • dtype of pow function (#6985)
  • fix is_duplicated for utf8 dtype (#6997)
  • fix temporal logical types in pivot (#6957)
  • ensure literals are expanded in streaming (#6952)
  • str.contains strict=False took no effect (#6950)
  • add special fast path for elementwise expression o… (#6924)
  • fix arg_min/arg_max when sorted (#6927)
  • fix anonymous list builder (#6916)
  • reject multithreading on excessive ',\n' fields (#6906)
  • dispatch suffix to asof_join by (#6899)
  • improve recursive casting of nested data (#6897)
  • don't fast explode on null introducing take (#6890)
  • fix crash in write_csv when mixed tz-naive and tz-aware datetimes are present (#6828)
  • Do not panic when inferring schema from empty rows (#6849)
  • fix schema of functions: (#6845)
  • Do not panic when failing to extract numeric value (#6848)
  • stabilize integer operation to minimal required dtype (#6841)
  • respect schema in ndjson (#6819)

🛠️ Other improvements

  • split up vector_hasher module (#7807)
  • remove unnecessary copy in rolling function (#7801)
  • cover uncovered paths in agg_* functions (#7800)
  • Add "typos" as spell checking lint (#7759)
  • fix typos (#7756)
  • change some panics to errors (#7669)
  • remove apply_on_tz_corrected (#7624)
  • don't branch via error in read_csv::parse_dates (#7621)
  • fix a bunch of cargo warnings & errors (#7549)
  • factor out some utils into polars-time/src/utils (#7562)
  • abstract memory collection in sinks (#7560)
  • mark DataFrame.get_columns_mut as unsafe (#7557)
  • Use pre-installed rustup (#7544)
  • refactor date parsing (#7517)
  • refactor join pushdown (#7486)
  • Use eprintln! instead of eprint! (#7473)
  • Improved JSON IO docs (#7445)
  • update arrow (#7409)
  • Rename Decimal prec to precision (#7401)
  • add more docstrings to Expr (#7258)
  • use SchemaRef in CSV modules (#7250)
  • fix polars-row tests and add to ci (#7275)
  • remove binary feature (#7219)
  • Replace num with num-traits + a few minor maintenance fixes (#7201)
  • simplify binary expression evaluation (#7195)
  • ensure binary branches are executed in parall… (#7193)
  • Build versioned API reference (#7114)
  • update_arrow fix categorical statistics (#7098)
  • separate crate for error type (#7096)
  • Rename kwarg reverse to descending (#6914)
  • update rayon (#7001)
  • remove time 0.1 dep (#6979)
  • add LazyFileListReader trait (#6937)
  • cleanup is_unique impl (#6935)
  • Clean up some warnings (#6934)
  • update rustc to nightly-2023-02-14 (#6909)
  • avoid unnecessary mut (#6894)
  • set up support for fixedsizebinary conversion (#6867)
  • split agg in modules and make quantile DRY (#6857)
  • Rename argsort/argsort_by to arg_sort/arg_sort_by (#6829)
  • Update dprint config excludes (#6822)

Thank you to all our contributors for making this release possible!
@CloseChoice, @Hofer-Julian, @LdRoW, @MarcoGorelli, @MatveyF, @SauravMaheshkar, @Trippy3, @Vincenthays, @adamgreg, @advoet, @aldanor, @alexander-beedie, @borchero, @chitralverma, @cjackal, @coinflip112, @csko, @datapythonista, @dependabot, @dependabot[bot], @didriksg, @duskmoon314, @ecashin, @foxcroftjn, @ghuls, @iamsmkr, @igmriegel, @jakob-keller, @jonashaag, @josemasar, @josh, @juba, @jvdd, @kngwyu, @minimav, @moritzwilksch, @mslapek, @nrebena, @oysols, @ozgrakkurt, @papparapa, @ptiza, @rben01, @ritchie46, @romanovacca, @s-banach, @sorhawell, @stinodego, @universalmind303, @vincev, @xhochy, @xyning and @zundertj

polars - Python Polars 0.16.16

Published by github-actions[bot] over 1 year ago

🐞 Bug fixes

  • ensure k is lower than height (#7779)
  • raise error on invalid categorical cast (#7686)
  • raise error on attempt to set invalid Datetime or Duration dtype timeunit (#7768)

🛠️ Other improvements

  • Add "typos" as spell checking lint (#7759)
  • fix typos (#7756)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46 and @universalmind303

polars - Python Polars 0.16.15

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • change top_k algorithm (#7718)
  • runtime SIMD target detection for min/max/sum and impl SIMD mean ~2-5x (#7702)
  • implement top-k optimization (#7678)
  • ooc-sort dump in thread local if IO-thread is full. (#7668)
  • use perfect hash table for ooc partitioning (#7653)

✨ Enhancements

  • add dt.datetime, dt.date, dt.time (#7735)
  • new "row_totals" parameter for write_excel that adds a row-wise total column using structured references (#7751)
  • More ergonomic args for min/max (#7742)
  • More ergonomic args for concat_list (#7745)
  • add Series.hist (#7727)
  • add qcut (#7724)
  • add maintain_order option to Series.cut (#7723)
  • create series with only none list with specific dtype (#7722)
  • add maintain_order in arr.unique (#7721)
  • DataFrame.top_k/ LazyFrame.top_k (#7720)
  • clearer error message when replace_time_zone encounters ambiguous or non-existent datetimes (#7685)
  • include set_fmt_float value in Config load/save state (#7696)
  • raise on descending date_range arguments (#7671)
  • include add operator-equivalent expression (#7667)
  • add expression method equivalents for existing math/logical operators (#7660)
  • add is_leap_year to temporal expressions (#7618)
  • full out-of core support for streaming groupby (#7630)
  • clearer error message when creating duration string without integer (#7648)
  • allow scan_csv to take a list of column names in a new_columns param (#7642)
  • out-of-core groupby/unique of groupby on integer keys (#7604)
  • allow set and/or frozenset as input to is_in expressions (#7613)
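For the `is_leap_year` entry above, the underlying Gregorian rule is simple enough to state in a few lines. The sketch below is plain Python for illustration (the function is a stand-alone helper, not the polars expression) and checks itself against the standard library's `calendar.isleap`.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 2000, 2023, 2024):
    # Cross-check the hand-written rule against the standard library.
    assert is_leap_year(year) == calendar.isleap(year)
    print(year, is_leap_year(year))
```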

🐞 Bug fixes

  • make zip_with_same_type obligatory (#7761)
  • fix melt projection pushdown node (#7752)
  • fix predicate pushdown for 'unique' first/last (#7749)
  • fix null propagation (#7748)
  • fix init from pandas Series that has no dtype and is empty (or contains only null values) (#7716)
  • avoid ambiguous time error when passing python Datetime to DataFrame constructor (#7711)
  • Fix inferring CSV schema when skip_rows_after_heade… (#7701)
  • fix race condition in null handling of window fast… (#7695)
  • address Series init regression from list of np.arange objects (#7692)
  • improve error message if unavailable lazy module is queried for __version__ attribute (#7680)
  • fix reversed non-existent file error msg (#7657) (#7673)
  • respect time zone in groupby_rolling with negative offset (#7664)
  • fix empty case str.replace (#7662)
  • allow for list of datetimes with timezone(timedelta!=0) in Series constructor (#7645)
  • respect time zone in rolling_* functions (#7643)
  • fix schema of decimal type reads (#7652)
  • detect deltalake version in show_versions (#7622)
  • respect time zone in offset_by (#7626)
  • fix boolean Series init with integer 1/0 values (#7619)
  • respect time zone in dt.round (#7611)

🛠️ Other improvements

  • Display full argument names in __repr__ for Datetime a… (#7736)
  • add Expr.pipe API docs link (#7734)
  • Add sort_by example taking one row per group (#7712)
  • Clean up a few type hints/imports (#7687)
  • Move wrap_x utils to utils module (#7672)
  • Reduce number of polars.internals imports (#7628)
  • Remove duplicate column from Expr.sort example (#7684)
  • Move expr parsing to utils (#7661)
  • Eliminate function re-exports through internals (#7650)
  • Move last functionality out of internals (#7649)
  • More internals cleanup (#7638)
  • Update lockfile (#7637)
  • fix and improve type hints and function names (#7609)
  • remove additional logic from scan delta (#7605)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @chitralverma, @didriksg, @ghuls, @jakob-keller, @minimav, @ritchie46, @stinodego, @universalmind303 and @zundertj

polars - Python Polars 0.16.14

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimize string kernels, (elide redundant allocs) (#7602)
  • even faster polars module import (~15%) (#7584)
  • optimize str_replace for same length replacements ~2x (#7580)
  • reinstate fast module import and optimise DataFrame init by implementing dynamic singledispatch registration (#7559)
  • improve perf of str.replace_n and add n argument ~10x (#7575)
  • speedup replace_literal_all of single byte replacements ~15x. (#7565)
  • set sorted flags (#7558)
  • extend ultrafast constant-value frame init to temporal types (over 1,000x speedup) (#7527)

✨ Enhancements

  • slightly more space-efficient table output (use ellipsis char, not three periods) (#7599)
  • implement decimal -> dtype cast (#7600)
  • use head on pyarrow datasets (#7570)
  • overwrite streaming chunk size (#7543)

🐞 Bug fixes

  • remove index columns in pandas to_sql() (#7596)
  • add decimal chunk_lengths (#7589)
  • fix ooc sort. the fast path was invalid (#7588)
  • Fix regression throwing AmbiguousTimeError in groupby_dynamic (#7454)
  • activate dtype-duration for polars-ops (#7582)
  • distinct project whole schema if not a subset (#7581)
  • reinstate fast module import and optimise DataFrame init by implementing dynamic singledispatch registration (#7559)
  • sql window functions (#7458)
  • respect time zone in upsample (#7563)
  • fix rolling windows for windows that shrink from lhs (#7556)
  • remove pyarrow from construction and dispatch to rust (#7551)
  • fix negative indexing for head/tail (#7554)
  • Remove BatchedCsvReader from public API (#7546)
  • fix logical/list getitem (#7545)
  • pushdown key in merge sorted projection pd (#7542)
  • don't upcast column to string in 'is_in' operation (#7538)

🛠️ Other improvements

  • Move more code out of internals (#7597)
  • add a performance hint about use of lru_cache to the apply docstrings (#7593)
  • Avoid pli in type hints (part 2) (#7587)
  • Avoid pli in type hints (part 1) (#7586)
  • Move core objects to top level (#7576)
  • Bump ruff (#7567)
  • Rename namespace Array -> List in docs (#7541)
  • Move fmt tests to test_fmt (#7555)
  • Rename sep arg to separator (#7533)
  • Minor Series cleanup (#7531)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Vincenthays, @alexander-beedie, @ritchie46, @stinodego, @universalmind303 and @vincev

polars - Python Polars 0.16.13

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • use atoi in favor of lexical in strptime -25% (#7501)
  • [csv] faster utf8 validation ~20% (#7500)
  • [csv] SIMD accelerate SplitFields -40% (#7498)
  • [csv] don't use memchr for splitfields -~0.15% (#7494)
  • csv-file use fast-float for csv float parsing (#7492)

✨ Enhancements

  • literal support for binary (#7519)

🐞 Bug fixes

  • respect time zone in date_range (#7503)
  • transparently integrate externally-registered Excel formats (#7520)
  • use physical types in sort-by args (#7518)
  • keep series name in arithmetic (#7513)
  • Initialize with Decimal dtype (#7511)
  • fix projection pushdown of asof_joins (#7487)

🛠️ Other improvements

  • update show_versions with xlsxwriter (and add as optional dependency) (#7507)
  • Use new LazyFrame init in docs (#7508)
  • Bump some linting versions (#7505)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @ecashin, @ritchie46 and @stinodego

polars - Python Polars 0.16.12

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • speed up comparison of sorted arrays ~3.85x. (#7478)
  • improve performance for datetime parsing with %Z (#7369)

✨ Enhancements

  • slice pushdown in LazyFrame.unique (#7470)
  • streaming LazyFrame.unique (#7466)
  • automatically infer iso8601-like dates (#7457)
  • push down temporal predicates to pyarrow scanner (#7421)
  • slice pushdown in scan_arrow_ds (#7449)
  • convert decimal 256 to 128 on entry (#7448)
  • provide option to set individual row_heights on Excel export (#7447)
  • optimise Excel export when all data in a multi-column conditional format is contiguous (#7427)
  • dynamically change chunk_size in streaming `explo… (#7415)
  • support setting multiple conditional formats on the same Excel table column/range (#7411)
  • add unary +,-,! to sql (#7399)
  • disallow converting key values to null in map_dict due … (#7393)
  • use IO backed reader when low_memory=True. (#7394)
  • The big error revamp (#7362)
  • parse year-month-day as Datetime in slow-path (#7373)
  • support applying one conditional format to multiple columns on Excel export (allows for heatmaps) (#7379)
  • Proper superclass for Decimal (#7384)
  • tweak default Date and Time format strings for Excel export (#7380)
  • make melt streamable (#7364)
  • don't rechunk before writing to csv (#7365)

🐞 Bug fixes

  • handle an unusual edge-case introspecting dataclass type hints (#7476)
  • raise error on categorical by arguments if not fro… (#7464)
  • fix and test df.corr (#7463)
  • make DataFrame rendering compatible with quarto and pandoc (#7455)
  • sql floor & ceil (#7456)
  • fix DataFrame table rendering issue in some Jupyter environments (#7450)
  • allow for hourly date_range to cross DST (#7430)
  • respect lexical/physical in multi-column categoric… (#7417)
  • fix null_dtype slice (#7414)
  • sort_by logical types (#7412)
  • parse single-digit months and dates when code would have gone down fastpath (#7391)
  • creating empty struct series with some unit fields (#7383)
  • minor Excel export improvements/fixes (#7363)

🛠️ Other improvements

  • Rename read_x functions arg file to source (#7460)
  • Refactor utils module (#7435)
  • Rename functions that clash with builtins (#7424)
  • Showcase new ergonomic syntax in README (#7419)
  • Rename Decimal prec to precision (#7401)
  • Remove _base_type util (#7410)
  • Rename first arg of from_x to data (#7407)
  • use exc as variable name for all captured exceptions (#7403)
  • Remove redundant schema keyword description from `pl.… (#7400)
  • Rename cfg module to config (#7385)
  • Add test for groupby referencing the same column twice (#7340)
  • Split up datatypes module (#7357)
  • Clean up type checking lints (#7358)

Thank you to all our contributors for making this release possible!
@Hofer-Julian, @MarcoGorelli, @SauravMaheshkar, @aldanor, @alexander-beedie, @cjackal, @ghuls, @josh, @juba, @nrebena, @rben01, @ritchie46, @stinodego and @universalmind303

polars - Python Polars 0.16.11

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimize str.replace_all (#7353)
  • optimize str.replace ~2x improvement (#7347)
  • ensure utf8 apply preallocates memory (#7345)

✨ Enhancements

  • make LazyFrame.explode streamable. (#7341)
  • allow import of dtype groups from the top-level to improve discovery (#7339)

🐞 Bug fixes

  • make decimal types opt-in (#7348)
  • fix chunk_sizes in threading apply (#7351)
  • don't panic when writing NullArray values to python row tuple (#7346)

🛠️ Other improvements

  • add write_excel API docs link (#7338)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ritchie46 and @s-banach

polars - Python Polars 0.16.10

Published by github-actions[bot] over 1 year ago

🏆 Highlights

  • Excel export support via new write_excel IO method (#7251)
  • out of core sort on multiple columns (#7244)

🚀 Performance improvements

  • improve batched csv readers perf and memory perf (#7329)
  • use inlined strings for field and schema (#7272)
  • reuse groups in binary expressions (#7202)

✨ Enhancements

  • support creation of sparklines when exporting Excel tables (#7333)
  • support sqlalchemy/pandas backed write_database (#7322)
  • add adbc database reader and writer (DataFrame.write_database) (#7318)
  • make expr.apply streamable in selection context (#7316)
  • More ergonomic unnest args (#7310)
  • initial working version of Decimal Series (#7220)
  • Support explicit Binary dtype in constructor (#7305)
  • implement serde for literal datetime and series (#7301)
  • improve error message if mmap fails in ipc (#7300)
  • add multi-threaded apply (#7277)
  • add support for serializing categoricals to json (#7276)
  • Add Expr.arg_true (#7056)
  • don't require pyarrow for initialising Series with Python datetimes (#7273)
  • Excel export support via new write_excel IO method (#7251)
  • deprecate describe_(optimized)_plan in favor of explain (#7264)
  • enable min-max skipping for binary in parquet, enable min-max skipping for is_in exprs (#7169)
  • out of core sort on multiple columns (#7244)
  • support nulls_last for multi-column sort (#7242)
  • allow optimizations flags in describe_plan (#7233)
  • implement row encoding for boolean and binary (#7218)
  • allow passing utc=True when parsing time-zone-naive date strings (#7203)
  • Add **named_exprs input for struct (#7208)
  • add sql "ARRAY_AGG" (#7204)

🐞 Bug fixes

  • fix offset in threading apply (#7330)
  • fix projection pushdown on join with unused join key (#7326)
  • raise error on time -> datetime cast (#7325)
  • raise error if output of 'apply' cannot be determined (#7317)
  • make pl.struct mappable (#7299)
  • err on duplicate with_column names (#7296)
  • don't panic on str.parse_int (#7072)
  • improve concat_list with empty list error message (#7236)
  • fix groupby_dynamic's binning when index_column is time-zone-aware (#7278)
  • fix preservation of microseconds when converting Python datetime (#7271)
  • fix us precision of datetime to anyvalue conversion (#7268)
  • no panic on empty cross join (#7266)
  • raise error on ambiguous filter predicates (#7265)
  • handle concat_list with first lit value (#7235)
  • respect schema in DataFrame initialisation for time-zone-aware datetime (#7240)
  • ensure every type is properly normalised (for groupby_dynamic and groupby_rolling) (#7238)
  • add test of median function in lazy mode (#7224)
  • don't lose precision in pl.date_range due to floating point arithmetic (#7229)
  • Conversion of negative timedelta to polars duration (#7209)
  • ensure parametric testing cols=int definition respects allowed_dtypes (#7213)
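The negative-timedelta entry above touches a detail of Python's `datetime.timedelta` that is easy to get wrong: negative values are normalised so that only `days` is negative, while `seconds` and `microseconds` stay non-negative. The sketch below only illustrates that pitfall and one safe way to get a signed integer microsecond count. It does not claim to reproduce the actual fix in #7209, and the helper name is hypothetical.

```python
from datetime import timedelta


def timedelta_to_microseconds(td: timedelta) -> int:
    """Exact signed microsecond count (hypothetical helper, not polars code).

    All three normalised fields must be combined; using only some of
    them gives wrong results for negative durations.
    """
    return (td.days * 86_400 + td.seconds) * 1_000_000 + td.microseconds


neg = timedelta(microseconds=-1)
# Normalised form: days is negative, the other fields are positive.
print(neg.days, neg.seconds, neg.microseconds)
print(timedelta_to_microseconds(neg))
# Integer floor division by a 1 microsecond timedelta is an equivalent exact route.
print(neg // timedelta(microseconds=1))
```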

🛠️ Other improvements

  • Fix read/write_database tests (#7327)
  • Rename scan_ds to scan_pyarrow_dataset (#7320)
  • don't run tests that write to disk by default (#7321)
  • rename read_sql to read_database (#7315)
  • Address git2 vulnerability (#7309)
  • Correctly deprecate DataFrame.pearson_corr (#7307)
  • Skip write_excel doctests (#7306)
  • Run pytest-xdist with worksteal (#7304)
  • Rename pearson_corr & spearman_rank_corr (#7014)
  • Split io module per type (#7295)
  • Move _html module to dataframe module (#7256)
  • Enable strict for ruff TCH lints (#7234)
  • better select on map_dict dtype (#7217)
  • add warning of mmap to ipc docstring (#7216)
  • exit non-zero on fix from ruff (#7215)
  • ensure that DataFrame and LazyFrame init params don't diverge (#7214)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @aldanor, @alexander-beedie, @coinflip112, @csko, @dependabot, @dependabot[bot], @ghuls, @josemasar, @josh, @mslapek, @nrebena, @ozgrakkurt, @papparapa, @ptiza, @rben01, @ritchie46, @sorhawell, @stinodego, @universalmind303, @xyning and @zundertj

polars - Python Polars 0.16.9

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • improve perf of multi-args exprs in groupby context (#7186)
  • optimize sequence_to_pydf (#7044)
  • improve single argument elementwise expression pe… (#7180)

✨ Enhancements

  • show column name if read_csv errors (#7177)
  • support direct LazyFrame init (same params as DataFrame) (#7122)
  • add a base_type method to DataType (#7166)
  • add explode for binary (#7159)
  • add binary apply (#7160)
  • Allow pl.Int32 Series as output in eager repeat. (#7152)
  • improve error message when read_csv fails (#7150)
  • Improve usability of Null type. (#7136)
  • add glob support to scan_ndjson (#7143)
  • add Expr.pipe (#7134)
  • streaming: scale chunk_size on table width (#7119)
  • additional read functions (#7102)
  • More ergonomic explode args (#7115)
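The `Expr.pipe` entry above adds a method-chaining hook. As a hedged sketch of the general pipe pattern (assuming the usual shape `pipe(function, *args, **kwargs)`), the plain-Python class below is a hypothetical stand-in and not a polars type; it only shows how user-defined functions can be slotted into a method chain.

```python
class Pipeable:
    """Tiny stand-in object for illustration; not a polars class."""

    def __init__(self, value):
        self.value = value

    def pipe(self, function, *args, **kwargs):
        # Call the user function with this object as the first argument.
        return function(self, *args, **kwargs)


def add(obj, n):
    return Pipeable(obj.value + n)


def scale(obj, factor=1):
    return Pipeable(obj.value * factor)


# Reads left to right instead of nesting: scale(add(Pipeable(2), 3), factor=10)
result = Pipeable(2).pipe(add, 3).pipe(scale, factor=10)
print(result.value)
```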

🐞 Bug fixes

  • make list function 'map' and refactor multi-arg ex… (#7185)
  • Fix Series.argsort (#7183)
  • validate trees before inserting streaming node (#7179)
  • Raise ValueError for getitem when column indexes are out… (#7167)
  • fix list take logical types (#7163)
  • fix null cmp fast paths (#7157)
  • fix df division dispatch (#7155)
  • don't panic on unsupported arithmetic type (#7154)
  • don't let a cast unset agg_state and keep logical … (#7151)
  • expose sort expressions to stack-optimizer (#7148)
  • improve error message when read_csv fails (#7150)
  • make cast unknown a no-op (#7147)
  • fix panic on cum_prod (#7141)
  • respect f32 schema in deep expressions (#7146)
  • fix return type of _unpack_schema to prevent potential TypeError (#7128)
  • fix docstring in set_tbl_cols() (#7121)
  • fix deadlock in scan_csv()->sink_parquet() (#7118)
  • subtracting Series from date has wrong sign (#7107)
  • fix scan_ipc receiving storage_options (#7085)
  • nested sql exprs (#7112)

🛠️ Other improvements

  • ensure binary branches are executed in parall… (#7193)
  • Deprecate pl.get_dummies (#7055)
  • Add Series.cut, deprecate pl.cut (#7058)
  • functional programming examples (#7135)
  • fix docstring in set_tbl_cols() (#7121)
  • Build versioned API reference (#7114)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Trippy3, @alexander-beedie, @foxcroftjn, @ghuls, @iamsmkr, @jakob-keller, @josh, @mslapek, @papparapa, @ritchie46, @romanovacca, @stinodego, @universalmind303 and @zundertj

polars - Python Polars 0.16.8

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • optimize arr.sum for list array with inner nulls (#7053)
  • optimize arr.min/arr.max (#7050)
  • optimize arr.mean (#7048)
  • optimize arr.sum (#7047)
  • optimize 'arg_where' (#7039)
  • More efficient handling of *args/**kwargs (#7026)

✨ Enhancements

  • allow for simple creation of n-row empty frame/series via clear (#7095)
  • Make polars not copy data when importing from arrow (#7084)
  • More ergonomic drop args (#7063)
  • More ergonomic partition_by args (#7065)
  • More ergonomic exclude args (#7082)
  • allow inline expressions in asof_join (#7088)
  • add 'use_statistics' option to parquet readers (#7087)

🐞 Bug fixes

  • allow map_dict on categorical dtype (#7097)
  • fix logical types in arr.get (#7094)
  • allow fill_null in eager if type not known (#7092)
  • do projection just before concat to ensure same sizes (#7089)
  • fix 'filter' in groupby context when expression is… (#7041)
  • fix type hint of 'when->then->otherwise' (#7040)
  • accept more types in from_records (#7033)

🛠️ Other improvements

  • Rename pivot aggregate_fn to aggregate_function (#7059)
  • Add TYPE_CHECKING lints (#7070)
  • Deprecate more non-keyword arguments (#7030)
  • Rename kwarg reverse to descending (#6914)
  • Rename args f/func to function (#7032)
  • let read_csv take Sequence as columns, remove several type: ignore (#7028)
  • add example for arr.count_match() (#7029)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @coinflip112, @datapythonista, @jakob-keller, @moritzwilksch, @ritchie46, @stinodego, @universalmind303 and @zundertj

polars - Python Polars 0.16.7

Published by github-actions[bot] over 1 year ago

🚀 Performance improvements

  • add arr.count_match expression and optimize arr.sum for List<Boolean> (#7023)
  • optimize selection_to_pyexpr_list (#7020)
  • avoid unnecessary function calls in LazyFrame.with_columns() (#7019)
  • remove O^2 behavior in melt (#7003)
  • Improve performance of expr_to_lit_or_expr for arguments of type Expr by ~80% (#6967)
  • improve vec_hash perf for boolean and utf8 (#6963)
  • don't pack utf8 columns in grouptuples ~5-15% (#6959)
  • don't pack integer keys in determining group tuples ~8-18% (#6956)
  • use fxhash for all integers (#6954)

✨ Enhancements

  • add arr.count_match expression and optimize arr.sum for List<Boolean> (#7023)
  • add sort for struct dtype (#7021)
  • More ergonomic coalesce args (#6989)
  • raise informative error if invalid datetime_format passed to write_csv (#7005)
  • Improve Series & Numpy arithmetic (#6983)
  • More ergonomic agg args (#6982)
  • rename parse_dates => try_parse_dates (#6987)
  • remove packaging and/or distutils dependency with a minimal version parser utility (#6972)
  • More ergonomic over args (#6986)
  • add upper_bound and lower_bound methods to Series (#6990)
  • More ergonomic col args (#6996)
  • More ergonomic sort args (#6896)
  • Make groupby agg shortcuts available in lazy (#6944)
  • add map_dict method for Series (#6946)

🐞 Bug fixes

  • reflect time zone conversion in lazy dataframe schema (#7022)
  • ensure set_sorted never panics (#7013)
  • fix struct append 0 sliced (#7012)
  • fix dtype of diff for uint8 (#7010)
  • fix coalesce supertype (#7000)
  • if given, respect dtype time zone when instantiating pl.lit value (#6999)
  • fix fill_null for categoricals (#6998)
  • dtype of pow function (#6985)
  • fix is_duplicated for utf8 dtype (#6997)
  • Remove check for path to be non-directory if use_pyarrow (#6994)
  • if given, respect dtype timeunit when instantiating pl.lit value (#6991)
  • Add packaging to runtime dependencies (#6962)
  • fix temporal logical types in pivot (#6957)
  • typo in mean unit test - changed median -> mean (#6960)
  • ensure literals are expanded in streaming (#6952)
  • str.contains strict=False took no effect (#6950)

🛠️ Other improvements

  • date-time unit tests refactor (#7002)
  • test lit series arithmetic order (#7015)
  • More test restructure (#6961)
  • Properly deprecate .struct.to_frame (#6958)
  • Properly deprecate GroupBy.agg_list (#6943)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @MatveyF, @alexander-beedie, @jakob-keller, @mslapek, @ozgrakkurt, @papparapa, @ritchie46, @sorhawell, @stinodego, @xhochy and @zundertj

polars - Python Polars 0.16.6

Published by github-actions[bot] over 1 year ago

✨ Enhancements

  • add is_duplicated/is_unique for struct dtype (#6940)
  • add is_between method for Series (#6933)
  • support nested fixedsizebinary conversion (#6923)
  • raise error on invalid aggregation expressions (#6921)
  • provide better errors when failing to read CSV data from buffers that have advanced their read position (#6920)
  • truncate file path on error msg (#6917)
  • Parse JSON data in Utf8 to polars dtype (#6885)
  • More ergonomic groupby args (#6872)

🐞 Bug fixes

  • object to_dict (#6931)
  • respect maintain_order in groupby.apply (#6926)
  • add special fast path for elementwise expression o… (#6924)
  • fix anonymous list builder (#6916)
  • reject multithreading on excessive ',\n' fields (#6906)
  • fix regression with date => object typing in to_pandas method (#6902)
  • dispatch suffix to asof_join by (#6899)
  • improve recursive casting of nested data (#6897)
  • don't fast explode on null introducing take (#6890)
  • prevent external modules found on PYTHONPATH from bleeding into polars venv (#6888)
  • prevent conflation of unit.io tests directory with python io module (#6889)

🛠️ Other improvements

  • Bump ruff version (#6936)
  • add more nested construction tests (#6912)
  • Update Cargo.lock (#6893)
  • unify constructor logic when initialising from a sequence of dicts (#6887)
  • prevent conflation of unit.io tests directory with python io module (#6889)
  • refactor datelike as temporal, and support Time dtype in Series.to_numpy (#6881)
  • Consistently parse column name inputs (#6879)
  • Use Self type more consistently (#6882)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @adamgreg, @alexander-beedie, @josh, @jvdd, @ritchie46 and @stinodego