Python library for using dplyr-like syntax with pandas and SQL
MIT License
Published by machow about 4 years ago
Thanks to @tmastny for the PR (#248)!
import siuba.experimental.completer
Published by machow over 4 years ago
Published by machow over 4 years ago
See issue #138. This release ensures summarize...
Published by machow over 4 years ago
This is a small release, designed to support the new siuba documentation.
Features
Published by machow over 4 years ago
siuba/series/spec.yml (#211)
Published by machow over 4 years ago
Published by machow over 4 years ago
Published by machow over 4 years ago
Fixes nest raising the error "TypeError: copy() takes no keyword arguments". Nest now uses a more principled approach to splitting a grouped DataFrame into a list of sub-frames! (see #182)
Also fixed the doc build by no longer trying to run notebooks whose names start with draft- (#186).
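To illustrate what "splitting a grouped DataFrame into a list of sub-frames" means, here is a minimal pure-Python sketch that works on a list of row dicts rather than a DataFrame. The helper split_by_group is hypothetical and is not siuba's implementation of nest; the toy rows are for illustration only.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_group(rows: List[Row], key: str) -> List[Tuple[Any, List[Row]]]:
    """Hypothetical helper: split rows into (group value, sub-list) pairs.

    Rows keep their original relative order inside each group; groups are
    sorted by key, like pandas' default groupby(sort=True).
    """
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        # Append each row to the sub-list for its group, creating it if needed.
        groups.setdefault(row[key], []).append(row)
    return sorted(groups.items(), key=lambda item: item[0])


# Toy rows (illustrative only).
rows = [
    {"cyl": 6, "mpg": 21.0},
    {"cyl": 4, "mpg": 22.8},
    {"cyl": 6, "mpg": 21.4},
]
nested = split_by_group(rows, "cyl")
for cyl, sub_rows in nested:
    print(cyl, len(sub_rows))
```

Each entry pairs a group value with its own sub-list, which is the shape a nested "list of sub-frames" column holds.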
Published by machow over 4 years ago
from siuba.siu import symbolic_dispatch
from pandas.core.groupby import SeriesGroupBy, GroupBy
from pandas import Series
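@symbolic_dispatch(cls = Series)
def cummean(x):
    """Return a same-length array, containing the cumulative mean."""
    # Ungrouped case: the expanding mean is the mean of all values so far.
    return x.expanding().mean()

@cummean.register(SeriesGroupBy)
def _cummean_grouped(x) -> SeriesGroupBy:
    grouper = x.grouper
    # Running count of non-missing values within each group.
    n_entries = x.obj.notna().groupby(grouper).cumsum()
    # Running sum within each group, divided by that count.
    res = x.cumsum() / n_entries
    # Re-group the result so later verbs still see grouped data.
    return res.groupby(grouper)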
from siuba import _, mutate
from siuba.data import mtcars
# a pandas DataFrameGroupBy object
g_cyl = mtcars.groupby("cyl")
mutate(g_cyl, cumul_mean = cummean(_.mpg))
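The grouped version divides a running sum by a running count of non-missing values within each group. A pure-Python sketch of that arithmetic follows; grouped_cummean is a hypothetical helper for illustration, not part of siuba, and it assumes missing values arrive as None or NaN.

```python
import math
from typing import Dict, Hashable, List, Optional


def grouped_cummean(values: List[Optional[float]], groups: List[Hashable]) -> List[float]:
    """Cumulative mean within each group, skipping missing values.

    Running sum divided by the running count of non-missing entries in the
    same group. A missing input yields NaN at that position, as pandas'
    grouped cumsum does.
    """
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    out: List[float] = []
    for value, group in zip(values, groups):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            out.append(math.nan)
            continue
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
        out.append(sums[group] / counts[group])
    return out


res = grouped_cummean([1.0, 10.0, None, 3.0, 20.0], ["a", "b", "a", "a", "b"])
print(res)  # [1.0, 10.0, nan, 2.0, 15.0]
```

The missing value in group "a" does not count toward that group's denominator, so the next "a" entry averages only 1.0 and 3.0.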
Published by machow almost 5 years ago
Features
Note that CallTreeLocal has new options, allowing it to look up translations by chained attributes (e.g. look for an entry named "dt.year") and to override custom function calls.
I still need to finish support for user-defined operations and do some light siu refactoring.
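The idea of looking up a translation by its full chain of attributes can be sketched in plain Python. Everything here is hypothetical: the LOCAL table, translate_chain, and the postgres-style EXTRACT output are illustrations, not CallTreeLocal's actual API or siuba's generated SQL.

```python
from typing import Callable, Dict, List

# Hypothetical translator: takes a column expression, returns SQL text.
Translator = Callable[[str], str]

# Hypothetical translation table keyed by dotted attribute chains.
LOCAL: Dict[str, Translator] = {
    "dt.year": lambda col: f"EXTRACT(year FROM {col})",
}


def translate_chain(column: str, attrs: List[str], local: Dict[str, Translator]) -> str:
    """Translate column.<attrs...> by looking up the full dotted name.

    The whole chain (e.g. "dt.year") is matched as a single entry, so the
    intermediate "dt" accessor is consumed rather than translated on its own.
    """
    name = ".".join(attrs)
    if name not in local:
        raise KeyError(f"no translation for {name!r}")
    return local[name](column)


print(translate_chain("created_at", ["dt", "year"], LOCAL))
# EXTRACT(year FROM created_at)
```

Matching on the joined name is what lets a single entry like "dt.year" stand in for the accessor plus the attribute.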
Breaking changes
dt.year will now consume dt (I can't imagine a situation where we'd want to keep it and couldn't handle that in the translator function).
Demo
from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import *
from siuba.data import mtcars
g_cars = mtcars.groupby(['cyl', 'gear'])
fast_mutate(g_cars, _.hp - _.hp.mean())
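The demo subtracts each (cyl, gear) group's mean horsepower from every car in that group. A pure-Python sketch of the same computation on toy values (not mtcars, and not how fast_mutate works internally) is below; demean_by_group is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def demean_by_group(values: List[float], keys: List[Hashable]) -> List[float]:
    """Subtract each group's mean from its members (like hp - hp.mean() per group)."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    # First pass: accumulate per-group sums and counts.
    for value, key in zip(values, keys):
        totals[key] += value
        counts[key] += 1
    means = {key: totals[key] / counts[key] for key in totals}
    # Second pass: subtract the group mean, keeping the original row order.
    return [value - means[key] for value, key in zip(values, keys)]


# Toy values: (cyl, gear) group keys and horsepower.
hp = [100.0, 80.0, 120.0, 150.0]
keys = [(6, 4), (4, 4), (6, 4), (8, 3)]
result = demean_by_group(hp, keys)
print(result)  # [-10.0, 0.0, 10.0, 0.0]
```

Like a grouped mutate, the output has one value per input row rather than one per group.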
semi_join was duplicating rows as standard joins do.
full_join was failing for pandas, since pandas calls it an 'outer' join.
separate(): https://github.com/machow/siuba/pull/119
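A semi join keeps each left row that has at least one match on the right, and keeps it only once, whereas an inner join emits one row per matching pair. (On the pandas side, full_join corresponds to DataFrame.merge with how="outer".) The sketch below contrasts the two on toy row dicts; semi_join and inner_join here are hypothetical pure-Python helpers, not siuba's functions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """One output row per matching (left, right) pair, so matches can duplicate."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[on] == right_row[on]
    ]


def semi_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Keep left rows with at least one match in right, each exactly once."""
    right_keys = {row[on] for row in right}
    return [row for row in left if row[on] in right_keys]


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 1, "y": 10}, {"id": 1, "y": 20}]

print(len(inner_join(left, right, "id")))  # 2: id 1 matches twice
print(semi_join(left, right, "id"))        # [{'id': 1, 'x': 'a'}]
```

Checking membership against a set of right-hand keys, instead of pairing rows, is what prevents the duplication.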
As an experimental feature, I shortened the stacktraces for SQL translator errors! https://github.com/machow/siuba/pull/125
Published by machow about 5 years ago
Small fix: count now works without arguments. This is a very common case against SQL databases, since it tells you how big a table is (how many rows it has).
e.g.
tbl_something >> count()
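On a SQL backend, counting with no grouping columns boils down to a single COUNT(*) over the table. The snippet below runs that kind of query against an in-memory SQLite table with made-up rows, using Python's standard sqlite3 module; it sketches the SQL involved, and the exact query siuba emits may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_something (id INTEGER)")
con.executemany("INSERT INTO tbl_something (id) VALUES (?)", [(1,), (2,), (3,)])

# One aggregate row: the total number of rows in the table.
(n_rows,) = con.execute("SELECT COUNT(*) FROM tbl_something").fetchone()
print(n_rows)  # 3
con.close()
```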
Published by machow about 5 years ago
case_when({True: _.a + 1}) https://github.com/machow/siuba/pull/103
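In case_when, conditions are checked in order and the first one that holds supplies the value, so a True key acts as a catch-all default. A minimal pure-Python sketch of those semantics on row dicts follows; case_when_rows is hypothetical and is not siuba's implementation, and it assumes each value is a function of the row.

```python
from typing import Any, Callable, Dict, List, Union

Row = Dict[str, Any]
Condition = Union[bool, Callable[[Row], bool]]


def case_when_rows(rows: List[Row], cases: Dict[Condition, Callable[[Row], Any]]) -> List[Any]:
    """Hypothetical sketch: per row, return the value of the first matching case.

    A condition of True always matches; rows matching no case get None.
    Dicts preserve insertion order, so cases are tried in the order written.
    """
    out: List[Any] = []
    for row in rows:
        result = None
        for cond, value in cases.items():
            if cond is True or (callable(cond) and cond(row)):
                result = value(row)
                break
        out.append(result)
    return out


rows = [{"a": 1}, {"a": 5}]
cases = {
    (lambda r: r["a"] > 3): (lambda r: r["a"] * 10),
    True: (lambda r: r["a"] + 1),  # catch-all, like {True: _.a + 1}
}
print(case_when_rows(rows, cases))  # [2, 50]
```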
Published by machow about 5 years ago
Published by machow over 5 years ago
Published by machow over 5 years ago
This release implements extensive testing for postgres and sqlite. It also sets up (but skips) pandas unit tests.
It follows from this PR: https://github.com/machow/siuba/pull/36
SQL Improvements: