koalas

Koalas: pandas API on Apache Spark

APACHE-2.0 License

Downloads: 1.4M
Stars: 3.3K
Committers: 53

koalas - Version 0.5.0

Published by ueshin over 5 years ago

We refined the package management and published Koalas to conda-forge as well as PyPI. Koalas can now be installed with the conda package manager:

conda install koalas -c conda-forge

We also added the following features:

koalas:

  • concat (#348)

koalas.DataFrame:

  • astype (#349)
  • to_records (#298)
  • size (#356)
  • iloc (#364)
  • describe (#375)

koalas.Series:

  • to_json (#358)
  • to_csv (#358)
  • dtypes (#355)
  • size (#356)
  • to_excel (#361)
  • iloc (#364)
  • all (#359)
  • any (#359)
  • dt (#295, #372)
  • describe (#375)

Along with the following improvements:

  • Explicitly marked functions that are deprecated in pandas, which we won't support without a special reason. (#342)
  • Introduced Index/MultiIndex corresponding to pandas', instead of reusing Series. (#341)

koalas - Version 0.4.0

Published by rxin over 5 years ago

We rapidly improved the Koalas documentation and added new functionality in the past week. As of this release, all functions are documented. We also added the following features:

koalas:

  • range (#254) - for generating a distributed sequence of data
  • sql (#256) - for running SQL queries

koalas.DataFrame:

  • merge (#264)
  • to_json (#238)
  • to_csv (#239)
  • to_excel (#288)
  • to_clipboard (#257)
  • clip (#297)
  • to_latex (#297)

koalas.Series:

  • unique (#249)
  • to_clipboard (#257)
  • to_latex (#297)
  • clip (#297)
  • fillna (#317)
  • is_unique (#325)
  • sample (#327)

Along with the following improvements:

  • Design Principles and Contribution Guide (#246, #255)
  • DataFrame.drop now supports columns parameter (#253)
  • repr and repr_html improvements (#258) - only the top 1000 are shown when the number of rows in a DataFrame or values in a Series exceeds 1000.

koalas - Version 0.3.0

Published by ueshin over 5 years ago

We fixed a critical bug for Python 3.5 that was introduced in v0.2.0. (#241)

We have also added the following features:

koalas.DataFrame:

  • isin
  • to_dict

koalas.Series:

  • isin
  • to_dict

and improvements:

koalas.Series:

  • __add__ and __radd__ now support string concatenation
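
As a rough illustration of why both operators matter for string concatenation, here is a minimal pure-Python sketch. TinySeries is a hypothetical stand-in and not the Koalas implementation; it only shows that __add__ covers series + "text" while __radd__ covers "text" + series.

```python
class TinySeries:
    """Hypothetical stand-in for a string Series; not Koalas code."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Handles `series + other`: append `other` to every element.
        return TinySeries(v + other for v in self.values)

    def __radd__(self, other):
        # Handles `other + series`: Python tries this reflected method
        # when the left operand (a plain str) cannot do the addition.
        return TinySeries(other + v for v in self.values)


names = TinySeries(["a", "b"])
suffixed = names + "_x"   # dispatches to __add__
prefixed = "x_" + names   # dispatches to __radd__
```

Without __radd__, the second expression would raise a TypeError, because a plain str does not know how to add a TinySeries to itself.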

koalas.groupby.GroupBy:

  • agg() now preserves the group keys as indices

and a lot of code and documentation cleanups.

koalas - Version 0.2.0

Published by rxin over 5 years ago

We have implemented a lot of major functionality in the past week. Here's a summary of what's new in release v0.2.0.

spark.DataFrame:

  • to_koalas is monkey patched into Spark's DataFrame API when the koalas package is imported
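
For readers unfamiliar with the term, monkey patching here means attaching a new method to an existing class at import time. The sketch below shows the general pattern in plain Python; SparkLikeDataFrame, KoalasLikeDataFrame, and _to_koalas are hypothetical names used only for illustration and are not the actual Koalas or Spark code.

```python
class SparkLikeDataFrame:
    """Hypothetical stand-in for Spark's DataFrame class."""

    def __init__(self, rows):
        self.rows = rows


class KoalasLikeDataFrame:
    """Hypothetical wrapper type returned by the patched method."""

    def __init__(self, spark_df):
        self.spark_df = spark_df


def _to_koalas(self):
    # Wrap the existing object rather than copying its data.
    return KoalasLikeDataFrame(self)


# In a real package this assignment would sit at module level, so it
# runs as a side effect of importing the package.
SparkLikeDataFrame.to_koalas = _to_koalas

sdf = SparkLikeDataFrame([(1, "a"), (2, "b")])
kdf = sdf.to_koalas()  # the method is now available on every instance
```

Because the method is attached to the class itself, every instance gains it, including instances created before the patch ran.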

koalas.DataFrame:

  • count
  • corr
  • dtypes
  • groupby
  • sort_values now supports ascending, na_position, and inplace parameters
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • Allow direct literal assignment to create a new column
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the DataFrame, similar to pandas

koalas.Series:

  • alias (as an alias for rename function)
  • count
  • groupby
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • fillna
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the Series, similar to pandas

Significantly improved documentation of the project.

Last but not least, we have done some major refactoring of the codebase and its infrastructure to make it more amenable to changes in the future, e.g.

  • Now koalas.DataFrame wraps around a Spark DataFrame, rather than directly monkey patching all methods.
  • Doctests are enabled and can be run directly in PyCharm
  • Mypy type hint linter is added
  • Switched from nose to pytest for test infrastructure.
  • Introduced utility methods to support older versions of pandas. (#210)
  • Code coverage report

koalas - Version 0.1.0

Published by rxin over 5 years ago

We rewrote the internals of Koalas to make it more extensible for upcoming features. We also laid down the foundation for API reference docs in this release.

koalas - Version 0.0.6

Published by thunterdb over 5 years ago

This version significantly expands the number of functions available. It is still meant to be a technology preview, and users are encouraged to report any issues they encounter with their current pandas code.

Noteworthy features:

  • indexing is now supported
  • slicing and accessing columns is much improved
  • most of the methods are accessible as stubs
  • support for N/A (fillna, dropna, etc.) has been added

We thank all the contributors who have contributed to this release.

koalas - Version 0.0.5

Published by thunterdb over 5 years ago

This is the initial release outside Databricks.

This release is meant to be a technology preview. See the README.md file for more information.
