datasketches-cpp

Core C++ Sketch Library

APACHE-2.0 License

Downloads: 37.1K
Stars: 224
Committers: 31


datasketches-cpp - datasketches-cpp-5.1.0 Latest Release

Published by AlexanderSaydakov 3 months ago

  • implemented tdigest
  • added get_serialized_size_bytes() and get_max_serialized_size_bytes() to compact Theta sketch
  • fixed compressed Theta sketch stream serialization
  • added Tuple sketch filter() method
datasketches-cpp - 5.0.2

Published by jmalkin 9 months ago

This is a patch update. The original 5.0.0 release notes are presented next, followed by a cumulative list of patch changes.

This is a major release because the Python part of the library was moved into its own repository, datasketches-python, which may be API-breaking for some users. We also took this opportunity to do some other possibly API-breaking cleanup.

  • moved all Python-related code to new datasketches-python repository
  • finished moving public constants to separate namespaces
  • removed deprecated methods (such as get_quantiles())
  • generalized array_of_doubles sketch as array_tuple_sketch
  • implemented new EB-PPS sketch (exact PPS sampling with bounded sample size)
  • fixed slowness in Theta intersection
  • fixed incompatibility of serialized empty frequent items sketches with Java

The patch release fixes:

  • a bug in KLL that could cause a self-move (undefined behavior) (5.0.1)
  • a bug in EBPPS Sampling's to_string() method that could cause compilation failure for non-string types (5.0.1)
  • use of a method in density sketch that was removed in C++17, breaking forward compatibility (5.0.2)
datasketches-cpp - datasketches-cpp-5.0.1

Published by jmalkin 10 months ago

This is a major release because the Python part of the library was moved into its own repository, datasketches-python, which may be API-breaking for some users. We also took this opportunity to do some other possibly API-breaking cleanup.

  • moved all Python-related code to new datasketches-python repository
  • finished moving public constants to separate namespaces
  • removed deprecated methods (such as get_quantiles())
  • generalized array_of_doubles sketch as array_tuple_sketch
  • implemented new EB-PPS sketch (exact PPS sampling with bounded sample size)
  • fixed slowness in Theta intersection
  • fixed incompatibility of serialized empty frequent items sketches with Java

The patch release fixes:

  • a bug in KLL that could cause a self-move (undefined behavior)
  • a bug in EBPPS Sampling's to_string() method that could cause compilation failure for non-string types
datasketches-cpp - datasketches-cpp-5.0.0

Published by AlexanderSaydakov 11 months ago

  • moved all Python-related code to new datasketches-python repository
  • finished moving public constants to separate namespaces
  • removed deprecated methods (such as get_quantiles())
  • generalized array_of_doubles sketch as array_tuple_sketch
  • implemented new EB-PPS sketch (exact PPS sampling with bounded sample size)
  • fixed slowness in Theta intersection
  • fixed incompatibility of serialized empty frequent items sketches with Java
datasketches-cpp - datasketches-cpp-4.1.0

Published by AlexanderSaydakov over 1 year ago

  • HLL union speed improvement
  • Fixed a bug in theta and tuple union base
  • new density sketch
  • new count min sketch
  • thread local random generator
  • generic quantile sketches in Python (KLL, REQ, classic quantiles)
  • generic frequent items sketch in Python
  • generic tuple sketch in Python
  • added optional compression of serialized theta sketch
  • iterators use new style (no inheritance from std::iterator)
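The last item refers to a general C++ change: std::iterator was deprecated in C++17, so iterator classes now declare the five member type aliases directly. A minimal sketch of that style follows. The class int_buffer_iterator is a hypothetical example and is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical read-only forward iterator over a contiguous int buffer.
// It does not inherit from std::iterator; the member aliases below are
// what std::iterator_traits looks for.
class int_buffer_iterator {
public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int*;
  using reference = const int&;

  explicit int_buffer_iterator(const int* ptr = nullptr): ptr_(ptr) {}

  reference operator*() const {
    assert(ptr_ != nullptr); // dereferencing a default-constructed iterator is a bug
    return *ptr_;
  }
  pointer operator->() const { return ptr_; }

  int_buffer_iterator& operator++() { ++ptr_; return *this; }
  int_buffer_iterator operator++(int) {
    int_buffer_iterator previous(*this);
    ++ptr_;
    return previous;
  }

  bool operator==(const int_buffer_iterator& other) const { return ptr_ == other.ptr_; }
  bool operator!=(const int_buffer_iterator& other) const { return ptr_ != other.ptr_; }

private:
  const int* ptr_;
};
```

Because the aliases are members, standard algorithms such as std::distance can use this iterator without any base class.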
datasketches-cpp - datasketches-cpp-4.0.1

Published by jmalkin over 1 year ago

This is a patch release with only very minor code changes to address several small compiler warnings.

The main difference is that the associated Python wheels distributed as convenience binaries (and not included in git) are now produced for ARM64 architectures, which should provide increased compatibility with several major cloud computing providers.

datasketches-cpp - datasketches-cpp-4.0.0

Published by AlexanderSaydakov almost 2 years ago

This is a major release with some API-breaking changes:

  • Common sorted view used by all quantiles sketches with simultaneous support for both inclusive and exclusive modes
  • The default mode for all methods for querying quantiles sketches was changed from exclusive to inclusive
  • The mode is now a method parameter, not a template parameter
  • Queries of empty quantiles sketches, such as get_rank() and get_quantile(), now throw an exception (previously they returned NaN for floating-point types)
  • SerDe was removed from class templates and added to the relevant method templates (such as serialize and deserialize)
  • Support for comparator instances in quantiles sketches
  • Support for equality operator instance in frequent items sketch
  • Added operator-> to iterators over quantiles sketches
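The difference between the inclusive and exclusive modes can be illustrated on an exact sorted sample instead of a sketch. The function exact_rank below is a hypothetical helper, not part of the library API. It only mirrors the behavior described above, under the assumption that inclusive rank counts items less than or equal to the value and exclusive rank counts items strictly less: the mode is a function parameter that defaults to inclusive, and a query on empty input throws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: exact normalized rank of `value` in a sorted sample.
//   inclusive == true  -> fraction of items <= value
//   inclusive == false -> fraction of items <  value
inline double exact_rank(const std::vector<double>& sorted, double value, bool inclusive = true) {
  if (sorted.empty()) {
    // Mirrors the 4.0.0 behavior for empty sketches: throw instead of returning NaN.
    throw std::runtime_error("operation is undefined for an empty sample");
  }
  assert(std::is_sorted(sorted.begin(), sorted.end()));
  const auto position = inclusive
      ? std::upper_bound(sorted.begin(), sorted.end(), value)  // first item >  value
      : std::lower_bound(sorted.begin(), sorted.end(), value); // first item >= value
  return static_cast<double>(position - sorted.begin()) / static_cast<double>(sorted.size());
}
```

The two modes only differ when the queried value is present in the data, which is why code that relied on the old exclusive default may see different results for repeated values.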
datasketches-cpp - v3.5.1

Published by jmalkin almost 2 years ago

Patch release, no new features:

  • Fix Python wheel build script to produce valid wheels for Apple Silicon Macs
  • Fix a serialization bug for theta and tuple sketches when sketch had no entries but was not empty (e.g. the result of an intersection between disjoint sets)
datasketches-cpp - datasketches-cpp-3.5.0

Published by AlexanderSaydakov over 2 years ago

  • Type converting constructors for KLL and REQ sketches
  • Fixed KLL copy constructor (affects non-arithmetic types)
  • Added internal check in CPC sketch compression to avoid problems with static analysis
datasketches-cpp - v3.4.0

Published by jmalkin over 2 years ago

This release includes the following changes:

  • addition of Quantiles sketch: the algorithm is largely obsolete vs KLL but this provides compatibility for existing sketches
  • support for serde instances in all relevant sketches; class-level templates are now marked deprecated
  • greater API consistency across quantiles, KLL, and REQ
    • all three support a new public get_sorted_view() interface
    • all three support rank and quantile queries with an optional inclusive mode
  • cmake minimum version bump to 3.16
  • Kolmogorov-Smirnov test for KLL and classic Quantiles, also available in python
  • code cleanup and bugfixes
datasketches-cpp - datasketches-cpp-3.3.0

Published by AlexanderSaydakov almost 3 years ago

  • several fixes with respect to allocations using a provided allocator instance
  • fixes and improvements in cmake files for including DataSketches as dependency in other projects
  • Tuple sketch serial version 3 for compatibility with Java
  • support for older serialization versions of Theta sketch
  • added reset() method in Theta and Tuple sketch and union
  • minor changes to some corner cases of Theta and Tuple intersection and a-not-b operations

Known problems:

  • support for older serialization versions of Theta sketch is incomplete: deserialize from bytes does not handle old versions
  • REQ sketch get_PMF() has undefined behavior for empty sketches (can crash). Check is_empty() before calling get_PMF()
datasketches-cpp -

Published by jmalkin about 3 years ago

This version includes the following changes:

  • Fix issue #236, a serialization bug in the KLL sketch
  • Refactored Python to remove pybind11 as a submodule. It is now a dependency only for building the package
  • Updated LICENSE file to reflect how pybind11 is used
  • Added convenience binaries for Python available from https://pypi.org/project/datasketches
datasketches-cpp - datasketches-cpp-3.1.0

Published by AlexanderSaydakov over 3 years ago

  • Kolmogorov-Smirnov test for KLL sketch
  • custom seed support in Theta Jaccard similarity
  • Theta union bug fix
  • added get_max_serialized_size_bytes method for KLL and CPC sketches
  • added wrapped_compact_theta_sketch to avoid some cost of deserialization
  • massive code cleanup to avoid compiler warnings
  • iterator fix in KLL sketch
  • iterator fix in REQ sketch
  • exception safety fix in theta_update_sketch_base
  • misaligned access fix in MurmurHash3
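For context on the misaligned access item: hash functions that consume input in 64-bit blocks often cast a byte pointer to a wider integer pointer, which is undefined behavior when the address is not suitably aligned. A common remedy is to copy each block with std::memcpy. The helper read_block64 below is a hypothetical illustration of that general technique, not the library's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read the index-th 64-bit block from a byte buffer
// without assuming that the buffer is aligned for std::uint64_t.
inline std::uint64_t read_block64(const std::uint8_t* bytes, std::size_t index) {
  assert(bytes != nullptr);
  std::uint64_t block;
  // memcpy is well-defined for any alignment; compilers typically turn it
  // into a single load on platforms that allow unaligned reads.
  std::memcpy(&block, bytes + index * sizeof(block), sizeof(block));
  return block;
}
```

The result is in the host byte order, the same as a direct load would produce, so the change affects only alignment safety and not the computed values.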
datasketches-cpp - datasketches-cpp-3.0.0

Published by AlexanderSaydakov over 3 years ago

  • Introduced the new Relative Error Quantiles sketch
  • Added Tuple sketch and rewrote Theta sketch to share the same base
  • Performance improvement of HLL sketch
  • Removed serialization of Update Theta sketch and Union, and HLL Union
  • Added support for passing instances of allocators
datasketches-cpp - Apache Release 2.1.0-incubating

Published by leerho almost 4 years ago

  • fixed potential crash when querying KLL with complex types
  • added vector_of_kll to Python
  • added help text to all Python methods
datasketches-cpp - Apache Release 2.0.0-incubating

Published by leerho almost 4 years ago

  • header-only library
  • fully allocator-aware
  • exception-safe
  • varopt sampling added
  • API changes for consistency
datasketches-cpp - Apache Release 1.0.0-incubating

Published by AlexanderSaydakov about 5 years ago

The first release.

  • KLL quantiles sketch
  • Frequent items sketch
  • CPC distinct-counting sketch
  • Theta distinct-counting sketch with set operations
  • HLL distinct-counting sketch