bytewax

Python Stream Processing

APACHE-2.0 License

Downloads
15.3K
Stars
1.3K
Committers
23

bytewax - v0.20.1 Latest Release

Published by davidselassie 5 months ago

Overview

  • Fixes a bug when using
    bytewax.operators.windowing.EventClock where in-order but
    "slow" data results in watermark assertion errors.

What's Changed

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.20.0...v0.20.1
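
To make the terms in the v0.20.1 fix concrete, here is a toy model of an event-time watermark in plain Python. It is a sketch under stated assumptions, not Bytewax's implementation: the ToyEventClock class, its wait parameter, and the rule that the watermark advances with system time and is only rebased when a new item would move it forward are all hypothetical simplifications. The point it illustrates is that an in-order item whose timestamp advances more slowly than system time must not pull the watermark backwards.

```python
from datetime import datetime, timedelta, timezone


class ToyEventClock:
    """Toy watermark tracker. Hypothetical; not Bytewax's EventClock."""

    def __init__(self, wait: timedelta):
        self.wait = wait
        self._base_watermark = None  # watermark value when it was last rebased
        self._base_system = None     # system time at that moment

    def watermark(self, now: datetime):
        """Current watermark: the base plus system time elapsed since rebasing."""
        if self._base_watermark is None:
            return None
        return self._base_watermark + (now - self._base_system)

    def observe(self, event_ts: datetime, now: datetime):
        """Record an item and return the watermark after seeing it."""
        candidate = event_ts - self.wait
        current = self.watermark(now)
        # Rebase only if the new item moves the watermark forward, so
        # "slow" in-order data can never make it go backwards.
        if current is None or candidate > current:
            self._base_watermark = candidate
            self._base_system = now
        return self.watermark(now)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
clock = ToyEventClock(wait=timedelta(seconds=5))
first = clock.observe(t0, now=t0)
# In-order but "slow": event time moved 1s while system time moved 10s.
slow = clock.observe(t0 + timedelta(seconds=1), now=t0 + timedelta(seconds=10))
```

In this toy model, the second call keeps the watermark that has been advancing with system time instead of adopting the smaller candidate derived from the slow item.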

bytewax - v0.15.0

Published by davidselassie over 1 year ago

Overview

  • Fixes issue with multi-worker recovery. If the cluster crashed
    before all workers had completed their first epoch, the cluster
    would resume from the incorrect position. This requires a change to
    the recovery store. You cannot resume from recovery data written
    with an older version.
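
The failure mode described above can be illustrated with a small sketch of how a cluster might pick a resume position. This is an assumption for illustration only, not Bytewax's recovery logic: the safe_resume_epoch function and the shape of its input (a mapping from worker index to that worker's last completed epoch, or None if it has completed none) are hypothetical.

```python
from typing import Dict, Optional


def safe_resume_epoch(last_completed: Dict[int, Optional[int]]) -> Optional[int]:
    """Return the epoch to resume from, or None to start from the beginning.

    Hypothetical helper: a cluster can only resume past an epoch that
    every worker has completed.
    """
    epochs = list(last_completed.values())
    # If any worker never completed an epoch, nothing is safe to skip.
    if not epochs or any(epoch is None for epoch in epochs):
        return None
    # Otherwise resume just after the oldest epoch all workers finished.
    return min(epochs) + 1


all_done = safe_resume_epoch({0: 3, 1: 5})
crashed_early = safe_resume_epoch({0: 3, 1: None})
```

The second call models the case in the release note: one worker crashed before finishing its first epoch, so the only correct choice in this sketch is to start over.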

What's Changed

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.14.0...v0.15.0

bytewax - v0.14.0

Published by davidselassie almost 2 years ago

Overview

  • Dataflow continuation now works. If you run a dataflow over a finite
    input, all state will be persisted via recovery so if you re-run the
    same dataflow pointing at the same input, but with more data
    appended at the end, it will correctly continue processing from the
    previous end-of-stream.

  • Fixes issue with multi-worker recovery. Previously resume data was
    being routed to the wrong worker so state would be missing.

  • The above two changes required changing the recovery format for
    all recovery stores. You cannot resume from recovery data written
    with an older version.

  • Adds an introspection web server to dataflow workers.

  • Adds collect_window operator.
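
The idea behind dataflow continuation can be sketched without Bytewax at all. The example below is a minimal plain-Python illustration, assuming a hypothetical run_sum function and a dict standing in for a recovery store; it does not use or mirror the Bytewax recovery API. It shows state and input position being persisted at the end of a finite input, so that a re-run over the same input with more data appended picks up from the previous end-of-stream.

```python
def run_sum(lines, store):
    """Sum integer lines, resuming from the position saved in `store`.

    `store` is a plain dict standing in for a recovery store (hypothetical).
    """
    offset = store.get("offset", 0)  # how many lines were already processed
    total = store.get("total", 0)    # running state carried across runs
    for line in lines[offset:]:
        total += int(line)
        offset += 1
    # Persist state and position so a later run can continue from here.
    store["offset"] = offset
    store["total"] = total
    return total


store = {}
first_run = run_sum(["1", "2"], store)
# Same input with one more line appended at the end.
second_run = run_sum(["1", "2", "3"], store)
```

On the second run only the appended line is processed, because the saved offset skips everything seen before.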

What's Changed

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.13.1...v0.14.0

bytewax - v0.13.1

Published by miccioest almost 2 years ago

Overview

  • Added Google Colab support.

What's Changed

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.13.0...v0.13.1

bytewax - v0.10.1

Published by blakestier about 2 years ago

Overview

  • Bugfix: Resolves pickling error. KafkaInputConfig now works with
    spawn_cluster.

What's Changed

  • Pickling logic for auto_commit in KafkaInputConfig by @blakestier in #102

bytewax - v0.10.0

Published by davidselassie over 2 years ago

Overview

  • Input is no longer specified using an input_builder. It is now
    specified using an input_config, which allows you to use pre-built
    input components. See bytewax.inputs for more info.

  • Preliminary support for a pre-built Kafka input component. See
    bytewax.inputs.KafkaInputConfig.

  • Keys used in the (key, value) 2-tuples to route data for stateful
    operators (like stateful_map and reduce_epoch) must now be
    strings. Because of this bytewax.exhash is no longer necessary and
    has been removed.

  • Recovery format has been changed for all recovery stores. You cannot
    resume from recovery data written with an older version.

  • Slight changes to bytewax.recovery.RecoveryConfig config options
    due to recovery system changes.

  • bytewax.run() and bytewax.run_cluster() no longer take
    recovery_config as they don't support recovery.
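
As an illustration of the string-key requirement above, the sketch below builds (key, value) 2-tuples with the key converted to a string. The key_by_user function and the event shape are hypothetical; in a dataflow, a function like this would run in a step placed before the stateful operator.

```python
def key_by_user(event):
    """Return a (key, value) 2-tuple whose key is a string.

    Hypothetical helper: the user_id field is an assumed event shape.
    """
    return (str(event["user_id"]), event)


events = [
    {"user_id": 42, "amount": 3},
    {"user_id": 7, "amount": 5},
]
keyed = [key_by_user(event) for event in events]
```

Converting with str() at the point where the key is produced keeps non-string identifiers, such as integer IDs, from reaching the stateful operator.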

What's Changed

New Contributors

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.10.0

bytewax - v0.8.0

Published by davidselassie over 2 years ago

Overview

  • Capture operator no longer takes arguments. Items that flow through
    those points in the dataflow graph will be processed by the output
    handlers set up by each execution entry point. Every dataflow
    requires at least one capture.

  • Executor.build_and_run() is replaced with four entry points for
    specific use cases:

    • run() for execution in the current process. It returns all
      captured items to the calling process for you. Use this for
      prototyping in notebooks and basic tests.

    • run_cluster() for execution on a temporary machine-local cluster
      that Bytewax coordinates for you. It returns all captured items to
      the calling process for you. Use this for notebook analysis where
      you need parallelism.

    • spawn_cluster() for starting a machine-local cluster with more
      control over input and output. Use this for standalone scripts
      where you might need partitioned input and output.

    • cluster_main() for starting a process that will participate in a
      cluster you are coordinating manually. Use this when starting a
      Kubernetes cluster.

  • Adds bytewax.parse module to help with reading command line
    arguments and environment variables for the above entry points.

  • Renames bytewax.inp to bytewax.inputs.

What's Changed

New Contributors

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.8.0

Package Rankings
Top 6.75% on Proxy.golang.org
Top 3.43% on Pypi.org