bytewax

Python Stream Processing

APACHE-2.0 License

bytewax - v0.11.1

Published by whoahbot about 2 years ago

  • KafkaInputConfig now accepts additional properties. See
    bytewax.inputs.KafkaInputConfig.

  • Support for a pre-built Kafka output component. See
    bytewax.outputs.KafkaOutputConfig.

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.11.0...v0.11.1

bytewax - v0.11.0

Published by whoahbot about 2 years ago

  • Added the fold_window operator, which works like reduce_window but
    lets the user build the initial accumulator for each key in a
    builder function.

  • Output is no longer specified using an output_builder for the
    entire dataflow; instead, you supply an "output config" per capture.
    See bytewax.outputs for more info.

  • Input is no longer specified on the execution entry point (like
    run_main); it is instead specified using the Dataflow.input operator.

  • Epochs are no longer user-facing as part of the input system. Any
    custom Python-based input components you write just need to be
    iterators and emit items. Recovery snapshots and backups now happen
    periodically, defaulting to every 10 seconds.

  • Recovery format has been changed for all recovery stores. You cannot
    resume from recovery data written with an older version.

  • The reduce_epoch operator has been replaced with
    reduce_window. It takes a "clock" and a "windower" to define the
    kind of aggregation you want to do.

  • run and run_cluster have been removed and the remaining
    execution entry points moved into bytewax.execution. You can now
    get similar prototyping functionality with
    bytewax.execution.run_main and bytewax.execution.spawn_cluster
    using Testing{Input,Output}Configs.

  • Dataflow has been moved into bytewax.dataflow.Dataflow.
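
The difference between reducing and folding mentioned in the fold_window note can be sketched in plain Python. This is only an illustration of the idea: it ignores clocks and windowers entirely and does not use the Bytewax API. The names reduce_per_key and fold_per_key, and the zero-argument builder, are assumptions made for the example; the real operator signatures may differ.

```python
def reduce_per_key(items, reducer):
    """Combine values per key; the first value seen for a key seeds the accumulator."""
    accumulators = {}
    for key, value in items:
        if key in accumulators:
            accumulators[key] = reducer(accumulators[key], value)
        else:
            accumulators[key] = value
    return accumulators


def fold_per_key(items, builder, folder):
    """Combine values per key; a builder (hypothetical, zero-argument) creates the accumulator."""
    accumulators = {}
    for key, value in items:
        if key not in accumulators:
            accumulators[key] = builder()
        accumulators[key] = folder(accumulators[key], value)
    return accumulators


events = [("a", 1), ("b", 5), ("a", 2)]

# Reduce: the accumulator has the same type as the values.
sums = reduce_per_key(events, lambda acc, value: acc + value)

# Fold: the accumulator can be a different type (here a list of ints).
grouped = fold_per_key(events, list, lambda acc, value: acc + [value])

print(sums)
print(grouped)
```

With a builder, the accumulator no longer has to be the same type as the incoming values, which is the main practical reason to prefer a fold over a reduce.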

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.10.0...v0.11.0

bytewax - v0.10.1

Published by blakestier about 2 years ago

Overview

  • Bugfix: Resolves pickling error. KafkaInputConfig now works with
    spawn_cluster.

What's Changed

  • Pickling logic for auto_commit in KafkaInputConfig by @blakestier in #102

bytewax - v0.10.0

Published by davidselassie over 2 years ago

Overview

  • Input is no longer specified using an input_builder but with an
    input_config, which allows you to use pre-built input
    components. See bytewax.inputs for more info.

  • Preliminary support for a pre-built Kafka input component. See
    bytewax.inputs.KafkaInputConfig.

  • Keys used in the (key, value) 2-tuples to route data for stateful
    operators (like stateful_map and reduce_epoch) must now be
    strings. Because of this bytewax.exhash is no longer necessary and
    has been removed.

  • Recovery format has been changed for all recovery stores. You cannot
    resume from recovery data written with an older version.

  • Slight changes to bytewax.recovery.RecoveryConfig config options
    due to recovery system changes.

  • bytewax.run() and bytewax.run_cluster() no longer take
    recovery_config as they don't support recovery.
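
One way to see why string keys make a separate hashing helper unnecessary: workers in different processes must agree on where a key is routed, and Python's built-in hash() for strings is randomized per process by default, whereas a string has an obvious stable byte encoding. The sketch below is an assumption-laden illustration of stable routing, not how Bytewax routes internally; route_key and the worker count are hypothetical.

```python
import hashlib


def route_key(key: str, worker_count: int) -> int:
    """Map a string key to a worker index, identically in every process.

    Hypothetical helper for illustration; not part of Bytewax.
    """
    if not isinstance(key, str):
        raise TypeError(f"key must be a string, got {type(key).__name__}")
    # sha256 of the UTF-8 bytes is stable across processes and machines,
    # unlike the built-in hash() of a str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


items = [("user-1", 10), ("user-2", 20), ("user-1", 30)]
assignments = [(key, route_key(key, 4)) for key, _value in items]

# Items with the same key always land on the same worker index.
print(assignments)
```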

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.10.0

bytewax - v0.9.0

Published by whoahbot over 2 years ago

Overview

  • Adds bytewax.AdvanceTo and bytewax.Emit to control when processing
    happens.

  • Adds bytewax.run_main() as a way to test input and output builders
    without starting a cluster.

  • Adds a bytewax.testing module with helpers for testing.

  • bytewax.run_cluster() and bytewax.spawn_cluster() now take a
    mp_ctx argument to allow you to change the multiprocessing
    behavior. E.g. from "fork" to "spawn". Defaults now to "spawn".
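
A multiprocessing context of the kind the mp_ctx note refers to can be built with the standard library. The sketch below only creates and inspects the context; handing it to a Bytewax entry point as mp_ctx is left out, since the remaining arguments of those functions are not shown in these notes.

```python
import multiprocessing

# Start methods available on this platform ("spawn" exists everywhere;
# "fork" is only available on some operating systems).
available = multiprocessing.get_all_start_methods()

# A context object bound to one start method. "spawn" starts each worker
# in a fresh interpreter instead of copying the parent process.
spawn_ctx = multiprocessing.get_context("spawn")
start_method = spawn_ctx.get_start_method()

print(available)
print(start_method)
```

Because "spawn" starts fresh interpreters, anything handed to the worker processes has to be picklable.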

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.9.0

bytewax - v0.8.0

Published by davidselassie over 2 years ago

Overview

  • Capture operator no longer takes arguments. Items that flow through
    those points in the dataflow graph will be processed by the output
    handlers set up by each execution entry point. Every dataflow
    requires at least one capture.

  • Executor.build_and_run() is replaced with four entry points for
    specific use cases:

    • run() for execution in the current process. It returns all
      captured items to the calling process for you. Use this for
      prototyping in notebooks and basic tests.

    • run_cluster() for execution on a temporary machine-local cluster
      that Bytewax coordinates for you. It returns all captured items to
      the calling process for you. Use this for notebook analysis where
      you need parallelism.

    • spawn_cluster() for starting a machine-local cluster with more
      control over input and output. Use this for standalone scripts
      where you might need partitioned input and output.

    • cluster_main() for starting a process that will participate in a
      cluster you are coordinating manually. Use this when starting a
      Kubernetes cluster.

  • Adds bytewax.parse module to help with reading command line
    arguments and environment variables for the above entrypoints.

  • Renames bytewax.inp to bytewax.inputs.
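
The kind of work a module like bytewax.parse takes off your hands can be sketched with the standard library alone. Everything here is hypothetical: the function name, the --process-id and --addresses flags, the EXAMPLE_* environment variables, and the ";" separator are assumptions for illustration and are not the names Bytewax uses.

```python
import argparse


def parse_cluster_settings(argv, environ):
    """Read a process ID and peer addresses from CLI args, falling back to env vars.

    Hypothetical helper; flag and variable names are made up for this sketch.
    """
    parser = argparse.ArgumentParser(description="Hypothetical cluster settings")
    parser.add_argument(
        "--process-id",
        type=int,
        default=int(environ.get("EXAMPLE_PROCESS_ID", "0")),
    )
    parser.add_argument(
        "--addresses",
        default=environ.get("EXAMPLE_ADDRESSES", "localhost:2101"),
    )
    args = parser.parse_args(argv)
    # Addresses arrive as one ";"-separated string; split into a list.
    addresses = [addr for addr in args.addresses.split(";") if addr]
    return args.process_id, addresses


proc_id, addresses = parse_cluster_settings(
    ["--process-id", "1"],
    {"EXAMPLE_ADDRESSES": "host-a:2101;host-b:2101"},
)
print(proc_id, addresses)
```

Command-line values take precedence over the environment here, which suits manually coordinated clusters where shared settings are exported once and each process overrides only its own ID.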

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.8.0

bytewax - v0.8.0-beta.2

Published by whoahbot over 2 years ago

Beta release

What's Changed

Updated execution interface

run() now takes a dataflow and some input, runs it synchronously as a single worker in the existing Python thread, and returns the output to that thread. This is what you'd use in tests and simple notebook work.

run_cluster() takes a dataflow and some input, starts a local cluster of processes, runs it, waits for the cluster to finish work, then collects the results and returns the output to the calling thread. This is what you'd use in a notebook if you need parallelism or higher throughput.

cluster_main() starts up a cluster of local processes, coordinates the addresses and process IDs between them, runs a dataflow on it, and waits for it to finish. This has a partitioned "input builder" and an "output builder" (discussed below). This is what you'd use if you want to write a standalone script or example that does some higher-throughput processing.

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.8.0-beta.0

bytewax - v0.7.1

Published by whoahbot over 2 years ago

Updates to build_and_run() to support running in notebook environments.

Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.0...v0.7.1
