Published by whoahbot about 2 years ago
KafkaInputConfig now accepts additional properties. See bytewax.inputs.KafkaInputConfig.
Support for a pre-built Kafka output component. See bytewax.outputs.KafkaOutputConfig.
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.11.0...v0.11.1
Published by whoahbot about 2 years ago
Added the fold_window operator, which works like reduce_window but allows the user to build the initial accumulator for each key in a builder function.
Output is no longer specified using an output_builder for the entire dataflow; instead, you supply an "output config" per capture. See bytewax.outputs for more info.
Input is no longer specified on the execution entry point (like run_main); it is instead specified using the Dataflow.input operator.
Epochs are no longer user-facing as part of the input system. Any
custom Python-based input components you write just need to be
iterators and emit items. Recovery snapshots and backups now happen
periodically, defaulting to every 10 seconds.
Recovery format has been changed for all recovery stores. You cannot
resume from recovery data written with an older version.
The reduce_epoch operator has been replaced with reduce_window. It takes a "clock" and a "windower" to define the kind of aggregation you want to do.
run and run_cluster have been removed and the remaining execution entry points moved into bytewax.execution. You can now get similar prototyping functionality with bytewax.execution.run_main and bytewax.execution.spawn_cluster using Testing{Input,Output}Configs.
Dataflow has been moved into bytewax.dataflow.Dataflow.
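To make the difference between reduce_window and fold_window noted above more concrete, here is a plain-Python sketch of the two aggregation styles applied to the items of a single window. It does not use the Bytewax API: the helper names reduce_by_key and fold_by_key are hypothetical, clocks and windowers are left out, and the no-argument builder is an assumption, since these notes do not state what the real builder function receives.

```python
def reduce_by_key(items, reducer):
    """Reduce style: the first value seen for a key becomes the accumulator."""
    acc = {}
    for key, value in items:
        if key in acc:
            acc[key] = reducer(acc[key], value)
        else:
            acc[key] = value
    return acc


def fold_by_key(items, builder, folder):
    """Fold style: a builder creates the initial accumulator for each new key.

    Assumption: the builder takes no arguments (hypothetical signature).
    """
    acc = {}
    for key, value in items:
        if key not in acc:
            acc[key] = builder()
        acc[key] = folder(acc[key], value)
    return acc


items = [("a", 1), ("b", 5), ("a", 2)]

# Reduce: the accumulator has the same type as the values.
sums = reduce_by_key(items, lambda x, y: x + y)

# Fold: the accumulator (a list) can differ in type from the values (ints).
lists = fold_by_key(items, list, lambda acc, v: acc + [v])

print(sums)
print(lists)
```

The fold style is useful when the accumulator is a different type from the incoming values, such as collecting integers into a list or a counter.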
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.10.0...v0.11.0
Published by davidselassie over 2 years ago
Input is no longer specified using an input_builder, but now an input_config which allows you to use pre-built input components. See bytewax.inputs for more info.
Preliminary support for a pre-built Kafka input component. See bytewax.inputs.KafkaInputConfig.
Keys used in the (key, value) 2-tuples to route data for stateful operators (like stateful_map and reduce_epoch) must now be strings. Because of this bytewax.exhash is no longer necessary and has been removed.
Recovery format has been changed for all recovery stores. You cannot
resume from recovery data written with an older version.
Slight changes to bytewax.recovery.RecoveryConfig config options due to recovery system changes.
bytewax.run() and bytewax.run_cluster() no longer take recovery_config as they don't support recovery.
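Because routing keys for stateful operators must now be strings, items keyed by integers or tuples need converting before they reach such an operator. A minimal sketch in plain Python, assuming the conversion is applied to each (key, value) item upstream of the stateful step; the helper name stringify_key is hypothetical and not part of Bytewax.

```python
def stringify_key(item):
    """Convert the key of a (key, value) 2-tuple to a string.

    Hypothetical helper: str() is the simplest choice, but any encoding
    works as long as distinct keys map to distinct strings.
    """
    key, value = item
    return (str(key), value)


events = [(42, "login"), ((1, "eu"), "click")]
keyed = [stringify_key(event) for event in events]
print(keyed)
```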
stateful_map operator. by @whoahbot in https://github.com/bytewax/bytewax/pull/36
sorted_window() to support items with identical times by @davidselassie in https://github.com/bytewax/bytewax/pull/41
distribute() helper by @davidselassie in https://github.com/bytewax/bytewax/pull/71
JoinHandle::is_finished by @davidselassie in https://github.com/bytewax/bytewax/pull/78
recovery_wordcount.py into examples by @davidselassie in https://github.com/bytewax/bytewax/pull/83
0.10.0 by @davidselassie in https://github.com/bytewax/bytewax/pull/91
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.10.0
Published by whoahbot over 2 years ago
Adds bytewax.AdvanceTo and bytewax.Emit to control when processing happens.
Adds bytewax.run_main() as a way to test input and output builders without starting a cluster.
Adds a bytewax.testing module with helpers for testing.
bytewax.run_cluster() and bytewax.spawn_cluster() now take a mp_ctx argument to allow you to change the multiprocessing behavior, e.g. from "fork" to "spawn". The default is now "spawn".
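The "fork" and "spawn" names above are start methods from Python's standard multiprocessing module. The sketch below shows how a context for a given start method is obtained from the standard library. That such a context object is what gets passed as mp_ctx is an assumption here, so that call appears only as a comment.

```python
import multiprocessing

# "spawn" starts each worker in a fresh Python interpreter; "fork" copies
# the parent process instead and is only available on Unix-like systems.
spawn_ctx = multiprocessing.get_context("spawn")
print(spawn_ctx.get_start_method())

# Assumption (not confirmed by these notes): the context would then be
# handed to an entry point roughly like
#     bytewax.spawn_cluster(..., mp_ctx=spawn_ctx)
# with the remaining arguments as required by that entry point.
```

With "spawn", anything the worker processes need must be importable or picklable, because no memory is inherited from the parent process.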
stateful_map operator. by @whoahbot in https://github.com/bytewax/bytewax/pull/36
sorted_window() to support items with identical times by @davidselassie in https://github.com/bytewax/bytewax/pull/41
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.9.0
Published by davidselassie over 2 years ago
Capture operator no longer takes arguments. Items that flow through those points in the dataflow graph will be processed by the output handlers set up by each execution entry point. Every dataflow requires at least one capture.
Executor.build_and_run() is replaced with four entry points for specific use cases:
run() for execution in the current process. It returns all captured items to the calling process for you. Use this for prototyping in notebooks and basic tests.
run_cluster() for execution on a temporary machine-local cluster that Bytewax coordinates for you. It returns all captured items to the calling process for you. Use this for notebook analysis where you need parallelism.
spawn_cluster() for starting a machine-local cluster with more control over input and output. Use this for standalone scripts where you might need partitioned input and output.
cluster_main() for starting a process that will participate in a cluster you are coordinating manually. Use this when starting a Kubernetes cluster.
Adds bytewax.parse module to help with reading command line arguments and environment variables for the above entry points.
Renames bytewax.inp to bytewax.inputs.
stateful_map operator. by @whoahbot in https://github.com/bytewax/bytewax/pull/36
sorted_window() to support items with identical times by @davidselassie in https://github.com/bytewax/bytewax/pull/41
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.8.0
Published by whoahbot over 2 years ago
run() now takes a dataflow and some input, runs it synchronously as a single worker in the existing Python thread, and returns the output to that thread. This is what you'd use in tests and simple notebook work.
run_cluster() takes a dataflow and some input, starts a local cluster of processes, runs it, waits for the cluster to finish work, then collects the results and returns the output to that thread. This is what you'd use in a notebook if you need parallelism or higher throughput.
cluster_main() starts up a cluster of local processes, coordinates the addresses and process IDs between them, runs a dataflow on it, and waits for it to finish. This has a partitioned "input builder" and an "output builder" (discussed below). This is what you'd use if you want to write a standalone script or example that does some higher throughput processing.
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.8.0-beta.0
Published by whoahbot over 2 years ago
Updates to build_and_run() to support running in notebook environments.
build_and_run. by @whoahbot in https://github.com/bytewax/bytewax/pull/10
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.0...v0.7.1