Bot releases are visible (Hide)
~bytewax.operators.windowing.EventClock
where in-order butEventClock
by @davidselassie in https://github.com/bytewax/bytewax/pull/470
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.20.0...v0.21.1
Published by whoahbot 5 months ago
Adds a dataflow structure visualizer. Run python -m bytewax.visualize
.
Breaking change The internal format of recovery databases has been
changed from using JsonPickle
to Python's built-in {py:obj}pickle
.
Recovery stores that used the old format will not be usable after
upgrading.
Breaking change The unary
operator and UnaryLogic
have been
renamed to stateful
and StatefulLogic
respectively.
Adds a stateful_batch
operator to allow for lower-level batch
control while managing state.
StatefulLogic.on_notify
, StatefulLogic.on_eof
, and
StatefulLogic.notify_at
are now optional overrides. The defaults
retain the state and emit nothing.
Breaking change Windowing operators have been moved from
bytewax.operators.window
into bytewax.operators.windowing
.
Breaking change ClockConfig
s have had Config
dropped from
their name and are just Clock
s. E.g. If you previously from bytewax.operators.window import SystemClockConfig
now from bytewax.operators.windowing import SystemClock
.
Breaking change WindowConfig
s have been renamed to Windower
s.
E.g. If you previously from bytewax.operators.window import SessionWindow
now from bytewax.operators.windowing import SessionWindower
.
Breaking change All windowing operators now return a set of
streams {py:obj}~bytewax.operators.windowing.WindowOut
.
{py:obj}~bytewax.operators.windowing.WindowMetadata
now is
branched into its own stream and is no longer part of the single
downstream. All window operator emitted items are labeled with the
unique window ID they came from to facilitate joining the data
later.
Breaking change {py:obj}~bytewax.operators.windowing.fold_window
now requires a merge
argument. This handles whenever the session
windower determines that two windows must be merged because a new
item bridged a gap.
Breaking change The join_named
and join_window_named
operators
have been removed because they did not support returning proper type
information. Use {py:obj}~bytewax.operators.join
or
{py:obj}~bytewax.operators.windowing.join_window
instead, which
have been enhanced to properly type their downstream values.
Breaking change {py:obj}~bytewax.operators.join
and
{py:obj}~bytewax.operators.windowing.join_window
have had their
product
argument replaced with mode
. You now can specify more
nuanced kinds of join modes.
Python interfaces are now provided for custom clocks and windowers.
Subclass {py:obj}~bytewax.operators.windowing.Clock
(and a
corresponding {py:obj}~bytewax.operators.windowing.ClockLogic
) or
{py:obj}~bytewax.operators.windowing.Windower
(and a corresponding
{py:obj}~bytewax.operators.windowing.WindowerLogic
) to define your
own senses of time and window definitions.
Adds a {py:obj}~bytewax.operators.windowing.window
operator to
allow you to write more flexible custom windowing operators.
Session windows now work correctly with out-of-order data and joins.
All windowing operators now process items in timestamp order. The
most visible change that this results in is that the
{py:obj}~bytewax.operators.windowing.collect_window
operator now
emits collections with values in timestamp order.
Adds a {py:obj}~bytewax.operators.filter_map_value
operator.
Adds a {py:obj}~bytewax.operators.enrich_cached
operator for
easier joining with an external data source.
Adds a {py:obj}~bytewax.operators.key_rm
convenience operator to
remove keys from a {py:obj}~bytewax.operators.KeyedStream
.
enrich_cached
operator by @davidselassie in https://github.com/bytewax/bytewax/pull/449
operators.window
to operators.windowing
by @davidselassie in https://github.com/bytewax/bytewax/pull/453
join
and join_window
operators now have modes by @davidselassie in https://github.com/bytewax/bytewax/pull/462
WindowMetadata
by @davidselassie in https://github.com/bytewax/bytewax/pull/469
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.19.0...v0.20.0
Published by whoahbot 7 months ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.19.0...v0.19.1
Published by whoahbot 7 months ago
Multiple operators have been reworked to avoid taking and releasing
Python's global interpreter lock while iterating over multiple items.
Windowing operators, stateful operators and operators like branch
will see significant performance improvements.
Thanks to @damiondoesthings for helping us track this down!
Breaking change FixedPartitionedSource.build_part
,
DynamicSource.build
, FixedPartitionedSink.build_part
and DynamicSink.build
now take an additional step_id
argument. This argument can be used when
labeling custom Python metrics.
Custom Python metrics can now be collected using the prometheus-client
library.
Breaking change The schema registry interface has been removed.
You can still use schema registries, but you need to instantiate
the (de)serializers on your own. This allows for more flexibility.
See the confluent_serde
and redpanda_serde
examples for how
to use the new interface.
Fixes bug where items would be incorrectly marked as late in sliding
and tumbling windows in cases where the timestamps are very far from
the align_to
parameter of the windower.
Adds stateful_flat_map
operator.
Breaking change Removes builder
argument from stateful_map
.
Instead, the initial state value is always None
and you can call
your previous builder by hand in the mapper
.
Breaking change Improves performance by removing the now: datetime
argument from FixedPartitionedSource.build_part
,
DynamicSource.build
, and UnaryLogic.on_item
. If you need the
current time, use:
from datetime import datetime, timezone
now = datetime.now(timezone.utc)
sched: datetime
argument from StatefulSourcePartition.next_batch
,StatelessSourcePartition.next_batch
, UnaryLogic.on_notify
. You{Stateful,Stateless}SourcePartition.next_awake
orUnaryLogic.notify_at
.collect
and branch
operator test file names by @davidselassie in https://github.com/bytewax/bytewax/pull/385
stateful_map
builder
function and adds stateful_flat_map
by @davidselassie in https://github.com/bytewax/bytewax/pull/387
now
and sched
arguments in input partitions and unary logic by @davidselassie in https://github.com/bytewax/bytewax/pull/391
time_for
twice by @whoahbot in https://github.com/bytewax/bytewax/pull/414
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.18.2...v0.19.0
Published by whoahbot 8 months ago
next_awake
is set far in the future.Full Changelog: https://github.com/bytewax/bytewax/compare/v0.18.1...v0.18.2
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.18.1...v0.18.2
Published by whoahbot 9 months ago
KafkaSource
from 1 to 1000 to match the Kafka input operator.count_window
operator: https://github.com/bytewax/bytewax/issues/364.reduce_final
by @davidselassie in https://github.com/bytewax/bytewax/pull/361
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.18.0...v0.18.1
Published by whoahbot 10 months ago
Support for schema registries, through bytewax.connectors.kafka.registry.RedpandaSchemaRegistry
and bytewax.connectors.kafka.registry.ConfluentSchemaRegistry
.
Custom Kafka operators in bytewax.connectors.kafka.operators
:
input
, output
, deserialize_key
, deserialize_value
, deserialize
,
serialize_key
, serialize_value
and serialize
.
Breaking change KafkaSource
now emits a special KafkaSourceMessage
to allow access to all data on consumed messages. KafkaSink
now consumes KafkaSinkMessage
to allow setting additional fields on produced messages.
Non-linear dataflows are now possible. Each operator method returns
a handle to the Stream
s it produces; add further steps via calling
operator functions on those returned handles, not the root
Dataflow
. See the migration guide for more info.
Auto-complete and type hinting on operators, inputs, outputs,
streams, and logic functions now works.
A ton of new operators: collect_final
, count_final
,
count_window
, flatten
, inspect_debug
, join
, join_named
,
max_final
, max_window
, merge
, min_final
, min_window
,
key_on
, key_assert
, key_split
, merge
, unary
. Documentation
for all operators are in bytewax.operators
now.
New operators can be added in Python, made by grouping existing
operators. See bytewax.dataflow
module docstring for more info.
Breaking change Operators are now stand-alone functions; import bytewax.operators as op
and use e.g. op.map("step_id", upstream, lambda x: x + 1)
.
Breaking change All operators must take a step_id
argument now.
Breaking change fold
and reduce
operators have been renamed to
fold_final
and reduce_final
. They now only emit on EOF and are
only for use in batch contexts.
Breaking change batch
operator renamed to collect
, so as to
not be confused with runtime batching. Behavior is unchanged.
Breaking change output
operator does not forward downstream its
items. Add operators on the upstream handle instead.
next_batch
on input partitions can now return any Iterable
, not
just a List
.
inspect
operator now has a default inspector that prints out items
with the step ID.
collect_window
operator now can collect into set
s and dict
s.
Adds a get_fs_id
argument to {Dir,File}Source
to allow handling
non-identical files per worker.
Adds a TestingSource.EOF
and TestingSource.ABORT
sentinel values
you can use to test recovery.
Breaking change Adds a datetime
argument to
FixedPartitionSource.build_part
, DynamicSource.build_part
,
StatefulSourcePartition.next_batch
, and
StatelessSourcePartition.next_batch
. You can now use this to
update your next_awake
time easily.
Breaking change Window operators now emit WindowMetadata
objects
downstream. These objects can be used to introspect the open_time
and close_time of windows. This changes the output type of windowing
operators from: (key, values)
to (key, (metadata, values))
.
Breaking change IO classes and connectors have been renamed to
better reflect their semantics and match up with documentation.
Moves the ability to start multiple Python processes with the
-p
or --processes
to the bytewax.testing
module.
Breaking change SimplePollingSource
moved from
bytewax.connectors.periodic
to bytewax.inputs
since it is an
input helper.
SimplePollingSource
's align_to
argument now works.
now
argument to build_part
and next_batch
by @davidselassie in https://github.com/bytewax/bytewax/pull/316
TestingSource.{ABORT, EOF}
by @davidselassie in https://github.com/bytewax/bytewax/pull/317
get_fs_id
argument to {Dir,File}Source
by @davidselassie in https://github.com/bytewax/bytewax/pull/320
SimplePollingSource
by @davidselassie in https://github.com/bytewax/bytewax/pull/324
key_split
operator; fixes type annotations by @davidselassie in https://github.com/bytewax/bytewax/pull/331
Stream
arguments to operators by @davidselassie in https://github.com/bytewax/bytewax/pull/334
Optional
and more complex types in operator signatures by @davidselassie in https://github.com/bytewax/bytewax/pull/338
flat_map_batch
operator by @davidselassie in https://github.com/bytewax/bytewax/pull/341
branch
operator so if it gets a TypeGuard
, the output streams are typed correctly by @davidselassie in https://github.com/bytewax/bytewax/pull/342
rusqlite
deps by @davidselassie in https://github.com/bytewax/bytewax/pull/345
ruff format
instead of black
by @davidselassie in https://github.com/bytewax/bytewax/pull/346
batch
operator to collect
by @davidselassie in https://github.com/bytewax/bytewax/pull/351
join_window
fixes by @davidselassie in https://github.com/bytewax/bytewax/pull/344
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.17.1...v0.18.0
Published by whoahbot about 1 year ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.17.1...v0.17.2
Published by whoahbot about 1 year ago
Adds the batch
operator to Dataflows. Calling Dataflow.batch
will batch incoming items until either a batch size has been reached
or a timeout has passed.
Adds the SimplePollingInput
source. Subclass this input source to
periodically source new input for a dataflow.
Re-adds GLIBC 2.27 builds to support older linux distributions.
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.17.0...v0.17.1
Published by whoahbot about 1 year ago
Breaking change Recovery system re-worked. Kafka-based recovery
removed. SQLite recovery file format changed; existing recovery DB
files can not be used. See the module docstring for
bytewax.recovery
for how to use the new recovery system.
Dataflow execution supports rescaling over resumes. You can now
change the number of workers and still get proper execution and
recovery.
epoch-interval
has been renamed to snapshot-interval
The list-parts
method of PartitionedInput
has been changed to
return a List[str]
and should only reflect the available
inputs that a given worker has access to. You no longer need
to return the complete set of partitions for all workers.
The next
method of StatefulSource
and StatelessSource
has
been changed to next_batch
and should return a List
of elements,
or the empty list if there are no elements to return.
Added new cli parameter backup-interval
, to configure the length of
time to wait before "garbage collecting" older recovery snapshots.
Added next_awake
to input classes, which can be used to schedule
when the next call to next_batch
should occur. Use next_awake
instead of time.sleep
.
Added bytewax.inputs.batcher_async
to bridge async Python libraries
in Bytewax input sources.
Added support for linux/aarch64 and linux/armv7 platforms.
KafkaRecoveryConfig
has been removed as a recovery store.ymd
deprecation warnings. by @whoahbot in https://github.com/bytewax/bytewax/pull/262
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.16.2...v0.17.0
Published by whoahbot over 1 year ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.16.1...v0.16.2
Published by whoahbot over 1 year ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.16.0...v0.16.1
Published by whoahbot over 1 year ago
Breaking change Reworked the execution model. run_main
and cluster_main
have been moved to bytewax.testing
as they are only supposed to be used
when testing or prototyping.
Production dataflows should be ran by calling the bytewax.run
module with python -m bytewax.run <dataflow-path>:<dataflow-name>
.
See python -m bytewax.run -h
for all the possible options.
The functionality offered by spawn_cluster
are now only offered by the
bytewax.run
script, so spawn_cluster
was removed.
Breaking change {Sliding,Tumbling}Window.start_at
has been
renamed to align_to
and both now require that argument. It's not
possible to recover windowing operators without it.
Fixes bugs with windows not closing properly.
Fixes an issue with SQLite-based recovery. Previously you'd always
get an "interleaved executions" panic whenever you resumed a cluster
after the first time.
Add SessionWindow
for windowing operators.
Add SlidingWindow
for windowing operators.
Breaking change Rename TumblingWindowConfig
to TumblingWindow
Add filter_map
operator.
Breaking change New partition-based input and output API. This
removes ManualInputConfig
and ManualOutputConfig
. See
bytewax.inputs
and bytewax.outputs
for more info.
Breaking change Dataflow.capture
operator is renamed to
Dataflow.output
.
Breaking change KafkaInputConfig
and KafkaOutputConfig
have
been moved to bytewax.connectors.kafka.KafkaInput
and
bytewax.connectors.kafka.KafkaOutput
.
Deprecation warning The KafkaRecovery
store is being deprecated
in favor of SqliteRecoveryConfig
, and will be removed in a future
release.
add_pymethods
macro everywhere by @Psykopear in https://github.com/bytewax/bytewax/pull/199
filter_map
operator by @Psykopear in https://github.com/bytewax/bytewax/pull/198
hash
for routing by @davidselassie in https://github.com/bytewax/bytewax/pull/227
bytewax.run
by @Psykopear in https://github.com/bytewax/bytewax/pull/232
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.15.0...v0.16.0
Published by whoahbot over 1 year ago
list
serialization. by @whoahbot in https://github.com/bytewax/bytewax/pull/189
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.14.0...v0.15.1
Published by davidselassie over 1 year ago
list
serialization. by @whoahbot in https://github.com/bytewax/bytewax/pull/189
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.14.0...v0.15.0
Published by davidselassie almost 2 years ago
Dataflow continuation now works. If you run a dataflow over a finite
input, all state will be persisted via recovery so if you re-run the
same dataflow pointing at the same input, but with more data
appended at the end, it will correctly continue processing from the
previous end-of-stream.
Fixes issue with multi-worker recovery. Previously resume data was
being routed to the wrong worker so state would be missing.
The above two changes require that the recovery format has been
changed for all recovery stores. You cannot resume from recovery
data written with an older version.
Adds an introspection web server to dataflow workers.
Adds collect_window
operator.
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.13.1...v0.14.0
Published by miccioest almost 2 years ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.13.0...v0.13.1
Published by whoahbot almost 2 years ago
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.12.0...v0.13.0
Published by whoahbot almost 2 years ago
Fixes bug where window is never closed if recovery occurs after last
item but before window close.
Recovery logging is reduced.
Breaking change Recovery format has been changed for all recovery stores.
You cannot resume from recovery data written with an older version.
Adds a DynamoDB
and BigQuery
output connector.
stateful_map
operator. by @whoahbot in https://github.com/bytewax/bytewax/pull/36
sorted_window()
to support items with identical times by @davidselassie in https://github.com/bytewax/bytewax/pull/41
distribute()
helper by @davidselassie in https://github.com/bytewax/bytewax/pull/71
JoinHandle::is_finished
by @davidselassie in https://github.com/bytewax/bytewax/pull/78
recovery_wordcount.py
into examples by @davidselassie in https://github.com/bytewax/bytewax/pull/83
0.10.0
by @davidselassie in https://github.com/bytewax/bytewax/pull/91
downcast
instead of extract
by @whoahbot in https://github.com/bytewax/bytewax/pull/128
TestingClockConfig
by @davidselassie in https://github.com/bytewax/bytewax/pull/139
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.7.1...v0.12.0
Published by whoahbot about 2 years ago
Performance improvements. ✨
Support SASL and SSL for bytewax.inputs.KafkaInputConfig
.
downcast
instead of extract
by @whoahbot in https://github.com/bytewax/bytewax/pull/128
Full Changelog: https://github.com/bytewax/bytewax/compare/v0.11.1...0.11.2