stream-reactor

A collection of open source Kafka connectors, licensed under Apache 2.0 and maintained by Lenses.io.

APACHE-2.0 License

stream-reactor - Stream Reactor 7.4.5

Published by github-actions[bot] about 1 month ago

stream-reactor - Stream Reactor 7.4.4

Published by github-actions[bot] about 2 months ago

Azure Datalake & GCP Storage

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

  • Fixes a gap in the Avro/Parquet storage where enums were converted from Connect enums to string.
  • Adds support for an explicit "no partition" specification in KCQL, so that topics can be written to the bucket and prefix without partitioning the data.
    • Syntax example: INSERT INTO foo SELECT * FROM bar NOPARTITION
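
As a rough illustration of how the NOPARTITION keyword slots into a statement, the following sketch assembles the KCQL string from its parts. The helper name `build_kcql` and its parameters are hypothetical and not part of Stream Reactor; only the statement shape comes from the syntax example above.

```python
def build_kcql(target: str, topic: str, no_partition: bool = False) -> str:
    """Assemble a minimal KCQL statement for a data lake sink.

    `target` is the bucket (optionally with a prefix) and `topic` is the
    Kafka topic. Both are plain placeholders in this sketch.
    """
    statement = f"INSERT INTO {target} SELECT * FROM {topic}"
    if no_partition:
        # NOPARTITION asks the sink to write under the bucket and prefix
        # without partitioning the data.
        statement += " NOPARTITION"
    return statement


print(build_kcql("foo", "bar", no_partition=True))
# prints: INSERT INTO foo SELECT * FROM bar NOPARTITION
```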

stream-reactor - Stream Reactor 7.4.3

Published by github-actions[bot] 2 months ago

All Connectors

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

This release introduces a new configuration option for three Kafka Connect sink connectors (S3, Data Lake, and GCP Storage) that lets users disable exactly-once semantics. Exactly-once is enabled by default; when it is disabled, the connector relies on Kafka Connect's native at-least-once offset management.

Configuration Parameters:

S3 Sink Connector: connect.s3.exactly.once.enable
Data Lake Sink Connector: connect.datalake.exactly.once.enable
GCP Storage Sink Connector: connect.gcpstorage.exactly.once.enable

Default Value: true
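
A minimal sketch of how one of these flags might be switched off when preparing a connector configuration. The property names are taken from the list above; the helper `with_at_least_once`, the connector name, and the topic are hypothetical placeholders, and a real configuration needs further settings (connector class, KCQL, credentials) that are omitted here.

```python
import json

# Property names as listed in the release notes, keyed by sink type.
EXACTLY_ONCE_PROPERTY = {
    "s3": "connect.s3.exactly.once.enable",
    "datalake": "connect.datalake.exactly.once.enable",
    "gcpstorage": "connect.gcpstorage.exactly.once.enable",
}


def with_at_least_once(sink: str, base_config: dict) -> dict:
    """Return a copy of base_config with exactly-once disabled for the sink."""
    config = dict(base_config)
    # Connector configuration values are passed as strings, hence "false".
    config[EXACTLY_ONCE_PROPERTY[sink]] = "false"
    return config


# Hypothetical base configuration; name and topic are placeholders.
base = {"name": "my-s3-sink", "topics": "orders"}
print(json.dumps(with_at_least_once("s3", base), indent=2))
```

Leaving the property out keeps the default of true, so only deployments that want at-least-once delivery need to set it.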

stream-reactor - Stream Reactor 7.4.2

Published by github-actions[bot] 2 months ago

stream-reactor - Stream Reactor 7.4.1

Published by github-actions[bot] 3 months ago

stream-reactor - Stream Reactor 7.4.0

Published by github-actions[bot] 3 months ago

stream-reactor - Stream Reactor 7.3.2 Latest Release

Published by github-actions[bot] 4 months ago

stream-reactor - Stream Reactor 7.3.1

Published by github-actions[bot] 4 months ago

stream-reactor - Stream Reactor 7.3.0

Published by github-actions[bot] 4 months ago

NEW: Azure Service Bus Source Connector

NEW: GCP Pub/Sub Source Connector

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

  • To back up all topics, the KCQL statement is:
INSERT INTO bucket
SELECT * FROM `*`
...

Previously, when * was used, the envelope setting was ignored.

With this change, the * entry is applied as the default when no entry is found for the topic of the incoming message.
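
The fallback behaviour can be pictured as a lookup with a default. This is only an illustrative sketch, not the connector's code: the function `resolve_mapping` and the dictionary of per-topic settings are hypothetical.

```python
from typing import Optional


def resolve_mapping(topic: str, mappings: dict) -> Optional[str]:
    """Pick the settings for a topic, falling back to the `*` entry."""
    if topic in mappings:
        return mappings[topic]
    # No explicit entry for this topic: use the wildcard entry, if any.
    return mappings.get("*")


# Hypothetical settings keyed by source topic; values are just labels here.
mappings = {"orders": "settings for orders", "*": "wildcard settings"}
print(resolve_mapping("orders", mappings))    # explicit entry wins
print(resolve_mapping("payments", mappings))  # falls back to the `*` entry
```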

HTTP Sink

Bug fix to ensure that, if specified as part of the template, the Content-Type header is correctly populated.

All Connectors:

  • Update of dependencies.

Note

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

stream-reactor - Stream Reactor 7.2.0

Published by github-actions[bot] 5 months ago

Enhancements

  1. Automated Skip for Archived Objects:

    • The S3 source now bypasses archived objects, including those stored in Glacier and Deep Archive. Archived data is excluded from processing automatically; without this, encountering such objects crashed the connector.
  2. Enhanced Key Storage in Envelope Mode:

    • Changes have been implemented to the stored key when using envelope mode. These modifications lay the groundwork for future functionality, enabling seamless replay of Kafka data stored in data lakes (S3, GCS, Azure Data Lake) from any specified point in time.
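
The archived-object skip described in the first enhancement amounts to filtering a bucket listing by storage class. The sketch below illustrates the idea on plain dictionaries shaped like S3 listing entries; it is not the connector's implementation, and the function name and sample keys are hypothetical.

```python
# Storage classes treated as archived in this sketch. This set is an
# assumption: the release notes name Glacier and Deep Archive without
# listing the exact values the connector checks.
ARCHIVED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def readable_objects(listing):
    """Keep only objects whose storage class is not an archive tier."""
    return [
        obj
        for obj in listing
        if obj.get("StorageClass", "STANDARD") not in ARCHIVED_STORAGE_CLASSES
    ]


# Hypothetical listing entries.
listing = [
    {"Key": "topic/0/000001.json", "StorageClass": "STANDARD"},
    {"Key": "topic/0/000002.json", "StorageClass": "GLACIER"},
    {"Key": "topic/0/000003.json", "StorageClass": "DEEP_ARCHIVE"},
]
print([obj["Key"] for obj in readable_objects(listing)])
# prints: ['topic/0/000001.json']
```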

Full Changelog: https://github.com/lensesio/stream-reactor/compare/7.1.0...7.2.0

stream-reactor - Stream Reactor 7.1.0

Published by github-actions[bot] 6 months ago

Source Line-Start-End Functionality Enhancements

We've rolled out enhancements to tackle a common challenge faced by users of the S3 source functionality. Previously, when an external producer abruptly terminated a file without marking the end message, data loss occurred.

To address this, we've introduced a new feature: a property entry for KCQL to signal the handling of unterminated messages. Meet the latest addition, read.text.last.end.line.missing. When set to true, this property ensures that in-flight data is still recognized as a message even when EOF is reached but the end line marker is missing.
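
To make the behaviour concrete, here is a small sketch of start/end-line parsing with a flag that mirrors the intent of read.text.last.end.line.missing. It is an illustration under assumptions, not the connector's code: the function name, the marker values "SSM" and "EOM", and the choice to drop the marker lines from the message are all hypothetical.

```python
def read_messages(lines, start_line, end_line, last_end_line_missing=False):
    """Group lines between start and end markers into messages."""
    messages = []
    current = None  # None means we are outside a message
    for line in lines:
        if current is None:
            if line == start_line:
                current = []
        elif line == end_line:
            messages.append("\n".join(current))
            current = None
        else:
            current.append(line)
    # EOF reached while a message is still open: keep it only if asked to.
    if current is not None and last_end_line_missing:
        messages.append("\n".join(current))
    return messages


# The producer stopped before writing the final end marker.
lines = ["SSM", "a", "EOM", "SSM", "b", "c"]
print(read_messages(lines, "SSM", "EOM"))
# prints: ['a']
print(read_messages(lines, "SSM", "EOM", last_end_line_missing=True))
# prints: ['a', 'b\nc']
```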

Note

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

stream-reactor - v7.0.0

Published by github-actions[bot] 6 months ago

This release brings changes to the S3, GCS and Azure sinks that are not compatible with the previous version. A migration is required.
For migration details please follow the link here.

Data-lakes Sink Connectors

This release brings substantial enhancements to the data-lakes sink connectors, elevating their functionality and flexibility. The focal point of these changes is the adoption of the new KCQL syntax, designed to improve usability and resolve limitations inherent in the previous syntax.

Key Changes

New KCQL Syntax: The data-lakes sink connectors now use the new KCQL syntax, which offers enhanced capabilities and addresses constraints of the previous syntax.
Data Lakes Sink Partition Name: Partition names are now preserved accurately, because characters such as \ and / are no longer stripped. As a result, SMTs can provide partition names as expected, which reduces configuration overhead.

KCQL Keywords Replaced

Several keywords have been replaced with entries in the PROPERTIES section for improved clarity and consistency:

WITHPARTITIONER: Replaced by PROPERTIES ('partition.include.keys'=true/false). When set to true (the equivalent of WITHPARTITIONER KeysAndValue), the partition keys are included in the partition path. Otherwise, only the partition values are included.
WITH_FLUSH_SIZE: Replaced by PROPERTIES ('flush.size'=$VALUE).
WITH_FLUSH_COUNT: Replaced by PROPERTIES ('flush.count'=$VALUE).
WITH_FLUSH_INTERVAL: Replaced by PROPERTIES ('flush.interval'=$VALUE).
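
The sketch below shows one way the replacement entries might be rendered into a PROPERTIES clause. The property keys come from the list above; the helper `properties_clause` is hypothetical, and joining several entries with commas inside a single PROPERTIES clause is an assumption that should be checked against the KCQL documentation for your connector version.

```python
def properties_clause(props: dict) -> str:
    """Render a dict of settings as a KCQL PROPERTIES clause."""

    def render(value):
        # Booleans are written in lowercase, as in 'partition.include.keys'=true.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    entries = ", ".join(f"'{key}'={render(value)}" for key, value in props.items())
    return f"PROPERTIES ({entries})"


print(properties_clause({"partition.include.keys": True, "flush.count": 5000}))
# prints: PROPERTIES ('partition.include.keys'=true, 'flush.count'=5000)
```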

Benefits

The new KCQL syntax makes the data-lakes sink connectors more flexible and lets users tailor configurations more precisely to their requirements. Moving keywords into the PROPERTIES section also avoids misconfigurations caused by keyword ordering, so configurations are applied as intended.

stream-reactor - Stream Reactor 6.3.1

Published by github-actions[bot] 6 months ago

This update affects only datalake sinks that use the JSON storage format. It accommodates users who rely on a less-than-ideal workaround: a Single Message Transform (SMT) that returns a Plain Old Java Object (POJO) to the sink. In that case the payload is translated to JSON by Jackson alone, rather than by the Connect JsonConverter.

Note, however, that this adjustment does not indicate a broader direction for future work, because relying on such SMT practices does not give a solution that is agnostic of the storage format (Avro, Parquet, or JSON).

Full Changelog: https://github.com/lensesio/stream-reactor/compare/6.3.1...6.3.1

stream-reactor - Stream Reactor 6.3.0

Published by github-actions[bot] 7 months ago

Release notes

New Connector
The HTTP Sink is offered as beta. Please report any issues via GitHub issues.

stream-reactor - Stream Reactor 6.2.0

Published by github-actions[bot] 7 months ago

Release notes

New Connector
The GCP Storage source is offered as beta. Please report any issues via GitHub issues.

Important
AWS S3 Source Partition search properties have changed. See the release notes for detailed information.

stream-reactor - Stream Reactor 6.1.0

Published by github-actions[bot] 8 months ago

All Connectors:
In this release, all connectors have been updated to address an issue related to conflicting Antlr jars that may arise in specific environments.

AWS S3 Source:
Byte Array Support: Resolved an issue where storing the Key/Value as an array of bytes caused compatibility problems due to the connector returning java.nio.ByteBuffer while the Connect framework's ByteArrayConverter only works with byte[]. This update ensures seamless conversion to byte[] if the key/value is a ByteBuffer.

JMS Sink:
Fix for NullPointerException: Addressed an issue where the JMS sink connector encountered a NullPointerException when processing a message with a null JMSReplyTo header value.

JMS Source:
Fix for DataException: Resolved an issue where the JMS source connector encountered a DataException when processing a message with a JMSReplyTo header set to a queue.

AWS S3 Sink/GCP Storage Sink (beta)/Azure Datalake Sink (beta):
GZIP Support for JSON Writing: Added support for GZIP compression when writing JSON data to AWS S3, GCP Storage, and Azure Datalake sinks.
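
For readers unfamiliar with the format, the following sketch shows what GZIP-compressed JSON output looks like in general terms. It is not the sink's implementation: the function names are hypothetical, and writing one JSON document per line is an assumption made for the example.

```python
import gzip
import io
import json


def write_gzip_json_lines(records) -> bytes:
    """Serialize records as JSON, one per line, and GZIP-compress the result."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for record in records:
            gz.write(json.dumps(record).encode("utf-8"))
            gz.write(b"\n")
    return buffer.getvalue()


def read_gzip_json_lines(payload: bytes):
    """Decompress a GZIP payload and parse each non-empty line as JSON."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


records = [{"id": 1, "status": "ok"}, {"id": 2, "status": "late"}]
payload = write_gzip_json_lines(records)
print(read_gzip_json_lines(payload) == records)
# prints: True
```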

stream-reactor - Stream Reactor 6.0.3

Published by github-actions[bot] 9 months ago

stream-reactor - Stream Reactor 6.0.2

Published by github-actions[bot] 10 months ago

stream-reactor - Stream Reactor 6.0.1

Published by github-actions[bot] 10 months ago

Three connectors were updated in this release:

  • AWS S3
  • Azure Datalake
  • GCP Storage

The following enhancements were made:

  • Removed check preventing nested paths being used in the sink.
  • Avoid cast exception in GCP Storage connector when using Credentials mode.

Please Remember
The Azure Data Lake and GCP Storage sinks are offered as beta. Please report any issues via GitHub issues.

For the latest version of all other connectors, please see Version 6.0.0.

stream-reactor - Stream Reactor 6.0.0

Published by github-actions[bot] 11 months ago

Release notes

Important
Package names for several connectors have changed. This may impact your configuration if upgrading. More information about package changes.

New Connectors
The Azure Data Lake and GCP Storage sinks are offered as beta. Please report any issues via GitHub issues.
