stream-reactor

A collection of open source Apache 2.0 Kafka Connectors maintained by Lenses.io.

APACHE-2.0 License


stream-reactor - 5.0.1

Published by stheppi about 1 year ago

Fix: the S3 sink did not handle messages with a null value correctly when using the envelope.

stream-reactor - Stream Reactor 5.0.0+deprecated

Published by github-actions[bot] about 1 year ago

stream-reactor - Stream Reactor 5.0.0

Published by github-actions[bot] about 1 year ago

5.0.0 Release

All Connectors

  • Test Fixes and E2E Test Clean-up: Improved testing with bug fixes and end-to-end test clean-up.
  • Code Optimization: Removed unused code and converted Java code and tests to Scala.
  • ASCII Art Loading Fix: Resolved issues related to ASCII art loading.
  • Build System Updates: Implemented build system updates and improvements.
  • Stream Reactor Integration: Integrated kafka-connect-query-language into Stream Reactor for enhanced
    compatibility.
  • STOREAS Consistency: Ensured consistent handling of backticks with STOREAS (see the illustration after this list).
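
For illustration, a hedged sketch of the STOREAS backtick handling, assuming a hypothetical bucket and topic; both
forms below are now handled consistently:

INSERT INTO my-bucket SELECT * FROM my-topic STOREAS `PARQUET`
INSERT INTO my-bucket SELECT * FROM my-topic STOREAS PARQUET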

AWS S3 Connector

The source and sink have been the focus of this release.

S3 Source & Sink

  • Full message backup: the S3 sink and source now support full message backup. This is enabled by adding
    PROPERTIES('store.envelope'=true) to the KCQL statement (see the example after this list).
  • Removed the Bytes_*** storage formats. For users leveraging them, migration information is provided below. To store
    the raw Kafka message, the storage format should be AVRO/PARQUET/JSON (less ideal).
  • Introduced support for BYTES, storing a single message as raw binary. Typical use cases are storing images or
    videos. This is enabled by adding STOREAS BYTES to the KCQL statement.
  • Introduced support for PROPERTIES to hold the new settings that drive the connectors' behaviour. The KCQL looks
    like this: INSERT INTO ... SELECT ... FROM ... PROPERTIES('property'=value, ...)
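
A minimal sketch of the new KCQL forms, using hypothetical bucket and topic names. Full message backup via the
envelope property:

INSERT INTO my-bucket:backup
SELECT *
FROM my-topic
STOREAS `AVRO`
PROPERTIES('store.envelope'=true)

Raw binary payloads (for example images), one message stored per object:

INSERT INTO my-bucket:images
SELECT *
FROM my-topic
STOREAS `BYTES`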

Sink

  • Enhanced PARTITIONBY Support: expanded support for PARTITIONBY fields, now accommodating fields containing dots.
    For instance, you can use PARTITIONBY a, `field1.field2` for enhanced partitioning control (see the sketch after
    this list).
  • Advanced Padding Strategy: introduced a more advanced padding strategy configuration. Padding is now enforced by
    default, significantly improving compatibility with the S3 Source.
  • Improved Error Messaging: Enhancements have been made to error messaging, providing clearer guidance, especially in
    scenarios with misaligned topic configurations (#978).
  • Commit Logging Refactoring: Refactored and simplified the CommitPolicy for more efficient commit logging (#964).
  • Comprehensive Testing: Added additional unit testing around configuration settings, removed redundancy from property
    names, and enhanced KCQL properties parsing to support Map structures.
  • Consolidated Naming Strategies: Merged naming strategies to reduce code complexity and ensure consistency. This effort
    ensures that both hierarchical and custom partition modes share similar code paths, addressing issues related to
    padding and the inclusion of keys and values within the partition name.
  • Optimized S3 API Calls: Switched from using deleteObjects to deleteObject for S3 API client calls (#957), enhancing
    performance and efficiency.
  • JClouds Removal: The update removes the use of JClouds, streamlining the codebase.
  • Legacy Offset Seek Removal: The release eliminates legacy offset seek operations, simplifying the code and enhancing
    overall efficiency.
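
A hedged sketch of the expanded PARTITIONBY support, using hypothetical bucket, topic and field names:

INSERT INTO my-bucket:sales
SELECT *
FROM my-topic
PARTITIONBY region, `address.country`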

Source

  • Expanded Text Reader Support: new text readers to enhance data processing flexibility, including:
    • Regex-Driven Text Reader: Allows parsing based on regular expressions.
    • Multi-Line Text Reader: Handles multi-line data.
    • Start-End Tag Text Reader: Processes data enclosed by start and end tags, suitable for XML content.
  • Improved Parallelization: enhancements enable parallelization based on the number of connector tasks and available
    data partitions, optimizing data handling.
  • Data Consistency: Resolved data loss and duplication issues when the connector is restarted, ensuring
    reliable data transfer.
  • Dynamic Partition Discovery: No more need to restart the connector when new partitions are added; runtime partition
    discovery streamlines operations.
  • Efficient Storage Handling: The connector now ignores the .indexes directory, allowing data storage in an S3 bucket
    without a prefix.
  • Increased Default Records per Poll: the default limit on the number of records returned per poll was changed from 1024
    to 10000, improving data retrieval efficiency and throughput.
  • Ordered File Processing: Added the ability to process files in date order. This feature is especially useful when S3
    files lack lexicographical sorting, and S3 API optimisation cannot be leveraged. Please note that it involves reading
    and sorting files in memory.
  • Parquet INT96 Compatibility: The connector now allows Parquet INT96 to be read as a fixed array, preventing runtime
    failures.

Kudu and Hive

  • The Kudu and Hive connectors are now deprecated and will be removed in a future release.

InfluxDB

  • Fixed a memory issue with the InfluxDB writer.
  • Upgraded to the InfluxDB 2 client (note: InfluxDB 2 connections are not yet supported).

S3 upgrade notes

Upgrading from 5.0.0 (preview) to 5.0.0

For installations that have been using the preview version of the S3 connector and are upgrading to the release, there
are a few important considerations:

Previously (starting in June), default padding was enabled for both the "offset" and "partition" values.

However, in version 5.0 the decision was made to apply default padding to the "offset" value only, leaving the
"partition" value without padding. This change was made to enhance compatibility with querying in Athena.

If you have been using a build from the master branch since June, your connectors might have been configured with a
different default padding setting.

To maintain consistency and ensure your existing connector configuration remains valid, you will need to use KCQL
configuration properties to customize the padding fields accordingly.

INSERT INTO $bucket[:$prefix]
SELECT *
FROM $topic
...
PROPERTIES(
  'padding.length.offset'=12,
  'padding.length.partition'=12
)

Upgrading from 4.x to 5.0.0

Starting with version 5.0.0, the following configuration keys have been replaced.

Field              Old Property           New Property
AWS Secret Key     aws.secret.key         connect.s3.aws.secret.key
Access Key         aws.access.key         connect.s3.aws.access.key
Auth Mode          aws.auth.mode          connect.s3.aws.auth.mode
Custom Endpoint    aws.custom.endpoint    connect.s3.custom.endpoint
VHost Bucket       aws.vhost.bucket       connect.s3.vhost.bucket
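
For illustration, a hedged before/after sketch of the affected part of a connector configuration, assuming the
Credentials auth mode and placeholder key values:

Before (4.x):

aws.auth.mode=Credentials
aws.access.key=<ACCESS_KEY>
aws.secret.key=<SECRET_KEY>
aws.custom.endpoint=https://s3.example.com

After (5.0.0):

connect.s3.aws.auth.mode=Credentials
connect.s3.aws.access.key=<ACCESS_KEY>
connect.s3.aws.secret.key=<SECRET_KEY>
connect.s3.custom.endpoint=https://s3.example.com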

Upgrading from 4.1.* and 4.2.0

In version 4.1, padding options were available but were not enabled by default. At that time, the default padding
length, if not specified, was set to 8 characters.

However, starting from version 5.0, padding is now enabled by default, and the default padding length has been increased
to 12 characters.

Enabling padding has a notable advantage: it ensures that the files written are fully compatible with the Lenses Stream
Reactor S3 Source, enhancing interoperability and data integration.

Sinks created with 4.2.0 and 4.2.1 should retain their existing padding behaviour, and therefore should disable padding:

INSERT INTO $bucket[:$prefix]
SELECT *
FROM $topic
...
PROPERTIES (
  'padding.type'=NoOp
)

If padding was enabled in 4.1, then the padding length should be specified in the KCQL statement:

INSERT INTO $bucket[:$prefix]
SELECT *
FROM $topic
...
PROPERTIES (
  'padding.length.offset'=12,
  'padding.length.partition'=12
)

Upgrading from 4.x to 5.0.0 only when STOREAS Bytes_*** is used

The Bytes_*** storage format has been removed. If you are using this storage format, you will need to install the
5.0.0-deprecated connector and upgrade the connector instances by changing the class name:

Source Before:

connector.class=io.lenses.streamreactor.connect.aws.s3.source.S3SourceConnector
...

Source After:

connector.class=io.lenses.streamreactor.connect.aws.s3.source.S3SourceConnectorDeprecated
...

Sink Before:

connector.class=io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnector
...

Sink After:

connector.class=io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnectorDeprecated
connect.s3.padding.strategy=NoOp
...

The deprecated connector won't be developed any further and will be removed in a future release.
If you want to talk to us about a migration plan, please get in touch with us at [email protected].

Upgrade a connector configuration

To migrate to the new configuration, follow these steps:

  • stop all running instances of the S3 connector
  • upgrade the connector to 5.0.0
  • update the configuration to use the new properties
  • resume the stopped connectors
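
As a hedged illustration of these steps using the Kafka Connect REST API, assuming a hypothetical connector name
(s3-sink) and worker address (localhost:8083), and using pause/resume to stop and restart the instance:

# pause the running S3 connector instance
curl -X PUT http://localhost:8083/connectors/s3-sink/pause

# upgrade the Stream Reactor plugin to 5.0.0, restart the worker, then push the
# updated configuration containing the new connect.s3.* properties
curl -X PUT -H "Content-Type: application/json" \
     --data @s3-sink-config.json \
     http://localhost:8083/connectors/s3-sink/config

# resume the connector
curl -X PUT http://localhost:8083/connectors/s3-sink/resume
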
stream-reactor - Stream Reactor 4.2.0

Published by github-actions[bot] over 1 year ago

stream-reactor - Stream Reactor 4.1.0

Published by github-actions[bot] over 1 year ago

stream-reactor - Stream Reactor 4.0.0

Published by github-actions[bot] about 2 years ago

stream-reactor - 3.0.1

Published by lanbot almost 3 years ago

3.0.1

stream-reactor - 3.0.0

Published by lanbot almost 3 years ago

3.0.0

stream-reactor - 2.1.3

Published by lanbot almost 4 years ago

2.1.3

stream-reactor - 2.1.2

Published by lanbot almost 4 years ago

2.1.2

stream-reactor - 2.1.1

Published by lanbot about 4 years ago

2.1.1

stream-reactor - 2.1.0

Published by lanbot about 4 years ago

2.1.0

stream-reactor - 2.0.0

Published by lanbot over 4 years ago

2.0.0

stream-reactor - 1.2.7

Published by lanbot over 4 years ago

1.2.7

stream-reactor - 1.2.6

Published by lanbot over 4 years ago

1.2.6

stream-reactor - 1.2.5

Published by lanbot over 4 years ago

1.2.5

stream-reactor - 1.2.4

Published by lanbot almost 5 years ago

Bug fixes

  • JMS Source

    Acknowledging the JMS messages was not always possible. There was also an issue where messages from the JMS queue
    were produced to Kafka out of order.
    Changes:

    • Queue message order is retained when published to Kafka (although messages might be routed to different partitions)
    • Ack now happens for each message. This is a change from previous behaviour.
    • Records which fail to be committed to Kafka are not ack-ed on the JMS side

stream-reactor - 1.2.3

Published by lanbot about 5 years ago

Release Notes:

Features

Influx

  • Support for referencing _key values
  • Support Unix timestamp as double

MQTT

  • Replicate shared subscription to all tasks
  • Add a sink config to specify retained messages

Hazelcast

  • SSL support

MongoDB

  • SSL support
  • Removing database name dashes restriction

FTP

  • FTPS support

Bug fixes

Hive:

  • Fix for writing nested structures to Hive
  • Improved the code for the async function call to use CAS (compare-and-swap)

stream-reactor - 1.2.2

Published by lanbot over 5 years ago

Release Notes:

Features

Redis

  • TTL Support
  • SSL support
  • AWS ElastiCache support
  • GEOADD support
  • PUB/SUB support

MQTT

  • Multi server connection
  • Dynamic Target support

Hive

  • Kerberos support

Kudu

  • Comma separated master endpoints

Bug fixes

Redis:

  • Topic regex

JMS:

  • Filters out Kafka records with null value

Cassandra:

  • Timestamp comparison
  • PK check for incremental

Mongo:

  • Memory leak

stream-reactor - 1.2.1

Published by lanbot over 5 years ago

1.2.1
