karafka

Ruby and Rails efficient Kafka processing framework

OTHER License

Downloads
14.5M
Stars
2K
Committers
59

karafka - v2.1.2

Published by mensfeld over 1 year ago

  • Set minimum karafka-core on 2.0.13 to make sure correct version of karafka-rdkafka is used.
  • Set minimum waterdrop on 2.5.3 to make sure correct version of waterdrop is used.
karafka - v2.1.1

Published by mensfeld over 1 year ago

  • [Fix] Liveness Probe Doesn't Meet HTTP 1.1 Criteria - Causing Kubernetes Restarts (#1450)
karafka - v2.1.0

Published by mensfeld over 1 year ago

  • [Feature] Provide ability to use CurrentAttributes with ActiveJob's Karafka adapter.
  • [Feature] Introduce collective Virtual Partitions offset management.
  • [Feature] Use virtual offsets to filter out messages that would be re-processed upon retries.
  • [Improvement] No longer break processing on failing parallel virtual partitions in ActiveJob because it is compensated by virtual marking.
  • [Improvement] Always use Virtual offset management for Pro ActiveJobs.
  • [Improvement] Do not attempt to mark offsets on already revoked partitions.
  • [Improvement] Make sure, that VP components are not injected into non VP strategies.
  • [Improvement] Improve complex strategies inheritance flow.
  • [Improvement] Optimize offset management for DLQ + MoM feature combinations.
  • [Change] Removed Karafka::Pro::BaseConsumer in favor of Karafka::BaseConsumer. (#1345)
  • [Fix] Fix for max_messages and max_wait_time not having reference in errors.yml (#1443)
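
The virtual offset idea from this release can be illustrated with a small plain-Ruby sketch. It is not Karafka's implementation: the `VirtualOffsetTracker` class and its method names are hypothetical, and the sketch assumes that each message is marked individually. It shows how marks from parallel virtual partitions could be collected so that only a contiguous prefix is committed and already-marked messages are skipped on a retry.

```ruby
require 'set'

# Hypothetical helper, not part of Karafka.
class VirtualOffsetTracker
  def initialize(batch_offsets)
    @offsets = batch_offsets.sort # all offsets of the polled batch
    @marked = Set.new             # offsets marked by any virtual partition
  end

  # Record that a message was processed in some virtual partition.
  def mark(offset)
    @marked << offset
  end

  # Highest offset for which it and all earlier batch offsets are marked.
  # Returns nil when the first message of the batch is not marked yet.
  def committable
    last = nil
    @offsets.each do |offset|
      break unless @marked.include?(offset)

      last = offset
    end
    last
  end

  # Offsets that still need processing when the batch is retried.
  def pending
    @offsets.reject { |offset| @marked.include?(offset) }
  end
end

tracker = VirtualOffsetTracker.new([10, 11, 12, 13, 14])
[10, 11, 13].each { |offset| tracker.mark(offset) }
```

In this sketch only offset 11 could be committed safely, while a retry would need to process offsets 12 and 14 only.
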
karafka - v2.0.41

Published by mensfeld over 1 year ago

  • [Feature] Provide Karafka::Pro::Iterator for anonymous topic/partitions iterations and messages lookups (#1389 and #1427).
  • [Improvement] Optimize topic lookup for read_topic admin method usage.
  • [Improvement] Report via LoggerListener information about the partition on which a given job has started and finished.
  • [Improvement] Slightly normalize the LoggerListener format. Always report partition-related operations as follows: TOPIC_NAME/PARTITION.
  • [Improvement] Do not retry recovery from unknown_topic_or_part when Karafka is shutting down as there is no point and no risk of any data losses.
  • [Improvement] Report client.software.name and client.software.version according to librdkafka recommendation.
  • [Improvement] Report ten longest integration specs after the suite execution.
  • [Improvement] Prevent user-originating errors related to statistics processing after listener loop crash from potentially crashing the listener loop and hanging Karafka process.
karafka - v2.0.40

Published by mensfeld over 1 year ago

  • [Improvement] Introduce Karafka::Messages::Messages#empty? method to handle Idle related cases where shutdown or revocation would be called on an empty messages set. This method allows for checking if there are any messages in the messages batch.
  • [Refactor] Require messages builder to accept partition and do not fetch it from messages.
  • [Refactor] Use empty messages set for internal APIs (Idle) (so there always is Karafka::Messages::Messages)
  • [Refactor] Allow for empty messages set initialization with -1001 and -1 on metadata (similar to librdkafka)
karafka - v2.0.39

Published by mensfeld over 1 year ago

  • [Feature] Provide ability to throttle/limit number of messages processed in a time unit (#1203)
  • [Feature] Provide Delayed Topics (#1000)
  • [Feature] Provide ability to expire messages (expiring topics)
  • [Feature] Provide ability to apply filters after messages are polled and before enqueued. This is a generic filter API for any usage.
  • [Improvement] When using ActiveJob with Virtual Partitions, Karafka will stop if collectively VPs are failing. This minimizes number of jobs that will be collectively re-processed.
  • [Improvement] Add a #retrying? method to consumers to check whether data is being reprocessed after a failure. This is useful for branching out processing based on errors.
  • [Improvement] Track active_job_id in instrumentation (#1372)
  • [Improvement] Introduce new housekeeping job type called Idle for non-consumption execution flows.
  • [Improvement] Change how manual offset management works with Long-Running Jobs. Use the last message offset to move forward instead of relying on the last message marked as consumed for a scenario where no message is marked.
  • [Improvement] Prioritize in Pro non-consumption jobs execution over consumption despite LJF. This will ensure, that housekeeping as well as other non-consumption events are not saturated when running a lot of work.
  • [Improvement] Normalize the DLQ behaviour with MoM. Always pause on dispatch for all the strategies.
  • [Improvement] Improve the manual offset management and DLQ behaviour when no markings occur for OSS.
  • [Improvement] Do not early stop ActiveJob work running under virtual partitions to prevent extensive reprocessing.
  • [Improvement] Drastically increase number of scenarios covered by integration specs (OSS and Pro).
  • [Improvement] Introduce a Coordinator#synchronize lock for cross virtual partitions operations.
  • [Fix] Do not resume partition that is not paused.
  • [Fix] Fix LoggerListener cases where logs would not include caller id (when available)
  • [Fix] Fix not working benchmark tests.
  • [Fix] Fix a case where manual offset management combined with a user pause would ignore the pause and seek to the next message.
  • [Fix] Fix a case where dead letter queue would go into an infinite loop on message with first ever offset if the first ever offset would not recover.
  • [Fix] Make sure to resume always for all LRJ strategies on revocation.
  • [Refactor] Make sure that coordinator is topic aware. Needed for throttling, delayed processing and expired jobs.
  • [Refactor] Put Pro strategies into namespaces to better organize multiple combinations.
  • [Refactor] Do not rely on messages metadata for internal topic and partition operations like #seek so they can run independently from the consumption flow.
  • [Refactor] Hold a single topic/partition reference on a coordinator instead of in executor, coordinator and consumer.
  • [Refactor] Move #mark_as_consumed and #mark_as_consumed! into Strategies::Default to be able to introduce marking for virtual partitions.
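
To make the throttling feature above more concrete, here is a minimal fixed-window sketch in plain Ruby. It is an illustration only and not Karafka's throttler: the `WindowThrottler` class, its `limit:` and `interval:` arguments and the injected `now:` clock are assumptions made for this example.

```ruby
# Hypothetical fixed-window throttler, not Karafka's own class.
class WindowThrottler
  def initialize(limit:, interval:)
    @limit = limit        # maximum messages allowed per window
    @interval = interval  # window length in seconds
    @window_start = nil
    @used = 0
  end

  # Splits a batch into messages allowed now and messages to retry later.
  # The current time is passed in so the behaviour is deterministic.
  def apply(batch, now:)
    if @window_start.nil? || now - @window_start >= @interval
      # Start a new window and reset the quota.
      @window_start = now
      @used = 0
    end

    quota = [@limit - @used, 0].max
    allowed = batch.first(quota)
    deferred = batch.drop(quota)
    @used += allowed.size
    [allowed, deferred]
  end
end

throttler = WindowThrottler.new(limit: 3, interval: 60)
first_allowed, first_deferred = throttler.apply([0, 1, 2, 3, 4], now: 0)
second_allowed, second_deferred = throttler.apply(first_deferred, now: 10)
third_allowed, third_deferred = throttler.apply(second_deferred, now: 60)
```

The deferred messages correspond to the ones a consumer would have to come back to once the window resets, which is why the first deferred offset is the natural place to resume from.
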
karafka - v2.0.38

Published by mensfeld over 1 year ago

  • [Improvement] Introduce Karafka::Admin#read_watermark_offsets to get low and high watermark offsets values.
  • [Improvement] Track active_job_id in instrumentation (#1372)
  • [Improvement] Improve #read_topic reading in case of a compacted partition where the offset is below the low watermark offset. This should optimize reading and should not go beyond the low watermark offset.
  • [Improvement] Allow #read_topic to accept instance settings to overwrite any settings needed to customize reading behaviours.
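
The watermark-related reading behaviour can be pictured with a short arithmetic sketch. It does not call Karafka::Admin; `start_offset_for` is a hypothetical helper, and it assumes that the high watermark is the offset of the next message to be written while the low watermark is the first offset still available.

```ruby
# Hypothetical helper: where to start reading in order to get the last
# `count` messages without going below the low watermark.
def start_offset_for(count:, low:, high:)
  [high - count, low].max
end

plain_start     = start_offset_for(count: 10, low: 0, high: 100)
compacted_start = start_offset_for(count: 10, low: 95, high: 100)
short_start     = start_offset_for(count: 10, low: 0, high: 4)
```

When the partition has been compacted or truncated so that fewer than `count` messages remain, the start offset is clamped to the low watermark instead of pointing at data that no longer exists.
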
karafka - v2.0.37

Published by mensfeld over 1 year ago

  • [Fix] Declarative topics execution on a secondary cluster run topics creation on the primary one (#1365)
  • [Fix] Admin read operations commit offset when not needed (#1369)
karafka - v2.0.36

Published by mensfeld over 1 year ago

  • [Refactor] Rename internal naming of Structurable to Declaratives for declarative topics feature.
  • [Fix] AJ + DLQ + MOM + LRJ is pausing indefinitely after the first job (#1362)
karafka - v2.0.35

Published by mensfeld over 1 year ago

  • [Feature] Allow for defining topics config via the DSL and its automatic creation via CLI command.
  • [Feature] Allow for full topics reset and topics repartitioning via the CLI.

You can read about this feature here: karafka.io/docs/Topics-management-and-administration/

karafka - v2.0.34

Published by mensfeld over 1 year ago

  • [Improvement] Attach an embedded tag to Karafka processes started using the embedded API.
  • [Change] Renamed Datadog::Listener to Datadog::MetricsListener for consistency. (#1124)
karafka - v2.0.33

Published by mensfeld over 1 year ago

  • [Feature] Support perform_all_later in ActiveJob adapter for Rails 7.1+
  • [Feature] Introduce the ability to assign and re-assign tags in consumer instances. This can be used for extra instrumentation that is context aware.
  • [Feature] Introduce the ability to assign and reassign tags to the Karafka::Process.
  • [Improvement] When using ActiveJob adapter, automatically tag jobs with the name of the ActiveJob class that is running inside of the ActiveJob consumer.
  • [Improvement] Make ::Karafka::Instrumentation::Notifications::EVENTS list public for anyone wanting to re-bind those into a different notification bus.
  • [Improvement] Set fetch.message.max.bytes for Karafka::Admin to 5MB to make sure that all data is fetched correctly for Web UI under heavy load (many consumers).
  • [Improvement] Introduce a strict_topics_namespacing config option to enable/disable the strict topics naming validations. This can be useful when working with pre-existing topics which we cannot or do not want to rename.
  • [Fix] Karafka monitor is prematurely cached (#1314)
karafka - v2.0.32

Published by mensfeld over 1 year ago

  • [Fix] Many non-existing topic subscriptions propagate poll errors beyond client
  • [Improvement] Ignore unknown_topic_or_part errors in dev when allow.auto.create.topics is on.
  • [Improvement] Optimize temporary errors handling in polling for a better backoff policy
karafka - v2.0.31

Published by mensfeld over 1 year ago

  • [Feature] Allow for adding partitions via Admin#create_partitions API.
  • [Fix] Do not ignore admin errors upon invalid configuration (#1254)
  • [Fix] Topic name validation (#1300) - CandyFet
  • [Improvement] Increase the max_wait_timeout on admin operations to five minutes to avoid timeouts on heavily loaded clusters.
  • [Maintenance] Require karafka-core >= 2.0.11 and switch to shared RSpec locator.
  • [Maintenance] Require karafka-rdkafka >= 0.12.1
karafka - v2.0.30

Published by mensfeld over 1 year ago

  • [Improvement] Alias --consumer-groups with --include-consumer-groups
  • [Improvement] Alias --subscription-groups with --include-subscription-groups
  • [Improvement] Alias --topics with --include-topics
  • [Improvement] Introduce --exclude-consumer-groups for ability to exclude certain consumer groups from running
  • [Improvement] Introduce --exclude-subscription-groups for ability to exclude certain subscription groups from running
  • [Improvement] Introduce --exclude-topics for ability to exclude certain topics from running
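
One way to think about the include and exclude flags is as two list filters applied to the routed names. The sketch below is plain Ruby and does not reflect how Karafka combines the flags internally: `active_names` is a hypothetical helper, and the rule that an empty include list means "everything" is an assumption of this example.

```ruby
# Hypothetical helper, not part of Karafka's CLI.
def active_names(all, included: [], excluded: [])
  # An empty include list is treated as "no restriction" (assumption).
  selected = included.empty? ? all : all.select { |name| included.include?(name) }
  selected.reject { |name| excluded.include?(name) }
end

topics = %w[orders payments audit]

only_excluded = active_names(topics, excluded: %w[audit])
only_included = active_names(topics, included: %w[orders])
both          = active_names(topics, included: %w[orders audit], excluded: %w[audit])
```
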
karafka - v2.0.29

Published by mensfeld over 1 year ago

  • [Improvement] Make sure, that the Karafka#producer instance has the LoggerListener enabled in the install template, so Karafka by default prints both consumer and producer info.
  • [Improvement] Extract the code loading capabilities of Karafka console from the executable, so web can use it to provide CLI commands.
  • [Fix] Fix for: running karafka console results in NameError with Rails (#1280)
  • [Fix] Make sure, that the caller for async errors is being published.
  • [Change] Make sure that WaterDrop 2.4.10 or higher is used with this release to support Web-UI.
karafka - v2.0.28

Published by mensfeld over 1 year ago

  • [Feature] Provide the ability to use Dead Letter Queue with Virtual Partitions.
  • [Improvement] Collapse Virtual Partitions upon retryable error to a single partition. This allows dead letter queue to operate and mitigate issues arising from work virtualization. This removes uncertainties upon errors that can be retried and processed. Affects given topic partition virtualization only for multi-topic and multi-partition parallelization. It also minimizes potential "flickering" where given data set has potentially many corrupted messages. The collapse will last until all the messages from the collective corrupted batch are processed. After that, virtualization will resume.
  • [Improvement] Introduce #collapsed? consumer method available for consumers using Virtual Partitions.
  • [Improvement] Allow for customization of DLQ dispatched message details in Pro (#1266) via the #enhance_dlq_message consumer method.
  • [Improvement] Include original_consumer_group in the DLQ dispatched messages in Pro.
  • [Improvement] Use Karafka client_id as kafka client.id value by default

Upgrade notes

If you want to continue to use karafka as default for kafka client.id, assign it manually:

class KarafkaApp < Karafka::App
  setup do |config|
    # Other settings...
    config.kafka = {
      # Keep the previous default client.id value
      'client.id': 'karafka'
    }
  end
end

karafka - v2.0.27

Published by mensfeld almost 2 years ago

  • Do not lock Ruby version in Karafka in favour of karafka-core.
  • Make sure karafka-core version is at least 2.0.9 to make sure we run karafka-rdkafka.
karafka - v2.0.26

Published by mensfeld almost 2 years ago

  • [Feature] Allow for disabling given topics by setting active to false. This excludes them from consumption but keeps their definitions available for admin APIs, etc.
  • [Improvement] Terminate read_topic early when reaching the last offset available at request time.
  • [Improvement] Introduce a quiet state that indicates that Karafka is not only moving to quiet mode but actually that it reached it and no work will happen anymore in any of the consumer groups.
  • [Improvement] Use Karafka defined routes topics when possible for read_topic admin API.
  • [Improvement] Introduce client.pause and client.resume instrumentation hooks for tracking client topic partition pausing and resuming. This is alongside of consumer.consuming.pause that can be used to track both manual and automatic pausing with more granular consumer related details. The client.* should be used for low level tracking.
  • [Improvement] Replace LoggerListener pause notification with one based on client.pause instead of consumer.consuming.pause.
  • [Improvement] Expand LoggerListener with client.resume notification.
  • [Improvement] Replace random anonymous subscription groups ids with stable ones.
  • [Improvement] Add consumer.consume, consumer.revoke and consumer.shutting_down notification events and move the revocation logic calling to strategies.
  • [Change] Rename job queue statistics processing key to busy. No changes needed because naming in the DataDog listener stays the same.
  • [Fix] Fix proctitle listener state changes reporting on new states.
  • [Fix] Make sure all file descriptors are closed in the integration specs.
  • [Fix] Fix a case where empty subscription groups could leak into the execution flow.
  • [Fix] Fix LoggerListener reporting so it does not end with a period.
  • [Fix] Run previously defined (if any) signal traps created prior to Karafka signals traps.
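
The signal trap fix relies on a general Ruby technique: `Signal.trap` returns the handler that was installed before, so a new handler can call it instead of silently replacing it. The sketch below shows that technique in isolation. It is not Karafka's code; `chain_trap` and `TRAP_LOG` are hypothetical names, the call order (new handler first, previous one second) is an assumption, and the example assumes a POSIX system where USR1 is available.

```ruby
TRAP_LOG = []

# Hypothetical helper: install a handler and keep running the previous one.
def chain_trap(signal, &handler)
  previous = nil
  # Signal.trap returns the previously installed handler (a Proc, a String
  # such as "DEFAULT", or nil), which the new handler can then invoke.
  previous = Signal.trap(signal) do |signo|
    handler.call(signo)
    previous.call(signo) if previous.respond_to?(:call)
  end
end

# A trap defined by the application before the framework boots.
Signal.trap('USR1') { TRAP_LOG << :application }

# A trap installed later that keeps the earlier one working.
chain_trap('USR1') { TRAP_LOG << :framework }

Process.kill('USR1', Process.pid)
sleep 0.2 # give the main thread a moment to run the handlers
```
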
karafka - v2.0.25 - yanked

Published by mensfeld almost 2 years ago

  • Release yanked because the RubyGems release contained uncommitted changes. Please use 2.0.26.