karafka

Ruby and Rails efficient Kafka processing framework

OTHER License

Downloads
14.5M
Stars
2K
Committers
59

Bot releases are hidden (Show)

karafka - v2.0.5

Published by mensfeld about 2 years ago

  • Fix unnecessary double new line in the karafka.rb template for Ruby on Rails
  • Fix a case where a manually paused partition would not be processed after rebalance (#988)
  • Increase specs stability.
  • Lower concurrency of execution of specs in Github CI.
karafka - v2.0.4

Published by mensfeld about 2 years ago

  • Fix hanging topic creation (#964)
  • Fix conflict with other Rails loading libraries like gruf (#974)
karafka - v2.0.3

Published by mensfeld about 2 years ago

  • Update boot info on server startup.
  • Update karafka info with more descriptive Ruby version info.
  • Fix issue where when used with Rails in development, log would be too verbose.
  • Fix issue where Zeitwerk with Rails would not load Pro components despite license being present.
karafka - v2.0.2

Published by mensfeld about 2 years ago

karafka - v2.0.1

Published by mensfeld about 2 years ago

  • Provide Karafka::Admin for creating and destroying topics and fetching cluster-info.
  • Update integration specs always to use one-time disposable topics.
  • Remove no longer needed wait_for_kafka script.
  • Add more integration specs to cover offset management upon errors.
karafka - v2.0.0

Published by mensfeld about 2 years ago

This changelog describes changes between 1.4 and 2.0. Please refer to appropriate release notes for changes between particular rc releases.

Karafka 2.0 is a major rewrite that brings many new things to the table but also removes specific concepts that happened not to be as good as I initially thought when I created them.

Please consider getting a Pro version if you want to support my work on the Karafka ecosystem!

For anyone worried that I will start converting regular features into Pro: This will not happen. Anything free and fully OSS in Karafka 1.4 will forever remain free. Most additions and improvements to the ecosystem are to its free parts. Any feature that is introduced as a free and open one will not become paid.

Additions

This section describes new things and concepts introduced with Karafka 2.0.

Karafka 2.0:

  • Introduces multi-threaded support for concurrent work consumption for separate partitions as well as for single partition work via Virtual Partitions.
  • Introduces Active Job adapter for using Karafka as a jobs backend with Ruby on Rails Active Job.
  • Introduces fully automatic integration end-to-end test suite that checks any case I could imagine.
  • Introduces Virtual Partitions for ability to parallelize work of a single partition.
  • Introduces Long-Running Jobs to allow for work that would otherwise exceed the max.poll.interval.ms.
  • Introduces the Enhanced Scheduler that uses a non-preemptive LJF (Longest Job First) algorithm instead of a a FIFO (First-In, First-Out) one.
  • Introduces Enhanced Active Job adapter that is optimized and allows for strong ordering of jobs and more.
  • Introduces seamless Ruby on Rails integration via Rails::Railte without need for any extra configuration.
  • Provides #revoked method for taking actions upon topic revocation.
  • Emits underlying async errors emitted from librdkafka via the standardized error.occurred monitor channel.
  • Replaces ruby-kafka with librdkafka as an underlying driver.
  • Introduces official EOL policies.
  • Introduces benchmarks that can be used to profile Karafka.
  • Introduces a requirement that the end user code needs to be thread-safe.
  • Introduces a Pro subscription with a commercial license to fund further ecosystem development.

Deletions

This section describes things that are no longer part of the Karafka ecosystem.

Karafka 2.0:

  • Removes topics mappers concept completely.
  • Removes pidfiles support.
  • Removes daemonization support.
  • Removes support for using sidekiq-backend due to introduction of multi-threading.
  • Removes the Responders concept in favour of WaterDrop producer usage.
  • Removes completely all the callbacks in favour of finalizer method #shutdown.
  • Removes single message consumption mode in favour of documentation on how to do it easily by yourself.

Changes

This section describes things that were changed in Karafka but are still present.

Karafka 2.0:

  • Uses only instrumentation that comes from Karafka. This applies also to notifications coming natively from librdkafka. They are now piped through Karafka prior to being dispatched.
  • Integrates WaterDrop 2.x tightly with autoconfiguration inheritance and an option to redefine it.
  • Integrates with the karafka-testing gem for RSpec that also has been updated.
  • Updates cli info to reflect the 2.0 details.
  • Stops validating kafka configuration beyond minimum as the rest is handled by librdkafka.
  • No longer uses dry-validation.
  • No longer uses dry-monitor.
  • No longer uses dry-configurable.
  • Lowers general external dependencies three heavily.
  • Renames Karafka::Params::BatchMetadata to Karafka::Messages::BatchMetadata.
  • Renames Karafka::Params::Params to Karafka::Messages::Message.
  • Renames #params_batch in consumers to #messages.
  • Renames Karafka::Params::Metadata to Karafka::Messages::Metadata.
  • Renames Karafka::Fetcher to Karafka::Runner and align notifications key names.
  • Renames StdoutListener to LoggerListener.
  • Reorganizes monitoring and logging to match new concepts.
  • Notifies on fatal worker processing errors.
  • Contains updated install templates for Rails and no-non Rails.
  • Changes how the routing style (0.5) behaves. It now builds a single consumer group instead of one per topic.
  • Introduces changes that will allow me to build full web-UI in the upcoming 2.1.
  • Contains updated example apps.
  • Standardizes error hooks for all error reporting (error.occurred).
  • Changes license to LGPL-3.0.
  • Introduces a karafka-core dependency that contains common code used across the ecosystem.
  • Contains updated wiki on everything I could think of.

What's ahead

Karafka 2.0 is just the beginning.

There are several things in the plan already for 2.1 and beyond, including a web dashboard, at-rest encryption, transactions support, and more.

karafka - v2.0.0.rc5

Published by mensfeld about 2 years ago

  • Improve specs stability
  • Improve forceful shutdown
  • Add support for debug TTIN backtrace printing
  • Fix a case where logger listener would not intercept warn level
  • Require rdkafka >= 0.12
  • Replace statistics decorator with the one from karafka-core
karafka - v2.0.0.rc4

Published by mensfeld about 2 years ago

  • Remove dry-monitor
  • Use karafka-core
karafka - v2.0.0.rc3

Published by mensfeld about 2 years ago

  • Fix Pro partitioner hash function may not utilize all the threads (#907).
  • Improve virtual partitions messages distribution.
  • Add StatsD/DataDog optional monitoring listener + dashboard template.
  • Validate that Pro consumer is always used for Pro subscription.
  • Improve ActiveJob consumer shutdown behaviour.
  • Change default max_wait_time to 1 second.
  • Change default max_messages to 100 (#915).
  • Move logger listener polling reporting level to debug when no messages (#916).
  • Improve stability on aggressive rebalancing (multiple rebalances in a short period).
  • Improve specs stability.
  • Allow using :key and :partition_key for Enhanced Active Job partitioning.
karafka - v2.0.0.rc2

Published by mensfeld over 2 years ago

  • Fix example_consumer.rb.erb #shutdown and #revoked signatures to correct once.
  • Improve the install user experience (print status and created files).
  • Change default max_wait_time from 10s to 5s.
  • Remove direct dependency on dry-configurable in favour of a home-brew.
  • Remove direct dependency on dry-validation in favour of a home-brew.
karafka - v2.0.0.rc1

Published by mensfeld over 2 years ago

  • Extract consumption partitioner out of listener inline code.
  • Introduce virtual partitioner concept for parallel processing of data from a single topic partition.
  • Improve stability when there kafka internal errors occur while polling.
  • Fix a case where we would resume a LRJ partition upon rebalance where we would reclaim the partition while job was still running.
  • Do not revoke pauses for lost partitions. This will allow to un-pause reclaimed partitions when LRJ jobs are done.
  • Fail integrations by default (unless configured otherwise) if any errors occur during Karafka server execution.
karafka - v2.0.0.beta5

Published by mensfeld over 2 years ago

  • Always resume processing of a revoked partition upon assignment.
  • Improve specs stability.
  • Fix a case where revocation job would be executed on partition for which we never did any work.
  • Introduce a jobs group coordinator for easier jobs management.
  • Improve stability of resuming paused partitions that were revoked and re-assigned.
  • Optimize reaction time on partition ownership changes.
  • Fix a bug where despite setting long max wait time, we would return messages prior to it while not reaching the desired max messages count.
  • Add more integration specs related to polling limits.
  • Remove auto-detection of re-assigned partitions upon rebalance as for too fast rebalances it could not be accurate enough. It would also mess up in case of rebalances that would happen right after a #seek was issued for a partition.
  • Optimize the removal of pre-buffered lost partitions data.
  • Always rune #revoked when rebalance with revocation happens.
  • Evict executors upon rebalance, to prevent race-conditions.
  • Align topics names for integration specs.
karafka - v2.0.0.beta4

Published by mensfeld over 2 years ago

  • Rename job internal API methods from #prepare to #before_call and from #teardown to #after_call to abstract away jobs execution from any type of executors and consumers logic
  • Remove ability of running before_consume and after_consume completely. Those should be for internal usage only.
  • Reorganize how Pro consumer and Pro AJ consumers inherit.
  • Require WaterDrop 2.3.1.
  • Add more integration specs for rebalancing and max poll exceeded.
  • Move revoked? state from PRO to regular Karafka.
  • Use return value of mark_as_consumed! and mark_as_consumed as indicator of partition ownership + use it to switch the ownership state.
  • Do not remove rebalance manager upon client reset and recovery. This will allow us to keep the notion of lost partitions, so we can run revocation jobs for blocking jobs that exceeded the max poll interval.
  • Run revocation jobs upon reaching max poll interval for blocking jobs.
  • Early exit poll operation upon partition lost or max poll exceeded event.
  • Always reset consumer instances on timeout exceeded.
  • Wait for Kafka to create all the needed topics before running specs in CI.
karafka - v2.0.0.beta3

Published by mensfeld over 2 years ago

  • Jobs building responsibility extracted out of the listener code base.
  • Fix a case where specs supervisor would try to kill no longer running process (#868)
  • Fix an instable integration spec that could misbehave under load
  • Commit offsets prior to pausing partitions to ensure that the latest offset is always committed
  • Fix a case where consecutive CTRL+C (non-stop) would case an exception during forced shutdown
  • Add missing consumer.prepared.error into LoggerListener
  • Delegate partition resuming from the consumers to listeners threads.
  • Add support for Long Running Jobs (LRJ) for ActiveJob [PRO]
  • Add support for Long Running Jobs for consumers [PRO]
  • Allow active_job_topic to accept a block for extra topic related settings
  • Remove no longer needed logger threads
  • Auto-adapt number of processes for integration specs based on the number of CPUs
  • Introduce an integration spec runner that prints everything to stdout (better for development)
  • Introduce extra integration specs for various ActiveJob usage scenarios
  • Rename consumer method #prepared to #prepare to reflect better its use-case
  • For test and dev raise an error when expired license key is used (never for non dev)
  • Add worker related monitor events (worker.process and worker.processed)
  • Update LoggerListener to include more useful information about processing and polling messages
karafka - v2.0.0.beta2

Published by mensfeld over 2 years ago

  • Abstract away notion of topics groups (until now it was just an array)
  • Optimize how jobs queue is closed. Since we enqueue jobs only from the listeners, we can safely close jobs queue once listeners are done. By extracting this responsibility from listeners, we remove corner cases and race conditions. Note here: for non-blocking jobs we do wait for them to finish while running the poll. This ensures, that for async jobs that are long-living, we do not reach max.poll.interval.
  • Shutdown jobs are executed in workers to align all the jobs behaviours.
  • Shutdown jobs are always blocking.
  • Notion of ListenersBatch was introduced similar to WorkersBatch to abstract this concept.
  • Change default shutdown_timeout to be more than max_wait_time not to cause forced shutdown when no messages are being received from Kafka.
  • Abstract away scheduling of revocation and shutdown jobs for both default and pro schedulers
  • Introduce a second (internal) messages buffer to distinguish between raw messages buffer and karafka messages buffer
  • Move messages and their metadata remap process to the listener thread to allow for their inline usage
  • Change how we wait in the shutdown phase, so shutdown jobs can still use Kafka connection even if they run for a longer period of time. This will prevent us from being kicked out from the group early.
  • Introduce validation that ensures, that shutdown_timeout is more than max_wait_time. This will prevent users from ending up with a config that could lead to frequent forceful shutdowns.
karafka - v2.0.0.beta1

Published by mensfeld over 2 years ago

  • Update the jobs queue blocking engine and allow for non-blocking jobs execution
  • Provide #prepared hook that always runs before the fetching loop is unblocked
  • [Pro] Introduce performance tracker for scheduling optimizer
  • Provide ability to pause (#pause) and resume (#resume) given partitions from the consumers
  • Small integration specs refactoring + specs for pausing scenarios
karafka - v2.0.0.alpha6

Published by mensfeld over 2 years ago

  • Fix a bug, where upon missing boot file and Rails, railtie would fail with a generic exception (#818)
  • Fix an issue with parallel pristine specs colliding with each other during bundle install (#820)
  • Replace consumer.consume with consumer.consumed event to match the behaviour
  • Make sure, that offset committing happens before the consumer.consumed event is propagated
  • Fix for failing when not installed (just a dependency) (#817)
  • Evict messages from partitions that were lost upon rebalancing (#825)
  • Do not run #revoked on partitions that were lost and assigned back upon rebalancing (#825)
  • Remove potential duplicated that could occur upon rebalance with re-assigned partitions (#825)
  • Optimize integration test suite additional consumers shutdown process (#828)
  • Optimize messages eviction and duplicates removal on poll stopped due to lack of messages
  • Add static group membership integration spec
karafka - v2.0.0.alpha5

Published by mensfeld over 2 years ago

karafka - v2.0.0.alpha4

Published by mensfeld over 2 years ago

karafka - v2.0.0.alpha3

Published by mensfeld over 2 years ago

  • Restore 'app.initialized' state and add notification on it
  • Fix the installation flow for Rails and add integration tests for this scenario
  • Add more integration tests covering some edge cases