Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics: E2E latency, service produce/consume availability, offset commit availability and latency, message loss rate, and more.
APACHE-2.0 License
Xinfra Monitor (formerly Kafka Monitor) is a framework to implement and execute long-running Kafka system tests in a real cluster. It complements Kafka's existing system tests by capturing potential bugs or regressions that are only likely to occur after a prolonged period of time or with low probability. Moreover, it allows you to monitor a Kafka cluster using end-to-end pipelines to obtain a number of derived vital stats such as end-to-end latency, produce/consume availability, offset commit availability and latency, and message loss rate.
You can easily deploy Xinfra Monitor to test and monitor your Kafka cluster without requiring any change to your application.
Xinfra Monitor can automatically create the monitor topic with the specified config and increase the partition count of the monitor topic so that the number of partitions is at least the number of brokers. It can also reassign partitions and trigger preferred leader election to ensure that each broker acts as leader of at least one partition of the monitor topic. This allows Xinfra Monitor to detect performance issues on every broker without requiring users to manually manage the partition assignment of the monitor topic.
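The two conditions described above (at least as many partitions as brokers, and every broker leading at least one partition) can be sketched as simple checks. This is only an illustrative sketch, not Xinfra Monitor's actual implementation: the function names and the sample broker/leader data are hypothetical, and in practice the inputs would come from the cluster's metadata.

```python
from typing import Dict, List


def partitions_to_add(partition_count: int, broker_count: int) -> int:
    """Return how many partitions are missing for partition count >= broker count."""
    return max(0, broker_count - partition_count)


def brokers_without_leadership(leaders: Dict[int, int], brokers: List[int]) -> List[int]:
    """Return the brokers that lead no partition of the monitor topic.

    `leaders` maps partition id -> id of the broker currently leading it.
    """
    leading = set(leaders.values())
    return sorted(b for b in brokers if b not in leading)


# Hypothetical snapshot: three brokers, two partitions, both led by broker 1.
brokers = [1, 2, 3]
leaders = {0: 1, 1: 1}

print(partitions_to_add(len(leaders), len(brokers)))  # 1
print(brokers_without_leadership(leaders, brokers))   # [2, 3]
```

A non-zero result from the first check suggests the topic needs more partitions; a non-empty result from the second suggests a reassignment or preferred leader election is needed before every broker is covered.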
Xinfra Monitor is used in conjunction with different middle-layer services such as li-apache-kafka-clients to monitor single clusters, pipeline destination clusters, and other types of clusters, as is done at LinkedIn engineering for real-time cluster health checks.
These are some of the metrics emitted from a Xinfra Monitor instance.
kmf:type=kafka-monitor:offline-runnable-count
kmf.services:type=produce-service,name=*:produce-availability-avg
kmf.services:type=consume-service,name=*:consume-availability-avg
kmf.services:type=produce-service,name=*:records-produced-total
kmf.services:type=consume-service,name=*:records-consumed-total
kmf.services:type=produce-service,name=*:records-produced-rate
kmf.services:type=produce-service,name=*:produce-error-rate
kmf.services:type=consume-service,name=*:consume-error-rate
kmf.services:type=consume-service,name=*:records-lost-total
kmf.services:type=consume-service,name=*:records-lost-rate
kmf.services:type=consume-service,name=*:records-duplicated-total
kmf.services:type=consume-service,name=*:records-delay-ms-avg
kmf.services:type=commit-availability-service,name=*:offsets-committed-avg
kmf.services:type=commit-availability-service,name=*:offsets-committed-total
kmf.services:type=commit-availability-service,name=*:failed-commit-offsets-avg
kmf.services:type=commit-availability-service,name=*:failed-commit-offsets-total
kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-avg
kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-max
kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-99th
kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-999th
kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-9999th
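Each entry in the list above reads as a JMX object name pattern followed by a colon and an attribute name, where name=* stands for any service name. A minimal sketch of splitting such an entry into its parts is shown below; the MetricPath type and parse_metric_path helper are hypothetical names introduced here for illustration and are not part of Xinfra Monitor.

```python
from typing import Dict, NamedTuple


class MetricPath(NamedTuple):
    domain: str                 # e.g. "kmf.services"
    properties: Dict[str, str]  # e.g. {"type": "produce-service", "name": "*"}
    attribute: str              # e.g. "produce-availability-avg"


def parse_metric_path(path: str) -> MetricPath:
    """Split '<domain>:<key=value,...>:<attribute>' into its three parts."""
    # The attribute is everything after the last colon.
    object_name, attribute = path.rsplit(":", 1)
    # The domain is everything before the first colon of the object name.
    domain, _, props = object_name.partition(":")
    properties = dict(p.split("=", 1) for p in props.split(","))
    return MetricPath(domain, properties, attribute)


metric = parse_metric_path(
    "kmf.services:type=produce-service,name=*:produce-availability-avg"
)
print(metric.domain)
print(metric.properties)
print(metric.attribute)
```

The object name part (domain plus properties) is what a JMX client would query, and the attribute is the value read from the matching MBean.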
Xinfra Monitor requires Gradle 2.0 or higher. Java 7 should be used for building in order to support both Java 7 and Java 8 at runtime.
Xinfra Monitor supports Apache Kafka 0.8 to 2.0.
$ git clone https://github.com/linkedin/kafka-monitor.git
$ cd kafka-monitor
$ ./gradlew jar
$ ./bin/xinfra-monitor-start.sh config/xinfra-monitor.properties
Edit config/xinfra-monitor.properties to specify custom configurations for the producer in the key/value map produce.producer.props. Similarly, specify configurations for the consumer. The documentation for the producer and consumer settings that go in these key/value maps can be found in the Apache Kafka wiki.
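As a rough illustration of what such key/value maps might contain, the sketch below merges client settings into a configuration dictionary. Several things here are assumptions rather than facts from this README: that the configuration is a JSON-style mapping from a monitor name to its settings, that the consumer map is keyed consume.consumer.props, and the monitor name "single-cluster-monitor". The helper with_client_props is hypothetical. The client settings shown (security.protocol, sasl.mechanism) are standard Kafka client properties, used here only as sample values.

```python
import copy
import json
from typing import Any, Dict


def with_client_props(
    config: Dict[str, Any],
    monitor_name: str,
    producer_props: Dict[str, str],
    consumer_props: Dict[str, str],
) -> Dict[str, Any]:
    """Return a copy of `config` with client properties merged into one monitor's section."""
    updated = copy.deepcopy(config)
    section = updated.setdefault(monitor_name, {})
    section.setdefault("produce.producer.props", {}).update(producer_props)
    # Assumed key for the consumer map; check the properties file for the real name.
    section.setdefault("consume.consumer.props", {}).update(consumer_props)
    return updated


# Hypothetical starting configuration and sample SASL client settings.
base = {"single-cluster-monitor": {"topic": "test"}}
sasl = {"security.protocol": "SASL_SSL", "sasl.mechanism": "PLAIN"}

print(json.dumps(with_client_props(base, "single-cluster-monitor", sasl, sasl), indent=2))
```

Editing the properties file by hand achieves the same result; the point of the sketch is only to show where producer and consumer settings sit relative to the rest of a monitor's configuration.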
$ ./bin/xinfra-monitor-start.sh config/xinfra-monitor.properties
The metrics produce-availability-avg and consume-availability-avg indicate whether messages can be properly produced to and consumed from this cluster. See the Service Overview wiki for how these metrics are derived.
$ ./bin/single-cluster-monitor.sh --topic test --broker-list localhost:9092 --zookeeper localhost:2181
Edit config/multi-cluster-monitor.properties to specify the right broker and ZooKeeper URLs, as suggested by the comments in the properties file.
The metrics produce-availability-avg and consume-availability-avg indicate whether messages can be properly produced to the source cluster and consumed from the destination cluster. See config/multi-cluster-monitor.properties for the full JMX path of these metrics.
$ ./bin/xinfra-monitor-start.sh config/multi-cluster-monitor.properties
Run checkstyle on the Java code:
$ ./gradlew checkstyleMain checkstyleTest
Generate IDE project files for IntelliJ IDEA or Eclipse:
$ ./gradlew idea
$ ./gradlew eclipse