The open-source observability platform everyone needs!
GPL-3.0 License
Published by Ferroin over 2 years ago
❗ We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notice to check if you are affected.
We are excited to launch one of our flagship machine learning (ML) assisted troubleshooting features in Netdata: the Anomaly Advisor.
Netdata now comes with on-device ML! Unsupervised ML models are trained for every metric, at the edge (on your devices), enabling real time anomaly detection across your infrastructure.
This feature is part of a broader philosophy we have at Netdata when it comes to how we can leverage ML-based solutions to help augment and assist traditional troubleshooting workflows, without having to centralize all your data.
The new Anomalies tab quickly lets you find periods of time with elevated anomaly rates across all of your nodes. Once you highlight a period of interest, Netdata will generate a ranked list of the most anomalous metrics across all nodes in the highlighted timeframe. The goal is to quickly let you find periods of abnormal activity in your infrastructure and bring to your attention the metrics that were most anomalous during that time.
In our latest release, we improved the usability of Anomaly Advisor and also ensured that the anomalous metrics are always relevant to the time period you are investigating.
A great deal of care has gone into ensuring that ML running on your device is as light weight in terms of resource consumption as possible. For instance, metrics that do not have sufficient data for training and metrics that are consistently constant during training periods are considered to be "normal" until their behavior changes significantly to require re-training of the ML models.
To use this feature, enable ML on your agent and then navigate to the "Anomalies" tab in Netdata Cloud. Update netdata.conf with the following information to enable ML on your agent:
[ml]
enabled = yes
Read more about Anomaly Advisor at our blog.
Metric Correlations allow you to quickly find metrics and charts related to a particular window of interest that you want to explore further. Metric correlations compare two adjacent windows to find how they relate to each other, and then score all metrics based on this rating, providing a list of metrics that may have influence or have been influenced by the highlighted one.
Metric Correlations was already available in Netdata Cloud, but now we are releasing a version implemented in the Netdata Agent, which drastically reduces the time required for it to run. This means metric correlations can now run almost instantly (more than 10x faster than before)!
To enable the new metric correlations at the Netdata Agent, set the following in your netdata.conf file:
[global]
enable metric correlations = yes
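To make the idea of comparing two adjacent windows and scoring every metric more concrete, here is a minimal sketch. It assumes a two-sample Kolmogorov-Smirnov statistic as the score, which is an illustrative choice and not a statement of what the Agent actually runs; the function names and the sample data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(baseline, highlight):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two windows (0 = identical, 1 = fully separated)."""
    a, b = sorted(baseline), sorted(highlight)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_metrics(windows):
    """windows maps a metric name to (baseline_values, highlight_values).
    Returns the metric names ordered from most to least changed."""
    scores = {name: ks_statistic(base, high) for name, (base, high) in windows.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical data: the CPU metric shifts in the highlighted window, RAM does not.
    sample = {
        "system.cpu": ([10, 11, 12, 11, 10], [50, 55, 60, 58, 52]),
        "system.ram": ([40, 41, 40, 42, 41], [41, 40, 42, 40, 41]),
    }
    print(rank_metrics(sample))
```

A metric whose distribution barely moves between the two windows scores close to 0 and sinks to the bottom of the list, which is the behavior the ranked list is meant to provide.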
On very busy Kubernetes clusters, where hundreds of containers are spawned and destroyed all the time, Netdata was consuming a lot of resources, was slow to detect changes, and under certain conditions missed some containers.
Now, Netdata:
Netdata is also capable of detecting the network interfaces that have been allocated to containers, by spawning a process that switches network namespace and identifies virtual interfaces that belong to each container. This process is improved drastically, now requiring 1/3 of the CPU resources it needed before.
Additionally, Netdata cgroups.plugin now collects CPU shares for Kubernetes containers, allowing the visualization of the Kubernetes CPU Requests (Kubernetes writes in cgroup CPU Shares the CPU Requests that have been configured for the containers).
A new option has been added in the netdata.conf [plugin:cgroup] section, to allow filtering containers by (resolved) name. It matches the name of the cgroup (as you see it on the dashboard).
We have also released a blog post and a video about CPU Throttling in Kubernetes. You will be amazed by our findings. Read the blog and watch the video about Kubernetes CPU throttling.
Netdata Cloud dashboards are now a lot faster in aggregating data from multiple agents, as the protocol between agents and the Cloud is approaching its final shape.
Netdata Cloud has a new look and feel for charts, which resembles the look and feel for coding IDEs:
The new home tab for war rooms allows you to quickly inspect the most important metrics for every war room, like number of nodes, metrics, retention, replication, alerts, users, custom dashboards, etc.
Time units in charts now auto-scale from microseconds to days, based on the value of time to be shown.
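As a rough illustration of this kind of auto-scaling, the sketch below picks the largest unit that keeps the displayed value at 1 or above. The unit list, the threshold rule, and the function name are assumptions made for the example, not the dashboard's actual logic.

```python
# Unit sizes expressed in microseconds, largest first.
UNITS = [
    ("days", 86_400_000_000),
    ("hours", 3_600_000_000),
    ("minutes", 60_000_000),
    ("seconds", 1_000_000),
    ("milliseconds", 1_000),
    ("microseconds", 1),
]


def scale_duration(microseconds):
    """Return (value, unit) using the largest unit that keeps |value| >= 1."""
    for name, size in UNITS:
        if abs(microseconds) >= size:
            return microseconds / size, name
    # Values below one microsecond (including zero) stay in the smallest unit.
    return float(microseconds), "microseconds"


if __name__ == "__main__":
    for sample in (250, 2_000, 90_000_000, 172_800_000_000):
        print(sample, "->", scale_duration(sample))
```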
The Cloud now sets a timeout on every query it sends to the agents, and the agents now respect this timeout. Previously, the Cloud was timing out because of a slow query, but the agents remained busy executing that query, which had a waterfall effect on the agent load.
Custom dashboards on Netdata Cloud can now be renamed.
We have added a new Alert Configs sub tab which lists all the alerts configured on all the nodes belonging to the war room. You can now list the alerts configured at the war room, node, and alert instance levels.
There have been a number of corner cases under which alerts could remain raised on Netdata cloud. We identified all such cases, and now Netdata Cloud is always in sync with Netdata agents about their alerts.
Netdata now identifies the Cloud provider node type it runs on. It works for GCP and AWS, and exposes this information at the Nodes tab, the single node dashboard, and the node inspector.
We improved the virtualization detection in cases where systemd is not available. Now Netdata can properly detect virtualization even in these cases.
The new Netdata Cloud now supports a global filter on nodes of war rooms. The new filter is applied on every tab for each room, allowing users to quickly switch between tabs while retaining the nodes filtered.
Netdata admin users now have the ability to remove obsolete nodes from a space. Many users have been eagerly waiting for this feature, and we thank you for your patience. We hope you will be happy to use the feature and have cleaner spaces and war rooms. A few notes to be considered:
Every Netdata Agent is a StatsD server, listening on localhost port 8125, both TCP and UDP. You can use the Netdata StatsD server to quickly visualize metrics from scripts, cron jobs, and local applications.
In this release, the Netdata StatsD server has been improved to use Judy arrays for indexing the collected metrics, drastically improving its performance.
At the same time, we extended the StatsD protocol to support dictionaries. Dictionaries are similar to sets, but instead of reporting only the number of unique entries in the set, dictionaries create a counter for each of the values and report the number of occurrences for each unique event. So, to quickly get a breakdown of events, you can push them to StatsD like myapp.metric:EVENT|d. StatsD will create a chart for myapp.metric, and for each unique EVENT it will create a dimension with the number of times this event was encountered.
We also added the ability to change the units of the chart and the family of the chart, using StatsD tags, like this: myapp.metric:EVENT|d|#units=events/s.
Finally, StatsD now automatically creates a dashboard section for every StatsD application name. Following StatsD best practices, these application names are considered to be the first keyword of collected metrics. For example, by pushing the metric myapp.metric:1|c, StatsD will create the dashboard section "StatsD myapp".
Read more at the Netdata StatsD documentation. A real-life example of using Netdata StatsD from a shell script, pushing metrics in real time to a local Netdata Agent, is available at this stress-with-curl.sh gist.
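The dictionary metrics described above can be pushed from any language that can open a UDP socket. The following is a minimal sketch, assuming a local agent listening on UDP port 8125; the helper names and the metric name myapp.sessions are hypothetical, and joining several tags with commas is an assumption borrowed from common StatsD tag conventions.

```python
import socket


def statsd_line(name, value, metric_type, tags=None):
    """Build one StatsD line such as 'myapp.metric:EVENT|d|#units=events/s'."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Assumption: multiple tags are comma-separated after the '#'.
        line += "|#" + ",".join(f"{key}={val}" for key, val in tags.items())
    return line


def send_statsd(line, host="localhost", port=8125):
    """Send a single line over UDP. UDP is fire-and-forget, so nothing is
    reported back if no server is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Three events, two unique values: the chart would get two dimensions.
    for event in ("login", "logout", "login"):
        send_statsd(statsd_line("myapp.sessions", event, "d", {"units": "events/s"}))
```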
Netdata dashboards refresh all visible charts in parallel, utilizing all the resources the web browsers provide to quickly present the required charts. Since Netdata only stores metric data at the agents, all these queries are executed in parallel at the agents.
This parallelism of queries is even more intense when metrics replication/streaming is configured. In these cases, parent Netdata agents centralize metric data from many agents, and, since Netdata Cloud prefers the more distant parents for queries, they receive quite a few queries in parallel for all their children.
We also reworked many parts of the query engine of Netdata agents to achieve top performance in parallel queries. Now, Netdata agents are able to perform queries at a rate of more than 30 million points per second, per core on modern hardware. On a parent Netdata agent with a 24-core CPU we observed a sustained rate of 1.3 billion points per second! This is 3 times faster compared to the previous release.
To achieve these performance improvements, we worked in these areas:
When querying metric data, a lot of memory allocations need to happen. Although Netdata agents automatically adapt their memory requirements for data collection, avoiding memory operations while iterating to collect data, this is unfortunately not feasible on the query engine side.
To make the agent more efficient for queries, the number of system calls allocating memory had to be drastically decreased. So, we developed a One Way Allocator (OWA), a system that works like a scratchpad for memory allocations. When the query starts, we now predict the amount of memory needed to execute the query. The query engine still does all the individual allocations, but all these are now made against the scratchpad, not against the system. OWA is smart enough to increase the size of the scratchpad if needed during querying. And it frees all memory at once without the need for individual memory releases.
For huge data queries, the benefit is astonishing. For certain heavy data queries, 45000 memory allocations before are down to 20 with this release! This doubled the performance of the query engine.
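The scratchpad idea can be illustrated with a toy model. The sketch below is not the Agent's implementation (which is written in C); it is a hypothetical Python class that only demonstrates the three properties described above: one up-front reservation based on a prediction, growth when the prediction is too small, and a single release at the end.

```python
class OneWayAllocator:
    """Toy scratchpad: allocations only move forward and are released together."""

    def __init__(self, predicted_size):
        self.pages = [bytearray(predicted_size)]  # one request to the "system"
        self.used = 0                             # bytes used in the newest page
        self.system_allocations = 1

    def alloc(self, size):
        """Hand out `size` bytes from the scratchpad, growing it if needed."""
        page = self.pages[-1]
        if self.used + size > len(page):
            # The prediction was too small: add a page at least as big as needed.
            page = bytearray(max(size, len(page)))
            self.pages.append(page)
            self.used = 0
            self.system_allocations += 1
        start = self.used
        self.used += size
        return memoryview(page)[start:start + size]

    def free_all(self):
        """Drop every page at once; no per-allocation release is needed.
        The allocator is not meant to be reused after this call."""
        self.pages.clear()
        self.used = 0


if __name__ == "__main__":
    owa = OneWayAllocator(predicted_size=1024)
    for _ in range(100):
        owa.alloc(8)  # 100 small allocations, all served from one page
    print("system allocations:", owa.system_allocations)
    owa.free_all()
```

In this model, 100 individual allocations cost a single request to the system, which mirrors the reduction in system-level allocations reported above.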
To optimize its memory footprint for metric data, Netdata agents store collected metric data into a fixed step database (after interpolation) with a custom floating point number format we developed (we call it storage_number), requiring just 4 bytes per data collection point, including the timestamp. When on disk, mainly due to compression, Netdata's dbengine needs just 0.34 bytes per point (including all metadata), which is probably the best among all monitoring solutions available today, allowing Netdata to massively store and manage metric data at a very high rate.
This means however, that in order to actually use a point in a query, we have to unpack it. This unpacking happens point-by-point even for data cached in memory. 1 billion points in a data query, 1 billion numbers unpacked.
In this release we analyzed the CPU cache efficiency of the number unpacking and we refactored it to make the best use of available CPU caches to finally increase its performance by 30%.
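To put the per-point figures quoted above (4 bytes in memory, 0.34 bytes on disk) into perspective, here is a small worked calculation. The workload of 2,000 metrics collected every second for one day is a made-up example, and the function name is hypothetical.

```python
BYTES_PER_POINT_IN_MEMORY = 4    # storage_number size quoted in the text
BYTES_PER_POINT_ON_DISK = 0.34   # dbengine average quoted in the text


def retention_bytes(metrics, seconds, bytes_per_point, collect_every=1):
    """Bytes needed to keep `seconds` of history for `metrics` metrics."""
    points = metrics * (seconds // collect_every)
    return points * bytes_per_point


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    in_memory = retention_bytes(2000, one_day, BYTES_PER_POINT_IN_MEMORY)
    on_disk = retention_bytes(2000, one_day, BYTES_PER_POINT_ON_DISK)
    print(f"in memory: {in_memory / 1e6:.1f} MB, on disk: {on_disk / 1e6:.1f} MB")
```

For this hypothetical node, a day of per-second data is 172.8 million points: roughly 691 MB uncompressed at 4 bytes per point, or roughly 59 MB on disk at 0.34 bytes per point.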
This release includes a better algorithm to pick the available parent to stream metrics to. The previous version was always reconnecting to the first available parent. Now it rotates them, one by one and then restarts.
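The rotation described here can be sketched as a simple round-robin over the configured parents. This is an illustration only; the class name and the parent addresses are hypothetical, and the real streaming code also has to deal with timeouts and connection state.

```python
class ParentRotation:
    """Remember which parent was tried last so the next attempt moves on."""

    def __init__(self, parents):
        self.parents = list(parents)
        self.next_index = 0

    def pick(self):
        """Return the next parent to try, wrapping around at the end of the list."""
        parent = self.parents[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.parents)
        return parent


if __name__ == "__main__":
    rotation = ParentRotation(["parent-a:19999", "parent-b:19999", "parent-c:19999"])
    # Five reconnections visit a, b, c and then start over with a, b.
    print([rotation.pick() for _ in range(5)])
```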
An issue was fixed regarding parents with stale alerts from disconnected children. Now, the parent validates all alerts on every child re-connection.
Netdata parents now have a timeout to cleanup dead/abandoned children connections automatically.
We also worked to eliminate most of the bottlenecks when multiple children connect to the same parent. But this is still under testing, so it will make it in the next release.
Netdata uses many workers to execute several of its features. There are web workers, aclk workers, dbengine workers, health monitoring workers, libuv workers, and many more.
We managed to identify a lot of deadlocks that slowed down the whole operation. We also increased the number of workers to deliver more capacity on busy parents.
There is a new section for monitoring Netdata workers in the "Netdata Monitoring" section of the dashboard. Building on this work, we are still working to make the workers even more efficient.
The last release was hindered by rare deadlocks on very busy parents. These deadlocks are now gone, improving the agent's ability to centralize data from many children.
Judy arrays are probably the fastest and most CPU cache-friendly indexes available. Netdata already uses them for
dbengine and its page cache. Now all Netdata dictionaries are using them too, giving a performance boost to all
dictionary operations, including StatsD.
Initialization of /proc collectors was suboptimal, because they had to go through a slow process of adapting their read buffers. We added a forward-looking algorithm to optimize this initialization, which now happens in 1/10th of the time.
Some users have experienced gaps in /proc plugin charts. We identified that these gaps were triggered by the netdev module, which was causing the whole plugin to slow down and miss data collection iterations.
Now the netdev module of the /proc plugin runs on its own thread to avoid this influencing the rest of the /proc modules.
The internal web server of Netdata now spreads the work among its worker threads more evenly, utilizing as much of the
parallelism that is available to it.
netdata.conf re-organized
We re-organized the [global] section of netdata.conf, so that it is more meaningful for new users. The new configurations are backward compatible. So, after you restart netdata with your old netdata.conf, grab the new one from http://localhost:19999/netdata.conf to have the new format.
We now have our own MQTT implementation within our ACLK protocol that will eventually replace the current MQTT-C client
for several reasons, including the following:
Currently, it’s provided as a tech preview, and it’s disabled by default. Feel free to have some fun with the new implementation. This is how to enable it in netdata.conf:
[cloud]
mqtt5 = yes
tailscaled to apps_groups.conf.
net, aws, and ha groups in apps_groups.conf.
caddy to apps_groups.conf.
⚙️ Enhancing our collectors to collect all the data you need.
🐞 Improving our collectors one bug fix at a time.
📄 Keeping our documentation healthy together with our awesome community.
📦 "Handle with care" - Just like handling physical packages, we put in a lot of care and effort to publish beautiful
software packages.
⚙️ Greasing the gears to smoothen your experience with Netdata.
🐞 Increasing Netdata's reliability one bug fix at a time.
🏋️ Changes to keep our code base in good shape.
history with relevant dbengine params (#13041, @andrewm4894)
--auto-update option when using static/build install method (#12725, @ilyam8)
The following items will be removed in our next minor release (v1.36.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/chrony | collector | go.d/chrony |
python.d/ovpn_status_log | collector | go.d/openvpn_status_log |
All the deprecated components will be moved to the netdata/community repository.
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by |
---|---|---|
node.d | plugin | - |
node.d/snmp | collector | go.d/snmp |
python.d/apache | collector | go.d/apache |
python.d/couchdb | collector | go.d/couchdb |
python.d/dns_query_time | collector | go.d/dnsquery |
python.d/dnsdist | collector | go.d/dnsdist |
python.d/elasticsearch | collector | go.d/elasticsearch |
python.d/energid | collector | go.d/energid |
python.d/freeradius | collector | go.d/freeradius |
python.d/httpcheck | collector | go.d/httpcheck |
python.d/isc_dhcpd | collector | go.d/isc_dhcpd |
python.d/mysql | collector | go.d/mysql |
python.d/nginx | collector | go.d/nginx |
python.d/phpfpm | collector | go.d/phpfpm |
python.d/portcheck | collector | go.d/portcheck |
python.d/powerdns | collector | go.d/powerdns |
python.d/redis | collector | go.d/redis |
python.d/web_log | collector | go.d/weblog |
This release adds official support for the following platforms:
This release removes official support for the following platforms:
This release includes the following additional platform support changes.
Join the Netdata team on the 9th of June at 5pm UTC for the Netdata Agent Release Meetup, which will be held on
the Netdata Discord.
Together we’ll cover:
RSVP now - we look forward to
meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by Ferroin almost 3 years ago
The newest version of Netdata, v.1.32.0, propels us toward the end of the year, and the Netdata community is positioned to grow stronger than ever in 2022. Before we get into specifics of the new release, it's worth reflecting on that growth.
The open-source Netdata Agent, the best OSS node monitoring and troubleshooting ever, currently has:
The Netdata Cloud, our infrastructure-level, distributed, real-time monitoring and troubleshooting orchestrator, is also showing similar growth, with:
We are not just pleased with this amazing adoption rate, we are inspired by it. It is you users who give us the energy and confidence to move forward into a new era of high-fidelity, real-time monitoring and troubleshooting, made accessible to everyone!
Thank you for the inspiration! You rock!
As many of you know, even though we are not endorsed by CNCF, Netdata is the fourth most starred project in the CNCF landscape. We want to thank you for this expression of your appreciation. If you love Netdata and haven't yet, consider giving us a GitHub star.
Additionally, we invite you to join us on our new Discord server to continue our growth and trajectory, but also to join in on fun and informative live conversations with our wonderful community.
The following offers a high-level overview of some of the key changes made in this release, with more detailed description available in subsequent sections.
New Cloud backend and Agent communication protocol
This Agent release supports our new Cloud backend. From here, we will be offering much faster and simpler communication, reliable alerts and exchange of metadata, and first-time support for the parent-child relationship of Netdata agents. This is the first Agent release that allows Netdata Cloud to use the Netdata Agent as a distributed time-series database that supports replication and query routing, for every metric!
eBPF latency monitoring, container monitoring, and more
We use eBPF to monitor all running processes, without the cooperation of the processes and without sniffing data traffic. This new release includes 13 new eBPF monitoring features, including I/O latency, BTRFS, EXT4, NFS, XFS and ZFS latencies, IRQs latencies, extended swap monitoring, and more.
Machine learning (ML) powered anomaly detection
This release links Netdata Agent with dlib, the popular C++ machine learning algorithms library, which we use to automatically detect anomalies out-of-the-box, at the edge! Once enabled, Netdata trains an ML model for every metric, which is then used to detect outliers in real-time. The resulting "anomaly bit" (where 0=normal, 1=anomalous) associated with each database entry is stored alongside the raw metric value with zero additional storage overhead! This feature is still in development, so it is disabled by default. If you would like to test it and provide feedback, you can enable the feature using the instructions provided in the Detailed release highlights section.
New timezone selector and time controls in the user interface
We implemented a new timezone picker and time controls to enhance administrative abilities in the dashboard.
Docker image POWER8+ support
Netdata Docker images now support recent IBM Power Systems, Raptor Talos II, and more.
And more...
Four new collectors, 112 total improvements, 95 bug fixes, 49 documentation updates, and 57 packaging and installation changes!
It's no secret that the best of Netdata Cloud is yet to come. After several months of developing, testing, and benchmarking a new architectural system, we have steadied ourselves for that growth. These changes should offer notable and immediate improvements in reliability and stability, but more importantly, they allow us to quickly and efficiently develop new features and enhanced functionality. Here's what you can look for on the short-term horizon, thanks to our new architecture:
If you would like to be among the first to test this new architecture and provide feedback, first make sure that you have installed the latest Netdata version following our guide. Then, follow our instructions for enabling the new architecture.
We did a lot of work to enhance our eBPF container monitoring this release. First, we start with the development of full eBPF support for cgroups. As a refresher on just how important this update is: cgroups together with Namespaces are the building blocks for containers, which is the dominant way of distributing monitoring applications. We use cgroups to control how much of a given key resource (CPU, memory, network, and disk I/O) can be accessed or used by a process or set of processes. Our eBPF collector now creates charts for each cgroup, which enables us to understand how a specific cgroup interacts with the Linux kernel! 🤓
This enhances our already extensive monitoring by including cgroups for mem, process, network, file access, and more.
By enabling eBPF monitoring on all systems that support it, Netdata has already been established as a world-leading distributor of eBPF! We use eBPF to monitor all running processes, without the cooperation of the processes, by tracking any way the application interfaces with the system. And in this release, we continue our commitment to further improve eBPF by tracking latencies by disks, IRQs, etc.
Our new eBPF latency features include:
eBPF is a very strong addition to our monitoring tools, and we are committed to providing the best eBPF monitoring experience, from a distance and without disrupting the data flow!
But we didn't stop there with eBPF in v1.32.0. We also provided the following updates:
If you share our interest in eBPF monitoring, or have questions or requests, feel free to drop by our Community forum to start a discussion with us.
Machine learning (ML) is undeniably a wave of the future in monitoring and troubleshooting. The Netdata community is riding that wave forward together, ahead of everyone else. Netdata v1.32.0 introduces some foundational capabilities for ML-driven anomaly detection in the agent. We have integrated the popular dlib C++ ML library to power unsupervised anomaly detection out-of-the-box.
While this functionality is still under development and subject to change, we want to develop this with you, as a team. The functionality is disabled by default while we dogfood the feature internally and build additional ML-leveraging features into Netdata Cloud. But you can go to the new [ml] section in netdata.conf and set enabled=yes to turn on anomaly detection. After restarting Netdata, you should see the Anomaly Detection menu with charts highlighting the overall number and percent of anomalous metrics on your node. This can be a very useful single number summary of the state of your node.
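Given the anomaly bit described earlier (0=normal, 1=anomalous), the "number and percent of anomalous metrics" summary is simple arithmetic. The sketch below shows that arithmetic on made-up data; the function names and chart names are hypothetical, and it does not query a real agent.

```python
def anomaly_rate(anomaly_bits):
    """Percent of samples in a window whose anomaly bit is set."""
    if not anomaly_bits:
        return 0.0
    return 100.0 * sum(anomaly_bits) / len(anomaly_bits)


def node_anomaly_summary(latest_bit_by_metric):
    """Return (count, percent) of metrics that are anomalous right now."""
    if not latest_bit_by_metric:
        return 0, 0.0
    count = sum(latest_bit_by_metric.values())
    return count, 100.0 * count / len(latest_bit_by_metric)


if __name__ == "__main__":
    # One metric over four samples, then four metrics at a single point in time.
    print(anomaly_rate([0, 0, 1, 0]))
    print(node_anomaly_summary({"cpu.user": 1, "cpu.system": 0, "ram.used": 0, "net.eth0": 1}))
```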
Share your feedback by emailing us at [email protected] or just come hang out in the 🤖-ml-powered-monitoring channel of our Discord, where we discuss all things ML and more!
And then, be on the lookout for some bigger announcements and launches relating to ML over the next couple of months.
Collaborating in a remote world across regions can be difficult, so we wanted to make it easier for you to sync with your administrative teams and your system information. Our new timezone selector allows you to select a timezone to accommodate collaboration needs within your teams and infrastructure. Additionally, we have added the following time controls to allow you to distinguish if the content you are looking at is live or historical and to refresh the content of the page when the tabs are in the background:
And on top of all of that, we have added 64-bit little-endian POWER8+ support to our official Docker images, allowing the use of Netdata Docker images on recent IBM Power Systems, Raptor Talos II, and similar POWER based hardware, extending the list of what is currently supported for our Docker images, which includes:
reset_netdata_trace.sh from netdata.service (#11517, @ilyam8)
.install-type before it is created (#11262, @ilyam8)
install_type detection during update (#11199, @ilyam8)
-W buildinfo output. (#11634, @Ferroin)
An upcoming stable release of the Netdata agent will include a maintainability update to our base Docker image.
A small percentage of users will find that all self-compiled packages must be manually rebuilt after the update, even if relocation/SONAME errors are not encountered. --security-opt=seccomp=unconfined can be passed with no default.json, but this introduces security vulnerabilities between the host and malicious code in the container.
Alternatively, users can prepare for the update by upgrading to one of the following:
While Netdata previously avoided making this update to minimize inconvenience to our users, we are now facing a third-party end-of-life date, and we believe the minimal number of affected users substantiates the need for the change.
Additionally, in a future stable release, we will be removing our legacy agent-to-cloud connection. Most users should see no change in this upgrade, but we will lose SOCKS 5 proxy support for the Netdata Cloud functionality, which will affect a small number of users.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata agent, feel free to contact us by one of the following channels:
Published by firehol-automation over 6 years ago
Posted on twitter, facebook, reddit r/linux,
Hi all,
Another great netdata release: netdata v1.10.0 !
This is a birthday release: netdata is now 2 years old !
Many thanks to all the contributors that help building, enhancing and improving a project useful and helpful for thousands of admins, devops and developers around the world! You rock!
- @ktsaou
netdata now has a new web server (called static) with a fixed number of threads, providing a lot better performance and finer control of the resources allocated to it.
All dashboard elements (javascript) have been updated to their latest versions - this allows a smoother experience when embedding netdata charts on third party web sites and apps.
IMPORTANT: all users using older netdata are advised to update to this version. This version offers improved stability, security and a huge number of bug fixes, compared to any prior version of netdata.
And as always, hundreds more enhancements, improvements and bugfixes.
BTRFS space usage monitoring and related alarms.
netdata is able to detect if any of the space-related components (physical disk allocation, data, metadata and system) of BTRFS is about to become exhausted!
#3150 - thanks to @Ferroin for explaining everything about btrfs...
netdata now monitors bcache metrics - they are automatically added to any disk that is found to be a bcache disk.
New plugin to monitor ceph, the unified, distributed storage system designed for excellent performance, reliability and scalability (#3166 @lets00).
systemd-nspawn containers.
virsh is now called with -r to avoid prompting for password #3144
cgroup-network is now a lot more strict, preventing unauthorized privilege escalation #3269
cgroup-network now searches for container processes in sub-cgroups too - this improves the mapping of network interfaces to containers
cgroup-network now works even when there are no veth interfaces in the system
netdata can now monitor isc-ntpd. @rda0 did a marvelous job decoding NTP Control Message Protocol, collecting ntpd metrics in the most efficient way #3421, #3454 @rda0
btw, netdata also monitors chrony but the chrony module of netdata is disabled by default, because certain CentOS versions ship a version of chrony that consumes 100% cpu when queried for statistics.
Added python plugin to monitor the operation of nginx plus servers. The plugin monitors everything about nginx+, except streaming #3312 @l2isbad
netdata now monitors libreswan tunnels - #3204
netdata now has an httpcheck plugin (module of python.d.plugin), that can query remote http/https servers, track the response timings and check that the response body contains certain text #3448 @ccremer.
netdata now has a portcheck plugin (module of python.d.plugin), that can check whether any remote TCP port is open #3447 @ccremer
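The core of a TCP port check is a single connection attempt with a timeout. The following is a minimal sketch of that idea using only the Python standard library; it is not the code of the portcheck module, and the function name, host, and port are hypothetical.

```python
import socket
import time


def check_port(host, port, timeout=1.0):
    """Return (is_open, seconds_taken) for one TCP connection attempt."""
    started = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            is_open = True
    except OSError:
        is_open = False
    return is_open, time.monotonic() - started


if __name__ == "__main__":
    is_open, elapsed = check_port("127.0.0.1", 19999)
    print(f"open={is_open} after {elapsed * 1000:.1f} ms")
```

Returning the elapsed time alongside the state makes it possible to chart connection latency as well as availability.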
netdata now monitors icecast servers #3511 @l2isbad.
netdata now monitors traefik reverse proxies - #3557.
netdata can now monitor java spring-boot applications @Wing924
netdata now monitors dnsdist name servers - @nobody-nobody #3009
hidden to add the dimension, but make it hidden on the dashboard - a hidden dimension can participate in various calculations, including alarms).
zinit to allow them to get initialized without altering their values (this is useful if you have rare metrics that you need to initialize when netdata starts).
Several new charts have been added to monitor (#3400 by @anayrat):
Also, the postgres plugin now also works when postgres is in recovery mode.
netdata prior to this version was detecting the user and group of processes by examining the ownership of /proc/PID/stat. Unfortunately it seems that the ownership of files in /proc does not change when the process switches user. So, netdata could not detect the user and group of processes that started as root and then switched to another user.
Now netdata reads /proc/PID/status:
/proc/PID/statm (all the information of /proc/PID/statm is available in /proc/PID/status)
VmSwap, so a new chart has been added to monitor the swap memory usage per process, user and group.
The new plugin is 20% more expensive in terms of CPU. We tried hard to optimize it, but this is as good as it can get. Read about it at #3434 and #3436
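To show what reading /proc/PID/status provides, here is a small parsing sketch. It assumes the standard Linux layout of that file, where the Uid and Gid lines list the real, effective, saved and filesystem ids and VmSwap is reported in kB; picking the effective id is an assumption made for the example, and the function names are hypothetical.

```python
def parse_status(text):
    """Extract uid, gid and swap usage from the contents of /proc/PID/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # Uid/Gid hold four ids: real, effective, saved set, filesystem.
    uid = int(fields["Uid"].split()[1])
    gid = int(fields["Gid"].split()[1])
    # VmSwap is missing for kernel threads, so default to zero.
    swap_kb = int(fields.get("VmSwap", "0 kB").split()[0])
    return {"uid": uid, "gid": gid, "swap_kb": swap_kb}


def read_process(pid):
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status(handle.read())


if __name__ == "__main__":
    # A process that started as root (uid 0) and then switched to uid 33.
    sample = "Name:\tnginx\nUid:\t0\t33\t33\t33\nGid:\t0\t33\t33\t33\nVmSwap:\t     128 kB\n"
    print(parse_status(sample))
```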
Added charts:
@ktarasz
netdata now uses /proc/uptime when CLOCK_BOOTTIME does not report the same uptime. In containers CLOCK_BOOTTIME reports the uptime of the host, while /proc/uptime reports the uptime of the container, so now netdata correctly reports the uptime of the container.
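The comparison between the two uptime sources can be sketched as follows. This is an illustration for Linux only, not netdata's code; the function names and the one-second tolerance used to decide that the two sources disagree are assumptions.

```python
import time


def parse_proc_uptime(text):
    """The first field of /proc/uptime is the uptime in seconds."""
    return float(text.split()[0])


def pick_uptime(boottime_seconds, proc_uptime_seconds, tolerance=1.0):
    """Prefer /proc/uptime when the two sources disagree, as happens in
    containers where CLOCK_BOOTTIME reflects the host."""
    if abs(boottime_seconds - proc_uptime_seconds) > tolerance:
        return proc_uptime_seconds
    return boottime_seconds


if __name__ == "__main__":
    # Both calls below are Linux-specific.
    boottime = time.clock_gettime(time.CLOCK_BOOTTIME)
    with open("/proc/uptime") as handle:
        proc_uptime = parse_proc_uptime(handle.read())
    print(pick_uptime(boottime, proc_uptime))
```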
various fixes to better monitor rebuild time and rate @l2isbad
to_scan dimension
Added several charts for translog / indices segments statistics and JVM buffer pool utilization, which are often helpful when evaluating an elasticsearch node health #3544 @NeonSludge
netdata now supports monitoring multiple APC UPSes.
netdata now also supports monitoring IPv6 leases - @l2isbad
solar_consumption @ccremer
Added web server response timings histogram #3558 @Wing924.
/etc/netdata/python.d.conf is missing @l2isbad
charts.d.plugin BASH modules can now have custom number of retries in case of data collection failures #3524.
static web server. This web server allows netdata to work around memory fragmentation (since the threads are fixed, the underlying memory allocators reuse the same memory arenas) and cpu utilization (we can control the number of threads that will be used by netdata). This is the default now. #3248
the print button now respects the URL path netdata is hosted.
dygraphs updated to the latest version - this fixes an issue that prevented netdata charts from being interactive under certain conditions
added dygraph theme logscale #3283
fontawesome updated to version 5
d3 updated to the latest version (this broke c3 charts that require an older version)
added d3pie charts
custom dashboards can now have alarms for specific roles (all, none, one or more).
allow stacked charts to zoom vertically when dimensions are selected
netdata now has a global XSS protection #3363
netdata now uses intersectionObserver when available #3280 - this improves the scrolling performance of the dashboard.
prevent date, time and units from wrapping at the charts legends #3286
various units scaling improvements #3285
added data-common-colors="NAME" chart option for custom dashboards #3282.
added wiki page for creating custom dashboards on Atlassian's Confluence.
prevented a double click on the charts' toolbox to select the text of the buttons.
fixed the alignment of dashboard icons #3224 @xPaw
added a simple js, called refresh-badges.js, to update badges on a custom web page
netdata badges can now be scaled #3474
gtime parameter, for group time. This is used to request netdata to return values at a different rate (i.e. gtime=60 on an X/sec dimension will return X/min).
dimensions= parameter now supports simple patterns #3170 and added option values match-ids and match-names to control which matches are executed for dimensions.
system.swap alarms now send notifications with a 30 seconds delay, to work around a kernel bug that incorrectly reports all swap as instantly used under containers #3380.
added alarm to predict the time a mount point will run out of inodes #3566.
all system alarms are now ported to FreeBSD too #3337 @arch273
added alerta.io notifications @kattunga
added available memory alarm
removed unsupported html tags from hipchat notifications.
pagerduty notifications have been modified to avoid incident duplication #3549.
alarm definitions can now use both chart IDs and chart names (prior to this version only chart IDs were allowed).
curl options (eg for disabling SSL certificates verification) for alarm-notify.sh can now be defined in health_alarm_notify.conf.
netdata can now send notifications to IRC channels #3458 @manosf
IRCCloud web client:
Irssi terminal client:
send hosts matching = * pattern.
EALREADY or EINPROGRESS.
host tags (the tags have to be formatted in a json friendly way) #3556.
timestamps=yes|no to /api/v1/allmetrics to support prometheus Pushgateway #3533
netdata_info variable with the version of netdata
netdata_host_tags to netdata_host_tags_info (the old exists but is deprecated and will be removed eventually)
average metrics, netdata remembers the last access time the prometheus collected metrics, on a per host basis.
stream.conf option multiple connections = accept | deny to allow or deny multiple connections for the same netdata host. The default remains accept, but it is likely to be changed to no on future versions.
netdata-updater was growing the PATH variable on each of its runs - fixed it.
--accept and --dont-start-it command line options to kickstart-static64.sh
long double support (useful in embedded devices that don't support long double numbers) #3354
netdata.spec to allow building netdata on older and newer rpm based distros. Also added a script to build a netdata rpm
curl provided with this path.
gap when lost iterations to control the number of iterations that should be lost to show a gap on the charts.
idle process scheduling priority, even when it was configured to do otherwise. Fixed it #3523
snapshots
We can now save and load dashboard snapshots for any timeframe in any resolution. snapshots allow us to save artifacts, evidence, documentation of incidents, or just the raw data for postmortem analysis.
highlighted time-frame
We can now highlight a selected time-frame on all dashboard charts. So, to quickly compare charts press ALT or CONTROL and select an area on one chart. The same area will be highlighted on all charts.
export to PDF
We can now export netdata dashboards to PDF, for any timeframe with any detail.
access lists (IP filtering)
We can now set up IP filtering at netdata.conf for all functions of netdata (dashboard access, streaming, registry, badges, etc - no more iptables rules for protecting netdata).
TCP overflows and connection drops
netdata can now detect TCP listening sockets overflows and connection drops, for any server running on the host (even the ones netdata is not aware of).
libvirt VMs
netdata now detects libvirt network interfaces and moves them to the VM section of the dashboard (it also supports .libvirt-qemu naming of cgroups).
Units auto-scaling
netdata dashboards can now scale units (KB -> MB -> GB -> TB, etc), on the fly.
Units conversions
netdata dashboards can now convert units (eg. Celsius to Fahrenheit, seconds to DDd:HH:MM:SS, etc), on the fly.
Multiple Timezones
netdata dashboards can now change timezone on the fly (yes, we can now compare charts with server logs).
python.d.plugin rewritten
@l2isbad rewrote the whole of it, to add flexibility and support the latest netdata features! The new plugin supports the old python modules.
better / faster dashboard scrolling
netdata now uses passive event listeners to detect page scrolling. This significantly improved the responsiveness of the dashboard (check your dashboard settings: sync scrolling is the fastest, async is closer to the older behavior).
netdata now monitors couchdb, powerdns, beanstalkd and dnsdist !
netdata now detects redis background save failures
netdata can now send flock.com and kavenegar.com alarm notifications
and as always... dozens more improvements, enhancements, new features and bug fixes!
Netdata can now export and import dashboard snapshots.
Snapshots are JSON files containing everything the dashboard needs to be rendered: charts and chart data.
They are exported as JSON files, to your computer. The saved snapshots can be loaded back on any netdata dashboard (even of a different host). When importing, no network traffic is generated. The web browser loads the local file and renders an interactive dashboard to examine it.
The current visible timeframe of the dashboard is respected, so first align the dashboard to the timeframe required and then click "Export". The pop-up allows selecting the resolution of the export (its detail).
Press the ALT or CONTROL key and select a time-frame at a chart. An overlay will appear with the selected time-frame and all the charts will highlight the same region.
The highlighted time-frame:
my-netdata menu
Also, netdata charts can now be zoomed vertically (use the SHIFT key, like in zoom, but select the chart vertically):
netdata dashboards can now be printed to PDF. Just click the 🖨️ icon on the dashboard.
The current visible timeframe of the dashboard is respected, so first align the dashboard to the timeframe required and then click "Print".
netdata can now check the client IPs connecting to it and deny/allow access based on your settings. No more iptables rules to control access to netdata.
All these settings are netdata simple patterns that are checked against the client IP (string matching - not subnet matching). localhost clients (IPv4, IPv6 and unix domain sockets) can be matched with localhost:
netdata.conf: [web].allow connections from to match the clients' IPs allowed to connect to netdata. This has the same effect as iptables (but implemented at the application level - so clients will get connected, and disconnected immediately if they are not allowed access, without any response from netdata).
netdata.conf: [web].allow dashboard from to match the clients' IPs that are allowed to access the dashboard (ie fetch static files and query netdata API).
netdata.conf: [web].allow badges from to match the clients' IPs that are allowed to access badges (the dashboard clients are allowed to access badges too, so this setting allows badges to clients that do not have access to the dashboard).
netdata.conf: [web].allow streaming from to match the clients' IPs that are allowed to stream metrics.
stream.conf: [API_KEY].allow from to match the clients' IPs allowed to push metrics for the given API KEY.
stream.conf: [MACHINE_GUID].allow from to match the clients' IPs allowed to push metrics for the specific machine.
netdata will also check the API keys supplied by slaves and proxies connected.
netdata.conf: [web].allow netdata.conf from to limit the clients that can get netdata.conf - by default netdata allows only private IPs.
netdata.conf: [registry].allow from to limit the clients allowed to access the registry (only when this netdata acts as a registry).
Added a new chart: ipv4.tcplistenissues with dimensions ListenOverflows and ListenDrops.
This chart detects whether any listening TCP socket on the host overflows or drops connections. This is system-wide: any listening TCP socket, of any application.
The chart will not be shown if these kernel counters are zero. It will be enabled automatically if it is found non-zero at any point (it is collected via /proc/net/netstat every second). If you need to enable it even if it is zero, edit netdata.conf and set:
[plugin:proc:/proc/net/netstat]
TCP listen issues = yes
Two alarms have been added, one for ListenOverflows and one for ListenDrops that detect if there is any overflow or drop in the last minute (they run every 10 seconds).
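To see where these two counters come from, the following Python sketch parses the text of /proc/net/netstat and computes how much ListenOverflows and ListenDrops grew between two samples. It is an illustration only: the function names are hypothetical and this is not how netdata's proc plugin is written. It relies on the kernel's layout of alternating header and value lines, each prefixed with the protocol name (for example `TcpExt:`).

```python
def parse_netstat(text):
    """Parse /proc/net/netstat text into {protocol: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # Header lines and value lines alternate, both prefixed by "Proto:".
    for header, values in zip(rows[0::2], rows[1::2]):
        protocol = header[0].rstrip(":")
        stats[protocol] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats


def listen_issues(before_text, after_text):
    """Return the increase of the two listen counters between two samples."""
    before = parse_netstat(before_text)["TcpExt"]
    after = parse_netstat(after_text)["TcpExt"]
    return {
        name: after[name] - before[name]
        for name in ("ListenOverflows", "ListenDrops")
    }
```

A non-zero result for either counter is the condition the two alarms look for over the last minute.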
slack alarm for overflows:
slack alarm for drops:
and the alarms configuration:
The alarms will automatically be attached when the chart is active.
The overflows dimension and alarm are supported on FreeBSD too.
/proc/net/sockstat and /proc/net/sockstat6
These files provide sockets statistics for all protocols.
netdata also adds 3 new alarms:
netdata proxies with more than 100 slaves, had a timing issue that caused them to crash randomly on slave reconnects. Parts of the code have been rewritten to get rid of the timing issue.
netdata slaves and proxies, now have a protection that ensures they will never use 100% CPU, even if the master is misbehaving.
expired orphaned hosts are now removed from the my-netdata menu of the dashboard.
streaming functions can now be monitored via access.log
streaming now support IP filtering. So the entire streaming functionality, API keys and MACHINE GUIDs can be associated with one or more IPs or IP patterns.
streaming now transfers alarm variables too
@l2isbad did a marvelous job rewriting python.d.plugin. The new plugin:
supports option autodetection_retry: SECONDS. When set to non-zero, the plugin will re-check the module every that many seconds. This solves the problem that netdata did not retry collecting metrics from applications that were not found running when netdata started. By default it is zero for all modules, so you need to enable it for all the applications you need it.
got a rewrite of several functions, like logging, module configuration, chart and dimensions management.
the new URL service disables by default certificates checks, to allow self-signed certificates to work without configuration.
The new plugin is compatible with custom python modules developed for the previous version.
custom regex now supports parsing hostnames and IPs @l2isbad
web_log now parses lines with error 408 (request timeout - these are a special case, since the request has not been received by the web server, so the log line is incomplete) @l2isbad
now properly parses resp_length with value - @racciari
CouchDB maintainer @wohali, submitted a couchdb plugin for netdata. The plugin monitors:
2 charts have been added to monitor background save health status, bundled with 2 alarms that detect if background save has failed, or background save is slow (warn > 10 mins, crit > 20min). @l2isbad
netdata now monitors PowerDNS, @l2isbad
netdata now monitors beanstalkd, @l2isbad
netdata now monitors dnsdist, @nobody-nobody
disks under Linux are renamed using /dev/disk/by-label. An option has been added at netdata.conf to also allow renaming based on /dev/disk/by-id.
chrony is now disabled by default, because there have been reports that chronyc enters an infinite loop in CentOS and RHEL.
tomcat improvements to support flavors of the tomcat server @Wing924
zfs on FreeBSD now monitors ZFS TRIM statistics
disks monitoring charts on FreeBSD got a lot more FreeBSD related dimensions.
added CPU frequency charts on FreeBSD (Linux already had them).
chart system.io (the total system Disk I/O) is now calculated by aggregating the reads and writes of all physical disks. The previous system.io chart (that is based on pgpgin and pgpgout from /proc/vmstat) is now named system.pgpgio. The key difference is that the new system.io now sees ZFS I/O, and it also correctly and accurately sums the real disk bandwidth of RAID arrays.
chart system.net (the total system network bandwidth) is now calculated by aggregating the bandwidth of all physical network interfaces and is common for both IPv4 and IPv6.
tc (QoS) charts now sort the dimensions on the legends, the same way tc reports them.
postgres: in versions < 10 the WAL directory was named pg_xlog, and from 10 upwards it has been renamed to pg_wal @facetoe
mysql (and mariadb) got new charts for galera replication @spinitron
openvpn_log improvements @l2isbad
smartd improvements @l2isbad
varnish module has been rewritten @l2isbad
mdstat regex fix @l2isbad
smartd_log improvements @l2isbad
dns_query_time improvements @wungad
isc_dhcpd improvements @wungad
freeipmi.plugin got a command line option (can be given at netdata.conf) to ignore certain sensor IDs that are faulty.
freeradius improvements @wungad
node.d.plugin bugfixes
netdata.conf, plugins directory = "DIRECTORY1" "DIRECTORY2" ..., up to 20 directories. By default netdata sets:
[global]
plugins directory = "/usr/libexec/netdata/plugins.d" "/etc/netdata/custom-plugins.d"
netdata now supports alarms variables.
Each plugin can now define host global and chart local variables with static values, that can be used in alarms' expressions. So, hosts and charts can now have any number of static values associated with them (eg. an application server may expose its max connections limit), and these static values can be used to trigger alarms (eg. the current connections, is compared to the max connections variable). The whole setup allows alarm templates to use this feature (eg each netdata can maintain different such variables for each server it monitors).
Alarm variables are propagated to upstream netdata servers.
added init file for SLC 6.9 and CloudLinux Server release 6.9
packages installer was incorrectly detecting all python versions as version 2.
a makeself bug that prevented the static netdata binaries from being installed on busybox systems, has been fixed.
openrc startup script (gentoo, alpine) had hardcoded the path to netdata. This affected all static-64bit builds when installed on these distros. Fixed.
the static 64bit installer now downloads netdata.conf, much like the git installer does.
openrc / gentoo init improvements @candrews
enabled support for macOS versions 10.5+ (10.11 was working already) @vlvkobal
enabled support for FreeBSD 12 @vlvkobal
fixed a crash on macOS hosts with empty disk names.
added Dockerfile.armv7hf for running netdata under docker on ARM v7 machines @justin8
hover selection of charts is now faster on all browsers. Perfect on Chrome, Firefox and Opera. Quite usable on Edge.
the dashboard is now fixed when a modal is open, preventing scrolling the page.
the dashboard now uses fontawesome 5.0.1 for icons.
the chart names can now be searched with browser control-F (find in page). Because netdata lazy loads all charts, it was previously impossible to search for a chart. Now the charts are searchable. This is important on dashboards with several hundreds of statsd charts, because all these charts appear under the same section.
netdata now detects libvirt VM network interfaces and moves them to the VM section of the dashboard. The same functionality already exists for containers.
Show the context of each chart. The context is used in alarm templates. (hover on the date of the chart)
Show the resolution of the chart. (hover on the time of the chart)
The dashboard now adds a tooltip at the date of the charts, to show the plugin and its module that collects each chart.
The dashboard should now put a lot less CPU pressure on the browser when the page does not have focus.
The dashboard does dynamic units scaling, on the fly! It converts:
kilobits/s to megabits/s or gigabits/s
kilobytes/s to megabytes/s or gigabytes/s, similarly for KB/s
MB to KB, GB or TB
GB to MB or TB
Chart units dynamically adapt based on the value of the selected dimension too:
Custom dashboards can give data-desired-units="UNITS" and netdata will automatically convert the presented values to the desired units. UNITS can be any of the supported ones, or auto for auto-scaling based on the values, or original to show the original units maintained by the netdata server.
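The auto-scaling idea can be sketched as walking up or down a ladder of related units until the value is comfortable to read. The Python below is a rough illustration, not the dashboard's JavaScript: the `LADDERS` table, the function name and the default `base` of 1000 are all assumptions made for this example (pass `base=1024.0` if that is the step between your units).

```python
# Hypothetical unit ladders, smallest unit first.
LADDERS = [
    ["kilobits/s", "megabits/s", "gigabits/s"],
    ["kilobytes/s", "megabytes/s", "gigabytes/s"],
    ["KB", "MB", "GB", "TB"],
]


def auto_scale(value, units, base=1000.0):
    """Return (scaled_value, scaled_units) for a value in the given units."""
    for ladder in LADDERS:
        if units not in ladder:
            continue
        i = ladder.index(units)
        # Move to a bigger unit while the number is too large.
        while abs(value) >= base and i + 1 < len(ladder):
            value /= base
            i += 1
        # Move to a smaller unit while the number is a small fraction.
        while 0 < abs(value) < 1 and i > 0:
            value *= base
            i -= 1
        return value, ladder[i]
    # Unknown units are returned untouched.
    return value, units
```

For example, `auto_scale(2500.0, "kilobits/s")` yields a value in megabits/s, while `auto_scale(0.5, "MB", base=1024.0)` drops down to KB.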
The dashboard now supports units conversions. Currently it converts:
temperatures from Celsius to Fahrenheit
seconds to human readable duration DDd:HH:MM:SS
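Both conversions are simple arithmetic. The Python sketch below shows one way to compute them; the function names are made up for this example, and always printing the days field with two digits is an assumption based on the DDd:HH:MM:SS pattern, not confirmed dashboard behavior.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def seconds_to_duration(seconds):
    """Format a number of seconds as DDd:HH:MM:SS."""
    days, remainder = divmod(int(seconds), 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02dd:%02d:%02d:%02d" % (days, hours, minutes, secs)
```

For instance, 93784 seconds is one day, two hours, three minutes and four seconds.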
netdata can now convert all dates presented to any timezone. Traditionally netdata presented all charts at the timezone of the viewer. This allowed homogeneous central administration of systems that are installed all over the world. However, this was inefficient when we needed to compare the information presented on the dashboard, with the log files of the servers.
So, now netdata can present the charts on any timezone. The netdata server auto-detects the timezone of the server and new dashboard settings have been added to allow this conversion.
If autodetection of the server's timezone fails, the configuration option [global].timezone has been added in netdata.conf to set it. Also, the dashboard itself allows the viewers to configure the timezone (it is saved at browser local storage, so this has to be set just once per viewer).
To support all the above, the dashboard settings got a new tab, with all the required options:
statsd metrics can now be added to statsd synthetic charts using patterns. No need to add a dimension line for each statsd metric to be added. netdata will also extract the wildcarded part of the metric name and use that one for the dimension name.
dimensions added to statsd synthetic charts, can automatically be renamed using a dictionary. Each synthetic charts application has its own dictionary of name - value pairs, which is used to automatically rename statsd metrics when they are added to synthetic charts.
statsd timers and histograms now report zeros when nothing is collected
fixed a bug in netdata badges that was incorrectly matching zero values with the null color condition.
added API option display_absolute to allow badges to use the signed value for color evaluation, but present the absolute value.
warning emails sent by netdata, are now a little bit more orange (they were a bit greenish).
added flock.com notifications @tvarsis
added kavenegar.com support for SMS notifications @vahit
fixed a bug in email notifications that was triggering a corrupted MIME match by anti-spam solutions.
pushbullet notifications now track the devices, so that per device filtering at pushbullet is possible. Also improved the formatting a bit. @user501254
pushover notifications fixes (the priority of warnings was set incorrectly)
alarms can now use variables like this ${variable with spaces or +, -, *, / in it}. So, alarms can now use dimension names with any character in them.
access.log has been refactored to support monitoring all netdata operations
inodes monitoring is now by default disabled for mount points based on filesystems that do not have a maximum inode threshold (such as cephfs).
rabbitmq has been added to apps_groups.conf so that apps.plugin now monitors (cpu, memory, disk I/O, sockets, etc) for rabbitmq instances.
several email and log management apps have been added to email and logs targets of apps_groups.conf, @Flums
ceph target added to apps_groups.conf to allow netdata to monitor Ceph - the unified, distributed storage system, @k0ste
refactored several internal data collection plugins to eliminate a few hundreds of index lookups per second.
netdata.conf settings that are loaded from disk, but were the same as the default ones, were generated commented when the server was asked to give its config. Now all loaded settings are generated uncommented.
netdata simple patterns can now extract the wildcarded part of the string they match (used in statsd synthetic charts)
netdata simple patterns can allow escaping spaces by prefixing them with a backslash.
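To make the simple pattern behavior concrete, here is a small Python approximation. It is a sketch under assumptions, not netdata's C implementation: it treats an expression as space-separated patterns evaluated left to right, with `*` as the only wildcard, a leading `!` as a negative match, and a backslash before a space as an escape. Joining the text matched by several `*` into one string is a choice made for this example.

```python
import re


def split_patterns(expression):
    """Split an expression on unescaped spaces; a backslash keeps a space."""
    parts = re.split(r"(?<!\\) +", expression.strip())
    return [part.replace("\\ ", " ") for part in parts if part]


def simple_pattern_match(expression, text):
    """Return (matched, wildcarded_part); the first matching pattern decides."""
    for pattern in split_patterns(expression):
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Every '*' becomes a capturing group; everything else is literal.
        regex = "(.*)".join(re.escape(piece) for piece in pattern.split("*"))
        found = re.fullmatch(regex, text)
        if found:
            if negative:
                return False, None
            return True, "".join(found.groups())
    return False, None
```

The same matching applies to plain strings such as client IPs, for example `simple_pattern_match("!10.1.2.3 10.*", client_ip)`, which is string matching rather than subnet matching.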
netdata v1.8.0 released.
This release focuses on metrics streaming improvements and containers monitoring.
As always, this netdata is the fastest and the most stable netdata ever! Update now!
To install or update netdata, click here!
netdata, as a slave, was not handling all the error cases properly, resulting in 100% cpu utilization of a single core, under certain conditions. Especially under FreeBSD and macOS slaves, these conditions were always met, so using FreeBSD or macOS as netdata slaves, was completely broken.
netdata was incorrectly messing cached alarm state data between the alarms of the mirrored hosts, resulting in alarm notifications not dispatched under certain conditions. This was affecting only netdata masters (ie. netdata servers with more than one host databases, with health monitoring enabled). The alarms were generated and were visible at the dashboards, but the notifications were not always sent.
There was a minor issue with charts that were created with name aliases. When these charts were streamed from netdata slaves to netdata masters, they ended up with duplicate chart names (ie instead of type.name they had type.type.name).
Container network interfaces are now moved to the container section and they are rendered from the container view point (i.e. sent = what the container sent) - no more veth* garbage on the dashboard.
The interfaces also appear as eth0 (or whatever the container sees) and they are inside the container section of the dashboard. netdata maps each veth* interface to the right container, using plain cgroups features, so this works for all container managers (docker, lxc, etc).
Eliminated the nested containers shown under certain versions of lxc.
Also, containers and VMs now have summary gauges on the dashboard
netdata now uses urllib3 (shipped with netdata for both python v2 and v3) for URLService based plugins.
This enables HTTP keep-alive on all connections, which allows netdata to have permanent connections to third party web applications.
Fixed by @l2isbad
fping can now run as non-root, in static binary netdata packages
netdata can now listen on UNIX domain sockets (.sock files). This allows a local web server and netdata to communicate bypassing the network stack (for netdata set bind to = unix:/path/to/netdata.sock - this option supports multiple arguments, so netdata can listen to multiple unix sockets and tcp sockets, at the same time).
netdata was assuming that the JSON representation of a chart would at most be 1024 bytes, and it was generating corrupted JSON output when any chart was exceeding that limit. Removed the limitation (ie. now there is no limit).
netdata was crashing while starting, if no usable disks were found.
systemd netdata.service now allows setting negative netdata OOM score and restarts netdata if it crashes. The new netdata.service is not automatically installed when updating netdata. Either delete /etc/systemd/system/netdata.service and then update/re-install netdata, or copy the file by hand.
minor fixes at the installer, by @vincele
chrony plugin, by @domschl
web_log bugfixes, enhancements and optimizations (including squid logs), by @l2isbad
web_log now enables parsing HTTP/2 logs in custom_log_format, by @Funzinator
redis bugfixes, by @l2isbad
haproxy bugfixes, by @l2isbad
elasticsearch bugfixes and optimizations, by @l2isbad
rabbitmq bugfixes and optimizations, by @l2isbad
mdstat bugfixes, by @JeffHenson
tomcat improvements, by @Wing924
mysql improvements, by @alibo and @l2isbad
dovecot improvements
postgres improvements, by @facetoe
cpufreq fixed a bug that prevented accurate reporting of CPU frequencies. accurate works with the acpi-cpufreq driver and calculates the average CPU clock of the CPUs utilizing the accounting per frequency, as reported by the kernel, by @tycho
cpuidle performance improvements (faster under load) by @tycho
fail2ban bugfixes, by @l2isbad
SNMP plugin now uses latest net-snmp and the corrupted 64 bit counters encountered under certain node.js version is now fixed.
easypiecharts and gauges can now render arbitrary ranges and animate clock wise or counter clock wise.
traditionally netdata was using 1024 bits = 1 kilobit. It is fixed: 1000 bits = 1 kilobit.
netdata charts should now work on wordpress pages.
alarm-notify.sh now supports debug mode, showing the exact commands it runs to send notifications, when export NETDATA_ALARM_NOTIFY_DEBUG=1
alarm-notify.sh now supports setting the sender email address of the emails it sends.
emails sent by alarm-notify.sh now include headers to reduce the possibility of them being scored as spam, by @Ferroin
network related alarms got new thresholds and improved badges
netdata now detects if the system has been suspended and pauses all alarms for 60 seconds on resume, to prevent false alarms (no more false alarms on laptops when they resume).
netdata alarms now support filtering based on hostname and O/S (linux, freebsd, macos). This means that netdata masters, can now support alarms for slaves of any O/S (i.e. a Linux netdata master can handle alarms for a FreeBSD slave).
netdata slack notifications now show the host that sent the alarm. In the image below, the alarm is about bangalore, and is sent by netdata-build-server (at the lower left corner):
Published by philwhineray over 7 years ago
This is release v1.7 of netdata.
netdata is still spreading fast: we are at 320.000 users and 132.000 servers! Almost 100k new users, 52k new installations and 800k docker pulls since the previous release 4 and a half months ago! netdata user base grows at about 1000 new users and 600 new servers per day! Thank you! You are awesome!
The next release (v1.8) will be focused on providing a global health monitoring service, for all netdata users, for free! Read more about it here. We need supporters for this cause. Join us!
netdata is now a (very fast) fully featured statsd server and the only one with automatic visualization: push a statsd metric and hit F5 on the netdata dashboard: your metric visualized. It also supports synthetic charts, defined by you, so that you can correlate and visualize your application the way you like it.
netdata got new installation options - it is now easier than ever to install netdata - we also distribute a statically linked netdata x86_64 binary, including key dependencies (like bash, curl, etc) that can run everywhere a Linux kernel runs (CoreOS, CirrOS, etc).
metrics streaming and replication has been improved significantly. All known issues have been solved and key enhancements have been added. headless collectors and proxies can now send metrics to backends when data source = as collected.
backends have got quite a few enhancements, including host tags, metrics filtering at the netdata side and sending of chart and dimension names instead of IDs; prometheus support has been re-written to utilize more prometheus features and provide more flexibility and integration options. IF YOU UPDATE FROM NETDATA 1.6 PLEASE CHECK YOUR DASHBOARDS, SINCE MANY METRICS HAVE CHANGED NAMES.
netdata now monitors ZFS (on Linux and FreeBSD), ElasticSearch, RabbitMQ, Go applications (via expvar), ipfw (on FreeBSD 11), samba, squid logs (with web_log plugin!).
netdata dashboard loading times have been improved significantly (hit F5 a few times on a netdata dashboard - it is now amazingly fast), to support dashboards with thousands of charts.
netdata alarms now support custom hooks, so you can run whatever you like in parallel with netdata alarms.
As usual, this release brings dozens more improvements, enhancements and compatibility fixes.
netdata is now a fully featured statsd server. It can collect statsd formatted metrics, visualize them on its dashboards, stream them to other netdata servers or archive them to backend time-series databases.
netdata statsd is fast. It can collect more than 1.200.000 metrics per second on modern hardware, more than 200Mbps of sustained statsd traffic. netdata statsd is inside netdata. This provides a distributed statsd implementation.
netdata also supports statsd synthetic charts: You can create dedicated sections on the dashboard to render the charts. You can control everything: the main menu, the submenus, the charts, the dimensions on each chart, etc.
Read more about netdata statsd
name:INTEGER|c or name:INTEGER|C or name|c
INTEGER number supplied (positive, or negative).
name:FLOAT|g
FLOAT begins with + or -.
name:FLOAT|h
The same chart with sum unselected, to show the detail of the dimensions supported:
This is identical to counter.
name:INTEGER|m or name|m or just name
INTEGER number supplied (positive, or negative).
name:TEXT|s
name:FLOAT|ms
The same chart with the sum unselected:
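The wire formats listed above are plain text, so any program can emit them over UDP. The Python sketch below builds such lines and sends them in one datagram. Treat it as an illustration under assumptions: the helper names are invented here, the type suffixes follow the usual statsd convention (`c` counter, `g` gauge, `h` histogram, `m` meter, `s` set, `ms` timer), and `127.0.0.1:8125` is the conventional statsd address, which may differ from your netdata configuration.

```python
import socket

# Conventional statsd type suffixes, as used in the formats above.
STATSD_TYPES = {
    "counter": "c",
    "gauge": "g",
    "histogram": "h",
    "meter": "m",
    "set": "s",
    "timer": "ms",
}


def format_metric(name, value, kind):
    """Build one statsd line such as 'app.requests:1|c'."""
    return "%s:%s|%s" % (name, value, STATSD_TYPES[kind])


def send_metrics(lines, host="127.0.0.1", port=8125):
    """Send newline-separated statsd lines in a single UDP datagram."""
    payload = "\n".join(lines).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A call such as `send_metrics([format_metric("app.requests", 1, "counter")])` would push one counter event; UDP is fire-and-forget, so nothing is reported if no server is listening.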
There have been significant optimizations to the loading times of the dashboard. The dashboard loads instantly now, even when there are several hundreds of charts in it (hit F5 on the dashboard - it is super fast).
For those who know: we eliminated most browser reflows, by refactoring the way the charts are initialized and splitting initialization in 2 phases. Unfortunately we had to re-shape gauge and easypiecharts, so pay some attention to your custom dashboards after updating.
We now use natural sorting on the dashboard elements (i.e. instead of 1, 10, 2, 3 we get 1, 2, 3, 10).
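Natural sorting compares the digit runs inside a name as numbers rather than as text. A minimal Python sketch of the idea follows; the dashboard itself is JavaScript, so this only illustrates the technique and `natural_key` is a name chosen for this example.

```python
import re


def natural_key(text):
    """Split text into text and number chunks so digit runs sort numerically."""
    parts = re.split(r"(\d+)", text)
    # With one capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


names = ["1", "10", "2", "3"]
ordered = sorted(names, key=natural_key)
```

Here `ordered` ends up as 1, 2, 3, 10 instead of the plain string order 1, 10, 2, 3.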
There have been dozens of performance improvements on the netdata dashboard. Like all the previous releases, this release makes netdata the fastest netdata so far!
average, sum or volume (from the netdata database) are now more accurate.
contrib/nc-backend.sh, a script that can act as a fallback backend for graphite, opentsdb and compatibles.
as collected metrics to backends.
expvar! @kralewitz
web_log plugin can now monitor squid logs too! @l2isbad
web_log plugin can now monitor apache cache logs too (removed old apache_cache plugin) @l2isbad
web_log improvements - web_log is now a lot more powerful! @l2isbad
python.d.plugin LogService now supports monitoring web log files matching a pattern @l2isbad
/dev/mapper names. It also has improved docker compatibility.
haproxy improvements @l2isbad
dns_query_time plugin to monitor the response time of nameservers @l2isbad
cpufreq improvements @l2isbad
smartd_log improvements @pkoenig10
bind_rndc rewritten @l2isbad
lighttpd improvements (part of the apache plugin)
isc_dhcpd improvements @l2isbad
fping improvements
apps.plugin improvements (added many more applications to monitor, notably hadoop and friends, improved compatibility)
freeipmi improvements
mdstat improvements @l2isbad
mysql improvements @alibo
redis improvements @l2isbad
postgres rds fixes @facetoe
fail2ban improvements @l2isbad
idlejitter rewritten
openvpn improvements @l2isbad
numa improvements @Benje06
alarm-notify.sh now supports custom notification methods (you can hook whatever you like to netdata alarms).
lighttpd alarm
mongodb alarm @jnogol
ram utilizes KSM (kernel memory deduper).
map improvements for faster operation with huge databases.
clang, even on FreeBSD
Published by philwhineray over 7 years ago
Release announced on twitter, hacker news, reddit r/linux, reddit r/sysadmin, reddit r/linuxadmin, reddit r/freebsd, reddit r/devops, reddit r/homelab, facebook
netdata was first published on March 30th, 2016.
It has been a crazy year since then:
This is the first release that supports real-time streaming of metrics between netdata servers.
netdata can now be:
metrics databases can be configured on all nodes and each node maintaining a database may have a different retention policy and possibly run (even different) alarms on them.
There are 4 settings that control what netdata can be:
[global].memory mode in netdata.conf controls if a netdata will maintain a local database and the type of it. For more information check Running a dedicated central netdata server.
[web].mode in netdata.conf controls if netdata will expose its API, and the type of web server to enable (single or multi-threaded). Check netdata.conf configuration for streaming.
[stream].enabled in stream.conf controls if netdata will stream its metrics to another netdata. Check stream.conf for sending metrics.
[API KEY].enabled in stream.conf controls if netdata will accept metrics from other netdata. Check stream.conf for receiving metrics.
Using the above, we support a lot of different configurations, like these:
target | memory mode | web mode | stream enabled | send to backend | local alarms | local dashboard |
---|---|---|---|---|---|---|
headless collector | none | none | yes | not possible | not possible | no |
headless proxy | none | not none | yes | not possible | not possible | no |
proxy with db | not none | not none | yes | possible | possible | yes |
central netdata | not none | not none | no | possible | possible | yes |
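To make the table above easier to apply, here is a minimal sketch of a shell helper that maps the three settings to the role names used in the table. The function name `netdata_role` is hypothetical and not part of netdata; it only encodes the four rows listed above and reports any other combination as not listed.

```shell
# Map the three settings to the role names used in the table above.
# Arguments: <memory mode> <web mode> <stream enabled: yes|no>
# Hypothetical helper for illustration; it is not shipped with netdata.
netdata_role() {
    local db api
    db="yes"
    api="yes"
    # "none" means no local database / no API; any other value counts as "not none".
    [ "$1" = "none" ] && db="no"
    [ "$2" = "none" ] && api="no"

    case "${db}/${api}/$3" in
        no/no/yes)   echo "headless collector" ;;
        no/yes/yes)  echo "headless proxy" ;;
        yes/yes/yes) echo "proxy with db" ;;
        yes/yes/no)  echo "central netdata" ;;
        *)           echo "not listed in the table" ;;
    esac
}

# Usage (the non-"none" values are placeholders for whatever mode you configured):
#   netdata_role none none yes
```

Only the `none` / not `none` distinction matters here, so the helper never needs to know the individual memory or web modes.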
netdata now supports monitoring autoscaled ephemeral nodes, that are started and stopped on demand (their IP is not known).
When the ephemeral nodes start streaming metrics to the central netdata, the central netdata will register them at the my-netdata menu on the dashboard, like this:
You can see this live at https://build.my-netdata.io (this server may not always be available for demo).
For more information check: monitoring ephemeral nodes.
netdata now cleans up container, guest VM, network interface and mounted disk metrics, automatically disabling their alarms too.
For more information check monitoring ephemeral containers.
Vladimir Kobal has ported apps.plugin to FreeBSD.
netdata can now provide Applications, Users and User Groups under FreeBSD too:
Also, the CPU utilization of netdata under FreeBSD is now a lot lower than in netdata v1.5.
See it live at our FreeBSD demo server.
Ilya Mashchenko has done a wonderful job creating a unified web log parsing plugin for all kinds of web server logs. With it, netdata provides real-time performance information and health monitoring alarms for web applications and web sites!
Requests by http status:
Requests by http status code family:
Requests by http status code:
Requests bandwidth:
Requests timings:
URL patterns of interest (you configure the patterns):
Requests by http method:
Requests by IP version:
Number of unique clients:
and a lot more, including alarms:
alarm | description | minimum requests | warning | critical |
---|---|---|---|---|
1m_redirects | The ratio of HTTP redirects (3xx except 304) over all the requests, during the last minute. Detects if the site or the web API is suffering from too many or circular redirects. (i.e. oops! this should not redirect clients to itself) | 120/min | > 20% | > 30% |
1m_bad_requests | The ratio of HTTP bad requests (4xx) over all the requests, during the last minute. Detects if the site or the web API is receiving too many bad requests, including 404, not found. (i.e. oops! a few files were not uploaded) | 120/min | > 30% | > 50% |
1m_internal_errors | The ratio of HTTP internal server errors (5xx), over all the requests, during the last minute. Detects if the site is facing difficulties to serve requests. (i.e. oops! this release crashes too much) | 120/min | > 2% | > 5% |
5m_requests_ratio | The percentage of successful web requests of the last 5 minutes, compared with the previous 5 minutes. Detects if the site or the web API is suddenly getting too many or too few requests. (i.e. too many = oops! we are under attack) (i.e. too few = oops! call the network guys) | 120/5min | > double or < half | > 4x or < 1/4x |
web_slow | The average time to respond to requests, over the last 1 minute, compared to the average of last 10 minutes. Detects if the site or the web API is suddenly a lot slower. (i.e. oops! the database is slow again) | 120/min | > 2x | > 4x |
1m_successful | The ratio of successful HTTP responses (1xx, 2xx, 304) over all the requests, during the last minute. Detects if the site or the web API is performing within limits. (i.e. oops! help us God!) | 120/min | < 85% | < 75% |
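The "greater than" ratio alarms in the table above (1m_redirects, 1m_bad_requests, 1m_internal_errors) all follow the same pattern, which can be sketched in shell as below. This only illustrates the arithmetic; it is not how netdata's health engine is implemented. The function name `ratio_alarm_status` and the `SKIPPED` result for samples below the minimum number of requests are assumptions made for the example.

```shell
# Print CLEAR, WARNING or CRITICAL for a "ratio over all requests" alarm.
# Arguments: <matching requests> <all requests> <warning %> <critical %> [minimum requests]
# Hypothetical helper for illustration; netdata evaluates these alarms internally.
ratio_alarm_status() {
    local matched total warn crit min
    matched="$1"; total="$2"; warn="$3"; crit="$4"; min="${5:-120}"

    # Assumption: with fewer requests than the minimum, the ratio is not evaluated.
    if [ "${total}" -lt "${min}" ]; then
        echo "SKIPPED"
        return 0
    fi

    # Compare matched/total against the thresholds without losing precision:
    # matched/total > pct/100  is the same as  matched*100 > pct*total.
    if [ $((matched * 100)) -gt $((crit * total)) ]; then
        echo "CRITICAL"
    elif [ $((matched * 100)) -gt $((warn * total)) ]; then
        echo "WARNING"
    else
        echo "CLEAR"
    fi
}

# Usage, with the 1m_internal_errors thresholds (> 2% warning, > 5% critical):
#   ratio_alarm_status 25 1000 2 5
```

Cross-multiplying instead of dividing keeps everything in integers, so a ratio such as 2.5% is still recognised as being above a 2% threshold.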
For more information check: the spectacles of a web server log file.
netdata can now archive metrics to JSON backends (both push, by @lfdominguez, and pull modes).
netdata now has an IPMI plugin (based on freeipmi) for monitoring server hardware.
The plugin creates (up to) 8 charts, based on the information collected from IPMI:
It also supports alarms (including the number of sensors in critical state):
For more information, check monitoring IPMI.
Ilya Mashchenko builds python data collection plugins for netdata at a wonderful rate! He rocks!
nice
netdata has received a lot more improvements from many more contributors! (it was really a lot of work to dig into git log to collect all the above, so forgive me if I forgot to mention a few contributions and contributors).
Thank you all!
Published by ktsaou over 7 years ago
Release announced on twitter, hacker news, reddit r/linux, reddit r/sysadmin, reddit r/linuxadmin, reddit r/freebsd
Yet another release that makes netdata the fastest netdata ever!
This is probably the release with the largest changeset so far. A lot of work, by a lot of people made this release possible!
Vladimir Kobal has done magnificent work porting netdata to FreeBSD and MacOS.
Everything works:
Wow! Check it live on FreeBSD, at https://freebsd.my-netdata.io/
netdata supports data archiving to backend databases:
and of course all the compatible ones (KairosDB, InfluxDB, Blueflood, etc)
With this feature netdata can interface with your existing devops infrastructure and allow you to visualize its metrics with other tools, like grafana.
Ilya Mashchenko has created most of the python data collection plugins in this release! He rocks!
Shell scripts can now query netdata easily!
eval "$(curl -s 'http://localhost:19999/api/v1/allmetrics')"
After this command, all the netdata metrics are exposed to the shell. Check:
# source the metrics
eval "$(curl -s 'http://localhost:19999/api/v1/allmetrics')"
# let's see if there are variables exposed by netdata for system.cpu
set | grep "^NETDATA_SYSTEM_CPU"
NETDATA_SYSTEM_CPU_GUEST=0
NETDATA_SYSTEM_CPU_GUEST_NICE=0
NETDATA_SYSTEM_CPU_IDLE=95
NETDATA_SYSTEM_CPU_IOWAIT=0
NETDATA_SYSTEM_CPU_IRQ=0
NETDATA_SYSTEM_CPU_NICE=0
NETDATA_SYSTEM_CPU_SOFTIRQ=0
NETDATA_SYSTEM_CPU_STEAL=0
NETDATA_SYSTEM_CPU_SYSTEM=1
NETDATA_SYSTEM_CPU_USER=4
NETDATA_SYSTEM_CPU_VISIBLETOTAL=5
# let's see the total cpu utilization of the system
echo ${NETDATA_SYSTEM_CPU_VISIBLETOTAL}
5
# what about alarms?
set | grep "^NETDATA_ALARM_SYSTEM_SWAP_"
NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_STATUS=CRITICAL
NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_VALUE=53
NETDATA_ALARM_SYSTEM_SWAP_USED_SWAP_STATUS=CLEAR
NETDATA_ALARM_SYSTEM_SWAP_USED_SWAP_VALUE=51
# let's get the current status of the alarm 'ram in swap'
echo ${NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_STATUS}
CRITICAL
# is it fast?
time curl -s 'http://localhost:19999/api/v1/allmetrics' >/dev/null
real 0m0,070s
user 0m0,000s
sys 0m0,007s
# it is...
# 0.07 seconds for curl to be loaded, connect to netdata and fetch the response back...
The _VISIBLETOTAL variable sums up all the dimensions of each chart.
The format of the variables is:
NETDATA_${chart_id^^}_${dimension_id^^}="${value}"
The value is rounded to the closest integer, since shell scripts cannot process decimal numbers.
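As a minimal sketch of how a script might act on these variables, the helpers below build a variable name from a chart id and a dimension id, then compare its integer value against a threshold. The function names `netdata_var_name` and `netdata_above` are hypothetical. Replacing every character other than letters, digits and `_` with `_` is an assumption inferred from the `system.cpu` example above, not a documented rule.

```shell
# Build the variable name for a chart/dimension pair, following the
# NETDATA_<CHART_ID>_<DIMENSION_ID> format described above.
# Assumption: characters other than letters, digits and '_' become '_'.
netdata_var_name() {
    printf 'NETDATA_%s_%s\n' "$1" "$2" \
        | tr '[:lower:]' '[:upper:]' \
        | tr -c 'A-Z0-9_\n' '_'
}

# Succeed (exit status 0) when the variable is set and its value is above the threshold.
# Arguments: <chart id> <dimension id> <threshold>
netdata_above() {
    local name value
    name="$(netdata_var_name "$1" "$2")"
    # Indirect lookup; the name only contains A-Z, 0-9 and '_' at this point.
    eval "value=\"\${${name}:-}\""
    [ -n "${value}" ] && [ "${value}" -gt "$3" ]
}

# Usage, after sourcing the metrics with the eval command shown earlier:
#   netdata_above system.cpu visibletotal 90 && echo "CPU is busy"
```

Because the values are already rounded to integers, a plain `-gt` comparison is enough and no external calculator is needed.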
netdata has received a lot more improvements from many more contributors! (it was really a lot of work to dig into git log to collect all the above, so forgive me if I forgot to mention a few contributions and contributors).
Thank you all!
Published by ktsaou about 8 years ago
Release announced on Hacker News
Release announced on reddit r/linux
Release announced on reddit r/sysadmin
Release announced on twitter
Many new alarms have been added to detect common kernel configuration errors and old alarms have been re-worked to avoid notification floods.
Alarms now support:
notification hysteresis (both static and dynamic)
notification self-cancellation, and
dynamic thresholds based on current alarm status
Also, a new alarms log:
netdata now supports:
For all the above methods, netdata supports role-based notifications, with multiple recipients for each role and severity filtering per recipient!
Also, netdata supports HTML5 notifications while the dashboard is open in a browser window (it does not need to be the active one).
All notifications (HTML5, emails, slack, pushover, telegram) are now clickable to get to the chart that raised the alarm.
improved IoT support!
netdata builds and runs with musl libc and runs on systems based on busybox.
improved containers support!
netdata runs on alpine linux (a low profile linux distribution used in containers).
Dozens of other improvements and bugfixes
netdata 1.4.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.4.0
Published by ktsaou about 8 years ago
IMPORTANT:
Since netdata now uses python plugins, new packages must be
installed on a system to allow it to work.
For more information, please check the installation page.
Based on the POLL we made on github, health monitoring was the winner. So here it is!
netdata now has a powerful health monitoring system embedded.
netdata can generate badges with live information from the collected metrics.
Thanks to the great work of Paweł Krupa (@paulfantom), most BASH plugins have been ported to python.
The new python.d.plugin supports both python2 and python3 and data collection from multiple sources for all modules.
The following pre-existing modules have been ported to python:
The following new modules have been added:
Thanks to @simonnagl netdata now reports disk space usage.
dashboards now transfer certain settings from server to server when changing servers via the my-netdata menu.
The settings transferred are the dashboard theme, the online help status and current pan and zoom timeframe of the dashboard.
API improvements:
apps.plugin improvements:
netdata now runs with IDLE process priority (lower than nice 19)
netdata now instructs the kernel to kill it first when it starves for memory.
netdata listens for signals:
netdata can now bind to multiple IPs and ports.
netdata now has a new systemd service file (it starts as user netdata and does not fork).
Dozens of other improvements and bugfixes
netdata 1.3.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.3.0
Published by ktsaou over 8 years ago
IMPORTANT:
This version requires libuuid. The package you need to build netdata is:
- uuid-dev (debian/ubuntu), or
- libuuid-devel (centos/fedora/redhat)
The central registry tracks all your netdata servers and bookmarks them for you at the my-netdata menu on all dashboards.
Every netdata can act as a registry, but there is also a global registry provided for free for all netdata users!
docker, lxc, or anything else. For each container it monitors CPU, RAM, DISK I/O (network interfaces were already monitored).
netdata 1.2.0 - download release tarfiles also from http://firehol.org/download/netdata/releases/v1.2.0
Published by ktsaou over 8 years ago
netdata 1.1.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.1.0
Dozens of commits that improve netdata in several ways:
Published by ktsaou over 8 years ago
netdata 1.0.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.0.0
Published by ktsaou over 8 years ago
Published by ktsaou about 9 years ago