The open-source observability platform everyone needs!
GPL-3.0 License
Published by netdatabot over 1 year ago
🚀 Our community is growing steadily. ❤️ Thank you! Your love and support give us the energy and passion to work harder to make monitoring simpler, more effective, and more fun to use.
Let the world know you love Netdata.
Give Netdata a ⭐ on GitHub now.
Motivate us to keep pushing forward!
To help our community use Netdata more broadly, we just signed an agreement with Docker for the purchase of Rate Limit Removal, which removes all pull limits for the Netdata repositories on Docker Hub. We expect this add-on to be applied to our repos in the next few days, so that you will enjoy unlimited Docker Hub pulls of Netdata Docker images for free!
Netdata Cloud dashboards have been improved to provide instant summary tiles for most of their sections. This includes system overview, disks, network interfaces, memory, mysql, postgresql, nginx, apache, and dozens more.
To accomplish this, we extended the query engine of Netdata to support multiple grouping passes, so that queries like "sum metrics by label X, and then average by node" are now possible. At the same time we made room for presenting anomaly rates on them (vertical purple bar on the right) and significantly improved the tile placement algorithm to support multi-line summary headers and precise sizing and positioning, providing a look and feel like this:
The following chart tile types have been added:
To improve the efficiency of using these tiles, each of these tiles supports the following interactive actions:
Some examples that you can see from the Netdata Demo space:
Although Netdata Agent alerts support silencing, centrally dispatched alert notifications from Netdata Cloud were missing that feature. Today, we release alert notifications silencing rules for Netdata Cloud!
Silencing rules can be applied to any combination of the following: users, rooms, nodes, host labels, contexts (charts), alert name, and alert role. For the matching alerts, silencing can optionally have a starting date and time and/or an ending date and time.
With this feature you can now easily set up silencing rules, applied either immediately or on a defined schedule, allowing you to plan for upcoming scheduled maintenance windows - see some examples here.
Read more about silencing alert notifications in our documentation.
Netdata trains ML models for each metric, using its past data. This allows Netdata to detect anomalous behaviors in metrics, based exclusively on the recent past data of the metric itself.
Before this release, Netdata trained one model for each metric, learning the metric's behavior during the last 4 hours. In the previous release, we introduced persisting these models to disk and loading them back when Netdata restarts.
In this release, we changed the default ML settings to maintain multiple trained models per metric, covering each metric's behavior over the last 24 hours. All these models are now consulted automatically to decide whether a data collection point is anomalous.
This has been implemented in a way that avoids additional CPU overhead on Netdata agents. Instead of training one model over 24 hours of data, which would introduce significant query overhead on the server, we train a new model for each metric every 3 hours using the last 6 hours of data, and we keep up to 9 models per metric. The most recent model is consulted first during anomaly detection, and additional models are consulted only as long as the previous ones predict an anomaly. Only when all 9 models agree that a collected sample is anomalous do we mark it as anomalous in the database.
The impact of these changes is more accurate anomaly detection out of the box, with much fewer false positives.
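The consultation logic described above can be sketched as follows; the training cadence and model count come from this release, while the model interface (`predicts_anomaly`) and `RangeModel` are hypothetical simplifications, not Netdata's actual ML code:

```python
from collections import deque

class MetricAnomalyDetector:
    """Sketch of the multi-model anomaly consensus (hypothetical API).

    A new model is trained every 3 hours on the last 6 hours of data,
    and up to 9 models are kept, covering roughly 24 hours of behavior.
    """

    MAX_MODELS = 9

    def __init__(self):
        # Most recent model first; the oldest model falls off the end.
        self.models = deque(maxlen=self.MAX_MODELS)

    def add_trained_model(self, model):
        self.models.appendleft(model)

    def is_anomalous(self, sample):
        # Consult the newest model first and stop as soon as any model
        # considers the sample normal. Only if ALL kept models agree
        # that the sample is anomalous do we flag it.
        for model in self.models:
            if not model.predicts_anomaly(sample):
                return False
        return len(self.models) > 0

class RangeModel:
    """Toy model: flags values outside the range it was trained on."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def predicts_anomaly(self, sample):
        return not (self.lo <= sample <= self.hi)

detector = MetricAnomalyDetector()
detector.add_trained_model(RangeModel(0, 100))   # older behavior
detector.add_trained_model(RangeModel(10, 50))   # most recent behavior

print(detector.is_anomalous(30))   # normal for the newest model -> False
print(detector.is_anomalous(500))  # anomalous for every model -> True
```

Note how a sample that only the newest model dislikes (e.g. 5) is still considered normal, because an older model recognizes it; this is what reduces false positives.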
You can read more about it in this deck presented during a recent office hours (office hours recording).
SSL support in the Netdata Agent has been completely rewritten. The new code reliably supports SSL connections for both the Netdata internal web server and streaming, and it is easier to understand, troubleshoot, and extend. At the same time, performance has been improved by removing redundant checks.
During this process a long-standing bug on streaming connection timeouts has been identified and fixed, making streaming reliable and robust overall.
As part of our commitment to expanding our set of alert notification methods, we added Mattermost as another notification integration option in Netdata Cloud. Mattermost provides another reliable way to deliver alerts to your team, ensuring the continuity and reliability of your services.
Business Plan users can now configure Netdata Cloud to send alert notifications to their team on Mattermost.
This builds on the work done in release v1.38, where we introduced real-time functions that enable you to trigger specific routines to be executed by a given Agent on demand. Our initial function provided detailed information on the processes currently running on the node, effectively replacing top and iotop.
We have now added the capability to group your results by specific attributes. For example, on the Processes function you are now able to group the results by: Category, Cmd or User.
With this capability you can now get a consolidated view of your reported statistics over any of these attributes.
The agent core has been improved in its integration with external plugins. Under certain conditions, a failed plugin would not be correctly acknowledged by the agent, resulting in a defunct (i.e. zombie) plugin process. This is now fixed.
Starting with this release, our official DEB/RPM packages have been split so that each external data collection
plugin is in its own package instead of having everything bundled into a single package. We have previously had
our CUPS and FreeIPMI collectors split out like this, but this change extends that to almost all of our external
data collectors. This is the first step towards making these external collectors optional on installs that use
our native packages, which will in turn allow users to avoid installing things they don't actually need.
Short-term, these external collectors are listed as required dependencies to ensure that updates work correctly. At
some point in the future almost all of them will be changed to be optional dependencies so that users can pick
and choose which ones they want installed.
This change also includes a large number of fixes for minor issues in our native packages, including better handling
of user accounts and file permissions and more prevalent usage of file capabilities to improve the security of
our native packages.
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
The following items will be removed in our next minor release (v1.41.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by
---|---|---
python.d/nvidia_smi | collector | go.d/nvidia_smi
`family` attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030)
When using Netdata Cloud, the minimum agent version required to benefit from the latest features is one version before the latest stable. For this release that is v1.39.1, and you'll be notified and guided to take action in the UI if you are running agents on older versions.
Check here for details on how to Update Netdata agents.
Join the Netdata team on the 19th of June at 16:00 UTC for the Netdata Release Meetup.
Together we'll cover:
RSVP now - we look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
Help us make Netdata even better! We are gathering valuable information that is key for us to better position Netdata and keep bringing more value to you.
We would appreciate it if you could take some time to answer this short survey (4 questions only).
Published by netdatabot over 1 year ago
This patch release provides the following bug fixes:
We noticed that claiming and enabling auto-updates had been failing due to incorrect permissions when kickstart.sh performed a static installation. The issue affected all static installations, including those done via the Windows MSI installer. The permissions have now been corrected.
The recipient lists of agent alert notifications are configurable via the `health_alarm_notify.conf` file. A stock file with default configurations can be modified using `edit-config`. @jamgregory noticed that the default settings in that file can make changing role recipients confusing: unless the edited configuration file included every setting from the original stock file, the resulting behavior was unintuitive. @jamgregory kindly added a PR to fix the handling of custom role recipient configurations.
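For illustration, per-role recipient overrides in `health_alarm_notify.conf` follow the stock file's array conventions; the addresses and channel below are placeholders:

```conf
# edit with: /etc/netdata/edit-config health_alarm_notify.conf
# Per-role recipients override the DEFAULT_RECIPIENT_* settings.
role_recipients_email[sysadmin]="ops-team@example.com"
role_recipients_slack[sysadmin]="alerts-channel"
```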
A bug in our collection and reporting of Infiniband bandwidth was discovered and fixed.
We noticed memory buffer overflows under some very specific conditions. We adjusted the relevant buffers and the calls to strncpyz
to prevent such overflows.
A memory leak in certain circumstances was found in the ACLK code. We fixed the incorrect data handling that caused it.
An unrelated memory leak was discovered in the ACLK code and has also been fixed.
Exposing the anomaly rate right on top of each chart in Netdata Cloud surfaced an issue of bad ML models on some very noisy metrics. We addressed the issue by suppressing the indications that these noisy metrics would produce. This change gives the ML model a chance to improve, based on additional collected data.
Finally, we improved the handling of errors during ML transactions, so that transactions are properly rolled back, instead of failing in the middle.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by netdatabot over 1 year ago
- `zlib` will become a mandatory dependency
- no upgrades of existing installs from versions prior to v1.11

We are excited to announce Netdata Charts v3.0 and the NIDL framework. These are currently available in Netdata Cloud. In the next Netdata release, the agent dashboard will be replaced to use the same charts.
One of the key obstacles in understanding an infrastructure and troubleshooting issues is making sense of the data we see on charts. Most monitoring solutions assume that users have a deep understanding of the underlying data, so during visualization they do nothing to help users comprehend the data more easily or quickly. The problem becomes even more apparent when the users troubleshooting infrastructure problems are not the ones who developed the dashboards. In these cases all kinds of misunderstandings are possible, resulting in bad decisions and slower time to resolution.
To help users instantly understand and validate the data they see on charts, we developed the NIDL (Nodes, Instances, Dimensions, Labels) framework and we changed all the Netdata query engines, at both the agent and the cloud, to enrich the returned data with additional information. This information is then visualized on all charts.
Netdata's unsupervised machine learning algorithm creates a unique model for each metric collected by your agents, using exclusively the metric's past data. We don't train ML models in a lab, or on aggregated sample data. We then use these unique models during data collection to predict the value that should be collected and check whether the collected value is within the range of acceptable values based on past patterns and behavior. If the collected value is an outlier, we mark it as anomalous. This unmatched capability of real-time prediction as data is collected allows you to detect anomalies for potentially millions of metrics across your entire infrastructure within a second of occurrence.
Before this release, users had to either go to the "Anomalies" tab, or enable anomaly rate information via a button on the charts. We found this was not very helpful, since many users were not aware of the functionality or forgot to check it. So we decided that the best use of this information is to visualize it by default on all charts, so that users instantly see whether the ML algorithm in Netdata believes the values are not following past behavior.
In addition to the summarized tables and chart overlay, a new anomaly rate ribbon on top of each chart visualizes the combined anomaly rate of all the underlying data, highlighting areas of interest that may not be easily visible to the naked eye.
Hovering over the anomaly rate ribbon provides a histogram of the anomaly rates per dimension presented, for the specific point in time.
Anomaly rate visualization does not make Netdata slower. The anomaly rate is saved in the Netdata database together with metric values, and due to Netdata's smart design, it does not even incur a disk footprint penalty.
Chart annotations have arrived! When hovering over the chart, the overlay may display an indication in the "Info" column.
Currently, annotations are used to inform users of any data collection issues that might affect the chart. Below each chart, we added an information ribbon. This ribbon currently shows 3 states related to the points presented in the chart:
[P]: Partial Data
At least one of the dimensions in the chart has partial data, meaning that not all available instances contributed data to this point. This can happen when a container is stopped, or when a node is restarted. This indicator helps you gain confidence in the dataset, in situations where unusual spikes or dips appear due to infrastructure maintenance or failures in part of the infrastructure.
[O]: Overflowed
At least one of the data sources included in the chart is a counter that overflowed at exactly that point.
[E]: Empty Data
At least one of the dimensions included in the chart has no data at all for the given points.
All these indicators are also visualized per dimension, in the pop-over that appears when hovering over the chart.
Hovering over any point in the chart now reveals a more informative overlay. This includes a bar indicating the volume percentage of each time series compared to the total, the anomaly rate, and a notification if there are data collection issues (annotations from the info ribbon).
The pop-over sorts all dimensions by value, highlights in bold the dimension closest to the mouse, and presents a histogram based on the values of the dimensions.
When hovering over the anomaly ribbon, the pop-over sorts all dimensions by anomaly rate and presents a histogram of these anomaly rates.
You can now rapidly access condensed information for collected metrics, grouped by node, monitored instance, dimension, or any label key/value pair. Above all charts, there are a few drop-down menus. These drop-down menus have 2 functions:
In this release, we extended the query engines of Netdata (both at the agent and the cloud), to include comprehensive statistical data to help us understand what we see on the chart. We developed the NIDL framework to standardize this presentation across all charts.
The NIDL framework attaches the following metadata to every metric we collect:
Since all our metrics now carry this metadata, we use it at query time to provide, for each of them, the following consolidated data for the visible time frame:
`key:value` pair to the final query, so that we can immediately see how much each label value involved in the query affected the chart.

All of these drop-down menus can now be used to instantly filter the dataset by including or excluding specific nodes, instances, dimensions or labels - directly from the drop-down menu, without the need to edit a query string and without any additional knowledge of the underlying data.
At the same time, the new query engine of Netdata has been enhanced to support multiple group-by at once. The "Group by" drop-down menu allows selecting 1 or more groupings to be applied at once on the same dataset. Currently it supports:
Using this menu, you can slice and dice the data in any possible way, to quickly get different views of it, without editing a query string and without needing a deeper understanding of the format of the underlying data. Netdata does it all by itself.
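To illustrate what a multi-pass group-by means in practice, here is a small sketch of a query like "sum metrics by label disk, then average by node"; the data layout and function are hypothetical, not Netdata's actual query engine:

```python
from collections import defaultdict
from statistics import mean

# One collected point per (node, disk) pair at a given moment (hypothetical data).
points = [
    {"node": "node1", "disk": "sda", "value": 10},
    {"node": "node1", "disk": "sdb", "value": 30},
    {"node": "node2", "disk": "sda", "value": 20},
    {"node": "node2", "disk": "sdb", "value": 40},
]

# Pass 1: sum values by the "disk" label within each node.
sums = defaultdict(lambda: defaultdict(int))
for p in points:
    sums[p["node"]][p["disk"]] += p["value"]

# Pass 2: average each disk's per-node sum across all nodes.
disks = sorted({p["disk"] for p in points})
final = {d: mean(sums[n][d] for n in sums) for d in disks}
# final: sda -> 15, sdb -> 35
```

Each pass is an independent aggregation over the output of the previous one, which is exactly why a single-pass group-by cannot express such queries.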
We are excited to announce that our Windows monitoring capabilities have been greatly improved with the addition of over 170 new system, network, and application metrics. This includes out-of-the-box support for MS Exchange, MS SQL, IIS, and Active Directory (including AD Certificate Services and AD Federation Services).
To try out Netdata directly on your Windows machine, our `.msi` installer allows for quick and easy installation with a Netdata WSL distribution. However, for production deployments, one or more Linux nodes are still required to run Netdata and store your metrics, as shown in the provided diagram.
To fully support this architecture, we have added the ability to declare each Windows host as a Netdata node. You can learn more about this feature in the virtual nodes section.
For more information, please check out our high-level introduction to
Windows monitoring, our demo, or our Windows collector documentation.
Netdata provides powerful tools for organizing the hundreds of thousands of metrics collected every second in large infrastructures. From the automated organization of related out-of-the-box aggregate charts into sections, to concepts like spaces and war rooms that connect the metrics with the people who need to use them, scale is no problem. Easily slicing and dicing the metrics via grouping and filtering in our charts is also essential for exploration and troubleshooting, which is why, in the past, we introduced host labels and default metric labels. To complete the available tool set, Netdata now offers the ability to define custom metric labels and virtual nodes. You can read how everything fits together in our documentation.
You can use custom labels to group and filter metrics in the Netdata Cloud aggregate charts. Virtual nodes work like normal Netdata Cloud nodes for the metrics you assign to them and can be added to any room.
The ability to define a virtual node is a new feature that is essential for monitoring remote Windows hosts, but has many other potential uses. For example, you may have a central monitoring node collecting data from many remote database hosts that you aren't allowed to install software on. You may also use the HTTP endpoint collector to check the availability and latency of APIs on multiple remote endpoints.
Defining virtual nodes lets you represent entities that have no Netdata running on them, so they can appear in Netdata Cloud, be placed in rooms, be filtered and grouped easily, and have their virtual node name displayed in alerts. Learn how to configure virtual nodes for any go.d.plugin data collection job.
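For illustration, declaring a virtual node and assigning a go.d.plugin job to it looks roughly like the following sketch; the file locations, keys, hostname, GUID, and URL are all illustrative, so check the virtual nodes documentation for the exact format:

```yaml
# vnodes/vnodes.conf -- declare the virtual node (illustrative sketch)
- hostname: win-server-1
  guid: 2f3b1c2e-9c4d-4a6b-8e01-0123456789ab   # any unique GUID for this virtual node

# then reference it from a go.d data collection job, e.g. go.d/windows.conf:
# jobs:
#   - name: win-server-1
#     vnode: win-server-1
#     url: http://203.0.113.10:9182/metrics
```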
Please read carefully through the following planned changes in our packaging, support of existing installs and required dependencies, as they may impact you. We are committed to providing the most up-to-date and reliable software, and we believe that the changes outlined below will help us achieve this goal more efficiently. As always, we are happy to provide any assistance needed during this transition.
As previously discussed on our blog, we will be changing how we package our external data collection plugins in the coming weeks. This change will be reflected in nightly builds a few weeks after this release, and in stable releases starting with v1.40.0. Please note that any patch releases for v1.39.0 will not include this change.
For detailed information on this change and how it may impact you, please refer to our blog post titled Upcoming Changes to Plugins in Native Packages.
Beginning shortly after this release, we will no longer be providing support for upgrading existing installs from versions prior to Netdata v1.11.0. It is highly unlikely that this change will affect any existing users, as v1.11.0 was released in 2018. However, this change is important in the long-term, as it will allow us to make our installer and updater code more portable.
In the near future, we will be making a significant change to the Netdata agent by making `zlib` a mandatory dependency. Although we have not treated it as a mandatory dependency in the past, a number of features that we consider core parts of the agent rely on `zlib`. Given that `zlib` is ubiquitous across almost every platform, there is little to no benefit to it being an optional dependency. As such, this change is unlikely to have a significant impact on the vast majority of our users.
The change will be implemented in nightly builds shortly after this release and in stable releases starting with v1.40.0. Please note that any patch releases for v1.39.0 will not include this change.
In v1.38, we introduced real-time functions that enable you to trigger specific routines to be executed by a given Agent on demand. Our initial function provided detailed information on currently running processes on the node, effectively replacing `top` and `iotop`.
We have now expanded the versatility of functions by incorporating configurable bar charts above the table displaying detailed data. These charts will be a standard feature in all future functions, granting you the ability to manipulate and analyze the retrieved data as needed.
Ensuring the reliable delivery of alert notifications is crucial for maintaining the reliability of your services. While individual Netdata agents were already able to send alert notifications to Atlassian's Opsgenie, Netdata Cloud adds centralized control and more robust retry and failure handling mechanisms to improve the reliability of the notification delivery process.
Business Plan users can now configure Netdata Cloud to send alert notifications to their Atlassian Opsgenie platform, using our centralized alert dispatching feature. This feature helps to ensure the reliable delivery of notifications, even in cases where individual agents are offline or experiencing issues.
We are committed to continually extending the capabilities of Netdata Cloud, and our focus on centralized alert dispatching is just one example of this. By adding more centralized dispatching options, we can further increase the reliability of notification delivery and help our users maintain the highest levels of service reliability possible.
The cgroups plugin reads information on Linux Control Groups to monitor containers, virtual machines and systemd services.
Previously, we identified individual Docker containers solely through their container ID, which may not always provide adequate information to identify potential issues in your infrastructure. We have now incorporated labels containing the image and the name of each container into all the collected metrics. This allows you to group and filter containers more efficiently and effectively, enabling you to quickly pinpoint and troubleshoot any issues that may arise.
We place great importance on delivering informative chart titles and descriptions, and our container CPU usage charts are no exception. Their titles explain that 100% utilization means 1 CPU core, which also means utilization can exceed 100% when you add up multiple cores. This logic is a bit foreign to Kubernetes monitoring, where `mCPU` is clearer, so we modified the chart title to state that 100% utilization is equivalent to 1000 mCPUs in k8s. We hope this change will help you better understand and interpret our container CPU usage charts.
Netdata monitors the Docker engine to automatically generate charts for container health and state, and image size and state.
Previously, this collector only retrieved aggregate metrics for the containers managed by the Docker engine. We started a major change in the way we collect metrics from Docker, so that we can now present the health of each container separately, or grouped by the container name and image labels. Some teething issues with this change were fixed quickly with #1160.
We recently increased the client version of our collector, which started causing issues with older Docker engine servers. We resolved these issues by adding client version negotiation to our Docker collector.
Monitoring Kubernetes clusters can be challenging due to the intricate nature of the infrastructure. Identifying crucial aspects to monitor necessitates considerable expertise, which Netdata provides out-of-the-box through dedicated collectors for every layer of your Kubernetes infrastructure.
One key area to keep an eye on is the overall cluster state, which we address using the Kubernetes Cluster State Collector. This collector generates automated dashboards for 37 metrics encompassing overall node and pod resource limits and allocations, as well as pod and container readiness, health, and container restarts. Initially, we displayed the rate of container restarts, as we did with numerous other events. However, restarts are infrequent occurrences in many infrastructures. Displaying the rate of sparse events can lead to suboptimal charts for troubleshooting purposes. To address this, we have modified the logic and now present the absolute count of container restarts for enhanced clarity.
Kubernetes monitoring also relies on the cgroups plugin for container and pod monitoring. To properly label k8s containers, the `cgroup` plugin makes calls to the k8s API server to retrieve pod metadata. In large clusters and under certain conditions (e.g. starting all the agents at once), these requests can potentially cause serious stress on the API server, or even a denial of service incident. To address this issue we have provided an alternative to querying the API server: we now allow querying the local kubelet server for the same information. However, since the kubelet's `/pods` endpoint is not well documented and should probably not be relied on (see 1, 2), we still query the API server by default. To switch to querying the kubelet, you can set the `child.podsMetadata.useKubelet` and `child.podsMetadata.kubeletUrl` variables that were added to our Helm chart.
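With the Helm chart, that switch is a values override along these lines (a sketch using the variable names above; the kubelet URL is an example and the kubelet's read-only port must be reachable on your nodes):

```yaml
# values.yaml override for the Netdata Helm chart
child:
  podsMetadata:
    useKubelet: true
    kubeletUrl: "http://127.0.0.1:10255"   # example; adjust to your kubelet endpoint
```

Apply it with something like `helm upgrade -f values.yaml netdata netdata/netdata`.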
The eBPF Collector offers numerous eBPF programs to assist you in troubleshooting and analyzing how applications interact with the Linux kernel. By utilizing tracepoints, trampoline, and kprobes, we gather a wide range of valuable data about the host that would otherwise be unattainable.
We recently addressed some significant issues with SIGABRT crashes on some systems. These crashes were caused by problems with memory allocation and deallocation functions, which resulted in unstable system behavior and prevented users from effectively monitoring their systems. To resolve these issues, we made some significant changes to our memory allocation and deallocation functions. Specifically, we replaced these functions with more reliable alternatives and began using vector allocation where possible.
We later identified issues with memory corruption, code ported to Oracle Linux, and `OOMKill` handling, which were all resolved with #14869.
Finally, issues with CPU usage on EC2 instances appeared in a nightly release and were resolved with changes that speed up the plugin cleanup process and also prevent some possible `SIGABRT` and `SIGSEGV` crashes.
These changes helped to reduce the likelihood of crashes occurring and improved the overall stability and reliability of the eBPF collector.
In some environments, the collector demanded substantial memory resources. To address this, we introduced charts to monitor its memory usage and implemented initial optimizations to decrease the RAM requirements. We will continue this work in future releases, to bring you even
more eBPF observability superpowers, with minimal resource needs.
The disk space plugin is designed to monitor disk space usage and inode usage for mounted disks in Linux. However, because `msdos`/`FAT` file systems don't use inodes, the plugin would often generate false positives, leading to inaccurate results. To fix this, we've disabled inode data collection for these file systems, using the `exclude inode metrics on filesystems` configuration option. This option has a default value of `msdosfs msdos vfat overlayfs aufs* *unionfs`.
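For example, extending the default exclusion list in netdata.conf might look like the following sketch (the section name follows the plugin's convention and the added `exfat` entry is only an example):

```conf
[plugin:proc:diskspace]
    exclude inode metrics on filesystems = msdosfs msdos vfat overlayfs aufs* *unionfs exfat
```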
Our proc plugin is responsible for gathering system metrics from various endpoints, including the `/proc` and `/sys` folders on Linux systems. It is an essential part of our monitoring tool, providing insights into system performance.

When running the Netdata agent in a Docker container, we encountered an issue where `zram` memory metrics were not being displayed. To solve this, we changed the `zram` collector code to respect the `/host` prefix added to directories mounted from the host into the container. Now, our monitoring tool can collect `zram` memory metrics even when running in a Docker container.
We also improved the `zfs` storage pool monitoring code, by adding the `suspended` state to the list of monitored states.
Finally, we added new metrics for BTRFS commits and device errors.
Our PostgreSQL collector is a highly advanced application collector, offering 70 out-of-the-box charts and 14 alerts to help users monitor their PostgreSQL databases with ease.
We recently discovered an issue in our documentation where we were instructing users to create a `netdata` user, even though our data collection job was using the `postgres` user. To address this issue, we have now added the `netdata` user as an additional option to our data collection jobs. With this enhancement, users can use either the `postgres` user or the newly added `netdata` user to collect data from their PostgreSQL databases, ensuring a more seamless and accurate monitoring experience.
Netdata automatically generates several charts for PostgreSQL write-ahead logs (WAL). We recently discovered that `wal_files_count`, `wal_archiving_files_count` and `replication_slot_files_count` require superuser access, so we added a check on whether the collection job has superuser access before attempting to collect these WAL metrics.
Finally, we fixed a bug with the bloat size calculation that used to erroneously return zeroes for some indexes.
The DNS query collector is a crucial tool that ensures optimal system performance by monitoring the liveness and latency of DNS queries. This tool is simple yet essential, as it attempts to resolve any hostname you provide and creates metrics for the response time and success or failure of each request/response.
Previously, we only measured the response time for successful queries. However, we have now enhanced the DNS query collector by collecting latency data for failed queries as well. This improvement enables us to identify and troubleshoot DNS errors more effectively, which ultimately leads to improved system reliability and performance.
Modern endpoint monitoring should include periodic checks on all your internal and public web applications, regardless of their traffic patterns. Automated and continuous tests can proactively identify issues, allowing them to be resolved before any users are affected.
Netdata's HTTP endpoint collector is a powerful tool that enables users to monitor the response status, latency, and content of any URL provided. While the collector has always supported basic authentication via a provided username and password, we have recently introduced a new enhancement that allows for more complex authentication flows. With the addition of the ability to include a cookie in the request, users can now authenticate and monitor more advanced applications, ensuring more comprehensive and accurate monitoring capabilities.
All you need to do is add `cookie: <filename>` to your data collection job, and the collector will issue the request with the contents of that file.
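As a hypothetical sketch (the job name, URL, credentials and file path are all placeholders, and option spellings should be verified against the collector's documentation), a job using the new option might look like:

```yaml
# go.d/httpcheck.conf -- illustrative job only
jobs:
  - name: my_app                        # placeholder job name
    url: https://app.example.com/login  # placeholder URL
    username: monitor                   # existing basic-auth options
    password: secret
    cookie: /etc/netdata/my_app.cookie  # file whose contents are sent as the request cookie
```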
Our Elasticsearch Collector seamlessly generates visualizations for 47 metrics, drawing from 4 endpoints of the renowned search engine.
The original Elasticsearch project evolved into an open-source initiative called OpenSearch, spearheaded by Amazon. However, our collector did not automatically connect to OpenSearch instances due to their default security settings with TLS and authentication.
Although it is possible to disable security by adding `plugins.security.disabled: true` to `/etc/opensearch/opensearch.yml`, which allows the default data collection job to function, we deemed it more prudent to introduce an OpenSearch-specific data collection job. This addition explicitly enables TLS and highlights the necessity of a username and password for secure access.
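A hypothetical OpenSearch job might therefore look like the following sketch (URL and credentials are placeholders; verify option names against the collector's documentation):

```yaml
# go.d/elasticsearch.conf -- illustrative job only
jobs:
  - name: opensearch_local
    url: https://127.0.0.1:9200  # OpenSearch listens with TLS by default
    tls_skip_verify: yes         # default install uses a self-signed certificate
    username: admin              # placeholder credentials
    password: admin
```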
`Dnsmasq` is a lightweight and easy-to-configure DNS forwarder that is specifically designed to offer DNS, DHCP, and TFTP services to small-scale networks. Netdata provides comprehensive monitoring of `Dnsmasq` by collecting metrics for both the DHCP server and DNS forwarder.
Recently, we made a minor but important improvement to the order in which the DNS forwarder cache charts are displayed. With this update, the most critical information regarding cache utilization is now presented first, providing users with more efficient access to essential data. By constantly improving and refining our monitoring capabilities, we aim to provide our users with the most accurate and useful insights into their network performance.
Envoy is an L7 proxy and communication bus designed for large, modern, service-oriented architectures.
Our new Envoy collector automatically generates charts for over 50 metrics.
The files and directories collector monitors existence, last update and size of any files or directories you specify.
The collector was not sanitizing file and directory names, causing issues with metric collection. The issue was specific to paths with
spaces in them and is now fixed.
The Netdata agent includes a RabbitMQ collector that tracks the performance of this open-source message broker. This collector queries RabbitMQ's HTTP endpoints, including `overview`, `node`, and `vhosts`, to provide you with detailed metrics on your RabbitMQ instance. Recently, we fixed an issue that prevented our collector from properly collecting metrics on 32-bit systems.
The charts.d plugin is an external plugin for Netdata. It's responsible for orchestrating data collection modules written in `BASH` v4+ to gather and visualize metrics.

Recently, we fixed an issue with the plugin's restarts that sometimes caused the connection to Netdata to be lost. Specifically, there was a chance for charts.d processes to die at the exact moment the Netdata binary tried to read from them using `fgets`. This caused Netdata to hang, as `fgets` never returned. To fix this issue, we added a "last will" `EOF` to the exit process of the plugin. This ensures that the `fgets` call has something to receive before the plugin exits, preventing Netdata from hanging.
With this issue resolved, the charts.d plugin can now continue to provide seamless data collection and visualization for your Netdata instance without any disruptions.
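The essence of the fix can be illustrated with a small Python sketch (hypothetical, not the actual charts.d code): a blocking read on a pipe returns as soon as the child emits its final "last will" line, so the reader never hangs waiting on a dying writer.

```python
import subprocess
import sys

# Child process that prints a final line before exiting, mimicking the
# plugin's "last will": the parent's blocking readline() is guaranteed
# to receive something and return, instead of hanging.
child = subprocess.Popen(
    [sys.executable, "-c", "print('QUIT')"],
    stdout=subprocess.PIPE,
    text=True,
)
last_line = child.stdout.readline().strip()  # returns "QUIT"
child.wait()
```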
Our anomaly collector is a powerful tool that uses the `PyOD` library in Python to perform unsupervised anomaly detection on your Netdata metrics. With this collector, you can easily identify unusual patterns in your data that might indicate issues with your system or applications.
Recently, we discovered an issue with the collector's Python version check. Specifically, the check was incorrectly rejecting Python 3.10 and higher versions, because casting the version string with `float()` turned the minor version "10" into "1". This resulted in an inaccurate check that prevented some users from using the anomaly collector with the latest versions of Python.
To resolve this issue, we fixed the Python version check to work properly with Python 3.10 and above. With this fix in place, all users can now take advantage of the anomaly collector's powerful anomaly detection capabilities regardless of the version of Python they are using.
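The pitfall is easy to reproduce (a sketch, not the collector's exact code):

```python
# Casting the version string with float() turns "3.10" into 3.1,
# which compares *lower* than 3.8 -- so Python 3.10+ looked older
# than Python 3.8 and was wrongly rejected.
buggy_ok = float("3.10") >= 3.8  # False, despite 3.10 being newer
# Comparing (major, minor) tuples handles two-digit minors correctly.
fixed_ok = (3, 10) >= (3, 8)     # True
```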
Pandas is a de-facto standard in reading and processing most types of structured data in Python. If you have metrics appearing in a CSV, JSON, XML, HTML, or other supported format,
either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata by leveraging the Pandas collector. We fixed an issue with the logging of some collector errors.
Our Generic Prometheus Collector gathers metrics from any Prometheus endpoint that uses the OpenMetrics exposition format.
In version 1.38, we made some significant changes to how we generate charts with labels per label set. These changes resulted in a drastic increase in the length of generated chart IDs, which posed some challenges for users with a large number of label key/value pairs. In some cases, the length of the `type.id` string could easily exceed the previous limit of 200 characters, which prevented users from effectively monitoring their systems.
To resolve this issue, we increased the chart ID limit from 200 to 1000 characters. This change provides you with more flexibility when labeling your charts and ensures that you can effectively monitor your systems regardless of the number of label key/value pairs you use.
We recently made some significant improvements to our email notification templates. These changes include adding the chart context, Space name, and War Room(s) with navigation links. We also updated the way the subject is built to ensure it's consistent with our other templates.
These improvements help to provide users with more context around their alert notifications, making it easier to quickly understand the nature of the issue and take appropriate action. By including chart context, Space name, and War Room(s) information, users can more easily identify the source of the problem and coordinate a response with their team members.
We've also enhanced our personal notification level settings to include an "Unreachable only" option. This option allows you to receive only reachability notifications for nodes disconnected from Netdata Cloud. Previously this capability was only available combined with "All alerts". With this enhancement, you can now further customize your notification settings to more effectively manage your alerts and reduce notification fatigue.
The Netdata agent can now send alerts to ntfy servers. `ntfy` (pronounced "notify") is a simple HTTP-based pub-sub notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, entirely without sign-up, cost or setup. It's also open source if you want to run your own server.

You can learn how to send `ntfy` alert notifications from a Netdata agent in our documentation.
Netdata Cloud manages millions of alert state transitions daily. These transitions are transmitted from each connected agent through the Agent-Cloud Link (ACLK). As with any communication channel, occasional data loss is unavoidable. Therefore, swiftly detecting missing transitions and reconciling discrepancies is crucial for maintaining real-time observability, regardless of scale.
We are thrilled to introduce a significant enhancement to our alert synchronization protocol between Netdata Agents and Netdata Cloud. This upgrade ensures faster transmission of alert states and prompt resolution of any temporary inconsistencies.
In the past, whenever a state transition change occurred, a message with a sequencing number was sent from the Agent to the Cloud. This method resulted in numerous read/write operations, generating excessive load on our Alerts database in the Cloud. Furthermore, it assumed that all messages had to be processed sequentially, imposing unnecessary constraints and restricting our scaling options for message brokers.
Our revamped protocol implements a far more efficient method. Instead of relying on sequencing numbers, we now use a checksum value calculated by both the Cloud and the Agent to verify synchronization. This approach not only lessens the burden on our Alerts database but also eliminates the dependency on sequential message processing, permitting out-of-order message delivery.
The enhanced synchronization and scaling capabilities allow us to address certain edge cases where users experienced out-of-sync alerts on the Cloud. Consequently, we can now deliver a superior service to our users.
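Conceptually, an order-independent checksum is what makes this possible. The sketch below is hypothetical, not the actual protocol: it hashes each (alert, state) pair and XORs the digests, so both sides arrive at the same value for the same set of transitions regardless of the order in which messages were delivered.

```python
import hashlib

def alerts_checksum(alert_states: dict) -> bytes:
    """Order-independent digest of alert states (illustrative only)."""
    acc = bytes(32)
    for alert_id, state in alert_states.items():
        digest = hashlib.sha256(f"{alert_id}:{state}".encode()).digest()
        # XOR-combining per-item hashes makes the result independent
        # of iteration/delivery order.
        acc = bytes(a ^ b for a, b in zip(acc, digest))
    return acc

# The same transitions produce the same checksum in any order, so
# out-of-order delivery does not break the synchronization check.
agent_side = alerts_checksum({"cpu_usage": "WARNING", "disk_full": "CLEAR"})
cloud_side = alerts_checksum({"disk_full": "CLEAR", "cpu_usage": "WARNING"})
```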
We're committed to continually improving our [Events Feed](https://learn.netdata.cloud/docs/troubleshooting-and-machine-learning/events-feed), which we introduced in version 1.38. We've made several user experience (UX) improvements to make the Events Feed even more useful for troubleshooting purposes.
One of the key improvements we made was the addition of a bar chart showing the distribution of events over time. This chart helps users quickly identify interesting time periods to focus on during troubleshooting. By visualizing the distribution of events across time, users can more easily spot patterns or trends that may be relevant to their troubleshooting efforts.
These improvements help to make the Events Feed an even more valuable tool, helping you troubleshoot issues more quickly and effectively. We will continue to explore ways to enhance the Events Feed and other features of our monitoring tool to provide the best possible user experience.
As part of our Machine Learning Roadmap, we have been working to persist trained models to the database, so that the models used in Netdata's native anomaly detection capabilities are not lost on restarts. This is an important step toward extending the ML defaults to train on the last 24 hours of data in the near future (as discussed in this blog post). It will help improve anomaly detection performance, reducing false positives and making anomaly rates more robust to system and Netdata restarts, where previously models had to be fully re-trained.
This is an area of quite active development right now, and there are still a few more pieces of work to be done in coming releases. If interested, you can follow along with any `area/ML` issues in `netdata/netdata-cloud` or `netdata/netdata` and check out active PRs here.
We have updated the bundled version of makeself used to create static builds, which was almost six years out of date, to sync it with the latest upstream release. This update should significantly improve compatibility on more exotic Linux systems.
We have also updated the metadata embedded in the archive to better reflect the current state of the project. This ensures that the project is up to date and accurately represented, providing users with the most relevant and useful information.
You can find more details about these updates in our GitHub repository.
Previously, the only way to get a default `netdata.conf` file was to start the agent and query the `/netdata.conf` API endpoint. This worked well enough for checking the effective configuration of a running agent, but it also meant that `edit-config netdata.conf` didn't work as users expect if there is no `netdata.conf` file. It also meant that you couldn't check the effective configuration if you had the web server disabled.

We have now added the `netdatacli dumpconfig` command, which outputs the current `netdata.conf`, exactly like the web API endpoint does. In the future we will look into making the `edit-config` command a bit smarter, so that it can provide the option to automatically retrieve the live `netdata.conf`.
We're excited to announce the completion of a radical overhaul of our documentation site, available at learn.netdata.cloud. Our new site features a much clearer organization of content, a streamlined publishing process, and a powerful Google search bar that searches all available resources for articles matching your queries.
We've restructured and improved dozens of articles, updating or eliminating obsolete content and deduplicating similar or identical content. These changes help to ensure that our documentation remains up-to-date and easy to navigate.
Even seasoned Netdata power users should take a look at our new Deployment in Production section, which includes features and suggestions that you may have missed in the past.
We're committed to maintaining the highest standards for our documentation and invite our users to assist us in this effort. The "Edit this page" button, available on all published articles, allows you to suggest updates or improvements by directly editing the source file.
We hope that our new documentation site helps you more effectively use and understand our monitoring tool, and we'll continue to make improvements and updates based on your feedback.
The following items will be removed in our next minor release (v1.40.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/nvidia_smi | collector | go.d/nvidia_smi |
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by |
---|---|---|
python.d/ntpd | collector | go.d/ntpd |
python.d/proxysql | collector | go.d/proxysql |
python.d/rabbitmq | collector | go.d/rabbitmq |
Join the Netdata team on the 9th of May, at 16:00 UTC for the Netdata Release Meetup.
Together we'll cover:
RSVP now - we look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
Help us make Netdata even greater! We are trying to gather valuable information that is key for us to better position Netdata and ensure we keep bringing more value to you.
We would appreciate if you could take some time to answer this short survey (4 questions only).
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
- `--release-channel` and `--nightly-channel` options in `kickstart.sh`.

Published by netdatabot over 1 year ago
The first patch release for v1.38 updates the version of OpenSSL included in our static builds and Docker images to v1.1.1t, to resolve a few moderate security vulnerabilities in v1.1.1n.
The patch also includes the following minor bug fixes:
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by netdatabot over 1 year ago
DBENGINE v2
The new open-source database engine for Netdata Agents, offering huge performance, scalability and stability improvements, with a fraction of the memory footprint!
FUNCTION: Processes
Netdata beyond metrics! We added the ability for runtime functions, which can be implemented by any data collection plugin, to offer unlimited visibility into anything, even non-metric data, that can be valuable while troubleshooting.
Events Feed
Centralized view of Space and Infrastructure level events about topology changes and alerts.
NOTIFICATIONS: Slack, PagerDuty, Discord, Webhooks
Netdata Cloud now supports Slack, PagerDuty, Discord, Webhooks.
Role-based access model
Netdata Cloud supports more roles, offering finer control over access to infrastructure.
Integrations
New and improved plugins for data collection, alert notifications, and data exporters.
Health Monitoring and Alerts Notification Engine
Changes to the Netdata Health Monitoring and Notifications engine.
We are keeping our codebase healthy by removing features that are end-of-life. Read the deprecation notice to check if you are affected.
We completely reworked our custom-made, time series database (dbengine), resulting in stunning improvements to performance, scalability, and stability, while at the same time significantly reducing the agent memory requirements.
On production-grade hardware (e.g. 48 threads, 32GB ram) Netdata Agent Parents can easily collect 2 million points/second while servicing data queries for 10 million points / second, and running ML training and Health querying 1 million points / second each!
For standalone installations, the 64-bit version of Netdata runs stable at about 150MB RAM (Resident Set Size + SHARED), with everything enabled (the 32-bit version at about 80MB RAM, again with everything enabled).
We introduced a new journal file format (`*.jnfv2`) that is way faster to initialize during loading. This file is used as a disk-based index for all metric data available on disk (metrics retention), reducing the memory requirements of dbengine by about 80%.
3 new caches (main cache, open journal cache, extent cache) have been added to speed up queries and control the memory footprint of dbengine.
These caches combined offer excellent caching even for the most demanding queries. The cache hit ratio now rarely falls below 50%, while for the most common use cases it is constantly above 90%.
The 3 caches support memory ballooning and autoconfigure themselves, so they don't require any user configuration in `netdata.conf`.
At the same time, their memory footprint is predictable: twice the memory of the currently collected metrics, across all tiers. The exact equation is:

`METRICS x 4KB x (TIERS - 1) x 2 + 32MB`

Where:

- `METRICS x 4KB x TIERS` is the size of the concurrently collected metrics.
- `4KB` is the page size for each metric.
- `TIERS` is whatever is configured for `[db].storage tiers` in `netdata.conf`; use `(TIERS - 1)` when using 3 tiers or more (3 is the default).
- `x 2 + 32MB` is the commitment of the new dbengine.

The new combination of caches makes Netdata's memory footprint independent of retention! The amount of metric data on disk no longer affects the memory footprint of Netdata; it can be just a few MB, or even hundreds of GB!
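As a worked example of the memory commitment formula, with illustrative numbers (not a measured deployment): a parent concurrently collecting 500,000 metrics across the default 3 tiers would commit roughly:

```python
metrics = 500_000  # concurrently collected metrics (example figure)
tiers = 3          # default number of storage tiers
page = 4 * 1024    # 4KB page per metric

# METRICS x 4KB x (TIERS - 1) x 2 + 32MB
commitment = metrics * page * (tiers - 1) * 2 + 32 * 1024**2
print(round(commitment / 1024**3, 2), "GiB")  # ~7.66 GiB
```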
The caches try to keep the memory footprint at 97% of the predefined size (i.e. twice the concurrently collected metrics size). They automatically enter a survival mode when memory goes above this, by paralleling LRU evictions and metric data flushing (saving to disk). This system has 3 distinct levels of operation:
The caches are now shared across all dbengine instances (all tiers).
LRU evictions are now smarter: the caches know when metrics are referenced by queries or by collectors and favor the ones that have been used recently by data queries.
The new dbengine query engine is totally asynchronous, working in parallel while other threads are processing metric points. Chart and Context queries, but also Replication queries, now take advantage of this feature and ask dbengine to preload metric data in advance, before they are actually needed. This makes Netdata amazingly fast to respond to data queries, even on busy parents that are collecting millions of points at the same time.
At the same time we support prioritization of queries based on their nature:
Starvation is prevented by allowing 2% of lower priority queries for each higher priority queue. So, even when backfilling is performed full speed at 15 million points per second, user queries are satisfied up to 300k points per second.
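The admission scheme can be sketched as follows (an illustrative sketch only, not Netdata's actual scheduler; the function and queue names are hypothetical). After every `low_every` items served from a queue, about a 2% share when `low_every` is 50, one item from the next lower-priority queue is admitted:

```python
from collections import deque

def drain(queues, low_every=50):
    """Serve items from priority queues (index 0 = highest priority).

    Starvation-prevention sketch: after every `low_every` items served
    from a queue (~2% share), one item from the next lower-priority
    queue is admitted, so full-speed backfilling never locks out user
    queries entirely.
    """
    served = []
    counters = [0] * len(queues)
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                continue
            served.append(q.popleft())
            counters[i] += 1
            nxt = i + 1
            # grant the lower-priority queue its small share
            if nxt < len(queues) and queues[nxt] and counters[i] % low_every == 0:
                served.append(queues[nxt].popleft())
            break
    return served

order = drain([deque(["high"] * 100), deque(["low"] * 5)])
```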
Internally all caches are partitioned to allow parallelism up to the number of cores the system has available. On busy parents with a lot of data and capable hardware it is now easy for Netdata to respond to queries using 10 million points per second.
At the same time, extent deduplication has been added, to prevent the unnecessary loading and uncompression of an extent multiple times in a short time. This works like this: while a request to load an extent is in flight, and up to the time the actual extent has been loaded and uncompressed in memory, more requests to extract data from it can be added to the same in-flight request. Since dbengine tries to keep metrics of the same charts in the same extent, combined with the feature we added to prepare multiple queries ahead of time, extent deduplication now provides a hit ratio above 50% for normal chart and context queries!
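The in-flight deduplication pattern can be sketched in Python (an illustrative sketch, not dbengine's actual implementation; all names are hypothetical):

```python
import threading
import time

class ExtentLoader:
    """Deduplicate concurrent loads of the same extent: while a load
    is in flight, further requests for that extent wait for the same
    result instead of loading and uncompressing the extent again."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._lock = threading.Lock()
        self._in_flight = {}  # extent_id -> (event, result holder)
        self.loads = 0        # number of real loads performed

    def get(self, extent_id):
        with self._lock:
            entry = self._in_flight.get(extent_id)
            owner = entry is None
            if owner:
                entry = (threading.Event(), {})
                self._in_flight[extent_id] = entry
        event, result = entry
        if owner:
            result["data"] = self._load_fn(extent_id)
            self.loads += 1
            with self._lock:
                del self._in_flight[extent_id]
            event.set()  # wake up every request piggybacking on this load
        else:
            event.wait()
        return result["data"]

def slow_load(extent_id):
    time.sleep(0.2)  # simulate disk read + uncompression
    return f"payload-{extent_id}"

loader = ExtentLoader(slow_load)
results = []
threads = [threading.Thread(target=lambda: results.append(loader.get("extent-7")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# all eight readers received the payload from a single real load
```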
A new metrics registry has been added that maintains an index of all metrics in the database, for all tiers combined.
Initialization of the metrics registry is fully multithreaded, utilizing all the resources available on busy parents, improving start-up times significantly.
This metrics registry is now the only memory requirement related to retention. It keeps in memory the first and last timestamps, along with some additional metadata, of all the metrics for which retention is available on disk. The metrics registry needs about 150 bytes per metric.
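Using the ~150 bytes per metric figure above, the registry cost stays modest even at scale (the metric count is purely illustrative):

```python
metrics_with_retention = 1_000_000  # unique metrics on disk (example figure)
bytes_per_metric = 150              # approximate registry entry size
registry = metrics_with_retention * bytes_per_metric
print(round(registry / 1024**2), "MiB")  # ~143 MiB
```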
The biggest change in streaming is that the parent agents now inherit the clock of their children, for their data. So, all timestamps about collected metrics reflect the timestamps on the children that collected them. If a child clock is ahead of the parent clock, the parent will still accept collected points for the child, and it will process them and save them, but on parent node restart the parent will refuse to load future data about a child database. This has been done in such a way that if the clock of the child is fixed (by adjusting it backwards), after a parent restart the child will be able to push fresh metrics to the parent again.
Related to the memory footprint of the agent, streaming buffers were ballooning up to the configured size and remained like that for the lifetime of the agent. Now the streaming buffers are increased to satisfy the demand, but then they are again decreased to a minimum size. On busy parents this has a significant impact on the overall memory footprint of the agent (10MB buffer per node x 200 child nodes on this parent, is 2GB - now they return to a few KB per node).
Active-Active parent clusters are now more reliable by detecting stale child connections and disconnecting them.
Several child to parent connection issues have been solved.
Replication now uses the new features of dbengine and pipelines query preparation and metric data loading, drastically improving its performance. At the same time, the replication step is now automatically adjusted to the page size of dbengine, allowing replication to use the data already loaded by dbengine and saving resources on the next iteration.
A single replication thread can now push metrics at a rate of above 1 million points / second on capable hardware.
Solved an issue with replication, where if the replicated time-frame had a gap at the beginning of the replicated period, then no replication was performed for that chart. Now replication skips the gap and continues replicating all the points available.
Replication now skips empty points. The new dbengine has controls in place to insert gaps into the database where metrics are missing. Utilizing this feature, we have stopped replicating empty points, saving bandwidth and processing time.
Replication used to increase the streaming buffers above the configured size when big replication messages had to fit in them. Now, instead of increasing the streaming buffers, we interrupt the replication query at a point where the buffer is sufficient to accept the message. When queries are interrupted like this, the remaining query is repeated until all of it is executed.
Replication and data collection are now synchronized atomically at the sending side, to ensure that the parent will not have gaps at the point the replication ends and streaming starts.
Replication had discrepancies when the db mode was not `dbengine`. To solve these discrepancies, combined with the storage layer API changes introduced by the new dbengine, we had to rewrite these modes to be compliant. Replication can now function properly, without gaps at the parents, even when the child has db mode `alloc`, `ram`, `save` or `map`.
Several improvements have been performed to speed up agent startup and shutdown. Combined with the new dbengine, now Netdata starts instantly on single node installations and uses just a fraction of the time that was needed by the previous stable version, even on very busy parents with huge databases (hundreds of GB).
Special care has been taken to ensure that during shutdown the agent prioritizes dbengine flushing to disk of any unsaved data. So, now during shutdown, data collection is first stopped and then the hot and dirty pages of the main cache are flushed to disk before proceeding with other cleanup activities.
After the groundwork done on the Netdata Agent in v1.37.0, Netdata Agent collectors are able to expose functions that can be executed on-demand, at run-time, by the data collecting agent, even when queries are executed via a Netdata Agent Parent. We are now utilizing this capability to provide the first of many powerful features via the Netdata Cloud UI.
Netdata Functions on Netdata Cloud allow you to trigger specific routines to be executed by a given Agent on request. These routines can range from a simple reader that fetches real-time information to help you troubleshoot (like the list of currently running processes, currently running db queries, currently open connections, etc.), to routines that trigger an action on your behalf (restart a service, rotate logs, etc.), directly on the node. The key point is to remove the need to open an ssh connection to your node to execute a command like `top` while you are troubleshooting.
The routines are triggered directly from the Netdata Cloud UI, with the request going through the secure Agent-Cloud Link (ACLK) already established by the agent. Moreover, unlike many of the commands you'd issue from the shell, Netdata Functions come with powerful capabilities like auto-refresh, sorting, filtering, search and more! And, as with everything about Netdata, they are fast!
At the moment, just one: displaying detailed information on the currently running processes on the node, replacing `top` and `iotop`. The function is provided by the apps.plugin collector.
The nitty-gritty details are in PR "Allow netdata plugins to expose functions for querying more information about specific charts" (#13720). In short:
- Functions are not exposed through the standard `data` query call that returns the metric values.
- The available functions can be listed via the `/api/v1/functions` endpoint (see it in swagger). However, for security reasons, the call that executes a function is protected, meaning it is disabled from the HTTP API and will return a 403. Only the cloud can call the particular endpoint, and only via the secure and protected Agent-Cloud Link (ACLK).
- The `plugins.d` API has for the first time become bidirectional, precisely to support the daemon querying this type of information.

The definitions of functions are transmitted to parent nodes via streaming, so that the parents know all the functions available on all child databases they maintain. This works even across multiple levels of parents.
When a parent node is connected to Netdata Cloud, it is capable of triggering the call to the respective child node, to run any of its functions. When multiple parents are involved, all of them will propagate the request to the right child to execute the function.
Since these functions are able to execute routines on the node and expose information beyond metric data (even action buttons could be implemented using functions), our concern is to ensure no sensitive information or disruptive actions are exposed through the unprotected Agent's API.
Since Netdata Cloud provides all the infrastructure to authenticate users, assign roles to them, and establish a secure communication channel to Netdata Agents (the ACLK), this concern is addressed. Netdata Cloud is free forever for everyone, provides a lot more than just the agent dashboard, and is our main focus of development for new visualization features.
For even more details please check our docs.
If you have ideas or requests for other functions:
Coming by Feb 15th
The Events feed is a powerful new feature that tracks events that happen on your infrastructure, or in your Space. The feed lets you investigate events that occurred in the past, which is obviously invaluable for troubleshooting. Common use cases are ones like when a node goes offline, and you want to understand what events happened before that. A detailed event history can also assist in attributing sudden pattern changes in a time series to specific changes in your environment.
We start from humble beginnings, capturing topology events (node state transitions) and alert state transitions. We intend to expand the events we capture to include infrastructure changes like deployments or services starting/stopping and we plan to provide a way to display the events in the standard Netdata charts.
⚠️ Based on your Space's plan, different allowances are defined to query past data. The length of the history is provided in this table:
Domains of events | Community | Pro | Business |
---|---|---|---|
Topology events Node state transition events, e.g. live or offline. | 4 hours | 7 days | 14 days |
Alert events Alert state transition events, can be seen as an alert history log. | 4 hours | 7 days | 90 days |
Event name | Description |
---|---|
Node Became Live | The node is collecting and streaming metrics to Cloud. |
Node Became Stale | The node is offline and not streaming metrics to Cloud. It can show historical data from a parent node. |
Node Became Offline | The node is offline, not streaming metrics to Cloud and not available in any parent node. |
Node Created | The node is created, but it is still Unseen on Cloud; it hasn't established a successful connection yet. |
Node Removed | The node was removed from the Space through the Delete action, if it becomes Live again it will be automatically added. |
Node Restored | The node is restored, if node becomes Live after a remove action it is re-added to the Space. |
Node Deleted | The node is deleted from the Space; think of this as a hard delete, and the node won't be re-added to the Space if it becomes live. |
Agent Claimed | The agent was successfully registered to Netdata Cloud and is able to connect. |
Agent Connected | The agent connected to the Netdata Cloud MQTT server (Agent-Cloud Link established). |
Agent Disconnected | The agent disconnected from the Netdata Cloud MQTT server (Agent-Cloud Link severed). |
Agent Authenticated | The agent successfully authenticated itself to Netdata Cloud. |
Agent Authentication Failed | The agent failed to authenticate itself to Netdata Cloud. |
Event name | Description |
---|---|
Node Alert State Changed | These are node alert state transition events and can be seen as an alert history log. You can see transitions to or from any of these states: Cleared, Warning, Critical, Removed, Error or Unknown |
Coming by Feb 15th
Every Netdata Agent comes with hundreds of pre-installed health alerts designed to notify you when an anomaly or performance issue affects your node or the applications it runs. All these events, from all your nodes, are centralized at Netdata Cloud.
Before this release, Netdata Cloud only dispatched centralized email alert notifications to your team whenever an alert entered a warning, critical, or unreachable state. However, the agent supports dozens of notification delivery methods that we hadn't yet provided via the cloud.
We are now adding more alert notification integration methods to Netdata Cloud. We categorize them similarly to our subscription plans: Community, Pro, and Business. In this release, we added Discord (Community plan), webhook (Pro plan), PagerDuty and Slack (Business plan).
⚠️ The availability of Netdata Cloud notification methods depends on your subscription plan.
Notification methods | Community | Pro | Business |
---|---|---|---|
Email | ✔️ | ✔️ | ✔️ |
Discord | ✔️ | ✔️ | ✔️ |
Webhook | - | ✔️ | ✔️ |
PagerDuty | - | - | ✔️ |
Slack | - | - | ✔️ |
Notification integrations are classified based on whether they need to be configured per user (Personal notifications), or at the system level (System notifications).
Email notifications are Personal, meaning that administrators can enable or disable them globally, and each user can enable or disable them for themselves, per room. Email notifications are sent to the destination of the channel, which is a user-specific attribute, e.g. the user's e-mail address. Users manage their specific configuration for the Space / Room(s) and the desired notification level via their User Profile page, under Notifications.
All other introduced methods are classified as System, as the destination is a target that usually isn't specific to a single user, e.g. a Slack channel. These notification methods allow fine-grained rule settings to be made by administrators, who can specify different targets depending on Rooms or notification-level settings.
For more details please check the documentation here.
Coming by Feb 15th
Netdata Cloud already provides a role-based access mechanism that allows you to control which functionalities in the app users can access.
Each user can be assigned only one role, which fully specifies all the capabilities they are afforded.
With the advent of the paid plans we revamped the roles to cover needs expressed by our users, like providing more limited access to your customers, or being able to join any room. We also aligned the offered roles to the target audience of each plan. The end result is the following:
Role | Community | Pro | Business |
---|---|---|---|
Administrators This role allows users to manage Spaces, War Rooms, Nodes, Users, and Plan & Billing settings. Provides access to all War Rooms in the space. | ✔️ | ✔️ | ✔️ |
Managers This role allows users to manage War Rooms and Users. Provides access to all War Rooms and Nodes in the space. | - | - | ✔️ |
Troubleshooters This role is for users focused on using Netdata to troubleshoot, not manage entities. Provides access to all War Rooms and Nodes in the space. | - | ✔️ | ✔️ |
Observers This role is for read-only access, with restricted access to explicitly defined War Rooms and only the Nodes that appear in those War Rooms. 💡 Ideal for restricting your customers' access to their own dedicated rooms. | - | - | ✔️ |
Billing This role is for users who only need to manage billing options and see invoices, with no further access to the system. | - | - | ✔️ |
The proc plugin gathers metrics from the `/proc` and `/sys` folders on Linux systems, along with a few other endpoints, and is responsible for the bulk of the system metrics collected and visualized by Netdata. It collects CPU, memory, disks, load, networking, mount points, and more.
We added a "cpu" label to the per core utilization % charts. Previously, the only way to filter or group by core was to use the "instance", i.e. the chart name. The new label makes the displayed dimensions much more user-friendly.
We fixed the issues we had with collection of CPU/memory metrics when running inside an LXC container as a `systemd` service.
We also fixed the missing network stack metrics, when IPv6 is disabled.
Finally, we improved how the `loadavg` alerts behave when the number of processors is 0 or unknown.
The apps plugin breaks down system resource usage by processes, users, and user groups, reading the whole process tree and collecting resource usage information for every process found running.
We fixed the `nodejs` application group, whose `node` pattern incorrectly matched `node_exporter`. The rule now is that the process must be called `node` to be included in that group.
We also added a telegraf application group.
The cgroups plugin reads information on Linux Control Groups to monitor containers, virtual machines and systemd services.
The "net" section in a cgroups
container would occasionally pick the wrong / random interface name to display in the navigation menu. We removed the interface name from the cgroup
"net" family. The information is available in the cloud as labels and on the agent as chart names and ids.
The eBPF plugin helps you troubleshoot and debug how applications interact with the Linux kernel.
We improved the speed and resource impact of the collector shutdown, by reducing the number of threads running in parallel.
We fixed a bug with eBPF routines that would sometimes cause kernel panic and system reboot on RedHat 8.* family OSs. #14090, #14131
We fixed an `ebpf.d` crash: a `sysmalloc` assertion failed, then the process was killed with SIGTERM.
We fixed a crash when building eBPF while using a memory address sanitizer.
The eBPF collector also creates charts for each running application through an integration with `apps.plugin`. This integration helps you understand how specific applications interact with the Linux kernel. In systems with many VMs (like Proxmox), this integration can cause a large load. We used to have the integration enabled by default, with the ability to disable it from `ebpf.d.conf`. We have now done the opposite: the integration is disabled by default, with the ability to enable it. #14147
We have been making tremendous improvements to how we monitor Windows hosts. The work will be completed in the next release. For now, we have done some preparatory work: adding more info to existing charts, and adding metrics for MS SQL Server, Active Directory, ADFS, and ADCS (IIS metrics were added in 1.37).
We also reorganized the navigation menu, so that Windows application metrics don't appear under the generic "WMI" category, but in their own category, just like Linux applications.
We invite you to try out these collectors, either from a remote Linux machine, or using our new MSI installer, which however is not suitable for production. Your feedback will be really appreciated, as we invest in making Windows monitoring a first-class citizen of Netdata.
Our Generic Prometheus Collector gathers metrics from any Prometheus endpoint that uses
the OpenMetrics exposition format.
To allow better grouping and filtering of the collected metrics we now create a chart with labels per label set.
We also fixed the handling of Summary/Histogram NaN values.
The TCP endpoint (portcheck) collector monitors TCP service availability and response time.
We enriched the `portcheck` alarms with labels that show the problematic host and port.
The HTTP endpoint monitoring collector (httpcheck) monitors endpoint availability and response time.
We enriched the alerts with labels that show the slow or unavailable URL relevant to the alert.
The new host reachability collector replaced `fping` in v1.37.0. We removed the deprecated `fping.plugin`, in accordance with the v1.37.0 deprecation notice.
The RabbitMQ collector monitors the open source message broker by querying its `overview`, `node`, and `vhosts` HTTP endpoints.
We added monitoring of the RabbitMQ queues that was available in the older Python module, and fixed an issue with the new metrics.
We monitor the MongoDB NoSQL database via `serverStatus` and `dbStats`.
To allow better grouping and filtering of the collected metrics, we now create a chart per database, replica set member, and shard, and added additional metrics. We also improved the `cursors_by_lifespan_count` chart dimension names, to make them clearer.
Our powerful PostgreSQL database collector has been enhanced with an improved WAL replication lag calculation and better support for versions before 10.
The Redis collector monitors the in-memory data structure store via its INFO ALL command.
We now support password-protected Redis instances, by allowing users to set the username/password in the collector configuration.
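As a sketch, setting credentials in the go.d Redis collector configuration might look like the following (the exact keys are assumptions based on go.d module conventions; the address and credentials are placeholders):

```yaml
# go.d/redis.conf -- hypothetical job definition
jobs:
  - name: local
    address: 'redis://127.0.0.1:6379'
    username: 'monitor'   # placeholder user
    password: 's3cret'    # placeholder password
```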
The Consul collector is production ready! Consul by HashiCorp is a powerful and complex identity-based networking solution, which is not trivial to monitor. We were lucky to have the assistance of HashiCorp itself in this endeavor, which resulted in a monitoring solution of exceptional quality. Look for joint blog posts and announcements in the coming weeks!
The NGINX Plus collector monitors the load balancer, API gateway, and reverse proxy built on top of NGINX, by utilizing its Live Activity Monitoring capabilities.
We improved the collector, which launched last November, with additional information explaining the charts and with new SSL error metrics.
The Elasticsearch collector monitors the search engine's instances via several of the provided local interfaces.
To allow better grouping and filtering of the collected metrics, we now create a chart per node index and a dimension per health status. We also added several out-of-the-box alerts.
Our NVIDIA GPU Collector monitors memory usage, fan speed, PCIe bandwidth utilization, temperature, and other GPU performance metrics using the `nvidia-smi` CLI tool.
Multi-Instance GPU (MIG) is a feature from NVIDIA that lets users partition a single GPU into smaller GPU instances.
We added MIG metrics for uncorrectable errors and memory usage.
We also added metrics for voltage and PCIe bandwidth utilization percentage.
Last but not least, we significantly improved the collector's performance, by switching to collecting data using the CSV format.
We monitor Pi-hole, the Linux network-level advertisement and Internet tracker blocking application via its PHP API.
We fixed an issue with the requests failing against an authenticated API.
The ntpd program is an operating system daemon which sets and maintains the system time of day in synchronism with Internet standard time-servers (man page).
We rewrote our previous python.d collector in Go, improving its performance and maintainability.
The new collector still monitors the system variables of a local `ntpd` daemon and, optionally, the variables of its polled peers.
Similarly to `ntpq`, the standard NTP query program, we use the NTP Control Message Protocol over a UDP socket.
The Python collector will be deprecated in the next release, with no effect on current users.
See Additional alert notification methods on Netdata Cloud
The agents can now send notifications to Mattermost, using the Slack integration! Mattermost has a Slack-compatible API that only required a couple of additional parameters. Kudos to @je2555!
Netdata can export and visualize Netdata metrics in Graphite.
Our exporter was broken in v1.37.0 due to our host labels for ephemeral nodes. We fixed the issue in #14105.
To improve performance and stability, we made health run in a single thread.
The agent alert notifications are controlled by the configuration file `health_alarm_notify.conf`. Previously, if one used the `|critical` modifier, the recipients would always get at least two notifications: critical and clear; there was no way to stop the clear/warning notifications that followed. We added the `|nowarn` and `|noclear` notification modifiers, to allow users to receive only the transitions to the critical state.
We also fixed the broken redirects from alert notifications to cleared alerts.
We constantly strive to improve the clarity of the information provided by the hundreds of out of the box alerts we provide.
We can now provide more fine-tuned information on each alert, as we started using specific chart labels instead of `family`.
To provide this capability, we also had to change the format of alert info variables to support the more complex syntax.
Administrators can now globally, permanently disable specific out-of-the-box alerts via `netdata.conf`. Previously, the options were to edit individual alert configuration files, or to use the health management API.
The `[health]` section of `netdata.conf` now supports the setting `enabled alarms`. Its value defines which alarms to load from both user and stock directories. The value is a simple pattern list of alarm or template names, with a default value of `*`, meaning that all alerts are loaded. For example, to disable specific alarms, you can set `enabled alarms = !oom_kill *`, which will load all alarms except `oom_kill`.
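Putting the example from the text into a `netdata.conf` fragment:

```ini
# netdata.conf
[health]
    # simple pattern list: load all alarms except oom_kill
    enabled alarms = !oom_kill *
```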
Our main focus for visualization is on the Netdata Cloud Overview dashboard. This dashboard is our flagship, on which everything we do, all slicing and dicing capabilities of Netdata, are added and integrated. We are working hard to make this dashboard powerful enough, so that the need to learn a query language for configuring and customizing monitoring dashboards, will be eliminated.
On this release, we virtualized all items on the dashboard, allowing us to achieve exceptional performance on page rendering. In previous releases there were issues on dashboards with thousands of charts. Now the number of items in the page is irrelevant!
To make slicing and dicing of data easier, we ordered the on-chart selectors in a way that is more natural for most users:
This bar above the chart now describes the data presented, in plain English: On 6 out of 20 Nodes, group by dimension, the SUM() of 23 Instances, using All dimensions, each as AVG() every 3s
A tool-tip provides more information about the missing nodes:
And the drop-down menu now shows the exact nodes that contributed data to the query, together with a short explanation on why nodes did not provide any data:
Additionally, the pop-out icon next to each node can be used to jump to the single node dashboard of this node.
All the slicing and dicing controls (Nodes, Dimensions, Instances), now support filtering. As shown above, there is a search box in the drop-down and a tick-mark to the left of each item in the list, which can be used to instantly filter the data presented.
At the same time, we re-worked most of the Netdata collectors to add labels to the charts, allowing the chart to be pivoted directly from the group by drop-down menu. On the following image, we see the same chart as above, but now the data have been grouped by the label `device`, the values of which became dimensions of the chart.
The data can be instantly filtered by original dimension (`reads` and `writes` in this example), like this:
or even by a specific instance (a disk, in this example), like this:
On the Instances drop-down list (shown above), the pop-out icon to the right of each instance can be used to quickly jump to the single node dashboard; we also made this function automatically scroll the dashboard to the relevant chart's position and filter that chart to the specific instance from which the jump was made.
Our goal is to polish and fine tune this interface, to the degree that it will be possible to slice and dice any data, without learning a query language, directly from the dashboard. We believe that this will simplify monitoring significantly, make it more accessible to people, and it will eventually allow all of us to troubleshoot issues without any prior knowledge of the underlying data structures.
At the same time, we worked to improve switching between rooms and tabs within a room, by saving the last visible chart and the selected page filters, which are restored automatically when the user switches back to the same room and tab.
For the ordering of the sections and subsections on the dashboard menu, we made a change to allow currently collected charts to overwrite the position of the section and subsection (we call it `priority`). Before this change, archived metrics (old metrics still available due to retention) participated in the election of the `priority` for a section or subsection, and because the retention Netdata maintains by default is more than a year, changes to the `priority` were never propagated to the UI.
We fixed:
See Functions
See Events Feed.
See Dramatic performance and stability improvements, with a smaller agent footprint
Saving metadata to SQLite is now faster. Metadata saving starts asynchronously when the agent starts and continues as long as there is metadata to be saved. We implemented optimizations by grouping queries into transactions: at runtime this grouping happens per chart, while on shutdown it happens per host. These changes made metadata syncing up to 4x faster.
We introduced very significant reliability and performance improvements to the streaming protocol and the database replication. See Streaming, Replication.
At the same time, we fixed SSL handshake issues on established SSL connections, providing stable SSL connectivity for streaming between Netdata Agents.
Data queries for charts and contexts now have the following additional features:
We have been busy at work under the hood of the Netdata agent to introduce new capabilities that let you extend the "training window" used by Netdata's native anomaly detection capabilities.
We have introduced a new ML parameter, `number of models per dimension`, which controls the number of most recently trained models used during scoring.
Below is some pseudo-code of how the trained models are actually used in producing anomaly bits (which give you an "anomaly rate" over any window of time) each second.
```python
# preprocess recent observations into a "feature vector"
latest_feature_vector = preprocess_data([recent_data])

# loop over each trained model
for model in models:
    # if recent feature vector is considered normal by any model, stop scoring
    if model.score(latest_feature_vector) < dimension_anomaly_score_threshold:
        anomaly_bit = 0
        break
else:
    # only if all models agree the feature vector is anomalous is it considered anomalous by netdata
    anomaly_bit = 1
```
The aim here is to use the additional stored models only when needed. Once one model suggests a feature vector looks anomalous, we check all saved models; only when they all agree does the anomaly bit get set to 1, signaling that Netdata considers the most recent feature vector unlike anything seen across the wider training window covered by all checked models.
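To make this logic concrete, here is a small self-contained Python sketch. The model class, centers, and threshold are illustrative stand-ins, not Netdata's actual implementation:

```python
class KMeansModel:
    """Toy stand-in for one trained model (illustration only)."""
    def __init__(self, center):
        self.center = center

    def score(self, feature_vector):
        # Euclidean distance to the cluster center: small means "looks normal"
        return sum((a - b) ** 2 for a, b in zip(feature_vector, self.center)) ** 0.5

def anomaly_bit(models, feature_vector, threshold):
    # The bit is raised only if *every* model scores the vector above threshold
    for model in models:
        if model.score(feature_vector) < threshold:
            return 0  # at least one model considers it normal
    return 1  # all models agree: anomalous

models = [KMeansModel([0.0, 0.0]), KMeansModel([1.0, 1.0])]
print(anomaly_bit(models, [0.1, 0.1], threshold=0.5))  # 0: first model says "normal"
print(anomaly_bit(models, [5.0, 5.0], threshold=0.5))  # 1: no model recognizes it
```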
Read more in this blog post!
We now create ML charts on child hosts when a parent runs ML for a child. These charts use the parent's hostname, to differentiate multiple parents that might run ML for the same child.
Finally, we refactored the ML code and added support for multiple KMeans models.
We are always looking to improve the ways we make the agent available to users. Where we host our build artifacts is an important piece of the puzzle, and we've taken some significant steps in the past couple of months.
As of 2023-01-16, our nightly build artifacts are being hosted as GitHub releases on the new https://github.com/netdata/netdata-nightlies/ repository instead of being hosted on Google Cloud Storage. In most cases, this should have no functional impact for users, and no changes should be required on user systems.
As part of improving support for our native packages, we are migrating off of Package Cloud to our own self-hosted package repositories located at https://repo.netdata.cloud/repos/. This new infrastructure provides a number of benefits, including signed packages, easier on-site caching, more rapid support for newly released distributions, and the ability to support native packages for a wider variety of distributions.
Our RPM repositories have already been fully migrated and the DEB repositories are currently in the process of being migrated.
In addition to Docker Hub, our official Docker images are now available on GHCR and Quay. The images are identical across all three registries, including using the same tagging.
You can use our Docker images from GHCR or Quay either by configuring them as registries with your local container tooling, or by using `ghcr.io/netdata/netdata` or `quay.io/netdata/netdata` instead of `netdata/netdata`.
The directives `--local-build-options` and `--static-install-options` used to accept only a single option each. We now allow multiple options to be entered.
We renamed the `--install` option to `--install-prefix`, to clarify that it affects the directory under which the Netdata agent will be installed.
To help prevent user errors, passing an unrecognized option to the kickstart script now results in a fatal error instead of just a warning.
We previously used `grep` to get some info on login or group, which could not handle centralized authentication setups like Active Directory, FreeIPA, or pure LDAP. We now use `getent group` to get the group information.
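For illustration, the difference is easy to see on a system with centralized authentication; `getent` goes through NSS rather than reading `/etc/group` directly:

```shell
# grep /etc/group only sees locally defined groups; getent queries NSS,
# which also resolves LDAP, Active Directory, and FreeIPA backends.
getent group root
```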
We fixed the required permissions of the `cgroup-network` and `ebpf.plugin` binaries in RPM packages.
We fixed binary package updates that were failing with an error on Zypper upgrade.
We fixed the missing installation of the required "tar" package.
We fixed some crashes on macOS.
Netdata on Proxmox virtualization management servers must be allowed to resolve VM/container names and read their CPU and memory limits.
We now explicitly add the `netdata` user to the `www-data` group on Proxmox, so that users don't have to do it manually.
We fixed the path to "netdata.pid" in the logrotate postrotate script, which caused some errors during log rotation.
We also added support for pre-v5 GCC, and allowed building without dbengine.
We have been working hard to revamp Netdata Learn. We are revising not just its structure and content, but also the Continuous Integration processes around it. We're getting close to the finish line, but you may notice that we currently publish two versions: `1.37.x` is frozen with the state of the docs as of the 1.37.1 release, and the `nightly` version has the target experience.
While not yet ready for production, the `nightly` version is the only place where information on the latest features and changes is available.
The following screenshot shows how you can switch between versions.
Be aware that you may encounter some broken links or missing pages while we sort out the several hundred markdown documents and the several thousand links they include. We ask for your patience and expect that by the next release we'll have properly launched the new, easier-to-navigate version.
The Netdata Demo space on Netdata Cloud is constantly being updated with new rooms for various use cases. You don't even need a Cloud account to see our powerful infrastructure monitoring in action, so what are you waiting for?
We have improved the readability of our main error log file, `error.log`, by moving data-collection-specific log messages to `collector.log`. For the same reason, we reduced the log verbosity of streaming connections.
We reimplemented the `edit-config` script we install in the user config directory, adding a few new features and fixing a number of outstanding issues with the previous script.
Overall changes from the existing script:
- Errors are now prefixed with `ERROR:`, instead of looking no different from other output from the script.
- `edit-config --help` now properly returns usage information instead of throwing an error. Other supported options are `--file`, for explicitly specifying the file to edit (using this is not required, but we should ideally encourage it), and `--editor`, to specify an editor of choice on the command line.
- Added a `--container` option to bypass the auto-detection and explicitly specify a container ID or name to use. Supports both Docker and Podman.
- The user config directory can still be overridden by setting `NETDATA_USER_CONFIG_DIR` in the environment, just like with the old script.
- The stock config directory is now determined from the `.environment` file created by the install, falling back first to inferring the location from the script's path and, if that fails, using the "default" of `/usr/lib/netdata/conf.d`. From a user perspective, this changes nothing for any type of install we officially support, or for any third-party packages we know of. It results in a slight simplification of the build code, as well as making testing of the script much easier (you can now literally just copy it to the right place, and it should work). Users can still override this by setting `NETDATA_STOCK_CONFIG_DIR`.
- Added a `--list` option.
- Instead of just checking that the file path doesn't start with `/` or `.`, we now do a proper prefix check on the supplied file path, to make sure it's under the user config directory. This provides two specific benefits:
  - Previously, a user could run something like `/etc/netdata/edit-config apps_groups.conf` from another directory, and it would blindly copy the stock `apps_groups.conf` file to the current directory. With the new script, this throws an error instead.
  - Paths relative to the user config directory now work: `netdata/edit-config netdata/apps_groups.conf` when in `/etc` will now work, and `/etc/netdata/edit-config /etc/netdata/apps_groups.conf` will work from anywhere on the system.

The new Netdata Monitoring section on our dashboard has dozens of charts detailing the operation of Netdata. All new components have their own charts: dbengine, metrics registry, the new caches, the dbengine query router, etc.
At the same time, we added a chart detailing the memory used by the agent and the function it is used for. This was the hardest to gather, since the information was spread all over the place, but thankfully the internals of the agent have changed drastically in the last few months, allowing us better visibility into memory consumption. At its heart, the agent is now mainly an array allocator (ARAL) and a dictionary (indexed and ordered lists of objects), carefully crafted to achieve their maximum performance when multithreaded. Everything we do, from data collection, to health, streaming, replication, etc., is actually business logic on top of these elements.
`netdatacli version` now returns the version of Netdata.
Coming by Feb 15th
At Netdata we take pride in our commitment to the principle of providing free and unrestricted access to high-quality monitoring solutions. We offer our free SaaS offering (what we call the Community plan) and the open-source Agent, which feature unlimited nodes and users, unlimited metrics, and unlimited retention, providing real-time, high-fidelity, out-of-the-box infrastructure monitoring for packaged applications, containers, and operating systems.
We are also starting to provide paid subscriptions, designed to offer additional features and capabilities for businesses that need tighter and more customizable integration of the free monitoring solution into their processes. These are divided into three different plans: Pro, Business, and Enterprise. Each plan offers a different set of features and capabilities to meet the needs of businesses of different sizes and with different monitoring requirements.
You can change your plan at any time; any remaining balance will be credited to your account, even for yearly plans. We designed this to respect the unpredictability of world dynamics, so there is less anxiety about choosing the right commitment in order to save money in the long run.
The paid Netdata Cloud plans work as subscriptions and overall consist of:
Netdata provides two billing frequency options:
The detailed feature list and pricing is available at netdata.cloud/pricing.
The only dynamic variable we consider for billing is the number of concurrently running nodes or agents. We only charge you for your active running nodes. We obviously don't count offline nodes, which were connected in a previous month and are currently offline, with their metrics unavailable. But we go further and don't count stale nodes either, which are available to query through a Netdata parent agent but are not actively collecting metrics at the moment.
To ensure we don't overcharge any user due to sporadic spikes throughout a month or even at a certain point in a day we:
:note: Even if you have a yearly billing frequency, we track the p90 counts monthly, to charge any potential overage over your committed nodes.
When you subscribe to a yearly plan, you need to specify the number of nodes that you commit to. In addition to the discounted flat fee, you then get a 25% discount on the per-node fee, as you're also committing to have those nodes connected for a year. The charge for the committed nodes is part of your annual prepayment (`discounted node price x committed nodes x 12 months`).
If in a given month your usage is over these committed nodes, we charge the undiscounted cost per node for the overage.
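As a worked example with hypothetical prices (see netdata.cloud/pricing for the real numbers):

```python
# Hypothetical prices for illustration only
list_price_per_node = 4.00                      # undiscounted monthly price per node
committed_nodes = 10
discounted_price = list_price_per_node * 0.75   # 25% committed-node discount

# annual prepayment for the committed nodes (the flat plan fee is not shown)
annual_prepayment = discounted_price * committed_nodes * 12
print(annual_prepayment)  # 360.0

# in a month where the p90 node count is 13, the 3 extra nodes
# are charged at the undiscounted per-node price
p90_nodes = 13
overage_charge = (p90_nodes - committed_nodes) * list_price_per_node
print(overage_charge)  # 12.0
```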
The Agent-Cloud link (ACLK) is the mechanism responsible for securely connecting a Netdata Agent to your web browser through Netdata Cloud. The ACLK establishes an outgoing secure WebSocket (WSS) connection to Netdata Cloud on port 443. The ACLK is encrypted, safe, and is only established if you connect your node.
We have always supported unauthenticated HTTP proxies for the ACLK. We have now added support for HTTP Basic authentication.
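As a sketch, a proxy with HTTP Basic authentication might be configured like this (treat the file name, section, and key as assumptions; check the Netdata ACLK documentation for your install, and the credentials are placeholders):

```ini
# cloud.conf -- hypothetical proxy setting with Basic authentication
[global]
    proxy = http://username:password@proxy.example.com:3128
```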
We also fixed a race condition on the ACLK query thread startup.
The following items will be removed in our next minor release (v1.39.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/ntpd | collector | go.d/ntpd |
python.d/proxysql | collector | go.d/proxysql |
python.d/rabbitmq | collector | go.d/rabbitmq |
python.d/nvidia_smi | collector | go.d/nvidia_smi |
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by |
---|---|---|
python.d/dockerd | collector | go.d/docker |
python.d/logind | collector | go.d/logind |
python.d/mongodb | collector | go.d/mongodb |
fping | collector | go.d/ping |
Join the Netdata team on the 7th of February, at 17:00 UTC for the Netdata Agent Release Meetup.
Together we'll cover:
RSVP now - we look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.
- Added the `|nowarn` and `|noclear` notification modifiers to agent notifications.

Full Changelog: https://github.com/netdata/netdata/compare/v1.37.0...v1.38.0
Published by netdatabot almost 2 years ago
Netdata v1.37.1 is a patch release to address issues discovered since v1.37.0. Refer to the v1.37.0 release notes for the full scope of that release.
The v1.37.1 patch release fixes the following issues:
In addition, the release contains the following optimizations and improvements:
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by netdatabot almost 2 years ago
This release fixes two security issues, one in streaming authorization and another at the execution of alarm notification commands. All users are advised to update to this version or any later! Credit goes to Stefan Schiller of SonarSource.com for identifying both of them. Thank you, Stefan!
Another release of the Netdata Monitoring solution is here!
We focused on these key areas:
Read more about this release in the following sections!
Table of contents
❗ We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notices to check if you are affected.
Scalability is one of the biggest challenges of monitoring solutions. Almost every commercial or open-source solution assumes that metrics should be centralized to a time-series database, which is then queried to provide dashboards and alarms. This centralization, however, has two key problems:
At Netdata we love high fidelity monitoring. We want granularity to be "per second" as a standard for all metrics, and we want to monitor as many metrics as possible, without limits.
Netdata Cloud does not centrally collect or store the metric data; that is one of its most beautiful and unique qualities. It only needs active connections to the Netdata Agents that have the metrics. The Netdata Agents store all metrics in their own time-series database (we call it `dbengine`, and it is embedded into the Netdata Agents).
In this release, we introduce a new way for the Agents to communicate their metadata to the cloud. To minimize the amount of traffic exchanged between Netdata Cloud and the Agents, we transfer only a very limited set of metadata. We call this information `contexts`, and it is pretty much limited to the unique metric names collected, coupled with the actual retention (first and last timestamps) that each agent has available for query.
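Conceptually, each context is little more than a metric name plus its retention window. A minimal sketch of such a record (the field names here are illustrative assumptions, not the actual wire format):

```python
# Illustrative only: the kind of per-metric record a "context" carries.
# Field names are assumptions, not Netdata's actual protocol format.
from dataclasses import dataclass


@dataclass
class Context:
    metric: str   # unique metric name, e.g. "system.cpu"
    first_t: int  # earliest timestamp available for query (unix epoch)
    last_t: int   # latest timestamp available for query (unix epoch)


# the cloud only needs this to know what it *can* query, not the data itself
ctx = Context("system.cpu", 1_660_000_000, 1_670_000_000)
```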
At the same time, to overcome the limitations of having hundreds of thousands of Agents concurrently connected to Netdata Cloud, we are now using EMQX as the message broker that connects Netdata Agents to Netdata Cloud. As the community grows, the next step planned is to have such message brokers in five continents, to minimize the round-trip latency for querying Netdata Agents through Netdata Cloud.
We also see Netdata Parents as a key component of our ecosystem. A Netdata Parent is a Netdata Agent that acts as a centralization point for other Netdata Agents. The idea is simple: any Netdata Agent (Child) can delegate all its functions, except data collection, to any other Netdata Agent (Parent), and by doing so, the latter now becomes a Netdata Parent. This means that metrics storage, metrics querying, health monitoring, and machine learning can be handled by the Netdata Parent, on behalf of the Netdata Children that push metrics to it.
This functionality is crucial for our ecosystem for the following reasons:
In this release we introduce significant improvements to Netdata Parents:
`stream.conf` - and the connecting Agent Child needs to have data for at least that long in order for them to be replicated).

All these improvements are a huge step forward in providing an infinitely scalable monitoring infrastructure.
Many users think of the Netdata Agent as an amazing single-node monitoring solution, offering limited real-time retention of metrics. This changed slightly over the years as we introduced `dbengine` for storing metrics, and even more with the introduction of database tiering in the previous release, allowing Netdata to downsample metrics and store them for a longer duration.
As of this release, we now enable tiering by default! So, a typical Netdata Agent installation, with default settings, will now have 3 database tiers, offering a retention of about 120 - 150 days, using just 0.5 GB of disk space!
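As a back-of-the-envelope sketch of how tiering trades granularity for retention (all quotas, metric counts, and bytes-per-point figures below are illustrative assumptions, not Netdata's exact defaults):

```python
# Rough retention estimate per tier: each higher tier stores fewer,
# coarser points, so the same disk quota covers a much longer period.
# All numbers here are hypothetical, for illustration only.

def retention_days(quota_mib, metrics, bytes_per_point, seconds_per_point):
    """Approximate days of retention a tier can hold within its disk quota."""
    points_per_metric = quota_mib * 1024 * 1024 / (metrics * bytes_per_point)
    return points_per_metric * seconds_per_point / 86400

# hypothetical: 2000 concurrently collected metrics, ~1 byte/point on disk
for name, quota_mib, step_s in [("tier0", 256, 1), ("tier1", 128, 60), ("tier2", 64, 3600)]:
    print(f"{name}: ~{retention_days(quota_mib, 2000, 1.0, step_s):.0f} days")
```

The takeaway is that the coarsest tier dominates total retention while using the least disk, which is how a default install can cover months with well under a gigabyte.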
This is coupled with another significant achievement. Traditionally, the Agent dashboard showed only currently collected metrics. The dashboard of Netdata Cloud, however, should present all the metrics available for the selected time-frame, independently of whether they are currently being collected or not. This is especially important for highly volatile environments, like Kubernetes, where metrics come and go all the time.
So, in this release, we rewrote the query engine of the Netdata Agent to properly query metrics independently of whether they are currently being collected. In practice, the Agent is now split into two big modules: data collection and querying. These two parts no longer depend on each other, allowing dashboards to query metrics for any time-frame for which data are available.
This feature of querying past data even for non-collected metrics is available now via Netdata Cloud Overview dashboards.
We have completely rewritten the part of the installer responsible for setting up Netdata as a system service. This includes a number of major improvements over the old code, including the following:
`/usr/local/etc/rc.d` instead of `/etc/rc.d`.

Additionally, this release includes a number of improvements to our OpenRC init script, bringing it more in line with best practices for OpenRC init scripts, fixing a handful of bugs, and making it easier to run Netdata under OpenRC's native process supervision.
We plan to continue improving this area in upcoming release cycles as well, including further improvements to our OpenRC support and preliminary support for installing Netdata as a service on systems using Runit.
As of this release, plugins can register functions with the agent that can be executed on demand to provide real-time, detailed, and specific chart data. Via streaming, function definitions are transmitted to a parent and seamlessly exposed there as well.
Agents now build an optimized disk-based index file to reduce memory requirements up to 90%. In turn, the Agent startup time improved by 1,000% (You read this right; this is not a typo!).
The Overview dashboard is the key dashboard of the Netdata ecosystem. We are constantly putting effort into improving this dashboard so that it will eventually be unnecessary to use anything else.
Unlike the Netdata Agent dashboard, the Netdata Cloud Overview dashboard is multi-node, providing infrastructure and service level views of the metrics, seamlessly aggregating and correlating metrics from all Netdata Agents that participate in a war room.
We believe that dashboards should be fully automated and out-of-the-box, providing all the means for slicing and dicing data without learning any query language, without editing chart definitions, and without having a deep understanding of the underlying metrics, so that the monitoring system is fully functional and ready to be used for troubleshooting the moment it is installed.
Moving towards this goal, in this release we introduce the following improvements:
The Single Node view dashboard now uses the same engine as the Overview.
With this, you get a more consistent experience, but also:
We are working to bring similar improvements to the local Agent dashboard. In the meantime, it will look different than the Single Node view on Netdata Cloud. On Netdata Cloud we use composite charts, instead of separate charts, for each instance.
This initial release of the Netdata data source plugin aims to maximize the troubleshooting capabilities of Netdata in Grafana, making them more widely available. It combines Netdataâs powerful collector engine with Grafana's amazing visualization capabilities!
We expect that the open-source community will take a lot of value from this plugin, so we don't plan on stopping here. We want to keep improving this plugin! We already have some enhancements on our backlog, including the following plans:
We would love to get you involved in this project! If you have ideas on things you'd like to see, or just want to share a cool dashboard you've set up, you're more than welcome to contribute.
Check out our blogpost and YouTube video on this new plugin to see how it can work best for you.
Unseen node state
To provide better visibility on the different causes for why a node is Offline, we broke this status into two separate statuses, so that you can now distinguish cases where a node never successfully connected to Netdata Cloud.
The following list presents our current node statuses and their meaning:
There are different reasons why a node can't connect; the most common explanation for this falls into one of the following three categories:
For some guidelines on how to solve these issues, check our docs here.
To better showcase the potential and latest upgrades of Netdata, we have made multiple rooms available in our Demo space, allowing you to experience the power and simplicity of Netdata with live infrastructure monitoring.
Netdata's new PostgreSQL collector offers a fully revamped, comprehensive PostgreSQL DB monitoring experience. 100+ PostgreSQL metrics are collected and visualized across 60+ composite charts. Netdata now collects metrics at per-database, per-table, and per-index granularity (besides the metrics that are global to the entire DB cluster) and lets users explore which table or index has a specific problem, such as high cache miss, low rows-fetched ratio (indicative of missing indexes), or bloat that's eating up valuable space. The new collector also includes built-in alerts for several problem scenarios that a user is likely to run into on a PostgreSQL cluster. For more information, read our docs or our blog for a deep dive into PostgreSQL and why these metrics matter.
Netdata's Redis collector was updated to include new metrics crucial for database performance monitoring such as latency and new built-in alerts. For the full list of Redis metrics now available, read our docs or our blog for a deeper dive into Redis monitoring.
Netdata now monitors Cassandra, and comes with 25+ charts for all key Cassandra metrics. The collected metrics include throughput, latency, cache (key cache + row cache), disk usage and compaction, as well as JVM runtime metrics such as garbage collection. Any potential errors and exceptions that occur on your Cassandra cluster are also monitored. For more information read our docs or our blog.
To further improve Netdata Cloud and your user experience, multiple points around tech debt and infrastructure improvements have been completed. To name some of the key achievements:
We have improved the speed of chart creation by 70x. In lab tests creating 30,000 charts with 10 dimensions each, we achieved a chart creation rate of 7,000 charts/second (vs. 100 charts/second previously).
Alert processing for a host (e.g. child connected to a parent) is now done on its own host. Time-consuming health related initialization functions are deferred as needed and parallelized to improve performance.
Code improvements have been made to make use of dictionaries, better managing the life cycle of objects (creation, usage, and destruction using reference counters) and reducing explicit locking to access resources.
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
âď¸ Enhancing our collectors to collect all the data you need.
đ Improving our collectors one bug fix at a time.
đ Keeping our documentation healthy together with our awesome community.
âď¸ Greasing the gears to smooth your experience with Netdata.
đ Increasing Netdata's reliability, one bug fix at a time.
The following items will be removed in our next minor release (v1.38.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by
---|---|---
python.d/dockerd | collector | go.d/docker
python.d/logind | collector | go.d/logind
python.d/mongodb | collector | go.d/mongodb
fping | collector | go.d/ping
All the deprecated components will be moved to the netdata/community repository.
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by
---|---|---
python.d/postgres | collector | go.d/postgres
In an effort to improve our kickstart script even more, documented here and here, a change will be made in the next major release so that users receive an error if they pass an unrecognized option, rather than having it silently passed through to the installer code.
In the coming weeks, we will be introducing a new structure to Netdata Learn. Part of this effort includes having healthy redirects, instructions, and landing pages to minimize confusion and lost bookmarks, but users may still encounter broken links or errors when loading moved or deleted pages. Feel free to submit a GitHub issue if you encounter such a problem, or reach out to the Netdata Documentation Team with questions or ideas on how our docs can best serve you.
In a forthcoming release, many external plugins will be moved to their own packages in our native packages to allow enhanced control over what plugins you have installed, to preserve bandwidth when updating, and to avoid some potentially undesirable dependencies. As a result of this, at some point during the lead-up to the next minor release, the following plugins will no longer be installed by default on systems using native packages, and users with any of these plugins on an existing install will need to manually install the packages in order to continue using them:
Note: Static builds and locally built installations are unaffected. Netdata will provide more details once the changes go live.
Join the Netdata team on the 1st of December, at 5PM UTC, for the Netdata Release Meetup, which will be held on
the Netdata Discord.
Together we'll cover:
RSVP now - we look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in Netdata, feel free to contact us through one of the following channels:
Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps engineers, SREs, and other troubleshooters. More than 1300 engineers are already using it!
Published by netdatabot about 2 years ago
Netdata v1.36.1 is a patch release to address two issues discovered since v1.36.0. Refer to the v1.36.0 release notes for the full scope of that release.
The v1.36.1 patch release fixes the following:
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by netdatabot about 2 years ago
Table of contents
â We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notice to check if you are affected.
The Agent's default algorithm for running a metric correlations job (ks2) is based on the Kolmogorov-Smirnov test. In this release, we also included the Volume algorithm, a heuristic based on the percentage change in averages between the highlighted window and a baseline, where various edge cases are sensibly controlled. You can explore our implementation in the Agent's source code.
This algorithm is almost 73 times faster than the default (ks2), with nearly the same accuracy. Give it a try by making it the default in your `netdata.conf`:
[global]
# enable metric correlations = yes
metric correlations method = volume
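To illustrate the idea behind the volume method, here is a simplified sketch of the scoring (illustrative only; the Agent's actual implementation controls many edge cases that this omits):

```python
# Simplified "volume" scoring: the percentage change of the highlighted
# window's average versus the baseline window's average.
# Illustrative only; not the Agent's actual implementation.

def volume_score(baseline, window):
    base_avg = sum(baseline) / len(baseline)
    win_avg = sum(window) / len(window)
    if base_avg == 0:
        # one of the edge cases the real algorithm handles more carefully
        return float("inf") if win_avg else 0.0
    return abs(win_avg - base_avg) / abs(base_avg)

# a metric that doubled in the highlighted window scores 1.0 (100% change)
print(volume_score([10, 10, 10], [20, 20, 20]))
```

Because this is simple arithmetic over averages rather than a full statistical test over distributions, it is easy to see where the large speedup over ks2 comes from.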
The Anomaly Advisor feature lets you quickly surface potentially anomalous metrics and charts related to a particular highlight window of interest. When the Agent trains its internal Machine Learning models, it produces an Anomaly Rate for each metric.
With this release, Netdata can now perform Metric Correlation jobs based on these Anomalous Rate values for your metrics.
In the past, you used to run MC jobs from the Node's dashboard with all the settings predefined. Now, Netdata gives you some extra functionality to run an MC job for a window of interest with the following options:
All this from the same, single dashboard.
Troubleshooting complicated infrastructures can get increasingly hard, but Netdata wants to continually provide you with the best troubleshooting experience. On that note, here are some next logical steps for our Metric Correlations feature, planned for upcoming releases:
`/weights` endpoint in the Agent's API; this is a WIP).

Be on the lookout for these upgrades and feel free to reach out to us in our channels with your ideas.
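Once the `/weights` endpoint mentioned above lands, a correlations query against a local Agent might be sketched like this (the endpoint path, port, and parameter names here are assumptions for illustration; check your Agent's API documentation for the final shape):

```python
# Hypothetical request builder for the Agent's metric-correlations API.
# Endpoint path and parameter names are assumptions, not a documented contract.
from urllib.parse import urlencode


def weights_url(host, after, before, baseline_after, baseline_before, method="volume"):
    params = {
        "after": after,                      # highlighted window (relative seconds)
        "before": before,
        "baseline_after": baseline_after,    # baseline window to compare against
        "baseline_before": baseline_before,
        "method": method,                    # e.g. "ks2" or "volume"
    }
    return f"http://{host}:19999/api/v1/weights?{urlencode(params)}"


# e.g. score the last 5 minutes against the 5 minutes before that
print(weights_url("localhost", -300, 0, -600, -300))
```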
Netdata is a high-fidelity monitoring solution. That comes with a cost: the cost of keeping all that data on your disks. To help remedy this, Netdata introduces in this release the Tiering mechanism for the Agent's time-series database (`dbengine`).
Tiering is the mechanism of providing multiple tiers of data with different granularity on metrics by doing the following:
Visit the Tiering in a nutshell section in our docs to understand the full potential of this feature. Also, don't hesitate to enable it to change the retention of your metrics.
Note: Of course, accuracy may vary; you cannot recreate the exact original time series from downsampled data without taking other parameters into consideration.
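The downsampling at the heart of tiering can be sketched as follows: each higher tier keeps a small summary per window instead of every raw point (illustrative only; the actual dbengine aggregates differ in detail):

```python
# Downsample a per-second series into coarser windows, keeping
# min/max/average/count per window so higher tiers stay useful for queries.
# A conceptual sketch, not Netdata's actual tiering code.

def downsample(points, window):
    out = []
    for i in range(0, len(points), window):
        chunk = points[i:i + window]
        out.append({
            "min": min(chunk),
            "max": max(chunk),
            "avg": sum(chunk) / len(chunk),
            "count": len(chunk),
        })
    return out

# e.g. collapse six per-second samples into two 3-second summaries
tier1 = downsample([1, 2, 3, 4, 5, 6], 3)
```

Keeping min and max alongside the average is what lets a coarse tier still show spikes that a plain average would smooth away.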
A Kubernetes Cluster can easily have hundreds (or even thousands) of pods running containers. Netdata is now able to provide you with an overview of the workloads and the nodes of your Cluster. Explore the full capabilities of the `k8s_state` module.
In a previous release, we introduced unsupervised ML & Anomaly Detection in Netdata with Anomaly Advisor. With this next step, weâre bringing anomaly rates to every chart in Netdata Cloud. Anomaly information is no longer limited to the Anomalies tab and will be accessible to you from the Overview and Single Node view tabs as well. We hope this will make your troubleshooting journey easier, as you will have the anomaly rates for any metric available with a single click, whichever metric or chart you happen to be exploring at that instant.
If you are looking at a particular metric in the overview or single node dashboard and are wondering if the metric was truly anomalous or not, you can now confirm or disprove that feeling by clicking on the anomaly icon and expanding the anomaly rate view. Anomaly rates are calculated per second based on ML models that are trained every hour.
For more details please check our blog post and video walkthrough.
We've listened and understood your pain around Space and War Room settings in Netdata Cloud. In response, we have simplified and organized these settings into a Centralized Administration Interface!
In a single place, you're now able to access and change attributes around:
Along with this change, the deletion of individual offline nodes has been greatly improved. In the Space settings, under Nodes, you can now filter all Offline nodes, mass-select them, and bulk delete them.
In this release, we made a major improvement to our chart metadata syncing protocol. We moved from a very granular message exchange at the chart dimension level to a higher level, the context.
This approach decreases the complexity and points of failure in this flow, since we reduced the number of events being exchanged and the scenarios that need to be dealt with. It will help us fix complex, hard-to-track existing bugs and any potential unknown ones.
This also brings a lot of benefits to data transfer between Agents and Cloud, since we reduced the number of messages being transmitted.
To sum up these changes:
We have restructured composite charts into a more natural presentation. You can now read composite charts as if reading a simple sentence, and make better sense of how and what queries are being triggered.
In addition to this, we've added additional control over time aggregations. You can now instruct the agent nodes on what type of aggregation you want to apply when multiple points are grouped into a single one.
The options available are: min, max, average, sum, incremental sum (delta), standard deviation, coefficient of variation, median, exponentially weighted moving average, and double exponential smoothing.
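Two of the less common grouping options can be sketched as plain functions over the points of a group (illustrative only, not Netdata's implementation):

```python
# Point-grouping sketches: incremental sum (delta) and the
# exponentially weighted moving average. Illustrative only.

def incremental_sum(points):
    # delta across the group: last value minus first value,
    # useful for ever-growing counters
    return points[-1] - points[0]

def ewma(points, alpha=0.5):
    # each new point pulls the running average toward it by a factor of alpha,
    # so recent points weigh more than old ones
    avg = points[0]
    for p in points[1:]:
        avg = alpha * p + (1 - alpha) * avg
    return avg

print(incremental_sum([100, 120, 135]))  # 35
```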
We've also put some effort into improving our light and dark themes. The focus was on:
Most of the time, you will group metrics by their dimension or their instance, but there are some benefits to other groupings. So, you can now group them by logical representations.
For instance, you can represent the traffic in your network interfaces by their interface type, virtual or physical.
This is still a work in progress, but you can explore the newly added labels on the following areas/charts:
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
âď¸ Enhancing our collectors to collect all the data you need.
đ Improving our collectors one bug fix at a time.
đ Keeping our documentation healthy together with our awesome community.
đŚ "Handle with care" - Just like handling physical packages, we put in a lot of care and effort to publish beautiful software packages.
`--force-update` and a new version of the updater script is available (#13104, @ilyam8)

âď¸ Greasing the gears to smooth your experience with Netdata.
đ Increasing Netdata's reliability one bug fix at a time.
đď¸ Changes to keep our code base in good shape.
`minimum num samples to train` to `900` (#13174, @andrewm4894)

The following items will be removed in our next minor release (v1.37.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by
---|---|---
python.d/postgres | collector | go.d/postgres
All the deprecated components will be moved to the netdata/community repository.
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by
---|---|---
python.d/chrony | collector | go.d/chrony
python.d/ovpn_status_log | collector | go.d/openvpn_status_log
Join the Netdata team on the 11th of August for the Netdata Agent Release Meetup, which will be held on the Netdata Discord.
Together we'll cover:
We look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
Published by netdatabot over 2 years ago
Netdata v1.35.1 is a patch release to address issues discovered since v1.35.0. Refer to the v1.35.0 release notes for the full scope of that release.
The v1.35.1 patch release fixes an issue in the static build installation code that causes automatic updates to be unintentionally disabled when updating static installs.
If you have installed Netdata using a static build since 2022-03-22 and you did not explicitly disable automatic updates, you are probably affected by this bug.
For more details, including info on how to re-enable automatic updates if you are affected, refer to this Github issue.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by Ferroin over 2 years ago
Table of contents
â We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notice to check if you are affected.
We are excited to launch one of our flagship machine learning (ML) assisted troubleshooting features in Netdata: the Anomaly Advisor.
Netdata now comes with on-device ML! Unsupervised ML models are trained for every metric, at the edge (on your devices), enabling real time anomaly detection across your infrastructure.
This feature is part of a broader philosophy we have at Netdata when it comes to how we can leverage ML-based solutions to help augment and assist traditional troubleshooting workflows, without having to centralize all your data.
The new Anomalies tab quickly lets you find periods of time with elevated anomaly rates across all of your nodes. Once you highlight a period of interest, Netdata will generate a ranked list of the most anomalous metrics across all nodes in the highlighted timeframe. The goal is to quickly let you find periods of abnormal activity in your infrastructure and bring to your attention the metrics that were most anomalous during that time.
In our latest release, we improved the usability of Anomaly Advisor and also ensured that the anomalous metrics are always relevant to the time period you are investigating.
A great deal of care has gone into ensuring that ML running on your device is as lightweight as possible in terms of resource consumption. For instance, metrics that do not have sufficient data for training, and metrics that are consistently constant during training periods, are considered to be "normal" until their behavior changes significantly enough to require re-training of the ML models.
To use this feature, please enable ML on your agent and then navigate to the "Anomalies" tab in Netdata Cloud. To enable ML on your agent, update `netdata.conf` with the following:
[ml]
enabled = yes
Read more about Anomaly Advisor at our blog.
Metric Correlations allow you to quickly find metrics and charts related to a particular window of interest that you want to explore further. Metric correlations compare two adjacent windows to find how they relate to each other, and then score all metrics based on this rating, providing a list of metrics that may have influence or have been influenced by the highlighted one.
Metric Correlations were already available in Netdata Cloud, but now we are releasing a version implemented in the Netdata Agent, which drastically reduces the time required to run. This means metric correlations can now run almost instantly (more than 10x faster than before)!
To enable the new metric correlations at the Netdata Agent, set the following in your `netdata.conf` file:
[global]
enable metric correlations = yes
On very busy Kubernetes clusters, where hundreds of containers spawn and are destroyed all the time, Netdata was consuming a lot of resources, was slow to detect changes, and under certain conditions missed some containers.
Now, Netdata:
Netdata is also capable of detecting the network interfaces that have been allocated to containers, by spawning a process that switches network namespace and identifies virtual interfaces that belong to each container. This process is improved drastically, now requiring 1/3 of the CPU resources it needed before.
Additionally, Netdata's `cgroups.plugin` now collects CPU shares for Kubernetes containers, allowing the visualization of Kubernetes CPU Requests (Kubernetes writes into cgroup CPU shares the CPU Requests that have been configured for the containers).
A new option has been added to the `[plugin:cgroup]` section of `netdata.conf` to allow filtering containers by (resolved) name. It matches the name of the cgroup (as you see it on the dashboard).
We have also released a blog post and a video about CPU Throttling in Kubernetes. You will be amazed by our findings. Read the blog and watch the video about Kubernetes CPU throttling.
Netdata Cloud dashboards are now a lot faster in aggregating data from multiple agents, as the protocol between agents and the Cloud is approaching its final shape.
Netdata Cloud has a new look and feel for charts, which resembles the look and feel of coding IDEs:
The new home tab for war rooms allows you to quickly inspect the most important metrics for every war room, like number of nodes, metrics, retention, replication, alerts, users, custom dashboards, etc.
Time units in charts now auto-scale from microseconds to days, based on the value of the time to be shown.
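The auto-scaling logic amounts to picking the largest unit that keeps the displayed value at or above one; a sketch (the unit table and behavior are illustrative, not the dashboard's exact code):

```python
# Pick a human-friendly time unit for a duration given in microseconds.
# Illustrative sketch of the auto-scaling idea.

def scale_time(microseconds):
    units = [
        ("days", 86400_000_000),
        ("hours", 3600_000_000),
        ("minutes", 60_000_000),
        ("seconds", 1_000_000),
        ("milliseconds", 1_000),
        ("microseconds", 1),
    ]
    for name, factor in units:
        if microseconds >= factor:
            return microseconds / factor, name
    return microseconds, "microseconds"  # sub-microsecond values fall through

print(scale_time(90_000_000))  # (1.5, 'minutes')
```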
Netdata Cloud now sets a timeout on every query it sends to the agents, and the agents respect this timeout. Previously, the cloud was timing out because of a slow query while the agents remained busy executing it, which had a waterfall effect on the agent load.
Custom dashboards on Netdata Cloud can now be renamed.
We have added a new Alert Configs sub-tab which lists all the alerts configured on all the nodes belonging to the war room. You can now list the configured alerts at the war room, node, and alert instance levels respectively.
There have been a number of corner cases under which alerts could remain raised on Netdata Cloud. We identified all such cases, and now Netdata Cloud is always in sync with Netdata agents about their alerts.
Netdata now identifies the Cloud provider node type it runs on. It works for GCP and AWS, and exposes this information at the Nodes tab, the single node dashboard, and the node inspector.
We improved the virtualization detection in cases where systemd is not available. Now Netdata can properly detect virtualization even in these cases.
The new Netdata Cloud now supports a global filter on nodes of war rooms. The new filter is applied on every tab for each room, allowing users to quickly switch between tabs while retaining the nodes filtered.
Netdata admin users now have the ability to remove obsolete nodes from a space. Many users have been eagerly waiting for this feature, and we thank you for your patience. We hope you will be happy to use the feature and have cleaner spaces and war rooms. A few notes to be considered:
Every Netdata Agent is a StatsD server, listening on localhost port 8125, both TCP and UDP. You can use the Netdata StatsD server to quickly visualize metrics from scripts, cron jobs, and local applications.
In this release, the Netdata StatsD server has been improved to use Judy arrays for indexing the collected metrics, drastically improving its performance.
At the same time, we extended the StatsD protocol to support `dictionaries`. Dictionaries are similar to `sets`, but instead of reporting only the number of unique entries in the set, `dictionaries` create a counter for each of the values and report the number of occurrences of each unique event. So, to quickly get a breakdown of events, you can push them to StatsD like `myapp.metric:EVENT|d`. StatsD will create a chart for `myapp.metric` and, for each unique `EVENT`, it will create a dimension with the number of times this event was encountered.
We also added the ability to change the units and the family of the chart using StatsD tags, like this: `myapp.metric:EVENT|d|#units=events/s`.
Finally, StatsD now automatically creates a dashboard section for every StatsD application name. Following StatsD best practices, these application names are taken from the first keyword of collected metrics. For example, by pushing the metric `myapp.metric:1|c`, StatsD will create the dashboard section "StatsD myapp".
Read more in the Netdata StatsD documentation. A real-life example of using Netdata StatsD from a shell script, pushing real-time metrics to a local Netdata Agent, is available in this stress-with-curl.sh gist.
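A dictionary event can be pushed from a few lines of Python just as easily as from the shell. A minimal sketch, assuming the default StatsD listener on UDP port 8125 (the metric and event names here are made up for illustration):

```python
# Push a StatsD "dictionary" event to the local Netdata Agent over UDP.
# Assumes the Agent's default StatsD listener on localhost:8125.
import socket


def push_event(metric, event, host="localhost", port=8125):
    datagram = f"{metric}:{event}|d".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))  # fire-and-forget; UDP never blocks
    return datagram.decode()


# each unique event value becomes a dimension counting its occurrences
push_event("myapp.requests", "OK")
push_event("myapp.requests", "TIMEOUT")
```

Because the transport is plain UDP, this adds essentially no latency to the instrumented script even if no Agent is listening.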
Netdata dashboards refresh all visible charts in parallel, utilizing all the resources the web browsers provide to quickly present the required charts. Since Netdata only stores metric data at the agents, all these queries are executed in parallel at the agents.
This parallelism of queries is even more intense when metrics replication/streaming is configured. In these cases, parent Netdata agents centralize metric data from many agents, and, since Netdata Cloud prefers the more distant parents for queries, they receive quite a few queries in parallel for all their children.
We also reworked many parts of the query engine of Netdata agents to achieve top performance in parallel queries. Now, Netdata agents are able to perform queries at a rate of more than 30 million points per second, per core on modern hardware. On a parent Netdata agent with a 24-core CPU we observed a sustained rate of 1.3 billion points per second! This is 3 times faster compared to the previous release.
To achieve these performance improvements, we worked in the following areas:
When querying metric data, a lot of memory allocations need to happen. Although Netdata agents automatically adapt their memory requirements for data collection, avoiding memory operations while iterating to collect data, this is unfortunately not feasible on the query engine side.
To make the agent more efficient for queries, the number of system calls allocating memory had to be drastically decreased. So, we developed a One Way Allocator (OWA), a system that works like a scratchpad for memory allocations. When a query starts, we now predict the amount of memory needed to execute it. The query engine still does all the individual allocations, but these are now made against the scratchpad, not against the system. OWA is smart enough to grow the scratchpad if needed during querying, and it frees all memory at once, without the need for individual memory releases.
For huge data queries, the benefit is astonishing: for certain heavy queries, what used to be 45,000 memory allocations is now down to just 20! This doubled the performance of the query engine.
To optimize its memory footprint for metric data, Netdata agents store collected metric data in a fixed-step database (after interpolation), using a custom floating-point number format we developed (we call it storage_number) that requires just 4 bytes per collected point, including the timestamp. On disk, mainly thanks to compression, Netdata's dbengine needs just 0.34 bytes per point (including all metadata), which is probably the best among all monitoring solutions available today, allowing Netdata to massively store and manage metric data at a very high rate.
This means, however, that in order to actually use a point in a query, we have to unpack it. This unpacking happens point by point, even for data cached in memory: 1 billion points in a data query means 1 billion numbers unpacked.
In this release we analyzed the CPU cache efficiency of the number unpacking and refactored it to make the best use of the available CPU caches, increasing its performance by 30%.
This release includes a better algorithm for picking the parent to stream metrics to. The previous version always reconnected to the first available parent; now the agent rotates through the configured parents one by one and then starts over.
An issue was fixed regarding parents with stale alerts from disconnected children. Now, the parent validates all alerts on every child re-connection.
Netdata parents now have a timeout to cleanup dead/abandoned children connections automatically.
We also worked to eliminate most of the bottlenecks that appear when multiple children connect to the same parent. This work is still under testing, so it will land in the next release.
Netdata uses many workers to execute several of its features. There are web workers, ACLK workers, dbengine workers, health monitoring workers, libuv workers, and many more.
We managed to identify a number of deadlocks that were slowing down the whole operation, and we also increased the number of workers to deliver more capacity on busy parents.
There is a new section for monitoring Netdata workers under the "Netdata Monitoring" section of the dashboard, and using it we are continuing to make the workers even more efficient.
The last release was hindered by rare deadlocks on very busy parents. These deadlocks are now gone, improving the agent's ability to centralize data from many children.
Judy arrays are probably the fastest and most CPU cache-friendly indexes available. Netdata already uses them for
dbengine and its page cache. Now all Netdata dictionaries are using them too, giving a performance boost to all
dictionary operations, including StatsD.
Initialization of /proc collectors was suboptimal, because they had to go through a slow process of adapting their read buffers. We added a forward-looking algorithm to optimize this initialization, which now happens in 1/10th of the time.
Some users have experienced gaps in /proc plugin charts. We identified that these gaps were triggered by the netdev module, which was causing the whole plugin to slow down and miss data collection iterations. Now the netdev module of the /proc plugin runs on its own thread, so that it cannot influence the rest of the /proc modules.
The internal web server of Netdata now spreads the work among its worker threads more evenly, utilizing as much of the parallelism available to it as possible.
netdata.conf re-organized
We re-organized the [global] section of netdata.conf, so that it is more meaningful for new users. The new configuration is backward compatible. After you restart Netdata with your old netdata.conf, grab the new one from http://localhost:19999/netdata.conf to get the new format.
We now have our own MQTT implementation within our ACLK protocol that will eventually replace the current MQTT-C client for several reasons, including the following:
Currently, it's provided as a tech preview, and it's disabled by default. Feel free to have some fun with the new implementation. This is how to enable it in netdata.conf:

```
[cloud]
  mqtt5 = yes
```
tailscaled to apps_groups.conf
net, aws, and ha groups in apps_groups.conf
caddy to apps_groups.conf
Enhancing our collectors to collect all the data you need.
Improving our collectors one bug fix at a time.
Keeping our documentation healthy together with our awesome community.
"Handle with care" - Just like handling physical packages, we put in a lot of care and effort to publish beautiful software packages.
Greasing the gears to smooth your experience with Netdata.
Increasing Netdata's reliability one bug fix at a time.
Changes to keep our code base in good shape.
history with relevant dbengine params (#13041, @andrewm4894)
--auto-update option when using static/build install method (#12725, @ilyam8)
The following items will be removed in our next minor release (v1.36.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/chrony | collector | go.d/chrony |
python.d/ovpn_status_log | collector | go.d/openvpn_status_log |
All the deprecated components will be moved to the netdata/community repository.
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by |
---|---|---|
node.d | plugin | - |
node.d/snmp | collector | go.d/snmp |
python.d/apache | collector | go.d/apache |
python.d/couchdb | collector | go.d/couchdb |
python.d/dns_query_time | collector | go.d/dnsquery |
python.d/dnsdist | collector | go.d/dnsdist |
python.d/elasticsearch | collector | go.d/elasticsearch |
python.d/energid | collector | go.d/energid |
python.d/freeradius | collector | go.d/freeradius |
python.d/httpcheck | collector | go.d/httpcheck |
python.d/isc_dhcpd | collector | go.d/isc_dhcpd |
python.d/mysql | collector | go.d/mysql |
python.d/nginx | collector | go.d/nginx |
python.d/phpfpm | collector | go.d/phpfpm |
python.d/portcheck | collector | go.d/portcheck |
python.d/powerdns | collector | go.d/powerdns |
python.d/redis | collector | go.d/redis |
python.d/web_log | collector | go.d/weblog |
This release adds official support for the following platforms:
This release removes official support for the following platforms:
This release includes the following additional platform support changes.
Join the Netdata team on the 9th of June at 5pm UTC for the Netdata Agent Release Meetup, which will be held on
the Netdata Discord.
Together weâll cover:
RSVP now - we look forward to
meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter
an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us
through one of the following channels:
Published by netdatabot over 2 years ago
This patch release fixes versioning issues that occurred in the latest release (Netdata v1.34):
Supporting people in using and building with Netdata is very important to us! Should you need any help or encounter an issue with any of the changes made in this release, feel free to get in touch with the community through the following channels:
Published by netdatabot over 2 years ago
Table of contents
We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notice to check if you are affected.
We're proud to empower each and every one of you to troubleshoot your infrastructure using Netdata:
If you're part of our community and love Netdata, please give us a star on GitHub.
Have you seen your applications get stuck or fail to respond to health checks? It might be the CPU quota limit!
Kubernetes relies on the kernel control group (cgroup) mechanisms to manage CPU constraints. The CPU quota is allocated based on a period of time, not on available CPU power. When an application has used its allotted quota for a given period, it gets throttled until the next period.
So if you don't set your CPU limits correctly, your applications will be throttled even while your CPU may be idle. And CPU throttling is really hard to identify, since Kubernetes only exposes usage metrics.
In this release, we make troubleshooting Kubernetes even easier by adding two new charts for CPU throttling:
The performance of the machine learning threads has been significantly optimized in this release. We were able to reduce peak CPU usage considerably by sampling input data randomly and excluding constant metrics from training. That way, we've optimized performance while maintaining high levels of accuracy. If you're streaming data between nodes: we've optimized CPU usage on parent nodes with multiple child nodes by altering the training thread's max sleep time.
We introduced streaming compression in Netdata Agent v1.33.0 as a tech preview. The feature has matured a lot since then, so we are moving it forward to the alpha stage. From now on, streaming compression will be enabled by default, allowing you to leverage faster streaming between parent and child nodes at lower bandwidth.
Go is known for its reliability and blazing speed - precisely what you need when monitoring networks. We've rewritten our SNMP collector from Node.js to Go. Apart from improved configuration options, the new collector eliminates the need for Node.js, slimming down our dependency tree.
Note: The node.js-based SNMP collector will be deprecated in the next release, see the deprecation notice.
đ SNMP Go collector documentation
We have been improving our kickstart script to give you a smooth installation experience. We've added some handy features like:
--reinstall-clean option, you can now have the kickstart script cleanly uninstall an existing installation before installing Netdata again.
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
sudo
.tar can not set the correct permissions during installation.
Enhancing our collectors to collect all the data you need.
Improving our collectors one bug fix at a time.
Improving eBPF integration one bug fix at a time.
"Handle with care" - Just like handling physical packages, we put in a lot of care and effort to publish beautiful software packages.
Keeping our documentation healthy together with our awesome community.
Greasing the gears to smooth your experience with Netdata.
Increasing Netdata's reliability one bug fix at a time.
Changes to keep our code base in good shape.
The following items will be removed in our next minor release (v1.35.0):
Patch releases (if any) will not be affected.
Component | Type | Replaced by |
---|---|---|
node.d | plugin | - |
node.d/snmp | collector | go.d/snmp |
python.d/apache | collector | go.d/apache |
python.d/couchdb | collector | go.d/couchdb |
python.d/dns_query_time | collector | go.d/dnsquery |
python.d/dnsdist | collector | go.d/dnsdist |
python.d/elasticsearch | collector | go.d/elasticsearch |
python.d/energid | collector | go.d/energid |
python.d/freeradius | collector | go.d/freeradius |
python.d/httpcheck | collector | go.d/httpcheck |
python.d/isc_dhcpd | collector | go.d/isc_dhcpd |
python.d/mysql | collector | go.d/mysql |
python.d/nginx | collector | go.d/nginx |
python.d/phpfpm | collector | go.d/phpfpm |
python.d/portcheck | collector | go.d/portcheck |
python.d/powerdns | collector | go.d/powerdns |
python.d/redis | collector | go.d/redis |
python.d/web_log | collector | go.d/weblog |
All the deprecated components will be moved to the netdata/community repository.
In accordance with our previous deprecation notice, the following items have been removed in this release:
Component | Type | Replaced by |
---|---|---|
backends | subsystem | exporting engine |
node.d/fronius | collector | - |
node.d/sma_webbox | collector | - |
node.d/stiebeleltron | collector | - |
node.d/named | collector | go.d/bind |
Supporting people in using and building with Netdata is very important to us! Should you need any help or encounter an issue with any of the changes made in this release, feel free to get in touch with the community through the following channels:
Published by netdatabot over 2 years ago
Netdata v1.33.1 is a patch release to address issues discovered since v1.33.0.
This release contains bug fixes and documentation updates.
If you also use Netdata Cloud, please note that we started migrating nodes running on the old architecture to the new one. Most users donât have to take any action on their part, but if you are affected by the migration, a banner will be added to your Cloud dashboard with a link to further instructions.
If you love Netdata and haven't yet considered giving us a GitHub star, we would appreciate it if you did!
after and before URL params in direct links (#12052)
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
Published by netdatabot over 2 years ago
Happy New Year to everyone in the Netdata community. After one of our biggest releases ever, we have re-energized over the holidays and are ready to continue helping more people troubleshoot their infrastructure. Hopefully you've already heard about the improvements we made to the kickstart script. With this release, we're adding even more features:
We're also keeping our codebase healthy by removing end-of-life features. Read the deprecation notice to check if you are affected.
If you love Netdata and haven't yet given us a GitHub star, please do - we would really appreciate it!
The open-source Netdata Agent, the best OSS node monitoring and troubleshooting solution, currently has:
Netdata is supported both by an active community of global contributors and the Netdata staff.
Get involved:
We recently released a completely new version of our one-line installer code. Wherever available, our new kickstart script uses DEB or RPM packages provided by Netdata. These packages are tightly integrated with the package management system of the distribution, providing the best installation experience in a reliable and fast way.
Already over 70% of our new installations use DEB or RPM packages! The updated kickstart script has several advantages over the old one:
đ Find the updated install documentation on our official docs site.
If you were using the old kickstart.sh script through a custom script or orchestration tool, you may need to update the options being passed to get it to behave like it used to (this will usually just involve adding --build-only to the options).
Other installation types do not need to make any changes because of this.
The Agent's streaming mechanism now supports stream compression. Streaming thousands of metrics between Netdata Agents increases your data availability and provides a more robust mechanism to monitor your metrics and troubleshoot problems.
Stream compression allows you to:
Stream compression uses the lossless "LZ4 - Extremely fast compression" library. It achieves compression speeds up to 800Mbps and decompression speeds up to 4500Mbps, with an average compression ratio between 2.0 and 3.0. Because this is a technical preview and we are still working to make it amazing, stream compression will be disabled by default.
đ Learn how to enable streaming between nodes.
đ If you already stream between nodes, learn how to enable streaming compression
Note: Stream compression only works if all participating Netdata Agents are hosted on an OS which supports the library version lz4 v1.9.0+. If a Netdata Agent does not detect the lz4 v1.9.0+ library version, it will disable stream compression.
In v1.32 we added some major improvements to our eBPF support. For this release, we're taking the next step by gradually introducing BPF CO-RE support!
Today, distributing eBPF programs is very challenging, because compiling an eBPF program against so many different Linux kernels is complex. We want to make eBPF widely available to everyone, without worrying about compatibility. And here is where eBPF CO-RE (Compile Once, Run Everywhere), part of libbpf, comes to the rescue.
CO-RE is a modern approach to writing portable BPF applications that can run on multiple kernel versions and configurations without modifications and runtime source code compilation on the target machine. We now have the opportunity to focus on what matters, add more features, and improve performance of our eBPF offering!
Furthermore, in this release we also introduce two new eBPF charts:
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
-W buildinfo output. (#12010, @Ferroin)
The following items will be removed in our next release:
Following our previous deprecation notice legacy ACLK support is officially removed in this release. See more information in our last release notes (v1.32).
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata agent, feel free to contact us by one of the following channels:
Published by netdatabot almost 3 years ago
Netdata v1.32.1 is a patch release to address issues discovered since v1.32.0.
This release contains bug fixes and documentation updates, including clarified instructions for ACLK and our Machine Learning (ML) functionality.
We appreciate our community's help in identifying and diagnosing these issues so we could fix them quickly.
We encourage users to upgrade to the latest version at their earliest convenience.
/api/v1/aclk endpoint (#11881, @underhood)
Published by Ferroin almost 3 years ago
The newest version of Netdata, v1.32.0, propels us toward the end of the year, and the Netdata community is positioned to grow stronger than ever in 2022. Before we get into the specifics of the new release, it's worth reflecting on that growth.
The open-source Netdata Agent, the best OSS node monitoring and troubleshooting solution ever, currently has:
The Netdata Cloud, our infrastructure-level, distributed, real-time monitoring and troubleshooting orchestrator, is also showing similar growth, with:
We are not just pleased with this amazing adoption rate, we are inspired by it. It is you users who give us the energy and confidence to move forward into a new era of high-fidelity, real-time monitoring and troubleshooting, made accessible to everyone!
Thank you for the inspiration! You rock!
As many of you know, even though we are not endorsed by CNCF, Netdata is the fourth most starred project in the CNCF landscape. We want to thank you for this expression of your appreciation. If you love Netdata and haven't yet, consider giving us a Github star.
Additionally, we invite you to join us on our new Discord server to continue our growth and trajectory, but also to join in on fun and informative live conversations with our wonderful community.
The following offers a high-level overview of some of the key changes made in this release, with more detailed description available in subsequent sections.
New Cloud backend and Agent communication protocol
This Agent release supports our new Cloud backend. From here, we will be offering much faster and simpler communication, reliable alerts and exchange of metadata, and first-time support for the parent-child relationship of Netdata agents. This is the first Agent release that allows Netdata Cloud to use the Netdata Agent as a distributed time-series database that supports replication and query routing, for every metric!
eBPF latency monitoring, container monitoring, and more
We use eBPF to monitor all running processes, without the cooperation of the processes and without sniffing data traffic. This new release includes 13 new eBPF monitoring features, including I/O latency, BTRFS, EXT4, NFS, XFS and ZFS latencies, IRQs latencies, extended swap monitoring, and more.
Machine learning (ML) powered anomaly detection
This release links the Netdata Agent with dlib, the popular C++ machine learning library, which we use to automatically detect anomalies out of the box, at the edge! Once enabled, Netdata trains an ML model for every metric, which is then used to detect outliers in real time. The resulting "anomaly bit" (where 0=normal, 1=anomalous) associated with each database entry is stored alongside the raw metric value with zero additional storage overhead! This feature is still in development, so it is disabled by default. If you would like to test it and provide feedback, you can enable the feature using the instructions provided in the Detailed release highlights section.
New timezone selector and time controls in the user interface
We implemented a new timezone picker and time controls to enhance administrative abilities in the dashboard.
Docker image POWER8+ support
Netdata Docker images now support recent IBM Power Systems, Raptor Talos II, and more.
And more...
Four new collectors, 112 total improvements, 95 bug fixes, 49 documentation updates, and 57 packaging and installation changes!
It's no secret that the best of Netdata Cloud is yet to come. After several months of developing, testing, and benchmarking a new architectural system, we have steadied ourselves for that growth. These changes should offer notable and immediate improvements in reliability and stability, but more importantly, they allow us to quickly and efficiently develop new features and enhanced functionality. Here's what you can look for on the short-term horizon, thanks to our new architecture:
If you would like to be among the first to test this new architecture and provide feedback, first make sure that you have installed the latest Netdata version following our guide. Then, follow our instructions for enabling the new architecture.
We did a lot of work to enhance our eBPF container monitoring in this release, starting with the development of full eBPF support for cgroups. As a refresher on just how important this update is: cgroups, together with namespaces, are the building blocks of containers, the dominant way of distributing applications today. We use cgroups to control how much of a given key resource (CPU, memory, network, and disk I/O) can be accessed or used by a process or set of processes. Our eBPF collector now creates charts for each cgroup, which enables us to understand how a specific cgroup interacts with the Linux kernel!
This enhances our already extensive monitoring by including cgroups for mem, process, network, file access, and more.
By enabling eBPF monitoring on all systems that support it, Netdata has already been established as a world-leading distributor of eBPF! We use eBPF to monitor all running processes, without the cooperation of the processes, by tracking any way the application interfaces with the system. And in this release, we continue our commitment to further improve eBPF by tracking latencies by disks, IRQs, etc.
Our new eBPF latency features include:
eBPF is a very strong addition to our monitoring tools, and we are committed to providing the best eBPF monitoring experience, observing from a distance without disrupting the data flow!
But we didn't stop there with eBPF in v1.32.0. We also provided the following updates:
If you share our interest in eBPF monitoring, or have questions or requests, feel free to drop by our Community forum to start a discussion with us.
Machine learning (ML) is undeniably a wave of the future in monitoring and troubleshooting. The Netdata community is riding that wave forward together, ahead of everyone else. Netdata v1.32.0 introduces some foundational capabilities for ML-driven anomaly detection in the Agent. We have integrated the popular dlib C++ ML library to power unsupervised anomaly detection out of the box.
While this functionality is still under development and subject to change, we want to develop it with you, as a team. The functionality is disabled by default while we dogfood the feature internally and build additional ML-leveraging features into Netdata Cloud. But you can go to the new [ml] section in netdata.conf and set enabled = yes to turn on anomaly detection. After restarting Netdata, you should see the Anomaly Detection menu with charts highlighting the overall number and percent of anomalous metrics on your node. This can be a very useful single-number summary of the state of your node.
Share your feedback by emailing us at [email protected] or just come hang out in the ml-powered-monitoring channel of our Discord, where we discuss all things ML and more!
And then, be on the lookout for some bigger announcements and launches relating to ML over the next couple of months.
Collaborating in a remote world across regions can be difficult, so we wanted to make it easier for you to sync with your administrative teams and your system information. Our new timezone selector allows you to select a timezone to accommodate collaboration needs within your teams and infrastructure. Additionally, we have added the following time controls to allow you to distinguish if the content you are looking at is live or historical and to refresh the content of the page when the tabs are in the background:
And on top of all of that, we have added 64-bit little-endian POWER8+ support to our official Docker images, allowing the use of Netdata Docker images on recent IBM Power Systems, Raptor Talos II, and similar POWER based hardware, extending the list of what is currently supported for our Docker images, which includes:
reset_netdata_trace.sh from netdata.service (#11517, @ilyam8)
.install-type before it is created (#11262, @ilyam8)
install_type detection during update (#11199, @ilyam8)
-W buildinfo output. (#11634, @Ferroin)
An upcoming stable release of the Netdata agent will include a maintainability update to our base Docker image.
A small percentage of users will find that all self-compiled packages must be manually rebuilt after the update, even if relocation/SONAME errors are not encountered. --security-opt=seccomp=unconfined
can be passed with no default.json, but this introduces security vulnerabilities between the host and malicious code in the container.
Alternatively, users can prepare for the update by upgrading to one of the following:
While Netdata previously avoided making this update to minimize inconvenience to our users, we are now facing a third-party end-of-life date, and we believe the minimal number of affected users substantiates the need for the change.
Additionally, in a future stable release, we will be removing our legacy agent-to-cloud connection. Most users should see no change in this upgrade, but we will lose SOCKS 5 proxy support for the Netdata Cloud functionality, which will affect a small number of users.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata agent, feel free to contact us by one of the following channels:
Published by netdatabot over 3 years ago
The v1.31.0 release of Netdata comes with re-packaged and redesigned elements of the dashboard to help you focus on your metrics, even more Linux kernel insights via eBPF, on-node machine learning to help you find anomalies, and much more.
This release contains 10 new collectors, 54 improvements (7 in the dashboard), 31 documentation updates, and 29 bug fixes.
We re-packaged and redesigned portions of the dashboard to improve the overall experience. Part of this effort is better handling of dashboard code during installation, so anyone using third-party packages (such as the Netdata Homebrew formula) will start seeing the new features and designs today. The timeframe picker has moved to the top panel, and just to its right are two counters with live CRITICAL and WARNING alarm statuses for your node. Click on either of the two to open the alarms modal.
We've also pushed a number of powerful new collectors, including directory cache monitoring via eBPF. By monitoring directory cache, developers and SREs alike can find opportunities to optimize memory usage and reduce disk-intensive operations.
Our new Z-scores and changefinder collectors use machine learning to let you know, at a glance, when key metrics start to behave oddly. We'd love to get your feedback on this sophisticated new breed of collectors!
Netdata Learn, our documentation and educational site, got some refreshed visuals and an improved navigation tree to help you find the right doc quickly. Hit Ctrl/Cmd + K to start a new search!
If you're not receiving automatic updates on your node(s), check our update doc for details.
smartd_log module.
fping version.
.deb and .rpm packaging of the eBPF plugin.
dash-example.html.
dash-example documentation.
node_id for a host. (#11059, @stelfrag)
curl. (#11010, @ilyam8)
dash-example.html. (#10870, @tnyeanderson)
host_cloud_enabled attribute to analytics. (#11100, @MrZammler)
systemdunits collector. (#10904, @ilyam8)
CRITICAL or WARNING alarms.
charts configuration option to templates. (#11054, @thiagoftsm)
inconsistent state to the mysql_galera_cluster_state alarm. (#10945, @ilyam8)
average instead of sum in VerneMQ alarms. (#11037, @ilyam8)
CUSTOM and MSTEAM. (#11113, @MrZammler)
synchronization.conf to the Makefile. (#10907, @ilyam8)
smartd_log module. (#10872, @RaitoBezarius)
mdstat collector chart families. (#11024, @ilyam8)
fping version. (#10977, @Habetdin)
python.d.plugin runtime charts. (#11007, @ilyam8)
kprobe names in the eBPF plugin. (#11034, @thiagoftsm)
cgroup discovery cleanup info message. (#11101, @vlvkobal)
size_t instead of int for vfs_bufspace_count in FreeBSD plugin. (#11142, @diizzyy)
opensipsctl executable. (#10978, @ilyam8)
virsh. (#11096, @ilyam8)
.deb and .rpm packaging of the eBPF plugin. (#11031, @wangpei-nice)
make(1) when building LWS. (#10799, @vkalintiris)
mqtt_websockets with Netdata autotools. (#11083, @underhood)
performance.md. (#11144, @cakrit)
k6.md. (#11127, @OdysLam)
cgroups plugin documentation. (#10924, @vlvkobal)
options syntax in the docs. (#10974, @ilyam8)
nvidia-smi collector documentation. (#10214, @vincentkersten)
1.1.0. (#11089, @thiagoftsm)
NULL claim ID on a parent node. (#11036, @stelfrag)
Published by netdatabot over 3 years ago
This is a patch release to address discovered issues since 1.30.0.
cd in updater. (#10936, @Ferroin)
Published by netdatabot over 3 years ago
The v1.30.0 release of Netdata brings major improvements to our packaging and completely replaces Google Analytics/GTM for product telemetry. We're also releasing the first changes in an upcoming overhaul to both our dashboard UI/UX and the suite of preconfigured alarms that comes with every installation.
v1.30.0 contains 3 new collectors, 3 enhancements to notification methods, 38 improvements (13 in the dashboard), 16 documentation updates, and 17 bug fixes.
ACLK-NG is a much faster method of securely connecting a node to Netdata Cloud. In addition, it has no external dependencies on our custom libmosquitto and libwebsockets libraries, so there's no longer any need to build them during installation. To enable ACLK-NG on a node that's already running the Netdata Agent, reinstall with the `--aclk-ng` option:
bash <(curl -Ss https://my-netdata.io/kickstart.sh) --aclk-ng --reinstall
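After the Agent restarts, you can check which Cloud connection implementation is active by querying the local API. A minimal sketch, assuming the Agent listens on the default port 19999 and reports an `aclk-implementation` field in `/api/v1/info` (the exact field name may vary between Agent versions):

```python
import json
import urllib.request

def aclk_status(info: dict) -> str:
    # "aclk-implementation" is an assumed field name taken from the
    # /api/v1/info payload; it may differ between Agent versions.
    return info.get("aclk-implementation", "unknown")

# Example usage against a running agent on the default port:
#   with urllib.request.urlopen("http://localhost:19999/api/v1/info") as resp:
#       print(aclk_status(json.load(resp)))
```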
We replaced Google Analytics/GTM, which we used for collecting product telemetry, with a self-hosted instance of the open-source PostHog project. When sending statistics to PostHog, any fields that might contain identifiable information, such as an IP address or URL, are hardcoded to anonymized placeholder values. If you previously opted out of anonymous statistics, this migration does not change your existing settings.
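For illustration only, "hardcoding" potentially identifiable fields before an event leaves the machine can look like the following sketch; the field names here are hypothetical and not Netdata's actual telemetry schema:

```python
# Hypothetical sketch: replace potentially identifying fields with a fixed
# placeholder before the telemetry event is sent.
SENSITIVE_FIELDS = {"ip", "url", "hostname"}  # assumed names, for illustration

def hardcode_identifiable(event: dict) -> dict:
    # Every sensitive key is overwritten with the same constant, so the
    # outgoing payload carries no per-user value for those fields.
    return {
        key: ("<redacted>" if key in SENSITIVE_FIELDS else value)
        for key, value in event.items()
    }
```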
We also published a developer environment (devenv) to simplify contributing to the Netdata Agent. The devenv packages everything you need to develop improvements on the Netdata Agent itself, or its collectors, in a single Docker image. Read more about this devenv, and get started, in the Netdata community repo.
- `ACLK-NG`. (#10315, @underhood)
- `_aclk_impl` label to the `/api/v1/info` endpoint. (#10778, @underhood)
- `chart` parameter to the `/api/v1/alarm_log` endpoint. (#10788, @MrZammler)
- `curl connect-timeout` and decrease number of claim attempts. (#10800, @ilyam8)
- `access.log`. (#10697, @stelfrag)
- `struct avl` to `avl_element` and the `typedef` to `avl_t`. (#10735, @vkalintiris)
- `max` value to the `nvidia_smi.fan_speed` gauge. (#10780, @ilyam8)
- `dashboard_info.js`. (#10754, @ilyam8)
- `dashboard_info.js`. (#10849, @ilyam8)
- `docs.netdata.cloud` to `learn.netdata.cloud`.
- `repeat` feature. (#10846, @thiagoftsm)
- `alarm.log`. (#10564, @thiagoftsm)
- `wmi_` prefix to the wmi collector network alarms. (#10782, @ilyam8)
- `info` fields. (#10853, @ilyam8)
- `exporting_metrics_lost` template. (#10829, @ilyam8)
- `ram_in_swap` alarm. (#10789, @ilyam8)
- `packets_dropped_ratio` alarms for wireless network interfaces. (#10785, @ilyam8)
- `dump_methods` parameter to `alarm-notify.sh.in`. (#10772, @MrZammler)
- `k6.conf` to the StatsD collector. (#10733, @OdysLam)
- `apps.plugin` configuration. (#9313, @Steve8291)
- `vpn` group in the `apps.plugin` configuration. (#10743, @liepumartins)
- `attribute 233` (Media Wearout Indicator (SSD)) collection to the python `smartd_log` collector. (#10711, @aazedo)
- `go.d.plugin` version to v0.28.1. (#10826, @ilyam8)
- `noauthcodecheck` workaround flag to the freeipmi collector. (#10701, @vlvkobal)
- `cpuset.cpus` in the cgroups collector. (#10757, @ilyam8)
- `buildinfo`. (#10706, @Ferroin)
- `--aclk-ng` option to the netdata-installer script. (#10852, @underhood)
- `service` to `systemctl`. (#10703, @joelhans)
- `memory mode` mention in StatsD example. (#10751, @OdysLam)
- `main.h`. (#10858, @eltociear)
- `backend_prometheus.c`. (#10716, @eltociear)
- `dashboard_info.js`. (#10775, @eltociear)
- `chart_label_key`. (#10844, @stelfrag)
- `abs` to `ABS`. (#10354, @KickerTom)