netdata

The open-source observability platform everyone needs!

GPL-3.0 License

Stars: 68.6K
Committers: 630

netdata - v1.45.5 Latest Release

Published by netdatabot 5 months ago

Netdata v1.45.5 is a patch release to address issues discovered since v1.45.4.

This patch release provides the following bug fixes and updates:

  • Fixed streaming sender functions payload corruption (#17696, @ktsaou).
  • Fixed crashes due to missing dimension IDs (external protocol) by detecting incorrect syntax and disabling plugins (#17690, @stelfrag).
  • Fixed ACLK Proxy compatibility: added Host header to CONNECT requests (#17670, @stelfrag).
  • Added Machine Learning support to CentOS 7 RPM packages, making it available to users on that platform (#17667, @vkalintiris, #17682, @Ferroin).
  • Fixed calculation issue in the go.d/cockroachdb collector (#17659, @ilyam8).
  • Added limited support for offline installations within the updater code (#17648, @Ferroin).
  • Fixed Cloud Alert consistency: sending REMOVED transitions for disconnected child Agents (#17621, @stelfrag).
  • Added vnode support to go.d/windows collector dyncfg (#17478, @ilyam8).

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!
netdata - v1.45.4

Published by netdatabot 5 months ago

Netdata v1.45.4 is a patch release to address issues discovered since v1.45.3.

This patch release provides the following bug fixes and updates:

  • Added missing update_every property to the health prototype JSON schema (#17613, @ktsaou)
  • Fixed issue where parent alerts remained active after child disconnection, by resetting health on child disconnect (#17612, @ktsaou)
  • Fixed a packaging issue that prevented ndsudo from having the setuid bit in static builds (#17583, @ilyam8)
  • Increased spawn server command size and added shutdown safeguard to prevent crashes from command size limit exceeded (#17566, @stelfrag)
  • Fixed error code reporting for failed data insertion in SQLite (#17508, @stelfrag)
  • Fixed issue with name-only label matching (#17482, @stelfrag)
  • Improved Cloud connectivity: automatically re-establish connection upon system resume from suspension by scheduling a node update (#17444, @stelfrag)
  • Improved termination handling: start watcher thread post-fork, preventing main process from waiting indefinitely on TERM signal (#17436, @stelfrag)
  • Fixed priority order for alarms and alarm templates: now, alarms are applied before alarm templates consistently, regardless of their order in configuration files (#17398, @ktsaou)
  • Added option for health table cleanup with 'netdata -W sqlite-alert-cleanup' command (#17385, @stelfrag)
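To illustrate the alarm/template ordering fix listed above, here is a minimal Python sketch. The `kind` and `name` fields are hypothetical and do not reflect Netdata's internal structures; the sketch only shows the idea of applying alarms before templates, whatever their order in the configuration files.

```python
def application_order(definitions):
    """Return definitions with alarms first and templates second."""
    # sorted() is stable, so entries of the same kind keep their file order.
    return sorted(definitions, key=lambda d: 0 if d["kind"] == "alarm" else 1)


# Hypothetical definitions, listed in the order they appear in a config file.
configured = [
    {"kind": "template", "name": "disk_full"},
    {"kind": "alarm", "name": "disk_full_root"},
    {"kind": "template", "name": "cpu_high"},
]
ordered = [d["name"] for d in application_order(configured)]
```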

netdata - v1.45.3

Published by netdatabot 6 months ago

WARNING: Important Security Update

Netdata v1.45.3 is a patch release to fix a local privilege escalation vulnerability discovered in v1.45.x releases. Users are advised to upgrade any systems running v1.45.0, v1.45.1, or v1.45.2 immediately. Stable releases before v1.45.0 are unaffected by this vulnerability. Full details on the vulnerability can be found in the associated security advisory on GitHub. A big thank you to mia-0 for identifying and reporting this issue!

This patch release also addresses other issues discovered since v1.45.2.

This patch release provides the following bug fixes and updates:

  • Mitigated a security issue in ndsudo by restricting its search paths to a predefined set of directories (#17377, @ilyam8)
  • Resolved an issue that prevented the "percentage" option from functioning correctly in alert lookups (#17391, @ktsaou)
  • Enhanced macOS uninstallation by enabling removal of the associated LaunchDaemons plist file (#17357, @ilyam8)
  • Increased the default minimum thread stack size to 1 MB to address potential stability issues caused by the musl libc's smaller default (128kB) (#17317, @ilyam8)
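The ndsudo mitigation above restricts command lookup to a predefined set of directories. The following Python sketch shows the general technique only; it is not the ndsudo implementation, and the `TRUSTED_DIRS` list and the `find_trusted` helper are assumptions made for illustration.

```python
import os
import shutil

# Hypothetical allow-list; the directories Netdata actually uses may differ.
TRUSTED_DIRS = ("/usr/sbin", "/usr/bin", "/sbin", "/bin")


def find_trusted(command, trusted_dirs=TRUSTED_DIRS):
    """Resolve a bare command name using only the trusted directories."""
    # Reject anything containing a path separator: shutil.which() would
    # otherwise check such a path directly and bypass the allow-list.
    if os.sep in command or (os.altsep and os.altsep in command):
        return None
    # Passing path= makes the lookup ignore the caller's PATH variable.
    return shutil.which(command, path=os.pathsep.join(trusted_dirs))
```

Because the caller's PATH is never consulted, a binary planted in a user-writable directory is not picked up by the lookup.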

netdata - v1.45.2

Published by netdatabot 7 months ago

Netdata v1.45.2 is a patch release to address issues discovered since v1.45.1.

This patch release provides the following bug fixes and updates:

  • Improved PostgreSQL/MySQL local listener discovery to automatically check for connections using both TCP and Unix sockets, enabling support for passwordless Unix socket connections (#17304 #17305, @ilyam8)
  • Fixed an issue that prevented negative matching of host/chart labels in alert configurations (#17290 #17292, @ktsaou)
  • Improved go.d.plugin stability by preventing Netdata from shutting down the entire plugin due to an issue with registering jobs for unregistered modules (#17289, @ilyam8)
  • Improved go.d.plugin HTTP requests: they now include a User-Agent string, making them easier to identify in server logs (#17286, @ilyam8)
  • Improved Nginx discovery in go.d.plugin by automatically trying multiple status endpoints when discovering Nginx containers (#17285, @ilyam8)
  • Fixed a go.d.plugin panic that could occur when using the Unbound collector with TLS (#17283, @ilyam8)
  • Fixed a libyaml linking issue (#17276, @Ferroin)
  • Improved go.d.plugin configuration validation, preventing unexpected or invalid options through dynamic configurations (#17269, @ilyam8)

netdata - v1.45.1

Published by netdatabot 7 months ago

Netdata v1.45.1 is a patch release to address issues discovered since v1.45.0.

This patch release provides the following bug fixes and updates:

  • Ensured proper handling of default values for data collection jobs submitted via dynamic configuration. (#17255, @ilyam8)
  • Optimized go.d.plugin service discovery by filtering out irrelevant docker-proxy listeners. (#17254, @ilyam8)
  • Improved go.d.plugin's ability to find applications, including those using IPv6, and identify Apache processes more reliably. (#17252, @ilyam8)
  • Improved OpenSSL discovery on macOS for Homebrew builds. (#17250, @Ferroin)
  • Removed obsolete references to saving the internal database using the USR1 signal, reflecting the removal of save/map memory modes. (#17249, @ilyam8)
  • Added ZSTD compression support for dbengine (disabled by default for now). This improves storage efficiency when available, automatically falling back to uncompressed pages for compatibility. (#17244, @ktsaou)
  • Fixed a bug that caused metric reference count errors during release. (#17239, @ktsaou)
  • Code cleanup. (#17237, @ktsaou)
  • Enabled Gorilla compression by default for dbengine, reducing memory usage. (#17234, @ktsaou)
  • Improved dbengine unit tests for better code coverage and maintainability. (#17232, @ktsaou)
  • Fixed a database engine cache bug that could cause queries to stop prematurely under pressure. (#17231, @ktsaou)
  • Implemented caching optimization to reduce the number of cache flushes following journal file v2 creation. (#17220, @stelfrag)
  • Reduced clutter in MySQL/MariaDB query logs by disabling session query logging for the go.d/mysql collector. (#17219, @ilyam8)
  • Improved go.d.plugin to correctly identify MariaDB databases. (#17218, @ilyam8)
  • Enhanced macOS build stability by using native libraries and optimizing checks for dependencies. (#17216, @Ferroin)
  • Suppressed unnecessary compiler warnings about redefined macros, improving build cleanliness and compatibility with stricter build flags. (#17209, @Ferroin)

netdata - v1.45.0

Published by netdatabot 7 months ago

Netdata Growth

  • 67.5k GitHub stars!
  • 626M Docker Hub pulls!

Thanks to your love ❤️, Netdata is leading the observability category in CNCF, having significantly more stars than Elasticsearch, Grafana, Prometheus and all other observability solutions listed in the CNCF landscape.

We are committed to providing the most advanced and innovative observability solution, helping to minimize monitoring costs while providing AI-powered high-fidelity monitoring!

You like Netdata? Give Netdata a ⭐ too, on GitHub!

Release Summary

Three months have passed since the previous Netdata release. A lot has changed since then! Netdata now has a mobile app for alert notifications, new drag-and-drop custom dashboards, network connections monitoring, dynamic configuration for data collection jobs and alerts, and much more.

To see how Netdata stacks up against the most advanced commercial offerings available today, we did an analysis on how Dynatrace, Datadog, Instana, Grafana and Netdata commercial offerings compare.

It is nice to see that Netdata stands out for its:

  1. Excellent technology coverage

    Netdata's monitoring coverage is significantly higher compared to others, in all areas!

  2. High-fidelity, real-time insights

    Netdata is the only monitoring solution offering this kind of fidelity (per-second for all metrics) to this extent!

  3. Real AI and Machine Learning

    Netdata is the only monitoring system that offers real machine learning, running at the edge.

  4. Lightweight

    Netdata is among the lightest agents, despite the fact that it does a lot more than the others.

  5. Best cost efficiency

    Netdata's cost efficiency is unbeatable, making Netdata the most cost-efficient monitoring solution available today!

Read the full blog here.

Release Highlights

Netdata Mobile App

You can now receive Netdata alerts directly on your mobile phone!

Choose your space and see all the available notifications since you last signed in!

Check the full demo here.

The Mobile App is available for Homelab and Business plan users.

Custom Dashboards

You can now create advanced custom dashboards with Netdata!

  • Drag-and-Drop

    Easily move charts from Metrics or Single Node views straight to your dashboards. It's intuitive and fun.

  • New Chart Types

    Discover your data in new ways with Bar, Circle, Gauge, Pie, Value, and Group boxes.

  • Quick Dashboard Creation

    Hit the plus button, drag, and you've got a new dashboard. Simple as that.

  • Rename Charts

    Customize your dashboard by renaming charts to whatever makes sense to you.

  • Refreshed Text Cards

    We've upgraded text cards for better clarity and aesthetics.

The Agent UI and the Community plan of Netdata Cloud allow one custom dashboard. The Homelab and Business plans of Netdata Cloud support an unlimited number of custom dashboards.

Network Viewer

Explore the network connections of your servers and processes!

Netdata got a network viewer (select network-connections from the Top tab inside the dashboard).

The tool reports all IPv4 and IPv6, TCP and UDP sockets a system and all its processes have. Also, it automatically and reliably classifies them as inbound, outbound, local (i.e. within the host itself), or listen (for daemons).

The visualization graph has 4 sides:

  • public (i.e. public IPs),
  • private (i.e. private and reserved IPs),
  • servers (i.e. listening and inbound sockets),
  • clients (i.e. sockets towards other servers).

The position of each application on the chart is determined by the classification of the sockets it has. To the top are clients, to the bottom are servers, to the right are internet-facing applications, and to the left are internal network applications.

The size of each application in the chart is determined by the number of sockets it has, and each application is a pie chart representing the percentage of each kind of socket it has.

For servers with tens of thousands of sockets, the tool provides an aggregated view, grouping similar sockets together and reporting the total. Users can switch to a detailed view from the UI.
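The four sides of the graph described above can be sketched in a few lines of Python. This is not the Network Viewer's code: the function names and direction strings are assumptions, and treating every non-global address as "private" is this sketch's reading of "private and reserved IPs". The text does not say where "local" sockets are placed, so the sketch leaves them unplaced.

```python
import ipaddress


def address_side(remote_ip):
    """Return 'public' or 'private' for the remote end of a socket."""
    ip = ipaddress.ip_address(remote_ip)
    # is_global is True only for publicly routable addresses; private,
    # loopback, link-local and reserved ranges all count as 'private' here.
    return "public" if ip.is_global else "private"


def role_side(direction):
    """Map a socket direction to the 'servers' or 'clients' side."""
    if direction in ("listen", "inbound"):
        return "servers"
    if direction == "outbound":
        return "clients"
    # 'local' and unknown directions are not placed by this sketch.
    return None
```

For example, an outbound socket to `8.8.8.8` lands on the public/clients corner, while a daemon accepting connections from `10.0.0.1` lands on private/servers.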

User Settings

We've immensely improved the customization capabilities of Netdata with the introduction of User Settings.

Our first release on this front is focused on the customization of charts, either on the Metrics tab or the Single Node view tab. You can now create for any chart:

  • Personal views
  • Room specific views
  • Space dedicated views

With this, you can define the best way for your team to visualise a given chart while still allowing each teammate to define their own. Users are presented with the view they should see, based on the settings hierarchy, but they can switch between any of the available views.

More areas of customization are coming soon, such as saved views for filters and Table of Contents (TOC) ordering for dashboards.

Dynamic Configuration (beta)

Netdata agents are now deployed with the ability to dynamically accept configuration from the UI, for data collection jobs and alerts. The feature is released in beta.

Alerts Configuration Manager

Check the full demo here

Alerts Silencing Rules

🔕 We have made it easier to view and interact with Alert Silencing Rules.
With this release, you will be able to:

  • See silencing rules status directly on entities like Alerts, Rooms, and Nodes
  • Immediately create a silencing rule for an Alert, a Room, or a Node

We hope this makes it easier for you to interact with the Alert Silencing Rule Manager.
Stay tuned for more improvements!

macOS Process Monitoring

Netdata's apps.plugin has been ported to macOS, allowing users to view process information on Linux, FreeBSD and macOS!

Just install the latest Netdata on your macOS system and enjoy full process monitoring!

Homelab Plan

For non-professional use, get the full, latest Netdata! For the cost of a beer per month, you can get access to all Business features of Netdata for your home lab or personal project!

Our Homelab plan is available to technology enthusiasts and students for non-professional use, offering the entire Netdata suite for a small flat fee, under a fair usage policy.

  • Unlimited Access: Enjoy the freedom of unlimited usage, with no caps on nodes or custom dashboards.

  • Premium Features: Get your hands on business-level features, including enhanced alert integrations and access to our mobile app, all tailored for your personal projects.

  • Support Netdata: Support the open-source Netdata, to ensure it will be there for you, when you need it!

New Build Infrastructure

Starting with Netdata 1.45, we have completely removed our GNU Autotools based build system and replaced it with
CMake. The new CMake build system has a number of significant benefits for developers, package maintainers, and
those using local builds of Netdata.

  • We now have proper support for out-of-tree builds, and this is now the preferred method for building Netdata.
  • Build configuration is now measurably faster than it was previously.
  • Netdata can now be built using Ninja instead of Make, further speeding up the build process. The installer and
    updater script will automatically use Ninja instead of Make when possible, and developers, package maintainers,
    and users who are building by hand are encouraged to explicitly use it themselves when building Netdata by
    specifying the -G Ninja option to CMake during the configuration process.
  • Overall maintenance of the build system will be significantly easier going forwards. This means we should be
    able to fix any issues involving it much more quickly, and contributions from external developers should be
    much easier.
  • A number of features we have wanted to add to our build infrastructure will be much easier to add now.

Most users should not be directly affected by this change other than benefiting from the faster build times.
Only those who were building locally by hand (not using the netdata-installer.sh script or the kickstart script)
will need to change things.
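For those building by hand, the out-of-tree workflow with optional Ninja could look roughly like the Python sketch below. It only assembles generic CMake command lines; the `cmake_commands` helper and the directory names are hypothetical, Netdata-specific build options are not shown, and the Ninja detection is an assumption about how such a fallback might be done rather than what the installer actually does.

```python
import shutil


def cmake_commands(source_dir, build_dir, ninja_available=None):
    """Return (configure, build) command lines for an out-of-tree build."""
    if ninja_available is None:
        # Prefer Ninja when the executable can be found on PATH.
        ninja_available = shutil.which("ninja") is not None
    # -S and -B (CMake 3.13+) keep build files out of the source tree.
    configure = ["cmake", "-S", source_dir, "-B", build_dir]
    if ninja_available:
        configure += ["-G", "Ninja"]
    # 'cmake --build' drives whichever generator was configured.
    build = ["cmake", "--build", build_dir]
    return configure, build


configure, build = cmake_commands(".", "build", ninja_available=True)
```

The resulting lists can be passed to `subprocess.run(..., check=True)` from the top of a source checkout.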

Go Plugin Moved to Main Repository

Alongside the new CMake build system, we have also moved the go.d.plugin
code from the netdata/go.d.plugin repository to the main netdata/netdata
repository.

We have made this change for three reasons:

  • It makes handling of bugs in the Go plugin much easier. Instead of possibly needing to track issues and PRs
    across two repositories, now everything should end up tracked coherently in one repository.
  • It lets us significantly simplify a number of parts of our CI and installation code, allowing for greater
    reliability and easier maintenance.
  • It provides a testbed for infrastructure for handling of Go in the main repo, which is significant as we have
    been internally looking at reimplementing some other components of the agent in Go.

Users of native packages and static builds should see no difference at all from this change.

Building the agent locally will now require a working Go toolchain supporting a particular minimum version of the
Go language (currently 1.21) if the Go plugin needs to be built. The plugin itself can still be disabled to avoid
this requirement, but this is not recommended.

The installer code will attempt to ensure that a sufficiently up-to-date Go toolchain is installed when installing
or updating the agent. If such a toolchain is not found, it will attempt to automatically install a copy of the
official toolchain from https://go.dev/dl/ in /usr/local/go. If that attempt fails, the Go plugin will be
DISABLED automatically at build time.
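A check for the minimum Go version mentioned above might look like the following Python sketch. It is not the installer's logic; it assumes the usual `go version` output format (for example `go version go1.21.5 linux/amd64`), and the `go_version_ok` helper is hypothetical.

```python
import re

MIN_GO = (1, 21)  # minimum language version stated in these release notes


def go_version_ok(version_output, minimum=MIN_GO):
    """Return True if `go version` output reports at least `minimum`."""
    # Look for the 'goMAJOR.MINOR' token; patch level is ignored.
    match = re.search(r"\bgo(\d+)\.(\d+)", version_output)
    if match is None:
        return False
    found = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (1, 22) >= (1, 21) holds.
    return found >= minimum
```

A caller would feed it the text captured from running `go version` and fall back to installing or disabling the plugin when it returns False.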

Acknowledgments

  • @candlerb for improving robustness of netdata-updater.sh.
  • @carrychair for removing unnecessary repetition of words in documentation.
  • @luisj1983 for adding "Backing up a Netdata Agent" documentation.
  • @moschlar for fixing --distro-override parameter name in kickstart documentation.
  • @pschaer for correcting instructions on creating a startup script in the "Install Netdata on Synology" guide.
  • @sepek for fixing description of "chart labels" in "Configure alerts".

Contributions

Collectors

  • Add macOS support for collecting resource usage of processes (apps.plugin) (#17180, @ktsaou)
  • Improve identification of applications in docker service discovery (go.d.plugin) (#17174, @ilyam8)
  • Execute local-listeners periodically rather than just once at startup (go.d.plugin) (#17160, @ilyam8)
  • Add service discovery for applications running inside Docker containers (go.d.plugin) (#17152, @ilyam8)
  • Implement dynamic configuration for configuring data collection jobs (go.d.plugin) (#17064, @ilyam8)
  • Update message IDs for systemd and dbus (systemd-journal.plugin) (#16987, @ktsaou)
  • Report EDAC ECC errors as total counts since boot instead of rates (proc/sys_devices_system_edac_mc) (#16970, @ilyam8)
  • Add aggregated view (network-viewer.plugin) (#16940, @ktsaou)
  • Add filtering by username (network-viewer.plugin) (#16911, @ktsaou)
  • Add Network Viewer plugin (#16872, @ktsaou)
  • Add CPU throttling % column to the containers-vms function (cgroups.plugin) (#16800, @ilyam8)
  • Add the ndsudo binary, a helper tool for assisting in the execution of privileged commands (#16614, @ktsaou)
  • Disable CPU per core metrics by default (proc.plugin) (#16572, @ilyam8)
  • Fix incorrect family value of the ZFS ZPool state chart (proc/proc_spl_kstat_zfs) (#17054, @ilyam8)
  • Fix race conditions (diskspace.plugin) (#16786, @ktsaou)
  • Fix use of allocated memory after it has been freed (diskspace.plugin) (#16784, @ktsaou)
  • Fix priority of per-core CPU charts (proc/proc_stat) (#16749, @ilyam8)
  • Fix missing CPU frequency chart (proc/proc_stat) (#16732, @ilyam8)
  • Fix an issue where cgroup_check_for_new_every was incorrectly multiplied by update_every (cgroups.plugin) (#16719, @ilyam8)
  • Add mongodb-community-server image to docker service discovery configuration (go.d.plugin) (#17173, @ilyam8)
  • Add an option to disable service discovery (go.d.plugin) (#17171, @ilyam8)
  • Allow array/object to be null in JSON schemas (go.d.plugin) (#17166, @ilyam8)
  • Update file path pattern in jsonschema (go.d.plugin) (#17164, @ilyam8)
  • Add support for multi-config templates in the service discovery configuration (go.d.plugin) (#17157, @ilyam8)
  • Improve go.d.plugin dyncfg config schemas (#17124, @ilyam8)
  • Fix incorrect chart priority for discovered configs (go.d.plugin) (#17115, @ilyam8)
  • Add notice log level (go.d.plugin) (#17112, @ilyam8)
  • Fix pulsar tests (go.d/pulsar) (#17093, @ilyam8)
  • Set max chart id length to 1200 (go.d.plugin) (#17062, @ilyam8)
  • Improve aggregated view (network-viewer.plugin) (#16960, @ktsaou)
  • Show unknown container (network-viewer.plugin) (#16900, @ktsaou)
  • Reorganise code to prepare for functions (ebpf.plugin) (#16788, @thiagoftsm)
  • Fix missing aral_freez call (eBPF) (#16765, @thiagoftsm)
  • Cleanup network devices rename (proc/proc_net_dev) (#16745, @ktsaou)
  • Improve ebpf-socket function column names (ebpf.plugin) (#16727, @ilyam8)
  • Add double-linked network interfaces collection delay (#16701, @ilyam8)
  • Cleanup code and improve reliability (ebpf.plugin) (#16669, @thiagoftsm)
  • Update to create a separate chart for each systemd service rather than a chart dimension (ebpf.plugin) (#16630, @thiagoftsm)
  • Include 'lxcfs.service/.control' in the list of filtered cgroups (cgroups.plugin) (#16620, @ilyam8)
  • Exit if unable to locate journal data directories (systemd-journal.plugin) (#16592, @ilyam8)

Health

  • Remove deprecated alert fields from stock alarms (#17113, @ilyam8)
  • Fix filtering by severity for gotify notifications (#17069, @ilyam8)
  • Remove deprecated alert fields: "charts", "os", "host", "plugin" and "module" (#17048, @ktsaou)
  • Add a new alert to notify about systemd timer units that have failed (#16845, @tkatsoulas)
  • Implement dynamically configured alerts (#16779, @ktsaou)
  • Add a new alert to detect unexpected HTTP headers (#16736, @ilyam8)

Packaging/Installation

Documentation

  • Improve "Choose your Netdata Cloud theme" doc (#17172, @Ancairon)
  • Add instructions for monitoring NVIDIA GPUs to the Docker installation guide (#17167, @ilyam8)
  • Add documentation for the "Integration URL" field to PagerDuty Cloud integration doc (#17149, @juacker)
  • Bring back old docs that were containing missing information (#17146, @Ancairon)
  • Remove unnecessary repetition of words in docs (#17131, @carrychair)
  • Fix broken link in "Netdata Cloud On-Prem Installation" (#17118, @tkatsoulas)
  • Fix typos and improve wording in "Backing up a Netdata Agent" (#17117, @Ancairon)
  • Remove deprecated settings from "Configure alerts" (#17116, @ilyam8)
  • Fix broken links in go.d.plugin markdown files (#17108, @ilyam8)
  • Remove deprecated "foreach" from "Configure alerts" (#17106, @ilyam8)
  • Remove distributed-data-architecture.md (#17097, @Ancairon)
  • Fix broken links (#17095, @Ancairon)
  • Remove docs/netdata-security.md (#17094, @Ancairon)
  • Update "Plugin Functions Tables" docs (#17071, @car12o)
  • Update "Sizing Netdata Agents" doc (#17057, @ktsaou)
  • Fix links pointing to old go.d repo and update the integrations (#17040, @Ancairon)
  • Update links to Netdata Agent start-stop-restart docs (#17037, @Ancairon)
  • Include information on securing Netdata parent-child communication in "Configuring Metrics Centralization Points" (#17035, @Ancairon)
  • Restructure and update documentation (#17014, @Ancairon)
  • Add "Backing up a Netdata Agent" documentation (#17006, @luisj1983)
  • Correct instructions on creating a startup script in the "Install Netdata on Synology" guide (#16980, @pschaer)
  • Improve formatting in "How to optimize the Netdata Agent's performance" (#16925, @tkatsoulas)
  • Fix links to the energy efficiency screenshots to main readme file (#16904, @Aliki92)
  • Update "What's New and Coming?" based on Office Hours shared plans to main readme file (#16895, @hugovalente-pm)
  • Improve readability of Webhook Cloud notification documentation (#16882, @juacker)
  • Remove deprecated db mode "save" from "Database" (#16864, @Ancairon)
  • Fix CNCF link (#16851, @hugovalente-pm)
  • Add documentation on how to configure MS Teams Cloud notifications (#16834, @papazach)
  • Added instructions on calculating replication history to "Streaming and Replication Reference" (#16816, @thiagoftsm)
  • Update provisioning instructions in "Netdata Cloud On-Prem Light PoC" (#16811, @M4itee)
  • Add information about the new node permissions to "Role-Based Access model" (#16791, @vkuznecovas)
  • Add missing settings to "Streaming and replication reference" (#16778, @thiagoftsm)
  • Fix instructions for setting up Telegram notifications (#16777, @thiagoftsm)
  • Updated the kickstart URL to https://get.netdata.cloud/kickstart.sh (#16738, @ilyam8)
  • Fix --distro-override parameter name in "Install Netdata with kickstart.sh" (#16726, @moschlar)
  • Add the Mobile App notification Integration (#16715, @sashwathn)
  • Add "Require Cloud" column to the functions table in "Netdata Functions" (#16681, @ilyam8)
  • Fix typos and improve wording in "Creating Alerts with Netdata Alerts Configuration Manager" (#16679, @Ancairon)
  • Fix description of "chart labels" in "Configure alerts" (#16656, @sepek)
  • Fix formatting in "Creating Alerts with Netdata Alerts Configuration Manager" (#16651, @Ancairon)
  • Add practical examples showcasing how to utilize journalctl for querying Netdata logs to "Netdata Logging" (#16650, @ilyam8)
  • Add "Creating Alerts with Netdata Alerts Configuration Manager" (#16642, @sashwathn)
  • Add instructions for installing Netdata in a rootless Docker environment (#16632, @ilyam8)
  • Add energy efficiency image to main readme file (#16617, @Aliki92)
  • Remove deprecated memory mode "map" and "save" (#16604, @vkalintiris)
  • Update Splunk icon to a dark version for improved visibility (#16593, @juacker)
  • Add documentation on how to configure Splunk Cloud notifications (#16586, @juacker)
  • Add a new document explaining Gorilla compression and decompression techniques (#16553, @vkalintiris)
  • Add an initial version of the "Plugin Functions Tables" documentation (#16535, @ktsaou)

Other Notable Changes

  • Fix a crash occurring when failing to create the requested number of tiers (#16999, @stelfrag)
  • Fix an issue where Netdata plugins could inherit unintended sockets or file descriptors during the forking process (#16881, @ktsaou)

Deprecation notice

Changed in this release

All deprecated items from the v1.44.0 notice have been addressed except for enabling gorilla compression by default.

Additionally, the following Alert options have been deprecated in this release. While Netdata will still understand these options when
reading existing alert configurations for now, we recommend updating your custom alert configurations to use the
replacements listed below. Compatibility with these deprecated options might be removed in a future release.

Option                              Use instead
foreach DIMENSIONS (lookup line)    -
charts                              -
os                                  host labels: _os=X
host                                host labels: _hostname=X
plugin                              chart labels: _collect_plugin=X
module                              chart labels: _collect_module=X

Where X is a simple pattern.
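As a rough aid for updating custom alert configurations, the replacements listed above can be applied line by line, as in the Python sketch below. The `migrate_line` helper is hypothetical, the option names are taken verbatim from the table, and options without a replacement (`charts`, `foreach`) are left untouched. Multi-word patterns are passed through as-is and may need manual review, and merging several generated label lines is not handled.

```python
# Deprecated option -> (replacement option, label name), per the table above.
REPLACEMENTS = {
    "os": ("host labels", "_os"),
    "host": ("host labels", "_hostname"),
    "plugin": ("chart labels", "_collect_plugin"),
    "module": ("chart labels", "_collect_module"),
}


def migrate_line(line):
    """Rewrite one 'option: value' line if the option is deprecated."""
    key, sep, value = line.partition(":")
    name = key.strip()
    if not sep or name not in REPLACEMENTS:
        return line  # not an option line, or nothing to replace
    # Preserve the original indentation of the configuration line.
    indent = key[: len(key) - len(key.lstrip())]
    new_key, label = REPLACEMENTS[name]
    return f"{indent}{new_key}: {label}={value.strip()}"
```

For example, `migrate_line("    os: linux")` yields `    host labels: _os=linux`.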

Netdata Release Meetup

Join the Netdata team on the 25th of March at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgments.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

netdata - v1.44.3

Published by netdatabot 8 months ago

Netdata v1.44.3 is a patch release to address issues discovered since v1.44.2.

This patch release provides the following bug fixes and updates:

  • Improved handling of slow queries and CPU usage of the ACLKSYNC thread. (#16838, @stelfrag)
  • Improved error handling for listen bind failures. Instead of terminating fatally, Netdata now exits gracefully. (#16937, @stelfrag)
  • Fixed invalid alert durations in health log entries. (#16931, @stelfrag)
  • Fixed a race condition during analytics data setup, preventing potential Netdata crashes. (#16929, @stelfrag)
  • The Netdata base image includes Debian backports for comprehensive security and stability. (netdata/helper-images#271, @tkatsoulas)

netdata - v1.44.2

Published by netdatabot 9 months ago

Netdata v1.44.2 is a patch release to address issues discovered since v1.44.1.

This patch release provides the following bug fixes and updates:

  • Fixed an inconsistency where the NETDATA_LOG_LEVEL environment variable did not affect log level in Docker containers. (#16943, @ilyam8)
  • Fixed inconsistent log severity across sources: the log severity level setting now works for all Netdata log sources (daemon, collector, health, access, aclk). (#16922, @ilyam8)
  • Fixed a bug in charts.d.plugin that prevented loading of its modules' configuration files. (#16939, @ilyam8)
  • Fixed inaccurate server type identification in Netdata Cloud for FreeBSD jails. Jails are now recognized correctly. (#16858, @ilyam8)
  • Fixed a bug that prevented the edit-config script from running correctly in Podman containers. The script now accurately identifies container environments. (#16825, @Ferroin)
  • Fixed a bug that caused excessive logging of "Using host prefix directory" messages. (#16814, @ilyam8)
  • Fixed incorrect label source for apps.plugin charts, ensuring they are now accessible when querying Prometheus metrics. (#16810, @boxjan)
  • Fixed a bug in the cgroups.plugin that could lead to crashes. Additionally, addressed incorrect thread name during fatal Agent exits. (#16771, @ktsaou)
  • Fixed a race condition related to pthread_detach() calls, preventing potential Netdata crashes during thread creation. (#16760, @ktsaou)
  • Fixed a bug that caused "maximum number of cgroups reached" messages to spam logs. (#16730, @ilyam8)
  • Fixed incorrect service file location during macOS installation: now, launchctl commands can reliably start and stop Netdata. (#16693, @ilyam8)
  • Fixed a bug that caused the Netdata claiming process to fail on macOS due to an inaccessible netdata-claim.sh script. (#16686, @ilyam8)
  • Fixed missing host label streaming from child nodes: host labels are now transmitted reliably to parent nodes. (#16821, @stelfrag)
  • Fixed a bug in clock resolution calculation that prevented some data collection plugins from working correctly. (#16720, @ktsaou)
  • Fixed a bug that caused Netdata to crash when calculating database size due to missing or single datafiles. (#16699, @ktsaou)
  • Fixed a bug that caused the cups.plugin to not terminate upon receiving a SIGPIPE (Broken Pipe) signal. (#16691, @ilyam8)
  • Fixed a reference counting issue that could lead to Netdata crashes. (#16687, @ktsaou)
  • Fixed charts context and family definitions of exporting engine. (#16683, @ilyam8)
  • Fixed a bug that could cause crashes when processing web requests. (#16664, @ktsaou)
  • Fixed improper handling of the dbengine event loop during shutdown. (#16658, @stelfrag)
  • Fixed a potential memory corruption issue in database code. (#16654, @stelfrag)
  • Fixed "response too big" error for Systemd-journal: addressed limitations by raising the maximum web response size. (#16649, @ktsaou)
  • Fixed compilation issues with --disable-dbengine: addressed errors that prevented successful builds when this flag was used. (#16611, @stelfrag)
  • Fixed labels corruption due to duplicate key/value pairs. Additionally, addressed logging errors that occurred during fatal Agent exits. (commit, @ktsaou)
  • Updated go.d.plugin to v0.58.0. (#16725, @ilyam8)

Acknowledgements

We would like to thank our dedicated, talented contributors who make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @boxjan for fixing incorrect label source for apps.plugin charts.

netdata - v1.44.1

Published by netdatabot 10 months ago

Netdata v1.44.1 is a patch release to address issues discovered since v1.44.0.

This patch release provides the following bug fixes and updates:

  • Fixed an issue in the uninstall script that prevented log2journal and systemd-cat-native from being removed (#16585, @ilyam8).
  • Fixed a bug that caused the debugfs.plugin to not terminate upon receiving a SIGPIPE (Broken Pipe) signal (#16569, @ilyam8).
  • Fixed memory leak during host chart label cleanup (#16568, @stelfrag).
  • Fixed incorrect CPU architecture/RAM/disk values in build info (#16567, @ilyam8).
  • Fixed a bug that prevented the parent from accepting streaming connections on systems with one CPU core (#16565, @stelfrag).
  • Made systemd-journal a mandatory package on CentOS 7 and Amazon Linux 2 (#16562, @tkatsoulas).
  • Fixed crash on reading memory clock speed of an AMD graphics card (#16561, @MrZammler).
  • Fixed an unhandled error that occurred when setting file capabilities in the Debian postinst script of the perf.plugin (#16558, @tkatsoulas).
  • Fixed an issue where the user's netdata home directory was set to an incorrect value (#16548, @ilyam8).
  • Added the lightweight text editor to the Docker image (#254, @tkatsoulas).

netdata - v1.44.0

Published by netdatabot 11 months ago

True to our schedule, this is another great Netdata release!

[!IMPORTANT]
Stay informed about upcoming changes and potential deprecations by reviewing the deprecation notice sections. This will help you plan for any necessary adjustments to ensure a smooth transition.

Netdata Growth

  • 66k+ GitHub Stars ⭐
    Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ too, on GitHub!

  • 600M+ Docker Hub pulls
    Netdata averages about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

  • Netdata beats Prometheus in all aspects: this version of Netdata includes significant improvements that make Netdata a lot more performant than Prometheus at scale. A full performance analysis is included.
  • Netdata Journal Logs: Netdata can now deal with huge systemd-journal databases, and the logs UI is available for the host logs when Netdata runs in a container.
  • First beta version of Netdata's log2journal: a utility to extract, convert, transform and send to systemd-journal any kind of structured logs (including JSON and logfmt logs), similar to what promtail does for Loki.
  • More Netdata Functions: monitor containers and VMs, network interfaces, mount points, block devices, systemd units, systemd services, and more!
  • Netdata now logs to journal instead of log files and the results are amazing!

Release Highlights

Netdata beats Prometheus in all aspects

We tested Netdata and Prometheus at scale, both ingesting 2.7 million metrics per second. On the same workload, compared to Prometheus, Netdata needs:

  • 35% less CPU
  • 49% less RAM
  • 12% less bandwidth
  • 75% less disk space
  • 98% less disk I/O

Read the full performance comparison between Netdata and Prometheus.

To achieve these astonishing results, we made the following changes to Netdata since the previous release:

New SLOTS streaming protocol

A new streaming protocol allows Netdata children and parents to share a common index of the streamed metrics, so parents can receive metrics without consulting hash tables. This reduces the overall overhead on parents by about 30%, without increasing the overhead on children (the children just number each metric).

The new protocol, called SLOTS, is automatically selected when both the child and the parent support it.
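To illustrate the idea of a shared index, here is a minimal sketch in Python. It is not Netdata's wire format or implementation: the class and method names (`SlotReceiver`, `Sender`, `define`, `receive`) are hypothetical, and the sketch only shows why numbering each metric once lets the receiving side use a direct array index instead of a name lookup per sample.

```python
class SlotReceiver:
    """Parent side: the metric name arrives once, later samples carry only a slot number."""

    def __init__(self):
        self.names = []   # slot -> metric name
        self.latest = []  # slot -> last received value

    def define(self, slot, name):
        # The sender numbers metrics sequentially, starting from 0.
        if slot != len(self.names):
            raise ValueError("slots must be defined in order")
        self.names.append(name)
        self.latest.append(None)

    def receive(self, slot, value):
        # Direct index: no hashing of the metric name on the hot path.
        self.latest[slot] = value


class Sender:
    """Child side: assigns a slot number the first time a metric is sent."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.slots = {}  # metric name -> slot (a real child could keep this on the metric itself)

    def send(self, name, value):
        slot = self.slots.get(name)
        if slot is None:
            slot = len(self.slots)
            self.slots[name] = slot
            self.receiver.define(slot, name)
        self.receiver.receive(slot, value)


if __name__ == "__main__":
    parent = SlotReceiver()
    child = Sender(parent)
    child.send("system.cpu.user", 1.0)
    child.send("system.cpu.system", 2.0)
    child.send("system.cpu.user", 3.0)
    print(parent.names, parent.latest)
```

The extra work on the child is limited to handing out consecutive numbers, which matches the description above of children that "just number each metric".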

Streaming compression algorithms

Streaming now supports multiple compression algorithms. Previous Netdata releases supported only LZ4, which is known for its speed and average compression ratio. This release adds support for ZSTD, GZIP, and BROTLI.

ZSTD provides the best balance between compression ratio and CPU consumption, and therefore it is now the default.

The selection order of the compression algorithms can be configured on parents, in the [API] section of stream.conf, by setting compression algorithms order = zstd lz4 brotli gzip.

To save the most bandwidth at the expense of CPU utilization, reorder the list so that brotli or gzip appear first, before zstd and lz4.

This also means that parents can now have a different compression order for each API key, allowing the use of different API keys based on the location of the child (e.g. children that are on billable egress bandwidth can use an API key that prefers the best compression, like brotli and gzip, while children on non-billable egress bandwidth can use an API key that prefers the best CPU utilization, like zstd or lz4).
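The effect of the order can be sketched as a simple "first match wins" selection. This is an illustration under assumptions, not the actual negotiation code: the function `choose_compression` and the idea that the child advertises a set of supported algorithms are hypothetical, while the two orders are the ones discussed above.

```python
def choose_compression(parent_order, child_supported):
    """Return the first algorithm in the parent's order that the child supports, else None."""
    supported = set(child_supported)
    for algorithm in parent_order:
        if algorithm in supported:
            return algorithm
    return None  # no common algorithm


# The default order mentioned above, and a bandwidth-saving variant.
DEFAULT_ORDER = "zstd lz4 brotli gzip".split()
BANDWIDTH_FIRST = "brotli gzip zstd lz4".split()

if __name__ == "__main__":
    print(choose_compression(DEFAULT_ORDER, ["lz4", "zstd"]))
    print(choose_compression(BANDWIDTH_FIRST, ["lz4", "zstd", "gzip"]))
```

With one order per API key, the same child capabilities can lead to a different algorithm depending on which key the child uses.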

Gorilla compression beta

Gorilla compression is a time series data compression technique, developed by Facebook for their time series database, Gorilla. It's particularly efficient for compressing data that changes incrementally over time, which is a common characteristic of time series data.
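To show why incrementally changing data compresses well, here is a minimal sketch of the XOR step used for values in the original Gorilla scheme. It is only an illustration: it works on 64-bit doubles and skips the bit packing entirely, and Netdata's adaptation may differ in word size and encoding details. The function names are hypothetical.

```python
import struct


def float_bits(value: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of value as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_encode(values):
    """XOR each value's bits with the previous value's bits (the first value is kept whole)."""
    previous = 0
    encoded = []
    for value in values:
        bits = float_bits(value)
        encoded.append(bits ^ previous)
        previous = bits
    return encoded


def xor_decode(encoded):
    """Reverse xor_encode, proving that the transformation is lossless."""
    previous = 0
    values = []
    for xored in encoded:
        bits = xored ^ previous
        values.append(struct.unpack(">d", struct.pack(">Q", bits))[0])
        previous = bits
    return values


def meaningful_bits(xored: int) -> int:
    """Bits between the first and last set bit; 0 means the value did not change."""
    if xored == 0:
        return 0
    trailing_zeros = (xored & -xored).bit_length() - 1
    return xored.bit_length() - trailing_zeros


if __name__ == "__main__":
    samples = [1.0, 1.0, 2.0, 2.0, 2.0]
    for xored in xor_encode(samples)[1:]:
        print(meaningful_bits(xored))  # repeated values need 0 bits, 1.0 -> 2.0 needs 11
```

Only the meaningful bits (plus a small header describing where they sit) have to be stored, so unchanged or slowly changing values cost very little space.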

This release of Netdata includes an adaptation of Gorilla compression which, once enabled, provides an additional 30% memory reduction to Netdata.

This was not ready when we compared Netdata and Prometheus, so the benefits of Gorilla compression weren't accounted for in the comparison. With Gorilla compression enabled, Netdata's memory reduction is 70%+ compared to Prometheus.

To try Gorilla compression, edit netdata.conf and, in the [db] section, set dbengine page type = gorilla.

Keep in mind that enabling Gorilla compression changes the dbengine file format to Gorilla-compressed metrics. This version of Netdata can read Gorilla-compressed data from dbengine even if Gorilla compression is not enabled, but previous versions of Netdata cannot read it. So, enable Gorilla only if you don't plan to switch back to a previous version of Netdata.

Our plan is to have Gorilla compression enabled by default at the next release of Netdata.

systemd-journal logs

Our systemd-journal.plugin was already considerably faster (10x) than journalctl, but it was still slow when the journal database is huge (e.g. at journal centralization points where hundreds or thousands of nodes push their logs).

In this release, we introduce several changes to allow the plugin to work promptly in such environments.

Sampling and estimations

The biggest performance issue with systemd-journal logs is query performance when dealing with huge log databases.

To overcome this performance issue and provide prompt responses to queries, Netdata now uses the following strategy:

  1. The latest 500k log entries read from journal files work like before: we read all of them and all the values for all their fields, so that we can have accurate histograms and counters per field value at the filters.
  2. Once we hit the 500k log entries limit on a single query, we turn on sampling and estimations.
  3. Sampling distributes 500k more log entries to all the journal files to be read, so that the total log entries queried for their field values will be 1M. This means that if we have to read 100 files, 10k log entries per file will be sampled and 10k log entries more will be unsampled. Since files are usually spread over time, this provides a good sample across time.
  4. When the sampling threshold is hit, Netdata continues reading more log entries without querying the values of the fields. These log entries appear as [unsampled] at the histogram. We know these log entries are there, but the value counters on the field filters do not include them.
  5. When the [unsampled] threshold is hit, and we have read more than 1% of each file, Netdata estimates the number of entries that will be read from the file and skips the rest of it. This estimation appears as [estimated] in the histogram.

The above process allows Netdata to provide a histogram of the logs in a timely manner, even when the number of log entries in the visible timeframe is in the tens of millions.
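The per-file part of this strategy (steps 3 to 5) can be sketched as follows. This is a simplified illustration, not the plugin's code: the function `classify_file` is hypothetical, the total number of entries in a file is treated as known (the plugin estimates it), and the budgets of 10,000 entries are borrowed from the 100-file example in step 3.

```python
def classify_file(total_entries, sampled_budget, unsampled_budget, min_read_percent=1):
    """Split one journal file's entries into sampled / unsampled / estimated counts."""
    # Entries read together with all their field values.
    sampled = min(total_entries, sampled_budget)
    remaining = total_entries - sampled

    # Entries read without querying field values: shown as [unsampled].
    unsampled = min(remaining, unsampled_budget)
    remaining -= unsampled

    # Keep reading (still without field values) until at least 1% of the file is read.
    min_read = (total_entries * min_read_percent + 99) // 100  # ceiling division
    shortfall = max(0, min_read - (sampled + unsampled))
    extra = min(remaining, shortfall)
    unsampled += extra
    remaining -= extra

    # Whatever is left is skipped and shown as [estimated].
    return {"sampled": sampled, "unsampled": unsampled, "estimated": remaining}


if __name__ == "__main__":
    for total in (5_000, 100_000, 10_000_000):
        print(total, classify_file(total, sampled_budget=10_000, unsampled_budget=10_000))
```

Small files are read in full, while on very large files most entries end up in the estimated bucket, which is what keeps the response time bounded.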

A similar process is usually used by log management systems, including Grafana Loki and Elasticsearch. However, Netdata takes a much bigger sample of the data (other systems usually sample only a few thousand log entries, while Netdata usually samples more than a million) and the visualization allows exposing the exact sampling and estimations made at the histogram.

Image showing [unsampled] and [estimated] on a systemd journal system that collects about 10k nginx log entries per second:
image

Read more about journals query performance.

journals scan

On busy log centralization servers, the number of journal files available in /var/log/journal/remote can grow significantly, slowing down directory listing (even ls -l is very slow on them).

To overcome this issue, Netdata now uses inotify events and sorts the files to be scanned from the latest to the oldest.

These changes allow Netdata to present the logs user interface for the most recent journals, immediately after a Netdata restart, while the journals database is scanned in the background.
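The ordering part of this change can be sketched in a few lines. This is an assumption-laden illustration: it sorts by file modification time (the text above only says "from the latest to the oldest"), it does not use inotify, and the function name `newest_first` is hypothetical.

```python
import os


def newest_first(directory):
    """Return the regular files in directory, most recently modified first."""
    with os.scandir(directory) as entries:
        files = [(entry.stat().st_mtime, entry.path) for entry in entries if entry.is_file()]
    files.sort(reverse=True)
    return [path for _mtime, path in files]


if __name__ == "__main__":
    # Scan the current directory; a journal directory would be passed here instead.
    for path in newest_first("."):
        print(path)
```

Processing the list in this order means the journals most likely to be queried first are available while the older ones are still being scanned.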

Logs UI is now available when using Netdata docker images

We switched Netdata docker images from Alpine Linux to Debian, so that libsystemd will be available inside the docker image, allowing systemd-journal.plugin to be compiled and shipped with Netdata docker images.

Using Netdata docker images, Netdata can now query the host system journal files, while running inside the container.

MESSAGE_ID support

systemd-journal has a nice feature where certain events of common interest are given a specific MESSAGE_ID. Several such MESSAGE_IDs have been assigned to track common events, like coredumps, unit start/stop events, VM start/stop events, time changes, etc. In total, we found more than 50 unique events that are tracked this way.

This version of systemd-journal.plugin automatically tracks and annotates these MESSAGE_IDs using their names, allowing quick spotting of events of common interest.

This feature is available at the MESSAGE_ID field filter, at the right side of the dashboard.

log2journal, a new tool in your quiver for managing logs

log2journal is a new utility allowing the conversion of log files into structured systemd-journal log entries. This is currently in beta.

The utility allows processing logs like this:

tail -F /var/log/nginx/access.log |\
   log2journal -c nginx-combined |\
   systemd-cat-native

The above builds a basic pipeline for converting the access.log of an Nginx web server into structured log entries in the local systemd-journal.

  • tail is responsible for feeding the latest log lines to log2journal. Multiple files can be specified and log2journal can also pick up the filename from tail and add it as a field to the journal logs.
  • log2journal extracts fields from the log lines it is fed. This is a powerful tool that can read JSON and logfmt logs, but also extract fields using PCRE2 patterns from any log. It supports filtering, renaming, and rewriting rules using command line arguments or YAML configuration files. The output of log2journal is the standard Journal Export Format.
  • systemd-cat-native is another new Netdata utility, reading standard Journal Export Format entries, which are then sent to a local or remote systemd-journal system.
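For readers unfamiliar with the Journal Export Format that connects the two tools, here is a minimal sketch of a serializer for it. It is not how log2journal is implemented: the function name and the NGINX_STATUS field are hypothetical, and the sketch only switches to the binary-safe form for values containing a newline, whereas the format also calls for it on other non-text data.

```python
import struct


def to_journal_export(fields: dict) -> bytes:
    """Serialize one log entry (field name -> value) in Journal Export Format."""
    out = bytearray()
    for key, value in fields.items():
        name = key.encode("ascii")
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        if b"\n" in data:
            # Binary-safe form: name, newline, 64-bit little-endian length, data, newline.
            out += name + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            # Text form: NAME=value followed by a newline.
            out += name + b"=" + data + b"\n"
    out += b"\n"  # an empty line terminates the entry
    return bytes(out)


if __name__ == "__main__":
    entry = {"MESSAGE": "GET /index.html", "PRIORITY": 6, "NGINX_STATUS": 200}
    print(to_journal_export(entry).decode("utf-8"), end="")
```

A stream of such entries, one after the other, is the kind of input the pipeline above hands from log2journal to systemd-cat-native.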

Read more here.

Image showing structured nginx logs into systemd-journal:
image

Netdata now logs to systemd-journal

The logging layer of Netdata has been rewritten, so that Netdata logs now go to the systemd-journal, in a namespace called netdata.

The obvious outcome is that now you can monitor Netdata logs, using Netdata's systemd-journal.plugin user interface and thanks to journal namespaces, this does not pollute the system logs. But this is just the beginning...

Netdata utilizes the MESSAGE_ID feature of systemd-journal to register:

  • all alert transitions
  • all alert notifications
  • all connections from Netdata children
  • all connections to Netdata parents

This means that both the systemd-journal.plugin user interface and journalctl can now be used to list all such events uniformly.
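As an example of the journalctl route, the sketch below builds a command line that filters the netdata namespace by MESSAGE_ID. The helper name is hypothetical and the MESSAGE_ID in the example is a placeholder, not one of the IDs Netdata actually registers; the journalctl options used (--namespace, --output, --since and FIELD=VALUE matches) are standard ones, although --namespace requires a reasonably recent systemd.

```python
import shlex


def journalctl_command(message_id, namespace="netdata", since=None):
    """Build a journalctl argv that lists entries with the given MESSAGE_ID."""
    argv = ["journalctl", f"--namespace={namespace}", "--output=json"]
    if since is not None:
        argv.append(f"--since={since}")
    argv.append(f"MESSAGE_ID={message_id}")
    return argv


if __name__ == "__main__":
    # Placeholder ID for illustration only; look up the real ID of the event you need.
    placeholder_id = "0123456789abcdef0123456789abcdef"
    print(shlex.join(journalctl_command(placeholder_id, since="-1h")))
```

The resulting list can be passed to subprocess.run() as is, which avoids any shell quoting concerns.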

Screenshot of Netdata alert transitions in systemd-journals:
image

All Netdata logs are now structured. Netdata can also log in json or logfmt formats. We introduced a lot of new fields to track every aspect of Netdata, in a uniform and consistent way. Read more here.

Furthermore, we introduced a new tool called systemd-cat-native allowing any application or shell script to send structured logs to systemd-journal. Read more here.

Functions, power up your troubleshooting toolkit!

Several new Functions have been added to help us in our troubleshooting journeys. On top of processes, streaming and systemd-journal, we are leveraging the wide range of collectors and metrics Netdata has to bring data in a different visual representation.

The updated list can be found in our documentation here. Below is a summary of the currently available Functions, with the CLI tools each one relates to:

| Function | Description | Alternative to CLI tools | plugin - module |
|---|---|---|---|
| block-devices | Disk I/O activity for all block devices, offering insights into both data transfer volume and operation performance. | iostat | proc |
| containers-vms | Insights into the resource utilization of containers and QEMU virtual machines: CPU usage, memory consumption, disk I/O, and network traffic. | docker stats, systemd-cgtop | cgroups |
| ipmi-sensors | Readings and status of IPMI sensors. | ipmi-sensors | freeipmi |
| mount-points | Disk usage for each mount point, including used and available space, both in terms of percentage and actual bytes, as well as used and available inode counts. | df | diskspace |
| network-interfaces | Network traffic, packet drop rates, interface states, MTU, speed, and duplex mode for all network interfaces. | bmon, bwm-ng | proc |
| processes | Real-time information about the system's resource usage, including CPU utilization, memory consumption, and disk IO for every running process. | top, htop | apps |
| systemd-journal | Viewing, exploring and analyzing systemd journal logs. | journalctl | systemd-journal |
| systemd-list-units | Information about all systemd units, including their active state, description, whether or not they are enabled, and more. | systemctl list-units | systemd-journal |
| systemd-services | System resource utilization for all running systemd services: CPU, memory, and disk IO. | systemd-cgtop | cgroups |
| streaming | Comprehensive overview of all Netdata children instances, offering detailed information about their status, replication completion time, and many more. | | |

In the short term, we will keep adding more (hopefully) helpful Functions. In the longer term, we plan to expand this functionality to potentially allow taking and storing snapshots of the results, based on triggered alerts or a periodic configuration.

If you have suggestions, we have a running GitHub Discussion open here.

New Alert Notification Integrations to Netdata Cloud

We've been working on adding more Alert Notification Integrations to Netdata Cloud and recently added the following new ones:

  • Amazon Simple Notification Service (Amazon SNS), and
  • Telegram

The full list of Alert Notification Integrations from Netdata Cloud can be found on our documentation here.

Acknowledgments

  • @ClaraCrazy for improving degraded adapters detection in python.d/megacli.
  • @thomasbeaudry for adding UPS selftest and status metrics to charts.d/apcupsd.
  • @watsonbox for adding LBAs written/read metrics to python.d/smartd_log.
  • @sepek for correcting an error in the "Change how long Netdata stores metrics" guide.
  • @seniorquico for fixing parsing and adding MAINT status metrics to python.d/haproxy.
  • @luisj1983 for correcting errors in the Health API documentation.
  • @andyundso for improving apps plugin by adding Erlang in apps_groups.conf.
  • @vobruba-martin for adding various improvements to go.d/mysql.

Contributions

Collectors

  • Add more cases for megacli adapter degraded state (python.d/megacli) (#16522, @ClaraCrazy)
  • Improve estimations accuracy (systemd-journal.plugin) (#16467, @ktsaou)
  • Implement estimations (systemd-journal.plugin) (#16445, @ktsaou)
  • Improve startup time (systemd-journal.plugin) (#16443, @ktsaou)
  • Implement sampling (systemd-journal.plugin) (#16433, @ktsaou)
  • Add cgroup current pids metric (cgroups.plugin) (#16369, @ilyam8)
  • Add Ipmi-sensors function (freeipmi.plugin) (#16363, @ilyam8)
  • Add UPS status code metric (charts.d/apcupsd) (#16361, @thomasbeaudry)
  • Add Mount-points function (diskspace.plugin) (#16345, @ilyam8)
  • Add Block-devices function (proc/diskstats) (#16338, @ilyam8)
  • Add UsedBy field to Network-interfaces function (proc/proc_net_dev) (#16337, @ilyam8)
  • Add various improvements to Network-interfaces function (proc/proc_net_dev) (#16336, @ilyam8)
  • Add Network-interfaces function (proc/proc_net_dev) (#16334, @ilyam8)
  • Add Systemd-list-units function (systemd-journal.plugin) (#16318, @ktsaou)
  • Add Containers-vms function (cgroups.plugin) (#16314, @ktsaou)
  • Add UPS selftest status metric (charts.d/apcupsd) (#16286, @thomasbeaudry)
  • Add a configuration option to set private cleanup timeout (statsd.plugin) (#16269, @MrZammler)
  • Add container_device label to network interfaces (cgroups.plugin) (#16261, @ilyam8)
  • Add selecting multiple sources support (systemd-journal.plugin) (#16252, @ktsaou)
  • Add total LBAs written/read metrics (python.d/smartd_log) (#16245, @watsonbox)
  • Add Erlang to apps_groups.conf (apps.plugin) (#16231, @andyundso)
  • Add support for Proxmox vms/containers name resolution in Docker (cgroups.plugin) (#16193, @ilyam8)
  • Add nested JSON support to log parser (go.d/weblog) (#1416, @ilyam8)

Bug Fixes

  • Fix configuration loading (charts.d.plugin) (#16471, @ilyam8)
  • Fix an issue where systemd-journal would stop trying different socket paths after the first failure (systemd-journal.plugin) (#16458, @ktsaou)
  • Fix parsing PD without NCQ status (python.d/adaptec_raid) (#16400, @ilyam8)
  • Fix Systemd-list-units function expiration time (#16393, @ilyam8)
  • Fix lack of system.net when running inside LXC (#16364, @ilyam8)
  • Fix memory leak in Systemd-list-units function (systemd-journal.plugin) (#16333, @ktsaou)
  • Fix server status parsing and add MAINT status chart (python.d/haproxy) (#16253, @seniorquico)

Other

  • Skip timestamp when logging to journald (python.d.plugin) (#16516, @ilyam8)
  • Mute stock jobs logging during check() (python.d.plugin) (#16515, @ilyam8)
  • Improve performance of the plugin (systemd-journal.plugin) (#16509, @ktsaou)
  • Don't create runtime disk config by default (proc/diskspace, proc/diskstats) (#16503, @ilyam8)
  • Don't create runtime device config by default (proc/proc_net_dev) (#16501, @ilyam8)
  • Disable netdata monitoring section by default (#16480, @MrZammler)
  • Change apps oom and net charts order (ebpf.plugin) (#16395, @thiagoftsm)
  • Fix "differ in signedness" warn in cgroups plugin (#16391, @ilyam8)
  • Fix throttle_duration chart context (cgroups.plugin) (#16367, @ilyam8)
  • Hide summary columns in network and block devices functions (proc/diskstats, proc/proc_net_dev) (#16347, @ktsaou)
  • Fix crash when a container has no CPU/mem metrics in Containers-vms function (cgroups.plugin) (#16331, @ilyam8)
  • Add tcp v6 connect calls to Ebpf_socket function (ebpf.plugin) (#16316, @thiagoftsm)
  • Update journal sources once per minute (systemd-journal.plugin) (#16298, @ktsaou)
  • Minor updates and cleanup (systemd-journal.plugin) (#16267, @ktsaou)
  • Stop using deprecated distutils module (python.d.plugin) (#16259, @MrZammler)
  • Remove charts.d/nut (#16230, @ilyam8)
  • Don't log an error opening cgroup.procs/tasks if it does not exist (cgroups.plugin) (#16196, @ilyam8)
  • Improve exposing metrics by creating a chart for each app group (ebpf.plugin) (#16139, @thiagoftsm)
  • Skip timestamp when logging to journald (go.d.plugin) (#1418, @ilyam8)
  • Replace logger with structured logger (go.d.plugin) (#1418, @ilyam8)
  • Use SHOW REPLICA STATUS for MySQL v8.0.22+ (go.d/mysql) (#1392, @vobruba-martin)
  • Use performance_schema instead of information_schema for MySQL v8.0.22+ (go.d/mysql) (#1390, @vobruba-martin)

Packaging/Installation

Documentation

Other Notable Changes

Deprecation notice

Changed in this release

In accordance with our previous deprecation notice, the following items in this release have been changed:

Other unannounced changes:

  • Netdata internal metrics (Netdata Monitoring section) are disabled by default to reduce the overall data volume. Later we plan to enable only important internal metrics by default.

    They can be re-enabled in netdata.conf by uncommenting the following options and changing no to yes:

    [plugins]
      # netdata monitoring = no
      # netdata monitoring extended = no
    
  • Logging

    • Logs format changed to logfmt.
    • Default logging destination changed to systemd-journal (systemd-only): logs are now sent to the "netdata" namespace in systemd-journal. Systemd-journal provides a centralized repository for all system logs, making it easier to manage and search for logs. To override the default behavior and continue using the file-based logging, refer to the netdata.conf file and make the necessary changes under the [logs] section.
    • File-based logging: error.log renamed to daemon.log.

Will be changed in the next release

  • To ensure seamless compatibility with future updates, we recommend transitioning from source-built installations to our distribution packages or static binaries. Starting with our next release, we will no longer guarantee compatibility when updating source-built installations. This change allows us to focus on enhancing the stability and feature delivery for the rest of our supported installation methods.

  • Gorilla compression will be enabled by default.

  • The Google Cloud Pub Sub and the AWS Kinesis exporters will be removed in the next release. Both of them were not maintained and were not used when building packages. Users can consult the exporting documentation for alternative exporters to use.

  • The database modes map and save will be removed in the next release. The dbengine database mode will be used to persist metrics on disk automatically.

  • Per-core CPU metrics will be disabled by default to reduce data volume. Summary (per-system) metrics are still collected. This change enhances performance and resource utilization. Disabled metrics:

    • cpu.cpu (utilization).
    • cpu.interrupts (all interrupts).
    • cpu.softirqs (software interrupts).
    • cpu.softnet_stat (software interrupts related to network receive work).
    • cpu.cpu_cstate_residency_time (idle states).

    They can be re-enabled in netdata.conf by uncommenting the following options and changing no to yes:

    [plugin:proc:/proc/stat]
        # per cpu core utilization = no
        # cpu idle states = no
    
    [plugin:proc:/proc/interrupts]
        # interrupts per core = no
    
    [plugin:proc:/proc/softirqs]
        # interrupts per core = no
    
    [plugin:proc:/proc/net/softnet_stat]
        # softnet_stat per core = no
    
  • To optimize system performance, several eBPF.plugin modules have been disabled by default. While these modules provide valuable insights into system resource usage, they can also contribute to system overhead. They will expose metrics using Functions (run on demand and for a limited period of time). These modules include:

    • cachestat
    • fd
    • process
    • oomkill
    • shm
    • swap

Netdata Release Meetup

Join the Netdata team on the 11th of December at 16:30 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgments.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

netdata - v1.43.2

Published by netdatabot 12 months ago

Netdata v1.43.2 is a patch release to address issues discovered since v1.43.1.

This patch release provides the following bug fixes and updates:

  • Fix rrdlabels type (1676de2, @stelfrag).
  • Fix label copy to allow new keys with different values (6179213, @stelfrag).
  • Fix internal label source propagation when streaming metrics (60cd86d, @ktsaou).
  • Speed up queries when sending alerts to Cloud on parents with a large number of alerts per child (f80f0fc, @MrZammler).
  • Fix filtering when selecting multiple fields in systemd-journal plugin (750ca8e, @stelfrag).
  • Fix an issue where parents were missing chart labels of child instances (240f9e7, @ktsaou).
  • Fix an issue where updated labels were not propagated to parents (644d432, @stelfrag).

netdata - v1.43.1

Published by netdatabot 12 months ago

Netdata v1.43.1 is a patch release to address issues discovered since v1.43.0.

This patch release provides the following bug fixes and updates:

  • Prevent wrong optimization in the armv7l static build (#16274, @stelfrag).
  • Fixed pattern matching in Functions Search (#16264, @ktsaou).
  • Fixed an issue where the query planner was using the wrong dbengine tier that had no data for the selected time period (#16263, @ktsaou).
  • Fixed invalid payload in Discord notifications (#16257, @luchaos).
  • Fixed possible deadlock on discovery thread shutdown in cgroups plugin (#16246, @stelfrag).
  • Fixed duplicate chart labels (#16249, @stelfrag).
  • Fixed dimension HETEROGENEOUS check (#16234, @stelfrag).
  • Updated go.d plugin version to v0.56.3 (#16228, @ilyam8).
  • Fixed calculation of dbengine statistics on 32bit systems (#16222, @stelfrag).
  • Improved handling of duplicate labels (#16172, @stelfrag).
  • Improved cleanup on shutdown of collectors (#16023, @ktsaou).

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @luchaos for fixing Discord notifications.

netdata - v1.43.0

Published by netdatabot about 1 year ago

Groundbreaking: systemd-journal logs release!

Steady to our schedule, this is another great Netdata release!

Netdata Growth

  • 65.5 k GitHub Stars ⭐
    Since October 2023, Netdata has led the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ too, on GitHub!

  • 595 M docker hub pulls
    Netdata averages about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

This release is the most robust and reliable Netdata we have ever built.

These are the main areas Netdata has improved since the last release:

  1. Logs
    Today we release an almost completely rewritten version of the systemd-journal plugin, improving its performance and visualization capabilities. systemd-journal holds critical system and security information, and given the lack of systemd-journal visualization tools, we focused first on filling this gap. At the same time, we are standardizing how logs are handled in Netdata, enabling us to support more log management engines, like Loki and Elasticsearch.

  2. Instances Slice and Dice
    Given the capabilities of the new Netdata Agent UI (v2), we are changing the way some of our collectors collect and expose metrics, to allow easier slicing and dicing of the data and to be more compatible with the OpenTelemetry specifications. In this release we changed the way apps.plugin exposes charts in the Applications section of the dashboard. Following the NIDL framework, each application group is now an instance, allowing better aggregation of process utilization across nodes. Similarly, our systemd units charts have been updated to have an instance for each systemd unit. For the same reasons, disk charts now have additional labels (id, model and serial) to help identify disks from the charts. Unfortunately, such changes tend to make the older dashboards (v1, v0) less usable, especially on servers with many hundreds of instances.

  3. Stock Alerts
    A number of changes have been made to the Netdata Health engine to allow better integration with the new dashboard. More changes in this area are coming in the next release: a) allowing multi-node alerts on parents, b) allowing alerts to be evaluated and configured from the UI.

  4. Alerts Accuracy
    By default, Netdata has 3 tiers of metrics, each with a different resolution. The Netdata query planner automatically picks the right tier to satisfy a query, based on the number of points requested in the response. For alerts this had a side effect: since alerts request only 1 point of data in the response, the query planner was picking the "easier" tier to query, which is of course the one with the lowest resolution. Now alerts always run on tier 0, the highest-resolution tier.

  5. Lower Resources Utilization
    Several changes have been implemented for Netdata to better take care of itself, including lower memory usage, a lower disk footprint, self-vacuuming of SQLite databases, and more. Probably the most notable change is that Netdata now needs only 1 pointer (8 bytes on 64 bit, 4 bytes on 32 bit) for each use of a label name-value combination. This drastically reduces Netdata's memory requirements in setups like busy k8s clusters, where containers come and go all the time, increasing label cardinality significantly.

  6. 32bit Netdata on 64bit IoT machines
    A common request from users running Netdata on 64bit IoT devices is to run a 32bit Netdata there. Before this release, this was not possible. Now a 32bit Netdata will nicely run on a 64bit operating system.

  7. Netdata Cloud on prem
    Netdata Cloud is now available to be installed on-prem! Several companies have already deployed it and are currently testing it. If you want to join them, submit this form.
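
The Alerts Accuracy change (item 4 above) can be pictured with a small Python sketch. This is a simplified model written for illustration, not Netdata's actual query planner: the `Tier` class, the `pick_tier` function and the per-tier granularities are all assumptions. It only shows why a request for a single point tends to land on the coarsest tier, and how forcing alerts onto tier 0 avoids that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    index: int
    seconds_per_point: int  # assumed granularity, for illustration only


# Three tiers, highest resolution first (granularities are assumptions).
TIERS = [Tier(0, 1), Tier(1, 60), Tier(2, 3600)]


def pick_tier(window_seconds: int, points: int, is_alert: bool = False) -> Tier:
    """Pick the coarsest tier that can still supply `points` for the window."""
    if is_alert:
        # As of this release, alerts always run on tier 0.
        return TIERS[0]
    for tier in reversed(TIERS):
        if window_seconds // tier.seconds_per_point >= points:
            return tier
    # Nothing coarser is good enough: fall back to the highest resolution.
    return TIERS[0]
```

Under these assumptions, a dashboard query for 600 points over one hour resolves to tier 0, while a one-point query over the same hour resolves to tier 2 unless `is_alert=True` is passed.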

Release Highlights

systemd-journal

systemd-journal was first included in Netdata v1.42.0. Immediately after release, we recognized the wider need for this feature, so we've rewritten the plugin almost entirely, to provide the best possible experience. This work is also fundamental for supporting more log monitoring integrations - stay tuned!

The major improvements to the systemd-journal logs function are:

  • a histogram of log entries over time, with a breakdown per field value, for any field and any time-frame
  • a PLAY mode that provides the same experience as journalctl -f, showing new log entries immediately after they are received
  • filtering on any journal field or field value, for any time-frame
  • coloring of log entries, the same way journalctl does

For a full presentation of the systemd-journal plugin, including how it works, how to take full advantage of it, and how to configure a logs centralization server, check the documentation for the plugin.

You can experience the power of the systemd-journal logs function in one of our Netdata demo rooms, or check our latest YouTube video on it.

Want to know why you should tap the full potential of systemd-journal logs? Check out the blog post on it by Netdata's founder, Costa Tsaousis (@ktsaou).

Virtual Machine monitoring (VMware vSphere)

In response to increased feedback and requests regarding the VMware vCenter Server collectors, we have:

  • Reviewed our out-of-the-box charts
  • Added labels to the charts, e.g. host, datacenter, cluster, vm
  • Reviewed the metadata on alerts
  • Added summary charts section

Feedback like this from the Community is what lets us keep improving Netdata to ensure it meets your needs!

What is coming next

We are currently working on the following areas, which we hope to release next month:

  1. Logs Explorer for Loki and Elasticsearch
    Similar to systemd-journal, allow Netdata to explore, query and visualize logs from Loki and Elasticsearch.

  2. Collectors Configuration from the UI
    In the last release we presented the Integrations Marketplace. Since then, we have been working to make all integrations configurable via the dashboard. This will allow all of us to configure our Netdata servers directly from the UI, without touching configuration files, significantly improving the usability of Netdata.

  3. Alerts Configuration from the UI
    Similarly, we are working to allow configuring alerts directly from the UI, without text file configurations, so that all of us can create powerful alerts on the spot.

  4. Netdata Mobile App
    We are at the final stage of releasing our Netdata Mobile App (iOS and Android) for receiving mobile push notifications and exploring alerts statuses.

  5. Scalability
    Given the wide adoption of Netdata, we are committed to making Netdata scale better in larger environments. Especially when it comes to Netdata parents, we aim to provide the best scalability possible. We are currently finalizing the necessary changes to allow Netdata to achieve:

    • 1 CPU core per 1 million metrics/s for data collection
    • 1 CPU core per 1 million metrics/s for ML and health (alerts)
    • 1 CPU core per 1 million metrics/s for re-streaming (pushing metrics to another parent)

    Of course, the numbers depend on the CPU and its clock, but they shouldn't vary significantly on modern systems.

    At the same time, we are working to integrate Gorilla compression into our database. This will provide a significantly better overall memory footprint for Netdata.
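
To make the per-core scalability targets listed above concrete, here is a short back-of-the-envelope calculation in Python. The function name `estimate_cores` and the 2.5 million metrics/s workload are hypothetical, and the 1-core-per-million ratio is the stated target rather than a measured result, so treat the output as a rough sizing estimate only.

```python
# Targets quoted in the release notes: 1 CPU core per 1 million metrics/s
# for each of these workloads. They are goals, not measured results.
METRICS_PER_CORE = 1_000_000
WORKLOADS = ("data collection", "ML and health", "re-streaming")


def estimate_cores(metrics_per_second: float) -> dict[str, float]:
    """Return the estimated CPU cores per workload at the target ratio."""
    cores = metrics_per_second / METRICS_PER_CORE
    return {name: cores for name in WORKLOADS}


# Hypothetical parent ingesting 2.5 million metrics per second.
estimate = estimate_cores(2_500_000)
total = sum(estimate.values())
```

At the target ratio, such a parent would need roughly 2.5 cores for each of the three workloads, or about 7.5 cores in total if it performs all three.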

Acknowledgments

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @MAH69IK for improving ntfy notification title.
  • @chpfm for fixing slave/user metrics collection stopping when query times out in go.d/mysql.
  • @k0ste for various installation improvements on CentOS-Stream.
  • @kylemanna for fixing an issue where a properly functioning sensor was skipped due to limits in python.d/sensors.
  • @miversen33 for adding access control configuration to ntfy notification method.
  • @novotnyJiri for fixing the wrong path in ansible-playbook deployment guide.
  • @theggs for adding installation description for Homebrew on Apple Silicon.
  • @vpnable for fixing counting UNDEF as users in go.d/openvpn_status_log.
  • @zhqu1148980644 for fixing docker-compose example.
  • @luisj1983 for implementing molecule tests in the netdata/ansible playbook.

Contributions

Collectors

Improvements

  • Improve exposing metrics by creating a chart for each app group/user/user group (apps.plugin) (#16095, @thiagoftsm)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support to external collectors (#16089, @ilyam8)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (charts.d.plugin) (#16085, @ilyam8)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (python.d.plugin) (#16084, @ilyam8)
  • Improve performance by reading files sequentially (systemd-journal.plugin) (#16038, @ktsaou)
  • Add systemd-journal plugin to apps_groups.conf (apps.plugin) (#16024, @ilyam8)
  • Improve exposing metrics by creating a chart for each systemd service (cgroups.plugin) (#15975, @thiagoftsm)
  • Add disk labels (proc/diskstats) (#15949, @ktsaou)
  • Add support for opening journal files when running inside a container (systemd-journal.plugin) (#15830, @ktsaou)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (go.d.plugin) (#1351, @ilyam8)
  • Add "network" config option that allows configuration of DNS resolution (go.d/ping) (#1348, @ilyam8)
  • Add "custom_numeric_fields" config option (go.d/web_log) (#1343, @ilyam8)
  • Add upsd (NUT) collector (go.d/upsd) (#1341, @ilyam8)
  • Improve status chart by making it a dimension per status (go.d/vcsa) (#1332, @ilyam8)
  • Add label to vm/host charts (go.d/vsphere) (#1331, @ilyam8)

Bug fixes

  • Fix 1-second latency in play mode (systemd-journal.plugin) (#16123, @ktsaou)
  • Fix an issue where ipv4 metrics were exposed as ip (proc/netstat) (#16122, @ilyam8)
  • Fix an issue where OOMKill was created unconditionally (ebpf.plugin) (#16115, @thiagoftsm)
  • Fix an issue where ebpf threads did not respect the enable/disable value in the configuration (ebpf.plugin) (#16083, @thiagoftsm)
  • Fix using undefined var when loading job statuses (python.d.plugin) (#15965, @ilyam8)
  • Fix an issue where a properly functioning sensor was skipped due to limits (python.d/sensors) (#15905, @kylemanna)
  • Fix slave/user metrics collection stopping when query times out (go.d/mysql) (#1346, @chpfm)
  • Fix counting UNDEF as users (go.d/openvpn_status_log) (#1334, @vpnable)
  • Fix an issue where power metrics were not collected due to renaming (go.d/nvidia_smi) (#1310, @ilyam8)

Other

Packaging / Installation

  • Fix removing wrong directories when uninstalling on FreeBSD (#16167, @tkatsoulas)
  • Fix repo path for openSUSE 15.5 packages (#16161, @tkatsoulas)
  • Fix an issue running a Docker container when the default user was configured as a non-root user (#16156, @ilyam8)
  • Fix an issue where the uninstaller script doesn't clean up properly (#16148, @ilyam8)
  • Fix problem with the uninstaller script when executed as a regular user (#16146, @ilyam8)
  • Skip trying to preserve file owners when bundling external code (#15966, @Ferroin)
  • Cleanup Dockerfile (#15902, @Ferroin)
  • Skip copying environment/install-type files when checking existing installations (#15876, @Ferroin)
  • Add setuid fallback for perf and slabinfo plugins in the installer script (#15807, @ilyam8)
  • Fix an issue where cleanup was not performed during the kickstart.sh dry run (#15775, @ilyam8)
  • Add CentOS-Stream to distros (#15742, @k0ste)
  • Fix build with --disable-https (#15395, @MrZammler)
  • Enable building go.d plugin natively for CentOS-Stream (#14551, @k0ste)

Documentation

Health

Other Notable Changes

Improvements

Bug Fixes

Other

Deprecation notice

Changed in this release

In accordance with our previous deprecation notice, the following items in this release have been changed:

| Component | Type | Change | Action |
|---|---|---|---|
| apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
| cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
| proc.plugin | collector | all "Networking Stack" metrics except "tcp" have been moved to "IPv4 Networking" | |
| family attribute | alert configuration and Health API | deprecated | use chart labels |

Will be changed in the next release

We plan to change in the next release (v1.44.0):

| Component | Type | Change | Action |
|---|---|---|---|
| charts.d/nut | collector | deprecated | use go.d/upsd |

Netdata Release Meetup

Join the Netdata team on the 18th of October at 16:30 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgments.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

netdata - v1.42.4

Published by netdatabot about 1 year ago

Netdata v1.42.4 is a patch release to address issues discovered since v1.42.3.

This patch release provides the following bug fixes and updates:

  • Fixed alarm variables not being created for all chart dimensions (#15984, @MrZammler).

netdata - v1.42.3

Published by netdatabot about 1 year ago

Netdata v1.42.3 is a patch release to address issues discovered since v1.42.2.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @moonbreon for improving handling of closed connections in streaming.

netdata - v1.42.2

Published by netdatabot about 1 year ago

Netdata v1.42.2 is a patch release to address issues discovered since v1.42.1.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @kevin-fwu for adding an option to avoid duplicate labels when exporting in Prometheus format.
  • @k0ste for fixing permission attributes for conf.d dirs for RPM.

netdata - v1.42.1

Published by netdatabot about 1 year ago

Netdata v1.42.1 is a patch release to address issues discovered since v1.42.0.

This patch release provides the following bug fixes and updates:

  • Fixed issue with missing entries for Systemd-journal and Processes functions (#15814, @ktsaou)
  • Fixed linking health.log to stdout in Docker (#15813, @ilyam8)
  • Updated UI version to v6.28.0 (#15810, @ilyam8)
  • Fixed 401 when behind a proxy with Basic auth and signed in (#15808, @ktsaou)
  • Fixed Health Management API (#15806, @underhood)
  • Fixed build deps in DEB packages for systemd-journal.plugin (#15805, @Ferroin)
  • Cleaned up python deps for RPM packages (#15804, @Ferroin)
  • Added proper SUID fallback for DEB plugin packages (#15803, @Ferroin)
  • Fixed an issue where the nd_journal_process column was not populated for the Systemd-journal function (#15798, @ktsaou)
  • Fixed negative retention when database is empty in /api/v2/info (#15796, @ktsaou)
  • Fixed handling of unassigned drives for python.d/hpssa (#15793, @ilyam8)
  • Fixed an issue that prevented systemd-journal.plugin from restarting (#15787, @ktsaou)
  • Fixed publishing of openSUSE 15.5 packages (#15781, @tkatsoulas)
  • Updated OpenSSL version of static builds to 1.1.1v (#15779, @tkatsoulas)

netdata - v1.42.0

Published by netdatabot about 1 year ago

Steady to our schedule, this is another great Netdata release!

Netdata Growth

  • 64.5 k GitHub Stars ⭐

    Netdata reached the top trending repos on GitHub after the last release. ❤️ Thank you for your love! 🚀 You rock!

    Give Netdata a ⭐ on GitHub too!

  • 580+ M docker hub pulls, running at 200+ k per day.

    Netdata is a verified publisher on Docker Hub, and our users enjoy free unlimited Docker Hub pulls!

Release Highlights

Integrations Marketplace

A beta version of the Netdata Marketplace is included in this release.

More than 800 integrations are available, directly from the dashboard. For each integration, all the information required to get it up and running is included.

The Integrations Marketplace is still in beta. We improve it every day, but we think it is already quite useful.

SystemD Journal

A new Netdata Function has been added to query the systemd journal logs.

The function respects the current date-time picker, so it can query any possible timeframe the systemd journal has data for.

IMPORTANT
Netdata Functions are available only when you are signed in to Netdata and your Netdata Agent is claimed.
This has been done to protect your privacy. Netdata Cloud checks that the users of the Agent dashboard are allowed to view this information.

IMPORTANT
The systemd-journal function is currently available only on Netdata Agents that have been installed from source, or with native packages of the Linux distribution (RPM, DEB). For users running static builds of Netdata or running Netdata in a Docker container, we are working to bring systemd-journal to them too. Stay tuned...

Claiming via the UI

You can now connect your agents to Netdata Cloud via the dashboard.

The UI verifies that you are the owner of a Netdata Agent by asking you to provide a random key that is saved to a file on disk. Once you provide the right key, the Agent is automatically claimed to your space at Netdata Cloud.

Easily Spot Anomalies

The UI has an AR button above the menu. When you press it, the dashboard queries the Netdata Metrics Scoring Engine to find the anomaly rates for the visible timeframe, across the metrics included in the dashboard. Then it adds a badge next to each category and subcategory, showing its anomaly rate.

This way, you can quickly spot what is anomalous on the current view of the dashboard.

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @Leny1996 for fixing Docker bind-mount stock files creation.
  • @fhriley for adding Linux power cap Intel RAPL metrics collector.
  • @icy17 for fixing potential crash in the h2o server.
  • @kiela for fixing typos and images placement in the Deployment Strategies doc.
  • @zeylos for fixing non-interactive options for apt-get and zypper.

Contributions

Collectors

New

  • Add AMD GPU collector (proc.plugin) (#15515, @Dim-P)
  • Add PCI Advanced Error Reporting metrics collector (proc.plugin) (#15488, @ktsaou)
  • Add Linux power cap Intel RAPL metrics collector (proc.plugin) (#15364, @fhriley)
  • Add systemd-journal plugin (systemd-journal.plugin) (#15363, @ktsaou)

Improvements

  • Collect EDAC metrics per-memory controller (MC) and DIMM (proc.plugin) (#15473, @ktsaou)

Bug fixes

Other

  • Change restart message to info (freeipmi.plugin) (#15664, @ilyam8)
  • Filter out systemd-udevd.service/udevd cgroup (cgroups.plugin) (#15571, @ilyam8)
  • Improve FD limit issue tracing (apps.plugin) (#15504, @ktsaou)
  • Add hash table charts for internal monitoring (ebpf.plugin) (#15323, @thiagoftsm)

Documentation

Packaging / Installation

Health

  • Disable systemdunits alarms (#15726, @ilyam8)
  • Remove the noise by silencing alerts that don't need to wake up people (#15590, @ktsaou)

Other Notable Changes

Improvements

Bug Fixes

Code organization

Deprecation notice

We plan to change the following items in the next release (v1.43.0):

| Component | Type | Change | Action |
|---|---|---|---|
| apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
| cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
| proc.plugin | collector | all "Networking Stack" metrics except "tcp" => "IPv4 Networking" | |
| python.d/nvidia_smi | collector | deprecated | use go.d/nvidia_smi |
| family attribute | alert configuration and Health API | deprecated | use chart labels |

Netdata Release Meetup

Join the Netdata team on the 11th of August at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgements.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

netdata - v1.41.0

Published by netdatabot over 1 year ago

Check out the v1.41 release meetup recording or read on to learn more about the new UI and other features in this release.

Steady to our schedule, this is another great Netdata release!

Netdata Growth

  • 64 k GitHub Stars ⭐
  • 1.7 M monitored nodes
  • 570+ M docker hub pulls

Give Netdata a ⭐ too, on GitHub!

❤️ Thank you for your love! 🚀 You rock!

Release Highlights

New Agent Dashboard

Netdata Agents and Parents now have a new UI!

New CHARTS 🟢 New SUMMARIES 🟢 MACHINE-LEARNING FIRST 🟢 INFRASTRUCTURE LEVEL DASHBOARDS 🟢 FILTER, SLICE, and DICE any dataset 🟢 ANOMALY ADVISOR 🟢 METRICS CORRELATIONS 🟢 NETDATA FUNCTIONS 🟢 EVENTS FEED 🟢 HEATMAPS 🟢

In the last few months, we have ported and open-sourced all Netdata Cloud APIs to the Netdata Agent, allowing Netdata Parents to drive the same multi-node / infrastructure level dashboards Netdata Cloud provides!

So, as of today, Netdata Agents and Parents present the same UI, with exactly the same dashboards, charts, and features as Netdata Cloud!

Single Node Dashboard Changes

Apart from the entirely new look, single-node dashboards now group similar charts together. So, all disk drives, network interfaces, and cgroups (containers and VMs) each appear as a single set of charts.

This allows Netdata to aggregate a vast number of datasets in a single chart, like the following, where almost 20k containers are now manageable:

image

To make it easier for you to navigate, filter, slice, and dice the data, the menus above each chart give you easy access to all the data of the chart:

Netdata Agent 2

Multi Node Dashboards

When Netdata Agents are configured as Parents (multiple other agents stream metrics to them), they now present multi-node and multi-instance charts. At the top right corner of the dashboard, there is the global nodes filter, from which you can slice the entire dashboard for one or a few of your nodes.

image

Want to know more?

Get a firsthand walkthrough with Costa Tsaousis, Netdata's Founder, on the rationale for this change and the path Netdata is taking by checking the video from Netdata Office Hours on YouTube.

The old dashboards are still accessible

You can still access all versions of the dashboards, as follows:

  • http://your.server:19999/
    The default dashboard is now a live version of the new UI. The dashboard static files are served by Cloudflare and are automatically updated when we release a new version of the UI, so that your Netdata agent is always up to date.

  • http://your.server:19999/v2/
    A local copy of the latest dashboard, as it was at the time the agent was released. This is distributed with Netdata under the Netdata Cloud UI License v1.0. The local copy is automatically used if for any reason the web browser cannot download the live version of it.

  • http://your.server:19999/v1/
    The previous single-node version of the Netdata Agent dashboard.

  • http://your.server:19999/v0/
    The now ancient, original version of the Netdata Agent dashboard.
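
For scripting or bookmarking, the four entry points above can be captured in a small lookup. This is only an illustrative sketch: the dashboard_url helper and the version labels ("live", "v2", "v1", "v0") are hypothetical names chosen here, while the paths and the default port 19999 are taken from the list above.

```python
# Paths are taken from the list above; the keys are hypothetical labels.
DASHBOARD_PATHS = {
    "live": "/",    # live version of the new UI
    "v2": "/v2/",   # local copy of the new UI shipped with the agent
    "v1": "/v1/",   # previous single-node dashboard
    "v0": "/v0/",   # original dashboard
}


def dashboard_url(host: str, version: str = "live", port: int = 19999) -> str:
    """Build the URL of one dashboard version served by a Netdata agent."""
    if version not in DASHBOARD_PATHS:
        raise ValueError(f"unknown dashboard version: {version!r}")
    return f"http://{host}:{port}{DASHBOARD_PATHS[version]}"


if __name__ == "__main__":
    for name in DASHBOARD_PATHS:
        print(name, dashboard_url("your.server", name))
```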

Netdata Assistant

Netdata Assistant: Your AI-Powered Troubleshooting Sidekick

The Netdata Assistant is an AI-powered tool that uses large language models and our community's knowledge to guide you during troubleshooting and help you get to the root cause sooner.

The goal of the Netdata Assistant is straightforward: to make your troubleshooting process easier. It's here to save you from the hassle of sifting through tons of information so you can focus on solving the problem at hand.

It will give you the lowdown on the alert, why it's happening, and why you should care. It'll also guide you on how to troubleshoot it and even offer some handy web links for more info if you're interested.

image

Read more about it on the Netdata blog here.

New FreeIPMI collector for monitoring enterprise hardware

Netdata got a new FreeIPMI collector. The new collector is able to collect IPMI sensors at a much better data collection rate, and it is more reliable and robust compared to the previous one.

We have also categorized all sensors based on the component they monitor:

image

We also provide, as labels, the exact sensor name each metric refers to:

image

Netdata Detects FDs Leaking

"FD" stands for "file descriptor". A file descriptor is an integer that the operating system assigns to an open file to track it. This includes regular data files, directories, network sockets, pipes, and other types of I/O streams.

In Linux, everything is treated as a file, which includes hardware devices, directories, and sockets. Each open file is assigned a file descriptor. When a file is closed, its file descriptor is freed up for reuse. However, if an application doesn't close a file when it's done with it, that's called a "file descriptor leak".

File descriptor leaks can cause several problems:

  1. Resource exhaustion: Each process has a limit on the number of file descriptors it can open. If a process keeps leaking file descriptors, it will eventually hit this limit and won't be able to open any more files, which often causes the process to crash.

  2. Unexpected behavior: Open file descriptors hold resources, like network sockets, that might be expected to be available for other uses. If these resources are tied up due to a leak, it can cause unexpected behavior.

  3. Security issues: File descriptors can sometimes be used to gain unauthorized access to data if they're not properly managed.
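
To make the idea concrete, here is a minimal sketch of what a leak looks like from inside a process. It assumes Linux, since it counts entries in /proc/self/fd. The helpers count_open_fds and leak_descriptors are illustrative names and are not part of Netdata.

```python
import os


def count_open_fds() -> int:
    """Count this process's open file descriptors (Linux only)."""
    # os.listdir() briefly opens the directory itself, so that one
    # descriptor shows up in the listing; subtract it.
    return len(os.listdir("/proc/self/fd")) - 1


def leak_descriptors(n: int) -> list[int]:
    """Open n descriptors and never close them, simulating a leak."""
    return [os.open(os.devnull, os.O_RDONLY) for _ in range(n)]


before = count_open_fds()
leaked = leak_descriptors(10)      # the "forgotten" descriptors
after_leak = count_open_fds()

for fd in leaked:                  # the fix: close what you opened
    os.close(fd)
after_close = count_open_fds()

print(f"before={before} after_leak={after_leak} after_close={after_close}")
```

The count grows by exactly the number of descriptors that were opened and not closed, and returns to its starting value once they are closed. In real code, a context manager (with open(...) as f:) is the usual way to guarantee the close.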

apps.plugin is now able to track the usage of FDs against the limits set for each application. We have added an fds category in the Applications section of the dashboard. The first chart shows the percentage of FDs used by each application against its limits:

image
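
As a rough illustration of the percentage shown in that chart, the sketch below computes the same kind of ratio for the current process using Python's standard resource module. It is not how apps.plugin is implemented. fd_usage_percent is a hypothetical helper, reporting 0% for an unlimited soft limit is an assumption made here, and the usage part assumes Linux (/proc/self/fd).

```python
import os
import resource


def fd_usage_percent(open_fds: int, soft_limit: int) -> float:
    """Return open_fds as a percentage of the soft descriptor limit."""
    if soft_limit == resource.RLIM_INFINITY:
        # Assumption: with no limit there is nothing to exhaust, so report 0%.
        return 0.0
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return 100.0 * open_fds / soft_limit


if __name__ == "__main__":
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The directory listing itself uses one descriptor; subtract it.
    open_fds = len(os.listdir("/proc/self/fd")) - 1
    pct = fd_usage_percent(open_fds, soft)
    print(f"{open_fds} of {soft} descriptors in use ({pct:.2f}%)")
```

A value that climbs steadily toward 100% over time, rather than fluctuating around a stable level, is the typical signature of a leak.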

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @k0ste for improving Prometheus exporting doc.
  • @carlocab for replacing info macro with a less generic name.
  • @MYanello for updating the pfSense package installation instructions.

Contributions

Collectors

Improvements

  • Improve fds monitoring (apps.plugin) (#15437, @ktsaou)
  • Add application groups file descriptor limit monitoring (apps.plugin) (#15417, @ktsaou)
  • Re-create sdr cache on start (freeipmi.plugin) (#15361, @ktsaou)
  • Add sensor state chart, create a per-sensor chart instead of a per-sensor dimension (freeipmi.plugin) (#15327, @ktsaou)
  • Expose CmdLine in apps function (apps.plugin) (#15275, @ilyam8)
  • Remove pod_uid and container_id labels in k8s (cgroups.plugin) (#15216, @ilyam8)
  • Add cluster mode (go.d/elasticsearch) (#1227, @ilyam8)
  • Add 'fallback_type' config option to match Untyped (go.d/prometheus) (#1225, @ilyam8)

Bug fixes

  • Fix sensor state updates (freeipmi.plugin) (#15360, @ilyam8)
  • Fix tc.plugin charts labels (tc.plugin) (#15262, @ilyam8)
  • Fix collecting hostgroup from stats_mysql_connection_pool (go.d/proxysql) (#1226, @ilyam8)

Other

Documentation

Packaging / Installation

Health

Exporting

  • Hide charts that are not available to viewers when exporting in the shell format (#15309, @ilyam8)
  • Fix slow exporting in Prometheus format (#15276, @ilyam8)

Other Notable Changes

Improvements

  • Enrichment of /api/v2, buildinfo improvements and code cleanup (#15294, @ktsaou)

Bug fixes

Code organization

Deprecation notice

There is no definite list of items to be deprecated in the upcoming release (v1.42.0). Feel free to check the upcoming backlog and comment on it.

Deprecated in this release

In accordance with our previous deprecation notice, the following items are deprecated in this release:

| Component | Type | Will be replaced by |
|---|---|---|
| python.d/nvidia_smi | collector | go.d/nvidia_smi |
| family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |

Netdata Release Meetup

Join the Netdata team on the 21st of July at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgements.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1400 engineers are already using it!

netdata - v1.40.1

Published by netdatabot over 1 year ago

Netdata v1.40.1 is a patch release to address issues discovered since v1.40.0.

This patch release provides the following bug fixes:

  • Fixed eBPF sync thread crash (#15174, @thiagoftsm).
  • Fixed eBPF threads taking too long to terminate (#15187, @thiagoftsm).
  • Fixed building with eBPF on RPM systems due to missing build dependency (#15192, @k0ste).
  • Fixed building on macOS due to incorrect include directive (#15195, @nandahkrishna).
  • Fixed a crash during health log entry processing (#15209, @stelfrag).
  • Fixed architecture detection on i386 when building native packages (#15218, @ilyam8).
  • Fixed SSL non-blocking retry handling in the web server (#15222, @ktsaou).
  • Fixed handling of plugin ownership in static builds (#15230, @Ferroin).
  • Fixed an exception in python.d/nvidia_smi due to not handling N/A value (#15231, @ilyam8).
  • Fixed installing the wrong systemd unit file on older RPM systems (#15240, @Ferroin).
  • Fixed creation of charts for network interfaces of virtual machines/containers as normal network interface charts (#15244, @ilyam8).
  • Fixed building on openSUSE Leap 15.4 due to incorrect $(libh2o_dir) expansion (#15253, @Dim-P).

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @k0ste for fixing building with eBPF on RPM systems.
  • @nandahkrishna for fixing building on macOS.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1400 engineers are already using it!