netdata

netdata - v1.45.5 Latest Release

Published by netdatabot 5 months ago

Netdata v1.45.5 is a patch release to address issues discovered since v1.45.4.

This patch release provides the following bug fixes and updates:

Fixed streaming sender functions payload corruption (#17696, @ktsaou).
Fixed crashes due to missing dimension IDs (external protocol) by detecting incorrect syntax and disabling plugins (#17690, @stelfrag).
Fixed ACLK Proxy compatibility: added Host header to CONNECT requests (#17670, @stelfrag).
Added Machine Learning support to CentOS 7 RPM packages (#17667, @vkalintiris, #17682, @Ferroin).
Fixed calculation issue in the go.d/cockroachdb collector (#17659, @ilyam8).
Added limited support for offline installations within the updater code (#17648, @Ferroin).
Fixed Cloud Alert consistency: sending REMOVED transitions for disconnected child Agents (#17621, @stelfrag).
Added vnode support to go.d/windows collector dyncfg (#17478, @ilyam8).
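
As an illustration of the ACLK proxy fix in this list (#17670), the sketch below shows what a CONNECT request carrying a Host header looks like on the wire. HTTP/1.1 expects every request to include that header, and some proxies refuse a CONNECT without it. The function name and the endpoint are hypothetical and are not taken from the Agent's code.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Build an HTTP/1.1 CONNECT request that includes a Host header."""
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",
        f"Host: {target}",  # the header the fix adds to CONNECT requests
        "",                 # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Hypothetical endpoint, used only for illustration.
request = build_connect_request("cloud.example.com", 443)
print(request.decode("ascii"))
```

The proxy answers with a status line; only after a 2xx response does the client start the TLS handshake through the tunnel.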

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!

netdata - v1.45.4

Published by netdatabot 5 months ago

Netdata v1.45.4 is a patch release to address issues discovered since v1.45.3.

This patch release provides the following bug fixes and updates:

Added missing update_every property to the health prototype JSON schema (#17613, @ktsaou)
Fixed issue where parent alerts remained active after child disconnection, by resetting health on child disconnect (#17612, @ktsaou)
Fixed a packaging issue that prevented ndsudo from having the setuid bit in static builds (#17583, @ilyam8)
Increased spawn server command size and added shutdown safeguard to prevent crashes from command size limit exceeded (#17566, @stelfrag)
Fixed error code reporting for failed data insertion in SQLite (#17508, @stelfrag)
Fixed issue with name-only label matching (#17482, @stelfrag)
Improved Cloud connectivity: automatically re-establish connection upon system resume from suspension by scheduling a node update (#17444, @stelfrag)
Improved termination handling: start watcher thread post-fork, preventing main process from waiting indefinitely on TERM signal (#17436, @stelfrag)
Fixed priority order for alarms and alarm templates: now, alarms are applied before alarm templates consistently, regardless of their order in configuration files (#17398, @ktsaou)
Added option for health table cleanup with 'netdata -W sqlite-alert-cleanup' command (#17385, @stelfrag)
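
The priority fix in this list (#17398) means alarms are applied before alarm templates no matter where they appear in the configuration files. A minimal sketch of that ordering rule follows; the `AlertDef` type and the sample names are assumptions made for illustration, not the Agent's data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlertDef:
    kind: str  # "alarm" or "template"
    name: str


def application_order(defs: List[AlertDef]) -> List[AlertDef]:
    """Return alarms first, keeping the file order within each kind."""
    # sorted() is stable, so entries with the same key keep their original order.
    return sorted(defs, key=lambda d: 0 if d.kind == "alarm" else 1)


# Hypothetical definitions in the order they were read from configuration.
defs = [
    AlertDef("template", "disk_full"),
    AlertDef("alarm", "disk_full_sda"),
    AlertDef("template", "cpu_high"),
    AlertDef("alarm", "cpu_high_core0"),
]
ordered = [d.name for d in application_order(defs)]
print(ordered)
```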

netdata - v1.45.3

Published by netdatabot 6 months ago

[!WARNING]
Important Security Update

Netdata v1.45.3 is a patch release to fix a local privilege escalation vulnerability discovered in v1.45.x releases. Users are advised to upgrade any systems running v1.45.0, v1.45.1, or v1.45.2 immediately. Stable releases before v1.45.0 are unaffected by this vulnerability. Full details on the vulnerability can be found in the associated security advisory on GitHub. A big thank you to mia-0 for identifying and reporting this issue!

This patch release also addresses other issues discovered since v1.45.2.

This patch release provides the following bug fixes and updates:

Mitigated a security issue in ndsudo by restricting its search paths to a predefined set of directories (#17377, @ilyam8)
Resolved an issue that prevented the "percentage" option from functioning correctly in alert lookups (#17391, @ktsaou)
Enhanced macOS uninstallation by enabling removal of the associated LaunchDaemons plist file (#17357, @ilyam8)
Increased the default minimum thread stack size to 1 MB to address potential stability issues caused by the musl libc's smaller default (128kB) (#17317, @ilyam8)
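
The ndsudo mitigation in this list (#17377) restricts command lookup to a predefined set of directories. The sketch below shows the general idea of resolving a command against a fixed allow-list instead of the caller's PATH. The directory list and function name are assumptions; these notes do not give the list ndsudo actually uses.

```python
import os
from typing import Optional, Sequence

# Hypothetical allow-list; the real directories used by ndsudo are not listed here.
TRUSTED_DIRS = ("/usr/sbin", "/usr/bin", "/sbin", "/bin")


def resolve_command(name: str, trusted_dirs: Sequence[str] = TRUSTED_DIRS) -> Optional[str]:
    """Return the full path of `name` if it exists in a trusted directory, else None."""
    # Accept only bare command names, so a caller cannot point outside the allow-list.
    if not name or os.sep in name or name in (".", ".."):
        return None
    for directory in trusted_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # the PATH environment variable is never consulted


print(resolve_command("sh"))
print(resolve_command("../sh"))
```

Because the lookup never reads PATH, an unprivileged user cannot make a privileged helper run a binary from a directory they control.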

netdata - v1.45.2

Published by netdatabot 7 months ago

Netdata v1.45.2 is a patch release to address issues discovered since v1.45.1.

This patch release provides the following bug fixes and updates:

Improved PostgreSQL/MySQL local listener discovery to automatically check for connections using both TCP and Unix sockets, enabling support for passwordless Unix socket connections (#17304 #17305, @ilyam8)
Fixed an issue that prevented negative matching of host/chart labels in alert configurations (#17290 #17292, @ktsaou)
Improved go.d.plugin stability by preventing Netdata from shutting down the entire plugin due to an issue with registering jobs for unregistered modules (#17289, @ilyam8)
Improved go.d.plugin HTTP requests to include a User-Agent string, making them easier to identify in server logs (#17286, @ilyam8)
Improved Nginx discovery in go.d.plugin by automatically trying multiple status endpoints when discovering Nginx containers (#17285, @ilyam8)
Fixed a go.d.plugin panic that could occur when using the Unbound collector with TLS (#17283, @ilyam8)
Fixed a libyaml linking issue (#17276, @Ferroin)
Improved go.d.plugin configuration validation, preventing unexpected or invalid options through dynamic configurations (#17269, @ilyam8)
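
The label-matching fix in this list (#17290, #17292) concerns negative matches in alert configurations. The sketch below illustrates how a negative match can be evaluated, assuming a space-separated pattern list where `*` is a wildcard, a leading `!` marks a negative pattern, and the first pattern that matches decides the result. It is not the Agent's matcher, and the function names are hypothetical.

```python
import re


def _to_regex(pattern: str):
    # Only '*' acts as a wildcard; every other character is matched literally.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def pattern_list_matches(patterns: str, value: str) -> bool:
    """Evaluate patterns left to right; the first one that matches decides."""
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        if _to_regex(pattern).fullmatch(value):
            return not negative
    return False  # no pattern matched


# Hypothetical label values.
for label in ("web-staging-1", "web-prod-1"):
    print(label, pattern_list_matches("!*staging* *", label))
```

With first-match-wins evaluation, negative patterns need to come before broader positive ones, otherwise the positive pattern decides first.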

netdata - v1.45.1

Published by netdatabot 7 months ago

Netdata v1.45.1 is a patch release to address issues discovered since v1.45.0.

This patch release provides the following bug fixes and updates:

Ensured proper handling of default values for data collection jobs submitted via dynamic configuration. (#17255, @ilyam8)
Optimized go.d.plugin service discovery by filtering out irrelevant docker-proxy listeners. (#17254, @ilyam8)
Improved go.d.plugin's ability to find applications, including those using IPv6, and identify Apache processes more reliably. (#17252, @ilyam8)
Improved OpenSSL discovery on macOS for Homebrew builds. (#17250, @Ferroin)
Removed obsolete references to saving the internal database using the USR1 signal, reflecting the removal of save/map memory modes. (#17249, @ilyam8)
Added ZSTD compression support for dbengine (disabled by default for now). This improves storage efficiency when available, automatically falling back to uncompressed pages for compatibility. (#17244, @ktsaou)
Fixed a bug that caused metric reference count errors during release. (#17239, @ktsaou)
Code cleanup. (#17237, @ktsaou)
Enabled Gorilla compression by default for dbengine, reducing memory usage. (#17234, @ktsaou)
Improved dbengine unit tests for better code coverage and maintainability. (#17232, @ktsaou)
Fixed a database engine cache bug that could cause queries to stop prematurely under pressure. (#17231, @ktsaou)
Implemented caching optimization to reduce the number of cache flushes following journal file v2 creation. (#17220, @stelfrag)
Reduced clutter in MySQL/MariaDB query logs by disabling session query logging for the go.d/mysql collector. (#17219, @ilyam8)
Improved go.d.plugin to correctly identify MariaDB databases. (#17218, @ilyam8)
Enhanced macOS build stability by using native libraries and optimizing checks for dependencies. (#17216, @Ferroin)
Suppressed unnecessary compiler warnings about redefined macros, improving build cleanliness and compatibility with stricter build flags. (#17209, @Ferroin)
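
The ZSTD entry in this list (#17244) describes compressing pages when the codec is available and falling back to uncompressed pages otherwise. The sketch below shows one way such a fallback can work: each page carries a tag telling the reader how to decode it. It uses `zlib` from the Python standard library as a stand-in for ZSTD, and the tag values and function names are assumptions, not dbengine's on-disk format.

```python
import zlib

# Hypothetical one-byte tags placed in front of every stored page.
RAW, COMPRESSED = 0, 1


def encode_page(data: bytes, compression_available: bool) -> bytes:
    """Compress the page when a codec is available, otherwise store it as-is."""
    if compression_available:
        return bytes([COMPRESSED]) + zlib.compress(data)  # zlib stands in for ZSTD
    return bytes([RAW]) + data


def decode_page(page: bytes) -> bytes:
    """Read the tag and decode the page accordingly."""
    tag, body = page[0], page[1:]
    return zlib.decompress(body) if tag == COMPRESSED else body


page = bytes(4096)  # a page of zeros compresses very well
for available in (True, False):
    stored = encode_page(page, available)
    print(available, len(stored), decode_page(stored) == page)
```

Tagging each page keeps both kinds readable side by side, which is what makes a per-page fallback safe.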

netdata - v1.45.0

Published by netdatabot 7 months ago

Netdata Growth

67.5k GitHub stars!
626M Docker Hub pulls!

Thanks to your love ❤️, Netdata is leading the observability category in CNCF, having significantly more stars than Elasticsearch, Grafana, Prometheus and all other observability solutions listed in CNCF landscape.

We are committed to providing the most advanced and innovative observability solution, one that minimizes monitoring costs while delivering AI-powered, high-fidelity monitoring!

You like Netdata? Give Netdata a ⭐ too, on GitHub!

Release Summary

Three months have passed since the previous Netdata release, and a lot has changed! Netdata now has a mobile app for alert notifications, new drag-and-drop custom dashboards, network connections monitoring, dynamic configuration for data collection jobs and alerts, and much more.

To see how Netdata stacks up against the most advanced commercial offerings available today, we did an analysis on how Dynatrace, Datadog, Instana, Grafana and Netdata commercial offerings compare.

It is nice to see that Netdata stands out for its:

Excellent technology coverage: Netdata's monitoring coverage is significantly higher than the others in all areas!
High-fidelity, real-time insights: Netdata is the only monitoring solution offering this kind of fidelity (per-second for all metrics) to this extent!
Real AI and machine learning: Netdata is the only monitoring system that offers real machine learning, running at the edge.
Lightweight: Netdata is among the lightest agents, despite the fact that it does a lot more than the others.
Best cost efficiency: Netdata's cost efficiency is unbeatable, making Netdata the most cost-efficient monitoring solution available today!

Read the full blog here.

Release Highlights

Netdata Mobile App

You can now receive Netdata alerts directly on your mobile phone!

Choose your space and see all the available notifications since you last signed in!

Check the full demo here.

The Mobile App is available for Homelab and Business plan users.

Custom Dashboards

You can now create advanced custom dashboards with Netdata!

Drag-and-Drop: Easily move charts from Metrics or Single Node views straight to your dashboards. It's intuitive and fun.
New Chart Types: Discover your data in new ways with Bar, Circle, Gauge, Pie, Value, and Group boxes.
Quick Dashboard Creation: Hit the plus button, drag, and you've got a new dashboard. Simple as that.
Rename Charts: Customize your dashboard by renaming charts to whatever makes sense to you.
Refreshed Text Cards: We've upgraded text cards for better clarity and aesthetics.

The Agent UI and the Community plan of Netdata Cloud allow one custom dashboard. The Homelab and Business plans of Netdata Cloud support an unlimited number of custom dashboards.

Network Viewer

Explore the network connections of your servers and processes!

Netdata got a network viewer (select network-connections from the Top tab inside the dashboard).

The tool reports all IPv4 and IPv6, TCP and UDP sockets a system and all its processes have. Also, it automatically and reliably classifies them as inbound, outbound, local (i.e. within the host itself), or listen (for daemons).

The visualization graph has 4 sides:

public (i.e. public IPs),
private (i.e. private and reserved IPs),
servers (i.e. listening and inbound sockets),
clients (i.e. sockets towards other servers).

The position of each application on the chart is determined by the classification of the sockets it has. Clients are at the top, servers at the bottom, internet-facing applications on the right, and internal network applications on the left.

The size of each application in the chart is determined by the number of sockets it has, and each application is a pie chart representing the percentage of each kind of sockets it has.

For servers with tens of thousands of sockets, the tool provides an aggregated view, grouping similar sockets together and reporting the total. Users can switch to a detailed view from the UI.
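
The public and private sides of the graph described above depend on whether a socket's remote address is a public IP or a private or reserved one. The sketch below approximates that split with Python's `ipaddress` module, treating globally routable addresses as public and everything else as private. This is an assumption made for illustration; the exact rules the Network Viewer applies are not given in these notes.

```python
import ipaddress


def address_side(ip: str) -> str:
    """Return "public" for globally routable addresses, otherwise "private"."""
    address = ipaddress.ip_address(ip)  # accepts both IPv4 and IPv6 text
    return "public" if address.is_global else "private"


for ip in ("8.8.8.8", "10.0.0.5", "127.0.0.1", "::1"):
    print(ip, address_side(ip))
```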

User Settings

We've immensely improved the customization capabilities of Netdata with the introduction of User Settings.

Our first release on this front is focused on the customization of charts, either on the Metrics tab or the Single Node view tab. You can now create for any chart:

Personal views
Room specific views
Space dedicated views

With this, you can define the best way for your team to visualise a given chart while still allowing each teammate to define their own. Users are presented with the view they should see, based on the setting hierarchy, but they can freely select whichever of the available views they want.

More areas of customization are coming soon, such as saved filter views and dashboard table of contents (TOC) ordering.

Dynamic Configuration (beta)

Netdata agents are now deployed with the ability to dynamically accept configuration from the UI, for data collection jobs and alerts. The feature is released in beta.

Alerts Configuration Manager

Check the full demo here

Alerts Silencing Rules

🔕 We've made it easier to see and interact with Alert Silencing Rules.
With this release, you will be able to:

See silencing rules status directly on entities like Alerts, Rooms, and Nodes
Immediately create a silencing rule for an Alert, a Room, or a Node

We hope this makes it easier for you to interact with the Alert Silencing Rule Manager.
Stay tuned for more improvements!

MacOS Processes Monitoring

Netdata's apps.plugin has been ported to macOS, allowing users to view process information on Linux, FreeBSD and macOS!

Just install the latest Netdata on your macOS system and enjoy full process monitoring!

Homelab Plan

For non-professional use, get all of the latest Netdata! For the cost of a beer per month, you can get access to all Business features of Netdata for your home lab or personal project!

Our Homelab plan is available to technology enthusiasts and students for non-professional use, offering the entire Netdata suite for a small flat fee, under a fair usage policy.

Unlimited Access: Enjoy the freedom of unlimited usage, with no caps on nodes or custom dashboards.
Premium Features: Get your hands on business-level features, including enhanced alert integrations and access to our mobile app, all tailored for your personal projects.
Support Netdata: Support the open-source Netdata, to ensure it will be there for you, when you need it!

New Build Infrastructure

Starting with Netdata 1.45, we have completely removed our GNU Autotools based build system and replaced it with
CMake. The new CMake build system has a number of significant benefits for developers, package maintainers, and
those using local builds of Netdata.

We now have proper support for out-of-tree builds, and this is now the preferred method for building Netdata.
Build configuration is now measurably faster than it was previously.
Netdata can now be built using Ninja instead of Make, further speeding up the build process. The installer and updater script will automatically use Ninja instead of Make when possible, and developers, package maintainers, and users who are building by hand are encouraged to explicitly use it themselves when building Netdata by specifying the -G Ninja option to CMake during the configuration process.
Overall maintenance of the build system will be significantly easier going forwards. This means we should be able to fix any issues involving it much more quickly, and contributions from external developers should be much easier.
A number of features we have wanted to add to our build infrastructure will be much easier to add now.

Most users should not be directly affected by this change other than benefiting from the faster build times. Only those who were building locally by hand (not using the netdata-installer.sh script or the kickstart script) will need to change things.
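
For those building by hand, the sketch below assembles a CMake configure command for an out-of-tree build and adds the -G Ninja option only when a `ninja` executable is found, which mirrors the preference described above. The helper name and the directory names are assumptions for illustration; this is not the logic of netdata-installer.sh.

```python
import shutil
from typing import List, Optional


def cmake_configure_command(source_dir: str = ".", build_dir: str = "build",
                            ninja_path: Optional[str] = None) -> List[str]:
    """Build a CMake configure command for an out-of-tree build, preferring Ninja."""
    command = ["cmake", "-S", source_dir, "-B", build_dir]
    if ninja_path:
        command += ["-G", "Ninja"]  # otherwise CMake picks its default generator
    return command


# Detect Ninja on this machine and show the two commands a manual build would run.
command = cmake_configure_command(ninja_path=shutil.which("ninja"))
print(" ".join(command))
print("cmake --build build")
```

Keeping the build directory separate from the source tree means a stale configuration can be discarded by deleting that one directory.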

Go Plugin Moved to Main Repository

Alongside the new CMake build system, we have also moved the go.d.plugin
code from the netdata/go.d.plugin repository to the main netdata/netdata
repository.

We have made this change for three reasons:

It makes handling of bugs in the Go plugin much easier. Instead of possibly needing to track issues and PRs across two repositories, now everything should end up tracked coherently in one repository.
It lets us significantly simplify a number of parts of our CI and installation code, allowing for greater reliability and easier maintenance.
It provides a testbed for infrastructure for handling of Go in the main repo, which is significant as we have been internally looking at reimplementing some other components of the agent in Go.

Users of native packages and static builds should see no difference at all from this change.

Building the agent locally will now require a working Go toolchain supporting a particular minimum version of the
Go language (currently 1.21) if the Go plugin needs to be built. The plugin itself can still be disabled to avoid
this requirement, but this is not recommended.

The installer code will attempt to ensure that a sufficiently up-to-date Go toolchain is installed when installing
or updating the agent. If such a toolchain is not found, it will attempt to automatically install a copy of the
official toolchain from https://go.dev/dl/ in /usr/local/go. If that attempt fails, the Go plugin will be
DISABLED automatically at build time.
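
The decision flow described above (use a recent enough toolchain, otherwise try to install one, otherwise disable the plugin) can be sketched as follows. The function names and return strings are hypothetical, and the parsing assumes the usual `go version` output format; the installer's actual script is not reproduced here.

```python
import re
from typing import Optional, Tuple

MIN_GO = (1, 21)  # minimum Go version stated in these notes


def parse_go_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'go version go1.22.0 linux/amd64'."""
    match = re.search(r"\bgo(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def plan_go_plugin(go_version_output: Optional[str], download_ok: bool) -> str:
    """Decide how the Go plugin gets built, following the three cases in the text."""
    version = parse_go_version(go_version_output) if go_version_output else None
    if version is not None and version >= MIN_GO:
        return "build with system toolchain"
    if download_ok:
        return "build with toolchain installed in /usr/local/go"
    return "disable Go plugin"


print(plan_go_plugin("go version go1.22.0 linux/amd64", download_ok=False))
print(plan_go_plugin("go version go1.20.4 linux/amd64", download_ok=True))
print(plan_go_plugin(None, download_ok=False))
```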

Acknowledgments

@candlerb for improving robustness of netdata-updater.sh.
@carrychair for removing unnecessary repetition of words in documentation.
@luisj1983 for adding "Backing up a Netdata Agent" documentation.
@moschlar for fixing --distro-override parameter name in kickstart documentation.
@pschaer for correcting instructions on creating a startup script in the "Install Netdata on Synology" guide.
@sepek for fixing description of "chart labels" in "Configure alerts".

Contributions

Collectors

Add macOS support for collecting resource usage of processes (apps.plugin) (#17180, @ktsaou)
Improve identification of applications in docker service discovery (go.d.plugin) (#17174, @ilyam8)
Execute local-listeners periodically rather than just once at startup (go.d.plugin) (#17160, @ilyam8)
Add service discovery for applications running inside Docker containers (go.d.plugin) (#17152, @ilyam8)
Implement dynamic configuration for configuring data collection jobs (go.d.plugin) (#17064, @ilyam8)
Update message IDs for systemd and dbus (systemd-journal.plugin) (#16987, @ktsaou)
Report EDAC ECC errors as total counts since boot instead of rates (proc/sys_devices_system_edac_mc) (#16970, @ilyam8)
Add aggregated view (network-viewer.plugin) (#16940, @ktsaou)
Add filtering by username (network-viewer.plugin) (#16911, @ktsaou)
Add Network Viewer plugin (#16872, @ktsaou)
Add CPU throttling % column to the containers-vms function (cgroups.plugin) (#16800, @ilyam8)
Add the ndsudo binary, a helper tool for assisting in the execution of privileged commands (#16614, @ktsaou)
Disable CPU per core metrics by default (proc.plugin) (#16572, @ilyam8)

Fix incorrect family value of the ZFS ZPool state chart (proc/proc_spl_kstat_zfs) (#17054, @ilyam8)
Fix race conditions (diskspace.plugin) (#16786, @ktsaou)
Fix use of allocated memory after it has been freed (diskspace.plugin) (#16784, @ktsaou)
Fix priority of per-core CPU charts (proc/proc_stat) (#16749, @ilyam8)
Fix missing CPU frequency chart (proc/proc_stat) (#16732, @ilyam8)
Fix an issue where cgroup_check_for_new_every was incorrectly multiplied by update_every (cgroups.plugin) (#16719, @ilyam8)

Add mongodb-community-server image to docker service discovery configuration (go.d.plugin) (#17173, @ilyam8)
Add an option to disable service discovery (go.d.plugin) (#17171, @ilyam8)
Allow array/object to be null in JSON schemas (go.d.plugin) (#17166, @ilyam8)
Update file path pattern in jsonschema (go.d.plugin) (#17164, @ilyam8)
Add support for multi-config templates in the service discovery configuration (go.d.plugin) (#17157, @ilyam8)
Improve go.d.plugin dyncfg config schemas (#17124, @ilyam8)
Fix incorrect chart priority for discovered configs (go.d.plugin) (#17115, @ilyam8)
Add notice log level (go.d.plugin) (#17112, @ilyam8)
Fix pulsar tests (go.d/pulsar) (#17093, @ilyam8)
Set max chart id length to 1200 (go.d.plugin) (#17062, @ilyam8)
Improve aggregated view (network-viewer.plugin) (#16960, @ktsaou)
Show unknown container (network-viewer.plugin) (#16900, @ktsaou)
Reorganise code to prepare for functions (ebpf.plugin) (#16788, @thiagoftsm)
Fix missing aral_freez call (eBPF) (#16765, @thiagoftsm)
Cleanup network devices rename (proc/proc_net_dev) (#16745, @ktsaou)
Improve ebpf-socket function column names (ebpf.plugin) (#16727, @ilyam8)
Add double-linked network interfaces collection delay (#16701, @ilyam8)
Cleanup code and improve reliability (ebpf.plugin) (#16669, @thiagoftsm)
Update to create a separate chart for each systemd service rather than a chart dimension (ebpf.plugin) (#16630, @thiagoftsm)
Include 'lxcfs.service/.control' in the list of filtered cgroups (cgroups.plugin) (#16620, @ilyam8)
Exit if unable to locate journal data directories (systemd-journal.plugin) (#16592, @ilyam8)

Health

Remove deprecated alert fields from stock alarms (#17113, @ilyam8)
Fix filtering by severity for gotify notifications (#17069, @ilyam8)
Remove deprecated alert fields: "charts", "os", "host", "plugin" and "module" (#17048, @ktsaou)
Add a new alert to notify about systemd timer units that have failed (#16845, @tkatsoulas)
Implement dynamically configured alerts (#16779, @ktsaou)
Add a new alert to detect unexpected HTTP headers (#16736, @ilyam8)

Packaging/Installation

Fix installing incorrect systemd service files in native deb installations (#17159, @tkatsoulas)
Add a macOS build check in the CI pipeline (#17139, @tkatsoulas)
Remove the "nut" package suggestion in native deb installations (#17129, @ilyam8)
Fix incorrect ownership of cups.plugin in native deb installations (#17087, @tkatsoulas)
Fix incorrect ownership of ioping.plugin in native deb installations (#17086, @tkatsoulas)
Fix an issue where the ebpf.plugin was recommended for unsupported architectures (#17085, @tkatsoulas)
Fix pre/post install script names for the network-viewer plugin in native deb installations (#17084, @tkatsoulas)
Improve message in kickstart if a static build can’t be found (#17081, @Ferroin)
Fix determining repo root in Coverity scan script (#17024, @Ferroin)
More concretely utilize local modules in our CMake code (#17022, @Ferroin)
Update eBPF packages (#17012, @thiagoftsm)
Integrate Go plugin with build system (#17005, @Ferroin)
Bump the version of the installed Go toolchain to 1.22.0 (#17004, @Ferroin)
Include Go plugin sources in main repository (#16997, @Ferroin)
Move CO-RE headers (integration between eBPF and Network Viewer) (#16978, @thiagoftsm)
Use C++14 by default when building on systems that support it (#16972, @Ferroin)
Split network viewer plugin to its own package (#16949, @Ferroin)
Fix "Fluent-Bit" installation (#16924, @ilyam8)
Build network-viewer only on Linux (#16910, @vkalintiris)
Set build type to release with debug information, ensuring optimized builds (#16889, @vkalintiris)
Add ARMv6 static builds (#16853, @Ferroin)
Fix issue with fetching the latest tag on macOS in kickstart.sh (#16844, @ilyam8)
Make the kickstart checksum placeholder value more specific (#16843, @tkatsoulas)
Fix directory handling in Go toolchain handling script (#16828, @Ferroin)
Add script to ensure a usable Go toolchain is installed (#16815, @Ferroin)
Apply ASCII-based comparisons to commands in kickstart script that rely on a particular language setting (#16806, @tkatsoulas)
Remove help text that no longer applies from netdata-installer.sh (#16805, @vkalintiris)
Fix incorrect major version check in updater (#16803, @Ferroin)
Change default build directory in installer to build (#16768, @Ferroin)
Add cap_dac_read_search capability to go.d.plugin (#16754, @ilyam8)
Improve removal of the netdata user from groups in uninstaller.sh (#16742, @ilyam8)
Update the default netdata.conf configuration file used for native packages (#16734, @ilyam8)
Fix handling of hardening flags with Clang (#16731, @Ferroin)
Disable logs-management plugin when installing on macOS (#16708, @ilyam8)
Remove Ubuntu 23.04 from the CI (#16694, @tkatsoulas)
Fix enable/disable options for Prometheus remote write in netdata-installer.sh (#16690, @tkatsoulas)
Disable logs-management plugin when building static packages (#16684, @ilyam8)
Fix an issue where install-required-packages.sh wouldn't run due to elevated privileges on macOS (#16675, @ilyam8)
Remove unused contrib/rhel directory (#16672, @ilyam8)
Update eBPF packages (#16671, @thiagoftsm)
Add extra build flags to CMakeLists.txt (#16641, @Ferroin)
Assorted cleanup of native packaging code (#16640, @Ferroin)
Make web directory configurable through CMake variables (#16638, @ilyam8)
Improve handling of basic permissions for most scripts on install (#16629, @Ferroin)
Improve robustness of netdata-updater.sh (#16613, @candlerb)
Remove v1 dashboard version check from installer.sh (#16603, @ilyam8)
Fix "target_ram" calculation in netdata-installer.sh (#16602, @ilyam8)
Improve enable_feature function in installer.sh (#16601, @ilyam8)
Allow passing cmake options with NETDATA_CMAKE_OPTIONS (#16598, @vkalintiris)
Add Alpine Linux 3.19 to CI (#16579, @Ferroin)
Remove Netdata packages from APT cache when attempting to install (#16566, @Ferroin)
Assorted kickstart script fixes (#16537, @Ferroin)
Remove openSUSE 15.4 from CI (#16449, @tkatsoulas)
Remove Fedora 37 from CI (#16422, @tkatsoulas)
Add CMake build system (#15996, @vkalintiris)
Add check to avoid auto-installing new major versions of Netdata (#15898, @Ferroin)
Improve support running the Docker entrypoint code as a non-root user (#15118, @Ferroin)
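
One entry in this list (#15898) adds a check so that the updater does not automatically install a new major version. A minimal sketch of such a check follows, assuming release tags of the form `v1.45.5`; the function names are hypothetical and the updater's real implementation is a shell script not shown here.

```python
def major_version(tag: str) -> int:
    """Return the major component of a tag such as 'v1.45.5'."""
    return int(tag.lstrip("v").split(".")[0])


def allow_auto_update(current: str, candidate: str) -> bool:
    """Allow automatic updates unless the candidate is a newer major version."""
    return major_version(candidate) <= major_version(current)


print(allow_auto_update("v1.45.4", "v1.45.5"))
print(allow_auto_update("v1.45.5", "v2.0.0"))
```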

Documentation

Improve "Choose your Netdata Cloud theme" doc (#17172, @Ancairon)
Add instructions for monitoring NVIDIA GPUs to the Docker installation guide (#17167, @ilyam8)
Add documentation for the "Integration URL" field to PagerDuty Cloud integration doc (#17149, @juacker)
Bring back old docs that were containing missing information (#17146, @Ancairon)
Remove unnecessary repetition of words in docs (#17131, @carrychair)
Fix broken link in "Netdata Cloud On-Prem Installation" (#17118, @tkatsoulas)
Fix typos and improve wording in "Backing up a Netdata Agent" (#17117, @Ancairon)
Remove deprecated settings from "Configure alerts" (#17116, @ilyam8)
Fix broken links in go.d.plugin markdown files (#17108, @ilyam8)
Remove deprecated "foreach" from "Configure alerts" (#17106, @ilyam8)
Remove distributed-data-architecture.md (#17097, @Ancairon)
Fix broken links (#17095, @Ancairon)
Remove docs/netdata-security.md (#17094, @Ancairon)
Update "Plugin Functions Tables" docs (#17071, @car12o)
Update "Sizing Netdata Agents" doc (#17057, @ktsaou)
Fix links pointing to old go.d repo and update the integrations (#17040, @Ancairon)
Update links to Netdata Agent start-stop-restart docs (#17037, @Ancairon)
Include information on securing Netdata parent-child communication in "Configuring Metrics Centralization Points" (#17035, @Ancairon)
Restructure and update documentation (#17014, @Ancairon)
Add "Backing up a Netdata Agent" documentation (#17006, @luisj1983)
Correct instructions on creating a startup script in the "Install Netdata on Synology" guide (#16980, @pschaer)
Improve formatting in "How to optimize the Netdata Agent's performance" (#16925, @tkatsoulas)
Fix links to the energy efficiency screenshots in main readme file (#16904, @Aliki92)
Update "What's New and Coming?" in main readme file based on plans shared in Office Hours (#16895, @hugovalente-pm)
Improve readability of Webhook Cloud notification documentation (#16882, @juacker)
Remove deprecated db mode "save" from "Database" (#16864, @Ancairon)
Fix CNCF link (#16851, @hugovalente-pm)
Add documentation on how to configure MS Teams Cloud notifications (#16834, @papazach)
Added instructions on calculating replication history to "Streaming and Replication Reference" (#16816, @thiagoftsm)
Update provisioning instructions in "Netdata Cloud On-Prem Light PoC" (#16811, @M4itee)
Add information about the new node permissions to "Role-Based Access model" (#16791, @vkuznecovas)
Add missing settings to "Streaming and replication reference" (#16778, @thiagoftsm)
Fix instructions for setting up Telegram notifications (#16777, @thiagoftsm)
Updated the kickstart URL to https://get.netdata.cloud/kickstart.sh (#16738, @ilyam8)
Fix --distro-override parameter name in "Install Netdata with kickstart.sh" (#16726, @moschlar)
Add the Mobile App notification Integration (#16715, @sashwathn)
Add "Require Cloud" column to the functions table in "Netdata Functions" (#16681, @ilyam8)
Fix typos and improve wording in "Creating Alerts with Netdata Alerts Configuration Manager" (#16679, @Ancairon)
Fix description of "chart labels" in "Configure alerts" (#16656, @sepek)
Fix formatting in "Creating Alerts with Netdata Alerts Configuration Manager" (#16651, @Ancairon)
Add practical examples showcasing how to utilize journalctl for querying Netdata logs to "Netdata Logging" (#16650, @ilyam8)
Add "Creating Alerts with Netdata Alerts Configuration Manager" (#16642, @sashwathn)
Add instructions for installing Netdata in a rootless Docker environment (#16632, @ilyam8)
Add energy efficiency image to main readme file (#16617, @Aliki92)
Remove deprecated memory modes "map" and "save" (#16604, @vkalintiris)
Update Splunk icon to a dark version for improved visibility (#16593, @juacker)
Add documentation on how to configure Splunk Cloud notifications (#16586, @juacker)
Add a new document explaining Gorilla compression and decompression techniques (#16553, @vkalintiris)
Add an initial version of the "Plugin Functions Tables" documentation (#16535, @ktsaou)

Other Notable Changes

Change query label matching logic (#16827, @stelfrag)
Setup sentry-native SDK for reporting crashes (#16798, @vkalintiris)
Add netdata_os_info metric to Prometheus export (#16756, @colinleroy)
Rewrite and extend dynamic configuration (#16702, @ktsaou)
Track the progress of data queries (#16574, @ktsaou)

Fix a crash occurring when failing to create the requested number of tiers (#16999, @stelfrag)
Fix an issue where Netdata plugins could inherit unintended sockets or file descriptors during the forking process (#16881, @ktsaou)

Announce dynamic configuration capability to the Cloud (#17162, @stelfrag)
Fix a problem preventing the Agent from starting due to missing SOCK_CLOEXEC on macOS (#17151, @stelfrag)
Add a check to ensure that the duration is not negative when sending alert log to the Cloud (#17144, @stelfrag)
Add a check to detect self thread when exiting (#17126, @vkalintiris)
Fix health alert dyncfg schema fullPage option (#17125, @ilyam8)
Fix memory leak when freeing an array pattern (#17114, @stelfrag)
Remove unused go.d.plugin files (#17110, @ilyam8)
Improve cleanup of ephemeral hosts during agent startup (#17104, @stelfrag)
Reorganize and cleanup database related code (#17101, @stelfrag)
Fix ebpf compilation warnings (#17100, @stelfrag)
Make watcher thread wait for explicit steps (#17079, @vkalintiris)
Abort the agent if a single shutdown step takes more than 60 seconds (#17060, @vkalintiris)
Fix a typo in the ENABLE_SENTRY cmake variable (#17059, @vkalintiris)
Call the dyncfg interceptor when executing 'test' for a new job (#17052, @ktsaou)
Fix alerts jsonschema prototype for latest dyncfg (#17047, @ktsaou)
Protect type anomaly rate map with a spinlock (#17044, @vkalintiris)
Do not use backtrace when sentry is enabled (#17043, @vkalintiris)
Add metric and sample count into the api/v2/info response (#17042, @stelfrag)
Improved query target cleanup (#17038, @stelfrag)
Database and health code cleanup (#17036, @stelfrag)
Do not fetch retention on metric release (#17033, @stelfrag)
Increase RRD_ID_LENGTH_MAX to 1200 (#17028, @stelfrag)
Add support for deleting orphan configurations to dyncfg (#17023, @ktsaou)
Correctly mark protobuf as required in find_package (#17021, @Ferroin)
Protect metric release in dimension delete callback (#17020, @stelfrag)
Reorganize ebpf plugin code for network-viewer (#17018, @thiagoftsm)
Allow tree for individual IDs (#17017, @ktsaou)
Add watcher thread to report shutdown steps (#17010, @vkalintiris)
Fix testing new jobs in dyncfg (#17009, @ktsaou)
Abort on non-zero rc during exiting on sentry-enabled builds (#17008, @vkalintiris)
Misc improvements (#17001, @stelfrag)
Move diagrams/ under docs/ (#16998, @vkalintiris)
Small code cleanup (#16996, @vkalintiris)
Remove historical changelog and cppcheck (#16995, @vkalintiris)
Remove config macros that are always set (#16994, @vkalintiris)
Move web/ under src/ (#16992, @vkalintiris)
Add spinlock to protect metric release (#16989, @stelfrag)
Detect machine GUID change (#16979, @stelfrag)
Move collectors/ under src/ (#16965, @vkalintiris)
Improve agent shutdown (#16959, @stelfrag)
Add support for testing new jobs to dyncfg (#16958, @ktsaou)
Fix path in health integrations (#16956, @Ancairon)
Move health/ under src/ (#16954, @vkalintiris)
Do not declare struct meant for internal usage (#16951, @vkalintiris)
Remove cleanup_destroyed_dictionaries call during shutdown (#16944, @stelfrag)
Remove duplicate check (#16936, @stelfrag)
Move daemon/ under src/ (#16933, @vkalintiris)
Split dictionary into multiple files (#16920, @ktsaou)
Release label key if already in use (#16916, @stelfrag)
Add support for the info parameter to all external plugin functions (#16915, @ktsaou)
Move exporting/ under src/ (#16913, @vkalintiris)
Rename network functions (#16908, @ktsaou)
Assorted build-related changes (#16906, @vkalintiris)
Move fluent-bit & logsmanagement under src/ (#16903, @vkalintiris)
Update permissions map comment (#16902, @ktsaou)
Use spinlock for reference counting (#16901, @vkalintiris)
Move aclk/ under src/ (#16899, @vkalintiris)
Enable sentry sessions (#16898, @vkalintiris)
Do not cancel detection thread (#16897, @vkalintiris)
Create a top-level directory for the source code (#16896, @vkalintiris)
Remove tags field from RRD hosts (#16894, @vkalintiris)
Add support for using netlink when libmnl is available to local-sockets (#16893, @ktsaou)
Fix order of opening a file and checking its inode in local-sockets (#16887, @ktsaou)
Fix crash on query_progress initializer (#16885, @ktsaou)
Remove references to map and save modes in stream.conf (#16874, @vkalintiris)
Fix coverity issues (#16873, @stelfrag)
Add support for network namespaces to local-listeners (#16867, @ktsaou)
Fix coverity issue (#16866, @stelfrag)
Fix an issue where alerts were applied based on order instead of the matching chart criteria (#16862, @ktsaou)
Add support for sockets direction to local-listeners (#16861, @ktsaou)
Improve service thread shutdown (#16841, @stelfrag)
New Permissions System (#16837, @ktsaou)
Add brotli and libyaml to buildinfo (#16830, @ktsaou)
Add explicit callback types for readability (#16820, @vkalintiris)
Improve the robustness of host prefix verification (#16813, @ilyam8)
Move mqtt_websockets under aclk/ (#16804, @vkalintiris)
Add additional fail reason and source during database initialization (#16794, @stelfrag)
Use original summary for alert transition (#16793, @stelfrag)
Free key and search, replace patterns (#16789, @stelfrag)
Use named constants for keyword tokens (#16787, @vkalintiris)
Remove h2o header from libnetdata (#16780, @vkalintiris)
Delete unused variable (#16776, @vkalintiris)
Use unsigned char for binary data in mqtt (#16775, @vkalintiris)
Fix compiler warnings (#16774, @vkalintiris)
Allow POST requests to be received from ACLK (#16770, @ktsaou)
Keep transaction id of request headers (#16769, @ktsaou)
Improvements for /api/v1/config tree and swagger documentation (#16764, @ktsaou)
Fix compiler warnings (#16763, @ktsaou)
Fix cmake _GNU_SOURCE warnings (#16761, @ktsaou)
Fix sanitizer errors (#16759, @ktsaou)
Report timestamps with progress (#16758, @ktsaou)
Add schemas to /usr/lib/netdata/conf.d/schema.d (#16757, @ktsaou)
Recursively merge mqtt_websockets (#16755, @vkalintiris)
Name storage engine variables consistently (#16753, @vkalintiris)
Address sanitizer through CMake and use it for unit tests (#16748, @vkalintiris)
Improve the error message when accessing functions (#16692, @ktsaou)
Fix linking issues for log2journal and netdatacli against libnetdata (#16688, @ktsaou)
Fatal relaxation of unknown page types (#16682, @vkalintiris)
Fix cmake missing defines (#16680, @ktsaou)
Set log level of the too-old-data message to debug (#16663, @ilyam8)
Improve context load (#16659, @stelfrag)
Fix compilation error when using --disable-dbengine (#16645, @stelfrag)
Remove code relying on autotools (#16634, @vkalintiris)
Fix UB of unaligned loads/stores and signed shifts (#16628, @vkalintiris)
Fix coverity issues, logically dead code and error checking (#16618, @stelfrag)
Fix small coverity issue (#16616, @stelfrag)
Remove CPack stuff from CMake (#16608, @vkalintiris)
Remove includes outside libnetdata (#16607, @vkalintiris)
Remove build/ (#16600, @vkalintiris)
Cleanup am files (#16597, @vkalintiris)
Handle coverity issues related to Y2K38_SAFETY (#16583, @stelfrag)
Update naming for swagger api (#16564, @tkatsoulas)
Code cleanup (#16542, @ktsaou)

Deprecation notice

Changed in this release

All deprecated items from the v1.44.0 notice have been addressed except for enabling gorilla compression by default.

Additionally, the following Alert options have been deprecated in this release. While Netdata will still understand these options when
reading existing alert configurations for now, we recommend updating your custom alert configurations to use the
replacements listed below. Compatibility with these deprecated options might be removed in a future release.

Option	Use instead
`foreach DIMENSIONS` (`lookup` line)	-
`charts`	-
`os`	`host labels: _os=X`
`host`	`host labels: _hostname=X`
`plugin`	`chart labels: _collect_plugin=X`
`module`	`chart labels: _collect_module=X`

Where X is a simple pattern.
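
As an illustration of the table above, here is a minimal sketch of how the one-to-one replacements could be applied to custom alert configuration lines. The helper `migrate_line` and the `REPLACEMENTS` mapping are hypothetical names and not part of Netdata. The sketch does not handle `foreach DIMENSIONS` or `charts` (which have no replacement), and it does not merge several resulting `host labels` or `chart labels` lines into one.

```python
# Hypothetical helper, not part of Netdata: rewrites deprecated alert options
# to the replacements listed in the deprecation table.
REPLACEMENTS = {
    "os": ("host labels", "_os"),
    "host": ("host labels", "_hostname"),
    "plugin": ("chart labels", "_collect_plugin"),
    "module": ("chart labels", "_collect_module"),
}


def migrate_line(line: str) -> str:
    """Rewrite one 'key: value' alert line; return any other line unchanged."""
    key, sep, value = line.partition(":")
    name = key.strip()
    if not sep or name not in REPLACEMENTS:
        return line
    option, label = REPLACEMENTS[name]
    indent = line[: len(line) - len(line.lstrip())]  # keep original indentation
    return f"{indent}{option}: {label}={value.strip()}"


if __name__ == "__main__":
    for original in ["      os: linux", "  plugin: proc", "   every: 10s"]:
        print(migrate_line(original))
```

Review the rewritten lines by hand before using them, since X is a simple pattern and may need adjusting.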

Netdata Release Meetup

Join the Netdata team on the 25th of March at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

Release Highlights.
Acknowledgments.
Q&A with the community.

RSVP now - we look forward to meeting you.

Support options

Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!

netdata - v1.44.3

Published by netdatabot 8 months ago

Netdata v1.44.3 is a patch release to address issues discovered since v1.44.2.

This patch release provides the following bug fixes and updates:

Improved handling of slow queries and CPU usage of the ACLKSYNC thread. (#16838, @stelfrag)
Improved error handling for listen bind failures. Instead of terminating fatally, Netdata now exits gracefully. (#16937, @stelfrag)
Fixed invalid alert durations in health log entries. (#16931, @stelfrag)
Fixed a race condition during analytics data setup, preventing potential Netdata crashes. (#16929, @stelfrag)
The Netdata base image includes Debian backports for comprehensive security and stability. (netdata/helper-images#271, @tkatsoulas)

Support options

Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1900 engineers are already using it!

netdata - v1.44.2

Published by netdatabot 9 months ago

Netdata v1.44.2 is a patch release to address issues discovered since v1.44.1.

This patch release provides the following bug fixes and updates:

Fixed an inconsistency where the NETDATA_LOG_LEVEL environment variable did not affect log level in Docker containers. (#16943, @ilyam8)
Fixed inconsistent log severity across sources: log severity level settings now work for all Netdata log sources (daemon, collector, health, access, aclk). (#16922, @ilyam8)
Fixed a bug in charts.d.plugin that prevented loading of its modules configuration files. (#16939, @ilyam8)
Fixed inaccurate server type identification in Netdata Cloud for FreeBSD jails. Jails are now recognized correctly. (#16858, @ilyam8)
Fixed a bug that prevented the edit-config script from running correctly in Podman containers. The script now accurately identifies container environments. (#16825, @Ferroin)
Fixed a bug that caused excessive logging of "Using host prefix directory" messages. (#16814, @ilyam8)
Fixed incorrect label source for apps.plugin charts, ensuring they are now accessible when querying Prometheus metrics. (#16810, @boxjan)
Fixed a bug in the cgroups.plugin that could lead to crashes. Additionally, addressed incorrect thread name during fatal Agent exits. (#16771, @ktsaou)
Fixed a race condition related to pthread_detach() calls, preventing potential Netdata crashes during thread creation. (#16760, @ktsaou)
Fixed a bug that caused "maximum number of cgroups reached" messages to spam logs. (#16730, @ilyam8)
Fixed incorrect service file location during macOS installation: now, launchctl commands can reliably start and stop Netdata. (#16693, @ilyam8)
Fixed a bug that caused the Netdata claiming process to fail on macOS due to an inaccessible netdata-claim.sh script. (#16686, @ilyam8)
Fixed missing host label streaming from child nodes: host labels are now transmitted reliably to parent nodes. (#16821, @stelfrag)
Fixed a bug in clock resolution calculation that prevented some data collection plugins from working correctly. (#16720, @ktsaou)
Fixed a bug that caused Netdata to crash when calculating database size due to missing or single datafiles. (#16699, @ktsaou)
Fixed a bug that caused the cups.plugin to not terminate upon receiving a SIGPIPE (Broken Pipe) signal. (#16691, @ilyam8)
Fixed a reference counting issue that could lead to Netdata crashes. (#16687, @ktsaou)
Fixed charts context and family definitions of exporting engine. (#16683, @ilyam8)
Fixed a bug that could cause crashes when processing web requests. (#16664, @ktsaou)
Fixed improper handling of the dbengine event loop during shutdown. (#16658, @stelfrag)
Fixed a potential memory corruption issue in database code. (#16654, @stelfrag)
Fixed "response too big" error for Systemd-journal: addressed limitations by raising the maximum web response size. (#16649, @ktsaou)
Fixed compilation issues with --disable-dbengine: addressed errors that prevented successful builds when this flag was used. (#16611, @stelfrag)
Fixed labels corruption due to duplicate key/value pairs. Additionally, addressed logging errors that occurred during fatal Agent exits. (commit, @ktsaou)
Updated go.d.plugin to v0.58.0. (#16725, @ilyam8)

Acknowledgements

We would like to thank our dedicated, talented contributors who make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

@boxjan for fixing incorrect label source for apps.plugin charts.

Support options

Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1900 engineers are already using it!

netdata - v1.44.1

Published by netdatabot 10 months ago

Netdata v1.44.1 is a patch release to address issues discovered since v1.44.0.

This patch release provides the following bug fixes and updates:

Fixed an issue in the uninstall script that prevented log2journal and systemd-cat-native from being removed (#16585, @ilyam8).
Fixed a bug that caused the debugfs.plugin to not terminate upon receiving a SIGPIPE (Broken Pipe) signal (#16569, @ilyam8).
Fixed memory leak during host chart label cleanup (#16568, @stelfrag).
Fixed incorrect CPU architecture/RAM/disk values in build info (#16567, @ilyam8).
Fixed a bug that prevented the parent from accepting streaming connections on systems with one CPU core (#16565, @stelfrag).
Made systemd-journal a mandatory package on CentOS 7 and Amazon Linux 2 (#16562, @tkatsoulas).
Fixed crash on reading memory clock speed of an AMD graphics card (#16561, @MrZammler).
Fixed an unhandled error that occurred when setting file capabilities in the Debian postinst script of the perf.plugin (#16558, @tkatsoulas).
Fixed an issue where the user's netdata home directory was set to an incorrect value (#16548, @ilyam8).
Added a lightweight text editor to the Docker image (#254, @tkatsoulas).

Support options

Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1700 engineers are already using it!

netdata - v1.44.0

Published by netdatabot 11 months ago

True to our schedule, this is another great Netdata release!

[!IMPORTANT]
Stay informed about upcoming changes and potential deprecations by reviewing the deprecation notice sections. This will help you plan for any necessary adjustments to ensure a smooth transition.

Netdata Growth

66k+ GitHub Stars ⭐
Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ too, on GitHub!
600M+ Docker Hub pulls
Netdata sees about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

Netdata beats Prometheus in all aspects: this version of Netdata includes significant improvements that make Netdata a lot more performant than Prometheus at scale. Full performance analysis included.
Netdata Journal Logs: Netdata can now deal with huge systemd-journal databases and is available for the host logs when Netdata runs in a container.
First beta version of Netdata's log2journal: a utility to extract, convert, transform and send to systemd-journal any kind of structured logs (including JSON and logfmt logs), similar to what promtail does for Loki.
More Netdata Functions: monitor containers and VMs, network interfaces, mount points, block devices, systemd units, systemd services, and more!
Netdata now logs to journal instead of log files and the results are amazing!

Release Highlights

Netdata beats Prometheus in all aspects

We tested Netdata and Prometheus at scale, both ingesting 2.7 million metrics per second. On the same workload, compared to Prometheus, Netdata needs:

35% less CPU
49% less RAM
12% less bandwidth
75% less disk space
98% less disk I/O

Read the full performance comparison between Netdata and Prometheus.

To achieve these astonishing results, we made the following changes to Netdata since the previous release:

New `SLOTS` streaming protocol

A new streaming protocol allows Netdata children and parents to share a common index of the streamed metrics, so parents can receive metrics without consulting hashtables. This reduces the overall overhead on parents by about 30%, without increasing the overhead on children (the children just number each metric).

The new protocol, called SLOTS, is automatically selected when both the child and the parent support it.
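
The idea of a shared index can be sketched as follows. This is a simplified illustration and not Netdata's actual wire format or implementation: the `Child` and `Parent` classes and the `DEFINE`/`VALUE` message names are hypothetical. The child assigns a number (slot) to each metric the first time it is sent, and the parent then updates values by list position instead of looking up the metric by name.

```python
class Child:
    """Numbers each metric once and afterwards sends only the slot number."""

    def __init__(self):
        # Stand-in for storing the slot number on the metric object itself.
        self.slots = {}

    def encode(self, metric_id, value):
        slot = self.slots.get(metric_id)
        if slot is None:
            slot = len(self.slots)
            self.slots[metric_id] = slot
            return ("DEFINE", slot, metric_id, value)  # first time: id + slot
        return ("VALUE", slot, value)  # afterwards: slot only


class Parent:
    """Keeps metrics in a list indexed by slot, so updates need no hashing."""

    def __init__(self):
        self.by_slot = []  # slot number -> [metric_id, last value]

    def receive(self, message):
        if message[0] == "DEFINE":
            _, slot, metric_id, value = message
            if slot != len(self.by_slot):
                raise ValueError("slots must be defined in order")
            self.by_slot.append([metric_id, value])
        else:
            _, slot, value = message
            self.by_slot[slot][1] = value  # direct index, no name lookup


if __name__ == "__main__":
    child, parent = Child(), Parent()
    for metric_id, value in [("system.cpu", 1), ("system.ram", 2), ("system.cpu", 3)]:
        parent.receive(child.encode(metric_id, value))
    print(parent.by_slot)
```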

Streaming compression algorithms

Streaming now supports multiple compression algorithms. Previous Netdata releases supported only LZ4, which is known for its speed and average compression ratio. This release adds support for ZSTD, GZIP, and BROTLI.

ZSTD provides the best balance between compression ratio and CPU consumption, and therefore it is now the default.

The selection order of the compression algorithms can be configured on parents, in the [API] section of stream.conf, by setting compression algorithms order = zstd lz4 brotli gzip.

To save the most bandwidth at the expense of CPU utilization, set this so that brotli or gzip appear first in the list, before zstd and lz4.

This also means that parents can now have a different compression order for each API key, allowing the use of different API keys based on the location of the child (e.g. children that are on billable egress bandwidth can use an API key that prefers the best compression, like brotli and gzip, while children on non-billable egress bandwidth can use an API key that prefers the best CPU utilization, like zstd or lz4).
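
A minimal sketch of how such an ordered preference could resolve to one algorithm per connection. It assumes the parent picks the first entry of its configured order that the child also supports. The release notes do not spell out the negotiation, so treat this rule and the function name `pick_compression` as assumptions.

```python
def pick_compression(parent_order, child_supported):
    """Return the first algorithm in the parent's order the child supports."""
    supported = set(child_supported)
    for algorithm in parent_order:
        if algorithm in supported:
            return algorithm
    return None  # no common algorithm: assume the stream stays uncompressed


if __name__ == "__main__":
    default_order = "zstd lz4 brotli gzip".split()
    bandwidth_first = "brotli gzip zstd lz4".split()
    older_child = ["lz4", "gzip"]  # hypothetical child without zstd/brotli
    print(pick_compression(default_order, older_child))
    print(pick_compression(bandwidth_first, older_child))
```

With a per-API-key order, the same child would get a different algorithm depending on which key it connects with.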

Gorilla compression beta

Gorilla compression is a time series data compression technique, developed by Facebook for their time series database, Gorilla. It's particularly efficient for compressing data that changes incrementally over time, which is a common characteristic of time series data.
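
To see why incrementally changing data compresses well, here is a small sketch of the XOR step at the heart of the Gorilla technique for values: each value is XORed with the previous one, so identical or similar consecutive values leave very few meaningful bits to store. This illustrates the general idea on 64-bit floats only. It is not Netdata's adaptation, and the helper names are hypothetical.

```python
import struct


def float_bits(value: float) -> int:
    """Return the IEEE 754 double representation of value as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_stream(values):
    """XOR each value's bits with the previous value's bits (first vs 0)."""
    previous = 0
    result = []
    for value in values:
        bits = float_bits(value)
        result.append(bits ^ previous)
        previous = bits
    return result


def meaningful_bits(xored: int) -> int:
    """Count the bits between the first and last set bit (0 if identical)."""
    if xored == 0:
        return 0
    trailing_zeros = (xored & -xored).bit_length() - 1
    return xored.bit_length() - trailing_zeros


if __name__ == "__main__":
    samples = [12.0, 12.0, 24.0, 24.5]
    for sample, xored in zip(samples, xor_stream(samples)):
        print(sample, meaningful_bits(xored))
```

An encoder then only needs to store the position and length of the meaningful bits, which is where the space saving comes from.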

This release of Netdata includes an adaptation of Gorilla compression which, once enabled, provides 30% additional memory reduction to Netdata.

This was not ready when we compared Netdata and Prometheus, so the benefits of Gorilla compression weren't accounted for in the comparison. With Gorilla compression enabled, Netdata memory reduction is 70%+ compared to Prometheus.

To try Gorilla compression, edit netdata.conf and, in the [db] section, set dbengine page type = gorilla.

Keep in mind that enabling Gorilla compression changes the dbengine file format to Gorilla-compressed metrics. This version of Netdata can read Gorilla-compressed data from dbengine even if Gorilla compression is not enabled, but previous versions of Netdata cannot read it. So, enable Gorilla only if you don't plan to switch back to a previous version of Netdata.

Our plan is to have Gorilla compression enabled by default at the next release of Netdata.

systemd-journal logs

Our systemd-journal.plugin was already considerably faster (10x) than journalctl, but it was still slow when the journal database is huge (e.g. at journal centralization points where hundreds or thousands of nodes push their logs).

In this release, we introduce several changes to allow the plugin to work promptly in such environments.

Sampling and estimations

The biggest performance issue with systemd-journal logs is the query performance when dealing with huge logs databases.

To overcome this performance issue and provide prompt responses to queries, Netdata now uses the following strategy:

The latest 500k log entries read from journal files work like before: we read all of them and all the values for all their fields, so that we can have accurate histograms and counters per field value at the filters.
Once we hit the 500k log entries limit on a single query, we turn on sampling and estimations.
Sampling distributes 500k more log entries to all the journal files to be read, so that the total log entries queried for their field values will be 1M. This means that if we have to read 100 files, 10k log entries per file will be sampled and 10k log entries more will be unsampled. Since files are usually spread over time, this provides a good sample across time.
When the sampling threshold is hit, Netdata continues reading more log entries without querying the values of the fields. These log entries appear as [unsampled] at the histogram. We know these log entries are there, but the value counters on the field filters do not include them.
When the [unsampled] threshold is hit, and we have read more than 1% of each file, Netdata estimates the number of entries that will be read from the file and skips the rest of it. This estimation appears as [estimated] in the histogram.
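
The per-file part of this strategy can be sketched with simple arithmetic. The sketch below is an illustration, not the plugin's code: it uses the 10k sampled and 10k unsampled entries per file from the 100-file example above, assumes the number of entries in a file is known up front (the plugin estimates it), and ignores the first 500k entries that are always read in full. The function name `scan_file` and its parameters are hypothetical.

```python
def scan_file(total_entries, sampled_budget=10_000, unsampled_budget=10_000,
              min_percent=1):
    """Split one file's entries into sampled, unsampled and estimated counts."""
    # Entries whose field values are read and counted in the filters.
    sampled = min(total_entries, sampled_budget)
    # Entries that are read but whose field values are not queried.
    unsampled = min(total_entries - sampled, unsampled_budget)
    read = sampled + unsampled
    # Keep reading (unsampled) until at least min_percent of the file is read.
    min_read = (total_entries * min_percent + 99) // 100  # integer ceiling
    if read < min_read:
        unsampled += min_read - read
        read = min_read
    # Everything left is skipped and shown as an estimation.
    return {
        "sampled": sampled,
        "unsampled": unsampled,
        "estimated": total_entries - read,
    }


if __name__ == "__main__":
    for size in (15_000, 1_000_000, 5_000_000):
        print(size, scan_file(size))
```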

The above process allows Netdata to provide a histogram of the logs in a timely manner, even when the number of log entries in the visible timeframe is in the tens of millions.

A similar process is usually used by log management systems, including Grafana Loki and Elasticsearch. However, Netdata takes a much bigger sample of the data (other systems usually sample only a few thousand log entries, while Netdata usually samples more than a million) and the visualization allows exposing the exact sampling and estimations made at the histogram.

Image showing [unsampled] and [estimated] on a systemd journal system that collects about 10k nginx log entries per second:

Read more about journals query performance.

journals scan

On busy logs centralization servers, the number of journal files available in /var/log/journal/remote can grow significantly, slowing down directory listing (even ls -l is very slow on them).

To overcome this issue, Netdata now uses inotify events and sorts the files to be scanned from the latest to the oldest.

These changes allow Netdata to present the logs user interface for the most recent journals, immediately after a Netdata restart, while the journals database is scanned in the background.
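
The ordering part can be sketched as below. This is only an illustration of scanning from the latest to the oldest file. It assumes the file modification time is the ordering key and that journal files end in `.journal`, and it leaves out the inotify handling entirely. The function name `newest_first` is hypothetical.

```python
import os


def newest_first(directory):
    """Return journal file paths in directory, most recently modified first."""
    entries = [
        entry
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(".journal")
    ]
    # Assumption: modification time decides which file is "latest".
    entries.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    return [entry.path for entry in entries]


if __name__ == "__main__":
    import sys

    for path in newest_first(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```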

Logs UI is now available when using Netdata docker images

We switched Netdata docker images from Alpine Linux to Debian, so that libsystemd will be available inside the docker image, allowing systemd-journal.plugin to be compiled and shipped with Netdata docker images.

Using Netdata docker images, Netdata can now query the host system journal files, while running inside the container.

MESSAGE_ID support

systemd-journal has a nice feature where certain events of common interest are given a specific MESSAGE_ID. Several such MESSAGE_IDs have been assigned to track common events, like coredumps, units start/stop events, VMs start/stop events, time changes, etc. In total, we found more than 50 unique events that are tracked this way.

This version of systemd-journal.plugin automatically tracks and annotates these MESSAGE_IDs using their names, allowing quick spotting of events of common interest.

This feature is available at the MESSAGE_ID field filter, at the right side of the dashboard.

`log2journal`, a new tool in your quiver for managing logs

log2journal is a new utility allowing the conversion of log files into structured systemd-journal log entries. This is currently in beta.

The utility allows processing logs like this:

tail -F /var/log/nginx/access.log |\
   log2journal -c nginx-combined |\
   systemd-cat-native

The above builds a basic pipeline for converting the access.log of an Nginx web server into structured log entries in the local systemd-journal.

tail is responsible for feeding the latest log lines to log2journal. Multiple files can be specified and log2journal can also pick up the filename from tail and add it as a field to the journal logs.
log2journal extracts fields from the log lines it is fed with. This is a powerful tool that can read json and logfmt logs, but also extract fields using PCRE2 patterns from any log. It supports filtering, renaming, and rewriting rules using command line arguments or yaml configuration files. The output of log2journal is the standard Journal Export Format.
systemd-cat-native is another new Netdata utility, reading standard Journal Export Format entries, which are then sent to a local or remote systemd-journal system.
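
For readers unfamiliar with the Journal Export Format that connects the last two stages, here is a minimal sketch of a serializer for one entry. It is not how log2journal is implemented. It covers the plain `FIELD=value` text form and the size-prefixed form used for values containing newlines, and the function name `to_journal_export` is hypothetical.

```python
import struct


def to_journal_export(fields: dict) -> bytes:
    """Serialize one log entry as a Journal Export Format record."""
    out = bytearray()
    for name, value in fields.items():
        key = name.encode("ascii")
        data = value.encode("utf-8") if isinstance(value, str) else bytes(value)
        if b"\n" in data:
            # Size-prefixed form: name, newline, 64-bit little-endian length,
            # the raw data, newline.
            out += key + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            # Text form: FIELD=value followed by a newline.
            out += key + b"=" + data + b"\n"
    out += b"\n"  # an empty line terminates the entry
    return bytes(out)


if __name__ == "__main__":
    entry = {"MESSAGE": "GET /index.html 200", "PRIORITY": "6"}
    print(to_journal_export(entry).decode("utf-8"), end="")
```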

Netdata now logs to systemd-journal

The logging layer of Netdata has been rewritten, so that Netdata logs now go to the systemd-journal, in a namespace called netdata.

The obvious outcome is that you can now monitor Netdata logs using Netdata's systemd-journal.plugin user interface and, thanks to journal namespaces, this does not pollute the system logs. But this is just the beginning...

Netdata utilizes the MESSAGE_ID feature of systemd-journal to register:

all alert transitions
all alert notifications
all connections from Netdata children
all connections to Netdata parents

This means that the systemd-journal.plugin user interface and journalctl can now be used to list all such events uniformly.

Screenshot of Netdata alert transitions in systemd-journals:

All Netdata logs are now structured. Netdata can also log in json or logfmt formats. We introduced a lot of new fields to track every aspect of Netdata, in a uniform and consistent way. Read more here.

Furthermore, we introduced a new tool called systemd-cat-native allowing any application or shell script to send structured logs to systemd-journal. Read more here.

Functions, power up your troubleshooting toolkit!

Several new Functions have been added to help us in our troubleshooting journeys. On top of processes, streaming and systemd-journal, we are leveraging the wide range of collectors and metrics Netdata has to bring data in a different visual representation.

The updated list can be found in our documentation here. Below is a summary of the currently available Functions, each with the CLI tools it is an alternative to:

Function	Description	Alternative to CLI tools	plugin - module
block-devices	Disk I/O activity for all block devices, offering insights into both data transfer volume and operation performance.	`iostat`	proc
containers-vms	Insights into the resource utilization of containers and QEMU virtual machines: CPU usage, memory consumption, disk I/O, and network traffic.	`docker stats`, `systemd-cgtop`	cgroups
ipmi-sensors	Readings and status of IPMI sensors.	`ipmi-sensors`	freeipmi
mount-points	Disk usage for each mount point, including used and available space, both in terms of percentage and actual bytes, as well as used and available inode counts.	`df`	diskspace
network interfaces	Network traffic, packet drop rates, interface states, MTU, speed, and duplex mode for all network interfaces.	`bmon`, `bwm-ng`	proc
processes	Real-time information about the system's resource usage, including CPU utilization, memory consumption, and disk IO for every running process.	`top`, `htop`	apps
systemd-journal	Viewing, exploring and analyzing systemd journal logs.	`journalctl`	systemd-journal
systemd-list-units	Information about all systemd units, including their active state, description, whether or not they are enabled, and more.	`systemctl list-units`	systemd-journal
systemd-services	System resource utilization for all running systemd services: CPU, memory, and disk IO.	`systemd-cgtop`	cgroups
streaming	Comprehensive overview of all Netdata children instances, offering detailed information about their status, replication completion time, and many more.

In the short term, we will keep adding more (hopefully) helpful Functions. In the longer term, we plan to expand this functionality to potentially allow taking and storing snapshots of the results, based on triggered alerts or periodic configuration.

If you have suggestions, we have a running GitHub Discussion open here.

New Alert Notification Integrations to Netdata Cloud

We've been working on adding more Alert Notification Integrations to Netdata Cloud and recently added the following new ones:

Amazon Simple Notification Service (Amazon SNS), and
Telegram

The full list of Alert Notification Integrations from Netdata Cloud can be found in our documentation here.

Acknowledgments

@ClaraCrazy for improving degraded adapters detection in python.d/megacli.
@thomasbeaudry for adding UPS selftest and status metrics to charts.d/apcupsd.
@watsonbox for adding LBAs written/read metrics to python.d/smartd_log.
@sepek for correcting an error in the "Change how long Netdata stores metrics" guide.
@seniorquico for fixing parsing and adding MAINT status metrics to python.d/haproxy.
@luisj1983 for correcting errors in the Health API documentation.
@andyundso for improving apps plugin by adding Erlang in apps_groups.conf.
@vobruba-martin for adding various improvements to go.d/mysql.