datadog-agent

Main repository for Datadog Agent

APACHE-2.0 License

Stars
2.6K
Committers
551

datadog-agent - 7.53.0 Latest Release

Published by kacper-murzyn 5 months ago

Agent

Prelude

Release on: 2024-04-30

New Features

  • Support database-monitoring autodiscovery for Aurora cluster instances. Adds a new configuration listener to poll for a specific set of Aurora cluster IDs and then create a new database-monitoring supported check configuration for each endpoint. This allows for monitoring of endpoints that scale dynamically.
  • Add new core check orchestrator_ecs to collect running ECS tasks
  • APM stats now include an is_trace_root field to indicate if the stats are from the root span of a trace.
  • The cluster-agent now collects network policies from the cluster.
  • Enable 'host_benchmarks' by default when running the security-agent compliance module.
  • OTLP ingest now has a feature flag to identify top-level spans by span kind. This new logic can be enabled by adding enable_otlp_compute_top_level_by_span_kind in DD_APM_FEATURES.
    • With this new logic, root spans and spans with a server or consumer span.kind will be marked as top-level. Additionally, spans with a client or producer span.kind will have stats computed.
    • Enabling this feature flag may increase the number of spans that generate trace metrics, and may change which spans appear as top-level in Datadog.
  • Experimental: The process-agent checks (process, container, and process-discovery) can be run from the Core Agent in Linux. This feature can be toggled on by setting the process_config.run_in_core_agent.enabled flag to true in the datadog.yaml file. This feature is disabled by default.
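
The span-kind rule described for enable_otlp_compute_top_level_by_span_kind can be sketched in a few lines of Python. This is an illustration only, not the trace-agent's code: the Span class and both function names are hypothetical, and the sketch assumes that every top-level span also has stats computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span; not the trace-agent's real type."""
    span_id: int
    parent_id: Optional[int]  # None marks the root span of a trace
    kind: str                 # e.g. "server", "client", "internal"


def is_top_level(span: Span) -> bool:
    # Root spans and spans with a server or consumer kind are top-level.
    return span.parent_id is None or span.kind in {"server", "consumer"}


def computes_stats(span: Span) -> bool:
    # Assumption: top-level spans always get stats; client and producer
    # spans get stats in addition, without being marked top-level.
    return is_top_level(span) or span.kind in {"client", "producer"}
```

Under this rule, a non-root client span produces stats but is not top-level, which is why enabling the flag can increase the number of spans that generate trace metrics.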

Enhancement Notes

  • Add the container image and container lifecycle checks to the output of the Agent status command.
  • Add kubelet_core_check_enabled flag to Agent config to control whether the kubelet core check should be loaded.
  • Added LastSuccessfulTime to cronjob status payload.
  • Add a retry mechanism to Software Bill of Materials (SBOM) collection for container images. This will help to avoid intermittent failures during the collection process.
  • Add startup timestamp to the Agent metadata payload.
  • Agents are now built with Go 1.21.9.
  • Adds image repo digest string to the container payload when present
  • CWS: Add selftests report on Windows and platforms with no eBPF support.
  • CWS: Add visibility for cross container program executions on platforms with no eBPF support.
  • APM: Enable credit card obfuscation by default. There is a small chance that numbers resembling valid credit card numbers may be redacted; to disable this feature, use apm_config.obfuscation.credit_cards.enabled. Alternatively, detection can be made more accurate through Luhn checksum verification by using apm_config.obfuscation.credit_cards.luhn; however, this increases the performance penalty of the check.
  • logs_config.expected_tags_duration now works for journald logs.
  • [oracle] Adds oracle.can_query service check.
  • [oracle] Automatically fall back to deprecated Oracle integration mode if privileges are missing.
  • [oracle] Add service configuration parameter.
  • The connections check no longer relies on the process/container check as it can now fetch container data independently.
  • The performance of Remote Config has been significantly improved when large amounts of configurations are received.
  • Send ECS task lifecycle events in the container lifecycle check.
  • dbm: add new SQL obfuscation mode normalize_only to support normalizing SQL without obfuscating it. This mode is useful for customers who want to view unobfuscated SQL statements. By default, ObfuscationMode is set to obfuscate_and_normalize and every SQL statement is obfuscated and normalized.
  • USM: Handle the HTTP TRACE method.
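
The credit card obfuscation note above mentions Luhn checksum verification. As a minimal sketch of what such a check involves (the standard Luhn algorithm, not the Agent's own implementation; the function name luhn_valid is hypothetical):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces or dashes.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A digit string that merely looks like a card number usually fails this check, which is how the extra verification reduces false redactions at the cost of more work per candidate.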

Deprecation Notes

  • [oracle] Deprecating Oracle integration code. The functionality is fully implemented in the oracle-dbm check which is now renamed to oracle.

Bug Fixes

  • The windows_registry check can be run with the check sub-command.
  • CWS: Fix very rare event corruption.
  • Fixes issue where processes for ECS Fargate containers would sometimes not be associated with the correct container.
  • Fixed a bug in the Dual Shipping feature where events were not being emitted on endpoint recovery.
  • Fix issue with display_container_name being tagged as N/A when container_name information is available.
  • Fix a Windows process handle leak in the Process Agent, which was introduced in 7.52.0 when process_collection is enabled.
  • Fixes a bug where the tagger server did not properly handle a closed channel.
  • [oracle] Set the default for metric_prefix in custom_queries to oracle.
  • [oracle] Fix global_custom_queries bug.
  • [oracle] Adds the oracle.process.pga_maximum_memory metric for backward compatibility.
  • Stop sending systemd metrics when they are not set

Datadog Cluster Agent

Prelude

Released on: 2024-04-30
Pinned to datadog-agent v7.53.0: CHANGELOG.

New Features

  • APM library injection now works on EKS Fargate when the admission controller is configured to add an Agent sidecar in EKS Fargate.
  • Cluster Agent now supports activating Application Security Management, Code Vulnerabilities, and Software Composition Analysis via Helm charts.

Enhancement Notes

  • Add the mutation_webhook tag to admission_webhooks.webhooks_received and admission_webhooks.response_duration Cluster Agent telemetry.
  • When using the admission controller to inject an Agent sidecar on EKS Fargate, shareProcessNamespace is now set to true automatically. This is to ensure that the process collection feature works.
datadog-agent - 7.52.1

Published by kacper-murzyn 7 months ago

Agent

Prelude

Release on: 2024-04-04

Enhancement Notes

  • Add a check to the Windows installer to verify that the caller has the correct membership to install the Agent.
  • Ensure the metadata requests are delayed at Agent startup to reduce host tag delays.
datadog-agent - 7.52.0

Published by kacper-murzyn 7 months ago

Agent

Prelude

Release on: 2024-03-21

Upgrade Notes

  • To prevent misconfigurations, the Windows Datadog Agent installer now raises an error if the user account running the installer MSI is provided as the ddagentuser (DDAGENTUSER_NAME) account. If the account is a service account, such as LocalSystem or a gMSA account, no action is needed. If the account is a regular account, configure a different Datadog Agent service account.

New Features

  • Add device_type to the device metadata.
  • Attach host tags to metrics for expected_tags_duration amount of time.
  • APM stats will now include, if present, the Git commit SHA from traces (or container tags) and the image tag from container tags.
  • Creation of a new packageSigning component to collect Linux package signature information and improve the signature rotation process. More information can be found in the Datadog documentation on the 2024 Linux key rotation.
  • Adds support for span links in the trace agent. This field contains a list of causal relationships between spans and is only populated when v0.4 of the Trace API is used.
  • The Windows Agent now supports CWS for process and network threats.
  • CWS: Add chdir event to allow recent container escape detection.
  • CWS: [BETA] Add File Integrity Monitoring support on Windows, supporting both files and registry.
  • CWS: The Agent now automatically suppresses benign security events if they have already been reported for a particular container image.
  • Updating process agent discovery configuration to include a Data Scrubber for obfuscating sensitive information such as passwords, API keys, or tokens.
  • Add support for pinging network devices in the SNMP integration.
  • [oracle] Add oracle.locks.transaction_duration metric.
  • APM: Add support for Single Step Instrumentation remote configuration
  • Headless agent installation support on macOS 14 and later

Enhancement Notes

  • [DBM] Increase the DBM dbm-metrics-intake endpoint's defaultInputChanSize value to 500.
  • Add debug level logs when files are evicted from registry.json after their TTL expires.
  • Add the instance ID returned by the IMDSv2 metadata endpoint to the list of EC2 host aliases.
  • This change adds journald permissions to the flare in the logs_file_permissions.log file, in the form of either the journald directory or a specific file (if specified by the Agent journald configuration).
  • The Logs Agent now creates a file in the flare, called logs_file_permissions.log, which lists every file and that file's permissions that the Logs Agent can detect.
  • Add the SBOM check to the output of the Agent status command and the Agent flare.
  • Add the Software Bill of Materials (SBOM) for container images to the output of the flare command.
  • Add repo_digest to containerd ContainerImage to remove duplicate images in container images UI.
  • Agents are now built with Go 1.21.7.
  • Agents are now built with Go 1.21.8.
  • CWS: Improved coverage on platforms with no eBPF support.
  • CWS: Send context of variables in events.
  • Add DD_APM_DEBUGGER_DIAGNOSTICS_DD_URL, DD_APM_DEBUGGER_DIAGNOSTICS_API_KEY, and DD_APM_DEBUGGER_DIAGNOSTICS_ADDITIONAL_ENDPOINTS to allow sending Live Debugger / Dynamic Instrumentation diagnostic data to multiple intakes.
  • Added config that allows user to toggle on and off the collection of zombie processes in the Process Agent.
  • [oracle] Add ddagenthostname tag.
  • [oracle]: Add oracle.tablespace.maxsize metric.
  • OTLP ingest supports stable Java runtime metrics introduced in opentelemetry-java-instrumentation v2.0.0. OTLP ingest supports Kafka metrics mapping. This allows users of the JMX Receiver/JMX Metrics Gatherer and Kafka metrics receiver to have access to the OOTB Kafka Dashboard.
  • Modified the process check to populate process with the newly created field "ProcessContext"
  • Rename the kubelet_core check to kubelet and change the metrics prefix from kubernetes_core to kubernetes so that it can replace the Python kubelet check.
  • APM: Adds msgp_short_bytes reason for trace payloads dropped to distinguish them from EOF errors.
  • When getting resource tags from an ECS task with zero containers, print a warn log instead of error log.

Deprecation Notes

  • Removal of the pod check from the process agent. The current check will run from the core agent.
  • This release drops support for Red Hat Enterprise Linux 6 and its derivatives.
  • [oracle] Deprecate the configuration parameter instant_client and replace it with oracle_client.
  • Removed the system-probe configuration value data_streams_config.enabled and replaced it with service_monitoring_config.enable_kafka_monitoring. This also implies that the DsmEnabled field in the AgentConfiguration proto will consistently be set to false.

Bug Fixes

  • Upgrade dependencies for systemd core check. This silences excessive warning logs on systemd v252.
  • oracle: Fix wrong tablespace metrics.
  • APM: Stop dropping incoming OTel payloads when the processing channel is full and eliminate OOM issues in the trace agent and collector component in high load scenarios, making the OTel pipeline more reliable.
  • Fix dogstatsd-capture. Message PID was not set after the 7.50 release.
  • Fix a memory exception where the flare controller tries to stat a file that doesn't exist.
  • Fleet Automation filters in the Datadog UI now accurately reflect which products are enabled when deployed with the official DataDog Helm chart on Kubernetes.
  • Corrected a problem where the ignore_autodiscovery_tags parameter was not functioning correctly with pod annotations or autodiscovery version 2 (adv2) annotations. This fix ensures that when this parameter is set to true, autodiscovery tags are ignored as intended. Example:

    ad.datadoghq.com/redis.checks: |
      {
        "redisdb": {
          "ignore_autodiscovery_tags": true,
          "instances": [
            {
              "host": "%%host%%",
              "port": "6379"
            }
          ]
        }
      }

    Moving forward, configurations that attempt to use hybrid setups (combining adv2 for check specification while also employing adv1 for ignore_autodiscovery_tags) are no longer supported by default. Users should set the configuration parameter cluster_checks.support_hybrid_ignore_ad_tags to true to enable this behavior.

  • [oracle]: Add support for more Asian character sets.
  • Prevention of OOMs when collecting a large number of zombie processes.
  • Fixed race conditions caused by concurrent execution of etw.StartEtw() and etw.StopEtw() functions which may concurrently access and modify a global map.
  • Fix recent PR #22664 which in turn fixes a race condition in the ETW package. The previous PR introduced a minor error addressed in this PR.
  • [oracle] Add resource_manager configuration to conf.yaml.example.
  • [oracle] Fix multi-tagging bug.
  • Fixes a bug in OTLP ingest where empty histograms were not being sent to the backend in the distributions mode. Empty histograms are now mapped as if they had a single (min, max) bucket.
  • Scrub authentication bearer token of any size, even invalid, from integration configuration (when being printed through the checksconfig CLI command or other).
  • Empty UDS payloads no longer cause the DogStatsD server to close the socket.

Other Notes

  • The version of Python required for tooling in README matches that which the CI uses.

Datadog Cluster Agent

New Features

  • Add an Agent sidecar injection webhook to the cluster-agent Kubernetes admission controller. This new webhook adds the Agent as a sidecar container in application Pods when the environment requires it, for example on EKS Fargate.

Enhancement Notes

  • Introduces a new config option in the Cluster Agent to set the rebalance period when advanced dispatching is enabled: cluster_checks.rebalance_period. The default value is 10 min.

Bug Fixes

  • Fix an issue where the admission controller would remove the field restartPolicy from native sidecar containers, preventing pod creation on Kubernetes 1.29+.
  • Fix missing kube_api_version tag on HPA and VPA resources.
datadog-agent - 7.51.1

Published by kacper-murzyn 8 months ago

Agent

Prelude

Release on: 2024-02-29

New Features

  • Add the chdir event type to CWS.

Security Notes

  • Bump embedded Python version to 3.11.8 to address CVE-2023-5678 on Windows.

Bug Fixes

  • Fix a crash in the win32_event_log check that occurs when processing an event that has a missing publisher and no EventData.
datadog-agent - 7.51.0

Published by kacper-murzyn 8 months ago

Agent

Prelude

Release on: 2024-02-19

Upgrade Notes

  • The orchestrator check is moving from the Process Agent to the Core Agent. Any orchestrator configuration set on the Process Agent will need to be moved to the Core Agent. No other changes are required. If you need to go back to the old check, you can do so temporarily by manually setting the environment variable DD_ORCHESTRATOR_EXPLORER_RUN_ON_NODE_AGENT to false. The Process Agent pod check will be deprecated in the following release.
  • Upgrade the Python version from 3.9 to 3.11.

New Features

  • Add support for ARM64 SLES flavor of datadog-agent
  • Add support for multiple users when listening for SNMP traps.
  • Add check_delay metric in Agent telemetry
  • Add an ETW component for ETW tracing.
  • Add an ETW APM tracer component to forward .NET ETW events to the Tracer Agent.
  • DBM: Add configuration options to SQL obfuscator to customize the normalization of SQL statements:
    • KeepTrailingSemicolon - disable removing trailing semicolon. This option is only valid when ObfuscationMode is obfuscate_and_normalize.
    • KeepIdentifierQuotation - disable removing quotation marks around identifiers. This option is only valid when ObfuscationMode is obfuscate_and_normalize.
  • CWS: [BETA] early support based on ptrace for platforms with no eBPF support. Only processes and files are currently supported.
  • Add msodbcsql18 linux dependency needed for SQL Server to run in Docker Agent.
  • Add timestamps to the logs HTTP client
  • Add support for Oracle Active Data Guard.
  • Re-enable Aerospike in SUSE packages.
  • The Windows Registry integration can now send the registry values as logs.

Enhancement Notes

  • Updated the ntp check to support the default location of chrony.conf on Ubuntu (/etc/chrony/chrony.conf).

  • Agents are now built with Go 1.21.5.

  • CWS: Reloading the datadog-agent-sysprobe systemd service now reloads the runtime security policies.

  • CWS: Added ssdeep file hashing algorithm support.

  • USM will report the actual status code of the HTTP traffic, instead of reporting only the status code family (2xx, 3xx, etc.).

  • Improved performance of the activity sampling query on RDS and Oracle Cloud databases.

  • OTLP ingest log timestamps (i.e. '@timestamp') now include milliseconds.

  • Always report the following telemetry metrics about the retry queue capacity:

    • datadog.agent.retry_queue_duration.capacity_secs
    • datadog.agent.retry_queue_duration.bytes_per_sec
    • datadog.agent.retry_queue_duration.capacity_bytes
  • Support container metrics for kata containers using containerd.

  • System Probe can now expose its healthcheck on a dedicated HTTP port. The Kubernetes daemonset uses this by default on port 5558.

Deprecation Notes

  • The config value ipc_address is deprecated in favor of cmd_host.
  • service_monitoring_config.process_service_inference.enabled is deprecated and replaced by system_probe_config.process_service_inference.enabled.
  • service_monitoring_config.process_service_inference.use_windows_service_name is deprecated and replaced by system_probe_config.process_service_inference.use_windows_service_name.
  • Removes freetds and msodbcsql18 dependencies for py2.
  • Removes postgresql dependency after upgrading psycopg2 to v2.9 in integrations-core. psycopg2 now comes with pre-built wheel for arm architecture.
  • An error will now be logged if replace tags are used to change the Agent "env", since this could have negative side effects. At this time, an error is logged, but future versions may explicitly disallow this to avoid bugs. See https://docs.datadoghq.com/getting_started/tracing/#environment-name for instructions on setting the env, and https://github.com/DataDog/datadog-agent/issues/21253 for more details about this issue.
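
For configuration that is generated or templated, the renames in these deprecation notes can be applied mechanically. The sketch below works on an already-parsed configuration dictionary; the helper names are hypothetical, it does not describe how the Agent itself treats deprecated keys, and it keeps any value already set under the new name.

```python
from typing import Any, Dict, Optional

# Old dotted key -> replacement, taken from the deprecation notes above.
RENAMES = {
    "ipc_address": "cmd_host",
    "service_monitoring_config.process_service_inference.enabled":
        "system_probe_config.process_service_inference.enabled",
    "service_monitoring_config.process_service_inference.use_windows_service_name":
        "system_probe_config.process_service_inference.use_windows_service_name",
}


def _pop_path(cfg: Dict[str, Any], path: str) -> Optional[Any]:
    """Remove and return the value at a dotted path, or None if absent."""
    *parents, leaf = path.split(".")
    node: Any = cfg
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return None
    return node.pop(leaf, None)


def _set_path(cfg: Dict[str, Any], path: str, value: Any) -> None:
    """Set a dotted path, creating parents; keep an existing value."""
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node.setdefault(leaf, value)


def migrate(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Move deprecated keys to their replacements, in place."""
    for old, new in RENAMES.items():
        value = _pop_path(cfg, old)
        if value is not None:  # False is a real value and must be moved too
            _set_path(cfg, new, value)
    return cfg
```

Emptied parent sections are left in place; pruning them is omitted to keep the sketch short.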

Bug Fixes

  • CWS/CSPM: Fixes the hostname value attached to CWS and CSPM events, which in rare cases the security agent computed incorrectly.
  • Fix file_handle core check on Darwin by using sysctl system call.
  • Fix spikes for bandwidth usage metric when interface speed is auto-adjusted.
  • Fixes Agent startup script when enabling OOM Kill and TCP Queue Length checks to prevent crashes when restarting the container.
  • Fix a spewing error message ("DCA Client not initialized by main provider, cannot post heartbeat") in the cluster check runner log during CLC initialization.
  • Fixed Logs Agent additional endpoints to respect their logs_no_ssl setting.
  • [DBM] Add Oracle broken connection handling on Windows
  • Fix indentation in conf.yaml.example.
  • Bug fix for empty database names in query samples.
  • Bug fix for the Korean character set for Windows.
  • Fixing the issue with a Korean character set for Windows.
  • Fix missing sysmetrics, such as shared pool and library cache.
  • Bug fix for missing tags.
  • Fixed obfuscation error false positive when the access or filter predicates are empty.
  • Fix resource manager metrics collection bugs.
  • Pause containers from the Rancher image-mirror repository (rancher/mirrored-pause.*) are now excluded by default for containers and metrics collection.
  • Error messages from Go checks are now shown on the Agent GUI status page instead of UNKNOWN ERROR.

Other Notes

  • Update s6-overlay version used in Datadog Agent container images to v2.2.0.3
  • Added a warning when logs_no_ssl is set and dd_url contains an https prefix. logs_no_ssl will take precedence over the prefix in a future version.

Datadog Cluster Agent

Prelude

Released on: 2024-02-19
Pinned to datadog-agent v7.51.0: CHANGELOG.

New Features

  • Enable Horizontal Pod Autoscaler collection for the Orchestrator by default
  • Add isolate command to clusterchecks to make it easier to pinpoint a check that is causing high CPU/memory usage. The command can be run in the cluster agent with: datadog-cluster-agent clusterchecks isolate --checkID=<checkID>

Enhancement Notes

  • Enable CRD collection by default in the orchestrator check.

Bug Fixes

  • Fixes a bug that would trigger unnecessary APIServer List requests from the Cluster Agent or Cluster Checks Runner.
datadog-agent - 7.50.3

Published by kacper-murzyn 9 months ago

Prelude

Release on: 2024-01-11

Bug Fixes

  • Fix incorrect metadata about system-probe being sent to Inventory and Fleet Automation products.
datadog-agent - 7.50.2

Published by kacper-murzyn 10 months ago

Prelude

Release on: 2024-01-04

Enhancement Notes

  • Agents are now built with Go 1.20.12.

Bug Fixes

  • The CWS configuration parameter to enable anomaly detection is now working and taken into account by the Agent.
  • Fix issue introduced in 7.47 that allowed all users to start/stop the Windows Datadog Agent services. The Windows installer now, as in versions before 7.47, grants this permission explicitly to ddagentuser.
datadog-agent - 7.50.1

Published by kacper-murzyn 10 months ago

Prelude

Release on: 2023-12-21

Bug Fixes

  • Fixes a bug introduced in 7.50.0 that prevented DD_TAGS from being added to kubernetes_state.* metrics.
datadog-agent - 7.50.0

Published by kacper-murzyn 10 months ago

Agent

Prelude

Release on: 2023-12-19

Upgrade Notes

New Features

  • The orchestrator check is moving from the Process Agent to the Node Agent. In the next release, this new check will replace the current pod check in the Process Agent. You can start using this new check now by manually setting the environment variable DD_ORCHESTRATOR_EXPLORER_RUN_ON_NODE_AGENT to true.

  • Adds the following CPU manager metrics to the kubelet core check: kubernetes_core.kubelet.cpu_manager.pinning_errors_total, kubernetes_core.kubelet.cpu_manager.pinning_requests_total.

  • Add a diagnosis for connecting to the agent logs endpoints. This is accessible through the agent diagnose command.

  • Add FIPS mode support for Network Device Monitoring products

  • Added support for collecting Cloud Foundry container names without the Cluster Agent.

  • The Kubernetes State Metrics Core check now collects kubernetes_state.ingress.tls.

  • APM: Added a new endpoint tracer_flare/v1/. This endpoint acts as a proxy to forward HTTP POST requests from tracers to the serverless_flare endpoint. This allows tracer flares to be triggered via remote config and improves the support experience by automating the collection of logs.

  • CWS: Ability to send a signal to a process when a rule was triggered.

  • CWS: Add Kubernetes user session context to events, in particular the username, UID and groups of the user that ran the commands remotely.

  • Enable container image collection by default.

  • Enable container lifecycle events collection by default. This feature helps stopped containers to be cleaned from Datadog faster.

  • [netflow] Allow collecting configurable fields for Netflow V9/IPFIX

  • Add support for Oracle 12.1 and Oracle 11.

  • Add monitoring of Oracle ASM disk groups.

  • Add metrics for monitoring Oracle resource manager.

  • [corechecks/snmp] Load downloaded profiles

  • DBM: Add configuration option to SQL obfuscator to use go-sqllexer package to run SQL obfuscation and normalization

  • Support filtering metrics from endpoint and service checks based on namespace when the DD_CONTAINER_EXCLUDE_METRICS environment variable is set.

  • The Windows Event Log tailer saves its current position in an event log and resumes reading from that location when the Agent restarts. This allows the Agent to collect events created before the Agent starts.

Enhancement Notes

  • [corechecks/snmp] Support symbol modifiers for global metric tags and metadata tags.
  • Update the go-systemd package to the latest version (22.5.0).
  • Added default peer tags for APM stats aggregation which can be enabled through a new flag (peer_tags_aggregation).
  • Add a stop timeout to the Windows Agent services. If an Agent service does not cleanly stop within 15 seconds after receiving a stop command from the Service Control Manager, the service will hard stop. The timeout can be configured by setting the DD_WINDOWS_SERVICE_STOP_TIMEOUT_SECONDS environment variable. Agent stop timeouts are logged to the Windows Event Log and can be monitored and alerted on.
  • APM: OTLP: Add support for custom container tags via resource attributes prefixed by datadog.container.tag.*.
  • Agents are now built with Go 1.20.11.
  • CWS: Support for Ubuntu 23.10.
  • CWS: Reduce memory usage of ring buffer on machines with more than 64 CPU cores.
  • CSPM: Move away from libapt to run Debian packages compliance checks.
  • DBM: Bump the minimum version of the go-sqllexer library to 0.0.7 to support collecting stored procedure names.
  • Add subcommand diagnose show-metadata gohai for gohai data
  • Upgraded JMXFetch to 0.49.0 which adds some more telemetry and contains some small fixes.
  • Netflow now supports the datadog-agent status command, providing configuration information. Any configuration errors encountered will be listed.
  • Emit database_instance tag with the value host/cdb. The goal is to show each database separately in the DBM entry page. Currently, the backend initializes database_instance to host. Also, the Agent will emit the new db_server tag because we have to initialize the host tag to host/cdb.
  • Improve obfuscator formatting. Prevent spaces after parentheses. Prevent spaces before # when # is a part of an identifier.
  • Emit query metrics with zero executions to capture long runners spanning over several sampling periods.
  • Impose a time limit on query metrics processing. After exceeding the default limit of 20s, the Agent stops emitting execution plans and fqt events.
  • Add oracle.inactive_seconds metric. Add tags with session attributes to oracle.process_pga* metrics.
  • Stop overriding peer.service with other attributes in OTel spans.
  • Process-Agent: Improved parsing performance of the '/proc/pid/stat' file (Linux only)
  • [snmp_listener] Enable collect_topology by default.
  • dbm: add SQL obfuscation options to give customers more control over how SQL is obfuscated and normalized.
    • RemoveSpaceBetweenParentheses - remove spaces between parentheses. This option is only valid when ObfuscationMode is obfuscate_and_normalize.
    • KeepNull - disable obfuscating null values with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
    • KeepBoolean - disable obfuscating boolean values with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
    • KeepPositionalParameter - disable obfuscating positional parameters with ?. This option is only valid when ObfuscationMode is obfuscate_only or obfuscate_and_normalize.
  • Add logic to support multiple tags created by a single label/annotation. For example, add the following config to extract tags for chart_name and app_chart_name:

    podLabelsAsTags:
      chart_name: chart_name, app_chart_name

    Note: the format must be a comma-separated list of tags.
  • The logs collection pipeline has been refactored to support processing only the message content (instead of the whole raw message) in the journald and Windows events tailers. This feature is experimental and off by default since it changes how existing log_processing_rules behave with the journald and Windows events tailers. Note that it will be switched on by default in a future release of the Agent. A warning about this is shown when the journald and Windows events tailers are used with some log_processing_rules.
  • The Datadog agent container image is now using Ubuntu 23.10 mantic as the base image.
  • The win32_event_log check now continuously collects and reports events instead of waiting for min_collection_interval to collect. min_collection_interval now controls how frequently the check attempts to reconnect when the event subscription is in an error state.
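
The label-to-multiple-tags behaviour noted above (one label mapped to a comma-separated list of tag names) can be pictured with a small sketch. The function name tags_from_labels is hypothetical, and emitting tags as name:value strings is an assumption made for illustration; this is not the Agent's tagger code.

```python
from typing import Dict, List


def tags_from_labels(labels: Dict[str, str],
                     labels_as_tags: Dict[str, str]) -> List[str]:
    """Expand each mapped label into one tag per comma-separated tag name."""
    tags = []
    for label, tag_names in labels_as_tags.items():
        if label not in labels:
            continue  # the pod does not carry this label
        for name in tag_names.split(","):
            name = name.strip()  # tolerate spaces after the commas
            if name:
                tags.append(f"{name}:{labels[label]}")
    return tags
```

With the mapping from the example, a pod labelled chart_name=redis would yield both chart_name:redis and app_chart_name:redis.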

Deprecation Notes

  • Installing the Agent on Windows Server versions lower than 2016 and client versions lower than 10 is now deprecated.
  • The timeout option for the win32_event_log check is no longer applicable and can be removed. If the option is set, the check logs a deprecation warning and ignores the option.

Security Notes

  • Fix CVE-2023-45283 and CVE-2023-45284
  • Update OpenSSL from 3.0.11 to 3.0.12. This addresses CVE-2023-5363.

Bug Fixes

  • On Windows, uninstalling the Agent should not fail if the Datadog Agent registry key is missing.
  • APM: OTLP: Only extract DD container tags from resource attributes. Previously, container tags were also extracted from span attributes.
  • APM: OTLP: Only add container tags in tag _dd.tags.container. Previously, container tags were also added as span tags.
  • Resolved an issue in the containerd collector where the SBOM collection did not correctly attach RepoTags and RepoDigests to the SBOM payload.
  • Add a workaround for a bug in a Windows API that can cause the Agent to crash when collecting forwarded events from the Windows Event Log.
  • Resolve the issue with hostname resolution in the kube_apiserver provider when the useHostNetwork setting is enabled.
  • Fix an issue that prevented process ID (PID) from being associated with containers in Live Container View when the Agent is deployed in AWS Fargate.
  • APM: Fixed trace-agent not forwarding errors from remote configuration and reporting them all as 500s
  • On Windows, the SE_DACL_AUTO_INHERITED flag is reset on %PROJECTLOCATION% during upgrades and uninstalls.
  • Fixes a bug in the Windows NPM driver where NPM displays byte overcounts.
  • For USM on Windows, fixes the problem where paths were being erroneously reported as truncated
  • Fixes journald log's Seek function to be set at the beginning or end upon initialization.
  • Fixed the cause of some crashes related to CPU instruction incompatibility happening under certain CPUs when making calls to the included libgmp library.
  • [kubelet] The Kubelet client no longer fails to initialize when the parameter kubelet_tls_verify is set to false with a misconfigured root certificate authority.
  • Fixes a bug where the process-agent process check command would fail to run when language detection was enabled.
  • Document query metrics metric_prefix parameter.
  • Set the tag dd.internal.resource:database_instance to host instead of host/cdb.
  • Switch to the new obfuscator, which fixes bugs such as an error when obfuscating @! and comments on DML statements not being removed.
  • Fixes wrong values in Oracle query metrics data. Extreme cases had inflated statistics and missing statements. The affected statements were pure DML and PL/SQL statements.
  • Fix the bug that prevented Oracle DBM from working properly on AWS RDS non-multitenant instances.
  • Fix an issue that caused the win32_event_log check to not stop running when the rate of incoming event records was higher than the timeout option. The timeout option is now deprecated.
  • The Windows Event Log tailer automatically recovers and is able to resume collecting events when a log provider is reinstalled, which sometimes happens during Windows updates.

Datadog Cluster Agent

New Features

  • Add language detection API handler to the cluster-agent.
  • Report rate_limit_queries_remaining_min telemetry from external-metrics server.
  • Added a new --force option to the datadog-cluster-agent clusterchecks rebalance command that allows you to force clustercheck rebalancing with utilization.
  • [Beta] Enable APM library injection in cluster-agent admission controller based on automatic language detection annotations.

Enhancement Notes

  • Show Autodiscovery information in the output of datadog-cluster-agent status.
  • Added CreateContainerConfigError wait reason to the kubernetes_state.container.status_report.count.waiting metric reported by the kubernetes_state_core check.
  • Release the Leader Election Lock on shutdown to make the initialization of future cluster-agents faster.
  • The Datadog cluster-agent container image is now using Ubuntu 23.10 mantic as the base image.

Bug Fixes

  • Fixed a bug in the kubernetes_state_core check that caused tag corruption when telemetry was set to true.
  • Fix stale metrics being reported by kubernetes_state_core check in some rare cases.
  • Fixed a bug in the rebalancing of cluster checks. Checks that contained secrets were never rebalanced when the Cluster Agent was configured to not resolve check secrets (option secret_backend_skip_checks set to true).
datadog-agent - 7.49.1

Published by kacper-murzyn 11 months ago

Prelude

Release on: 2023-11-16

Bug Fixes

  • CWS: add arch field into agent context included in CWS events.
  • APM: Fix a deadlock issue which can prevent the trace-agent from shutting down.
  • CWS: Fix the broken lineage check for process activity in CWS.
  • APM: fix a regression in the Trace Agent that caused container tagging with UDS and cgroup v2 to fail.
datadog-agent - 7.49.0

Published by kacper-murzyn 12 months ago

Agent

Prelude

Release on: 2023-11-02

New Features

  • Add --use-unconnected-udp-socket flag to agent snmp walk command.

  • Add support for image pull metrics in the containerd check.

  • Add kubelet stats.summary check (kubernetes_core.kubelet.*) to the Agent's core checks to replace the old kubernetes.kubelet check generated from Python.

  • APM: [BETA] Adds peer_tags configuration to allow for more tags in APM stats that can add granularity and clarity to a peer.service. To set this config, use DD_APM_PEER_TAGS='["aws.s3.bucket", "db.instance", ...]' or apm_config.peer_tags: ["aws.s3.bucket", "db.instance", ...] in datadog.yaml. Please note that DD_APM_PEER_SERVICE_AGGREGATION or apm_config.peer_service_aggregation must also be set to true.

  • Introduces a new Windows crash detection check. On its initial run, the check sends a Datadog event if it determines that the machine rebooted due to a system crash.

  • Install the Aerospike integration on ARM platforms for Python 3

  • CWS: Detect patterns in process and file paths to improve the accuracy of anomaly detections.

  • Add Dynamic Instrumentation diagnostics proxy endpoint to the trace-agent http server.

    At present, diagnostics are forwarded through the debugger endpoint on the trace-agent server to logs. Since Dynamic Instrumentation also allows adding dynamic metrics and dynamic spans, we want to remove the dependency on logs for diagnostics - the new endpoint uploads diagnostic messages on a dedicated track.

  • Adds a configurable jmxfetch telemetry check that collects additional data on the running jmxfetch JVM in addition to data about the JVMs jmxfetch is monitoring. The check can be configured by enabling the jmx_telemetry_enabled option in the Agent.

  • [NDM] Collect diagnoses from SNMP devices.

  • Adding support for Oracle 12.2.

  • Add support for Oracle 18c.

  • CWS now computes hashes for all the files involved in the generation of a Security Profile and an Anomaly Detection Event

  • [Beta] Cluster agent supports APM Single Step Instrumentation for Kubernetes. It can be enabled in a Kubernetes cluster by setting DD_APM_INSTRUMENTATION_ENABLED=true. Single Step Instrumentation can be turned on in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES. Single Step Instrumentation can be turned off in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES.
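
As an illustration of the peer_tags note in this list, a minimal datadog.yaml sketch might look like the following. It assumes the dotted setting names map to nested YAML keys, and the tag list simply reuses the two tags named in the note; adjust it to the tags relevant to your services.

```yaml
# datadog.yaml (sketch; nested layout assumed from the dotted setting names)
apm_config:
  # Must be true for peer_tags to take effect, per the note above.
  peer_service_aggregation: true
  # Example tags taken from the release note; replace with your own.
  peer_tags: ["aws.s3.bucket", "db.instance"]
```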

Enhancement Notes

  • Moving the Orchestrator Explorer pod check from the process agent to the core agent. In the following release we will be removing the process agent check and defaulting to the core agent check. If you want to migrate ahead of time you can set orchestrator_explorer.run_on_node_agent = true in your configuration.

  • Add new GPU metrics in the KSM Core check:

    • kubernetes_state.node.gpu_capacity tagged by node, resource, unit and mig_profile.
    • kubernetes_state.node.gpu_allocatable tagged by node, resource, unit and mig_profile.
    • kubernetes_state.container.gpu_limit tagged by kube_namespace, pod_name, kube_container_name, node, resource, unit and mig_profile.
  • Tag container entity with image_id tag.

  • max_message_size_bytes can now be configured in logs_config. This allows the default message content limit of 256,000 bytes to be increased up to 1MB. If a log line is larger than this byte limit, the overflow bytes will be truncated.

  • APM: Add regex support for filtering tags by apm_config.filter_tags_regex or environment variables DD_APM_FILTER_TAGS_REGEX_REQUIRE and DD_APM_FILTER_TAGS_REGEX_REJECT.

  • Agents are now built with Go 1.20.10.

  • CWS: Support fentry/fexit eBPF probes which provide lower overhead than kprobe/kretprobes (currently disabled by default and supported only on Linux kernel 5.10 and later).

  • CWS: Improved username resolution in containers and handle their creation and deletion at runtime.

  • CWS: Apply policy rules on processes already present at startup.

  • CWS: Reduce memory usage of BTF symbols.

  • Remote Configuration for Cloud Workload Security detection rules is enabled if Remote Configuration is globally enabled for the Datadog Agent. Remote Configuration for Cloud Workload Security can be disabled while Remote Configuration is globally enabled by setting the runtime_security_config.remote_configuration.enabled value to false. Remote Configuration for Cloud Workload Security cannot be enabled if Remote Configuration is not globally enabled.

  • Add gce-container-declaration to default GCE excluded host tags. See exclude_gce_tags configuration settings for more.

  • Add metrics for the workloadmeta extractor to process-agent status output.

  • Add a heartbeat mechanism for SBOM collection to avoid having to send the whole SBOM if it has not changed since the last computation. The default interval for the host SBOM has changed from 24 hours to 1 hour.

  • Prefix every entry in the log file with details about the database server and port to distinguish log entries originating from different databases.

  • JMXFetch internal telemetry is now included in the agent status output when the verbose flag is included in the request.

  • Sensitive information is now scrubbed from pod annotations.

  • The image_id tag no longer includes the docker-pullable:// prefix when using Kubernetes with Docker as runtime.

  • Improve SQL text collection for self-managed installations. The Agent selects text from V$SQL instead of V$SQLSTATS. If it isn't possible to query the text, the Agent tries to identify the context, such as parsing or closing cursor, and put it in the SQL text.

  • Improve the Oracle check example configuration file.

  • Collect Oracle execution plans by default.

  • Add global custom queries to Oracle checks.

  • Add connection refused handling.

  • Add the hosting-type tag, which can have one of the following values: self-managed, RDS, or OCI.

  • Add a hidden parameter to log unobfuscated execution plan information.

  • Adding real_hostname tag.

  • Add sql_id and plan_hash_value to obfuscation error message.

  • Add Oracle pga_over_allocation_count_metric.

  • Add information about missing privileges with the link to the grant commands.

  • Add TCPS configuration to conf.yaml.example.

  • The container check reports two new metrics:

    • container.memory.page_faults
    • container.memory.major_page_faults

    to report the page fault counters per container.

  • prometheus_scrape: Adds support for multiple OpenMetrics V2 features in the prometheus_scrape.checks[].configurations[] items:

    • exclude_metrics_by_labels
    • raw_line_filters
    • cache_shared_labels
    • use_process_start_time
    • hostname_label
    • hostname_format
    • telemetry
    • ignore_connection_errors
    • request_size
    • log_requests
    • persist_connections
    • allow_redirects
    • auth_token

    For a description of each option, refer to the sample configuration in https://github.com/DataDog/integrations-core/blob/master/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example.

  • The SBOM check now reports the status of scans and any errors directly to Datadog for more streamlined error management and resolution.

  • Separate init-containers from containers in the KubernetesPod structure of workloadmeta.

  • Improve marshalling performance in the system-probe -> process-agent path. This improves memory footprint when NPM and/or USM are enabled.

  • Raise the default logs_config.open_files_limit to 500 on Windows.
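
Two of the settings mentioned in the enhancement notes above can be sketched in datadog.yaml as follows. This is an illustrative fragment only: the nested layout is assumed from the dotted setting names, and 512000 is an arbitrary example value between the 256,000-byte default and the 1MB maximum.

```yaml
# datadog.yaml (sketch)
logs_config:
  # Illustrative value; log lines longer than this are truncated.
  max_message_size_bytes: 512000

orchestrator_explorer:
  # Opt in early to running the pod check from the core agent.
  run_on_node_agent: true
```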

Deprecation Notes

  • service_monitoring_config.enable_go_tls_support is deprecated and replaced by service_monitoring_config.tls.go.enabled. network_config.enable_https_monitoring is deprecated and replaced by service_monitoring_config.tls.native.enabled.
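
A configuration that used the deprecated settings could be migrated along these lines. This is a sketch that assumes both features were previously enabled and that the dotted names map to nested YAML keys; place it in the same configuration file that held the deprecated settings.

```yaml
# Replaces service_monitoring_config.enable_go_tls_support
# and network_config.enable_https_monitoring (both deprecated).
service_monitoring_config:
  tls:
    go:
      enabled: true
    native:
      enabled: true
```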

Security Notes

  • APM: The Agent now obfuscates the entire Memcached command by default. You can revert to the previous behavior where only the values were obfuscated by setting DD_APM_OBFUSCATION_MEMCACHED_KEEP_COMMAND=true or apm_config.obfuscation.memcached.keep_command: true in datadog.yaml.
  • Fix CVE-2023-39325
  • Bump golang.org/x/net to v0.17.0 to fix CVE-2023-44487.
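
To keep the previous Memcached obfuscation behavior described above, the datadog.yaml setting can be written as in this short sketch (nested layout assumed from the dotted setting name):

```yaml
# datadog.yaml (sketch)
apm_config:
  obfuscation:
    memcached:
      # true restores the earlier behavior: only values are obfuscated.
      keep_command: true
```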

Bug Fixes

  • Fix Agent Flare not including Trace Agent's expvar output.
  • Fixes a panic that occurs when the Trace Agent receives an OTLP payload during shutdown
  • Fixes a crash upon receiving an OTLP Exponential Histogram with no buckets.
  • CWS: Scope network context to DNS events only as it may not be available to all events.
  • CWS: Fix a bug that caused security profiles of already running workloads to be empty.
  • The docker.cpu.shares metric emitted by the Docker check now reports the correct number of CPU shares when running on cgroups v2.
  • Fixes a critical data race in workloadmeta that was causing issues when a subscriber attempted to unsubscribe while events were being handled in another goroutine.
  • Fix misnamed metric in the trace-agent.
  • Fixed a problem that caused the Agent to miss some image labels when using containerd as the container runtime.
  • Fix config conflict preventing logs_config.use_podman_logs from working
  • The scrubbing logic for configurations now scrubs YAML lists. This fixes flare_stripped_keys not working on YAML lists.
  • Fixed an issue in the SBOM check when using Kubernetes with Docker as runtime. Some images used by containers were incorrectly marked as unused.
  • Fix Oracle SQL text truncation in query samples.
  • Make the custom queries feature available for non-DBM users.
  • Fix wrong tags generated by custom queries.
  • Eliminate duplicate upper case cdb and pdb tags.
  • Fix panic: runtime error: invalid memory address or nil pointer dereference in StatementMetrics by improving cache handling.
  • Fix truncation of SQL text for large statements.
  • Fix the "failed to query v$pdbs" error, which was appearing for RDS databases.
  • Bug fix for ORA-06502: PL/SQL: numeric or value error: character string buffer too small. This error would occasionally appear during activity sampling.
  • Adjust doc links to grant privilege commands for multitenant and non-CDB architecture.
  • Workaround for the PGA memory leak.
  • Improve recovering from lost connections in custom queries.
  • Emit zero value for oracle.pga_over_allocation metric.
  • APM: Parse SQL Server query with single dollar identifier $action.

Other Notes

Datadog Cluster Agent

New Features

  • Added option to attach profiling data to a flare.
  • Increment cluster agent admission controller mutation attempts metric when library is auto-injected.

Enhancement Notes

  • Added the check_name tag to the cluster_checks.configs_info metric emitted by the Cluster Agent telemetry.
  • Sensitive information is now scrubbed from pod annotations.
  • Skip collections for resources missing RBACs in orchestrator check

Bug Fixes

  • Remove openmetrics endpoint default value from containerd check default configuration.
  • Resolved a conflict between the admission controller and the AKS admissions enforcer that previously led to a loop in reconciling the webhook.
  • Fixes a panic in the Cluster Agent that happens when trying to unschedule a check that has not been dispatched to any runner.
datadog-agent - 7.48.1

Published by kacper-murzyn about 1 year ago

Prelude

Release on: 2023-10-17

Upgrade Notes

  • Upgraded Python 3.9 to Python 3.9.18

Security Notes

  • Bump embedded curl version to 8.4.0 to fix CVE-2023-38545 and CVE-2023-38546
  • Updated the version of OpenSSL used by Python on Windows to 1.1.1w; addressed CVE-2023-4807, CVE-2023-3817, and CVE-2023-3446

Bug Fixes

  • On some slow drives, when the Agent shuts down suddenly the Logs Agent registry file can become corrupt. This means that when the Agent starts again the registry file can't be read and therefore the Logs Agent reads logs from the beginning again. With this update, the Agent now attempts to update the registry file atomically to reduce the chances of a corrupted file.
datadog-agent - 7.48.0

Published by kacper-murzyn about 1 year ago

Agent

Prelude

Release on: 2023-10-10

Upgrade Notes

  • The EventIDs logged to the Windows Application Event Log by the Agent services have been normalized and now have the same meaning across Agent services. Some EventIDs have changed and the rendered message may be incorrect if you view an Event Log from a host that uses a different version of the Agent than the host that created the Event Log. To ensure you see the correct message, choose "Display information for these languages" when exporting the Event Log from the host. This does not affect Event Logs collected by the Datadog Agent's Windows Event Log integration, which renders the event messages on the originating host. The EventIDs and messages used by the Agent services can be viewed in pkg/util/winutil/messagestrings/messagestrings.mc.

  • datadog-connectivity and metadata-availability subcommands do not exist anymore and their diagnoses are reported in a more general and structured way.

    Diagnostics previously reported via the datadog-connectivity subcommand are now reported as part of the connectivity-datadog-core-endpoints suite. Correspondingly, diagnostics previously reported via the metadata-availability subcommand are now reported as part of the connectivity-datadog-autodiscovery suite.

  • Streamlined settings by renaming workloadmeta.remote_process_collector.enabled and process_config.language_detection.enabled to language_detection.enabled.

  • The command line arguments to the Datadog Trace Agent (trace-agent) have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. Additionally, subcommands have been added, which can be listed with the --help switch. For backward-compatibility reasons, the old CLI arguments will still work for the foreseeable future but may be removed in future versions.

New Features

  • Added the kubernetes_state.pod.tolerations metric to the KSM core check

  • Grab, base64 decode, and attach trace context from message attributes passed through SNS->SQS->Lambda

  • Add kubelet healthz check (check_run.kubernetes_core.kubelet.check) to the Agent's core checks to replace the old kubernetes.kubelet.check generated from Python.

  • Tag the aws.lambda span generated by the datadog-extension with a language tag based on runtime information in dotnet and java cases

  • Extended the "agent diagnose" CLI command to allow the easy addition of new diagnostics for diverse and dispersed Agent code.

  • Add support for the otlp_config.metrics.sums.initial_cumulative_monotonic_value setting.

  • [BETA] Adds Golang language and version detection through the system probe. This beta feature can be enabled by setting system_probe_config.language_detection.enabled to true in your system-probe.yaml.

  • Add new kubelet corecheck, which will eventually replace the existing kubelet check.

  • Add custom queries to Oracle monitoring.

  • Adding new configuration setting otlp_config.logs.enabled to enable/disable logs support in the OTLP ingest endpoint.

  • Add logsagentexporter, which is used in OTLP agent to translate ingested logs and forward them to logs-agent

  • Flush in-flight requests and pending retries to disk at shutdown when disk-based buffering of metrics is enabled (for example, when forwarder_storage_max_size_in_bytes is set).

  • Added a new collector in the process agent in workloadmeta. This collector allows for collecting processes when the process_config.process_collection.enabled is false and language_detection.enabled is true. The interval at which this collector collects processes can be adjusted with the setting workloadmeta.local_process_collector.collection_interval.

  • Tag lambda cold starts and proactive initializations on the root aws.lambda span

  • APM - This change improves the acceptance and queueing strategy for trace payloads sent to the Trace Agent. These changes create a system of backpressure in the Trace Agent, causing it to reject payloads when it cannot keep up with the rate of traffic, rather than buffering and causing OOM issues.

    This change has been shown to increase overall throughput in the Trace Agent while decreasing peak resource usage. Existing configurations for CPU and memory work at least as well, and often better, with these changes compared to previous Agent versions. This means users do not have to adjust their configuration to take advantage of these changes, and they do not experience performance degradation as a result of upgrading.
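
The workloadmeta process collector described in this list runs when process collection is off and language detection is on. A minimal sketch of that combination is shown below; the nested layout is assumed from the dotted setting names, and the optional workloadmeta.local_process_collector.collection_interval setting is left out because the note does not state its format.

```yaml
# datadog.yaml (sketch)
process_config:
  process_collection:
    enabled: false

language_detection:
  enabled: true
```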

Enhancement Notes

  • When jmx_use_container_support is enabled you can use jmx_max_ram_percentage to set a maximum JVM heap size based off a percentage of the total container memory.
  • SNMP profile detection now updates the SNMP profile for a given IP if the device at that IP changes.
  • Add Process Language Detection Enabled in the output of the Agent Status command under the Process Agent section.
  • Improve agent diagnose command to be executed in context of running Agent process.
  • Agents are now built with Go 1.20.7. This version of Golang fixes CVE-2023-29409.
  • Added the container.memory.usage.peak metric to the container check. It shows the maximum memory usage recorded since the container started.
  • Unified agent diagnose CLI command by removing all, datadog-connectivity, and metadata-availability subcommands. These separate subcommands became one of the diagnose suites. The all subcommand became unnecessary.
  • APM: Improved performance and memory consumption in obfuscation, both halved on average.
  • Agents are now built with Go 1.20.8.
  • The processor frequency sent in metadata is now a decimal value on Darwin and Windows, as it already is on Linux. The precision of the value is increased on Darwin.
  • CPU metadata which failed to be collected is no longer sent as empty values on Windows.
  • Platform metadata which failed to be collected is no longer sent as empty values on Windows.
  • Filesystem metadata is now collected without running the df binary on Unix.
  • Adds language detection support for JRuby, which is detected as Ruby.
  • Add the oracle.can_connect metric.
  • Add duration to the plan payload.
  • Increasing the collection interval for all the checks except for activity samples from 10s to 60s.
  • Collect the number of CPUs and physical memory.
  • Improve Oracle query metrics algorithm and the fetching time for execution plans.
  • OTLP ingest pipeline panics no longer stop the Datadog Agent and instead only shutdown this pipeline. The panic is now available in the OTLP status section.
  • During the process check, collect the command name from /proc/[pid]/comm. This allows more accurate language detection of processes.
  • SNMP trap variables with bit enumerations are now resolved to hexadecimal strings prefixed with "0x" (previously, they were resolved to base64-encoded strings).
  • The Datadog agent container image is now using Ubuntu 23.04 lunar as the base image.
  • Upgraded JMXFetch to 0.47.10 <https://github.com/DataDog/jmxfetch/releases/0.47.10>. This version improves how JMXFetch communicates with the Agent, and fixes a race condition where an exception is thrown if the Agent hasn't finished initializing before JMXFetch starts to shut down.
  • Added collector.worker_utilization to the telemetry. This metric represents the amount of time that a runner worker has been running checks.
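
For the JMX heap sizing note above, a datadog.yaml fragment might look like the following sketch. The value 50.0 is purely illustrative, and placing both settings at the top level of the file is an assumption based on how the note names them.

```yaml
# datadog.yaml (sketch)
jmx_use_container_support: true
# Illustrative value: cap the JVM heap at half of the container memory.
jmx_max_ram_percentage: 50.0
```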

Deprecation Notes

  • The command line arguments to the Datadog Trace Agent (trace-agent) have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. For backward-compatibility reasons, the old CLI arguments will still work for the foreseeable future but may be removed in future versions.

Security Notes

  • APM: In order to improve the default customer experience regarding sensitive data, the Agent now obfuscates database statements within span metadata by default. This includes MongoDB queries, ElasticSearch request bodies, and raw commands from Redis and MemCached. Previously, this setting was off by default. This update could have performance implications, or obfuscate data that is not sensitive, and can be disabled or configured through the obfuscation options within the apm_config, or with the environment variables prefixed with DD_APM_OBFUSCATION. Please read the [Data Security documentation for full details](https://docs.datadoghq.com/tracing/configure_data_security/#trace-obfuscation).

  • This update ensures the sql.query tag is always obfuscated by the Datadog Agent even if this tag was already set by a tracer or manually by a user. This is to prevent potentially sensitive data from being sent to Datadog. If you wish to have a raw, unobfuscated query within a span, then manually add a span tag of a different name (for example, sql.rawquery).

  • Fix CVE-2023-39320, CVE-2023-39318, CVE-2023-39319, and CVE-2023-39321.

  • Update OpenSSL from 3.0.9 to 3.0.11. This addresses CVEs CVE-2023-2975, CVE-2023-3446, CVE-2023-3817, CVE-2023-4807.

Bug Fixes

  • APM: Fix issue of agent status returning an error when run shortly after starting the trace agent.

  • APM: Fix incorrect filenames and line numbers in logs from the trace agent.

  • OTLP logs ingestion is now disabled by default. To enable it, set otlp_config.logs.enabled to true.

  • Avoids fetching tags for ECS tasks when they're not consumed.

  • APM: Concurrency issue at high volumes fixed in obfuscation.

  • Updated datadog.agent.sbom_generation_duration to only be observed for successful scans.

  • Fixes a bug that prevents the Agent from writing permissions information about system-probe files when creating a flare.

  • Fixed a bug that causes the Agent to report the datadog.agent_name.running metric with missing tags in some environments with cgroups v1.

  • Fix dogstatsd_mapper_profiles wrong serialization when displaying the configuration (for example match_type was shown as matchtype). This also fixes a bug in which the secret management feature was incompatible with dogstatsd_mapper_profiles due to the renaming of the match_type key in the YAML data.

  • Fix a crash in the Cluster Agent when Remote Configuration is disabled

  • Corrected a bug in calculating the total size of a container image, now accounting for the configuration file size.

  • Prevent the process-agent from picking up kernel threads as processes due to an integer overflow when parsing /proc/<pid>/stat.

  • Fixes a rare bug in the Kubernetes State check that causes the Agent to incorrectly tag the kubernetes_state.job.complete service check.

  • On Windows, the host metadata correctly reflects the Windows 11 version.

  • Fix a datadog.yaml configuration file parsing issue. When the datadog.yaml configuration file contained a complex configuration under prometheus.checks[*].configurations[*].metrics, a parsing error could lead to an OpenMetrics check not being properly scheduled. Instead, the Agent logged the following error:

    2023-07-26 14:09:23 UTC | CORE | WARN | (pkg/autodiscovery/common/utils/prometheus.go:77 in buildInstances) | Error processing prometheus configuration: json: unsupported type: map[interface {}]interface {}
    
  • Fixes the KSM check to support HPA v2beta2 again. This stopped working in Agent v7.44.0.

  • Counts sent through the no-aggregation pipeline are now sent as rates with a forced interval of 10 to mimic the normal DogStatsD pipelines.

  • Bug fix for the wrong query signature.

  • Populate OTLP resource attributes in Datadog logs

  • Changes mapping for jvm.loaded_classes from process.runtime.jvm.classes.loaded to process.runtime.jvm.classes.current_loaded

  • The minimum and maximum estimation for OTLP Histogram to Datadog distribution mapping now ensures the average is within [min, max].

  • This estimation is only used when the minimum and maximum are not available in the OTLP payload or this is a cumulative payload.

  • Fixes a panic in the OTLP ingest metrics pipeline when sending OpenTelemetry runtime metrics

  • Set correct tag value "otel_source:datadog_agent" for OTLP logs ingestion

  • Removed specific environment variable filter on the Windows platform to fetch ECS task tags.

  • diagnose datadog-connectivity subcommand now loads and resolves secrets before checking connectivity.

  • The Agent now starts even if it cannot write events to the Application event log

  • Fix Windows Service detection by replacing svc.IsAnInteractiveSession() (deprecated) with svc.IsWindowsService()
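
Because OTLP logs ingestion is now disabled by default, re-enabling it requires an explicit setting. A minimal sketch, assuming the dotted name maps to nested YAML keys and that the rest of the OTLP ingest configuration is already in place:

```yaml
# datadog.yaml (sketch)
otlp_config:
  logs:
    enabled: true
```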

Other Notes

  • System-probe no longer tries to resolve secrets in configurations.
  • Refactored the logs collection pipeline: the journald and windowsevents support now uses the same pipeline as the rest of the logs collection implementations.
  • Please note that significant changes have been introduced to the Datadog Trace Agent for this release. Though these changes should not alter user-facing agent behavior beyond the CLI changes described above, please reach out to support should you experience any unexpected behavior.

Datadog Cluster Agent

New Features

  • Added the kubernetes_state.pod.tolerations metric to the KSM core check
  • Add HorizontalPodAutoscaler collection in the orchestrator check.

Enhancement Notes

  • Add safeguards for orchestrator CRD collection.
  • The Datadog cluster-agent container image is now using Ubuntu 23.04 lunar as the base image.

Bug Fixes

  • Fixed an error in the calculations performed by the algorithm that rebalances cluster checks. Cluster checks are now more evenly distributed when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
  • Service checks are no longer excluded from rebalancing decisions when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
  • Fixes a rare bug in the Kubernetes State check that causes the Agent to incorrectly tag the kubernetes_state.job.complete service check.
  • Removes an incorrect warning log message that mentions that the DD_POD_NAME env var is unknown.
  • Fixes the KSM check to support HPA v2beta2 again. This stopped working in Agent v7.44.0.
  • Adds the kube_cluster_name tag as a static global tag to the cluster agent when the DD_CLUSTER_NAME config option is set. This should fix an issue where the tag is not being attached to metrics in certain environments, such as EKS Fargate.
  • Fixed a bug in the advanced dispatching of cluster checks. All the checks scheduled since the last rebalance were being scheduled in the same node. Now they should be distributed among the available nodes.
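
The rebalancing fixes above apply when advanced dispatching is turned on. In the Cluster Agent configuration, that setting could be sketched as follows (nested layout assumed from the dotted setting name):

```yaml
# Cluster Agent configuration (sketch)
cluster_checks:
  advanced_dispatching_enabled: true
```
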
datadog-agent - 7.47.1

Published by kacper-murzyn about 1 year ago

Prelude

Release on: 2023-09-21

Bug Fixes

  • Fixes issue with NPM driver restart failing with "File Not Found" error on Windows.
  • APM: The DD_APM_REPLACE_TAGS environment variable and apm_config.replace_tags setting now properly look for tags with numeric values.
  • Fix the issue introduced in 7.47.0 that causes the SE_DACL_AUTO_INHERITED flag to be removed from the installation drive directory when the installer fails and rolls back.
datadog-agent - 7.47.0

Published by kacper-murzyn about 1 year ago

Agent

Prelude

Release on: 2023-08-31

Upgrade Notes

  • Embedded Python 3 interpreter is upgraded to 3.9.17 in both Agent 6 and Agent 7. Embedded OpenSSL is upgraded to 3.0.9 in Agent 7 on Linux and macOS. On Windows, Python 3.9 in Agent 7 is still compiled with OpenSSL 1.1.1.

New Features

  • Add ability to send an Agent flare from the Datadog Application for Datadog support team troubleshooting. This feature requires enabling Remote Configuration.

  • Added workloadmeta remote process collector to collect process metadata from the Process-Agent and store it in the core agent.

  • Added new parameter workloadmeta.remote_process_collector.enabled to enable the workloadmeta remote process collector.

  • Added a new tag collector to datadog.agent.workloadmeta_remote_client_errors.

  • APM: Added support for obfuscating all Redis command arguments. For any Redis command, all arguments will be replaced by a single "?". Configurable using config variable apm_config.obfuscation.redis.remove_all_args and environment variable DD_APM_OBFUSCATION_REDIS_REMOVE_ALL_ARGS. Both accept a boolean value with default value false.

  • Added an experimental setting process_config.language_detection.enabled. This enables detecting languages for processes. This feature is WIP.

  • Added an experimental gRPC server to process-agent in order to expose process entities with their detected language. This feature is WIP and controlled through the process_config.language_detection.enabled setting.

  • The Agent now sends its configuration to Datadog by default to be displayed in the Agent Configuration section of the host detail panel. See https://docs.datadoghq.com/infrastructure/list/#agent-configuration for more information. The Agent configuration is scrubbed of any sensitive information and only contains configuration you’ve set using the configuration file or environment variables. To disable this feature set inventories_configuration_enabled to false.

  • The Windows installer can now send a report to Datadog in case of installation failure.

  • The Windows installer can now send APM telemetry.

  • Add support for Oracle Autonomous Database (Oracle Cloud Infrastructure).

  • Add shared memory (a.k.a. system global area - SGA) metric for Oracle databases: oracle.shared_memory.size

  • With this release, remote_config.enabled is set to true by default in the Agent configuration file. This causes the Agent to request configuration updates from the Datadog site.

    To receive configurations from Datadog, you still need to enable Remote Configuration at the organization level and enable Remote Configuration capability on your API Key from the Datadog application. If you don't want the Agent to request configurations from Datadog, set remote_config.enabled to false in the Agent configuration file.

  • DD_SERVICE_MAPPING can be used to rename Serverless inferred spans' service names.

  • Adds a new agent command stream-event-platform to stream the event platform payloads being generated by the agent. This will help diagnose issues with payload generation, and should ease validation of payload changes.
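
Several of the features above are controlled by settings named in the notes. The following sketch shows how opting out of the new defaults, and opting in to full Redis argument obfuscation, might look in the Agent configuration file. The nested layout is assumed from the dotted setting names, and each line is optional.

```yaml
# datadog.yaml (sketch)

# Stop the Agent from requesting configuration updates from Datadog.
remote_config:
  enabled: false

# Stop the Agent from sending its scrubbed configuration to Datadog.
inventories_configuration_enabled: false

apm_config:
  obfuscation:
    redis:
      # Replace all arguments of every Redis command with a single "?".
      remove_all_args: true
```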

Enhancement Notes

  • Add two new initContainer metrics to the Kubernetes State Core check: kubernetes_state.initcontainer.waiting and kubernetes_state.initcontainer.restarts.

  • Add the following sysmetrics to improve DBA/SRE/SE perspective:

    avg_synchronous_single_block_read_latency,
    active_background_on_cpu, active_background,
    branch_node_splits, consistent_read_changes,
    consistent_read_gets, active_sessions_on_cpu, os_load,
    database_cpu_time_ratio, db_block_changes, db_block_gets,
    dbwr_checkpoints, enqueue_deadlocks, execute_without_parse,
    gc_current_block_received, gc_average_cr_get_time,
    gc_average_current_get_time, hard_parses,
    host_cpu_utilization, leaf_nodes_splits, logical_reads,
    network_traffic_volume, pga_cache_hit, parse_failures,
    physical_read_bytes, physical_read_io_requests,
    physical_read_total_io_requests, physical_reads_direct_lobs,
    physical_read_total_bytes, physical_reads_direct,
    physical_write_bytes, physical_write_io_requests,
    physical_write_total_bytes, physical_write_total_io_requests,
    physical_writes_direct_lobs, physical_writes_direct,
    process_limit, redo_allocation_hit_ratio, redo_generated,
    redo_writes, row_cache_hit_ratio, soft_parse_ratio,
    total_parse_count, user_commits

  • Pause containers from the new Kubernetes community registry (registry.k8s.io/pause) are now excluded by default for containers and metrics collection.

  • [corechecks/snmp] Add forced type rate as an alternative to counter.

  • [corechecks/snmp] Add symbol level metric_type for table metrics.

  • Adds support for including the span.kind tag in APM stats aggregations.

  • Allow ad_identifiers to be used in file based logs integration configs in order to collect logs from disk.

  • Agents are now built with Go 1.20.5

  • Agents are now built with Go 1.20.6. This version of Golang fixes CVE-2023-29406.

  • Improve error handling in External Metrics query logic by running queries with errors individually with retry and backoff, and batching only queries without errors.

  • CPU metadata is now collected without running the sysctl binary on Darwin.

  • Memory metadata is now collected without running the sysctl binary on Darwin.

  • Always send the swap size value in metadata as an integer in kilobytes.

  • Platform metadata is now collected without running the uname binary on Linux and Darwin.

  • Add new metrics for resource aggregation to the Kubernetes State Core check:

    • kubernetes_state.node.<cpu|memory>_capacity.total
    • kubernetes_state.node.<cpu|memory>_allocatable.total
    • kubernetes_state.container.<cpu|memory>_requested.total
    • kubernetes_state.container.<cpu|memory>_limit.total
  • The kube node name is now reported as a host tag kube_node.

  • [pkg/netflow] Collect flow_process_nf_errors_count metric from goflow2.

  • APM: Bind apm_config.obfuscation.* parameters to new obfuscation environment variables. In particular, bind:

    apm_config.obfuscation.elasticsearch.enabled to DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.elasticsearch.keep_values to DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].

    apm_config.obfuscation.elasticsearch.obfuscate_sql_values to DD_APM_OBFUSCATION_ELASTICSEARCH_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].

    apm_config.obfuscation.http.remove_paths_with_digits to DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS: It accepts a boolean value with default value false.

    apm_config.obfuscation.http.remove_query_string to DD_APM_OBFUSCATION_HTTP_REMOVE_QUERY_STRING: It accepts a boolean value with default value false.

    apm_config.obfuscation.memcached.enabled to DD_APM_OBFUSCATION_MEMCACHED_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.mongodb.enabled to DD_APM_OBFUSCATION_MONGODB_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.mongodb.keep_values to DD_APM_OBFUSCATION_MONGODB_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].

    apm_config.obfuscation.mongodb.obfuscate_sql_values to DD_APM_OBFUSCATION_MONGODB_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].

    apm_config.obfuscation.redis.enabled to DD_APM_OBFUSCATION_REDIS_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.remove_stack_traces to DD_APM_OBFUSCATION_REMOVE_STACK_TRACES: It accepts a boolean value with default value false.

    apm_config.obfuscation.sql_exec_plan.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.sql_exec_plan.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].

    apm_config.obfuscation.sql_exec_plan.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].

    apm_config.obfuscation.sql_exec_plan_normalize.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_ENABLED: It accepts a boolean value with default value false.

    apm_config.obfuscation.sql_exec_plan_normalize.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].

    apm_config.obfuscation.sql_exec_plan_normalize.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].

  • The Windows installer is now built using WixSharp.

  • Refactored the Windows installer custom actions in .NET.

  • Remove Oracle from the Heroku build.

  • [pkg/snmp/traps] Collect telemetry metrics for SNMP Traps.

  • [pkg/networkdevice] Add Meraki fields to NDM Metadata payload.

  • [corechecks/snmp] Add metric_type to metric root and deprecate forced_type.

  • [corechecks/snmp] Add tags to interface_configs to tag interface metrics

  • [corechecks/snmp] Add user_profiles directory support.
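
The apm_config.obfuscation.* bindings listed above all follow one naming pattern: the apm_config. prefix becomes DD_APM_, dots become underscores, and the result is upper-cased. The sketch below only illustrates that pattern and the ["id1", "id2"] list form; it is not the Agent's own code (the Agent is written in Go), the function names are hypothetical, and treating the list form as JSON is an assumption.

```python
import json
from typing import List


def obfuscation_env_var(config_key: str) -> str:
    """Derive the environment variable name for an apm_config.* key.

    Hypothetical helper: it mirrors the naming pattern visible in the
    release note, nothing more.
    """
    prefix = "apm_config."
    if not config_key.startswith(prefix):
        raise ValueError(f"not an apm_config key: {config_key!r}")
    return "DD_APM_" + config_key[len(prefix):].replace(".", "_").upper()


def parse_string_list(raw: str) -> List[str]:
    """Parse a value of the form ["id1", "id2"].

    Assumption: the list form is read here as a JSON array of strings.
    """
    values = json.loads(raw)
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError(f"expected a list of strings, got: {raw!r}")
    return values


if __name__ == "__main__":
    print(obfuscation_env_var("apm_config.obfuscation.mongodb.keep_values"))
    print(parse_string_list('["id1", "id2"]'))
```

A helper like this can be handy for checking that a deployment manifest spells each variable name the way the release note does.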

Deprecation Notes

  • The system_probe_config.http_map_cleaner_interval_in_s configuration has been deprecated. Use service_monitoring_config.http_map_cleaner_interval_in_s instead.
  • The system_probe_config.http_idle_connection_ttl_in_s configuration has been deprecated. Use service_monitoring_config.http_idle_connection_ttl_in_s instead.
  • The network_config.http_notification_threshold configuration has been deprecated. Use service_monitoring_config.http_notification_threshold instead.
  • The network_config.http_max_request_fragment configuration has been deprecated. Use service_monitoring_config.http_max_request_fragment instead.
  • The network_config.http_replace_rules configuration has been deprecated. Use service_monitoring_config.http_replace_rules instead.
  • The network_config.max_tracked_http_connections configuration has been deprecated. Use service_monitoring_config.max_tracked_http_connections instead.
  • The network_config.max_http_stats_buffered configuration has been deprecated. Use service_monitoring_config.max_http_stats_buffered instead.
  • The compliance_config.xccdf.enabled configuration has been deprecated. Use compliance_config.host_benchmarks.enabled instead.
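
The renames above can be applied mechanically to an already-parsed configuration. The following is a minimal sketch, assuming the configuration has been loaded into nested Python dictionaries; the helper names are hypothetical and this is not a tool shipped with the Agent.

```python
from typing import Any, Dict, List, Tuple

# Deprecated key -> replacement key, copied from the deprecation notes above.
RENAMED_SETTINGS = {
    "system_probe_config.http_map_cleaner_interval_in_s": "service_monitoring_config.http_map_cleaner_interval_in_s",
    "system_probe_config.http_idle_connection_ttl_in_s": "service_monitoring_config.http_idle_connection_ttl_in_s",
    "network_config.http_notification_threshold": "service_monitoring_config.http_notification_threshold",
    "network_config.http_max_request_fragment": "service_monitoring_config.http_max_request_fragment",
    "network_config.http_replace_rules": "service_monitoring_config.http_replace_rules",
    "network_config.max_tracked_http_connections": "service_monitoring_config.max_tracked_http_connections",
    "network_config.max_http_stats_buffered": "service_monitoring_config.max_http_stats_buffered",
    "compliance_config.xccdf.enabled": "compliance_config.host_benchmarks.enabled",
}


def _pop_path(config: Dict[str, Any], dotted: str) -> Tuple[bool, Any]:
    """Remove and return the value at a dotted path, if it exists."""
    *parents, leaf = dotted.split(".")
    node: Any = config
    for name in parents:
        node = node.get(name)
        if not isinstance(node, dict):
            return False, None
    if leaf not in node:
        return False, None
    return True, node.pop(leaf)


def _set_path(config: Dict[str, Any], dotted: str, value: Any) -> None:
    """Set a dotted path, keeping any value already stored under the new key."""
    *parents, leaf = dotted.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node.setdefault(leaf, value)


def migrate_deprecated_settings(config: Dict[str, Any]) -> List[str]:
    """Move deprecated keys to their replacements; return the keys that moved.

    Parent sections that become empty are left in place.
    """
    moved = []
    for old_key, new_key in RENAMED_SETTINGS.items():
        found, value = _pop_path(config, old_key)
        if found:
            _set_path(config, new_key, value)
            moved.append(old_key)
    return moved
```

When both the old and the new key are present, this sketch keeps the new key's value and drops the old one; that precedence is a design choice of the example, not documented Agent behavior.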

Bug Fixes

  • APM: Fix a bug introduced in Agent versions 7.44 and 6.44 that changed the expected strings separator from comma to space when multiple features are defined in DD_APM_FEATURES. Now either separator can be used (for example, DD_APM_FEATURES="feat1,feat2" or DD_APM_FEATURES="feat1 feat2").
  • Add a workaround for erroneous database connection loss handling in go-ora.
  • If no NTP servers are reachable, datadog-agent status now displays ERROR for the NTP check, rather than OK.
  • Fixes a bug in auto-discovery annotations processing where two consecutive percent characters were wrongly altered even if they were not part of a %%var%% template variable pattern.
  • Fix memory leak by closing the time ticker in orchestrator check when the check is done.
  • Fixes a panic occurring when an entry in /etc/services does not follow the format port/protocol: https://gitlab.com/cznic/libc/-/issues/25
  • Fixes the inclusion of the security-agent.yaml file in the flare.
  • [apm] fix an issue for service and peer.service normalization where names starting with a digit are incorrectly considered as invalid
  • Fix building a local flare to use the expvar_port from the config instead of the default port.
  • Use a locale-independent format for the swap size sent in the metadata, to avoid issues when parsing the value in the frontend.
  • Fixes a bug where the metric with timestamps pipeline could have wrongly processed metrics without timestamps (when both pipelines were flooded), potentially leading to inaccuracies.
  • Fixes an issue where process_config.max_per_message and process_config.max_message_bytes were ignored when set larger than the default values, and increases the limit on accepted values for these variables.
  • rtloader: Use execinfo only if provided to fix builds on C libraries like musl.
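
To illustrate the DD_APM_FEATURES fix in the list above, here is a small sketch of a parser that accepts either separator. It is an illustration only, not the Agent's parser (the Agent is written in Go); the function name is hypothetical, and accepting a mix of commas and spaces in one value is an assumption that goes beyond what the note states.

```python
import re
from typing import List


def parse_apm_features(raw: str) -> List[str]:
    """Split a DD_APM_FEATURES-style value on commas and/or whitespace.

    Empty fragments (for example from a trailing comma) are discarded.
    """
    return [feature for feature in re.split(r"[,\s]+", raw.strip()) if feature]


if __name__ == "__main__":
    # Both forms from the release note yield the same feature list.
    print(parse_apm_features("feat1,feat2"))
    print(parse_apm_features("feat1 feat2"))
```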

Other Notes

  • Service check datadog.agent.check_status is now disabled by default. To re-enable, set integration_check_status_enabled to true.

Datadog Cluster Agent

Upgrade Notes

  • Add support for leases in leader election, available since Kubernetes version 1.14. Enable them by setting leader_election_default_resource to leases. If this parameter is empty, leader election automatically detects whether leases are available and uses them. Set leader_election_default_resource to configmap on clusters running Kubernetes versions earlier than 1.14.

New Features

  • Auto-instrumentation admission controller now automatically activates crash tracking for Java applications

Enhancement Notes

  • Expose HistogramBuckets and Events check stats to the cluster-agent. This should help the cluster-agent dispatch cluster checks more effectively.

Bug Fixes

  • The Cluster Agent Admission Controller now injects DD_DOGSTATSD_URL when used in socket mode (default), allowing DogStatsD clients to work without configuration.
  • Fix persistent volume type for local volumes.
datadog-agent - 7.46.0

Published by kacper-murzyn over 1 year ago

Agent

Prelude

Release on: 2023-07-10

Upgrade Notes

  • Refactor the SBOM collection parameters from:

    conf.d/container_lifecycle.d/conf.yaml existence (A) # to schedule the container lifecycle long running check
    conf.d/container_image.d/conf.yaml     existence (B) # to schedule the container image metadata long running check
    conf.d/sbom.d/conf.yaml                existence (C) # to schedule the SBOM long running check
    
    Inside datadog.yaml:
    
    container_lifecycle:
      enabled:                        (D)  # Used to control the start of the container_lifecycle forwarder but has been decommissioned by #16084 (7.45.0-rc)
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
    
    container_image:
      enabled:                        (E)  # Used to control the start of the container_image forwarder but has been decommissioned by #16084 (7.45.0-rc)
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
    
    sbom:
      enabled:                        (F)  # controls host SBOM collection and does **not** control container-related SBOM since #16084 (7.45.0-rc)
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
      analyzers:                      (G)  # trivy analyzers used for host SBOM collection
      cache_directory:                (H)
      clear_cache_on_exit:            (I)
      use_custom_cache:               (J)
      custom_cache_max_disk_size:     (K)
      custom_cache_max_cache_entries: (L)
      cache_clean_interval:           (M)
    
    container_image_collection:
      metadata:
        enabled:                      (N)  # Controls the collection of the container image metadata in workload meta
      sbom:
        enabled:                      (O)
        use_mount:                    (P)
        scan_interval:                (Q)
        scan_timeout:                 (R)
        analyzers:                    (S)  # trivy analyzers used for containers SBOM collection
        check_disk_usage:             (T)
        min_available_disk:           (U)
    

    to:

    conf.d/{container_lifecycle,container_image,sbom}.d/conf.yaml no longer needs to be created. A default version is always shipped with the Agent Docker image with an underscore-prefixed ad_identifier that will be synthesized by the agent at runtime based on config {container_lifecycle,container_image,sbom}.enabled parameters.
    
    Inside datadog.yaml:
    
    container_lifecycle:
      enabled:                        (A)  # Replaces the need for creating a conf.d/container_lifecycle.d/conf.yaml file
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > unchanged generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
    
    container_image:
      enabled:                        (B)  # Replaces the need for creating a conf.d/container_image.d/conf.yaml file
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > unchanged generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
    
    sbom:
      enabled:                        (C)  # Replaces the need for creating a conf.d/sbom.d/conf.yaml file
      dd_url:                              # \
      additional_endpoints:                # |
      use_compression:                     # |
      compression_level:                   #  > unchanged generic parameters for the generic EVP pipeline
        …                                  # |
      use_v2_api:                          # /
      cache_directory:                (H)
      clear_cache_on_exit:            (I)
      cache:                               # Factorize all settings related to the custom cache
        enabled:                      (J)
        max_disk_size:                (K)
        max_cache_entries:            (L)
        clean_interval:               (M)
    
      host:                                # for host SBOM parameters that were directly below `sbom` before.
        enabled:                      (F)  # sbom.host.enabled replaces sbom.enabled
        analyzers:                    (G)  # sbom.host.analyzers replaces sbom.analyzers
    
      container_image:                     # sbom.container_image replaces container_image_collection.sbom
        enabled:                      (O)
        use_mount:                    (P)
        scan_interval:                (Q)
        scan_timeout:                 (R)
        analyzers:                    (S)    # trivy analyzers used for containers SBOM collection
        check_disk_usage:             (T)
        min_available_disk:           (U)
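
    The key moves marked (F) through (U) above can be sketched as a small transformation over a parsed datadog.yaml. This is a hypothetical helper, assuming the configuration is held as nested Python dictionaries; the sbom_check_configured argument stands in for the former existence of conf.d/sbom.d/conf.yaml (C). The notes do not show a new home for container_image_collection.metadata.enabled (N), so the sketch leaves it where it is.

```python
import copy
from typing import Any, Dict

# Old sbom.* cache keys -> new sbom.cache.* keys: (J), (K), (L), (M).
CACHE_RENAMES = {
    "use_custom_cache": "enabled",
    "custom_cache_max_disk_size": "max_disk_size",
    "custom_cache_max_cache_entries": "max_cache_entries",
    "cache_clean_interval": "clean_interval",
}
# Old sbom.* keys that move under sbom.host: (F), (G).
HOST_KEYS = ("enabled", "analyzers")


def migrate_sbom_settings(old_config: Dict[str, Any],
                          sbom_check_configured: bool) -> Dict[str, Any]:
    """Return a copy of old_config rewritten to the 7.46.0 SBOM layout."""
    new_config = copy.deepcopy(old_config)
    old_sbom = new_config.pop("sbom", {})

    # (C): sbom.enabled now replaces the conf.d/sbom.d/conf.yaml file.
    new_sbom: Dict[str, Any] = {"enabled": sbom_check_configured}
    host: Dict[str, Any] = {}
    cache: Dict[str, Any] = {}
    for key, value in old_sbom.items():
        if key in HOST_KEYS:
            host[key] = value
        elif key in CACHE_RENAMES:
            cache[CACHE_RENAMES[key]] = value
        else:
            # dd_url, cache_directory, clear_cache_on_exit, ... are unchanged.
            new_sbom[key] = value
    if host:
        new_sbom["host"] = host
    if cache:
        new_sbom["cache"] = cache

    # sbom.container_image replaces container_image_collection.sbom: (O)-(U).
    collection = new_config.get("container_image_collection", {})
    if "sbom" in collection:
        new_sbom["container_image"] = collection.pop("sbom")

    new_config["sbom"] = new_sbom
    return new_config
```

    The function works on a deep copy, so the original dictionary can be kept for comparison or rollback.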
    

New Features

  • This change adds support for ingesting information such as database settings and schemas as database "metadata"

  • Add the capability for the security-agent compliance module to export detailed Kubernetes node configurations.

  • Add unsafe-disable-verification flag to skip TUF/in-toto verification when downloading and installing wheels with the integrations install command

  • Add container.memory.working_set metric on Linux (computed as Usage - InactiveFile) and Windows (mapped to Private Working Set)

  • Enabling dogstatsd_metrics_stats_enable will now enable dogstatsd_logging_enabled. When enabled, dogstatsd_logging_enabled generates dogstatsd log files at:

    • For Windows:
      c:\programdata\datadog\logs\dogstatsd_info\dogstatsd-stats.log
    • For Linux:
      /var/log/datadog/dogstatsd_info/dogstatsd-stats.log
    • For MacOS:
      /opt/datadog-agent/logs/dogstatsd_info/dogstatsd-stats.log

    These log files are also automatically attached to the flare.

  • You can adjust the dogstatsd-stats logging configuration by using:

    • dogstatsd_log_file_max_size: SizeInBytes (default: dogstatsd_log_file_max_size:"10Mb")
    • dogstatsd_log_file_max_rolls: Int (default: dogstatsd_log_file_max_rolls:3)
  • The network_config.enable_http_monitoring configuration has changed to service_monitoring_config.enable_http_monitoring.

  • Add Oracle execution plans

  • Add Oracle query metrics

  • Add support for Oracle RDS multi-tenant

Enhancement Notes

  • agent status -v now shows verbose diagnostic information. Added tailer-specific stats to the verbose status page with improved auto multi-line detection information.
  • The health command from the Agent and Cluster Agent now has a configurable timeout (60 seconds by default).
  • Add two new metrics to the Kubernetes State Core check: kubernetes_state.configmap.count and kubernetes_state.secret.count.
  • The metadata payload containing the status of every integration run by the Agent is now sent one minute after startup and then every ten minutes after that, as before. This means that the integration status will be visible in the app one minute after the Agent starts instead of ten minutes. The payload waits for a minute so the Agent has time to run every configured integration twice and collect an accurate status.
  • Adds the ability to generate an Oracle SQL trace for Agent queries
  • APM: The disable_file_logging setting is now respected.
  • Collect conditions for a variety of Kubernetes resources.
  • Documents the max_recv_msg_size_mib option and DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_MAX_RECV_MSG_SIZE_MIB environment variable in the OTLP config. This variable is used to configure the maximum size of messages accepted by the OTLP gRPC endpoint.
  • Agents are now built with Go 1.19.10
  • Inject container tags in instrumentation telemetry payloads
  • Extract the task_arn tag from container tags and add it as its own header.
  • [pkg/netflow] Add flush_timestamp to payload.
  • [pkg/netflow] Add sequence metrics.
  • [netflow] Upgrade goflow2 to v1.3.3.
  • Add Oracle sysmetrics, pga process memory usage, tablespace usage with pluggable database (PDB) tags
  • OTLP ingestion: Support setting peer service to trace stats exported by the Agent.
  • OTLP ingestion: Stop overriding service with peer.service.
  • OTLP ingestion: Set OTLP span kind as Datadog span meta tag span.kind.
  • Adds new metric datadog.agent.otlp.runtime_metrics when runtime metrics are being received via OTLP.
  • [corechecks/snmp] Collect topology by default.
  • Upgraded JMXFetch to 0.47.9 which has fixes to improve efficiency when fetching beans, fixes for process attachment in some JDK versions, and fixes a thread leak.

Deprecation Notes

  • Installing the Agent on Windows Server versions lower than 2012 and client versions lower than 8.1 is now deprecated.
  • The network_config.enable_http_monitoring configuration is now deprecated. Use service_monitoring_config.enable_http_monitoring instead.

Security Notes

  • Upgraded embedded Python3 to 3.8.17; addressed CVE-2023-24329.

Bug Fixes

  • Fix an issue where auto_multi_line_detection, auto_multi_line_sample_size, and auto_multi_line_match_threshold were not working when set through a pod annotation or container label.
  • Ensure the Agent detects file rotations correctly when under heavy loads.
  • Fixes kubernetes_state_core crash when unknown resources are provided.
  • Fix a file descriptors leak in the Cloud Foundry Cluster Agent.
  • Fix the timeout for idle HTTP connections.
  • [netflow] Rename telemetry metric tag device_ip to exporter_ip.
  • When present, use 'host' resource attribute as the host value on OTLP payloads to avoid double tagging.
  • Remove thread count from OTel .NET runtime metric mappings.
  • Fix collection of I/O and open files data in the process check.
  • Fix unexpected warn log when using mapping in SNMP profiles.
  • Upgrade go-ora to 2.7.6 to prevent Agent crashes due to nil pointer dereference in case of database connection loss.

Datadog Cluster Agent

New Features

  • Enable collection of Vertical Pod Autoscalers by default in the orchestrator check.

Enhancement Notes

  • Collect conditions for a variety of Kubernetes resources.
  • Collect persistent volume source in the orchestrator check.

Bug Fixes

  • Fix the timeout for idle HTTP connections.
  • When the cluster-agent is started with hostNetwork: true, the leader election mechanism was using a node name instead of the pod name. This was breaking the “follower to leader” forwarding mechanism. This change introduces the DD_POD_NAME environment variable as a more reliable way to set the cluster-agent pod name. It is expected to be populated through the Kubernetes downward API.
datadog-agent - 7.45.1

Published by kacper-murzyn over 1 year ago

Prelude

Release on: 2023-06-27

Security Notes

  • Bump ncurses to 6.4 in the Agent embedded environment. Fixes CVE-2023-29491.
  • Updated the version of OpenSSL used by Python to 1.1.1u; addressed CVE-2023-2650, CVE-2023-0466, CVE-2023-0465 and CVE-2023-0464.
datadog-agent - 7.45.0

Published by kacper-murzyn over 1 year ago

Agent

Prelude

Release on: 2023-06-05

New Features

  • Add Topology data collection with CDP.
  • APM: Addition of configuration to add peer.service to trace stats exported by the Agent.
  • APM: Addition of configuration to compute trace stats on spans based on their span.kind value.
  • APM: Added a new endpoint in the trace-agent API /symdb/v1/input that acts as a reverse proxy forwarding requests to Datadog. The feature using this is currently in development.
  • Add support for confluent-kafka.
  • Add support for XCCDF benchmarks in CSPM. A new configuration option, 'compliance_config.xccdf.enabled', disabled by default, has been added for enabling XCCDF benchmarks.
  • Add arguments to module load events
  • Oracle DBM monitoring with activity sampling. The collected samples form the foundation for database load profiling. With Datadog GUI, samples can be aggregated and filtered to identify bottlenecks.
  • Add reporting of container.{cpu|memory|io}.partial_stall metrics based on PSI Some values when host is running with cgroupv2 enabled (Linux only). This metric provides the wall time (in nanoseconds) during which at least one task in the container has been stalled on the given resource.
  • Adding a new option secret_backend_remove_trailing_line_break to remove trailing line breaks from secrets returned by secret_backend_command. This makes it easier to use secret management tools that automatically add a line break when exporting secrets through files.

Enhancement Notes

  • Cluster Agent: User config, cluster Agent deployment and node Agent daemonset manifests are now added to the flare archive, when the Cluster Agent is deployed with Helm (version 3.23.0+).

  • Datadog Agent running as a systemd service can optionally read environment variables from a text file /etc/datadog-agent/environment containing newline-separated variable assignments. See https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment

  • Add ability to filter Kubernetes containers based on autodiscovery annotation. Containers in a pod can now be omitted by setting ad.datadoghq.com/<container_name>.exclude as an annotation on the pod. Logs can now be omitted by setting ad.datadoghq.com/<container_name>.logs_exclude as an annotation on the pod.

  • Added support for custom resource definitions metrics: crd.count and crd.condition.

    • Remove BadgerDB cache for Trivy.
    • Add new custom LRU cache for Trivy backed by BoltDB and parametrized by:
    • Periodically delete unused entries from the custom cache.
    • Add telemetry metrics to monitor the cache:
      • sbom.cached_keys: Number of cache keys stored in memory
      • sbom.cache_disk_size: Total size, in bytes, of the database as reported by BoltDB.
      • sbom.cached_objects_size: Total size, in bytes, of cached SBOM objects on disk. Limited by sbom.custom_cache_max_disk_size.
      • sbom.cache_hits_total: Total number of cache hits.
      • sbom.cache_misses_total: Total number of cache misses.
      • sbom.cache_evicts_total: Total number of cache evicts.
  • Added DD_ENV to the SBOMPayload in the SBOM check.

  • Added kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics part of the kubernetes_state_core check.

  • Add support for emitting resources on metrics from tags in the format dd.internal.resource:type,name.

  • APM: Dynamic instrumentation logs and snapshots can now be shipped to multiple Datadog logs intakes.

  • Adds support for OpenTelemetry span links to the Trace Agent OTLP endpoint when converting OTLP spans (span links are added as metadata to the converted span).

  • Agents are now built with Go 1.19.9.

  • Make the Podman DB path configurable for rootless environments. For example, it can now be set to $HOME/.local/share/containers/storage/libpod/bolt_state.db.

  • Add ownership information for containers to the container-lifecycle check.

  • Add Pod exit timestamp to container-lifecycle check.

  • The Agent now uses the ec2_metadata_timeout value when fetching EC2 instance tags with AWS SDK. The Agent fetches instance tags when collect_ec2_tags is set to true.

  • Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.

  • Kubernetes State Metrics Core: Adds collection of Kubernetes APIServices metrics

  • Add support for URLs with the http|https scheme in the dd_url or logs_dd_url parameters when configuring endpoints. Also automatically detects SSL needs, based on the scheme when it is present.

  • [pkg/netflow] Add NetFlow Exporter to NDM Metadata.

  • SUSE RPMs are now built with RPM 4.14.3 and have SHA256 digest headers.

  • observability_pipelines_worker can now be used in place of the vector config options.

  • Add an option and an annotation to skip kube_service tags on Kubernetes pods.

    When the selector of a service matches a pod and that pod is ready, its metrics are decorated with a kube_service tag.

    When the readiness of a pod flips, so does the kube_service tag. This could create visual artifacts (spikes when the tag flips) on dashboards where the queries are missing .fill(null).

    If many services target a pod, the total number of tags attached to its metrics might exceed a limit that causes the whole metric to be discarded.

    In order to mitigate these two issues, it’s now possible to set the kubernetes_ad_tags_disabled parameter to a list containing kube_service to globally remove the kube_service tags on all pods:

      kubernetes_ad_tags_disabled:
        - kube_service

    It’s also possible to add a tags.datadoghq.com/disable: kube_service annotation on only the pods for which we want to remove the kube_service tag.

    Note that kube_service is the only tag that can be removed via this parameter and this annotation.

  • Support OTel semconv 1.17.0 in OTLP ingest endpoint.

  • When otlp_config.metrics.histograms.send_aggregation_metrics is set to true, the OTLP ingest pipeline will now send min and max metrics for delta OTLP Histograms and OTLP Exponential Histograms when available, in addition to count and sum metrics.

    The deprecated option otlp_config.metrics.histograms.send_count_sum_metrics now also sends min and max metrics when available.

  • OTLP: Use minimum and maximum values from cumulative OTLP Histograms. Values are used only when we can assume they are from the last time window or otherwise to clamp estimates.

  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.75.0.

  • Secrets with ENC[] notation are now supported for proxy settings from environment variables. For more information, refer to the [Secrets Management](https://docs.datadoghq.com/agent/guide/secrets-management/) and [Agent Proxy Configuration](https://docs.datadoghq.com/agent/proxy/) documentation.

  • [corechecks/snmp] Adds ability to send constant metrics in SNMP profiles.

  • [corechecks/snmp] Adds ability to map metric tag value to string in SNMP profiles.

  • [corechecks/snmp] Add support to format bytes into ip_address
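
The dd_url / logs_dd_url note above says that an http or https scheme is now accepted and that SSL is inferred from the scheme when one is present. The sketch below shows one way such inference could look. It is illustrative only: the function name, the host:port fallback form, and the default used when no scheme is given are assumptions, not the Agent's implementation.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_endpoint(value: str, default_use_ssl: bool = True) -> Tuple[str, Optional[int], bool]:
    """Return (host, port, use_ssl) for an endpoint setting.

    With a scheme, use_ssl follows the scheme (https -> True, http -> False).
    Without one, the value is read as host[:port] and default_use_ssl applies
    (an assumption made for this example).
    """
    if "://" in value:
        parts = urlsplit(value)
        if parts.scheme not in ("http", "https"):
            raise ValueError(f"unsupported scheme: {parts.scheme!r}")
        if parts.hostname is None:
            raise ValueError(f"missing host in: {value!r}")
        return parts.hostname, parts.port, parts.scheme == "https"

    host, _, port_text = value.partition(":")
    port = int(port_text) if port_text else None
    return host, port, default_use_ssl


if __name__ == "__main__":
    # Placeholder hostnames, not real Datadog endpoints.
    print(parse_endpoint("https://intake.example.test:443"))
    print(parse_endpoint("http://proxy.example.test:8080"))
    print(parse_endpoint("intake.example.test:10516"))
```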

Deprecation Notes

  • APM OTLP: Field UsePreviewHostnameLogic is deprecated, and usage of this field has been removed. This is done in preparation to graduate the exporter.datadog.hostname.preview feature gate to stable.
  • The Windows Installer NPM feature option, used in ADDLOCAL=NPM and REMOVE=NPM, no longer controls the install state of NPM components. The NPM components are now always installed, but will only run when enabled in the agent configuration. The Windows Installer NPM feature option still exists for backwards compatibility purposes, but has no effect.
  • Deprecate otlp_config.metrics.histograms.send_count_sum_metrics in favor of otlp_config.metrics.histograms.send_aggregation_metrics.
  • Removed the --info flag in the Process Agent, which has been replaced by the status command since 7.35.

Security Notes

  • Handle the return value of Close() for writable files in pkg/forwarder
  • Fixes CWE-703: handle the return value of Close() for writable files and force writes to disk in system-probe.

Bug Fixes

  • APM: Setting apm_config.receiver_port: 0 now allows enabling UNIX Socket or Windows Pipes listeners.
  • APM: OTLP: Ensure that container tags are set globally on the payload so that they can be picked up as primary tags in the app.
  • APM: Fixes a bug with how stats are calculated when using single span sampling along with other sampling configurations.
  • APM: Fixed the issue where not all trace stats are flushed on trace-agent shutdown.
  • Fix an issue on the pod collection where the cluster name would not be consistently RFC1123 compliant.
  • Make the agent able to detect it is running on ECS EC2, even with a host install, i.e. when the agent isn’t deployed as an ECS task.
  • Fix missing case-sensitive version of the device tag on the system.disk group of metrics.
  • The help output of the Agent command now correctly displays the executable name on Windows.
  • Fix resource requirements detection for containers without any request and limit set.
  • The KSM core check now correctly handles labels and annotations with uppercase letters defined in the "labels_as_tags" and "annotations_as_tags" config attributes.
  • Fixes an issue where trace data was dropped in OTLP ingest by adding a batch processor for traces, and increases the gRPC message limit.
  • [pkg/netflow] Rename payload device.ip to exporter.ip
  • Fixes an issue in the process agent where in rare scenarios, negative CPU usage percentages would be reported for processes.
  • When a pod was annotated with prometheus.io/scrape: true, the Agent used to schedule one openmetrics check per container in the pod unless a datadog.prometheusScrape.additionalConfigs[].autodiscovery.kubernetes_container_names list was defined, which restricted the potential container targets. The Agent is now able to leverage the prometheus.io/port annotation to schedule an openmetrics check only on the container of the pod that declares that port in its spec.
  • Fixes an issue with the Prometheus scrape feature when the service_endpoints option is used, where endpoint updates were missed by the Agent, causing checks not to be scheduled on endpoints created after Agent start.
  • On Windows, when using USM, fixes tracking of connections made via localhost.
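
The prometheus.io/port fix in the list above can be pictured with a small sketch. It assumes a pod represented as a plain dictionary shaped like a Kubernetes pod manifest, and the function name containers_to_scrape is hypothetical; this is an illustration of the selection rule as the note describes it, not the Agent's code. Returning an empty list when no container declares the annotated port is an assumption, since the note does not cover that case.

```python
def containers_to_scrape(pod):
    """Return the names of containers that would get an openmetrics check.

    Hypothetical helper: `pod` is a dict shaped like a Kubernetes pod manifest.
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    containers = pod.get("spec", {}).get("containers", [])

    # Only pods that opt in through the scrape annotation are considered.
    if annotations.get("prometheus.io/scrape") != "true":
        return []

    port = annotations.get("prometheus.io/port")
    if port is None:
        # No port hint: every container in the pod is a potential target.
        return [c["name"] for c in containers]

    # Keep only the containers that declare the annotated port in their spec.
    return [
        c["name"]
        for c in containers
        if any(str(p.get("containerPort")) == port for p in c.get("ports", []))
    ]
```

With a pod that runs an application container exposing port 9090 next to a sidecar exposing port 8080, and a prometheus.io/port annotation of "9090", only the application container is selected.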

Datadog Cluster Agent

Enhancement Notes

  • Add an "active" tag to the datadog.cluster_agent.external_metrics.datadog_metrics telemetry. The active tag is true if the DatadogMetrics CR is used, false otherwise.
  • Library injection via Admission Controller: Allow configuring the CPU and Memory requests/limits for library init containers.
  • Validate the orchestration config provided by the user.

Bug Fixes

  • Fix the admission controller in socket mode for pods with init containers.
  • Fix resource requirements detection for containers without any request and limit set.
  • The KSM core check now correctly handles labels and annotations with uppercase letters defined in the "labels_as_tags" and "annotations_as_tags" config attributes.
datadog-agent - 7.44.1

Published by kacper-murzyn over 1 year ago

Prelude

Release on: 2023-05-16

Enhancement Notes

  • Agents are now built with Go 1.19.8.
  • Added optional config flag process_config.cache_lookupid to cache calls to user.LookupId in the process Agent. Use it to minimize the number of calls to user.LookupId and avoid a potential leak.
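
The idea behind process_config.cache_lookupid is ordinary memoization of a user-ID lookup. The sketch below shows the pattern in Python as a stand-in for the Agent's Go code; make_cached_lookup is a hypothetical helper, and the lookup callable it wraps is whatever resolves a UID to a user on your platform. The cache size is an arbitrary illustrative value.

```python
from functools import lru_cache


def make_cached_lookup(lookup, maxsize=1024):
    """Wrap a uid -> user lookup so repeated UIDs are served from a cache.

    `lookup` is any callable taking a hashable UID; `maxsize` bounds the cache
    so it cannot grow without limit.
    """
    return lru_cache(maxsize=maxsize)(lookup)
```

Each distinct UID reaches the underlying lookup once; later requests for the same UID are answered from the cache until the entry is evicted.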

Bug Fixes

  • Fixes the inclusion of the security-agent.yaml file in the flare.
datadog-agent - 7.44.0

Published by kacper-murzyn over 1 year ago

Agent

Prelude

Release on: 2023-04-27

New Features

  • Added HTTP/2 parsing logic to Universal Service Monitoring.
  • Added Universal Service Monitoring to the Agent status check. Datadog now has visibility into the status of Universal Service Monitoring, and startup failures appear in the status check.
  • In the agent.log, a DEBUG, WARN, and ERROR log have been added to report how many file handles the core Agent process has open. The DEBUG log reports the info, the WARN log appears when the core Agent is over 90% of the OS file limit, and the ERROR log appears when the core Agent has reached 100% of the OS file limit. In the Agent status command, fields CoreAgentProcessOpenFiles and OSFileLimit have been added to the Logs Agent section. This feature is currently for Linux only.
  • APM: Collect trace agent startup errors and successes using instrumentation-telemetry "apm-onboarding-event" messages.
  • APM OTLP: Introduce OTLP Ingest probabilistic sampling, configurable via otlp_config.traces.probabilistic_sampler.sampling_percentage.
  • The Datadog Admission Controller can inject the .NET APM library into Kubernetes containers for auto-instrumentation.
  • Enable CWS Security Profiles by default.
  • Support the config additional_endpoints for Data Streams monitoring.
  • Added support for collecting container image metadata when using Docker.
  • Added Kafka parsing logic to system-probe
  • Allow writing SECL rules against container creation time through the new container.created_at field, similar to the existing process.created_at field. The container creation time is also reported in the sent events.
  • [experimental] CWS generates an SBOM for any running workload on the machine.
  • [experimental] CWS events are enriched with SBOM data.
  • [experimental] CWS activity dumps are enriched with SBOM data.
  • Enable OTLP endpoint for receiving traces in the Datadog Lambda Extension.
  • On Windows, when service inference is enabled, process_context tags can now be populated by the service name in the SCM. This feature can be controlled by either the service_monitoring_config.process_service_inference.enabled config setting in the user's datadog.yaml config file, or it can be configured via the DD_SYSTEM_PROBE_PROCESS_SERVICE_INFERENCE_USE_WINDOWS_SERVICE_NAME environment variable. This setting is enabled by default.
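
The file-handle logging thresholds described in the list above map onto a simple decision function. This is a sketch under stated assumptions: the function name file_handle_log_level is hypothetical, exactly 90% is treated as not yet "over 90%", and usage above the limit is treated the same as reaching it.

```python
def file_handle_log_level(open_files, os_limit):
    """Pick the log level for the core Agent's open file handle count."""
    if os_limit <= 0:
        raise ValueError("os_limit must be positive")

    # Reached (or exceeded) the OS file limit.
    if open_files >= os_limit:
        return "ERROR"

    # Strictly over 90% of the limit; integer math avoids float rounding.
    if open_files * 10 > os_limit * 9:
        return "WARN"

    # Otherwise the count is only reported for information.
    return "DEBUG"
```

For a limit of 1024 handles, 500 open files yields DEBUG, 950 yields WARN, and 1024 yields ERROR.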

Enhancement Notes

  • Added the kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics to the kubernetes_state_core check.

  • The status page now includes a Status render errors section to highlight errors that occurred while rendering it.

  • APM:

    • Run the /debug/* endpoints in a separate server, which uses port 5012 by default and only listens on 127.0.0.1. The port is configurable through apm_config.debug.port and DD_APM_DEBUG_PORT; set it to 0 to disable the server.
    • Scrub the content served by the expvar endpoint.
  • APM: apm_config.features is now configurable from the Agent configuration file. It was previously only configurable via DD_APM_FEATURES.

  • Agents are now built with Go 1.19.7.

  • The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.71.0.

  • Collect Kubernetes Pod conditions.

  • Added the "availability-zone" tag to the Fargate integration. This matches the tag emitted by other AWS infrastructure integrations.

  • Allow reporting all gathered data when container metrics retrieval partially fails.

  • Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.

  • JMXFetch upgraded to 0.47.5 which now supports pulling metrics from javax.management.openmbean.TabularDataSupport. Also contains a fix for pulling metrics from javax.management.openmbean.TabularDataSupport when no tags are specified.

  • Updated chunking util and use cases to use generics. No behavior change.

  • [corechecks/snmp] Add interface_configs to override interface speed.

  • No longer increments TCP retransmit count when the retransmit fails.

  • The OTLP ingestion endpoint now supports the same settings and protocols as the OpenTelemetry Collector OTLP receiver v0.70.0.

  • Changes the retry mechanism for starting workloadmeta collectors so that, instead of retrying every 30 seconds, it retries with an exponential backoff that has an initial interval of 1s and a maximum of 30s. In general, this should help collectors that failed on the first try start sooner.

  • Added the "pull_duration" metric in the workloadmeta telemetry. It measures the time that it takes to pull from the collectors.
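
The workloadmeta retry change in the list above replaces a fixed 30-second wait with an exponential backoff. The sketch below generates such a schedule. The initial interval of 1s and the cap of 30s come from the note; the doubling factor and the absence of jitter are assumptions, and backoff_intervals is a hypothetical name.

```python
from itertools import islice


def backoff_intervals(initial=1.0, maximum=30.0, factor=2.0):
    """Yield retry delays in seconds, growing by `factor` up to `maximum`."""
    interval = initial
    while True:
        yield min(interval, maximum)
        # Grow the delay for the next attempt, never past the cap.
        interval = min(interval * factor, maximum)


# First seven delays under the assumed doubling factor.
first_delays = list(islice(backoff_intervals(), 7))
```

Under these assumptions a collector that fails once is retried after 1 second rather than 30, and only repeated failures reach the 30-second ceiling.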

Deprecation Notes

  • Marked the "availability_zone" tag as deprecated for the Fargate integration, in favor of "availability-zone".
  • Configuration enable_sketch_stream_payload_serialization is now deprecated.

Security Notes

  • The Agent now checks containerd containers Spec size before parsing it. Any Spec exceeding 2MB will not be parsed and a warning will be emitted. This impacts the container_env_as_tags feature and %%hostname%% variable resolution for environments based on containerd outside of Kubernetes.
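
The size check described above follows a common defensive pattern: measure the payload before handing it to a parser. The sketch below illustrates that pattern on a JSON document. It is not the Agent's code; parse_spec and MAX_SPEC_SIZE are hypothetical names, and interpreting 2MB as 2 * 1024 * 1024 bytes is an assumption.

```python
import json
import logging

# Assumed interpretation of the 2MB limit mentioned in the note.
MAX_SPEC_SIZE = 2 * 1024 * 1024


def parse_spec(raw):
    """Parse a serialized container Spec, skipping payloads over the limit.

    Returns the parsed object, or None when the payload is too large.
    """
    if len(raw) > MAX_SPEC_SIZE:
        # Oversized Specs are never parsed; a warning is emitted instead.
        logging.warning("container Spec is %d bytes, larger than %d; skipping",
                        len(raw), MAX_SPEC_SIZE)
        return None
    return json.loads(raw)
```

A caller that relies on the Spec, for example to read environment variables for tagging, has to handle the None result, which is why the note lists the features affected by the limit.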

Bug Fixes

  • APM: Fix issue where the dogstatsd proxy would not work when the bind address was set to localhost on macOS.
  • APM: Fix issue where setting bind_host to "::1" would break runtime metrics for the trace-agent.
  • APM: Fix issue where the Trace Agent did not print critical init errors.
  • Fixes a bug where ignored container files (that were not tailed) were incorrectly counted against the total open files.
  • Fixes the configuration parsing of the "container_lifecycle" check. Custom config values were not being applied.
  • Corrects dogstatsd metric message validation to support all current (and some future) dogstatsd features
  • Avoid panic in kubernetes_state_core check with specific Ingress objects configuration.
  • Fixes a divide-by-zero panic when sketch serialization fails on the last metric of a given batch
  • Fix issue introduced in 7.43 that prevents the Datadog Agent Manager application from executing from the checkbox at the end of the Datadog Agent installation when the installer is run by a non-elevated administrator user.
  • Fixes a problem with USM and IIS on Windows Server 2022 due to a change in the way Microsoft reports IIS connections.
  • Fixes the labelsAsTags parameter of the kube-state metrics core check. Tags were not properly formatted when they came from a label on one resource type (for example, namespace) and turned into a tag on another resource type (for example, pod).
  • The OTLP ingest endpoint does not report the first cumulative monotonic sum value if the start timestamp of the timeseries matches its timestamp.
  • Prevent disallowlisting on an empty command line for processes in the Process Agent when encountering a failure to parse; use the exe value instead.
  • Make the SNMP Listener support all authProtocol values.
  • Fix an issue where agent status would show incorrect system-probe status for 15 seconds as the system-probe started up.
  • Fix partial loss of NAT info in system-probe for pre-existing connections.
  • Replace ; with & in the URL to open GUI to follow golang.org/issue/25192.
  • Workloadmeta now avoids concurrent pulls from the same collector. This bug could lead to incorrect or missing data when the collectors were too slow pulling data.
  • Fixes a bug that prevents the containerd workloadmeta collector from starting sometimes when container_image_collection.metadata.enabled is set to true.
  • Fixed a bug in the SBOM collection feature. In certain cases, some SBOMs were not collected.
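
The workloadmeta fix in the list above is about never running two pulls of the same collector at once. One common way to get that guarantee is a non-blocking lock per collector, sketched below. GuardedCollector is a hypothetical class, and skipping an overlapping pull (rather than queueing it) is an assumption; the note does not say which strategy the Agent uses.

```python
import threading


class GuardedCollector:
    """Wrap a pull function so that overlapping pulls are skipped."""

    def __init__(self, pull):
        self._pull = pull
        self._lock = threading.Lock()

    def try_pull(self):
        """Run the pull unless one is already in progress.

        Returns True if the pull ran, False if it was skipped.
        """
        # Non-blocking acquire: fail fast instead of waiting for the holder.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._pull()
            return True
        finally:
            self._lock.release()
```

With this guard, a slow collector delays its own next pull instead of having two pulls interleave and overwrite each other's results.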

Other Notes

  • The logs_config.cca_in_ad setting has been removed.

Datadog Cluster Agent

New Features

  • Add conditions to Vertical Pod Autoscalers
  • Experimental: Support Ruby library injection through the Admission Controller on Kubernetes.

Enhancement Notes

  • Add new metrics for the KSM Core check for extended resources:
    • Pod requests and limits of the network bandwidth extended resource: kubernetes_state.container.network_bandwidth_limit, kubernetes_state.container.network_bandwidth_requested
    • The capacity and allocatable network bandwidth extended resource of a node: kubernetes_state.node.network_bandwidth_allocatable, kubernetes_state.node.network_bandwidth_capacity
  • Admission Controller: Add telemetry around auto-instrumentation via remote config.
  • The UDS socket volume when using the Admission Controller is now mounted in readOnly mode.