datadog-agent

Main repository for Datadog Agent

APACHE-2.0 License

Stars
2.6K
Committers
551

datadog-agent - 7.31.1

Published by remeh about 3 years ago

Prelude

Release on: 2021-09-28

Bug Fixes

  • Fix CSPM not sending intake protocol causing lack of host tags.
datadog-agent - Datadog Cluster Agent 1.15.0

Published by CharlyF about 3 years ago

Prelude

Released on: 2021-09-13
Pinned to datadog-agent v7.31.0

New Features

  • Enable StatefulSet collection by default in the orchestrator check.
  • Add PV and PVC collection in the orchestrator check.
  • Added the ability to use the maxAge attribute defined in the datadogMetric CRD to override the global maxAge.
datadog-agent - Datadog Cluster Agent 1.14.0

Published by CharlyF about 3 years ago

Prelude

Released on: 2021-08-12
Pinned to datadog-agent v7.30.0

New Features

  • Enable DaemonSet collection by default in the orchestrator check. Add StatefulSet collection in the orchestrator check.

Enhancement Notes

  • The Cluster Agent's Admission Controller now uses the admissionregistration.k8s.io/v1 kubernetes API when available.
  • The Cluster Agent can be instructed to dispatch cluster checks without decrypting secrets. The node Agent or the cluster check runner will fetch the secrets after receiving the configurations from the Cluster Agent. This can be enabled by setting DD_SECRET_BACKEND_SKIP_CHECKS to true in the Cluster Agent config.
  • The Cluster Agent's external metrics provider now serves an OpenAPI endpoint.
  • Add the ability to change log_level at runtime. To set the log_level to debug the following command should be used: agent config set log_level debug.
  • Improve status and flare for the Cluster Check Runners.

Bug Fixes

  • Show different orchestrator status collection information between follower and leader.
  • Fix an edge case where the Admission Controller doesn't update the certificate according to the Cluster Agent configuration.
datadog-agent - Datadog Cluster Agent 1.13.0

Published by CharlyF about 3 years ago

Prelude

Released on: 2021-06-22
Pinned to datadog-agent v7.29.0

New Features

  • Collect the DaemonSet resources for the orchestrator explorer.

Enhancement Notes

  • The Cluster Agent exposes a new metric external_metrics.datadog_metrics to track the validity of DatadogMetric objects.

  • Add additional status information in orchestrator section output. Whether collection works and whether cluster name is set.

Bug Fixes

  • Autodetect EC2 cluster name

  • Decrease the Admission Controller timeout to avoid edge cases where high timeouts can cause ignoring the failurePolicy (see kubernetes/kubernetes#71508).

  • The Cluster Agent's admission controller now requires the pod label admission.datadoghq.com/enabled=true to inject standard labels. This optimizes the number of mutation webhook requests.

datadog-agent - 7.31.0

Published by kacper-murzyn about 3 years ago

Prelude

Release on: 2021-09-13

New Features

  • Added hostname_file as a configuration option that can be used to set
    the Agent's hostname.

  • APM: add a new HTTP proxy endpoint /appsec/proxy forwarding requests to Datadog's AppSec Intake API.

  • Add a new parameter (auto_exit) to allow the Agent to exit automatically based on some condition. Currently, the only supported method, "noprocess", triggers an exit if no other processes are visible to the Agent (taking into account HOST_PROC). Only available on POSIX systems.

  • Allow specifying the destination for dogstatsd capture files. This
    should help drop captures on mounted volumes, etc. If no destination
    is specified, the capture will default to the current behavior.

  • Allow capturing/replaying dogstatsd traffic compressed with zstd.
    This feature is now enabled by default for captures, but can still
    be disabled.

  • APM: Added endpoint for proxying Live Debugger requests.

  • Adds the ability to change log_level in the process agent at runtime using process-agent config set log_level <log-level>

  • Runtime security: add a new command to trigger the runtime security agent self test.

Enhancement Notes

  • Introduce a container_exclude_stopped_age configuration option to allow
    the Agent to not autodiscover containers that have been stopped for a
    certain number of hours (by default 22). This makes restarts of the Agent
    not re-send logs for these containers.

  • Add two new parameters to allow customizing APIServer connection parameters (CAPath, TLSVerify) without requiring a fully custom kubeconfig.

  • Leverage Cloud Foundry application metadata to automatically tag Cloud Foundry containers. A label or annotation prefixed with tags.datadoghq.com/ is automatically picked up and used to tag the application container when the cluster agent is configured to query the CC API.

  • The agent configcheck command prints a message for checks that matched a
    container exclusion rule.

  • Add calls to Cloudfoundry API for space and organization data to tag application containers with more up-to-date information compared to BBS API.

  • The agent diagnose and agent flare commands no longer create error-level log messages when the diagnostics fail.
    These messages are logged at the "info" level instead.

  • With the dogstatsd-replay feature allow specifying the number of
    iterations to loop over the capture file. Defaults to 1. A value
    of 0 loops forever.

  • Collect net stats metrics (RX/TX) for ECS Fargate in Live Containers.

  • EKS Fargate containers are tagged with eks_fargate_node.

  • The agent flare command will now include an error message in the
    resulting "local" flare if it cannot contact a running agent.

  • The Kube State Metrics Core check sends a new metric kubernetes_state.pod.count
    tagged with owner tags (e.g. kube_deployment, kube_replica_set, kube_cronjob, kube_job).

  • The Kube State Metrics Core check tags kubernetes_state.replicaset.count with a kube_deployment tag.

  • The Kube State Metrics Core check tags kubernetes_state.job.count with a kube_cronjob tag.

  • The Kube State Metrics Core check adds owner tags to pod metrics
    (e.g. kube_deployment, kube_replica_set, kube_cronjob, kube_job).

  • Improve accuracy and reduce false positives on the collector-queue health
    check

  • Support posix-compliant flags for process-agent. Shorthand flags for "p" (pid), "i" (info), and "v" (version) are
    now supported.

  • The Agent now embeds Python-3.8.11, an upgrade from
    Python-3.8.10.

  • APM: Updated the obfuscator to replace digits in IDs of SQL statement in addition to table names,
    when this option is enabled.

  • The logs-agent now retries on an HTTP 429 response, where this had been treated as a hard failure.
    The v2 Event Intake will return 429 responses when it is overwhelmed.

  • Runtime security now exposes change_time and modification_time in SECL.

  • Add security-agent config file to flare

  • Add min_collection_interval config to snmp_listener

  • TCP log collectors have historically closed sockets that are idle for more
    than 60 seconds. This is no longer the case. The agent relies on TCP
    keepalives to detect failed connections, and will otherwise wait indefinitely
    for logs to arrive on a TCP connection.

  • Enhances the secrets feature to support arbitrarily named user
    accounts running the datadog-agent service. Previously the
    feature was hardcoded to ddagentuser or Administrator accounts
    only.
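
To illustrate the logs-agent change noted above (an HTTP 429 response is now retried rather than treated as a hard failure), here is a minimal Python sketch. The function name send_with_retry, the attempt cap, and the exponential backoff are assumptions made for illustration; the release note does not describe the Agent's actual retry policy.

```python
import time
from typing import Callable

# Only 429 is stated by the release note. Any other retryable codes the
# logs-agent may handle are not covered by this sketch.
RETRYABLE_STATUS = {429}


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,                        # hypothetical cap
    base_delay: float = 0.5,                      # hypothetical backoff base (seconds)
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() is a hypothetical callable that performs one HTTP request and
    returns the response status code.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUS:
            break
        # Exponential backoff between attempts (assumed policy).
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

With a sender that answers 429 twice and then 200, the function returns 200 after two backoff pauses; with a sender that always answers 429, it gives up after max_attempts calls and returns 429.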

Deprecation Notes

  • Deprecated non-posix compliant flags for process agent. A warning should now be displayed if one is detected.

Bug Fixes

  • Add send_monotonic_with_gauge, ignore_metrics_by_labels,
    and ignore_tags params to prometheus scrape. Allow values
    defaulting to true to be set to false, if configured.

  • APM: Fix bug in SQL normalization that resulted in negative integer values being normalized with an extra minus sign token.

  • Fix an issue with autodiscovery on CloudFoundry where, if an application instance crashed, a new integration configuration would not be created for the new app instance.

  • Auto-discovered checks will not target init containers anymore in Kubernetes.

  • Fixes a memory leak when the Agent is running in Docker environments. This
    leak resulted in memory usage growing linearly, corresponding with the
    number of containers ever run while the current Agent process was also
    running. Long-lived Agent processes on nodes with a lot of container churn
    would cause the Agent to eventually run out of memory.

  • Fixes an issue where the docker.containers.stopped metric would have
    unpredictable tags. Now all stopped containers will always be reported with
    the correct tags.

  • Fixes bug in enrich tags logic while a dogstatsd capture replay is in
    process; previously when a live traffic originID was not found in the
    captured state, no tags were enriched and the live traffic tagger was
    wrongfully skipped.

  • Fixes a packaging issue on Linux where the unixodbc configuration files in
    /opt/datadog-agent/embedded/etc would be erased during Agent upgrades.

  • Fix hostname detection when Agent is running on-host and monitoring containerized workload by not using hostname coming from containerized providers (Docker, Kubernetes)

  • Fix default mapping for statefulset label in Kubernetes State Metric Core check.

  • Fix handling of CPU metrics collected from cgroups when cgroup files are missing.

  • Fix a bug where the status command of the security agent
    could crash if the agent is not fully initialized.

  • Fixed a bug where the CPU check would not work within a container on Windows.

  • Flare generation is no longer subject to the server_timeout configuration,
    as gathering all of the information for a flare can take quite some time.

  • [corechecks/snmp] Support inline profile definition

  • Fixes a bug where the Agent would hold on to tags from stopped ECS EC2 (but
    not Fargate) tasks forever, resulting in increased memory consumption on EC2
    instances handling a lot of short scheduled tasks.

  • On non-English Windows, the Agent correctly parses the output of netsh.

Other Notes

  • The datadog-agent, datadog-iot-agent and datadog-dogstatsd deb packages now have a weak dependency (Recommends:) on the datadog-signing-keys package.
datadog-agent - 7.30.2

Published by albertvaka about 3 years ago

Prelude

Release on: 2021-08-23

This is a Windows-only release.

Bug Fixes

  • On Windows, disables ephemeral port range detection. Fixes a crash on
    non-EN-US Windows.
datadog-agent - 7.30.1

Published by vickenty about 3 years ago

Prelude

Release on: 2021-08-20

datadog-agent - 7.30.0

Published by vickenty about 3 years ago

Prelude

Release on: 2021-08-12

New Features

  • APM: It is now possible to enable internal profiling of the
    trace-agent. Be warned, however, that this will incur additional billing
    charges and should not be used unless agreed with support.
  • APM: Added experimental support for OpenTelemetry collection via
    experimental.otlp.{http_port,grpc_port} or their corresponding
    environment variables (DD_OTLP_{HTTP,GRPC}_PORT).
  • Kubernetes Autodiscovery now supports additional template variables:
    %%kube_pod_name%%, %%kube_namespace%% and %%kube_pod_uid%%.
  • Add support for SELinux related events, like boolean value updates
    or enforcement status changes.
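
As a rough illustration of how the new Autodiscovery template variables listed above (%%kube_pod_name%%, %%kube_namespace%% and %%kube_pod_uid%%) could be resolved, the Python sketch below substitutes them in a template string. The render_template helper and the shape of the pod dictionary are hypothetical, and leaving unknown variables untouched is an assumption, not documented Agent behavior.

```python
import re


def render_template(template: str, pod: dict) -> str:
    """Replace the three Kubernetes template variables in a template string.

    pod is a hypothetical dict with "name", "namespace" and "uid" keys.
    """
    values = {
        "kube_pod_name": pod["name"],
        "kube_namespace": pod["namespace"],
        "kube_pod_uid": pod["uid"],
    }

    def substitute(match: re.Match) -> str:
        # Variables this sketch does not know about are left as-is (assumption).
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%%(\w+)%%", substitute, template)


pod = {"name": "web-0", "namespace": "prod", "uid": "1234-abcd"}
rendered = render_template("pod:%%kube_pod_name%% ns:%%kube_namespace%%", pod)
```

Here rendered is "pod:web-0 ns:prod".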

Enhancement Notes

  • Reveals useful information within a SQL execution plan for Postgres.
  • Add support to provide options to the obfuscator to change the
    behavior.
  • APM: Added additional tags to profiles in AWS Fargate environments.
  • APM: Main hostname acquisition now happens via gRPC to the Datadog
    Agent.
  • Make the check_sampler bucket expiry configurable based on the
    number of CheckSampler commits.
  • The cri check no longer sends metrics for stopped containers, in
    line with containerd and docker checks. These metrics were all zeros
    in the first place, so no impact is expected.
  • Kubernetes State Core check: Job metrics corresponding to a Cron Job
    are tagged with a kube_cronjob tag.
  • Environment autodiscovery is now used to selectively activate
    providers (kubernetes, docker, etc.) inside each component (tagger,
    host tags, hostname).
  • When using a secret_backend_command
    STDERR is always logged with a debug log level. This eases
    troubleshooting a user's secret_backend_command in a containerized
    environment.
  • secret_backend_timeout has been
    increased from 5s to 30s. This increases support for the slow to
    load Python script used for secret_backend_command. This was an issue
    when importing large libraries in a containerized environment.
  • Increase default timeout to sync Kubernetes Informers from 2 to 5
    seconds.
  • The Kube State Metrics Core check adds the global user-defined tags
    (DD_TAGS) by default.
  • If the new log_all_goroutines_when_unhealthy configuration
    parameter is set to true, when a component is unhealthy, log the
    stacktraces of the goroutines to ease the investigation.
  • The amount of time the agent waits before scanning for new logs is
    now configurable with logs_config.file_scan_period
  • Flares now include goroutine blocking and mutex profiles if enabled.
    New flare options were added to collect new profiles at the same
    time as cpu profile.
  • Add a section about container inclusion/exclusion errors to the
    agent status command.
  • Runtime Security now provides kernel related information as part of
    the flare.
  • Python interpreter sys.executable is now set to the appropriate
    interpreter's executable path. This should allow multiprocessing
    to be able to spawn new processes since it will try to invoke the
    Python interpreter instead of the Agent itself. It should be noted
    though that the Python packages injected at runtime by the Agent are
    only available from the main process, not from any sub-processes.
  • Add a single entrypoint script in the agent docker image. This
    script will be leveraged by a new version of the Helm chart.
  • [corechecks/snmp] Add bulk_max_repetitions config
  • Add device status snmp corecheck metadata
  • [snmp/corecheck] Add interface.id_tags needed to correlate
    metadata interfaces with interface metrics
  • In addition to the existing /readsecret.py script, the Agent
    container image contains another secret helper script
    /readsecret.sh, faster and more reliable.
  • Consider pinned CPUs (cpusets) when calculating CPU limit from
    cgroups.
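
The cpuset-aware CPU limit mentioned in the last enhancement note above can be sketched as follows. This is a minimal Python illustration, assuming cgroup v1 style inputs (a CFS quota and period in microseconds, and a cpuset list such as "0-3,8"); the function names are hypothetical and the Agent's actual formula is not given in these notes.

```python
def count_cpus(cpuset: str) -> int:
    """Count the CPUs in a cpuset list such as "0-3,8"."""
    total = 0
    for part in cpuset.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            total += int(last) - int(first) + 1
        else:
            total += 1
    return total


def cpu_limit(quota_us: int, period_us: int, cpuset: str, host_cpus: int) -> float:
    """Effective CPU limit in cores.

    Takes the smallest of: the host CPU count, the number of pinned CPUs,
    and the CFS quota divided by the period.
    """
    limit = float(host_cpus)
    pinned = count_cpus(cpuset)
    if pinned:
        limit = min(limit, float(pinned))
    # A non-positive quota (for example -1) is treated as "no quota set".
    if quota_us > 0 and period_us > 0:
        limit = min(limit, quota_us / period_us)
    return limit
```

For example, a container with no CFS quota that is pinned to CPUs "0-1" on an 8-CPU host gets a limit of 2.0 cores in this sketch, rather than 8.0.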

Bug Fixes

  • APM: Fix SQL obfuscation on postgres queries using the tilde
    operator.
  • APM: Fixed an issue with the Web UI on Internet Explorer.
  • APM: The priority sampler service catalog is no longer unbounded. It
    is now limited to 5000 service & env combinations.
  • Apply the max_returned_metrics
    parameter from prometheus annotations, if configured.
  • Removes noisy error logs when collecting Cloud Foundry application
    containers
  • For dogstatsd captures, only serialize to disk the portion of
    buffers actually used by the payloads ingested, not the full buffer.
  • Fix a bug in the cgroup parser that prevented getting proper metrics in
    Container Live View when using CRI-O and systemd cgroup manager.
  • Avoid sending duplicated datadog.agent.up service checks.
  • When tailing logs from docker with DD_LOGS_CONFIG_DOCKER_CONTAINER_USE_FILE=true
    and a source container label is set the agent will now respect that
    label and use it as the source. This aligns the behavior with
    tailing from the docker socket.
  • On Windows, when the host shuts down, handles the PreShutdown
    message to avoid the error
    The DataDog Agent service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.
    in Event Viewer.
  • Fix label joins in the Kube State Metrics Core check.
  • Append the cluster name, if found, to the hostname for
    kubernetes_state_core metrics.
  • Ensure the health probes used as Kubernetes liveness probe are not
    failing in case of issues on the network or on an external
    component.
  • Remove unplanned call between the process-agent and the DCA when
    the orchestratorExplorer feature is disabled.
  • [corechecks/snmp] Set default oid_batch_size to 5. High oid
    batch size can lead to timeouts.
  • Agent collecting Docker containers on hosts with a lot of container
    churn now uses less memory by properly purging the respective tags
    after the containers exit. Other container runtimes were not
    affected by the issue.

Other Notes

  • APM: The trace-agent no longer warns on the first outgoing request
    retry, only starting from the 4th.
  • All Agent binaries are now compiled with Go 1.15.13
  • JMXFetch upgraded to 0.44.2
    https://github.com/DataDog/jmxfetch/releases/0.44.2
  • Build environment changes:
    • omnibus-software: [cacerts] updating with latest: 2021-07-05
      (#399)
    • omnibus-ruby: Support 'Recommends' dependencies for deb packages
      (#122)
  • Runtime Security doesn't set the service tag with the runtime-security-agent value by default.
datadog-agent - 7.29.1

Published by truthbk over 3 years ago

Prelude

Release on: 2021-07-13

This is a linux + docker-only release.

New Features

  • APM: Fargate stats and traces are now correctly computed, aggregated
    and present the expected tags.

Bug Fixes

  • APM: The value of the default env is now normalized during
    trace-agent initialization.
datadog-agent - Datadog Cluster Agent 1.13.1

Published by mfpierre over 3 years ago

Prelude

Released on: 2021-07-05
Pinned to datadog-agent v7.29.0

Bug Fixes

  • Fix the embedded security policy version to match the one from the agent.
datadog-agent -

Published by truthbk over 3 years ago

Prelude

Release on: 2021-06-24

Upgrade Notes

  • Upgrade Docker base image to ubuntu:21.04 as new stable release.

New Features

  • New extra_tags setting and DD_EXTRA_TAGS environment variable can be
    used to specify additional host tags.
  • Add network devices metadata collection
  • APM: The obfuscator adds two new features (dollar_quoted_func and keep_sql_alias). They are off by default.
    For more details see PR 8071. We do not recommend using these
    features unless you have a good reason or have been recommended by
    support for your specific use-case.
  • APM: Add obfuscator support for Postgres dollar-quoted string
    constants.
  • Tagger state will now be stored for dogstatsd UDS traffic captures
    with origin detection. The feature will track the incoming traffic,
    building a map of traffic source processes and their source
    containers, then storing the relevant tagger state into the capture
    file. This will allow to not only replay the traffic, but also load
    a snapshot of the tagger state to properly tag replayed payloads in
    the dogstatsd pipeline.
  • New host_aliases setting can be used
    to add custom host aliases in addition to aliases obtained from
    cloud providers automatically.
  • Paths can now be resolved using an eRPC request.
  • Add time comparison support in SECL, which allows writing rules such as:
    open.file.path == "/etc/secret" &&
    process.created_at > 5s

Enhancement Notes

  • Add the following new metrics to the kubernetes_state_core.
    • node.ephemeral_storage_allocatable
    • node.ephemeral_storage_capacity
  • Agent can now set hostname based on Azure instance metadata. See the
    new azure_hostname_style configuration option.
  • Compliance agents can now generate multiple reports per run.
  • Docker and Kubernetes log launchers will now be retried until one
    succeeds instead of falling back to the docker launcher by default.
  • Increase payload size limit for dbm-metrics from 1
    MB to 20 MB.
  • Expose new batch_max_size and batch_max_content_size config settings
    for all logs endpoints.
  • Adds improved cadence/resolution captures/replay to dogstatsd
    traffic captures. The new file format will store payloads with
    nanosecond resolution. The replay feature remains
    backward-compatible.
  • Support fetching host tags using ECS task and EKS IAM roles.
  • Improve the resiliency of the datadog-agent check command when
    running Autodiscovered checks.
  • Adding the hostname to the host aliases when running on GCE
  • Display more information when the error
    Could not initialize instance happens. JMXFetch upgraded to
    0.44.0
  • Kubernetes pods with short-lived containers won't have a few lines of
    logs duplicated with both container tags (the stopped one and the
    running one) anymore while logs are being collected. Mount
    /var/log/containers and use
    logs_config.validate_pod_container_id to enable this feature.
  • The kube state metrics core check now tags pod metrics with a
    reason tag. It can be NodeLost, Evicted or
    UnexpectedAdmissionError.
  • Implement the following synthetic metrics in the
    kubernetes_state_core.
    • cronjob.count
    • endpoint.count
    • hpa.count
    • vpa.count
  • Add system.cpu.interrupt on linux.
  • Authenticate logs http input requests using the API key header
    rather than the URL path.
  • Upgrade embedded Python 3 from 3.8.8 to 3.8.10. See Python 3.8's
    changelog.
  • Show autodiscovery errors from pod annotations in agent status.
  • Paths are no longer limited to segments of 128 characters and a
    depth of 16. Each segment can now be up to 255 characters (kernel
    limit) and with a depth of up to 1740 parents.
  • Add loader as snmp_listener.loader config
  • Make SNMP Listener configs compatible with SNMP Integration configs
  • The agent stream-logs command will
    use less CPU while idle.

Security Notes

  • Redact the whole annotation
    "kubectl.kubernetes.io/last-applied-configuration" to ensure we
    don't expose secrets.

Bug Fixes

  • Imports the value of non_local_traffic to dogstatsd_non_local_traffic (in addition
    to apm_config.non_local_traffic)
    when upgrading from Datadog Agent v5.
  • Fixes the Agent using 100% CPU on MacOS Big Sur.
  • Declare database_monitoring.{samples,metrics} as
    known keys in order to remove "unknown key" warnings on startup.
  • Fixes the container_name tag not being updated after Docker
    containers were renamed.
  • Fixes CPU utilization being underreported on Windows hosts with more
    than one physical CPU.
  • Fix CPU limit used for Live Containers page in ECS Fargate
    environments.
  • Fix bug introduced in 7.26 where default checks were scheduled on
    ECS Fargate due to changes in entrypoint scripts.
  • Fix a bug that can make the agent enable incompatible Autodiscovery
    listeners.
  • An error log was printed when the creation date or the started date
    of a fargate container was not found in the fargate API payload.
    This would happen even though it was expected to not have these
    dates because of the container being in a given state. This is now
    fixed and the error is only printed when it should be.
  • Fix the default value of the configuration option
    forwarder_storage_path when run_path is set. The default value
    is RUN_PATH/transactions_to_retry where RUN_PATH is defined by
    the configuration option run_path.
  • In some cases, compliance checks using YAML files with JQ expressions
    were failing due to discrepancies between YAML parsing and gojq
    handling.
  • On Windows, fixes inefficient string conversion
  • Reduce CPU usage when logs agent is unable to reach an http
    endpoint.
  • Fixed the no_proxy deprecation warning being logged too
    frequently. Added better warnings for when the proxy behavior could
    change.
  • Ignore CollectorStatus response from orchestrator-intake in the
    process-agent to prevent changing realtime mode interval to default
    2s.
  • Fixes an issue where the Agent would not retry resource tags
    collection for containers on ECS if it could retrieve only a subset
    of tags. Now it will keep on retrying until the complete set of tags
    is collected.
  • Fix noisy configuration error when specifying a proxy config and
    using secrets management.
  • Reduce the amount of log messages on Windows when tailing log files.

Other Notes

datadog-agent - 6.28.1

Published by olivielpeau over 3 years ago

6.28.1 ships the same features as 7.28.1 except for the Python versions it supports.

Please refer to the 7.28.1 changelog.

datadog-agent - 7.28.1

Published by olivielpeau over 3 years ago

Prelude

Release on: 2021-05-31

datadog-agent - 6.28.0

Published by olivielpeau over 3 years ago

6.28.0 ships the same features as 7.28.0 except for the Python versions it supports.

Please refer to the 7.28.0 changelog.

datadog-agent - 7.28.0

Published by olivielpeau over 3 years ago

Prelude

Release on: 2021-05-26

Upgrade Notes

  • Change base Docker image used to build the Agent images, moving from
    debian:bullseye to ubuntu:20.10. In the future the Agent will
    follow Ubuntu stable versions.
  • Windows Docker images based on Windows Core are now provided. Checks
    that didn't work on Nano should work on Core.

New Features

  • APM: Add a new feature flag component2name which makes the
    component tag value on a span become its operation name. This
    facilitates compatibility with OpenTracing.

  • Adds a functionality to allow capturing and replaying of UDS
    dogstatsd traffic.

  • Expose new aggregator.submit_event_platform_event python API with
    two supported event types: dbm-samples and dbm-metrics.

  • Runtime security reports environment variables.

  • Runtime security now reports command line arguments as part of the
    exec events.

  • The args_flags and args_options were added to the SECL language
    to ease the writing of runtime security rules based on command line
    arguments. args_flags is used to catch arguments that start with
    either one or two hyphen characters but do not accept any associated
    value.

    Examples:

    • version is part of args_flags for the command
      cat --version
    • l and n both are in args_flags for the command
      netstat -ln
    • T=8 and width=8 both are in args_options for the command
      ls -T 8 --width=8.
  • Add support for ARM64 to the runtime security agent
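
The args_flags and args_options examples given above can be reproduced with a small Python sketch. The classify_args function is hypothetical, and its heuristic (a lone short flag followed by a non-hyphen token is treated as an option, a cluster such as -ln as individual flags) is an assumption chosen to match the three examples; it is not the SECL implementation.

```python
import shlex
from typing import List, Tuple


def classify_args(command: str) -> Tuple[List[str], List[str]]:
    """Split a command line into (flags, options) using a simple heuristic."""
    tokens = shlex.split(command)[1:]  # drop the program name
    flags: List[str] = []
    options: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.startswith("--") and len(tok) > 2:
            # Long form: --name=value is an option, --name alone is a flag.
            name, sep, value = tok[2:].partition("=")
            if sep:
                options.append(f"{name}={value}")
            else:
                flags.append(name)
        elif tok.startswith("-") and not tok.startswith("--") and len(tok) > 1:
            body = tok[1:]
            if len(body) == 1 and nxt is not None and not nxt.startswith("-"):
                # Single short flag followed by a value: treat as an option.
                options.append(f"{body}={nxt}")
                i += 1  # consume the value token
            else:
                # Cluster of short flags such as -ln.
                flags.extend(body)
        i += 1
    return flags, options
```

Under these assumptions, classify_args("cat --version") yields version as a flag, classify_args("netstat -ln") yields l and n as flags, and classify_args("ls -T 8 --width=8") yields T=8 and width=8 as options.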

Enhancement Notes

  • Add oid_batch_size configuration as init and instance config

  • Add oid_batch_size config to snmp_listener

  • Group the output of agent tagger-list by entity and by source.

  • On Windows on a Domain Controller, if no domain name is specified,
    the installer will use the controller's joined domain.

  • Windows installer can now use the command line key
    EC2_USE_WINDOWS_PREFIX_DETECTION to set the config value of
    ec2_use_windows_prefix_detection

  • APM: The trace writer will now consider 408 errors to be retriable.

  • Build RPMs that can be installed in FIPS mode. This change doesn't
    affect SUSE RPMs.

    RPMs are now built with RPM 4.15.1 and have SHA256 digest headers,
    which are required by RPM on CentOS 8/RHEL 8 when running in FIPS
    mode.

    Note that newly built RPMs are no longer installable on CentOS
    5/RHEL 5.

  • Make the check_sampler bucket expiry configurable

  • The Agent can be configured to replace colon : characters in the
    ECS resource tag keys by underscores _. This can be done by
    enabling ecs_resource_tags_replace_colon: true in the Agent config
    file or by configuring the environment variable
    DD_ECS_RESOURCE_TAGS_REPLACE_COLON=true.

  • Add jvm.gc.old_gen_size as an alias for Tenured Gen. Prevent
    double signing of release artifacts.

  • JMXFetch upgraded to
    v0.44.0.

  • The kubernetes_state_core check now collects two new metrics
    kubernetes_state.pod.age and kubernetes_state.pod.uptime.

  • Improve logs/sender throughput by adding optional concurrency for
    serializing & sending payloads.

  • Make kube_replica_set tag low cardinality

  • Runtime Security now supports regexp in SECL rules.

  • Add loader tag to snmp telemetry metrics

  • Network Performance Monitoring for Windows now collects DNS stats;
    connections will be shown in the networks -> DNS page.
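
The effect of the ecs_resource_tags_replace_colon option described above can be illustrated with a short Python sketch. The normalize_tag_keys helper is hypothetical; only the behavior of replacing colons in tag keys with underscores comes from the release note, and leaving tag values untouched is an assumption.

```python
def normalize_tag_keys(resource_tags: dict, replace_colon: bool = False) -> dict:
    """Return a copy of resource_tags, optionally replacing ':' with '_' in keys.

    Values are left unchanged (assumption: the option only affects keys).
    """
    if not replace_colon:
        return dict(resource_tags)
    return {key.replace(":", "_"): value for key, value in resource_tags.items()}


tags = {"aws:team": "core", "env": "prod"}
normalized = normalize_tag_keys(tags, replace_colon=True)
```

With the option enabled, the key aws:team becomes aws_team while env is unaffected.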

Deprecation Notes

  • For internal profiling of agent processes, the profiling option
    has been renamed to internal_profiling to avoid confusion.
  • The single dash variants of the system-probe flags are now
    deprecated. Please use --config and --pid instead.

Bug Fixes

  • APM: Fixes bug where long service names and operation names were not
    normalized correctly.
  • On Windows, fixes a bug in process agent in which the process agent
    would become unresponsive.
  • The Windows installer compares the DNS domain name and the joined
    domain name using a case-insensitive compare. This avoids an
    incorrect warning when the domain names match but otherwise have
    different cases.
  • Replace usage of runtime.NumCPU when used to compute metrics
    related to CPU Hosts. On some Unix systems, runtime.NumCPU can be
    influenced by CPU affinity set on the Agent, which should not affect
    the metrics computed for other processes/containers. Affects the CPU
    Limits metrics (docker/containerd) as well as the live containers
    page metrics.
  • Fix issue where Kube Apiserver cache sync timeout configuration is
    not used.
  • Fix the usage of DD_ORCHESTRATOR_EXPLORER_ORCHESTRATOR_DD_URL and
    DD_ORCHESTRATOR_EXPLORER_MAX_PER_MESSAGE environment variables.
  • Fix a panic that could occur in Docker AD listener when doing
    docker inspect fails
  • Fix a small leak where the Agent in some cases keeps in memory
    identifiers corresponding to dead objects (pods, containers).
  • Log file byte count now works correctly on Windows.
  • Agent log folder on Mac is moved from /var/log/datadog to
    /opt/datadog-agent/logs. A link will be created at
    /var/log/datadog pointing to /opt/datadog-agent/logs to maintain
    the compatibility. This is to work around the issue that some Mac OS
    releases purge /var/log folder on upgrade.
  • Packaging: ensure only one pip3 version is shipped in embedded/
    directory
  • Fix eBPF runtime compilation errors with tcp_queue_length and
    oom_kill checks on Ubuntu 20.10.
  • Add a validation step before accepting metrics set in HPAs. This
    ensures that no obviously-broken metric is accepted and goes on to
    break the whole metrics gathering process.
  • The Windows installer now logs only once when it fails to replace a
    property.
  • Windows installer will not abort if the Server service is not
    running (introduced in 6.24.0/7.24.0).
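
The runtime.NumCPU fix above rests on the difference between the CPUs present on the host and the CPUs a process is allowed to run on. The Agent is written in Go; the Python sketch below only illustrates the same distinction, using os.cpu_count and os.sched_getaffinity (the latter is not available on every platform, hence the fallback). The function names are hypothetical.

```python
import os


def host_cpu_count() -> int:
    """Number of logical CPUs reported for the system."""
    return os.cpu_count() or 1


def usable_cpu_count() -> int:
    """Number of CPUs the current process may run on.

    This value shrinks when a CPU affinity mask is set on the process,
    which is why it is a poor basis for metrics about other processes
    or containers.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return host_cpu_count()
```

On a host where the process is pinned to a subset of CPUs (for example with taskset), usable_cpu_count() returns the size of that subset, while host_cpu_count() is not influenced by the affinity mask.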

Other Notes

  • The Agent, Logs Agent and the system-probe are now compiled with Go
    1.15.11
  • Bump embedded Python 3 to 3.8.8
datadog-agent - 6.27.1

Published by sgnn7 over 3 years ago

6.27.1 ships the same features as 7.27.1 except for the Python versions it supports.

Please refer to the 7.27.1 changelog.

datadog-agent - 7.27.1

Published by sgnn7 over 3 years ago

Prelude

Release on: 2021-05-07

This is a Windows-only release (MSI and Chocolatey installers only).

Bug Fixes

  • On Windows, exit system-probe if process-agent has not queried for
    connection data for 20 consecutive minutes. This ensures excessive
    system resources are not used while connection data is not being
    sent to Datadog.
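
The idle-exit behavior described above can be modeled as a simple watchdog. The Python sketch below is an illustration only: the IdleWatchdog class and its method names are hypothetical, and only the 20-minute threshold comes from the release note.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Tracks the time since the last query and signals when to exit."""

    def __init__(
        self,
        timeout_s: float = 20 * 60,  # 20 minutes, per the release note
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_query = clock()

    def record_query(self) -> None:
        """Call whenever connection data is requested."""
        self._last_query = self._clock()

    def should_exit(self) -> bool:
        """True once no query has been seen for the full timeout."""
        return self._clock() - self._last_query >= self._timeout_s
```

A caller would invoke record_query() on each request for connection data and poll should_exit() periodically, shutting down once it returns True.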
datadog-agent - Datadog Cluster Agent 1.12.0

Published by celenechang over 3 years ago

Prelude

Pinned to datadog-agent v7.28.0-rc.5

New Features

  • The cluster-agent container now tries to remove any folder beginning with .. in paths of
    files mounted in /conf.d while copying them to the cluster-agent config folder

  • Collect cluster resource for orchestrator explorer.

  • It's now possible to template the kube_cluster_name tag in DatadogMetric queries
    Example: avg:nginx.net.request_per_s{kube_container_name:nginx,kube_cluster_name:%%tag_kube_cluster_name%%}

  • It's now possible to template any environment variable (as seen by the Datadog Cluster Agent) as tag in DatadogMetric queries
    Example: avg:nginx.net.request_per_s{kube_container_name:nginx,kube_cluster_name:%%env_DD_CLUSTER_NAME%%}
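
To make the two templating features above concrete, here is a minimal sketch of how `%%tag_...%%` and `%%env_...%%` placeholders could be expanded in a query string. It is not the Cluster Agent's implementation: the function name `resolve_query`, the `tags` mapping, and the choice to raise `KeyError` on a missing value are all assumptions made for illustration. The note only states that `kube_cluster_name` can be templated as a tag, whereas this sketch accepts whatever the caller puts in `tags`.

```python
import os
import re

# Matches %%tag_<name>%% and %%env_<NAME>%% placeholders.
_PLACEHOLDER = re.compile(r"%%(tag|env)_([A-Za-z0-9_]+)%%")


def resolve_query(query, tags, environ=None):
    """Replace template placeholders in a query string (illustrative only)."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        source = tags if kind == "tag" else environ
        if name not in source:
            # Assumption: a missing value is treated as an error.
            raise KeyError(f"no value available for placeholder {match.group(0)}")
        return source[name]

    return _PLACEHOLDER.sub(substitute, query)


query = (
    "avg:nginx.net.request_per_s"
    "{kube_container_name:nginx,kube_cluster_name:%%env_DD_CLUSTER_NAME%%}"
)
print(resolve_query(query, tags={}, environ={"DD_CLUSTER_NAME": "prod"}))
```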

Enhancement Notes

  • It is now possible to configure a custom timeout for the MutatingWebhookConfigurations
    objects controlled by the Cluster Agent via DD_ADMISSION_CONTROLLER_TIMEOUT_SECONDS. (Default: 30 seconds)

  • The Datadog Cluster Agent's Admission Controller now uses a namespaced secrets informer.
    It no longer needs permissions to watch secrets at the cluster scope.

  • The cluster agent now uses the same configuration as the security agent for
    the logs endpoints configuration. The parameters (such as logs_dd_url) can
    be specified either in the compliance_config.endpoints section or through
    environment variables (such as DD_COMPLIANCE_CONFIG_ENDPOINTS_LOGS_DD_URL).

  • Improve the resilience of the connection of controllers to the External Metrics Server by moving to a dynamic client for the WPA controller.

Upgrade Notes

  • Change base Docker image used to build the Cluster Agent images, moving from debian:bullseye to ubuntu:20.10.
    In the future the Cluster Agent will follow Ubuntu stable versions.

Bug Fixes

  • Fix a potential file descriptors leak.

  • The Cluster Agent can now be configured to use TLS 1.2 via DD_FORCE_TLS_12=true.

  • Fix "Error creating expvar server" error log when running the Datadog Cluster Agent CLI commands.

  • Fix a bug preventing the
    "DD_ORCHESTRATOR_EXPLORER_ORCHESTRATOR_ADDITIONAL_ENDPOINTS" environment
    variable from being read.

datadog-agent - 6.27.0

Published by hush-hush over 3 years ago

6.27.0 ships the same features as 7.27.0 except for the Python versions it supports.

Please refer to the 7.27.0 changelog.

datadog-agent - 7.27.0

Published by hush-hush over 3 years ago

Prelude

Release on: 2021-04-14

Upgrade Notes

  • SECL and JSON format were updated to introduce the new attributes.
    Legacy support was added to avoid breaking existing rules.
  • The overlay_numlower integer
    attribute that was reported for files and executables was
    unreliable. It was replaced by a simple boolean attribute named
    in_upper_layer that is set to true
    when a file is either only on the upper layer of an overlayfs
    filesystem, or is an altered version of a file present in a base
    layer.

New Features

  • APM: Add support for AIX/ppc64. Only POWER8 and above is supported.
  • Adds support for Kubernetes namespace labels as tags extraction
    (kubernetes_namespace_labels_as_tags).
  • Add snmp corecheck implementation in go
  • APM: Tracing clients no longer need to be sending traces marked with
    sampling priority 0 (AUTO_DROP) in order for stats to be correct.
  • APM: A new discovery endpoint has been added at the /info path. It
    reveals information about a running agent, such as available
    endpoints, version and configuration.
  • APM: Add support for filtering tags by means of
    apm_config.filter_tags or environment variables
    DD_APM_FILTER_TAGS_REQUIRE and DD_APM_FILTER_TAGS_REJECT.
  • Dogstatsd clients can now choose the cardinality of tags added by
    origin detection per metrics via the tag 'dd.internal.card' ("low",
    "orch", "high").
  • Added two new metrics to the Disk check: read_time and write_time.
  • The Agent can store traffic on disk when the in-memory retry queue
    of the forwarder limit is reached. Enable this capability by setting
    forwarder_storage_max_size_in_bytes to
    a positive value indicating the maximum amount of storage space, in
    bytes, that the Agent can use to store traffic on disk.
  • PCF Containers custom tags can be extracted from environment
    variables based on an include and exclude lists mechanism.
  • NPM is now supported on Windows, for Windows versions 2016 and
    above.
  • Runtime security now reports command line arguments as part of the
    exec events.
  • Process credentials are now tracked by the runtime security agent.
    Various user and group attributes are now collected, along with
    kernel capabilities.
  • File metadata attributes are now available for all events. Those new
    attributes include uid, user, gid, group, mode, modification time
    and change time.
  • Add config parameters to enable fim and runtime rules.
  • Network Performance Monitoring for Windows instruments DNS. Network
    data from Windows hosts will be tagged with the domain tag, and the
    DNS page will show data for Windows hosts.
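
As an illustration of the per-metric cardinality feature listed above, the sketch below builds a DogStatsD datagram that carries the `dd.internal.card` tag. The helper name `format_metric` and its parameters are hypothetical. The tag name and its three values come from the note, and the `name:value|type|#tags` layout is the standard DogStatsD datagram format.

```python
ALLOWED_CARDINALITIES = ("low", "orch", "high")  # values listed in the note


def format_metric(name, value, metric_type, tags=(), cardinality=None):
    """Build a DogStatsD datagram, optionally requesting a tag cardinality."""
    all_tags = list(tags)
    if cardinality is not None:
        if cardinality not in ALLOWED_CARDINALITIES:
            raise ValueError(f"unsupported cardinality: {cardinality!r}")
        # The cardinality travels as an ordinary tag on the metric.
        all_tags.append(f"dd.internal.card:{cardinality}")
    datagram = f"{name}:{value}|{metric_type}"
    if all_tags:
        datagram += "|#" + ",".join(all_tags)
    return datagram


print(format_metric("page.views", 1, "c", ["env:dev"], cardinality="low"))
```

In practice a DogStatsD client library would add this tag for you. The point here is only to show where the tag sits in the payload.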

Enhancement Notes

  • Improves sensitive data scrubbing in URLs
  • Includes UTC time (unless already in UTC+0) and millisecond
    timestamp in status logs. Flare archive filename now timestamped in
    UTC.
  • Automatically set debug log_level when the '--flare' option is used
    with the JMX command
  • Number of matched lines is displayed on the status page for each
    source using multi_line log processing rules.
  • Add public IPv4 for EC2/GCE instances to host network metadata.
  • Add loader config to snmp_listener
  • Add snmp corecheck extract value using regex
  • Remove agent MaxNumWorkers hard limit that capped the number of check
    runners at 25. The removal is motivated by the need for some users
    to run thousands of integrations like snmp corecheck.
  • APM: Change in the stats payload format leading to reduced CPU and
    memory usage. Use of DDSketch instead of GKSketch to aggregate
    distributions leading to more accurate high percentiles.
  • APM: Removal of sublayer metric computation improves performance of
    the trace agent (CPU and memory).
  • APM: All API endpoints now respond with the "Datadog-Agent-Version"
    HTTP response header.
  • Query application list from Cloud Foundry Cloud Controller API to
    get up-to-date application names for tagging containers and metrics.
  • Introduce a clc_runner_id config option to allow overriding the
    default Cluster Checks Runner identifier. Defaults to the node name
    to make it backwards compatible. It is intended to allow binpacking
    more than a single runner per node.
  • Improve migration path when shifting docker container tailing from
    the socket to file. If tailing from file for Docker containers is
    enabled, containers with an existing entry relative to a socket
    tailer will continue being tailed from the Docker socket unless the
    following newly introduced option is set to true:
    logs_config.docker_container_force_use_file. It aims to allow a
    smooth transition to file tailing for Docker containers.
  • (Unix only) Add go_core_dump flag
    to generate core dumps on Agent crashes
  • JSON payload serialization and compression now uses shared input and
    output buffers to reduce total allocations in the lifetime of the
    agent.
  • On Windows the comments in the datadog.yaml file are preserved after
    installation.
  • Add kube_region and kube_zone tags to node metrics reported by the
    kube-state-metrics core check
  • Implement the following synthetic metrics in the
    kubernetes_state_core check to mimic the legacy kubernetes_state
    one.
    • persistentvolumes.by_phase
    • service.count
    • namespace.count
    • replicaset.count
    • job.count
    • deployment.count
    • daemonset.count
    • statefulset.count
  • Minor improvements to agent log-stream command. Fixed timestamp,
    added host name, use redacted log message instead of raw message.
  • NPM - Improve accuracy of retransmits tracking on kernels >=4.7
  • Orchestrator explorer collection is no longer handled by the
    cluster-agent directly but by a dedicated check.
  • prometheus_scrape.checks may now be defined as an environment
    variable DD_PROMETHEUS_SCRAPE_CHECKS formatted as JSON.
  • Runtime security module doesn't stop on the first policies file load
    error and now sends an event with a report of the load.
  • Sketch series payloads are now compressed as a stream to reduce
    buffer allocations.
  • The Datadog Agent won't try to connect to kubelet anymore if it's
    not running in a Kubernetes cluster.
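
The Docker tailing migration rule in the enhancement notes above amounts to a three-way decision per container. The function below is a hedged restatement of that rule, not Agent code: the name `choose_tailing_source` and its boolean parameters are hypothetical, and the fallback to the socket when file tailing is disabled is an assumption implied by the note rather than stated in it.

```python
def choose_tailing_source(file_tailing_enabled, has_socket_entry, force_use_file=False):
    """Return "file" or "socket" for one container, following the note's rule."""
    if not file_tailing_enabled:
        # Assumption: without file tailing, the socket remains the source.
        return "socket"
    if has_socket_entry and not force_use_file:
        # An entry already exists for a socket tailer: keep using the socket.
        return "socket"
    # New containers, or any container once the force option is set.
    return "file"
```

Here `force_use_file` stands in for logs_config.docker_container_force_use_file.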

Known Issues

  • On Linux kernel versions < 3.15, conntrack (used for NAT info for
    connections) sampling is not supported, and conntrack updates will
    be aborted if a higher rate of conntrack updates from the system
    than set by system_probe_config.conntrack_rate_limit is
    detected. This is done to limit excessive resource consumption by
    the netlink conntrack update system. To keep using this system even
    with a high rate of conntrack updates, increase the
    system_probe_config.conntrack_rate_limit. This can potentially
    lead to higher cpu usage.

Deprecation Notes

  • APM: Sublayer metrics (trace.<SPAN_NAME>.duration and
    derivatives) computation is removed from the agent in favor of new
    sublayer metrics generated in the backend.

Bug Fixes

  • Fixes bug introduced in #7229
  • Adds a limit to the number of DNS stats objects the DNSStatkeeper
    can have at any given time. This can alleviate memory issues on
    hosts doing high numbers of DNS requests where network performance
    monitoring is enabled.
  • Add tags to snmp_listener network configs. This is needed because
    users switching from Python SNMP Autodiscovery expect tags to be
    available with Agent SNMP Autodiscovery (snmp_listener) too.
  • APM: When UDP is not available for Dogstatsd, the trace-agent can
    now use any other available alternative, such as UDS or Windows
    Pipes.
  • APM: Fixes a bug where nested SQL queries may occasionally result in
    bad obfuscator output.
  • APM: All Datadog API key usage is sanitized to exclude newlines and
    other control characters.
  • Fix an issue where exceeding the conntrack rate limit
    (system_probe_config.conntrack_rate_limit) resulted in
    conntrack updates from the system no longer being processed.
  • Address issue with referencing the wrong repo tag for Docker image
    by simplifying logic in DockerUtil.ResolveImageNameFromContainer to
    prefer Config.Image when possible.
  • Fix kernel version parsing when subversion/patch is > 255, so
    eBPF program loading does not fail.
  • Agent host tags are now correctly removed from the in-app host when
    the configured tags/DD_TAGS list is empty or not defined.
  • Fixes scheduling of non-working container checks introduced by
    environment autodiscovery in 7.26. Features can now be excluded from
    autodiscovery results through autoconfig_exclude_features. Example:
    autoconfig_exclude_features: ["docker","cri"] or
    DD_AUTOCONFIG_EXCLUDE_FEATURES="docker cri". Fix typo in variable
    used to disable environment autodiscovery and make it usable in
    datadog.yaml. You should now set
    autoconfig_from_environment: false
    or DD_AUTOCONFIG_FROM_ENVIRONMENT=false.
  • Fixes a limitation of runtime autodiscovery that prevented running
    the containerd check without the cri check enabled. Fixes error logs
    in non-Kubernetes environments.
  • Fix missing tags on Dogstatsd metrics when
    DD_DOGSTATSD_TAG_CARDINALITY=orchestrator (for instance,
    task_arn on Fargate)
  • Fix a panic in the system-probe part
    of the tcp_queue_length check when
    running on nodes with several CPUs.
  • Fix agent crashes from Python interpreter being freed too early.
    This was most likely to occur as an edge case during a shutdown of
    the agent where the interpreter was destroyed before the finalizers
    for a check were invoked.
  • Do not make the liveness probe fail in case of a network connectivity
    issue. However, if the agent loses network connectivity, the
    readiness probe may still fail.
  • On Windows, using process agent, fixes the virtual CPU count when
    the device has more than one physical CPU (package).
  • On Windows, fixes problem in process agent wherein windows processes
    could not completely exit.
  • (macOS only) Apple M1 chip architecture information is now correctly
    reported.
  • Make ebpf compiler buildable on non-GLIBC environment.
  • Fix a bug preventing pod updates from being sent due to the Kubelet
    exposing unreliable resource versions.
  • Silence INFO and WARNING gRPC logs by default. They can be
    re-enabled by setting GRPC_GO_LOG_VERBOSITY_LEVEL to either INFO
    or WARNING.
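
The kernel version parsing fix listed above (patch levels greater than 255) is easier to follow with numbers. Linux headers traditionally pack a version as `(major << 16) + (minor << 8) + patch`, which leaves only 8 bits for the patch level, so a larger value spills into the minor field. The sketch below shows the overflow and one common remedy, clamping the patch to 255. The function names are hypothetical, and the note does not say which remedy the Agent actually uses.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch) from a release string such as '5.4.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def naive_version_code(major, minor, patch):
    # Classic packing: 8 bits each for minor and patch.
    return (major << 16) + (minor << 8) + patch


def clamped_version_code(major, minor, patch):
    # Clamp the patch level so it can never overflow into the minor field.
    return (major << 16) + (minor << 8) + min(patch, 255)


version = parse_release("4.9.256")
# With naive packing, 4.9.256 is indistinguishable from 4.10.0.
print(naive_version_code(*version) == naive_version_code(4, 10, 0))
# With clamping, 4.9.256 still sorts below 4.10.0.
print(clamped_version_code(*version) < clamped_version_code(4, 10, 0))
```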

Other Notes

  • Network monitor now fails to load if conntrack initialization fails
    on system-probe startup. Set
    network_config.ignore_conntrack_init_failure to true to reverse
    this behavior.
  • When generating the permissions.log file for a flare, if the owner
    of a file no longer exists in the system, return its id instead
    of failing.
  • Upgrade embedded openssl to 1.1.1k.