Datadog Fundamentals
What's the output of the following code snippet?
    class Dog:
        def walk(self):
            return "*walking*"
        def speak(self):
            return "Woof!"
    class JackRussellTerrier(Dog):
        def speak(self):
            return "Arff!"
    bobo = JackRussellTerrier()
    bobo.walk()
*walking*
What are the possible statuses for a service check?
0 for OK, 1 for Warning, 2 for Critical, and 3 for Unknown.
Given the metric query below, which numbered part(s) represent the space aggregation?
    max:lab.spikes.rate{lunch:salad} by {host}.rollup(sum, 3600)
    1 = max, 2 = lab.spikes.rate, 3 = {lunch:salad}, 4 = by {host}, 5 = .rollup(sum, 3600)
1 & 4
What are the private IP address ranges?
10.0.0.0 to 10.255.255.255 (10.0.0.0/8)
172.16.0.0 to 172.31.255.255 (172.16.0.0/12)
192.168.0.0 to 192.168.255.255 (192.168.0.0/16)
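The three ranges above can be checked programmatically. A minimal sketch using only the Python standard library; the helper name `is_rfc1918` is made up for illustration and is not part of any Datadog tooling:

```python
import ipaddress

# The three private IPv4 blocks from the answer above, in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address falls inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)
```

For example, `is_rfc1918("172.17.2.3")` is true because 172.17.x.x sits inside 172.16.0.0/12, while `is_rfc1918("172.32.0.1")` is false.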
How often does the Agent report metrics to the platform?
15 sec
What is the default number of check_runners?
4 - https://github.com/DataDog/datadog-agent/blob/main/pkg/config/config_template.yaml#L583
What is the default limit on the number of API keys?
50 API keys
How many system-level metrics can the Agent collect every 15 to 20 seconds?
75 to 100 - https://docs.datadoghq.com/getting_started/agent/
What port is used to submit custom metrics to the Datadog Agent via DogStatsD?
8125
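A minimal sketch of what submitting to that port can look like without a client library, assuming the plain-text DogStatsD datagram layout `<NAME>:<VALUE>|<TYPE>|@<SAMPLE_RATE>|#<TAGS>` and an Agent listening on 127.0.0.1. The function names and the metric name `page.views` are hypothetical:

```python
import socket


def format_metric(name, value, metric_type, tags=None, sample_rate=None):
    """Build a DogStatsD-style datagram such as 'page.views:1|c|#env:dev'."""
    datagram = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        datagram += f"|@{sample_rate}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; the host address is an assumption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

A call such as `send_metric(format_metric("page.views", 1, "c", tags=["env:dev"]))` is fire-and-forget: UDP gives no confirmation that the Agent received it.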
Above what percentage of total disk space used does the Agent stop storing metrics on disk?
95% - https://docs.datadoghq.com/agent/guide/network/?tab=agentv6v7#data-buffering
How to filter for my_metric where the tag team starts with `tps` and finishes with `is_the_best`?
<aggr>:my_metric{team:tps*,team:*is_the_best}
What is the impact of a sample rate on DogStatsD metric submission over UDP?
A sample rate of 1 sends metrics 100% of the time, while a sample rate of 0 sends metrics 0% of the time. Sample rate correction by metric type:
- COUNT: Values received are multiplied by (1/<SAMPLE_RATE>). It's reasonable to assume that for one datapoint received, 1/<SAMPLE_RATE> were actually sampled with the same value.
- GAUGE: No correction. The value received is kept as is.
- SET: No correction. The value received is kept as is.
- HISTOGRAM: The histogram.count statistic is a COUNT metric, and receives the correction outlined above. Other statistics are gauge metrics and aren't "corrected".
https://docs.datadoghq.com/metrics/custom_metrics/dogstatsd_metrics_submission/#sample-rates
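The correction rule above can be sketched in a few lines. This is only an illustration of the arithmetic, not the Agent's code, and the function names are hypothetical:

```python
def corrected_count(received_value: float, sample_rate: float) -> float:
    """Scale a sampled COUNT value by 1/sample_rate, as described above."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return received_value * (1 / sample_rate)


def corrected_gauge(received_value: float, sample_rate: float) -> float:
    """GAUGE and SET values are kept as received; no correction is applied."""
    return received_value
```

With a sample rate of 0.5, one received COUNT datapoint of value 1 is treated as 2, on the assumption that roughly half of the datapoints were dropped by sampling.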
What is a flare?
A zipped folder containing the configurations and logs of the Datadog Agent
For which OS is a standalone DogStatsD package NOT available?
AIX - https://docs.datadoghq.com/agent/
On an agent host installation, what needs to happen for the agent to access the docker daemon?
1. Add the Agent user to the Docker group: usermod -a -G docker dd-agent
2. Create a docker_daemon.yaml file by copying the example file in the Agent conf.d directory. If you have a standard install of Docker on your host, there shouldn't be anything you need to change to get the integration to work.
What are the aggregation rules per metric type with DogStatsD (at the agent level)?
Aggregation performed over one flush interval, by metric type:
- GAUGE: The latest datapoint received is sent.
- COUNT: The sum of all received datapoints is sent.
- HISTOGRAM: The min, max, sum, avg, 95th percentile, count, and median of all datapoints received are sent.
- SET: The number of different datapoints is sent.
- DISTRIBUTION: Aggregated as global distributions.
https://docs.datadoghq.com/developers/dogstatsd/data_aggregation/#aggregation-rules-per-metric-type
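A rough sketch of these rules for a single flush interval. It is an illustration only: the function name is hypothetical, the 95th percentile is left out because the exact percentile method is not stated here, and distributions are omitted since they are aggregated globally rather than per Agent:

```python
import statistics


def aggregate_flush(metric_type, datapoints):
    """Aggregate the datapoints received during one flush interval."""
    if metric_type == "gauge":
        return datapoints[-1]  # the latest datapoint wins
    if metric_type == "count":
        return sum(datapoints)  # sum of everything received
    if metric_type == "set":
        return len(set(datapoints))  # number of distinct values
    if metric_type == "histogram":
        return {
            "min": min(datapoints),
            "max": max(datapoints),
            "sum": sum(datapoints),
            "avg": sum(datapoints) / len(datapoints),
            "count": len(datapoints),
            "median": statistics.median(datapoints),
        }
    raise ValueError(f"unsupported metric type: {metric_type}")
```

The same three datapoints `[1, 5, 3]` therefore yield 3 as a gauge, 9 as a count, and 3 as a set.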
What is a resource metric?
Most components of your software infrastructure serve as a resource to other systems. Some resources are low-level; for instance, a server's resources include such physical components as CPU, memory, disks, and network interfaces. But a higher-level component, such as a database or a geolocation microservice, can also be considered a resource if another system requires that component to produce work.
Resource metrics can help you reconstruct a detailed picture of a system's state, making them especially valuable for investigation and diagnosis of problems. For each resource in your system, try to collect metrics that cover four key areas:
- utilization is the percentage of time that the resource is busy, or the percentage of the resource's capacity that is in use.
- saturation is a measure of the amount of requested work that the resource cannot yet service, often queued.
- errors represent internal errors that may not be observable in the work the resource produces.
- availability represents the percentage of time that the resource responded to requests. This metric is only well-defined for resources that can be actively and regularly checked for availability.
When are functions applied: before or after the time and space aggregation?
Most functions are applied at the last step, after time and space aggregation. https://docs.datadoghq.com/dashboards/guide/query-to-the-graph/
Which metric is affected by the "ignore containers" settings?
- kubernetes.containers.running
- docker.containers.stopped.total
- kubernetes.pods.running
- none
- all
None - https://docs.datadoghq.com/containers/docker/?tab=standard#ignore-containers The kubernetes.containers.running, kubernetes.pods.running, docker.containers.running, .stopped, .running.total and .stopped.total metrics are not affected by these settings. All containers are counted. This does not affect your per-container billing.
What is the permission attached to an API key?
None. An API key is required by the Datadog Agent to submit metrics and events to Datadog. It does not have any access to the data or config of the Datadog platform.
What is the default metric type when submitting a metric through the API?
Not assigned
How to tag all DogStatsD metrics coming via UDS with the same tags as with autodiscovery?
On Kubernetes, set DD_DOGSTATSD_ORIGIN_DETECTION=true.
Note: container_id, container_name, and pod_name tags are not added by default to avoid creating too many custom metrics.
https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=kubernetes#origin-detection
How often is the host map updated?
Once every 1 minute
How many containers are in the DaemonSet pod deployment, and what is the purpose of each?
Four containers:
- One container with the Agent process (Agent + Log Agent)
- One container with the process-agent process
- One container with the trace-agent process
- One container with the system-probe process
What are the required parameters for a service check?
Required parameters:
- <SERVICE_CHECK_NAME> (String, required, no default): The name of the service check.
- <STATUS> (Int, required, no default): A constant describing the service status: 0 for OK, 1 for WARN, 2 for CRITICAL, and 3 for UNKNOWN.
Optional parameters:
- <TAGS> (List of key:value pairs, optional, no default): A list of tags to associate with the service check.
- <HOSTNAME> (String, optional, default: current host): The hostname to associate with the service check.
- <MESSAGE> (String, optional, no default): Additional information or a description of why the status occurred.
https://docs.datadoghq.com/developers/service_checks/dogstatsd_service_checks_submission/#function
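To make the required and optional parameters concrete, here is a small sketch that builds a service check datagram by hand. It assumes the DogStatsD layout `_sc|<NAME>|<STATUS>|h:<HOSTNAME>|#<TAGS>|m:<MESSAGE>`; the function name and the check name `app.is_ok` are hypothetical:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_service_check(name, status, tags=None, hostname=None, message=None):
    """Build a service check datagram from the two required and three optional fields."""
    if status not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError("status must be 0, 1, 2, or 3")
    parts = ["_sc", name, str(status)]
    if hostname:
        parts.append(f"h:{hostname}")
    if tags:
        parts.append("#" + ",".join(tags))
    if message:
        parts.append(f"m:{message}")  # the message field is placed last
    return "|".join(parts)
```

Only the name and status are mandatory, so `format_service_check("app.is_ok", OK)` already produces a complete datagram.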
To which category of metrics does `utilization` belong?
Resource metrics
With Autodiscovery enabled, how would you exclude a specific integration?
Set `DD_IGNORE_AUTOCONF="<INTEGRATION NAME>"` as an environment variable, e.g. DD_IGNORE_AUTOCONF="redisdb istio"
What is a work metric?
Work metrics indicate the top-level health of your system by measuring its useful output. When considering your work metrics, it's often helpful to break them down into four subtypes:
- throughput is the amount of work the system is doing per unit time. Throughput is usually recorded as an absolute number.
- success metrics represent the percentage of work that was executed successfully.
- error metrics capture the number of erroneous results, usually expressed as a rate of errors per unit time or normalized by the throughput to yield errors per unit of work. Error metrics are often captured separately from success metrics when there are several potential sources of error, some of which are more serious or actionable than others.
- performance metrics quantify how efficiently a component is doing its work. The most common performance metric is latency, which represents the time required to complete a unit of work. Latency can be expressed as an average or as a percentile, such as "99% of requests returned within 0.1s".
Can the 95% used-disk-space limit for storing data on disk be raised or lowered in the Agent configuration?
Yes, in the config file, parameter: forwarder_storage_max_disk_ratio - https://docs.datadoghq.com/agent/guide/network/?tab=agentv6v7#data-buffering
What is the command to change the log level at agent level without restarting the agent?
agent config set log_level debug
The canonical hostname is chosen according to the following rules. The first match is selected. What are those rules?
1. agent-hostname: A hostname explicitly set in the Agent configuration file if it does not start with ip- or domu.
2. hostname (hostname -f on Linux): If the DNS hostname is not an EC2 default, for example: ip-192-0-0-1.
3. instance-id: If the Agent can reach the EC2 metadata endpoint from the host.
4. hostname: Fall back on the DNS hostname even if it is an EC2 default.
https://docs.datadoghq.com/agent/faq/how-datadog-agent-determines-the-hostname/?tab=agentv6
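The first-match logic above can be sketched as a chain of checks. This is an illustration under simplifying assumptions, not the Agent's implementation: the function names are hypothetical, the same `ip-`/`domu` prefix test is reused to decide whether the DNS hostname is an EC2 default, and `instance_id` is `None` when the metadata endpoint is unreachable:

```python
EC2_DEFAULT_PREFIXES = ("ip-", "domu")


def is_ec2_default(hostname: str) -> bool:
    """Assumed test for an EC2-style default name (prefix match only)."""
    return hostname.lower().startswith(EC2_DEFAULT_PREFIXES)


def canonical_hostname(agent_hostname=None, dns_hostname=None, instance_id=None):
    """Return the first candidate that satisfies the rules, in order."""
    if agent_hostname and not is_ec2_default(agent_hostname):
        return agent_hostname  # rule 1: explicit config value
    if dns_hostname and not is_ec2_default(dns_hostname):
        return dns_hostname  # rule 2: non-default DNS hostname
    if instance_id:
        return instance_id  # rule 3: EC2 instance ID
    return dns_hostname  # rule 4: fall back on the DNS hostname
```

With `agent_hostname="ip-192-0-0-1.internal"`, `dns_hostname="my.special.hostname"`, and `instance_id="i-deadbeef"`, the first rule is skipped because of the `ip-` prefix and the second rule selects `my.special.hostname`.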
What is the default time aggregation for a gauge in the Datadog platform?
average
What is the default time aggregation for a rate in the Datadog platform?
average
What command should I run to fix the following log permission error? Error: file /var/log/application/error.log does not exist
chmod 755 /var/log/application/ && chmod 644 /var/log/application/error.log
Note: 711 on the directory would be enough to traverse the hierarchy.
What is the command to run a flare from a Debian host and in a k8s environment?
Debian host: datadog-agent flare <CASE_ID>
Kubernetes: kubectl exec -it <AGENT_POD_NAME> -c agent -- agent flare <CASE_ID>
What statuses can the datadog.agent.check_status check return?
CRITICAL if an Agent check is unable to send metrics to Datadog; otherwise OK.
Which configuration setting defines the maximum memory usage for storing the metrics?
- forwarder_retry_queue_payloads_max_size
- forwarder_queue_payloads_max_size
- collector_retry_queue_payloads_max_size
- collector_retry_queue_max_size
forwarder_retry_queue_payloads_max_size - https://docs.datadoghq.com/agent/guide/network/?tab=agentv6v7#data-buffering
The correct way to instantiate the following Dog class is:
    class Dog:
        def __init__(self, name, age):
            self.name = name
            self.age = age
Dog("Rufus", 3)
In which folder are the configuration files stored on a host installation?
Linux: /etc/datadog-agent/conf.d/
macOS: ~/.datadog-agent/conf.d/
Windows (2008, Vista, and newer): %ProgramData%\Datadog\conf.d
https://docs.datadoghq.com/agent/guide/agent-configuration-files/?tab=agentv6v7
When setting up APM on the SDK, what are the default values for host, port and https?
localhost, 8126, False - https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#agent-configuration
Where in DD can a client see on which hosts APM is running?
metric datadog.apm.host_instance or on one of those metrics datadog.trace_agent.*
Where in DD can a client see on which hosts NPM is running?
metric datadog.system_probe.agent
What is the parameter to change the collection interval for an integration?
min_collection_interval
Which outbound port is used for the Kubernetes HTTPS Kubelet?
port: 10250
Which port is exposing the datadog runtime metrics?
port: 5000
What is the port used by the GUI and CLI to send commands to a running agent?
port: 5001
What is the port to display the agent UI?
port: 5002
What is the default port to receive spans?
port: 8126
The IP address 172.17.2.3 is: public, multicast, private, class E?
private address
With DogStatsD in Python, what is the parameter to add the same tags to all metrics?
statsd_constant_tags https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent&code-lang=python#client-instantiation-parameters
With DogStatsD in Python, what is the parameter to add a namespace to all metrics?
statsd_namespace https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent&code-lang=python#client-instantiation-parameters
What is the default time aggregation for a count in the Datadog platform?
sum
Which is the endpoint for the APM traces intake?
trace.agent.datadoghq.com
On ddtrace for Python, what is the parameter to add to tracer.configure(...) to send traces through UDS?
uds_path -- tracer.configure(uds_path="/path/to/socket") https://ddtrace.readthedocs.io/en/stable/advanced_usage.html
What is the outcome of the following curl command?
curl -X POST "https://api.datadoghq.eu/api/v2/series" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -d @- << EOF
{"series":[{"points":[{"value": 1}],"resources": [{"name": "host1","type":"host"}]}]}
EOF
{"errors":["Payload validation failed: metric name is empty"]}
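The request above is rejected because the series entry carries no metric name. A hedged sketch of a payload builder that refuses an empty name before anything is sent. The `points` and `resources` fields mirror the request above, while the `metric` and `timestamp` field names are assumptions about the v2 series schema, and the function name is hypothetical:

```python
import json
import time


def build_series_payload(metric, value, host, timestamp=None):
    """Build a one-point series payload, rejecting an empty metric name locally."""
    if not metric:
        raise ValueError("metric name is empty")
    ts = int(timestamp if timestamp is not None else time.time())
    return {
        "series": [
            {
                "metric": metric,  # assumed field name for the metric name
                "points": [{"timestamp": ts, "value": value}],
                "resources": [{"name": host, "type": "host"}],
            }
        ]
    }


# Serialize for use as a request body; "my.metric" is a made-up name.
body = json.dumps(build_series_payload("my.metric", 1, "host1", timestamp=1700000000))
```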
What is the CPU and Memory overhead of the Datadog Agent?
~0.08% CPU / ~130MB RAM
What is the maximum number of datapoints that can be displayed on timeseries?
~300 https://docs.datadoghq.com/dashboards/guide/query-to-the-graph/
Where can we find full examples of integration YAML files for the Datadog Agent?
Datadog/integrations-core
How does DogStatsD distinguish itself from StatsD?
DogStatsD supports events submission
What is the shortcut for quick graphs?
G
What are the four data types for metrics inside the Datadog platform?
Gauge, Rate, Count, Distribution - https://docs.datadoghq.com/metrics/types/?tab=histogram#submission-types-and-datadog-in-app-types
What protocol is used to submit metrics to Datadog?
HTTPS
Given the following output from the agent configuration, which canonical hostname will be assigned to the agent?
    ....
    Hostnames
    =========
    hostname: my.special.hostname
    agent-hostname: ip-192-0-0-1.internal
    ec2-hostname: ip-10-24-34-0.ec2.internal
    instance-id: i-deadbeef
    socket-hostname: myhost
    socket-fqdn: myhost.mydomain
    ...
Hostname: my.special.hostname
What happens to the API keys and application keys created by an account that is disabled?
If a user's account is disabled, any application keys that the user created are revoked. Any API keys that were created by the disabled account are not deleted, and are still valid.
What is the definition of a saturation metric?
A measure of the amount of requested work that the resource cannot yet service, often queued.
What does DD_CHECKS_TAG_CARDINALITY=high change compared to DD_CHECKS_TAG_CARDINALITY=low?
It changes the tags attached to the check metrics. The tags below are added:
- container_name (cardinality: High). Note: not included for the containerd runtime.
- container_id (cardinality: High)
- rancher_container (cardinality: High; requirement: Rancher environment)
- mesos_task (cardinality: Orchestrator; requirement: Mesos environment)
https://docs.datadoghq.com/containers/docker/tag/?tab=containerizedagent#out-of-the-box-tagging
What is the definition of the `is_reliable` setting when configuring the agent for dual shipping?
It ensures that data is not missed if a destination becomes unavailable
What is the parameter to configure the default tags attached to metrics emitted by containers on Kubernetes?
DD_CHECKS_TAG_CARDINALITY
What is the range for public IP addresses?
Anything not in the list below is public:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
Which names are not accepted as canonical hostnames by default?
Anything starting with ip- or domu.
Note: This can be overridden for agent 6.16+ or 7.16+ with hostname_force_config_as_canonical: true
https://github.com/DataDog/datadog-agent/blob/main/docs/agent/hostname_force_config_as_canonical.md
What are the environment variables to ignore containers?
DD_CONTAINER_INCLUDE, DD_CONTAINER_EXCLUDE, DD_CONTAINER_INCLUDE_METRICS, DD_CONTAINER_EXCLUDE_METRICS, DD_CONTAINER_INCLUDE_LOGS, DD_CONTAINER_EXCLUDE_LOGS https://docs.datadoghq.com/containers/docker/?tab=standard#ignore-containers
On which monitors can you not create an SLO?
Any monitor type not listed here. Datadog monitor-based SLOs support only the following monitor types:
- Metric Monitor Types (Metric, Integration, APM Metric, Anomaly, Forecast, Outlier)
- Synthetic
- Service Checks (open beta)
https://docs.datadoghq.com/monitors/service_level_objectives/monitor/#prerequisites
What is the definition of `system.disk.used`?
The amount of disk space in use
When will the monitor with the message below trigger? {{^is_renotify}} something written @[email protected] {{/is_renotify}}
The enclosed message is sent only when the notification is not a renotification; it is omitted when the monitor renotifies.
How many monitor types can you name?
There are 21 monitors - Host, Metric, Anomaly, APM, Audit Logs, CI, Composite, Custom Check, Error Tracking, Event, Forecast, Integration, Live Process, Logs, Network, Outlier, Process Check, Real User Monitoring, SLO Alerts, Synthetic Monitoring, Watchdog
What is the order of aggregation: time first or space first?
Time aggregation is before space aggregation https://docs.datadoghq.com/dashboards/guide/query-to-the-graph/
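A toy illustration of that ordering: each source is first rolled up into time buckets, and only then are the sources combined. The data, host names, and function names are invented for the example, and this is not how the platform is implemented:

```python
from collections import defaultdict


def time_aggregate(points, interval, reducer):
    """Roll up (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return {start: reducer(values) for start, values in sorted(buckets.items())}


def space_aggregate(rolled_by_source, reducer):
    """Combine already rolled-up series from several sources, bucket by bucket."""
    combined = defaultdict(list)
    for rolled in rolled_by_source.values():
        for start, value in rolled.items():
            combined[start].append(value)
    return {start: reducer(values) for start, values in sorted(combined.items())}


raw = {
    "host-a": [(0, 1), (30, 2), (60, 5)],
    "host-b": [(0, 4), (30, 4), (60, 1)],
}
# Step 1: time aggregation per host (sum over 60-second buckets).
rolled = {host: time_aggregate(points, 60, sum) for host, points in raw.items()}
# Step 2: space aggregation across hosts (max of the per-host buckets).
result = space_aggregate(rolled, max)
```

Here `rolled` is `{"host-a": {0: 3, 60: 5}, "host-b": {0: 8, 60: 1}}` and `result` is `{0: 8, 60: 5}`.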
A customer is not familiar with the concepts of dashboards and metrics. How would you describe both of these concepts to a user who has limited technical knowledge?
You know it yourself
What does a flare not include?
Tracer debug logs
An organization is implementing the practice of setting multiple API keys for various deployment methods: one for agents on Kubernetes in AWS, one for on-prem with Chef, one for Terraform and one for developers deploying locally. Is this approach following the recommended best practices?
Yes. Each API key should have a separate scope.
What are the two methods used to send metrics with DogStatsD?
UDP or UDS (Unix Domain Socket)
When to use DogStatsD over UDS?
Unix Domain Sockets allow you to establish the connection with a socket file, regardless of the IP of the Datadog Agent container. It also enables the following benefits:
- Bypassing the networking stack brings a significant performance improvement for high traffic.
- While UDP has no error handling, UDS allows the Agent to detect dropped packets and connection errors, while still allowing a non-blocking use.
- DogStatsD can detect the container from which metrics originated and tag those metrics accordingly.
https://docs.datadoghq.com/developers/dogstatsd/unix_socket?tab=host
In case the network configuration restricts outbound traffic, how can the traffic to Datadog be enabled?
Use Nginx as a reverse proxy for the agents
When is a default interpolation set?
When the metric is a gauge or a rate and needs to be compared with others. The platform automatically applies a linear interpolation.
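Linear interpolation simply places a missing value on the straight line between its two neighbours. A minimal sketch of the arithmetic, with a hypothetical function name and made-up points; it does not reproduce the platform's own rules for when interpolation is applied:

```python
def interpolate_linear(t, left, right):
    """Estimate the value at time t on the line between two (time, value) points."""
    (t0, v0), (t1, v1) = left, right
    if t0 == t1 or not t0 <= t <= t1:
        raise ValueError("t must lie between two distinct timestamps")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

For a series with points (10, 2.0) and (20, 4.0), the estimate at t = 15 is 3.0, which makes it possible to compare this series with another one that does have a real datapoint at t = 15.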
On which operating systems is the GUI automatically enabled?
Windows and Mac