Version: 1.17

Metrics

Another important aspect of observability and monitoring is the collection of metrics data. These metrics help you identify patterns, understand trends, and verify that SLAs are met.

ADS collects the following key metrics for requests:

  • Rate - the number of access requests ADS is serving

  • Errors - the number of failed access requests

  • Duration - distribution of the amount of time each access request takes

  • Rate of successful evaluations - the rate of successful access requests that evaluated to Permit, Deny, Indeterminate, or NotApplicable, respectively

Pull or push

Two primary approaches orchestrate communication of metrics data between the monitored application and the metrics backend: client-pull and server-push models.

ADS is compatible with both models and currently supports the following monitoring systems:

  • Prometheus (pull)
  • Azure Monitor (push)
  • InfluxDB (push)
  • Axiomatics Services Manager (push)

Several monitoring systems can be used concurrently with ADS.

Pull

In this model, the metrics backend pulls data from the application every T time units (usually the time unit is "seconds"). This action is also referred to as polling or scraping. This is done by having the application expose an HTTP endpoint, which returns the current value of each metric without doing any calculation.

The polling period is configured in the metrics backend by the operator. It should be set to a value that yields enough data to satisfy monitoring needs and the ability to draw conclusions, while not negatively affecting the performance of the primary operations of the application. This may require some tuning and testing by the operator.
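The pull model can be illustrated with a minimal sketch of an application that exposes its current counter values over HTTP. This is not how ADS implements its endpoint; the counter values, the handler class, and the helper names are assumptions chosen for illustration only.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory counters; the names mirror metrics shown on this page.
COUNTERS = {"successful_requests_total": 8.0, "error_requests_total": 1.0}


def render_metrics(counters):
    """Return the current values as plain text, one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in sorted(counters.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values without doing any calculation."""

    def do_GET(self):
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the sketch quiet; a real server would log requests.
        pass


def scrape_once(url):
    """What the metrics backend does every T time units: fetch the current values."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

To try it, serve `MetricsHandler` with `http.server.HTTPServer` on a local port of your choice and call `scrape_once` against that address at the desired polling period.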

Push

In this model, the metrics backend waits for metrics data to be pushed (sent) to it at a time set by the application. That is, the monitored application is configured to send metrics data to the metrics backend every T time units (usually the time unit is "seconds").

This means that the operator configures the metrics library and the push interval in the monitored application. As with the polling period in the pull model, the push interval should be set to a value that yields enough data to satisfy monitoring needs and the ability to draw conclusions, while not negatively affecting the performance of the primary operations of the application. This may require some tuning and testing by the operator.
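As a rough sketch of the push model, the loop below sends a snapshot to the backend at a fixed interval. The function and parameter names are hypothetical, and the `send` callable stands in for whatever transport a real backend requires; ADS handles this internally through its metrics library.

```python
import time


def push_loop(read_snapshot, send, step_seconds, iterations, sleep=time.sleep):
    """Send a metrics snapshot every step_seconds, for a fixed number of iterations.

    read_snapshot: callable returning the current metric values.
    send: callable that delivers one snapshot to the metrics backend.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for _ in range(iterations):
        send(read_snapshot())
        sleep(step_seconds)
```

A shorter interval yields more data points but also more work for the application, which is the trade-off described above.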

Metrics data

The metrics data provided by ADS is presented in the form of counters and timers collated into an output, ready to be accessed by the metrics backend.

ADS tags metrics with the following properties:

  • ADS instance identity, as described in Configuring ADS instance identity.

  • Authorization domain identity, as described in The Identity section.

  • Authorization domain (namespace and domain name).

  • Domain sequence, a counter that represents how many domain changes the current instance of ADS has gone through since its startup.

note

The domain tag (namespace and domain name) is only available when ADS is configured to retrieve the authorization domain from ASM/ADM using the RetrieveByName endpoint.

This functionality enables data to be filtered by their respective ADS instance id, domain id and/or domain values.

The following example shows the plain text format used for Prometheus:

# HELP decisions_total
# TYPE decisions_total counter
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Indeterminate",} 5.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", type="Deny",} 1.0
# HELP successful_requests_total
# TYPE successful_requests_total counter
successful_requests_total 8.0
# HELP error_requests_total
# TYPE error_requests_total counter
error_requests_total 1.0
# HELP duration_info_seconds_max
# TYPE duration_info_seconds_max gauge
duration_info_seconds_max 0.014
# HELP duration_info_seconds
# TYPE duration_info_seconds summary
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.5",} 0.005210112
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.75",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.9",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.99",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.999",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="1.0",} 0.014123008
duration_info_seconds_count 9.0
duration_info_seconds_sum 0.244
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001",} 182.0
...
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", le="30.0",} 1200.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", le="+Inf",} 1200.0

Sample metrics output for Prometheus
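Because every sample carries the ADS tags, the output can be filtered by tag value. The following sketch parses lines in the format shown above and selects samples by label; the regular expressions and function names are assumptions for illustration and cover only the subset of the format that appears in this sample.

```python
import re

# Metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One key="value" pair; tolerates the spaces and trailing comma seen in the sample.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_line(line):
    """Return (name, labels, value) for a sample line, or None for other lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line, HELP or TYPE comment
    match = SAMPLE_RE.match(line)
    if match is None:
        return None  # for example, the "..." abbreviation marker
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def select(samples, metric, **wanted):
    """Keep samples of one metric whose labels match all the wanted values."""
    return [
        sample for sample in samples
        if sample[0] == metric
        and all(sample[1].get(key) == value for key, value in wanted.items())
    ]
```

For example, `select(samples, "decisions_total", ads_id="default-1603e8a2", type="Deny")` keeps only the Deny counter of one ADS instance.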

Looking at the output per type, it can be broken down into the following sections:

# HELP successful_requests_total
# TYPE successful_requests_total counter
successful_requests_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0",} 8.0

The number of successful access requests.

# HELP error_requests_total
# TYPE error_requests_total counter
error_requests_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0",} 1.0

The number of errors.

# HELP duration_info_seconds_max
# TYPE duration_info_seconds_max gauge
duration_info_seconds_max 0.014
# HELP duration_info_seconds
# TYPE duration_info_seconds summary
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.5",} 0.005210112
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.75",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.9",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.99",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.999",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="1.0",} 0.014123008
duration_info_seconds_count 9.0
duration_info_seconds_sum 0.244
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001",} 182.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001048576",} 182.0
...
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="30.0",} 1200.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="+Inf",} 1200.0

The duration distribution for access requests. (The output includes percentile values and several histogram buckets; for reasons of space, the bucket list is abbreviated.)
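Two simple figures can be derived from these values: the mean duration (sum divided by count) and the share of requests that completed within a given bucket boundary (bucket count divided by the `+Inf` bucket count). The helper names below are hypothetical; the sketch only shows the arithmetic.

```python
def mean_duration(duration_sum, duration_count):
    """Mean request duration in seconds, from the _sum and _count samples."""
    if duration_count == 0:
        return 0.0
    return duration_sum / duration_count


def fraction_within(bucket_count, total_count):
    """Share of requests at or below a bucket's 'le' boundary.

    bucket_count: value of the bucket of interest (buckets are cumulative).
    total_count: value of the le="+Inf" bucket.
    """
    if total_count == 0:
        return 0.0
    return bucket_count / total_count
```

With the sample values above, `mean_duration(0.244, 9.0)` is roughly 0.027 seconds.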

# HELP decisions_total
# TYPE decisions_total counter
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Permit",} 5.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Indeterminate",} 0.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Deny",} 1.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="NotApplicable",} 0.0

The rate of successful access requests that evaluated to Permit, Deny, Indeterminate, or NotApplicable, respectively.

note

Only requests that result in single decisions are included in the data for decision values.
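The counters themselves only ever increase; a rate is obtained by comparing two readings taken a known interval apart. The following sketch shows that calculation, with a function name and a reset-handling policy that are assumptions rather than part of ADS. Metrics backends typically provide an equivalent built-in function.

```python
def per_second_rate(previous, current, interval_seconds):
    """Average per-second increase of a counter between two readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # The counter went backwards, for example after a restart.
        # Assumption: treat the current value as the whole increase.
        return current / interval_seconds
    return (current - previous) / interval_seconds
```

For example, a Permit counter that moves from 5.0 to 35.0 between two readings 15 seconds apart corresponds to 2 Permit decisions per second.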

To access the metrics data, set a configuration property in the deployment configuration file. This property configures the metrics backend, according to the chosen model, to receive the data for visualization and/or further processing.

note

In this implementation, metrics are collected for requests served from the REST endpoint of ADS. No extra features are implemented for the legacy SOAP endpoint. Any metrics data shown for the SOAP endpoint would be information produced by default by Dropwizard's Metrics library.

Metrics configuration

Statistics generated from a timer, such as maximum values, percentiles, and histogram counts, are designed to diminish over time, prioritizing recent data samples. To regulate this decay process, ADS utilizes an internal ring buffer (an array that maintains a pointer to a specific element) to monitor the statistics for maximums and percentiles.

Two optional sub-properties, shared by all supported monitoring systems, control this decay mechanism. As previously mentioned, multiple monitoring systems can be used simultaneously.
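To make the decay mechanism more concrete, here is a minimal sketch of a maximum tracked through a ring buffer. It is an illustration under stated assumptions, not the ADS implementation: the class name, the injectable clock, and the choice to update every slot on each recording are all hypothetical.

```python
import time


class DecayingMax:
    """Tracks a maximum that fades once no recent sample supports it."""

    def __init__(self, decay_seconds, buffer_length, clock=time.monotonic):
        if decay_seconds <= 0 or buffer_length < 1:
            raise ValueError("decay_seconds must be > 0 and buffer_length >= 1")
        self._decay = decay_seconds
        self._slots = [0.0] * buffer_length
        self._current = 0  # the pointer into the ring buffer
        self._clock = clock
        self._last_rotation = clock()

    def _rotate(self):
        """Advance the pointer once per elapsed decay period, clearing old slots."""
        steps = int((self._clock() - self._last_rotation) // self._decay)
        if steps <= 0:
            return
        for _ in range(min(steps, len(self._slots))):
            self._slots[self._current] = 0.0
            self._current = (self._current + 1) % len(self._slots)
        self._last_rotation += steps * self._decay

    def record(self, value):
        """Fold a new sample into every slot."""
        self._rotate()
        self._slots = [max(slot, value) for slot in self._slots]

    def max(self):
        """Read the maximum from the slot the pointer currently refers to."""
        self._rotate()
        return self._slots[self._current]
```

In this sketch a recorded maximum remains visible for roughly the decay period multiplied by the buffer length, which suggests why a buffer length of 1 gives very little history.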

The steps below describe how to configure your metrics feature for shared functionality:

  1. Add the metricsBackends property in your deployment.yaml file.
  2. Add the sub-property parameters.
  3. Add and set the decay and bufferLength sub-properties with suitable values.

To configure the metrics feature for shared functionality in ADS, add the following section to your deployment.yaml file:

metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3

Shared configuration properties for metrics

The nested sub-properties of the parameters sub-property, which configures the shared metrics functionality, are listed in the table below:

| Property | Description |
| --- | --- |
| decay | Defines the frequency at which the pointer in the ring buffer advances to the next element. The duration must be expressed as a positive integer followed by a time unit. The default is 2 minutes and the minimum allowed is 1 second. NOTE: To prevent undersampling, Axiomatics strongly recommends using a decay value that is at least twice the highest step value used across all enabled metrics backends in your configuration. This applies whether you use a single backend or multiple backends. Additionally, it is recommended to determine the value for the step sub-property first and then adjust the decay value accordingly. |
| bufferLength | Defines the size of the ring buffer. The default is 3; using a value as low as 1 is not recommended. |

Shared properties for metrics
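The recommendation that decay be at least twice the highest step value can be checked before deployment. The sketch below is a hypothetical helper, not part of ADS; in particular, the set of accepted unit names is an assumption and may differ from what ADS accepts.

```python
import re

# Assumed unit names for illustration only.
UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
}


def parse_duration_seconds(text):
    """Convert a value such as '2 minutes' to seconds; reject anything else."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    if amount <= 0 or unit not in UNIT_SECONDS:
        raise ValueError(f"not a positive duration with a known unit: {text!r}")
    return amount * UNIT_SECONDS[unit]


def decay_is_safe(decay, steps):
    """True when decay is at least twice the highest step of the enabled backends."""
    highest_step = max(parse_duration_seconds(step) for step in steps)
    return parse_duration_seconds(decay) >= 2 * highest_step
```

For example, `decay_is_safe("2 minutes", ["30 seconds", "1 minute"])` returns `True`, while a decay of `1 minute` against a step of `45 seconds` would not pass.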

important

While including these sub-properties is optional, leaving them empty or using null values will render the entry invalid and prevent system initialization. If a sub-property is not included in the configuration, the default value will be applied.

Optional metrics configuration for backend services

To optimize the tracking and publishing of metrics, you can configure ADS to integrate with various metrics backend services listed in Pull or push.

These metrics backends are configured through their respective sub-properties under the metricsBackends section described in the previous section, with their nested sub-properties set accordingly. Follow the instructions below to integrate your chosen metrics backend with ADS and enhance your monitoring.

important

While including some of the nested sub-properties listed below is optional, leaving them empty or using null values will render the entry invalid and prevent system initialization. If a sub-property is not included in the configuration, the default value will be applied.

ASM can be used as a metrics backend. This configuration is used to publish key metrics for the graph displays of the Dashboard feature of ASM.

note

ASM can be used as a metrics backend only when ADS is started with an authorization domain retrieved from ASM. To publish the metrics to ASM, you must configure authentication to ASM as described in the Authentication using an authorization server section. Furthermore, ASM must be running with the Dashboard functionality enabled, as described in the Installation section of the ASM documentation.

To set up ADS for use with ASM, add the following section to your deployment.yaml file:

metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
  asm:
    enabled: true
    uri: https://localhost/metrics/push

The nested sub-properties of the asm sub-property, which configures ASM as a metrics backend, are listed in the table below:

| Property | Required | Description |
| --- | --- | --- |
| enabled | Required | Enables the collection of data for the ASM metrics backend when set to true. |
| uri | Required | Specifies the URI for the ASM backend. |

ASM configuration sub-properties