Metrics
Another important aspect of observability and monitoring is the collection of metrics data. Metrics help you identify patterns, understand trends, and verify that SLAs are met.
ADS collects the following key metrics for access requests:
- Rate - the number of access requests ADS is serving
- Errors - the number of failed access requests
- Duration - the distribution of the time each access request takes
- Rate of successful evaluations - the rate of successful access requests that evaluated to Permit, Deny, Indeterminate, or NotApplicable, respectively
Pull or push
Two primary approaches orchestrate communication of metrics data between the monitored application and the metrics backend: client-pull and server-push models.
ADS is compatible with both models and currently supports the following monitoring systems:
- Prometheus (pull)
- Azure Monitor (push)
- InfluxDB (push)
- Axiomatics Services Manager (push)
Several monitoring systems can be used concurrently with ADS.
Pull
In this model, the metrics backend pulls data from the application every T time units (usually the time unit is "seconds"). This action is also referred to as polling or scraping. This is done by having the application expose an HTTP endpoint, which returns the current value of each metric without doing any calculation.
The polling period is configured in the metrics backend by the operator. It should be set to a value that yields enough data to satisfy monitoring needs and the ability to draw conclusions, while not negatively affecting the performance of the primary operations of the application. This may require some tuning and testing by the operator.
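The pull model can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular metrics backend is implemented; `poll`, `fetch`, and `fake_endpoint` are hypothetical names, and the fetch function is injected so that the sketch runs without a network.

```python
import time
from typing import Callable, List


def poll(fetch: Callable[[], str],
         period_seconds: float,
         rounds: int,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call fetch() once per period and collect each snapshot.

    fetch stands in for an HTTP GET against the application's metrics
    endpoint. The application only returns current values, so all
    scheduling lives here, on the backend side.
    """
    snapshots = []
    for i in range(rounds):
        snapshots.append(fetch())
        if i < rounds - 1:
            sleep(period_seconds)  # wait T time units between scrapes
    return snapshots


# Hypothetical stand-in for the monitored application: a counter that
# grows between scrapes.
state = {"requests": 0}


def fake_endpoint() -> str:
    state["requests"] += 5
    return f"successful_requests_total {float(state['requests'])}"


# sleep is replaced by a no-op so the example finishes immediately.
samples = poll(fake_endpoint, period_seconds=15, rounds=3, sleep=lambda s: None)
print(samples)
```

A shorter period yields more snapshots per unit of time but also more requests against the application, which is the trade-off the operator has to tune.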
Push
In this model, the metrics backend waits for metrics data to be pushed (or sent) to it at times set by the application. That is, the monitored application is configured to send metrics data to the metrics backend every T time units (usually the time unit is "seconds").
This means that the metrics library used and the push period are configured in the monitored application by the operator. As with the polling period in the pull model, the push period should be set to a value that yields enough data to satisfy monitoring needs and the ability to draw conclusions, while not negatively affecting the performance of the primary operations of the application. This may require some tuning and testing by the operator.
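As a rough illustration of the push model, the sketch below accumulates a request counter inside the application and hands it to the backend once every T seconds. `PushReporter` and its `send` callback are hypothetical names; real metrics libraries typically use a background timer rather than checking the clock on each request, and time is passed in explicitly here so the example is deterministic.

```python
from typing import Callable, Dict, List


class PushReporter:
    """Accumulates a request counter and pushes it every step_seconds."""

    def __init__(self, step_seconds: float,
                 send: Callable[[Dict[str, float]], None]) -> None:
        self.step_seconds = step_seconds
        self.send = send          # stands in for the call to the backend
        self.count = 0
        self.last_push = 0.0

    def record_request(self, now: float) -> None:
        """Count one request, then push if a full step has elapsed."""
        self.count += 1
        if now - self.last_push >= self.step_seconds:
            self.send({"time": now, "requests": float(self.count)})
            self.count = 0
            self.last_push = now


pushed: List[Dict[str, float]] = []
reporter = PushReporter(step_seconds=60, send=pushed.append)

# Simulate one request per second for two minutes (t = 1 .. 120).
for second in range(1, 121):
    reporter.record_request(now=float(second))

print(pushed)
```

With one request per second and a 60-second interval, each push in this sketch carries the data for 60 requests.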
Metrics data
The metrics data provided by ADS is presented in the form of counters and timers collated into an output, ready to be accessed by the metrics backend.
ADS tags metrics with the following properties:
- ADS instance identity, as described in Configuring ADS instance identity.
- Authorization domain identity, as described in The Identity section.
- Authorization domain (namespace and domain name).
- Domain sequence, a counter that represents how many domain changes the current instance of ADS has gone through since its startup.
The domain tag (namespace and domain name) is only available when ADS is configured to retrieve the authorization domain from ASM/ADM using the RetrieveByName endpoint.
This functionality enables data to be filtered by their respective ADS instance id, domain id and/or domain values.
The following example shows the plain text format used for Prometheus:
# HELP decisions_total
# TYPE decisions_total counter
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Indeterminate",} 5.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", type="Deny",} 1.0
# HELP successful_requests_total
# TYPE successful_requests_total counter
successful_requests_total 8.0
# HELP error_requests_total
# TYPE error_requests_total counter
error_requests_total 1.0
# HELP duration_info_seconds_max
# TYPE duration_info_seconds_max gauge
duration_info_seconds_max 0.014
# HELP duration_info_seconds
# TYPE duration_info_seconds summary
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.5",} 0.005210112
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.75",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.9",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.99",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.999",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="1.0",} 0.014123008
duration_info_seconds_count 9.0
duration_info_seconds_sum 0.244
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001",} 182.0
...
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", le="30.0",} 1200.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", le="+Inf",} 1200.0
Sample metrics output for Prometheus
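To illustrate how the tags allow filtering, the following sketch parses lines in the format shown above and keeps only the samples for one domain sequence. It is a simplified parser written for this example (`parse`, `Sample`, and the regular expressions are not part of ADS or Prometheus) and it does not handle escaped quotes inside label values.

```python
import re
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# metric name, optional {label block}, whitespace, value
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
                  r'(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text: str) -> List[Sample]:
    """Parse sample lines, skipping blanks, # comments and elision lines."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # for example an abbreviation line such as "..."
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append(Sample(match.group("name"), labels,
                              float(match.group("value"))))
    return samples


TEXT = '''
# TYPE decisions_total counter
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Indeterminate",} 5.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="1", type="Deny",} 1.0
successful_requests_total 8.0
'''

samples = parse(TEXT)
# Keep only samples recorded after the first domain change.
current = [s for s in samples if s.labels.get("domain_sequence") == "1"]
print(current)
```

In practice this kind of filtering is normally done in the metrics backend's own query language; the sketch only shows what the tags make possible.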
Looking at the output per type, it can be broken down into the following sections:
# HELP successful_requests_total
# TYPE successful_requests_total counter
successful_requests_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0",} 8.0
The number of successful access requests.
# HELP error_requests_total
# TYPE error_requests_total counter
error_requests_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0",} 1.0
The number of errors.
# HELP duration_info_seconds_max
# TYPE duration_info_seconds_max gauge
duration_info_seconds_max 0.014
# HELP duration_info_seconds
# TYPE duration_info_seconds summary
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.5",} 0.005210112
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.75",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.9",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.99",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="0.999",} 0.014123008
duration_info_seconds{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", quantile="1.0",} 0.014123008
duration_info_seconds_count 9.0
duration_info_seconds_sum 0.244
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001",} 182.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="0.001048576",} 182.0
...
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="30.0",} 1200.0
duration_info_seconds_bucket{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", le="+Inf",} 1200.0
The duration distribution for access requests. (Prometheus histograms use many buckets; the list is abbreviated for reasons of space.)
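Histogram buckets in the Prometheus format are cumulative: each `le` bucket counts the observations that are less than or equal to its upper bound, and the `+Inf` bucket equals the total count. The sketch below shows that relationship with hypothetical durations and bucket bounds; `cumulative_buckets` is a name chosen for this example.

```python
import math
from typing import Dict, List


def cumulative_buckets(durations: List[float],
                       bounds: List[float]) -> Dict[str, int]:
    """Count observations <= each upper bound, plus a final +Inf bucket."""
    buckets = {}
    for bound in sorted(bounds) + [math.inf]:
        label = "+Inf" if math.isinf(bound) else str(bound)
        buckets[label] = sum(1 for d in durations if d <= bound)
    return buckets


# Hypothetical request durations in seconds and bucket bounds.
durations = [0.0004, 0.0008, 0.005, 0.014, 0.2]
print(cumulative_buckets(durations, bounds=[0.001, 0.01, 30.0]))
```

Because the counts are cumulative, a bucket whose bound exceeds every observed duration carries the same value as `+Inf`, as the `le="30.0"` and `le="+Inf"` lines in the sample do.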
# HELP decisions_total
# TYPE decisions_total counter
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Permit",} 5.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Indeterminate",} 0.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="Deny",} 1.0
decisions_total{ads_id="default-1603e8a2",domain="namespace0:domain1", domain_id="80bbc5fa-0647-4f22-804f-949056787c6b",domain_sequence="0", type="NotApplicable",} 0.0
The rate of successful access requests that evaluated to Permit, Deny, Indeterminate, or NotApplicable, respectively.
Only requests that result in single decisions are included in the data for decision values.
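As a small worked example, the counter values from the sample output can be turned into an error ratio and into the share of each decision type. The helper names below are hypothetical. Counters are cumulative, so a rate over a time window would instead be computed from the difference between two readings.

```python
from typing import Dict


def error_ratio(successful: float, errors: float) -> float:
    """Fraction of requests that failed; 0.0 when nothing was served."""
    total = successful + errors
    return errors / total if total else 0.0


def decision_shares(decisions: Dict[str, float]) -> Dict[str, float]:
    """Share of each decision type among all counted decisions."""
    total = sum(decisions.values())
    return {name: (count / total if total else 0.0)
            for name, count in decisions.items()}


# Values taken from the sample output above.
print(error_ratio(successful=8.0, errors=1.0))
print(decision_shares({"Permit": 5.0, "Indeterminate": 0.0,
                       "Deny": 1.0, "NotApplicable": 0.0}))
```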
To access the metrics data, set up a configuration property in the deployment configuration file. This property configures the metrics backend, according to the chosen model, to receive the data for visualization and/or further processing.
In this implementation, metrics are collected for requests served from the REST endpoint of ADS. No extra features are implemented for the legacy SOAP endpoint. Any metrics data shown for the SOAP endpoint would be information produced by default by Dropwizard's Metrics library.
Metrics configuration
Statistics generated from a timer, such as maximum values, percentiles, and histogram counts, are designed to diminish over time, prioritizing recent data samples. To regulate this decay process, ADS utilizes an internal ring buffer (an array that maintains a pointer to a specific element) to monitor the statistics for maximums and percentiles.
Two optional sub-properties, common to all monitoring systems, control this decay mechanism. As previously mentioned, multiple monitoring systems can be employed simultaneously.
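The following is a minimal sketch of how a ring buffer can make a maximum value decay over time. It illustrates the general technique under simplifying assumptions and is not the ADS implementation; `DecayingMax` is a hypothetical class, its `decay_seconds` and `buffer_length` parameters are named to mirror the two sub-properties, and time is passed in explicitly so the example is deterministic.

```python
from typing import List


class DecayingMax:
    """Maximum over recent samples, kept in a ring buffer.

    Every sample updates all slots. Each time decay_seconds pass, the
    slot under the pointer is reset and the pointer advances, so an old
    maximum disappears after buffer_length rotations.
    """

    def __init__(self, decay_seconds: float, buffer_length: int) -> None:
        self.decay_seconds = decay_seconds
        self.slots: List[float] = [0.0] * buffer_length
        self.pointer = 0
        self.last_rotation = 0.0

    def _rotate(self, now: float) -> None:
        # Catch up on every rotation that has become due.
        while now - self.last_rotation >= self.decay_seconds:
            self.slots[self.pointer] = 0.0
            self.pointer = (self.pointer + 1) % len(self.slots)
            self.last_rotation += self.decay_seconds

    def record(self, value: float, now: float) -> None:
        self._rotate(now)
        self.slots = [max(slot, value) for slot in self.slots]

    def current(self, now: float) -> float:
        self._rotate(now)
        return self.slots[self.pointer]


tracker = DecayingMax(decay_seconds=120, buffer_length=3)
tracker.record(0.014, now=10)
for t in (130, 250, 370):
    print(t, tracker.current(now=t))
```

In this sketch the recorded maximum stays visible for two rotations and is gone after the third. With a buffer of length 1, the single slot is cleared on every rotation, so everything recorded before the rotation is lost at once.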
The steps below describe how to configure your metrics feature for shared functionality:
- Add the metricsBackends property in your deployment.yaml file.
- Add the sub-property parameters.
- Add and set the decay and bufferLength sub-properties with suitable values.
To configure the metrics feature for shared functionality in ADS, you need to set the following section in your deployment.yaml file:
metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
Shared configuration properties for metrics
The nested sub-properties of the parameters sub-property, which configures the metrics feature for shared functionality, are listed in the table below:
Properties | Description |
---|---|
decay | Defines the frequency at which the pointer in the ring buffer advances to the next element. The duration must be expressed as a positive integer followed by a time unit. The default is 2 minutes and the minimum allowed is 1 second. NOTE: To prevent undersampling, Axiomatics strongly recommends a decay value that is at least twice the highest step value used across all enabled metrics backends in your configuration. This applies whether you use a single backend or multiple backends. Axiomatics also recommends determining the value for the step sub-property first and then adjusting the decay value accordingly. |
bufferLength | Defines the size of the ring buffer. The default is 3. Using a value as low as 1 is not recommended. |
Shared properties for metrics
While including these sub-properties is optional, leaving them empty or using null values will render the entry invalid and prevent system initialization. If a sub-property is not included in the configuration, the default value will be applied.
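The rule described above (an absent sub-property falls back to its default, while one that is present but empty or null is rejected) can be sketched as follows. This only mimics the documented behavior and is not the validation code used by ADS; `resolve_parameters` is a hypothetical helper and the configuration is represented as a plain dictionary.

```python
from typing import Any, Dict, Optional

# Defaults as stated in the table above.
DEFAULTS: Dict[str, Any] = {"decay": "2 minutes", "bufferLength": 3}


def resolve_parameters(parameters: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply defaults for absent keys; reject keys that are empty or null."""
    given = parameters or {}
    resolved = dict(DEFAULTS)
    for key in DEFAULTS:
        if key not in given:
            continue  # not included: the default value applies
        value = given[key]
        if value is None or value == "":
            raise ValueError(
                f"'{key}' is present but empty; omit it to use the default")
        resolved[key] = value
    return resolved


print(resolve_parameters({"decay": "4 minutes"}))
```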
Optional metrics configuration for backend services
To optimize the tracking and publishing of metrics, you can configure ADS to integrate with various metrics backend services listed in Pull or push.
These metrics backends are configured by adding their respective sub-properties under the metricsBackends section described in the previous section and setting their nested sub-properties accordingly. Follow the instructions below to integrate your chosen metrics backend with ADS and to enhance your monitoring.
While including some of the nested sub-properties listed below is optional, leaving them empty or using null values will render the entry invalid and prevent system initialization. If a sub-property is not included in the configuration, the default value will be applied.
- ASM
- Prometheus
- Azure Monitor
- InfluxDB
ASM can be used as a metrics backend. This configuration is used to publish key metrics for the graph displays of the Dashboard feature of ASM.
ASM can be used as a metrics backend only when ADS is started with an authorization domain retrieved from ASM. In order to publish the metrics to ASM, you have to configure the authentication to ASM as described in the Authentication using an authorization server section. Furthermore, ASM must be running with the Dashboard functionality enabled, as described in the Installation section of the ASM documentation.
To set up ADS for use with ASM, you need to set the following section in your deployment.yaml file:
metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
  asm:
    enabled: true
    uri: https://localhost/metrics/push
The nested sub-properties of the asm sub-property, which configures ASM as a backend service for the metrics, are listed in the table below:
Properties | Required | Description |
---|---|---|
enabled | Required | Enables the collection of data for the ASM metrics backend when set to true. |
uri | Required | Specifies the URI for the ASM backend. |
ASM configuration sub-properties
To set up ADS for use with Prometheus, you need to add the prometheus sub-property in the deployment.yaml file and at least one nested sub-property as shown in the sample below:
metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
  prometheus:
    enabled: true
    descriptions: false
    histogramFlavor: VictoriaMetrics
    prefix: prom
    step: 1 minute
Prometheus configuration
Setting only the enabled sub-property under prometheus results in a minimal Prometheus configuration.
The nested sub-properties of the prometheus sub-property, which configures Prometheus as a backend service for the metrics, are listed in the table below:
Properties | Required | Description |
---|---|---|
enabled | Required | Enables the collection of data for the Prometheus metrics backend via an administration endpoint when set to true. The endpoint is GET /admin/metrics/prometheus under the administration endpoint. |
descriptions | Optional | Enables sending meter descriptions to Prometheus. Setting it to false minimizes the amount of data sent on each scrape. Default value is true. |
histogramFlavor | Optional | Specifies the histogram type to use for the meters DistributionSummary and Timer. Default value is Prometheus. |
prefix | Optional | Specifies the string prefix that is used by the metrics library employed internally by ADS. Default value is prometheus. |
step | Optional | Defines how often data is sampled from gauges and percentiles. The duration must be expressed as a positive integer and a time unit. Default value is 1 minute. NOTE: Axiomatics strongly recommends that the step interval is the same as the pull (or scrape) interval set for Prometheus, and half the value of the decay sub-property mentioned above. Axiomatics also recommends determining the value for the step sub-property first and then adjusting the decay value accordingly. |
Prometheus configuration sub-properties
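The recommendation that decay is at least twice the highest step value can be checked with a few lines of code. The sketch below is a hypothetical helper, not part of ADS. The set of accepted unit spellings is an assumption made for this example, and the units that ADS actually accepts may differ.

```python
import re
from typing import Dict

# Assumed unit spellings; the units ADS accepts may differ.
UNIT_SECONDS = {"second": 1, "seconds": 1,
                "minute": 60, "minutes": 60,
                "hour": 3600, "hours": 3600}


def to_seconds(duration: str) -> int:
    """Convert 'positive integer + time unit' (e.g. '2 minutes') to seconds."""
    match = re.fullmatch(r"\s*([1-9]\d*)\s*([A-Za-z]+)\s*", duration)
    if match is None or match.group(2).lower() not in UNIT_SECONDS:
        raise ValueError(f"not a positive integer plus time unit: {duration!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]


def decay_is_sufficient(decay: str, steps: Dict[str, str]) -> bool:
    """True when decay is at least twice the largest step of any backend."""
    if not steps:
        return True  # no enabled backend defines a step
    largest_step = max(to_seconds(step) for step in steps.values())
    return to_seconds(decay) >= 2 * largest_step


print(decay_is_sufficient("2 minutes", {"prometheus": "1 minute"}))
print(decay_is_sufficient("2 minutes", {"prometheus": "1 minute",
                                        "influx": "90 seconds"}))
```

The second call returns False because the largest step (90 seconds) would call for a decay of at least 3 minutes.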
These instructions cover only the steps relevant to configuring ADS for use with Prometheus. For other questions regarding Prometheus, please refer to the Prometheus documentation.
Prometheus endpoint
This is the administration endpoint at which ADS exposes the current values of the metrics for Prometheus to pull (or scrape). The endpoint is only available when there is a valid Prometheus configuration enabled.
GET /admin/metrics/prometheus
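For a quick manual check, the endpoint can be read with any HTTP client. The sketch below uses Python's standard library. The base URL is a hypothetical placeholder for the address of your ADS administration endpoint, any authentication that endpoint may require is not shown, and the `opener` parameter exists only so the function can be exercised without a running server.

```python
import urllib.request
from typing import Callable

# Hypothetical host and port; replace with your ADS administration address.
ADMIN_BASE_URL = "http://localhost:8081"
METRICS_PATH = "/admin/metrics/prometheus"


def fetch_metrics(base_url: str = ADMIN_BASE_URL,
                  opener: Callable = urllib.request.urlopen,
                  timeout: float = 5.0) -> str:
    """GET the Prometheus endpoint and return the plain-text body."""
    url = base_url.rstrip("/") + METRICS_PATH
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (requires a running ADS with Prometheus metrics enabled):
# print(fetch_metrics())
```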
Configuring Azure Monitor as a metrics backend in ADS using the OpenTelemetry Java agent is no longer recommended. This option is deprecated and will be removed in a future release.
When Azure Monitor Application Insights is used, use the Application Insights Java agent for both tracing and metrics information. See Running ADS with the Application Insights Java agent.
Azure Monitor Application Insights is a feature of Azure Monitor that is used to monitor live applications. To set up ADS for use with Azure Monitor Application Insights, you need to provide a property for Azure Monitor and at least two sub-properties in the deployment configuration file.
If ADS is launched with the Application Insights Java agent, Axiomatics recommends disabling the azureMonitor configuration in the deployment configuration file. Conversely, do not use the Application Insights Java agent when the azureMonitor configuration is enabled.
To set up ADS for use with Azure Monitor, you need to add the azureMonitor sub-property in the deployment.yaml file as shown in the sample below:
metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
  azuremonitor:
    enabled: true
    instrumentationKey: <String that represents the Instrumentation Key>
    prefix: azuremonitor
    step: 1 minute
Azure Monitor Application Insights configuration
The nested sub-properties of the azuremonitor sub-property, which configures Azure Monitor as a backend service for the metrics, are listed in the table below:
Properties | Required | Description |
---|---|---|
enabled | Required | Enables the collection of metrics data for the Azure Monitor metrics backend (Application Insights) via the Instrumentation Key when set to true. |
instrumentationKey | Required | A string that identifies the Application Insights resource that should be associated with the metrics data sent by ADS. It is the key integration point between ADS and the Application Insights monitoring service. |
prefix | Optional | Specifies the string prefix that is used by the metrics library employed internally by ADS. Default value is azuremonitor. |
step | Optional | Defines how often data is sampled from gauges and percentiles. It also governs the push interval, that is, the reporting frequency. The duration must be expressed as a positive integer and a time unit. Default value is 1 minute. NOTE: Axiomatics strongly recommends that the step interval is half the value of the decay sub-property mentioned above. Axiomatics also recommends determining the value for the step sub-property first and then adjusting the decay value accordingly. |
Azure Monitor Application Insights configuration sub-properties
These instructions cover only the steps relevant to configuring ADS for use with Azure Monitor Application Insights. For other questions regarding Azure Monitor Application Insights, please refer to the Azure Monitor Application Insights documentation.
To set up ADS to use InfluxDB as a metrics backend, you need an InfluxDB account, because some of the nested sub-properties under the influx sub-property take values that are created with that account, as shown in the sample below:
metricsBackends:
  parameters:
    decay: 2 minutes
    bufferLength: 3
  influx:
    enabled: true
    org: my-organization
    prefix: influx
    step: 1 minute
    bucket: my-bucket
    token: <String that represents the authentication token>
    uri: http://localhost:8086
InfluxDB configuration
The values for the sub-properties bucket, org, and token are created during the setup of your InfluxDB account and can then be copied into the configuration file. The token value is automatically generated.
The nested sub-properties of the influx sub-property, which configures InfluxDB as a backend service for the metrics, are listed in the table below:
Properties | Required | Description |
---|---|---|
enabled | Required | Enables the InfluxDB metrics backend when set to true. InfluxDB is a time series platform that collects, stores, processes, and visualizes metrics and events. |
prefix | Optional | Specifies the string prefix that is used by the metrics library employed internally by ADS. Default value is influx. |
org | Required | A string that specifies the destination organization for writes. Takes either the ID or Name interchangeably. This needs to be an org that exists in InfluxDB. |
bucket | Required | A string that specifies the destination bucket for writes. Takes either the ID or Name interchangeably. This needs to be a bucket that exists in InfluxDB. |
token | Required | A string that represents the Authentication token for the InfluxDB API, to authorize API requests. This token is automatically generated in InfluxDB. |
step | Optional | Defines how often data is sampled from gauges and percentiles. It also governs the push interval, that is, the reporting frequency. The duration must be expressed as a positive integer and a time unit. The default value is 1 minute. NOTE: It is strongly recommended to set the step interval to half the value of the decay sub-property mentioned above. It is also recommended to determine the value for the step sub-property first and then adjust the decay value accordingly. |
uri | Optional | A string that specifies the URI for the Influx backend. This sub-property can be set to allow writes only to a specific instance of InfluxDB. Default value is http://localhost:8086. |
InfluxDB configuration sub-properties
ADS pushes metrics data to InfluxDB according to the step setting. For example, if step is set to 1 minute and ADS serves 1 request per second, the data for all 60 requests is written to InfluxDB once per minute.
These instructions cover only the steps relevant to configuring ADS for use with InfluxDB. For other questions regarding InfluxDB, please refer to the InfluxDB documentation.
Enable TLS encryption
InfluxData strongly recommends enabling TLS, especially if you plan on sending requests to InfluxDB over a network.
Proper TLS certificates must be provided to InfluxDB, and to the JVM used to run ADS.
Refer to Enable TLS encryption for information on how to enable and configure TLS encryption.