Config Sync metrics

This page describes the OpenTelemetry metrics that you can use to monitor your Config Sync resources.

How Config Sync collects metrics

Config Sync uses OpenCensus to create and record metrics and OpenTelemetry to export its metrics to Prometheus and Cloud Monitoring. You can also export OpenTelemetry metrics to a custom monitoring system. The following guides explain how to export metrics:

By default, Config Sync creates a ConfigMap named otel-collector to configure the OpenTelemetry Collector. The otel-collector Deployment runs in the config-management-monitoring namespace.

The otel-collector ConfigMap configures the prometheus exporter, which exposes a metrics endpoint for Prometheus to scrape.
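
As a rough illustration of what a Prometheus-style metrics endpoint returns, the following sketch parses a few lines in the Prometheus text exposition format. The sample payload and its values are made up for illustration; the exact metric names and labels that your collector exposes depend on its configuration. The parser is deliberately simplified and ignores optional timestamps.

```python
import re

# Hypothetical scrape output in the Prometheus text exposition format.
SAMPLE = """\
# HELP declared_resources The number of declared resources parsed from Git.
# TYPE declared_resources gauge
declared_resources 12
apply_operations_total{operation="update",status="success",controller="applier"} 40
apply_operations_total{operation="update",status="error",controller="applier"} 2
"""

# metric_name, optional {labels}, then a single numeric value.
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_exposition(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```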

When you run Config Sync on GKE, or in another Kubernetes environment that's configured with Google Cloud credentials, Config Sync creates a ConfigMap named otel-collector-google-cloud. The otel-collector-google-cloud ConfigMap overrides the configuration in the otel-collector ConfigMap. Config Sync reverts any changes to the otel-collector or otel-collector-google-cloud ConfigMaps.
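
The override relationship can be pictured as a simple precedence lookup. The sketch below is only a mental model of the behavior described above, not how Config Sync implements it; the function name `effective_configmap` is hypothetical.

```python
# Precedence order, highest first, as described in the text:
# otel-collector-google-cloud overrides otel-collector.
PRECEDENCE = ["otel-collector-google-cloud", "otel-collector"]


def effective_configmap(existing):
    """Return the name of the ConfigMap whose configuration takes effect.

    `existing` is the set of ConfigMap names present in the cluster.
    Returns None if neither ConfigMap exists.
    """
    for name in PRECEDENCE:
        if name in existing:
            return name
    return None


print(effective_configmap({"otel-collector", "otel-collector-google-cloud"}))
print(effective_configmap({"otel-collector"}))
```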

The otel-collector-google-cloud ConfigMap also adds the cloudmonitoring exporter, which exports to Cloud Monitoring, and the kubernetes exporter, which exports to Google's internal metric service. The kubernetes exporter sends select, anonymized metrics to Google to help improve Config Sync.

Cloud Monitoring stores the metrics that you send to it in your Google Cloud project. The cloudmonitoring and kubernetes exporters use the same Google Cloud service account, which needs IAM permission to write to Cloud Monitoring. To configure these permissions, see Grant metric-writing permission for Cloud Monitoring.

OpenTelemetry metrics

Config Sync and the Resource Group Controller collect the following metrics with OpenCensus and make them available through the OpenTelemetry Collector. The Tags column lists the Config Sync-specific tags that apply to each metric. Metrics with tags represent multiple measurements, one for each combination of tag values.
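
To make "one measurement for each combination of tag values" concrete, the following sketch enumerates the combinations for a metric with two tags. The tag values used here are an illustrative subset, not a complete list.

```python
from itertools import product

# Illustrative subset of tag values for a metric tagged with operation and status.
TAG_VALUES = {
    "operation": ["create", "update", "delete"],
    "status": ["success", "error"],
}


def tag_combinations(tags):
    """Return one dict per combination of tag values."""
    keys = list(tags)
    return [dict(zip(keys, combo)) for combo in product(*(tags[k] for k in keys))]


combos = tag_combinations(TAG_VALUES)
print(len(combos))  # 3 operations x 2 statuses = 6 separate measurements
for combo in combos:
    print(combo)
```

Each added tag multiplies the number of measurements by the number of values that tag can take.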

Config Sync metrics

| Name | Type | Tags | Description |
| --- | --- | --- | --- |
| api_duration_seconds | Distribution | operation, status | The latency distribution of API server calls. |
| apply_duration_seconds | Distribution | status | The latency distribution of applying resources declared from source of truth to a cluster. |
| apply_operations_total | Count | operation, status, controller | The total number of operations that have been performed to sync resources from source of truth to a cluster. |
| declared_resources | Last Value | | The number of declared resources parsed from Git. |
| internal_errors_total | Count | source | The total number of internal errors encountered by Config Sync. Metric might not appear in query results if no internal error has happened. |
| last_sync_timestamp | Last Value | status | The timestamp of the most recent sync from Git. |
| parser_duration_seconds | Distribution | status, trigger, source | The latency distribution of different stages involved in syncing from source of truth to a cluster. |
| pipeline_error_observed | Last Value | name, reconciler, component | The status of RootSync and RepoSync custom resources. A value of 1 indicates a failure. |
| reconcile_duration_seconds | Distribution | status | The latency distribution of reconcile events handled by the reconciler manager. |
| reconciler_errors | Last Value | component, errorclass | The number of errors encountered while syncing resources from the source of truth to a cluster. |
| remediate_duration_seconds | Distribution | status | The latency distribution of remediator reconciliation events. |
| resource_conflicts_total | Count | | The total number of resource conflicts resulting from a mismatch between the cached resources and cluster resources. Metric might not appear in query results if no resource conflict has happened. |
| resource_fights_total | Count | | The total number of resources that are being synced too frequently. Any result higher than zero indicates a problem. For more information, see KNV2005: ResourceFightWarning. Metric might not appear in query results if no resource fight has happened. |
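
Because some counters might not appear in query results until the first event occurs, a consumer of these metrics has to decide how to read a missing series. The sketch below treats an absent counter as 0, which is an assumption about interpretation (a series can also be missing because it isn't exported). The `latest` dictionary stands in for a hypothetical query result.

```python
# Counters that, per the table, might be absent until the first event occurs.
MAY_BE_ABSENT = (
    "internal_errors_total",
    "resource_conflicts_total",
    "resource_fights_total",
)


def counter_value(latest, name):
    """Return the latest value of a counter, treating an absent series as 0."""
    return latest.get(name, 0)


def has_resource_fight(latest):
    """Any resource_fights_total value above zero indicates a problem."""
    return counter_value(latest, "resource_fights_total") > 0


# Hypothetical query result: only one of the three counters has been reported.
latest = {"internal_errors_total": 3}

summary = {name: counter_value(latest, name) for name in MAY_BE_ABSENT}
print(summary)
print("resource fight detected:", has_resource_fight(latest))
```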

Resource Group Controller metrics

The Resource Group Controller is a component in Config Sync that keeps track of the managed resources and checks if each individual resource is ready or reconciled. The following metrics are available.

| Name | Type | Tags | Description |
| --- | --- | --- | --- |
| rg_reconcile_duration_seconds | Distribution | stallreason | The distribution of time taken to reconcile a ResourceGroup CR |
| resource_group_total | Last Value | | The current number of ResourceGroup CRs |
| resource_count | Last Value | resourcegroup | The total number of resources tracked by a ResourceGroup |
| ready_resource_count | Last Value | resourcegroup | The total number of ready resources in a ResourceGroup |
| resource_ns_count | Last Value | resourcegroup | The number of namespaces used by resources in a ResourceGroup |
| cluster_scoped_resource_count | Last Value | resourcegroup | The number of cluster scoped resources in a ResourceGroup |
| crd_count | Last Value | resourcegroup | The number of CRDs in a ResourceGroup |
| kcc_resource_count | Last Value | resourcegroup | The total number of KCC resources in a ResourceGroup |
| pipeline_error_observed | Last Value | name, reconciler, component | The status of RootSync and RepoSync custom resources. A value of 1 indicates a failure. |
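
One way to combine these metrics is to compare ready_resource_count with resource_count for each ResourceGroup. The following sketch computes the fraction of ready resources per resourcegroup tag value. The input dictionaries and the ResourceGroup names are hypothetical stand-ins for query results.

```python
def readiness_by_group(resource_count, ready_resource_count):
    """Return resourcegroup -> fraction of tracked resources that are ready.

    Both arguments map a resourcegroup tag value to the metric's latest value.
    """
    ratios = {}
    for group, total in resource_count.items():
        if total == 0:
            continue  # skip empty ResourceGroups to avoid dividing by zero
        ratios[group] = ready_resource_count.get(group, 0) / total
    return ratios


# Hypothetical latest values, keyed by the resourcegroup tag.
resource_count = {"root-sync": 10, "repo-sync": 4, "empty-sync": 0}
ready_resource_count = {"root-sync": 10, "repo-sync": 3}

ratios = readiness_by_group(resource_count, ready_resource_count)
print(ratios)
```

A ratio below 1.0 suggests that some resources in that ResourceGroup have not yet become ready.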

Config Sync metric labels

Metric labels can be used to aggregate metric data in Cloud Monitoring and Prometheus. They are selectable from the "Group By" drop-down list in the Monitoring Console.

For more information about Cloud Monitoring labels and Prometheus metric labels, see Components of the metric model and Prometheus data model.
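
Grouping by a label amounts to summing (or otherwise aggregating) all series that share the same value for that label. The sketch below shows the idea on hypothetical apply_operations_total samples; the values are invented for illustration.

```python
from collections import defaultdict


def group_by(samples, label):
    """Sum sample values per value of the given label.

    `samples` is a list of (labels_dict, value) pairs for a single metric.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical apply_operations_total samples.
samples = [
    ({"operation": "create", "status": "success", "controller": "applier"}, 5),
    ({"operation": "update", "status": "success", "controller": "applier"}, 40),
    ({"operation": "update", "status": "error", "controller": "remediator"}, 2),
]

by_status = group_by(samples, "status")
by_controller = group_by(samples, "controller")
print(by_status)
print(by_controller)
```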

Metric labels

The following labels are used by Config Sync and Resource Group Controller metrics, available when monitoring with Cloud Monitoring and Prometheus.

| Name | Values | Description |
| --- | --- | --- |
| operation | create, , update, delete | The type of operation performed |
| status | success, error | The execution status of an operation |
| reconciler | rootsync, reposync | The type of the Reconciler |
| source | parser, differ, remediator | The source of the internal error |
| trigger | retry, watchUpdate, managementConflict, resync, reimport | The trigger of a reconciliation event |
| name | The name of reconciler | The name of the Reconciler |
| component | parsing, source, sync, rendering, readiness | The name of component / current stage of reconciliation |
| container | reconciler, git-sync | The name of the container |
| resource | cpu, memory | The type of the resource |
| controller | applier, remediator | The name of the controller in a root or namespace reconciler |
| type | Any Kubernetes resource, for example ClusterRole, Namespace, NetworkPolicy, Role, and so on. | The kind of Kubernetes API |
| commit | ---- | The hash of the latest synced commit |

Resource labels

Config Sync metrics sent to Prometheus and Cloud Monitoring have the following metric labels set to identify the source Pod:

| Name | Description |
| --- | --- |
| k8s.node.name | The name of the Node hosting a Kubernetes Pod |
| k8s.pod.namespace | The namespace of the Pod |
| k8s.pod.uid | The UID of the Pod |
| k8s.pod.ip | The IP of the Pod |
| k8s.deployment.name | The name of the Deployment that owns the Pod |

Config Sync metrics sent to Prometheus and Cloud Monitoring from reconciler Pods also have the following metric labels set to identify the RootSync or RepoSync used to configure the reconciler:

| Name | Description |
| --- | --- |
| configsync.sync.kind | The kind of resource that configures this reconciler: RootSync or RepoSync |
| configsync.sync.name | The name of the RootSync or RepoSync that configures this reconciler |
| configsync.sync.namespace | The namespace of the RootSync or RepoSync that configures this reconciler |

Cloud Monitoring resource labels

Cloud Monitoring resource labels are used to index metrics in storage, so they have a negligible effect on cardinality. Metric labels, by contrast, increase cardinality, which is a significant performance concern. For more information, see Monitored Resource Types.

The k8s_container resource type sets the following resource labels to identify the source Container:

| Name | Description |
| --- | --- |
| container_name | The name of the Container |
| pod_name | The name of the Pod |
| namespace_name | The namespace of the Pod |
| location | The region or zone of the cluster hosting the node |
| cluster_name | The name of the cluster hosting the node |
| project | The ID of the project hosting the cluster |

Configure custom metric filtering

You can adjust the custom metrics that Config Sync exports to Prometheus, Cloud Monitoring, and Google's internal monitoring service. Adjust custom metrics to fine-tune the included metrics or configure different backends.

To modify custom metrics, create and then edit a ConfigMap named otel-collector-custom. Using this ConfigMap ensures that Config Sync doesn't revert any of the modifications that you make. If you modify the otel-collector or otel-collector-google-cloud ConfigMaps, Config Sync reverts any changes.

For examples of how to adjust this ConfigMap, see Custom Metric Filtering in the open source Config Sync documentation.
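
Conceptually, fine-tuning the included metrics means keeping only the metric names that match an include list. The sketch below illustrates that idea in Python with shell-style patterns. It is not the configuration syntax of the otel-collector-custom ConfigMap, and the pattern list is a hypothetical example.

```python
from fnmatch import fnmatchcase


def keep_metric(name, include_patterns):
    """Return True if the metric name matches at least one include pattern."""
    return any(fnmatchcase(name, pattern) for pattern in include_patterns)


# A few metric names from the tables on this page.
ALL_METRICS = [
    "api_duration_seconds",
    "apply_operations_total",
    "declared_resources",
    "pipeline_error_observed",
    "reconciler_errors",
]

# Hypothetical include list: one exact name and one wildcard pattern.
INCLUDE = ["pipeline_error_observed", "*_total"]

kept = [name for name in ALL_METRICS if keep_metric(name, INCLUDE)]
print(kept)
```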

Understand the pipeline_error_observed metric

The pipeline_error_observed metric helps you quickly identify RepoSync or RootSync CRs that are not in sync or that contain resources that are not reconciled to the desired state.

  • For a successful sync by a RootSync or RepoSync, the metrics with all components (rendering, source, sync, readiness) are observed with value 0.

    [Figure: the pipeline_error_observed metric with all components observed with value 0]

  • When the latest commit fails the automated rendering, the metric with the component rendering is observed with value 1.

  • When checking out the latest commit encounters an error, or the latest commit contains invalid configuration, the metric with the component source is observed with value 1.

  • When a resource fails to be applied to the cluster, the metric with the component sync is observed with value 1.

  • When a resource is applied, but fails to reach its desired state, the metric with the component readiness is observed with value 1. For example, a Deployment is applied to the cluster, but the corresponding Pods are not created successfully.
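
The rules above can be sketched as a small function that reports which components are failing for each reconciler. The sample data and reconciler names are hypothetical; the label names (name, reconciler, component) follow the metric's tags.

```python
def failing_components(samples):
    """Map (reconciler, name) to the list of components observed with value 1.

    `samples` is a list of (labels_dict, value) pairs for pipeline_error_observed.
    A reconciler with an empty list synced successfully in every reported stage.
    """
    result = {}
    for labels, value in samples:
        key = (labels["reconciler"], labels["name"])
        failing = result.setdefault(key, [])
        if value == 1:
            failing.append(labels["component"])
    return result


def sample(reconciler, name, component, value):
    """Build one hypothetical pipeline_error_observed sample."""
    return ({"reconciler": reconciler, "name": name, "component": component}, value)


samples = [
    sample("rootsync", "root-reconciler", "rendering", 0),
    sample("rootsync", "root-reconciler", "source", 0),
    sample("rootsync", "root-reconciler", "sync", 0),
    sample("rootsync", "root-reconciler", "readiness", 0),
    sample("reposync", "ns-reconciler-team-a", "rendering", 0),
    sample("reposync", "ns-reconciler-team-a", "source", 0),
    sample("reposync", "ns-reconciler-team-a", "sync", 1),
    sample("reposync", "ns-reconciler-team-a", "readiness", 0),
]

report = failing_components(samples)
for key, components in report.items():
    print(key, components or "in sync")
```

In this example, the hypothetical reposync reconciler reports a failure in the sync component, which corresponds to a resource that failed to be applied to the cluster.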

What's next