Settings for common alerting policies

To create an alerting policy, you must describe what is to be monitored, when the condition of the alerting policy is met, and how you want to be notified. This page contains settings that you can use to create alerting policies. Most sections in this page have the following elements:

  • Title: Lists the relevant product name and a brief description of the alerting policy.
  • Summary: A brief description of the alerting policy. For full information, see the product documentation.
  • Steps to create an alerting policy: Outline of the steps required to create an alerting policy. For detailed information on these steps, see Creating an alerting policy.
  • New condition: These fields specify what is being monitored and how the data is aggregated.

  • Condition alert trigger: These fields specify when the condition of an alerting policy is met. By changing the retest window, you can reduce how often the condition is met.
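
To illustrate how the retest window affects how often a condition is met, the following sketch models a simplified threshold condition: the condition counts as met only when every sample inside the retest window is above the threshold. The function name, the one-sample-per-minute spacing, and the sample values are assumptions made for illustration; this is a rough model, not a description of how Cloud Monitoring evaluates conditions internally.

```python
from typing import List


def count_condition_met(samples: List[float], threshold: float, retest_samples: int) -> int:
    """Count the evaluation points at which a simplified threshold condition is met.

    samples: one value per evaluation interval (assumed: one per minute).
    retest_samples: number of consecutive samples that must all exceed the
        threshold; 1 models a retest window of "most recent value".
    """
    if retest_samples < 1:
        raise ValueError("retest_samples must be at least 1")
    met = 0
    for end in range(retest_samples, len(samples) + 1):
        window = samples[end - retest_samples:end]
        if all(value > threshold for value in window):
            met += 1
    return met


# Hypothetical samples, one per minute, with a short spike and a longer one.
latencies = [40, 75, 42, 41, 80, 82, 85, 43]
print(count_condition_met(latencies, threshold=60, retest_samples=1))  # 4
print(count_condition_met(latencies, threshold=60, retest_samples=3))  # 1
```

With a longer retest window, the single-sample spike no longer meets the condition; only the sustained spike does.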

When you only want to configure a chart that displays quota data, you can use the settings in the New condition table. Alerting conditions use different notation than charting tools, which include Metrics Explorer and the charts that you configure on custom dashboards. The field names correspond as follows:
New condition dialog field name: Charts equivalent

  • Rolling window function: Optimally configured based on the selected metric and aggregation settings. To specify the alignment function, do the following:

      1. In the Aggregation element, expand the first menu and select Configure aligner. The Alignment function and Grouping elements are added.
      2. Expand the Alignment function element and make a selection.

  • Rolling window: Min Interval (to access, click Add query element)
  • Time series group by (in the Across time series section): Aggregation element's second menu
  • Time series aggregation (in the Across time series section): Aggregation element's first menu
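
If you express the same settings programmatically instead of in the console, the field names change again. The following sketch assumes that the New condition fields correspond to the fields of the Cloud Monitoring API `Aggregation` object (`alignmentPeriod`, `perSeriesAligner`, `crossSeriesReducer`, `groupByFields`). That correspondence isn't stated on this page, and `build_aggregation` is a hypothetical helper, so verify the mapping against the API reference before relying on it.

```python
from typing import Dict, List, Optional

# Console field name -> field of the API's Aggregation object (assumed mapping).
CONSOLE_TO_API_FIELD: Dict[str, str] = {
    "Rolling window": "alignmentPeriod",
    "Rolling window function": "perSeriesAligner",
    "Time series aggregation": "crossSeriesReducer",
    "Time series group by": "groupByFields",
}


def build_aggregation(
    rolling_window_seconds: int,
    aligner: str,
    reducer: Optional[str] = None,
    group_by: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build an Aggregation-style dict from console-style settings.

    aligner and reducer are passed as API enum names, for example
    "ALIGN_MEAN" and "REDUCE_SUM"; this helper does not translate
    console labels such as "mean" into enum names.
    """
    if rolling_window_seconds <= 0:
        raise ValueError("rolling_window_seconds must be positive")
    aggregation: Dict[str, object] = {
        "alignmentPeriod": f"{rolling_window_seconds}s",
        "perSeriesAligner": aligner,
    }
    if reducer is not None:
        aggregation["crossSeriesReducer"] = reducer
    if group_by:
        aggregation["groupByFields"] = list(group_by)
    return aggregation
```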

Billing

To be notified if your billable or forecasted charges exceed a budget, create an alert by using the Budgets and alerts page of the Google Cloud console:

  1. In the Google Cloud console, go to the Billing page:

    Go to Billing

    You can also find this page by using the search bar.

    If you have more than one Cloud Billing account, then do one of the following:

    • To manage Cloud Billing for the current project, select Go to linked billing account.
    • To locate a different Cloud Billing account, select Manage billing accounts and choose the account for which you'd like to set a budget.
  2. In the Billing navigation menu, select Budgets & alerts.
  3. Click Create budget.
  4. Complete the budget dialog. In this dialog, you select Google Cloud projects and products, and then you create a budget for that combination. By default, you are notified when you reach 50%, 90%, and 100% of the budget. For complete documentation, see Set budgets and budget alerts.
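
To see what the default notification thresholds mean for a specific budget, you can compute the amounts directly. The following sketch uses the default percentages from the previous step; the function names and the example amounts are hypothetical, and the comparison assumes that a threshold counts as reached when spend is equal to or greater than the threshold amount.

```python
from decimal import Decimal
from typing import Dict, List, Sequence

# Default thresholds described in the budget dialog step.
DEFAULT_THRESHOLD_PERCENTS = (50, 90, 100)


def notification_amounts(
    budget: Decimal, percents: Sequence[int] = DEFAULT_THRESHOLD_PERCENTS
) -> Dict[int, Decimal]:
    """Return the spend amount at which each percentage threshold is reached."""
    return {p: budget * Decimal(p) / Decimal(100) for p in percents}


def thresholds_reached(
    spend: Decimal, budget: Decimal, percents: Sequence[int] = DEFAULT_THRESHOLD_PERCENTS
) -> List[int]:
    """Return the percentage thresholds that the current spend has reached."""
    amounts = notification_amounts(budget, percents)
    return [p for p in percents if spend >= amounts[p]]


# Hypothetical budget of 1000 with 920 spent so far.
print(notification_amounts(Decimal("1000")))
print(thresholds_reached(Decimal("920"), Decimal("1000")))
```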

BigQuery execution time

To create an alerting policy that triggers when the 99th percentile of the execution time of a BigQuery query exceeds a user-defined limit, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select BigQuery Project. In the Metric categories menu, select Query. In the Metrics menu, select Query execution times.
  • Filter: (blank)
  • Time series group by (Across time series): priority
  • Time series aggregation (Across time series): 99th percentile
  • Rolling window: 5 m
  • Rolling window function: sum

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: You determine this value; however, a threshold of 60 seconds is recommended.
  • Retest window: most recent value
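
The settings above can also be sketched as a threshold condition in the JSON shape used by the Cloud Monitoring API. The field names follow the API's `AlertPolicy` condition representation as best understood here. The metric type `bigquery.googleapis.com/query/execution_times`, the resource type `bigquery_project`, and the `metric.label.priority` group-by field are assumptions that don't appear on this page, so confirm them in Metrics Explorer before you use them.

```python
import json
from typing import Dict


def bigquery_execution_time_condition(threshold_seconds: float = 60.0) -> Dict[str, object]:
    """Build a threshold condition for 99th percentile BigQuery execution time."""
    return {
        "displayName": "BigQuery 99th percentile execution time",
        "conditionThreshold": {
            # Assumed resource and metric types; not stated on this page.
            "filter": (
                'resource.type = "bigquery_project" AND '
                'metric.type = "bigquery.googleapis.com/query/execution_times"'
            ),
            "aggregations": [
                {
                    "alignmentPeriod": "300s",  # Rolling window: 5 m
                    "perSeriesAligner": "ALIGN_SUM",  # Rolling window function: sum
                    "crossSeriesReducer": "REDUCE_PERCENTILE_99",
                    "groupByFields": ["metric.label.priority"],  # assumed label path
                }
            ],
            "comparison": "COMPARISON_GT",  # Above threshold
            "thresholdValue": threshold_seconds,
            "duration": "0s",  # Retest window: most recent value
            "trigger": {"count": 1},  # Any time series violates
        },
    }


print(json.dumps(bigquery_execution_time_condition(), indent=2))
```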

BigQuery usage

To create an alerting policy that triggers when the ingested BigQuery metrics exceed a user-defined level, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select BigQuery Dataset. In the Metric categories menu, select Storage. Select a metric from the Metrics menu. Metrics specific to usage include Stored bytes, Uploaded bytes, and Uploaded bytes billed. For a full list of available metrics, see BigQuery metrics.
  • Filter:
    • project_id: Your Google Cloud project ID.
    • dataset_id: Your dataset ID.
  • Time series group by (Across time series): dataset_id
  • Time series aggregation (Across time series): sum
  • Rolling window: 1 m
  • Rolling window function: mean

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: You determine the acceptable value.
  • Retest window: 1 minute

Bigtable storage utilization

To create an alerting policy that triggers when the storage utilization for your Bigtable cluster is above a recommended threshold, such as 70%, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Cloud Bigtable Cluster. In the Metric categories menu, select Cluster. In the Metrics menu, select Storage utilization. (The metric.type is bigtable.googleapis.com/cluster/storage_utilization.)
  • Filter: cluster = YOUR_CLUSTER_ID

Configure alert trigger

  • Condition type: Threshold
  • Condition triggers if: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 70
  • Retest window: 10 minutes
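
If you write the filter as a text expression rather than selecting it in the console, the metric type and the label restriction are joined into one string. The following sketch assumes the Monitoring filter syntax (`metric.type = "..."` and `resource.labels.KEY = "..."`) and assumes that `cluster` is a label on the monitored resource. `build_filter` is a hypothetical helper, and `my-cluster` is a placeholder.

```python
from typing import Mapping


def build_filter(metric_type: str, resource_labels: Mapping[str, str]) -> str:
    """Join a metric type and resource-label restrictions into one filter string."""
    parts = [f'metric.type = "{metric_type}"']
    # Sort the labels so that the output is stable regardless of input order.
    for key, value in sorted(resource_labels.items()):
        if '"' in value:
            raise ValueError(f"label value for {key!r} must not contain a double quote")
        parts.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(parts)


print(
    build_filter(
        "bigtable.googleapis.com/cluster/storage_utilization",
        {"cluster": "my-cluster"},  # placeholder for YOUR_CLUSTER_ID
    )
)
```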

Compute Engine early boot validation

Early Boot Validation shows the pass/fail status of the early boot portion of the last boot sequence. Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader.

To create an alerting policy that triggers when the early boot sequence fails for any of your Compute Engine VM instances, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select VM Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Early boot validation.
  • Filter: status = failed
  • Time series group by (Across time series): status
  • Time series aggregation (Across time series): sum
  • Rolling window: Use default.
  • Rolling window function: Use default.

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 0
  • Retest window: 1 minute

Compute Engine late boot validation

Late Boot Validation shows the pass/fail status of the late boot portion of the last boot sequence. Late boot is the boot sequence from the bootloader until completion. This includes the loading of the operating system kernel.

To create an alerting policy that triggers when the late boot sequence fails for any of your Compute Engine VM instances, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select VM Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Late boot validation.
  • Filter: status = failed
  • Time series group by (Across time series): status
  • Time series aggregation (Across time series): sum
  • Rolling window: Use default.
  • Rolling window function: Use default.

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 0
  • Retest window: 1 minute
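
The early and late boot validation policies use the same pattern: keep only the time series whose status is failed, group them by status, sum each group, and compare the sum to a threshold of 0. The following sketch models that pattern locally on hypothetical data. The tuple layout and the instance names are invented for illustration and don't reflect how Cloud Monitoring represents time series.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failed_boot_groups(
    points: Iterable[Tuple[str, str, int]], threshold: int = 0
) -> Dict[str, int]:
    """Return the status groups whose summed value is above the threshold.

    points: (instance_name, status, value) tuples, one per time series.
    """
    sums: Dict[str, int] = defaultdict(int)
    for _instance, status, value in points:
        if status != "failed":  # Filter: status = failed
            continue
        sums[status] += value  # Group by status, then sum
    return {status: total for status, total in sums.items() if total > threshold}


# Hypothetical latest values for three VM instances.
latest = [("vm-a", "passed", 1), ("vm-b", "failed", 1), ("vm-c", "failed", 1)]
print(failed_boot_groups(latest))  # {'failed': 2}
```

Because the threshold is 0, a single failed boot is enough for the condition to be met.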

Logging monthly log bytes ingested

To create an alerting policy that triggers when the number of log bytes written to your log buckets exceeds your user-defined limit for Cloud Logging, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Global. In the Metric categories menu, select Logs-based metric. In the Metrics menu, select Monthly log bytes ingested.
  • Filter: None.
  • Time series aggregation (Across time series): sum
  • Rolling window: 60 m
  • Rolling window function: max

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: You determine the acceptable value.
  • Retest window: Minimum acceptable value is 30 minutes.
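
If you generate these settings from a script, you might want to check the retest window against the 30-minute minimum before you submit the policy. The following sketch is one way to do that. The accepted input format (`45m`, `1800s`, `1h`) and the function name are assumptions, and the returned string uses the seconds-based duration format (`"1800s"`) that JSON APIs commonly expect.

```python
import re

MIN_RETEST_SECONDS = 30 * 60  # minimum retest window stated for this policy
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def retest_window_to_duration(text: str) -> str:
    """Convert a value such as '45m' to '2700s', enforcing the 30-minute minimum."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds < MIN_RETEST_SECONDS:
        raise ValueError(
            f"retest window of {seconds}s is below the minimum of {MIN_RETEST_SECONDS}s"
        )
    return f"{seconds}s"


print(retest_window_to_duration("45m"))  # 2700s
```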

Recommendations prediction

To set up a Recommendations prediction alert, use the following settings in the alerting policy.

New condition

  • Resource and Metric: In the Resources menu, select Consumed API. In the Metric categories menu, select Api. In the Metrics menu, select Request count.
  • Filter:
    • service = recommendationengine.googleapis.com
    • method = google.cloud.recommendationengine.v1beta1.PredictionService.Predict
    • response_code != 200
  • Time series aggregation (Across time series): sum
  • Rolling window: 1 m
  • Rolling window function: sum

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 0
  • Retest window: 5 minutes

Recommendations user event recording reduction

To set up a Recommendations event recording reduction alert, use the following settings in the alerting policy.

New condition

  • Resource and Metric: In the Resources menu, select Consumed API. In the Metric categories menu, select Api. In the Metrics menu, select Request count.
  • Filter:
    • service = recommendationengine.googleapis.com
    • method = google.cloud.recommendationengine.v1beta1.UserEventService.CollectUserEvent
    • response_code != 200
  • Time series aggregation (Across time series): sum
  • Rolling window: 1 m
  • Rolling window function: sum

Configure alert trigger

  • Condition type: Metric absence
  • Alert trigger: Any time series violates
  • Trigger absence time: 10 minutes
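
Unlike the threshold conditions elsewhere on this page, a metric-absence condition has no threshold value; it is met when no data arrives for the absence time. The following sketch shows how such a condition might look in the Cloud Monitoring API's JSON shape, assuming the `conditionAbsent` field of an `AlertPolicy` condition. The function name is hypothetical, and the filter is passed in as text because its exact expression isn't given on this page.

```python
from typing import Dict


def absence_condition(filter_text: str, absence_minutes: int = 10) -> Dict[str, object]:
    """Build a metric-absence condition for the given filter expression."""
    if absence_minutes <= 0:
        raise ValueError("absence_minutes must be positive")
    return {
        "displayName": "User event recording reduction",
        "conditionAbsent": {
            "filter": filter_text,  # built from the Filter settings above
            "aggregations": [
                {
                    "alignmentPeriod": "60s",  # Rolling window: 1 m
                    "perSeriesAligner": "ALIGN_SUM",  # Rolling window function: sum
                    "crossSeriesReducer": "REDUCE_SUM",  # Time series aggregation: sum
                }
            ],
            "duration": f"{absence_minutes * 60}s",  # Trigger absence time
            "trigger": {"count": 1},  # Any time series violates
        },
    }
```

Note that there is no `comparison` or `thresholdValue` entry; the absence time alone determines when the condition is met.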

Spanner high priority CPU usage

To create an alerting policy that triggers when the high-priority CPU utilization of your Spanner instance is above a recommended threshold, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select CPU Utilization by priority. (The metric.type is spanner.googleapis.com/instance/cpu/utilization_by_priority.)
  • Filter:
    • instance_id = YOUR_INSTANCE_ID
    • priority = high
  • Time series group by (Across time series): location for multi-region instances; leave it blank for regional instances.
  • Time series aggregation (Across time series): sum
  • Rolling window: 10 m
  • Rolling window function: mean

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 45% for multi-region instances; 65% for regional instances.
  • Retest window: 10 minutes
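
Because the group-by field and the threshold both depend on whether the instance is multi-region or regional, it can help to derive them from a single flag. The following sketch does that with the values listed above. The function name and the dictionary keys are hypothetical, and the sketch keeps the threshold as a percentage without assuming how an API would expect it to be encoded.

```python
from typing import Dict


def spanner_high_priority_cpu_settings(multi_region: bool) -> Dict[str, object]:
    """Return the settings that differ between multi-region and regional instances."""
    return {
        # 45% for multi-region instances; 65% for regional instances.
        "threshold_percent": 45 if multi_region else 65,
        # Group by location only for multi-region instances.
        "group_by": ["location"] if multi_region else [],
        "rolling_window_minutes": 10,
        "retest_window_minutes": 10,
    }


print(spanner_high_priority_cpu_settings(multi_region=True))
print(spanner_high_priority_cpu_settings(multi_region=False))
```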

Spanner 24 hour rolling usage

To create an alerting policy that triggers when the 24-hour rolling average of the CPU utilization of your Spanner instance is above a recommended threshold, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Smoothed CPU utilization. (The metric.type is spanner.googleapis.com/instance/cpu/smoothed_utilization.)
  • Filter: instance_id = YOUR_INSTANCE_ID
  • Time series aggregation (Across time series): sum
  • Rolling window: 10 m
  • Rolling window function: mean

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 90%
  • Retest window: 10 minutes

Spanner storage

To create an alerting policy that triggers when the storage used by your Spanner instance is above a recommended threshold, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Storage used. (The metric.type is spanner.googleapis.com/instance/storage/utilization.)
  • Filter: instance_id = YOUR_INSTANCE_ID
  • Time series aggregation (Across time series): sum
  • Rolling window: 10 m
  • Rolling window function: max

Configure alert trigger

  • Condition type: Threshold
  • Condition triggers if: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: You don't need to set a specific threshold for the maximum storage per node. However, we recommend that you set up an alert for when you are approaching the maximum storage limit. To learn more, see Storage utilization metrics.
  • Retest window: 10 minutes

Trace over quota on API usage

To create an alerting policy that triggers when your monthly Cloud Trace spans ingested exceeds your quota, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Consumed API. In the Metric categories menu, select Api. In the Metrics menu, select Request count. (The metric.type is serviceruntime.googleapis.com/api/request_count.)
  • Filter:
    • service = cloudtrace.googleapis.com
    • response_code = 429
  • Time series aggregation (Across time series): sum
  • Rolling window: 1 m
  • Rolling window function: sum

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 0
  • Retest window: 1 minute
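
A condition on its own doesn't notify anyone; an alerting policy also describes how you want to be notified. The following sketch wraps a condition built from the settings above in a policy body, assuming the `displayName`, `combiner`, `conditions`, and `notificationChannels` fields of the Cloud Monitoring API's `AlertPolicy` resource. The function names are hypothetical, the filter is passed in as text because its exact expression isn't given on this page, and the channel name is a placeholder.

```python
import json
from typing import Dict, List


def trace_quota_condition(filter_text: str) -> Dict[str, object]:
    """Build a threshold condition that is met when any matching request is counted."""
    return {
        "displayName": "Cloud Trace requests rejected with 429",
        "conditionThreshold": {
            "filter": filter_text,
            "aggregations": [
                {
                    "alignmentPeriod": "60s",  # Rolling window: 1 m
                    "perSeriesAligner": "ALIGN_SUM",
                    "crossSeriesReducer": "REDUCE_SUM",
                }
            ],
            "comparison": "COMPARISON_GT",  # Above threshold
            "thresholdValue": 0,
            "duration": "60s",  # Retest window: 1 minute
            "trigger": {"count": 1},  # Any time series violates
        },
    }


def build_policy(
    display_name: str,
    conditions: List[Dict[str, object]],
    notification_channels: List[str],
) -> str:
    """Serialize a policy body that combines conditions with notification channels."""
    policy = {
        "displayName": display_name,
        "combiner": "OR",  # the policy is met when any one condition is met
        "conditions": conditions,
        "notificationChannels": notification_channels,
    }
    return json.dumps(policy, indent=2)


body = build_policy(
    "Trace over quota",
    [trace_quota_condition('metric.type = "serviceruntime.googleapis.com/api/request_count"')],
    ["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],  # placeholder
)
print(body)
```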

Trace monitor monthly span-usage

To create an alerting policy that triggers when your monthly Cloud Trace spans ingested exceeds a user-defined limit, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Global. In the Metric categories menu, select Billing. In the Metrics menu, select Monthly trace spans ingested.
  • Filter: (blank)
  • Time series aggregation (Across time series): sum
  • Rolling window: 60 m
  • Rolling window function: max

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: You determine the acceptable value.
  • Retest window: Minimum acceptable value is 30 minutes.

Trace export errors

To create an alerting policy that triggers if there are errors exporting Cloud Trace data to BigQuery, use the following settings.

New condition

  • Resource and Metric: In the Resources menu, select Cloud Trace. In the Metric categories menu, select Bigquery_export. In the Metrics menu, select Spans Exported to BigQuery.
  • Filter: status != ok
  • Time series group by (Across time series): status
  • Time series aggregation (Across time series): sum
  • Rolling window: 1 m
  • Rolling window function: rate

Configure alert trigger

  • Condition type: Threshold
  • Alert trigger: Any time series violates
  • Threshold position: Above threshold
  • Threshold value: 0
  • Retest window: 1 minute

Uptime check monitoring

To create an alerting policy for an uptime check, or to create a chart that displays the success or latency status of an uptime check, see Alerting on uptime checks.