Assignees
Labels
api: spanner (Issues related to the Spanner API.)
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@lesv
com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline expired before operation could complete.
	at com.example.spanner.SpannerSampleIT.runSample(SpannerSampleIT.java:60)
	at com.example.spanner.SpannerSampleIT.testSample(SpannerSampleIT.java:268)
Caused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline expired before operation could complete.
lesv added the type: bug, priority: p1, and api: spanner labels Apr 5, 2020
lesv assigned dmitry-s and skuruppu and unassigned nnegrey Apr 5, 2020
@lesv

Found in PR #2563

@skuruppu

@olavloite, would you please be able to take a look? Thanks very much.

@olavloite

I've seen similar flakes in other test cases as well, and the team developing the C++ client for Spanner is running into a similar issue. The problem is that the CreateBackup and CreateDatabase operations are non-idempotent, so they are not retried automatically. They can therefore sometimes fail with DEADLINE_EXCEEDED and/or UNAVAILABLE in case of temporary network problems or other transient issues.

I'll have a look and see if we can try to fix this in the client library by using approximately the following logic:

  1. In case of a DEADLINE_EXCEEDED or UNAVAILABLE error for these Create... calls, the client library will query the backend for the corresponding long-running operation for the call.
  2. If the long-running operation is found, the call will return an OperationFuture for the operation.
  3. If the long-running operation was not found, the Create... call will be retried.
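
A minimal sketch of the three steps above, not the actual client library change. Every name here (`LroRetrySketch`, `TransientException`, `createWithRetry`, `findExistingOperation`) is hypothetical, the two suppliers stand in for the real Create... RPC and the backend lookup, and the attempt limit is an assumption that the steps above do not mention.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class LroRetrySketch {

  /** Hypothetical stand-in for a DEADLINE_EXCEEDED or UNAVAILABLE failure. */
  static class TransientException extends RuntimeException {
    TransientException(String message) {
      super(message);
    }
  }

  /**
   * Executes a non-idempotent Create... call. On a transient error it first
   * asks the backend whether an operation was already started, and only
   * retries the call when no such operation is found.
   */
  static <T> T createWithRetry(
      Supplier<T> createCall,
      Supplier<Optional<T>> findExistingOperation,
      int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    TransientException lastError = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // Execute the Create... call and return its operation on success.
        return createCall.get();
      } catch (TransientException e) {
        lastError = e;
        // Step 1: query the backend for an operation the failed call may have started.
        Optional<T> existing = findExistingOperation.get();
        if (existing.isPresent()) {
          // Step 2: resume the existing operation instead of creating a duplicate.
          return existing.get();
        }
        // Step 3: nothing was started, so the loop retries the call.
      }
    }
    throw lastError;
  }

  public static void main(String[] args) {
    // The call "times out", but the backend already started the operation.
    String operation =
        createWithRetry(
            () -> {
              throw new TransientException("DEADLINE_EXCEEDED");
            },
            () -> Optional.of("operations/op-1"),
            3);
    System.out.println(operation); // prints "operations/op-1"
  }
}
```

Checking for an existing operation before retrying is what keeps the retry safe here: a blind retry of a Create... call that actually succeeded on the backend would attempt to create the same resource a second time.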

olavloite added a commit to googleapis/java-spanner that referenced this issue Apr 7, 2020
RPCs returning a long-running operation, such as CreateDatabase,
CreateBackup and RestoreDatabase, are non-idempotent and cannot be
retried automatically by gax. This means that these RPCs sometimes fail
with transient errors, such as UNAVAILABLE or DEADLINE_EXCEEDED. This
change introduces automatic retries of these RPCs using the following
logic:
1. Execute the RPC and wait for the operation to be returned.
2. If a transient error occurs while waiting for the operation, the
   client library queries the backend for the corresponding operation.
   If the operation is found, the client library resumes tracking the
   existing operation and returns it to the user.
3. If no corresponding operation is found in step 2, the client library
   retries the RPC from step 1.

Fixes GoogleCloudPlatform/java-docs-samples#2571
olavloite added a commit to googleapis/java-spanner that referenced this issue Apr 10, 2020
olavloite added a commit to googleapis/java-spanner that referenced this issue Apr 13, 2020