Description
There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.
Following are the details.
Environment details
- Spanner
3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned - OS type and version: MacOs version 10.15.3
- Java version: java 8
- spanner version(s): the latest version
Steps to reproduce
- Write 2 billion records with timestamp in to the spanner table with dataflow
- Run a clean job to delete the old 1 billion records with Partitioned DML by java client
- Set the timeout to be 24 hours by using setPartitionedDmlTimeout(Duration.ofHours(24L))
- Run the java code
Code example
SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build();
Spanner spanner = options.getService();
try {
DatabaseId db = DatabaseId.of(options.getProjectId(), args[0], args[1]);
// [END init_client]
//This will return the default project id based on the environment.
String clientProject = spanner.getOptions().getProjectId();
if (!db.getInstanceId().getProject().equals(clientProject)) {
System.err.println(
"Invalid project specified. Project in the database id should match the"
+ "project name set in the environment variable GOOGLE_CLOUD_PROJECT. Expected: "
+ clientProject);
}
// [START init_client]
DatabaseClient dbClient = spanner.getDatabaseClient(db);
SimpleDateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");
String sql="DELETE FROM MyTable WHERE updateTimestamp<'2020-05-02T03:00:00Z'";
Date currentDate = new Date();
System.out.println(String.format("Delete Start:%s",formatter.format(currentDate)));
long lStartTime = Instant.now().toEpochMilli();
deleteUsingPartitionedDml(dbClient,sql);
long lEndTime = Instant.now().toEpochMilli();
long elaspedTime = lEndTime - lStartTime;
System.out.println("Elapsed time in : " + elaspedTime/1000);
} finally {
spanner.close();
}
// [END init_client]
System.out.println("Closed client");
}
private static void deleteUsingPartitionedDml(DatabaseClient dbClient,String sql) {
long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
System.out.printf("%d records deleted.\n", rowCount);
}