Closed
@FengnaLiu

Description

There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.

Following are the details.

Environment details

  1. Spanner
    3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned
  2. OS type and version: MacOs version 10.15.3
  3. Java version: java 8
  4. spanner version(s): the latest version

Steps to reproduce

  1. Write 2 billion records with timestamp in to the spanner table with dataflow
  2. Run a clean job to delete the old 1 billion records with Partitioned DML by java client
  3. Set the timeout to be 24 hours by using setPartitionedDmlTimeout(Duration.ofHours(24L))
  4. Run the java code

Code example

SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build();

        Spanner spanner = options.getService();
        try {

            DatabaseId db = DatabaseId.of(options.getProjectId(), args[0], args[1]);
            // [END init_client]
            //This will return the default project id based on the environment.
            String clientProject = spanner.getOptions().getProjectId();
            if (!db.getInstanceId().getProject().equals(clientProject)) {
                System.err.println(
                        "Invalid project specified. Project in the database id should match the"
                                + "project name set in the environment variable GOOGLE_CLOUD_PROJECT. Expected: "
                                + clientProject);
            }

            // [START init_client]
            DatabaseClient dbClient = spanner.getDatabaseClient(db);

            SimpleDateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");
            String sql="DELETE FROM MyTable WHERE updateTimestamp<'2020-05-02T03:00:00Z'";
            Date currentDate = new Date();
            System.out.println(String.format("Delete Start:%s",formatter.format(currentDate)));

            long lStartTime = Instant.now().toEpochMilli();
            deleteUsingPartitionedDml(dbClient,sql);
            long lEndTime = Instant.now().toEpochMilli();
            long elaspedTime = lEndTime - lStartTime;
            System.out.println("Elapsed time in : " + elaspedTime/1000);

        } finally {
            spanner.close();
        }
        // [END init_client]
        System.out.println("Closed client");
    }
    private static void deleteUsingPartitionedDml(DatabaseClient dbClient,String sql) {
        long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));

        System.out.printf("%d records deleted.\n", rowCount);
    }