tf.distribute.experimental.CommunicationOptions
Options for cross-device communications like all-reduce.
tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=0,
    timeout_seconds=None,
    implementation=tf.distribute.experimental.CommunicationImplementation.AUTO
)
This can be passed to methods like tf.distribute.get_replica_context().all_reduce() to optimize collective operation performance. Note that these are only hints, which may or may not change the actual behavior. Some options only apply to certain strategies and are ignored by others.
One common optimization is to break gradients all-reduce into multiple packs so that weight updates can overlap with gradient all-reduce.
Examples:
options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024,
    timeout_seconds=120.0,
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
)
grads = tf.distribute.get_replica_context().all_reduce(
    'sum', grads, options=options)
optimizer.apply_gradients(zip(grads, vars),
                          experimental_aggregate_gradients=False)
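These options can also be attached to a strategy when it is constructed, so that collectives launched through that strategy default to the same hints. The following is a minimal sketch, assuming a multi-worker environment (e.g. TF_CONFIG) is already configured; it passes the options through the communication_options argument of tf.distribute.MultiWorkerMirroredStrategy.

import tensorflow as tf

# Sketch: attach communication hints to the strategy itself.
communication_options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024,
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
)
strategy = tf.distribute.MultiWorkerMirroredStrategy(
    communication_options=communication_options)

Reductions performed under this strategy then pick up these hints without passing options explicitly at each call site.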
Args | Description
---|---
bytes_per_pack | A non-negative integer. Breaks collective operations into packs of the given size. If zero, the value is determined automatically. This hint is respected by all multi-replica strategies except TPUStrategy.
timeout_seconds | A float or None; timeout in seconds. If not None, the collective raises tf.errors.DeadlineExceededError if it takes longer than this timeout. Zero disables the timeout. This can be useful when debugging hanging issues, but it should only be used for debugging, since it creates a new thread for each collective, i.e. an overhead of timeout_seconds * num_collectives_per_second more threads. This only works for tf.distribute.experimental.MultiWorkerMirroredStrategy. See the sketch after this table.
implementation | A tf.distribute.experimental.CommunicationImplementation, a hint on the preferred communication implementation. Possible values include AUTO, RING, and NCCL. NCCL is generally more performant on GPU but doesn't work for CPU. This only works for tf.distribute.experimental.MultiWorkerMirroredStrategy.
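The sketch below is an illustrative example (not part of the original reference) of using timeout_seconds while debugging a hang: if an all-reduce stalls for longer than the timeout, it raises tf.errors.DeadlineExceededError instead of blocking indefinitely. The strategy setup and the per-replica value are placeholders.

import tensorflow as tf

# Sketch: surface hanging collectives as errors while debugging.
# Assumes TF_CONFIG describes the multi-worker cluster.
debug_options = tf.distribute.experimental.CommunicationOptions(
    timeout_seconds=30.0)
strategy = tf.distribute.MultiWorkerMirroredStrategy()

@tf.function
def debug_sum(per_replica_value):
  def replica_fn(value):
    # Raises tf.errors.DeadlineExceededError if this collective takes
    # longer than 30 seconds, instead of hanging forever.
    return tf.distribute.get_replica_context().all_reduce(
        'sum', value, options=debug_options)
  return strategy.run(replica_fn, args=(per_replica_value,))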
Raises | Description
---|---
ValueError | When arguments have invalid values.