GPU machine types

To use GPUs on Google Cloud, you can either deploy an accelerator-optimized VM that has attached GPUs, or attach GPUs to an N1 general-purpose VM. The following GPU machine types are supported for running your artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) workloads on the AI Hypercomputer platform.
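Both deployment options use the gcloud CLI. The following sketch is illustrative only: the VM names, zone, and boot image are assumptions, not values from this page, and the command requires an authenticated gcloud CLI with a project that has GPU quota in a zone where the machine type is available.

```shell
# Option 1: accelerator-optimized machine type (GPUs are built in).
# Zone is an assumption; check regional availability for the machine type.
gcloud compute instances create my-a3-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud

# Option 2: N1 general-purpose machine type with GPUs attached explicitly.
gcloud compute instances create my-n1-gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

GPU VMs must set --maintenance-policy=TERMINATE because VMs with attached GPUs can't live-migrate during host maintenance.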

A4X series

This section outlines the available configurations for the A4X machine series. For more information about this machine series, see A4X accelerator-optimized machine series in the Compute Engine documentation.

A4X

These machine types use NVIDIA GB200 Superchips (nvidia-gb200) and are ideal for foundation model training and serving.

A4X is an exascale platform based on NVIDIA GB200 NVL72. Each machine has two sockets with NVIDIA Grace™ CPUs with Arm® Neoverse™ V2 cores. These CPUs are connected to four B200 GPUs with fast chip-to-chip (NVLink-C2C) communication.

Machine type | GPU count | GPU memory* (GB HBM3e) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a4x-highgpu-4g | 4 | 720 | 140 | 884 | 12,000 | 6 | 2,000

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A4 series

This section outlines the available configurations for the A4 machine series. For more information about this machine series, see A4 accelerator-optimized machine series in the Compute Engine documentation.

A4

These machine types have NVIDIA B200 GPUs (nvidia-b200) attached and are ideal for foundation model training and serving.

Machine type | GPU count | GPU memory* (GB HBM3e) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a4-highgpu-8g | 8 | 1,440 | 224 | 3,968 | 12,000 | 10 | 3,600

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A3 series

This section outlines the available configurations for the A3 machine series. For more information about this machine series, see A3 accelerator-optimized machine series in the Compute Engine documentation.

A3 Ultra

These machine types have NVIDIA H200 GPUs (nvidia-h200-141gb) attached and are ideal for foundation model training and serving.

Machine type | GPU count | GPU memory* (GB HBM3e) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a3-ultragpu-8g | 8 | 1,128 | 224 | 2,952 | 12,000 | 10 | 3,600

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A3 Mega

These machine types have NVIDIA H100 80GB GPUs (nvidia-h100-mega-80gb) attached and are ideal for large model training and multi-host inference.

Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a3-megagpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 9 | 1,800

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A3 High

These machine types have NVIDIA H100 80GB GPUs (nvidia-h100-80gb) attached and are well suited for both large model inference and model fine-tuning.

Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a3-highgpu-1g | 1 | 80 | 26 | 234 | 750 | 1 | 25
a3-highgpu-2g | 2 | 160 | 52 | 468 | 1,500 | 1 | 50
a3-highgpu-4g | 4 | 320 | 104 | 936 | 3,000 | 1 | 100
a3-highgpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | 1,000

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.
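Because the A3 High configurations scale linearly with GPU count, machine-type selection reduces to a table lookup. The following Python sketch is illustrative: the helper name and data structure are assumptions, and the values are taken directly from the A3 High table above.

```python
# Values from the A3 High table above (this dict is not an official API).
A3_HIGH = {
    "a3-highgpu-1g": {"gpus": 1, "vcpus": 26, "memory_gb": 234, "net_gbps": 25},
    "a3-highgpu-2g": {"gpus": 2, "vcpus": 52, "memory_gb": 468, "net_gbps": 50},
    "a3-highgpu-4g": {"gpus": 4, "vcpus": 104, "memory_gb": 936, "net_gbps": 100},
    "a3-highgpu-8g": {"gpus": 8, "vcpus": 208, "memory_gb": 1872, "net_gbps": 1000},
}

def smallest_a3_high(min_gpus: int) -> str:
    """Return the smallest A3 High machine type with at least min_gpus GPUs."""
    candidates = [(spec["gpus"], name)
                  for name, spec in A3_HIGH.items()
                  if spec["gpus"] >= min_gpus]
    if not candidates:
        raise ValueError(f"no A3 High machine type offers {min_gpus} GPUs")
    # Smallest qualifying GPU count wins.
    return min(candidates)[1]

print(smallest_a3_high(3))  # a3-highgpu-4g
```

For example, a workload that needs three GPUs lands on a3-highgpu-4g, since A3 High is only offered in 1, 2, 4, and 8 GPU shapes.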

A3 Edge

These machine types have NVIDIA H100 80GB GPUs (nvidia-h100-80gb) attached, are designed specifically for serving, and are available in a limited set of regions.

Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth‡ (Gbps)
a3-edgegpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | 800 (asia-south1 and northamerica-northeast2); 400 (all other A3 Edge regions)

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.
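As a quick sanity check across the tables above, the per-GPU memory implied by each table's total is consistent with the memory size embedded in the accelerator name where one is given (80 GB for nvidia-h100-80gb, 141 GB for nvidia-h200-141gb). This Python sketch just divides the table totals by the GPU counts; the dict is built from this page, not from any API.

```python
# (total GPU memory in GB, GPU count), copied from the tables above.
GPU_MEMORY_TOTALS = {
    "nvidia-h100-80gb": (640, 8),    # A3 Mega, A3 High, A3 Edge
    "nvidia-h200-141gb": (1128, 8),  # A3 Ultra
    "nvidia-b200": (1440, 8),        # A4
    "nvidia-gb200": (720, 4),        # A4X
}

# Derive per-GPU memory from each table's total.
per_gpu = {name: total // count
           for name, (total, count) in GPU_MEMORY_TOTALS.items()}

for name, gb in per_gpu.items():
    print(f"{name}: {gb} GB per GPU")
```

The division yields 80 GB and 141 GB for the H100 and H200 types, matching their names, and 180 GB per B200 GPU for both the A4 and A4X totals.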

What's next?

For more information about GPUs, see the following pages in the Compute Engine documentation: