2-4 设备管理

2.4 设备管理

在本节,你将通过以下两种方法学习查询和管理GPU设备:

  • CUDA运行时API函数
  • NVIDIA系统管理界面(nvidia-smi)命令行实用程序

2.4.1 使用运行时API查询GPU信息

  • cudaError_t cudaSetDevice(int dev) 设置当前GPU设备

当有多个GPU时,选定cuda设备,默认是0即第一个主GPU,多GPU时0,1,2以此类推。

  • cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp* prop,int dev) 返回设备信息

以*prop形式返回设备dev的属性。struct cudaDevicePro结构体内定义了很多记录GPU设备信息的数据成员,比如multiProcessorCount表示设备上多处理器的数量。详见NVIDIA官网

  • cudaError_t cudaGetDeviceCount( int* count )返回具体计算能力的设备数量

以 *count 形式返回可用于执行的计算能力大于等于 1.0 的设备数量。

代码清单2-8提供了一个示例,查询了大家通常感兴趣的一般属性

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
//chapter02/checkDeviceInfor.cu
#include "../common/common.h"
#include <cuda_runtime.h>
#include <stdio.h>

/*
* Display a variety of information on the first CUDA device in this system,
* including driver version, runtime version, compute capability, bytes of
* global memory, etc.
*/

int main(int argc, char **argv)
{
printf("%s Starting...\n", argv[0]);

int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);

if (deviceCount == 0)
{
printf("There are no available device(s) that support CUDA\n");
}
else
{
printf("Detected %d CUDA Capable device(s)\n", deviceCount);
}

int dev = 0, driverVersion = 0, runtimeVersion = 0;
CHECK(cudaSetDevice(dev));
cudaDeviceProp deviceProp;
CHECK(cudaGetDeviceProperties(&deviceProp, dev));
printf("Device %d: \"%s\"\n", dev, deviceProp.name);

cudaDriverGetVersion(&driverVersion);
cudaRuntimeGetVersion(&runtimeVersion);
printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
driverVersion / 1000, (driverVersion % 100) / 10,
runtimeVersion / 1000, (runtimeVersion % 100) / 10);
printf(" CUDA Capability Major/Minor version number: %d.%d\n",
deviceProp.major, deviceProp.minor);
printf(" Total amount of global memory: %.2f GBytes (%llu "
"bytes)\n", (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
(unsigned long long)deviceProp.totalGlobalMem);
printf(" GPU Clock rate: %.0f MHz (%0.2f "
"GHz)\n", deviceProp.clockRate * 1e-3f,
deviceProp.clockRate * 1e-6f);
printf(" Memory Clock rate: %.0f Mhz\n",
deviceProp.memoryClockRate * 1e-3f);
printf(" Memory Bus Width: %d-bit\n",
deviceProp.memoryBusWidth);

if (deviceProp.l2CacheSize)
{
printf(" L2 Cache Size: %d bytes\n",
deviceProp.l2CacheSize);
}

printf(" Max Texture Dimension Size (x,y,z) 1D=(%d), "
"2D=(%d,%d), 3D=(%d,%d,%d)\n", deviceProp.maxTexture1D,
deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],
deviceProp.maxTexture3D[2]);
printf(" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, "
"2D=(%d,%d) x %d\n", deviceProp.maxTexture1DLayered[0],
deviceProp.maxTexture1DLayered[1], deviceProp.maxTexture2DLayered[0],
deviceProp.maxTexture2DLayered[1],
deviceProp.maxTexture2DLayered[2]);
printf(" Total amount of constant memory: %lu bytes\n",
deviceProp.totalConstMem);
printf(" Total amount of shared memory per block: %lu bytes\n",
deviceProp.sharedMemPerBlock);
printf(" Total number of registers available per block: %d\n",
deviceProp.regsPerBlock);
printf(" Warp size: %d\n",
deviceProp.warpSize);
printf(" Maximum number of threads per multiprocessor: %d\n",
deviceProp.maxThreadsPerMultiProcessor);
printf(" Maximum number of threads per block: %d\n",
deviceProp.maxThreadsPerBlock);
printf(" Maximum sizes of each dimension of a block: %d x %d x %d\n",
deviceProp.maxThreadsDim[0],
deviceProp.maxThreadsDim[1],
deviceProp.maxThreadsDim[2]);
printf(" Maximum sizes of each dimension of a grid: %d x %d x %d\n",
deviceProp.maxGridSize[0],
deviceProp.maxGridSize[1],
deviceProp.maxGridSize[2]);
printf(" Maximum memory pitch: %lu bytes\n",
deviceProp.memPitch);

exit(EXIT_SUCCESS);
}

我的电脑运行如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
./checkDeviceInfor
./checkDeviceInfor Starting...
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P2000"
CUDA Driver Version / Runtime Version 11.7 / 11.4
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4.93 GBytes (5296029696 bytes)
GPU Clock rate: 1481 MHz (1.48 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 160-bit
L2 Cache Size: 1310720 bytes
Max Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
Max Layered Texture Size (dim) x layers 1D=(32768) x 2048, 2D=(32768,32768) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes

2.4.2 确定最优GPU

有多GPU设备时,我们可以根据multiProcessorCount设备的多处理器的数量,来选择最优的GPU。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int numDevices = 0 ;
cudaGetDeviceCount(&numDevices);
if (numDevices > 1) {
int maxMultiprocessors = 0,
maxDevice = 0 ;
for (int device=0; device<numDevices; device++){
cudaDeviceProp props;
cudaGetDeviceProperties(&props, device);
if (maxMultiprocessors < props.multiProcessorCount){
maxMultiprocessors = props.multiProcessorCount;
maxDevice = device;
}
}
cudasetDevice(maxDevice);
}

2.4.3 使用nvidia-smi查询GPU信息

nvidia-smi是一个命令行工具,用于管理和监控GPU设备,并允许查询和修改设备状态。

2.4.4 在运行时设置设备

支持多GPU的系统是很常见的。对于一个有N个GPU的系统,nvidia-smi从0到N―1标记设备ID。使用环境变量CUDA_VISIBLE_DEVICES,就可以在运行时指定所选的GPU且无须更改应用程序。

设置运行时环境变量CUDA_VISIBLE_DEVICES=2。nvidia驱动程序会屏蔽其他GPU,这时设备2作为设备0出现在应用程序中。

也可以使用CUDA_VISIBLE_DEVICES指定多个设备。例如,如果想测试GPU 2和GPU3,可以设置CUDA_VISIBLE_DEVICES=2,3。然后,在运行时,nvidia驱动程序将只使用ID为2和3的设备,并且会将设备ID分别映射为0和1。