NVIDIA GPU benchmark and stress-testing tools


I. References

Testing CUDA device performance with PyTorch (single GPU or multi-GPU parallel)

GPU: stress-testing a GPU with gpu-burn

II. GPU stress-testing methods

1. PyTorch method

Allocate a tensor on the GPU with torch.ones and time repeated in-place additions on it.

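    import time

    import torch


    def cuda_benchmark(device_id, N=1_000_000):
        # NOTE: the default value of N was lost in the source text;
        # 1_000_000 is an assumed placeholder, not the author's value.
        # Select the GPU to use
        torch.cuda.set_device(device_id)
        # Create the input data on the GPU
        data = torch.ones(N).cuda()
        # Launch the CUDA operations and record the elapsed time
        start_time = time.time()
        for i in range(10000):
            data += 1
        torch.cuda.synchronize()  # wait for the CUDA operations to finish
        end_time = time.time()
        # Copy the result from GPU memory to host memory
        result = data.cpu().numpy()
        # Print the benchmark result and the elapsed time
        print(f"Benchmark result: {result[:10]}")
        print(f"Elapsed time: {end_time - start_time} s")


    if __name__ == '__main__':
        # Test the first GPU
        device_id = 0
        cuda_benchmark(device_id)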
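The elapsed time alone is hard to compare across runs with different tensor sizes. As a rough sketch, it can be converted into an effective memory bandwidth, assuming each `data += 1` reads and writes every element once and that the tensor is float32 (the default dtype of torch.ones). The helper name `effective_bandwidth_gbs` and the example numbers are hypothetical and do not come from the original article.

```python
def effective_bandwidth_gbs(n_elements, iterations, elapsed_s, bytes_per_element=4):
    """Estimate effective memory bandwidth in GB/s for an in-place add loop.

    Assumes each iteration reads and writes every element exactly once,
    so the traffic per iteration is 2 * n_elements * bytes_per_element.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    bytes_moved = 2 * n_elements * bytes_per_element * iterations
    return bytes_moved / elapsed_s / 1e9


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 float32 elements, 10000 iterations, 0.5 s
    print(f"{effective_bandwidth_gbs(1_000_000, 10_000, 0.5):.1f} GB/s")
```

For small tensors the result is dominated by kernel-launch overhead rather than memory traffic, so the number is only meaningful for large N.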

2. CUDABenchmarkModel method

Define a small custom model, CUDABenchmarkModel, and wrap it in nn.DataParallel to test several CUDA devices at once.

    import time

    import torch
    import torch.nn as nn


    class CUDABenchmarkModel(nn.Module):
        def __init__(self):
            super(CUDABenchmarkModel, self).__init__()
            self.fc = nn.Linear(10, 10).cuda()

        def forward(self, x):
            return self.fc(x)


    def cuda_benchmark(device_ids, N=1_000_000):
        # NOTE: the default value of N was lost in the source text;
        # 1_000_000 is an assumed placeholder, not the author's value.
        # Create the model and replicate it across the listed GPUs
        model = CUDABenchmarkModel()
        model = nn.DataParallel(model, device_ids=device_ids)
        # Create the input data
        data = torch.ones(N, 10).cuda()
        # Launch the CUDA operations and record the elapsed time
        start_time = time.time()
        for i in range(10000):
            output = model(data)
        torch.cuda.synchronize()  # wait for the CUDA operations to finish
        end_time = time.time()
        # Print the elapsed time
        print(f"Elapsed time: {end_time - start_time} s")


    if __name__ == '__main__':
        # Test 3 GPUs at once; every index listed must exist on the machine
        device_ids = [0, 1, 3]
        cuda_benchmark(device_ids=device_ids)
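To see how much work each GPU receives, it helps to know how nn.DataParallel splits the batch. The sketch below mimics, in plain Python, the chunking rule that DataParallel's scatter step is understood to follow (the same rule as torch.chunk along dimension 0): each chunk holds ceil(N / k) rows and the last one takes the remainder. The function name `chunk_sizes` is hypothetical and the sketch does not call PyTorch.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Return the per-device batch sizes for a torch.chunk-style split."""
    if batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch_size and num_devices must be positive")
    per_chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)  # the last chunk may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    print(chunk_sizes(10, 3))
```

A batch that does not divide evenly leaves the last device with less work, and a very small batch can leave some devices idle, so the batch size should be a multiple of the device count when the goal is to load all GPUs equally.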

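3. NCCL method

Test CUDA devices with multiple processes, one per GPU, joined in an NCCL process group. The benchmark loop itself performs no collective communication; the process group only coordinates the workers.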

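    import time

    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp


    def cuda_benchmark(device_id, N=1_000_000):
        # NOTE: the default value of N was lost in the source text;
        # 1_000_000 is an assumed placeholder, not the author's value.
        # Select the GPU to use
        torch.cuda.set_device(device_id)
        # multi_processor_count is the number of streaming multiprocessors (SMs), not CUDA cores
        print(f"Number of SMs on this GPU: {torch.cuda.get_device_properties(device_id).multi_processor_count}")
        # Create the input data on the GPU
        data = torch.ones(N).cuda()
        # Launch the CUDA operations and record the elapsed time
        start_time = time.time()
        for i in range(10000):
            data += 1
        torch.cuda.synchronize()  # wait for the CUDA operations to finish
        end_time = time.time()
        # Copy the result from GPU memory to host memory
        result = data.cpu().numpy()
        # Print the benchmark result and the elapsed time
        print(f"Benchmark result: {result[:10]}")
        print(f"Elapsed time: {end_time - start_time} s")


    def main(num):
        # Spawn one worker process per GPU
        mp.spawn(run, args=(num,), nprocs=num)


    def run(rank, world_size):
        """Entry point for each process."""
        # Initialize the process group
        dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456", rank=rank, world_size=world_size)
        # Use the rank as the device ID
        device_id = rank
        # Run the benchmark on this process's GPU
        cuda_benchmark(device_id)
        # Tear down the process group before the worker exits
        dist.destroy_process_group()


    if __name__ == '__main__':
        # Start 3 processes (one process per GPU)
        device_numbers = 3
        main(device_numbers)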
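The hard-coded port 23456 in the init_method URL fails if another process already holds that port. One possible workaround, sketched here with only the standard library, is to ask the operating system for a free port and build the URL from it. The helper names `find_free_port` and `make_init_method` are hypothetical and are not part of PyTorch.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


def make_init_method(host: str = "127.0.0.1") -> str:
    """Build a tcp:// rendezvous URL of the form used by init_process_group."""
    return f"tcp://{host}:{find_free_port(host)}"


if __name__ == "__main__":
    print(make_init_method())
```

The URL has to be created once in the parent process and passed to every worker (for example through the args tuple of mp.spawn), because each call returns a different port. The port is released before the workers bind it, so a small race window remains.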

4. gpu-burn method

gpu-burn repository: https://github.com/wilicc/gpu-burn

4.1 Overall steps

    git clone https://github.com/wilicc/gpu-burn
    cd gpu-burn
    make

4.2 Building with make

    yoyo@yoyo:~/360Downloads/gpu-burn$ make COMPUTE=8.6
    g++ -O3 -Wno-unused-result -I/usr/local/cuda/include -std=c++11 -DIS_JETSON=false -c gpu_burn-drv.cpp
    PATH="/opt/ros/kinetic/bin:/home/yoyo/360Downloads/cmake-3.21.1-linux-x86_64/bin:/home/yoyo/miniconda3/condabin:/home/yoyo/bin:/home/yoyo/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin::." /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -arch=compute_86 -ptx compare.cu -o compare.ptx
    g++ -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -L/usr/local/cuda/lib -L/usr/local/cuda/lib/stubs -Wl,-rpath=/usr/local/cuda/lib64 -Wl,-rpath=/usr/local/cuda/lib -lcublas -lcudart

If the build succeeds, it produces the gpu_burn binary in the repository directory. Its help output lists the available options:

    yoyo@yoyo:~/360Downloads/gpu-burn$ ./gpu_burn -h
    GPU Burn
    Usage: gpu-burn [OPTIONS] [TIME]

    -m X      Use X MB of memory.
    -m N%     Use N% of the available GPU memory. Default is 90%
    -d        Use doubles
    -tc       Try to use Tensor cores
    -l        Lists all GPUs in the system
    -i N      Execute only on GPU N
    -c FILE   Use FILE as compare kernel. Default is compare.ptx
    -stts T   Set timeout threshold to T seconds for using SIGTERM to abort child processes before using SIGKILL. Default is 30
    -h        Show this help message

    Examples:
      gpu-burn -d 3600   # burns all GPUs with doubles for an hour
      gpu-burn -m 50%    # burns using 50% of the available GPU memory
      gpu-burn -l        # list GPUs
      gpu-burn -i 2      # burns only GPU of index 2

4.3 Testing the GPU

Single GPU:

    yoyo@yoyo:~/360Downloads/gpu-burn$ ./gpu_burn 120
    Using compare file: compare.ptx
    Burning for 120 seconds.
    GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-a460cb29-b0ea-f6a5-b261-590f0a23f79e)
    Initialized device 0 with 12050 MB of memory (11457 MB available, using 10311 MB of it), using FLOATS
    Results are  bytes each, thus performing 38 iterations
    10.8% proc'd: 76 (7234 Gflop/s) errors: 0 temps: 57 C
        Summary at: 2024年 09月 05日 星期四 22:53:05 CST
    25.0% proc'd: 190 (7106 Gflop/s) errors: 0 temps: 62 C
        Summary at: 2024年 09月 05日 星期四 22:53:22 CST
    37.5% proc'd: 266 (7019 Gflop/s) errors: 0 temps: 65 C
        Summary at: 2024年 09月 05日 星期四 22:53:37 CST
    50.0% proc'd: 380 (7000 Gflop/s) errors: 0 temps: 67 C
        Summary at: 2024年 09月 05日 星期四 22:53:52 CST
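For longer runs it can be convenient to pull the throughput, error count and temperature out of the log instead of reading it by eye. The following is a minimal sketch that parses progress lines of the single-GPU form shown above; the regular expression and the names `PROGRESS_RE` and `parse_progress` are assumptions based only on that sample, and multi-GPU output (which prints several values per line) would need a different pattern.

```python
import re
from typing import Dict, List

# Matches lines such as: 10.8% proc'd: 76 (7234 Gflop/s) errors: 0 temps: 57 C
PROGRESS_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)%\s+proc'd:\s+(?P<iters>\d+)\s+"
    r"\((?P<gflops>\d+)\s+Gflop/s\)[^\n]*?errors:\s+(?P<errors>\d+)"
    r"[^\n]*?temps:\s+(?P<temp>\d+)\s+C"
)

SAMPLE = """\
10.8% proc'd: 76 (7234 Gflop/s) errors: 0 temps: 57 C
25.0% proc'd: 190 (7106 Gflop/s) errors: 0 temps: 62 C
37.5% proc'd: 266 (7019 Gflop/s) errors: 0 temps: 65 C
50.0% proc'd: 380 (7000 Gflop/s) errors: 0 temps: 67 C
"""


def parse_progress(text: str) -> List[Dict[str, float]]:
    """Extract one record per progress line found in gpu_burn output."""
    return [
        {
            "percent": float(m.group("pct")),
            "iterations": int(m.group("iters")),
            "gflops": int(m.group("gflops")),
            "errors": int(m.group("errors")),
            "temp_c": int(m.group("temp")),
        }
        for m in PROGRESS_RE.finditer(text)
    ]


if __name__ == "__main__":
    records = parse_progress(SAMPLE)
    mean_gflops = sum(r["gflops"] for r in records) / len(records)
    print(f"samples={len(records)} mean={mean_gflops} Gflop/s "
          f"max_temp={max(r['temp_c'] for r in records)} C "
          f"errors={sum(r['errors'] for r in records)}")
```

A nonzero error total, or throughput that falls steadily as the temperature rises, is the kind of signal a stress test is meant to surface.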

Multiple GPUs:

    export CUDA_VISIBLE_DEVICES=0,1
    ./gpu_burn 100
