Tuesday, August 14, 2018

Multi-GPU on Pytorch

After some time, I finally figured out how to run multi-gpu on pytorch. In fact, multi-gpu API is just extremely simple in pytorch; the problem was my system.

Here is a simple test code to try out multi-gpu on pytorch. If this works about of the box, then you are good. However, some people may face problems, as discussed in this forum. As pointed out here, the problem is not about pytorch, but with external factor. In my case, it was ngimel 's comment that saved me. To recap her solution,

1. Test p2pBandwithLatencyTest from CUDA samples and make sure it works fine. If it does not pass this one, then the problem is with CUDA installation, etc, and not with pytorch. To download samples, simply run

$ cuda-install-samples-9.2.sh <target_path>

where you would replace the version above to whatever version you have. Then,

$ cd <target_path>/NVIDIA_CUDA-9.2_Samples/1_Utilities/p2pBandwidthLatencyTest/
$ make
$ ./p2pBandwidthLatencyTest

2. In my case, it was IOMMU that was the culprit. Disable it by editing /etc/default/grub and replace
#GRUB_CMDLINE_LINUX="" 
with
GRUB_CMDLINE_LINUX="iommu=soft"

Then update grup
$ sudo update-grup

Then reboot

This is how I solved my problem. I love this open source community forum! Thank you everyone!

No comments:

Post a Comment