With my experimental model getting larger and larger, training takes too long time. This is especially true as most of my model is vision-based, so it requires a lot of computation and memory. Yes, I could use cloud computing, such as AWS or GCP, so I did some calculation.
The cheapest monthly cost I found for an instance with 2x GTX 1080 Ti GPUs is $500 (AWS or GCP costs much higher). In just four months of using the service, I would spend $2000 on the cloud service.
Instead, I could spend $2000 once building my own system with 2 GPUs, and some $50 or less each month for electric bill and train two models simultaneously. I could even sell the rig later on when I need to upgrade. I am expecting resale value of 1/3 to 1/2 of my system in 2 years.
The answer is quite obvious at this point. I need to build my own rig. After some research, below is the list of parts and justification if necessary.
CPU: AMD Ryzen 1700
This is a 8-core 16-thread processor from AMD. Since most of the computation during training is performed by GPU and not CPU, I did not want to spend more than $300 on CPU. I debated whether to get even cheaper one, 1600, which has 6-cores and 12-threads with higher clock speed. This could be a better option for neural network training. They are both good options. However, at the time of buying, I could not get Ryzen 1600 at its retail price, because 1600 was in such a high demand.
I did not get Intel CPU because it is over-priced at the moment, as the new generation Coffee Lake is imminent. If I could wait a few more months, maybe Coffee Lake processors could be much better candidates than Kaby Lake.
GPU: NVIDIA GTX 1080
This was a tough call. I could get 1060 6G, 1070 8G, 1080 8G, or 1080 Ti 11G. The best bang for the buck would be 1060 6G, but I wanted more VRAM than 6GB. Next up is 1070 8G, but this was too expensive at the time due to high demand, costing around $500. Next up is 1080 8G, which is around $550 with more than 15% boost in performance. Next up is 1080 Ti 11G at $750, but this is too expansive compared to 1080 8G and the performance gain does not justify it. I therefore went with GTX 1080 8G. In fact, I got 2x GTX 1080 8G to train two models simultaneously. If you are willing to spend extra $500, you could go with 2x GTX 1080 Ti 11G.
AMD GPUs were not considered, as most deep learning libraries do not fully support AMD GPUs at the moment. I really hope AMD catches up with GPGPU support for deep learning libraries soon.
Mainboard: ASRock Fatal1ty X370 Gaming K4
This was one of the cheapest mainboards that support AMD Ryzen series CPUs and 2x PCI Express3 8 lanes each. Since I was getting two GPUs, I wanted to make sure that both GPUs get at least PCI Express3 8 lanes.
Yes I could have chosen CPU and mainboard to support dual PCI Express3 16 lanes, but this would sky-rocket my rig cost, and I don't think there will be much performance difference between PCI Express3 8 lanes vs 16 lanes for GTX 1080 graphics cards (source). If you are getting GTX 1080 Ti series, then perhaps you may want to opt for high-end CPU that supports PCI Express 32+ lanes and mainboard to fully support PCI Express3 16 lanes for each GPU, but you would have to spend $3000 or so on the system.
RAM: 2x DDR4 2400 8G
I will get more RAM when the cost goes down a bit. Currently the memory price is just too expansive.
SSD: Samsung Evo 860 500G
Just get a decent SSD with >= 500G. Absolutely no HDD, as this will significantly lower the performance. Samsung's SSDs are renowned for speed and stability.
Power Supply: 850W Gold-rated
Maybe 850W is too much for my config, but it is always better to choose power supply with abundant output. A cheap low-quality low-power output supply can actually destroy the whole system! I roughly estimated 100W for CPU, 200W for each GPU, and 100W for the rest. This is 600W in total, and I wanted 200W margin just to be safe, but 700W+ should have worked just fine. For dual GTX 1080 Ti configuration, you may want to get at least 850W or more.
Case: ATX Mid-Tower
Choose whatever you like as long as it is large enough to fit two GPUs and the motherboard. Most of ATX mid-tower cases should do.
Note that you should select a case with lots of fans and ventilation for cooling. I made a mistake of getting a case that wasn't so good in cooling, and the GPU temperature went up to 90C or more, so I had to buy and install additional fans to cool them down. It is very important to keep them cool enough, probably below 85C at full load. With GTX 1080 Ti GPUs, I imagine that cooling will be even more critical.
OK, so the grand total excluding monitor/mouse/keyboards, etc is a bit more than $1900 before tax. With this config, you can train a network that requires up to 16GB of VRAM, since you have 2x GTX 1080 8G, although you will need to make sure to distribute the work load between the two GPUs manually in the code.
I installed Ubuntu 16.04 LTS for now, although I may switch to Cent OS later on. After installing NVIDIA CUDA toolkit, I can successfully detect both GPUs and use them simultaneously with no problem. I did not connect them with SLI though, since it is not needed for my purpose.
Good luck with configuring your system!