Showing posts with label Ubuntu 16.04 LTS. Show all posts

Tuesday, August 14, 2018

Multi-GPU on Pytorch

After some time, I finally figured out how to run multi-GPU training on PyTorch. In fact, the multi-GPU API in PyTorch is extremely simple; the problem was my system.

Here is a simple test code to try out multi-GPU on PyTorch. If this works out of the box, then you are good. However, some people may face problems, as discussed in this forum. As pointed out here, the problem is not with PyTorch but with external factors. In my case, it was ngimel's comment that saved me. To recap her solution,

1. Run p2pBandwidthLatencyTest from the CUDA samples and make sure it works fine. If it does not pass, then the problem is with the CUDA installation, etc., and not with PyTorch. To download the samples, simply run

$ cuda-install-samples-9.2.sh <target_path>

where you would replace the version above with whatever version you have. Then,

$ cd <target_path>/NVIDIA_CUDA-9.2_Samples/1_Utilities/p2pBandwidthLatencyTest/
$ make
$ ./p2pBandwidthLatencyTest

2. In my case, it was IOMMU that was the culprit. Disable it by editing /etc/default/grub and replace
#GRUB_CMDLINE_LINUX="" 
with
GRUB_CMDLINE_LINUX="iommu=soft"

Then update grub
$ sudo update-grub

Then reboot
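
After the reboot, it may be worth confirming that the kernel really picked up the new parameter. Below is a minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline (standard on Linux); has_kernel_param is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line given as $1
# contains the parameter given as $2 as a whole word.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the real machine, inspect the command line the kernel booted with.
if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "iommu=soft"; then
        echo "iommu=soft is active"
    else
        echo "iommu=soft is NOT active; re-check /etc/default/grub"
    fi
fi
```

If the parameter is missing, the usual suspects are a forgotten update-grub or an edit to a commented-out line.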

This is how I solved my problem. I love this open source community forum! Thank you everyone!

Saturday, February 3, 2018

Prevent Process Termination over SSH Connection

When running heavy, long-running processes on a remote server through ssh, it is quite annoying that your processes are terminated when you get disconnected. Here is a tip just for this.

Let's assume you ssh into your server
$ ssh USERNAME@SERVER_IP

You want to run some program, which will run for quite some time.
$ ./a.out arg1 arg2 ...

What happens if your ssh connection is lost, perhaps due to an unstable network connection? Well, your program a.out will be terminated, so you will have to re-run it. In other words, you need to keep your ssh connection alive until the program completes, which is quite annoying.

One solution is using nohup
$ nohup ./a.out arg1 arg2 ... &

This will redirect the output to nohup.out file. With this, even if you disconnect from ssh connection, your program will keep running in the background.
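
If you would rather pick the log file yourself and keep the process ID around, something like the following may help. This is only a sketch: run_detached is a hypothetical helper name, and run.log is just an example file name.

```shell
#!/bin/sh
# Hypothetical helper: start a command under nohup in the background,
# send its stdout and stderr to the log file given as the first argument,
# and print the process ID of the background job.
run_detached() {
    logfile="$1"
    shift
    nohup "$@" > "$logfile" 2>&1 &
    echo $!
}

# Example usage on the server:
#   pid=$(run_detached run.log ./a.out arg1 arg2)
#   tail -f run.log      # follow the output
#   kill "$pid"          # stop the program if needed
```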

Another solution, which is probably much more sophisticated, is to use screen. It is your work session where you can run processes, and you can easily detach from it to do other things while your processes running inside the session are intact and not terminated. You can always re-attach to the screen session and resume your processes still running.

To use this, you need to first create a new screen session
$ ssh USERNAME@SERVER_IP
$ screen -S SESSION_NAME

This will create a new screen. Here, you can run your program
$ ./a.out arg1 arg2 ...

Next, you can always detach from this current session with <CTRL + a> d. That is, press <CTRL> and <a> keys together, release the keys and then press <d> key and release. This will detach from the session. At this point, your program will keep running regardless of whether you disconnect from your ssh connection to the server.

To list running screen sessions, run
$ screen -list

To re-attach to the session, you simply run
$ screen -r SESSION_NAME

To exit the screen, you run
$ exit

By the way, you may notice that within screen session, your scroll will register as up/down arrow keys. If you want your scroll to work as scroll terminal output, you can run the following (credit to pistos)
$ echo 'termcapinfo xterm* ti@:te@' >> ~/.screenrc

You can also press <CTRL + a> and then <ESC> to enter copy mode and scroll freely. To exit copy mode, simply press <ESC>.

Happy hacking!

Thursday, February 1, 2018

Debugging Bash Scripts

Bash scripts are extremely handy when you are mostly dealing with Linux commands, such as find, grep, sed, and so on. One can write functions, just like in any other language. However, I have to admit that I haven't really dealt with Bash scripting much so far. One of the reasons, I suppose, is that it is difficult to debug Bash scripts. I didn't think there was any debugging tool for Bash. Well, I just realized that I was wrong! There is in fact a very nice debugger for Bash: bashdb

To install, you can run the following on macOS
$ brew install bashdb

or the following for Ubuntu
$ sudo apt-get install bashdb

To start debugging, run
$ bashdb bash_script.sh argument1 argument2 ...

Many of the commands are similar to those of gdb, but for more info, type in help. Also, this documentation can be a great reference.

Happy bashing!

Wednesday, January 17, 2018

Replace Lines with Given Strings in Shell

Let's learn how to replace a given line with a given string. Consider, for example,
$ cat some_file
abcdefg
hijklmn
opqrstu
vwxyz
12345
67890

Let's replace line 2 with string THIS_IS_NEW_LINE2
$ sed '2s/.*/THIS_IS_NEW_LINE2/g' some_file
abcdefg
THIS_IS_NEW_LINE2
opqrstu
vwxyz
12345
67890

You can replace multiple lines with multiple -e options
$ sed -e '2s/.*/THIS_IS_NEW_LINE2/g' -e '3s/.*/THIS_IS_NEW_LINE3/g' some_file
abcdefg
THIS_IS_NEW_LINE2
THIS_IS_NEW_LINE3
vwxyz
12345
67890

Of course, you can always use the -i option to modify the file in place
$ sed -i '2s/.*/THIS_IS_NEW_LINE2/g' some_file
$ cat some_file
abcdefg
THIS_IS_NEW_LINE2
opqrstu
vwxyz
12345
67890
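
When the line number or the new text lives in a shell variable, the sed expression needs double quotes so the shell can expand it. Here is a minimal sketch; replace_line is a hypothetical helper name, and it assumes the new text contains no /, & or \ characters (those would need escaping in a sed replacement).

```shell
#!/bin/sh
# Hypothetical helper: print FILE ($1) with line number $2 replaced by
# the text in $3. The file itself is left untouched.
# Caveat: $3 is used as a sed replacement, so '/', '&' and '\' in it
# would have to be escaped first.
replace_line() {
    sed "${2}s/.*/${3}/" "$1"
}

# Example usage:
#   replace_line some_file 2 THIS_IS_NEW_LINE2
```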

Print Selected Lines in Shell

Let's assume you want to print certain lines in given text. For example, consider
$ cat some_file
1
2
3
4
5
6
7
8
9

Let's say you want to print lines 3-5. This is very easy using sed:
$ sed -n '3,5p' some_file
3
4
5

Let's say you want to print all lines, except 3-5. This can be done with d option:
$ sed '3,5d' some_file
1
2
6
7
8
9

Note that sed prints every line by default. The -n option suppresses that default, and the p command inside the single quotes then prints only lines 3 to 5. The d command, on the other hand, deletes lines 3-5 from the output, so sed prints only the remaining lines.
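
The same idea extends to several separate ranges at once. The sketch below builds one sed script out of multiple addresses; print_lines is a hypothetical helper name, and each address is anything sed accepts in front of p, such as 2 or 7,$ (where $ means the last line).

```shell
#!/bin/sh
# Hypothetical helper: print only the lines of FILE ($1) selected by the
# sed addresses given as the remaining arguments.
print_lines() {
    file="$1"
    shift
    script=""
    for addr in "$@"; do
        # Append "<address>p;" for every requested address.
        script="${script}${addr}p;"
    done
    sed -n "$script" "$file"
}

# Example usage: line 2, then line 7 through the end of the file.
#   print_lines some_file 2 '7,$'
```

Lines always come out in file order, regardless of the order of the addresses.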

Find Line Number Matching Given Expression in Shell

Say you want to find the line number that matches a given search string. For instance,
$ cat some_file
this is some file line 1
it was written in Vim line 2
let's learn unix commands line 3

To find an expression, say unix, we can use grep
$ grep unix some_file
let's learn unix commands line 3

To show the line number of the expression, use -n option
$ grep -n unix some_file
3:let's learn unix commands line 3

To output the line number only, pipe with cut command
$ grep -n unix some_file | cut -d : -f 1
3

Here, the -d option specifies the delimiter, and the -f option specifies which delimiter-separated field to print.

If there are multiple lines matching the given expression, you can use the -m option of grep to print just the first N of them. For example, the word line appears on every line of the file, so
$ grep -n -m 2 line some_file | cut -d : -f 1
1
2
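
As an alternative to piping grep into cut, awk can print the line number directly. This is just a sketch; line_numbers is a hypothetical helper name, and the pattern is treated as an awk regular expression.

```shell
#!/bin/sh
# Hypothetical helper: print the numbers of the lines in FILE ($1) that
# match the awk regular expression in $2. NR is awk's current line number.
line_numbers() {
    awk -v pat="$2" '$0 ~ pat { print NR }' "$1"
}

# Example usage:
#   line_numbers some_file unix
```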

Happy hacking!

Saturday, January 13, 2018

Build HTK in Ubuntu 16.04 with Minimum Effort

This post will walk through compilation of HTK on Ubuntu 16.04 64-bit. For macOS, take a look at this post.

First, you will need to download HTK from here. In this post, I will assume you download the latest stable version, 3.4.1. Extract the source files and change the directory
$ tar xfz HTK-3.4.1.tar.gz && cd htk

Because HTK was designed for 32-bit OS, we need to set the environment as such for 64-bit OS. Enter 32-bit environment by running
$ linux32 bash

Now, we do the usual things, starting off with configure
$ ./configure --prefix=$(pwd) && make

You will see an error:
HGraf.c:73:77: fatal error: X11/Xlib.h: No such file or directory
compilation terminated.

To find out which library you need, you can run the following
$ sudo apt-get install apt-file -y && apt-file update
$ apt-file search Xlib.h

You will see some lines of search result, one of which should read
libx11-dev: /usr/include/X11/Xlib.h

From this, you know that you need to install libx11-dev package.
$ sudo apt-get install libx11-dev -y

Let's resume make
$ make

You will encounter yet another error, complaining
Makefile:77: *** missing separator (did you mean TAB instead of 8 spaces?).  Stop.

It turns out that there is an error in HLMTools/Makefile line 77. In the very beginning of the line, there are a number of spaces, which should be replaced by a single tab.
<INSERT TAB HERE>if [ ! -d $(bindir) -a X_ = X_yes ] ; then mkdir -p $(bindir) ; fi
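
If you prefer not to open an editor, the same fix can probably be scripted with sed. The sketch below assumes GNU sed (the default on Ubuntu) for the in-place -i option; tabify_line is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: replace the leading spaces on line $2 of FILE ($1)
# with a single tab, editing the file in place (GNU sed).
tabify_line() {
    tab="$(printf '\t')"
    sed -i "${2}s/^ \{1,\}/${tab}/" "$1"
}

# For the HTK error above:
#   tabify_line HLMTools/Makefile 77
```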

OK, let's continue.
$ make

You should see the compilation succeeds without any problem. Let's install this and you should find HTK binary files inside bin folder in the current directory!
$ make install
$ bin/HCopy

Wednesday, November 8, 2017

Setting up Your Web App with Flask

With Python Flask, you can easily set up an interactive web application. In this tutorial, we will build a simple deep neural network server that classifies a given image into one of 1000 categories. When done, your app will look like this:

[Screenshot of the finished web app]

Let's create a folder where all your web app files will reside.
$ mkdir ~/web_app
$ cd ~/web_app

Install Flask, Keras and OpenCV modules
$ pip install flask keras opencv-python

Next, create two more folders as below
$ mkdir templates uploads

Create server.py and copy the code below:

Create templates/index.html file and copy the code below:


Create templates/predict.html file and copy the code below:


That's it! Your web app will classify a given image---either uploaded directly from the client or using the web url---using ResNet50 pre-trained network.

To run the server, run
$ python server.py

While the server is running, you can browse to http://SERVER_IP:8888 to view the web app, where of course SERVER_IP must be replaced with the server's actual IP address.

Tuesday, October 24, 2017

Using RAM Disk for Expediting Neural Net Training

These days I am constantly experimenting with models of different architectures and hyperparameters. Because I am working with lots of images, I realized that loading training or validation images from disk every epoch takes quite a bit of time. Yes, I am using a solid state drive, but it is still slow compared to RAM. To speed up the training, I have been loading the images into RAM in my Python code, which definitely expedites the training.

However, there are two main issues I see. The first is that raw training files are usually in JPEG images, which do not take up much space. However, when I am saving the image data in the Python code, I save the images in the numpy array of bitmap format, which significantly expands the storage space.

The second issue is that I have two GPUs in the server, thus sharing the CPU and RAM. When I am running different models on each GPU, but they share the same set of training data, I would have two copies of the exact same dataset in memory.

So, I was looking for a solution, and I found one that is very easy to implement. On Linux, there is a way to create a RAM disk, which is basically a chunk of RAM that the system treats as if it were a disk. I would mount a RAM disk and have the programs access the data from this RAM disk mount location.

Here is how to do it. The instructions below are based on this excellent article. First, create a folder to which you will mount the RAM disk.
$ sudo mkdir /mnt/ramdisk

Next, mount the RAM disk with specified space. For example,
$ sudo mount -t tmpfs -o size=1024m tmpfs /mnt/ramdisk

Finally, copy your training data into this folder and make sure to have your training code point to the new location.
$ cp -r /your/training/data /mnt/ramdisk

Happy training!

** Something to keep in mind **
- This is RAM, so every time your system reboots, the content will be gone
- Make sure that you have enough free RAM so that your data can fit in
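
To check the second point before copying, you could compare the size of the dataset against the size you gave the RAM disk. The sketch below assumes GNU du (for the -m option, which reports megabytes); fits_in_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory $1 occupies at most
# $2 megabytes on disk, as reported by du.
fits_in_mb() {
    used_mb="$(du -sm "$1" | cut -f 1)"
    [ "$used_mb" -le "$2" ]
}

# Example usage with the 1024 MB RAM disk from above:
#   if fits_in_mb /your/training/data 1024; then
#       cp -r /your/training/data /mnt/ramdisk
#   else
#       echo "dataset is larger than the RAM disk"
#   fi
```

Once the RAM disk is mounted, df -h /mnt/ramdisk shows how much of it is in use.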

Saturday, September 16, 2017

Examining the Bottleneck between CPU and NVIDIA GPU

I was investigating which part of my computer is the culprit for slowing down neural net training. I first thought it was the CPU doing the image preprocessing, as my CPU is Intel's low-end G4560, which only costs about $90, whereas my GPU is NVIDIA's high-end GTX 1070, which costs more than a whopping $400, thanks to the cryptocurrency boom.



To my surprise, it was actually the GPU that was lagging behind this time, at least for the current network that I am training. I would like to share how I found out whether GPU or CPU was lagging. Below is the code, most of which is taken from Patrick Rodriguez's repository keras-multiprocess-image-data-generator.


To run the script, you first need to install necessary modules. Save the following as requirement.txt
cycler
functools32
matplotlib
numpy
nvidia-ml-py
pkg-resources
psutil
pyparsing
python-dateutil
pyt
six
subprocess32

Next, run the command below to automate installing all the necessary modules:
$ pip install -r requirement.txt

Lastly, you also need python-tk module, so install it via
$ sudo apt-get install python-tk

Now, you can run the script
$ python sysmonitor.py

Note that you must have NVIDIA GPU in order for the script to work.

Saturday, September 2, 2017

Execute User Scripts on Boot

I wanted to have my VirtualBox guest system start automatically when I turn on the host computer. After some research, here is what I have found out:

First, write a script file that is to be executed when the computer boots. For my case, it was vmscript.sh file that reads
vboxmanage startvm CentOS --type headless

Make sure that this script is runnable.
$ chmod u+x vmscript.sh

Next, run the script yourself to test it works
$ ./vmscript.sh

When everything works fine, edit crontab to run the script on boot. Make sure to run the command below as the user who shall execute the script on boot. That is, don't run it as root unless you want the script to be executed as root.
$ crontab -e

Add the following line when in the crontab
@reboot /path/to/vmscript.sh
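
If you want to add the line without opening an editor, a filter like the one below could sit between crontab -l and crontab -. This is only a sketch; with_reboot_entry is a hypothetical helper name. It appends the entry only when it is not already present, so running it twice does not duplicate the line.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and write it to
# stdout with an "@reboot <script>" line appended, unless that exact line
# is already there. $1 is the path of the script to run on boot.
with_reboot_entry() {
    awk -v entry="@reboot $1" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}

# Example usage (this rewrites your crontab, so inspect the output first):
#   crontab -l 2>/dev/null | with_reboot_entry /path/to/vmscript.sh
#   crontab -l 2>/dev/null | with_reboot_entry /path/to/vmscript.sh | crontab -
```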

That's it! The script shall be executed on each system boot!

Friday, September 1, 2017

Monitor NVIDIA GPU Status

If you have properly installed NVIDIA driver, then you can easily check your GPU's temperature by running
$ nvidia-smi -q -d temperature

In case you are not sure how to install NVIDIA drivers, refer to this page for an excellent answer.

To display the GPU status in general, run
$ nvidia-smi

To watch the GPU status real time, run
$ watch nvidia-smi

Saturday, July 22, 2017

Alternative to scp

For transferring files to and from a remote computer, I use the scp command. In this post, I will cover an alternative command: rsync.

The syntax is very similar to scp. For example,
$ rsync some/local/file user@remote:/location/in/remote
$ rsync user@remote:/location/in/remote some/local/file

However, the options are quite different. I would say just use -aP
$ rsync -aP some/local/file user@remote:/location/in/remote
$ rsync -aP user@remote:/location/in/remote some/local/file

Here, -a stands for archive, which includes recursive -r option, and -P stands for --partial and --progress. With these options, you can resume downloading the files when interrupted.

For more info, look up man page:
$ man rsync

Ah, by the way, if you are using a port other than 22, make sure to use the -e option as below. Note that ssh takes a lowercase -p for the port (unlike scp, which uses -P):
$ rsync -aP -e 'ssh -p 1234' some/local/file user@remote:/location/in/remote
$ rsync -aP -e 'ssh -p 1234' user@remote:/location/in/remote some/local/file
where 1234 is the port you want to use.

Friday, July 21, 2017

Resume Downloading with wget

It is quite annoying when you are downloading a large file with wget or curl, and somehow your download is interrupted, because you have to download from scratch again. Well, here is what you can do to resume downloading where you left off.

With wget, simply add -c option:
$ wget -c http://url/to/download/file

That's it!

Thursday, May 4, 2017

Setup Deep Learning Development Environment in Ubuntu

In this tutorial, I will go through step by step instructions to setup deep learning development environment for Ubuntu. We will install the following python packages on fresh Ubuntu 16.04:
opencv
tensorflow
keras
matplotlib
numpy
scipy
sklearn
tk

Let's dig in! First, I would install virtualenv, in case you need multiple python environments.
$ sudo apt-get install virtualenv -y

To create an environment in your home directory, simply run
$ virtualenv ~/ENV
where you replace ENV with the deep learning environment name you would like.

To activate the new environment,
$ source ~/ENV/bin/activate
where again replace ENV with the name chosen above.

Next, we need to install pip, which helps us install these python packages with ease.
$ sudo apt-get install python-pip -y

You may want to upgrade pip to the latest version:
$ pip install --upgrade pip

Next, let's install python packages within the environment.
$ pip install tensorflow keras numpy scipy matplotlib scikit-learn

For OpenCV and TK, we need to install it from apt-get:
$ sudo apt-get install libopencv-dev python-opencv python-tk -y

That's it! Now you are ready to develop your neural network with tensorflow backend keras! If you want to test out if your environment is successfully setup, check out this post.

Saturday, April 22, 2017

Hosting Multiple Websites on a Single IP Address

It is not too difficult to host multiple websites on a single server and a single IP address. Here is how to do it with Apache server, based on article1, article2, and article3. I am assuming that you already have a server setup with apache2. If not, please refer to this tutorial for instructions.

The first step is to create virtual hosts where each virtual host serves each different website. For example, assume you want to serve domain names: hi.com and hello.com. You will need to create two virtual hosts, so that one will serve hi.com while the other will serve hello.com.

On Ubuntu or Debian, apache default server configuration files are in /etc/apache2 directory, while web server directory is /var/www.

First, copy the default configuration file and create two virtual host config files:
$ sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/hi.conf
$ sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/hello.conf

Next, edit /etc/apache2/sites-available/hi.conf similar to below:
 <VirtualHost *:80>
    ServerName hi.com
    ServerAlias www.hi.com
    DocumentRoot /var/www/hi
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>


Note that you must create /var/www/hi directory that contains files to serve for hi.com.

Similarly, edit /etc/apache2/sites-available/hello.conf in the same manner.
 <VirtualHost *:80>
    ServerName hello.com
    ServerAlias www.hello.com
    DocumentRoot /var/www/hello
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>


Again, you will need to create /var/www/hello directory that will serve visitors to hello.com.
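
Since the two files differ only in the domain name and the document root, you could also generate them instead of editing by hand. The sketch below simply prints the same configuration as above for a given domain; make_vhost is a hypothetical helper name, not an Apache tool.

```shell
#!/bin/sh
# Hypothetical helper: print a virtual host configuration for the domain
# in $1, served from the document root in $2. The \$ keeps
# ${APACHE_LOG_DIR} literal so that Apache expands it, not the shell.
make_vhost() {
    cat <<EOF
<VirtualHost *:80>
    ServerName $1
    ServerAlias www.$1
    DocumentRoot $2
    ErrorLog \${APACHE_LOG_DIR}/error.log
    CustomLog \${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
EOF
}

# Example usage:
#   sudo mkdir -p /var/www/hi /var/www/hello
#   make_vhost hi.com /var/www/hi | sudo tee /etc/apache2/sites-available/hi.conf
#   make_vhost hello.com /var/www/hello | sudo tee /etc/apache2/sites-available/hello.conf
```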

Next, enable the new virtual host configuration files:
$ sudo a2ensite hi.conf 
$ sudo a2ensite hello.conf 

Next, reload apache2 so that the change takes effect
$ sudo service apache2 reload

If you want to test these out, refer to this excellent article for more details.


These steps up to here will complete the setup for the server side. Now, it is time to setup your domain name configurations.

To direct any visitor who enters hi.com or hello.com to your virtual hosts, you will need to add A Record. Take a look at this article for more details.

Essentially, create A Record for hi.com and hello.com to direct to your server's IP address, and apache server will then take care of directing visitors of hi.com to your hi.com virtual host, and visitors of hello.com to the virtual host of hello.com that you have set up above.

*** Note: make sure not to enable forwarding of your domain name to your server. That was my first attempt, and it did not work. You will need to set up A Record instead in order for your apache server to point to appropriate virtual host.

Sunday, April 16, 2017

Java Compilation with Package and Jar Dependencies

In this tutorial, I will go over Java package and dependencies basics.

Let's go over the simplest java code with package declaration. Let's assume we are working on ~/java_package directory. Create Hello.java file below in the current directory:


To compile and execute this, we will first create a directory whose name is equal to the package name. In this case, the directory name should be pkgtest. So, create this directory and move Hello.java file into this. Run all of the following from ~/java_package directory:
$ mkdir pkgtest
$ mv Hello.java pkgtest/

Next, we need to compile. This is very simple.
$ javac pkgtest/Hello.java

Finally, we need to run it.
$ java pkgtest/Hello
hello

Note that you must run this from ~/java_package directory; otherwise java will complain with the error below:
$ cd pkgtest
$ java Hello
Error: Could not find or load main class Hello


Next, we will see how to import the package. Modify pkgtest/Hello.java and create ./JarTest.java as below:


To compile and run, run the following from ~/java_package directory:
$ javac JarTest.java
$ java JarTest
hello
hi


Next, we will create a jar library file and use this to compile and run. First, let's create the jar library. Run the following in ~/java_package directory
$ javac pkgtest/Hello.java
$ jar cvf pkgtest.jar pkgtest/Hello.class
added manifest
adding: pkgtest/Hello.class(in = 648) (out= 372)(deflated 42%)

This creates pkgtest.jar file, which contains Hello class.

Next, we will rename the pkgtest directory so that we don't compile from the source.
$ mv pkgtest pkgtest_bk

Let's see what happens if we don't specify the jar file when we compile.
$ javac JarTest.java
JarTest.java:1: error: package pkgtest does not exist
import pkgtest.Hello;
              ^
JarTest.java:5: error: cannot find symbol
        Hello.print("hello");
        ^
  symbol:   variable Hello
  location: class JarTest
JarTest.java:6: error: cannot find symbol
        Hello h = new Hello();
        ^
  symbol:   class Hello
  location: class JarTest
JarTest.java:6: error: cannot find symbol
        Hello h = new Hello();
                      ^
  symbol:   class Hello
  location: class JarTest
4 errors

As you can see above, java compiler complains that it cannot find some symbols, since we renamed the package directory. To resolve this, let's link the jar file:
$ javac -cp ".:pkgtest.jar" JarTest.java
$ java -cp ".:pkgtest.jar" JarTest
hello
hi

Voilà! Let's call it a day.

Saturday, April 1, 2017

How to Save Core Dumped File

Whenever you encounter the error message:
Segmentation fault (core dumped)

you may be wondering where this core file is.

Well, it is most likely that you need to increase the core file size limit. Try
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63344
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63344
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


You see, the core file size is set as 0 by default in Ubuntu, and that's why you are not seeing the core dumped file. Increase this to, say, 50000:
$ ulimit -c 50000

Now, you should see a core file as core in the current directory whenever you encounter the above error message!

How to Deal with Grub-Efi-Amd64-Signed Failure Error

These days, computers come with UEFI, Unified Extensible Firmware Interface, which is an upgrade from traditional legacy BIOS. With UEFI, Linux installation may fail with the following error message:
grub-efi-amd64-signed failed installation /target/ Ubuntu 16.04 

If this message appears, here is what you need to do.

First, create EFI partition on your disk. That is, boot your system from the Linux installation media, which typically has gparted. Open up gparted, create a partition such that the first partition on the disk is FAT32 with 200MB of size.

Second, set esp and boot flags for this FAT32 partition from gparted. This will indicate that this partition is for EFI.

Now proceed to the Linux installation and select the disk with the EFI partition for boot loader installation. Do not choose the partition itself; select the entire disk, such as /dev/sda. Now it should work!

Friday, March 31, 2017

How to Install NVIDIA Drivers on Ubuntu for CUDA / cuDNN

OK, so I have purchased my desktop system with dedicated GPU! The last time I purchased a system with dGPU was back in 1990s, so it's been about 20 years. I bought one because I wanted to study tensorflow / keras, and I simply couldn't do it with just CPUs, so I had to get NVIDIA GPU.

Anyways, I noticed that Ubuntu doesn't install its driver automatically, so here is how to do so manually. First, download CUDA from NVIDIA here. As of now, the latest version is CUDA 8.0. If you are going to use tensorflow, make sure it supports the version of CUDA you are going to download from the official documentation page.

Follow the instruction on NVIDIA download page to install. I am going to download local runfile for Ubuntu 16.04. For this option, I need to run
$ sudo sh cuda_8.0.61_375.26_linux.run

***
Note that the runfile will probably not work unless you take care of two things:

1. Install gcc on your system
$ sudo apt-get install build-essential

2. Run it in console mode. Follow the answer by Rey on this post for details.
***

Download cuDNN from here. I am going to download cuDNN v5.1 Library for Linux for CUDA 8.0. Decompress the file into /usr/local/cuda/cudnn folder:
$ mkdir cudnn
$ tar xfz cudnn-8.0-linux-x64-v5.1.tgz -C cudnn --strip-components=1
$ sudo mv cudnn /usr/local/cuda/cudnn

Next, add library paths by creating /etc/ld.so.conf.d/cuda.conf file with the following lines:
/usr/local/cuda/lib64/
/usr/local/cuda/cudnn/lib64/

Refresh ld cache by
$ sudo ldconfig

Finally, install tensorflow-gpu. I am going to use pip:
$ sudo apt-get install python-pip
$ pip install tensorflow-gpu

Now, when you run the following command, you should see all CUDA library opened successfully:
$ python -c 'import tensorflow'

Happy hacking!