Wednesday, November 8, 2017

Setting up Your Web App with Flask

With Python Flask, you can easily setup a web application that is interactive. In this tutorial, we will build a simple deep neural network server that will classify the given image into one of the 1000 categories. When done, your app will look like

Let's create a folder where all your web app files will reside.
$ mkdir ~/web_app
$ cd ~/web_app

Install Flas, Keras and OpenCV modules
$ pip install flask keras opencv-python

Next, create two more folders as below
$ mkdir templates uploads

Create and copy the code below:

Create templates/index.html file and copy the code below:

Create templates/predict.html file and copy the code below:

That's it! Your web app will classify a given image---either uploaded directly from the client or using the web url---using ResNet50 pre-trained network.

To run the server, run
$ python

While the server is running, you can browse to http://SERVER_IP:8888 to view the web app, where of course SERVER_IP must be replaced with the server's actual IP address.

Wednesday, November 1, 2017

Running Keras Deep Neural Network Inference Web Server using WebDNN

Web app is a great tool for simple demo across any platforms. I find it very useful because I can easily show my trained model's output to anyone like a breeze. The catch, however, would be setting up the server, but once the server is up and running, it cannot be any better to boast in your resume and show off to your boss with your trained model.

Here is a tutorial for setting up a simple neural network inference server using WebDNN library. Although its Github page and official documentations are very clear, I still had to spend some time to get it to work. In addition, the official documentation instructs users to compile and install emscripten from source, which will consume quite some time; I will show you how to get around this with install pre-compiled version.

First, one needs to clone WebDNN repository from Github:
$ git clone && cd webdnn

Note that WebDNN only supports Python3.6+, so you need to install this unless you already have it on the system. To check the your Python3 version, run
$ python3 --version

Make sure that it is 3.6+. There are plenty of resources that you can search on Google on how to install Python 3.6+. For instance, on Mac OS X, using homebrew is probably the easiest way:
$ brew install python3

Once you have Python 3.6+, it is a good idea to create virtual environment for what you will need to do.
$ virtualenv -p `which python3` python3

Now, activate the environment and install necessary packages
$ source python3/bin/activate
$ pip install tensorflow-gpu keras h5py

The Python environment is now complete. You now need to setup emscripten environment.
$ git clone && cd emsdk
$ ./emsdk install latest
$ ./emsdk activate latest
$ source ./

Now, we need to install eigen library.
$ wget
$ tar jxf 3.3.3.tar.bz2
$ export CPLUS_INCLUDE_PATH=$PWD/eigen-eigen-67e894c6cd8f
$ cd ..

Finally, we are ready. Let's first create pre-trained ResNet Keras model that we will use. Run Python and run the following lines
$ python
>>> from keras.applications import resnet50
>>> model = resnet50.ResNet50(include_top=True, weights='imagenet')

Exit Python and run the following
$ python ./bin/ resnet50.h5 --input_shape '(1,224,224,3)' --out output

After some time, it will generates files in output directory. To run the server, we first need to modify the example/resnet/script.js file. We must make sure to point to the correct directory by changing the weight path. For the version I have, I modified line 39 to point to output directory.
let runner = await WebDNN.load(`/output`, {backendOrder: backend_name});

Finally, we are ready to run the server. To start the server, run the following in webdnn directory
$ python -m http.server

On your web browser, go to address localhost:8000/example/resnet/index.html

You should be able to test ResNet50 model!

Tuesday, October 24, 2017

Using RAM Disk for Expediting Neural Net Training

These days I am constantly experimenting models with different architecture / hyper parameters. Because I am working with lots of images, I realized that it takes quite a bit of time for loading training or validation images every epoch from the disk. Yes, I am using a solid state drive, but it is still slow compared to RAM. To speed up the training, I have been saving images into the RAM in my Python code, which definitely expedites the training.

However, there are two main issues I see. The first is that raw training files are usually in JPEG images, which do not take up much space. However, when I am saving the image data in the Python code, I save the images in the numpy array of bitmap format, which significantly expands the storage space.

The second issue is that I have two GPUs in the server, thus sharing the CPU and RAM. When I am running different models on each GPU, but they share the same set of training data, I would have two copies of the exact same dataset in memory.

So, I was looking for a solution, and I found one that is very easy to implement. On Linux, there is a way to create a RAM disk, which is basically chunk of data saved in RAM but the system treats it as if it is a disk. Basically, I would mount a RAM disk and have the programs access the data from this RAM disk mount location.

Here is how to do it. The instructions below are based on this excellent article. First, create a folder to which you will mount the RAM disk.
$ sudo mkdir /mnt/ramdisk

Next, mount the RAM disk with specified space. For example,
$ sudo mount -t tmpfs -o size=1024m tmpfs /mnt/ramdisk

Finally, copy your training data into this folder and make sure to have your training code point to the new location.
$ cp -r /your/training/data /mnt/ramdisk

Happy training!

** Something to keep in mind **
- This is RAM, so every time your system reboots, the content will be gone
- Make sure that you have enough free RAM so that your data can fit in

Monday, October 16, 2017

Thread-Safe Generators in Python

In this post, we will go over how to create thread-safe generators in Python. This post is heavily based on this excellent article.

Generator makes life so much easier when coding in Python, but there is a catch; raw generators are not thread-safe. Consider the example below:

We see that the generator does not produce correct output when multiple threads are accessing this at the same time.

One easy way to make it thread-safe is by creating a wrapper class that simply lets only one thread to execute the generator's next method at any given time with threading lock. This is shown below:

Note that the generator now is thread-safe but doesn't execute its next method in parallel. You can also use Python's decorator to make it look even easier, although it basically does the same thing.

Monday, October 2, 2017

Three Minutes Daily Vim Tip: Disable a Key

The letter 'J' in vim is mapped to a shortcut to join two lines. Unfortunately, the letter 'j' is used a lot to navigate, and I often mistakenly press shift along with 'j'. This is quite annoying, so I decided to simply disable this shortcut.

To do this, open up ~/.vimrc file and add the following line:
nnoremap J <nop>

That's it!

Sunday, October 1, 2017

Applying Common Changes to Multiple Branches in Git

Say you have two branches, branchA and branchB.

Assume that in branchA you have

whereas in branchB you have

Say you want to make some change that will be common to both branchA and branchB; that is, for example, you want to add the same file to them, so that branchA shall become

and branchB shall become

To do this, you first need to commit to either branch, say branchA.
$ git checkout branchA
$ # write file4.txt
$ git add file4.txt
$ git commit file4.txt -m 'add file4'

Next, you just need to clone the very last commit using git cherry-pick
$ git checkout branchB
$ git cherry-pick branchA

If you have more branches, then simply repeat the above cherry-pick steps to branchC, branchD, and so forth.

Saturday, September 30, 2017

Tensorflow Fundamentals - K-means Cluster Part 2

From the previous post, I have shown how to calculate k-mean cluster using Tensorflow. In this post, I will add a bit more advanced implementations. In particular, I will show you how to implement conditional statement in Tensorflow.

The difference is at guessing the initial set of centroids. In the previous implementation, I simply chose k random points as initial centroids. Here, instead, I am selecting the first centroid to be the point furthest away from the origin. Next ith initial centroid for i = {1,2,...,k} is chosen such that the sum of the distances from previous i-1 centroids is the largest. This way, we can significantly reduce iteration number required to achieve the final state.

Saturday, September 23, 2017

Building Deep Learning Machine Under $2000 with Dual GTX 1080 GPUs

With my experimental model getting larger and larger, training takes too long time. This is especially true as most of my model is vision-based, so it requires a lot of computation and memory. Yes, I could use cloud computing, such as AWS or GCP, so I did some calculation.

The cheapest monthly cost I found for an instance with 2x GTX 1080 Ti GPUs is $500 (AWS or GCP costs much higher). In just four months of using the service, I would spend $2000 on the cloud service.

Instead, I could spend $2000 once building my own system with 2 GPUs, and some $50 or less each month for electric bill and train two models simultaneously. I could even sell the rig later on when I need to upgrade. I am expecting resale value of 1/3 to 1/2 of my system in 2 years.

The answer is quite obvious at this point. I need to build my own rig. After some research, below is the list of parts and justification if necessary.

CPU: AMD Ryzen 1700
This is a 8-core 16-thread processor from AMD. Since most of the computation during training is performed by GPU and not CPU, I did not want to spend more than $300 on CPU. I debated whether to get even cheaper one, 1600, which has 6-cores and 12-threads with higher clock speed. This could be a better option for neural network training. They are both good options. However, at the time of buying, I could not get Ryzen 1600 at its retail price, because 1600 was in such a high demand.

I did not get Intel CPU because it is over-priced at the moment, as the new generation Coffee Lake is imminent. If I could wait a few more months, maybe Coffee Lake processors could be much better candidates than Kaby Lake.

This was a tough call. I could get 1060 6G, 1070 8G, 1080 8G, or 1080 Ti 11G. The best bang for the buck would be 1060 6G, but I wanted more VRAM than 6GB. Next up is 1070 8G, but this was too expensive at the time due to high demand, costing around $500. Next up is 1080 8G, which is around $550 with more than 15% boost in performance. Next up is 1080 Ti 11G at $750, but this is too expansive compared to 1080 8G and the performance gain does not justify it. I therefore went with GTX 1080 8G. In fact, I got 2x GTX 1080 8G to train two models simultaneously. If you are willing to spend extra $500, you could go with 2x GTX 1080 Ti 11G.

AMD GPUs were not considered, as most deep learning libraries do not fully support AMD GPUs at the moment. I really hope AMD catches up with GPGPU support for deep learning libraries soon.

Mainboard: ASRock Fatal1ty X370 Gaming K4
This was one of the cheapest mainboards that support AMD Ryzen series CPUs and 2x PCI Express3 8 lanes each. Since I was getting two GPUs, I wanted to make sure that both GPUs get at least PCI Express3 8 lanes.

Yes I could have chosen CPU and mainboard to support dual PCI Express3 16 lanes, but this would sky-rocket my rig cost, and I don't think there will be much performance difference between PCI Express3 8 lanes vs 16 lanes for GTX 1080 graphics cards (source). If you are getting GTX 1080 Ti series, then perhaps you may want to opt for high-end CPU that supports PCI Express 32+ lanes and mainboard to fully support PCI Express3 16 lanes for each GPU, but you would have to spend $3000 or so on the system.

RAM: 2x DDR4 2400 8G
I will get more RAM when the cost goes down a bit. Currently the memory price is just too expansive.

SSD: Samsung Evo 860 500G
Just get a decent SSD with >= 500G. Absolutely no HDD, as this will significantly lower the performance. Samsung's SSDs are renowned for speed and stability.

Power Supply: 850W Gold-rated
Maybe 850W is too much for my config, but it is always better to choose power supply with abundant output. A cheap low-quality low-power output supply can actually destroy the whole system! I roughly estimated 100W for CPU, 200W for each GPU, and 100W for the rest. This is 600W in total, and I wanted 200W margin just to be safe, but 700W+ should have worked just fine. For dual GTX 1080 Ti configuration, you may want to get at least 850W or more.

Case: ATX Mid-Tower
Choose whatever you like as long as it is large enough to fit two GPUs and the motherboard. Most of ATX mid-tower cases should do.

Note that you should select a case with lots of fans and ventilation for cooling. I made a mistake of getting a case that wasn't so good in cooling, and the GPU temperature went up to 90C or more, so I had to buy and install additional fans to cool them down. It is very important to keep them cool enough, probably below 85C at full load. With GTX 1080 Ti GPUs, I imagine that cooling will be even more critical.

OK, so the grand total excluding monitor/mouse/keyboards, etc is a bit more than $1900 before tax. With this config, you can train a network that requires up to 16GB of VRAM, since you have 2x GTX 1080 8G, although you will need to make sure to distribute the work load between the two GPUs manually in the code.

I installed Ubuntu 16.04 LTS for now, although I may switch to Cent OS later on. After installing NVIDIA CUDA toolkit, I can successfully detect both GPUs and use them simultaneously with no problem. I did not connect them with SLI though, since it is not needed for my purpose.

Good luck with configuring your system!

Tuesday, September 19, 2017

Tensorflow Fundamentals - K-means Cluster Part 1

Now that we are familiar with Tensorflow, let us actually write code. For this series of posts, we are going to implement K-means clustering algorithm with Tensorflow.

K-means clustering algorithm is to divide a set of points into k-clusters. The simplest algorithm is
1. choose k random points
2. cluster all points into corresponding k groups, where each point in the group is closest to the centroid
3. update the centroids by finding geometric centroids of the clusters
4. repeat steps 2 & 3 until satisfied

Below is my bare-minimum implementation in Tensorflow.

Tensorflow Fundamentals - Interactive Session

As I have discussed in my previous posts, Tensorflow's computation graphs will not evaluate the expression unless one explicitly asks it to do so. One may find this quite annoying during debugging, so Tensorflow provides what is called Interactive Session, which let's you evaluate the expressions as you go, with minimal code.

This is really simple; just call
sess = tf.InteractiveSession()

in the beginning, and this will act like the block
with tf.Session() as sess:

See the code below.

Sunday, September 17, 2017

Tensorflow Fundamentals - Computation Graph Part 3

Next up, we want to now evaluate an expression from given input values. Let us construct a function (graph)
f(x,y) = x + y

where x,y are input values to fed into the graph f. To do this, we need to use tf.placeholder methods to define input nodes, and supply feed_dict parameter to run method as shown below:

The code is easy enough to be self-explanatory. The output of the code shall look similar to
$ python 2>/dev/null
8 + -6 = 2
-6 + 4 = -2
-10 + -1 = -11
-6 + 0 = -6
5 + 9 = 14
7 + 6 = 13
3 + 8 = 11
3 + 6 = 9
5 + -4 = 1
0 + -3 = -3

Tensorflow Fundamentals - Computation Graph Part 2

Let's continue our journey on Tensorflow's computation graph.

We will now make use of Tensorflow's variables. They are different from constant in that
1. they are mutable during the execution, i.e., they can change their values
2. they will store their state (value), which shall equal to that from the lastest execution

For example, you can define a counter variable that will increment its value on each execution:
counter = counter + 1

Below is the simple demo code for doing this in Tensorflow:

The only catch here is that
1. tf.assign is another operation that will assign a new value to the variable on each execution (run)
2. one must initialize all variables

Running it will output

So far so good!

Tensorflow Fundamentals - Computation Graph Part 1

This post is intended for anyone, including myself, who is having difficulty grasping the very basic concepts of Tensorflow: Computation Graph. This post is heavily based on Tensorflow's official documentation.

The way Tensorflow and Theano operates is just different from all those other ones in that there are two distinct phases. In the first phase one builds the computation graph that defines the computations to be performed. It is like a function where given whatever inputs, it will spit out outputs. For example, let us say you define
f(x,y) = x + y

Then f is the computation graph in Tensroflow. This phase is called construction phase.

In the second phase, one executes the graph by feeding in the inputs. For example,
f(1,2) = 3
f(3,2) = 5

and so on for any x,y pairs you feed in. The graph will output the numerical values given numerical inputs. This phase is called execution phase.

One difference, however, is that not only the mathematical operations, such as additions, subtractions, multiplications, and divisions but also each variables are considered as operation nodes in Tensorflow's computation graphs. Thus, with the example above, we now have three ops:

Here, x y are constant ops, meaning that their values will be some constant directly fed in during the execution phase. The output of the graph f can be fed into a more complex graph.

Let's do a very simple example in code. We will define the computation graph for
f = pi + 1

and compute for constant pi = 3.14159...

The output of the script shall yield

So far so easy. We will progressively construct and execute more complex and useful graphs, so stay with me.

Saturday, September 16, 2017

Examining the Bottleneck between CPU and NVIDIA GPU

I was investigating which part of my computer is the culprit for slowing down neural net training. I first thought it was CPU doing the image preprocessing, as my CPU is Intel's low-end series G4560, which only costs about $90, whereas my GPU is NVIDIA's high-end series GTX 1070 that costs more than whopping $400, thanks to cryptocurrency booming.

To my surprise, it was actually the GPU that was lagging behind this time, at least for the current network that I am training. I would like to share how I found out whether GPU or CPU was lagging. Below is the code, most of which is taken from Patrick Rodriguez's repository keras-multiprocess-image-data-generator.

To run the script, you first need to install necessary modules. Save the following as requirement.txt

Next, run the command below to automate installing all the necessary modules:
$ pip install -r requirement.txt

Lastly, you also need python-tk module, so install it via
$ sudo apt-get install python-tk

Now, you can run the script
$ python

Note that you must have NVIDIA GPU in order for the script to work.

Friday, September 15, 2017

Multithreading in Python

In the previous post, I investigated a way to preprocess images using multiple processes. In this post, I will investigate a way to preprocess images using multiple threads.

The real difference from the multiprocessing code is not much. Instead of using multiprocessing.Pool class, use multiprocessing.pool.ThreadPool class. Below is the code:

The execution time for multithreading is a bit slower than that of multiprocessing, but I am not sure if this is always the case, as the difference is not significant.

single thread elapsed time: 364
threads elapsed time: 184
threads elapsed time: 115

Multiprocessing with Python

I have been training a simple neural network on my desktop, and I realized that GPU wasn't running at its full capacity, i.e., there must be some bottleneck from, most likely, CPU side. My guess is image preprocessing from CPU is taking longer than GPU computation for each batch. In order to reduce the time for CPU to preprocess the images, I started investigating multiprocessing option in Python.

Below is a simple code for running OpenCV's Canny function across multiple processes using Python's built-in multiprocess module:

Running the script yields approximately linear time reduction for 2 processes and sub-linear for 4 processes, due to other bottleneck, such as disk IO.

single process elapsed time: 364
2 processes elapsed time: 181
4 processes elapsed time: 108

Saturday, September 2, 2017

Execute User Scripts on Boot

I wanted to have my VirtualBox guest system start automatically when I turn on the host computer. After some research, here is what I have found out:

First, write a script file that is to be executed when the computer boots. For my case, it was file that reads
vboxmanage startvm CentOS --type headless

Make sure that this script is runnable.
$ chmod u+x

Next, run the script yourself to test it works
$ ./

When everything works fine, edit crontab to run the script on boot. Make sure to run the command below as the user who shall execute the script on boot. That is, don't run it as root unless you want the script to be executed as root.
$ crontab -e

Add the following line when in the crontab
@reboot /path/to/

That's it! The script shall be executed on each system boot!

Friday, September 1, 2017

Enable X11 Forwarding from macOS X

In order to enable X11 forwarding from macOS client, you must first download XQuartz from and install it. Then, you must log out and log back in.

Next, you can remote into your server via ssh with -X or -Y option:
$ ssh username@server_ip -X

To verify that you have X11 forwarding, simply examine $DISPLAY variable while you are remotely logged in:
$ echo $DISPLAY

By the way, some Linux servers may not have X11 forwarding feature disabled by default. In this case, you must enable it by editing /etc/ssh/sshd_config file:
X11Forwarding yes

Enjoy Mac Life!

Monitor NVIDIA GPU Status

If you have properly installed NVIDIA driver, then you can easily check your GPU's temperature by running
$ nvidia-smi -q -d temperature

In case you are not sure how to install NVIDIA drivers, refer to this page for excellent answer.

To display the GPU status in general, run
$ nvidia-smi

To watch the GPU status real time, run
$ watch nvidia-smi

Sunday, August 27, 2017

Setup SSH Login Email Alert on Cent OS 7

This is similar to this post but not exactly.

First, you need to install mailx
# yum install -y mailx

Second, create /etc/ssh/ with the following:

# Change these two lines:

if [ "$PAM_TYPE" != "close_session" ]; then
    subject="SSH Login: $PAM_USER from $PAM_RHOST on $host"
    # Message to send, e.g. the current environment variables.
    echo "$message" | mailx -r "$sender" -s "$subject" "$recepient"

Lastly, add the following line to /etc/pam.d/sshd
session required seteuid /bin/sh /etc/ssh/

That's it!

Enable Network on Cent OS 7 Minimal

I was going to be testing out a very minimal server, so I chose to install Cent OS 7 minimal version. To my surprise, after installation I noticed that network was not ON by default!

After some searching, I realized that I need to edit /etc/sysconfig/network-scripts/ifcfg-eth0 to read:

Note that you will need to replace eth0 with your network adapter name.

In addition, there is no ifconfig or wget on Cent OS minimal, which I understand. To install these, simply run
# yum install -y net-tools wget

Thursday, August 24, 2017

Algorithm: Largest Sequence of Target Sum

This was one of the interview questions I was recently asked; I wasn't able to answer it, but hopefully this post will help others solve it on their interviews.

The question is quite simple and perhaps classic. Given an array of n numbers and target sum, find the largest sequence within the array such that the sum of the array is equal to the target sum. Design an implementation in O(n) time complexity.

Hint? Use hash map.

This is how you can solve this. Say the array is given by {a_1, a_2, a_3, ..., a_n}. Any subsequence of the array will have the form {a_i, ..., a_j} where i <= j <= n. If you let S_k be the sum from a_1 through a_k, then you notice that sum of any subsequence is given by S_j - S_i.

Thus, you need to go through each number and calculate S_j, and see if there has been S_i in the hash map such that S_j - S_i = target. If so, you can update your result.

In code, you can easily implement with Python:

There are some subtleties, but it shouldn't be too difficult to follow the code.

Install VirtualBox Guest Addition for Cent OS

Here is how to install VirtualBox Guest Additions on Cent OS 7. For Ubuntu or Debian, refer to here.

If you just insert the Guest Addition image and try to run the Linux installation script, you will probably end up with
# ./
Verifying archive integrity... All good.
Uncompressing VirtualBox 5.1.22 Guest Additions for Linux...........
VirtualBox Guest Additions installer
Removing installed version 5.1.22 of VirtualBox Guest Additions...
Copying additional installer modules ...
Installing additional modules ... Starting the VirtualBox Guest Additions.
Failed to set up service vboxadd, please check the log file
/var/log/VBoxGuestAdditions.log for details.

The log file indicates
# cat /var/log/VBoxGuestAdditions.log failed: Look at /var/log/vboxadd-install.log to find out what went wrong. failed: modprobe vboxguest failed.

Finally, the next log file reads
# cat /var/log/vboxadd-install.log
/tmp/vbox.0/Makefile.include.header:112: *** Error: unable to find the sources of your current Linux kernel. Specify KERN_DIR=<directory> and run Make again.  Stop.
Creating user for the Guest Additions.
Creating udev rule for the Guest Additions kernel module.

Basically, you need to specify the kernel source directory. So, here is what you need to do.

First, you need to install dkms, but if you do, you are like to receive
# yum install dkms
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base:
 * extras:
 * updates:
No package dkms available.
Error: Nothing to do

To install this, the easiest way is to install EPEL repo.
# yum install epel-release -y

After installing EPEL, try installing dkms again:
# yum install dkms -y

Next, install headers and sources
# yum install kernel-devel -y

You can now locate your kernel sources. In my case, I get
# ls /usr/src/kernels
3.10.0-514.26.2.el7.x86_64  3.10.0-514.26.2.el7.x86_64.debug

Export the environment variable with the first directory. Make sure to replace it the correct version for your system as it may differ from mine.
# export KERN_DIR=/usr/src/kernels/3.10.0-514.26.2.el7.x86_64

Finally, you are ready to install VirtualBox Guest Additions.
# ./
Verifying archive integrity... All good.
Uncompressing VirtualBox 5.1.22 Guest Additions for Linux...........
VirtualBox Guest Additions installer
Removing installed version 5.1.22 of VirtualBox Guest Additions...
Copying additional installer modules ...
Installing additional modules ... Starting the VirtualBox Guest Additions.

By the way, if you are running commands with sudo, you need to make sure to run with -E option:
$ sudo -E ./ 

Otherwise, your sudo command will shadow KERN_DIR environment variable that was exported from the user.

Adding User to Sudoer in Cent OS 7

I am just so used to sudo command with Ubuntu that I am going to need to do the same with Cent OS. Here is how to add a user to sudoer in Cent OS. This is a bit different from how you would do with Debian systems discussed here.

To get root access in Cent OS, run
$ su

As a root, to give a user permission to use sudo command, run
# usermod -aG wheel USER_NAME

That's it. The user must log back in order for the change to take effect.

Sunday, August 20, 2017

Setup SSH Remote Login for Cent OS

Maybe it's time. After using Ubuntu and Debian for years, I now officially want to switch to Cent OS.  More on this topic later. For today's post, I will just go over how to install ssh server and enable it at boot time.

To install ssh server on Cent OS, simply run
# yum install openssh-server

Unlike Ubuntu, where ssh server will be automatically started at boot time, Cent OS seems that one must manually do so. For one time start, run
# service sshd start

However, if you want your ssh server to keep running even after reboot or restart, run
# chkconfig sshd on
Created symlink from /etc/systemd/system/ to /usr/lib/systemd/system/sshd.service.

That's all. If you want to turn it off, then
# chkconfig sshd off
Note: Forwarding request to 'systemctl disable sshd.service'.
Removed symlink /etc/systemd/system/

Enjoy Cent OS!

Saturday, July 29, 2017

Output Size Calculation for Convolution and Pooling Layers

I keep forgetting the exact equation for calculating the output size of convolution and pooling layers. Below are equations directly from tensorflow's official documentation:

For 'SAME' padding, we have
out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

For 'VALID' padding, we have
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

Note that in tensorflow, a typical input is 4D tensor with shape [batch_size,  height, width, channels]. Thus, strides[1] and strides[2] corresponds to stride_height and stride_width, respectively.

Hopefully this is useful!

Wednesday, July 26, 2017

Reversing Linked List In-Place

In this post, I will go through a simple algorithm for reversing a singly-linked list with space complexity of O(1), i.e., in-place.

There are two methods: one with recursive and the other with iterative. To me, it is easier to solve in recursive format first and then apply iterative afterwards.

Let's dig right in!

By the way, I claim that the recursive algorithm actually has space complexity of O(N), since it is recursively called N times, growing up the stack by O(N). With this view, quick sort no longer becomes an in-place algorithm, as it requires O(log N) recursive calls.

To see why I claim this, compare the functions reverse_with_array and reverse_with_function_with_stack(_helper). They pretty much do the same thing, except that in the first case, we create an array (a list, to be exact, but can be easily replaced by an array) whereas in the second case we simply create this array using recursive function stack. In fact, the recursive way costs more in time as well as space complexity.

In any case, these methods all do the work with quite different implementations!

Three Minutes Daily Vim Tip: Displaying Currently Viewing File

This post is based on jmathew's answer here.

To display which file you are currently looking at in vim, simply insert the following lines in ~/.vimrc file:
set statusline=%F%m%r%<\ %=%l,%v\ [%L]\ %p%%
set laststatus=2

Saturday, July 22, 2017

Alternative to scp

For transferring files from remote computer, I use scp command. In this post. I will cover an alternative command: rsync.

The syntax is very similar to scp. For example,
$ rsync some/local/file user@remote:/location/in/remote
$ rsync user@remote:/location/in/remote some/local/file

However, the options are quite different. I would say just use -aP
$ rsync -aP some/local/file user@remote:/location/in/remote
$ rsync -aP user@remote:/location/in/remote some/local/file

Here, -a stands for archive, which includes recursive -r option, and -P stands for --partial and --progress. With these options, you can resume downloading the files when interrupted.

For more info, look up man page:
$ man rsync

Ah, by the way, if you are using port other than 22, make sure to use -e option as below:
$ rsync -aP -e 'ssh -P 1234' some/local/file user@remote:/location/in/remote
$ rsync -aP -e 'ssh -P 1234' user@remote:/location/in/remote some/local/file
where 1234 is the port you want to use.

Friday, July 21, 2017

Resume Downloading with wget

It is quite annoying when you are downloading a large file with wget or curl, and somehow your download is interrupted, because you have to download from scratch again. Well, here is what you can do to resume downloading where you left off.

With wget, simply add -c option:
$ wget -c http://url/to/download/file

That's it!

Friday, July 14, 2017

Polymorphism in C++

Every time I have a job interview, people ask me questions regarding polymorphism and the role of virtual keyword in C++. It saddens me that I did not know the answer to them, and this post is to explain what they are to others who may be asked the same questions in their job interviews.

First, polymorphism in programming means that an object may take different forms. Assume you have Polygon class, from which you derive Rectangle and Triangle classes. You have an object that will be either Rectangle or Triangle, but you do not know what it will be until the runtime. In order to write code such that it shares as much as it can, we can write the following:

Compile it
$ g++ poly.cpp

and running the program with various inputs should help us understand what is going on:
$ ./a.out 2
This is a polygon object with name: polygon
This is a polygon object with name: polygon
area with w=3 and h=4 is -1

$ ./a.out 3
This is a triangle object with name: triangle
This is a polygon object with name: triangle
area with w=3 and h=4 is 6

$ ./a.out 4
This is a rectangle object with name: rectangle
This is a polygon object with name: rectangle
area with w=3 and h=4 is 12

Note that only virtual methods take polymorphic behavior, meaning that the method corresponding to the object's class is executed. This is the role of the keyword virtual. Another thing to note is that for virtual destructor, the base class' destructor is still called, unlike regular methods that are overriden.

Saturday, July 8, 2017

How to Use Vim on Macbook Pro Touch Bar

I really love my new Macbook Pro 15" model, with just one exception: its touch bar. I have no idea why the heck Apple wants to force everyone to use the touch bar in this model. You see, in the 13" Macbook Pro, you actually have an option to choose either the standard function keys or the touch bar, but with the 15" model, there is no such option.

My big complaints about touch bar should probably go into a separate thread, so I will start right into with my solutions to cope with the touch bar. I map F2 and F3 to work as :tabp and :tabn in Vim, but it just won't work with the touch bar, as there no long is standard function keys. What is even worse is that there is no physical ESC key, which is the most essential key in using Vim.

I actually remap CAPS LOCK to work as ESC key. This is supported by Mac OS X by default. To set this, go to Preferences --> Keyboard --> Modifier Keys --> Caps Lock --> Escape. For standard function key shortcuts, I simply remap ;n to Fn where n could be from 1 to 12. That is, for example, I have the following liens in ~/.vimrc:

nnoremap ;2 :tabp<CR>
nnoremap ;3 :tabn<CR>

for switching between tabs with ;2 and ;3 keystrokes. It takes some time to get used to, but at least this remapping makes touch bar usable.

Python Development with VIM

Just some notes regarding my quest for developing python using vim:

1. Never use anaconda. Most vim plugins will not support it

2. Use virtualenv instead.

3. Install python-mode.

First, run
$  mkdir -p ~/.vim/autoload ~/.vim/bundle && curl -LSso ~/.vim/autoload/pathogen.vim

Next, edit ~/.vimrc to have:
execute pathogen#infect()
syntax on
filetype plugin indent on

Finally, run
$ cd ~/.vim/bundle && git clone

4. Install ctags.
$ sudo apt-get install ctags

Now I am happy with it for now.

Friday, June 30, 2017

Go Back to Previous Directory in Bash

Wow, I can't believe I wasn't aware of these super useful commands till now!

The following command will change back to your previous directory
$ cd -

The following command will push the current directory to stack and move to the designated directory
$ pushd /new/directory/to/go/to

The following directory will pop from the stack and move to that directory
$ popd

Tuesday, June 20, 2017

Java Comparable and Comparator

In this tutorial, I will go over two similar but very different interfaces in Java: comparable and comparator. For official documentations on these, refer to here and here.

Comparable and Comparator are both Java interfaces to compare two or more objects. The difference is that comparable is the default comparator of the object, whereas comparator is a custom comparison implementation.

For example, assume you are entering personal info. For each personal info object, you have one's name, year, month, and date of birth. One way to sort a list of personal info objects is by their names, but another way is to sort by the date of birth. In Java, one will be able to sort by name or date of birth by providing corresponding comparator to the class. On the other hand, if one is to simply sort the objects without any comparator provided, then the list will be sorted by the class' comparable interface.

Let's take a look at an example.

To run it, simply type in
$ javac && java Personal

Sorting by default
Bob 1999, 1, 2
Charles 1955, 7, 4
Mike 2011, 4, 1
Tom 2011, 4, 4
Tony 2000, 5, 5
Tony 1987, 12, 11

Sorting by date of birth
Charles 1955, 7, 4
Tony 1987, 12, 11
Bob 1999, 1, 2
Tony 2000, 5, 5
Mike 2011, 4, 1
Tom 2011, 4, 4

Sorting by name
Bob 1999, 1, 2
Charles 1955, 7, 4
Mike 2011, 4, 1
Tom 2011, 4, 4
Tony 1987, 12, 11
Tony 2000, 5, 5

In the first sorting, no comparator is provided, so the list is sorted by Personal's comparable interface, namely compareTo method. For the second and third sorting, DobComparator and NameComparator is provided, so the list is sorted by the date of birth and name, respectively. The comparator classes provide compare method which specifies the order in which the list is to be sorted.

Sunday, June 18, 2017

Analysis of Quicksort with Duplicate Keys

In this post, I will present two slightly different versions of quicksort, and examine why one is better than the other.

Let's dig in the code first.

As you can see, the difference between two versions is very subtle. One partitions into two where the left contains values less or equal to pivot key and the right contains values strictly larger. The second version partitions into two where the left contains values less or equal to pivot key and the right contains values greater or equal to the pivot key.

What difference will this make? For an array with distinct keys, the difference is not as significant. However, when we have lots of distinct keys, the story changes. The first one performs extremely poorly when there are many identical keys, whereas the second one performs faster as the number of duplicate keys increases.

$ g++ qsort_analysis.cpp -g -ltictoc -O0 && ./a.out 10000000 10000000
Time elapsed: 2.067320s
Time elapsed: 1.842697s

$ g++ qsort_analysis.cpp -g -ltictoc -O0 && ./a.out 10000000 10000
Time elapsed: 10.985325s
Time elapsed: 1.441884s

How does such subtle difference make such large difference? We can analyze this by looking at an array with all the same keys and observe how many calls each quicksort is called in total.

$ g++ qsort_analysis.cpp -g -ltictoc -O3 -DDEBUG && ./a.out 16 1
#1 quicksort1 with 0,15
#2 quicksort1 with 0,14
#3 quicksort1 with 0,13
#4 quicksort1 with 0,12
#5 quicksort1 with 0,11
#6 quicksort1 with 0,10
#7 quicksort1 with 0,9
#8 quicksort1 with 0,8
#9 quicksort1 with 0,7
#10 quicksort1 with 0,6
#11 quicksort1 with 0,5
#12 quicksort1 with 0,4
#13 quicksort1 with 0,3
#14 quicksort1 with 0,2
#15 quicksort1 with 0,1
#16 quicksort1 with 0,0
#17 quicksort1 with 2,1
#18 quicksort1 with 3,2
#19 quicksort1 with 4,3
#20 quicksort1 with 5,4
#21 quicksort1 with 6,5
#22 quicksort1 with 7,6
#23 quicksort1 with 8,7
#24 quicksort1 with 9,8
#25 quicksort1 with 10,9
#26 quicksort1 with 11,10
#27 quicksort1 with 12,11
#28 quicksort1 with 13,12
#29 quicksort1 with 14,13
#30 quicksort1 with 15,14
#31 quicksort1 with 16,15
Time elapsed: 0.000057s
#1 quicksort2 with 0,15
#2 quicksort2 with 0,7
#3 quicksort2 with 0,3
#4 quicksort2 with 0,1
#5 quicksort2 with 0,0
#6 quicksort2 with 2,1
#7 quicksort2 with 3,3
#8 quicksort2 with 5,7
#9 quicksort2 with 5,5
#10 quicksort2 with 7,7
#11 quicksort2 with 9,15
#12 quicksort2 with 9,11
#13 quicksort2 with 9,9
#14 quicksort2 with 11,11
#15 quicksort2 with 13,15
#16 quicksort2 with 13,13
#17 quicksort2 with 15,15
Time elapsed: 0.000020s

As you can see above, the first version of quicksort is invoked about 2*N times where the second version is invoked about N times for given array of N elements all of equal keys. The reason is that for the first version, each partitioning will be extremely skewed over to the right, whereas for the second version the partition is always at the center, thus leading to much more efficiency.

Saturday, June 17, 2017

Variations in Quicksort

In this post, I would like to go over three different versions of quicksort and compare their performance.

Let's dig into code first.

There are three different version of quicksort. The first is regular binary quicksort, the second is ternary quicksort, and the last is 3-way quicksort. Note that tictoc.h is my custom library; for more details, see this post.

The binary quicksort is what everyone is most familiar with. The left partition contains those less or equal to pivot value, whereas the right partition contains those who are greater or equal to the pivot value.

The ternary quicksort is similar in that it divides into three partitions and two pivot values. The left-most partition contains those less or equal to the first pivot value, the middle partition contains those in between two pivot values, and the right-most partition contains those greater than the second pivot value. Note that I am assuming the first pivot value is less or equal to the second pivot value.

Lastly, the 3-way quicksort partitions the array into 3, where the first is strictly less, the middle is equal to, and the last is strictly greater.

Let's examine their performance. First, when the array values are mostly distinct:
$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 1000000000
Time elapsed: 0.983398s
Time elapsed: 0.883001s
Time elapsed: 1.151062s

It seems like ternary quicksort is the fastest among these.

When the array values are less distinct, however, we observe that 3-way quicksort performance starts to take off, while ternary quicksort slows downs greatly.

$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 100000
Time elapsed: 0.808114s
Time elapsed: 1.043392s
Time elapsed: 0.829712s

$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 1000
Time elapsed: 0.612664s
Time elapsed: 27.944554s
Time elapsed: 0.427633s

We will examine what is going on as we introduce more duplicate key values in the next post.

Assertion: Great Practice for Software Development

In this post, I will discuss recommended usage of assertion in programming. For both C/C++/Java, assert is a great way to alert a programmer that there is something against his expectation.

Let's do a quick example. Say you are implementing quick sort. In particular, you are trying to validate your partition method. You can insert an assertion to make sure that your partition method is functioning as expected, as below:

Well, you must agree that assertion is a very great tool for debugging, as it tells you exactly where it could have gone wrong. Unfortunately, assertion will slow down the performance and is not needed for production code, so do we need to comment them out when releasing the code for production?

In fact, both C/C++ and Java has a nice way to disable and enable the entire assertion statements. For C/C++, you simply need to define a macro NDEBUG, which will disable all assertions. Otherwise, assertions are enabled by default, unless of course you define NDEBUG macro in your source files explicitly. Therefore, you need disable assertions at compile time for production binary:

$ g++ *.cpp -DNDEBUG

Here, -D is the option for implicit #define macro, so the option -DNDEBUG will define NDEBUG macro for all files.

For Java, there is -da and -ea options which will disable and enable assertions, respectively at the runtime. In fact, assertions are by default disabled in Java, so you need to enable with -ea option when debugging, as below:

$ java -ea SomeClass

For more info, please refer to this and this official documentations.

Happy coding!

Monday, June 12, 2017

Setting up Environment for Coursera's Algorithm Class

Yes, I am taking the very basic algorithm class from Coursera. In fact, this class is very good. It is actually a course from Princeton, and I really like the course, except that they use Java, which I am not as familiar with compared to C/C++.

On top of that, the author provides custom jar library which I must use for compiling and running Java projects. Although the instructions are written very clearly on the official website, I still want to go over this bit, because I spent quite some time figuring this out.

In order to compile Java with a custom library, one must specify the library's path with -classpath flag as below:
$ javac -classpath /path/to/custom/library.jar -g

Also, it is handy to include -g flag so that one can debug it later. If you are compiling a program that uses algs4.jar library file provided by the author, you must download the file and specify it with -classpath flag.

In order to run the program, you need to do the same thing, but with the path to source files as well:
$ java -classpath /path/to/custom/library.jar:/directory/to/source/files SomeClass 

Finally, to debug, you need to do the same thing:
$ jdb -classpath /path/to/custom/library.jar:/directory/to/source/files SomeClass

Let's do an example. Let's assume that you have downloaded algs4.jar file to ~/Downloads folder. Say you have created file in ~/Documents folder.

You will need to run the following commands to compile, run, and debug, respectively:
$ javac -classpath ~/Downloads/algs4.jar ~/Documents/ -g
$ java -classpath ~/Downloads/algs4.jar:~/Documents Queue
$ jdb -classpath ~/Downloads/algs4.jar:~/Documents Queue

Happy coding algorithms!