Friday, December 7, 2018

Use pkg-config with CMake

Say you want to build a binary that uses the OpenCV library. For example, the following would be a typical command for compiling your program:
$ g++ `pkg-config --cflags opencv` main.cpp `pkg-config --libs opencv` -o opencv_example

Now, the question is: how do we do this with CMake? The following is what you would write in CMakeLists.txt:
project(opencv_example)

# this is where we get pkg-config info
find_package(PkgConfig REQUIRED)
pkg_check_modules(OPENCV REQUIRED opencv)

# this is where you compile your app
add_executable(opencv_example main.cpp)
target_link_libraries(opencv_example ${OPENCV_LIBRARIES})
target_include_directories(opencv_example PUBLIC ${OPENCV_INCLUDE_DIRS})
target_compile_options(opencv_example PUBLIC ${OPENCV_CFLAGS_OTHER})

Happy hacking!

Friday, November 23, 2018

Learn Swift on Ubuntu: install and compile

Swift is a new language developed by Apple. Recently, I have been studying Swift so that I can build iOS and macOS apps. In this post, I will go over how to set up a Swift environment on Ubuntu 18.04 LTS.

First, install clang
$ sudo apt-get update && sudo apt-get install clang -y

Next, download Swift from Swift.org
$ wget https://swift.org/builds/swift-4.2.1-release/ubuntu1804/swift-4.2.1-RELEASE/swift-4.2.1-RELEASE-ubuntu18.04.tar.gz

Decompress
$ tar xfz swift-4.2.1-RELEASE-ubuntu18.04.tar.gz

Add the Swift binary directory to your PATH environment variable
$ export PATH=/path/to/swift/usr/bin:${PATH}

This is it for setting up the environment. Let's now build "hello world" in Swift.

Create hello.swift with the following content:
print("hello world")

To run your code, simply run
$ swift hello.swift
hello world

To compile and create a binary, run
$ swiftc hello.swift
$ ./hello
hello world

That's it for this post. Happy hacking!

Sunday, November 4, 2018

Simple 2-Way Chat with Java

In this post, I'm going to discuss how to create a simple chat application with Java. Basically, this is going to be a tutorial on Socket API and Java Thread API.

Here is the basic idea. A chat class should run two tasks simultaneously: one sends the messages that the user types, and the other receives incoming messages from the other side. Because these two tasks must run at the same time, we need a multi-threaded application.

I am going to let the main thread handle user input and transmit it to the other side, with a separate thread for receiving any incoming messages. Here we go.

The base class is ChatThread, which implements exactly this split: the main thread reads user input and sends it across, while a background thread prints whatever arrives from the other side.

For this simple 2-way application, there are only two sides: a server and a client. The server creates the server socket and waits for a client to join. Once the client joins, the server starts the ChatThread. The client simply joins the server and starts the ChatThread.

The implementation here is very basic and doesn't handle closing the connections properly. However, it should give you a good introduction to socket programming and threading.
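The original code listing is not reproduced here, but the design above can be sketched as follows: a minimal, hypothetical loopback version that runs the server and client in one process (the class and message names are mine, not from the post):

```java
import java.io.*;
import java.net.*;

public class ChatDemo {
    // Background thread: receive incoming lines and print them --
    // the role that ChatThread delegates to a separate thread.
    static Thread startReceiver(Socket socket) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
        Thread t = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("peer> " + line);
                }
            } catch (IOException ignored) {
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Loopback demo: server and client in the same process.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();

            Thread receiver = startReceiver(accepted);

            // "Main thread" role: send a message, then close the socket,
            // which ends the receiver's read loop.
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            out.println("hello over the socket");
            client.close();

            receiver.join();
        }
    }
}
```

In a real chat, the sending side would loop over user input instead of sending a single hard-coded line, and each peer would run both roles over its own socket.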

Happy hacking!

Friday, October 26, 2018

Build AI Camera App (Caffe2)

The AI Camera app from Facebook is an example app that shows how to build Caffe2 on the Android platform. Unfortunately, you will find that it does not build with the latest Android Studio. Here is how to get it to work.

One of the main reasons it does not build is that starting with NDK r18b, GCC has been removed. The easiest way to get it working is to download an older version, say r17c, then edit the local.properties file and point ndk.dir to the r17c folder.

Other than that, there are a couple of minor changes. In the build.gradle file, update the Android Gradle plugin version to 3.2.1 as shown below:

classpath 'com.android.tools.build:gradle:3.0.1'
classpath 'com.android.tools.build:gradle:3.2.1'


Lastly, edit the gradle/wrapper/gradle-wrapper.properties file and change the Gradle distribution version to 4.6 as shown below:

distributionUrl=https\://services.gradle.org/distributions/gradle-4.1-all.zip
distributionUrl=https\://services.gradle.org/distributions/gradle-4.6-all.zip


After all these changes, the app should successfully build! Happy hacking!

Build Caffe2 using Android NDK

Starting with Google's Android NDK r18b, GCC has been removed (history). That is, one now has to use clang as the toolchain. This may cause problems if the toolchain is not set to clang.

In fact, when you try to build Caffe2 for Android with android-ndk, it will fail, complaining

GCC is no longer supported.  See
  https://android.googlesource.com/platform/ndk/+/master/docs/ClangMigration.md.

The solution is to simply edit the pytorch/scripts/build_android.sh file, replacing gcc with clang for the ANDROID_TOOLCHAIN variable, as shown below:

CMAKE_ARGS+=("-DANDROID_TOOLCHAIN=gcc")
CMAKE_ARGS+=("-DANDROID_TOOLCHAIN=clang")

Happy hacking!

Building Caffe2 and Pytorch for Ubuntu 18.04 LTS

I am trying to export my pytorch model to Android devices. It seems that using onnx and caffe2 is the easiest way to do so. Here, I will describe the steps to build and install pytorch & caffe2 for Ubuntu 18.04 LTS.

Basically, one has to follow the instructions here, but I had some problems building. The easiest fix is to disable all unnecessary features. For example, since I am only going to port the model to caffe2 and do not intend to use a GPU for running the pytorch model (at least in this python environment), I can set some flags:
$ export NO_CUDA=1
$ export NO_DISTRIBUTED=1

Also, I will build pytorch and caffe2 together. Hence, I set the flag
$ export FULL_CAFFE2=1

With these environment variables set, I successfully built and installed caffe2 and pytorch from the source, following the official instruction.

Happy hacking!

Tuesday, August 14, 2018

Multi-GPU on Pytorch

After some time, I finally figured out how to run multi-gpu on pytorch. In fact, the multi-gpu API is extremely simple in pytorch; the problem was my system.

Here is a simple test code to try out multi-gpu on pytorch. If it works out of the box, then you are good. However, some people may face problems, as discussed in this forum. As pointed out there, the problem is not with pytorch but with an external factor. In my case, it was ngimel's comment that saved me. To recap her solution:

1. Run p2pBandwidthLatencyTest from the CUDA samples and make sure it passes. If it does not, then the problem is with your CUDA installation, etc., and not with pytorch. To download the samples, simply run

$ cuda-install-samples-9.2.sh <target_path>

where you would replace the version above with whatever version you have. Then,

$ cd <target_path>/NVIDIA_CUDA-9.2_Samples/1_Utilities/p2pBandwidthLatencyTest/
$ make
$ ./p2pBandwidthLatencyTest

2. In my case, IOMMU was the culprit. Disable it by editing /etc/default/grub and replacing
#GRUB_CMDLINE_LINUX="" 
with
GRUB_CMDLINE_LINUX="iommu=soft"

Then update grub
$ sudo update-grub

Then reboot

This is how I solved my problem. I love this open source community forum! Thank you everyone!

Deep Neural Network Tips and Tricks

I just want to scribble down some of the things I have learned from my own experience in training deep neural networks. Hope this helps others too.

1. Optimizer: use SGD with momentum. If the momentum is too high, you may see validation error greater than training error even when the model is not overfit. This is because in each epoch the momentum starts at zero and builds up as more batches are trained, so near the end of an epoch you may experience gradient explosion, which leads to a large validation error. A typical momentum value is 0.9.

2. Gradient clipping: always use gradient clipping to prevent gradient explosion. This saves a lot of time because you don't need to constantly tune the learning rate during training. A typical value is 10 or lower.

3. Learning rate: in theory, as large as possible while still small enough to prevent gradient explosion. However, adjusting the learning rate during training is too much work, so simply set it high enough and rely on gradient clipping. A typical value is 0.001 or lower.

4. Input normalization: to facilitate training, normalize the input data to have zero-mean and unit standard deviation.

5. Batch normalization: employ batch normalization layers. These are especially helpful for deep networks.

6. Dropout: although dropout is not needed when batch norm is employed, one can still use small dropout (~0.1) across multiple layers; I think this works better than one or two large dropouts (~0.5). If the data is small compared to the network and you need an extra measure against overfitting, dropout layers are useful.

7. L2 weight decay: not necessary, but still useful as an option. A typical value of 1e-5 should be fine.

8. Shortcuts: shortcut connections are extremely useful for deep neural networks. The most popular implementation is perhaps the residual block. Full pre-activation may be the best choice, as illustrated here.

9. The output size y of a convolution with input size x is
y = (x - kernel + padding*2)/stride + 1
where the division is integer (floor) division.

10. For 2D convolutions, stacking several small-kernel convolutions is more beneficial than using one large-kernel convolution. For example, 3 convolutions with 3x3 kernels and depth d give 3 * 3^2 * d = 27d parameters, whereas 1 convolution with a 7x7 kernel and the same depth gives 7^2 * d = 49d parameters. In both cases the receptive field is 7x7, but the former lets us employ 3 activation layers while the latter gives only 1, so the former is usually believed to be more effective at learning. However, for 1D convolutions the former requires more parameters than the latter (3 * 3 * d = 9d vs 7d).

11. Activation layers: typically ReLU layers are used, but ELU may be a good alternative. It is recommended to clip unbounded activations, i.e., use y = clamp(x, min=0, max=5) in place of ReLU layers to prevent overly large values.

12. Training history: it is very important to save the loss and accuracy history during training for both the training and validation data, because the history tells us a lot about how training is going. Usually, at the beginning of training the validation error is lower than the training error, since the training error is a running average over the epoch while the validation error is measured at the end of the epoch. As training progresses, however, the training error should drop below the validation error as the model becomes slightly overfit. That is a good time to either stop training, to prevent overfitting, or take additional measures. Also, when the improvement flattens, it is a good indicator to lower the learning rate.
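The arithmetic in items 9 and 10 above can be sanity-checked with a short Python snippet (the parameter counts follow the post's per-channel counting):

```python
def conv_out(x, kernel, padding, stride):
    # Item 9: output size of a convolution (integer/floor division).
    return (x - kernel + 2 * padding) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 preserves the input size.
print(conv_out(32, 3, 1, 1))   # 32

# Item 10: per-channel weight counts, as in the post.
params_three_3x3 = 3 * 3**2    # three stacked 3x3 convolutions -> 27
params_one_7x7 = 7**2          # one 7x7 convolution            -> 49
print(params_three_3x3, params_one_7x7)

# Receptive field: each additional 3x3 conv (stride 1) adds 2,
# so three of them cover 7, matching a single 7x7 kernel.
rf = 1
for _ in range(3):
    rf += 2
print(rf)  # 7
```

The same `conv_out` helper also covers strided cases, e.g. a 7x7 kernel with padding 3 and stride 2 maps a 224-wide input to 112.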

I will continue to add more as I gain more experience.

Wednesday, August 8, 2018

Enable S3 Sleep for Thinkpad X1 Carbon 6th Gen on Ubuntu

I recently purchased an X1 C6; I am back to Thinkpad after 8 years of digression to Macbook. I will probably write a post explaining why I switched back, but for now I will focus on my story of getting S3 sleep working under Ubuntu.

One thing I really miss from the Macbook is how easy it was to simply close the lid and not worry about draining the battery. Unfortunately, with the X1 C6, that is not the case. With Windows 10, closing the lid puts it into the S0i3 sleep state, where some processes can keep doing work in the background. The idea is great, but unfortunately in the real world this new S0i3 sleep state drains the battery so much that I just want to go back to the old S3 state, where no processes can run.

Since I installed Ubuntu on this machine and will mainly use Ubuntu, I searched for methods to enable S3 sleep rather than the stupid S0i3 sleep. After spending much time, I finally found a solution, and I want to share it with anyone who needs it.

At first, I tried this post, but it did not work. Then it was this answer by Adonis that saved me. Basically, the missing step was to generate the grub.cfg file with grub-mkconfig from /etc/default/grub and then add initrd /boot/acpi_override.

Before this patch, the battery didn't even last a full day in sleep. Now it seems to drain about 10% a day in sleep. That is not as good as the Macbook, which lasts about 30 days in sleep (about 3% per day), but it is better than 100% drain per day. With Windows, it was about 30% drain per day in sleep.

I really like this laptop, but I have to admit that I miss Macbook when it comes to this sort of minute convenience.

By the way, I really appreciate that people managed to create this patch and shared it with all of us. I really respect their knowledge and skills. This was something Lenovo engineers could not even do!

Saturday, August 4, 2018

Build OS from Scratch 1

I've always been curious as to how a computer works, all the way from the bottom level to the top. We use computers every day, so we are familiar with the user level, i.e., the top-most level, but how many people actually know what is going on at the lowest level?

I took a course during my undergrad on how a computer and OS work, but it has been too long since then, and I don't remember much. Furthermore, when I was taking that course, I lacked much of the necessary knowledge to really absorb the material; I wasn't even familiar with the most basic shell commands, such as cp, mv, etc.

Now that I think about it, that course covered exactly what I want to learn now; unfortunately, I can't access the course materials any more. Thankfully, there are abundant other resources accessible through a simple Google search, so I am going to dive into these very low-level materials one by one.

I will be starting a series of short blog posts to summarize what I learn in my own words, starting with this one. For this post, I am referring to this excellent document.

The boot process looks for a bootable sector on any available disk or media. A bootable sector is flagged by the magic number 0xAA55 in its last two bytes. The boot sector is the first sector of the media.

Let's install qemu to emulate a computer, and nasm to assemble the code.
$ sudo apt-get install qemu nasm -y

Next, create a simple boot sector that prints 'Hello' by first creating assembly source file hello.asm
;
; A simple boot sector that prints a message to the screen using a BIOS routine.
;
mov ah, 0x0e    ; teletype output mode (tty)
mov al, 'H'     ; char to write
int 0x10        ; print char on screen
mov al, 'e'
int 0x10
mov al, 'l'
int 0x10
mov al, 'l'
int 0x10
mov al, 'o'
int 0x10
jmp $           ; Jump to the current address (i.e. forever).
;
; Padding and magic BIOS number.
;
times 510-($-$$) db 0   ; Pad the boot sector out with zeros
                        ; $ means address at the beginning of the line
                        ; $$ means address at the beginning of the section (file)
dw 0xaa55               ; Last two bytes form the magic number,
                        ; so BIOS knows we are a boot sector.


and assemble it into a flat binary
$ nasm hello.asm -f bin -o hello.bin

For more details on int 0x10, refer to here.

To boot this sector, simply run
$ qemu-system-x86_64 hello.bin

To view the boot sector in HEX, run
$ od -t x1 -A n hello.bin

You should see it boot up successfully, printing 'Hello'!