It is common to have multiple implementations of a given data structure. In the STL, for example, you can build a queue on top of a vector or a list, e.g.,
std::queue<int, std::vector<int>> queue_using_vector;
std::queue<int, std::list<int>> queue_using_list;
Here, the container is provided as a template argument and therefore must be fixed at compile time. So here comes the question: what if I want to choose the implementation at runtime rather than at compile time?
In this post, I want to show the closest solution to the problem above. Consider the following scenario: I want to build a queue on top of some container API, where the container can be implemented either as an array or as a list, and I want to decide which one at runtime.
The code below shows a sketch of this design pattern. First, define the container API as a purely abstract class, i.e., the container interface. Then, implement that API twice: once with an array and once with a list. Next, define a concrete container class that takes a pointer to either implementation. Finally, use this concrete container class to implement a queue.
Cool, right? Happy hacking!
Unix & Me
Saturday, May 9, 2020
Friday, March 6, 2020
Docker Cheat Sheet
Concepts:
- container: a running instance of an image
View images
$ docker images
Delete an image
$ docker image rm IMAGE_ID
View running containers
$ docker container ls
View all containers
$ docker container ls --all
Delete a container
$ docker container rm CONTAINER_NAME [--force]
Delete all containers
$ docker container prune
Run a new ubuntu container and start a terminal
$ docker run -it ubuntu
Run bash in a running container
$ docker exec -it CONTAINER_NAME bash
Monday, February 17, 2020
Kaldi series 1 - setup/debug with CLion
In this series of posts, I will describe how to run an automatic speech recognition (ASR) system with Kaldi. For the best debugging experience, I will give step-by-step run/debug instructions with CLion, the best C++ IDE on non-Windows systems. FYI, I am running these commands on macOS 10.15 (Catalina), but the steps should be similar on Linux.
In this very first post of the series, we will simply set up the Kaldi project in CLion for running and debugging.
$ git clone https://github.com/kaldi-asr/kaldi.git && cd kaldi
Kaldi recently added CMake support (thank you so much!), which makes it much easier for CLion to load the project. Run CLion and open the Kaldi directory, then run Build --> Build All in Debug. This will take quite some time, so please be patient.
Unfortunately, there are a few more things to take care of. The following commands also take a while to run:
$ cd tools && make -j4
$ extras/install_irstlm.sh && cd ..
Once you are done, let's run a pre-trained model to see if it works fine.
$ cd egs/apiai_decode/s5
$ ./download-model.sh
We also need to make the CMake-built binaries usable. Edit path.sh as follows:
export KALDI_ROOT=`pwd`/../../..
export KALDI_CMAKE_ROOT=`pwd`/../../../cmake-build-debug
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/src/path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
Lastly, edit tools/config/common_path.sh by replacing KALDI_ROOT with KALDI_CMAKE_ROOT as follows:
# we assume KALDI_CMAKE_ROOT is already defined
[ -z "$KALDI_CMAKE_ROOT" ] && echo >&2 "The variable KALDI_CMAKE_ROOT must be already defined" && exit 1
# The formatting of the path export command is intentionally weird, because
# this allows for easy diff'ing
export PATH=\
${KALDI_CMAKE_ROOT}/src/bin:\
${KALDI_CMAKE_ROOT}/src/chainbin:\
${KALDI_CMAKE_ROOT}/src/featbin:\
${KALDI_CMAKE_ROOT}/src/fgmmbin:\
${KALDI_CMAKE_ROOT}/src/fstbin:\
${KALDI_CMAKE_ROOT}/src/gmmbin:\
${KALDI_CMAKE_ROOT}/src/ivectorbin:\
${KALDI_CMAKE_ROOT}/src/kwsbin:\
${KALDI_CMAKE_ROOT}/src/latbin:\
${KALDI_CMAKE_ROOT}/src/lmbin:\
${KALDI_CMAKE_ROOT}/src/nnet2bin:\
${KALDI_CMAKE_ROOT}/src/nnet3bin:\
${KALDI_CMAKE_ROOT}/src/nnetbin:\
${KALDI_CMAKE_ROOT}/src/online2bin:\
${KALDI_CMAKE_ROOT}/src/onlinebin:\
${KALDI_CMAKE_ROOT}/src/rnnlmbin:\
${KALDI_CMAKE_ROOT}/src/sgmm2bin:\
${KALDI_CMAKE_ROOT}/src/sgmmbin:\
${KALDI_CMAKE_ROOT}/src/tfrnnlmbin:\
${KALDI_CMAKE_ROOT}/src/cudadecoderbin:\
$PATH
The tedious Kaldi setup is finally done. Now you need an audio file for testing, so simply record a wav file of your voice saying whatever you want transcribed (in English). Make sure to use a 16 kHz sampling rate with 16-bit encoding. Save this file as test.wav. Let's run it!
$ ./recognize-wav.sh /PATH/TO/YOUR/WAV/test.wav
You should see its transcript in the log. Now let's debug the decoding, for example, with CLion. As you can see from the log, the main decoding command is as follows:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model//HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- >exp/lat.1'
This big command consists of multiple executables piped together in a convoluted way, so let's take them one by one. The main binary, nnet3-latgen-faster, takes four arguments, as you can see by running
$ nnet3-latgen-faster
By the way, you will likely get a "command not found" error, so first run
$ export KALDI_CMAKE_ROOT=$(pwd)/../../../cmake-build-debug
$ source ../../../tools/config/common_path.sh
Now, try again
$ nnet3-latgen-faster
The first two arguments are provided as files, i.e., exp/api.ai-model/final.mdl and exp/api.ai-model/HCLG.fst.
The third argument is the features, which are read from the output of the command
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:-
We will create this features file separately, by running
$ apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:features.feat
You should see a features.feat file created. We can now run the decoding with this file as input:
$ nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model/HCLG.fst ark:features.feat ark:lat.1
Here, I simply replaced the third argument with the feature file and the fourth argument with an ark:lat.1 output file, skipping the pipe into lattice-scale.
Finally, it is time to run this in CLion's debug mode. From CLion's Edit Configurations, select nnet3-latgen-faster. Enter the program arguments copied from above, and make sure to set the working directory to the current directory, i.e., egs/apiai_decode/s5. Set a breakpoint in the main function, say at line 38, and start debugging with CLion. It should all work!
Tuesday, November 19, 2019
Debug Pybind11 C++ Extension with CLion
OK, so I need to be able to debug the C++ part of code that is called from Python3 via Pybind11, and I don't want to do it with lldb or gdb, i.e., a plain TUI debugger. In fact, I develop C++ extensions extensively with CLion, so I want to be able to debug and step through code within CLion. Here is how.
I'm going to use the Pybind11's cmake-example, since we want to use CMake with CLion.
First, download the repo
$ git clone --recursive https://github.com/pybind/cmake_example.git && cd cmake_example
From now on, I'm going to assume $ROOT is the path of this cmake_example repository.
Next, import the directory with CLion
CLion --> Open --> select $ROOT folder
Add debug symbols and turn off optimization by adding the following line to the CMakeLists.txt file:
cmake_minimum_required(VERSION 2.8.12)
project(cmake_example)
set(CMAKE_CXX_FLAGS "-g -O0")
add_subdirectory(pybind11)
pybind11_add_module(cmake_example src/main.cpp)
Edit the Run/Debug configuration for cmake_example as follows:
target: cmake_example
executable: /your/python3/binary
program arguments: tests/test.py
working directory: $ROOT
environment variables: PYTHONPATH=$ROOT/cmake-build
Now, debug with this configuration. You'll probably get a version assert error; let's just comment out that line in tests/test.py.
import cmake_example as m
#assert m.__version__ == '0.0.1'
assert m.add(1, 2) == 3
assert m.subtract(1, 2) == -1
Now, re-run the debugger with a breakpoint at line 4 of src/main.cpp.
CLion should break there!
Sunday, October 27, 2019
WTF? Fix to "string.h not found" in macOS
OK, I love macOS, but I sometimes hate the hassle when it comes to Xcode and its toolchains. Every now and then I get errors like "string.h not found"... WTF?
Here is the fix. You probably have the Xcode command line tools installed; run
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.1X.pkg
where you want to put the right version yourself. (X = 4 for Mojave, 5 for Catalina, etc)
If you don't have the package, try deleting and re-installing the tools and re-try
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
If that does not work, try
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
Hope this fixes it!
*** EDIT ***
Sometimes you may get a similar error with CLion. For that, the fix is rather simple: go to Tools --> CMake --> Reset Cache and Reload Project. That's it!
Thursday, October 3, 2019
Load Makefile Projects on CLion
If you are like me, there is no better C/C++ IDE than JetBrains' CLion. I absolutely love it and refuse to use any other IDE.
There is one problem, however: CLion only supports CMake projects. There is a way to import Makefile projects using compiledb, but it is not trivial for projects that rely heavily on GNU toolchains.
In this post, I will go over how to import Makefile projects, such as openfst, that cannot be imported properly by following JetBrains' tutorials. There is one trick: when running make, add the -w option, which prints entering/leaving-directory messages. Without this option, the generated compile commands will not resolve to the true paths.
That is, run
$ compiledb make -w
By the way, never use the multithreading option -jN here, because it will interleave the output and compiledb will not be able to reproduce all the make commands.
That's it! Happy hacking.
Wednesday, June 19, 2019
Compile GNU Coreutils from Scratch on macOS Mojave
Here, I will discuss how to compile GNU coreutils from scratch. You have two options. I recommend Option 2 below.
Option 1:
First, download the source code from its repo. I will use v8.31
$ git clone https://github.com/coreutils/coreutils.git -b v8.31
$ cd coreutils
Next, initialize the git submodules:
$ git submodule update --init
Next, bootstrap
$ ./bootstrap
./bootstrap: line 470: autopoint: command not found
./bootstrap: Error: 'autopoint' not found
./bootstrap: line 470: gettext: command not found
./bootstrap: Error: 'gettext' not found
./bootstrap: Error: 'makeinfo' version == 4.8 is too old
./bootstrap: 'makeinfo' version >= 6.1 is required
./bootstrap: See README-prereq for how to get the prerequisite programs
Well, I need to get prerequisite programs first. Install gettext using brew
$ brew install gettext && brew link gettext --force
Let's try again
$ ./bootstrap
./bootstrap: Error: 'makeinfo' version == 4.8 is too old
./bootstrap: 'makeinfo' version >= 6.1 is required
./bootstrap: See README-prereq for how to get the prerequisite programs
To check the version, run
$ makeinfo --version
makeinfo (GNU texinfo) 4.8
Copyright (C) 2004 Free Software Foundation, Inc.
There is NO warranty. You may redistribute this software
under the terms of the GNU General Public License.
For more information about these matters, see the files named COPYING.
So, I do need to update makeinfo. Note that this is the same as texi2any from GNU texinfo package.
$ pushd ~/Downloads
$ wget http://ftp.gnu.org/gnu/texinfo/texinfo-6.6.tar.gz
$ tar xfz texinfo-6.6.tar.gz && cd texinfo-6.6
$ ./configure
$ make -j4
$ sudo make install
$ makeinfo --version
texi2any (GNU texinfo) 6.6
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Now, we are ready to bootstrap again
$ popd && ./bootstrap
Option 2:
Download the distribution source code
$ wget https://ftp.gnu.org/gnu/coreutils/coreutils-8.31.tar.xz
$ tar xf coreutils-8.31.tar.xz && cd coreutils-8.31
------------------------------------------------------------------
Either way, only the easy parts are left.
$ ./configure
Finally, we should be able to build it
$ make -j4
$ sudo make install
Happy hacking!
Saturday, May 4, 2019
Copy to Clipboard in Terminal
Install xclip
$ sudo apt install xclip
Copy to clipboard
$ cat some_file.txt | xclip -selection clipboard
Paste to anywhere!
Sunday, April 21, 2019
N-Gram ARPA Model
This post summarizes how probabilities are calculated from an ARPA model.
Consider test.arpa and the following sequences of words:
look beyond
more looking on
Let's consider look beyond first. The log10 probability of seeing beyond given look is log10(P(beyond | look)) = -0.2922095, read directly from line 78 of the test.arpa file.
What is, then, the probability of seeing look beyond? Well, this is by the chain rule of conditional probabilities
log10(P(look beyond))
= log10(P(look) * P(beyond | look))
= log10(P(look)) + log10(P(beyond | look))
= -1.687872 + -0.2922095 = -1.980081558227539,
which can be verified with Python:
import kenlm
model = kenlm.LanguageModel('test.arpa')
print(model.score('look beyond', eos=False, bos=False))
Let's try the next sequence more looking on. Let us start with the chain rule
log10(P(more looking on))
= log10(P(more)) + log10(P(looking | more)) + log10(P(on | more looking))
The first term on the RHS is easy: log10(P(more)) = -1.206319 from line 34
The second term is a bit tricky, because we cannot find the bi-gram more looking from the model. Hence, we use the following formula:
P(looking | more) = P(looking) * BW(more)
where log10(P(looking)) = -1.285941 from line 33, and log10(BW(more)) = -0.544068 is the back-off weight, which can be read off from line 34.
Lastly, the third term is again not present in the model, so we reduce it to
P(on | more looking) = P(on | looking) * BW(looking | more)
where the first term is -0.4638903 from line 80, and the second term is taken to be 1 (i.e., 0 in log10), because the bigram more looking does not exist in the model.
Thus, we get log10(P(more looking on)) = -(1.206319 + 1.285941 + 0.544068 + 0.4638903) ≈ -3.5002.
For more details, refer to this document. I also find this answer very helpful.
Wednesday, March 27, 2019
Remote Debugging with Eclipse
I am not familiar with Eclipse, as I prefer CLion. However, for projects that do not use the CMake build system, I have to use Eclipse.
In this post, I will discuss how to remote-debug with Eclipse. The setup is as follows:
target (local): runs the application, say from a terminal
host (Eclipse): debugs the program as it runs
First, open up the project with Eclipse. Make sure that Eclipse can build the project.
Next, set up gdbserver on the target:
$ gdbserver :7777 EXECUTABLE ARG1 ARG2 ...
Here, 7777 is the port we will use for remote-debugging, EXECUTABLE is the binary file we are going to debug as it is running, and ARG1, ARG2, ... are appropriate arguments for this program.
Next, we setup Eclipse debugging.
From the menu, select Run --> Debug Configurations... --> C/C++ Remote Application (double click) --> Using GDB (DSF) Auto Remote Debugging Launcher (Select other) --> GDB (DSF) Manual Remote Debugging Launcher --> OK. Basically, we have selected "manual" remote debugging configuration here.
Make sure Project and C/C++ Application fields are properly filled, i.e., you should be able to select the project from the drop down menu if the project import/build is successful, and choose the EXECUTABLE for C/C++ Application.
In the Debugger tab --> Connection tab, change Port Number to 7777.
Finally, click the Debug button. You should now be able to remote-debug with Eclipse.
Happy hacking!