Saturday, May 9, 2020

C++ Multiple Implementation Design Pattern

It is common to have multiple implementations of a given solution. In the STL, for example, you can build a queue on top of either a deque or a list:

std::queue<int, std::deque<int>> queue_using_deque;
std::queue<int, std::list<int>> queue_using_list;

Here, the container is supplied as a template parameter and therefore must be fixed at compile time. Here comes the question: what if I want to choose the implementation at runtime rather than at compile time?

In this post, I want to show the closest thing to a solution to the problem above. Consider the following scenario: I want to build a queue on top of some container API, where the container can be implemented with either an array or a list, and I want to decide which one at runtime.

The code below sketches this design pattern. First, you declare the container API as a pure abstract class, i.e., you define the container interface. Then, you implement that interface twice: once with an array and once with a list. Next, you define a concrete container class that takes a pointer to either implementation. Finally, you use this concrete container class to implement a queue.


#include <cstddef>
#include <cstring>
#include <iostream>
#include <memory>

/*
 * This code provides an example design pattern to achieve the following:
 *
 * We want to implement the Container class using two different implementations
 * We further want to select the implementation during runtime
 * We also want to be able to do the same for any class derived from Container
 * We even want to be able to templatize on the Container class
 */

/**
 * interface that all implementations must bind to
 */
class ContainerInterface {
 public:
  virtual ~ContainerInterface() = default;
  virtual void identify() const = 0;
  virtual void push_back(int) = 0;
  virtual void pop_back() = 0;
  virtual size_t size() const = 0;
  virtual int &front() = 0;
  virtual int &back() = 0;
};

/**
 * one type of implementation
 */
class Array : public ContainerInterface {
 public:
  Array() = default;
  void identify() const override { std::cout << "array" << std::endl; }
  void push_back(int x) override { /* ... */ }
  void pop_back() override { /* ... */ }
  size_t size() const override { /* ... */ }
  int &front() override { /* ... */ }
  int &back() override { /* ... */ }

 private:
  /* ... */
};

/**
 * another type of implementation
 */
class List : public ContainerInterface {
 public:
  List() = default;
  void identify() const override { std::cout << "list" << std::endl; }
  void push_back(int x) override { /* ... */ }
  void pop_back() override { /* ... */ }
  size_t size() const override { /* ... */ }
  int &front() override { /* ... */ }
  int &back() override { /* ... */ }

 private:
  /* ... */
};

/**
 * a concrete container holding one implementation
 */
class Container : public ContainerInterface {
 public:
  explicit Container(std::unique_ptr<ContainerInterface> impl)
      : impl_{std::move(impl)} {}
  void identify() const override { impl_->identify(); }
  void push_back(int x) override { impl_->push_back(x); }
  void pop_back() override { impl_->pop_back(); }
  size_t size() const override { return impl_->size(); }
  int &front() override { return impl_->front(); }
  int &back() override { return impl_->back(); }

 private:
  std::unique_ptr<ContainerInterface> impl_;
};

/**
 * Demonstration of templated private inheritance
 * Queue is implemented through the Container APIs
 */
template <typename C = Container>
class Queue : private C {
 public:
  explicit Queue(std::unique_ptr<ContainerInterface> impl)
      : C{std::move(impl)} {
    C::identify();
  }
  void push(int x) { /* ... */ }
  void pop() { /* ... */ }
  const int &top() const { /* ... */ }
  size_t size() const { return C::size(); }
  bool empty() const { return C::size() == 0; }
};

/**
 * Demonstration of public inheritance
 * AdvancedContainer extends the Container APIs,
 * but its implementation should rely only on the Container APIs
 * and should not depend on implementation specifics of the array or the list
 */
class AdvancedContainer : public Container {
 public:
  explicit AdvancedContainer(std::unique_ptr<ContainerInterface> impl)
      : Container{std::move(impl)} {}
  bool empty() const { return size() == 0; }
};

int main(int argc, const char **argv) {
  // two copies of the implementation, chosen by user input at run time
  std::unique_ptr<ContainerInterface> impl1, impl2;
  if (argc > 1 && std::strcmp(argv[1], "array") == 0) {
    impl1 = std::make_unique<Array>();
    impl2 = std::make_unique<Array>();
  } else {
    impl1 = std::make_unique<List>();
    impl2 = std::make_unique<List>();
  }
  // both the Queue and the AdvancedContainer implementations are determined at run time
  Queue<Container> queue{std::move(impl1)};
  AdvancedContainer advancedContainer{std::move(impl2)};
  advancedContainer.identify();
  return 0;
}
/*
* run example
* $ ./a.out array
* array
* array
*
* $ ./a.out list
* list
* list
*/
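
If you want to try the sketch yourself, it builds with a single command (assuming the code above is saved as container.cpp; std::make_unique needs C++14 or later), producing the a.out used in the run example:

$ g++ -std=c++14 container.cpp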
Cool, right? Happy hacking!

Friday, March 6, 2020

Docker Cheat Sheet

Concepts:
- container: a running instance of an image

View images
$ docker images

Delete an image
$ docker image rm IMAGE_ID

View running containers
$ docker container ls

View all containers
$ docker container ls --all

Delete a container
$ docker container rm CONTAINER_NAME [--force]

Delete all stopped containers
$ docker container prune

Run a new ubuntu container and start a terminal
$ docker run -it ubuntu

Run bash in a running container
$ docker exec -it CONTAINER_NAME bash
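
Putting a few of these together, a quick end-to-end example (the container name my-ubuntu is just an arbitrary choice):
$ docker run -it --name my-ubuntu ubuntu   # start a container and drop into its terminal
$ docker exec -it my-ubuntu bash           # from another terminal, open a second shell in it
$ docker container rm --force my-ubuntu    # stop and delete it when done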

Monday, February 17, 2020

Kaldi series 1 - setup/debug with CLion

In this series of posts, I will describe how to run an automatic speech recognition (ASR) system with Kaldi. For the best debugging experience, I will give step-by-step run/debug instructions with CLion, the best C++ IDE on non-Windows systems. FYI, I am running these commands on macOS 10.15 (Catalina), but they should be similar on Linux systems.

In this very first post of the series, we will simply set up the Kaldi project in CLion for running and debugging.
$ git clone https://github.com/kaldi-asr/kaldi.git && cd kaldi

Kaldi recently added CMake support (thank you so much!), which makes it much easier for CLion to load the project. Run CLion, open the Kaldi directory, and run Build --> Build All in Debug. This process will take quite some time, so please be patient.
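
If you prefer the terminal, something along these lines should produce an equivalent Debug build (a sketch assuming a standard CMake setup; the directory name cmake-build-debug matches CLion's default and is what the path.sh edit below expects):
$ mkdir -p cmake-build-debug && cd cmake-build-debug
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j4
$ cd ..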

Unfortunately, there are other things to take care of. The following commands will take some time to run, so be patient.
$ cd tools && make -j4
$ extras/install_irstlm.sh && cd ..

Once that is done, let's run a pre-trained model to make sure everything works.
$ cd egs/apiai_decode/s5
$ ./download-model.sh

We also need to make the CMake-built binaries available. Edit path.sh as below:
export KALDI_ROOT=`pwd`/../../..
export KALDI_CMAKE_ROOT=`pwd`/../../../cmake-build-debug
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/src/path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

Lastly, edit tools/config/common_path.sh by replacing KALDI_ROOT with KALDI_CMAKE_ROOT as follows:
# we assume KALDI_CMAKE_ROOT is already defined
[ -z "$KALDI_CMAKE_ROOT" ] && echo >&2 "The variable KALDI_CMAKE_ROOT must be already defined" && exit 1
# The formatting of the path export command is intentionally weird, because
# this allows for easy diff'ing

export PATH=\
${KALDI_CMAKE_ROOT}/src/bin:\
${KALDI_CMAKE_ROOT}/src/chainbin:\
${KALDI_CMAKE_ROOT}/src/featbin:\
${KALDI_CMAKE_ROOT}/src/fgmmbin:\
${KALDI_CMAKE_ROOT}/src/fstbin:\
${KALDI_CMAKE_ROOT}/src/gmmbin:\
${KALDI_CMAKE_ROOT}/src/ivectorbin:\
${KALDI_CMAKE_ROOT}/src/kwsbin:\
${KALDI_CMAKE_ROOT}/src/latbin:\
${KALDI_CMAKE_ROOT}/src/lmbin:\
${KALDI_CMAKE_ROOT}/src/nnet2bin:\
${KALDI_CMAKE_ROOT}/src/nnet3bin:\
${KALDI_CMAKE_ROOT}/src/nnetbin:\
${KALDI_CMAKE_ROOT}/src/online2bin:\
${KALDI_CMAKE_ROOT}/src/onlinebin:\
${KALDI_CMAKE_ROOT}/src/rnnlmbin:\
${KALDI_CMAKE_ROOT}/src/sgmm2bin:\
${KALDI_CMAKE_ROOT}/src/sgmmbin:\
${KALDI_CMAKE_ROOT}/src/tfrnnlmbin:\
${KALDI_CMAKE_ROOT}/src/cudadecoderbin:\
$PATH

The tedious Kaldi setup is finally all done. Now you need an audio file for testing, so simply record a wav file of your voice, saying whatever you want transcribed (in English). Make sure to use a 16 kHz sampling rate with 16-bit encoding, and save the file as test.wav.
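
If your recording is in a different format, one way to convert it (assuming you have sox installed; recording.wav is just a placeholder for your original file) is:
$ sox recording.wav -r 16000 -b 16 -c 1 test.wav

With test.wav in place, let's run it!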

$ ./recognize-wav.sh /PATH/TO/YOUR/WAV/test.wav

You should see the transcript in the log. Now let's debug the decoding step, for example, with CLion. As you can see from the log, the main decoding command is as follows:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model//HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:-  >exp/lat.1'

This big command consists of multiple executables piped together in a convoluted way, so let's go through it one piece at a time. The main binary, nnet3-latgen-faster, takes four arguments, as you can see from running
$ nnet3-latgen-faster

By the way, it is likely that you will get a command-not-found error, so let's do this first:
$ export KALDI_CMAKE_ROOT=$(pwd)/../../../cmake-build-debug
$ source ../../../tools/config/common_path.sh

Now, try again
$ nnet3-latgen-faster

The first two arguments are files, i.e., exp/api.ai-model/final.mdl and exp/api.ai-model/HCLG.fst.

The third argument is the features, which are piped in from the output of running the command
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:-

We will instead create this features file separately by running
$ apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:features.feat

You should see a features.feat file created. We can now run the decoding with this file as input:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model/HCLG.fst ark:features.feat ark:lat.1

Here, I simply replaced the third argument with the feature file and the fourth argument with the lat.1 output file, without piping through lattice-scale.
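
If you still want the scaled lattice that the original pipeline produced, the lattice-scale step from the big command can be applied to the output file afterwards (writing to exp/lat.1 here just mirrors the original command's output path):
$ lattice-scale --acoustic-scale=10.0 ark:lat.1 ark:exp/lat.1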

Finally, it is time to run this in CLion's debug mode. From CLion's Edit Configurations, select nnet3-latgen-faster. Enter the program arguments copied from above, and make sure to set the working directory to the current directory, i.e., egs/apiai_decode/s5. You can set a breakpoint in the main function, say at line 38, and start debugging with CLion. It should all work well!