Monday, February 17, 2020

Kaldi series 1 - setup/debug with CLion

In this series of posts, I will describe how to run an automatic speech recognition (ASR) system with Kaldi. For the best debugging experience, I will give step-by-step run/debug instructions for CLion, in my opinion the best C++ IDE on non-Windows systems. FYI, I am running these commands on macOS 10.15 (Catalina), but they should be similar on Linux systems.

In this first post of the series, we will simply set up the Kaldi project in CLion for running and debugging.
$ git clone https://github.com/kaldi-asr/kaldi.git && cd kaldi

Kaldi recently added CMake support (thank you so much!), which makes it much easier for CLion to load the project. Launch CLion and open the Kaldi directory, then run Build --> Build All in Debug. This process will take quite some time, so please be patient.
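
If you ever want to reproduce the Debug build outside the IDE, the following is a rough sketch of the equivalent commands. The function name build_kaldi_debug is my own, and I am assuming CMake 3.13 or newer (for the -S/-B options) and the same cmake-build-debug output directory that CLion uses by default; the exact flags CLion passes may differ.

```shell
# Hypothetical helper: configure and build Kaldi in Debug mode from a terminal.
# $1 = path to the Kaldi checkout (default: current directory)
# $2 = number of parallel build jobs (default: 4)
build_kaldi_debug() {
  local root=${1:-.} jobs=${2:-4}
  # Configure into <root>/cmake-build-debug, then build only if that succeeded.
  cmake -S "$root" -B "$root/cmake-build-debug" -DCMAKE_BUILD_TYPE=Debug &&
    cmake --build "$root/cmake-build-debug" --parallel "$jobs"
}

# Example (run from the kaldi directory):
#   build_kaldi_debug . 4
```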

Unfortunately, there are a few more things to take care of: Kaldi's third-party tools still need to be built. The following commands will also take a while to run.
$ cd tools && make -j4
$ extras/install_irstlm.sh && cd ..
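
The -j4 above is just a fixed number of parallel jobs. As a small sketch, a helper like the one below (cpu_count is a name I made up) picks the job count from the machine instead, using nproc on Linux and sysctl on macOS, and falling back to 4 if neither works.

```shell
# Hypothetical helper: print the number of CPU cores, or 4 if it cannot be determined.
cpu_count() {
  local n
  # nproc exists on most Linux systems; sysctl -n hw.ncpu works on macOS.
  n=$(nproc 2>/dev/null) || n=$(sysctl -n hw.ncpu 2>/dev/null) || n=4
  echo "$n"
}

# Example:
#   cd tools && make -j"$(cpu_count)"
```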

Once you are done, let's run a pre-trained model to see if it works fine.
$ cd egs/apiai_decode/s5
$ ./download-model.sh

We also need to make the recipe scripts use the CMake-built binaries. Edit path.sh as below:
export KALDI_ROOT=`pwd`/../../..
export KALDI_CMAKE_ROOT=`pwd`/../../../cmake-build-debug
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

Lastly, edit tools/config/common_path.sh, replacing KALDI_ROOT with KALDI_CMAKE_ROOT as follows:
# we assume KALDI_CMAKE_ROOT is already defined
[ -z "$KALDI_CMAKE_ROOT" ] && echo >&2 "The variable KALDI_CMAKE_ROOT must be already defined" && exit 1
# The formatting of the path export command is intentionally weird, because
# this allows for easy diff'ing

export PATH=\
${KALDI_CMAKE_ROOT}/src/bin:\
${KALDI_CMAKE_ROOT}/src/chainbin:\
${KALDI_CMAKE_ROOT}/src/featbin:\
${KALDI_CMAKE_ROOT}/src/fgmmbin:\
${KALDI_CMAKE_ROOT}/src/fstbin:\
${KALDI_CMAKE_ROOT}/src/gmmbin:\
${KALDI_CMAKE_ROOT}/src/ivectorbin:\
${KALDI_CMAKE_ROOT}/src/kwsbin:\
${KALDI_CMAKE_ROOT}/src/latbin:\
${KALDI_CMAKE_ROOT}/src/lmbin:\
${KALDI_CMAKE_ROOT}/src/nnet2bin:\
${KALDI_CMAKE_ROOT}/src/nnet3bin:\
${KALDI_CMAKE_ROOT}/src/nnetbin:\
${KALDI_CMAKE_ROOT}/src/online2bin:\
${KALDI_CMAKE_ROOT}/src/onlinebin:\
${KALDI_CMAKE_ROOT}/src/rnnlmbin:\
${KALDI_CMAKE_ROOT}/src/sgmm2bin:\
${KALDI_CMAKE_ROOT}/src/sgmmbin:\
${KALDI_CMAKE_ROOT}/src/tfrnnlmbin:\
${KALDI_CMAKE_ROOT}/src/cudadecoderbin:\
$PATH
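
If you would rather not edit the list by hand, here is a minimal sketch of an alternative. It assumes, as the edited file above does, that the CMake build places each group of binaries in a directory named *bin under $KALDI_CMAKE_ROOT/src. The function name kaldi_cmake_bin_path is hypothetical and not part of Kaldi.

```shell
# Hypothetical helper: print a colon-separated list of every existing
# <root>/src/*bin directory, suitable for prepending to PATH.
kaldi_cmake_bin_path() {
  local root=$1 dir result=
  for dir in "$root"/src/*bin; do
    # Skip the unexpanded glob pattern and anything that is not a directory.
    [ -d "$dir" ] || continue
    result="${result:+$result:}$dir"
  done
  printf '%s\n' "$result"
}

# Example:
#   export PATH="$(kaldi_cmake_bin_path "$KALDI_CMAKE_ROOT"):$PATH"
```

Unlike the explicit list, this only picks up directories that actually exist in your build, so it will not add entries for targets you did not compile (for example the CUDA decoder).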

The tedious Kaldi setup is finally done. Now you need an audio file for testing, so record a WAV file of yourself saying whatever you want transcribed (in English). Make sure to use a 16 kHz sampling rate with 16-bit encoding. Save this file as test.wav. Let's run it!
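
To double-check the format of your recording, you can read the sampling rate and bit depth straight out of the WAV header. This is only a sketch: wav_format is a name I made up, and it assumes a plain PCM file with the canonical 44-byte header (sampling rate at byte offset 24, bits per sample at offset 34) on a little-endian machine, which covers typical Macs and PCs.

```shell
# Hypothetical helper: print "<sample rate> <bits per sample>" for a WAV file.
# Assumes the canonical RIFF/WAVE header layout and a little-endian host.
wav_format() {
  local file=$1 rate bits
  # 4-byte unsigned integer at offset 24: sampling rate in Hz.
  rate=$(od -An -tu4 -j24 -N4 "$file" | tr -d '[:space:]')
  # 2-byte unsigned integer at offset 34: bits per sample.
  bits=$(od -An -tu2 -j34 -N2 "$file" | tr -d '[:space:]')
  echo "$rate $bits"
}

# Example:
#   wav_format test.wav
```

For a file recorded as described above, the two numbers printed should be 16000 and 16.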

$ ./recognize-wav.sh /PATH/TO/YOUR/WAV/test.wav

You should see the transcript in the log. Now let's debug one of the steps, decoding for example, with CLion. As you can see from the log, the main decoding command is as follows:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model//HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:-  >exp/lat.1'

This long command combines several programs through pipes embedded in its arguments, so let's take them one at a time. The main binary, nnet3-latgen-faster, takes four arguments, as you can see from its usage message:
$ nnet3-latgen-faster

By the way, you will likely get a "command not found" error, so let's do this first:
$ export KALDI_CMAKE_ROOT=$(pwd)/../../../cmake-build-debug
$ source ../../../tools/config/common_path.sh
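
To confirm that the shell now picks up the Debug binaries from the CMake build rather than some other copy on your PATH, a quick check like the following may help. This is a sketch; kaldi_bin_from_cmake is a hypothetical name, and it relies on KALDI_CMAKE_ROOT being set as above.

```shell
# Hypothetical helper: print the resolved path of a binary if it lives under
# $KALDI_CMAKE_ROOT. Returns 1 if it is not on PATH, 2 if it resolves elsewhere.
kaldi_bin_from_cmake() {
  local name=$1 resolved
  resolved=$(command -v "$name") || {
    echo "$name: not found on PATH" >&2
    return 1
  }
  case "$resolved" in
    "$KALDI_CMAKE_ROOT"/*) echo "$resolved" ;;
    *)
      echo "$name resolves to $resolved, outside KALDI_CMAKE_ROOT" >&2
      return 2
      ;;
  esac
}

# Example:
#   kaldi_bin_from_cmake nnet3-latgen-faster
```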

Now, try again:
$ nnet3-latgen-faster

The first two arguments are files: the model, exp/api.ai-model/final.mdl, and the decoding graph, exp/api.ai-model/HCLG.fst.

The third argument is the features. In the original command it ends with a |, which tells Kaldi to run the following command and read the features from its standard output:
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:-

Instead, we will write the features to a separate file by running
$ apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:features.feat

You should now see that the file features.feat has been created. We can run the decoding with this file as the input:
$ nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model/HCLG.fst ark:features.feat ark:lat.1

Here, I simply replaced the third argument with the feature file and the fourth argument with the output file lat.1, without piping the result to lattice-scale.

Finally, it is time to run this in CLion's debug mode. In CLion's Edit Configurations dialog, select nnet3-latgen-faster. Enter the program arguments, copied from the command above (everything after the binary name), and make sure to set the working directory to the current directory, i.e., egs/apiai_decode/s5. You can set a breakpoint in the main function, say at line 38, and start debugging with CLion. It should all work well!