This post summarizes how probabilities are calculated from an ARPA model.
Consider test.arpa and the following sequences of words:
look beyond
more looking on
Let's consider look beyond first. The log10 probability of seeing beyond conditioned on look, i.e., log10(P(beyond | look)), is -0.2922095, read directly from line 78 of the test.arpa file.
What, then, is the probability of seeing look beyond? By the chain rule of conditional probabilities,
log10(P(look beyond))
= log10(P(look) * P(beyond | look))
= log10(P(look)) + log10(P(beyond | look))
= -1.687872 + (-0.2922095) = -1.9800815,
which can be verified with Python (kenlm prints -1.980081558227539; the extra digits are just float32 rounding):
import kenlm
model = kenlm.LanguageModel('test.arpa')
print(model.score('look beyond', eos=False, bos=False))
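If you want the per-word breakdown rather than just the total, kenlm also provides full_scores, which yields one (log10 probability, matched n-gram length, is-OOV) tuple per word. A minimal sketch, assuming the same test.arpa:

import kenlm
model = kenlm.LanguageModel('test.arpa')
# Each tuple is (log10 prob, length of the n-gram used, out-of-vocabulary flag);
# here the two words should give -1.687872 and -0.2922095 respectively.
for logprob, ngram_len, oov in model.full_scores('look beyond', eos=False, bos=False):
    print(logprob, ngram_len, oov)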
Let's try the next sequence, more looking on, starting again with the chain rule:
log10(P(more looking on))
= log10(P(more)) + log10(P(looking | more)) + log10(P(on | more looking))
The first term on the RHS is easy: log10(P(more)) = -1.206319, read from line 34 of test.arpa.
The second term is a bit tricky, because the bigram more looking is not in the model. Hence, we use the ARPA back-off formula:
P(looking | more) = P(looking) * BW(more)
where log10(P(looking)) = -1.285941 from line 33, and log10(BW(more)) = -0.544068 is the back-off weight of more, also read from line 34. In log space, log10(P(looking | more)) = -1.285941 + (-0.544068) = -1.830009.
Lastly, the third term is not present in the model either (there is no trigram more looking on), so we reduce it to
P(on | more looking) = P(on | looking) * BW(more looking)
where log10(P(on | looking)) = -0.4638903 from line 80, and the back-off weight BW(more looking) is taken to be 1 (log10 BW = 0), because the bigram more looking does not exist in the model, so no back-off weight is stored for it.
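To make the back-off rule concrete, here is a minimal sketch of the lookup logic in plain Python. The dictionaries and the helper names (logp_bigram, logp_trigram) are mine for illustration, with the log10 values above hard-coded; they are not part of kenlm:

# Toy tables holding the log10 values quoted from test.arpa; in a real
# ARPA file these come from the \1-grams: and \2-grams: sections.
unigram_logp = {'more': -1.206319, 'looking': -1.285941}
unigram_bw = {'more': -0.544068}              # log10 back-off weights
bigram_logp = {('looking', 'on'): -0.4638903}
bigram_bw = {}                                # ('more', 'looking') is absent

def logp_bigram(w1, w2):
    # log10 P(w2 | w1): use the bigram if present, else back off.
    if (w1, w2) in bigram_logp:
        return bigram_logp[(w1, w2)]
    # A missing back-off weight means BW = 1, i.e. log10 BW = 0.
    return unigram_bw.get(w1, 0.0) + unigram_logp[w2]

def logp_trigram(w1, w2, w3):
    # log10 P(w3 | w1 w2); a full implementation would first check a
    # trigram table, omitted here since test.arpa has no match anyway.
    return bigram_bw.get((w1, w2), 0.0) + logp_bigram(w2, w3)

print(logp_bigram('more', 'looking'))         # -1.830009
print(logp_trigram('more', 'looking', 'on'))  # -0.4638903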
Thus, we get log10(P(more looking on)) = -(1.206319 + 1.285941 + 0.544068 + 0.4638903) = -3.5002183 ≈ -3.5.
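As before, this can be verified with kenlm; the printed value should match -3.5002183 up to float32 rounding:

import kenlm
model = kenlm.LanguageModel('test.arpa')
print(model.score('more looking on', eos=False, bos=False))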
For more details, refer to this document. I also found this answer very helpful.