For anyone watching and wondering: the dictionary is used to try to split a string of characters into words when there are no delimiters between them.
forexampleifenglishlanguagetextwaswrittenwithoutspacesbetweenthewordsandeveryletterwasavalidwordallbyitself
which should be tokenized as
for example if english language text was written without spaces between the words and every letter was a valid word all by itself
and not
ex ample...with out...word sand...it self...
(I hope these code boxes prevent the automatic translation...)
word sand
vs
words and
nicely illustrates the problem.
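
For anyone who wants to see the ambiguity concretely, here is a minimal Python sketch. It is not the code being discussed here, just my own illustration: the four-word dictionary and the function name segmentations are made up, and it simply lists every way to split a string into dictionary words.

```python
from functools import lru_cache

# Hypothetical toy dictionary, chosen only to reproduce the ambiguity above.
WORDS = {"word", "words", "sand", "and"}

def segmentations(text, words=WORDS):
    """Return every way to split `text` into words from `words`."""
    @lru_cache(maxsize=None)
    def split_from(start):
        # All word lists that exactly cover text[start:].
        if start == len(text):
            return [[]]  # one way to cover the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in words:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]

print(segmentations("wordsand"))  # ['word sand', 'words and']
```

Both splits are valid as far as the dictionary is concerned, so a dictionary lookup alone cannot choose between them. Something extra, such as word frequencies or the surrounding context, is needed to prefer "words and".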
Craige