|
CMU-CS-03-134
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-03-134
Uniquely Decodable n-gram Embeddings
Leonid Kontorovich
April 2003
CMU-CS-03-134.ps
CMU-CS-03-134.pdf
Keywords: n-gram, embedding, finite state automata,
strings
We define the family of n-gram embeddings from strings over a finite
alphabet into the semimodule N^K. We classify all x in N^K
that are valid images of strings under such embeddings, as well as
all x whose inverse image consists of exactly 1 string (we
call such x uniquely decodable). We prove
that for a fixed alphabet, the set of all strings whose image is
uniquely decodable is a regular language.
11 pages
|