CMU-CS-03-134
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-03-134

Uniquely Decodable n-gram Embeddings

Leonid Kontorovich

April 2003

CMU-CS-03-134.ps
CMU-CS-03-134.pdf


Keywords: n-gram, embedding, finite state automata, strings


We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule N^K. We classify all x in N^K that are valid images of strings under such embeddings, as well as all x whose inverse image consists of exactly 1 string (we call such x uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language.

11 pages


Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu