Decoding Data with Prime Numbers

Program to Encode Words

My previous post describes how any alphabet of symbols can be encoded as unique numbers. Now we will write the necessary functions to decode the numbers.

Our data type and functions below are used to encode the Hebrew alphabet (see previous post).

data Symbol
    = Alef | Bet | Gimel | Dalet | He | Vav
    | Zayin | Het | Tet | Yod | Kaf | Lamed
    | Mem | Nun | Samekh | Ayin | Pe | Tsadi
    | Qof | Resh | Shin | Tav
    deriving (Enum, Show)

exponentsFromSymbols :: [Symbol] -> [Integer]
exponentsFromSymbols = map (toInteger . (+ 1) . fromEnum)

primes :: [Integer]
primes = sieve [2..] where
    sieve [] = []
    sieve (first:rest) = first : sieve (filter (\x -> x `mod` first /= 0) rest)

encode :: [Symbol] -> Integer
encode symbols =
    product $
    map (\(prime, exponent) -> prime ^ exponent) $
    zip primes (exponentsFromSymbols symbols)

Decoding Numbers into Words

To decode a number into its Symbols, we first need a function that computes the exponent of each unique prime factor in the number. These are represented as a list of exponents, corresponding to the prime numbers generated by primes.

primeExponents :: Integer -> [Integer]
primeExponents n = factorize n primes where

    factorize 1 _ = []
    factorize _ [] = []
    factorize unfactored (prime:remaining)
        | unfactored < prime * prime = [1]
        | unfactored `mod` prime == 0 =
            let (count, rest) = countFactors unfactored prime 0
            in count : factorize rest remaining
        | otherwise = 0 : factorize unfactored remaining

    countFactors remaining factor count
        | remaining `mod` factor /= 0 = (count, remaining)
        | otherwise = countFactors (remaining `div` factor) factor (count + 1)

Each unique prime factor represents a position in the sequence (first = 2, second = 3, third = 5), and the exponent of that prime factor represents the Symbol in that position.

We also need a function to convert a Symbol's corresponding exponent back to the symbol.

symbolFromExponent :: Integer -> Symbol
symbolFromExponent number = toEnum (fromInteger number - 1)

We can now finally write the function that decodes a number back into the Symbols that it was produced from.

decode :: Integer -> [Symbol]
decode number = map symbolFromExponent (takeWhile (> 0) (primeExponents number))

Our functions are now sufficient for both the encoding and decoding of Symbols.

Sequenced Hebrew letters being encoded as numbers

Numbers being decoded into sequenced Hebrew letters