- Not encrypting numbers. Numbers in the inputs were ignored. Caesar was a Roman general and statesman who did not use the Arabic numerals we use today; his numbers were written as Roman numerals, which would be encryptable because they are letters such as I, V, X, and L. Arabic digits, by contrast, are not part of the 26-letter alphabet that a simple shift cipher such as the Caesar cipher operates on, so they are not encrypted.
Encryption
The encryption program (EncryptFile; see Appendix A for the code) accepts any plain-text input. It first converts the text to uppercase and then removes all whitespace. Next, it generates a single random shift between 1 and 25, inclusive, applies this shift uniformly to the entire text using the Caesar cipher, and saves the result to a file.
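The encryption steps above can be sketched in Java as follows. This is a minimal illustration, not the Appendix A code: the class and method names are hypothetical, and passing digits and punctuation through unchanged is our assumption, consistent with numbers not being encrypted.

```java
import java.util.Locale;
import java.util.Random;

/** Hypothetical sketch of the EncryptFile steps; not the Appendix A code. */
public class EncryptSketch {

    /** Uppercase the text and strip all whitespace. */
    static String normalize(String plain) {
        // Locale.ROOT avoids locale-specific casing rules (e.g. Turkish dotted I).
        return plain.toUpperCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /**
     * Shift each letter A-Z forward by `shift` places, wrapping Z back to A.
     * Assumes 0 <= shift. Non-letters (digits, punctuation) are left
     * unchanged, which is our assumption.
     */
    static String caesarShift(String text, int shift) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + shift) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Pick one random shift in 1..25 inclusive and apply it to the whole text. */
    static String encrypt(String plain, Random rng) {
        int shift = 1 + rng.nextInt(25); // nextInt(25) returns 0..24
        return caesarShift(normalize(plain), shift);
    }

    public static void main(String[] args) {
        // Shift of 3 applied to the normalized phrase "Sherwood Forest".
        System.out.println(caesarShift(normalize("Sherwood Forest"), 3));
    }
}
```

Running `main` prints `VKHUZRRGIRUHVW`. Normalizing before shifting means the shift only ever sees the 26 uppercase letters plus any characters it deliberately leaves alone.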
We removed the spaces from the text before encrypting it because Caesar ciphertext is conventionally written without spaces; keeping them would make it too easy for a human to guess short words (such as I and A). We uppercased the text because keeping both uppercase and lowercase letters would double the number of possibilities and make the program unnecessarily complicated and tedious.
We tested inputs that varied in length (character count) and in type of text, using technical articles from Wikipedia and literary passages from the online Project Gutenberg. Across several test runs, the inputs ranged from 550 to 1057 characters.
As an example, we used the following passage from "Robin Hood" from Project Gutenberg [7]:
IN MERRY ENGLAND in the time of old, when good King Henry the Second ruled the land, there lived within the green glades of Sherwood Forest, near Nottingham Town, a famous outlaw whose name was Robin Hood. No archer ever lived that could speed a gray goose shaft with such skill and cunning as his, nor were there ever such yeomen as the sevenscore merry men that roamed with him through the greenwood shades. Right merrily they dwelled within the depths of Sherwood Forest, suffering neither care nor want, but passing the time in merry games of archery or bouts of cudgel play, living upon the King's venison, washed down with draughts of ale of October brewing.
Illustration 2: Encryption (EncryptFile) Flow Chart
Decryption
First, the decryption program reads the encrypted text. Next, it tries each of the 25 possible decryption shifts, one at a time. We call the trial of one specific shift a "shift test".
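A minimal sketch of generating the 25 shift tests, assuming the ciphertext contains uppercase letters plus any characters that were never shifted. The names `ShiftTests`, `decrypt`, and `allShiftTests` are hypothetical and do not come from DecryptFile.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: produce one candidate plaintext per decryption shift. */
public class ShiftTests {

    /** Undo a Caesar shift of `shift` (1..25) on A-Z; other characters pass through. */
    static String decrypt(String cipher, int shift) {
        StringBuilder out = new StringBuilder(cipher.length());
        for (char c : cipher.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                // Adding (26 - shift) equals subtracting shift modulo 26
                // and keeps the intermediate value non-negative.
                out.append((char) ('A' + (c - 'A' + 26 - shift) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** All 25 shift tests: element i holds the text decrypted with shift i + 1. */
    static List<String> allShiftTests(String cipher) {
        List<String> candidates = new ArrayList<>();
        for (int shift = 1; shift <= 25; shift++) {
            candidates.add(decrypt(cipher, shift));
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> tests = allShiftTests("VKHUZRRGIRUHVW");
        System.out.println(tests.get(2)); // the shift test for shift 3
    }
}
```

Here the third candidate (shift 3) is `SHERWOODFOREST`; the other 24 candidates are the gibberish that the word search has to rank below it.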
Each shift test is passed as a string to a method called "analzyeWithJazzy", which uses the Jazzy spelling checker to scan the string for English words, one after another. When an English word is found, we look further for a longer word that begins with it ("checkForCompoundWord"), searching up to 30 characters beyond this starting word. For example, if the word "for" is found and is followed by "est", the result is "forest" rather than just the first part ("for").
Once the longest word is found, we append it to the output, separated by a space from the previous word, so the output is built up word by word.
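The word search described above amounts to a greedy longest-match segmentation. The sketch below is a hypothetical stand-in: a plain `Set<String>` replaces the Jazzy dictionary, so it does not show how analzyeWithJazzy actually queries Jazzy, and bounding the initial search to 30 characters is our assumption. Positions where no word starts are skipped, so their characters are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a greedy longest-match word search.
 * A plain Set<String> stands in for the Jazzy spelling-checker dictionary.
 */
public class WordSegmenter {
    // How far past a found word we keep looking for a longer word.
    // Using the same bound for the initial search is our assumption.
    static final int LOOKAHEAD = 30;

    static List<String> segment(String text, Set<String> dictionary) {
        List<String> words = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // 1. Find the shortest dictionary word starting at position i.
            int firstEnd = -1;
            for (int end = i + 1; end <= Math.min(n, i + LOOKAHEAD); end++) {
                if (dictionary.contains(text.substring(i, end))) {
                    firstEnd = end;
                    break;
                }
            }
            if (firstEnd < 0) {
                i++; // no word starts here, so this character is dropped
                continue;
            }
            // 2. Look up to LOOKAHEAD characters further for a longer word
            //    with the same start (e.g. FOR -> FOREST).
            int bestEnd = firstEnd;
            for (int end = firstEnd + 1; end <= Math.min(n, firstEnd + LOOKAHEAD); end++) {
                if (dictionary.contains(text.substring(i, end))) {
                    bestEnd = end;
                }
            }
            words.add(text.substring(i, bestEnd));
            i = bestEnd;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("THE", "THERE", "LIVE", "LIVED", "WITH", "WITHIN",
                "GREEN", "GLAD", "GLADES", "OF", "SHE", "WOOD", "FOR", "FOREST");
        List<String> words = segment("THERELIVEDWITHINTHEGREENGLADESOFSHERWOODFOREST", dict);
        // Join with single spaces to build the output word by word.
        System.out.println(String.join(" ", words));
    }
}
```

With this toy dictionary, `main` prints `THERE LIVED WITHIN THE GREEN GLADES OF SHE WOOD FOREST`: SHERWOOD is not a dictionary word, so the scan takes SHE, skips the R, and resumes at WOOD.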
Because of the way the spelling checker works, character sequences that did not match any word in its dictionary were not returned. So if the original input was:
THERELIVEDWITHINTHEGREENGLADESOFSHERWOODFOREST
we would get back:
THERE LIVED WITHIN THE GREEN GLADES OF SHE WOOD FOREST
which drops the R from SHERWOOD (leaving SHE WOOD FOREST).
So we made a second pass to recover the missing character sequences that were not in the dictionary and add them back in. After that, our output looked like:
THERE LIVED WITHIN THE GREEN GLADES OF SHE R WOOD FOREST
The output can never do better than SHE R WOOD here, because the spelling checker does not know SHERWOOD as a word: instead it finds the two shorter words SHE and WOOD and rejects the non-word R.
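One way to recover the rejected characters is to align the words found against the original string and emit every unmatched run as its own token. This is a sketch of that idea under our own assumptions; the class `RejectedRestorer` and its method are hypothetical, and the exact mechanics of the second pass are not given here. Assuming the words were found left to right and that no dictionary word starts inside a skipped run, searching for each word from the current position with `String.indexOf` locates the correct occurrence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: put rejected (non-dictionary) characters back between found words. */
public class RejectedRestorer {

    static List<String> restoreRejected(String original, List<String> words) {
        List<String> tokens = new ArrayList<>();
        int pos = 0; // first character of `original` not yet accounted for
        for (String w : words) {
            int at = original.indexOf(w, pos);
            if (at < 0) {
                throw new IllegalArgumentException("word not found in order: " + w);
            }
            if (at > pos) {
                tokens.add(original.substring(pos, at)); // a rejected run, e.g. "R"
            }
            tokens.add(w);
            pos = at + w.length();
        }
        if (pos < original.length()) {
            tokens.add(original.substring(pos)); // trailing rejected characters
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> found = List.of("THERE", "LIVED", "WITHIN", "THE", "GREEN",
                "GLADES", "OF", "SHE", "WOOD", "FOREST");
        String original = "THERELIVEDWITHINTHEGREENGLADESOFSHERWOODFOREST";
        System.out.println(String.join(" ", restoreRejected(original, found)));
    }
}
```

This prints `THERE LIVED WITHIN THE GREEN GLADES OF SHE R WOOD FOREST`, matching the second-pass output shown above.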
As the shifts are analyzed, the program keeps track of the shift with the most English words found. After all 25 shifts have been tested, it generates a report that identifies the best shift and records the original (encrypted) input text, the decrypted text with spaces, and the final output with the rejected characters added back in.
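Selecting the best shift can be sketched as picking the maximum of the 25 word counts. In this hypothetical sketch a small `Set<String>` again stands in for Jazzy, the word count uses a simplified longest-match scan (the longest dictionary word starting at the current position within the next 30 characters), and ties go to the smallest shift; none of these details come from DecryptFile. The report step is reduced to printing the winning shift and its decryption.

```java
import java.util.Set;

/** Hypothetical sketch: choose the shift whose decryption contains the most dictionary words. */
public class BestShift {
    static final int LOOKAHEAD = 30; // longest word we try to match at one position

    /** Undo a Caesar shift of `shift` (1..25) on A-Z; other characters pass through. */
    static String decrypt(String cipher, int shift) {
        StringBuilder out = new StringBuilder(cipher.length());
        for (char c : cipher.toCharArray()) {
            out.append(c >= 'A' && c <= 'Z' ? (char) ('A' + (c - 'A' + 26 - shift) % 26) : c);
        }
        return out.toString();
    }

    /** Count words found by a simplified greedy longest-match scan. */
    static int countWords(String text, Set<String> dictionary) {
        int count = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            int bestEnd = -1;
            for (int end = i + 1; end <= Math.min(n, i + LOOKAHEAD); end++) {
                if (dictionary.contains(text.substring(i, end))) {
                    bestEnd = end; // keep the longest match seen so far
                }
            }
            if (bestEnd < 0) {
                i++; // rejected character
            } else {
                count++;
                i = bestEnd;
            }
        }
        return count;
    }

    /** Try all 25 shifts and return the one with the highest word count (smallest on ties). */
    static int bestShift(String cipher, Set<String> dictionary) {
        int best = 1;
        int bestScore = -1;
        for (int shift = 1; shift <= 25; shift++) {
            int score = countWords(decrypt(cipher, shift), dictionary);
            if (score > bestScore) {
                bestScore = score;
                best = shift;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("SHE", "WOOD", "FOR", "FOREST");
        String cipher = "VKHUZRRGIRUHVW"; // SHERWOODFOREST shifted by 3
        int shift = bestShift(cipher, dict);
        System.out.println("Best shift: " + shift);
        System.out.println("Decrypted:  " + decrypt(cipher, shift));
    }
}
```

For this input the sketch reports shift 3 with three words (SHE, WOOD, FOREST); every other shift finds none of these dictionary words.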
Illustration 3: Decryption Flow Chart (DecryptFile)