cracking codes with python dictionary

Python 3 program for dictionary attacks on password hashes Written by Dan Boxall, aka "apex123". However, lines 29 and 30 specifically check for this case and return 0.0 if the list is empty. def kasiskiExamination(ciphertext):112. The two keys produce two different ciphertexts, as expected. # To use, type this code: 5. 4. The list in origCase is then joined on line 207 to become the new value of decryptedText. Then when we enter eggs, it returns the only value stored in this list, which is 'hovercraft'. Even when a set converted from a list is reconverted to a list, it will still not have any repeated values. This tuple is appended to the end of freqScores on line 172: 171. keyAndFreqMatchTuple = (possibleKey, freqAnalysis.englishFreqMatchScore(decryptedText))172. freqScores.append(keyAndFreqMatchTuple). You can also pass list values to itertools.product() and some values similar to lists, such as range objects returned from range(). # Set the hacked ciphertext to the original casing:201. origCase = []202. for i in range(len(ciphertext)):203. if ciphertext[i].isupper():204. origCase.append(decryptedText[i].upper())205. else:206. origCase.append(decryptedText[i].lower())207. decryptedText = ''.join(origCase). Right now, the number of the words in possibleWords that are recognized as English and the total number of words in possibleWords are represented by integers. spam = {'name': 'Zophie', 'species':'cat', 'age':8}, def spam(eggs=42): print(eggs)spam()spam('Hello'). You can see that the float() function changes the integer 42 into a float value. (See “Getting Factors of Spacings” on page 283 for the reason why.) 66. However, the in operator executes on a very large dictionary value much faster than on a very large list. The second, more sophisticated method, which was used by the 19th-century mathematician Charles Babbage, works even when the key is a random group of letters, such as VUWFE or PNFJ. 43. As you can see, entering print(k, spam[k]) returns each key in the dictionary along with its corresponding value. Next, we need to decrypt the letters of the nth subkey with all 26 possible subkeys to see which ones produce English-like letter frequencies. 68. for i in range(2, MAX_KEY_LENGTH + 1): # Don't test 1: it's not useful. Make sure dictionary.txt is in the same directory as detectEnglish.py or this code won’t work. If the resulting value is 1, the program doesn’t include it in the factors list, so line 72 checks for this case: 72. if otherFactor < MAX_KEY_LENGTH + 1 and otherFactor != 1: 73. factors.append(otherFactor). Open a new file editor window by selecting File▸New File. Because these factors are stored as the first item of the two-integer tuples list in factorsByCount, we need to pull these factors from the tuples and put them in a separate list. In lists, we use an integer index to retrieve items in the list, such as spam[42]. # from https://www.nostarch.com/crackingcodes/: 17. ciphertext = """Adiz Avtzqeci Tmzubb wsa m Pmilqev halpqavtakuoi, lgouqdaf, kdmktsvmztsl, izr xoexghzr kkusitaaf. For example, allFreqScores[0] has a list of tuples for the first subkey along with the frequency match scores of each potential subkey, allFreqScores[1] has a list of tuples for the second subkey and frequency match scores, and so on: >>> allFreqScores[0][('A', 9), ('E', 5), ('O', 4), ('P', 4)]>>> allFreqScores[1][('S', 10), ('D', 4), ('G', 4), ('H', 4)]. These indexes start at seqStart + seqLen, or after the sequence currently in seq, and go up to len(message) - seqLen, which is the last index where a sequence of length seqLen can be found. Press F5 to run the program. Because the key is cycled through to encrypt the plaintext, a key length of 4 would mean that starting from the first letter, every fourth letter in the ciphertext is encrypted using the first subkey, every fourth letter starting from the second letter of the plaintext is encrypted using the second subkey, and so on. The dictionary file dictionary.txt (available on this book’s website at https://www.nostarch.com/crackingcodes/) has approximately 45,000 English words. Because the source code for the vigenereDictionaryHacker.py program is similar to previous hacking programs in this book, I won’t explain it line by line. 52. numLetters = len(removeNonLetters(message))53. messageLettersPercentage = float(numLetters) / len(message) * 10054. lettersMatch = messageLettersPercentage >= letterPercentage. The dictionary file sits on the user’s hard drive, but unless we load the text in this file as a string value, our Python code can’t use it. ... What is interesting is the way that some of the surrounding tools are explained, in particular the use of a dictionary to detect when a code has been broken. Then we name the dictionary variable englishWords and set it to an empty dictionary. getNthSubkeysLetters(1, 3, 'ABCABCABC') returns 'AAA'140. 88. for seq in seqFactors: 89. factorList = seqFactors[seq] 90. for factor in factorList: 91. if factor not in factorCounts: 92. factorCounts[factor] = 0 93. factorCounts[factor] += 1 94. Certain characters, such as numbers or punctuation marks, will cause our word detection to fail because words won’t look exactly as they’re spelled in our dictionary file. The for loop on line 131 iterates over each of the tuples in factorsByCount and appends the tuple’s index 0 item to the end of allLikelyKeyLengths. This chapter also introduced the split() method, which can split strings into a list of strings, and the NoneType data type, which has only one value: None. The percentage will always be between 0 percent (meaning no words are English) and 100 percent (meaning all of the words are English). This check is necessary to avoid a divide-by-zero error. Most likely, getEnglishCount() will return a float value between 0.0 and 1.0. # Detect English module 2. Two methods exist to hack the Vigenère cipher. If the string in message is made up of integers, such as '12345', the call to removeNonLetters() would return a blank string, which split() would be called on to return an empty list. This joins the strings in lettersOnly with a blank string between them. One method uses a brute-force dictionary attack to try every word in the dictionary file as the Vigenère key, which works only if the key is an English word, such as RAVEN or DESK. Notice that the string doesn’t have any spaces. Then the for loop on line 98 goes through each of the factors in factorCounts and appends this (factor, factorCounts[factor]) tuple to the factorsByCount list only if the factor is less than or equal to MAX_KEY_LENGTH. Similar to how the continue statement is used inside a loop to go back to the start of the loop, the break statement is used inside a loop to immediately exit the loop. The first integer in the tuple is the factor, and the second integer is how many times it appeared in seqFactors. Remember that we’re relying on the dictionary file to be accurate and complete for the detectEnglish module to work correctly. By "useful" we mean factors 58. Fortunately, we can use English dictionary files, which are text files that contain nearly every word in English. Whether you are looking to get more into Data Structures and Algorithms , increase your earning potential or just want a job with more freedom, this is the right course for you! GitHub Gist: instantly share code, notes, and snippets. Messages encrypted with the transposition file cipher can have thousands of possible keys, which your computer can still easily brute-force, but you’d then have to look through thousands of decryptions to find the one correct plaintext. Even though a computer has no problem decrypting a message with thousands of potential keys, we need to write code that can determine whether a decrypted string is valid English and therefore the original message. Table 20-5: Most Likely Subkeys for the Example Strings. When you run the vigenereDictionaryHacker.py program, the output should look like this: Possible encryption break:Key ASTROLOGY: The recl yecrets crk not the qnks I tell.Enter D for done, or just press Enter to continue breaking:>Possible encryption break:Key ASTRONOMY: The real secrets are not the ones I tell.Enter D for done, or just press Enter to continue breaking:> dCopying hacked message to clipboard:The real secrets are not the ones I tell. For the second step of getMostCommonFactors(), we need to sort the values in the factorCounts dictionary by their count. Before continuing with the next lines of code, you’ll need to learn about the extend() list method. Instead, other encryption programs will import detectEnglish.py so they can call the detectEnglish.isEnglish() function, which returns True when the string is determined to be English. Now let’s return to detectEnglish.py and set up the dictionary file. # Determine the most likely letters for each letter in the key:157. ciphertextUp = ciphertext.upper()158. 9. If the hacking program fails to hack the ciphertext, try increasing this value and running the program again. Hacking the Vigenère cipher requires you to follow several detailed steps. The next lines of code print the decryption output to the user to check whether the key has been found: 210. print('Possible encryption hack with key %s:' % (possibleKey))211. print(decryptedText[:200]) # Only show first 200 characters.212. To do this, we use print() but pass an argument to an optional parameter we haven’t used before. About Cracking Codes with Python. Meaning, the current code always uses the same MAC address. # When finding factors, you only need to check the integers up to 67. This alignment can happen at any multiple of the real key length (such as 3, 6, 9, 12, and so on), which is why the three-letter key can produce a repeated sequence with a spacing of 9. 56. def getUsefulFactors(num): 57. But no books teach beginners how … # MAX_KEY_LENGTH: 68. for i in range(2, MAX_KEY_LENGTH + 1): # Don't test 1: it's not useful. 61. if num < 2: 62. return [] # Numbers less than 2 have no useful factors. # getNthSubkeysLetters(1, 5, 'ABCDEFGHI') returns 'AF'143.144. For the first step of getMostCommonFactors(), we’ll set up the factorCounts dictionary on line 83, which we’ll use to store the counts of each factor. # [(, = wordPercentage. ('D', 4), ('G', 4), ('H', 4)], [('I', 11), ('V', 4), ('X', 4), ('B', 3)]. For example, seqFactors could contain a dictionary value that looks something like this: {'VRA': [8, 2, 4, 2, 3, 4, 6, 8, 12, 16, 8, 2, 4], 'AZU': [2, 3, 4, 6, 8, 12, 16, 24], 'YBN': [8, 2, 4]}. But we don’t need values associated with the keys since we’re using the dictionary data type, so we’ll just store the None value for each key. We’ll use this for loop to determine the start of the slice and slice message into a substring seqLen characters long. Run the following code in the interactive shell: >>> import freqAnalysis, vigenereCipher>>> for subkey in 'ABCDEFGHJIJKLMNOPQRSTUVWXYZ':... decryptedMessage = vigenereCipher.decryptMessage(subkey, 'PAEBABANZIAHAKDXAAAKIU')... print(subkey, decryptedMessage, freqAnalysis.englishFreqMatchScore(decryptedMessage))...A PAEBABANZIAHAKDXAAAKIU 2B OZDAZAZMYHZGZJCWZZZJHT 1--snip--, Table 20-4: English Frequency Match Score for Each Decryption. The for loop on line 202 goes through each of the indexes in the ciphertext string, which, unlike ciphertextUp, has the original casing of the ciphertext. Higher score means better match.167. Let’s look at the source code for the Vigenère hacking program. To avoid typos, copy and paste it from the book’s website at https://www.nostarch.com/crackingcodes/. The book features the source code to several ciphers and hacking programs for these ciphers. When we remove the non-letters in the ciphertext PPQCA XQVEKG YBNKMAZU YBNGBAL JON I TSZM JYIM. If the user skips a question and doesn’t answer it, it makes the most sense to assign quizAnswer to None as a default value rather than to True or False. for i in range(mostLikelyKeyLength):192. possibleKey += allFreqScores[i][indexes[i]][0]. # Look for this sequence in the rest of the message: 44. for i in range(seqStart + seqLen, len(message) - seqLen): 45. if message[i:i + seqLen] == seq: 46. Next, the for loop on line 88 loops over every sequence in seqFactors, storing it in a variable named seq on each iteration. Python 3 always does regular division regardless of the value type, whereas Python 2 performs integer division when both values in the division operation are integers. # getNthSubkeysLetters(2, 3, 'ABCABCABC') returns 'BBB'141. # Compile a list of seqLen-letter sequences found in the message: 37. seqSpacings = {} # Keys are sequences; values are lists of int spacings. This is usually faster than a brute force attack because the combinations of letters and numbers have already been computed, saving you time and computing power. Whether it’s successful depends on the characteristics of the ciphertext. 81. def getMostCommonFactors(seqFactors): 82. 16. for word in dictionaryFile.read().split('\n'):17. englishWords[word] = None. # Use a regular expression to remove non-letters from the message: 34. message = NONLETTERS_PATTERN.sub('', message.upper()) 35. The return value of len(message) will be the total number of characters in message. 49. return ''.join(letters)153.154.155. def attemptHackWithKeyLength(ciphertext, mostLikelyKeyLength):156. But checking the English frequency matching helped determine the four most likely letters for each subkey. The for loop at the beginning of the line iterates over each word to store each one in a key. These are the steps that our Vigenère hacking program will take. Enter the following into the interactive shell: >>> spam = {'name': 'Al', 'age': 99}>>> for k in spam:... print(k, spam[k])...Age 99name Al. for keyLength in allLikelyKeyLengths:230. keyLengthStr += '%s ' % (keyLength)231. print('Kasiski examination results say the most likely key lengths are: ' + keyLengthStr + '\n')232. hackedMessage = None233. For each possible key length, the code calls attemptHackWithKeyLength() on line 236. Cracking Codes walks you through several different methods of encoding messages with different ciphers using the Python programming language. Then you use square brackets again and enter the key 'name' corresponding to the nested string value 'Al' that you want to retrieve. # factorsByCount has a value like [(3, 497), (2, 487), ...].103. factorsByCount.append( (factor, factorCounts[factor]) )104.105. However, unlike with a list, you can index values in a dictionary using string values as keys instead of only integers. Finally, the value in hackedMessage is returned on line 253: Lines 258 and 259 call the main() function if this program was run by itself rather than being imported by another program: 256. The len() function correctly shows the length of this empty dictionary is 0. The wacker.py script makes use of a few f-strings among other python3-isms. This is an example of making the code backward compatible with previous versions. is passed after calling getEnglishCount(), the value stored in possibleWords after lines 25 to 27 execute would be ['HELLO', 'THERE', 'HOW', 'ARE', 'YOU']. The seqSpacings dictionary is returned from findRepeatSequencesSpacings() on line 53: Now that you’ve seen how the program performs the first step of the Kasiski examination by finding repeated sequences in the ciphertext and counting the number of letters between them, let’s look at how the program conducts the next step of the Kasiski examination. The program will then declare a list called word_order, used to determine the order of starting letters for parsing the word list. Here will come the main logic. We’ll explore how to use default arguments and calculate percentages in the following sections. If you use append() again to add 'eels' to the end of the list, eggs now returns 'hovercraft' followed by 'eels'. Book Name: Cracking Codes with Python Author: Al Sweigart ISBN-10: 1593278225 Year: 2018 Pages: 416 Language: English File size: 19.1 MB File format: ePub, AZW3. You can also iterate over the keys in a dictionary using for loops, just as you can iterate over the items in a list. First, we set up a dictionary called spam with two key-value pairs. Step 2 of Kasiski examination involves finding each of the spacings’ factors (excluding 1), as shown in Table 20-2. 36. These functions are helpful if you need a value’s equivalent to be a different data type. For example, when we try to access the dictionary with the key 42, we get the new value associated with it. 76. If a word isn’t in the dictionary text file, it won’t be counted as English, even if it’s a real word. The downside of not printing information is that you won’t know how the program is doing until it has completely finished running. If possibleWords were set to the empty list, the program execution would never get past line 30, so we can be confident that line 36 won’t cause a ZeroDivisionError. When you define functions, you can give some of the parameters default arguments. The tuples produced by itertools.product() each represent one key where the position in the tuple corresponds to the first index we access in allFreqScores, and the integers in the tuple represent the second index we access in allFreqScores. I’ll provide you with a dictionary file to use, so we just need to write the isEnglish() function that checks whether the substrings in the message are in the dictionary file. In this example, the function determined that the string 'Is this sentence English text?' - Code (and hack!) The len() function shows you the number of items in a list or the number of characters in a string. A for loop will iterate over each word in the words list, decrypt the message with the word as the key, and then call detectEnglish.isEnglish() to see whether the result is understandable English text. # https://www.nostarch.com/crackingcodes/ (BSD Licensed) 3. The set data type is similar to the list data type except a set value can only contain unique values. 1. print()213. print('Enter D if done, anything else to continue hacking:')214. response = input('> ')215.216. if response.strip().upper().startswith('D'):217. return decryptedText. 36. The figure also shows the repeated sequences in this string—VRA, AZU, and YBN—and the number of letters between each sequence pair. # (not punctuation or numbers).51. wordsMatch = getEnglishCount(message) * 100 >= wordPercentage52. For example, if we pass the 'PPQCAXQV...' example string as message, the findRepeatSequenceSpacings() function would return {'VRA': [8, 24, 32], 'AZU': [48], 'YBN': [8]}. But this is much better than brute-forcing through 26 × 26 × 26 × 26 (or 456,976) possible keys, our task had we not narrowed down the list of possible subkeys. For example, allFreqScores might look like this for a six-letter key where we found the four most likely letters for each subkey: >>> allFreqScores = [[('A', 9), ('E', 5), ('O', 4), ('P', 4)], [('S', 10),('D', 4), ('G', 4), ('H', 4)], [('I', 11), ('V', 4), ('X', 4), ('B', 3)],[('M', 10), ('Z', 5), ('Q', 4), ('A', 3)], [('O', 11), ('B', 4), ('Z', 4),('A', 3)], [('V', 10), ('I', 5), ('K', 5), ('Z', 4)]]. Enter the following into the interactive shell to see how to use the print() function’s end keyword argument: >>> def printStuff():➊ print('Hello', end='\n')➋ print('Howdy', end='')➌ print('Greetings', end='XYZ') print('Goodbye') >>> printStuff() Hello HowdyGreetingsXYZGoodbye. The plaintext could also be in a different language, but we’ll assume it’s in English for now. In those situations, a program can just pass arguments for wordPercentage and letterPercentage instead of using the default arguments. # Exclude factors larger than MAX_KEY_LENGTH:100. if factor <= MAX_KEY_LENGTH:101. Download the eBook Cracking Codes with Python: An Introduction to Building and Breaking Ciphers in PDF or EPUB format and read it directly on your mobile phone, computer or any device. 64. factors = [] # The list of factors found. # https://www.nostarch.com/crackingcodes/ (BSD Licensed) 3. You learned how to avoid divide-by-zero errors when using the / operator; convert values into other data types using the int(), float(), and str() functions; and use the append() list method to add a value to the end of a list. If we’re unable to crack this ciphertext, we can try again assuming the key length is 2 or 8. It takes less than five minutes for my computer to run through all these decryptions for a message the size of a long paragraph. "Cracking Codes with Python" also shows how to: Combine loops, variables, and flow control statements into real working programs; Use dictionary files to instantly detect whether decrypted messages are valid English or gibberish; Create test programs to make sure that your code encrypts and decrypts correctly; Code (and hack!) # Determine the most likely letters for each letter in the key:157. ciphertextUp = ciphertext.upper(). 55. We can use this information to figure out the subkey. Line 69 tests whether num % i is equal to 0; if it is, we know that i divides num evenly with no remainder, which means i is a factor of num. '), spam = {'key1': 'This is a value', 'key2': 42}, foo = {'fizz': {'name': 'Al', 'age': 144}, 'moo':['a', 'brown', 'cow']}, dictionaryVal = {'spam':0, 'eggs':0, 'bacon':0}, 'My very energetic mother just served us Nutella. Line 74 passes the list value in factors to set() to remove any duplicate factors: 74. return list(set(factors)) # Remove duplicate factors. So, we are applying brute force attack here with the help of while loop in python. But as they say, no real codes were broken in the making of this book. This value is useful for representing a lack of a value. The subkeys that produce decryptions with the closest frequency match to English are most likely to be the real subkey. Line 118 starts with an empty dictionary in seqFactors. The gaffer says something longer and more complicated. numLetters = len(removeNonLetters(message))53. messageLettersPercentage = float(numLetters) / len(message) * 10054. lettersMatch = messageLettersPercentage >= letterPercentage55. For instance, the third example in Table 11-1 shows that when the function is called with the second and third parameters specified, the program will use those arguments, not the default arguments. Enter the following code into the file editor and then save it as detectEnglish.py. As a result, allFreqScores[0] contains the frequency scores for the first subkey, allFreqScores[1] contains the frequency scores for the second subkey, and so on. When you need to add multiple values to the end of a list, there is an easier way than calling append() inside a loop. # getNthSubkeysLetters(3, 3, 'ABCABCABC') returns 'CCC'142. And setting NUM_MOST_FREQ_LETTERS to 26 would cause the program to skip narrowing down the number of possible letters for each subkey entirely! Keep in mind that trying to hack an incorrect key length that is short takes a short amount of time. 63. The factors of 9 are 9, 3, and 1. # Try every combination of the most likely letters for each position187. To try retrieving values from a dictionary using keys, enter the following into the interactive shell: >>> spam = {'key1': 'This is a value', 'key2': 42}>>> spam['key1']'This is a value'. Line 245 starts a for loop that calls attemptHackWithKeyLength() for each value of keyLength (which ranges from 1 to MAX_KEY_LENGTH) as long as it’s not in allLikelyKeyLengths. Otherwise, it might look like the user answered the question when they didn’t. STEP 2 Lines 130 to 134 store the separate list of factors in allLikelyKeyLengths. The repeated sequences occur when the same letters in the message (THE in our example) are encrypted with the same letters of the key (ABC and XYZ in our example), which happens when the similar letters in the message and key “line up” and encrypt to the same sequence. (Remember that the / operator always evaluates to a float value, such as 21 / 7 evaluating to the float 3.0 instead of the int 3.) This list shows that in the seqFactors dictionary that was passed to getMostCommonFactors(), the factor 3 occurred 556 times, the factor 2 occurred 541 times, the factor 6 occurred 529 times, and so on. This is why we don’t give detectEnglish.py a main() function. The dictionary file sits on the user’s hard drive, but unless we load the text in this file as a string value, our Python code can’t use it. We use the UPPERLETTERS constant to set up LETTERS_AND_SPACE, which contains all the uppercase and lowercase letters of the alphabet as well as the space character, the tab character, and the newline character. This dictionary has strings of sequences as its keys and a list of integer factors as the value of each key. Earlier in the code, we wrote the removeNonLetters() function to find all the letter and space characters in a string, so we can just reuse it. We exclude 1 because if the Vigenère key had a length of 1, the Vigenère cipher would be no different than the Caesar cipher! The first step of Kasiski examination is to find every repeated set of at least three letters in the ciphertext. Learn how to program in Python while making and breaking ciphers—algorithms used to create and send secret messages! Find many great new & used options and get the best deals for Cracking Codes with Python : An Introduction to Building and Breaking Ciphers by Al Sweigart (2018, Trade Paperback) at the best online prices at eBay! “Cracking Codes with Python” is a fun way of leanring Python. Originally, if we wanted to brute-force through the full Vigenère key, the number of possible keys would be 26 raised to the power of key length. For example, if we wanted to try only the first three most likely letters of each subkey (which is determined by NUM_MOST_FREQ_LETTERS) for a key that’s likely five letters long, the first value itertools.product() would produce would be (0, 0, 0, 0, 0). The findRepeatSequencesSpacings() function accomplishes the first step of the Kasiski examination by locating all the repeated sequences of letters in the message string and counting the spacings between the sequences: 28. def findRepeatSequencesSpacings(message): --snip-- 33. The possibleKey value decrypts the ciphertext by calling vigenereCipher.decryptMessage() on line 170. Instead, because our subkeys are stored in tuples in allFreqScores, we’ll access those letters by index values, which will range from 0 to the number of letters we want to try minus 1. That way, if the computer decrypts using the wrong key, it knows to go on and try the next possible key. We store this list value in a variable named factorsByCount, which starts as an empty list on line 97: 97. factorsByCount = [] 98. for factor in factorCounts: 99. Lines 10 and 11 set up a few variables as constants, which are in uppercase. If the hacking fails, the function returns None. Something like this: So, this was the core of whole attack. # By default, 20% of the words must exist in the dictionary file, and49. Then enter the following code into the file editor and save it as vigenereHacker.py.