As you probably know by now, the pattern of 1s and 0s on the cover of the 2009 Verizon Data Breach Investigations Report contains a hidden message. I decided to give it a whirl and eventually figured it out. No doubt plenty of people managed to beat me to it, as evidenced by the fact that I didn't get my solution in early enough to win the cash prize -- but so far, I haven't seen anybody write up a walkthrough, so I thought I'd do one.
If you haven't taken a crack at it yet and plan to, then stop reading -- SPOILERS AHEAD, as they say. Otherwise, I hope that this is helpful to anyone who is interested in learning more about basic cryptography.
I started by copying and pasting the binary digits from the cover of the report into a text file. Then I converted the bits to ASCII, resulting in the following text:
$ cat vz|unsplit|bin -d|split 72
EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFNGFNRBFGVKSFP
NBUFZJGCEEEWAKHPXEBTZJCZOWGTBSQGTMIAYDPYDRIRYETKCJRPYHEPWKUOAEKNVTVZHSMZ
NTTIVIKMMRYSNUIAKBRKQMSTYCGCCRLRRIIREFGYTJUBUXHEYSGLEYRVHIYXDEYZCJKVTOSO
IXJEHOXEVMWJBNZMTKWZEFOFCNBWNCUWMYFIUVBKWNPWTYOEYQTIRRYRCMNVFVLRSBNTPWPA
OCZPEKHLFCEERRVWVUYBVJPUVPOAYMIKQQNSWZGHZKDGYLAEGWPKESGCYZFVJDMEPQKSSLNV
SVPUVVRVYERHDTUTYYMQGEVWRMQSZFNPNRJIGGWAJNNJLKOEQHNETRPUQYDFZWCZKVJEXLMC
KCSIFTCTSUTLDRRMIKQTNINPGRPQQXPTZDPAIOTCEUAZFEWDQLLPZRHXLXQGSLRJTBLZRIRV
ISNZIWLMVYADVOHFEVNAKKGORRXSYGXPUMVGBOMRJLCREFCMRQVXTMIYMJJVHXNBTSZMTJEF
KFGKURFLNHXPKCWLEXMIYLGYNNRWAKSEWTHPKGZKKXGAZELLUTAYCIEKWISHUNDKEKWARGBY
ZFGKEPKQGZZSRIMFLGKARTURAINSNGEEUMEXRVEELZXTISUWVZKOYLTPBHZWEOQWNXNPXPKS
SXJHPANCVFPRYADRLROEWEBQEWHZRGATZDGUCEKLFYHZJNNZIJRGNZRVBOCAUYEZGKPSJXJI
ASMVFTDWFXBIDHQZEYKDRTDRIOPPKJRPISSKMCZJFZTBVBJUGEYANJIGJTDCPTZDEOGUTLZP
EKHTNIHTGGUMVGBOMRJLCREFSWFZOCROHEAU
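The `unsplit` and `bin -d` commands in that pipeline aren't standard utilities, so here's a rough Python stand-in for the same steps, as a minimal sketch. It assumes each character on the cover is encoded as 8 bits, most significant bit first; the function names `bits_to_ascii` and `wrap` are hypothetical.

```python
def bits_to_ascii(bits: str) -> str:
    """Decode a string of '0'/'1' characters into ASCII text.

    Assumption: 8 bits per character, most significant bit first.
    Anything other than '0' or '1' (spaces, line breaks from pasting) is ignored.
    """
    clean = "".join(ch for ch in bits if ch in "01")
    if len(clean) % 8:
        raise ValueError("bit count is not a multiple of 8")
    return "".join(
        chr(int(clean[i:i + 8], 2)) for i in range(0, len(clean), 8)
    )


def wrap(text: str, width: int = 72) -> list[str]:
    """Cut text into fixed-width rows, roughly what `split 72` appears to do above."""
    return [text[i:i + width] for i in range(0, len(text), width)]


print(bits_to_ascii("01000101 01010110\n01001110"))  # EVN
```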
The first thing I tried was Caesar shifts, which are basically ROT-n for values of n from 1 to 25. So for n=1, A is encoded as B, B is encoded as C, and so on, all the way through Z, which is encoded as A. For n=2, A is encoded as C, B is encoded as D... you get the picture. I won't print out the results of all the decodes because they would take up too much space, but suffice it to say, nothing interesting came out of it.
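Trying all 25 shifts is easy to script. This is a minimal Python sketch, not the script used for this post; `caesar_shift` and `all_caesar_decodes` are hypothetical names. Decoding a ROT-n message means shifting back by n, so each candidate is produced with a shift of -n.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text: str, n: int) -> str:
    """Shift each letter A-Z forward by n positions, wrapping Z back around to A."""
    k = n % 26
    table = str.maketrans(ALPHABET, ALPHABET[k:] + ALPHABET[:k])
    return text.translate(table)


def all_caesar_decodes(ciphertext: str) -> dict[int, str]:
    """Undo every non-trivial ROT-n (n = 1..25) by shifting back by n."""
    return {n: caesar_shift(ciphertext, -n) for n in range(1, 26)}


for n, candidate in all_caesar_decodes("EVNTXIGYIMWS").items():
    print(n, candidate)
```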
Next, I tried frequency analysis, which can be an effective way of deciphering a simple substitution cipher (i.e. one in which a given plaintext character is always encoded as the same ciphertext character). Because every letter always gets the same substitute, the ciphertext preserves the tendency of a written language to use certain letters more than others. For example, in English, the most frequent letter, E, appears roughly 170x more often than the least frequent letter, Z, given a sufficiently large sample. Here's the frequency distribution from the Verizon ciphertext:
$ cat vz|unsplit|bin -d|split 1|sort|uniq -c|sort -n
21 D
21 Q
24 O
25 X
26 A
26 H
27 B
28 L
28 U
29 J
31 C
32 M
33 W
34 F
34 S
36 P
38 I
40 N
40 V
40 Y
41 G
42 Z
43 T
44 K
56 R
61 E
As you can see, the most frequent character, E (61 occurrences), was only about three times as prevalent as the least frequent characters, D and Q (21 each), which meant it was unlikely to be a simple substitution cipher, provided the plaintext was English. The distribution was far too flat compared with what we would expect.
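The counting pipeline above maps onto a few lines of Python. This is a minimal sketch with hypothetical function names; it only counts letters that actually occur, and reports the ratio between the most and least common ones.

```python
from collections import Counter


def letter_counts(text: str) -> list[tuple[str, int]]:
    """Count A-Z letters, least frequent first (similar to `sort|uniq -c|sort -n`)."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


def max_min_ratio(text: str) -> float:
    """Ratio of the most frequent letter's count to the least frequent one's."""
    ordered = letter_counts(text)
    return ordered[-1][1] / ordered[0][1]


print(letter_counts("BAAC"))  # [('B', 1), ('C', 1), ('A', 2)]
```

For the Verizon ciphertext, the table above gives a ratio of 61 / 21, or roughly 2.9, nowhere near the E-to-Z ratio of ordinary English.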
Just for kicks, I tried various transposition ciphers, rearranging the 900 characters into an M-by-N grid for different values of M and N (M*N=900) and reading down the columns instead of across the rows. Frequency analysis had already told us not to expect any English text, since a transposition only reorders the letters and would leave English letter frequencies intact, but I thought some visual patterns might emerge. Wrong again.
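Here's a sketch of the simplest version of that transposition: fill the grid row by row, then read it column by column. Keyed column orders and other variants aren't covered, and `grid_shapes` and `read_columns` are hypothetical names.

```python
def grid_shapes(length: int) -> list[tuple[int, int]]:
    """Every (rows, cols) pair with rows * cols == length."""
    return [(m, length // m) for m in range(1, length + 1) if length % m == 0]


def read_columns(text: str, rows: int, cols: int) -> str:
    """Write text into a rows x cols grid row by row, then read it column by column."""
    if rows * cols != len(text):
        raise ValueError("grid must hold exactly len(text) characters")
    grid = [text[r * cols:(r + 1) * cols] for r in range(rows)]
    return "".join(grid[r][c] for c in range(cols) for r in range(rows))


print(read_columns("ABCDEF", 2, 3))  # ADBECF
```

Since 900 = 2² · 3² · 5², there are 27 grid shapes to try. Each output is just a rearrangement of the same letters, so its frequency table is identical to the ciphertext's.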
Around this point, I saw somebody on Twitter mention that there were clues embedded in the body of the report, so I started skimming through it. At the bottom of page 48 is “yr puvsser vaqrpuvssenoyr”, which ROT-13 decodes to “le chiffre indechiffrable.” Here’s where I went briefly astray by using Google Translate instead of just Googling the term. The literal English translation is “indecipherable figure” (chiffre can mean either “figure” or “cipher”), which made me think the clue was saying the whole thing was a hoax and the front cover was just a bunch of garbage. A friend reminded me that “le chiffre indechiffrable” actually refers to the Vigenère cipher, which would have been painfully obvious if I’d used regular Google search instead of Google Translate (smacks self on head). Logically, the Vigenère would've been the next target anyway, as it's just a simple substitution cipher with a twist.
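Python's standard library includes a `rot_13` text codec, so the page-48 clue can be checked in a couple of lines:

```python
import codecs

clue = "yr puvsser vaqrpuvssenoyr"
print(codecs.encode(clue, "rot_13"))  # le chiffre indechiffrable
```

ROT-13 is its own inverse (13 + 13 = 26), so the same call both encodes and decodes.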
If you're not familiar with the Vigenère cipher, it uses a keyword to cycle through different substitution maps: each letter of the keyword selects a Caesar shift (A=0, B=1, ..., Z=25) for the plaintext letter it lines up with, and the keyword repeats for the length of the message. For example, if you were encoding ZZZZZZ with the keyword FOOBAR, it would come out as ENNAZQ -- the letter Z is encoded differently depending on how it aligns with the keyword. You can see why frequency analysis isn't useful here.
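A minimal Python sketch of the classic Vigenère cipher as described above. It's not the Crypt::Vigenere module or the `vigenere` tool used in this post; `vigenere` is a hypothetical name, and the sketch assumes an A-Z-only key, drops non-letters from the input, and returns uppercase.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Each key letter picks a Caesar shift (A=0 ... Z=25); the key repeats."""
    sign = -1 if decrypt else 1
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    shifts = cycle(ALPHABET.index(k) for k in key.upper())
    return "".join(
        ALPHABET[(ALPHABET.index(p) + sign * s) % 26]
        for p, s in zip(letters, shifts)
    )


print(vigenere("ZZZZZZ", "FOOBAR"))  # ENNAZQ
```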
My first inclination was to just guess the keyword outright. I thought maybe it was something obvious such as VERIZON, VZ, RISK, DATA, BREACH, or VIGENERE. I grabbed Crypt::Vigenere and tried each of the guessed keywords, but none of them worked. I even wrote a quick script to brute-force all 2- and 3-letter keywords, again coming up with nothing.
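A brute force over short keywords needs some way to spot English automatically. The sketch below is not the original script: it ranks each candidate key with a hypothetical heuristic that counts a handful of common English trigrams. All 2- and 3-letter keys come to 26² + 26³ = 18,252 candidates.

```python
import string
from itertools import cycle, product

ALPHABET = string.ascii_uppercase
# Hypothetical scoring heuristic: a few very common English trigrams.
COMMON = ("THE", "AND", "ING", "ION", "ENT")


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Subtract the repeating key from A-Z ciphertext, mod 26."""
    shifts = cycle(ALPHABET.index(k) for k in key)
    return "".join(
        ALPHABET[(ALPHABET.index(c) - s) % 26] for c, s in zip(ciphertext, shifts)
    )


def score(plaintext: str) -> int:
    """Higher means more English-looking, under the trigram heuristic above."""
    return sum(plaintext.count(word) for word in COMMON)


def brute_force(ciphertext: str, lengths=(2, 3)) -> list[tuple[int, str]]:
    """Try every A-Z keyword of the given lengths; best-scoring keys first."""
    results = []
    for n in lengths:
        for letters in product(ALPHABET, repeat=n):
            key = "".join(letters)
            results.append((score(vigenere_decrypt(ciphertext, key)), key))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

A trigram count is crude, and with only a few hundred characters it can be fooled. A chi-squared comparison against English letter frequencies is a common, sturdier alternative.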
Then I took a different approach -- guessing what the decoded message might contain and working backwards. I speculated that the first word would be CONGRATULATIONS; subtracting it letter by letter (mod 26) from the start of the ciphertext gives a potential key of CHANGINEXMDKZRP. This didn't seem right, but the CHANGIN part of it seemed like too much of a coincidence. So I tried CONGRATS as the plaintext, which corresponded to the keyword CHANGING. I thought it was solved at this point, but decoding the entire ciphertext using CHANGING as the keyword still gave me junk. So then I searched the PDF for the word CHANGING, and sure enough, on page 46, one of the bullet items says “Changing default credentials is key” (clever, huh). So I decoded with a keyword of CHANGINGDEFAULTCREDENTIALS and it worked. The text decodes to the following message:
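The working-backwards step is a known-plaintext (crib) attack: since ciphertext = plaintext + key (mod 26), guessing the plaintext reveals the key letters underneath it. Here's a minimal sketch using the first 26 letters of the ciphertext above; `key_from_crib` and `vigenere_decrypt` are hypothetical names.

```python
import string

ALPHABET = string.ascii_uppercase


def key_from_crib(ciphertext: str, crib: str) -> str:
    """Key letters under a guessed plaintext: key = ciphertext - plaintext (mod 26)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 26]
        for c, p in zip(ciphertext.upper(), crib.upper())
    )


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Subtract the repeating A-Z key from A-Z ciphertext, mod 26."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(key[i % len(key)])) % 26]
        for i, c in enumerate(ciphertext)
    )


CT_START = "EVNTXIGYIMWSNEHEIEFOTXBSCW"  # first 26 ciphertext letters
print(key_from_crib(CT_START, "CONGRATULATIONS"))  # CHANGINEXMDKZRP
print(key_from_crib(CT_START, "CONGRATS"))  # CHANGING
print(vigenere_decrypt(CT_START, "CHANGINGDEFAULTCREDENTIALS"))  # CONGRATSFIRSTTOCRACKGETSRE
```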
$ cat vz|unsplit|bin -d|vigenere -d changingdefaultcredentials|split 72
congratsfirsttocrackgetsrewardgotowwwverizonbusinesscomslashdbirhunttocl
aimforeveryoneelsehighlvlstatsforfinsvcsandretailfollowplssharefinsvcsso
urcesexternalnineteeninternalninepartnertwothreatsmalwareelevenhackingfi
fteendeceitfourmisusesixphysicaltwoerroroneerrorsigcontributorinfifteent
opthreehacktypessqlinjectionsevenmisconfigaclssevendefaultcredstwotophac
kvectoriswebapptentopassetisonlinedatatwentysixandallrecordstopthreedata
typesauthcredelevenpiitenpymntcardeightpymntcardwasninetyeightpctofrecor
dstopuuisunknownconnectionssevendiscoverytakesweekstomonthsretailsources
externaltwentythreeinternalonepartnereightthreatsmalwaretenhackingtwenty
onedeceittwomisusetwophysicalzeroerrorzeroerrorsigcontributorinsixteento
ptwohacktypessqlinjectionsevenstolencredsseventophackvectorisremaccmgtei
ghttopassetisposelevenandoverhalfofrecordstoptwodatatypespaycardtwentyth
reepiininediscoverytakesmostlymonths
Had the message not begun with “CONGRATS”, there would still have been other techniques for attacking a Vigenère cipher, such as deducing the length of the keyword from repeated sequences in the ciphertext, whose spacing tends to be a multiple of the key length. Luckily, it didn't come to that, because I wanted to watch TV.
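The repeated-sequence idea is usually called Kasiski examination: when the same stretch of plaintext lines up with the same part of the keyword, it produces the same ciphertext, so the distance between repeats tends to be a multiple of the key length. The ciphertext above contains such a repeat. UMVGBOMRJLCREF appears twice, 338 characters apart, and 338 = 13 × 26 is a multiple of the 26-letter keyword (both occurrences decrypt to "discoverytakes"). Below is a minimal Python sketch of the technique; the function names and the `max_len` cutoff are hypothetical choices.

```python
from collections import Counter


def repeat_distances(text: str, n: int = 3) -> list[int]:
    """Distances between consecutive occurrences of every repeated n-gram."""
    last_seen = {}
    distances = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return distances


def likely_key_lengths(text: str, n: int = 3, max_len: int = 30) -> list[tuple[int, int]]:
    """For each candidate key length, count how many repeat distances it divides."""
    tally = Counter()
    for d in repeat_distances(text, n):
        for k in range(2, max_len + 1):
            if d % k == 0:
                tally[k] += 1
    return tally.most_common()
```

Small lengths such as 2 divide many distances by chance, so the tally is a hint rather than an answer. The index of coincidence is the usual cross-check.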
I visited the embedded URL, which said that somebody had already claimed first prize but that I was still in the top three. I later found out that about ten people, including me, submitted solutions around the same time, before the authors could update the congratulatory message. So I didn't win any money, but it was still a lot of fun (and significantly better than the corny FBI challenge).