Decoding the Verizon DBIR 2009 Cover

As you probably know by now, the pattern of 1s and 0s on the cover of the 2009 Verizon Data Breach Investigations Report contains a hidden message. I decided to give it a whirl and eventually figured it out. No doubt plenty of people managed to beat me to it, as evidenced by the fact that I didn't get my solution in early enough to win the cash prize -- but so far, I haven't seen anybody write up a walkthrough, so I thought I'd do one.

If you haven't taken a crack at it yet and plan to, then stop reading -- SPOILERS AHEAD, as they say. Otherwise, I hope that this is helpful to anyone who is interested in learning more about basic cryptography.

I started by copying and pasting the binary digits from the cover of the report into a text file. I then converted the bits to ASCII, which produced the following text:

$ cat vz|unsplit|bin -d|split 72
EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFNGFNRBFGVKSFP
NBUFZJGCEEEWAKHPXEBTZJCZOWGTBSQGTMIAYDPYDRIRYETKCJRPYHEPWKUOAEKNVTVZHSMZ
NTTIVIKMMRYSNUIAKBRKQMSTYCGCCRLRRIIREFGYTJUBUXHEYSGLEYRVHIYXDEYZCJKVTOSO
IXJEHOXEVMWJBNZMTKWZEFOFCNBWNCUWMYFIUVBKWNPWTYOEYQTIRRYRCMNVFVLRSBNTPWPA
OCZPEKHLFCEERRVWVUYBVJPUVPOAYMIKQQNSWZGHZKDGYLAEGWPKESGCYZFVJDMEPQKSSLNV
SVPUVVRVYERHDTUTYYMQGEVWRMQSZFNPNRJIGGWAJNNJLKOEQHNETRPUQYDFZWCZKVJEXLMC
KCSIFTCTSUTLDRRMIKQTNINPGRPQQXPTZDPAIOTCEUAZFEWDQLLPZRHXLXQGSLRJTBLZRIRV
ISNZIWLMVYADVOHFEVNAKKGORRXSYGXPUMVGBOMRJLCREFCMRQVXTMIYMJJVHXNBTSZMTJEF
KFGKURFLNHXPKCWLEXMIYLGYNNRWAKSEWTHPKGZKKXGAZELLUTAYCIEKWISHUNDKEKWARGBY
ZFGKEPKQGZZSRIMFLGKARTURAINSNGEEUMEXRVEELZXTISUWVZKOYLTPBHZWEOQWNXNPXPKS
SXJHPANCVFPRYADRLROEWEBQEWHZRGATZDGUCEKLFYHZJNNZIJRGNZRVBOCAUYEZGKPSJXJI
ASMVFTDWFXBIDHQZEYKDRTDRIOPPKJRPISSKMCZJFZTBVBJUGEYANJIGJTDCPTZDEOGUTLZP
EKHTNIHTGGUMVGBOMRJLCREFSWFZOCROHEAU
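
The helper commands in that pipeline aren't shown here. A rough Python equivalent, assuming the cover's digits group into 8-bit ASCII codes, might look like this (the function names and the four-character sample are made up for illustration):

```python
def bits_to_text(bits: str) -> str:
    """Decode a string of 0/1 characters as 8-bit ASCII, ignoring whitespace."""
    digits = "".join(ch for ch in bits if ch in "01")
    if len(digits) % 8:
        raise ValueError("bit count is not a multiple of 8")
    return "".join(chr(int(digits[i:i + 8], 2)) for i in range(0, len(digits), 8))


def wrap(text: str, width: int = 72) -> list[str]:
    """Break text into rows of `width` characters for display."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# Hypothetical sample: the bit patterns for the letters E, V, N, T.
sample = "01000101 01010110\n01001110 01010100"
print(bits_to_text(sample))  # EVNT
```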

The first thing I tried was Caesar shifts, which is basically ROT-n for values of n from 1 to 25. So for n=1, A is encoded as B, B is encoded as C, and so on, all the way through Z, which is encoded as A. For n=2, A is encoded as C, B is encoded as D... you get the picture. I won't print out the results of all the decodes because it would take up too much space, but suffice it to say, nothing interesting came out of it.
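
For anyone who wants to try the shifts themselves, here is a minimal sketch. It assumes uppercase A-Z input, and `caesar_shift` is an illustrative name rather than the tool actually used:

```python
import string


def caesar_shift(text: str, n: int) -> str:
    """Shift each uppercase letter forward by n places, wrapping Z back to A."""
    alphabet = string.ascii_uppercase
    shifted = alphabet[n % 26:] + alphabet[:n % 26]
    return text.translate(str.maketrans(alphabet, shifted))


# Try every shift on the first few ciphertext characters and eyeball the output.
ciphertext_start = "EVNTXIGYIMWSNEHE"
for n in range(1, 26):
    print(f"{n:2d} {caesar_shift(ciphertext_start, n)}")
```

Because the shifts wrap around, trying all 25 values covers both encoding and decoding directions.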

Next, I tried frequency analysis, which can be an effective way of deciphering a simple substitution cipher (i.e. one where a given plaintext character is always encoded as the same ciphertext character). A simple substitution cipher preserves the tendency of a written language to use certain letters more than others, so the ciphertext's letter counts are just as lopsided as the plaintext's. For example, in English, the most frequent letter, E, appears roughly 170 times more often than the least frequent letter, Z, for a sufficiently large sample size. Here's the frequency distribution from the Verizon ciphertext:

$ cat vz|unsplit|bin -d|split 1|sort|uniq -c|sort -n
     21 D
     21 Q
     24 O
     25 X
     26 A
     26 H
     27 B
     28 L
     28 U
     29 J
     31 C
     32 M
     33 W
     34 F
     34 S
     36 P
     38 I
     40 N
     40 V
     40 Y
     41 G
     42 Z
     43 T
     44 K
     56 R
     61 E

As you can see, the most frequent character, E, was only about three times as prevalent as the least frequent characters, D and Q, which meant it was unlikely to be a simple substitution cipher, provided the plaintext was English. The frequency distribution was far flatter than what we would expect.
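
The same tally can be sketched in a few lines of Python. This is only an approximation of the shell pipeline above, with illustrative function names, and it is run here on just the first row of ciphertext:

```python
from collections import Counter


def letter_counts(text: str) -> list[tuple[str, int]]:
    """Count letters and return (letter, count) pairs, least frequent first."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


def spread(counts: list[tuple[str, int]]) -> float:
    """Ratio of the most frequent letter's count to the least frequent's."""
    return counts[-1][1] / counts[0][1]


first_row = "EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFNGFNRBFGVKSFP"
for letter, count in letter_counts(first_row):
    print(f"{count:7d} {letter}")

# For the full table above the ratio is 61 / 21, or about 2.9.
print(round(61 / 21, 1))
```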

Just for kicks, I tried various transposition ciphers, rearranging the 900 characters into an M-by-N grid, for different values of M and N (M*N=900), and reading down the columns instead of across the rows. Frequency analysis had already told us not to expect any English text, since a transposition only reorders the letters and leaves their counts untouched, but I thought some visual patterns might emerge. Wrong again.
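
One way to sketch that grid experiment is below. It assumes the text is written across rows of a fixed width and then read down the columns; both helper names are hypothetical:

```python
def read_columns(text: str, width: int) -> str:
    """Write text into rows of `width` characters, then read down the columns."""
    return "".join(text[col::width] for col in range(width))


def grid_shapes(length: int) -> list[tuple[int, int]]:
    """All (rows, columns) pairs that tile `length` exactly, skipping 1-wide grids."""
    return [(length // w, w) for w in range(2, length) if length % w == 0]


print(read_columns("ABCDEF", 3))  # rows ABC / DEF read as AD, BE, CF
print(len(grid_shapes(900)))      # number of candidate grids for 900 characters
```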

Around this point, I saw somebody on Twitter mention that there were clues embedded in the body of the report, so I started skimming through it. At the bottom of page 48 is “yr puvsser vaqrpuvssenoyr”, which ROT-13 decodes to “le chiffre indechiffrable.” Here’s where I went briefly astray by using Google Translate instead of just Googling the term. The literal translation from French is “indecipherable figure”, which made me think that the clue was that the whole thing was a hoax and the front cover was just a bunch of garbage. A friend reminded me that “le chiffre indechiffrable” actually refers to the Vigenère cipher, which would have been painfully obvious if I’d used regular Google search instead of Google Translate (smacks self on head). Logically, the Vigenère would've been the next target anyway, as it's just a simple substitution cipher with a twist.
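
The ROT-13 step is easy to reproduce. This short sketch uses the `rot_13` codec from Python's standard library, which is just one of several ways to do it:

```python
import codecs

clue = "yr puvsser vaqrpuvssenoyr"
# ROT-13 is its own inverse, so encoding and decoding are the same operation.
clue_plain = codecs.decode(clue, "rot_13")
print(clue_plain)  # le chiffre indechiffrable
```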

If you're not familiar with how a Vigenère cipher works, it basically uses a keyword to cycle through different substitution maps. For example, if you were encoding ZZZZZZ with the keyword FOOBAR it would come out as ENNAZQ -- the letter Z is encoded differently depending on how it aligns with the keyword. You can see why frequency analysis isn't useful here.
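
The FOOBAR example can be checked with a minimal sketch like the one below. It assumes the text and keyword contain only the letters A-Z, and `vigenere` is an illustrative function, not the Crypt::Vigenere API:

```python
from itertools import cycle

A = ord("A")


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or, when decrypting, subtract) the repeating key letters modulo 26."""
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text.upper(), cycle(key.upper())):
        out.append(chr((ord(ch) - A + sign * (ord(k) - A)) % 26 + A))
    return "".join(out)


print(vigenere("ZZZZZZ", "FOOBAR"))                # ENNAZQ
print(vigenere("ENNAZQ", "FOOBAR", decrypt=True))  # ZZZZZZ
```

Each key letter simply picks a different Caesar shift for the position it lines up with, which is why the same plaintext letter comes out differently across the keyword.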

My first inclination was to just guess the keyword outright. I thought maybe it was something obvious such as VERIZON, VZ, RISK, DATA, BREACH, VIGENERE, etc. I grabbed Crypt::Vigenere and tried each of the guessed keywords, but none of them worked. I even wrote a quick script to brute force all 2- and 3-letter keywords, again coming up with nothing.

Then I took a different approach -- trying to guess what the decoded message might contain and work backwards. I speculated that the first word would be CONGRATULATIONS which corresponds to a potential key of CHANGINEXMDKZRP. This didn't seem right, but the CHANGIN part of it seemed like too much of a coincidence. So I tried CONGRATS as the plaintext, which corresponded to the keyword CHANGING. I thought it was solved at this point, but decoding the entire ciphertext using CHANGING as the keyword still gave me junk. So then I searched through the PDF for the word CHANGING, and sure enough, on page 46, one of the bullet items says “Changing default credentials is key” (clever, huh). So I decoded with a keyword of CHANGINGDEFAULTCREDENTIALS and it worked. The text decodes to the following message:

$ cat vz|unsplit|bin -d|vigenere -d changingdefaultcredentials|split 72
congratsfirsttocrackgetsrewardgotowwwverizonbusinesscomslashdbirhunttocl
aimforeveryoneelsehighlvlstatsforfinsvcsandretailfollowplssharefinsvcsso
urcesexternalnineteeninternalninepartnertwothreatsmalwareelevenhackingfi
fteendeceitfourmisusesixphysicaltwoerroroneerrorsigcontributorinfifteent
opthreehacktypessqlinjectionsevenmisconfigaclssevendefaultcredstwotophac
kvectoriswebapptentopassetisonlinedatatwentysixandallrecordstopthreedata
typesauthcredelevenpiitenpymntcardeightpymntcardwasninetyeightpctofrecor
dstopuuisunknownconnectionssevendiscoverytakesweekstomonthsretailsources
externaltwentythreeinternalonepartnereightthreatsmalwaretenhackingtwenty
onedeceittwomisusetwophysicalzeroerrorzeroerrorsigcontributorinsixteento
ptwohacktypessqlinjectionsevenstolencredsseventophackvectorisremaccmgtei
ghttopassetisposelevenandoverhalfofrecordstoptwodatatypespaycardtwentyth
reepiininediscoverytakesmostlymonths
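
The guess-the-plaintext step can be sketched as follows. Subtracting a guessed plaintext from the ciphertext, letter by letter, gives the key letters that would have produced it. The helper names are illustrative and the code assumes A-Z only:

```python
from itertools import cycle

A = ord("A")


def key_from_crib(ciphertext: str, crib: str) -> str:
    """Key letters that would turn `crib` into the start of `ciphertext`."""
    return "".join(
        chr((ord(c) - ord(p)) % 26 + A)
        for c, p in zip(ciphertext.upper(), crib.upper())
    )


def decrypt(ciphertext: str, key: str) -> str:
    """Subtract the repeating key from the ciphertext modulo 26."""
    return "".join(
        chr((ord(c) - ord(k)) % 26 + A)
        for c, k in zip(ciphertext.upper(), cycle(key.upper()))
    )


start = "EVNTXIGYIMWSNEHEIEFOTXBSCW"  # first 26 ciphertext characters
print(key_from_crib(start, "CONGRATULATIONS"))        # CHANGINEXMDKZRP
print(key_from_crib(start, "CONGRATS"))               # CHANGING
print(decrypt(start, "CHANGINGDEFAULTCREDENTIALS"))   # CONGRATSFIRSTTOCRACKGETSRE
```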

Had the message not begun with “CONGRATS”, I could have fallen back on other techniques for attacking a Vigenère cipher, such as deducing the length of the keyword by looking for cyclical patterns in the ciphertext. Luckily, it didn't come to that because I wanted to watch TV.
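
One classic version of that idea is Kasiski examination, which wasn't needed here but is easy to sketch. Repeated runs of ciphertext often come from the same plaintext lining up with the same part of the keyword, so the gaps between repeats tend to be multiples of the keyword length. In the ciphertext above, for instance, the run UMVGBOMRJLCREF shows up in both the eighth row and the last row, 338 positions apart by my count, and 338 = 13 × 26, which fits the 26-letter keyword. The function names below are hypothetical:

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeat_distances(ciphertext: str, n: int = 5) -> dict[str, list[int]]:
    """Map each n-gram that occurs more than once to the gaps between occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {
        gram: [b - a for a, b in zip(pos, pos[1:])]
        for gram, pos in positions.items()
        if len(pos) > 1
    }


def gap_gcd(distances: dict[str, list[int]]) -> int:
    """Greatest common divisor of every gap; a hint at the keyword length."""
    gaps = [g for gs in distances.values() for g in gs]
    return reduce(gcd, gaps, 0)


toy = "ABCDE" + "FGHIJKLM" + "ABCDE"  # made-up text with one repeat, 13 apart
found = repeat_distances(toy)
print(found, gap_gcd(found))
```

Accidental repeats can drag the common divisor down to 1, so in practice it helps to look at the individual gaps, or to use longer n-grams, before trusting the result.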

I visited the embedded URL, which said that somebody had already claimed first prize but that I was still in the top three. I later found out that about ten people, including myself, submitted solutions around the same time, before the authors could update the congratulatory message. So I didn't win any money, but it was still a lot of fun (and significantly better than the corny FBI challenge).


Comments (9)

Jebediah Webb | April 27, 2009 4:44 pm

I was really wishing it had said "Be sure to drink your Ovaltine"....

Alex | April 28, 2009 5:52 pm

@ Jebediah - actually, that was the first thing we thought of. Unfort. using a brand name would be problematic.

Nate | April 30, 2009 12:53 pm

Nice work, Chris. While this kind of cipher does level out the frequency of individual characters, it only reduces the distance between the least and most frequent letters; it doesn't eliminate the variation. This is because each group of ciphertext characters (where group is determined by the length of the key) is dependent on the same key bits as every other group. So a quick way to solve it is to first find the length of the key by breaking the message up into sets of regular groups and then doing frequency analysis between the groups with the index of coincidence (freq count of pairs or triplets, etc).

CEng | April 30, 2009 1:41 pm

@Nate: Yeah, I think that's the approach that this Vigenere brute force applet (http://islab.oregonstate.edu/koc/ece575/02Project/Mun+Lee/VigenereCipher.html) implements (the one Grant Stavely found and used to win the contest: http://grantstavely.com/how-i-decoded-the-verizon-2009-dbir-cover). I was surprised to see just how effective that method was, considering the relatively small sample size of 900 characters. I don't quite understand why the dot products work, though I'm sure it would become more apparent with a little experimentation.

Erzengel | May 1, 2009 11:46 am

Your final solution (guess the word and work back from there) reminds me of what I did when "Order Of The Stick" (a webcomic) had a character (Haley) whose brain broke and so she ended up talking in cypher garbage. I guessed that it was cypher (Belkar, another character, later confirmed it in a 4th wall break), and started, much as you did, with Caesar shifts, but when that failed I tried frequency analysis but didn't really come up with much from such a small sample size. So I just guessed what she might be saying, based upon what was happening, and found the substitution cypher that way. Then the author changed the cypher on the next comic, with an even smaller sample size, so I kind of gave up. I'm not a cryptographer, though, so I don't exactly have the tools or knowledge you do at figuring these out.

isaac dawson | May 3, 2009 11:32 am

Hey Chris, Enjoyed the quick write up on the steps you took to crack it. After you got to the part where you realized it was a Vigenere cipher and started explaining how that works, I realized an SSO implementation I tested a little while back was the same type of algorithm! However, I had the luxury of an encryption oracle and could just choose the plaintext I wanted encrypted ;>. Hope things are going well with you, ^isaac

haxor | May 4, 2009 1:18 pm

Lame encryption.

Allison Ego | January 17, 2011 3:53 pm

I like the valuable info you provide in your articles. I’ll bookmark your blog and check again here frequently. I'm quite sure I’ll learn many new stuff right here! Best of luck for the next!

Mocny katalog | April 10, 2013 12:19 am

Awesome blog!
