Encryption and tokenization are great security tools—when executed properly—because they sidestep the need to guard the data itself and instead attempt to make the data worthless to thieves. It's a great strategy. But when it's executed improperly, it can insidiously weaken security. That happens when IT grows overconfident that the data really would be worthless to attackers and gets lax about strong prevention tactics, such as firewalls.
What brings this all to mind is Apple's new approach—unveiled June 13—called Differential Privacy. Apple is using mathematical encryption techniques to anonymize data. But Apple is being infuriatingly vague about the precise mechanics of Differential Privacy. For security professionals, this is a concern. Many will be tempted to try to replicate Differential Privacy in an attempt to anonymize other kinds of data and make them theoretically less attractive to thieves. But if it doesn't work, the almost inevitable security complacency that comes from believing it is working is frightening.
This is how Apple's news release described Differential Privacy: "Starting with iOS 10, Apple is using technology called Differential Privacy to help discover the usage patterns of a large number of users without compromising individual privacy. In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes."
In its developer release notes, Apple got microscopically more specific: "iOS 10 introduces a differentially private way to help improve the ranking of your app’s content in search results. iOS submits a subset of differentially private hashes to Apple servers as users use your app and as NSUserActivity objects that include a deep link URL and have their eligibleForPublicIndexing property set to YES are submitted to iOS. The differential privacy of the hashes allows Apple to count the frequency with which popular deep links are visited without ever associating a user with a link."
Apple has also said: "To obscure an individual’s identity, Differential Privacy adds mathematical noise to a small sample of the individual’s usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience."
Here's where things go off the proverbial rails. Differential Privacy has mathematical limitations, but it's hard to know when those limits have been reached.
There was a terrific writeup on this at Cryptography Engineering, which addressed the programming realities—and the limitations—of this approach.
"It goes without saying that the simple process of 'tallying up the results' and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it. Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you," the story said. "A key observation of the Differential Privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result. For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or gaussian distribution, producing a result that's not quite exact -- but that masks the contents of any given row."
But, the story continued, it gets messy after that.
Although "the amount of information leakage from a single query can be bounded by a small value, this value is not zero. Each time you query the database on some function, the total leakage increases and can never go down. Over time, as you make more queries, this leakage can start to add up. The more information you intend to ask of your database, the more noise has to be injected in order to minimize the privacy leakage," the story noted. "This means that in DP there is generally a fundamental tradeoff between accuracy and privacy, which can be a big problem when training complex ML models. Once data has been leaked, it's gone. Once you've leaked as much data as your calculations tell you is safe, you can't keep going -- at least not without risking your users' privacy. At this point, the best solution may be to just to destroy the database and start over. If such a thing is possible."
Exactly. And that brings us back to the key point. These systems are a convenience and little more. Any data-obscuring tactic is, by its very nature, limited. Take payment-data tokenization: it's pointless if it can't be reversed, such as when processing a return. And if it can be reversed, it's foolish to think it's valueless to a thief.
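A minimal token-vault sketch (hypothetical, for illustration only) makes the point: if the system can map a token back to the card number for a refund, then anyone who compromises the vault can do the same.

```python
import secrets

class TokenVault:
    # Illustrative reversible tokenization: tokens are random stand-ins
    # for card numbers, and the vault keeps the mapping in both
    # directions.
    def __init__(self):
        self._forward = {}   # card number -> token
        self._reverse = {}   # token -> card number

    def tokenize(self, pan: str) -> str:
        if pan in self._forward:
            return self._forward[pan]
        token = secrets.token_hex(8)
        self._forward[pan] = token
        self._reverse[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        # Needed for legitimate flows such as refunds -- and exactly
        # what makes the vault itself a high-value target.
        return self._reverse[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")
recovered = vault.detokenize(token)
```

The token itself carries no card data, which is the whole appeal; but the guarantee holds only as long as the vault is better defended than the data it replaced.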
One of the early big data-breach victims in retail—TJX—famously told the SEC back in 2007 that its attackers had stolen the chain's decryption key, making encryption irrelevant. But not to worry, TJX told the feds, because the attacker stole the data before it was encrypted, so all is well.
And nowhere is this fear more needed than when trying to protect apps. App data is where the bulk of data-fraud activity will happen over the next few years, and it's the most tempting place to cut security corners. After all, proper security limits what apps can do and how they can do it. It will be irresistible for some to believe that the data has been safely obscured and that security corners can therefore be cut.
Don't believe it. There is no replacement for redundant layers of security, and when anyone starts chattering away about "this makes the data worthless to thieves," ask whether your own team can still access it. Then let logic and common sense have their day.