/dec 11, 2020

How Password Hashing Algorithms Work and Why You Never Ever Write Your Own

By Fletcher Heisler

Are you fascinated with cryptography? You're not alone: a lot of engineers are. Occasionally, some of them decide to go as far as to write their own custom cryptographic hash functions and use them in real-world applications. While understandably enticing, doing so breaks the number 1 rule of the security community: don't write your own crypto. 

How do hashing algorithms work and what's special about password hashing? What does it take for an algorithm to get ready for widespread production use? Is security through obscurity a good idea? Let's see. 

How does password hashing work? 

Before storing a user's password in your application's database, you're supposed to apply a cryptographic hash function to it. (You're not storing passwords in plain text, right? Good. Just asking.) 

Any cryptographic hash function converts an arbitrary-length input (a.k.a. "message") into a fixed-length output (a.k.a. "hash", "message digest"). Asecure cryptographic hash function must be: 

  • Deterministic: hashing the same input should always render the same output. 
  • One-way: generating an input message based on a given output should be infeasible. 
  • Collision-resistant: finding two input messages that hash to the same output should also be infeasible. 
  • Highly randomized: a small change in input should lead to a significant and uncorrelated change in output (a.k.a. "the avalanche effect"). Without this property, applying cryptoanalysis methods will allow making predictions about the input based on the output. 

Now, there's general cryptographic hashing, and then there's password hashing that is somewhat special. 

Standard cryptographic hash functions are designed to be fast, and when you're hashing passwords, it becomes a problem. Password hashing must be slow. You want to make it as hard as possible for the attacker to apply brute force attacks to passwords in your database should it ever leak. This is why you want to make passwords hashing computationally expensive. How expensive? Well, it's a tradeoff between convenience for your legitimate users when they validate their passwords and making brute-force attacks hard for the attacker. 

To make hashing computationally expensive, a special kind of functions is commonly used: key derivation functions (KDFs). Under the hood, KDFs invoke hashing functions, but they add a random salt before hashing, and then apply numerous (usually thousands or tens of thousands) iterations of hashing. Ideally, they make brute force attacks both CPU-intensive and memory-intensive. 

A key derivation function produces a derived key from a base key and other parameters. In a password-based key derivation function, the base key is a password and the other parameters are a salt value and an iteration count (RFC 2898: Password-Based Cryptography Specification Version 2.0).

In password hashing discussions, the terms "hash function" (such as MD5 or SHA-1) and "key derivation function" (such as PBKDF2 or Argon2) are often used interchangeably although they're technically not the same. 

Why shouldn't you write your own password hashing algorithm? 

Both writing a custom hashing algorithm and creating your own implementation of a well-known algorithm are bad ideas. Why? 

You probably don't have the skills. Let's face it: cryptography is hard, and messing up an algorithm or implementation is easy, even for professionals. Should you go for creating your own password hashing, some of the things you'd need to take care of include: 

  • Ensuring pre-image resistance to prevent calculating the input based on the hash output. 
  • Ensuring high collision resistance to prevent finding two inputs that hash to the same output. 
  • Randomization and the avalanche effect to make sure attackers can't easily find hashing patterns and correlations between the input and the output. 
  • Resilience to a wide array of side-channel attacks (that is, attacks based on algorithm implementation details and examining the physical effects caused by invoking the implementation), such as timing attacks and cache attacks. 
  • Minimizing any efficiency gains attainable by attackers through the use of cracking-optimized hardware such as ASIC, FPGA, and GPUs. 

This is a lot on your plate - even more so given that you won't have access to qualified testers from the cryptography community to help you find (inevitable) vulnerabilities. 

You'll likely want to depend on secrecy and obscurity by keeping your algorithm private. Doing so breaks the fundamental doctrine of cryptography known as the Kerckhoff's principle: "a cryptosystem should be secure even if everything about the system, except the key, is public knowledge." Security by obscurity can provide a short-term advantage but relying on it long-term is a bad practice: 

  • Hiding vulnerabilities prevents revealing and repairing them as part of an open discussion and increases the probability of exploits. 
  • If your password database ever leaks, there's a good chance that the source code of your application will leak along with it, and as soon as your untested algorithm becomes known to the attacker, they'll have an easy time cracking it. 

You'll put sensitive user data at risk. Leaking sensitive user data is one of the worst things that can happen to a business. This is something that instantly undermines trust, turns customers away, and is very expensive to remediate. Some companies and lots of developers are prone to the Not Invented Here fallacy, but password hashing is probably the worst thing you can choose to re-implement. 

Most importantly, you won't know when your algorithm gets broken. 

Established algorithms and implementations benefit from years of testing and polishing by large communities of cryptography experts who help reveal and fix vulnerabilities without any malicious intent. 

Since your own algorithm and/or implementation won't be available to anyone with a good will, attackers will be the only category of people willing to crack it. Once they do that, they won't give you a heads-up; you'll only know when sensitive data of your users is compromised, and your business is in serious trouble. 

But what if you really want to level up your cryptography and learn by doing? 

That's great! Go forward and practice. Read reference implementations of existing algorithms, play with your own implementations, reach out to the community for advice, and have a great time learning something new and exciting! 

Just don't use whatever you've written in your production applications. 

To learn more, read our vulnerability decoder on insecure crypto

Related Posts

By Fletcher Heisler

Fletcher Heisler is the Director of Developer Enablement at Veracode, where he is focused on finding ways to engage engineering teams and security champions in modern security and DevSecOps practices. Fletcher joined Veracode from the acquisition of Hunter2, a platform for interactive secure code training, where he was founder and CEO. He previously ran Real Python, an online community of hundreds of thousands of software engineers around the world learning modern programming practices.