What every developer should know about password hashing before writing it!

Another leak is in the news: the MacRumors breach, and I am getting annoyed all over again about how often online systems are NOT using best practice when it comes to password storage. It sounds like the system was hacked through account privilege escalation, but that doesn't really matter. If a developer does not have a proper understanding of password hashing, they should not be allowed anywhere near a password system, and they should certainly not be writing their own.

Sadly, much of the information on the web about such matters is inaccurate, too opinionated or out of date, and this is not always easy to notice because most forums do not expire technical content, even though they probably should. Although I'm sure there are plenty of other articles out there about password hashing, some of them written by people who know much more than I do, I want to write an introduction to password hashing: how it is used and why. Hopefully, when people understand it, they will stop following bad practices, and even if we see breaches in the future, users will have less to worry about.

Before we even start, web apps themselves should be secure and follow best-practice guidelines. For instance, using stored procedures in a certain way would mean the web app cannot read everyone's password even when hacked. Likewise, the database connection should not be made using the sa user (the SQL Server system administrator account), which can do anything on the database. These practices are beyond the scope of this article but, in my opinion, every developer should be familiar with OWASP (https://www.owasp.org), who look after security guidance so that other people do not need to do their own research. They have a wealth of information on security and you should know where it is.

Passwords

Right, we all know what passwords are. We hopefully also know that most people use the same password on most of the web apps they are members of. That means that if only one site spills the beans, the other sites are vulnerable. Your web app is not an individual; it is part of a community and you should take that responsibility seriously. The best way to manage passwords is to use a single sign-on service like Google, Twitter or PixelPin, so that the password issue, including how to store passwords securely, is in the hands of companies who specialise in it and haven't glued two pieces of paper together and written your password on it in crayon like some sites appear to!

Doing it your way

I know many of you are saying, "but I don't want to use a third party for reason xyz". In most cases I would disagree with you, but let's assume that you really need to reinvent the wheel and make a user create a totally new account on your site.

Firstly, I hope that you can easily understand why you should never, never, never, never, never store user passwords in plain text. There are so many ways in which database contents can be leaked that this would be criminally negligent. A rogue worker, someone at a data center with machine access, a hacker, another customer who has shared access, a careless developer, a vulnerability in a framework. All of these are risks and a plain text password is plain wrong! What's that? You need to store passwords in plain text so you can send them to users who forget? Please do not do that! Email is insecure in many ways, including people reading over your shoulder but also, many connections are not encrypted and the password is there for anyone reading the network connections.

Encrypting passwords (using symmetric encryption like AES or DES, which is encryption that can be decrypted) is another issue that has surfaced recently due to the Adobe breach. The received wisdom is that although the encryption itself might be very solid (hopefully it is not something old and weak like DES), an attacker might well gain access to the decryption key, which makes every single piece of data decryptable. In most cases you should not use symmetric encryption for passwords; something called hashing is preferred. One reason is that hashing done properly, combined with a strong password, makes cracking the password as good as impossible for anyone.

Encrypting Passwords

So you understand that if a password is not to be stored in plain text, it needs to be 'encrypted' in some way. The three broad schools of cryptography relevant here are symmetric encryption, asymmetric encryption and cryptographic hashing. For the purposes of this discussion, the first two are the same in that encrypted data can be decrypted with a key and, as discussed above, this raises the danger of the key being accessed or stolen by an attacker, at which point none of the data is safe any more. Hashing is different in that the password is transformed and stored but cannot be transformed back. Theoretically, even if an attacker stole the hash, they wouldn't know what the original password was. The application is also unable to reverse it, so how can that be useful?

Hashing Passwords

The trick to using hashing is a property of hashing algorithms (there are various algorithms available, which we will discuss later): if you hash the same data using the same algorithm, it will ALWAYS produce the same output. That output is a "hash", a series of bytes of a fixed length (dependent on the algorithm used), most often seen as a hexadecimal or base-64 encoded string, which makes it easy to read and to transmit across channels like the web which are not binary friendly.

When a user creates an account, you store the hash of their password. When they log in, you hash the password they type and if it matches the hash in the database, they typed in the correct password; otherwise they didn't. Theoretically, because the hash has a fixed length while the input can be anything, more than one password might create the same hash (a "collision"), but this is so unbelievably unlikely that it is not considered a problem. In fact, if an algorithm is found to have too many collisions, it is discredited and not used any more.

As a basic example, if you hash the word "password" (without the quotes) using the common algorithm known as MD5, it will produce a hash that looks like this: 5f4dcc3b5aa765d61d8327deb882cf99. Even though I told you that this hash was produced from "password", there is no known way to directly compute "password" from the hash. It is considered a one-way function. This is a bit like multiplication in maths, where it is very easy to multiply two large numbers to produce a result but much harder to work out what the factors were from the result alone.
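If you want to check this for yourself, here is a minimal sketch in Python (chosen only because it is short; the helper name md5_hex is made up for this example) that reproduces the hash above using the standard hashlib module:

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The same input always produces the same output.
print(md5_hex("password"))  # prints 5f4dcc3b5aa765d61d8327deb882cf99

# A tiny change to the input gives a completely different hash.
print(md5_hex("Password"))
```

MD5 is used here only because it is the example in the text; it is not a recommendation.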

At its most basic level, hashing already adds some security because if someone read your database and saw that your password was stored as 5f4dcc3b5aa765d61d8327deb882cf99, they would not immediately know that your password was "password".

Basic Hashing Weaknesses

There is, however, a problem with just using a pure hash. The weakness is because the hashing algorithm will always produce the same result for the same input so if I hash a load of common passwords (including "password") and store the hashes in a big lookup table, when I come across 5f4dcc3b5aa765d61d8327deb882cf99, I can look it up in my table and see that it was produced from "password".
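To show how little work this takes, here is a rough Python sketch of such a lookup table. The password list and the names md5_hex, LOOKUP_TABLE and crack are invented for illustration; a real attacker's table would hold many millions of entries.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# A tiny stand-in for an attacker's precomputed table of common passwords.
COMMON_PASSWORDS = ["123456", "password", "letmein", "qwerty", "password123"]
LOOKUP_TABLE = {md5_hex(p): p for p in COMMON_PASSWORDS}


def crack(stolen_hash: str):
    """Return the original password if the hash is in the table, else None."""
    return LOOKUP_TABLE.get(stolen_hash)


print(crack("5f4dcc3b5aa765d61d8327deb882cf99"))  # prints password
```

The expensive part (hashing the list) is done once, in advance; after that, each stolen hash is a single dictionary lookup.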

Ineffective Improvements

Developers are a strange breed and sometimes think they understand things that they don't. For instance, somebody decides that rather than using a common hash algorithm, they will invent their own, either from scratch or based on other algorithms. In most instances this produces something that is either no better than a basic algorithm or, in some cases, much weaker. The amount of time and work that has been put into attacking common algorithms is what tells us how strong they are. If your home-made algorithm has not been reviewed in the same way (which it won't be!), then there is no way to know that it is any good. Please don't ever invent your own mechanisms. They are not needed, since the correct way is very easy to do.

Another ineffective improvement is to add a fixed "salt" to every password before storing it. The idea is that the same salt is then added to a typed-in password and the hashes are compared in the same way as before. The thinking is that if I add a salt of, say, "thisismysalt" to the end of "password" before I hash it, I will not get 5f4dcc3b5aa765d61d8327deb882cf99 any more but 1d63491d7f52a91da41213205b422062. In other words, when the attacker sees it in the database, it won't match their lookup table! Win? Sadly not. If the attacker gets enough passwords, they can assume certain things, such as that the most commonly occurring hash is likely to relate to one of the top 10 most common passwords like "password123", "letmein" or "password". From there it will not take long to work out the system used for hashing and what salt is used: all the attacker has to do is start hashing various combinations of the top 10 passwords with data after them and perhaps before them. Some cracking systems can perform billions of such hashes per second, so we might only be talking about minutes. As soon as one breaks and shows that the input was "passwordthisismysalt", it would not take an expert to realise how the passwords are constructed, at which point the attacker simply re-hashes their password list with "thisismysalt" on the end and job done!
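The pattern problem is easy to see in a small Python sketch. Everything here (the user names, the passwords and the function names) is made up for illustration, and MD5 is used only because it is the running example.

```python
import hashlib
from collections import Counter

FIXED_SALT = "thisismysalt"  # the same for every user, as in the flawed scheme


def fixed_salt_hash(password: str, salt: str = FIXED_SALT) -> str:
    """Hash password with the salt appended, as described in the text."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


# Pretend this is a leaked user table: several users chose the same weak password.
leaked = {
    "alice": fixed_salt_hash("password"),
    "bob": fixed_salt_hash("password"),
    "carol": fixed_salt_hash("x7!kQ2#pLm"),
    "dave": fixed_salt_hash("password"),
}

# Identical passwords still give identical hashes, so the most frequent
# hash is the obvious place for an attacker to start guessing.
most_common_hash, count = Counter(leaked.values()).most_common(1)[0]
print(most_common_hash, "is shared by", count, "users")


def rebuild_table(passwords, salt):
    """Once the salt is known, the whole lookup table is simply re-hashed."""
    return {fixed_salt_hash(p, salt): p for p in passwords}


table = rebuild_table(["letmein", "password", "password123"], FIXED_SALT)
print(table.get(most_common_hash))  # prints password
```

A fixed salt only costs the attacker one extra pass over their password list.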

Another popular method is to perform multiple hashes on the same data. Rather than running MD5 once, you run it in a loop, hashing the hash, say, 1000 times. This adds some amount of time to the process, both for the attacker and for the system itself, but on its own it still doesn't really help when the attacker has enough data. In the same way as when attacking a fixed salt, they can try various numbers of hash iterations to find what they are looking for, since they can still assume that the most common hash probably relates to one of the top 10 most common passwords.
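As a rough Python sketch of what this looks like (the function name iterated_md5 is made up, and feeding the raw digest bytes back in is just one of several ways people do this):

```python
import hashlib


def iterated_md5(password: str, rounds: int = 1000) -> str:
    """Hash the password, then keep hashing the previous digest."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.md5(digest).digest()
    return digest.hex()


# Each guess now costs 1000 hashes instead of one...
print(iterated_md5("password"))

# ...but two users with the same password still get the same stored value,
# so the pattern in the database is exactly as visible as before.
print(iterated_md5("password") == iterated_md5("password"))  # prints True
```

The loop slows everyone down a little, but without a per-user salt it does nothing to hide which users share a password.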

Using Salt Properly

If you are going to add a salt to a hash, at the very minimum it should be a different salt per user. Ideally, it should be random, relatively long (i.e. hard to guess) and never re-used, either across users or for the same user (otherwise, if historical data was eventually cracked, the information could be used to attack newly hashed data). PHP now includes very easy-to-use password hashing functions and these should be used if they are available. They are shortcuts to using bcrypt directly, which makes things much easier for people who do not understand all the options. The defaults are good and they can be improved over time. .NET has security classes that provide the same functionality, depending on what exactly you want.

The beauty of a "variable salt" is that you no longer have patterns of data in your database. You can no longer determine which of the hashes relates to "password" and which relates to "thisisaveryhardpasswordtocrack becAuseitislongandh@sweirdcharacter£init". This makes the attacker's work much harder even when, as is the case with bcrypt, the salt is stored alongside the password hash.
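Here is a minimal Python sketch of a per-user random salt. It is an illustration under my own assumptions rather than a recommended implementation: the function names are made up, the iteration count is arbitrary, and PBKDF2 from the standard hashlib module stands in for bcrypt, which is not in Python's standard library.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative figure only


def hash_with_random_salt(password: str):
    """Return (salt, digest) using a fresh 16-byte random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the typed password with the stored salt and compare."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with unrelated stored values.
salt_a, hash_a = hash_with_random_salt("password")
salt_b, hash_b = hash_with_random_salt("password")
print(hash_a.hex())
print(hash_b.hex())

# The salt is stored next to the hash, so checking a login still works.
print(check_password("password", salt_a, hash_a))  # prints True
```

Because every row has its own salt, a precomputed table is useless and the "most common hash" trick no longer reveals anything.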

Is Pepper Good?

Another tactic that is often cited but should not be confused with the purpose of salt is pepper. The idea is that pepper is a deterministic way of introducing additional data to the password before hashing but it is not stored with the hash or salt in the database so is unknown to an attacker who only has the database data and not the source code. It could be something like the userid transformed in some way such as reversed, upper cased and then perhaps with a long fixed string added to the end so that it is still different per user but does not have to be cryptographically random like salt should be.
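A sketch of the kind of pepper described here might look like the following in Python. The fixed string, the function names and the iteration count are all hypothetical, and PBKDF2 is again used only as a standard-library stand-in for the slow hash.

```python
import hashlib
import os

# Hypothetical fixed string, kept in code or configuration and never in the database.
PEPPER_SUFFIX = "a-long-fixed-string-that-lives-outside-the-database"
ITERATIONS = 100_000  # illustrative figure only


def pepper_for(user_id: str) -> str:
    """Derive the pepper from the user ID: reversed, upper-cased, plus a fixed string."""
    return user_id[::-1].upper() + PEPPER_SUFFIX


def hash_password(user_id: str, password: str, salt: bytes) -> bytes:
    """Add the pepper to the password, then hash it with the per-user random salt."""
    peppered = password + pepper_for(user_id)
    return hashlib.pbkdf2_hmac("sha256", peppered.encode("utf-8"), salt, ITERATIONS)


salt = os.urandom(16)  # stored in the database as usual
stored = hash_password("alice42", "password", salt)

# An attacker with only the database (salt and hash) and no pepper hashes the
# wrong input, so even a correct password guess does not match.
guess_without_pepper = hashlib.pbkdf2_hmac("sha256", b"password", salt, ITERATIONS)
print(guess_without_pepper == stored)  # prints False
```

The pepper only helps for as long as the code or configuration that holds it stays out of the attacker's hands.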

Defeating attackers with size and speed

There are, above these other techniques, two ways to defeat an attacker. The first uses the size of an algorithm's output to make it much less likely that an attacker can cover the required number of "guesses" before they find a match. For instance, MD5 only produces an output of 128 bits, which means that guessing every value of MD5 would take about 3.4 x 10^38 attempts, and an average hit would take half of that, about 1.7 x 10^38. This sounds like an impossible task, but with some other data to hint at the answer (such as a list of likely passwords) and enough computing power (in excess of 7 billion guesses per second), these attacks are quite trivial. Compare this with something like SHA-512, which produces 512 bits of output: the number of possible values is now about 1.3 x 10^154, which is massively larger than for MD5. It is also slower to guess (circa 200M/second), so this is one way to hinder an attacker.
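The arithmetic is easy to check. This short Python sketch works out the size of each output space and, using the 7 billion guesses per second figure quoted above, how long blind guessing of MD5 values would take on average (the variable names are mine):

```python
MD5_SPACE = 2 ** 128      # number of possible 128-bit outputs
SHA512_SPACE = 2 ** 512   # number of possible 512-bit outputs

print(f"MD5 outputs:         {MD5_SPACE:.1e}")       # prints 3.4e+38
print(f"MD5 average guesses: {MD5_SPACE / 2:.1e}")   # prints 1.7e+38
print(f"SHA-512 outputs:     {SHA512_SPACE:.1e}")    # prints 1.3e+154

# Average time to hit one specific MD5 value by blind guessing.
GUESSES_PER_SECOND = 7e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = (MD5_SPACE / 2) / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print(f"Years of blind guessing: {years:.1e}")
```

The number of years is astronomically large, which is why real attacks do not guess hash values blindly but hash likely passwords instead.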

The alternative, arguably cleverer, way is to slow down the process deliberately. Imagine if your hash algorithm took, say, 300 milliseconds to compute. This would not be noticeable when one user was logging into a system, but it would slow down an attacker, who could no longer try millions of guesses per second but just 3 per core per second! bcrypt, which is based on Blowfish, is designed exactly for that and in many ways is a great defence for passwords. The only major problem is that the slowness is a deliberate processing overhead on every login, which can make it unsuitable for low-powered devices, although even that is likely to become less of a problem over time.
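You can get a feel for the effect with a small timing sketch in Python. Since bcrypt is not in Python's standard library, this uses PBKDF2 from hashlib as a stand-in for a deliberately slow hash; the function names and iteration counts are my own, and the timings will vary from machine to machine.

```python
import hashlib
import os
import time


def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte hash; more iterations means more work per guess."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def seconds_per_hash(iterations: int) -> float:
    """Time a single hash at the given iteration count."""
    salt = os.urandom(16)
    start = time.perf_counter()
    slow_hash("correct horse battery staple", salt, iterations)
    return time.perf_counter() - start


for iterations in (1, 10_000, 200_000):
    elapsed = seconds_per_hash(iterations)
    print(f"{iterations:>7} iterations: {elapsed * 1000:.2f} ms per guess")
```

The idea is to raise the cost until a single login takes a noticeable fraction of a second on your own hardware; a legitimate user pays that once, while an attacker pays it for every guess.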

Recommended Setup

Whatever I recommend is likely to end up being controversial but I might as well be brave! If you are using PHP, the following code is all you need to store a password and check it again afterwards:

// Note: requires PHP 5.5. Look at each function here http://us3.php.net/manual/en/ref.password.php to find equivalent code for earlier versions
function signup($username, $userpassword /* , etc... */)
{
   // Do whatever checks are needed
   // password_hash generates a random salt and stores it inside the returned string
   $hashedPassword = password_hash($userpassword, PASSWORD_DEFAULT);

   // Save $username and $hashedPassword to the database
}

function authenticate($username, $userpassword)
{
   // Get the stored row from the database where username = $username into $row (query not shown)
   if (password_verify($userpassword, $row->password))
   {
      // Success
   }
   else
   {
      // Failure
   }
}

One of the great things with these functions is that you can upgrade the "cost" or algorithm of the password hash as you go along. You can then test the existing database entries with password_needs_rehash() to see whether it is out-of-date. If so, check the password entered matches and if so, rehash the entered password and update the database!
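The same upgrade-on-login idea can be sketched in Python. This is not how the PHP functions are implemented; it is a hypothetical illustration in which the stored string carries its own iteration count, PBKDF2 stands in for bcrypt, and all the names and numbers are mine.

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 200_000  # the cost you currently want; illustrative only


def make_hash(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Return 'iterations$salt$hash' so the settings travel with the hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify(password: str, stored: str) -> bool:
    """Re-hash the typed password with the stored salt and iteration count."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


def needs_rehash(stored: str) -> bool:
    """True if the stored hash was made with a lower cost than we want now."""
    return int(stored.split("$")[0]) < CURRENT_ITERATIONS


def login(password: str, stored: str):
    """Return (success, value_to_store); the value changes if it was upgraded."""
    if not verify(password, stored):
        return False, stored
    if needs_rehash(stored):
        # Only possible at login, because only now do we have the plain password.
        stored = make_hash(password)
    return True, stored


old = make_hash("hunter2", iterations=1_000)   # a hash made with an old, low cost
ok, upgraded = login("hunter2", old)
print(ok, upgraded.split("$")[0])  # prints True 200000
```

The upgrade has to happen at login because that is the only moment the application sees the plain-text password.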

.NET is a little different in that there is no built-in Blowfish implementation. You can do one of two things: bring in another library like BouncyCastle that provides bcrypt, or use PBKDF2 instead, which is designed to do a similar thing to bcrypt but is not so cost-intensive. You can achieve this as follows:

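// Disclaimer: I have not used this exact code so it might not work out of the box. I use PBKDF2 to create encryption keys using code like this
// Requires: using System; using System.Security.Cryptography;
public void signup(String username, String password)
{
   // Do whatever checks are needed
   // Generate a random 16-byte salt with a cryptographic random number generator
   var randomSalt = new byte[16];
   using (var rng = RandomNumberGenerator.Create())
   {
      rng.GetBytes(randomSalt);
   }

   var rfc2898 = new Rfc2898DeriveBytes(password, randomSalt);
   var hashedPassword = rfc2898.GetBytes(32);
   // Store salt and hash as base-64 with a dollar between them to make it easier to split later
   var combinedHash = Convert.ToBase64String(randomSalt) + "$" + Convert.ToBase64String(hashedPassword);

   // Save the username and combinedHash in the database
}

public void authenticate(String username, String password)
{
   // Get hashed password from database where username = username.
   // GetCombinedHash is a hypothetical helper standing in for your own data access code.
   String combinedHash = GetCombinedHash(username);

   // Split hashed password from database into dbsalt and dbpassword using the $ symbol
   var parts = combinedHash.Split('$');
   var dbsalt = Convert.FromBase64String(parts[0]);
   var dbpassword = Convert.FromBase64String(parts[1]);

   var rfc2898 = new Rfc2898DeriveBytes(password, dbsalt);
   var hashedPassword = rfc2898.GetBytes(32);        // This should match what is in the database if successful

   // Compare byte by byte: == on byte arrays only compares references, not contents
   var difference = hashedPassword.Length ^ dbpassword.Length;
   for (var i = 0; i < hashedPassword.Length && i < dbpassword.Length; i++)
   {
      difference |= hashedPassword[i] ^ dbpassword[i];
   }

   if ( difference == 0 )
   {
      // Success
   }
   else
   {
      // Failure
   }
}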

Edit: Thanks to Duncan Smart for pointing out that MS already do what I was attempting above here: http://msdn.microsoft.com/en-us/library/system.web.helpers.crypto.hashpassword and http://msdn.microsoft.com/en-us/library/system.web.helpers.crypto.verifyhashedpassword

Conclusion

If you follow the suggestions above, then what an attacker gets is a randomised salted hash. They have few if any clues (they may or may not guess which hash algorithm you are using), so they would have to resort to some kind of brute force. Even if they had a known plaintext (one of the hacked hashes is for a password they know), they would have to spend some time working out what system is in use. Even if they eventually work out that you are using bcrypt with a particular cost setting, and they know the salt from the database, they would have to construct a brute-force attack against each hash, one at a time, and they may or may not pick a hash that was generated from an easy password. All of this would significantly slow down the attacker. It would not prevent them from cracking any passwords, but it would certainly make them think twice about whether the effort was worth it. If you added some pepper to the system and they did not have the code, they would probably not be able to crack any passwords at all.