How are passwords cracked?

Introduction

Every day it seems there is a new article in the newspaper about how some system was hacked and passwords stolen. How is this achieved and how can you make your passwords stronger so it is less likely that you will be a victim?
Sadly, most password hacks are not carried out by people with advanced skills and telephone repairman uniforms who attach crocodile clips to electronic panels inside buildings. Almost always, an easily preventable flaw in the design of a web site gives an attacker information that can then be used to obtain the database behind the site, which stores the details of the passwords. The attacker copies this data and processes it on a system somewhere until the passwords are cracked. Sometimes the breach is made public, but I suspect it often is not, either because the victim never knew about it or because of fear of lost reputation or legal proceedings.
Anyway, I want to describe in straightforward terms how passwords are cracked. I will not describe how the original site might be hacked, only what happens after the database has been obtained.

Web Sites and Databases

Firstly, "database" is a technical term for something that can be pictured as a spreadsheet or several sheets of squared paper. Each sheet represents what is called a table, and one of these will probably be called "User" or something similar (the name is not important). Inside the "User" table there is a row for each user, and each column contains some piece of information about that user, such as name, age, email address, user id, password etc. The site might collect and store all kinds of information, but for this example we will assume that each row contains only the user id, password and email address, something like this:

 User id    password     email address
=======================================
 lb1        Password123  luke@gmail.com

If the site is poorly written, the passwords are simply stored as plain text and can be read or used at will by the attacker. The most dangerous aspect of this is that your email address and password will very likely give access to many of your accounts on other sites, since most people reuse passwords. Obviously, in this case it does not matter how good your password is, since the attacker has to do no work other than hack the web site and read the data.

Hashing

Most sites, in my experience, do not store passwords in plain text but use something called hashing, which turns the plain-text password into something that looks completely different. If we use a method called MD5 (don't worry about the name), the password "Password123" always becomes "42f749ade7f9e195bf475f37a44cafcb". Why is that useful? Firstly, you cannot tell what the password is directly, since there is no obvious connection between the password and this "hash" code. More importantly, a good hashing method makes it 'infeasible' (very hard) to compute the original password just from knowing the hash. When a user logs into the web site, we don't need to check the actual password: we compare the hash of the password they type in with the hash of the password they registered with. Since the hash always produces the same output for a given password, this works.
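A minimal Python sketch of storing and checking a hashed password, using the standard hashlib module. The function names are made up for illustration, and MD5 is used only to match the example above, not as a recommendation for a real site.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# What the site stores at registration time: only the hash, never the password.
stored_hash = md5_hex("Password123")


def check_login(typed_password: str, stored: str) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    # compare_digest compares without leaking timing information
    return hmac.compare_digest(md5_hex(typed_password), stored)


print(stored_hash)
print(check_login("Password123", stored_hash))  # True
print(check_login("password123", stored_hash))  # False
```

Note that the site never needs to keep the password itself; one changed character produces a completely different hash, so the second login fails.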

Hashing Weaknesses

Sadly, although hashing sounds wonderful, its biggest problem is that the hashing methods are generally all publicly available, so I can carry out something called a reverse lookup attack. This means I compute the hash codes of thousands or millions of obvious passwords like password, password123, letmein etc. and store them all in a large computerised lookup table. If I then obtain a hashed password like "42f749ade7f9e195bf475f37a44cafcb", I can look it up in my pre-computed table and hopefully find a match.
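The reverse lookup can be sketched in a few lines of Python. The short list of "obvious" passwords is a hypothetical stand-in for the millions of entries a real attacker would pre-compute.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the attacker's list of obvious passwords.
common_passwords = ["password", "password123", "Password123", "letmein", "123456"]

# Built once, in advance: maps each hash back to the password that produced it.
lookup_table = {md5_hex(pw): pw for pw in common_passwords}


def reverse_lookup(stolen_hash: str) -> Optional[str]:
    """Return the original password if the hash is in the table, else None."""
    return lookup_table.get(stolen_hash)


stolen = md5_hex("Password123")  # pretend this came from the stolen database
print(reverse_lookup(stolen))  # Password123
print(reverse_lookup(md5_hex("correct horse battery staple")))  # None
```

The expensive work (hashing every candidate) is done once, before any database is stolen; each later lookup is practically instant.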

Defeating Reverse Lookups with Salt

Salt is some additional text that we add to the password, both when it is first registered and whenever the user logs in, after which we carry out the same matching process as before to verify the password. Why does this work differently? Let us take the example of Password123 again, but this time add some random data to the end of it, such as (&*^(, and hash it again. This time, the hash produced with MD5 is "8bafefe15f21e75dd0e084ecd25752b2", which bears no relation to the original hash "42f7...". Note that the password is still the same: the user does not have to type the random data in, because the web site adds it automatically. This works pretty well and means that anyone attempting a reverse lookup on this hash is unlikely to have pre-computed the effective password "Password123(&*^(". If the salt is long enough, the chances of cracking it with a reverse lookup are very low for any password, however simple.
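A sketch of a site-wide salt, assuming the (&*^( salt from the example is appended to every password before hashing with MD5. The attacker's table below is the same kind of unsalted lookup table as before, and it no longer finds a match. The names are illustrative.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


SITE_SALT = "(&*^("  # the example salt; in practice it would be longer and kept secret


def salted_hash(password: str) -> str:
    """The site appends the salt automatically; the user never types it."""
    return md5_hex(password + SITE_SALT)


def check_login(typed_password: str, stored_hash: str) -> bool:
    return salted_hash(typed_password) == stored_hash


# The attacker's pre-computed table of *unsalted* common passwords.
lookup_table = {md5_hex(pw): pw for pw in ["password", "Password123", "letmein"]}

stored = salted_hash("Password123")
print(check_login("Password123", stored))  # True
print(lookup_table.get(stored))  # None: the salted hash is not in the table
```

Logging in still works because the site adds the same salt every time, but the attacker would need a table built for this particular salt.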

Defeating Salt with Human Behaviour

Although salt appears to add enough randomness to prevent what would seem to be the only possible attack, it has a weakness, and that relates to human beings not liking passwords and therefore many people using the same password as each other. For the salting above to work, the random data must always be the same so that the hashes can be matched to determine whether a password is correct. In other words, if 10,000 people use the same password, even with salt, the hash value for each of them will be the same. Why is this useful? If you have enough users in the data you have stolen, you can work out which hashes are the most common and assume they correspond to the 10 most common passwords that people use: password, 123456, 12345678, abc123, qwerty, monkey, letmein, dragon, 111111, baseball. As an attacker you then have a much simpler task: take the password "password", add candidate salt values to the end of it and compute the hash of each until one matches one of the 10 most common hashes in your stolen data. So you compute the hash of "password", "password0", "password1", ... "passwordabfgdhjk" and so on. Naturally we would assume the salt wouldn't be something stupid like "salt", but you never know!
This attack is quite straightforward and, for a short salt, can be accomplished in anything from seconds to minutes. Importantly, once the salt value has been determined for one password, it can be used in the calculation of all the other passwords, since the one piece of truly secret data has now been recovered.
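The attack on a single shared salt can be sketched as below. Everything in it is hypothetical: the user names, their passwords, the secret salt "q7" and the assumption that the salt uses only lower-case letters and digits. The salt is deliberately tiny so the search finishes instantly; a longer salt takes more guesses, but the steps are the same.

```python
import hashlib
import itertools
import string
from collections import Counter


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# --- Simulated stolen data (hypothetical) ---
SECRET_SALT = "q7"  # one salt shared by every user, unknown to the attacker
users = {
    "alice": "monkey", "bob": "password", "carol": "password",
    "dave": "dragon", "erin": "password", "frank": "Xk#29!pL",
}
stolen = {user: md5_hex(pw + SECRET_SALT) for user, pw in users.items()}

# Step 1: identical passwords give identical hashes, so count them.
most_common_hash, count = Counter(stolen.values()).most_common(1)[0]

# Step 2: assume that hash is a popular password and try every short suffix.
TOP_PASSWORDS = ["password", "123456", "12345678", "abc123", "qwerty",
                 "monkey", "letmein", "dragon", "111111", "baseball"]
ALPHABET = string.ascii_lowercase + string.digits  # assumed salt characters


def find_salt(target_hash, max_len=2):
    for length in range(1, max_len + 1):
        for combo in itertools.product(ALPHABET, repeat=length):
            salt = "".join(combo)
            for pw in TOP_PASSWORDS:
                if md5_hex(pw + salt) == target_hash:
                    return pw, salt
    return None


result = find_salt(most_common_hash)
print(count, result)  # 3 ('password', 'q7')

# Step 3: with the salt known, every other account is an ordinary lookup.
if result:
    _, salt = result
    table = {md5_hex(pw + salt): pw for pw in TOP_PASSWORDS}
    cracked = {user: table.get(h) for user, h in stolen.items()}
    print(cracked)
```

Only "frank", who avoided the popular passwords, survives step 3.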

Defeating Human Behaviour with Variable Salt

There is hope, however, and it involves using a salt that is different for each user. There is not much point in making the salt depend only on the password (for example, repeating the password before hashing: "password" >> "passwordpassword" >> hash), since identical passwords would still produce identical hashes and the patterns in the database would remain. One simple approach is to add the username to the password before hashing it: "password" >> "passwordlb1" >> hash. This means a different user can have the same password as me without producing the same hash, because their salt is different. This removes the patterns from the stolen data, which makes the attacker's job harder, but cracking is still possible; it now depends on brute force.
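A sketch of the per-user salt described above, assuming (as in the example) that the user name is simply appended to the password before hashing with MD5. The function name is illustrative.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def per_user_hash(password: str, username: str) -> str:
    """Hypothetical scheme: append the user name as the salt, then hash."""
    return md5_hex(password + username)


# Two users who happen to choose the same password.
stored = {
    "lb1": per_user_hash("password", "lb1"),
    "xyz": per_user_hash("password", "xyz"),
}
print(stored["lb1"] == stored["xyz"])  # False: same password, different hashes
```

Counting repeated hashes, as in the shared-salt attack, no longer reveals which accounts share a password.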

Defeating Variable Salt with Brute Force

Brute force generally means running through all possible combinations of data to find the real value you are looking for. Since there are trillions of possible passwords, the chances of getting through all of them (or, statistically, half of them) are pretty small, so this all sounds good. There are two problems, however. Firstly, some password-cracking systems are VERY fast and can try billions of passwords a second. Secondly, even though there are a huge number of passwords that people could use, you can still assume that many of the user accounts will use one of the 100 most popular passwords. In this case you know the input (or at least that it is one of 100 different values) and you know the hash value(s) from the stolen data, so all you have to do is work out the salt for just one password. That will probably reveal the mechanism by which the salt is added, which again opens the door to cracking the remainder.
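Working out the salting mechanism from one popular password can be sketched as follows. The list of candidate schemes is a hypothetical set of guesses an attacker might try, and the stolen row is simulated.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


TOP_PASSWORDS = ["password", "123456", "12345678", "abc123", "qwerty",
                 "monkey", "letmein", "dragon", "111111", "baseball"]

# Hypothetical guesses at how a site might combine password and salt.
SCHEMES = {
    "password+username": lambda pw, user: pw + user,
    "username+password": lambda pw, user: user + pw,
    "password+password": lambda pw, user: pw + pw,
    "plain password": lambda pw, user: pw,
}


def discover_scheme(username, stolen_hash):
    """Try every scheme with every popular password; return the first match."""
    for name, combine in SCHEMES.items():
        for pw in TOP_PASSWORDS:
            if md5_hex(combine(pw, username)) == stolen_hash:
                return name, pw
    return None


# Simulated stolen row: user "lb1" chose a popular password.
stolen_hash = md5_hex("monkey" + "lb1")
print(discover_scheme("lb1", stolen_hash))  # ('password+username', 'monkey')
```

Once one account reveals the scheme, the attacker can apply it to every other row in the stolen data.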

Strong Passwords and Defeating Brute Force

Even if the attacker has learned, for example, that the user id is added to the end of each password before hashing, imagine they now want to determine the password for user xyz. Effectively, they are trying to find the unknown password P for which Hash(P + "xyz") equals the hash stored for xyz, and this is where strong passwords really win.
At this point the attacker has to compute hashes for as many candidate passwords as they can think of, checking each result and trying again if it does not match. Attackers build up dictionaries of common (and not so common) passwords like "password", "password123" and "monkey", plus dictionary words like "hotel" and "climate", and these include all the likely substitutions of capitals, numbers and punctuation such as "h0tel", "Hotel", "Hot3l"; you get the idea. This is why English dictionary words are bad, even if you think you are being clever with substitutions. Adding things like 123 to the end of your password is also common, so don't bother with that.
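A minimal sketch of a dictionary attack with substitutions against user xyz, assuming the attacker already knows the user id is appended before hashing. The substitution rules, the tiny dictionary and the function names are illustrative assumptions; real cracking tools use much larger rule sets.

```python
import hashlib
import itertools
from typing import Iterator, List, Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Illustrative substitution rules (letter -> characters it might become).
SUBSTITUTIONS = {"a": "a@4", "e": "e3", "i": "i1", "o": "o0", "s": "s5$"}


def variants(word: str) -> Iterator[str]:
    """Yield every mix of capitals and substitutions, with and without '123'."""
    choices_per_letter = []
    for ch in word.lower():
        choices = set(SUBSTITUTIONS.get(ch, ch)) | {ch.upper()}
        choices_per_letter.append(sorted(choices))
    for combo in itertools.product(*choices_per_letter):
        candidate = "".join(combo)
        yield candidate
        yield candidate + "123"  # the common "add 123 on the end" habit


def crack(user_id: str, stolen_hash: str, dictionary: List[str]) -> Optional[str]:
    """Try Hash(candidate + user_id) for every variant of every word."""
    for word in dictionary:
        for candidate in variants(word):
            if md5_hex(candidate + user_id) == stolen_hash:
                return candidate
    return None


DICTIONARY = ["password", "monkey", "hotel", "climate"]
stolen_hash = md5_hex("H0t3l" + "xyz")  # simulated stolen row for user xyz
print(crack("xyz", stolen_hash, DICTIONARY))  # H0t3l
```

Even this toy rule set turns "hotel" into 144 candidates, which a fast hash checks almost instantly.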
Once the attacker has exhausted their "dictionary", the only option left is to start with, say, "a" and go through every character before moving on to "aa" and so on. Taking into account lower-case and capital letters, digits and punctuation, even an eight-character password has somewhere between about 5e+14 combinations (a 5 followed by 14 zeros, with around 70 allowed characters) and 6e+15 (with all 94 printable ASCII characters), which takes days to months to get through even at a billion guesses a second. Somebody might spend that time on a specific account they are trying to attack (like President Obama's) but is unlikely to bother for some random person's Gmail account. In other words, long random passwords are good, so what are your options?
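The arithmetic behind these numbers is easy to check. This sketch assumes an attacker who can try one billion guesses a second (the "billions of passwords a second" mentioned earlier) and counts only passwords of exactly the given length; the function names are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
GUESSES_PER_SECOND = 1_000_000_000  # assumption: a fast hash on fast hardware


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


def years_to_try_all(combinations: int) -> float:
    """Worst case; on average the password is found about halfway through."""
    return combinations / GUESSES_PER_SECOND / SECONDS_PER_YEAR


# 62 = letters and digits, 70 = about the alphabet that gives 5e+14 for eight
# characters, 94 = all printable ASCII characters
for alphabet_size, length in [(62, 8), (70, 8), (94, 8), (94, 16)]:
    n = keyspace(alphabet_size, length)
    print(f"{alphabet_size} symbols, length {length}: "
          f"{n:.1e} combinations, {years_to_try_all(n):.1e} years")
```

Doubling the length from 8 to 16 characters squares the number of combinations, which is why length matters more than clever substitutions.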

- Use a 'truly' random long password, say 16 characters, which you store in an electronic key chain like KeePass. This way you don't need to remember it; you just copy it from the electronic key chain. This is less useful when you are out and about, unless you have access to the key chain.
- Remember a sentence and use the first letters of each word to form the password. For instance, doing that with the first sentence in this bullet point gives you rasautfloewtftp, which is nicely random (and long)!
- Use a long sentence if the site allows you to. Sadly, many sites restrict passwords to something like 10 characters, but if not, you could use something like "ThisIsMyPasswordAndYouWillNeverRememberIt", although making it slightly more random or personal would help.
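The first two options can be sketched in Python: generating a random password with the standard secrets module (rather than random, which is not designed for security) and turning a sentence into its first letters. The function names are illustrative.

```python
import re
import secrets
import string


def random_password(length: int = 16) -> str:
    """A 'truly' random password, meant to be kept in a key chain program."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


def first_letters(sentence: str) -> str:
    """Take the first letter of each word, lower-cased."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return "".join(word[0].lower() for word in words)


print(random_password())
print(first_letters("Remember a sentence and use the first letters "
                    "of each word to form the password."))  # rasautfloewtftp
```

The random password draws each of its 16 characters from the 94 printable ASCII symbols, so it will not appear in any attacker's dictionary.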