Using symmetric encryption in the correct way

Introduction

As I have learned more about cryptography, I have discovered that there are more than a few ways to get things wrong, or at least to expose more attack vectors than we realised. We might think that using AES-256 is good enough in itself, but the cipher is only one link in the chain.

Using AES badly

Firstly, imagine you use AES-256 but you don't understand what an initialization vector (IV) is. You might do this:

// Bad: the same 16-byte IV is reused for every encryption
$ciphertext_raw = openssl_encrypt( $clear_text, "aes-256-cbc", $key, OPENSSL_RAW_DATA, "FIXED 16 BYTE IV" );

Depending on what you are encrypting, this can leak information about your data even though the key stays secret. If you reuse the same IV (and the same key) for every encryption, then encrypting the same "plain text" (I use this term for any data that is encrypted, whether text or not) always produces the same encrypted data.

Why is this bad? Suppose an attacker has an account on the target site with the password "password123" and discovers that it encrypts to 0xABC123. If they then see that another person's password also encrypts to 0xABC123, they know, without knowing the encryption key, that the other person's password is also "password123". If the other person is identifiable because their email address or login is stored in plain text, the attacker can easily gain access to the victim's account. Note that the encryption itself has not been broken, yet an attack has still been possible. A similar weakness was exposed by the Adobe breach, where identical passwords produced identical encrypted values.
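
The effect is easy to reproduce. The following is a minimal sketch, not production code: the key, the all-zero IV and the variable names are made up for the demonstration.

```php
<?php
// Hypothetical demo key (32 raw bytes) and a deliberately bad, never-changing IV.
$key = hash( "sha256", "demo key, not for production", true );
$fixed_iv = str_repeat( "\0", 16 );

$alice = openssl_encrypt( "password123", "aes-256-cbc", $key, OPENSSL_RAW_DATA, $fixed_iv );
$bob   = openssl_encrypt( "password123", "aes-256-cbc", $key, OPENSSL_RAW_DATA, $fixed_iv );
$carol = openssl_encrypt( "correct horse", "aes-256-cbc", $key, OPENSSL_RAW_DATA, $fixed_iv );

// Same plain text, same key, same IV: the encrypted values are byte-for-byte identical.
var_dump( $alice === $bob );   // bool(true)
// A different plain text gives a different value, so matches really do reveal equal passwords.
var_dump( $alice === $carol ); // bool(false)
```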

Using AES with an Initialization Vector

OK, so you know that you need an initialization vector for block cipher modes like AES-256-CBC, but did you know that the IV does not need to be secret? Why? Because it is not a second key: its sole purpose is to randomise the output so that encrypting the same thing multiple times produces completely different encrypted data. It does need to be different and unpredictable for each encryption, which is why it is generated randomly. Now the attacker looks through the table and can't find any encrypted passwords that match their own. Even though the IVs might be visible in the table, it is not feasible to "subtract" the IV from the encrypted data to find out what the underlying data is, or to substitute the attacker's IV for the victim's. The only way for the attacker to break the system would be to brute-force the encryption key, and for AES-256 the number of possible keys is so large as to make this practically impossible. Generating a fresh IV might look like this:

// A fresh random 16-byte IV for every encryption; store it alongside the ciphertext
$iv = openssl_random_pseudo_bytes(16);
$ciphertext_raw = openssl_encrypt( $clear_text, "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );
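
Because the IV is needed again for decryption, it has to be stored somewhere. One common arrangement, sketched here under my own assumptions, is to prepend it to the encrypted data. The function names encrypt_with_iv and decrypt_with_iv are hypothetical and the key is a demo value.

```php
<?php
$key = hash( "sha256", "demo key, not for production", true ); // hypothetical 32-byte key

// Encrypts with a fresh IV and returns base64( IV . ciphertext ).
function encrypt_with_iv( string $clear_text, string $key ): string {
    $iv = openssl_random_pseudo_bytes( 16 );
    $ciphertext_raw = openssl_encrypt( $clear_text, "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );
    return base64_encode( $iv.$ciphertext_raw ); // the IV is stored in the clear
}

// Splits the stored value back into IV and ciphertext, then decrypts.
function decrypt_with_iv( string $stored, string $key ) {
    $raw = base64_decode( $stored );
    $iv = substr( $raw, 0, 16 );
    $ciphertext_raw = substr( $raw, 16 );
    return openssl_decrypt( $ciphertext_raw, "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );
}

$first  = encrypt_with_iv( "password123", $key );
$second = encrypt_with_iv( "password123", $key );

// Same plain text and key, but different IVs, so the stored values no longer match.
var_dump( $first === $second );
// Both still decrypt to the original value.
var_dump( decrypt_with_iv( $first, $key ) === "password123" );
```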

Still not perfect without signing

Well, I say "the only way to break the system" but this is not strictly true. It would be easy to stop here and assume we have achieved security, but there are still two things that an attacker could do; which one applies depends on the type of data that is encrypted. Note that neither of these attacks the cipher itself; they bypass what the encryption is trying to achieve.

The first of these is a tamper attack. Imagine an attacker has access to their company's payroll database and the salary column is encrypted. They want to modify their own salary but obviously won't be able to break the encryption. What the attacker can do is change bits of the encrypted data and hope to achieve a favourable result. In CBC mode this is not even pure guesswork: flipping a bit in the stored IV flips the same bit in the first block of the decrypted data. Bearing in mind that the decrypted salary column quite possibly becomes an integer in the software, it is more than feasible that a few fiddles with the encrypted data will, once the result is cast to an integer, achieve a favourable (and not too obvious) result. This is easier if the attacker can keep testing the change until it is just right, perhaps in the front-end or using other people's accounts.
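
One concrete form of this, assuming the IV is stored next to the encrypted data where the attacker can edit it, is to flip bits in the IV. This is only an illustrative sketch with a made-up key and salary value, and it assumes the attacker can guess the first character of the plain text.

```php
<?php
$key = hash( "sha256", "demo key, not for production", true ); // hypothetical 32-byte key
$iv = openssl_random_pseudo_bytes( 16 );
$ciphertext_raw = openssl_encrypt( "1000", "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );

// The attacker never sees the key. They guess that the salary starts with "1"
// and XOR the first IV byte with the difference between "1" and "9".
$tampered_iv = $iv;
$tampered_iv[0] = chr( ord( $iv[0] ) ^ ord( "1" ) ^ ord( "9" ) );

// The application later decrypts with the stored (now tampered) IV.
$salary = openssl_decrypt( $ciphertext_raw, "aes-256-cbc", $key, OPENSSL_RAW_DATA, $tampered_iv );
var_dump( $salary ); // string(4) "9000"
```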

The fix for this is to "sign" the data in a way that means tampering is detected. The best-known way of doing this is probably HMAC, which performs a keyed hash operation on the encrypted data; the resulting signature is then stored with the encrypted data. Before the data is decrypted, the operation is performed again and the result is compared with the stored signature to make sure they match. An attacker who tampers with the data without knowing the HMAC key will not be able to recompute a signature that matches their changes. The chances of success are negligible. The signing looks like this:

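$salt = openssl_random_pseudo_bytes(8);
$iv = openssl_random_pseudo_bytes(16);
// hash_pbkdf2() takes the hash algorithm as its first argument;
// the final true requests raw bytes rather than a hex string.
$key = hash_pbkdf2( "sha256", $base_key, $salt, 1000, 64, true ); // 64 bytes = 512 bits
$clear_text = $data;
// openssl_encrypt() silently truncates an over-long key, so AES-256 uses the first 32 bytes
$ciphertext_raw = openssl_encrypt( $clear_text, "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );
// Sign the encrypted data so that tampering can be detected
$hash = hash_hmac( "sha256", $ciphertext_raw, $key );

$ciphertext = base64_encode( $iv.$salt.$hash.$ciphertext_raw );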
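
The checking side is not shown above, so here is a rough sketch of a full round trip. It is my own illustration rather than the helper library's code. The function names seal_data and open_sealed are hypothetical, and I have made two changes that are assumptions on my part: the derived key material is split into separate halves for encryption and signing, and the IV and salt are included in the signed data so that the stored IV cannot be altered unnoticed. The comparison uses hash_equals(), which takes the same time whether or not the signatures match.

```php
<?php
// Encrypts $data and returns base64( IV . salt . signature . ciphertext ).
function seal_data( string $data, string $base_key ): string {
    $salt = openssl_random_pseudo_bytes( 8 );
    $iv = openssl_random_pseudo_bytes( 16 );
    $key = hash_pbkdf2( "sha256", $base_key, $salt, 1000, 64, true );
    $enc_key = substr( $key, 0, 32 );  // first half: AES-256 key
    $mac_key = substr( $key, 32, 32 ); // second half: HMAC key
    $ciphertext_raw = openssl_encrypt( $data, "aes-256-cbc", $enc_key, OPENSSL_RAW_DATA, $iv );
    $hash = hash_hmac( "sha256", $iv.$salt.$ciphertext_raw, $mac_key, true ); // 32 raw bytes
    return base64_encode( $iv.$salt.$hash.$ciphertext_raw );
}

// Returns the plain text, or false if the data is malformed or has been tampered with.
function open_sealed( string $ciphertext, string $base_key ) {
    $raw = base64_decode( $ciphertext, true );
    // 16 (IV) + 8 (salt) + 32 (signature) + at least one 16-byte cipher block
    if ( $raw === false || strlen( $raw ) < 72 ) {
        return false;
    }
    $iv = substr( $raw, 0, 16 );
    $salt = substr( $raw, 16, 8 );
    $hash = substr( $raw, 24, 32 );
    $ciphertext_raw = substr( $raw, 56 );

    $key = hash_pbkdf2( "sha256", $base_key, $salt, 1000, 64, true );
    $enc_key = substr( $key, 0, 32 );
    $mac_key = substr( $key, 32, 32 );

    // Recompute the signature and check it before attempting to decrypt.
    $expected = hash_hmac( "sha256", $iv.$salt.$ciphertext_raw, $mac_key, true );
    if ( !hash_equals( $expected, $hash ) ) {
        return false;
    }
    return openssl_decrypt( $ciphertext_raw, "aes-256-cbc", $enc_key, OPENSSL_RAW_DATA, $iv );
}

$sealed = seal_data( "salary=1000", "demo base key" );
var_dump( open_sealed( $sealed, "demo base key" ) ); // string(11) "salary=1000"

// Flip one bit of the stored IV: the signature check now rejects the data.
$raw = base64_decode( $sealed );
$raw[0] = chr( ord( $raw[0] ) ^ 1 );
var_dump( open_sealed( base64_encode( $raw ), "demo base key" ) ); // bool(false)
```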

One last weakness

So have we finished now? No, we haven't. There is still another attack, and hopefully by now you will appreciate that encryption is not just fit-and-forget. If you are a larger company or are securing important data, you should consider paying security consultants or using commercial-grade software that is known to be strong rather than rolling your own. You should also make sure you have addressed the OWASP Top 10 in any web applications, and use defense in depth.

Anyway, the last weakness is another non-encryption weakness and may be suitable for attacking certain types of data: substitution. Let's go back to our payroll database and ask what would happen if I were able to copy the encrypted data from my manager's salary column and paste it into mine. Even though the data is signed, this could still succeed. Why? Because the data has not been changed, so it is still consistent with itself. In other words, the signature is still valid for the data that is encrypted.

The way to fix this is to include some user-dependent (or row-dependent) data in the signature calculation. Like the salt, this does not have to be secret, since an attacker cannot easily make use of it. They will no longer be able to substitute the encrypted data: the signature on the real data is tied to, e.g., the victim's user id, whereas once the data is copied into the attacker's row, verification will use the attacker's user id and the signatures won't match. Note that substitution can also be used to gain access to a victim's account, by replacing the victim's encrypted password with the attacker's own without tampering with the data itself (and possibly even changing it back afterwards so the victim does not realise!).

We could achieve this last fix by replacing:

$hash = hash_hmac( "sha256", $ciphertext_raw, $key );

with

$hash = hash_hmac( "sha256", $rowid.$ciphertext_raw, $key );
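
To see why this stops the copy-and-paste attack, here is a small sketch with a made-up key, made-up row ids and a hypothetical helper called sign_for_row. It shows only the signature check, not a complete scheme.

```php
<?php
$key = hash( "sha256", "demo key, not for production", true ); // hypothetical 32-byte key

// Signs the encrypted data together with the id of the row it belongs to.
function sign_for_row( int $rowid, string $ciphertext_raw, string $key ): string {
    return hash_hmac( "sha256", $rowid.$ciphertext_raw, $key );
}

// The manager's salary lives in row 1.
$iv = openssl_random_pseudo_bytes( 16 );
$manager_ciphertext = openssl_encrypt( "90000", "aes-256-cbc", $key, OPENSSL_RAW_DATA, $iv );
$manager_hash = sign_for_row( 1, $manager_ciphertext, $key );

// Checked in its own row, the signature matches.
$valid_in_row_1 = hash_equals( sign_for_row( 1, $manager_ciphertext, $key ), $manager_hash );
// Copied into the attacker's row 2, the same data and signature no longer verify.
$valid_in_row_2 = hash_equals( sign_for_row( 2, $manager_ciphertext, $key ), $manager_hash );

var_dump( $valid_in_row_1 ); // bool(true)
var_dump( $valid_in_row_2 ); // bool(false)
```

If the row id can vary in length, it may be safer to encode it at a fixed width before concatenating, so that one row's id plus data can never read the same as another's.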

Code examples (except the last one) are taken from https://github.com/noahjs/php-authenticated-encryption-helper