One of the most interesting things that I didn’t understand before I really spent much time learning about computers is how passwords work. It seems kind of stupid, but it’s an interesting and important question: how is a site validating that you’re you?
The Naivest Possible Password System
Well, pretty simply, you can make a password work just by knowing the password and storing it in its most obvious form. If our password is “sesame”, then I’d store that value, “sesame”, in a database, a file, or right in my program’s code. Then all we’d do is ask: “Did the user give us a value exactly like the stored value?” And if so, they’re logged in.
This is pretty simple as a system, and it’s sadly still used by some live, operating identity verification systems. If you’ve ever used a site or service’s “I forgot my password” system, and got back the text of your old, forgotten password in an email, they were probably using this system. (We could optimistically assume they do some crude but easily reversible obscuration method like Base64 encoding.)
This system has a pretty obvious downside, though, which you may have already spotted: anyone who gains access to the file, database, or source code where the password is stored in plain text knows the password. It’s a pretty big issue, and is the reason that most systems of any repute long ago left this system behind.
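To make the naive scheme concrete, here is a minimal Python sketch. The `users` dictionary and the `login` function are hypothetical stand-ins for a real database and login handler, not anything a particular site is known to use.

```python
# Naive scheme: the password itself is what gets stored.
# `users` stands in for a database table; all names here are hypothetical.
users = {"alice": "sesame"}


def login(username: str, attempt: str) -> bool:
    """Log the user in only if the attempt equals the stored password exactly."""
    stored = users.get(username)
    return stored is not None and attempt == stored


print(login("alice", "sesame"))  # correct password
print(login("alice", "Sesame"))  # wrong case, so it is rejected

# The downside: anyone who can read the store can read every password.
for name, password in users.items():
    print(name, password)
```

The last loop is the whole problem in three lines: reading the store is the same as reading the passwords.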
This is where “hashing” comes in. In a hashing system, we never store or even know that your password is “sesame”. Instead, we take your input and process it in a specifically prescribed way — “hash” it — and store that result. For example, in this case my system might store the MD5 hash of “sesame”:
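That value is easy to reproduce yourself. A minimal sketch, assuming Python and its standard `hashlib` module (my choice for illustration, nothing this article depends on):

```python
import hashlib

# MD5 is used only to illustrate the concept; it is not suitable for real passwords.
digest = hashlib.md5("sesame".encode("utf-8")).hexdigest()
print(digest)  # 32 hexadecimal characters
```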
(Quick obligatory note: no one I respect on the topic of internet security would recommend MD5 as a good password hashing algorithm. You should not use MD5 hashes for your passwords, I’m just using it to illustrate the concept.)
An MD5 hash probably looks like meaningless, forgettable gibberish to you, which is precisely the point. Now I as the administrator am unable to accidentally see your password, and an attacker is unable to quickly discover it, and you’re a bit more secure. When you type in “sesame”, we can dependably verify that it matches the password you set because the hash result for any given phrase is repeatable and, for all practical purposes, unique. So what we store is the result of hashing your password, and what we check is that the hash of what you just typed matches our stored hash.
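The store-the-hash, compare-the-hash flow can be sketched like this. The function and variable names (`md5_hex`, `stored_hashes`, `set_password`, `check_password`) are hypothetical, and MD5 appears only because it is the running example here.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Illustration only: MD5 is not a safe choice for real passwords."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical store: username -> hash of the password, never the password.
stored_hashes = {}


def set_password(username: str, password: str) -> None:
    """Hash the password and keep only the hash."""
    stored_hashes[username] = md5_hex(password)


def check_password(username: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    return stored is not None and md5_hex(attempt) == stored


set_password("alice", "sesame")
print(check_password("alice", "sesame"))   # same input, same hash
print(check_password("alice", "sesame1"))  # different input, different hash
```

Notice that the word “sesame” never lands in `stored_hashes`; only its hash does.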
This is a big improvement over a plain-text system, as any good password hashing function is impossible to directly reverse. So even when an attacker on my system gets a record of all the “passwords” in my system, they don’t have the ability to log in as any arbitrary user. Instead they have to find the input that creates a given hash, which is a more intensive exercise (but hardly impossible).
In fact it’s so possible that security researchers have given the tool for this attack a name: “rainbow tables”. It’s a sunnier name than its result; essentially it means that even though hashing passwords is a good system, it’s not really good enough. A rainbow table means that if I compute the MD5 hashes of a huge set of likely passwords ahead of time, and then compare them to a table of the MD5 hashes of all users of a system, I’ll pretty quickly get access to any account whose password hash matches one in my computed result set.
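Here is a small sketch of that precompute-then-compare idea. One loud caveat: this is a plain lookup table, which is the simplest form of the attack. Real rainbow tables use a cleverer chained structure to save space, which this sketch does not attempt. The candidate list and the `leaked` table are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Unsalted MD5, as in the running example."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Attacker's side: hash a list of guesses once, ahead of time.
candidate_passwords = ["123456", "password", "sesame", "letmein"]
precomputed = {md5_hex(p): p for p in candidate_passwords}

# Hypothetical leaked table of username -> unsalted MD5 hash.
leaked = {
    "alice": md5_hex("sesame"),
    "bob": md5_hex("sesame"),
    "carol": md5_hex("a much longer and stranger passphrase"),
}

# Every leaked hash that appears in the precomputed table is cracked by lookup.
cracked = {user: precomputed[h] for user, h in leaked.items() if h in precomputed}
print(cracked)
```

Both accounts that use “sesame” fall to a single precomputed entry, while the passphrase that was not in the guess list survives.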
Salt that Hash
This is where the final conceptual wrinkle of a good password system comes in: the salt. A salt is a little something you add to your password before hashing it, to protect against a rainbow-table-type attack. To use it on your password, I simply add something random and unique (both traits matter) to your password “sesame”.
The specifics of how this looks vary, but you can think of it as me changing your password to “lu5b93Msesame” and then running it through a hashing function. Just as before, the hash result is what’s stored and compared against. But, unlike before, we also store the salt, in this case “lu5b93M”. So when I want to verify your password I add the salt and hash, and it’ll match.
We’re using the salt for the basic reason that without it, an attacker who figures out what the hash of “sesame” from our system looks like would quickly have access to any account with that as its password, because all the stored hashes would look the same. With the salt, even if the attacker knows it and how it’s added, they still have to “crack” each password individually. This is why it is critical that the salt be unique per password. If the same salt were used for all of them, we’d have practically no gain against bare hashes.
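A minimal sketch of per-password salting, following the “salt plus password, then hash” picture above. The names (`salted_md5`, `records`, `set_password`, `check_password`) are hypothetical, prepending the salt is just one way of combining the two, and MD5 is still here only for continuity with the earlier examples.

```python
import hashlib
import secrets


def salted_md5(salt: str, password: str) -> str:
    """Prepend the salt, then hash. MD5 is for illustration only."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()


# Hypothetical store: username -> (salt, hash). The salt is stored in the clear.
records = {}


def set_password(username: str, password: str) -> None:
    salt = secrets.token_hex(8)  # random, and generated fresh for every password
    records[username] = (salt, salted_md5(salt, password))


def check_password(username: str, attempt: str) -> bool:
    if username not in records:
        return False
    salt, stored = records[username]
    return salted_md5(salt, attempt) == stored


set_password("alice", "sesame")
set_password("bob", "sesame")
print(records["alice"][1] == records["bob"][1])  # same password, different hashes
print(check_password("alice", "sesame"))
```

Because each record has its own salt, two users with the same password no longer share a stored hash, and a table precomputed without the salt no longer matches anything.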
So Much More To Know
If you’ve understood this article, you now know the very basics of password systems and security. But you do not know how to write a good and secure password storage system. Cryptography is a field where it’s really easy to go wrong, and rather difficult to know you have. Please understand that. If you learn only one thing, make it that.
Many, many vulnerabilities, potential vulnerabilities, and future vulnerabilities exist in even the most well-engineered cryptographic system. And we’ve covered nothing at all of the complex issues of secure salt generation, the role of timing and known quality in secure hashing algorithms, and more things that I both can’t remember and have never fully understood.
However, I’d hope you have gained some basic understanding of how passwords are and can be stored. And maybe next time you hear some people talking about passwords or cryptography, you’ll have at least enough understanding to follow the thread. The basics of hashing and salting are universal, and almost any reasonably secure password system uses both.
If you find yourself in a position of needing to create a login system, now or in the future: do your homework. Depend on outside expertise. Don’t build a system you can’t break into and mistake it for secure.
My most-used language, PHP, recently got password_hash() and password_verify() functions, which make it rather easy to securely salt and hash passwords. If you’re not so lucky, chances are good that someone with a little more understanding of cryptography has created a resource or library you can use. Look for it. As a software creator it’s your obligation to both the people using your software and the world at large to keep things as secure as you can. Don’t be the next [insert name of most recent high-profile password or security breach where it was revealed the passwords were thoroughly compromised]!
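For a feel of what a hash-and-verify pair like that does, here is a rough Python analogue built on the standard library’s `hashlib.pbkdf2_hmac`. It is a sketch under stated assumptions, not a vetted implementation and not PHP’s actual algorithm: the function names, the `iterations$salt$hash` storage format, and the iteration count are all my own illustrative choices. In keeping with the advice above, a maintained library is the better route for anything real.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; more iterations make each guess slower


def hash_password(password: str) -> str:
    """Return a storable record containing the work factor, the salt, and the hash."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${derived.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash from the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))


record = hash_password("sesame")
print(verify_password("sesame", record))
print(verify_password("Sesame", record))
```

Two details are new relative to the simple salted hash: the derivation is deliberately slow, so each guess costs an attacker real time, and the comparison uses `hmac.compare_digest` so it does not leak information through timing.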