Tuesday, August 16, 2011

Why passwords really suck

RE: http://securitynirvana.blogspot.com/2011/08/xkcd-936-discussion-continues.html?m=1

First: It's a comic. It's meant to be funny. Why the fuck are we overanalyzing this?

Second: People don't use shitty passwords because they can't remember good passwords. People use shitty passwords because they don't care, because they think no one will ever crack THEIR passwords, or because they're crap typists.

Therefore, any content below can only be pedantic.

Per Thorsheim has far more experience than me in this space. But I've always been too stubborn (or too stupid) to let that stop me from arguing with people.

I'll go ahead and admit that I am a sad American monoglot. My pathetic failure to achieve fluency in any other language cripples my ability to debate the security of Norwegian passwords or Russian passwords or, gods forbid, Chinese passwords with their variant character sets and vocabularies. So, I'll stick with what I think I know (which isn't much), and talk about American.

I do my good friends from the United Kingdom the favor of admitting this isn't the same as English.

Wolfram Alpha tells Mr. Thorsheim there are 600,000 words in the Oxford English Dictionary, 2nd edition. I'm going to go out on a limb and suggest that if you can find a hundred Americans who know all 600,000 words in that dictionary, much less use them on a regular basis, you might want to visit Las Vegas and put it all on red.

I'm not a developmental psychologist, and as it's seven PM on a Tuesday, I'm also a pretty lazy researcher. But, since the standard was set at Wolfram Alpha, that bar is fortunately low.

Wikipedia (see how I did that?) tells me there are two types of vocabulary: productive and receptive. That's a fancy way of saying there are words you recognize when you hear or see them but aren't likely to use yourself (receptive), and words you actually use (productive). The productive set is typically smaller than the receptive one.

Now, of the words people actually use, there are words that are more common than others. For example, there is an occasion to use a word like herpes. Most people know it. Most people have used the word at least once in their lives, hopefully in jest. But most people don't use the word herpes very often.

I say most people because of the company I keep. As a good Southern Girl, I tend to stay away from shady bars and navy bases (no offense, sailors).

So, we assume that, if you take a person's entire vocabulary, their productive vocabulary is a subset of it, and their commonly used vocabulary is a subset of their productive vocabulary. I couldn't find any scientifically supported studies that everyone recognizes as sound. But I also live in a country where people are seriously debating whether the Bible should be used as a science text for schoolchildren.

I did find one blog that used its Internet-savvy readership as research subjects for vocabulary. They compare the findings to self-reported SAT scores (and we're back to the American education system) to suggest their readership isn't exactly average. http://testyourvocab.com/blog/2011-07-25-New-results-for-native-speakers.php

If nothing else, it was an interesting read. They suggested total vocabulary figures of around 26,000 words for native English speakers between 23 and 28 years of age.

Another source (http://www.trivia-library.com/b/word-counts-and-vocabulary-usages.htm) suggests that The People's Almanac puts the figure closer to 3,000 words for the average person (and going to college and taking the SAT is, realistically, not an average experience when you consider the entirety of Americans).

Let's aim for somewhere in the middle and assume that a total vocabulary of around 10,000 words is fair. Narrow that to productive vocabulary, and we can take it down to, say, 8,000.

Again, I can't be sure. I'm pulling numbers out of thin air. But I'm pretty sure the assumption that all 600,000 words are the right starting point for a cracking dictionary is flawed.
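To put rough numbers on why the dictionary size matters, here is a minimal sketch that compares a four-word passphrase (as in xkcd 936) drawn from all 600,000 OED words versus the 8,000-word productive vocabulary guessed above. It assumes every word is picked uniformly at random, which is the comic's assumption and exactly what real people don't do, so treat the results as upper bounds.

```python
import math

# Vocabulary sizes from the post (8,000 is the post's own rough guess).
OED_WORDS = 600_000
PRODUCTIVE_WORDS = 8_000
WORDS_PER_PASSPHRASE = 4  # assumed: four words, as in xkcd 936


def keyspace(vocab_size: int, n_words: int) -> int:
    """Number of passphrases of n_words words chosen uniformly at random
    (repetition allowed) from a vocabulary of vocab_size words."""
    return vocab_size ** n_words


def bits(space: int) -> float:
    """Entropy in bits of a uniform random choice among `space` options."""
    return math.log2(space)


for label, size in [("OED", OED_WORDS), ("productive", PRODUCTIVE_WORDS)]:
    space = keyspace(size, WORDS_PER_PASSPHRASE)
    print(f"{label:>10}: {space:.3e} phrases, {bits(space):.1f} bits")
```

Shrinking the vocabulary from 600,000 to 8,000 words drops the four-word estimate from roughly 76.8 bits to roughly 51.9 bits, before accounting for people favoring their most common words.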

You can add punctuation to increase the randomness. But look at GPU password cracking tools like http://hashcat.net/oclhashcat/#performance (which claims single-system speeds of up to 6194M c/s for NTLM), then use basic logic: most people will put spaces between words (and may put a comma and a space between two of them), and most people will put other punctuation at the end (usually a ?, !, or .)...
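As a back-of-the-envelope check, this sketch estimates how long one machine at the claimed 6194M c/s would take to try every candidate. The rules are assumptions made up to match the guesses above: four words from an 8,000-word vocabulary, either all single spaces or exactly one ", " in one gap, and the phrase ending in nothing, "?", "!", or ".". Real users won't choose words uniformly, so this is a rough ceiling, not a measurement.

```python
# Claimed NTLM speed from the oclHashcat performance page (candidates/second).
RATE = 6_194_000_000
VOCAB = 8_000   # assumed productive vocabulary (the post's guess)
N_WORDS = 4     # assumed passphrase length

# Assumed separators: all single spaces, or one ", " in exactly one of the gaps.
separator_patterns = 1 + (N_WORDS - 1)
# Assumed endings: no punctuation, "?", "!", or ".".
endings = 4

candidates = VOCAB ** N_WORDS * separator_patterns * endings
seconds = candidates / RATE
days = seconds / 86_400

print(f"{candidates:.3e} candidates, about {days:.1f} days to exhaust")
```

Under these assumptions the punctuation only multiplies the work by 16, and a single box exhausts the whole space in roughly 122 days (about half that on average).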

I'm not a talented enough programmer to whip up a solution to test this theory against your hash in only 14 days. But, I genuinely hope someone does. I'd be fascinated to know the result.
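For anyone who does want to try, here is a minimal sketch of the candidate-generation half only: it enumerates phrases under the same assumed rules (single spaces or one ", " between words, and an ending of nothing, "?", "!", or "."). It does not hash anything, and the word list is a hypothetical toy example; a real attempt would load a list of common words and feed the output to a cracking tool.

```python
from itertools import product
from typing import Iterator, List, Sequence

ENDINGS = ("", "?", "!", ".")  # assumed: no punctuation, or one of ? ! .


def separator_patterns(n_words: int) -> Iterator[List[str]]:
    """Yield separator lists for the gaps between n_words words:
    all single spaces, or a single ", " in exactly one gap."""
    gaps = n_words - 1
    yield [" "] * gaps
    for i in range(gaps):
        seps = [" "] * gaps
        seps[i] = ", "
        yield seps


def candidates(words: Sequence[str], n_words: int) -> Iterator[str]:
    """Enumerate every passphrase candidate under the assumed rules."""
    for combo in product(words, repeat=n_words):
        for seps in separator_patterns(n_words):
            # Interleave: w0 + s0 + w1 + s1 + ... + w(n-1)
            phrase = combo[0] + "".join(s + w for s, w in zip(seps, combo[1:]))
            for end in ENDINGS:
                yield phrase + end


# Hypothetical toy vocabulary, for illustration only.
toy_words = ["correct", "horse", "battery", "staple"]
count = sum(1 for _ in candidates(toy_words, 4))
print(count)  # 4**4 word combos * 4 separator patterns * 4 endings = 4096
```

Using a generator keeps memory flat, which matters once the word list grows from four toy words to thousands.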

But, in the end, it doesn't really matter. People who are using "jennifer" or "password123" or "12345" as their passwords aren't going to stop doing so in favor of ")&$@ZVCjlmqr:;">|}=_-+" or even "monkeyshitforbrains"; that mess is just too hard to type.

