Updating bad analysis
@@ -4,7 +4,7 @@ date: 2021-03-11T18:55:01Z
 draft: false
 ---

-# Introduction
+## Introduction

 For this episode of bad analysis, we are going to be looking at word frequency in passwords. Overall this isn't a terrible analysis, but what makes it bad is that I'm only looking for the first occurrence of a dictionary word in each password. This will miss *a lot* of words in passwords.

@@ -17,7 +17,7 @@ Additionally we will miss words because:

 We are missing a lot of words in these passwords, but that is why this is bad analysis.

-# Data
+## Data

 The passwords come from several leaks. These include the honey-net, MySpace, rockyou, hotmail, phpbb, and tuscl lists. All of these lists also contain a count of how many times each password was used. In total, there are 14,584,438 unique passwords.

@@ -25,9 +25,9 @@ This took forever to loop through, pulling out the words, then comparing them to

 I'm comparing the password list to the American English word list found on Linux. There may be a more complete list somewhere out there, but this worked for me.

-# Results
+## Results

-## Raw Data
+### Raw Data

 The words were extracted, counted, and sorted. There were 68,402 unique words, the top 10 words account for around 5% of the total words seen, and 21,191 unique words were each seen in only a single password.

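The extract, count, and sort steps in this hunk can be sketched roughly as follows. This is only an illustration, not the code behind the post: the function names, the three-character minimum word length, the "longest match at the leftmost position" rule, and weighting each word by the password's use count are all assumptions, and the tiny word set stands in for the real dictionary.

```python
from collections import Counter


def first_dictionary_word(password, words, min_len=3):
    """Return the leftmost dictionary word found in password, or None.

    Assumption: at each start position the longest match wins, and
    words shorter than min_len are ignored.
    """
    lowered = password.lower()
    for start in range(len(lowered)):
        # Walk candidate end positions from longest to shortest.
        for end in range(len(lowered), start + min_len - 1, -1):
            candidate = lowered[start:end]
            if candidate in words:
                return candidate
    return None


def count_first_words(password_counts, words):
    """Tally the first word of each password, weighted by its use count."""
    totals = Counter()
    for password, uses in password_counts.items():
        word = first_dictionary_word(password, words)
        if word is not None:
            totals[word] += uses
    return totals


# Hypothetical stand-in data; the real inputs are the leak lists and
# the system word list.
words = {"love", "pass", "password", "monkey"}
leaked = {"password1": 5, "iloveyou": 3, "monkey12": 2, "x9$kq": 1}

totals = count_first_words(leaked, words)
top_words = totals.most_common(3)  # sorted by descending count
```

Keeping the dictionary in a `set` makes each lookup constant time, so the cost is dominated by the number of substrings tried per password. On Debian-based systems the American English list is usually at `/usr/share/dict/american-english`, though that path is an assumption about the setup used here.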
@@ -48,7 +48,7 @@ All percentages are approximate
 | and | 0.2 % |
 | ito | 0.2 % |

-## Additional Fun Stuff
+### Additional Fun Stuff

 How positive are people's passwords? Using a list of positive words found at [Positive List](https://gist.github.com/mkulakowski2/4289437) and a list of negative words found at [Negative List](https://gist.github.com/mkulakowski2/4289441), I've compared them against the word frequencies from our list.

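The positive/negative comparison in this hunk amounts to summing word counts by list membership. A minimal sketch, assuming the word frequencies are held in a dict and the two gists have been loaded into sets; the function name and the sample numbers are made up for illustration:

```python
def sentiment_totals(word_counts, positive, negative):
    """Sum the counts of words that appear in each sentiment list."""
    pos = sum(n for word, n in word_counts.items() if word in positive)
    neg = sum(n for word, n in word_counts.items() if word in negative)
    return pos, neg


# Hypothetical sample: "baby" is in neither list, "mar" is in the negative one.
word_counts = {"love": 10, "hate": 4, "mar": 2, "baby": 7}
positive = {"love"}
negative = {"hate", "mar"}

pos_total, neg_total = sentiment_totals(word_counts, positive, negative)
```

With this sample, "baby" contributes to neither total and "mar" is counted as negative, which is how the choice of lists ends up shaping the result.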
@@ -64,7 +64,7 @@ Positive words were used 1,172,617 times and negative words were used 1,172,617.

 Looking at positive and negative occurrences has its own issues beyond just the word analysis. As you can see, there are certain omissions that I would expect to be in the positive list, like "baby." There are also inclusions in the negative list that I would not have made, such as "mar", which could just be March for someone's birthday. Better lists would need to be found or crafted, or entire passwords would need to go through a language processor to determine whether they are negative or positive.

-# Conclusion
+## Conclusion

 Not much to conclude here; mostly this was for fun. Don't use dictionary words in your password, because it doesn't take long to loop through the dictionary. If you do, try to use longer random words rather than meaningful ones.

@@ -72,7 +72,7 @@ People tend to be more positive in their passwords which is nice to see.

 This was a lot of fun to implement, and I may come back to this to see if I can improve how I look for words.

-# Future Work
+## Future Work

 - Thread all the things, maybe it'll run faster.
 - Look for more than just the first word in each password