Updating bad analysis

This commit is contained in:
2025-08-24 22:50:28 -04:00
parent 7e27faef70
commit 9630a14124
6 changed files with 46 additions and 46 deletions

@@ -4,13 +4,13 @@ date: 2020-04-12
draft: false
---
# Introduction
## Introduction
For this bad malware analysis, I thought I would continue the theme of counting letters ... that way I could use most of my old code :)
Today, I decided to hash each file using sha512. Hash output is supposed to look completely random, so this is almost a test of that as well. I used around 3000 malicious samples and 1800 benign, so let's get started.
# Why Hash, Why sha512
## Why Hash, Why sha512
Hashing binaries is done all the time to verify downloads, check for changes, provide signatures, provide low-hanging fruit for malware signatures, and many more purposes. It is so widely used that I wondered whether the hash itself could be used as a flag that a file might be malware (beyond just a hash table).
@@ -20,31 +20,31 @@ The reason I decided to do the letter count on hashes was for two reasons; 1) it
The reason I decided on sha512 is also twofold: 1) it's long, so it'll provide some of the most data and 2) SHA in general is one of the most widely accepted hashing algorithms, so I went with that.
# What Was My Result
## What Was My Result
Surprising! There seems to be a pattern of what characters show up most in hashes for malware.
# What!
## What!
Yep, it appears that if you see around 3% more f's and 1% more 7's and 5's in your sha512 hash, then you might have some malware.
## That Can't be Right!
### That Can't be Right!
Hard to believe, but that is what it looks like: 'f', '7', and '5' show up more, and 'e' and '6' show up about 1% less, in malware hashes.
# Ok, So How Was it Done
## Ok, So How Was it Done
## Where are the Samples
### Where are the Samples
As with my string analysis, I pulled down around 500 samples of malware from [theZoo](https://thezoo.morirt.com/) and [dasMalwerk](https://dasmalwerk.eu/) for the hash analysis. For samples of benign software I grabbed all of /bin on Fedora and 200 libraries from the C:/Windows directory.
## How was it Analysed
### How was it Analysed
I modified my program from doing string checks to perform the hash analysis. Now, instead of running strings on each file, it computes a sha512 hash. I then averaged the number of times each character was seen per file. For example, I counted the number of '1's seen across all malicious file hashes, then divided by the total number of malicious files.
This was done for every character, for both the malicious and the benign binaries. After that I subtracted the benign averages from the malicious averages and divided by the original value.
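
The steps above can be sketched roughly as follows. This is a minimal illustration, not the original program: the function names (`sha512_hex`, `average_char_counts`, `relative_difference`) are hypothetical, and it assumes the "original value" used as the divisor is the benign average.

```python
import hashlib
from collections import Counter

HEX_CHARS = "0123456789abcdef"


def sha512_hex(path):
    """Return the sha512 hex digest (128 hex characters) of a file."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def average_char_counts(digests):
    """Average number of times each hex character appears per digest."""
    totals = Counter()
    n = 0
    for digest in digests:
        totals.update(digest)
        n += 1
    if n == 0:
        raise ValueError("need at least one digest")
    return {c: totals[c] / n for c in HEX_CHARS}


def relative_difference(malicious_avg, benign_avg):
    """(malicious - benign) / benign for each hex character.

    Assumption: the benign average is the divisor. Characters whose benign
    average is zero are reported as NaN instead of raising.
    """
    result = {}
    for c in HEX_CHARS:
        base = benign_avg[c]
        result[c] = (malicious_avg[c] - base) / base if base else float("nan")
    return result
```

With two lists of file paths (say `malicious_paths` and `benign_paths`, both hypothetical names), the comparison would be `relative_difference(average_char_counts(map(sha512_hex, malicious_paths)), average_char_counts(map(sha512_hex, benign_paths)))`.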
# Why?
## Why?
So a difference of 1 - 2% is not that much, but 3% seems more significant. This shouldn't happen; all characters should show up about evenly. It can probably be accounted for by the particular samples I chose. Choose a different set of 1000 binaries and the results could be different.
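
To get a feel for how much of this could be plain sampling noise, here is a rough back-of-the-envelope sketch. It assumes every hex character of a sha512 digest is independent and uniformly distributed, so each character's count per digest is binomial with 128 trials and probability 1/16. It uses the sample sizes quoted in the introduction (around 3000 malicious and 1800 benign). The function name `relative_noise` is made up for this illustration, and the ratio is handled with a first-order approximation.

```python
import math


def relative_noise(n_malicious, n_benign, digest_len=128, alphabet=16):
    """Approximate standard error of (malicious_avg - benign_avg) / benign_avg
    for one character, assuming uniformly random, independent hex characters."""
    p = 1 / alphabet
    mean = digest_len * p                # expected count per digest (8)
    variance = digest_len * p * (1 - p)  # binomial variance per digest (7.5)
    se_malicious = math.sqrt(variance / n_malicious)
    se_benign = math.sqrt(variance / n_benign)
    # First-order approximation: noise of the difference relative to the mean.
    return math.sqrt(se_malicious ** 2 + se_benign ** 2) / mean


print(f"{relative_noise(3000, 1800):.2%}")
```

Under those assumptions the noise on each character's relative difference comes out at about 1%. Swings of 1 - 2% are therefore about what chance alone would give. A 3% swing is roughly three standard errors out, which is less surprising than it sounds when sixteen characters are being compared at once.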