I had a question from a reader – Jim P – about the usage of yesterday’s function. Why did I write it and what am I using it for?
I wrote a response to Jim and thought I would share it with the group:
In yesterday’s message (Hashing An Array of Files), the function shown takes an array of file names as strings and returns a dictionary that maps each file name to its hash string.
Whichever algorithm you use for hashing (e.g. MD5, SHA256, SHA512), you get a consistent string value for the data contained in the file: under a given algorithm, the same contents always produce the same hash. It looks like this in my immediate window when I run the function with all the output:
' This is a one line multiple command in my immediate window
' The 4 lines starting with 0 are the timing outputs so I can see how long the function takes
' The final line is the output of the dctResult("C:\Windows\explorer.exe") dictionary lookup.
' The final line is the MD5 Hash value of the file C:\Windows\explorer.exe
Set dctResult = PS_GetFileHashes(Array("C:\Windows\explorer.exe"),"MD5"): ?dctResult("C:\Windows\explorer.exe")
0 secs GetFileHashes Begin 8/30/2024 10:20:45 AM
0 secs GetFileHashes Start GetOutput 8/30/2024 10:20:45 AM
0.485 secs GetFileHashes End GetOutput 8/30/2024 10:20:46 AM
0.485 secs GetFileHashes End 8/30/2024 10:20:46 AM
D08504A4718A999E104AEF407BB43123
I use the hash string because different file contents almost always produce different strings, so you can compare hashes to see whether the contents of a file have changed.
In my case, the customer’s application has a whole bunch of files being processed and imported, and I can’t guarantee the contents are exactly the same from one directory listing to the next. Someone might update a file. If the contents change, I need to indicate that the file has changed and should be checked out and reprocessed if needed. The files are generally generated automatically and can be regenerated with different names and dates in some instances, so I can’t rely on the name or the file modified date alone.
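You’ve probably seen a common use of hashes when downloading a file from an internet site:
Often when you download files from the internet, the person who made the file provides a checksum or hash value (a CRC, MD5 or SHA256 string, for example) to check the file against after you download it. If the value you calculate matches, you know you got the whole file and it wasn’t corrupted in the download process. A cryptographic hash such as SHA256 also helps show the file hasn’t been tampered with; a plain CRC only catches accidental corruption.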
This is the same process. I save the hash at the time the file is processed, and the next day I check the files that are still in the processing folder against that hash. If the hash is different, I know the contents have changed.
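As a rough illustration of that comparison step, here is a minimal sketch. It is not the code from the customer’s application. GetChangedFiles and its parameter names are hypothetical, and it assumes that PS_GetFileHashes returns a Scripting.Dictionary keyed by file name (as in the call shown earlier) and that you have already loaded your saved hashes into a second dictionary of the same shape.

```vba
Option Explicit

' Hypothetical helper, for illustration only.
' Returns a Collection of the file names whose current hash differs from
' the hash saved when the file was last processed.
' Assumptions: PS_GetFileHashes (from yesterday's message) returns a
' Scripting.Dictionary of file name -> hash string, and dctSaved is a
' Scripting.Dictionary of file name -> hash string that you stored yourself.
Public Function GetChangedFiles(ByVal varFiles As Variant, _
                                ByVal dctSaved As Object, _
                                Optional ByVal strAlgorithm As String = "SHA256") As Collection
    Dim dctCurrent As Object
    Dim colChanged As New Collection
    Dim varFile As Variant

    ' Hash every file as it exists on disk right now
    Set dctCurrent = PS_GetFileHashes(varFiles, strAlgorithm)

    For Each varFile In varFiles
        If Not dctSaved.Exists(varFile) Then
            ' No saved hash: the file has never been processed
            colChanged.Add varFile
        ElseIf Not dctCurrent.Exists(varFile) Then
            ' No current hash came back (file missing or unreadable)
            colChanged.Add varFile
        ElseIf StrComp(dctSaved(varFile), dctCurrent(varFile), vbTextCompare) <> 0 Then
            ' The hash strings differ, so the contents have changed
            colChanged.Add varFile
        End If
    Next varFile

    Set GetChangedFiles = colChanged
End Function
```

The saved and current hashes must come from the same algorithm, otherwise every file will look changed. The comparison ignores case so that an upper-case and a lower-case copy of the same hex string still match.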