Issue
I have two Excel files saved in different locations. One was downloaded directly from the browser and the other was downloaded using the Selenium driver. I checked both files manually and they are exactly the same, but the MD5 hashes generated for them are different. How can I fix this?
Solution
MD5 is a hash function. People use hash functions to verify the integrity of a file, stream, or other resource. When you verify a file's integrity with a hash, you are checking that the files are identical at the bit level.
This means a hash function works perfectly when what you need to know is whether two files are identical bit for bit.
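As a minimal illustration of hashing a file for an integrity check, here is a Python sketch (Python is chosen only for brevity; the question does not say which language the Selenium tests use). The function name md5_of_file is hypothetical. The file is opened in binary mode, so the digest covers the exact bytes on disk rather than anything Excel shows on screen.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of the raw bytes of a file."""
    digest = hashlib.md5()
    # Binary mode: hash the bytes exactly as stored, with no text decoding.
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large workbooks need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file on each download (for example, on hypothetical files named report_browser.xlsx and report_selenium.xlsx) and comparing the two strings reproduces the check described in the question.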
However, consider the nature of Excel spreadsheets: if so much as one bit is added, removed, or moved in the document, the hash of that file will be completely different. (In rare cases two different files can produce the same hash, but you don't need to worry about that here.)
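To see how sensitive the hash is, the following sketch flips a single bit in a short byte string and prints the MD5 of both versions. The sample bytes and the helper flip_bit are made up for illustration; the two printed digests differ even though the inputs differ by only one bit.

```python
import hashlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    mutable = bytearray(data)
    mutable[byte_index] ^= 1 << bit
    return bytes(mutable)


original = b"Quarterly total: 1200"
tampered = flip_bit(original, 0)  # 'Q' (0x51) becomes 'P' (0x50)

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(tampered).hexdigest())
```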
Since the two files reached your disk by different routes (a direct browser download versus a download driven by Selenium), and compression or other alterations/optimizations may have been applied to the file along the way, it is no surprise that the hashes are different.
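One concrete way identical content can still produce different bytes: an .xlsx workbook is a ZIP package of XML parts, and a ZIP archive records a modification timestamp for every member (different compression settings can change the bytes in a similar way). The sketch below assumes the workbooks are .xlsx rather than the older binary .xls format. It builds two tiny archives that hold the same XML payload but carry different timestamps; the helper names and the sample payload are hypothetical.

```python
import hashlib
import io
import zipfile

MEMBER = "xl/worksheets/sheet1.xml"
PAYLOAD = b'<sheetData><row r="1"><c r="A1"><v>42</v></c></row></sheetData>'


def build_package(payload: bytes, timestamp: tuple) -> bytes:
    """Return an in-memory ZIP holding one worksheet part stamped with the given time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo(MEMBER, date_time=timestamp)
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, payload)
    return buffer.getvalue()


def read_member(package: bytes) -> bytes:
    """Return the decompressed worksheet part from an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.read(MEMBER)


# Same content, written five minutes apart.
first = build_package(PAYLOAD, (2024, 1, 1, 9, 0, 0))
second = build_package(PAYLOAD, (2024, 1, 1, 9, 5, 0))

print(hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest())  # False
print(read_member(first) == read_member(second))  # True
```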
My recommendations:
Firstly: open the two files in a diff tool and find out what is different between them. If the hashes of two files are different, then the files are different at the byte level, even if the differences are invisible when you open the workbooks in Excel.
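Because an .xlsx file is a ZIP archive, a plain diff of the raw files is hard to read. Assuming .xlsx input, a more useful first step is to open both archives and report which internal parts differ. The function below (differing_members, a hypothetical name) does that with Python's standard zipfile module; each argument may be a path or a binary file object.

```python
import zipfile


def differing_members(first, second) -> list:
    """Compare two ZIP-based files (such as .xlsx) part by part and describe each difference."""
    with zipfile.ZipFile(first) as a, zipfile.ZipFile(second) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        report = [f"only in first: {name}" for name in sorted(names_a - names_b)]
        report += [f"only in second: {name}" for name in sorted(names_b - names_a)]
        # Compare decompressed contents, so ZIP timestamps and compression are ignored.
        for name in sorted(names_a & names_b):
            if a.read(name) != b.read(name):
                report.append(f"content differs: {name}")
    return report
```

If the only part reported is something like docProps/core.xml, which stores document properties such as creation and modification times, the spreadsheet data itself is probably identical.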
Secondly: write a driver that compares the information in those spreadsheets to verify the document's integrity (you can take hashes of that information), rather than verifying the files at the bit level.
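A minimal sketch of hashing the information rather than the file, assuming .xlsx input: it computes an MD5 over the decompressed parts in sorted order, so ZIP timestamps, compression, and member order no longer affect the result, and by default it skips the document-properties part. This works at the package level and is a stand-in for, not a reproduction of, a cell-by-cell driver; a real driver would read cell values with a spreadsheet library (for example Apache POI in Java or openpyxl in Python). The function name content_hash and the default skip list are assumptions.

```python
import hashlib
import zipfile


def content_hash(package, skip=("docProps/core.xml",)) -> str:
    """MD5 over the decompressed parts of a ZIP package, ignoring ZIP-level metadata."""
    digest = hashlib.md5()
    with zipfile.ZipFile(package) as archive:
        # Sorted names give a fixed order, independent of how the archive was written.
        for name in sorted(archive.namelist()):
            if name in skip:
                continue  # hypothetical default: ignore the metadata-only properties part
            encoded_name = name.encode("utf-8")
            data = archive.read(name)
            # Length-prefix each field so different splits of the same bytes cannot collide.
            digest.update(len(encoded_name).to_bytes(4, "big") + encoded_name)
            digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()
```

If the two downloads still produce different values, the XML inside them really differs, and a cell-level comparison is the next step.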
I'd recommend exporting both as CSV and comparing the two line by line.
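The CSV comparison can be sketched as follows, assuming both spreadsheets have already been exported to UTF-8 CSV files (the export step, the function name csv_differences, and any file names are not from the original answer). Each line is parsed with the standard csv module, so quoting differences such as "id" versus id do not count as changes, and rows are paired with itertools.zip_longest so an extra row in either file is reported too.

```python
import csv
from itertools import zip_longest


def csv_differences(path_a: str, path_b: str) -> list:
    """Compare two CSV files row by row; return (row_number, row_a, row_b) for each mismatch."""
    with open(path_a, newline="", encoding="utf-8") as file_a, \
         open(path_b, newline="", encoding="utf-8") as file_b:
        rows = zip_longest(csv.reader(file_a), csv.reader(file_b))
        # A row missing from the shorter file shows up as None on that side.
        return [
            (number, row_a, row_b)
            for number, (row_a, row_b) in enumerate(rows, start=1)
            if row_a != row_b
        ]
```

An empty result means the two exports contain the same rows in the same order.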
Answered By - alvonellos
Answer Checked By - Willingham (JavaFixing Volunteer)