So I've searched this forum and read a handful of posts on similar topics, but I still feel my question hasn't really been answered, or I haven't been able to find the post that does. So, apologies in advance; I don't mean to spam the forum with a repeat post.
That said... over the years my photo/video library on my computer has become an absolute mess, mostly due to backups out of platforms like Photobucket, Google Photos, Google Drive, old computers, etc. Plus, just being lazy: copying stuff off my phone but forgetting to organize it, or partially organizing it and then forgetting... etc., etc.
So I've got a crap load of duplicate files all over the place in my media drive.
I am already familiar with the built-in dupe feature, using things like `dupe:size;sha256`. The problem is, this is a project I'll probably be working on for weeks because of how much of a mess it is. I don't want to constantly re-compute hashes in the background every time I adjust my search or move files around. I figured I'd just add a hash field to the index to make it persistent, and limit it to only the folders I know matter.
Problem is... I have no clue which hash property to add to the index. When I search for terms like "sha", "md5" or "crc", I get dozens of options.
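For anyone newer to this: a size-then-hash search like `dupe:size;sha256` is cheap to start because only files whose sizes collide ever need hashing. Here's a minimal Python sketch of that general technique (an illustration only, not how Everything implements it; `find_duplicates` and `sha256_of` are made-up names):

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large videos never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size AND SHA-256 both match."""
    # Pass 1: group by size, which only costs a stat() per file.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Something like `for group in find_duplicates(r"D:\Media"): print(group)` would list each set of matching paths (the folder is just an example). The catch, and the reason for this question, is that these hashes live only in memory, so every run starts from scratch.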
Is there any advice from the Everything experts?
Up until recently, I've been using an app called dupeGuru.exe, which is great, but it's a passive tool that needs to constantly be re-run. The main reason I want to make this work with Everything is that, as I move files and folders around, I want them to be auto-indexed and kept up to date in the background, saving me from re-running the app all the time.
The main thing I'm trying to understand better is the difference between "SHA-256", "sha256sum Pass" and "sha256sum SHA-256" (I just randomly picked the sha256 algorithm for this example).
Overwhelmed with all the various hash properties (for identifying dupes) - is there a recommended option?
Re: Overwhelmed with all the various hash properties (for identifying dupes) - is there a recommended option?
> Problem is...I have no clue which hash property to add to the index...when I search terms like "sha", "md5", "crc" I come up with dozens of options.

Please use SHA-256
Please consider sha256sum SHA-256

A breakdown of the different variations:
Folder Data and Names SHA-256
Folders only, computes the hash of all the file content and names in the folder and subfolders (same as 7-Zip).
Folder Data SHA-256
Folders only, computes the hash of all the file content in the folder and subfolders (same as 7-Zip).
SHA-256
Files only, the SHA-256 hash of the file content (use this one).
sha256sum Pass
A simple Yes/No: whether the file content matches the hash stored in the .sha256 sidecar file.
Folder Names SHA-256
Folders only, computes the hash of all the file names in the folder and subfolders (same as 7-Zip).
sha256sum SHA-256
Files only, reads the precomputed hash from the .sha256 sidecar file (please consider this one).
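To make the two sidecar variants concrete, here's a rough Python sketch of what "sha256sum SHA-256" (read the stored hash) and "sha256sum Pass" (re-hash and compare) amount to. It assumes the sidecar sits next to the file as `<file>.sha256` and uses the `sha256sum` line format (`<hex hash>  <name>`); I haven't confirmed that this is exactly the naming Everything expects, and the function names are hypothetical:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_sidecar_hash(path):
    """Return the precomputed hash from <path>.sha256, or None if there is no sidecar.

    ASSUMPTION: sidecar named <file>.sha256, first line in sha256sum format.
    sha256sum prefixes the line with a backslash for escaped names, so strip it.
    """
    sidecar = path + ".sha256"
    if not os.path.exists(sidecar):
        return None
    with open(sidecar, "r", encoding="utf-8") as f:
        first_line = f.readline().strip()
    if not first_line:
        return None
    return first_line.split()[0].lstrip("\\").lower()


def sidecar_passes(path):
    """Yes/No check: does the file content still match its sidecar hash?"""
    expected = read_sidecar_hash(path)
    if expected is None:
        return None  # no sidecar, nothing to check against
    return sha256_of(path) == expected
```

The design point: reading the sidecar only touches a tiny text file, while the Pass check has to read the whole media file again.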
Use SHA-256, or use sha256sum SHA-256.
Note: you don't need to index sha256sum SHA-256, but you will need to build the .sha256 sidecar files beforehand.
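For building the sidecar files beforehand, GNU `sha256sum photo.jpg > photo.jpg.sha256` handles one file at a time. Below is a minimal Python sketch for a whole folder tree. It assumes the `<file>.sha256` naming and the `sha256sum` line format (an assumption, not checked against Everything's docs), and `write_sidecars` is a hypothetical name:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_sidecars(root):
    """Create <file>.sha256 next to every file under root; return how many were written.

    Skips files whose sidecar already exists and is at least as new as the file,
    so re-running only hashes files with no sidecar or modified since it was written.
    """
    written = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".sha256"):
                continue  # don't hash the sidecars themselves
            path = os.path.join(dirpath, name)
            sidecar = path + ".sha256"
            if os.path.exists(sidecar) and os.path.getmtime(sidecar) >= os.path.getmtime(path):
                continue
            with open(sidecar, "w", encoding="utf-8", newline="\n") as f:
                # ASSUMED layout, same as sha256sum output: hash, two spaces, bare file name.
                f.write(f"{sha256_of(path)}  {name}\n")
            written += 1
    return written
```

Because the sidecar stores only the bare file name, moving a file together with its sidecar keeps the pair valid; moving the file alone leaves the hash behind.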
TheBestPessimist
Re: Overwhelmed with all the various hash properties (for identifying dupes) - is there a recommended option?
This is a wonderful question, as I'm in the same situation as you with my photos.
One problem I find is that after copying or moving multiple files (from NAS to laptop SSD or the reverse), the already computed hashes are lost and need to be recalculated. Every single time.
I know I could use sidecar files, but they don't move when I move my files, so they're useless to me.
Also, for almost every setting change I made, the hashes needed to be recomputed.
Recomputing takes hours for me, as I have more than 10TB of files (JPEG, raw video, metadata) on my 2 NASes.
In the end, I abandoned Everything for this task, as the 'oops, I need to recompute all the hashes again' issue went from annoying to infuriating to 'AAAARGH, I can't use Everything, it's frozen while it hashes yet again'.
Re: Overwhelmed with all the various hash properties (for identifying dupes) - is there a recommended option?
Don't rule out a simple CRC32 hash if you just want to compare files in Everything. It's your fastest option.
If you intend to store the hash on disk, where other programs may also use it, SHA-256 might be worth it; otherwise it's overkill IMHO.
I have `/append-search addcolumn:crc32` in my bookmarks.
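For the CRC32 route, here's a Python sketch of streaming a file through `zlib.crc32` (the helper names are hypothetical, and I can't say it formats the value exactly like Everything's crc32 column). Since CRC32 is only 32 bits, different files can end up with the same value, so it's worth a byte-for-byte check with `filecmp.cmp(..., shallow=False)` before deleting anything:

```python
import filecmp
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Stream the file through zlib.crc32, carrying the running value between chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"


def really_duplicates(a, b):
    """A CRC32 match only flags a candidate; confirm byte-for-byte before trusting it."""
    return crc32_of(a) == crc32_of(b) and filecmp.cmp(a, b, shallow=False)
```

CRC32 is fine for spotting candidates; the final full comparison is what makes a delete safe.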