When I Create an Index/Properties (SHA) ...

Discussion related to "Everything" 1.5.
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

When I Create an Index/Properties (SHA) ...

Post by ChrisGreaves »

See also: "Re: Monitor changes does not work for me in 1.5.0.1391a (x64)"
Version 1.5.0.1391a (x64)
[Image attachment: Dupes03.jpg]

I chose Tools, Options, Properties, and then elected to ADD a property for SHA512.
16,831 items have been considered, but this number is significantly less than the number of audio/image/video files (54,000) in T:\Media
(a) What drives, what files, what folders were considered for inclusion in the Index for SHA512? I have a boot drive C: and an encrypted data drive T:
(b) How/where do I control what files are included in the index? How do I prevent folders from being included?

These two questions are ground-level questions in a much bigger question that is based on the premises:
“I am now in possession of the most powerful machine I have ever owned (starting with a Radio-Shack MC-10 with 4 KB of RAM) or used (probably the CYBER-7x CDC mainframes in Western Australia and South Australia back in the 1970s)”
And:
“I have an Acer-Aspire with 2 GB of RAM and 2 TB of SSD. My encrypted data partition is 1.7 TB and I have organized my A-V source files into a tree T:\Media\”
Yes. I plan to use my huge RAM and huger SSD to build up several levels of complexity of detecting duplicates among 54,000 Audio-Visual files in T:\Media\ which holds 332 GB or about 6 MB per file on average.
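A quick check of that average, as a minimal Python sketch. The two figures are the ones quoted in the post; treating 1 GB as 1000 MB is an assumption.

```python
# Figures quoted in the post; decimal units (1 GB = 1000 MB) are an assumption.
total_gb = 332
file_count = 54_000

average_mb = total_gb * 1000 / file_count
print(f"average file size: about {average_mb:.1f} MB")
```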
Today I am exploring some tools in the tool kit, such as SHA-encoding, the clerical distinction between “properties” and “Indexes”, and much more. Those questions come later. First the simple two questions above.

Questions of Disk space and RAM and Processing speed in creating indexes do not concern me anymore :gloat!:

Thanks for any comments and for all elucidation.
Cheers, Chris
Last edited by ChrisGreaves on Fri Aug 01, 2025 8:28 pm, edited 1 time in total.
void
Developer
Posts: 19830
Joined: Fri Oct 16, 2009 11:31 pm

Re: When I Create an Index/Properties (SHA) ...

Post by void »

16,831 items have been considered, but this number is significantly less than the number of audio/image/video files (54,000) in T:\Media
Looks like some sort of active filter.
Try enabling the Everything filter from the Search menu.
Property indexing ignores the active filter.


(a) What drives, what files, what folders were considered for inclusion in the Index for SHA512? I have a boot drive C: and an encrypted data drive T:
All files.


(b) How/where do I control what files are included in the index? How do I prevent folders from being included?
I recommend setting include only folders for the sha256 property.

An example:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Properties tab on the left.
  • Select sha256.
  • Set Include only folders to:
    t:\media

    (this will include subfolders under t:\media)
    empty = include ALL folders.
  • Set Exclude folders to:
    t:\media\junk

    (this is a folder in t:\media that you want to ignore)
    empty = no exclusions.
  • Set Include files to:
    *.mp3;*.mp4;*.mkv;*.jpg;*.png

    (use ; to separate items)
    empty = include ALL files.
  • Click OK.
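
The three settings in the steps above combine as a simple filter. The following is a minimal Python sketch of that logic as described, not Everything's actual matching code: the names `would_be_hashed` and `_is_under` are hypothetical, and case-insensitive matching is an assumption.

```python
from fnmatch import fnmatch
from pathlib import PureWindowsPath

# Values taken from the example settings; an empty list or string means "no restriction".
INCLUDE_ONLY_FOLDERS = [r"t:\media"]
EXCLUDE_FOLDERS = [r"t:\media\junk"]
INCLUDE_FILES = "*.mp3;*.mp4;*.mkv;*.jpg;*.png"


def _is_under(path: PureWindowsPath, folder: str) -> bool:
    """True if path lies inside folder or one of its subfolders."""
    return PureWindowsPath(folder) in path.parents


def would_be_hashed(filename: str) -> bool:
    """Apply include-only folders, then exclude folders, then include files."""
    path = PureWindowsPath(filename)
    if INCLUDE_ONLY_FOLDERS and not any(
        _is_under(path, folder) for folder in INCLUDE_ONLY_FOLDERS
    ):
        return False
    if any(_is_under(path, folder) for folder in EXCLUDE_FOLDERS):
        return False
    patterns = [p for p in INCLUDE_FILES.split(";") if p]
    if patterns and not any(
        fnmatch(path.name.lower(), p.lower()) for p in patterns
    ):
        return False
    return True
```

With these values, `would_be_hashed(r"T:\Media\Music\song.MP3")` is true, while files under t:\media\junk, files on C:, and a .docx inside t:\media are all skipped.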
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: When I Create an Index/Properties (SHA) ...

Post by ChrisGreaves »

Looks like some sort of active filter.
Try enabling the Everything filter from the Search menu.
Property indexing ignores the active filter.
[Image attachment: Dupes04.jpg]
Thanks for this response void.
I have selected Everything as the filter
[Image attachment: Dupes05.jpg]
I used right-click Properties on folders under the root of C: and of T: with the results tabled above.
Everything now reports close to 2.02 times as many items (907,956) as Explorer files (450,230)
If I use Files AND Folders (587,468) the ratio is close to 1.55.
Either way I cannot explain the differences.
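The two ratios can be reproduced from the counts quoted above; a minimal Python sketch, with variable names chosen only for illustration:

```python
# Counts quoted in the post.
everything_items = 907_956
explorer_files = 450_230
explorer_files_and_folders = 587_468

ratio_files = everything_items / explorer_files
ratio_files_and_folders = everything_items / explorer_files_and_folders

print(f"vs files only:        {ratio_files:.2f}")
print(f"vs files and folders: {ratio_files_and_folders:.2f}")
```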

If the bottom-right corner of Everything’s screen means “482 GB of data indexed”, then that too differs measurably from 419 GB.

I am “this close” to restoring this laptop and rebuilding it from factory reset, so please don’t fret over this yet.
I have checked Search, Organize filters and my Everything filter is empty, so I don’t have a weird setting there AFAIK. And anyway Everything’s 907,956 is either 1.55 or 2.02 times Explorer’s figures.
And I gather from your third line that indexing properties ignores the filter anyway; is that correct?

I am using SHA512 in the naïve belief that “bigger is better”.
Storage for me is not an issue, and I reasoned that a longer key won’t greatly affect my searches for duplicates.
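For scale, a minimal Python sketch comparing the two digest sizes with the standard `hashlib` module. The sample bytes are made up for illustration. Either algorithm reports identical content as identical; the SHA-512 digest simply takes twice the space per file.

```python
import hashlib

sample = b"illustrative file contents"  # hypothetical data, not from the thread
copy = bytes(sample)

for name in ("sha256", "sha512"):
    digest = hashlib.new(name, sample)
    same = digest.digest() == hashlib.new(name, copy).digest()
    print(f"{name}: {digest.digest_size} bytes, identical copies match: {same}")
```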
I recommend setting include only folders for the sha256 property.

Set Include only folders to:
Set Include files to:
Again, please, have I understood this?
In Index, Properties: identifying Folders is a way of reducing the overall scope of what is to be indexed. In my case since I am aiming at locating duplicates in all areas of my t:\Media\ tree, I need not concern my SHA512 index with material found in T:\Greaves\ and T:\Writing and the like
In Index Properties: identifying Files (by extensions) further restricts the index to ONLY audio-visual files and does not include the half-dozen MSWord documents and templates I have in there.
Given that I have oodles of SSD and oodles of RAM, and oodles of processing time (overnight!) I think that I should not be greatly affected by SHA512 indexing on every file on both C: and T: drives.
If indexing all drives takes six hours overnight, and indexing only T:\MEDIA\ files takes only three hours, I shouldn’t care – I’ll be asleep in either event?
Thanks, Chris
void
Developer
Posts: 19830
Joined: Fri Oct 16, 2009 11:31 pm

Re: When I Create an Index/Properties (SHA) ...

Post by void »

Either way I cannot explain the differences.
It's normal.

Everything indexes a lot of system files you will not be able to see in Windows Explorer.

One option is to fire up another instance and use folder indexing instead of NTFS indexing.
You could then create a file list from both instances and compare.
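
One way to do that comparison, as a minimal Python sketch. It assumes each instance's list has been saved as a plain-text file with one full path per line; the function names are hypothetical.

```python
from pathlib import Path


def load_paths(list_file: str) -> set[str]:
    """Read one path per line, lower-cased so Windows paths compare equal."""
    lines = Path(list_file).read_text(encoding="utf-8-sig").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def compare_lists(ntfs_list: str, folder_list: str) -> tuple[set[str], set[str]]:
    """Return (paths only in the NTFS list, paths only in the folder list)."""
    ntfs = load_paths(ntfs_list)
    folder = load_paths(folder_list)
    return ntfs - folder, folder - ntfs
```

The first set returned is the interesting one here: entries the NTFS index holds that a plain folder scan does not reach.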


And I gather from your third line that indexing properties ignores the filter anyway; is that correct?
Yes, indexed properties ignore any search options and filters.
Indexed properties apply to your whole index, not the current results.


I am using SHA512 in the naïve belief that “bigger is better”.
I don't recommend indexing SHA512, except for a small subset of files (e.g. *.jpg;*.mp4;*.mp3).

I recommend .sha256 sidecar files.
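
As a rough illustration of what a sidecar file is, here is a minimal Python sketch that writes a `.sha256` file next to a media file. The line layout (hex digest, space, asterisk, file name) follows the common sha256sum convention and is an assumption; check what your tools expect. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path: Path) -> Path:
    """Write <name>.sha256 beside the file and return the sidecar's path."""
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{sha256_of(path)} *{path.name}\n", encoding="utf-8")
    return sidecar
```

Because the hash lives in a file beside the original, it only needs recalculating when you choose to rerun the script, which suits folders that rarely change.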


In Index, Properties: identifying Folders is a way of reducing the overall scope of what is to be indexed. In my case since I am aiming at locating duplicates in all areas of my t:\Media\ tree, I need not concern my SHA512 index with material found in T:\Greaves\ and T:\Writing and the like
In Index Properties: identifying Files (by extensions) further restricts the index to ONLY audio-visual files and does not include the half-dozen MSWord documents and templates I have in there.
Given that I have oodles of SSD and oodles of RAM, and oodles of processing time (overnight!) I think that I should not be greatly affected by SHA512 indexing on every file on both C: and T: drives.
If indexing all drives takes six hours overnight, and indexing only T:\MEDIA\ files takes only three hours, I shouldn’t care – I’ll be asleep in either event?
The initial index may only take six hours.
However, it is expensive to maintain.
Each time a large file is "changed", Everything will recalculate the sha256.

I would recommend just gathering the sha256 values when needed.

eg:
dupe:size;sha256


This way you are only looking up the sha256 for files with the same size (a rather small subset)
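
The idea behind dupe:size;sha256 can be sketched in Python: group files by size first, then hash only the groups with more than one member. This illustrates the technique under that assumption and is not how Everything implements it; `find_duplicates` and `sha256_of` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that share both size and SHA-256."""
    # Pass 1: cheap grouping by size, no file contents are read.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            by_size[os.path.getsize(full)].append(full)

    # Pass 2: hash only files whose size is shared with another file.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Files with a unique size are never read at all, which is where the saving comes from.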
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: When I Create an Index/Properties (SHA) ...

Post by ChrisGreaves »

Void, Thank you for these explanations. I am now further along in my understanding of SHA Indexes. I think. I Hope. …
Everything indexes a lot of system files you will not be able to see in Windows Explorer.
Right! In the past I have explained to people that Everything finds Everything, but have ignored my own advice! In general I do not mind a small difference in tallies, but feel confident that when I specify folders and extensions, Everything will be spot-on. It was the two-fold difference that leapt out at me.
Yes, indexed properties ignore any search options and filters.
This confirmation too is welcomed. Construction of Everything’s databases (for folders and for files) is done to serve searches, and is not governed by searches.

I don't recommend indexing SHA512, except for a small subset of files (e.g. *.jpg;*.mp4;*.mp3). I recommend the use of .sha256 sidecar files. voidhash is designed for static / archive folders.
“define small” (Grin!)
“t:\media\music ext:mp3” returns 21,000 files, whereas “T:\” returns ten times as many (237,000), so for my purposes – identifying duplicate audio and image files as separate tasks – restricting the SHA to 10% of my file stack makes great sense.
I figure on setting up a list of duplicate files that I will examine over a period of a week or more. That makes me think of a static list of duplicates, which is different from a static Index. I think here I have blurred the lines between “databases” and “indexes”. A FileList generated for an off-line backup drive will be static until I run another backup.
I don’t need an answer to this; I will do some more trials and see how I get on in terms of timing. My sudden increase in RAM and SSD makes me think of Going Big with stored databases, indexes. Running overnight has long been a favourite method of mine.
The initial index may only take six hours. However, it is expensive to maintain.
My lean towards SHA512 was based on my premise that I had plenty of SSD space and plenty of time (overnight). That said I now reflect that if SHA256 is effective, I don’t need to go “Bigger is Better” at all.
Each time a large file is "changed", Everything will recalculate the sha256. I would recommend just gathering the sha256 values when needed.
Do I care about Everything recalculating the SHA value for one file?
If I delete a SHA-indexed file all that is needed is the removal (delete/null) of an entry, yes?
If I modify an MP3 or JPEG file, then that one file’s SHA value must be recalculated, yes?
Recalculating the SHA of a single file does not seem critical to me.
Recalculating the SHA of a folder tree does seem critical to me. That, for me, would be a candidate for an overnight job.
eg: “dupe:size;sha256” This way you are only looking up the sha256 for files with the same size (a rather small subset)
Using this as a search string suggests that the SHA will be re-calculated for all found files (“dupe:size”).
I was thinking of a more stable source of SHA; for me, do all the calculation once, and work off the stable results of that SHA data, while I paw over the apparent duplicate files for a week or two, then recalculate SHA.
I may not be understanding the situation at all. I am probably hampered by the massive line-printer lists we worked off years ago when clarifying “The New Shares Issue” of BHP et al.
Thanks again for improving my knowledge.
Chris
void
Developer
Posts: 19830
Joined: Fri Oct 16, 2009 11:31 pm

Re: When I Create an Index/Properties (SHA) ...

Post by void »

“define small” (Grin!)
The subset doesn't have to be small, just well defined.

The important thing is:
Avoid indexing sha256 for all files on system drives.


The initial index may only take six hours. However, it is expensive to maintain.
My lean towards SHA512 was based on my premise that I had plenty of SSD space and plenty of time (overnight). That said I now reflect that if SHA256 is effective, I don’t need to go “Bigger is Better” at all.
I should point out that indexing sha256 for files that change often is expensive to maintain.
I recommend indexing sha256 for files that are not going to change often.


Do I care about Everything recalculating the SHA value for one file?
Yes, if the file is large or if the file changes often.
Reading the file from disk will slow the system.
Calculating the SHA value is CPU-intensive.


If I delete a SHA-indexed file all that is needed is the removal (delete/null) of an entry, yes?
If the file is deleted, the sha256 for that file (if indexed) is also removed from your Everything index.


If I modify an MP3 or JPEG file, then that one file’s SHA value must be recalculated, yes?
Yes.
Everything will do this automatically in the background.
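
The maintenance behaviour described in the last few answers (recalculate on change, drop the entry on delete) can be pictured with a small Python sketch. This is only an illustration of the general idea; the `HashIndex` class is hypothetical, and using size plus modification time to detect a change is an assumption, not a description of Everything's internals.

```python
import hashlib
import os


class HashIndex:
    """Toy in-memory index: path -> (size, mtime_ns, sha256 hex)."""

    def __init__(self):
        self._entries = {}
        self.recalculations = 0  # counts how often a file had to be re-read

    def get(self, path):
        """Return the cached hash, re-reading the file only if it changed."""
        stat = os.stat(path)
        cached = self._entries.get(path)
        if cached and cached[:2] == (stat.st_size, stat.st_mtime_ns):
            return cached[2]
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            while chunk := handle.read(1 << 20):
                digest.update(chunk)
        self._entries[path] = (stat.st_size, stat.st_mtime_ns, digest.hexdigest())
        self.recalculations += 1
        return self._entries[path][2]

    def remove(self, path):
        """Forget a deleted file; nothing needs to be read from disk."""
        self._entries.pop(path, None)
```

Every miss means reading the whole file again, which is why large or frequently changing files make an indexed hash costly to keep current.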


Using this as a search string suggests that the SHA will be re-calculated for all found files (“dupe:size”).
Yes, the sha256 values are gathered only for the results of dupe:size.
dupe:size;sha256 is more useful if you are not indexing sha256, as it gathers sha256 only for files that share the same size.