Auto-removing duplicate entries while remapping paths of pooled arrays

malventano
Posts: 2
Joined: Tue Apr 28, 2020 12:32 am

Auto-removing duplicate entries while remapping paths of pooled arrays

Post by malventano » Tue Apr 28, 2020 12:55 am

I'm using the path remapping feature to index a Drivepool volume (the same applies to Drive Bender and other similar pooling apps). This works quite well, but it is missing one small tweak that would make it a perfect solution for fast indexing of pooled volumes.

The issue stems from duplicate file and folder entries appearing in the Everything results. These are not merely copies of the same file: after remapping, they are exact duplicate entries with the same filename, path, size, date, etc. Since the remapping is done to mirror the effect of the pooled drive letter, these duplicate entries effectively point to the same target location. If there were some way for Everything to ignore these duplicates, perhaps while computing its file sorts, it would clean up the search results considerably. Currently, a file has as many dupes as there are mirrored copies of that file, but folders multiply out of control (especially folders near the root of the pool, which get one entry per drive in the array). If these dupes could be filtered by default, Everything would index pooled solutions exactly as it does when natively indexing a single large volume: what the user sees in Everything would match what they see when browsing the pooled volume.
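To illustrate the effect described above, here is a minimal Python sketch of how remapping two pool members onto one drive letter yields identical entries, and how filtering on the full entry collapses them. The `PoolPart.aaa` / `PoolPart.bbb` folder names, the `P:` drive letter, and the `Entry` fields are made up for the example; this is not how Everything stores or remaps its index.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    path: str
    size: int
    mtime: int


# Hypothetical remap table: each pool member's hidden folder maps to the
# pooled drive letter. Names and drive letters are assumptions.
REMAP = {
    "D:\\PoolPart.aaa\\": "P:\\",
    "E:\\PoolPart.bbb\\": "P:\\",
}


def remap(entry: Entry) -> Entry:
    """Rewrite the member-drive prefix to the pooled drive letter."""
    for prefix, target in REMAP.items():
        if entry.path.startswith(prefix):
            return entry._replace(path=target + entry.path[len(prefix):])
    return entry


def remap_and_dedupe(entries):
    """Remap every entry, keeping only the first of each exact duplicate."""
    seen = set()
    result = []
    for e in map(remap, entries):
        if e not in seen:
            seen.add(e)
            result.append(e)
    return result


raw = [
    Entry("D:\\PoolPart.aaa\\Media", 0, 100),
    Entry("E:\\PoolPart.bbb\\Media", 0, 100),            # same folder, other drive
    Entry("D:\\PoolPart.aaa\\Media\\a.mkv", 700, 200),
    Entry("E:\\PoolPart.bbb\\Media\\a.mkv", 700, 200),   # mirrored copy
    Entry("E:\\PoolPart.bbb\\Media\\b.mkv", 900, 300),
]
deduped = remap_and_dedupe(raw)
for e in deduped:
    print(e.path)
```

Five raw entries collapse to the three a user would see when browsing the pooled volume directly.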

Understood that this is an obscure feature on top of another obscure feature, so if implemented this could just be an option flag that only exists in Everything.ini. Else I'd suggest placing the checkbox near the other sort options under Indexes in the Options GUI.

Thanks in advance for considering this tweak. Users of pooled drives will be eternally grateful for your support on this one :)
Allyn

void
Site Admin
Posts: 5377
Joined: Fri Oct 16, 2009 11:31 pm

Re: Auto-removing duplicate entries while remapping paths of pooled arrays

Post by void » Thu Apr 30, 2020 5:34 am

I'll look into a switch to remove duplicates (with the same path) at index time.

Currently Everything will allow duplicated paths as long as they belong to a different 'file system'.

Removing the duplicated entries is not so trivial as Everything uses multiple indexes. The duplicated files/folders will need to be 'marked' and removed from all indexes.
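As a rough illustration of the "mark, then remove from all indexes" idea, the Python sketch below models each index as a list of entry ids in a different sort order. This layout, the `mark_duplicates` and `purge` helpers, and the lower-casing used to approximate case-insensitive Windows paths are all assumptions for the example; Everything's real index structures are not described in this thread.

```python
def mark_duplicates(entries):
    """Return the ids of entries whose path was already seen (first one wins).

    `entries` is an iterable of (entry_id, path) pairs.
    """
    seen_paths = set()
    marked = set()
    for entry_id, path in entries:
        key = path.lower()  # rough stand-in for case-insensitive comparison
        if key in seen_paths:
            marked.add(entry_id)
        else:
            seen_paths.add(key)
    return marked


def purge(indexes, marked):
    """Drop marked ids from every index while preserving each index's order."""
    return {
        name: [i for i in ids if i not in marked]
        for name, ids in indexes.items()
    }


entries = [
    (0, "P:\\Media"),
    (1, "P:\\Media"),            # duplicate folder from a second drive
    (2, "P:\\Media\\a.mkv"),
    (3, "P:\\media\\A.mkv"),     # same path, different case
    (4, "P:\\Media\\b.mkv"),
]
# Two hypothetical indexes over the same entries, each in its own order.
indexes = {
    "by_name": [0, 1, 2, 3, 4],
    "by_size": [4, 3, 2, 1, 0],
}

marked = mark_duplicates(entries)
purged = purge(indexes, marked)
print(marked, purged)
```

The point of the two passes is that the duplicate decision is made once, and every index then drops the same set of ids, so the indexes stay consistent with each other.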

Thank you for your suggestion.

How do these mirrored pools work? Is there a primary drive that is mirrored to a secondary drive?
Can you exclude the secondary drive?

malventano
Posts: 2
Joined: Tue Apr 28, 2020 12:32 am

Re: Auto-removing duplicate entries while remapping paths of pooled arrays

Post by malventano » Mon May 04, 2020 5:11 pm

void wrote:
Thu Apr 30, 2020 5:34 am
How do these mirrored pools work? Is there a primary drive that is mirrored to a secondary drive?
Can you exclude the secondary drive?
It varies by type of pooling software, but generally speaking, it's not easily within the user's control. For pools with mirroring options, no specific drive holds the mirrored copy, as the profile within the software distributes files evenly so as not to overfill any single drive. Of the pooling software that does allow the user to specify what goes where, trying to rework settings to force dupes into particular places (excluded from Everything) would likely take enough control away from the pooling algorithm that some drives would fill to capacity while others would remain unused.

A decent subset of users of pooling software are combining it with some form of bolt-on parity (e.g. SnapRAID), so while they would have no duplicate files, they would still have duplicated folders in the Everything results. If the pool software was evenly distributing files across an array of 12 drives, most of the folder structure would be duplicated across those 12 drives, meaning they would appear 12x within Everything results. Those who instead are duplicating files via the pool will get the folder dupes *and* the file dupes.
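The 12x figure can be checked with a small Python sketch. It assumes files are placed round-robin across the drives (real pooling software balances by free space or other rules) and that each drive creates the full parent folder chain for every file it holds. The paths and the `folder_entries` helper are made up for the example.

```python
from pathlib import PureWindowsPath


def folder_entries(files, n_drives):
    """Return (total folder entries across drives, unique folders in the pool)."""
    per_drive = [set() for _ in range(n_drives)]
    for i, f in enumerate(files):
        folders = per_drive[i % n_drives]  # round-robin placement (assumption)
        # Every drive that holds a file also holds all of its parent folders.
        for parent in PureWindowsPath(f).parents:
            folders.add(str(parent))
    total = sum(len(s) for s in per_drive)
    unique = len(set().union(*per_drive))
    return total, unique


# 24 files in one folder, already expressed as remapped pool paths.
files = [f"P:\\Media\\Movies\\f{i}.mkv" for i in range(24)]
total, unique = folder_entries(files, n_drives=12)
print(total, unique, total // unique)
```

With 12 drives, the three folders (`P:\`, `P:\Media`, `P:\Media\Movies`) show up 36 times in total, i.e. 12 entries each, even though no file is duplicated.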
void wrote:
Thu Apr 30, 2020 5:34 am
Removing the duplicated entries is not so trivial as Everything uses multiple indexes. The duplicated files/folders will need to be 'marked' and removed from all indexes.
It may be possible to quickly handle this as a part of the sort algorithm (discard one of a pair each time an exact match occurs during sort). This would mean the duplicate rejection only happens with sort enabled, but I don't think those who would want this feature would complain about such a limitation.
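A minimal Python sketch of that idea: once entries are sorted, exact duplicates sit next to each other, so a single pass that compares each entry with the previously kept one can discard them. The `(path, size, mtime)` tuple layout and the `sort_and_dedupe` name are assumptions for the example, not Everything's actual sort code.

```python
def sort_and_dedupe(entries):
    """Sort entries, then drop any entry identical to the one kept before it."""
    ordered = sorted(entries)
    result = []
    for e in ordered:
        if result and result[-1] == e:
            continue  # exact match with its neighbour: discard the duplicate
        result.append(e)
    return result


# Entries as (path, size, mtime) tuples after remapping.
entries = [
    ("P:\\Media\\b.mkv", 900, 300),
    ("P:\\Media", 0, 100),
    ("P:\\Media\\a.mkv", 700, 200),
    ("P:\\Media", 0, 100),
    ("P:\\Media\\a.mkv", 700, 200),
    ("P:\\Media", 0, 100),
]
unique_sorted = sort_and_dedupe(entries)
for path, size, mtime in unique_sorted:
    print(path, size, mtime)
```

Because the comparison only looks at adjacent items, the extra cost over the sort itself is one linear pass, which is why tying duplicate rejection to the sort step looks attractive.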
