Find Duplicates

Discussion related to "Everything" 1.5 Alpha.
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Find Duplicates

Post by ChrisGreaves »

void wrote: Sat Feb 04, 2023 12:12 am
Everything 1.5 adds support for finding unique or duplicate items in your results.
To find files with duplicated content:
  • Include the following in your search:
    dupe:size;sha256

David, this new DUPE function is significantly more fun than the old one. Congratulations!!
That's the good news. Now for the bad news:-

I don't understand what is happening here.
[Attachment: Dupe_07.png]
I am/was not surprised to find that out of 19,000+ MP3 files in T:\Music there were two pairs of files that matched by Name, Size, and DateModified.
I have about thirty pairs that match by Name and Size, because when I hear a swingin' Bach partita, I want more.
The Result List here was strong enough for me to delete one item of each pair, but I am here to try out Sha256, so ...
[Attachment: Dupe_08.png]
No surprise here. I was ready to delete the two duplicates anyway; I just wanted to run Sha256 so that I could brag that I was the first person in Bonavista to do so (grin).
[Attachment: Dupe_09.png]
But when I got curious about where these pairs were and added the Path to the column headings, I was surprised.
So, right-click, Open path and:-
[Attachment: Untitled.png]
Huh? I am ready to Rebuild Indexes and fiddle around to see if I can make this go away, but I'd rather hear back from you in case there is some other poking that should be done before I destroy evidence.

Besides which you have given me plenty to go on with!
Thanks; despite this one surprise, the new Dupe 1.5 really is less confusing than that for 1.4.
Chris
Last edited by NotNull on Sun Feb 12, 2023 8:01 am, edited 1 time in total.
Reason: Split post
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

Are you indexing T: (as NTFS) & also have T: added as a Folder Index (or at least parts thereof)?
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Find Duplicates

Post by raccoon »

There definitely should be a warning when the same volume is added to multiple indexing sections. (I still suggest letting Everything automatically detect which index section a volume should be added to, as this is too advanced of an ask for most users.)
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

therube wrote: Fri Feb 10, 2023 6:23 pm Are you indexing T: (as NTFS) & also have T: added as a Folder Index (or at least parts thereof)?
raccoon wrote: Fri Feb 10, 2023 3:16 pm There definitely should be a warning when the same volume is added to multiple indexing sections. (I still suggest letting Everything automatically detect which index section a volume should be added to, as this is too advanced of an ask for most users.)

Thanks therube & raccoon.
The first two attachments (Untitled1 and Untitled0) show my volumes; I took the second screen shot because I had to scroll down the list of drives.
The third attachment (Untitled2) shows that the Folder List is empty; as well I checked inside the various Index panels, those yellow-circled on the left. They are empty.
FWIW it is now only two days since I re-installed with defaults, knowing that I would be asking a great many questions. This installation is as close to vanilla as I can make it by clicking "I agree", "Sure" etc. during the installation process and focusing (for now) on Functions in Searching from the wiki.

@raccoon I am a bit lost here with "indexing sections", but that I've not met it before is a good sign; that probably means I haven't dabbled in it.
If this means warning a user who says "Add T:\Folder\ to the indexes" while also saying "Add Drive T: to the indexes", I would agree.
I know that I can detect my SUBSTituted drives using VBA, that is, detect their mapping.

My Fakes.bat :-

Code:

:: Rem Fakes.bat
:: 7:53 AM 05/18/2022 cloned from autoexec20220616.bat
::

::
::		Map folders to drive letters
::
		for %%a in (a b c d e f g h i j k l m n o p q r s t u v w x y z) do subst %%a: /d >>nul
%pause%
		call T:\BatLap\SetDate.bat

		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist A:\NUL subst A: C:\Users\%UserName%\AppData\Roaming\Greaves
		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist B:\NUL subst B: T:\Blotter\%DAILY%
		if not "%Timeout%"=="" timeout /T %Timeout%
%pause%
::		if not exist I:\NUL subst I: T:\Greaves\Products\User\Indxr\
		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist J:\NUL subst J: "T:\Pers\Audio Books\_LibriVox\JSBach\"
		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist L:\NUL subst L: "T:\Pers\Audio Books\_LibriVox\"
		if not "%Timeout%"=="" timeout /T %Timeout%
::		if not exist M:\NUL subst M: "T:\Appl\Audacity\audacity-home\audacity-win-3.1.3-64bit\Portable Settings\Macros"
		if not "%Timeout%"=="" timeout /T %Timeout%
::		if not exist P:\NUL subst P: "T:\Pers\Audio Books\DistProof"
		if not "%Timeout%"=="" timeout /T %Timeout%
::		if not exist S:\NUL subst S: T:\Music\Calm\Classical\Shostakovitch\
		if not "%Timeout%"=="" timeout /T %Timeout%
::		if not exist U:\NUL subst U: T:\Greaves\Products\DEVEL\Turing\
		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist V:\NUL subst V: T:\Pers\Places\LivingInBonavista
		if not "%Timeout%"=="" timeout /T %Timeout%
		if not exist W:\NUL subst W: T:\Greaves\Admin\Domains
		if not "%Timeout%"=="" timeout /T %Timeout%
::		if not exist X:\NUL subst X: T:\Greaves\Admin\Domains\landfallgardenhouse
::		if not "%Timeout%"=="" timeout /T %Timeout%
%pause%
::
REM end of Fakes.bat
Not one of A:, B:, J:, L:, V:, or W: maps to anywhere like T:\Music\.
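
The same check can be made against the live mappings rather than the batch file. The following is a minimal Python sketch, assuming that a bare `subst` command prints one line per virtual drive in the form `X:\: => target`; the helper names `parse_subst` and `drives_under` are made up for illustration, and the sample text simply reuses two mappings from Fakes.bat.

```python
import re
from pathlib import PureWindowsPath

# Assumed line shape of `subst` output: "J:\: => T:\Some\Folder"
SUBST_LINE = re.compile(r"^(?P<drive>[A-Za-z]):\\: => (?P<target>.+)$")


def parse_subst(output: str) -> dict:
    """Map each virtual drive letter to its target folder."""
    mappings = {}
    for line in output.splitlines():
        match = SUBST_LINE.match(line.strip())
        if match:
            mappings[match["drive"].upper()] = PureWindowsPath(match["target"])
    return mappings


def drives_under(mappings: dict, folder: str) -> list:
    """Return drive letters whose target is the folder or lies beneath it."""
    root = PureWindowsPath(folder)
    return sorted(
        drive for drive, target in mappings.items()
        if target == root or root in target.parents
    )


# Sample text shaped like subst output, using two mappings from Fakes.bat.
sample = (
    "J:\\: => T:\\Pers\\Audio Books\\_LibriVox\\JSBach\n"
    "W:\\: => T:\\Greaves\\Admin\\Domains\n"
)
print(drives_under(parse_subst(sample), "T:\\Music"))
```

On a real machine the text would come from running `subst` with no arguments; an empty list for T:\Music would confirm that no virtual drive points into the music tree.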

Thanks for checking up on/for me, anyway!
Chris
Attachments: Untitled1.png, Untitled0.png, Untitled2.png
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

I am here to try out Sha256
dupe#property mentioned a limit of 3 items, so I don't think your sha256 actually made it into your wanted list.
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

T:, Truecrypt, do you mount/unmount kind of thing?
Maybe it is in there twice, not as "T:", but under different GUIDs.

Tools | Debug -> NTFS Path = T:, then some other = T: ?
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

therube wrote: Fri Feb 10, 2023 8:02 pm dupe#property mentioned a limit of 3 items, so I don't think your sha256 actually made it into your wanted list.
Good Point! Thanks.

I read "Everything will first compare file sizes, if the size matches, Everything will gather the SHA256 hash" as meaning that sha256 would force a Size: anyway, so that my explicit mention of Size: in my filter would be ignored.

I think that your interpretation is correct.
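
The size-then-hash order quoted above can be illustrated with a minimal Python sketch. It shows the general technique only and is not Everything's actual implementation; the function names `sha256_of` and `find_content_dupes` are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_content_dupes(paths):
    """Group files by size first, then hash only the size collisions."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a content duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Hashing is the expensive step, so comparing sizes first means most files are never read at all.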
T:, Truecrypt, do you mount/unmount kind of thing?
Maybe it is in there twice, not as "T:", but under different GUIDs.

I am not sure what you mean here. To the best of my knowledge the Mount occurs only once in my autoexec.bat:-

Code:

::
::	Mount the encrypted drive
::
		if not exist T: "C:\Program Files\TrueCrypt\TrueCrypt.exe" /q /lT  /v\Device\Harddisk0\Partition3
		%pause%
Tools | Debug -> NTFS Path = T:, then some other = T: ?
cheers, Chris
[Attachment: Untitled.png]
I assumed you meant in Everything.
Choosing Tools, Debug, Console fires up the screen above which refreshes about once per second. No chance to type in commands.
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

Oops. Left out "Statistics".

Tools | Debug | Statistics -> NTFS Path = T:, then some other = T: ?
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

therube wrote: Fri Feb 10, 2023 8:02 pm Maybe it is in there twice, not as "T:", but under different GUIDs.
With my re-installation in place I thought to try the Dupe run again:-

Code:

T:\Music\ *.mp3 Dupe:Name;Size;DM
[Attachment: Untitled.png]
I still find the same two pairs of duplicates.
[Attachment: Untitled2.png]
And they are still in the same folder!

I have these two files backed up, of course, so next I wondered "What happens if I remove one file from one pair?" But I sort-of promised David that I would wait until I had heard back from him.
Which was no excuse for me to re-install.
But whatever, the apparent problem is still there.
Now I am wondering what happens if I move each file to a separate folder. For example:-
Move the Bach file to "T:\Music\2022\202201\20210115\" and move the Buggles file to "T:\Music\2022\202201\20210117\"

Cheers, Chris
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

Are these dupes only turning up when you are including dupe: ?

As in if you simply search for,
bmv sato nether societ low
, do you still see the T: entry listed twice?
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

therube wrote: Fri Feb 10, 2023 9:06 pm Are these dupes only turning up when you are including dupe: ?
As in if you simply search for,
bmv sato nether societ low
, do you still see the T: entry listed twice?
[Attachment: Untitled.png]
Now why didn't I think of that?

But: zero results!
Chris
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Find Duplicates

Post by froggie »

Anything on in the search menu like match whole words?
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

froggie wrote: Sat Feb 11, 2023 1:14 am Anything on in the search menu like match whole words?
Oh how you work me! :lol:
[Attachment: Untitled.png]

Code:

ww:*.mp3 ww:"bach" ww:"partita" T:\Music\ Dupe:Name;Size;DM 
I hadn't got to WholeWords yet, so I assume that the filter is formed correctly.

Two other points come to mind:-
(1) I don't really expect Everything to check its own results and (in this case) say "Hang about! I've reported two files twice in the same folder! That can't be correct"
(2) There is a possibility of a corrupted drive or, at the least, of a corrupted directory, BUT three days ago I backed up my data partition, reformatted the encrypted partition, then RoboCopy'd the data from the backup to the data partition on the HDD; so I expect my Windows file system to be uncorrupted at this time.
[Attachment: Untitled2.png]
But then ww: got me interested, so I stripped the filter down.

Code:

ww:bach ww:partita T:\Music\
No need to ask for *.mp3 - that's pretty well all I have stored in T:\Music\ and no need for Size: - with only 17 objects I can sort by size and eyeball the list.

Based in part on my item (1) above, I am starting to consider that this might be a bug. I still don't expect Everything to check its own results, but then, too, I don't expect to see two files reported from the same folder when Windows File Explorer reports only one (of each of the duplicate pairs originally reported, the "Bach Partita" and "The Buggles").
Cheers, Chris
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Find Duplicates

Post by froggie »

I meant for you to check the search menu to see if anything was checked, like whole word, to explain the zero result for the search, to see what happens without dupe as therube suggested, not to add ww: to the search.
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: Find Duplicates

Post by LeoLUG »

The zero results are because the search was written wrong! There is no BMV in the file name!
Instead of:

Code:

bmv sato nether societ low

search for:

Code:

bwv sato nether societ low
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

LeoLUG wrote: Sun Feb 12, 2023 6:01 am The zero results are because the search was written wrong! There is no BMV in the file name!
Thank you LeoLUG. I think I am getting punch-drunk about this time.
[Attachment: Untitled.png]
I think that's in part because I am still new to filter strings. Over the past few weeks I have made so many mistakes when typing them in from scratch, getting caught out by spaces ("AND"), colons, semi-colons, periods, commas, etc., that I have formed a habit of copy/pasting from the {CODE} box in a post to avoid my clumsy typing errors.

Well, same deal, one of those two pairs is duplicated.

But then I thought the "habit of copy/pasting" is a Good Thing, right? What else can I copy/paste?

(1) Well, I can r/c and paste the FullName of one of the results:-

Code:

T:\Music\2022\202201\20220116\Bach - Violin Partita no. 3 in E major BWV 1006 - Sato _ Netherlands Bach Societ_low.mp3
[Attachment: Untitled2.png]
Mirabile dictu! Using the FullName yields only one result, which is how it should be.
(2) What if I use the FullName and an extent mask?
[Attachment: Untitled3.png]
That yields still just one result, which is how it should be.
(3) What if I use the EXT: function?
[Attachment: Untitled4.png]
That yields still just one result, which is how it should be.
(4) What if I use the EXT: function and an extent mask?
[Attachment: Untitled5.png]
That yields still just one result, which is how it should be. Although I admit that I had hoped that doubling up on "extents" might have been confusing Everything 1.5.
(5) OK. Add in the drive and path:

Code:

T:\Music\ *.mp3 ext:mp3 T:\Music\2022\202201\20220116\Bach - Violin Partita no. 3 in E major BWV 1006 - Sato _ Netherlands Bach Societ_low.mp3
[Attachment: Untitled6.png]
This is getting aggravating! How can I trip up Everything?

(6) My original filter was

Code:

T:\Music\ *.mp3 Dupe:Name;Size;DM
I will spend some time this morning trying to see if I can find a MagicSwitch that will eliminate the duplication of data in the Result List.

Thanks though for pointing out my error. :thumbsup:
Cheers, Chris
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

(6) My original filter was “T:\Music\ *.mp3 Dupe:Name;Size;DM”. I will spend some time this morning trying to see if I can find a MagicSwitch that will eliminate the duplication of data in the Result List.
(a) In Windows File Explorer rename the folder from “20220116” to “2022011Z”;
(b) Rename it back again to “20220116”
(c) Force a Master rebuild; all indexes, databases etc.
(d) Copy the contents of the folder “T:\Music\2022\202201\20210116” to a newly created folder “T:\Music\2022\202201\2021011Y”
(e) Uninstall and reinstall 1.5.0.1037a(x64). (I did this yesterday, Saturday, February 11, 2023)
(f) Move the contents of the folder “T:\Music\2022\202201\20210116” to a newly created folder “T:\Music\2022\202201\2021011X”
(g) Move the contents of the folder “T:\Music\2022\202201\20210116” to a newly created folder “C:\Music\2022\202201\2021011W”
(h) Move the contents of the folder “T:\Music\2022\202201\20210116” to a newly created folder “T:\V”
(i) Move the contents of the folder “T:\Music\2022\202201\20210116” to a newly created folder “c:\V”
(j) Delete the file T:\Music\2022\202201\20210116\Bach - Violin Partita no. 3 in E major BWV 1006 - Sato _ Netherlands Bach Societ_low.mp3
(k) Delete the file T:\Music\2022\202201\20220116\The Buggles - Video Killed The Radio Star (Official Music Video)_low.mp3
(l) T:\Music\ *.mp3 Dupe:Name;Size;DM !buggles
(m) T:\Music\ *.mp3 Dupe:Name;Size;DM !bach
(n) Move the folder tree of “T:\Music\2022\202201\20210116” to a different parental folder
(o) Move the folder tree of “T:\Music\2022\202201\20210116” to a different parental drive
(p) ???
(q) ???

The options listed above (open to suggestions about the sequence) would keep me quiet for at least a half a day. I would perform a regular “nightly backup” with RoboCopy and a RoboCopy restore from backup between each test, in the hopes of maintaining a copy of the NTFS drive.

A faster alternative to a “RoboCopy to an external drive” would be a PKZip2.5 zip archive to a local folder on my data partition T:

Cheers, Chris
void
Developer
Posts: 15251
Joined: Fri Oct 16, 2009 11:31 pm

Re: Find Duplicates

Post by void »

[Attachment: dupe.2021.2022.png]
It was hard to see at first.
These items have different paths.
The results are expected.
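
Why this is expected can be shown with a minimal Python sketch. It assumes that Dupe:Name;Size;DM groups on those three properties only, so the folder plays no part in the comparison. The two folder paths and the file name are the ones that appear earlier in this thread; the size and timestamp are made-up placeholders, and `dupes_by_name_size_dm` is a hypothetical helper, not Everything's code.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Hypothetical index rows: (full path, size in bytes, date modified).
# The size and timestamp are placeholders; only the paths come from the thread.
NAME = "Bach - Violin Partita no. 3 in E major BWV 1006 - Sato _ Netherlands Bach Societ_low.mp3"
rows = [
    (PureWindowsPath(r"T:\Music\2022\202201\20210116") / NAME, 1000, "2022-01-16 10:00"),
    (PureWindowsPath(r"T:\Music\2022\202201\20220116") / NAME, 1000, "2022-01-16 10:00"),
]


def dupes_by_name_size_dm(rows):
    """Group rows on (name, size, date modified); the folder is not part of the key."""
    groups = defaultdict(list)
    for path, size, modified in rows:
        groups[(path.name.lower(), size, modified)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


for group in dupes_by_name_size_dm(rows):
    for path in group:
        print(path.parent.name, "|", path.name)
```

The two parent folders differ by a single digit (20210116 versus 20220116), which is easy to miss in a narrow Path column.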

dupe#property mentioned a limit of 3 items
I'll make the dupe: search fail if you specify more than 3 properties to avoid showing unexpected results.


There definitely should be a warning when the same volume is added to multiple indexing sections. (I still suggest letting Everything automatically detect which index section a volume should be added to, as this is too advanced of an ask for most users.)
I'll have a warning in the next alpha update.

Thank you for the suggestions.
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

void wrote: Mon Feb 13, 2023 7:24 am It was hard to see at first.
These items have different paths.
The results are expected.
Now we all know why you, David, are writing Everything, and I am not! :blush: :embarrassed:
Thank you for the suggestions.
I did not make suggestions, but I am glad that the authors will see this.

I take comfort in knowing that at least half of my suggestions might have shown me the error of my ways.
Thank you, Void.
Chris
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

there is no BMV in the file name
Heh, did I do that.

Often I use the files 1 & 2.
I'll copy something into 1, then the "identical" item into 2.
Then I'll compare 1 vs 2 to see what I'm not seeing ;-).


(And while we're on fakes, & finding things that aren't there, you might find this interesting,
opening certain link results in www.xn-- prefix)
ChrisGreaves
Posts: 606
Joined: Wed Jan 05, 2022 9:29 pm

Re: Find Duplicates

Post by ChrisGreaves »

therube wrote: Mon Feb 13, 2023 7:14 pm Heh, did I do that.
Dunno!
But I used/posted it so I am responsible :D
Cheers, Chris
anmac1789
Posts: 561
Joined: Mon Aug 24, 2020 1:16 pm

Re: Find Duplicates

Post by anmac1789 »

therube wrote: Mon Feb 13, 2023 7:14 pm
there is no BMV in the file name
Heh, did I do that.

Often I use the files 1 & 2.
I'll copy something into 1, then the "identical" item into 2.
Then I'll compare 1 vs 2 to see what I'm not seeing ;-).


(And while we're on fakes, & finding things that aren't there, you might find this interesting,
opening certain link results in www.xn-- prefix)


That's what I also do, but it turns out you end up creating backups of backups of backups and things become a mess lol
therube
Posts: 4605
Joined: Thu Sep 03, 2009 6:48 pm

Re: Find Duplicates

Post by therube »

Often I use the files 1 & 2.
I'll copy something into 1, then the "identical" item into 2.
Then I'll compare 1 vs 2 to see what I'm not seeing ;-).
And I did that just yesterday evening.
Had 2 file names that should have been the "same", but they sure didn't look like it.

Once I plugged them into 1 & 2, & compared, sure enough, they were the same.
It was the highlighting that caused a slight difference in width, and that caused the oddity.
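
The same compare-two-copies trick can be scripted. This is a minimal Python sketch using only the standard library; the helper name `first_difference` is made up for illustration, and the two strings are the folder paths from earlier in this thread.

```python
from itertools import zip_longest


def first_difference(a: str, b: str):
    """Return (index, char_a, char_b) at the first mismatch, or None if identical."""
    for index, (ca, cb) in enumerate(zip_longest(a, b)):
        if ca != cb:
            return index, ca, cb
    return None


one = r"T:\Music\2022\202201\20210116"
two = r"T:\Music\2022\202201\20220116"
print(first_difference(one, two))
```

A result of None means the two strings really are identical, so any visible difference comes from the display rather than the text.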
void
Developer
Posts: 15251
Joined: Fri Oct 16, 2009 11:31 pm

Re: Find Duplicates

Post by void »

There definitely should be a warning when the same volume is added to multiple indexing sections. (I still suggest letting Everything automatically detect which index section a volume should be added to, as this is too advanced of an ask for most users.)
Everything 1.5.0.1338a adds a warning when attempting to add an NTFS volume as a folder index when it is already indexed as an NTFS index.

Thank you for the suggestion.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Find Duplicates

Post by raccoon »

void wrote: Thu Feb 16, 2023 8:30 am Everything 1.5.0.1338a adds a warning when attempting to add an NTFS volume as a folder index when it is already indexed as an NTFS index.
Testing this, it works, but only under the specific condition that a user adds a Folder index of a root drive letter (i.e., E:\) when the same volume is already being indexed as NTFS. The reverse does not hold: no conflict is detected when enabling NTFS indexing on a volume that is already indexed as a Folder. Similarly, I can NTFS index E:\ and then Folder index E:\Media\, and there will be no conflict warning either, yet duplicate search results will occur if I do.
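
All of these cases reduce to one prefix test on the index roots. The following is a minimal Python sketch, under the assumption that index roots can be compared as plain Windows paths; it is not how Everything implements its warning, `overlapping_indexes` is a hypothetical helper, and `D:\Backup\` is an invented example folder.

```python
from pathlib import PureWindowsPath


def overlapping_indexes(ntfs_volumes, folder_indexes):
    """Return (volume, folder) pairs where a folder index lies on an NTFS-indexed volume."""
    conflicts = []
    for volume in ntfs_volumes:
        volume_root = PureWindowsPath(volume)
        for folder in folder_indexes:
            folder_path = PureWindowsPath(folder)
            # A folder conflicts if it is the volume root or anywhere beneath it.
            if folder_path == volume_root or volume_root in folder_path.parents:
                conflicts.append((volume, folder))
    return conflicts


print(overlapping_indexes(["E:\\"], ["E:\\Media\\", "D:\\Backup\\"]))
```

Because the check runs over both lists as they currently stand, it gives the same answer whichever index type was added first.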
void
Developer
Posts: 15251
Joined: Fri Oct 16, 2009 11:31 pm

Re: Find Duplicates

Post by void »

Thank you for the feedback raccoon,

Only drive letters are supported at the moment.

I will look into showing a warning when adding an NTFS folder.
void
Developer
Posts: 15251
Joined: Fri Oct 16, 2009 11:31 pm

Re: Find Duplicates

Post by void »

Everything 1.5.0.1339a will now show a warning when adding any NTFS folder that is already indexed.
