Yet another distinct: question ..

Discussion related to "Everything" 1.5 Alpha.
NotNull
Posts: 5416
Joined: Wed May 24, 2017 9:22 pm


Post by NotNull »

I think I still don't understand distinct:

Why is 1stdout.txt not shown in the third Everything window?
Its regmatch1 + MD5 combination is even unique, so it should certainly be listed.


[Two screenshots (2024-03-30): the Everything windows showing the results of the queries below]

The queries:


file:   regex:("T:\\hash\\folder1\\)(.*$)" | regex:("T:\\hash\\folder2\\")(.*$)     sort:regmatch1;md5 
file:   regex:("T:\\hash\\folder1\\)(.*$)" | regex:("T:\\hash\\folder2\\")(.*$)     distinctsort:regmatch1;md5 
file:   regex:("T:\\hash\\folder1\\)(.*$)" | regex:("T:\\hash\\folder2\\")(.*$)     distinct:regmatch1;md5


(I know, the distinctsort: query was not necessary.)
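A minimal Python sketch of what the third query should return, assuming `regmatch1` is capture group 1 (the folder root) of whichever of the two regexes matched, and assuming Everything's regex matching is case-insensitive by default. The paths and MD5 values are made up, and `regmatch1` and `expected_distinct` are hypothetical helpers, not Everything internals.

```python
import re

# Hypothetical search results: (full path, MD5). Values are made up.
FILES = [
    (r"T:\hash\folder1\1stdout.txt", "aaa"),
    (r"T:\hash\folder1\copy.txt", "bbb"),
    (r"T:\hash\folder1\sub\copy.txt", "bbb"),
    (r"T:\hash\folder2\copy.txt", "bbb"),
]

# The two regexes from the queries: group 1 is the folder root, group 2 the rest.
# Case-insensitive matching is an assumption about Everything's default.
PATTERNS = [
    re.compile(r"(T:\\hash\\folder1\\)(.*$)", re.IGNORECASE),
    re.compile(r"(T:\\hash\\folder2\\)(.*$)", re.IGNORECASE),
]


def regmatch1(path):
    """Capture group 1 of the first pattern that matches, or None."""
    for pattern in PATTERNS:
        match = pattern.search(path)
        if match:
            return match.group(1)
    return None


def expected_distinct(files):
    """Keep the first file seen for each (regmatch1, md5) pair."""
    seen = set()
    result = []
    for path, md5 in files:
        root = regmatch1(path)
        if root is None:
            continue  # not matched by either regex, so not a search result
        key = (root, md5)
        if key not in seen:
            seen.add(key)
            result.append(path)
    return result


print(expected_distinct(FILES))
```

Since 1stdout.txt is the only file with its (folder root, MD5) pair, a correct distinct:regmatch1;md5 has to keep it.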
void
Developer
Posts: 16471
Joined: Fri Oct 16, 2009 11:31 pm

Re: Yet another distinct: question ..

Post by void »

Nice find!

Everything is missing the first item listed here:

file: regex:("T:\\hash\\folder1\\)(.*$)" | regex:("T:\\hash\\folder2\\")(.*$) sort:regmatch1;md5

The issue occurs when distinct: is given two or more properties.
With a single property, distinct: works as expected.



I will have this fixed in the next alpha update.
NotNull
Posts: 5416
Joined: Wed May 24, 2017 9:22 pm

Re: Yet another distinct: question ..

Post by NotNull »

Thanks for restoring my sanity :)

Some related questions:

Does distinct: evaluate all properties at once, or does it try to match properties in order of appearance and stop at the first "mismatch"?
I realize that is a bit vague, so put differently:
What is faster: distinct:name;md5 or distinct:name;size;md5?
(when Size is indexed and MD5 is not)


Based on intuition, I didn't expect the following to give the desired result (one distinct MD5 result per folder tree).
But if anyone asked me why, I could not answer (other than some vague "global" murmuring).


file:  <t:\hash\folder1\   distinct:md5> | <t:\hash\folder2\   distinct:md5>
So my question: Why?
void
Developer
Posts: 16471
Joined: Fri Oct 16, 2009 11:31 pm

Re: Yet another distinct: question ..

Post by void »

Does distinct: evaluate all properties at once, or does it try to match properties in order of appearance and stop at the first "mismatch"?
When using distinct:, Everything first presorts the items and then enumerates all of them:
For each item:
  • Gather all properties.
  • Add the item as a result if it is the first item.
  • Add the item as a result if it is not a duplicate of the previous item.
See also: How does Everything find distinct items
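A minimal Python sketch of the presort-and-compare approach described above. The function name, the `keys` parameter and the dict-based items are hypothetical; this models the idea, not Everything's actual code. The properties are gathered before the sort here, because the sort needs them.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def distinct(items: Sequence[T], keys: Sequence[Callable[[T], object]]) -> List[T]:
    """Presort-then-scan distinct (sketch)."""
    # Gather all properties for every item (for MD5 this means hashing every file).
    gathered: List[Tuple[tuple, T]] = [
        (tuple(key(item) for key in keys), item) for item in items
    ]
    # Presort on the properties so equal combinations end up next to each other.
    # Python's sort is stable, so the earliest input item wins within a group.
    gathered.sort(key=lambda pair: pair[0])

    results: List[T] = []
    previous = None
    for index, (props, item) in enumerate(gathered):
        # Keep the first item, and any item that differs from the previous one.
        if index == 0 or props != previous:
            results.append(item)
        previous = props
    return results


files = [
    {"name": "a.txt", "size": 10, "md5": "x"},
    {"name": "a.txt", "size": 10, "md5": "x"},
    {"name": "b.txt", "size": 10, "md5": "x"},
]
unique = distinct(files, [lambda f: f["name"], lambda f: f["md5"]])
print([f["name"] for f in unique])
```

Because duplicates are adjacent after the sort, each item only has to be compared with its neighbour. The price is that every property, including an unindexed one such as MD5, is gathered for every item.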


What is faster: distinct:name;md5 or distinct:name;size;md5
(when Size is indexed and MD5 is not)
With distinct:, both are the same speed:
Everything gathers the MD5 for all items either way.

dupe:name;size;md5 would be faster, because Everything finds duplicated names first, then duplicated sizes, and only then gathers and checks MD5 values.
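A rough Python model of the speed difference, counting how many MD5 values each approach has to compute. The narrowing order (name, then size, then MD5) follows the description above; the class, the functions and the sample data are hypothetical. Note that dupe: returns the duplicated items rather than one item per group, so the two answer different questions.

```python
import hashlib
from collections import Counter


class Md5Meter:
    """Computes MD5 digests and counts how many were needed."""

    def __init__(self):
        self.calls = 0

    def md5(self, data: bytes) -> str:
        self.calls += 1
        return hashlib.md5(data).hexdigest()


def distinct_style(items, meter):
    # distinct: gathers every property, so every item gets an MD5.
    return [(name, size, meter.md5(data)) for name, size, data in items]


def dupe_style(items, meter):
    # dupe:name;size;md5 narrows the candidates before touching MD5.
    names = Counter(name for name, _, _ in items)
    same_name = [it for it in items if names[it[0]] > 1]
    sizes = Counter((name, size) for name, size, _ in same_name)
    same_size = [it for it in same_name if sizes[(it[0], it[1])] > 1]
    # Only files that share both name and size are hashed.
    hashed = [(name, size, meter.md5(data)) for name, size, data in same_size]
    counts = Counter(hashed)
    return [entry for entry in hashed if counts[entry] > 1]


# Hypothetical files: (name, size, content).
ITEMS = [
    ("a.txt", 3, b"abc"),
    ("a.txt", 3, b"abc"),
    ("a.txt", 4, b"abcd"),
    ("b.txt", 3, b"xyz"),
    ("c.txt", 5, b"hello"),
]

distinct_meter, dupe_meter = Md5Meter(), Md5Meter()
distinct_style(ITEMS, distinct_meter)
DUPES = dupe_style(ITEMS, dupe_meter)
print(distinct_meter.calls, dupe_meter.calls)  # 5 2
```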


file: <t:\hash\folder1\ distinct:md5> | <t:\hash\folder2\ distinct:md5>
Everything applies distinct: to the whole result set after your search, not per < > group.

The search is the same as:

file: <t:\hash\folder1\ | t:\hash\folder2\> distinct:md5
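Because distinct: is applied once to the combined result set, a file in folder2 whose MD5 already occurs in folder1 is dropped. A small Python model of the two cases; the paths, hashes and the `folder_root` helper are hypothetical, and which duplicate survives in Everything depends on its sort, not on input order as here. Getting one result per MD5 per folder tree needs the folder in the distinct key, which is what regmatch1 provides in the distinct:regmatch1;md5 query from the first post.

```python
# Hypothetical results of: file: <t:\hash\folder1\ | t:\hash\folder2\>
# Each row is (path, md5); the values are made up.
RESULTS = [
    (r"t:\hash\folder1\a.txt", "111"),
    (r"t:\hash\folder1\b.txt", "111"),
    (r"t:\hash\folder2\a.txt", "111"),
    (r"t:\hash\folder2\c.txt", "222"),
]


def first_per_key(rows, key):
    """Keep the first row's path for each key value (input order decides here)."""
    seen, kept = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row[0])
    return kept


def folder_root(path):
    """Hypothetical helper: 't:\\hash\\folder1\\a.txt' -> 't:\\hash\\folder1\\'."""
    return "\\".join(path.split("\\")[:3]) + "\\"


# distinct:md5 on the combined result set: one row per MD5 overall.
GLOBAL_DISTINCT = first_per_key(RESULTS, lambda row: row[1])
# One row per MD5 per folder tree: the folder must be part of the key.
PER_FOLDER_DISTINCT = first_per_key(RESULTS, lambda row: (folder_root(row[0]), row[1]))

print(GLOBAL_DISTINCT)
print(PER_FOLDER_DISTINCT)
```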
NotNull
Posts: 5416
Joined: Wed May 24, 2017 9:22 pm

Re: Yet another distinct: question ..

Post by NotNull »

Crystal clear. Thanks!
void
Developer
Posts: 16471
Joined: Fri Oct 16, 2009 11:31 pm

Re: Yet another distinct: question ..

Post by void »

Everything 1.5.0.1372a fixes an issue with distinct: missing the first item when using multiple properties.