Smart Dupe

Discussion related to "Everything" 1.5.
dougbenham
Posts: 34
Joined: Wed Mar 15, 2023 8:19 pm

Smart Dupe

Post by dougbenham »

Let's say you have 2 movie files with a $length of:
  • 51004799999
  • 51005040000
And you want to design a search that returns these as potential dupes because of their similar length (among other characteristics). You might add a new column that converts $length into seconds and then rounds to the nearest second:

Code:

column1:=ROUND($length:/10000000.0)
However, these 2 movies fall on opposite sides of a rounding boundary, so their rounded lengths just barely differ:
  • 5100
  • 5101
And they don't show up as dupes because of how we've chosen to discretize our timeframes.

So we add another column with an offset of 0.5 before rounding:

Code:

column2:=ROUND($length:/10000000.0+0.5)
Perfect, these end up having the same rounded lengths and would return as dupes:
  • 5101
  • 5101
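
The arithmetic behind both columns can be checked with a short Python sketch. It assumes $length is stored in 100-nanosecond units (implied by dividing by 10000000.0 to get seconds) and that ROUND() rounds halves up; neither is confirmed in this thread, and the function names are made up for illustration.

```python
import math

# Assumption: $length is in 100-nanosecond units, so 10,000,000 per second.
TICKS_PER_SECOND = 10_000_000

def round_half_up(x):
    # Assumption: ROUND() rounds halves up. Python's built-in round()
    # rounds halves to even, so floor(x + 0.5) is used instead.
    return math.floor(x + 0.5)

def column1(length):
    # Mirrors ROUND($length:/10000000.0)
    return round_half_up(length / TICKS_PER_SECOND)

def column2(length):
    # Mirrors ROUND($length:/10000000.0+0.5)
    return round_half_up(length / TICKS_PER_SECOND + 0.5)

lengths = [51004799999, 51005040000]
col1 = [column1(n) for n in lengths]  # [5100, 5101]: different buckets
col2 = [column2(n) for n in lengths]  # [5101, 5101]: same bucket
print(col1, col2)
```

Two files whose lengths differ by less than half a second always share a bucket in at least one of the two columns, which is why the dupe test needs to match on either column.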
And now you want to check if dupes are found on either column like so:

Code:

.. <dupe:column1 | dupe:column2>
However, for some reason dupe does not work when combined with the | operator. So my request is to either add a smart dupe mechanism where properties from both entity A and B are available for operations (like ABS(length1-length2)<1), or perhaps just allow dupe to work with the OR operator so I can use my workaround.
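
To make the request concrete, here is a rough Python sketch of the pairwise comparison I have in mind, i.e. ABS(length1-length2)<1 second. It is only an illustration of the idea, not how Everything works internally; the function name, file names, and the 100-nanosecond unit are all assumptions.

```python
# Assumption: lengths are in 100-nanosecond units, so 10,000,000 per second.
TICKS_PER_SECOND = 10_000_000

def near_dupe_pairs(files, tolerance_seconds=1.0):
    """Return pairs of names whose lengths differ by less than the tolerance.

    files: dict mapping a (hypothetical) file name to its length in ticks.
    """
    tolerance = tolerance_seconds * TICKS_PER_SECOND
    # Sort by length so each file only needs comparing with later neighbours.
    ordered = sorted(files.items(), key=lambda item: item[1])
    pairs = []
    for i, (name_a, len_a) in enumerate(ordered):
        for name_b, len_b in ordered[i + 1:]:
            if len_b - len_a >= tolerance:
                break  # every later file is even further away
            pairs.append((name_a, name_b))
    return pairs

# Hypothetical file names; the first two lengths are the ones from this post.
files = {
    "a.mkv": 51004799999,
    "b.mkv": 51005040000,
    "c.mkv": 60000000000,
}
pairs = near_dupe_pairs(files)
print(pairs)  # [('a.mkv', 'b.mkv')]
```

Unlike the rounding columns, this compares the actual difference, so no pair is missed because of where a bucket boundary happens to fall.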
therube
Posts: 5711
Joined: Thu Sep 03, 2009 6:48 pm

Re: Smart Dupe

Post by therube »

Length itself already has a tolerance setting, length_dupe_tolerance=.
Pretty sure the tolerance number is in ms.

So at least set that to 1000 (or whatever), display 2 files that you know are within 1 second of one another, do a 'Find Length Duplicates' on the Length column, and ensure that those 2 expected files are identified as dupes.


From there, you can check whether the same applies to formulas you use in Search itself (I'm not sure offhand).
dougbenham
Posts: 34
Joined: Wed Mar 15, 2023 8:19 pm

Re: Smart Dupe

Post by dougbenham »

Well that works perfectly, thanks :D