support for fuzzy logic search ?

General discussion related to "Everything".
Andreas Sachse
Posts: 5
Joined: Sat Feb 29, 2020 2:24 pm

support for fuzzy logic search ?

Post by Andreas Sachse »

maybe in the next release ?

thx
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void »

I have experimented with fuzzy searching for Everything 1.5.
I've found fuzzy searching for filenames to be not very useful.
I've tried Levenshtein distance/soundex/metaphone.

What might be useful is dictionary suggestions.
For example, if you search for 'curiousity', Everything would suggest: did you mean 'curiosity'?
The problem here is that the Windows dictionary API is not very useful, and my own dictionary would use more disk space than the Everything executable (i.e. bloat).

I've added options to ignore spaces and punctuation for the next major release. This seems the most useful, e.g.:
spiderman
spider-man
spider.man
spider man
are all equal (when ignore spaces and ignore punctuation are enabled from the Search menu).
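
A minimal sketch of the idea above, in Python: names are compared after lower-casing and dropping spaces and punctuation. The helper names (`normalize`, `matches`) and the exact character set are assumptions for illustration, not how Everything implements the option.

```python
import string

# Characters dropped before comparing: ASCII punctuation plus whitespace.
# Assumption: the real option may treat a different set of characters.
_DROP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize(name):
    """Lower-case a name and strip spaces and punctuation."""
    return name.lower().translate(_DROP)


def matches(query, filename):
    """True if the normalized query occurs inside the normalized filename."""
    return normalize(query) in normalize(filename)


names = ["spiderman", "spider-man", "spider.man", "spider man"]
# All four spellings collapse to the same normalized key.
print({normalize(n) for n in names})
```

With this normalization, a query such as `spider man` would also match a hypothetical file named `Spider-Man.2002.mkv`.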

I am looking into user-defined synonym lists and will look into supplying some basic localized synonym lists.
eg:
and
&
an
'n
would be a user-defined list of words to treat as equal.
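
One way such a list could work, sketched in Python: every word in a group is mapped to a single canonical form before comparing. The group contents come from the example above; the names `SYNONYM_GROUPS`, `canonical_tokens` and `synonym_match` are hypothetical, and splitting on whitespace is a simplifying assumption.

```python
# Hypothetical synonym groups; every word in a group is treated as equal.
SYNONYM_GROUPS = [["and", "&", "an", "'n"]]

# Map each word to one canonical representative (the first word of its group).
CANONICAL = {word: group[0] for group in SYNONYM_GROUPS for word in group}


def canonical_tokens(text):
    """Split on whitespace and replace each token by its canonical form."""
    return [CANONICAL.get(tok, tok) for tok in text.lower().split()]


def synonym_match(query, filename):
    """True if every canonical query token appears among the filename tokens."""
    name_tokens = set(canonical_tokens(filename))
    return all(tok in name_tokens for tok in canonical_tokens(query))


print(synonym_match("rock and roll", "Rock & Roll"))
print(synonym_match("rock 'n roll", "rock and roll"))
```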
Link
Posts: 21
Joined: Thu Nov 03, 2011 10:08 pm

Re: support for fuzzy logic search ?

Post by Link »

What issue did you have with Levenshtein distance? I've played around with that one and it works. It is just very slow (at least in my implementation) unless you do fancy optimizations. Everyone has more than one core now, so parallelizing the search could speed it up.
You could try Damerau–Levenshtein distance. With the way Everything stores strings it may be fast.
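
For reference, a small Python sketch of the restricted form of Damerau–Levenshtein distance (often called optimal string alignment). It counts insertions, deletions, substitutions and swaps of two adjacent characters as one edit each. This is a plain textbook version, not an optimized one and not anything taken from Everything; the function name `osa_distance` is made up for the example.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("mna", "man"))      # one adjacent swap
print(osa_distance("tonic", "sonic"))  # one substitution
```

Plain Levenshtein distance would count the swap in `mna` / `man` as two edits, which is the main practical difference for typos.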
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void »

Performance was fine.

It doesn't work so well for millions of filenames. Too many unwanted results, even when tuned to 1 edit per 9+ characters.

For example, I might search for "tonic" and get 100,000 "sonic" results with 100 "tonic" results.
Everything would need a ranking system to make Levenshtein distance useful, eg: show "tonic" results first.
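
A rough Python sketch of what such a ranking could look like: candidates within an edit budget are kept, then sorted by distance so exact hits come first. The names `levenshtein` and `ranked_matches`, the `max_edits` parameter and the whole-name comparison are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ranked_matches(query, names, max_edits=1):
    """Keep names within max_edits of the query, closest matches first."""
    scored = []
    for name in names:
        dist = levenshtein(query.lower(), name.lower())
        if dist <= max_edits:
            scored.append((dist, name))
    # The sort is stable, so equally distant names keep their input order.
    scored.sort(key=lambda pair: pair[0])
    return [name for _, name in scored]


names = ["sonic", "tonic", "sonic", "topic", "music"]
print(ranked_matches("tonic", names))
# → ['tonic', 'sonic', 'sonic', 'topic']
```

The unwanted "sonic" results are still returned, but the exact "tonic" hit is listed ahead of them.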
aviasd
Posts: 135
Joined: Sat Oct 07, 2017 2:18 am

Re: support for fuzzy logic search ?

Post by aviasd »

I've been using fzf for fuzzy searching, which uses the Smith-Waterman algorithm.

Matched results were quite good and performance was fast (<1s) on my file list (~5.5M files).
Maybe it's worth checking out.
The following code was used to test (PowerShell):

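# Name of the exported file list
$export="FileList.txt"
"Exporting"
# Export the Everything results, sorted by path, to a text file
es -sort-path-ascending -export-txt $export
"Loading results"
# gi is an alias for Get-Item
$file=gi $export
# Read the whole list and pipe it to fzf for interactive filtering
[IO.File]::ReadAllText($file,[text.encoding]::Default) | fzf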
Note: fzf needs to be on %PATH%.
Note 2: This implementation does not handle typos.
nspp
Posts: 10
Joined: Tue Oct 27, 2020 8:57 am

Re: support for fuzzy logic search ?

Post by nspp »

The example you give is not the most relevant one. Fuzzy search matters when searching for files from approximate memory, when names are not in your native language, or when making typos, i.e. searching for Zpider instead of Spider, or Skizzy instead of Squeezy. It also matters when you misspell a word by swapping letters, like Mna instead of man. A list of synonyms alone is not sufficient in these cases.

Fuzzy search algorithms like those in fzf or agrep are able to give weighted results.
therube
Posts: 4580
Joined: Thu Sep 03, 2009 6:48 pm

Re: support for fuzzy logic search ?

Post by therube »

Heh.
Could someone clue me in on how to use fzf?

What to do, what to expect, & how to "interact" with it from there?

Typing 'fzf' displays the files in the current directory.
Up/down arrow moves the "selector" up/down. (Sometimes display was not correct at that?)

But, then what? What am I supposed to do, what is supposed to happen, what am I supposed to see?

Oh. It interactively filters the results as you type in.
(But in my case, at least when running sandboxed, Sandboxie, the screen does not repaint correctly, so it was never clear to me that it was filtering results.)

OK. So now what?
wason92
Posts: 8
Joined: Tue May 10, 2022 1:35 pm

Re: support for fuzzy logic search ?

Post by wason92 »

Fzf is basically a simple filter: it takes an input and you filter it (using fuzzy logic by default, though you can change it to be exact by default).
You can pipe stdout to it, or have it read a file.
Most people use it like 'dir /b /s /a |fzf'
You pipe a list of all the files under the current path to fzf, then filter it.
That's fine, but it can take a while. What you can do is combine it with es.exe.

Export a list:
es.exe -exporttxt some\file
(on my machine this takes about 3.2 seconds for 2m files)
Then just run fzf on that file:
fzf < some\file
This avoids the pipe, which can take a very long while if you have a large file list.

If you use clink, there's a nice fzf plugin:
https://github.com/chrisant996/clink-fzf
You can easily change this to use es instead of dir to get something like this:
Ctrl+S pops up fzf with a list of every file from Everything, with a preview window
https://streamable.com/cjy8h5

Also, 3 seconds is a long time to wait each time you want to find something, so you can just run es.exe -exporttxt some\file every minute or so to get a relatively up-to-date list of files to search.
ChrisGreaves
Posts: 602
Joined: Wed Jan 05, 2022 9:29 pm

Re: support for fuzzy logic search ?

Post by ChrisGreaves »

void wrote: Mon Apr 06, 2020 4:28 am I've tried Levenshtein distance/soundex/metaphone.
I am late to the party, but at least I did a forum search for "Soundex" before sounding off.
Soundex was magic when first I met it, a lovely example of its use against three Aussies in Paris trying to flummox the British Airways Clerk.

Just ten minutes ago Online Soundex coded "Greaves" into G612.

I was thinking it might help narrow down duplicate file names of MP3 tracks, and it wouldn't take much (hint, hint!) to persuade me to experiment with music tracks:-
[Attachment: Soundex_01.png]
Of course there is a difference between trying to corral a herd of tracks when the first keyword is already in the filter.

Then I wondered how Soundex might be applied to text content within a file, rather than the name of the file.

I suppose that someone who understands regex already has a Soundex encoder hidden away somewhere?
Cheers, Chris
ChrisGreaves
Posts: 602
Joined: Wed Jan 05, 2022 9:29 pm

Re: support for fuzzy logic search ?

Post by ChrisGreaves »

void wrote: Tue Apr 07, 2020 10:26 am It doesn't work so well for millions of filenames..
... but on just 19,190 MP3 music tracks, Soundex or similar might trim the results down to a size from which the user could save some time.

Now that I think of it "sonata" reduced to "S530", if I then turned around and fed "S530" in and asked Everything to return all names that had mapped to "S530", that might uncover a slew of errors in my file names. Like "somata", "sonada" and the like.
Please note that I am not the one keying in these names; they are often keyed in by point-amassing YouTubers around the world.
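
The round trip described above can be sketched in Python: encode each name, then group names that share a code. This follows the standard American Soundex rules as commonly documented; it is an illustration under that assumption and may differ in detail from the SQL-style soundex used elsewhere. The names `soundex` and `group_by_soundex` are made up for the example.

```python
from collections import defaultdict

# Letter -> digit table of American Soundex; vowels, h, w and y have no code.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code; only a-z letters are considered."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


def group_by_soundex(names):
    """Group names by Soundex code to surface likely misspellings."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


print(soundex("Greaves"))
print(group_by_soundex(["sonata", "somata", "sonada", "concerto"]))
```

Here "sonata", "somata" and "sonada" all land in the S530 group, while "concerto" gets a different code.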

It might also help with names with diacritics.
Cheers, Chris
P.S. Do these two responses count as two more votes for Fuzzy? :twisted: :twisted:
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void »

I will consider adding soundex:/metaphone: search functions.

Thank you for the suggestion.
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void »

Everything 1.5.0.1339a adds a soundex: search modifier.

When enabled, Everything will match files/properties by SQL soundex.

The whole name is matched.
Only A-Z, a-z letters are currently supported.

For example:
soundex:david
soundex:carpenter
soundex:artist:michael

I'll add metaphone support eventually.
metaphone will support ignoring diacritics. (I should support ignoring diacritics in soundex too, but this is non-standard)
ChrisGreaves
Posts: 602
Joined: Wed Jan 05, 2022 9:29 pm

Re: support for fuzzy logic search ?

Post by ChrisGreaves »

void wrote: Thu Mar 02, 2023 6:50 am When enabled, Everything will match files/properties by SQL soundex.
Thanks Void. I have d/l Everything-1.5.0.1339a.x64-Setup and will install it after lunch.
That is after my walk to the PO to collect the new laptop that still won't have arrived! :lol: If only Canada Post could deliver parcels as fast as you deliver new features ... :roll:
Cheers, Chris
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void »

Everything 1.5.0.1340a improves soundex:

trailing vowels are ignored. (davide will now match david)
added support for soundex format, for example: soundex:d13
added nodiacritics support.
added highlighting.