support for fuzzy logic search ?

Posted: Sun Apr 05, 2020 11:03 am
by Andreas Sachse
Maybe in the next release?


Re: support for fuzzy logic search ?

Posted: Mon Apr 06, 2020 4:28 am
by void
I have experimented with fuzzy searching for Everything 1.5.
I've found fuzzy searching for filenames to be not very useful.
I've tried Levenshtein distance/soundex/metaphone.

What might be useful is dictionary suggestions.
For example, if you search for 'curiousity', Everything would suggest: did you mean 'curiosity'?
The problem here is that the Windows dictionary API is not very useful, and my own dictionary would use more disk space than the Everything executable (i.e. bloat).

I've added options to ignore spaces and punctuation for the next major release. This seems the most useful, e.g.:
spider man
are all equal (when ignore spaces and ignore punctuation is enabled from the Search menu).
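
To make the idea of ignoring spaces and punctuation concrete, here is a minimal sketch in Python. It is not Everything's implementation: the helper names `normalize` and `matches` are hypothetical, and it assumes ASCII punctuation only and case-insensitive substring matching.

```python
import string

# Hypothetical helper table: deletes ASCII punctuation and whitespace.
# This is an illustration only, not how Everything does it internally.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize(name: str) -> str:
    """Lower-case the text and drop ASCII punctuation and whitespace."""
    return name.lower().translate(_STRIP)


def matches(query: str, filename: str) -> bool:
    """True if the normalized query is a substring of the normalized filename."""
    return normalize(query) in normalize(filename)
```

With this sketch, `matches("spider man", "Spider-Man (2002).mkv")` is true because both sides reduce to text containing `spiderman`. Normalizing filenames once at index time rather than per query would be the obvious optimization, at the cost of extra memory.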

I am looking into user-defined synonym lists and will look into supplying some basic localized synonym lists.
A synonym list would be a user-defined list of words to treat as equal.
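
One way a list of words to treat as equal could work is sketched below in Python. Everything here is an assumption for illustration: the groups in `SYNONYM_GROUPS` are made up, the function names are hypothetical, and words are split on whitespace only.

```python
# Hypothetical synonym groups; a real list would be supplied by the user.
SYNONYM_GROUPS = [
    {"jpg", "jpeg"},
    {"movie", "film"},
]


def build_lookup(groups):
    """Map every word in a group to one canonical member of that group."""
    lookup = {}
    for group in groups:
        canonical = min(group).lower()  # deterministic choice of representative
        for word in group:
            lookup[word.lower()] = canonical
    return lookup


LOOKUP = build_lookup(SYNONYM_GROUPS)


def canonical_words(text):
    """Split on whitespace and replace each word with its canonical synonym."""
    return [LOOKUP.get(word, word) for word in text.lower().split()]


def matches(query, filename):
    """True if every query word, after synonym mapping, occurs in the filename."""
    words = set(canonical_words(filename))
    return all(word in words for word in canonical_words(query))
```

Here `matches("holiday film", "Holiday Movie")` is true because `film` and `movie` map to the same canonical word. Mapping both sides to a canonical form keeps the comparison a plain equality test, so the cost per word is a single dictionary lookup.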

Re: support for fuzzy logic search ?

Posted: Mon Apr 06, 2020 10:12 pm
by Link
What issue did you have with Levenshtein distance? I've played around with that one and it works. It is just very slow (at least my implementation) unless you do fancy optimizations. Everyone has more than one core now, so parallelizing the search could speed it up.
You could try Damerau–Levenshtein distance. With the way Everything stores strings it may be fast.
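
For readers unfamiliar with it, here is a minimal Python sketch of the restricted form of Damerau–Levenshtein distance (often called optimal string alignment). It extends plain Levenshtein distance by counting a swap of two adjacent characters as one edit. The function name `osa_distance` is hypothetical, and this is the textbook dynamic-programming version, not something tuned for how Everything stores strings.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ci" <-> "ic", counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

Under this sketch "tonci" is one edit away from "tonic", where plain Levenshtein distance would count two.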

Re: support for fuzzy logic search ?

Posted: Tue Apr 07, 2020 10:26 am
by void
Performance was fine.

It doesn't work so well for millions of filenames. Too many unwanted results, even when tuned to 1 edit per 9+ characters.

For example, I might search for "tonic" and get 100,000 "sonic" results with 100 "tonic" results.
Everything would need a ranking system to make Levenshtein distance useful, eg: show "tonic" results first.
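
The ranking idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `levenshtein` and `rank` are hypothetical names, `max_edits` is a made-up parameter, and whole names are compared instead of substrings to keep the example short.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def rank(query: str, names, max_edits: int = 1):
    """Keep names within max_edits of the query, closest matches first."""
    scored = []
    for name in names:
        dist = levenshtein(query.lower(), name.lower())
        if dist <= max_edits:
            scored.append((dist, name))
    # The sort is stable, so names with equal distance keep their input order.
    scored.sort(key=lambda pair: pair[0])
    return [name for _, name in scored]
```

With this sketch, `rank("tonic", ["sonic", "tonic", "sonic boom"])` puts "tonic" ahead of "sonic" and drops "sonic boom". Sorting by distance does not reduce the number of near-miss results, it only pushes them below the exact matches.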

Re: support for fuzzy logic search ?

Posted: Mon Apr 05, 2021 7:43 pm
by aviasd
I've been using fzf for fuzzy searching, which uses the Smith-Waterman algorithm.

Matched results were quite good and performance was fast (<1s) on my file list (~5.5M files).
Maybe it's worth checking out.
The following code was used to test (PowerShell):

Code:

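# Assumed export location; the original post did not show how $export was set.
$export = Join-Path $env:TEMP 'everything-export.txt'
# Export every indexed path, sorted by path, to the text file.
es -sort-path-ascending -export-txt $export
"Loading results"
$file = gi $export
# Read the whole list and pipe it to fzf for interactive fuzzy filtering.
[IO.File]::ReadAllText($file, [Text.Encoding]::Default) | fzf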
Note: fzf needs to be on %PATH%.
Note 2: This implementation does not handle typos.