support for fuzzy logic search ?

Andreas Sachse
Posts: 5
Joined: Sat Feb 29, 2020 2:24 pm
Location: Dresden

support for fuzzy logic search ?

Post by Andreas Sachse » Sun Apr 05, 2020 11:03 am

Could this be added, maybe in the next release?

thx

void
Site Admin
Posts: 6452
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void » Mon Apr 06, 2020 4:28 am

I have experimented with fuzzy searching for Everything 1.5.
I've found fuzzy searching for filenames to be not very useful.
I've tried Levenshtein distance/soundex/metaphone.

What might be useful is dictionary suggestions.
For example, if you search for 'curiousity', Everything would suggest: did you mean 'curiosity'?
The problem here is that the Windows dictionary API is not very useful, and my own dictionary would use more disk space than the Everything executable (i.e. bloat).
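To illustrate the "did you mean" idea, here is a minimal Python sketch using the standard library's difflib. The tiny word list is a made-up stand-in for a real dictionary, and this is not how Everything does (or would do) it.

```python
import difflib

# Hypothetical stand-in for a real dictionary; a usable one would be far larger.
DICTIONARY = ["curiosity", "curious", "courtesy", "security"]

def did_you_mean(word, dictionary=DICTIONARY):
    """Return the closest dictionary word, or None if nothing is similar enough."""
    if word in dictionary:
        return None  # spelled correctly, nothing to suggest
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(did_you_mean("curiousity"))
```

The size problem mentioned above still applies: the suggestions are only as good as the word list that ships with the program.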

I've added options to ignore spaces and punctuation for the next major release. This seems the most useful, e.g.:
spiderman
spider-man
spider.man
spider man
are all treated as equal (when ignore spaces and ignore punctuation are enabled from the Search menu).
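A rough Python sketch of the idea, assuming matching is done by comparing normalized keys. The `normalize` helper and the case folding are my own assumptions for illustration, not Everything's actual implementation.

```python
import string

# Characters to drop before comparing: ASCII punctuation and whitespace.
_DROP = str.maketrans("", "", string.punctuation + string.whitespace)

def normalize(name):
    """Hypothetical helper: strip spaces/punctuation and fold case."""
    return name.translate(_DROP).casefold()

variants = ["spiderman", "spider-man", "spider.man", "spider man"]
keys = {normalize(v) for v in variants}
print(keys)  # all four variants collapse to one key
```

Because each name is reduced to a key once, the comparison stays an exact match and does not pull in unrelated near-misses the way an edit-distance search can.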

I am looking into user-defined synonym lists and will look into supplying some basic localized synonym lists.
e.g.:
and
&
an
'n
would be a user-defined list of words to treat as equal.
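One way such a list could work, sketched in Python: map every word in a group to the group's first entry before comparing. The names `SYNONYM_GROUPS` and `canonical_tokens` are hypothetical, and splitting on whitespace is a simplification.

```python
# Hypothetical user-defined synonym list: each group is treated as one word.
SYNONYM_GROUPS = [["and", "&", "an", "'n"]]

# Map every member of a group to the group's first entry.
CANONICAL = {word: group[0] for group in SYNONYM_GROUPS for word in group}

def canonical_tokens(name):
    """Split on whitespace and replace each token by its canonical synonym."""
    return [CANONICAL.get(tok, tok) for tok in name.casefold().split()]

def same_name(a, b):
    return canonical_tokens(a) == canonical_tokens(b)

print(same_name("Rock and Roll", "rock 'n roll"))
print(same_name("Rock & Roll", "rock an roll"))
```

Note that a broad entry like "an" also makes "an apple" equal to "and apple", so the lists would need to be chosen with care.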

Link
Posts: 18
Joined: Thu Nov 03, 2011 10:08 pm

Re: support for fuzzy logic search ?

Post by Link » Mon Apr 06, 2020 10:12 pm

What issue did you have with Levenshtein distance? I've played around with that one and it works. It's just very slow (at least my implementation) unless you do fancy optimizations. Everyone has more than one core now, so parallelizing the search could speed it up.
You could try Damerau–Levenshtein distance. With the way Everything stores strings it may be fast.
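For anyone following along, here is a small Python sketch of the restricted Damerau-Levenshtein distance (the "optimal string alignment" variant). It is plain Levenshtein plus one extra case that counts swapping two adjacent characters as a single edit. This is an unoptimized textbook version, not tuned for millions of names.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ni" <-> "in", counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("tonic", "sonic"))  # one substitution
print(osa_distance("tonic", "toinc"))  # one transposition
```

With plain Levenshtein the second pair would cost 2, so the transposition case mostly helps with swapped-letter typos.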

void
Site Admin
Posts: 6452
Joined: Fri Oct 16, 2009 11:31 pm

Re: support for fuzzy logic search ?

Post by void » Tue Apr 07, 2020 10:26 am

Performance was fine.

It doesn't work so well for millions of filenames: too many unwanted results, even when tuned to 1 edit per 9+ characters.

For example, I might search for "tonic" and get 100,000 "sonic" results with 100 "tonic" results.
Everything would need a ranking system to make Levenshtein distance useful, e.g. showing "tonic" results first.
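A toy Python sketch of that kind of ranking, assuming results are simply sorted by edit distance so exact matches come first. The `max_edits` parameter and the sample names are made up for illustration, and real filenames would need matching against parts of the name rather than the whole string.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def ranked_matches(query, names, max_edits=1):
    """Keep names within max_edits of the query, closest first, then by name."""
    scored = sorted((levenshtein(query, n), n) for n in names)
    return [n for dist, n in scored if dist <= max_edits]

names = ["sonic", "tonic", "sonic", "ionic", "topic", "music"]
print(ranked_matches("tonic", names))
```

The unwanted "sonic" results are still returned, but they no longer bury the exact "tonic" hits.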

aviasd
Posts: 64
Joined: Sat Oct 07, 2017 2:18 am

Re: support for fuzzy logic search ?

Post by aviasd » Mon Apr 05, 2021 7:43 pm

I've been using fzf for fuzzy searching, which uses a modified Smith-Waterman algorithm.

Matched results were quite good and performance was fast (<1s) on my file list (~5.5M files).
Maybe it's worth checking out.
The following code was used to test (PowerShell):

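Code:

# Export the Everything index to a text file, sorted by path.
$export = "FileList.txt"
"Exporting"
es -sort-path-ascending -export-txt $export
"Loading results"
$file = gi $export
# Pipe the whole file list into fzf for interactive fuzzy filtering.
[IO.File]::ReadAllText($file.FullName, [Text.Encoding]::Default) | fzf

Note: fzf needs to be on %PATH%.
Note 2: This implementation does not handle typos.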
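To show why typos are a problem for this style of matching, here is a simplified Python sketch of the subsequence test that fuzzy finders of this kind are built around: the query characters must all appear in the candidate, in order. It leaves out the scoring entirely, and folding case on both sides is my simplification.

```python
def is_subsequence(query, candidate):
    """True if every query character occurs in candidate, in the same order."""
    remaining = iter(candidate.casefold())
    # 'ch in remaining' consumes the iterator up to and including the match,
    # so later characters can only match further to the right.
    return all(ch in remaining for ch in query.casefold())

print(is_subsequence("tnc", "tonic.mp3"))             # abbreviated query
print(is_subsequence("tonik", "tonic.mp3"))           # wrong letter
print(is_subsequence("curiousity", "curiosity.pdf"))  # extra letter
```

Skipped letters are fine, but a wrong or extra letter in the query fails outright, which is where an edit-distance approach behaves differently.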
