Maybe in the next release?
Thanks
support for fuzzy logic search ?
Re: support for fuzzy logic search ?
I have experimented with fuzzy searching for Everything 1.5.
I've found fuzzy searching for filenames to be not very useful.
I've tried Levenshtein distance/soundex/metaphone.
What might be useful is dictionary suggestions.
For example, if you search for 'curiousity', Everything would suggest: did you mean 'curiosity'?
The problem here is that the Windows dictionary API is not very useful, and my own dictionary would use more disk space than the Everything executable (i.e. bloat).
I've added options to ignore spaces and punctuation for the next major release. This seems the most useful, e.g.:
spiderman
spider-man
spider.man
spider man
are all equal (when ignore spaces and ignore punctuation are enabled from the Search menu).
I am looking into user-defined synonym lists and will look into supplying some basic localized synonym lists.
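To illustrate the idea of ignoring spaces and punctuation, here is a minimal Python sketch. It is not how Everything implements the option; the function name `normalize` and the choice of Unicode categories are assumptions made for illustration only.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase a name and drop spaces and punctuation (hypothetical helper)."""
    kept = []
    for ch in name.lower():
        category = unicodedata.category(ch)
        # Categories starting with "P" are punctuation, "Z" are separators (spaces).
        if category[0] in ("P", "Z"):
            continue
        kept.append(ch)
    return "".join(kept)


variants = ["spiderman", "spider-man", "spider.man", "spider man"]
# All four variants collapse to the same comparison key.
keys = {normalize(v) for v in variants}
print(keys)
```

Comparing the normalized query against normalized filenames makes the four spellings interchangeable, at the cost of also dropping the dot before a file extension.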
eg:
and
&
an
'n
would be a user-defined list of words to treat as equal.
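A synonym list like the one above could be applied by mapping every word in a group to one canonical word before comparing. The Python sketch below is only an assumption about how this might work, not Everything's design; `SYNONYM_GROUPS`, `build_lookup`, `canonicalize` and `matches` are hypothetical names.

```python
# Example group taken from the post; the first word acts as the canonical form.
SYNONYM_GROUPS = [["and", "&", "an", "'n"]]


def build_lookup(groups):
    """Map every word in a group to the group's first word."""
    lookup = {}
    for group in groups:
        canonical = group[0]
        for word in group:
            lookup[word.lower()] = canonical
    return lookup


def canonicalize(text, lookup):
    """Lowercase, split on whitespace, and replace synonyms by their canonical word."""
    return " ".join(lookup.get(token, token) for token in text.lower().split())


LOOKUP = build_lookup(SYNONYM_GROUPS)


def matches(query, filename):
    """Substring match after both sides have been canonicalized."""
    return canonicalize(query, LOOKUP) in canonicalize(filename, LOOKUP)


print(matches("rock and roll", "Rock & Roll.mp3"))
```

Note that this treats every standalone "an" as "and", which is exactly what the user-defined list asks for but may not always be wanted.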
Re: support for fuzzy logic search ?
What issue did you have with Levenshtein distance? I've played around with it and it works, it's just very slow (at least in my implementation) unless you do fancy optimizations. Everyone has more than one core now, so parallelizing the search could speed it up.
You could try Damerau–Levenshtein distance. With the way Everything stores strings it may be fast.
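For reference, a minimal Python sketch of the restricted Damerau–Levenshtein distance (often called optimal string alignment) looks like this. It is a plain textbook version with no optimizations and says nothing about how Everything stores strings; `osa_distance` is a name chosen for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance allowing insert, delete, substitute and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ic" <-> "ci"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("tonic", "tonci"))  # a swapped pair counts as a single edit
```

Compared with plain Levenshtein distance, the only difference is the extra transposition case, so a swapped pair of letters costs one edit instead of two.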
Re: support for fuzzy logic search ?
Performance was fine.
It doesn't work so well for millions of filenames: too many unwanted results, even when tuned to 1 edit per 9+ characters.
For example, I might search for "tonic" and get 100,000 "sonic" results with 100 "tonic" results.
Everything would need a ranking system to make Levenshtein distance useful, e.g. show "tonic" results first.
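The ranking idea can be sketched in a few lines of Python: compute the edit distance for each candidate, drop anything above a threshold, and sort so exact matches come first. This is only an illustration and not Everything's behaviour; `ranked_matches` and the `max_edits` parameter are hypothetical, and the comparison is against whole names rather than substrings.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ranked_matches(query, names, max_edits=1):
    """Keep names within max_edits of the query, closest matches first."""
    scored = []
    for name in names:
        dist = levenshtein(query.lower(), name.lower())
        if dist <= max_edits:
            scored.append((dist, name))
    # Sorting on distance only is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0])
    return [name for _, name in scored]


names = ["sonic", "tonic", "sonic", "tunic", "music", "tonic"]
print(ranked_matches("tonic", names))
```

With ranking in place the "sonic" results are still returned, but they no longer bury the exact "tonic" hits.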
Re: support for fuzzy logic search ?
I've been using fzf for fuzzy searching, which uses the Smith-Waterman algorithm.
Matched results were quite good and performance was fast (<1s) on my file list (~5.5M files).
Maybe it's worth checking out.
The following code was used to test (PowerShell):
$export = "FileList.txt"
"Exporting"
# Export all results, sorted by path, to a text file with the es command-line tool
es -sort-path-ascending -export-txt $export
"Loading results"
# Resolve the full path, since .NET's current directory can differ from PowerShell's
$file = Get-Item $export
# Read the whole list and pipe it to fzf for interactive fuzzy filtering
[IO.File]::ReadAllText($file.FullName, [Text.Encoding]::Default) | fzf
Note: fzf needs to be on %PATH%
Note 2: This implementation does not handle typos.
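For anyone curious what Smith-Waterman style scoring looks like, here is a minimal Python sketch of local alignment scoring. It is not fzf's actual scoring function (fzf adds its own bonuses and penalties); the function name and the `match`, `mismatch` and `gap` values are assumptions chosen for illustration.

```python
def smith_waterman_score(query, text, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of query against text (case-insensitive)."""
    q, t = query.lower(), text.lower()
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            # Scores never drop below 0, which is what makes the alignment local.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


files = ["readme.txt", "spider-man.txt", "notes.md"]
# Highest score first: names sharing the longest, least-gapped run with the query win.
ranked = sorted(files, key=lambda f: smith_waterman_score("spdr", f), reverse=True)
print(ranked[0])
```

Because the score is local, the query only has to line up well with some part of the path, which is why this style of matching suits abbreviated queries more than misspelled ones.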