Number of hits in Content Search?

Discussion related to "Everything" 1.5.
zeus04
Posts: 16
Joined: Wed Oct 17, 2018 3:45 pm

Number of hits in Content Search?

Post by zeus04 »

Is there a way to display (and sort by) the number of hits of a search term in a given file?
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

I have put on my TODO list to add a property to show the number of occurrences of a search term.

Thank you for the suggestion.
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

One convoluted way to do this now is:

Search for:


#define:<t=your-search-term>regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1) | regex:(#t:).*(\1) | regex:(#t:)
where your-search-term is your search term.

Then sort by the Regular Expression Matches 1-9 property.
NotNull
Posts: 5961
Joined: Wed May 24, 2017 9:22 pm

Re: Number of hits in Content Search?

Post by NotNull »

I can't test right now, but I think that will work fine for file names and less well for file content, because it only counts search terms that occur on the same line. So if there are 5 lines, each containing "searchterm" once, it will report 1.

For a halfway solution (as said: can't test), you need NotNull ( ;) ) : [^\x00], as that makes the regex look past line boundaries.
With that:


c:\folder ext:txt   regex:utf8content:"(searchterm)([^\x00]*\1){9}" | regex:utf8content:"(searchterm)([^\x00]*\1){8}" | (etcetera)
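
The line-boundary point can be checked outside Everything. A minimal sketch, assuming Python's re module is a fair stand-in for PCRE here (the two engines are not identical, but they treat ".", "[^\x00]" and backreferences the same way for this case). ladder_count is a hypothetical helper that mimics the "try 9 repeats, then 8, ..." alternation; it is not part of Everything.

```python
import re

def ladder_count(text, term, cross_lines, max_hits=9):
    """Mimic the 'N repeats | N-1 repeats | ...' search ladder.

    Returns the largest n <= max_hits for which the term occurs n times
    with only the allowed filler between occurrences, or 0 if no match.
    """
    # "." stops at newlines; "[^\x00]" matches any character except NUL.
    filler = r"[^\x00]*" if cross_lines else r".*"
    for n in range(max_hits, 0, -1):
        pattern = "(" + re.escape(term) + ")(?:" + filler + r"\1){" + str(n - 1) + "}"
        if re.search(pattern, text):
            return n
    return 0

# Five lines, each containing the term exactly once.
five_lines = "\n".join(["one searchterm here"] * 5)
print(ladder_count(five_lines, "searchterm", cross_lines=False))  # 1
print(ladder_count(five_lines, "searchterm", cross_lines=True))   # 5
```

Like the search above, the ladder is capped: a file with more than max_hits occurrences still reports max_hits.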


I do hope that showing the number of occurrences does not become the default, as it looks slower to me: reading a small chunk of data, checking whether it contains the search term and, if so, continuing with the next file, versus reading the complete file, counting occurrences, and then going to the next file.
horst.epp
Posts: 1642
Joined: Fri Apr 04, 2014 3:24 pm

Re: Number of hits in Content Search?

Post by horst.epp »

The number-of-hits count for content should not be a default.
Anyone who is really interested in such information can always start a tool or script from the results
that shows word counts and other details.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Number of hits in Content Search?

Post by raccoon »

You can add (?s) to the beginning of your regex pattern to turn on PCRE_DOTALL [1][2][3]. This makes "." match \r and \n characters as well.
fox.txt wrote: The quick brown fox
jumps over
the lazy dog.
fails: regex:content:"fox.*dog"
works: regex:content:"(?s)fox.*dog"
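
The same fails/works pair can be reproduced with Python's re module, used here only as an assumed stand-in for PCRE. One difference to be aware of: without (?s), Python's "." excludes only \n, so the sketch uses \n line endings.

```python
import re

# Contents of the fox.txt example from the quote above, with \n line endings.
fox_txt = "The quick brown fox\njumps over\nthe lazy dog."

without_dotall = re.search(r"fox.*dog", fox_txt)
with_dotall = re.search(r"(?s)fox.*dog", fox_txt)

print(without_dotall is None)       # True: "." stops at the first newline
print(repr(with_dotall.group(0)))   # the match spans all three lines
```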

@void: Can you add some /g counting from PCRE? Any chance of adding support for m//g or (?g) patterns?
NotNull
Posts: 5961
Joined: Wed May 24, 2017 9:22 pm

Re: Number of hits in Content Search?

Post by NotNull »

Heh, I always thought that was PCRE2 syntax .. But it does indeed work in Everything (PCRE1). Good to know!
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

raccoon wrote: @void: Can you add some /g counting from PCRE? Any chance of adding support for m//g or (?g) patterns?
I have put this on my TODO list.
Thanks for the suggestion.
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

Everything 1.5.0.1296a adds support for regex flags.

Regex flags can be enabled with the following search modifiers:

dotall:

. matches newlines
Regex alternative: (?s)



global:

Find all matches (not just the first).
If no capture groups are defined, each whole match is captured.
Regular Expression Match 0 captures from the start of the first match to the end of the last match.



multiline:

^ and $ match a whole line (not the whole text)
Regex alternative: (?m)



ungreedy:

Quantifiers are lazy by default; a trailing ? (as in .*?) swaps a quantifier back to greedy.



case:

Match case.
Regex alternative: (?-i)



Instead of:


#define:<t=your-search-term>regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1) | regex:(#t:).*(\1) | regex:(#t:)
you can now search for:

global:regex:term

The Regular Expression Matches 1-9 property will show all the matches.
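
For readers who want to see what "find all matches" and the Match 0 span amount to, here is a rough sketch in Python. It only imitates the description of global: given in this post; global_matches is a hypothetical helper, and Python's re module stands in for PCRE.

```python
import re

def global_matches(pattern, text):
    """Return (all whole matches, text from first match start to last match end)."""
    found = list(re.finditer(pattern, text))
    if not found:
        return [], None
    span = text[found[0].start():found[-1].end()]
    return [m.group(0) for m in found], span

matches, match0 = global_matches(r"term", "term one, term two, term three")
print(len(matches))  # 3
print(match0)        # term one, term two, term
```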



Everything doesn't support the /regex/flags syntax.
Everything uses PCRE.
PCRE doesn't support a global flag.

I will consider adding support for my own (?g) flag.
For now, please use global:regex:
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Number of hits in Content Search?

Post by raccoon »

Thanks for adding global! :D I think the syntax you chose is more than adequate, and I've never actually seen (?g) implemented in the wild. Your method is more than fine. (in-pattern flags like (?s) can usually be turned on-and-off by using their (?-s) counterpart, and can be used around a sub-pattern. there's no way to turn /g off, or to wrap /g around only a sub-pattern, so maybe (?g) isn't even appropriate.)

Also, I wanted to ask a long time ago about the (?i) and (?-i) pattern flags. If I'm not mistaken, they conflict with Everything's Match Case setting when it is off and only work if Match Case is enabled. In your opinion, should Match Case interfere with regex patterns, or should they be insulated from that option?
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

raccoon wrote: Also, I wanted to ask a long time ago about (?i) and (?-i) pattern flags.
They work as expected.
That is (?i) and (?-i) override case: and nocase: (or Match case from the Search menu)

regex is initialized (compiled) with the case:/nocase: search modifier. (or Match case from the Search menu)
(?i) / (?-i) will override the initial regex state.

nocase:regex:(?-i)ABC matches ABC (case sensitive)
case:regex:(?i)ABC matches abc or ABC or Abc etc... (case insensitive)
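
The "compile with a default, then let the inline flag override it" behaviour can be sketched in Python. This is an analogy, not Everything's implementation: Python's re module does not accept a bare (?-i), so the sketch uses the scoped form (?-i:...), and re.IGNORECASE plays the role of nocase:.

```python
import re

# Compiled case-insensitively (like nocase:), with a scoped inline flag
# turning case sensitivity back on for ABC.
sensitive = re.compile(r"(?-i:ABC)", re.IGNORECASE)
# Compiled case-sensitively (like case:), with an inline flag turning it off.
insensitive = re.compile(r"(?i)ABC")

samples = ("ABC", "abc", "Abc")
sensitive_hits = [bool(sensitive.search(s)) for s in samples]
insensitive_hits = [bool(insensitive.search(s)) for s in samples]

print(sensitive_hits)    # [True, False, False]
print(insensitive_hits)  # [True, True, True]
```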



Another thing to note:
The capture groups when using global: in Everything are not standard (not that there is a standard, as global doesn't exist in PCRE).
Everything doesn't really have a way to expose capture groups for each match.
I find the current implementation works well enough.

I also skip over the previous match, so II will only match III once.
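
Python's re.findall resumes after the end of the previous match in the same way, so it gives a quick feel for this skipping rule. The lookahead variant shows how an overlapping count would differ in Python; whether it behaves the same with global: in Everything is untested here.

```python
import re

# Matching resumes after the end of the previous match, so the second
# candidate "II" (starting at index 1) is never tried.
non_overlapping = len(re.findall(r"II", "III"))
# A zero-width lookahead matches at every start position instead.
overlapping = len(re.findall(r"(?=II)", "III"))

print(non_overlapping)  # 1
print(overlapping)      # 2
```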
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

Everything 1.5 adds support for the following:

Show the number of occurrences of foo in the name:
addcolumn:a a:=STRINGCOUNT($name:,"foo") sort:a-descending


Show the number of occurrences of foo in the content:
addcolumn:a a:=STRINGCOUNT($content:,"foo") sort:a-descending
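
As a rough picture of what the name search above does, here is a small Python sketch that counts a term per name and sorts descending. The file names are made up, string_count is a hypothetical helper, and two things are assumptions because the thread does not state them: that STRINGCOUNT counts non-overlapping occurrences and that it is case-sensitive.

```python
# Hypothetical file names standing in for search results.
names = ["foo.txt", "foo_foo_backup.txt", "bar.txt", "foofoofoo.log"]

def string_count(text, term):
    """Count non-overlapping, case-sensitive occurrences of term in text."""
    return text.count(term)

# Equivalent in spirit to sort:a-descending on the added column.
ranked = sorted(names, key=lambda n: string_count(n, "foo"), reverse=True)
for name in ranked:
    print(string_count(name, "foo"), name)
```

Swapping the names for file contents gives the $content: variant.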
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: Number of hits in Content Search?

Post by ChrisGreaves »

void wrote: Sun Jul 28, 2024 4:33 am
addcolumn:a a:=STRINGCOUNT($name:,"foo") sort:a-descending

addcolumn:a a:=STRINGCOUNT($content:,"foo") sort:a-descending
Spring might be here any month now, so my mind turns to sowing beetroot seeds. I know I made a note not so long ago, and an Advanced Search turned up this post.
My only part in all of this was (a) doing the advanced search (b) skipping past all the RegEx stuff and (c) trying a second time after replacing your "foo" with my "beetroot" :blush:

THANKS VOID :D :clapping: :bowing:.
Now back out to stare at the garden.
Cheers, Chris