Number of hits in Content Search?

Discussion related to "Everything" 1.5 Alpha.
zeus04
Posts: 16
Joined: Wed Oct 17, 2018 3:45 pm

Number of hits in Content Search?

Post by zeus04 »

Is there a way to display (and sort by) the number of hits of a search term in a given file?
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

I have put on my TODO list to add a property to show the number of occurrences of a search term.

Thank you for the suggestion.
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

One convoluted way to do this now is:

Search for:

#define:<t=your-search-term>regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1) | regex:(#t:).*(\1) | regex:(#t:)
where your-search-term is your search term.

Then sort by the Regular Expression Matches 1-9 property.
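For anyone who would rather not type that search by hand, here is a minimal sketch that generates it. It is plain Python string building, not part of Everything; the function name build_count_query and the max_hits parameter are made up for illustration.

```python
def build_count_query(term: str, max_hits: int = 9) -> str:
    """Build the long OR-chain shown above for an arbitrary search term."""
    alternatives = []
    # Longest alternative first: 1 capture of the term plus (max_hits - 1) backreferences.
    for extra in range(max_hits - 1, -1, -1):
        alternatives.append("regex:(#t:)" + r".*(\1)" * extra)
    return f"#define:<t={term}>" + " | ".join(alternatives)


print(build_count_query("your-search-term"))
```

The longest alternative comes first, so a name with nine or more occurrences is matched by the first branch and fills all nine capture groups.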
NotNull
Posts: 5167
Joined: Wed May 24, 2017 9:22 pm

Re: Number of hits in Content Search?

Post by NotNull »

Can't test right now, but I think that will work fine for filenames and less well for file content, as it only finds multiple occurrences of the search term when they are on the same line. So if there are 5 lines, each containing "searchterm" once, it will report 1.

For a halfway solution (as said: can't test), you need NotNull ( ;) ): [^\x00], as that makes regex look past line boundaries.
With that:

c:\folder ext:txt   regex:utf8content:"(searchterm)([^\x00]*\1){9}" | regex:utf8content:"(searchterm)([^\x00]*\1){8}" | (etcetera)


I do hope that showing the number of occurrences does not become the default, as it looks slower to me: reading a small chunk of data, checking whether it contains the search term and, if so, continuing with the next file, versus reading the complete file, counting occurrences, and then going to the next file.
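The line-boundary issue can be reproduced outside Everything. The sketch below uses Python's re module, which is not PCRE but treats "." and [^\x00] the same way for this purpose; the helper names at_least and count_up_to are hypothetical.

```python
import re

# Five lines, each containing "searchterm" exactly once.
text = "\n".join(f"line {i}: searchterm" for i in range(1, 6))


def at_least(n: int, text: str, any_char: str) -> bool:
    """True if the term occurs at least n times, using the backreference trick."""
    pattern = r"(searchterm)(" + any_char + r"*\1){" + str(n - 1) + "}"
    return re.search(pattern, text) is not None


def count_up_to(text: str, any_char: str, limit: int = 9) -> int:
    """Try the longest pattern first, like the OR-chain of searches does."""
    for n in range(limit, 0, -1):
        if at_least(n, text, any_char):
            return n
    return 0


print(count_up_to(text, "."))         # 1: "." stops at each newline
print(count_up_to(text, r"[^\x00]"))  # 5: [^\x00] also matches newlines
```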
horst.epp
Posts: 1332
Joined: Fri Apr 04, 2014 3:24 pm

Re: Number of hits in Content Search?

Post by horst.epp »

That number-of-hits count for content should not be a default.
If one is really interested in such info, he can always start some tool or script from the results
that shows word counts and other information.
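A minimal sketch of such a script, assuming the result files are UTF-8 text and that a case-insensitive, non-overlapping count is what is wanted. The function names are hypothetical and nothing here is specific to Everything.

```python
from pathlib import Path


def count_hits_in_text(text: str, term: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of term."""
    return text.lower().count(term.lower())


def count_hits_in_file(path: str, term: str) -> int:
    """Read the whole file (assumed UTF-8) and count occurrences of term."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_hits_in_text(text, term)


def rank_files(paths: list[str], term: str) -> list[tuple[int, str]]:
    """Return (count, path) pairs, highest count first."""
    return sorted(((count_hits_in_file(p, term), p) for p in paths), reverse=True)
```

Calling rank_files with the paths of the selected results and the search term gives the counts in descending order.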
raccoon
Posts: 1015
Joined: Thu Oct 18, 2018 1:24 am

Re: Number of hits in Content Search?

Post by raccoon »

You can add (?s) to the beginning of your regex pattern to turn on PCRE_DOTALL. This makes "." match \r and \n characters as well.
fox.txt wrote:
The quick brown fox
jumps over
the lazy dog.

fails: regex:content:"fox.*dog"
works: regex:content:"(?s)fox.*dog"
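The same behaviour can be checked quickly in Python, whose re module also accepts the inline (?s) flag. This is only an illustration of the flag itself, not of Everything's content search.

```python
import re

text = "The quick brown fox\njumps over\nthe lazy dog."

# Without DOTALL, "." does not match "\n", so the pattern cannot span lines.
without_dotall = re.search(r"fox.*dog", text)

# With the inline (?s) flag, "." matches newlines as well.
with_dotall = re.search(r"(?s)fox.*dog", text)

print(without_dotall)        # None
print(with_dotall.group(0))  # the text from "fox" to "dog", across three lines
```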

@void: Can you add some /g counting from PCRE? Any chance of adding support for m//g or (?g) patterns?
NotNull
Posts: 5167
Joined: Wed May 24, 2017 9:22 pm

Re: Number of hits in Content Search?

Post by NotNull »

Heh, I always thought that was PCRE2 syntax... But it does indeed work in Everything (PCRE1). Good to know!
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

raccoon wrote: @void: Can you add some /g counting from PCRE? Any chance of adding support for m//g or (?g) patterns?
I have put this on my TODO list.
Thanks for the suggestion.
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

Everything 1.5.0.1296a adds support for regex flags.

Regex flags can be enabled with the following search modifiers:

dotall:

. matches newlines
Regex alternative: (?s)



global:

Find all matches (not just the first).
If no capture groups are defined, each whole match is captured.
Regular Expression Match 0 captures from the start of the first match to the end of the last match.



multiline:

^ and $ match at the start and end of each line (not just at the start and end of the whole text).
Regex alternative: (?m)



ungreedy:

Quantifiers are lazy by default; adding ? to a quantifier (for example .*?) swaps it back to greedy.



case:

match case.
Regex alternative: (?-i) (note that (?i) does the opposite: it makes the match case insensitive)
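For readers unfamiliar with these flags, here is a small Python illustration of the multiline and lazy/greedy behaviour. Python's re module is used only as a stand-in: it has no equivalent of the ungreedy: modifier, so the lazy case is written with an explicit ?.

```python
import re

text = "one\ntwo"

# Without multiline, ^ and $ anchor to the whole text, so no single word matches.
whole_text = re.findall(r"^\w+$", text)

# With (?m), ^ and $ anchor to each line.
per_line = re.findall(r"(?m)^\w+$", text)

tags = "<a><b>"
greedy = re.search(r"<.*>", tags).group(0)  # takes as much as possible
lazy = re.search(r"<.*?>", tags).group(0)   # takes as little as possible

print(whole_text, per_line)  # [] ['one', 'two']
print(greedy, lazy)          # <a><b> <a>
```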



Instead of:

#define:<t=your-search-term>regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1).*(\1) | regex:(#t:).*(\1).*(\1) | regex:(#t:).*(\1) | regex:(#t:)
you can now search for:

global:regex:term

The Regular Expression Matches 1-9 property will show all the matches.
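As a rough emulation of what global: is described as doing, the Python sketch below collects every match and builds a "match 0" that runs from the start of the first match to the end of the last. The function name global_matches is hypothetical, and this is not Everything's actual implementation.

```python
import re


def global_matches(pattern: str, text: str):
    """Return (match0, matches): the span first-to-last match, and each whole match."""
    found = list(re.finditer(pattern, text))
    if not found:
        return None, []
    match0 = text[found[0].start():found[-1].end()]
    return match0, [m.group(0) for m in found]


match0, matches = global_matches("term", "a term, another term, last term here")
print(len(matches))  # 3, the number of hits
print(match0)        # term, another term, last term
```

The length of the matches list is the hit count asked for at the top of the thread.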



Everything doesn't support the /regex/flags syntax.
Everything uses PCRE.
PCRE doesn't support a global flag.

I will consider adding support for my own (?g) flag.
For now, please use global:regex:
raccoon
Posts: 1015
Joined: Thu Oct 18, 2018 1:24 am

Re: Number of hits in Content Search?

Post by raccoon »

Thanks for adding global! :D I think the syntax you chose is more than adequate, and I've never actually seen (?g) implemented in the wild. Your method is more than fine. (in-pattern flags like (?s) can usually be turned on-and-off by using their (?-s) counterpart, and can be used around a sub-pattern. there's no way to turn /g off, or to wrap /g around only a sub-pattern, so maybe (?g) isn't even appropriate.)

Also, I wanted to ask a long time ago about the (?i) and (?-i) pattern flags. If I'm not mistaken, they conflict with Everything's Match Case option and only work if Match Case is enabled. Should Match Case interfere with regex patterns, in your opinion, or should they be insulated from that option?
void
Developer
Posts: 15096
Joined: Fri Oct 16, 2009 11:31 pm

Re: Number of hits in Content Search?

Post by void »

raccoon wrote: Also, I wanted to ask a long time ago about (?i) and (?-i) pattern flags.
They work as expected.
That is (?i) and (?-i) override case: and nocase: (or Match case from the Search menu)

regex is initialized (compiled) with the case:/nocase: search modifier. (or Match case from the Search menu)
(?i) / (?-i) will override the initial regex state.

nocase:regex:(?-i)ABC matches ABC (case sensitive)
case:regex:(?i)ABC matches abc or ABC or Abc etc... (case insensitive)
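The override order (initial state from the modifier, then the inline flag wins) can be mimicked in Python. One difference to be aware of: Python's re module does not accept a bare (?-i), only the scoped form (?-i:...), so that form is used in this sketch.

```python
import re

# Like nocase:regex:(?-i)ABC : compiled case-insensitive, inline flag turns it off.
sensitive_on_upper = re.search(r"(?-i:ABC)", "ABC", re.IGNORECASE)
sensitive_on_lower = re.search(r"(?-i:ABC)", "abc", re.IGNORECASE)

# Like case:regex:(?i)ABC : compiled case-sensitive, inline flag turns it on.
insensitive_on_lower = re.search(r"(?i)ABC", "abc")

print(sensitive_on_upper is not None)    # True
print(sensitive_on_lower is not None)    # False
print(insensitive_on_lower is not None)  # True
```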



Another thing to note:
The capture groups when using global: in Everything are not standard (not that there is a standard, as global doesn't exist in PCRE).
Everything doesn't really have a way to expose capture groups for each match.
I find the current implementation works well enough.

I also skip over the previous match, so II will only match III once.
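Python's re.findall scans the same non-overlapping way, so it can illustrate the difference. The lookahead variant is shown only for contrast; it is not something the post says Everything does.

```python
import re

# Non-overlapping scan: after matching "II" at position 0, the search resumes at position 2.
non_overlapping = re.findall(r"II", "III")

# A zero-width lookahead consumes nothing, so overlapping positions are all reported.
overlapping = re.findall(r"(?=II)", "III")

print(len(non_overlapping))  # 1
print(len(overlapping))      # 2
```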