Regex file content match limit

Temm
Posts: 3
Joined: Fri Jan 29, 2021 7:55 pm

Regex file content match limit

Post by Temm »

I am on Everything 1.5.0.1383a (x64).
I am searching in a folder with a search query like `regex:asciicontent:"prefix(.|[\r\n]){1,10}suffix"`
This should match "prefix", followed by 1 to 10 characters of any kind (including newlines), followed by "suffix".
When the quantifier maximum is large enough, the query stops working and returns no results, even though it seems to take about as long to run.
For example, for the query

regex:asciicontent:"onMessage(.|[\r\n]){1,N}putStackInSlot"
the limit was 128:

regex:asciicontent:"onMessage(.|[\r\n]){1,128}putStackInSlot"
returns results,

regex:asciicontent:"onMessage(.|[\r\n]){1,129}putStackInSlot"
does not. The files in the folder have not changed, and the query with the higher number should have at least as many matches as the one with the lower number.
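The expectation that a higher bound can only add matches can be checked with a minimal sketch in Python's `re` module. Note that `re` is a different backtracking engine from the PCRE build inside Everything, so this shows the intended semantics, not Everything's behaviour. The file names and contents are hypothetical; the gaps between the two words are 5, 128 and 129 characters.

```python
import re

def matching_files(files, n):
    r"""Names of files whose text matches onMessage(.|[\r\n]){1,n}putStackInSlot."""
    # In Python, '.' matches everything except '\n', so '(.|[\r\n])' matches any character.
    pattern = re.compile(r"onMessage(.|[\r\n]){1," + str(n) + r"}putStackInSlot")
    return {name for name, text in files.items() if pattern.search(text)}

# Hypothetical files: 5, 128 and 129 characters between the two words.
files = {
    "short.java": "onMessage" + "x" * 5 + "putStackInSlot",
    "at128.java": "onMessage" + "\r\n" * 64 + "putStackInSlot",
    "at129.java": "onMessage" + "y\n" * 64 + "z" + "putStackInSlot",
}

small = matching_files(files, 128)
large = matching_files(files, 129)
print(sorted(small))  # ['at128.java', 'short.java']
print(sorted(large))  # ['at128.java', 'at129.java', 'short.java']
assert small <= large  # raising the upper bound can only add matches
```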

Am I missing a setting, or is there an undocumented regex processing limit?
Thanks
void
Developer
Posts: 19839
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex file content match limit

Post by void »

Everything sets the PCRE recursion limit to 256 to avoid stack overflows.

(.|[\r\n]){1,129}
is hitting this recursion limit.
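A toy model of why a recursion limit can turn a larger bound into "no results": the hypothetical matcher below backtracks greedily by recursing once per group iteration, charges an assumed 2 frames per iteration (chosen only so the cutoff lands at 128, as observed above; PCRE's real per-iteration cost is not modelled), and returns None when it exceeds the limit of 256. It is a sketch, not PCRE's code.

```python
class RecursionLimitHit(Exception):
    """Raised when the toy matcher exceeds its depth budget."""

LIMIT = 256               # the limit mentioned above for Everything's PCRE setup
FRAMES_PER_ITERATION = 2  # assumption for this toy model only

def match_gap(text, start, lo, hi, suffix, count=0, depth=0):
    """Greedy backtracking over a group that matches any character: consume one
    more character per recursive call, then try the suffix on the way back out."""
    if depth > LIMIT:
        raise RecursionLimitHit()
    if count < hi and start < len(text):
        if match_gap(text, start + 1, lo, hi, suffix, count + 1,
                     depth + FRAMES_PER_ITERATION):
            return True
    return count >= lo and text.startswith(suffix, start)

def search(text, prefix, lo, hi, suffix):
    """Return True or False, or None if the recursion limit was hit."""
    pos = text.find(prefix)
    try:
        while pos != -1:
            if match_gap(text, pos + len(prefix), lo, hi, suffix):
                return True
            pos = text.find(prefix, pos + 1)
        return False
    except RecursionLimitHit:
        return None

text = "onMessage" + "x" * 5 + "putStackInSlot" + "\n" * 200
print(search(text, "onMessage", 1, 128, "putStackInSlot"))  # True
print(search(text, "onMessage", 1, 129, "putStackInSlot"))  # None (limit hit)
```

If a caller treats None the same as False, the larger bound looks as if it has no matches, even though the smaller bound proves one exists.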

Please try the following searches instead:

dotall:regex:asciicontent:onMessage.{1,129}putStackInSlot


regex:asciicontent:(?s)onMessage.{1,129}putStackInSlot


This avoids the recursion.

(?s) is the same as the dotall: prefix.
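Python's `re` module accepts the same inline `(?s)` flag as PCRE, so it can illustrate what the dot-all variant changes. Without the flag, `.` does not match `\n`; with it (inline or via `re.DOTALL`), `.{1,129}` can span lines and covers the same text as `(.|[\r\n]){1,129}`. The sample text is made up, and Everything's own `dotall:` modifier is not exercised here.

```python
import re

# Hypothetical file content where the two words sit on different lines.
text = "onMessage(ctx) {\r\n    slot.putStackInSlot(0, stack);"

plain = re.search(r"onMessage.{1,129}putStackInSlot", text)            # '.' stops at '\n'
inline = re.search(r"(?s)onMessage.{1,129}putStackInSlot", text)       # inline dot-all flag
flag = re.search(r"onMessage.{1,129}putStackInSlot", text, re.DOTALL)  # same, as an argument
old = re.search(r"onMessage(.|[\r\n]){1,129}putStackInSlot", text)     # original workaround

print(plain is None)                                  # True
print(inline.group() == flag.group() == old.group())  # True
```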

Temm
Posts: 3
Joined: Fri Jan 29, 2021 7:55 pm

Re: Regex file content match limit

Post by Temm »

Nice, I can confirm this variant works, even with quantifier limits far above 256 (!?).

I had suspected the alternation (the OR) might be causing it, so I also tried ([\s\S]) (whitespace or non-whitespace) instead of (.|[\r\n]), but that didn't work either. I have no idea what the regex engine is doing internally here.

It seems to make a difference whether or not there is a capture group around the quantified part.
The expression
dotall:regex:asciicontent:"onMessage.{1,700}a"
finds 655 files.
The expression
dotall:regex:asciicontent:"onMessage(.){1,700}a"
finds only 31 files.
They should be equivalent under regex semantics. Is the capture group causing the regex engine to compile the pattern differently so that it needs recursion? Or is there some other engine limit (such as a limit on the number of captures)? Using a non-capturing group (?:.) doesn't seem to change anything.
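Under plain regex semantics the three spellings accept the same inputs; the group only changes what is recorded, not whether the text matches. A minimal check in Python's `re` (used here as a stand-in engine that does not hit a limit on these inputs, not the PCRE build Everything uses), with hypothetical gaps of 0, 1, 350, 700 and 701 newline characters:

```python
import re

# Three spellings of the same search; (?s) makes '.' match newlines too.
patterns = {
    "no group": r"(?s)onMessage.{1,700}a",
    "capturing": r"(?s)onMessage(.){1,700}a",
    "non-capturing": r"(?s)onMessage(?:.){1,700}a",
}

# Hypothetical file contents with 0, 1, 350, 700 and 701 newlines between the words.
samples = ["onMessage" + "\n" * gap + "a" for gap in (0, 1, 350, 700, 701)]

results = {
    name: [re.search(p, s) is not None for s in samples]
    for name, p in patterns.items()
}
for name, hits in results.items():
    print(f"{name:14} {hits}")  # each line: [False, True, True, True, False]
```

So differing file counts between these spellings point to an engine limit rather than to a difference in meaning.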

Not sure what nonsense the regex engine is doing here. Linear-time, recursion-free regex engines such as RE2 (https://github.com/google/re2/wiki/WhyRE2) exist, but I think they only support mathematically "regular" regular expressions, i.e. no backreferences or lookarounds.
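Recursion-free engines avoid deep call stacks by tracking every live state of the pattern at once instead of backtracking. The sketch below is a hand-written Thompson-style NFA simulation restricted to the single pattern shape from this thread, `prefix.{lo,hi}suffix` with dot-all semantics. It is not RE2's or PCRE2's implementation, and the function names are hypothetical. Because the epsilon closure uses an explicit worklist, no call-stack depth grows with the quantifier bound.

```python
ANY = None  # token that matches any single character (dot-all semantics)

def compile_gap_pattern(prefix, lo, hi, suffix):
    """Hypothetical helper: expand prefix.{lo,hi}suffix into (char_or_ANY, optional) tokens.
    '.{lo,hi}' becomes lo required ANY tokens followed by hi - lo optional ones."""
    tokens = [(c, False) for c in prefix]
    tokens += [(ANY, False)] * lo + [(ANY, True)] * (hi - lo)
    tokens += [(c, False) for c in suffix]
    return tokens

def close(states, tokens):
    """Epsilon closure with an explicit worklist: an optional token can be skipped."""
    result = set(states)
    work = list(states)
    while work:
        s = work.pop()
        if s < len(tokens) and tokens[s][1] and s + 1 not in result:
            result.add(s + 1)
            work.append(s + 1)
    return result

def nfa_search(text, tokens):
    """Unanchored search that advances a set of states one character at a time.
    No recursion or backtracking: O(len(text) * len(tokens)) work overall."""
    accept = len(tokens)
    current = close({0}, tokens)
    if accept in current:
        return True
    for ch in text:
        step = set()
        for s in current:
            if s < accept and (tokens[s][0] is ANY or tokens[s][0] == ch):
                step.add(s + 1)
        step.add(0)  # a new match attempt can start after every character
        current = close(step, tokens)
        if accept in current:
            return True
    return False

tokens = compile_gap_pattern("onMessage", 1, 700, "putStackInSlot")
print(nfa_search("onMessage" + "\n" * 700 + "putStackInSlot", tokens))  # True
print(nfa_search("onMessage" + "\n" * 701 + "putStackInSlot", tokens))  # False
```

The trade-off mirrors the one mentioned in this thread: keeping only a set of states is what rules out backreferences, since the simulation never remembers which text a group captured.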

It's completely reasonable if you judge this too niche or impractical to deserve further attention or a fix.
I do think the user needs some sort of warning when a regex search hits a limit, though. This matters especially when a limit-reaching regex returns *some* results (presumably the ones found without hitting the limit?) but not all, since that gives false confidence that the search was complete.

Thanks!
void
Developer
Posts: 19839
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex file content match limit

Post by void »

PCRE uses recursion when matching a capture group ( ... ).

PCRE2 has a pcre2_dfa_match function, an alternative matcher which doesn't use recursion.
However, it doesn't support backreferences, has limited lookbehind support and is slower for some patterns.

I will consider an option to use pcre2_dfa_match.



I have added handling of the PCRE_ERROR_RECURSIONLIMIT (-21) error to my TODO list.
When encountered, Everything could fall back to pcre2_dfa_match.
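The fallback described above, combined with the warning Temm asked for, might look like the following sketch. Everything in it is hypothetical: `RecursionLimitError` stands in for PCRE_ERROR_RECURSIONLIMIT, the primary matcher is a toy depth-limited backtracker (2 frames per iteration is an assumption), and the fallback is a plain window scan for the `prefix.{lo,hi}suffix` shape, not pcre2_dfa_match.

```python
import warnings

class RecursionLimitError(Exception):
    """Hypothetical stand-in for PCRE_ERROR_RECURSIONLIMIT."""

def backtracking_match(text, prefix, lo, hi, suffix, limit=256):
    """Toy greedy backtracker for prefix.{lo,hi}suffix (dot-all); 2 frames per iteration is assumed."""
    def gap(pos, count, depth):
        if depth > limit:
            raise RecursionLimitError()
        # Greedy: try one more character first, then the suffix on the way back.
        if count < hi and pos < len(text) and gap(pos + 1, count + 1, depth + 2):
            return True
        return count >= lo and text.startswith(suffix, pos)

    start = text.find(prefix)
    while start != -1:
        if gap(start + len(prefix), 0, 0):
            return True
        start = text.find(prefix, start + 1)
    return False

def scan_match(text, prefix, lo, hi, suffix):
    """Non-recursive fallback: look for the suffix in the window the bounds allow."""
    start = text.find(prefix)
    while start != -1:
        first = start + len(prefix) + lo  # earliest allowed suffix position
        last = start + len(prefix) + hi   # latest allowed suffix position
        if text.find(suffix, first, last + len(suffix)) != -1:
            return True
        start = text.find(prefix, start + 1)
    return False

def search_with_fallback(text, prefix, lo, hi, suffix):
    """Warn and retry instead of silently reporting 'no match' when the limit is hit."""
    try:
        return backtracking_match(text, prefix, lo, hi, suffix)
    except RecursionLimitError:
        warnings.warn("regex recursion limit hit; retried with a non-recursive matcher")
        return scan_match(text, prefix, lo, hi, suffix)

text = "onMessage" + "x" * 5 + "putStackInSlot" + "\n" * 200
print(search_with_fallback(text, "onMessage", 1, 129, "putStackInSlot"))  # True, plus a warning
```

Surfacing the warning (rather than only falling back) addresses the false-confidence concern: the user learns that the first engine gave up, even when the retry succeeds.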