Very complex regular expression in multiline

Off-topic posts of interest to the "Everything" community.
Debugger

Post by Debugger » Fri Nov 09, 2018 10:33 am

Find all words after the line === not containing Russian letters or Unicode, or any URL

Line1: ===
Line2: URL
Line3: Title

Example:
NO

===
www.site.ru/Compliments
Про комплименты

YES

===
Про комплименты

void
Site Admin

Re: Very complex regular expression in multiline

Post by void » Sat Nov 10, 2018 8:36 am

Find all words after the line === not containing Russian letters or Unicode, or any URL
Your YES example contains Russian text after ===, so I am not sure what you want.

=== = match literal ===
(\r\n|\n) = match newline
[a-z]+\.[a-z]+ = very loosely match a URL.
www\.[a-z]+\.[a-z]+ = match a URL starting with www.
(http://|https://)?(www\.)?[a-z]+\.[a-z]+ = match a URL starting with possible http:// or https:// and/or www.
[\p{Cyrillic}] = match a Russian letter.
[^\x00-\x7f] = match a non-ASCII character.
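
As a rough illustration only, the pieces above could be combined as in the Python sketch below. It assumes one possible reading of the question: match === only when the line directly after it does not start with a URL. The names HEADER_WITHOUT_URL and titles_without_url are made up for this example. Python's built-in re module does not support \p{Cyrillic}, so the basic Cyrillic block U+0400-U+04FF stands in for it here.

```python
import re

# Building blocks taken from the list above.
NEWLINE = r"(?:\r\n|\n)"
URL = r"(?:http://|https://)?(?:www\.)?[a-z]+\.[a-z]+"
# Assumption: Python's re has no \p{Cyrillic}; the basic Cyrillic
# block U+0400-U+04FF is used as an approximation.
CYRILLIC = r"[\u0400-\u04FF]"
NON_ASCII = r"[^\x00-\x7f]"

# Assumed reading of the question: match "===" at the start of a line
# only when the next line does not start with a URL, and capture that line.
HEADER_WITHOUT_URL = re.compile(
    r"^===" + NEWLINE + r"(?!" + URL + r")([^\r\n]+)",
    re.MULTILINE,
)


def titles_without_url(text):
    """Return the lines that directly follow '===' and do not start with a URL."""
    return HEADER_WITHOUT_URL.findall(text)


no_example = "===\nwww.site.ru/Compliments\nПро комплименты\n"
yes_example = "===\nПро комплименты\n"

print(titles_without_url(no_example))   # []
print(titles_without_url(yes_example))  # ['Про комплименты']

# The character classes can also be used on their own.
print(bool(re.search(CYRILLIC, yes_example)))     # True
print(bool(re.search(NON_ASCII, "plain ascii")))  # False
```

The loose URL pattern will also reject a title such as "read.me", so a stricter URL test may be needed for real data.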
