Regex Unicode and Non-Unicode

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Postby Debugger » Sun Feb 19, 2017 9:18 am

It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

void
Site Admin
Posts: 3180
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Postby void » Sun Feb 19, 2017 11:45 pm

Everything uses Perl Compatible Regular Expressions.

Please try:

\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Unicode is supported.
\p{...} is not supported.
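
Because \p{...} is not supported, a script property such as \p{Han} has to be approximated with an explicit code point range. The sketch below uses Python's standard re module, which also lacks \p{...}. It assumes the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is sufficient; the full Han script spans more blocks than this. The helper name has_cjk_ideograph is made up for illustration. In Everything the same class would presumably be written regex:[\x{4e00}-\x{9fff}], which is an untested assumption.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
# This approximates \p{Han}; it is not an exact equivalent.
CJK_BASIC = re.compile("[\u4e00-\u9fff]")


def has_cjk_ideograph(name):
    """Return True if the name contains at least one basic CJK ideograph."""
    return CJK_BASIC.search(name) is not None


if __name__ == "__main__":
    for name in ["極上スマイル", "Mąka ćwikłowa", "readme.txt"]:
        print(name, has_cjk_ideograph(name))
```

Katakana such as スマイル falls outside this block, so a name is only reported when it contains an actual ideograph.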

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Postby Debugger » Wed Mar 01, 2017 5:08 pm

void wrote:Please try:

\b (?>) Matches a word boundary (the start or end of a word).

Regex enabled:
\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]
It still does not work:
0 objects!

void
Site Admin
Posts: 3180
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Postby void » Thu Mar 02, 2017 8:09 am

regex:"\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]" is working correctly here.

\b = word boundary (the start or end of a word)
\x0D = carriage return
\x0A = new line
| = OR (all text before this is one search, all text after this another search)
[] = in a set
\x0A-\x0D = a range inside the set; it also covers \x0B (vertical tab) and \x0C (form feed)
\x{85} = next line (NEL)
\x{2028} = line separator
\x{2029} = paragraph separator

Combining them all together you get:
(a carriage return followed by a newline, starting at a word boundary) OR (a single character matching a newline, vertical tab, form feed, carriage return, next line, Unicode line separator 2028 or Unicode paragraph separator 2029)
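
As a rough check of that reading, here is a minimal Python sketch of the same pattern. Python's re module is not PCRE and spells \x{85} as \x85 and \x{2028} as \u2028, so this only approximates what Everything does; the helper name line_break_matches is hypothetical.

```python
import re

# Python spelling of \b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]
LINE_BREAKS = re.compile(r"\b\x0D\x0A|[\x0A-\x0D\x85\u2028\u2029]")


def line_break_matches(name):
    """Return every substring of name that the pattern matches, in order."""
    return LINE_BREAKS.findall(name)


if __name__ == "__main__":
    samples = ["a\r\nb", "\r\n", "a\x0bb", "a\u2028b", "plain.txt"]
    for sample in samples:
        # repr() makes the otherwise invisible control characters readable.
        print(repr(sample), [repr(m) for m in line_break_matches(sample)])
```

With a word character directly before it, a CR LF pair is taken as one match by the first alternative; without one, the set matches the CR and the LF separately. A name with none of these characters yields no match, which is consistent with a search returning 0 objects.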

What exactly are you trying to search for?

Please try without the word boundary:
regex:[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Make sure regex is disabled from the Search menu if you use the regex: modifier.
Also if you use the regex: modifier, please make sure you escape | with double quotes.

You can also use the built-in macros to find Unicode characters, which should be faster. With regex disabled, search for:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Postby Debugger » Fri Mar 03, 2017 6:52 am

0 objects


void
Site Admin
Posts: 3180
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Postby void » Sat Mar 04, 2017 10:35 am

Are you certain you have a filename with one of the above characters?

Does the following search find any results:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Postby Debugger » Mon Mar 06, 2017 1:35 pm

It does not work for me.
I want a correct regex that shows the names containing Unicode characters.
I want a correct regex that shows all names without Unicode characters.

void
Site Admin
Posts: 3180
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Postby void » Wed Mar 08, 2017 4:27 am

I've tested creating filenames with 0x0a, 0x0d, U+2028 and U+2029 characters and the above searches found them.

It's not clear what you are searching for.

To search for files with non-ASCII characters, search for:
regex:[^\x{00}-\x{7f}]

To search for files with only non-ASCII characters, search for:
!regex:[\x{00}-\x{7f}]

To search for files with ASCII only characters, search for:
regex:^[\x{00}-\x{7f}]*$
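
The three searches above split names into three groups. A minimal Python sketch of that split follows, assuming Python's re module behaves like Everything's PCRE for these simple classes; the function name classify and the group labels are made up for illustration. The !regex: negation is modelled with "not ... search".

```python
import re

ASCII_CHAR = re.compile(r"[\x00-\x7f]")      # any single ASCII character
ASCII_ONLY = re.compile(r"^[\x00-\x7f]*$")   # the whole name is ASCII


def classify(name):
    """Label a name as 'ascii-only', 'non-ascii-only' or 'mixed'."""
    if ASCII_ONLY.search(name):
        return "ascii-only"        # regex:^[\x{00}-\x{7f}]*$
    if not ASCII_CHAR.search(name):
        return "non-ascii-only"    # !regex:[\x{00}-\x{7f}]
    return "mixed"                 # found by regex:[^\x{00}-\x{7f}] as well


if __name__ == "__main__":
    for name in ["readme.txt", "極上スマイル", "極上スマイル(brz_", "Mąka ćwikłowa"]:
        print(name, "->", classify(name))
```

Note that regex:[^\x{00}-\x{7f}] finds both the "mixed" and the "non-ascii-only" groups, because a single non-ASCII character is enough for it to match.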

skribb
Posts: 9
Joined: Thu Mar 20, 2014 11:06 am

Re: Regex Unicode and Non-Unicode

Postby skribb » Wed Mar 08, 2017 11:08 pm

Debugger wrote:It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])



I don't know anything about regex, but as far as I understand it, I don't see why those strings would find folder and file names containing characters from non-Latin character sets.

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Postby Debugger » Fri Mar 10, 2017 9:32 am

regex:[^\x{00}-\x{7f}]

It works, but I do not want to include the Polish alphabet (my OS language is Polish)
https://en.wikipedia.org/wiki/Polish_alphabet
Show only English + Unicode.

void
Site Admin
Posts: 3180
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Postby void » Sat Mar 11, 2017 1:21 pm

Debugger wrote:It works, but I do not want to include Polish alphabet (native OS Polish)


regex:[^\x{00}-\x{7f}\x{104}\x{106}\x{118}\x{141}\x{143}\x{d3}\x{15a}\x{179}\x{17b}\x{105}\x{107}\x{119}\x{142}\x{144}\x{f3}\x{15b}\x{17a}\x{17c}]
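
The code points listed in that class are the nine Polish diacritic letters in upper and lower case. Here is a small Python sketch of the same exclusion, assuming names are stored in precomposed form (a decomposed "a" plus combining ogonek would still be reported); the names POLISH_CODE_POINTS and has_other_unicode are hypothetical.

```python
import re

# Code points copied from the search above.
POLISH_CODE_POINTS = [
    0x104, 0x106, 0x118, 0x141, 0x143, 0xD3, 0x15A, 0x179, 0x17B,
    0x105, 0x107, 0x119, 0x142, 0x144, 0xF3, 0x15B, 0x17A, 0x17C,
]
POLISH = "".join(chr(cp) for cp in POLISH_CODE_POINTS)

# Any character that is neither ASCII nor one of the Polish letters.
NOT_ASCII_OR_POLISH = re.compile(r"[^\x00-\x7f" + POLISH + "]")


def has_other_unicode(name):
    """Return True if name has a non-ASCII character outside the Polish set."""
    return NOT_ASCII_OR_POLISH.search(name) is not None


if __name__ == "__main__":
    print(POLISH)
    for name in ["Mąka ćwikłowa", "гвинея-спорт", "★ Hozda ★", "Accelerator"]:
        print(name, has_other_unicode(name))
```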

Debugger wrote:Show only English + Unicode.


What do you mean by English? Does this include spaces? Numbers?
What do you mean by Unicode? I assume you mean characters with a code > 7f.

To search for names made up only of the letters a-z (upper or lower case), search for:
regex:^[a-zA-Z]*$
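
This pattern only accepts names made up entirely of letters, so a dot, digit, space or extension is enough to exclude a name. A minimal Python sketch of that behaviour follows, assuming the pattern is tested against the whole file name including its extension. The wider class in BASIC_NAME is a hypothetical variation, not something Everything requires.

```python
import re

LETTERS_ONLY = re.compile(r"^[a-zA-Z]*$")

# Hypothetical wider class: letters, digits, space, dot, underscore, hyphen.
BASIC_NAME = re.compile(r"^[a-zA-Z0-9 ._-]*$")


def is_letters_only(name):
    """Return True if every character of name is an ASCII letter."""
    return LETTERS_ONLY.search(name) is not None


def is_basic_name(name):
    """Return True if name uses only the wider, hypothetical character set."""
    return BASIC_NAME.search(name) is not None


if __name__ == "__main__":
    for name in ["Accelerator", "Accelerator.exe", "New Folder", "file2"]:
        print(name, is_letters_only(name), is_basic_name(name))
```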

Debugger
Posts: 78
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Postby Debugger » Sat Mar 11, 2017 3:45 pm

English
Aa Cc Ee
Accelerator

-----------------------
Polish
AĄaą CĆcć EĘeę
Mąka ćwikłowa

------------------------
Unicode -> Other languages than Polish native + Special Chars ★ Hozda ★


¡ ¦

гвинея-спорт_олимпиада_мюнхен-72(1972)
極上スマイル(brz_



regex:^[a-zA-Z]*$
It does not show all the folders
It does not show all the files

