Search Preprocessor Suggestions

Discussion related to "Everything" 1.5.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Search Preprocessor Suggestions

Post by raccoon »

Suggest:

#normalize:<text>
Returns text with non-ASCII Latin characters converted to nearest ASCII equivalent.
Example: #normalize:"déjà vu" -> deja vu

https://stackoverflow.com/a/10064701/8805628

$string = 'Ë À Ì Â Í Ã Î Ä Ï Ç Ò È Ó É Ô Ê Õ Ö ê Ù ë Ú î Û ï Ü ô Ý õ â û ã ÿ ç';

$normalizeChars = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
    'Ï'=>'I', 'Ñ'=>'N', 'Ń'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ń'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f',
    'ă'=>'a', 'ș'=>'s', 'ț'=>'t', 'Ă'=>'A', 'Ș'=>'S', 'Ț'=>'T',
);

//Output: E A I A I A I A I C O E O E O E O O e U e U i U i U o Y o a u a y c
echo strtr($string, $normalizeChars);
I would also have this function convert fancy quotes to ASCII quotes, but leave Cyrillic and other non-Latin scripts alone. Toss in ligatures and digraphs if you're daring.
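
For comparison, here is a rough Python sketch of the same table-driven idea with the quote and digraph handling folded in. The table is a small hand-picked sample and the name normalize_latin is made up for illustration; it is not how Everything implements anything.

```python
# Illustrative table only: a handful of Latin letters, two digraphs and curly quotes.
LATIN_TO_ASCII = {
    "é": "e", "à": "a", "ç": "c", "ñ": "n", "ü": "u",
    "É": "E", "À": "A", "Ç": "C", "Ñ": "N", "Ü": "U",
    "ß": "ss", "æ": "ae", "Æ": "AE",   # digraphs expand to two characters
    "\u2018": "'", "\u2019": "'",        # single curly quotes
    "\u201c": '"', "\u201d": '"',        # double curly quotes
}

# str.maketrans accepts a dict of single-character keys mapped to replacement strings.
_TABLE = str.maketrans(LATIN_TO_ASCII)


def normalize_latin(text: str) -> str:
    """Replace mapped characters; anything not in the table passes through unchanged."""
    return text.translate(_TABLE)


print(normalize_latin("déjà vu"))
print(normalize_latin("Привет"))  # Cyrillic is not in the table, so it is left alone
```

The weakness of any hand-written table is coverage: every character that should fold to ASCII has to be listed explicitly.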
Last edited by void on Sat Oct 23, 2021 7:36 am, edited 3 times in total.
Reason: moved to Search Preprocessor Suggestions
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Added to my TODO list.

Thank you for the suggestion.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor

Post by raccoon »

#fixed got accidentally lost in the "I"s beneath #find/#instr
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything 1.5.0.1282a adds a #remove-diacritics: preprocessor search function to remove diacritics from the specified text.

Normalize isn't quite the right function name for the job.
I might add a normalize function similar to the JavaScript normalize function in a future release.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

I would be interested in seeing/stealing your chosen mapping for #remove-diacritics:
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

#remove-diacritics: will normalize the text first with NFKD.

æ becomes ae
ﬃ (the single-character ligature U+FB03) becomes ffi
Å becomes A + ◌̊
ⓥ becomes v

Any remaining Unicode marks (e.g. ◌̊) are removed.

It is the same function that is used when Match Diacritics is disabled from the Search menu.
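
The two steps described above (NFKD, then drop the marks) can be approximated outside Everything with Python's standard unicodedata module. This is only a sketch: the function name remove_diacritics is hypothetical, and Python's Unicode tables stand in for Everything's own. One difference to be aware of is that standard NFKD has no decomposition for æ, so the æ to ae case listed above would need an extra rule on top of this.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Approximate the described behavior: NFKD-decompose, then drop Unicode marks."""
    # NFKD splits characters into base characters plus combining marks,
    # and also expands compatibility forms such as ligatures and circled letters.
    decomposed = unicodedata.normalize("NFKD", text)
    # General categories Mn, Mc and Me are the Unicode "mark" categories.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


for sample in ("déjà vu", "Å", "\ufb03", "\u24e5", "æ"):
    print(repr(sample), "->", repr(remove_diacritics(sample)))
```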
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

How can I go about generating a complete mapping table?
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything doesn't have a simple mapping table.

Everything.Unicode.Tables.txt

Values 0x0300 .. 0x036e map to the decomposition table.
3-byte UTF-8 to 3 decomposition characters are hard coded. (ﬃ -> ffi)
Value 0x036f means the Unicode code point is a diacritic.

This table is likely to change during alpha.

These tables are generated from https://unicode.org/Public/UNIDATA/UnicodeData.txt
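
For anyone who wants a flat mapping table anyway, one possible route is to parse UnicodeData.txt directly. The sketch below is an assumption about how that could be done, not a description of Everything's generator. It relies only on the semicolon-separated layout of the file: field 0 is the code point, field 2 the general category, and field 5 the decomposition mapping. A few sample lines are embedded so it runs without a download; the helper names are hypothetical.

```python
# A few lines in the UnicodeData.txt layout; only fields 0, 2 and 5 are used here.
SAMPLE = """\
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
24E5;CIRCLED LATIN SMALL LETTER V;So;0;L;<circle> 0076;;;;N;;;24CB;;
FB03;LATIN SMALL LIGATURE FFI;Ll;0;L;<compat> 0066 0066 0069;;;;N;;;;;
"""


def parse_unicode_data(lines):
    """Return (decompositions, marks) collected from UnicodeData.txt-style lines."""
    decomp = {}   # code point -> list of code points it decomposes to
    marks = set() # code points whose general category is Mn, Mc or Me
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        if fields[2].startswith("M"):
            marks.add(cp)
        if fields[5]:
            # Skip compatibility tags such as <compat> or <circle>.
            parts = [p for p in fields[5].split() if not p.startswith("<")]
            decomp[cp] = [int(p, 16) for p in parts]
    return decomp, marks


def expand(cp, decomp, marks):
    """Fully decompose one code point and drop any marks along the way."""
    if cp in marks:
        return ""
    if cp in decomp:
        return "".join(expand(c, decomp, marks) for c in decomp[cp])
    return chr(cp)


decomp, marks = parse_unicode_data(SAMPLE.splitlines())
table = {chr(cp): expand(cp, decomp, marks) for cp in decomp}
for src, dst in table.items():
    print(f"U+{ord(src):04X} -> {dst!r}")
```

To build the full table, feed the lines of a downloaded UnicodeData.txt to parse_unicode_data instead of SAMPLE. Characters that Everything special-cases (such as æ) will not appear, because the file gives them no decomposition.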
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions #number:000

Post by raccoon »

Advanced Rename / Folder Move --> #number:

Suggest: Allow #number:000 for convenient zero padding.
alias for #text:<#number:,000>
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Thank you for the suggestion raccoon.

I have put it on my TODO list to add #number00: and #number000: preprocessor functions for 00 and 000 padding.
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything 1.5.0.1290a adds #number00: and #number000: for quick zero-padding.
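
As a plain illustration of what 00 and 000 padding means for a running number (this is generic Python, not Everything's implementation; number_padded and the file names are invented for the example):

```python
def number_padded(n: int, width: int = 3) -> str:
    """Format n with leading zeros up to the given width; longer numbers are not truncated."""
    return f"{n:0{width}d}"


# width=2 corresponds to 00 padding, width=3 to 000 padding.
names = [f"Track {number_padded(i)}.mp3" for i in (1, 2, 10)]
print(names)

# Zero-padded names sort in numeric order even under a plain text sort.
print(sorted(names) == names)
```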
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

void wrote: Tue Dec 21, 2021 5:35 am
Everything 1.5.0.1290a adds #number00: and #number000: for quick zero-padding.

I can't tell you how many total accumulated hours that $number000: (formerly #number000:) has saved me over the past 2 years in the Advanced Rename dialog when naming audio files, and especially audiobooks, I've converted from CD. I just need to express my sincere appreciation here, and I'm shooting you $20 to your Donate page. <3
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Thank you for your donation and support.

I am glad to hear you find #number000: useful!
aRppPJhZ
Posts: 1
Joined: Fri May 02, 2025 3:42 pm

Re: Search Preprocessor Suggestions

Post by aRppPJhZ »

:)
thanks so much for the $number: pseudo property (or whatever its best name might be)!
I tried to have it reset for file renaming by folder but didn't succeed or find anything in the forum. Is there anything like this?
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Please try $parent-name:

For more properties, click the ▶ button to the right of New Format.
Click Insert Property.
Right click the property list column header and check Preview.
Examine the preview value, select the property and click OK.