Advanced Copy/Move name similarity patterning function source code

Post by **raccoon** » Tue Nov 24, 2020 11:14 pm

Dear void,

I am really impressed, enamored even, by the name similarity patterning functionality of Advanced Copy and Move. It's on point and cleverly detects similarities I hadn't even noticed at times.

I would really like to port this function to a different scripting language, emulating the string comparison methods you've used in order to detect similarities and differences. I'm curious which technique you're using and how you approached it in code.

Please let me take a peek!

Post by **void** » Tue Nov 24, 2020 11:32 pm

How the Everything 1.4 multifile rename pattern matching works:

Find the shortest-filename-length and longest-filename-length.

Find the end-match-length (the text that matches at the end of all filenames).

Subtract the end-match-length from shortest-filename-length.
Subtract the end-match-length from longest-filename-length.

Loop through all the filename positions (0 through shortest-filename-length - 1).
If the character at the current position in the filename is the same for all filenames, emit the character to the format; otherwise emit %x and increment the variable number.

There cannot be consecutive variables (e.g. %1%2).
If shortest-filename-length != longest-filename-length, emit another variable to the format.

Emit the text that matches at the end of all filenames to the format.
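The steps above can be sketched in Python roughly as follows. This is only an illustration of the description in this post, not Everything's actual source: the function names `common_suffix_length` and `build_rename_pattern` are made up, variables are assumed to be numbered from %1, and the trailing extra variable is assumed to be skipped when a variable was just emitted (so the "no consecutive variables" rule still holds). Escaping of literal `%` characters is not handled.

```python
def common_suffix_length(names):
    """Length of the text that matches at the end of all names."""
    shortest = min(len(n) for n in names)
    length = 0
    # Walk backwards from the end while every name has the same character.
    while length < shortest and len({n[len(n) - 1 - length] for n in names}) == 1:
        length += 1
    return length


def build_rename_pattern(names):
    """Build a format such as 'photo00%1.jpg' by position-wise comparison."""
    if not names:
        return ""
    end_len = common_suffix_length(names)
    # Lengths of the parts that precede the shared ending.
    shortest = min(len(n) for n in names) - end_len
    longest = max(len(n) for n in names) - end_len

    parts = []
    var_count = 0
    in_variable = False  # True while the last thing emitted was a variable
    for pos in range(shortest):
        if len({n[pos] for n in names}) == 1:
            parts.append(names[0][pos])  # same character everywhere: literal
            in_variable = False
        elif not in_variable:
            var_count += 1
            parts.append(f"%{var_count}")  # mismatch: start a new variable
            in_variable = True

    # Assumption: only add the extra variable if it would not be consecutive.
    if shortest != longest and not in_variable:
        var_count += 1
        parts.append(f"%{var_count}")

    # Emit the text that matches at the end of all names.
    parts.append(names[0][len(names[0]) - end_len:])
    return "".join(parts)


print(build_rename_pattern(["photo001.jpg", "photo002.jpg"]))  # photo00%1.jpg
print(build_rename_pattern(["a.txt", "abc.txt"]))              # a%1.txt
```

Because each position is compared only against the same position in the other names, the cost is linear in the total length of the filenames.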

Post by **raccoon** » Wed Nov 25, 2020 12:24 am

Ok, so if I've got this right, you start by iterating over each filename from right to left, continuing while the characters match and halting at the first mismatched character, thus creating our filter suffix. Then we do the same from left to right until we hit a mismatched character, at which point we drop in a variable %N. I'm not sure, though, how you pick things up from there to find substring pattern synchronization.

In most compare algorithms, there's going to be a synchronization 'window' of N-bytes. The smaller the window, the narrower the opportunity for synchronization, but also faster to iterate. I don't see you mention a window here.

Post by **void** » Wed Nov 25, 2020 12:30 am

There is no window. There is no substring matching.

Everything looks for matches only at the same position for each filename.
Which can lead to weird matches:

a b c
123 4

%1 %2
The space in the format above matches the space at position 3 (counting from 0) in each filename.

This issue is not as noticeable when using more filenames.
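A small Python sketch of the same-position comparison for the two filenames above. The helper name `matching_positions` is hypothetical and is only meant to show which positions would become literal text in the format. The third filename is an invented example of how adding more filenames makes an accidental match less likely.

```python
def matching_positions(names):
    """Zero-based positions where every name has the same character."""
    # zip stops at the shortest name, so only shared positions are compared.
    return [pos for pos, chars in enumerate(zip(*names)) if len(set(chars)) == 1]


print(matching_positions(["a b c", "123 4"]))           # [3]
print(matching_positions(["a b c", "123 4", "x-y-z"]))  # []
```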

Post by **raccoon** » Wed Nov 25, 2020 3:34 am

Ah, ok, so I guess it's effectively a 1-byte window. Either the character matches in that position for every filename, or it's masked as a wildcard. The exception is the filename suffix, which is anchored to the end of the filename. That explains why it's so fast, and yet still very effective.

Thanks so much!

P.S.: Oh, is regex the same thing, different symbols, or any new twists?
