Complex regex to find duplicate unique file names with exactly 11 characters?

cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

I have videos downloaded using different downloaders.

The file names include the resolution, like 720p, 854p, etc.

I have since downloaded those same videos in higher resolutions.

What I am trying to do is find those lower resolution duplicates.

The problem is that some file names contain the video id and some do not. File size doesn't help either: the same video downloaded by the same app can come out a little bigger or smaller, but never exactly the same size.

The video id is the only unique identifier. It is 11 characters long and can contain 0-9, a-z, A-Z, _ and -

I'm thinking of various ways to find these files.

There are file names like these; some have the video id in them and some do not. For this example the unique video id is -A_bcd01-3_

Some example file names. If a video id exists, it has a space or a dash immediately before and after it:
Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4
Fundamentals of Finance & Economics for Businesses – Bonus Lesson.mp4

I think that to find duplicate file names that don't have the video id, I would need a regex that matches the first 50 characters. This would bring up both the files that lack the unique id and the files that have it.
I entered this in the search and it doesn't seem to work:

Code: Select all

regex:^.{1,50}
For files that have the video id, I would need it to match any file names that have the same unique 11 characters delimited by a space or a dash. I couldn't figure out whether that is possible. Is it?
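To see how a "first 50 characters" key would behave on the example names, independent of Everything, the names can be grouped in Python. This is only an illustration of the idea in the post: `group_by_prefix` and `PREFIX_LEN` are made-up names, and nothing here reflects how Everything itself evaluates a search.

```python
from collections import defaultdict

PREFIX_LEN = 50  # the "first 50 characters" idea from the post


def group_by_prefix(names, length=PREFIX_LEN):
    """Group file names by their first `length` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:length]].append(name)
    return dict(groups)


names = [
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4",
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4",
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4",
    "Fundamentals of Finance & Economics for Businesses – Bonus Lesson.mp4",
]
groups = group_by_prefix(names)
```

With these four names everything lands in a single group, Bonus Lesson included, because all of them share the same first 50 characters. A fixed-length prefix would therefore still need a manual check afterwards.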
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

You will need Everything 1.5.

Search for:

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11})[\s\-][\d]+p)?\.[^.]*$ addcolumn:1;2 dupe:1
The filename bit before -A_bcd01-3_ is shown in the regmatch1 column.
The -A_bcd01-3_ bit is shown in regmatch2 column. (if it is found)
regex: == enable regular expressions.
^ == match start of filename.
( ) == capture group
.*? == match any character any number of times (lazy) - matches the filename bit before -A_bcd01-3_
(?: ) == group without capture
[\s\-] == match a space or -
[a-z0-9_\-]{11} == match your video ID (a-z0-9 _ or - 11 times)
[\d]+p == match 720p, 854p etc..
? == match previous element zero or one times. (this skips the Video ID if it doesn't exist)
\. == match a literal .
[^.]* == match the extension.
$ == match the end of the filename.
addcolumn:1;2 == add regmatch1 and regmatch2 columns
dupe:1 == list only files that have a duplicated regmatch1.
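For anyone who wants to check the pattern outside Everything, here is a rough Python sketch of the same idea: capture group 1 (the name before the video ID) is used as the duplicate key. Assumptions: Python's `re` module is not the engine Everything uses, `re.IGNORECASE` stands in for case-insensitive matching (the pattern's `[a-z...]` has to match the capital A in -A_bcd01-3_), and `group_by_regmatch1` is a made-up helper name, not an Everything function.

```python
import re
from collections import defaultdict

# Pattern copied from the search above, without the Everything-specific
# "regex:" prefix and the addcolumn:/dupe: parts.
PATTERN = re.compile(
    r"^(.*?)(?:[\s\-]([a-z0-9_\-]{11})[\s\-][\d]+p)?\.[^.]*$",
    re.IGNORECASE,
)


def group_by_regmatch1(names):
    """Group names by capture group 1 and keep only groups with 2+ files."""
    groups = defaultdict(list)
    for name in names:
        m = PATTERN.match(name)
        if m:
            groups[m.group(1)].append(name)
    return {key: files for key, files in groups.items() if len(files) > 1}


names = [
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4",
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4",
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4",
    "Fundamentals of Finance & Economics for Businesses – Bonus Lesson.mp4",
]
dupes = group_by_regmatch1(names)
```

In this sketch the three Beginners Course 1 names end up under one key, and the Bonus Lesson name is dropped because nothing else shares its group 1.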
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

The board seems to have a problem with logging me out. After I typed out a post or comment, it made me log in again, and when I did, the post was gone. I would have had to start over if I hadn't typed it out in Notepad first.

That search doesn't seem to work. I am using 1.5.0.1383a; it brought up these 3 results, but 5 were expected:
Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4

I created a new folder and put these 6 files inside and searched in that folder to test:
Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4
Fundamentals of Finance & Economics for Businesses – Bonus Lesson i_AaAbBs0f_ 360p.mp4
Fundamentals of Finance & Economics for Businesses – Bonus Lesson-i_AaAbBs0f_-1080p.mp4
Random Misc Video.mkv

It should have brought up these 5 results:
Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4
Fundamentals of Finance & Economics for Businesses – Bonus Lesson i_AaAbBs0f_ 360p.mp4
Fundamentals of Finance & Economics for Businesses – Bonus Lesson-i_AaAbBs0f_-1080p.mp4

I have another regex request. This one is a 6 to 8 digit unique id that only uses numbers and is delimited with a dash or space in the file name.
Example video ids: 102999 or 28192851
File names:
Video-28191851-720p.mp4
2nd Video 28192851-1080p.mp4
New Economics Video 102999.mp4
Another Video 102838.mp4

Expected results after using the search query
ECON 101-28191851-720p.mp4
Economics For Beginners 28192851 1080p.mp4

Code: Select all

$ == match the end of the filename.
Does that refer to the extension? I ask because one app could save a video as .mp4 while another app downloading the same video saves it as .mkv.

Thanks!
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

"It should have brought up these 5 results:"
Hit F5 in Everything to refresh a dupe: search.
I see your expected results here.


"I have another regex request. This one is a 6 to 8 digit unique id that only uses numbers and is delimited with a dash or space in the file name."

Code: Select all

regex:[\s\-](\d{6,8})[\s\-]\d+p\.[^.]*$ addcolumn:1 dupe:1
Video ID is shown in regmatch1 column.

These shouldn't match because they have different IDs:
ECON 101-28191851-720p.mp4
Economics For Beginners 28192851 1080p.mp4


"$ == match the end of the filename.
Does that refer to the extension? I ask because one app could save a video as .mp4 while another app downloading the same video saves it as .mkv."
It refers to the end of the whole filename, extension included.
The extension itself is matched by \.[^.]* just before $ and is not captured, so .mp4 and .mkv files can still be listed as duplicates of each other.
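A small Python sketch of the point about extensions, for testing outside Everything. Only capture group 1 (the digits) is compared, so the extension never enters the key. Assumptions: Python's `re` is used instead of Everything's engine, `video_id` is a made-up helper, and the .mkv name is a hypothetical example that does not appear in the thread.

```python
import re

# Numeric-ID pattern from the search above, minus the Everything-only parts.
NUMERIC_ID = re.compile(r"[\s\-](\d{6,8})[\s\-]\d+p\.[^.]*$")


def video_id(name):
    """Return the captured video ID, or None when the name does not match."""
    m = NUMERIC_ID.search(name)
    return m.group(1) if m else None


names = [
    "Video-28191851-720p.mp4",         # from the thread
    "Video 28191851 1080p.mkv",        # hypothetical: same ID, other extension
    "2nd Video 28192851-1080p.mp4",    # from the thread, different ID
    "New Economics Video 102999.mp4",  # from the thread, no resolution
]
ids = [video_id(n) for n in names]
```

Here the .mp4 and .mkv names yield the same ID, the third name yields a different one, and the last name is not matched at all because this pattern requires a resolution after the ID.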
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

Refreshing showed the correct results so far, but it looks like this part is making it not work on some searches:
[\d]+p == match 720p, 854p etc..

When I tested it on multiple drives instead of a test folder, I found that some file names also contain codec and fps details.

They all have either the file name, or the file name and video id, at the start. Maybe the regex could ignore anything after the video id if it exists?

For example, the same video could have any one of these file names, with different file sizes:
Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ av1 1080p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ av1 1080p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p-30fps.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-av1-720p-30fps.mp4

There is a similar issue with the other regex: if something comes after the resolution, the file doesn't show up in the results as a duplicate.

Code: Select all

regex:[\s\-](\d{6,8})[\s\-]\d+p\.[^.]*$ addcolumn:1 dupe:1
Sample Video Test-31910171-2160p.mp4
Sample Video Test-31910171-720p-30fps.mp4
Sample Video Test 31910171 1080p-av1-60fps.mp4
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

Because the pattern for the unique key (such as -A_bcd01-3_) can also match any ordinary 11-character word, you will have to match av1 and the fps explicitly.

Does the unique key always have a - or _ ?
-It would make this easier.

Please try the following regex search:

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11})([\s\-]av1)?[\s\-]\d+p([\s\-]\d+fps)?)?\.[^.]*$ addcolumn:1;2 dupe:1

"Similar issue on the other regex, if something comes after the resolution, it wouldn't show in results as a duplicate"
Since the video ID is all numbers, please try:

Code: Select all

regex:[\s\-](\d{6,8})[\s\-] addcolumn:1 dupe:1
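The looser pattern can be tried on the three Sample Video Test names in Python. This is a sketch under assumptions: Python's `re` rather than Everything's engine, and `first_id` is a made-up helper that returns only the first match in a name.

```python
import re
from collections import Counter

# Pattern from the search above: 6 to 8 digits with a space or dash on each side.
LOOSE_ID = re.compile(r"[\s\-](\d{6,8})[\s\-]")


def first_id(name):
    """Return the first 6-8 digit run delimited by spaces/dashes, else None."""
    m = LOOSE_ID.search(name)
    return m.group(1) if m else None


names = [
    "Sample Video Test-31910171-2160p.mp4",
    "Sample Video Test-31910171-720p-30fps.mp4",
    "Sample Video Test 31910171 1080p-av1-60fps.mp4",
    "New Economics Video 102999.mp4",
]
counts = Counter(first_id(n) for n in names)
```

All three Sample Video Test names give 31910171 regardless of what follows the resolution. Note that, in Python at least, a name whose ID sits directly before the extension (New Economics Video 102999.mp4) is not picked up, since the ID is followed by a dot rather than a space or dash.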
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

"Does the unique key always have a - or _ ?"
No, but if the file name has a unique key, it will always be 11 characters and will be separated by either spaces or dashes,
such as [filename 12345678abc AV1 1440p.mp4] or [filename-12345678abc-AV1-1440p.mp4]
It won't mix spaces and dashes, as in [filename 12345678abc-av1-1440p.mp4] or [filename-12345678abc av1 1440p.mp4]
The unique key in the examples above is 12345678abc

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11})([\s\-]av1)?[\s\-]\d+p([\s\-]\d+fps)?)?\.[^.]*$ addcolumn:1;2 dupe:1
This regex finds 8 out of 10. The 2 I bolded (the 1440p and AV1 720p names without a video id) were supposed to be in the results but aren't:

Fundamentals of Finance & Economics for Businesses – Beginners Course 1.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 1440p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 AV1 720p.mp4

Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 640p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-640p-30fps.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bce01-3_ AV1 720p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_ av1-1080p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ av1 1080p.mp4
Fundamentals of Finance & Economics for Businesses – Beginners Course 1--A_bcd01-3_-av1-1440p-30fps.mp4

Code: Select all

regex:[\s\-](\d{6,8})[\s\-] addcolumn:1 dupe:1
This worked perfectly thanks!
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

Please try:

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]\d+p)?([\s\-]\d+fps)?\.[^.]*$ addcolumn:1;2 dupe:1
-I moved the
([\s\-]av1)?([\s\-]\d+p)?([\s\-]\d+fps)?
part outside the unique key match.
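To compare the previous pattern with this one outside Everything, the ten example names can be run through both in Python. This is a sketch under assumptions: Python's `re` instead of Everything's engine, `re.IGNORECASE` to mimic case-insensitive matching, and `count_base` as a made-up helper that counts how many names produce the plain title as capture group 1.

```python
import re

FLAGS = re.IGNORECASE

# Earlier pattern: av1 / resolution / fps only allowed after a video ID.
OLD = re.compile(
    r"^(.*?)(?:[\s\-]([a-z0-9_\-]{11})([\s\-]av1)?[\s\-]\d+p([\s\-]\d+fps)?)?\.[^.]*$",
    FLAGS,
)
# Pattern from this post: the optional parts are independent of the video ID.
NEW = re.compile(
    r"^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]\d+p)?([\s\-]\d+fps)?\.[^.]*$",
    FLAGS,
)

BASE = "Fundamentals of Finance & Economics for Businesses – Beginners Course 1"
SUFFIXES = [
    ".mp4",
    " 1440p.mp4",
    " AV1 720p.mp4",
    " -A_bcd01-3_ 640p.mp4",
    "--A_bcd01-3_-640p-30fps.mp4",
    "--A_bcd01-3_-720p.mp4",
    " -A_bce01-3_ AV1 720p.mp4",
    "--A_bcd01-3_ av1-1080p.mp4",
    " -A_bcd01-3_ av1 1080p.mp4",
    "--A_bcd01-3_-av1-1440p-30fps.mp4",
]


def count_base(pattern):
    """Count names whose capture group 1 is exactly the plain title."""
    return sum(1 for s in SUFFIXES if pattern.match(BASE + s).group(1) == BASE)


old_hits = count_base(OLD)
new_hits = count_base(NEW)
```

In this sketch the earlier pattern gives the plain title for 8 of the 10 names (the two without a video ID keep their resolution or codec in group 1), while the pattern from this post gives it for all 10.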
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

Sorry this is getting more complex.

The AV1 in the file name refers to the codec. Sometimes different codecs are used, other download apps use different abbreviations for them, and some add the bitrate after the codec.

Instead of only AV1, can you include these:

Exact matches and case insensitive
av1
vp9

Exact matches and case sensitive
vp9.2
avc1.64002a
avc1.640028

Case sensitive. If the abbreviation is av01, it will be in one of these 2 formats and lengths; the numbers are random
av01.0.08M.08
av01.0.17M.08
av01.0.04M.08.0.110.05.01.06.0
av01.0.17M.10.0.110.09.18.09.0

Case sensitive. If the abbreviation is vp09, it will be in one of these 2 formats and lengths; the numbers are random, and there is no M for vp09
vp09.00.40.08
vp09.02.51.10.01.09.16.09.00

Some example file names
Filename-BA_bcd01-3_-vp9.2-2160p-30fps.mkv
Filename-BA_bcd01-3_-vp09.00.40.08-1920p-30.0fps.mp4
Filename-BA_bcd01-3_-avc1.640028-1920p-30fps.mkv
Filename-BA_bcd01-3_-av01.0.13M.08-2160p-60fps.mp4
Filename-BA_bcd01-3_-av01.0.09M.08.0.110.05.01.06.0-1080p-60fps.mp4
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

Since av*1 and vp are unlikely to hit a real word, here's a simple regex search:

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]av[c0]?1.*?)?([\s\-]vp0?9.*?)?([\s\-]\d+p)?([\s\-]\d+fps)?\.[^.]*$ addcolumn:1;2 dupe:1
-finds the key before av1 or avc1 or av01 or vp09 or vp9
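The codec variants can be checked against the example file names in Python. This is a sketch under assumptions: Python's `re` instead of Everything's engine, `re.IGNORECASE` for case-insensitive matching, and `split_name` as a made-up helper that returns capture groups 1 and 2.

```python
import re

# Pattern from the search above, minus the Everything-only parts.
CODEC_PATTERN = re.compile(
    r"^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]av[c0]?1.*?)?"
    r"([\s\-]vp0?9.*?)?([\s\-]\d+p)?([\s\-]\d+fps)?\.[^.]*$",
    re.IGNORECASE,
)


def split_name(name):
    """Return (name part, video ID) as captured by groups 1 and 2."""
    m = CODEC_PATTERN.match(name)
    return (m.group(1), m.group(2)) if m else None


names = [
    "Filename-BA_bcd01-3_-vp9.2-2160p-30fps.mkv",
    "Filename-BA_bcd01-3_-vp09.00.40.08-1920p-30.0fps.mp4",
    "Filename-BA_bcd01-3_-avc1.640028-1920p-30fps.mkv",
    "Filename-BA_bcd01-3_-av01.0.13M.08-2160p-60fps.mp4",
    "Filename-BA_bcd01-3_-av01.0.09M.08.0.110.05.01.06.0-1080p-60fps.mp4",
]
results = {split_name(n) for n in names}
```

All five names reduce to the same pair here. The lazy .*? after the codec prefix absorbs whatever follows it, including the dots in the codec string and the 30.0fps in the second name, up to the final extension.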
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

Thanks. I just need 1 more adjustment to the regex so the results include file names containing these:

(1920p_30fps_H264)
(2160p_24fps_AV1)
(1080p_60fps_VP9)
(2160p_30fps_VP9 LQ)

The numbers for p and fps are variable, for example 720p 30fps
The codec part will end in H264) AV1) VP9) or VP9 LQ)

Example file names; some have the unique key and some do not
Filename (1920p_30fps_H264)
Filename (2160p_24fps_AV1)
Filename (1080p_60fps_VP9)
Filename (2160p_30fps_VP9 LQ)
Filename Unique Key (1920p_30fps_H264)
Filename Unique Key (2160p_24fps_AV1)
Filename Unique Key (1080p_60fps_VP9)
Filename Unique Key (2160p_30fps_VP9 LQ)

Is there also a regex that ignores everything else and only tries to match the 11-character unique key?
Some file names contain a word like Fundamentals, which would bring up the wrong results, but the unique key is enclosed by a space or -, so that would avoid it. I was planning to delete the duplicates found via the unique key first, since that guarantees a duplicate, and then use the other regex to find the rest, using thumbnails to verify whether they are duplicates.
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

Please try:

Code: Select all

regex:^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]av[c0]?1.*?)?([\s\-]vp0?9.*?)?([\s\-]\d+p)?([\s\-]\d+fps)?([\s\-]\(\d+p_\d+fps_.*?\))?\.[^.]*$ addcolumn:1;2 dupe:1
-added
([\s\-]\(\d+p_\d+fps_.*?\))?
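A quick Python check of the added parenthesised part. This is a sketch under assumptions: Python's `re` instead of Everything's engine, `re.IGNORECASE` for case-insensitive matching, an assumed .mp4 extension (the examples in the request were written without one, and the pattern requires an extension), and BA_bcd01-3_ borrowed from the earlier examples as a stand-in for "Unique Key".

```python
import re

# Pattern from the search above, minus the Everything-only parts.
FULL_PATTERN = re.compile(
    r"^(.*?)(?:[\s\-]([a-z0-9_\-]{11}))?([\s\-]av1)?([\s\-]av[c0]?1.*?)?"
    r"([\s\-]vp0?9.*?)?([\s\-]\d+p)?([\s\-]\d+fps)?"
    r"([\s\-]\(\d+p_\d+fps_.*?\))?\.[^.]*$",
    re.IGNORECASE,
)

# Hypothetical names built from the requested formats, with .mp4 assumed.
without_key = FULL_PATTERN.match("Filename (1920p_30fps_H264).mp4")
with_key = FULL_PATTERN.match("Filename BA_bcd01-3_ (2160p_30fps_VP9 LQ).mp4")

name_parts = (without_key.group(1), with_key.group(1))
video_ids = (without_key.group(2), with_key.group(2))
```

Both names reduce to Filename in group 1 here, and only the second one has a value in group 2.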



"Is there also a regex that ignores everything else and only tries to match the 11-character unique key?"
Yes, but the key is not well defined and will match any valid 11 character word.

Code: Select all

regex:[\s\-]([a-z0-9_\-]{11})[\s\-] addcolumn:1 dupe:1
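The caveat about ordinary words can be reproduced in Python. This is a sketch under assumptions: Python's `re` instead of Everything's engine, `re.IGNORECASE` for case-insensitive matching, `candidate_key` as a made-up helper, and a hypothetical second file name chosen because Collections happens to be 11 letters long.

```python
import re

# Key-only pattern from the search above.
KEY_ONLY = re.compile(r"[\s\-]([a-z0-9_\-]{11})[\s\-]", re.IGNORECASE)


def candidate_key(name):
    """Return the first 11-character run between spaces/dashes, else None."""
    m = KEY_ONLY.search(name)
    return m.group(1) if m else None


# Name from the thread: the real video ID is found.
real = candidate_key(
    "Fundamentals of Finance & Economics for Businesses – Beginners Course 1 -A_bcd01-3_ 720p.mp4"
)
# Hypothetical name: an ordinary 11-letter word is reported as the key.
word = candidate_key("My Collections Backup 720p.mp4")
```

The first call returns the real ID, while the second returns Collections, so results from this search still need a visual check.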
cal
Posts: 15
Joined: Fri Jul 25, 2025 5:24 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by cal »

"Yes, but the key is not well defined and will match any valid 11 character word."
It did bring up results that are ordinary 11-character words. I switched to the details view and sorted by the dupe column, so I can quickly tell whether a match is a unique key or an actual word. I added a video length column to help with quick verification.

For anything that doesn't have the unique key, I am viewing them by thumbnails.

When viewing by thumbnails, are any customizations possible?
Is there a way to add any of these attributes?
Size
Video Length
Path

The 3 regexes work great, thank you for all the help.
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

"When viewing by thumbnails, are any customizations possible?"
Only the name is shown.

If you hover over the thumbnail a tooltip is shown with the values for each column that is enabled in detailed view.

I will look into improving the customization of the thumbnail view.
void
Developer
Posts: 19053
Joined: Fri Oct 16, 2009 11:31 pm

Re: Complex regex to find duplicate unique file names with exactly 11 characters?

Post by void »

Everything 1.5.0.1397a makes some minor improvements to thumbnail tooltips.

add-column: will now work in thumbnail view.

Tooltips will now gather property values and update automatically.