I have about 3000 e-books (epub files) spread out over many directories on my computer, with different names. Take a look at the following:
A Summers Moon.epub
A Summers Moon(2).epub
A Summers Moon by William Smith.epub
All these files are the same book. What is the best way to search for duplicate files by content instead of file name? I thought of size, but most of my epub files are really close in size.
Thanks for any help
Search Duplicate Files w/ different names
Re: Search Duplicate Files w/ different names
Please try an exact size match first. This will instantly show possibly duplicated ebooks.
The looser searches further down group sizes into 1024-byte steps (adjust 1024 as needed) and find epubs starting with the same first 12 characters.
In Everything 1.5, search for:
*.epub dupe:size
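If you want to check the same idea outside Everything, here is a minimal Python sketch of an exact-size match. `group_by_size` is a hypothetical helper, not part of Everything, and it walks the folders on every run instead of using Everything's index:

```python
from collections import defaultdict
from pathlib import Path


def group_by_size(root):
    """Group *.epub files under root by exact byte size.

    Hypothetical helper, a rough stand-in for an exact-size dupe search.
    pathlib matches "*.epub" case-insensitively on Windows and
    case-sensitively elsewhere.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*.epub"):
        if path.is_file():
            groups[path.stat().st_size].append(path)
    # Only sizes shared by two or more files point at possible duplicates.
    return {size: paths for size, paths in groups.items() if len(paths) > 1}
```

For example, `group_by_size(r"D:\Books")` (a made-up path) returns a dict mapping each shared size to the files that have it.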
To find epubs that have the same content, search for:
*.epub dupe:size;sha256
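A Python sketch of the size-plus-SHA-256 idea: only files that share a size get hashed, because a file with a unique size cannot have a byte-identical copy. That is an assumption about why the two are combined, not a description of how Everything implements it; `find_identical_epubs` and `sha256_of` are hypothetical names:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_epubs(root):
    """Return lists of byte-identical *.epub files under root.

    Files are bucketed by size first (cheap); only files sharing a size
    are hashed (expensive), then grouped by digest.
    """
    by_size = defaultdict(list)
    for path in Path(root).rglob("*.epub"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an identical twin
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        duplicates.extend(group for group in by_hash.values() if len(group) > 1)
    return duplicates
```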
For epubs with slightly different sizes, please try:
*.epub add-column:a a:=INT($size:/1024) dupe:a
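The size-bucket search can be approximated the same way. In this sketch (hypothetical `group_by_size_bucket`), `bucket_bytes` stands in for the 1024 in the query, and sizes are grouped by integer division:

```python
from collections import defaultdict
from pathlib import Path


def group_by_size_bucket(root, bucket_bytes=1024):
    """Group *.epub files whose sizes fall in the same bucket_bytes-wide range.

    Hypothetical helper: size // bucket_bytes plays the role of INT(size/1024).
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*.epub"):
        if path.is_file():
            groups[path.stat().st_size // bucket_bytes].append(path)
    # Keep only buckets holding two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]
```

One limit of fixed buckets: two files only a couple of bytes apart can straddle a boundary (1023 and 1025 bytes land in buckets 0 and 1), so a near-duplicate can still be missed; a larger bucket reduces that at the cost of more false matches.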
For epubs with slightly different names, please try:
*.epub regex:name:^(.{12}) dupe:1
Adjust 12 as needed.
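And a sketch of the name-prefix search, using the same `^(.{12})` pattern through Python's `re` module (`group_by_name_prefix` is a hypothetical helper). With the default of 12, all three names from the original question share the prefix `A Summers Mo`:

```python
import re
from collections import defaultdict
from pathlib import Path


def group_by_name_prefix(root, prefix_len=12):
    """Group *.epub files whose names share the same first prefix_len characters.

    Names shorter than prefix_len do not match ^(.{N}) and are skipped,
    just as the regex would skip them.
    """
    pattern = re.compile(r"^(.{%d})" % prefix_len)
    groups = defaultdict(list)
    for path in Path(root).rglob("*.epub"):
        match = pattern.match(path.name)
        if match and path.is_file():
            # casefold() makes the comparison case-insensitive (an assumption;
            # remove it to require an exact-case match).
            groups[match.group(1).casefold()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```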
Find duplicates in Everything 1.5
Herkules97
Re: Search Duplicate Files w/ different names
komobu wrote: Mon Jun 01, 2026 1:57 pm
I have about 3000 e-books (epub files) spread out over many directories on my computer, with different names. Take a look at the following:
A Summers Moon.epub
A Summers Moon(2).epub
A Summers Moon by William Smith.epub
All these files are the same book. What is the best way to search for duplicate files by content instead of file name? I thought of size, but most of my epub files are really close in size.
Thanks for any help
We don't know your exact setup.
If the files have different sizes, you can't use sizedupe.
If the files have different names, you can't use namedupe.
If the files have different content, you can't use hashes like SHA-256 (different content usually, though not always, comes with a different size too).
If the files aren't exactly the same, they likely won't have the same timestamps either, so you can't use date-created-dupe or date-modified-dupe. If you copy with Windows Explorer, only the modified time is carried over (the copy gets a new created time), so the created time would be irrelevant anyway.
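To see why a hash only catches byte-identical copies: two made-up byte strings that differ in a single letter have exactly the same length but unrelated SHA-256 digests. The strings below are illustrative stand-ins for metadata inside two copies of a book, not real epub contents:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Identical except for one letter, so both have exactly the same length.
copy_a = b"<dc:creator>William Smith</dc:creator>"
copy_b = b"<dc:creator>William Smyth</dc:creator>"

same_size = len(copy_a) == len(copy_b)
same_hash = sha256_hex(copy_a) == sha256_hex(copy_b)
# Count how many of the 64 hex characters differ between the two digests.
differing = sum(x != y for x, y in zip(sha256_hex(copy_a), sha256_hex(copy_b)))

print("same size:", same_size)  # True
print("same hash:", same_hash)  # False
print("hex characters that differ:", differing)
```

So a size match alone can't prove two files are the same, and a hash mismatch can't tell a near-duplicate from a different book.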
The best way is to not gather every copy in the first place; the second best is to de-duplicate manually.
Or you can do what I do and just keep everything. For some songs I have 10 or more copies with varying metadata, sound quality and other differences.