Validate Headers: For a good time, search for files with incorrect extensions.

Discussion related to "Everything" 1.5 Alpha.
raccoon
Posts: 684
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon » Thu Jan 20, 2022 7:02 am

Most file types have a specific byte header, or sometimes a byte footer, that identifies the file content type in the absence of a filename and extension. This is helpful for recovering corrupt file data, and also very useful for validating filenames. Right now, you probably have dozens of .png files that are not PNG files at all, but JPEG, JFIF, TIFF or other image types in disguise.

I want to start a thread listing different file extensions alongside their byte headers so we can validate different file types. Each search should be written in the negative so that it reveals the invalid files. I'll edit the list below.

ext:png !first-8-bytes:89504E470D0A1A0A
ext: ...
ext: ...

PNG ref: PNG Specification "\x89PNG\r\n\x1a\n" (byte 0x89, then "PNG", CR LF, Ctrl-Z, LF)
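For anyone who wants to cross-check the search outside of Everything, here is a minimal Python sketch of the same test as ext:png !first-8-bytes:89504E470D0A1A0A. The function names has_png_signature and find_fake_pngs are made up for illustration and are not part of Everything.

```python
from pathlib import Path

# The 8-byte PNG signature, same hex as in the search above.
PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def has_png_signature(data: bytes) -> bool:
    """Return True if the data begins with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE


def find_fake_pngs(root):
    """Hypothetical helper: list .png files under root whose header is not PNG."""
    fakes = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".png":
            with path.open("rb") as handle:
                # Only the first 8 bytes are needed for the comparison.
                if not has_png_signature(handle.read(8)):
                    fakes.append(path)
    return fakes
```

Calling find_fake_pngs on a folder returns the paths that would show up in the negative search, assuming the files are readable.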

void
David Carpenter (Developer)
Posts: 9398
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void » Thu Jan 20, 2022 7:15 am

Everything 1.5 has a File Signature property.

The File Signature is determined from the file content. (not the filename or extension)

The following common mime types are supported:

application/x-msdownload
application/zip
audio/flac
audio/midi
audio/mpeg
audio/ogg
audio/wav
audio/x-ape
image/bmp
image/gif
image/jpeg
image/png
image/tiff
image/vnd.adobe.photoshop
image/webp
image/x-icon
image/x-pcx
image/x-tga
video/avi
video/flv
video/mp4
video/x-matroska
video/x-ms-asf

Please recommend any mime types here or here.
I am happy to list the first few bytes too.

Note:
ispng: is an alias for file-signature:image/png
More file-signature aliases here.



first-x-bytes:
last-x-bytes:
file-signature:
is-png:
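To illustrate what "determined from the file content" can mean, here is a rough Python sketch that maps a few well-known magic numbers to some of the mime types listed above. This is not how Everything implements file-signature: internally; the table, the name sniff_mime and the choice of signatures are assumptions made for the example, and real detectors check more than a prefix.

```python
# Prefix signatures for a handful of the listed types (illustrative subset).
PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
    (b"II*\x00", "image/tiff"),   # little-endian TIFF
    (b"MM\x00*", "image/tiff"),   # big-endian TIFF
    (b"8BPS", "image/vnd.adobe.photoshop"),
    (b"PK\x03\x04", "application/zip"),
    (b"fLaC", "audio/flac"),
    (b"OggS", "audio/ogg"),
    (b"MThd", "audio/midi"),
    (b"\x1aE\xdf\xa3", "video/x-matroska"),
]

# RIFF containers share a header; the form type sits at bytes 8..11.
RIFF_FORMS = {b"WAVE": "audio/wav", b"AVI ": "video/avi", b"WEBP": "image/webp"}


def sniff_mime(data: bytes):
    """Return a mime type guessed from the leading bytes, or None if unknown."""
    if data[:4] == b"RIFF" and len(data) >= 12:
        return RIFF_FORMS.get(data[8:12])
    for magic, mime in PREFIX_SIGNATURES:
        if data.startswith(magic):
            return mime
    return None
```

Reading the first 12 bytes of a file is enough for every entry in this table.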

raccoon
Posts: 684
Joined: Thu Oct 18, 2018 1:24 am

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by raccoon » Thu Jan 20, 2022 8:15 am

Is dm:dc: aka dm:=dc: aka dm:==dc: as well as dm:!=dc: a unique trait of these properties or date/number properties generally?

content-type:!=file-signature: does not work. Is there a method to accomplish this?

/define fake content-type:!=file-signature:

void
David Carpenter (Developer)
Posts: 9398
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void » Thu Jan 20, 2022 8:21 am

dm:dc: aka dm:=dc: aka dm:==dc: is a unique trait for these properties only.

There's an eval: search function in development:
!eval:#<:#exact:#content-type:,#file-signature:#>:

However, eval: is currently broken.
I will post a fix soon.

void
David Carpenter (Developer)
Posts: 9398
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void » Fri Jan 21, 2022 6:24 am

Everything 1.5.0.1297a fixes the eval: search function.

The following search should work as expected:
!eval:#<:#exact:#content-type:,#file-signature:#>:

It's pretty slow, so combine with other filters for the best performance.


I'm not sure yet what form the eval: search function will take, and it still needs time to mature.
I am working on a content-type:==file-signature: syntax.

raccoon
Posts: 684
Joined: Thu Oct 18, 2018 1:24 am

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by raccoon » Fri Jan 21, 2022 6:46 am

Thanks! Playing with it now. Nice that specifying "content-type:" and "file-signature:" as terms also makes sure that both columns contain a non-null value.

content-type: file-signature: !eval:#<:#exact:#content-type:,#file-signature:#>:
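The logic of that search, as I understand it, is "both values present, and not exactly equal". A small Python sketch of the same comparison is below. The extension table, the signature table and the function names are hypothetical stand-ins for Everything's content-type: and file-signature: properties, not its real data.

```python
import os

# Hypothetical stand-in for content-type: (derived from the extension).
EXT_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".zip": "application/zip",
}

# Hypothetical stand-in for file-signature: (derived from the content).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def content_type(name: str):
    """Type implied by the extension, or None if the extension is unknown."""
    return EXT_TYPES.get(os.path.splitext(name)[1].lower())


def file_signature(data: bytes):
    """Type implied by the leading bytes, or None if nothing matches."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def is_mismatch(name: str, data: bytes) -> bool:
    """True only when both values are known and they differ."""
    by_ext = content_type(name)
    by_sig = file_signature(data)
    return by_ext is not None and by_sig is not None and by_ext != by_sig
```

Requiring both values to be non-null keeps unknown extensions and unrecognized content out of the results, which mirrors adding the bare content-type: and file-signature: terms.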

Already found some neat things, like a collection I flagged as invalid years ago because it didn't pass the MP3 validator: the files are actually MPEG Layer 3 audio in RIFF WAVE containers that were then stupidly given a .mp3 file extension. I can only guess why they were encoded that way 10 or 20 years ago.

And of course the standard fare of hidden .zip files tacked onto the end of video files.
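For the appended-zip case, a footer check is more telling than a header check. A ZIP archive ends with an "end of central directory" record that starts with PK\x05\x06, is at least 22 bytes long, and may be followed by a comment of up to 65535 bytes. The sketch below looks for that record near the end of the data. The name has_trailing_zip is made up for the example, and a hit is only a hint, since those four bytes can occur by chance in other data.

```python
EOCD_MAGIC = b"PK\x05\x06"   # ZIP end-of-central-directory signature
EOCD_MIN_SIZE = 22           # fixed part of the record
MAX_COMMENT = 65535          # longest possible trailing archive comment


def has_trailing_zip(data: bytes) -> bool:
    """Return True if a ZIP end-of-central-directory record sits near the end."""
    # The record can start no earlier than this many bytes before the end.
    tail = data[-(EOCD_MIN_SIZE + MAX_COMMENT):]
    pos = tail.rfind(EOCD_MAGIC)
    # Require enough room after the marker for the fixed-size record.
    return pos != -1 and len(tail) - pos >= EOCD_MIN_SIZE
```

In practice you would read only the last 65557 bytes of a large video file rather than the whole thing before calling this.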

I'm still looking at file type byte signatures and the file extensions recognized by Everything to see what might be missing from the collection.
