Validate Headers: For a good time, search for files with incorrect extensions.

Discussion related to "Everything" 1.5 Alpha.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon »

Most file types have a specific byte header, or sometimes a byte footer, that identifies the file content type in the absence of a filename and extension. That is helpful for recovering corrupt file data, but also very useful for validating filenames. Right now, you probably have dozens of .png files that are not PNG files, but actually JPEG, JFIF, TIFF or other image types in disguise.

I want to start a thread listing different file extensions accompanied by their byte headers so we can validate different filetypes. The content function should be presented in the negative so as to reveal invalid files. I'll edit the list below.

ext:png !first-8-bytes:89504E470D0A1A0A
ext: ...
ext: ...

PNG ref: PNG Specification "\x89PNG\r\n\x1a\n" (the first byte is 0x89, and 0x1A is Ctrl-Z)
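
For anyone who wants to cross-check results outside of Everything, here is a minimal Python sketch of the same negative test as the ext:png search above. The function names has_png_signature and find_fake_pngs are made up for illustration, and this is not how Everything does it internally.

```python
from pathlib import Path

# The same 8 bytes as in the search above: 89 50 4E 47 0D 0A 1A 0A
PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def has_png_signature(data: bytes) -> bool:
    """Return True if data begins with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE


def find_fake_pngs(root: str) -> list[Path]:
    """List .png files under root whose first 8 bytes are not the PNG signature."""
    fakes = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".png":
            with path.open("rb") as fh:
                if not has_png_signature(fh.read(8)):
                    fakes.append(path)
    return fakes
```

Calling find_fake_pngs on a folder returns the paths that should also show up in the Everything search, assuming both look at the same files.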
void
Developer
Posts: 15329
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void »

Everything 1.5 has a File Signature property.

The File Signature is determined from the file content. (not the filename or extension)

The following common mime types are supported:

application/x-msdownload
application/zip
audio/flac
audio/midi
audio/mpeg
audio/ogg
audio/wav
audio/x-ape
image/bmp
image/gif
image/jpeg
image/png
image/tiff
image/vnd.adobe.photoshop
image/webp
image/x-icon
image/x-pcx
image/x-tga
video/avi
video/flv
video/mp4
video/x-matroska
video/x-ms-asf

Please recommend any mime types here or here.
I am happy to list the first few bytes too.

Note:
ispng: is an alias for file-signature:image/png
More file-signature aliases here.



first-x-bytes:
last-x-bytes:
file-signature:
is-png:
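
To illustrate the idea of deriving a type from content rather than from the name, here is a rough Python sketch that maps a few well-known leading bytes to some of the mime types listed above. The table and the sniff_mime name are assumptions for illustration only. It covers just a handful of types and is not meant to describe how Everything computes File Signature.

```python
from typing import Optional

# Well-known leading byte sequences for a few of the listed mime types.
PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),   # little-endian TIFF
    (b"MM\x00*", "image/tiff"),   # big-endian TIFF
    (b"PK\x03\x04", "application/zip"),
    (b"fLaC", "audio/flac"),
    (b"OggS", "audio/ogg"),
    (b"MThd", "audio/midi"),
]

# RIFF containers share the "RIFF" prefix; the form type sits at bytes 8..11.
RIFF_FORMS = {b"WAVE": "audio/wav", b"AVI ": "video/avi", b"WEBP": "image/webp"}


def sniff_mime(header: bytes) -> Optional[str]:
    """Guess a mime type from the first bytes of a file, or None if unknown."""
    if header[:4] == b"RIFF" and len(header) >= 12:
        return RIFF_FORMS.get(header[8:12])
    for magic, mime in PREFIX_SIGNATURES:
        if header.startswith(magic):
            return mime
    return None
```

Reading the first 16 bytes of a file and passing them to sniff_mime is enough for every entry in this small table.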
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by raccoon »

Is dm:dc: aka dm:=dc: aka dm:==dc: as well as dm:!=dc: a unique trait of these properties or date/number properties generally?

content-type:!=file-signature: does not work. Is there a method to accomplish this?

/define fake content-type:!=file-signature:
void
Developer
Posts: 15329
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void »

dm:dc: aka dm:=dc: aka dm:==dc: is a unique trait for these properties only.

There's an eval: search function in development:
!eval:#<:#exact:#content-type:,#file-signature:#>:

However, eval: is currently broken.
I will post a fix soon.
void
Developer
Posts: 15329
Joined: Fri Oct 16, 2009 11:31 pm

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by void »

Everything 1.5.0.1297a fixes the eval: search function.

The following search should work as expected:
!eval:#<:#exact:#content-type:,#file-signature:#>:

It's pretty slow, so combine with other filters for the best performance.


I'm not sure what the eval: search function will be yet, and it still needs time to mature.
I am working on a content-type:==file-signature: syntax.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Validate Headers: For a good time, search for files with incorrect extensions.

Post by raccoon »

Thanks! Playing with it now. Nice that specifying "content-type:" and "file-signature:" as terms also makes sure that both columns contain a non-null value.

content-type: file-signature: !eval:#<:#exact:#content-type:,#file-signature:#>:
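
The logic of that search, spelled out as a small Python sketch: flag a file only when both types are known and they differ. This assumes content-type reflects the type implied by the extension. The EXTENSION_TYPES table and the is_mismatch name are hypothetical and just for illustration.

```python
import os
from typing import Optional

# Hypothetical, deliberately tiny extension-to-type table for illustration.
EXTENSION_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".zip": "application/zip",
}


def is_mismatch(filename: str, signature_type: Optional[str]) -> bool:
    """True only when both the extension type and signature type are known and differ."""
    ext = os.path.splitext(filename)[1].lower()
    extension_type = EXTENSION_TYPES.get(ext)
    if extension_type is None or signature_type is None:
        # Mirrors requiring both columns to be non-null in the search.
        return False
    return extension_type != signature_type
```

Skipping the unknown cases keeps the result list free of files that simply have no recognized signature.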

Already found some neat things. For example, a collection I flagged as invalid years ago because it didn't pass the MP3 validator turns out to be MPEG Layer 3 audio in RIFF WAVE containers, stupidly given a .mp3 file extension. I can only guess why they were encoded that way 10 or 20 years ago.
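
For the curious, a RIFF WAVE file says what codec it carries in the format tag of its "fmt " chunk, and 0x0055 is the tag for MPEG Layer 3. Below is a minimal Python sketch that reads that tag from the start of a file. The wave_format_tag name is made up for illustration, and the sketch only looks at the bytes it is handed.

```python
import struct
from typing import Optional

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_MPEGLAYER3 = 0x0055


def wave_format_tag(data: bytes) -> Optional[int]:
    """Return the format tag from the 'fmt ' chunk of a RIFF WAVE header, or None."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    pos = 12
    # Walk the chunk list: 4-byte id, 4-byte little-endian size, then the payload.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if pos + 10 > len(data):
                return None
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        # Chunk payloads are padded to an even length.
        pos += 8 + size + (size & 1)
    return None
```

A result equal to WAVE_FORMAT_MPEGLAYER3 on a .mp3 file would point to the MP3-in-WAVE case described above.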

And of course the standard fare of hidden .zip files tacked onto the end of video files.
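
A zip archive ends with an End of Central Directory record that starts with the bytes 50 4B 05 06, so an appended zip can be spotted by looking at the tail of a file rather than the head. Here is a minimal Python sketch of that check. The has_zip_trailer name is made up for illustration, and the check only confirms that a plausible record ends exactly at end of file, not that the archive is intact.

```python
EOCD_MAGIC = b"PK\x05\x06"   # End of Central Directory signature
EOCD_MIN_SIZE = 22           # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF         # the comment length field is 16 bits


def has_zip_trailer(data: bytes) -> bool:
    """Return True if data ends with a zip End of Central Directory record."""
    tail = data[-(EOCD_MIN_SIZE + MAX_COMMENT):]
    pos = tail.rfind(EOCD_MAGIC)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(tail):
            comment_len = int.from_bytes(tail[pos + 20:pos + 22], "little")
            # The record plus its comment must end exactly at end of file.
            if pos + EOCD_MIN_SIZE + comment_len == len(tail):
                return True
        # Keep looking for an earlier occurrence of the signature.
        pos = tail.rfind(EOCD_MAGIC, 0, pos)
    return False
```

For large video files it is enough to pass only the last 65,557 bytes, since the record cannot start earlier than that.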

I'm still looking at file type byte signatures and the file extensions recognized by Everything to see what might be missing from the collection.