How to exclude binaries from content search?

LifeH2O
Posts: 3
Joined: Wed Dec 16, 2009 12:48 pm

How to exclude binaries from content search?

Post by LifeH2O » Thu Jun 04, 2020 12:25 pm

While searching the content of a file, Everything can see whether the content is text or binary. I have lots of files with all kinds of extensions, and it's really hard to know the extensions of all the binary and text files.

I don't want to search binary content for the text I am looking for. What is the current way to explicitly include/exclude all binary/text files?

void
Site Admin
Posts: 5458
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to exclude binaries from content search?

Post by void » Fri Jun 05, 2020 10:18 am

Everything doesn't really know if a file is text or binary.
Everything will try the iFilter associated with the extension first, and if that fails, Everything will fall back to treating the content as UTF-8.
I think you want to avoid this fallback to UTF-8 content? An ifiltercontent: search is in development which will do just this.

For now, you will need to specify the extensions of interest.
Please try using the ext: search function to limit your results to a list of extensions.

for example:
ext:c;h;cpp content:"foo bar"

LifeH2O
Posts: 3
Joined: Wed Dec 16, 2009 12:48 pm

Re: How to exclude binaries from content search?

Post by LifeH2O » Fri Jun 05, 2020 9:19 pm

I was debugging a system with around 100GB of data and text scripts in various formats, and was looking for a specific piece of code in the code/script files.
I ended up opening small files of various extensions in Notepad to see if they were readable text, and noting down the extensions to exclude one by one.

It did the job, but I was wondering whether, while reading a file, Everything could check from the first few bytes if the content is readable text and, if not, skip the rest of that file. I know it's complicated for Unicode. But in cases where I know I am not looking for Unicode, it would make searching large data sets much faster.
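
To make the idea concrete, here is a minimal Python sketch of that kind of check. It is not how Everything works; it only illustrates one common heuristic: read the first few kilobytes and treat the file as binary if a NUL byte appears. The names `looks_binary`, `search_text_files` and the 8192-byte sniff size are my own assumptions. Note that UTF-16 text contains NUL bytes and would be skipped, which matches the "not looking for Unicode" case.

```python
from pathlib import Path

SNIFF_BYTES = 8192  # assumption: how much of each file to inspect


def looks_binary(path, sniff_bytes=SNIFF_BYTES):
    """Return True if the first sniff_bytes of the file contain a NUL byte."""
    with open(path, "rb") as handle:
        head = handle.read(sniff_bytes)
    return b"\x00" in head


def search_text_files(root, needle):
    """Yield files under root that contain needle, skipping files that look binary."""
    needle_bytes = needle.encode("utf-8")
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            if looks_binary(path):
                continue  # skip without reading the rest of the file
            if needle_bytes in path.read_bytes():
                yield path
        except OSError:
            continue  # unreadable file: ignore it
```

For example, `list(search_text_files(r"c:\data", "foo bar"))` would return only the text-like files containing that string. The whole file is read into memory for the match, which is fine for scripts but not for very large files.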

aviasd
Posts: 39
Joined: Sat Oct 07, 2017 2:18 am

Re: How to exclude binaries from content search?

Post by aviasd » Sat Jun 06, 2020 11:04 am

+1
I do believe this could be a useful feature.
Currently I fall back to CLI tools like ripgrep for content searching, which knows how to skip binary files and respects .gitignore, git folders, etc.
Supporting these features in Everything would be great!

NotNull
Posts: 2109
Joined: Wed May 24, 2017 9:22 pm

Re: How to exclude binaries from content search?

Post by NotNull » Sat Jun 06, 2020 12:51 pm

aviasd wrote:
Sat Jun 06, 2020 11:04 am
skip binary files
A couple of months ago (early February), I took a look at AnyTXT Searcher.
From what I remember, it does that.

It looks like it uses Everything for its filename searches, as it includes Everything32.dll (from the SDK) to communicate with a running Everything. BUT: when I ran Everything in debug mode, I could not see any (IPC) activity from AnyTXT.
AnyTXT was not very mature at that moment, but it looks promising.

aviasd
Posts: 39
Joined: Sat Oct 07, 2017 2:18 am

Re: How to exclude binaries from content search?

Post by aviasd » Mon Jun 08, 2020 12:47 pm

NotNull wrote:
Sat Jun 06, 2020 12:51 pm

It looks like it uses Everything for its filename searches, as it includes Everything32.dll (from the SDK) to communicate with a running Everything. BUT: when I ran Everything in debug mode, I could not see any (IPC) activity from AnyTXT.
AnyTXT was not very mature at that moment, but it looks promising.
Thanks for that, but the project does seem very immature. It cannot even search inside a specific folder.

A common scenario I was referring to:

Consider a coding project at c:\projects\myproject, managed by git, that contains files with many different extensions, files with no extension, and binaries.
Searching for lines inside one of the files in that project would require:

c:\projects\myproject !folder:.git content:foo  
and preselecting a filter for all the various file extensions in that folder.

Without selecting the file extensions, Everything will search inside all the binaries as well, which is really time-consuming even for a small project on an SSD.

Having Everything support "magic" file headers as well as respect version control systems could greatly increase its reach among coders (IMO).
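
As a rough illustration of what I mean, here is a small Python sketch, not anything Everything does today. It checks a file's first bytes against a few well-known signatures and prunes `.git` folders while walking the project. The names `MAGIC_HEADERS`, `identify_by_magic` and `iter_candidate_files` are made up for the example, the signature table is far from complete, and it does not parse `.gitignore`.

```python
import os

# A handful of well-known file signatures; a real table would be much longer.
MAGIC_HEADERS = {
    b"MZ": "Windows executable or DLL",
    b"\x7fELF": "ELF executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP-based archive",
    b"%PDF-": "PDF document",
}


def identify_by_magic(path):
    """Return a description if the file starts with a known signature, else None."""
    longest = max(len(magic) for magic in MAGIC_HEADERS)
    with open(path, "rb") as handle:
        head = handle.read(longest)
    for magic, description in MAGIC_HEADERS.items():
        if head.startswith(magic):
            return description
    return None


def iter_candidate_files(root, skip_dirs=(".git",)):
    """Yield files under root worth content-searching, pruning skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if identify_by_magic(path) is None:
                    yield path
            except OSError:
                continue  # unreadable file: ignore it
```

One known weakness: a plain text file that happens to start with "MZ" or "PK" would be skipped too, so a signature check is a speed-up, not a guarantee.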

NotNull
Posts: 2109
Joined: Wed May 24, 2017 9:22 pm

Re: How to exclude binaries from content search?

Post by NotNull » Mon Jun 08, 2020 3:09 pm

For now, you could exclude certain file extensions by adding something like
!ext:exe;dll;tlb
to your search query.

(and make a new filter out of that ;))
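
If collecting those extensions by hand is the painful part, a small script could build the list for you. The following Python sketch is only an illustration under my own assumptions: it samples the start of every file under a folder, treats an extension as binary when every file with that extension contains a NUL byte, and prints a `!ext:` term in the same form as above. The function names and the 8192-byte sample size are hypothetical, and files without an extension are ignored.

```python
import os
from collections import defaultdict


def binary_extensions(root, sniff_bytes=8192):
    """Return sorted extensions whose files all contain a NUL byte near the start."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if not ext:
                continue  # no extension: cannot be excluded by an extension list
            try:
                with open(os.path.join(dirpath, name), "rb") as handle:
                    seen[ext].append(b"\x00" in handle.read(sniff_bytes))
            except OSError:
                continue  # unreadable file: ignore it
    return sorted(ext for ext, flags in seen.items() if all(flags))


def exclusion_term(extensions):
    """Format extensions as an Everything-style exclusion term, e.g. !ext:dll;exe."""
    return "!ext:" + ";".join(extensions) if extensions else ""


if __name__ == "__main__":
    print(exclusion_term(binary_extensions(".")))
```

The printed term can then be pasted into the search box or saved as a filter. It is a heuristic, so it is worth glancing over the list before relying on it.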
