Search for Content in MS Word files and other application files.

Have a suggestion for "Everything"? Please post it here.
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm


Post by ChrisGreaves »

Please consider searching by CONTENT in ANY user-nominated files.


t: normal.dot content:Function
This ought not to be as long a shot as it seems :D
I am now blessed with the knowledge of fine-tuning a search and THEN applying a Content term to obtain a very short list of files that suit me.
Except that the Content term appears not to examine VBA code modules within a DOCument, Template, Workbook or whatever else is within the stable of MS application software.

I have attached a screenshot of one of my 964 DOT templates on which key words are visible to folks who write in VBA or even VB for that matter. (Or C++ or Python or FORTRAN or ...) I have macros in many of my 1,209 Workbooks and several of my 17,504 DOCument files.

In an earlier post I noted that "I have developed an application to open templates. And another application to harvest templates into a humongous (52MB) searchable library.". This application was written in 1999, and some programmers will recognize the urgency of searching for "that code that I wrote six months ago; where is it?"

My screenshot proves that a text-search can be made on an MSWord template file (ext:DOT). That search may be unsuccessful, but experience tells us that an unsuccessful search often triggers the clue to a better search term. This morning I tried and failed to find my "CreateReply" macro, but that hurdle made me consider searching for "Initials" (of the person who had sent me an email) and Bingo!

My 25-year-old project management application still works well, but it works best immediately after I have harvested VBA code from DOC, DOT, XLS, BAS, CLS and FRM files.

I would like the ability to harvest programming code from my unprofitable foray into LibreOffice files (ext:ODT) and Office21 files (ext:DOTM).

We know that a content search may cost time, but we know as well how to justify that time against (say) rewriting a Macro from scratch.

My conclusion is that since I, a human, can open any file on my system with a text editor (my screenshot was taken of a Notepad opening of a DOT file) and I, a human, can judge the contents of that Notepad representation, only Everything stands in the way (warm and friendly grin) of me, a human, examining a text representation of any file at all on my system.

This thread is an invitation to Everything to produce some results from a Content term in a search through any file in the established Result List. If I can see a chunk of text in Notepad, why can't Everything take a look? As soon as my eye/brain spots "Function blnEqualCI(strOne As String," a new thought is triggered in my eye/brain, and that is my responsibility.

Thanks, Chris
Attachments
Notepad.DOT(M).jpg (788.27 KiB)
horst.epp
Posts: 1640
Joined: Fri Apr 04, 2014 3:24 pm

Re: Search for Content in MS Word files and other application files.

Post by horst.epp »

It makes no sense to search binary content.
For LibreOffice (.odt) or MS Office files (.doc, .docx) there are search filters (iFilter)
which allow Everything to index their content.
I use this on my system and can find any words from Everything's content index of such files.
______________________________________________________
Windows 11 Home Version 25H2 (OS Build 26200.8117)
Everything 1.5.0.1408a (x64), Everything Toolbar 2.3.0
void
Developer
Posts: 19793
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for Content in MS Word files and other application files.

Post by void »

Everything will try the system iFilter for the associated file extension to search content.

If there's no iFilter associated, Everything will fall back to a binary content search.
A binary content search will try to search the file as UTF-8, ANSI and Unicode.



A binary search for dot or dotx files is not going to help much as the content is not binary, it's structured and compressed.
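To illustrate why a compressed package defeats a raw byte search, here is a minimal Python sketch (standard library only; `search_zip_members` is a hypothetical helper, not part of Everything). A ZIP-based file such as .dotx has to be decompressed member by member before its text can be matched; a plain scan of the container bytes generally will not see the deflated text. Note that in a macro-enabled .dotm the VBA project sits in a further binary part (word/vbaProject.bin), so this alone would not expose macro source.

```python
import io
import zipfile


def search_zip_members(data: bytes, needle: str) -> list[str]:
    """Return the names of ZIP members whose decompressed bytes contain needle.

    The needle is encoded as UTF-8 (an assumption; the XML parts of
    .docx/.dotx packages are usually UTF-8).
    """
    target = needle.encode("utf-8")
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            # zf.read() decompresses the member before we search it
            if target in zf.read(name):
                hits.append(name)
    return hits


if __name__ == "__main__":
    # Build a tiny deflate-compressed package in memory for the demo.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<w:t>Function blnEqualCI</w:t>")
        zf.writestr("docProps/app.xml", "<Application>Microsoft Office Word</Application>")
    print(search_zip_members(buf.getvalue(), "blnEqualCI"))  # ['word/document.xml']
```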



There is an Office iFilter for searching DOT file content.
Installing Office will make it possible to search dot file content from Everything.



Most likely the DOT iFilter doesn't expose the VBA code in the file.
I don't have a good solution here. You'll only be able to search for text in the main document template.
Maybe there's an existing iFilter to search VBA code, but I highly doubt it.
If you can find a way to do this in Windows Explorer, it should be possible to do it from Everything.
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: Search for Content in MS Word files and other application files.

Post by ChrisGreaves »

void wrote: Wed Apr 15, 2026 7:37 am
Everything will try the system iFilter for the associated file extension to search content. ... Maybe there's an existing iFilter to search VBA code, but I highly doubt it.
If you can find a way to do this in Windows Explorer, it should be possible to do it from Everything.
Thank you for this response, David.
I am suggesting that there be a user option to ignore iFilters, jFilters, kFilters etc and all "special treatment" for DOC and DOT files. And XLS files. And LIB files.
Untitled.jpg (199.13 KiB)
The instant we focus on extensions or "type of file" we are lost. You and I can be confident in declaring that your LIB files are structured differently from my LIB files, which prompts me to ask how YOUR 059 files are structured. What?!! You don't have any ext:059 files? Well I do!!! :D

We ought to be able to agree that a user (or one of a user's clients!) can fabricate files with any extension, and so ultimately an extension does not define a file's content (although it can in the case of DOC, DOT and XLS files, excepting DOC files that I might create for my implementation of a medical system ...).
I trot out these examples solely to remind us that with humble Notepad.exe a user or developer can look inside a file (all files are composed of bits and so can be called "binary") and spot a string of humanly meaningful characters.

This is true, as I showed in the first post, when I used Notepad to look inside a DOT file and I could read and recognize strings that are meaningful to me since I am a VBA coder.
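The Notepad observation can be reproduced in a few lines of Python: scan the raw bytes and keep runs of printable ASCII characters, similar in spirit to the Unix `strings` tool. This is a sketch only; `printable_runs` is a hypothetical helper, it looks only for single-byte ASCII text (not UTF-16), and it says nothing about how Everything itself works.

```python
import re

# Minimum run length is an assumption (4 matches the default of the Unix
# `strings` tool); shorter runs are mostly noise in binary files.
MIN_LEN = 4
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def printable_runs(data: bytes) -> list[str]:
    """Return every run of MIN_LEN or more printable ASCII bytes in data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01Function blnEqualCI(strOne As String,\x00\xffab\x00xyz1"
    for run in printable_runs(sample):
        print(run)
```

Here the two-byte run "ab" is dropped as too short, while the VBA signature fragment survives intact.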

If I understand it correctly, Everything uses the extension DOT to decide to apply an iFilter. The designer of the iFilter has decided that a DOT file is a Word template, and so the iFilter recognizes the body text of the template (descriptive human text) but shrugs off the text that is held within the VBA project. The text that is held within the VBA project is easily recognizable by the human known as "chrisgreaves", but that user is denied the view/use of that text by the iFilter!

Please would you consider a user option in Everything that disables the use of filters and allows a Content: search to take place by scanning, byte-by-byte, any file, treating that file as "a string of 8-bit bytes"?
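A filter-free scan that treats any file as a string of 8-bit bytes can be sketched in Python as follows. Assumptions: `stream_contains` and `file_contains` are hypothetical names; the search term is passed in already encoded as bytes, so the caller chooses the encoding; the 1 MiB chunk size is arbitrary.

```python
import io
from typing import BinaryIO


def stream_contains(f: BinaryIO, needle: bytes, chunk_size: int = 1 << 20) -> bool:
    """Return True if needle occurs anywhere in the binary stream f.

    The stream is read in chunks so large files need not fit in memory;
    the last len(needle) - 1 bytes of each window are carried over so a
    match that straddles a chunk boundary is still found.
    """
    if not needle:
        return True
    keep = len(needle) - 1
    tail = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if needle in window:
            return True
        tail = window[-keep:] if keep else b""


def file_contains(path: str, needle: bytes) -> bool:
    # Open in binary mode: no decoding, no filters, just 8-bit bytes.
    with open(path, "rb") as f:
        return stream_contains(f, needle)


if __name__ == "__main__":
    demo = io.BytesIO(b"abcFunction blnEqualCI")
    # A tiny chunk size forces "Function" to straddle chunk boundaries.
    print(stream_contains(demo, b"Function", chunk_size=4))  # True
```

The overlap is what makes a small chunk size safe: without it, a match split across two reads would be missed.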

My suggestion that a VBA coder (me!) has an interest in spotting strings like "Function" is countered by other users who are not VBA coders.

I think that thirty years have passed since I wrote a crude file-searcher that not only was 40 times faster than Explorer (I wrote it in Word 6.9 Basic and used the Instr() function) but also discovered "deleted" text in a pharmaceutical company's WordPerfect (for DOS) files. Much to their embarrassment. (Think: "Federal Approval" documentation.)

With today's laptop (SSD) I can turn Everything loose on an overnight search and find what I am looking for - If only I could disable all content filters!

To anyone who claims this is not possible, I suggest you use Notepad.exe to examine one of your application's files.
For example, I made a small ext:LIB file as a library of a small MSWord ext:dot template. The second screenshot shows the content of my ext:LIB file.

Cheers, Chris

Attachments
Untitled2.jpg (66.65 KiB)
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: Search for Content in MS Word files and other application files.

Post by ChrisGreaves »

void wrote: Wed Apr 15, 2026 7:37 am
A binary search for dot or dotx files is not going to help much as the content is not binary, it's structured and compressed.
P.S. DOTX I get - zipped text - but DOT? No.
Notepad.exe quite happily has displayed the contents of a DOT file with enough humanly-recognisable text to get me going.
Cheers, Chris
void
Developer
Posts: 19793
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for Content in MS Word files and other application files.

Post by void »

ChrisGreaves wrote:
I am suggesting that there be a user option to ignore iFilters, jFilters, kFilters etc and all "special treatment" for DOC and DOT files. And XLS files. And LIB files.
Currently, the user will need to exclude these files from their search when using content:

For example:
!ext:doc;dot;xls;lib content:"foo bar"




Alternatively, use ifilter-content: to search with just the iFilter.

For example:
ifiltercontent:"foo bar"


Since there's no iFilter for lib files, this will just silently fail.
If Office is not installed, doc, dot and xls files will also be ignored.


ChrisGreaves wrote:
Please would you consider a user option in Everything that disables the use of filters and allows a Content: search to take place by scanning, byte-by-byte, any file, treating that file as "a string of 8-bit bytes"?
Already exists :)

Please try binarycontent:

For example:
binarycontent:"foo bar"


This will search your dot, lib, 059 files as UTF-8, ANSI, UTF-16 and UTF-16BE (basically all types of text/plain)
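To show what searching "as UTF-8, ANSI, UTF-16 and UTF-16BE" means at the byte level, here is a minimal Python sketch (not Everything's implementation): the search text is encoded once per encoding and each resulting byte pattern is looked for in the raw file data. The helper names are hypothetical, and the mapping of "ANSI" to code page 1252 is an assumption.

```python
# Candidate encodings, mirroring the list in the post. Two assumptions:
# "ANSI" is taken to be Windows code page 1252 (the real ANSI code page
# depends on the system locale), and "UTF-16" means little-endian.
ENCODINGS = {
    "UTF-8": "utf-8",
    "ANSI": "cp1252",
    "UTF-16": "utf-16-le",
    "UTF-16BE": "utf-16-be",
}


def encoded_needles(text: str) -> dict[str, bytes]:
    """Encode the search text once per candidate encoding."""
    needles = {}
    for label, codec in ENCODINGS.items():
        try:
            needles[label] = text.encode(codec)
        except UnicodeEncodeError:
            pass  # text cannot be represented in this code page; skip it
    return needles


def matching_encodings(data: bytes, text: str) -> list[str]:
    """Return the labels of the encodings in which text occurs in data."""
    return [label for label, needle in encoded_needles(text).items() if needle in data]


if __name__ == "__main__":
    print(matching_encodings("Function".encode("utf-16-le"), "Function"))  # ['UTF-16']
    print(matching_encodings(b"xx Function yy", "Function"))  # ['UTF-8', 'ANSI']
```

For plain ASCII search terms the UTF-8 and ANSI byte patterns are identical, so in practice the UTF-16 variants are what add new matches.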


ChrisGreaves wrote:
P.S. DOTX I get - zipped text - but DOT? No.
Correct, dotx is compressed.
dot is not compressed, but Everything doesn't know the structure of dot files, so it will come down to luck if the VBA code is stored as text/plain.
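Everything does not parse the DOT structure, but a tool can at least tell the two container families apart from a file's first bytes, whatever its extension (echoing the point that an extension does not define content). A minimal Python sketch; the function names are hypothetical. As far as I know, the VBA module source inside such a compound file is itself stored using the MS-OVBA compression scheme, in which runs of literal characters survive but are interrupted by flag and copy tokens; that would explain both why Notepad shows readable fragments and why a plain-text match is partly a matter of luck.

```python
# Leading "magic" bytes of the two container formats discussed in this thread.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary: Word 97-2003 .doc/.dot, .xls
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header: .docx/.dotx/.dotm, .odt


def container_kind(header: bytes) -> str:
    """Classify a file by its first bytes rather than by its extension."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def file_container_kind(path: str) -> str:
    # Eight bytes are enough to cover the longer of the two signatures.
    with open(path, "rb") as f:
        return container_kind(f.read(8))
```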
ChrisGreaves
Posts: 821
Joined: Wed Jan 05, 2022 9:29 pm

Re: Search for Content in MS Word files and other application files.

Post by ChrisGreaves »

void wrote: Fri Apr 17, 2026 1:55 am
Already exists :)
Please try binarycontent:
Void, I regret that phpBB does not permit me to show “Brilliant” in 500-point font.

I have been accumulating VBA code (mainly in Word97-format *.dot) since 1997, and since 1999 have been accumulating all my Visual Basic for Applications code (DOC, DOT, XLS, XLA, PPT, CLS, BAS, FRM).
Then Everything comes along and – aargh! – seems to block me from finding code that is visible to my eyes with simple Notepad.exe.

As usual you have pre-empted and built exactly what is needed, not only on my system: it can also be used as a standalone feature on any of my clients’ Windows systems.

TaDa!! :Trumpets:
Now I can search any Windows system for any keyword (part of a keyword suffices to track down the setting) and focus on a problem and thus develop a likely solution.
Makes me wonder just how many “binary” searchable files there are in this universe!
And Everything can do it!
(wanders off to have a lie-down in great astonishment ...)
I might have mentioned this elsewhere, but it bears repeating: On a digital computer, all content is binary, even 2400’ magnetic tapes and 80-column punched cards on the mainframes of the 1950s!
Enough! Thank You, as always.

(signed) “Grateful (but humbled) Visual Basic for Applications programmer”