Unable to search PDF contents

Discussion related to "Everything" 1.5 Alpha.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Unable to search PDF contents

Post by excelsius »

Version: 1.5.0.1355a (x64)
OS: Windows 11 Education 22H2 22621.2134

I just installed the alpha version of Everything and set the content indexing to include *.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.sas;*.r;*.py;*.ipynb
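For readers new to the include filter: the value above is a semicolon-separated list of wildcards. The sketch below is a hypothetical illustration, not Everything's code. It assumes simple, case-insensitive wildcard matching against the file name, and shows which files such a list would select for content indexing. The helper names are made up.

```python
from fnmatch import fnmatchcase

# Assumption: semicolon-separated wildcards, matched case-insensitively
# against the file name. This is not Everything's actual matching code.
INCLUDE = "*.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.sas;*.r;*.py;*.ipynb"


def parse_include_list(include: str) -> list[str]:
    """Split the list on ';', dropping empty entries, lower-casing each one."""
    return [p.strip().lower() for p in include.split(";") if p.strip()]


def is_content_indexed(filename: str, patterns: list[str]) -> bool:
    """True if the lower-cased file name matches any of the wildcards."""
    name = filename.lower()
    return any(fnmatchcase(name, p) for p in patterns)


if __name__ == "__main__":
    pats = parse_include_list(INCLUDE)
    for f in ["Paper.PDF", "notes.md", "model.R", "analysis.ipynb"]:
        print(f, is_content_indexed(f, pats))
```

Lower-casing both sides and using `fnmatchcase` keeps the result the same on every OS, whereas plain `fnmatch` follows the host platform's case rules.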

The issue is that Everything is unable to search PDF contents properly. I think indexing is complete: the Everything database is about 3 GB and RAM usage is 4 GB. I have over 40 GB of free RAM left, and my drive is NVMe.

The searches I tried are:
  • "D:\" <*.pdf> content:perturb. This one brings no results at all.
  • "D:\" <*.pdf> notindexed:content:perturb. This one brings only two results, is very slow to return them, and misses 76 additional results that I have verified with AnyTXT.
I'm trying to figure out whether this is a bug or I'm doing something wrong. The content in question is located on NVMe, but I have also included NAS locations in Everything's general indexing. Everything 1.4 is also currently installed in parallel; hopefully that's not a problem.

I must add: Everything is an amazing tool. Thank you for developing it and making it so much better in v1.5.
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for your feedback excelsius,

Is Everything still indexing your content?
A progress bar is shown in the status bar while Everything is still indexing content.


(content indexing progress bar shown at the bottom right of the status bar)

Progress is also shown under Tools -> Options -> Content.



Everything uses the system iFilter to search PDF content.

Could you please check the PDF content Everything retrieves is sane:

Please try the following search:

"c:\my folder\my file.pdf" regex:content:^(.*)$ addcol:regmatch1

where c:\my folder\my file.pdf is a PDF file that contains perturb.

Does the file content shown in the regular expression match 1 column look sane? Does it match the text content in the PDF file?
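The diagnostic search captures the entire extracted text into regmatch1, so a blank column means the iFilter handed back no text at all. A minimal Python sketch of that reasoning, assuming the pattern is applied once to the whole extracted text with "." matching newlines (Everything's actual regex flags may differ); `regmatch1()` and `diagnose()` are hypothetical names:

```python
import re

# Assumption: the pattern runs once over the whole extracted text and "."
# also matches newlines (re.DOTALL). Everything's real flags may differ.
WHOLE_CONTENT = re.compile(r"^(.*)$", re.DOTALL)


def regmatch1(extracted_text: str) -> str:
    """Return capture group 1, i.e. what the regmatch1 column would show."""
    m = WHOLE_CONTENT.match(extracted_text)
    return m.group(1) if m else ""


def diagnose(extracted_text: str, keyword: str) -> str:
    """Interpret the captured content the way the post suggests."""
    captured = regmatch1(extracted_text)
    if not captured.strip():
        return "blank: the iFilter returned no text"
    if keyword.lower() not in captured.lower():
        return "text extracted, but keyword missing"
    return "ok"
```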



Do you have any search options checked under the Search menu? (please make sure match case, whole words and regex are all disabled)



Some PDF iFilters do not like running as administrator.

Please make sure Everything is installed correctly and running as a standard user:
  • In Everything, from the Tools menu, click Options.
  • Click the General tab on the left.
  • Check Store settings and data in %APPDATA%\Everything.
  • Uncheck Run as administrator.
  • Check Everything Service. (Please make sure this is tick-checked and not square-checked)
  • Click OK.
  • Exit Everything (right click the Everything tray icon and click Exit).
  • Restart Everything.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

Thank you for the very quick response. Yes, indexing is done. The progress bar appeared when I first installed v1.5, but it completed very quickly, probably within a few minutes.

I'm not sure exactly what results I'm looking for with the search you proposed, but here is a screenshot for one of the PDFs that does contain the keyword. The RegEx column is blank:
srch.png
I checked all the other Search and Administrator settings you mentioned; they were already set as you described, so nothing needed to be changed there.
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for the information.

Looks like the PDF iFilter is not working at all.
(The regmatch 1 column is blank)

Could you please send some debug output:
  • In Everything, from the Tools menu, under the Debug submenu, check Verbose.
  • In Everything, from the Tools menu, under the Debug submenu, check Start Debug Logging.
    ---
    Select your PDF file and press Ctrl + F5.
    ---
  • In Everything, from the Tools menu, under the Debug submenu, click Stop Debug Logging.
    The Everything Debug Log will open in Notepad.
  • Please save this file to the Desktop and send to support@voidtools.com
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

I don't know if I was supposed to respond here too, but I sent the logs to you this morning. Thank you.
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for the debug logs.

LoadIFilter D:\...\file.pdf 80070057


Loading the contents of the PDF fails.

Error 0x80070057: The function received an invalid parameter.

The parameters passed to the iFilter are correct.
This error is likely generated by the third party iFilter handler.
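The 80070057 in the log is an HRESULT. Splitting it into its fields shows a failure in FACILITY_WIN32 (7) carrying Win32 error 87, ERROR_INVALID_PARAMETER, which is why it reads as "invalid parameter" (E_INVALIDARG). A small Python sketch of the decoding; `decode_hresult()` is a hypothetical helper:

```python
FACILITY_WIN32 = 7
ERROR_INVALID_PARAMETER = 87  # Win32 error code 0x57


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF  # accept signed values too (e.g. -2147024809)
    return {
        "failed": bool(hr >> 31),        # bit 31: severity, 1 = failure
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26: facility
        "code": hr & 0xFFFF,             # bits 0-15: facility-specific code
    }


if __name__ == "__main__":
    d = decode_hresult(0x80070057)
    print(d)  # {'failed': True, 'facility': 7, 'code': 87}
    print(d["facility"] == FACILITY_WIN32 and d["code"] == ERROR_INVALID_PARAMETER)  # True
```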



Reinstalling your PDF viewer might help.

I have put on my TODO list to add the option to override the default iFilter.
horst.epp
Posts: 1521
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

No information was provided about which PDF product is installed.
Windows 11 alone doesn't contain an iFilter.
NotNull
Posts: 5817
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

horst.epp wrote: Sat Sep 02, 2023 6:24 am Windows 11 alone doesn't contain an iFilter.
It does here (Win11 Pro; no PDF software installed (yet)):

Reader Search Handler, using %systemroot%\system32\Windows.Data.Pdf.dll
Registered under CLSID {6C337B26-3E38-4F98-813B-FBA18BAB64F5}

( Maybe that one came with the Edge browser? )
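To see which iFilter Windows has registered for .pdf without a GUI tool, the lookup chain documented for Windows Search filter handlers can be followed in the registry: .pdf\PersistentHandler, then that handler's PersistentAddinsRegistered\{IID_IFilter} entry, then the filter CLSID's InprocServer32. A sketch assuming that layout; the reader is injected so the logic can be tried with a plain dict, and only `winreg_reader()` touches the real registry (Windows only). Function names are hypothetical.

```python
import sys
from typing import Callable, Optional

# IID of the IFilter interface, used as the subkey name under
# PersistentAddinsRegistered in a persistent handler's CLSID key.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00608C9CEDB5}"

# read(path) -> default value of HKEY_CLASSES_ROOT\<path>, or None if missing.
Reader = Callable[[str], Optional[str]]


def find_ifilter(ext: str, read: Reader) -> Optional[tuple[str, str]]:
    """Follow ext -> PersistentHandler -> IFilter CLSID -> InprocServer32 DLL."""
    handler = read(f"{ext}\\PersistentHandler")
    if not handler:
        return None
    clsid = read(f"CLSID\\{handler}\\PersistentAddinsRegistered\\{IID_IFILTER}")
    if not clsid:
        return None
    dll = read(f"CLSID\\{clsid}\\InprocServer32") or ""
    return clsid, dll


def winreg_reader(path: str) -> Optional[str]:
    """Read a default value from HKEY_CLASSES_ROOT (Windows only)."""
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except OSError:  # key or value not present
        return None


if __name__ == "__main__" and sys.platform == "win32":
    print(find_ifilter(".pdf", winreg_reader))
```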


2023-09-02 22_27_29-Registry Editor.png
horst.epp
Posts: 1521
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

2NotNull
You are right, it's the reader search handler.
Never used it.
Screenshot - 03.09.2023 , 16_02_51.png
NotNull
Posts: 5817
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

OK, not just here then.

Your screenshot .. that program feels familiar, but can't put my finger on it. What is it?
horst.epp
Posts: 1521
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

NotNull wrote: Sun Sep 03, 2023 3:58 pm OK, not just here then.

Your screenshot .. that program feels familiar, but can't put my finger on it. What is it?
It's the Properties for an entry in NirSoft SearchFilterView.
Screenshot - 03.09.2023 , 19_43_12.png
NotNull
Posts: 5817
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

That's the one (forgot I even had it)

Thanks!
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Everything 1.5.0.1356a adds support for custom iFilter handlers.

To set Everything to use the built-in Windows PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
  • Click OK.
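The value is a JSON array of {"filter", "handler"} objects. If you'd rather generate it than type it, here is a Python sketch. The helper name is hypothetical; the array shape suggests several entries are allowed, but only the single *.pdf entry is confirmed in this thread, and the GUID check is just a typo guard.

```python
import json
import uuid


def ifilter_handlers_value(mapping: dict[str, str]) -> str:
    """Build a content_ifilter_handlers value from {wildcard: handler CLSID}."""
    entries = []
    for wildcard, handler in mapping.items():
        if handler:  # "" is accepted from 1.5.0.1361a (treated as the NULL CLSID)
            uuid.UUID(handler)  # raises ValueError on a malformed GUID
        entries.append({"filter": wildcard, "handler": handler})
    # Compact separators reproduce the exact text posted above.
    return json.dumps(entries, separators=(",", ":"))


if __name__ == "__main__":
    print(ifilter_handlers_value({"*.pdf": "{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}))
    # prints [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
```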



To refresh your indexed PDF content and properties:
  • In Everything 1.5, search for:
    *.pdf
  • Select all results (Ctrl + A)
  • Press Ctrl + F5.
  • Indexing progress is shown in the status bar on the right.
  • Click OK.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

This is excellent news. Thanks for such a quick update. I followed your instructions, and it took a couple of hours to index ~40K PDFs, but now I can search the contents. One strange thing: before PDF indexing, Everything used about 3.8 GB of RAM. After PDF indexing, RAM usage dropped below 2 GB and then jumped back up to about 6.7 GB, which would be the expected value. But that was yesterday, right after indexing. Today, RAM usage is down to just 660 MB. I'm wondering if maybe not all the indices are in RAM?

Also, a question, will the saved PDF indices in Everything automatically expand and contract as PDFs are added and removed in the file system?

To share the information I had shared with you via email with this forum, I'm including the AnyTXT screenshot below. If you ever have time to expand Everything so that it can populate the actual text results found without having to open the specific document, Everything would become an even more powerful tool.

Thanks again for all your hard work on this software.
anytxt.png
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for your feedback excelsius,
I'm wondering if maybe not all the indices are in RAM?
All indexes and content are stored in RAM.

Tools -> Debug -> Statistics gives more information about memory usage.
What is shown for the first Database section?


Also, a question, will the saved PDF indices in Everything automatically expand and contract as PDFs are added and removed in the file system?
Yes, removing the PDF file from the system will also remove the properties and content indexed by Everything.


To share the information I had shared with you via email with this forum, I'm including the AnyTXT screenshot below. If you ever have time to expand Everything so that it can populate the actual text results found without having to open the specific document, Everything would become an even more powerful tool.
My own text preview handler is on my TODO list.
Thank you for the suggestion.
void
Developer
Posts: 17514
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Just making a note here...

Everything 1.5.0.1361a will now treat empty handlers as the NULL CLSID.

This might be useful to disable handlers.



For example, to disable the PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.pdf","handler":""}]
  • Click OK.



An empty handler is the same as: {00000000-0000-0000-0000-000000000000}
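The equivalence in the line above, as a hypothetical Python sketch (not Everything's code): an empty handler is mapped to the NULL CLSID, and anything else is parsed as a GUID and upper-cased so differently written forms compare equal. The upper-case canonical form is an illustrative choice.

```python
import uuid

# {00000000-0000-0000-0000-000000000000}
NULL_CLSID = "{" + str(uuid.UUID(int=0)) + "}"


def normalize_handler(handler: str) -> str:
    """Return the CLSID a handler string stands for, in {UPPER-CASE} form."""
    if not handler:  # empty handler == NULL CLSID (per the post above)
        return NULL_CLSID
    return "{" + str(uuid.UUID(handler)).upper() + "}"


def is_disabled(handler: str) -> bool:
    """True when the handler resolves to the NULL CLSID, i.e. no iFilter."""
    return normalize_handler(handler) == NULL_CLSID
```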
connormacleod
Posts: 3
Joined: Wed Jan 22, 2025 1:43 am

Re: Unable to search PDF contents

Post by connormacleod »

Hi folks,

Been anxiously following this thread to try and get my Everything 1.5a to index PDF file content, but to no avail so far :(

I am running Windows 7 Pro.

Here's what I tried so far, can anybody see if I'm missing a step?

1. Install PDFFilter64installer.msi from Acrobat (version 9, the one that is for Windows 7)

2. Opened Windows Indexing Options (File Types); for the .pdf extension, changed it to "Index Properties and File Contents", after which the filter description column shows "PDF Filter" (check)
(Note: Windows Search now returns results when searching content in PDFs, phew.)

3. In Everything, advanced, select: content_ifilter_handlers
Set the value to: [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
(so that it would use the Windows iFilter, which I confirmed was working after step 2)

4. In Everything: Tools, Options, Indexes, Content
- check "index file content"
-include only "*.pdf;*.doc;*.docx;*.txt;*.xls;*.xlsx"

5. Forced a re-build of the indexes

6. Closed Everything and restarted it.

The list of PDFs comes up fine when filtering for PDF files (see 1st screenshot).
Capture1.JPG

But as soon as I type content:hardware (yes, the word hardware exists in a number of PDFs), all my results disappear :( (see 2nd screenshot). No matter what I type as a content argument, I get zero results.
Capture2.JPG
For some reason, it doesn't seem like Everything is using the Windows PDF iFilter :(

Can anyone advise what I might be missing?

Grateful for any help.
NotNull
Posts: 5817
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

Running Everything as administrator (see the title bar of Everything 1.5) is intended for special cases.
It will limit the functionality of Everything. For example: Adobe shell extensions, like iFilters, are known for causing issues when run as administrator.

The recommended way to run Everything is to install the service and to run as a regular user:
  • In Everything, from the Tools menu, click Options.
  • Click the General tab on the left.
  • Check Store settings and data in %APPDATA%\Everything.
  • Uncheck Run as administrator.
  • Check Everything Service. (Please make sure this is tick-checked and not square-checked)
  • Click OK.
  • Exit Everything (right click the Everything tray icon and click Exit).
  • Restart Everything.


If Explorer is indexing the content as intended, there is no need to configure a separate iFilter in Everything. All should be functioning without your "Step 3.".
connormacleod
Posts: 3
Joined: Wed Jan 22, 2025 1:43 am

Re: Unable to search PDF contents

Post by connormacleod »

Oh!! Ok that's interesting, I thought I'd disabled that after reading the earlier threads.

So I tried again, and when I start it up it persistently still shows Administrator in the title. Grrr... I checked the options: Run as administrator keeps showing a blue filled box. I click it and it shows a check mark; I click again and the box is empty. I hit Apply and it prompts to restart as a standard user. (I made sure Everything was killed in Task Manager, just to be sure.)

I restarted, and guess what? the "run as administrator" box is blue filled again (no check, no blank) and it says administrator in the title bar :(
see screenshot..
Capture3.JPG
What am I doing wrong that the unchecked "Run as administrator" box won't stay that way?

I hope there's a way to force it to keep this box unchecked that I'm missing?
connormacleod
Posts: 3
Joined: Wed Jan 22, 2025 1:43 am

Re: Unable to search PDF contents

Post by connormacleod »

I think I found the problem..... found this on another thread (viewtopic.php?t=13445)
NotNull wrote: Mon Jun 12, 2023 9:39 am I suspect your Windows account itself causes all programs -- including Explorer and Everything -- to be run with high privileges

To check:
  • Start Command Prompt ( CMD.exe )
  • Type or paste the following command and press ENTER:
    whoami.exe /groups
  • Find the Mandatory Label/... group name
  • Find the SID for that entry
  • The SID can be any of the following:
    • S-1-16-4096 ( = Low)
    • S-1-16-8192 ( = Medium)
    • S-1-16-12288 ( = High)
    • S-1-16-16384 ( = System)
  • Report back the SID for your user account
  • If not Medium or Low, you need to create a new user account with fewer privileges for daily operation [1]

To create a new user account:
  • Press 'Win + R' to activate the Run dialog
  • Type
    lusrmgr.msc
  • Press 'Ctrl + Shift + ENTER' to start this program elevated
    Local Users and Groups will open
  • Right-click Users
  • Select New User from the context menu
  • (fill in the settings the way you want it)
  • Press the Create button
  • Press the Close button

To test:
  • Log off
  • Log in with the newly created user account
  • Run whoami /groups for this user
  • Report back the current SID
  • Done.
SID for 3 different user accounts:
2023-06-12 11_50_29-.png


[1] "Need" is a bit too strong here, but it is highly recommended
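The SID check from the quoted post can be scripted. A Python sketch that runs whoami /groups and maps the mandatory label SID using the table above. It assumes the only S-1-16-* SID in the output is the integrity label (S-1-16 is the mandatory label authority), which also sidesteps localized group names; `integrity_level()` is a hypothetical helper.

```python
import re
import subprocess
import sys
from typing import Optional

# SID -> integrity level, as listed in the quoted post.
INTEGRITY_LEVELS = {
    "S-1-16-4096": "Low",
    "S-1-16-8192": "Medium",
    "S-1-16-12288": "High",
    "S-1-16-16384": "System",
}

# Assumption: the mandatory label is the only S-1-16-* SID in the output.
LABEL_SID = re.compile(r"\bS-1-16-\d+\b")


def integrity_level(whoami_output: str) -> Optional[str]:
    """Return Low/Medium/High/System (or the raw SID), or None if absent."""
    m = LABEL_SID.search(whoami_output)
    if m is None:
        return None
    return INTEGRITY_LEVELS.get(m.group(0), m.group(0))


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(
        ["whoami", "/groups"], capture_output=True, text=True, check=True
    ).stdout
    level = integrity_level(out)
    print(level)
    if level in ("High", "System"):
        print("Elevated session: programs started from here likely run elevated too.")
```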

Indeed, my Mandatory Label/... group was S-1-16-12288 ( = High). After re-enabling UAC (one notch above the lowest, "off" setting), it's now S-1-16-8192 ( = Medium). Everything no longer starts in administrator mode, and it *seems* to be spending a lot of time finally indexing my PDFs, TXTs, etc. Yay!!

I'm glad it's working, though it seems like a lot to go through. Maybe for the next release there could be a command-line argument we can put in a shortcut that prevents Everything from running as administrator? Everything64.exe -noadminmode or something?