Love the content indexing!

Discussion related to "Everything" 1.5 Alpha.
DerekZiemba
Posts: 4
Joined: Thu Sep 27, 2018 4:46 pm

Love the content indexing!

Post by DerekZiemba » Fri Apr 23, 2021 5:47 pm

Love the content indexing!

Are there any plans to perhaps expand on it?
I have a few unsolicited suggestions/requests :D :
  1. add ability to configure settings on a per file type basis (includes, excludes, etc.)
  2. add ability to set Minimum size in addition to Maximum size
  3. WOULD BE AMAZING:
    Remove the include, exclude, and maximum size fields altogether and replace them with a fully featured Search field.
    (like the `Organize Filters -> Edit Filters -> Search` box)
    Due to the additional memory requirements of indexing content (RAM usage increased by more than 2GB!), I feel that better filtering is needed to select which content goes into the index. Having the power of a fully featured Search Query would be perfect.
    I imagine Search only works on a fully constructed database, but could it perhaps be built in several passes?:
    • 1st pass: Construct the core database without content or indexed properties
    • 2nd pass: Go through again and add indexed properties to the DB to facilitate any properties that may be used in the content Search Query
    • 3rd pass: For each indexed file type, run the fully featured Search expression (that's unique for each file type!) and index the content of files that match it
  4. if you're really bored, the ability to search the content of both Outlook and Thunderbird emails would be nice, but each of those programs already has decent search, so it's probably not worth the time investment. Apparently this is already possible.
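
For illustration, the three-pass idea in item 3 might be sketched like this in Python. This is only a guess at the shape of it, not how Everything actually builds its database; `Entry`, `build_index`, and the callback parameters are all hypothetical names.

```python
import ntpath
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Entry:
    """One database record (hypothetical structure)."""
    path: str
    size: int
    properties: Dict[str, str] = field(default_factory=dict)
    content: Optional[str] = None


def build_index(
    files: Iterable[Tuple[str, int]],
    read_properties: Callable[[str], Dict[str, str]],
    read_content: Callable[[str], str],
    content_queries: Dict[str, Callable[[Entry], bool]],
) -> List[Entry]:
    # Pass 1: core database, names and sizes only.
    db = [Entry(path, size) for path, size in files]

    # Pass 2: add indexed properties so the content queries can use them.
    for entry in db:
        entry.properties = read_properties(entry.path)

    # Pass 3: run the query registered for each file type and index the
    # content of matching files only.
    for entry in db:
        ext = ntpath.splitext(entry.path)[1].lower().lstrip(".")
        query = content_queries.get(ext)
        if query is not None and query(entry):
            entry.content = read_content(entry.path)
    return db
```

Here `content_queries` maps an extension such as `"txt"` to a predicate standing in for the per-type Search expression; file types without a query are never content indexed.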

Everything is by far my favorite and the most useful app on my PC. I can't use a PC without it.
Saw you had a donate button so I threw $40 your way. Didn't see a way to leave a comment like many of the other donors though - might have clicked next->next->next too fast.

Keep up all your great work!
Last edited by DerekZiemba on Fri Apr 23, 2021 10:20 pm, edited 7 times in total.

horst.epp
Posts: 517
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content Indexing

Post by horst.epp » Fri Apr 23, 2021 5:53 pm

DerekZiemba wrote:
Fri Apr 23, 2021 5:47 pm
Love the content indexing!

Are there any plans to perhaps expand on it?
I have a few unsolicited suggestions/requests :D :
  1. add ability to configure settings on a per file type basis (includes, excludes, etc.)
  2. add ability to set Minimum size in addition to Maximum size
  3. WOULD BE AMAZING:
    Remove the include, exclude, and maximum size fields altogether and replace them with a fully featured Search field.
    (like the `Organize Filters -> Edit Filters -> Search` box)
    I imagine Search only works on a fully constructed database, but could it perhaps be built in several passes?:
    • 1st pass: Construct the core database without content or indexed properties
    • 2nd pass: Go through again and add indexed properties to the DB to facilitate any properties that may be used in the content Search Query
    • 3rd pass: For each indexed file type, run the fully featured Search expression and index the content of files that match it
  4. if you're really bored, the ability to search the content of both Outlook and Thunderbird emails would be nice, but each of those programs already has decent search, so it's probably not worth the time investment.
Thunderbird emails can already be searched with Everything content indexing.
You have to set Thunderbird to use Maildir as the storage format
and add .eml to the list of files for content indexing.
Using only the search in Thunderbird is useless, as it can't show externally stored documents with the same content.
The Everything content search delivers all relevant mails and documents in one list.

void
Site Admin
Posts: 7534
Joined: Fri Oct 16, 2009 11:31 pm

Re: Love the content indexing!

Post by void » Sat Apr 24, 2021 1:55 am

Thank you for your feedback and support DerekZiemba,
add ability to configure settings on a per file type basis (includes, excludes, etc.)
You can do this already with file include only filters.

To create a filter that content indexes only certain file types:
  • In Everything, from the Tools menu, click Options.
  • Click the Content tab on the left.
  • Change Include only files to:
    *.docx;*.txt;*.pdf
    where docx, txt and pdf are the extensions you wish to content index.
  • Click OK.
To create a filter that content indexes only certain file types in certain folders:
  • In Everything, from the Tools menu, click Options.
  • Click the Content tab on the left.
  • Leave Exclude folders blank, and specify your paths with include only files:
  • Change Include only files to:
    c:\my docs\**.docx;c:\my pdfs\**.pdf
    Note: ** will match subdirectories, whereas * does not.
  • Click OK.
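
To make the `*` versus `**` difference concrete, here is a rough Python sketch that translates such a wildcard into a regular expression. It is an illustration of the rule stated in the note, not Everything's matcher; `wildcard_to_regex` and `matches` are made-up names, and the case-insensitive, whole-path matching is an assumption.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter wildcard: '**' crosses backslashes, '*' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # any characters, including '\'
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^\\]*")   # any characters except '\'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # Assumption: matching ignores case, as Windows paths usually do.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, path: str) -> bool:
    return wildcard_to_regex(pattern).fullmatch(path) is not None


print(matches(r"c:\my docs\**.docx", r"c:\my docs\2021\q1\report.docx"))
print(matches(r"c:\my docs\*.docx", r"c:\my docs\2021\q1\report.docx"))
```

The first call prints `True` because `**` spans the `2021\q1` subfolders; the second prints `False` because a single `*` stops at the first backslash.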
I have on my TODO list to add a ... button to the right of the include only/exclude filters to open a dialog for easier editing.
add ability to set Minimum size in addition to Maximum size
I will consider a minimum size option.
Remove the include, exclude, and maximum size fields altogether and replace them with a fully featured Search field.
I will consider an option to do this.
Could be done with the addition of a search modifier, such as filter:size:>1mb
if you're really bored, the ability to search the content of both Outlook and Thunderbird emails would be nice, but each of those programs already has decent search, so it's probably not worth the time investment.
Native .eml support is in the works.
For now this can be done with eml-iFilters.
Didn't see a way to leave a comment like many of the other donators though
Paypal does not support comments with some payment methods and/or countries.

Thanks again for the suggestions and support.

tuska
Posts: 433
Joined: Thu Jul 13, 2017 9:14 am

Re: Love the content indexing!

Post by tuska » Sat Apr 24, 2021 9:28 am

2void
void wrote:
Sat Apr 24, 2021 1:55 am
To create a filter to content include only certain file types in certain folders:
  • Change Include only files to:
    c:\my docs\**.docx;c:\my pdfs\**.pdf
    Note: ** will match subdirectories, whereas * does not.
Tooltip "Include only files:":
Can you please include this example in the tooltip for the "Include only files:" field, and
check whether a similar addition also makes sense for the tooltips of other fields on this settings page, e.g. "Exclude files:"?
Semicolon delimited wildcard filter list for which files to include.
Include a backslash (\) to match full paths and filenames. Otherwise, match basename only.
Empty = Include all files.
regex: = Regular expressions.
** = will match subdirectories, whereas * does not.
Example: *.docx;*.pdf;c:\my docs\**.docx
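
As an illustration of the rules in this tooltip text, the following Python sketch splits a filter list and labels each entry. It does no matching and is not Everything's parser; `parse_filter_list` is a hypothetical helper, trimming whitespace around entries is an assumption, and the `regex:` form is not handled.

```python
from typing import Dict, List, Union


def parse_filter_list(filter_list: str) -> List[Dict[str, Union[str, bool]]]:
    """Split a semicolon-delimited filter list and classify each entry."""
    entries = []
    for raw in filter_list.split(";"):
        pattern = raw.strip()  # assumption: surrounding spaces are ignored
        if not pattern:
            continue  # skip empty entries such as a trailing ';'
        entries.append({
            "pattern": pattern,
            # A backslash means the entry is matched against the full path.
            "scope": "full path" if "\\" in pattern else "basename",
            # '**' is the form that also matches subdirectories.
            "recursive": "**" in pattern,
        })
    return entries  # an empty result corresponds to "include all files"


for entry in parse_filter_list(r"*.docx;*.pdf;c:\my docs\**.docx"):
    print(entry["pattern"], "->", entry["scope"])
```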
Thank you!

void
Site Admin
Posts: 7534
Joined: Fri Oct 16, 2009 11:31 pm

Re: Love the content indexing!

Post by void » Sun Apr 25, 2021 1:36 am

Can you please include this example in the tooltip
Added to my TODO list.
Thanks for the suggestion!

tuska
Posts: 433
Joined: Thu Jul 13, 2017 9:14 am

Re: Love the content indexing!

Post by tuska » Thu Apr 29, 2021 4:14 pm

2void
Thanks for the prompt additions to the tooltips in Everything-1.5.0.1256a.x64 / 27.04.2021! :)

DerekZiemba
Posts: 4
Joined: Thu Sep 27, 2018 4:46 pm

Re: Love the content indexing!

Post by DerekZiemba » Thu Jun 10, 2021 7:34 am

@void
I came here to report a bug in 1.5.0.1263a (x64) regarding content indexing rather than start a new thread and never realized you responded! I'll detail the bug at the end of this post after responding to your response.

------------------------
Regarding filtering for file types:
I was suggesting unique filtering parameters for individual file types or groups of file types. For instance, each file type group below would have its own criteria:
  1. "*.txt;*.log;": // the file types the rules below will evaluate on
    • Exclude folders: *\node_modules;*\.git;*\.svn;*\.**\;C:\Windows;C:\ProgramData;C:\Program Files**\;*\AppData;*\msys64;*\packages;*\resources;*\*cache*;*\tmp;*\temp;*\EULA;*\Legal;*\Dictionary; *\intl;*\locale;*\_locale;*\_locales;*\lang;*\language*;*\de;*\de-DE;*\en_CA;*\en_GB;*\es;*\es-ES;*\fr;*\fr-FR;*\fr_CA;*\it;*\it-IT;*\ja;*\ja-JP;*\ko;*\ko-KR;*\pl;*\pt-BR;*\ru;*\ru-RU;*\tr;*\zh-CHS;*\zh-CHT;*\zh-CN;*\zh-TW;
    • Maximum size: 128KB
    • Minimum size: 512b // likely nothing useful in txt files smaller than this
    • Exclude Hidden Files/Folders: true // would be nice option to have, would easily take care of folders like: *\.git;*\AppData;*\ProgramData;
    • Exclude Known Cache & Temp Directories: true // would be another nice option to have (a fully featured search field where we can access our custom functions would make things like this easy)
  2. "*.md;README;":
    • Include folders: A:\;%USERPROFILE%;*\node_modules;
    • Exclude folders: *\.git;*\.svn;*\.**\;*\AppData;*\packages;*\resources;*\*cache*;*\tmp;*\temp;*\EULA;*\Legal;*\Dictionary; *\intl;*\locale;*\_locale;*\_locales;*\lang;*\language*;*\de;*\de-DE;*\en_CA;*\en_GB;*\es;*\es-ES;*\fr;*\fr-FR;*\fr_CA;*\it;*\it-IT;*\ja;*\ja-JP;*\ko;*\ko-KR;*\pl;*\pt-BR;*\ru;*\ru-RU;*\tr;*\zh-CHS;*\zh-CHT;*\zh-CN;*\zh-TW;
    • Maximum size: 256KB
    • Minimum size: 0b
    • Exclude Hidden Files/Folders: true // would be nice option to have to easily take care of folders like: *\.git;*\AppData;*\ProgramData;
  3. "*.json;*.yml;":
    • Include folders: A:\;%USERPROFILE%;
    • Exclude folders: // Same excludes as *.md;README;
    • Maximum size: 64KB
    • Minimum size: 256b
  4. "*.ini;*.config":
    • Include folders: C:\Windows;C:\ProgramData;C:\Program Files**\;*\AppData;
    • Maximum size: 128KB
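
To show what I mean by per-type criteria, here is a simplified Python sketch. `ContentRule` and `should_index` are names I made up, the folder excludes are reduced to exact folder names rather than full wildcards, and extensionless names like README are not covered.

```python
import ntpath
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ContentRule:
    """Criteria for one group of file types (hypothetical structure)."""
    extensions: List[str]                 # e.g. [".txt", ".log"]
    max_size: Optional[int] = None        # bytes; None = no upper limit
    min_size: int = 0                     # bytes
    exclude_folder_names: Tuple[str, ...] = ()


def should_index(path: str, size: int, rules: List[ContentRule]) -> bool:
    """Apply the first rule whose extensions cover the file."""
    ext = ntpath.splitext(path)[1].lower()
    folders = {part.lower() for part in ntpath.dirname(path).split("\\")}
    for rule in rules:
        if ext not in rule.extensions:
            continue
        if size < rule.min_size:
            return False
        if rule.max_size is not None and size > rule.max_size:
            return False
        if folders & {name.lower() for name in rule.exclude_folder_names}:
            return False
        return True
    return False  # no rule for this file type: do not content index


rules = [
    ContentRule([".txt", ".log"], max_size=128 * 1024, min_size=512,
                exclude_folder_names=("node_modules", ".git")),
    ContentRule([".json", ".yml"], max_size=64 * 1024, min_size=256),
]
print(should_index(r"D:\notes\todo.txt", 2048, rules))
```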
------------------------
Regarding Content Indexing Bug
Since upgrading to 1.5.0.1263a 3 days ago directly from 1.5.1259a (1259a worked pretty much flawlessly for me) I've noticed my computer starts up slow and isn't immediately usable. Investigating further I found that Everything is starving out CPU & Disk resources for every other process trying to start.

According to Everything (this is after Excludes) there are 5.5TB or 4,443,540 objects to index. It's apparently rebuilding the database from scratch on every startup, and with the amount of content on this system this takes over an hour. (This is one of the reasons I requested more advanced filters, so that Everything could be more specific about the content it indexes.)

The actual AppData\Local\Voidtools\Everything-1.5a.db has ranged from 450MB to 900MB, so it's not like I'm indexing the contents of everything. Originally I indexed a lot more, which put it over 3GB plus a hefty 3-4GB chunk of RAM (this was on 1259a), so I quickly rethought that: I made the content filters a lot stricter, removed indexing of most Properties, and removed Fast Sort for all Properties to keep memory usage under a GB and the DB size down.
I have not changed anything since upgrading to 1263a.


In the meantime I've:
* Limited Everything to using only 2 cores and set the process priority to low.
* Disabled "Start Everything on system startup"
* Created a task in Task Scheduler to run "C:\Program Files\Everything 1.5a\Everything64.exe" -admin -startup -load-delay 60000 at user logon
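
For anyone who wants to script that last workaround, a rough Python sketch that builds an equivalent `schtasks` command is below. The Everything arguments are the ones listed above; the task name and the helper `logon_task_command` are made up, and pairing `-admin` with `/RL HIGHEST` is my assumption. The sketch only builds and prints the argument list; creating the task would need an elevated prompt on Windows.

```python
from typing import List


def logon_task_command(task_name: str, exe_path: str, delay_ms: int) -> List[str]:
    """Build a schtasks argument list that starts Everything at user logon."""
    # The executable path is quoted because it contains spaces.
    action = f'"{exe_path}" -admin -startup -load-delay {delay_ms}'
    return [
        "schtasks", "/Create",
        "/TN", task_name,      # task name (arbitrary)
        "/TR", action,         # program and arguments to run
        "/SC", "ONLOGON",      # trigger: at user logon
        "/RL", "HIGHEST",      # assumption: run elevated to go with -admin
        "/F",                  # overwrite an existing task of the same name
    ]


cmd = logon_task_command(
    "Everything delayed start",
    r"C:\Program Files\Everything 1.5a\Everything64.exe",
    60000,
)
print(cmd)
# To actually create the task: subprocess.run(cmd, check=True)
```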

If I right-click Everything in the system tray and choose Exit, then re-launch it, it does not rebuild the database. I also can't replicate it after creating that Scheduled Task to start it. Now I'm not sure if maybe the other times were flukes. But I'm scared to test it the way it was, because it ties up the computer for hours when I want to use it.

More background for the reasons I restarted the past 3 days (maybe the reasons I restarted caused it?):
- Updated Everything, VPN (which mucks with the network adapters), and some other things.
- "Your Phone" app has memory leak
- Wanted to play COD:Warzone after running Docker & Hyper-V earlier (vmmem never returns all the memory back) so restarted to get system in fresh state.
- Updated VS2019 from 16.9.3 to 16.10.1

void
Site Admin
Posts: 7534
Joined: Fri Oct 16, 2009 11:31 pm

Re: Love the content indexing!

Post by void » Thu Jun 10, 2021 10:35 am

Thank you for the reply DerekZiemba,

I will consider an option to specify excludes under each includes only.
Thanks for the suggestion.

For now, you need to specify the include only as:
*\node_modules\**.txt;*\node_modules\**.log;*\.git\**.txt;*\.git\**.log;...A:\**.md;A:\**README;%USERPROFILE%\**.md;%USERPROFILE%\**README;...

I know this will get messy quickly.
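
Until then, a long include-only list like this can be generated rather than typed by hand. A minimal Python sketch, assuming every folder should be combined with every file name suffix (`expand_include_only` is a hypothetical helper, not part of Everything):

```python
from typing import Iterable


def expand_include_only(folders: Iterable[str], suffixes: Iterable[str]) -> str:
    """Combine each folder with each suffix as 'folder\\**suffix', joined by ';'."""
    suffixes = list(suffixes)
    entries = []
    for folder in folders:
        base = folder.rstrip("\\")  # avoid a doubled backslash after "A:\"
        for suffix in suffixes:
            entries.append(base + "\\**" + suffix)
    return ";".join(entries)


print(expand_include_only(["A:\\", "%USERPROFILE%"], [".md", "README"]))
# A:\**.md;A:\**README;%USERPROFILE%\**.md;%USERPROFILE%\**README
```

The output can be pasted into the Include only files field.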
Since upgrading to 1.5.0.1263a 3 days ago directly from 1.5.1259a (1259a worked pretty much flawlessly for me) I've noticed my computer starts up slow and isn't immediately usable. Investigating further I found that Everything is starving out CPU & Disk resources for every other process trying to start.
What is the last rebuild reason reported in Tools -> Debug -> Statistics -> Build -> Last rebuild reason?


I have put on my TODO list to reduce the number of threads used during indexing.
Using a thread for each logical CPU is overkill. (maximum logical CPUs / 2 would be a good start)
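
As a sketch of that heuristic only (this is not Everything's code, and `indexing_thread_count` is a made-up name), halving the logical CPU count while keeping at least one thread could look like this in Python:

```python
import os
from typing import Optional


def indexing_thread_count(logical_cpus: Optional[int] = None) -> int:
    """Use half of the logical CPUs for indexing, but never fewer than one."""
    if logical_cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // 2)


print(indexing_thread_count(16))  # 8
print(indexing_thread_count(1))   # 1
```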

void
Site Admin
Posts: 7534
Joined: Fri Oct 16, 2009 11:31 pm

Re: Love the content indexing!

Post by void » Fri Jun 18, 2021 6:47 am

This rebuild is most likely caused by a change in the database version with Everything 1260a.

I have slightly reduced the process priority of Everything in version 1264a.

If Everything is still rebuilding on system startup, please let me know.
