How does Everything query its data?

.oisyn

Post by .oisyn » Mon Dec 09, 2019 1:06 pm

Hi!

I was hoping to get some info on the inner workings of Everything, especially with regard to how it reads all the data from the filesystem the first time.

I work at a AAA game studio, and we have a content build system that keeps track of hundreds of thousands of dependencies. It uses the NTFS change journal to get notified of changes to the filesystem. However, in certain situations not worth detailing here, we need to rescan all the file dates. We currently do this with the regular file functions, using the file paths stored in our database, but it can take a *very* long time.

To optimize this process, we tried scanning the MFT, but that doesn't give us the last modified timestamp of a file; in particular, the TimeStamp member of the USN_RECORD struct is always 0. We haven't yet tried calling OpenFileById with the file ID provided by the MFT scan, but that would still mean one call per file, which is what we're trying to avoid. While researching this, I've come across several tools that parse the $MFT file directly; in particular, the $STANDARD_INFORMATION attribute provides timestamp information for a file. However, this all seems largely undocumented and would require extra maintenance whenever (if ever) that structure changes, so I'd rather stick to regular API calls if I can.
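
For context on where the file IDs come from: assuming the "MFT scan" here is FSCTL_ENUM_USN_DATA (the usual way to get a USN_RECORD for every file on a volume), each call fills its output buffer with an 8-byte reference to resume from, followed by packed USN_RECORD_V2 entries. The portable sketch below walks such a buffer and collects each FileReferenceNumber (the ID you can hand to OpenFileById) along with the TimeStamp field. The names `usn_entry`, `parse_enum_usn_output` and the `rd*` helpers are hypothetical, and it assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned little-endian reads; assumes a little-endian host (x86/x64). */
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint64_t rd64(const uint8_t *p) { uint64_t v; memcpy(&v, p, sizeof v); return v; }

/* Byte offsets of USN_RECORD_V2 fields (winioctl.h layout). */
enum {
    V2_RECORD_LENGTH = 0,   /* DWORD RecordLength */
    V2_MAJOR_VERSION = 4,   /* WORD MajorVersion */
    V2_FILE_REF      = 8,   /* DWORDLONG FileReferenceNumber */
    V2_TIMESTAMP     = 32,  /* LARGE_INTEGER TimeStamp */
    V2_FILE_NAME     = 60   /* start of FileName; smallest valid record */
};

typedef struct {
    uint64_t file_ref;   /* usable with OpenFileById via FILE_ID_DESCRIPTOR (Type = FileIdType) */
    int64_t  timestamp;  /* reported as 0 by the enumeration, per the question */
} usn_entry;

/* Parses the output of one FSCTL_ENUM_USN_DATA call: an 8-byte reference
   to resume from, then packed USN records. Only V2 records are collected.
   Returns the number of entries written to out, or -1 if malformed. */
static int parse_enum_usn_output(const uint8_t *buf, size_t len,
                                 usn_entry *out, size_t max_out,
                                 uint64_t *next_start)
{
    if (len < 8)
        return -1;
    *next_start = rd64(buf);  /* StartFileReferenceNumber for the next call */
    size_t off = 8, n = 0;
    while (off < len && n < max_out) {
        if (len - off < V2_FILE_NAME)
            return -1;
        uint32_t rec_len = rd32(buf + off + V2_RECORD_LENGTH);
        if (rec_len < V2_FILE_NAME || rec_len > len - off)
            return -1;
        if (rd16(buf + off + V2_MAJOR_VERSION) == 2) {
            out[n].file_ref  = rd64(buf + off + V2_FILE_REF);
            out[n].timestamp = (int64_t)rd64(buf + off + V2_TIMESTAMP);
            n++;
        }
        off += rec_len;  /* RecordLength includes the name and padding */
    }
    return (int)n;
}
```

In a real scan you would call DeviceIoControl with FSCTL_ENUM_USN_DATA in a loop, feeding `*next_start` back in as MFT_ENUM_DATA.StartFileReferenceNumber until the call fails with ERROR_HANDLE_EOF.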

So, before diving into this, I was wondering how Everything gets its hands on the timestamp information. It manages to index the 8.1 million files on my disk within a couple of minutes. If any of the devs can give me some insight, that would be highly appreciated! I'm reachable by email if you'd rather not explain this publicly. And if you don't want to share anything at all because it's a trade secret, I can totally respect that :)

.oisyn

Post by .oisyn » Tue Dec 10, 2019 12:16 pm

Update: we get reasonable performance by calling OpenFileById() with the file ID supplied by the MFT enumeration, followed by GetFileInformationByHandleEx() with FileBasicInfo. That scans about 2M files in a couple of minutes.
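
FILE_BASIC_INFO reports LastWriteTime as a LARGE_INTEGER counting 100-nanosecond intervals since 1601-01-01 UTC (the FILETIME convention). If the dependency database stores Unix times (an assumption; it may just as well store the raw 64-bit value), a minimal conversion sketch looks like this. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* FILETIME ticks (100 ns) from 1601-01-01 to 1970-01-01, both UTC. */
#define FILETIME_TO_UNIX_EPOCH 116444736000000000LL
#define FILETIME_TICKS_PER_SEC 10000000LL

/* Converts a FILETIME-style count, such as
   FILE_BASIC_INFO.LastWriteTime.QuadPart, to whole seconds since the
   Unix epoch. Rounds toward negative infinity so pre-1970 times stay
   consistent. */
static int64_t filetime_to_unix_seconds(int64_t ticks)
{
    int64_t rel  = ticks - FILETIME_TO_UNIX_EPOCH;
    int64_t secs = rel / FILETIME_TICKS_PER_SEC;
    if (rel % FILETIME_TICKS_PER_SEC < 0)
        secs -= 1;
    return secs;
}
```

The conversion drops sub-second precision, so for pure change detection, comparing the raw 64-bit tick values is simpler and exact.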

void
Site Admin

Post by void » Wed Dec 11, 2019 5:15 am

> So, before diving into this, I was wondering how Everything gets its hands on the timestamp information.
Everything reads the date modified information from the $STANDARD_INFORMATION attribute in the MFT.
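
For anyone weighing the direct-parsing route, here is a minimal, portable sketch of pulling the timestamps out of one raw MFT FILE record. It follows the community-documented NTFS on-disk layout (not an official API, and not necessarily how Everything does it), assumes a little-endian host, and assumes the record's update sequence fixups have already been applied. Reading the raw records from the volume is not shown. The names `std_info_times`, `read_standard_information` and the `rd*` helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned little-endian reads; assumes a little-endian host (x86/x64). */
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint64_t rd64(const uint8_t *p) { uint64_t v; memcpy(&v, p, sizeof v); return v; }

#define ATTR_STANDARD_INFORMATION 0x10u        /* $STANDARD_INFORMATION type code */
#define ATTR_END_MARKER           0xFFFFFFFFu  /* terminates the attribute list */

/* The four $STANDARD_INFORMATION timestamps, as raw FILETIME ticks. */
typedef struct {
    uint64_t created;
    uint64_t modified;     /* "date modified" */
    uint64_t mft_changed;
    uint64_t accessed;
} std_info_times;

/* Scans one FILE record (fixups already applied) for a resident
   $STANDARD_INFORMATION attribute. Returns 0 and fills *out on success,
   -1 if the record is malformed or the attribute is not found. */
static int read_standard_information(const uint8_t *rec, size_t rec_size,
                                     std_info_times *out)
{
    if (rec_size < 0x18 || memcmp(rec, "FILE", 4) != 0)
        return -1;
    size_t off = rd16(rec + 0x14);              /* offset of first attribute */
    while (off + 8 <= rec_size) {
        uint32_t type = rd32(rec + off);        /* attribute type code */
        if (type == ATTR_END_MARKER)
            break;
        uint32_t len = rd32(rec + off + 4);     /* total attribute length */
        if (len < 0x10 || len > rec_size - off)
            return -1;
        if (type == ATTR_STANDARD_INFORMATION && rec[off + 8] == 0) {  /* resident */
            if (len < 0x18)
                return -1;
            uint32_t clen = rd32(rec + off + 0x10);  /* content length */
            uint16_t coff = rd16(rec + off + 0x14);  /* content offset */
            if (clen < 0x20 || coff > len || clen > len - coff)
                return -1;
            const uint8_t *c = rec + off + coff;
            out->created     = rd64(c + 0x00);
            out->modified    = rd64(c + 0x08);
            out->mft_changed = rd64(c + 0x10);
            out->accessed    = rd64(c + 0x18);
            return 0;
        }
        off += len;
    }
    return -1;
}
```

The record size is commonly 1024 bytes but is reported per volume in NTFS_VOLUME_DATA_BUFFER.BytesPerFileRecordSegment. The `modified` value uses the same tick format as FILE_BASIC_INFO.LastWriteTime, so it can be compared directly with values obtained through the API route.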
