How does Everything query its data?

Posted: Mon Dec 09, 2019 1:06 pm
by .oisyn
Hi!

I was hoping to get some info on the inner workings of Everything, especially with regard to querying all the data from the filesystem for the first time.

I work at a AAA game studio and we have a content build system that keeps track of hundreds of thousands of dependencies. It uses the NTFS change journal to get notified of changes to the filesystem. However, in some particular situations not worth mentioning here, we need to rescan all the file dates. This is currently done with the regular file functions, using the file paths in our database, but it can take a *very* long time.

To optimize this process, we tried scanning the MFT, but that doesn't give us the last modified timestamp of a file. In particular, the TimeStamp member of the USN_RECORD struct is always 0. We haven't yet tried calling OpenFileById with the file ID provided by the MFT scan, but that would still issue a call per file, which is what we're trying to avoid. In my research into this subject, I've come across several tools that parse the $MFT file directly; in particular, the $STANDARD_INFORMATION attribute would provide timestamp information on a file. However, this all seems pretty undocumented and would require extra maintenance whenever (if ever) this structure changes, so I'd rather stick to regular API calls if I can.
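To make the MFT scan result concrete, here is a minimal sketch of walking the output buffer of an FSCTL_ENUM_USN_DATA call, assuming it contains USN_RECORD_V2 entries laid out as in winioctl.h. The function name parse_enum_usn_output is made up for illustration, and the DeviceIoControl call that fills the buffer is not shown. The TimeStamp field is extracted but, as noted above, it comes back as 0 in this kind of enumeration.

```python
import struct

# Fixed-size head of USN_RECORD_V2 (60 bytes, little-endian):
# RecordLength, MajorVersion, MinorVersion, FileReferenceNumber,
# ParentFileReferenceNumber, Usn, TimeStamp, Reason, SourceInfo,
# SecurityId, FileAttributes, FileNameLength, FileNameOffset.
USN_RECORD_V2 = struct.Struct("<IHHQQqqIIIIHH")


def parse_enum_usn_output(buffer):
    """Yield (file_id, parent_id, timestamp, name) for each V2 record.

    Hypothetical helper: `buffer` is the raw output of FSCTL_ENUM_USN_DATA.
    """
    # The first 8 bytes hold the file reference number to resume from.
    offset = 8
    while offset + USN_RECORD_V2.size <= len(buffer):
        (length, major, _minor, file_id, parent_id, _usn, timestamp,
         _reason, _source, _security, _attrs,
         name_len, name_off) = USN_RECORD_V2.unpack_from(buffer, offset)
        if length == 0:
            break  # malformed or empty tail, stop rather than loop forever
        if major == 2:
            start = offset + name_off
            name = bytes(buffer[start:start + name_len]).decode("utf-16-le")
            yield file_id, parent_id, timestamp, name
        # Records of other versions are skipped using their own length.
        offset += length
```

The file_id values produced this way are what a later OpenFileById call would take as input.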

So, before diving into this, I was wondering how Everything gets its hands on the timestamp information, because it manages to index the 8.1 million files on my disk within a couple of minutes. If any of the devs can give me some insights, that would be highly appreciated! I'm reachable through email if you'd rather not explain this publicly. Or if you don't want to share anything at all because it's a trade secret, I can totally respect that :)

Re: How does Everything query its data?

Posted: Tue Dec 10, 2019 12:16 pm
by .oisyn
Update: we get reasonable performance by calling OpenFileById() with the file ID supplied by the MFT enumeration, followed by GetFileInformationByHandleEx() with FileBasicInfo. This scans about 2M files in a couple of minutes.
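For reference, a rough sketch of decoding what GetFileInformationByHandleEx() writes for FileBasicInfo, assuming the documented FILE_BASIC_INFO layout (four 64-bit FILETIME values followed by a DWORD of attributes). The Win32 calls themselves are not shown, and the helper names are invented for this example.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# CreationTime, LastAccessTime, LastWriteTime, ChangeTime, FileAttributes.
# The C struct is padded to 40 bytes; only the first 36 are read here.
FILE_BASIC_INFO = struct.Struct("<qqqqI")


def filetime_to_datetime(ticks):
    """Convert FILETIME ticks to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def decode_basic_info(raw):
    """Hypothetical helper: decode a raw FILE_BASIC_INFO buffer."""
    creation, access, write, change, attrs = FILE_BASIC_INFO.unpack_from(raw)
    return {
        "created": filetime_to_datetime(creation),
        "accessed": filetime_to_datetime(access),
        "modified": filetime_to_datetime(write),
        "changed": filetime_to_datetime(change),
        "attributes": attrs,
    }
```

The "modified" entry corresponds to LastWriteTime, which is the value a build system would compare against the timestamp stored in its database.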

Re: How does Everything query its data?

Posted: Wed Dec 11, 2019 5:15 am
by void
.oisyn wrote: "So, before diving into this, I was wondering how Everything gets its hands on the timestamp information."
Everything reads date modified information from the standard information attribute in the MFT.
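As an illustration of what reading that attribute can involve, here is a minimal sketch that pulls the modification time out of a single raw MFT FILE record. This is not Everything's code. The offsets are assumptions taken from commonly circulated descriptions of the NTFS on-disk format rather than from an official specification, and the sketch skips the update sequence fixup, assuming the attribute lies within the first sector of the record, before the two bytes the fixup replaces.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
ATTR_STANDARD_INFORMATION = 0x10  # assumed attribute type code
ATTR_END = 0xFFFFFFFF             # assumed end-of-attributes marker


def filetime_to_datetime(ticks):
    """Convert 100-ns ticks since 1601-01-01 UTC to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def modified_time_from_file_record(record):
    """Hypothetical helper: return the modified time, or None if not found."""
    if bytes(record[:4]) != b"FILE":
        return None
    # Assumed layout: offset of the first attribute is a u16 at byte 20.
    (offset,) = struct.unpack_from("<H", record, 20)
    while offset + 24 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == ATTR_END or attr_len == 0:
            break
        if attr_type == ATTR_STANDARD_INFORMATION:
            # Resident attribute header: content offset is a u16 at +20.
            (content_off,) = struct.unpack_from("<H", record, offset + 20)
            start = offset + content_off
            if start + 16 > len(record):
                return None
            # Content begins with the created and modified timestamps.
            _created, modified = struct.unpack_from("<QQ", record, start)
            return filetime_to_datetime(modified)
        offset += attr_len
    return None
```

Reading the records in bulk from the volume is what avoids the per-file API call, at the cost of depending on the on-disk layout.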