Making a crappy Linux file search: the database needs FTS to get fast search-as-you-type, but that means no substring searches, only matches at the beginning of words. What should I study now to get both fast searching and substrings?
I am a beginner programmer, learning Python.
A year ago I switched to Linux, and as you have probably heard countless times, I miss Everything very much.
I used this unyielding desire to motivate myself to learn the basics of Python, PyQt, some SQLite, and to get a better understanding of Linux.
I managed to get something going (hell, it's only an input field, a database query, and a result display), but I would like to improve it.
What I do is use Python's os.walk function to index the filesystem and save the result to an SQLite database with only two columns: the full path and a boolean indicating whether the entry is a folder or a file.
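The indexing step described above could look roughly like the following. This is only a minimal sketch: the table name `files`, the column names `path` and `is_dir`, and the function names are assumptions made for illustration, not the poster's actual code.

```python
import os
import sqlite3


def iter_entries(root):
    """Yield (full_path, is_dir) for everything below `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            yield os.path.join(dirpath, d), 1
        for f in filenames:
            yield os.path.join(dirpath, f), 0


def build_index(root, db_path=":memory:"):
    """Rebuild the two-column index (hypothetical schema) from scratch."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT NOT NULL, is_dir INTEGER NOT NULL)"
    )
    with conn:  # one transaction for the whole rebuild
        conn.execute("DELETE FROM files")
        conn.executemany("INSERT INTO files VALUES (?, ?)", iter_entries(root))
    return conn
```

Doing all inserts inside a single transaction tends to matter a lot for speed with SQLite, since committing row by row is much slower.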
As you can imagine, storing the full path of every file and folder makes this database quite big. I am talking hundreds of MB, but that is not an issue for me; I really don't care about the size as long as it works well.
The issue is that I need to use SQLite's full-text search extension (FTS) to get fast searching in the database. With it, searching gets fast, similar to Everything, but the problem is that the searches don't match substrings.
So a query "toni" would not find "astonishing"; that is a result of how the FTS extension works.
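The difference can be reproduced in a few lines. This sketch assumes the Python build's SQLite has FTS5 compiled in (common, but not guaranteed); the table name, the sample paths, and the helper names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: FTS5 is available in this SQLite build.
conn.execute("CREATE VIRTUAL TABLE files USING fts5(path)")
conn.executemany(
    "INSERT INTO files(path) VALUES (?)",
    [("/home/user/astonishing.txt",), ("/home/user/tonic.txt",)],
)


def fts_prefix(term):
    """Indexed FTS query: matches tokens that START with `term`."""
    quoted = '"' + term.replace('"', '""') + '"*'
    rows = conn.execute("SELECT path FROM files WHERE files MATCH ?", (quoted,))
    return [r[0] for r in rows]


def like_substring(term):
    """Plain LIKE: finds substrings, but scans every row."""
    rows = conn.execute("SELECT path FROM files WHERE path LIKE ?", (f"%{term}%",))
    return [r[0] for r in rows]


print(fts_prefix("toni"))      # only the "tonic" path
print(like_substring("toni"))  # both paths
```

If the SQLite version is 3.34 or newer, the FTS5 trigram tokenizer may be worth reading about, since it is meant for indexed substring matching; whether it is fast enough for search-as-you-type on this data would need testing.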
I am really not sure how Everything works. I know it uses some NTFS feature to allow that incredible awareness and fast indexing, but is there a database holding all the info it gathers? How can it be so fast with such a huge amount of data while doing substring matches? Not only that, in a fraction of a second it tells you how many occurrences there are in the filesystem, even if your files number in the millions. This feels like magic to me now that I have seen an SQLite query in action: I have to limit the results to a few hundred rather than let it search the whole database to the end on every keypress, which would be madness in my case.
Where I want to head now is a bit more into databases, I guess, and into how to get tree structures into a database, because that might reduce the size of the database. Maybe if it is small enough, it could just use non-indexed search queries (LIKE rather than the FTS MATCH) and still be fast enough. But initial googling reveals that this is not an easy topic to grasp. Also, the fact that the program kind of works well enough makes me lose the unquenchable initial drive I had before, and now I am hesitant to get into really heavy database topics without knowing if there's light at the end of the tunnel... Can you give me some sense of direction?
- Site Admin
I am really not sure how Everything works, I know it uses some NTFS feature to allow that incredible awareness and fast indexing, but is there a database holding all that info gathered?
Everything creates its database from the NTFS master file table.
The database is stored in memory and saved to disk when you exit Everything. The database is restored from disk the next time you open Everything.
Everything uses USN Journaling to keep its database up to date.
The database is basically a list of all the file names (in UTF-8) and pointers to parent folders.
The database is sorted by name, then path.
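To make the "names plus pointers to parent folders" idea concrete, here is a rough Python sketch. It is not Everything's actual layout (that is C and not shown here); the `Entry` class, the use of list indexes as parent pointers, and `-1` for the root are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str      # file or folder name only, never the full path
    parent: int    # index of the parent folder in the list, -1 for the root
    is_dir: bool


def full_path(entries, i):
    """Rebuild a full path by following parent pointers up to the root."""
    parts = []
    while i != -1:
        parts.append(entries[i].name)
        i = entries[i].parent
    return "/".join(reversed(parts))


entries = [
    Entry("", -1, True),          # root "/"
    Entry("home", 0, True),
    Entry("notes.txt", 1, False),
    Entry("apple.txt", 1, False),
]

# Indexes ordered by name, then by path, as described above.
order = sorted(range(len(entries)), key=lambda i: (entries[i].name, full_path(entries, i)))
```

Each folder name is stored once instead of being repeated in every path below it, which is where the size saving over a full-path column would come from.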
How can it be so fast with such a huge amount of data and doing substrings?
Everything is written entirely in C.
Searches are compiled into byte code and executed.
Everything uses a highly optimized multi-threaded brute force search. Nothing special.
Everything basically uses strstr() on every single file name.
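In Python terms, the same brute-force idea is just a substring test over an in-memory list of names. The sketch below is single-threaded and far from the optimized C described above; the function name `search` and its `limit` parameter are assumptions for illustration.

```python
def search(names, needle, limit=None):
    """Scan every name for `needle` (case-sensitive, like strstr()).

    Returns (indexes of the first `limit` hits, total number of hits).
    """
    hits = []
    total = 0
    for i, name in enumerate(names):
        if needle in name:  # Python's equivalent of a strstr() check
            total += 1
            if limit is None or len(hits) < limit:
                hits.append(i)
    return hits, total


names = ["astonishing.txt", "tonic.md", "readme"]
hits, total = search(names, "toni", limit=1)
```

Keeping only file names (not full paths) in memory keeps the scanned data small, which is a large part of why a plain scan like this can be practical; how fast it is in pure Python for millions of names would have to be measured.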
I wrote the Everything database specifically for filenames, to be as efficient as possible.
Hopefully this helps a bit.
I'm not sure I would be much help with FTS.