On my main system, running Everything v1.5.0.1391a, I noticed that the majority of files that should be UTF-8 encoded are reported as ANSI by the character-encoding: property/column. Another property, valid-utf-8:, seems to report them correctly. I tested in a VM by installing previous versions, and the issue is NOT present in v1.5.0.1383a. All later versions appear to have it.
It should be easily reproducible: search for files you expect to be UTF-8 encoded in 1383a and in a recent version, then compare what the character-encoding: property/column reports. In my case, over 70% of UTF-8 text files show as ANSI in v1391a, so you may not even need to install 1383a to notice the issue.
Regards.
Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
-
SolarTheory
- Posts: 13
- Joined: Sun Aug 14, 2022 9:51 pm
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
If you check your UTF-8 files, you may find that the difference comes down to whether they have a BOM or not.
-
SolarTheory
- Posts: 13
- Joined: Sun Aug 14, 2022 9:51 pm
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
The issue I am describing is unrelated to UTF-8 with or without BOM. Unless you are saying that recent Everything builds now classify files encoded as "UTF-8 with BOM" as "ANSI" for the "character-encoding:" property?
Besides, that is not the issue. Over 70% of my UTF-8 encoded files are reported as ANSI by every version from v1.5.0.1384a through v1391a. Whether the file is UTF-8 with BOM or without BOM, v1.5.0.1383a reports both as UTF-8, while later versions report both as ANSI. I have tested this on multiple computers.
But if you are not seeing the same on your end, then that would be strange indeed.
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
Windows (and Notepad specifically) treats txt files without a UTF-8 BOM and with all ASCII characters as ANSI.
Everything does the same.
Consider storing your txt files with a UTF-8 BOM (not recommended)
-or-
Use the valid UTF-8 Everything property.
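The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not Everything's actual detection code; the function name `describe_encoding` is hypothetical, and the thread does not say how files without a BOM that contain non-ASCII bytes are labelled, so the sketch returns `None` for that case.

```python
import codecs
from typing import Optional


def describe_encoding(data: bytes) -> Optional[str]:
    """Label raw file bytes using the rule stated in the post above.

    Hypothetical helper: a sketch of the described behavior only.
    """
    # A UTF-8 BOM (EF BB BF) at the start marks the file as UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # No BOM and every byte below 0x80: treated as ANSI.
    if data.isascii():
        return "ANSI"
    # No BOM but non-ASCII bytes: not covered by the rule above.
    return None


print(describe_encoding(codecs.BOM_UTF8 + b"hello"))
print(describe_encoding(b"hello"))
print(describe_encoding("héllo".encode("utf-8")))
```

A pure-ASCII file is byte-for-byte identical whether it is called ANSI or UTF-8, which is why the BOM is the only thing that can distinguish the two in that case.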
-
SolarTheory
- Posts: 13
- Joined: Sun Aug 14, 2022 9:51 pm
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
void wrote: Sat Mar 22, 2025 11:34 pm
Windows (and Notepad specifically) treats txt files without a UTF-8 BOM and with all ASCII characters as ANSI.
Everything does the same.
Consider storing your txt files with a UTF-8 BOM (not recommended)
-or-
Use the valid UTF-8 Everything property.
Actually, I tested it by saving a UTF-8 BOM encoded text file, and the character-encoding: property still reported it as ANSI, hence my confusion.
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
Please upload a TXT with a UTF-8 BOM showing in Everything as ANSI in a bug report.
-
SolarTheory
- Posts: 13
- Joined: Sun Aug 14, 2022 9:51 pm
Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI
Egg on my face. Apologies. As I went to submit the bug report, I could not reproduce it with the file I thought I had saved with a BOM (incorrectly, as it turns out). So I take back my comment about UTF-8 BOM showing as ANSI for the character-encoding property.
Ok, so to confirm: For the purposes of the 'character-encoding:' property, UTF-8 with BOM = UTF-8 (as expected, and unchanged from previous behavior).
And since v1.5.0.1384a, UTF-8 (w/o BOM) = ANSI (as now designed).
Fair enough. I had used 'ANSI' as a quick way to filter out 'binary' files in certain search filters and macros, so I'll just update them and rely on the 'Valid-UTF-8:' property.
Thanks!!
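For anyone who wants to cross-check results outside Everything, here is a rough Python sketch of a "is this valid UTF-8?" test used as a text-versus-binary filter. It is an assumption-laden stand-in, not how the Valid-UTF-8: property is implemented; the names `is_valid_utf8` and `valid_utf8_files` are made up for this example, and it reads whole files into memory, so it only suits small folders.

```python
from pathlib import Path
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def valid_utf8_files(folder: str) -> List[Path]:
    """List files under `folder` whose full content is valid UTF-8."""
    return sorted(
        p
        for p in Path(folder).rglob("*")
        if p.is_file() and is_valid_utf8(p.read_bytes())
    )


print(is_valid_utf8("héllo".encode("utf-8")))
print(is_valid_utf8(b"\xff\xfe\x00"))
```

Note that this is only a heuristic for spotting binaries: a binary file can happen to be valid UTF-8 (NUL bytes, for example, are valid), so some false positives are possible.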