Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Discussion related to "Everything" 1.5.
SolarTheory
Posts: 13
Joined: Sun Aug 14, 2022 9:51 pm

Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by SolarTheory »

On my main system, running Everything v1.5.0.1391a, I noticed that most files that should be UTF-8 encoded are being reported as ANSI by the character-encoding: property/column. Another property, valid-utf-8:, seems to report them correctly. I tested in a VM by installing previous versions, and the issue is NOT present in v1.5.0.1383a. All later versions appear to have it.

It should be easily reproducible: search for files you expect to be UTF-8 encoded in 1383a and in a recent version, then compare what the character-encoding: property/column reports in each. In my case, over 70% of UTF-8 text files show as ANSI in v1391a, so you may not even have to install 1383a to notice the issue.

Regards.
horst.epp
Posts: 1642
Joined: Fri Apr 04, 2014 3:24 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by horst.epp »

If you check your UTF-8 files, you may find the difference between having a BOM or not.
SolarTheory
Posts: 13
Joined: Sun Aug 14, 2022 9:51 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by SolarTheory »

The issue I am describing is unrelated to UTF-8 with or without BOM. Unless you are saying that recent Everything builds now classify files encoded as "UTF-8 with BOM" as "ANSI" for the "character-encoding:" property?

Besides, that is not the issue. Over 70% of my UTF-8 encoded files are reported as ANSI by every version from v1.5.0.1384a through v1391a. Whether the file is UTF-8 with BOM or without BOM, v1.5.0.1383a reports both as UTF-8, while later versions report both as ANSI. I have tested this on multiple computers.

But if you are not seeing the same on your end, then that would be strange indeed.
NotNull
Posts: 5961
Joined: Wed May 24, 2017 9:22 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by NotNull »

Might be related to this.
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by void »

Windows (and Notepad specifically) treats txt files without a UTF-8 BOM and with all ASCII characters as ANSI.

Everything does the same.

Consider storing your txt files with a UTF-8 BOM (not recommended)
-or-
Use the valid UTF-8 Everything property.
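
A minimal Python sketch of the heuristic described above, for illustration only. This is not Everything's actual code: the function name `guess_text_encoding` and the returned labels are hypothetical, and the handling of BOM-less files that contain non-ASCII bytes is an assumption, since the post only covers the all-ASCII case.

```python
import codecs


def guess_text_encoding(data: bytes) -> str:
    """Hypothetical classifier; labels are illustrative, not Everything's."""
    # A UTF-8 BOM (bytes EF BB BF) is an explicit marker, so report UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # No BOM and only 7-bit bytes: the content is identical in ANSI and
    # UTF-8, so it is reported as ANSI (the case described in the post).
    if data.isascii():
        return "ANSI"
    # Assumption: BOM-less content with non-ASCII bytes counts as UTF-8
    # only if it decodes strictly as UTF-8; otherwise fall back to ANSI.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"
    return "UTF-8"
```

Under these assumptions, a plain-ASCII file saved as "UTF-8" without a BOM comes back as ANSI, which would match the results reported at the top of the thread.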
SolarTheory
Posts: 13
Joined: Sun Aug 14, 2022 9:51 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by SolarTheory »

NotNull wrote: Sat Mar 22, 2025 11:25 pm Might be related to this.

Yes, that must be it. 1384a was the first version that changed the behavior of the "character-encoding:" property.

Thanks
SolarTheory
Posts: 13
Joined: Sun Aug 14, 2022 9:51 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by SolarTheory »

void wrote: Sat Mar 22, 2025 11:34 pm Windows (and Notepad specifically) treats txt files without a UTF-8 BOM and with all ASCII characters as ANSI.

Everything does the same.

Consider storing your txt files with a UTF-8 BOM (not recommended)
-or-
Use the valid UTF-8 Everything property.

Actually, I tested it by saving a text file encoded as UTF-8 with BOM, and the character-encoding: property still reported it as ANSI, hence my confusion.
void
Developer
Posts: 19870
Joined: Fri Oct 16, 2009 11:31 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by void »

Please upload a TXT with a UTF-8 BOM showing in Everything as ANSI in a bug report.
SolarTheory
Posts: 13
Joined: Sun Aug 14, 2022 9:51 pm

Re: Possible Bug: 'Character-Encoding:' Property/Column Reports Many UTF-8 Files as ANSI

Post by SolarTheory »

Egg on my face. Apologies. As I went to submit the bug report, I could not reproduce the problem with the file I thought I had saved with a BOM (it turns out I had not). So I take back my comment about UTF-8 with BOM showing as ANSI for the character-encoding: property.

Ok, so to confirm: For the purposes of the 'character-encoding:' property, UTF-8 with BOM = UTF-8 (as expected, and unchanged from previous behavior).
And since v1.5.0.1384a, UTF-8 (w/o BOM) = ANSI (as now designed).

Fair enough. I had used 'ANSI' as a quick way to filter out 'binary' files in certain search filters and macros, so I'll just update them and rely on the 'Valid-UTF-8:' property.

Thanks!!
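
For anyone who wants to reproduce the text-versus-binary filtering outside Everything, here is a rough Python sketch of a strict UTF-8 validity check. It only approximates the idea behind the 'Valid-UTF-8:' property; how Everything actually evaluates that property (for example, how much of each file it reads) is not stated in this thread, and the function name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode strictly as UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure-ASCII content is also valid UTF-8, so a check like this accepts BOM-less ASCII files that a character-encoding test would label ANSI, while most binary formats fail it.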