ES -export-txt with non-ASCII

aussieboykie
Posts: 35
Joined: Sun Mar 08, 2015 11:05 pm

Post by aussieboykie » Wed Feb 06, 2019 4:50 am

This is probably best explained with output from a command window. The first 3 lines show what happens when ES is executed normally and this is exactly what I expect to see. However, note what happens to "prüfung" (with diacritics) when output is written to a text file.

c:\test>es.exe prüfung -full-path-and-name
C:\test\prufung.txt
C:\test\prüfung.txt

c:\test>es.exe prüfung -full-path-and-name -export-txt exported.txt

c:\test>type exported.txt
C:\test\prufung.txt
C:\test\pr├╝fung.txt

c:\test>


My use case is to import the output from ES in order to create a Directory Opus collection. I can make this work by redirecting ES output to the clipboard and processing from there, but I would prefer to be able to rely on output written to an intermediate file.

es.exe is dated 27/06/2018

void
Site Admin
Posts: 4769
Joined: Fri Oct 16, 2009 11:31 pm

Re: ES -export-txt with non-ASCII

Post by void » Wed Feb 06, 2019 7:12 am

es exports text as UTF-8.

If you are ok with the active console code page, please try redirecting output to a file instead of using the -export command line options.
For example:
es.exe prüfung -full-path-and-name > exported.txt

Characters not supported by the active code page will be displayed as ?

To change the console's active code page, please see:
https://ss64.com/nt/chcp.html

Added to the ES help:
UTF-8 encoding is used for exporting as txt.
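
For anyone wondering where "├╝" comes from, here is a minimal Python sketch (not part of ES). It assumes the console is using an OEM code page such as 437; the actual code page on the original poster's machine is not stated in the thread. The UTF-8 encoding of "ü" is two bytes, and `type` shows each byte as a separate character in the console code page. The last line illustrates the "?" substitution, using ASCII as a stand-in for a code page that lacks the character.

```python
# UTF-8 bytes as -export-txt would write them for the file name in the thread.
name = "prüfung.txt"
utf8_bytes = name.encode("utf-8")


def show_as(code_page: str) -> str:
    """Decode the UTF-8 bytes as if they were text in the given code page."""
    return utf8_bytes.decode(code_page)


# What `type` displays when the console code page is 437 (assumed here).
print(show_as("cp437"))   # pr├╝fung.txt

# What a UTF-8 aware reader displays.
print(show_as("utf-8"))   # prüfung.txt

# A character missing from the target encoding is replaced with "?".
print("prüfung".encode("ascii", errors="replace").decode("ascii"))  # pr?fung
```

So the exported file itself is intact; only the interpretation of its bytes differs.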

aussieboykie

Re: ES -export-txt with non-ASCII

Post by aussieboykie » Wed Feb 06, 2019 10:41 pm

void wrote (Wed Feb 06, 2019 7:12 am):
es exports text as UTF-8.

Thanks for clarifying. The exported text file has no BOM, so by default (for my use case) it is not recognised as UTF-8. However, I can force the import to assume UTF-8, and that works. Would you consider adding a BOM, or an option to do so?
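
For reference, "forcing the import to assume UTF-8" might look like the sketch below in a script-based importer. This is only an illustration in Python, not how Directory Opus does it; `read_es_export` is a hypothetical helper name. The `utf-8-sig` codec accepts the file with or without a BOM, so the same code would keep working if a BOM option is added later.

```python
from pathlib import Path


def read_es_export(path: str) -> list[str]:
    """Read an ES -export-txt file as UTF-8 and return one path per line.

    "utf-8-sig" strips a leading BOM if one is present and otherwise
    behaves like plain "utf-8", so both variants of the file are handled.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return text.splitlines()
```

Usage would be something like `read_es_export("exported.txt")`, which returns a list of full paths ready for further processing.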

void

Re: ES -export-txt with non-ASCII

Post by void » Fri Feb 08, 2019 6:05 am

I'll add an option to do so.
However, it will be off by default, as the Unicode Standard neither requires nor recommends a BOM for UTF-8.

Thanks for the suggestion.

NotNull
Posts: 1295
Joined: Wed May 24, 2017 9:22 pm

Re: ES -export-txt with non-ASCII

Post by NotNull » Fri Feb 08, 2019 9:51 pm

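You could also use PowerShell to convert a UTF-8 file to a UTF-8 file with a BOM. Read the file explicitly as UTF-8; otherwise Windows PowerShell 5.1 decodes a BOM-less file with the system ANSI code page and corrupts the non-ASCII characters:

(Get-Content -Encoding UTF8 .\exported.txt) | Set-Content -Encoding UTF8 .\exported.txt

In PowerShell 6 and later, -Encoding UTF8 writes no BOM; use -Encoding utf8BOM with Set-Content there.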
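
If PowerShell is not an option, the same BOM conversion could be sketched in Python. This is a minimal example under the assumption that the file is already valid UTF-8 (as ES writes it); `add_utf8_bom` is a hypothetical helper name. It works on raw bytes, so the text is never re-decoded and line endings are left untouched.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: str) -> bool:
    """Prepend a UTF-8 BOM to the file at `path` if it does not have one.

    Returns True if the file was modified, False if a BOM was already there.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM; leave the file alone
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling `add_utf8_bom("exported.txt")` after the ES export would give a file that BOM-sniffing importers recognise as UTF-8; running it twice is harmless.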
