
ES -export-txt with non-ASCII

Posted: Wed Feb 06, 2019 4:50 am
by aussieboykie
This is probably best explained with output from a command window. The first 3 lines show what happens when ES is executed normally and this is exactly what I expect to see. However, note what happens to "prüfung" (with diacritics) when output is written to a text file.

c:\test>es.exe prüfung -full-path-and-name
C:\test\prufung.txt
C:\test\prüfung.txt

c:\test>es.exe prüfung -full-path-and-name -export-txt exported.txt

c:\test>type exported.txt
C:\test\prufung.txt
C:\test\pr├╝fung.txt

c:\test>


My use case is to import the output from ES in order to create a Directory Opus collection. I can make this work by redirecting ES output to the clipboard and processing from there, but I would prefer to be able to rely on output written to an intermediate file.

es.exe is dated 27/06/2018

Re: ES -export-txt with non-ASCII

Posted: Wed Feb 06, 2019 7:12 am
by void
es exports text as UTF-8.

If you are ok with the active console code page, please try redirecting output to a file instead of using the -export command line options.
For example:
es.exe prüfung -full-path-and-name > exported.txt

Characters not supported by the active code page will be displayed as ?

To change the console's active code page, please see:
https://ss64.com/nt/chcp.html

Added to the ES help:
UTF-8 encoding is used for exporting as txt.
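
To illustrate why `type` shows the exported name garbled, here is a minimal Python sketch. It assumes the console's active code page is 437; the helper name `as_shown_by_console` is made up for this example and is not part of ES. The "ü" is stored as two UTF-8 bytes, and the console renders each byte as a separate character.

```python
def as_shown_by_console(text: str, code_page: str = "cp437") -> str:
    """Encode text as UTF-8, then decode those bytes the way a console
    using code_page would (assumed here: code page 437)."""
    return text.encode("utf-8").decode(code_page)


# "ü" becomes the two bytes C3 BC in UTF-8.
print("ü".encode("utf-8"))  # b'\xc3\xbc'

# Under code page 437 those two bytes display as "├" and "╝".
garbled = as_shown_by_console("C:\\test\\prüfung.txt")
print(garbled == "C:\\test\\pr├╝fung.txt")  # True
```

The file itself is intact; only the interpretation of its bytes differs, so a reader that decodes it as UTF-8 gets "prüfung" back.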

Re: ES -export-txt with non-ASCII

Posted: Wed Feb 06, 2019 10:41 pm
by aussieboykie
void wrote:
Wed Feb 06, 2019 7:12 am
es exports text as UTF-8.
Thanks for clarifying. There is no BOM on the exported text file, so by default (for my use case) it is not recognised as UTF-8. However, I can force the import to assume UTF-8 and that works. Would you consider adding a BOM, or an option to do so?
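
For anyone post-processing the export in a script, a rough Python sketch of both workarounds follows: reading the file as UTF-8 explicitly, and prepending a BOM for tools that need one. The function names `read_es_export` and `add_utf8_bom` are hypothetical, and the demo writes a stand-in file rather than calling es.exe.

```python
import codecs
import tempfile
from pathlib import Path


def read_es_export(path: Path) -> list[str]:
    """Read an export as UTF-8. "utf-8-sig" also skips a leading BOM if present."""
    return path.read_text(encoding="utf-8-sig").splitlines()


def add_utf8_bom(path: Path) -> None:
    """Prepend a UTF-8 BOM in place unless the file already starts with one."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        path.write_bytes(codecs.BOM_UTF8 + data)


# Demo with a stand-in file; a real one would come from es.exe -export-txt.
with tempfile.TemporaryDirectory() as tmp:
    exported = Path(tmp) / "exported.txt"
    sample = "C:\\test\\prufung.txt\r\nC:\\test\\prüfung.txt\r\n"
    exported.write_bytes(sample.encode("utf-8"))

    add_utf8_bom(exported)
    print(exported.read_bytes()[:3])      # b'\xef\xbb\xbf'
    print(len(read_es_export(exported)))  # 2
```

Working on raw bytes in `add_utf8_bom` avoids any decode step, so the file content cannot be altered by a wrong encoding guess.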

Re: ES -export-txt with non-ASCII

Posted: Fri Feb 08, 2019 6:05 am
by void
I'll add an option to do so.
However, it will be off by default as the UTF-8 spec doesn't recommend using the BOM.

Thanks for the suggestion.

Re: ES -export-txt with non-ASCII

Posted: Fri Feb 08, 2019 9:51 pm
by NotNull
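You could also use PowerShell to convert a UTF-8 file to a UTF-8 file with a BOM. Read it explicitly as UTF-8, because Windows PowerShell otherwise assumes the ANSI code page for a file without a BOM:

(Get-Content -Encoding UTF8 .\exported.txt) | Set-Content -Encoding UTF8 .\exported.txt

In PowerShell 6 and later, -Encoding UTF8 writes no BOM; use -Encoding utf8BOM there instead.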