ASCII, oh ASCII! Wherefore art thou, ASCII?
The original line, as copied by an infamous English poet
Puns aside, even in the XXI century there is still a need to stick to the plain, old 7-bit ASCII character table. Many industrial applications rely on it for its simplicity; Unicode is often overkill in those situations.
So how do we find out whether a stream of text is Unicode? POSIX systems (GNU/Linux, *BSD, macOS and many, many others) provide the file utility: a command as simple as “file filename” will likely answer something like:
DEST_CLI.CSV.20220901-105314.backup: UTF-8 Unicode text, with CRLF line terminators
foo: UTF-8 Unicode text
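To see the distinction in action, a quick sketch (the file names ascii.txt and utf8.txt are made up for the demo; the exact wording of file's answer varies a bit between versions):

```shell
# Create a pure-ASCII file and one containing a UTF-8 character,
# then let file(1) classify both.
printf 'hello world\n' > ascii.txt
printf 'caf\xc3\xa9\n' > utf8.txt   # "café" encoded as UTF-8
file ascii.txt utf8.txt
```

The first should be reported as plain ASCII text, the second as UTF-8.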
When debugging we will need to find where the Unicode characters are actually used. I found inspiration in How to Find Non-ASCII Characters in Text Files in Linux, which had graciously been updated less than three days earlier (lucky me!), and slightly modified the command; it is as simple as:
grep --color='auto' -P -n "[\x80-\xFF]" foo
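For example, against a hypothetical two-line file named foo (note that -P requires GNU grep; BSD/macOS grep lacks Perl regex support):

```shell
# A sample file mixing an ASCII-only line and a line with "café".
printf 'plain ascii line\nnon-ascii: caf\xc3\xa9\n' > foo
# -P: Perl-compatible regex, -n: show line numbers,
# [\x80-\xFF]: any byte outside the 7-bit ASCII range.
grep --color='auto' -P -n "[\x80-\xFF]" foo
```

Only the second line is printed, prefixed with its line number, so you know exactly where to look.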
Sometimes we just need to convert a UTF-8 file to ASCII (best effort). In many cases iconv does the job; pipe your data to:
iconv -f utf-8 -t ascii//TRANSLIT
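A minimal sketch of the pipe in action (the input string is made up; the exact transliteration of each character depends on your libc and locale — glibc will typically turn é into a plain e, while other implementations may differ):

```shell
# //TRANSLIT asks iconv to approximate characters that have no
# ASCII representation instead of failing on them.
printf 'caf\xc3\xa9\n' | iconv -f utf-8 -t ascii//TRANSLIT
```

Without the //TRANSLIT suffix, iconv stops with an error at the first unconvertible character instead.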