💾 Archived View for bbs.geminispace.org › u › MrSVCD › 15987 captured on 2024-07-09 at 05:52:21. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Comment by 🚂 MrSVCD

Re: "How Can We Determine Files Types and Text File Encodings?"

In: s/Gemini

To make your life a little easier, you can write a utility that detects ASCII and UTF-8 text. The rest you can't automate, since there is no reliable way to distinguish between different code pages other than having a human check whether the decoded text looks correct.
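A minimal Python sketch of such a utility, assuming the whole file fits in memory as `bytes`. The names `classify_text` and `classify_file` are hypothetical. Pure ASCII is tested first because every ASCII file is also valid UTF-8.

```python
from pathlib import Path


def classify_text(data: bytes) -> str:
    """Return "ascii", "utf-8", or "unknown" for a byte string.

    "unknown" covers legacy code pages and binary data alike;
    telling those apart still needs a human.
    """
    # ASCII first: every ASCII file is also valid UTF-8, so order matters.
    if data.isascii():  # bytes.isascii() needs Python 3.7+
        return "ascii"
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


def classify_file(path: str) -> str:
    """Hypothetical convenience wrapper: reads the whole file into memory."""
    return classify_text(Path(path).read_bytes())


print(classify_text(b"plain text"))             # ascii
print(classify_text("naïve".encode("utf-8")))   # utf-8
print(classify_text("naïve".encode("cp1252")))  # unknown
```

This only checks that the bytes are well-formed. A binary file whose bytes all happen to be below 0x80 still comes back as "ascii", and a file in a legacy code page can occasionally also be valid UTF-8.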

🚂 MrSVCD

Apr 05 · 3 months ago

2 Later Comments ↓

🚀 blah_blah_blah [OP] · Apr 10 at 00:04:

@mozz

But why do you think a polyglot file is a security issue? I don't see how it would be more insecure than any other untrusted file.

Secure software has to presume that user input is hostile. One form of hostility is the polyglot file, which appears to be one thing while also, under certain circumstances, being something else.
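To make the idea concrete, here is a minimal Python sketch of one polyglot pattern: a byte string that begins with a GIF signature, so a check that only inspects the first bytes calls it an image, while Python's `zipfile` still opens it as an archive, because ZIP readers locate the archive from its central directory at the end of the data. The prefix is an assumption for illustration (just the six-byte `GIF89a` signature plus filler, not a complete image), and `build_polyglot`, `sniff_by_magic` and `payload.txt` are hypothetical names.

```python
import io
import zipfile

# Hypothetical prefix: only the GIF signature plus filler bytes,
# not a complete, renderable GIF image.
FAKE_GIF_PREFIX = b"GIF89a" + b"\x00" * 10


def build_polyglot() -> bytes:
    """Return bytes that start like a GIF but also open as a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("payload.txt", "not an image")
    # zipfile finds the end-of-central-directory record at the end of the
    # data and adjusts its offsets, so bytes prepended in front are tolerated.
    return FAKE_GIF_PREFIX + buf.getvalue()


def sniff_by_magic(data: bytes) -> str:
    """Naive type check that looks only at the first bytes."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    return "unknown"


blob = build_polyglot()
print(sniff_by_magic(blob))                                   # gif
print(zipfile.ZipFile(io.BytesIO(blob)).read("payload.txt"))  # b'not an image'
```

The same bytes pass an "is this an image?" gate and are then usable as an archive by any program that treats them as one, which is why a type check that trusts a single signature is not a validation step.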

🚀 blah_blah_blah [OP] · Apr 10 at 00:44:

The responses to my post confirm my view that the final determinant of a file's type or encoding is human judgment about whether the expected software chokes on the data. I guess only I find this an intriguing topic, or an alarming one.
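A small Python sketch of why the final call ends up with a human: the three bytes `E9 74 E9` are not valid UTF-8, yet they decode without any error under each of several legacy single-byte code pages, giving a different string each time. The byte string and the list of code pages are arbitrary choices for illustration, and `decode_candidates` is a hypothetical helper name.

```python
def decode_candidates(data: bytes, codecs: list[str]) -> dict[str, str]:
    """Decode the same bytes under each codec, dropping codecs that reject them."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            pass  # the software "chokes": this codec is ruled out
    return results


# Arbitrary sample: 0xE9, ASCII 't', 0xE9. Invalid as UTF-8, because 0xE9
# must be followed by continuation bytes (0x80..0xBF), not by 't'.
data = b"\xe9t\xe9"
legacy = ["cp1252", "cp1251", "koi8_r", "cp437"]

for codec, text in decode_candidates(data, legacy).items():
    # Every one of these succeeds; nothing in the bytes says which is right.
    print(f"{codec:>7}: {text}")
```

Discarding the codecs that raise an error is the automatable part of "does the expected software choke?"; picking among the survivors still takes someone who can read the text.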

Original Post

🌒 s/Gemini

How Can We Determine Files Types and Text File Encodings? - Determining File Types. I have a security question. How can we verify that a UTF-8 file contains only UTF-8 encoded bytes? Running iconv all the time (the preferred solution) isn't appropriate in every situation, and only pushes back the question: how does iconv perform the verification? Other proposals suggest pushing text through UTF-8 language tools, like `read().decode('UTF-8')` in Python, but, again, the /how/ remains...

💬 blah_blah_blah · 7 comments · Apr 04 · 3 months ago
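On the question of /how/ the check works: UTF-8 validity is a purely structural property of the bytes, specified in RFC 3629 (the same table appears as Table 3-7 of the Unicode Standard). A validator scans the input once. For each lead byte it knows how many continuation bytes must follow and which values they may take. The Python sketch below spells those rules out by hand. `is_valid_utf8` is a hypothetical name, and this is not the actual source of iconv or CPython (both are optimized C), only the rules a conforming strict decoder enforces.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check UTF-8 well-formedness byte by byte (RFC 3629 rules)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # 1-byte sequence (ASCII)
            i += 1
            continue
        # Choose the sequence length and the allowed range of the SECOND byte.
        # The narrowed ranges reject overlong forms (E0, F0), UTF-16
        # surrogates U+D800..U+DFFF (ED), and code points above U+10FFFF (F4).
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F
        else:
            return False  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + length > n:
            return False  # sequence truncated by the end of the input
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining bytes must be plain continuation bytes (10xxxxxx).
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True


print(is_valid_utf8("naïve".encode("utf-8")))  # True
print(is_valid_utf8(b"\xc0\xaf"))             # False: overlong encoding of "/"
print(is_valid_utf8(b"\xed\xa0\x80"))         # False: UTF-16 surrogate U+D800
```

Python's strict `decode('utf-8')` accepts and rejects exactly the same inputs, so in practice the one-line call (or iconv) remains the better tool; the hand-written loop only makes visible what "verification" consists of: a single linear pass with no guessing involved.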