Originally posted: 2022-02-09 ~ Last updated: 2022-02-10
Replying to this article by gemini.susa.net (sorry, I couldn't find your name on the site):
Gemserv - Just Block All Encoded Paths
The article suggests blocking every request that contains a percent symbol (used to encode non-ASCII characters in URLs).
Without being sarcastic, I'd suggest blocking all characters outside the [a-z] range (plus a few whitelisted ones, like /._-, while also blocking double dots), if that's all you need.
The reason is that a Gemini URL is a UTF-8-encoded string (why you need to percent-encode it is beyond my comprehension†), so it may contain many unwanted characters both before and after percent-decoding: think of newlines, quotes, spaces, non-breaking spaces, RTL markers, NUL characters, shell injection attempts, etc.
Of course, we all expect our servers to behave sanely and simply confirm that no file with these weird characters in its name exists. But if you never plan to have such files, it might be better to keep a list of allowed characters and block everything else, rather than check and block them one by one.
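A minimal sketch of such an allow-list, in Python for brevity. The function name `is_allowed_path` and the exact character set are my assumptions for illustration; in particular, digits are added on top of [a-z], which you may or may not want.

```python
import re

# Hypothetical allow-list: lowercase letters, digits and "/", ".", "_", "-".
# Everything else (including "%") is rejected.
ALLOWED_PATH = re.compile(r"[a-z0-9/._-]*")


def is_allowed_path(path: str) -> bool:
    """Return True only if the raw (not yet decoded) request path passes the allow-list."""
    # fullmatch() must consume the whole string, so a trailing newline cannot slip through.
    if ALLOWED_PATH.fullmatch(path) is None:
        return False  # "%", spaces, newlines, quotes, non-ASCII, NUL, ...
    if ".." in path:
        return False  # block double dots
    return True
```

Because "%" is not on the list, every percent-encoded request is rejected as a side effect, so nothing has to be decoded before the check.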
As @dece mentioned in a private email, the main reason for percent-encoding is to ensure that URLs can be shared outside of Gemini:
... if you paste an URL with weird unicode chars such as ZWS or an emoji on IRC, people without an appropriate renderer (terminal client) or emoji support in their fonts would receive a broken URL. Percent-encoding reduce the set of possible characters to ASCII (more or less?) so it's way more portable for humans!
Sounds reasonable! I wouldn't want to be locked out of some URL just because someone stuck a "poo" emoji somewhere in the middle of it!
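To illustrate the point, here is a small Python sketch of what percent-encoding does to such a path. The helper name `to_shareable` and the example path are made up; `quote` and `unquote` come from Python's standard `urllib.parse` module.

```python
from urllib.parse import quote, unquote


def to_shareable(path: str) -> str:
    """Percent-encode a path so that only ASCII characters remain ("/" is left as-is)."""
    return quote(path)


path = "/notes/\U0001F4A9.gmi"          # a path with a "poo" emoji in it
encoded = to_shareable(path)            # "/notes/%F0%9F%92%A9.gmi"
assert encoded.isascii()                # safe to paste on IRC
assert unquote(encoded) == path         # and nothing is lost on the way back
```

The emoji becomes the four UTF-8 bytes F0 9F 92 A9, each written as a percent escape, which any ASCII-only channel can carry.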
@Acidus explained the proper way to defend against directory traversal attacks: check which file you're about to send to the user after all path transformations, and ensure that it's inside the directory with public files.
Robust Defence Against Directory Transversal attacks
I totally agree with this, and that's exactly how I implemented it in my bash Gemini server a few days ago (using the `realpath` command, which also resolves symlinks, another way of escaping the directory with public files). I should have mentioned it in this post originally, but I got sidetracked into percent-encoding Unicode land.
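The same idea can be sketched in Python. This is not the code of the bash server mentioned above, only an illustration of the check; the name `resolve_inside` is hypothetical, and the request path is assumed to be already percent-decoded.

```python
import os


def resolve_inside(public_root: str, request_path: str):
    """Return the real path of the requested file if it lies inside public_root, else None."""
    if "\0" in request_path:
        return None  # NUL bytes are never valid in file names
    root = os.path.realpath(public_root)
    # realpath() collapses ".." components and resolves symlinks.
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # The resolved path must still have the public root as its prefix directory.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

The check runs on the final, fully resolved path, so it does not matter whether the escape attempt used "..", a symlink or some encoding trick along the way.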
What do you think? Click one of the links below to show your opinion:
Blocking only requests with percent symbols should be enough (1 vote)
You forgot digits - you probably want to allow them, too (8 votes)
That's right, we should tighten security as much as we can (10 votes)
You don't need security, you just need to write good code (8 votes)
Robert'); DROP TABLE students;-- (13 votes)
Poll privacy warning: IPs are logged to avoid double-voting. To vote anonymously or double-vote, write me an e-mail.