💾 Archived View for gemi.dev › gemini-mailing-list › 000511.gmi captured on 2023-12-28 at 15:48:23. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
It's not threatening my server or anything, but whoever is responsible
for the client at 198.12.83.123, your client is currently stuck in the
Redirection From Hell test and has been for some time.  From the length of
time, it appears to be running autonomously, so perhaps it's a leftover
thread, or an autonomous client that doesn't read robots.txt, or one that
didn't follow the spec carefully enough.

  Anyway, just a heads up.

  -spc
It was thus said that the Great Sean Conner once stated:
> 
>   It's not threatening my server or anything, but whoever is responsible
> for the client at 198.12.83.123, your client is currently stuck in the
> Redirection From Hell test and has been for some time.  From the length of
> time, it appears to be running autonomously, so perhaps it's a leftover
> thread, or an autonomous client that doesn't read robots.txt, or one that
> didn't follow the spec carefully enough.
> 
>   Anyway, just a heads up.
> 
>   -spc

  Sorry, this was sent to an additional address by mistake, and said other
address has no relation to what the client is doing.

  -spc
(Resent to the list, not just spc)

> 198.12.83.123

$ dig -x 198.12.83.123 +short
phlox.titmouse.org.

https://phlox.titmouse.org/about.html mentions a Discordian by
the name of "Benedict T. Eyen, the T stands for Teeth." Unfortunately
there's no email address, so we can't contact him. Curiously, there
is a Gemini server running on that domain, but accessing it just gives
me Proxy Request Refused.

makeworld
It was thus said that the Great Sean Conner once stated:
> 
>   It's not threatening my server or anything, but whoever is responsible
> for the client at 198.12.83.123, your client is currently stuck in the
> Redirection From Hell test and has been for some time.  From the length of
> time, it appears to be running autonomously, so perhaps it's a leftover
> thread, or an autonomous client that doesn't read robots.txt, or one that
> didn't follow the spec carefully enough.
> 
>   Anyway, just a heads up.
> 
>   -spc

  So the client in question was most likely a web proxy.  I'm not sure what
site, nor the software used, but it did respond to a Gemini request with
"53 Proxy Request Refused", so there *is* a Gemini server there.  And the
fact that it made 137,060 requests before I shut down my own server told me
that it was an autonomous agent that no one was watching.  Usually, I may
see a client hit 20 or 30 times before it stops.  Not this one.

  Now granted, my server is a bit unique in that I have tests set up
specifically for clients to test against, and several of them involve
infinite redirects.  And yes, that was 137,060 *unique* requests.

  So first up, Solderpunk, if you could please add a redirection follow
limit to the specification and make it mandatory.  You can specify some
two, heck, even three digit number to follow, but please, *please*, add it
to the specification and *not* just the best practices document, to make
programmers aware of the issue.  It seems like it's too easy to overlook
this potential trap (I see it often enough).

  Second, had the proxy in question fetched robots.txt, it would have seen
that I had this area specifically marked out:

User-agent: *
Disallow: /test/redirehell

  I have that for a reason, and had the autonomous client in question read
it, this wouldn't have happened in the first place.  Even if you disagree
with this, it may be difficult to stop an autonomous agent once the user of
said web proxy has dropped the web connection.  I don't know, I haven't
written a web proxy, and this is one more thing to keep in mind when
writing one.  I think it would be easier to follow robots.txt.

  -spc (To the person who called me a dick for blocking a web proxy---yes,
       there *are* reasons to block them)
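For anyone writing a client, the cap Sean is asking for fits in a few lines. Below is a minimal sketch, assuming a hypothetical gemini_request() helper that performs a single Gemini transaction and returns (status, meta, body); the limit of 5 hops is an arbitrary illustration, not a number taken from the spec.

```
# Minimal redirect cap for a Gemini client (sketch).
# gemini_request(url) is assumed to do one request/response cycle and
# return (status, meta, body); it is not a real library call.
from urllib.parse import urljoin

MAX_REDIRECTS = 5   # illustrative; use whatever the spec ends up mandating

def fetch(url, gemini_request):
    for _ in range(MAX_REDIRECTS + 1):
        status, meta, body = gemini_request(url)
        if status // 10 != 3:          # not a 3x redirect: we're done
            return url, status, meta, body
        url = urljoin(url, meta)       # 3x: meta carries the target URL
    raise RuntimeError("too many redirects, stopped at " + url)
```

With something like this in the fetch path, the Redirection From Hell test bottoms out after a handful of requests instead of 137,060.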
November 29, 2020 9:25 PM, "Sean Conner" <sean at conman.org> wrote:

> It was thus said that the Great Sean Conner once stated:
>> 
>>   It's not threatening my server or anything, but whoever is responsible
>> for the client at 198.12.83.123, your client is currently stuck in the
>> Redirection From Hell test and has been for some time.  From the length of
>> time, it appears to be running autonomously, so perhaps it's a leftover
>> thread, or an autonomous client that doesn't read robots.txt, or one that
>> didn't follow the spec carefully enough.
>> 
>>   Anyway, just a heads up.
>> 
>>   -spc
> 
>   So the client in question was most likely a web proxy.  I'm not sure what
> site, nor the software used, but it did respond to a Gemini request with
> "53 Proxy Request Refused", so there *is* a Gemini server there.  And the
> fact that it made 137,060 requests before I shut down my own server told me
> that it was an autonomous agent that no one was watching.  Usually, I may
> see a client hit 20 or 30 times before it stops.  Not this one.
> 
>   Now granted, my server is a bit unique in that I have tests set up
> specifically for clients to test against, and several of them involve
> infinite redirects.  And yes, that was 137,060 *unique* requests.
> 
>   So first up, Solderpunk, if you could please add a redirection follow
> limit to the specification and make it mandatory.  You can specify some
> two, heck, even three digit number to follow, but please, *please*, add it
> to the specification and *not* just the best practices document, to make
> programmers aware of the issue.  It seems like it's too easy to overlook
> this potential trap (I see it often enough).
> 
>   Second, had the proxy in question fetched robots.txt, it would have seen
> that I had this area specifically marked out:
> 
> User-agent: *
> Disallow: /test/redirehell
> 
>   I have that for a reason, and had the autonomous client in question read
> it, this wouldn't have happened in the first place.  Even if you disagree
> with this, it may be difficult to stop an autonomous agent once the user of
> said web proxy has dropped the web connection.  I don't know, I haven't
> written a web proxy, and this is one more thing to keep in mind when
> writing one.  I think it would be easier to follow robots.txt.
> 
> -spc (To the person who called me a dick for blocking a web proxy---yes,
> there *are* reasons to block them)

I recently wrote a gemini to web proxy as a simple side project to see how
easy it would be to create, and one thing I implemented that I feel should
be a standard for web proxies is not handling redirects internally.  If you
tell my gemini proxy to request a page that offers a redirect (say, the
next page link for LEO), it will send you back a small web page saying
"hey, the site at this URL wants to send you to this other URL, do you
want to follow that redirect or nah?" (not exact wording, but you get my
drift).  That is, if you attempt to access the Redirection from Hell test
using my proxy, each and every redirect would be a "confirm redirect" page
served to the user.  After about 20 pages, you'd think the user would catch
on.  That being said, my gemini proxy is not linked anywhere on my website
(and if it were in a place I would link publicly, I would use robots.txt
to prevent web crawlers from accessing it), so perhaps I'm not the target
of this message.

I still maintain that a proxy is a direct agent of a user, and not an
automated client.  Proxy authors should use robots.txt on the web side to
block crawlers from accessing the proxy, but proxies shouldn't have to
follow robots.txt.

It's actually easier to just write your web proxy in such a way that this
doesn't happen to you.

Just my two cents,
Robert "khuxkm" Miles
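Robert's scheme needs very little code in the proxy's request handler. This is only a sketch of the idea, not his proxy: the /proxy?url=... layout, the gemini_request() helper, and render_gemtext() are all invented for the example.

```
# Proxy handler that refuses to follow redirects itself (sketch).
# A 3x response becomes an HTML confirmation page, so an infinite redirect
# chain costs the human one click per hop instead of running unattended.
from html import escape
from urllib.parse import quote, urljoin

def handle(url, gemini_request, render_gemtext):
    status, meta, body = gemini_request(url)   # hypothetical one-shot fetch
    if status // 10 == 3:                      # redirect: ask, don't follow
        target = urljoin(url, meta)
        page = ("<p>{} wants to redirect you to {}.</p>"
                '<p><a href="/proxy?url={}">Follow the redirect</a></p>')
        return page.format(escape(url), escape(target),
                           quote(target, safe=""))
    return render_gemtext(body)                # normal content path
```

The human in the loop becomes the rate limiter, which is why the Redirection From Hell test stalls after a handful of clicks under this design.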
On Sun, Nov 29, 2020 at 4:15 PM <colecmac at protonmail.com> wrote:
>
> (Resent to the list, not just spc)
>
> > 198.12.83.123
>
> $ dig -x 198.12.83.123 +short
> phlox.titmouse.org.
>
> https://phlox.titmouse.org/about.html mentions a Discordian by
> the name of "Benedict T. Eyen, the T stands for Teeth." Unfortunately
> there's no email address, so we can't contact him. Curiously, there
> is a Gemini server running on that domain, but accessing it just gives
> me Proxy Request Refused.
>
> makeworld

I'm not trying to call anyone out, but since this situation has come up on
the mailing list a few times, it's helpful to look at the TLS cert.

```
openssl s_client -connect 198.12.83.123:1965
CONNECTED(00000003)
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
verify return:1
depth=0 CN = lignumvitae.org
verify return:1
---
Certificate chain
 0 s:/CN=lignumvitae.org
   i:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
 1 s:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3
```

gemini://lignumvitae.org/ returns the same proxy error, but they're using a
Let's Encrypt cert, so presumably they have an HTTPS server running too.

https://lignumvitae.org/ redirects to https://gj.libraryoferis.org/

From there, I recognized library of eris as one of the early (and totally
awesome) gemini servers, so I tried gemini://libraryoferis.org/

There's an email address listed on that capsule.

- Michael
It was thus said that the Great Michael Lazar once stated:
> On Sun, Nov 29, 2020 at 4:15 PM <colecmac at protonmail.com> wrote:
> >
> > (Resent to the list, not just spc)
> >
> > > 198.12.83.123
> >
> > $ dig -x 198.12.83.123 +short
> > phlox.titmouse.org.
> >
> > https://phlox.titmouse.org/about.html mentions a Discordian by
> > the name of "Benedict T. Eyen, the T stands for Teeth." Unfortunately
> > there's no email address, so we can't contact him. Curiously, there
> > is a Gemini server running on that domain, but accessing it just gives
> > me Proxy Request Refused.
> >
> > makeworld
> 
> I'm not trying to call anyone out, but since this situation has come up on
> the mailing list a few times, it's helpful to look at the TLS cert.

  It never occurred to me to look at the certificate [1].

> There's an email address listed on that capsule.

  Thanks for the information.

  -spc

[1] I had been away for hours at that point, and came back to the web
    proxy *still* going, a bunch of ssh bots attempting to log onto the
    server, and my automatic blocking system not fully working [2], so I
    was a bit preoccupied at the time.

[2] Checking 0.0.0.0 and 0.0.0.0/0 are two entirely different things.
It was thus said that the Great Robert khuxkm Miles once stated:
> November 29, 2020 9:25 PM, "Sean Conner" <sean at conman.org> wrote:
> 
> > It was thus said that the Great Sean Conner once stated:
> >> 
> >>   It's not threatening my server or anything, but whoever is responsible
> >> for the client at 198.12.83.123, your client is currently stuck in the
> >> Redirection From Hell test and has been for some time.  From the length
> >> of time, it appears to be running autonomously, so perhaps it's a
> >> leftover thread, or an autonomous client that doesn't read robots.txt,
> >> or one that didn't follow the spec carefully enough.
> >> 
> >>   Anyway, just a heads up.
> >> 
> >>   -spc
> > 
> >   So the client in question was most likely a web proxy.  I'm not sure
> > what site, nor the software used, but it did respond to a Gemini request
> > with "53 Proxy Request Refused", so there *is* a Gemini server there.
> > And the fact that it made 137,060 requests before I shut down my own
> > server told me that it was an autonomous agent that no one was watching.
> > Usually, I may see a client hit 20 or 30 times before it stops.  Not
> > this one.
> > 
> >   Now granted, my server is a bit unique in that I have tests set up
> > specifically for clients to test against, and several of them involve
> > infinite redirects.  And yes, that was 137,060 *unique* requests.
> > 
> >   So first up, Solderpunk, if you could please add a redirection follow
> > limit to the specification and make it mandatory.  You can specify some
> > two, heck, even three digit number to follow, but please, *please*, add
> > it to the specification and *not* just the best practices document, to
> > make programmers aware of the issue.  It seems like it's too easy to
> > overlook this potential trap (I see it often enough).
> > 
> >   Second, had the proxy in question fetched robots.txt, it would have
> > seen that I had this area specifically marked out:
> > 
> > User-agent: *
> > Disallow: /test/redirehell
> > 
> >   I have that for a reason, and had the autonomous client in question
> > read it, this wouldn't have happened in the first place.  Even if you
> > disagree with this, it may be difficult to stop an autonomous agent once
> > the user of said web proxy has dropped the web connection.  I don't
> > know, I haven't written a web proxy, and this is one more thing to keep
> > in mind when writing one.  I think it would be easier to follow
> > robots.txt.
> > 
> > -spc (To the person who called me a dick for blocking a web proxy---yes,
> > there *are* reasons to block them)
> 
> I recently wrote a gemini to web proxy as a simple side project to see how
> easy it would be to create, and one thing I implemented that I feel should
> be a standard for web proxies is not handling redirects internally.  If you
> tell my gemini proxy to request a page that offers a redirect (say, the
> next page link for LEO), it will send you back a small web page saying
> "hey, the site at this URL wants to send you to this other URL, do you
> want to follow that redirect or nah?" (not exact wording, but you get my
> drift).  That is, if you attempt to access the Redirection from Hell test
> using my proxy, each and every redirect would be a "confirm redirect" page
> served to the user.  After about 20 pages, you'd think the user would catch
> on.  That being said, my gemini proxy is not linked anywhere on my website
> (and if it were in a place I would link publicly, I would use robots.txt
> to prevent web crawlers from accessing it), so perhaps I'm not the target
> of this message.
  You, specifically, weren't the target of my last bit; I am addressing in
general those who write web proxies for Gemini.  Your proxy's method of
handling redirects works.  I was just a bit upset that an agent out there
made 137,000 requests [1] before anyone did anything about it.

> I still maintain that a proxy is a direct agent of a user, and not an
> automated client.  Proxy authors should use robots.txt on the web side to
> block crawlers from accessing the proxy, but proxies shouldn't have to
> follow robots.txt.

  I understand the argument, but I can't say I'm completely on board with
it either, because ...

> It's actually easier to just write your web proxy in such a way that this
> doesn't happen to you.

you would probably be amazed at just how often clients *don't* limit
following redirects.  Most of the time, someone is sitting there watching
their client; they stop it after perhaps 30 seconds, fix the first redirect
issue (a redirect back to itself), only to get trapped at the next step.

  And to think I brought this upon myself for wanting redirects in the
first place.

  -spc (How ironic)

[1] For the record, it was NOT placing an undue burden on my server, just
    cluttering the log files.  It's only an issue when the log file gets to
    2G in size; at that point logging stops for everything.
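The failure mode Sean describes here, fixing the self-redirect and then falling into a longer cycle, is caught by remembering every URL visited during a single fetch rather than only the previous one. A sketch, reusing the same hypothetical gemini_request() helper as above:

```
# Catch redirect cycles of any length, not just "redirects to itself",
# by tracking every URL seen during this one fetch (sketch).
from urllib.parse import urljoin

def fetch_no_loops(url, gemini_request, max_hops=5):
    seen = set()
    while url not in seen and len(seen) <= max_hops:
        seen.add(url)
        status, meta, body = gemini_request(url)
        if status // 10 != 3:          # not a redirect: return the response
            return url, status, meta, body
        url = urljoin(url, meta)       # follow to the next URL in the chain
    raise RuntimeError("redirect loop or too many hops at " + url)
```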
It was thus said that the Great Sean Conner once stated:
> 
> you would probably be amazed at just how often clients *don't* limit
> following redirects.  Most of the time, someone is sitting there watching
> their client; they stop it after perhaps 30 seconds, fix the first
> redirect issue (a redirect back to itself), only to get trapped at the
> next step.

  I want to clarify that this usually happens when a new client is being
tested.  Somehow, the web proxy in question wasn't tested.

  -spc