💾 Archived View for rawtext.club › ~sloum › geminilist › 002336.gmi captured on 2020-09-24 at 01:16:29. Gemini links have been rewritten to link to archived content

Getting slammed by a client

Hannu Hartikainen hannu.hartikainen+gemini at gmail.com

Sat Jul 25 09:51:39 BST 2020

- - - - - - - - - - - - - - - - - - - 

Thanks for pointing this out. I never read logs if everything works, so...

I'm getting lots of requests to URLs like

gemini://hannuhartikainen.fi/twinwiki/Welcome,%20visitors%21/twinwiki/Welcome%252C%2520visitors%2521/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_create/twinwiki/_help/twinwiki/_help/twinwiki/_index/twinwiki/_edit/twinwiki/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_help/twinwiki/_history/twinwiki/_help/twinwiki/_create/twinwiki/_history/twinwiki/_history/twinwiki/_history/twinwiki/_create/twinwiki/_create/twinwiki/_index/twinwiki/_history/twinwiki/_edit/twinwiki/_edit/twinwiki/_history/twinwiki/_help/twinwiki/_help/twinwiki/_history/twinwiki/_create/twinwiki/_help/twinwiki/_edit/twinwiki/_index/twinwiki/_history

Oops, I've written bugs once again! I do have this robots.txt, though:

User-agent: gus
Allow: /

User-agent: *
Disallow: /

(I guess I should disallow even gus from twinwiki, or at least any non-content pages.)
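
A rough sketch of that stricter policy: the snippet below feeds a candidate robots.txt to Python's standard urllib.robotparser and checks a few paths. The "Disallow: /twinwiki/" rule, the "somebot" agent name, and the sample paths are assumptions for illustration. Whether the GUS crawler evaluates rules the same way as the standard-library parser (first matching rule wins) is also an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stricter robots.txt: gus may crawl everything except the wiki.
# The Disallow line comes first because urllib.robotparser applies the first
# rule whose path prefix matches.
ROBOTS_TXT = """\
User-agent: gus
Disallow: /twinwiki/
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Sample paths are illustrative, not taken from the real site layout.
for agent, path in [
    ("gus", "/index.gmi"),
    ("gus", "/twinwiki/_edit"),
    ("somebot", "/index.gmi"),
]:
    print(agent, path, parser.can_fetch(agent, path))
```

Narrowing the rule to only the non-content pages would need the real path layout of the wiki, which is not assumed here.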

The crawler also breaks ansi.hrtk.in (which disallows even gus in robots.txt) for other users while it crawls. I couldn't figure out how to make Jetforce stop streaming if the client closes the connection. The code is here if someone has pointers: https://github.com/dancek/ansimirror/blob/master/ansimirror.py
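
For reference, one generic way to stop streaming on disconnect, sketched with plain standard-library sockets. This is not Jetforce's API and assumes nothing about how Jetforce exposes the connection; the function name stream_until_disconnect and the frames iterable are hypothetical. The idea is to send one frame at a time and stop producing as soon as a send fails.

```python
import socket
from typing import Iterable


def stream_until_disconnect(conn: socket.socket, frames: Iterable[bytes]) -> int:
    """Send frames one by one and stop once the peer has gone away.

    Returns the number of frames that were sent without an error.
    """
    sent = 0
    for frame in frames:
        try:
            conn.sendall(frame)
        except (BrokenPipeError, ConnectionResetError):
            # The client closed the connection, so stop generating frames
            # instead of streaming into a dead socket.
            break
        sent += 1
    return sent
```

With TCP, the first send after the peer closes may still succeed, so a loop like this can run for a frame or two before the error surfaces.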

Anyone have experience fighting misbehaving crawlers? Should we develop low-resource honeypots to exhaust crawler resources? Or start maintaining a community blacklist?

-Hannu