On Wed, Feb 26, 2020 at 07:54:35PM -0800, Bradley D. Thornton wrote:

> This is preferable to me, just blocking it at the firewall level, but
> does become administratively cumbersome as critical mass is achieved
> and a curated list of proxies isn't available - if someone does
> maintain such a list, it could just be popped into ipsets to keep the
> rulesets to a minimum.

I am happy to add something to the Best Practices document regarding
HTTP proxies, which could include a polite request to inform me of
proxies and their IP addresses so I can maintain a master list
somewhere, as well as a strong admonition to serve a robots.txt which
prevents web crawlers from slurping up Gemini content.

> I don't want ANYONE being able to access any of my Gemini servers via
> a browser that doesn't support Gemini either natively, or via a
> plug-in. I've been quite vocal and adamant about this in the Gopher
> community for well over a decade - to me, but not most folks
> apparently, it defeats the purpose of, and incentive to, develop
> unique content in Gopher/Gemini space, since someone is simply
> accessing it via HTTP anyway.

I understand this sentiment, but at the end of the day it's literally
impossible to prevent this. It's part and parcel of serving digital
content to universal machines owned and operated by other people - you
lose all control over things like this. As was posted previously,
attempts to regain control with things like DRM just turn into arms
races that make life harder for legitimate users. I'm in favour of
leaving things at a straightforward "gentleman's agreement".

> The problem with this method is that, let's say, there's a GUS server
> attempting to spider me on TCP 1965, but there's also some infernal
> HTTP <-> Gemini proxy trying to access content on my Gemini servers
> from the same IP. I end up with an uncomfortable choice because I
> want to be indexed by GUS, but I don't want to allow anyone to use
> the World Wide Web to access my content.
>
> > A second one is to extend robots.txt to indicate proxying
> > preference, or some other file, but then there are multiple
> > requests (or maybe not---caching information could be included).

Extending robots.txt to do this seems fairly straightforward. We could
introduce "pseudo user-agents" like "proxy/*", "indexer/*", etc. which
all user agents of a particular type should respect.

Cheers,
Solderpunk
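
P.S. For concreteness, a robots.txt along those lines might look
something like the sketch below. The "proxy/*" and "indexer/*" tokens
are just the illustrative names from above, and the policy shown (shut
out HTTP proxies, let indexers like GUS crawl everything) is one
possible choice rather than a settled spec:

    # Disallow HTTP <-> Gemini proxies entirely
    User-agent: proxy/*
    Disallow: /

    # Allow indexers (e.g. GUS) to crawl everything
    User-agent: indexer/*
    Disallow: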