WWW indexing concerns (was: Gemini Universal Search)



On 2/26/2020 11:29 AM, Sean Conner wrote:
> It was thus said that the Great Andrew Kennedy once stated:
>>
>> So the issue here is that the only way to opt out of being indexed is to
>> contact each proxy maintainer and request that they make accommodations
>> for you. That's fine with only 15 or so gemini servers, but not fair to
>> proxy maintainers as gemini grows. It's also not enough to ask all proxies
>> to use robots.txt, because there's nothing stopping someone from ignoring
>> it either out of ignorance or in bad faith.
> 
>   There are other ways.  One way is to recognize a proxy server and block
> any requests from it.
This is preferable to me, just blocking it at the firewall level, but it
becomes administratively cumbersome as critical mass is achieved if a
curated list of proxies isn't available. If someone does maintain such a
list, it could just be popped into ipsets to keep the rulesets to a
minimum.
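For anyone curious what that would look like in practice, here's a rough sketch. The set name and the curated list file are my own inventions, and of course this needs root and an ipset-capable kernel:

```shell
# Sketch only: assumes ipset and iptables are installed, run as root.
# "proxy-list.txt" is a hypothetical curated list, one IP per line.

# Create a hash set to hold known web-proxy addresses.
ipset create gemini-proxies hash:ip

# Populate the set from the curated list.
while read -r ip; do
    ipset add gemini-proxies "$ip"
done < proxy-list.txt

# A single rule then covers the whole set, keeping the ruleset small.
iptables -A INPUT -p tcp --dport 1965 \
    -m set --match-set gemini-proxies src -j DROP
```

Updating the list is then just a matter of adding or removing entries in the set, without touching the iptables rules at all.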

I don't want ANYONE being able to access any of my Gemini servers via a
browser that doesn't support Gemini either natively, or via a plug-in.
I've been quite vocal and adamant about this in the Gopher community for
well over a decade. To me, though apparently not to most folks, it
defeats the purpose of developing unique content in Gopher/Gemini space,
and the incentive to do so, since someone is simply accessing it via
HTTP anyway.

The problem with this method is that, let's say, there's a GUS server
attempting to spider me on TCP 1965, but there's also some infernal HTTP
<-> Gemini proxy trying to access content on my Gemini servers from the
same IP. I end up with an uncomfortable choice, because I want to be
indexed by GUS, but I don't want to allow anyone to use the World Wide
Web to access my content.


>   A second one is to extend robots.txt to indicate proxying preference, or
> some other file, but then there are multiple requests (or maybe
> not---caching information could be included). 

Ah yes, in a perfect world, Sean :)
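In that perfect world, the extension you describe might look something like this in robots.txt (purely hypothetical syntax, since no such directive exists in any standard today):

```
# Hypothetical robots.txt extension -- not a real standard, just
# one possible shape for expressing a proxying preference.
User-agent: *
Disallow:

# Proposed: forbid HTTP <-> Gemini proxying entirely.
Proxying: no
```

A well-behaved proxy would fetch this once, honor the directive, and cache it so we don't pay for the extra request on every hit.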



> Heck, even a DNS record (like
> a TXT RR with the contents "v=Gemini; proxy=no" with the TTL of the DNS
> record being honored).  But that relies upon the good will of the proxy to
> honor that data.

Again, in a perfect world ;) Either of these solutions (a TXT RR or
utilizing robots.txt) would be ideal, apart from the concerns about
extra traffic/requests.
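Just to make the TXT RR idea concrete, here's how a proxy might parse the payload you suggested. The tag names and semantics are only a sketch from this thread, not any published specification, and an actual proxy would first fetch the record with something like `dig TXT example.com`:

```python
# Hypothetical parser for the proposed TXT record payload,
# e.g. "v=Gemini; proxy=no".  Tags are a sketch, not a standard.

def parse_gemini_txt(payload: str) -> dict:
    """Split a 'key=value; key=value' TXT payload into a dict."""
    fields = {}
    for part in payload.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields

record = parse_gemini_txt("v=Gemini; proxy=no")
# A cooperating proxy would decline to serve the site when
# record.get("proxy") == "no", re-checking after the record's TTL.
```

As you say, though, all of this still relies on the good will of the proxy operator.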

Right now everyone, for the most part, is on this list, and the good
folks here are inclined to adhere to such friendly standards, but moving
forward as adoption builds like a snowball rolling down the mountain,
there will invariably be bad actors coming online.

One consideration worth mentioning is that, at least in my case, I tend
to have A and AAAA RRs point to a single host, and rely upon the
listening ports to determine which protocols are used to serve the
appropriate data. The way you suggested using the TXT RR would work fine
in this case, however :)

-- 
Bradley D. Thornton
Manager Network Services
http://NorthTech.US
TEL: +1.310.421.8268

---

Previous in thread (2 of 6): 🗣️ Sean Conner (sean (a) conman.org)

Next in thread (4 of 6): 🗣️ Steve Ryan (stryan (a) saintnet.tech)
