Logging format for Gemini servers

Stephane Bortzmeyer stephane at sources.org

Mon Jul 19 07:44:39 BST 2021

- - - - - - - - - - - - - - - - - - - 

On Mon, Jul 19, 2021 at 03:22:49AM +0500, Anna “CyberTailor” <cyber at sysrq.in> wrote a message of 91 lines which said:

Almost every HTTP server uses NCSA Common Log Format (or its
superset - Combined Log Format). This is very cool, because
developers of misc utilities (like fail2ban or monitoring tools)
don't need to bother writing log parsers for each server.

Yes, this is cool but it doesn't mean this format is perfect. The biggest problem is that it logs the source IP address but not the source port. Because of the importance of IP address sharing today in the IPv4 world (RFC 6269 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6269.txt>), logging just the source IP address is a bad idea (RFC 6302 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6302.txt> recommends, *if* you log the source IP address, to log also the source port).
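
As an illustration only, here is a minimal Python sketch of a log field that carries both the source address and the source port. The function name format_peer is hypothetical; its input is assumed to be the address tuple that socket.accept() or getpeername() returns, and the bracketed form for IPv6 is a common convention, not something this thread prescribes.

```python
def format_peer(peer):
    """Render a socket peer address as 'ip:port' for a log line.

    `peer` is the tuple returned by socket.accept() or getpeername():
    (host, port) for IPv4, (host, port, flowinfo, scope_id) for IPv6.
    """
    host, port = peer[0], peer[1]  # ignore the extra IPv6 fields
    if ":" in host:
        # IPv6 literals contain colons, so bracket them to keep the
        # port unambiguous.
        return f"[{host}]:{port}"
    return f"{host}:{port}"


if __name__ == "__main__":
    print(format_peer(("192.0.2.1", 54321)))
    print(format_peer(("2001:db8::1", 54321, 0, 0)))
```

With the port in the log, an operator of a shared IPv4 address has a chance of telling which of their users made the request, which the address alone does not give.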

Also, of course, there is the privacy issue. IMHO, Gemini servers should offer an option to log only the first N bits of the source IP address.
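
A possible way to implement such an option, sketched in Python with the standard ipaddress module. The function name truncate_ip and the default prefix lengths (24 bits for IPv4, 48 for IPv6) are assumptions chosen for the example, not values proposed in this thread.

```python
import ipaddress


def truncate_ip(address, v4_bits=24, v6_bits=48):
    """Keep only the first N bits of an IP address, zeroing the rest.

    The defaults are illustrative; a server would expose N as a
    configuration option.
    """
    ip = ipaddress.ip_address(address)
    bits = v4_bits if ip.version == 4 else v6_bits
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{bits}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("192.0.2.77"))
    print(truncate_ip("2001:db8:1234:5678::1"))
```

Note that a truncated address makes the source port meaningless for identifying a user, so the two options pull in opposite directions and an operator has to pick one.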

| | | .---- datetime string [%d/%b/%Y:%H:%M:%S %z]

RFC 3339 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc3339.txt> format would have been a better idea.
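
To make the difference concrete, here is a small Python sketch that renders the same instant in the Common Log Format layout quoted above and in RFC 3339. The function names are hypothetical, and the sample instant is arbitrary.

```python
from datetime import datetime, timezone


def clf_timestamp(moment):
    """Common Log Format style, using the strftime pattern quoted above.

    %b is locale-dependent, one of the reasons this format is awkward
    to parse reliably.
    """
    return moment.strftime("%d/%b/%Y:%H:%M:%S %z")


def rfc3339_timestamp(moment):
    """RFC 3339 style: numeric fields only, sortable as plain text."""
    return moment.isoformat(timespec="seconds")


if __name__ == "__main__":
    moment = datetime(2021, 7, 19, 6, 44, 39, tzinfo=timezone.utc)
    print(clf_timestamp(moment))
    print(rfc3339_timestamp(moment))
```

The RFC 3339 output sorts chronologically with an ordinary text sort (for a fixed UTC offset) and contains no month names to translate.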

Thankfully, Gemini doesn't require client identification as there're no
compatibility issues between different Gemini clients. But that makes
learning anything about robots very hard for capsule operators :(

Indeed, this is a serious operational problem. There have been some attempts to list all "good" robots somewhere but it was not a success.

I appreciate Stéphane Bortzmeyer for including additional info in
robots.txt requests:
gemini://example.space/robots.txt?robot=true&uri=gemini://gemini.bortzmeyer.org/software/lupa/

Note that it breaks some Gemini servers <https://framagit.org/bortzmeyer/lupa/-/issues/9>.

Downsides:
* identd probably won't work behind ISP's NAT
* requires writing asynchronous or threaded server code to avoid
blocking main thread (although separating logger and listener
processes is a good idea as it's more secure)

Indeed. It seems to me there are serious limitations.