
Logging format for Gemini servers

Anna “CyberTailor” cyber at sysrq.in

Sun Jul 18 23:22:49 BST 2021

- - - - - - - - - - - - - - - - - - - 

Hello everyone, today I'd like to talk about access logs.

Almost every HTTP server uses the NCSA Common Log Format (or its superset, the Combined Log Format). This is very cool, because developers of misc utilities (like fail2ban or monitoring tools) don't need to bother writing log parsers for each server.

Example log entry

.---------------------- IP address of the client which made the request
|         .------------ RFC 1413 identity (always "-" in practice)
|         | .---------- authorized user ID (as in .htpasswd file)
|         | |     .---- datetime string [%d/%b/%Y:%H:%M:%S %z]
|         | |     |
|         | |     |
*         * *     *
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
                                                                          *   *    *
                                                                          |   |    |
                                                                          |   |    |
                       HTTP method, resource and protocol version --------.   |    |
                       HTTP status code returned to the client ---------------.    |
                       number of bytes of data transferred (without headers) ------.

References:

https://en.wikipedia.org/wiki/Common_Log_Format
https://publib.boulder.ibm.com/tividd/td/ITWSA/ITWSA_info45/en_US/HTML/guide/c-logs.html#common
https://www.loganalyzer.net/log-analyzer/apache-common-log.html
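
Because the format is this regular, a single regular expression is enough to read an entry back. Here is a minimal Python sketch of such a parser, not taken from any existing tool; the names CLF_RE and parse_clf are made up for illustration, and the three-digit status pattern assumes HTTP.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Servers may log "-" when no body was sent; treat that as zero bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


entry = parse_clf(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
print(entry["authuser"], entry["status"], entry["bytes"])  # frank 200 2326
```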

Adaptability

If you look at the Gophernicus code, it uses the Combined Log Format, which is nice but confusing (I mean, seeing the "HTTP/1.0" string and HTTP status codes in a Gopher server's log feels weird); however, compatibility is worth it.

I think the Common Log Format can be applied to Gemini too. The only problem is that this format does not include <META>. Also, it won't look good in syslog because of the double datetime.

Let's review the syntax:

host ident authuser date request status bytes

Everything is obvious except authuser. I suggest using the last 7 characters of the client certificate's SHA-1 hash (git has shown that this is enough).
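
To make the proposal concrete, here is a rough Python sketch of how a server could build such an entry. Everything in it is an assumption made for illustration: the function names, logging the requested URL as the request field, and writing "-" when no client certificate was presented. The DER bytes could come from something like ssl.SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
from datetime import datetime, timezone


def cert_authuser(cert_der):
    """Last 7 hex characters of the certificate's SHA-1 hash, or "-"."""
    if not cert_der:
        return "-"  # assumption: no client certificate means no authuser
    return hashlib.sha1(cert_der).hexdigest()[-7:]


def format_entry(host, ident, cert_der, when, request, status, nbytes):
    """host ident authuser [date] "request" status bytes"""
    # %b gives English month names as long as the process locale is untouched.
    date = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    authuser = cert_authuser(cert_der)
    return f'{host} {ident} {authuser} [{date}] "{request}" {status} {nbytes}'


when = datetime(2021, 7, 18, 23, 22, 49, tzinfo=timezone.utc)
print(format_entry("127.0.0.1", "-", None, when,
                   "gemini://example.space/", 20, 2326))
# 127.0.0.1 - - [18/Jul/2021:23:22:49 +0000] "gemini://example.space/" 20 2326
```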

RFC 1413: Ident protocol

If you run a web server, you probably understand how useful User-Agent is for identifying robots visiting your website.

Thankfully, Gemini doesn't require client identification, as there are no compatibility issues between different Gemini clients. But that makes learning anything about robots very hard for capsule operators :(

I appreciate Stéphane Bortzmeyer for including additional info in robots.txt requests:

gemini://example.space/robots.txt?robot=true&uri=gemini://gemini.bortzmeyer.org/software/lupa/

I'd like to suggest one more solution to this problem (so we have 15 competing standards later).

Let's suppose Yuri runs a Gemini server, and Sergei runs a Gemini search engine *AND* an identd server, for example, fakeidentd:

http://www.guru-group.fi/~too/sw/ A static, secure identd. One source file only!

Sergei's crawler makes a request to Yuri's server. Yuri's server sends an ident query to Sergei's identd server, reads the response and writes the access log. Yuri reads 'celestial-crawler' in the logs and gets excited about his capsule getting indexed.
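
On the wire this exchange is tiny. Per RFC 1413, Yuri's server connects to TCP port 113 on Sergei's host and sends "<port on Sergei's side>, <port on Yuri's side>"; the reply carries either USERID or ERROR. Below is a minimal blocking Python sketch of Yuri's side. The function names, the 2-second timeout and the choice to log "-" on any failure are assumptions, not part of the RFC.

```python
import socket


def build_query(peer_port, local_port):
    """Ident query: the port on the identd host first, then our own port."""
    return f"{peer_port}, {local_port}\r\n".encode("ascii")


def parse_reply(reply):
    """Return the user ID from an ident reply, or "-" if there is none."""
    parts = reply.decode("ascii", "replace").strip().split(":", 3)
    if len(parts) == 4 and parts[1].strip() == "USERID":
        # Log fields are space-separated, so keep the ident a single token.
        return "_".join(parts[3].split()) or "-"
    return "-"


def ident_lookup(peer_host, peer_port, local_port, timeout=2.0):
    """Ask the client's identd who owns the connection; "-" on any failure."""
    try:
        with socket.create_connection((peer_host, 113), timeout=timeout) as sock:
            sock.sendall(build_query(peer_port, local_port))
            reply = sock.makefile("rb").readline(1024)
    except OSError:  # refused, unreachable, timed out...
        return "-"
    return parse_reply(reply)


# 1965 is the default Gemini port; 54321 stands in for the crawler's source port.
print(parse_reply(b"54321, 1965 : USERID : UNIX : celestial-crawler\r\n"))
# celestial-crawler
```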

Upsides:
* looks cool and fun
* opt-in
* actually standardized
* 'ident' field can be logged every time a request is made
* human visitors can leave their names in server logs so Geminispace feels more comfy and personal

https://tvtropes.org/pmwiki/pmwiki.php/Main/KilroyWasHere

Downsides:
* identd probably won't work behind an ISP's NAT
* requires writing asynchronous or threaded server code to avoid blocking the main thread (although separating logger and listener processes is a good idea as it's more secure)
* default fail2ban filters rely on the 'ident' field always being "-"
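
As for the blocking problem, one possible shape of the asynchronous variant is a lookup wrapped in a timeout, so a silent or firewalled identd can never stall request handling. This is only a sketch using Python's asyncio; the name ident_field, the 2-second timeout and the ident_port parameter (there so it can be tried against an unprivileged port) are assumptions.

```python
import asyncio


async def ident_field(peer_host, peer_port, local_port,
                      ident_port=113, timeout=2.0):
    """Look up the RFC 1413 user ID without blocking; "-" on any failure."""

    async def query():
        reader, writer = await asyncio.open_connection(peer_host, ident_port)
        try:
            writer.write(f"{peer_port}, {local_port}\r\n".encode("ascii"))
            await writer.drain()
            return await reader.readline()
        finally:
            writer.close()

    try:
        # The timeout covers connecting, sending and reading together.
        reply = await asyncio.wait_for(query(), timeout)
    except (OSError, asyncio.TimeoutError, ValueError):
        return "-"

    parts = reply.decode("ascii", "replace").strip().split(":", 3)
    if len(parts) == 4 and parts[1].strip() == "USERID":
        # Keep the ident a single token so the log line stays parseable.
        return "_".join(parts[3].split()) or "-"
    return "-"
```

A connection handler could await ident_field(...) just before writing the log entry, or start it as a task alongside serving the response, so the visitor never waits on the ident lookup.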

What are your thoughts?
Feel free to ask questions 🙃