Hello everyone, today I'd like to talk about access logs.

Almost every HTTP server uses NCSA Common Log Format (or its superset - Combined Log Format). This is very cool, because developers of misc utilities (like fail2ban or monitoring tools) don't need to bother writing log parsers for each server.

## Example log entry

> 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Fields, from left to right:
* 127.0.0.1: IP address of the client which made the request
* -: RFC 1413 identity (always "-" in practice)
* frank: authorized user ID (as in the .htpasswd file)
* [10/Oct/2000:13:55:36 -0700]: date and time, formatted as [%d/%b/%Y:%H:%M:%S %z]
* "GET /apache_pb.gif HTTP/1.0": HTTP method, resource and protocol version
* 200: HTTP status code returned to the client
* 2326: number of bytes of data transferred (without headers)

References:
=> https://en.wikipedia.org/wiki/Common_Log_Format
=> https://publib.boulder.ibm.com/tividd/td/ITWSA/ITWSA_info45/en_US/HTML/guide/c-logs.html#common
=> https://www.loganalyzer.net/log-analyzer/apache-common-log.html

## Adaptability

If you look at the Gophernicus code, it uses Combined Log Format, which is nice but confusing (seeing an "HTTP/1.0" string and HTTP status codes in a Gopher server's log feels weird). Still, the compatibility is worth it.

I think Common Log Format can be applied to Gemini too. The only problem is that the format has no field for <META>. Also, it won't look good in syslog, because every line would carry two timestamps (syslog's own and the bracketed one).

Let's review the syntax:
> host ident authuser date request status bytes

Everything is obvious except authuser. I suggest using the last 7 characters of the client certificate's SHA-1 hash (git has shown that this is enough).

## RFC 1413: Ident protocol

If you run a web server, you probably understand how useful the User-Agent header is for identifying robots visiting your website.
Thankfully, Gemini doesn't require client identification, as there are no compatibility issues between different Gemini clients. But that makes learning anything about robots very hard for capsule operators :(

I appreciate Stéphane Bortzmeyer including additional info in robots.txt requests:
> gemini://example.space/robots.txt?robot=true&uri=gemini://gemini.bortzmeyer.org/software/lupa/

I'd like to suggest another solution to this problem (so that we end up with 15 competing standards). Let's suppose Yuri runs a Gemini server, and Sergei runs a Gemini search engine *AND* an identd server, for example fakeidentd:
=> http://www.guru-group.fi/~too/sw/ A static, secure identd. One source file only!

Sergei's crawler makes a request to Yuri's server. Yuri's server sends an ident query to Sergei's identd server, reads the response and writes it to the access log. Yuri reads 'celestial-crawler' in the logs and gets excited about his capsule getting indexed.

Upsides:
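Some missing stuff

### References for ident protocol

Specification:
=> gemini://gemini.bortzmeyer.org/rfc-mirror/rfc1413.txt

Wikipedia article (see the "Software" section for identd servers):
=> https://en.wikipedia.org/wiki/Ident_Protocol

## Sample query

```python
import socket

def ident_query(foreign_host, foreign_port, local_port, timeout=1):
    # Ask the identd on the client's host (TCP port 113) who owns the
    # connection from its port foreign_port to our port local_port.
    with socket.create_connection((foreign_host, 113), timeout=timeout) as client:
        client.sendall(f"{foreign_port},{local_port}\r\n".encode())
        reply = client.recv(4096).decode(errors="replace")
    # "<ports> : USERID : <opsys> : <user-id>" or "<ports> : ERROR : <error>"
    fields = [f.strip() for f in reply.split(":", 3)]
    return fields[3] if len(fields) == 4 and fields[1] == "USERID" else None
```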
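The following is a minimal sketch of how a Gemini server could turn this proposal into a Common Log Format line. The ident field gets the user-id from an RFC 1413 reply. The authuser field gets the last 7 hex characters of the SHA-1 of the client certificate (its DER bytes), or "-" when there is none. The helper names (parse_ident_reply, cert_id, gemini_clf_line) are hypothetical. The field mapping is also an assumption, not something the proposal pins down: the "request" field holds only the requested URL, "status" holds the two-digit Gemini status, and "bytes" holds the body size.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def parse_ident_reply(reply):
    """Return the user-id from an RFC 1413 reply, or None on ERROR/garbage.

    Success looks like "6193, 23 : USERID : UNIX : stjohns";
    failure looks like "6195, 23 : ERROR : NO-USER".
    """
    # maxsplit=3 keeps any ':' inside the user-id itself intact
    fields = [f.strip() for f in reply.split(":", 3)]
    if len(fields) == 4 and fields[1] == "USERID":
        return fields[3]
    return None


def cert_id(cert_der):
    """Hypothetical authuser value: last 7 hex chars of the cert's SHA-1."""
    if cert_der is None:
        return "-"
    return hashlib.sha1(cert_der).hexdigest()[-7:]


def gemini_clf_line(host, ident, cert_der, when, url, status, nbytes):
    """Format one Gemini request as a Common Log Format line.

    Assumed mapping: request = requested URL only, status = two-digit
    Gemini status, nbytes = response body size without the header line.
    """
    return '{} {} {} [{}] "{}" {} {}'.format(
        host,
        ident or "-",
        cert_id(cert_der),
        # %b gives English month names under Python's default C locale
        when.strftime("%d/%b/%Y:%H:%M:%S %z"),
        url,
        status,
        nbytes,
    )


# Example: Sergei's crawler connected from port 53214 to Yuri's port 1965.
when = datetime(2021, 7, 19, 3, 22, 49, tzinfo=timezone(timedelta(hours=5)))
ident = parse_ident_reply("53214, 1965 : USERID : UNIX : celestial-crawler\r\n")
print(gemini_clf_line("192.0.2.7", ident, None, when, "gemini://example.space/", 20, 1234))
# 192.0.2.7 celestial-crawler - [19/Jul/2021:03:22:49 +0500] "gemini://example.space/" 20 1234
```

The reply is split with maxsplit=3 because RFC 1413 lets the user-id contain any octet except NUL, CR and LF, colons included.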
On Mon, Jul 19, 2021 at 03:22:49AM +0500, Anna “CyberTailor” <cyber@sysrq.in> wrote a message of 91 lines which said:

> Almost every HTTP server uses NCSA Common Log Format (or its
> superset - Combined Log Format). This is very cool, because
> developers of misc utilities (like fail2ban or monitoring tools)
> don't need to bother writing log parsers for each server.

Yes, this is cool, but it doesn't mean this format is perfect. The biggest problem is that it logs the source IP address but not the source port. Because of the importance of IP address sharing today in the IPv4 world (RFC 6269 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6269.txt>), logging just the source IP address is a bad idea (RFC 6302 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6302.txt> recommends,
---