The Common Logging Format as defined by Apache and other HTTP servers contains one line per client request, divided into seven fields separated by a single space. They are:

1) IP address, either IPv4 or IPv6.
2) Hostname of the client, or "-" if not known.
3) Name of the user, or "-" if not known.
4) Date in square brackets, in the form [10/Oct/2000:13:55:36 -0700].
5) Request line in double quotes.
6) Status code of response.
7) Number of bytes in the response body, or 0 if none.

I think there are two reasonable approaches to adapting this format to Gemini: the "as compatible as possible" or "ACAP" approach, and the "literal" approach. In either approach, fields 1 and 7 are just as in HTTP, and fields 2 and 3 are just "-".

In the ACAP approach, field 4 uses the date format above, field 5 contains GET followed by the path segment of the URL followed by HTTP/1.1 (all space separated), and field 6 contains the Gemini code converted to an equivalent HTTP code (e.g. 20 becomes 200). I'll work out the full equivalence later if people like this.

In the literal approach, field 4 is in ISO 8601 (RFC 3339) format, field 5 is the URL request line (no quotes needed), and field 6 is the Gemini status code unconverted.

The advantage of the ACAP approach is that it allows existing HTTP log analyzers to be used. The literal approach keeps all available information but will need its own analysis tools. Of course, a server can support both log formats as well as any other formats desired, so the question is which format is Best Practice if only one is provided. It's possible to convert literal format to ACAP format after the fact, but not vice versa.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
"Mr. Lane, if you ever wish anything that I can do, all you will have to do will be to send me a telegram asking and it will be done."
"Mr. Hearst, if you ever get a telegram from me asking you to do anything, you can put the telegram down as a forgery."
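The two adaptations described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the example URL are made up, and the status table contains only the single 20 -> 200 pair given in the message, since the full equivalence has not been worked out.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Only 20 -> 200 comes from the proposal; the rest of the table is still open.
GEMINI_TO_HTTP = {20: 200}


def acap_line(ip, when, url, status, nbytes):
    """Format one entry in the 'as compatible as possible' style."""
    path = urlsplit(url).path or "/"
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")  # month name assumes the C locale
    http_status = GEMINI_TO_HTTP[status]  # KeyError for codes not yet mapped
    return f'{ip} - - [{stamp}] "GET {path} HTTP/1.1" {http_status} {nbytes}'


def literal_line(ip, when, url, status, nbytes):
    """Format one entry in the 'literal' style: RFC 3339 date, full URL, raw status."""
    return f"{ip} - - {when.isoformat()} {url} {status} {nbytes}"


when = datetime(2000, 10, 10, 13, 55, 36, tzinfo=timezone(timedelta(hours=-7)))
print(acap_line("192.0.2.1", when, "gemini://example.org/docs/", 20, 1540))
print(literal_line("192.0.2.1", when, "gemini://example.org/docs/", 20, 1540))
```

The literal line keeps the host and query parts of the URL that the ACAP line drops, which is why conversion only works in one direction.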
Historically, efforts to establish a common logging format for Gemini have not been well received. I think there's already too much diversity out there and server authors are reluctant to change. The question of whether or not IP addresses should be routinely logged also usually proves quite divisive.

I still think there's value in such a standard (as it allows for reusable log processing tools), but I definitely think it's out of scope for the protocol spec proper and belongs in a companion spec. I think those are better suited to [tech] than [spec]?

For the record, I don't like that the Apache format uses spaces as a field separator when spaces also occur inside the date. Molly Brown's log format uses tabs as separators, so it works very nicely with the standard `cut` utility. I use `cut`, `grep`, `sort`, `uniq` and `wc -l` in short pipelines to run queries on my logs, and really enjoy being able to do so.

Cheers,
Solderpunk
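For readers who prefer a script to a shell pipeline, the same kind of query (roughly `cut -f N | sort | uniq -c`) can be sketched in a few lines. The three-column layout used here (timestamp, status, path) is invented for the example and is not meant to describe Molly Brown's actual log layout; `count_column` is a hypothetical helper name.

```python
from collections import Counter


def count_column(lines, column):
    """Count how often each value appears in one tab-separated column."""
    return Counter(
        line.rstrip("\n").split("\t")[column]
        for line in lines
        if line.strip()  # skip blank lines
    )


# Made-up sample entries: timestamp, status, path.
sample = [
    "2020-12-27T20:00:00Z\t20\t/index.gmi\n",
    "2020-12-27T20:00:05Z\t51\t/missing.gmi\n",
    "2020-12-27T20:00:09Z\t20\t/index.gmi\n",
]
for value, hits in count_column(sample, 1).most_common():
    print(hits, value)
```

Because a tab never occurs inside a field, a plain `split("\t")` is enough; the space-separated Apache format would need a real parser to cope with the date and the quoted request.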
On Sun, Dec 27, 2020 at 02:59:02PM -0500, John Cowan wrote:
> On the literal approach, field 4 is ISO 8601 (RFC 3339) format, field 5 is
> the URL request line (no quotes needed), and field 6 is the Gemini status
> code unconverted.

We want to be careful about malicious clients sending a request like '\n<garbage or fake log here>'. Although such a request may fail, it would still show up in the logs and corrupt them. Perhaps the logger should check whether the request line is a proper URL, and if it is not, encode it in some way (perhaps just URL-encoding it, because that function may already be available to the code).

~aravk | ~nothien
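A minimal sketch of that idea, assuming the logger simply percent-encodes every character that is not legal in a URL rather than first validating the request. The helper name `loggable_request` is hypothetical.

```python
from urllib.parse import quote

# Reserved URL characters plus "%" are passed through unchanged; letters,
# digits and "-._~" are always left alone by quote(). Everything else,
# including newlines, tabs and spaces, is percent-encoded.
_URL_SAFE = ":/?#[]@!$&'()*+,;=%"


def loggable_request(raw: str) -> str:
    """Return the request line in a form that cannot break out of its log field."""
    return quote(raw, safe=_URL_SAFE)


print(loggable_request("gemini://example.org/a\n127.0.0.1 fake entry"))
```

One trade-off of leaving "%" untouched is that a raw newline and a client-supplied "%0A" look the same in the log. Well-formed URLs pass through unchanged, so ordinary entries stay readable.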
The main differences with what my server is doing are:
> * I do not log the IP but its sha1 hash, because of privacy concerns

Doesn't this provide no security though? It's trivial to hash all IPv4 addresses and compare them. Additionally, this doesn't provide any security to clients, because they can't guarantee this is in effect.

makeworld
> On Dec 27, 2020, at 20:59, John Cowan <cowan at ccil.org> wrote:
>
> The Common Logging Format as defined by Apache and other HTTP servers contains one line per client request divided into seven fields separated by a single space. They are:

IMO, that should be left to the implementations. Their choice. Whatever is convenient.

It doesn't hurt to point to the Common Logging Format as an FYI, and even to suggest a mapping, the same way the CGI spec has been shoehorned back into Gemini. But that really seems to be an implementation detail.
> On Dec 27, 2020, at 22:48, colecmac at protonmail.com wrote:
>
> Doesn't this provide no security though? It's trivial to hash all IPv4
> addresses and compare them. Additionally, this doesn't provide any
> security to clients, because they can't guarantee this is in effect.

Exactly. Privacy by obscurity is no privacy at all.

Furthermore, TLS leaves a big, fat digital signature trail.

In any case, best to leave such details to the implementations.
> On Dec 27, 2020, at 22:53, Petite Abeille <petite.abeille at gmail.com> wrote:
>
> Furthermore, TLS leaves a big, fat digital signature trail.

Previously on Gemini:
https://tlsfingerprint.io
https://tlsfingerprint.io/static/frolov2019.pdf

Related to:
https://ssd.eff.org/en/module/what-fingerprinting

While Gemini has far fewer moving parts than HTTP, it still has some. I'm not a privacy expert though, so I'm not sure how practical this all is. But a trail is a trail :)
It was thus said that the Great Solderpunk once stated:
>
> For the record, I don't like that the Apache format uses spaces as a
> field separator when spaces also occur inside the date. Molly Brown's
> log format uses tabs as separators, so it works very nicely with the
> standard `cut` utility. I use `cut`, `grep`, `sort`, `uniq` and `wc -l`
> in short pipelines to run queries on my logs, and really enjoy being
> able to do so.

My own logging format is:

remote=XXX.XXX.XXX.XXX status=20 request="gemini://gemini.conman.org/boston/2001/11/13.1" bytes=1540 subject="" issuer=""

(I've redacted the IP address)

The final two fields record information about the client certificate to help debug issues with my server. Here's an example:

remote=XXX.XXX.XXX.XXX status=20 request="gemini://gemini.conman.org/private/" bytes=333 subject="/CN=default" issuer="/CN=default"

I did not change the subject or issuer. It's been interesting to see what's being sent in client certificates.

-spc
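A key=value layout like this one is also easy to pick apart in a script. The sketch below is not the server's own tooling; it assumes that values are either bare words or double-quoted strings, and that shell-style quoting rules are close enough (how the server escapes a quote inside a value is not stated above). `parse_entry` is a hypothetical helper name.

```python
import shlex


def parse_entry(line):
    """Split one key=value log line into a dict of strings."""
    fields = {}
    for token in shlex.split(line):  # handles the double-quoted values
        key, _, value = token.partition("=")  # split on the first "=" only
        fields[key] = value
    return fields


entry = parse_entry(
    'remote=XXX.XXX.XXX.XXX status=20 '
    'request="gemini://gemini.conman.org/private/" bytes=333 '
    'subject="/CN=default" issuer="/CN=default"'
)
print(entry["status"], entry["request"], entry["subject"])
```

Splitting on the first "=" only matters for the certificate fields, whose values themselves contain "=".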
On Sunday, 27 December 2020, 22:48:13 CET, colecmac at protonmail.com wrote:
> > * I do not log the IP but its sha1 hash, because of privacy concerns
>
> Doesn't this provide no security though? It's trivial to hash all IPv4
> addresses and compare them. Additionally, this doesn't provide any
> security to clients, because they can't guarantee this is in effect.

It's not for clients, it's for me. I'm not sure what I am legally allowed to do with IPs, so I feel more confident not storing them.

I sha1 IPs the same whether they are v4 or v6. It may indeed be easy to do a dictionary attack for v4 log entries, but I'm not sure what I can do about that.

Côme
> On Dec 28, 2020, at 12:45, Côme Chilliet <come at chilliet.eu> wrote:
>
> I sha1 IPs the same whether they are v4 or v6. It may indeed be easy to do a dictionary attack for v4 log entries, but I'm not sure what I can do about that.

See https://en.wikipedia.org/wiki/Rainbow_table#Defense_against_rainbow_tables
On Sun, 27 Dec 2020 21:39:41 +0100 Côme Chilliet <come at chilliet.eu> wrote:
> * I do not log the IP but its sha1 hash, because of privacy concerns

Please note that a table of the SHA-1 hashes of the entire IPv4 address space is only ~80 GiB, and that such a measure can easily be reversed unless each entry is individually salted before hashing (after which comparing hashes across log entries is useless), even if I have to resort to searching the whole IPv4 address space.

You should *not* depend on this measure where you have a real need for privacy.

--
Philip
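The ~80 GiB figure follows from 2^32 addresses times 20 bytes per SHA-1 digest. As a rough sketch of one way to blunt that kind of lookup, the code below uses a keyed hash (HMAC-SHA-256 with a secret key) instead of the per-entry salting mentioned above. That is a different technique: entries stay comparable while the same key is in use, and the protection depends entirely on the key staying secret and being discarded on rotation. The function name and the rotation policy are assumptions, not anything a server in this thread is known to do.

```python
import hashlib
import hmac
import secrets

# 2**32 IPv4 addresses x 20-byte SHA-1 digests = 80 GiB of raw hashes.
assert 2**32 * 20 == 80 * 2**30


def make_ip_pseudonymizer(key=None):
    """Return a function mapping an IP string to a keyed-hash pseudonym.

    The key is random unless supplied; without it, precomputing a table
    over the IPv4 space is not possible.
    """
    key = key or secrets.token_bytes(32)

    def pseudonymize(ip: str) -> str:
        return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

    return pseudonymize


today = make_ip_pseudonymizer()      # e.g. a fresh key per log file
print(today("192.0.2.1") == today("192.0.2.1"))                    # same key: comparable
print(today("192.0.2.1") == make_ip_pseudonymizer()("192.0.2.1"))  # other key: unrelated
```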
On Sun, Dec 27, 2020 at 09:04:01PM +0100, Solderpunk <solderpunk at posteo.net> wrote a message of 20 lines which said:

> The question of whether or not IP addresses should be routinely
> logged also usually proves quite divisive.

By the way, *if* you log IP addresses (this is a big IF), in a world of NAT and CGNAT, you should also log the port, as requested by RFC 6302 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6302.txt>
---