💾 Archived View for gemi.dev › gemini-mailing-list › 000541.gmi captured on 2024-12-17 at 15:04:33. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I'm running a Gemini crawler, which gathers metadata about the geminispace. The goal is not to make a search engine but to survey the geminispace. You can find the current results (the crawler has not crawled the entire space yet): gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi The reference site: gemini://gemini.bortzmeyer.org/software/lupa/ The source code, with an issue tracker (bug reports and improvement requests are very welcome): https://framagit.org/bortzmeyer/lupa
It was thus said that the Great Stephane Bortzmeyer once stated: > I'm running a Gemini crawler, which gathers metadata about the > geminispace. The goal is not to make a search engine but to survey the > geminispace. > > You can find the current results (the crawler did not crawl the entire > space yet): > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > > The reference site: > > gemini://gemini.bortzmeyer.org/software/lupa/ > > The source code, with an issue tracker (bug reports and improvement > requests are very welcome): > > https://framagit.org/bortzmeyer/lupa Very cool! Thanks for the work. One stat I haven't seen yet (yours or from GUS) is a breakdown by language: how many pages had a lang parameter, then a breakdown by language, and how many listed multiple languages in one parameter (for example, "lang=en,fr"). -spc
On Wed, Dec 16, 2020 at 06:16:53PM -0500, Sean Conner <sean at conman.org> wrote a message of 27 lines which said: > > You can find the current results (the crawler did not crawl the entire > > space yet): > > > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > One stat I haven't seen yet (yours or from GUS) is a breakdown of > language. How many pages had a lang parameter, then a breakdown by > language, how many multiple languages per parameters (for example, > "lang=en,fr"). Just ask :-) Now done: gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi I note:
On 16-Dec-2020 15:05, Stephane Bortzmeyer wrote: > I'm running a Gemini crawler, which gathers metadata about the > geminispace. The goal is not to make a search engine but to survey the > geminispace. > > You can find the current results (the crawler did not crawl the entire > space yet): > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi (Due to user error, my reply was supposed to go to the list, but was sent privately, so I'm re-posting) This is very interesting, thank you. Would it be possible to show the distribution of page sizes in geminispace? I know you show the average page size, but a better view of what is typical, and of the range, would be good. For example, does it follow a power law? Is there any raw data available? - Luke
> On Dec 16, 2020, at 16:05, Stephane Bortzmeyer <stephane at sources.org> wrote: > > I'm running a Gemini crawler, which gathers metadata about the > geminispace. Along those lines, a couple of one-liners to gather various host & content information: # IP address(es) # dig +short mozz.us 174.138.124.169 # geolocation # curl --silent https://tools.keycdn.com/geo.json?host=174.138.124.169 | jq | gron json = {}; json.data = {}; json.data.geo = {}; json.data.geo.asn = 14061; json.data.geo.city = "North Bergen"; json.data.geo.continent_code = "NA"; json.data.geo.continent_name = "North America"; json.data.geo.country_code = "US"; json.data.geo.country_name = "United States"; json.data.geo.datetime = "2020-12-18 09:04:57"; json.data.geo.host = "174.138.124.169"; json.data.geo.ip = "174.138.124.169"; json.data.geo.isp = "DIGITALOCEAN-ASN"; json.data.geo.latitude = 40.793; json.data.geo.longitude = -74.0247; json.data.geo.metro_code = 501; json.data.geo.postal_code = "07047"; json.data.geo.rdns = "174.138.124.169"; json.data.geo.region_code = "NJ"; json.data.geo.region_name = "New Jersey"; json.data.geo.timezone = "America/New_York"; json.description = "Data successfully received."; json.status = "success"; # certificate info # cfssl certinfo -domain mozz.us | jq | gron json = {}; json.authority_key_id = "A8:4A:6A:63:04:7D:DD:BA:E6:D1:39:B7:A6:45:65:EF:F3:A8:EC:A1"; json.issuer = {}; json.issuer.common_name = "Let's Encrypt Authority X3"; json.issuer.country = "US"; json.issuer.names = []; json.issuer.names[0] = "US"; json.issuer.names[1] = "Let's Encrypt"; json.issuer.names[2] = "Let's Encrypt Authority X3"; json.issuer.organization = "Let's Encrypt"; json.not_after = "2021-01-21T01:36:54Z"; json.not_before = "2020-10-23T01:36:54Z"; json.pem = "-----BEGIN CERTIFICATE-----\nMIIGJzCCBQ+gAwIBAgISBAK7/ku/XjgmczVT7mmM1cEcMA0GCSqGSIb3D QEBCwUA\nMEoxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MSMwIQYDVQQD\n 
ExpMZXQncyBFbmNyeXB0IEF1dGhvcml0eSBYMzAeFw0yMDEwMjMwMTM2NTRaFw0y\nMTAxMjEwM TM2NTRaMBIxEDAOBgNVBAMTB21venoudXMwggEiMA0GCSqGSIb3DQEB\nAQUAA4IBDwAwggEKAo IBAQDZ4pi5q0QlIxAo8sKNBgInG1BGH584lRghCdnrBsZD\n68IuFlJ3V3wrnfsaNv8IZOHRkvx N2uxDo/oVxCCSNug/Ne4b+Pqw7U8thB9zL46A\nMbrHVtAmloykToDRlOHv/OLp2YRQiW7cD57l xot+9+TPlHsAuMccQXQDMbmhT6bf\nirO4m6F6gRf478YLLVOmpxkLd87dhHa7gO3NwmRroIB/D MLdQRAVAMbdDGTjdCrA\nlToWeHOnPNBLKPmI6M9DCqEXoTbIa9OhpJmo+txlS85O8/RHzXu2fV kgnEnBIcsE\n/ZEh5ytov1SogIXzNQgIJFesaWCqgBPLun4molEnfcq5AgMBAAGjggM9MIIDOTA O\nBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMAwG\nA1UdEw EB/wQCMAAwHQYDVR0OBBYEFI3x/VWfHHCG1IfE32kGHZPG4RC6MB8GA1Ud\nIwQYMBaAFKhKamM Efd265tE5t6ZFZe/zqOyhMG8GCCsGAQUFBwEBBGMwYTAuBggr\nBgEFBQcwAYYiaHR0cDovL29j c3AuaW50LXgzLmxldHNlbmNyeXB0Lm9yZzAvBggr\nBgEFBQcwAoYjaHR0cDovL2NlcnQuaW50L XgzLmxldHNlbmNyeXB0Lm9yZy8wgfIG\nA1UdEQSB6jCB54ILYXBpLm1venoudXOCE2FzdHJvYm 90YW55Lm1venoudXOCDGNo\nYXQubW96ei51c4ILZGV2Lm1venoudXOCDmdlbWluaS5tb3p6LnV zggtnaXQubW96\nei51c4IRZ29vZHZpYmVzLm1venoudXOCDmdvcGhlci5tb3p6LnVzghRtYWls LWFy\nY2hpdmUubW96ei51c4IMbWFpbC5tb3p6LnVzgg9taWNoYWVsLm1venoudXOCB21v\neno udXOCDnBvcnRhbC5tb3p6LnVzgg1wcm94eS5tb3p6LnVzggt3d3cubW96ei51\nczBMBgNVHSAE RTBDMAgGBmeBDAECATA3BgsrBgEEAYLfEwEBATAoMCYGCCsGAQUF\nBwIBFhpodHRwOi8vY3BzL mxldHNlbmNyeXB0Lm9yZzCCAQQGCisGAQQB1nkCBAIE\ngfUEgfIA8AB3AJQgvB6O1Y1siHMfgo siLA3R2k1ebE+UPWHbTi9YTaLCAAABdVNQ\n7ygAAAQDAEgwRgIhALmUv4K/i3UcPYCIseckN2n fpk8g+Gi4MZRq6Ybr8/JXAiEA\n00kRkd+19OB2j4VASwsoQatWKasN+yTMnkQWOf2YMbsAdQB9 PvL4j/+IVWgkwsDK\nnlKJeSvFDngJfy5ql2iZfiLw1wAAAXVTUO9TAAAEAwBGMEQCICOymh52O gxx/wjJ\ngo5TEIgfEDtgXvKdfBsVtibLeZQWAiAyiUPq2MBPxn9+KJFhhxE8LRI9VIhpWnHV\n 5JlOp2dIYzANBgkqhkiG9w0BAQsFAAOCAQEARqt9QyY4Fq7SBindKcHyrsQ9JtqB\nvfZy5yDKz FwuQZKmk2pxOzapCNRLNeyiEalfIFzrtHI11gr1ZEFHL1rA7pO3ud/j\nM2r0lmvNf8W+kUVf4G ng0TqGyRRh28RDNDCaz8uaYeg5C6BPUIZtHbO6qJBNme2W\noS4Qp0fjjAUvSQwTKDEh5GKnZv4 AnJifMRqSXgZ+HgsamqydODRRTszwCMTMGBhO\naUOf+wF9l90T9N3MLDxSdixh4/qMuE0LpIsy 
eLJJ08ZsmOvOPtar0zxUw8AXMtGG\n62wmZhlY+vXD4Nk6cKTepSCVEHmCLTtckbHfn518wCQEv JZYYVApG0y1QQ==\n-----END CERTIFICATE-----\n"; json.sans = []; json.sans[0] = "api.mozz.us"; json.sans[1] = "astrobotany.mozz.us"; json.sans[2] = "chat.mozz.us"; json.sans[3] = "dev.mozz.us"; json.sans[4] = "gemini.mozz.us"; json.sans[5] = "git.mozz.us"; json.sans[6] = "goodvibes.mozz.us"; json.sans[7] = "gopher.mozz.us"; json.sans[8] = "mail-archive.mozz.us"; json.sans[9] = "mail.mozz.us"; json.sans[10] = "michael.mozz.us"; json.sans[11] = "mozz.us"; json.sans[12] = "portal.mozz.us"; json.sans[13] = "proxy.mozz.us"; json.sans[14] = "www.mozz.us"; json.serial_number = "349379594475839169414317025618006180741404"; json.sigalg = "SHA256WithRSA"; json.subject = {}; json.subject.common_name = "mozz.us"; json.subject.names = []; json.subject.names[0] = "mozz.us"; json.subject_key_id = "8D:F1:FD:55:9F:1C:70:86:D4:87:C4:DF:69:06:1D:93:C6:E1:10:BA"; # retrieve content type # openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | head -1 20 text/gemini; lang=en # double check content type # openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | file --brief --mime-type --mime-encoding - text/plain; charset=utf-8 # validate encoding # openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | iconv -f utf-8 -t utf-8 > /dev/null; echo $? 0 # guess language # echo $(openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null ) | polyglot detect | cut -d' ' -f1 | uniq English
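The `20 text/gemini; lang=en` line retrieved above can also be taken apart programmatically, which is what a per-language breakdown needs. The following is only a sketch and not code from Lupa: the function names are made up for illustration, and it assumes the header has already been read from the server as text.

```python
def parse_gemini_header(line: str) -> tuple[int, str, dict[str, str]]:
    """Split a Gemini response header into status, media type and parameters.

    The media type and parameters are only meaningful for 2x (success) statuses.
    """
    status_text, _, meta = line.rstrip("\r\n").partition(" ")
    if len(status_text) != 2 or not (status_text.isascii() and status_text.isdigit()):
        raise ValueError(f"malformed status: {status_text!r}")
    # The first ';'-separated item is the media type, the rest are parameters.
    media_type, *raw_params = [part.strip() for part in meta.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        name, sep, value = raw.partition("=")
        if sep:  # ignore items that are not name=value pairs
            params[name.strip().lower()] = value.strip().strip('"')
    return int(status_text), media_type.lower(), params


def languages(params: dict[str, str]) -> list[str]:
    """Return the language tags listed in a lang parameter, if any."""
    return [tag.strip() for tag in params.get("lang", "").split(",") if tag.strip()]


if __name__ == "__main__":
    status, media_type, params = parse_gemini_header("20 text/gemini; lang=en,fr\r\n")
    print(status, media_type, languages(params))
```

Counting pages by the length of the list returned by `languages` would separate pages with no lang parameter, one language, or several.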
On Fri, Dec 18, 2020 at 12:12:47PM +0000, Luke Emmet <luke at marmaladefoo.com> wrote a message of 23 lines which said: > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > Could it be possible to show the distribution of page sizes in geminispace? Like this (the page was updated)?
> On Dec 18, 2020, at 15:37, Petite Abeille <petite.abeille at gmail.com> wrote: > > # geolocation > # curl --silent https://tools.keycdn.com/geo.json?host=174.138.124.169 | jq | gron # while at it # whois mozz.us | grep @ e-mail: technical1 at registry.neustar e-mail: registrytechnical2 at neustar.biz Registrar Abuse Contact Email: registrar-abuse at google.com Registrant Email: lazar.michael22 at gmail.com Admin Email: lazar.michael22 at gmail.com Tech Email: lazar.michael22 at gmail.com
>> Could it be possible to show the distribution of page sizes in geminispace? > Like this (the page was updated)? > > * Less than 1 kbyte: 18465 URLs (48.7 %) > * 1 to 1000 kbytes: 15865 URLs (41.9 %) > * More than 1000 kbytes: 3559 URLs (9.4 %) Those bands are very wide. How about increments of 10^n, e.g. 1 kB, 10 kB, 100 kB, ...? Also, we can have a good general idea of other media types, but to filter on text/gemini would be ideal, if you are inclined! > The code is available. For the data, I'm not decided yet. True, it is > only public data, and there is not even the content of the pages, but > I don't know yet if there isn't some privacy/ethical problem. Let me > check. How about if the data were anonymised, with IP address, domain name, path and file name removed and replaced by anonymous labels, like this: - Domain name: "Domain1" ... "DomainN" - Path: "Path1" ... "PathN" but still including other details such as resource size, media type, encoding, etc.? That would still be very useful for statistical analysis in the aggregate, without revealing any identifiable info. - Luke
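The labelling scheme suggested above can be sketched in a few lines. This is an illustration only: the record fields (`domain`, `path`, `size`, `media_type`, `charset`) are hypothetical and not taken from Lupa's actual database. Note also that consistent labels still let one group resources by site, so unusual sizes or media types could in principle help re-identify a capsule.

```python
class Labeller:
    """Hand out stable labels such as Domain1, Domain2, ... for distinct values."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._labels: dict[str, str] = {}

    def label(self, value: str) -> str:
        # The same input always maps to the same label within one run.
        if value not in self._labels:
            self._labels[value] = f"{self.prefix}{len(self._labels) + 1}"
        return self._labels[value]


def anonymise(records: list[dict]) -> list[dict]:
    """Replace identifying fields with labels and keep the statistical fields."""
    domains = Labeller("Domain")
    paths = Labeller("Path")
    return [
        {
            "domain": domains.label(record["domain"]),
            "path": paths.label(record["path"]),
            "size": record["size"],
            "media_type": record["media_type"],
            "charset": record["charset"],
        }
        for record in records
    ]


if __name__ == "__main__":
    sample = [
        {"domain": "a.example", "path": "/", "size": 512,
         "media_type": "text/gemini", "charset": "utf-8"},
        {"domain": "a.example", "path": "/log.gmi", "size": 2048,
         "media_type": "text/gemini", "charset": "utf-8"},
    ]
    for row in anonymise(sample):
        print(row)
```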
It was thus said that the Great Stephane Bortzmeyer once stated: > On Wed, Dec 16, 2020 at 06:16:53PM -0500, > Sean Conner <sean at conman.org> wrote > a message of 27 lines which said: > > > > You can find the current results (the crawler did not crawl the entire > > > space yet): > > > > > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > > > One stat I haven't seen yet (yours or from GUS) is a breakdown of > > language. How many pages had a lang parameter, then a breakdown by > > language, how many multiple languages per parameters (for example, > > "lang=en,fr"). > > Just ask :-) Now done: Thanks. > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > > I note: > > * French is the second language after English. Cocorico, as we say in > France. > > * There is one page in Finnish. And I see one page that is both English and Japanese. > * There are more HTML than Markdown pages on the geminispace, which I > find surprising. Not really, as I've come across one Gemini site that only serves up HTML. > * There is one page in EBCDIC and one in CP-437 :-) Now *THAT* is surprising. -spc
On Fri, Dec 18, 2020 at 05:45:33PM -0500, Sean Conner <sean at conman.org> wrote a message of 40 lines which said: > > * There are more HTML than Markdown pages on the geminispace, which I > > find surprising. > > Not really, as I've come across one Gemini site that only serves > up HTML. Yes, but Gemini was supposed to be about lightness and sobriety. So, Markdown seems a better fit. > > * There is one page in EBCDIC and one in CP-437 :-) > > Now *THAT* is surprising. It is actually a capsule dedicated to tests <gemini://egsam.pitr.ca/>.
On Wed Dec 16, 2020 at 4:05 PM CET, Stephane Bortzmeyer wrote: > I'm running a Gemini crawler, which gathers metadata about the > geminispace. The goal is not to make a search engine but to survey the > geminispace. This is very cool, thanks for your work! I would be curious to see more thorough statistics on TLS certificates in Geminispace. It's nice that you have the Let's Encrypt percentage on there, but I'd also like to know about self-signed certificates, and what the distribution of sizes, key types, lifespans etc. is. But I know this information is *not* easy to get at in Python without external dependencies, so don't feel obligated to bang your head against it. I have plans to write a certificate observatory daemon in 2021, with a simple Gemini interface so that TOFU clients can query it regarding new certs. It should be straightforward to generate this kind of information as a side-effect, so my curiosity on this front will be satisfied one way or another sooner or later. Cheers, Solderpunk
On Sat, Dec 19, 2020 at 07:55:05PM +0100, Solderpunk <solderpunk at posteo.net> wrote a message of 22 lines which said: > I'd also like to know about self-signed certificates, I'm not expert enough on X.509 but do note that the obvious algorithm to detect self-signed certificates (checking that issuer == subject) does not work well in the geminispace where many certs are signed by... someone (not a known CA but not the subject). > But I know this information is *not* easy to get at in Python > without external dependencies, Most of it is easy to get <https://framagit.org/bortzmeyer/lupa/-/issues/7> > I have plans to write a certificate observatory daemon in 2021, with > a simple Gemini interface so that TOFU clients can query it > regarding new certs. If it is just for surveying, fine. If it is to turn it into a security system, be careful, there are many traps. Who can put new certificates, how to be sure that clients will check it, etc. gemini://gemini.bortzmeyer.org/rfc-mirror/rfc6962.txt
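The "obvious algorithm" mentioned above (issuer == subject) is short enough to write down, which also makes its limits visible. This sketch is not Lupa's code. It assumes the certificate has already been decoded into the nested-tuple shape that Python's `ssl.SSLSocket.getpeercert()` uses; since that call returns an empty dict when the certificate was not validated, a crawler talking to self-signed servers would have to obtain the names some other way. The check compares names only and verifies no signature, so a certificate signed by a private, unknown CA is not flagged.

```python
def looks_self_issued(cert: dict) -> bool:
    """Heuristic: treat a certificate as self-signed when issuer equals subject.

    `cert` is assumed to carry "subject" and "issuer" entries in the
    nested-tuple form used by ssl.SSLSocket.getpeercert().
    """
    subject = cert.get("subject")
    issuer = cert.get("issuer")
    # Missing names tell us nothing, so they never count as self-issued.
    return bool(subject) and subject == issuer


if __name__ == "__main__":
    self_issued = {
        "subject": ((("commonName", "capsule.example"),),),
        "issuer": ((("commonName", "capsule.example"),),),
    }
    ca_issued = {
        "subject": ((("commonName", "mozz.us"),),),
        "issuer": (
            (("countryName", "US"),),
            (("organizationName", "Let's Encrypt"),),
            (("commonName", "Let's Encrypt Authority X3"),),
        ),
    }
    print(looks_self_issued(self_issued), looks_self_issued(ca_issued))
```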
On Fri, Dec 18, 2020 at 10:03:00PM +0000, Luke Emmet <luke at marmaladefoo.com> wrote a message of 34 lines which said: > How about if the data was anonymised, like to remove IP address, domain > name, path and file name and replaced by anonymous labels, like this Note that you did not receive my private message since your email server denies access to mine. <luke at marmaladefoo.com>: host mx1.mythic-beasts.com[2a00:1098:0:86:1000:0:2:1] said: 550 Block listed: https://www.spamhaus.org/sbl/query/SBLCSS (in reply to MAIL FROM command) (And, no, I do not intend to start begging Spamhaus to unlist me.)
On 19-Dec-2020 19:20, Stephane Bortzmeyer wrote: > Note that you did not receive my private message since your email > server denies access to mine. > <luke at marmaladefoo.com>: host mx1.mythic-beasts.com[2a00:1098:0:86:1000:0:2:1] > said: 550 Block listed: https://www.spamhaus.org/sbl/query/SBLCSS (in reply > to MAIL FROM command) > > (And, no, I do not intend to start begging Spamhaus to unlist me.) Sorry about that - my domain ISP runs a fairly typical setup - I don't usually have problems getting email from people. I certainly don't have the skills or inclination to run my own mail server, so I'm not in a position to adjust the email server. Feel free to reply to the list, or if you want to send a personal email, you might try a different email address: luke [dot] emmet [at] gmail [dot] com - Luke
> On Dec 16, 2020, at 16:05, Stephane Bortzmeyer <stephane at sources.org> wrote: > > I'm running a Gemini crawler This is a very brave endeavor. Survivability is key in an experimental, buggy, hostile, or malicious environment. The same rules of caution apply to any user-agent, but even more so to headless, automated bots. There are many possible traps out there. Nothing new under the sun. The interweb has been through this for the last few decades: pranksters vs bots. As this is a known, solved problem, I will not bore you with specific details, assuming instead everyone knows what they are doing, eyes wide open. Just be cautious. Assume hostility, aim for survivability.
I don't think details are boring here. Would you mind listing some of the problems and possible solutions/workarounds? On Tue, Dec 22, 2020 at 12:57 Petite Abeille <petite.abeille at gmail.com> wrote: > > On Dec 16, 2020, at 16:05, Stephane Bortzmeyer <stephane at sources.org> wrote: > > > > I'm running a Gemini crawler > > This is a very brave endeavor. Survivability is key in an experimental, > buggy, hostile, or malicious environment. > > The same rules of caution apply to any user-agent, but even more so > to headless, automated bots. > > There are many possible traps out there. Nothing new under the sun. The > interweb has been through this for the last few decades: pranksters vs > bots. > > As this is a known, solved problem, I will not bore you with specific > details, assuming instead everyone knows what they are doing, eyes wide > open. > > Just be cautious. Assume hostility, aim for survivability.
> On Dec 22, 2020, at 13:53, Peter Vernigorov <pitr.vern at gmail.com> wrote: > > I don't think details are boring here. Really? Didn't get that vibe. Anyway. > Would you mind listing some of the problems and possible solutions/workarounds? Two simple examples: resource exhaustion and content poisoning. Exhaustion is the most trivial one. The adversary (a technical term, not a value judgment) tries to slow you down or fill you up or reach various limits on your side, e.g. throttled connections, infinite output, hung connections, any combination of the above, etc. This is easy to deal with: always limit everything you do, be it reading, writing, waiting, computing, whatnot. Eventually something will reach these limits and let you out of the trap. You can then mark the site as hostile and/or dysfunctional. It is not always clear which one is which: incompetence or malice. For example, assuming the network stack goes through, a client has to read at most the first 1024 + some bytes of a server response to figure out what to do. Nothing more. Don't expect a well-formed response line. Assert it. Always validate. Continuously. Drop the connection as soon as something is not right. Always remember what happened. Of course, there are downsides to resource exhaustion for the adversary, as it's a sort of self-inflicted denial of service. Oh well. Content poisoning is more fun. It can be anything from feeding you continuous junk (exhaustion + poisoning) to well-formed but ill-intentioned logic bombs, busy beavers, wild goose chases; the list goes on. For example, a trivial chase is infinite redirects. Got to stop eventually. Another limit. Another one could be well-formed text/gemini, but with junky links. Same as above. Again, this is easy to identify statistically, marking the adversary as dysfunctional. You can then move on, or retaliate, depending on the mood. User-agents could also federate such information and use it in meaningful, if ominous, ways.
This is not a one-way street: user-agents, especially bots, can do a lot of damage at scale. Always keep in mind Hanlon's razor: "never attribute to malice that which is adequately explained by stupidity". https://en.wikipedia.org/wiki/Hanlon%27s_razor Just my 2¢. Have fun.
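Two of the limits described in this message, a bounded read of the response header and a cap on redirects, could look roughly like the sketch below. It is not taken from any existing crawler. The 1029-byte bound assumes a two-digit status, one space, at most 1024 bytes of meta and a CRLF; `MAX_REDIRECTS`, `BadResponse` and the `fetch` callable are made-up names; redirect targets are assumed to be absolute URLs. Time limits would be set separately on the socket, for instance with `socket.settimeout()`.

```python
import io

MAX_HEADER_BYTES = 2 + 1 + 1024 + 2  # status, space, meta, CRLF (assumed bound)
MAX_REDIRECTS = 5                     # arbitrary illustrative limit


class BadResponse(Exception):
    """Raised as soon as a response violates one of our limits."""


def read_header(stream) -> tuple[int, str]:
    """Read and validate the header line from a binary stream, never more."""
    raw = stream.readline(MAX_HEADER_BYTES)
    if not raw.endswith(b"\r\n"):
        raise BadResponse("header too long, truncated, or missing CRLF")
    try:
        text = raw[:-2].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise BadResponse("header is not valid UTF-8") from exc
    status, _, meta = text.partition(" ")
    if len(status) != 2 or not (status.isascii() and status.isdigit()):
        raise BadResponse(f"malformed status {status!r}")
    if not 10 <= int(status) <= 69:
        raise BadResponse(f"status {status} out of range")
    return int(status), meta


def follow(url: str, fetch, max_redirects: int = MAX_REDIRECTS):
    """Follow 3x redirects using fetch(url) -> (status, meta), with a hard cap."""
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            raise BadResponse(f"redirect loop at {url}")
        seen.add(url)
        status, meta = fetch(url)
        if not 30 <= status <= 39:
            return url, status, meta
        url = meta  # assumption: the redirect target is an absolute URL
    raise BadResponse("too many redirects")


if __name__ == "__main__":
    print(read_header(io.BytesIO(b"20 text/gemini; lang=en\r\n# Hello\n")))
```

Catching `BadResponse` in the crawl loop is the natural place to record that a site misbehaved, so that it can be skipped or revisited less often.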
Hello, Peter Vernigorov writes: > I don't think details are boring here. Would you mind listing some of the > problems and possible solutions/workarounds? This gemini://alexschroeder.ch/page/2020-12-22_Website_down%2C_disk_full%2C_logs_crazy https://alexschroeder.ch/wiki/2020-12-22_Website_down%2c_disk_full%2c_logs_crazy might have been caused by a crawler. Cheers, Erich -- Keep it simple!
> On Dec 22, 2020, at 16:43, ew.gemini <ew.gemini at nassur.net> wrote: > > might have been caused by a crawler. There you go. Dysfunction cuts both ways. Always be defensive. It's a hostile environment out there.
On Wed, 16 Dec 2020 16:05:50 +0100 Stephane Bortzmeyer <stephane at sources.org> wrote: > I'm running a Gemini crawler, which gathers metadata about the > geminispace. The goal is not to make a search engine but to survey the > geminispace. That's interesting, Stephane. Could you add statistics about character encodings used for text/gemini responses specifically? I'd like to know if there are currently text/gemini responses in any other encoding than UTF-8 (or US ASCII). That would be an interesting topic in the IRI+IDN discussion. -- Philip
It was thus said that the Great Philip Linde once stated: > On Wed, 16 Dec 2020 16:05:50 +0100 > Stephane Bortzmeyer <stephane at sources.org> wrote: > > > I'm running a Gemini crawler, which gathers metadata about the > > geminispace. The goal is not to make a search engine but to survey the > > geminispace. > > That's interesting, Stephane. Could you add statistics about character > encodings used for text/gemini responses specifically? I'd like to know > if there are currently text/gemini responses in any other encoding than > UTF-8 (or US ASCII). That would be an interesting topic in the IRI+IDN > discussion. There's a chart on the GUS stats page: https://portal.mozz.us/gemini/gus.guru/statistics It seems it's a 54/46 split between UTF-8/US-ASCII (and 7 (seven) pages out of 84,400 that are neither UTF-8 nor US-ASCII). -spc
On Thu, Dec 24, 2020 at 02:08:57AM +0100, Philip Linde <linde.philip at gmail.com> wrote a message of 37 lines which said: > Could you add statistics about character encodings used for > text/gemini responses specifically? Only for text/gemini:
On Wed, Dec 23, 2020 at 09:01:13PM -0500, Sean Conner <sean at conman.org> wrote a message of 22 lines which said: > There's a chart on the GUS stats page: > > https://portal.mozz.us/gemini/gus.guru/statistics > > It seems it's a 54/46 split between UTF-8/US-ASCII It seems this percentage includes plain text, not gemtext only (anyway, the sum of the numbers does not match the total). I cannot find a gemtext page tagged as US-ASCII.
On Sat, Dec 19, 2020 at 07:55:05PM +0100, Solderpunk <solderpunk at posteo.net> wrote a message of 22 lines which said: > I would be curious to see more thorough statistics on TLS certificates > in Geminispace. It's nice that you have the Let's Encrypt percentage on > there, but I'd also like to know about self-signed certificates, what > the distribution of sizes, key types, Now done at <gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi>.
On Fri, Dec 18, 2020 at 12:12:47PM +0000, Luke Emmet <luke at marmaladefoo.com> wrote a message of 23 lines which said: > > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > Could it be possible to show the distribution of page sizes in geminispace? > I know you show the average page size, but to get a better view of what is > typical and the range would be good. For example does it follow a power law > etc... Now displays fixed-size ranges *and* quantiles.
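For readers who want to produce the same two views from their own crawl data, here is a small sketch of power-of-ten size bands (as requested earlier in the thread) together with quantiles. It is not Lupa's code; `decade_histogram` is a made-up name and the sizes are assumed to be non-negative integers in bytes.

```python
import statistics
from collections import Counter


def decade_histogram(sizes: list[int]) -> dict[int, int]:
    """Count sizes per power-of-ten band.

    Band n holds sizes from 10**n up to 10**(n + 1) - 1 bytes; band 0 also
    holds empty resources. The band is derived from the number of decimal
    digits, which avoids floating-point logarithms.
    """
    counts: Counter[int] = Counter()
    for size in sizes:
        counts[len(str(size)) - 1] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sizes = [0, 5, 10, 999, 1000, 20000, 350, 4200, 87, 1500000]
    print(decade_histogram(sizes))
    # Three cut points (quartiles); larger n gives finer quantiles.
    print(statistics.quantiles(sizes, n=4))
    print(statistics.median(sizes))
```

A long tail shows up as a few counts in high bands combined with a median far below the mean, which is why quantiles say more here than the average alone.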
On 20-Feb-2021 14:53, Stephane Bortzmeyer wrote: > On Fri, Dec 18, 2020 at 12:12:47PM +0000, > Luke Emmet <luke at marmaladefoo.com> wrote > a message of 23 lines which said: >>> gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi > Now displays fixed-size ranges *and* quantiles. Thank you Stephane - it is interesting to see the shape of the geminiverse resources. It also helps to tune some typical client default parameters, such as the maximum resource size before abandoning a connection - as we know, there is no Content-Length to tell us how much content to expect. I know it is cheeky to keep coming with new suggestions - but it would be handy to know some time what the shape of the predominant gemini resource, text/gemini, is. I assume that currently the stats apply to all resources, so they may be skewed upwards by binary files etc. Regards - Luke
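Because a Gemini response carries no Content-Length, a client can only enforce a maximum resource size while reading. A minimal sketch of that idea follows; the function name and the chunk size are illustrative, and `stream` is assumed to be any binary file-like object, such as the one returned by `socket.makefile("rb")`.

```python
import io


def read_body(stream, max_bytes: int, chunk_size: int = 4096) -> tuple[bytes, bool]:
    """Read at most max_bytes from stream.

    Returns (data, truncated); truncated is True when the resource was
    larger than max_bytes and reading was abandoned.
    """
    chunks = []
    total = 0
    while True:
        # Ask for one byte beyond the limit so an oversized body is detected.
        chunk = stream.read(min(chunk_size, max_bytes - total + 1))
        if not chunk:
            return b"".join(chunks), False
        total += len(chunk)
        if total > max_bytes:
            # Keep only the part of this chunk that fits under the limit.
            chunks.append(chunk[: len(chunk) - (total - max_bytes)])
            return b"".join(chunks), True
        chunks.append(chunk)


if __name__ == "__main__":
    data, truncated = read_body(io.BytesIO(b"x" * 25), max_bytes=10, chunk_size=4)
    print(len(data), truncated)
```

The published quantiles give a principled way to pick `max_bytes`: a value a little above a high quantile of text/gemini sizes would cut off only the rare outliers.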
On Sat, Feb 20, 2021 at 06:30:19PM +0000, Luke Emmet <luke at marmaladefoo.com> wrote a message of 22 lines which said: > I know it is cheeky to keep coming with new suggestions Developers love requests! > but it would be handy to know some time what is the shape of the > predominant gemini resource - text/gemini. I assume that currently > the stats apply to all resources, so may be skewed up due to binary > files etc. Just ask and it's now done.
On 21-Feb-2021 17:23, Stephane Bortzmeyer wrote: > >> but it would be handy to know some time what is the shape of the >> predominant gemini resource - text/gemini. I assume that currently >> the stats apply to all resources, so may be skewed up due to binary >> files etc. > Just ask and it's now done. Thank you Stephane for adding more fine-grained info about the text/gemini subset of resources. I think these statistics are helpful in understanding the broad shape of the geminiverse. - Luke