👽 krixano

New Stats Page on AuraGem: gemini://auragem.space/search/stats

All of indexed Geminispace is about 39 GB, and all text files are around 7.6 GB.

2 years ago · 👍 astroseneca, kakafega, freezr, bencollversphone, smokey

Links

gemini://auragem.space/search/stats

3 Replies

👽 krixano

@smokey As for gemtext accounting for ~20%, it makes sense when you take into account that gemtext files tend to be smaller than other file formats. So even if there are more gemtext files, they take up less space. Right now you can see all of the file formats in the index on the mimetypes page with counts (although not with sizes, because those take a while for the db to calculate since they aren't cached at the moment). I will add a link from the stats page to the mimetypes page right now. · 2 years ago
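For reference, the ~20% figure presumably comes from the two sizes in the original post (about 7.6 GB of text out of about 39 GB indexed). A quick check of that arithmetic, where the function name `size_share` is just an illustrative helper and not anything from AuraGem:

```python
def size_share(part_gb: float, total_gb: float) -> float:
    """Return what percentage of total_gb is taken up by part_gb."""
    return 100.0 * part_gb / total_gb


# Sizes quoted in the original post (both approximate).
TEXT_GB = 7.6
TOTAL_GB = 39.0

text_share = size_share(TEXT_GB, TOTAL_GB)
print(f"{text_share:.1f}% of indexed bytes are text")  # prints "19.5% of indexed bytes are text"
```

This is a share by size only. The share by file count can be much higher, since small files add many entries but few bytes.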

👽 krixano

@smokey You can't ever be sure that you have crawled everything, because there can be servers that aren't linked from anywhere. By "all of indexed" I mainly meant everything AuraGem has indexed, not all of "indexable" Geminispace (indexed and indexable mean different things, of course). Also, for something to be indexable, it has to be linked from a page that is itself indexable, so indexable by definition means you would be able to crawl it *eventually*. However, this depends heavily on your seeds (the pages you start out with). AuraGem started out with a fairly broad set of seeds, increasing its chances of reaching more sites. · 2 years ago
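To make the seed dependence concrete, here is a minimal sketch of a breadth-first crawl over a link graph. It is not AuraGem's crawler: the `reachable` function, the `LINKS` dictionary, and the `*.example` hosts are all hypothetical, and a real crawler would fetch pages over the network instead of reading a dictionary.

```python
from collections import deque


def reachable(links, seeds):
    """Return every page reachable from the seeds by following links (BFS)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "gemini://a.example/": ["gemini://b.example/"],
    "gemini://b.example/": ["gemini://a.example/"],
    # c links out to a, but nothing links to c.
    "gemini://c.example/": ["gemini://a.example/"],
}

narrow = reachable(LINKS, ["gemini://a.example/"])
broad = reachable(LINKS, ["gemini://a.example/", "gemini://c.example/"])
```

With only `a` as a seed, the crawl finishes without ever seeing `c`, because no crawled page links to it. The queue running dry therefore only shows that the crawl is complete relative to its seeds, not that all of Geminispace has been found.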

👽 smokey

Very cool! How can you be sure auragem has crawled *all* of indexable geminispace and not just part of it? Does the crawler slowing down in discovery speed eventually to a near halt give a good indication of a complete crawl? Also, if gemtext only accounts for ~20 percent of geminispace data, what are the other data types and in what order of highest to lowest? · 2 years ago