[users] [announce] geminispace.info - alternative search provider

René Wagner rwagner at rw-net.de

Fri Feb 26 17:17:06 GMT 2021

- - - - - - - - - - - - - - - - - - - 

Hi,

I didn't realize that Stephane had written to the mailing list, so I answered him directly at his personal address. So here we go again. ;)

It will work again - once the currently running indexing is done. Unfortunately this is a shortcoming of the current GUS implementation.

Crawling and updating the search index are separate steps, and the latter one locks the search index database. Unfortunately, as geminispace grows, this becomes more of a pain than before because indexing takes more time.
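One common way around this kind of lock is to build the new index in a separate file and swap it in only once it is complete. The following is a minimal sketch of that idea, not how GUS works today: the JSON file stands in for the real index database, and `publish_index` and `load_index` are hypothetical names.

```python
import json
import os
import tempfile


def publish_index(entries, index_path):
    """Write a fresh index next to index_path, then swap it in."""
    directory = os.path.dirname(os.path.abspath(index_path))
    # Build the new index in a temporary file in the same directory,
    # so readers of index_path are not blocked while it is written.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(entries, tmp)
        # os.replace swaps the file in one step; on POSIX this rename
        # is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, index_path)
    except BaseException:
        # Clean up the half-written temporary file on any failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_index(index_path):
    """Read the currently published index (what a search would use)."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)
```

With this layout a search always reads a complete index, either the old one or the new one, at the cost of keeping two copies on disk during the rebuild.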

It seems that a mirror of a webpage with a huge archive popped up a few days ago. I had to stop the crawl as it was still fetching this archive; I probably need to exclude this mirror until we are able to improve the performance of crawling/indexing. Unfortunately I'm not that familiar with Python, so it may take some time.
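Excluding a capsule from the crawl could be as simple as a prefix check before each fetch. This is only a sketch: the prefix below is a made-up placeholder, not the actual mirror, and `is_excluded` and `filter_urls` are hypothetical helpers rather than existing GUS functions.

```python
# Hypothetical exclusion list; the real mirror URL is not given here.
EXCLUDED_PREFIXES = ["gemini://mirror.example.org/archive/"]


def is_excluded(url, prefixes=EXCLUDED_PREFIXES):
    """Return True if the URL starts with any excluded prefix."""
    return any(url.startswith(prefix) for prefix in prefixes)


def filter_urls(urls, prefixes=EXCLUDED_PREFIXES):
    """Keep only the URLs the crawler is still allowed to fetch."""
    return [url for url in urls if not is_excluded(url, prefixes)]
```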

The data gathering parts (crawling/indexing) in particular are currently bottlenecks. They are strictly sequential and single-threaded, "one page at a time", so these processes will take longer and longer. But there are more issues that arise as geminispace keeps growing: GUS was not designed to index large capsules such as mirrors of webpages.
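To illustrate what moving away from "one page at a time" could look like, here is a rough sketch using a thread pool from Python's standard library. It is an assumption about one possible direction, not a description of GUS: `crawl_concurrently` is a hypothetical name, and `fetch` is whatever function retrieves a single page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_concurrently(urls, fetch, max_workers=8):
    """Fetch many URLs in parallel threads.

    Returns a dict mapping each URL to its result, or to the
    exception that fetch raised for it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every fetch up front; the pool runs at most
        # max_workers of them at the same time.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing page should not abort the whole crawl.
                results[url] = exc
    return results
```

Threads fit here because fetching pages is mostly waiting on the network. A real crawler would still need per-host rate limiting so that a single capsule is not hit by all workers at once.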

If you are interested in helping out with Python coding, feel free to join; any help is welcome:
https://src.clttr.info/rwa/geminispace.info/issues

I'm not sure if it's feasible to improve GUS in a sustainable way or if we need to start over and come up with a new design that accommodates the growth of geminispace (and is likely much more complex than GUS currently is).

regards
René