git clone https://src.clttr.info/rwa/geminispace.info.git
1. Install Python (>3.5) and poetry
2. Run: poetry install
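Putting those steps together (assuming git and poetry are already available on your system):

```
git clone https://src.clttr.info/rwa/geminispace.info.git
cd geminispace.info
poetry install
```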
Make sure you have some gemini URLs for testing that are nicely sandboxed, to avoid indexing huge parts of the gemini space.
1. Create a "seed-requests.txt" file with your test gemini URLs (see the example after this list)
2. Run: poetry run crawl -d
3. Run: poetry run build_index -d
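For example, a minimal seed-requests.txt might look like this (placeholder URLs — use small capsules you control):

```
gemini://example.org/
gemini://example.org/docs/
```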
This creates an index.new directory; rename it to index.
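For example, on a Unix-like system:

```
mv index.new index
```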
1. Run: poetry run serve
2. Navigate your gemini client to: "gemini://localhost/"
1. Update infra/gus.service to match your needs (directory, user)
2. Copy infra/gus.service to /etc/systemd/system/
3. Run systemctl enable gus and systemctl start gus
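The shipped unit file's contents will differ, but the fields you typically need to adjust look roughly like this (the user, paths, and ExecStart below are placeholders, not the actual file contents):

```
[Unit]
Description=GUS frontend

[Service]
User=gus
WorkingDirectory=/home/gus/geminispace.info
ExecStart=/usr/bin/poetry run serve

[Install]
WantedBy=multi-user.target
```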
1. Run: poetry run crawl
2. Run: poetry run build_index
3. Restart frontend
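If the frontend runs under the gus systemd unit from the previous section, a full refresh is (a sketch — adjust the restart step if your frontend runs differently):

```
poetry run crawl
poetry run build_index
systemctl restart gus
```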
1. Update infra/gus-crawl.service & infra/gus-index.service to match your needs (directory, user)
2. Copy both files to /etc/systemd/system/
3. Set up a cron job for root with the following entry: 0 9 */3 * * systemctl start gus-crawl --no-block
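As a root crontab entry (via crontab -e as root), that is:

```
# every 3 days at 09:00, kick off the crawl without blocking cron
0 9 */3 * * systemctl start gus-crawl --no-block
```

Depending on how the units are wired, gus-index.service may need a similar entry at a later time, or may be triggered by the crawl unit; check the infra files for how the two are chained.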
Run: poetry run pytest
To exclude a page (or a whole domain) from crawling, you can add an excludes.txt file in the root directory which holds one excluded URL per line.
These are checked against normalized_url with starts_with, so each entry should start with the gemini:// protocol, be all lowercase, and omit the port if it is 1965.
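A hypothetical excludes.txt following those rules (placeholder URLs):

```
gemini://example.org/private/
gemini://spammy.example.com/
```

Note that the entries are all lowercase and omit the :1965 port, matching the normalization described above.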