💾 Archived View for bbs.geminispace.org › s › Cosmos › 4918 captured on 2023-11-14 at 08:43:55. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Small Cosmos fix: paths in entry URLs are now cleaned up so that there are no relative references (`.` or `..`).

This should remove some duplicate entries. Keep an eye out for weirdly malformed/broken URLs, in case I introduced any new bugs with this...
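For readers curious what this kind of cleanup can look like, here is a minimal Python sketch in the spirit of the dot-segment removal described in RFC 3986. It is not the actual Cosmos code; the function names `remove_dot_segments` and `normalize_url` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in a URL path (illustrative helper)."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            # "." refers to the current directory, so it adds nothing.
            continue
        if segment == "..":
            # Step up one level, but never above the root (the leading "").
            if output and output != [""]:
                output.pop()
            continue
        output.append(segment)
    # A path ending in "." or ".." names a directory: keep the trailing slash.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


def normalize_url(url: str) -> str:
    """Return the URL with its path cleaned; other components are untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


# Two spellings of the same (hypothetical) entry end up identical.
print(normalize_url("gemini://example.org/docs/./sub/../index.gmi"))
print(normalize_url("gemini://example.org/docs/index.gmi"))
```

Once every entry URL passes through something like `normalize_url`, entries that differ only by `.` or `..` segments compare equal and can be merged.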

Posted in: s/Cosmos

🚀 skyjake

Aug 30 · 2 months ago

2 Comments ↓

🚀 stack · Aug 30 at 13:47:

A quick thought: instead of worrying about duplicate paths, check for _duplicate content_ by hashing it.

Since you already have to read each text (to scan for referenced links), a fast FNV-1a hash (one xor and one multiply per byte) can stand in for its identity, so entries with identical content collapse into one. Bernstein's djb2 is another option, at a shift and two adds per byte.
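A rough Python sketch of both hashes, plus the dedup idea, in case it helps. The constants are the standard 64-bit FNV offset basis and prime, and djb2 is the usual `hash * 33 + c` form; the helper `drop_duplicate_content` and the example URLs are hypothetical and have nothing to do with how Cosmos actually stores entries.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: xor each byte in, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def djb2(data: bytes) -> int:
    """Bernstein's djb2, 32-bit: (h << 5) + h is h * 33, then add the byte."""
    h = 5381
    for byte in data:
        h = ((h << 5) + h + byte) & MASK32
    return h


def drop_duplicate_content(entries):
    """Keep the first URL seen for each distinct body.

    `entries` is an iterable of (url, body_bytes) pairs (assumed shape).
    """
    seen = set()
    unique_urls = []
    for url, body in entries:
        digest = fnv1a_64(body)
        if digest in seen:
            continue
        seen.add(digest)
        unique_urls.append(url)
    return unique_urls


# Hypothetical mirror: different URLs, same bytes.
entries = [
    ("gemini://example.org/post.gmi", b"# Hello\nSame text\n"),
    ("gemini://mirror.example.net/post.gmi", b"# Hello\nSame text\n"),
    ("gemini://example.org/other.gmi", b"# Something else\n"),
]
print(drop_duplicate_content(entries))
```

These are non-cryptographic hashes, so two different texts can in principle collide; comparing lengths or the bytes themselves on a hash match would guard against merging unrelated entries.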

Love Cosmos, btw; thank you!

🚀 skyjake · Aug 30 at 13:55:

Thanks for the suggestion. Content hashing has crossed my mind before, and it would indeed automatically eliminate all duplicates, including mirrored domains where the URLs are actually different. Something to try out in the future...