💾 Archived View for bbs.geminispace.org › u › stack › 4925 captured on 2023-09-08 at 17:50:15. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Re: "Small Cosmos fix: paths in entry URLs are now cleaned up so..."

Comment in: s/Cosmos

A quick thought: instead of worrying about duplicate paths, check for _duplicate content_ by hashing it.

Since you already have to read each text (to scan for a referenced link), a fast FNV-1a hash (one xor and one multiply per byte) can stand for its identity, which lets you eliminate duplicates. Bernstein's djb2 is another option, with a shift and two adds per byte.
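
A rough sketch of what I mean, in Python. The names `fnv1a_64`, `djb2_32` and `drop_duplicates` are made up for illustration, and the `(url, text)` entry shape is an assumption, not how Cosmos actually stores things. The FNV constants are the standard 64-bit offset basis and prime.

```python
# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: xor in each byte, then multiply by the prime (mod 2**64)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def djb2_32(data: bytes) -> int:
    """djb2: h = h * 33 + byte, written as a shift and two adds (mod 2**32)."""
    h = 5381
    for byte in data:
        h = ((h << 5) + h + byte) & 0xFFFFFFFF
    return h


def drop_duplicates(entries):
    """Keep the first entry for each distinct content hash.

    `entries` is assumed to be an iterable of (url, text) pairs.
    """
    seen = set()
    unique = []
    for url, text in entries:
        digest = fnv1a_64(text.encode("utf-8"))
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique
```

Neither hash is collision-proof, so two different texts could in principle share a value. If that matters, compare the texts themselves when the hashes match.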

Love Cosmos, btw; thank you!

🚀 stack

2023-08-30 · 9 days ago

1 Later Comment

🚀 skyjake

Thanks for the suggestion. Content hashing has crossed my mind before, and it would indeed automatically eliminate all duplicates, including mirrored domains where the URLs are actually different. Something to try out in the future...

Original Post

🌒 s/Cosmos

Small Cosmos fix: paths in entry URLs are now cleaned up so that there are no relative references (`.` or `..`). This should remove some duplicate entries. Keep an eye out for weirdly malformed/broken URLs, in case I introduced any new bugs with this...
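
For illustration only, this kind of cleanup can be sketched roughly as below. This is not the actual Cosmos code: `clean_path` and `clean_url` are hypothetical names, and the sketch assumes absolute paths (starting with `/`), in the spirit of the dot-segment removal described in RFC 3986.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_path(path: str) -> str:
    """Resolve "." and ".." segments in an absolute URL path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue  # "." refers to the current directory: drop it
        if seg == "..":
            # Go up one level, but never above the root (the leading "").
            if len(out) > 1:
                out.pop()
            continue
        out.append(seg)
    # A path ending in "." or ".." names a directory: keep the trailing slash.
    if segments[-1] in (".", ".."):
        out.append("")
    return "/".join(out)


def clean_url(url: str) -> str:
    """Return the URL with its path cleaned; other parts are left as-is."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=clean_path(parts.path)))
```

With this, `gemini://example.org/a/./b/../c.gmi` and `gemini://example.org/a/c.gmi` come out as the same string, so the two entries can be recognized as duplicates.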

💬 skyjake · 2 comments · 2023-08-30 · 9 days ago