💾 Archived View for bbs.geminispace.org › u › stack › 4925 captured on 2024-05-10 at 12:40:32. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Comment by 🚀 stack

Re: "Small Cosmos fix: paths in entry URLs are now cleaned up so..."

In: s/Cosmos

A quick thought: instead of worrying about duplicate paths, check for _duplicate content_ by hashing it.

Since you already have to read each text (to scan for referenced links), a fast FNV-1a hash (an xor and a multiply per byte) can stand in for its identity, which lets you eliminate duplicates. Bernstein's djb2 is another option, with a shift and two adds per byte.
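To make that concrete, here is a rough sketch in Python, not anything taken from Cosmos itself. The constants are the standard 64-bit FNV-1a offset basis and prime; the `dedupe` helper and its `(url, text)` entry shape are assumptions made up for illustration.

```python
# Illustrative sketch only; Cosmos's real data model is not shown in this thread.

FNV_OFFSET_64 = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME_64 = 0x100000001B3        # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: xor in each byte, then multiply by the prime (mod 2**64)."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


def djb2_32(data: bytes) -> int:
    """djb2: h * 33 + byte, written as a shift and two adds (mod 2**32)."""
    h = 5381
    for byte in data:
        h = ((h << 5) + h + byte) & 0xFFFFFFFF
    return h


def dedupe(entries):
    """Keep the first entry seen for each distinct content hash.

    `entries` is a hypothetical iterable of (url, text) pairs.
    """
    seen = set()
    unique = []
    for url, text in entries:
        digest = fnv1a_64(text.encode("utf-8"))
        if digest in seen:
            continue  # same content already indexed under another URL
        seen.add(digest)
        unique.append((url, text))
    return unique
```

Neither hash is cryptographic, so two different texts can in principle collide; comparing the texts themselves on a hash match would rule that out.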

Love Cosmos, btw; thank you!

🚀 stack

2023-08-30 · 8 months ago

1 Later Comment

🕹️ skyjake [OP/mod...] · 2023-08-30 at 13:55:

Thanks for the suggestion. Content hashing has crossed my mind before, and it would indeed automatically eliminate all duplicates, including mirrored domains where the URLs are actually different. Something to try out in the future...

Original Post

🌒 s/Cosmos

Small Cosmos fix: paths in entry URLs are now cleaned up so that there are no relative references (`.` or `..`). This should remove some duplicate entries. Keep an eye out for weirdly malformed/broken URLs, in case I introduced any new bugs with this...
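As a rough illustration of this kind of cleanup (a sketch only, not the actual Cosmos code), the Python below drops `.` and `..` segments from an absolute URL path. The names `remove_dot_segments` and `clean_url` and the example URL are made up here, and the sketch assumes the path is absolute, as it is in a full `gemini://` URL.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in an absolute path (assumed to start with '/')."""
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            if is_last:
                out.append("")  # keep the trailing slash
        elif seg == "..":
            if len(out) > 1:    # never pop the leading "" that marks the root
                out.pop()
            if is_last:
                out.append("")  # keep the trailing slash
        else:
            out.append(seg)
    return "/".join(out)


def clean_url(url: str) -> str:
    """Return the URL with its path cleaned; the other components are left as parsed."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))
```

With this, `clean_url("gemini://example.org/a/b/../c/./d.gmi")` yields `gemini://example.org/a/c/d.gmi`, so both spellings of the path collapse into one entry.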

💬 skyjake [mod...] · 2 comments · 2023-08-30 · 8 months ago