
Show HN: Irchiver, your full-resolution personal web archive

Author: lazyjeff

Score: 34

Comments: 14

Date: 2021-12-03 14:50:37

Web Link

________________________________________________________________________________

MerelyMortal wrote at 2021-12-03 16:53:25:

Does anyone know of something like this, but that saves pages as searchable HTML and works on Linux?

(I imagine a Firefox extension could probably do this.)

I see lots of things where you have to click to save, but nothing automatic.

cycomanic wrote at 2021-12-04 09:50:02:

I was looking into this a while ago. There are a few solutions, though unfortunately none of them are perfect, IMO.

https://archivy.github.io/

https://perkeep.org/

https://github.com/go-shiori/shiori

are three of them.

MerelyMortal wrote at 2021-12-04 21:41:08:

None of those websites say they do what I'm asking. Did you try them out? (How do you know?) If they do what I'm asking, their websites don't make it clear, because I couldn't figure that out by reading them.

kseistrup wrote at 2021-12-04 07:24:03:

Something like this, yes: nb

https://github.com/xwmx/nb

Run `nb $URL` in the CLI and `nb` will save the page as a Markdown file in a collection managed by `git`. You can then, e.g., run `nb search` to find what you're looking for, or `nb browse` to see a rendered version of the Markdown in a browser of your choice.
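
For illustration, a minimal session might look like this (the URL and search term are just placeholders):

    # bookmark a page; nb stores it as markdown in a git-managed collection
    nb https://example.com/some-article

    # full-text search across everything saved so far
    nb search "archive"

    # view rendered versions of the saved markdown in a browser
    nb browse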

sackofmugs wrote at 2021-12-04 08:58:05:

That's another manual method. There's nothing that automatically captures every single page you go to. And `nb` would only work for stateless pages anyway.

kseistrup wrote at 2021-12-04 09:03:24:

Correct

rendx wrote at 2021-12-03 16:57:31:

Perhaps this?

https://github.com/i5ik/22120

MerelyMortal wrote at 2021-12-04 21:42:21:

It looks like this is the closest thing, thank you.

lambdaba wrote at 2021-12-03 22:53:21:

I gotchu fam:

https://github.com/Y2Z/monolith

Note: this bundles external resources like images, CSS, and scripts inside the file.
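
For reference, basic usage is a single command (the URL and output filename here are placeholders):

    # save the page and everything it references into one self-contained file
    monolith https://example.com/article -o article.html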

overeater wrote at 2021-12-03 23:08:40:

I don't think this is automatic, is it? So you still have to save each page you want manually.

lambdaba wrote at 2021-12-03 23:11:24:

Oh, true, I missed that.

In Chrome it would be quite trivial to tail the history read from the SQLite db and get the latest URL, and frankly something like ArchiveBox would be more powerful (truly grabbing all resources). Unfortunately, in Firefox it's not possible to tail the history without unlocking the database, so the solution would be slightly more hackish, but still feasible.

For illustration, this is the script I use to grab Chrome history:

    # Chrome's History db is locked while the browser runs; opening it with
    # immutable=1 lets sqlite3 read it anyway. last_visit_time is microseconds
    # since 1601-01-01, hence the conversion to the Unix epoch.
    profile=$HOME/.config/google-chrome/Default

    sqlite3 "file:$profile/History?immutable=1" \
      "select datetime(last_visit_time/1000000 - 11644473600, 'unixepoch'), url from urls order by last_visit_time asc"

It would be pretty easy to use `watch` or `inotifywait` to run this regularly and call ArchiveBox with the new URLs.
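
For instance, a rough sketch of that loop (untested; it assumes inotify-tools is installed and an ArchiveBox collection has already been initialized in the current directory):

    profile=$HOME/.config/google-chrome/Default
    seen=$(mktemp)    # urls already handed off to ArchiveBox

    # block until Chrome writes to the History db, then archive anything new
    while inotifywait -qq -e modify "$profile/History"; do
      sqlite3 "file:$profile/History?immutable=1" \
        "select url from urls order by last_visit_time desc limit 20" |
      while read -r url; do
        grep -qxF "$url" "$seen" && continue
        echo "$url" >> "$seen"
        archivebox add "$url"
      done
    done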

tedmiston wrote at 2021-12-03 23:28:20:

ArchiveBox is popular:

https://github.com/ArchiveBox/ArchiveBox

MerelyMortal wrote at 2021-12-04 01:01:46:

It pulls URLs from your history and downloads them later. It doesn't capture what you actually see, so if you're logged in somewhere, it won't be able to save that; and if content changes between when you see it and when it gets scraped, it won't know.

URfejk wrote at 2021-12-04 08:16:52:

Does anyone know of something like this, but for Android?