💾 Archived View for gemi.dev › gemini-mailing-list › 000354.gmi captured on 2023-11-04 at 12:43:37. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Hi Gemini List,

Has anyone thought about, or implemented, archiving of Gemini content/traffic?

WARC (Web ARChive)[1] is a standard format used for web archiving. It uses text headers for metadata, like in HTTP and email. It looks to me like WARC could be adapted for Gemini. The WARC spec supports multiple URI schemes, although it doesn't specify any other than http/https, ftp, and dns[2]. Bespoke formats could also be used, of course, or just downloading files wget-style, but using a standard format could allow for interop with "the WARC ecosystem"[3].

Archive Team[4] has also worked on archiving non-HTTP protocols like FTP[5] and Gopher[6].

I think there is an opportunity for people to maintain high-quality archives of Gemini content, like what the Internet Archive[7] and archive.today[8] do for the HTTP(S) Web. Now is a good time to start, while many of the original Gemini hosts[9] are still online.

Regards,
Charles E. Lehner

[1] https://en.wikipedia.org/wiki/Web_ARChive
[2] https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#ftp-scheme
[3] https://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
[4] https://www.archiveteam.org/
    https://en.wikipedia.org/wiki/Archive_Team
[5] https://www.archiveteam.org/index.php?title=FTP
[6] https://www.archiveteam.org/index.php?title=Gopher
[7] https://en.wikipedia.org/wiki/Internet_Archive
    https://archive.org/
[8] https://archive.today
    https://en.wikipedia.org/wiki/Archive.today
[9] gemini://gemini.circumlunar.space/servers/
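To make the WARC idea concrete: WARC 1.1 defines a generic `resource` record type for content stored without full protocol response information, so one low-commitment way to try it is to wrap each Gemini response body in such a record. This is a minimal sketch under that assumption only. The spec defines no Gemini mapping, so using META's MIME type as `Content-Type` and discarding the status line are choices made here for illustration, and the function name and example URL are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_resource_record(target_uri: str, mime_type: str, body: bytes,
                         when: Optional[datetime] = None) -> bytes:
    """Wrap one Gemini response body in a WARC/1.1 'resource' record.

    Assumption (not from any spec): the Gemini status line is discarded and
    the MIME type from META becomes Content-Type.
    """
    # WARC-Date must be UTC; naive datetimes are treated as local time.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    header_lines = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {mime_type}",
        f"Content-Length: {len(body)}",  # size of the content block in bytes
    ]
    # Header fields are CRLF-terminated and followed by an empty line.
    header = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # Each record ends with two CRLFs after its content block.
    return header + body + b"\r\n\r\n"


if __name__ == "__main__":
    record = warc_resource_record(
        "gemini://example.org/index.gmi",  # hypothetical capsule
        "text/gemini; charset=utf-8",
        "# Hello, Gemini\n".encode("utf-8"),
    )
    print(record.decode("utf-8"))
```

Concatenating such records into one file gives an ordinary WARC; whether existing WARC tools accept a `gemini:` target URI without complaint would need to be checked tool by tool.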
On 2020-09-01 (Tuesday) at 23:43, Charles E. Lehner <cel at celehner.com> wrote:
> Hi Gemini List,
>
> Has anyone thought about, or implemented, archiving of Gemini content/traffic?

I personally think this is a great idea, but I know some might not be so on board with it. I'm thinking of solderpunk's post (in their gopherhole, actually):
gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt

So is there a way for publishers to opt out of archiving?
Some in the community might want to know about it, though I personally am of the opinion that if you've published it, it's now the property of the commons.

--
~ acdw
acdw.net | breadpunk.club/~breadw
Quoting acdw (2020-09-01 18:23:22)
> On 2020-09-01 (Tuesday) at 23:43, Charles E. Lehner <cel at celehner.com> wrote:
> So is there a way to opt-out of archiving for publishers? Some in the community might want to know about it, though I personally am of the opinion that if you've published it, it's now the property of the commons.

Perhaps via robots.txt? I block ia_archiver from pages that I don't want archived on http(s), for example.

Alex
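The robots.txt approach can be tried with Python's standard `urllib.robotparser`. This is a minimal sketch assuming a Gemini crawler reuses the web's robots.txt rules unchanged and fetches the file from the capsule root; the host, user directories, and rules are hypothetical. It also makes visible the shared-host limitation raised later in the thread: each user's opt-out is a line the administrator has to add to the single root file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt at the root of a multi-user capsule: the
# administrator must add a rule for every user who wants to opt out.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /~alice/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was already fetched over Gemini."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("gemini://example.org/~alice/diary.gmi",
                "gemini://example.org/~bob/notes.gmi"):
        # can_fetch() only looks at the path, so gemini:// URLs work too.
        print(url, rules.can_fetch("ia_archiver", url))
```

Note that any crawler not named in the file, or one that simply ignores it, is unaffected, which is why this is an opt-out convention rather than a control.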
On Wed, 02 Sep 2020 01:23:22 +0000
acdw <acdw at acdw.net> wrote:

> On 2020-09-01 (Tuesday) at 23:43, Charles E. Lehner
> <cel at celehner.com> wrote:
>
> > Hi Gemini List,
> >
> > Has anyone thought about, or implemented, archiving of Gemini
> > content/traffic?
>
> I personally think this is a great idea, but I know some might not be
> so on-board with it. I'm thinking of solderpunk's post (in their
> gopherhole, actually):
> gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt
>
> So is there a way to opt-out of archiving for publishers? Some in the
> community might want to know about it, though I personally am of the
> opinion that if you've published it, it's now the property of the
> commons.

Once you publish something to the internet, there is no retracting it. This is one of the first things I was taught the first time I used the net, alongside never using your real name on the net unless you're publishing something.

--
Concentrate on th'cute, li'l CARTOON GUYS! Remember the SERIAL NUMBERS!! Follow the WHIPPLE AVE. EXIT!! Have a FREE PEPSI!! Turn LEFT at th'HOLIDAY INN!! JOIN the CREDIT WORLD!! MAKE me an OFFER!!!
I can appreciate the instinct to archive, but I fall into the camp that would generally prefer that it not be done (while respecting that, with the way the technology is built, there is no reasonable way to prevent it). I think a great tragedy of the internet is the inability to be forgotten, to retract and change, and to not have your past mistakes dictate your present. I don't have a technical solution for that in gemini, or for that matter in gopher, but I think that community norms and expectations should develop around it organically (which is taking place currently in this discussion and will continue to do so over time). I definitely support the commons for articles, information, and "knowledge", but hesitate to extend that to what are sometimes the only personal outlets that some people may have.

I think if something like `robots.txt` were to be used for this purpose, I would recommend doing it at the directory level (and thus break from how robots.txt works). In gemini, many (most?) users are part of multi-user systems. If a `robots.txt` at the root were used, it would generally control the whole domain and not allow individual users to opt in or out.

To that, I would also put in a vote for an opt-in system rather than an opt-out system (like robots.txt). Opt-in empowers all users to make choices, whereas opt-out is often limited to those who know to do so and have the technical know-how to do so.

There are also environmental and energy arguments against full-protocol archiving, though those costs may be small while gemini is at or around its current size.

Anyway, just a few thoughts.
On Wednesday, September 2, 2020, Brian Evans wrote:
> I think if something like `robots.txt` were to be used for this
> purpose I would recommend doing it at the directory level (and thus
> break from how robots.txt works).

On Wednesday, September 2, 2020, Brian Evans wrote:
> To that, I would also put in a vote for
> an opt-in system rather than an opt-out system (like robots.txt).

Agreed on both points! I was thinking of implementing a personal archiving system like the one mentioned by Solderpunk in gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt
This seems like an incredibly cynical and myopic take. It's also expected that everything on the internet will track you, will be constantly expanded for the purpose of commercialization instead of user experience, etc.... Yet Gemini purposefully rejects those notions in favor of something better. The idea that the same shouldn't apply here is odd.

-caranatar

Tom writes:

> On Wed, 02 Sep 2020 01:23:22 +0000
> acdw <acdw at acdw.net> wrote:
>
>> So is there a way to opt-out of archiving for publishers? Some in the
>> community might want to know about it, though I personally am of the
>> opinion that if you've published it, it's now the property of the
>> commons.
>
> Ounce you publish something to the internet there is no retracting it.
> This is one of the first things I was taught the first time I used the
> net. Alongside never using your real name on the net unless your
> publishing something.

--
sent from emacs using mu4e
It was thus said that the Great Caranatar once stated:
> Tom writes:
>
> > Ounce you publish something to the internet there is no retracting it.
> > This is one of the first things I was taught the first time I used the
> > net. Alongside never using your real name on the net unless your
> > publishing something.
>
> This seems like an incredibly cynical and myopic take.

I also think it's an incredibly realistic take.

> It's also
> expected that everything on the internet will track you, will be
> constantly expanded for the purpose of commercialization instead of user
> experience, etc.... Yet Gemini purposefully rejects those notions in
> favor of something better. The idea that the same shouldn't apply here
> is odd.

Even though Gemini (and gopher to an extent) reject those ideas, that doesn't guarantee privacy or control over the content. I wrote about this last year:

http://boston.conman.org/2019/10/29.2
gopher://gopher.conman.org/0Phlog:2019/10/29.2
gemini://gemini.conman.org/boston/2019/10/29.2

(take your pick of format)

I even quote the same solderpunk article (and another one not by solderpunk) about how they're ... well ... "wrong" is the wrong word here, but it's close ... perhaps "misguided" is what I'm thinking of.

Information that is publicly available (and by any measure, most of Gemini is public) can, and will, travel in mysterious ways, which I discuss in my post above. I can find stuff I posted to USENET in 1993 *today*. I can still find my first website from 1997.

-spc (I think I seriously just dated myself ... )
On Thu, Sep 03, 2020 at 11:54:08PM -0400, Caranatar wrote:
> This seems like an incredibly cynical and myopic take. It's also
> expected that everything on the internet will track you, will be
> constantly expanded for the purpose of commercialization instead of user
> experience, etc.... Yet Gemini purposefully rejects those notions in
> favor of something better. The idea that the same shouldn't apply here
> is odd.
>
> -caranatar

Calling it myopic is a bit harsh and probably misses a point that you put forward in support: one of the selling points of gemini is that it rejects complexity and some of the concerns of a more commercialized internet. One of those concerns is the potential for misuse of the information or infrastructure beyond the intent of the content creator or host. That, or the right to retract that information.

You'll have to forgive me for seeing some irony in the fact that someone with a riseup.net email address would speak against someone putting forth advice about taking caution in what you post on the internet. Riseup exists in large part because others share this "cynical and myopic take."

Regardless, the issues being brought up here seem to circle around content control and archival ethics and less about the protocol.

--
Dr. Otto Skrzyk
gemini   : gemini://tilde.team/~drskrzyk
web      : https://drskrzyk.tilde.team/
mastodon : @docskrzyk at hackers.town
I agree 100% with Sean's post (http://boston.conman.org/2019/10/29.2): the act of posting something to gemini *is* publishing, so it's out there, toothpaste-tube-style.

That being said, I think any archiver or spider should also respect *robots.txt* files, though them being opt-in vs. opt-out is kind of moot, since spiders gonna spider, you know? It's the very nature of the Internet to communicate.

However, I thought Dr. Otto brought up a very good point as well:

> Regardless, the issues being brought up here seem to circle around
> content control and archival ethics and less about the protocol.

Inasmuch as gemini is a technical specification/machine protocol, I think there's nothing to say about it vis-a-vis archiving. Socially, though, we have norms, which are good to nail down in a nascent community.

--
~ acdw
acdw.net | breadpunk.club/~breadw
acdw writes:
> I think any archiver or spider should also respect *robots.txt* files -- though them being opt-in vs. opt-out is kind of moot, since spiders gonna spider, you know?

I think opt-in vs. opt-out is definitely not moot. The web largely operates on an opt-out basis (where there is an option at all). We are at a point where we can develop different norms for a different system, and I think we should.

I definitely agree that nothing can be done about spiders that do not follow recommended community guidelines, and that anything you post that is not behind a client certificate requirement or the like is public. However, I do think that using robots.txt for spiders of all sorts is a bad idea for gemini and will create less user choice in the long run. robots.txt is suggested often because it already exists, but it is not designed for multi-user systems (the predominant form of system on gemini at present) and is explicitly designed to opt you out. That means that if users don't even know that spiders are a thing (as many non-technical people do not), then they do not get to have a choice.

My suggestion was simply about community norms and trying to push, at least for spiders that are willing to respect a community standard, an opt-in that works at the directory level and can be managed by users rather than by system administrators. The idea is that if a directory does not contain a document (let's call it `green-light.txt`) saying yes to various sorts of spidering, a well-behaved spider should ignore the content in that directory.

Having said all of that: I agree this is not a protocol issue. The conversation is more about philosophical/ethical preferences and could be moved over to gemini posts rather than here on the mailing list. So I will likely not post more on it here, but maybe I'll write something up on my gemlog tonight.
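The `green-light.txt` proposal can be sketched as a consent check that a well-behaved crawler runs before fetching anything. Everything concrete here is an assumption for illustration, since the proposal names only the file: the format (one permitted purpose per line, such as `index` or `archive`, with `#` comments), the function names, and the `fetch` callable, which stands in for a real Gemini client and returns the file's text or `None` when it is missing.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical client hook: returns a resource's text, or None if absent.
Fetch = Callable[[str], Optional[str]]


def green_light_url(resource_url: str) -> str:
    """URL of the hypothetical green-light.txt in the resource's own directory."""
    parts = urlsplit(resource_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"  # "/~alice/a.gmi" -> "/~alice/"
    return urlunsplit((parts.scheme, parts.netloc, directory + "green-light.txt", "", ""))


def may_crawl(resource_url: str, purpose: str, fetch: Fetch) -> bool:
    """Opt-in check: a missing file, or no line naming `purpose`, means no."""
    text = fetch(green_light_url(resource_url))
    if text is None:
        return False
    allowed = {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return purpose.lower() in allowed


if __name__ == "__main__":
    # Stand-in for the network: a dict of hypothetical capsule files.
    files = {"gemini://example.org/~alice/green-light.txt": "# alice opts in\nindex\n"}
    print(may_crawl("gemini://example.org/~alice/diary.gmi", "index", files.get))
    print(may_crawl("gemini://example.org/~alice/diary.gmi", "archive", files.get))
    print(may_crawl("gemini://example.org/~bob/notes.gmi", "index", files.get))
```

Checking only the resource's own directory keeps the decision with whoever owns that directory; a crawler could instead walk up to parent directories, but that is a further design choice the proposal does not settle.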
On Fri, 04 Sep 2020 16:43:42 +0000
acdw <acdw at acdw.net> wrote:

I want to clarify something. What I said does not purely revolve around inevitable misuse of the data. It is also based on freedom of the user. The only way you could prevent the user from doing something on his own machine once the data has been copied over the net is with some kind of rootkit and spyware under the umbrella term Digital Restrictions Management. I hope we can all agree that DRM is heinous and a key point in the downfall of the web.

https://www.defectivebydesign.org/

The most you could do is ask someone to keep their archive unlisted for some TTL period. I feel this would be a good compromise between archivists and authors. Archivists are going to archive because, without an immutable content-addressable storage back end like IPFS or the LoC ARC Resolver, everything is fickle and could disappear at any moment, lost to entropy.

--
telepression, n.: The deep-seated guilt which stems from knowing that you did not try hard enough to look up the number on your own and instead put the burden on the directory assistant.
  -- "Sniglets", Rich Hall & Friends
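One possible reading of the TTL compromise is that the archivist keeps every capture but withholds it from any public listing until a time-to-live has passed since capture. This is a minimal sketch under that assumption; the function name, the 30-day TTL, and the URLs are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def listable(captures: Iterable[Tuple[str, datetime]], ttl: timedelta,
             now: datetime) -> List[str]:
    """Return URLs whose capture is at least `ttl` old; the rest stay unlisted."""
    return [url for url, captured_at in captures if now - captured_at >= ttl]


if __name__ == "__main__":
    now = datetime(2020, 9, 5, tzinfo=timezone.utc)
    captures = [
        ("gemini://example.org/~alice/old.gmi", datetime(2020, 7, 1, tzinfo=timezone.utc)),
        ("gemini://example.org/~alice/new.gmi", datetime(2020, 9, 1, tzinfo=timezone.utc)),
    ]
    # Only the capture older than the TTL is listed.
    print(listable(captures, timedelta(days=30), now))
```

The archive still holds all the data throughout; the TTL only governs when a capture becomes discoverable, so an author who changes or removes a page has a window before the old version surfaces publicly.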
---