[tech] Questions about cache

Stephen <stephen (a) drsudo.com>

Gemini spec does not mention caching, and the FAQ says:

 > Gemini has no support for caching

So I was a bit surprised when I was putting together some dynamic 
content for my capsule and found that my client (amfora) was caching 
things. I was able to get around it by using temporary redirects to add 
cache busting query strings.

I don't have much experience with gopher, but I have seen that this has 
been a problem in the gopherspace. On the mozz.us phlog entry for his 
Rock Paper Scissors game he notes:

 > The random token at the end of the URL (e.g "/bfhqK3kH") doesn't
 > actually represent anything. I only added it for cache-busting on
 > lynx (otherwise you would always see the same result as the first
 > time you loaded the page). I really hate doing that because it screws
 > with the purity of my gopher routes. But a lot of people (including
 > myself) use lynx. It sort of reminds me of JS hacking to support old
 > Internet Explorer versions.

( gopher://mozz.us:70/1/phlog 2019-04-12 )

Is there some guarantee that a capsule writer has on what may _not_ be 
cached, or is it entirely up to the client's discretion? If the latter, 
is this something which could be clarified in the spec? My impression 
from this ML was that caching is frowned upon (except for perhaps 
navigating back and forth through your history), but this has not been 
my experience (at least with amfora).

~Stephen ( gemini://drsudo.com/ )

Sean Conner <sean (a) conman.org>

It was thus said that the Great Stephen once stated:
> Gemini spec does not mention caching, and the FAQ says:
> 
> > Gemini has no support for caching
> 
> So I was a bit surprised when I was putting together some dynamic 
> content for my capsule and found that my client (amfora) was caching 
> things. I was able to get around it by using temporary redirects to add 
> cache busting query strings.

  There was a thread about this in early November:

	https://lists.orbitalfox.eu/archives/gemini/2020/003077.html

  It appears that half want caching, half don't.  It's a mess.

  -spc

Luke Emmet <luke (a) marmaladefoo.com>

On 22-Dec-2020 22:40, Stephen wrote:
>
> Is there some guarantee that a capsule writer has on what may _not_ be 
> cached, or is it entirely up to the client's discretion? If the 
> latter, is this something which could be clarified in the spec? My 
> impression from this ML was that caching is frowned upon (except for 
> perhaps navigating back and forth through your history), but this has 
> not been my experience (at least with amfora).

As a user-facing client author, the semantics that make most sense to 
me, whilst recognising that Gemini does not have any mechanism to 
indicate staleness, are as follows:

1. Any overt action by the user to activate or refresh a link retrieves 
a fresh copy

2. Navigation backwards and forwards through history may use a local 
cached version

This is largely the equivalent of web pages having expires=0
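
A minimal sketch of the two rules above as a shell function, assuming a hypothetical client that labels every page load with how it was triggered. The function name and the action names are made up for illustration and do not come from any real client.

```sh
# Decide whether a page load may be answered from the local cache.
# $1 is how the load was triggered; the names are illustrative only.
may_use_cache() {
    case "$1" in
        back|forward) return 0 ;;   # history navigation: a cached copy is fine
        *)            return 1 ;;   # click, typed URL, refresh: fetch fresh
    esac
}

for action in click refresh back forward; do
    if may_use_cache "$action"; then
        echo "$action: cache allowed"
    else
        echo "$action: fetch fresh copy"
    fi
done
```

Anything that is not plain history navigation falls through to the "fetch fresh" branch, so an unrecognised action errs on the side of a new request.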

Other clients (like crawlers, indexers and the like) preserve their own 
copies on the understanding that the data will be out of date - but 
still their cache may be useful for their own purposes.

I think a client caching a resource when the user has overtly requested 
it is probably overstepping its authority to provide what the user 
expected/intended.

  - Luke

Luke Emmet <luke (a) marmaladefoo.com>



On 22-Dec-2020 23:08, Luke Emmet wrote:
>
> I think a client caching a resource when the user has overtly 
> requested it is probably overstepping its authority to provide what 
> the user expected/intended.
Sorry, to clarify, I meant to say that my view was that a client 
retrieving a cached version of a resource when the user has overtly 
requested it [...] has probably overstepped its authority etc

Philip Linde <linde.philip (a) gmail.com>

On Tue, 22 Dec 2020 14:40:55 -0800
Stephen <stephen at drsudo.com> wrote:

> Is there some guarantee that a capsule writer has on what may _not_ be 
> cached, or is it entirely up to the client's discretion? If the latter, 
> is this something which could be clarified in the spec? My impression 
> from this ML was that caching is frowned upon (except for perhaps 
> navigating back and forth through your history), but this has not been 
> my experience (at least with amfora).

It is at the client's discretion by my reading; it's not brought up at
all in the spec. In general it's probably best for the client to avoid
caching, at least by default. According to Stephane's statistics page[1],
most documents are typically rather small anyway (but perhaps some of
the larger ones are more popular...)

The main cost is establishing a connection and the TLS handshake,
where the client and server have to do some back-and-forths and will
spend a lot of time just waiting for responses. For this reason I
personally prefer to use caching liberally, at least for the duration
of my session. There are some applications that will break as a result,
but I don't tend to use them myself.

I think that no-caching-by-default should be suggested as a best
practice in the best practices document.

[1]: gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi

-- 
Philip

Alex // nytpu <alex (a) nytpu.com>

> > Gemini has no support for caching

This is referring to protocol-mandated caching; there's nothing saying a
client can't cache at its discretion. It also implies that if your
dynamic content doesn't work, then it's the client's fault, not yours
nor the protocol's.


> So I was a bit surprised when I was putting together some dynamic
> content for my capsule and found that my client (amfora) was caching
> things. I was able to get around it by using temporary redirects to
> add cache busting query strings.

Well-behaved clients usually have some way to disable or get around the
cache. For instance, Lagrange only caches forwards and backwards
navigation, and any new links visited will not use the cache, even if
you've already visited that page.

Looking at the Amfora code, it seems like there are three limits on the
cache[1]: number of pages, size in bytes, and number of seconds. This
seems to be configurable[2], and it looks like it *should* (I may be
misreading the code) hard reload the page when you refresh a tab[3]. That
doesn't help with dynamic content, since you can't expect your users to
do an extra refresh every time, but it is helpful if you're viewing a
dynamic page yourself.

[1]: https://github.com/makeworld-the-better-one/amfora/blob/master/cache/page.go#L19-L3
[2]: https://github.com/makeworld-the-better-one/amfora/blob/master/config/config.go#L199-L201
[3]: https://github.com/makeworld-the-better-one/amfora/blob/master/display/display.go#L591-L592
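
As a rough illustration of how a client might apply three limits of this kind (page count, total bytes, age in seconds), here is a sketch in shell. The limit values and function names are invented for the example; this is not how Amfora implements its cache, and these are not its defaults.

```sh
# All three limits are hypothetical values chosen for the example.
MAX_PAGES=30
MAX_BYTES=1500000
MAX_AGE=1800   # seconds

# cache_has_room PAGES_STORED BYTES_STORED NEW_PAGE_BYTES
# Succeeds if one more page of the given size fits under both size limits.
cache_has_room() {
    [ "$1" -lt "$MAX_PAGES" ] && [ $(( $2 + $3 )) -le "$MAX_BYTES" ]
}

# entry_is_fresh STORED_AT NOW   (both in seconds since the epoch)
# Succeeds if the entry is no older than MAX_AGE seconds.
entry_is_fresh() {
    [ $(( $2 - $1 )) -le "$MAX_AGE" ]
}

if entry_is_fresh 1000 1600; then
    echo "entry stored 600 seconds ago is still usable"
fi
```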


> > ...I only added it for cache-busting on lynx (otherwise you would
> > always see the same result as the first time you loaded the page).

This seems like lynx's fault. Caching a page permanently is not a good
idea, even in webspace.


> Is there some guarantee that a capsule writer has on what may _not_ be
> cached, or is it entirely up to the client's discretion? If the
> latter, is this something which could be clarified in the spec? My
> impression from this ML was that caching is frowned upon (except for
> perhaps navigating back and forth through your history), but this has
> not been my experience (at least with amfora).

There's no way to guarantee or request it, because pages assume they
aren't cached. The consensus should be that caching is reserved for
special circumstances rather than being the norm (for instance, caching
forward/backward navigation only). I'd look to see if there's a way to
disable caching on a per-site basis or at least temporarily disable it
globally in Amfora, because having a cache that acts like that doesn't
seem quite right. Absolute worst case, use a different client for
dynamic content than you would use for static content.


Just because I'm lazy I wouldn't go out of my way to implement cache
busting, and instead just put a warning saying some clients may cache
the page and cause it to not function properly. Especially when most
clients don't cache, and the cache-busting would clog up
forward/backward navigation on normal clients.

~nytpu

-- 
Alex // nytpu
alex at nytpu.com
GPG Key: https://www.nytpu.com/files/pubkey.asc
Key fingerprint: 43A5 890C EE85 EA1F 8C88 9492 ECCD C07B 337B 8F5B
https://useplaintext.email/

Stephen <stephen (a) drsudo.com>

>    There was a thread about this in early November:
> 
> 	https://lists.orbitalfox.eu/archives/gemini/2020/003077.html
> 
>    It appears that half want caching, half don't.  It's a mess.

Well that's embarrassing. I don't know how I missed that *blush*. Thanks 
for the link and sorry for the noise.

In case anyone is curious, this is the solution I came up with to 
prevent caching. I have a cgi script like this:

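# No query string yet: send a temporary redirect (status 30) to this same
# script with the current date and time appended as the query string.
if [ -z "$QUERY_STRING" ]; then
   printf '30 %s?%s\r\n' \
     "$(basename "$SCRIPT_PATH")" \
     "$(date | tr ' :' __)"
else
   # Query string present: send the success header, then the dynamic page.
   # generate_content stands in for whatever produces the page body.
   printf '20 text/gemini\r\n'
   generate_content
fi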

Basically, if there is no query string on the path, it redirects you to 
the same page with a cache busting query string constructed from the 
current date and time. If there is a query string, it gives you the content.

It's not perfect. A user could bookmark a page with a cache busting 
string on it and then see the same static page until it went out of 
their cache. I don't think there is a way around that though.
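
To make the two branches concrete, here is a sketch of the same logic as a function that takes its inputs as arguments, so it can be run outside a Gemini server. The function name, the script name game.cgi and the token value are placeholders chosen for the example.

```sh
# cache_bust_header QUERY SCRIPT_NAME TOKEN
# Prints the Gemini response header the script would send.
cache_bust_header() {
    if [ -z "$1" ]; then
        printf '30 %s?%s\r\n' "$2" "$3"   # no token yet: temporary redirect
    else
        printf '20 text/gemini\r\n'       # token present: serve the page
    fi
}

cache_bust_header "" game.cgi 1608678055      # first request, empty query
cache_bust_header 1608678055 game.cgi unused  # request after the redirect
```

The first call prints a redirect to game.cgi?1608678055 and the second prints the success header. A client that has cached the redirected URL never contacts the server again for it, which is why the bookmark case described above cannot be fixed on the server side.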

~Stephen

Petite Abeille <petite.abeille (a) gmail.com>



> On Dec 23, 2020, at 00:31, Stephen <stephen at drsudo.com> wrote:
> 
> cache busting

Wouldn't that be an anti-pattern actually?  Perhaps gemini is not actually 
meant for such dynamic content at all, besides experimental purposes. 
Perhaps best to let it go. After all, gemini has no cache control. Why force it?

Let me present you with another anti-pattern, as far as gemini goes:

A gemini page embeds a small data: link somewhere, let's call it 'meta', 
which includes random cache control directives, say, 'refresh=1', just because.

=> data:text;refresh%3D1 meta

When a client sees such a construct, it refreshes the page after one second.

Wouldn't that be cool?

Yes, I thought so too.

Petite Abeille <petite.abeille (a) gmail.com>



> On Dec 22, 2020, at 23:40, Stephen <stephen at drsudo.com> wrote:
> 
> Gemini spec does not mention caching, and the FAQ says:
> 
> > Gemini has no support for caching

Perhaps this should be rewritten as "no support for cache control". This 
would clear up any ambiguities. The user-agent decides what to do.

Björn Wärmedal <bjorn.warmedal (a) gmail.com>


>> Gemini spec does not mention caching, and the FAQ says:
>> 
>>> Gemini has no support for caching
> 
> Perhaps this should be rewritten as "no support for cache control". This
> would clear up any ambiguities. The user-agent decides what to do.

Very much this. How, when and for how long a browser caches is first and 
foremost a UI/UX question. The browser should act as the user expects, and 
may work to change those expectations (e.g. a specialized browser that 
behaves in a very specific manner may describe its usage case and 
consequences in documentation).

Us authors have little control over how our content is fetched, displayed, 
or otherwise consumed.

On an Amfora-related note: makeworld talked about the caching on IRC the 
other day, and another user brought up that it seemed like Amfora 
presented cached content when a previously-visited link was clicked. 
Makeworld commented that this was unintended, and said they would fix it 
for a later release. (At least that's how I remember it; maybe makeworld can weigh in.)

Cheers,
ew0k

Sent from my smart speaker

colecmac@protonmail.com <colecmac (a) protonmail.com>

> On an Amfora-related note: makeworld talked about the caching on
> IRC the other day, and another user brought up that it seemed like
> Amfora presented cached content when a previously-visited link was
> clicked. Makeworld commented that this was unintended, and said they
> would fix it for a later release. (At least that's how I remember it;
> maybe makeworld can weigh in.)


Wow, a lot of talk about Amfora in this thread. I'll respond to this, and
some other things.

What I remember from that IRC conversation was that another user (probably bie)
said that Amfora was using the cache, not when a link was clicked, but when
it was typed in the bottom bar. This should not be happening in my opinion,
but it is. I've opened an issue to address this, and yes, it will be fixed
in the next release.

https://github.com/makeworld-the-better-one/amfora/issues/159


From Stephen's original email:

> So I was a bit surprised when I was putting together some dynamic
> content for my capsule and found that my client (amfora) was caching
> things. I was able to get around it by using temporary redirects to add
> cache busting query strings.

Amfora does not use the cache for redirects. That is, if the server redirects
the client to a cached page, Amfora will not use the cache to load that page.
The code for this change can be seen here:

https://github.com/makeworld-the-better-one/amfora/commit/b05885e7100a18fb7afae5d388173b7795c274ec

So if your dynamic content was using redirects already, it should be okay.


nytpu wrote:

> Looking at the Amfora code, it seems like there's 3 limits to the
> cache[1]: number of pages, size in bytes, and number of seconds. This
> seems to be configurable[2],and it looks like it *should* (I may be
> misreading the code) hard reload the page when you refresh a tab[3].

All correct. There are some other things that disable the cache as well:
redirects and client certs. Any page that Amfora sends a client cert to will
not be cached, and this helps with dynamic apps like Astrobotany.
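
A simplified sketch of these two bypass conditions as one shell predicate. It lumps "do not read from the cache after a redirect" and "do not store pages fetched with a client cert" into a single yes/no decision, and the function name and argument convention are made up for illustration rather than taken from Amfora's code.

```sh
# skip_cache VIA_REDIRECT SENT_CLIENT_CERT   (each argument is "yes" or "no")
# Succeeds when the cache should be bypassed for this request.
skip_cache() {
    [ "$1" = yes ] || [ "$2" = yes ]
}

if skip_cache no yes; then
    echo "client cert in use: bypass the cache"
fi
```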


It sounds like the consensus in this thread is that Amfora should not be
using the cache when the user "clicks" a link. I suppose that makes sense,
but I find that for a lot of cases using the cache improves the browsing
experience. Most content on Gemini is static, and so it's nice when the page
loads instantly, it really makes a noticeable difference, despite how fast
Gemini is.

I'm not quite sure what to do here, as I don't want to break apps, but I
want browsing static content to be nice as well. Thoughts?

Cheers,
makeworld

Stephen <stephen (a) drsudo.com>

>It sounds like the consensus in this thread is that Amfora should not be
>using the cache when the user "clicks" a link. I suppose that makes sense,
>but I find that for a lot of cases using the cache improves the browsing
>experience. Most content on Gemini is static, and so it's nice when the page
>loads instantly, it really makes a noticeable difference, despite how fast
>Gemini is.

I think calling anything on this thread a consensus is being generous ;) I 
say since it is your app and you are not doing anything to break the spec 
that you are free to implement cache however you want. I think that your 
reasoning and method for caching makes a lot of sense.

If I recall correctly, amfora has an option to disable caching. In my mind 
that's good enough. I think that, in the absence of any guarantees in the 
spec, it is incumbent on the capsule owner to put a cache disclaimer on 
dynamic content (dynamic content doesn't seem to be the focus of gemini anyway).

I love amfora btw. Hands down my favorite gemini browser. Thanks for all 
your work on it. I follow it on GitHub (I'm sudobash1 there btw) and wow, is it busy :)

~Stephen

colecmac@protonmail.com <colecmac (a) protonmail.com>

> I think calling anything on this thread a consensus is being generous ;) I
> say since it is your app and you are not doing anything to break the spec
> that you are free to implement cache however you want. I think that your
> reasoning and method for caching makes a lot of sense.
>
> If I recall correctly, amfora has an option to disable caching. In my mind
> that's good enough. I think that in absence of any guarantees in the spec,
> that it is incumbent on the capsule owner to put a cache disclaimer on dynamic
> content (Dynamic content doesn't seem to be the focus of gemini anyway)

Sounds good to me! I like the solution where I'm already right :)

> I love amfora btw. Hands down my favorite gemini browser. Thanks for all your
> work on it. I follow it on GitHub (I'm sudobash1 there btw) and wow, is it busy :)

Thank you! That means a lot. And thanks for your recent PR.

Cheers,
makeworld
