💾 Archived View for bbs.geminispace.org › s › Lagrange › 12049 captured on 2023-12-28 at 16:02:45. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Lagrange plugins?

Lagrange plugins?

(Apologies in advance for being too lazy to scan the Lagrange documentation, and for likely using clunky terminology in what follows.)

Does Lagrange support adding something akin to a plugin that Lagrange would pass the gemtext about to be displayed, but then *instead* render the gemtext modified/output/passed back by the plugin?

I ask, because I played a bit with a simple "detect bad links" idea, which at the moment is this Lua script:

#! /usr/bin/env lua
for line in io.stdin:lines() do
  local url,title = string.match(line, '^=>%s*(%S*)%s*(.*)$')
  if url then
    local handle = io.popen('gemget ' .. url .. ' 2>&1')
    local output = handle:read('*a')
    handle:close()
    if string.match(output, '^Error: failed to connect') then
      print('DEAD ' .. line)
    else
      print(line)
    end
  else
    print(line)
  end
end

In case it's not obvious what's going on there :-) , the script reads gemtext lines on stdin, examining each line. Non-link lines are simply written to stdout immediately. Link lines are parsed for url, which is passed to "gemget", whose output is examined to see if it begins with "Error: failed to connect", in which case line is written to stdout prepended with "DEAD ", basically converting it to a paragraph clearly indicating a dead link.
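One thing worth noting about the script above: the URL is concatenated straight into a shell command, so a link containing quotes, spaces, or `;` could run something unintended. A minimal sketch of a safer parse-and-quote step follows. The helper names `parse_link` and `shell_quote` are hypothetical, and the pattern uses `%S+` rather than `%S*` so that a bare `=>` line is not treated as a link.

```lua
-- Split a gemtext link line into URL and title; returns nil for non-link lines.
local function parse_link(line)
  return line:match('^=>%s*(%S+)%s*(.*)$')
end

-- Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end
```

With these, the command would be built as `'gemget ' .. shell_quote(url) .. ' 2>&1'` instead of using the raw URL.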

I realize there are likely slicker ways to accomplish that ("gemget --help" doesn't say anything about exit codes...) (and maybe "gemget" is overkill...?), but if Lagrange could pass such a thing the incoming gemtext, and then receive resulting gemtext, well, I do believe I'd be accomplishing what I want without having to burden @skyjake and others with such.

So, for example, the script converts this incoming gemtext:

# This is the article title

Once upon a time, blah, blah

=> gemini://gemini.conman.org/boston/2023/11/29.1 good link



=> gemini://hello.world bad link

and then all was well!

to this:

# This is the article title

Once upon a time, blah, blah

=> gemini://gemini.conman.org/boston/2023/11/29.1 good link



DEAD => gemini://hello.world bad link

and then all was well!

Yes, it does take a while for gemget to return on bum urls, but I'd be blowing that time following them anyway, so I'm thinking this route would be more efficient...?

Posted in: s/Lagrange

☯️ oldernow

Nov 29 · 4 weeks ago

8 Comments ↓

🚀 skyjake · Nov 30 at 05:00:

Lagrange does indeed have a way to do this. It is called "MIME hooks" and it is documented in Help, section 4.

However, I doubt you'll want to use this. The filters you register via MIME hooks are unconditionally run on every matching page. This would make every Gemtext page with links load extremely slowly as it has to finish checking each link before showing the filtered results. Furthermore, you'd run the risk of overburdening slower servers by sending them too many requests at once.

I have been planning to improve the hooks system with things like applying the filter only manually when requested, but it hasn't been a high priority.

☯️ oldernow · Nov 30 at 14:44:

@skyjake Thanks for the "MIME hooks" documentation pointer!

> However, I doubt you'll want to use this. The filters you register via MIME hooks are unconditionally run on every matching page. This would make every Gemtext page with links load extremely slowly as it has to finish checking each link before showing the filtered results. Furthermore, you'd run the risk of overburdening slower servers by sending them too many requests at once.

I'm not understanding the "at once" part of that, as I'm running gemget in a loop, subsequent invocations not running until the one before them completes. Or would hitting a server several times in a row, each spaced two seconds apart still be considered being impolite, as it were?

FWIW, I made improvements to the aforementioned Lua script, using the exit status from os.execute() instead of parsing gemget stderr from io.popen(). (I think there's a way to get command exit status down the io.popen() path, but I'm not remembering the details, and a quick grep'ing of old scripts didn't reveal anything.) I also noticed gemget has a "--connect-timeout" option. Two seconds will probably cause me to miss out on some links unnecessarily, but then I tend toward impatience and probably wouldn't want to be waiting on such links regardless of the content, given my Gemini travels have me rough-estimating that maybe 1% of links overall ever lead to something that "really does it for me":

#! /usr/bin/env lua
for line in io.stdin:lines() do
	local url,title = string.match(line, '^=>%s*(%S*)%s*(.*)$')
	if url then
		local exit_okay, exit_termination_type, exit_status = os.execute('gemget --connect-timeout 2 ' .. url .. ' 1>/dev/null 2>/dev/null')
		if exit_status == 1 then
			print('DEAD ' .. line)
		else
			print(line)
		end
	else
		print(line)
	end
end
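
On the io.popen() question raised before the script: in Lua 5.2 and newer, calling close() on a handle returned by io.popen() gives back the same three values as os.execute(). The sketch below assumes such a Lua version and a POSIX shell. Under Lua 5.1 (and LuaJIT without its 5.2 compatibility option) close() does not report the exit code. The function name `run` is hypothetical.

```lua
-- Run a shell command; return its combined stdout/stderr and its exit code.
-- The exit code is nil if the process was killed by a signal.
local function run(cmd)
  local handle = assert(io.popen(cmd .. ' 2>&1'))
  local output = handle:read('*a')
  -- Lua 5.2+: close() returns ok, "exit" or "signal", and the numeric code.
  local ok, how, code = handle:close()
  return output, (how == 'exit') and code or nil
end
```

This keeps the captured output available for inspection while still allowing a test such as `code ~= 0`.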

I ran it against Antenna for kicks:

time gemget gemini://warmedal.se/~antenna/ | test-gemini-links 2>&1 | tee test.gmi

That took 1 minute 7 seconds, checking 86 links, eight of which were considered "DEAD". But seven of those were for irrelevant relative URLs. The one that wasn't was a good link that simply took longer than two seconds to pursue.
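
The relative-URL false positives could be avoided by resolving each link against the URL of the page it came from before checking it. A rough sketch, assuming the page URL is known to the script: the `resolve` function is hypothetical and is not a full RFC 3986 implementation. It only covers absolute URLs, `//host/path`, `/path`, and plain relative paths with `.` and `..` segments.

```lua
-- Resolve a link target (ref) against the URL of the page it appeared on (base).
local function resolve(base, ref)
  if ref:match('^%a[%w+.-]*:') then return ref end        -- already absolute
  local scheme, authority, path = base:match('^(%a[%w+.-]*)://([^/?#]*)([^?#]*)')
  if not scheme then return nil end                       -- base is not usable
  if ref:sub(1, 2) == '//' then return scheme .. ':' .. ref end
  if path == '' then path = '/' end

  local suffix = ref:match('[?#].*$') or ''               -- keep query/fragment
  local refpath = ref:gsub('[?#].*$', '')
  local joined
  if refpath:sub(1, 1) == '/' then
    joined = refpath                                      -- absolute path
  else
    joined = path:gsub('[^/]*$', '') .. refpath           -- relative to base dir
  end

  -- Collapse "." and ".." segments.
  local out = {}
  for seg in joined:gmatch('[^/]+') do
    if seg == '..' then
      out[#out] = nil
    elseif seg ~= '.' then
      out[#out + 1] = seg
    end
  end
  local trailing = (joined:sub(-1) == '/') and #out > 0 and '/' or ''
  return scheme .. '://' .. authority .. '/' .. table.concat(out, '/') .. trailing .. suffix
end
```

The checker would then test `resolve(page_url, url)` rather than `url` itself, where `page_url` is whatever address the gemtext was fetched from.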

I'm not sure I even need to use the "MIME hooks" feature given I can run the likes of the above, and then start Lagrange with the resulting file (test.gmi in this case) as a URL argument, and a little scripting could make that happen for me.

BTW, I love how Lagrange simply adds a tab for subsequent command line invocations if there's already an instance running!

☯️ oldernow · Nov 30 at 14:48:

Ooops... my previous comment somehow led to the initial posting having two titles... not sure why... I'm not going to try to edit it out because I can imagine Murphy's Law having a field day with my not completely understanding editing post titles/segments... :-)

☯️ oldernow · Nov 30 at 14:54:

And, of course, now I'm realizing not all links are going to be gemini:// links, and of course gemget fails on them, making all https:// links look bad... :-) It never takes long to re-remember why I eventually couldn't do software development anymore.... :-) So I guess I'll play with using cURL against https:// links to at least cover that case, which I imagine to be the most frequent non-gemini:// case....
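
One way to cover both cases is to pick the checking command from the URL scheme and leave every other kind of link unchecked. The sketch below only builds the command string. The gemget option is the one mentioned earlier in the thread; the curl options (`--silent --head --fail --max-time`) are standard curl flags, but the 5-second limit and the use of a HEAD request are assumptions, and some servers do not answer HEAD properly. The function names are hypothetical.

```lua
-- Wrap a string in single quotes for a POSIX shell.
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Return a shell command that checks the URL, or nil if the scheme is not
-- one we know how to check (mailto:, gopher:, relative links, ...).
local function check_command(url)
  local scheme = url:match('^(%a[%w+.-]*):')
  scheme = scheme and scheme:lower()
  if scheme == 'gemini' then
    return 'gemget --connect-timeout 2 ' .. shell_quote(url) .. ' >/dev/null 2>&1'
  elseif scheme == 'http' or scheme == 'https' then
    return 'curl --silent --head --fail --max-time 5 ' .. shell_quote(url) .. ' >/dev/null 2>&1'
  end
  return nil
end
```

In the loop, a nil result would mean "print the line unchanged", and only a non-nil command would be handed to os.execute().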

🚀 skyjake · Nov 30 at 15:17:

It is probably best to run this as a separate tool/script as you've noted. It is effectively a little crawler, after all.

An interesting related feature in Lagrange could be the creation of an offline archive by saving contents of a capsule and all of its linked pages up to a chosen depth. That would also determine if any links are bad.

When it comes to making too many requests, multiple ones spaced out a few seconds apart are already too much for the lowest-end hardware running servers in Geminispace.
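
A script like the one in this thread could reduce that load by capping how many links it checks per host in a single run. This is only an illustrative sketch. The names `host_of` and `make_limiter` are hypothetical, and a suitable cap is a guess that the thread does not settle.

```lua
-- Extract the lowercased host name from an absolute URL, or nil.
local function host_of(url)
  local host = url:match('^%a[%w+.-]*://([^/?#:]+)')
  return host and host:lower() or nil
end

-- Return a function that answers true for the first max_per_host URLs
-- seen for each host, and false afterwards (or when no host is found).
local function make_limiter(max_per_host)
  local seen = {}
  return function(url)
    local host = host_of(url)
    if not host then return false end
    seen[host] = (seen[host] or 0) + 1
    return seen[host] <= max_per_host
  end
end
```

Links that the limiter rejects would simply be printed unchanged rather than marked either way.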

🍵 michaelnordmeyer · Dec 01 at 07:37:

What happened to Gemini's paradigm "one page, one request?"

☯️ oldernow · Dec 01 at 16:29:

Now that I know the "two finger tap" trick to properly display a working context menu in Lagrange, exercising my "check links" code goes like this:

@skyjake that makes me wonder whether there's already (or might be in the future..) the means to customize the "context menu"? It would be nice to be able to add an action that calls a user script to process and return a modified version of the page being viewed - in place. Then the above list could be reduced to just the first bullet.

🚀 skyjake · Dec 01 at 16:44:

At the moment, there is no way for one to add actions to the context menu. It's a nice idea, though. I'll add it to my list of things to do.