2020-11-19 Trying to wrap my head around async Perl

On the Perl channel, they like to recommend IO::Async and Mojo::IOLoop, but in all the years I’ve tried using them, I never got anywhere. Somehow my brain just can’t digest the documentation. I find it so difficult to parse, in fact, that I can’t even tell you whether the documentation is incomplete or not.

Today I decided to give it another try. See how far I get! I want to spider a few links using moku pona, which means having to call some code for a bunch of URLs (fetching stuff and processing it), and I want to return a string from each one of them.

moku pona

Here’s what I cobbled together as I tried to get IO::Async::Loop to work. The run function simulates getting my results, and the list of to-dos stands in for the URLs I’m “fetching”.

use Modern::Perl;
use IO::Async::Loop;
say time() . " start";
my $loop = IO::Async::Loop->new;
my @todos = qw(a b c);
my @procs;
sub run { sleep rand(5); say time() . " " . shift; }
for my $todo (@todos) {
  push(@procs, $loop->run_process(
    code => sub { run($todo) }));
}
for my $proc (@procs) {
  my ($exitcode, $stdout) = $proc->get;
  print $stdout;
}
say time() . " end";

And here’s the kind of output it produces.

1605792335 start
1605792338 a
1605792336 b
1605792337 c
1605792338 end

I can’t tell you whether this is how you’re supposed to use it. All I know is that it seems to work: the tasks finish in random order (in the example above, a is the last to finish), and information is communicated back to the parent process (I’m not even sure it’s forking at all).

I don’t understand how the loop knows when to end, for example. I don’t know how to read the documentation in order to find out.

I guess it’s creating a bunch of Futures, and as I’m calling “get” on them in the last loop it starts waiting for them, one after another. I think this is fine. We’ll wait the longest for the first one, but the others might all have finished in the background and so when it’s their turn, the Future is already done.

OK, I think I could replace the “run” sub with something more interesting, doing network requests. It feels weird to use STDOUT to get data back, but perhaps this is how forking and inter-process communication ought to work? I don’t know whether this is what IO::Async does.
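For comparison, here is a minimal sketch of what "getting data back from a forked child" can look like using nothing but core Perl: one pipe per task, with the child writing its result and the parent reading it. This is not a claim about how IO::Async implements run_process; the fetch sub and its result strings are made-up stand-ins for a real download.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for a real fetch: just builds a string.
sub fetch { my $todo = shift; return "result for $todo\n" }

my %readers;
for my $todo (qw(a b c)) {
  pipe(my $reader, my $writer) or die "Cannot create pipe: $!";
  my $pid = fork() // die "Cannot fork: $!";
  if ($pid == 0) {
    # Child: write the result into the pipe, flush by closing, and leave.
    close $reader;
    print $writer fetch($todo);
    close $writer;
    POSIX::_exit(0); # skip END blocks inherited from the parent
  }
  # Parent: keep the reading end and start the next task right away.
  close $writer;
  $readers{$todo} = $reader;
}

# Collect the results one after another, like calling "get" on each Future.
my %results;
for my $todo (sort keys %readers) {
  my $reader = $readers{$todo};
  local $/ = undef; # slurp everything the child wrote
  $results{$todo} = <$reader>;
  close $reader;
}
wait() for keys %readers; # reap the children

print "$_: $results{$_}" for sort keys %results;
```

All the children run at the same time, so reading them in order only blocks for as long as the slowest one still needs.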

I’d be using something like this, except instead of returning the string I’d have to print it to STDOUT, I guess.

sub query_gemini {
  my $url = shift;
  my($scheme, $authority, $path, $query, $fragment) =
      $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
  die "⚠ The URL '$url' must use the gemini scheme\n" unless $scheme and $scheme eq 'gemini';
  die "⚠ The URL '$url' must have an authority\n" unless $authority;
  my ($host, $port) = split(/:/, $authority, 2);
  $port //= 1965;
  my $socket = IO::Socket::SSL->new(
    PeerHost => $host,
    PeerService => $port,
    SSL_verify_mode => SSL_VERIFY_NONE)
      or die "Cannot construct client socket: $@";
  # send data in one go
  print $socket "$url\r\n";
  # read response
  local $/ = undef;
  my ($header, $response) = split(/\r\n/, <$socket>, 2);
  return $response;
}
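To check what the URL regular expression and the header split in query_gemini actually produce, here is a small sketch that runs offline. The split_url helper, the example.org URL, and the canned response are assumptions made up for illustration; only the regular expression and the default port come from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL with the same regular expression
# that query_gemini uses, and apply the same default port.
sub split_url {
  my $url = shift;
  my ($scheme, $authority, $path, $query, $fragment) =
      $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
  my ($host, $port) = split(/:/, $authority // '', 2);
  $port //= 1965; # default Gemini port
  return ($scheme, $host, $port, $path);
}

# Made-up URL without an explicit port
my @parts = split_url('gemini://example.org/page/Alex');
print join(' | ', @parts), "\n";

# Made-up response: a header line, then the body
my $raw = "20 text/gemini\r\n# Alex\r\nHello\r\n";
my ($header, $body) = split(/\r\n/, $raw, 2);
my ($status, $meta) = split(/ /, $header, 2);
print "status $status, type $meta\n";
print $body;
```

Note that query_gemini throws the header away, so a redirect or an error would come back as an empty or misleading string. Looking at the status before returning the body might be worth the extra line. Splitting the authority on a colon also won't cope with IPv6 literals.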

As I looked at Mojo::IOLoop, I noticed that this might be more interesting: it has code to create clients that get their data piece by piece. I’ll have to rewrite my three clients (Gopher, Gemini, Web), but it might work...

Here’s code making finger requests (like Gopher) on port 79:

use Modern::Perl;
use Mojo::IOLoop;
say time() . " start";
my @requests = qw(Alex About Diary);
my %responses;
for my $request (@requests) {
  Mojo::IOLoop->client(
    {port => 79, address => 'alexschroeder.ch' }
    => sub {
      my ($loop, $err, $stream) = @_;
      $stream->on(read => sub { my ($stream, $bytes) = @_; $responses{$request} .= $bytes;});
      $stream->write("$request\x0d\x0a");
    })}
# Start event loop if necessary
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
say time() . " end";
# Results:
for my $request (@requests) {
  say $request;
  say $responses{$request};
  say "--------------------";
}

What’s new is that I can store the (partial) responses in a hash and keep adding to it.

I don’t know how this loop knows that it’s done.
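As far as I understand the Mojo::IOLoop documentation, start blocks until stop is called, and some reactors also stop on their own once nothing is being watched anymore. That would explain why the script ends after the server has closed every connection. A more explicit variant is to count the open requests and call stop when the last stream emits its close event. The sketch below assumes a throwaway local server as a stand-in for the real one, so it runs without network access; the reply text is made up.

```perl
use strict;
use warnings;
use Mojo::IOLoop;

# Hypothetical local stand-in for the remote server:
# answer the first chunk it reads, then hang up.
my $id = Mojo::IOLoop->server({address => '127.0.0.1'} => sub {
  my ($loop, $stream) = @_;
  $stream->on(read => sub {
    my ($stream, $bytes) = @_;
    $stream->write("you asked for $bytes" => sub { shift->close });
  });
});
my $port = Mojo::IOLoop->acceptor($id)->port;

my @requests = qw(Alex About Diary);
my %responses;
my $pending = @requests; # number of requests still open

for my $request (@requests) {
  Mojo::IOLoop->client({address => '127.0.0.1', port => $port} => sub {
    my ($loop, $err, $stream) = @_;
    if ($err) {
      warn "Cannot connect for $request: $err\n";
      Mojo::IOLoop->stop unless --$pending;
      return;
    }
    $stream->on(read => sub {
      my ($stream, $bytes) = @_;
      $responses{$request} .= $bytes;
    });
    # The server hanging up is the signal that this request is done.
    $stream->on(close => sub { Mojo::IOLoop->stop unless --$pending });
    $stream->write("$request\x0d\x0a");
  });
}

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
print "$_: $responses{$_}" for @requests;
```

Here the explicit stop is needed because the local server keeps listening, so there is always something left to watch. With only outgoing connections, the counter mostly serves as documentation of when the work is finished.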

In any case, I think I’m ready to move on to TLS. I can’t just use Mojo::UserAgent because I’m not just interested in HTTPS (which I’ll need to fetch RSS and Atom feeds) but I’m also interested in Gemini, which is a bit like Gopher over TLS, on port 1965.

use Modern::Perl;
use Mojo::IOLoop;
say time() . " start";
my @requests = qw(Alex About Diary);
my %responses;
for my $request (@requests) {
  Mojo::IOLoop->client(
    {port => 1965, address => 'alexschroeder.ch' }
    => sub {
      my ($loop, $err, $stream) = @_;
      $stream->on(
        read => sub { my ($stream, $bytes) = @_; $responses{$request} .= $bytes;});
      $stream->write("gemini://alexschroeder.ch/page/$request\x0d\x0a");
    })}
# Start event loop if necessary
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
say time() . " end";
# Results:
for my $request (@requests) {
  say $request;
  say $responses{$request};
  say "--------------------";
}

This doesn’t work. I thought I needed to add Mojo::IOLoop::TLS to the mix and I had no idea how to do this. Something about that stream not being a handle or whatever. But on the Perl channel, user mst pointed out to me that there’s a TLS option for the client that does all that for me! And it works.

use Modern::Perl;
use Mojo::IOLoop;
use Mojo::IOLoop::TLS;
say time() . " start";
my @requests = qw(Alex About Diary);
my %responses;
for my $request (@requests) {
  Mojo::IOLoop->client(
    {port => 1965, address => 'alexschroeder.ch', tls => 1 }
    => sub {
      my ($loop, $err, $stream) = @_;
      $stream->on(read => sub { my ($stream, $bytes) = @_; $responses{$request} .= $bytes;});
      $stream->write("gemini://alexschroeder.ch/page/$request\x0d\x0a");
    })}
# Start event loop if necessary
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
say time() . " end";
# Results:
for my $request (@requests) {
  say $request;
  say $responses{$request};
  say "--------------------";
}

Something about the sort of documentation used by IO::Async and Mojo::IOLoop does not agree with my way of reading documentation, unfortunately. I often feel that I only understand it if I already know it. As I kept working on my IO::Async::Loop and Mojo::IOLoop examples, I was confused about many things, read the docs up and down for about three hours, and wasted time reading Mojo::IOLoop::TLS, trying to get that to work, and trying to figure out how Mojo::IOLoop::Client plays into this. It was a very frustrating experience for me.

But, on a happier note: moku-pona now starts all the downloads in parallel. I’m not sure how job management actually works. I have no idea how many “worker threads” (or whatever the correct terminology is) Mojo::IOLoop uses, but updating is now definitely faster than back when I downloaded every link one after the other.

#Perl