I have this app that makes a gazillion web requests: at least two of them for every account in a list.
    my @results = map { overview $_ } @accts;
This is what the code looks like:
    sub overview {
      # HACK ALERT: plenty of shortcuts here which might only work for Mastodon...
      my $account = shift;
      my ($username, $domain) = split "@", $account;
      my $ua = Mojo::UserAgent->new();
      my $result;
      # We should get the first URL from here, looking at the "aliases" key:
      # curl "https://octodon.social/.well-known/webfinger?resource=acct%3Akensanata%40octodon.social"
      my $url = "https://$domain/users/$username";
      my %obj = (id => $account, url => $url, bio => '', published => '');
      eval {
        $result = $ua->max_redirects(2)->get($url => {Accept => "application/json"})->result;
      };
      if ($@) {
        $obj{bio} = "<p>Error: $@</p>";
        return \%obj;
      }
      if (not $result->is_success) {
        $obj{bio} = "<p>" . $result->code . ": " . $result->message . "</p>";
        return \%obj;
      }
      $obj{bio} = $result->json->{summary};
      my $outbox = $result->json->{outbox};
      # We should get this URL from the previous one:
      # curl -H 'Accept: application/json' https://octodon.social/users/kensanata
      # gives us the "outbox" key and the value is a URL which we can fetch again
      # curl https://octodon.social/users/kensanata/outbox
      # and that gives us a short description including the "first" key which gives us a bunch of statuses
      # and we just look at the first one
      $url = "$outbox?page=true";
      eval {
        $result = $ua->max_redirects(2)->get($url => {Accept => "application/json"})->result;
      };
      if ($@) {
        $obj{published} = "<p>Error: $@</p>";
        return \%obj;
      }
      if (not $result->is_success) {
        $obj{published} = "<p>" . $result->code . ": " . $result->message . "</p>";
        return \%obj;
      }
      $obj{published} = $result->json->{orderedItems}->[0]->{published};
      return \%obj;
    }
When I tried to rewrite it yesterday using promises, I failed.
I called the code as follows:
    my @results = overview($c, $name, @accts);
And here’s the code using Mojo::UserAgent's `get_p` which returns a promise and Mojo::Promise's `all` which waits for them.
    sub overview {
      my $c = shift;
      my $name = shift;
      my @accounts = @_;
      # Wrap continuation-passing style APIs with promises
      my $ua = Mojo::UserAgent->new->max_redirects(2)->inactivity_timeout(20);
      my @promises;
      for my $account (@accounts) {
        my ($username, $domain) = split "@", $account;
        # "https://octodon.social/users/kensanata"
        my $url = "https://$domain/users/$username";
        push @promises, $ua->get_p($url => {Accept => "application/json"});
      }
      warn "@promises";
      Mojo::Promise->all(@promises)
        ->then(sub {
          warn "@_";
          my @results;
          # each argument is an array reference holding one transaction
          for my $promise (@_) {
            my $result = $promise->[0]->result;
            if ($result->is_success) {
              push @results, $result->json;
            } else {
              push @results, {
                id => "A", url => "B", name => "C",
                summary => "<p>" . $result->code . ": " . $result->message . "</p>" };
            }
          }
          $c->render(template => 'do_overview', name => $name, accounts => \@results)})
        ->catch(sub {
          my $err = shift;
          warn "Connection error: $err";
        })
        ->wait;
    }
When I run this code, with a list of two accounts whose hosts are up and running, I get a connection error. The line with the hashes is the `warn` line in my code which shows that I do in fact have two promises.
    Mojo::Promise=HASH(0x559e19d48038) Mojo::Promise=HASH(0x559e1c099468)
    Connection error: Premature connection close
I’m staring at the manual pages for Mojo::UserAgent and Mojo::Promise and just don’t understand what I need to change.
A note on unknown hosts: The two accounts I’m checking here are on reachable hosts, so I don’t understand what the premature closing is all about. But even if they were unreachable, I need the code to not abort. Sadly, many of the accounts I’m checking are on hosts that no longer exist, which is part of the reason I need to check them. That’s why the code above wraps the `get` call in an `eval` block. I need to do something like that, somewhere.
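One way to get the effect of the `eval` block with promises might be to give every promise its own `catch` handler, so that a failure is turned into an ordinary value before `all` ever sees it. The following is only a minimal sketch under assumptions: the two promises are created by hand with `resolve` and `reject` to stand in for a reachable and a dead host (in the app they would come from `get_p`), and the `error` key is a made-up name for illustration.

```perl
use strict;
use warnings;
use Mojo::Promise;

# Stand-ins for two requests (assumption: the real ones would come from get_p)
my @promises = (
  Mojo::Promise->resolve('profile data'),
  Mojo::Promise->reject('Premature connection close'),
);

# Give every promise its own catch handler, so a failure becomes a plain value
# ("error" is a hypothetical key, not something Mojolicious defines)
my @guarded;
for my $promise (@promises) {
  push @guarded, $promise->catch(sub {
    my $err = shift;
    return { error => $err };
  });
}

my @results;
Mojo::Promise->all(@guarded)->then(sub {
  # Each argument is an array reference holding the values of one promise
  @results = map { $_->[0] } @_;
})->wait;

for my $result (@results) {
  if (ref($result) eq 'HASH') { print "failed: $result->{error}\n" }
  else                        { print "ok: $result\n" }
}
```

Since every rejection is already handled by the time `all` looks at the promises, `all` has nothing left to abort on, and the dead hosts simply show up as error entries in the result list.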
Anyway, I wrote a little test script to try and get a minimal working example, but that works as intended:
    use Modern::Perl;
    use Mojo::UserAgent;
    use Mojo::Promise;

    my $ua = Mojo::UserAgent->new;
    my @accounts = qw(kensanata@octodon.social kensanata@dice.camp);
    my @promises;
    for my $account (@accounts) {
      my ($username, $domain) = split "@", $account;
      my $url = "https://$domain/users/$username";
      warn $url;
      push @promises, $ua->get_p($url => {Accept => "application/json"});
    }
    warn "@promises";
    Mojo::Promise->all(@promises)
      ->then(sub {
        warn "@_";
        my @results;
        # each argument is an array reference holding one transaction
        for my $promise (@_) {
          my $result = $promise->[0]->result;
          if ($result->is_success) {
            push @results, $result->json;
          } else {
            push @results, {
              id => "A", url => "B", name => "C",
              summary => "<p>" . $result->code . ": " . $result->message . "</p>" };
          }
        }
        say "@results" })
      ->catch(sub {
        my $err = shift;
        warn "Connection error: $err";
      })
      ->wait;
The output:
    https://octodon.social/users/kensanata at test.pl line 11.
    https://dice.camp/users/kensanata at test.pl line 11.
    Mojo::Promise=HASH(0x5595942869b8) Mojo::Promise=HASH(0x559595a8cc38) at test.pl line 14.
    ARRAY(0x559595b20598) ARRAY(0x559595b39080) at test.pl line 17.
    HASH(0x559595b20748) HASH(0x559595b396c8)
So... what’s my problem? I asked on IRC. There’s a `#mojo` channel on Freenode. Users `mst` and `CandyAngel` helped me out. The problem was that my code used a variable `$ua` for the user agent inside the `overview` sub. Inside the web app the event loop is already running, so `->wait` returns immediately instead of blocking like it does in the test script. The sub finishes while the promises are still pending, and thus `$ua` goes out of scope and is destroyed, taking its open connections with it. The solution is to make sure the user agent is kept alive.
One way is to add `->finally(sub { undef $ua })` in there, or even better: just use the app’s own user agent!
    my $ua = $c->app->ua;
That fixed it!
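Outside of a Mojolicious app, where there is no `$c->app->ua` to borrow, the same idea could be sketched with a user agent that is declared at file level instead of inside the sub. This is just an illustration under assumptions: `profile_p` is a hypothetical helper name, and the URL scheme is the same Mastodon shortcut used above.

```perl
use strict;
use warnings;
use Mojo::UserAgent;

# File-scoped user agent: it is not tied to the lifetime of any one sub call,
# so it is still around while the requests are pending
my $ua = Mojo::UserAgent->new->max_redirects(2)->inactivity_timeout(20);

# Hypothetical helper: returns a promise for the profile of one account
sub profile_p {
  my $account = shift;
  my ($username, $domain) = split /@/, $account;
  my $url = "https://$domain/users/$username";
  return $ua->get_p($url => {Accept => 'application/json'});
}
```

The sub can now return right away: the promise it hands back stays usable because the user agent it belongs to outlives the call.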
Well... except that stuff still didn’t run in parallel. As I discovered, that’s not what promises are for!
I had two options: previously, I had used `MCE::Loop` to run jobs in parallel (in the code that’s currently disabled). I remember trying to figure out how I might use Mojolicious to do it and failing. So today I tried again. It turns out that you can do it, if you use `Minion`.
Sadly, I again ran into many issues. The documentation is never as straightforward as I expect it to be. The use cases aren’t clear to me. For example, I could not get it to work using the default worker, `app->minion->worker`. If I had my own worker and ran `perform_jobs`, it didn’t run in parallel. If I ran `run`, it didn’t return. On the `#mojo` channel they said that I should just start the workers once and then leave them running. As long as the queue is empty, they aren’t wasting resources. But I just couldn’t get it to work. So in the end I copied an example from the manual that was labelled “a custom worker performing multiple jobs at the same time.” That did the trick.
I still feel bad about using a database backend. I would not have minded an in-memory solution! Perhaps in the end I should have just used `MCE::Loop`.
This uses a temporary database:
    plugin Minion => { SQLite => ':temp:' };
This adds a task at the top level:
    app->minion->add_task(overview => sub {
      my ($job, $account) = @_;
      $job->finish(overview $account)
    });
And this is the code that uses it, with up to 40 requests in parallel and checking every second if they’re finished and if we should start more.
    my @ids = map { $c->app->minion->enqueue(overview => [$_]) } @accounts;
    my %jobs;
    my $worker = $c->app->minion->repair->worker->register;
    do {
      for my $id (keys %jobs) {
        delete $jobs{$id} if $jobs{$id}->is_finished;
      }
      if (keys %jobs >= 40) { sleep 1 }
      else {
        my $job = $worker->dequeue(1);
        $jobs{$job->id} = $job->start if $job;
      }
    } while keys %jobs;
    $worker->unregister;
    my @results = map { $c->app->minion->job($_)->info->{result} } @ids;
    $c->render(template => 'do_overview', name => $name, accounts => \@results);
I think now it works. 😓
#Mojolicious #Perl