2021-01-01 Oddmuse memory issues

Not really! It’s Phoebe… 😭

I just noticed that three of my server processes seem to consume a lot of memory: each of the three wikis was reporting more than 2G used.

Strangely, this site and Campaign Wiki seemed not to be affected. I don’t see any performance problems.

Memory issues started on week 51, 2020

I’m suspecting that something else is going on. I restarted the three processes using Hypnotoad, a hot deployment. But htop is still telling me that we’re using a lot of memory. When I sort by memory, however, I see it’s Phoebe at the very top.

Phoebe is consuming 40% of all memory

So now I’m wondering what’s going on here.

Something to investigate in the upcoming days. For now, I’m going to restart Phoebe.

Memory use is halved, now

I fixed the regular expressions for my Munin setup so that all the command line parameters I’m using to start Phoebe don’t trigger matches for other processes such as Community Wiki, Emacs Wiki, or Oddmuse Wiki.
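
To illustrate the idea in Perl regex terms (hypothetical patterns, not my actual Munin configuration): a loose pattern also matches Phoebe, because Phoebe’s command line mentions the hostnames it serves; anchoring on the actual script fixes that.

# loose: also matches Phoebe, whose long command line contains
# "communitywiki.org" among the hosts it serves
$cmdline =~ /communitywiki/;

# stricter: only match the wiki's own server process
# (hypothetical script path)
$cmdline =~ m!perl \S*/communitywiki\.pl!;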

That still leaves the question: What did I install, what changes did I make around December 14? I have no idea...

#Phoebe

Comments

(Please contact me if you want to remove your comment.)

Yep, Phoebe is at fault.

Memory use increasing for Phoebe only

So now I have to find a way to reproduce this result locally. Ideally we’ll find that there’s a single config file at fault... If I’m unlucky it’s because Phoebe 2 switched from Net::Server to Mojo::IOLoop and I’m somehow not closing the connections correctly, or some reference remains that prevents them from being closed completely. Yikes.

– Alex 2021-01-03 09:57 UTC

---

OK, I think I have a test setup that works. Let me know if you see an error in my thinking.

The test starts a server in the background, listening on a random port ($port) like all the other tests. I need its process id so that I can determine the memory it uses:

lsof -i:$port -F p

Given the process id ($pid), I can now ask for the memory size of the process:

ps -q $pid -o size

So what I do is get this size after starting the server, then run 100 requests, and get the size again. Running just Phoebe: no problem. If I install the speed bump extension (see 2020-12-25 Defending against crawlers), the size goes up.

2020-12-25 Defending against crawlers
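
Condensed into a sketch, the test looks something like this (with hypothetical helper names; the real t/leak.t differs in the details):

use strict;
use warnings;
use Test::More tests => 1;

# start_server() forks Phoebe in the background on a random port;
# hypothetical helper standing in for the real test setup
my $port = start_server();

# find the process id of whatever is listening on $port
my ($pid) = `lsof -i:$port -F p` =~ /^p(\d+)/m;

# the memory size of a process, in KiB
sub size {
  my $size = `ps -q $_[0] -o size=`;
  $size =~ s/\s+//g;
  return $size;
}

my $before = size($pid);
# query() stands in for making one Gemini request
query("gemini://localhost:$port/") for 1 .. 100;
my $after = size($pid);

is($after, $before, "No memory lost ($after = $before)");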

Well, I guess I’m fine with it going up a bit, since we need to keep some numbers in memory, right? It’s only truly a problem if the memory keeps going up. Let’s try.

100 requests:

$ make test TEST_FILES=t/leak.t
PERL_DL_NONLAZY=1 "/home/alex/perl5/perlbrew/perls/perl-5.30.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/leak.t
t/leak.t .. 1/?
#   Failed test 'No memory lost (42524 = 43256)'
#   at t/leak.t line 33.
#          got: '42524'
#     expected: '43256'
# Looks like you failed 1 test of 1.
t/leak.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

Test Summary Report
-------------------
t/leak.t (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=1, Tests=1,  2 wallclock secs ( 0.02 usr  0.00 sys +  0.71 cusr  0.07 csys =  0.80 CPU)
Result: FAIL
Failed 1/1 test programs. 1/1 subtests failed.
make: *** [Makefile:940: test_dynamic] Error 1

1000:

#   Failed test 'No memory lost (42528 = 43256)'

10000:

#   Failed test 'No memory lost (42524 = 43256)'

OK, clearly it just stays the same. Hm...

Perhaps it’s time to take a look at Devel::MAT. Too bad Devel::MAT::Dumper doesn’t install on the server. Something about xlocale.h missing. Installing libnewlib-dev (which provides /usr/include/newlib/xlocale.h) appears to have no effect:

Devel::MAT

Building Devel-MAT-Dumper
cc -I/home/alex/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1/x86_64-linux/CORE -DVERSION="0.42" -DXS_VERSION="0.42" -fPIC -c -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -o lib/Devel/MAT/Dumper.o lib/Devel/MAT/Dumper.c
In file included from lib/Devel/MAT/Dumper.xs:8:
/home/alex/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1/x86_64-linux/CORE/perl.h:738:13: fatal error: xlocale.h: No such file or directory
 #   include <xlocale.h>
             ^~~~~~~~~~~
compilation terminated.

OK. Before doing anything else, I’m going to use Perlbrew to upgrade Perl 5.26 to 5.32. Then we’ll see about installing Devel::MAT.
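
From memory, that’s something along these lines (assuming cpanm for the modules; every Perlbrew perl comes with its own module directory, so everything Phoebe needs has to be reinstalled, too):

$ perlbrew install perl-5.32.0
$ perlbrew switch perl-5.32.0
$ cpanm Devel::MAT::Dumper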

– Alex 2021-01-03 17:38 UTC

---

OK, that worked! I wrote a little extension that allows me to write a heap dump to disk on the server.

Download heap-dump.pl

Phoebe configuration

Devel::MAT::UserGuide

Install it in your conf.d directory, and set @known_fingerprints to the fingerprint of your favourite client certificate in your config file (or edit the heap-dump.pl file):

our @known_fingerprints = qw(
  sha256$54c0b95dd56aebac1432a3665107d3aec0d4e28fef905020ed6762db49e84ee1);

Install the Devel::MAT::Dumper package on your server, and Devel::MAT on your development machine. Restart Phoebe.

Finally, visit the path /do/heap-dump on your Phoebe instance. You’ll be prompted for your client certificate. Point your Gemini client at the one whose fingerprint you added to your config above, and now it should say: “Heap Dump Saved”. Copy the file it created to your local machine and examine it, following the user guide linked above.
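
The gist of the extension is something like the following sketch; the package name and the handler signature are assumptions on my part, so refer to the actual heap-dump.pl linked above:

# Sketch of a conf.d extension: dump the heap when a known client
# certificate requests /do/heap-dump. Package name and extension API
# are assumptions; see the real heap-dump.pl.
package App::Phoebe;
use Devel::MAT::Dumper;

our (@extensions, @known_fingerprints);

push(@extensions, \&heap_dump);

sub heap_dump {
  my ($stream, $url) = @_;
  return 0 unless $url =~ m!/do/heap-dump$!;
  # get_fingerprint returns "sha256$..." like the config above
  my $fingerprint = $stream->handle->get_fingerprint();
  if ($fingerprint and grep { $_ eq $fingerprint } @known_fingerprints) {
    Devel::MAT::Dumper::dump("phoebe.pmat");
    $stream->write("20 text/plain\r\n");
    $stream->write("Heap Dump Saved\n");
  } elsif ($fingerprint) {
    $stream->write("61 Your client certificate is not authorized\r\n");
  } else {
    $stream->write("60 You need a client certificate\r\n");
  }
  return 1;
}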

Here’s the output right after I restarted it:

$ pmat phoebe.pmat
Perl memory dumpfile from perl 5.32.0 non-threaded
Heap contains 217470 objects
pmat> largest
HASH(24948) at 0x558c89875548=strtab: 1.7 MiB
HASH(5672) at 0x558c89b3c450: 261.0 KiB
SCALAR(PV) at 0x558c89beaa18: 215.7 KiB
ARRAY(1,!REAL) at 0x558c89899098: 92.5 KiB
HASH(2231) at 0x558c8aac5240: 84.3 KiB
others: 15.2 MiB

I’m going to let it run over night and we’ll see how it does tomorrow.

– Alex 2021-01-03 21:34 UTC

---

OK, a few hours later...

$ pmat phoebe.pmat
Perl memory dumpfile from perl 5.32.0 non-threaded
Heap contains 313361 objects
pmat> largest
HASH(39010) at 0x558c89875548=strtab: 2.4 MiB
HASH(8940) at 0x558c89b8f2a8: 337.6 KiB
HASH(5672) at 0x558c89b3c450: 261.0 KiB
SCALAR(PV) at 0x558c98eaf6e8: 215.7 KiB
SCALAR(PV) at 0x558c89beaa18: 215.7 KiB
others: 21.0 MiB

That is not very encouraging. Apparently there is no single thing that grows without limit. 😢

pmat> count
  Kind        Count (blessed)        Bytes  (blessed)
  ARRAY       11108         7      1.2 MiB  520 bytes
  CODE        10387                1.3 MiB
  GLOB        14796        16      2.1 MiB    2.4 KiB
  HASH        23305      3182      7.4 MiB  849.6 KiB
  INVLIST       655              674.4 KiB
  IO             33        33      5.2 KiB    5.2 KiB
  PAD          7870                1.5 MiB
  REF         41199         1    965.6 KiB   24 bytes
  REGEXP       2503       253    528.0 KiB   53.4 KiB
  SCALAR     200878        12      8.0 MiB 1022 bytes
  STASH         627              801.6 KiB
  -------    ------ ---------    --------- ----------
  (total)    313361      3504     24.5 MiB  912.1 KiB

What I don’t really understand is that the numbers I’m seeing are all around 20 MiB, and yet Munin is telling me that the minimum memory is 97.31 MiB and the maximum is 266.54 MiB (also the current value). So where are the 170 MiB I’m missing?

Hm. Let’s take a look at the two scalar values we saw up in the list of “largest” SVs. Surprisingly, they are exactly the same size. Let’s examine them.

...
SCALAR(PV) at 0x558c98eaf6e8: 215.7 KiB
SCALAR(PV) at 0x558c89beaa18: 215.7 KiB
...

pmat> show 0x558c98eaf6e8
SCALAR(PV) at 0x558c98eaf6e8 with refcount 1
  size 215.7 KiB (220840 bytes)
  PV="// This Source Code Form is subject to the terms of the Mozilla "...
  PVLEN 220798

pmat> show 0x558c89beaa18
SCALAR(PV) at 0x558c89beaa18 with refcount 1
  size 215.7 KiB (220840 bytes)
  PV="// This Source Code Form is subject to the terms of the Mozilla "...
  PVLEN 220798

OK, that is surprising. I don’t recognize this text. Perhaps there are more like it? After all, the count shows 200878 scalars. What’s with the refcount? Who is keeping those references?

pmat> inrefs --all 0x558c98eaf6e8

pmat> inrefs --all 0x558c89beaa18
s  a constant  CODE(PP) at 0x558c89b8f080

OK, nobody is referring to the first one, which is even more surprising. Perhaps it’s about to be garbage collected? The second value gives us a hint:

pmat> show 0x558c89b8f080
CODE(PP) at 0x558c89b8f080 with refcount 1
  size 128 bytes
  named as &IO::Socket::SSL::PublicSuffix::_builtin_data
  no hekname
  stash=STASH(19) at 0x558c89bd0f98
  glob=GLOB(&*) at 0x558c89bea460
  location=/home/alex/perl5/perlbrew/perls/perl-5.32.0/lib/site_perl/5.32.0/IO/Socket/SSL/PublicSuffix.pm line 348
  scope=CODE() at 0x558c89c4f820
  pad[0]=PAD(1) at 0x558c89b8f188

The SSL stuff. 😭 This reminds me of the only test in gemini-diagnostics Phoebe is currently failing: “Server should send a close_notify alert before closing the connection.”

gemini-diagnostics

Perhaps that’s my problem. The SSL/TLS connections aren’t closed correctly, so they hang around.

I’ll have to think about this some more.

– Alex 2021-01-04 08:12 UTC

---

Ten hours later, Munin says 418.04 MiB, and the heap dump says:

pmat> largest
HASH(46066) at 0x558c89875548=strtab: 3.3 MiB
HASH(15996) at 0x558c89b8f2a8: 631.0 KiB
SCALAR(PV) at 0x558c8af5d8a0: 305.0 KiB
HASH(5672) at 0x558c89b3c450: 261.0 KiB
SCALAR(PV) at 0x558c98cc4d48: 232.0 KiB
others: 24.5 MiB

pmat> count
  Kind        Count (blessed)        Bytes  (blessed)
  ARRAY       12116         7      1.3 MiB  520 bytes
  CODE        11395                1.4 MiB
  GLOB        14796        16      2.1 MiB    2.4 KiB
  HASH        30361      3182      9.6 MiB  849.6 KiB
  INVLIST       655              674.4 KiB
  IO             33        33      5.2 KiB    5.2 KiB
  PAD          8878                1.8 MiB
  REF         56319         1      1.3 MiB   24 bytes
  REGEXP       2510       253    529.5 KiB   53.4 KiB
  SCALAR     228143        12      9.7 MiB 1022 bytes
  STASH         627              801.6 KiB
  -------    ------ ---------    --------- ----------
  (total)    365833      3504     29.2 MiB  912.1 KiB

– Alex 2021-01-04 17:40 UTC

---

Ok, let’s look at the ever-growing hash at 0x558c89b8f2a8:

pmat> values 0x558c89b8f2a8
  ...
  {ptr_0x558c8d3d5f40} REF() at 0x558c8d29c850 => HASH(1) at 0x558c8d2ab990
  {ptr_0x558c8d3eabc0} REF() at 0x558c8d2ec190 => HASH(1) at 0x558c8d29fc48
  {ptr_0x558c8d3eb470} REF() at 0x558c8d3bdc18 => HASH(1) at 0x558c8d3bdc30
  {ptr_0x558c8d3f6cd0} REF() at 0x558c8d2f3e00 => HASH(1) at 0x558c8d32a720
  ... (15946 more)

pmat [more]> values 0x558c8d32a720
  {"ssleay_verify_callback!!func"} REF() at 0x558c8d3a4758 => CODE(PP,closure) at 0x558c8d32aac8

pmat> values 0x558c8d3bdc30
  {"ssleay_verify_callback!!func"} REF() at 0x558c8d3bdca8 => CODE(PP,closure) at 0x558c8d3bd8e8

pmat> values 0x558c8d29fc48
  {"ssleay_verify_callback!!func"} REF() at 0x558c8d2ac260 => CODE(PP,closure) at 0x558c8d32aac8

pmat> values 0x558c8d2ab990
  {"ssleay_verify_callback!!func"} REF() at 0x558c8d29d078 => CODE(PP,closure) at 0x558c8d32aac8

Ugh, SSL again! Verify callback – that also rings a bell. Phoebe code:

IO::Socket::SSL::set_defaults(
  SSL_verify_mode => SSL_VERIFY_PEER,
  SSL_verify_callback => \&verify_fingerprint);

Hm. So now I’m thinking: a hash with 16,000 SSL sockets?

– Alex 2021-01-04 20:05 UTC

---

I’ve made a small change: as the stream closes, I undefine the $data reference. I figured that something in this event loop setup (“one for every socket?”) must be keeping references around.

Mojo::IOLoop->server({
  address => $address,
  port => $port,
  tls => 1,
  tls_cert => $server->{cert_file},
  tls_key  => $server->{key_file},
} => sub {
  my ($loop, $stream) = @_;
  my $data = { buffer => '', handler => \&handle_request };
  $stream->on(read => sub {
    my ($stream, $bytes) = @_;
    $log->debug("Received " . length($bytes) . " bytes");
    $data->{buffer} .= $bytes;
    $data->{handler}->($stream, $data) });
  $stream->on(close => sub {
    my $stream = shift;
    # explicitly drop the reference to the per-connection data
    undef($data);
  }) });

Twelve hours later and Munin reports a current RSS of 206 MiB. That’s much better than anticipated! I’ll keep it running for another day or so, see how it develops.

– Alex 2021-01-06 07:55 UTC

---

A few more hours and memory is down to 174 MiB, so memory actually went *down*. Amazing! 😄

pmat> largest
HASH(29739) at 0x55cf4fc7c548=strtab: 2.0 MiB
HASH(5672) at 0x55cf4ff43d60: 261.0 KiB
SCALAR(PV) at 0x55cf4fff2238: 215.7 KiB
HASH(4782) at 0x55cf4ff96aa8: 176.1 KiB
ARRAY(1,!REAL) at 0x55cf4fca01d8: 92.5 KiB
others: 17.2 MiB

pmat> count
  Kind        Count (blessed)        Bytes  (blessed)
  ARRAY       10198         7      1.2 MiB  520 bytes
  CODE         9603                1.2 MiB
  GLOB        14374        16      2.1 MiB    2.4 KiB
  HASH        11712      3181      5.0 MiB  884.5 KiB
  INVLIST       583              652.3 KiB
  IO             33        33      5.2 KiB    5.2 KiB
  PAD          7079                1.3 MiB
  REF         24869         1    582.9 KiB   24 bytes
  REGEXP       2303       253    485.8 KiB   53.4 KiB
  SCALAR     173493        12      6.8 MiB 1022 bytes
  STASH         610              776.3 KiB
  -------    ------ ---------    --------- ----------
  (total)    254857      3503     19.9 MiB  946.9 KiB

– Alex 2021-01-06 12:14 UTC

---

Too bad, memory is climbing again. That wasn’t the answer, or not the entire answer, sadly! Back at 325 MiB.

Phoebe rising

When I look at this, at all the complications, and the memory used, and compare it to the implementations based on Net::Server (“old style”) with its 23 MiB (used for the Gopher and Finger server on ports 70 and 79, as well as the SSL enabled Gopher on port 7334), I don’t have happy thoughts.

pmat> count --blessed
  Kind                                             Count (blessed)        Bytes  (blessed)
  ARRAY                                            11506         7      1.3 MiB  520 bytes
      Math::BigInt::Calc                               0         3      0 bytes  216 bytes
      Net::DNS::RR::OPT                                0         2      0 bytes  160 bytes
      Class::Struct::Tie_ISA                           0         1      0 bytes   64 bytes
      Net::DNS::RR                                     0         1      0 bytes   80 bytes
  CODE                                             10786                1.3 MiB
  GLOB                                             14792        16      2.1 MiB    2.4 KiB
      IO::Socket::IP                                   0        15      0 bytes    2.2 KiB
      IO::Socket::SSL                                  0         1      0 bytes  152 bytes
  HASH                                             26086      3182      8.2 MiB  884.6 KiB
      Specio::DeclaredAt                               0      1537      0 bytes  324.2 KiB
      Specio::Constraint::Simple                       0      1428      0 bytes  483.8 KiB
      DateTime::Format::Builder::Parser::Regex         0        67      0 bytes   15.9 KiB
      Specio::Constraint::Parameterizable              0        60      0 bytes   22.1 KiB
      Mojo::IOLoop::Server                             0        15      0 bytes    3.7 KiB
      Specio::Constraint::ObjectIsa                    0        15      0 bytes    5.5 KiB
      Specio::Constraint::Enum                         0        10      0 bytes    3.8 KiB
      Specio::Constraint::Union                        0         7      0 bytes    2.2 KiB
      Specio::Constraint::AnyCan                       0         6      0 bytes    2.2 KiB
      Specio::Constraint::ObjectCan                    0         5      0 bytes    1.8 KiB
      (others)                                         0        32      0 bytes   19.5 KiB
  INVLIST                                            651              662.6 KiB
  IO                                                  33        33      5.2 KiB    5.2 KiB
      IO::File                                         0        33      0 bytes    5.2 KiB
  PAD                                               8269                1.6 MiB
  REF                                              47156         1      1.1 MiB   24 bytes
      IO::Socket::SSL::SSL_HANDLE                      0         1      0 bytes   24 bytes
  REGEXP                                            2491       253    525.4 KiB   53.4 KiB
      Regexp                                           0       253      0 bytes   53.4 KiB
  SCALAR                                          211663        12      8.2 MiB 1022 bytes
      Encode::XS                                       0         5      0 bytes  360 bytes
      JSON::PP::Boolean                                0         4      0 bytes  288 bytes
      Cpanel::JSON::XS                                 0         2      0 bytes  292 bytes
      Math::BigInt                                     0         1      0 bytes   82 bytes
  STASH                                              627              803.2 KiB
  --------------------------------------------    ------ ---------    --------- ----------
  (total)                                         334060      3504     25.8 MiB  947.1 KiB

I don’t understand what this means… It looks like everything is OK?

Comparing the two heap dumps I have:

$ pmat-counts phoebe-2021-01-06T*
phoebe-2021-01-06T13:14.pmat
  Kind        Count (blessed)        Bytes  (blessed)
  ARRAY       10198         7      1.2 MiB  520 bytes
  CODE         9603                1.2 MiB
  GLOB        14374        16      2.1 MiB    2.4 KiB
  HASH        11712      3181      5.0 MiB  884.5 KiB
  INVLIST       583              652.3 KiB
  IO             33        33      5.2 KiB    5.2 KiB
  PAD          7079                1.3 MiB
  REF         24869         1    582.9 KiB   24 bytes
  REGEXP       2303       253    485.8 KiB   53.4 KiB
  SCALAR     173493        12      6.8 MiB 1022 bytes
  STASH         610              776.3 KiB
  -------    ------ ---------    --------- ----------
  (total)    254857      3503     19.9 MiB  946.9 KiB
phoebe-2021-01-06T22:35.pmat
  Kind        Count         (blessed)            Bytes              (blessed)
  ARRAY       11506(+1308)          7          1.3 MiB(+119.4 KiB)  520 bytes
  CODE        10786(+1183)                     1.3 MiB(+147.9 KiB)
  GLOB        14792(+418)          16          2.1 MiB(+62.0 KiB)     2.4 KiB
  HASH        26086(+14374)      3182(+1)      8.2 MiB(+3.2 MiB)    884.6 KiB(+168 bytes)
  INVLIST       651(+68)                     662.6 KiB(+10.3 KiB)
  IO             33                33          5.2 KiB                5.2 KiB
  PAD          8269(+1190)                     1.6 MiB(+330.7 KiB)
  REF         47156(+22287)         1          1.1 MiB(+522.4 KiB)   24 bytes
  REGEXP       2491(+188)         253        525.4 KiB(+39.7 KiB)    53.4 KiB
  SCALAR     211663(+38170)        12          8.2 MiB(+1.5 MiB)   1022 bytes
  STASH         627(+17)                     803.2 KiB(+26.8 KiB)
  -------    -------------- -------------    --------------------- ----------------------
  (total)    334060(+79203)      3504(+1)     25.8 MiB(+5.9 MiB)    947.1 KiB(+168 bytes)

The difference seems minimal. And yet, ps -eo rss reports 325 MiB instead of 174 MiB used.

– Alex 2021-01-06 21:30 UTC

---

Well… What do we have? I have three heap dump files for the current process:

23M 2021-01-06 13:14 phoebe-2021-01-06T13:14.pmat
29M 2021-01-06 22:35 phoebe-2021-01-06T22:35.pmat
42M 2021-01-07 21:03 phoebe-2021-01-07T21:03.pmat

RSS reported by ps is 174 MiB, 325 MiB, and 822 MiB. It’s not looking good. What amazes me, however, is that the memory usage as reported by Munin using “/proc/meminfo” doesn’t seem to reflect that. Or maybe it does?

Anyway, I’m experimenting with the pmat tools. The output of “pmat-leakreport” is extremely long. It’s pages and pages of this:

$ pmat-leakreport phoebe-2021-01-0[67]*
…
LEAK[2] HASH(1) at 0x55cf5e4d3e00
LEAK[2] REF() at 0x55cf5e472f08
LEAK[2] SCALAR(PV) at 0x55cf5e2c7318
LEAK[2] SCALAR(UV) at 0x55cf5e4b8b78
LEAK[2] HASH(1) at 0x55cf5ab96ce0
LEAK[2] HASH(1) at 0x55cf5b69f8b8
LEAK[2] HASH(1) at 0x55cf604236c8
LEAK[2] UNDEF() at 0x55cf62d869c0
LEAK[2] GLOB($*) at 0x55cf5e039088
LEAK[2] HASH(1) at 0x55cf60a59478
LEAK[2] REF() at 0x55cf62887cf0
LEAK[2] SCALAR(UV) at 0x55cf5e49a430
LEAK[2] REF() at 0x55cf63108b28
LEAK[2] UNDEF() at 0x55cf5d57cfb8
LEAK[2] SCALAR(UV) at 0x55cf5e401578
LEAK[2] UNDEF() at 0x55cf6086c7a0
LEAK[2] REF() at 0x55cf5ecf80c0
LEAK[2] REF() at 0x55cf5e369e00
LEAK[2] UNDEF() at 0x55cf5fa091e8
LEAK[2] HASH(1) at 0x55cf5bc07e48
LEAK[2] UNDEF() at 0x55cf62a23f60
LEAK[2] UNDEF() at 0x55cf5c78a818
LEAK[2] SCALAR(UV) at 0x55cf5e48d5c0
LEAK[2] REF() at 0x55cf6137ebf8

I don’t know what to make of this. The documentation says:

A leaking SV is one that appears and then never gets touched again. To detect them, we need to look for SVs that appear between two files, and then don’t disappear again. Any that do disappear were simply temporaries and can be ignored.

So, all of these are new appearances, and they don’t disappear again. 😟

Let’s look at them, starting at the end:

pmat> identify 0x55cf6137ebf8
REF() at 0x55cf6137ebf8 is:
└─value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf6137ec58, which is:
  └─(via RV) value {ptr_0x55cf614d66c0} of HASH(35071) at 0x55cf4ff96aa8, which is:
    └─not found

pmat> identify 0x55cf5e48d5c0
SCALAR(UV) at 0x55cf5e48d5c0 is:
└─value {"\\0"} of HASH(1) at 0x55cf5e48d5a8, which is:
  └─(via RV) value {rakkestad} of HASH(725) at 0x55cf5e44f090, which is:
    └─(via RV) value {no} of HASH(1526) at 0x55cf4ffdf998, which is:
      └─(via RV) value {tree} of HASH(2)=IO::Socket::SSL::PublicSuffix at 0x55cf5e546aa0, which is:
        └─(via RV) value {"1"} of HASH(1) at 0x55cf4ffd88a8, which is:
          └─the lexical %default at depth 1 of CODE(PP) at 0x55cf4ffd8440, which is:
            └─the symbol '&IO::Socket::SSL::PublicSuffix::default'

pmat> identify 0x55cf5c78a818
UNDEF() at 0x55cf5c78a818 is:
└─pad temporary 29 at depth 1 of CODE(PP,closure) at 0x55cf5c719058, which is:
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c681250, which is:
  │ └─(via RV) value {ptr_0x55cf5c8999e0} of HASH(35071) at 0x55cf4ff96aa8, which is (*A):
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c718b00, which is:
  │ └─(via RV) value {ptr_0x55cf5c891a30} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c86dea8, which is:
  │ └─(via RV) value {ptr_0x55cf5c88cff0} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c86def0, which is:
  │ └─(via RV) value {ptr_0x55cf5c89a180} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c7fc768, which is:
  │ └─(via RV) value {ptr_0x55cf5c871bd0} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c78a7d0, which is:
  │ └─(via RV) value {ptr_0x55cf5c8991c0} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  └─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf5c7fc7f8, which is:
    └─(via RV) value {ptr_0x55cf5c8995d0} of HASH(35071) at 0x55cf4ff96aa8, which is:
      └─not found

pmat> identify 0x55cf62a23f60
UNDEF() at 0x55cf62a23f60 is:
└─the lexical $ctx_store at depth 1 of CODE(PP,closure) at 0x55cf62a24770, which is:
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62abd3d0, which is:
  │ └─(via RV) value {ptr_0x55cf62ba4960} of HASH(35071) at 0x55cf4ff96aa8, which is (*A):
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62991a00, which is:
  │ └─(via RV) value {ptr_0x55cf62b68370} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62667430, which is:
  │ └─(via RV) value {ptr_0x55cf62b628d0} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62667f70, which is:
  │ └─(via RV) value {ptr_0x55cf62ba5100} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62a24b48, which is:
  │ └─(via RV) value {ptr_0x55cf62b72e10} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  ├─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62a24398, which is:
  │ └─(via RV) value {ptr_0x55cf62ba4550} of HASH(35071) at 0x55cf4ff96aa8, which is:
  │   └─not found
  └─(via RV) value {"ssleay_verify_callback!!func"} of HASH(1) at 0x55cf62abd160, which is:
    └─(via RV) value {ptr_0x55cf62ba4140} of HASH(35071) at 0x55cf4ff96aa8, which is:
      └─not found

pmat> values 0x55cf5bc07e48
  {"ssleay_verify_callback!!func"} REF() at 0x55cf5bb9e678 => CODE(PP,closure) at 0x55cf5bd3d8d8

OK, I’d say it’s pretty clear by now: the “ssleay_verify_callback” stuff is bonkers!

Verify callback – remember the Phoebe code? Also note how “%default” is mentioned above.

IO::Socket::SSL::set_defaults(
  SSL_verify_mode => SSL_VERIFY_PEER,
  SSL_verify_callback => \&verify_fingerprint);

But I just read the IO::Socket::SSL man page. It says: “If you have a server and it looks like you have a memory leak you might check the size of your session cache. Default for Net::SSLeay seems to be 20480, see the example for SSL_create_ctx_callback for how to limit it.”
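
The man page example boils down to something like this (adapted; check the IO::Socket::SSL documentation for SSL_create_ctx_callback for the authoritative version):

IO::Socket::SSL::set_defaults(
  SSL_create_ctx_callback => sub {
    my $ctx = shift;
    # keep the server-side session cache, but limit its size
    Net::SSLeay::CTX_set_session_cache_mode($ctx,
      Net::SSLeay::SESS_CACHE_SERVER());
    Net::SSLeay::CTX_sess_set_cache_size($ctx, 128);
  });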

OK. I’m going to try that instead. The previous code had no effect. I’m limiting the session cache to 128, like in the example provided. Restarted the server. Memory is down below 100 MiB again:

# pgrep -f -l "phoebe"
29000 phoebe
# ps -q 29000 -eo rss,size
  RSS  SIZE
73392 62360

– Alex

---

I’m surprised. This seems to have helped!


Ignoring the spike of Emacs Wiki and just looking at Phoebe: we seem to be stable at 250 MiB.

Cool!

Hm… Then again, at the same time, I changed the “phoebe.service” file for systemd:

MemoryMax=250M
MemoryHigh=200M

That is… uncannily close! So here’s what I’m doing now: I decreased the SSL session cache from 128 to 64, and I didn’t change the “phoebe.service” file. Let’s see if memory use is halved, i.e. the cache is relevant, or not, i.e. the memory limits are relevant.

– Alex 2021-01-09 15:00 UTC

---

OK, reducing the cache size didn’t help. We’re back at 250 MiB. This *is* a memory leak of sorts, but the resource limit I have in the service file is working.

I think I’m going to reduce those limits:

MemoryMax=100M
MemoryHigh=90M

Let’s see whether it works well enough!

– Alex 2021-01-10 09:54 UTC

---

It works well enough. I added this to the documentation and stopped trying to understand where Mojo::IOLoop::TLS, IO::Socket::SSL, Net::SSLeay, or OpenSSL are failing.

– Alex 2021-01-11 11:26 UTC