Our WSJT plugin maintains a one-hour buffer and reports statistics on its contents when asked by a Datalog. We see the buffer inexplicably empty about once a day.
See FT8 Decodes Datalog
See Heap Datalog
Errors don't happen every day. Recent errors cluster around 23:00 UTC on the days they occur. We dig into the server logs when this happens.
22: JavaScript heap out of memory
23: JavaScript heap out of memory
24: JavaScript heap out of memory
25: Cannot read property 'replace' of undefined
26: server unexpectedly exited, 27972 (SIGKILL)
We notice that the Datalog's timer-based requests time out before or as the server runs out of memory. github
Unexpected restarts cease.
<img width=100% src=http://ward.asia.wiki.org/assets/pages/unexpected-server-restarts/Screen%20Shot%202019-08-29%20at%206.00.34%20PM.png>
We correlate the improvement with the suspension of decode datagram reporting from a second radio.
<img width=100% src=http://ward.asia.wiki.org/assets/pages/unexpected-server-restarts/Screen%20Shot%202019-08-29%20at%206.01.28%20PM.png>
Second source stops 9:54 pm, Monday, August 26, 2019.
Notice that restarts coincide with midday peak decodes per slot (blue line) of over 20. With one radio reporting (gray line) this peaks at around 10.
We're recording pings on the same server. We look to see if these are delayed at all when datagram traffic gets high midday. This doesn't seem to be the case.
<img width=100% src=http://ward.asia.wiki.org/assets/pages/unexpected-server-restarts/Screen%20Shot%202019-08-29%20at%206.46.32%20PM.png>
DigitalOcean 2 GB droplet memory usage shows daily restarts at peak consumption followed by days of stability under lighter traffic load.
<img width=100% src=http://ward.asia.wiki.org/assets/pages/unexpected-server-restarts/Screen%20Shot%202019-08-29%20at%206.56.24%20PM.png>
All other charts are nominal.
It's possible our datagram decoding logic is wasteful of memory. For each binary message we construct an object that contains five multi-byte decoding functions closed over the payload position counter. github
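A minimal sketch of one lower-allocation alternative, not the plugin's actual code: the decoding functions move onto a class prototype, so each message allocates one small object instead of five closures. The name Reader, its method names, and the demo payload layout (big-endian integers, length-prefixed UTF-8 strings) are assumptions made for illustration.

```javascript
// Hypothetical cursor over one datagram payload. The five decoding methods
// live on the prototype and are shared by every message, so constructing a
// Reader allocates a single object holding the buffer and position counter.
class Reader {
  constructor(buf) {
    this.buf = buf; // Node.js Buffer holding one datagram
    this.pos = 0;   // payload position counter
  }
  uint8() {
    const v = this.buf.readUInt8(this.pos);
    this.pos += 1;
    return v;
  }
  uint32() {
    const v = this.buf.readUInt32BE(this.pos);
    this.pos += 4;
    return v;
  }
  int32() {
    const v = this.buf.readInt32BE(this.pos);
    this.pos += 4;
    return v;
  }
  float64() {
    const v = this.buf.readDoubleBE(this.pos);
    this.pos += 8;
    return v;
  }
  utf8() {
    // Assumed layout: a 32-bit byte length followed by that many UTF-8 bytes.
    const len = this.uint32();
    const s = this.buf.toString('utf8', this.pos, this.pos + len);
    this.pos += len;
    return s;
  }
}

// Demo payload (made up): a 32-bit type code, then the string "FT8".
const demo = Buffer.alloc(4 + 4 + 3);
demo.writeUInt32BE(2, 0);
demo.writeUInt32BE(3, 4);
demo.write('FT8', 8, 'utf8');

const reader = new Reader(demo);
const type = reader.uint32();
const mode = reader.utf8();
```

Whether this would measurably reduce heap pressure here is untested; short-lived closures are usually collected cheaply, so a heap snapshot would be the way to confirm.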
See Big Trouble Pages that trigger errors
We've seen the log grow well beyond the 240 slots expected when winnow() is called in a timely way. This appears to be a bug in Node.js that was fixed in 10.9.0. github
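The 240-slot figure follows from FT8's 15-second slots: 3600 / 15 = 240 slots per hour. As a hedged sketch, assuming the log is an array of slot records ordered by time, trimming could run on every insert so the bound does not depend on a timer firing on schedule. The record shape and the helper record() are hypothetical; only the name winnow() comes from the plugin, and this body is not its actual implementation.

```javascript
const SLOT_MS = 15 * 1000; // FT8 transmit/receive slot length
const MAX_SLOTS = 240;     // one hour of slots: 3600 s / 15 s

// Drop slots older than one hour, then enforce a hard cap on length.
// Assumes log entries look like { time: <ms since epoch>, decodes: [...] }
// and are ordered oldest first.
function winnow(log, now) {
  const cutoff = now - MAX_SLOTS * SLOT_MS;
  while (log.length > 0 && log[0].time <= cutoff) {
    log.shift();
  }
  if (log.length > MAX_SLOTS) {
    log.splice(0, log.length - MAX_SLOTS);
  }
  return log;
}

// Hypothetical helper: append a slot and trim immediately, so the log stays
// bounded even if a periodic timer is late or never fires.
function record(log, slot) {
  log.push(slot);
  return winnow(log, slot.time);
}

// Demo: 300 consecutive slots arrive; only the most recent hour survives.
const log = [];
for (let i = 0; i < 300; i++) {
  record(log, { time: i * SLOT_MS, decodes: [] });
}
```

The hard cap is a second line of defense: it bounds memory even when timestamps are wrong or many records share one time.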