💾 Archived View for jacksonchen666.com › posts › 2023-07-16 › 20-09-39 › index.gmi captured on 2023-09-08 at 16:00:25. Gemini links have been rewritten to link to archived content

Figuring Out Why Alertmanager Isn't Sending Email Notifications

2023-07-16T20:09:39Z

This is a bit of scraping the bottom of the barrel.

So here's the thing: I have a Prometheus and Alertmanager setup, inspired by what sourcehut does. Recently I set up a new thing: something that sends Alertmanager notifications through ntfy.

ntfy

I think that setup is better, because it doesn't eat into my daily email quota at Migadu (when you send emails to yourself, only the receive limit applies, which is a soft limit of 200 emails per day).

However, I do still have email notifications set up for very urgent alerts, such as when the server dies (goes offline, loses power, whatever).

For those, I have not received any emails...

Here's the alert rule I used for testing:

      - alert: "Test alarm 1"
        expr: '$something_that_is_the_same_value_as_1 == 1'
        labels:
          severity: "urgent"
          test: "true"
        annotations:
          summary: "Test alarm 1"

It's a snippet, figure out the rest yourself if you're interested in using it for testing.

After that was set up, I restarted Alertmanager with the extra argument `--log.level=debug` to turn on debug logging.
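For context, here is a minimal sketch of how a rule like that could sit inside a complete Prometheus rule file. The group name and the `vector(1) == 1` expression are assumptions standing in for the parts left out above (`vector(1)` is a PromQL function that always returns the value 1, so the alert fires all the time); they are not taken from the real config.

```yaml
# Sketch of a full rule file; load it through rule_files in prometheus.yml.
groups:
  - name: "test"                    # hypothetical group name
    rules:
      - alert: "Test alarm 1"
        # Placeholder expression that is always true, so the alert always fires.
        expr: 'vector(1) == 1'
        labels:
          severity: "urgent"        # the label the Alertmanager routes match on
          test: "true"
        annotations:
          summary: "Test alarm 1"
```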

I also checked the mail queue with `postqueue -p` to make sure, and all I got was:

Mail queue is empty

Next, I tried changing the receiver of the alerts to only email.

Not much. I lowered the waiting times to 1 second so I can quickly get notifications instead of having to wait.
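A minimal sketch of what lowering the waiting times can look like, assuming they are the route's `group_wait` and `group_interval` settings. The values and the `repeat_interval` line are illustrative rather than copied from the real config, and such short intervals probably only make sense while testing.

```yaml
route:
  receiver: email_and_ntfy   # receiver name as it appears in the logs
  group_wait: 1s             # wait before sending the first notification for a new group
  group_interval: 1s         # wait before notifying about new alerts added to a group
  repeat_interval: 1m        # assumed value; how often a still-firing alert is re-sent
```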

More testing, still nothing.

So I tried setting the default receiver and:

ts=2023-07-16T19:43:26.206Z caller=notify.go:735 level=debug component=dispatcher receiver=email_and_ntfy integration=webhook[0] msg="Notify success" attempts=1
ts=2023-07-16T19:43:26.382Z caller=notify.go:735 level=debug component=dispatcher receiver=email_and_ntfy integration=email[0] msg="Notify success" attempts=1

So, it worked, I did receive the email, but it's the default receiver. So what's wrong with my routes?

Well, now to test with a lower "severity" of "interesting", which should only trigger a notification to ntfy and not email.

~~I added a new rule with a lower severity, and I received a notification from email *and* ntfy, which is not correct.~~

~~I realize I still have my urgent severity alarm going, so I removed that and renamed the alert rule again.~~

I realize now that I actually have 2 urgent alarms, and I forgot to turn down the severity on one of them (which is what I needed to do).

I get tired of waiting, so I restarted Prometheus and Alertmanager.

I then receive an alert with a "severity" of "interesting" through email, meaning something isn't working with the routes.

So I look at the config page at

https://prometheus.io/docs/alerting/latest/configuration/

Going through the route section and then reaching the matcher section, I realize something: My config does not match the examples!

Alertmanager route configuration section

Alertmanager matcher configuration section

I had something like this:

matchers:
  - "{severity='important'}"

But the example was something like this:

matchers:
  - alertname = Watchdog
  - severity =~ "warning|critical"

So I turn my config into the following:

matchers:
  - "severity = interesting"

And finally, the notification was only sent through email, as configured.
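To show where such a matcher lives, here is a sketch of a routing tree using the documented matcher form, where the value is either unquoted or double-quoted. The receiver names are placeholders, not the real ones, and the `receivers:` section that would define them is left out.

```yaml
route:
  receiver: default_receiver        # placeholder; used when no child route matches
  routes:
    - receiver: matched_receiver    # placeholder; used when the matcher below matches
      matchers:
        - severity = interesting    # unquoted value
    - receiver: default_receiver
      matchers:
        - severity = "urgent"       # double-quoted value
```

A single test alert per severity level should be enough to confirm that each one ends up at the intended receiver.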

So my config has been wrong this whole time. But how has it flown under my radar?

Well, possibly due to not reading the documentation enough, I just assumed things and continued to use only one receiver: Email. I hadn't yet set up Alertmanager to send to ntfy, so notifications could never be sent to the wrong place. There was no other place to send them to.

Finally, I revert my testing configuration so that things are back to normal instead of weird. That includes deleting the test alarms and so on.

Also, this was a blog post written on the fly. Events and notes are in chronological order, except for the parts with a strike through, because I was wrong there and only realized that *after*.

public inbox (comments and discussions)

public inbox archives

(mailing list etiquette for public inbox)