If you use Kubernetes for any length of time, you will come to realise that when everything is working, it's brilliant - and when it's not, it's entirely impenetrable. Figuring out what's going on means digging through dozens of different services, all in their own containers, and trying to work out where the interactions are failing...
Without a good centralised logging service, you are basically doing this blind. I would say the single most critical thing you can do immediately after you install your Kubernetes cluster, is wire it up to a logging service, so that you are capturing somewhere the output of every one of those containers. Do it, do it now!
Well, I'm glad you asked. In the real world, where I get paid for this stuff, we have used a variety of systems over the years, but the main one we've settled on is the now-venerable ELK (ElasticSearch, Logstash, Kibana) stack[1]. It works well, and offers a great deal, but I have to say personally I feel it adds rather a lot of overhead/complexity for a simple solution. We've also found that Grafana Loki[2] is a good solution, which combined with Prometheus and Grafana itself makes it easy to bring your logs and application/system metrics together into a single dashboard. Dare I say it, I think I actually prefer it to ELK.
Neither of those solutions is really appropriate for collecting logs from my simple Kubernetes cluster, though. The simplest way to deploy either of them is with a Helm chart onto Kubernetes itself, but there is something unpleasantly recursive about storing all the logs that will help me debug the cluster on an application hosted on the cluster itself. More pragmatically, given the somewhat underpowered[3] nature of my cluster, probably 80% of the cluster's processing capacity would be spent on the logging service.
No, I needed something simpler...
What I needed for this is a cloud service provider, that can do a simple job - give me an endpoint I can dump all my logs to, and then a nice web UI that lets me view them live, as well as searching through and filtering them. Being able to graph the rate of messages, as well as setting up automatic alerts, are nice to haves.
This is far from a unique requirement. But it's also far from cheap - there are many services out there that can do the job, often combined with APM (Application Performance Monitoring) and system monitoring services; but what most of these services have in common is being prohibitively expensive for someone who is "just playing" or using them for hobby purposes.
*An aside:* This is something that actually really annoys me about many
Cloud service providers. If you hide your prices away behind a "call our
sales team" number, or if your cheapest tier is priced in tens-of-dollars-per-host,
then I'm just not going to look at your service. And that's a shame for you,
because when it comes down to renewing the half-a-million-euro contract for
APM services I have in my day job, I'm not going to know how cool your
product is from my experience at home.
The really good Cloud providers, like Google, Cloudflare or even AWS, let
techies sign-up, and actually use their services for real (if small scale)
projects on a pay-as-you-go basis without getting on the hook for thousands
of dollars of minimum fees. They know that by doing this, they'll earn
tens or hundreds of thousands more by building awareness and expertise of
their services among the people who used them in a hobby project, but then
went on to use them professionally.
Anyway, ranting aside, after look through several such services, I did eventually settle on one that does the job, is simple to set up, and is also reasonably priced.
Solarwinds appears to be one of those companies that has grown by acquiring all the other companies that looked like they were cool at some point, and so have multiple, overlapping products that share the same brand name but are otherwise entirely different. Accordingly, they are able within a single company to represent both the worst and the best of cloud service provision, for basically the same product.
At the "worst" end of the scale you have Loggly. Seriously, don't even sign up for this - it's a nice product, and it works well enough, but the instant you register for a free trial you can expect to be spammed with increasingly passive-aggressive emails from sales people. I mean seriously, within three days of registering on their site, I had received three emails, and we'd already got to this stage of neediness:
Hi Tim,
As I haven't heard back, I'm still trying to figure out how to help you. Are you looking at Loggly for your production environment? Current development purposes? One-off project?
[...]
It's been three days since I first stumbled on your website and we're already at the "as I haven't heard back" stage? Damn, I'm glad we didn't meet on Tinder, Loggly.
But, I don't want to talk about my abusive relationship with Loggly. Because Solarwinds also have a product at the other, good, end of the spectrum: Papertrail[4]. It does one thing - consolidating and exposing logs - well, no salespeople will call, and it's reasonably priced. For $7/month, paid monthly, you can upload 1GB worth of logs, which will be searchable for a week and retained for download for a year. And if you need more than that (it's plenty for my little cluster), you can set up your own plan entirely at your own control from the dashboard. And, in a nice touch, if you have some system that's stupidly chatty and you really don't need to see everything it is saying - you can set up filters to remove those loglines *before* they impact your plan limits.
So, the short version of this story is - use Papertrail. It does the job, it's cost-effective, and it's easy to integrate with Kubernetes.
Aha, so this is the interesting bit. So, you've chosen your logging provider, now how do you integrate it with Kubernetes?
Well, the first thing you need is some kind of log-aggregator that runs on your Kubernetes nodes. It needs to collate all the logs from all the running pods, and then send them someplace (in my case, to Papertrail.)
There are many options for this component, but the simplest I've found is Logspout[5]. As the `readme.md` says:
Logspout is a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.
It's a mostly stateless log appliance. It's not meant for managing log files or looking at history. It is just a means to get your logs out to live somewhere else, where they belong.
Because Logspout is 'docker native', all we need to do is spin it up on each of our Kubernetes nodes, and the rest pretty much happens by magic.
How do we spin some application up so that it runs on every node of our Kubernetes system? With a `DaemonSet` deployment, of course! This is what the deployment YAML looks like, to run Logspout up, and then send all my logs to my Papertrail account:
apiVersion: apps/v1 kind: DaemonSet metadata: labels: name: logspout name: logspout namespace: default spec: selector: matchLabels: name: logspout template: metadata: labels: name: logspout spec: containers: - env: - name: SYSLOG_TAG value: '{{ index .Container.Config.Labels "io.kubernetes.pod.namespace" }}[{{ index .Container.Config.Labels "io.kubernetes.pod.name" }}]' - name: SYSLOG_HOSTNAME value: '{{ index .Container.Config.Labels "io.kubernetes.container.name" }}' image: mdsol/logspout:latest imagePullPolicy: IfNotPresent args: - "syslog+udp://mypapertrailendpoint" name: logspout resources: limits: memory: 128Mi requests: cpu: 150m volumeMounts: - mountPath: /var/run/docker.sock name: log restartPolicy: Always volumes: - hostPath: path: /var/run/docker.sock type: "" name: log
The only thing you need to replace there to make this work with your own system is to change `mypapertrailendpoint` with the hostname and port number that you'll find in your Papertrail account dashboard (*Settings* > *Log Destinations*).
I deploy this to my Kubernetes environment with `kubectl apply -f`, and as if by magic, all my logs appear in one simple online dashboard: {{< figure src="img/2020-05-22-papertrail.png" caption="Papertrail" captionPosition="right">}}
And that, pretty much, is that.
I am using a `syslog+udp` endpoint here. I'm doing that because it's incredibly simple (and I've said before, I like simple.) It's also bandwidth efficient; there are two factors you need to consider though:
┌───────────────────┬──────────────────────────────────────────────────────────┐ │ Issue │ Explanation │ ╞═══════════════════╪══════════════════════════════════════════════════════════╡ │ │ UDP messages are basically sent fire-and-forget; │ │ │ hopefully it's received, but if not, it's just lost. │ │ UDP is unreliable │ For my purposes, this is fine - and I appreciate the │ │ │ lack of overhead associated with creating a TCP │ │ │ connection for reliable delivery. │ ├───────────────────┼──────────────────────────────────────────────────────────┤ │ │ There is no encryption happening here - your data is │ │ │ sent plain-text across the Internet. Anyone sniffing │ │ │ your traffic will be able to see it. For my purposes, │ │ UDP is insecure │ this is not an issue - but if I ever start logging │ │ │ things that could contain sensitive data (personal │ │ │ details of, well, anyone - letalone things like credit │ │ │ card numbers), UDP is no longer acceptable. │ └───────────────────┴──────────────────────────────────────────────────────────┘
The solution in both cases is to switch to an encrypted, TCP endpoint. Papertrail supports TLS-encrypted TCP connections, so this is easy enough to do. Why haven't I? UDP just doesn't get enough love, I guess...
1: https://www.elastic.co/what-is/elk-stack
2: https://grafana.com/oss/loki/
3: /posts/2020-05-11-snowgoonsplatform.gmi
4: https://www.solarwinds.com/papertrail
5: https://github.com/gliderlabs/logspout
--------------------