A few days ago I built a new website, though calling it such might be a touch too generous. It's called *500 as a Service* or *500aaS* for short. You can visit it at 500asaservice.com. Don't sue me if you find it disappointing. It's meant to be a failure, after all.

I figured that, with the wealth of things available to consume as a service nowadays, it was only appropriate to altruistically offer a piece of failure on demand, free of charge, to my fellow humans. I know, I may well have peaked with this idea right here.

Take it as you will, but I probably got more out of building this website than you, dear reader, will after contemplating it in befuddlement. I didn't set out to build a lazy failure, mind you. I wanted to build a massively scalable one. And this poses a somewhat more interesting challenge. How did I do it?

The ingredients of a scalable failure

The vision for 500aaS is as follows:

Provide a planet-scale, elastic, resilient, secure and low-latency on-demand HTTP 500 service.

Planet-scale, elastic... that pair of buzzword-bingo entries hints at a cloud service... AWS maybe? Correct! (It was the *elastic* part that gave it away, wasn't it?) In times past, the simplest way to deploy a site like this would've been to get hold of a server box somewhere, set up Apache or Nginx on it and configure the web server to always return 500 errors, regardless of the URL it receives. Open this server up to the public Internet via a static IP address and voilà: you've got yourself a homemade failure.

# Minimal nginx config that will get you 500 responses forever
# /etc/nginx/nginx.conf
events {}

http {
    server {
        location / {
            return 500;
        }
    }
}

But there is a big caveat here. This is just one physical server we're talking about. What if 500aaS took off big time and people started swarming onto my site, anxiously seeking their daily fix of foobar? The server could become overwhelmed, unable to even muster the processing power to serve a *faux* HTTP 500, and start returning real ones, if it responds at all. You could argue this is technically still OK, as the whole point of 500aaS is to fail, but I'm a bit of a purist, so I couldn't accept that possibility. The question remains then: how do I deploy a service like this so that it can serve endless botched responses in a controlled manner, to anyone, under any circumstances? By taking it to the cloud, of course!

The easiest, most scalable way to host a site on AWS is to build it on top of their serverless stack: Lambda, DynamoDB... My application doesn't need any state to remember that it should always serve a 500 response, so all I need is a simple Lambda function to run it. One like this maybe:

// Always answers with a hand-crafted 500 response; no dependencies needed.
exports.handler = async () => {
    const htmlBody = `
<!doctype html>
<html>
    <head>
        <title>500 Internal Server Error</title>
    </head>
    <body>
        <h1>Internal Server Error</h1>
        <p>There was an error processing your request.</p>
    </body>
</html>
    `;
    const response = {
        status: '500',
        statusDescription: 'Internal Server Error',
        headers: {
            vary: [{
                key: 'Vary',
                value: '*',
            }],
            'last-modified': [{
                key: 'Last-Modified',
                value: '2017-01-13',
            }],
            'content-type': [{
               key: 'Content-Type',
               value: 'text/html',
            }],
        },
        body: htmlBody,
    };

    return response;
};

I want to return an error with the minimum amount of complexity and effort possible. It turns out it's actually pretty hard to make a Lambda function truly crash, so I manually craft the 500 response instead. Is this cheating? Maybe, but it's not like a user of this service would care. They just want to see a 500 error page, for God's sake!
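As a rough sanity check, a handler like the one above can be invoked locally with plain Node before it goes anywhere near AWS. The sketch below uses a trimmed-down stand-in for the handler; `alwaysFail` and `checkResponse` are names made up for this example, and the check only looks at the shape of the response.

```javascript
// Trimmed-down stand-in for the handler above (hypothetical name).
const alwaysFail = async () => ({
    status: '500',
    statusDescription: 'Internal Server Error',
    headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/html' }],
    },
    body: '<h1>Internal Server Error</h1>',
});

// Returns a list of problems found in a response.
// An empty list means the response failed properly.
function checkResponse(response) {
    const problems = [];
    if (response.status !== '500') {
        problems.push(`expected status '500', got '${response.status}'`);
    }
    const contentType = response.headers['content-type'];
    if (!contentType || contentType[0].value !== 'text/html') {
        problems.push('missing text/html content-type header');
    }
    if (typeof response.body !== 'string' || response.body.length === 0) {
        problems.push('empty body');
    }
    return problems;
}

// Invoke the handler the way a quick local smoke test might.
alwaysFail().then((response) => {
    const problems = checkResponse(response);
    console.log(problems.length === 0 ? 'failed successfully' : problems.join('\n'));
});
```

Swapping `alwaysFail` for the real exported handler should be enough to reuse the same check.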

Despite the simplicity of its implementation, this approach would still require me to set up an API Gateway as a frontend, which is good but not very cheap in the long run, and not entirely hassle-free. There is another option: serve the content as close as possible to the location it was requested from, and generate the response directly where it's served. Does this sound like I'm talking about a CDN? Because that's exactly what I'm talking about.

If you've never come across this approach before, several cloud vendors and CDN providers let you ship your code directly to the servers at their edge points of presence, which remarkably shortens the journey between client and server. Instead of acting as the middleman that caches content served from the actual web servers, the CDN now becomes **the** server. Wait a minute, aren't CDNs just dumb caches serving static files all over the planet? Well, not anymore! You can now run arbitrary code in them too, which lets them modify the payloads passing through them on the fly, as well as generate new content dynamically!

The first major vendor I know of that started offering this was Cloudflare, with [Cloudflare Workers](https://workers.cloudflare.com/). Workers have evolved a fair bit as a technology since they were first launched a few years ago. You can deploy pretty useful applications straight to their CDN using JavaScript or WASM, which unlocks Rust and even COBOL! The technology that enables this is pretty interesting but beyond the scope of this article. Anyway, getting started with Cloudflare Workers is fairly easy nowadays. Here's a sample Worker JS script I put together:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});
/**
 * Respond with HTTP 500
 * @param {Request} request
 */
async function handleRequest(request) {
  const htmlBody = `
<html>
    <head>
        <title>500 - Internal Server Error</title>
    </head>
    <body>
        <h1>500</h1>
        <h2>Internal Server Error</h2>
    </body>
</html>
`;
  return new Response(htmlBody, {
    status: 500,
    headers: { 'content-type': 'text/html' },
  });
}

Deploying it was easy too. The problem came shortly after, when I tried to add my new Worker URL to my AWS Route53 DNS records so that I could use the 500asaservice.com domain for it. The bad news is that Cloudflare won't allow you to do this unless you pay them a shedload of money. So that was the end of my adventure with Cloudflare Workers. At this point I decided to return to AWS to see what they could do for me. And sure enough, they have something pretty similar to Cloudflare Workers. It's called Lambda@Edge and it allows you to run Lambda functions within CloudFront itself.

With barely any changes to my original Lambda code, I set up a new CloudFront distribution. The origin for the distribution is inconsequential since every single response will be generated within the Lambda, so I just gave it a made-up one. Then, all I had to do was set up a CloudFront `viewer-request` event as the trigger for my Lambda and deploy the distribution. Once I got everything working, I encoded the configuration in a `serverless.yml` so it was easier to change and deploy. And that was pretty much it. I now have a Lambda function which runs atop Amazon's ubiquitous and nearly infallible CDN. It costs me almost nothing to run (provided it doesn't start serving huge amounts of traffic) and requires no maintenance at all. I'm so confident of the performance and uptime (downtime?) of my application that I even published an SLA for it.

There are still a couple of bugs in my application. Excuse me, bugs in an app that was built to fail? Yup. It turns out 500aaS does not always return an HTTP 500 status code. It is still susceptible to malformed HTTP requests, which force CloudFront to step in and return an HTTP 400 error instead, bypassing the Lambda altogether (this is why my SLA does not promise 100% downtime). This is something I could perhaps fix by overriding the custom error responses CloudFront returns, but they seem to be set up as a function of the origin response, so I don't know if they would work with Lambda@Edge. Still a work in progress.

If you're interested in checking out how 500aaS was built, you can browse the repository on GitHub. Pull requests and suggestions welcome. I even set up a GitHub Actions pipeline to run a test to ensure it always fails. Because I have standards, you know?
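The gist of such a test could look like the sketch below. This is not the code from the repository: `expectFailure` and `stubFetch` are hypothetical names, and the check assumes a `fetch`-compatible function is passed in (Node 18+ ships one globally).

```javascript
// Resolves with the number of paths checked if every one answers with
// HTTP 500, and rejects on the first path that does not.
// fetchFn is injected so the check can run against a stub or the real fetch.
async function expectFailure(fetchFn, baseUrl, paths) {
    for (const path of paths) {
        const response = await fetchFn(new URL(path, baseUrl).toString());
        if (response.status !== 500) {
            throw new Error(`${path} returned ${response.status}, expected 500`);
        }
    }
    return paths.length;
}

// Stub standing in for the network, so the sketch runs offline.
const stubFetch = async () => ({ status: 500 });

expectFailure(stubFetch, 'https://500asaservice.com', ['/', '/about', '/a/b?c=d'])
    .then((count) => console.log(`${count} paths failed as expected`));
```

In a pipeline, passing the real `fetch` instead of `stubFetch` would point the same check at the live site.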