💾 Archived View for capsule.adrianhesketh.com › 2021 › 06 › 14 › go-cdk-app-runner captured on 2024-12-17 at 10:02:45. Gemini links have been rewritten to link to archived content

capsule.adrianhesketh.com

Running a Go app on AWS App Runner with CDK

App Runner [1] is a new service for running Web applications on AWS. I'm interested in finding ways to reduce time spent on infrastructure setup and management, and this service claims to be:

the easiest way to run your web application (including API services, backend web services, and websites) on AWS. With App Runner, there is no infrastructure or container orchestration required. You can go from an existing container image, container registry, source code repository, or existing CI/CD workflow to a fully running containerized web application on AWS in minutes.

[1]

Serverless vs App Runner

Some people have compared App Runner to Lambda, which has the strapline: "Run code without thinking about servers or clusters. Only pay for what you use."

The reason I switched to Serverless from container deployments was that if you're willing and able to write your service explicitly to target Lambda, you gain immense benefits in return: for example, scaling down to zero cost, and per-request isolation. In Lambda, each concurrent HTTP request is handled in its own Firecracker microVM, so even if a VM completely crashes, it only affects that single request. This makes Lambda extremely reliable.

But Serverless isn't perfect. Developers need to be trained to use it effectively, and deal with complexities like limitations around binary formats [2], Lambda Authorizers [3] and building an effective local dev environment.

[2]

[3]

There's also a nagging feeling that the latency could be better. Cold starts can result in relatively high p99 latency, although they can be well mitigated with concurrency configuration. So I'm interested to find out whether App Runner can deliver lower latency and a better developer experience than API Gateway and Lambda, while still providing the operational benefits of a managed execution platform.

But with App Runner we're going to lose the process isolation. One request can take down all other requests that are running on the server, or drag down the performance of other requests. On the other hand, we can potentially share cache data between more requests due to the fact that multiple requests are hitting the same service, and we can make more use of the CPU.

We'll also have to keep track of whether our Docker image contains vulnerabilities, but that's pretty easy to manage with ECR, which lists vulnerabilities on uploaded images, and tools like Dependabot [4] or Snyk [5] that can scan your repositories.

[4]

[5]

We'll lose true scale to zero, but the lower bound of App Runner is pretty low ($0.007 per GB/hour) at around $10 a month for a 2GB RAM app instance that isn't doing anything. $10 a month isn't a lot, but it's not nothing.
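The $10 figure can be checked with a quick calculation. This is only a sketch: it assumes a 30-day month and counts nothing except the per-GB-hour memory charge quoted above, and `idleCost` is a name made up for illustration.

```go
package main

import "fmt"

// idleCost estimates the cost of keeping one idle instance provisioned.
// It only accounts for a per-GB-hour memory charge.
func idleCost(memoryGB, ratePerGBHour, hours float64) float64 {
	return memoryGB * ratePerGBHour * hours
}

func main() {
	const hoursPerMonth = 24 * 30 // assumption: a 30-day month
	// 2GB of RAM at $0.007 per GB-hour.
	fmt.Printf("$%.2f per month\n", idleCost(2, 0.007, hoursPerMonth))
}
```

That works out to $10.08, which matches the "around $10 a month" estimate.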

The pricing example on App Runner states that for an app that's doing 80 requests per second for around 8 hours a day, we'd be looking at a bill of about $25, because we have to pay for active instances. On Lambda, that would probably cost less than $5, but $25 is still reasonable for a fault-tolerant service. As we'll find out though, App Runner isn't as fault tolerant as Lambda.

App Runner infrastructure as code

Unlike SAM or Serverless Framework, App Runner itself doesn't have a way to create additional infrastructure such as DynamoDB tables or SQS queues.

However, AWS Copilot [6] supports App Runner as a target architecture, and can fill the gap, providing a way to add "Additional Resources" [7].

[6]

[7]

I'm pretty sure I'd reach the capabilities of Copilot's additional resources support really quickly in a production project and find it annoying. For example, there's no way to subscribe a Lambda function to a queue within a Copilot project. Maybe the idea is that if you're a container person, you're going to stick with containers throughout, but I'd like to be able to use Lambda for handling async events, and App Runner apps for REST APIs and front-ends.

Either way, I've been spoiled by CDK, and there's no way I want to go back to writing CloudFormation in YAML so I'm going to use CDK [8] to deploy the app.

[8]

CDK got "L1" support for App Runner a few weeks ago, which means that you're basically just writing the same configuration as you would in CloudFormation. "L1" constructs all start with "Cfn" to highlight that they're CloudFormation resources rather than higher level constructs.

When you use these, you're not really gaining any benefits from CDK, like being able to reduce the amount of code you write, or apply permissions to resources using higher level programming constructs, but it does mean it's available for use.

Building an app

To test this out, I needed an app to run, so I created a Hello World Go web server in `app/main.go`.

If you hit the root, you get a "Hello" page, but there are a couple of traps in there to test how App Runner behaves when things go wrong.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	m := http.NewServeMux()
	m.Handle("/panic", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		panic("oh no")
	}))
	m.Handle("/exit", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		os.Exit(1)
	}))
	m.Handle("/fatal", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Fatal("fatal error")
	}))
	m.Handle("/use-all-memory", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const increment = 1024 * 1024 * 256
		var space []byte
		for {
			// Use 256MB RAM.
			space = append(space, make([]byte, increment)...)
			fmt.Printf("%dMB consumed\n", len(space)/1024/1024)
		}
	}))
	m.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("Incoming request")
		io.WriteString(w, "<html><head><title>Hello</title></head><body><h1>World</h1></body><html>")
	}))
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8000", m))
}
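One detail worth knowing about these traps: Go's `net/http` server recovers panics raised inside a handler, so `/panic` should only fail the request that hit it, whereas `os.Exit` and `log.Fatal` end the whole process. Here's a minimal sketch of that behaviour using a throwaway local `httptest` server rather than App Runner. The `survivesPanic` helper is a name made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// survivesPanic starts a throwaway local server with a panicking handler,
// calls it, and reports whether a follow-up request to "/" still succeeds.
func survivesPanic() bool {
	m := http.NewServeMux()
	m.HandleFunc("/panic", func(w http.ResponseWriter, r *http.Request) {
		panic("oh no")
	})
	m.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "still here")
	})

	srv := httptest.NewUnstartedServer(m)
	// Discard the stack trace that net/http logs when it recovers the panic.
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.Start()
	defer srv.Close()

	// The panicking request fails from the client's point of view...
	if resp, err := http.Get(srv.URL + "/panic"); err == nil {
		resp.Body.Close()
	}
	// ...but the process is still alive to serve the next request.
	resp, err := http.Get(srv.URL + "/")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println("server survived handler panic:", survivesPanic())
}
```

So of the traps, `/exit`, `/fatal` and `/use-all-memory` are the ones that can be expected to take the whole container down.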

To get this app built, I needed to build the Go web server for Linux, and add it to a Docker container.

Go has brilliant support for cross-platform builds, so even though I'm on a Mac, it's really easy to get it to make a build for Linux and output it to the `../docker-images/` directory with the executable name `app`. I added this to a `Makefile` so I could remember it.

build-linux:
	cd app && GOOS=linux GOARCH=amd64 go build -o ../docker-images/app

With the `app` built for Linux, it's just a case of adding the executable to a suitable Linux Docker image.

The `ubuntu:latest` image used to be *massive*, but these days it's just 27MB, so there's little benefit in using Alpine Linux. Alpine Linux uses `musl` instead of `glibc`, so there are some compatibility issues that can be avoided by using Ubuntu instead.

The `ADD app ./` statement copies the Go program binary to the container, while the `EXPOSE 8000` documents that port 8000 will be open in the container. Finally, the `ENTRYPOINT` sets the executable to run.

FROM ubuntu:latest
ADD app ./
EXPOSE 8000
ENTRYPOINT ["/app"]

Setting up the CDK project

So far, I've only ever used TypeScript with CDK, but since I'm working on a way to do great front-ends in Go [9], I thought I'd give it a blast with the Go CDK, following the getting started guide. [10]

[9]

[10]

Once CDK is installed, it's easy to bootstrap a new project.

cdk init --language=go

With the project template in place, I can edit the Go file it created in the directory, and start adding my infrastructure.

Creating an ECR repository and adding a Docker container to it

Since we've got some code, and a Dockerfile all set up, the next step is to create an Elastic Container Registry repository and push a built Docker image to it that App Runner can use.

`DockerImageAsset` [11] accepts a Dockerfile and does the rest.

[11]

Unfortunately, CDK doesn't have Go documentation yet, so you've got to know how to convert everything from TypeScript to Go equivalents.

// TypeScript
import { DockerImageAsset } from '@aws-cdk/aws-ecr-assets';
// Go
import "github.com/aws/aws-cdk-go/awscdk/awsecrassets"

While it's not exactly taxing, and the Go CDK is just in developer preview, Pulumi's documentation [12] shows all examples in all languages by default.

[12]

If you're coming from TypeScript CDK to Go CDK, some things don't quite translate. In TypeScript you can use the `__dirname` feature to ensure paths are consistent:

new DockerImageAsset(this, 'Node12', {
	directory: path.join(__dirname, '../docker-images/node12')
})

In Go, there's no equivalent of `__dirname` - the paths have to be relative to the place where the code is being executed, so I just replaced that to be from the root of the project.
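Go has no built-in `__dirname`, but something similar can be approximated with `runtime.Caller`, which reports the path of the calling source file as recorded at compile time. The sketch below assumes the program is built and run on the same machine without `-trimpath`, as it would be with `go run`. The `sourceDir` and `assetPath` helpers, and the `docker-images` directory passed to them, are only illustrative.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// sourceDir returns the directory of this source file as recorded by the
// compiler. It returns "." if the caller information is unavailable.
func sourceDir() string {
	_, file, _, ok := runtime.Caller(0)
	if !ok {
		return "."
	}
	return filepath.Dir(file)
}

// assetPath resolves a path relative to this source file rather than the
// current working directory.
func assetPath(rel string) string {
	return filepath.Join(sourceDir(), rel)
}

func main() {
	fmt.Println(assetPath("docker-images"))
}
```

Because the path is baked in at build time, this stops working if the compiled binary is moved to another machine, so a path relative to the project root is the simpler choice when CDK is always run from there.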

Go code that uses the Go AWS SDK is usually littered with calls to `aws.String()`. This function returns a pointer to a string, which I assume is because Go strings are not nullable, and the SDK uses pointers to strings to support null values instead. However, in a lot of cases they're not required, and it just makes using the SDK tedious, so I was sad to see a similar construct in CDK with `jsii.String`. It would be great if the team could find a way to make that nicer.
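To illustrate what these helpers are for, here's a minimal sketch with a local stand-in. `strPtr` and `describe` are names made up for this example, not part of the SDK or CDK. It shows the one thing a `*string` can express that a plain `string` can't: the difference between "not set" and "set to the empty string".

```go
package main

import "fmt"

// strPtr is a local stand-in for helpers such as aws.String or jsii.String:
// it copies the value and returns a pointer to the copy.
func strPtr(s string) *string {
	return &s
}

// describe shows why pointers are used: nil can mean "not set", which a
// plain string cannot express.
func describe(name *string) string {
	if name == nil {
		return "not set"
	}
	return fmt.Sprintf("set to %q", *name)
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe(strPtr("")))
	fmt.Println(describe(strPtr("ApplicationImage")))
}
```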

Regardless of minor complaints, I _love_ how simple it is in CDK to create an ECR repository and add a Docker image to it. CDK builds the Docker image locally and pushes it up automatically.

image := awsecrassets.NewDockerImageAsset(stack, jsii.String("ApplicationImage"), &awsecrassets.DockerImageAssetProps{
	Directory: jsii.String("./docker-images/app"),
})

App Runner

With a Docker image in place in a registry, it's now a case of getting App Runner to run it.

This is where I ran into a bug. At the time of writing, the Go CDK has a problem with its type definitions for App Runner, where the `sourceConfiguration` field doesn't match up with the `CfnService_SourceConfigurationProperty` type.

Go's empty interface type (`interface{}`) is used instead of the correct types. This meant that IDE help didn't work until I looked at the code and worked out which types should be used:

type CfnService_SourceConfigurationProperty struct {
	// `CfnService.SourceConfigurationProperty.AuthenticationConfiguration`.
	AuthenticationConfiguration interface{} `json:"authenticationConfiguration"`
	// `CfnService.SourceConfigurationProperty.AutoDeploymentsEnabled`.
	AutoDeploymentsEnabled interface{} `json:"autoDeploymentsEnabled"`
	// `CfnService.SourceConfigurationProperty.CodeRepository`.
	CodeRepository interface{} `json:"codeRepository"`
	// `CfnService.SourceConfigurationProperty.ImageRepository`.
	ImageRepository interface{} `json:"imageRepository"`
}

// Properties for defining a `AWS::AppRunner::Service`.
type CfnServiceProps struct {
	// `AWS::AppRunner::Service.SourceConfiguration`.
	SourceConfiguration interface{} `json:"sourceConfiguration"`
	// `AWS::AppRunner::Service.AutoScalingConfigurationArn`.
	AutoScalingConfigurationArn *string `json:"autoScalingConfigurationArn"`
	// `AWS::AppRunner::Service.EncryptionConfiguration`.
	EncryptionConfiguration interface{} `json:"encryptionConfiguration"`
	// `AWS::AppRunner::Service.HealthCheckConfiguration`.
	HealthCheckConfiguration interface{} `json:"healthCheckConfiguration"`
	// `AWS::AppRunner::Service.InstanceConfiguration`.
	InstanceConfiguration interface{} `json:"instanceConfiguration"`
	// `AWS::AppRunner::Service.ServiceName`.
	ServiceName *string `json:"serviceName"`
	// `AWS::AppRunner::Service.Tags`.
	Tags *[]*awscdk.CfnTag `json:"tags"`
}

This led me to guess (incorrectly) that the config should use pointers - *don't do this*:

awsapprunner.NewCfnService(stack, jsii.String("AppRunner"), &awsapprunner.CfnServiceProps{
	SourceConfiguration: &awsapprunner.CfnService_SourceConfigurationProperty{
		ImageRepository: &awsapprunner.CfnService_ImageRepositoryProperty{
			ImageIdentifier: image.ImageUri(),
			ImageConfiguration: &awsapprunner.CfnService_ImageConfigurationProperty{
				Port: jsii.String("80"),
			},
			ImageRepositoryType: jsii.String("ECR"),
		},
	},
})

Looking at the output of `cdk synth`, I could see that the `SourceConfiguration` was very wrong.

  AppRunner:
    Type: AWS::AppRunner::Service
    Properties:
      SourceConfiguration: {}
    Metadata:
      aws:cdk:path: CdkGoAppRunnerStack/AppRunner

With a bit of fiddling around, I got there. The config wasn't _really_ simple, because the CDK construct is only L1, but it's still pretty simple. With this in place, I had a working app.

// Grant App Runner read access to the Docker container.
ecrAccessRole := awsiam.NewRole(stack, jsii.String("AppRunnerRole"), &awsiam.RoleProps{
	AssumedBy: awsiam.NewServicePrincipal(jsii.String("build.apprunner.amazonaws.com"), &awsiam.ServicePrincipalOpts{}),
})
image.Repository().GrantPull(ecrAccessRole)

awsapprunner.NewCfnService(stack, jsii.String("AppRunner"), &awsapprunner.CfnServiceProps{
	SourceConfiguration: awsapprunner.CfnService_SourceConfigurationProperty{
		ImageRepository: awsapprunner.CfnService_ImageRepositoryProperty{
			ImageIdentifier: image.ImageUri(),
			ImageConfiguration: awsapprunner.CfnService_ImageConfigurationProperty{
				Port: jsii.String("8000"),
			},
			ImageRepositoryType: jsii.String("ECR"),
		},
		AuthenticationConfiguration: awsapprunner.CfnService_AuthenticationConfigurationProperty{
			AccessRoleArn: ecrAccessRole.RoleArn(),
		},
	},
})

Deployment

Deployment is a single command `cdk deploy`, but I added it to a `Makefile` so that I remember to build the binary first.

deploy: build-linux
	cdk deploy

The first time I ran it, it completely failed with an error.

Template format error: Unrecognized resource types: [AWS::AppRunner::Service]

I realised that I hadn't set the region to be Ireland (eu-west-1) and so I was trying to use App Runner in London (eu-west-2) which isn't supported yet. Swapping regions by updating the env function did the trick:

func env() *awscdk.Environment {
	return &awscdk.Environment{
		Region: jsii.String("eu-west-1"),
	}
}

Deployment is _slow_ though. It took about 4 minutes for the Docker build, push, and CloudFormation change to execute. But at the end of it, the app popped up with an HTTPS domain.

https://xxxxxxxxx.eu-west-1.awsapprunner.com/

Testing

With an App Runner "hello world" deployed, I was able to try it out. The verbose output of `curl` showed that it's using the Envoy server [13].

[13]

> GET / HTTP/1.1
> Host: xxxxxxxxx.eu-west-1.awsapprunner.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-length: 72
< content-type: text/html; charset=utf-8
< date: Sun, 13 Jun 2021 18:17:55 GMT
< x-envoy-upstream-service-time: 1
< server: envoy

I'm surprised that AWS gives this away, because this would show up on most penetration tests as an information disclosure vulnerability, even without showing the version number of the server software.

I used `wrk` [14] to run a quick check on the performance.

[14]

$ nix-shell -p wrk
[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ wrk https://xxxxxxxxx.eu-west-1.awsapprunner.com/
Running 10s test @ https://xxxxxxxxx.eu-west-1.awsapprunner.com/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.41ms    2.77ms  69.37ms   79.65%
    Req/Sec   175.33     19.96   202.00     77.27%
  3500 requests in 10.09s, 810.07KB read
Requests/sec:    346.91
Transfer/sec:     80.29KB

Pretty cool results.

I've got a Hello World API Gateway server hanging around to compare against. It uses Node.js rather than Go, but that should give it an advantage on cold start performance over Go. [15]

[15]

Running 10s test @ https://xxxxxxxxxx.execute-api.eu-west-1.amazonaws.com/dev/hello
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    65.53ms   16.24ms 202.08ms   89.97%
    Req/Sec    76.37     16.79   101.00     71.43%
  1516 requests in 10.03s, 5.25MB read
Requests/sec:    151.22
Transfer/sec:    536.47KB

Obviously, this is quite a small test, but a drop in average latency from API Gateway's 65ms to App Runner's 28ms is substantial (less than half), and App Runner's performance is more consistent, with a standard deviation of under 3ms against 16ms.

Adding a database

To make it more realistic, I added some database read/write code to the app.

m.Handle("/dynamodb/read", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	log.Printf("read: creating session")
	sess, err := session.NewSession(&aws.Config{Region: aws.String("eu-west-1")})
	if err != nil {
		http.Error(w, "failed to create session", http.StatusInternalServerError)
		return
	}
	client := dynamodb.New(sess)
	tableName := os.Getenv("TABLE_NAME")
	log.Printf("tableName: %q", tableName)
	_, err = client.Query(&dynamodb.QueryInput{
		TableName:              aws.String(tableName),
		KeyConditionExpression: aws.String("#pk = :pk"),
		ExpressionAttributeNames: map[string]*string{
			"#pk": aws.String("pk"),
		},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":pk": {S: aws.String("pk")},
		},
	})
	if err != nil {
		log.Printf("read error: %v", err)
		http.Error(w, "failed to read", http.StatusInternalServerError)
		return
	}
	io.WriteString(w, "<html><head><title>Read data</title></head><body><h1>Data read</h1></body><html>")
}))

Next, I needed to add a table to the CDK project, and set an environment variable for App Runner to pick up the table name.

table := awsdynamodb.NewTable(stack, jsii.String("AppRunnerTable"), &awsdynamodb.TableProps{
	PartitionKey: &awsdynamodb.Attribute{Name: jsii.String("pk"), Type: awsdynamodb.AttributeType_STRING},
	SortKey:      &awsdynamodb.Attribute{Name: jsii.String("sk"), Type: awsdynamodb.AttributeType_STRING},
	BillingMode:  awsdynamodb.BillingMode_PAY_PER_REQUEST,
})

// Inside the App Runner service's ImageConfiguration:
RuntimeEnvironmentVariables: []awsapprunner.CfnService_KeyValuePairProperty{
	{Name: jsii.String("TABLE_NAME"), Value: table.TableName()},
},

Finally, I needed to add an instance role to App Runner to give it permission to access DynamoDB:

appRunnerInstanceRole := awsiam.NewRole(stack, jsii.String("AppRunnerInstanceRole"), &awsiam.RoleProps{
	AssumedBy: awsiam.NewServicePrincipal(jsii.String("tasks.apprunner.amazonaws.com"), &awsiam.ServicePrincipalOpts{}),
})
table.GrantReadWriteData(appRunnerInstanceRole)

The App Runner configuration was starting to look a bit chunkier, but not terrible.

awsapprunner.NewCfnService(stack, jsii.String("AppRunner"), &awsapprunner.CfnServiceProps{
	SourceConfiguration: awsapprunner.CfnService_SourceConfigurationProperty{
		ImageRepository: awsapprunner.CfnService_ImageRepositoryProperty{
			ImageIdentifier: image.ImageUri(),
			ImageConfiguration: awsapprunner.CfnService_ImageConfigurationProperty{
				Port: jsii.String("8000"),
				RuntimeEnvironmentVariables: []awsapprunner.CfnService_KeyValuePairProperty{
					{Name: jsii.String("TABLE_NAME"), Value: table.TableName()},
				},
			},
			ImageRepositoryType: jsii.String("ECR"),
		},
		AuthenticationConfiguration: awsapprunner.CfnService_AuthenticationConfigurationProperty{
			AccessRoleArn: ecrAccessRole.RoleArn(),
		},
	},
	InstanceConfiguration: awsapprunner.CfnService_InstanceConfigurationProperty{
		InstanceRoleArn: appRunnerInstanceRole.RoleArn(),
	},
})

But when I deployed it, the web request hung, which showed that the default HTTP request timeout is somewhere in the region of 80 seconds, rather than API Gateway's 30.

Checking the logs, I saw an unexpected error:

caused by: Post "https://dynamodb.eu-west-1.amazonaws.com/": x509: certificate signed by unknown authority

Casting my mind back, I realised that this was likely because the now tiny Docker image doesn't contain any information about certificate authorities, so I had to update the Dockerfile to get it to install certificates.

FROM ubuntu:latest
ADD app ./
RUN apt-get update && apt-get -y install ca-certificates
EXPOSE 8000
ENTRYPOINT ["/app"]

After another deploy, I had an app that can read and write to DynamoDB with great response times. Now time to find out what happens when things go wrong.

Testing failures

Exiting the web server

If you check out the source code for the app, you'll notice that I added a `/exit` handler that terminates the Web server. Let's terminate the server and see how quickly App Runner recovers.

Immediately after termination, App Runner returns a `HTTP/1.1 502 Bad Gateway` response.

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl https://xxxxxxxxx.eu-west-1.awsapprunner.com/exit
Sun Jun 13 19:31:10 BST 2021

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl https://xxxxxxxxx.eu-west-1.awsapprunner.com/
Sun Jun 13 19:31:15 BST 2021

After a further 15 seconds, this turned into a `HTTP/1.1 503 Service Unavailable`:

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl https://xxxxxxxxx.eu-west-1.awsapprunner.com/
Sun Jun 13 19:31:30 BST 2021
upstream connect error or disconnect/reset before headers. reset reason: connection failure

After a total of 70 seconds, it was returning `404` errors:

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl -v https://xxxxxxxxx.eu-west-1.awsapprunner.com/
Sun Jun 13 19:32:21 BST 2021

  CApath: none

> GET / HTTP/1.1
> Host: xxxxxxxxx.eu-west-1.awsapprunner.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Sun, 13 Jun 2021 18:32:21 GMT
< server: envoy
< content-length: 0
<

After 76 seconds of total elapsed time, we were back online.

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl https://xxxxxxxxx.eu-west-1.awsapprunner.com/
Sun Jun 13 19:32:26 BST 2021
<html><head><title>Hello</title></head><body><h1>World</h1></body><html>

Compared to Serverless, this is poor, because with a Serverless app, one crash would not affect any other user. But for a container app, that's a pretty good zero-effort recovery.
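The manual `date && curl` checks above could be automated. This is a sketch rather than what I actually ran: `poll` requests a URL at a fixed interval and records each status code, and `recoveredAfter` finds the first `200`. Both names and the `sample` type are made up for illustration. To keep the example runnable without a deployed service, `main` feeds in the timings observed above instead of calling `poll`.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// sample is one observation: how long after the crash it was taken, and the
// HTTP status code that came back.
type sample struct {
	elapsed time.Duration
	status  int
}

// poll requests url every interval until it sees a 200 or runs out of
// attempts, recording each status code (0 for a transport error).
func poll(url string, interval time.Duration, attempts int) []sample {
	start := time.Now()
	var samples []sample
	for i := 0; i < attempts; i++ {
		status := 0
		if resp, err := http.Get(url); err == nil {
			status = resp.StatusCode
			resp.Body.Close()
		}
		samples = append(samples, sample{time.Since(start), status})
		if status == http.StatusOK {
			break
		}
		time.Sleep(interval)
	}
	return samples
}

// recoveredAfter returns the elapsed time of the first 200 response.
func recoveredAfter(samples []sample) (time.Duration, bool) {
	for _, s := range samples {
		if s.status == http.StatusOK {
			return s.elapsed, true
		}
	}
	return 0, false
}

func main() {
	// Observations from the manual test above, as time since the /exit call.
	observed := []sample{
		{5 * time.Second, 502},
		{20 * time.Second, 503},
		{71 * time.Second, 404},
		{76 * time.Second, 200},
	}
	if d, ok := recoveredAfter(observed); ok {
		fmt.Println("recovered after", d) // prints "recovered after 1m16s"
	}
}
```

Against a real service, something like `recoveredAfter(poll(url, time.Second, 120))` right after hitting `/exit` would give a finer-grained picture of when each status code appears.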

Using all the memory!

A common problem with long-running servers running in .NET and Java is memory leaks - the server takes up more RAM the longer the server is running for, ultimately exhausting all RAM.

With a Serverless app, if you use all the RAM up, the Lambda function terminates and that request is lost, but no other users are affected.

Let's find out what happens with App Runner. How quickly will we lose performance and, ultimately, the server?

[nix-shell:~/github.com/a-h/cdk-go-app-runner]$ date && curl -v https://xxxxxxxxx.eu-west-1.awsapprunner.com/use-all-memory
Sun Jun 13 20:00:42 BST 2021
...
< HTTP/1.1 502 Bad Gateway
< date: Sun, 13 Jun 2021 19:00:46 GMT
< content-length: 0
< x-envoy-upstream-service-time: 4069
< server: envoy
< 

Exactly the same thing happened as when exiting the server, with the same recovery time. Again, the failure mode is more severe than with Lambda, where only the single request would have been lost.

Load balancing

At the moment, it doesn't look like there's a way to configure autoscaling to set a minimum number of instances to greater than one using CloudFormation because there's no way to create an App Runner `AutoScalingConfiguration` resource [16].

[16]

If there's no way to do that in CloudFormation, then there's no way to do that in CDK.

Oh well. It's possible to update it from the UI.

Edit the configuration

Set to use a custom autoscaling configuration

Set a minimum of two servers

With that in place, I killed one of the servers to check what happened, but the service still collapsed just as if I'd exited all of the servers. The active instance metric still showed only one server was running. I guessed that I had to click "Deploy" to make it work, but I was actually still getting routed to the broken server.

It looks to me that the container orchestration and load balancing are currently disconnected. You can forcibly terminate a container and the load balancer has no idea that it's happened until healthchecks fail. If the team can make that seamless, the service will be a lot more useful.

Summary

Well, it's got promise, but it's bleeding edge right now.

I'm tempted by the low latency and ease of setup, but as it stands, the danger of downtime and complex performance issues due to the lack of request isolation means I'm unlikely to use this for my next major project. I'll just have to hope for a faster API Gateway that's properly integrated into Lambda rather than being a separate product, so that it can reduce cold starts by predicting load ahead of time and by integrating Lambda VM start-up into TLS handshakes, like Cloudflare Workers [17].

[17]

The full example code is available on GitHub at [18].

[18]
