Website delivery with CloudFront

The main advantage of a static website is that it can be stored in the cloud, that is, on a content delivery network able to serve our data with excellent performance, at a very low cost, and with multiple benefits in terms of lightness, security and reliability.

This article sets out to show how to host a static website (whether built with a generator like Jekyll, Pelican or Hyde, or "by hand") on Amazon Web Services, using Amazon S3 to store the website files and CloudFront to deliver them. The goal is a website that is fast from anywhere, reliable, secure, highly scalable and inexpensive (with low traffic, hosting will cost you about one dollar a month).

For this purpose, we will use:

- S3 in order to store the website;

- CloudFront in order to deliver our data;

- Route 53 in order to use our domain name;

- Awstats in order to analyse the logs.

Hosting on S3

After having created an AWS account, go to the management console, then to S3: this service will store the website files.

Creating buckets

The website will be located at `www.domain.tld`. The domain name root, `domain.tld`, will redirect to this location. Create two buckets (with `Create bucket`) named after the expected URLs: `domain.tld` and `www.domain.tld`.

Bucket hosting the website

In the `www.domain.tld` properties, activate website hosting (`Static Website Hosting`, then `Enable website hosting`): you can choose a home page (`index.html`) and an error page (`error.html`). The website will be available at the `Endpoint` URL: note it, we will need it in the following section.
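
The same setup can also be sketched from the command line with `s3cmd` (introduced in the last part), assuming it is already installed and configured:

# Create the bucket, then enable website hosting on it
s3cmd mb s3://www.domain.tld
s3cmd ws-create --ws-index=index.html --ws-error=error.html \
      s3://www.domain.tld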

Redirecting bucket

In the `domain.tld` properties, choose `Static Website Hosting`, then select `Redirect all requests to another host name` and provide `www.domain.tld`. This bucket will stay empty.
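
This redirection can also be configured from the command line, but with the AWS CLI rather than `s3cmd`; a sketch, assuming the CLI is installed and configured:

# Redirect every request on domain.tld to www.domain.tld
aws s3api put-bucket-website --bucket domain.tld \
    --website-configuration \
    '{"RedirectAllRequestsTo": {"HostName": "www.domain.tld"}}'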

From the `Endpoint` location, we can now see the files hosted in the `www.domain.tld` bucket. We could upload those files from the AWS console, but we will show in the last part how to upload them with a single command line.

Serving data with CloudFront

S3 hosts our data in one single location. Data stored in Dublin will be served quite fast to a visitor located in Paris (about 200 ms to load the home page of this website), but far more slowly in New York (500 ms) or Shanghai (1,300 ms).

Amazon CloudFront is a CDN, serving content to end users with high availability and high performance. With it, the access time falls below 100 ms in Paris, New York and Shanghai.

In return, there is a propagation delay between an upload and its availability on CloudFront. We will see in the last part how to notify CloudFront of any modification.

Creating the distribution

In the AWS management console, choose CloudFront, `Create Distribution`, then `Web`.

In `Origin Domain Name`, provide the address previously copied, similar to `www.domain.tld.s3-website-eu-west-1.amazonaws.com`. The field will automatically suggest another value (like `www.domain.tld.s3.amazonaws.com`); do not select it: URLs ending with `/` would not lead to `/index.html`. Choose an ID in `Origin ID`.

Leave the other fields at their defaults, except `Alternate Domain Names`, where we provide our domain name: `www.domain.tld`. Indicate the home page in `Default Root Object`: `index.html`.

Our distribution is now created: we can activate it with `Enable`. The `InProgress` status means CloudFront is currently propagating our data; when it is done, the status will become `Deployed`.
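
Once deployed, you can check that the distribution serves the bucket by querying its `*.cloudfront.net` address shown in the console (the domain below is a placeholder); you should get a `200 OK` with an `X-Cache` header:

curl -I http://d111111abcdef8.cloudfront.net/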

Domain name with Route 53

Creating zone

The Route 53 service will allow us to use our own domain name. In the AWS management console, select Route 53, then `Create Hosted Zone`. In `Domain Name`, enter your domain name without any sub-domain: `domain.tld`.

Redirecting DNS

Select the newly created zone, then the `NS` record. Its `Value` field gives four addresses. Ask your registrar to point the domain's DNS to them.
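
Once the change has propagated, the delegation can be checked from any machine; the four `awsdns` servers should be listed:

dig +short NS domain.tld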

Domain with CloudFront

Back in Route 53, in `domain.tld`, create two record sets with `Create Record Set`:

- `www.domain.tld` (put `www` in `Name`), `A` type: in `Alias`, select our CloudFront distribution;

- `domain.tld`, `A` type: in `Alias`, select the bucket of the same name.

Of course, it is possible to redirect sub-domains to other services (with NS, A and CNAME records) and to handle mail (MX records).

Now, a user going to `www.domain.tld` is led directly (thanks to Route 53) to the CloudFront distribution, which serves the files stored in the `www.domain.tld` bucket. A user going to `domain.tld` reaches the bucket of the same name (thanks to Route 53), which redirects to `www.domain.tld` (thanks to S3). We now just have to send our website to Amazon S3.
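
The whole chain can then be verified with `curl`: the root domain should answer with a `301` redirect pointing to `www`:

curl -sI http://domain.tld/ | grep -i -E '^(HTTP|Location)'
# HTTP/1.1 301 Moved Permanently
# Location: http://www.domain.tld/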

Deploying Jekyll to the cloud

We will now create a `sh` file which builds the website, compresses the files, sends to Amazon S3 those which have been updated since the previous version, and notifies CloudFront.

Prerequisite

We will use `s3cmd` in order to sync our bucket with our local files. Install the latest development version (at least version 1.5), which allows us to invalidate files on CloudFront.

On Mac OS, with Homebrew, install `s3cmd` with the `--devel` option, and `gnupg` for secured transfers:

brew install --devel s3cmd
brew install gnupg

Now we have to give `s3cmd` access to our AWS account. In `Security Credentials`, go to `Access Keys` and generate an access key and its secret key. Then configure `s3cmd` with `s3cmd --configure`.
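
`s3cmd --configure` asks for these keys interactively and stores them in `~/.s3cfg`; the relevant lines look like this (dummy values taken from the AWS documentation):

[default]
access_key = AKIAIOSFODNN7EXAMPLE
secret_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
use_https = True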

In order to optimize images, we will install `jpegoptim` and `optipng`:

brew install jpegoptim
brew install optipng

Building Jekyll and compressing files

We first build Jekyll into the `_site` folder:

jekyll build

The `jekyll-press` plugin will optimize HTML, CSS and JS files. If `jpegoptim` and `optipng` are installed, we can also optimize the images:

find _site -name '*.jpg' -exec jpegoptim --strip-all -m80 {} \;
find _site -name '*.png' -exec optipng -o5 {} \;

Then, in order to improve performance, we compress HTML, CSS and JS files with Gzip, that is, all files outside the `static/` folder:

find _site -path _site/static -prune -o -type f \
-exec gzip -n "{}" \; -exec mv "{}.gz" "{}" \;
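
The compressed files keep their original names, so we can check that a page was actually gzipped; the `Content-Encoding: gzip` header set at upload time (below) will tell browsers to decompress them:

file _site/index.html
# _site/index.html: gzip compressed data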

Uploading files to Amazon S3

We use `s3cmd` to upload the website; only the updated files will be sent. We use the following options:

- `--acl-public` makes our files public;

- `--cf-invalidate` warns CloudFront that the files have to be refreshed;

- `--add-header` defines headers (compression, cache duration...);

- `--delete-removed` deletes the files removed locally;

- `-M` guesses the files' MIME types.

We first send the static files, stored in `static/`, assigning them a 10-week cache duration:

s3cmd --acl-public --cf-invalidate -M \
      --add-header="Cache-Control: max-age=6048000" \
      sync _site/static s3://www.domain.tld/

Then we send the other files (HTML, CSS, JS...) with a one-week cache duration:

s3cmd --acl-public --cf-invalidate -M \
      --add-header="Content-Encoding: gzip" \
      --add-header="Cache-Control: max-age=604800" \
      --exclude="/static/*" \
      sync _site/ s3://www.domain.tld/

Finally, we clean the bucket by deleting the files which have been removed from the local folder, and we invalidate the home page on CloudFront (`--cf-invalidate` does not do it):

s3cmd --delete-removed --cf-invalidate-default-index \
      sync _site/ s3://www.domain.tld/

Deploying in a single command

We put these commands in a single file, named `_deploy.sh`, located in the Jekyll folder:

#!/bin/sh
# Building Jekyll
jekyll build

# Compressing and optimizing files
find _site -path _site/static -prune -o -type f \
      -exec gzip -n "{}" \; -exec mv "{}.gz" "{}" \;
find _site -name '*.jpg' -exec jpegoptim --strip-all -m80 {} \;
find _site -name '*.png' -exec optipng -o5 {} \;

# Syncing static files
s3cmd --acl-public --cf-invalidate -M \
      --add-header="Cache-Control: max-age=6048000" \
      sync _site/static s3://www.domain.tld/

# Syncing other files (HTML, CSS, JS...)
s3cmd --acl-public --cf-invalidate -M \
      --add-header="Content-Encoding: gzip" \
      --add-header="Cache-Control: max-age=604800" \
      --exclude="/static/*" \
      sync _site/ s3://www.domain.tld/

# Deleting removed files and invalidating the home page
s3cmd --delete-removed --cf-invalidate-default-index \
      sync _site/ s3://www.domain.tld/

You only have to execute `sh _deploy.sh` to update the website. A few minutes may be needed for CloudFront to update its data.
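
To make sure the headers were applied, you can inspect the response for a deployed page (the values depend on your settings):

curl -sI http://www.domain.tld/ | grep -i -E 'cache-control|content-encoding'
# Cache-Control: max-age=604800
# Content-Encoding: gzip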

Stats

Although our site is static and served by a CDN, it is quite possible to analyze its logs if you do not want to use a JavaScript-based system such as Matomo or Google Analytics. Here, we will automate the task (retrieving the logs, processing them and displaying the statistics) from a server (in our example, a Raspberry Pi running Raspbian), using Awstats.

Retrieving logs

Let's start by activating log creation on our CloudFront distribution. In the AWS management console, select the Amazon S3 service and create a `statistics` bucket which will store the logs waiting to be retrieved. Then, in CloudFront, select the distribution that serves our website, then `Distribution Settings`, `Edit`, and select the `statistics` bucket in the `Bucket for Logs` field.

We create a local folder that will receive these logs; we can then retrieve them and empty the bucket using `s3cmd`:

mkdir -p ~/awstats/logs
s3cmd get --recursive s3://statistics/ ~/awstats/logs/
s3cmd del --recursive --force s3://statistics/
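
CloudFront delivers its logs as gzipped, tab-separated files named after the distribution ID; a quick sanity check (the first lines list the field names):

zcat ~/awstats/logs/*.gz | head -3
# #Version: 1.0
# #Fields: date time x-edge-location sc-bytes c-ip cs-method ...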

Installing and configuring Awstats

We begin by installing Awstats and copying its configuration file (where `www.domain.tld` is your domain name):

sudo apt-get install awstats
sudo cp /etc/awstats/awstats.conf \
        /etc/awstats/awstats.www.domain.tld.conf
sudo nano /etc/awstats/awstats.www.domain.tld.conf

In this configuration file, change the following settings to tell Awstats how to process the CloudFront logs (where `user` is your username):

# Processing multiple gzipped log files
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /home/user/awstats/logs/* |"
# Format of the logs generated by CloudFront
LogFormat="%time2 %cluster %bytesd %host %method %virtualname %url %code %referer %ua %query"
LogSeparator="\t"
# Domain names (website and CloudFront)
SiteDomain="www.domain.tld"
HostAliases="REGEX[.cloudfront\.net]"

Finally, we copy the images which will be displayed in the reports:

sudo cp -r /usr/share/awstats/icon/ ~/awstats/awstats-icon/

Generating stats

Once this configuration is done, it is possible to generate the statistics as a static HTML file using:

/usr/share/awstats/tools/awstats_buildstaticpages.pl \
    -dir="$HOME/awstats/" -update -config=www.domain.tld

The statistics are now readable in the `awstats.www.domain.tld.html` file. It is then possible to publish it, or to send it to a server or by email, for example.
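
For instance, the report and its icons can be published back to a bucket with `s3cmd` (the destination below is only a suggestion; note that the report then becomes public):

s3cmd --acl-public put ~/awstats/awstats.www.domain.tld.html \
      s3://www.domain.tld/stats.html
s3cmd --acl-public --recursive put ~/awstats/awstats-icon/ \
      s3://www.domain.tld/awstats-icon/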

Regular updating

To automate the generation of the statistics at regular intervals, create a `stats.sh` file with `nano ~/awstats/stats.sh`, which retrieves the logs and generates the statistics:

#!/bin/sh
# Retrieving logs
s3cmd get --recursive s3://statistics/ ~/awstats/logs/
s3cmd del --recursive --force s3://statistics/
# Generating stats
/usr/share/awstats/tools/awstats_buildstaticpages.pl \
    -dir="$HOME/awstats/" -update -config=www.domain.tld

We make this file executable, then create a cron task (without `sudo`, so that `~` keeps pointing to our home directory):

chmod +x ~/awstats/stats.sh
crontab -e

To generate the statistics every six hours, for example:

0 */6 * * * ~/awstats/stats.sh
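
To keep a trace of each run, the script's output can be redirected to a log file instead:

0 */6 * * * ~/awstats/stats.sh >> ~/awstats/cron.log 2>&1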