💾 Archived View for capsule.adrianhesketh.com › 2020 › 07 › 18 › wordpress-to-hugo-and-aws captured on 2024-05-12 at 15:06:30. Gemini links have been rewritten to link to archived content
This year, I switched my blog away from wordpress.com (where it had been for years) to hosting it myself on AWS, because I didn't like the new editor in WordPress (and its lack of Markdown support). It's also cheaper to host it on AWS, but that wasn't the main point.
I thought I'd document the process for anyone else who wants to do it.
The first thing to do is to export the content from WordPress [0].
You should then have two archive files:
- `adrianhesketh.wordpress.com-2020-07-16-10_28_08-sopboj4dtvekjo1mrrdyncxok0dhtqou.zip`
- Posts.
- `media-export-40546272-from-0-to-3610.tar`
- All the photos and other media.
Hugo [1] is a static site generator: you use it to build a website from templates and content. Hosting the output of a generated site is a lot cheaper than running something like WordPress, which requires an application server to run code and a database server to store content. Hugo just requires something to serve up the HTML, CSS and JavaScript generated by the build step.
I use Hugo, because it's fast, and easy to both install and use. Hugo can build Markdown files, which is also how I like to write.
To convert my blog from WordPress to Hugo, I needed to get the content out of my old site. Some of the solutions I found depended on installing a plugin into WordPress, but I was on `wordpress.com` rather than running my own WordPress installation (plugins can only be installed on self-hosted WordPress installations). However, I found that [2] could convert the content from the exports.
First, clone the repo:
git clone https://github.com/palaniraja/blog2md
cd blog2md
Install the Node.js dependencies (if you don't have Node.js installed, you'll need to get that installed first).
npm install
The script requires the export to be unzipped:
unzip ~/Downloads/adrianhesketh.wordpress.com-2020-07-16-10_28_08-sopboj4dtvekjo1mrrdyncxok0dhtqou.zip
Then, execute the script:
node index.js w adrianhesketh.wordpress.com-2020-07-16-10_28_05/adrianhesketh.wordpress.2020-07-16.001.xml out
You should now have a lot of markdown files in the `out` directory.
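Each converted file should start with front matter that Hugo can read. As an illustration only (the exact fields blog2md emits vary), a converted post has roughly this shape; the sample file and its values here are made up:

```shell
# Illustrative only: the general shape of front matter Hugo expects in each
# converted post. The title, date and content below are placeholders.
cat > sample-post.md <<'EOF'
---
title: 'Wordpress to Hugo and AWS'
date: 2020-07-18
draft: false
---

Post content in Markdown goes here.
EOF
head -n 5 sample-post.md
```

If a post is missing its front matter block, Hugo will skip it or render it oddly, so it's worth spot-checking a few files.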
Next, it's time to set up Hugo. I used the Nix package manager to install it, but you can also use brew on the Mac (`brew install hugo`) or install a binary from GitHub for other platforms.
cd ../
mkdir newblog
cd newblog
hugo new site .
Next, configure a theme, as per the Hugo quickstart instructions [3]. For this example, I chose a WordPress-y theme.
git submodule add https://github.com/vimux/mainroad themes/mainroad
echo 'theme = "mainroad"' >> config.toml
Copy the converted content into the structure, under the `posts` directory.
mkdir ./content/posts
cp -r ../blog2md/out/ ./content/posts
To preview the site locally, you can run:
hugo server
You should now be able to see the content at [4]
If you look at the source HTML, you'll find that the images still point at the old server. That's no good, because when you shut down WordPress, those images will disappear.
To fix this, you'll need to put the data from the media export into the `static` directory, and then update the links in the content.
First, unzip the content from the media export.
cd static
tar -xf ../media-export-40546272-from-0-to-3610.tar
cd ..
This puts all of your photos and other media in the right place. If a post points at [5], you'll want it to point at `/2019/01/img_1440.jpg` instead.
To fix this, you'll need to update the posts. You can use a text editor's find/replace feature if you like, but I used the `goreplace` tool [6] to do it in a single operation.
cd content/posts
goreplace https://adrianhesketh.files.wordpress.com/ --replace "/"
cd ../../
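If you don't have `goreplace` installed, a rough equivalent using `sed` works too. Here it's demonstrated on a throwaway file so you can see the effect (the file and its contents are made up for illustration):

```shell
# Create a sample post containing an absolute wordpress.com media URL.
mkdir -p demo
cat > demo/post.md <<'EOF'
<img src="https://adrianhesketh.files.wordpress.com/2019/01/img_1440.jpg">
EOF

# Rewrite the absolute media URLs to site-relative paths, as the
# goreplace command does. -i.bak keeps a backup and works on GNU and BSD sed.
sed -i.bak 's|https://adrianhesketh.files.wordpress.com/|/|g' demo/post.md
cat demo/post.md
```

Run over `content/posts/*.md` instead of `demo/post.md` to apply it to the real site.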
My site contained links to GitHub Gists that WordPress would convert into JavaScript to render the code. I switched these to a Hugo shortcode [7] to render the content instead.
I used this search and replace to make the change.
goreplace 'https://gist.github.com/(.+?)/([a-zA-Z0-9]+)' --replace '\{\{< gist $1 $2 >\}\}'
WordPress comments were also exported along with the site content; I just deleted them out of the `posts` directory.
Hugo expects to see Markdown in your posts, not HTML. In a recent upgrade, the Hugo team changed the Markdown renderer to a new one (Goldmark) that ignores any raw HTML added into your posts.
This is most likely not what we want, because we wrote all that HTML and we definitely want it in our output. I can understand why rendering raw HTML isn't the default for new projects, but it very much is a breaking change.
To fix that, add the following to `config.toml`:
[markup]
  [markup.goldmark]
    [markup.goldmark.renderer]
      unsafe = true
This `config.toml` file is also where you can change the base URL, set the site title, and configure themes.
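As a sketch, the top of `config.toml` might look like this (the values here are examples, not my real settings):

```toml
baseURL = "https://example.com/"
languageCode = "en-gb"
title = "My Blog"
theme = "mainroad"
```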
You might run into other problems, so it's definitely worth checking through your site locally.
I used custom CloudFormation to configure my blog, but when I was writing this, I wondered if AWS Amplify was actually the right way to go now. I tried it out so you don't have to.
I already had the AWS CLI set up to use my account, so I didn't need to configure AWS Amplify at all; I just ran `amplify init` and told it how to handle everything. The documentation says to use `amplify configure`, but that tries to get you to create an AWS IAM user, which isn't required if you already have the AWS CLI set up.
In this example, I told it I was building a JavaScript project, because it's the closest to a `hugo` build. For `hugo` projects, the default output is `public`, the build command is just `hugo` and the local run is `hugo server`.
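For reference, those answers end up in the Amplify project config. From memory (field names may differ slightly between Amplify CLI versions), the relevant part of `amplify/.config/project-config.json` looks something like this:

```json
"javascript": {
  "framework": "none",
  "config": {
    "SourceDir": "src",
    "DistributionDir": "public",
    "BuildCommand": "hugo",
    "StartCommand": "hugo server"
  }
}
```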
With the project configured, I can add hosting with `amplify add hosting`.
At the first attempt, I chose CloudFront and S3 hosting, because I didn't think I needed the extra config steps, and I'm happy to run `amplify publish` myself from the command line.
This is one of many errors that should lead you to NOT choose this path.
At the end of the deploy, Amplify wrote out the URL of [8], but when I visited it I got an error:
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>BDEE096F42823A6C</RequestId>
  <HostId>6hWO3pYE4MK3sxd9SJ/AmqwNNnrEfTr78fTO8OcrK6LgSVpKOVMZVP4LHtSNqysNk0dlsR/ejy0=</HostId>
</Error>
In the URL, I noticed that it had even redirected me to an S3 bucket: [9]
If this happens to you, don't panic. Believe it or not, this happens to everyone, and AWS haven't fixed it. CloudFront distributions are globally distributed and configured in North Virginia (that's why Lambda@Edge must be deployed there), and my S3 bucket was created in a different region. The metadata about the S3 bucket is eventually consistent, and hadn't made it to North Virginia yet. It can take over an hour, so just go and have a cup of tea. Don't waste time trying to "fix" it.
The next problem is that default documents within subdirectories don't work (e.g. [10] works fine, but [11] doesn't).
I think the developers might not have noticed, but IIRC, S3's static website hosting handles `index.html` documents in subdirectories fine; if you put CloudFront in front, though, it talks to the bucket via the S3 API, which knows nothing about `index.html` default documents.
To add to this, the default CloudFront distribution settings are not great. Error pages are set to be the home page, which hides the expected HTTP status codes. That's not helpful, because if you're not paying attention, you might not realise you just hit a dead link. Using `amplify hosting` gives you a chance to change the settings, but not all of them. There is a CloudFormation template in JSON format (oh no) in the `amplify` directory, but I'm not sure what Amplify does to it, or whether it would be safe to edit by hand.
`amplify publish` also seems to take forever: it looks like it copies the whole set of files up each time, instead of doing an S3 sync, which is what my custom script does.
The S3 bucket that it creates will show up in AWS Config checks and any security audits, because versioning isn't enabled, logging isn't enabled, encryption isn't enabled, and there's no policy in place to force HTTPS access.
All-in-all, it's a pretty dire experience.
I decided to rip out the CloudFront hosting and see what the Amplify hosting experience is like. To remove the S3 hosting, I used `amplify hosting remove`, followed by `amplify push` to make the changes.
Then I tried out Amplify hosting, by executing `amplify hosting add` and `amplify publish`. Much more successful.
I got a domain [12], and things worked as expected. The deployment is still slow, because it uploads everything in the site instead of just the changes (as an `s3 sync` would), but it's OK.
Within the Amplify app in the Web console, you can configure custom domains, so that your Website isn't under `amplifyapp.com`.
You can go into `domains` and configure it there.
When you configure a domain, if you bought it through AWS, or you've done a domain transfer to AWS (like I did), then the DNS entries and the TLS configuration are done automatically for you. That's probably the easiest way; otherwise you'll end up having to deal with DNS at your current provider, to validate the AWS Certificate Manager certificate and add CNAME records pointing at the amplifyapp domain.
Hope that's useful for you.
"Amplify hosting" costs a bit more than S3 and CloudFront would, but as we've seen, Amplify isn't nailing that at the moment, so unless you really know your AWS, I'd stick with Amplify hosting and save yourself the time.
However, I wrote my own CloudFormation, and I use a Lambda@Edge function to fill in the missing pieces, so if that's your thing, you can refer to this:
This has to be deployed to North Virginia.
var path = require("path"); const redirects = { "/redirect-from/example1": { to: "/target1", statusCode: 301 }, "/redirect-from/example2": { to: "/target2", statusCode: 302 }, }; exports.handler = async (event) => { const { request } = event.Records[0].cf; const normalisedUri = normalise(request.uri); const redirect = redirects[normalisedUri]; if (redirect) { return redirectTo(redirect.to, redirect.statusCode); } if (!hasExtension(request.uri)) { request.uri = trimSlash(request.uri) + "/index.html"; } return request; }; const trimSlash = (uri) => (hasTrailingSlash(uri) ? uri.slice(0, -1) : uri); const normalise = (uri) => trimSlash(uri).toLowerCase(); const hasExtension = (uri) => path.extname(uri) !== ""; const hasTrailingSlash = (uri) => uri.endsWith("/"); const redirectTo = (to, statusCode) => ({ status: statusCode.toString(), statusDescription: "Found", headers: { location: [ { key: "Location", value: to, }, ], }, });
To create the site you'll need something like this:
AWSTemplateFormatVersion: "2010-09-09"
Description: adrianhesketh.com
Parameters:
  DomainName:
    Type: String
    Description: The website domain name.
    Default: adrianhesketh.com
  CloudFrontCertificateArn:
    Type: String
    Description: ARN of the SSL certificate used for the CloudFront distribution (must be in us-east-1).
  WebsiteCloudFrontViewerRequestLambdaFunctionARN:
    Type: String
    Description: ARN of the Lambda@Edge function that does rewriting of URLs (must be in us-east-1). See lambda_at_edge.js
Resources:
  WebsiteBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref "DomainName"
  WebsiteCloudFrontOriginAccessIdentity:
    Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
    Properties:
      CloudFrontOriginAccessIdentityConfig:
        Comment: !Sub "CloudFront OAI for ${DomainName}"
  WebsiteBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref WebsiteBucket
      PolicyDocument:
        Statement:
          - Action:
              - s3:GetObject
            Effect: Allow
            Resource: !Join ["", ["arn:aws:s3:::", !Ref WebsiteBucket, "/*"]]
            Principal:
              CanonicalUser: !GetAtt WebsiteCloudFrontOriginAccessIdentity.S3CanonicalUserId
  WebsiteCloudfront:
    Type: AWS::CloudFront::Distribution
    DependsOn:
      - WebsiteBucket
    Properties:
      DistributionConfig:
        Comment: !Ref "DomainName"
        Origins:
          - DomainName: !GetAtt WebsiteBucket.DomainName
            Id: website-s3-bucket
            S3OriginConfig:
              OriginAccessIdentity: !Join ["", ["origin-access-identity/cloudfront/", !Ref WebsiteCloudFrontOriginAccessIdentity]]
        Aliases:
          - !Ref "DomainName"
        DefaultCacheBehavior:
          ViewerProtocolPolicy: redirect-to-https
          TargetOriginId: website-s3-bucket
          Compress: true
          ForwardedValues:
            QueryString: true
          LambdaFunctionAssociations:
            - EventType: viewer-request
              LambdaFunctionARN: !Ref WebsiteCloudFrontViewerRequestLambdaFunctionARN
        ViewerCertificate:
          AcmCertificateArn: !Ref CloudFrontCertificateArn
          MinimumProtocolVersion: TLSv1.2_2018
          SslSupportMethod: sni-only
        Enabled: true
        HttpVersion: http2
        DefaultRootObject: index.html
        IPV6Enabled: true
        CustomErrorResponses:
          - ErrorCode: 403
            ResponseCode: 404
            ResponsePagePath: "/error/index.html"
        PriceClass: PriceClass_100
      Tags:
        - Key: Name
          Value: !Ref "DomainName"
I use a simple Makefile.
.PHONY: run build sync-files invalidate-cache deploy

run:
	hugo server

build:
	hugo

sync-files:
	aws s3 sync ./public s3://adrianhesketh.com

invalidate-cache:
	aws cloudfront create-invalidation --distribution-id EE9HA1565U22V --paths /index.html /index.xml /sitemap.xml /css/*

deploy: build sync-files invalidate-cache