💾 Archived View for dioskouroi.xyz › thread › 29412905 captured on 2021-12-03 at 14:04:38. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Using Nginx as an Object Storage Gateway

Author: thunderbong

Score: 85

Comments: 37

Date: 2021-12-02 05:37:26

________________________________________________________________________________

mnutt wrote at 2021-12-03 02:44:30:

If you need random access to S3 within an EC2 region, sticking nginx in front of it as a caching proxy is unbelievably faster than hitting S3 directly; even more so if you're multi-region.

I recently experimented with trying to have nginx rewrite images from png/jpeg to webp for clients that support it. I ended up with a solution where a lambda triggered off new files added to a bucket and re-encoded them as webp alongside the originals. When a request came into nginx, it would examine the URL and client Accept headers, and then first try to fetch the webp file from s3 before falling back to fetching the original from s3.
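The lookup order described above can be sketched outside nginx. This is only an illustration in Python, not the actual nginx configuration; the function names and the convention of storing the re-encoded file at `<key>.webp` are assumptions.

```python
from typing import List

# Extensions treated as candidates for a WebP variant (an assumption).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def accepts_webp(accept_header: str) -> bool:
    """Return True if the Accept header lists image/webp with a non-zero q."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "image/webp":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def candidate_keys(path: str, accept_header: str) -> List[str]:
    """Return the object keys to try, in order, for one request."""
    key = path.lstrip("/")
    if key.lower().endswith(IMAGE_EXTENSIONS) and accepts_webp(accept_header):
        # Try the re-encoded file first, then fall back to the original.
        return [key + ".webp", key]
    return [key]
```

A proxy would request each key in turn and serve the first one that does not come back as missing.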

I was somewhat surprised that nginx was capable of doing it efficiently, given the nginx configuration format and all the moving pieces.

gonzo41 wrote at 2021-12-03 02:51:36:

If you're doing this on EC2, why not use CloudFront? And if you need a tiny bit of logic, you could use an edge-optimized API Gateway and toss a Lambda in there to do the logic bits.

mnutt wrote at 2021-12-03 05:14:29:

That would work, too. It would be very expensive at scale and would be difficult to keep tail latencies down. Which might be ok for some cases, but wasn't for mine.

halfmatthalfcat wrote at 2021-12-03 02:58:25:

Edge workers really are great at solving so many of these use cases.

killingtime74 wrote at 2021-12-03 03:05:22:

Nginx itself supports lua and JavaScript

gonzo41 wrote at 2021-12-03 03:10:53:

yeah but with an EC2 you're always running it. Going serverless you only pay when there's use.

dikei wrote at 2021-12-03 03:19:41:

Instead of re-encoding everything, have you considered using something like `imgproxy` to transcode images on demand and cache the result?

jgalt212 wrote at 2021-12-03 04:56:35:

why webp over png/jpg? Is there that much of a difference to amortize the cost of the extra processing and/or caching?

iostream23 wrote at 2021-12-03 08:46:57:

So, nginx is a freemium webserver ($2500 IIRC; the open source community edition deliberately withheld features like hot reload of configs, not sure of current status wrt versions, feature parity etc.)

It can also serve as a proxy server, but we already have the finest proxy server in the world as open source: HAProxy

I urge anyone to learn its admittedly obscure but simple config file switches and be amazed at how many layers this software can operate on.

When you really need to performance tune your frontend in real-time, you will appreciate HAProxy and what it offers.

bsagdiyev wrote at 2021-12-03 11:18:00:

After using HAProxy at work, I’ve been trying to slowly move my personal setups to using it. The config is a bit weird to learn, like nginx, at first but it really is performant.

Datagenerator wrote at 2021-12-03 12:03:31:

Until you used Caddy?

bsagdiyev wrote at 2021-12-03 14:28:03:

Absolutely not. We can pay HAProxy for support.

chrislusf wrote at 2021-12-02 23:58:25:

Interesting that SeaweedFS also has a similarly named "Gateway to Remote Object Storage"

https://github.com/chrislusf/seaweedfs/wiki/Gateway-to-Remot...

The difference is that SeaweedFS can support both read and write, with asynchronous write back. Nginx can support read-only caching with a 1 hour TTL.

dekobon wrote at 2021-12-03 00:10:26:

Author of the article and project here...

SeaweedFS and this project have different purposes. This project is intended to show how to configure NGINX to act as an S3 proxying gateway by using njs:

https://nginx.org/en/docs/njs/

If you look at the GitHub repository for it, you will see it is just a collection of nginx config and JavaScript files. This all works with standard open source NGINX. All it does is proxy files like an L7 load balancer, but in this case it adds AWS v2/v4 headers to the upstream requests.

As for caching, that is totally configurable to whatever you want; the example configuration is set to 1 hour but that is arbitrary. In fact, one of the interesting things is all of the additional functionality that can be enabled because the proxying is being done by NGINX.

Regarding read and write, that can be enabled for AWSv2 signatures, but it is more difficult to do in AWSv4 signatures. I have an idea about how to accomplish it with v4 signatures, but it will take some time to prototype it.

What is "asynchronous write back"?

chrislusf wrote at 2021-12-03 04:35:55:

SeaweedFS is very different from Nginx. It's just that the names are so similar.

There are 2 ways to cache: write through and write back. You are using write through, which needs to write to the remote storage before returning. Write back is only writing to local copy, which is much faster to return. The actual updates are executed asynchronously.
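The difference between the two strategies can be sketched with a toy model. This is a minimal illustration only, not how SeaweedFS or nginx implement it; the class names are hypothetical and a plain dict stands in for the remote object store.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteThroughCache:
    """put() returns only after the remote store has been updated."""

    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.remote = remote
        self.local: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self.remote[key] = value  # remote write happens before returning
        self.local[key] = value


class WriteBackCache:
    """put() updates only the local copy; remote writes are deferred."""

    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.remote = remote
        self.local: Dict[str, bytes] = {}
        self.pending: Deque[Tuple[str, bytes]] = deque()

    def put(self, key: str, value: bytes) -> None:
        self.local[key] = value
        self.pending.append((key, value))  # queued for a later flush

    def flush(self) -> int:
        """Apply queued writes to the remote store; return how many."""
        count = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value
            count += 1
        return count
```

In a real system `flush` would run in the background, which is where the faster return of write back comes from, at the cost of a window in which the remote copy is stale.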

timuralp wrote at 2021-12-03 01:25:31:

For requests with a non-empty body and v4 signatures (e.g. PUT object) you can use Unsigned-Payload and not have to compute the payload sha256:

https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-heade...
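As a rough sketch of what that looks like, the SigV4 canonical request ends with a payload hash field, and for an unsigned payload that field (and the `x-amz-content-sha256` header) carries the literal string `UNSIGNED-PAYLOAD` instead of a hex digest. The helper names below are hypothetical, the key is assumed to need no percent-encoding, and the later steps (string to sign, signing key, Authorization header) are left out.

```python
import hashlib
from typing import Dict, Optional

UNSIGNED_PAYLOAD = "UNSIGNED-PAYLOAD"


def canonical_request(method: str, uri: str, query: str,
                      headers: Dict[str, str], payload_hash: str) -> str:
    """Build a SigV4 canonical request string from its parts."""
    # Header names are lowercased and values have whitespace collapsed.
    lowered = {k.lower(): " ".join(v.split()) for k, v in headers.items()}
    names = sorted(lowered)
    canonical_headers = "".join(f"{n}:{lowered[n]}\n" for n in names)
    signed_headers = ";".join(names)
    return "\n".join(
        [method, uri, query, canonical_headers, signed_headers, payload_hash]
    )


def put_object_canonical_request(host: str, key: str, amz_date: str,
                                 body: Optional[bytes] = None) -> str:
    """Canonical request for a PUT; body=None means skip hashing the payload."""
    if body is None:
        payload_hash = UNSIGNED_PAYLOAD
    else:
        payload_hash = hashlib.sha256(body).hexdigest()
    headers = {
        "Host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    return canonical_request("PUT", "/" + key, "", headers, payload_hash)
```

The point for a proxy is that the unsigned variant can be built without reading the request body first.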

conradfr wrote at 2021-12-03 08:09:08:

Coincidentally I wanted to do that for a side-project I launched yesterday [1]. I tried their nginx-s3-gateway image but couldn't get the authorization to AWS S3 to work.

I replaced it with Varnish with the files publicly available on the (cheaper) S3-compatible Scaleway. I guess a simple Nginx would have worked the same at that point. My goal was mostly to minimize the bandwidth cost (which is not metered on my server).

[1]

https://abx.funkybits.fr

sneak wrote at 2021-12-02 23:07:00:

Note that this is part of the not-open-source nginx.

tempay wrote at 2021-12-02 23:35:16:

The article states:

> You can use both NGINX Open Source and NGINX Plus as the gateway to S3 or a compatible object store.

mike_d wrote at 2021-12-02 23:44:47:

Here be dragons. The free version of nginx will only do DNS resolution on a backend hostname at startup, the paid version will do periodic lookups.

They do mention this further down the page, but in 8 months when it randomly breaks you have to hope you remember it needs to be periodically restarted to keep working.

This is by far the stupidest paywalled feature ever, because it amounts to downtime extortion.
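To make the difference concrete, here is a small sketch of "resolve once at startup" versus "re-resolve periodically". It is a generic illustration in Python, not nginx code; `PeriodicResolver`, its parameters, and the 5 second default are assumptions.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def system_lookup(host: str) -> List[str]:
    """Resolve a hostname with the system resolver."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class PeriodicResolver:
    """Cache lookups, but re-resolve once the cached entry is too old."""

    def __init__(self,
                 lookup: Callable[[str], List[str]] = system_lookup,
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now < cached[0]:
            return cached[1]  # entry is still fresh
        addresses = self._lookup(host)
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

Resolving only at startup corresponds to calling `resolve` once and keeping the result forever, which is what breaks when the backend's addresses rotate.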

dmart wrote at 2021-12-02 23:52:12:

There are hacky ways around it, though. The method here is something I've used:

https://tenzer.dk/nginx-with-dynamic-upstreams/

gary_0 wrote at 2021-12-03 00:46:24:

Having to hack around the nginx paywall made me briefly consider going back to Apache.

codetrotter wrote at 2021-12-03 01:13:24:

Have you tried Caddy server? No affiliation just a happy user. It’s open source.

It may or may not be able to replace Nginx depending on your use case. For me Caddy has replaced everything I used to use Nginx for and more.

https://caddyserver.com/

dmart wrote at 2021-12-03 01:47:49:

I'm quite interested in Caddy. The last time I checked, things were in a rough spot with the v2 transition, but it looks like the documentation has improved.

_hyn3 wrote at 2021-12-03 04:53:06:

Apache Traffic Server (no relation to Apache itself) would be an excellent option:

https://trafficserver.apache.org/

fosk wrote at 2021-12-03 01:43:48:

Kong Gateway - which is built on top of NGINX - provides frequent DNS lookups for free in the open source version; we implemented this feature a long time ago (2017?) to overcome this limitation.

So if you need this capability for free, check it out. Not only that, but SRV record resolutions too.

mbreese wrote at 2021-12-03 02:09:49:

Dumb question - do AWS S3 endpoints change DNS that much? Is the DNS resolution limit an issue with this specific workload, or just a general issue?

ceejayoz wrote at 2021-12-03 02:21:34:

Run `dig s3.amazonaws.com` a few times. It's got like a 5 second TTL and the IP changes every time.

prpl wrote at 2021-12-03 05:43:26:

Use openresty local resolver

with

set $proxy_url xxx;

proxy_pass $proxy_url;

tinus_hn wrote at 2021-12-03 00:07:34:

It’s open source, what’s keeping you from patching out that limitation?

jmg_ wrote at 2021-12-03 00:18:18:

I've always been curious to see how project owners respond to someone re-implementing portions of paid features in an open source project.

Assuming the patch is valid, do they decline it citing the paid feature or do something like making a straw man argument against it?

akerl_ wrote at 2021-12-03 00:26:24:

I haven’t tried to pitch something to nginx, but as long as you did it as a clean implementation, “We’re declining to merge, since this is duplicative of code in our paid offering” is the general approach. And then you’re able to maintain your patch set alongside their upstream source.

sneak wrote at 2021-12-03 11:08:02:

My patches removing spyware and phone-home in open source have been universally rejected.

gary_0 wrote at 2021-12-03 00:41:25:

Nothing. For instance, the Debian package nginx-extras includes implementations of some closed-source nginx features. But in my experience the patches are not particularly well-maintained (since they obviously won't be merged by nginx, there's already an official paid version, and the features are named differently from the closed-source ones so they're harder to find).

iostream23 wrote at 2021-12-03 08:49:03:

Use HAProxy, this is what it is designed for.

chespinoza wrote at 2021-12-02 23:41:01:

Indeed. That makes me wonder how difficult it is for companies like Nginx to make a profit from open source. Then again, they were acquired by F5 a couple of years ago, so they were probably doing quite well.

rad_gruchalski wrote at 2021-12-03 02:38:07:

And it’s S3 only. The title could mention that.