Hey HN, Supabase is an open source Firebase alternative. We're building the features of Firebase using enterprise-grade open source tools. We're particularly focused on scalability. We take proven tools like Postgres, and we make them as easy to use as Firebase.
Today, Supabase is adding Row Level Security (RLS) to our Realtime engine.
The linked blog post goes into depth around the technical implementation, so I’ll just give a recap for the people who jump straight to comments (like me).
Supabase was launched here on HN when we open sourced our Realtime engine[0] - an Elixir server which clients (i.e. website visitors/users) can connect to via websockets and receive a stream of PostgreSQL changes.
The server receives those changes via a logical replication slot - the same system that PostgreSQL uses for replicating to other databases.
To achieve RLS we added a few SQL functions, the main one is apply_rls[1] which the stream is filtered through. For every user connected to the Elixir server, the Postgres function checks if they have access to the database change and appends an array of allowed user IDs. The Realtime server then delivers the change to the user only if the connected user is matched in this array.
This one has been a long time coming, and it's one of the reasons why we have maintained our "beta" badge for so long. A few of the team will be here to answer any questions - my cofounder @awalias and @steve-chavez from PostgREST, @inian, @wenbo and @1_over_n
[0] Realtime Show HN:
https://news.ycombinator.com/item?id=22114560
[1] SQL function:
https://github.com/supabase/realtime/blob/master/server/priv...
Looks really interesting, looking at your docs. I have an application using firebase + geofire where users can see things on a map, and those results are themselves updated (ex. real time reactions). However, geofire limits me heavily because I can't add additional filtering (ex. timestamp) and lack of bbox queries.
Can I use supabase's postgres + postgis setup, and support the same type of real time functionality where I can get a bbox result set to display on a map, and then when one of those items gets a reaction or something from some user, that change should propagate to anyone else who might have that particular item in their current map bbox result set. Just wondering if you know if that's possible. Thanks
Your use-case is completely plausible, and if there are any blockers then we would consider it a bug that needs fixing. Postgres + PostGIS + PostgREST are a perfect match, and over time we will make mapping a first class citizen with proper documentation. The Realtime server is just a layer on top of PostgreSQL, so I'm confident it will work out-of-the-box with PostGIS
I haven't seen a lot of mapping use-cases in Supabase _yet_, but there are some:
https://geoexamples.com/svelte/2021/07/18/svelte-supabase-ma...
I just want to say thank you. RLS is the feature I was waiting for.
It's impressive work you have done!
Looking at Hasura, they also provide column-level security and the option to add custom logic via actions. Is this something planned for the future, and in general how would you compare the two systems and what are the advantages or disadvantages of Supabase?
We currently make use of Supabase and it's been fantastic. It's enabled us to completely get away from having a traditional backend/API by utilizing RLS and its realtime nature to have data directly in the UI. Rough numbers are showing we cut development time by a third vs a traditional approach.
I'm on a POC project that's using PostgREST and it's been extremely fast to get a big complicated data model working with an API in front of it. But I guess I don't get how to really use this thing in reality? What does devops look like? Do you have sophisticated db migrations with every deploy? Is all the SQL in version control?
I also don't really get where the users get created in postgres that have all the row-level permissions. The docs are all about auth for users that are already in there.
This is my personal experience with using PostgREST (I haven't had the full supabase experience yet):
> What does devops look like?
I usually spin PostgREST workers up in some kind of managed container service, like Google Compute Engine. PostgREST is stateless, so other than upgrades, you never really need to cycle the services. As for resources PostgREST is extremely lean, I usually try to run 4 to 8 workers per gigabyte of RAM.
> Do you have sophisticated db migrations with every deploy?
You can use whatever migration tool you want. Sqitch is quite popular. I've even worked on projects where Django handled the migrations but PostgREST served the API.
> Is all the SQL in version control?
Yes, this is a good approach, but it means you need a migration tool to apply the migrations in the right order. This is what Sqitch does, and many ORM-ish libraries have migrations sort of half-baked in.
It's worth noting that because many of the objects that PostgREST deals with are views, which have no persistent state, the migration of the views can be _decoupled_ from the migration of the persistent objects like tables. Replacing a view (with CREATE OR REPLACE VIEW) can be done very quickly without locking tables as long as you don't change the view's schema.
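A small illustration of that decoupling, with placeholder schema/table names (not from the post):

-- Replacing a view is quick and doesn't lock the underlying table,
-- as long as the view's column list stays the same.
CREATE OR REPLACE VIEW api.todos AS
SELECT id, title, done
FROM internal.todos
WHERE deleted_at IS NULL;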
In Supabase we use a separate Auth server [0]. This stores the user in an `auth` schema, and these users can log in to receive a JWT. Inside the JWT is a "role", which is, in fact, a PostgreSQL role ("authenticated") that has certain grants associated with it, and the user ID (a UUID).
Inside your RLS Policies you can use anything stored inside the JWT. My cofounder made a video [1] on this which is quite concise. Our way of handling this is just an extension of the PostgREST Auth recommendations:
https://postgrest.org/en/v9.0/auth.html
[0] Auth server:
https://github.com/supabase/gotrue
[1] RLS Video:
https://supabase.com/docs/learn/auth-deep-dive/auth-row-leve...
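For a concrete flavour, here is a minimal sketch of an RLS policy keyed off the JWT's user ID. The `todos` table and `user_id` column are hypothetical; auth.uid() is the helper Supabase ships for reading the user ID out of the JWT:

ALTER TABLE todos ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can select their own todos"
  ON todos FOR SELECT
  USING (auth.uid() = user_id);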
When I think of PostgREST, Supabase, and other tools that let you skip the backend completely and go straight to the DB, my question is: how do you handle business logic that doesn't make sense to have in either the frontend or the DB?
I have the same question. Since Supabase still exposes the connection parameters to the database, we could spin up another Flask server to deal with routes that are not expressible as simple CRUD operations. The JWT secret shared by the auth service (GoTrue) with Postgrest is available too, so I could also use it to authenticate the users myself.
@supabase developers, is this a reasonable way to do things or am I missing something?
This is completely reasonable and almost encouraged. We give you a full PG database so that you can use it in any way you wish, with any combination of Supabase features that you want or don't want.
For example, we have a lot of people who use Prisma with Supabase, rather than the APIs we provide.
My go-to approach is to insert jobs into a queue table, and then have backend workers that consume items from the queue (see the sketch after this list). This has a number of advantages:
1. Faster user experience, the user isn't waiting for the business logic to complete, inserting in the queue should only take a couple of milliseconds.
2. It's more secure, the web worker can have minimal privileges (INSERT only on the queue table) but the backend workers can have much more privilege because they are not user facing.
3. You can scale the web workers orthogonally to the queue workers, as they'll likely have very different scaling properties.
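A rough sketch of that queue-table pattern (table, column and role names here are assumptions, not something Supabase prescribes):

-- The queue itself
CREATE TABLE job_queue (
  id bigserial PRIMARY KEY,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  processed_at timestamptz
);

-- The user-facing role can only enqueue
GRANT INSERT ON job_queue TO authenticated;
GRANT USAGE ON SEQUENCE job_queue_id_seq TO authenticated;

-- A backend worker claims one unprocessed job at a time;
-- SKIP LOCKED lets many workers run concurrently without blocking each other
WITH next_job AS (
  SELECT id FROM job_queue
  WHERE processed_at IS NULL
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE job_queue
SET processed_at = now()
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.payload;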
Supabase also has a WAL decoding tool called WALRUS that I have not tried yet, and that could be the most efficient approach going forward. The tradeoff with queue tables is that the tables are persistent and a bit more resilient to failure/retry.
Do you have more info on the WALRUS stuff? Is that part of the Elixir lib?
WALRUS is all SQL, so you could use it with anything (probably with a bit of extra code-wrangling):
https://github.com/supabase/walrus
There are lots of simple things that are normally easier to do in the web framework that are suddenly easier to do in the database (with the side effect that you can do DB optimizations much easier).
But the other consideration is that you likely need to do a lot with a reverse-proxy like traefik to have much control of what you are really exposing to the outside world. PostgREST is not Spring, it doesn't have explicit control over every little thing so you're likely to need something in front of it. Anyway, point is that having a simple Flask server with a few endpoints running wouldn't complicate the architecture very much b/c you are better off with something in front of it doing routing already (and ssl termination, etc).
I actually extracted out the WAL listening bit of code from Supabase and use that to listen to changes in Postgres then do callback style business logic in Elixir. If you like Elixir, this could be an option.
I haven't had time to work on it myself until recently, but I brought up the idea of making that bit a separate library (and I understand that the Supabase folks are super busy).
https://github.com/supabase/realtime/issues/146#issue-874855...
You can do business logic in stored procedures, functions and views.
This is extremely painful to debug and troubleshoot compared to normal code.
I've been developing an app on PostgREST (which Supabase uses) for a while, and my solution to this was to have a NOTIFY trigger and a small program that broadcast the changed IDs over SSE to clients. The clients would then fetch the changes.
As for the RLS aspect, I considered it "good enough" to allow all clients to know the changed ID (a UUID) as long as they got no other info, and planned to have the trigger also send the allowed groups in the NOTIFY so that the program could filter without needing a separate PG connection per subscriber. In my testing it scales very well.
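Roughly, the trigger side of that approach can be sketched like this (the channel, table and function names are made up for illustration):

-- Send the changed row's ID on a NOTIFY channel
CREATE OR REPLACE FUNCTION notify_changed_id() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('changed_ids', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER todos_notify_changed
AFTER INSERT OR UPDATE ON todos
FOR EACH ROW EXECUTE FUNCTION notify_changed_id();

-- (EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions)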
Anyone from supabase that can comment why LISTEN/NOTIFY triggers were not used, and why WS was used over SSE when the communication is not bi-directional? It'd be interesting to hear your thoughts.
> LISTEN/NOTIFY triggers were not used
We started with this implementation, however there are 2 issues with this approach. 1) NOTIFY has a limit of 8000 bytes, so large rows are silently dropped. 2) you need to attach the notify trigger to every table (vs a Publication, which can listen to an entire database in a single line of code).
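For comparison, a publication covering the whole database really is one statement (the publication name here is just a placeholder):

CREATE PUBLICATION all_changes FOR ALL TABLES;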
> why WS was used
The server is Elixir (Erlang), so it's very good at WS. Over time we will add other useful multiplayer features (like presence)
Thanks for the response! I have a few more questions if you have the time:
If I'm reading the article correctly this feature is only available for authenticated users? Is that restriction based on scaling or some other limitation?
When I looked at the example query in the article it only queries the primary key, does this also work for tables where the GRANT hides certain columns from different users?
> does this also work for tables where the GRANT hides certain columns from different users?
Yes, it does! Any columns that are not selectable by the user's role are hidden from the JSON `response` and `old_response` keys and are not allowed in user-defined filters.
Here is where the columns are checked to see if they are selectable
https://github.com/supabase/walrus/blob/f030688d9d94520b916d...
And here is where they are filtered out
https://github.com/supabase/walrus/blob/f030688d9d94520b916d...
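For context, the column-level GRANT the question refers to is plain PostgreSQL, e.g. with hypothetical table/column names:

-- Only "id" and "title" are selectable by the authenticated role
GRANT SELECT (id, title) ON todos TO authenticated;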
Thanks! Seems like you guys thought of everything :)
> this feature is only available for authenticated users
Every client is expected to provide an API key at a minimum. The API key is actually a long-lived "anon" JWT, so unauthenticated users can connect and listen to "public" changes (for example, you might want to add a table of stock prices which update every second)
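For instance, a policy that lets the anon role read such a hypothetical public table might look like:

CREATE POLICY "Public read access"
  ON stock_prices FOR SELECT
  TO anon
  USING (true);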
> only queries the primary key
That's correct. We use PostgreSQL logical replication to capture changes, which requires PKs. This is expected for replication because otherwise there would be no way of definitively identifying an update/delete. IMO it's a good practice to add a PK to every table anyway (a simple "serial" will suffice), especially for auditing and historical analysis
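If a table is missing a PK, adding a simple surrogate one is a one-liner (table name is a placeholder):

ALTER TABLE my_table ADD COLUMN id serial PRIMARY KEY;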
> IMO it's a good practice to add a PK to every table anyway (a simple "serial" will suffice), especially for auditing and historical analysis
The question wasn't if a primary key was required, it was "does this also work for tables where the GRANT hides certain columns from different users?"
thanks - long week. I should read more carefully.
Inside a supabase project users can connect to their projects as either "anon" or "authenticated". The role information is passed inside a JWT.
I'll need to confirm with Oli (asleep right now) on the technical implementation but I believe either of these roles is assumed when running the RLS checks.
No problem, totally understandable!
What approaches do you use to prevent DoS attacks with such an architecture? It seems trivial for an attacker to generate super expensive queries en masse.
Strict row limits could bring some relief but don't solve the problem.
I bet there are solutions for this but it's not very obvious. Would someone briefly explain?
Supabase uses Kong as an API gateway, which has a Rate-limiting plugin (
https://docs.konghq.com/hub/kong-inc/rate-limiting/
)
Our platform is also behind Cloudflare to protect against DDoS, and we hope to use this to add some smart caching for the API service
You can put traefik in front of it and cache. Then set a max limit on query execution time. I don't use REST. Everything I need I write out as a Postgres function. The other side is monitoring: set up logging and run fail2ban, or an alarm that kicks the user and requires manual oversight.
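One way to cap query execution time is a statement timeout on the API-facing role; the role name and value below are assumptions, not a recommendation from the thread:

ALTER ROLE authenticator SET statement_timeout = '3s';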
The row level security people should talk to the capability security people.
Probably _the_ main problem with RDBMSs is also their biggest benefit: you write code that naively looks like some bad polynomial time, and then the query planner does the magic and uses indices. The problem is, of course, when the index doesn't exist or the query planner doesn't do a good job.
What would be nice is a way to _mandate_ that queries must go through certain indices. But this is in fact good for row level security too!
Functional programming is deeply related to capability-based security, in that both try to get rid of all ambient authority and mediate all actions through some sort of reference. Rust, with its many types of reference, makes the relationship even clearer. From these vantage points, it is very natural to use data structures both to make things performant _and_ enforce certain invariants.
Indices are the closest thing RDBMSs have to such data structures, because they promote only certain access patterns. With the rule that queries must go through the indices, they enforce those access patterns, and thus we have the same proven tools. Combine them with views for filtering out data (which, in conjunction with more indices, we can think of as a sort of curried index Map<UserID, Map<Stuff, OtherStuff>> even if they aren't implemented that way), and now we have very powerful tools for guaranteeing performance and security.
As icing on the cake, I think enforcing limited access patterns could get us closer to that long-sought goal of an RDBMS that automatically knows how to distribute itself. The limited access patterns open up possible options, because the pathological joins that each sharding possibility would run into are no longer allowed. Reusing the same query planner tricks, "cardinality analysis" type stuff can be used to choose between those possibilities.
Here's a worked out concrete example:
Tables:
User: { k: UserID PK, t: TenantID, n: Name, isAdmin: Bool }
Todo: { k: TodoID PK, u: UserID, d: Date, m: Msg }
Indices and Views:
Map<(t: TenantID, exists_classical u: User. u.t = t, u.isAdmin = true), {u2 <- User, u2.t = t}>
Map<(t: TenantID, exists_classical u: User. u.t = t, u.isAdmin = true), {t <- Todo, u2 <- User, t.u = u2.k, u2.t = t}>
^ the second one is a joined view, written with comprehension syntax
Map<t: TenantID, Map<uk: UserID, exist_constructive u: User. u.t = t, u.k = uk>>
Map<(t: TenantID, Map<uk: UserID, exists_classical u: User. u.t = t, u.k = uk, {t <- Todo, t.u = u.k}>>
The more worked out system would probably allow tupling stuff to not repeat the "Map<(t: TenantID," at the beginning.
A few things to note:
1. When we use comprehension syntax, we are "returning the whole table", i.e. an unconstrained view that can be queried in an arbitrary way. A joined table has to be joined in the given way, but is otherwise unconstrained.
2. exist_constructive is the same as a { ... } comprehension, but the cardinality must be 1 as proven by the unique constraints. exists_classical is a comprehension for an arbitrary number of stuff, that is then _truncated_ in the type theory sense.
https://xenaproject.wordpress.com/2021/05/19/the-trace-of-an...
is a good discussion of an example of this truncation.
3. Normally one puts the TenantID on every table, even though it is denormalizing. Now the access patterns enforce this.
Clearly this is a few PhD theses away from actually existing :), but all the embryonic theory comes from good well-established things, which makes me confident.
> What would be nice is a way to mandate that queries must go through certain indices.
Yes, this is an interesting idea. We're kinda gravitating towards this approach in PostgREST lately. For example, on a recent discussion about integrating with PGroonga[1], we were thinking of only exposing its operators once an index is enabled on a column:
CREATE INDEX index_name ON table USING pgroonga (column);
Enables doing:
GET /table?column=groonga_query.PostgreSQL
Which roughly translates to:
SELECT * FROM table WHERE column &@~ 'PostgreSQL';
Of course this is PostgREST-specific; the restriction would only be enforced when going through it.
[1]: