Hey HN, Supabase is an open source Firebase alternative. We're building the features of Firebase using enterprise-grade open source tools. We're particularly focused on scalability. We take proven tools like Postgres, and we make them as easy to use as Firebase.
Today, Supabase is adding Row Level Security (RLS) to our Realtime engine.
The linked blog post goes into depth around the technical implementation, so I’ll just give a recap for the people who jump straight to comments (like me).
Supabase was launched here on HN when we open sourced our Realtime engine[0] - an Elixir server which clients (i.e. website visitors/users) can connect to via websockets and receive a stream of PostgreSQL changes.
The server receives those changes via a logical replication slot - the same system that PostgreSQL uses for replicating to other databases.
To achieve RLS we added a few SQL functions, the main one is apply_rls[1] which the stream is filtered through. For every user connected to the Elixir server, the Postgres function checks if they have access to the database change and appends an array of allowed user IDs. The Realtime server then delivers the change to the user only if the connected user is matched in this array.
This one has been a long time coming, and it's one of the reasons why we have maintained our "beta" badge for so long. A few of the team will be here to answer any questions - my cofounder @awalias and @steve-chavez from PostgREST, @inian, @wenbo and @1_over_n
[0] Realtime Show HN:
https://news.ycombinator.com/item?id=22114560
[1] SQL function:
https://github.com/supabase/realtime/blob/master/server/priv...
Looks really interesting, looking at your docs. I have an application using firebase + geofire where users can see things on a map, and those results are themselves updated (ex. real time reactions). However, geofire limits me heavily because I can't add additional filtering (ex. timestamp) and lack of bbox queries.
Can I use supabase's postgres + postgis setup, and support the same type of real time functionality where I can get a bbox result set to display on a map, and then when one of those items gets a reaction or something from some user, that change should propagate to anyone else who might have that particular item in their current map bbox result set. Just wondering if you know if that's possible. Thanks
Your use-case is completely plausible, and if there are any blockers then we would consider it a bug that needs fixing. Postgres + PostGIS + PostgREST are a perfect match, and over time we will make mapping a first class citizen with proper documentation. The Realtime server is just a layer on top of PostgreSQL, so I'm confident it will work out-of-the-box with PostGIS
I haven't seen a lot of mapping use-cases in Supabase _yet_, but there are some:
https://geoexamples.com/svelte/2021/07/18/svelte-supabase-ma...
I just want to say thank you. RLS is the feature I was waiting for.
It's impressive work you have done!
Looking at Hasura, they also provide column-level security and the option to add custom logic via actions. Is this something planned for the future, and in general how would you compare the two systems and what are the advantages or disadvantages of Supabase?
We currently make use of Supabase and it's been fantastic. It's enabled us to completely get away from having a traditional backend/API by utilizing RLS and its realtime nature to have data directly in the UI. Rough numbers are showing we cut development time by a third vs a traditional approach.
I'm on a POC project that's using PostgREST and it's been extremely fast to get a big complicated data model working with an API in front of it. But I guess I don't get how to really use this thing in reality? What does devops look like? Do you have sophisticated db migrations with every deploy? Is all the SQL in version control?
I also don't really get where the users get created in postgres that have all the row-level permissions. The docs are all about auth for users that are already in there.
This is my personal experience with using PostgREST (I haven't had the full supabase experience yet):
> What does devops look like?
I usually spin PostgREST workers up in some kind of managed container service, like Google Compute Engine. PostgREST is stateless, so other than upgrades, you never really need to cycle the services. As for resources PostgREST is extremely lean, I usually try to run 4 to 8 workers per gigabyte of RAM.
> Do you have sophisticated db migrations with every deploy?
You can use whatever migration tool you want. Sqitch is quite popular. I've even worked on projects where Django handled the migrations but PostgREST served the API.
> Is all the SQL in version control?
Yes, this is a good approach, but it means you need a migration tool to apply the migrations in the right order. This is what Sqitch does, and many ORM-ish libraries have migrations sort of half-baked in.
It's worth noting that because many of the objects that PostgREST deals with are views, which have no persistent state, the migration of the views can be _decoupled_ from the migration of the persistent objects like tables. Replacing a view (with CREATE OR REPLACE VIEW) can be done very quickly without locking tables as long as you don't change the view's schema.
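A small illustration of that decoupling, with placeholder schema/table names (not from the post):

-- Replacing a view is quick and doesn't lock the underlying table,
-- as long as the view's column list stays the same.
CREATE OR REPLACE VIEW api.todos AS
SELECT id, title, done
FROM internal.todos
WHERE deleted_at IS NULL;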
In Supabase we use a separate Auth server [0]. This stores the user in an `auth` schema, and these users can log in to receive a JWT. Inside the JWT is a "role", which is, in fact, a PostgreSQL role ("authenticated") that has certain grants associated with it, and the user ID (a UUID).
Inside your RLS Policies you can use anything stored inside the JWT. My cofounder made a video [1] on this which is quite concise. Our way of handling this is just an extension of the PostgREST Auth recommendations:
https://postgrest.org/en/v9.0/auth.html
[0] Auth server:
https://github.com/supabase/gotrue
[1] RLS Video:
https://supabase.com/docs/learn/auth-deep-dive/auth-row-leve...
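For a concrete flavour, here is a minimal sketch of an RLS policy keyed off the JWT's user ID. The `todos` table and `user_id` column are hypothetical; auth.uid() is the helper Supabase ships for reading the user ID out of the JWT:

ALTER TABLE todos ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can select their own todos"
  ON todos FOR SELECT
  USING (auth.uid() = user_id);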
When I think of PostgREST, Supabase, and other tools that let you skip the backend completely and go straight to the DB, my question is: how do you handle business logic that doesn't make sense to have in either the frontend or the DB?
I have the same question. Since Supabase still exposes the connection parameters to the database, we could spin up another Flask server to deal with routes that are not expressible as simple CRUD operations. The JWT secret shared by the auth service (GoTrue) with Postgrest is available too, so I could also use it to authenticate the users myself.
@supabase developers, is this a reasonable way to do things or am I missing something?
This is completely reasonable and almost encouraged. We give you a full PG database so that you can use it in any way you wish, with any combination of Supabase features that you want or don't want.
For example, we have a lot of people who use Prisma with Supabase, rather than the APIs we provide.
My go-to approach is to insert jobs into a queue table, and then have backend workers that consume items from the queue (see the sketch after this list). This has a number of advantages:
1. Faster user experience, the user isn't waiting for the business logic to complete, inserting in the queue should only take a couple of milliseconds.
2. It's more secure, the web worker can have minimal privileges (INSERT only on the queue table) but the backend workers can have much more privilege because they are not user facing.
3. You can scale the web workers orthogonally to the queue workers, as they'll likely have very different scaling properties.
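A rough sketch of that queue-table pattern (table, column and role names here are assumptions, not something Supabase prescribes):

-- The queue itself
CREATE TABLE job_queue (
  id bigserial PRIMARY KEY,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  processed_at timestamptz
);

-- The user-facing role can only enqueue
GRANT INSERT ON job_queue TO authenticated;
GRANT USAGE ON SEQUENCE job_queue_id_seq TO authenticated;

-- A backend worker claims one unprocessed job at a time;
-- SKIP LOCKED lets many workers run concurrently without blocking each other
WITH next_job AS (
  SELECT id FROM job_queue
  WHERE processed_at IS NULL
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE job_queue
SET processed_at = now()
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.payload;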
Supabase also has a WAL decoding tool called WALRUS that I have not tried yet, and that could be the most efficient approach going forward. The tradeoff with queue tables is that the tables are persistent and a bit more resilient to failure/retry.
Do you have more info on the WALRUS stuff? Is that part of the Elixir lib?
WALRUS is all SQL, so you could use it with anything (probably with a bit of extra code-wrangling):
https://github.com/supabase/walrus
There are lots of simple things that are normally easier to do in the web framework that are suddenly easier to do in the database (with the side effect that you can do DB optimizations much easier).
But the other consideration is that you likely need to do a lot with a reverse-proxy like traefik to have much control of what you are really exposing to the outside world. PostgREST is not Spring, it doesn't have explicit control over every little thing so you're likely to need something in front of it. Anyway, point is that having a simple Flask server with a few endpoints running wouldn't complicate the architecture very much b/c you are better off with something in front of it doing routing already (and ssl termination, etc).
I actually extracted out the WAL listening bit of code from Supabase and use that to listen to changes in Postgres then do callback style business logic in Elixir. If you like Elixir, this could be an option.
I haven't had time to work on it myself until recently, but I brought up the idea of making that bit a separate library (and I understand that the Supabase folks are super busy).
https://github.com/supabase/realtime/issues/146#issue-874855...
You can do business logic in stored procedures, functions and views.
This is extremely painful to debug and troubleshoot compared to normal code.
I've been developing an app on PostgREST (which Supabase uses) for a while, and my solution to this was to have a NOTIFY trigger and a small program that broadcast the changed IDs over SSE to clients. The clients would then fetch the changes.
As for the RLS aspect, I considered it "good enough" to allow all clients to know the changed ID (a UUID) as long as they got no other info, and planned to have the trigger also send the allowed groups in the NOTIFY so that the program could filter without needing a separate PG connection per subscriber. In my testing it scales very well.
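Roughly, the trigger side of that approach can be sketched like this (the channel, table and function names are made up for illustration):

-- Send the changed row's ID on a NOTIFY channel
CREATE OR REPLACE FUNCTION notify_changed_id() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('changed_ids', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER todos_notify_changed
AFTER INSERT OR UPDATE ON todos
FOR EACH ROW EXECUTE FUNCTION notify_changed_id();

-- (EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions)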
Anyone from supabase that can comment why LISTEN/NOTIFY triggers were not used, and why WS was used over SSE when the communication is not bi-directional? It'd be interesting to hear your thoughts.
> LISTEN/NOTIFY triggers were not used
We started with this implementation, however there are 2 issues with this approach. 1) NOTIFY has a limit of 8000 bytes, so large rows are silently dropped. 2) you need to attach the notify trigger to every table (vs a Publication, which can listen to an entire database in a single line of code).
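For comparison, a publication covering the whole database really is one statement (the publication name here is just a placeholder):

CREATE PUBLICATION all_changes FOR ALL TABLES;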
> why WS was used
The server is Elixir (Erlang), so it's very good at WS. Over time we will add other useful multiplayer features (like presence)
Thanks for the response! I have a few more questions if you have the time:
If I'm reading the article correctly this feature is only available for authenticated users? Is that restriction based on scaling or some other limitation?
When I looked at the example query in the article it only queries the primary key, does this also work for tables where the GRANT hides certain columns from different users?
> does this also work for tables where the GRANT hides certain columns from different users?
Yes, it does! Any columns that are not selectable by the user's role are hidden from the JSON `response` and `old_response` keys and are not allowed in user-defined filters.
Here is where the columns are checked to see if they are selectable
https://github.com/supabase/walrus/blob/f030688d9d94520b916d...
And here is where they are filtered out
https://github.com/supabase/walrus/blob/f030688d9d94520b916d...
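For context, the column-level GRANT the question refers to is plain PostgreSQL, e.g. with hypothetical table/column names:

-- Only "id" and "title" are selectable by the authenticated role
GRANT SELECT (id, title) ON todos TO authenticated;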
Thanks! Seems like you guys thought of everything :)
> this feature is only available for authenticated users
Every client is expected to provide an API key at a minimum. The API key is actually a long-lived "anon" JWT, so unauthenticated users can connect and listen to "public" changes (for example, you might want to add a table of stock prices which update every second)
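For instance, a policy that lets the anon role read such a hypothetical public table might look like:

CREATE POLICY "Public read access"
  ON stock_prices FOR SELECT
  TO anon
  USING (true);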
> only queries the primary key
That's correct. We use PostgreSQL logical replication to capture changes, which requires PKs. This is expected for replication because otherwise there would be no way of definitively identifying an update/delete. IMO it's a good practice to add a PK to every table anyway (a simple "serial" will suffice), especially for auditing and historical analysis
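If a table is missing a PK, adding a simple surrogate one is a one-liner (table name is a placeholder):

ALTER TABLE my_table ADD COLUMN id serial PRIMARY KEY;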
> IMO it's a good practice to add a PK to every table anyway (a simple "serial" will suffice), especially for auditing and historical analysis
The question wasn't if a primary key was required, it was "does this also work for tables where the GRANT hides certain columns from different users?"
thanks - long week. I should read more carefully.
Inside a supabase project users can connect to their projects as either "anon" or "authenticated". The role information is passed inside a JWT.
I'll need to confirm with Oli (asleep right now) on the technical implementation but I believe either of these roles is assumed when running the RLS checks.
No problem, totally understandable!
What approaches do you use to prevent DoS attacks with such an architecture? It seems trivial for an attacker to generate super expensive queries en masse.
Strict row limits could bring some relief but don't solve the problem.
I bet there are solutions for this but it's not very obvious. Would someone briefly explain?
Supabase uses Kong as an API gateway, which has a Rate-limiting plugin (
https://docs.konghq.com/hub/kong-inc/rate-limiting/
)
Our platform is also behind Cloudflare to protect against DDoS, and we hope to use this to add some smart caching for the API service
You can put traefik in front of it and cache. Then set a max limit on query execution time. I don't use REST. Everything I need I write out as a Postgres function. The other side is monitoring: set up logging and run fail2ban, or an alarm that kicks the user and requires manual oversight.
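One way to cap query execution time is a statement timeout on the API-facing role; the role name and value below are assumptions, not a recommendation from the thread:

ALTER ROLE authenticator SET statement_timeout = '3s';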
The row level security people should talk to the capability security people.
Probably _the_ main problem with RDBMSs is also their biggest benefit: you write code that naively looks like some bad polynomial time, and then the query planner does the magic and uses indices. The problem is, of course, when the index doesn't exist or the query planner doesn't do a good job.
What would be nice is a way to _mandate_ that queries must go through certain indices. But this is in fact good for row level security too!
Functional programming is deeply related to capability-based security, in that both try to get rid of all ambient authority and mediate all actions through some sort of reference. Rust, with its many types of reference, makes the relationship even clearer. From these vantage points, it is very natural to use data structures both to make things performant _and_ enforce certain invariants.
Indices are the closest thing RDBMSs have to such data structures, because they promote only certain access patterns. With the rule that queries must go through the indices, they enforce those access patterns, and thus we have the same proven tools. Combine them with views for filtering out data (which, in conjunction with more indices, we can think of as a sort of curried index Map<UserID, Map<Stuff, OtherStuff>> even if they aren't implemented that way), and now we have very powerful tools for guaranteeing performance and security.
As icing on the cake, I think enforcing limited access patterns could get us closer to that long-sought goal of an RDBMS that automatically knows how to distribute itself. The limited access patterns open up possible options, because the pathological joins that each sharding possibility would run into are no longer allowed. Reusing the same query planner tricks, "cardinality analysis" type stuff can be used to choose between those possibilities.
Here's a worked out concrete example:
Tables:
User: { k: UserID PK, t: TenantID, n: Name, isAdmin: Bool }
Todo: { k: TodoID PK, u: UserID, d: Date, m: Msg }
Indices and Views:
Map<(t: TenantID, exists_classical u: User. u.t = t, u.isAdmin = true), {u2 <- User, u2.t = t}>
Map<(t: TenantID, exists_classical u: User. u.t = t, u.isAdmin = true), {t <- Todo, u2 <- User, t.u = u2.k, u2.t = t}>
^ the second one is a joined view, written with comprehension syntax
Map<t: TenantID, Map<uk: UserID, exist_constructive u: User. u.t = t, u.k = uk>>
Map<(t: TenantID, Map<uk: UserID, exists_classical u: User. u.t = t, u.k = uk, {t <- Todo, t.u = u.k}>>
The more worked out system would probably allow tupling stuff to not repeat the "Map<(t: TenantID," at the beginning.
A few things to note:
1. When we use comprehension syntax, we are "returning the whole table", i.e. an unconstrained view that can be queried in an arbitrary way. A joined table has to be joined in the given way, but is otherwise unconstrained.
2. exist_constructive is the same as a { ... } comprehension, but the cardinality must be 1 as proven by the unique constraints. exists_classical is a comprehension for an arbitrary number of stuff, that is then _truncated_ in the type theory sense.
https://xenaproject.wordpress.com/2021/05/19/the-trace-of-an...
is a good discussion of an example of this truncation.
3. Normally one puts the TenantID on every table, even though it is denormalizing. Now the access patterns enforce this.
Clearly this is a few PhD theses away from actually existing :), but all the embryonic theory comes from good well-established things, which makes me confident.
> What would be nice is a way to mandate that queries must go through certain indices.
Yes, this is an interesting idea. We're kinda gravitating towards this approach in PostgREST lately. For example, on a recent discussion about integrating with PGroonga[1], we were thinking of only exposing its operators once an index is enabled on a column:
CREATE INDEX index_name ON table USING pgroonga (column);
Enables doing:
GET /table?column=groonga_query.PostgreSQL
Which roughly translates to:
SELECT * FROM table WHERE column &@~ 'PostgreSQL';
Of course this is PostgREST-specific; the restriction would only be enforced when going through it.
[1]: