The day my database refused to return a row I had just written

When your tenant isolation works so well it blocks your own writes, that's not a bug. It's the system proving it's doing exactly what you designed it to do.

For a while, my multi-tenant SaaS isolated tenants the way most apps do: every query carried a WHERE organization_id = :current_org clause, enforced in the application layer. It works. Until it doesn't. One missing filter, one new endpoint that forgets the convention, one ORM relationship that loads more than you expected, and one tenant can see another tenant's data.

For most products that's a bug. For a product whose entire value proposition is being the trustworthy custodian of someone else's records, it's existential. "We filter by organization in the code, trust us" is not a sentence I wanted to say to a security reviewer.

So I moved isolation down a layer, into the database itself, with Postgres Row-Level Security (RLS). This is a short write-up of how that rollout went, and in particular of the moment it appeared to break, which turned out to be the moment it proved it was working.

The Shape of the Design

The idea behind RLS is simple: instead of trusting every query to filter correctly, you attach a policy to the table, and Postgres refuses to return rows that don't match, no matter what the query says.

The policy needs to know who is asking. The common pattern is to push the current tenant into a session-scoped runtime parameter (a GUC), and have the policy read it:
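
A minimal sketch of that pattern follows. The table name documents, the policy name tenant_isolation, the setting name app.current_org, the uuid column type, and the sample UUID are all hypothetical, chosen only for illustration; the real schema may differ.

```sql
-- Hypothetical names throughout: documents, tenant_isolation, app.current_org.
-- Assumes documents.organization_id is a uuid column.

-- Turn RLS on. FORCE makes the policy apply to the table owner as well,
-- who would otherwise bypass it.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

-- USING filters the rows a query may see; WITH CHECK validates rows being
-- written. current_setting(..., true) returns NULL when the parameter has
-- never been set, so the comparison is not true and no rows match.
CREATE POLICY tenant_isolation ON documents
    USING (organization_id = current_setting('app.current_org', true)::uuid)
    WITH CHECK (organization_id = current_setting('app.current_org', true)::uuid);

-- At the start of each request, the application sets the tenant for the
-- session (sample value shown), then runs its queries without any
-- tenant filter of its own.
SET app.current_org = '00000000-0000-0000-0000-000000000001';
SELECT * FROM documents;  -- only rows for the tenant set above are visible
```

Two caveats apply to this sketch: superusers and roles with BYPASSRLS skip policies entirely, so the application should connect as an ordinary role, and a session-scoped setting stays in place on a pooled connection until something changes or resets it.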
