Every User Gets Their Own Database. The Billing Architecture That Shouldn't Work — But Outperforms Everything.

When I tell engineers that our billing system gives every user their own SQLite database, the reaction follows a predictable pattern.
First, polite curiosity. "That's interesting."
Then, probing. "How many users do you have?"
Then, the inevitable objection. "That won't scale."
Then, the actually good question. "How do you handle concurrent transactions?"
Here's the uncomfortable answer to all four: it works, we have 99 active users, it scales to hundreds more, and concurrent transactions are easier — not harder — with per-user databases than with shared tables.
I – Why Not Just Use Stripe?
Vibe is a platform where users create AI-powered micro-apps. Each interaction — generating an app, running an AI query, storing data — costs credits. Credits are purchased in advance and consumed as you go.
Stripe is excellent for subscriptions and one-time payments. It is not designed for what we need.
Micro-transactions kill Stripe's pricing model. Each AI generation costs 5 to 50 credits, equivalent to one to ten cents. Stripe's per-transaction fee of thirty cents plus 2.9 percent would eat the entire charge. You can't bill pennies through a system that takes thirty cents per swipe.
Real-time balance feedback requires zero latency. Users need to see their credit balance change the instant they trigger an operation. Stripe webhooks have seconds-to-minutes latency. That's an eternity when you're watching credits tick down during an AI generation.
The hold/confirm pattern needs to be instantaneous. When a user starts an AI generation, we need to reserve credits immediately — before the generation completes — and either confirm the charge on success or release it on failure. Stripe's payment intents support this conceptually but add 500-plus milliseconds per operation.
Billing needs to work when external services don't. If our connection to Stripe is down, billing should still function. With per-user SQLite, it always does.
So here's the split. We use Stripe for credit purchases — the infrequent, high-value transaction where fraud protection and compliance matter. We use per-user SQLite for credit consumption — the frequent, low-value transaction where speed, reliability, and zero fees matter.
II – One Database Per User
When a micro-app is provisioned on the platform, it gets its own billing database file. A single SQLite file containing three things.
An account table — one row, always one row. The current balance, the lifetime total credited, and the lifetime total debited. We don't compute the balance by summing all transactions. We maintain it as a running total. Reading the balance is a single-row lookup that takes about 20 microseconds.
A transaction log — append-only, immutable. Every credit added, every debit consumed, every hold placed, every confirmation, every release. If we need to reverse a charge, we add a new credit transaction. We never update or delete a record. This is a permanent audit trail.
A holds table — the active reservations. Credits that have been set aside for in-progress operations but haven't been confirmed or released yet. Each hold has a status, an amount, a creation timestamp, and an expiration timestamp.
Balances are stored as integers. Credits are whole numbers. No floating point. No rounding errors. No precision surprises. One credit equals the smallest billable unit.
The billing database runs in strict synchronous mode — every write is fully flushed to disk before the transaction is reported as committed. Because losing financial data is not negotiable.
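To make that concrete, here is a minimal sketch of what provisioning one of these files could look like. It assumes better-sqlite3 as the driver; the table and column names are illustrative, not our production schema.
```typescript
// Sketch only: better-sqlite3 assumed; schema names are illustrative.
import Database from "better-sqlite3";

export function provisionBillingDb(path: string): Database.Database {
  const db = new Database(path);
  db.pragma("journal_mode = WAL");  // crash-safe writes, concurrent readers
  db.pragma("synchronous = FULL");  // every commit flushed to disk first

  db.exec(`
    CREATE TABLE IF NOT EXISTS account (
      id             INTEGER PRIMARY KEY CHECK (id = 1),  -- always one row
      balance        INTEGER NOT NULL DEFAULT 0,          -- whole credits
      total_credited INTEGER NOT NULL DEFAULT 0,
      total_debited  INTEGER NOT NULL DEFAULT 0
    );
    INSERT OR IGNORE INTO account (id) VALUES (1);

    CREATE TABLE IF NOT EXISTS transactions (  -- append-only, never updated
      id          INTEGER PRIMARY KEY AUTOINCREMENT,
      type        TEXT    NOT NULL,  -- 'credit' | 'debit' | 'hold' | 'confirm' | 'release'
      amount      INTEGER NOT NULL,
      description TEXT,
      created_at  TEXT    NOT NULL DEFAULT (datetime('now'))
    );

    CREATE TABLE IF NOT EXISTS holds (  -- active reservations
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      amount     INTEGER NOT NULL,
      status     TEXT    NOT NULL DEFAULT 'active',
      created_at TEXT    NOT NULL DEFAULT (datetime('now')),
      expires_at TEXT    NOT NULL
    );
  `);
  return db;
}
```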
III – The Hold/Confirm/Release Pattern
This is the heart of the architecture. Every operation that costs credits follows three steps.
Hold. Reserve credits before the operation starts. The balance drops immediately.
Execute. Perform the actual operation — the AI generation, the data processing, whatever.
Confirm or Release. If the operation succeeded, convert the hold to a permanent debit. If it failed, return the held credits to the balance.
This is exactly how credit card authorizations work. When you swipe at a gas station, the station holds a hundred dollars. You pump forty-seven dollars of gas. The hold is confirmed for forty-seven and the remaining fifty-three is released. If you drive away without pumping, the entire hold is released.
The hold immediately reduces the available balance. This is what prevents double-spending. If a user has 100 credits and two AI generations start simultaneously — each costing 50 — the first hold succeeds, dropping the balance to 50. The second hold succeeds, dropping it to zero. A third request would fail with insufficient balance.
All of this runs inside a SQLite transaction. Atomic. Consistent. Isolated. Durable. No race conditions. No partial states. No "I checked the balance and it was fine, but by the time I debited it, someone else already spent it."
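Here is a sketch of the hold step, again assuming better-sqlite3 (whose transactions run synchronously and roll back on throw, so the balance check and the debit cannot interleave) and the illustrative schema above.
```typescript
// Sketch only: names and paths are illustrative, not the production API.
import Database from "better-sqlite3";

const db = new Database("data/user-123/billing.db"); // the user's own file
const HOLD_TTL_MS = 5 * 60 * 1000;

// db.transaction returns a callable that runs its body atomically;
// any throw rolls everything back, including the balance update.
const placeHold = db.transaction((amount: number, description: string): number => {
  const { balance } = db
    .prepare("SELECT balance FROM account WHERE id = 1")
    .get() as { balance: number };
  if (balance < amount) throw new Error("insufficient balance");

  // Reserve the credits: drop the balance and record the hold.
  db.prepare("UPDATE account SET balance = balance - ? WHERE id = 1").run(amount);
  const expiresAt = new Date(Date.now() + HOLD_TTL_MS).toISOString();
  const { lastInsertRowid } = db
    .prepare("INSERT INTO holds (amount, expires_at) VALUES (?, ?)")
    .run(amount, expiresAt);
  db.prepare("INSERT INTO transactions (type, amount, description) VALUES ('hold', ?, ?)")
    .run(amount, description);
  return Number(lastInsertRowid); // hold id for the later confirm or release
});
```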
IV – Partial Confirms: Never Overcharge
Here's a detail that matters enormously for user trust.
When we hold credits for an AI generation, we estimate the cost upfront based on the model and prompt length. But the actual cost depends on how many tokens the LLM uses, which we don't know until the generation completes.
So we hold the estimated amount — say, 50 credits — and when the generation finishes, we confirm the actual cost — say, 35 credits. The remaining 15 are automatically returned to the balance.
Users are never overcharged. The estimate is always the ceiling. The actual charge is always equal to or less than the hold.
On the dashboard, users see the hold appear immediately, then watch it resolve to a smaller confirmed amount. It feels fair. It feels transparent. And it all happens in their own isolated database.
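Here is how that partial confirm could look in code, continuing the earlier sketches (the names remain illustrative).
```typescript
// Sketch only: charge the actual cost, return the rest, never exceed the hold.
import Database from "better-sqlite3";

const db = new Database("data/user-123/billing.db"); // path illustrative

const confirmHold = db.transaction((holdId: number, actualCost: number) => {
  const hold = db
    .prepare("SELECT amount FROM holds WHERE id = ? AND status = 'active'")
    .get(holdId) as { amount: number } | undefined;
  if (!hold) throw new Error("hold not active");

  const charged = Math.min(actualCost, hold.amount); // the estimate is the ceiling
  const refund = hold.amount - charged;

  if (refund > 0) {
    db.prepare("UPDATE account SET balance = balance + ? WHERE id = 1").run(refund);
  }
  db.prepare("UPDATE account SET total_debited = total_debited + ? WHERE id = 1").run(charged);
  db.prepare("UPDATE holds SET status = 'confirmed' WHERE id = ?").run(holdId);
  db.prepare("INSERT INTO transactions (type, amount, description) VALUES ('confirm', ?, ?)")
    .run(charged, `charged ${charged}, released ${refund}`);
});
```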
V – The 5-Minute TTL: Self-Healing Billing
Here's the scenario that keeps billing engineers awake at night.
A user starts an AI generation. Credits are held. Then the generation crashes. The LLM API times out. The server restarts. The connection drops.
The hold is never confirmed or released.
Without protection, those credits are trapped forever. The user sees a reduced balance but received nothing. This is a phantom charge — the worst kind of billing bug.
Our solution is a 5-minute TTL on every hold. If a hold isn't confirmed or released within five minutes, it automatically expires and the credits are returned.
A background job runs every sixty seconds, scanning each active app's billing database for expired holds. When it finds one, it returns the credits to the balance, marks the hold as expired, and logs the event in the transaction history.
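A sketch of that sweep, under the same assumptions (better-sqlite3; openDbs is a hypothetical registry mapping each active app to its open billing database):
```typescript
// Sketch only: expire stale holds and refund the credits, atomically per app.
import Database from "better-sqlite3";

function expireStaleHolds(openDbs: Map<string, Database.Database>): void {
  const now = new Date().toISOString();
  for (const [, db] of openDbs) {
    db.transaction(() => {
      const stale = db
        .prepare("SELECT id, amount FROM holds WHERE status = 'active' AND expires_at < ?")
        .all(now) as { id: number; amount: number }[];
      for (const hold of stale) {
        // Return the credits, close out the hold, keep the audit trail.
        db.prepare("UPDATE account SET balance = balance + ? WHERE id = 1").run(hold.amount);
        db.prepare("UPDATE holds SET status = 'expired' WHERE id = ?").run(hold.id);
        db.prepare("INSERT INTO transactions (type, amount, description) VALUES ('release', ?, ?)")
          .run(hold.amount, `hold ${hold.id} expired via TTL`);
      }
    })();
  }
}
// Run on the sixty-second cadence described above:
//   setInterval(() => expireStaleHolds(openDbs), 60_000);
```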
Why five minutes? Our longest possible operation — an AI generation with retries — takes at most two minutes. Five minutes gives a 2.5x safety margin. If an operation hasn't completed in five minutes, something is seriously wrong and the hold should expire.
The TTL creates a self-healing system. Any failure in the hold/confirm/release cycle — server crash, network timeout, bug in confirmation logic — is automatically corrected within five minutes. Credits are never permanently trapped.
The worst case is always "credits are returned." The system errs on the side of not charging. Users might occasionally get a free generation due to a crash at the exact wrong moment, but they'll never be charged for something they didn't receive.
For a billing system, that's the correct failure mode.
VI – Real-Time Balance via SSE
Users can watch their credit balance change in real-time. When they're on the dashboard and an AI generation runs, the balance ticks down as credits are held and up slightly as partial amounts are released.
This is powered by Server-Sent Events.
Every billing operation — hold, confirm, release — broadcasts an event after the database transaction completes. The event includes the operation type, the amount, the new balance, and a description.
On the frontend, the dashboard maintains a persistent SSE connection. When an event arrives, the balance display updates instantly.
No polling. No periodic refresh. The number changes the moment the database changes. Users get immediate, visceral feedback that their credits are being consumed. It builds trust because nothing feels hidden.
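The server side of that broadcast could look like this, assuming plain Node.js ServerResponse streams; the event shape and names are illustrative.
```typescript
// Sketch only: fan out a billing event to a user's open SSE streams.
import type { ServerResponse } from "node:http";

const sseClients = new Map<string, Set<ServerResponse>>(); // userId -> open streams

interface BillingEvent {
  op: "hold" | "confirm" | "release";
  amount: number;
  balance: number;
  description: string;
}

// Called after the SQLite transaction commits, never before.
function broadcastBillingEvent(userId: string, event: BillingEvent): void {
  const clients = sseClients.get(userId);
  if (!clients) return;
  const frame = `event: billing\ndata: ${JSON.stringify(event)}\n\n`;
  for (const res of clients) res.write(frame);
}

// Browser side: a persistent EventSource picks up each frame.
//   const es = new EventSource("/api/billing/events");
//   es.addEventListener("billing", (e) => render(JSON.parse(e.data)));
```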
If you're designing a billing system — whether credit-based, subscription, metered, or something exotic — the architectural trade-offs are everything. I help engineers think through these decisions in mentoring sessions at mentoring.oakoliver.com. Or explore how the Vibe platform works at vibe.oakoliver.com.
VII – Why Per-User SQLite Beats Shared PostgreSQL
Let me make the comparison explicit.
The alternative is one PostgreSQL database with a shared transactions table. Every user's data in the same table, distinguished by a tenant identifier column. This works. But the operational characteristics are fundamentally different.
Transaction isolation. In shared PostgreSQL, transactions from different users live in the same table. Under high concurrency, User A's transactions can block User B's if they're contending for the same index pages. With per-user SQLite, there is zero contention between users. Physically separate files. Completely independent.
Query performance. In a shared table with 99 users and 10,000 transactions each, that's 990,000 rows in one table. A balance lookup requires an index scan filtered by tenant ID — about 0.5 milliseconds. With per-user SQLite, the balance lookup is a single-row read from a table that always has exactly one row — about 0.02 milliseconds. Twenty-five times faster.
Recent transaction history in the shared table scans a large filtered index — about 2 milliseconds. In the per-user database, it scans a small local index — about 0.1 milliseconds. Twenty times faster.
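The queries behind those numbers might look like this (the shared-table schema is hypothetical):
```typescript
// Sketch only: the per-user file makes the tenant filter disappear.
import Database from "better-sqlite3";

const db = new Database("data/user-123/billing.db"); // path illustrative

// Shared PostgreSQL (hypothetical): every lookup filters by tenant.
//   SELECT balance FROM accounts WHERE tenant_id = $1;
//   SELECT * FROM transactions WHERE tenant_id = $1 ORDER BY created_at DESC LIMIT 20;

// Per-user SQLite: the file IS the tenant.
const balance = db.prepare("SELECT balance FROM account WHERE id = 1").get();
const recent = db.prepare("SELECT * FROM transactions ORDER BY id DESC LIMIT 20").all();
```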
Backup and restore. In shared PostgreSQL, restoring one user's data means exporting their rows, dropping them, and re-importing — a complex operation with foreign key complications. With per-user SQLite, restoring one user is copying a single file. No impact on anyone else.
Schema migrations. In shared PostgreSQL, altering a 990,000-row table takes time, acquires locks, and affects everyone. With per-user SQLite, you run the migration on each database independently. If one fails, fix it for that user without affecting others. You can even roll out gradually — migrate ten users, verify, migrate ten more.
Corruption blast radius. If something corrupts the shared PostgreSQL table, every user is affected. If something corrupts one user's SQLite file, it's that user and only that user.
VIII – The One Thing Shared PostgreSQL Does Better
Cross-user aggregate queries.
"Show me total spending across all users for the last 30 days, grouped by user, sorted by amount." In PostgreSQL, that's a single query against the shared table. Fast and elegant.
With per-user SQLite, you iterate over all 99 databases — open each one, run the query, collect the results, aggregate in application code. This takes about 50 milliseconds for 99 databases.
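A sketch of that iteration, assuming one directory per user containing a billing.db file (paths and names are illustrative):
```typescript
// Sketch only: cross-user admin report aggregated in application code.
import Database from "better-sqlite3";
import { readdirSync } from "node:fs";
import { join } from "node:path";

function totalSpendLast30Days(dataDir: string): { userId: string; spent: number }[] {
  const rows: { userId: string; spent: number }[] = [];
  for (const userId of readdirSync(dataDir)) { // assumes one sub-directory per user
    const db = new Database(join(dataDir, userId, "billing.db"), { readonly: true });
    const { spent } = db
      .prepare(`SELECT COALESCE(SUM(amount), 0) AS spent
                FROM transactions
                WHERE type = 'confirm' AND created_at >= datetime('now', '-30 days')`)
      .get() as { spent: number };
    db.close();
    rows.push({ userId, spent });
  }
  return rows.sort((a, b) => b.spent - a.spent);
}
```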
Not great for real-time dashboards. Fine for admin reports that run a few times a day.
We run cross-user reports rarely — weekly admin review. We run per-user queries constantly — every billing operation, every balance check, every transaction log view. Optimizing for the common path is the right call.
IX – The Complete Flow
Let me trace the full billing lifecycle for an AI app generation.
A user triggers a generation request. The server estimates the cost based on the model and prompt length, then places a hold on the user's credits. If the balance is insufficient, the request fails immediately with a clear error.
If the hold succeeds, the AI generation begins. The LLM API is called, and tokens stream back. As they stream, the server writes generated files to the user's directory and broadcasts progress events via SSE.
When the generation completes successfully, the server calculates the actual cost based on tokens consumed, confirms the hold for that amount, and releases the difference back to the balance. The confirmation event broadcasts via SSE, and the user's dashboard updates in real-time.
If the generation fails — LLM error, timeout, any exception — the hold is released entirely. All credits return to the balance. The release event broadcasts, and the user sees their balance restored.
If neither confirmation nor release happens — server crash, network partition, the heat death of the universe — the 5-minute TTL takes over and releases the credits automatically.
The guarantees are ironclad. No charge on failure. No overcharge on success. Real-time visibility at every step. Full audit trail. Self-healing on infrastructure failure.
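Put together, the whole lifecycle can collapse into one wrapper. This sketch builds on the earlier ones; releaseHold is the assumed counterpart of confirmHold that refunds the full held amount.
```typescript
// Sketch only: hold, execute, then confirm or release. Helper names are
// assumptions carried over from the earlier sketches, not the production API.
declare function placeHold(amount: number, description: string): number;
declare function confirmHold(holdId: number, actualCost: number): void;
declare function releaseHold(holdId: number): void;

async function withCredits<T>(
  estimate: number,
  description: string,
  operation: () => Promise<{ result: T; actualCost: number }>,
): Promise<T> {
  const holdId = placeHold(estimate, description); // fails fast on low balance
  try {
    const { result, actualCost } = await operation();
    confirmHold(holdId, actualCost); // charges at most the estimate
    return result;
  } catch (err) {
    releaseHold(holdId); // full refund on any failure; the TTL covers a crash here
    throw err;
  }
}
```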
X – Consistency After a Crash
A common worry: what if the server crashes between the hold and the confirm?
SQLite's WAL mode, combined with strict synchronous writes, guarantees that committed transactions survive process crashes.
If the crash happens after the hold but before the generation, the hold exists in the database. The 5-minute TTL will expire it and return credits.
If the crash happens after the generation but before the confirm, the generated files exist and the hold exists but isn't confirmed. The TTL expires the hold. On the next request, we can detect the unconfirmed hold alongside existing generated files, and re-confirm if appropriate.
If the crash happens after the confirm, the transaction is complete and durable. No data loss.
The worst case is always "credits are returned." The system never charges for something the user didn't receive.
XI – Scaling Limits, Honestly
At around 500 users, the iteration-based admin queries become slow (about 500 milliseconds), and the data directory accumulates too many entries to enumerate quickly. At that point, we'd add a shared PostgreSQL "index" database for admin queries while keeping per-user SQLite for transactional billing.
At around 10,000 users, per-user databases would strain the filesystem. We'd migrate to a sharded SQLite approach — one database per 100 users with tenant columns — or switch to shared PostgreSQL entirely.
But for 99 users on a single VPS? The entire billing system — holds, confirms, releases, SSE streaming, TTL expiry, admin reports — runs in about 3 megabytes of memory and handles every request in under a millisecond.
Per-user SQLite isn't just adequate at this scale. It's overperforming.
XII – When to Use This Pattern
Use per-user databases when the user count is bounded in the hundreds. When per-user queries dominate your workload. When cross-user queries are rare. When isolation matters more than aggregation. When operational simplicity — backup one user, restore one user, debug one user — is a priority.
Use shared databases when the user count is unbounded or in the millions. When cross-user queries are frequent. When schema changes happen often. When you need transactions that span multiple users.
For Vibe's credit billing system, every factor points to per-user databases. And the implementation — under 500 lines of TypeScript, zero external billing dependencies, sub-millisecond operations — proves that sometimes the unconventional architecture is the correct one.
The conventional wisdom says you shouldn't give every user their own database. The conventional wisdom is optimizing for problems we don't have.
What's the most unconventional database architecture you've shipped to production? Did it survive contact with real users, or did you eventually normalize it back to the standard approach?
The weird ones that worked are my favorite stories.
– Antonio