Stop Deducting Credits Immediately — Your Billing System Is a Ticking Time Bomb

You sit down at a restaurant. Hand your card to the waiter. They run a pre-authorization for $100.
You eat a $47 meal. The restaurant captures $47. The remaining $53 hold evaporates.
Nobody got overcharged. Nobody got stiffed.
Now imagine if your SaaS billing worked the same way.
Most credit-based platforms don't do this. They deduct credits immediately, pray that the operation succeeds, and scramble to refund if it doesn't. The result: race conditions, double-spending, phantom deductions, and support tickets that make you question your career choices.
When I built the credit billing system for vibe.oakoliver.com, I refused to repeat that mistake. I implemented what I call the captive credit system — a hold-confirm-release pattern with automatic TTL expiration that eliminates an entire category of billing bugs.
I – The Naive Approach That Will Betray You
Let's talk about the "obvious" approach that most developers reach for.
Check the user's credit balance. If sufficient, deduct the credits. Then run the expensive operation.
It looks fine. It works in development. It even works in production — until it doesn't.
Race conditions. The user fires two requests at the same time. Both read a balance of 10 credits. Both pass the balance check. Both deduct 8 credits. The user now has negative 6 credits. Congratulations, you just gave away free compute.
Failure without rollback. The AI operation fails after credits are already deducted. Now you need to refund. But what if the refund fails? What if the server crashes between deduction and the operation? The user lost credits and got nothing.
Partial operations. A multi-step pipeline deducts credits at step 1, fails at step 3. Do you refund all credits? Just the unused ones? How do you even calculate that?
Concurrent balance reads. Two microservices read the balance simultaneously. Both see enough credits. Both proceed. Double-spent.
These aren't theoretical edge cases. Every credit-based system that does immediate deduction will hit these bugs at scale. The question is not if. It's when.
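The race in the first failure mode can be replayed deterministically. The sketch below is a hypothetical in-memory stand-in (the names `balances` and `naive_deduct_interleaved` are illustrative, not taken from any real system) that interleaves two requests by hand so both read the balance before either one writes.

```python
# Hypothetical in-memory balance store, used only to replay the interleaving.
balances = {"user-1": 10}


def naive_deduct_interleaved(user_id: str, cost: int) -> int:
    """Replay two check-then-deduct requests whose reads both happen first."""
    # Step 1: both requests read the balance before either one writes.
    seen_by_a = balances[user_id]
    seen_by_b = balances[user_id]

    # Step 2: both pass the check, because each saw the same stale value.
    a_ok = seen_by_a >= cost
    b_ok = seen_by_b >= cost

    # Step 3: both apply their deduction to the shared balance.
    if a_ok:
        balances[user_id] -= cost
    if b_ok:
        balances[user_id] -= cost
    return balances[user_id]


final_balance = naive_deduct_interleaved("user-1", 8)
print(final_balance)  # -6
```

Nothing in the check-then-deduct sequence ties the read to the write, which is why the second deduction goes through.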
II – The Restaurant Mental Model
The restaurant pre-authorization pattern solves all of these problems with a three-phase approach.
Phase one: Hold. Reserve credits without actually spending them. The user's available balance decreases, but the credits aren't gone. They're in limbo. Captive.
Phase two: Confirm. The operation succeeded. Convert the hold into a real deduction. Credits are now truly spent.
Phase three: Release. The operation failed. Release the hold. Credits return to available balance instantly.
And the critical addition that most people forget: TTL — Time To Live. If neither confirm nor release happens within 5 minutes, the hold automatically expires and credits are released. This prevents zombie holds from permanently locking user funds.
Hotels, gas stations, and restaurants have been doing this for decades. We're just applying the same pattern to software credits.
III – Holds as First-Class Entities
The first design decision is making holds explicit in your data model. They're not a side effect. Not a temporary flag. Not a log entry. They're a proper database entity with their own lifecycle.
Each hold has a unique identifier, the user it belongs to, the amount reserved, a status, the reason for the hold, a link to the operation that created it, timestamps for creation and resolution, and — crucially — an expiration time.
The status has exactly four values. Captive: credits held, operation in progress. Confirmed: credits spent, operation succeeded. Released: credits returned, operation failed. Expired: TTL elapsed, credits returned automatically.
No nullable booleans. No ambiguous flags. No "is this hold still active?" guesswork. The status tells you everything.
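As a rough illustration of the entity described above, here is a minimal sketch of a hold record with its four statuses. The field names, the `new_hold` helper, and the use of a Python dataclass are assumptions made for the example. A real system would map this onto a database table.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class HoldStatus(Enum):
    CAPTIVE = "captive"      # credits held, operation in progress
    CONFIRMED = "confirmed"  # credits spent, operation succeeded
    RELEASED = "released"    # credits returned, operation failed
    EXPIRED = "expired"      # TTL elapsed, credits returned automatically


HOLD_TTL = timedelta(minutes=5)  # the 5-minute TTL used throughout the article


@dataclass
class Hold:
    id: str
    user_id: str
    amount: int
    status: HoldStatus
    reason: str
    operation_id: str             # link to the operation that created the hold
    created_at: datetime
    expires_at: datetime
    resolved_at: Optional[datetime] = None  # set on confirm, release, or expiry


def new_hold(user_id: str, amount: int, reason: str, operation_id: str,
             now: Optional[datetime] = None) -> Hold:
    """Build a fresh captive hold that expires HOLD_TTL after creation."""
    now = now or datetime.now(timezone.utc)
    return Hold(
        id=str(uuid.uuid4()),
        user_id=user_id,
        amount=amount,
        status=HoldStatus.CAPTIVE,
        reason=reason,
        operation_id=operation_id,
        created_at=now,
        expires_at=now + HOLD_TTL,
    )
```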
The user's available balance is always calculated as total credits minus the sum of active captive holds. Not total minus confirmed. Not total minus everything. Only active captive holds reduce availability. Confirmed holds are already reflected in the total. Released and expired holds don't count.
This calculation is the foundation of the entire system. Get it wrong, and everything downstream is wrong.
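The calculation itself is small enough to sketch directly. The function name and the `(amount, status)` tuple shape below are assumptions for illustration. The numbers reuse the article's 100-total, 95-held example.

```python
from typing import Iterable, Tuple


def available_credits(total: int, holds: Iterable[Tuple[int, str]]) -> int:
    """Total credits minus the sum of captive holds; other statuses are ignored."""
    held = sum(amount for amount, status in holds if status == "captive")
    return total - held


# Confirmed holds are already reflected in `total`; released and expired
# holds no longer reserve anything, so only the captive 95 counts.
example_holds = [
    (95, "captive"),
    (10, "confirmed"),
    (20, "released"),
    (5, "expired"),
]
remaining = available_credits(100, example_holds)
print(remaining)  # 5
```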
IV – Atomic Reservation: The Lock That Saves You
Creating a hold must be atomic. You cannot check the balance and create the hold in two separate queries. That's the exact race condition we're trying to eliminate.
The critical detail is the database-level row lock. When creating a hold, the transaction acquires a lock on the user's row (for example with SELECT ... FOR UPDATE), so any other transaction that tries to lock or modify that row waits until the first one completes. Two simultaneous requests serialize: the second waits for the first to finish, then reads the updated state including the new hold.
This eliminates the race condition entirely. It's not optimistic locking with retries. It's not an application-level mutex. It's the database doing what databases are designed to do.
Why not use a simpler approach like a conditional update? Because you need to account for existing captive holds. The user's credit column might show 100, but if 95 are already in captive holds, only 5 are actually available. A simple column check can't express this.
The transaction reads the user's credits, aggregates all active captive holds, calculates the true available balance, and only then creates the new hold — all within a single atomic operation. If the available balance is insufficient, the transaction aborts cleanly with a meaningful error that includes how many credits are available and how many are held.
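Here is a minimal sketch of that reservation step, using Python's built-in `sqlite3` module so it runs anywhere. SQLite has no row-level locks, so the sketch takes the database write lock with `BEGIN IMMEDIATE` as a stand-in. On a server database the equivalent would typically be a row lock on the user row. The table layout, `create_hold`, and `InsufficientCredits` are hypothetical names.

```python
import sqlite3
import uuid


class InsufficientCredits(Exception):
    def __init__(self, available: int, held: int, requested: int):
        super().__init__(
            f"requested {requested}, but only {available} available ({held} held)"
        )
        self.available, self.held, self.requested = available, held, requested


def open_db() -> sqlite3.Connection:
    # isolation_level=None: the code below issues BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE users (id TEXT PRIMARY KEY, credits INTEGER NOT NULL);
        CREATE TABLE holds (
            id TEXT PRIMARY KEY,
            user_id TEXT NOT NULL REFERENCES users(id),
            amount INTEGER NOT NULL,
            status TEXT NOT NULL DEFAULT 'captive'
        );
    """)
    return conn


def create_hold(conn: sqlite3.Connection, user_id: str, amount: int) -> str:
    """Check the true available balance and insert the hold in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading anything
    try:
        (credits,) = conn.execute(
            "SELECT credits FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        (held,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM holds "
            "WHERE user_id = ? AND status = 'captive'",
            (user_id,),
        ).fetchone()
        available = credits - held
        if amount > available:
            raise InsufficientCredits(available, held, amount)
        hold_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO holds (id, user_id, amount) VALUES (?, ?, ?)",
            (hold_id, user_id, amount),
        )
        conn.execute("COMMIT")
        return hold_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

With a 10-credit user, a first hold of 8 succeeds and a second hold of 8 raises `InsufficientCredits` reporting 2 available and 8 held.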
V – Confirming: Converting Promise to Reality
When the operation succeeds, the hold converts from captive to confirmed.
The confirm phase does three things in a single transaction. It verifies the hold is still captive — not expired, not already resolved. It deducts the hold amount from the user's credit column. And it marks the hold as confirmed with a resolution timestamp.
There's a double-check on expiration. Even though a cleanup job runs periodically to expire stale holds, a hold could expire between the last cleanup and the confirm call. The confirm phase checks explicitly and handles this edge case.
If the hold has expired by the time you try to confirm it, the system formally marks it as expired and throws an error. The operation may have succeeded, but the credits are already back with the user. You'll need to handle this at the application level — perhaps by creating a new hold or flagging the operation for manual review.
This is a rare edge case. But in billing, rare edge cases are the ones that generate support tickets.
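The confirm steps, including the expiry double-check, can be sketched in memory as follows. The `Hold` shape, the exception names, and the `credits` dict standing in for the user's credit column are all assumptions. In a real database the three steps would share one transaction.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


class HoldExpired(Exception):
    pass


class HoldNotCaptive(Exception):
    pass


@dataclass
class Hold:
    user_id: str
    amount: int
    expires_at: datetime
    status: str = "captive"
    resolved_at: Optional[datetime] = None


def confirm_hold(credits: Dict[str, int], hold: Hold, now: datetime) -> None:
    """Convert a captive hold into a real deduction, or fail loudly."""
    if hold.status != "captive":
        raise HoldNotCaptive(f"hold is already {hold.status}")
    if now >= hold.expires_at:
        # The cleanup job has not reached this hold yet: mark it expired here.
        hold.status = "expired"
        hold.resolved_at = now
        raise HoldExpired("hold expired before it could be confirmed")
    credits[hold.user_id] -= hold.amount  # the only write to the credit column
    hold.status = "confirmed"
    hold.resolved_at = now


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
credits = {"u1": 10}
hold = Hold("u1", 8, expires_at=start + timedelta(minutes=5))
confirm_hold(credits, hold, now=start + timedelta(minutes=1))
print(credits["u1"], hold.status)  # 2 confirmed
```

The caller decides what to do with `HoldExpired`, for example creating a new hold or flagging the operation for review.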
VI – Releasing: The Graceful Failure Path
When an operation fails, you release the hold. Credits return to available balance instantly.
Note the idempotent design. If you call release on an already-resolved hold, it silently succeeds. This is critical for retry scenarios. If a failure handler runs twice, the second call is a no-op. No errors. No double-refunds. No drama.
And here's a subtlety that reveals the elegance of the system. Release doesn't modify the user's credit column at all. The hold never actually deducted from the credit column — it only reduced the calculated available balance. When the hold disappears from the captive set, the available balance automatically increases.
Confirm writes to the credit column. Release only writes to the hold's status. This asymmetry is intentional and important.
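A small in-memory sketch makes the asymmetry visible: release flips the hold's status and never touches the credit total. `Hold`, `release_hold`, and `available` are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Hold:
    user_id: str
    amount: int
    status: str = "captive"
    resolved_at: Optional[datetime] = None


def release_hold(hold: Hold, now: datetime) -> bool:
    """Release a captive hold. Returns False (a no-op) if already resolved."""
    if hold.status != "captive":
        return False  # a retried failure handler lands here and changes nothing
    hold.status = "released"
    hold.resolved_at = now
    return True


def available(total: int, holds: List[Hold]) -> int:
    return total - sum(h.amount for h in holds if h.status == "captive")


total_credits = 10  # stands in for the user's credit column; never written below
hold = Hold("u1", 8)
before = available(total_credits, [hold])

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
first_call = release_hold(hold, now)
second_call = release_hold(hold, now)  # idempotent retry
after = available(total_credits, [hold])
print(before, after, first_call, second_call)  # 2 10 True False
```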
VII – The 5-Minute Safety Net
The TTL is your insurance policy against the unknown.
Server crashes. Network partitions. Forgotten callbacks. Bugs in the operation pipeline. Any of these could leave a hold in captive state forever. The automatic expiration catches them all.
A cleanup job runs every 60 seconds, finding all captive holds whose expiration timestamp has passed and marking them as expired.
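One pass of such a cleanup job might look like the sketch below. The names are hypothetical, and the warning log line is an assumption about how the job would surface expirations. A scheduler (cron, a background task, or similar) would call it every 60 seconds.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

logger = logging.getLogger("credit_holds")


@dataclass
class Hold:
    id: str
    expires_at: datetime
    status: str = "captive"
    resolved_at: Optional[datetime] = None


def expire_stale_holds(holds: Iterable[Hold], now: datetime) -> int:
    """Mark every captive hold whose expiration has passed; return the count."""
    expired = 0
    for hold in holds:
        if hold.status == "captive" and hold.expires_at <= now:
            hold.status = "expired"
            hold.resolved_at = now
            logger.warning("hold %s expired without confirm or release", hold.id)
            expired += 1
    return expired


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
holds = [
    Hold("stale", expires_at=start + timedelta(minutes=5)),
    Hold("fresh", expires_at=start + timedelta(minutes=9)),
    Hold("done", expires_at=start + timedelta(minutes=5), status="confirmed"),
]
expired_count = expire_stale_holds(holds, now=start + timedelta(minutes=6))
print(expired_count)  # 1
```

Already-resolved holds are skipped, so running the pass twice in a row expires nothing new.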
Why 5 minutes? It's a balance between two competing concerns.
Too short — say, 30 seconds — and legitimate operations might take longer, especially AI inference with queue times. You'd expire holds that should have been confirmed.
Too long — say, 30 minutes — and users see "missing" credits for extended periods. If they're near zero balance, they might think they can't do anything and churn.
Five minutes covers 99.9% of legitimate operations. Our p99 for AI inference is under 90 seconds. If an operation takes longer than 5 minutes, it should be restructured as a queued job with its own hold management.
The TTL also protects against bad deployments. If you ship a bug in the confirm path, holds expire instead of accumulating indefinitely. The user loses nothing. You get alerted, provided the cleanup job logs a warning for every hold it expires. You fix the bug. Nobody files a support ticket.
VIII – Defense in Depth: Three Layers of Protection
The complete flow for a credit-consuming operation has three layers of protection against stuck holds.
Layer one: explicit release in the catch block. If the operation throws an error, the catch handler immediately releases the hold. This covers the vast majority of failure cases — timeouts, API errors, validation failures.
Layer two: operation timeout shorter than hold TTL. The operation timeout is deliberately set to 4 minutes, a full minute less than the 5-minute hold TTL. This gives a buffer. If the operation times out, the catch block has a minute to release the hold before the TTL would expire it. If somehow even the catch block fails, the TTL still kicks in a minute later.
Layer three: the TTL expiration job. The nuclear option. If the server crashes, if the process is OOM-killed, if the catch block throws its own error — the cleanup job catches the orphaned hold within 60 seconds of its expiration.
Three independent mechanisms, each catching what the previous one missed. This is what production reliability looks like.
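The first two layers can be sketched as a wrapper around the operation. The third layer is the cleanup job, which runs outside this code path. `Ledger` is a deliberately tiny in-memory stand-in and `run_with_hold` is a hypothetical helper. The 4-minute and 5-minute values come from the article.

```python
import asyncio
from typing import Awaitable, Callable, Dict

OPERATION_TIMEOUT_S = 4 * 60  # layer two: shorter than the hold TTL
HOLD_TTL_S = 5 * 60           # layer three is enforced by the cleanup job


class Ledger:
    """Minimal in-memory hold ledger, just enough to show the flow."""

    def __init__(self, credits: Dict[str, int]):
        self.credits = dict(credits)
        self.holds: Dict[int, list] = {}  # hold_id -> [user_id, amount, status]

    def hold(self, user_id: str, amount: int) -> int:
        held = sum(a for u, a, s in self.holds.values()
                   if u == user_id and s == "captive")
        if amount > self.credits[user_id] - held:
            raise ValueError("insufficient credits")
        hold_id = len(self.holds) + 1
        self.holds[hold_id] = [user_id, amount, "captive"]
        return hold_id

    def confirm(self, hold_id: int) -> None:
        user_id, amount, status = self.holds[hold_id]
        if status != "captive":
            raise ValueError(f"hold is already {status}")
        self.credits[user_id] -= amount
        self.holds[hold_id][2] = "confirmed"

    def release(self, hold_id: int) -> None:
        if self.holds[hold_id][2] == "captive":  # idempotent
            self.holds[hold_id][2] = "released"


async def run_with_hold(ledger: Ledger, user_id: str, amount: int,
                        operation: Callable[[], Awaitable[object]],
                        timeout_s: float = OPERATION_TIMEOUT_S) -> object:
    hold_id = ledger.hold(user_id, amount)
    try:
        # Layer two: the operation is cut off before the hold's TTL elapses.
        result = await asyncio.wait_for(operation(), timeout=timeout_s)
    except Exception:
        # Layer one: any failure, including the timeout, releases the hold.
        ledger.release(hold_id)
        raise
    ledger.confirm(hold_id)
    return result
```

If the process dies before either `release` or `confirm` runs, the hold simply stays captive until the expiration job picks it up.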
IX – The Double-Submit Problem
User clicks "Generate" twice in 200 milliseconds. Two requests arrive at the server. Without the captive system, both might pass the balance check and double-spend.
The solution is the operation ID. Each hold has a unique operation ID, and the database enforces a unique constraint on it. The operation ID is generated client-side as an idempotency key. Both requests send the same key. The second one fails with a constraint violation.
This is a one-line defense against one of the most common billing bugs in production systems.
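As a sketch of that defense, the example below puts a `UNIQUE` constraint on a hypothetical `operation_id` column and treats the resulting constraint violation as "duplicate request". It uses Python's built-in `sqlite3` purely so the example is runnable. The table layout and `create_hold_once` are assumptions.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE holds (
            id INTEGER PRIMARY KEY,
            user_id TEXT NOT NULL,
            amount INTEGER NOT NULL,
            operation_id TEXT NOT NULL UNIQUE,  -- the idempotency key
            status TEXT NOT NULL DEFAULT 'captive'
        )
    """)
    return conn


def create_hold_once(conn: sqlite3.Connection, user_id: str, amount: int,
                     operation_id: str) -> bool:
    """Insert a hold; return False if this operation ID was already used."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO holds (user_id, amount, operation_id) "
                "VALUES (?, ?, ?)",
                (user_id, amount, operation_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # second click with the same key: no second hold


conn = open_db()
key = "op-123"  # generated once on the client and sent with both requests
first_click = create_hold_once(conn, "u1", 8, key)
second_click = create_hold_once(conn, "u1", 8, key)
print(first_click, second_click)  # True False
```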
The partial capture problem is equally interesting. An operation partially succeeds — 3 out of 5 requested images generated before a timeout. The system supports partial confirmation, where the hold is confirmed for only the actual amount consumed. The difference automatically returns to the user's available balance.
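Partial confirmation can be sketched as a confirm that deducts less than the hold reserved. The example assumes one credit per image, which the article does not state, and `confirm_partial` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Hold:
    user_id: str
    amount: int
    status: str = "captive"


def confirm_partial(credits: Dict[str, int], hold: Hold, consumed: int) -> int:
    """Confirm only `consumed` credits of the hold; return the credits freed."""
    if hold.status != "captive":
        raise ValueError(f"hold is already {hold.status}")
    if not 0 <= consumed <= hold.amount:
        raise ValueError("consumed must be between 0 and the held amount")
    freed = hold.amount - consumed
    credits[hold.user_id] -= consumed  # deduct only what was actually used
    hold.amount = consumed             # the hold now records the real spend
    hold.status = "confirmed"          # no longer captive, so the rest is free
    return freed


credits = {"u1": 10}
hold = Hold("u1", 5)  # 5 images requested, assumed 1 credit each
freed = confirm_partial(credits, hold, consumed=3)  # only 3 were generated
print(credits["u1"], freed)  # 7 2
```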
Every edge case has a clean resolution. That's the power of making holds a first-class entity with explicit state transitions.
X – Available Balance: Transparency Prevents Tickets
When the UI shows the user their balance, it shows all three numbers: total credits, held credits, and available credits.
"15 credits (3 held)."
The user understands why their available balance is lower than expected. Without this transparency, they see "12 credits" and wonder where the other 3 went. They file a support ticket. You investigate. You find they have an active hold from an in-progress operation. You explain. They say "oh."
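The three numbers and the label can be derived from the same data. This is a small sketch with hypothetical function names and an assumed `(amount, status)` tuple shape for holds.

```python
from typing import Dict, Iterable, Tuple


def balance_summary(total: int, holds: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Return total, held, and available credits for display."""
    held = sum(amount for amount, status in holds if status == "captive")
    return {"total": total, "held": held, "available": total - held}


def balance_label(summary: Dict[str, int]) -> str:
    """Render the balance the way the UI text above does."""
    if summary["held"] == 0:
        return f"{summary['total']} credits"
    return f"{summary['total']} credits ({summary['held']} held)"


summary = balance_summary(15, [(3, "captive"), (4, "confirmed")])
label = balance_label(summary)
print(label)  # 15 credits (3 held)
```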
Transparency prevents confusion. Confusion causes support tickets. Support tickets prevent you from shipping features.
The balance polls every 10 seconds. Users see their balance update within seconds of a hold being created, confirmed, or released. You could use WebSockets or SSE for more real-time updates, but polling is simpler and good enough for credit balances.
Want to see this billing pattern in production? The Vibe platform at vibe.oakoliver.com processes every credit operation through captive holds. If you're building a credit-based product and want to go deeper on billing architecture, book a mentoring session at mentoring.oakoliver.com.
XI – Performance: Will the Locks Kill You?
A reasonable concern. Database row locks sound scary.
For a single user, yes, their requests serialize. But users don't share locks. User A's hold doesn't block User B's. The lock is held for less than 10 milliseconds — the time to read captive holds and insert a new one.
Even a power user firing 10 requests per second would see minimal contention. And in practice, nobody generates AI content 10 times per second.
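The expiration cleanup is equally efficient. A compound index on status and expiration time, with status first, lets the query seek straight to captive holds and range-scan only those whose expiration has passed. Even with millions of hold records, it runs in sub-millisecond time, because the scan touches only the few overdue captive holds.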
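To check that a cleanup query actually uses such an index, you can ask the database for its query plan. The sketch below does this with SQLite via Python's built-in `sqlite3`. The schema and index name are hypothetical, and putting `status` before `expires_at` is this sketch's assumption: the equality filter narrows to captive rows first, then the range condition scans only the overdue ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holds (
        id TEXT PRIMARY KEY,
        user_id TEXT NOT NULL,
        amount INTEGER NOT NULL,
        status TEXT NOT NULL,
        expires_at TEXT NOT NULL  -- ISO-8601 timestamps sort correctly as text
    );
    CREATE INDEX idx_holds_status_expires ON holds (status, expires_at);
""")

# Ask SQLite how it would find the holds the cleanup job needs to expire.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM holds WHERE status = 'captive' AND expires_at < ?",
    ("2024-01-01T00:05:00Z",),
).fetchall()

# The last column of each plan row is the human-readable description.
plan_detail = " ".join(row[3] for row in plan)
print(plan_detail)
```

Other databases expose the same information through their own `EXPLAIN` variants. The point is to confirm the index is used instead of assuming it.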
Should you archive old holds? Absolutely. Confirmed, released, and expired holds older than 30 days can be moved to an archive table. Keep the main table lean. The hold table should contain mostly active captive holds and recently resolved ones.
XII – The Pattern That Lets You Sleep
Here's the mental model that fits in a tweet.
Your credit system should work like a restaurant tab. Hold the amount when they order. Charge when the meal is served. Release if the kitchen is out of ingredients. Auto-expire if the waiter disappears for 5 minutes.
If your credit billing does "deduct, operate, maybe refund," you're building on sand. The hold-confirm-release pattern with TTL expiration is the foundation that eliminates double-spending, handles failures gracefully, and lets you sleep at night.
The captive credit system costs you maybe two days of implementation. It saves you months of debugging race conditions and refunding angry users.
A healthy system sees less than 0.1% expired holds. Track that ratio. If you're seeing 5% or more, something is systematically failing to confirm or release. The expiration rate is your canary.
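Tracking that ratio takes only a few lines. The sketch below assumes the ratio is computed over resolved holds (confirmed, released, and expired), which is one reasonable reading of the numbers above. The function names and the intermediate "watch" label are hypothetical.

```python
from typing import Dict

HEALTHY_BELOW = 0.001  # less than 0.1% expired
ALARM_AT = 0.05        # 5% or more expired


def expired_ratio(counts: Dict[str, int]) -> float:
    """Share of resolved holds that ended by TTL expiry."""
    expired = counts.get("expired", 0)
    resolved = counts.get("confirmed", 0) + counts.get("released", 0) + expired
    return expired / resolved if resolved else 0.0


def classify(ratio: float) -> str:
    if ratio < HEALTHY_BELOW:
        return "healthy"
    if ratio >= ALARM_AT:
        return "alarm"  # something is systematically failing to resolve holds
    return "watch"


good_day = {"confirmed": 9_900, "released": 95, "expired": 5}
bad_day = {"confirmed": 90, "released": 4, "expired": 6}
print(classify(expired_ratio(good_day)), classify(expired_ratio(bad_day)))
# healthy alarm
```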
Build it once. Build it right. Build it like a restaurant.
Have you ever shipped a billing system that looked correct in development and then betrayed you at scale — and what was the first symptom you noticed?
– Antonio