14 Test Files, Zero Flaky Tests: How I Test Every API Domain in Elysia.js Without Jest or Vitest

Most developers write integration tests after things break.
I write them before I write the feature.
That's not virtue signaling. It's survival. When you're a solo developer running a mentoring platform that handles real payments, real user sessions, and real email notifications, you don't get the luxury of "we'll add tests later."
Later never comes. What comes instead is a 2 AM Slack message from a mentee who can't book a session because your payment flow silently broke three deployments ago.
I – Why I Abandoned Jest for Bun's Native Runner
If you're still using Jest with a Bun project, you're carrying dead weight.
Jest was designed for a world where Node.js was slow and you needed clever parallelization tricks to make tests bearable. Bun's built-in test runner was designed for a world where the runtime itself is fast, so the test framework can be simple.
No configuration files. No babel transforms. No TypeScript compilation step. Bun understands TypeScript natively, resolves imports natively, and runs your tests against the same runtime your production code uses.
A complete, runnable test file needs nothing but the imports from the built-in test module and your application. No setup ceremony. No plugin ecosystem. No thirty-line config file.
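Here's a minimal sketch of what that looks like. The import path and the /health route are placeholders rather than the platform's actual code, and app.handle is covered in section III:

```ts
// A complete Bun test file: no config, no transforms, no runner plugins.
import { describe, expect, it } from "bun:test";
import { app } from "../src/app"; // hypothetical path to the Elysia instance

describe("health", () => {
  it("responds through the real application", async () => {
    const res = await app.handle(new Request("http://localhost/health"));
    expect(res.status).toBe(200);
  });
});
```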
The speed difference isn't incremental — it's categorical. My full suite of 14 integration test files runs in under four seconds. The equivalent Jest setup I benchmarked took 23 seconds. That's not a micro-optimization. That's the difference between running tests on every save and running them "when I remember."
When tests are fast enough to be invisible, you run them constantly. When they're slow enough to be annoying, you skip them. Test speed is a behavioral choice disguised as a technical metric.
II – One File Per Domain, Full Lifecycle Per File
Here's the organizational principle that makes 14 test files manageable: each file tests one API domain, and each domain is tested end-to-end.
Auth. Users. Mentors. Sessions. Payments. Messages. Notifications. Availability. Reviews. Admin. Search. Uploads. Webhooks. Email scheduling.
Fourteen files. Each one exercises the full request lifecycle — from HTTP request through middleware, through validation, through business logic, through database queries, and back to HTTP response.
This isn't unit testing with mocked-out dependencies. The database is real — a test-specific PostgreSQL instance. The auth middleware is real. The route handlers are real. The only things mocked are external services — payment providers that would charge real money and email senders that would deliver real notifications.
The line between integration and unit tests matters. Unit tests verify that individual functions produce correct output. Integration tests verify that the system produces correct behavior when all the parts work together. The bugs that wake you up at 3 AM are integration bugs, not unit bugs. Test accordingly.
III – Testing Without Opening a Socket
The first thing you need for integration testing an Elysia.js app is a way to send requests without actually binding to a network port.
Elysia has a handle method that processes a request through its entire middleware chain — validation, auth, rate limiting, everything — without ever opening a socket. Your integration tests exercise the exact same code path as production, minus the TCP overhead.
This is architecturally significant. You're not testing a mock server. You're not testing a subset of your middleware. You're testing the actual application instance with every plugin, every guard, every transform applied. If your rate limiter would block a request in production, it blocks it in the test.
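A sketch of what that looks like in practice, assuming a /api/sessions route with schema validation (Elysia returns 422 for validation failures by default):

```ts
import { expect, test } from "bun:test";
import { app } from "../src/app"; // the real instance, every plugin and guard applied

test("malformed booking payloads are rejected by the real validation layer", async () => {
  // handle() runs the full middleware chain without binding to a port.
  const res = await app.handle(
    new Request("http://localhost/api/sessions", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ startTime: "not-a-date" }),
    })
  );
  expect(res.status).toBe(422);
});
```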
For typed clients, Elysia's Eden treaty gives you end-to-end type safety in tests. If you rename an endpoint parameter, your tests fail at compile time — not at runtime after a 30-second test run. Type errors caught at compile time cost zero debugging time. Type errors caught at runtime cost hours.
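A sketch of the treaty pattern, with the mentors route and its response shape assumed for illustration:

```ts
import { expect, test } from "bun:test";
import { treaty } from "@elysiajs/eden";
import { app } from "../src/app";

// Passing the Elysia instance (instead of a URL) keeps the client in-process
// and fully typed against the server's route definitions.
const api = treaty(app);

test("listing mentors is typed end to end", async () => {
  // Rename the route or its params on the server and this stops compiling.
  const { data, status } = await api.mentors.get();
  expect(status).toBe(200);
  expect(Array.isArray(data)).toBe(true);
});
```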
IV – Why Lazy Test Data Hides Real Bugs
I refuse to test with data that looks like "test user, test at test dot com." That kind of lazy test data hides bugs. Real bugs come from names with apostrophes, emails with plus signs, and timezones that aren't UTC.
Every test run generates different data. This is intentional.
Using Faker-powered factories, each test creates mentees with realistic names, random email addresses, randomized timezones from a real-world set, and plausible avatar URLs. Mentors get job-title headlines, multi-paragraph bios, hourly rates between 50 and 300, and randomly selected specialty combinations.
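The factories look roughly like this; the field names are illustrative, not the platform's actual schema:

```ts
import { faker } from "@faker-js/faker";

type MenteeInput = {
  name: string;
  email: string;
  timezone: string;
  avatarUrl: string;
};

// Every call produces a different, realistic mentee.
export function buildMentee(overrides: Partial<MenteeInput> = {}): MenteeInput {
  return {
    name: faker.person.fullName(),   // occasionally includes apostrophes
    email: faker.internet.email(),   // realistic addresses, not test@test.com
    timezone: faker.helpers.arrayElement([
      "America/Sao_Paulo", "Europe/Berlin", "Asia/Tokyo", "Pacific/Auckland",
    ]),
    avatarUrl: faker.image.avatar(),
    ...overrides,
  };
}
```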
If a test only passes with specific data, it's not testing behavior — it's testing coincidence. The factories force your code to handle the full spectrum of valid inputs.
I found a bug where mentor names containing apostrophes broke a query. I found another where emails with plus characters failed validation. Neither of these would have been caught with "test at test dot com."
The database lifecycle makes this work without tests stepping on each other. Every test file starts by truncating tables in reverse foreign-key order. Every test starts clean. No test depends on state from another file. This eliminates the entire category of "works in isolation, fails in CI" bugs.
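A sketch of the reset, assuming a tagged-template Postgres client and illustrative table names:

```ts
import { beforeAll } from "bun:test";
import { sql } from "../src/db"; // hypothetical Postgres client

// Runs once per test file: wipe everything so no file inherits another's state.
beforeAll(async () => {
  await sql`
    TRUNCATE TABLE sessions, payments, availability, mentors, mentees, users
    RESTART IDENTITY CASCADE
  `;
});
```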
V – Mock at the Boundary, Not in the Middle
Bun's built-in mock module system is elegant. You declare module replacements before importing anything that depends on them, and the mock applies transparently.
The critical insight is what you mock and what you don't.
Mock external services: your payment provider, your email sender, your SMS gateway. These are services with real-world side effects that you don't want triggered during tests. Replace them with controlled implementations that capture calls and return predictable responses.
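With Bun's mock.module, that looks roughly like this; the module path and sendEmail signature are assumptions for illustration:

```ts
import { mock } from "bun:test";

// Captured calls let tests assert on what would have been sent.
export const sentEmails: { to: string; subject: string; html: string }[] = [];

// Must run before anything imports the real module.
mock.module("../src/services/email", () => ({
  sendEmail: async (to: string, subject: string, html: string) => {
    sentEmails.push({ to, subject, html }); // capture instead of delivering
    return { id: "mock-email-id" };
  },
}));
```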
Do not mock your own database. Do not mock your own business logic. Do not mock your own middleware. The whole point of integration tests is to verify that your layers work together. If you mock the database, you're testing whether your mock returns what you told it to return. Congratulations — it does. That tells you nothing about whether your actual queries work.
For payment tests, the mock Stripe module returns realistic-looking checkout session IDs and simulates successful webhook payloads. For auth tests, the mock email module captures the magic link token from the rendered email HTML so the test can verify the full authentication lifecycle — request a link, capture the token, verify it, use the session, confirm the token was consumed in the database.
Your mocks should simulate external behavior at the boundary. Everything internal should be real. That's the integration testing philosophy in one sentence.
VI – The Auth Test That Tells a Story
The authentication test file is my favorite because it exercises an entire user lifecycle in a single flow.
It starts by requesting a magic link for a new email address. The mock email service captures the token from the rendered email. The test then verifies the token by hitting the verification endpoint. It confirms that a session token comes back. It uses that session token to access an authenticated endpoint. And finally, it checks the database to confirm the magic link token was marked as consumed.
That's three API calls and one database check that together verify: magic link generation works, token storage works, token validation works, session creation works, authentication middleware works, and token consumption works.
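Condensed into a sketch, with the routes, payload shapes, and the sentEmails capture from the mock above all assumed:

```ts
import { expect, test } from "bun:test";
import { app } from "../src/app";
import { sentEmails } from "./mocks/email"; // the captured-email mock sketched earlier

test("magic link lifecycle: request, verify, use", async () => {
  const email = "dara.o'neill+mentee@example.com"; // realistic and awkward on purpose

  // 1. Request a magic link; the mocked sender captures the rendered email.
  await app.handle(
    new Request("http://localhost/auth/magic-link", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ email }),
    })
  );

  // 2. Pull the token out of the captured email HTML.
  const html = sentEmails.at(-1)!.html;
  const link = html.match(/href="([^"]+)"/)![1];
  const token = new URL(link).searchParams.get("token");

  // 3. Verify the token and receive a session token.
  const verifyRes = await app.handle(
    new Request(`http://localhost/auth/verify?token=${token}`)
  );
  const { sessionToken } = await verifyRes.json();
  expect(sessionToken).toBeDefined();

  // 4. Use the session against an authenticated endpoint.
  const meRes = await app.handle(
    new Request("http://localhost/api/me", {
      headers: { authorization: `Bearer ${sessionToken}` },
    })
  );
  expect(meRes.status).toBe(200);

  // 5. (In the real file) query the database to confirm the token was consumed.
});
```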
Then the edge cases. An expired token returns 401. An already-used token returns 401. A refresh token generates a new session with a different access token but the same user identity.
Each test builds on the previous state. The test file tells a story — not of isolated function calls, but of a user moving through a real workflow. That's integration testing done right. You're testing journeys, not functions.
VII – Session Booking: The Most Complex Test File
The sessions domain is where integration testing earns its keep.
Multiple actors — a mentor and a mentee. Time-sensitive logic — you can't book a slot in the past. State transitions — pending, confirmed, cancelled. Concurrent access — two mentees can't book the same slot.
The test sets up a mentor with defined availability windows. A mentee books a slot within those windows. The session starts as pending. The mentor confirms it. The status moves to confirmed.
Then the critical edge case: a second mentee tries to book the same time slot. The test expects a 409 conflict response. If your database doesn't handle concurrent booking correctly — if there's no unique constraint or advisory lock on the time slot — this test catches it before a real user discovers it the hard way.
The first mentee cancels with sufficient notice. The status moves to cancelled. The cancellation reason is stored. The mentor's availability opens back up.
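The double-booking case, sketched with assumed routes and with tokens that would come from factory-based setup:

```ts
import { expect, test } from "bun:test";
import { app } from "../src/app";

// In the real file these come from the Faker factories and the auth flow.
const mentorId = "mentor-id-from-setup";
const menteeAToken = "session-token-for-mentee-a";
const menteeBToken = "session-token-for-mentee-b";

const book = (token: string, startTime: string) =>
  app.handle(
    new Request("http://localhost/api/sessions", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ mentorId, startTime }),
    })
  );

test("two mentees cannot book the same slot", async () => {
  const slot = "2030-06-01T14:00:00Z"; // safely in the future

  const first = await book(menteeAToken, slot);
  expect(first.status).toBe(201);

  const second = await book(menteeBToken, slot);
  expect(second.status).toBe(409); // the unique constraint or advisory lock must hold
});
```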
Timezone handling gets its own test block. When a request includes a timezone header, the response should include localized times. A mentee in São Paulo should see their session time in BRT, not UTC. This is the kind of bug that unit tests can't catch because it only manifests when the HTTP layer, the business logic, and the date formatting all interact.
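A sketch of that check, with the header name and response field assumed:

```ts
import { expect, test } from "bun:test";
import { app } from "../src/app";

test("session times are localized to the caller's timezone", async () => {
  const res = await app.handle(
    new Request("http://localhost/api/sessions/session-id-from-setup", {
      headers: {
        authorization: "Bearer session-token-from-setup",
        "x-timezone": "America/Sao_Paulo",
      },
    })
  );
  const body = await res.json();
  // A 14:00 UTC slot is 11:00 in São Paulo (UTC-3).
  expect(body.localStartTime).toContain("11:00");
});
```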
VIII – Testing Server-Side Rendered Pages
Some endpoints return HTML — SSR pages, email templates, error pages. For these, a DOM parser lets you assert on the structure of the rendered output.
You make a request with an Accept header of text/html and optionally a Googlebot user agent. You parse the response into a DOM. Then you check for correct meta tags, Open Graph attributes, embedded initial data, and the right JavaScript bundle reference for the route.
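A sketch using node-html-parser (any DOM parser works), with the route, the embedded-data id, and the bundle path assumed:

```ts
import { expect, test } from "bun:test";
import { parse } from "node-html-parser"; // any DOM parser will do
import { app } from "../src/app";

test("mentor profile renders SEO metadata and embedded data for crawlers", async () => {
  const res = await app.handle(
    new Request("http://localhost/mentors/some-mentor", {
      headers: {
        accept: "text/html",
        "user-agent": "Googlebot/2.1 (+http://www.google.com/bot.html)",
      },
    })
  );
  const root = parse(await res.text());

  expect(root.querySelector('meta[property="og:title"]')).not.toBeNull();
  expect(root.querySelector("script#initial-data")).not.toBeNull(); // embedded server data
  expect(root.querySelector('script[src*="/assets/"]')).not.toBeNull(); // the route's bundle
});
```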
This is where integration tests shine brightest. A unit test might verify that a rendering function returns the right string. An integration test verifies that when a real browser — or Googlebot — hits a real route, it gets a fully-formed HTML document with correct SEO metadata, embedded server data, and the correct client bundle.
For a platform that relies on SSG with 480 pre-rendered pages across 40 locales, this kind of end-to-end HTML verification is essential. If the initial data embedding breaks, every page silently becomes a blank shell that hydrates with nothing. Integration tests catch this before Google does.
IX – The Numbers That Matter
Here's what this testing strategy produces in practice.
14 integration test files covering every API domain: auth, users, mentors, sessions, payments, messages, notifications, availability, reviews, admin, search, uploads, webhooks, and email scheduling.
11 unit test files covering validators, transforms, date utilities, and pure business logic.
About 180 total test cases across all files.
3.8 seconds for the full suite execution.
82% code coverage on the API server.
Zero flaky tests, because I control the database, mock external services, and use deterministic time.
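The deterministic-time part leans on Bun's setSystemTime; a minimal sketch:

```ts
import { afterAll, beforeAll, setSystemTime } from "bun:test";

// Freeze the clock so "expired token" and "slot in the past" cases
// behave identically on every run; the exact date is arbitrary.
beforeAll(() => {
  setSystemTime(new Date("2030-01-15T10:00:00Z"));
});

afterAll(() => {
  setSystemTime(); // restore the real clock
});
```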
But the most valuable metric isn't coverage. It's confidence. I deploy to production on a single Hetzner VPS with Coolify, and I do it without fear. Every deployment runs the full suite. If tests pass, code ships. If they don't, I fix the test or the code before it reaches production.
X – Five Lessons from Six Months of Maintenance
After maintaining this test suite through months of feature development, refactors, and production incidents, these are the patterns that survived.
Test the HTTP boundary, not internal functions. If you're importing a service function directly into a test, you're writing a unit test. Integration tests should go through the application's request handler. This catches routing bugs, middleware bugs, and serialization bugs that unit tests miss.
Faker data finds bugs that static data hides. Random but realistic data pushes your code through edge cases you'd never think to write manually. Set up your factories once, run them on every test, and let statistical diversity do the work.
Mock at the boundary, not in the middle. External services: mock. Your own layers: real. The integration test is there to verify that your layers compose correctly.
Test error paths more than happy paths. My payment tests have more cases for failures — invalid packages, expired tokens, insufficient credits, webhook replays — than successes. Production systems spend most of their time handling errors. Your tests should too.
One cleanup strategy, applied everywhere. Truncate with cascade at the start of every test file. Brutal but effective. No leaked state. No ordering dependencies. No "works locally, fails in CI."
Want to Build a Test Suite That Lets You Sleep Through the Night?
I've helped developers go from zero tests to full integration coverage — without the overhead of heavyweight frameworks or the fragility of mocked-out databases.
Whether you're building with Elysia.js, Express, Fastify, or anything else on Bun, the principles are the same. Test the boundary. Use real databases. Mock only what's external. Make it fast enough to run on every save.
Book a session at mentoring.oakoliver.com and let's design your testing strategy together.
XI – The Goal Isn't Coverage
The goal isn't 100% coverage.
The goal is sleeping through the night.
Your users are counting on your code working. Your tests are how you keep that promise. And if your tests run in under four seconds, you'll keep that promise on every single commit.
The testing framework doesn't matter. The speed matters. The coverage philosophy matters. The discipline of running them every time matters.
What's your deployment confidence level — and when was the last time your tests actually caught a bug before production did?
– Antonio