
Full-Text Search Without Elasticsearch: BM25 in Pure TypeScript

We needed search for Mentoring. Users searching for mentors by expertise. Mentors searching through session history. Admins searching through user accounts.

The obvious answer was Elasticsearch. Managed service, battle-tested, scales infinitely. Also: $150/month minimum for a single node, another service to monitor, another thing that can go down at 3am.

We wrote our own instead.


I – The Problem with Managed Search

Elasticsearch is infrastructure. Even managed Elasticsearch (AWS OpenSearch, Elastic Cloud) requires:

  • A dedicated cluster with memory allocation decisions
  • Index mapping configuration
  • Connection pooling and retry logic
  • Monitoring dashboards and alerts
  • A separate deployment pipeline

For a platform with 50,000 mentors and 200,000 sessions, this is overkill. Our search queries are simple: find mentors by name, expertise, or bio. Find sessions by topic or notes. Basic full-text search.

We didn't need distributed sharding. We didn't need real-time indexing of millions of documents per second. We needed to answer "which mentors know TypeScript?" in under 50ms.


II – BM25: The Algorithm Behind Everything

Nearly every production search system uses BM25. Elasticsearch, Lucene, Solr — they all rank with essentially the same formula, introduced in 1994:

score(D, Q) = Σ IDF(qi) · (f(qi, D) · (k1 + 1)) / (f(qi, D) + k1 · (1 - b + b · |D|/avgdl))

This looks complex but does something simple: rank documents by how relevant they are to a query. Terms that appear frequently in a document raise its score. Terms that appear in many documents (like "the") carry less weight.
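To make the formula concrete, here is a toy scorer over a three-document corpus — an illustration only, not the library's code — using typical parameters k1 = 1.5, b = 0.75 and the smoothed IDF from Lucene's BM25 variant:

```typescript
// Toy BM25 scorer — illustrative only, not the library's implementation.
const docs = [
  ["typescript", "react", "mentor"],
  ["python", "mentor"],
  ["typescript", "typescript", "node"],
];
const k1 = 1.5;
const b = 0.75;
const N = docs.length;
const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;

// Document frequency: how many documents contain each term?
const df = new Map<string, number>();
for (const d of docs) for (const t of new Set(d)) df.set(t, (df.get(t) ?? 0) + 1);

// Smoothed IDF, as used by Lucene's BM25 variant
const idf = (t: string): number => {
  const n = df.get(t) ?? 0;
  return Math.log(1 + (N - n + 0.5) / (n + 0.5));
};

function score(doc: string[], query: string[]): number {
  let s = 0;
  for (const q of query) {
    const f = doc.filter((t) => t === q).length; // f(qi, D)
    s += (idf(q) * f * (k1 + 1)) / (f + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return s;
}

const scores = docs.map((d) => score(d, ["typescript"]));
// Doc 2 mentions "typescript" twice, so it outranks doc 0; doc 1 scores zero.
```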

The innovation we borrowed from Python's bm25s library is eager sparse scoring. Instead of computing BM25 at query time, you precompute scores during indexing and store them in a compressed sparse matrix.

flowchart LR
    subgraph Index Time
        A[Documents] --> B[Tokenize]
        B --> C[Compute IDF]
        B --> D[Term Frequencies]
        C --> E[Precompute BM25]
        D --> E
        E --> F[Sparse Matrix]
    end

    subgraph Query Time
        G[Query Tokens] -->|lookup| F
        F --> H[Ranked Results]
    end

Query time becomes array lookups instead of formula evaluation.
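A minimal sketch of that idea (shapes are assumptions — the real library stores a compressed sparse matrix, not a Map, and the scores here are made up for illustration):

```typescript
// Eager scoring: every (term, document) BM25 contribution is precomputed
// at index time. Query time only accumulates stored numbers.
type Posting = { doc: number; score: number };

const postings = new Map<string, Posting[]>([
  ["typescript", [{ doc: 0, score: 1.2 }, { doc: 2, score: 2.4 }]],
  ["react", [{ doc: 0, score: 0.9 }]],
]);

// No formula evaluation here — just lookups and additions.
function retrieve(queryTokens: string[], k: number): Posting[] {
  const acc = new Map<number, number>();
  for (const t of queryTokens) {
    for (const { doc, score } of postings.get(t) ?? []) {
      acc.set(doc, (acc.get(doc) ?? 0) + score);
    }
  }
  return [...acc.entries()]
    .map(([doc, score]) => ({ doc, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const top = retrieve(["typescript", "react"], 2);
// doc 2 (2.4) ranks above doc 0 (1.2 + 0.9)
```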


III – The Implementation

We ported bm25s from Python to TypeScript. The core is 500 lines:

import { BM25, tokenize } from "bm25s";

// Index mentor profiles
const mentorProfiles = mentors.map(m => 
  `${m.name} ${m.expertise.join(" ")} ${m.bio}`
);
const tokens = tokenize(mentorProfiles);

const index = new BM25();
index.index(tokens);

// Search
const query = tokenize(["typescript react senior"]);
const { documents, scores } = index.retrieve(query, { k: 20 });

// documents = [12, 45, 3, ...] — indices into mentors array
// scores = [8.2, 7.1, 6.8, ...] — BM25 relevance scores

The tokenize function handles lowercasing, punctuation removal, and stopword filtering, with stopword lists for twelve languages out of the box.

The retrieve function returns document indices sorted by relevance. We map these back to mentor IDs and return the results.
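That mapping step is plain index arithmetic — a sketch with hypothetical mentor records and hard-coded result arrays standing in for the retrieve output:

```typescript
// Hypothetical mentors array; retrieve() returns positions into it.
type Mentor = { id: string; name: string };
const mentors: Mentor[] = [
  { id: "m-1", name: "Ada" },
  { id: "m-2", name: "Grace" },
  { id: "m-3", name: "Edsger" },
];

const documents = [2, 0]; // indices as returned by index.retrieve(...)
const scores = [8.2, 7.1];

// Join each index with its score and resolve it to a mentor record.
const results = documents.map((idx, i) => ({
  id: mentors[idx].id,
  name: mentors[idx].name,
  score: scores[i],
}));
```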


IV – Performance in Production

We benchmarked against our requirements:

Metric                 Requirement   Actual
Index 50K mentors      < 5s          1.2s
Search latency (p50)   < 50ms        8ms
Search latency (p99)   < 200ms       23ms
Memory usage           < 500MB       180MB

The Compressed Sparse Column matrix is memory-efficient: 50,000 mentor profiles, with an average vocabulary of 200 terms each, fit in 180MB.
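To illustrate why CSC is compact (this is the standard CSC layout, not necessarily the library's exact internals): only the non-zero scores are kept, in three flat typed arrays.

```typescript
// Dense score matrix (3 docs x 3 terms), mostly zeros:
//          term0  term1  term2
//   doc 0  [1.2,  0,     0.9]
//   doc 1  [0,    0,     0  ]
//   doc 2  [2.4,  0.7,   0  ]
// CSC stores only the non-zeros, column by column:
const values = new Float32Array([1.2, 2.4, 0.7, 0.9]); // non-zero scores
const rowIdx = new Int32Array([0, 2, 2, 0]); // doc index of each value
const colPtr = new Int32Array([0, 2, 3, 4]); // column i = values[colPtr[i]..colPtr[i+1])

// All precomputed scores for one term sit in a contiguous slice —
// exactly the access pattern a term lookup needs.
function column(term: number): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (let p = colPtr[term]; p < colPtr[term + 1]; p++) {
    out.push([rowIdx[p], values[p]]);
  }
  return out;
}
```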

Indexing runs at 4.6 million tokens per second on a single Bun process. We rebuild the index every hour from the database. Fresh mentor signups appear in search within the hour.


V – Integration with Elysia

The search endpoint is a simple Elysia route:

import { Elysia, t } from "elysia";
import { searchIndex } from "./search";

export const searchRoutes = new Elysia({ prefix: "/search" })
  .get("/mentors", async ({ query }) => {
    const results = searchIndex.search(query.q, {
      limit: query.limit ?? 20,
      filters: {
        expertise: query.expertise,
        minRating: query.minRating,
      },
    });
    
    return results;
  }, {
    query: t.Object({
      q: t.String(),
      limit: t.Optional(t.Number()),
      expertise: t.Optional(t.Array(t.String())),
      minRating: t.Optional(t.Number()),
    }),
  });

Post-filtering happens after BM25 retrieval. We fetch the top 100 results, then filter by expertise and rating, then return the top 20. This is fast because filtering 100 objects is trivial.
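A sketch of that over-fetch-then-filter step (types and names are illustrative, not the actual route code):

```typescript
type Hit = { id: string; expertise: string[]; rating: number; score: number };

// candidates: the top-100 BM25 hits, already sorted by relevance score.
function postFilter(
  candidates: Hit[],
  opts: { expertise?: string[]; minRating?: number; limit: number },
): Hit[] {
  return candidates
    .filter((h) => !opts.expertise || opts.expertise.every((e) => h.expertise.includes(e)))
    .filter((h) => opts.minRating === undefined || h.rating >= opts.minRating)
    .slice(0, opts.limit);
}

// Hypothetical candidate set:
const candidates: Hit[] = [
  { id: "a", expertise: ["typescript"], rating: 4.8, score: 9.1 },
  { id: "b", expertise: ["python"], rating: 4.9, score: 8.4 },
  { id: "c", expertise: ["typescript", "react"], rating: 3.9, score: 7.7 },
];
const filtered = postFilter(candidates, {
  expertise: ["typescript"],
  minRating: 4.0,
  limit: 20,
});
// only "a" passes both the expertise and rating filters
```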


VI – Persistence and Rebuilds

The index persists to disk as three files:

/data/search/
  index.json      # Vocabulary and document frequency
  matrix.bin      # Compressed sparse scores
  documents.json  # Document ID mapping

Total size for 50K mentors: 12MB on disk, 180MB in memory.

Rebuild process runs hourly via our cron system:

import { Cron } from "croner";

Cron("0 * * * *", async () => {
  const mentors = await db.mentor.findMany({
    select: { id: true, name: true, expertise: true, bio: true },
  });
  
  await searchIndex.rebuild(mentors);
  await searchIndex.persist("/data/search");
  
  logger.info(`Search index rebuilt: ${mentors.length} mentors`);
});

If the rebuild fails, the previous index stays in memory. Users never see degraded search.

flowchart TD
    CRON["Hourly Cron"] --> FETCH["Fetch from DB"]
    FETCH --> BUILD["Rebuild index"]
    BUILD -->|success| SWAP["Swap live pointer"]
    BUILD -->|failure| KEEP["Keep previous index"]
    SWAP --> PERSIST["Persist to disk"]
    PERSIST --> SERVE["Serve from new index"]
    KEEP --> SERVE_OLD["Serve from old index"]
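The swap in the diagram above amounts to a pointer assignment that only happens on success. A minimal sketch, simplified to synchronous code (SearchIndex and buildNew are illustrative, not the actual API):

```typescript
interface SearchIndex {
  search(q: string): number[];
}

// The currently-serving index; requests always read this reference.
let liveIndex: SearchIndex = { search: () => [] };

function rebuild(buildNew: () => SearchIndex): boolean {
  try {
    const next = buildNew(); // build the new index off to the side
    liveIndex = next; // success: swap the live pointer
    return true;
  } catch {
    return false; // failure: liveIndex is untouched
  }
}

const before = liveIndex;
const ok = rebuild(() => {
  throw new Error("db down");
});
// ok is false and liveIndex still points at the old index
```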

VII – What We Didn't Build

We consciously avoided:

  • Fuzzy matching — Typo tolerance adds complexity. Users can search again.
  • Synonyms — "JS" matching "JavaScript" requires a synonym dictionary. Not worth it yet.
  • Faceted search — Filtering by category is post-processing, not index-level.
  • Real-time indexing — Hourly rebuilds are good enough.

Each of these is addable later. The BM25 core doesn't change.


VIII – The Cost Comparison

Elasticsearch (managed):

  • $150/month for smallest viable cluster
  • Ongoing maintenance and monitoring
  • Another service in the dependency graph
  • Another thing to debug when search is slow

BM25 in-process:

  • $0/month additional
  • 180MB memory on existing Bun server
  • No network calls for search
  • Debuggable with console.log

For our scale (50K documents, simple queries), the in-process solution wins on every metric.


IX – When to Use This

This approach works when:

  • Document count is under 1 million
  • Index fits in server memory (< 2GB)
  • Hourly index freshness is acceptable
  • Query patterns are simple (keyword search, not semantic)

If you need real-time indexing of millions of documents, distributed search across regions, or vector similarity search — use Elasticsearch or a managed alternative.

For everything else, BM25 in a few hundred lines of TypeScript is surprisingly capable.


X – Get It

npm install bm25s

The library is a port of Python's bm25s by Xing Han Lu, adapted for the JavaScript ecosystem. Works on Node.js, Bun, and Deno. Zero runtime dependencies.

Source: github.com/oakoliver/bm25s

We've been running this in production for Mentoring since January. Zero issues. Search just works.

"Simplicity is the ultimate sophistication."