
I Rewrote Python's Fastest BM25 Library in TypeScript. It's Now 2x Faster With Zero Dependencies.

Python's bm25s library is genuinely impressive. It achieves sub-millisecond retrieval on million-document corpora by precomputing BM25 scores during indexing and storing them in sparse matrices. The author, Xing Han Lu, wrote a paper about it. The library has over 1,200 GitHub stars.

I rewrote the entire thing in TypeScript for Bun. It's faster.

Not marginally faster. Twice as fast at indexing. Two to three times faster at retrieval than the numpy backend. And it has exactly zero runtime dependencies.

The library is called bun-bm25s. It's open source and available on npm. You can install it right now with bun add bun-bm25s.

This is how I did it.


I – Why BM25 Still Matters

Every developer eventually asks the same question: "How do I add search to my application?"

The modern answer is supposed to be vector embeddings. Semantic search. Dense retrieval. You embed your documents, embed your query, compute cosine similarity, and rank by distance.

But vector search has problems that rarely get mentioned.

Embeddings are expensive to compute. They require either API calls to an embedding provider or running inference locally. They're opaque — you can't explain why a result ranked higher. They struggle with exact keyword matches. And they require specialized vector databases that add operational complexity.

BM25 is the opposite.

It's a formula from 1994 that ranks documents by term frequency and inverse document frequency. No machine learning. No embeddings. No external services. Just math that runs entirely on your machine.
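For the curious, the formula fits in a few lines. Here's a sketch of the standard Lucene-style variant — an illustration of the math, not the library's internal code; `bm25TermScore` is a name I'm inventing for this example:

```typescript
// Sketch of the classic BM25 scoring formula (Lucene-style IDF).
// k1 and b are the usual free parameters.
function bm25TermScore(
  tf: number,        // term frequency in the document
  docLen: number,    // document length in tokens
  avgDocLen: number, // average document length in the corpus
  df: number,        // number of documents containing the term
  numDocs: number,   // total number of documents
  k1 = 1.5,
  b = 0.75,
): number {
  // Rare terms get a higher inverse-document-frequency weight.
  const idf = Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
  // Term frequency is dampened and normalized by document length.
  const norm = tf + k1 * (1 - b + (b * docLen) / avgDocLen);
  return idf * ((tf * (k1 + 1)) / norm);
}

// A document's score for a query is the sum of bm25TermScore
// over the query's terms.
```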

For most search use cases — documentation, product catalogs, blog posts, help articles — BM25 produces results that are indistinguishable from semantic search. Often better, because users searching for "python bm25 library" actually want documents containing those exact words.

BM25 is the 80/20 solution that handles 95% of real-world search needs.

The problem is implementation. Naive BM25 is slow. You have to scan every document for every query. The bm25s library solved this with a technique called eager sparse scoring — precompute everything during indexing, store it in a sparse matrix, and retrieval becomes a series of array lookups.

I took that architecture and rebuilt it from scratch for Bun.


II – The Architecture That Makes It Fast

Traditional BM25 implementations compute scores at query time. For each query term, they iterate through all documents, calculate term frequency, apply the BM25 formula, and accumulate scores. This is O(queries × documents × terms).

Eager sparse scoring inverts this. During indexing, you compute the BM25 score for every (document, term) pair that exists in your corpus. You store these precomputed scores in a Compressed Sparse Column matrix — a data structure optimized for column-wise access.

At query time, you look up each query term in the matrix. The CSC format gives you immediate access to all documents containing that term and their precomputed scores. You sum the scores across query terms and return the top-k results.

Retrieval becomes O(query_terms × average_documents_per_term).

For a typical query with three terms against a corpus where each term appears in a few thousand documents, this is microseconds instead of milliseconds.

The Python implementation uses scipy's sparse matrix format and numpy for array operations. The numba backend JIT-compiles the hot loops to machine code.

I couldn't use any of that. Bun doesn't have scipy. It doesn't have numpy. It doesn't have numba.

I had to build the entire sparse matrix infrastructure from scratch in pure TypeScript.


III – CSC Matrices in TypeScript

A Compressed Sparse Column matrix stores three arrays:

  • data: The non-zero values (BM25 scores)
  • indices: The row index for each value (document IDs)
  • indptr: Pointers marking where each column starts
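For a toy corpus, the three arrays might look like this (the scores are invented for illustration):

```typescript
// Toy CSC matrix: vocabulary of 3 tokens, 4 documents.
// Token 0 appears in docs 0 and 2, token 1 in doc 1, token 2 in docs 1 and 3.
const data    = new Float64Array([1.2, 0.8, 2.1, 0.5, 1.7]); // precomputed BM25 scores
const indices = new Uint32Array([0, 2, 1, 1, 3]);            // document IDs
const indptr  = new Uint32Array([0, 2, 3, 5]);               // column start offsets

// All (document, score) pairs for token 2 live in the slice
// [indptr[2], indptr[3]) = [3, 5): docs 1 and 3, scores 0.5 and 1.7.
```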

To find all documents containing token 42, you read indptr[42] and indptr[43]. Those two values tell you the slice of data and indices that contains every (document, score) pair for that token.

const start = indptr[tokenId];
const end = indptr[tokenId + 1];

for (let j = start; j < end; j++) {
  scores[indices[j]] += data[j];
}

This is the entire retrieval hot path. Three array accesses per entry. No hash lookups. No object allocations. No function calls.

The simplicity is the performance.

I use Float64Array for scores and Uint32Array for indices. Typed arrays in JavaScript are backed by contiguous memory buffers. They're as close to C arrays as you can get in a garbage-collected language.

Building the CSC matrix during indexing requires sorting entries by column (token ID), which I do with a single pass using counting sort. The entire matrix construction is O(n) where n is the number of (document, term) pairs.
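A minimal sketch of that construction, assuming the indexer has already produced parallel arrays of (docId, tokenId, score) triples — `buildCSC` is a hypothetical name, and the real indexer carries more bookkeeping:

```typescript
// Build a CSC matrix from (docId, tokenId, score) triples in a single
// counting-sort pass: O(n + vocabSize) total.
function buildCSC(
  docIds: Uint32Array,
  tokenIds: Uint32Array,
  scores: Float64Array,
  vocabSize: number,
): { data: Float64Array; indices: Uint32Array; indptr: Uint32Array } {
  const nnz = scores.length;
  const indptr = new Uint32Array(vocabSize + 1);

  // Pass 1: count entries per column (token).
  for (let i = 0; i < nnz; i++) indptr[tokenIds[i] + 1]++;
  // Prefix-sum the counts into column start offsets.
  for (let t = 0; t < vocabSize; t++) indptr[t + 1] += indptr[t];

  // Pass 2: scatter each entry into its column's slot.
  const data = new Float64Array(nnz);
  const indices = new Uint32Array(nnz);
  const next = indptr.slice(0, vocabSize); // per-column write cursors
  for (let i = 0; i < nnz; i++) {
    const pos = next[tokenIds[i]]++;
    data[pos] = scores[i];
    indices[pos] = docIds[i];
  }
  return { data, indices, indptr };
}
```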


IV – The Optimizations That Mattered

Three optimizations pushed performance from "pretty good" to "faster than Python."

First: loop unrolling in the scoring kernel. The innermost loop — summing scores for documents containing a query term — processes four entries at a time when possible. Modern CPUs can execute multiple independent additions in parallel. Unrolling exposes this parallelism to the processor.

let j = start;
const end4 = start + ((end - start) & ~3);

for (; j < end4; j += 4) {
  scores[indices[j]] += data[j];
  scores[indices[j + 1]] += data[j + 1];
  scores[indices[j + 2]] += data[j + 2];
  scores[indices[j + 3]] += data[j + 3];
}

for (; j < end; j++) {
  scores[indices[j]] += data[j];
}

Second: avoiding allocations in the hot path. The original implementation allocated a new Float64Array for scores on every query. I added buffer reuse — pass in a pre-allocated array and clear it with fill(0) instead of allocating fresh. On a 100,000-document corpus, this alone improved throughput by 15%.

Third: efficient top-k selection. After scoring, you need the k highest-scoring documents. Full sorting is O(n log n). I use a min-heap that maintains only the top k elements, which is O(n log k). For typical k values of 10-100, this is substantially faster.

For small arrays or large k values, the heap overhead isn't worth it, so the implementation falls back to full sorting. The crossover point is around n=1000 or k > n/2.
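In sketch form, the heap-based selection looks like this — a minimal illustration that omits the sort fallback described above:

```typescript
// Indices of the k largest scores in O(n log k), using a size-k min-heap
// kept in flat typed arrays (no object allocation per element).
function topK(scores: Float64Array, k: number): number[] {
  const heapScores = new Float64Array(k);
  const heapIdx = new Int32Array(k);
  let size = 0;

  const siftDown = (i: number): void => {
    while (true) {
      const l = 2 * i + 1, r = 2 * i + 2;
      let m = i;
      if (l < size && heapScores[l] < heapScores[m]) m = l;
      if (r < size && heapScores[r] < heapScores[m]) m = r;
      if (m === i) break;
      [heapScores[i], heapScores[m]] = [heapScores[m], heapScores[i]];
      [heapIdx[i], heapIdx[m]] = [heapIdx[m], heapIdx[i]];
      i = m;
    }
  };

  for (let i = 0; i < scores.length; i++) {
    if (size < k) {
      // Fill the heap first, then heapify once it's full.
      heapScores[size] = scores[i];
      heapIdx[size] = i;
      size++;
      if (size === k) for (let j = (k >> 1) - 1; j >= 0; j--) siftDown(j);
    } else if (scores[i] > heapScores[0]) {
      // Beats the current k-th best: replace the root and restore order.
      heapScores[0] = scores[i];
      heapIdx[0] = i;
      siftDown(0);
    }
  }
  // Return indices sorted by descending score.
  return Array.from(heapIdx.slice(0, size)).sort((a, b) => scores[b] - scores[a]);
}
```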


V – The Benchmark Numbers

All benchmarks run on Apple M2 Max with 12 cores and 96GB RAM. Bun 1.3.10. The Python comparison uses bm25s 0.2.0 with both numpy and numba backends.

Indexing throughput:

Corpus Size     Python (numpy)   bun-bm25s   Speedup
10,000 docs     248 ms           117 ms      2.1x
50,000 docs     1.27 s           571 ms      2.2x
100,000 docs    2.54 s           1.20 s      2.1x

Peak indexing rate: 4.6 million tokens per second.

Retrieval throughput (1000 queries, k=10):

Corpus Size     Python (numpy)   bun-bm25s    Speedup
10,000 docs     8,369 QPS        27,103 QPS   3.2x
50,000 docs     1,984 QPS        6,086 QPS    3.1x
100,000 docs    1,067 QPS        3,181 QPS    3.0x

Against the numba JIT-compiled backend, the story is more nuanced. Numba excels at retrieval on small corpora — it achieves 211,000 QPS on 1,000 documents where bun-bm25s hits 6,800. JIT compilation pays dividends when the loop body is trivial and iteration count is low.

But bun-bm25s still wins on indexing by 2x even against numba. And on larger corpora — 50,000+ documents — retrieval performance converges.

The real advantage is deployment simplicity. No Python. No pip. No numpy. No numba. No compilation step. Just TypeScript that runs directly on Bun.


VI – The API

import { BM25, tokenize } from "bun-bm25s";

const corpus = [
  "a cat is a feline and likes to purr",
  "a dog is the human's best friend",
  "a bird is a beautiful animal that can fly",
];

const corpusTokens = tokenize(corpus);
const retriever = new BM25();
retriever.index(corpusTokens);

const queryTokens = tokenize(["does the cat purr?"]);
const { documents, scores } = retriever.retrieve(queryTokens, { k: 2 });

The API mirrors bm25s exactly. If you've used the Python library, you already know how to use this one.

Five BM25 variants are supported: Robertson (original), Lucene (default), ATIRE, BM25L, and BM25+. The tokenizer includes stopwords for twelve languages and supports custom stemming functions.

Indices can be saved to disk and loaded later:

await retriever.save("./my_index");
const loaded = await BM25.load("./my_index");

The serialization format is binary for the sparse matrix and JSON for metadata. A 100,000-document index serializes to roughly 50MB.


VII – Why Zero Dependencies Matters

The library has no runtime dependencies. Not one.

This matters for three reasons.

Supply chain security. Every dependency is an attack surface. The xz backdoor hid inside a compression library that most affected systems only pulled in indirectly, several layers down their dependency trees. Fewer dependencies means fewer things that can be compromised.

Bundle size. If you're using bun-bm25s in a serverless function or edge worker, every kilobyte counts. The entire library is under 20KB minified.

Operational simplicity. No native modules that need compilation. No platform-specific binaries. No Python interop. It runs anywhere Bun runs.

I could have used existing sparse matrix libraries. I could have pulled in a heap implementation from npm. Every time I considered it, I asked: can I implement this in under 200 lines of TypeScript?

The answer was always yes.


VIII – What I Learned

Typed arrays are fast. Really fast. The performance gap between Float64Array operations and regular JavaScript arrays is substantial. For numeric computation, always use typed arrays.

Allocation is expensive. The single biggest performance win came from reusing buffers instead of allocating new arrays. In a hot loop, even a single allocation per iteration dominates runtime.

Simple algorithms beat clever ones. The CSC matrix lookup is three array accesses. The scoring loop is one addition. There's no cleverness to go wrong, nothing to confuse the JIT compiler, no branches to mispredict.

Bun is ready for compute-heavy workloads. This isn't a web framework. It's a numerical computing library. And it runs beautifully on Bun, matching or exceeding Python with numpy, an ecosystem built specifically for numerical computing.


IX – Get It

bun add bun-bm25s

The source is at github.com/oakoliver/bun-bm25s. MIT licensed. Contributions welcome.

If you're building search into a Bun application — documentation sites, product catalogs, content platforms, internal tools — this gives you sub-millisecond retrieval with zero operational overhead.

No Elasticsearch cluster. No vector database. No API calls.

Just BM25, implemented properly, running at native speed.

What are you searching through — and have you considered that the 30-year-old algorithm might be exactly what you need?

– Antonio

"Simplicity is the ultimate sophistication."