Stop Pasting Into ChatGPT. Build a Knowledge Base Your AI Can Actually Search.
You're researching something. You have twelve browser tabs open. A PDF from last week. Notes from a call. Three articles you bookmarked and forgot about.
You copy a paragraph into ChatGPT. It gives you a decent answer. Then you need context from that PDF too, so you copy more. Then you realize the answer contradicts something from one of those tabs. So you copy that in. Your conversation is now a mess of pasted text, and tomorrow you'll start from scratch.
This is how most people use AI for research. It's painful and it doesn't scale.
I built a tool to fix this. It's called kb.
npm install -g @oakoliver/kb
I – What If Your AI Could Remember Everything You've Read?
Here's what using kb looks like in practice.
You start a knowledge base for whatever you're researching — say, machine learning papers for a project at work:
kb init ml-research
cd ml-research
Now you feed it sources. Anything you'd normally paste into ChatGPT, you give to kb instead:
kb ingest https://arxiv.org/abs/1706.03762
kb ingest ./attention-is-all-you-need.pdf
kb ingest ./meeting-notes.md
kb ingest https://github.com/huggingface/transformers
URLs get fetched and converted to clean markdown. PDFs get text-extracted. Local files get indexed. Git repos get their READMEs pulled. You don't think about formats — you just point kb at things.
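The "point kb at things" behavior boils down to detecting the source type from the shape of the input. A minimal sketch of that idea in TypeScript — kb's real detection logic is internal, so the heuristics and type names here are illustrative, not its actual code:

```typescript
// Classify an ingest target by its shape -- an illustrative sketch,
// not kb's internal implementation.
type SourceType = "url" | "pdf" | "markdown" | "git" | "unknown";

function detectSource(input: string): SourceType {
  // GitHub URLs get special handling (README extraction), so check them first.
  if (/^https?:\/\/(www\.)?github\.com\//.test(input)) return "git";
  if (/^https?:\/\//.test(input)) return "url";
  if (input.endsWith(".pdf")) return "pdf";
  if (input.endsWith(".md")) return "markdown";
  return "unknown";
}

console.log(detectSource("https://arxiv.org/abs/1706.03762")); // → "url"
console.log(detectSource("./meeting-notes.md"));               // → "markdown"
```

The ordering matters: a GitHub repo is also a URL, so the more specific pattern has to win.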
Then you compile. This is where the magic happens:
kb compile
An LLM reads everything you've ingested and produces a structured wiki — articles covering key concepts, named entities, and multi-source syntheses. Each article links to related articles with [[wikilinks]] and cites its sources.
Your twelve tabs and scattered PDF are now an organized, interlinked knowledge base.
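To make that concrete, here's roughly what one compiled article might look like. The frontmatter fields shown are illustrative — kb validates its own schema internally, and the exact field names may differ:

```markdown
---
title: Attention Mechanism
type: concept
tags: [deep-learning, sequence-models]
sources:
  - raw/arxiv-1706-03762.md
---

An attention mechanism lets a model weigh different parts of its input
when producing each element of its output...

Related: [[Transformer Architecture]], [[Sequence-to-Sequence Models]]
```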
II – The Two Ways to Get Answers
Once compiled, you have two ways to search your knowledge base.
Fast keyword search — no AI involved, instant results:
kb find "attention mechanism"
# wiki/concepts/attention-mechanism.md (0.95)
# ...allows models to focus on relevant parts of the input...
This uses BM25 ranking, the same algorithm that powers Lucene and Elasticsearch. It's fast because there's no API call. You get results in milliseconds.
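For the curious, BM25 scoring fits in a few lines. This is a toy sketch of the ranking idea over a three-document corpus — not kb's actual bm25s implementation, which adds tokenization, stemming, and index structures:

```typescript
// Toy BM25 scorer -- illustrates the ranking math, not kb's bm25s library.
const k1 = 1.5; // term-frequency saturation
const b = 0.75; // document-length normalization

const docs = [
  "attention allows models to focus on relevant parts of the input",
  "recurrent networks process tokens one step at a time",
  "the transformer relies entirely on attention mechanisms",
];

const tokenized = docs.map((d) => d.toLowerCase().split(/\s+/));
const avgLen = tokenized.reduce((s, t) => s + t.length, 0) / tokenized.length;

// Inverse document frequency: rare terms score higher.
function idf(term: string): number {
  const n = tokenized.filter((t) => t.includes(term)).length;
  return Math.log((tokenized.length - n + 0.5) / (n + 0.5) + 1);
}

function score(query: string, i: number): number {
  const doc = tokenized[i];
  return query.toLowerCase().split(/\s+/).reduce((s, term) => {
    const tf = doc.filter((w) => w === term).length;
    const norm = 1 - b + b * (doc.length / avgLen);
    return s + idf(term) * ((tf * (k1 + 1)) / (tf + k1 * norm));
  }, 0);
}

const ranked = docs
  .map((_, i) => ({ i, s: score("attention mechanisms", i) }))
  .sort((x, y) => y.s - x.s);

console.log(ranked[0].i); // → 2: the only doc matching both query terms
```

The key properties: repeated terms saturate instead of scoring linearly (`k1`), and long documents don't win just by being long (`b`).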
AI-powered Q&A — when you need synthesis across multiple sources:
kb query "How does attention differ from traditional sequence models?"
# Based on the knowledge base articles, attention mechanisms
# fundamentally changed how models handle sequential data by...
# [streams in real-time]
#
# Sources: [[Attention Mechanism]], [[Transformer Architecture]]
The AI reads the relevant articles from your compiled wiki and gives you an answer grounded in your specific sources — not the internet, not its training data, your research.
Every answer gets saved. If an answer is good enough to keep, promote it:
kb promote queries/2026-04-03-attention.md --as concept
Now it's a permanent part of your wiki, linked to everything else.
III – Your Wiki Works Everywhere
Here's the part that surprised even me.
Because kb compiles everything to plain markdown files with standard YAML frontmatter and [[wikilinks]], the output is a fully functional Obsidian vault.
Open the wiki/ folder in Obsidian and you get:
- Graph view showing how all your concepts connect
- Backlinks showing which articles reference each other
- Full-text search across everything
- Tags for filtering by topic
You didn't configure anything. You didn't install any Obsidian plugins. The wiki is just files — organized files that Obsidian already knows how to render.
This also means your knowledge base is:
- Git-friendly — every article is a markdown file with clean diffs
- Portable — no database, no proprietary format, just a folder
- Yours — no cloud service, no subscription, no vendor lock-in
IV – The Workflow That Actually Works
After using kb across several of my own projects, here's the workflow I've settled into.
Daily: Ingest anything interesting. An article a colleague shared, a documentation page, notes from a meeting. Takes five seconds:
kb ingest https://interesting-article.com
Weekly: Compile and review. See what new concepts the LLM extracted, check if the connections make sense:
kb compile
kb status
# Sources: 42 (3 new)
# Articles: 87
# - Concepts: 45
# - Entities: 32
# - Syntheses: 10
# Health: 2 stale, 1 orphan
On demand: Ask questions. Instead of context-switching to ChatGPT and re-pasting everything, just ask your knowledge base:
kb query "What are the key tradeoffs between BM25 and vector search?"
Periodically: Lint and clean up. Make sure nothing is broken:
kb lint
# ✗ Broken link: wiki/concepts/foo.md → [[Nonexistent]]
# Found 1 error
The knowledge base grows incrementally. You never reprocess unchanged content — kb tracks content hashes and only recompiles what changed. A wiki with a thousand articles and five new sources recompiles in seconds.
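Content-hash tracking is a simple pattern worth showing. A sketch, assuming a manifest mapping source paths to hashes — the field layout of kb's real manifest is internal, so treat these names as hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape: source path -> content hash from last compile.
type Manifest = Record<string, string>;

function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Return only the sources whose content changed since the last compile.
function changedSources(
  sources: Record<string, string>, // path -> current content
  manifest: Manifest,
): string[] {
  return Object.entries(sources)
    .filter(([path, content]) => manifest[path] !== hashContent(content))
    .map(([path]) => path);
}

const manifest: Manifest = { "raw/a.md": hashContent("old notes") };
const sources = { "raw/a.md": "old notes", "raw/b.md": "new article" };
console.log(changedSources(sources, manifest)); // → [ "raw/b.md" ]
```

Unchanged files hash to the same value and get skipped, which is why five new sources recompile in seconds regardless of total wiki size.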
V – What's Under the Hood
I didn't build this with duct tape. kb runs on a stack I've been developing across dozens of open-source packages — all TypeScript, all running on Bun, all with zero external dependencies.
The pipeline:
Sources (URLs, PDFs, markdown, git repos)
│
▼
Ingestion ── type detection, fetching, text extraction
│
▼
raw/ ── clean markdown with manifest tracking
│
▼
Compilation ── LLM extracts concepts, entities, relationships
│
▼
wiki/ ── interlinked articles with YAML frontmatter
│
├──▶ kb find (BM25 keyword search, no LLM)
└──▶ kb query (LLM synthesis with source citations)
The stack:
| Component | What it does |
|---|---|
| Bun | Runtime — native TypeScript, fast startup, single binary builds |
| bm25s | Full-text search — my TypeScript port of Python's fastest BM25 library |
| pageindex | PDF text extraction — no native dependencies |
| lipgloss | Terminal styling — ported from Go's lipgloss |
| bubbles | Spinners and TUI components — ported from Go's Bubbles |
| glamour | Markdown rendering in terminal — ported from Go's Glamour |
| zod | Schema validation for article frontmatter |
Every one of the terminal UI libraries is a port I wrote from Go's Charm ecosystem to TypeScript. The search library is a rewrite of Python's bm25s that ended up 2x faster than the original. The PDF library extracts text without shelling out to system tools.
The numbers: 8 commands. 27 modules. 154 tests with 322 assertions. Zero runtime dependencies outside the ecosystem. Builds to a standalone binary with bun build --compile.
VI – It's Scriptable Too
Everything kb outputs has two modes: pretty text for humans, structured JSON for machines.
Run kb status in a terminal and you get a formatted dashboard. Pipe it somewhere and you get JSON:
status=$(kb status --json)
stale=$(echo "$status" | jq '.health.stale')
if [ "$stale" -gt 0 ]; then
  kb compile
fi
This makes kb composable with other tools. Batch-ingest URLs from a file:
while read -r url; do
  kb ingest "$url"
done < urls.txt
kb compile
Hook it into a CI pipeline that keeps a team knowledge base fresh. Pipe query results into another LLM for further processing. The JSON output mode means kb plays well with any automation you throw at it.
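The same JSON contract is easy to consume from TypeScript. A sketch — the field names below are assumptions modeled on the `kb status` output shown earlier, not a documented schema, and in a real script `raw` would come from running `kb status --json` as a subprocess:

```typescript
// Assumed shape of `kb status --json`, inferred from the human-readable output.
interface KbStatus {
  sources: { total: number; new: number };
  articles: { total: number };
  health: { stale: number; orphans: number };
}

// Decide whether a recompile is needed based on staleness and new sources.
function needsRecompile(raw: string): boolean {
  const status = JSON.parse(raw) as KbStatus;
  return status.health.stale > 0 || status.sources.new > 0;
}

// Stand-in for the subprocess call, using the numbers from the status example.
const raw = JSON.stringify({
  sources: { total: 42, new: 3 },
  articles: { total: 87 },
  health: { stale: 2, orphans: 1 },
});
console.log(needsRecompile(raw)); // → true
```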
VII – When to Use This (And When Not To)
Use kb when:
- You're researching a topic across many sources and need a way to organize and query them
- You want an AI that answers from your specific material, not the open internet
- You like Obsidian and want a knowledge base that works as a vault
- You need something local, private, and file-based — no cloud required
- You're building automation around knowledge management
Don't use kb when:
- You need production-scale RAG serving thousands of users — this is a personal/small-team tool
- You need semantic vector search — kb uses keyword search (BM25), which covers the vast majority of lookup-style queries but won't match on meaning alone
- Your corpus is millions of documents — kb is optimized for hundreds to low thousands of articles
This isn't a replacement for Pinecone or Weaviate. It's the tool you reach for when you want a knowledge base that's as simple as a folder of markdown files, backed by an AI that actually knows what's in them.
Get started:
npm install -g @oakoliver/kb
kb init my-research
kb ingest https://some-interesting-article.com
kb compile
kb query "What did I just read?"
Links: