Semantic Search for Your Obsidian Vault — What I Tried and What Worked
I have around 2,400 notes in Obsidian — atomic notes, book summaries, meditation research, project files, writing drafts. At that scale, built-in search becomes unreliable for anything beyond finding a file you already know exists. If you want to find what you wrote about staying focused under pressure, or which books relate to a current project, keyword search fails you. You used different words when you wrote those notes.
This is a walkthrough of the three approaches I tried: Grep, OmniSearch, and QMD. What each does, how they actually perform on a real vault, and how I set up QMD for long-term use with Claude Code as an AI agent.
The Problem with Default Search
Obsidian’s built-in search and Grep both operate on the same principle: find documents containing these characters. “Meditation” finds documents with the word meditation. It won’t find notes about sustained attention or contemplative practice — unless those exact phrases appear.
At a few hundred notes this is manageable. At 2,400+, conceptual queries return either nothing or too much noise to be useful. I want to ask “what do I know about motivation decay over long projects?” and get relevant results — not hope I happened to write those exact words.
Three Approaches
Grep
The baseline. Fast, exact, no setup required.
```
grep -rl "meditation practice" Notes/ Hubs/ --include="*.md"
```

What you get: file paths, unranked, alphabetical. No snippets, no relevance scores, no semantic awareness. Useful when you know exactly what you’re looking for and roughly where it lives. For discovery across a large vault, it’s close to useless — and the output is just paths, so you still need to open files to see what’s actually relevant.
Running it against “meditation practice” across my vault:
Speed: <1s
Matched: 20 files
Output: ~568 tokens (file paths only)
```
Notes/Appamada - Five Hindrances Talk.md
Notes/GSA Meditation Class Outline.md
Notes/Three types of Zazen.md
Notes/Open monitoring meditation reduces the involvement...md
Planner/Projects/Done/PR-2023-GSA-Meditation-Class.md
... (15 more)
```

The token count is low because grep returns nothing but paths. Downstream, you’d need to read the relevant files to get any content — multiplying the actual context cost.
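If you need the matching lines and not just paths, grep can do slightly better with `-n` and a per-file match limit. A sketch against a throwaway fixture (the /tmp paths and file contents are invented for the demo; point it at your real vault folders in practice):

```shell
# Throwaway fixture so the command is runnable as-is.
mkdir -p /tmp/vault-demo/Notes
cat > /tmp/vault-demo/Notes/zazen.md <<'EOF'
Three types of Zazen.
Each is a distinct meditation practice with its own posture.
EOF

# -n adds line numbers, -m1 stops after the first match per file,
# so each hit becomes a one-line snippet instead of a bare path.
grep -rn -m1 "meditation practice" /tmp/vault-demo/Notes --include="*.md"
```

Still no ranking and no semantic awareness, but at least the output tells you why each file matched.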
OmniSearch
OmniSearch is an Obsidian community plugin that builds a full-text index of your vault and exposes a local HTTP API. Under the hood it uses BM25 — the same ranking algorithm behind Elasticsearch’s default search — which scores results by term frequency and document length rather than binary yes/no matching.
Installing the plugin and enabling the HTTP server gives you this from any terminal or AI agent:
```
curl -s "http://localhost:51361/search?q=meditation+practice" \
  | jq '.[:10] | map({path, basename, score})'
```

Fast, ranked, and no model loading. It searches your entire vault including folders Obsidian indexes (Hubs, Notes, Writing Journal, Planner). The results against “meditation practice”:
Speed: <1s (running as Obsidian background daemon)
Results: 54 ranked documents
Tokens: ~200 (as used: paths + scores only)
~16,847 (raw response — OmniSearch returns full document excerpts)
Top 5:
```
417 Hubs/Meditation Hub.md
307 Writing Journal/zenist/meditation-guide/meditation-guide-draft.md
298 Notes/GSA Meditation Class Outline.md
276 Writing Journal/zenist/meditation-guide/outline.md
232 Writing Journal/zenist/meditation-guide/PR-2025-Launch Meditation Guide.md
```

The top result — the Meditation Hub — is a Map of Content (MOC): a navigational note linking out to dozens of related atomic notes. BM25 naturally favors it because MOCs are dense with keywords and highly linked. That’s a win here.
Two caveats worth knowing upfront. First, OmniSearch requires Obsidian to be running — when you close Obsidian, the HTTP server stops. For an AI agent that might search your vault from a terminal session without Obsidian open, this is a real constraint. Second, OmniSearch is frequently described as “semantic search.” It isn’t. It’s BM25 — fast and useful, but keyword-based. Search for “staying focused” and it won’t surface notes about attention unless those words appear.
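Because the server dies with Obsidian, an agent-facing wrapper needs a liveness check. A sketch, assuming the default port 51361 and a placeholder vault path:

```shell
# Liveness guard: use OmniSearch when its HTTP server responds,
# otherwise fall back to plain grep. Port 51361 is the plugin's
# default; VAULT is a placeholder path.
VAULT="${VAULT:-$HOME/vault}"

vault_search() {
  local query="$1"
  # --max-time keeps the agent from hanging when Obsidian is closed.
  if curl -s --max-time 1 "http://localhost:51361/search?q=${query// /+}" 2>/dev/null; then
    return 0
  fi
  echo "OmniSearch unavailable, falling back to grep" >&2
  grep -rl "$query" "$VAULT" --include="*.md" 2>/dev/null || true
}

vault_search "meditation practice"
```

Either branch exits cleanly, so the calling agent can treat an empty result as "nothing found" rather than an error.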
QMD
QMD is a command-line search tool built by Tobi Lütke — Shopify’s CEO — as a personal tool for searching his own knowledge base. It runs entirely locally: no API calls, no cloud, three GGUF models downloaded on first use and cached in ~/.cache/qmd/models/.
Three search modes with meaningfully different quality-speed tradeoffs:
| Mode | Command | Mechanism | Speed |
|---|---|---|---|
| Keyword | qmd search | BM25 only | <1s |
| Semantic | qmd vsearch | Vector similarity | ~1s |
| Hybrid | qmd query | BM25 + Vector + Reranking | ~5–21s |
The hybrid mode is where QMD earns its keep.
How QMD’s Pipeline Works
Understanding what qmd query actually does helps you know when to use it and what to expect from the latency.
Step 1: Query Expansion
Before searching, QMD generates two alternative phrasings using a fine-tuned 1.7B language model (qmd-query-expansion). For “meditation practice,” it generated:
```
Original query (weighted 2×): meditation practice
Expanded 1: definition of meditation and its benefits
Expanded 2: importance of meditation in various religions
HyDE variant: Understanding meditation practice is essential for...
```

Your original query gets double weight to preserve intent. The expansions bridge the gap between how you’re asking now and how you wrote those notes months ago.
Step 2: Parallel BM25 + Vector Search
For each of the three query variants, QMD runs both BM25 full-text search and vector similarity search in parallel — 6 search passes total.
BM25 you already know from OmniSearch: term frequency, document length normalization, fast.
Vector search is different. It converts text to numerical representations (embeddings) where semantic similarity becomes geometric proximity. QMD uses Google’s embeddinggemma-300M — a 300M parameter model that runs quickly on Apple Silicon via Metal. Notes about “sustained attention” end up geometrically near notes about “focused awareness” in this space, even if they share no keywords. That’s the capability BM25 can’t replicate.
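The “geometric proximity” idea can be illustrated without any model at all. A toy sketch: cosine similarity over hand-made 3-dimensional vectors (real embeddings from embeddinggemma-300M have hundreds of dimensions; every number here is invented for the demo):

```shell
# Cosine similarity of two comma-separated vectors:
# dot(a, b) / (|a| * |b|), in plain awk.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]^2; nb += y[i]^2 }
    printf "%.2f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Pretend embeddings: "sustained attention" vs "focused awareness"
cosine "0.9,0.1,0.2" "0.8,0.2,0.3"
# Pretend embeddings: "sustained attention" vs "sourdough recipe"
cosine "0.9,0.1,0.2" "0.1,0.9,0.1"
```

The first pair scores far higher than the second with zero shared keywords — that proximity is what vector search ranks on.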
Step 3: Reciprocal Rank Fusion
Six result lists, each with different documents at different positions. Reciprocal Rank Fusion (RRF) merges them: each document scores based on its rank position across all lists, not its raw score. A document placing first in any list gets a +0.05 bonus; positions 2–3 get +0.02. Top 30 candidates advance.
RRF is more robust than averaging scores because raw BM25 numbers (0–25+) and cosine similarity scores (0–1.0) aren’t on the same scale. Rank positions are.
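The fusion step can be sketched with the textbook RRF formula, score(d) = Σ 1/(k + rank), using the conventional k = 60. QMD’s variant adds the top-rank bonuses described above, but the rank-based merging works the same way:

```shell
# Textbook Reciprocal Rank Fusion over two toy result lists.
# List 1 (BM25):   docA, docB, docC
# List 2 (vector): docC, docA, docD
awk 'BEGIN {
  k = 60
  split("docA docB docC", l1, " ")
  split("docC docA docD", l2, " ")
  for (r = 1; r <= 3; r++) score[l1[r]] += 1 / (k + r)
  for (r = 1; r <= 3; r++) score[l2[r]] += 1 / (k + r)
  for (d in score) printf "%s %.4f\n", d, score[d]
}' | sort -k2 -nr
```

docA (ranks 1 and 2) edges out docC (ranks 3 and 1): consistent high placement across lists beats a single first-place finish, and the raw BM25 and cosine scores never had to be compared directly.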
Step 4: LLM Reranking
The top 30 candidates get re-read by Qwen3-Reranker-0.6B — a small language model fine-tuned specifically to judge relevance. For each candidate it asks: does this document answer the query? It returns yes/no confidence via logprobs, which blend back with RRF scores in a position-aware way:
- Top 3 results: 75% RRF, 25% reranker (preserves high-confidence exact matches)
- Results 4–10: 60% RRF, 40% reranker
- Results 11+: 40% RRF, 60% reranker (trust the reranker more when retrieval confidence is lower)
The result is scores that actually mean something. A 0.93 result is genuinely different from a 0.43 result — the gap tells you how confident the pipeline is. QMD’s pure BM25 mode returns everything at 0.88–0.89, which tells you almost nothing.
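The position-aware blend is simple to sketch. The weights are the ones listed above; the RRF and reranker scores are invented for the demo:

```shell
# Blend RRF and reranker scores with rank-dependent weights.
blend() { # usage: blend <rank> <rrf_score> <reranker_score>
  awk -v r="$1" -v rrf="$2" -v rr="$3" 'BEGIN {
    if      (r <= 3)  w = 0.75   # top 3: trust retrieval
    else if (r <= 10) w = 0.60
    else              w = 0.40   # deep results: trust the reranker
    printf "rank %d -> %.3f\n", r, w * rrf + (1 - w) * rr
  }'
}

blend 1  0.9 0.8
blend 5  0.9 0.8
blend 15 0.9 0.8
```

Same input scores, three different final values: the deeper a candidate sits, the more the reranker’s judgment counts.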
Installation
QMD requires Node.js v22 specifically. Not v23, not v25 — better-sqlite3 (a dependency) hasn’t caught up to later versions. If you’re running a newer Node, use fnm to manage versions:
```
# Install fnm (fast Node version manager, written in Rust)
brew install fnm

# Pin to Node 22
fnm use 22 --install-if-missing

# Install QMD
bun install -g @tobilu/qmd
# or
npm install -g @tobilu/qmd
```

macOS ships with a SQLite version that lacks the FTS5 extension QMD needs. Install the Homebrew version first:

```
brew install sqlite
```

On first use of qmd query, it downloads three models to ~/.cache/qmd/models/:
| Model | Purpose | Size |
|---|---|---|
| embeddinggemma-300M-Q8_0.gguf | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0.gguf | Re-ranking | ~640MB |
| qmd-query-expansion-1.7B-q4_k_m.gguf | Query expansion | ~1.1GB |
Plan for ~2GB on first run. Subsequent runs load from cache — model load time on Apple Silicon M1 Pro is around 1–2 seconds.
Collection Design
QMD organizes content into named collections, and this is where setup decisions matter most. The naive approach — one collection pointing at your entire vault — works, but misses the most valuable feature: context.
Context is descriptive metadata attached to a collection that travels with every search result:
```
qmd://hubs/meditation-hub.md #a3f2c1
Title: Meditation Hub
Context: Maps of Content (MOCs) — topic indexes that link and organize related notes
Score: 47%
```

When an AI agent sees that context label, it knows this is a navigational document — not atomic content. That shapes how it uses the result. Without context, all results look the same regardless of what kind of note they are.
Don’t use one giant collection. Multiple focused collections with distinct contexts give the reranker and any downstream LLM much better signal about what it’s looking at.
For a vault with ~2,400 notes, I ended up with four collections:
| Collection | Path | Mask | Files | Context |
|---|---|---|---|---|
| obsidian-notes | Notes/ | *.md | 2,036 | "Atomic notes and research — ideas, concepts, tools, and reference material" |
| books | Notes/Books/ | **/*.md | 334 | "Book notes and summaries — insights extracted from books" |
| hubs | Hubs/ | **/*.md | 117 | "Maps of Content (MOCs) — topic indexes linking related notes across the vault" |
| projects | Planner/Projects/ | **/*.md | 186 | "Project notes — active, on-hold, and completed projects" |
Note that books is a subfolder of Notes/ — I excluded it from obsidian-notes by using *.md (flat, not recursive) as the mask. If I’d used **/*.md for both, the 334 book notes would appear in two collections.
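The flat-versus-recursive distinction is ordinary glob semantics, which you can sanity-check in bash (QMD’s matcher is its own implementation; this only illustrates the pattern behavior, and the fixture paths are invented):

```shell
# Fixture mirroring the Notes/Books layout.
mkdir -p /tmp/mask-demo/Notes/Books
touch /tmp/mask-demo/Notes/attention.md /tmp/mask-demo/Notes/Books/deep-work.md

cd /tmp/mask-demo
shopt -s globstar nullglob

# Flat mask: top-level notes only, Books/ excluded.
echo "flat:" Notes/*.md
# Recursive mask: Books/ files match too — the double-indexing risk.
echo "recursive:" Notes/**/*.md
```

With `*.md` only `attention.md` appears; with `**/*.md` the Books subfolder matches as well, which is exactly why the parent collection uses the flat mask.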
Setup Commands
```
# Add collections
qmd collection add ~/vault/Notes --name obsidian-notes --mask "*.md"
qmd collection add ~/vault/Notes/Books --name books --mask "**/*.md"
qmd collection add ~/vault/Hubs --name hubs --mask "**/*.md"
qmd collection add ~/vault/Planner/Projects --name projects --mask "**/*.md"

# Add context — do this BEFORE embedding
qmd context add qmd://obsidian-notes "Atomic notes and research — ideas, concepts, tools, and reference material"
qmd context add qmd://books "Book notes and summaries — insights and highlights extracted from books"
qmd context add qmd://hubs "Maps of Content (MOCs) — topic indexes that link and organize related notes across the vault"
qmd context add qmd://projects "Project notes — active, on-hold, and completed projects with goals and next actions"

# Generate embeddings (downloads models on first run — ~2GB)
qmd embed
```

Add context before running qmd embed. If you add it after, you’ll need qmd embed -f to force a full re-embed.
Verify the setup:
```
qmd status
```

```
QMD Status
  Index: ~/.cache/qmd/index.sqlite
  Size: 54.5 MB

Documents
  Total: 2,673 files indexed
  Vectors: 2,673 embedded
  Updated: just now

Collections
  obsidian-notes (qmd://obsidian-notes/)
    Pattern: *.md
    Files: 2036
    Contexts: 1
      /: Atomic notes and research — ideas, concepts, tools, and r...
  books (qmd://books/)
    Pattern: **/*.md
    Files: 334
  ...
```

Keeping the Index Current
Two commands, different costs.
qmd update re-scans the filesystem and updates the BM25 index. Fast, no model loading, can run frequently.
qmd embed loads the embedding model into VRAM, processes newly indexed documents, generates vectors. It’s incremental — only processes what’s changed — but the model load overhead is there every run regardless.
For a personal vault, once daily is sufficient. Most searches are for notes written days or weeks ago.
I already had a launchd agent that syncs Things tasks into my daily notes each morning — adding qmd update && qmd embed to that same script means one job handles all the overnight vault maintenance. The reindex runs at 5:15am, right after the daily note catchup and before a git snapshot commits everything.
If you don’t have an existing automation to piggyback on, a cron job works fine:
```
0 3 * * * /path/to/qmd update && /path/to/qmd embed
```

Either way, if you installed via fnm, the qmd binary won’t be on the default cron or launchd PATH. Use the full path — find it with which qmd.
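If you’d rather go the launchd route on macOS, a minimal agent for an early-morning run might look like this. Everything here is a placeholder sketch — the label, the `/opt/homebrew/bin/qmd` path, and the log location are assumptions; substitute the path `which qmd` gives you:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.qmd-reindex</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>-c</string>
    <!-- Full paths: launchd does not load your shell PATH -->
    <string>/opt/homebrew/bin/qmd update &amp;&amp; /opt/homebrew/bin/qmd embed</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>5</integer>
    <key>Minute</key><integer>15</integer>
  </dict>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-reindex.log</string>
</dict>
</plist>
```

Save it under ~/Library/LaunchAgents/ and load it with launchctl.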
Search Comparison
Same query — “meditation practice” — across all three methods. Vault at this point: ~2,673 indexed documents across four collections.
| Method | Time | Output tokens | Results |
|---|---|---|---|
| Grep | <1s | ~568 | 20 files (paths, unranked) |
| OmniSearch | <1s | ~200* | 54 ranked results |
| QMD search (BM25) | <1s | ~1,502 | 10 snippets |
| QMD query (hybrid) | ~2s warm / ~21s cold† | ~1,471 | 10 snippets, reranked |
*Passing {path, basename, score} to the LLM — the raw OmniSearch response is ~16,847 tokens because it includes full document excerpts.
†Cold start loads ~1GB of models from disk. Once warm, subsequent queries in the same session run in ~2s.
Full results:
OmniSearch top 5:
```
417 Hubs/Meditation Hub.md
307 Writing Journal/meditation-guide/meditation-guide-draft.md
298 Notes/GSA Meditation Class Outline.md
276 Writing Journal/meditation-guide/outline.md
232 Writing Journal/meditation-guide/PR-2025-Launch Meditation Guide.md
```

QMD search (BM25 only) top 5:

```
0.89 attention-regulation-and-monitoring-in-meditation [obsidian-notes]
0.89 open-monitoring [obsidian-notes]
0.89 PR-2023-GSA-Meditation-Class [projects]
0.89 pitch-bio [obsidian-notes]
0.88 focused-attention [obsidian-notes]
```

QMD query (hybrid + rerank) top 5:

```
0.93 GSA Meditation Class Outline [obsidian-notes]
0.55 open-monitoring [obsidian-notes]
0.47 Meditation Hub [hubs]   ← collection context working
0.46 PR-2023-GSA-Meditation-Class [projects]
0.45 pitch-bio [obsidian-notes]
```

A few things are worth pointing out.
QMD’s pure BM25 scores are nearly identical (0.88–0.89) across the top results — there’s almost no discrimination. The Meditation Hub is in the BM25 index (it ranks #23 overall at score 0.88), but score bunching means it never makes the default top-10 cutoff. The reranker is what introduces meaningful spread: 0.93 versus 0.55 versus 0.47 tells you something real about relative relevance. Without reranking, you’re picking blindly from a tied list.
OmniSearch surfaces the Meditation Hub as its top result because MOCs are dense with keywords and heavily linked — BM25 naturally rewards that. The hybrid mode surfaces it at #3, correctly labeled as a Hub document. The collection context label is doing real work: the reranker knows this is a navigational document, not atomic content, which affects how it scores relevance.
The 21-second cold start is the main friction. Once models are loaded in VRAM, subsequent queries in the same session run in a few seconds. If you’re running QMD inside an AI agent loop that makes multiple searches per session, only the first query pays the full cost.
Using QMD with Claude Code
If you’re using Claude Code as an agent over your Obsidian vault, QMD replaces both Grep and OmniSearch for conceptual searches. Add this to your vault’s CLAUDE.md:
```
## Searching the Vault

Use `qmd` instead of Grep when searching for vault content by topic or concept.

- `qmd query "..." --json -n 10` — best quality, use for most searches (hybrid + rerank)
- `qmd search "..." --json -n 10` — faster BM25, use for exact keyword lookups
- No `-c` flag searches all collections; use `-c books`, `-c hubs` etc. to target specific ones
- `qmd get "path/to/file.md"` — retrieve full document content

Fall back to Grep only for exact string matches when you know the file location.
```

The --json flag returns structured output with file, title, score, and snippet for each result — enough for the agent to decide whether to fetch the full document. Snippets alone are often sufficient for linking and reference tasks, which keeps context costs low.
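As a sketch of the agent-side filtering this enables, here is a jq pass over a mocked-up response. The field names (file, title, score, snippet) follow the description above, but the data and the 0.4 threshold are invented; check actual `qmd query --json` output on your own vault for the exact schema:

```shell
# Mocked-up results file standing in for a real qmd --json response.
cat <<'EOF' > /tmp/qmd-results.json
[
  {"file": "Notes/open-monitoring.md", "title": "open-monitoring", "score": 0.55, "snippet": "..."},
  {"file": "Hubs/Meditation Hub.md", "title": "Meditation Hub", "score": 0.47, "snippet": "..."},
  {"file": "Notes/pitch-bio.md", "title": "pitch-bio", "score": 0.12, "snippet": "..."}
]
EOF

# Keep only results the reranker is reasonably confident about.
jq -r '.[] | select(.score >= 0.4) | "\(.score)\t\(.file)"' /tmp/qmd-results.json
```

Because hybrid-mode scores carry real meaning, a threshold like this actually works — something the bunched 0.88–0.89 BM25 scores can’t support.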
What I Learned
The collection structure matters more than the search mode. Before splitting the vault into multiple collections with distinct contexts, QMD and OmniSearch returned similar results. After — with the reranker knowing that one result is a Map of Content and another is an atomic note on attention research — the quality difference became real. Context is the feature, not just a label.
OmniSearch isn’t obsolete. Its tight integration with Obsidian metadata (tags, links, aliases) means it reliably surfaces well-linked hub files. For a Claude Code agent searching the vault without Obsidian running, QMD is clearly the better choice. For quick in-Obsidian lookup, OmniSearch is still instant and zero-overhead. They’re complementary rather than competing — I just stopped using OmniSearch from within Claude Code.
The token cost picture is more nuanced than it looks. OmniSearch’s raw response is 16k tokens, but in practice you pass only paths and scores (~200 tokens). QMD returns snippets (~1,500 tokens) that often contain enough context to avoid reading the full file. The real cost difference shows up downstream: fewer follow-up file reads.
Caveats
Node v22 only. The better-sqlite3 dependency hasn’t caught up to Node v23+. Use fnm to pin the version — don’t try to work around it.
21-second cold start. The first qmd query in a session loads ~1GB of models into VRAM. Subsequent queries are fast. If latency is a blocker, qmd search (BM25 only) is instant with acceptable quality for keyword queries.
Multiple -c flags are silently broken. You can filter by one collection (-c hubs) but not two. Only the last value is used — a known bug. Workaround: search all collections (no flag) and filter by the context field in results.
No exclude flag. The --mask parameter controls what gets indexed but there’s no --exclude. To exclude a subfolder from a parent collection, use a flat mask (*.md) rather than recursive (**/*.md). This only works cleanly if the subfolder you want to exclude is the only one — otherwise you lose all subdirectories.
iCloud sync and file timestamps. If your vault lives in iCloud Drive, modification times can be unreliable after sync events. qmd update uses mtimes to detect changes — if you notice stale results after a sync, force a full re-index with qmd collection remove <name> && qmd collection add ....
Add context before embedding. If you configure context after running qmd embed, the context won’t appear in existing results until you force a re-embed with qmd embed -f.