Semantic Search
OpenTusk ships a local semantic index for your files. The @opentusk/indexer package maintains a SQLite + vector database on your machine, embeds file content with a local model, and exposes search over both the CLI and MCP tools. No content ever leaves your machine.
How it works
Everything runs locally:
- Model: `all-MiniLM-L6-v2` (384-dim, ~80 MB ONNX). CPU only; no GPU needed.
- Database: SQLite with a vector extension, stored at `~/.opentusk/index/index.db`.
- Chunking: ~512 tokens per chunk, ~50 tokens of overlap, respecting paragraph boundaries.
- Content: plaintext is indexed at decryption time, during uploads and downloads. The index contains chunks of decrypted content, so treat `~/.opentusk/index/` as sensitive (the files in it should have mode `0600`).
Because indexing happens on plaintext and shared-vault files are encrypted end-to-end, indexing is a local concern — only files you can already decrypt are indexed.
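The chunking rule in the list above can be sketched as follows. This is a minimal illustration, not the indexer's actual implementation: it approximates tokens with whitespace-separated words (the real model counts tokens with its own tokenizer), and `chunkText` and its defaults are hypothetical, chosen only to mirror the ~512/~50 figures.

```typescript
// Hypothetical sketch of paragraph-aware chunking with overlap.
// Assumption: tokens are approximated by whitespace-separated words.
function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  if (overlapTokens < 0 || overlapTokens >= maxTokens) {
    throw new Error("overlapTokens must be in [0, maxTokens)");
  }
  // Split on blank lines into paragraphs, then each paragraph into words.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.split(/\s+/).filter((w) => w.length > 0))
    .filter((words) => words.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let fresh = 0; // words added since the last emitted chunk

  const flush = () => {
    if (fresh === 0) return; // never emit a chunk made only of overlap
    chunks.push(current.join(" "));
    // Carry the tail of this chunk into the next one as overlap.
    current = overlapTokens > 0 ? current.slice(-overlapTokens) : [];
    fresh = 0;
  };

  for (const words of paragraphs) {
    // Prefer breaking between paragraphs when the next one would not fit.
    if (current.length + words.length > maxTokens) flush();
    for (const word of words) {
      // An oversized paragraph is split mid-paragraph at word boundaries.
      if (current.length >= maxTokens) flush();
      current.push(word);
      fresh++;
    }
  }
  flush();
  return chunks;
}

// Example with tiny limits:
// chunkText("alpha beta gamma\n\ndelta epsilon\n\nzeta eta theta iota", 5, 1)
// returns ["alpha beta gamma delta epsilon", "epsilon zeta eta theta iota"]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk's neighborhood, which helps recall at the cost of a little duplicate storage.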
Quick start
1. Install the indexer

   ```sh
   npm install -g @opentusk/indexer
   ```

   The indexer is a separate package, so skipping it keeps the CLI small. Install it only on machines that need search.

2. Download the model and initialize the database

   ```sh
   opentusk index setup
   ```

   The first run downloads ~80 MB to `~/.opentusk/index/models/`. Subsequent runs are instant.

3. Sync your vaults

   ```sh
   opentusk index sync                 # all vaults
   opentusk index sync --vault <id>    # one vault
   opentusk index sync --force         # re-index everything
   ```

   This lists files via the API, downloads and decrypts anything the index doesn’t already have (compared by SHA-256 content hash), indexes it, and then discards the plaintext bytes.

4. Search

   ```sh
   opentusk search "deployment runbook"
   opentusk search "invoice Acme Q1" --vault <id> --limit 5
   opentusk search "pdf report" --type application/pdf --min-score 0.5
   opentusk search "design doc" --json
   ```
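The sync step compares content hashes to decide what to download. A minimal sketch of that diff, assuming the listing API returns one SHA-256 hash per file ID and the local index remembers the hash it last indexed for each ID; `RemoteFile`, `sha256Hex`, and `filesToIndex` are hypothetical names, not the indexer's actual API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one entry from the file listing.
interface RemoteFile {
  id: string;
  sha256: string; // assumed: hex content hash reported by the API
}

// Hex SHA-256 of a byte buffer.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Return the files that must be downloaded and (re-)indexed.
// `indexed` maps file ID -> hash recorded at the last indexing run.
function filesToIndex(
  remote: RemoteFile[],
  indexed: Map<string, string>,
  force = false,
): RemoteFile[] {
  if (force) return remote; // like `--force`: re-index everything
  // New files (no recorded hash) and changed files (different hash).
  return remote.filter((f) => indexed.get(f.id) !== f.sha256);
}
```

Unchanged files are skipped without being downloaded or decrypted, which is what keeps repeated syncs cheap.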
Search filters
| Flag | Purpose |
|---|---|
| `--vault <id>` | Limit results to a single vault |
| `--folder <path>` | Folder path prefix (recursive) |
| `--type <mime>` | MIME type filter, e.g. `application/pdf` |
| `--limit <n>` | Maximum number of results (default 10) |
| `--min-score <n>` | Similarity threshold from 0 to 1 (default 0.3) |
| `--json` | Print the raw `SearchResult[]` instead of styled output |
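To make the filter semantics concrete, here is a sketch that applies the same rules to an in-memory list of scored hits. It assumes `score` is a cosine similarity between the query embedding and a chunk embedding, and that hits carry the hypothetical fields in `Hit`; the real `SearchResult` shape, and where the indexer applies each filter (in SQL or afterwards), may differ.

```typescript
// Hypothetical hit shape; the real SearchResult fields may differ.
interface Hit {
  vaultId: string;
  path: string; // e.g. "docs/ops/runbook.md"
  mimeType: string;
  score: number; // assumed: cosine similarity, higher is more similar
}

interface Filters {
  vault?: string; // --vault <id>
  folder?: string; // --folder <path>, recursive prefix match
  type?: string; // --type <mime>
  limit?: number; // --limit <n>, default 10
  minScore?: number; // --min-score <n>, default 0.3
}

// Cosine similarity between two embedding vectors (e.g. 384-dim).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function applyFilters(hits: Hit[], f: Filters = {}): Hit[] {
  const limit = f.limit ?? 10;
  const minScore = f.minScore ?? 0.3;
  // Match whole path segments so "docs" does not match "docs2/...".
  const prefix =
    f.folder === undefined ? undefined : f.folder.endsWith("/") ? f.folder : `${f.folder}/`;
  return hits
    .filter((h) => f.vault === undefined || h.vaultId === f.vault)
    .filter((h) => prefix === undefined || h.path.startsWith(prefix))
    .filter((h) => f.type === undefined || h.mimeType === f.type)
    .filter((h) => h.score >= minScore)
    .sort((x, y) => y.score - x.score) // best matches first
    .slice(0, limit);
}
```

Raising `--min-score` trades recall for precision; the 0.3 default keeps loosely related matches, while 0.5 (as in the Quick start example) returns only closer ones.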
Automatic indexing
When you upload or download a file through the CLI or MCP tools, the plaintext bytes are passed to the indexer in a fire-and-forget call: the transfer doesn’t wait for indexing to finish. If the indexer isn’t installed or isn’t running, nothing happens; the file simply isn’t indexed, and you can catch up later with `opentusk index sync`.
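A fire-and-forget notification of this kind might look like the sketch below. `IndexerClient` and `notifyIndexer` are hypothetical names, not OpenTusk's API; the point is that the upload or download path never awaits the indexer and never fails because of it.

```typescript
// Hypothetical client interface; the real indexer API may differ.
interface IndexerClient {
  indexFile(fileId: string, plaintext: Uint8Array): Promise<void>;
}

// Start indexing without blocking the caller or surfacing indexer errors.
// `null` stands for "indexer not installed": the call is a no-op.
function notifyIndexer(
  client: IndexerClient | null,
  fileId: string,
  plaintext: Uint8Array,
): void {
  if (client === null) return;
  try {
    void client.indexFile(fileId, plaintext).catch(() => {
      // Background failure: ignore it; `opentusk index sync` can catch up.
    });
  } catch {
    // The client threw before returning a promise: ignore that too.
  }
}
```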
Stdio vs HTTP daemon
The indexer has two transports. You don’t need to pick one: the CLI handles stdio automatically.
| Transport | When | How |
|---|---|---|
| stdio (default) | Single user, one machine, ad-hoc use | CLI spawns the indexer as a child process per command |
| HTTP daemon | Multiple MCP agents sharing one index, IDE plugins, long-lived setups | Run `opentusk-indexer serve --port 7600` and set `OPENTUSK_INDEXER_URL=http://localhost:7600` |
Both transports expose the same API.
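One way a client could choose between the two transports: use the HTTP daemon when `OPENTUSK_INDEXER_URL` is set, and otherwise fall back to a per-command stdio child process. This is a sketch of that selection rule only; `Transport` and `pickTransport` are hypothetical, and the actual CLI logic may differ.

```typescript
type Transport =
  | { kind: "http"; baseUrl: URL } // shared long-lived daemon
  | { kind: "stdio" }; // spawn the indexer as a child process per command

// Hypothetical rule: an explicit daemon URL wins, otherwise stdio.
function pickTransport(env: Record<string, string | undefined>): Transport {
  const raw = env["OPENTUSK_INDEXER_URL"]?.trim();
  if (raw) {
    const url = new URL(raw); // throws on a malformed URL
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`unsupported indexer URL: ${raw}`);
    }
    return { kind: "http", baseUrl: url };
  }
  return { kind: "stdio" };
}

// Usage: const transport = pickTransport(process.env);
```

Because both transports expose the same API, everything above this choice can stay transport-agnostic.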
MCP integration
When the indexer is available, the MCP server automatically registers two additional tools:
- `opentusk_search`: semantic search with the same filters as the CLI
- `opentusk_index_sync`: trigger a full vault sync
If the indexer isn’t installed or reachable, the tools are simply not registered — no error. Tell the agent to run opentusk_index_sync after connecting to a new vault so searches are fresh.
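The "register only if available" behavior amounts to a probe followed by conditional registration. A sketch, in which `ToolRegistry`, the `probe` callback, and `registerSearchTools` are hypothetical stand-ins (a real MCP server SDK has its own registration API); only the two tool names come from this page.

```typescript
// Hypothetical minimal registry, standing in for an MCP server's tool API.
interface ToolRegistry {
  register(name: string, description: string): void;
}

// Register the search tools only if the indexer answers the probe.
// A false or failed probe means "not available" and raises no error.
async function registerSearchTools(
  registry: ToolRegistry,
  probe: () => Promise<boolean>,
): Promise<string[]> {
  let available = false;
  try {
    available = await probe();
  } catch {
    available = false;
  }
  if (!available) return [];
  registry.register("opentusk_search", "Semantic search over indexed files");
  registry.register("opentusk_index_sync", "Trigger a full vault sync");
  return ["opentusk_search", "opentusk_index_sync"];
}
```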
See MCP Tools for the full tool catalog.
Content extraction
| Type | What’s indexed |
|---|---|
| `.txt`, `.md`, `.csv`, `.log` | Raw text |
| `.json`, `.yaml`, `.toml` | Raw text |
| `.pdf` | Extracted text |
| `.html` | Tag-stripped text |
| Source code (`.ts`, `.js`, `.py`, …) | Source as text (comments and identifiers are searchable) |
| Images, video, audio, binary | Metadata only (filename, size, MIME type) |
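The table amounts to a lookup from file extension to extraction strategy. A sketch with hypothetical names; the source-code entries are only the three the table names (it elides the rest), and anything unrecognized falls back to metadata only.

```typescript
type Extraction =
  | "raw-text"
  | "pdf-text"
  | "html-stripped"
  | "source-text"
  | "metadata-only";

// Extension -> strategy, mirroring the table. Source-code coverage is partial.
const RULES = new Map<string, Extraction>([
  [".txt", "raw-text"], [".md", "raw-text"], [".csv", "raw-text"], [".log", "raw-text"],
  [".json", "raw-text"], [".yaml", "raw-text"], [".toml", "raw-text"],
  [".pdf", "pdf-text"],
  [".html", "html-stripped"],
  [".ts", "source-text"], [".js", "source-text"], [".py", "source-text"],
]);

// Pick a strategy from the file name; unknown types get metadata only.
function extractionFor(filename: string): Extraction {
  const base = filename.slice(filename.lastIndexOf("/") + 1); // ignore dots in folders
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return "metadata-only"; // no extension, or a dotfile such as ".env"
  return RULES.get(base.slice(dot).toLowerCase()) ?? "metadata-only";
}
```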
Index management
Section titled “Index management”# Statsopentusk index stats# → Documents, Chunks, Vaults, DB Size, Last Indexed
# Force re-indexopentusk index sync --force
# Inspect DB locationls -la ~/.opentusk/index/To reset the index entirely, delete ~/.opentusk/index/index.db and run opentusk index setup again.
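Because the index holds decrypted chunks, it can be worth checking that nothing under `~/.opentusk/index/` is accessible to other users. A POSIX-only sketch (Windows file modes don't map onto these bits); `findLooseFiles` is a hypothetical helper, not part of the OpenTusk CLI.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Return files under `dir` whose mode grants any permission to group or
// others (bits 0o077), i.e. anything more open than 0600-style access.
function findLooseFiles(dir: string): string[] {
  const loose: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      loose.push(...findLooseFiles(path)); // recurse, e.g. into models/
    } else if (entry.isFile() && (statSync(path).mode & 0o077) !== 0) {
      loose.push(path);
    }
  }
  return loose;
}

// Usage (POSIX): findLooseFiles(join(os.homedir(), ".opentusk", "index"))
```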