Semantic Search

OpenTusk ships a local semantic index for your files. The @opentusk/indexer package maintains a SQLite + vector database on your machine, embeds file content with a local model, and exposes search through both the CLI and MCP tools. File content is never sent off your machine for indexing or search.

Everything runs locally:

  • Model: all-MiniLM-L6-v2 (384-dim, ~80 MB ONNX). CPU only, no GPU needed.
  • Database: SQLite with a vector extension, stored at ~/.opentusk/index/index.db.
  • Chunking: ~512 tokens per chunk, ~50 token overlap, respects paragraph boundaries.
  • Content: plaintext is indexed whenever it is available locally, that is, during uploads and downloads. The index contains chunks of decrypted content, so treat ~/.opentusk/index/ as sensitive and restrict it to your user (0600 on files, 0700 on the directory).

Because indexing happens on plaintext and shared-vault files are encrypted end-to-end, indexing is a local concern — only files you can already decrypt are indexed.
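The chunking settings listed above (roughly 512 tokens per chunk, roughly 50 tokens of overlap, paragraph-aware) can be illustrated with a minimal sketch. This is not the indexer's actual code: the function name chunk_paragraphs is hypothetical, and tokens are approximated by whitespace-separated words, which a real tokenizer would count differently.

```python
def chunk_paragraphs(text, max_tokens=512, overlap=50):
    """Pack whole paragraphs into chunks of at most max_tokens tokens.

    Assumption: a "token" is a whitespace-separated word. A real tokenizer
    would count differently.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")

    chunks = []
    current = []   # tokens of the chunk being built (may start with overlap)
    fresh = 0      # tokens in `current` that no emitted chunk contains yet

    def flush():
        nonlocal current, fresh
        if fresh:
            chunks.append(" ".join(current))
            # Seed the next chunk with the tail of this one.
            current = current[-overlap:] if overlap else []
            fresh = 0

    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue
        # Close the chunk at the paragraph boundary if this one will not fit.
        if len(current) + len(words) > max_tokens:
            flush()
        # A paragraph longer than the budget has to be split mid-paragraph.
        while len(current) + len(words) > max_tokens:
            room = max_tokens - len(current)
            current.extend(words[:room])
            fresh += room
            words = words[room:]
            flush()
        current.extend(words)
        fresh += len(words)

    flush()
    return chunks
```

Paragraphs are kept whole whenever they fit, and each new chunk begins with the last few tokens of the previous one, so a sentence that straddles a boundary remains searchable from either side. The sketch joins tokens with single spaces, so the original paragraph breaks are not preserved in the chunk text.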

  1. Install the indexer

    npm install -g @opentusk/indexer

    The indexer is an optional package, which keeps the base CLI small. Install it only on machines that need search.

  2. Download the model and initialize the database

    opentusk index setup

    The first run downloads the ~80 MB model to ~/.opentusk/index/models/. Subsequent runs are instant.

  3. Sync your vaults

    opentusk index sync # all vaults
    opentusk index sync --vault <id> # one vault
    opentusk index sync --force # re-index everything

    This lists files via the API, downloads and decrypts anything the index doesn’t already have (comparing SHA-256 content hashes), indexes it, and discards the plaintext bytes.

  4. Search

    opentusk search "deployment runbook"
    opentusk search "invoice Acme Q1" --vault <id> --limit 5
    opentusk search "pdf report" --type application/pdf --min-score 0.5
    opentusk search "design doc" --json

    Flag              Purpose
    --vault <id>      Limit results to a single vault
    --folder <path>   Folder path prefix (recursive)
    --type <mime>     MIME type filter, e.g. application/pdf
    --limit <n>       Max results (default 10)
    --min-score <n>   Similarity threshold 0–1 (default 0.3)
    --json            Raw SearchResult[] instead of styled output
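
To make the --limit and --min-score flags concrete, here is a minimal sketch of how a similarity threshold and a result cap could interact. It assumes the score is the cosine similarity between the query embedding and each chunk embedding, which the text above does not state explicitly. The names cosine and top_matches are hypothetical, and the two-dimensional vectors in the usage note stand in for the 384-dimensional ones the model produces.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, chunks, limit=10, min_score=0.3):
    """Return up to `limit` (score, chunk_id) pairs scoring at least min_score.

    `chunks` is an iterable of (chunk_id, vector) pairs. The defaults mirror
    the CLI defaults in the table above.
    """
    scored = ((cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks)
    kept = sorted((s for s in scored if s[0] >= min_score), reverse=True)
    return kept[:limit]
```

For example, with the query vector [1.0, 0.0] and chunks ("a", [1.0, 0.0]), ("b", [1.0, 1.0]), and ("c", [0.0, 1.0]), the default threshold keeps "a" and "b" and drops "c", whose similarity is 0. Raising the threshold trades recall for precision; the cap is applied after filtering.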

When you upload or download a file via the CLI or MCP tools, the indexer is notified with the plaintext bytes as a fire-and-forget call. If the indexer isn’t installed or isn’t running, nothing happens — the file just doesn’t get indexed, and you can catch up later with opentusk index sync.
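
The catch-up that opentusk index sync performs depends on comparing SHA-256 content hashes, as described in the sync step. A minimal sketch of that decision follows. The record shape (a dict with "id" and "sha256" keys) and the function names are assumptions made for illustration, not the actual API response or indexer internals.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def files_to_index(remote, indexed, force=False):
    """Decide which remote files need to be downloaded and indexed.

    remote:  list of {"id": ..., "sha256": ...} records (hypothetical shape)
    indexed: dict mapping file id -> content hash recorded at last indexing
    force:   mirrors --force, which re-indexes everything
    """
    if force:
        return [f["id"] for f in remote]
    # A file is stale if it was never indexed or its content hash changed.
    return [f["id"] for f in remote if indexed.get(f["id"]) != f["sha256"]]
```

Because the comparison is on content hashes rather than timestamps, a file that was renamed or re-uploaded unchanged would not be downloaded again under this scheme.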

The indexer has two transports. You don’t need to pick one — the CLI handles stdio automatically.

Transport         When                                                                    How
stdio (default)   Single user, one machine, ad-hoc use                                    CLI spawns the indexer as a child process per command
HTTP daemon       Multiple MCP agents sharing one index, IDE plugins, long-lived setups   Run opentusk-indexer serve --port 7600 and set OPENTUSK_INDEXER_URL=http://localhost:7600

Both transports expose the same API.
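
The choice between the two transports can be pictured as a single check on the environment. The sketch below assumes that the presence of OPENTUSK_INDEXER_URL is what switches a client from stdio to HTTP. The function pick_transport is hypothetical and only illustrates that rule.

```python
def pick_transport(env):
    """Return ("http", url) if an indexer URL is configured, else ("stdio", None).

    `env` is any mapping of environment variables, e.g. os.environ.
    Assumption: an empty or whitespace-only value counts as unset.
    """
    url = env.get("OPENTUSK_INDEXER_URL", "").strip()
    if url:
        return ("http", url)
    return ("stdio", None)
```

With no variable set, the result is ("stdio", None). After setting OPENTUSK_INDEXER_URL=http://localhost:7600, the result is ("http", "http://localhost:7600").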

When the indexer is available, the MCP server automatically registers two additional tools:

  • opentusk_search — semantic search with the same filters as the CLI
  • opentusk_index_sync — trigger a full vault sync

If the indexer isn’t installed or reachable, the tools are simply not registered — no error. Tell the agent to run opentusk_index_sync after connecting to a new vault so searches are fresh.

See MCP Tools for the full tool catalog.

Type                             What’s indexed
.txt, .md, .csv, .log            Raw text
.json, .yaml, .toml              Raw text
.pdf                             Extracted text
.html                            Tag-stripped text
Source code (.ts, .js, .py, …)   Source as text (comments + identifiers searchable)
Images, video, audio, binary     Metadata only (filename, size, MIME)
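
The table above amounts to a lookup from file type to extraction strategy. The sketch below expresses it as one, keyed on file extension. That is an assumption: the real indexer may decide by MIME type instead. The function name and the strategy labels are hypothetical, and only the source-code extensions named in the table are included.

```python
from pathlib import PurePosixPath

# Extensions the table lists as indexed verbatim (plain text, config, source).
RAW_TEXT = {".txt", ".md", ".csv", ".log", ".json", ".yaml", ".toml",
            ".ts", ".js", ".py"}


def extraction_strategy(filename):
    """Map a filename to one of four illustrative strategy labels."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in RAW_TEXT:
        return "raw_text"
    if ext == ".pdf":
        return "extracted_text"
    if ext == ".html":
        return "tag_stripped_text"
    # Images, video, audio, and other binaries: filename, size, MIME only.
    return "metadata_only"
```

Anything unrecognized falls through to metadata-only, so in this sketch an unknown binary can still be found by filename even though its content is not embedded.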

# Stats
opentusk index stats
# → Documents, Chunks, Vaults, DB Size, Last Indexed

# Force re-index
opentusk index sync --force

# Inspect DB location
ls -la ~/.opentusk/index/

To reset the index entirely, delete ~/.opentusk/index/index.db, run opentusk index setup again, and then run opentusk index sync to repopulate it.
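
Since the index holds decrypted content, it can be worth checking that nothing under ~/.opentusk/index/ is accessible to other users. The sketch below is a standalone helper for POSIX systems, not part of the OpenTusk CLI, and the name loose_permissions is hypothetical. It flags any file or directory with group or other permission bits set.

```python
import os
import stat


def loose_permissions(root):
    """List (path, mode) pairs under root that group or others can access."""
    root = os.path.expanduser(root)
    loose = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            if mode & 0o077:  # any group/other read, write, or execute bit
                loose.append((path, oct(mode)))
    return loose
```

Calling loose_permissions("~/.opentusk/index") returns an empty list when every file is 0600 and every directory 0700. Any entry it does return can be tightened with chmod.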