Semantic Search

OpenTusk ships a local semantic index for your files. The @opentusk/indexer package maintains a SQLite + vector database on your machine, embeds file content with a local model, and exposes search through both the CLI and MCP tools. No content ever leaves your machine.

How it works

Everything runs locally:

Model: all-MiniLM-L6-v2 (384-dim, ~80 MB ONNX). CPU only, no GPU needed.
Database: SQLite with a vector extension, stored at ~/.opentusk/index/index.db.
Chunking: ~512 tokens per chunk, ~50 token overlap, respects paragraph boundaries.
Content: plaintext is indexed whenever it is available locally, that is, during uploads (before encryption) and downloads (after decryption). The index contains chunks of decrypted content, so treat ~/.opentusk/index/ as sensitive (file permissions should be 0600).
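
The chunking rule in the list above can be illustrated with a minimal sketch. It is not the indexer's actual code: `chunk_paragraphs` is a hypothetical name, and "tokens" are approximated here by whitespace-separated words, whereas the real indexer presumably counts model tokens.

```python
def chunk_paragraphs(text: str, max_tokens: int = 512, overlap: int = 50) -> list[str]:
    """Greedy paragraph-aware chunking with an overlap between neighbours.

    Assumption: a "token" is a whitespace-separated word.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    chunks: list[list[str]] = []
    current: list[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Close the chunk at a paragraph boundary if this paragraph would
        # push it past the budget, then carry the overlap tail forward.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            current = current[-overlap:] if overlap else []
        current.extend(words)
        # A single oversized paragraph has to be split mid-paragraph.
        while len(current) > max_tokens:
            chunks.append(current[:max_tokens])
            current = current[max_tokens - overlap:]
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding a few words twice.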

Because indexing operates on plaintext and shared-vault files are encrypted end-to-end, indexing can only happen locally: only files you can already decrypt are indexed.

Quick start

1. Install the indexer

   ```
   npm install -g @opentusk/indexer
   ```

   Skipping this keeps the CLI small. Install it only on machines that need search.

2. Download the model and initialize the database

   ```
   opentusk index setup
   ```

   First run downloads ~80 MB to ~/.opentusk/index/models/. Subsequent runs are instant.

3. Sync your vaults

   ```
   opentusk index sync                # all vaults
   opentusk index sync --vault <id>   # one vault
   opentusk index sync --force        # re-index everything
   ```

   This lists files via the API, downloads and decrypts anything the index doesn’t already have (comparing SHA-256 content hashes), indexes it, and discards the plaintext bytes.
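
The hash comparison that decides what a sync has to fetch can be sketched as follows. This is an illustration under assumptions, not the indexer's implementation: `plan_sync` and the two dictionaries (file ID to SHA-256 hex digest) are hypothetical shapes chosen for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(remote: dict[str, str], indexed: dict[str, str], force: bool = False) -> list[str]:
    """Return the IDs of files that need to be downloaded and (re)indexed.

    remote:  file ID -> content hash reported by the API listing (assumed shape)
    indexed: file ID -> content hash stored in the local index (assumed shape)
    """
    if force:
        # Mirrors --force: ignore what is already indexed.
        return sorted(remote)
    # New files have no indexed hash; changed files have a different one.
    return sorted(fid for fid, digest in remote.items() if indexed.get(fid) != digest)
```

Comparing hashes rather than timestamps means an unchanged file is never downloaded twice, even if it was re-uploaded or its metadata changed.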

Search

```
opentusk search "deployment runbook"
opentusk search "invoice Acme Q1" --vault <id> --limit 5
opentusk search "pdf report" --type application/pdf --min-score 0.5
opentusk search "design doc" --json
```

| Flag | Purpose |
| --- | --- |
| `--vault <id>` | Limit results to a single vault |
| `--folder <path>` | Folder path prefix (recursive) |
| `--type <mime>` | MIME type filter, e.g. `application/pdf` |
| `--limit <n>` | Max results (default 10) |
| `--min-score <n>` | Similarity threshold 0–1 (default 0.3) |
| `--json` | Raw `SearchResult[]` instead of styled output |
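
To make the effect of `--limit` and `--min-score` concrete, here is a small ranking sketch. It assumes the score is cosine similarity between the query embedding and each chunk embedding, which this page does not state; `rank` and the toy two-dimensional vectors are hypothetical stand-ins for the real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: list[float], chunks: dict[str, list[float]],
         limit: int = 10, min_score: float = 0.3) -> list[tuple[str, float]]:
    """Score every chunk, drop those below min_score, return the top `limit`."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

Raising `min_score` trades recall for precision: fewer, more closely related results come back, and a query with no good match can return nothing at all.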

Automatic indexing

When you upload or download a file via the CLI or MCP tools, the indexer is notified with the plaintext bytes as a fire-and-forget call. If the indexer isn’t installed or isn’t running, nothing happens — the file just doesn’t get indexed, and you can catch up later with opentusk index sync.
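
The fire-and-forget behaviour described above can be sketched like this. It is a minimal illustration, not the CLI's code: `notify_indexer` and the `send` callback are hypothetical names, and a background thread stands in for whatever mechanism the CLI actually uses.

```python
import threading
from typing import Callable


def notify_indexer(send: Callable[[str, bytes], None],
                   file_id: str, plaintext: bytes) -> threading.Thread:
    """Hand plaintext to the indexer without blocking or failing the caller."""

    def worker() -> None:
        try:
            send(file_id, plaintext)
        except Exception:
            # Indexer missing or unreachable: skip silently. The file stays
            # unindexed until a later `opentusk index sync` catches up.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The upload or download path never waits on the returned thread, so a slow or absent indexer cannot delay or break a file transfer.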

Stdio vs HTTP daemon

The indexer has two transports. You don’t need to pick one — the CLI handles stdio automatically.

| Transport | When | How |
| --- | --- | --- |
| stdio (default) | Single user, one machine, ad-hoc use | CLI spawns the indexer as a child process per command |
| HTTP daemon | Multiple MCP agents sharing one index, IDE plugins, long-lived setups | Run `opentusk-indexer serve --port 7600` and set `OPENTUSK_INDEXER_URL=http://localhost:7600` |

Both transports expose the same API.
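
One plausible way a client could choose between the two transports is to check the `OPENTUSK_INDEXER_URL` environment variable and fall back to stdio. The sketch below assumes exactly that precedence; `choose_transport` is a hypothetical helper, not part of the OpenTusk API.

```python
from typing import Mapping, Optional, Tuple


def choose_transport(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("http", url) when a daemon URL is configured, else ("stdio", None)."""
    url = env.get("OPENTUSK_INDEXER_URL", "").strip()
    if url:
        return ("http", url)
    # Assumed default: spawn the indexer as a child process per command.
    return ("stdio", None)
```

In a real process this would be called with `os.environ`; taking the mapping as a parameter keeps the function easy to test.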

MCP integration

When the indexer is available, the MCP server automatically registers two additional tools:

opentusk_search — semantic search with the same filters as the CLI
opentusk_index_sync — trigger a full vault sync

If the indexer isn’t installed or reachable, the tools are simply not registered — no error. Tell the agent to run opentusk_index_sync after connecting to a new vault so searches are fresh.

See MCP Tools for the full tool catalog.

Content extraction

| Type | What’s indexed |
| --- | --- |
| `.txt`, `.md`, `.csv`, `.log` | Raw text |
| `.json`, `.yaml`, `.toml` | Raw text |
| `.pdf` | Extracted text |
| `.html` | Tag-stripped text |
| Source code (`.ts`, `.js`, `.py`, …) | Source as text (comments + identifiers searchable) |
| Images, video, audio, binary | Metadata only (filename, size, MIME) |
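
The table above amounts to a dispatch on file extension. The following sketch shows one way to express it, together with a simple tag stripper for HTML. The function names and the returned labels are hypothetical, the extension set lists only the examples named in the table, and the indexer's real extractors may differ.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath

# Extensions the table lists as indexed verbatim (illustrative, not exhaustive).
RAW_TEXT_EXTS = {".txt", ".md", ".csv", ".log", ".json", ".yaml", ".toml", ".ts", ".js", ".py"}


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text content of `html` with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def extraction_kind(filename: str) -> str:
    """Map a filename to the kind of content that would be indexed."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in RAW_TEXT_EXTS:
        return "raw_text"
    if ext == ".pdf":
        return "pdf_text"
    if ext == ".html":
        return "tag_stripped"
    return "metadata_only"
```

Note that this simple stripper also keeps the contents of `<script>` and `<style>` elements; a production extractor would likely skip them.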

Index management

```
# Stats
opentusk index stats
# → Documents, Chunks, Vaults, DB Size, Last Indexed

# Force re-index
opentusk index sync --force

# Inspect DB location
ls -la ~/.opentusk/index/
```

To reset the index entirely, delete ~/.opentusk/index/index.db and run opentusk index setup again.
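
Since the index holds decrypted chunks, it can be worth checking that nothing under the index directory is accessible to other users. The sketch below is a generic POSIX permission audit, not an OpenTusk command; `overly_permissive` is a hypothetical helper.

```python
import os
import stat
from pathlib import Path


def overly_permissive(root: Path) -> list[Path]:
    """Return regular files under `root` with any group or other permission bits set."""
    loose: list[Path] = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode & 0o077:  # anything beyond owner-only access
                loose.append(path)
    return loose


def tighten(paths: list[Path]) -> None:
    """Restrict each file to owner read/write (0600)."""
    for path in paths:
        os.chmod(path, 0o600)
```

A typical use would be `tighten(overly_permissive(Path.home() / ".opentusk" / "index"))`, run by the same user who owns the index.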

CLI reference: All search and index commands in one place.

MCP tools: opentusk_search and opentusk_index_sync for AI agents.

Upload & lifecycle: Where auto-indexing hooks into the upload/download flow.