# https://github.com/tinysearch/tinysearch Project Manual

Generated at: 2026-06-25 07:35:12 UTC

## Table of Contents

- [Overview and Architecture](#page-overview)
- [Configuration with tinysearch.toml](#page-config)
- [Static Site Generator Integration](#page-ssg)
- [Rust Library API and Programmatic Usage](#page-library)

<a id='page-overview'></a>

## Overview and Architecture

### Related Pages

Related topics: [Configuration with tinysearch.toml](#page-config), [Static Site Generator Integration](#page-ssg), [Rust Library API and Programmatic Usage](#page-library)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)
- [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)
- [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs)
- [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md)
- [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md)
- [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md)
- [examples/pelican/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/pelican/README.md)
- [examples/index.json](https://github.com/tinysearch/tinysearch/blob/main/examples/index.json)
- [examples/ecommerce/products.json](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/products.json)
</details>

# Overview and Architecture

## Purpose and Scope

tinysearch is a lightweight, fast, full-text search engine purpose-built for static websites. The project positions itself as a dependency-free alternative to heavier JavaScript search libraries such as [lunr.js](https://lunrjs.com/) and [elasticlunr](http://elasticlunr.com/) (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). It is implemented in Rust and compiled to WebAssembly (WASM) so that the entire search index and engine can run client-side in the browser, without a backend.

The core value proposition is size: a test index of approximately 40 posts produces a WASM payload of 99kB (49kB gzipped, 40kB brotli) — smaller than the project's own demo image (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). This makes tinysearch attractive to sites where JavaScript bundle weight matters.

The scope of the project covers three primary use cases:

- **Static site search** integrated with generators such as Jekyll, Hugo, Zola, Cobalt, and Pelican (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).
- **Documentation and blog search** through configurable schemas (Source: [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md), [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md)).
- **Programmatic library usage** from Rust code, introduced as an experimental API in v0.10.0 (Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)).

Community interest in extending the engine has surfaced in feature requests for configurable fields via `tinysearch.toml` (resolved in v0.10.0), filterable numeric and boolean fields, and library API documentation (Sources: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md), [issue #183](https://github.com/tinysearch/tinysearch/issues/183), [issue #116](https://github.com/tinysearch/tinysearch/issues/116)).

## System Architecture

The tinysearch architecture follows a build-time index generation and runtime browser-side query model. The pipeline consists of three stages:

1. **Index Build (offline)**: A static site generator produces a JSON array describing each page. The `tinysearch` CLI consumes that JSON and emits either a `storage` file (binary index) or a fully linked WASM module.
2. **Distribution**: The WASM module is shipped as a static asset next to the site, typically inside a path such as `static/wasm_output/`.
3. **Runtime Query**: A small JavaScript glue layer (provided by the project) loads the WASM module, calls a search function, and renders results.

The data flow is illustrated below:

```mermaid
flowchart LR
    A[Static Site<br>Content] --> B[SSG Template<br>e.g. Zola, Pelican]
    B --> C[JSON Index<br>index.json]
    C --> D[tinysearch CLI<br>-m storage | wasm]
    D --> E[Binary Storage<br>files]
    D --> F[WASM Module<br>+ JS glue]
    F --> G[Static Site<br>/wasm_output/]
    G --> H[Browser<br>search.js]
    H --> I[User Query<br>Results]
```

Under the hood, the engine is a Rust/WASM port of the Python code from the article ["Writing a full-text search engine using Bloom filters"](https://www.stavros.io/posts/bloom-filter-search-engine/). Internally it uses a [Xor Filter](https://arxiv.org/abs/1912.08258) — a space-efficient probabilistic data structure for fast set membership (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). The library depends on `xorf` for the filter implementation, `bincode` for binary serialization, and `serde` for JSON deserialization (Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)).
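The per-post membership idea described above can be sketched in a few lines. This is an illustration only, not the engine's code: it uses an exact `HashSet<u64>` of token hashes as a stand-in for the Xor filter (the real filter is probabilistic, far more compact, and has a small false-positive rate), and the names `token_key`, `build_filter`, and `search` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a single token (lowercased) to a 64-bit key.
fn token_key(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.to_lowercase().hash(&mut hasher);
    hasher.finish()
}

/// Split text on non-alphanumeric characters and hash every token.
/// The resulting set plays the role of one post's filter.
fn build_filter(text: &str) -> HashSet<u64> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(token_key)
        .collect()
}

/// Score each post by how many query tokens its filter contains,
/// then return the URLs of matching posts, best score first.
fn search<'a>(index: &'a [(String, HashSet<u64>)], query: &str, limit: usize) -> Vec<&'a str> {
    let keys: Vec<u64> = query.split_whitespace().map(token_key).collect();
    let mut scored: Vec<(usize, &str)> = index
        .iter()
        .map(|(url, filter)| {
            let hits = keys.iter().filter(|k| filter.contains(*k)).count();
            (hits, url.as_str())
        })
        .filter(|(hits, _)| *hits > 0)
        .collect();
    // Stable sort keeps input order among posts with equal scores.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().take(limit).map(|(_, url)| url).collect()
}

fn main() {
    let index = vec![
        ("/rust".to_string(), build_filter("Rust and WebAssembly search")),
        ("/python".to_string(), build_filter("Bloom filters in Python")),
    ];
    println!("{:?}", search(&index, "rust search", 10));
}
```

Because only whole-token hashes are stored, a query for `filter` does not match a post that contains `filters`, which mirrors the whole-word behaviour noted elsewhere in this manual.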

## Core Components

### Library API

`src/lib.rs` re-exports a public `api` module and provides a library-level entry point. The core types documented in the module doc-comment are:

- `BasicPost` — a default post struct with `title`, `url`, optional `body`, and a `HashMap` for arbitrary metadata.
- `TinySearch` — the engine, constructed via `TinySearch::new()` and optionally configured with a custom stopword list (`.with_stopwords(...)`).
- `SearchIndex` — the built index, produced by `search.build_index(&posts)`.

Two methods drive usage: `build_index(&posts)` to compile posts into filters, and `search(&index, query, limit)` to query the compiled index (Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)). The library is marked experimental and the API may change (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).

### Command-Line Interface

The CLI exposes three operating modes referenced in the example READMEs:

| Mode | Purpose | Example invocation |
|------|---------|--------------------|
| `storage` | Build the binary index only | `tinysearch -m storage -p ./output docs.json` |
| `search` | Run a one-shot query against a storage dir | `tinysearch -m search -S "rust" -N 3 ./output/storage` |
| `wasm` | Emit a complete WASM module + JS glue | `tinysearch --release -m wasm -p ./wasm_output docs.json` |

(Source: [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md), [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md), [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md))

The `--optimize` / `-o` flag enables `wasm-opt` compression from binaryen, which typically reduces output size by 20–30% (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). Users on Windows or non-Rust environments can run the same workflow through nightly-built Docker images (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).

### Configuration via `tinysearch.toml`

Introduced in v0.10.0, a TOML configuration file lets users declare which JSON fields are indexed for full-text search, which are stored as display-only metadata, and which field provides the result URL (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). Example schemas for blogs, documentation, and e-commerce catalogs are provided in the repository. When the file is absent, the default schema indexes `title` and `body` and uses `url` as the link field (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).

## Integration Patterns

### Static Site Generators

Each supported SSG has a small "index template" that emits a JSON array of pages:

- **Zola** uses a Tera template iterating `section.pages`, skipping drafts, and rendering `title`, `permalink`, and a sanitized `body` (Source: [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md)).
- **Pelican** uses a Jinja template that mirrors the same structure with `tojson` filters (Source: [examples/pelican/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/pelican/README.md)).
- **Blog and documentation sites** follow the same pattern, with `tinysearch.toml` configuring richer schemas (Sources: [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md), [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md)).

For Zola, after `zola build` produces the JSON, the typical call is:

```bash
tinysearch --optimize --path static public/tinysearch.json/index.html
```

(Source: [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md))

### Library Use from Rust

Custom post types can be indexed without running the executable. The advanced example defines a `BlogPost` struct, constructs a `TinySearch` with custom stopwords (`"the"`, `"with"`), and runs a series of queries over the resulting `SearchIndex` (Source: [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs)). This addresses the long-standing community request for a public Rust API (Source: [issue #183](https://github.com/tinysearch/tinysearch/issues/183)).

### Sample Data Layout

The examples folder ships ready-to-use JSON samples illustrating the schema flexibility: blog posts with tags and authors ([examples/blog/posts.json](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/posts.json)), documentation pages with versioning and difficulty ([examples/documentation/docs.json](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/docs.json)), and product catalogs with prices and ratings ([examples/ecommerce/products.json](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/products.json)). A minimal three-post sample lives at [examples/index.json](https://github.com/tinysearch/tinysearch/blob/main/examples/index.json).

### Known Operational Considerations

- **WASM hosting**: Production deployments must serve `.wasm` with the correct MIME type (`application/wasm`); gzipped content must be advertised via `Content-Encoding` (Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).
- **Browser WASM loading**: Changes to how browsers fetch and instantiate WASM have caused some glue-script implementations to break, particularly when cross-origin isolation or streaming compilation is involved (Source: [issue #175](https://github.com/tinysearch/tinysearch/issues/175)).
- **CJK text**: The engine indexes whole tokens; queries against Chinese or Japanese text may miss matches when the input is split in ways the tokenizer does not understand (Source: [issue #179](https://github.com/tinysearch/tinysearch/issues/179)).
- **Build directory output**: The default invocation emits several files alongside the WASM binary; community members have requested a flag to copy only the final `.wasm` (Source: [issue #169](https://github.com/tinysearch/tinysearch/issues/169)).

## See Also

- Configuration Reference (schema options in `tinysearch.toml`)
- Library API Guide (`BasicPost`, `TinySearch`, `SearchIndex`)
- Static Site Generator integration recipes (Jekyll, Hugo, Zola, Cobalt, Pelican)
- WebAssembly deployment checklist (MIME types, hosting)

---

<a id='page-config'></a>

## Configuration with tinysearch.toml

### Related Pages

Related topics: [Overview and Architecture](#page-overview), [Static Site Generator Integration](#page-ssg)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)
- [examples/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/README.md)
- [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md)
- [examples/blog/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/tinysearch.toml)
- [examples/ecommerce/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/README.md)
- [examples/ecommerce/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/tinysearch.toml)
- [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md)
- [examples/documentation/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/tinysearch.toml)
- [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md)
- [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)
</details>

# Configuration with tinysearch.toml

## Overview

The `tinysearch.toml` configuration file lets users customize which JSON fields are indexed for full-text search versus which fields are stored as metadata for display. It was introduced in **v0.10.0** (PR [#181](https://github.com/tinysearch/tinysearch/pull/181)) to address a long-standing roadmap request: making it possible to add arbitrary fields such as product images, descriptions, dates, authors, and categories without modifying the engine itself ([Issue #116](https://github.com/tinysearch/tinysearch/issues/116)).

The file is **optional**. When absent, tinysearch falls back to a default schema that indexes `title` and `body`, with `url` as the URL field. Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md) — *"tinysearch will use the default schema (indexing `title` and `body` fields with `url` as the URL field)."*

## Schema Structure

A `tinysearch.toml` file contains a single `[schema]` table with three keys:

```toml
[schema]
indexed_fields  = ["title", "content", "tags"]
metadata_fields = ["author", "date", "category"]
url_field       = "permalink"
```

| Key | Purpose | Required |
|---|---|---|
| `indexed_fields` | List of JSON fields whose tokenized text is placed into the search index. | No (defaults to `["title", "body"]`) |
| `metadata_fields` | List of fields passed through verbatim to the output and returned with each search hit. | No (defaults to empty) |
| `url_field` | The single field whose value becomes the clickable link of a result. | No (defaults to `"url"`) |

Sources: the example e-commerce configuration in [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md), and [examples/blog/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/tinysearch.toml), [examples/ecommerce/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/tinysearch.toml), and [examples/documentation/tinysearch.toml](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/tinysearch.toml) for the three concrete configurations that ship with the repository.

### How the engine consumes the file

The `tinysearch` binary looks for `tinysearch.toml` in the working directory by default, and can be pointed at an alternate path with the `--config` CLI flag. It then walks the input JSON array and:

1. Concatenates the string values of all `indexed_fields` for each document and feeds them into the tokenizer that builds the Xor filter.
2. Copies each entry in `metadata_fields` into the output struct so the WASM module can return it alongside the URL when a query matches.
3. Reads the value of `url_field` and uses it as the link target.

Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs) — the library exposes `TinySearch` and `SearchIndex` types and is built on top of the `xorf` crate, confirming the schema values flow into the filter that backs every query.
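The three steps above can be pictured with a small, self-contained sketch. It is not taken from the tinysearch source: the `Schema` and `Prepared` types and the `apply_schema` function are hypothetical, and each record is simplified to a flat map of string values (real input may contain arrays such as `tags`).

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the three `[schema]` keys.
struct Schema {
    indexed_fields: Vec<String>,
    metadata_fields: Vec<String>,
    url_field: String,
}

/// What one input record is reduced to under a schema.
#[derive(Debug, PartialEq)]
struct Prepared {
    indexed_text: String,
    metadata: Vec<(String, String)>,
    url: Option<String>,
}

fn apply_schema(schema: &Schema, record: &HashMap<String, String>) -> Prepared {
    // Step 1: join the values of all indexed fields that are present.
    let indexed_text = schema
        .indexed_fields
        .iter()
        .filter_map(|f| record.get(f).map(String::as_str))
        .collect::<Vec<_>>()
        .join(" ");
    // Step 2: copy metadata fields through unchanged.
    let metadata = schema
        .metadata_fields
        .iter()
        .filter_map(|f| record.get(f).map(|v| (f.clone(), v.clone())))
        .collect();
    // Step 3: pick the link target, if the record has one.
    let url = record.get(&schema.url_field).cloned();
    Prepared { indexed_text, metadata, url }
}

fn main() {
    let schema = Schema {
        indexed_fields: vec!["title".to_string(), "content".to_string()],
        metadata_fields: vec!["author".to_string()],
        url_field: "permalink".to_string(),
    };
    let record: HashMap<String, String> = [
        ("title", "Hello"),
        ("content", "First post"),
        ("author", "Ada"),
        ("permalink", "/hello"),
    ]
    .iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect();
    println!("{:?}", apply_schema(&schema, &record));
}
```

In this sketch a field that is named in the schema but missing from the record is simply skipped, which is one way a misspelled field name could go unnoticed.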

## Real-World Schemas from the Bundled Examples

The `examples/` directory ships three ready-made schemas that demonstrate how the same `[schema]` table adapts to very different content shapes.

```mermaid
flowchart LR
    A[JSON input file] --> B{Read tinysearch.toml}
    B -- indexed_fields --> C[Tokenizer + Xor filter]
    B -- metadata_fields --> D[Pass-through struct]
    B -- url_field --> E[Result links]
    C --> F[WASM / search output]
    D --> F
    E --> F
```

| Example | `indexed_fields` | `metadata_fields` | `url_field` |
|---|---|---|---|
| E-commerce | `product_name`, `description`, `category`, `tags` | `price`, `image_url`, `brand`, `availability`, `rating`, `reviews_count` | `product_url` |
| Blog | `title`, `content`, `excerpt`, `tags` | `author`, `publish_date`, `category`, `reading_time`, `featured_image` | `permalink` |
| Documentation | `title`, `content`, `section`, `keywords` | `version`, `last_updated`, `contributor`, `difficulty`, `type` | `doc_url` |

Sources: [examples/ecommerce/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/ecommerce/README.md), [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md), [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md), and the comparison table in [examples/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/README.md).

These three configurations illustrate the typical patterns:

- **E-commerce** combines human-readable text (`description`) with categorical data (`category`, `tags`) for matching, while pricing and availability are surfaced as metadata. This pattern directly answers the "is there a way to return the page description or body in the results?" question in [Issue #159](https://github.com/tinysearch/tinysearch/issues/159).
- **Blog** adds `excerpt` so the body field remains optional and snippet-friendly, addressing part of the roadmap in [Issue #116](https://github.com/tinysearch/tinysearch/issues/116).
- **Documentation** uses `section` and `keywords` to bias results toward navigational structure, with `version` and `difficulty` available for filtered UI.

## Usage Patterns and Common Pitfalls

### Pointing the CLI at the right config

Running `tinysearch` from the example directory picks up the adjacent `tinysearch.toml` automatically. When integrating with a static site generator it is usually placed at the project root; the Zola guide, for instance, assumes the file is present at the workspace level so all `tinysearch` invocations resolve the same schema. Source: [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md) — *"Run tinysearch: `tinysearch --optimize --path static …`"*.

### Backward compatibility

Projects on v0.8.x or v0.9.x relied on the hard-coded `title`, `body`, and `url` keys. Because v0.10.0 keeps the same default schema, legacy JSON input files continue to work without modification. The CI error reported in [Issue #182](https://github.com/tinysearch/tinysearch/issues/182) (`failed to select a version for the requirement tinysearch = "^0.9.0"`) is unrelated to the config file, but it illustrates why the dependency should be pinned to v0.10.0 when adopting the new schema.

### Library usage and custom fields

The `tinysearch` crate can also be driven programmatically, and any custom field exposed via the `BasicPost.meta` `HashMap` is preserved through the same code path the TOML file controls at the CLI level. Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md) — *"Add tinysearch to your `Cargo.toml`: `cargo add tinysearch`"*. This addresses the request in [Issue #183](https://github.com/tinysearch/tinysearch/issues/183) for a public API usable from Rust without invoking the binary.

### Common failure modes

1. **Mismatched field names**: the schema is case-sensitive, so a field name of `Title` in the schema does not match a JSON key named `title` and can silently produce an empty index. Verify by running the `storage` mode and inspecting the generated `index.html`, or by performing a sanity check with `tinysearch -m search -S "..." ./output/storage`.
2. **CORS / MIME issues at the WASM layer**: the schema is processed before WASM emission, but the resulting module must still be served with `application/wasm`. See the "Changes in the way browsers work with wasm" discussion in [Issue #175](https://github.com/tinysearch/tinysearch/issues/175).
3. **Non-Latin tokenization**: because the engine matches whole words, a query for a substring of unsegmented Chinese text may find no match, as reported in [Issue #179](https://github.com/tinysearch/tinysearch/issues/179). Splitting indexed text into whitespace-delimited tokens in the input JSON mitigates this regardless of schema.
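The non-Latin case is easy to reproduce in isolation. The sketch below assumes plain whitespace tokenization as a simplified stand-in for the engine's tokenizer (the real one may differ), and the helper names `tokens` and `matches` are hypothetical.

```rust
use std::collections::HashSet;

/// Whitespace tokenization, used here as a simplified stand-in
/// for the engine's tokenizer.
fn tokens(text: &str) -> HashSet<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// A query term matches only if it equals a whole indexed token.
fn matches(indexed: &HashSet<String>, term: &str) -> bool {
    indexed.contains(&term.to_lowercase())
}

fn main() {
    // Unsegmented text is a single token, so a substring query misses.
    let unsegmented = tokens("全文搜索引擎");
    // Pre-segmented text yields three tokens, so the same query hits.
    let segmented = tokens("全文 搜索 引擎");
    println!("{}", matches(&unsegmented, "搜索")); // prints "false"
    println!("{}", matches(&segmented, "搜索")); // prints "true"
}
```

Pre-segmenting the text in the SSG template or in a preprocessing step, before it reaches the JSON index, is the mitigation the list item above describes.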

## See Also

- [Library Usage (Experimental)](https://github.com/tinysearch/tinysearch/blob/main/README.md#library-usage-experimental) — driving the same engine from Rust.
- [examples/](https://github.com/tinysearch/tinysearch/tree/main/examples) — full e-commerce, blog, and documentation sample projects.
- [Zola integration guide](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md) — generating a JSON index from a Tera template.
- [Roadmap (Issue #116)](https://github.com/tinysearch/tinysearch/issues/116) — discussion of future schema features such as filters and boolean fields.

---

<a id='page-ssg'></a>

## Static Site Generator Integration

### Related Pages

Related topics: [Overview and Architecture](#page-overview), [Configuration with tinysearch.toml](#page-config)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)
- [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md)
- [examples/pelican/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/pelican/README.md)
- [examples/blog/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/README.md)
- [examples/documentation/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/README.md)
- [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs)
- [examples/index.json](https://github.com/tinysearch/tinysearch/blob/main/examples/index.json)
- [examples/blog/posts.json](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/posts.json)
- [examples/documentation/docs.json](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/docs.json)
- [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)
</details>

# Static Site Generator Integration

## Purpose and Scope

tinysearch is a lightweight, fast, full-text search engine written in Rust and compiled to WebAssembly. It is specifically designed for **static websites**, making Static Site Generator (SSG) integration its primary use case. The README states: "It can be used together with static site generators such as Jekyll, Hugo, Zola, Cobalt, or Pelican." ([README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md))

Integration with an SSG follows a two-phase model: the SSG generates a JSON index from site content, and tinysearch consumes that JSON to produce a WASM payload that runs entirely client-side in the browser. The approach is generator-agnostic: any SSG capable of emitting a flat JSON array of `{title, body, url, ...}` records can be wired in.

The v0.10.0 release further strengthened integration by introducing a `tinysearch.toml` configuration file for declaring indexed vs. metadata fields, and by exposing tinysearch as a Rust library for programmatic index construction ([README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)).

## Integration Architecture

The end-to-end pipeline can be visualised as a data flow from source markdown content to a browser-resident search engine:

```mermaid
flowchart LR
    A[Markdown Pages<br/>in SSG] --> B[SSG Template<br/>Renders JSON]
    B --> C[posts.json<br/>index.json]
    C --> D[tinysearch CLI<br/>or Library]
    D --> E[storage/<br/>binary index]
    E --> F[WASM Module<br/>+ JS glue]
    F --> G[Browser<br/>Client-Side Search]

    style D fill:#f9f,stroke:#333
    style F fill:#bbf,stroke:#333
```

Each post is internally converted into an Xor Filter — "a datastructure for fast approximation of set membership that is smaller than bloom and cuckoo filters" — and then serialised with bincode into a single binary blob that ships with the site ([README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)). The library entry point in `src/lib.rs` exposes `TinySearch::new()` and `build_index(...)` so the same pipeline can be driven from Rust code instead of the CLI ([src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)).

## Generator-Specific Integration Patterns

### Zola (Tera templates)

The Zola example iterates over `section.pages` in a Tera template, skips drafts, strips HTML tags, and emits a JSON array. A key gotcha — discussed in community issue [#169](https://github.com/tinysearch/tinysearch/issues/169) and [#178](https://github.com/tinysearch/tinysearch/issues/178) — is that Zola emits `public/tinysearch.json/index.html` rather than `tinysearch.json`, so the CLI invocation is:

```bash
tinysearch --optimize --path static public/tinysearch.json/index.html
```

Source: [examples/zola/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/zola/README.md)

The template uses `json_encode` and explicit `replace` filters to escape braces, quotes, and backslashes that survive `striptags`. Zola now also supports a native `fuse_json` index format that is conceptually compatible with tinysearch, though this route has not yet been officially adopted (see community discussion [#178](https://github.com/tinysearch/tinysearch/issues/178)).

### Pelican (Jinja templates)

Pelican uses a near-identical Jinja template. The pattern is the same: loop over `articles`, filter `article.status != "draft"`, and emit records with `article.title`, `article.url`, and `article.content` ([examples/pelican/README.md](https://github.com/tinysearch/tinysearch/blob/main/examples/pelican/README.md)). The template is wired to a static page via frontmatter (`Template: json`) so `pelican content` writes the JSON to `output/pages/json.html`, which is then fed to `tinysearch --optimize --path output output/pages/json.html`.

### Hugo, Jekyll, and Cobalt

The main README explicitly lists Hugo and Jekyll as supported, and the `examples/` directory demonstrates the same JSON-as-bridge pattern. Hugo users can rely on its native JSON output formats, while Jekyll users can use Liquid templates to iterate over `site.posts` and emit records. For any homegrown SSG, community issue [#183](https://github.com/tinysearch/tinysearch/issues/183) raised the request for a Rust library API — a need fulfilled in v0.10.0 with `TinySearch` and `BasicPost` ([src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)).
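For a homegrown generator, the JSON bridge amounts to writing one escaped record per page. The following is a minimal sketch under the assumption that the default `title`/`url`/`body` schema is used; `escape_json` and `render_index` are hypothetical helpers written with the standard library only, and a real project would more likely use a JSON crate such as `serde_json`.

```rust
/// Escape a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render (title, url, body) triples as a JSON array in the default schema.
fn render_index(pages: &[(&str, &str, &str)]) -> String {
    let records: Vec<String> = pages
        .iter()
        .map(|(title, url, body)| {
            format!(
                "{{\"title\":\"{}\",\"url\":\"{}\",\"body\":\"{}\"}}",
                escape_json(title),
                escape_json(url),
                escape_json(body)
            )
        })
        .collect();
    format!("[{}]", records.join(","))
}

fn main() {
    let pages = [("Hello \"World\"", "/hello", "First line\nSecond line")];
    println!("{}", render_index(&pages));
}
```

The same escaping concerns apply to template-based generators, which is why the Zola template relies on encoding and `replace` filters.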

## Configuration with `tinysearch.toml` (v0.10.0)

Prior to v0.10.0, tinysearch assumed a fixed schema of `title`, `body`, and `url`. The new configuration file, introduced in [PR #181](https://github.com/tinysearch/tinysearch/pull/181), lets users declare arbitrary schemas. The README example shows a documentation site with four indexed fields and four metadata fields:

```toml
[schema]
indexed_fields = ["title", "content", "section", "keywords"]
metadata_fields = ["version", "last_updated", "contributor", "difficulty"]
url_field = "doc_url"
```

Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)

This schema is reflected in the documentation example JSON, which extends the basic `{title, body, url}` shape with `section`, `keywords`, `doc_url`, `version`, `last_updated`, `contributor`, `difficulty`, and `type` ([examples/documentation/docs.json](https://github.com/tinysearch/tinysearch/blob/main/examples/documentation/docs.json)). The blog example adds `excerpt`, `tags`, `permalink`, `author`, `publish_date`, `category`, `reading_time`, and `featured_image` ([examples/blog/posts.json](https://github.com/tinysearch/tinysearch/blob/main/examples/blog/posts.json)).

When no `tinysearch.toml` is present, tinysearch falls back to the default schema (`title` and `body` indexed, `url` as link).

## Build Commands and Optimisation

The CLI exposes three modes relevant to SSG workflows: `storage` (build the binary index), `wasm` (build the WASM bundle), and `search` (run a query against a built index). Common invocation patterns from the README and examples:

```bash
# Dev build with demo HTML
tinysearch -m wasm -p wasm_output posts.json

# Production build, no demo
tinysearch --release -m wasm -p wasm_output posts.json

# Optimised WASM (requires binaryen's wasm-opt)
tinysearch --release -o -m wasm -p wasm_output posts.json
```

Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)

The `--optimize` flag invokes `wasm-opt`, typically shrinking the payload by 20–30% (mentioned in the docs example under `Performance Optimization`). Production deployment requires the web server to serve `.wasm` with `application/wasm` MIME type — a common source of the "Demo broken" error reported in issue [#177](https://github.com/tinysearch/tinysearch/issues/177).

## Library API for Programmatic Integration

For SSG authors who prefer not to shell out to the CLI, the library API introduced in [PR #184](https://github.com/tinysearch/tinysearch/pull/184) exposes the same pipeline. The advanced example demonstrates a custom `BlogPost` struct with `title`, `slug`, `content`, `tags`, and `author`, configured via `.with_stopwords(...)` ([examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs)):

```rust
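// Configure the engine with a custom stopword list.
let search = TinySearch::new().with_stopwords(vec!["the".to_string(), "with".to_string()]);
// `blog_posts` holds the example's `BlogPost` values; `?` propagates a build error.
let index = search.build_index(&blog_posts)?;
// Query the built index for "rust" with a result limit of 10.
let results = search.search(&index, "rust", 10);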
```

Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)

This addresses the request in issue [#183](https://github.com/tinysearch/tinysearch/issues/183) for a way to drive tinysearch from Rust without invoking an executable.
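To make the role of a stopword list concrete, here is a standalone sketch of stopword filtering during tokenization. It does not use or reproduce the tinysearch API: the `tokenize` function is hypothetical, and how the crate applies its stopwords internally is an assumption.

```rust
use std::collections::HashSet;

/// Tokenize text into lowercase words, dropping any configured stopword.
fn tokenize(text: &str, stopwords: &HashSet<String>) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !stopwords.contains(t))
        .collect()
}

fn main() {
    let stopwords: HashSet<String> =
        ["the", "with"].iter().map(|s| s.to_string()).collect();
    // "the" and "with" are removed; the remaining words are kept in order.
    println!("{:?}", tokenize("Searching the web with Rust", &stopwords));
}
```

Dropping very common words keeps them out of each post's filter, which reduces index size at the cost of making those words unsearchable.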

## Known Limitations and Community-Reported Issues

Several recurring limitations surface in community discussions and constrain SSG integration choices:

| Limitation | Source | Impact on SSGs |
|---|---|---|
| Whole-word search only; no prefix or suggestion matching | [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md) | Queries in Chinese and other languages without word-separating spaces may miss matches (issue [#179](https://github.com/tinysearch/tinysearch/issues/179)) |
| Recommended size: small to medium sites (~2 kB/article uncompressed) | [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md) | Large doc sets may exceed browser memory |
| Search relevance can feel non-deterministic | [#120](https://github.com/tinysearch/tinysearch/issues/120) | May require query expansion in template |
| Metadata such as descriptions and images is not returned in result objects | [#159](https://github.com/tinysearch/tinysearch/issues/159) | UI must fetch page separately |
| No native keyword highlighting | [#119](https://github.com/tinysearch/tinysearch/issues/119) | Must be implemented in JS glue layer |
| 7 files emitted by default into `--path` | [#169](https://github.com/tinysearch/tinysearch/issues/169) | Staging step required to copy only the WASM file |

The library API and `tinysearch.toml` schema in v0.10.0 do not lift the engine-level limits in this table, such as whole-word matching, but they ease integration by giving SSG authors full control over what gets indexed and how posts are represented.
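The staging step mentioned in the last table row can be sketched with the standard library alone. This is a minimal sketch, not part of tinysearch: the function name `stage_wasm` and both directory arguments are hypothetical, and the `.wasm` extension filter is an assumption taken from that table row.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copy only the `.wasm` files from `build_dir` into `public_dir`.
/// Returns the destination paths of the files that were copied.
/// Hypothetical helper for illustration; not part of tinysearch.
fn stage_wasm(build_dir: &Path, public_dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(public_dir)?;
    let mut copied = Vec::new();
    for entry in fs::read_dir(build_dir)? {
        let path = entry?.path();
        // Skip directories and anything that is not a WebAssembly module.
        let is_wasm = path.is_file()
            && path.extension().and_then(|ext| ext.to_str()) == Some("wasm");
        if !is_wasm {
            continue;
        }
        if let Some(name) = path.file_name() {
            let target = public_dir.join(name);
            fs::copy(&path, &target)?;
            copied.push(target);
        }
    }
    // `read_dir` order is platform-dependent, so sort for a stable result.
    copied.sort();
    Ok(copied)
}
```

An SSG build script could call `stage_wasm` after pointing `--path` at a scratch directory, so that only the WebAssembly module reaches the published site. Widen the filter if your page needs other emitted files.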

## See Also

- [Configuration with tinysearch.toml](#page-config): full `tinysearch.toml` schema documentation
- [Rust Library API and Programmatic Usage](#page-library): programmatic use of `TinySearch` from Rust
- WebAssembly Output Format: what the browser actually loads
- Performance and Size Optimisation: `--optimize` and brotli/gzip trade-offs

---

<a id='page-library'></a>

## Rust Library API and Programmatic Usage

### Related Pages

Related topics: [Overview and Architecture](#page-overview), [Configuration with tinysearch.toml](#page-config)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs)
- [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs)
- [examples/library_basic/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_basic/main.rs)
- [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs)
- [examples/search_index_type.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/search_index_type.rs)
- [examples/yew-example-crate/src/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/yew-example-crate/src/main.rs)
- [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md)
</details>

# Rust Library API and Programmatic Usage

## Overview

tinysearch began life as a standalone command-line tool that ingests a JSON index file and emits a WebAssembly (WASM) blob for browser-side search. Starting with release **v0.10.0**, the crate can also be consumed directly from Rust, letting developers build and query search indexes in-process without shelling out to the `tinysearch` binary. This was added in response to a long-standing community request (see [issue #183](https://github.com/tinysearch/tinysearch/issues/183)) and is documented as the "Library Usage (Experimental)" section of the [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md).

The library surface is intentionally small. It exposes one trait (`Post`), one ready-made struct (`BasicPost`), one engine (`TinySearch`), and one index type alias (`SearchIndex`); queries return `SearchResult` values. The design goal, and the reason the project remains a few tens of kilobytes after compilation, is that the same Xor-Filter-based representation used on the wire is also the in-memory representation in the library. Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs).

## Core Types and the `Post` Trait

The library is built around a single trait, `Post`, defined in [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs). Anything the engine can index must implement it:

| Method | Signature | Purpose |
| --- | --- | --- |
| `title` | `fn title(&self) -> &str` | Required. The display title of the document. |
| `url` | `fn url(&self) -> &str` | Required. The link target returned with each hit. |
| `body` | `fn body(&self) -> Option<&str>` | Optional. The main searchable text. |
| `meta` | `fn meta(&self) -> HashMap<String, String>` | Optional. Extra fields stored alongside the hit (e.g., `author`, `category`). |

`BasicPost` is the only concrete `Post` implementation shipped by the crate; it is a plain owned struct for callers who do not want to write their own implementation. Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs).
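To make the required/optional split in the table concrete, here is a self-contained stand-in for the trait. It is a sketch, not the crate's definition: it assumes that "optional" means the trait supplies a default implementation, and the `Page` type and `searchable_text` helper are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;

/// Local stand-in that mirrors the method table above.
/// It is NOT the definition from the tinysearch crate.
trait Post {
    fn title(&self) -> &str;
    fn url(&self) -> &str;
    /// Assumed default: no body text.
    fn body(&self) -> Option<&str> {
        None
    }
    /// Assumed default: no extra metadata.
    fn meta(&self) -> HashMap<String, String> {
        HashMap::new()
    }
}

/// Hypothetical document type that fills in only the required methods.
struct Page {
    title: String,
    url: String,
}

impl Post for Page {
    fn title(&self) -> &str {
        &self.title
    }
    fn url(&self) -> &str {
        &self.url
    }
}

/// Hypothetical helper: the text an indexer could tokenize,
/// namely the title plus the body when one is present.
fn searchable_text(post: &impl Post) -> String {
    match post.body() {
        Some(body) => format!("{} {}", post.title(), body),
        None => post.title().to_string(),
    }
}
```

With this shape, a type like `Page` stays indexable by title alone, and richer types override `body` and `meta` only when they have something to contribute.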

The `TinySearch` engine is constructed with `TinySearch::new()` and configured fluently. The crate-level rustdoc shows the minimal path:

```rust
use tinysearch::{BasicPost, TinySearch, SearchIndex};
use std::collections::HashMap;

let posts = vec![
    BasicPost {
        title: "First Post".to_string(),
        url: "/first".to_string(),
        body: Some("This is the first post content".to_string()),
        meta: HashMap::new(),
    },
];

let search = TinySearch::new();
let index: SearchIndex = search.build_index(&posts).expect("Failed to build index");
let results = search.search(&index, "rust", 10);
```

Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs).

## Basic Library Usage

The `examples/library_basic/` example in the repository demonstrates the shortest possible integration. A user supplies a `Vec` of `BasicPost`, calls `build_index`, and then calls `search(&index, query, limit)` to obtain a vector of `SearchResult` values. The result type carries the title, the URL, and the metadata map that was passed in, so consumers can render hits without consulting the original document. Source: [examples/library_basic/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_basic/main.rs) and [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md).

For projects that already have JSON in the standard tinysearch shape (`title`, `url`, optional `body`, optional `meta`), the API offers a convenience parser so callers do not have to deserialize by hand. The signature is documented in the rustdoc of [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs):

```rust
let json = r#"[
  {
    "title": "My Post",
    "url": "/my-post",
    "body": "Post content goes here",
    "meta": {"category": "programming", "author": "John"}
  }
]"#;

let search = TinySearch::new();
let posts = search.parse_posts(json).expect("parse error");
let index = search.build_index(&posts).expect("Failed to build index");
let results = search.search(&index, "post", 10);
```

Source: [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs). The `parse_posts` helper lowers the friction for users migrating from the CLI workflow to the library workflow.

## Advanced Usage with Custom Post Types

Most real sites do not have documents that look like `BasicPost`. The `examples/library_advanced/` example defines a domain struct (`BlogPost`) and implements `Post` for it. The pattern is straightforward: map the four trait methods to whatever fields your content store exposes. Source: [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs).

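```rust
use std::collections::HashMap;
use tinysearch::Post;

// Domain type owned by the site generator; the field names are the site's own.
struct BlogPost {
    title: String,
    slug: String,
    content: String,
    tags: Vec<String>,
    author: String,
}

impl Post for BlogPost {
    fn title(&self) -> &str { &self.title }
    // The slug doubles as the link target returned with each hit.
    fn url(&self) -> &str { &self.slug }
    fn body(&self) -> Option<&str> { Some(&self.content) }
    // Flatten author and tags into the string-to-string metadata map.
    fn meta(&self) -> HashMap<String, String> {
        let mut meta = HashMap::new();
        meta.insert("author".to_string(), self.author.clone());
        meta.insert("tags".to_string(), self.tags.join(", "));
        meta
    }
}
```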

This is the integration point that community issue [#183](https://github.com/tinysearch/tinysearch/issues/183) was specifically asking for: a way to call tinysearch from inside a home-grown static site generator without spawning the executable. The `meta` map is what enables richer result rendering (e.g., showing an author or a thumbnail), the use case behind issue [#159](https://github.com/tinysearch/tinysearch/issues/159) ("Is there a way to return the page description or body in the results?"). Keyword highlighting, requested in issue [#119](https://github.com/tinysearch/tinysearch/issues/119) ("Highlight matched keywords in search results"), is a related need, but the `meta` map carries stored fields rather than match positions, so highlighting remains the caller's job. Source: [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs).
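As an illustration of how stored metadata can feed result rendering, the sketch below formats one hit as an HTML list item. The `Hit` struct is a hypothetical stand-in for the crate's result type: its field names are assumptions based on the description of a result carrying a title, a URL, and a metadata map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a search hit; the field names are assumptions.
struct Hit {
    title: String,
    url: String,
    meta: HashMap<String, String>,
}

/// Render one hit as an HTML list item, adding a byline when the
/// metadata map contains an "author" entry.
/// Values are inserted as-is; escape them before using this with
/// untrusted content.
fn render_hit(hit: &Hit) -> String {
    let byline = match hit.meta.get("author") {
        Some(author) => format!(" <small>by {}</small>", author),
        None => String::new(),
    };
    format!("<li><a href=\"{}\">{}</a>{}</li>", hit.url, hit.title, byline)
}
```

Because the author comes from the hit itself, the renderer does not need to look the original document up again.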

A Yew (WebAssembly frontend) integration example lives in `examples/yew-example-crate/`. It shows the same trait being implemented inside a frontend crate, which is the most direct way to keep a single content schema across build-time index generation and runtime search. Source: [examples/yew-example-crate/src/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/yew-example-crate/src/main.rs).

## Configuration Options

The library exposes a small but useful configuration surface, all on the `TinySearch` builder:

| Method | Effect | Source |
| --- | --- | --- |
| `TinySearch::new()` | Construct a default engine. | [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs) |
| `.with_stopwords(words)` | Override the default stopword list with a custom collection. | [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs) |
| `search(&index, query, limit)` | Run a query and cap the number of returned hits. | [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs) |
| `build_index(&posts)` | Convert any iterable of `&impl Post` into a `SearchIndex`. | [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs) |
| `parse_posts(json)` | Parse the canonical CLI JSON shape into `Vec<BasicPost>`. | [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs) |

Stopword tuning is the main customization the builder offers. The default list is replaced wholesale by the words you pass, so any default stopword you still want filtered has to appear in your own list. The advanced example relies on this to drop common English words that would otherwise bloat the index. Source: [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs) and [examples/library_advanced/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/library_advanced/main.rs).
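The effect of a stopword list on what ends up in an index can be sketched as follows. This is an illustrative tokenizer under simple assumptions (lowercasing, splitting on non-alphanumeric characters); the function name `tokenize` is hypothetical, and tinysearch's own tokenization may differ.

```rust
use std::collections::HashSet;

/// Split `text` into lowercase word tokens and drop every token found in
/// `stopwords`. Illustrative only; not tinysearch's tokenizer.
fn tokenize(text: &str, stopwords: &HashSet<String>) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_lowercase())
        .filter(|word| !stopwords.contains(word))
        .collect()
}
```

With a custom list of only `the` and `with`, a word such as `a` is kept because nothing outside the supplied list is filtered. That mirrors the replace-wholesale behaviour described above.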

A separate `SearchIndex` type alias — re-exported by the library — represents the fully built, serializable form of the index. The `examples/search_index_type.rs` example demonstrates how to materialize an index and feed it into a custom sink (for example, a non-WASM target such as a CLI or a server). Source: [examples/search_index_type.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/search_index_type.rs).

## End-to-End Data Flow

The library path mirrors the CLI path; the only thing that changes is who drives each step.

```mermaid
flowchart LR
    A["Source documents<br/>(any Rust type)"] -->|impl Post| B["Vec&lt;BasicPost&gt;<br/>or user struct"]
    B --> C["TinySearch::new()"]
    C -->|build_index| D["SearchIndex<br/>(Xor Filters + bincode)"]
    D -->|search| E["Vec&lt;SearchResult&gt;<br/>(title, url, meta)"]
    C -.optional.-> F["with_stopwords"]
    C -.optional.-> G["parse_posts<br/>(from JSON)"]
```

Source: [src/lib.rs](https://github.com/tinysearch/tinysearch/blob/main/src/lib.rs), [src/api.rs](https://github.com/tinysearch/tinysearch/blob/main/src/api.rs), and [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md).

## Limitations and Caveats

The README is explicit that the library surface is **experimental** and may change. Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md). Inherited engine limitations also apply: only full-word matches are supported (no prefix or fuzzy search), and the index for an entire site must fit in memory because it is one contiguous blob. The roadmap issue [#116](https://github.com/tinysearch/tinysearch/issues/116) tracks adding richer query features (filters, booleans, optional `body` field) that, once shipped, will land on the library API as well. Source: [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md).
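The whole-word limitation is easiest to see in a toy model. The sketch below is not the engine's implementation: a `HashSet` is swapped in for the Xor filter (so it has no false positives), ranking by number of matching terms is an assumption, and `IndexedPost`, `index_post`, and `search` are hypothetical names.

```rust
use std::collections::HashSet;

/// One indexed document: its URL and the set of words it contains.
/// A `HashSet` stands in for the Xor filter used by the real engine.
struct IndexedPost {
    url: String,
    words: HashSet<String>,
}

/// Build the per-post word set from whitespace-separated, lowercased words.
fn index_post(url: &str, text: &str) -> IndexedPost {
    let words = text.split_whitespace().map(|w| w.to_lowercase()).collect();
    IndexedPost { url: url.to_string(), words }
}

/// Return the URLs of posts containing at least one query term as a whole
/// word, ordered by number of matching terms (highest first), capped at `limit`.
fn search(posts: &[IndexedPost], query: &str, limit: usize) -> Vec<String> {
    let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
    let mut scored: Vec<(usize, &str)> = posts
        .iter()
        .map(|p| {
            // Set membership is exact, so "rus" never matches "rust".
            let hits = terms.iter().filter(|t| p.words.contains(*t)).count();
            (hits, p.url.as_str())
        })
        .filter(|(hits, _)| *hits > 0)
        .collect();
    // The sort is stable, so posts with equal scores keep their input order.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().take(limit).map(|(_, url)| url.to_string()).collect()
}
```

Querying this model for `rus` returns nothing even when every post contains `rust`, which is the behaviour that prefix or fuzzy search would have to change.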

For consumers who need WASM as the final artifact, the library path composes with the CLI path: build the index in Rust, serialize it with the same bincode format the CLI uses, and either embed it directly in a frontend crate (see the Yew example) or write it to disk and run `tinysearch --optimize` on it. Source: [examples/yew-example-crate/src/main.rs](https://github.com/tinysearch/tinysearch/blob/main/examples/yew-example-crate/src/main.rs) and [README.md](https://github.com/tinysearch/tinysearch/blob/main/README.md).

## See Also

- CLI Usage and JSON Index Format
- [Static Site Generator Integration](#page-ssg) (Hugo, Zola, Pelican, Jekyll)
- [Configuration with tinysearch.toml](#page-config)
- WASM Deployment and MIME Types

---


## Pitfall Log

Project: tinysearch/tinysearch

Summary: Found 16 structured pitfall items, none of them high or blocking. Top priority: installation risk, which requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: runtime_trace
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Repro command: `docker run -v $PWD:/app tinysearch/cli -m wasm`
- Evidence: identity.distribution | https://github.com/tinysearch/tinysearch

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/169

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/182

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/177

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/174

## 6. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/173

## 7. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/116

## 8. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/170

## 9. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/tinysearch/tinysearch

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/175

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tinysearch/tinysearch

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/tinysearch/tinysearch

## 13. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/tinysearch/tinysearch

## 14. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tinysearch/tinysearch/issues/151

## 15. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tinysearch/tinysearch

## 16. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tinysearch/tinysearch

