Doramagic Project Pack · Human Manual

Verba


Related topics: RAG Concepts in Verba


Introduction to Verba

Overview

Verba (The Golden RAGtriever) is an open-source, user-friendly RAG (Retrieval-Augmented Generation) application developed by Weaviate. It provides a streamlined interface for building and interacting with vector databases, enabling users to explore datasets and extract insights through semantic search and generative AI capabilities.

The application is designed to be accessible to both technical and non-technical users, offering multiple deployment options and integration with various LLM providers. Verba supports Python versions 3.10.0 to 3.12.0 and is distributed as the goldenverba Python package.

| Property | Value |
|---|---|
| Package Name | goldenverba |
| Current Version | 2.1.3 |
| Python Support | >=3.10.0, <3.13.0 |
| Repository | https://github.com/weaviate/Verba |
| License | BSD License |

Sources: setup.py:3-14

Core Features

Verba provides a comprehensive set of features for RAG-based applications:

Data Import and Management

Users can import documents through multiple methods:

  • Add Files: Upload individual files directly
  • Add Directory: Import entire folders of documents
  • Add URL: Fetch content from web sources

The system automatically chunks and processes documents, making them searchable and queryable. Supported file formats include text files, PDFs, CSV, XLSX, and XLS formats for the DefaultReader.
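The chunking step can be pictured as a sliding window with overlap; a minimal sketch (the `chunk_size` and `overlap` values here are placeholders, not Verba's defaults):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (illustrative, not Verba's chunker)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters of context.
    return [text[i:i + chunk_size] for i in range(0, len(text), step) if text[i:i + chunk_size]]
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of some duplicated storage.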

Sources: README.md

Chat Interface

The Chat page enables users to ask questions about their imported data. The system retrieves relevant document chunks and generates contextual responses using the configured LLM. The chat interface displays:

  • Real-time retrieval and generation status
  • Cached results indicator for faster subsequent queries
  • Source attribution for retrieved information
  • Code block syntax highlighting in responses

Sources: frontend/app/components/Chat/ChatInterface.tsx

Document Explorer

Users can browse, view, and manage imported documents through the Document Explorer. Each document displays:

  • Document metadata and labels
  • Chunk information with relevancy scores
  • Source links for reference
  • Content preview with Markdown rendering

Sources: frontend/app/components/Document/ContentView.tsx

Configuration Options

Verba allows granular configuration of the RAG pipeline, including:

  • LLM provider selection
  • Embedding model configuration
  • Retrieval parameters
  • Chunk size and overlap settings

Architecture Overview

graph TD
    A[User Interface - React Frontend] --> B[FastAPI Backend Server]
    B --> C[Verba Manager - Core Logic]
    C --> D[Weaviate Vector Database]
    
    E[LLM Providers] --> B
    F[Ollama / HuggingFace] --> B
    
    G[Document Readers] --> C
    H[Embedders] --> C
    I[Generators] --> C
    
    D --> G
    D --> H
    D --> I

Deployment Options

Verba supports four deployment configurations to accommodate different use cases and infrastructure requirements.

| Deployment Type | Description | Use Case |
|---|---|---|
| Weaviate | Connect to Weaviate Cloud Services (WCS) | Production deployments with managed infrastructure |
| Docker | Run with Docker Compose | Containerized deployments |
| Local | Run entirely on local machine | Development and testing |
| Custom | Specify custom Weaviate URL and credentials | Integration with existing Weaviate instances |

Sources: frontend/app/components/Login/LoginView.tsx

Environment Variables

Configuration can be managed through environment variables for automated deployments:

| Variable | Description |
|---|---|
| DEFAULT_DEPLOYMENT | Pre-select deployment type (Local, Docker, Weaviate, Custom) |
| OLLAMA_MODEL | Default Ollama model name |
| OLLAMA_EMBED_MODEL | Default Ollama embedding model |

Sources: CHANGELOG.md
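Reading these variables at startup might look like the following sketch; the fallback model names are placeholders, not Verba's actual defaults:

```python
import os

def load_deployment_config(env=os.environ) -> dict:
    """Collect Verba-related environment settings, with illustrative fallbacks."""
    return {
        # Variable names are from the table above; defaults are placeholders.
        "deployment": env.get("DEFAULT_DEPLOYMENT", "Local"),
        "ollama_model": env.get("OLLAMA_MODEL", "llama3"),
        "ollama_embed_model": env.get("OLLAMA_EMBED_MODEL", "mxbai-embed-large"),
    }
```

Passing `env` as a parameter makes the loader easy to unit-test without mutating the real process environment.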

Supported Integrations

LLM Providers

Verba integrates with multiple LLM providers for text generation:

  • OpenAI: GPT models with dynamic model name retrieval based on API key and URL
  • Anthropic: Claude models
  • Cohere: Command models
  • Groq: Fast inference API
  • Novita AI: Additional generative capabilities
  • Upstage: Reader, Embedder, and Generator support

Sources: README.md and CHANGELOG.md

Embedding Providers

For vector embeddings, Verba supports:

  • Ollama: Local embedding models
  • HuggingFace: Sentence transformers and other models
  • OpenAI: text-embedding-ada-002 and newer models
  • Upstage: Solar embedding models

Document Readers

The system includes specialized readers for various file formats:

  • AssemblyAI: Audio file transcription and processing
  • DefaultReader: Text, PDF, CSV, XLSX, XLS formats
  • Unstructured: Advanced document parsing capabilities

Installation Methods

Install via pip

pip install goldenverba

Build from Source

git clone https://github.com/weaviate/Verba
cd Verba
pip install -e .

Deploy with Docker

git clone https://github.com/weaviate/Verba
docker compose --env-file <your-env-file> up -d --build

Sources: README.md

Project Structure

The Verba project is organized into two main components:

Verba/
├── goldenverba/          # Python backend package
│   └── server/           # FastAPI server implementation
├── frontend/             # React TypeScript frontend
│   └── app/
│       ├── components/   # React components
│       │   ├── Chat/     # Chat interface components
│       │   ├── Document/ # Document explorer components
│       │   ├── Ingestion/# Data import components
│       │   ├── Login/    # Authentication views
│       │   ├── Navigation/ # Navigation components
│       │   └── Settings/ # Configuration components
│       └── page.tsx      # Main application page
├── setup.py              # Package configuration
└── README.md             # Project documentation

User Interface Navigation

The main navigation includes the following sections:

| Navigation Item | Description |
|---|---|
| Chat | Query imported data using RAG |
| Documents | Browse and manage imported documents |
| Import Data | Add new files, directories, or URLs |
| Settings | Configure Verba and manage collections |

Sources: frontend/app/components/Navigation/NavbarComponent.tsx

Settings and Management

The Settings page provides administrative functions for Verba management:

Collections Management

View all Weaviate collections with their object counts and status information, including shard configuration.

Reset Operations

| Operation | Description |
|---|---|
| Reset Documents | Clears all documents and chunks from Verba |
| Reset Config | Resets configuration to default values |
| Reset Verba | Deletes all Verba-related collections |
| Reset Suggestions | Clears autocomplete suggestion data |

Sources: frontend/app/components/Settings/InfoView.tsx

Getting Started Workflow

graph LR
    A[Install Verba] --> B[Configure Deployment]
    B --> C[Set API Keys]
    C --> D[Import Data]
    D --> E[Configure RAG Pipeline]
    E --> F[Query Data]

  1. Installation: Choose an installation method (pip, source, or Docker)
  2. Deployment Configuration: Select deployment type (Weaviate, Docker, Local, or Custom)
  3. API Keys: Configure required API keys in .env file or through the UI
  4. Data Import: Import documents using Add Files, Add Directory, or Add URL
  5. Configuration: Adjust RAG pipeline settings under the Config tab
  6. Query: Ask questions and receive answers with relevant document citations

Sources: README.md and frontend/app/components/Login/GettingStarted.tsx

Version History

| Version | Release | Key Features |
|---|---|---|
| 2.1.3 | Latest | OLLAMA_MODEL and OLLAMA_EMBED_MODEL env vars, CSV/XLSX/XLS support, hiding Getting Started display |
| 2.1.2 | Previous | Novita Generator support, basic Document class tests, spaCy fixes |
| 2.1.1 | Earlier | Dynamic OpenAI model retrieval |
| 2.1.0 | Earlier | Upstage integration, Custom deployment, Groq support, AssemblyAI Reader |

Sources: CHANGELOG.md

External Resources

| Resource | URL |
|---|---|
| GitHub Repository | https://github.com/weaviate/Verba |
| Blog Post | https://weaviate.io/blog/verba-open-source-rag-app |
| Video Tutorial | https://www.youtube.com/watch?v=swKKRdLBhas |
| Weaviate Forum | https://forum.weaviate.io/ |

Contributing

Verba is an open-source community project. Contributions are welcome through:

  • GitHub Issues for bug reports
  • GitHub Discussions for feature requests and ideas
  • Pull Requests for code contributions

Before contributing, please review the Contribution Guide in the repository.

Sources: [setup.py:3-14](https://github.com/weaviate/Verba/blob/main/setup.py)

RAG Concepts in Verba

Related topics: Introduction to Verba, RAG Retrieval System, LLM Generators and Answer Generation


Verba (The Golden RAGtriever) is an open-source RAG (Retrieval-Augmented Generation) application designed to provide a streamlined, user-friendly interface for building and interacting with RAG-powered applications. This document explains the core RAG concepts implemented within Verba's architecture.

What is RAG?

Retrieval-Augmented Generation (RAG) is a pattern that combines the power of large language models (LLMs) with external knowledge retrieval. Instead of relying solely on a model's training data, RAG systems:

  1. Retrieve relevant documents or chunks from a knowledge base
  2. Augment the user's query with the retrieved context
  3. Generate a response using the LLM with the augmented input
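The three steps can be sketched end to end with stand-ins for the vector search and the LLM call (toy word-overlap retrieval; in Verba these roles are filled by Weaviate and the configured generator):

```python
def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by how many words they share with the query."""
    qwords = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(qwords & set(c.lower().split())), reverse=True)
    return scored[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stand-in for the LLM call (OpenAI, Ollama, etc.)."""
    return f"[LLM answer based on prompt of {len(prompt)} chars]"

answer = generate(augment("what is verba", ["Verba is a RAG app.", "Weaviate stores vectors."]))
```

Real systems replace the word-overlap scoring with vector similarity search, but the retrieve, augment, generate shape is the same.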

Sources: README.md:1

Verba's RAG Pipeline Architecture

Verba implements a complete RAG pipeline with configurable components for reading, embedding, chunking, and generating.

graph TD
    A[User Query] --> B[Retrieval Phase]
    B --> C[Vector Search in Weaviate]
    C --> D[Retrieve Relevant Chunks]
    D --> E[Augment Query with Context]
    E --> F[Generation Phase]
    F --> G[LLM Response]
    
    H[Document Ingestion] --> I[Readers]
    I --> J[Chunking]
    J --> K[Embedding]
    K --> L[Vector Storage in Weaviate]

Sources: README.md and frontend/app/components/Chat/ChatInterface.tsx

Core Components

Readers

Verba supports multiple document formats through its reader system. Documents can be imported via the frontend and processed by appropriate readers.

| File Type | Format | Support Status |
|---|---|---|
| CSV | csv | Supported (v2.1.3+) |
| Excel | xlsx, xls | Supported (v2.1.3+) |
| Text | Plain text | Supported |
| Markdown | .md | Supported |
| PDF | .pdf | Supported |
| Audio | Various | Supported via AssemblyAI |

Sources: CHANGELOG.md:8-13

Embedders

Verba supports multiple embedding providers for converting documents into vector representations:

  • OpenAI - Uses OpenAI's embedding models
  • Ollama - Local embeddings via Ollama
  • HuggingFace - Sentence transformers from HuggingFace
  • Cohere - Cohere's embedding models
  • Upstage - Upstage's embedding service

Sources: README.md and CHANGELOG.md:15-22

Generators

Multiple LLM providers are supported for generating responses:

| Provider | Type | Configuration |
|---|---|---|
| OpenAI | Cloud | API Key required |
| Anthropic | Cloud | API Key required |
| Cohere | Cloud | API Key required |
| Groq | Cloud | API Key required |
| Novita | Cloud | API Key required (v2.1.2+) |
| Ollama | Local | No API Key needed |
| Upstage | Cloud | API Key required |

Sources: CHANGELOG.md:15-22 and frontend/app/components/Login/LoginView.tsx

Document Ingestion Workflow

When users import documents into Verba, the following workflow is executed:

sequenceDiagram
    participant User
    participant Frontend
    participant Reader
    participant Chunker
    participant Embedder
    participant Weaviate
    
    User->>Frontend: Upload Document
    Frontend->>Reader: Process File
    Reader->>Chunker: Raw Text
    Chunker->>Embedder: Text Chunks
    Embedder->>Weaviate: Vector Embeddings
    Weaviate->>Weaviate: Store Chunks + Vectors

Chunking Configuration

Documents are split into manageable chunks for retrieval. Each chunk contains:

  • Content - The text content
  • Chunk ID - Position in original document
  • Labels - User-defined labels for categorization
  • Source Link - Reference to original document location
  • Metadata - Additional document properties
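These fields can be modeled as a small data class (an illustrative shape following the list above, not Verba's internal chunk type):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    content: str                                      # the text content
    chunk_id: int                                     # position in the original document
    labels: list[str] = field(default_factory=list)   # user-defined categories
    source_link: str = ""                             # reference to the original document
    metadata: dict = field(default_factory=dict)      # additional document properties
```

Keeping `chunk_id` as the position in the source document lets the UI reassemble neighboring chunks for context display.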

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1

Retrieval and Query Flow

Chat Interface

Verba's chat interface handles the retrieval and generation process:

graph LR
    A[User Input] --> B[Socket Connection]
    B --> C{Retrieval Phase}
    C -->|Fetching Status: CHUNKS| D[Vector Search]
    D --> E[Retrieve Top-K Chunks]
    E --> F{Generation Phase}
    F -->|Fetching Status: RESPONSE| G[LLM Processing]
    G --> H[Stream Response]
    H --> I[Display to User]

The chat interface displays retrieval status to users:

fetchingStatus === "CHUNKS" → "Retrieving..."
fetchingStatus === "RESPONSE" → "Generating..."

Sources: frontend/app/components/Chat/ChatInterface.tsx:1

Message Types

Verba supports multiple message types in the chat:

| Type | Direction | Description |
|---|---|---|
| user | User → System | User queries |
| system | System → User | LLM responses |
| error | System → User | Error messages |
| retrieval | System → User | Retrieved context |
| cached | System → User | Cached responses |

Sources: frontend/app/components/Chat/ChatMessage.tsx:1

Relevancy Scoring

Retrieved chunks are scored for relevance. Chunks with a score greater than 0 are flagged as "High Relevancy":

{contentSnippet.score > 0 && (
  <div className="flex gap-2 items-center p-3 bg-primary-verba rounded-full w-fit">
    <HiSparkles size={12} />
    <p className="text-xs flex text-text-verba">High Relevancy</p>
  </div>
)}

Sources: frontend/app/components/Document/ContentView.tsx:1
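The same score > 0 rule can be applied outside the UI; a minimal sketch (the dict shape here is illustrative):

```python
def flag_high_relevancy(chunks: list[dict]) -> list[dict]:
    """Mark chunks whose retrieval score exceeds 0, mirroring the frontend check."""
    return [{**c, "high_relevancy": c.get("score", 0) > 0} for c in chunks]
```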

Deployment Options

Verba supports multiple deployment configurations:

| Deployment | Description | Use Case |
|---|---|---|
| Local | Runs entirely on local machine with Ollama | Development, Privacy |
| Docker | Containerized deployment | Easy setup |
| Weaviate Cloud | Managed Weaviate service | Production |
| Custom | User-provided Weaviate instance | Enterprise |

Sources: README.md and frontend/app/components/Login/LoginView.tsx

Environment Variables

Key environment variables for RAG configuration:

| Variable | Description |
|---|---|
| OLLAMA_MODEL | Default Ollama model |
| OLLAMA_EMBED_MODEL | Ollama embedding model |
| DEFAULT_DEPLOYMENT | Default deployment type |
| WEAVIATE_URL | Weaviate instance URL |
| WEAVIATE_API_KEY | Weaviate API key |

Sources: CHANGELOG.md:8-13

Configuration Management

The RAG pipeline can be configured through the Settings interface:

graph TD
    A[Settings Page] --> B[Config Tab]
    A --> C[Info Tab]
    A --> D[Collections Tab]
    
    B --> E[RAG Pipeline Settings]
    C --> F[System Information]
    D --> G[Weaviate Collections]
    
    E --> H[Embedder Selection]
    E --> I[Generator Selection]
    E --> J[Retrieval Settings]

Configurable Options

  • Embedder - Choose embedding provider and model
  • Generator - Select LLM provider and model
  • Retrieval - Configure top-k, similarity thresholds
  • Chunk Size - Adjust document chunking parameters

Sources: frontend/app/components/Settings/InfoView.tsx:1

Data Storage

Weaviate Collections

Verba automatically creates collections in Weaviate for:

  • Documents - Original document metadata
  • Chunks - Vectorized document chunks with embeddings
  • Configurations - RAG pipeline settings

Each collection tracks:

  • Object count
  • Shard configuration
  • Status

Sources: frontend/app/components/Settings/InfoView.tsx:1

Reset Operations

Verba provides granular reset capabilities:

| Operation | Scope | Action |
|---|---|---|
| Reset Documents | Data | Clears all documents and chunks |
| Reset Config | Configuration | Restores default RAG settings |
| Reset Verba | System | Deletes all Verba collections |
| Reset Suggestions | UI | Clears autocomplete cache |

Sources: frontend/app/components/Settings/InfoView.tsx:1

Summary

Verba implements a complete RAG pipeline with:

  • Multi-format document support - CSV, Excel, PDF, audio files
  • Flexible embedding options - Multiple cloud and local providers
  • Diverse LLM integration - OpenAI, Anthropic, Cohere, Ollama, and more
  • Visual chat interface - Real-time status updates during retrieval and generation
  • Configurable pipeline - Adjust chunking, embedding, and retrieval parameters
  • Multiple deployment modes - Local, Docker, Weaviate Cloud, or custom infrastructure

The system leverages Weaviate's vector database capabilities for efficient similarity search while providing a user-friendly interface for non-technical users to build and interact with RAG applications.

Sources: README.md

Component Architecture

Related topics: Data Ingestion System, Chunking Strategies, Embedder Configuration, LLM Generators and Answer Generation


Overview

Verba (Golden RAGtriever) is a RAG (Retrieval-Augmented Generation) application built with a modular component architecture that separates concerns across ingestion, retrieval, and generation pipelines. The system enables users to import various data formats, process them into searchable chunks, and query them using configurable LLM-based chat interfaces. The frontend is built with React/Next.js and TypeScript, while the backend is powered by Python with FastAPI and Weaviate as the vector database.

The component architecture in Verba follows a plugin-based pattern where different readers, embedders, chunkers, and generators can be dynamically configured and swapped at runtime. This design allows extensibility without modifying core system code.

System Architecture

graph TD
    subgraph Frontend["Frontend (Next.js/React)"]
        Login[LoginView]
        Chat[ChatInterface]
        Ingest[FileSelectionView]
        Settings[InfoView]
        Nav[NavbarComponent]
    end

    subgraph Backend["Backend (Python/FastAPI)"]
        API[FastAPI Server]
        Manager[Component Manager]
        Components[Components Registry]
    end

    subgraph External["External Services"]
        Weaviate[Weaviate DB]
        LLMs[LLM Providers]
        Readers[Data Readers]
    end

    Login --> |Credentials| API
    Chat --> |Query/RAGConfig| API
    Ingest --> |FileData| API
    Settings --> |Reset/Config| API
    API --> Manager
    Manager --> Components
    Components --> Weaviate
    Components --> LLMs
    Components --> Readers

Core Component Types

Verba's component system is organized around four primary types that form the RAG pipeline:

| Component Type | Purpose | Examples |
|---|---|---|
| Reader | Parse various data formats into text | DefaultReader, AssemblyAI, Unstructured |
| Embedder | Convert text to vector representations | OpenAI Embedder, HuggingFace, Ollama |
| Chunker | Split documents into manageable pieces | DefaultChunker |
| Generator | Produce natural language responses | OpenAI Generator, Anthropic, Novita, Groq |

Component Manager Architecture

The component manager (goldenverba/components/managers.py) serves as the central registry and orchestrator for all pluggable components. It maintains references to available readers, embedders, chunkers, and generators, enabling runtime selection based on user configuration.

Components are registered through the __init__.py module which discovers and loads all available implementations. The manager provides methods to:

  • List available components by type
  • Retrieve component configurations
  • Instantiate components with provided settings
  • Validate component compatibility
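That registry role can be sketched as a dictionary keyed by component type (class and method names here are illustrative, not the actual managers.py API):

```python
class ComponentManager:
    """Minimal plugin registry: list, register, and instantiate components by type."""

    def __init__(self):
        self._registry: dict[str, dict[str, type]] = {
            "Reader": {}, "Chunker": {}, "Embedder": {}, "Generator": {},
        }

    def register(self, kind: str, name: str, cls: type) -> None:
        self._registry[kind][name] = cls

    def list_components(self, kind: str) -> list[str]:
        return sorted(self._registry[kind])

    def create(self, kind: str, name: str, **settings):
        # Instantiate the selected component with user-provided settings.
        return self._registry[kind][name](**settings)

class DummyReader:
    def __init__(self, **settings):
        self.settings = settings

manager = ComponentManager()
manager.register("Reader", "DefaultReader", DummyReader)
```

Because components are looked up by name at runtime, new readers or generators can be added by registration alone, without touching the pipeline code that consumes them.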

Frontend Component Architecture

Navigation Structure

The frontend uses a page-based navigation system where NavbarComponent manages the main routing between different views:

graph LR
    Nav[NavbarComponent] --> |CHAT| ChatPage[ChatInterface]
    Nav --> |DOCUMENTS| DocPage[DocumentExplorer]
    Nav --> |ADD| AddPage[FileSelectionView]
    Nav --> |SETTINGS| SettingsPage[InfoView]

Navigation items are conditionally rendered based on the production environment variable:

  • Demo Mode: Shows only Chat page
  • Production/Local/Docker: Shows Chat, Documents, Import Data, and Settings

Sources: frontend/app/components/Navigation/NavbarComponent.tsx (lines showing conditional rendering with production != "Demo")

Login and Deployment Configuration

The LoginView component handles the initial setup flow, supporting multiple deployment types:

| Deployment | Description | Configuration Required |
|---|---|---|
| Local | Standalone Weaviate instance | URL, API Key |
| Docker | Containerized Weaviate | URL, API Key |
| Weaviate Cloud (WCS) | Managed Weaviate service | URL, API Key |
| Custom | User-specified Weaviate endpoint | URL, API Key, Port |

Sources: frontend/app/components/Login/LoginView.tsx (lines 35-45 showing deployment type definitions)

Chat Interface Pipeline

The ChatInterface component implements the query-time RAG workflow:

sequenceDiagram
    participant User
    participant ChatInterface
    participant Backend
    participant Weaviate
    participant LLM
    
    User->>ChatInterface: Submit Query
    ChatInterface->>Backend: /query with RAGConfig
    Backend->>Weaviate: Vector Search
    Weaviate->>Backend: Top-k Chunks
    Backend->>LLM: Context + Query
    LLM->>Backend: Generated Response
    Backend->>ChatInterface: Response + Chunks
    ChatInterface->>User: Display Results

The interface displays retrieval status messages:

  • CHUNKS state: "Retrieving..." while fetching from Weaviate
  • RESPONSE state: "Generating..." while LLM produces answer

Sources: frontend/app/components/Chat/ChatInterface.tsx (lines showing fetchingStatus states)

Data Ingestion Flow

The FileSelectionView manages the document import pipeline:

graph TD
    Files[Files/Directories/URLs] --> Reader[Reader Component]
    Reader --> Chunker[Chunker Component]
    Chunker --> Embedder[Embedder Component]
    Embedder --> Weaviate[Weaviate Collection]

File imports support multiple sources:

  • Add Files: Individual file upload
  • Add Directory: Batch directory ingestion
  • Add URL: Web content extraction

Sources: frontend/app/components/Ingestion/FileSelectionView.tsx (lines showing URL dropdown with Reader component filtering)

Settings and Configuration

The InfoView component provides system management capabilities:

| Action | Function | Data Affected |
|---|---|---|
| Reset Documents | Clear all collections | Documents, Chunks |
| Reset Config | Restore default settings | RAGConfig |
| Reset Verba | Full system reset | All collections |
| Reset Suggestions | Clear autocomplete cache | Suggestion data |

Sources: frontend/app/components/Settings/InfoView.tsx (lines showing UserModalComponent triggers)

RAG Configuration Schema

The RAGConfig object defines the active pipeline configuration:

interface RAGConfig {
  Reader: {
    components: Record<string, Component>;
  };
  Chunker: {
    components: Record<string, Component>;
  };
  Embedder: {
    components: Record<string, Component>;
  };
  Generator: {
    components: Record<string, Component>;
  };
}

Each component contains:

  • type: Component category (e.g., "URL", "Text", "Vector")
  • name: Human-readable identifier
  • settings: Key-value configuration parameters

State Management

Verba manages state through React props and context patterns:

graph TD
    App[App Root] --> |Credentials| LoginView
    App --> |RAGConfig| ChatInterface
    App --> |Themes| Components
    App --> |production| NavbarComponent
    
    ChatInterface --> |setRAGConfig| App
    LoginView --> |setIsLoggedIn| App

Key state objects:

  • Credentials: Weaviate connection details (URL, API key)
  • RAGConfig: Pipeline configuration for all component types
  • Themes: UI theming configuration
  • production: Deployment mode ("Local" | "Demo" | "Production")

Dependency Injection

The frontend components receive dependencies via constructor injection:

interface ChatInterfaceProps {
  credentials: Credentials;
  RAGConfig: RAGConfig | null;
  setRAGConfig: (config: RAGConfig | null) => void;
  production: "Local" | "Demo" | "Production";
  addStatusMessage: (message: string) => void;
}

This pattern enables:

  • Testability through mock injection
  • Flexible component composition
  • Runtime configuration changes

WebSocket Communication

Real-time updates between frontend and backend use WebSocket connections:

  • Connection status monitoring via socketOnline and socketStatus
  • Status updates: "ONLINE", "OFFLINE", "CONNECTING"
  • Ability to cancel ongoing operations (e.g., retrieval/generation)

Sources: frontend/app/components/Chat/ChatInterface.tsx (lines showing socket status handling)

Conclusion

The Verba component architecture demonstrates a clean separation between frontend presentation and backend processing, with a plugin-based system enabling flexible RAG pipeline configuration. The architecture supports multiple deployment scenarios, various data sources, and different LLM providers through abstracted component interfaces.

Sources: frontend/app/components/Navigation/NavbarComponent.tsx (lines showing conditional rendering with `production != "Demo"`)

Data Ingestion System

Related topics: Chunking Strategies, Embedding and Vector Storage


The Data Ingestion System in Verba (The Golden RAGtriever) is responsible for accepting user-provided documents from various sources, processing them through configurable reader pipelines, and preparing them for embedding and retrieval. This system forms the entry point of the RAG pipeline, enabling users to import files, URLs, directories, and audio content into the application.

System Overview

The ingestion system operates as a multi-stage pipeline that transforms raw content into structured Document objects ready for vectorization. It supports multiple input types and allows per-file configuration of chunking, embedding, and reading strategies.

The architecture consists of three primary layers:

| Layer | Responsibility | Key Components |
|---|---|---|
| Frontend | User interface for file selection and configuration | FileSelectionView, BasicSettingView, NavbarComponent |
| API | Backend endpoint handling and request processing | server/api.py |
| Readers | Content extraction from various file types and sources | BasicReader, HTMLReader, UnstructuredAPI, AssemblyAIAPI, etc. |

Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-50

Ingestion Flow

graph TD
    A[User clicks Import Data] --> B[FileSelectionView]
    B --> C{Input Type}
    C -->|Files| D[Add Files Tab]
    C -->|Directory| E[Add Directory Tab]
    C -->|URLs| F[Add URL Tab]
    D --> G[BasicSettingView - Configure]
    E --> G
    F --> G
    G --> H[Select Reader Type]
    G --> I[Set Chunker]
    G --> J[Set Embedder]
    G --> K[Add Metadata & Labels]
    K --> L[Import Selected]
    L --> M[API Endpoint: /import]
    M --> N[Reader Processing]
    N --> O[Document Creation]
    O --> P[Chunking]
    P --> Q[Vector Storage in Weaviate]

Document Model

The core data structure used throughout ingestion is the Document class, defined in goldenverba/components/document.py. This class encapsulates all metadata and content associated with an ingested document.

Document(
    title: str,
    content: str,
    extension: str,
    labels: list,
    source: str,
    fileSize: int,
    metadata: str,
    meta: dict
)

Sources: goldenverba/components/document.py:1-100

Document Creation from Configuration

The create_document function provides a factory method for generating Document objects from file configuration:

def create_document(content: str, fileConfig: FileConfig) -> Document:
    return Document(
        title=fileConfig.filename,
        content=content,
        extension=fileConfig.extension,
        labels=fileConfig.labels,
        source=fileConfig.source,
        fileSize=fileConfig.file_size,
        metadata=fileConfig.metadata,
        meta={},
    )

Sources: goldenverba/components/document.py:100-115

Reader Components

Readers are responsible for extracting raw content from various input sources. Each reader implements a specific loading strategy and returns a list of Document objects.
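The shared reader contract can be sketched as an abstract base class (a simplified stand-in for the actual reader interface, not its exact signature):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    content: str
    extension: str = ""

class Reader(ABC):
    """Every reader turns one input source into a list of Documents."""

    @abstractmethod
    def load(self, source: str) -> list[Document]: ...

class PlainTextReader(Reader):
    def load(self, source: str) -> list[Document]:
        # In a real reader `source` would be a path or URL; here it is the text itself.
        return [Document(title="inline", content=source, extension=".txt")]
```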

Available Readers

| Reader | Type | Supported Sources | Description |
|---|---|---|---|
| BasicReader | File | .txt, .pdf, .docx, .xlsx, .xls, .csv | Standard file reading for common document formats |
| HTMLReader | URL | Web pages | Fetches and converts web pages, supports recursive crawling |
| UnstructuredAPI | API | Multiple formats | Uses Unstructured.io API for complex document parsing |
| AssemblyAIAPI | API | Audio files | Transcribes and extracts content from audio |
| GitReader | Repository | Git repos | Clones and extracts documentation from Git repositories |
| UpstageDocumentParse | API | Documents | Uses Upstage AI for document parsing |

HTMLReader Configuration

The HTMLReader supports advanced web scraping capabilities:

| Parameter | Type | Default | Description |
|---|---|---|---|
| URLs | list | required | List of URLs to process |
| Convert To Markdown | bool | true | Whether to convert HTML to markdown |
| Recursive | bool | false | Whether to follow linked pages |
| Max Depth | int | 3 | Maximum recursion depth for linked pages |

Sources: goldenverba/components/reader/HTMLReader.py:1-80

HTMLReader Recursive Processing

async def process_url(
    self,
    url: str,
    to_markdown: bool,
    recursive: bool,
    max_depth: int,
    current_depth: int,
    session: aiohttp.ClientSession,
    reader: BasicReader,
    fileConfig: FileConfig,
    documents: List[Document],
    processed_urls: set,
):
    if url in processed_urls or current_depth > max_depth:
        return
    
    processed_urls.add(url)
    # ... content fetching and document creation

The reader uses an async pattern with aiohttp.ClientSession for efficient concurrent URL processing, maintaining a processed_urls set to prevent duplicate processing.

Sources: goldenverba/components/reader/HTMLReader.py:80-120

Frontend Ingestion Interface

File Selection View

The FileSelectionView component provides the primary UI for selecting and managing files to ingest. It supports three input modes:

<div className="tab-group">
  <Tabs 
    tabs={["Add Files", "Add Directory", "Add URL"]}
    onTabChange={handleTabChange}
  />
</div>

Key features include:

  • File List Display: Shows all selected files with their status
  • Multi-file Selection: Allows batch processing of multiple files
  • URL Dropdown: Provides reader type selection for URL inputs
  • Import Actions: Triggers the actual ingestion process

Sources: frontend/app/components/Ingestion/FileSelectionView.tsx:1-100

Configuration Settings View

The BasicSettingView component provides per-file configuration options. Each file can have its own RAG pipeline configuration:

| Setting | Description | UI Element |
| --- | --- | --- |
| Reader | Content extraction method | Disabled text input showing selected reader |
| Chunker | Text splitting strategy | Disabled text input with description |
| Embedder | Vector embedding model | Disabled text input with description |
| Title | Document display name | Editable text input |
| Source Link | Original source reference | Editable text input |
| Labels | Categorization tags | Input with add button |
| Metadata | Custom key-value data | Textarea for JSON-like content |
| Overwrite | Replace existing documents | Checkbox toggle |

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-150

Label Management

Labels provide a way to categorize documents for filtering during retrieval:

<input
  type="text"
  value={label}
  onChange={(e) => setLabel(e.target.value)}
  onKeyDown={(e) => {
    if (e.key === "Enter") {
      e.preventDefault();
      addLabel(label);
    }
  }}
/>
<VerbaButton
  title="Add"
  Icon={IoAddCircleSharp}
  onClick={() => addLabel(label)}
/>

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:20-45

Debug Modal

The configuration includes a debug feature that displays the complete file configuration as JSON:

<pre className="whitespace-pre-wrap text-xs">
  {selectedFileData
    ? (() => {
        const objCopy = { ...fileMap[selectedFileData] };
        objCopy.content = "File Content";
        return JSON.stringify(objCopy, null, 2);
      })()
    : ""}
</pre>

This allows users to inspect the full configuration state including the RAG pipeline settings before import.

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:100-130
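
The same redaction pattern translates directly to Python: shallow-copy the configuration and mask the raw content before serializing, so large file bodies never reach the debug display and the original object stays untouched.

```python
import json

def debug_view(file_config: dict) -> str:
    """Pretty-print a file configuration with its content field masked."""
    masked = {**file_config, "content": "File Content"}  # shallow copy + mask
    return json.dumps(masked, indent=2)

config = {"filename": "report.pdf", "content": "x" * 10_000, "overwrite": False}
snapshot = debug_view(config)  # small, safe to display
```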

Configuration Management

FileConfig Structure

The FileConfig object carries all configuration for a single file through the ingestion pipeline:

| Field | Type | Purpose |
| --- | --- | --- |
| filename | string | Document title |
| extension | string | File type indicator |
| labels | list | Categorization tags |
| source | string | Original source URL/path |
| file_size | int | Size in bytes |
| metadata | string | Custom metadata string |
| rag_config | dict | Reader, Chunker, Embedder settings |
| content | string | Raw file content |
| overwrite | bool | Whether to replace existing |

RAG Pipeline Configuration

Each file maintains its own RAG configuration that specifies the processing pipeline:

fileMap[selectedFileData].rag_config = {
  "Reader": {
    "selected": "BasicReader",
    "components": {
      "BasicReader": { "type": "file", "description": "..." },
      "HTMLReader": { "type": "URL", "description": "..." }
    }
  },
  "Chunker": {
    "selected": "DefaultChunker",
    "components": { ... }
  },
  "Embedder": {
    "selected": "OpenAIEmbedder",
    "components": { ... }
  }
}

Navigation Integration

The ingestion system is accessible from the main navigation bar:

{production != "Demo" && (
  <NavbarButton
    hide={false}
    Icon={IoMdAddCircle}
    title="Import Data"
    currentPage={currentPage}
    setCurrentPage={setCurrentPage}
    setPage="ADD"
  />
)}

In Demo mode, the import functionality is disabled to prevent unauthorized data ingestion.

Sources: frontend/app/components/Navigation/NavbarComponent.tsx:30-45

State Management

The frontend maintains the ingestion state using React hooks:

| State Variable | Type | Purpose |
| --- | --- | --- |
| fileMap | Record<string, FileData> | All files selected for ingestion |
| selectedFileData | string \| null | Currently selected file key |
| socketStatus | string | Connection status to backend |
| currentPage | string | Current navigation page |

The FileData interface contains both the raw content and the configuration:

interface FileData {
  content: string;
  filename: string;
  extension: string;
  labels: string[];
  source: string;
  file_size: number;
  metadata: string;
  rag_config: RAGConfig;
  overwrite: boolean;
  block: boolean;  // Disables editing during processing
}

Integration with RAG Pipeline

After successful ingestion, documents flow into the RAG pipeline:

  1. Reader extracts raw content from the source
  2. Chunker splits content into smaller segments for retrieval
  3. Embedder converts chunks into vector embeddings
  4. Weaviate stores the embedded chunks for semantic search

This integration is configured per-file through the rag_config object, allowing different documents to use different processing strategies based on their content type or user requirements.
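
The per-file dispatch described above can be sketched as a registry lookup (the component names and callables here are illustrative toys, not Verba's actual classes): each stage resolves its `"selected"` entry against a registry and runs the resulting component.

```python
def run_pipeline(content, rag_config, registry):
    """Resolve and run the chunker and embedder named in rag_config."""
    chunker = registry["Chunker"][rag_config["Chunker"]["selected"]]
    embedder = registry["Embedder"][rag_config["Embedder"]["selected"]]
    return [embedder(chunk) for chunk in chunker(content)]

registry = {
    "Chunker": {"DefaultChunker": lambda text: text.split(". ")},
    "Embedder": {"ToyEmbedder": lambda chunk: [float(len(chunk))]},
}
config = {
    "Chunker": {"selected": "DefaultChunker"},
    "Embedder": {"selected": "ToyEmbedder"},
}
vectors = run_pipeline("First sentence. Second", config, registry)  # [[14.0], [6.0]]
```

Swapping a strategy for one document is then just a change to that file's `"selected"` string.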

Error Handling

The HTMLReader demonstrates the error handling pattern used across readers:

async with aiohttp.ClientSession() as session:
    for url in urls:
        try:
            await self.process_url(
                url, to_markdown, recursive, max_depth,
                0, session, reader, fileConfig, documents, processed_urls
            )
        except Exception as e:
            msg.warn(f"Failed to process URL {url}: {str(e)}")

Individual failures do not halt the entire ingestion process, allowing partial success scenarios.

Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-50

Chunking Strategies

Related topics: Data Ingestion System, Embedding and Vector Storage

Chunking Strategies

Overview

Chunking strategies in Verba define how documents are split into smaller, semantically coherent units for embedding and retrieval. Each strategy implements the Chunker interface and provides different approaches to partitioning document content based on structural markers, token counts, sentence boundaries, or semantic similarity.

The chunking system is designed to:

  • Break large documents into manageable pieces for embedding
  • Preserve contextual continuity through overlapping chunks
  • Respect document structure (headers, code blocks, HTML tags)
  • Provide configuration options for fine-tuning chunk sizes and overlap
  • Skip documents that already contain chunks to avoid redundant processing

Sources: goldenverba/components/chunking/__init__.py

Architecture

Chunker Interface

All chunkers inherit from the Chunker base class and implement the chunk() async method. The interface accepts configuration parameters, a list of documents, and an optional embedder.

class Chunker:
    name: str
    requires_library: list[str]
    description: str

    async def chunk(
        self,
        config: dict,
        documents: list[Document],
        embedder: Embedding | None = None,
        embedder_config: dict | None = None,
    ) -> list[Document]:

Chunk Data Model

Each chunk produced by the chunkers contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| content | str | Full chunk text including overlap |
| chunk_id | int | Sequential identifier within the document |
| start_i | int \| None | Character-level start index |
| end_i | int \| None | Character-level end index |
| content_without_overlap | str | Chunk text without overlap region |

Sources: goldenverba/components/chunk.py

Available Chunking Strategies

1. Token Chunker

The Token Chunker splits documents based on token counts using spaCy's tokenizer. It ensures chunks align with natural token boundaries for efficient embedding.

Key Features:

  • Splits text by token count using spaCy NLP
  • Configurable tokens per chunk via Tokens configuration
  • Configurable overlap via Overlap configuration
  • Calculates precise character-level indices for chunk boundaries

Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| Tokens | int | Number of tokens per chunk |
| Overlap | int | Number of overlapping tokens between chunks |

Behavior:

  • If Tokens exceeds document length or is zero, the entire document becomes a single chunk
  • If overlap exceeds tokens, overlap is clamped to tokens - 1 with a warning
  • Chunk content includes overlap; content_without_overlap excludes the overlap region

Sources: goldenverba/components/chunking/TokenChunker.py
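
The behavior above can be sketched with whitespace tokens standing in for spaCy's tokenizer: overlap is clamped to `units - 1`, and a document shorter than one chunk is returned whole.

```python
def token_chunks(text: str, units: int, overlap: int) -> list[str]:
    """Split text into chunks of `units` tokens with `overlap` shared tokens."""
    tokens = text.split()  # stand-in for spaCy tokenization
    if units <= 0 or units >= len(tokens):
        return [text]  # whole document becomes a single chunk
    overlap = min(overlap, units - 1)  # clamp, mirroring the warning case
    chunks, i = [], 0
    while i < len(tokens):
        end = min(i + units + overlap, len(tokens))
        chunks.append(" ".join(tokens[i:end]))
        i += units
    return chunks

print(token_chunks("a b c d e f", units=2, overlap=1))  # ['a b c', 'c d e', 'e f']
```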

2. Sentence Chunker

The Sentence Chunker splits documents at sentence boundaries using spaCy's sentence segmentation. This preserves complete sentences within chunks.

Key Features:

  • Sentence-level splitting using spaCy's sents property
  • Configurable sentences per chunk
  • Configurable overlap at sentence level
  • Character-level index tracking for precise boundaries

Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| Sentences | int | Number of sentences per chunk |
| Overlap | int | Number of overlapping sentences between chunks |

Behavior:

  • Extracts all sentences from the document using spaCy
  • Joins sentences to form chunks while preserving overlap regions
  • Calculates character offsets accounting for spacing between sentences

Sources: goldenverba/components/chunking/SentenceChunker.py
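
A compact sketch of sentence-level chunking with overlap (sentence segmentation is pre-supplied here rather than coming from spaCy's `sents`):

```python
def sentence_chunks(sentences: list[str], per_chunk: int, overlap: int) -> list[str]:
    """Group sentences into chunks, sharing `overlap` sentences between neighbors."""
    chunks, i = [], 0
    while i < len(sentences):
        end = min(i + per_chunk + overlap, len(sentences))
        chunks.append(" ".join(sentences[i:end]))
        i += per_chunk
    return chunks

print(sentence_chunks(["S1.", "S2.", "S3.", "S4."], per_chunk=2, overlap=1))
# ['S1. S2. S3.', 'S3. S4.']
```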

3. Recursive Chunker

The Recursive Chunker uses LangChain's RecursiveCharacterTextSplitter to intelligently split text while attempting to preserve natural boundaries.

Key Features:

  • Multi-level character-based splitting
  • Preserves chunk boundaries at logical text breaks
  • Configurable chunk size and overlap
  • Uses the keep_separator parameter to retain separators in chunks

Sources: goldenverba/components/chunking/RecursiveChunker.py

4. Semantic Chunker

The Semantic Chunker groups content based on semantic similarity rather than fixed sizes or boundaries.

Key Features:

  • Clusters sentences by semantic meaning
  • Dynamically determines chunk boundaries based on content similarity
  • Optimal for maintaining topical coherence

Sources: goldenverba/components/chunking/SemanticChunker.py

5. Markdown Chunker

The Markdown Chunker splits documents based on markdown header hierarchy using LangChain's MarkdownHeaderTextSplitter.

Supported Headers:

| Header Level | Syntax |
| --- | --- |
| Header 1 | # |
| Header 2 | ## |
| Header 3 | ### |

Key Features:

  • Splits at markdown header boundaries
  • Preserves header context by prepending headers to each chunk
  • Uses get_header_values() helper to extract header text from LangChain document metadata
  • Maintains hierarchical context through header inclusion

Header Extraction Logic:

def get_header_values(split_doc: LangChainDocument) -> list[str]:
    header_keys = [header_key for _, header_key in HEADERS_TO_SPLIT_ON]
    return [
        header_value
        for header_key in header_keys
        if (header_value := split_doc.metadata.get(header_key)) is not None
    ]

Sources: goldenverba/components/chunking/MarkdownChunker.py

6. HTML Chunker

The HTML Chunker splits documents based on HTML tag structure using LangChain's HTMLHeaderTextSplitter.

Supported Tags:

| Tag | Description |
| --- | --- |
| h1 | Header 1 |
| h2 | Header 2 |
| h3 | Header 3 |
| h4 | Header 4 |

Key Features:

  • Splits at HTML header boundaries
  • Preserves header content within each chunk
  • Appends header text before page content
  • Requires langchain_text_splitters library

Chunk Text Construction:

chunk_text = ""
if len(chunk.metadata) > 0:
    chunk_text += list(chunk.metadata.values())[0] + "\n"
chunk_text += chunk.page_content

Sources: goldenverba/components/chunking/HTMLChunker.py

7. Code Chunker

The Code Chunker is optimized for source code files, splitting based on code-specific structures and syntax.

Key Features:

  • Language-aware splitting for code files
  • Preserves code structure and syntax context
  • Handles various programming language conventions

Sources: goldenverba/components/chunking/CodeChunker.py

8. JSON Chunker

The JSON Chunker splits JSON documents at logical structural boundaries.

Key Features:

  • Splits at JSON object/array boundaries
  • Preserves nested structure context
  • Handles JSON-specific formatting

Sources: goldenverba/components/chunking/JSONChunker.py

Chunking Workflow

graph TD
    A[Document Ingestion] --> B{Already Chunked?}
    B -->|Yes| C[Skip Chunking]
    B -->|No| D[Select Chunker Strategy]
    D --> E{Chunking Strategy}
    E -->|Token| F[TokenChunker]
    E -->|Sentence| G[SentenceChunker]
    E -->|Recursive| H[RecursiveChunker]
    E -->|Semantic| I[SemanticChunker]
    E -->|Markdown| J[MarkdownChunker]
    E -->|HTML| K[HTMLChunker]
    E -->|Code| L[CodeChunker]
    E -->|JSON| M[JSONChunker]
    F --> N[Split Document]
    G --> N
    H --> N
    I --> N
    J --> N
    K --> N
    L --> N
    M --> N
    N --> O[Create Chunk Objects]
    O --> P[Append to document.chunks]
    P --> Q[Return Modified Documents]

Chunking Configuration

Each chunker accepts a configuration dictionary with strategy-specific parameters:

| Chunker | Primary Config | Overlap Config | Library Dependency |
| --- | --- | --- | --- |
| Token | Tokens | Overlap | spacy |
| Sentence | Sentences | Overlap | spacy |
| Recursive | Chunk Size | Overlap | langchain_text_splitters |
| Semantic | N/A | N/A | langchain_text_splitters |
| Markdown | N/A | N/A | langchain_text_splitters |
| HTML | N/A | N/A | langchain_text_splitters |
| Code | N/A | N/A | langchain_text_splitters |
| JSON | N/A | N/A | langchain_text_splitters |

Overlap Strategy

Overlap enables context preservation between adjacent chunks, improving retrieval accuracy for queries that span chunk boundaries.

Overlap Calculation in TokenChunker:

while i < len(doc):
    start_i = i
    end_i = min(i + units + overlap, len(doc))
    if end_i == len(doc):
        overlap_start = end_i
    else:
        overlap_start = min(i + units, end_i)

    chunk_text = doc[start_i:end_i].text
    chunk_text_without_overlap = doc[start_i:overlap_start].text
    # ... chunk object creation; i then advances by `units` tokens

Overlap Calculation in SentenceChunker:

overlap_start = max(0, end_i - overlap)
chunk_text = " ".join(sentences[start_i:end_i])
chunk_text_without_overlap = " ".join(sentences[start_i:overlap_start])

Sources: goldenverba/components/chunking/TokenChunker.py, goldenverba/components/chunking/SentenceChunker.py

Skip Mechanism

All chunkers implement a document skipping mechanism to prevent redundant processing:

# Skip if document already contains chunks
if len(document.chunks) > 0:
    continue

This ensures idempotent chunking operations and allows manual chunk management.
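
The idempotency this buys can be demonstrated with a minimal stand-in `Document` (the class and splitting rule here are illustrative): a second chunking pass over the same document is a no-op.

```python
class Document:
    def __init__(self, content: str):
        self.content = content
        self.chunks: list[str] = []

def chunk_document(doc: Document) -> Document:
    if len(doc.chunks) > 0:  # skip if document already contains chunks
        return doc
    doc.chunks = doc.content.split("\n\n")  # toy paragraph splitting
    return doc

doc = Document("para one\n\npara two")
chunk_document(doc)
first_pass = list(doc.chunks)
chunk_document(doc)  # second pass leaves the chunks untouched
```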

Best Practices

Choosing a Chunker

| Document Type | Recommended Chunker | Reason |
| --- | --- | --- |
| Plain text | Token, Sentence, Recursive | General-purpose splitting |
| Markdown files | Markdown | Respects header hierarchy |
| HTML documents | HTML | Preserves HTML structure |
| Source code | Code | Language-aware boundaries |
| JSON data | JSON | Structural preservation |
| Long-form content | Semantic | Topic coherence |
| Conversational data | Sentence | Natural language boundaries |

Configuration Guidelines

  1. Token/Sentence counts: Start with 256-512 tokens or 3-5 sentences per chunk
  2. Overlap: Use 10-20% overlap for most use cases
  3. Boundary alignment: For token-based chunking, prefer natural token boundaries over character cuts
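
The overlap guideline above works out to the following token counts for the suggested chunk sizes:

```python
# Worked example of the 10-20% overlap guideline.
def overlap_tokens(chunk_size: int, fraction: float) -> int:
    return round(chunk_size * fraction)

print(overlap_tokens(256, 0.10), overlap_tokens(256, 0.20))  # 26 51
print(overlap_tokens(512, 0.10), overlap_tokens(512, 0.20))  # 51 102
```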

Dependencies

| Library | Used By | Purpose |
| --- | --- | --- |
| spacy | TokenChunker, SentenceChunker, RecursiveChunker | NLP processing, tokenization, sentence segmentation |
| langchain_text_splitters | MarkdownChunker, HTMLChunker, CodeChunker, JSONChunker, RecursiveChunker, SemanticChunker | Text splitting algorithms |

Sources: goldenverba/components/chunking/TokenChunker.py, goldenverba/components/chunking/MarkdownChunker.py, goldenverba/components/chunking/HTMLChunker.py


Embedding and Vector Storage

Related topics: Chunking Strategies, RAG Retrieval System

Embedding and Vector Storage

Verba implements a modular RAG (Retrieval-Augmented Generation) pipeline where embeddings play a critical role in transforming textual content into vector representations for semantic search and retrieval operations.

Overview

Embedding and vector storage in Verba enables the transformation of documents and their chunks into high-dimensional vector representations. These vectors power semantic search capabilities, allowing users to retrieve relevant content based on meaning rather than exact keyword matches. The system supports multiple embedding providers and integrates with Weaviate as the primary vector database.
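
The idea of retrieval by meaning can be illustrated with a toy cosine-similarity ranking (Weaviate performs this at scale over real embeddings; the 3-dimensional vectors here are hand-made stand-ins):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

store = {
    "chunk_about_pets": [0.9, 0.1, 0.0],
    "chunk_about_finance": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # a query vector "closer in meaning" to the pets chunk
best = max(store, key=lambda name: cosine(query, store[name]))  # "chunk_about_pets"
```

No keyword in the query needs to match the stored text; similarity is computed purely between vectors.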

Architecture

graph TD
    A[Document] --> B[Reader Component]
    B --> C[Text Content]
    C --> D[Chunker Component]
    D --> E[Document Chunks]
    E --> F[Embedder Component]
    F --> G[Vector Embeddings]
    G --> H[Weaviate Vector Store]
    
    I[User Query] --> J[Embedder Component]
    J --> K[Query Vector]
    K --> H
    H --> L[Similarity Search]
    L --> M[Retrieved Chunks]

Embedder Components

Verba supports multiple embedder implementations, allowing users to choose the provider that best fits their requirements.

Available Embedders

| Embedder | Provider | Status |
| --- | --- | --- |
| OpenAI Embedder | OpenAI | Default |
| Cohere Embedder | Cohere | Supported |
| Ollama Embedder | Ollama | Supported |
| SentenceTransformers Embedder | HuggingFace | Supported |
| VoyageAI Embedder | VoyageAI | Supported |
| Upstage Embedder | Upstage | Supported |
| Google Embedder | Google | Supported |
| Weaviate Embedder | Weaviate | Built-in |

Sources: CHANGELOG.md

Embedder Selection

The embedder can be configured through the RAG configuration interface:

<ComponentView
  RAGConfig={RAGConfig}
  component_name="Embedder"
  selectComponent={selectComponent}
  updateConfig={updateConfig}
  saveComponentConfig={saveComponentConfig}
  blocked={production == "Demo"}
/>

Sources: frontend/app/components/Chat/ChatConfig.tsx

Configuration

Environment Variables

| Variable | Description | Required |
| --- | --- | --- |
| OPENAI_API_KEY | API key for OpenAI embeddings | For OpenAI |
| OPENAI_EMBED_API_KEY | Separate API key for embeddings | Optional |
| OPENAI_EMBED_BASE_URL | Custom endpoint for embeddings | Optional |
| OPENAI_CUSTOM_EMBED | Flag for custom embedding models | Optional |
| COHERE_API_KEY | API key for Cohere | For Cohere |
| VOYAGE_API_KEY | API key for VoyageAI | For VoyageAI |
| UPSTAGE_API_KEY | API key for Upstage | For Upstage |
| GOOGLE_API_KEY | API key for Google | For Google |

Sources: README.md

Custom OpenAI Embedding Configuration

For users deploying custom OpenAI-compatible embedding servers:

OPENAI_EMBED_API_KEY=YOUR_API_KEY
OPENAI_EMBED_BASE_URL=YOUR_CUSTOM_URL
OPENAI_CUSTOM_EMBED=true

Sources: README.md

Document Processing Pipeline

The embedding process is part of a larger document ingestion pipeline:

graph LR
    A[File Upload] --> B[Reader]
    B --> C[Text Extraction]
    C --> D[Chunking]
    D --> E[Chunk Processing]
    E --> F[Embedding Generation]
    F --> G[Vector Storage]
    G --> H[Indexing]

Step 1: Document Reading

Documents are first processed by a Reader component that extracts text content based on file type. Supported formats include PDF, DOCX, TXT, CSV, XLSX, XLS, and more.

Step 2: Chunking

Extracted text is split into smaller chunks using configurable chunking strategies. The chunker can be selected and configured per document.

Step 3: Embedding Generation

Each chunk is passed through the selected embedder to generate a vector representation. The embedder transforms textual content into numerical vectors in a high-dimensional space.

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx

RAG Configuration

Verba allows per-document RAG configuration including the embedder selection:

interface RAGConfig {
  "Reader": {
    selected: string;
    components: Record<string, ComponentConfig>;
  };
  "Chunker": {
    selected: string;
    components: Record<string, ComponentConfig>;
  };
  "Embedder": {
    selected: string;
    components: Record<string, ComponentConfig>;
  };
  "Retriever": {
    selected: string;
    components: Record<string, ComponentConfig>;
  };
  "Generator": {
    selected: string;
    components: Record<string, ComponentConfig>;
  };
}

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx

Vector Storage in Weaviate

Verba uses Weaviate as its vector database. The embeddings are stored alongside metadata for efficient similarity search operations.

Collection Management

Collections in Weaviate store documents with their associated vectors:

{
    "name": "collection_name",
    "count": number_of_objects,
    "status": "vector_status",
    "shards": number_of_shards
}

Sources: frontend/app/components/Settings/InfoView.tsx

Document Structure

Documents stored with embeddings contain:

| Field | Type | Description |
| --- | --- | --- |
| title | string | Document title |
| content | string | Original text content |
| extension | string | File extension/type |
| labels | list | User-defined labels |
| source | string | Source URL or reference |
| fileSize | int | Size in bytes |
| metadata | dict | Additional metadata |
| vectors | array | Embedding vectors |

Sources: goldenverba/components/document.py

Dependency Management

Core embedding dependencies are specified in setup.py:

install_requires=[
    "weaviate-client==4.9.6",
    "openpyxl==3.1.5",
    "fastapi==0.111.1",
    # ... additional dependencies
]

Sources: setup.py

Deployment Considerations

Local Deployment

When running Verba locally, users can select embedders that don't require API keys (such as SentenceTransformers) or configure API-based embedders with appropriate keys.

Production Deployment

In production environments:

  • Embedder selection may be restricted based on deployment type
  • API keys should be configured via environment variables
  • Vector storage is managed through the configured Weaviate instance

Summary

Verba's embedding and vector storage system provides a flexible, provider-agnostic approach to semantic search. By supporting multiple embedding providers and integrating with Weaviate for vector storage, the system enables effective retrieval-augmented generation across various use cases. Configuration can be done per-document or at the system level, giving users fine-grained control over the RAG pipeline.

Sources: [CHANGELOG.md](https://github.com/weaviate/Verba/blob/main/CHANGELOG.md)

RAG Retrieval System

Related topics: Embedding and Vector Storage, LLM Generators and Answer Generation

RAG Retrieval System

Overview

The RAG (Retrieval-Augmented Generation) Retrieval System is the core query mechanism in Verba that enables semantic search across document collections. It retrieves relevant text chunks from vectorized documents stored in Weaviate and delivers them to the Generator component for answer synthesis.

In Verba's architecture, the Retrieval System operates as a pipeline component that:

  1. Receives user queries from the frontend chat interface
  2. Embeds the query using the configured Embedder
  3. Performs vector similarity search in Weaviate
  4. Returns ranked chunks to the chat interface for display and generation

Sources: frontend/app/components/Chat/ChatConfig.tsx


LLM Generators and Answer Generation

Related topics: RAG Retrieval System

LLM Generators and Answer Generation

Overview

Verba implements a modular LLM Generator system that provides answer generation capabilities for the RAG (Retrieval-Augmented Generation) pipeline. The system supports multiple LLM providers through a common interface abstraction, allowing users to choose their preferred generation backend while maintaining consistent behavior across the application.

The generation module is part of Verba's component architecture, where each generator implements a common interface that defines methods for retrieving supported models, generating responses, and managing authentication credentials.

Architecture

Generator Interface

All LLM generators in Verba inherit from a common base class that defines the contract for generation operations. This design enables:

  • Provider Agnosticism: Switch between LLM providers without changing application code
  • Consistent API: All generators expose the same methods for model listing and text generation
  • Credential Management: Unified handling of API keys and authentication
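
Such a contract can be sketched with an abstract base class (the class and method names here are illustrative, not Verba's actual base class): every provider exposes model listing and generation behind the same interface, so callers never depend on a specific backend.

```python
from abc import ABC, abstractmethod

class Generator(ABC):
    name: str

    @abstractmethod
    def available_models(self, credentials: dict) -> list[str]: ...

    @abstractmethod
    def generate(self, prompt: str, model: str, credentials: dict) -> str: ...

class EchoGenerator(Generator):
    """Stand-in provider used to exercise the interface."""
    name = "Echo"

    def available_models(self, credentials: dict) -> list[str]:
        return ["echo-1"]

    def generate(self, prompt: str, model: str, credentials: dict) -> str:
        return f"[{model}] {prompt}"

gen: Generator = EchoGenerator()
answer = gen.generate("Hello", "echo-1", {})  # "[echo-1] Hello"
```

Swapping providers then means instantiating a different subclass; the calling code is unchanged.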

Supported Providers

Verba supports the following LLM providers for answer generation:

| Provider | Class Name | API Type | Cloud/Local |
| --- | --- | --- | --- |
| OpenAI | OpenAIGenerator | REST API | Cloud |
| Anthropic | AnthrophicGenerator | REST API | Cloud |
| Cohere | CohereGenerator | REST API | Cloud |
| Groq | GroqGenerator | REST API | Cloud |
| Ollama | OllamaGenerator | Local API | Local |
| Gemini | GeminiGenerator | REST API | Cloud |
| Novita AI | NovitaGenerator | REST API | Cloud |
| Upstage | UpstageGenerator | REST API | Cloud |
| Atlas Cloud | AtlasCloudGenerator | REST API | Cloud |

Generation Workflow

graph TD
    A[User Query] --> B[RAG Pipeline]
    B --> C[Retrieval: Find Relevant Chunks]
    C --> D[Context Assembly]
    D --> E[LLM Generator]
    E --> F[Prompt Construction]
    F --> G[API Call to LLM Provider]
    G --> H[Response Parsing]
    H --> I[Formatted Answer]
    
    J[Credentials] --> E
    K[System Prompt] --> F
    L[Model Selection] --> G

Configuration

Environment Variables

Generation behavior can be configured through environment variables:

| Variable | Description | Required |
| --- | --- | --- |
| OLLAMA_MODEL | Default Ollama model for local generation | For Ollama setup |
| OLLAMA_EMBED_MODEL | Default Ollama embedding model | For Ollama setup |
| OPENAI_API_KEY | API key for OpenAI provider | For OpenAI setup |
| ANTHROPIC_API_KEY | API key for Anthropic/Claude | For Anthropic setup |

Runtime Configuration

The frontend allows dynamic model selection through the ChatConfig component. When users select a deployment type, Verba fetches available models from the configured provider and presents them in the UI.

From frontend/app/components/Chat/ChatInterface.tsx:

<ChatConfig
  addStatusMessage={addStatusMessage}
  production={production}
  RAGConfig={RAGConfig}
  credentials={credentials}
  setRAGConfig={setRAGConfig}
  onReset={onResetConfig}
  onSave={onSaveConfig}
/>

Component Interaction

Chat Pipeline

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant Generator
    participant LLM_Provider
    
    User->>Frontend: Submit Query
    Frontend->>Backend: /api/chat with query
    Backend->>Generator: Generate response
    Generator->>LLM_Provider: API request
    LLM_Provider-->>Generator: LLM Response
    Generator-->>Backend: Formatted answer
    Backend-->>Frontend: Stream response
    Frontend->>User: Display answer

Message Handling

The chat interface manages different fetching states during generation:

{fetchingStatus === "CHUNKS" && "Retrieving..."}
{fetchingStatus === "RESPONSE" && "Generating..."}

Users can cancel ongoing generation through the UI, which sets the fetching status to DONE and stops further API calls.

Document-to-Answer Flow

When processing documents for RAG:

  1. Ingestion: Documents are parsed and chunked via Reader components
  2. Storage: Chunks are embedded and stored in Weaviate
  3. Retrieval: Relevant chunks are fetched based on query similarity
  4. Generation: Selected chunks are sent to the LLM generator with the user's query

From goldenverba/components/document.py:

def create_document(content: str, fileConfig: FileConfig) -> Document:
    """Create a Document object from the file content."""
    return Document(
        title=fileConfig.filename,
        content=content,
        extension=fileConfig.extension,
        labels=fileConfig.labels,
        source=fileConfig.source,
        fileSize=fileConfig.file_size,
        metadata=fileConfig.metadata,
        meta={},
    )

Deployment Types

Verba supports different deployment configurations that affect generation behavior:

| Deployment | Description | Generator Usage |
| --- | --- | --- |
| Weaviate Cloud | Full cloud deployment | Cloud-based LLM APIs |
| Docker | Containerized deployment | Configurable via environment |
| Local | Development setup | Often uses Ollama |
| Custom | User-defined infrastructure | Flexible configuration |

The deployment type is selected during the initial setup through the LoginView component:

From frontend/app/components/Login/LoginView.tsx:

{production == "Local" && (
    <div className="flex flex-col justify-start gap-2 w-full">
        <VerbaButton
            Icon={FaDatabase}
            title="Weaviate"
            onClick={() => setSelectedDeployment("Weaviate")}
        />
        {/* Docker, Custom, Local options */}
    </div>
)}

Model Selection

Dynamic Model Retrieval

Verba supports dynamic model name retrieval for OpenAI-compatible APIs based on the provided API key and URL. This allows the system to automatically discover and list available models from the configured provider.

From CHANGELOG.md:

Dynamic model name retrieval for OpenAI Generator based on OpenAI URL and API Key

Model Fallback

When automatic model detection is unavailable, Verba uses default models:

  • OpenAI: gpt-4o-mini
  • Anthropic: claude-3-haiku-20240307
  • Ollama: Configurable via OLLAMA_MODEL
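
The fallback rule can be sketched as an environment lookup with a provider-default table (the defaults are the ones listed above; the `"llama3"` Ollama fallback is a hypothetical placeholder):

```python
import os

DEFAULTS = {
    "OpenAI": "gpt-4o-mini",
    "Anthropic": "claude-3-haiku-20240307",
}

def pick_model(provider: str, env=os.environ) -> str:
    """Prefer an environment-configured model, else the provider default."""
    if provider == "Ollama":
        return env.get("OLLAMA_MODEL", "llama3")  # hypothetical default
    return DEFAULTS[provider]
```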

Status and Error Handling

The application uses a status messenger system to communicate generation states to users:

{messages.filter((message) => {
    const messageTime = new Date(message.timestamp).getTime();
    const currentTime = new Date().getTime();
    return currentTime - messageTime < 5000; // 5 seconds
}).map((message, index) => (
    <motion.div key={index}>
        {/* Status message display */}
    </motion.div>
))}

Messages are categorized by type and automatically expire after 5 seconds.
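
The same 5-second expiry filter shown in the TSX above reads naturally in Python: keep only messages whose timestamp falls inside the window.

```python
import time

def fresh_messages(messages, now, window=5.0):
    """Return messages newer than `window` seconds, relative to `now`."""
    return [m for m in messages if now - m["timestamp"] < window]

now = time.time()
msgs = [
    {"text": "Import started", "timestamp": now - 2.0},
    {"text": "Old notice", "timestamp": now - 10.0},
]
visible = fresh_messages(msgs, now)  # only "Import started" survives the filter
```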

Reset Capabilities

The Settings panel (InfoView) provides reset functionality that affects generation:

| Action | Scope | Effect on Generation |
| --- | --- | --- |
| Reset Documents | Data | Clears chunks, requires re-retrieval |
| Reset Config | Configuration | Resets model selection and prompts |
| Reset Verba | All Data | Full system reset including models |
| Reset Suggestions | Autocomplete | Clears cached suggestions |

From frontend/app/components/Settings/InfoView.tsx:

<UserModalComponent
    modal_id="reset-documents"
    title="Reset Documents"
    text="Are you sure you want to reset all documents?"
    triggerAccept={resetDocuments}
    triggerString="Reset"
/>

Dependencies

The generation module relies on the following core dependencies:

From setup.py:

install_requires=[
    "weaviate-client==4.9.6",
    "fastapi==0.111.1",
    "uvicorn[standard]==0.29.0",
    "click==8.1.7",
    # Provider-specific SDKs loaded dynamically
]

Version History

| Version | Changes |
| --- | --- |
| 2.1.3 | Added OLLAMA_MODEL and OLLAMA_EMBED_MODEL environment variables |
| 2.1.2 | Added Novita Generator |
| 2.1.1 | Dynamic model retrieval for OpenAI Generator |
| 2.1.0 | Added Upstage Generator, Groq, improved configuration |

Best Practices

  1. API Key Security: Store API keys in .env files rather than hardcoding
  2. Model Selection: Choose models based on task requirements (speed vs. quality)
  3. Chunk Configuration: Adjust chunk sizes to match model's context window
  4. Rate Limits: Be aware of provider-specific rate limits for high-volume usage

Source: https://github.com/weaviate/Verba / Human Manual

Embedder Configuration

Related topics: Embedding and Vector Storage

Embedder Configuration

Overview

The Embedder is a core component in Verba's RAG (Retrieval-Augmented Generation) pipeline responsible for converting text content into vector embeddings. These embeddings enable semantic search capabilities by representing documents and queries as numerical vectors in a high-dimensional space.

Verba supports multiple embedder implementations including OpenAI, Ollama, Upstage, and other providers. The embedder configuration system allows users to select their preferred embedding provider, configure provider-specific settings, and integrate with the broader RAG pipeline.

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35

Architecture

RAG Pipeline Integration

The Embedder operates as part of a modular RAG configuration system alongside the Retriever and Generator components.

graph TD
    A[Document Input] --> B[Reader]
    B --> C[Chunker]
    C --> D[Embedder]
    D --> E[Weaviate Vector Store]
    F[User Query] --> G[Embedder]
    G --> H[Retriever]
    H --> I[Generator]
    I --> J[Response]
    E -.-> H

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35

Component Selection Flow

Users interact with the embedder configuration through a component selection interface:

sequenceDiagram
    participant User
    participant ComponentView
    participant RAGConfig
    participant Embedder
    
    User->>ComponentView: Select Embedder Component
    ComponentView->>RAGConfig: updateConfig(Embedder, selection)
    RAGConfig->>Embedder: Apply Configuration
    Embedder-->>RAGConfig: Confirm Settings
    RAGConfig-->>ComponentView: Update UI
    ComponentView-->>User: Display Updated State

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-100

Configuration UI

Embedder Selection Panel

The embedder can be configured through the ingestion interface where users select and customize embedding models for their documents.

| Field | Type | Description |
|---|---|---|
| selected | string | Currently selected embedder name |
| components | object | Available embedder implementations |
| description | string | Human-readable description of selected embedder |

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-95

Display and Configuration

The UI displays the currently selected embedder with its configuration:

<div className="flex gap-2 justify-between items-center text-text-verba">
  <p className="flex min-w-[8vw]">Embedder</p>
  <label className="input flex items-center gap-2 w-full bg-bg-verba">
    <input
      type="text"
      className="grow w-full"
      value={fileMap[selectedFileData].rag_config["Embedder"].selected}
      disabled={true}
    />
  </label>
</div>

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-95

Dynamic Description

Each embedder component provides a description that is displayed in the UI:

<div className="flex gap-2 items-center text-text-verba">
  <p className="flex min-w-[8vw]"></p>
  <p className="text-sm text-text-alt-verba text-start">
    {selectedFileData &&
      fileMap[selectedFileData].rag_config["Embedder"].components[
        fileMap[selectedFileData].rag_config["Embedder"].selected
      ].description}
  </p>
</div>

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:97-106

RAG Configuration System

Configuration Structure

The embedder is configured within the RAGConfig object which manages all pipeline components:

interface RAGConfig {
  Embedder: {
    selected: string;
    components: Record<string, EmbedderComponent>;
  };
  Generator: {
    selected: string;
    components: Record<string, GeneratorComponent>;
  };
  Retriever: {
    selected: string;
    components: Record<string, RetrieverComponent>;
  };
}

Sources: frontend/app/components/Chat/ChatConfig.tsx:1-50
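As a sketch of how a selection update could be applied to this structure immutably, consider the following helper. The name withSelectedComponent and the simplified types are illustrative, not part of the Verba codebase:

```typescript
// Minimal shape of one pipeline slot in the RAGConfig object.
interface ComponentEntry {
  description: string;
}
interface ComponentSlot {
  selected: string;
  components: Record<string, ComponentEntry>;
}
type RAGConfigSketch = Record<string, ComponentSlot>;

// Illustrative helper: return a new config with a different component
// selected for the given slot (e.g. "Embedder"), leaving the input intact.
function withSelectedComponent(
  config: RAGConfigSketch,
  slot: string,
  name: string
): RAGConfigSketch {
  const entry = config[slot];
  if (!entry || !(name in entry.components)) {
    throw new Error(`Unknown component "${name}" for slot "${slot}"`);
  }
  return { ...config, [slot]: { ...entry, selected: name } };
}
```

Returning a fresh object rather than mutating in place matches how React state updates are normally propagated through props.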

Component View Integration

The ComponentView component renders the embedder selection UI:

<ComponentView
  RAGConfig={RAGConfig}
  component_name="Embedder"
  selectComponent={selectComponent}
  updateConfig={updateConfig}
  saveComponentConfig={saveComponentConfig}
  blocked={production == "Demo"}
/>

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35

Supported Embedder Providers

Provider Matrix

| Provider | Environment Variables | Version Added | Features |
|---|---|---|---|
| OpenAI | OPENAI_API_KEY, OPENAI_URL | Base | Dynamic model name retrieval |
| Ollama | OLLAMA_EMBED_MODEL | 2.1.3 | Local embedding models |
| Upstage | (Upstage-specific) | 2.1.0 | High-performance embeddings |

Sources: CHANGELOG.md:1-30, setup.py:1-50

Environment Variables

Verba supports environment variables for embedder configuration:

| Variable | Description | Example |
|---|---|---|
| OLLAMA_EMBED_MODEL | Ollama embedding model name | nomic-embed-text |
| OPENAI_API_KEY | OpenAI API key | sk-... |
| OPENAI_URL | OpenAI API endpoint | https://api.openai.com/v1 |

Sources: CHANGELOG.md:1-15
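A minimal .env fragment using these variables might look like the following; all values are placeholders to be replaced with your own:

```shell
# Embedder configuration (placeholder values)
OPENAI_API_KEY=sk-...
OPENAI_URL=https://api.openai.com/v1
OLLAMA_EMBED_MODEL=nomic-embed-text
```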

Deployment Modes

Demo Mode Restrictions

In Demo mode, the embedder configuration is locked to prevent changes:

blocked={production == "Demo"}

This ensures that demo deployments maintain consistent behavior and don't allow users to modify the embedding pipeline.

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
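The same condition can be expressed as a small predicate; this is illustrative only, since Verba inlines the comparison directly in the blocked prop as shown above:

```typescript
// Configuration editing is disabled only in Demo deployments.
function isConfigBlocked(production: string): boolean {
  return production === "Demo";
}
```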

Configuration Persistence

The embedder configuration is saved through the saveComponentConfig callback passed to ComponentView, shown in the snippet above.

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35

Debugging Embedder Configuration

Debug Modal

Verba provides a debug view for inspecting file configuration including embedder settings:

<VerbaButton
  Icon={CgDebug}
  onClick={openDebugModal}
  className="max-w-min"
/>

<dialog id={"File_Debug_Modal"} className="modal">
  <pre className="whitespace-pre-wrap text-xs">
    {selectedFileData
      ? (() => {
          const objCopy = { ...fileMap[selectedFileData] };
          objCopy.content = "File Content";
          return JSON.stringify(objCopy, null, 2);
        })()
      : ""}
  </pre>
</dialog>

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:110-130
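The masking step in the modal above can be sketched as a standalone helper; the name debugDump and the FileEntry type are hypothetical, introduced only for this example:

```typescript
// Hypothetical shape of one file entry in fileMap.
interface FileEntry {
  content: string;
  [key: string]: unknown;
}

// Illustrative version of the modal's logic: copy the entry, mask the
// (potentially large) raw content, and pretty-print the rest as JSON.
function debugDump(entry: FileEntry): string {
  const copy = { ...entry, content: "File Content" };
  return JSON.stringify(copy, null, 2);
}
```

Masking the content field keeps the debug view readable while still exposing the full rag_config and metadata.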

File-Level Configuration

Per-File Embedder Override

Each file can have its own embedder configuration that overrides the global RAG config:

interface FileConfig {
  filename: string;
  extension: string;
  labels: string[];
  source: string;
  file_size: number;
  metadata: Record<string, any>;
  rag_config: RAGConfig;
}

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-100

Accessing File Embedder Config

fileMap[selectedFileData].rag_config["Embedder"].selected
fileMap[selectedFileData].rag_config["Embedder"].components

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-106
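A sketch of a per-file accessor built on these paths; the helper name selectedEmbedder and the simplified types are illustrative, not from the codebase:

```typescript
// Hypothetical, simplified types for one file's RAG configuration.
interface EmbedderEntrySketch {
  description: string;
}
interface FileWithRagConfig {
  rag_config: {
    Embedder: {
      selected: string;
      components: Record<string, EmbedderEntrySketch>;
    };
  };
}

// Illustrative accessor: name and description of the embedder
// configured for a single file.
function selectedEmbedder(file: FileWithRagConfig) {
  const slot = file.rag_config.Embedder;
  return {
    name: slot.selected,
    description: slot.components[slot.selected]?.description ?? "",
  };
}
```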

Version History

| Version | Changes |
|---|---|
| 2.1.3 | Added OLLAMA_MODEL and OLLAMA_EMBED_MODEL environment variables |
| 2.1.1 | Dynamic model name retrieval for OpenAI |
| 2.1.0 | Added Upstage Embedder support |

Sources: CHANGELOG.md:1-30

Dependencies

Verba requires the Weaviate client for vector storage operations:

install_requires=[
    "weaviate-client==4.9.6",
    "python-dotenv==1.0.0",
    "openpyxl==3.1.5",
    "wasabi==1.1.2",
    "fastapi==0.111.1",
    "uvicorn[standard]==0.29.0",
    "gunicorn==22.0.0",
    "click==8.1.7",
]

Sources: setup.py:15-25

Best Practices

  1. Use Environment Variables: Store API keys and model names in .env files rather than hardcoding
  2. Test in Non-Demo Mode: Full configuration features require non-Demo deployment
  3. Leverage Debug Mode: Use the debug modal to inspect configuration state
  4. Check Provider Support: Ensure your chosen embedder provider is compatible with your deployment mode

Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35

Frontend Component Overview


Introduction

The Verba frontend is a React-based web application that provides a user-friendly interface for Retrieval-Augmented Generation (RAG) operations. Built with Next.js and TypeScript, the frontend enables users to interact with their data through chat interfaces, document management, and configuration settings.

Sources: frontend/app/page.tsx:1-50

Architecture Overview

The frontend follows a modular component-based architecture organized by functionality. The application uses WebSocket connections for real-time communication with the backend server, enabling live status updates and streaming responses.

graph TD
    A[App Entry Point] --> B[LoginView]
    A --> C[Main Application]
    C --> D[NavbarComponent]
    C --> E[ChatInterface]
    C --> F[Document Components]
    C --> G[Ingestion Components]
    C --> H[Settings Components]
    D --> I[StatusMessenger]
    E --> J[ChatMessage]
    E --> K[ChatConfig]

Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-100

Core Components

Navigation Components

#### NavbarComponent

The navigation bar serves as the primary routing mechanism within the application. It displays different pages based on the user's current selection and deployment mode.

| Prop | Type | Description |
|---|---|---|
| currentPage | string | Current active page identifier |
| setCurrentPage | function | Callback to change active page |
| production | string | Deployment type (Local, Demo, Docker, etc.) |
| gitHubStars | string | GitHub star count display |

The NavbarComponent conditionally renders menu items based on the production environment. In non-Demo modes, additional options like "Import Data" and "Settings" become available.

Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-150

#### StatusMessengerComponent

The StatusMessengerComponent provides toast-style notifications for application events. Messages are filtered to display only those within the last 5 seconds, providing transient user feedback.

graph LR
    A[Message Event] --> B[Timestamp Check]
    B --> C{Within 5s?}
    C -->|Yes| D[Animate In]
    C -->|No| E[Filter Out]
    D --> F[Display Message]
    F --> G[Animate Out]

Messages are color-coded by type using a getMessageColor() function and include icons for visual identification.

Sources: frontend/app/components/Navigation/StatusMessenger.tsx:1-80
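The time-window filter described above can be sketched as follows; the types and helper name are assumptions for illustration, not the actual component code:

```typescript
// Hypothetical shape of one status message.
interface StatusMessage {
  type: string;
  message: string;
  timestamp: number; // milliseconds since epoch
}

// Illustrative filter matching the behaviour described above:
// keep only messages emitted within the last 5 seconds.
function recentMessages(
  messages: StatusMessage[],
  now: number,
  windowMs = 5000
): StatusMessage[] {
  return messages.filter((m) => now - m.timestamp <= windowMs);
}
```

Passing now as a parameter instead of calling Date.now() inside keeps the filter deterministic and easy to test.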

Chat Components

#### ChatInterface

The ChatInterface is the central component for RAG interactions. It manages chat messages, user input, and the configuration panel.

| State Variable | Type | Purpose |
|---|---|---|
| messages | Message[] | Array of chat messages |
| previewText | string | Streaming response preview |
| isFetching | RefObject | Fetching status indicator |
| selectedSetting | string | Active sub-panel (Chat/Config) |
| fetchingStatus | string | Current operation status |

The component supports two sub-views: the chat view for conversation and a configuration view for RAG settings. A cancel button allows users to interrupt ongoing operations.

Sources: frontend/app/components/Chat/ChatInterface.tsx:1-150

#### ChatMessage

The ChatMessage component renders individual messages with support for multiple message types and syntax highlighting for code blocks.

| Message Type | Styling | Features |
|---|---|---|
| user | Right-aligned, primary background | Plain text display |
| system | Left-aligned, alternate background | Markdown + syntax highlighting |
| error | Warning background color | Error notifications |
| retrieval | Standard background | Retrieval results |
For system messages containing code, the component uses react-syntax-highlighter with theme support for both light and dark modes.

<SyntaxHighlighter
  style={selectedTheme.theme === "dark" ? oneDark : oneLight}
  language={match[1]}
  PreTag="div"
>
  {String(children).replace(/\n$/, "")}
</SyntaxHighlighter>

Sources: frontend/app/components/Chat/ChatMessage.tsx:1-100

Document Components

#### ContentView

The ContentView component displays document content with pagination support for both chunks and pages.

| Feature | Description |
|---|---|
| Chunk Navigation | Previous/Next chunk buttons |
| Page Navigation | Page-based content display |
| Scroll Handling | Overflow auto-scroll for long content |
| Label Display | Truncated label badges with max-width constraints |

The component conditionally renders navigation text based on whether chunk scores are available, switching between "Chunk" and "Page" labels.

Sources: frontend/app/components/Document/ContentView.tsx:1-100
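That label switch can be sketched as a pure function; this is illustrative, as the actual component inlines the condition in JSX:

```typescript
// Illustrative version of the label toggle described above:
// show "Chunk" navigation when chunk scores exist, otherwise "Page".
function navigationLabel(chunkScores: number[] | undefined): "Chunk" | "Page" {
  return chunkScores && chunkScores.length > 0 ? "Chunk" : "Page";
}
```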

Ingestion Components

#### BasicSettingView

The BasicSettingView provides configuration options for document ingestion, including source links and labels.

| Field | Purpose | Constraints |
|---|---|---|
| source | Reference link to original document | Optional field |
| label | Document categorization labels | Enter key adds label |
| chunker | Text chunking strategy | Read-only display |
| embedder | Embedding model selection | Read-only display |

Labels are added via keyboard (Enter key) or button click, with each label rendered as a removable badge.

Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-150
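The add/remove behaviour can be sketched as pure helpers; the names addLabel and removeLabel are illustrative, not taken from the codebase:

```typescript
// Illustrative label handling: add on Enter (trimmed, deduplicated),
// remove by value when a badge is clicked.
function addLabel(labels: string[], candidate: string): string[] {
  const trimmed = candidate.trim();
  if (!trimmed || labels.includes(trimmed)) return labels;
  return [...labels, trimmed];
}

function removeLabel(labels: string[], value: string): string[] {
  return labels.filter((l) => l !== value);
}
```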

#### FileSelectionView

The FileSelectionView displays a list of selected files and provides import functionality.

Files are rendered as FileComponent instances, each with delete and selection capabilities. The import footer contains action buttons that become available when the WebSocket connection is online.

graph TD
    A[FileSelectionView] --> B[FileComponent List]
    B --> C{User Action}
    C --> D[Delete File]
    C --> E[Select File]
    C --> F[Import Selected]
    F --> G{WebSocket Online?}
    G -->|Yes| H[Show Import Button]
    G -->|No| I[Hide Import Button]

Sources: frontend/app/components/Ingestion/FileSelectionView.tsx:1-100

Settings Components

#### InfoView

The InfoView component displays system information and provides reset functionality for various aspects of the application.

| Section | Information Displayed |
|---|---|
| Weaviate Cluster | Name, status, shard count |
| Collections | Collection count, names, object counts |
| Reset Options | Documents, Config, Verba, Suggestions |

Reset operations are protected by modal confirmations using the UserModalComponent, requiring explicit user confirmation before executing destructive actions.

Sources: frontend/app/components/Settings/InfoView.tsx:1-100

Login Components

#### LoginView

The LoginView handles initial deployment type selection for the application.

| Deployment Option | Icon | Description |
|---|---|---|
| Weaviate | FaDatabase | Weaviate Cloud deployment |
| Docker | FaDocker | Docker container deployment |
| Custom | TbDatabaseEdit | Custom backend connection |
| Local | FaLaptopCode | Local development mode |

The component manages connection states and displays loading indicators during authentication attempts.

Sources: frontend/app/components/Login/LoginView.tsx:1-100

Application Entry Point

The main page component (frontend/app/page.tsx) orchestrates the overall application flow, handling:

  • Environment detection and configuration
  • WebSocket connection management
  • Theme persistence
  • Page routing based on deployment status
graph TD
    A[Page Load] --> B{Production Mode?}
    B -->|Demo| C[Direct to Main]
    B -->|Local| D[LoginView]
    D --> E{Deployment Selected}
    E -->|Weaviate| F[Configure Weaviate]
    E -->|Docker| G[Connect to Docker]
    E -->|Custom| H[Configure Custom]
    E -->|Local| I[Full Setup]
    F --> J[Main Application]
    G --> J
    H --> J
    I --> J
    C --> J

The footer displays a "Built with ♥ and Weaviate" message, reflecting the project's Weaviate origin.

Sources: frontend/app/page.tsx:1-80

Theme System

The application supports both light and dark themes, with theme preferences persisted across sessions. Theme values are passed down to components that require styling adjustments, such as ChatMessage for syntax highlighting.

// Theme-dependent syntax highlighting
const codeStyle = selectedTheme.theme === "dark" ? oneDark : oneLight;

Component Communication

Components communicate through several mechanisms:

  1. Props Drilling: Parent components pass state and callbacks to children via props
  2. Ref Objects: Used for mutable values that don't trigger re-renders (e.g., isFetching)
  3. WebSocket Events: Real-time updates from the backend server
  4. State Callbacks: Functions like setCurrentPage for navigation state

State Management Summary

| Component Area | Primary State | Secondary State |
|---|---|---|
| Chat | messages, previewText | fetchingStatus, selectedSetting |
| Documents | content, chunkScores | page, selectedDocument |
| Ingestion | fileMap, selectedFileData | source, label |
| Navigation | currentPage | socketOnline, production |
| Settings | collectionPayload | clusterPayload, credentials |

Styling Conventions

The frontend uses Tailwind CSS with custom color variables following the -verba suffix convention:

| Color Variable | Usage |
|---|---|
| bg-bg-verba | Primary background |
| bg-bg-alt-verba | Alternate background |
| text-text-verba | Primary text |
| text-text-alt-verba | Alternate text |
| bg-button-verba | Button backgrounds |
| hover:bg-button-hover-verba | Button hover states |

This systematic naming ensures consistent theming throughout the application.

Key Features Summary

| Feature | Components | Description |
|---|---|---|
| Chat Interface | ChatInterface, ChatMessage | RAG conversation with syntax highlighting |
| Document Explorer | ContentView, DocumentExplorer | View and navigate documents/chunks |
| Data Import | FileSelectionView, BasicSettingView | File upload and configuration |
| System Settings | InfoView, SettingsView | System info and reset options |
| Real-time Updates | StatusMessenger | Toast notifications for events |
| Navigation | NavbarComponent | Page routing and menu |

Sources: frontend/app/page.tsx:1-50


Doramagic Pitfall Log

Doramagic extracted 14 source-linked risk signals. Review them before installing or handing real data to the project.

1. Installation risk: 1.0.1 Beautiful Verba

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: 1.0.1 Beautiful Verba. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/1.0.0

2. Installation risk: v0.4.0

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: v0.4.0. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.4.0

3. Installation risk: v1.0.3

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: v1.0.3. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v1.0.3

4. Installation risk: v2.1.0

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: v2.1.0. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v2.1

5. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:672002598 | https://github.com/weaviate/Verba | README/documentation is current enough for a first validation pass.

6. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:672002598 | https://github.com/weaviate/Verba | last_activity_observed missing

7. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:672002598 | https://github.com/weaviate/Verba | no_demo; severity=medium

8. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.

  • Severity: medium
  • Finding: No sandbox install has been executed yet; downstream must verify before user use.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.safety_notes | github_repo:672002598 | https://github.com/weaviate/Verba | No sandbox install has been executed yet; downstream must verify before user use.

9. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:672002598 | https://github.com/weaviate/Verba | no_demo; severity=medium

10. Security or permission risk: v0.3.0

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: v0.3.0. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.3.0

11. Security or permission risk: v0.3.1

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: v0.3.1. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.3.1

12. Security or permission risk: v2.1.2

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: v2.1.2. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v2.1.2

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes 8 project-level external discussion links on this manual page, separately from official documentation. Open the linked issues or discussions before treating the pack as ready for your environment, and review them before using Verba with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence