Doramagic Project Pack · Human Manual
Verba
Related topics: RAG Concepts in Verba
Introduction to Verba
Overview
Verba (The Golden RAGtriever) is an open-source, user-friendly RAG (Retrieval-Augmented Generation) application developed by Weaviate. It provides a streamlined interface for building and interacting with vector databases, enabling users to explore datasets and extract insights through semantic search and generative AI capabilities.
The application is designed to be accessible to both technical and non-technical users, offering multiple deployment options and integration with various LLM providers. Verba supports Python 3.10 through 3.12 and is distributed as the goldenverba Python package.
| Property | Value |
|---|---|
| Package Name | goldenverba |
| Current Version | 2.1.3 |
| Python Support | >=3.10.0, <3.13.0 |
| Repository | https://github.com/weaviate/Verba |
| License | BSD License |
Sources: setup.py:3-14
Core Features
Verba provides a comprehensive set of features for RAG-based applications:
Data Import and Management
Users can import documents through multiple methods:
- Add Files: Upload individual files directly
- Add Directory: Import entire folders of documents
- Add URL: Fetch content from web sources
The system automatically chunks and processes documents, making them searchable and queryable. Supported file formats include text files, PDFs, CSV, XLSX, and XLS formats for the DefaultReader.
Sources: README.md
Chat Interface
The Chat page enables users to ask questions about their imported data. The system retrieves relevant document chunks and generates contextual responses using the configured LLM. The chat interface displays:
- Real-time retrieval and generation status
- Cached results indicator for faster subsequent queries
- Source attribution for retrieved information
- Code block syntax highlighting in responses
Sources: frontend/app/components/Chat/ChatInterface.tsx
Document Explorer
Users can browse, view, and manage imported documents through the Document Explorer. Each document displays:
- Document metadata and labels
- Chunk information with relevancy scores
- Source links for reference
- Content preview with Markdown rendering
Sources: frontend/app/components/Document/ContentView.tsx
Configuration Options
Verba allows granular configuration of the RAG pipeline, including:
- LLM provider selection
- Embedding model configuration
- Retrieval parameters
- Chunk size and overlap settings
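The chunk size and overlap settings above control how documents are split before embedding. As an illustration of the idea (a minimal sketch, not Verba's actual chunker), a fixed-size character chunker with overlap looks like this:

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks; neighbouring chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each chunk
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 250, chunk_size=100, overlap=20)
print(len(chunks))      # 4 chunks for 250 characters with a step of 80
print(len(chunks[0]))   # 100 — each full chunk spans chunk_size characters
```

Larger overlap preserves more context across chunk boundaries at the cost of more stored vectors; Verba exposes the equivalent trade-off in its chunker settings.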
Architecture Overview
graph TD
A[User Interface - React Frontend] --> B[FastAPI Backend Server]
B --> C[Verba Manager - Core Logic]
C --> D[Weaviate Vector Database]
E[LLM Providers] --> B
F[Ollama / HuggingFace] --> B
G[Document Readers] --> C
H[Embedders] --> C
I[Generators] --> C
D --> G
D --> H
D --> I

Deployment Options
Verba supports four deployment configurations to accommodate different use cases and infrastructure requirements.
| Deployment Type | Description | Use Case |
|---|---|---|
| Weaviate | Connect to Weaviate Cloud Services (WCS) | Production deployments with managed infrastructure |
| Docker | Run with Docker Compose | Containerized deployments |
| Local | Run entirely on local machine | Development and testing |
| Custom | Specify custom Weaviate URL and credentials | Integration with existing Weaviate instances |
Sources: frontend/app/components/Login/LoginView.tsx
Environment Variables
Configuration can be managed through environment variables for automated deployments:
| Variable | Description |
|---|---|
| DEFAULT_DEPLOYMENT | Pre-select deployment type (Local, Docker, Weaviate, Custom) |
| OLLAMA_MODEL | Default Ollama model name |
| OLLAMA_EMBED_MODEL | Default Ollama embedding model |
Sources: CHANGELOG.md
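For automated deployments, these variables can be placed in a `.env` file. The values below are hypothetical examples; only the variable names come from the table above:

```shell
# .env — example values only; adjust model names to what you have pulled in Ollama
DEFAULT_DEPLOYMENT=Local
OLLAMA_MODEL=llama3
OLLAMA_EMBED_MODEL=mxbai-embed-large
```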
Supported Integrations
LLM Providers
Verba integrates with multiple LLM providers for text generation:
- OpenAI: GPT models with dynamic model name retrieval based on API key and URL
- Anthropic: Claude models
- Cohere: Command models
- Groq: Fast inference API
- Novita AI: Additional generative capabilities
- Upstage: Reader, Embedder, and Generator support
Sources: README.md and CHANGELOG.md
Embedding Providers
For vector embeddings, Verba supports:
- Ollama: Local embedding models
- HuggingFace: Sentence transformers and other models
- OpenAI: text-embedding-ada-002 and newer models
- Upstage: Solar embedding models
Document Readers
The system includes specialized readers for various file formats:
- AssemblyAI: Audio file transcription and processing
- DefaultReader: Text, PDF, CSV, XLSX, XLS formats
- Unstructured: Advanced document parsing capabilities
Installation Methods
Install via pip
pip install goldenverba
Build from Source
git clone https://github.com/weaviate/Verba
cd Verba
pip install -e .
Deploy with Docker
git clone https://github.com/weaviate/Verba
docker compose --env-file <your-env-file> up -d --build
Sources: README.md
Project Structure
The Verba project is organized into two main components:
Verba/
├── goldenverba/ # Python backend package
│ └── server/ # FastAPI server implementation
├── frontend/ # React TypeScript frontend
│ └── app/
│ ├── components/ # React components
│ │ ├── Chat/ # Chat interface components
│ │ ├── Document/ # Document explorer components
│ │ ├── Ingestion/# Data import components
│ │ ├── Login/ # Authentication views
│ │ ├── Navigation/ # Navigation components
│ │ └── Settings/ # Configuration components
│ └── page.tsx # Main application page
├── setup.py # Package configuration
└── README.md # Project documentation
User Interface Navigation
The main navigation includes the following sections:
| Navigation Item | Description |
|---|---|
| Chat | Query imported data using RAG |
| Documents | Browse and manage imported documents |
| Import Data | Add new files, directories, or URLs |
| Settings | Configure Verba and manage collections |
Sources: frontend/app/components/Navigation/NavbarComponent.tsx
Settings and Management
The Settings page provides administrative functions for Verba management:
Collections Management
View all Weaviate collections with their object counts and status information, including shard configuration.
Reset Operations
| Operation | Description |
|---|---|
| Reset Documents | Clears all documents and chunks from Verba |
| Reset Config | Resets configuration to default values |
| Reset Verba | Deletes all Verba-related collections |
| Reset Suggestions | Clears autocomplete suggestion data |
Sources: frontend/app/components/Settings/InfoView.tsx
Getting Started Workflow
graph LR
A[Install Verba] --> B[Configure Deployment]
B --> C[Set API Keys]
C --> D[Import Data]
D --> E[Configure RAG Pipeline]
E --> F[Query Data]

- Installation: Choose an installation method (pip, source, or Docker)
- Deployment Configuration: Select deployment type (Weaviate, Docker, Local, or Custom)
- API Keys: Configure required API keys in a .env file or through the UI
- Data Import: Import documents using Add Files, Add Directory, or Add URL
- Configuration: Adjust RAG pipeline settings under the Config tab
- Query: Ask questions and receive answers with relevant document citations
Sources: README.md and frontend/app/components/Login/GettingStarted.tsx
Version History
| Version | Release | Key Features |
|---|---|---|
| 2.1.3 | Latest | OLLAMA_MODEL, OLLAMA_EMBED_MODEL env vars, CSV/XLSX/XLS support, Hiding Getting Started display |
| 2.1.2 | Previous | Novita Generator support, basic Document class tests, spaCy fixes |
| 2.1.1 | Earlier | Dynamic OpenAI model retrieval |
| 2.1.0 | Earlier | Upstage integration, Custom deployment, Groq support, AssemblyAI Reader |
Sources: CHANGELOG.md
External Resources
| Resource | URL |
|---|---|
| GitHub Repository | https://github.com/weaviate/Verba |
| Blog Post | https://weaviate.io/blog/verba-open-source-rag-app |
| Video Tutorial | https://www.youtube.com/watch?v=swKKRdLBhas |
| Weaviate Forum | https://forum.weaviate.io/ |
Contributing
Verba is an open-source community project. Contributions are welcome through:
- GitHub Issues for bug reports
- GitHub Discussions for feature requests and ideas
- Pull Requests for code contributions
Before contributing, please review the Contribution Guide in the repository.
Sources: [setup.py:3-14](https://github.com/weaviate/Verba/blob/main/setup.py)
RAG Concepts in Verba
Related topics: Introduction to Verba, RAG Retrieval System, LLM Generators and Answer Generation
Verba (The Golden RAGtriever) is an open-source RAG (Retrieval-Augmented Generation) application designed to provide a streamlined, user-friendly interface for building and interacting with RAG-powered applications. This document explains the core RAG concepts implemented within Verba's architecture.
What is RAG?
Retrieval-Augmented Generation (RAG) is a pattern that combines the power of large language models (LLMs) with external knowledge retrieval. Instead of relying solely on a model's training data, RAG systems:
- Retrieve relevant documents or chunks from a knowledge base
- Augment the user's query with the retrieved context
- Generate a response using the LLM with the augmented input
Sources: README.md:1
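The three steps can be sketched in a few lines of Python. This is a toy illustration of the pattern only: `retrieve` uses word overlap as a stand-in for Weaviate's vector search, and `generate` is a placeholder for a real LLM call.

```python
def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    # Toy lexical scorer: rank documents by shared words with the query.
    # A real RAG system (like Verba) uses vector similarity instead.
    query_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query: str, context: list[str]) -> str:
    # Prepend the retrieved chunks to the user's question.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Placeholder for an LLM call; a real generator sends `prompt` to a model.
    return f"[LLM answer grounded in {len(prompt.splitlines())} prompt lines]"

kb = ["Verba is a RAG app", "Weaviate stores vectors", "Bananas are yellow"]
prompt = augment("What is Verba?", retrieve("What is Verba?", kb))
answer = generate(prompt)
```

The point of the pattern is that the model answers from the retrieved context rather than from its training data alone.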
Verba's RAG Pipeline Architecture
Verba implements a complete RAG pipeline with configurable components for reading, embedding, chunking, and generating.
graph TD
A[User Query] --> B[Retrieval Phase]
B --> C[Vector Search in Weaviate]
C --> D[Retrieve Relevant Chunks]
D --> E[Augment Query with Context]
E --> F[Generation Phase]
F --> G[LLM Response]
H[Document Ingestion] --> I[Readers]
I --> J[Chunking]
J --> K[Embedding]
K --> L[Vector Storage in Weaviate]

Sources: README.md:1, frontend/app/components/Chat/ChatInterface.tsx:1
Core Components
Readers
Verba supports multiple document formats through its reader system. Documents can be imported via the frontend and processed by appropriate readers.
| File Type | Format | Support Status |
|---|---|---|
| CSV | csv | Supported (v2.1.3+) |
| Excel | xlsx, xls | Supported (v2.1.3+) |
| Text | Plain text | Supported |
| Markdown | .md | Supported |
| PDF | .pdf | Supported |
| Audio | Various | Supported via AssemblyAI |
Sources: CHANGELOG.md:8-13
Embedders
Verba supports multiple embedding providers for converting documents into vector representations:
- OpenAI - Uses OpenAI's embedding models
- Ollama - Local embeddings via Ollama
- HuggingFace - Sentence transformers from HuggingFace
- Cohere - Cohere's embedding models
- Upstage - Upstage's embedding service
Sources: README.md:1, CHANGELOG.md:15-22
Generators
Multiple LLM providers are supported for generating responses:
| Provider | Type | Configuration |
|---|---|---|
| OpenAI | Cloud | API Key required |
| Anthropic | Cloud | API Key required |
| Cohere | Cloud | API Key required |
| Groq | Cloud | API Key required |
| Novita | Cloud | API Key required (v2.1.2+) |
| Ollama | Local | No API Key needed |
| Upstage | Cloud | API Key required |
Sources: CHANGELOG.md:15-22, frontend/app/components/Login/LoginView.tsx:1
Document Ingestion Workflow
When users import documents into Verba, the following workflow is executed:
sequenceDiagram
participant User
participant Frontend
participant Reader
participant Chunker
participant Embedder
participant Weaviate
User->>Frontend: Upload Document
Frontend->>Reader: Process File
Reader->>Chunker: Raw Text
Chunker->>Embedder: Text Chunks
Embedder->>Weaviate: Vector Embeddings
Weaviate->>Weaviate: Store Chunks + Vectors

Chunking Configuration
Documents are split into manageable chunks for retrieval. Each chunk contains:
- Content - The text content
- Chunk ID - Position in original document
- Labels - User-defined labels for categorization
- Source Link - Reference to original document location
- Metadata - Additional document properties
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1
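The chunk fields listed above can be modelled as a simple dataclass. This is an illustrative shape only; the field names follow the list above, not Verba's actual chunk class:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    content: str                                      # the text content
    chunk_id: int                                     # position in the original document
    labels: list[str] = field(default_factory=list)   # user-defined categorization tags
    source_link: str = ""                             # reference to the original document
    metadata: dict = field(default_factory=dict)      # additional document properties

c = Chunk(content="Verba is a RAG app", chunk_id=0, labels=["intro"])
```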
Retrieval and Query Flow
Chat Interface
Verba's chat interface handles the retrieval and generation process:
graph LR
A[User Input] --> B[Socket Connection]
B --> C{Retrieval Phase}
C -->|Fetching Status: CHUNKS| D[Vector Search]
D --> E[Retrieve Top-K Chunks]
E --> F{Generation Phase}
F -->|Fetching Status: RESPONSE| G[LLM Processing]
G --> H[Stream Response]
H --> I[Display to User]

The chat interface displays retrieval status to users:
fetchingStatus === "CHUNKS" → "Retrieving..."
fetchingStatus === "RESPONSE" → "Generating..."
Sources: frontend/app/components/Chat/ChatInterface.tsx:1
Message Types
Verba supports multiple message types in the chat:
| Type | Direction | Description |
|---|---|---|
| user | User → System | User queries |
| system | System → User | LLM responses |
| error | System → User | Error messages |
| retrieval | System → User | Retrieved context |
| cached | System → User | Cached responses |
Sources: frontend/app/components/Chat/ChatMessage.tsx:1
Relevancy Scoring
Retrieved chunks are scored for relevance. Chunks with a score greater than 0 are flagged as "High Relevancy":
{contentSnippet.score > 0 && (
<div className="flex gap-2 items-center p-3 bg-primary-verba rounded-full w-fit">
<HiSparkles size={12} />
<p className="text-xs flex text-text-verba">High Relevancy</p>
</div>
)}
Sources: frontend/app/components/Document/ContentView.tsx:1
Deployment Options
Verba supports multiple deployment configurations:
| Deployment | Description | Use Case |
|---|---|---|
| Local | Runs entirely on local machine with Ollama | Development, Privacy |
| Docker | Containerized deployment | Easy setup |
| Weaviate Cloud | Managed Weaviate service | Production |
| Custom | User-provided Weaviate instance | Enterprise |
Sources: README.md:1, frontend/app/components/Login/LoginView.tsx:1
Environment Variables
Key environment variables for RAG configuration:
| Variable | Description |
|---|---|
| OLLAMA_MODEL | Default Ollama model |
| OLLAMA_EMBED_MODEL | Ollama embedding model |
| DEFAULT_DEPLOYMENT | Default deployment type |
| WEAVIATE_URL | Weaviate instance URL |
| WEAVIATE_API_KEY | Weaviate API key |
Sources: CHANGELOG.md:8-13
Configuration Management
The RAG pipeline can be configured through the Settings interface:
graph TD
A[Settings Page] --> B[Config Tab]
A --> C[Info Tab]
A --> D[Collections Tab]
B --> E[RAG Pipeline Settings]
C --> F[System Information]
D --> G[Weaviate Collections]
E --> H[Embedder Selection]
E --> I[Generator Selection]
E --> J[Retrieval Settings]

Configurable Options
- Embedder - Choose embedding provider and model
- Generator - Select LLM provider and model
- Retrieval - Configure top-k, similarity thresholds
- Chunk Size - Adjust document chunking parameters
Sources: frontend/app/components/Settings/InfoView.tsx:1
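The top-k and similarity-threshold settings interact as follows: the threshold filters out weakly related chunks, and top-k caps how many survivors are passed to the generator. A minimal sketch of that selection logic (the scores are a stand-in for Weaviate's similarity values):

```python
def select_chunks(scored_chunks: list[tuple[str, float]],
                  top_k: int = 3,
                  min_score: float = 0.5) -> list[str]:
    """Keep chunks above the similarity threshold, best-first, at most top_k."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [text for text, score in ranked if score >= min_score][:top_k]

hits = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.6), ("e", 0.95)]
print(select_chunks(hits, top_k=3, min_score=0.5))  # ['e', 'a', 'c']
```

Raising `min_score` trades recall for precision; raising `top_k` gives the generator more context at the cost of a longer prompt.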
Data Storage
Weaviate Collections
Verba automatically creates collections in Weaviate for:
- Documents - Original document metadata
- Chunks - Vectorized document chunks with embeddings
- Configurations - RAG pipeline settings
Each collection tracks:
- Object count
- Shard configuration
- Status
Sources: frontend/app/components/Settings/InfoView.tsx:1
Reset Operations
Verba provides granular reset capabilities:
| Operation | Scope | Action |
|---|---|---|
| Reset Documents | Data | Clears all documents and chunks |
| Reset Config | Configuration | Restores default RAG settings |
| Reset Verba | System | Deletes all Verba collections |
| Reset Suggestions | UI | Clears autocomplete cache |
Sources: frontend/app/components/Settings/InfoView.tsx:1
Summary
Verba implements a complete RAG pipeline with:
- Multi-format document support - CSV, Excel, PDF, audio files
- Flexible embedding options - Multiple cloud and local providers
- Diverse LLM integration - OpenAI, Anthropic, Cohere, Ollama, and more
- Visual chat interface - Real-time status updates during retrieval and generation
- Configurable pipeline - Adjust chunking, embedding, and retrieval parameters
- Multiple deployment modes - Local, Docker, Weaviate Cloud, or custom infrastructure
The system leverages Weaviate's vector database capabilities for efficient similarity search while providing a user-friendly interface for non-technical users to build and interact with RAG applications.
Sources: README.md:1
Component Architecture
Related topics: Data Ingestion System, Chunking Strategies, Embedder Configuration, LLM Generators and Answer Generation
Overview
Verba (Golden RAGtriever) is a RAG (Retrieval-Augmented Generation) application built with a modular component architecture that separates concerns across ingestion, retrieval, and generation pipelines. The system enables users to import various data formats, process them into searchable chunks, and query them using configurable LLM-based chat interfaces. The frontend is built with React/Next.js and TypeScript, while the backend is powered by Python with FastAPI and Weaviate as the vector database.
The component architecture in Verba follows a plugin-based pattern where different readers, embedders, chunkers, and generators can be dynamically configured and swapped at runtime. This design allows extensibility without modifying core system code.
System Architecture
graph TD
subgraph Frontend["Frontend (Next.js/React)"]
Login[LoginView]
Chat[ChatInterface]
Ingest[FileSelectionView]
Settings[InfoView]
Nav[NavbarComponent]
end
subgraph Backend["Backend (Python/FastAPI)"]
API[FastAPI Server]
Manager[Component Manager]
Components[Components Registry]
end
subgraph External["External Services"]
Weaviate[Weaviate DB]
LLMs[LLM Providers]
Readers[Data Readers]
end
Login --> |Credentials| API
Chat --> |Query/RAGConfig| API
Ingest --> |FileData| API
Settings --> |Reset/Config| API
API --> Manager
Manager --> Components
Components --> Weaviate
Components --> LLMs
Components --> Readers

Core Component Types
Verba's component system is organized around four primary types that form the RAG pipeline:
| Component Type | Purpose | Examples |
|---|---|---|
| Reader | Parse various data formats into text | DefaultReader, AssemblyAI, Unstructured |
| Embedder | Convert text to vector representations | OpenAI Embedder, HuggingFace, Ollama |
| Chunker | Split documents into manageable pieces | DefaultChunker |
| Generator | Produce natural language responses | OpenAI Generator, Anthropic, Novita, Groq |
Component Manager Architecture
The component manager (goldenverba/components/managers.py) serves as the central registry and orchestrator for all pluggable components. It maintains references to available readers, embedders, chunkers, and generators, enabling runtime selection based on user configuration.
Components are registered through the __init__.py module which discovers and loads all available implementations. The manager provides methods to:
- List available components by type
- Retrieve component configurations
- Instantiate components with provided settings
- Validate component compatibility
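The registry-and-orchestrator pattern described above can be sketched as follows. This is a hypothetical illustration; the class and method names do not mirror goldenverba/components/managers.py:

```python
class ComponentManager:
    """Toy plugin registry: components are looked up by kind and name at runtime."""

    def __init__(self) -> None:
        self._registry: dict[str, dict[str, type]] = {
            "Reader": {}, "Chunker": {}, "Embedder": {}, "Generator": {}
        }

    def register(self, kind: str, name: str, cls: type) -> None:
        self._registry[kind][name] = cls

    def list_components(self, kind: str) -> list[str]:
        return sorted(self._registry[kind])

    def create(self, kind: str, name: str, **settings):
        # Instantiate the selected component with user-provided settings.
        return self._registry[kind][name](**settings)

class BasicReader:
    """Stand-in component with one configurable setting."""
    def __init__(self, encoding: str = "utf-8") -> None:
        self.encoding = encoding

manager = ComponentManager()
manager.register("Reader", "BasicReader", BasicReader)
reader = manager.create("Reader", "BasicReader", encoding="utf-8")
```

Because components are resolved by name at runtime, new readers or generators can be added by registering them, without touching the pipeline code that consumes them.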
Frontend Component Architecture
Navigation Structure
The frontend uses a page-based navigation system where NavbarComponent manages the main routing between different views:
graph LR
Nav[NavbarComponent] --> |CHAT| ChatPage[ChatInterface]
Nav --> |DOCUMENTS| DocPage[DocumentExplorer]
Nav --> |ADD| AddPage[FileSelectionView]
Nav --> |SETTINGS| SettingsPage[InfoView]

Navigation items are conditionally rendered based on the production environment variable:
- Demo Mode: Shows only Chat page
- Production/Local/Docker: Shows Chat, Documents, Import Data, and Settings
Sources: frontend/app/components/Navigation/NavbarComponent.tsx (lines showing conditional rendering with production != "Demo")
Login and Deployment Configuration
The LoginView component handles the initial setup flow, supporting multiple deployment types:
| Deployment | Description | Configuration Required |
|---|---|---|
| Local | Standalone Weaviate instance | URL, API Key |
| Docker | Containerized Weaviate | URL, API Key |
| Weaviate Cloud (WCS) | Managed Weaviate service | URL, API Key |
| Custom | User-specified Weaviate endpoint | URL, API Key, Port |
Sources: frontend/app/components/Login/LoginView.tsx (lines 35-45 showing deployment type definitions)
Chat Interface Pipeline
The ChatInterface component implements the query-time RAG workflow:
sequenceDiagram
participant User
participant ChatInterface
participant Backend
participant Weaviate
participant LLM
User->>ChatInterface: Submit Query
ChatInterface->>Backend: /query with RAGConfig
Backend->>Weaviate: Vector Search
Weaviate->>Backend: Top-k Chunks
Backend->>LLM: Context + Query
LLM->>Backend: Generated Response
Backend->>ChatInterface: Response + Chunks
ChatInterface->>User: Display Results

The interface displays retrieval status messages:
- CHUNKS state: "Retrieving..." while fetching from Weaviate
- RESPONSE state: "Generating..." while the LLM produces an answer
Sources: frontend/app/components/Chat/ChatInterface.tsx (lines showing fetchingStatus states)
Data Ingestion Flow
The FileSelectionView manages the document import pipeline:
graph TD
Files[Files/Directories/URLs] --> Reader[Reader Component]
Reader --> Chunker[Chunker Component]
Chunker --> Embedder[Embedder Component]
Embedder --> Weaviate[Weaviate Collection]

File imports support multiple sources:
- Add Files: Individual file upload
- Add Directory: Batch directory ingestion
- Add URL: Web content extraction
Sources: frontend/app/components/Ingestion/FileSelectionView.tsx (lines showing URL dropdown with Reader component filtering)
Settings and Configuration
The InfoView component provides system management capabilities:
| Action | Function | Data Affected |
|---|---|---|
| Reset Documents | Clear all collections | Documents, Chunks |
| Reset Config | Restore default settings | RAGConfig |
| Reset Verba | Full system reset | All collections |
| Reset Suggestions | Clear autocomplete cache | Suggestion data |
Sources: frontend/app/components/Settings/InfoView.tsx (lines showing UserModalComponent triggers)
RAG Configuration Schema
The RAGConfig object defines the active pipeline configuration:
interface RAGConfig {
Reader: {
components: Record<string, Component>;
};
Chunker: {
components: Record<string, Component>;
};
Embedder: {
components: Record<string, Component>;
};
Generator: {
components: Record<string, Component>;
};
}
Each component contains:
- type: Component category (e.g., "URL", "Text", "Vector")
- name: Human-readable identifier
- settings: Key-value configuration parameters
State Management
Verba manages state through React props and context patterns:
graph TD
App[App Root] --> |Credentials| LoginView
App --> |RAGConfig| ChatInterface
App --> |Themes| Components
App --> |production| NavbarComponent
ChatInterface --> |setRAGConfig| App
LoginView --> |setIsLoggedIn| App

Key state objects:
- Credentials: Weaviate connection details (URL, API key)
- RAGConfig: Pipeline configuration for all component types
- Themes: UI theming configuration
- production: Deployment mode ("Local" | "Demo" | "Production")
Dependency Injection
The frontend components receive dependencies via constructor injection:
interface ChatInterfaceProps {
credentials: Credentials;
RAGConfig: RAGConfig | null;
setRAGConfig: (config: RAGConfig | null) => void;
production: "Local" | "Demo" | "Production";
addStatusMessage: (message: string) => void;
}
This pattern enables:
- Testability through mock injection
- Flexible component composition
- Runtime configuration changes
WebSocket Communication
Real-time updates between frontend and backend use WebSocket connections:
- Connection status monitoring via socketOnline and socketStatus
- Status updates: "ONLINE", "OFFLINE", "CONNECTING"
- Ability to cancel ongoing operations (e.g., retrieval/generation)
Sources: frontend/app/components/Chat/ChatInterface.tsx (lines showing socket status handling)
Conclusion
The Verba component architecture demonstrates a clean separation between frontend presentation and backend processing, with a plugin-based system enabling flexible RAG pipeline configuration. The architecture supports multiple deployment scenarios, various data sources, and different LLM providers through abstracted component interfaces.
Sources: frontend/app/components/Navigation/NavbarComponent.tsx (lines showing conditional rendering with `production != "Demo"`)
Data Ingestion System
Related topics: Chunking Strategies, Embedding and Vector Storage
The Data Ingestion System in Verba (The Golden RAGtriever) is responsible for accepting user-provided documents from various sources, processing them through configurable reader pipelines, and preparing them for embedding and retrieval. This system forms the entry point of the RAG pipeline, enabling users to import files, URLs, directories, and audio content into the application.
System Overview
The ingestion system operates as a multi-stage pipeline that transforms raw content into structured Document objects ready for vectorization. It supports multiple input types and allows per-file configuration of chunking, embedding, and reading strategies.
The architecture consists of three primary layers:
| Layer | Responsibility | Key Components |
|---|---|---|
| Frontend | User interface for file selection and configuration | FileSelectionView, BasicSettingView, NavbarComponent |
| API | Backend endpoint handling and request processing | server/api.py |
| Readers | Content extraction from various file types and sources | BasicReader, HTMLReader, UnstructuredAPI, AssemblyAIAPI, etc. |
Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-50
Ingestion Flow
graph TD
A[User clicks Import Data] --> B[FileSelectionView]
B --> C{Input Type}
C -->|Files| D[Add Files Tab]
C -->|Directory| E[Add Directory Tab]
C -->|URLs| F[Add URL Tab]
D --> G[BasicSettingView - Configure]
E --> G
F --> G
G --> H[Select Reader Type]
G --> I[Set Chunker]
G --> J[Set Embedder]
G --> K[Add Metadata & Labels]
K --> L[Import Selected]
L --> M[API Endpoint: /import]
M --> N[Reader Processing]
N --> O[Document Creation]
O --> P[Chunking]
P --> Q[Vector Storage in Weaviate]

Document Model
The core data structure used throughout ingestion is the Document class, defined in goldenverba/components/document.py. This class encapsulates all metadata and content associated with an ingested document.
Document(
title=str,
content=str,
extension=str,
labels=list,
source=str,
fileSize=int,
metadata=str,
meta=dict
)
Sources: goldenverba/components/document.py:1-100
Document Creation from Configuration
The create_document function provides a factory method for generating Document objects from file configuration:
def create_document(content: str, fileConfig: FileConfig) -> Document:
return Document(
title=fileConfig.filename,
content=content,
extension=fileConfig.extension,
labels=fileConfig.labels,
source=fileConfig.source,
fileSize=fileConfig.file_size,
metadata=fileConfig.metadata,
meta={},
)
Sources: goldenverba/components/document.py:100-115
Reader Components
Readers are responsible for extracting raw content from various input sources. Each reader implements a specific loading strategy and returns a list of Document objects.
Available Readers
| Reader | Type | Supported Sources | Description |
|---|---|---|---|
| BasicReader | File | .txt, .pdf, .docx, .xlsx, .xls, .csv | Standard file reading for common document formats |
| HTMLReader | URL | Web pages | Fetches and converts web pages, supports recursive crawling |
| UnstructuredAPI | API | Multiple formats | Uses the Unstructured.io API for complex document parsing |
| AssemblyAIAPI | API | Audio files | Transcribes and extracts content from audio |
| GitReader | Repository | Git repos | Clones and extracts documentation from Git repositories |
| UpstageDocumentParse | API | Documents | Uses Upstage AI for document parsing |
HTMLReader Configuration
The HTMLReader supports advanced web scraping capabilities:
| Parameter | Type | Default | Description |
|---|---|---|---|
| URLs | list | required | List of URLs to process |
| Convert To Markdown | bool | true | Whether to convert HTML to Markdown |
| Recursive | bool | false | Whether to follow linked pages |
| Max Depth | int | 3 | Maximum recursion depth for linked pages |
Sources: goldenverba/components/reader/HTMLReader.py:1-80
HTMLReader Recursive Processing
async def process_url(
self,
url: str,
to_markdown: bool,
recursive: bool,
max_depth: int,
current_depth: int,
session: aiohttp.ClientSession,
reader: BasicReader,
fileConfig: FileConfig,
documents: List[Document],
processed_urls: set,
):
if url in processed_urls or current_depth > max_depth:
return
processed_urls.add(url)
# ... content fetching and document creation
The reader uses an async pattern with aiohttp.ClientSession for efficient concurrent URL processing, maintaining a processed_urls set to prevent duplicate processing.
Sources: goldenverba/components/reader/HTMLReader.py:80-120
Frontend Ingestion Interface
File Selection View
The FileSelectionView component provides the primary UI for selecting and managing files to ingest. It supports three input modes:
<div className="tab-group">
<Tabs
tabs={["Add Files", "Add Directory", "Add URL"]}
onTabChange={handleTabChange}
/>
</div>
Key features include:
- File List Display: Shows all selected files with their status
- Multi-file Selection: Allows batch processing of multiple files
- URL Dropdown: Provides reader type selection for URL inputs
- Import Actions: Triggers the actual ingestion process
Sources: frontend/app/components/Ingestion/FileSelectionView.tsx:1-100
Configuration Settings View
The BasicSettingView component provides per-file configuration options. Each file can have its own RAG pipeline configuration:
| Setting | Description | UI Element |
|---|---|---|
| Reader | Content extraction method | Disabled text input showing selected reader |
| Chunker | Text splitting strategy | Disabled text input with description |
| Embedder | Vector embedding model | Disabled text input with description |
| Title | Document display name | Editable text input |
| Source Link | Original source reference | Editable text input |
| Labels | Categorization tags | Input with add button |
| Metadata | Custom key-value data | Textarea for JSON-like content |
| Overwrite | Replace existing documents | Checkbox toggle |
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-150
Label Management
Labels provide a way to categorize documents for filtering during retrieval:
<input
type="text"
value={label}
onChange={(e) => setLabel(e.target.value)}
onKeyDown={(e) => {
if (e.key === "Enter") {
e.preventDefault();
addLabel(label);
}
}}
/>
<VerbaButton
title="Add"
Icon={IoAddCircleSharp}
onClick={() => addLabel(label)}
/>
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:20-45
Debug Modal
The configuration includes a debug feature that displays the complete file configuration as JSON:
<pre className="whitespace-pre-wrap text-xs">
{selectedFileData
? (() => {
const objCopy = { ...fileMap[selectedFileData] };
objCopy.content = "File Content";
return JSON.stringify(objCopy, null, 2);
})()
: ""}
</pre>
This allows users to inspect the full configuration state including the RAG pipeline settings before import.
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:100-130
Configuration Management
FileConfig Structure
The FileConfig object carries all configuration for a single file through the ingestion pipeline:
| Field | Type | Purpose |
|---|---|---|
| filename | string | Document title |
| extension | string | File type indicator |
| labels | list | Categorization tags |
| source | string | Original source URL/path |
| file_size | int | Size in bytes |
| metadata | string | Custom metadata string |
| rag_config | dict | Reader, Chunker, Embedder settings |
| content | string | Raw file content |
| overwrite | bool | Whether to replace existing |
RAG Pipeline Configuration
Each file maintains its own RAG configuration that specifies the processing pipeline:
fileMap[selectedFileData].rag_config = {
"Reader": {
"selected": "BasicReader",
"components": {
"BasicReader": { "type": "file", "description": "..." },
"HTMLReader": { "type": "URL", "description": "..." }
}
},
"Chunker": {
"selected": "DefaultChunker",
"components": { ... }
},
"Embedder": {
"selected": "OpenAIEmbedder",
"components": { ... }
}
}
Navigation Integration
The ingestion system is accessible from the main navigation bar:
{production != "Demo" && (
<NavbarButton
hide={false}
Icon={IoMdAddCircle}
title="Import Data"
currentPage={currentPage}
setCurrentPage={setCurrentPage}
setPage="ADD"
/>
)}
In Demo mode, the import functionality is disabled to prevent unauthorized data ingestion.
Sources: frontend/app/components/Navigation/NavbarComponent.tsx:30-45
State Management
The frontend maintains the ingestion state using React hooks:
| State Variable | Type | Purpose |
|---|---|---|
| fileMap | Record<string, FileData> | All files selected for ingestion |
| selectedFileData | string \| null | Currently selected file key |
| socketStatus | string | Connection status to backend |
| currentPage | string | Current navigation page |
The FileData interface contains both the raw content and the configuration:
interface FileData {
content: string;
filename: string;
extension: string;
labels: string[];
source: string;
file_size: number;
metadata: string;
rag_config: RAGConfig;
overwrite: boolean;
block: boolean; // Disables editing during processing
}
Integration with RAG Pipeline
After successful ingestion, documents flow into the RAG pipeline:
- Reader extracts raw content from the source
- Chunker splits content into smaller segments for retrieval
- Embedder converts chunks into vector embeddings
- Weaviate stores the embedded chunks for semantic search
This integration is configured per-file through the rag_config object, allowing different documents to use different processing strategies based on their content type or user requirements.
Error Handling
The HTMLReader demonstrates the error handling pattern used across readers:
async with aiohttp.ClientSession() as session:
for url in urls:
try:
await self.process_url(
url, to_markdown, recursive, max_depth,
0, session, reader, fileConfig, documents, processed_urls
)
except Exception as e:
msg.warn(f"Failed to process URL {url}: {str(e)}")
Individual failures do not halt the entire ingestion process, allowing partial success scenarios.
Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-50
Chunking Strategies
Related topics: Data Ingestion System, Embedding and Vector Storage
Overview
Chunking strategies in Verba define how documents are split into smaller, semantically coherent units for embedding and retrieval. Each strategy implements the Chunker interface and provides different approaches to partitioning document content based on structural markers, token counts, sentence boundaries, or semantic similarity.
The chunking system is designed to:
- Break large documents into manageable pieces for embedding
- Preserve contextual continuity through overlapping chunks
- Respect document structure (headers, code blocks, HTML tags)
- Provide configuration options for fine-tuning chunk sizes and overlap
- Skip documents that already contain chunks to avoid redundant processing
Sources: goldenverba/components/chunking/__init__.py
Architecture
Chunker Interface
All chunkers inherit from the Chunker base class and implement the chunk() async method. The interface accepts configuration parameters, a list of documents, and an optional embedder.
class Chunker:
name: str
requires_library: list[str]
description: str
async def chunk(
self,
config: dict,
documents: list[Document],
embedder: Embedding | None = None,
embedder_config: dict | None = None,
) -> list[Document]:
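To make the contract concrete, here is a minimal toy chunker written against this interface. The `Document` and `Chunk` classes below are simplified stand-ins for illustration only; Verba's real models live in `goldenverba/components` and carry more fields:

```python
import asyncio
from dataclasses import dataclass, field

# Simplified stand-ins for Verba's Document/Chunk models (illustration only)
@dataclass
class Chunk:
    content: str
    chunk_id: int

@dataclass
class Document:
    content: str
    chunks: list = field(default_factory=list)

class ParagraphChunker:
    """Toy chunker: one chunk per blank-line-separated paragraph."""
    name = "ParagraphChunker"
    requires_library: list[str] = []
    description = "Splits on blank lines"

    async def chunk(self, config, documents, embedder=None, embedder_config=None):
        for document in documents:
            if len(document.chunks) > 0:  # skip already-chunked documents
                continue
            paragraphs = [p for p in document.content.split("\n\n") if p.strip()]
            document.chunks = [
                Chunk(content=p, chunk_id=i) for i, p in enumerate(paragraphs)
            ]
        return documents

doc = Document(content="First paragraph.\n\nSecond paragraph.")
asyncio.run(ParagraphChunker().chunk({}, [doc]))
print(len(doc.chunks))  # 2
```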
Chunk Data Model
Each chunk produced by the chunkers contains the following fields:
| Field | Type | Description |
|---|---|---|
| content | str | Full chunk text including overlap |
| chunk_id | int | Sequential identifier within the document |
| start_i | int \| None | Character-level start index |
| end_i | int \| None | Character-level end index |
| content_without_overlap | str | Chunk text without overlap region |
Sources: goldenverba/components/chunk.py
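Expressed as a Python dataclass (an illustrative sketch assuming these defaults, not Verba's actual class definition), the model looks like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    content: str                   # full chunk text including overlap
    chunk_id: int                  # sequential id within the document
    content_without_overlap: str   # chunk text without the overlap region
    start_i: Optional[int] = None  # character-level start index
    end_i: Optional[int] = None    # character-level end index

c = Chunk(content="hello world", chunk_id=0, content_without_overlap="hello")
print(c.chunk_id, c.start_i)  # 0 None
```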
Available Chunking Strategies
1. Token Chunker
The Token Chunker splits documents based on token counts using spaCy's tokenizer. It ensures chunks align with natural token boundaries for efficient embedding.
Key Features:
- Splits text by token count using spaCy NLP
- Configurable tokens per chunk via the `Tokens` configuration
- Configurable overlap via the `Overlap` configuration
- Calculates precise character-level indices for chunk boundaries
Configuration Options:
| Parameter | Type | Description |
|---|---|---|
| Tokens | int | Number of tokens per chunk |
| Overlap | int | Number of overlapping tokens between chunks |
Behavior:
- If `Tokens` exceeds the document length or is zero, the entire document becomes a single chunk
- If overlap exceeds tokens, overlap is clamped to `tokens - 1` with a warning
- Chunk content includes the overlap; `content_without_overlap` excludes the overlap region
Sources: goldenverba/components/chunking/TokenChunker.py
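The windowing and clamping behavior can be reproduced with a plain whitespace tokenizer. This is a simplified sketch of the algorithm described above, not the TokenChunker source; spaCy token objects are replaced by split words:

```python
def token_chunks(text, tokens, overlap):
    """Sliding-window chunking with overlap, over whitespace tokens."""
    words = text.split()  # stand-in for spaCy tokens
    if tokens <= 0 or tokens >= len(words):
        return [(text, text)]  # whole document becomes a single chunk
    if overlap >= tokens:
        overlap = tokens - 1   # clamp overlap, as described above
    chunks = []
    i = 0
    while i < len(words):
        end = min(i + tokens + overlap, len(words))
        no_overlap_end = min(i + tokens, end)
        chunks.append((
            " ".join(words[i:end]),             # content (with overlap)
            " ".join(words[i:no_overlap_end]),  # content_without_overlap
        ))
        i += tokens
    return chunks

chunks = token_chunks("a b c d e f g h", tokens=3, overlap=1)
print(chunks[0])  # ('a b c d', 'a b c')
```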
2. Sentence Chunker
The Sentence Chunker splits documents at sentence boundaries using spaCy's sentence segmentation. This preserves complete sentences within chunks.
Key Features:
- Sentence-level splitting using spaCy's `sents` property
- Configurable sentences per chunk
- Configurable overlap at sentence level
- Character-level index tracking for precise boundaries
Configuration Options:
| Parameter | Type | Description |
|---|---|---|
| Sentences | int | Number of sentences per chunk |
| Overlap | int | Number of overlapping sentences between chunks |
Behavior:
- Extracts all sentences from the document using spaCy
- Joins sentences to form chunks while preserving overlap regions
- Calculates character offsets accounting for spacing between sentences
Sources: goldenverba/components/chunking/SentenceChunker.py
3. Recursive Chunker
The Recursive Chunker uses LangChain's RecursiveCharacterTextSplitter to intelligently split text while attempting to preserve natural boundaries.
Key Features:
- Multi-level character-based splitting
- Preserves chunk boundaries at logical text breaks
- Configurable chunk size and overlap
- Uses the `keep_separator` parameter to retain separators in chunks
Sources: goldenverba/components/chunking/RecursiveChunker.py
4. Semantic Chunker
The Semantic Chunker groups content based on semantic similarity rather than fixed sizes or boundaries.
Key Features:
- Clusters sentences by semantic meaning
- Dynamically determines chunk boundaries based on content similarity
- Optimal for maintaining topical coherence
Sources: goldenverba/components/chunking/SemanticChunker.py
5. Markdown Chunker
The Markdown Chunker splits documents based on markdown header hierarchy using LangChain's MarkdownHeaderTextSplitter.
Supported Headers:
| Header Level | Syntax |
|---|---|
| Header 1 | # |
| Header 2 | ## |
| Header 3 | ### |
Key Features:
- Splits at markdown header boundaries
- Preserves header context by prepending headers to each chunk
- Uses the `get_header_values()` helper to extract header text from LangChain document metadata
- Maintains hierarchical context through header inclusion
Header Extraction Logic:
def get_header_values(split_doc: LangChainDocument) -> list[str]:
header_keys = [header_key for _, header_key in HEADERS_TO_SPLIT_ON]
return [
header_value
for header_key in header_keys
if (header_value := split_doc.metadata.get(header_key)) is not None
]
Sources: goldenverba/components/chunking/MarkdownChunker.py
6. HTML Chunker
The HTML Chunker splits documents based on HTML tag structure using LangChain's HTMLHeaderTextSplitter.
Supported Tags:
| Tag | Description |
|---|---|
| h1 | Header 1 |
| h2 | Header 2 |
| h3 | Header 3 |
| h4 | Header 4 |
Key Features:
- Splits at HTML header boundaries
- Preserves header content within each chunk
- Appends header text before page content
- Requires the `langchain_text_splitters` library
Chunk Text Construction:
chunk_text = ""
if len(chunk.metadata) > 0:
chunk_text += list(chunk.metadata.values())[0] + "\n"
chunk_text += chunk.page_content
Sources: goldenverba/components/chunking/HTMLChunker.py
7. Code Chunker
The Code Chunker is optimized for source code files, splitting based on code-specific structures and syntax.
Key Features:
- Language-aware splitting for code files
- Preserves code structure and syntax context
- Handles various programming language conventions
Sources: goldenverba/components/chunking/CodeChunker.py
8. JSON Chunker
The JSON Chunker splits JSON documents at logical structural boundaries.
Key Features:
- Splits at JSON object/array boundaries
- Preserves nested structure context
- Handles JSON-specific formatting
Sources: goldenverba/components/chunking/JSONChunker.py
Chunking Workflow
graph TD
A[Document Ingestion] --> B{Already Chunked?}
B -->|Yes| C[Skip Chunking]
B -->|No| D[Select Chunker Strategy]
D --> E{Chunking Strategy}
E -->|Token| F[TokenChunker]
E -->|Sentence| G[SentenceChunker]
E -->|Recursive| H[RecursiveChunker]
E -->|Semantic| I[SemanticChunker]
E -->|Markdown| J[MarkdownChunker]
E -->|HTML| K[HTMLChunker]
E -->|Code| L[CodeChunker]
E -->|JSON| M[JSONChunker]
F --> N[Split Document]
G --> N
H --> N
I --> N
J --> N
K --> N
L --> N
M --> N
N --> O[Create Chunk Objects]
O --> P[Append to document.chunks]
P --> Q[Return Modified Documents]
Chunking Configuration
Each chunker accepts a configuration dictionary with strategy-specific parameters:
| Chunker | Primary Config | Overlap Config | Library Dependency |
|---|---|---|---|
| Token | Tokens | Overlap | spacy |
| Sentence | Sentences | Overlap | spacy |
| Recursive | Chunk Size | Overlap | langchain_text_splitters |
| Semantic | N/A | N/A | langchain_text_splitters |
| Markdown | N/A | N/A | langchain_text_splitters |
| HTML | N/A | N/A | langchain_text_splitters |
| Code | N/A | N/A | langchain_text_splitters |
| JSON | N/A | N/A | langchain_text_splitters |
Overlap Strategy
Overlap enables context preservation between adjacent chunks, improving retrieval accuracy for queries that span chunk boundaries.
Overlap Calculation in TokenChunker:
while i < len(doc):
start_i = i
end_i = min(i + units + overlap, len(doc))
if end_i == len(doc):
overlap_start = end_i
else:
overlap_start = min(i + units, end_i)
chunk_text = doc[start_i:end_i].text
chunk_text_without_overlap = doc[start_i:overlap_start].text
Overlap Calculation in SentenceChunker:
overlap_start = max(0, end_i - overlap)
chunk_text = " ".join(sentences[start_i:end_i])
chunk_text_without_overlap = " ".join(sentences[start_i:overlap_start])
Sources: goldenverba/components/chunking/TokenChunker.py, goldenverba/components/chunking/SentenceChunker.py
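Putting the sentence-level formula into a runnable loop (plain strings stand in for spaCy sentences; boundary handling of the final chunk in the real chunker may differ from this sketch):

```python
# Illustrative sentence chunking with overlap, per the formula above
sentences = ["S1.", "S2.", "S3.", "S4.", "S5."]
size, overlap = 2, 1

chunks = []
start_i = 0
while start_i < len(sentences):
    end_i = min(start_i + size + overlap, len(sentences))
    overlap_start = max(0, end_i - overlap)
    chunks.append((
        " ".join(sentences[start_i:end_i]),          # chunk_text
        " ".join(sentences[start_i:overlap_start]),  # without overlap
    ))
    start_i += size

print(chunks[0])  # ('S1. S2. S3.', 'S1. S2.')
```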
Skip Mechanism
All chunkers implement a document skipping mechanism to prevent redundant processing:
# Skip if document already contains chunks
if len(document.chunks) > 0:
continue
This ensures idempotent chunking operations and allows manual chunk management.
Best Practices
Choosing a Chunker
| Document Type | Recommended Chunker | Reason |
|---|---|---|
| Plain text | Token, Sentence, Recursive | General-purpose splitting |
| Markdown files | Markdown | Respects header hierarchy |
| HTML documents | HTML | Preserves HTML structure |
| Source code | Code | Language-aware boundaries |
| JSON data | JSON | Structural preservation |
| Long-form content | Semantic | Topic coherence |
| Conversational data | Sentence | Natural language boundaries |
Configuration Guidelines
- Token/Sentence counts: Start with 256-512 tokens or 3-5 sentences per chunk
- Overlap: Use 10-20% overlap for most use cases
- Boundary alignment: For token-based chunking, prefer natural token boundaries over character cuts
Dependencies
| Library | Used By | Purpose |
|---|---|---|
| spacy | TokenChunker, SentenceChunker | NLP processing, tokenization, sentence segmentation |
| langchain_text_splitters | MarkdownChunker, HTMLChunker, CodeChunker, JSONChunker, RecursiveChunker, SemanticChunker | Text splitting algorithms |
Sources: goldenverba/components/chunking/TokenChunker.py, goldenverba/components/chunking/MarkdownChunker.py, goldenverba/components/chunking/HTMLChunker.py
Sources: goldenverba/components/chunking/__init__.py
Embedding and Vector Storage
Related topics: Chunking Strategies, RAG Retrieval System
Verba implements a modular RAG (Retrieval-Augmented Generation) pipeline where embeddings play a critical role in transforming textual content into vector representations for semantic search and retrieval operations.
Overview
Embedding and vector storage in Verba enables the transformation of documents and their chunks into high-dimensional vector representations. These vectors power semantic search capabilities, allowing users to retrieve relevant content based on meaning rather than exact keyword matches. The system supports multiple embedding providers and integrates with Weaviate as the primary vector database.
Architecture
graph TD
A[Document] --> B[Reader Component]
B --> C[Text Content]
C --> D[Chunker Component]
D --> E[Document Chunks]
E --> F[Embedder Component]
F --> G[Vector Embeddings]
G --> H[Weaviate Vector Store]
I[User Query] --> J[Embedder Component]
J --> K[Query Vector]
K --> H
H --> L[Similarity Search]
L --> M[Retrieved Chunks]
Embedder Components
Verba supports multiple embedder implementations, allowing users to choose the provider that best fits their requirements.
Available Embedders
| Embedder | Provider | Status |
|---|---|---|
| OpenAI Embedder | OpenAI | Default |
| Cohere Embedder | Cohere | Supported |
| Ollama Embedder | Ollama | Supported |
| SentenceTransformers Embedder | HuggingFace | Supported |
| VoyageAI Embedder | VoyageAI | Supported |
| Upstage Embedder | Upstage | Supported |
| Google Embedder | Google | Supported |
| Weaviate Embedder | Weaviate | Built-in |
Sources: CHANGELOG.md
Embedder Selection
The embedder can be configured through the RAG configuration interface:
<ComponentView
RAGConfig={RAGConfig}
component_name="Embedder"
selectComponent={selectComponent}
updateConfig={updateConfig}
saveComponentConfig={saveComponentConfig}
blocked={production == "Demo"}
/>
Sources: frontend/app/components/Chat/ChatConfig.tsx
Configuration
Environment Variables
| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | API key for OpenAI embeddings | For OpenAI |
| OPENAI_EMBED_API_KEY | Separate API key for embeddings | Optional |
| OPENAI_EMBED_BASE_URL | Custom endpoint for embeddings | Optional |
| OPENAI_CUSTOM_EMBED | Flag for custom embedding models | Optional |
| COHERE_API_KEY | API key for Cohere | For Cohere |
| VOYAGE_API_KEY | API key for VoyageAI | For VoyageAI |
| UPSTAGE_API_KEY | API key for Upstage | For Upstage |
| GOOGLE_API_KEY | API key for Google | For Google |
Sources: README.md
Custom OpenAI Embedding Configuration
For users deploying custom OpenAI-compatible embedding servers:
OPENAI_EMBED_API_KEY=YOUR_API_KEY
OPENAI_EMBED_BASE_URL=YOUR_CUSTOM_URL
OPENAI_CUSTOM_EMBED=true
Sources: README.md
Document Processing Pipeline
The embedding process is part of a larger document ingestion pipeline:
graph LR
A[File Upload] --> B[Reader]
B --> C[Text Extraction]
C --> D[Chunking]
D --> E[Chunk Processing]
E --> F[Embedding Generation]
F --> G[Vector Storage]
G --> H[Indexing]
Step 1: Document Reading
Documents are first processed by a Reader component that extracts text content based on file type. Supported formats include PDF, DOCX, TXT, CSV, XLSX, XLS, and more.
Step 2: Chunking
Extracted text is split into smaller chunks using configurable chunking strategies. The chunker can be selected and configured per document.
Step 3: Embedding Generation
Each chunk is passed through the selected embedder to generate a vector representation. The embedder transforms textual content into numerical vectors in a high-dimensional space.
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx
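Conceptually, Step 3 maps each chunk to a unit-length vector so that similarity becomes a dot product. The hashed bag-of-words "embedding" below is a toy stand-in for illustration only; Verba delegates the real work to providers such as OpenAI or SentenceTransformers:

```python
import math

def toy_embed(text, dim=8):
    """Toy deterministic 'embedding': hashed bag-of-words, unit-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))

chunk_vec = toy_embed("verba stores chunks in weaviate")
query_vec = toy_embed("where are chunks stored")
score = cosine(chunk_vec, query_vec)
print(0.0 <= score <= 1.0)  # True
```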
RAG Configuration
Verba allows per-document RAG configuration including the embedder selection:
interface RAGConfig {
"Reader": {
selected: string;
components: Record<string, ComponentConfig>;
};
"Chunker": {
selected: string;
components: Record<string, ComponentConfig>;
};
"Embedder": {
selected: string;
components: Record<string, ComponentConfig>;
};
"Retriever": {
selected: string;
components: Record<string, ComponentConfig>;
};
"Generator": {
selected: string;
components: Record<string, ComponentConfig>;
};
}
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx
Vector Storage in Weaviate
Verba uses Weaviate as its vector database. The embeddings are stored alongside metadata for efficient similarity search operations.
Collection Management
Collections in Weaviate store documents with their associated vectors:
{
"name": "collection_name",
"count": number_of_objects,
"status": "vector_status",
"shards": number_of_shards
}
Sources: frontend/app/components/Settings/InfoView.tsx
Document Structure
Documents stored with embeddings contain:
| Field | Type | Description |
|---|---|---|
title | string | Document title |
content | string | Original text content |
extension | string | File extension/type |
labels | list | User-defined labels |
source | string | Source URL or reference |
fileSize | int | Size in bytes |
metadata | dict | Additional metadata |
vectors | array | Embedding vectors |
Sources: goldenverba/components/document.py
Dependency Management
Core embedding dependencies are specified in setup.py:
install_requires=[
"weaviate-client==4.9.6",
"openpyxl==3.1.5",
"fastapi==0.111.1",
# ... additional dependencies
]
Sources: setup.py
Deployment Considerations
Local Deployment
When running Verba locally, users can select embedders that don't require API keys (such as SentenceTransformers) or configure API-based embedders with appropriate keys.
Production Deployment
In production environments:
- Embedder selection may be restricted based on deployment type
- API keys should be configured via environment variables
- Vector storage is managed through the configured Weaviate instance
Summary
Verba's embedding and vector storage system provides a flexible, provider-agnostic approach to semantic search. By supporting multiple embedding providers and integrating with Weaviate for vector storage, the system enables effective retrieval-augmented generation across various use cases. Configuration can be done per-document or at the system level, giving users fine-grained control over the RAG pipeline.
Sources: [CHANGELOG.md](https://github.com/weaviate/Verba/blob/main/CHANGELOG.md)
RAG Retrieval System
Related topics: Embedding and Vector Storage, LLM Generators and Answer Generation
Overview
The RAG (Retrieval-Augmented Generation) Retrieval System is the core query mechanism in Verba that enables semantic search across document collections. It retrieves relevant text chunks from vectorized documents stored in Weaviate and delivers them to the Generator component for answer synthesis.
In Verba's architecture, the Retrieval System operates as a pipeline component that:
- Receives user queries from the frontend chat interface
- Embeds the query using the configured Embedder
- Performs vector similarity search in Weaviate
- Returns ranked chunks to the chat interface for display and generation
Sources: frontend/app/components/Chat/ChatConfig.tsx
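The four steps reduce to a ranked nearest-neighbor lookup. A minimal sketch with precomputed vectors standing in for Weaviate (illustrative only; `retrieve` is not Verba's retriever API):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    """Rank stored chunks by cosine similarity and return the top-k contents."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["content"] for item in ranked[:k]]

# Toy in-memory "vector store" standing in for a Weaviate collection
store = [
    {"content": "chunk about cats", "vector": [1.0, 0.0, 0.0]},
    {"content": "chunk about dogs", "vector": [0.0, 1.0, 0.0]},
    {"content": "chunk about cats and dogs", "vector": [0.7, 0.7, 0.0]},
]
print(retrieve([1.0, 0.1, 0.0], store, k=2))
```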
LLM Generators and Answer Generation
Related topics: RAG Retrieval System
Overview
Verba implements a modular LLM Generator system that provides answer generation capabilities for the RAG (Retrieval-Augmented Generation) pipeline. The system supports multiple LLM providers through a common interface abstraction, allowing users to choose their preferred generation backend while maintaining consistent behavior across the application.
The generation module is part of Verba's component architecture, where each generator implements a common interface that defines methods for retrieving supported models, generating responses, and managing authentication credentials.
Architecture
Generator Interface
All LLM generators in Verba inherit from a common base class that defines the contract for generation operations. This design enables:
- Provider Agnosticism: Switch between LLM providers without changing application code
- Consistent API: All generators expose the same methods for model listing and text generation
- Credential Management: Unified handling of API keys and authentication
Supported Providers
Verba supports the following LLM providers for answer generation:
| Provider | Class Name | API Type | Cloud/Local |
|---|---|---|---|
| OpenAI | OpenAIGenerator | REST API | Cloud |
| Anthropic | AnthrophicGenerator | REST API | Cloud |
| Cohere | CohereGenerator | REST API | Cloud |
| Groq | GroqGenerator | REST API | Cloud |
| Ollama | OllamaGenerator | Local API | Local |
| Gemini | GeminiGenerator | REST API | Cloud |
| Novita AI | NovitaGenerator | REST API | Cloud |
| Upstage | UpstageGenerator | REST API | Cloud |
| Atlas Cloud | AtlasCloudGenerator | REST API | Cloud |
Generation Workflow
graph TD
A[User Query] --> B[RAG Pipeline]
B --> C[Retrieval: Find Relevant Chunks]
C --> D[Context Assembly]
D --> E[LLM Generator]
E --> F[Prompt Construction]
F --> G[API Call to LLM Provider]
G --> H[Response Parsing]
H --> I[Formatted Answer]
J[Credentials] --> E
K[System Prompt] --> F
L[Model Selection] --> G
Configuration
Environment Variables
Generation behavior can be configured through environment variables:
| Variable | Description | Required |
|---|---|---|
| OLLAMA_MODEL | Default Ollama model for local generation | For Ollama setup |
| OLLAMA_EMBED_MODEL | Default Ollama embedding model | For Ollama setup |
| OPENAI_API_KEY | API key for OpenAI provider | For OpenAI setup |
| ANTHROPIC_API_KEY | API key for Anthropic/Claude | For Anthropic setup |
Runtime Configuration
The frontend allows dynamic model selection through the ChatConfig component. When users select a deployment type, Verba fetches available models from the configured provider and presents them in the UI.
From frontend/app/components/Chat/ChatInterface.tsx:
<ChatConfig
addStatusMessage={addStatusMessage}
production={production}
RAGConfig={RAGConfig}
credentials={credentials}
setRAGConfig={setRAGConfig}
onReset={onResetConfig}
onSave={onSaveConfig}
/>
Component Interaction
Chat Pipeline
sequenceDiagram
participant User
participant Frontend
participant Backend
participant Generator
participant LLM_Provider
User->>Frontend: Submit Query
Frontend->>Backend: /api/chat with query
Backend->>Generator: Generate response
Generator->>LLM_Provider: API request
LLM_Provider-->>Generator: LLM Response
Generator-->>Backend: Formatted answer
Backend-->>Frontend: Stream response
Frontend->>User: Display answer
Message Handling
The chat interface manages different fetching states during generation:
{fetchingStatus === "CHUNKS" && "Retrieving..."}
{fetchingStatus === "RESPONSE" && "Generating..."}
Users can cancel ongoing generation through the UI, which sets the fetching status to DONE and stops further API calls.
Document-to-Answer Flow
When processing documents for RAG:
- Ingestion: Documents are parsed and chunked via Reader components
- Storage: Chunks are embedded and stored in Weaviate
- Retrieval: Relevant chunks are fetched based on query similarity
- Generation: Selected chunks are sent to the LLM generator with the user's query
From goldenverba/components/document.py:
def create_document(content: str, fileConfig: FileConfig) -> Document:
"""Create a Document object from the file content."""
return Document(
title=fileConfig.filename,
content=content,
extension=fileConfig.extension,
labels=fileConfig.labels,
source=fileConfig.source,
fileSize=fileConfig.file_size,
metadata=fileConfig.metadata,
meta={},
)
Deployment Types
Verba supports different deployment configurations that affect generation behavior:
| Deployment | Description | Generator Usage |
|---|---|---|
| Weaviate Cloud | Full cloud deployment | Cloud-based LLM APIs |
| Docker | Containerized deployment | Configurable via environment |
| Local | Development setup | Often uses Ollama |
| Custom | User-defined infrastructure | Flexible configuration |
The deployment type is selected during the initial setup through the LoginView component:
From frontend/app/components/Login/LoginView.tsx:
{production == "Local" && (
<div className="flex flex-col justify-start gap-2 w-full">
<VerbaButton
Icon={FaDatabase}
title="Weaviate"
onClick={() => setSelectedDeployment("Weaviate")}
/>
{/* Docker, Custom, Local options */}
</div>
)}
Model Selection
Dynamic Model Retrieval
Verba supports dynamic model name retrieval for OpenAI-compatible APIs based on the provided API key and URL. This allows the system to automatically discover and list available models from the configured provider.
From CHANGELOG.md:
Dynamic model name retrieval for OpenAI Generator based on OpenAI URL and API Key
Model Fallback
When automatic model detection is unavailable, Verba uses default models:
- OpenAI: `gpt-4o-mini`
- Anthropic: `claude-3-haiku-20240307`
- Ollama: configurable via `OLLAMA_MODEL`
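The fallback logic can be sketched as a simple lookup. The `pick_model` helper below is hypothetical; only the provider names and default model identifiers come from the list above:

```python
import os

# Default models per provider, per the fallback list above
DEFAULT_MODELS = {
    "OpenAI": "gpt-4o-mini",
    "Anthropic": "claude-3-haiku-20240307",
}

def pick_model(provider, detected_models=None):
    """Prefer a dynamically detected model; otherwise fall back to a default."""
    if detected_models:
        return detected_models[0]
    if provider == "Ollama":
        return os.environ.get("OLLAMA_MODEL")  # configured via environment
    return DEFAULT_MODELS.get(provider)

print(pick_model("OpenAI"))  # gpt-4o-mini
```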
Status and Error Handling
The application uses a status messenger system to communicate generation states to users:
{messages.filter((message) => {
const messageTime = new Date(message.timestamp).getTime();
const currentTime = new Date().getTime();
return currentTime - messageTime < 5000; // 5 seconds
}).map((message, index) => (
<motion.div key={index}>
{/* Status message display */}
</motion.div>
))}
Messages are categorized by type and automatically expire after 5 seconds.
Reset Capabilities
The Settings panel (InfoView) provides reset functionality that affects generation:
| Action | Scope | Effect on Generation |
|---|---|---|
| Reset Documents | Data | Clears chunks, requires re-retrieval |
| Reset Config | Configuration | Resets model selection and prompts |
| Reset Verba | All Data | Full system reset including models |
| Reset Suggestions | Autocomplete | Clears cached suggestions |
From frontend/app/components/Settings/InfoView.tsx:
<UserModalComponent
modal_id="reset-documents"
title="Reset Documents"
text="Are you sure you want to reset all documents?"
triggerAccept={resetDocuments}
triggerString="Reset"
/>
Dependencies
The generation module relies on the following core dependencies:
From setup.py:
install_requires=[
"weaviate-client==4.9.6",
"fastapi==0.111.1",
"uvicorn[standard]==0.29.0",
"click==8.1.7",
# Provider-specific SDKs loaded dynamically
]
Version History
| Version | Changes |
|---|---|
| 2.1.3 | Added OLLAMA_MODEL and OLLAMA_EMBED_MODEL environment variables |
| 2.1.2 | Added Novita Generator |
| 2.1.1 | Dynamic model retrieval for OpenAI Generator |
| 2.1.0 | Added Upstage Generator, Groq, improved configuration |
Best Practices
- API Key Security: Store API keys in `.env` files rather than hardcoding them
- Model Selection: Choose models based on task requirements (speed vs. quality)
- Chunk Configuration: Adjust chunk sizes to match model's context window
- Rate Limits: Be aware of provider-specific rate limits for high-volume usage
Source: https://github.com/weaviate/Verba / Human Manual
Embedder Configuration
Related topics: Embedding and Vector Storage
Overview
The Embedder is a core component in Verba's RAG (Retrieval-Augmented Generation) pipeline responsible for converting text content into vector embeddings. These embeddings enable semantic search capabilities by representing documents and queries as numerical vectors in a high-dimensional space.
Verba supports multiple embedder implementations including OpenAI, Ollama, Upstage, and other providers. The embedder configuration system allows users to select their preferred embedding provider, configure provider-specific settings, and integrate with the broader RAG pipeline.
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
Architecture
RAG Pipeline Integration
The Embedder operates as part of a modular RAG configuration system alongside the Retriever and Generator components.
graph TD
A[Document Input] --> B[Reader]
B --> C[Chunker]
C --> D[Embedder]
D --> E[Weaviate Vector Store]
F[User Query] --> G[Embedder]
G --> H[Retriever]
H --> I[Generator]
I --> J[Response]
E -.-> H
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
Component Selection Flow
Users interact with the embedder configuration through a component selection interface:
sequenceDiagram
participant User
participant ComponentView
participant RAGConfig
participant Embedder
User->>ComponentView: Select Embedder Component
ComponentView->>RAGConfig: updateConfig(Embedder, selection)
RAGConfig->>Embedder: Apply Configuration
Embedder-->>RAGConfig: Confirm Settings
RAGConfig-->>ComponentView: Update UI
    ComponentView-->>User: Display Updated State
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-100
Configuration UI
Embedder Selection Panel
The embedder can be configured through the ingestion interface where users select and customize embedding models for their documents.
| Field | Type | Description |
|---|---|---|
| selected | string | Currently selected embedder name |
| components | object | Available embedder implementations |
| description | string | Human-readable description of the selected embedder |
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-95
Display and Configuration
The UI displays the currently selected embedder with its configuration:
<div className="flex gap-2 justify-between items-center text-text-verba">
<p className="flex min-w-[8vw]">Embedder</p>
<label className="input flex items-center gap-2 w-full bg-bg-verba">
<input
type="text"
className="grow w-full"
value={fileMap[selectedFileData].rag_config["Embedder"].selected}
disabled={true}
/>
</label>
</div>
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-95
Dynamic Description
Each embedder component provides a description that is displayed in the UI:
<div className="flex gap-2 items-center text-text-verba">
<p className="flex min-w-[8vw]"></p>
<p className="text-sm text-text-alt-verba text-start">
{selectedFileData &&
fileMap[selectedFileData].rag_config["Embedder"].components[
fileMap[selectedFileData].rag_config["Embedder"].selected
].description}
</p>
</div>
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:97-106
RAG Configuration System
Configuration Structure
The embedder is configured within the RAGConfig object which manages all pipeline components:
interface RAGConfig {
Embedder: {
selected: string;
components: Record<string, EmbedderComponent>;
};
Generator: {
selected: string;
components: Record<string, GeneratorComponent>;
};
Retriever: {
selected: string;
components: Record<string, RetrieverComponent>;
};
}
Sources: frontend/app/components/Chat/ChatConfig.tsx:1-50
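A minimal sketch of how a selection update over this structure might look. The helper name and the immutable-update policy are assumptions for illustration; the real `selectComponent` lives in the frontend code referenced above:

```typescript
// Minimal shapes mirroring the RAGConfig interface above; component
// payloads are simplified to a description string for illustration.
type Component = { description: string };
type PipelineSlot = { selected: string; components: Record<string, Component> };
type RAGConfig = { Embedder: PipelineSlot; Generator: PipelineSlot; Retriever: PipelineSlot };

// Hypothetical immutable update: returns a new config with the chosen
// component selected, leaving the other pipeline slots untouched.
function selectComponent(config: RAGConfig, slot: keyof RAGConfig, name: string): RAGConfig {
  if (!(name in config[slot].components)) throw new Error(`unknown ${slot}: ${name}`);
  return { ...config, [slot]: { ...config[slot], selected: name } } as RAGConfig;
}
```

Returning a new object rather than mutating in place matches how React state updates are normally propagated back through `updateConfig`.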
Component View Integration
The ComponentView component renders the embedder selection UI:
<ComponentView
RAGConfig={RAGConfig}
component_name="Embedder"
selectComponent={selectComponent}
updateConfig={updateConfig}
saveComponentConfig={saveComponentConfig}
blocked={production == "Demo"}
/>
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
Supported Embedder Providers
Provider Matrix
| Provider | Environment Variables | Version Added | Features |
|---|---|---|---|
| OpenAI | OPENAI_API_KEY, OPENAI_URL | Base | Dynamic model name retrieval |
| Ollama | OLLAMA_EMBED_MODEL | 2.1.3 | Local embedding models |
| Upstage | (Upstage-specific) | 2.1.0 | High-performance embeddings |
Sources: CHANGELOG.md:1-30, setup.py:1-50
Environment Variables
Verba supports environment variables for embedder configuration:
| Variable | Description | Example |
|---|---|---|
| OLLAMA_EMBED_MODEL | Ollama embedding model name | nomic-embed-text |
| OPENAI_API_KEY | OpenAI API key | sk-... |
| OPENAI_URL | OpenAI API endpoint | https://api.openai.com/v1 |
Sources: CHANGELOG.md:1-15
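Before the pipeline starts, it is worth verifying that the variables the chosen provider needs are actually set. The check below is a hypothetical sketch; the provider-to-variable map only covers the names listed in the table above:

```typescript
// Hypothetical startup check: which required env vars are missing for a
// given embedder provider? The map below reflects the documented variables;
// real deployments may need additional provider-specific settings.
const REQUIRED_VARS: Record<string, string[]> = {
  OpenAI: ["OPENAI_API_KEY"],
  Ollama: ["OLLAMA_EMBED_MODEL"],
};

function missingEnvVars(
  provider: string,
  env: Record<string, string | undefined>,
): string[] {
  return (REQUIRED_VARS[provider] ?? []).filter((name) => !env[name]);
}
```

In a Node context the second argument would typically be `process.env`.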
Deployment Modes
Demo Mode Restrictions
In Demo mode, the embedder configuration is locked to prevent changes:
blocked={production == "Demo"}
This ensures that demo deployments maintain consistent behavior and don't allow users to modify the embedding pipeline.
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
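The same check can be expressed as a small helper. A sketch; the `Production` union of deployment modes is assumed from the navigation documentation, not confirmed by source:

```typescript
// Derive the `blocked` flag from the deployment mode string, matching the
// production == "Demo" comparison shown above.
type Production = "Local" | "Demo" | "Docker" | "Weaviate" | "Custom";

function isConfigBlocked(production: Production): boolean {
  return production === "Demo";
}
```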
Configuration Persistence
The embedder configuration is saved through the saveComponentConfig callback:
<ComponentView
RAGConfig={RAGConfig}
component_name="Embedder"
selectComponent={selectComponent}
updateConfig={updateConfig}
saveComponentConfig={saveComponentConfig}
blocked={production == "Demo"}
/>
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
Debugging Embedder Configuration
Debug Modal
Verba provides a debug view for inspecting file configuration including embedder settings:
<VerbaButton
Icon={CgDebug}
onClick={openDebugModal}
className="max-w-min"
/>
<dialog id={"File_Debug_Modal"} className="modal">
<pre className="whitespace-pre-wrap text-xs">
{selectedFileData
? (() => {
const objCopy = { ...fileMap[selectedFileData] };
objCopy.content = "File Content";
return JSON.stringify(objCopy, null, 2);
})()
: ""}
</pre>
</dialog>
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:110-130
File-Level Configuration
Per-File Embedder Override
Each file can have its own embedder configuration that overrides the global RAG config:
interface FileConfig {
filename: string;
extension: string;
labels: string[];
source: string;
file_size: number;
metadata: Record<string, any>;
rag_config: RAGConfig;
}
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-100
Accessing File Embedder Config
fileMap[selectedFileData].rag_config["Embedder"].selected
fileMap[selectedFileData].rag_config["Embedder"].components
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:85-106
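Because either the file entry or parts of its config may be missing, a defensive accessor is safer than the raw lookups above. This helper is hypothetical, not taken from the repository:

```typescript
// Safe accessor around the fileMap lookups shown above: returns the selected
// embedder name for a file, or undefined when the file or its RAG config is
// absent, instead of throwing on a missing property.
type FileConfig = { rag_config?: { Embedder?: { selected?: string } } };

function selectedEmbedder(
  fileMap: Record<string, FileConfig>,
  fileId: string | null,
): string | undefined {
  if (!fileId) return undefined;
  return fileMap[fileId]?.rag_config?.Embedder?.selected;
}
```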
Version History
| Version | Changes |
|---|---|
| 2.1.3 | Added OLLAMA_MODEL and OLLAMA_EMBED_MODEL environment variables |
| 2.1.1 | Dynamic model name retrieval for OpenAI |
| 2.1.0 | Added Upstage Embedder support |
Sources: CHANGELOG.md:1-30
Dependencies
Verba requires the Weaviate client for vector storage operations:
install_requires=[
"weaviate-client==4.9.6",
"python-dotenv==1.0.0",
"openpyxl==3.1.5",
"wasabi==1.1.2",
"fastapi==0.111.1",
"uvicorn[standard]==0.29.0",
"gunicorn==22.0.0",
"click==8.1.7",
]
Sources: setup.py:15-25
Best Practices
- Use Environment Variables: Store API keys and model names in .env files rather than hardcoding them
- Test in Non-Demo Mode: Full configuration features require a non-Demo deployment
- Leverage Debug Mode: Use the debug modal to inspect configuration state
- Check Provider Support: Ensure your chosen embedder provider is compatible with your deployment mode
Sources: frontend/app/components/Chat/ChatConfig.tsx:24-35
Frontend Component Overview
Introduction
The Verba frontend is a React-based web application that provides a user-friendly interface for Retrieval-Augmented Generation (RAG) operations. Built with Next.js and TypeScript, the frontend enables users to interact with their data through chat interfaces, document management, and configuration settings.
Sources: frontend/app/page.tsx:1-50
Architecture Overview
The frontend follows a modular component-based architecture organized by functionality. The application uses WebSocket connections for real-time communication with the backend server, enabling live status updates and streaming responses.
graph TD
A[App Entry Point] --> B[LoginView]
A --> C[Main Application]
C --> D[NavbarComponent]
C --> E[ChatInterface]
C --> F[Document Components]
C --> G[Ingestion Components]
C --> H[Settings Components]
D --> I[StatusMessenger]
E --> J[ChatMessage]
    E --> K[ChatConfig]
Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-100
Core Components
Navigation Components
#### NavbarComponent
The navigation bar serves as the primary routing mechanism within the application. It displays different pages based on the user's current selection and deployment mode.
| Prop | Type | Description |
|---|---|---|
| currentPage | string | Current active page identifier |
| setCurrentPage | function | Callback to change the active page |
| production | string | Deployment type (Local, Demo, Docker, etc.) |
| gitHubStars | string | GitHub star count display |
The NavbarComponent conditionally renders menu items based on the production environment. In non-Demo modes, additional options like "Import Data" and "Settings" become available.
Sources: frontend/app/components/Navigation/NavbarComponent.tsx:1-150
#### StatusMessengerComponent
The StatusMessengerComponent provides toast-style notifications for application events. Messages are filtered to display only those within the last 5 seconds, providing transient user feedback.
graph LR
A[Message Event] --> B[Timestamp Check]
B --> C{Within 5s?}
C -->|Yes| D[Animate In]
C -->|No| E[Filter Out]
D --> F[Display Message]
    F --> G[Animate Out]
Messages are color-coded by type using a getMessageColor() function and include icons for visual identification.
Sources: frontend/app/components/Navigation/StatusMessenger.tsx:1-80
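The 5-second filtering rule can be sketched as a pure function (the `StatusMessage` shape and the `windowMs` parameter name are assumptions):

```typescript
// Keep only messages whose timestamp falls within the last windowMs,
// mirroring the 5-second display window described above.
type StatusMessage = { text: string; timestamp: number };

function recentMessages(
  messages: StatusMessage[],
  now: number,
  windowMs = 5000,
): StatusMessage[] {
  return messages.filter((m) => now - m.timestamp <= windowMs);
}
```

The component would re-run this filter on an interval so stale toasts drop out of the render list.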
Chat Components
#### ChatInterface
The ChatInterface is the central component for RAG interactions. It manages chat messages, user input, and the configuration panel.
| State Variable | Type | Purpose |
|---|---|---|
| messages | Message[] | Array of chat messages |
| previewText | string | Streaming response preview |
| isFetching | RefObject | Fetching status indicator |
| selectedSetting | string | Active sub-panel (Chat/Config) |
| fetchingStatus | string | Current operation status |
The component supports two sub-views: the chat view for conversation and a configuration view for RAG settings. A cancel button allows users to interrupt ongoing operations.
Sources: frontend/app/components/Chat/ChatInterface.tsx:1-150
#### ChatMessage
The ChatMessage component renders individual messages with support for multiple message types and syntax highlighting for code blocks.
| Message Type | Styling | Features |
|---|---|---|
| user | Right-aligned, primary background | Plain text display |
| system | Left-aligned, alternate background | Markdown + syntax highlighting |
| error | Warning background color | Error notifications |
| retrieval | Standard background | Retrieval results |
For system messages containing code, the component uses react-syntax-highlighter with theme support for both light and dark modes.
<SyntaxHighlighter
style={selectedTheme.theme === "dark" ? oneDark : oneLight}
language={match[1]}
PreTag="div"
>
{String(children).replace(/\n$/, "")}
</SyntaxHighlighter>
Sources: frontend/app/components/Chat/ChatMessage.tsx:1-100
Document Components
#### ContentView
The ContentView component displays document content with pagination support for both chunks and pages.
| Feature | Description |
|---|---|
| Chunk Navigation | Previous/Next chunk buttons |
| Page Navigation | Page-based content display |
| Scroll Handling | Overflow auto-scroll for long content |
| Label Display | Truncated label badges with max-width constraints |
The component conditionally renders navigation text based on whether chunk scores are available, switching between "Chunk" and "Page" labels.
Sources: frontend/app/components/Document/ContentView.tsx:1-100
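The Chunk/Page label switch described above reduces to a small predicate. A hedged sketch, assuming chunk scores arrive as an optional array:

```typescript
// Show "Chunk" navigation when chunk scores are available (i.e. the view
// was opened from a retrieval result), otherwise fall back to "Page".
function navigationLabel(chunkScores: number[] | null): "Chunk" | "Page" {
  return chunkScores && chunkScores.length > 0 ? "Chunk" : "Page";
}
```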
Ingestion Components
#### BasicSettingView
The BasicSettingView provides configuration options for document ingestion, including source links and labels.
| Field | Purpose | Constraints |
|---|---|---|
| source | Reference link to the original document | Optional field |
| label | Document categorization labels | Enter key adds a label |
| chunker | Text chunking strategy | Read-only display |
| embedder | Embedding model selection | Read-only display |
Labels are added via keyboard (Enter key) or button click, with each label rendered as a removable badge.
Sources: frontend/app/components/Ingestion/BasicSettingView.tsx:1-150
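The Enter-to-add behavior can be modeled as a pure function that the actual React keydown handler delegates to. The duplicate-skipping and whitespace-trimming policy here is an assumption, not confirmed by the source:

```typescript
// Append a label to the list if it is non-empty after trimming and not
// already present; otherwise return the list unchanged.
function addLabel(labels: string[], input: string): string[] {
  const trimmed = input.trim();
  if (!trimmed || labels.includes(trimmed)) return labels;
  return [...labels, trimmed];
}
```

The keydown handler would call this when `event.key === "Enter"`, then clear the input field.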
#### FileSelectionView
The FileSelectionView displays a list of selected files and provides import functionality.
Files are rendered as FileComponent instances, each with delete and selection capabilities. The import footer contains action buttons that become available when the WebSocket connection is online.
graph TD
A[FileSelectionView] --> B[FileComponent List]
B --> C{User Action}
C --> D[Delete File]
C --> E[Select File]
C --> F[Import Selected]
F --> G{WebSocket Online?}
G -->|Yes| H[Show Import Button]
    G -->|No| I[Hide Import Button]
Sources: frontend/app/components/Ingestion/FileSelectionView.tsx:1-100
Settings Components
#### InfoView
The InfoView component displays system information and provides reset functionality for various aspects of the application.
| Section | Information Displayed |
|---|---|
| Weaviate Cluster | Name, status, shard count |
| Collections | Collection count, names, object counts |
| Reset Options | Documents, Config, Verba, Suggestions |
Reset operations are protected by modal confirmations using the UserModalComponent, requiring explicit user confirmation before executing destructive actions.
Sources: frontend/app/components/Settings/InfoView.tsx:1-100
Login Components
#### LoginView
The LoginView handles initial deployment type selection for the application.
| Deployment Option | Icon | Description |
|---|---|---|
| Weaviate | FaDatabase | Weaviate Cloud deployment |
| Docker | FaDocker | Docker container deployment |
| Custom | TbDatabaseEdit | Custom backend connection |
| Local | FaLaptopCode | Local development mode |
The component manages connection states and displays loading indicators during authentication attempts.
Sources: frontend/app/components/Login/LoginView.tsx:1-100
Application Entry Point
The main page component (frontend/app/page.tsx) orchestrates the overall application flow, handling:
- Environment detection and configuration
- WebSocket connection management
- Theme persistence
- Page routing based on deployment status
graph TD
A[Page Load] --> B{Production Mode?}
B -->|Demo| C[Direct to Main]
B -->|Local| D[LoginView]
D --> E{Deployment Selected}
E -->|Weaviate| F[Configure Weaviate]
E -->|Docker| G[Connect to Docker]
E -->|Custom| H[Configure Custom]
E -->|Local| I[Full Setup]
F --> J[Main Application]
G --> J
H --> J
I --> J
    C --> J
The footer displays a "Built with ♥ and Weaviate" message, confirming the project's association with Weaviate.
Sources: frontend/app/page.tsx:1-80
Theme System
The application supports both light and dark themes, with theme preferences persisted across sessions. Theme values are passed down to components that require styling adjustments, such as ChatMessage for syntax highlighting.
// Theme-dependent syntax highlighting
const codeStyle = selectedTheme.theme === "dark" ? oneDark : oneLight;
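Theme persistence across sessions presumably writes the preference to browser storage. A sketch under that assumption, with a storage interface standing in for `window.localStorage` so the logic stays testable outside a browser; the key name is hypothetical:

```typescript
// Minimal subset of the Web Storage API.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const THEME_KEY = "verba-theme"; // hypothetical key name

// Load the saved theme, defaulting to "light" when nothing is stored.
function loadTheme(store: KVStore): "light" | "dark" {
  return store.getItem(THEME_KEY) === "dark" ? "dark" : "light";
}

function saveTheme(store: KVStore, theme: "light" | "dark"): void {
  store.setItem(THEME_KEY, theme);
}
```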
Component Communication
Components communicate through several mechanisms:
- Props Drilling: Parent components pass state and callbacks to children via props
- Ref Objects: Used for mutable values that don't trigger re-renders (e.g., isFetching)
- WebSocket Events: Real-time updates from the backend server
- State Callbacks: Functions like setCurrentPage for navigation state
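The ref-object mechanism in particular can be illustrated standalone. React's `useRef` returns the same `{ current }` shape; this plain version mirrors it for illustration only:

```typescript
// A mutable holder: reassigning .current does not create a new object, so
// in React it would not cause a re-render the way setState does.
type Ref<T> = { current: T };

function createRef<T>(initial: T): Ref<T> {
  return { current: initial };
}

const isFetching = createRef(false);
isFetching.current = true;  // set while a request is in flight
isFetching.current = false; // cleared on completion or cancellation
```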
State Management Summary
| Component Area | Primary State | Secondary State |
|---|---|---|
| Chat | messages, previewText | fetchingStatus, selectedSetting |
| Documents | content, chunkScores | page, selectedDocument |
| Ingestion | fileMap, selectedFileData | source, label |
| Navigation | currentPage | socketOnline, production |
| Settings | collectionPayload | clusterPayload, credentials |
Styling Conventions
The frontend uses Tailwind CSS with custom color variables following the -verba suffix convention:
| Color Variable | Usage |
|---|---|
| bg-bg-verba | Primary background |
| bg-bg-alt-verba | Alternate background |
| text-text-verba | Primary text |
| text-text-alt-verba | Alternate text |
| bg-button-verba | Button backgrounds |
| hover:bg-button-hover-verba | Button hover states |
This systematic naming ensures consistent theming throughout the application.
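One way such variables could be wired up is a Tailwind theme extension backed by CSS custom properties, which lets the light/dark theme system swap values at runtime. This fragment is hypothetical; the repository's actual tailwind.config may differ:

```typescript
// Hypothetical tailwind.config.ts fragment: declare -verba color names that
// resolve to CSS variables, so classes like bg-bg-verba follow the active theme.
const themeExtension = {
  colors: {
    "bg-verba": "var(--bg-verba)",
    "bg-alt-verba": "var(--bg-alt-verba)",
    "text-verba": "var(--text-verba)",
    "text-alt-verba": "var(--text-alt-verba)",
    "button-verba": "var(--button-verba)",
    "button-hover-verba": "var(--button-hover-verba)",
  },
};
```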
Key Features Summary
| Feature | Components | Description |
|---|---|---|
| Chat Interface | ChatInterface, ChatMessage | RAG conversation with syntax highlighting |
| Document Explorer | ContentView, DocumentExplorer | View and navigate documents/chunks |
| Data Import | FileSelectionView, BasicSettingView | File upload and configuration |
| System Settings | InfoView, SettingsView | System info and reset options |
| Real-time Updates | StatusMessenger | Toast notifications for events |
| Navigation | NavbarComponent | Page routing and menu |
Sources: [frontend/app/page.tsx:1-50]()
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 14 source-linked risk signals. Review them before installing or handing real data to the project.
1. Installation risk: 1.0.1 Beautiful Verba
- Severity: medium
- Finding: Installation risk is backed by a source signal: 1.0.1 Beautiful Verba. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/1.0.0
2. Installation risk: v0.4.0
- Severity: medium
- Finding: Installation risk is backed by a source signal: v0.4.0. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.4.0
3. Installation risk: v1.0.3
- Severity: medium
- Finding: Installation risk is backed by a source signal: v1.0.3. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v1.0.3
4. Installation risk: v2.1.0
- Severity: medium
- Finding: Installation risk is backed by a source signal: v2.1.0. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v2.1
5. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:672002598 | https://github.com/weaviate/Verba | README/documentation is current enough for a first validation pass.
6. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:672002598 | https://github.com/weaviate/Verba | last_activity_observed missing
7. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:672002598 | https://github.com/weaviate/Verba | no_demo; severity=medium
8. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.
- Severity: medium
- Finding: No sandbox install has been executed yet; downstream must verify before user use.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.safety_notes | github_repo:672002598 | https://github.com/weaviate/Verba | No sandbox install has been executed yet; downstream must verify before user use.
9. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:672002598 | https://github.com/weaviate/Verba | no_demo; severity=medium
10. Security or permission risk: v0.3.0
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: v0.3.0. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.3.0
11. Security or permission risk: v0.3.1
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: v0.3.1. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/0.3.1
12. Security or permission risk: v2.1.2
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: v2.1.2. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/weaviate/Verba/releases/tag/v2.1.2
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using Verba with real data or production workflows.
- v2.1.2 - github / github_release
- v2.1.0 - github / github_release
- v1.0.3 - github / github_release
- 1.0.1 Beautiful Verba - github / github_release
- v0.4.0 - github / github_release
- v0.3.1 - github / github_release
- v0.3.0 - github / github_release
- README/documentation is current enough for a first validation pass. - GitHub / issue
Source: Project Pack community evidence and pitfall evidence