Doramagic Project Pack · Human Manual

weaviate

Introduction to Weaviate

Related topics: System Architecture, Getting Started

Weaviate is an open-source, cloud-native vector database written in Go that stores both objects and vectors, enabling high-performance semantic search at scale. It combines vector similarity search with keyword filtering, retrieval-augmented generation (RAG), and reranking capabilities in a single query interface.

Source: README.md

Overview

Weaviate is designed to power AI-native applications by providing:

  • Vector storage and search: Store and query vector embeddings at scale
  • Hybrid search: Combine semantic (vector) and keyword (BM25) search
  • Built-in RAG: Integrate generative AI directly into search workflows
  • Flexible deployment: Run on-premises, in the cloud, or use Weaviate Cloud
  • Multi-tenancy: Support for namespaced data with tenant isolation

Source: README.md

Core Features

Fast Search Performance

Weaviate performs complex semantic searches over billions of vectors in milliseconds. Built entirely in Go, the architecture ensures high responsiveness and reliability under heavy load. Weaviate uses HNSW (Hierarchical Navigable Small World) graphs as its primary ANN (Approximate Nearest Neighbor) algorithm.

Source: README.md

Flexible Vectorization

Weaviate supports two approaches to vector storage:

| Approach | Description |
| --- | --- |
| Automatic Vectorization | Vectorize data at import time using integrated vectorizers from OpenAI, Cohere, HuggingFace, Google, and more |
| Bring Your Own Vectors | Import pre-generated vector embeddings directly |

Source: README.md

Advanced Hybrid and Image Search

The platform combines multiple search paradigms:

  • Semantic Search: Vector-based similarity search by meaning
  • BM25 Keyword Search: Traditional keyword-based retrieval
  • Image Search: Search using images as queries
  • Advanced Filtering: Filter results using metadata and properties

Source: README.md

Integrated RAG and Reranking

Go beyond simple retrieval with built-in generative search and reranking capabilities. Weaviate also supports late interaction models (like ColBERT) for enhanced retrieval accuracy.

Source: README.md

Architecture Overview

Weaviate's architecture is built for scalability and reliability, with several key internal components:

graph TD
    A[Client Applications] --> B[REST API / gRPC API]
    A --> C[GraphQL API]
    B --> D[Query Engine]
    C --> D
    D --> E[HNSW Vector Index]
    D --> F[BM25 Inverted Index]
    D --> G[Object Store]
    E --> H[Vector Cache]
    F --> I[Segment Store]
    G --> I
    H --> J[Memory / Disk]
    I --> J

Key Components

| Component | Purpose |
| --- | --- |
| REST/gRPC API | Primary interface for data operations and administration |
| GraphQL API | Flexible query interface for complex data retrieval |
| Query Engine | Orchestrates search across vector and keyword indexes |
| HNSW Index | Approximate nearest neighbor search for vectors |
| BM25 Index | Full-text keyword search capability |
| Object Store | Persistent storage for objects and metadata |
| Vector Cache | In-memory caching for frequently accessed vectors |

Source: README.md

Search Modes

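Weaviate's runtime configuration defines a collection retrieval strategy, which determines whether a lookup is served by the cluster leader or from the local node:

| Strategy | Description |
| --- | --- |
| LeaderOnly | All queries are routed to the cluster leader |
| LocalOnly | Queries are served from local data only |
| LeaderOnMismatch | Queries are routed to the leader only when local results don't meet criteria |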

Source: usecases/config/runtime/collection_retrieval_strategy.go:1-15
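
The three strategies in the table above can be pictured as a small dispatch function. This is a minimal sketch, not Weaviate's implementation: the function name `choose_source` is hypothetical, and a version comparison is assumed as the "mismatch" criterion because the table does not say what the criterion is.

```python
from enum import Enum


class RetrievalStrategy(Enum):
    LEADER_ONLY = "LeaderOnly"
    LOCAL_ONLY = "LocalOnly"
    LEADER_ON_MISMATCH = "LeaderOnMismatch"


def choose_source(strategy, local_version, required_version):
    """Return "leader" or "local" for a lookup (hypothetical helper).

    Assumption: a "mismatch" means the local copy is older than the
    version the request requires.
    """
    if strategy is RetrievalStrategy.LEADER_ONLY:
        return "leader"
    if strategy is RetrievalStrategy.LOCAL_ONLY:
        return "local"
    # LeaderOnMismatch: stay local unless the local copy is behind.
    return "local" if local_version >= required_version else "leader"


print(choose_source(RetrievalStrategy.LEADER_ON_MISMATCH, 4, 5))
```

The trade-off this illustrates is latency versus freshness: local reads avoid a network hop, while leader reads avoid acting on stale data.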

Client Libraries and APIs

Weaviate provides official client libraries for multiple programming languages:

| Language | Documentation |
| --- | --- |
| Python | docs.weaviate.io/weaviate/client-libraries/python |
| JavaScript/TypeScript | docs.weaviate.io/weaviate/client-libraries/typescript |
| Java | docs.weaviate.io/weaviate/client-libraries/java |
| Go | docs.weaviate.io/weaviate/client-libraries/go |
| C#/.NET | docs.weaviate.io/weaviate/client-libraries/csharp |

API Interfaces

Weaviate exposes three API interfaces for communication:

  1. REST API: Standard HTTP interface for CRUD operations
  2. gRPC API: High-performance protocol buffer-based interface
  3. GraphQL API: Flexible query language for complex data retrieval

Source: README.md

Modules and Extensions

Weaviate includes a modular architecture that supports extension through modules:

Sum Transformers Module

The sum-transformers module provides summarization capabilities, allowing Weaviate to generate summaries of text properties during import or query time.

graph LR
    A[Text Input] --> B[Sum Transformers Module]
    B --> C[Summary Result]

The module client provides:

  • GetSummary: Generate summaries for text properties
  • MetaInfo: Retrieve model metadata from the transformer service

Source: modules/sum-transformers/client/client.go:1-50

Model Providers

Weaviate integrates with numerous embedding model providers:

| Provider | Description |
| --- | --- |
| OpenAI | GPT and embedding models |
| Cohere | Command and embedding models |
| HuggingFace | Open-source models |
| Google | Vertex AI and PaLM models |
| Model2Vec | Lightweight embeddings |

Source: README.md

Deployment Options

Weaviate supports flexible deployment models:

graph TD
    A[Weaviate Deployment] --> B[Self-Hosted]
    A --> C[Weaviate Cloud]
    B --> D[On-Premises]
    B --> E[Cloud VMs]
    B --> F[Kubernetes]
    C --> G[Weaviate Cloud Services]
    C --> H[Multi-tenant Clusters]

Integrations

Weaviate integrates with external services across multiple categories:

| Category | Examples |
| --- | --- |
| Cloud Hyperscalers | AWS, Google Cloud |
| Compute Infrastructure | Modal, Replicate, Replicated |
| Data Platforms | Airbyte, Databricks, Confluent |
| LLM and Agent Frameworks | LangChain, LlamaIndex, CrewAI |

Source: README.md

Configuration and Telemetry

Runtime Configuration

Weaviate uses a dynamic configuration system that can be reloaded at runtime without service interruption. Configuration changes are monitored and applied automatically.

Source: usecases/config/runtime/manager_test.go:1-50

Feature Flags

The system supports feature flags through LaunchDarkly integration, enabling:

  • Gradual rollouts of new features
  • A/B testing of capabilities
  • Dynamic configuration changes

Source: usecases/config/runtime/launch_darkly.go:1-40

Telemetry Dashboard

A local telemetry dashboard tool is available for monitoring Weaviate instances, providing:

  • Real-time telemetry reception
  • Machine statistics tracking
  • Client usage monitoring (Python, JavaScript, Go, C#, Java)
  • Module usage information
  • Object and collection counts

Source: tools/telemetry-dashboard/README.md

Common Use Cases

Weaviate is designed for a variety of AI-powered applications:

| Use Case | Description |
| --- | --- |
| RAG Systems | Retrieval-augmented generation for LLMs |
| Semantic Search | Meaning-based search across documents |
| Image Search | Search using images as queries |
| Recommendation Engines | Personalized recommendations based on similarity |
| Chatbots | Context-aware conversational AI |
| Content Classification | Automated categorization of content |

Source: README.md

Community Resources

Recipes and Examples

The community maintains extensive code samples:

| Application | Description |
| --- | --- |
| Elysia | Decision tree-based agentic system |
| Verba | Open-source RAG application |
| Healthsearch | Supplement product search demo |
| Awesome-Moviate | Movie recommendation engine |

Source: README.md

Contributing

Weaviate welcomes community contributions. Key resources for contributors:

Source: README.md

For more information, refer to:

Source: https://github.com/weaviate/weaviate / Human Manual

Getting Started

Related topics: Introduction to Weaviate, REST and gRPC API Layer


Weaviate is an open-source, cloud-native vector database that stores both objects and vectors, enabling semantic search at scale. This guide provides the essential steps to install, configure, and run your first Weaviate instance.

Overview

Weaviate combines vector similarity search with keyword filtering, retrieval-augmented generation (RAG), and reranking in a single query interface. It is built in Go for performance and reliability.

Key Capabilities:

| Capability | Description |
| --- | --- |
| Vector Storage | Store and index high-dimensional vectors for semantic search |
| Hybrid Search | Combine BM25 keyword search with vector similarity |
| RAG Integration | Built-in generative search with LLM modules |
| Filtering | Apply metadata filters to refine search results |
| Multi-tenancy | Support for isolated tenant data |
| Replication | Horizontal scaling across nodes, with RAFT-based consensus for cluster coordination |

Source: README.md:1-15

Installation Methods

Weaviate can be deployed using Docker Compose for local development or Kubernetes for production environments.

Docker Compose (Local Development)

The fastest way to get started is using the official Docker Compose configuration:

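# docker-compose.yml
version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - "8080:8080"    # REST and GraphQL
      - "50051:50051"  # gRPC (needed by clients that query over gRPC)
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      # text2vec-contextionary relies on a separate contextionary inference
      # service that is not shown in this excerpt.
      ENABLE_MODULES: 'text2vec-contextionary'
      CLUSTER_HOSTNAME: 'node1'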

Source: docker-compose.yml:1-12

Starting Weaviate

# Start in background
docker compose up -d

# Verify running
docker compose ps

# View logs
docker compose logs -f weaviate

Source: docker-compose.yml:1-30

Client Libraries

Weaviate provides official client libraries for multiple programming languages:

| Language | Package | Documentation |
| --- | --- | --- |
| Python | pip install weaviate-client | docs.weaviate.io |
| JavaScript/TypeScript | npm install weaviate-client | docs.weaviate.io |
| Go | go get github.com/weaviate/weaviate-go-client/v4 | docs.weaviate.io |
| Java | Maven dependency available | docs.weaviate.io |
| C#/.NET | NuGet package available | docs.weaviate.io |

Source: README.md:89-97

Python Quick Start Example

import weaviate
from weaviate.classes.config import Configure, DataType, Property

# Connect to local Weaviate (HTTP on 8080, gRPC on 50051)
client = weaviate.connect_to_local()

# Define collection schema
articles = client.collections.create(
    name="Article",
    vectorizer_config=[
        Configure.NamedVectors.text2vec_contextionary(
            name="title_vector"
        )
    ],
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
    ],
)

# Add objects
articles.data.insert(
    properties={
        "title": "Getting Started with Weaviate",
        "content": "Weaviate is a vector database..."
    }
)

# Perform semantic search
results = articles.query.near_text(query="Search objects by meaning", limit=1)
print(results.objects[0])

# Release the HTTP and gRPC connections
client.close()

Source: README.md:58-88

Go Client Example

The Go client provides type-safe access to Weaviate:

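package main

import (
    "context"
    "fmt"

    "github.com/weaviate/weaviate-go-client/v4/weaviate"
    "github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
)

func main() {
    cfg := weaviate.Config{
        Host:   "localhost:8080",
        Scheme: "http",
    }
    client, err := weaviate.NewClient(cfg)
    if err != nil {
        panic(err)
    }

    ctx := context.Background()

    // Build the nearText argument for a semantic query
    nearText := client.GraphQL().NearTextArgBuilder().
        WithConcepts([]string{"Getting started"})

    // Query example: return the title of the 10 closest articles
    result, err := client.GraphQL().Get().
        WithClassName("Article").
        WithFields(graphql.Field{Name: "title"}).
        WithNearText(nearText).
        WithLimit(10).
        Do(ctx)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%v\n", result)
}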

Source: client/weaviate_client.go:1-50

API Interfaces

Weaviate exposes three API interfaces for communication:

| API Type | Default Port | Use Case |
| --- | --- | --- |
| REST API | 8080 | CRUD operations, configuration |
| gRPC API | 50051 | High-performance queries |
| GraphQL API | 8080 | Complex queries and aggregations |

Source: README.md:100-102

REST API Endpoints

Common REST endpoints for getting started:

POST   /v1/objects          # Create object
GET    /v1/objects/{id}     # Retrieve object
PUT    /v1/objects/{id}     # Update object
DELETE /v1/objects/{id}     # Delete object
POST   /v1/batch/objects    # Batch import
GET    /v1/schema           # Get schema
POST   /v1/schema           # Create schema
GET    /v1/.well-known/ready # Health check
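
As an illustration of how these endpoints are addressed, the sketch below builds (but does not send) two requests with Python's standard library. The base URL assumes the port mapping from the Docker Compose example, `build_request` is a hypothetical helper, and the object body uses the `class` and `properties` fields of the REST object format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: local instance from Docker Compose


def build_request(method, path, body=None):
    """Create an HTTP request for a Weaviate REST path (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Readiness probe and a single-object create request
ready = build_request("GET", "/v1/.well-known/ready")
create = build_request(
    "POST",
    "/v1/objects",
    {"class": "Article", "properties": {"title": "Hello Weaviate"}},
)

print(ready.get_method(), ready.full_url)
print(create.get_method(), create.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running instance; the sketch stops short of that so it can be read without a server.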

GraphQL Queries

{
  Get {
    Article(
      nearText: { concepts: ["vector search"] }
      limit: 10
    ) {
      title
      content
      _additional {
        certainty
        vector
      }
    }
  }
}
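
A query like the one above is typically sent as a JSON payload with a `query` field to the GraphQL endpoint (`/v1/graphql`). The sketch below only assembles that payload; `near_text_query` is a hypothetical helper, and the collection and field names are taken from the example.

```python
import json


def near_text_query(collection, concepts, fields, limit=10):
    """Assemble a Get query with a nearText argument (hypothetical helper)."""
    concept_list = json.dumps(concepts)  # a JSON string array is valid GraphQL list syntax
    field_block = " ".join(fields)
    return "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }" % (
        collection,
        concept_list,
        limit,
        field_block,
    )


query = near_text_query("Article", ["vector search"], ["title", "content"])
payload = json.dumps({"query": query})
print(payload)
```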

Collection Schema

Collections define the data structure in Weaviate:

from weaviate.classes.config import Configure, DataType, Property

# Assumes `client` is the connected client from the quick start example
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_contextionary(
        vectorize_collection_name=True
    ),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            vectorize_property_name=True
        ),
        Property(
            name="content",
            data_type=DataType.TEXT
        ),
        Property(
            name="category",
            data_type=DataType.TEXT,
            index_filterable=True  # build a filter index for this property
        )
    ]
)

Source: README.md:40-57

Vector Index Configuration

Weaviate uses HNSW (Hierarchical Navigable Small World) as the default vector index:

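| Parameter | Default | Description |
| --- | --- | --- |
| efConstruction | 128 | Size of the candidate list while building the graph |
| maxConnections | 32 | Maximum connections per node |
| ef | -1 (dynamic) | Size of the candidate list at query time |
| vectorCacheMaxObjects | 1e12 | Maximum number of vectors held in the in-memory cache |

Defaults reflect recent releases and can differ in older versions.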

Source: adapters/repos/db/vector/hnsw/vector_index.go:1-50
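
To give a feel for what `maxConnections` and `ef` control, here is a deliberately simplified, single-layer sketch of graph-based nearest neighbor search. It is not Weaviate's HNSW implementation: real HNSW builds multiple layers incrementally, whereas this sketch links neighbors by brute force. All function names are hypothetical.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(vectors, max_connections):
    """Link each node to its nearest neighbours (brute force, single layer)."""
    graph = {}
    for i, vec in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: l2(vec, vectors[j]))
        graph[i] = others[:max_connections]
    return graph


def search(vectors, graph, query, entry, ef, k):
    """Best-first search keeping at most `ef` results, then return the top k."""
    start = l2(query, vectors[entry])
    visited = {entry}
    candidates = [(start, entry)]  # min-heap: closest unexplored node first
    best = [(-start, entry)]       # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break  # no remaining candidate can improve the result set
        for neighbour in graph[node]:
            if neighbour in visited:
                continue
            visited.add(neighbour)
            d = l2(query, vectors[neighbour])
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbour))
                heapq.heappush(best, (-d, neighbour))
                if len(best) > ef:
                    heapq.heappop(best)
    ranked = sorted((-neg, node) for neg, node in best)
    return [node for _, node in ranked[:k]]


random.seed(7)
points = [[random.random(), random.random()] for _ in range(50)]
graph = build_graph(points, max_connections=8)
print(search(points, graph, query=[0.5, 0.5], entry=0, ef=16, k=3))
```

A larger `ef` explores more of the graph and tends to improve recall at the cost of latency; more connections per node make the graph easier to traverse but increase memory use.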

Data Flow

graph TD
    A[Client Application] -->|REST/gRPC| B[Weaviate Server]
    B --> C{Query Type}
    C -->|Vector Search| D[HNSW Index]
    C -->|BM25 Search| E[Inverted Index]
    C -->|Hybrid Search| F[Combine Results]
    D --> G[Object Store]
    E --> G
    F --> G
    G --> H[Response to Client]

Configuration Options

Key environment variables for initial setup:

| Variable | Default | Description |
| --- | --- | --- |
| HOST | 0.0.0.0 | Server bind address |
| PORT | 8080 | HTTP port |
| PERSISTENCE_DATA_PATH | /var/lib/weaviate | Data storage path |
| AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED | false | Enable anonymous access |
| ENABLE_MODULES | - | Enabled module list |
| QUERY_DEFAULTS_LIMIT | 25 | Default result limit |

Source: docker-compose.yml:5-12
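
The table above can be read as "environment variable, falling back to a default". The sketch below mirrors that lookup in Python for illustration only; `load_settings` is a hypothetical helper, the variable names and defaults are copied from the table, and a comma-separated `ENABLE_MODULES` value is an assumption.

```python
import os

# Defaults copied from the table above
DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": "8080",
    "PERSISTENCE_DATA_PATH": "/var/lib/weaviate",
    "AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED": "false",
    "ENABLE_MODULES": "",
    "QUERY_DEFAULTS_LIMIT": "25",
}


def load_settings(env=None):
    """Resolve settings from an environment mapping, applying the defaults."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "host": raw["HOST"],
        "port": int(raw["PORT"]),
        "data_path": raw["PERSISTENCE_DATA_PATH"],
        "anonymous_access": raw["AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED"].strip().lower() == "true",
        # Assumption: modules are given as a comma-separated list
        "modules": [m.strip() for m in raw["ENABLE_MODULES"].split(",") if m.strip()],
        "query_limit": int(raw["QUERY_DEFAULTS_LIMIT"]),
    }


print(load_settings({"ENABLE_MODULES": "text2vec-contextionary"}))
```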

After setting up your collection with data, perform your first search:

from weaviate.classes.query import Filter

# Assumes `client` is the connected client from the quick start example

# Semantic search
results = client.collections.get("Article").query.near_text(
    query="What is Weaviate?",
    limit=5
)

# Hybrid search (vector + keyword)
results = client.collections.get("Article").query.hybrid(
    query="vector database",
    limit=5,
    alpha=0.5  # Balance between vector (1.0) and keyword (0.0)
)

# With metadata filter
results = client.collections.get("Article").query.near_text(
    query="machine learning",
    limit=5,
    filters=Filter.by_property("category").equal("tutorial")
)
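
To make the role of `alpha` concrete, the sketch below fuses two score lists with a weighted sum after min-max normalization. It is a simplified illustration under stated assumptions, not Weaviate's fusion code; the function names and the sample scores are made up.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Weight vector scores by alpha and keyword scores by (1 - alpha)."""
    vec = normalize(vector_scores)
    kw = normalize(keyword_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only
vector_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
keyword_scores = {"b": 12.0, "c": 3.0}
print(hybrid_scores(vector_scores, keyword_scores, alpha=0.5))
```

With `alpha=1.0` only the vector scores matter, with `alpha=0.0` only the keyword scores matter, and values in between blend the two rankings.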

Next Steps

Community Resources

Access the Weaviate community for support:


System Architecture

Related topics: Cluster and RAFT Consensus, Vector Indexes (HNSW and HFresh), LSMKV Storage Engine


Weaviate is an open-source, cloud-native vector database written in Go that stores both objects and vectors, enabling semantic search at scale. The architecture combines vector similarity search with keyword filtering, retrieval-augmented generation (RAG), and reranking in a single query interface. Source: README.md

Overview

Weaviate's architecture follows a layered design pattern with clear separation between handlers, use cases, and repositories. The system is built entirely in Go for performance and reliability, ensuring highly responsive AI applications even under heavy load.

graph TD
    subgraph "Client Layer"
        Python[Python Client]
        TypeScript[TypeScript Client]
        Java[Java Client]
        Go[Go Client]
        CSharp[C# Client]
    end
    
    subgraph "API Layer"
        REST[REST API<br/>:8080]
        gRPC[gRPC API<br/>:50051]
        GraphQL[GraphQL API]
    end
    
    subgraph "Handler Layer"
        RESTHandler[REST Handlers]
        GRPCHandler[gRPC Handlers]
        ClusterHandler[Cluster Handlers]
    end
    
    subgraph "Use Case Layer"
        Objects[Objects Manager]
        Schema[Schema Manager]
        Search[Search Use Cases]
    end
    
    subgraph "Repository Layer"
        DBRepo[DB Repository]
        VectorIndex[Vector Index<br/>HNSW]
        Modules[Module Clients]
    end
    
    Python --> REST
    Python --> gRPC
    TypeScript --> REST
    Java --> REST
    Go --> REST
    Go --> gRPC
    REST --> RESTHandler
    gRPC --> GRPCHandler
    RESTHandler --> Objects
    GRPCHandler --> Objects
    Objects --> DBRepo
    DBRepo --> VectorIndex

API Layer

Weaviate exposes multiple APIs for communicating with the database server, providing flexibility for different client requirements.

| API Type | Default Port | Protocol | Use Case |
| --- | --- | --- | --- |
| REST API | 8080 | HTTP | General operations, CRUD |
| gRPC API | 50051 | Protocol Buffers | High-performance, streaming |
| GraphQL API | 8080 | HTTP/GraphQL | Complex queries |

Source: README.md

REST API Handlers

The REST API is implemented in adapters/handlers/rest/server.go and handles HTTP-based requests for all database operations. The REST interface supports:

  • Collection management (create, read, update, delete)
  • Object CRUD operations
  • Search queries (vector, BM25, hybrid)
  • Schema management
  • Backup and export operations

gRPC API Handlers

The gRPC interface, defined in adapters/handlers/grpc/server.go, provides high-performance binary communication using Protocol Buffers. The gRPC API is particularly important for:

  • Distributed task coordination
  • High-throughput data ingestion
  • Replication operations
  • Cluster communication

The protobuf message definitions in cluster/proto/api/message.pb.go define the contract for all gRPC communications:

message AddDistributedTaskRequest {
    string namespace = 1;
    string id = 2;
    bytes payload = 4;
    int64 submitted_at_unix_millis = 5;
    repeated string unit_ids = 6;
    repeated UnitSpec unit_specs = 7;
    bool needs_preparation_barrier = 8;
}

Source: cluster/proto/api/message.pb.go

Handler Layer

The handler layer acts as the bridge between external API requests and internal business logic. Handlers are responsible for:

  • Request validation and parsing
  • Authentication and authorization
  • Rate limiting
  • Request routing to appropriate use cases

Request Processing Flow

sequenceDiagram
    Client->>+RESTHandler: HTTP Request
    RESTHandler->>+ObjectsManager: Use Case Request
    ObjectsManager->>+DBRepository: Data Operation
    DBRepository->>+VectorIndex: Index Operation
    VectorIndex-->>-DBRepository: Index Result
    DBRepository-->>-ObjectsManager: Data Result
    ObjectsManager-->>-RESTHandler: Business Result
    RESTHandler-->>-Client: HTTP Response

Use Case Layer

The use case layer contains the business logic and orchestration of database operations. Key components include:

Objects Manager

The usecases/objects/manager.go handles all object-related operations:

  • Object creation, retrieval, update, and deletion
  • Batch operations
  • Tenant management
  • Time-to-live (TTL) handling

graph LR
    subgraph "Objects Manager"
        Create[Create Object]
        Get[Get Object]
        Update[Update Object]
        Delete[Delete Object]
        Batch[Batch Operations]
        TTL[TTL Handler]
    end
    
    Create --> Repo[(Repository)]
    Get --> Repo
    Update --> Repo
    Delete --> Repo
    Batch --> Repo
    TTL --> Repo

Source: usecases/objects/manager.go

Repository Layer

The repository layer provides data access abstractions and manages persistence.

Database Repository

The adapters/repos/db/index.go implements the primary data repository that handles:

  • Object storage and retrieval
  • Sharding across nodes
  • Replication coordination
  • Compaction and cleanup operations

Vector Index

Weaviate uses HNSW (Hierarchical Navigable Small World) for vector indexing, providing:

  • Millisecond-range search latency
  • Scalability to billions of vectors
  • Support for various distance metrics (cosine, dot product, L2)
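
The three metrics named above can be written out in a few lines. This is a plain-Python sketch for illustration, with hypothetical function names; it shows the textbook definitions, including why normalizing vectors reduces cosine distance to a dot product.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    """Squared Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity; on unit vectors this is 1 - dot(a, b)."""
    return 1.0 - dot(normalize(a), normalize(b))


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(l2_squared([0.0, 0.0], [3.0, 4.0]))
```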

Module Clients

Module clients provide connectivity to external services:

| Module | Purpose | Client Location |
| --- | --- | --- |
| text2vec-contextionary | Built-in text vectorization | modules/text2vec-contextionary/client/ |
| sum-transformers | Text summarization | modules/sum-transformers/client/ |
| multi2vec-* | Multi-modal vectorization | modules/multi2vec-*/ |

Source: modules/text2vec-contextionary/client/contextionary.go

Distributed Architecture

Weaviate supports horizontal scaling through a distributed architecture based on RAFT consensus for cluster coordination.

Distributed Task Handling

The system uses a distributed task model for operations that span multiple nodes:

graph TD
    subgraph "Node A"
        Task1[AddDistributedTaskRequest]
        Prep1[Preparation Barrier]
    end
    
    subgraph "Node B"
        Task2[Distributed Task]
        Prep2[Preparation Barrier]
    end
    
    subgraph "Node C"
        Task3[Distributed Task]
        Prep3[Preparation Barrier]
    end
    
    Task1 -->|Coordination| Task2
    Task1 -->|Coordination| Task3
    Prep1 -->|Sync| Prep2
    Prep2 -->|Sync| Prep3
    Prep3 -->|Complete| Task1

Distributed task messages include:

| Message Type | Purpose |
| --- | --- |
| AddDistributedTaskRequest | Submit new distributed task |
| RecordDistributedTaskPreparationCompleteAckRequest | Acknowledge preparation phase |
| RecordDistributedTaskPostCompletionAckRequest | SWAP-phase callback result |
| MarkTaskFinalizedRequest | Mark task as complete |
| CleanUpDistributedTaskRequest | Cleanup task resources |

Source: cluster/proto/api/message.pb.go

Replication and Clustering

Weaviate implements multi-layer replication for high availability:

Replication Details Query Types

The system provides detailed replication information through specialized query types:

| Query Type | Purpose |
| --- | --- |
| TYPE_GET_REPLICATION_DETAILS | Get replication details |
| TYPE_GET_REPLICATION_DETAILS_BY_COLLECTION | Collection-specific replication info |
| TYPE_GET_REPLICATION_DETAILS_BY_COLLECTION_AND_SHARD | Shard-level replication info |
| TYPE_GET_REPLICATION_DETAILS_BY_TARGET_NODE | Node-specific replication info |

Source: cluster/proto/api/message.pb.go

RAFT Consensus

Cluster coordination uses the RAFT consensus algorithm to ensure:

  • Leader election across nodes
  • Log replication for consistency
  • Fault tolerance during node failures
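
The fault tolerance of a RAFT cluster follows from its majority rule: a change is committed once more than half of the voting nodes have acknowledged it. The arithmetic is sketched below with hypothetical helper names; it describes RAFT in general rather than any Weaviate-specific setting.

```python
def quorum(voters):
    """Smallest number of nodes that forms a majority."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can fail while a majority is still reachable."""
    return voters - quorum(voters)


for n in (1, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

This is why odd cluster sizes are common: going from three to four voters raises the quorum without tolerating any additional failure.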

Recent releases include stability fixes in related areas:

  • v1.36.13: Handling of TTL context cancellation during deletions
  • v1.35.19: Fixes for recursive RAFT commands
  • v1.35.17: Improvements to HNSW visited lists

Module System

Weaviate's extensibility comes from its module system, which allows integration with various AI providers and services.

Built-in Vectorizers

  • OpenAI: text2vec-openai
  • Cohere: text2vec-cohere
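  • HuggingFace: text2vec-huggingface (hosted API) or text2vec-transformers (self-hosted models)
  • Google: text2vec-google (text) and multi2vec-google (multimodal)
  • Model2Vec: text2vec-model2vec (lightweight vectorization)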

Custom Vectorizers

Users can bring their own pre-generated vectors or use custom vectorization modules.

Source: README.md

Search Architecture

Weaviate combines multiple search paradigms in a unified query interface:

Search Types

| Type | Description | Use Case |
| --- | --- | --- |
| Vector Search | Semantic similarity in embedding space | "Find similar items" |
| BM25 | Keyword-based retrieval | Exact term matching |
| Hybrid | Combines vector + BM25 | Best of both worlds |
| Image Search | Visual similarity | "Find similar images" |
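
For readers unfamiliar with BM25, the sketch below scores a tiny corpus with the textbook Okapi BM25 formula. It is an illustration only, not Weaviate's inverted index: tokenization is a naive whitespace split, `k1=1.2` and `b=0.75` are commonly used parameter values, and the function name and sample documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["vector database search", "keyword search engine", "cooking recipes"]
print(bm25_scores(["vector", "search"], docs))
```

Rare terms get a higher weight through the `idf` factor, and long documents are penalized through the length normalization, which is what distinguishes BM25 from simple term counting.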

Query Execution Flow

graph TD
    Query[Search Query] --> Parse[Parse & Validate]
    Parse --> Filters[Apply Filters]
    Filters --> Hybrid{Hybrid?}
    Hybrid -->|Yes| Vec[Vector Search]
    Hybrid -->|Yes| Bm[BM25 Search]
    Hybrid -->|No| Single[Single Search]
    Vec --> Merge[Merge & Rerank]
    Bm --> Merge
    Single --> Merge
    Merge --> Rank[Ranking]
    Rank --> Return[Return Results]

Filtering

Advanced filtering capabilities support complex queries:

  • Property-based filters
  • Nested object filtering (v1.38.0+)
  • Tenant isolation
  • Metadata filtering

Note: Community users have reported issues with metadata filters in hybrid search when using n8n integration. Source: Community Issue #11262

Storage Architecture

Object Storage

Objects are stored with the following structure:

  • Properties: JSON-serializable key-value pairs
  • Vectors: Float arrays (normalized for cosine similarity)
  • Metadata: Timestamps, version info, tenant association

Compaction

The compaction system handles secondary index optimization:

  • Merges varying secondary keys
  • Reduces storage overhead
  • Recent fixes address size accumulation issues

Source: Community Release v1.37.1

Telemetry

Weaviate includes a telemetry dashboard (tools/telemetry-dashboard/) for monitoring:

  • Client usage statistics (Python, Java, TypeScript, Go, C#)
  • Module usage patterns
  • Object and collection counts
  • Machine-level aggregations

Source: tools/telemetry-dashboard/README.md

Key Architectural Decisions

Why Go?

  • Performance: Native execution with minimal overhead
  • Concurrency: Built-in goroutines for parallel operations
  • Memory: Efficient garbage collection tuned for database workloads
  • Cross-platform: Single binary deployment

Why Multiple APIs?

| Factor | REST | gRPC | GraphQL |
| --- | --- | --- | --- |
| Browser support | Native | Requires proxy | Native |
| Performance | Good | Excellent | Good |
| Schema evolution | Flexible | Strict | Flexible |
| Use case | General | High-throughput | Complex queries |

Common Issues and Considerations

Community-Reported Limitations

| Issue | Impact | Workaround |
| --- | --- | --- |
| No batch patch/partial update (#2124) | 400k updates take ~20h | Individual updates |
| Windows binary support (#3315) | No native Windows build | Docker/Linux |
| Missing "NOT" filter operator (#3683) | Cannot exclude criteria | Alternative filters |
| Multiple vector properties (#2465) | Single vector per object | Separate collections |

Performance Optimization

For optimal performance:

  1. Use appropriate vectorizer settings
  2. Configure HNSW parameters (efConstruction, maxConnections)
  3. Enable compression for large datasets
  4. Use proper indexing for filtering properties
  5. Leverage batch operations for bulk imports

Summary

Weaviate's architecture provides a scalable, high-performance vector database solution with:

  • Multi-API support (REST, gRPC, GraphQL) for client flexibility
  • Distributed operation via RAFT consensus and distributed task handling
  • Extensible modules for AI/ML integration
  • Combined search (vector, keyword, hybrid) in a single interface
  • Enterprise features including replication, sharding, and compaction

The layered architecture ensures maintainability while the Go implementation provides the performance required for production AI applications.


Cluster and RAFT Consensus

Related topics: System Architecture


Overview

Weaviate's distributed architecture relies on RAFT consensus to maintain consistency across nodes in a cluster. RAFT is a distributed consensus algorithm that provides fault-tolerant leader election and log replication, ensuring that all nodes in the cluster agree on the same state even when some nodes fail.

In Weaviate, the RAFT subsystem handles critical cluster operations including:

  • Schema consistency: Ensuring all nodes see the same collection definitions
  • Replication coordination: Managing how data is copied across replicas
  • Distributed task management: Coordinating long-running operations across the cluster
  • Shard management: Handling replica additions and removals

Source: cluster/proto/api/message.pb.go

Architecture

Cluster Topology

Weaviate clusters consist of multiple nodes that communicate via gRPC. Each node can participate in RAFT consensus, with one node elected as leader. The leader coordinates writes to the RAFT log, such as the schema, tenant, and shard operations listed below, and replicates them to follower nodes.

graph TB
    subgraph "Weaviate Cluster"
        Leader["Leader Node<br/>(RAFT Leader)"]
        Follower1["Follower Node 1"]
        Follower2["Follower Node 2"]
        FollowerN["Follower Node N"]
        
        Leader <--> Follower1
        Leader <--> Follower2
        Leader <--> FollowerN
    end
    
    subgraph "Data Layer"
        Shard1["Shard 1"]
        Shard2["Shard 2"]
        ShardN["Shard N"]
    end
    
    Leader --> Shard1
    Follower1 --> Shard2
    Follower2 --> ShardN

RAFT Operations

The RAFT implementation supports a comprehensive set of operation types defined in the protobuf schema:

| Operation Type | ID | Description |
| --- | --- | --- |
| TYPE_UNSPECIFIED | 0 | Unspecified operation |
| TYPE_ADD_CLASS | 1 | Add a new collection/class |
| TYPE_UPDATE_CLASS | 2 | Update collection definition |
| TYPE_DELETE_CLASS | 3 | Delete a collection |
| TYPE_RESTORE_CLASS | 4 | Restore a deleted collection |
| TYPE_ADD_PROPERTY | 5 | Add property to collection |
| TYPE_UPDATE_PROPERTY | 6 | Update collection property |
| TYPE_UPDATE_SHARD_STATUS | 10 | Update shard status |
| TYPE_ADD_REPLICA_TO_SHARD | 11 | Add replica to shard |
| TYPE_DELETE_REPLICA_FROM_SHARD | 12 | Remove replica from shard |
| TYPE_ADD_TENANT | 16 | Add tenant in multi-tenancy |
| TYPE_UPDATE_TENANT | 17 | Update tenant status |
| TYPE_DELETE_TENANT | 18 | Delete tenant |
| TYPE_TENANT_PROCESS | 19 | Process tenant operation |

Source: cluster/proto/api/message.pb.go:301-320

Distributed Tasks

Weaviate uses a distributed task system for operations that span multiple nodes. This system ensures crash safety and proper coordination of complex operations.

Task Types

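| Task Type | ID | Purpose |
| --- | --- | --- |
| TYPE_DISTRIBUTED_TASK_LIST | 300 | List distributed tasks |
| TYPE_DISTRIBUTED_TASK_ADD | 300 | Add new distributed task |
| TYPE_DISTRIBUTED_TASK_CANCEL | 301 | Cancel a running task |
| TYPE_DISTRIBUTED_TASK_RECORD_NODE_COMPLETED | 302 | Record node completion |
| TYPE_DISTRIBUTED_TASK_CLEAN_UP | 303 | Clean up task resources |
| TYPE_DISTRIBUTED_TASK_RECORD_UNIT_COMPLETED | 304 | Record unit completion |
| TYPE_DISTRIBUTED_TASK_UPDATE_UNIT_PROGRESS | 305 | Update task progress |
| TYPE_DISTRIBUTED_TASK_MARK_FINALIZED | 306 | Mark task as finalized |
| TYPE_DISTRIBUTED_TASK_RECORD_POST_COMPLETION_ACK | 307 | Record post-completion acknowledgment |
| TYPE_DISTRIBUTED_TASK_RECORD_PREPARATION_COMPLETE_ACK | 308 | Record preparation acknowledgment |

Note: TYPE_DISTRIBUTED_TASK_LIST and TYPE_DISTRIBUTED_TASK_ADD both carry ID 300. This is consistent with the two values belonging to different enums (a read-only query type and a state-changing command type); the cited protobuf file is the authority on this.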

Source: cluster/proto/api/message.pb.go:301-308

Task Lifecycle

stateDiagram-v2
    [*] --> Created: Task Added
    Created --> Preparing: Start Preparation
    Preparing --> Prepared: All Nodes Prepared
    Prepared --> Running: Begin Execution
    Running --> UnitCompleted: Record Unit Done
    UnitCompleted --> AllCompleted: All Units Done
    AllCompleted --> Finalized: Mark Finalized
    AllCompleted --> Cancelled: Cancel Request
    Preparing --> Cancelled: Cancel Request
    Running --> Cancelled: Cancel Request
    Finalized --> [*]
    Cancelled --> [*]
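
The transitions in the diagram above can be checked with a small lookup table. The following is a minimal sketch, not Weaviate's implementation: the `TaskState` type, the `allowed` map, and `CanTransition` are hypothetical names introduced here for illustration, and the edges are copied from the diagram only.

```go
package main

import "fmt"

// TaskState names mirror the states in the lifecycle diagram (hypothetical type).
type TaskState string

const (
	Created       TaskState = "Created"
	Preparing     TaskState = "Preparing"
	Prepared      TaskState = "Prepared"
	Running       TaskState = "Running"
	UnitCompleted TaskState = "UnitCompleted"
	AllCompleted  TaskState = "AllCompleted"
	Finalized     TaskState = "Finalized"
	Cancelled     TaskState = "Cancelled"
)

// allowed lists, for each state, the next states the diagram permits.
// Finalized and Cancelled are terminal, so they have no entry.
var allowed = map[TaskState][]TaskState{
	Created:       {Preparing},
	Preparing:     {Prepared, Cancelled},
	Prepared:      {Running},
	Running:       {UnitCompleted, Cancelled},
	UnitCompleted: {AllCompleted},
	AllCompleted:  {Finalized, Cancelled},
}

// CanTransition reports whether the diagram has an edge from "from" to "to".
func CanTransition(from, to TaskState) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Running, Cancelled))  // edge exists in the diagram
	fmt.Println(CanTransition(Finalized, Running)) // terminal state, no outgoing edges
}
```

Keeping the edges in one table makes it easy to reject out-of-order requests, for example a finalize request that arrives before all units are complete.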

Task Request Structure

The distributed task system uses specialized request types for tracking progress and completion:

type RecordDistributedTaskNodeCompletionRequest struct {
    Namespace            string // Tenant/namespace identifier
    Id                   string // Task identifier
    Version              uint64 // Task version for consistency
    NodeId               string // Completing node identifier
    Error                string // Error message if failed
    FinishedAtUnixMillis int64  // Completion timestamp
}

Source: cluster/proto/api/message.pb.go

Replication Coordination

Replica Management Operations

Weaviate supports dynamic replica management through RAFT:

| Operation | Description |
| --- | --- |
| TYPE_ADD_REPLICA_TO_SHARD | Adds a new replica to a shard for increased redundancy |
| TYPE_DELETE_REPLICA_FROM_SHARD | Removes a replica when reducing redundancy |
| TYPE_UPDATE_SHARD_STATUS | Updates the operational status of a shard |

Source: cluster/proto/api/message.pb.go:301-312

Replication States

The system tracks detailed replication operation states:

| State Type | ID | Description |
| --- | --- | --- |
| TYPE_GET_REPLICATION_DETAILS | 200 | Get replication details |
| TYPE_GET_REPLICATION_DETAILS_BY_COLLECTION | 201 | Collection-specific replication info |
| TYPE_GET_REPLICATION_DETAILS_BY_COLLECTION_AND_SHARD | 202 | Shard-specific replication info |
| TYPE_GET_REPLICATION_SCALE_PLAN | 208 | Get scaling plan for replication |
| TYPE_REPLICATION_REPLICATE_FORCE_DELETE_BY_UUID | 224 | Force delete specific object |

Source: cluster/proto/api/message.pb.go:100-224

Schema Management in Cluster

Schema Consistency

When schema changes occur (add/update/delete classes or properties), these changes must be propagated to all nodes consistently. The schema manager coordinates this through the RAFT subsystem.

Multi-Tenancy Support

The RAFT consensus system handles tenant operations atomically:

  • Tenant addition: Ensures all nodes initialize tenant storage
  • Tenant updates: Coordinates status changes across replicas
  • Tenant deletion: Safely removes tenant data across all nodes

Source: cluster/schema/manager.go

Stability Improvements

Recent releases have focused on improving RAFT stability:

| Version | Fixes |
| --- | --- |
| v1.36.13 | RAFT stability fixes |
| v1.35.19 | Recursive RAFT commands fix |
| v1.37.0 | Internal cluster comms improvements |

Source: Release Notes - v1.36.13, Release Notes - v1.35.19, Release Notes - v1.37.0

Crash Safety

The distributed task system closes crash-safety gaps where a node failure during operations (such as RunSwapOnShard) could cause cluster-wide schema inconsistencies. The task recording mechanism ensures:

  1. All replicas record task completion
  2. Failed tasks are properly cleaned up
  3. Cluster-wide state converges after recovery

Configuration

Cluster Context

Weaviate integrates with LaunchDarkly for feature flags and cluster-specific configuration:

type LDIntegration struct {
    ldClient  *ldclient.LDClient  // LaunchDarkly client
    ldContext ldcontext.MultiContext // Cluster-aware context
}

The cluster context includes:

  • Cluster key: Identifies the entire cluster
  • Org context: Tenant/organization identifier
  • Node key: Individual node identifier

Source: usecases/config/runtime/launch_darkly.go

Common Issues and Resolutions

Metadata Filter with Hybrid Search (n8n)

When using metadata filters with hybrid search, make sure the filter is specified at the query level, as part of the hybrid query itself, so that it constrains both the vector and the keyword component of the search.

Issue Reference: GitHub Issue #11262

Batch Update Performance

Large batch updates (e.g., 400k records) may take an extended time because of replication and storage overhead. Consider:

  • Using async replication for non-critical updates
  • Batching updates with appropriate chunk sizes
  • Utilizing tenant isolation to reduce coordination overhead

Issue Reference: GitHub Issue #2124

Best Practices

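  1. Node Count: Maintain an odd number of nodes (3, 5, 7). RAFT needs a majority of nodes to agree, and an even node count raises the quorum size without tolerating any additional failures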
  2. Network Latency: Ensure low-latency network connections between cluster nodes
  3. Tenant Isolation: Use multi-tenancy features to reduce RAFT coordination overhead
  4. Monitoring: Monitor RAFT leader elections and replication lag
  5. Upgrades: Follow rolling upgrade procedures to maintain quorum during updates
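
The node-count advice follows from majority arithmetic. The sketch below is illustrative only (the function names `Quorum` and `TolerableFailures` are introduced here, not taken from Weaviate): a majority of n voters is n/2 + 1 using integer division, and the cluster stays available as long as that many voters remain.

```go
package main

import "fmt"

// Quorum returns the number of votes needed for a majority among n voters.
func Quorum(n int) int { return n/2 + 1 }

// TolerableFailures returns how many voters can fail while a majority remains.
func TolerableFailures(n int) int { return n - Quorum(n) }

func main() {
	for _, n := range []int{3, 4, 5, 6, 7} {
		fmt.Printf("nodes=%d quorum=%d tolerable failures=%d\n",
			n, Quorum(n), TolerableFailures(n))
	}
}
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd sizes are the usual recommendation.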

Source: https://github.com/weaviate/weaviate / Human Manual

Vector Indexes (HNSW and HFresh)

Related topics: Hybrid Search Implementation, LSMKV Storage Engine

Overview

Weaviate provides multiple vector index types to enable efficient Approximate Nearest Neighbor (ANN) search at scale. Vector indexes are critical for performing semantic searches over billions of vectors in milliseconds, combining vector similarity search with keyword filtering in a single query interface.

Weaviate currently supports two primary vector index implementations, HNSW and HFresh, alongside a basic flat index:

| Index Type | Description | Status |
| --- | --- | --- |
| HNSW | Hierarchical Navigable Small World | Generally Available (GA) |
| HFresh | High-performance Fresh index | Generally Available (GA) as of v1.38.0 |
| Flat | Basic flat vector index | Available for specific use cases |

Source: README.md

LSMKV Storage Engine

Related topics: Vector Indexes (HNSW and HFresh), Inverted Index and Filtering

LSMKV (Log-Structured Merging Key-Value) is Weaviate's core storage engine built on the Log-Structured Merge-tree (LSM tree) architecture. It provides high-performance, disk-based persistence for Weaviate's vector database operations, managing both object data and inverted indices with optimized write and read paths.

Overview

LSMKV serves as the foundational storage layer for Weaviate's repository layer and is located at adapters/repos/db/lsmkv/. Unlike B-tree based storage, which updates data in place, it turns writes into sequential appends, which are significantly faster for the write-heavy workloads typical of vector database ingestion pipelines.

The storage engine implements a multi-level architecture where data flows through memory-based write buffers into persistent segments on disk, with periodic compaction operations merging and optimizing these segments.

graph TD
    A[Write Operations] --> B[Memtable Buffer]
    B --> C{Memtable Full?}
    C -->|Yes| D[Flush to L0 Segment]
    D --> E[L0 Segments]
    E --> F[Compaction]
    F --> G[L1+ Segments]
    G --> H[Read Path]
    A --> H

Architecture Components

Store

The Store is the top-level container managing multiple LSMKV buckets and their associated segments. It handles initialization, lifecycle management, and coordinates operations across all buckets within a single storage directory.

| Component | Responsibility |
| --- | --- |
| Store | Top-level storage container, manages bucket registry |
| Bucket | Individual key-value namespace within a store |
| Segment | Persistent on-disk data file |
| SegmentGroup | Collection of segments at the same level |
| Compactor | Background process for segment merging |

Source: adapters/repos/db/lsmkv/store.go

Bucket

Buckets represent isolated key-value namespaces within the store. Each bucket maintains its own memtable, segment list, and compaction state. Weaviate uses multiple buckets for different data structures:

  • Object bucket: Stores serialized objects with their properties
  • Inverted index bucket: Maintains term-to-objectID mappings
  • Vector bucket: Stores compressed or raw vector data
  • Additional metadata buckets for schema and tenant information

Source: adapters/repos/db/lsmkv/bucket.go

Segment Files

Segments are the fundamental unit of persistence in LSMKV. Each segment file contains:

  • Data section: Key-value pairs sorted by key
  • Index section: Sparse index mapping keys to positions in the data section
  • Bloom filter: Optional probabilistic structure for fast membership tests
  • Primary key index: For efficient lookups within the segment

Segments are named with a consistent format that includes:

  • Level indicator (L0, L1, etc.)
  • Sequence number
  • Creation timestamp

Source: adapters/repos/db/lsmkv/segment_index.go

Compactor

The compactor is the background process responsible for merging segments across levels. Weaviate implements different compactor strategies:

| Compactor Type | Purpose | Use Case |
| --- | --- | --- |
| ReplaceCompactor | Simple key-value replacement during merge | Standard object storage |
| RoaringSetCompactor | Bitmap-based set operations | Inverted indices, posting lists |

The compactor addresses the issue of varying secondary key counts mentioned in multiple release notes, ensuring that secondary index size accumulation remains bounded and predictable during merges.

Source: adapters/repos/db/lsmkv/compactor_replace.go, adapters/repos/db/lsmkv/compactor_roaring_set.go

Write Path

Memtable Buffering

Write operations first enter an in-memory buffer (memtable):

  1. Key and value are serialized
  2. Entry is inserted into the memtable, which keeps its entries ordered by key
  3. Write-ahead log (WAL) receives a copy for durability
  4. Operation returns success once the WAL entry is persisted

sequenceDiagram
    participant Client
    participant Memtable
    participant WAL
    participant Segment
    
    Client->>Memtable: Put(key, value)
    Memtable->>WAL: Write entry
    WAL-->>Memtable: Persisted
    Memtable-->>Client: Ack
    Note over Memtable: Buffer fills up
    Memtable->>Segment: Flush to L0
    Segment-->>Memtable: Done

Segment Flush

When the memtable reaches its configured capacity:

  1. Memtable is frozen (no new writes accepted)
  2. New memtable is created for incoming writes
  3. Frozen memtable is written out in key order as a new L0 segment
  4. Bloom filter is built for the new segment
  5. Sparse index is generated
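
The write and flush steps can be sketched with a toy in-memory model. This is a minimal illustration under stated assumptions, not the LSMKV code: the `store` type, its fields, and the `maxKeys` threshold are hypothetical, the WAL is a slice rather than a file, and the bloom filter and sparse index steps are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one key-value pair as it would appear in a segment.
type entry struct{ key, value string }

// store is a toy stand-in for a bucket (hypothetical type).
type store struct {
	wal      []entry           // append-only log; a real WAL lives on disk
	memtable map[string]string // active in-memory buffer
	segments [][]entry         // flushed segments, each sorted by key, newest last
	maxKeys  int               // hypothetical flush threshold, counted in keys
}

func newStore(maxKeys int) *store {
	return &store{memtable: map[string]string{}, maxKeys: maxKeys}
}

// Put logs the write, applies it to the memtable, and flushes when full.
func (s *store) Put(key, value string) {
	s.wal = append(s.wal, entry{key, value})
	s.memtable[key] = value
	if len(s.memtable) >= s.maxKeys {
		s.flush()
	}
}

// flush freezes the current memtable, swaps in a fresh one, and writes the
// frozen contents as a key-sorted segment.
func (s *store) flush() {
	frozen := s.memtable
	s.memtable = map[string]string{}
	seg := make([]entry, 0, len(frozen))
	for k, v := range frozen {
		seg = append(seg, entry{k, v})
	}
	sort.Slice(seg, func(i, j int) bool { return seg[i].key < seg[j].key })
	s.segments = append(s.segments, seg)
	s.wal = nil // flushed entries no longer need to be replayed after a crash
}

func main() {
	s := newStore(2)
	s.Put("b", "1")
	s.Put("a", "2") // second key reaches the threshold and triggers a flush
	s.Put("c", "3") // lands in the fresh memtable
	fmt.Println(len(s.segments), s.segments[0], len(s.memtable))
}
```

Swapping in the new memtable before writing the segment is what lets incoming writes continue while the frozen buffer is persisted.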

Read Path

Point Lookups

For single-key lookups:

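  1. Check memtable (most recent data)
  2. Check L0 segments (most recent to oldest), using each segment's bloom filter to skip segments that cannot contain the key
  3. Check older levels (L1, L2, etc.) in the same way
  4. Return the first match found; if that match is a deletion tombstone, the key is reported as not found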
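
The newest-first lookup order can be sketched as follows. This is an illustrative model rather than the LSMKV read path: each layer (the memtable or one segment) is a plain map, and the `tombstone` sentinel and `Get` function are hypothetical names used only for this example.

```go
package main

import "fmt"

// tombstone marks a deleted key; a real engine would use a flag, not a magic value.
const tombstone = "\x00deleted"

// layer is the memtable or one segment, modeled as a plain map.
type layer map[string]string

// Get searches layers from newest to oldest and stops at the first hit.
// A tombstone hides any older value stored for the same key.
func Get(layers []layer, key string) (string, bool) {
	for _, l := range layers {
		v, ok := l[key]
		if !ok {
			continue // not in this layer, try the next older one
		}
		if v == tombstone {
			return "", false
		}
		return v, true
	}
	return "", false
}

func main() {
	layers := []layer{
		{"a": "new"},                     // memtable
		{"b": tombstone},                 // newest segment
		{"a": "old", "b": "x", "c": "y"}, // older segment
	}
	for _, k := range []string{"a", "b", "c", "d"} {
		v, ok := Get(layers, k)
		fmt.Printf("%s -> %q (found=%v)\n", k, v, ok)
	}
}
```

Stopping at the first hit is safe only because layers are visited newest first; an older value can never override a newer write or delete.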

Range Scans

For range queries:

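  1. Open a cursor on the memtable and on each segment; bloom filters answer only single-key membership tests, so they cannot rule segments out for a range
  2. Position each cursor at the start of the range using the segment's key index
  3. Merge the cursors in key order, keeping the newest version of each key
  4. Apply deletion tombstones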
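
The merge and tombstone steps of a range scan can be sketched like this. It is a simplified illustration, not the engine's cursor implementation: the `kv` type and `Scan` function are hypothetical, and results are collected in a map and sorted at the end instead of being streamed through a merging iterator.

```go
package main

import (
	"fmt"
	"sort"
)

// kv is one entry in a layer; Deleted marks a tombstone.
type kv struct {
	Key, Value string
	Deleted    bool
}

// Scan returns the live entries with start <= key < end.
// layers[0] is the newest layer, and every layer is sorted by key.
func Scan(layers [][]kv, start, end string) []kv {
	newest := map[string]kv{}
	for _, layer := range layers {
		// Binary search to the first key inside the range.
		i := sort.Search(len(layer), func(i int) bool { return layer[i].Key >= start })
		for ; i < len(layer) && layer[i].Key < end; i++ {
			if _, seen := newest[layer[i].Key]; !seen {
				newest[layer[i].Key] = layer[i] // first sighting is the newest version
			}
		}
	}
	out := make([]kv, 0, len(newest))
	for _, e := range newest {
		if !e.Deleted { // drop keys whose newest version is a tombstone
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	layers := [][]kv{
		{{Key: "b", Deleted: true}, {Key: "e", Value: "new"}},
		{{Key: "a", Value: "1"}, {Key: "b", Value: "2"}, {Key: "c", Value: "3"}, {Key: "e", Value: "old"}},
	}
	for _, e := range Scan(layers, "a", "f") {
		fmt.Println(e.Key, e.Value)
	}
}
```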

Compaction and Maintenance

Compaction Triggers

Compaction is triggered by:

  • Segment count thresholds per level
  • Size-based thresholds
  • Time-based policies
  • Explicit user requests (reindex operations)

The compaction process ensures:

  • Optimal read performance by reducing segment count
  • Reclamation of space held by overwritten or deleted data
  • Management of secondary index size accumulation

Compaction Levels

| Level | Description | Max Segments |
| --- | --- | --- |
| L0 | Recently flushed, may overlap | Configurable |
| L1 | Result of L0 compaction | Configurable |
| L2+ | Result of L1+ compaction | Configurable |

Source: adapters/repos/db/lsmkv/segment_group.go

Configuration Options

Key LSMKV storage configuration parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| memtableSize | Dynamic | Size of in-memory buffer before flush |
| segmentMaxSize | Dynamic | Maximum size of a single segment |
| bloomFilter | Enabled | Enable/disable bloom filters |
| compactionInterval | Background | Interval between compaction checks |
| walEnabled | true | Enable write-ahead logging |

Performance Issues

The community has reported performance concerns related to storage operations:

  • Batch update performance: Issue #2124 highlights that batch updates taking ~20h for 400k records can be attributed in part to storage layer overhead
  • Compaction overhead: Recent stability fixes (v1.36.12-v1.37.2) addressed secondary index size accumulation during compaction

Recent Improvements

Recent releases have included significant storage engine improvements:

  • v1.37.1-1.37.2: Fixed secondary index size accumulation for varying key counts
  • v1.35.17: Improved HNSW visited lists and dequeuing during backups
  • v1.36.11: Fixed compression metadata persistence to prevent data loss on restart

Tenant Isolation

The TTL stability improvements (v1.36.13, v1.37.2) ensure that tenant re-deactivation works correctly even when storage operations are interrupted, preventing orphaned data.

Usage Patterns

Embedded Mode

LSMKV operates in embedded mode when Weaviate runs without an external database dependency, providing:

  • Self-contained persistence
  • Simplified deployment
  • Reduced operational overhead

Clustered Mode

In distributed deployments, each node maintains its own LSMKV store:

  • Local persistence with network replication
  • Synchronized schema across nodes
  • Distributed compaction coordination

Inverted Index and Filtering

Related topics: Hybrid Search Implementation, Filtering and Query Language, LSMKV Storage Engine

The inverted index is a core component of Weaviate's storage layer that enables efficient filtering and keyword-based search operations. While the vector index handles semantic similarity searches, the inverted index provides fast lookups for exact-match filtering, property-based queries, and BM25-based keyword search. This dual-index architecture allows Weaviate to combine vector search with traditional filtering in a single query operation.

Overview

Weaviate maintains an inverted index for each collection (class) that maps property values to the IDs of the objects containing those values. Filter lookups therefore take constant or logarithmic time in the number of indexed values instead of requiring a scan over every object in the collection. The inverted index is persisted on disk and loaded into memory as needed for query execution.

graph TD
    A[Query with Filter] --> B[Parse Filter Conditions]
    B --> C{Filter Type?}
    C -->|Exact Match| D[Inverted Index Lookup]
    C -->|Text Search| E[BM25 Searcher]
    C -->|Combined| F[Merge Results]
    D --> G[Get Object IDs]
    E --> H[Calculate BM25 Scores]
    H --> F
    F --> I[Apply to Vector Results]
    I --> J[Return Filtered Results]

Source: adapters/repos/db/inverted/searcher.go

Architecture Components

The inverted index implementation is located in adapters/repos/db/inverted/ and consists of several interconnected components that work together to provide filtering capabilities.

Core Components

| Component | File | Purpose |
| --- | --- | --- |
| Searcher | searcher.go | Main entry point for inverted index queries and filter execution |
| BM25 Searcher | bm25_searcher.go | Implements BM25 ranking algorithm for keyword search |
| Objects | objects.go | Manages the underlying object storage and index updates |
| Config | config.go | Defines configuration options for inverted index behavior |

Source: adapters/repos/db/inverted/config.go

Filter Types Supported

The inverted index supports multiple filter types that can be combined in complex query expressions. These include equality filters, range filters for numeric and date properties, set membership filters, and text search filters.

Equality Filters use direct key lookups in the index to find objects where a property equals a specific value. For string properties, the index can store lowercase-normalized values, depending on the tokenization setting, to enable case-insensitive matching while maintaining lookup efficiency.

Range Filters work with numeric properties (int, number) and date/time properties. These filters use ordered data structures to efficiently find objects within a specified range without scanning all indexed values.

Set Filters allow checking if an object's array property contains any of the specified values. This is commonly used for tag-based filtering where an object may have multiple categories or labels.

Source: adapters/repos/db/inverted/searcher.go

BM25 Search Implementation

The BM25 (Best Matching 25) algorithm provides relevance-based ranking for keyword searches. Weaviate's implementation calculates scores based on term frequency, inverse document frequency, and document length normalization.

Scoring Algorithm

BM25 calculates relevance scores using the following key parameters:

| Parameter | Description | Default Behavior |
| --- | --- | --- |
| k1 | Term frequency saturation constant | Typically 1.2-2.0 |
| b | Document length normalization factor | Typically 0.75 |
| avgdl | Average document length in collection | Calculated dynamically |

The formula combines term frequency saturation with inverse document frequency to rank documents. Higher k1 values increase the impact of term frequency, while the b parameter controls how much document length affects the score.

Source: adapters/repos/db/inverted/bm25_searcher.go
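
To make the roles of k1, b, and avgdl concrete, here is a minimal sketch of the textbook BM25 term score with a common non-negative IDF variant. It is not taken from bm25_searcher.go, and Weaviate's exact formula may differ; the function names `IDF` and `TermScore` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// IDF returns ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of
// documents and n the number of documents containing the term.
func IDF(totalDocs, docsWithTerm int) float64 {
	n, N := float64(docsWithTerm), float64(totalDocs)
	return math.Log(1 + (N-n+0.5)/(n+0.5))
}

// TermScore returns the BM25 contribution of one term for one document.
// tf is the term frequency, docLen the document length, avgdl the average length.
func TermScore(tf, docLen, avgdl, idf, k1, b float64) float64 {
	norm := 1 - b + b*docLen/avgdl // length normalization, controlled by b
	return idf * tf * (k1 + 1) / (tf + k1*norm)
}

func main() {
	idf := IDF(1000, 10)
	// Same term frequency, different document lengths.
	fmt.Printf("short doc: %.4f\n", TermScore(3, 50, 100, idf, 1.2, 0.75))
	fmt.Printf("long doc:  %.4f\n", TermScore(3, 400, 100, idf, 1.2, 0.75))
}
```

With b greater than zero, the longer document scores lower for the same term frequency; setting b to zero removes the length effect entirely.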

BM25 Search Workflow

sequenceDiagram
    participant Client
    participant BM25Searcher
    participant InvertedIndex
    participant DocumentStore

    Client->>BM25Searcher: Query: "keyword search"
    BM25Searcher->>InvertedIndex: Get posting lists for terms
    InvertedIndex-->>BM25Searcher: Term frequencies and document counts
    BM25Searcher->>BM25Searcher: Calculate IDF for each term
    BM25Searcher->>BM25Searcher: Compute BM25 scores per document
    BM25Searcher->>DocumentStore: Fetch document text for scoring
    DocumentStore-->>BM25Searcher: Document contents
    BM25Searcher->>BM25Searcher: Apply term frequency scoring
    BM25Searcher-->>Client: Ranked document results

Source: adapters/repos/db/inverted/bm25_searcher.go

Filter Execution and Result Merging

When a query includes both vector similarity and filters, Weaviate executes both searches independently and merges the results. The filtering operation serves as a pre-filter or post-filter depending on the query configuration and performance considerations.

Filter Modes

Pre-filtering applies the filter before vector search, narrowing down the candidate set before similarity ranking. This approach is efficient when filters are highly selective and reduce the search space significantly.

Post-filtering evaluates the filter after vector similarity ranking, which can be faster when the vector search naturally produces relevant results and the filter removes only a small percentage of candidates. However, this mode may require fetching additional candidates to meet the requested limit after filtering.

Source: adapters/repos/db/inverted/searcher.go
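
The difference between the two modes can be illustrated with a brute-force sketch. This is not how Weaviate's vector index applies filters; the `object` type and the `PreFilter` and `PostFilter` functions are hypothetical, and distances are precomputed to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

type object struct {
	ID       int
	Distance float64 // distance to the query vector, precomputed for brevity
	InStock  bool
}

// nearest returns up to k objects ordered by ascending distance.
func nearest(objs []object, k int) []object {
	sorted := append([]object(nil), objs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Distance < sorted[j].Distance })
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

// filter keeps the objects for which keep returns true.
func filter(objs []object, keep func(object) bool) []object {
	var out []object
	for _, o := range objs {
		if keep(o) {
			out = append(out, o)
		}
	}
	return out
}

// PreFilter narrows the candidate set first, then ranks by distance.
func PreFilter(objs []object, keep func(object) bool, k int) []object {
	return nearest(filter(objs, keep), k)
}

// PostFilter ranks first, then drops non-matching results; it may return fewer than k.
func PostFilter(objs []object, keep func(object) bool, k int) []object {
	return filter(nearest(objs, k), keep)
}

func main() {
	objs := []object{{1, 0.1, false}, {2, 0.2, false}, {3, 0.3, true}, {4, 0.4, true}}
	inStock := func(o object) bool { return o.InStock }
	fmt.Println(len(PreFilter(objs, inStock, 2)), len(PostFilter(objs, inStock, 2)))
}
```

In this data set the two nearest objects fail the filter, so post-filtering returns nothing for k = 2 while pre-filtering still returns two matches, which is the shortfall the text above describes.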

Result Set Operations

The inverted index supports set operations for combining filter results:

  • AND: Intersects multiple filter conditions to find objects matching all criteria
  • OR: Unions multiple filter conditions to find objects matching any criterion
  • NOT: Negates a filter condition to exclude matching objects

Complex filter expressions are parsed into an execution tree where each node represents a filter operation. The tree is evaluated bottom-up, combining intermediate result sets as operations complete.

Source: adapters/repos/db/inverted/objects.go
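
Bottom-up evaluation of such a tree can be sketched with plain sets. The types and names below (`IDSet`, `Node`, `Eval`) are hypothetical and only illustrate the idea; a production index would typically use compressed bitmaps rather than Go maps, and NOT is computed here against an explicit set of all object IDs.

```go
package main

import "fmt"

// IDSet is a set of object IDs.
type IDSet map[uint64]struct{}

// NewIDSet builds a set from the given IDs.
func NewIDSet(ids ...uint64) IDSet {
	s := IDSet{}
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// Op identifies the operation a tree node performs.
type Op int

const (
	Leaf Op = iota // IDs hold the result of one index lookup
	And
	Or
	Not
)

// Node is one node of the filter execution tree.
type Node struct {
	Op       Op
	Children []*Node
	IDs      IDSet
}

// Eval evaluates children first and then combines their result sets.
// And/Not nodes are assumed to have at least one child; all is the universe for Not.
func Eval(n *Node, all IDSet) IDSet {
	switch n.Op {
	case And:
		out := Eval(n.Children[0], all)
		for _, c := range n.Children[1:] {
			next, kept := Eval(c, all), IDSet{}
			for id := range out {
				if _, ok := next[id]; ok {
					kept[id] = struct{}{}
				}
			}
			out = kept
		}
		return out
	case Or:
		out := IDSet{}
		for _, c := range n.Children {
			for id := range Eval(c, all) {
				out[id] = struct{}{}
			}
		}
		return out
	case Not:
		excluded, out := Eval(n.Children[0], all), IDSet{}
		for id := range all {
			if _, hit := excluded[id]; !hit {
				out[id] = struct{}{}
			}
		}
		return out
	default:
		return n.IDs
	}
}

func main() {
	all := NewIDSet(1, 2, 3, 4)
	tree := &Node{Op: And, Children: []*Node{
		{Op: Leaf, IDs: NewIDSet(1, 2, 3)},
		{Op: Not, Children: []*Node{{Op: Leaf, IDs: NewIDSet(2)}}},
	}}
	fmt.Println(len(Eval(tree, all)))
}
```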

Configuration Options

The inverted index behavior can be customized through collection schema configuration. These settings affect indexing performance, storage size, and query behavior.

| Configuration | Type | Description |
| --- | --- | --- |
| indexFilterable | boolean | Enable/disable filtering on this property (default: true) |
| indexSortable | boolean | Enable/disable sorting on this property (default: false) |
| indexSearchable | boolean | Enable/disable BM25 search on this property (default: true for text) |
| tokenization | string | How text is tokenized: word, lowercase, whitespace, field |
| stopwords | object | Custom stopword list for text indexing |
stopwordsobjectCustom stopword list for text indexing

Source: adapters/repos/db/inverted/config.go

Property Indexing Settings

Properties can be individually configured to optimize index behavior for specific use cases. Text properties that will only be used for exact-match filtering should disable the searchable index to save storage space. Properties used in high-cardinality filters may benefit from case normalization settings.

Source: adapters/repos/db/inverted/config.go

Hybrid Search Integration

Hybrid search in Weaviate combines vector similarity search with BM25 keyword search. The inverted index plays a critical role in the BM25 component by providing fast access to term frequencies and document counts needed for relevance scoring.

Hybrid Score Calculation

Hybrid search calculates both vector similarity scores and BM25 scores, then combines them using a weighted formula. The relative importance of keyword matching versus semantic similarity can be adjusted through the query's alpha parameter, where alpha=1.0 is pure vector search and alpha=0.0 is pure keyword search.

Source: adapters/repos/db/inverted/bm25_searcher.go
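
A weighted combination of this kind can be sketched as follows. This is an assumption-laden illustration, not Weaviate's fusion code: it min-max normalizes each score set before weighting, treats higher vector scores as better (similarity rather than distance), and uses the hypothetical names `normalize` and `Fuse`.

```go
package main

import "fmt"

// normalize rescales scores to [0, 1] with min-max normalization.
func normalize(scores map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(scores))
	first := true
	var lo, hi float64
	for _, s := range scores {
		if first || s < lo {
			lo = s
		}
		if first || s > hi {
			hi = s
		}
		first = false
	}
	for id, s := range scores {
		if hi == lo {
			out[id] = 1 // all scores equal: treat every result as maximal
		} else {
			out[id] = (s - lo) / (hi - lo)
		}
	}
	return out
}

// Fuse returns alpha*vector + (1-alpha)*keyword per object ID.
// An object missing from one result set contributes 0 for that component.
func Fuse(vector, keyword map[string]float64, alpha float64) map[string]float64 {
	out := map[string]float64{}
	for id, s := range normalize(vector) {
		out[id] += alpha * s
	}
	for id, s := range normalize(keyword) {
		out[id] += (1 - alpha) * s
	}
	return out
}

func main() {
	vector := map[string]float64{"a": 0.9, "b": 0.1}
	keyword := map[string]float64{"a": 1.0, "b": 5.0}
	for _, alpha := range []float64{0, 0.5, 1} {
		fmt.Println(alpha, Fuse(vector, keyword, alpha))
	}
}
```

Normalizing first matters because raw BM25 scores and vector similarities live on different scales; without it, alpha would not express a meaningful balance.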

Known Limitations

Based on community feedback and reported issues, there are important considerations when using filters with hybrid search:

Metadata Filter Compatibility: Filters applied to metadata (text properties) may not work correctly in all hybrid search configurations, particularly when used through third-party integrations. This is a known issue tracked in the community. Users should test filter behavior with their specific configuration and consider applying filters as a post-processing step if needed.

Source: adapters/repos/db/inverted/searcher.go

Performance Considerations

The inverted index is optimized for read-heavy workloads but requires careful management during write operations. Understanding these tradeoffs helps in designing efficient data models and query patterns.

Index Maintenance

When objects are added, updated, or deleted, the inverted index must be updated to reflect the changes. Weaviate uses write-ahead logging and eventual consistency to batch index updates, reducing the performance impact on concurrent read operations.

Memory Usage

Frequently accessed portions of the inverted index are cached in memory. The index's memory footprint depends on the number of unique property values, the total number of objects, and the configuration of filterable and searchable properties.

Query Optimization

Efficient filter queries share common characteristics:

  • High selectivity: Filters that match fewer objects execute faster
  • Proper indexing: Properties used in filters should have indexFilterable enabled
  • Appropriate data types: Numeric filters generally perform better than text equality filters on high-cardinality properties

Source: adapters/repos/db/inverted/searcher.go

Filtering and Query Language

Related topics: Hybrid Search Implementation, Inverted Index and Filtering

Weaviate provides a powerful filtering system that enables precise data retrieval by combining traditional database filtering capabilities with vector search operations. The filtering infrastructure supports complex query compositions, nested object filtering, and integration with hybrid search operations.

Overview

Weaviate's filtering and query language serves as the foundation for data retrieval operations across all search types. The system provides:

  • Semantic and keyword filtering - Filter results based on property values
  • Hybrid search integration - Combine vector search with BM25 and metadata filters
  • Nested object filtering - Query deeply into nested data structures
  • Multi-condition filters - Build complex queries with AND/OR logic
  • Operator support - Rich set of comparison and text operators

The filtering layer sits between the API handlers and the database repository layer, parsing user-provided filter expressions into an Abstract Syntax Tree (AST) that can be efficiently executed against the inverted index.

Architecture

The filtering system follows a layered architecture:

graph TD
    A[GraphQL/REST API Request] --> B[Filter Parser]
    B --> C[Filter AST]
    C --> D[Inverted Index Searcher]
    D --> E[Filtered Results]
    
    F[Hybrid Search Query] --> G[Vector Search Component]
    G --> H[BM25 Component]
    H --> B
    G --> C
    H --> C
    
    I[Nested Object Filter] --> J[Nested Searcher]
    J --> D

Key Components

| Component | Location | Responsibility |
| --- | --- | --- |
| Filter Parser | adapters/handlers/graphql/local/common_filters/ | Parses filter expressions into AST |
| Filter AST | entities/filters/filters.go | Defines the filter data structure |
| Inverted Searcher | adapters/repos/db/inverted/ | Executes filters against inverted index |
| Nested Searcher | adapters/repos/db/inverted/searcher_nested.go | Handles nested object filtering |

Filter Data Model

The filter data model is defined in entities/filters/filters.go and represents the canonical structure for all filter operations.

Core Filter Structure

type Filter struct {
    Operator OperatorType
    // Path to the property being filtered
    Path []string
    // Value for the filter
    Value *Value
}

Supported Operators

| Operator | Description | Example |
| --- | --- | --- |
| Equal | Exact match | name == "John" |
| NotEqual | Not equal | name != "John" |
| GreaterThan | Greater than | age > 21 |
| GreaterThanEqual | Greater than or equal | age >= 21 |
| LessThan | Less than | age < 65 |
| LessThanEqual | Less than or equal | age <= 65 |
| Like | Pattern match | name like "Jo*" |
| IsNull | Null check | name is null |
| IsNull with valueBoolean: false | Not null | name is not null |
| ContainsAny | Array contains any | tags contains any ["a", "b"] |
| ContainsAll | Array contains all | tags contains all ["a", "b"] |
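
The semantics of the pattern and array operators can be sketched in a few lines. The helpers below (`Like`, `ContainsAny`, `ContainsAll`) are hypothetical stand-ins written for this example, not Weaviate functions; `Like` assumes that `*` matches any run of characters and `?` matches exactly one.

```go
package main

import "fmt"

// Like reports whether s matches pattern, where '*' matches any run of
// characters (including none) and '?' matches exactly one character.
func Like(pattern, s string) bool {
	p, t := []rune(pattern), []rune(s)
	pi, ti := 0, 0
	star, mark := -1, 0 // last '*' position and the text index it was tried at
	for ti < len(t) {
		switch {
		case pi < len(p) && p[pi] == '*':
			star, mark = pi, ti
			pi++
		case pi < len(p) && (p[pi] == '?' || p[pi] == t[ti]):
			pi++
			ti++
		case star >= 0:
			// Backtrack: let the last '*' absorb one more character.
			mark++
			pi, ti = star+1, mark
		default:
			return false
		}
	}
	for pi < len(p) && p[pi] == '*' {
		pi++
	}
	return pi == len(p)
}

// toSet builds a lookup set from a slice of strings.
func toSet(values []string) map[string]bool {
	set := make(map[string]bool, len(values))
	for _, v := range values {
		set[v] = true
	}
	return set
}

// ContainsAny reports whether values shares at least one element with wanted.
func ContainsAny(values, wanted []string) bool {
	set := toSet(values)
	for _, w := range wanted {
		if set[w] {
			return true
		}
	}
	return false
}

// ContainsAll reports whether every element of wanted appears in values.
func ContainsAll(values, wanted []string) bool {
	set := toSet(values)
	for _, w := range wanted {
		if !set[w] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(Like("Jo*", "John"), Like("Jo*", "Jane"))
	tags := []string{"a", "c"}
	fmt.Println(ContainsAny(tags, []string{"a", "b"}), ContainsAll(tags, []string{"a", "b"}))
}
```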

Filter Parsing Pipeline

Filters reach Weaviate through REST and GraphQL APIs. The parsing pipeline transforms raw filter expressions into executable operations.

sequenceDiagram
    participant Client
    participant GraphQL as GraphQL Handler
    participant Parser as Filter Parser
    participant AST as Filter AST
    participant Searcher as Inverted Searcher
    participant DB as Database
    
    Client->>GraphQL: GraphQL Query with filters
    GraphQL->>Parser: Parse filter expression
    Parser->>Parser: Tokenize and validate
    Parser->>AST: Generate Filter AST
    AST->>Searcher: Execute filter
    Searcher->>DB: Query inverted index
    DB-->>Searcher: Matching object IDs
    Searcher-->>AST: Filtered ID set
    AST-->>GraphQL: Filtered results
    GraphQL-->>Client: JSON response

Parser Implementation

The filter parser in adapters/handlers/graphql/local/common_filters/parse_filters_into_ast.go performs the following steps:

  1. Tokenization - Split filter string into tokens
  2. AST Generation - Build the filter tree structure
  3. Validation - Verify path existence and type compatibility
  4. Optimization - Reorder conditions for efficient execution

Hybrid Search Filters

Hybrid search combines vector similarity search with BM25 keyword matching and metadata filters. When filters are applied to hybrid queries, they are executed against both the vector search results and the BM25 results.

graph LR
    A[Hybrid Query] --> B[Vector Search]
    A --> C[BM25 Search]
    A --> D[Metadata Filter]
    
    B --> E[Result Set A]
    C --> F[Result Set B]
    D --> G[Result Set C]
    
    E --> H[Intersection/Union]
    F --> H
    G --> H
    
    H --> I[Final Results]

Important Considerations

When using filters with hybrid search in external integrations (such as n8n), be aware of:

  • Filter placement - Filters must be correctly passed to the hybrid query structure
  • Field type matching - Text fields require specific filter syntax compared to numeric fields
  • Operator selection - Use appropriate operators for each data type

Note: There is a known issue where metadata filters may not work correctly for hybrid search in certain integration scenarios. See Issue #11262 for details.

Nested Object Filtering

Weaviate supports storing and filtering nested objects. The nested object filtering capability allows querying properties within nested data structures.

Nested Filter Structure

Filters for nested objects use a path array that lists each level of the property hierarchy, from the outermost property to the innermost:

{
  "path": ["parent", "child", "grandchild"],
  "operator": "Equal",
  "valueText": "value"
}
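
Resolving such a path against an object amounts to walking one map level per path segment. The sketch below is illustrative only; `Resolve` is a hypothetical helper and does not reflect how searcher_nested.go stores or indexes nested properties.

```go
package main

import "fmt"

// Resolve walks obj along path and returns the value at the end of the path.
// It returns false when a segment is missing or an intermediate value is not an object.
func Resolve(obj map[string]any, path []string) (any, bool) {
	var cur any = obj
	for _, seg := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[seg]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

func main() {
	obj := map[string]any{
		"parent": map[string]any{
			"child": map[string]any{"grandchild": "value"},
		},
	}
	v, ok := Resolve(obj, []string{"parent", "child", "grandchild"})
	fmt.Println(v, ok)
	_, ok = Resolve(obj, []string{"parent", "missing"})
	fmt.Println(ok)
}
```

Because each segment must match a key exactly, a path that differs from the schema in spelling or case fails to resolve, which matches the requirement that path segments mirror the schema definition.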

Nested Searcher Architecture

The nested searcher in adapters/repos/db/inverted/searcher_nested.go handles the execution of filters targeting nested properties:

graph TD
    A[Nested Filter Request] --> B[Path Resolution]
    B --> C{Nested Property?}
    C -->|Yes| D[Nested Index Lookup]
    C -->|No| E[Standard Index Lookup]
    D --> F[Flatten Results]
    E --> G[Direct Results]
    F --> H[Merge Results]
    G --> H

Limitations

As noted in Issue #3694, filtering for nested objects has specific requirements:

  • Nested filtering requires proper index configuration
  • Path segments must match the schema definition exactly
  • Array-based nested properties require specific operator usage (ContainsAny, ContainsAll)

GraphQL Filter Syntax

Weaviate's GraphQL API provides a comprehensive filter syntax for constructing queries.

Basic Filter Structure

{
  Get {
    Article(
      where: {
        path: ["category"],
        operator: Equal,
        valueText: "technology"
      }
    ) {
      title
      content
    }
  }
}

Compound Filters with AND/OR

{
  Get {
    Article(
      where: {
        operator: And,
        operands: [
          {
            path: ["category"],
            operator: Equal,
            valueText: "technology"
          },
          {
            path: ["publishDate"],
            operator: GreaterThan,
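            # Assumes publishDate is an int property holding a Unix timestamp;
            # a date property takes valueDate with an RFC 3339 string instead.
            valueInt: 1609459200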
          }
        ]
      }
    ) {
      title
    }
  }
}

Text Search with Filters

{
  Get {
    Article(
      nearText: { concepts: ["artificial intelligence"] },
      where: {
        path: ["author"],
        operator: Like,
        valueText: "John*"
      }
    ) {
      title
    }
  }
}

REST API Filtering

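Over HTTP, filtered searches are sent as GraphQL queries to the /v1/graphql endpoint; the REST object-listing endpoint does not take a where parameter. The same where structure also appears in REST request bodies that select objects, such as batch delete.

GraphQL over HTTP

POST /v1/graphql
{
  "query": "{ Get { Article(where: {path: [\"category\"], operator: Equal, valueText: \"tech\"}) { title } } }"
}

Batch Delete with a Filter

DELETE /v1/batch/objects
{
  "match": {
    "class": "Article",
    "where": {
      "path": ["inStock"],
      "operator": "Equal",
      "valueBoolean": false
    }
  }
}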

Common Filter Patterns

Numeric Range Filter

where: {
  operator: And,
  operands: [
    { path: ["price"], operator: GreaterThanEqual, valueInt: 100 },
    { path: ["price"], operator: LessThanEqual, valueInt: 500 }
  ]
}

Array Membership Filter

where: {
  path: ["tags"],
  operator: ContainsAny,
  valueTextArray: ["featured", "premium"]
}

Text Pattern Filter

where: {
  path: ["description"],
  operator: Like,
  valueText: "*machine*"
}

Null Check Filter

where: {
  path: ["deletedAt"],
  operator: IsNull,
  valueBoolean: false
}

Known Issues and Limitations

Active Community Issues

| Issue | Description | Status |
| --- | --- | --- |
| #11262 | Metadata filter does not work for hybrid search in n8n | Open |
| #3694 | Filtering and Vectorization for Nested Objects | Open |
| #3683 | Feature Request: Add "Not" operator in filter | Open |

"Not" Operator Limitations

As noted in Issue #3683, the "Not" operator was previously removed due to implementation issues. Currently, negative filtering must be achieved through indirect approaches:

  • Use NotEqual for single-value exclusions
  • Combine IsNull with Equal for some exclusion patterns
  • Use ContainsAny/ContainsAll with complement sets for array exclusions
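
The first workaround generalizes to several excluded values: by De Morgan's law, "not in {a, b}" is the same as "not equal to a And not equal to b". A minimal sketch, where `exclude_values` is a hypothetical helper and `status` is an assumed text property:

```python
from typing import List, Sequence


def exclude_values(path: Sequence[str], values: List[str],
                   value_key: str = "valueText") -> dict:
    """Build a filter that rejects every listed value via NotEqual operands."""
    if not values:
        raise ValueError("at least one value to exclude is required")
    operands = [
        {"path": list(path), "operator": "NotEqual", value_key: value}
        for value in values
    ]
    if len(operands) == 1:
        return operands[0]  # a single exclusion needs no And wrapper
    return {"operator": "And", "operands": operands}


print(exclude_values(["status"], ["archived", "draft"]))
```

The resulting dict has the same shape as the compound filters shown earlier, so it can be combined with other operands under a single And.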

Performance Considerations

Index Optimization

  • Ensure filtered properties are indexed (for example, that indexFilterable is enabled in the property's schema definition)
  • Review index statistics for optimal query planning
  • Consider composite indexes for frequently combined filters

Query Optimization

  1. Filter order - Place the most selective filters first
  2. Operator selection - Use equality operators when possible
  3. Path specificity - Use exact paths rather than wildcard patterns
  4. Batch operations - When filtering large datasets, consider pagination
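
The pagination advice in item 4 can be sketched as a loop over limit and offset values. The example assumes a caller-supplied `fetch_page(limit, offset)` function, which is hypothetical here and would wrap a real filtered query; an in-memory stand-in replaces the server so the loop can be run as is.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[dict]],
                  page_size: int = 100) -> Iterator[dict]:
    """Yield all results by requesting fixed-size pages until one comes back short."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:  # a short (or empty) page means no more data
            return
        offset += page_size


# In-memory stand-in for a filtered query returning 250 matching objects.
DATA = [{"id": i} for i in range(250)]


def fake_fetch(limit: int, offset: int) -> List[dict]:
    return DATA[offset:offset + limit]


results = list(iterate_pages(fake_fetch, page_size=100))
print(len(results))  # 250
```

Offset-based paging re-evaluates the filter for every page, so for very large result sets a cursor-style approach may be preferable where the API offers one.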

Source: https://github.com/weaviate/weaviate / Human Manual

REST and gRPC API Layer

Related topics: System Architecture

Overview

Weaviate exposes two API layers: a REST API for standard HTTP-based operations and a gRPC API for optimized search requests and internal cluster communication. The REST API is the primary external interface for client applications, while gRPC carries inter-node traffic within distributed Weaviate clusters.

Clients can communicate with the database server through the REST API, the gRPC API, and the GraphQL API.

Source: README.md:1

Architecture

API Layer Design

Weaviate's API layer is organized into two primary categories:

  1. REST Handlers (adapters/handlers/rest/) - Handle HTTP-based client requests for objects, schemas, and batch operations
  2. gRPC Service (adapters/handlers/grpc/) - Handle high-performance binary protocol communication for internal cluster operations and search requests

graph TD
    A[Client Applications] --> B[REST API Gateway]
    A --> C[gRPC Client]
    B --> D[REST Handlers]
    C --> E[gRPC Service Layer]
    D --> F[Business Logic Layer]
    E --> F
    F --> G[(Data Storage)]
    H[Cluster Nodes] --> E

REST API Purpose and Scope

The REST API provides the primary interface for:

  • Object Management - CRUD operations for data objects
  • Schema Operations - Collection/class definitions and property management
  • Batch Operations - Bulk import and updates of objects
  • Search Endpoints - Semantic search, hybrid search, BM25, and filtered queries
  • System Management - Backup, cluster, and tenant operations

gRPC API Purpose and Scope

The gRPC layer handles:

  • Internal Cluster Communication - Node-to-node coordination for distributed operations
  • Distributed Task Management - Task distribution across cluster nodes
  • High-Performance Search - Optimized search request processing
  • Replication Coordination - Shard synchronization and replication state

gRPC Service Implementation

Cluster Service Protocol

The internal cluster communication uses gRPC with the ClusterService defined in cluster/proto/api/message.proto. The service defines five primary RPC methods:

| Method | Full Method Name | Purpose |
| --- | --- | --- |
| RemovePeer | /weaviate.internal.cluster.ClusterService/RemovePeer | Remove a node from the cluster |
| JoinPeer | /weaviate.internal.cluster.ClusterService/JoinPeer | Add a new node to the cluster |
| NotifyPeer | /weaviate.internal.cluster.ClusterService/NotifyPeer | Broadcast node state changes |
| Apply | /weaviate.internal.cluster.ClusterService/Apply | Apply distributed operations |
| Query | /weaviate.internal.cluster.ClusterService/Query | Query cluster state |

Source: cluster/proto/api/message_grpc.pb.go:21-25
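
The full method names follow the gRPC convention of `/<package>.<Service>/<Method>`. A small sketch of composing and splitting such names, using only the service and method names listed above (the helper functions themselves are hypothetical):

```python
from typing import Tuple

SERVICE = "weaviate.internal.cluster.ClusterService"
METHODS = ["RemovePeer", "JoinPeer", "NotifyPeer", "Apply", "Query"]


def full_method_name(service: str, method: str) -> str:
    """Compose the gRPC full method name: /<package.Service>/<Method>."""
    return f"/{service}/{method}"


def split_full_method_name(full_name: str) -> Tuple[str, str]:
    """Split a full method name back into (service, method)."""
    service, _, method = full_name.lstrip("/").rpartition("/")
    return service, method


for method in METHODS:
    print(full_method_name(SERVICE, method))
```

Splitting is useful in logging or interceptor code, where the full method name is usually the only identifier available for a call.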

Generated gRPC Code

The gRPC code is generated using protoc-gen-go-grpc (v1.6.2) from the protobuf definition files. The generated code provides:

  • Type-safe client interfaces (ClusterServiceClient)
  • Server-side interfaces (ClusterServiceServer)
  • Streaming RPC support for high-volume data transfers
  • Connection management and load balancing through the underlying gRPC-Go runtime

gRPC Client Configuration

gRPC clients inside Weaviate set explicit transport and message-size options. The snippet cited below comes from the text2vec-contextionary module client:

// Create the client without TLS and raise the receive limit to 48 MiB.
grpc.NewClient(uri,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(1024*1024*48)))

Key configuration aspects:

  • Transport Credentials: Uses insecure (plaintext) credentials, so any transport encryption such as mTLS has to be handled at the network layer
  • Message Size: Maximum receive message size of 48 MiB (1024*1024*48 = 50,331,648 bytes) for handling large search results and batch transfers
  • Connection Management: Automatic connection pooling and failover

Source: modules/text2vec-contextionary/client/contextionary.go:25-29
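
To get a feel for what the receive limit allows, the arithmetic can be worked through directly. The sketch below assumes payloads made of raw float32 vectors and ignores protobuf framing and any other fields, so the result is only a rough upper bound; the dimension 1536 is an arbitrary example value.

```python
MAX_RECV_BYTES = 1024 * 1024 * 48  # same expression as in the Go call option
FLOAT32_BYTES = 4


def max_vectors_per_message(dimensions: int) -> int:
    """Upper bound on float32 vectors of the given dimension that fit in one message."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return MAX_RECV_BYTES // (dimensions * FLOAT32_BYTES)


print(MAX_RECV_BYTES)                 # 50331648
print(max_vectors_per_message(1536))  # 8192
```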

Protocol Buffer Message Types

Query Request Types

The cluster protocol defines extensive query types for internal operations:

| Type ID | Type Name | Purpose |
| --- | --- | --- |
| 0 | TYPE_UNSPECIFIED | Unspecified query type |
| 1 | TYPE_GET_CLASSES | Retrieve class definitions |
| 2 | TYPE_GET_SCHEMA | Get full schema |
| 3 | TYPE_GET_TENANTS | List tenants in multi-tenant setup |
| 4 | TYPE_GET_SHARD_OWNER | Get shard ownership information |
| 5 | TYPE_GET_TENANTS_SHARDS | Map tenants to their shards |
| 6 | TYPE_GET_SHARDING_STATE | Get sharding configuration |
| 7 | TYPE_GET_CLASS_VERSIONS | Track schema version changes |
| 8 | TYPE_GET_COLLECTIONS_COUNT | Collection statistics |
| 30 | TYPE_HAS_PERMISSION | Authorization checks |
| 31-34 | TYPE_GET_ROLES_* | RBAC role management |
| 200-208 | TYPE_GET_REPLICATION_* | Replication state queries |
| 300 | TYPE_DISTRIBUTED_TASK_LIST | List distributed tasks |

Source: cluster/proto/api/message.pb.go:1-50
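
The numbering leaves gaps between groups of related query types, which makes it possible to classify an ID by range. The sketch below mirrors only the IDs that the table names in full; the family labels are hypothetical groupings chosen for illustration, and the individual names in the 31-34 and 200-208 ranges are deliberately not filled in.

```python
from enum import IntEnum


class QueryType(IntEnum):
    """Query type IDs that the table lists with a full name."""
    TYPE_UNSPECIFIED = 0
    TYPE_GET_CLASSES = 1
    TYPE_GET_SCHEMA = 2
    TYPE_GET_TENANTS = 3
    TYPE_GET_SHARD_OWNER = 4
    TYPE_GET_TENANTS_SHARDS = 5
    TYPE_GET_SHARDING_STATE = 6
    TYPE_GET_CLASS_VERSIONS = 7
    TYPE_GET_COLLECTIONS_COUNT = 8
    TYPE_HAS_PERMISSION = 30
    TYPE_DISTRIBUTED_TASK_LIST = 300


# Hypothetical family labels; only the numeric ranges come from the table.
FAMILIES = [
    (range(1, 9), "schema-and-sharding"),
    (range(30, 31), "authorization"),
    (range(31, 35), "rbac-roles"),
    (range(200, 209), "replication"),
    (range(300, 301), "distributed-task"),
]


def query_family(type_id: int) -> str:
    """Map a numeric query type ID to its family label."""
    for id_range, label in FAMILIES:
        if type_id in id_range:
            return label
    return "unspecified" if type_id == 0 else "unknown"


print(QueryType(4).name)   # TYPE_GET_SHARD_OWNER
print(query_family(205))   # replication
```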

Distributed Task Messages

The protocol includes comprehensive support for distributed task execution:

| Message Type | Fields | Purpose |
| --- | --- | --- |
| AddDistributedTaskRequest | Namespace, Id, Payload, UnitIds, UnitSpecs | Submit a distributed task |
| RecordDistributedTaskPreparationCompleteAckRequest | Namespace, Id, Version, NodeId, Success | Acknowledge preparation phase |
| RecordDistributedTaskUnitCompletionRequest | Namespace, Id, Version, NodeId, UnitId, Error, FinishedAtUnixMillis | Report unit completion |
| CleanUpDistributedTaskRequest | Namespace, Id, Version | Clean up task resources |

Source: cluster/proto/api/message.pb.go:1-100

REST API Structure

Handler Organization

The REST API handlers are organized by functional domain:

| Handler Domain | File Pattern | Operations |
| --- | --- | --- |
| Objects | handlers_objects.go | Object CRUD operations |
| Schema | handlers_schema.go | Collection and property management |
| Batch | handlers_batch_objects.go | Bulk operations |
| Search | handlers_*search*.go | Query and search operations |

HTTP Module Clients

Module integrations use HTTP clients for external service communication:

type client struct {
    origin     string
    httpClient *http.Client
    logger     logrus.FieldLogger
}

Configuration includes:

  • Origin URL: Base URL for module service endpoints
  • Timeout: Configurable request timeout
  • Logging: Field logger for debugging and monitoring

Source: modules/sum-transformers/client/client.go:1-50

API Communication Patterns

Request-Response Flow

sequenceDiagram
    participant Client
    participant REST as REST API
    participant Business as Business Logic
    participant Storage as Data Layer
    
    Client->>REST: HTTP Request
    REST->>Business: Process Request
    Business->>Storage: Data Operation
    Storage-->>Business: Result
    Business-->>REST: Processed Response
    REST-->>Client: HTTP Response

gRPC Internal Communication

sequenceDiagram
    participant NodeA as Node A
    participant NodeB as Node B
    participant Raft as RAFT Consensus
    
    NodeA->>NodeB: gRPC Query/Apply
    NodeB->>Raft: Consensus Decision
    Raft-->>NodeB: Consensus Result
    NodeB-->>NodeA: Response with Version/ACK

Common API Operations

Search Operations

Weaviate supports multiple search modalities through its API layer:

| Search Type | Description | API Support |
| --- | --- | --- |
| Semantic Search | Vector-based similarity search | REST, gRPC |
| Hybrid Search | Combines semantic + BM25 keyword search | REST, gRPC |
| BM25 | Traditional keyword-based ranking | REST, gRPC |
| Filtered Search | With metadata filtering | REST, gRPC |
| Image Search | Vector search using images | REST |

Source: README.md:1

Filtering Considerations

When using metadata filters with hybrid search, ensure filters are properly applied at the API request level. Known limitation: some integration scenarios (e.g., n8n Vector Store node) may have issues with metadata filters in hybrid search mode.

Community Issue: #11262 - Metadata filter does not work for hybrid search in n8n

Version and Compatibility

API Stability

The REST API follows semantic versioning with the main Weaviate release. Breaking changes are documented in release notes.

Protocol Buffer Compatibility

Generated protobuf code maintains backward compatibility:

// Code generated by protoc-gen-go-grpc. DO NOT EDIT.
// versions:
// - protoc-gen-go-grpc v1.6.2
// - protoc             (unknown)
// source: api/message.proto

The system requires gRPC-Go v1.64.0 or later for proper compatibility:

// This is a compile-time assertion to ensure that this generated file
// is compatible with the grpc package it is being compiled against.
// Requires gRPC-Go v1.64.0 or later.
const _ = grpc.SupportPackageIsVersion9

Source: cluster/proto/api/message_grpc.pb.go:1-20

Best Practices

Client Implementation

  1. Use Connection Pooling: Reuse gRPC connections for better performance
  2. Configure Timeouts: Set appropriate timeouts for long-running operations
  3. Handle Retries: Implement retry logic for transient failures
  4. Monitor Connections: Track connection health for cluster nodes
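
Items 2 and 3 are often implemented together as a bounded retry loop with exponential backoff. The following is a generic sketch rather than anything taken from a Weaviate client: `TransientError` and `call_with_retries` are hypothetical names, and the `sleep` parameter is injectable so the demo runs without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for errors worth retrying, such as a temporarily unavailable node."""


def call_with_retries(fn: Callable[[], T], attempts: int = 3,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on TransientError with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demo: the call fails twice, then succeeds; delays are recorded instead of slept.
calls: List[int] = []
delays: List[float] = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("node busy")
    return "ok"


print(call_with_retries(flaky, attempts=3, base_delay=1.0, sleep=delays.append))
print(delays)  # [1.0, 2.0]
```

Only errors known to be transient should be retried; retrying validation or permission failures just delays the real error.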

Error Handling

The API layer provides descriptive error messages for common failure scenarios:

if res.StatusCode > 399 {
    return nil, errors.Errorf("fail with status %d: %s", res.StatusCode, resBody.Error)
}

Source: modules/sum-transformers/client/client.go:1-80

Further Resources

Source: https://github.com/weaviate/weaviate / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 26 structured pitfall item(s), including 3 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_123630d837d14b1c83a54d58a53135f4 | https://github.com/weaviate/weaviate/issues/11534

2. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_4ee4e459abca4d2d85de3fa2a5930252 | https://github.com/weaviate/weaviate/issues/11402

3. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_897580adbd7d461abe5642c6dac0c4c9 | https://github.com/weaviate/weaviate/issues/11401

4. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: bq.rescoreLimit=-1 accepted and silently discarded (no validation)
  • User impact: Developers may misconfigure credentials, environment, or host setup: bq.rescoreLimit=-1 accepted and silently discarded (no validation)
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: bq.rescoreLimit=-1 accepted and silently discarded (no validation). Context: Observed when using node, python, docker
  • Evidence: failure_mode_cluster:github_issue | fmev_66059c07dc1bf7e9ba3ae29643f74f6d | https://github.com/weaviate/weaviate/issues/11402

5. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: replicationFactor=-1 accepted and silently normalized to 1 (no validation)
  • User impact: Developers may misconfigure credentials, environment, or host setup: replicationFactor=-1 accepted and silently normalized to 1 (no validation)
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: replicationFactor=-1 accepted and silently normalized to 1 (no validation). Context: Observed when using node, python, docker
  • Evidence: failure_mode_cluster:github_issue | fmev_923df3fcace46105c3011afcdccf8664 | https://github.com/weaviate/weaviate/issues/11401

6. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: v1.35.20 - Adjust text2vec-google batch limits + qa scripts
  • User impact: Upgrade or migration may change expected behavior: v1.35.20 - Adjust text2vec-google batch limits + qa scripts
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.35.20 - Adjust text2vec-google batch limits + qa scripts. Context: Observed when using docker, linux
  • Evidence: failure_mode_cluster:github_release | fmev_2b31f056d2e0d7add2fd46a24be9c52d | https://github.com/weaviate/weaviate/releases/tag/v1.35.20

7. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: v1.36.14 - Backup GCS module avoid full object scan during listing Fix
  • User impact: Upgrade or migration may change expected behavior: v1.36.14 - Backup GCS module avoid full object scan during listing Fix
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.36.14 - Backup GCS module avoid full object scan during listing Fix. Context: Observed when using docker, linux
  • Evidence: failure_mode_cluster:github_release | fmev_595f992aafdc3ac5ef016c95e400760a | https://github.com/weaviate/weaviate/releases/tag/v1.36.14

8. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: v1.37.3 - Cluster steadiness & async replication Fixes
  • User impact: Upgrade or migration may change expected behavior: v1.37.3 - Cluster steadiness & async replication Fixes
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.37.3 - Cluster steadiness & async replication Fixes. Context: Observed when using node
  • Evidence: failure_mode_cluster:github_release | fmev_1aa71228476d3cb06842102005b0b1ad | https://github.com/weaviate/weaviate/releases/tag/v1.37.3

9. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:55072677 | https://github.com/weaviate/weaviate

10. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: packet_text.keyword_scan | github_repo:55072677 | https://github.com/weaviate/weaviate

11. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Developers should check this migration risk before relying on the project: v1.35.21 - New text2vec-digitalocean module
  • User impact: Upgrade or migration may change expected behavior: v1.35.21 - New text2vec-digitalocean module
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.35.21 - New text2vec-digitalocean module. Context: Observed during version upgrade or migration.
  • Evidence: failure_mode_cluster:github_release | fmev_8e7fecd34326feab3c938f272056b706 | https://github.com/weaviate/weaviate/releases/tag/v1.35.21

12. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Developers should check this migration risk before relying on the project: v1.36.15 - new text2vec-digitalocean module, fixed text2vec-google batch logic
  • User impact: Upgrade or migration may change expected behavior: v1.36.15 - new text2vec-digitalocean module, fixed text2vec-google batch logic
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.36.15 - new text2vec-digitalocean module, fixed text2vec-google batch logic. Context: Observed during version upgrade or migration.
  • Evidence: failure_mode_cluster:github_release | fmev_c5fbc8f87fe8aa9724f70b1833fd9e6b | https://github.com/weaviate/weaviate/releases/tag/v1.36.15

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page)

Review before install: open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using weaviate with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence