milvus Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Milvus Overview and Distributed Architecture

Related topics: Data Management, Storage Engine, and Indexing, Query, Search Execution, and Streaming, Client SDKs, Deployment, and Common Operational Issues

Purpose and Scope

Milvus is a cloud-native, distributed vector database designed for similarity search over large-scale embedding data. The repository houses the server, its official Go SDK, and supporting infrastructure. The Go SDK is published from the client/v2 module, where users create a milvusclient.Client from a milvusclient.ClientConfig{Address: ...} to issue collection, search, and insert operations Source: client/README.md:17-29. The server follows a microservice-style architecture in which each role (proxy, query node, data node, streaming coordinator) scales independently. The current release line is milvus-2.6.18 (release notes pending), and milvus-3.0.0-beta adds "External Collection" integration with the open lake ecosystem Source: release: milvus-3.0.0-beta. The Go SDK has reached client/v2.6.5, which adds nullable vector columns, ARRAY_APPEND/ARRAY_REMOVE upsert helpers, and validation that newly added vector fields are nullable Source: release: client/v2.6.5.

Core Distributed Components

Milvus decomposes workload across stateless proxies and stateful workers. Proxies terminate SDK traffic and forward data-plane requests. QueryNodes serve vector and scalar queries; DataNodes handle sealed-segment indexing and persistence. The Streaming Coordinator (streamingcoord) is the control plane for the write path and channel assignment. Its AssignmentService is the implementation of streamingpb.StreamingCoordAssignmentServiceServer and exposes AssignmentDiscover for streaming nodes to subscribe to channel layouts Source: internal/streamingcoord/server/service/assignment.go:1-49. The service registers an AlterReplicateConfigV2AckCallback and exposes a listenerTotal Prometheus gauge metrics.StreamingCoordAssignmentListenerTotal keyed on the node ID Source: internal/streamingcoord/server/service/assignment.go:26-33.

The data plane is separated from the control plane by a Write-Ahead Log (WAL). A single Milvus deployment can use the embedded RocksMQ broker or an external system such as Kafka; deployments that report bugs against "MQ type: kafka" confirm the broker is pluggable Source: issue #50333. Below is the high-level topology:

flowchart LR
    SDK[Go/Python/Java/Node SDK] --> Proxy
    Proxy --> QN[QueryNode pool]
    Proxy --> DN[DataNode pool]
    Proxy --> SC[StreamingCoord]
    SC -->|channel assignment| QN
    SC -->|broadcast DDL| WAL[(WAL: RocksMQ / Kafka)]
    WAL --> QN
    WAL --> DN
    SC -->|AssignmentDiscover gRPC| SN[StreamingNode]
    SC -->|vchannelFair policy| BAL[Balancer]

Streaming Coordination and the WAL

The streaming coordinator splits its duties across a broadcaster, which fans out DDL and other control messages to all log nodes, and a balancer, which assigns physical channels (PChannels) to streaming nodes. The balancer supports pluggable policies; the vchannelfair policy, built by PolicyBuilder, is registered under the name "vchannelFair" and is constructed from parameters such as WALBalancerPolicyVChannelFairPChannelWeight, VChannelWeight, AntiAffinityWeight, RebalanceTolerance, and RebalanceMaxStep Source: internal/streamingcoord/server/balancer/policy/vchannelfair/builder.go:1-60. The broadcaster exposes lifecycle parameters used by tests: WALBroadcasterTombstoneCheckInternal, WALBroadcasterTombstoneMaxCount, and related knobs govern how often tombstone state is compacted Source: internal/streamingcoord/server/broadcaster/broadcaster_test.go:33-37.

Streaming nodes receive their assignment by opening a long-lived AssignmentDiscover stream; the helper discoverGrpcServerHelper.SendFullAssignment walks the balancer's Relations, queries the StreamingNodeManagerClient for the live set of streaming nodes, and emits one StreamingNodeAssignment proto per node with the channels it should own Source: internal/streamingcoord/server/service/discover/discover_grpc_server_helper.go:13-34. For replication, the same assignmentServiceImpl validates incoming AlterReplicateConfigV2 requests and rejects no-op changes with the sentinel errReplicateConfigurationSame Source: internal/streamingcoord/server/service/assignment.go:22-23.
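The fan-out step described above can be pictured with a small, self-contained sketch. It is not the code in discover_grpc_server_helper.go: the relation type and the fullAssignment function are hypothetical names, channels are plain strings instead of protos, and skipping relations that point at a node outside the live set is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// relation is a hypothetical stand-in for one balancer relation:
// a physical channel and the streaming node that should own it.
type relation struct {
	Channel string
	NodeID  int64
}

// fullAssignment groups channels by owning node and returns one entry per
// live node, including nodes that currently own no channel.
// Assumption: relations that reference a node outside liveNodes are skipped.
func fullAssignment(relations []relation, liveNodes []int64) map[int64][]string {
	out := make(map[int64][]string, len(liveNodes))
	for _, id := range liveNodes {
		out[id] = []string{}
	}
	for _, r := range relations {
		if _, ok := out[r.NodeID]; !ok {
			continue
		}
		out[r.NodeID] = append(out[r.NodeID], r.Channel)
	}
	for id := range out {
		sort.Strings(out[id]) // deterministic channel order per node
	}
	return out
}

func main() {
	rels := []relation{{"pchannel-1", 1}, {"pchannel-0", 1}, {"pchannel-2", 2}}
	for id, chans := range fullAssignment(rels, []int64{1, 2, 3}) {
		fmt.Println(id, chans)
	}
}
```

Emitting an entry even for a node with no channels lets a subscriber distinguish "I own nothing" from "I was not mentioned".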

The embedded broker, rocksmq, is a RocksDB-backed message queue used when no external MQ is configured. Its server tracks topics, page IDs (constructKey, parsePageID), and a retention policy driven by RocksmqCfg.RetentionSizeInMB and RetentionTimeInMinutes; retention is disabled when both are -1 Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:1-26. Consumer groups are coordinated through a consumerList guarded by a sync.RWMutex, with Add/Remove/Get/Notify methods ensuring that consumer registrations are deduplicated and lifecycle events reach the broker loop Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:28-58.

Proxy and ShardClient

Inside the proxy, the shardclient package is responsible for translating logical requests into physical QueryNode calls. It maintains a database → collection → shard → leader cache, owns a pool of gRPC clients to QueryNodes, and applies an LBPolicy to distribute load across replicas Source: internal/proxy/shardclient/README.md:9-25. The package delivers four guarantees: connection lifecycle management, a shard-leader cache to avoid extra coordination round-trips, configurable load balancing, and automatic retry/failover when a QueryNode is unavailable Source: internal/proxy/shardclient/README.md:13-23.
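To make the cache, load-balancing, and failover guarantees concrete, here is a minimal sketch using only the standard library. The shardBalancer type, its round-robin cursor, and the errUnavailable error are hypothetical; the real package talks gRPC and applies its own LBPolicy, which this example does not reproduce.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnavailable simulates a QueryNode that cannot serve the request.
var errUnavailable = errors.New("query node unavailable")

// shardBalancer is a hypothetical model: a cached shard -> replica list plus
// a round-robin cursor per shard.
type shardBalancer struct {
	mu       sync.Mutex
	replicas map[string][]string
	cursor   map[string]int
}

func newShardBalancer(replicas map[string][]string) *shardBalancer {
	return &shardBalancer{replicas: replicas, cursor: make(map[string]int)}
}

// Call tries every cached replica of the shard at most once, starting at the
// round-robin cursor, and returns the address that served the request.
func (b *shardBalancer) Call(shard string, do func(addr string) error) (string, error) {
	b.mu.Lock()
	nodes := b.replicas[shard]
	start := b.cursor[shard]
	if len(nodes) > 0 {
		b.cursor[shard] = (start + 1) % len(nodes)
	}
	b.mu.Unlock()

	if len(nodes) == 0 {
		return "", fmt.Errorf("no cached leader for shard %q", shard)
	}
	var lastErr error
	for i := 0; i < len(nodes); i++ {
		addr := nodes[(start+i)%len(nodes)]
		if err := do(addr); err != nil {
			lastErr = err // fail over to the next replica
			continue
		}
		return addr, nil
	}
	return "", lastErr
}

func main() {
	b := newShardBalancer(map[string][]string{"shard-0": {"qn-1", "qn-2"}})
	addr, err := b.Call("shard-0", func(addr string) error {
		if addr == "qn-1" {
			return errUnavailable
		}
		return nil
	})
	fmt.Println(addr, err)
}
```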

Observability, Tooling, and Community

Cross-cutting concerns are factored into shared libraries. mlog is a context-aware logging package built on zap whose design goals include mandatory context passing, zero-overhead abstraction via type aliases, automatic field accumulation across the call chain, cross-service gRPC metadata propagation, and lazy field encoding for disabled log levels Source: pkg/mlog/README.md:5-11. Development workflow is automated by tools/mgit.py, which produces Conventional-Commits-style messages, signs the DCO, creates a feature branch, opens a PR linked to a GitHub issue, and supports cherry-pick to release branches such as 2.6 and 2.5 Source: tools/README.md:30-60; all PRs must include an issue: #NNNNN reference Source: tools/README.md:64-70.

Long-standing community requests have shaped the architecture: ongoing demand for string primary keys (#1924), a string scalar field type (#4430), backup/restore (#9685), in-place schema evolution on non-empty collections (#20405), and ScaNN index support (#2771) reflect where users push the distributed model. The recently merged nullable StructArray work, exercised by issue #50333 (an ASAN heap-buffer-overflow when querying a dynamically added nullable StructArray with element_filter), shows that schema evolution remains an active surface Source: issue #50333.

Data Management, Storage Engine, and Indexing

Related topics: Milvus Overview and Distributed Architecture, Query, Search Execution, and Streaming

Overview

Milvus organizes data management, storage, and indexing across three cooperating subsystems: (1) an in-process message queue (RocksMQ) that buffers streaming writes, (2) a streaming coordinator that assigns physical channels (PChannels) to streaming nodes, and (3) a core engine that holds chunked/growing segments and exposes indexes. The retrieved source files focus on the first two layers — RocksMQ's topic lifecycle, retention, and the streaming channel balancer — which together form the data ingestion substrate. Index construction itself lives in the C++ core; this page documents the surrounding storage machinery and channel topology that any index ultimately feeds on.

Message Queue Storage: RocksMQ

RocksMQ is the lightweight, embedded message queue Milvus uses when an external broker is not configured. It persists topics inside RocksDB and writes per-message entries keyed by topic and monotonically increasing message ID.

Topic Lifecycle

A topic is created on demand the first time it is referenced. The implementation rejects topic names containing /, locks the topic in a global topicMu map, and seeds metadata keys (TopicIDTitle, message-size counters, page-size entries) in the underlying KV store. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go.

// Topic names become part of RocksDB keys, so "/" is rejected outright.
if strings.Contains(topicName, "/") {
    return retry.Unrecoverable(fmt.Errorf("topic name = %s contains \"/\"", topicName))
}
// A non-empty topic ID entry means the topic was already created.
topicIDKey := TopicIDTitle + topicName
val, err := rmq.kv.Load(context.TODO(), topicIDKey)
if err != nil {
    return err
}
if val != "" {
    return nil // topic already exists
}

DestroyTopic performs a coordinated cleanup under the same per-topic lock: it removes message data, page-size and page-timestamp prefixes, the acknowledged-timestamp index, and the topic header atomically via MultiRemove. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go.

Message Production and Page Tracking

Producers append messages and update a running "current page" size counter. When a new message would overflow the configured PageSize, the producer finalizes the current page by writing a PageMsgSizeTitle/{endID} entry and a timestamp entry, then resets the counter. This yields a paged on-disk layout that retention can later walk efficiently. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go.

Consumer Groups

A consumerList is attached to every topic and keyed by group name. RegisterConsumer ensures each group is registered once, and getCurrentID loads the last-acked position from the KV store so that restarts resume from the same offset. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go.
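A minimal sketch of a once-only group registry guarded by a read/write mutex is shown below. The consumerRegistry type is hypothetical and stores a plain string per group; the actual consumerList holds richer consumer state and also persists the last-acked position, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// consumerRegistry is a hypothetical stand-in for a per-topic consumer list:
// group name -> consumer handle, guarded by a read/write mutex.
type consumerRegistry struct {
	mu     sync.RWMutex
	groups map[string]string
}

func newConsumerRegistry() *consumerRegistry {
	return &consumerRegistry{groups: make(map[string]string)}
}

// Add registers a group once; it reports false when the group already exists.
func (r *consumerRegistry) Add(group, handle string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.groups[group]; ok {
		return false
	}
	r.groups[group] = handle
	return true
}

// Get takes only the read lock, so concurrent lookups do not block each other.
func (r *consumerRegistry) Get(group string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	h, ok := r.groups[group]
	return h, ok
}

// Remove drops a group; removing an unknown group is a no-op.
func (r *consumerRegistry) Remove(group string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.groups, group)
}

func main() {
	reg := newConsumerRegistry()
	fmt.Println(reg.Add("group-a", "consumer-1")) // true: first registration
	fmt.Println(reg.Add("group-a", "consumer-2")) // false: duplicate rejected
	reg.Remove("group-a")
	_, ok := reg.Get("group-a")
	fmt.Println(ok) // false
}
```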

Data Retention

Retention is the bridge between unbounded message growth and bounded disk usage. RocksMQ supports two independent policies, size-based and time-based; both, along with the page size they operate on, are controlled through paramtable:

Knob	Default behavior	Effect
`RocksmqCfg.RetentionSizeInMB`	`-1` (disabled)	Triggers page deletion once acked bytes exceed the threshold.
`RocksmqCfg.RetentionTimeInMinutes`	`-1` (disabled)	Triggers page deletion once an acked page is older than the threshold.
`RocksmqCfg.PageSize`	default page size in MB	Sets the page granularity for production and retention sweeps.

Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go, pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go.

Retention scans the PageMsgSizeTitle/{topic}/ prefix in order, then loads the corresponding AckedTsTitle/{pageID} timestamp. Pages that are fully acked and either exceed the size budget or are older than the retention time are deleted in a single RocksDB WriteBatch that also clears their acked-timestamp entries. The same loop never deletes a page that still has unacked consumers. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go.
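The sweep logic can be approximated with the sketch below. It models pages as in-memory structs rather than RocksDB keys; pageInfo, reclaimable, and the way the acked-byte total is decremented are assumptions made for illustration, and a negative budget or age stands in for the -1 "disabled" setting.

```go
package main

import (
	"fmt"
	"time"
)

// pageInfo is a hypothetical in-memory view of one page's metadata.
type pageInfo struct {
	EndID   int64
	Size    int64 // bytes
	Acked   bool
	AckedAt time.Time
}

// reclaimable walks pages in ID order and returns the end IDs of pages that
// may be deleted. It stops at the first page that is unacked or not expired.
// A negative sizeBudget or maxAge disables that policy.
func reclaimable(pages []pageInfo, ackedBytes, sizeBudget int64, maxAge time.Duration, now time.Time) []int64 {
	var victims []int64
	for _, p := range pages {
		if !p.Acked {
			break // never delete past a page with unacked consumers
		}
		overBudget := sizeBudget >= 0 && ackedBytes > sizeBudget
		tooOld := maxAge >= 0 && now.Sub(p.AckedAt) > maxAge
		if !overBudget && !tooOld {
			break
		}
		victims = append(victims, p.EndID)
		ackedBytes -= p.Size
	}
	return victims
}

func main() {
	now := time.Now()
	pages := []pageInfo{
		{EndID: 10, Size: 5, Acked: true, AckedAt: now},
		{EndID: 20, Size: 5, Acked: true, AckedAt: now},
		{EndID: 30, Size: 5, Acked: true, AckedAt: now},
		{EndID: 40, Size: 5, Acked: false},
	}
	// 15 acked bytes against a 6-byte budget, time policy disabled.
	fmt.Println(reclaimable(pages, 15, 6, -1, now)) // [10 20]
}
```

In the example the third page survives because deleting the first two already brings the acked total under the budget.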

The retention sweep can be exercised in tests: TestRmqRetention_PageSizeExpire forces PageSize=10 and RetentionSizeInMB=1, produces ~100,000 small messages, consumes them, and asserts that pages are reclaimed. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention_test.go.

Streaming Channel Coordination

Above RocksMQ sits the streaming coordinator, which assigns PChannels to streaming nodes. The PChannelMeta wrapper exposes the channel's state machine:

stateDiagram-v2
    [*] --> PCHANNEL_META_STATE_UNASSIGNED
    PCHANNEL_META_STATE_UNASSIGNED --> PCHANNEL_META_STATE_ASSIGNING
    PCHANNEL_META_STATE_ASSIGNING --> PCHANNEL_META_STATE_ASSIGNED
    PCHANNEL_META_STATE_ASSIGNED --> PCHANNEL_META_STATE_ASSIGNING
    PCHANNEL_META_STATE_ASSIGNING --> PCHANNEL_META_STATE_UNASSIGNED
    PCHANNEL_META_STATE_ASSIGNED --> [*]

State helpers IsAssigned and IsAssignedOrAssigning gate operations that must wait for stable ownership. Each transition writes a history entry (Term, AccessMode, Node) so the metadata can be replayed for diagnostics. Source: internal/streamingcoord/server/balancer/channel/pchannel.go.
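The state diagram and its two helpers can be expressed as a small transition table. This is a sketch for illustration: the lower-case names are hypothetical, the states are modeled as strings rather than protobuf enums, and history recording is left out.

```go
package main

import "fmt"

type pchannelState string

const (
	unassigned pchannelState = "PCHANNEL_META_STATE_UNASSIGNED"
	assigning  pchannelState = "PCHANNEL_META_STATE_ASSIGNING"
	assigned   pchannelState = "PCHANNEL_META_STATE_ASSIGNED"
)

// transitions mirrors the arrows of the state diagram above.
var transitions = map[pchannelState][]pchannelState{
	unassigned: {assigning},
	assigning:  {assigned, unassigned},
	assigned:   {assigning},
}

// canTransition reports whether the diagram contains an edge from -> to.
func canTransition(from, to pchannelState) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// isAssigned and isAssignedOrAssigning model the two gating helpers.
func isAssigned(s pchannelState) bool { return s == assigned }

func isAssignedOrAssigning(s pchannelState) bool {
	return s == assigned || s == assigning
}

func main() {
	fmt.Println(canTransition(unassigned, assigning)) // true
	fmt.Println(canTransition(unassigned, assigned))  // false
	fmt.Println(isAssignedOrAssigning(assigning))     // true
}
```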

The vchannel-fair balancer distributes PChannels across nodes roughly proportional to the virtual-channel count each one already serves. Tests build an assignmentSnapshot with explicit ChannelsToNodes and AllNodesInfo maps and assert that the fair policy produces a balanced reassignment. Source: internal/streamingcoord/server/balancer/policy/vchannelfair/pchannel_count_fair_test.go.
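As a rough illustration of what "balanced by vchannel count" means, the sketch below sums vchannels per node and reports the gap between the busiest and idlest node. It is not the vchannelfair algorithm or its scoring formula: vchannelLoad and spread are hypothetical helpers, and max-minus-min is a deliberately simple imbalance measure chosen for this example.

```go
package main

import "fmt"

// vchannelLoad sums, per node, the vchannels carried by the pchannels that
// the node owns. Nodes without channels are reported with a load of zero.
func vchannelLoad(channelToNode map[string]int64, vchannels map[string]int, nodes []int64) map[int64]int {
	load := make(map[int64]int, len(nodes))
	for _, n := range nodes {
		load[n] = 0
	}
	for ch, node := range channelToNode {
		load[node] += vchannels[ch]
	}
	return load
}

// spread returns max load minus min load; zero means perfectly even.
func spread(load map[int64]int) int {
	first := true
	lo, hi := 0, 0
	for _, v := range load {
		if first {
			lo, hi, first = v, v, false
			continue
		}
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return hi - lo
}

func main() {
	vch := map[string]int{"p0": 3, "p1": 2, "p2": 1}
	nodes := []int64{1, 2, 3}

	before := map[string]int64{"p0": 1, "p1": 1, "p2": 2}
	fmt.Println(spread(vchannelLoad(before, vch, nodes))) // 5

	after := map[string]int64{"p0": 1, "p1": 3, "p2": 2} // p1 moved to node 3
	fmt.Println(spread(vchannelLoad(after, vch, nodes))) // 2
}
```

Moving a single pchannel to the idle node lowers the spread from 5 to 2, which is the kind of improvement a fairness policy looks for.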

Development Tooling and Community Context

The tools/mgit.py script wraps commit and PR creation with AI-assisted messages, automatic DCO signing, and optional cherry-picks to release branches (e.g. 2.6, 2.5). This matters for storage-engine work because retention bugs and channel-assignment fixes typically need to land on multiple release branches. Source: tools/README.md.

Community discussions show that the storage layer is the bottleneck for several long-standing feature requests: dynamic schema evolution (issue #20405), string primary keys (issue #1924), ScaNN index support (issue #2771), and backup/restore (issue #9685). Each of these ultimately requires coordination between the segment manager (C++ core), the data coordinator, and the channel balancer documented above. Community bug #50333 (ASAN heap-buffer-overflow on a dynamically added nullable StructArray with element_filter) is a recent reminder that schema changes must stay aligned with the storage codecs.

The C++ core vendors the fluent::NamedType template to express strong types (e.g. distinct Width/Height types) — a pattern used throughout the indexing and storage code to prevent silent unit confusion. Source: internal/core/thirdparty/NamedType/README.md.

Common Failure Modes

Topic name rejected: a / in the topic name returns retry.Unrecoverable from CreateTopic. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go.
Retention starved by unacked consumers: the retention loop breaks as soon as it sees a page with no AckedTsTitle entry, so a stuck consumer group will keep pages alive indefinitely. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go.
Stale consumer offset: updateAckedInfo returns an error if a consumer group is not registered, which the tests assert explicitly. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl_test.go.
Channel reassignment storms: the vchannel-fair policy recomputes on every snapshot; extreme skew in vchannel counts can trigger repeated reassignments. Source: internal/streamingcoord/server/balancer/policy/vchannelfair/pchannel_count_fair_test.go.

Query, Search Execution, and Streaming

Related topics: Milvus Overview and Distributed Architecture, Data Management, Storage Engine, and Indexing

Overview

In Milvus, query and search execution depends on a streaming substrate that carries data insertions, deletions, and control messages between Proxy, QueryNodes, DataNodes, and the StreamingCoord. The streaming layer uses physical channels (PChannels) and virtual channels (VChannels) to shard traffic, RocksMQ as the default embedded message queue, and a balancer that assigns PChannels to streaming nodes. The components covered in this page describe the queueing primitives, the retention policies, and the channel-balancing logic that together underpin every search and query path in Milvus. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:1-30

Streaming Architecture and Channel Topology

The streaming coordinator maintains metadata about physical channels and the nodes that own them. A PChannelMeta exposes assignment state, history, and timing. Four helper methods expose this information:

Method	Purpose
`IsAssigned()`	Returns true when the channel is in the `PCHANNEL_META_STATE_ASSIGNED` state
`IsAssignedOrAssigning()`	Returns true for both `ASSIGNED` and `ASSIGNING` states
`LastAssignTimestamp()`	Returns the last assignment time as a `time.Time`
`CopyForWrite()`	Produces a mutable copy of the channel for safe mutation

Source: internal/streamingcoord/server/balancer/channel/pchannel.go:1-50

The vchannelfair balancer builds an AssignmentLayout snapshot that includes the set of channels, the node status map, an ExpectedAccessMode map, and statistics for each channel (vchannel counts). A test helper newLayout populates the layout with StreamingNodeInfo, PChannelInfo, and per-channel PChannelStatsView so that fairness algorithms can reason about the distribution. Source: internal/streamingcoord/server/balancer/policy/vchannelfair/pchannel_count_fair_test.go:1-60

flowchart LR
    Proxy[Proxy / SDK] -->|writes| PChannel1[PChannel 1]
    Proxy -->|writes| PChannel2[PChannel 2]
    PChannel1 --> RMQ[RocksMQ Broker]
    PChannel2 --> RMQ
    RMQ --> QN1[QueryNode 1]
    RMQ --> QN2[QueryNode 2]
    SC[StreamingCoord] -->|balances| QN1
    SC -->|balances| QN2
    QN1 --> Search[Search / Query Execution]
    QN2 --> Search

The diagram above shows how Proxy writes flow into PChannels, are persisted by RocksMQ, are assigned to specific streaming nodes by the coordinator, and are finally consumed by the query layer that performs vector and scalar search. The 2.6 release line continues to ship additions such as EmbeddingList / MAX_SIM search and struct-array vector sub-fields that ride on this same channel infrastructure. Source: client/v2.6.4 release notes

RocksMQ: The Embedded Message Queue

RocksMQ is a RocksDB-backed message queue that Milvus uses by default when an external broker (such as Kafka or Pulsar) is not configured. It implements the Producer and Consumer interfaces for streaming nodes. The implementation supports topic creation and destruction, message production, consumer-group management, and retention.

Topic Lifecycle

CreateTopic refuses names containing /, because the keyspace encodes the topic name into RocksDB keys. The function checks for an existing topic ID, registers a per-topic sync.Mutex in the global topicMu map, and stores consumer-tracking data structures. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:CreateTopic

DestroyTopic acquires the per-topic mutex, removes consumer state from rmq.consumers, clears topicName2LatestMsgID, and issues prefix deletions for the topic data, page-size metadata, page-timestamp metadata, and acknowledged-timestamp metadata. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:DestroyTopic

Produce / Consume

Produce validates input, locks the topic mutex, allocates a contiguous message-ID range via allocMsgID, builds a gorocksdb.WriteBatch, and updates per-page message-size and timestamp metadata in a single MultiSave call. The page boundaries are computed using the configured PageSize. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:Produce
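Reserving a contiguous ID range for a batch can be sketched as below. The idAllocator type is hypothetical and keeps its counter in memory; how allocMsgID actually sources and persists IDs is not shown in the text above, so treat this only as an illustration of the half-open range idea.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// idAllocator is a hypothetical in-memory ID source.
type idAllocator struct {
	mu   sync.Mutex
	next int64
}

// Alloc reserves n consecutive IDs and returns the half-open range
// [start, end). Ranges handed to concurrent callers never overlap.
func (a *idAllocator) Alloc(n int64) (start, end int64, err error) {
	if n <= 0 {
		return 0, 0, errors.New("batch size must be positive")
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	start = a.next
	a.next += n
	return start, a.next, nil
}

func main() {
	a := &idAllocator{next: 1}
	s, e, _ := a.Alloc(3)
	fmt.Println(s, e) // 1 4
	s, e, _ = a.Alloc(2)
	fmt.Println(s, e) // 4 6
}
```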

consumerList tracks active consumers by group name and provides Add, Remove, and Get operations guarded by an RWMutex. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:consumerList

Retention and Cleanup

Retention is driven by RetentionSizeInMB and RetentionTimeInMinutes parameters. checkRetention returns true when at least one of them is set to a value other than -1. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:checkRetention
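The rule reduces to a single boolean expression. The function below is a sketch with a hypothetical name and plain integer parameters; it only restates the -1 convention described above.

```go
package main

import "fmt"

// retentionEnabled restates the rule above: -1 disables a policy, and
// retention is active as soon as either policy is enabled.
func retentionEnabled(retentionSizeInMB, retentionTimeInMinutes int64) bool {
	return retentionSizeInMB != -1 || retentionTimeInMinutes != -1
}

func main() {
	fmt.Println(retentionEnabled(-1, -1))   // false
	fmt.Println(retentionEnabled(1024, -1)) // true
	fmt.Println(retentionEnabled(-1, 60))   // true
}
```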

The retention engine in pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go iterates page metadata using an iterator with an upper bound, checks whether each page has been acknowledged, and either deletes the page range (using DeleteRange on RocksDB) or stops iterating. Time-based and size-based expiration use the msgTimeExpiredCheck and msgSizeExpiredCheck helpers and accumulate the deletedAckedSize and pageCleaned counters. The write batch removes the acked-timestamp range [startID, endID+1) in addition to the message range. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go:1-120

The retention test exercises the path by producing 100,000 messages, registering a consumer, draining all messages, and confirming that messages produced afterwards are not cleaned up. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention_test.go:1-80

Test Doubles

A MockRocksMQ generated by mockery is exposed in pkg/mq/mqimpl/rocksmq/server/mock_rocksmq.go to allow other packages to stub the queue. It supports the standard On(...).Return(...) pattern for Produce and other methods. This decoupling lets the search/query code path be exercised without a live RocksDB instance. Source: pkg/mq/mqimpl/rocksmq/server/mock_rocksmq.go:Produce

Channel Balancing for Query Workload

The vchannelfair balancer treats the layout as a set of ChannelID → PChannelInfoAssigned mappings plus per-channel stats. Test code in pchannel_count_fair_test.go populates the ChannelsToNodes and Stats fields and verifies that the assignment snapshot preserves them across clone operations. The assignmentSnapshot carries Assignments, and the balancer attempts to spread PChannels across streaming nodes so that the number of vchannels per node remains balanced. Source: internal/streamingcoord/server/balancer/policy/vchannelfair/pchannel_count_fair_test.go:TestAssignmentClone

Common Failure Modes and Considerations

Topic name with /: CreateTopic rejects it as Unrecoverable, because RocksDB key construction (constructKey) does not escape the separator. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:constructKey
Page size overflow: When the next message would exceed PageSize, a new page boundary is recorded and a fresh accumulator starts. Misconfiguration of PageSize can lead to many small pages and increased retention pressure. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:Produce
Retention disabled: RetentionSizeInMB = -1 and RetentionTimeInMinutes = -1 disable expiration, so operators must be careful not to fill the disk. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:checkRetention
Stale consumer acks: If a consumer group is destroyed but the channel still has acked-timestamp entries, updateAckedInfo returns an error, preventing the retention loop from advancing past that page. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl_test.go:updateAckedInfo
Schema evolution: Long-standing community request #20405 asks for modifying the collection schema after creation; the streaming layer must mediate the re-shard, which is why channel assignment and CopyForWrite exist. Source: issue #20405
Backup and restore: Issue #9685 highlights that streaming data is the central state that any backup must capture consistently with the segment state on QueryNodes. Source: issue #9685

Client SDKs, Deployment, and Common Operational Issues

Related topics: Milvus Overview and Distributed Architecture, Data Management, Storage Engine, and Indexing, Query, Search Execution, and Streaming

Overview

Milvus is a cloud-native vector database that ships a multi-language client SDK family (Python pymilvus, Go client/v2, Java, and Node.js) alongside a server composed of stateless proxies, worker nodes, a streaming coordinator, and a pluggable message-queue backend. As of the v2.6.13–v2.6.18 and v3.0-beta release lines, the server and SDKs evolve on a coordinated cadence: each Milvus release pins a specific SDK version per language to keep wire-protocol features aligned (see the version matrices in the milvus-2.6.17 and milvus-2.6.16 release notes). The latest tagged Milvus server release in the community context is milvus-2.6.18, and the most recent dedicated client release is client/v2.6.5.

This page synthesizes what the in-tree source code reveals about deployment topology, the message-queue backend, channel balancing, and the developer workflow tooling, and cross-references the most upvoted community issues that surface as common operational pain points.

Deployment Topology and Message-Queue Backends

Milvus supports both standalone and cluster deployments. The cluster mode is organized around the streamingcoord component, which owns the lifecycle of physical channels (PChannel) and the assignment of virtual channels (VChannel) to streaming nodes. The VChannel-fair balancer keeps track of how many VChannels each node currently serves and computes a GlobalUnbalancedScore to drive reassignment decisions. Source: internal/streamingcoord/server/balancer/policy/vchannelfair/expected_layout.go:1-40.

A PChannelMeta carries the assignment state (ASSIGNED / ASSIGNING) and the full history of which node owned the channel in past terms. Source: internal/streamingcoord/server/balancer/channel/pchannel.go:30-60. This is the data structure the balancer reads and writes to recover layout after a restart — see the recovery test for proxy session discovery in internal/streamingcoord/server/balancer/balancer_test.go:30-60.

The hot path of data ingestion flows through the message queue. Milvus ships with RocksMQ, a RocksDB-backed implementation that does not require an external Kafka/Pulsar cluster; its key operational parameters live in the param table. The diagram below shows where the queue sits in the request path:

flowchart LR
  SDK[Client SDK] -->|gRPC| P[Proxy]
  P --> MQ{RocksMQ / Kafka / Pulsar}
  MQ --> W[Worker / StreamingNode]
  W --> S[Storage<br/>MinIO/S3/OSS]
  W -->|results| P --> SDK
  SC[Streaming Coordinator] -. assigns .-> W
  SC -. owns .-> MQ

RocksMQ enforces per-topic page-level retention. Retention is controlled by RocksmqCfg.RetentionSizeInMB and RocksmqCfg.RetentionTimeInMinutes; if either is set to a value other than -1, the retention worker wakes up and trims acked pages. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_impl.go:60-75. The retention sweeper iterates pages in order, computing curDeleteSize and breaking out of the loop as soon as it hits a page that is either un-acked or not yet expired, which is what guarantees monotonic progress without re-scanning the whole topic. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention.go:1-120.

MQ Backend	External Dependency	Typical Use
RocksMQ	None (embedded)	Standalone / small cluster, dev & tests
Kafka	Kafka cluster	Production, large fan-out, DLQ workflows
Pulsar	Pulsar cluster	Production with BookKeeper tiered storage

The retention test suite documents the expected operator-visible behavior: after a Produce of 100k messages and a Consume of all of them followed by a ForceSeek back to the earliest id, the page must still be servable because retention respects ack state. Source: pkg/mq/mqimpl/rocksmq/server/rocksmq_retention_test.go:1-90.

Common Operational Issues From Community Discussions

Several recurring issues surface in the issue tracker that operators should plan around:

Schema mutation on populated collections: issue #20405 (15 comments) requests the ability to add fields to a non-empty collection. Until a built-in AddCollectionField RPC is fully general, the client/v2.6.5 release notes show that vector fields added to existing collections are validated to be nullable before the request is sent. Plan schema changes at design time, and make any post-hoc vector columns nullable.
String primary keys: issue #1924 (17 comments) is still one of the most-engaged legacy requests. Some SDK helpers (e.g. ARRAY_APPEND / ARRAY_REMOVE upsert helpers) now ship in client/v2.6.5 for string-keyed workloads.
Backup and restore: issue #9685 drives the Milvus Backup companion project. Verify that any backup tool you adopt supports the MilvusVersion of your cluster; the v2.6 line supports client/v2.6.5 Go SDK features.
ASAN heap-buffer-overflow in nullable StructArray with element_filter: issue #50333 is a known correctness bug in master at commit b32b0ad8be; affected deployments should pin to a tagged release (e.g. milvus-2.6.18) and avoid combining dynamic field addition with element_filter queries on nullable struct arrays.
ScaNN index support: issue #2771 (21 comments) remains a long-standing feature request; native ScaNN is not part of the supported index types listed in the SDK release matrices.
String scalar field type: issue #4430 (15 comments) overlaps with the VARCHAR support that already exists for keys; if a non-key VARCHAR field is required, validate against your target server version.

Developer Workflow: `mgit.py`

The repository ships a Python helper under tools/ that wraps the commit and PR workflow in a single command. It enforces Milvus conventions: branch naming ({type}/{description}-{timestamp}), commit-message prefixes (feat:, fix:, enhance:, refactor:, test:, docs:, chore:), DCO sign-off, and an optional AI-generated commit message (Gemini / Claude / OpenAI). Source: tools/README.md:1-80.

Typical usage:

# Stage and commit
python3 tools/mgit.py --commit

# Push and open a PR (with mandatory issue reference)
python3 tools/mgit.py --pr

# End-to-end: create branch, commit, push, open PR, cherry-pick to release branches
python3 tools/mgit.py --all

The tool refuses to commit on master, prompts before running make fmt / make static-check, and requires every PR body to include an issue: #NNNNN reference, plus a pr: #NNNNN line for cherry-picks. Source: tools/README.md:80-200. If an AI key is not configured, the script silently falls back to manual message entry, so contributors can still participate.
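The two conventions that are easiest to check mechanically, the commit-message prefix and the issue reference, can be sketched with regular expressions. This is not code from mgit.py (which is a Python script): the function names are hypothetical, and requiring a space after the prefix colon and placing the issue line on its own line are assumptions of this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// commitPrefix accepts the prefixes listed above, followed by a space and
// some text. The space after the colon is an assumption of this sketch.
var commitPrefix = regexp.MustCompile(`^(feat|fix|enhance|refactor|test|docs|chore): \S`)

// issueRef looks for a line of the form "issue: #NNNNN" in a PR body.
var issueRef = regexp.MustCompile(`(?m)^issue: #\d+\s*$`)

func validCommitSubject(subject string) bool { return commitPrefix.MatchString(subject) }

func hasIssueRef(body string) bool { return issueRef.MatchString(body) }

func main() {
	fmt.Println(validCommitSubject("fix: stop retention at unacked page")) // true
	fmt.Println(validCommitSubject("update stuff"))                        // false
	fmt.Println(hasIssueRef("Fixes the sweep.\n\nissue: #50333\n"))        // true
}
```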

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 22 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/milvus-io/milvus/issues/50333

2. Installation risk: Installation risk requires verification

Severity: medium
Finding: Developers should check this installation risk before relying on the project: client/v2.6.3
User impact: Upgrade or migration may change expected behavior: client/v2.6.3
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: client/v2.6.3. Context: Observed during version upgrade or migration.
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/client/v2.6.3

3. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/milvus-io/milvus/issues/50282

4. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/milvus-io/milvus/issues/50384

5. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: client/v2.6.4
User impact: Upgrade or migration may change expected behavior: client/v2.6.4
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: client/v2.6.4. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/client/v2.6.4

6. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: milvus-2.6.13
User impact: Upgrade or migration may change expected behavior: milvus-2.6.13
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: milvus-2.6.13. Context: Observed when using node, python
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/v2.6.13

7. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: milvus-2.6.14
User impact: Upgrade or migration may change expected behavior: milvus-2.6.14
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: milvus-2.6.14. Context: Observed when using node, python
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/v2.6.14

8. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: milvus-2.6.15
User impact: Upgrade or migration may change expected behavior: milvus-2.6.15
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: milvus-2.6.15. Context: Observed when using node, python, cuda
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/v2.6.15

9. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: milvus-2.6.16
User impact: Upgrade or migration may change expected behavior: milvus-2.6.16
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: milvus-2.6.16. Context: Observed when using node, python
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/v2.6.16

10. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: milvus-2.6.17
User impact: Upgrade or migration may change expected behavior: milvus-2.6.17
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: milvus-2.6.17. Context: Observed when using node, python
Evidence: failure_mode_cluster:github_release | https://github.com/milvus-io/milvus/releases/tag/v2.6.17

11. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/milvus-io/milvus/issues/50452

12. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | github_repo:208728772 | https://github.com/milvus-io/milvus

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 project-level external discussion links are exposed on this manual page. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using milvus with real data or production workflows.

[[Bug]: Go SDK e2e-arm intermittently fails TestHybridSearchMultiVectorsP](https://github.com/milvus-io/milvus/issues/50384) - github / github_issue
[[Bug]: import with storage v3 is much slower than with storage v3](https://github.com/milvus-io/milvus/issues/50452) - github / github_issue
[[Bug]: DiskANN stream load misses nullable vector valid_data sidecar](https://github.com/milvus-io/milvus/issues/50282) - github / github_issue
[[Bug]: ASAN heap-buffer-overflow when querying a dynamically added nulla](https://github.com/milvus-io/milvus/issues/50333) - github / github_issue
milvus-2.6.18 - github / github_release
client/v2.6.5 - github / github_release
milvus-2.6.17 - github / github_release
milvus-2.6.16 - github / github_release
milvus-3.0.0-beta - github / github_release
client/v2.6.4 - github / github_release
milvus-2.6.15 - github / github_release
Installation risk requires verification - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
