Model Serving Runtime
Public
vLLM Inference Serving Pack
A vLLM capability pack covering high-throughput LLM inference serving, API deployment checks, memory budgets, and rollback boundaries.
Tag preview
1 preview project.