# https://github.com/las7/TakoVM Project Manual

Generated at: 2026-06-07 04:26:08 UTC

## Table of Contents

- [System Architecture & Core Components](#page-1)
- [Security Model, Sandboxing & Mitigations](#page-2)
- [Deployment, Scaling & Operations](#page-3)
- [SDK, CLI & Job Management](#page-4)

<a id='page-1'></a>

## System Architecture & Core Components

### Related Pages

Related topics: [Security Model, Sandboxing & Mitigations](#page-2), [Deployment, Scaling & Operations](#page-3), [SDK, CLI & Job Management](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [tako_vm/__init__.py](https://github.com/las7/TakoVM/blob/main/tako_vm/__init__.py)
- [tako_vm/models.py](https://github.com/las7/TakoVM/blob/main/tako_vm/models.py)
- [tako_vm/storage.py](https://github.com/las7/TakoVM/blob/main/tako_vm/storage.py)
- [tako_vm/job_types.py](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.py)
- [tako_vm/job_types.json](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.json)
- [tako_vm/version.py](https://github.com/las7/TakoVM/blob/main/tako_vm/version.py)
- [tako_vm/security.py](https://github.com/las7/TakoVM/blob/main/tako_vm/security.py)
- [tako_vm/config.py](https://github.com/las7/TakoVM/blob/main/tako_vm/config.py)
- [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py)
- [tako_vm/execution/builder.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/builder.py)
- [tako_vm/server/app.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/app.py)
- [README.md](https://github.com/las7/TakoVM/blob/main/README.md)
- [mkdocs.yml](https://github.com/las7/TakoVM/blob/main/mkdocs.yml)
</details>


Tako VM is a job-queue runtime that executes arbitrary Python code inside isolated Docker containers. The system is shipped as a single `tako_vm` package, exposed both as a Python SDK and as a FastAPI HTTP server, and backed by PostgreSQL for persistence (Source: [tako_vm/__init__.py:1-30](), [README.md:1-40]()). This page describes the runtime tiers and the core modules that compose the system, based directly on the source code.

## High-Level Architecture

The system is organized into three runtime tiers: an HTTP API, a persistence layer, and an execution layer that runs jobs in Docker containers.

```mermaid
flowchart LR
    Client["Client / Python SDK"] -->|"HTTP POST /jobs"| API["FastAPI Server<br/>(tako_vm/server/app.py)"]
    API -->|"persist + idempotency"| PG[("PostgreSQL<br/>(tako_vm/storage.py)")]
    API -->|"enqueue"| Queue["Job Queue"]
    Queue -->|"dequeue"| Worker["Worker Pool"]
    Worker -->|"docker run"| Container[("Job Container<br/>(job_types.json)")]
    Container -->|"stdout / stderr / result"| Worker
    Worker -->|"update ExecutionRecord"| PG
    API -->|"POST /build"| Builder["ContainerBuilder<br/>(tako_vm/execution/builder.py)"]
    Builder -->|"register JobVersion"| PG
```

The server is started by the `server` CLI subcommand and exposes a FastAPI application defined in `tako_vm/server/app.py`. Observed routes include job submission, status polling, dead-letter queue inspection, container build triggers, and pool statistics (Source: [tako_vm/server/app.py:1-120]()).

The CLI surface in `tako_vm/cli.py` is intentionally small: `server`, `dev up|down|status`, `status`, `validate`, `config`, `setup`, and `version` (Source: [tako_vm/cli.py:30-80]()). Community issues #30 and #37 flag that several docs reference a non-existent `tako-vm build job-type` subcommand and `python -m tako_vm.execution.builder` invocations; these are not in the actual parser and should not be used.

## Core Data Models

The canonical record returned from every execution is `ExecutionRecord`, a Pydantic `BaseModel` with `extra="forbid"`. It carries:

- `execution_id` — UUID generated on construction.
- `status` — one of `pending`, `queued`, `running`, `succeeded`, `failed`, `timeout`, `oom`, `cancelled`.
- `job_type` and `job_ref` — references into the job type registry (e.g., `svg-processing@sha256:a1b2c3d4`).
- Lifecycle timestamps: `created_at`, `queued_at`, `dequeued_at`, `started_at` (Source: [tako_vm/models.py:80-130]()).

`ResourceUsage` tracks `max_rss_mb`, `cpu_time_ms`, and `wall_time_ms` with hard upper bounds (e.g., wall time ≤ 24h) so that pathological values cannot overflow the schema (Source: [tako_vm/models.py:30-55]()). `JobVersion` records immutable build metadata: `digest`, `image_ref`, `dockerfile_hash`, `requirements_hash`, `built_at`, and `built_by` (Source: [tako_vm/models.py:130-170]()). All hash fields are validated to be either empty or 64-character lowercase hex (Source: [tako_vm/models.py:150-165]()).
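The hash-field rule can be sketched with the standard library alone. The helper name `validate_hash_field` is hypothetical, and the real models use Pydantic validators whose exact form is not shown here.

```python
import re

# Assumption: "64-character lowercase hex" means exactly [0-9a-f]{64}.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def validate_hash_field(value: str) -> str:
    """Return value unchanged if it is empty or a 64-char lowercase hex digest."""
    if value == "" or _HEX64.fullmatch(value):
        return value
    raise ValueError("hash must be empty or 64 lowercase hex characters")
```

Uppercase digests and truncated digests are rejected under this reading, so callers would need to normalize case before constructing a record.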

Canonical JSON serialization is performed by `sha256_json`, which uses sorted keys and minimal separators so that digests are deterministic regardless of dict ordering in caller code (Source: [tako_vm/models.py:15-25]()).
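A minimal sketch of this kind of canonical hashing is shown below. The signature and the choice of UTF-8 encoding are assumptions; only the sorted keys and minimal separators come from the description above.

```python
import hashlib
import json
from typing import Any


def sha256_json(obj: Any) -> str:
    """Hash a JSON-serializable object via a canonical encoding."""
    # Sorted keys and minimal separators make the byte encoding independent
    # of dict insertion order and of whitespace choices.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = sha256_json({"x": 1, "y": [1, 2]})
b = sha256_json({"y": [1, 2], "x": 1})
print(a == b)  # True: key order does not affect the digest
```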

Community note: Developers polling `/jobs/{id}` may be confused by the difference between `pending` (queue status) and `queued` (record status). These are deliberately separate lifecycle stages and are emitted by the queue/storage layer in that order — see issue #29.
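A client-side polling loop that tolerates both non-terminal stages might look like the following sketch. The `fetch_status` callable is hypothetical (for example, a wrapper around `GET /jobs/{id}`), and the set of terminal statuses is an assumption, since the source lists the status values but does not say which ones are final.

```python
import time
from typing import Callable

# Assumption: these statuses are terminal; the source lists the status
# values but does not state which ones are final.
TERMINAL_STATUSES = {"succeeded", "failed", "timeout", "oom", "cancelled"}


def wait_for_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 0.5,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is seen; pending/queued/running keep polling."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_polls} polls")
```

Treating `pending` and `queued` identically on the client side avoids depending on which of the two a given response happens to report.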

## Storage and Persistence Layer

`ExecutionStorage` in `tako_vm/storage.py` uses `asyncpg` against PostgreSQL. The schema is created on startup and centers on two tables:

- `execution_records` — one row per submitted job, indexed by `status`, `job_type`, `created_at`, and composite `(status, job_type, created_at)`. Idempotency is enforced via a unique partial index on `idempotency_key` (Source: [tako_vm/storage.py:15-60]()).
- `job_versions` — keyed by `digest`, storing image references, build hashes, and provenance fields (Source: [tako_vm/storage.py:120-145]()).
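The effect of a unique partial index on `idempotency_key` can be illustrated with an in-memory SQLite database. This is only a stand-in: the real layer uses `asyncpg` against PostgreSQL, and the two-column table, the index name, and the `submit` helper are all hypothetical.

```python
import sqlite3
import uuid
from typing import Optional

# Assumption: SQLite stands in for PostgreSQL, and the table is reduced to
# two columns; the real schema and index names are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE execution_records ("
    " execution_id TEXT PRIMARY KEY,"
    " idempotency_key TEXT)"
)
# Partial unique index: uniqueness applies only to rows that carry a key.
conn.execute(
    "CREATE UNIQUE INDEX idx_idempotency ON execution_records (idempotency_key)"
    " WHERE idempotency_key IS NOT NULL"
)


def submit(idempotency_key: Optional[str]) -> str:
    """Insert a record, or return the existing execution_id for a repeated key."""
    execution_id = str(uuid.uuid4())
    try:
        conn.execute(
            "INSERT INTO execution_records VALUES (?, ?)",
            (execution_id, idempotency_key),
        )
        return execution_id
    except sqlite3.IntegrityError:
        # The key already exists, so hand back the original record's id.
        row = conn.execute(
            "SELECT execution_id FROM execution_records WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0]
```

Because the index is partial, submissions without a key are never deduplicated, while a repeated key resolves to the first record.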

The `__init__` re-exports the public surface: `ExecutionRecord`, `ResourceUsage`, `Artifact`, `JobVersion`, `TakoVMConfig`, the `JobTypeRegistry`, and exception classes (`TakoVMError`, `SDKExecutionError`, `ValidationError`) (Source: [tako_vm/__init__.py:1-30]()). Configuration is loaded from a YAML file with Pydantic-validated bounds — for example, `ContainerLimits` rejects `nofile_soft` outside `[64, 65536]` (Source: [tako_vm/config.py:1-60]()).
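The bounds check on `nofile_soft` can be approximated with a plain dataclass. This is a sketch rather than the Pydantic model itself: the class name `ContainerLimitsSketch` and the default value are assumptions, and only the `[64, 65536]` range comes from the text.

```python
from dataclasses import dataclass

# Bounds for nofile_soft are taken from the text; the default value below
# is an assumption made only so the sketch is constructible.
NOFILE_MIN, NOFILE_MAX = 64, 65536


@dataclass(frozen=True)
class ContainerLimitsSketch:
    nofile_soft: int = 1024

    def __post_init__(self) -> None:
        # Reject out-of-range values at construction time.
        if not NOFILE_MIN <= self.nofile_soft <= NOFILE_MAX:
            raise ValueError(
                f"nofile_soft must be in [{NOFILE_MIN}, {NOFILE_MAX}], "
                f"got {self.nofile_soft}"
            )
```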

## Job Types, Versioning, and Security

### Job types

`JobType` is a dataclass describing a pre-configured container: `requirements`, `python_version`, `base_image`, `shared_code`, `environment`, `memory_limit`, `cpu_limit`, `timeout`, `startup_timeout`, `network_enabled`, `session_enabled`, and a nested `gpu` block (Source: [tako_vm/job_types.py:25-60]()). `JobTypeRegistry` loads defaults from `tako_vm/job_types.json`, which ships with `default`, `data-processing`, `ml-inference`, and `test-with-secrets` (Source: [tako_vm/job_types.json:1-100]()).

The `test-with-secrets` entry exists specifically to exercise the environment-variable sanitization path — values like `sk-secret-api-key-12345` and a `DATABASE_URL` containing a password demonstrate why environment exposure is treated as critical (issue #40). `ContainerBuilder.generate_dockerfile` validates the Python version, base image, requirements, and env keys/values before producing a Dockerfile (Source: [tako_vm/execution/builder.py:50-90]()).

### Versioning

`VersionManager.compute_digest` produces a content-addressed SHA256 over every field that affects the resulting image — requirements, Python version, base image, environment, shared code, and GPU configuration (Source: [tako_vm/version.py:35-60]()). The `JobVersion.full_ref` property formats references as `job_type@sha256:<short-digest>` (Source: [tako_vm/models.py:165-175]()). Versions are persisted via `register_version` and looked up by either full digest or short prefix (Source: [tako_vm/storage.py:120-180]()).
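The idea of a content-addressed digest can be sketched as follows. The function names, the dictionary keys, and the use of canonical JSON as the hash input are assumptions for illustration; the short-digest length is left as a parameter because the text does not fix it here.

```python
import hashlib
import json
from typing import Any, Dict


def compute_digest_sketch(job_type: Dict[str, Any]) -> str:
    """Hash only the fields that the text says affect the built image."""
    # Assumption: field names and canonical-JSON hashing are illustrative.
    relevant = {
        key: job_type.get(key)
        for key in (
            "requirements", "python_version", "base_image",
            "environment", "shared_code", "gpu",
        )
    }
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def full_ref(name: str, digest: str, short_len: int = 12) -> str:
    """Format a reference as name@sha256:<short-digest>."""
    return f"{name}@sha256:{digest[:short_len]}"
```

In this sketch, fields outside the listed set do not change the digest, so two definitions that differ only in such fields would resolve to the same version.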

### Security

`tako_vm/security.py` defines `SANITIZE_PATTERNS` for error messages — masking temp directories, user homes, internal IPs, and 64-character hex container IDs — and output caps default to 64 KiB for both stdout and stderr (Source: [tako_vm/security.py:15-30]()). These utilities are wired into the build route so that build failures returned to the client cannot leak container internals (Source: [tako_vm/server/app.py:50-90]()).

The architecture shown above reflects what is implemented in the source. As flagged in issue #34, the deployment/scaling docs previously conflated planned features (container pooling, distributed workers, S3 storage) with shipped behavior; readers should treat the modules described on this page as the current, supported surface.

## See Also

- [REST API Reference](https://github.com/las7/TakoVM/blob/main/docs/api/rest.md)
- [Python SDK Reference](https://github.com/las7/TakoVM/blob/main/docs/api/sdk.md)
- [Guide: Async Jobs](https://github.com/las7/TakoVM/blob/main/docs/guide/async-jobs.md)
- [Security: Mitigations](https://github.com/las7/TakoVM/blob/main/docs/security/mitigations.md)
- [Configuration Reference](https://github.com/las7/TakoVM/blob/main/tako_vm.yaml.example)

---

<a id='page-2'></a>

## Security Model, Sandboxing & Mitigations

### Related Pages

Related topics: [System Architecture & Core Components](#page-1), [Deployment, Scaling & Operations](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [tako_vm/security.py](https://github.com/las7/TakoVM/blob/main/tako_vm/security.py)
- [tako_vm/models.py](https://github.com/las7/TakoVM/blob/main/tako_vm/models.py)
- [tako_vm/config.py](https://github.com/las7/TakoVM/blob/main/tako_vm/config.py)
- [tako_vm/job_types.py](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.py)
- [tako_vm/execution/builder.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/builder.py)
- [tako_vm/execution/worker.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/worker.py)
- [tako_vm/server/limits.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/limits.py)
- [tako_vm/server/app.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/app.py)
- [tako_vm/version.py](https://github.com/las7/TakoVM/blob/main/tako_vm/version.py)
- [tako_vm/job_types.json](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.json)
- [README.md](https://github.com/las7/TakoVM/blob/main/README.md)
- [mkdocs.yml](https://github.com/las7/TakoVM/blob/main/mkdocs.yml)
</details>


Tako VM is positioned as a runtime for executing untrusted Python (including AI-generated code) inside isolated Docker containers. The security model is a **defense-in-depth** stack: each request is gated by API-layer protections, validated against Pydantic-bounded configuration, executed inside a container with explicit resource limits, and post-processed through output sanitization before being returned or persisted.

This page describes what the source code actually implements today. Where a mitigation is tracked but not yet implemented, this is called out explicitly, because the community has flagged that the existing `docs/security/mitigations.md` reads as an issue tracker rather than documentation (see [issue #32](https://github.com/las7/TakoVM/issues/32)).

## High-Level Architecture

```mermaid
flowchart LR
    Client[Client / SDK] -->|HTTP + correlation id| RL[Rate Limit + Payload Size Middleware]
    RL --> API[FastAPI App / Endpoints]
    API --> Q[Queue / Worker Pool]
    Q --> W[Worker]
    W --> CB[Container Builder]
    CB --> D[(Docker Container)]
    D -->|stdout / stderr / exit_code| W
    W --> S[Security Sanitizers]
    S --> ST[(PostgreSQL ExecutionStore)]
    S --> Client
```

The worker starts a container from the Docker image produced by the [ContainerBuilder](tako_vm/execution/builder.py), captures the result, and runs it through the sanitization utilities in [tako_vm/security.py](tako_vm/security.py) before persisting or returning it. Source: [tako_vm/server/app.py:1](tako_vm/server/app.py), [tako_vm/execution/worker.py:1](tako_vm/execution/worker.py).

## Sandboxing Layers

### Container Isolation

Every job runs inside a per-execution Docker container built from a `JobType` definition. A `JobType` ([tako_vm/job_types.py:1](tako_vm/job_types.py)) bundles:

| Field | Purpose | Default |
|-------|---------|---------|
| `base_image` | Custom image (validated by `validate_docker_image`) | `python:{version}-slim` |
| `requirements` | Pip packages installed in image | `[]` |
| `environment` | Env vars injected at build/run | `{}` |
| `memory_limit` | Container memory cap | `512m` |
| `cpu_limit` | Container CPU cap | `1.0` |
| `timeout` | Code execution timeout (s) | `30` |
| `network_enabled` | Network namespace toggle | `false` |
| `gpu.*` | GPU passthrough settings | disabled |

Source: [tako_vm/job_types.py:1](tako_vm/job_types.py), [tako_vm/execution/builder.py:1](tako_vm/execution/builder.py). The `default`, `data-processing`, and `ml-inference` job types in [tako_vm/job_types.json:1](tako_vm/job_types.json) all ship with `network_enabled: false`, which is the default posture for untrusted code.
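The defaults in the table can be mirrored in a small dataclass to show how the base-image fallback might work. This is a reduced sketch: the class name `JobTypeSketch`, the `resolved_base_image` method, and the default Python version are assumptions, and the GPU block and validation are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobTypeSketch:
    name: str
    python_version: str = "3.11"  # assumption: the default version is not stated above
    base_image: Optional[str] = None
    requirements: List[str] = field(default_factory=list)
    environment: Dict[str, str] = field(default_factory=dict)
    memory_limit: str = "512m"
    cpu_limit: float = 1.0
    timeout: int = 30
    network_enabled: bool = False  # networking stays off unless enabled explicitly

    def resolved_base_image(self) -> str:
        """Fall back to python:{version}-slim when no custom image is set."""
        return self.base_image or f"python:{self.python_version}-slim"
```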

### Resource Limits and Validation

Container-level knobs are backed by validated Pydantic models. `ContainerLimits` ([tako_vm/config.py:1](tako_vm/config.py)) enforces tight bounds — e.g. `nofile_soft` between 64 and 65536 — so a misconfiguration cannot accidentally disable isolation. Configuration is loaded from a YAML file with env-var overrides, and only files in the search path (`./tako_vm.yaml`, `config/tako_vm.yaml`, `~/.tako_vm/config.yaml`, `/etc/tako_vm/config.yaml`) are honored.

### Identity and Lineage

The `ExecutionRecord` model ([tako_vm/models.py:1](tako_vm/models.py)) carries `execution_id`, `parent_execution_id`, `relationship`, `idempotency_key`, and `idempotency_fingerprint`. The worker propagates these from incoming job data and stores `code_hash` and `input_hash` so each execution is independently reproducible and traceable. Source: [tako_vm/execution/worker.py:1](tako_vm/execution/worker.py).

## Output Protection and Sanitization

After a container finishes, the worker passes captured output through the helpers in [tako_vm/security.py:1](tako_vm/security.py). The module exposes two categories of protection:

### Output Capping

`DEFAULT_MAX_STDOUT_BYTES` and `DEFAULT_MAX_STDERR_BYTES` are both `65536` (64 KiB) by default. Anything beyond the cap is truncated and the corresponding `stdout_truncated` / `stderr_truncated` boolean is flipped on the persisted record. This bounds the size of what an attacker can exfiltrate through `print(...)` calls. Source: [tako_vm/security.py:1](tako_vm/security.py), [tako_vm/storage.py:1](tako_vm/storage.py) (columns `stdout_truncated`, `stderr_truncated`).
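A minimal sketch of the capping step, assuming truncation simply keeps the leading bytes. The helper name `cap_output` is hypothetical; only the constant name and its value come from the text.

```python
from typing import Tuple

DEFAULT_MAX_STDOUT_BYTES = 65536  # default cap stated above


def cap_output(
    data: bytes, max_bytes: int = DEFAULT_MAX_STDOUT_BYTES
) -> Tuple[bytes, bool]:
    """Return (possibly truncated data, truncated flag)."""
    if len(data) <= max_bytes:
        return data, False
    # Keep only the first max_bytes bytes and report the truncation.
    return data[:max_bytes], True
```

The returned flag corresponds to the `stdout_truncated` / `stderr_truncated` columns, so callers can tell a short output from a clipped one.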

### Pattern-Based Sanitization

`SANITIZE_PATTERNS` is a list of regex replacements applied to error messages and (where used) artifact text. Patterns cover temp directories, user home paths, container-internal paths, private IP ranges, and Docker container IDs. Examples:

```text
/tmp/job-AbCdEfGh   ->  /tmp/job-***
/Users/alice        ->  /home/***
/app/some/path      ->  /app/***
172.31.4.10         ->  172.***.***
a1f0e9d8c7b6... (64 hex)  ->  <container-id>
```

Source: [tako_vm/security.py:1](tako_vm/security.py). The intent is that even if a container internally references an absolute path, a private IP, or a 12-/64-character hex token, the value seen by the SDK caller is redacted.
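The redaction pass can be sketched as an ordered list of regex replacements. The patterns below are hypothetical: they are written only to reproduce the examples above, and the actual `SANITIZE_PATTERNS` entries may differ in both the regexes and the replacement strings.

```python
import re

# Hypothetical patterns chosen to reproduce the examples above; the real
# SANITIZE_PATTERNS list may differ in both regexes and replacements.
SANITIZE_PATTERNS_SKETCH = [
    (re.compile(r"/tmp/job-[A-Za-z0-9_]+"), "/tmp/job-***"),
    (re.compile(r"/(?:Users|home)/[^/\s]+"), "/home/***"),
    (re.compile(r"/app/[^\s:]+"), "/app/***"),
    # Covers only the private 172.16.0.0/12 range used in the example.
    (re.compile(r"\b172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}\b"), "172.***.***"),
    (re.compile(r"\b[0-9a-f]{64}\b"), "<container-id>"),
]


def sanitize(message: str) -> str:
    """Apply each redaction pattern in order."""
    for pattern, replacement in SANITIZE_PATTERNS_SKETCH:
        message = pattern.sub(replacement, message)
    return message
```

Order matters in a list like this: a broad pattern placed early can consume text that a later, more specific pattern was meant to match.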

## API-Layer Protections

The FastAPI server in [tako_vm/server/app.py](tako_vm/server/app.py) is wrapped by middleware defined in [tako_vm/server/limits.py](tako_vm/server/limits.py):

- **`FixedWindowRateLimiter`** — In-memory, per-key (typically client IP) fixed-window counter. Returns `(allowed, retry_after_seconds)`. OpenAPI and docs paths are exempt via `_RATE_LIMIT_EXEMPT_PATHS` and `_RATE_LIMIT_EXEMPT_PREFIXES`.
- **`PayloadTooLargeError`** — Raised when a request body exceeds the configured cap, enforced both via header (`Content-Length`) and via streaming body inspection. The middleware also assigns / propagates a correlation ID via `set_correlation_id` / `get_correlation_id` for log tracing.

These controls complement, not replace, container isolation: a flooded client cannot starve the queue, and an oversized body cannot be used to DoS the worker or blow up the PostgreSQL store. Source: [tako_vm/server/limits.py:1](tako_vm/server/limits.py).
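A fixed-window limiter that returns `(allowed, retry_after_seconds)` can be sketched in a few lines. The class name, constructor parameters, and `check` method are hypothetical; the real `FixedWindowRateLimiter` may track windows differently, and the injectable clock is added here only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiterSketch:
    """In-memory fixed-window counter keyed by client identifier."""

    def __init__(
        self,
        limit: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # key -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def check(self, key: str) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for one request."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # the previous window expired
        if count >= self.limit:
            return False, max(0.0, start + self.window_s - now)
        self._windows[key] = (start, count + 1)
        return True, 0.0
```

This sketch never evicts idle keys, so a long-running process would need some cleanup strategy; it also shares the usual fixed-window weakness of allowing a burst across a window boundary.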

## Known Gaps and Mitigations

A few risks are well-known to the community and are not yet fully closed by the code shown here:

- **Environment variable exposure.** The shipped `test-with-secrets` job type ([tako_vm/job_types.json](tako_vm/job_types.json)) demonstrates that `environment` values such as `API_KEY` and `DATABASE_URL` are set inside the container and are therefore readable from `/proc/self/environ` by any code running in it. The `SANITIZE_PATTERNS` in [tako_vm/security.py](tako_vm/security.py) redact some of these tokens from error messages, but they do **not** block programmatic reads inside the container. Community reports (issues [#40](https://github.com/las7/TakoVM/issues/40) and [#32](https://github.com/las7/TakoVM/issues/32)) flag this as **Priority: CRITICAL** and call for explicit warnings in the quickstart and a future mitigation plan.
- **Output capping is best-effort.** Truncation is done in the host worker, not the container itself, so a misbehaving job can still allocate memory up to its `memory_limit` before truncation. Tighten `memory_limit` for untrusted job types.
- **Idempotency vs. secrecy.** `idempotency_key` is stored as plain text in `execution_records` ([tako_vm/storage.py](tako_vm/storage.py)). Do not put secrets there.
- **Documented but not implemented.** `docs/security/mitigations.md` has been criticized for describing Phase 1–4 mitigations that are not in code yet (issue [#32](https://github.com/las7/TakoVM/issues/32)). Treat the security guide as a backlog, not a feature list.

The API reference ([docs/api/rest.md](https://las7.github.io/TakoVM/api/rest/)) and the [Configuration Reference](tako_vm.yaml.example) remain the authoritative sources for what is actually enforced today.

## See Also

- [Architecture Overview](https://las7.github.io/TakoVM/architecture/)
- [REST API Reference](https://las7.github.io/TakoVM/api/rest/)
- [Python SDK Reference](https://las7.github.io/TakoVM/api/sdk/)
- [Configuration](https://las7.github.io/TakoVM/getting-started/configuration/)
- Community issue tracker: [las7/TakoVM issues](https://github.com/las7/TakoVM/issues)

---

<a id='page-3'></a>

## Deployment, Scaling & Operations

### Related Pages

Related topics: [System Architecture & Core Components](#page-1), [Security Model, Sandboxing & Mitigations](#page-2), [SDK, CLI & Job Management](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [docker-compose.yaml](https://github.com/las7/TakoVM/blob/main/docker-compose.yaml)
- [docker/Dockerfile.server](https://github.com/las7/TakoVM/blob/main/docker/Dockerfile.server)
- [docker/Dockerfile.executor](https://github.com/las7/TakoVM/blob/main/docker/Dockerfile.executor)
- [docker/entrypoint.sh](https://github.com/las7/TakoVM/blob/main/docker/entrypoint.sh)
- [tako_vm.yaml.example](https://github.com/las7/TakoVM/blob/main/tako_vm.yaml.example)
- [tako_vm/config.py](https://github.com/las7/TakoVM/blob/main/tako_vm/config.py)
- [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py)
- [tako_vm/server/app.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/app.py)
- [tako_vm/storage.py](https://github.com/las7/TakoVM/blob/main/tako_vm/storage.py)
- [tako_vm/security.py](https://github.com/las7/TakoVM/blob/main/tako_vm/security.py)
- [tako_vm/models.py](https://github.com/las7/TakoVM/blob/main/tako_vm/models.py)
- [README.md](https://github.com/las7/TakoVM/blob/main/README.md)
</details>


Tako VM is delivered as a self-hosted, containerized job-queue and execution runtime. The system has two main container artifacts — the API/server image and the executor image — plus a PostgreSQL backing store. This page describes how the components are deployed, how configuration is loaded, what runtime telemetry endpoints exist for operations, and the scaling posture that is implemented today (versus features still marked as planned in related documentation).

## Deployment Topology

The deployment layout is two containers plus a database. It is started through the `tako-vm` CLI for development and through `docker-compose` for production.

```mermaid
flowchart LR
    Client[Client / SDK] -->|HTTP| Server[Tako VM API Server<br/>docker/Dockerfile.server]
    Server -->|SQL| DB[(PostgreSQL<br/>execution_records, job_versions)]
    Server -->|docker run| Exec[Executor Container<br/>docker/Dockerfile.executor]
    Exec -->|stdout/stderr/artifacts| Server
    Server -->|persists record| DB
```

The `tako-vm dev up --with-server` command brings up local PostgreSQL and the API server together, while `tako-vm server` runs the API server only against an already-running database. Source: [tako_vm/cli.py:1-50](). The `tako-vm setup` command performs the executor-image pull and Docker daemon verification required before any job can be executed; it is the documented prerequisite for any host that will dispatch work. Source: [tako_vm/cli.py:51-130]().

The entrypoint script (`docker/entrypoint.sh`) is responsible for starting the Uvicorn process inside the server image, while the executor image (`docker/Dockerfile.executor`) is the immutable worker environment in which every job runs. Both images are versioned and tied to the package `__version__` in `tako_vm/__init__.py`.

## Configuration and Resource Limits

Configuration is loaded from a YAML file with optional environment-variable overrides and validated by Pydantic models in `tako_vm/config.py`. The search order for the config file is `tako_vm.yaml` (cwd) → `config/tako_vm.yaml` → `~/.tako_vm/config.yaml` → `/etc/tako_vm/config.yaml`. Source: [tako_vm/config.py:1-40]().
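The search order can be expressed as a first-match lookup. This sketch assumes that `config/tako_vm.yaml` is resolved relative to the current working directory, and it does not model environment-variable overrides or the `--config` flag; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_search_paths() -> List[Path]:
    """Candidate config locations in the documented priority order."""
    return [
        Path.cwd() / "tako_vm.yaml",
        Path.cwd() / "config" / "tako_vm.yaml",  # assumption: relative to cwd
        Path.home() / ".tako_vm" / "config.yaml",
        Path("/etc/tako_vm/config.yaml"),
    ]


def find_config(candidates: Optional[Iterable[Path]] = None) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    paths = candidates if candidates is not None else default_search_paths()
    for path in paths:
        if path.is_file():
            return path
    return None
```

A lookup like this makes it easy to see why a stray `tako_vm.yaml` in the working directory would shadow a system-wide file under `/etc`.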

`ContainerLimits` enforces strict bounds on file-descriptor and process counts, rejecting values outside the accepted ranges (e.g. `nofile_soft` between 64 and 65536). Source: [tako_vm/config.py:40-80](). Per-job-type limits are declared on the `JobType` dataclass — `memory_limit`, `cpu_limit`, `timeout`, `startup_timeout`, plus GPU fields — and these flow into the generated Dockerfile and `docker run` invocation. Source: [tako_vm/job_types.py:1-100]().

Operators can validate any candidate configuration before applying it with `tako-vm validate <file>`, which surfaces schema errors rather than failing at runtime. Source: [tako_vm/cli.py:1-50]().

## Runtime Telemetry and Operations

Three operational endpoints are exposed on the API server, each implemented in `tako_vm/server/app.py`:

| Endpoint | Purpose | Source |
|----------|---------|--------|
| `GET /pool/stats` | Worker pool utilization and queue depth | [tako_vm/server/app.py:1-50]() |
| `GET /dlq/stats` | Dead-letter-queue counts grouped by `ErrorType` | [tako_vm/server/app.py:1-50]() |
| `GET /dlq` | Paginated DLQ entries for inspection/replay | [tako_vm/server/app.py:1-50]() |

Dead-letter-queue data is stored in the `execution_records` table together with `error_json`, `exit_code`, and `status` columns, so operators can query failures directly with SQL. Source: [tako_vm/storage.py:1-80]().

Health checks are surfaced by `tako-vm status`, which calls the server and reports reachability. Source: [tako_vm/cli.py:1-50](). Idempotency is enforced at the storage layer via a unique partial index on `idempotency_key`, so duplicate submissions collapse to a single record and the originating client receives the same `execution_id` on retry. Source: [tako_vm/storage.py:1-80]().

## Scaling Posture

What is implemented today:

- **Worker-pool model** with adjustable concurrency, observable through `/pool/stats`.
- **Per-job-type resource partitioning** (CPU, memory, GPU) declared in `JobType` and enforced by Docker at run time. Source: [tako_vm/job_types.py:1-100]().
- **Persistent execution history** that supports replay because the record stores `code_hash`, `input_hash`, `params_hash`, and `input_artifacts_hash`. Source: [tako_vm/models.py:1-80]().
- **Content-addressed job-type versions** registered through `VersionManager.register_version`, producing digests in the form `job_type@sha256:<12hex>`. Source: [tako_vm/version.py:1-80]().

What is **not** implemented in the source tree at this revision and is therefore aspirational:

- A `tako-vm build job-type` CLI subcommand is referenced in some user-facing documents but does **not** exist in `tako_vm/cli.py`; container builds are performed programmatically via `ContainerBuilder` and the build endpoint in the API. Source: [tako_vm/cli.py:1-50](), [tako_vm/execution/builder.py:1-60]().
- `python -m tako_vm.execution.builder --init-defaults all` and `python -m tako_vm.container_builder --build-all` are likewise absent from the codebase and should not be used as operational commands.
- Distributed workers and S3-backed artifact storage are documented in the scaling guide as planned features; the storage layer is PostgreSQL-only. Source: [tako_vm/storage.py:1-80]().

Operators scaling beyond a single host should plan for vertical scaling of the API server and executor pool, and for sharding at the application layer, rather than depending on multi-node worker orchestration that is not yet present in the repository.

## Security Boundaries in Production

Every job runs in its own container with the network disabled by default (`network_enabled=False` on `JobType`), and an allowlist can be applied per job type when network access is required. Source: [tako_vm/job_types.py:1-100](). Output is capped to the `DEFAULT_MAX_STDOUT_BYTES` / `DEFAULT_MAX_STDERR_BYTES` limits in `tako_vm/security.py`, and error messages are scrubbed of host paths, container IDs, and internal IP addresses before being persisted in `error_json`. Source: [tako_vm/security.py:1-60]().

Operators must take care when populating `JobType.environment`: secret values placed there are visible to job code at runtime and can be read through `/proc/self/environ` inside the container. Treat any value passed through this field as exposed to the job's code, and rotate it on a regular cadence.

## See Also

- [Architecture](architecture.md)
- [Production Setup](deployment/production.md)
- [Scaling](deployment/scaling.md)
- [Security](deployment/security.md)
- [Mitigations](security/mitigations.md)

---

<a id='page-4'></a>

## SDK, CLI & Job Management

### Related Pages

Related topics: [System Architecture & Core Components](#page-1), [Deployment, Scaling & Operations](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py)
- [tako_vm/sdk/__init__.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/__init__.py)
- [tako_vm/sdk/client.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/client.py)
- [tako_vm/job_types.py](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.py)
- [tako_vm/execution/builder.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/builder.py)
- [tako_vm/execution/worker.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/worker.py)
- [tako_vm/models.py](https://github.com/las7/TakoVM/blob/main/tako_vm/models.py)
- [tako_vm/version.py](https://github.com/las7/TakoVM/blob/main/tako_vm/version.py)
- [tako_vm/job_types.json](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.json)
- [tako_vm/config.py](https://github.com/las7/TakoVM/blob/main/tako_vm/config.py)
- [tako_vm/server/app.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/app.py)
- [README.md](https://github.com/las7/TakoVM/blob/main/README.md)
</details>


## Overview

Tako VM exposes three user-facing surfaces for submitting and managing jobs: the `tako-vm` command-line tool, the Python `tako_vm.sdk` client, and the REST API + worker pool that performs execution. Together they cover the full lifecycle: defining a job type, building its container, submitting code+data, monitoring status, and replaying past runs.

The CLI lives in [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py) and is the primary administrative interface. The SDK in [tako_vm/sdk/__init__.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/__init__.py) re-exports `send`, `send_raw`, `configure`, `list_job_types`, and `get_job_type` from [tako_vm/sdk/client.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/client.py). Job types are defined as dataclasses in [tako_vm/job_types.py](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.py), with sensible defaults loaded from [tako_vm/job_types.json](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.json).

## Command-Line Interface (CLI)

The CLI provides subcommands for the full operator workflow. The argparse setup in [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py) registers these subcommands:

| Subcommand or option | Purpose |
|------------|---------|
| `server` | Start the API server (optional `--port`) |
| `dev up/down/status` | Manage local PostgreSQL for development (`--with-server` starts API too) |
| `setup` | Pull the executor image and verify Docker |
| `config` | Show current config (`--json`, `--show-defaults`) |
| `validate [file]` | Validate a YAML config against the schema |
| `status` | Check server health |
| `version` | Print the installed Tako VM version |
| `--config <path>` | Global override for the config file path |

The `setup` subcommand checks Docker availability by running `docker info`, then pulls the default executor image as shown in [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py). The default image reference is exposed via `tako_vm.constants.DEFAULT_IMAGE` and resolved to `ghcr.io/las7/takovm/executor:<version>` using `tako_vm.__version__`.
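The two steps described above, checking Docker with `docker info` and resolving the executor image reference, can be approximated as follows. Both function names are hypothetical, the timeout is an arbitrary choice, and the sketch does not perform the image pull.

```python
import shutil
import subprocess


def executor_image_ref(version: str) -> str:
    """Build the executor image reference in the format quoted above."""
    return f"ghcr.io/las7/takovm/executor:{version}"


def docker_available(timeout_s: float = 10.0) -> bool:
    """Return True if `docker info` runs and exits successfully."""
    if shutil.which("docker") is None:
        return False  # no docker binary on PATH
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Checking the exit code of `docker info` distinguishes an installed but stopped daemon from a working one, which a simple lookup of the binary on `PATH` would not.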

> **Community note (issue #30):** Documentation in `docs/architecture.md`, `docs/guide/custom-libraries.md`, and `docs/development/troubleshooting.md` references a `tako-vm build job-type <name>` subcommand that does not exist in [tako_vm/cli.py](https://github.com/las7/TakoVM/blob/main/tako_vm/cli.py). Container builds are instead triggered via `POST /job-types/{name}/build` on the REST API or programmatically via `ContainerBuilder` in [tako_vm/execution/builder.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/builder.py). Similarly, issue #37 flags non-existent `python -m tako_vm.execution.builder` and `python -m tako_vm.container_builder` invocations referenced in deployment docs.

## Python SDK

The SDK is a thin typed wrapper that serializes a decorated function and ships it to the server. The public API is centralized in [tako_vm/sdk/__init__.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/__init__.py), exposing `TakoVM`, `ExecutionResult`, `ExecutionError`, `TakoVMError`, and `ValidationError`. Two calling styles are supported:

- **`send(func, *args, **kwargs)`** — decorator-style entry. The function body is serialized and executed remotely; the function does **not** run locally. Type hints drive argument serialization, not just type checking.
- **`send_raw(code, input_data, job_type=...)`** — explicit string payload, useful for dynamic code generation or when no Python client is available.

Discovery helpers `list_job_types()` and `get_job_type(name)` read the registry populated from the bundled [tako_vm/job_types.json](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.json), which ships `default`, `data-processing`, and `ml-inference` profiles out of the box, along with a `test-with-secrets` entry used to exercise environment-variable sanitization.

> **Community note (issue #39):** The `send()` execution model is easy to misread: developers may assume the function executes locally and the SDK only forwards results. In fact, the body is serialized and run inside a Docker container on the worker, and type annotations are part of the wire protocol used to encode inputs. See [tako_vm/sdk/client.py](https://github.com/las7/TakoVM/blob/main/tako_vm/sdk/client.py) for the serialization contract.
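
To make the "type hints drive serialization" point concrete, here is a small, self-contained sketch of annotation-driven argument encoding. It is not the contract implemented in `tako_vm/sdk/client.py`: the payload shape, the `WIRE_TYPES` whitelist, and the function name `encode_call` are all assumptions chosen for illustration.

```python
import inspect
import json
from typing import Any, Callable, get_type_hints

# Assumed set of annotation types that can be placed on the wire.
WIRE_TYPES = (bool, int, float, str, list, dict)


def encode_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Encode a call's arguments using the function's annotations (illustrative only)."""
    hints = get_type_hints(func)
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    arguments = {}
    for name, value in bound.arguments.items():
        expected = hints.get(name)
        if expected not in WIRE_TYPES:
            raise TypeError(f"parameter {name!r} needs a supported type annotation")
        if not isinstance(value, expected):
            raise TypeError(f"parameter {name!r} expects {expected.__name__}")
        arguments[name] = {"type": expected.__name__, "value": value}
    return json.dumps({"function": func.__name__, "arguments": arguments}, sort_keys=True)


def mean(values: list, scale: float = 1.0) -> float:
    return scale * sum(values) / len(values)


# `mean` is never called here: only its signature and arguments are inspected.
payload = encode_call(mean, [1, 2, 3])
```

The usage line shows the property the note describes: the annotations decide how inputs are encoded, and the function body does not run on the client.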

```mermaid
flowchart LR
    A[Developer writes @send function] --> B[SDK serializes body + type hints]
    B --> C[POST /jobs to REST API]
    C --> D[ExecutionRecord status='queued']
    D --> E[Worker pulls job]
    E --> F[ContainerBuilder launches image]
    F --> G[Code runs in container]
    G --> H[Record transitions to 'completed' / 'failed']
    H --> I[SDK returns ExecutionResult]
```

## Job Types, Versioning & Job Management

A `JobType` ([tako_vm/job_types.py](https://github.com/las7/TakoVM/blob/main/tako_vm/job_types.py)) is a dataclass describing a reusable container profile: `requirements`, `python_version`, `base_image`, `shared_code`, `environment`, `memory_limit`, `cpu_limit`, `timeout`, `network_enabled`, `session_enabled`, and GPU options. Each profile becomes a Docker image built by `ContainerBuilder.build()` in [tako_vm/execution/builder.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/builder.py), which validates the Python version and emits a Dockerfile.

When a job is submitted, [tako_vm/execution/worker.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/worker.py) creates an `ExecutionRecord` ([tako_vm/models.py](https://github.com/las7/TakoVM/blob/main/tako_vm/models.py)) with `status="queued"`, hashes the code and inputs for replay support, and stores the original payload as internal artifacts. The `VersionManager` in [tako_vm/version.py](https://github.com/las7/TakoVM/blob/main/tako_vm/version.py) computes a content-based digest from the job type's requirements, Python version, base image, environment, shared code, and GPU settings, returning a `JobVersion` whose `full_ref` is formatted as `name@sha256:<12-char-digest>`.
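
A content-based reference of the form `name@sha256:<12-char-digest>` can be sketched as below. The canonicalization step (sorted-key compact JSON) and the function name `full_ref` are assumptions; the actual `VersionManager` may hash its inputs differently, so digests produced here are not expected to match real ones.

```python
import hashlib
import json
from typing import Any, Mapping


def full_ref(name: str, spec: Mapping[str, Any]) -> str:
    """Derive `name@sha256:<12-char-digest>` from a job type's fields (illustrative)."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(dict(spec), sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{name}@sha256:{digest[:12]}"


spec = {
    "requirements": ["pandas"],
    "python_version": "3.11",
    "base_image": None,
    "environment": {},
    "shared_code": [],
    "gpu": False,
}
ref = full_ref("data-processing", spec)
```

Because the digest depends only on content, two profiles with identical fields resolve to the same reference, and any change to a hashed field yields a new one.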

> **Community note (issue #29):** Documentation inconsistently uses `pending` and `queued`. The authoritative source is the `ExecutionRecord.status` field, which begins life as `"queued"` when a job is enqueued in [tako_vm/execution/worker.py](https://github.com/las7/TakoVM/blob/main/tako_vm/execution/worker.py). Polling `/jobs/{id}` returns this status; `pending` is a queue-level concept, not a record-level status.
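
A client that polls `/jobs/{id}` should therefore treat `"queued"` as a non-terminal record status and wait for `"completed"` or `"failed"`. The loop below is a sketch under assumptions: `fetch_record` is a hypothetical callable standing in for the HTTP call, and any status other than the two terminal values named on this page is treated as "keep waiting".

```python
import time
from typing import Any, Callable, Mapping

TERMINAL_STATUSES = frozenset({"completed", "failed"})


def wait_for_job(
    fetch_record: Callable[[str], Mapping[str, Any]],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll until the record reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        record = fetch_record(job_id)
        if record["status"] in TERMINAL_STATUSES:
            return record
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {record['status']!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.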

Operational endpoints in [tako_vm/server/app.py](https://github.com/las7/TakoVM/blob/main/tako_vm/server/app.py) expose `/pool/stats` for worker telemetry and `/dlq/stats` plus `/dlq` for the dead letter queue. These power the runbooks requested in issue #33 (circuit-breaker investigation, DLQ inspection, Docker daemon recovery). Configuration is loaded by [tako_vm/config.py](https://github.com/las7/TakoVM/blob/main/tako_vm/config.py) with bounds-checked Pydantic models and the documented search path `tako_vm.yaml → config/tako_vm.yaml → ~/.tako_vm/config.yaml → /etc/tako_vm/config.yaml`.
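
The search path reads as "first existing file wins". A minimal sketch of that lookup follows. The function name `find_config` is hypothetical, and resolving relative entries against the current working directory is an assumption, not a statement about how `tako_vm/config.py` behaves.

```python
from pathlib import Path
from typing import Iterable, Optional

# Order taken from the documented search path.
SEARCH_PATH = (
    "tako_vm.yaml",
    "config/tako_vm.yaml",
    "~/.tako_vm/config.yaml",
    "/etc/tako_vm/config.yaml",
)


def find_config(
    candidates: Iterable[str] = SEARCH_PATH, cwd: Optional[Path] = None
) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None (illustrative)."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if not path.is_absolute():
            path = base / path  # assumption: relative entries are relative to cwd
        if path.is_file():
            return path
    return None
```

Under this reading, a project-local `tako_vm.yaml` shadows both the per-user and the system-wide file.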

> **Security note (issue #40):** Environment variables passed to a job are visible inside the container at `/proc/self/environ` unless explicit mitigations are in place. The `security` module ([tako_vm/security.py](https://github.com/las7/TakoVM/blob/main/tako_vm/security.py)) ships `validate_env_key` and `validate_env_value` helpers. Never pass long-lived secrets as job environment variables; use a secret manager and inject them server-side.
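
One way to act on this advice on the client side is to withhold secret-looking keys before a job environment is submitted. The sketch below is hypothetical and unrelated to the real `validate_env_key` and `validate_env_value` helpers, whose behavior is not described on this page. The name pattern is a heuristic assumption and will miss secrets stored under innocuous names.

```python
import re
from typing import Dict, List, Mapping, Tuple

# Heuristic only: key names that commonly indicate credentials.
SECRET_HINT = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY)", re.IGNORECASE
)


def split_job_environment(env: Mapping[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Separate variables safe to forward from keys that look like secrets."""
    safe: Dict[str, str] = {}
    withheld: List[str] = []
    for key, value in env.items():
        if SECRET_HINT.search(key):
            withheld.append(key)  # report the key only, never the value
        else:
            safe[key] = value
    return safe, sorted(withheld)


safe_env, withheld_keys = split_job_environment(
    {"LOG_LEVEL": "info", "DB_PASSWORD": "hunter2", "OPENAI_API_KEY": "sk-example"}
)
```

A filter like this reduces accidental exposure but is not a substitute for server-side injection from a secret manager.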

## See Also

- [REST API Reference](rest-api.md)
- [Configuration Reference](configuration.md)
- [Security Model & Mitigations](security.md)
- [Container Builder & Job Types](container-builder.md)

---


## Pitfall Log

Project: las7/takovm

Summary: Found 18 structured pitfall items, none of them high or blocking. Top priority: installation risk that requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_78a96aee28384d93b96002b77e6027ea | https://github.com/las7/TakoVM/issues/38

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_1702731d27b44ab483e81e1e39003db1 | https://github.com/las7/TakoVM/issues/33

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_31b37ccb63c64ad89eb926be535f46d5 | https://github.com/las7/TakoVM/issues/31

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_f10bb15a58f541479f103040c56f2845 | https://github.com/las7/TakoVM/issues/37

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a81c31adb886463ba3fe59c976ca6534 | https://github.com/las7/TakoVM/issues/30

## 6. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_81231681c1d64897a42c17e87568c065 | https://github.com/las7/TakoVM/issues/34

## 7. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

## 8. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_14c9181db25446ecaa58b6cfef944ab4 | https://github.com/las7/TakoVM/issues/39

## 9. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_726651d2543a41fcb2b12eacce59f8eb | https://github.com/las7/TakoVM/issues/29

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

## 13. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_8dcc73f80c184253a4646f4293c8e38c | https://github.com/las7/TakoVM/issues/15

## 14. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_17ba6c128eaa4f4996dcd06c7d309f2b | https://github.com/las7/TakoVM/issues/14

## 15. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_169ec35f009e4f86a8b322531f643e96 | https://github.com/las7/TakoVM/issues/32

## 16. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_e70e50033ee2425c80d460b507c69efd | https://github.com/las7/TakoVM/issues/40

## 17. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

## 18. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | hn_item:48431257 | https://news.ycombinator.com/item?id=48431257

